Senior DevOps & Site Reliability Engineer

Tangentia
Toronto, Ontario · On-site · 1 week ago

About this role

Role: Senior DevOps & Site Reliability Engineer
Location: Toronto, ON (Hybrid)

Job description:
• 3+ years of experience in Site Reliability Engineering or related roles.
• Experience with Apigee Hybrid, Google Distributed Cloud, Azure, GCP, and Kubernetes.
• Advanced DevOps and SRE skills, including CI/CD, automation, monitoring, and infrastructure as code.
• Experience scripting and automating certificate management.
• Proficiency with Ansible for configuration management and orchestration.
• Experience with APM tools such as Dynatrace and Splunk.
• Programming experience with Python.

Key Responsibilities:
• Oversee the reliability, availability, and performance of Apigee Hybrid and Google Distributed Cloud environments, ensuring robust SRE practices.
• Manage and automate certificate management processes, including renewals, deployments, and compliance checks.
• Plan and execute upgrades and maintenance activities for Apigee Hybrid and distributed cloud infrastructure, minimizing downtime and ensuring seamless transitions.
• Implement and maintain monitoring solutions using Dynatrace and Splunk, proactively identifying and resolving issues to keep systems healthy and performant.
• Troubleshoot complex production incidents, perform root cause analysis, and drive incident resolution to restore service quickly and prevent recurrence.
• Develop and maintain automation scripts and Ansible playbooks for operational efficiency, covering tasks such as Kubernetes context retrieval, proxy configuration, and container management.
• Collaborate with cross-functional teams to ensure that security, compliance, and best practices are followed across all SRE activities.
• Mentor and guide team members in SRE methodologies, fostering a culture of continuous improvement and operational excellence.