Senior DevOps & Site Reliability Engineer

Tangentia
Toronto, Ontario | On-site | 1 week ago
About this role

Role: Senior DevOps & Site Reliability Engineer
Location: Toronto, ON (Hybrid)

Job description

• 3+ years of experience in Site Reliability Engineering or related roles.

• Experience with Apigee Hybrid, Google Distributed Cloud, Azure, GCP, and Kubernetes.

• Advanced DevOps and SRE skills: CI/CD, automation, monitoring, infrastructure as code.

• Experience scripting and automating certificate management.

• Proficiency with Ansible for configuration management and orchestration.

• Experience with APM tools such as Dynatrace and Splunk.

• Programming experience with Python.

Key Responsibilities:

• Oversee the reliability, availability, and performance of Apigee Hybrid and Google Distributed Cloud environments, ensuring robust SRE practices.

• Manage and automate certificate management processes, including renewals, deployments, and compliance checks.

• Plan and execute upgrades and maintenance activities for Apigee Hybrid and distributed cloud infrastructure, minimizing downtime and ensuring seamless transitions.

• Implement and maintain monitoring solutions using Dynatrace and Splunk, proactively identifying and resolving issues to ensure system health and performance.

• Troubleshoot complex production incidents, perform root cause analysis, and drive incident resolution to restore service quickly and prevent recurrence.

• Develop and maintain automation scripts and Ansible playbooks that improve operational efficiency, covering tasks such as Kubernetes context retrieval, proxy configuration, and container management.

• Collaborate with cross-functional teams to ensure security, compliance, and best practices are followed across all SRE activities.

• Mentor and guide team members in SRE methodologies, fostering a culture of continuous improvement and operational excellence.