Description

Job Title: SRE – AWS/Dynatrace with Development experience

Duration: 12-month contract

Location: Toronto, Ontario, Canada

Job Description:

This role is responsible for reliability, resiliency, and operational excellence on mission‑critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance through Dynatrace‑driven observability.

Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event‑driven systems)
SLOs / SLIs / Error Budgets for critical APIs
Incident analysis and post‑incident reviews
Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
Operational excellence improvements: incident reduction, MTTR improvement, toil automation
Reliability guardrails embedded into CI/CD and production readiness reviews

Core Responsibilities

Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
Lead major incidents and drive actionable RCAs with sustained fixes
Build signal‑driven alerts aligned to SLOs (noise reduction focus)
Enable automation & self‑healing where feasible

Required Experience

5-6+ years of experience in SRE/DevOps/Production Engineering
Deep hands‑on experience with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
Strong expertise in Dynatrace for serverless monitoring & triage
Proven success improving availability, MTTR, and incident trends
Solid coding/scripting (Python / Java / Node.js)

Enterprise Solutions Inc.
