Description
Job Title: Site Reliability Engineer
Experience Level: Level 4 (advanced): 7-15 years
Location: Montreal (Day 1 onboarding onsite; in-office presence 3x per week)
Duration: 12+ month contract
Primary Responsibilities:
- Provide L3 support for ***’s private cloud, including on-call rotation
- Work closely with the internal engineering team and provide input on testing of new component releases and infrastructure upgrades, as well as performance, capacity, and monitoring
- Create and improve support processes, including training, documentation, customer engagement, automation and scripting, and incident, problem, and change management
- Work together with L2 teams and other L3 team members internationally
Required Skills:
- 5 to 10 years of relevant experience
- 3 to 5 years of Linux experience
- Experience in front-end and back-end development with Golang
- Sound knowledge of server infrastructure, virtualization, and cloud computing
- Proven Kubernetes and Docker experience
- Excellent understanding of internet and networking protocols, including TCP/IP, HTTP/HTTPS
- Strong understanding of security protocols, e.g. SSL/TLS, Kerberos
- Strong organizational skills and ability to manage multiple tasks and high-pressure situations for outage resolution
- Experience with Agile and DevOps/SRE concepts
- Administrative competence in at least one major scripting language or platform (for example, Python)
- Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members
- Willingness to work in an on-call rotation (every 5 weeks)





