Description

Job Title: Site Reliability Engineer

Experience Level: Level 4 (advanced): 7-15 years

Location: Montreal (onsite onboarding on Day 1; in-office presence 3 days per week)

Duration: 12+ month contract

Primary Responsibilities:

  • Provide L3 support for ***’s private cloud, including on-call rotation
  • Work closely with the internal engineering team and provide input on testing of new component releases and infrastructure upgrades, as well as performance, capacity, and monitoring
  • Create and improve support processes, including training, documentation, customer engagement, automation and scripting, and incident, problem, and change management
  • Work together with L2 teams and other L3 team members internationally

Required Skills:

  • 5 to 10 years of relevant experience
  • 3 to 5 years of Linux experience
  • Experience in front-end and back-end development with Golang
  • Sound knowledge of server infrastructure, virtualization, and cloud computing
  • Proven Kubernetes and Docker experience
  • Excellent understanding of internet and networking protocols, including TCP/IP and HTTP/HTTPS
  • Strong understanding of security protocols, e.g. SSL/TLS, Kerberos
  • Strong organizational skills and ability to manage multiple tasks and high-pressure situations for outage resolution
  • Experience with Agile and DevOps/SRE concepts
  • Administrative competence in at least one major scripting language or platform (for example, Python)
  • Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members
  • Willingness to work in an on-call rotation (every 5 weeks)