Description

Job Summary

We are seeking a highly experienced Infrastructure Specialist to lead the design, deployment, and operation of a modern cloud-native infrastructure. The ideal candidate has deep expertise in container orchestration (Kubernetes), distributed storage (Ceph), and identity and access management (OAuth, Keycloak).

Key Responsibilities

* Lead the infrastructure team in the design, implementation, and maintenance of the core cloud-native platform, including Kubernetes, Ingress/Egress, and related technologies.

* Drive automation and configuration management, using Helm for packaging, deployment, and lifecycle management of applications on Kubernetes in production.

* Develop and maintain operational tooling, custom integrations, and system automation scripts primarily using Python to streamline deployment pipelines and enhance platform observability.

* Oversee and manage large-scale, resilient storage solutions, with hands-on expertise in administering and optimizing Ceph clusters.

* Design and implement robust Identity and Access Management (IAM) and Single Sign-On (SSO) solutions using Keycloak, OAuth, and LDAP to ensure secure authentication and authorization across all services.

* Collaborate with other teams on secure and efficient network architecture, including firewall and VPN configuration and the management of Ingress and Egress traffic.

* Ensure compliance with security and regulatory requirements, and maintain high standards for system reliability and for air-gapped deployment solutions.

* Provide technical guidance, mentorship, and leadership to the infrastructure team, fostering a culture of continuous improvement and adoption of emerging technologies.

Must-Have Requirements

* 10+ years of progressive experience in infrastructure design, implementation, and maintenance, with a strong focus on security and cloud-native environments.

* Production experience administering and deploying Kubernetes.

* Experience developing, managing, and maintaining complex application deployments with Helm charts.

* Experience with distributed, software-defined storage solutions, particularly Ceph.

* Experience with Identity and Access Management (IAM), including Keycloak, OAuth, and LDAP.

* Proficiency in Python for automation, system integration, and operational tasks.

* Experience configuring and managing Ingress controllers and network security.