Description
Job Summary
We are seeking a highly experienced Infrastructure Specialist to lead the design, deployment, and operation of a modern cloud-native infrastructure. The ideal candidate has deep expertise in container orchestration (Kubernetes), distributed storage (Ceph), and identity and access management (OAuth, Keycloak).
Key Responsibilities
* Lead the infrastructure team in the design, implementation, and maintenance of the core cloud-native platform, including Kubernetes, Ingress/Egress, and related technologies.
* Drive automation and configuration management, using Helm for packaging, deployment, and lifecycle management of applications on Kubernetes in a production environment.
* Develop and maintain operational tooling, custom integrations, and system automation scripts primarily using Python to streamline deployment pipelines and enhance platform observability.
* Oversee and manage large-scale, resilient storage solutions, with hands-on expertise in administering and optimizing Ceph clusters.
* Design and implement robust Identity and Access Management (IAM) and Single Sign-On (SSO) solutions using Keycloak, OAuth, and LDAP to ensure secure authentication and authorization across all services.
* Collaborate with teams on secure and efficient network architecture, including configuration of firewalls, VPNs, and managing Ingress and Egress traffic flow.
* Ensure compliance with security and regulatory requirements, maintain high standards of system reliability, and support air-gapped deployments.
* Provide technical guidance, mentorship, and leadership to the infrastructure team, fostering a culture of continuous improvement and adoption of emerging technologies.
Must-Have Requirements
* 10+ years of progressive experience in infrastructure design, implementation, and maintenance, with a strong focus on security and cloud-native environments.
* Kubernetes administration and deployment experience in production environments.
* Developing, managing, and maintaining complex application deployments using Helm charts.
* Distributed, software-defined storage solutions, particularly Ceph.
* Identity and Access Management (IAM), including Keycloak, OAuth, and LDAP.
* Python for automation, system integration, and operational tasks.
* Configuring and managing Ingress controllers and network security.