In this role, you will be responsible for designing, deploying, and optimizing high-performance infrastructure and data platforms across cloud-native and containerized environments. As a key member of our DevOps team, you will contribute to the development and support of our company’s Open Data Platform and play a vital role in the delivery of critical customer installations. You will also collaborate on AI chatbot initiatives and support key solutions in areas such as Fault Management, IoT, and Provisioning. Your expertise in distributed systems, automation, and big data technologies will drive innovation and ensure the stability and scalability of our infrastructure.
Main Responsibilities
- Design and manage cloud infrastructure, primarily on Microsoft Azure, ensuring high availability, performance, and scalability
- Build and maintain Kubernetes clusters and container-based deployments
- Automate infrastructure using Terraform, Ansible, and Infrastructure as Code (IaC) best practices
- Develop and manage robust CI/CD pipelines to streamline deployment, testing, and delivery processes
- Oversee monitoring, alerting, and logging systems (e.g., Prometheus, Grafana) to ensure proactive system health and reliability
- Provide 1st and 2nd level support for platform issues, incidents, and infrastructure troubleshooting
- Collaborate with customers and internal teams to deliver technical solutions, gather requirements, and support deployments
- Operate and support distributed data platforms, including Hadoop and related ecosystem components
- Maintain and secure Linux-based VMs, covering performance tuning, patching, and compliance
- Use Git for version control, collaboration, and code lifecycle management
Requirements
- Strong experience with Azure and cloud-native architectures
- 4+ years of hands-on experience with Kubernetes and containerization technologies
- Proven expertise with Terraform, Ansible, and automation-driven practices
- Hands-on experience with CI/CD pipelines (e.g., GitLab CI, Jenkins)
- Familiarity with monitoring and alerting solutions such as Prometheus, Grafana, and the ELK stack
- Understanding of SRE principles, including SLIs, SLOs, incident response, and operational excellence
- Experience working directly with customers and providing technical support (1st/2nd level)
- Proficiency in Linux, virtual machines, and shell scripting
- Practical knowledge of Hadoop architecture and related tools (e.g., HDFS, Hive, Spark)
- Strong collaboration, communication, and troubleshooting skills
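To illustrate the SRE concepts listed above (SLIs, SLOs, and the error budgets derived from them), here is a minimal sketch in Python; the SLO target and request counts are hypothetical, and the function name is illustrative rather than part of any standard library:

```python
# Minimal SLI/SLO error-budget sketch (all numbers hypothetical).
# SLI: fraction of successful requests; SLO: 99.9% over the window.

def error_budget_remaining(total_requests: int,
                           failed_requests: int,
                           slo_target: float = 0.999) -> float:
    """Return the fraction of the error budget still unspent."""
    allowed_failures = total_requests * (1.0 - slo_target)
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures

# Example: 1,000,000 requests with 400 failures against a 99.9% SLO.
# The budget allows 1,000 failures, so about 60% of it remains.
remaining = error_budget_remaining(1_000_000, 400)
print(f"{remaining:.0%} of error budget remaining")
```

A burn-rate alert in Prometheus or Grafana would typically compare this remaining budget against the elapsed portion of the SLO window to decide when to page.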