Senior Platform Engineer – Cloud & ML Platform (m/f/d)

Quantum-Systems GmbH

Gilching Full-time Not specified

About the role

As a Platform Engineer – Cloud & ML Platform (m/f/d), you will be a key contributor to the cloud-native infrastructure that powers our AI and autonomy development at global scale. You will design, deploy, operate, and continuously improve Kubernetes-based platforms that enable our teams to train, evaluate, deploy, and monitor machine learning workloads reliably across regions, clouds, and compute environments.

At Quantum Systems, we build intelligent unmanned systems that operate under real-world constraints. Our AI teams depend on scalable, secure, and high-performance infrastructure to turn data, models, and experiments into field-ready capabilities. In this role, you will help build the cloud and ML platform backbone that makes this possible.

You will work closely with AI engineers, data engineers, software teams, security, IT, and product stakeholders to provide robust, automated, and developer-friendly infrastructure for large-scale ML workloads. Your work will directly support our mission to push the boundaries of autonomous systems through cutting-edge software, edge computing, and real-time AI-powered data processing.

Your day-to-day mission:

- Design, deploy, operate, and continuously improve Kubernetes-based platforms for machine learning and data-intensive workloads.
- Build and maintain globally distributed Kubernetes clusters with a strong focus on reliability, scalability, security, and observability.
- Own the lifecycle management of ML platform components, including Kubeflow, Metaflow, workflow orchestration, experiment tracking, and related MLOps tooling.
- Enable AI and data teams to run scalable training, inference, evaluation, and data processing pipelines across heterogeneous compute environments.
- Develop infrastructure-as-code, automation, and GitOps workflows to ensure reproducible, auditable, and efficient platform operations.
- Manage GPU-enabled workloads, scheduling, storage, networking, secrets, access control, and cost-aware resource utilization.
- Improve platform resilience through monitoring, alerting, incident response, backup strategies, disaster recovery, and capacity planning.
- Collaborate with AI, software, DevOps, security, and IT teams to define platform standards, best practices, and deployment patterns.
- Support hybrid and multi-cloud infrastructure scenarios, including on-premises, private cloud, and public cloud environments.
- Evaluate and integrate cloud providers and infrastructure technologies, including Azure, AWS, Telekom Cloud, or comparable platforms.
- Continuously improve developer experience for ML engineers through self-service tooling, documentation, templates, and platform abstractions.
- Help bring AI capabilities from prototype to production by providing a reliable, scalable, and secure ML infrastructure foundation.

What you bring to the team:

- Strong hands-on expertise with Kubernetes in production environments, including cluster operations, networking, storage, security, scaling, upgrades, and troubleshooting.
- Proven experience deploying and maintaining globally distributed, large-scale clusters for production or mission-critical workloads.
- Strong experience with Kubeflow and Metaflow in production or production-like ML platform environments.
- Solid understanding of MLOps workflows, including training pipelines, model lifecycle management, artifact handling, experiment tracking, reproducibility, and deployment automation.
- Experience operating GPU-enabled Kubernetes environments and supporting high-performance machine learning workloads.
- Strong infrastructure-as-code experience using tools such as Terraform, Helm, Kustomize, Argo CD, Flux, Crossplane, Ansible, or comparable technologies.
- Good understanding of cloud-native observability, including metrics, logs, traces, alerting, dashboards, and SLO-driven operations.
- Experience with containerization, CI/CD, GitOps, secrets management, identity and access control, and secure platform operations.
- Familiarity with

Skills

Internet and software, professional
