What the company says about the job
Workload Orchestration Engineer
As a Workload Orchestration Engineer in the Accelerated Compute Engineering team, you will manage and advance workload orchestration for High-Performance Computing (HPC) and AI Factory platforms. You will be responsible for deploying, configuring, and fine-tuning orchestration tools to ensure efficient scheduling and utilization of CPU and GPU environments for large-scale AI training and scientific simulations.
Responsibilities
- Design, implement, and maintain the SLURM Workload Manager ecosystem for HPC clusters.
- Deploy and manage Run:ai for AI Factory orchestration and fractional GPU allocation.
- Implement SLURM Slinky integrations to bridge Kubernetes-based AI orchestration with HPC resources.
- Define best practices for containerized scientific execution using Singularity/Apptainer or Enroot.
- Optimize scheduling parameters, queues, and fair-share policies for multi-tenant efficiency.
- Partner with Observability Engineers to monitor scheduler efficiency and hardware utilization.
- Troubleshoot complex workload failures, including distributed training and MPI communication issues.
- Maintain configuration-as-code models for scheduling tiers.
Qualifications
- Bachelor’s or advanced degree in Computer Science, Applied Mathematics, Computational Engineering, or a related field.
- 5+ years of systems engineering experience in workload scheduling, resource management, and cluster optimization.
- Expert-level proficiency in SLURM administration and Singularity container runtimes.
- Hands-on experience with Run:ai, Kubernetes, and GPU scheduling paradigms.
- Understanding of high-speed interconnects (InfiniBand, RoCE) and multi-node communication (MPI, NCCL).
- Proficiency in infrastructure automation and telemetry gathering.
- Strong collaboration skills with a lean and agile mindset focused on driving efficiency.
Curious about working here? Then apply now for this job at Roche in Austria.
