GPU and Data Center Systems Engineer
NAVIGO

Job Summary

Own the end-to-end design, installation, and optimization of GPU and server clusters powering containerized modular AI data centers. This includes hardware selection, rack-level integration, firmware and OS imaging, workload benchmarking, cooling and power performance tuning, and lifecycle maintenance. You'll help define the reference architecture for multi-megawatt GPU clusters, automate their provisioning and orchestration, and optimize for thermal efficiency, compute utilization, and fault tolerance in a hybrid on-location and remote-managed environment.

Responsibilities
  • Specify, integrate, and benchmark thousands of GPU nodes.

  • Build imaging and configuration automation (PXE boot, Terraform, etc.).

  • Optimize compute performance, power draw, and thermals.

  • Manage firmware updates, BIOS tuning, and redundancy planning.

  • Support rack-level maintenance and performance monitoring.

Qualifications
  • Experience with GPU clusters, DGX systems, or HPC environments.

  • Strong Linux systems experience (Ubuntu, RHEL, Rocky).

  • Familiarity with Kubernetes, Slurm, or Run:AI orchestration.

  • Experience managing >1 MW compute clusters preferred.

Requirements
  • GPU Cluster Management: 5 - 8 years
  • Linux Systems Administration: 4 - 7 years
  • Infrastructure Automation: 3 - 6 years
  • Problem Solving: Good - Excellent
  • Technical Documentation: Good - Excellent
Nice to Have
  • Kubernetes: 2 - 5 years
  • Power/Thermal Optimization: 3 - 5 years
  • Cross-functional Collaboration: Good - Excellent