Senior Data Center Systems Engineer -- AI Infrastructure & Performance Optimization
Job Description
Position Summary
Johnson Controls' Data Center Thermal Solutions team is seeking a Senior Data Center Systems Engineer for the Advanced Technology & Innovation group. This role bridges AI/HPC computing systems with data center infrastructure, focusing on optimizing the interdependencies between computing workloads and cooling efficiency. The ideal candidate will be highly self-driven, able to work independently with limited supervision, and bring deep expertise in GPU-based AI workloads combined with a comprehensive understanding of data center infrastructure systems.
Key Responsibilities
- Deploy, configure, and manage AI/HPC compute clusters running diverse workloads including LLM training and inference, VLM applications, HPC simulations, and stress testing scenarios.
- Deploy and commission liquid cooling hardware including direct-to-chip cold plates, coolant distribution units (CDUs), immersion systems, and hybrid cooling solutions for high-density AI infrastructure.
- Design and execute workload management strategies to evaluate thermal and power characteristics across CPU- and GPU-based AI nodes under varied operational conditions.
- Develop system-level digital twin simulations integrating IT telemetry with mechanical and electrical infrastructure data to model cooling and computing efficiency interdependencies.
- Extract and analyze telemetry data from IT nodes, GPUs, CPUs, and thermal systems including chillers, cooling towers, pumps, and CDUs to optimize performance and energy efficiency.
- Collaborate with thermal engineers and controls teams to validate cooling system performance under real-world AI/HPC workloads and support new product introduction (NPI) development initiatives.
- Integrate and leverage data center infrastructure management (DCIM) and building automation system (BAS) platforms for holistic infrastructure monitoring, predictive analytics, and operational optimization.
- Author technical reports and industry white papers documenting research findings, best practices, and performance benchmarks for AI data center infrastructure.
- Represent Johnson Controls at industry conferences, technical forums, and customer engagements to share innovations and establish thought leadership in AI infrastructure optimization.
- Interface with hyperscalers, OEMs, and technology partners to align infrastructure solutions with evolving AI hardware and workload requirements.
- Contribute to technical publications, patent filings, and industry standards related to data center systems optimization.
Qualifications
- PhD or Master's degree in Electrical Engineering, Mechanical Engineering, or Computer Science.
- 5-7 years of professional experience in AI/HPC infrastructure, data center operations, or related technical roles.
- Prior experience working in data center environments strongly preferred.
- Proven hands-on experience deploying and managing LLM and VLM training and inference workloads, as well as HPC simulation workloads, on GPU-based AI infrastructure.
- Deep understanding of AI hardware architectures (NVIDIA H100/H200, AMD MI300, etc.) and their thermal/power characteristics across diverse workloads.
- Strong knowledge of data center mechanical systems (cooling distribution, heat rejection), electrical systems (power distribution, UPS), and networking architectures (InfiniBand, Ethernet fabrics).
- Proficiency with DCIM platforms, BAS integration, and digital twin modeling tools for infrastructure simulation and optimization.
- Experience with stress testing tools, workload orchestration platforms (Kubernetes, Slurm, etc.), and telemetry collection frameworks.
- Demonstrated ability to work independently, manage complex projects, and deliver results with minimal supervision.
- Track record of innovation and cross-functional collaboration in data center or AI infrastructure environments.
- Strong technical communication and presentation skills for conference participation and white paper authorship.