Case study

NuTheia AI Hybrid Cloud

A World-First AI Hybrid Cloud Platform

CodeZero proudly presents NuTheia, a world-first AI Hybrid Cloud Platform. NuTheia integrates on-premises, high-density immersion-cooled computing with the elastic capacity of the AWS cloud, delivering a sustainable, optimised platform for demanding AI model training and inference, in line with industry trends towards hybrid AI and sustainable computing.

The Challenge: Balancing Power, Cost & Sustainability

Modern AI workloads, particularly the training of Large Language Models (LLMs), are incredibly resource-intensive, pushing traditional air-cooled data centers to their thermal and financial limits. Organisations require robust pathways to harness immense computational power not only efficiently but also sustainably, while retaining crucial flexibility for workload deployment and inference at the edge. The critical question is: how can businesses achieve peak AI performance and scalability without incurring prohibitive costs or exacerbating environmental impact?

A Synergistic, Optimised Hybrid Architecture

NuTheia, personally architected by CodeZero CEO Andy Rogen, pioneers a hybrid approach to AI infrastructure, using a consistent Nutanix software underlay to create a unified fabric between two powerful, complementary environments:

On-Premises Immersion-Cooled AI Powerhouse:
  • High-Performance Hardware: Dell R750 servers, funded by Intel, featuring 3rd Generation Intel® Xeon® Platinum 8352Y processors (with built-in AI acceleration) and robust Nvidia A40 GPUs, ideal for complex AI training and demanding computations.
  • Revolutionary Cooling: Servers are submerged in a GRC (Green Revolution Cooling) immersion cooling tank filled with advanced thermal management fluid from BP Castrol. Compared with traditional air cooling, this approach improves thermal stability, allows higher compute density, enhances hardware reliability, and significantly reduces energy consumption, all of which are critical for sustained AI workloads.
  • AI-Ready Software Stack: Uses Nutanix GPT-in-a-Box, a full-stack AI-ready platform that simplifies the deployment and management of generative AI.

Elastic and Scalable AWS Cloud Environment:
  • Seamless Cloud Extension: Employs Nutanix Cloud Clusters (NC2) on AWS, running the full Nutanix HCI stack natively on AWS bare-metal z1d instances.
  • Cost-Effective Inference: Runs an Amazon EKS (Elastic Kubernetes Service) cluster with an Nvidia T4 GPU on a g4dn instance.
  • Cloud-Native Integration: Leverages Amazon EFS for resilient file storage and Nutanix Prism for unified management across the hybrid landscape.
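
As a rough illustration of the split described above, the two environments can be modelled as data, with a simple rule that routes training to the on-prem immersion tank and inference to AWS. This is a minimal sketch, not CodeZero's tooling: the `Environment` class, the `place` function, and the environment names are hypothetical, and only the hardware details come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One side of the hybrid platform (field names are illustrative)."""
    name: str
    compute: str
    gpu: str
    role: str  # the workload type this environment is meant for


# Hardware details taken from the architecture description; names are made up.
ON_PREM = Environment(
    name="on-prem-immersion",
    compute="Dell R750, Intel Xeon Platinum 8352Y",
    gpu="Nvidia A40",
    role="training",
)
AWS_CLOUD = Environment(
    name="aws-eks",
    compute="Amazon EKS on a g4dn instance",
    gpu="Nvidia T4",
    role="inference",
)

ENVIRONMENTS = (ON_PREM, AWS_CLOUD)


def place(workload: str) -> Environment:
    """Return the environment whose role matches the workload type."""
    for env in ENVIRONMENTS:
        if env.role == workload:
            return env
    raise ValueError(f"no environment for workload {workload!r}")


print(place("training").gpu)   # GPU used for on-prem training
print(place("inference").gpu)  # GPU used for cloud inference
```

A real scheduler would also weigh cost, data location, and capacity; the point here is only that the training/inference split is an explicit placement rule.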

This pioneering project is a testament to strong ecosystem collaboration, involving AWS, Nutanix, Green Revolution Cooling (GRC), Intel, BP Castrol, UNICOM Engineering, and Centersquare.

Key Features & Strategic Benefits

World-First Hybrid Immersion AI Platform

Innovatively combines the extreme thermal efficiency and sustained performance of on-prem immersion cooling with the agility and global reach of the AWS cloud.

Sustainable High-Performance Training

Capitalises on powerful on-prem Nvidia A40 GPUs within an energy-efficient immersion environment. This setup is ideal for intensive AI model training (e.g., Meta's Llama 3 on Hugging Face datasets), mitigating thermal throttling and ensuring consistent peak performance over extended periods.

Flexible & Cost-Optimised Cloud Inference

Seamlessly pushes trained and fine-tuned models to AWS, leveraging energy-efficient Nvidia T4 GPUs for scalable, responsive, and geographically distributed inference – a strategy that optimises for both performance and operational expenditure.

Unified Operational Consistency

The Nutanix software-defined architecture (HCI, Nutanix Kubernetes Engine (NKE), and NC2) provides a consistent management plane, skill set, and operational workflow across the entire hybrid environment, simplifying MLOps.

Future-Proof Performance Optimisation

Accounts for the realities of prolonged model training by building in resilience, code-efficiency strategies, and robust performance monitoring with Prometheus and Grafana, in line with modern CI/CD practices.

The Impact: Redefining AI Infrastructure

NuTheia is not merely a technological showcase; it is a strategic blueprint for the future of AI infrastructure, offering a compelling response to the dual demands of computational power and environmental responsibility.

Enabling Sustainable AI at Scale

Directly confronts the thermal and energy challenges of traditional data centers, offering a significantly more sustainable path for compute-intensive AI. Immersion cooling can substantially reduce cooling energy and, with it, overall Power Usage Effectiveness (PUE).
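
PUE (Power Usage Effectiveness) is the ratio of total facility energy to the energy consumed by IT equipment, so lower cooling overhead moves it closer to the ideal of 1.0. The sketch below shows the arithmetic; the kWh figures are invented purely for illustration and are not NuTheia measurements.

```python
def pue(it_kwh: float, cooling_kwh: float, other_overhead_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return (it_kwh + cooling_kwh + other_overhead_kwh) / it_kwh


# Hypothetical figures, for illustration only (not NuTheia measurements).
air_cooled = pue(it_kwh=1000, cooling_kwh=400, other_overhead_kwh=100)
immersion = pue(it_kwh=1000, cooling_kwh=50, other_overhead_kwh=100)

print(f"air-cooled PUE: {air_cooled:.2f}")  # 1.50
print(f"immersion PUE:  {immersion:.2f}")   # 1.15
```

With the IT load held constant, any reduction in cooling energy lowers the numerator only, which is why cooling efficiency shows up directly in PUE.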

Strategic Cost-Performance Optimisation

Empowers organisations to architect AI workflows that leverage the best of both worlds: the sheer power and control of on-prem immersion for training, and the cost-efficiency and reach of the cloud for inference.

Enhanced Agility, Control, and Innovation

Provides the flexibility to deploy AI capabilities precisely where needed, maintaining sovereignty over sensitive training data and processes, while fostering rapid experimentation with cloud services.

Roadmap for Growth

Incorporates a clear vision for future enhancements, including Nutanix Move for streamlined model migration, Kubeflow for advanced MLOps pipelines, one-click deployment tools, and the AWS CDN (Amazon CloudFront) for optimised edge delivery.

NuTheia represents months of dedicated, intensive development by the CodeZero team. The visual storytelling of this complex project was expertly handled by our sister company NyxVX (an Epic Games MegaGrant recipient) using Unreal Engine.

CodeZero: Engineering the next generation of AI infrastructure – delivering sustainable power, strategic flexibility, and a tangible competitive edge for businesses poised to lead with Artificial Intelligence.

Ready to transform your AI infrastructure?

Let's discuss how hybrid immersion cooling can optimise your AI workloads while promoting sustainability.
