Is your organization's infrastructure ready for the new hybrid cloud?
AI and hybrid cloud promise major gains, but only if the underlying infrastructure can keep up. This article from Deloitte breaks down how hybrid environments are evolving, and why many organizations may be less "ready" than they think. Read the article to evaluate your infrastructure readiness and explore what's required to build a digital foundation for hybrid cloud and AI. Contact Bubble Cloud / Bubble Social Media Marketing to discuss your next steps in infrastructure modernization.
How should we rethink our hybrid cloud strategy for AI workloads?
Many organizations start AI projects in the public cloud, then hit an inflection point where costs, performance, or regulatory needs push them to reimagine their infrastructure mix.
Based on Deloitte’s research and leadership interviews, a practical approach looks like this:
- Start in the cloud, but define a cost threshold. Monitor when the cost of running a given AI workload in the cloud reaches about 60–70% of the total cost of buying equivalent systems (for example, GPU-powered boxes with 4–8 modules). That’s a signal to consider shifting some workloads to dedicated infrastructure.
- Track capacity in business terms. Don’t just look at GPU counts. Track data and model-hosting needs (in GB) against the number of transactions per second a single GPU can support. If performance drags as more teams run inference simultaneously, it may be time to invest in dedicated GPU boxes or redistribute workloads.
- Align infrastructure to business outcomes. Instead of “build it and they will come,” start with the business need: required latency, token ingestion speed, network capacity, security, and compliance. Then design the mix of public cloud, private cloud, on-prem, and edge to match.
- Use hybrid as a financial and technical lever. Hybrid models let you keep bursty or experimental workloads in the cloud, while moving steady, large-scale workloads to more economical dedicated or private AI infrastructure.
- Consider edge and high-performance computing (HPC) as complements. As cloud usage approaches your value inflection point, evaluate whether low-latency, high-security workloads belong at the edge, and whether compute-intensive research or simulation workloads are better suited to HPC environments.
The goal isn’t to abandon cloud, but to reshape your hybrid cloud strategy so AI workloads run where they make the most sense economically, technically, and from a risk perspective.
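The first two checks in the list above can be sketched in a few lines of Python. This is a minimal illustration, not Deloitte's method: the class and function names, the default threshold of 0.60 (the low end of the 60–70% range), and the idea of comparing costs over a single period that you choose are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadCosts:
    """Hypothetical inputs; the figures you plug in are your own estimates."""
    cloud_cost: float     # cost of running the workload in the cloud over the comparison period
    purchase_cost: float  # total cost of buying equivalent dedicated systems


def cloud_cost_ratio(costs: WorkloadCosts) -> float:
    """Cloud cost expressed as a fraction of the equivalent purchase cost."""
    if costs.purchase_cost <= 0:
        raise ValueError("purchase_cost must be positive")
    return costs.cloud_cost / costs.purchase_cost


def should_review_placement(costs: WorkloadCosts, threshold: float = 0.60) -> bool:
    """True once the cloud cost reaches the chosen share of the purchase cost.

    The 0.60 default is an assumption taken from the low end of the
    60-70% range mentioned in the text; tune it to your own policy.
    """
    return cloud_cost_ratio(costs) >= threshold


def gpus_needed(peak_tps: float, tps_per_gpu: float) -> int:
    """Smallest number of GPUs that covers peak transactions per second."""
    if tps_per_gpu <= 0:
        raise ValueError("tps_per_gpu must be positive")
    return math.ceil(peak_tps / tps_per_gpu)
```

For example, `should_review_placement(WorkloadCosts(650_000, 1_000_000))` flags the workload for review because the ratio is 0.65, and `gpus_needed(450, 100)` rounds up to 5 GPUs. The result is a prompt for a conversation about placement, not an automatic decision.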
What new AI hardware and operating models should we plan for?
The AI hardware landscape is moving quickly, but a few trends are especially relevant for enterprise planning.
1. Specialized AI chips and processors
- New processors such as neural processing units (NPUs) and tensor processing units (TPUs) are designed to handle AI workloads more efficiently than general-purpose CPUs.
- These chips can process larger data sets with better energy efficiency, which can lower operating costs and environmental impact.
- NPUs can enable sensitive AI workloads to run locally (for example, on AI-embedded PCs) instead of in the cloud, improving privacy and latency.
2. Rapid vendor innovation
- Tech vendors are releasing new AI-focused chips and platforms in cycles measured in months, not years. For example, in a six-month window, Google launched its Ironwood chip and new A4/A4X virtual machines, while Intel released multiple Core Ultra series processors.
- Many chipmakers now bundle software layers with their hardware to simplify model deployment, orchestration, and optimization.
3. Full-stack and operating model changes
- Vendors are working to integrate diverse chips (GPUs, CPUs, NPUs, and specialized accelerators) into cohesive stacks—whether in private clouds, AI “factories,” or rack-scale solutions.
- On top of this hardware, organizations are adopting new approaches, such as mixture-of-experts architectures and custom AI stacks, that route workloads to the most efficient resources.
- Engineering teams are being built to handle data chunking, model management, and load balancing across heterogeneous hardware.
4. Avoiding the “latest chip” trap
- You may not need the newest chip generation immediately. There can be meaningful efficiency gains left in your current stack—especially if you right-size GPU-powered boxes or trays and optimize cooling and power.
- Given that AI trays often combine GPUs, CPUs, and special interconnect chips, infrastructure changes can have knock-on effects on power and cooling strategies, not just compute capacity.
In practice, it’s less about chasing every new chip and more about reimagining your operating model so you can plug in new hardware as it matures, without constant rework.
When does edge or high-performance computing make sense for AI?
Edge and high-performance computing can complement cloud and on-prem environments, especially as AI use cases diversify.
1. Why organizations are moving to the edge
- The global AI edge computing market is projected to grow from about US$27 billion in 2024 to US$267 billion by 2032, indicating strong momentum.
- In a 2024 Deloitte survey, organizations that invested in edge reported growing confidence in those investments: their belief that they are gaining ROI rose by 13 percentage points between 2023 and 2024.
- Key drivers include AI tasks that need low latency, low storage/compute, or high data security, and the rise of AI-embedded devices (phones, laptops, wearables, cameras, drones, robots, and more).
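To put the market projection above in annual terms, the short calculation below derives the compound annual growth rate implied by the two figures. It assumes steady compounding over the eight years from 2024 to 2032; the function name is illustrative, and the rate is derived from the projection rather than quoted from the source.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# US$27 billion in 2024 growing to US$267 billion by 2032 (figures from the text above).
rate = implied_cagr(27.0, 267.0, 2032 - 2024)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the projection works out to growth of roughly 33% per year.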
2. What edge AI looks like in practice
- Walmart uses a “triplet” model: two public clouds plus a private cloud distributed across US regions, supported by 10,000 edge cloud nodes in stores and clubs. This enables low-latency inferencing at the point of customer interaction.
- Netflix runs content management and user data tracking in a public cloud, while a private cloud-based content delivery network handles video delivery to reduce latency.
- Jaguar TCS Racing combines cloud and AI technologies to analyze real-time car performance data, supporting live decision-making for engineers and drivers.
3. Device trends you should anticipate
- Deloitte’s 2024 Tech, Media, and Telecommunications Predictions report projected that 30% of PCs sold in 2024 would have local AI-processing capabilities, rising to nearly half in 2025.
- The 2025 edition projects that AI-enabled PCs could exceed 40% of shipments in 2026, meaning AI will increasingly run directly on corporate and employee devices.
- This has implications for device management, security, and bring-your-own-device policies as AI capabilities become standard on endpoints.
4. Where HPC fits
- On the other end of the spectrum, high-performance computing is well-suited for compute-intensive AI workloads such as deep research, modeling, simulations, or genomic sequencing.
- These scenarios often require dense GPU arrays and extremely high throughput that can be difficult or expensive to achieve in general-purpose cloud environments.
5. Data architecture considerations
- As generative and agentic AI models consume more data, many organizations are shifting from centralized data lakes to federated data approaches, accessing data where it lives instead of moving everything into one place.
- This can reduce storage and movement costs and mitigate some security risks associated with centralization.
- Some companies add an ontological layer that maps disparate data to real-world concepts, making it easier for AI agents to work across federated sources.
- Strong role-based access controls and human-in-the-loop oversight remain important to manage risk and bias.
In short, consider edge when you need low latency, local processing, or tighter data control; consider HPC when you need extreme compute density. Both can help you reshape your hybrid cloud strategy for an AI-driven future.
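The rule of thumb above can be written as a small decision helper. This is a hedged sketch only: the `Workload` fields, the `Placement` options, and especially the order in which the checks run are assumptions for illustration. Real placement decisions weigh cost, compliance, and capacity together.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    EDGE = "edge"
    HPC = "high-performance computing"
    DEDICATED = "dedicated or private infrastructure"
    PUBLIC_CLOUD = "public cloud"


@dataclass
class Workload:
    """Hypothetical workload traits, each answered yes or no."""
    needs_low_latency: bool = False
    needs_local_data_control: bool = False
    needs_extreme_compute: bool = False
    is_steady_large_scale: bool = False


def suggest_placement(workload: Workload) -> Placement:
    """Return a first-pass placement suggestion.

    The precedence (edge, then HPC, then dedicated, then cloud) is an
    assumption of this sketch, not a rule stated in the article.
    """
    if workload.needs_low_latency or workload.needs_local_data_control:
        return Placement.EDGE
    if workload.needs_extreme_compute:
        return Placement.HPC
    if workload.is_steady_large_scale:
        return Placement.DEDICATED
    # Bursty or experimental workloads stay in the public cloud by default.
    return Placement.PUBLIC_CLOUD
```

For instance, `suggest_placement(Workload(needs_extreme_compute=True))` returns `Placement.HPC`, while a workload with no flags set falls through to `Placement.PUBLIC_CLOUD`.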


