NVIDIA’s Revolutionary AI Supercomputers: Scaling Up for the Future
0:00: Hey everyone, welcome back for another deep dive.
0:02: So, anyone paying attention to AI has probably seen those crazy predictions lately, you know, about how we're gonna need 100 times more computing power for AI to, well, actually become intelligent, like, really reason and think.
0:18: But it's pretty mind blowing when you think about it.
0:19: It means we need completely new ways of building computers, not just like making the old ones a bit faster.
0:24: Yeah, for sure, and that's what we're going to dig into today, right?
0:27: It's all about the architecture, how these things are actually designed and built to handle this huge, this massive increase in what AI needs to do.
0:35: Exactly.
0:36: And the sources we're looking at today, they dive deep into exactly that, the incredible innovations happening right now in the computing architecture.
0:43: And one really important point they emphasize is the difference between scaling up, which means making individual computers more powerful, and scaling out, which is connecting lots of computers together.
0:53: And like their big argument is that we need to really focus on scaling up first.
0:57: OK, got it.
0:58: So, to understand where we're going, it helps to know where we've been.
1:02: Our sources talk about these systems called HGX.
1:06: Picture this.
1:07: You've got 8 super powerful processors called GPUs, all connected together with this crazy fast technology called NVLink 8, right?
1:17: Now, these 8 GPUs need to talk to the main brains of the system, the CPUs, but those are on a separate board and use a slower connection called PCI Express.
1:25: And then, to build a really huge AI supercomputer, you link tons of these HGX systems together using this high-speed network they call InfiniBand.
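[Editor's note: to put rough numbers on that bottleneck, here's a back-of-envelope sketch in Python. The per-link bandwidth figures are ballpark assumptions for each class of connection, roughly NVLink-class GPU links, PCIe Gen5 x16, and a 400 Gb/s InfiniBand NIC, not numbers from the episode.]

```python
# Ballpark link speeds inside an HGX-style node (assumed figures,
# not from the source: NVLink-class, PCIe Gen5 x16, 400 Gb/s IB).
NVLINK_GBPS = 900    # assumed GPU-to-GPU bandwidth, GB/s
PCIE_GBPS   = 64     # assumed CPU-to-GPU PCIe Gen5 x16 bandwidth, GB/s
IB_GBPS     = 50     # assumed 400 Gb/s InfiniBand NIC = 50 GB/s

print(f"GPU-to-GPU (NVLink):  {NVLINK_GBPS} GB/s")
print(f"CPU-to-GPU (PCIe):    {PCIE_GBPS} GB/s  (~{NVLINK_GBPS / PCIE_GBPS:.0f}x slower)")
print(f"Node-to-node (IB):    {IB_GBPS} GB/s  (~{NVLINK_GBPS / IB_GBPS:.0f}x slower)")
```

The point is the ordering, not the exact figures: GPU-to-GPU traffic inside the node is an order of magnitude faster than either path off the board.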
1:36: One of the sources even describes this whole setup as, quote, "revolutionary for its time" in the world of AI.
1:41: Yeah, definitely.
1:42: It was a big step forward.
1:44: But these new AI models, especially the ones that can actually like reason and learn and make decisions, they're just so demanding that the old architecture can't keep up, and that's where things get interesting.
1:52: What they've done is they've taken that tightly integrated NVLink system and kind of pulled it apart, which they call disaggregation. Disaggregation, so like breaking a complex thing down into simpler parts?
2:02: Kind of, yeah, it's like this.
2:04: Those NVLink switches, they're what makes sure all the GPUs can talk to each other super fast, right?
2:10: Well, they used to be built into the main computing unit, but now they've moved them out and put them in the center of the system chassis.
2:17: It's like, imagine you've got a huge meeting with tons of people, you take the person who's in charge of making sure everyone can talk and give them their own control room to manage everything.
2:27: That's what's happening here.
2:28: OK, I get the picture, but what's so special about this NVLink switch?
2:32: Yeah.
2:33: Why does it make such a big difference?
2:34: So the source actually calls it, get this, the highest performance switch the world has ever made.
2:40: And the reason is it lets every single GPU connected to it talk to every other GPU at the same time at full speed.
2:48: Think about it.
2:48: It's like, if everyone in a giant stadium could instantly talk to anyone else with no delays or anything, that's the level of communication we're talking about here, and they're using 18 of these super switches across 9 switch trays.
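[Editor's note: a quick way to see why a central switch matters. The sketch below compares a hypothetical direct point-to-point mesh against a switched fabric for a 72-GPU domain; it illustrates the scaling argument only, not NVIDIA's actual wiring.]

```python
# Full-speed all-to-all: a direct mesh needs a link per GPU pair,
# while a non-blocking switch needs only one port per GPU.
# N = 72 matches the NVLink 72 domain discussed later.
N = 72
mesh_links   = N * (N - 1) // 2   # dedicated point-to-point links
switch_ports = N                  # ports into the switch fabric

print(f"Direct mesh:     {mesh_links} links")    # 2556
print(f"Switched fabric: {switch_ports} ports")
```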
2:59: Wow, OK.
3:00: That's a lot of communication power.
3:02: Yeah.
3:02: Now, another big change the sources talk about is in how they keep these things cool.
3:07: They've moved from air cooling in the old HGX systems to liquid cooling in these new disaggregated ones.
3:14: Why the switch?
3:15: Liquid cooling, it's just way more efficient at getting rid of heat, especially when you're trying to fit so much computing power into such a small space.
3:21: It's like, I don't know, like comparing a little desk fan to a giant industrial cooling system.
3:28: So what this means is they can take the power of two HGX systems and cram it all into a single rack, and the numbers are insane.
3:35: The number of components per rack has gone from like 60,000 to 600,000, and each rack now needs 120 kilowatts of power.
3:44: The really big takeaway is this lets them build a one exaflops computer in just one rack, and the speaker in the source is super pumped about it.
3:51: He's like, Isn't it incredible?
3:53: One exaflops and one rack.
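[Editor's note: the rack numbers above reduce to a simple efficiency figure. This takes the quoted values at face value; the source doesn't say which numeric precision the exaflops figure refers to.]

```python
# Reducing the quoted rack specs to an efficiency figure.
flops_per_rack = 1e18     # 1 exaflops
watts_per_rack = 120e3    # 120 kW

print(f"{flops_per_rack / watts_per_rack / 1e12:.1f} TFLOPS per watt")   # ~8.3
print(f"Component density: {600_000 / 60_000:.0f}x more parts per rack")
```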
3:54: OK, yeah, that is incredible.
3:55: So it sounds like this disaggregation and the liquid cooling, that's what's really driving this whole scaling up thing.
4:00: Absolutely.
4:00: You know, their original goal, their big dream was to build one giant chip, a monolithic chip with 130 trillion transistors, and 20 trillion of those would just be for computation.
4:12: But like the technology to manufacture something that big just doesn't exist yet, so they had to find another way.
4:18: And that's this disaggregated architecture using their Grace CPUs, Blackwell GPUs, and the super fast NVLink interconnect.
4:25: And they're connecting 72 GPUs in a single rack to achieve this.
4:27: They call it the ultimate scale up.
4:29: OK, so even though it's not one giant chip, the whole system works like one.
4:33: So how much power are we talking about here with this Blackwell architecture?
4:36: What can it actually do?
4:37: We're talking 1 exaflops of compute power and 570 terabytes per second of memory bandwidth.
4:44: It's wild.
4:45: And the source, they emphasize this, it's like everything in this machine is now in T's, meaning trillions of operations happening at the same time.
4:53: To understand what that means, they use this analogy of an AI factory.
4:57: Their older factory, based on the Hopper architecture, needed 100 megawatts of power, used 45,000 chips spread across 1,400 racks, and could process 300 million tokens per second.
5:08: OK.
5:09: Now they don't give the same kind of direct comparison for Blackwell, but the implication is clear.
5:13: It's a huge jump in both efficiency and capacity, a huge jump.
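[Editor's note: the Hopper "AI factory" figures can be reduced to per-unit rates, which is as far as the source's numbers let us go, since no matching Blackwell factory figures are quoted.]

```python
# The Hopper-generation "AI factory" figures, reduced to per-unit rates.
power_mw = 100            # 100 megawatts
chips    = 45_000
racks    = 1_400
tokens_s = 300e6          # 300 million tokens per second

print(f"{tokens_s / power_mw:,.0f} tokens/s per megawatt")   # 3,000,000
print(f"{chips / racks:.0f} chips per rack")                 # ~32
print(f"{tokens_s / chips:,.0f} tokens/s per chip")          # ~6,667
```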
5:17: OK.
5:17: And the sources also talk about something called Blackwell Ultra, which is coming out soon, right?
5:21: Yeah, later this year.
5:22: It's built on the same basic design as Blackwell, but it's got some serious upgrades.
5:26: About 1.5 times more flops, a new instruction specifically for the attention mechanisms in AI, 1.5 times more high-speed memory, like for the KV cache, and double the networking bandwidth.
5:37: And they described the move to Ultra as a graceful glide because it's basically an evolution of what's already there.
5:43: So big upgrades without having to start from scratch, that makes sense.
5:46: And then there's Rubin, the next big thing, named after the astronomer Vera Rubin, right?
5:50: It's launching in the second half of next year and it's, well, it's a whole new beast.
5:54: It's got a new CPU twice as fast as Grace with more memory and bandwidth, and it only uses 50 watts.
6:00: There's also a brand new GPU, new networking with this CX9 smart NIC, a new NVLink 144, and HBM4 memory.
6:11: But the interesting thing is they're keeping the same chassis design so they can focus all their energy and risk taking on the core silicon and interconnect parts.
6:20: OK, now I remember there's this thing in the source about Blackwell actually having two GPUs in one package.
6:26: How does that affect how we understand these NVLink numbers for Rubin?
6:29: Yeah, that's important.
6:30: So from now on, when they talk about NVLink, they're referring to the number of individual GPU dies that are connected.
6:36: So Rubin NVLink 144, that means it's connecting 144 separate GPU dies.
6:42: It gives you a more accurate picture of how much processing power is actually linked together.
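[Editor's note: under that die-counting convention, and assuming the two-dies-per-package layout the source describes for Blackwell carries over (our assumption, not the episode's), NVLink 144 works out as follows.]

```python
# NVLink numbers now count GPU dies, not packages.
dies_connected   = 144   # "Rubin NVLink 144"
dies_per_package = 2     # assumption: Blackwell-style two-die packages

packages = dies_connected // dies_per_package
print(f"NVLink {dies_connected} = {packages} packages of 2 dies each")   # 72
```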
6:46: OK, so more individual processing units linked together with an even faster network, and then there's Rubin Ultra on top of that.
6:53: Yep, that's coming in late 2027, and they're calling it an extreme scale up.
6:58: We're talking NVLink 576, 600 kilowatts per rack, 2.5 million parts in the whole system, 15 exaflops of compute power, and a whopping 4,600 terabytes per second of bandwidth.
7:11: It'll also have a new NVLink switch and CX9 networking.
7:15: They're envisioning 16 sites with 4 GPUs packed into each package, all connected by this huge NVLink fabric.
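[Editor's note: the Rubin Ultra numbers can be cross-checked against each other. The package count follows from the "4 GPUs per package" figure; the growth ratios use the Blackwell numbers quoted earlier in the episode.]

```python
# Cross-checking the Rubin Ultra figures against each other.
nvlink_dies  = 576
dies_per_pkg = 4                  # "4 GPUs packed into each package"

print(f"{nvlink_dies // dies_per_pkg} packages in the NVLink domain")    # 144

# Growth relative to the Blackwell numbers quoted earlier:
print(f"Compute:   {15e18 / 1e18:.0f}x   (15 exaflops vs 1 exaflops)")
print(f"Bandwidth: {4600 / 570:.1f}x  (4,600 vs 570 TB/s)")
```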
7:21: It's hard to even imagine numbers that big.
7:23: The source even included a visual comparison, showing a Grace Blackwell system next to a Rubin system, like, to scale, and the Rubin system is noticeably bigger.
7:32: It really shows what they were saying before about scaling up before you scale out.
7:35: That's right.
7:36: And to really highlight how much faster things are getting, they compare the scale up FLOPS.
7:40: Hopper's the baseline at 1X.
7:42: Blackwell is already 68 times faster.
7:44: And Rubin is a crazy 900 times faster.
7:47: That's almost 1000 times more compute power in just a few years.
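[Editor's note: "almost 1000 times in a few years" implies a steep compounded rate. The ~5-year window below, Hopper circa 2022 to the late-2027 date mentioned above, is our assumption for the arithmetic, not a figure from the source.]

```python
# What "900x in a few years" implies as a compounded annual rate.
speedup = 900           # Rubin vs the Hopper baseline, per the source
years   = 5             # assumed window: Hopper ~2022 to late 2027

annual = speedup ** (1 / years)
print(f"~{annual:.1f}x more scale-up FLOPS per year")   # ~3.9x
```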
7:50: Wow, OK.
7:51: So we've talked about scaling up, but like you said earlier, scaling out is important too, connecting all these powerful units together, and the sources go into some of the innovations that are making that possible, especially in networking.
8:02: Exactly.
8:02: You see, traditionally they've used InfiniBand for scaling out, but now they're moving towards Ethernet with Spectrum-X.
8:10: And the reason is, they want to make Ethernet just as fast and reliable as InfiniBand while keeping it easy to use and manage.
8:18: Their goal is to be able to scale out to hundreds of thousands of GPUs by the time Rubin comes out.
8:24: That's a massive network.
8:25: So what are some of the challenges in connecting all these GPUs, and how are they solving them?
8:29: Well, one big thing is the physical connections.
8:32: For short distances like within a rack or between racks that are close together, copper is still the best option.
8:38: It's reliable, energy efficient, and not too expensive.
8:41: But when you need to connect things that are really far apart, like in these huge data centers, you need something else, and that's where silicon photonics comes in.
8:48: Ah, silicon photonics.
8:51: We hear so much about that these days.
8:52: What are some of the big breakthroughs the sources mention?
8:55: The main issue with silicon photonics used to be that the transceivers, the things that convert electrical signals to light, used a lot of power, but they've announced a big breakthrough.
9:05: The first co-packaged optics, or CPO, silicon photonics system.
9:10: It's the world's first 1.6 terabit per second CPO, using this micro-ring resonator modulator technology made with TSMC.
9:18: OK, that sounds super advanced.
9:20: But can you break down why this is such a big deal?
9:23: Sure.
9:23: So imagine they didn't have this new technology. For a network with, say, 100,000 GPUs, you'd need tons of these traditional transceivers that plug in.
9:32: Each one uses around 30 watts of power and costs about $1000 and each GPU might need 6 of them to connect to everything.
9:38: So you're talking about 180 watts and $6,000 just for the transceivers for one GPU.
9:43: Now multiply that by hundreds of thousands of GPUs and the energy consumption goes through the roof.
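[Editor's note: here's the speaker's transceiver arithmetic carried through to the full fleet, using the per-unit figures just quoted.]

```python
# The speaker's pluggable-transceiver math, carried to the full fleet.
watts_each = 30           # ~30 W per transceiver
cost_each  = 1_000        # ~$1,000 per transceiver
per_gpu    = 6            # transceivers per GPU
gpus       = 100_000

gpu_watts = watts_each * per_gpu      # 180 W per GPU
gpu_cost  = cost_each * per_gpu       # $6,000 per GPU

print(f"Per GPU: {gpu_watts} W, ${gpu_cost:,}")
print(f"Fleet:   {gpu_watts * gpus / 1e6:.0f} MW, ${gpu_cost * gpus / 1e9:.1f}B")
```

That's 18 megawatts and roughly $600 million just on transceivers, which is why the power figure dominates the conversation that follows.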
9:48: Yeah, I see what you mean.
9:48: The speaker in the source actually says, "energy is our most important commodity," and using that much power would just eat into their customers' profits.
9:56: It wouldn't be sustainable.
9:57: So this micro-ring resonator modulator, that's the key to solving this problem?
10:02: Yeah, it's called an MRM for short.
10:05: Basically, it's this tiny thing that uses a resonant ring to control how light reflects in a waveguide.
10:11: And by changing that reflectivity, they can encode data, like the ones and zeros, onto a beam of laser light.
10:19: This photonic integrated circuit is then stacked really close to the electronic integrated circuit, along with these tiny micro lenses and a fiber array.
10:29: It's all very tightly integrated and uses way less energy, and that leads to these silicon photonic switches.
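[Editor's note: a toy model of the modulation idea. A micro-ring has a sharp transmission dip at its resonant wavelength; electrically nudging that resonance on and off a fixed laser line toggles the output between bright and dark, i.e. ones and zeros. The Lorentzian line shape and every number below are illustrative assumptions, not device parameters from the source.]

```python
# Toy micro-ring modulator: shift a sharp resonance dip on/off the
# laser line to encode bits. All parameters are illustrative.

def transmission(laser_nm: float, resonance_nm: float,
                 linewidth_nm: float = 0.05, dip_floor: float = 0.05) -> float:
    """Fraction of laser light transmitted past the ring (Lorentzian dip)."""
    detune = (laser_nm - resonance_nm) / (linewidth_nm / 2)
    return 1 - (1 - dip_floor) / (1 + detune**2)

LASER = 1310.00   # fixed laser wavelength (nm), assumed

for bit in [1, 0, 1, 1, 0]:
    # bit 1: shift the resonance off the laser line -> light passes (bright)
    # bit 0: park the resonance on the laser line   -> light absorbed (dark)
    resonance = LASER + 0.10 if bit else LASER
    print(bit, f"-> {transmission(LASER, resonance):.2f} of light transmitted")
```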
10:34: Exactly.
10:35: This new tech lets them build switches with a ton of ports, 512 to be exact, and they can connect fibers directly to the switches, which was really hard to do before.
10:43: So what does this actually mean for data centers?
10:45: It means they can save huge amounts of energy.
10:48: They give an example of saving 6 megawatts in one data center, and that's enough to power 10 Rubin Ultra racks.
10:55: So that's a lot of energy they can use for something else.
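[Editor's note: that claim checks out against the rack power quoted earlier.]

```python
# Cross-checking the savings claim: 6 MW saved vs 600 kW per rack.
saved_watts = 6e6       # 6 megawatts saved in one data center
rack_watts  = 600e3     # 600 kW per Rubin Ultra rack, quoted earlier

print(f"{saved_watts / rack_watts:.0f} Rubin Ultra racks' worth of power")   # 10
```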
10:57: OK, that makes sense.
10:59: And it seems like they have a really clear plan for how they're going to keep innovating.
11:03: Yeah, they've got a rhythm going.
11:05: Every year they announce a new architecture.
11:07: Every 2 years, a new product line.
11:09: And every year they aim for big performance increases, what they call X factors up.
11:14: They're also careful about how they take risks.
11:16: They focus on big changes in silicon, networking, or the chassis, but they don't do all of them at once.
11:22: Smart.
11:22: And the sources also talk about how these changes are impacting enterprise computing and even robotics.
11:28: Right.
11:28: So they're completely redesigning the whole computing stack for enterprise AI and machine learning: new processors, new operating systems, new applications, everything.
11:36: And they've announced a new line of DGX computers, the DGX Spark and the DGX Station.
11:43: They're super powerful with lots of CPU cores, high bandwidth memory, and regular expansion slots, and they're aimed at data scientists and researchers.
11:52: Companies like HP, Dell, Lenovo, and ASUS will be selling them.
11:56: So AI is pushing innovation everywhere, from the biggest data centers to individual workstations.
12:02: And then there's robotics, which they think could become the biggest industry in the world.
12:05: Yeah, they see huge potential there, but there are challenges.
12:09: The data problem, meaning there's not enough real world data to train robots, the model architecture, and the sheer computing power needed for robotic intelligence.
12:18: To solve the data problem, they're expanding their Omniverse platform, which they call the operating system for physical AIs, and they're adding something called Cosmos, which is a model that can create realistic virtual environments based on Omniverse.
12:32: So they can basically create unlimited training data for robots.
12:37: That's a clever way to get around the lack of data.
12:39: But what about rewards?
12:41: How do you train robots with reinforcement learning if you can't give them clear rewards?
12:45: That's a really good question.
12:47: See, with large language models, you can define clear rewards for the AI, but with robots, it's trickier.
12:54: So they're using the laws of physics as rewards.
12:56: Like if a robot picks something up without dropping it, that's a reward based on physics.
13:01: And to do that, they need a super powerful physics engine that can simulate all the complexities of the real world.
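[Editor's note: a minimal sketch of what a physics-grounded reward could look like for a pick-and-place task. The state fields, thresholds, and shaping term are all hypothetical; in a real pipeline they would be read out of the physics simulator. This illustrates the idea, not NVIDIA's actual method.]

```python
# Minimal sketch: reward derived from simulated physical outcomes
# rather than human labels. All fields and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class SimState:
    gripper_closed: bool
    object_height_m: float      # height of the object above the table
    object_dropped: bool        # did contact with the gripper break?

def physics_reward(state: SimState, target_height_m: float = 0.3) -> float:
    """Reward grounded in what physically happened in the simulator."""
    if state.object_dropped:
        return -1.0                          # physics says: you dropped it
    if state.gripper_closed and state.object_height_m >= target_height_m:
        return 1.0                           # held and lifted to target
    return 0.1 * state.object_height_m       # shaped reward for progress

print(physics_reward(SimState(True, 0.35, False)))   # 1.0
print(physics_reward(SimState(True, 0.10, True)))    # -1.0
```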
13:06: It sounds like they're building this whole ecosystem to support intelligent systems, from giant data centers to tiny robots.
13:12: Exactly.
13:13: And to sum it all up, their Blackwell architecture is in full production now and demand is through the roof.
13:20: Their Blackwell NVLink 72 system with Dynamo is 40 times more powerful than their Hopper system.
13:26: AI inference is becoming a really important workload, and they have a very clear roadmap for the future of AI infrastructure covering cloud, enterprise, and robotics.
13:36: Wow, that was a lot to take in.
13:37: It's amazing how fast things are changing, from Hopper to Blackwell to Rubin, and all these crazy innovations in interconnect technology like disaggregated NVLink and silicon photonics.
13:48: Yeah, and what's really impressive is how they're thinking about everything, not just the processing power, but also the connections between everything and making sure it's all energy efficient.
13:55: It really makes you think about the possibilities.
13:57: What does all this computing power mean for AI research, for new applications, for the future of robotics?
14:03: It's pretty exciting.
14:04: Absolutely.
14:04: And it brings up this big question, which I think is worth thinking about.
14:08: Given that energy is such a precious resource.
14:11: How will all these advancements in energy efficiency and new interconnect technologies change the way we think about sustainable computing, and how will they affect the future of AI?
14:22: It's this fascinating balance between power and sustainability that we need to keep in mind as things keep evolving.
14:29: It's definitely something to ponder.
14:30: Thanks for joining us for this deep dive.
14:32: We'll be back soon with more explorations into the cutting edge of tech.
14:35: Until then, keep those minds curious, and one more thing before we go.
14:40: This episode of the Deep Dive is brought to you by Stonefly, the pioneer of SAN technology.
14:45: Stonefly has been a leading solutions company for over 2 decades with a deep understanding of enterprise storage and data management.
14:52: Whether you're looking to simplify your IT infrastructure, enhance data security, or optimize performance, Stonefly has the expertise and solutions to meet your needs.
15:01: Visit ISCSI.com or Stonefly.com to learn more about how Stonefly can help you navigate the complex world of data storage and management.
15:09: That's Stonefly for all your storage needs.
15:12: See you next time.
