Summary

The text is a detailed presentation by Jensen Huang, CEO of NVIDIA, discussing the advancements in artificial intelligence (AI) and the company's new technologies and products. Key points include the introduction of tokens as fundamental building blo...

Transcript

Speaker28:17 - 28:40

This is how intelligence is made. A new kind of factory. Generator of tokens. The building blocks of AI. Tokens have opened a new frontier. The first step into an extraordinary world. Where endless possibilities are born.

Speaker28:45 - 29:15

Tokens transform images into scientific data, charting alien atmospheres and guiding the explorers of tomorrow. They turn raw data into foresight. So next time, we'll be ready. Tokens decode the laws of physics to get us there faster

Speaker29:21 - 29:43

And take us further. Tokens see disease before it takes hold. They help us unravel the language of life. And learn what makes us tick.

Speaker29:49 - 30:17

Tokens connect the dots, so we can protect our most noble creatures. They turn potential into plenty, and help us harvest our bounty. Tokens don't just teach robots how to move, but to bring joy.

Speaker30:22 - 30:49

To lend us a hand and put life within reach. Together, we take the next great leap to bravely go

Speaker30:52 - 31:09

Where no one has gone before. And here is where it all begins.

Speaker31:24 - 31:53

Welcome to the stage, NVIDIA founder and CEO, Jensen Huang. Welcome to GTC! What an amazing year. We wanted to do this at NVIDIA.

Speaker31:54 - 32:16

So through the magic of artificial intelligence, we're going to bring you to NVIDIA's headquarters. I think I'm bringing you to NVIDIA's headquarters. What do you think? This is...

Speaker32:16 - 32:42

This is where we work. This is where we work. What an amazing year it was, and we have a lot of incredible things to talk about. And I just want you to know that I'm up here without a net. There are no scripts, there's no teleprompter, and I've got a lot of things to cover, so let's get started. First of all, I want to thank all the sponsors, all the amazing people who are a part of this conference. Just about every single industry is represented.

Speaker32:43 - 33:10

Healthcare is here. Transportation. Retail. Gosh, the computer industry. Everybody in the computer industry is here. And so it's really, really terrific to see all of you, and thank you for sponsoring it. GTC started with GeForce. It all started with GeForce. And today, I have here a GeForce 5090.

Speaker33:11 - 33:38

And 5090, unbelievably, 25 years later, 25 years after we started working on GeForce, GeForce is sold out all over the world. This is the 5090, the Blackwell generation, and comparing it to the 4090, look how it's 30% smaller in volume. It's 30% better at dissipating energy and

Speaker33:38 - 34:08

Incredible performance. Hard to even compare, and the reason for that is because of artificial intelligence. GeForce brought CUDA to the world. CUDA enabled AI, and AI has now come back to revolutionize computer graphics. What you're looking at is real-time computer graphics, 100% path traced. For every pixel that's rendered, artificial intelligence predicts the other 15.

Speaker34:09 - 34:39

Think about this for a second. For every pixel that we mathematically rendered, artificial intelligence inferred the other 15. And it has to do so with so much precision that the image looks right and it's temporally accurate, meaning that from frame to frame to frame, going forward or backwards because it's computer graphics, it has to stay temporally stable. Incredible. Artificial intelligence has made extraordinary progress.
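As a rough sense of scale, here is a sketch of that 1-in-16 ratio on a 4K frame (the frame size is assumed for illustration, not taken from the talk):

```python
# Ratio described in the talk: for each pixel rendered by path tracing,
# the model infers 15 more, i.e. 1 in 16 output pixels is computed directly.
width, height = 3840, 2160           # 4K frame (assumed for illustration)
total_pixels = width * height        # 8,294,400 pixels per frame
rendered = total_pixels // 16        # mathematically rendered pixels
inferred = total_pixels - rendered   # pixels predicted by the AI
```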

Speaker34:39 - 35:02

It has only been 10 years. Now, we've been talking about AI for a little longer than that. But AI really came into the world's consciousness about a decade ago. It started with perception AI, computer vision, speech recognition, then generative AI. The last five years, we've largely focused on generative AI.

Speaker35:02 - 35:17

Teaching an AI how to translate from one modality to another, text to image, image to text, text to video, amino acids to proteins, properties to chemicals,

Speaker35:17 - 35:46

All kinds of different ways that we can use AI to generate content. Generative AI fundamentally changed how computing is done. From a retrieval computing model, we now have a generative computing model. Whereas almost everything that we did in the past was about creating content in advance, storing multiple versions of it, and fetching whatever version we think is appropriate at the moment of use.

Speaker35:47 - 36:13

Now, AI understands the context, understands what we're asking, understands the meaning of our request, and generates what it knows. If it needs, it'll retrieve information, augments its understanding, and generate answers for us. Rather than retrieving data, it now generates answers. Fundamentally changed how computing is done. Every single layer of computing has been transformed.

Speaker36:14 - 36:43

The last several years, the last couple, two, three years, major breakthrough happened. Fundamental advance in artificial intelligence. We call it agentic AI. Agentic AI basically means that you have an AI that has agency. It can perceive and understand the context of the circumstance. It can reason, very importantly, it can reason about how to answer or how to solve a problem.

Speaker36:44 - 37:09

And it can plan an action. It can plan and take action. It can use tools because it now understands multimodality information. It can go to a website and look at the format of the website, words and videos, maybe even play a video. Learns from what it learns from that website, understands it, and come back and use that information, use that newfound knowledge to do its job.

Speaker37:10 - 37:39

Agentic AI. At the foundation of agentic AI, of course, something that's very new, reasoning. And then, of course, the next wave is already happening. We're going to talk a lot about that today. Robotics, which has been enabled by physical AI. AI that understands the physical world. It understands things like friction and inertia, cause and effect, object permanence. When something goes out of sight, it doesn't mean it has disappeared from this universe.

Speaker37:39 - 38:05

It's still there, just not seeable. And so that ability to understand the physical world, the three-dimensional world, is what's going to enable a new era of AI we call physical AI, and it's going to enable robotics. Each one of these phases, each one of these waves, opens up new market opportunities for all of us. It brings more

Speaker38:05 - 38:23

And new partners to GTC. As a result, GTC is now jam-packed. The only way to hold more people at GTC is we're going to have to grow San Jose. And we're working on it. We've got a lot of land to work with. We've got to grow San Jose.

Speaker38:30 - 38:56

As I'm standing here, I wish all of you could see what I see. We're in the middle of a stadium. Last year was the first year back that we did this live. It was like a rock concert. GTC was described as the Woodstock of AI, and this year it's described as the Super Bowl of AI.

Speaker39:02 - 39:30

The only difference is everybody wins at this Super Bowl. Everybody's a winner. And so every single year, more people come because AI is able to solve more interesting problems for more industries and more companies. And this year, we're going to talk a lot about agentic AI and physical AI. At its core, what enables each wave and each phase of AI, three fundamental questions.

Speaker39:31 - 39:57

The first is how do you solve the data problem? And the reason why that's important is because AI is a data-driven computer science approach. It needs data to learn from. It needs digital experience to learn from, to learn knowledge and to gain digital experience. How do you solve the data problem? The second is how do you solve the training problem?

Speaker39:57 - 40:26

Without human in the loop. The reason why human in the loop is fundamentally challenging is because we only have so much time and we would like an AI to be able to learn at super human rates, at super real-time rates. And to be able to learn at a scale that no humans can keep up with. And so the second question is, how do you train the model? And the third is, how do you scale?

Speaker40:27 - 40:55

How do you find an algorithm whereby the more resource you provide, whatever the resource is, the smarter the AI becomes? The scaling law. Well, this last year, this is where almost the entire world got it wrong. The computation requirement, the scaling law of AI,

Speaker40:55 - 41:24

is more resilient and, in fact, hyper-accelerated. The amount of computation we need at this point as a result of agentic AI, as a result of reasoning, is easily a hundred times more than we thought we needed this time last year. And let's reason about why that's true. The first part is let's just go from what the AI can do. Let me work backwards.

Speaker41:25 - 41:50

Agentic AI, as I mentioned at this foundation, is reasoning. We now have AIs that can reason, which is fundamentally about breaking a problem down step by step. Maybe it approaches a problem in a few different ways and selects the best answer. Maybe it solves the same problem in a variety of ways.

Speaker41:51 - 42:16

And sure, it has the same answer, consistency checking. Or maybe, after it's done deriving the answer, it plugs it back into the equation, maybe a quadratic equation, to confirm that, in fact, that's the right answer. Instead of just one shot blurbing it out. Remember, two years ago, when we started working with ChatGPT, a miracle as it was, many

Speaker42:16 - 42:36

Complicated questions and many simple questions, it simply couldn't get right, and understandably so. It took a one-shot: whatever it learned by studying pre-trained data, whatever it saw from other experiences, it does a one-shot and blurbs it out, like a savant.

Speaker42:36 - 42:57

Now we have AIs that can reason step by step by step using a technology called chain of thought, best of end, consistency checking, a variety of different path planning, a variety of different techniques. We now have AIs that can reason, break a problem down and reason, step by step by step.
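A minimal sketch of the best-of-n / consistency-checking idea just mentioned: sample several reasoning paths for the same problem and keep the answer most of them agree on. `solve_once` is a hypothetical stand-in invented for illustration; real systems sample a language model.

```python
# Best-of-n with majority-vote consistency checking (illustrative sketch).
from collections import Counter

def solve_once(problem, trial):
    # Stand-in solver: errs on every 5th attempt, otherwise answers 4.
    return 5 if trial % 5 == 4 else 4

def best_of_n(problem, n=15):
    answers = [solve_once(problem, t) for t in range(n)]
    return Counter(answers).most_common(1)[0][0]  # majority answer wins
```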

Speaker42:57 - 43:16

Well, you could imagine, as a result, the number of tokens we generate and the fundamental technology of AI is still the same. Generate the next token, predict the next token. It's just that the next token now makes up step one. Then the next token after that, after it generates step one,

Speaker43:16 - 43:41

That step one has gone into the input of the AI again as it generates step two, and step three, and step four. So instead of just generating one token or one word after the next, it generates a sequence of words that represents a step of reasoning. The amount of tokens that's generated as a result is substantially higher, and I'll show you in a second. Easily a hundred times more.
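The feedback loop described here can be sketched as follows; `generate_step` is a hypothetical stand-in for the model generating the tokens of one reasoning step.

```python
# Chain-of-thought generation loop: the model still just predicts the next
# tokens, but each completed step is appended to the input before the next
# step is generated, so token counts grow with every reasoning step.
def generate_step(context):
    # Stand-in for the model: emits the next numbered step.
    n = context.count("step")
    return f"step {n + 1}: ..."

def reason(problem, num_steps=4):
    context = problem
    steps = []
    for _ in range(num_steps):
        step = generate_step(context)    # tokens for one reasoning step
        steps.append(step)
        context = context + " " + step   # the step feeds back into the input
    return steps
```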

Speaker43:42 - 44:09

A hundred times more, what does that mean? Well, it could generate a hundred times more tokens, and you can see that happening, as I explained previously. Or, the model is more complex. It generates ten times more tokens, and in order for us to keep the model responsive, interactive, so that we don't lose our patience waiting for it to think, we now have to compute ten times faster.

Speaker44:10 - 44:39

And so 10 times tokens, 10 times faster, the amount of computation we have to do is 100 times more easily. And so you're going to see this in the rest of the presentation, the amount of computation we have to do for inference is dramatically higher than it used to be. Well, the question then becomes, how do we teach an AI how to do what I just described? How to execute this chain of thought? Well, one method is you have to teach the AI how to reason.
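The arithmetic behind that claim, spelled out:

```python
# A reasoning model emits ~10x the tokens per answer, and to stay
# interactive it must generate them ~10x faster, so the compute
# requirement multiplies rather than adds.
token_factor = 10                             # 10x more tokens per answer
speed_factor = 10                             # 10x faster to stay responsive
compute_factor = token_factor * speed_factor  # 100x more computation
```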

Speaker44:39 - 45:09

And as I mentioned earlier, in training, there are two fundamental problems we have to solve. Where does the data come from? Where does the data come from? And how do we not have it be limited by human in the loop? There's only so much data and so much human demonstration we can perform. And so this is the big breakthrough in the last couple of years, reinforcement learning, verifiable results. Basically, reinforcement learning

Speaker45:09 - 45:38

of an AI as it attacks or tries to engage solving a problem step by step. Well, we have many problems that have been solved in the history of humanity where we know the answer. We know the equation of a quadratic equation, how to solve that. We know how to solve a Pythagorean theorem, the rules of a right triangle. We know many, many rules of math and geometry and logic and science.

Speaker45:39 - 45:59

We have puzzle games that we could give it, constraint type of problems like Sudoku. Those kind of problems, on and on and on, we have hundreds of these problem spaces where we can generate millions of different examples

Speaker46:00 - 46:13

And give the AI hundreds of chances to solve it step by step by step as we use reinforcement learning to reward it as it does a better and better job.

Speaker46:14 - 46:34

As a result, you take hundreds of different topics, millions of different examples, hundreds of different tries, each one of the tries generating tens of thousands of tokens. You put that all together, we're talking about trillions and trillions of tokens in order to train that model.
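A back-of-the-envelope version of that multiplication, taking "hundreds" as 100, "millions" as 1,000,000, and so on (illustrative values, not figures from the talk):

```python
# Rough magnitude of the synthetic training-token count described above.
topics = 100                # hundreds of problem spaces
examples = 1_000_000        # millions of examples per topic
tries = 100                 # hundreds of attempts per example
tokens_per_try = 10_000     # tens of thousands of tokens per attempt
total_tokens = topics * examples * tries * tokens_per_try
# 10**14 tokens, i.e. on the order of a hundred trillion
```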

Speaker46:34 - 47:02

And now with reinforcement learning, we have the ability to generate an enormous amount of tokens. Synthetic data generation, basically using a robotic approach to teach an AI. The combination of these two things has put an enormous challenge of computing in front of the industry. And you can see that the industry is responding. What I'm about to show you is Hopper shipments

Speaker47:04 - 47:33

Of the top four CSPs, the top four CSPs, they're the ones with the public clouds, Amazon, Azure, GCP, and OCI. The top four CSPs, not the AI companies, that's not included, not all the startups, not included, not enterprise, not included, a whole bunch of things not included, just those four. Just to give you a sense of comparing the peak year of Hopper and the first year

Speaker47:34 - 48:02

The peak year of Hopper and the first year of Blackwell. So you can kind of see that, in fact, AI is going through an inflection point. It has become more useful because it's smarter, it can reason. It is more used, you can tell it's more used because whenever you go to chat GPT these days, it seems like you have to wait longer and longer and longer, which is a good thing. It says a lot of people are using it with great effect.

Speaker48:03 - 48:25

And the amount of computation necessary to train those models and to inference those models has grown tremendously. So in just one year, and Blackwell has just started shipping, in just one year you could see the incredible growth in AI infrastructure. Well, that's been reflected in computing across the board.

Speaker48:26 - 48:47

We're now seeing, and the purple is the forecast of analysts about the increase of capital expense of the world's data centers, including CSPs and enterprise and so on, the world's data centers through the end of the decade, so 2030.

Speaker48:48 - 49:16

I've said before that I expect data center build-out to reach a trillion dollars, and I am fairly certain we're going to reach that very soon. Two dynamics are happening at the same time. The first dynamic is that the vast majority of that growth is likely to be accelerated, meaning we've known for some time that general purpose computing has run its course, and that we need a new computing approach.

Speaker49:16 - 49:36

And the world is going through a platform shift from hand-coded software running on general purpose computers to machine learning software running on accelerators and GPUs. This way of doing computation is, at this point,

Speaker49:37 - 50:06

And we are now seeing the inflection point happening, the inflection happening in the world's data center build-outs. So the first thing is a transition in the way we do computing. Second is an increase in recognition that the future of software requires capital investment. Now this is a very big idea. Whereas in the past, we wrote the software and we ran it on computers,

Speaker50:06 - 50:18

In the future, the computer is going to generate the tokens for the software. And so the computer has become a generator of tokens, not a retrieval of files.

Speaker50:19 - 50:44

From retrieval-based computing to generative-based computing, from the old way of doing data centers to a new way of building these infrastructure, and I call them AI factories. They're AI factories because it has one job and one job only, generating these incredible tokens that we then reconstitute into music, into words, into videos, into

Speaker50:45 - 51:11

Research into chemicals or proteins. We reconstitute it into all kinds of information of different types. So the world is going through a transition in not just the amount of data centers that will be built, but also how they're built. Well, everything in the data center will be accelerated; not all of it is AI. And I want to say a few words about this. You know, this slide, this slide,

Speaker51:12 - 51:33

This slide is genuinely my favorite. And the reason for that is because for all of you coming to GTC all of these years, you've been listening to me talk about these libraries this whole time. This is in fact what GTC is all about, this one slide. And in fact, a long time ago, 20 years ago, this is the only slide we had.

Speaker51:33 - 51:59

One library after another library after another library. You can't just accelerate software. Just as we needed an AI framework in order to create AIs, and we accelerated the AI frameworks, you need frameworks for physics and biology and multi-physics and all kinds of different quantum physics. You need all kinds of libraries and frameworks.

Speaker52:00 - 52:19

We call them CUDA-X libraries, acceleration frameworks for each one of these fields of science. And so this first one is incredible. This is cuPyNumeric. NumPy is the number one most downloaded Python library, the most used Python library in the world. Downloaded 400 million times this last year.
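As a sketch of what the drop-in acceleration means in practice: cuPyNumeric mirrors the NumPy API, so the only edit is the import line. This example runs on stock NumPy; the commented line shows the documented swap.

```python
# Zero-change acceleration: swap the import, keep the code.
import numpy as np               # CPU baseline
# import cupynumeric as np       # same code below, GPU-accelerated

a = np.arange(6).reshape(2, 3)   # values 0..5 in a 2x3 array
col_sums = a.sum(axis=0)         # column sums: [3, 5, 7]
```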

Speaker52:19 - 52:37

cuPyNumeric is a zero-change, drop-in acceleration for NumPy. So if any of you are using NumPy out there, give cuPyNumeric a try. You're going to love it. cuLitho, a computational lithography library.

Speaker52:37 - 52:56

Over the course of four years, we've now taken the entire process of processing lithography, computational lithography, which is the second factory in a fab. There's the factory that manufactures the wafers, and then there's the factory that manufactures the information to manufacture the wafers.

Speaker52:56 - 53:14

Every industry, every company that has factories will have two factories in the future. The factory for what they build, and the factory for the mathematics. The factory for the AI. Factory for cars, factory for AIs for the cars. Factory for...

Speaker53:15 - 53:43

Smart speakers and factories for AI for the smart speakers. And so, cuLitho is our computational lithography. TSMC, Samsung, ASML, our partners Synopsys, Mentor, incredible support all over. I think that this is now at its tipping point. In another five years' time, every mask, every single lithography will be processed on NVIDIA CUDA. Aerial is our library for 5G, turning a GPU into a 5G radio.

Speaker53:44 - 54:14

Why not? Signal processing is something we do incredibly well. Once we do that, we can layer on top of it AI, AI for RAN, or what we call AI RAN. The next generation of radio networks will have AI deeply inserted into it. Why is it that we're limited by the limits of information theory? Because there's only so much information spectrum we can get, not if we add AI to it.

Speaker54:14 - 54:35

cuOpt, numerical or mathematical optimization. Almost every single industry uses this when you plan seats and flights, inventory and customers, workers and plants, drivers and riders.

Speaker54:35 - 55:00

So on and so forth, where we have multiple constraints, a whole bunch of variables, and you're optimizing for time, profit, quality of service, usage of resource, whatever it happens to be. NVIDIA uses it for our supply chain management. cuOpt is an incredible library.
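The kind of constrained assignment problem described above, shown as a tiny brute-force sketch in plain Python. cuOpt itself handles problems far too large to enumerate; the costs here are made-up illustrative values.

```python
# Minimal driver-rider assignment: pick the pairing with the lowest total
# cost, subject to the constraint that each driver gets exactly one rider.
from itertools import permutations

# cost[d][r]: illustrative cost of pairing driver d with rider r
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

def best_assignment(cost):
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):          # rider chosen per driver
        c = sum(cost[d][perm[d]] for d in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return best, best_cost
```

Real solvers replace this exhaustive search with algorithms that scale to millions of variables, which is where the hours-to-seconds speedups come from.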

Speaker55:00 - 55:23

It takes what would take hours and hours and turns it into seconds. The reason why that's a big deal is so that we can now explore a much larger space. We announced that we are going to open source cuOpt. Almost everybody is using either Gurobi or IBM CPLEX or FICO.

Speaker55:23 - 55:48

We're working with all three of them. The industry is so excited. We're about to accelerate the living daylights out of the industry. Parabricks for gene sequencing and gene analysis. MONAI is the world's leading medical imaging library. Earth-2, multi-physics for predicting, in very high resolution, local weather. cuQuantum and CUDA-Q. We're going to have our first...

Speaker55:49 - 56:04

We're working with just about everybody in the ecosystem, either helping them research on quantum architectures, quantum algorithms, or in building a

Speaker56:04 - 56:23

Classical, accelerated, quantum, heterogeneous architecture. And so, really exciting work there. cuEquivariance and cuTensor for tensor contraction, quantum chemistry. Of course, this stack is world famous. People think that there's one piece of software called CUDA, but in fact,

Speaker56:23 - 56:51

On top of CUDA is a whole bunch of libraries that are integrated into all different parts of the ecosystem and software and infrastructure in order to make AI possible. I've got a new one here to announce today. cuDSS, our sparse solvers, really important for CAE. This is one of the biggest things that has happened in the last year. Working with Cadence and Synopsys and Ansys and Dassault

Speaker56:52 - 57:19

All of the systems companies, we've now made possible just about every important EDA and CAE library to be accelerated. What's amazing is until recently, NVIDIA has been using general purpose computers, running software super slowly, to design accelerated computers for everybody else.

Speaker57:19 - 57:45

And the reason for that is because we never had that software, that body of software, optimized for CUDA until recently. And so now our entire industry is going to get supercharged as we move to accelerated computing. cuDF, a dataframe library for structured data; we now have a drop-in acceleration for Spark and a drop-in acceleration for pandas. Incredible. And then we have Warp,

Speaker57:45 - 58:06

A library for physics that runs in a Python library for physics for CUDA. We have a big announcement there, I will save it in just a second. This is just a sampling of the libraries that make possible accelerated computing.
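The pandas drop-in mentioned a moment ago works the same way as the NumPy one: with cuDF's accelerator loaded, unmodified pandas code runs on the GPU. Here is a stock-pandas sketch of the kind of code that benefits.

```python
# Ordinary pandas code; with cuDF's pandas accelerator loaded, the same
# lines execute on the GPU with no source changes.
import pandas as pd

df = pd.DataFrame({"store": ["a", "a", "b"], "sales": [10, 20, 5]})
totals = df.groupby("store")["sales"].sum()  # per-store sales totals
```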

Speaker58:07 - 58:33

It's not just CUDA. We're so proud of CUDA. But if not for CUDA and the fact that we have such a large install base, none of these libraries would be useful for any of the developers who use them. For all the developers that use them, you use it because one, it's going to give you incredible speed up. It's going to give you incredible scale up. And two, because the install base of CUDA is now everywhere.

Speaker58:34 - 58:48

It's in every cloud, it's in every data center, it's available from every computer company in the world, it's literally everywhere. And therefore, by using one of these libraries, your software, your amazing software, can reach everyone.

Speaker58:48 - 59:17

And so we've now reached the tipping point of accelerated computing. CUDA has made it possible. And all of you, this is what GTC is about, the ecosystem, all of you made this possible. And so we made a little short video for you. Thank you. To the creators, the pioneers, the builders of the future, CUDA was made for you.

Speaker59:18 - 59:45

Since 2006, 6 million developers in over 200 countries have used CUDA and transformed computing. With over 900 CUDA-X libraries and AI models, you're accelerating science, reshaping industries, and giving machines the power to see, learn, and reason.

Speaker59:46 - 01:00:04

Now, NVIDIA Blackwell is 50,000 times faster than the first CUDA GPU. These orders of magnitude gains in speed and scale are closing the gap between simulation and real-time digital twins.

Speaker01:00:25 - 01:00:55

And for you, this is still just the beginning. We can't wait to see what you do next. I love what we do. I love even more what you do with it. And one of the things that most touched me, in my 33 years

Speaker01:00:56 - 01:01:24

Doing this, one scientist said to me, Jensen, because of your work, I can do my life's work in my lifetime. And boy, if that doesn't touch you, well, you gotta be a corpse. So this is all about you guys, thank you. All right, so we're gonna talk about AI.

Speaker01:01:25 - 01:01:46

But you know, AI started in the cloud. It started in the cloud for a good reason, because it turns out that AI needs infrastructure. It's machine learning. If the science says machine learning, then you need a machine to do the science. And so machine learning requires infrastructure, and the cloud data centers had infrastructure.

Speaker01:01:47 - 01:02:12

They also have extraordinary computer science, extraordinary research, the perfect circumstance for AI to take off in the cloud and the CSPs. But that's not where AI is limited to. AI will go everywhere. And we're going to talk about AI in a lot of different ways. And the cloud service providers, of course, they like our leading-edge technology. They like the fact that we have full stack.

Speaker01:02:12 - 01:02:42

Because accelerated computing, as you know, as I was explaining earlier, is not about the chip. It's not even just the chip in the library, the programming model. It's the chip, the programming model, and a whole bunch of software that goes on top of it. That entire stack is incredibly complex. Each one of those layers, each one of those libraries, is essentially like SQL. SQL, as you know, is called in-storage computing. It was the big revolution of computation by IBM.

Speaker01:02:43 - 01:03:10

It's one library, just imagine. I just showed you a whole bunch of them. And in the case of AI, there's a whole bunch more. So the stack is complicated. They also love the fact that NVIDIA CUDA developers are CSP customers. Because in the final analysis, they're building infrastructure for the world to use. And so the rich developer ecosystem is really valued and really deeply appreciated.

Speaker01:03:11 - 01:03:36

Well, now that we're going to take AI out to the rest of the world, the rest of the world has different system configurations, operating environment differences, domain-specific library differences, usage differences, and so AI as it translates to enterprise IT.

Speaker01:03:37 - 01:03:53

As it translates to manufacturing, as it translates to robotics or self-driving cars, or even companies that are starting GPU clouds. There's a whole bunch of companies, maybe 20 of them, who started during the NVIDIA time.

Speaker01:03:53 - 01:04:16

And what they do is just one thing. They host GPUs. They call themselves GPU Clouds. And one of our great partners, CoreWeave, is in the process of going public, and we're super proud of them. And so, GPU Clouds, they have their own requirements. But one of the areas that I'm super excited about is Edge. And today, we announced

Speaker01:04:16 - 01:04:43

We announced today that Cisco, NVIDIA, T-Mobile, the largest telecommunications company in the world, Cerberus ODC, are going to build a full stack for radio networks here in the United States. And that's going to be the second stack. So this current stack we're announcing today will put AI into the edge.

Speaker01:04:43 - 01:04:56

Remember, $100 billion of the world's capital investments each year is in the radio networks and all of the data centers provisioning for communications.

Speaker01:04:56 - 01:05:23

In the future, there is no question in my mind that's going to be accelerated computing infused with AI. AI will do a far, far better job adapting the radio signals, the massive MIMOs, to the changing environments and the traffic conditions. Of course it would. Of course we would use reinforcement learning to do that. Of course MIMO is essentially one giant radio robot.

Speaker01:05:23 - 01:05:51

Of course it is. And so we will, of course, provide for those capabilities. Of course, AI could revolutionize communications. You know, when I call home, I don't have to say but a few words, because my wife knows where I work and what the conditions are like. The conversation carries on from yesterday. She kind of remembers what I like and don't like. And oftentimes, with just a few words, you've communicated a whole bunch.

Speaker01:05:51 - 01:06:19

The reason for that is because of context and human priors, prior knowledge. Well, combining those capabilities could revolutionize communications. Look what it's doing for video processing. Look what I just described earlier in 3D graphics. And so, of course, we're going to do the same for Edge. So I'm super excited about the announcement that we made today. T-Mobile, Cisco, NVIDIA, Cerberus, ODC are going to build a full stack.

Speaker01:06:28 - 01:06:56

Well, AI is going to go into every industry. That's just one. One of the earliest industries that AI went into was autonomous vehicles. The moment I saw AlexNet, and we've been working on computer vision for a long time, the moment I saw AlexNet was such an inspiring moment, such an exciting moment, it caused us to decide to go all in on building self-driving cars.

Speaker01:06:56 - 01:06:59

So we've been working on self-driving cars now for over a decade.

Speaker01:07:00 - 01:07:27

We build technology that almost every single self-driving car company uses. It could be either in the data center. For example, Tesla uses lots of NVIDIA GPUs in the data center. It could be in the data center or the car. Waymo and Wayve use NVIDIA computers in data centers as well as the car. It could be just in the car. It's very rare, but sometimes it's just in the car. Or they use all of our software in addition.

Speaker01:07:27 - 01:07:54

We work with the car industry however the car industry would like us to work with them. We built all three computers, the training computer, the simulation computer, and the robotics computer, the self-driving car computer, all the software stack that sits on top of it, models and algorithms, just as we do with all of the other industries that I've demonstrated. And so today, I'm super excited to announce

Speaker01:07:55 - 01:08:18

GM has selected NVIDIA to partner with them to build their future self-driving car fleet. The time for autonomous vehicles has arrived, and we're looking forward to building with GM AI in all three areas.

Speaker01:08:19 - 01:08:48

AI for manufacturing, so they can revolutionize the way they manufacture. AI for enterprise, so they can revolutionize the way they work, design cars and simulate cars. And then also AI in the car. So AI infrastructure for GM, partnering with GM and building with GM their AI. So I'm super excited about that. One of the areas that I'm deeply proud of, and it rarely gets any attention, is safety. Automotive safety.

Speaker01:08:49 - 01:09:17

It's called Halos. In our company it's called Halos. Safety requires technology from silicon to systems: the system software, the algorithms, the methodologies. Everything from ensuring diversity to monitoring and transparency,

Speaker01:09:18 - 01:09:42

Explainability. All of these different philosophies have to be deeply ingrained into every single part of how you develop the system and the software. We're the first company in the world, I believe, to have every line of code safety assessed. Seven million lines of code safety assessed. Our chip, our system, our system software, and our algorithms are safety

Speaker01:09:43 - 01:09:58

Assessed by third parties that crawl through every line of code to ensure that it is designed to ensure diversity, transparency and explainability. We also

Speaker01:09:58 - 01:10:28

We have filed over a thousand patents, and during this GTC, I really encourage you to go spend time in the Halos workshop so that you can see all of the different things that come together to ensure that the cars of the future are going to be safe as well as autonomous. And so this is something I'm very proud of. It rarely gets any attention, and so I thought I would spend the extra time this time to talk about that. Okay, NVIDIA Halos.

Speaker01:10:34 - 01:10:58

All of you have seen cars drive by themselves. The Waymo robotaxis are incredible. But we made a video to share with you some of the technology we use to solve the problems of data and training and diversity so that we could use the magic of AI to go create AI. Let's take a look.

Speaker01:11:03 - 01:11:29

NVIDIA is accelerating AI development for AVs with Omniverse and Cosmos. Cosmos prediction and reasoning capabilities support AI-first AV systems that are end-to-end trainable with new methods of development. Model distillation, closed-loop training, and synthetic data generation. First, model distillation.

Speaker01:11:30 - 01:11:53

Adapted as a policy model, Cosmos' driving knowledge transfers from a slower, intelligent teacher to a smaller, faster student, inferenced in the car. The teacher's policy model demonstrates the optimal trajectory, followed by the student model learning through iterations, until it performs at nearly the same level as the teacher.
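
The teacher-to-student pattern narrated here can be illustrated with a deliberately tiny sketch. Everything below is invented for illustration (a fixed nonlinear "teacher", a linear "student" fit by gradient descent); it is not Cosmos or NVIDIA's actual distillation pipeline, only the general imitation loop: the teacher demonstrates outputs, and the student iterates until it performs at nearly the same level.

```python
import numpy as np

# Toy teacher->student distillation sketch (NOT NVIDIA's actual pipeline):
# a high-capacity "teacher" maps observations to trajectories; a small
# "student" is fit to imitate the teacher's outputs on sampled inputs.

rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(8, 2))  # stand-in for a large learned model

def teacher_policy(obs):
    # Slow, intelligent teacher: a fixed nonlinear map.
    return np.tanh(obs @ W_teacher)

# Sample observations and query the teacher for demonstrated trajectories.
obs = rng.normal(size=(512, 8))
targets = teacher_policy(obs)

# Student: a small linear model, trained by gradient descent on the MSE
# between its outputs and the teacher's demonstrations.
W_student = np.zeros((8, 2))
lr = 0.05
for _ in range(500):
    pred = obs @ W_student
    grad = obs.T @ (pred - targets) / len(obs)
    W_student -= lr * grad

final_mse = float(np.mean((obs @ W_student - targets) ** 2))
print(f"student imitation MSE: {final_mse:.4f}")
```

The student never sees the teacher's internals, only its demonstrated outputs, which is the essence of the policy-distillation step described in the narration.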

Speaker01:11:55 - 01:12:23

The distillation process bootstraps a policy model, but complex scenarios require further tuning. Closed-loop training enables fine-tuning of policy models. Log data is turned into 3D scenes for driving closed-loop in physics-based simulation using omniverse neural reconstruction. Variations of these scenes are created to test the model's trajectory generation capabilities.

Speaker01:12:25 - 01:12:53

Cosmos Behavior Evaluator can then score the generated driving behavior to measure model performance. Newly generated scenarios and their evaluations create a large dataset for closed-loop training, helping AVs navigate complex scenarios more robustly. Last, 3D synthetic data generation enhances AVs' adaptability to diverse environments.

Speaker01:12:54 - 01:13:16

From log data, Omniverse builds detailed 4D driving environments by fusing maps and images, and generates a digital twin of the real world, including segmentation, to guide Cosmos by classifying each pixel. Cosmos then scales the training data by generating accurate and diverse scenarios, closing the sim-to-real gap.

Speaker01:13:18 - 01:13:45

Omniverse and Cosmos enable AVs to learn, adapt, and drive intelligently, advancing safer mobility. NVIDIA is the perfect company to do that. Gosh, that's our destiny.

Speaker01:13:46 - 01:14:15

Use AI to recreate AI. The technology that we showed you there is very similar to the technology that you're enjoying to take you to a digital twin we call NVIDIA. All right, let's talk about data centers. That's not bad, huh?

Speaker01:14:22 - 01:14:49

Gaussian Splats, just in case. Well, let's talk about data centers. Blackwell is in full production, and this is what it looks like. It's an incredible, incredible... You know, for people, for us, this is a sight of beauty. Would you agree? This is... How is this not beautiful? How is this not beautiful?

Speaker01:14:49 - 01:15:12

Well, this is a big deal because we made a fundamental transition in computer architecture. I just want you to know that, in fact, I showed you a version of this about three years ago. It was called Grace Hopper, and the system was called Ranger.

Speaker01:15:13 - 01:15:40

The Ranger system is maybe about half of the width of the screen, and it was the world's first NVLink32. Three years ago, we showed Ranger working, and it was way too large, but it was exactly the right idea. We were trying to solve scale-up,

Speaker01:15:40 - 01:16:07

Distributed computing is about using a whole lot of different computers working together to solve a very large problem. But there's no replacement for scaling up before you scale out. Both are important, but you want to scale up first before you scale out. Scaling up is incredibly hard, and there is no simple answer for it. You're not going to scale up the way you scale out with Hadoop.

Speaker01:16:08 - 01:16:38

Take a whole bunch of commodity computers, hook it up into a large network and do in-storage computing using Hadoop. Hadoop was a revolutionary idea as we know. It enabled hyperscale data centers to solve problems of gigantic sizes using off-the-shelf computers. However, the problem we're trying to solve is so complex that scaling in that way would have simply

Speaker01:16:38 - 01:16:56

Cost way too much power, way too much energy. Deep learning would have never happened. And so the thing that we had to do was scale up first. Well, this is the way we scaled up. I'm not going to lift this. This is 70 pounds. This is the last generation system architecture. It's called HGX.

Speaker01:16:56 - 01:17:22

This revolutionized computing as we know it. This revolutionized artificial intelligence. This is eight GPUs. Each one of them is kind of like this. This is two GPUs, two Blackwell GPUs, in one Blackwell package. Two Blackwell GPUs in one Blackwell package. There are eight of these underneath this.

Speaker01:17:23 - 01:17:52

And this connects into what we call NVLink 8. This then connects to a CPU shelf, so there are dual CPUs, and that sits on top, and we connect it over PCI Express. And then many of these get connected with InfiniBand, which turns into an AI supercomputer. This is the way it was in the past. This is how we started.

Speaker01:17:52 - 01:18:08

Well, this is as far as we scaled up before we scaled out. But we wanted to scale up even further. And I told you that Ranger took this system and scaled it up by another factor of four.

Speaker01:18:09 - 01:18:37

And so we had NVLink 32, but the system was way too large. And so we had to do something quite remarkable: re-engineer how NVLink worked and how scale-up worked. And so the first thing that we did was we said, listen, the NVLink switches in this system are embedded on the motherboard. We need to disaggregate the NVLink system and take it out. So this is the NVLink system. Okay, this is an NVLink switch.

Speaker01:18:39 - 01:19:07

This is the highest performance switch the world's ever made. And this makes it possible for every GPU to talk to every GPU at exactly the same time at full bandwidth. Okay, so this is the NVLink switch. We disaggregated it, we took it out, and we put it in the center of the chassis. So there's all the, there are 18 of these switches in nine different racks, nine different switch racks.

Speaker01:19:08 - 01:19:30

Trays, we call them. And then the switches are disaggregated, the compute is now sitting in here. This is equivalent to these two things in compute. What's amazing is this is completely liquid cooled, and by liquid cooling it, we can compress all of these compute nodes into one rack.

Speaker01:19:30 - 01:19:59

This is the big change of the entire industry. All of you in the audience, I know how many of you are here, I want to thank you for making this fundamental shift from integrated NVLink to disaggregated NVLink, from air-cooled to liquid-cooled, from 60,000 components per computer or so,

Speaker01:19:59 - 01:20:20

To 600,000 components per rack, 120 kilowatts, fully liquid cooled, and as a result, we have a one Exaflops computer in one rack. Isn't it incredible?

Speaker01:20:26 - 01:20:55

So this is the compute node. This is the compute node, okay? And that now fits in one of these. Now we... 3,000 pounds, 5,000 cables, about two miles' worth. Just incredible electronics.

Speaker01:20:56 - 01:21:13

600,000 parts, I think that's like 20 cars. 20 cars worth of parts. And integrates into one supercomputer. Well, our goal is to do this. Our goal is to do scale up. And this is what it now looks like. We essentially wanted to build this chip.

Speaker01:21:13 - 01:21:30

It's just that no reticle limit allows this; no process technology can do this. It's 130 trillion transistors, 20 trillion of which are used for computing. So it's not like you can reasonably build this anytime soon.

Speaker01:21:30 - 01:21:55

And so the way to solve this problem is to disaggregate it, as I've described, into the Grace Blackwell NVLink 72 rack. But as a result, we have done the ultimate scale-up. This is the most extreme scale-up the world has ever done. The amount of computation that's possible here, the memory bandwidth,

Speaker01:21:55 - 01:22:24

576 terabytes per second. Everything in this machine is now in Ts. Everything's a trillion. And you have an exaflops, which is a million trillion floating point operations per second. Well, the reason why we wanted to do this is to solve an extreme problem. And that extreme problem, a lot of people misunderstood

Speaker01:22:25 - 01:22:48

To be easy. And in fact, it is the ultimate extreme computing problem, and it's called inference. And the reason for that is very simple. Inference is token generation by a factory, and a factory is revenue and profit generating, or lack of.

Speaker01:22:50 - 01:23:19

And so this factory has to be built with extreme efficiency, with extreme performance, because everything about this factory directly affects your quality of service, your revenues, and your profitability. Let me show you how to read this chart, because I want to come back to this a few more times. Basically, you have two axes. On the x-axis is the tokens per second,

Speaker01:23:19 - 01:23:49

Whenever you put a prompt into ChatGPT, what comes out is tokens. Those tokens are reformulated into words. It's more than a token per word. And they'll tokenize things like T-H-E could be used for the, it could be used for them, it could be used for theory, it could be used for theatrics, it could be used for all kinds of... And so T-H-E is an example of a token. They reformulate these tokens to turn into words.
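
As an illustration of that reformulation, here is a toy greedy longest-match tokenizer. The vocabulary below is invented purely for the demo; production tokenizers (BPE and its relatives) learn their vocabularies from data, but the point is the same: a fragment like "the" can stand alone as a word or appear inside "theory" or "theatrics".

```python
# Toy greedy longest-match tokenizer over a made-up vocabulary. Single
# characters are included as a fallback so every string can be covered.
VOCAB = {"the", "theory", "them", "at", "rics",
         "a", "t", "h", "e", "r", "i", "c", "s", "m", "o", "y"}

def tokenize(text, vocab=VOCAB):
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

print(tokenize("the"))        # ['the']
print(tokenize("theory"))     # ['theory']
print(tokenize("theatrics"))  # ['the', 'at', 'rics']
```

Decoding is the reverse: the model emits token IDs, and concatenating the corresponding strings reformulates them back into words.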

Speaker01:23:50 - 01:24:17

We've already established that if you want your AI to be smarter, you want to generate a whole bunch of tokens. Those tokens are reasoning tokens, consistency-checking tokens, coming up with a whole bunch of ideas so they can select the best of those ideas, tokens. And so those tokens, it might be second-guessing itself, it might be, is this the best work you could do? And so it talks to itself, just like we talk to ourselves.

Speaker01:24:17 - 01:24:44

The more tokens you generate, the smarter your AI. But if you take too long to answer a question, the customer is not going to come back. This is no different than web search. There is a real limit to how long it can take before it comes back with a smart answer. And so you have these two dimensions that you're fighting against. You're trying to generate a whole bunch of tokens, but you're trying to do it as quickly as possible. Therefore, your token rate matters.
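
This tension between per-user token rate and total output, which the batching discussion that follows makes concrete, can be sketched with an invented cost model: each decode step pays a fixed bandwidth-bound overhead plus a small per-request compute cost, so bigger batches raise factory throughput but slow down every individual user. All numbers here are assumptions for illustration.

```python
# Invented cost model for the latency-vs-throughput trade-off.
FIXED_MS = 20.0    # cost to stream weights in, paid once per step (assumed)
PER_REQ_MS = 0.5   # extra compute per batched request (assumed)

def rates(batch):
    step_s = (FIXED_MS + PER_REQ_MS * batch) / 1000.0
    per_user = 1.0 / step_s   # tokens/s each user sees
    factory = batch / step_s  # tokens/s the whole factory emits
    return per_user, factory

# Pick the batch size maximizing the product of the two axes.
best = max(range(1, 513), key=lambda b: rates(b)[0] * rates(b)[1])
for b in (1, 32, best, 512):
    u, f = rates(b)
    print(f"batch={b:4d}  per-user={u:7.1f} tok/s  factory={f:9.1f} tok/s")
```

Batch 1 gives each user the fastest response but wastes the machine; batch 512 maximizes total output while each user crawls. The product-maximizing point sits in between, which is exactly the curve-bending exercise described later.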

Speaker01:24:45 - 01:25:14

So you want your tokens per second for that one user to be as fast as possible. However, in computer sciences and factories, there's a fundamental tension between latency response time and throughput. And the reason is very simple. If you're in the large high volume business, you batch up, it's called batching, you batch up a lot of customer demand and you manufacture

Speaker01:25:15 - 01:25:42

A certain version of it for everybody to consume later. However, from the moment that they batched up and manufactured whatever they did, to the time that you consumed it, could take a long time. So no different for computer science, no different for AI factories that are generating tokens. And so you have these two fundamental tensions, on the one hand,

Speaker01:25:43 - 01:26:08

You would like the customer's quality of service to be as good as possible, smart AIs that are super fast. On the other hand, you're trying to get your data center to produce tokens for as many people as possible so you can maximize your revenues. The perfect answer is to the upper right. Ideally, the shape of that curve is a square.

Speaker01:26:09 - 01:26:38

That you could generate very fast tokens per person up until the limits of the factory, but no factory can do that. And so it's probably some curve, and your goal is to maximize the area under the curve, the product of X and Y; the further you push it out, the better the factory you're building. Well, it turns out,

Speaker01:26:39 - 01:27:07

That in tokens per second for the whole factory and tokens per second response time, one of them requires enormous amount of computation, flops, and then the other dimension requires an enormous amount of bandwidth and flops. And so this is a very difficult problem to solve. The good answer is that you should have lots of flops and lots of bandwidth and lots of memory and lots of everything. That's the best answer to start, which is the reason why

Speaker01:27:08 - 01:27:35

This is such a great computer. You start with the most flops you can, the most memory you can, the most bandwidth you can, of course the best architecture you can, the most energy efficiency you can. And you have to have a programming model that allows you to run software across all of this, which is insanely hard. Now let's just take a look at this one demo to give you a tactile feeling of what I'm talking about. Please play it.

Speaker01:27:37 - 01:27:55

Traditional LLMs capture foundational knowledge, while reasoning models help solve complex problems with thinking tokens. Here, a prompt asks to seat people around a wedding table while adhering to constraints like traditions, photogenic angles, and feuding family members.

Speaker01:27:58 - 01:28:26

Traditional LLM answers quickly with under 500 tokens. It makes mistakes in seating the guests, while the reasoning model thinks with over 8,000 tokens to come up with the correct answer. It takes a pastor to keep the peace. Okay. As all of you know, if you have a wedding party of 300,

Speaker01:28:28 - 01:28:45

And you're trying to find the perfect, well, the optimal seating for everyone. That's a problem that only AI can solve or a mother-in-law can solve. And so, that's one of those problems that co-op cannot solve.

Speaker01:28:47 - 01:29:13

Okay, so what you see here is that we gave it a problem that requires reasoning, and you saw R1 goes off and it reasons about it, tries all these different scenarios, and it comes back and it tests its own answer. It asks itself whether it did it right. Meanwhile, the last generation language model does a one-shot. So the one-shot is 439 tokens. It was fast, it was effective, but it was wrong.

Speaker01:29:15 - 01:29:43

So it was 439 wasted tokens. On the other hand, in order for you to reason about this problem, and that was actually a very simple problem, you just give it a few more difficult variables, and it becomes very difficult to reason through, and it took 8,000, almost 9,000 tokens. And it took a lot more computation, because the model's more complex. Okay, so that's one dimension. Before I show you some results, let me explain something else. So the answer...

Speaker01:29:44 - 01:30:11

If you look at Blackwell, you look at the Blackwell system, and it's now the scaled-up NVLink 72. The first thing that we have to do is we have to take this model, and this model is not small. In the case of R1, people think R1 is small, but it's 680 billion parameters. Next-generation models could be trillions of parameters. And the way that you solve that problem is you take these trillions and trillions of parameters,

Speaker01:30:12 - 01:30:38

And this model, and you distribute the workload across the whole system of GPUs. You can use tensor parallel, you can take one layer of the model and run it across multiple GPUs. You could take a slice of the pipeline and call that pipeline parallel and put that on multiple GPUs. You could take different experts and put it across different GPUs, we call it expert parallel.

Speaker01:30:39 - 01:30:58

The combination of pipeline parallelism and tensor parallelism and expert parallelism, the number of combinations is insane. And depending on the model, depending on the workload, depending on the circumstance, how you configure that computer has to change so that you can get the maximum throughput out of it.
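
To see why "the number of combinations is insane," it is enough to count just the ordered ways of splitting 72 GPUs across the three parallelism dimensions, ignoring every other knob (batching, precision, routing) a real scheduler also weighs. This is back-of-envelope counting, not any actual NVIDIA planner.

```python
# Count ordered (tensor, pipeline, expert) parallel factorizations of a
# 72-GPU NVLink domain. Each triple is a distinct way to shard a model.
NUM_GPUS = 72

configs = [
    (tp, pp, ep)
    for tp in range(1, NUM_GPUS + 1)
    for pp in range(1, NUM_GPUS + 1)
    for ep in range(1, NUM_GPUS + 1)
    if tp * pp * ep == NUM_GPUS
]

print(f"{len(configs)} ordered (TP, PP, EP) factorizations of {NUM_GPUS}")
print(configs[:5])
```

Sixty ways before you even consider batch sizes or which layers go where; multiply in those dimensions and the configuration space explodes, which is why the workload manager has to search it per model and per workload.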

Speaker01:30:59 - 01:31:20

You also sometimes optimize for very low latency, sometimes you try to optimize for throughput, and so you have to do some in-flight batching. A lot of different techniques for batching and aggregating work. And so the software, the operating system for these AI factories is insanely complicated. Well, one of the observations

Speaker01:31:21 - 01:31:44

And this is a really terrific thing about having a homogeneous architecture like NVLink 72: every single GPU can do all the things that I just described. And we observe that these reasoning models are doing a couple of phases of computing. One of the phases of computing is thinking.

Speaker01:31:44 - 01:32:12

When you're thinking, you're not producing a lot of tokens. You're producing tokens that you're maybe consuming yourself. You're thinking. Maybe you're reading. You're digesting information. That information could be a PDF. That information could be a website. You could literally be watching a video, ingesting all of that at super linear rates. And you take all of that information, and you then formulate the answer. Formulate a planned answer.

Speaker01:32:13 - 01:32:41

And so that digestion of information, context processing, is very flops-intensive. On the other hand, during the next phase, it's called decode, so the first part we call pre-fill, the next phase of decode requires floating-point operations, but it requires an enormous amount of bandwidth. And it's fairly easy to calculate. You know, if you have a model and it's a few trillion parameters, well,

Speaker01:32:42 - 01:33:09

It takes a few terabytes per second. Notice I was mentioning 576 terabytes per second. It takes terabytes per second to just pull the model in from HBM memory and to generate literally one token. And the reason it generates one token is because, remember, that these large language models are predicting the next token. That's why they say the next token. It's not predicting every single token. It's predicting the next token.
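
That bandwidth arithmetic can be written out explicitly. The model size and precision below are illustrative assumptions; the 576 TB/s matches the aggregate bandwidth figure quoted in the talk.

```python
# Bandwidth-bound decode ceiling: if generating each token requires
# streaming all (active) weights from HBM, the rate is at most
# bandwidth / model_bytes. Illustrative numbers, not measurements.

def decode_ceiling(params_b, bytes_per_param, bandwidth_tb_s):
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes  # tokens/s, single stream

# A hypothetical 2-trillion-parameter model at FP4 (0.5 bytes/param)
# against 576 TB/s of aggregate memory bandwidth:
rate = decode_ceiling(params_b=2000, bytes_per_param=0.5, bandwidth_tb_s=576)
print(f"~{rate:.0f} tokens/s upper bound if every token re-reads all weights")
```

This is why decode is described as bandwidth-hungry rather than flops-hungry: the ceiling scales with bytes per second, not operations per second.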

Speaker01:33:09 - 01:33:33

Now we have all kinds of new techniques, speculative decoding and all kinds of new techniques for doing that faster, but in the final analysis, you're predicting the next token. And so you ingest, pull in the entire model and the context, we call it a KV cache, and then we produce one token. And then we take that one token, we put it back into our brain, we produce the next token.
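
The generate-one-token, feed-it-back loop can be shown in miniature. The "model" here is a trivial deterministic stand-in, and `kv_cache` simply accumulates per-position state the way a real KV cache lets the prefix be reused instead of recomputed each step; none of this reflects a real transformer's internals.

```python
# Toy autoregressive decode loop: pull the model in, produce ONE token,
# feed it back into the context, repeat.

def next_token(context, kv_cache):
    # Stand-in for a forward pass over the full model weights.
    tok = (sum(context) + len(kv_cache)) % 50
    kv_cache.append(tok)  # cache the state computed for this position
    return tok

def generate(prompt, n_tokens):
    context, kv_cache = list(prompt), []
    for _ in range(n_tokens):
        tok = next_token(context, kv_cache)  # one token per full pass
        context.append(tok)                  # fed back as the next input
    return context[len(prompt):]

out = generate(prompt=[3, 14, 15], n_tokens=8)
print(out)
```

Each iteration of the loop corresponds to one "trillions of parameters in, one token out" pass in the description above; speculative decoding and similar tricks reduce how often that full pass is paid, but the loop shape is the same.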

Speaker01:33:33 - 01:34:01

Every single one, every single time we do that, we take trillions of parameters in, we produce one token. Trillions of parameters in, produce another token. Trillions of parameters in, produce another token. And notice that demo, we produced 8,600 tokens. So trillions of bytes of information, trillions of bytes of information have been taken into our GPUs and produced one token at a time.

Speaker01:34:02 - 01:34:31

Which is fundamentally the reason why you want NVLink. NVLink gives us the ability to take all of those GPUs and turn them into one massive GPU. The ultimate scale-up. And the second thing is that now that everything is on NVLink, I can disaggregate the pre-fill from the decode, and I could decide I want to use more GPUs for pre-fill, less for decode.

Speaker01:34:32 - 01:34:56

Because I'm thinking a lot. It's agentic. I'm reading a lot of information. I'm doing deep research. Notice doing deep research? And earlier I was listening to Michael, and Michael was talking about him doing research, and I do the same thing. And we go off and we write these really long research projects for our AI, and I love doing that, because I already paid for it.

Speaker01:35:00 - 01:35:22

And I just love making our GPUs work. Nothing gives me more joy. So I write it up, and then it goes off and does all this research. It went off to like 94 different websites, read all of them, ingested all that information, formulated an answer, and wrote the report. It's incredible, okay? During that entire time, pre-fill is super busy.

Speaker01:35:22 - 01:35:45

And it's not really generating that many tokens. On the other hand, when you're chatting with the chatbot, and millions of us are doing the same thing, it is very token generation heavy. It's very decode heavy. Depending on the workload, we might decide to put more GPUs into decode, depending on the workload, put more GPUs into pre-fill.
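
A hedged sketch of that disaggregation decision: given some measured mix of prefill-heavy (deep-research style) and decode-heavy (chat style) demand, split a fixed GPU pool between the two phases. The proportional policy below is an invented placeholder for illustration, not Dynamo's actual scheduler.

```python
# Split an NVLink domain's GPUs between prefill and decode work
# proportionally to the observed load on each phase (toy policy).

def split_pool(total_gpus, prefill_load, decode_load):
    total = prefill_load + decode_load
    prefill = max(1, round(total_gpus * prefill_load / total))
    decode = total_gpus - prefill
    return prefill, decode

# Research-agent-heavy mix vs. chatbot-heavy mix on 72 GPUs:
print(split_pool(72, prefill_load=8.0, decode_load=2.0))  # -> (58, 14)
print(split_pool(72, prefill_load=1.0, decode_load=9.0))  # -> (7, 65)
```

Because every GPU in the homogeneous NVLink domain can serve either phase, the split can shift with the workload, which is exactly the dynamic behavior the talk calls "really complicated" to manage.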

Speaker01:35:45 - 01:36:02

Well, this dynamic operation is really complicated. So I've just now described pipeline parallel, tensor parallel, expert parallel, in-flight batching, disaggregated inferencing, workload management,

Speaker01:36:03 - 01:36:29

And then I've got to take this thing called a KV cache, I've got to route it to the right GPU, I've got to manage it through all the memory hierarchies. That piece of software is insanely complicated. And so today we're announcing the NVIDIA Dynamo. NVIDIA Dynamo does all that. It is essentially the operating system of an AI factory.

Speaker01:36:31 - 01:36:59

Whereas in the past, in the way that we ran data centers, our operating system would be something like VMware. And we would orchestrate, and we still do, you know, we're a big user, we would orchestrate a whole bunch of different enterprise applications running on top of our enterprise IT. But in the future, the application is not enterprise IT, it's agents. And the operating system is not something like VMware, it's something like Dynamo.

Speaker01:37:00 - 01:37:25

And this operating system is running on top of not a data center, but on top of an AI factory. Now we call it dynamo for a good reason. As you know, the dynamo was the first instrument that started the last industrial revolution. The industrial revolution of energy. Water comes in, electricity comes out. It's pretty fantastic. Water comes in, you light it on fire, turn it into steam.

Speaker01:37:25 - 01:37:52

And what comes out is this invisible thing that's incredibly valuable. It took another 80 years to get to alternating current, but the dynamo is where it all started. So we decided to call this operating system, this piece of software, this insanely complicated software, NVIDIA Dynamo. It's open source, and we're so happy that so many of our partners are working with us on it, and one of my favorite

Speaker01:37:52 - 01:38:22

My favorite partners, I just love them so much, because of the revolutionary work that they do, and also because Aravind is such a great guy. But Perplexity is a great partner of ours in working through this. So anyhow, really, really great. Okay, so now we're going to have to wait until we scale up all this infrastructure, but in the meantime, we've done a whole bunch of very in-depth simulation. We have supercomputers doing simulations of our supercomputers, which makes sense.

Speaker01:38:22 - 01:38:51

And I'm now going to show you the benefit of everything I've just said. And remember the factory diagram: on the y-axis is the tokens-per-second throughput of the factory, and on the x-axis, the tokens per second of the user experience. And you want super smart AIs, and you want to produce a whole bunch of them. This is Hopper.

Speaker01:38:52 - 01:39:15

So this is Hopper, and it can produce for each user about 100 tokens per second. This is 8 GPUs, and it's connected with InfiniBand, and I'm normalizing it to tokens per second per megawatt.

Speaker01:39:16 - 01:39:46

So it's a one megawatt data center, which is not a very large AI factory, but anyhow, one megawatt, okay? And so it can produce, for each user, 100 tokens per second, and it can produce at this level, whatever that happens to be, 100,000 tokens per second for that one megawatt data center. Or it can produce about two and a half million tokens per second, two and a half million tokens per second,

Speaker01:39:46 - 01:40:10

For that AI factory, if it was super batched up, and the customer is willing to wait a very long time. Okay? Does that make sense? All right, so nod. All right, because this is where, you know, every GTC, there's the price for entry, you guys know? And you get tortured with math, okay? This is the only...

Speaker01:40:12 - 01:40:28

Only at NVIDIA do you get tortured with math. Alright, so, Hopper, you get two and a half. Now, what's that two and a half million? How do you translate that? Two and a half million, remember, ChatGPT is like $10 per million tokens.

Speaker01:40:29 - 01:40:51

Right? Ten dollars per million tokens. Let's pretend for a second that that's... I think the ten dollars per million tokens is probably down here. Okay? I'd probably say it's down here. But let me pretend it's up there. Because two and a half million, ten, so twenty-five million dollars per second. Does that make sense?

Speaker01:40:51 - 01:41:16

That's how you think through it. Or, on the other hand, if it's way down here, then the question is, you know, so it's 100,000... just divide that by 10, okay? $250,000 per factory per second. And then there are 31 million, 30 million seconds in a year, and that translates into revenues for that one-megawatt data center. And so that's your goal.
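
The spoken arithmetic here is loose, so it helps to work it out explicitly under the stated assumptions: $10 per million tokens and roughly 31.5 million seconds in a year. At 2.5 million tokens per second, that is $25 of revenue per second, compounding over the year.

```python
# Factory revenue back-of-envelope, using the talk's illustrative price.
PRICE_PER_M_TOKENS = 10.0            # $ per million tokens (illustrative)
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

def yearly_revenue(tokens_per_s):
    per_second = tokens_per_s / 1e6 * PRICE_PER_M_TOKENS  # $ per second
    return per_second * SECONDS_PER_YEAR

# Max-batch vs. interactive operating points for the 1 MW Hopper example:
print(f"${yearly_revenue(2_500_000):,.0f}/yr at 2.5M tok/s")
print(f"${yearly_revenue(100_000):,.0f}/yr at 100k tok/s")
```

The gap between the two operating points is the trade-off being described: batching everything maximizes token volume and revenue, while serving each user quickly sacrifices volume for quality of service.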

Speaker01:41:17 - 01:41:44

On the one hand, you would like your token rate to be as fast as possible so that you can make really smart AIs, and if you have smart AIs, people pay you more money for it. On the other hand, the smarter the AI, the less you can make in volume. Very sensible trade-off. And this is the curve we're trying to bend. Now, what I'm just showing you right now is the fastest computer in the world, Hopper.

Speaker01:41:45 - 01:42:08

It's the computer that revolutionized everything. And so how do we make that better? So the first thing that we do is we come up with Blackwell with NVLink 8. The same Blackwell compute node, with NVLink 8, using FP8. And so Blackwell is just faster. Faster, bigger, more transistors, more everything.

Speaker01:42:09 - 01:42:37

But we'd like to do more than that, and so we introduced a new precision. It's not quite as simple as 4-bit floating point, but using 4-bit floating point, we can quantize the model, use less energy to do the same. And as a result, when you use less energy to do the same, you could do more. Because remember, one big idea is that every single data center in the future will be power limited.

Speaker01:42:37 - 01:43:02

Your revenues are power-limited. You can figure out what your revenues are going to be based on the power you have to work with. This is no different than many other industries. We are now a power-limited industry. Our revenues will associate with that. Based on that, you want to make sure you have the most energy-efficient compute architecture you can possibly get.
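
The 4-bit idea mentioned a moment ago, quantizing weights so the same work moves fewer bits and burns less energy, can be sketched with plain symmetric integer quantization. Real FP4 formats use floating-point levels and per-block scales, so this is only the shape of the trade, not the production scheme.

```python
import numpy as np

# Symmetric 4-bit round trip: map float weights onto signed integer
# levels with one per-tensor scale, then dequantize and measure error.

def quantize_int4(w):
    scale = np.max(np.abs(w)) / 7.0  # int4 holds -8..7; use the +/-7 range
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=1024).astype(np.float32)
q, s = quantize_int4(w)
err = float(np.max(np.abs(dequantize(q, s) - w)))
print(f"max abs error after 4-bit round trip: {err:.4f} (scale={s:.4f})")
```

Each weight now costs 4 bits instead of 16 or 32 to store and to move, and in a power-limited factory that saved energy can be spent generating more tokens, at the price of the bounded rounding error shown above.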

Speaker01:43:03 - 01:43:31

Then we scale up with NVLink 72. Does that make sense? Look at the difference between that, NVLink 72 FP4, and then, because our architecture is so tightly integrated, we now add Dynamo to it, and Dynamo can extend that even further. Are you following me? So Dynamo also helps Hopper, but Dynamo helps Blackwell incredibly. Now, yep.

Speaker01:43:36 - 01:44:06

Only at GTC do you get an applause for that. So now notice what I put, those two shiny parts, that's kind of where your max Q is. That's likely where you'll run your factory operations. You're trying to find that balance between maximum throughput and maximum quality of AI. Smartest AI, the most of it. Those two, that XY intercept, is really what you're optimizing for, and that's what it looks like if you look underneath those two squares.

Speaker01:44:07 - 01:44:30

Blackwell is way, way better than Hopper. And remember, this is not ISO chips. This is ISO power. This is the ultimate Moore's Law. This is what Moore's Law was always about in the past. And now here we are, 25x in one generation at ISO power.

Speaker01:44:32 - 01:44:56

It's not ISO chips, it's not ISO transistors, it's not ISO anything. ISO power, the ultimate, the ultimate limiter. There's only so much energy we can get into a data center. And so within ISO power, Blackwell is 25 times, now here's the, that rainbow, that's incredible. That's the fun part. Look, all the different config, every,

Speaker01:44:56 - 01:45:25

Underneath the Pareto frontier are millions of points at which we could have configured the data center. We could have parallelized, split, and sharded the work in a whole lot of different ways. And we found the most optimal answer, which is the Pareto frontier. And each one of them,

Speaker01:45:25 - 01:45:51

Because the color shows you it's a different configuration. Which is the reason why this image says very, very clearly you want a programmable architecture that is as homogeneously fungible, as fungible as possible. Because the workload changes so dramatically across the entire frontier. And look, we got...

Speaker01:45:51 - 01:46:14

On the top, Expert Parallel 8, batch of 3,000, disaggregation off, Dynamo off. In the middle, Expert Parallel 64, with Dynamo turned on and 26% used for context.

Speaker01:46:14 - 01:46:33

The other 74% is not; batch of 64; and expert parallel of 64 on one, expert parallel of 4 on the other. And then down here, all the way at the bottom, you've got tensor parallel 16 with expert parallel 4, batch of 2, 1% context. The configuration of the computer is changing across that entire spectrum.

Speaker01:46:34 - 01:47:02

This is with input sequence length. This is kind of a commodity test case, one that you can benchmark relatively easily. The input is 1,000 tokens; the output is 2,000. Notice that earlier we showed you a demo where the output was 8,000, almost 9,000 tokens. So obviously this is not representative of just that one chat. Now, this one is more representative.

Speaker01:47:03 - 01:47:21

And this is what, you know, the goal is to build these next generation computers for next generation workloads. And so here's an example of a reasoning model. And in a reasoning model, Blackwell is 40 times, 40 times the performance of Hopper. Straight up. Pretty amazing.

Speaker01:47:28 - 01:47:55

You know, I've said before, somebody actually asked why I would say that, but I said before that when Blackwell starts shipping in volume, you couldn't give Hoppers away. And this is what I mean. And this makes sense. If anybody... if you're still looking to buy a Hopper, don't be afraid. It's okay. But I'm the chief revenue destroyer.

Speaker01:47:58 - 01:48:25

My sales guys are going, oh no, don't say that. There are circumstances where Hopper is fine. That's the best thing I could say about Hopper. There are circumstances where you're fine. Not many. If I had to take a swing, and so that's kind of my point.

Speaker01:48:25 - 01:48:55

When the technology is moving this fast, and because the workload is so intense, and you're building these things, they're factories, we'd really like you to invest in the right versions. Okay, just to put it in perspective, this is what a 100-megawatt factory looks like. Based on Hoppers, you have 45,000 dies, 1,400 racks, and it produces 300 million tokens per second.
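The unit economics of that 100-megawatt Hopper factory can be checked with a couple of lines, using only the figures quoted above:

```python
# Unit economics of the 100 MW Hopper factory quoted in the talk.
power_watts = 100e6           # 100 megawatts
dies = 45_000
racks = 1_400
tokens_per_second = 300e6     # 300 million tokens per second

dies_per_rack = dies / racks
tokens_per_watt = tokens_per_second / power_watts
print(f"{dies_per_rack:.0f} dies per rack, {tokens_per_watt:.0f} tokens/sec per watt")
```

So the quoted Hopper factory works out to roughly 32 dies per rack and 3 tokens per second per watt, the baseline the Blackwell comparison that follows is measured against.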

Speaker01:48:55 - 01:49:23

Okay? And then this is what it looks like with Blackwell. You have 86, yeah, I know. That doesn't make any sense. Okay, so we're not trying to sell you less. Okay, our sales guys are going, Jensen, you're selling them less. This is better.

Speaker01:49:24 - 01:49:40

The more you buy, the more you save. It's even better than that. Now the more you buy, the more you make.

Speaker01:49:40 - 01:50:10

Remember, everything is now in the context of AI factories. Although we talk about the chips, you always start from scale-up, the full scale-up. What can you scale up to the maximum? I want to show you now what an AI factory looks like, but AI factories are so complicated. I just gave you an example of one rack. It has 600,000 parts.

Speaker01:50:10 - 01:50:28

You know, it's 3,000 pounds. Now, you've got to take that and connect it with a whole bunch of others. And so we are starting to build what we call the digital twin of every data center. Before you build a data center, you have to build a digital twin. Let's take a look at this. This is just incredibly beautiful.

Speaker01:50:37 - 01:51:02

The world is racing to build state-of-the-art, large-scale AI factories. Bringing up an AI Gigafactory is an extraordinary feat of engineering, requiring tens of thousands of workers from suppliers, architects, contractors, and engineers to build, ship, and assemble nearly 5 billion components and over 200,000 miles of fiber, nearly the distance from the Earth to the Moon.

Speaker01:51:03 - 01:51:28

The NVIDIA Omniverse Blueprint for AI factory digital twins enables us to design and optimize these AI factories long before physical construction starts. Here, NVIDIA engineers use the blueprint to plan a 1-gigawatt AI factory, integrating 3D and layout data of the latest NVIDIA DGX superpods and advanced power and cooling systems from Vertiv and Schneider Electric.

Speaker01:51:30 - 01:51:51

and optimized topology from NVIDIA AIR, a framework for simulating network logic, layout, and protocols. This work is traditionally done in silos. The Omniverse Blueprint lets our engineering teams work in parallel and collaboratively, letting us explore various configurations to maximize TCO and power usage effectiveness.

Speaker01:51:52 - 01:52:16

NVIDIA uses Cadence Reality Digital Twin, accelerated by CUDA and Omniverse libraries, to simulate air and liquid cooling systems. And Schneider Electric with ETAP, an application to simulate power block efficiency and reliability. Real-time simulation lets us iterate and run large-scale what-if scenarios in seconds versus hours.

Speaker01:52:17 - 01:52:35

We use the digital twin to communicate instructions to the large body of teams and suppliers, reducing execution errors and accelerating time to bring up. And when planning for retrofits or upgrades, we can easily test and simulate cost and downtime, ensuring a future-proof AI factory.

Speaker01:52:48 - 01:53:13

This is the first time anybody who builds data centers thinks, oh, that's so beautiful. All right, I've got to race here because it turns out I've got a lot to tell you. And so if I go a little too fast, it's not because I don't care about you. It's just I've got a lot of information to go through. All right, so first, our roadmap. We're now in full production of Blackwell.

Speaker01:53:14 - 01:53:20

Computer companies all over the world are ramping these incredible machines at scale.

Speaker01:53:20 - 01:53:50

I'm just so pleased and so grateful that all of you worked hard on transitioning into this new architecture. And now, in the second half of this year, we'll easily transition into the upgrade. So we have the Blackwell Ultra NVLink 72. You know, it's one and a half times more flops. It's got a new instruction for attention. It's one and a half times more memory. All that memory is useful for things like KV cache. It's two times more bandwidth.

Speaker01:53:50 - 01:54:10

For networking bandwidth. And so now that we have the same architecture, we'll just kind of gracefully glide into that, and that's called Blackwell Ultra. So that's coming second half of this year. Now there's a reason why this is the only product announcement in any company where everybody's going, yeah, next.

Speaker01:54:18 - 01:54:36

And in fact, that's exactly the response I was hoping to get. And here's why. Look, we're building AI factories and AI infrastructure. It's going to take years of planning. This isn't like buying a laptop. This isn't discretionary spend.

Speaker01:54:36 - 01:54:57

This is spend that we have to go plan on. And so we have to plan on having, of course, the land and the power, and we have to get our CapEx ready, and we get engineering teams, and we have to lay it out a couple, two, three years in advance, which is the reason why I show you our roadmap a couple, two, three years in advance. So that we don't surprise you and say,

Speaker01:54:58 - 01:55:20

You know, hi, in another month we're going to go to this incredible new system. I'll show you an example in a second. And so we planned this out in multiple years. The next click, one year out, is named after an astronomer, and her grandkids are here. Her name is Vera Rubin; she discovered evidence of dark matter. Yep.

Speaker01:55:25 - 01:55:50

Vera Rubin is incredible because the CPU is new. It's twice the performance of Grace. More memory, more bandwidth, and yet just a little tiny 50-watt CPU. It's really quite incredible. And Rubin, brand new GPU. CX9, brand new networking smart NIC. NVLink 6, brand new NVLink.

Speaker01:55:50 - 01:56:18

Brand new memories, HBM-4. Basically, everything is brand new, except for the chassis. And this way, we could take a whole lot of risk in one direction, and not risk a whole bunch of other things related to the infrastructure. And so Vera Rubin, NVLink-144, is the second half of next year. Now, one of the things that I made a mistake on, and so I just need you to make this pivot. We're going to do this one time.

Speaker01:56:21 - 01:56:39

Blackwell is really two GPUs in one Blackwell chip. We call that one chip a GPU, and that was wrong. And the reason for that is it screws up all the NVLink nomenclature and things like that. So going forward, without going back to Blackwell to fix it, going forward, when I say,

Speaker01:56:39 - 01:57:05

NVLink-144 just means that it's connected to 144 GPUs, and each one of those GPUs is a GPU die, and it could be assembled in some package. How it's assembled could change from time to time, okay? And so each GPU die is a GPU, and each NVLink is connected to the GPU. And so, Vera Rubin NVLink-144. And then this now sets the stage

Speaker01:57:06 - 01:57:33

For the second half of the year, the following year, we call Rubin Ultra. Okay, so Vera Rubin Ultra. I know. This one is where you go... All right, so this is Vera Rubin, Rubin Ultra, second half of '27. It's NVLink-576, extreme scale-up.

Speaker01:57:35 - 01:58:03

Each rack is 600 kilowatts, two and a half million parts, and obviously a whole lot of GPUs. And everything is X-factored more. So 14 times more flops, 15 exaflops. Instead of one exaflop, as I mentioned earlier, it's now 15 exaflops, scaled up exaflops. And it's 300, what?

Speaker01:58:04 - 01:58:26

4.6 petabytes, so 4,600 terabytes per second scale-up bandwidth. I don't mean aggregate, I mean scale-up bandwidth. And, of course, a brand new NVLink switch and CX9. And so, notice 16 sites, four GPUs in one package.

Speaker01:58:27 - 01:58:53

Extremely large NVLink. Now just put that in perspective. This is what it looks like. Now this is going to be fun. You are just literally ramping up Grace Blackwell at the moment, and I don't mean to make it look like a laptop, but here we go. So this is what Grace Blackwell looks like, and this is what Ruben looks like. ISO dimension.

Speaker01:58:54 - 01:59:18

This is another way of saying, before you scale out, you have to scale up. Does that make sense? First you scale up, and then after that, you scale out with amazing technology that I'll show you in just a second. And that gives you a sense of the pace at which we're moving. This is the amount of scale-up flops.

Speaker01:59:19 - 01:59:46

Hopper is 1x, Blackwell is 68x, Rubin is 900x scale-up flops. And then if I turn it into essentially your TCO, which is power on top and, underneath, the area under the curve that I was talking to you about, the square under the curve, which is basically flops times bandwidth.

Speaker01:59:46 - 02:00:16

So the way you think about a very easy gut feel, gut check, on whether your AI factories are making progress is watts divided by those numbers. And you can see that Rubin is going to drive the cost down tremendously. So that's very quickly NVIDIA's roadmap. Once a year, like...
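The "watts divided by those numbers" gut check can be made concrete. The relative scale-up flops (Hopper = 1x, Blackwell = 68x, Rubin = 900x) are the figures quoted above; the rack power numbers below are illustrative placeholders, not NVIDIA specifications, chosen only to show how the ratio trends.

```python
# Gut check from the talk: efficiency ~ watts / scale-up flops (lower is better).
# Relative flops are quoted in the keynote; rack wattages are placeholders.
relative_scale_up_flops = {"Hopper": 1.0, "Blackwell": 68.0, "Rubin": 900.0}
illustrative_rack_watts = {"Hopper": 40_000, "Blackwell": 120_000, "Rubin": 600_000}

efficiency = {
    gen: illustrative_rack_watts[gen] / flops
    for gen, flops in relative_scale_up_flops.items()
}
for gen, watts_per_flop in efficiency.items():
    print(f"{gen}: {watts_per_flop:,.0f} watts per relative unit of flops")
```

Even with placeholder wattages, the flops grow much faster than the power, which is the talk's point: each generation drives the watts-per-flop cost down sharply.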

Speaker02:00:17 - 02:00:38

Like clock ticks, once a year. Okay, how do we scale out? Well, we were preparing to scale out; that was scale-up, it was NVLink. Our scale-out network is InfiniBand and Spectrum X. Most were quite surprised that we came into the Ethernet world, and the reason why we decided to do Ethernet is that if we could help Ethernet become

Speaker02:00:39 - 02:00:54

Like InfiniBand, have the qualities of InfiniBand, then the network itself would be a lot easier for everybody to use and manage. And so we decided to invest in Spectrum, we call it Spectrum X, and we brought to it the properties

Speaker02:00:54 - 02:01:15

of congestion control and very low latency and a mountain of software that's part of our computing fabric. And as a result, we made SpectrumX incredibly high-performing. We scaled up the largest single GPU cluster ever as one giant cluster with SpectrumX.

Speaker02:01:15 - 02:01:34

And that was Colossus. And so there are many other examples of it. Spectrum X is unquestionably a huge home run for us. One of the areas that I'm very excited about is the largest enterprise networking company taking Spectrum X and integrating it into their product line so that they can help the world's enterprises become AI companies.

Speaker02:01:40 - 02:02:10

We're at 100,000 with CX-7; now CX-8 is coming, and CX-9 is coming. During Rubin's time frame, we would like to scale out the number of GPUs to many hundreds of thousands. Now, the challenge with scaling out GPUs to many hundreds of thousands is the connection of the scale-out. The connection on scale-up is copper. We should use copper as far as we can, and that's, you know, call it a meter or two.

Speaker02:02:11 - 02:02:36

And that's incredibly good connectivity, very high reliability, very good energy efficiency, very low cost. And so we use copper as much as we can on scale-up. But on scale-out, where the data centers are now the size of a stadium, we're going to need something that runs much longer distances. And this is where silicon photonics comes in. The challenge of silicon photonics

Speaker02:02:36 - 02:03:04

has been that the transceivers consume a lot of energy. To go from electrical to photonic, the signal has to go through a SerDes, through a transceiver and a SerDes, several SerDes. And so each one of these, each one of these... am I alone? Is anybody? What happened to my networking guys?

Speaker02:03:05 - 02:03:34

Can I have this up here? Yeah, let's bring it up so I can show people what I'm talking about. Okay, so first of all, we're announcing NVIDIA's first co-packaged optics silicon photonic system. It is the world's first 1.6 terabit per second CPO. It is based on a technology called micro ring resonator modulator.

Speaker02:03:34 - 02:04:00

And it is completely built with this incredible process technology at TSMC that we've been working with for some time. And we partnered with just a giant ecosystem of technology providers to invent what I'm about to show you. This is really crazy technology. Crazy, crazy technology. Now the reason why we decided to invest in MRM is so that we could prepare ourselves using

Speaker02:04:00 - 02:04:25

MRM's incredible density and power, better density and power compared to Mach-Zehnder, which is used for telecommunications when you drive from one data center to another data center, or even in the transceivers that we use. We use Mach-Zehnder because the density requirement has not been very high until now. And so if you look at these transceivers, this is an example of a transceiver.

Speaker02:04:31 - 02:04:55

They did a very good job tangling this up for me. Oh, wow. Thank you. Oh, mother of God. Okay. This is where you've got to turn reasoning on.

Speaker02:05:05 - 02:05:27

It's not as easy as you think. These are squirrely little things. All right, so this one right here, this is 30 watts. Just so you remember, this is 30 watts. And if you buy it in high volume, it's $1,000. This is a plug. On this side, it's electrical. On this side, it's optical.

Speaker02:05:27 - 02:05:56

Okay, so optics come in through the yellow; you plug this into a switch; it's electrical on this side. There are transceivers, lasers, and it's a technology called Mach-Zehnder, and it's incredible. And so we use this to go from the GPU to the switch, to the next switch, and then the next switch down, and the next switch down to the GPU, for example.

Speaker02:05:56 - 02:06:24

And so each one of these, if we had 100,000 GPUs, we would have 100,000 on this side, and then another, you know, 100,000 which connect switch to switch, and then on the other side, attribute that to the other NIC. If we had 250,000 GPUs, we'd add another layer of switches, and so each GPU,

Speaker02:06:25 - 02:06:51

With 250,000 GPUs, every GPU would have six transceivers, six of these plugs, and these six plugs would add 180 watts per GPU, 180 watts per GPU, and $6,000 per GPU, okay? And so the question is, how do we scale up now to millions of GPUs?

Speaker02:06:51 - 02:07:16

Because if we had a million GPUs multiplied by six, it would be six million transceivers times 30 watts: 180 megawatts of transceivers. They don't do any math; they just move signals around. And so the question is, how do we...
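The transceiver arithmetic in the two passages above checks out exactly, using only the numbers quoted in the talk:

```python
# Transceiver math from the talk: six pluggables per GPU at ~30 W and ~$1,000
# each (in high volume), scaled from one GPU up to a million GPUs.
transceivers_per_gpu = 6
watts_each = 30
dollars_each = 1_000

watts_per_gpu = transceivers_per_gpu * watts_each        # 180 W per GPU
dollars_per_gpu = transceivers_per_gpu * dollars_each    # $6,000 per GPU

gpus = 1_000_000
total_transceivers = gpus * transceivers_per_gpu         # six million transceivers
total_megawatts = total_transceivers * watts_each / 1e6  # 180 MW moving signals around
print(watts_per_gpu, dollars_per_gpu, total_transceivers, total_megawatts)
```

That 180 megawatts is power spent purely on moving signals, which is the cost co-packaged optics is designed to eliminate.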

Speaker02:07:16 - 02:07:42

How could we afford it? And as I mentioned earlier, energy is our most important commodity. Everything is related ultimately to energy, so this is going to limit our revenues, our customers' revenues, by subtracting out 180 megawatts of power. And so this is the amazing thing that we did. We invented the world's first MRM, micro ring modulator, and this is what it looks like. There's a little...

Speaker02:07:42 - 02:08:07

A waveguide goes to a ring. That ring resonates and it controls the amount of reflectivity of the waveguide as it goes around and it limits and modulates the energy, the amount of light that goes through. It shuts it off by absorbing it or passing it on. It turns the light, this direct continuous laser beam, into ones and zeros.

Speaker02:08:07 - 02:08:36

And that's the miracle. And that photonic IC is stacked with the electronic IC, which is then stacked with a whole bunch of micro lenses, which is stacked with this thing called a fiber array. These things are all manufactured at TSMC using a technology they call COUPE, and packaged using 3D co-packaging technology, working with all of these technology providers, a whole bunch of them, the names I just showed you earlier,

Speaker02:08:37 - 02:08:40

And it turns it into this incredible machine. So let's take a look at the video.

Speaker02:10:15 - 02:10:43

Just a technology marvel. And they turn into these switches, our InfiniBand switch. The silicon is working fantastically. In the second half of this year, we will ship the silicon photonics switch, and in the second half of next year, we'll ship the Spectrum X version. Because of the MRM choice, because of the incredible technology risks that we took over the last five years, we filed hundreds of patents,

Speaker02:10:44 - 02:11:04

And we've licensed it to our partners so that we can all build them. Now, we're in a position to put silicon photonics with co-packaged optics, no transceivers, direct fiber in, into our switches with a radix of 512. This is the

Speaker02:11:04 - 02:11:30

This is the 512 ports. This would just simply not be possible any other way. And so this now set us up to be able to scale up to these multi-hundred thousand GPUs and multi-million GPUs. And the benefit, just so you imagine this, it's incredible. In a data center, we could save tens of megawatts. Tens of megawatts.

Speaker02:11:31 - 02:11:45

Well, let's say 60 megawatts. 6 megawatts is 10 Rubin Ultra racks.

Speaker02:11:46 - 02:12:01

Right? And 60 megawatts, that's a lot: 100 Rubin Ultra racks of power that we can now deploy into Rubins. All right, so this is our roadmap: once a year, once a year, and a new architecture every...
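The rack conversion above follows directly from the 600-kilowatt Rubin Ultra rack quoted earlier:

```python
# From the talk: a Rubin Ultra rack draws 600 kW, so every megawatt of power
# saved on optics converts directly into racks of compute you can deploy.
rack_kilowatts = 600
for saved_megawatts in (6, 60):
    racks = saved_megawatts * 1_000 // rack_kilowatts
    print(f"{saved_megawatts} MW saved -> {racks} Rubin Ultra racks")
```

So 6 megawatts saved is 10 racks, and 60 megawatts is 100 racks, matching the figures in the talk.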

Speaker02:12:01 - 02:12:31

Every two years, a new product line every single year, X-factors up. And we try to take silicon risk or networking risk or system chassis risk in pieces, so that we can move the industry forward as we pursue these incredible technologies. Vera Rubin, and I really appreciate the grandkids for being here. This is our opportunity to recognize her and to honor her for the incredible work that she did. Our next generation will be named after Feynman.

Speaker02:12:40 - 02:13:09

Okay, NVIDIA's roadmap. Let me talk to you about enterprise computing. This is really important. In order for us to bring AI to the world's enterprise, first we have to go to a different part of NVIDIA. The beauty of Gaussian splats. Okay, in order for us to take AI to enterprise, take a step back for a second and remind yourself this.

Speaker02:13:10 - 02:13:32

Remember, AI and machine learning has reinvented the entire computing stack. The processor is different, the operating system is different, the applications on top are different. The way the applications are different, the way you orchestrate it are different, and the way you run them are different. Let me give you one example. The way you access data will be fundamentally different than the past.

Speaker02:13:33 - 02:14:01

Instead of retrieving precisely the data that you want and reading it to try to understand it, in the future we will do what we do with Perplexity. Instead of doing retrieval that way, I'll just ask Perplexity what I want. Ask it a question, and it will tell you the answer. This is the way enterprise IT will work in the future as well. We'll have AI agents, which are part of our digital workforce. There's a billion knowledge workers in the world.

Speaker02:14:01 - 02:14:30

There are probably going to be 10 billion digital workers working with us side by side. 100% of software engineers in the future, there are 30 million of them around the world, 100% of them are going to be AI-assisted. I'm certain of that. 100% of NVIDIA software engineers will be AI-assisted by the end of this year. And so AI agents will be everywhere. How they run, what enterprises run, and how we run it will be fundamentally different. And so we need a new line of computers. And this...

Speaker02:14:38 - 02:14:56

This is what a PC should look like. 20 petaflops. Unbelievable. 72 CPU cores, chip-to-chip interface, HBM memory, and just in case, some PCI Express slots for your GeForce.

Speaker02:14:59 - 02:15:19

So this is called DGX Station. DGX Spark and DGX Station are going to be available by all of the OEMs. HP, Dell, Lenovo, Asus. It's going to be manufactured for data scientists and researchers all over the world. This is the computer of the age of AI.

Speaker02:15:20 - 02:15:37

This is what computers should look like, and this is what computers will run in the future. And we have a whole lineup for enterprise now, from little tiny one to workstation ones, to server ones, to supercomputer ones, and these will be available by all of our partners. We will also,

Speaker02:15:38 - 02:16:01

This revolutionized the rest of the computing stack. Remember, computing has three pillars. There's computing, you're looking at it. There's networking, as I mentioned earlier, Spectrum X going to the world's enterprise, an AI network. And the third is storage. Storage has to be completely reinvented. Rather than a retrieval-based

Speaker02:16:02 - 02:16:15

Storage system is going to be a semantics-based retrieval system, a semantics-based storage system. And so the storage system has to be continuously embedding information in the background, taking raw data,

Speaker02:16:16 - 02:16:39

Embedding it into knowledge. And then later when you access it, you don't retrieve it, you just talk to it. You ask it questions. You give it problems. And one of the examples, I wish we had a video of it: Aaron at Box worked with us to put one up in the cloud. And it's basically, you know, a super smart storage system.

Speaker02:16:40 - 02:17:09

And in the future, you're going to have something like that in every single enterprise. That is the enterprise storage of the future. And we're working with the entire storage industry, really fantastic partners, DDN and Dell and HP Enterprise and Hitachi and IBM and NetApp and Nutanix and Pure Storage and Vast and Weka. Basically, the entire world storage industry will be offering this stack. For the very first time, your storage system will be GPU accelerated.

Speaker02:17:17 - 02:17:44

Somebody thought I didn't have enough slides. Michael thought I didn't have enough slides. He said, Jensen, just in case you don't have enough slides, can I just put this in there? This is Michael's slides. He sent this to me. He goes, just in case you don't have any slides. I got too many slides. This is such a great slide. Let me tell you why. In one single slide, he's explaining that Dell is going to be offering a whole line of NVIDIA products

Speaker02:17:45 - 02:18:13

Enterprise IT AI infrastructure systems and all the software that runs on top of it. So you can see that we're in the process of revolutionizing the world's enterprise. We're also announcing today this incredible model that everybody can run. And so I showed you earlier R1, a reasoning model. I showed you it versus Llama 3, a non-reasoning model. And obviously R1 is much smarter.

Speaker02:18:13 - 02:18:34

But we can do it even better than that. And we can make it possible to be enterprise-ready for any company. And it's now completely open source. It's part of our system we call NIMs. And you can download it. You can run it anywhere. You can run it on DGX Spark. You can run it on DGX Station. You can run it on any of the servers that the OEMs make.

Speaker02:18:34 - 02:18:49

You can run it in the cloud. You can integrate it into any of your agentic AI frameworks. And we're working with companies all over the world. And I'm going to flip through these, so watch very carefully. I've got some great partners in the audience I want to recognize.

Speaker02:18:49 - 02:19:08

Accenture, Julie Sweet and her team are building their AI factory and their AI framework. Amdocs, the world's largest telecommunications software company. AT&T, John Stankey and his team are building an AT&T AI system, an agentic system. Larry Fink and the BlackRock team are building theirs. Anirudh.

Speaker02:19:08 - 02:19:32

In the future, not only will we hire ASIC designers, we're going to hire a whole bunch of digital ASIC designers from Anirudh at Cadence that will help us design our chips. And so Cadence is building their AI framework, and as you can see, in every single one of them, there are NVIDIA models, NVIDIA NIMs, NVIDIA libraries integrated throughout, so that you can run it on-prem, in the cloud, any cloud. Capital One,

Speaker02:19:32 - 02:19:56

One of the most advanced financial services companies in using technology, has NVIDIA all over it. Deloitte, Jason and his team. EY, Janet and her team. Nasdaq, Adena and her team, integrating NVIDIA technology into their AI frameworks. And then Christian and his team at SAP, Bill McDermott and his team at ServiceNow. That was pretty good, huh?

Speaker02:20:03 - 02:20:24

This is one of those keynotes where the first slide took 30 minutes, and then all the other slides took 30 minutes. So next, let's go somewhere else. Let's go talk about robotics, shall we? Let's talk about robots. Well, the time has come for robots.

Speaker02:20:25 - 02:20:46

Robots have the benefit of being able to interact with the physical world and do things that otherwise digital information cannot. We know very clearly that the world has severe shortage of human laborers, human workers. By the end of this decade, the world is going to be at least 50 million workers short.

Speaker02:20:46 - 02:21:12

We'd be more than delighted to pay them each $50,000 to come to work. We're probably going to have to pay robots $50,000 a year to come to work. And so this is going to be a very, very large industry. There are all kinds of robotic systems. Your infrastructure will be robotic. Billions of cameras in warehouses and factories, 10, 20 million factories around the world. Every car is already a robot, as I mentioned earlier, and then now we're building general robots. Let me show you how we're doing that.

Speaker02:21:17 - 02:21:38

Everything that moves will be autonomous. Physical AI will embody robots of every kind, in every industry. Three computers built by NVIDIA enable a continuous loop of robot AI simulation, training, testing, and real-world experience.

Speaker02:21:39 - 02:22:03

Training robots requires huge volumes of data. Internet-scale data provides common sense and reasoning, but robots need action and control data, which is expensive to capture. With blueprints built on NVIDIA Omniverse and Cosmos, developers can generate massive amounts of diverse, synthetic data for training robot policies.

Speaker02:22:04 - 02:22:30

First, in Omniverse, developers aggregate real-world sensor, or demonstration data, according to their different domains, robots, and tasks. Then use Omniverse to condition Cosmos, multiplying the original captures into large volumes of photoreal, diverse data. Developers use Isaac Lab to post-train the robot policies with the augmented dataset.

Speaker02:22:31 - 02:22:56

And let the robots learn new skills by cloning behaviors through imitation learning or through trial and error with reinforcement learning AI feedback. Practicing in a lab is different than the real world. New policies need to be field tested. Developers use Omniverse for software and hardware-in-the-loop testing.
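The pipeline narrated above, multiplying a few real demonstrations into many synthetic variants and then post-training a policy on the augmented set, can be sketched as a toy. Everything here is a stand-in: `augment` plays the Omniverse/Cosmos role of domain randomization, and `behavior_clone` plays the Isaac Lab role of imitation learning, on a deliberately trivial one-dimensional policy. None of these are the actual NVIDIA APIs.

```python
# Toy sketch of the synthetic-data + imitation-learning pipeline described:
# jitter a real demonstration into many variants, then fit a policy to them.
import random

random.seed(7)

def augment(demo, n_variants):
    """Domain randomization stand-in: jitter each observation slightly."""
    return [
        [(obs + random.gauss(0, 0.05), act) for obs, act in demo]
        for _ in range(n_variants)
    ]

def behavior_clone(dataset, lr=0.1, epochs=200):
    """Fit action = w * obs by per-sample gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for trajectory in dataset:
            for obs, act in trajectory:
                w -= lr * (w * obs - act) * obs
        lr *= 0.99
    return w

# One "real" demonstration: the expert applies action = 2 * observation.
demo = [(o / 10, 2 * o / 10) for o in range(1, 11)]
dataset = augment(demo, n_variants=50)
policy_w = behavior_clone(dataset)
print(f"learned policy gain: {policy_w:.2f}")
```

Despite the added observation noise, the cloned policy recovers a gain close to the expert's 2.0, which is the point of the pipeline: a small amount of real demonstration data, multiplied synthetically, is enough to post-train a usable policy.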

Speaker02:22:57 - 02:23:15

Simulating the policies in a digital twin with real world environmental dynamics. With domain randomization, physics feedback, and high fidelity sensor simulation. Real world operations require multiple robots to work together.

Speaker02:23:16 - 02:23:29

MEGA, an Omniverse Blueprint, lets developers test fleets of post-trained policies at scale. Here, Foxconn tests heterogeneous robots in a virtual NVIDIA Blackwell production facility.

Speaker02:23:29 - 02:23:57

As the robot brains execute their missions, they perceive the results of their actions through sensor simulation, then plan their next action. MEGA lets developers test many robot policies, enabling the robots to work as a system, whether for spatial reasoning, navigation, mobility, or dexterity. Amazing things are born in simulation.

Speaker02:23:57 - 02:24:25

Today, we're introducing NVIDIA Isaac Groot N1. Groot N1 is a generalist foundation model for humanoid robots. It's built on the foundations of synthetic data generation and learning and simulation. Groot N1 features a dual system architecture for thinking fast and slow, inspired by principles of human cognitive processing.

Speaker02:24:26 - 02:24:42

The slow thinking system lets the robot perceive and reason about its environment and instructions, and plan the right actions to take. The fast thinking system translates the plan into precise and continuous robot actions.

Speaker02:24:43 - 02:25:08

Groot N1's generalization lets robots manipulate common objects with ease and execute multi-step sequences collaboratively. And with this entire pipeline of synthetic data generation and robot learning, humanoid robot developers can post-train Groot N1 across multiple embodiments and tasks in many environments.
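The dual-system, thinking-fast-and-slow architecture described above can be sketched in miniature. The class and method names below are illustrative, not Groot N1's actual interface: a slow module produces a short plan from an instruction, and a fast module expands each plan step into a stream of low-level commands.

```python
# Illustrative sketch of a dual-system robot architecture: a slow, deliberate
# planner and a fast action generator. Names and behavior are hypothetical.
class SlowSystem:
    """Perceive and reason about the instruction; runs at low frequency."""
    def plan(self, instruction, observation):
        # A real system would run a vision-language model here.
        return ["approach object", "grasp object", "place object"]

class FastSystem:
    """Translate each plan step into continuous actions; runs at high frequency."""
    def act(self, step, ticks=3):
        return [f"{step}: motor command {t}" for t in range(ticks)]

def run(instruction, observation):
    plan = SlowSystem().plan(instruction, observation)
    actions = []
    for step in plan:                     # slow loop: one iteration per plan step
        actions.extend(FastSystem().act(step))  # fast loop: many actions per step
    return actions

commands = run("put the cup on the shelf", observation="camera frame")
print(len(commands), commands[0])
```

The structural point is the two nested loops: the slow system runs once per plan step, while the fast system emits many precise, continuous actions per step.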

Speaker02:25:11 - 02:25:40

Around the world, in every industry, developers are using NVIDIA's three computers to build the next generation of embodied AI. Physical AI and robotics are moving so fast. Everybody pay attention to this space. This could very well likely be the largest industry of all.

Speaker02:25:41 - 02:26:09

At its core, we have the same challenges. As I mentioned before, there are three that we focus on. They are rather systematic. One, how do you solve the data problem? How, and where, do you create the data necessary to train the AI? Two, what's the model architecture? And then three, what are the scaling laws? How can we scale?

Speaker02:26:10 - 02:26:37

Either the data, the compute, or both. So that we can make AIs smarter and smarter and smarter. How do we scale? And those two, those fundamental problems exist in robotics as well. In robotics, we created a system called Omniverse. It's our operating system for physical AIs. You've heard me talk about Omniverse for a long time. We added two technologies to it. Today I'm going to show you two things. One of them

Speaker02:26:38 - 02:27:06

So that we could scale AI with generative capabilities and generative models that understand the physical world. We call it Cosmos. Using Omniverse to condition Cosmos and using Cosmos to generate an infinite number of environments allows us to create data that is grounded, controlled by us,

Speaker02:27:07 - 02:27:36

And yet be systematically infinite at the same time. So you see Omniverse, we used candy colors to give you an example of us controlling the robot in the scenario perfectly, and yet Cosmos can create all these virtual environments. The second thing, just as we were talking about earlier, one of the incredible scaling capabilities of language models today is reinforcement learning,

Speaker02:27:36 - 02:27:53

Verifiable rewards. The question is, what's the verifiable rewards in robotics? And as we know very well, it's the laws of physics. Verifiable physics rewards. And so we need an incredible physics engine.
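The idea of verifiable physics rewards can be shown with a toy: in simulation, the reward is objectively checkable from the physics state itself, with no human labeling. The one-dimensional dynamics below are a deliberately crude stand-in for a real physics engine.

```python
# Toy illustration of "verifiable physics rewards": a 1-D hopper is rewarded
# for keeping its simulated height above a threshold. The dynamics are a
# crude stand-in for a real engine; the reward is checkable from sim state.
def step_physics(height, velocity, thrust, dt=0.05, g=9.8):
    velocity += (thrust - g) * dt
    height = max(0.0, height + velocity * dt)
    if height == 0.0:
        velocity = 0.0       # crude ground contact
    return height, velocity

def verifiable_reward(height, target=1.0):
    # Objectively verifiable from the simulator's state, like a law of physics.
    return 1.0 if height >= target else 0.0

def rollout(policy_thrust, steps=100):
    h, v, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        h, v = step_physics(h, v, policy_thrust)
        total += verifiable_reward(h)
    return total

# A policy that exactly cancels gravity holds height; a zero-thrust policy falls.
print(rollout(policy_thrust=9.8), rollout(policy_thrust=0.0))
```

An RL algorithm would search over policies to maximize this reward; the key property is that the physics engine itself grades every attempt, which is what makes the signal scalable.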

Speaker02:27:53 - 02:28:19

Well, most physics engines have been designed for a variety of reasons. They can be designed for large machinery, or maybe for virtual worlds, video games and such. But we need a physics engine that is designed for very fine-grained rigid and soft bodies, designed for being able to train tactile feedback,

Speaker02:28:20 - 02:28:47

Fine motor skills, and actuator controls. We need it to be GPU-accelerated so that these virtual worlds can live in super-linear time, super real time, and train these AI models incredibly fast. And we need it to be integrated harmoniously into a framework that roboticists all over the world use: MuJoCo. And so today we're announcing

Speaker02:28:47 - 02:29:17

Something really, really special. It is a partnership of three companies, DeepMind, Disney Research, and NVIDIA, and we call it Newton. Let's take a look at Newton.

Speaker02:29:25 - 02:29:50

Thank you. All right, let's start that over, shall we? Let's not ruin it for them. Hang on a second. Somebody talk to me. I need feedback. What happened? I just need a human to talk to.

Speaker02:29:54 - 02:30:13

Thank you. Thank you.

Speaker02:30:29 - 02:30:55

Tell me that wasn't amazing!

Speaker02:30:57 - 02:31:24

Hey, Blue. How are you doing? How do you like your new physics engine? You like it, huh? Yeah, I bet. I know. Tactile feedback, rigid body, soft body simulation, super real-time. Can you imagine that what you were just looking at was a complete real-time simulation? This is how we're going to train robots in the future.
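What "super real-time" means can be shown with a minimal sketch: advance a long stretch of simulated time and compare it against the wall-clock time it took. This is a pure-Python toy with a hypothetical bouncing-ball environment, not the Newton engine; a GPU-accelerated engine would additionally step thousands of such environments in parallel.

```python
# Toy illustration of "super real-time" simulation: advance one simulated
# minute of a bouncing ball and measure how little wall-clock time it takes.
import time

def simulate_bouncing_ball(sim_seconds: float, dt: float = 0.002) -> tuple[float, float]:
    """Fixed-timestep simulation of a ball dropped from 1 m with damped bounces.
    Returns the final (height, velocity) after sim_seconds of simulated time."""
    g, restitution = 9.81, 0.8
    h, v, t = 1.0, 0.0, 0.0
    while t < sim_seconds:
        v -= g * dt
        h += v * dt
        if h < 0.0:           # ground contact: reflect and damp velocity
            h, v = 0.0, -v * restitution
        t += dt
    return h, v

start = time.perf_counter()
simulate_bouncing_ball(60.0)          # one minute of simulated time
wall = time.perf_counter() - start
print(f"simulated 60 s of physics in {wall:.3f} s of wall time")
```

Even this unoptimized loop simulates a minute of physics in a fraction of a second; the ratio of simulated time to wall time is the speedup that lets a robot accumulate years of training experience in hours.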

Speaker02:31:25 - 02:31:48

Just so you know, Blue has two computers, two NVIDIA computers inside. Look how smart you are. Yes, you're smart. Okay. All right. Hey, Blue, listen. How about let's take him home? Let's finish this keynote. It's lunchtime.

Speaker02:31:49 - 02:32:17

Are you ready? Let's finish it up. We have another announcement. You're good. You're good. Just stand right here. Stand right here. Stand right here. Alright, good. Right there. That's good. Alright, stand. Okay.

Speaker02:32:18 - 02:32:45

We have more amazing news. I told you our robotics work has been making enormous progress. And today we're announcing that Groot N1 is open sourced. I want to thank all of you to come to...

Speaker02:32:47 - 02:32:54

Let's wrap up. I want to thank all of you for coming to GTC. We talked about several things. One, Blackwell is in full production.

Speaker02:32:54 - 02:33:19

And the ramp is incredible. Customer demand is incredible, and for good reason. Because there's an inflection point in AI, the amount of computation we have to do in AI is so much greater as a result of reasoning AI and the training of reasoning AI systems and agentic systems. Second, Blackwell NVLink 72 with Dynamo is 40 times

Speaker02:33:19 - 02:33:49

The performance, the AI factory performance, of Hopper. And inference is going to be one of the most important workloads in the next decade as we scale out AI. Third, we have an annual rhythm of roadmaps that has been laid out for you so that you can plan your AI infrastructure. And then we have three AI infrastructures we're building: AI infrastructure for the cloud, AI infrastructure for enterprise, and AI infrastructure for robots.

Speaker02:36:51 - 02:37:17

Thank you everybody. Thank you for all the partners that made this video possible. Thank you everybody that made this video possible. Have a great GTC. Thank you. Hey, Blue. Let's go home. Good job. Good little man. Thank you. I love you too. Thank you.


GTC March 2025 Keynote with NVIDIA CEO Jensen Huang
