About CUDA


Tom explains the need and use case for CUDA software and hardware and why that may matter to you.

Featuring Tom Merritt.



A special thanks to all our supporters–without you, none of this would be possible.

Thanks to Kevin MacLeod of Incompetech.com for the theme music.

Thanks to Garrett Weinzierl for the logo!

Thanks to our mods, Kylde, Jack_Shid, KAPT_Kipper, and scottierowland on the subreddit.

Send us email to [email protected]

Episode transcript:

I was shopping for a graphics card and I noticed one had a lot of CUDA cores

But another one didn’t have any CUDA cores at all

What’s a CUDA core? Can I live without it?

Confused? Don’t be

Let’s help you know a little more about CUDA.

CUDA originally stood for Compute Unified Device Architecture. These days it’s just referred to as CUDA. Kind of like CBS in the US used to stand for Columbia Broadcasting System but now is just CBS. Or like KFC tried to do for a while.
CUDA lets software use a GPU, the graphics processor in your computer, like a CPU, the central processor. The approach is sometimes called general-purpose computing on GPUs or GPGPU for short. But mostly in your everyday tech news consumption, you’re going to hear people talk about CUDA and CUDA cores. “This new GPU has 12,000 CUDA cores!” Or something like that.
To oversimplify, it’s a form of parallel processing. It lets the computer do a lot of things at once instead of doing them one at a time, which makes everything faster and also lets you do more things. CUDA is Nvidia’s parallel processing platform. And a CUDA core is a processing unit, the hardware inside the GPU for taking advantage of it. AMD has its own equivalent, called stream processors.
CUDA is actually a software layer that gives access to the GPU’s virtual instruction set. Basically it lets software meant to run on a CPU get some of the same results from the GPU. You could do this with APIs like Direct3D and OpenGL, but you need to be good at graphics programming. CUDA works with languages like C, C++ and Fortran, so if you do parallel programming already, you should be able to take advantage of CUDA.
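To make that a bit more concrete, here’s a rough sketch in plain Python. It is not real CUDA code. It only imitates the way a CUDA-style program is usually written: one small function that handles a single piece of data, launched once per thread. The names `saxpy_kernel` and `launch`, and the block size of 4, are made up for illustration. On an actual GPU the threads would run at the same time instead of in a loop.

```python
# Plain-Python imitation of a CUDA-style kernel launch (illustrative only).
# On a real GPU, each (block, thread) pair would run at the same time on
# its own core; here we simply loop over them one by one.

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Compute one element of out = a * x + y, as one GPU thread would."""
    i = block_idx * block_dim + thread_idx  # global index for this "thread"
    if i < len(x):                          # guard: surplus threads do nothing
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical helper: call the kernel once per (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)

block_dim = 4                                      # threads per block (assumed)
grid_dim = (len(x) + block_dim - 1) // block_dim   # enough blocks to cover x

launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
print(out)
```

The point of the pattern is that no element depends on any other, so the more cores you have, the more of those little calculations can happen at once.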
Nvidia created CUDA and initially released it on June 23, 2007. It’s Nvidia’s proprietary technology.
CUDA works on most standard operating systems and on all Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. Tegra, Jetson, Drive and Clara GPUs tend to get specialized versions of CUDA every few point upgrades or so. Nvidia also supports the OpenACC standard for parallel programming alongside CUDA.
GPUs were developed to handle the more intense task of processing graphics, without burdening the CPU. A GPU can have a lot more cores than you could have in a CPU. A CPU core has to fetch from memory, interpret instructions and calculate, because it has to do everything. A GPU core doesn’t need to do everything. So it just calculates. Over time this developed into a parallel system that became very efficient at manipulating any large block of data whether it was graphics or not.
Besides graphics, modern algorithms used for all kinds of things need to process large blocks of data in parallel, and can often do that better on a GPU. That includes things like machine learning, physics engines, cryptographic hash functions and more. So in gaming the physics engines that model the real world use it, outside of the actual graphics, and in cryptography it’s used to hash things so they’re hard to crack.
CUDA accelerates these functions. Now it’s worth noting that Nvidia has a couple other kinds of cores too. The surprisingly named ray-tracing cores specialize in ray-tracing graphics. You can do ray tracing in CUDA cores, but not as well as you can in the specialized RT cores. We won’t go into it here, but ray tracing is essentially handling how light moves to improve the look of graphics. (I know that’s not REALLY right but it gives people who don’t know, the gist of it.) And then there are Tensor cores. I won’t even try to summarize what they do but they end up helping train neural networks for deep learning algorithms. Do you know the importance of matrix multiplication in processing? Great! Then I don’t need to explain what a Tensor core is to you. If you said no, like me, then knowing that Tensor cores have to do with deep learning and neural network training is probably good enough for now.
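If you’re curious what matrix multiplication looks like, here’s a tiny sketch in plain Python. It multiplies two small grids of numbers and adds a third, which is the multiply-and-add pattern that neural network math leans on. The function name `matmul_add` and the sample numbers are made up for illustration, and this says nothing about how the hardware actually does it.

```python
def matmul_add(a, b, c):
    """Return a x b + c for small square matrices given as lists of lists."""
    n = len(a)
    # Each output cell needs only one row of a and one column of b,
    # so every cell could in principle be computed independently.
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
d = matmul_add(a, b, c)
print(d)
```

Because no output cell waits on any other, this kind of work splits up nicely across lots of cores.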
Back to CUDA!
While CUDA is a software layer, a CUDA core is the hardware part of the GPU that the software can use.
Think of it like this. You have a room full of folks who have air pumps that can inflate footballs. But you also want them to inflate bike tires. The CUDA software is an adapter for their pumps that lets them inflate other things besides footballs. And you have a lot of inflatables: basketballs, bouncy castles, floaty ducks. So you have the adapters, the CUDA platform, and a bunch of rooms full of people with pumps, the CUDA cores. That means you can send a bouncy castle into one room, which may take everyone in the room all morning to inflate, but it won’t stop you from inflating footballs and floaty ducks, because you have other rooms you can send those into.
No. I don’t know where these metaphors come from but surprisingly I think that one works pretty well. If it didn’t, everyone else usually describes CUDA cores as extra pipes that help drain water faster. Whatever works for you.
In the first Fermi GPU from Nvidia in April 2010, you had 16 streaming multiprocessors that each supported 32 cores, for a total of 512 CUDA cores. Remember, like I said earlier, a GPU core does not fetch from memory or decode instructions, it just carries out calculations. This is one of the reasons you can have so many more of them on a card, compared to CPU cores. Anyway, being able to just do calculations is cool for 8-bit graphics where you just need to know what pixels go where. CUDA the software layer interprets the instructions and can coordinate all the cores and get the cores to calculate the right things for more advanced graphics uses and non-graphical uses too.
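The Fermi arithmetic is easy to check, and it gives a feel for why core counts matter. Here’s a small Python sketch. The 16 and 32 come from the numbers above. The workload (one 1920 by 1080 frame) and the idea that each core handles exactly one pixel per round are simplifying assumptions, and the function names are made up for illustration.

```python
import math

def total_cores(multiprocessors, cores_per_multiprocessor):
    """Total cores = number of multiprocessors x cores in each one."""
    return multiprocessors * cores_per_multiprocessor

def passes_needed(items, cores):
    """Idealized number of rounds if every core handles one item per round."""
    return math.ceil(items / cores)

fermi_cores = total_cores(16, 32)   # figures from the transcript
pixels = 1920 * 1080                # assumed workload: one 1080p frame

print(fermi_cores)
print(passes_needed(pixels, fermi_cores))
```

Real GPUs are messier than this, but the basic trade holds: more cores means fewer rounds to get through the same pile of data.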
Are more cores better?
Yes. But not for every version of that question. The performance of a graphics card is not going to rest on the number of cores alone. If you have slow clock speeds or inefficient architecture the number of cores won’t matter much. However if the architecture is the same and the clock speeds are close, the number of CUDA cores can tell you something useful. This comes into play when comparing all the cards in a single generation. Like the Nvidia 4000-series cards or something. But it won’t work across generations. Tech Centurion points out that the Nvidia RTX 2060 had fewer CUDA cores than the GTX 780. But nobody is out there arguing the 780 was a better card than the 2060.
Part of that is because CUDA cores can be built differently too. For instance the Fermi CUDA cores had a floating point unit and an integer unit. The CUDA cores in the Ampere architecture had two 32-bit floating point processing units. Each core could handle two 32-bit floating point operations, or one 32-bit floating point and one integer operation, every cycle. In other words the CUDA core could do a lot more in the Ampere architecture than it could in the Fermi architecture.
So remember that the number of CUDA cores just tells you that more data can be processed in parallel overall than another card with fewer of the same kind of CUDA cores. The clock speed tells you whether a single core can perform faster or not. And the architecture tells you how much each core can do per cycle. And then the software layer can affect things, as can the size of the transistors.
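Here’s a back-of-the-envelope version of that in Python. It treats throughput as cores times clock speed times work per core per cycle, which is a deliberate oversimplification and not a benchmark. The function name is made up. The 2304 and 1920 core counts match the GTX 780 and RTX 2060, but the clock speeds are rounded placeholders rather than official specs, and the other two cards are entirely hypothetical.

```python
def rough_throughput(cores, clock_mhz, ops_per_core_per_cycle):
    """Very rough estimate, in millions of operations per second."""
    return cores * clock_mhz * ops_per_core_per_cycle

# Same hypothetical architecture and clock: core count decides it.
card_a = rough_throughput(cores=3000, clock_mhz=1500, ops_per_core_per_cycle=1)
card_b = rough_throughput(cores=4000, clock_mhz=1500, ops_per_core_per_cycle=1)
print(card_b > card_a)

# Different generations: fewer cores can still come out ahead
# once the clock speed (a placeholder value here) is much higher.
old_card = rough_throughput(cores=2304, clock_mhz=900, ops_per_core_per_cycle=1)
new_card = rough_throughput(cores=1920, clock_mhz=1700, ops_per_core_per_cycle=1)
print(new_card > old_card)
```

Even this leaves out how much each core can do per cycle in a newer architecture, which would tilt things further. That’s the reason core counts alone only help inside one generation.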
And you really can’t compare AMD’s stream processors to Nvidia’s CUDA cores. They work differently and use different software platforms. They’re similar the way, say, an apple and an orange are both fruits and deliver fructose. They do the same things: hydrate you, give you vitamins, and deliver about the same number of calories. But they have very different ways of going about it.
This is why people do benchmarks. Just let me know what it actually does in practice. Thanks.
To sum up, CUDA cores help your Nvidia GPU do more but the number of them is only helpful as a comparison of cards within the same Nvidia generation of GPUs.
In other words, I hope you know a little more about CUDA.