Nvidia's Tesla A100 has a whopping 6,912 CUDA cores - Specs Detailed

Nvidia has revealed its Tesla A100 graphics accelerator, and it is a monster. Thanks to CRN, we have detailed specifications for Nvidia's Tesla A100 silicon, complete with CUDA core counts, die size and more. 

Right now, we know that Nvidia's Tesla A100 features 6,912 CUDA cores, which can perform FP64 calculations at half the FP32 rate.
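
As a rough sanity check on these figures, peak throughput can be tied to core count and clock speed. The sketch below assumes each CUDA core retires one fused multiply-add (two floating-point operations) per clock, which is a common convention rather than something Nvidia states here, and it uses the 19.5 FP32 TFLOPS figure from the specification table. The function names are hypothetical.

```python
# Figures quoted in the article for the Tesla A100.
CUDA_CORES = 6912
FP32_TFLOPS = 19.5

# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_mhz(tflops, cores, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Clock speed (MHz) needed to reach `tflops` with `cores` CUDA cores."""
    return tflops * 1e12 / (cores * flops_per_clock) / 1e6


def half_rate_fp64_tflops(fp32_tflops):
    """FP64 throughput when FP64 runs at half the FP32 rate."""
    return fp32_tflops / 2


if __name__ == "__main__":
    print(f"Implied clock: {implied_clock_mhz(FP32_TFLOPS, CUDA_CORES):.0f} MHz")
    print(f"Half-rate FP64: {half_rate_fp64_tflops(FP32_TFLOPS):.2f} TFLOPS")
```

Under that assumption, the quoted FP32 figure implies a clock of roughly 1.41GHz, and the half-rate FP64 result lines up with the 9.7 TFLOPS listed in the table.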

With the move to 7nm, Nvidia has delivered a greater than 2x increase in transistor count over the company's Tesla V100 core design, a feat which allows Nvidia to deliver some incredible performance increases for its Tesla A100. Nvidia has also paired this core design with 40GB of HBM memory by using five 8GB HBM modules. The image below shows that Nvidia's Tesla A100 package has room for six of these modules, which would deliver 48GB of VRAM. Shipping with only five enabled was likely a design decision which will help Nvidia increase the production yields of such a large graphics card.
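
The memory figures can be cross-checked with some simple arithmetic. The sketch below assumes each HBM stack exposes a 1024-bit interface (typical for HBM-class memory, but not stated in the article); the 8GB per-stack capacity and the 1,555 GB/s bandwidth come from the article and its specification table. The function names are hypothetical.

```python
STACK_CAPACITY_GB = 8   # per-stack capacity stated in the article
STACK_BUS_BITS = 1024   # assumption: interface width of one HBM stack


def memory_config(active_stacks):
    """Return (capacity in GB, bus width in bits) for a number of active stacks."""
    return active_stacks * STACK_CAPACITY_GB, active_stacks * STACK_BUS_BITS


def per_pin_gbps(bandwidth_gb_s, bus_bits):
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_bits


if __name__ == "__main__":
    print(memory_config(5))  # shipping configuration: five stacks
    print(memory_config(6))  # fully populated package: six stacks
    print(f"{per_pin_gbps(1555, 5120):.2f} Gbit/s per pin")
```

Five active stacks give 40GB on a 5120-bit bus, which matches the table, while six would give 48GB on a 6144-bit bus.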

Nvidia's Tesla A100 chips use Nvidia's SXM4 form factor and support PCIe 4.0 and third-generation NVLink. With third-generation NVLink, Nvidia can support up to 4.8 TB per second in bidirectional bandwidth and 600 GB per second in GPU-to-GPU bandwidth. This means that Nvidia can connect up to eight Tesla A100 graphics cards together with 600 GB per second of bandwidth. Nvidia has also combined its offerings with 200Gbps Mellanox interconnects to increase platform scalability further.
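
One way to reconcile the two NVLink figures is to read the 4.8 TB per second number as an aggregate across eight GPUs. The article does not say this explicitly, so treat it as an assumption. The sketch below works through that arithmetic and also converts the 200Gbps Mellanox link speed into bytes per second for comparison. The function names are hypothetical.

```python
GPU_TO_GPU_GB_S = 600   # per-GPU NVLink bandwidth quoted in the article
GPUS_PER_SYSTEM = 8     # maximum GPU count quoted in the article
MELLANOX_GBIT_S = 200   # interconnect speed quoted in the article


def aggregate_tb_s(per_gpu_gb_s, gpu_count):
    """Total bandwidth in TB/s if every GPU contributes `per_gpu_gb_s`."""
    return per_gpu_gb_s * gpu_count / 1000


def gbit_to_gbyte(gbit_s):
    """Convert a link speed from Gbit/s to GB/s."""
    return gbit_s / 8


if __name__ == "__main__":
    print(f"{aggregate_tb_s(GPU_TO_GPU_GB_S, GPUS_PER_SYSTEM)} TB/s aggregate")
    print(f"{gbit_to_gbyte(MELLANOX_GBIT_S)} GB/s per Mellanox link")
```

Eight GPUs at 600 GB per second each sum to 4.8 TB per second, while a single 200Gbps link works out to 25 GB per second.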

With the company's new third-generation Tensor cores, Nvidia's Tesla A100 is said to deliver a 20x increase in eight-bit integer math (INT8) performance when compared to the company's older Tesla V100. Nvidia has also reported a 2.5x increase in double-precision floating-point (FP64) performance. While there are fewer Tensor cores on the Tesla A100, these redesigned Tensor cores are much more powerful than before, more than making up for the decrease in Tensor core count.

Nvidia is expected to reveal more information about its Ampere architecture later today. 

Specification         Tesla A100    Tesla V100      Tesla P100
GPU Architecture      Ampere        Volta           Pascal
Process Node          7nm           12nm            16nm
Die Size (mm^2)       826           815             610
FP64 TFLOPS           9.7           7.8             5.3
FP32 TFLOPS           19.5          15.7            10.6
FP16 TFLOPS           39.0          31.4            21.2
Transistor Count      54B           21.1B           15.3B
CUDA Core Count       6,912         5,120           3,584
Tensor Cores          432           640             N/A
VRAM Capacity         40GB          32GB or 16GB    16GB
Memory Bus Size       5120-bit      4096-bit        4096-bit
Memory Bandwidth      1,555 GB/s    900 GB/s        720 GB/s
Boost Clock Speed     ?             1455MHz         1480MHz

14-05-2020, 08:53:18

I watched an interesting stream last night about Ampere. It raised some decent theories.

As we know, we were supposed to get Ampere ages ago. Then all of a sudden it's Turing and it's really expensive. They suggested Ampere might not have been ready, or even working properly, hence the "Bolt the RTX stuff onto Pascal" which would have been expensive.

Which suggests that yes, Ampere should be a fair chunk cheaper. Now whether or not we will see that as buyers? Who knows.

14-05-2020, 10:22:23

7nm hmmm... Very curious where that's being produced

14-05-2020, 10:23:32

Originally Posted by Giggyolly
7nm hmmm... Very curious where that's being produced
Samsung, it's been known for a few years now, Nvidia Korea fully confirmed it last year. EDIT: That may be the GA102 and below only actually; whichever one taped out in November 2019 is Samsung, I know that much anyway. This might just be too big for them, in which case ofc it kinda has to be TSMC.

14-05-2020, 10:55:00

Originally Posted by Giggyolly
7nm hmmm... Very curious where that's being produced
As far as I can tell, Nvidia hasn't officially stated its foundry partner.

Edit: Never mind, Jensen has confirmed that they are using TSMC for this chip.

