`Nvidia GeForce RTX 4070 Review: Mainstream Ada Arrives`

Nvidia positions its new GeForce RTX 4070 as a great upgrade for GTX 1070 and RTX 2070 users, but that doesn’t hide the fact that in many cases, it’s effectively tied with the last generation’s RTX 3080. The $599 MSRP means it’s also replacing the RTX 3070 Ti, with 50% more VRAM and dramatically improved efficiency. Is the RTX 4070 one of the best graphics cards? It’s certainly an easier recommendation than cards that cost $1,000 or more, but you’ll inevitably trade performance for those saved pennies.

At its core, the RTX 4070 borrows heavily from the RTX 4070 Ti. Both use the AD104 GPU, and both feature a 192-bit memory interface with 12GB of GDDR6X 12Gbps VRAM. The main difference, other than the $200 price cut, is that the RTX 4070 has 5,888 CUDA cores compared to 7,680 on the 4070 Ti. Clock speeds are also theoretically a bit lower, though we’ll get into that more in our testing. Ultimately, we’re looking at a 25% price cut to go with the 23% reduction in processor cores.

We’ve covered Nvidia’s Ada Lovelace architecture already, so start there if you want to know more about what makes the RTX 40-series GPUs tick. The main question here is how the RTX 4070 stacks up against its costlier siblings, not to mention the previous generation RTX 30-series. Here are the official specifications for the reference card.

Swipe to scroll horizontally

Nvidia RTX 4070 Compared to Other Ada / Ampere GPUs
Graphics Card	RTX 4070	RTX 4080	RTX 4070 Ti	RTX 3080 Ti	RTX 3080	RTX 3070 Ti	RTX 3070
Architecture	AD104	AD103	AD104	GA102	GA102	GA104	GA104
Process Technology	TSMC 4N	TSMC 4N	TSMC 4N	Samsung 8N	Samsung 8N	Samsung 8N	Samsung 8N
Transistors (Billion)	32	45.9	35.8	28.3	28.3	17.4	17.4
Die size (mm^2)	294.5	378.6	294.5	628.4	628.4	392.5	392.5
SMs	46	76	60	80	68	48	46
GPU Cores (Shaders)	5888	9728	7680	10240	8704	6144	5888
Tensor Cores	184	304	240	320	272	192	184
Ray Tracing “Cores”	46	76	60	80	68	48	46
Boost Clock (MHz)	2475	2505	2610	1665	1710	1765	1725
VRAM Speed (Gbps)	21	22.4	21	19	19	19	14
VRAM (GB)	12	16	12	12	10	8	8
VRAM Bus Width	192	256	192	384	320	256	256
L2 Cache (MiB)	36	64	48	6	5	4	4
ROPs	64	112	80	112	96	96	96
TMUs	184	304	240	320	272	192	184
TFLOPS FP32 (Boost)	29.1	48.7	40.1	34.1	29.8	21.7	20.3
TFLOPS FP16 (FP8)	233 (466)	390 (780)	321 (641)	136 (273)	119 (238)	87 (174)	81 (163)
Bandwidth (GBps)	504	717	504	912	760	608	448
TGP (watts)	200	320	285	350	320	290	220
Launch Date	Apr 2023	Nov 2022	Jan 2023	Jun 2021	Sep 2020	Jun 2021	Oct 2020
Launch Price	$599	$1,199	$799	$1,199	$699	$599	$499

There’s a pretty steep slope going from the RTX 4080 to the 4070 Ti, and from there to the RTX 4070. We’re now looking at the same number of GPU shaders — 5888 — as Nvidia used on the previous generation RTX 3070. Of course, there are plenty of other changes that have taken place.

Chief among those is the massive increase in GPU core clocks. 5888 shaders running at 2.5GHz will deliver a lot more performance than the same number of shaders clocked at 1.7GHz — almost 50% more performance, by the math. Nvidia also likes to be conservative, and real-world gaming clocks are closer to 2.7GHz… though the RTX 3070 also clocked closer to 1.9GHz in our testing.

The memory bandwidth ends up being slightly higher than the 3070 as well, but the significantly larger L2 cache will inevitably mean it performs much better than the raw bandwidth might suggest. Moving to a 192-bit interface instead of the 256-bit interface on the GA104 does present some interesting compromises, but we’re glad to at least have 12GB of VRAM this round — the 3060 Ti, 3070, and 3070 Ti with 8GB are all feeling a bit limited these days. But short of using memory chips in “clamshell” mode (two chips per channel, on both sides of the circuit board), 12GB represents the maximum for a 192-bit interface right now.

While AMD was throwing shade yesterday about the lack of VRAM on the RTX 4070, it’s important to note that AMD has yet to reveal its own “mainstream” 7000-series parts, and it will face similar potential compromises. A 256-bit interface allows for 16GB of VRAM, but it also increases board and component costs. Perhaps we’ll get a 16GB RX 7800 XT, but the RX 7700 XT will likely end up at 12GB VRAM as well. As for the previous generation AMD GPUs having more VRAM, that’s certainly true, but capacity is only part of the equation, so we need to see how the RTX 4070 stacks up before declaring a victor.

Another noteworthy item is the 200W TGP (Total Graphics Power), and Nvidia was keen to emphasize that in many cases, the RTX 4070 will use less power than TGP, where competing cards (and previous generation offerings) usually hit or exceeded TGP. We can confirm that’s true here, and we’ll dig into the particulars more later on.

The good news is that we finally have a latest generation graphics card starting at $599. There will naturally be third-party overclocked cards that jack up the price, with extras like RGB lighting and beefier cooling, but Nvidia has restricted this pre-launch review to cards that sell at MSRP. We’ve got a PNY model as well that we’ll look at in more detail in a separate review, though we’ll include the performance results in our charts. (Spoiler: It’s just as fast as the Founders Edition.)

Image 1 of 2

Nvidia GeForce RTX 4070 block diagram — Four GPCs, one NVENC, and one NVDEC for the RTX 4070 (Image credit: Tom’s Hardware)

Above are the block diagrams for the RTX 4070 and for the full AD104, and you can see all the extra stuff that’s included but turned off on this lower tier AD104 implementation. None of the blocks in that image are “to scale,” and Nvidia didn’t provide a die shot of AD104, so we can’t try to determine just how much space is dedicated to the various bits and pieces — not until someone else does the dirty work, anyway (looking at you, Fritzchens Fritz (opens in new tab)).

As discussed previously, AD104 includes Nvidia’s 4th generation Tensor cores, 3rd generation RT cores, new and improved NVENC/NVDEC units for video encoding and decoding (now with AV1 support), and a significantly more powerful Optical Flow Accelerator (OFA). The latter is used for DLSS 3, and while it’s “theoretically” possible to do Frame Generation with the Ampere OFA (or using some other alternative), so far only RTX 40-series cards can provide that feature.

The Tensor cores meanwhile now support FP8 with sparsity. It’s not clear how useful that is in all workloads, but certainly AI and deep learning have leveraged lower precision number formats to boost performance without significantly altering the quality of the results — at least in some workloads. It will ultimately depend on the work being done, and figuring out just what uses FP8 versus FP16, plus sparsity, can be tricky. Basically, it’s a problem for software developers, but we’ll probably see additional tools (like Stable Diffusion or GPT Text Generation) that end up leveraging such features.

Those interested in AI research may find other reasons to pick an RTX 4070 over its competition, and we’ll look at performance in some of those tasks as well as gaming and professional workloads. But before the benchmarks, let’s take a closer look at the RTX 4070 Founders Edition.

1 World Solutions Technology News & Info

Nvidia GeForce RTX 4070 Review: Mainstream Ada Arrives

`Nvidia GeForce RTX 4070 Review: Mainstream Ada Arrives`

Leave a Reply Cancel reply