NVLink RTX 2080 Ti Benchmark: x16/x16 vs. x8 & GTX 1080 Ti SLI

Posted on September 22, 2018

NVidia’s support of its multi-GPU technology has followed a tumultuous course over the years. Following a heavy push for adoption (which fell flat with developers), the company shunted its own SLI tech with Pascal, where multi-GPU support was cut down to two devices concurrently. Even in press briefings, the company acknowledged waning interest and support in multi-GPU, and so the marketing efforts died entirely with Pascal. Come Turing, a renewed interest in selling multiple cards per buyer has spurred development effort to coincide with NVLink, a 100GB/s symmetrical interface on the 2080 Ti. The 2080 still gets a 50GB/s link. It seems that nVidia may be pushing again for multi-GPU, and NVLink could further enable actual performance scaling with 2x RTX 2080 Tis or RTX 2080s (conclusions notwithstanding). Today, we're benchmarking the RTX 2080 Ti with NVLink (two-way), including tests for PCIe 3.0 bandwidth limitations when using x16/x8 or x8/x8 vs. x16/x16. The GTX 1080 Ti in SLI is also featured.

Note that we last visited the topic of PCIe bandwidth limitations in a previous post featuring two Titan Vs, and we must revisit it here. We have to determine whether an 8086K and Z370 platform will be sufficient for multi-GPU benchmarking, i.e. in x8/x8, and that requires a second platform for comparison: the 7980XE and X299 DARK that we previously used to take a top-three world record.

Test Platform – X299 / PCIe Bandwidth Limitation Testing

We are using the following components to benchmark PCIe bandwidth limitations:

	Component	Courtesy of
CPU	Intel i9-7980XE 4.6GHz	Intel
GPU	This is what we’re testing!	Often the company that makes the card, but sometimes us (see article)
Motherboard	EVGA X299 DARK	EVGA
RAM	GSkill Trident Z Black 32GB 3600MHz (4 sticks)	GSkill
PSU	Corsair AX1600i	Corsair
Cooler	NZXT Kraken X62	NZXT
SSD	ADATA S60 / Crucial MX300 1TB	GamersNexus

On this platform, we toggle between PCIe generations to restrict per-lane throughput, which exposes any potential limitations within the interface itself. This will help us determine the viability of our testing later in this article.
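For reference, the per-lane ceiling that this toggling manipulates can be worked out from each PCIe generation's signaling rate and line encoding. The sketch below is a minimal illustration, not part of our test tooling; `pcie_bandwidth_gbps` is a hypothetical helper name, and the figures ignore packet and protocol overhead, so real-world throughput lands somewhat lower.

```python
# Approximate one-direction PCIe throughput by generation and lane count.
# Transfer rates (GT/s) and encoding efficiencies per generation:
# Gen1 and Gen2 use 8b/10b line encoding, Gen3 uses 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_bandwidth_gbps(generation: int, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s, per direction.

    Ignores TLP headers, flow control, and other protocol overhead.
    """
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    per_lane_gbit = rate_gt_s * efficiency  # Gbit/s per lane after encoding overhead
    return per_lane_gbit * lanes / 8        # bits -> bytes, summed across lanes


if __name__ == "__main__":
    for gen in (1, 2, 3):
        for lanes in (8, 16):
            print(f"PCIe {gen}.0 x{lanes}: {pcie_bandwidth_gbps(gen, lanes):.2f} GB/s")
```

Under these assumptions, PCIe 3.0 x16 works out to roughly 15.75GB/s per direction (the "16GB/s" commonly quoted for x16), and x8 halves that to about 7.88GB/s.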

Test Methodology – Game Benchmarks

Testing methodology has completely changed since our last GPU reviews, which were probably for the GTX 1070 Ti series cards. Most notably, we have overhauled the host test bench and updated our game selection. That selection is a careful one: Time is finite, and having analyzed our previous testing methodologies, we identified shortcomings where we were ultimately wasting time by testing too many games that didn’t provide meaningfully different data from our other tested titles. To make better use of our available time and test “smarter” (rather than “more,” which was one of our previous goals), we have selected games based upon the following criteria:

Game Engine: Most games run on the same group of popular engines. By choosing one game from each major engine (e.g. Unreal Engine), we can ensure that we are representing a wide sweep of games that just use the built-in engine-level optimizations
API: We have chosen a select group of DirectX 11 and DirectX 12 API integrations, as these are the most prevalent at this time. We will include more Vulkan API testing as more games ship with Vulkan
Popularity: Is it something people actually play?
Longevity: Regardless of popularity, how long can we reasonably expect that a game will go without updates? Updating games can hurt comparative data from past tests, which impacts our ability to cross-compare new data and old, as old data may no longer be comparable post-patch

Game graphics settings are defined in their respective charts.

We are also testing most games at all three popular resolutions, at least for the high-end cards. This includes 4K, 1440p, and 1080p, which allows us to determine GPU scalability across multiple monitor types. More importantly, this allows us to start pinpointing the reason for performance uplift, rather than just saying there is performance uplift. If performance improves more at 4K than at 1080p, for instance, we might be able to call this indicative of a ROPs advantage. Understanding why performance behaves the way it does is critical for expanding our own knowledge, and thus prepares our content for smarter analysis in the future.
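The arithmetic behind that reasoning is simple: each step up in resolution multiplies the per-frame pixel work, while per-frame CPU work stays roughly flat. The sketch below assumes "4K" means 3840x2160 (UHD) and treats pixel count as a rough proxy for GPU load. That is a simplification, since real frame cost does not scale linearly with pixel count.

```python
# Pixel counts for the three test resolutions (4K assumed to be 3840x2160 UHD).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}


def pixel_count(name: str) -> int:
    """Total pixels per frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


if __name__ == "__main__":
    base = pixel_count("1080p")
    for name in RESOLUTIONS:
        pixels = pixel_count(name)
        print(f"{name}: {pixels:,} pixels ({pixels / base:.2f}x 1080p)")
```

4K pushes exactly four times the pixels of 1080p (and 2.25x those of 1440p), which is why per-pixel hardware such as ROPs has more room to separate cards at the top resolution.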

For the test bench proper, we are now using the following components:

GPU Test Bench (Sponsored by Corsair)

	Component	Courtesy of
CPU	Intel i7-8086K 5.0GHz	GamersNexus
GPU	This is what we’re testing!	Often the company that makes the card, but sometimes us (see article)
Motherboard	ASUS ROG Maximus X Hero	ASUS
RAM	Corsair Vengeance LPX 32GB 3200MHz	Corsair
PSU	Corsair AX1600i	Corsair
Cooler	NZXT Kraken X62	NZXT
SSD	Plextor 256-M7VC / Crucial MX300 1TB	GamersNexus

Separately, for the initial RTX 20-series reviews, we are using 10-series board partner models instead of reference models. This is because we know that most of the market is, in fact, using board partner models, and we believe this to be the most realistically representative and relatable choice for our audience. We acknowledge that the differences between the RTX and GTX reference cards would be more pronounced than when comparing partner cards, but much of that gap results from the poor reference coolers of the previous generation. In our eyes, comparing against those reference cards creates an unrealistically strong showing for the incoming cards on dual-axial coolers, and it does not help the vast majority of users, who own board partner model 10-series cards.

PCIe 3.0 Bandwidth Limitations: RTX 2080 Ti NVLink Benchmark

Ashes of the Singularity: Explicit Multi-GPU PCIe Bandwidth Test

Ashes of the Singularity is an incredibly interesting benchmarking tool for this scenario. Ashes uses Dx12’s explicit multi-GPU support, which allows GPUs of differing makes to work together, and the cards communicate entirely over the PCIe bus. This means that the cards can’t lean on the 100GB/s bandwidth provided by NVLink. Instead, all that data transacts over the far more limited bandwidth of PCIe, which tops out at roughly 16GB/s in x16 mode. In our Titan V testing (we’ll pop the old chart up on screen), we found that the PCIe bandwidth limits were finally being strained. Again, this is with no supporting bridge, and it’s the only title we know of that really makes use of multi-GPU like this.

For the 2080 Tis, we removed the NVLink bridge and tested them using explicit multi-GPU over the PCIe bus alone. This is to determine at what point we hit PCIe 3.0 limitations; with PCIe 4.0 looming, there’s been a lot of talk of EOL for 3.0. In Ashes, we found that our maximum performance was 127.2FPS AVG at x16/x16, averaged across 10 runs. Running in x16/x8, we had a measurable and consistent loss that exceeded margin of error. The loss was about 1.7%. Not a big deal. Running in x8/x8, which would be common on Z370 platforms, we saw a massive performance penalty. The cards were now limited to 107FPS AVG, resulting in a 16% loss.
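The loss figures are simple percent changes against the x16/x16 baseline. As a quick check on the Ashes numbers, here is a minimal sketch; `percent_change` is a hypothetical helper, and the inputs are the 127.2FPS and 107FPS averages quoted above.

```python
def percent_change(baseline: float, result: float) -> float:
    """Signed percent change from baseline to result (negative means a loss)."""
    return (result - baseline) / baseline * 100


if __name__ == "__main__":
    x16_x16_fps = 127.2  # Ashes AVG, bridge removed, x16/x16
    x8_x8_fps = 107.0    # Ashes AVG, bridge removed, x8/x8
    print(f"x8/x8 vs. x16/x16: {percent_change(x16_x16_fps, x8_x8_fps):.1f}%")
```

This prints a change of -15.9%, which rounds to the 16% loss reported.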

TimeSpy Extreme PCIe Bandwidth Limitation Benchmark

TimeSpy Extreme is an extremely useful synthetic tool for this type of benchmark, and it is also memory-intensive for the GPUs. We ran TimeSpy Extreme 5 times on each configuration, fully automated, and found a difference of 0.19% between x8/x8 and x16/x16 for GFX test 1, which is geometrically intensive. This is well within 3DMark’s run-to-run variance, so there is effectively no performance loss between x8/x8 and x16/x16. Part of this is likely because of NVLink’s additional bandwidth, reducing reliance on the PCIe bus.

Firestrike Ultra PCIe 3.0 x16/x16 vs. x8/x8 with 2080 Ti

For Firestrike Ultra, we observed an FPS difference of about 1%: it was a 0.9% difference in GFX 1 and a 1.0% difference in GFX 2. We ended up running these an additional 5 times, for a total of 10 each, and found the results repeated. Firestrike has run-to-run variance, so we cannot state with full confidence that a difference exists, but if one does exist here, it amounts to a 1% advantage for x16/x16 versus x8/x8.
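One standard way to decide whether a roughly 1% gap across 10 runs per configuration is real or just run-to-run noise is Welch's t-test on the per-run scores. This is a sketch of that technique, not a description of our internal tooling, and the function names are hypothetical. As a rough guide, with two sets of 10 runs, a |t| below about 2.1 means the difference cannot be separated from variance at the usual 5% significance level.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for the difference between two sets of run results."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Sample variances (n - 1 denominator); each list needs at least 2 runs.
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    if standard_error == 0:
        # Zero variance in both sets: report 0 for equal means, else +/- infinity.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / standard_error


def compare_configs(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (percent difference of a over b, Welch t) for two configurations."""
    diff_pct = (statistics.mean(a) - statistics.mean(b)) / statistics.mean(b) * 100
    return diff_pct, welch_t(a, b)
```

In use, `a` would hold the ten GFX 1 scores from x16/x16 and `b` the ten from x8/x8; a 1% difference paired with a small |t| would support the cautious reading above.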

A PCIe bandwidth penalty is of questionable existence at x16/x8, but it certainly exists when traffic is forced entirely over PCIe (bypassing the NVLink bridge). We only know of one ‘game’ which does this presently, and that’s Ashes.

2x RTX 2080 Ti NVLink vs. GTX 1080 Ti SLI & Single RTX 2080 Ti

Sniper Elite 4 – NVLink Benchmark vs. RTX 2080 Ti & SLI 1080 Ti

Sniper Elite 4 produced some of the best scaling results, as it often does. This game is also the best DirectX 12 implementation we’re aware of, so its scaling will not apply to all games universally – it is an outlier, but a good one that can teach us a lot.

With our usual benchmark settings, the dual NVLinked cards push past 200FPS and hit an average of 210FPS at stock clocks. This outperforms the stock RTX 2080 Ti FE by about 94%. This is nearly perfect 2x scaling, which has been rare in recent years, but it’s always exciting when we see it, because this is what multi-GPU should be like. Versus the overclocked single 2080 Ti, the stock 2080 Tis in NVLink gained 71%. Not bad, and overclocking both cards would regain that lead, although finding a stable overclock for two cards is tedious. The GTX 1080 Tis in SLI

The next major consideration is frametime consistency: Multi-GPU has traditionally shown terrible frame-to-frame interval consistency, often resulting in micro-stutter or intolerable tearing. For this pairing, as you can see in our frametime plot, the lows scale pretty well. It’s not near-perfect 2x scaling like the average, but it’s pretty close. As a reminder, these plots are read as lower is better, but consistency matters more than simply having a low interval. A 16.7ms interval is 60FPS. This is very impressive performance, and it is more a testament to Sniper 4’s development team than anything else: they have continued to build some of the best-optimized games in the space.
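Frametime and framerate are reciprocals: FPS = 1000 / frametime in milliseconds, so 60FPS is about 16.7ms, and exactly 16ms is 62.5FPS. The sketch below shows that conversion plus one common way to derive "1% low" and "0.1% low" figures from a frametime capture. The definition used (averaging the slowest N% of frames) is an assumption for illustration; tools differ, and some report the percentile frametime instead.

```python
def frametime_to_fps(ms: float) -> float:
    """Instantaneous FPS for a single frame interval given in milliseconds."""
    return 1000.0 / ms


def average_fps(frametimes_ms: list[float]) -> float:
    """Average FPS over a capture: frames rendered divided by elapsed seconds."""
    return len(frametimes_ms) * 1000.0 / sum(frametimes_ms)


def low_fps(frametimes_ms: list[float], percent: float) -> float:
    """FPS equivalent of the mean of the slowest `percent`% of frames.

    Assumed definition; percentile-based variants give slightly different values.
    """
    slowest = sorted(frametimes_ms, reverse=True)
    count = max(1, int(len(slowest) * percent / 100))  # always include at least one frame
    return frametime_to_fps(sum(slowest[:count]) / count)
```

With this convention, the 0.1% low of a short capture is driven by only a handful of frames, which is one reason those figures can swing between runs.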

We also tested with Sniper 4 at Ultra settings, just to remove CPU bottleneck concerns. Here’s a quick chart with those results, although they aren’t too different.
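Because the Sniper Elite 4 results are stated as an NVLink average plus percentage gains, the implied single-card averages and the scaling efficiency can be backed out. This is a minimal sketch with hypothetical helper names; since the inputs are rounded percentages, the outputs are approximations, not measured values.

```python
def implied_single_fps(multi_fps: float, uplift_pct: float) -> float:
    """Back out single-card FPS from a multi-GPU average and its % uplift."""
    return multi_fps / (1 + uplift_pct / 100)


def scaling_efficiency(multi_fps: float, single_fps: float, gpus: int = 2) -> float:
    """Share of ideal linear scaling achieved (1.0 means a perfect 2x with 2 GPUs)."""
    return multi_fps / single_fps / gpus


if __name__ == "__main__":
    nvlink_avg = 210.0  # Sniper Elite 4, 2x RTX 2080 Ti in NVLink, stock clocks
    stock_single = implied_single_fps(nvlink_avg, 94)  # "about 94%" over a stock FE
    oc_single = implied_single_fps(nvlink_avg, 71)     # 71% over an overclocked FE
    print(f"Implied stock single-card AVG: {stock_single:.1f}FPS")
    print(f"Implied overclocked single-card AVG: {oc_single:.1f}FPS")
    print(f"Scaling efficiency vs. stock: {scaling_efficiency(nvlink_avg, stock_single):.0%}")
```

This works out to roughly 108FPS stock and 123FPS overclocked for the single card, and about 97% of ideal two-way scaling.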

Far Cry 5 RTX 2080 Ti x8/x8 NVLink vs. Single Card

Far Cry 5 and the Dunia engine also show some SLI or NVLink scaling support. At 4K/High, Far Cry 5 plots the single RTX 2080 Ti at 74FPS AVG stock, or 83FPS AVG overclocked. Lows stick around 55-60FPS in each case. With dual cards, we manage 108FPS AVG, a gain of 46% over the single stock 2080 Ti’s 74FPS AVG. That’s not nearly as exciting as the previous result, but at least it’s still some scaling. At roughly 50%, though, you can’t help but feel like you’re only getting $600 of value out of your additional $1200 purchase. For the lows, we’re looking at a 0.1% low of 60FPS, compared to 55FPS on the stock 2080 Ti, which is not a meaningful improvement. Let’s look at a more valuable frametime plot, as these 0.1% metrics don’t tell the whole story.
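The "value" argument can be made concrete by scaling the second card's price by the fractional FPS gain it delivers, and by comparing cost per average FPS. This sketch assumes the $1200 per-card price implied in the text and treats value as proportional to average FPS, which ignores frametime consistency; the helper names are hypothetical.

```python
def proportional_value(single_fps: float, dual_fps: float, card_price: float) -> float:
    """Price of the second card scaled by the fractional FPS gain it delivers."""
    return card_price * (dual_fps - single_fps) / single_fps


def cost_per_fps(total_price: float, fps: float) -> float:
    """Dollars spent per frame per second of average performance."""
    return total_price / fps


if __name__ == "__main__":
    card_price = 1200.0         # per-card price implied by the text
    single, dual = 74.0, 108.0  # Far Cry 5, 4K/High, stock AVG FPS
    print(f"Proportional value of card two: ${proportional_value(single, dual, card_price):.0f}")
    print(f"One card:  ${cost_per_fps(card_price, single):.2f} per FPS")
    print(f"Two cards: ${cost_per_fps(2 * card_price, dual):.2f} per FPS")
```

Using the unrounded 46% uplift, the proportional value comes to about $551 rather than $600, and cost per average FPS rises from about $16.22 to about $22.22.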

In our frametime chart, we can see the limitations of scaling. Although the NVLinked cards run a higher average, they fail to sustain similar scaling in frametime consistency. Frametimes are spikier and potentially more jarring, although raw framerate alone makes up for much of this lost frame-to-frame interval consistency.

Back to the main chart, we also have the 1080 Ti cards in SLI to consider: In this configuration, the SLI 1080 Tis operate at 91.4FPS AVG, with spurious 0.1% lows bouncing between 42FPS and 66FPS. For averages, the overall uplift amounts to about 60% over a single 1080 Ti SC2, and the pair outperforms a single 2080 Ti FE. Of course, there will be games where SLI gets you nothing, but in instances like this, SLI can outperform new hardware at the same price.

Shadow of the Tomb Raider GPU Benchmark – NVLink RTX 2080 Ti

Shadow of the Tomb Raider is still a new game and will eventually support RTX features, but it didn’t at the time of filming. The game uses a modified version of Crystal Dynamics’ Foundation engine. It has a lot of issues with NVLink and SLI, and nVidia is aware of them. As of now, we have experienced blue screens of death on launch, crashes on minimizing, and other seemingly random crashes. Fortunately, we were eventually able to work around these for long enough to run a benchmark; just know that the game is very unstable with multi-GPU. Another issue we discovered was constant blue screens with TAA enabled, which is unfortunately what we used for our full GPU review. For this reason, we retested the 1080 Ti with TAA off as well, just for a baseline. We did not retest all devices, only those which are marked.

At 4K, Shadow of the Tomb Raider shows a few-FPS difference between the 1080 Ti SC2 with TAA on and TAA off. This shows that there is minimal overall performance impact, but it will offset our data a bit. The single 2080 Ti FE originally averaged 67FPS, with lows tightly grouped at around 56-58FPS, meaning frametimes are very consistent with a single card. Multi-GPU got us 147FPS AVG, and the dual 1080 Tis got 113FPS AVG. These two numbers are directly comparable, as they were run under fully identical conditions: For SLI, it becomes even more difficult to justify the 2080 Tis versus 1080 Tis, not that we fully endorse SLI as a good overall option.

F1 2018 GPU Benchmark – NVLink vs. SLI

F1 2018 also showed scaling. The NVLinked 2080 Tis managed 168FPS AVG here, with 1% lows at around 69FPS. The single RTX 2080 Ti averaged 99FPS, with its 1% lows at 47FPS when stock. That works out to scaling of about 70%, which is pretty good. It’s not as impressive as Sniper, but it’s still a better gain than we’ve come to expect from SLI configurations over the last few years. As for the 1080 Tis in SLI, we measured them at 88FPS AVG and 57FPS for 1% lows. A single 1080 Ti SC2 ran at 81FPS, giving us a dismal scaling of 9%.

Hellblade GPU Benchmark – NVLink RTX 2080 Ti

Hellblade is up next, included as a Dx11 Unreal Engine title. This game has some of the best graphics of any game right now, making it a good benchmarking option, and it represents Unreal Engine 4 well. It also did not show any scaling; in fact, we technically observed negative scaling with this title. We saw a drop of about 16% in performance, with additional jarring tearing during gameplay. While researching this content, we learned that there is a custom Hellblade SLI profile out there, but it is not an official nVidia profile. Out of the box, it appears that NVLink does not work with Hellblade, although it looks like mods could hack it into working. The 1080 Tis in SLI also saw negative scaling.

GTA V NVLink Benchmark

GTA V is next. This is another Dx11 title, but it uses the RAGE engine and has been more heavily tuned for graphics hardware over its three-year tenure. It also shows some scaling in averages, though not necessarily in low-end frametime performance. We posted a 132FPS AVG with the NVLinked cards, as opposed to a 77FPS AVG with a single FE GPU. That is an approximate 71% improvement in average framerate, but frametime performance stays in lock-step with the single card: there is no improvement in low-end performance. For dual GTX 1080 Ti cards in SLI, we managed 117FPS AVG, a gain of 83% over a single GTX 1080 Ti SC2, which allows the pair to outperform even an overclocked single 2080 Ti, albeit with similar frametime consistency, as illustrated by the lows. We are nearing CPU limitations in this game, with hard limits around 170FPS AVG on our configuration.
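As a cross-check, the headline scaling percentages can be recomputed from the average-FPS pairs. This sketch covers only the results where both the single-card and multi-GPU averages are stated outright and compared directly in the text; `RESULTS` and `uplift_pct` are hypothetical names for illustration.

```python
# (game, card, single-card AVG FPS, multi-GPU AVG FPS), as reported in this article.
RESULTS = [
    ("Far Cry 5", "RTX 2080 Ti", 74.0, 108.0),
    ("F1 2018", "RTX 2080 Ti", 99.0, 168.0),
    ("F1 2018", "GTX 1080 Ti", 81.0, 88.0),
    ("GTA V", "RTX 2080 Ti", 77.0, 132.0),
]


def uplift_pct(single_fps: float, multi_fps: float) -> float:
    """Percent gain of the multi-GPU average over the single-card average."""
    return (multi_fps / single_fps - 1) * 100


if __name__ == "__main__":
    for game, card, single, multi in RESULTS:
        print(f"{game:<10} {card:<12} {uplift_pct(single, multi):5.1f}%")
```

Rounded, these land on the 46%, 70%, 9%, and 71% figures quoted in the text.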

Conclusion: Is NVLink or SLI Worth It in 2018?

It’s certainly gotten better since we last looked at multi-GPU game support. We have never once, in the 10-year history of the site, recommended multi-GPU for AMD or nVidia when given the stronger, more cost-effective single-card alternatives.

That trend continues, but with more hesitance in the answer than ever before. Overall game support has improved, but it’s clear, if only because of SOTTR’s dismal BSOD issues at launch, that games still won’t be immediately supported. Marketshare of multi-GPU users is infinitesimal and off the radar of developers, at least without direct encouragement from the graphics vendors. You could be waiting weeks (or months, or ad infinitum) for multi-GPU support to get patched into games. Most of them won’t have it at launch. A stronger user community than in previous years does mean more options if nVidia fails to officially provide SLI profiles, though. NVidia’s renewed focus on selling sets of cards to users, rather than one, may also benefit multi-GPU support. The rise of low-level, low-abstraction APIs has also aided multi-GPU scalability. It is now more common to see 70% scaling and up.

But we still don’t wholly recommend multi-GPU configurations, particularly given our present stance on the RTX lineup. When it doesn’t work, it burns, and nVidia does not have a strong track record in recent years for supporting its own technologies. VXAO and Flow are rarely used, if ever. MFAA vanished. SLI was shunted with Pascal, forced down to two-way and then forgotten. NVidia hasn’t even updated its own list of supported SLI games – and NVLink is SLI, in this regard – to include the most recent, compatible titles. The company couldn’t give more signals that it won’t support this technology, despite scaling actually improving year-over-year.

It’s better. That much is certain. It’s just a question of whether you can trust nVidia to continue pushing for multi-GPU adoption.

Editorial, Testing: Steve Burke
Video: Andrew Coleman