This test is another in a series of studies to learn more about nVidia’s new Volta architecture. Although Volta in its present form is not the next-generation gaming architecture, we would anticipate that key performance metrics can be drawn from Volta and used to extrapolate the future behavior of nVidia’s inevitable gaming arch, even if it is named differently. One example would be our gaming benchmarks, where we observed significant performance uplift in games leveraging asynchronous compute pipelines and low-level APIs. Our prediction is that nVidia is moving toward a future of heavily supported asynchronous compute job queuing, an area where the company is presently disadvantaged versus its competition. That’s not to say that nVidia doesn’t do asynchronous job queuing on Pascal (it does), but that AMD has, until now, put greater emphasis on that particular aspect of development.
This, we think, may also precipitate more developer movement toward these more advanced programming techniques. With the only two GPU vendors in the market supporting lower level APIs and asynchronous compute with greater emphasis, it would be reasonable to assume that development would follow, as would marketing development dollars.
In this testing, we’re running benchmarks on the nVidia Titan V to determine whether GPU core or memory (HBM2) overclocks have greater impact on performance. For this test, we’re only using a few key games, as selected from our gaming benchmarks:
Sniper Elite 4: DirectX 12, asynchronous compute-enabled, and showed significant performance uplift in Volta over Pascal. Sniper responds to GPU clock changes in drastic ways, we find. This represents our async titles.
Ashes of the Singularity: DirectX 12, but less responsive than Sniper. We were seeing ~10% uplift over the Titan Xp, whereas Sniper showed ~30-40% uplift. This gives us a middle-ground.
Destiny 2: DirectX 11, not very responsive to the Titan V in general. We saw ~4% uplift over the Titan Xp at some settings, though other settings combinations did produce greater change. This gives us a look at games that don’t necessarily care for Volta’s new async capabilities.
We are also using Firestrike Ultra and Superposition, the latter of which is also fairly responsive to the Titan’s dynamic ray-casting performance.
For all tests, we are running the fan at 100% with the power offset at 120% (max). Clocks are changed according to their numbers in the charts.
We have more frequencies tested in synthetic benchmarks, coming up soon, but we’re starting with just the polar opposites for a few games, as we know games interest most of you more than synthetics.
For Sniper Elite 4, we observed slightly more benefit from just HBM2 overclocking, indicated by 129.6FPS AVG and marginally increased lows, versus 125.2FPS AVG for the core-only overclock. This shows a 3.5% increase from doing just the HBM2 overclock versus just the core overclock. Overclocking either one standalone still gets us a noteworthy jump over the stock Titan V (minimally 8.7%), but overclocking both has the most impressive gains, jumping up to 142FPS AVG. It’s almost as if the core and HBM2 overclocks “stack” in this particular title, and that makes sense: Remember that Sniper Elite 4 uses asynchronous compute and low-level API programming to leverage components more heavily. This, we think, is an indicator of where nVidia is going with its future gaming architecture, which will likely be a Volta derivative, but won’t be Volta in its current form.
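As a quick check on the Sniper Elite 4 percentages, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function and variable names are placeholders of our own, and the inputs are the average FPS figures quoted above.

```python
def uplift_pct(new, base):
    """Percent change of `new` relative to `base`."""
    return (new / base - 1.0) * 100.0

# Average FPS figures quoted in the text for Sniper Elite 4.
core_only_fps = 125.2   # core overclock only
hbm_only_fps = 129.6    # HBM2 overclock only
both_fps = 142.0        # core and HBM2 overclocked together

hbm_vs_core = uplift_pct(hbm_only_fps, core_only_fps)
both_vs_core = uplift_pct(both_fps, core_only_fps)
both_vs_hbm = uplift_pct(both_fps, hbm_only_fps)

print(f"HBM2-only vs. core-only: {hbm_vs_core:.1f}%")
print(f"Both vs. core-only:      {both_vs_core:.1f}%")
print(f"Both vs. HBM2-only:      {both_vs_hbm:.1f}%")
```

With these inputs, the HBM2-only result works out to about 3.5% over core-only, and the combined overclock lands roughly 13.4% over core-only and 9.6% over HBM2-only.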
Ashes of the Singularity – Volta Core vs. HBM2 Overclock
In Ashes of the Singularity with DirectX 12, we’re observing clock scaling and HBM scaling almost equally. The increased core clock helps a bit more in frametime consistency, but doesn’t move the average in a meaningful way versus just the increased HBM clock. These are functionally the same. Again, overclocking both provides a noteworthy gain – about 5% over the individual component overclocks.
Destiny 2 – Titan V Core vs. HBM2 Overclock
Destiny 2 showed some of the least scaling in our original tests, which is a mix of its DirectX 11 API and, more importantly, a potential ROPs limitation, as we have the same ROP count as previous high-end 10-series cards.
With Destiny 2, we observed marginally higher performance with just an HBM2 overclock, a 2.6% boost over the core-only overclock. Overclocking both the core and HBM2 gave us another 8% over the core-only overclock.
Firestrike Ultra – Titan V Benchmarks
Here’s what we observed in Firestrike Ultra. For this one, we saw the core overclocks generally providing greater uplift, at 8686 points for core-only versus 8434 points for memory-only. The difference is a boost of about 3% for core-only over memory-only overclocks. Overclocking both to 200MHz offset gets us to 9026 points, where we’re observing diminishing returns versus the 175MHz HBM2 offset and 150MHz HBM2 offset. It would appear that the final 25MHz of HBM2 clock isn’t doing a lot for us, or that we’re becoming bound elsewhere in the architecture. That bound may be core clock.
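For readers who want to run the same comparison on their own scores, here is a minimal Python sketch that reports which single-domain overclock a benchmark favors. The helper name and the 1% tolerance are illustrative assumptions of ours, not part of the test methodology; the scores are the Firestrike Ultra numbers quoted above.

```python
def favored_overclock(core_only, memory_only, tolerance_pct=1.0):
    """Return which single-domain overclock scored higher, and by how much.

    tolerance_pct is an illustrative cutoff (an assumption, not from the
    article) below which the two results are treated as roughly equal.
    """
    delta_pct = (core_only / memory_only - 1.0) * 100.0
    if abs(delta_pct) < tolerance_pct:
        return "roughly equal", delta_pct
    return ("core" if delta_pct > 0 else "memory"), delta_pct

# Firestrike Ultra scores quoted in the text.
firestrike_core_only = 8686
firestrike_memory_only = 8434
firestrike_both = 9026

winner, delta = favored_overclock(firestrike_core_only, firestrike_memory_only)
both_vs_core = (firestrike_both / firestrike_core_only - 1.0) * 100.0

print(f"Favored: {winner} ({delta:+.1f}% core-only vs. memory-only)")
print(f"Both vs. core-only: {both_vs_core:+.1f}%")
```

A positive delta means the core-only run scored higher. The tolerance only keeps near-ties from being reported as a win for either side, and the right cutoff depends on run-to-run variance.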
Superposition – Titan V Benchmarks
Superposition shows more gains from the core than memory-only overclocks, but not by much. The difference is about 1.5%. We don’t run into diminishing returns as hard with this one, as the 175MHz offset, the 150MHz offset, and the 100MHz HBM2 offset all show somewhat comparable scaling in performance. Our 100MHz core and 200MHz HBM2 offset also shows a slight gain over core-only or HBM-only, and further illustrates that there’s a bit more headroom to boost performance in this particular application.
Conclusion: Core vs. Memory Overclocking
Titan V behaves a bit differently from the other recent HBM2 launch (Vega), primarily in that it appears less memory-constrained than Vega. The Titan V card responds better to core overclocks in some cases, while other benchmarks produce roughly equal uplift from both core and memory overclocking. Based on our thermal and gaming benchmarks from earlier, it would appear that the Titan first needs a core OC, or at least a power offset and improved cooling solution, as performance grows significantly with core overclocks alone. Once that’s solved for, memory is actually providing meaningful uplift in some of these applications; it’s not like other GPU architectures where memory OCs can sometimes appear not worthwhile.
Editorial, Testing: Steve Burke
Video: Andrew Coleman