This week, our news is headlined by surprise supercomputer wins, from Fujitsu’s “Fugaku” dethroning Summit for the new No. 1 spot, to Nvidia’s “Selene,” which, interestingly enough, uses AMD CPUs. Equally newsworthy is Apple confirming its transition from Intel to Arm in the form of “Apple Silicon,” which leaves more questions than answers right now, but the move will have big implications for the CPU landscape. Speaking of Arm, the server space is set to heat up even more with Ampere’s new Altra Max product stack.
We also have news of a massive air cooler aimed at GPUs from Raijintek, which is a bit different.
The ‘F,’ Koduri joked, is for ‘fabulous’ -- but this is more likely in line with the BFG.
Raja Koduri is a former AMD engineer whom we interviewed during his tenure at the Radeon group; he has been at Intel for a few years now. Most recently, Raja tweeted photos of one of Intel’s Folsom testing labs for Intel Xe silicon. It’s mostly interesting for a look behind the scenes, as there’s no performance showcase just yet. We should also note that this is almost certainly for the Xe deployment that focuses on data center performance and is not a gaming variant. It’ll make it down the chain eventually, though.
In the photos, we see a massive retention kit for what is likely a socketable test system, something we saw in our MSI factory tour for rapid GPU testing. A photo of a board containing the test silicon shows debug wires everywhere, including some labeled PCIe and GT, and a label that might indicate DG1 in the ES stages. The water cooling solutions have some serious springs paired with them, which furthers our guess that this might be socketable, since a lot of mounting pressure is needed to get full contact with BGA chips in a socket.
We’re looking forward to more news from Intel’s graphics team.
We’ve talked about Ampere Computing a few times now, and for good reason. The company is a new chip startup with its eyes set on disrupting the x86 market dominated by Intel and AMD. The fact that the company is headed by former Intel President Renee James makes it all the more interesting. Just to be absolutely clear, Ampere Computing is different from NVIDIA Ampere, although that’s definitely what most people will think of when seeing headlines.
Last we talked of Ampere Computing, we mentioned its upcoming line of Altra chips, topping out at 80 cores. Now, however, the Altra line is getting a new flagship: the forthcoming Altra Max. Ampere’s extended roadmap now includes Altra Max, which will feature 128 cores, will be socket compatible with Altra, and should sample sometime in the fourth quarter of this year.
As a reminder, Ampere’s Altra is based on the Arm Neoverse N1 architecture, using Ampere’s “Quicksilver” cores, which are semi-custom Neoverse cores. Much like Amazon’s Graviton2, Altra is attempting to carve out a swath of the server/cloud provider market for Arm. Unlike the Graviton2, however, Ampere’s Altra will be available to anyone, not just Amazon/AWS.
Altra will contain SKUs ranging from 32 cores up to 80 cores, at least initially; 128C chips will follow later. Frequencies start at 1.7GHz and scale up to 3.3GHz for the most performant SKUs. At the low end, TDPs start at 45W and increase throughout the product stack to 250W. Every SKU will offer 128 PCIe 4.0 lanes and support for DDR4-3200. Both Altra and Altra Max will come in 1U and 2U platforms, which Ampere is calling Mt. Jade (two socket) and Mt. Snow (one socket).
While Altra and Altra Max are built on TSMC’s N7 node, Ampere offered an update on its next generation of 5nm chips, codenamed Siryn. Siryn will be built on one of TSMC’s N5 nodes, and is slated for sampling in late 2021, with volume production in 2022.
Ampere is already courting customers such as Cloudflare, Packet, and Genymotion. Additionally, Ampere is working with Nvidia to bring CUDA acceleration to Arm when paired with an Nvidia GPU.
At its WWDC 2020 event, Apple finally confirmed the hushed whispers that it would be ditching Intel silicon for its own “Apple Silicon.” In what seems like a move that will both further Apple’s vertical integration and add another layer atop Apple’s walled garden, Apple will transition to the Arm architecture to design its own chips for Macs and MacBooks, much like it does with its smaller devices (iPhone, iPad, etc.).
As we mentioned previously, Apple’s move, such as it is, leaves a lot of unanswered questions -- and Apple was decidedly vague with regards to details. The two biggest question marks are: How long will Apple continue to support Intel-based Macs, and what will become of x86 app support? Apple’s past doesn’t instill a lot of faith in either. There will be a bit of an overlap period where both x86 Macs and Arm-based Macs coexist, as Apple plans to bring its first Arm-based Mac to market by the end of the year. All in all, Apple notes that the transition away from Intel will take ~2 years.
The move to Arm means abandoning the Intel x86 ISA and replacing it with custom Arm silicon, which means x86 code won’t run natively; it’ll need to be recompiled or emulated. Apple is overhauling its Xcode development environment to that end, and it’s resurrecting Rosetta -- in the form of Rosetta 2 -- as a compatibility layer to keep x86 apps working while Apple and its development partners move to Arm. Apple previously used Rosetta in the early 2000s, when it transitioned from PowerPC to Intel. Here’s an excellent recount of those days.
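For a loose sense of why a translation layer beats raw emulation, here’s a toy sketch of translate-once-and-cache versus re-interpreting foreign code on every run. The two-instruction “ISA” here is invented purely for illustration; Apple’s actual binary translator is far more sophisticated and not public.

```python
# Toy binary-translation sketch (illustrative only; the "ISA" is made up).
# Instead of decoding each foreign instruction every time it executes,
# we translate the whole program into native callables once and cache it.

TRANSLATION_CACHE = {}

def translate(program):
    """'Compile' a list of foreign opcodes into native Python callables."""
    ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    return [ops[op] for op in program]

def run(name, program, a, b):
    # Translate on first run; subsequent runs reuse the cached native code.
    if name not in TRANSLATION_CACHE:
        TRANSLATION_CACHE[name] = translate(program)
    result = a
    for fn in TRANSLATION_CACHE[name]:
        result = fn(result, b)
    return result

print(run("demo", ["ADD", "MUL"], 2, 3))  # (2 + 3) * 3 = 15
```

The point of the cache is that the translation cost is paid once per binary rather than once per instruction executed, which is broadly why ahead-of-time translation layers can approach native performance.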
Nonetheless, it’ll be interesting to see just what Apple can do with higher power SoCs. Apple’s mobile silicon has been the market leader for years, and Apple seems to think it can do for its Mac line what it has already done for its iOS line. Assuming Apple manages to develop an Arm-based SoC that truly rivals x86 desktop/workstation class chips, the complexion of the CPU landscape will begin to look very different.
And as for why Apple would leave Intel, aside from the obvious reasons -- more control over its hardware and release schedule, increased margins, etc. -- there’s a former Intel engineer who seems to have an idea.
François Piednoël, a former principal engineer at Intel, offered some insight via his YouTube channel. According to Piednoël (per Gizmodo), the turning point was Skylake and its quality assurance. Of Skylake’s quality assurance, Piednoël said that it was “more than a problem, it was abnormally bad.”
“Basically our buddies at Apple became the number one filer of problems in the architecture. When your customer starts finding almost as much bugs as you found yourself, you’re not leading into the right place,” said Piednoël. Piednoël went on to say that the issues with Skylake stemmed from changes in leadership that led to overall poor management of Skylake during that time.
11:20 | Fugaku Supercomputer Takes #1 Spot from Summit
The new leader in supercomputers is Fugaku, developed by Fujitsu and the Riken Center for Computational Science in Japan. Fugaku now sits at #1 on the TOP500 list, dethroning Summit. Compared to Summit, Fugaku’s lead in specs, performance, and power is significant.
The Fugaku supercomputer boasts 7,299,072 CPU cores, an Rmax of 415 petaFLOPS, and an Rpeak of 513 petaFLOPS. For comparison, Summit runs 2,414,592 CPU cores, with an Rmax of 148 petaFLOPS and an Rpeak of 200 petaFLOPS. The CPU of choice for Fugaku is the Armv8.2-A-based A64FX, which succeeds Fujitsu’s previous SPARC64 server/supercomputer designs. The A64FX is a 48+4 core design (48 compute cores plus 4 assistant cores) running at 2.2GHz; the system as a whole packs 4,866,048 GB of memory. The A64FX is manufactured on TSMC’s N7 node, with a reported transistor count of around 8.7B.
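For context on the Rmax versus Rpeak figures: Rmax is sustained performance on the HPL (Linpack) benchmark, Rpeak is theoretical peak, and their ratio shows how much of its paper performance a machine actually delivers. A quick sanity check on the numbers above (the helper function is our own, not a TOP500 tool):

```python
# Compare sustained HPL efficiency (Rmax / Rpeak) for the two systems,
# using the TOP500 figures quoted in the text.

def hpl_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Return sustained HPL efficiency as a percentage of theoretical peak."""
    return 100.0 * rmax_pflops / rpeak_pflops

systems = {
    "Fugaku": (415.0, 513.0),  # Rmax, Rpeak in petaFLOPS
    "Summit": (148.0, 200.0),
}

for name, (rmax, rpeak) in systems.items():
    print(f"{name}: {hpl_efficiency(rmax, rpeak):.1f}% of peak")
```

By this measure Fugaku sustains roughly 81% of peak to Summit’s 74%, so its lead isn’t just a matter of core count.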
Fugaku also entered the Green500 list at #9, coming in behind Summit. While Fugaku isn’t scheduled to be completely operational until 2021, 400 of its racks have been deployed this month in response to the pandemic. As AnandTech notes, as more racks and resources come online, Fugaku’s performance lead will likely increase.
13:10 | Nvidia’s New Selene Supercomputer Ranked at #7
In other supercomputer news, Nvidia’s new Selene supercomputer entered the TOP500 list at #7, placing it just behind Italy’s HPC5 and just ahead of Texas’ Frontera. Nvidia’s announcement of Selene coincided with the release of a PCIe-based version of the A100 accelerator.
Selene is the latest machine to join Nvidia’s internal research cluster. Selene is centered around Nvidia’s DGX A100; it uses 280 DGX A100 units, for a total of 2,240 A100 GPUs. Selene also uses 494 Nvidia Mellanox Quantum 200G InfiniBand switches and 7 PB of flash storage. Selene additionally makes use of AMD’s Epyc Rome silicon, specifically the Epyc 7742 parts; there are 2x Epyc 7742 CPUs and 8x A100 GPUs per DGX A100 node.
Selene posts a Linpack (Rmax) score of 27.6 petaFLOPS and an Rpeak of 34.5 petaFLOPS, while consuming 1.3 megawatts of power. Nvidia notes that Selene entered the Green500 at #2. Nvidia uses Selene for chip modeling and development, robotics, self-driving, and other research projects. Selene is Nvidia’s fourth entry in the TOP500 and marks another win for AMD.
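The Green500 ranks machines by HPL performance per watt (GFLOPS/W) rather than raw throughput. Plugging in the rounded figures above gives a ballpark for Selene’s efficiency; official submissions use more precise measurements, so the listed number will differ slightly:

```python
# Rough Green500-style efficiency estimate from the figures in the text.

def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Convert petaFLOPS and megawatts into the Green500's GFLOPS/W metric."""
    gflops = rmax_pflops * 1e6  # 1 petaFLOPS = 1e6 GFLOPS
    watts = power_mw * 1e6      # 1 megawatt = 1e6 watts
    return gflops / watts

# Sanity check: 280 DGX A100 nodes x 8 GPUs each matches the quoted total.
assert 280 * 8 == 2240

print(f"Selene: ~{gflops_per_watt(27.6, 1.3):.1f} GFLOPS/W")
```

That works out to roughly 21 GFLOPS/W, which is consistent with a top-of-list Green500 placement for mid-2020.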
As a reminder, AMD’s hardware will also appear in the future Frontier and El Capitan supercomputers, both of which are exascale machines set for deployment in 2021 and 2023, respectively.
News also surfaced this past week of an unusual air cooler, one aimed specifically at GPUs. Raijintek’s Morpheus 8057 is an interesting design: a massive fin stack affixed to 12 heatpipes and fitted for 2x 120mm axial fans. The heatpipes are of the 6mm variety, and the fin stack features roughly 129 fins.
The main cooler is aimed at cooling the GPU itself, and isn’t capable of cooling the VRAM or VRM on its own. However, Raijintek includes additional copper and aluminum heatsinks for this purpose. Additionally, while Raijintek includes fan clips, users will need to provide their own fans.
The Morpheus 8057 is compatible with Nvidia’s RTX 20-series cards and AMD’s RX 5000-series. Future support for Nvidia’s upcoming Ampere and AMD’s RDNA 2 cards is up in the air. Raijintek hasn’t mentioned pricing or availability.
18:17 | Cox Still Promising to Reduce Latency in Games
Last year, Cox Communications announced it was working on a separate subscription called “Elite Gamer” that was aimed at reducing lag and ping times in games. On the surface, it sounded like Cox was basically introducing fast lanes (and to a certain extent, it still does), but this is something Cox expressly denies. To be clear, this is the same Cox that is currently throttling entire neighborhoods to deal with “excessive” internet usage.
At the time of the announcement last year, Cox was trialing the service in Arizona. Now, after about a year of not hearing anything about it (we hoped it had died), it seems Cox is ready for a broad rollout. Cox is confident in the service, promising 32% less lag -- if you believe that. Cox also touts fewer ping spikes and less jitter.
According to Cox, “Elite Gamer is designed to use an intelligent server network to route your game connections more efficiently. To put it simply, when you are not using Elite Gamer, your game connection is treated the same as all the other traffic on your network. This means that, generally, the easiest path for the data is used, not the most efficient. It would be like taking the highway during rush hour because it’s easier to get to, rather than taking back roads when they would get you where you are going faster.”
Cox’s Elite Gamer will be included with certain plans for the first computer connection, with additional connections costing $5/mo. Otherwise, Cox will charge $7/mo for the first connection and $5/mo for each additional connection. As many have pointed out, what Cox is offering here looks like a rebranded version of WTFast. Many users aren’t sold on that service, either.
The latest development in the ongoing SMR saga sees Western Digital trotting out a new identifier within its WD Red line, known as WD Red Plus. The WD Red Plus line will exclusively use CMR technology, rather than SMR. Hopefully.
This all comes after weeks of unflinching criticism, not only against WD, but Seagate and Toshiba as well. WD is currently embroiled in at least two class-action lawsuits that we’re aware of: one in the US, and another in Canada. By introducing the WD Red Plus line, WD is attempting to make good on its promise to be more transparent about the use of SMR in NAS-oriented HDDs. However, if the recent lawsuits have their way, they could prevent WD from advertising SMR as suitable for RAID and NAS altogether.
WD is positioning the WD Red Plus drives as suitable for users with more write-intensive workloads or those using the ZFS file system. WD still maintains that WD Red (with DMSMR) is suitable for less demanding SOHO users. The WD Red Pro drives that have always used CMR will remain unchanged. Again, the lawsuit(s) could change all this, but for the time being, this is how WD is repositioning its product stack.
Tom’s Hardware also reports that WD has amended its marketing materials to indicate which drives use SMR, and notes that retailers have begun updating product listings.
Nvidia has released its first driver package that offers support for Microsoft’s DirectX 12 Ultimate. As we’ve mentioned before, DirectX 12 Ultimate is more of a rebranding of the API and an attempt at unifying features across platforms (i.e., PC and Xbox Series X). DirectX 12 Ultimate is also attempting to streamline certain features, such as DirectX Raytracing, Variable Rate Shading, Mesh Shaders and Sampler Feedback, to name a few.
With Nvidia’s latest Game Ready Driver package, 451.48, DirectX 12 Ultimate and its feature set should all be accessible on RTX GPUs.
One of the more interesting features that came alongside the Windows 10 version 2004 update (via WDDM 2.7) is hardware-accelerated GPU scheduling. We mentioned this briefly a few weeks ago, noting that while Windows had greenlit the feature, it lacked driver support from the GPU vendors. With its latest drivers, Nvidia is officially adding support for the feature.
We’re still not entirely clear on how this feature works, or how well it works, for that matter. However, the idea is that the GPU will be able to manage its VRAM independent of the OS, theoretically removing a layer of OS overhead. In turn, that could improve latency and performance, however slightly.