Cerebras Unveils 2nd Gen Wafer Scale Engine: 850,000 Cores, 2.6 Trillion Transistors
Cerebras is back with the second generation of its Wafer Scale Engine. WSE 2.0 — sadly, the name “Son of Wafer-Scale” appears to have died in committee — is a 7nm die shrink of the original, with far more cores, more RAM, and 2.6 trillion transistors, with a “T.” Makes the 54 billion on your average Nvidia A100 look a bit pedestrian, for a certain value of “pedestrian.”
The concept of a wafer-scale engine is simple: Instead of etching dozens or hundreds of chips into a wafer and then packaging those CPUs or GPUs for individual resale, why not use an entire wafer (or most of a wafer, in this case) for one enormous processor?
People have tried this trick before, with no success, but that was before modern yields improved to the point where building 850,000 cores on a piece of silicon the size of a cutting board was a reasonable idea. In 2019, the Cerebras WSE-1 raised eyebrows by offering 400,000 cores, 18GB of on-chip memory, and 9PB/s of memory bandwidth, with 100Pb/s of fabric bandwidth across the wafer. Today, the WSE-2 offers 850,000 cores, 40GB of on-chip SRAM, and 20PB/s of on-wafer memory bandwidth. Total fabric bandwidth has increased to 220Pb/s.
While the new WSE-2 is certainly bigger, there’s little sign that it’s architecturally different. The top-line stat improvements are all impressive, but the gains are commensurate across the board, which is to say: A 2.125x increase in core count is matched by a roughly 2.2x increase in RAM, in memory bandwidth, and in fabric bandwidth. The actual amount of RAM, RAM bandwidth, or fabric bandwidth, evaluated on a per-core basis, is virtually identical between the two WSEs: each rises by less than 5 percent.
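To make the per-core comparison concrete, here is a minimal Python sketch that recomputes the ratios from the figures quoted above. The dictionary keys and function names are invented for this illustration, and the conversions assume decimal units (1GB = 1,000,000KB, 1PB/s = 1,000,000GB/s), which Cerebras may or may not use in its own spec sheets.

```python
# Headline figures as quoted in the article. Key names are illustrative only.
WSE1 = {"cores": 400_000, "sram_gb": 18, "mem_bw_pb_s": 9, "fabric_pbit_s": 100}
WSE2 = {"cores": 850_000, "sram_gb": 40, "mem_bw_pb_s": 20, "fabric_pbit_s": 220}


def generational_ratios(old, new):
    """Return new/old for every headline figure."""
    return {key: new[key] / old[key] for key in old}


def per_core(spec):
    """Divide each resource by the core count (decimal units assumed)."""
    cores = spec["cores"]
    return {
        "sram_kb": spec["sram_gb"] * 1e6 / cores,              # GB -> KB
        "mem_bw_gb_s": spec["mem_bw_pb_s"] * 1e6 / cores,      # PB/s -> GB/s
        "fabric_gbit_s": spec["fabric_pbit_s"] * 1e6 / cores,  # Pb/s -> Gb/s
    }


if __name__ == "__main__":
    for name, ratio in generational_ratios(WSE1, WSE2).items():
        print(f"{name}: {ratio:.3f}x")

    old_pc, new_pc = per_core(WSE1), per_core(WSE2)
    for name in old_pc:
        change = (new_pc[name] / old_pc[name] - 1) * 100
        print(f"{name} per core: {old_pc[name]:.1f} -> {new_pc[name]:.1f} ({change:+.1f}%)")
```

Under those assumptions, each core gets about 45KB of SRAM on the WSE-1 and about 47KB on the WSE-2, which is the "virtually identical" result in numeric form.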
Normally, with a second-generation design like this, we’d expect the company to make some resource allocation changes or to scale out some specific aspect of the design, such as adjusting the ratios between core counts, memory bandwidth, and total RAM. The fact that Cerebras scaled the WSE-1 up into the WSE-2 while keeping those ratios essentially unchanged implies the company targeted its initial hardware well and could meet the desires of its customer base without compromising or changing other aspects of the WSE architecture.
One of Cerebras’ arguments in favor of its own designs is the simplicity of scaling a workload across a single WSE, rather than attempting to scale across the dozens or hundreds of GPUs that might be required to match its performance. It isn’t clear how easy it is to adapt workloads to the WSE-1 or WSE-2, and there don’t seem to be a lot of independent benchmarks available yet to compare scaling between the WSE-1 or WSE-2 and equivalent Nvidia cards. We would expect the WSE-2 to have the advantage in scaling, assuming the relevant workload fits the characteristics of both systems equally, due to the intrinsic difficulty of splitting a workload efficiently across an ever-larger number of accelerator cards.
Cerebras doesn’t appear to have published any benchmarks comparing the WSE-1 or WSE-2 against other systems, so we’re still in a holding pattern as far as that kind of data goes. Moving on from the WSE-1 to the WSE-2 this quickly, however, does imply some customer interest in the chip.