L2 vs. L3 cache: What’s the Difference?
CPUs have several levels of cache. Our L1 & L2 explainer covered cache structures in general, but we haven’t spent as much time on how an L3 works or how it differs from an L1 or L2 cache.
At the simplest level, an L3 cache is just a larger, slower version of the L2 cache. Back when most chips were single-core processors, this was generally true. The first L3 caches were actually built on the motherboard itself, connected to the CPU over the front-side bus (as distinct from a dedicated back-side cache bus). When AMD launched its K6-III processor family, many existing K6/K6-2 motherboards could accept a K6-III as well. Typically these boards had 512KB to 2MB of L2 cache; when a K6-III, with its integrated L2 cache, was inserted, these slower, motherboard-based caches became L3 instead.
By the early 2000s, adding an L3 cache to a chip had become an easy way to improve performance. Intel’s first consumer-oriented Pentium 4 “Extreme Edition” was a repurposed Gallatin Xeon with a 2MB on-die L3. Adding that cache was sufficient to buy the Pentium 4 EE a 10-20 percent performance boost over the standard Northwood line.
Cache and the Multi-Core Curveball
As multicore processors became more common, L3 cache started appearing more frequently on consumer hardware. These chips, like Intel’s Nehalem and AMD’s K10 (Barcelona), used L3 as more than just a larger, slower backstop for L2. In addition to that role, the L3 cache is often shared between all of the cores on a single piece of silicon. That’s in contrast to the L1 and L2 caches, both of which tend to be private and dedicated to the needs of each particular core. (AMD’s Bulldozer family is an exception: Bulldozer, Piledriver, and Steamroller all share a common L1 instruction cache between the two cores in each module.) AMD’s Ryzen processors based on the Zen, Zen+, and Zen 2 cores all nominally share a common L3, but the L3 is physically split between AMD’s CCX modules. An eight-core Zen chip, for example, functions more like it has 2x8MB L3 caches, one for each CCX cluster, as opposed to one large, unified L3 cache like a standard Intel CPU.
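To make the CCX point concrete, here is a minimal Python sketch of a chip built from identical core complexes. The `CcxLayout` class and its method names are hypothetical, invented for illustration, and the layout of two four-core CCXs with 8MB of L3 each is an assumption chosen to match the 2x8MB example above. It models capacity only, not latency or coherency traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcxLayout:
    """Hypothetical model of a chip made of identical core complexes (CCXs)."""
    cores_per_ccx: int
    l3_mb_per_ccx: int
    ccx_count: int

    def ccx_of(self, core: int) -> int:
        """Return the index of the CCX that owns the given core."""
        if not 0 <= core < self.cores_per_ccx * self.ccx_count:
            raise ValueError(f"core {core} is out of range")
        return core // self.cores_per_ccx

    def total_l3_mb(self) -> int:
        """L3 capacity summed over every CCX (the headline number)."""
        return self.l3_mb_per_ccx * self.ccx_count

    def local_l3_mb(self, core: int) -> int:
        """L3 capacity a core can reach without leaving its own CCX."""
        self.ccx_of(core)  # validates the core index
        return self.l3_mb_per_ccx

    def share_local_l3(self, a: int, b: int) -> bool:
        """True when two cores sit behind the same L3 slice."""
        return self.ccx_of(a) == self.ccx_of(b)


# Assumed layout: two four-core CCXs with 8MB of L3 each.
chip = CcxLayout(cores_per_ccx=4, l3_mb_per_ccx=8, ccx_count=2)
print(chip.total_l3_mb())         # 16
print(chip.local_l3_mb(0))        # 8
print(chip.share_local_l3(0, 3))  # True
print(chip.share_local_l3(0, 4))  # False
```

The chip advertises 16MB of L3 in total, but any one core only has 8MB close at hand, and cores 0 and 4 do not share a local slice at all.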
Private L1/L2 caches and a shared L3 are hardly the only way to design a cache hierarchy, but it’s a common approach that multiple vendors have adopted. Giving each individual core a dedicated L1 and L2 cuts access latencies and reduces the chance of cache contention, meaning one core is less likely to evict vital data that another core has cached in favor of its own workload. The common L3 cache is slower but much larger, which means it can store data for all the cores at once. Sophisticated placement algorithms try to ensure that Core 0 tends to keep its data in the part of the L3 closest to itself, while Core 7, across the die, does the same.
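One way to see why a slower but larger L3 still pays off is the textbook average-memory-access-time calculation, sketched below in Python. Every latency and hit rate here is an illustrative assumption, not a measurement of any real CPU, and `average_access_cycles` is a hypothetical helper name.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cost of a memory access through a multi-level cache hierarchy.

    levels: list of (hit_cycles, hit_rate) pairs ordered from L1 outward,
            where hit_rate is the fraction of accesses *reaching* that
            level that hit in it.
    memory_cycles: cost of going to DRAM after missing every cache level.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_cycles, hit_rate in levels:
        total += reach * hit_cycles  # every access reaching a level pays its lookup
        reach *= 1.0 - hit_rate      # only the misses continue outward
    return total + reach * memory_cycles


# Illustrative, assumed numbers: (latency in cycles, local hit rate).
l1 = (4, 0.90)
l2 = (12, 0.80)
l3 = (40, 0.75)
dram_cycles = 200

with_l3 = average_access_cycles([l1, l2, l3], dram_cycles)
without_l3 = average_access_cycles([l1, l2], dram_cycles)
print(f"with L3:    {with_l3:.1f} cycles")     # with L3:    7.0 cycles
print(f"without L3: {without_l3:.1f} cycles")  # without L3: 9.2 cycles
```

Under these assumptions the L3 is ten times slower than the L1, yet it still lowers the average access cost because it catches most of the misses that would otherwise go all the way to DRAM.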
Unlike the L1 and L2, which are nearly always CPU-focused and private, the L3 can also be shared with other devices or capabilities. Intel’s Sandy Bridge CPUs shared an 8MB L3 cache with the on-die graphics core (Ivy Bridge gave the GPU its own dedicated slice of L3 cache in lieu of sharing the entire 8MB). Intel’s Tiger Lake documentation indicates that the onboard CPU cache can also function as a last-level cache (LLC) for the GPU.
In contrast to the L1 and L2 caches, whose sizes are typically fixed and vary only very slightly (and mostly for budget parts), both AMD and Intel offer different chips with significantly different amounts of L3. Intel typically sells at least a few Xeons with lower core counts, higher frequencies, and a higher ratio of L3 cache per core. AMD’s Epyc 7F52 pairs a full 256MB L3 cache with just 16 cores and 32 threads.
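The cache-per-core ratio is simple arithmetic, sketched here in Python. The `l3_mb_per_core` helper is a hypothetical name, and the 64-core comparison part is an assumed configuration included only to show how the ratio moves; the Epyc 7F52 figures are the ones quoted above.

```python
def l3_mb_per_core(l3_mb: float, cores: int) -> float:
    """Return the amount of L3 cache, in MB, available per CPU core."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return l3_mb / cores


# Epyc 7F52, as described above: 256MB of L3 across 16 cores.
print(f"{l3_mb_per_core(256, 16):.1f} MB per core")  # 16.0 MB per core

# Assumed comparison: the same 256MB spread across a hypothetical 64-core part.
print(f"{l3_mb_per_core(256, 64):.1f} MB per core")  # 4.0 MB per core
```

Holding the L3 constant while cutting the core count is what pushes the ratio up, which is the trade the low-core-count, high-frequency server parts are making.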
Today, the L3 is characterized as a pool of fast memory common to all the cores on an SoC. It’s often gated independently from the CPU cores and can be dynamically partitioned to balance access speed, power consumption, and storage capacity. While not nearly as fast as L1 or L2, it’s often more flexible and plays a vital role in managing inter-core communication. It’s also not uncommon to see an L3 used as a last-level cache shared by CPU and GPU, or even to see a huge L3 cache pop up on graphics cards built on architectures like AMD’s RDNA2.