The impetus for Maxwell comes from Nvidia's effort in the smartphone and tablet space. Back in December of last year, we were in Santa Clara learning about Nvidia's Tegra K1 SoC. We already knew that K1's graphics engine was Kepler-based: essentially a single SMX, with notable changes to the structures connecting various subsystems in a bid to optimize for power. But Jonah Alben, senior vice president of GPU engineering, also made it clear that every new architecture, from Maxwell onward, would be built with mobile in mind. Engineers would optimize the fabrics between GPU components based on performance targets and power budgets. However, the fundamental building blocks would stay common between segments, and efficiency would guide the important decisions.
Read Nvidia Tegra K1 In-Depth: The Power Of An Xbox In A Mobile SoC? to learn more about that architecture.
This is clearly good news for the Tegra family, which continues clawing around for a more meaningful slice of market share. K1-based devices aren't even here yet, and we're already thinking about Nvidia's claim that Maxwell offers twice the performance per watt of Kepler, and what such sizable improvements could mean for mobile gaming.
A renewed emphasis on efficiency should be good on the desktop too, though, provided the company's retooled architecture continues scaling up from single- to double- and triple-digit power ceilings.
Fortunately, you won't have to wait long for an answer. The GeForce GTX 750 Ti launching today should demonstrate what Maxwell can do (at least at a 60 W TDP). Nvidia says its more efficient design pulls power consumption way down and nudges performance up, even in a GPU with fewer CUDA cores.
Knowing that it wouldn't have a new process node to lean on, Nvidia had to build Maxwell's improvements around 28 nm manufacturing. In other words, it needed to get its GPUs working smarter, since simply tacking on more resources wasn't an option.
The Maxwell Streaming Multiprocessor
Company representatives tell us that Maxwell's biggest gains come from a redesign of the Streaming Multiprocessor, now abbreviated as SMM. In Kepler, each SMX plays host to 192 CUDA cores, four warp schedulers, and a 256 KB register file. There's also 64 KB serving as shared memory and L1 cache, a separate texture cache, and a uniform cache, plus 16 texture units. The big jump in CUDA core count and control logic helped Nvidia overcome losing Fermi's doubled shader frequency, but the SMX apparently proved difficult to fully utilize in this configuration.
Maxwell attempts to address that by partitioning the SM into four blocks, each with its own instruction buffer, warp scheduler, and pair of dispatch units. Kepler's 256 KB register file now gets split into four 64 KB slices. Each block has 32 CUDA cores, totaling 128 across the SMM (down from Kepler's 192). The previous architecture's 32 load/store and 32 special function units carry over to Maxwell. However, double-precision math is pared back further, to 1/32 the FP32 rate, compared with 1/24 in the mainstream Kepler-based GPUs.
GM107 SMM (Left) Versus GK106 SMX (Right)
Every pair of blocks is tied to a 12 KB texture and L1 cache, adding up to 24 KB per SMM. Block pairs are also associated with four texture units, so each SMM comes armed with eight.
Per SM                   GM107   GK106   Ratio
CUDA Cores               128     192     2/3x
Special Function Units   32      32      1x
Load/Store Units         32      32      1x
Texture Units            8       16      1/2x
Warp Schedulers          4       4       1x
Geometry Engines         1       1       1x
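The per-SM arithmetic above can be checked with a short script. The following minimal Python sketch recomputes the table's GM107/GK106 ratios and the SMM's per-block figures from the numbers quoted in this article. The names `PER_SM`, `ratios`, `smm_partition`, and `fp64_ops_per_clock` are hypothetical helpers for illustration, not anything Nvidia publishes. `fp64_ops_per_clock` also rests on an assumption: it treats double-precision throughput as the FP32 core count scaled by the quoted rate, with one operation per core per clock.

```python
from fractions import Fraction

# Per-SM resources quoted in the table: (GM107 SMM, GK106 SMX).
PER_SM = {
    "CUDA Cores": (128, 192),
    "Special Function Units": (32, 32),
    "Load/Store Units": (32, 32),
    "Texture Units": (8, 16),
    "Warp Schedulers": (4, 4),
    "Geometry Engines": (1, 1),
}


def ratios(table):
    """Return the GM107/GK106 ratio for each resource as an exact fraction."""
    return {name: Fraction(gm107, gk106) for name, (gm107, gk106) in table.items()}


def smm_partition(cores=128, register_kb=256, blocks=4,
                  tex_l1_kb_per_pair=12, tex_units_per_pair=4):
    """Derive per-block and per-SMM figures from the four-block layout.

    Hypothetical helper: defaults are the GM107 numbers quoted in the text,
    where the texture/L1 cache and texture units are shared by block pairs.
    """
    pairs = blocks // 2
    return {
        "cores_per_block": cores // blocks,
        "register_kb_per_block": register_kb // blocks,
        "tex_l1_kb_per_smm": pairs * tex_l1_kb_per_pair,
        "texture_units_per_smm": pairs * tex_units_per_pair,
    }


def fp64_ops_per_clock(fp32_cores, fp64_rate):
    """FP64 ops per SM per clock, ASSUMING one FP32 op per CUDA core per clock."""
    return fp32_cores * fp64_rate


if __name__ == "__main__":
    for name, ratio in ratios(PER_SM).items():
        print(f"{name}: {ratio}x")  # e.g. "CUDA Cores: 2/3x"
    print(smm_partition())
    print("GM107 FP64/clk:", fp64_ops_per_clock(128, Fraction(1, 32)))  # 4
    print("GK106 FP64/clk:", fp64_ops_per_clock(192, Fraction(1, 24)))  # 8
```

Using `Fraction` keeps the ratios exact, so they print in the same 2/3x and 1/2x form the table uses. Under the stated assumption, the FP64 lines also show that the 1/32 rate leaves a full SMM with half the per-clock double-precision throughput of a GK106 SMX (4 versus 8).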