Alder Lake Deep Dive - Intel promises a 19% IPC increase with their Performance Cores
Alder Lake will be released later this year
Published: 19th August 2021 | Source: Intel
Intel's Alder Lake CPUs will use two new core designs: a new Performance-core architecture (previously code-named Golden Cove) to deliver high clock speeds, low latencies, and high levels of single-threaded performance, and new Efficient-cores (previously code-named Gracemont) to enable highly scalable multi-threaded performance while using minimal amounts of power.
Intel Thread Director
To ensure that their new Alder Lake hybrid x86 designs operate at peak efficiency, Intel has created a new hardware/software feature called "Thread Director", which is designed to spread workloads optimally across Intel's Performance cores and Efficient cores. This feature should allow Intel to mitigate the downsides of prior hybrid x86 processors, maximising the performance and efficiency of Alder Lake.
Intel has confirmed that they have been working with Microsoft on Windows 11 to ensure that hybrid x86 CPU architectures are supported. As such, Intel will be a major driver for Windows 11 adoption, as Windows 11 is required to maximise the performance of Intel's latest processors.
We have discussed Intel's Thread Director in more detail here.
Performance x86 Core Architecture
With their performance-oriented Golden Cove CPU cores, Intel's planning to deliver their users a dramatic performance uplift with a wider front-end, deeper registers, improved branch prediction and optimised cache solutions.
When compared with today's Rocket Lake processors, Intel's pledging to deliver their users a significant IPC uplift, building upon the company's existing improvements with Tiger Lake and Rocket Lake. Alder Lake's performance cores should deliver even larger performance improvements over Intel's older Comet Lake and Skylake-series CPU core designs.
Intel’s new Performance-core microarchitecture, previously code-named “Golden Cove,” is designed for speed and pushes the limits of low latency and single-threaded application performance. Workloads are growing in their code footprint and demand more execution capabilities. Datasets are also massively growing along with data bandwidth requirements. Intel’s new Performance-core microarchitecture provides a significant boost in general purpose performance and better support for large code footprint applications.
The Performance-core features a wider, deeper and smarter architecture:
• Wider: six decoders (up from four); eight-wide µop cache (up from six); six allocation (up from five); 12 execution ports (up from 10)
• Deeper: bigger physical register files; deeper re-order buffer with 512 entries
• Smarter: Improved branch prediction accuracy; reduced effective L1 latency; full write predictive bandwidth optimizations in L2
19% IPC increase?
Across a series of general-purpose performance tests, Intel's promising an average IPC increase of 19% with Alder Lake's Performance cores. Remember that this figure is an average, which means that some applications will see significantly smaller gains while others will see significantly larger ones.
Based on Intel's chart below, expect a 19% performance advantage on average, no performance increase at worst and up to a 60% performance boost in rare circumstances.
The Performance-core is the highest performing CPU core Intel has ever built and pushes the limits of low latency and single-threaded application performance with:
• A geomean improvement of ~19% across a wide range of workloads over the current 11th Gen Intel Core processor architecture (Cypress Cove) at iso-frequency (the same clock speed) for general purpose performance
• Exposure for more parallelism and an increase in execution parallelism
• Intel Advanced Matrix Extensions, the next-generation, built-in AI acceleration advancement, for deep learning inference and training performance. It includes dedicated hardware and new instruction set architecture to perform matrix multiplication operations significantly faster
• Reduced latency and increased support for large data and large code footprint applications
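To make the geomean figure concrete, here is a minimal Python sketch of how a geometric mean of per-workload speedups is calculated. The speedup values are hypothetical numbers chosen to span the 0% to 60% range mentioned above; they are not Intel's measurements, and `geomean` is simply an illustrative helper.

```python
import math


def geomean(ratios):
    """Geometric mean of a list of positive performance ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload speedups (new IPC / old IPC) at the same clock
# speed. These are made-up illustration values, NOT Intel's test results.
speedups = [1.00, 1.05, 1.10, 1.15, 1.20, 1.25, 1.60]

gm = geomean(speedups)
arithmetic = sum(speedups) / len(speedups)

print(f"geomean uplift:    {(gm - 1) * 100:.1f}%")
print(f"arithmetic uplift: {(arithmetic - 1) * 100:.1f}%")
print(f"worst case: {(min(speedups) - 1) * 100:.0f}%, "
      f"best case: {(max(speedups) - 1) * 100:.0f}%")
```

A geomean is less sensitive to a single outlier than a plain average, which is why one 60% result does not drag the headline number far above what most workloads see.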
Intel's power-efficient cores are designed to do two things: deliver a lot of multi-threaded performance while consuming little power, and deliver high core counts while using minimal die area. Intel's block diagrams show that the company can fit around four Efficient cores within the same die area as a larger high-performance core. This allows Intel to deliver more multi-threaded performance while efficiently using die space and power, which is great news for Intel.
Performance-wise, Intel says that its efficient cores are more powerful and power-efficient than their Skylake series cores. In single-threaded workloads, they can deliver 40% more performance per core while consuming the same power, or the same performance while consuming less than 40% of the power. Given the performance of Intel's Skylake series cores, these are not bad results from Intel, especially when these cores won't be doing the heavy lifting for Alder Lake.
Intel’s new Efficient-core microarchitecture, previously code-named “Gracemont,” is designed for throughput efficiency, enabling scalable multithreaded performance for modern multitasking. This is Intel’s most efficient x86 microarchitecture with an aggressive silicon area target so that multicore workloads can scale out with the number of cores. It also delivers a wide frequency range. The microarchitecture and focused design effort allow Efficient-core to run at low voltage to reduce overall power consumption, while creating the power headroom to operate at higher frequencies. This allows Efficient-core to ramp up performance for more demanding workloads.
Efficient-core utilizes a variety of technical advancements to prioritize workloads without being wasteful with processing power and to directly enhance performance with features that improve instruction per cycle (IPC), including:
• 5,000 entry branch target cache that results in more accurate branch prediction
• 64 kilobyte instruction cache to keep useful instructions close without expending memory subsystem power
• Intel’s first on-demand instruction length decoder that generates pre-decode information
• Intel’s clustered out-of-order decoder that enables decoding up to six instructions per cycle while maintaining energy efficiency
• A wide back end with five-wide allocation and eight-wide retire, 256 entry out-of-order window and 17 execution ports
• Robust security features that support Intel control-flow enforcement technology and Intel virtualization technology redirection protection
• The implementation of the AVX ISA, along with new extensions to support integer artificial intelligence (AI) operations
Compared with the Skylake CPU core, Intel’s most prolific central processing unit (CPU) microarchitecture, in single-thread performance, the Efficient-core achieves 40% more performance at the same power or delivers the same performance while consuming less than 40% of the power. For throughput performance, four Efficient-cores offer 80% more performance while still consuming less power than two Skylake cores running four threads or the same throughput performance while consuming 80% less power.
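As a rough worked example, Intel's quoted figures can be converted into performance-per-watt ratios. The sketch below normalises Skylake to a performance of 1.0 and a power of 1.0; the `perf_per_watt` helper and the normalisation are our own assumptions for illustration. Where Intel only says "less" power, the upper bound is used, so those results are conservative floors.

```python
def perf_per_watt(perf: float, power: float) -> float:
    """Performance per watt, with the Skylake baseline at perf=1.0, power=1.0."""
    return perf / power


# Single-thread claims: one Efficient-core versus one Skylake core.
st_same_power = perf_per_watt(1.40, 1.00)  # 40% more performance, same power
st_same_perf = perf_per_watt(1.00, 0.40)   # same performance at 40% of the power
                                           # (Intel says "less than", so a floor)

# Throughput claims: four Efficient-cores versus two Skylake cores (four threads).
mt_more_perf = perf_per_watt(1.80, 1.00)   # 80% more performance; Intel says power
                                           # is "less", so 1.00 is an upper bound
mt_same_perf = perf_per_watt(1.00, 0.20)   # same throughput, 80% less power

for label, value in [
    ("single-thread, same power", st_same_power),
    ("single-thread, same performance", st_same_perf),
    ("throughput, more performance", mt_more_perf),
    ("throughput, same performance", mt_same_perf),
]:
    print(f"{label}: at least {value:.2f}x Skylake perf/W")
```

The efficiency advantage is largest when the Efficient-cores are held to Skylake's performance level rather than pushed for extra performance, which is consistent with Intel's description of a core designed to run at low voltage.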
So far, Intel has revealed three Alder Lake CPU dies, one focused on desktop performance, another focused on the mobile market, and another that targets ultra-mobile devices like Ultrabooks.
All of these CPU designs use the same building blocks, offering users the same Performance Cores, Efficient Cores, Xe graphics, and other functional blocks.
We can see below that Intel's desktop model features eight performance cores and eight efficient cores, while Intel's mobile model sheds two performance cores in favour of a larger integrated Xe graphics solution. Intel's smaller Ultra Mobile chip sheds all but two performance cores to minimise the processor's die size and maximise the CPU's use of efficient cores.
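The three configurations can be tallied with a short Python sketch. The core counts follow our reading of the description above, assuming that the mobile and Ultra Mobile dies keep all eight efficient cores. The thread counts assume two threads per Performance core (Hyper-Threading) and one per Efficient core, which is not stated in this article, and the area estimate uses the "around four Efficient cores per Performance core" figure from Intel's block diagrams, so it is an approximation of CPU core area only. The `Die` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str
    p_cores: int
    e_cores: int

    def cores(self) -> int:
        return self.p_cores + self.e_cores

    def threads(self, p_threads: int = 2, e_threads: int = 1) -> int:
        # Assumption: P-cores run two threads each, E-cores run one.
        return self.p_cores * p_threads + self.e_cores * e_threads

    def core_area_in_p_units(self, e_per_p: float = 4.0) -> float:
        # Assumption: about four E-cores fit in the area of one P-core.
        return self.p_cores + self.e_cores / e_per_p


dies = [
    Die("Desktop", p_cores=8, e_cores=8),
    Die("Mobile", p_cores=6, e_cores=8),        # sheds two P-cores
    Die("Ultra Mobile", p_cores=2, e_cores=8),  # sheds all but two P-cores
]

for die in dies:
    print(f"{die.name}: {die.cores()} cores, {die.threads()} threads, "
          f"~{die.core_area_in_p_units():.1f} P-core areas")
```

Under these assumptions, the Ultra Mobile die keeps ten of the desktop die's sixteen cores while using well under half of its CPU core area.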
Alder Lake is a top-to-bottom solution for the consumer PC market, delivering high performance levels for desktop users while making clever power/efficiency tradeoffs to cater to specific areas of the mobile market.
Intel's desktop Alder Lake processors will ship later this year with support for DDR5 memory, PCIe 5.0 connectivity, WiFi 6E and Thunderbolt 4.
You can join the discussion on Intel's Alder Lake hybrid CPU architecture on the OC3D Forums.