Intel Core i7 Presentation

"OC3D were present at Intel's recent Core i7 presentation at Heathrow. We'll take a look at some of the new features and what they do."

 Nehalem's Cache Structure
 Core / Uncore
Intel have implemented quite a few changes to the cache structure in Nehalem compared to Penryn, including an all-new L3 cache.
 
Something new that Intel is bringing to us with this modular design, shown on the slide to the right, is the "uncore".  In short, everything other than the cores and their own cache is in the "uncore", such as the integrated memory controller, QPI links and the shared L3 cache.  All of the components in the "uncore" are completely modular and so can be scaled according to the section of the market each chip is being aimed at.  Intel can add or remove cores, QPI links, integrated graphics (which Intel say will come in late 2009) and they could even add another integrated memory controller if they so wish.
 
 
 
The 64KB L1 cache comprises a 32KB instruction cache and a 32KB data cache, the same as we see in Penryn.  However, the L2 cache is a totally new design compared to what we see in the Core 2 CPUs of today.  Each core receives 256KB of unified, extremely low-latency cache that scales well and keeps extra load off the L3 cache.  While this 256KB is a lot smaller than Penryn's L2, it is much quicker: it takes only 10 cycles from load to get data out of the L2 cache.  Each core within a Nehalem CPU will have its L1 & L2 cache integrated within the core itself.
 
The L3 cache that is coming with Nehalem is totally new to Intel, and is also very similar in design to that of AMD's Phenom CPUs.  It is an inclusive cache that Intel can scale according to how many cores are in any given processor.  An inclusive cache means that ALL of the data residing in the L1 or L2 caches within each core will also reside within the L3 cache.  What this achieves is better performance, and in turn lower power consumption, because any core knows that if it can't find the information it's looking for within the L3 cache, then that information doesn't exist within any core's L1 or L2 cache either.  This helps lower "core snoop traffic", which will become more of a problem as CPUs comprise more and more cores.
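
To make the snoop-filtering idea concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions: the Chip class and its load method are hypothetical names invented for this example, caches are modelled as unbounded sets, and only the L3-miss case described above is counted. It is not a description of how Intel actually implements the cache.

```python
# Toy model only: names and behaviour are illustrative assumptions,
# not Intel's actual design.

class Chip:
    def __init__(self, n_cores, inclusive):
        self.private = [set() for _ in range(n_cores)]  # each core's L1/L2 contents
        self.l3 = set()                                 # shared L3 contents
        self.inclusive = inclusive

    def load(self, core, addr):
        """Bring addr into a core's private caches; return the snoops sent."""
        if addr in self.private[core]:
            return 0  # private-cache hit, nothing to do
        snoops = 0
        if addr not in self.l3 and not self.inclusive:
            # L3 miss in a non-inclusive design: another core may still hold
            # the line, so every other core has to be probed.
            snoops = len(self.private) - 1
        # With an inclusive L3, the miss already proves no core holds the line.
        self.private[core].add(addr)
        if self.inclusive:
            self.l3.add(addr)  # keep L3 a superset of all L1/L2 data
        return snoops


for n_cores in (2, 4, 8):
    with_inclusive = Chip(n_cores, inclusive=True).load(0, 0x40)
    without = Chip(n_cores, inclusive=False).load(0, 0x40)
    print(n_cores, "cores: snoops on an L3 miss =", with_inclusive, "vs", without)
```

In this toy model the non-inclusive snoop count grows with the number of cores while the inclusive one stays at zero, which is the scaling concern the paragraph above describes.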
 
 
Hyper Threading
 
The launch of Nehalem brings the return of Hyper Threading, also known as simultaneous multi-threading (SMT).
 
With Nehalem, Intel is a hell of a lot more prepared for the implementation of HT than it was when the technology last appeared, in the Pentium 4 processors.  This is largely due to the massive memory bandwidth and larger caches now available, which help get data to the cores faster and more predictably.
 
An operating system will see an HT-enabled processor as multiple processors, i.e. a quad core would show up as 8 cores when in reality it has 4 cores and 8 threads.  The OS would then proceed to send 8 threads of instructions to the CPU.
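
As a quick illustration of that counting, the short Python sketch below multiplies physical cores by threads per core. The logical_processors helper is a hypothetical name made up for this example; os.cpu_count() is the standard-library call that reports how many logical processors the OS exposes on whatever machine runs the script, and it can return None if the count cannot be determined.

```python
import os


def logical_processors(physical_cores, threads_per_core=2):
    """Logical processors the OS would see (hypothetical helper for illustration)."""
    return physical_cores * threads_per_core


# A quad core with HT enabled: 4 cores x 2 threads each.
print("Quad core with HT:", logical_processors(4))

# What the OS reports on the machine running this script.
print("This machine:", os.cpu_count())
```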

Intel felt that with Nehalem the time was right to re-implement HT, not only for the reasons above, but because at this point in time there are a lot more applications that can actually take advantage of this technology.
 
Intel are also very happy to use this technology again, and to build on it in the future, due to its performance/die size ratio.  Its percentage performance gain is actually larger than the percentage of die real estate it occupies.  Ronak explained that in general, when implementing a technology, they hope for a 1:1 ratio of gains to the die area it consumes.  He also said that using HT is much more power efficient than adding an entire core.  The inclusion of HT on Nehalem only takes up roughly 5% - 10% of the die area, and the gains seen in supported applications are generally higher than this figure.
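
The 1:1 rule of thumb is simple arithmetic, sketched below in Python. The gain_to_area_ratio function is a hypothetical helper written for this example, and the 20% performance gain is a made-up input chosen purely for illustration, not a measured result. Only the 5% - 10% die area range comes from the presentation.

```python
def gain_to_area_ratio(gain_pct, area_pct):
    """Performance gain per unit of die area spent (hypothetical helper)."""
    return gain_pct / area_pct


ASSUMED_GAIN_PCT = 20  # illustrative figure only, not a benchmark result

for area_pct in (5, 10):  # die area range quoted for HT on Nehalem
    ratio = gain_to_area_ratio(ASSUMED_GAIN_PCT, area_pct)
    verdict = "beats" if ratio > 1 else "does not beat"
    print(f"{area_pct}% area, {ASSUMED_GAIN_PCT}% gain: "
          f"ratio {ratio:.1f}, {verdict} the 1:1 target")
```

Any feature whose ratio comes out above 1.0 gives back more performance than the share of die area it costs.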
 
 
One other thing I feel I should mention: Ronak explained that more bandwidth-hungry applications may not see a gain from HT at all.  This is because the bandwidth is already saturated by the data from 4 cores, and adding more threads could actually be detrimental to performance.
 
Here is a little slide showing the performance of HT, with HT disabled as the 0% baseline and the bars showing the gains with HT enabled.
 
HT Performance Chart
 
 
On the next page we look at the IMC, QPI & Power Management

