Summary of the content on page 1
A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel® Pentium® 4 Processor
November, 2000
Summary of the content on page 2
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use
Summary of the content on page 3
Revision History
Revision Date    Revision    Major Changes
11/2000          1.0         Release
Summary of the content on page 4
Table of Contents
ABOUT THIS DOCUMENT ..... 5
INTRODUCTION ..... 6
SIMD TECHNOLOGY AND STREAMING SIMD EXTENSIONS 2 .....
Summary of the content on page 5
About this Document
The Intel® NetBurst™ micro-architecture is the foundation for the Intel® Pentium® 4 processor. It includes several important new features and innovations that will allow the Intel Pentium 4 processor and future IA-32 processors to deliver industry-leading performance for the next several years. This paper provides an in-depth examination of the features and functions th
Summary of the content on page 6
Introduction
The Intel® Pentium® 4 processor, utilizing the Intel® NetBurst™ micro-architecture, is a complete processor redesign that delivers new technologies and capabilities while advancing many of the innovative features, such as “out-of-order speculative execution” and “super-scalar execution”, introduced on prior Intel® micro-architectural generations. Many of these new innovati
Summary of the content on page 7
computations to operate on packed double-precision floating-point data elements and 128-bit packed integers. There are 144 instructions in SSE2 that can operate on two packed double-precision floating-point data elements, or on 16 packed byte, 8 packed word, 4 packed doubleword, and 2 packed quadword integers. The full set of IA-32 SIMD technologies (the Intel MMX technology, the SSE extensions, and the
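The element counts quoted above follow from dividing the 128-bit operand width by the width of each element type. The sketch below only illustrates that arithmetic; the names `ELEMENT_BITS`, `lanes_per_register`, and `LANES` are hypothetical and do not come from Intel's documentation.

```python
# Illustrative arithmetic only: how many elements of each type fit in one
# 128-bit SSE2 operand. All names here are hypothetical, chosen for this sketch.
REGISTER_BITS = 128

# Element widths in bits for the data types named in the text.
ELEMENT_BITS = {
    "packed byte": 8,
    "packed word": 16,
    "packed doubleword": 32,
    "packed quadword": 64,
    "packed double-precision float": 64,
}


def lanes_per_register(element_bits, register_bits=REGISTER_BITS):
    """Return how many elements of the given width fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# Map each data type to its element count per 128-bit operand.
LANES = {name: lanes_per_register(bits) for name, bits in ELEMENT_BITS.items()}
```

Looking up `LANES["packed word"]`, for instance, gives the eight 16-bit words that share one 128-bit operand.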
Summary of the content on page 8
The SSE instructions are useful for 3D geometry, 3D rendering, speech recognition, and video encoding and decoding. For more information on the Streaming SIMD Extensions, refer to the IA-32 Intel® Architecture Software Developer’s Manual, Volume 1, available at http://developer.intel.com/design/pentium4/manuals/.
Streaming SIMD Extensions 2
• Adds 128-bit data type with two packed double-precisio
Summary of the content on page 9
Intel® NetBurst™ Micro-architecture
The Pentium® 4 processor is the first hardware implementation of a new micro-architecture, the Intel NetBurst micro-architecture. To help the reader understand this new micro-architecture, this section examines in detail the following:
• the design considerations of the Intel NetBurst micro-architecture
• the building blocks that make up this new micro-architectu
Summary of the content on page 10
Overview of the Intel® NetBurst™ Micro-architecture Pipeline
The pipeline of the Intel NetBurst micro-architecture contains three sections:
• the in-order issue front end
• the out-of-order superscalar execution core
• the in-order retirement unit.
[Figure 3: The Intel® NetBurst™ Micro-architecture. Recoverable labels: System Bus; Frequently used paths]
The front end supplies instructions in program order to the
Summary of the content on page 11
µops called traces, which are stored in the execution trace cache. The execution trace cache stores these µops in the path of program execution flow, where the results of branches in the code are integrated into the same cache line. This increases the instruction flow from the cache and makes better use of the overall cache storage space since the cache no longer stores instructions that are
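To make the idea of storing µops "in the path of program execution flow" concrete, the toy model below contrasts a line filled in address order with a line filled in execution order. It is a simplified sketch, not a description of the actual hardware: the program layout, the µop names, and the helper `build_trace` are all hypothetical, and every branch is assumed to go the way it is recorded.

```python
# Toy model only. Each entry maps an address to (uop_name, taken_target);
# taken_target is None for a uop that simply falls through to the next address.
PROGRAM = {
    0: ("compare", None),
    1: ("branch (taken)", 4),   # assumed to jump over addresses 2 and 3
    2: ("skipped_a", None),
    3: ("skipped_b", None),
    4: ("add", None),
    5: ("store", None),
}


def build_trace(program, start, max_uops):
    """Collect up to max_uops uops in execution order, following taken branches."""
    trace = []
    address = start
    while address in program and len(trace) < max_uops:
        uop, taken_target = program[address]
        trace.append(uop)
        # Follow the branch target when there is one, otherwise fall through.
        address = address + 1 if taken_target is None else taken_target
    return trace


# A four-entry line filled in address order holds two uops that never execute.
SEQUENTIAL_LINE = [PROGRAM[address][0] for address in range(4)]

# A four-entry line filled in execution order holds only uops that execute.
TRACE_LINE = build_trace(PROGRAM, start=0, max_uops=4)
```

In this model the address-ordered line spends half its entries on `skipped_a` and `skipped_b`, while the execution-ordered line carries the branch and the code at its target together.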
Summary of the content on page 12
Prefetching
The Intel NetBurst micro-architecture supports three prefetching mechanisms:
• the first is for instructions only
• the second is for data only
• the third is for code or data.
The first mechanism is a hardware instruction fetcher that automatically prefetches instructions. The second is a software-controlled mechanism that fetches data into the caches using the prefetch instructions
Summary of the content on page 13
The Static Predictor. Once the branch instruction is decoded, the direction of the branch (forward or backward) is known. If there is no valid entry in the BTB for the branch, the static predictor makes a prediction based on the direction of the branch. The static prediction mechanism predicts backward conditional branches (those with negative displacement), such as loop-closing branches, as
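The fallback from the BTB to the static predictor can be sketched as follows. The excerpt above is cut off before it states the predicted outcome; the sketch assumes the rule given in Intel's published NetBurst documentation, namely that backward conditional branches are predicted taken and forward ones not taken. The BTB is modeled as a plain dictionary from branch address to a predicted direction, which is a deliberate simplification, and the function names are hypothetical.

```python
def static_predict_taken(displacement):
    """Direction-based static prediction.

    Assumption: a negative displacement (backward branch, such as a
    loop-closing branch) is predicted taken; a forward branch is not.
    """
    return displacement < 0


def predict_taken(branch_address, displacement, btb):
    """Use the BTB entry when one exists, otherwise fall back to the static rule.

    btb is a simplified stand-in: a dict mapping branch address to a
    predicted direction (True for taken).
    """
    if branch_address in btb:
        return btb[branch_address]
    return static_predict_taken(displacement)
```

With an empty BTB, `predict_taken(0x40, -16, {})` follows the static rule, while an entry such as `{0x40: True}` overrides it even for a forward branch.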
Summary of the content on page 14
• selecting IA-32 instructions that can be decoded into fewer than 4 µops and/or have short latencies
• ordering IA-32 instructions to preserve available parallelism by minimizing long dependence chains and covering long instruction latencies
• ordering instructions so that their operands are ready and their corresponding issue ports and execution units are free when they reach the scheduler.
T
Summary of the content on page 15
Port 3. Port 3 supports the dispatch of one store address operation per cycle. Thus, the total issue bandwidth can range from zero to six µops per cycle. Each pipeline contains several execution units. The µops are dispatched to the pipeline that corresponds to their type of operation. For example, an integer arithmetic logic unit and the floating-point execution units (adder, multiplier, and div
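The "zero to six µops per cycle" range is the sum of the per-port limits. Only the Port 3 limit appears in the excerpt above; the limits used below for Ports 0, 1, and 2 (two, two, and one µop per cycle) are an assumption taken from Intel's description of the NetBurst issue ports, and the names in the sketch are hypothetical.

```python
# Assumed per-port dispatch limits in uops per cycle. Only the port 3 value is
# stated in the excerpt; the others are assumptions for this sketch.
MAX_UOPS_PER_CYCLE = {"port 0": 2, "port 1": 2, "port 2": 1, "port 3": 1}


def peak_issue_bandwidth(port_limits=MAX_UOPS_PER_CYCLE):
    """Upper bound on uops dispatched in one cycle across all ports."""
    return sum(port_limits.values())


def issued_this_cycle(ready_uops, port_limits=MAX_UOPS_PER_CYCLE):
    """uops dispatched in one cycle, given how many are ready for each port.

    ready_uops maps a port name to the number of uops waiting on it;
    each port dispatches at most its own limit.
    """
    return sum(
        min(ready_uops.get(port, 0), limit)
        for port, limit in port_limits.items()
    )
```

For example, three µops waiting on port 0 and one on port 3 yield three dispatched µops in that cycle, because port 0 is capped at two.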
Summary of the content on page 16
b) avoiding the need to access off-chip caches, which can increase the realized bandwidth compared to a normal load-miss, which returns data to all cache levels.
The situations that are less likely to benefit from software-controlled data prefetch are the following:
• In cases that are already bandwidth bound, prefetching tends to increase bandwidth demands, and thus tends not to be effective.
• Prefet
Summary of the content on page 17
branches are resolved. However, speculative loads cannot cause page faults. Reordering loads with respect to each other can prevent a load miss from stalling later loads. Reordering loads with respect to other loads and stores to different addresses can enable more parallelism, allowing the machine to execute more operations as soon as their inputs are ready. Writes to memory are always carrie