# Intel® Open Source HD Graphics and Intel Iris™ Graphics

# **Programmer's Reference Manual**

For the 2014-2015 Intel Core™ Processors, Celeron™ Processors and Pentium™ Processors based on the "Broadwell" Platform

Volume 4: Configurations

May 2015, Revision 1.0

#### **Creative Commons License**

**You are free to Share** - to copy, distribute, display, and perform the work under the following conditions:

- **Attribution.** You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- **No Derivative Works.** You may not alter, transform, or build upon this work.

#### **Notices and Disclaimers**

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death.
SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Implementations of the I2C bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

\* Other names and brands may be claimed as the property of others.

Copyright © 2015, Intel Corporation. All rights reserved.
## **Table of Contents**

- Configurations Overview
- Top Level Block Diagrams
- GT1 Configuration
- Device Attributes
- Steppings and Device IDs

# **Configurations Overview**

The Intel "Gen" Graphics Architecture was first introduced to the market in 2004. Since that time, the architecture and its implementation have evolved to add many new features, increase performance, and improve power efficiency. This volume of the Programmer's Reference Manual provides information about architectural attributes, feature sets, and performance.

## **Top Level Block Diagrams**

The following diagram shows the basic feature blocks of the Broadwell (BDW) graphics architecture arranged in a GT3 configuration, with the portion above the "Chop" line representing the GT2 configuration:

*Figure: BDW GT3 top-level block diagram, with the GT2 portion above the "Chop" line (image not reproduced).*

This diagram is based on the following functional partitions:

- (a) Geometry Fixed Functions
- (b) Media Fixed Functions
- (c) Global Assets and GT Interface
- (d) One or more Subslices (three shown)
- (e) A Slice-Common block
- (f) An L3 Cache (L3\$) block

Note that the combination of (a), (b), and (c) is typically referred to as the "unslice", while the combination of (d), (e), and (f) is referred to as a compute "slice".

The functionality in each of these groupings is further broken down as follows:

- **Unslice**: Fixed-function pipelines for 3D, GPGPU, and Media operations, and the interface to the outside world.
  - The 3D Geometry / Fixed Function (Geom/FF) block, consisting of:
    - 3D fixed function pipeline (CS, VF, VS, HS, TE, DS, GS, SOL, SL, SFE)
    - Video Front-End unit (VFE)
    - Thread Spawner unit (TSG) and the global Thread Dispatcher unit (TDG)
    - Unified Return Buffer Manager (URBM)
  - Media fixed function assets:
    - Video Decode (VD) Box
    - Video Encode (VE) Box
    - Wireless Display (WD) Box
  - The Global Assets (GA) block, the primary interface and memory stream gateway to the outside world, consisting of:
    - GT Interface (GTI)
    - State Variable Manager (SVM)
    - Blitter (BLT)
    - Graphics Arbiter (GAM)
- **Subslice (three shown)**: A compute unit with supporting fixed- or shared-function assets sufficient for the EU capability.
  - A bank of Execution Units (EUs), eight per subslice shown
  - Sampler, supporting both media and 3D functions
  - Gateway (GWY)
  - Instruction cache (IC)
  - Local Thread Dispatcher (TDL)
  - Barycentric Calculator (BC)
  - Pixel Shader Dispatcher (PSD)
  - Data Cluster (HDC)
  - Dataport Render Cache (DAPRC)
- **Slice Common**: Scalable fixed function assets which support the compute horsepower provided by two or more subslices.
  - 3D Fixed Function:
    - Windower/Mask unit (WM)
    - Plane-Z, Hi-Z (HZ) and Intermediate Z (IZ)
    - Setup Backend (SBE)
    - Pixel backend units
    - 3D stream caches for color, multisample surface, Z (depth), and stencil (RCC, MSC, RCZ, STC)
  - Media Fixed Function:
    - DAPRSC
- **L3 Cache**: Backing L3 cache for certain memory streams emanating from subslices.
  - L3 Data cache with support for data, URB, and shared local memory (SLM)

Slices and unslices are combined to create three product configurations:

- GT3: A single unslice coupled with two slices, plus an added VD and VE unit (see above)
- GT2: A single unslice coupled with a single slice (see above)
- GT1: The smallest configuration, which uses a reduced unslice and a reduced slice (see below)

#### **GT1 Configuration**

The GT1 configuration is an opportunistic reduction of GT2, as shown in the following diagram:

*Figure: BDW GT1 configuration block diagram (image not reproduced).*

## **Device Attributes**

**Product Configuration Attribute Table (Product Family: BDW)**

| SKU Name | GT1F | GT1.5F | GT2F | GT2 | GT3 |
|----------|------|--------|------|-----|-----|
| **Global Attributes** | | | | | |
| Slice count | 1 | 1 | 1 | 1 | 2 |
| Subslice count | 2 | 3 | 3 | 3 | 6 |
| EUs per subslice | 6 | 6 | 8 | 8 | 8 |
| EU count (total) | 12 | 18 | 23 | 24 | 48 |
| Thread count (per EU) | 7 | 7 | 7 | 7 | 7 |
| Thread count (total) | 84 | 126 | 161 | 168 | 336 |
| FLOPs/clk, half precision, MAD (peak) | 384 | 576 | 736 | 768 | 1536 |
| FLOPs/clk, single precision, MAD (peak) | 192 | 288 | 368 | 384 | 768 |
| FLOPs/clk, double precision, MAD (peak) | 48 | 72 | 92 | 96 | 192 |
| Unslice clocking (coupled/decoupled from slice) | coupled | coupled | coupled | coupled | coupled |
| GTI / Ring interfaces | 1 | 1 | 1 | 1 | 1 |
| GTI bandwidth (bytes/unslice clk) | 64: R<br>32: W | 64: R<br>32: W | 64: R<br>32: W | 64: R<br>32: W | 64: R<br>64: W |
| **Caches & Dedicated Memories** | | | | | |
| L3 Cache, total size (bytes) | 384K | 768K | 768K | 768K | 1.5M |
| L3 Cache, bank count | 2 | 4 | 4 | 4 | 8 |
| L3 Cache, bandwidth (bytes/clk) | 2x 64: R<br>2x 64: W | 4x 64: R<br>4x 64: W | 4x 64: R<br>4x 64: W | 4x 64: R<br>4x 64: W | 8x 64: R<br>8x 64: W |
| L3 Cache, D\$ size (bytes) | 192K-320K | 384K-576K | 384K-576K | 384K-576K | 768K-1024K |
| URB size (bytes) | 64K-192K | 128K-384K | 128K-384K | 128K-384K | 256K-768K |
| SLM size (bytes) | 0, 128K | 0, 192K | 0, 192K | 0, 192K | 0, 384K |
| LLC/L4 size (bytes) | ~2MB/CPU core | ~2MB/CPU core | ~2MB/CPU core | ~2MB/CPU core | ~2MB/CPU core |
| Instruction Cache (IC, bytes) | 2x 48K | 3x 48K | 3x 48K | 3x 48K | 6x 48K |
| Color Cache (RCC, bytes) | 24K | 24K | 24K | 24K | 2x 24K |
| MSC Cache (MSC, bytes) | 12K | 12K | 12K | 12K | 2x 12K |
| HiZ Cache (HZC, bytes) | 12K | 12K | 12K | 12K | 2x 12K |
| Z Cache (RCZ, bytes) | 32K | 32K | 32K | 32K | 2x 12K |
| Stencil Cache (STC, bytes) | 8K | 8K | 8K | 8K | 2x 8K |
| L1 Texture Cache (bytes) | 2x 32K | 3x 32K | 3x 32K | 3x 32K | 6x 32K |
| MT Texture Cache (bytes) | 2x 8K | 3x 8K | 3x 8K | 3x 8K | 6x 8K |
| **Instruction Issue Rates** | | | | | |
| FMAD, SP (ops/EU/clk) | 8 | 8 | 8 | 8 | 8 |
| FMUL, SP (ops/EU/clk) | 8 | 8 | 8 | 8 | 8 |
| FADD, SP (ops/EU/clk) | 8 | 8 | 8 | 8 | 8 |
| MIN, MAX, SP (ops/EU/clk) | 8 | 8 | 8 | 8 | 8 |
| CMP, SP (ops/EU/clk) | 8 | 8 | 8 | 8 | 8 |
| INV, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| SQRT, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| RSQRT, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| LOG, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| EXP, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| POW, SP (ops/EU/clk) | 1 | 1 | 1 | 1 | 1 |
| IDIV, SP (ops/EU/clk) | 1-6 | 1-6 | 1-6 | 1-6 | 1-6 |
| TRIG, SP (ops/EU/clk) | 2 | 2 | 2 | 2 | 2 |
| FDIV, SP (ops/EU/clk) | 1 | 1 | 1 | 1 | 1 |
| **Load/Store** | | | | | |
| Data Ports (HDC) | 2 | 3 | 3 | 3 | 6 |
| L3 Load/Store, same addresses within msg (dwords/clk) | | | | | |
| L3 Load/Store, unique addresses within msg (dwords/clk) | | | | | |
| SLM Load/Store, same addresses within msg (dwords/clk) | | | | | |
| SLM Load/Store, unique addresses within msg (dwords/clk) | | | | | |
| Atomic, Local 32b, same addresses within msg (dwords/clk) | | | | | |
| Atomic, Global 32b, unique addresses within msg (dwords/clk) | | | | | |
| **3D Attributes** | | | | | |
| Geometry pipes | 1 | 1 | 1 | 1 | 1 |
| Samplers (3D) | 2 | 3 | 3 | 3 | 6 |
| Texel Rate, point, 32b (tex/clk) | 8 | 12 | 12 | 12 | 24 |
| Texel Rate, point, 64b (tex/clk) | 8 | 12 | 12 | 12 | 24 |
| Texel Rate, point, 128b (tex/clk) | 8 | 12 | 12 | 12 | 24 |
| Texel Rate, bilinear, 32b (tex/clk) | 8 | 12 | 12 | 12 | 24 |
| Texel Rate, bilinear, 64b (tex/clk) | 8 | 12 | 12 | 12 | 24 |
| Texel Rate, bilinear, 128b (tex/clk) | 2 | 3 | 3 | 3 | 6 |
| Texel Rate, trilinear, 32b (tex/clk) | 4 | 6 | 6 | 6 | 12 |
| Texel Rate, trilinear, 64b (tex/clk) | 2 | 3 | 3 | 3 | 6 |
| Texel Rate, trilinear, 128b (tex/clk) | 1 | 1.5 | 1.5 | 1.5 | 3 |
| Texel Rate, aniso 2x, 32b (tex/clk) | 2 | 3 | 3 | 3 | 6 |
| Texel Rate, aniso 4x, 32b (tex/clk) | 1 | 1.5 | 1.5 | 1.5 | 3 |
| Texel Rate, aniso 8x, 32b (tex/clk) | 0.5 | 0.75 | 0.75 | 0.75 | 1.5 |
| Texel Rate, aniso 16x, 32b (tex/clk) | 0.25 | 0.375 | 0.375 | 0.375 | 0.75 |
| HiZ Rate (ppc) | 64 | 64 | 64 | 64 | 2x 64 |
| IZ Rate (ppc) | 16 | 16 | 16 | 16 | 2x 16 |
| Stencil Rate (ppc) | 64 | 64 | 64 | 64 | 2x 64 |
| *(500 MHz, DDR-2400; range depends on dynamic compression ratio)* | | | | | |
| Pixel Rate, fill, 32bpp (pix/clk, RCC hit) | 4 | 6 | 6 | 6 | 12 |
| Pixel Rate, fill, 32bpp (pix/clk, LLC hit, @ 1.0x unslice clk) | 4 | 6 | 6 | 6 | 12 |
| Pixel Rate, fill, 32bpp (pix/clk, LLC hit, @ 1.5x unslice clk) | N/A | N/A | N/A | N/A | N/A |
| Pixel Rate, fill, 32bpp (pix/clk, memory, @ 1.0x unslice clk) | 4 | 6 | 6 | 6 | 12 |
| Pixel Rate, fill, 32bpp (pix/clk, memory, @ 1.5x unslice clk) | N/A | N/A | N/A | N/A | N/A |
| *(500 MHz, DDR-2400; range depends on dynamic compression ratio)* | | | | | |
| Pixel Rate, blend, 32bpp (pix/clk, RCC hit) | 4 | 4 | 4 | 4 | 8 |
| Pixel Rate, blend, 32bpp (pix/clk, LLC hit, @ 1.0x unslice clk) | 4 | 4 | 4 | 4 | 8 |
| Pixel Rate, blend, 32bpp (pix/clk, LLC hit, @ 1.5x unslice clk) | N/A | N/A | N/A | N/A | N/A |
| Pixel Rate, blend, 32bpp (pix/clk, memory, @ 1.0x unslice clk) | 4 | 4 | 4 | 4 | 8 |
| Pixel Rate, blend, 32bpp (pix/clk, memory, @ 1.5x unslice clk) | N/A | N/A | N/A | N/A | N/A |
| **Media Attributes** | | | | | |
| Samplers (media) | 2 | 3 | 3 | 3 | 6 |
| VDBox Instances | 1 | 1 | 1 | 1 | 2 |
| VEBox Instances | 1 | 1 | 1 | 1 | 2 |
| SFC Instances | N/A | N/A | N/A | N/A | N/A |
| WDBox Instances | N/A | N/A | N/A | N/A | N/A |
| WGBox Instances | N/A | N/A | N/A | N/A | N/A |

## **Steppings and Device IDs**

#### **Broadwell Graphics Production Devices**

The following table lists the current production graphics devices for Broadwell. It will be updated as additional production devices are released.
| CPU SKU | GT SKU | Device 2 Device ID | Device 2 Revision ID |
|---------------|---------|---------------------------------|------------------|
| 2+2 ULT / ULX | BDW:GT2 | 0x1616 (ULT) or<br>0x161E (ULX) | 0x8 |
| 2+2 ULT / ULX | BDW:GT2 | 0x1616 (ULT) or<br>0x161E (ULX) | 0x9 |
| 2+3 ULT | BDW:GT3 | 0x1626 (15W) or<br>0x162B (28W) | 0x9 |

#### **Broadwell SKUs and Device IDs**

The following table lists all BDW SKUs currently in production.

| Device 2 ID | Description | Comments / SKU String | Number of EUs |
|------------|-----------------------|--------------------------|----------------------------------------|
| 0x1606 | U-Processor - GT1 | Intel HD Graphics | 12 |
| 0x1612 | H-Processor - GT2 | Intel HD Graphics 5600 | 24 |
| 0x1616 | U-Processor - GT2 | Intel HD Graphics 5500 | High-end SKUs: 24<br>Low-end SKUs: 23\* |
| 0x161E | Y-Processor - GT2 | Intel HD Graphics 5300 | 24 |
| 0x1626 | U-Processor - GT3 15W | Intel HD Graphics 6000 | 47\* |
| 0x162B | U-Processor - GT3 28W | Intel Iris Graphics 6100 | 48 |

- (\*) Intel reserves the right to increase the number of EUs on these SKUs in the future.
  - Intel Core i3 processors (ULT) will have 23 EUs, but could move to 24 EUs in the future.
  - Intel Pentium processors and Celeron processors will have 12 EUs.
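The SKU table and the Device Attributes table can be tied together in a small lookup: given a Device 2 ID, fetch the SKU row and derive totals from per-EU constants read off the attributes table (7 threads per EU, and an FMAD issue rate of 8 SP ops/EU/clk with each multiply-add counted as two FLOPs). This is a hypothetical Python sketch, not an Intel tool: the names `BDW_SKUS`, `total_threads`, `peak_flops_per_clk`, and `describe` are illustrative, and the half- and double-precision figures reuse the 2:1 and 1:4 ratios visible in the table rather than a documented hardware formula.

```python
from dataclasses import dataclass

# Per-EU constants read off the Device Attributes table (assumed to apply
# uniformly to every BDW SKU).
THREADS_PER_EU = 7
SP_MAD_OPS_PER_EU_CLK = 8  # FMAD, SP issue rate (ops/EU/clk)
FLOPS_PER_MAD = 2          # a multiply-add counts as two floating-point ops


@dataclass(frozen=True)
class BdwSku:
    device_id: int
    description: str
    sku_string: str
    eu_count: int


# Rows copied from the "Broadwell SKUs and Device IDs" table.
# Assumption: 0x1616 uses the high-end EU count (24); low-end SKUs have 23.
BDW_SKUS = {
    0x1606: BdwSku(0x1606, "U-Processor - GT1", "Intel HD Graphics", 12),
    0x1612: BdwSku(0x1612, "H-Processor - GT2", "Intel HD Graphics 5600", 24),
    0x1616: BdwSku(0x1616, "U-Processor - GT2", "Intel HD Graphics 5500", 24),
    0x161E: BdwSku(0x161E, "Y-Processor - GT2", "Intel HD Graphics 5300", 24),
    0x1626: BdwSku(0x1626, "U-Processor - GT3 15W", "Intel HD Graphics 6000", 47),
    0x162B: BdwSku(0x162B, "U-Processor - GT3 28W", "Intel Iris Graphics 6100", 48),
}


def total_threads(eu_count: int) -> int:
    """Total hardware threads: EU count times threads per EU."""
    return eu_count * THREADS_PER_EU


def peak_flops_per_clk(eu_count: int) -> dict:
    """Peak MAD FLOPs per clock at each precision."""
    sp = eu_count * SP_MAD_OPS_PER_EU_CLK * FLOPS_PER_MAD
    # Ratios observed in the table: half = 2x single, double = single / 4.
    return {"half": 2 * sp, "single": sp, "double": sp // 4}


def describe(device_id: int) -> str:
    """One-line summary for a Device 2 ID, or a not-found message."""
    sku = BDW_SKUS.get(device_id)
    if sku is None:
        return f"0x{device_id:04X}: not in the BDW SKU table"
    flops = peak_flops_per_clk(sku.eu_count)
    return (f"0x{device_id:04X}: {sku.sku_string} ({sku.description}), "
            f"{sku.eu_count} EUs, {total_threads(sku.eu_count)} threads, "
            f"{flops['single']} SP FLOPs/clk")
```

For example, `describe(0x162B)` yields `0x162B: Intel Iris Graphics 6100 (U-Processor - GT3 28W), 48 EUs, 336 threads, 768 SP FLOPs/clk`, matching the GT3 column. The 47-EU part (0x1626) has no column of its own in the attributes table, so its derived figures are an extrapolation from the same per-EU constants.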