OpenCL *, Heterogeneous Computing, and the CPU

Size: px
Start display at page:

Download "OpenCL *, Heterogeneous Computing, and the CPU"

Transcription

1 OpenCL *, Heterogeneous Computing, and the CPU Dr. Tim Mattson Visual Applications Research lab Intel Corporation * 3 rd party nam es are the property of their ow ners I ntel Corporation, 2009

2 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 2

3 Heterogeneous com puting A m odern platform has: Multi-core CPU(s) A GPU DSP processors other? GPU CPU GMCH I CH CPU DRAM The goal should NOT be to off-load" the CPU. We need to m ake the best use of all the available resources from wit hin a single program : One program that runs well (i.e. reasonably close to handtuned perform ance) on a heterogeneous m ixture of processors. 3 GMCH = graphics m em ory cont rol hub, I CH = I nput / out put cont rol hub

4 OpenCL: it s not just a GPGPU Language OpenCL defines a platform API to coordinate het erogeneous parallel com put at ions Literature rich with parallel coordination languages/ API OpenCL unique in its ability to coordinate CPUs, GPUs, etc Key coordination concepts Each device has its own asynchronous workqueue Synchronize between OCL com putations w/ event handles from different (or sam e) devices Enables algorithm s and system s that use all available com put at ional resources Enqueue native functions for integration with C/ C+ + code 4

5 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 5

6 OpenCL s Tw o St yles of Dat a- Parallelism Explicit SI MD data parallelism : The kernel defines one stream of instructions Parallelism from using wide vector types Size vector types to m atch native HW width Com bine with task parallelism to exploit m ultiple cores I m plicit SI MD data parallelism (i.e. shader-style): Write the kernel as a scalar program Use vector data types sized naturally to the algorithm Kernel autom atically m apped to SI MD-com pute-resources and cores by the com piler/ runtim e/ hardware. Both approaches are viable CPU options 6

7 Data- Parallelism : options on I A processors Explicit SI MD data parallelism Program m er chooses vector data type (width) Com piler hints using attributes vec_type_hint(typen) I m plicit SI MD Data parallel Map onto CPUs, GPUs, Larrabee, SSE/ AVX/ LRBni: 4/ 8/ 16 workitem s in parallel Hybrid use of the two m ethods AVX: can run two 4-wide workitem s in parallel LRBni: can run four 4-wide workitem s in parallel 7

8 Explicit SI MD data parallelism OpenCL as a portable interface to vector instruction sets Block loops and pack data into vector types (float4, ushort16, etc). Replace scalar ops in loops with blocked loops and vector ops. Unroll loops, optim ize indexing to m atch m achine vector width float a[n], b[n], c[n]; for (i=0; i<n; i++) c[i] = a[i]*b[i]; <<< the above becomes >>>> float4 a[n/4], b[n/4], c[n/4]; for (i=0; i<n/4; i++) c[i] = a[i]*b[i]; Explicit SI MD data parallelism m eans you tune your code to the vector w idth and other properties of the com pute device 8

9 Explicit SI MD data parallelism : Case Study Video contrast/ color optim ization kernel on a dual core CPU Successive im provem ent 5 % 2 3 % % 4 0 % 5 % Hand- tuned SSE + Multithreading Unroll loops Optim ize vector indexing Vectorize ( block loops, pack into ushort8 and ushort1 6 ) 1 w ork- item per core + loops 2 0 % % % peak perform ance Good new s: OpenCL code 9 5 % of hand- tuned SSE/ MT perf. Bad new s: N ew platform, redo all those optim izations. 9 3 Ghz dual core CPU pre- release version of OpenCL Source: I nt el Corp. * Results have been estim ated based on internal I ntel analysis and are provided for inform ational purposes only. Any difference in system hardware or software design or configuration m ay affect actual perform ance.

10 Tow ards Portable Perform ance The following C code is an exam ple of a Bilateral 1D filter: Rem inder: Bilateral filter is an edge preserving im age processing algorithm. See m ore inform ation here: ht t p: / / scien.st anford.edu/ class/ psych221/ projects/ 06/ im agescaling/ bilati.htm l void P4_Bilateral9 (int start, int end, float v) { int i, j, k; float w[4], a[4], p[4]; float inv_of_2v = -0.5 / v; for (i = start; i < end; i++) { float wt[4] = { 1.0f, 1.0f, 1.0f, 1.0f ; for (k = 0; k < 4; k++) a[k] = image[i][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i - j*size][k] - image[i][k]; for (k = 0; k < 4; k++) w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i - j*size][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i + j*size][k] - image[i][k]; for (k = 0; k < 4; k++; w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i + j*size][k]; for (k = 0; k < 4; k++) { image2[i][k] = a[k] / wt[k]; 1 0

11 Tow ards Portable Perform ance 1 1 float w[4], a[4], p[4]; The following void C code P4_Bilateral9 is an (int start, int end, float v) float inv_of_2v = -0.5 / v; for (i = start; i < end; i++) { exam ple of { a Bilateral 1D filter: float wt[4] = { 1.0f, 1.0f, 1.0f, 1.0f ; <<< Declarations >>> Rem inder: Bilateral filter is an edge preserving for im(i age = start; i < end; i++) { processing algorithm for. (j = 1; j <= 4; j++) { <<< a series of short loops >>>> See m ore inform ation here: ht t p: / / scien.st anford.edu/ class/ psych221/ projects/ 06/ im agescaling/ bilati.htm l void P4_Bilateral9 (int start, int end, float v) { int i, j, k; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) a[k] = image[i][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i - j*size][k] - image[i][k]; for (k = 0; k < 4; k++) w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i - j*size][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i + j*size][k] - image[i][k]; for (k = 0; k < 4; k++; w[k] = exp (p[k] * p[k] * inv_of_2v); <<< a 2 nd series of short for loops (k = 0; k < 4; >>> k++) { wt[k] += w[k]; a[k] += w[k] * image[i + j*size][k]; for (k = 0; k < 4; k++) { image2[i][k] = a[k] / wt[k];

12 I m plicit SI MD data parallel code outer loop replaced by work-item s running over an NDRange index set NDRange 4* im age size since each workitem does a color for each pixel Leave it to the com piler to m ap work-item s onto lanes of the vector units kernel void P4_Bilateral9 ( global float* ini m age, global float* outi m age, float v) { const size_t m yi D = get_global_id(0); const float inv_of_2v = -0.5f / v; const size_t m yrow = m yi D / I MAGE_WI DTH; size_t m axdistance = m in(di STANCE, m yrow); m axdistance = m in(m axdistance, I MAGE_HEI GHT - m yrow); float currentpixel, neighborpixel, newpixel; float diff; float accum ulatedweights, currentweights; newpixel = currentpixel = ini m age[ m yi D] ; accum ulatedweights = 1.0f; for (size_t dist = 1; dist < = m axdistance; + + dist) { neighborpixel = ini m age[ m yi D + dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; neighborpixel = ini m age[ m yi D - dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; outimage[ m yi D] = newpixel / accum ulatedweights; 1 2

13 I m plicit SI MD data parallel code kernel void P4_Bilateral9 ( global float* ini m age, global float* outi m age, float v) { kernel void p4 _ bilateral9 ( global float* ini m age, const size_t m yi D = get_global_id(0); outer loop replaced by work-item s { running over an NDRange index set. NDRange 4* im age size since each workitem does a color for each pixel. Leave it to the com piler to m ap work-item s onto lanes of the vector units const float inv_of_2v = -0.5f / v; const size_t m yrow = m yi D / I MAGE_WI DTH; size_t m axdistance = m in(di STANCE, m yrow); m axdistance = m in(m axdistance, I MAGE_HEI GHT - m yrow); float currentpixel, neighborpixel, newpixel; float diff; float accum ulatedweights, currentweights; for ( size_ t dist = 1 ; dist < = m axdistance; + + dist) { newpixel = currentpixel = ini m age[ m yi D] ; accum ulatedweights = 1.0f; neighborpixel = ini m age[ m yi D + for (size_t dist = 1; dist < = m axdistance; + + dist) { global float* outi m age, float v) const size_ t m yi D = get_ global_ id( 0 ) ; < < < declarations > > > diff neighborpixel = ini m age[ m yi D + dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; neighborpixel = ini m age[ m yi D - dist* I MAGE_WI DTH] ; diff currentweights dist* I MAGE_ W I DTH] ; = neighborpixel - currentpixel; currentw eights = exp( diff * diff * inv_ of_ 2 v) ; < < plus others to com pute pixels, w eights, etc > > = neighborpixel - currentpixel; accum ulatedw eights + = currentw eights; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; outi m age[ m yi D] = new Pixel / accum ulatedw eights; outimage[ m yi D] = newpixel / accum ulatedweights; 1 3

14 Portable Perform ance in OpenCL I m plicit SI MD code where the fram ework m aps work-item s onto the lanes of the vector unit creates the opportunity for portable code that perform s well on full range of OpenCL com pute devices Requires m ature OpenCL technology that knows how to do this: But it is im portant to note. we know this approach works since its based on the way shader com pilers work today 1 4

15 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 1 5

16 Task Parallelism Overview Think of a task as an asynchronous function call Do X at som e point in the future Optionally after Y is done Light weight, often in user space Strengths Copes well wit h het erogeneous workloads Doesn t require 1000 s of st rands Scales well with core count Lim itations No autom atic support for latency hiding Must explicitly write SI MD code A natural fit to m ulti- core CPUs Y() X() 1 6

17 Task Parallelism in OpenCL 1 7 clenqueuetask I m agine sea of different tasks executing concurrently A task owns the core (i.e., a workgroup size of 1) Use tasks when algorithm Benefits from large am ount of local/ private m em ory Has predictable global m em ory accesses Can be program m ed using explicit vector style Just doesn t have 1000 s of identical things to do Use data-parallel kernels when algorithm Does not benefit from large am ounts of local/ private m em ory Has unpredictable global m em ory accesses Needs to apply sam e operation across large num ber of data elem ents

18 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 1 8

19 Future Parallel Program m ing Real world applications contain data parallel parts as well as serial/ sequent ial part s OpenCL addresses these Apps need by supporting Data Parallel & Task Parallel Braided Parallelism com posing Data Parallel & Task Parallel constructs in a single algorit hm CPUs are ideal for Braided Parallelism 1 9

20 Future parallel program m ing: Larrabee Memory Controller Fixed Function Texture Logic Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$... L2 Cache... Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$ Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$ Display Interface System Interface Memory Controller Cores com m unicate on a wide ring bus Fast access to m em ory and fixed function blocks Fast access for cache coherency L2 cache is partitioned am ong the cores Provides high aggregat e bandwidt h Allows data replication & sharing 2 0

21 Processor Core Block Diagram Instruction Decode Scalar Unit Scalar Registers Vector Unit Vector Registers L1 Icache & Dcache 256KB L2 Cache Local Subset Separate scalar and vector units with separate registers Vector unit: bit ops/ clock I n-order instruction execution Short execution pipelines Fast access from L1 cache Direct connection to each core s subset of the L2 cache Prefetch instructions load L1 and L2 caches Ring 2 1

22 Key Differences from Typical GPUs Each Larrabee core is a com plete I ntel processor Context switching & pre-em ptive m ulti-tasking Virtual m em ory and page swapping, even in texture logic Fully coherent caches at all levels of the hierarchy Efficient inter-block com m unication Ring bus for full inter-processor com m unication Low latency high bandwidth L1 and L2 caches Fast synchronizat ion bet ween cores and caches Larrabee is perfect for the braided parallelism in future applications 2 2

23 Conclusion OpenCL defines a platform -API / fram ework for heterogeneous com puting not just GPGPU or CPU-offload program m ing OpenCL has the potential to deliver portably perform ant code; but only if its used correctly: I m plicit SI MD data parallel code has the best chance of m apping onto a diverse range of hardware once OpenCL im plem entation quality catches up with m ature shader languages The future is clear: Braided parallelism m ixing task parallel and data parallel code in a single program balancing the load am ong ALL OF the plat form s resources OpenCL can handle this and em erging platform s such as Larrabee are well suited to support this m odel 2 3

24 References s0 9.idav.ucdavis.edu for slides from a Siggraph2009 course titled Beyond Program m able Shading Seiler, L., Carm ean, D., et al Larrabee: A m any-core x86 architecture for visual com puting. SI GGRAPH 08: ACM SI GGRAPH 2008 Papers, ACM Press, New York, NY Fatahalian, K., Houston, M., GPUs: a closer look, Com m unications of the ACM October 2008, vol 51 # 10. graphics.st anford.edu/ ~ kayvonf/ papers/ fat ahaliancacm.pdf 2 4

25 Legal Disclaim er INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Larrabee and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2009 Intel Corporation. 2 5

26 Risk Factors This presentation contains forward-looking statem ents that involve a num ber of risks and uncertainties. These statem ents do not reflect the potential im pact of any m ergers, acquisitions, divestitures, investm ents or other sim ilar transactions that m ay be com pleted in the future. The inform ation presented is accurate only as of today s date and will not be updated. I n addition to any factors discussed in the presentation, the im portant factors that could cause actual results to differ m aterially include the following: Dem and could be different from I ntel's expectations due to factors including changes in business and econom ic conditions, including conditions in the credit m arket that could affect consum er confidence; custom er acceptance of I ntel s and com petitors products; changes in custom er order patterns, including order cancellations; and changes in the level of inventory at custom ers. I ntel s results could be affected by the tim ing of closing of acquisitions and divestitures. I ntel operates in intensely com petitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product dem and that is highly variable and difficult to forecast. Revenue and the gross m argin percentage are affected by the tim ing of new I ntel product introductions and the dem and for and m arket acceptance of I ntel's products; actions taken by I ntel's com petitors, including product offerings and introductions, m arketing program s and pricing pressures and I ntel s response to such actions; I ntel s ability to respond quickly to technological developm ents and to incorporate new features into its products; and the availability of sufficient supply of com ponents from suppliers to m eet dem and. The gross m argin percentage could vary significantly from expectations based on changes in revenue levels; product m ix and pricing; capacity utilization; variations in inventory valuation, including variations related to the tim ing of qualifying products for sale; excess or obsolete inventory; m anufacturing yields; changes in unit costs; im pairm ents of long-lived assets, including m anufacturing, assem bly/ test and intangible assets; and the tim ing and execution of the m anufacturing ram p and associated costs, including start-up costs. Expenses, particularly certain m arketing and com pensation expenses, vary depending on the level of dem and for I ntel's products, the level of revenue and profits, and im pairm ents of long-lived assets. I ntel is in the m idst of a structure and efficiency program that is resulting in several actions that could have an im pact on expected expense levels and gross m argin. I ntel's results could be im pacted by adverse econom ic, social, political and physical/ infrastructure conditions in the countries in which I ntel, its custom ers or its suppliers operate, including m ilitary conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. I ntel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory m atters involving intellectual property, stockholder, consum er, antitrust and other issues, such as the litigation and regulatory m atters described in I ntel's SEC reports. A detailed discussion of these and other factors that could affect I ntel s results is included in I ntel s SEC filings, including the report on Form 10-Q for the quarter ended June 28,

Programming Larrabee: Beyond Data Parallelism. Dr. Larry Seiler Intel Corporation

Programming Larrabee: Beyond Data Parallelism. Dr. Larry Seiler Intel Corporation Dr. Larry Seiler Intel Corporation Intel Corporation, 2009 Agenda: What do parallel applications need? How does Larrabee enable parallelism? How does Larrabee support data parallelism? How does Larrabee

More information

NVMHCI: The Optimized Interface for Caches and SSDs

NVMHCI: The Optimized Interface for Caches and SSDs NVMHCI: The Optimized Interface for Caches and SSDs Amber Huffman Intel August 2008 1 Agenda Hidden Costs of Emulating Hard Drives An Optimized Interface for Caches and SSDs August 2008 2 Hidden Costs

More information

Kirk Skaugen Senior Vice President General Manager, PC Client Group Intel Corporation

Kirk Skaugen Senior Vice President General Manager, PC Client Group Intel Corporation Kirk Skaugen Senior Vice President General Manager, PC Client Group Intel Corporation 2 in 1 Computing Built for Business A Look Inside 2014 Ultrabook 2011 2012 2013 Delivering The Best Mobile Experience

More information

The Heart of A New Generation Update to Analysts. Anand Chandrasekher Senior Vice President, Intel General Manager, Ultra Mobility Group

The Heart of A New Generation Update to Analysts. Anand Chandrasekher Senior Vice President, Intel General Manager, Ultra Mobility Group The Heart of A New Generation Update to Analysts Anand Chandrasekher Senior Vice President, Intel General Manager, Ultra Mobility Group Today s s presentation contains forward-looking statements. All statements

More information

Data center day. Non-volatile memory. Rob Crooke. August 27, Senior Vice President, General Manager Non-Volatile Memory Solutions Group

Data center day. Non-volatile memory. Rob Crooke. August 27, Senior Vice President, General Manager Non-Volatile Memory Solutions Group Non-volatile memory Rob Crooke Senior Vice President, General Manager Non-Volatile Memory Solutions Group August 27, 2015 THE EXPLOSION OF DATA Requires Technology To Perform Random Data Access, Low Queue

More information

Hardware and Software Co-Optimization for Best Cloud Experience

Hardware and Software Co-Optimization for Best Cloud Experience Hardware and Software Co-Optimization for Best Cloud Experience Khun Ban (Intel), Troy Wallins (Intel) October 25-29 2015 1 Cloud Computing Growth Connected Devices + Apps + New Services + New Service

More information

Thunderbolt Technology Brett Branch Thunderbolt Platform Enabling Manager

Thunderbolt Technology Brett Branch Thunderbolt Platform Enabling Manager Thunderbolt Technology Brett Branch Thunderbolt Platform Enabling Manager Agenda Technology Recap Protocol Overview Connector and Cable Overview System Overview Technology Block Overview Silicon Block

More information

AMD and OpenCL. Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc.

AMD and OpenCL. Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc. AMD and OpenCL Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc. Overview Brief overview of ATI Radeon HD 4870 architecture and operat ion Spreadsheet perform ance analysis Tips & tricks

More information

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

MICHAL MROZEK ZBIGNIEW ZDANOWICZ MICHAL MROZEK ZBIGNIEW ZDANOWICZ Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY

More information

Intel Many Integrated Core (MIC) Architecture

Intel Many Integrated Core (MIC) Architecture Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products

More information

Performance Monitoring on Intel Core i7 Processors*

Performance Monitoring on Intel Core i7 Processors* Performance Monitoring on Intel Core i7 Processors* Ramesh Peri Principal Engineer & Engineering Manager Profiling Collectors Group (PAT/DPD/SSG) Intel Corporation Austin, TX 78746 * Intel, the Intel logo,

More information

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS

More information

Data Centre Server Efficiency Metric- A Simplified Effective Approach. Henry ML Wong Sr. Power Technologist Eco-Technology Program Office

Data Centre Server Efficiency Metric- A Simplified Effective Approach. Henry ML Wong Sr. Power Technologist Eco-Technology Program Office Data Centre Server Efficiency Metric- A Simplified Effective Approach Henry ML Wong Sr. Power Technologist Eco-Technology Program Office 1 Agenda The Elephant in your Data Center Calculating Server Utilization

More information

Intel Array Building Blocks

Intel Array Building Blocks Intel Array Building Blocks Productivity, Performance, and Portability with Intel Parallel Building Blocks Intel SW Products Workshop 2010 CERN openlab 11/29/2010 1 Agenda Legal Information Vision Call

More information

The Transition to PCI Express* for Client SSDs

The Transition to PCI Express* for Client SSDs The Transition to PCI Express* for Client SSDs Amber Huffman Senior Principal Engineer Intel Santa Clara, CA 1 *Other names and brands may be claimed as the property of others. Legal Notices and Disclaimers

More information

GraphBuilder: A Scalable Graph ETL Framework

GraphBuilder: A Scalable Graph ETL Framework SIGMOD GRADES 2013 GraphBuilder: A Scalable Graph ETL Framework Large Scale Graph Construction using Apache Hadoop 1 Authors: Nilesh Jain, Guangdeng Liao, Theodore Willke Presented By: Kushal Datta Legal

More information

Intel Desktop Board DZ68DB

Intel Desktop Board DZ68DB Intel Desktop Board DZ68DB Specification Update April 2011 Part Number: G31558-001 The Intel Desktop Board DZ68DB may contain design defects or errors known as errata, which may cause the product to deviate

More information

Krzysztof Laskowski, Intel Pavan K Lanka, Intel

Krzysztof Laskowski, Intel Pavan K Lanka, Intel Krzysztof Laskowski, Intel Pavan K Lanka, Intel Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Enterprise Data Integrity and Increasing the Endurance of Your Solid-State Drive MEMS003

Enterprise Data Integrity and Increasing the Endurance of Your Solid-State Drive MEMS003 SF 2009 Enterprise Data Integrity and Increasing the Endurance of Your Solid-State Drive James Myers Manager, SSD Applications Engineering, Intel Cliff Jeske Manager, SSD development, Hitachi GST MEMS003

More information

Software Occlusion Culling

Software Occlusion Culling Software Occlusion Culling Abstract This article details an algorithm and associated sample code for software occlusion culling which is available for download. The technique divides scene objects into

More information

Technology is a Journey

Technology is a Journey Technology is a Journey Not a Destination Renée James President, Intel Corp Other names and brands may be claimed as the property of others. Building Tomorrow s TECHNOLOGIES Late 80 s The Early Motherboard

More information

Extending Energy Efficiency. From Silicon To The Platform. And Beyond Raj Hazra. Director, Systems Technology Lab

Extending Energy Efficiency. From Silicon To The Platform. And Beyond Raj Hazra. Director, Systems Technology Lab Extending Energy Efficiency From Silicon To The Platform And Beyond Raj Hazra Director, Systems Technology Lab 1 Agenda Defining Terms Why Platform Energy Efficiency Value Intel Research Call to Action

More information

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting Important OpenCL*-related Metrics with Intel GPA System Analyzer Introduction Intel SDK for OpenCL* Applications

More information

Server Efficiency: A Simplified Data Center Approach

Server Efficiency: A Simplified Data Center Approach Server Efficiency: A Simplified Data Center Approach Henry M.L. Wong Intel Eco-Technology Program Office Environment Global CO 2 Emissions ICT 2% 98% 2 Source: The Climate Group Economic Efficiency and

More information

JOE NARDONE. General Manager, WiMAX Solutions Division Service Provider Business Group October 23, 2006

JOE NARDONE. General Manager, WiMAX Solutions Division Service Provider Business Group October 23, 2006 JOE NARDONE General Manager, WiMAX Solutions Division Service Provider Business Group October 23, 2006 Intel s s Broadband Wireless Vision Intel is driving >200 million new screens each year We re focused

More information

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation Intel Atom Processor Based Platform Technologies Intelligent Systems Group Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Bitonic Sorting Intel OpenCL SDK Sample Documentation

Bitonic Sorting Intel OpenCL SDK Sample Documentation Intel OpenCL SDK Sample Documentation Document Number: 325262-002US Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL

More information

Solid-State Drives for Servers and Clients: Why, When, Where and How

Solid-State Drives for Servers and Clients: Why, When, Where and How Solid-State Drives for Servers and Clients: Why, When, Where and How Troy Winslow Marketing Manager Intel NAND Products Group MASS001 Agenda Why Solid-State Drives? Where will SSDs be utilized? When will

More information

Supra-linear Packet Processing Performance with Intel Multi-core Processors

Supra-linear Packet Processing Performance with Intel Multi-core Processors White Paper Dual-Core Intel Xeon Processor LV 2.0 GHz Communications and Networking Applications Supra-linear Packet Processing Performance with Intel Multi-core Processors 1 Executive Summary Advances

More information

Intel Architecture Press Briefing

Intel Architecture Press Briefing Intel Architecture Press Briefing Stephen L. Smith Vice President Director, Digital Enterprise Group Operations Ronak Singhal Principal Engineer Digital Enterprise Group 17 March, 2008 Today s s News Intel

More information

DisplayPort: Foundation and Innovation for Current and Future Intel Platforms

DisplayPort: Foundation and Innovation for Current and Future Intel Platforms DisplayPort: Foundation Innovation for Current Future Intel Platforms Bailey Cross Mobile Technology Evangelist Mobile Platforms Group Nick Willow Display Technologist Business Client Group Session ID

More information

More performance options

More performance options More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel

More information

Carsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017

Carsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017 Carsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017 Recap Twolevel BVH Multiple object BVHs Single toplevel BVH Recap Twolevel BVH Multiple object BVHs Single toplevel BVH Motivation The Library

More information

Intel and the Future of Consumer Electronics. Shahrokh Shahidzadeh Sr. Principal Technologist

Intel and the Future of Consumer Electronics. Shahrokh Shahidzadeh Sr. Principal Technologist 1 Intel and the Future of Consumer Electronics Shahrokh Shahidzadeh Sr. Principal Technologist Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 325262-002US Revision: 1.3 World Wide Web: http://www.intel.com Document

More information

Mobile World Congress Claudine Mangano Director, Global Communications Intel Corporation

Mobile World Congress Claudine Mangano Director, Global Communications Intel Corporation Mobile World Congress 2015 Claudine Mangano Director, Global Communications Intel Corporation Mobile World Congress 2015 Brian Krzanich Chief Executive Officer Intel Corporation 4.9B 2X CONNECTED CONNECTED

More information

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure) Evolving Small Cells Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure) Intelligent Heterogeneous Network Optimum User Experience Fibre-optic Connected Macro Base stations

More information

Intel Desktop Board D945GCLF2

Intel Desktop Board D945GCLF2 Intel Desktop Board D945GCLF2 Specification Update July 2010 Order Number: E54886-006US The Intel Desktop Board D945GCLF2 may contain design defects or errors known as errata, which may cause the product

More information

Innovation Accelerating Mission Critical Infrastructure

Innovation Accelerating Mission Critical Infrastructure Innovation Accelerating Mission Critical Infrastructure Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,

More information

Striking the Balance Driving Increased Density and Cost Reduction in Printed Circuit Board Designs

Striking the Balance Driving Increased Density and Cost Reduction in Printed Circuit Board Designs Striking the Balance Driving Increased Density and Cost Reduction in Printed Circuit Board Designs Tim Swettlen & Gary Long Intel Corporation Tuesday, Oct 22, 2013 Legal Disclaimer The presentation is

More information

Intel Xeon Phi Coprocessor Performance Analysis

Intel Xeon Phi Coprocessor Performance Analysis Intel Xeon Phi Coprocessor Performance Analysis Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO

More information

Mobility: Innovation Unleashed!

Mobility: Innovation Unleashed! Mobility: Innovation Unleashed! Mooly Eden Corporate Vice President General Manager, Mobile Platforms Group Intel Corporation Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Intel C++ Studio XE 2013 for Windows* Installation Guide and Release Notes Document number: 323805-003US 26 June 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.1.1 Changes since Intel

More information

Intel Desktop Board DG41RQ

Intel Desktop Board DG41RQ Intel Desktop Board DG41RQ Specification Update July 2010 Order Number: E61979-004US The Intel Desktop Board DG41RQ may contain design defects or errors known as errata, which may cause the product to

More information

Intel Desktop Board D946GZAB

Intel Desktop Board D946GZAB Intel Desktop Board D946GZAB Specification Update Release Date: November 2007 Order Number: D65909-002US The Intel Desktop Board D946GZAB may contain design defects or errors known as errata, which may

More information

Intel Desktop Board DG41CN

Intel Desktop Board DG41CN Intel Desktop Board DG41CN Specification Update December 2010 Order Number: E89822-003US The Intel Desktop Board DG41CN may contain design defects or errors known as errata, which may cause the product

More information

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing User s Guide Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2013 Intel Corporation All Rights Reserved Document

More information

Quadrics QsNet II : A network for Supercomputing Applications

Quadrics QsNet II : A network for Supercomputing Applications Quadrics QsNet II : A network for Supercomputing Applications David Addison, Jon Beecroft, David Hewson, Moray McLaren (Quadrics Ltd.), Fabrizio Petrini (LANL) You ve bought your super computer You ve

More information

The Intel Processor Diagnostic Tool Release Notes

The Intel Processor Diagnostic Tool Release Notes The Intel Processor Diagnostic Tool Release Notes Page 1 of 7 LEGAL INFORMATION INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR

More information

Intel Desktop Board D945GCLF

Intel Desktop Board D945GCLF Intel Desktop Board D945GCLF Specification Update July 2010 Order Number: E47517-008US The Intel Desktop Board D945GCLF may contain design defects or errors known as errata, which may cause the product

More information

Transitioning the I ntel Next. ( Nehalem and W estm ere) into the. Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009

Transitioning the I ntel Next. ( Nehalem and W estm ere) into the. Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009 Transitioning the I ntel Next Generation Microarchitectures ( Nehalem and W estm ere) into the Mainstream Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009 1 Agenda Next Generation

More information

Intel Desktop Board DG31PR

Intel Desktop Board DG31PR Intel Desktop Board DG31PR Specification Update May 2008 Order Number E30564-003US The Intel Desktop Board DG31PR may contain design defects or errors known as errata, which may cause the product to deviate

More information

Getting Started with Intel SDK for OpenCL Applications

Getting Started with Intel SDK for OpenCL Applications Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel

More information

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017 Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation OpenMPCon 2017 September 18, 2017 Legal Notice and Disclaimers By using this document, in addition to any agreements you have with Intel, you accept

More information

Intel Desktop Board D975XBX2

Intel Desktop Board D975XBX2 Intel Desktop Board D975XBX2 Specification Update July 2008 Order Number: D74278-003US The Intel Desktop Board D975XBX2 may contain design defects or errors known as errata, which may cause the product

More information

i960 Microprocessor Performance Brief October 1998 Order Number:

i960 Microprocessor Performance Brief October 1998 Order Number: Performance Brief October 1998 Order Number: 272950-003 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual

More information

Sensors on mobile devices An overview of applications, power management and security aspects. Katrin Matthes, Rajasekaran Andiappan Intel Corporation

Sensors on mobile devices An overview of applications, power management and security aspects. Katrin Matthes, Rajasekaran Andiappan Intel Corporation Sensors on mobile devices An overview of applications, power management and security aspects Katrin Matthes, Rajasekaran Andiappan Intel Corporation Introduction 1. Compute platforms are adding more and

More information

Show Me the Money: Monetization Strategies for Apps. Scott Crabtree moderator

Show Me the Money: Monetization Strategies for Apps. Scott Crabtree moderator Show Me the Money: Monetization Strategies for Apps Scott Crabtree moderator Show Me The Money! Monetization Strategies for Apps Panel Discussion with: Moderator: Scott Crabtree, Tech Strategist, Intel

More information

Intel Desktop Board D945GCCR

Intel Desktop Board D945GCCR Intel Desktop Board D945GCCR Specification Update January 2008 Order Number: D87098-003 The Intel Desktop Board D945GCCR may contain design defects or errors known as errata, which may cause the product

More information

Device Firmware Update (DFU) for Windows

Device Firmware Update (DFU) for Windows Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY

More information

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Installation Guide and Release Notes Document number: 321604-001US 19 October 2009 Table of Contents 1 Introduction... 1 1.1 Product Contents... 1 1.2 System Requirements... 2 1.3 Documentation... 3 1.4

More information

Scientific Computing on GPUs: GPU Architecture Overview

Scientific Computing on GPUs: GPU Architecture Overview Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11

More information

Rack Scale Architecture Platform and Management

Rack Scale Architecture Platform and Management Rack Scale Architecture Platform and Management Mohan J. Kumar Senior Principal Engineer, Intel Corporation DATS008 Agenda Rack Scale Architecture (RSA) Overview RSA Platform Overview Elements of RSA Platform

More information

Intel Cache Acceleration Software for Windows* Workstation

Intel Cache Acceleration Software for Windows* Workstation Intel Cache Acceleration Software for Windows* Workstation Release 3.1 Release Notes July 8, 2016 Revision 1.3 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Intel Virtualization Technology Roadmap and VT-d Support in Xen

Intel Virtualization Technology Roadmap and VT-d Support in Xen Intel Virtualization Technology Roadmap and VT-d Support in Xen Jun Nakajima Intel Open Source Technology Center Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Intel RealSense Depth Module D400 Series Software Calibration Tool

Intel RealSense Depth Module D400 Series Software Calibration Tool Intel RealSense Depth Module D400 Series Software Calibration Tool Release Notes January 29, 2018 Version 2.5.2.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Intel Desktop Board DP55SB

Intel Desktop Board DP55SB Intel Desktop Board DP55SB Specification Update July 2010 Order Number: E81107-003US The Intel Desktop Board DP55SB may contain design defects or errors known as errata, which may cause the product to

More information

Real-Time Rendering Architectures

Real-Time Rendering Architectures Real-Time Rendering Architectures Mike Houston, AMD Part 1: throughput processing Three key concepts behind how modern GPU processing cores run code Knowing these concepts will help you: 1. Understand

More information

Intel G31/P31 Express Chipset

Intel G31/P31 Express Chipset Intel G31/P31 Express Chipset Specification Update For the Intel 82G31 Graphics and Memory Controller Hub (GMCH) and Intel 82GP31 Memory Controller Hub (MCH) February 2008 Notice: The Intel G31/P31 Express

More information

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) From Shader Code to a Teraflop: How GPU Shader Cores Work Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) 1 This talk Three major ideas that make GPU processing cores run fast Closer look at real

More information

Building a Firmware Component Ecosystem with the Intel Firmware Engine

Building a Firmware Component Ecosystem with the Intel Firmware Engine Building a Firmware Component Ecosystem with the Intel Firmware Engine Brian Richardson Senior Technical Marketing Engineer, Intel Corporation STTS002 1 Agenda Binary Firmware Management The Binary Firmware

More information

Intel USB 3.0 extensible Host Controller Driver

Intel USB 3.0 extensible Host Controller Driver Intel USB 3.0 extensible Host Controller Driver Release Notes (5.0.4.43) Unified driver September 2018 Revision 1.2 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Unlocking the Future with Intel

Unlocking the Future with Intel Unlocking the Future with Intel Renée James Senior Vice President, Intel Corporation General Manager, Software and Services Group Technology Has Changed the Way We Interact With Our World 1981 348 IBM

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Intel Extreme Memory Profile (Intel XMP) DDR3 Technology

Intel Extreme Memory Profile (Intel XMP) DDR3 Technology Intel Extreme Memory Profile (Intel XMP) DDR3 Technology White Paper March 2008 Document Number: 319124-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

32nm Westmere Family of Processors

32nm Westmere Family of Processors 32nm Westmere Family of Processors Stephen L. Smith Vice President, Director of Group Operations Digital Enterprise Group Contact: George Alfs (408)765-5707 5707 Highlights from Paul Otellini Speech Today

More information

Achieving Performance with OpenCL 2.0 on Intel Processor Graphics. Robert Ioffe, Sonal Sharma, Michael Stoner

Achieving Performance with OpenCL 2.0 on Intel Processor Graphics. Robert Ioffe, Sonal Sharma, Michael Stoner Achieving Performance with OpenCL 2.0 on Intel Processor Graphics Robert Ioffe, Sonal Sharma, Michael Stoner Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3 Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3 Addendum May 2014 Document Number: 329174-004US Introduction INFORMATION IN THIS

More information

From Shader Code to a Teraflop: How Shader Cores Work

From Shader Code to a Teraflop: How Shader Cores Work From Shader Code to a Teraflop: How Shader Cores Work Kayvon Fatahalian Stanford University This talk 1. Three major ideas that make GPU processing cores run fast 2. Closer look at real GPU designs NVIDIA

More information

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003 Intel 848P Chipset Specification Update Intel 82848P Memory Controller Hub (MCH) August 2003 Notice: The Intel 82848P MCH may contain design defects or errors known as errata which may cause the product

More information

Advanced Com puter Architecture: s1/ 2005

Advanced Com puter Architecture: s1/ 2005 Advanced Com puter Architecture: s1/ 2005 Project Presen tation David Mirabito Handling branches through context fking Currently: b eq Hypothetical instruction stream (operands rem oved) Currently: b eq

More information

Intel Desktop Board D2700DC. PMLP Report. Previously Logo d Motherboard Logo Program (PMLP)

Intel Desktop Board D2700DC. PMLP Report. Previously Logo d Motherboard Logo Program (PMLP) Previously Logo d Motherboard Logo Program (PMLP) Intel Desktop Board D2700DC PMLP Report 1/12/2012 Purpose: This report describes the Board D2700DC Previously Logo d Motherboard Logo Program testing run

More information

Interconnect Bus Extensions for Energy-Efficient Platforms

Interconnect Bus Extensions for Energy-Efficient Platforms Interconnect Bus Extensions for Energy-Efficient Platforms EBLS001 Sonesh Balchandani, Product Marketing Manager, Intel Corporation Jaya Jeyaseelan, Client Platform Architect, Intel Corporation Agenda

More information

Efficient scaling in a Task-Based Game Engine

Efficient scaling in a Task-Based Game Engine Efficient scaling in a Task-Based Game Engine Leigh Davies, Intel leigh.davies@intel.com www.intel.com/software/gdceurope2011/ Agenda How do we want to program for multi-core? Introduction to tasking Building

More information

Using Wind River Simics * Virtual Platforms to Accelerate Firmware Development PTAS003

Using Wind River Simics * Virtual Platforms to Accelerate Firmware Development PTAS003 Using Wind River Simics * Virtual Platforms to Accelerate Firmware Development Steven Shi, Senior Firmware Engineer, Intel Chunrong Lai, Software Engineer, Intel Alexander Y. Belousov, Engineer Manager,

More information

Intel Transparent Computing

Intel Transparent Computing Intel Transparent Computing Jeff Griffen Director of Platform Software Infrastructure Software and Services Group October, 21 2010 1 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Intel Cluster Ready Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Legal Disclaimer Intel may make changes to specifications and product descriptions at any time, without notice.

More information

Intel Desktop Board DH61SA

Intel Desktop Board DH61SA Intel Desktop Board DH61SA Specification Update December 2011 Part Number: G52483-001 The Intel Desktop Board DH61SA may contain design defects or errors known as errata, which may cause the product to

More information

Intel RealSense D400 Series Calibration Tools and API Release Notes

Intel RealSense D400 Series Calibration Tools and API Release Notes Intel RealSense D400 Series Calibration Tools and API Release Notes July 9, 2018 Version 2.6.4.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Intel Desktop Board DQ35JO

Intel Desktop Board DQ35JO Intel Desktop Board DQ35JO Specification Update July 2010 Order Number: E21492-005US The Intel Desktop Board DQ35JO may contain design defects or errors known as errata, which may cause the product to

More information

Parallel Programming for Graphics

Parallel Programming for Graphics Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models

More information

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar An Exploration into Object Storage for Exascale Supercomputers Raghu Chandrasekar Agenda Introduction Trends and Challenges Design and Implementation of SAROJA Preliminary evaluations Summary and Conclusion

More information

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod http://www.intel.com/performance/resources Version 2008-09 Rev. 1.0 Information in this document

More information

Intel Desktop Board DQ35MP. MLP Report. Motherboard Logo Program (MLP) 5/7/2008

Intel Desktop Board DQ35MP. MLP Report. Motherboard Logo Program (MLP) 5/7/2008 Motherboard Logo Program (MLP) Intel Desktop Board DQ35MP MLP Report 5/7/2008 Purpose: This report describes the DQ35MP Motherboard Logo Program testing run conducted by Intel Corporation. THIS TEST REPORT

More information

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory Jongsoo Park, Parallel Computing Lab, Intel Corporation with contributions from MKL team 1 Algorithm/

More information

Intel Desktop Board DH55TC

Intel Desktop Board DH55TC Intel Desktop Board DH55TC Specification Update December 2011 Order Number: E88213-006 The Intel Desktop Board DH55TC may contain design defects or errors known as errata, which may cause the product to

More information

Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance

Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance By Robert E Larsen NVM Cache Product Line Manager Intel Corporation August 2008 1 Legal Disclaimer INFORMATION IN THIS

More information