OpenCL *, Heterogeneous Computing, and the CPU
|
|
- Rosaline Horton
- 6 years ago
- Views:
Transcription
1 OpenCL *, Heterogeneous Computing, and the CPU Dr. Tim Mattson Visual Applications Research lab Intel Corporation * 3 rd party nam es are the property of their ow ners I ntel Corporation, 2009
2 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 2
3 Heterogeneous com puting A m odern platform has: Multi-core CPU(s) A GPU DSP processors other? GPU CPU GMCH I CH CPU DRAM The goal should NOT be to off-load" the CPU. We need to m ake the best use of all the available resources from wit hin a single program : One program that runs well (i.e. reasonably close to handtuned perform ance) on a heterogeneous m ixture of processors. 3 GMCH = graphics m em ory cont rol hub, I CH = I nput / out put cont rol hub
4 OpenCL: it s not just a GPGPU Language OpenCL defines a platform API to coordinate het erogeneous parallel com put at ions Literature rich with parallel coordination languages/ API OpenCL unique in its ability to coordinate CPUs, GPUs, etc Key coordination concepts Each device has its own asynchronous workqueue Synchronize between OCL com putations w/ event handles from different (or sam e) devices Enables algorithm s and system s that use all available com put at ional resources Enqueue native functions for integration with C/ C+ + code 4
5 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 5
6 OpenCL s Tw o St yles of Dat a- Parallelism Explicit SI MD data parallelism : The kernel defines one stream of instructions Parallelism from using wide vector types Size vector types to m atch native HW width Com bine with task parallelism to exploit m ultiple cores I m plicit SI MD data parallelism (i.e. shader-style): Write the kernel as a scalar program Use vector data types sized naturally to the algorithm Kernel autom atically m apped to SI MD-com pute-resources and cores by the com piler/ runtim e/ hardware. Both approaches are viable CPU options 6
7 Data- Parallelism : options on I A processors Explicit SI MD data parallelism Program m er chooses vector data type (width) Com piler hints using attributes vec_type_hint(typen) I m plicit SI MD Data parallel Map onto CPUs, GPUs, Larrabee, SSE/ AVX/ LRBni: 4/ 8/ 16 workitem s in parallel Hybrid use of the two m ethods AVX: can run two 4-wide workitem s in parallel LRBni: can run four 4-wide workitem s in parallel 7
8 Explicit SI MD data parallelism OpenCL as a portable interface to vector instruction sets Block loops and pack data into vector types (float4, ushort16, etc). Replace scalar ops in loops with blocked loops and vector ops. Unroll loops, optim ize indexing to m atch m achine vector width float a[n], b[n], c[n]; for (i=0; i<n; i++) c[i] = a[i]*b[i]; <<< the above becomes >>>> float4 a[n/4], b[n/4], c[n/4]; for (i=0; i<n/4; i++) c[i] = a[i]*b[i]; Explicit SI MD data parallelism m eans you tune your code to the vector w idth and other properties of the com pute device 8
9 Explicit SI MD data parallelism : Case Study Video contrast/ color optim ization kernel on a dual core CPU Successive im provem ent 5 % 2 3 % % 4 0 % 5 % Hand- tuned SSE + Multithreading Unroll loops Optim ize vector indexing Vectorize ( block loops, pack into ushort8 and ushort1 6 ) 1 w ork- item per core + loops 2 0 % % % peak perform ance Good new s: OpenCL code 9 5 % of hand- tuned SSE/ MT perf. Bad new s: N ew platform, redo all those optim izations. 9 3 Ghz dual core CPU pre- release version of OpenCL Source: I nt el Corp. * Results have been estim ated based on internal I ntel analysis and are provided for inform ational purposes only. Any difference in system hardware or software design or configuration m ay affect actual perform ance.
10 Tow ards Portable Perform ance The following C code is an exam ple of a Bilateral 1D filter: Rem inder: Bilateral filter is an edge preserving im age processing algorithm. See m ore inform ation here: ht t p: / / scien.st anford.edu/ class/ psych221/ projects/ 06/ im agescaling/ bilati.htm l void P4_Bilateral9 (int start, int end, float v) { int i, j, k; float w[4], a[4], p[4]; float inv_of_2v = -0.5 / v; for (i = start; i < end; i++) { float wt[4] = { 1.0f, 1.0f, 1.0f, 1.0f ; for (k = 0; k < 4; k++) a[k] = image[i][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i - j*size][k] - image[i][k]; for (k = 0; k < 4; k++) w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i - j*size][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i + j*size][k] - image[i][k]; for (k = 0; k < 4; k++; w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i + j*size][k]; for (k = 0; k < 4; k++) { image2[i][k] = a[k] / wt[k]; 1 0
11 Tow ards Portable Perform ance 1 1 float w[4], a[4], p[4]; The following void C code P4_Bilateral9 is an (int start, int end, float v) float inv_of_2v = -0.5 / v; for (i = start; i < end; i++) { exam ple of { a Bilateral 1D filter: float wt[4] = { 1.0f, 1.0f, 1.0f, 1.0f ; <<< Declarations >>> Rem inder: Bilateral filter is an edge preserving for im(i age = start; i < end; i++) { processing algorithm for. (j = 1; j <= 4; j++) { <<< a series of short loops >>>> See m ore inform ation here: ht t p: / / scien.st anford.edu/ class/ psych221/ projects/ 06/ im agescaling/ bilati.htm l void P4_Bilateral9 (int start, int end, float v) { int i, j, k; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) a[k] = image[i][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i - j*size][k] - image[i][k]; for (k = 0; k < 4; k++) w[k] = exp (p[k] * p[k] * inv_of_2v); for (k = 0; k < 4; k++) { wt[k] += w[k]; a[k] += w[k] * image[i - j*size][k]; for (j = 1; j <= 4; j++) { for (k = 0; k < 4; k++) p[k] = image[i + j*size][k] - image[i][k]; for (k = 0; k < 4; k++; w[k] = exp (p[k] * p[k] * inv_of_2v); <<< a 2 nd series of short for loops (k = 0; k < 4; >>> k++) { wt[k] += w[k]; a[k] += w[k] * image[i + j*size][k]; for (k = 0; k < 4; k++) { image2[i][k] = a[k] / wt[k];
12 I m plicit SI MD data parallel code outer loop replaced by work-item s running over an NDRange index set NDRange 4* im age size since each workitem does a color for each pixel Leave it to the com piler to m ap work-item s onto lanes of the vector units kernel void P4_Bilateral9 ( global float* ini m age, global float* outi m age, float v) { const size_t m yi D = get_global_id(0); const float inv_of_2v = -0.5f / v; const size_t m yrow = m yi D / I MAGE_WI DTH; size_t m axdistance = m in(di STANCE, m yrow); m axdistance = m in(m axdistance, I MAGE_HEI GHT - m yrow); float currentpixel, neighborpixel, newpixel; float diff; float accum ulatedweights, currentweights; newpixel = currentpixel = ini m age[ m yi D] ; accum ulatedweights = 1.0f; for (size_t dist = 1; dist < = m axdistance; + + dist) { neighborpixel = ini m age[ m yi D + dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; neighborpixel = ini m age[ m yi D - dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; outimage[ m yi D] = newpixel / accum ulatedweights; 1 2
13 I m plicit SI MD data parallel code kernel void P4_Bilateral9 ( global float* ini m age, global float* outi m age, float v) { kernel void p4 _ bilateral9 ( global float* ini m age, const size_t m yi D = get_global_id(0); outer loop replaced by work-item s { running over an NDRange index set. NDRange 4* im age size since each workitem does a color for each pixel. Leave it to the com piler to m ap work-item s onto lanes of the vector units const float inv_of_2v = -0.5f / v; const size_t m yrow = m yi D / I MAGE_WI DTH; size_t m axdistance = m in(di STANCE, m yrow); m axdistance = m in(m axdistance, I MAGE_HEI GHT - m yrow); float currentpixel, neighborpixel, newpixel; float diff; float accum ulatedweights, currentweights; for ( size_ t dist = 1 ; dist < = m axdistance; + + dist) { newpixel = currentpixel = ini m age[ m yi D] ; accum ulatedweights = 1.0f; neighborpixel = ini m age[ m yi D + for (size_t dist = 1; dist < = m axdistance; + + dist) { global float* outi m age, float v) const size_ t m yi D = get_ global_ id( 0 ) ; < < < declarations > > > diff neighborpixel = ini m age[ m yi D + dist* I MAGE_WI DTH] ; diff currentweights = neighborpixel - currentpixel; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; neighborpixel = ini m age[ m yi D - dist* I MAGE_WI DTH] ; diff currentweights dist* I MAGE_ W I DTH] ; = neighborpixel - currentpixel; currentw eights = exp( diff * diff * inv_ of_ 2 v) ; < < plus others to com pute pixels, w eights, etc > > = neighborpixel - currentpixel; accum ulatedw eights + = currentw eights; = exp(diff * diff * inv_of_2v); accum ulatedweights + = currentweights; newpixel + = neighborpixel * currentweights; outi m age[ m yi D] = new Pixel / accum ulatedw eights; outimage[ m yi D] = newpixel / accum ulatedweights; 1 3
14 Portable Perform ance in OpenCL I m plicit SI MD code where the fram ework m aps work-item s onto the lanes of the vector unit creates the opportunity for portable code that perform s well on full range of OpenCL com pute devices Requires m ature OpenCL technology that knows how to do this: But it is im portant to note. we know this approach works since its based on the way shader com pilers work today 1 4
15 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 1 5
16 Task Parallelism Overview Think of a task as an asynchronous function call Do X at som e point in the future Optionally after Y is done Light weight, often in user space Strengths Copes well wit h het erogeneous workloads Doesn t require 1000 s of st rands Scales well with core count Lim itations No autom atic support for latency hiding Must explicitly write SI MD code A natural fit to m ulti- core CPUs Y() X() 1 6
17 Task Parallelism in OpenCL 1 7 clenqueuetask I m agine sea of different tasks executing concurrently A task owns the core (i.e., a workgroup size of 1) Use tasks when algorithm Benefits from large am ount of local/ private m em ory Has predictable global m em ory accesses Can be program m ed using explicit vector style Just doesn t have 1000 s of identical things to do Use data-parallel kernels when algorithm Does not benefit from large am ounts of local/ private m em ory Has unpredictable global m em ory accesses Needs to apply sam e operation across large num ber of data elem ents
18 Agenda Heterogeneous com puting in OpenCL Data parallelism in OpenCL Task parallelism in OpenCL Future direction for parallel com puting 1 8
19 Future Parallel Program m ing Real world applications contain data parallel parts as well as serial/ sequent ial part s OpenCL addresses these Apps need by supporting Data Parallel & Task Parallel Braided Parallelism com posing Data Parallel & Task Parallel constructs in a single algorit hm CPUs are ideal for Braided Parallelism 1 9
20 Future parallel program m ing: Larrabee Memory Controller Fixed Function Texture Logic Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$... L2 Cache... Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$ Multi-Threaded Threaded Wide SIMD W ide SI MD I$ D$ I$ D$ Display Interface System Interface Memory Controller Cores com m unicate on a wide ring bus Fast access to m em ory and fixed function blocks Fast access for cache coherency L2 cache is partitioned am ong the cores Provides high aggregat e bandwidt h Allows data replication & sharing 2 0
21 Processor Core Block Diagram Instruction Decode Scalar Unit Scalar Registers Vector Unit Vector Registers L1 Icache & Dcache 256KB L2 Cache Local Subset Separate scalar and vector units with separate registers Vector unit: bit ops/ clock I n-order instruction execution Short execution pipelines Fast access from L1 cache Direct connection to each core s subset of the L2 cache Prefetch instructions load L1 and L2 caches Ring 2 1
22 Key Differences from Typical GPUs Each Larrabee core is a com plete I ntel processor Context switching & pre-em ptive m ulti-tasking Virtual m em ory and page swapping, even in texture logic Fully coherent caches at all levels of the hierarchy Efficient inter-block com m unication Ring bus for full inter-processor com m unication Low latency high bandwidth L1 and L2 caches Fast synchronizat ion bet ween cores and caches Larrabee is perfect for the braided parallelism in future applications 2 2
23 Conclusion OpenCL defines a platform -API / fram ework for heterogeneous com puting not just GPGPU or CPU-offload program m ing OpenCL has the potential to deliver portably perform ant code; but only if its used correctly: I m plicit SI MD data parallel code has the best chance of m apping onto a diverse range of hardware once OpenCL im plem entation quality catches up with m ature shader languages The future is clear: Braided parallelism m ixing task parallel and data parallel code in a single program balancing the load am ong ALL OF the plat form s resources OpenCL can handle this and em erging platform s such as Larrabee are well suited to support this m odel 2 3
24 References s0 9.idav.ucdavis.edu for slides from a Siggraph2009 course titled Beyond Program m able Shading Seiler, L., Carm ean, D., et al Larrabee: A m any-core x86 architecture for visual com puting. SI GGRAPH 08: ACM SI GGRAPH 2008 Papers, ACM Press, New York, NY Fatahalian, K., Houston, M., GPUs: a closer look, Com m unications of the ACM October 2008, vol 51 # 10. graphics.st anford.edu/ ~ kayvonf/ papers/ fat ahaliancacm.pdf 2 4
25 Legal Disclaim er INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Larrabee and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2009 Intel Corporation. 2 5
26 Risk Factors This presentation contains forward-looking statem ents that involve a num ber of risks and uncertainties. These statem ents do not reflect the potential im pact of any m ergers, acquisitions, divestitures, investm ents or other sim ilar transactions that m ay be com pleted in the future. The inform ation presented is accurate only as of today s date and will not be updated. I n addition to any factors discussed in the presentation, the im portant factors that could cause actual results to differ m aterially include the following: Dem and could be different from I ntel's expectations due to factors including changes in business and econom ic conditions, including conditions in the credit m arket that could affect consum er confidence; custom er acceptance of I ntel s and com petitors products; changes in custom er order patterns, including order cancellations; and changes in the level of inventory at custom ers. I ntel s results could be affected by the tim ing of closing of acquisitions and divestitures. I ntel operates in intensely com petitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product dem and that is highly variable and difficult to forecast. Revenue and the gross m argin percentage are affected by the tim ing of new I ntel product introductions and the dem and for and m arket acceptance of I ntel's products; actions taken by I ntel's com petitors, including product offerings and introductions, m arketing program s and pricing pressures and I ntel s response to such actions; I ntel s ability to respond quickly to technological developm ents and to incorporate new features into its products; and the availability of sufficient supply of com ponents from suppliers to m eet dem and. The gross m argin percentage could vary significantly from expectations based on changes in revenue levels; product m ix and pricing; capacity utilization; variations in inventory valuation, including variations related to the tim ing of qualifying products for sale; excess or obsolete inventory; m anufacturing yields; changes in unit costs; im pairm ents of long-lived assets, including m anufacturing, assem bly/ test and intangible assets; and the tim ing and execution of the m anufacturing ram p and associated costs, including start-up costs. Expenses, particularly certain m arketing and com pensation expenses, vary depending on the level of dem and for I ntel's products, the level of revenue and profits, and im pairm ents of long-lived assets. I ntel is in the m idst of a structure and efficiency program that is resulting in several actions that could have an im pact on expected expense levels and gross m argin. I ntel's results could be im pacted by adverse econom ic, social, political and physical/ infrastructure conditions in the countries in which I ntel, its custom ers or its suppliers operate, including m ilitary conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. I ntel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory m atters involving intellectual property, stockholder, consum er, antitrust and other issues, such as the litigation and regulatory m atters described in I ntel's SEC reports. A detailed discussion of these and other factors that could affect I ntel s results is included in I ntel s SEC filings, including the report on Form 10-Q for the quarter ended June 28,
Programming Larrabee: Beyond Data Parallelism. Dr. Larry Seiler Intel Corporation
Dr. Larry Seiler Intel Corporation Intel Corporation, 2009 Agenda: What do parallel applications need? How does Larrabee enable parallelism? How does Larrabee support data parallelism? How does Larrabee
More informationNVMHCI: The Optimized Interface for Caches and SSDs
NVMHCI: The Optimized Interface for Caches and SSDs Amber Huffman Intel August 2008 1 Agenda Hidden Costs of Emulating Hard Drives An Optimized Interface for Caches and SSDs August 2008 2 Hidden Costs
More informationKirk Skaugen Senior Vice President General Manager, PC Client Group Intel Corporation
Kirk Skaugen Senior Vice President General Manager, PC Client Group Intel Corporation 2 in 1 Computing Built for Business A Look Inside 2014 Ultrabook 2011 2012 2013 Delivering The Best Mobile Experience
More informationThe Heart of A New Generation Update to Analysts. Anand Chandrasekher Senior Vice President, Intel General Manager, Ultra Mobility Group
The Heart of A New Generation Update to Analysts Anand Chandrasekher Senior Vice President, Intel General Manager, Ultra Mobility Group Today s s presentation contains forward-looking statements. All statements
More informationData center day. Non-volatile memory. Rob Crooke. August 27, Senior Vice President, General Manager Non-Volatile Memory Solutions Group
Non-volatile memory Rob Crooke Senior Vice President, General Manager Non-Volatile Memory Solutions Group August 27, 2015 THE EXPLOSION OF DATA Requires Technology To Perform Random Data Access, Low Queue
More informationHardware and Software Co-Optimization for Best Cloud Experience
Hardware and Software Co-Optimization for Best Cloud Experience Khun Ban (Intel), Troy Wallins (Intel) October 25-29 2015 1 Cloud Computing Growth Connected Devices + Apps + New Services + New Service
More informationThunderbolt Technology Brett Branch Thunderbolt Platform Enabling Manager
Thunderbolt Technology Brett Branch Thunderbolt Platform Enabling Manager Agenda Technology Recap Protocol Overview Connector and Cable Overview System Overview Technology Block Overview Silicon Block
More informationAMD and OpenCL. Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc.
AMD and OpenCL Mike Houst on Senior Syst em Archit ect Advanced Micro Devices, I nc. Overview Brief overview of ATI Radeon HD 4870 architecture and operat ion Spreadsheet perform ance analysis Tips & tricks
More informationMICHAL MROZEK ZBIGNIEW ZDANOWICZ
MICHAL MROZEK ZBIGNIEW ZDANOWICZ Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
More informationIntel Many Integrated Core (MIC) Architecture
Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products
More informationPerformance Monitoring on Intel Core i7 Processors*
Performance Monitoring on Intel Core i7 Processors* Ramesh Peri Principal Engineer & Engineering Manager Profiling Collectors Group (PAT/DPD/SSG) Intel Corporation Austin, TX 78746 * Intel, the Intel logo,
More informationIntel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor
Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS
More informationData Centre Server Efficiency Metric- A Simplified Effective Approach. Henry ML Wong Sr. Power Technologist Eco-Technology Program Office
Data Centre Server Efficiency Metric- A Simplified Effective Approach Henry ML Wong Sr. Power Technologist Eco-Technology Program Office 1 Agenda The Elephant in your Data Center Calculating Server Utilization
More informationIntel Array Building Blocks
Intel Array Building Blocks Productivity, Performance, and Portability with Intel Parallel Building Blocks Intel SW Products Workshop 2010 CERN openlab 11/29/2010 1 Agenda Legal Information Vision Call
More informationThe Transition to PCI Express* for Client SSDs
The Transition to PCI Express* for Client SSDs Amber Huffman Senior Principal Engineer Intel Santa Clara, CA 1 *Other names and brands may be claimed as the property of others. Legal Notices and Disclaimers
More informationGraphBuilder: A Scalable Graph ETL Framework
SIGMOD GRADES 2013 GraphBuilder: A Scalable Graph ETL Framework Large Scale Graph Construction using Apache Hadoop 1 Authors: Nilesh Jain, Guangdeng Liao, Theodore Willke Presented By: Kushal Datta Legal
More informationIntel Desktop Board DZ68DB
Intel Desktop Board DZ68DB Specification Update April 2011 Part Number: G31558-001 The Intel Desktop Board DZ68DB may contain design defects or errors known as errata, which may cause the product to deviate
More informationKrzysztof Laskowski, Intel Pavan K Lanka, Intel
Krzysztof Laskowski, Intel Pavan K Lanka, Intel Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
More informationParallel Programming on Larrabee. Tim Foley Intel Corp
Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This
More informationEnterprise Data Integrity and Increasing the Endurance of Your Solid-State Drive MEMS003
SF 2009 Enterprise Data Integrity and Increasing the Endurance of Your Solid-State Drive James Myers Manager, SSD Applications Engineering, Intel Cliff Jeske Manager, SSD development, Hitachi GST MEMS003
More informationSoftware Occlusion Culling
Software Occlusion Culling Abstract This article details an algorithm and associated sample code for software occlusion culling which is available for download. The technique divides scene objects into
More informationTechnology is a Journey
Technology is a Journey Not a Destination Renée James President, Intel Corp Other names and brands may be claimed as the property of others. Building Tomorrow s TECHNOLOGIES Late 80 s The Early Motherboard
More informationExtending Energy Efficiency. From Silicon To The Platform. And Beyond Raj Hazra. Director, Systems Technology Lab
Extending Energy Efficiency From Silicon To The Platform And Beyond Raj Hazra Director, Systems Technology Lab 1 Agenda Defining Terms Why Platform Energy Efficiency Value Intel Research Call to Action
More informationCollecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers
Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting Important OpenCL*-related Metrics with Intel GPA System Analyzer Introduction Intel SDK for OpenCL* Applications
More informationServer Efficiency: A Simplified Data Center Approach
Server Efficiency: A Simplified Data Center Approach Henry M.L. Wong Intel Eco-Technology Program Office Environment Global CO 2 Emissions ICT 2% 98% 2 Source: The Climate Group Economic Efficiency and
More informationJOE NARDONE. General Manager, WiMAX Solutions Division Service Provider Business Group October 23, 2006
JOE NARDONE General Manager, WiMAX Solutions Division Service Provider Business Group October 23, 2006 Intel s s Broadband Wireless Vision Intel is driving >200 million new screens each year We re focused
More informationIntel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation
Intel Atom Processor Based Platform Technologies Intelligent Systems Group Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
More informationBitonic Sorting Intel OpenCL SDK Sample Documentation
Intel OpenCL SDK Sample Documentation Document Number: 325262-002US Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL
More informationSolid-State Drives for Servers and Clients: Why, When, Where and How
Solid-State Drives for Servers and Clients: Why, When, Where and How Troy Winslow Marketing Manager Intel NAND Products Group MASS001 Agenda Why Solid-State Drives? Where will SSDs be utilized? When will
More informationSupra-linear Packet Processing Performance with Intel Multi-core Processors
White Paper Dual-Core Intel Xeon Processor LV 2.0 GHz Communications and Networking Applications Supra-linear Packet Processing Performance with Intel Multi-core Processors 1 Executive Summary Advances
More informationIntel Architecture Press Briefing
Intel Architecture Press Briefing Stephen L. Smith Vice President Director, Digital Enterprise Group Operations Ronak Singhal Principal Engineer Digital Enterprise Group 17 March, 2008 Today s s News Intel
More informationDisplayPort: Foundation and Innovation for Current and Future Intel Platforms
DisplayPort: Foundation Innovation for Current Future Intel Platforms Bailey Cross Mobile Technology Evangelist Mobile Platforms Group Nick Willow Display Technologist Business Client Group Session ID
More informationMore performance options
More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel
More informationCarsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017
Carsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017 Recap Twolevel BVH Multiple object BVHs Single toplevel BVH Recap Twolevel BVH Multiple object BVHs Single toplevel BVH Motivation The Library
More informationIntel and the Future of Consumer Electronics. Shahrokh Shahidzadeh Sr. Principal Technologist
1 Intel and the Future of Consumer Electronics Shahrokh Shahidzadeh Sr. Principal Technologist Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.
More informationBitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved
Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 325262-002US Revision: 1.3 World Wide Web: http://www.intel.com Document
More informationMobile World Congress Claudine Mangano Director, Global Communications Intel Corporation
Mobile World Congress 2015 Claudine Mangano Director, Global Communications Intel Corporation Mobile World Congress 2015 Brian Krzanich Chief Executive Officer Intel Corporation 4.9B 2X CONNECTED CONNECTED
More informationEvolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)
Evolving Small Cells Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure) Intelligent Heterogeneous Network Optimum User Experience Fibre-optic Connected Macro Base stations
More informationIntel Desktop Board D945GCLF2
Intel Desktop Board D945GCLF2 Specification Update July 2010 Order Number: E54886-006US The Intel Desktop Board D945GCLF2 may contain design defects or errors known as errata, which may cause the product
More informationInnovation Accelerating Mission Critical Infrastructure
Innovation Accelerating Mission Critical Infrastructure Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
More informationStriking the Balance Driving Increased Density and Cost Reduction in Printed Circuit Board Designs
Striking the Balance Driving Increased Density and Cost Reduction in Printed Circuit Board Designs Tim Swettlen & Gary Long Intel Corporation Tuesday, Oct 22, 2013 Legal Disclaimer The presentation is
More informationIntel Xeon Phi Coprocessor Performance Analysis
Intel Xeon Phi Coprocessor Performance Analysis Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
More informationMobility: Innovation Unleashed!
Mobility: Innovation Unleashed! Mooly Eden Corporate Vice President General Manager, Mobile Platforms Group Intel Corporation Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
More informationInstallation Guide and Release Notes
Intel C++ Studio XE 2013 for Windows* Installation Guide and Release Notes Document number: 323805-003US 26 June 2013 Table of Contents 1 Introduction... 1 1.1 What s New... 2 1.1.1 Changes since Intel
More informationIntel Desktop Board DG41RQ
Intel Desktop Board DG41RQ Specification Update July 2010 Order Number: E61979-004US The Intel Desktop Board DG41RQ may contain design defects or errors known as errata, which may cause the product to
More informationIntel Desktop Board D946GZAB
Intel Desktop Board D946GZAB Specification Update Release Date: November 2007 Order Number: D65909-002US The Intel Desktop Board D946GZAB may contain design defects or errors known as errata, which may
More informationIntel Desktop Board DG41CN
Intel Desktop Board DG41CN Specification Update December 2010 Order Number: E89822-003US The Intel Desktop Board DG41CN may contain design defects or errors known as errata, which may cause the product
More informationSample for OpenCL* and DirectX* Video Acceleration Surface Sharing
Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing User s Guide Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2013 Intel Corporation All Rights Reserved Document
More informationQuadrics QsNet II : A network for Supercomputing Applications
Quadrics QsNet II : A network for Supercomputing Applications David Addison, Jon Beecroft, David Hewson, Moray McLaren (Quadrics Ltd.), Fabrizio Petrini (LANL) You ve bought your super computer You ve
More informationThe Intel Processor Diagnostic Tool Release Notes
The Intel Processor Diagnostic Tool Release Notes Page 1 of 7 LEGAL INFORMATION INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
More informationIntel Desktop Board D945GCLF
Intel Desktop Board D945GCLF Specification Update July 2010 Order Number: E47517-008US The Intel Desktop Board D945GCLF may contain design defects or errors known as errata, which may cause the product
More informationTransitioning the I ntel Next. ( Nehalem and W estm ere) into the. Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009
Transitioning the I ntel Next Generation Microarchitectures ( Nehalem and W estm ere) into the Mainstream Lily Looi, St ephan Jourdan I ntel Corporation Hot Chips 21 August 24, 2009 1 Agenda Next Generation
More informationIntel Desktop Board DG31PR
Intel Desktop Board DG31PR Specification Update May 2008 Order Number E30564-003US The Intel Desktop Board DG31PR may contain design defects or errors known as errata, which may cause the product to deviate
More informationGetting Started with Intel SDK for OpenCL Applications
Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel
More informationErnesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017
Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation OpenMPCon 2017 September 18, 2017 Legal Notice and Disclaimers By using this document, in addition to any agreements you have with Intel, you accept
More informationIntel Desktop Board D975XBX2
Intel Desktop Board D975XBX2 Specification Update July 2008 Order Number: D74278-003US The Intel Desktop Board D975XBX2 may contain design defects or errors known as errata, which may cause the product
More informationi960 Microprocessor Performance Brief October 1998 Order Number:
Performance Brief October 1998 Order Number: 272950-003 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual
More informationSensors on mobile devices An overview of applications, power management and security aspects. Katrin Matthes, Rajasekaran Andiappan Intel Corporation
Sensors on mobile devices An overview of applications, power management and security aspects Katrin Matthes, Rajasekaran Andiappan Intel Corporation Introduction 1. Compute platforms are adding more and
More informationShow Me the Money: Monetization Strategies for Apps. Scott Crabtree moderator
Show Me the Money: Monetization Strategies for Apps Scott Crabtree moderator Show Me The Money! Monetization Strategies for Apps Panel Discussion with: Moderator: Scott Crabtree, Tech Strategist, Intel
More informationIntel Desktop Board D945GCCR
Intel Desktop Board D945GCCR Specification Update January 2008 Order Number: D87098-003 The Intel Desktop Board D945GCCR may contain design defects or errors known as errata, which may cause the product
More informationDevice Firmware Update (DFU) for Windows
Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY
More informationOpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing
OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US
More informationInstallation Guide and Release Notes
Installation Guide and Release Notes Document number: 321604-001US 19 October 2009 Table of Contents 1 Introduction... 1 1.1 Product Contents... 1 1.2 System Requirements... 2 1.3 Documentation... 3 1.4
More informationScientific Computing on GPUs: GPU Architecture Overview
Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11
More informationRack Scale Architecture Platform and Management
Rack Scale Architecture Platform and Management Mohan J. Kumar Senior Principal Engineer, Intel Corporation DATS008 Agenda Rack Scale Architecture (RSA) Overview RSA Platform Overview Elements of RSA Platform
More informationIntel Cache Acceleration Software for Windows* Workstation
Intel Cache Acceleration Software for Windows* Workstation Release 3.1 Release Notes July 8, 2016 Revision 1.3 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
More informationIntel Virtualization Technology Roadmap and VT-d Support in Xen
Intel Virtualization Technology Roadmap and VT-d Support in Xen Jun Nakajima Intel Open Source Technology Center Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.
More informationIntel RealSense Depth Module D400 Series Software Calibration Tool
Intel RealSense Depth Module D400 Series Software Calibration Tool Release Notes January 29, 2018 Version 2.5.2.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationIntel Desktop Board DP55SB
Intel Desktop Board DP55SB Specification Update July 2010 Order Number: E81107-003US The Intel Desktop Board DP55SB may contain design defects or errors known as errata, which may cause the product to
More informationReal-Time Rendering Architectures
Real-Time Rendering Architectures Mike Houston, AMD Part 1: throughput processing Three key concepts behind how modern GPU processing cores run code Knowing these concepts will help you: 1. Understand
More informationIntel G31/P31 Express Chipset
Intel G31/P31 Express Chipset Specification Update For the Intel 82G31 Graphics and Memory Controller Hub (GMCH) and Intel 82GP31 Memory Controller Hub (MCH) February 2008 Notice: The Intel G31/P31 Express
More informationFrom Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)
From Shader Code to a Teraflop: How GPU Shader Cores Work Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) 1 This talk Three major ideas that make GPU processing cores run fast Closer look at real
More informationBuilding a Firmware Component Ecosystem with the Intel Firmware Engine
Building a Firmware Component Ecosystem with the Intel Firmware Engine Brian Richardson Senior Technical Marketing Engineer, Intel Corporation STTS002 1 Agenda Binary Firmware Management The Binary Firmware
More informationIntel USB 3.0 extensible Host Controller Driver
Intel USB 3.0 extensible Host Controller Driver Release Notes (5.0.4.43) Unified driver September 2018 Revision 1.2 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationUnlocking the Future with Intel
Unlocking the Future with Intel Renée James Senior Vice President, Intel Corporation General Manager, Software and Services Group Technology Has Changed the Way We Interact With Our World 1981 348 IBM
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationIntel Extreme Memory Profile (Intel XMP) DDR3 Technology
Intel Extreme Memory Profile (Intel XMP) DDR3 Technology White Paper March 2008 Document Number: 319124-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
More information32nm Westmere Family of Processors
32nm Westmere Family of Processors Stephen L. Smith Vice President, Director of Group Operations Digital Enterprise Group Contact: George Alfs (408)765-5707 5707 Highlights from Paul Otellini Speech Today
More informationAchieving Performance with OpenCL 2.0 on Intel Processor Graphics. Robert Ioffe, Sonal Sharma, Michael Stoner
Achieving Performance with OpenCL 2.0 on Intel Processor Graphics Robert Ioffe, Sonal Sharma, Michael Stoner Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationDesktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3
Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3 Addendum May 2014 Document Number: 329174-004US Introduction INFORMATION IN THIS
More informationFrom Shader Code to a Teraflop: How Shader Cores Work
From Shader Code to a Teraflop: How Shader Cores Work Kayvon Fatahalian Stanford University This talk 1. Three major ideas that make GPU processing cores run fast 2. Closer look at real GPU designs NVIDIA
More informationIntel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003
Intel 848P Chipset Specification Update Intel 82848P Memory Controller Hub (MCH) August 2003 Notice: The Intel 82848P MCH may contain design defects or errors known as errata which may cause the product
More informationAdvanced Com puter Architecture: s1/ 2005
Advanced Com puter Architecture: s1/ 2005 Project Presen tation David Mirabito Handling branches through context fking Currently: b eq Hypothetical instruction stream (operands rem oved) Currently: b eq
More informationIntel Desktop Board D2700DC. PMLP Report. Previously Logo d Motherboard Logo Program (PMLP)
Previously Logo d Motherboard Logo Program (PMLP) Intel Desktop Board D2700DC PMLP Report 1/12/2012 Purpose: This report describes the Board D2700DC Previously Logo d Motherboard Logo Program testing run
More informationInterconnect Bus Extensions for Energy-Efficient Platforms
Interconnect Bus Extensions for Energy-Efficient Platforms EBLS001 Sonesh Balchandani, Product Marketing Manager, Intel Corporation Jaya Jeyaseelan, Client Platform Architect, Intel Corporation Agenda
More informationEfficient scaling in a Task-Based Game Engine
Efficient scaling in a Task-Based Game Engine Leigh Davies, Intel leigh.davies@intel.com www.intel.com/software/gdceurope2011/ Agenda How do we want to program for multi-core? Introduction to tasking Building
More informationUsing Wind River Simics * Virtual Platforms to Accelerate Firmware Development PTAS003
Using Wind River Simics * Virtual Platforms to Accelerate Firmware Development Steven Shi, Senior Firmware Engineer, Intel Chunrong Lai, Software Engineer, Intel Alexander Y. Belousov, Engineer Manager,
More informationIntel Transparent Computing
Intel Transparent Computing Jeff Griffen Director of Platform Software Infrastructure Software and Services Group October, 21 2010 1 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION
More informationExpand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready
Intel Cluster Ready Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Legal Disclaimer Intel may make changes to specifications and product descriptions at any time, without notice.
More informationIntel Desktop Board DH61SA
Intel Desktop Board DH61SA Specification Update December 2011 Part Number: G52483-001 The Intel Desktop Board DH61SA may contain design defects or errors known as errata, which may cause the product to
More informationIntel RealSense D400 Series Calibration Tools and API Release Notes
Intel RealSense D400 Series Calibration Tools and API Release Notes July 9, 2018 Version 2.6.4.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
More informationIntel Desktop Board DQ35JO
Intel Desktop Board DQ35JO Specification Update July 2010 Order Number: E21492-005US The Intel Desktop Board DQ35JO may contain design defects or errors known as errata, which may cause the product to
More informationParallel Programming for Graphics
Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models
More informationAn Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar
An Exploration into Object Storage for Exascale Supercomputers Raghu Chandrasekar Agenda Introduction Trends and Challenges Design and Implementation of SAROJA Preliminary evaluations Summary and Conclusion
More informationSoftware Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod
Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod http://www.intel.com/performance/resources Version 2008-09 Rev. 1.0 Information in this document
More informationIntel Desktop Board DQ35MP. MLP Report. Motherboard Logo Program (MLP) 5/7/2008
Motherboard Logo Program (MLP) Intel Desktop Board DQ35MP MLP Report 5/7/2008 Purpose: This report describes the DQ35MP Motherboard Logo Program testing run conducted by Intel Corporation. THIS TEST REPORT
More informationOpportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory
Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory Jongsoo Park, Parallel Computing Lab, Intel Corporation with contributions from MKL team 1 Algorithm/
More informationIntel Desktop Board DH55TC
Intel Desktop Board DH55TC Specification Update December 2011 Order Number: E88213-006 The Intel Desktop Board DH55TC may contain design defects or errors known as errata, which may cause the product to
More informationNon-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance
Non-Volatile Memory Cache Enhancements: Turbo-Charging Client Platform Performance By Robert E Larsen NVM Cache Product Line Manager Intel Corporation August 2008 1 Legal Disclaimer INFORMATION IN THIS
More information