Multicore Chips and Parallel Processing for High-End Learning Environments


Learning on Demand Viewpoints
Marcelo Hoffmann, +1 650 859 3680; fax: +1 650 859 4544; e-mail: mhoffmann@sric-bi.com

Why is this topic significant?

Computing power is a key enabler for high-end learning environments such as three-dimensional simulations, games, and virtual worlds. The availability of inexpensive, multicore, parallel computers may revolutionize high-end learning environments if programmers can create and convert software for the new computer systems. This Viewpoints, which examines the field of multicore chips and parallel processing, is relevant to all learning-industry players interested in new hardware trends.

Since the invention of the microprocessor at Intel in the early 1970s, improvements in microprocessor speed and power have become expected and generally assumed by the marketplace. However, in 2004, Moore's law ran into a major speed bump that led developers to transform microprocessor designs and, with them, computer programming. Gordon Moore, one of the founders of Intel, predicted that the number of transistors fitting in the same surface area on a microchip would double roughly every 18 months. In 2004, chip designers hit a wall: each new chip not only doubled in performance but also doubled in power dissipation, producing excessive heat that the chip could not easily expel. Because of this barrier, Intel canceled its planned successor to the Pentium 4 in 2004 and gave up on its goal of producing a chip with a 10 GHz clock rate by 2010 (microprocessors now generally operate little faster than about 3 GHz). Currently, and for the first time in some 20 years, the line plotting microprocessor performance per unit of surface area has dipped below the line plotting Moore's law. This change has severe consequences for the consumer-electronics and computer industries, which expect continual improvements in microprocessor power for their ongoing business.

The most popular solution that chip architects have developed to alleviate the power-density limitation uses multiple processors, or cores, on each chip. The various cores execute portions of software code in parallel. Multicore architectures allow designers to lower the clock rate and voltage requirements of the chip (solving the heat problem) while continuing to improve overall performance for executing programs.
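Why lowering clock rate and voltage tames the heat problem follows from a standard first-order model of chip power (our addition, not from the original article). The dynamic power a chip dissipates is approximately

\[ P_{\text{dynamic}} \approx C \, V^2 \, f \]

where C is the switched capacitance, V the supply voltage, and f the clock frequency. Because supply voltage can usually scale down along with frequency, power falls roughly with the cube of frequency: to first order, two cores running at half the frequency and half the voltage match the throughput of one fast core while dissipating about a quarter of its power (each slow core uses about one-eighth). These figures assume ideal voltage scaling and perfect parallel speedup, so they mark a best case, but the direction of the trade-off is what drives multicore design.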

The number of cores on a microprocessor may soon join measurements (in megahertz) of chip clock rates as a combined metric for microprocessor performance. David Patterson, a noted computer-science researcher at the University of California, Berkeley, claims that we can now expect the number of cores per chip to double every 18 months, in a sense replacing Moore's law with a measure of parallelization.

The transition toward multicore chips runs across the microprocessor and computer industries:

- Intel Corp. and Advanced Micro Devices Inc. introduced two- and four-core microprocessors during 2006 for mainstream and power-user computers.
- Sun Microsystems is already producing eight-core chips for servers.
- IBM sells what amounts to an eight-core chip in its Cell microprocessor, which it designed and built in conjunction with Sony and Toshiba for use in Sony's latest PlayStation 3 game console and several other applications.
- Texas Instruments has produced chips with a variety of cores for cell-phone makers for several years.
- In 2006, LSI Logic Corp. introduced a microprocessor platform for consumer gadgets that uses three or four cores, depending on the particular application.
- Makers of chips for embedded systems have produced highly specialized chips with more than 200 cores on a die.

No one is sure how successful the multicore-development strategy will be for the consumer-electronics and computer industries, and several major questions about implementation remain.

Programming multicore processors is dramatically different from, and more complex than, programming traditional processors. Parallel programs require design and code that let multiple processors share computational tasks. This requirement is problematic because not all application programs have components that divide easily; tasks begun simultaneously often finish at different times and generate bottlenecks when their results must be rejoined.
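Amdahl's law, a classic result the article does not cite by name, quantifies how sharply the indivisible portion of a program limits the payoff from extra cores. If a fraction p of a program's running time can be parallelized and the remainder must run serially, the best possible speedup on n cores is

\[ S(n) = \frac{1}{(1 - p) + p/n} \]

For example, a program that is 90% parallelizable (p = 0.9) speeds up by only a factor of about 4.7 on eight cores, because 1 / (0.1 + 0.9/8) ≈ 4.7, and no number of cores can push it past a factor of 10.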

Shared resources can also generate problems: if an application needs access to data in memory that other cores are already sharing and using, the program can freeze, stopping all operations. Parallel programs are also inherently hard to debug, because mistakes are often not obvious at inception, making the source of later problems difficult to locate. The move to multicore therefore requires not only new programming skills but also new tools.
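A minimal sketch in C with POSIX threads (our illustration, not from the original article) shows one way such a freeze, a deadlock, arises: two threads lock two shared resources in opposite orders.

    #include <pthread.h>
    #include <stdio.h>

    /* Two shared accounts, each protected by its own lock. */
    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
    static int balance_a = 100, balance_b = 100;

    /* Thread 1 locks A then B; thread 2 locks B then A. If each thread
       acquires its first lock before the other releases, both wait
       forever and the program hangs with no error message. */
    static void *transfer_a_to_b(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);   /* may block forever */
        balance_a -= 10; balance_b += 10;
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *transfer_b_to_a(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_b);
        pthread_mutex_lock(&lock_a);   /* may block forever */
        balance_b -= 10; balance_a += 10;
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, transfer_a_to_b, NULL);
        pthread_create(&t2, NULL, transfer_b_to_a, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("done: %d %d\n", balance_a, balance_b);  /* may never print */
        return 0;
    }

The program runs correctly on most executions and hangs only when the scheduler interleaves the threads unluckily, which is exactly why such mistakes are "not obvious at inception." The standard remedy is to acquire locks everywhere in one agreed global order.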

Some experts claim that adapting to multicore processors and parallel programming is the biggest challenge that the information-technology (IT) industry has faced in 50 years. Jim Larus, a computer scientist who manages programming initiatives at Microsoft Research, notes, "We lack algorithms, languages, compilers and expertise in parallel programming. Programmers face many short-term issues, like developing better support for multithreading, synchronization, debugging and error detection" (Electronic Engineering Times, 28 December 2006, page 6).

A key driver toward parallelism in computation is the market for computers with high-end graphics processing units (GPUs), microprocessors designed specifically to improve computer graphics. High-end GPUs are important for computer games, including games such as Second Life that may find use in learning situations. Such GPUs are also necessary for three-dimensional learning environments with highly detailed, real-time representations of people and objects.

The game market has already driven the development of graphics chips and GPUs that are, for some operations, even more powerful than the leading general-purpose microprocessors that drive personal computers and workstations. According to Wired magazine ("Supercomputing's Next Revolution," 9 November 2006; www.wired.com/news/technology/0,72090-0.html), researchers from the University of North Carolina at Chapel Hill released benchmark tests showing that specialized GPUs developed for the games industry in the past few years can surpass the latest central-processing-unit (CPU-) based systems by two to five times in a wide variety of tasks. Some researchers, writing specialized code for the graphics chips, find even greater improvements, though these gains generally result from painstaking hand tuning of software to the specific hardware, which is often not cost-effective for commercial purposes.

Competition in the GPU market is now even more intense than in the CPU market, and a doubling of computational power per board in one year is not unusual in the industry. The GPU makers are also eager to address the issues facing programmers. NVIDIA, one of the two leading GPU developers (the other is ATI, recently bought by AMD), has announced that it will soon offer the first C-compiler development environment for its GPUs, making them easier to program for applications beyond graphics rendering and presentation. According to Andy Keane, NVIDIA's general manager for GPU computing, the company created a new architecture for its latest GPU, the GeForce 8800, adding a memory cache that allows the chip to work in two modes: one for graphics that uses stream processing (a specialized type of parallel processing) and a second, so-called load-store mode for more complex logic-based operations, making this leading-edge GPU operate much as a traditional CPU does.
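To illustrate the kind of programming such a C-based environment enables, here is a minimal data-parallel sketch in today's CUDA C (our illustration: the article describes the toolkit only as forthcoming, and we use the modern unified-memory call cudaMallocManaged for brevity; the earliest releases required explicit memory copies).

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Each GPU thread computes one element: y[i] = a * x[i] + y[i].
       The same C-like function body runs across thousands of threads in
       parallel; this is the stream-processing style Keane describes. */
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                      /* guard against surplus threads */
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        const int n = 1 << 20;          /* one million elements */
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));  /* visible to CPU and GPU */
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* Launch enough 256-thread blocks to cover all n elements. */
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
        cudaDeviceSynchronize();        /* wait for the GPU to finish */

        printf("y[0] = %f\n", y[0]);    /* expect 5.0 */
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

A conventional CPU loop would visit the million elements one at a time; here the work divides naturally across the GPU's many simple cores, which is why data-parallel workloads show the two-to-five-times gains the UNC benchmarks report.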

Offloading computational tasks from the general-purpose microprocessor could bring substantially greater realism to learning environments, if researchers can overcome the programming bottlenecks. In the January 2007 issue of Intelligent Enterprise, David Patterson described the problems and opportunities facing the computer industry, stating that the switch to parallel architectures will require researchers and practitioners to address "the biggest challenge and opportunity to face the IT industry in 50 years. If we solve the problem of making it easy to program thousands of processors efficiently, the future is rosy. If we don't, the IT industry will have to learn to live without the performance rush that it has been addicted to for decades." (See www.intelligententerprise.com/showarticle.jhtml?articleid=196603897&pgno=5.)

Patterson and a number of other leading computer scientists at the Massachusetts Institute of Technology, Carnegie Mellon University, Stanford University, and the University of Washington are now working on the problem by collaborating on a project they call Research Accelerator for Multiple Processors (RAMP). According to its Web site (ramp.eecs.berkeley.edu/), the goal of RAMP is to develop (i) component models (ranging from processors to coherent caches to networks) that developers can compose quickly to create and evaluate new multiprocessor architectural and microarchitectural concepts and (ii) a set of three reference machines that will use those component models. Reportedly, the reference systems are designed to scale to the 1,000-core range given the appropriate hardware platform (additional information is available at en.wikipedia.org/wiki/RAMP:_Research_Accelerator_for_Multiple_Processors).

About the LoD Program

SRI Consulting Business Intelligence's Learning-on-Demand (LoD) multiclient research program leverages the subscription fees of multiple clients to examine the evolution and features of the emerging technology-enabled learning marketplace, explore adoption issues, and define the components of effective workplace learning. The LoD multiclient program provides a cost-effective way to discover, evaluate, and implement LoD solutions that will yield high business payoffs by improving employee performance. The program benefits both LoD users and developers: potential LoD system users gain an unbiased source of information about LoD implementation, the benefits of LoD systems, and innovative LoD solutions emerging in the marketplace; LoD system developers receive information about the factors driving or constraining market demand for LoD systems.

For more information about the Learning-on-Demand multiclient research program, contact Eilif Trondsen, Program Director (etrondsen@sric-bi.com), 333 Ravenswood Avenue, Menlo Park, California 94025-3476; telephone +1 650 859 4600; or visit www.sric-bi.com.