FET Proactive initiative: Advanced Computing Architectures

Terms of Reference

1 INTRODUCTION

Fuelled by Moore's law, progress in electronics continues at an unabated pace, permitting the design of devices with ever smaller size, lower cost and lower power consumption while simultaneously delivering a steady improvement in performance. Yesterday's supercomputers have thus become today's commodity processors, embedded in an ever growing number of digital devices.

Just as during the early days of computing, much of the recent innovation in computing has been driven by the needs of network services: telephone services then, the Internet and wireless communication today. The emergence of always accessible, high-bandwidth networks is changing the expectations of users and enabling new classes of services. On the back end of the network, sophisticated data centres have emerged that maintain repositories of data and provide a wide variety of services to their users, from storage through searching and commerce to data mining. These services are driving the need for reliable high-performance computing, often through techniques that take advantage of the new network-centric world view. A good example is the emergence of cluster and grid computing technologies, the modern form of parallel distributed processing.

While the back end of the network has seen a lot of innovation in recent years, the pace of evolution at the front of the network (the devices that connect users to the network) has probably been even faster. Wireless technologies have created a breed of consumer devices that has been driving innovation in computing. Not only are these devices small and mobile, but their users demand applications on them that only a few years ago were in the realm of workstations and supercomputers. Computer designers are asked for the almost impossible: devices with tremendous processing power, very low power consumption and low cost. These requirements necessitate substantial new research efforts, as well as standardisation and open platforms, to enable the next generation of high-value services on the network and to enable innovation.

As our future is poised to become even more reliant on computer technologies than it is today, the coming years will call for continued innovation at all levels. An emerging class of intelligent sensors will increase the number of networked devices by many orders of magnitude and will also drive innovation at the back end through the need to make sense of the data these sensors collect. These developments show that future computing architectures will need to deliver a step-function improvement in performance and power efficiency if the vision of a connected world is to be fulfilled.

2 AIMS OF THE CALL

So far, performance increases of electronic devices have primarily been achieved through semiconductor technology shrinking in combination with architectural enhancements. However, speed gains from technology shrinks are likely to slow down soon, and the straightforward architectural gains now produce only diminishing returns. New methods are required to fuel progress in semiconductor logic performance at low cost and low power dissipation. These are likely to emerge by adding specialised features to existing processors and, most importantly, from parallel processing, whether generic or specialised, chip-level or cluster-level.

However, not all solutions are created equal: as the vast majority of computing devices go into the cost-competitive market of consumer devices, development and manufacturing costs must be kept in check. Not only is the design complexity of embedded systems going up, but so is the cost of launching production of ASICs using the latest generation of technologies. This means that state-of-the-art manufacturing is economical only at increasingly high production volumes, which puts more emphasis on post-manufacture specialisation for the needs of the individual customer. As a result, methods for enhancing (re)programmability, reconfigurability and reusability of devices, thereby enlarging the potential set of customers, will become a key issue.

Under these constraints it becomes important to develop highly effective computing architectures that are applicable across a wide range of application areas. Product differentiation will then be realised using a combination of software and reconfigurable hardware technologies. Progress in this domain hinges both on programming techniques and on programmable architectures that support fast system design, debugging, verification and testing. The long-term stability of platforms and programming models for re-use of application software has been a key enabler of personal computers. A similar stability, yet with the ability to fine-tune for applications and the target environment's requirements, will be a key development issue for the emerging embedded applications.

The aim of this programme is to develop novel advanced computing architectures, methods, tools and intellectual property that will:

- Substantially increase the performance of computing engines (processors and scalable systems made of multiple processors) well beyond the projected performance of Moore's law (e.g. by two orders of magnitude), while reducing their power consumption (a back-of-the-envelope illustration follows this list).
- Provide leading compiler and operating system technology that will deliver high performance and efficient code optimisation, including just-in-time compilation, and that will be portable across a wide range of target systems.
- Constitute building blocks that can be combined with each other and programmed easily and efficiently, even in heterogeneous processing platforms.
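As a back-of-the-envelope illustration of the first aim (the figures below are assumptions chosen for the example, not targets set by this call), Amdahl's law bounds the speedup S obtainable by parallelising a fraction p of a workload across N processors:

    S(N) = \frac{1}{(1 - p) + p/N} \;\le\; \frac{1}{1 - p}

Reaching S = 100 therefore requires a serial fraction 1 - p of at most 1%; even with p = 0.99, N = 256 processors yield only S(256) = 1/(0.01 + 0.99/256) \approx 72. This is why the performance aim above cannot be met by parallel hardware alone: it also demands the compiler, operating system and programming-model work described in the other two aims.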

3 RESEARCH CHALLENGES AND RESEARCH THEMES

3.1 CHALLENGES

SCALABLE PERFORMANCE

Current architectural designs will not support sustainable scaling across future sub-micron technology nodes because of a variety of technological challenges. A radical rethink is needed of how a processor architecture is constructed, and of how larger systems are built out of combinations of such processors. Increasing the performance of today's computing systems substantially beyond the projected performance of Moore's law (e.g. by two orders of magnitude) is a grand challenge, as no current architecture can scale to such performance levels.

LOW POWER / ENERGY EFFICIENT

Power consumption will be a key issue for future systems composed of heterogeneous components, in both mobile and embedded devices. Further, in stationary high-performance devices, power dissipation density levels are increasing super-linearly with shrinking silicon technologies. Reducing power by an order of magnitude while increasing computing performance is therefore a grand challenge.

SYSTEM PERFORMANCE

Increased peak performance of a computer architecture often does not lead to increased application performance. This gap between peak and actual performance may be due to the intervening mapping layers (compiler and operating system), but probably also to the unsuitability of the architecture to the application requirements (a quantitative sketch of this gap follows at the end of this subsection). Developing technology that delivers portable, timely optimisation across a range of future architectures is a grand challenge.

DESIGN, ARCHITECTURE AND PROGRAMMABILITY

Today the cost of designing ICs, ASICs, FPGAs, embedded Systems-on-Chip (SoC) and hardware-dependent software is one of the major concerns of the industry. Huge efforts are expended to increase design productivity while keeping costs under control. At the same time, efficient programming of parallel and often heterogeneous processors and other components in a given platform remains one of the major unsolved problems in computer science. A leap in the efficiency of designing and programming heterogeneous processors and platforms is required. The challenge here is to develop stable application interfaces that will benefit from technology scaling, and system development environments that are long-lasting.
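To make the SYSTEM PERFORMANCE challenge concrete, a simple bandwidth-bound model illustrates the peak-versus-actual gap (the figures are assumptions chosen for the example). If a kernel performs I operations per byte of memory traffic (its arithmetic intensity), a machine with peak rate P_peak and memory bandwidth B can sustain at most

    P_{app} = \min\left(P_{peak},\; B \cdot I\right)

For a streaming kernel with I = 0.25 operations/byte on a machine with P_peak = 10 GFLOP/s and B = 4 GB/s, P_app = 1 GFLOP/s: only 10% of peak, however good the architecture looks on paper.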

3.2 RESEARCH THEMES

SCALABLE PROCESSOR ARCHITECTURES

Processor performance gains of the last 20 years have mainly been obtained from the reduced switching delays of smaller gates and the reduced transmission delays of shorter links. However, transmission delays may not scale down beyond the 45 nm node and will come to dominate overall IC performance limitations. Performance improvements from silicon technology alone are therefore going to prove more difficult to reach. Other performance gains were obtained at the silicon level through wider word sizes and pipelining, but these are also reaching their limits. Further gains require the detection and exploitation of parallelism in both software and hardware. While substantial gains in microprocessor performance have already been reached through the exploitation of instruction-level parallelism, current implementations are now reaching their practical limits. Research in thread-level parallelism brings new promise of high returns for data-intensive applications run on suitable parallel architectures, but this approach requires rethinking current algorithms, programming models and architectural designs (a minimal sketch follows at the end of this theme).

Designing architectures that are scalable over a wide range of performance and power-efficiency levels is a great challenge. A possible way forward is a more formal separation of architectures for control-plane and data-plane operation. A taxonomy can be drawn in which the control processor needs a stable environment, as it is mostly responsible for running the operating system and user applications, which evolve relatively slowly. The data processor, on the other side, could become more open to possible innovations, since it is computationally more demanding and changing more rapidly. This type of separation has long been used in telecom systems and can be observed in most modern mobile phone chip sets. In desktop PCs, the central processor executes the control threads and provides a consistent execution environment, while graphics cards carry out the data-intensive operations and are evolving much faster. Architectural scalability needs to be investigated consistently at the levels of data processing, control processing, communication and compilation.

With ever larger and more complex chips, chip-wide interconnection, communication and synchronisation become growing concerns. Chip-area networks (also referred to as Networks-on-Chip, NoC) are needed. Issues such as how to partition memory resources and how to maintain quality of service need to be addressed. Moreover, multi-chip systems need novel, more efficient processor-network interfaces and interconnection networks. Modular, grid or tile architectures, in which small units designed to be scalable are pieced together, offer one avenue of research for achieving scalability towards large-scale networks. Equally challenging is the design of memory and storage systems that can deliver data at a sufficient rate. As the gap between processor and memory speed inexorably widens, fundamentally new ways of designing memory systems, as well as processor-memory communication, memory management and cache control, need to be explored.

The research themes include:

- Processor architectures: low-power, low-cost or high-performance processors; application-oriented processors (embedded computing, multimedia, networking, wireless, etc.), including programmability and reconfigurability.
- Scalable system architectures with multiple processors: cluster, SMP, chip multiprocessor and tiled architectures; storage and interconnection architectures; high-performance embedded computing architectures.
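As a minimal sketch of the thread-level parallelism discussed above, assuming POSIX threads and an arbitrary four-way split (neither is prescribed by this call), the following C program decomposes a data-intensive kernel, a dot product, into threads and combines their partial results serially:

/* Illustrative sketch only: thread-level parallelism applied to a
 * data-intensive kernel (dot product), using POSIX threads.
 * The 4-thread count and array size are arbitrary example values. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N_THREADS 4
#define N_ELEMS   (1 << 20)

static double a[N_ELEMS], b[N_ELEMS];

struct slice { size_t begin, end; double partial; };

/* Each thread reduces its own slice; no shared mutable state,
 * so no locks are needed until the final (serial) combination. */
static void *dot_slice(void *arg)
{
    struct slice *s = arg;
    double sum = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        sum += a[i] * b[i];
    s->partial = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    struct slice sl[N_THREADS];
    size_t chunk = N_ELEMS / N_THREADS;

    for (size_t i = 0; i < N_ELEMS; i++) { a[i] = 1.0; b[i] = 2.0; }

    for (int t = 0; t < N_THREADS; t++) {
        sl[t].begin = t * chunk;
        sl[t].end   = (t == N_THREADS - 1) ? N_ELEMS : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, dot_slice, &sl[t]);
    }

    double total = 0.0;
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += sl[t].partial;  /* serial combine: the Amdahl fraction */
    }
    printf("dot = %f\n", total);  /* expect 2.0 * N_ELEMS */
    return 0;
}

Built with cc -pthread, the final combination loop is precisely the serial fraction that limits scalability under Amdahl's law (see the illustration in Section 2).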

LOW POWER DESIGNS

Power will be the main constraining resource in future embedded systems, and semiconductor technology scaling will no longer provide major power savings. Very promising new solutions may instead be derived by assessing the impact of any architecture or technology change on power consumption, identifying the areas where power savings could be realised, exploring a wide range of avenues for power reduction, and addressing the overall optimisation at system level. Among the more serious future sources of power loss, one can identify high-frequency clocking, gate leakage, speculative execution and cache loading, memory accesses in general, and software and system inefficiencies. Direct measures include specific low-power silicon technology, multi-clock circuits, asynchronous chips, on-chip power management, power-aware compilation, architecture and system design, and parallel circuits with lower operating frequencies (a worked illustration of this last measure follows).

Inefficient power utilisation at the system level often results from the difficulty of devising power-saving measures across layered software, from the kernel through the operating system, middleware and configware to the application. Measures include compiler-directed power optimisation and dynamic power management. In a number of cases, power efficiency will go hand-in-hand with performance increases, improving the ratio between application-level performance and power consumption as well as aggregate intrinsic performance and power efficiency at the gate level.
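As a worked illustration of that last direct measure, parallel circuits with lower operating frequencies exploit the standard CMOS dynamic-power model (the voltage figure below is an assumption for the example):

    P_{dyn} \approx \alpha C V^2 f

where \alpha is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Replacing one core running at (V, f) with two cores at frequency f/2, assuming the supply can then be lowered to 0.6V, keeps the aggregate throughput roughly unchanged (2 \times f/2 = f) while dynamic power drops to

    P' = 2 \cdot \alpha C (0.6V)^2 \cdot \frac{f}{2} = 0.36\, \alpha C V^2 f

a saving of roughly 64%, because voltage enters the model quadratically.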

RETARGETABLE OPTIMISATION

Compiler development time will be a major restriction on the time-to-market of future embedded systems. Compilation approaches that map an application onto an architecture are required, such that adaptation to specific implementations is straightforward. Performance criteria include speed, power and size. As more and more applications are multi-threaded, the partitioning of functionality becomes a key issue. Next-generation compilers will have to work at system level: automatically detect parallelism at application level, decompose the application into threads, and spread these threads over multiprocessors based on system-level, communication-aware cost metrics. Research themes include retargetable optimisation, compilation for multi-core systems, generation of code with guaranteed security properties, automated compiler generation, architecture and operating system cross-optimisation, architecture-aware compilation, and optimisation of high-level languages for embedded systems.

SYSTEM ARCHITECTURAL TOOLS

To research and develop new architecture and compiler designs, we need new concepts and design methodologies, as well as a complete set of tools for the heterogeneous parallel design of highly complex computing architectures. We need tools that permit system design to cut across the boundaries between software, architectural and microarchitectural aspects; tools that span specification and modelling through to verification and implementation, and that deal with parallel or distributed hardware, software or hybrid systems. This calls in particular for tools that allow system-level design through abstraction; retargetable tools and compilers for a range of computing platforms; design-exploration tools; and power/performance estimation tools. To address performance prediction successfully, software must be developed long before the corresponding hardware platform becomes available. This calls for fast simulation platforms and for portability of code, tools and applications to new platforms. Finally, we need development tools that make it easy to program (general-purpose) computing platforms so that they are best tuned to the computational resources required by a specific application and achieve program efficiency.

OPERATING SYSTEMS AND EXECUTION ENVIRONMENTS

Highly flexible operating systems are needed that will provide a unified programming model for computing systems at different scales, as well as across different heterogeneous subsystems. They should address runtime configuration and support, efficient power management, real-time system operation, reliability and dependability, dynamic workload distribution, and scalable distributed OS implementation.

OTHER RESEARCH TOPICS

Dependable and secure computing architectures: major requirements for future architectures are that they should detect errors and take measures to tolerate faults in every part of the processor architecture (from the data and control paths to the memory path), but also at sub-system and system level. A set of methods for fault prevention and fault tolerance is required to cope with different fault and failure models (a minimal sketch of one classic measure follows). Hardware hooks for achieving security against unauthorised access (intrusion avoidance and detection) and against denial-of-service (DoS) attacks are also needed.
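As a minimal sketch of one classic fault-tolerance measure for the dependable-architectures topic above, the following C program shows triple modular redundancy (TMR) with a bitwise majority voter; the replicated function and the injected single-bit fault are illustrative assumptions, not a prescription of this call:

/* Illustrative sketch only: triple modular redundancy (TMR).
 * The same computation is carried out by three independent replicas;
 * a bitwise majority voter masks any single faulty result. */
#include <stdint.h>
#include <stdio.h>

/* Bitwise majority: each result bit is taken from at least two replicas,
 * so one arbitrary single-replica corruption is masked. */
static uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* Stand-in for a replicated functional unit; in real hardware these
 * would be three physically independent circuits, not three calls. */
static uint32_t replica(uint32_t x) { return x * x + 1; }

int main(void)
{
    uint32_t r1 = replica(7);
    uint32_t r2 = replica(7) ^ 0x40;  /* inject a single-bit fault */
    uint32_t r3 = replica(7);
    printf("voted = %u (expected %u)\n", vote3(r1, r2, r3), replica(7));
    return 0;
}

The same voting structure recurs at sub-system and system level, for example across replicated processors or memory paths.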

4 S&T COMMUNITIES ADDRESSED

Each country throughout Europe has individual internationally recognised experts and small teams performing high-quality research in one or more of the main contributing fields, i.e. architectures, compilers, operating systems, and system design tools and methods. There is also a wide range of successful industries in these fields throughout the EU. This dispersal has led to a rich and diverse range of expertise. However, advanced computer architecture research increasingly needs large resources to make headway, both in terms of computing power for simulation and person-power to investigate the myriad design trade-offs. The most successful academic groups and companies are those able to sustain a large research staff, and these are currently based in the US. For Europe, the only real way to compete is to provide (financial and other) support for bringing groups closer together and combining their strengths, thus forming a critical mass of excellence that will provide solid support to industry.

The initiative is expected to mobilise key research stakeholders. Participation from industry is required in order to address research directions that have the potential to provide the required application breakthroughs (ranging from tiny embedded or wireless systems to large internetworked server-based systems) over a mid- to long-term horizon.

5 CHARACTERISTICS OF SUCCESSFUL PROPOSALS

To capitalise on a critical mass of efforts at the European scale, the programme will be implemented through a set of Integrated Projects and (possibly) a Network of Excellence. Proposers should carefully read the documents related to FET proactive initiatives (http://www.cordis.lu/ist/fet/int-p.htm) and the description of the IP and NoE instruments, both in general terms (http://europa.eu.int/comm/research/fp6/instruments_en.html) and in the frame of FET (http://www.cordis.lu/ist/fet/int-n.htm for NoE).

Integrated Projects (IPs): IPs would focus on the investigation of generic emerging computing architectures addressing the grand challenges identified in Section 3 above. IPs should have a clear set of measurable and ambitious targets and be motivated by projected industrial requirements covering a broad range of application scenarios. They should define their target systems and application-linked benchmarks with which to assess their performance. They should be focused around a coherent set of research themes among those listed in Section 3 above. Each IP would normally assemble multidisciplinary teams to provide integrated solutions, from architecture design through the development of prototypes, compilers and other tools, to their demonstration in specific emerging application domains. Such application domains will be defined by the community, with industrial partners having a particularly important role here: they should ensure that the research outcomes address the real issues in future applications. The actual research direction for each IP is not prescribed; it should be open to the best ideas from the research community. IPs could work on competing alternatives, or they may address complementary topics.

Network of Excellence (NoE): co-operation across and beyond that carried out in the IPs could be organised through a Network of Excellence. The NoE would aim at grouping the best competencies available in Europe. It could create a basic joint research programme of activities at a number of research centres that have decided to converge their activities on a long-term basis. The main aim is to spread excellence by bringing together the broader community active in embedded and networked computing architectures, in order to provide a framework of co-ordination for research, training and related activities, and to allow the progressive and lasting integration of these activities around a set of pre-specified themes. It should help achieve the necessary critical mass of researchers and invest in future knowledge building and dissemination. Scientific exchange and interaction will allow rapid dissemination of the best ideas. The NoE should also capitalise on the inevitable technical overlap between the vertical Integrated Projects, allowing ideas and resources to be shared and cross-fertilised between traditionally separate domains. In this respect, it should include in its joint research activities support to the IPs for the development of agreed sets of performance testing and evaluation benchmarks. Finally, through specific calls for small research grants, the NoE could encourage the best research that may not fit into any specific application area, as speculative work can often have a real impact.