New development within the FPGA area with focus on soft processors


Jonathan Blom
The University of Mälardalen
Langenbergsgatan 6 d, SE-722 17 Västerås
+46 707 32 52 78
jbm05005@student.mdh.se

Peter Nilsson
The University of Mälardalen
Brahegatan 4 b, SE-722 16 Västerås
+46 70 751 86 17
pnn05001@student.mdh.se

ABSTRACT

This paper compares the field programmable gate array (FPGA) with other systems such as the PC and the microprocessor, and looks at its advantages and disadvantages. It also discusses the different applications of FPGAs. Another central topic is a new area in FPGA development: creating a processor inside the FPGA chip to make a system on a chip (SoC). This means a complete system within one chip that can be programmed in a high-level programming language such as C. The paper ends by summarizing some new experimental research on FPGAs and soft processor development.

Keywords

FPGA, Soft CPU, Altera, Nios II.

1. INTRODUCTION

Not long ago we could only develop a software system on a PC or a hardware system on a chip. Today the boundary between software and hardware is becoming more and more diffuse. On a hardware system such as an FPGA, we can synthesize a soft processor and combine it with system-specific hardware. A soft processor is a processor created within the FPGA chip. We can then program this processor with complex designs in a high-level language such as C and combine that with the speed of hardware components.

This report starts with background information about FPGAs and their advantages and disadvantages compared with other systems. We then explain concepts such as hardware and software solutions and technical terms such as system on a chip (SoC). Embedded systems and soft processors are explained next. Finally, we cover the applications of FPGAs and the areas where FPGAs are developing today.

2. BACKGROUND

2.1 FPGA compared to other technologies

2.1.1 Microprocessors

Microprocessors can be used to solve a wide range of tasks. This is due to their ability to execute arbitrary code that follows a specified template, known as software. The price for this flexibility is that the code is executed sequentially, demanding high clock frequencies in order to perform complex, computationally heavy tasks within a specified time. A higher clock frequency comes at a higher price and consumes more power.

2.1.2 ASIC

An application-specific integrated circuit (ASIC) is an alternative to the microprocessor; the circuit is designed to solve a specific task. It has no programmable processor; instead it performs one specific task very fast. This is because the task can be divided into smaller tasks that run in parallel, which dramatically decreases the execution time compared to the microprocessor even if the ASIC runs at a lower clock frequency.

The drawback of ASICs is the way they are manufactured. First a design is created in a hardware description language (HDL). This description file is then sent to an ASIC manufacturer. The process of manufacturing the circuit from the description file can take several weeks or even months. Due to the complex manufacturing methods, small production volumes result in an unacceptable price per unit. If a bug is discovered in the finished circuits, a new circuit has to be produced and the old circuits have to be discarded. This causes a severe delay before the end product can be introduced on the market.

2.1.3 FPGA

In the 1980s Ross Freeman came up with a new business concept to make the ASIC market more flexible: manufacture a blank computer chip that the customer can program to solve a specific task, without involving the manufacturer. In 1984 Ross Freeman and Bernard Vonderschmitt founded Xilinx, and they released their first field programmable gate array (FPGA) in November 1985 [2]. The FPGA allows the customer to use an ASIC-like component for low-volume products, at a reasonable cost and with a short time to market.
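
To make the trade-off in Sections 2.1.1 and 2.1.2 concrete, the following minimal sketch compares a sequential processor with a parallel circuit running at a lower clock frequency. The operation count, the clock frequencies and the number of parallel units are hypothetical figures chosen for illustration, not measurements from any device. The model also assumes that the work divides evenly into independent operations that each take one clock cycle.

```python
import math


def sequential_time(num_ops: int, clock_hz: float) -> float:
    """Seconds needed when one operation completes per clock cycle."""
    return num_ops / clock_hz


def parallel_time(num_ops: int, clock_hz: float, units: int) -> float:
    """Seconds needed when `units` independent operations complete per cycle."""
    cycles = math.ceil(num_ops / units)
    return cycles / clock_hz


if __name__ == "__main__":
    NUM_OPS = 1_000_000   # hypothetical workload
    CPU_HZ = 1e9          # hypothetical 1 GHz microprocessor
    HW_HZ = 100e6         # hypothetical 100 MHz parallel circuit
    UNITS = 64            # hypothetical number of parallel units

    t_cpu = sequential_time(NUM_OPS, CPU_HZ)
    t_hw = parallel_time(NUM_OPS, HW_HZ, UNITS)
    print(f"sequential: {t_cpu * 1e6:.2f} us")
    print(f"parallel:   {t_hw * 1e6:.2f} us")
    print(f"speedup:    {t_cpu / t_hw:.1f}x")
```

With these assumed figures the parallel circuit finishes about 6.4 times sooner even though its clock is ten times slower. The advantage disappears if the task cannot be split into independent operations.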
2.2 Technical description

An FPGA consists of configurable logic blocks (CLBs) and a matrix of wires that connect the CLBs to each other and to the inputs/outputs. A CLB can in turn be programmed to imitate a variety of logical functions, from simple AND, OR and NAND to more complex arithmetic functions, depending on the specific construction. Most CLBs contain a lookup table, which is a small memory with n inputs and one output; the inputs decide which memory bit is read out on the output. The size of the memory is 2^n x 1 bits; these bits can be programmed to make the lookup table implement any n-input logical function [3]. As an alternative to the lookup table, a mux together with additional logic can be used. Besides the lookup table or mux, most CLBs also contain one or more flip-flops and some supporting logic. It is not uncommon for the CLBs in an FPGA to differ from each other; e.g. only 50% of the CLBs may contain the flip-flop that, for example, enables sequential functions, since it is seldom needed in all CLBs.

Between the CLBs there is a matrix of wires used to connect one CLB to another and to the inputs/outputs. When the FPGA is not programmed, none of the CLBs are connected to the wires. The desired connections are enabled by static random access memory (SRAM) cells, each connected to a switch; this combination makes a programmable switch. Together, these features allow the user to program simple or complex logical functions in the CLBs and then connect them, via the programmable switches, to create even more complex functions.

Figure 1. [3]. FPGA structure.

3. HARDWARE VERSUS SOFTWARE

Software has traditionally been seen as flexible while hardware is more static [1]. Earlier, scientists and engineers saw hardware and software as distinct entities with little in common in their design processes. Today this is no longer true; there is a much tighter connection between hardware and software. We can write a program in software, translate it into a hardware description language, and synthesize it to a target architecture on, for example, an FPGA.

Table 1. Comparison between software and hardware

                  Software                    Hardware
Parallelism       Little                      Massively parallel
Implementation    Machine code in a memory    Gates, flip-flops, memory
Platform          CPU, bus and memory         Gates and flip-flops (e.g. FPGA)

Table 1 shows that software runs with little parallelism while a hardware solution is parallel to a very high degree. This means that, as long as parallelism is possible, hardware is much faster. The software is implemented as machine code in the system's memory, while a hardware system is built from gates, flip-flops and memories. The software platform requires a processor that executes the software. It also requires a bus and memory so that the different parts of the system can communicate. The hardware solution resides on, for example, an FPGA, which is built up of flip-flops and gates. As we show later in this report, it is today possible to combine the best of both software and hardware on the same chip.

3.1 Moving parts from software to hardware

In an FPGA system it is easy to move functionality between hardware and software depending on the requirements for cost and performance [6]. This area, called co-design, is the subject of intensive research and development. Developing new hardware today can be compared to writing new software in machine code: it gets too hard when the design gets complex. It is much easier to develop in a high-level language such as C as complexity grows. Today, however, we can develop the application in C and then have parts of it automatically converted and synthesized in hardware for increased performance. This is a new area that is developing quickly. A function converted into hardware takes up more logic elements in the FPGA; because of this, functions that do not require much performance should be kept in software running on the soft CPU. Software can handle complex functions in a matter of microseconds, while hardware can handle less complex functions with much higher performance.

4. SYSTEM ON A CHIP

If you combine everything you need for an electronic system into one chip, it is referred to as a system on a chip (SoC) [1]. Such a system can be, for example, a cell phone or a digital camera. Instead of combining one CPU with external peripherals, you combine everything into one chip. Each SoC is optimized for its purpose and includes only the peripherals needed for that specific system. A system designed to handle sound might include an audio receiver, an analogue-to-digital converter, a microprocessor, memory, and the input and output logic for the user, all on the same chip. The input and output logic can be connected to, for example, a keyboard and a display. These small chips can have the processing power and memory size of a 10-year-old desktop computer.
With an FPGA it is easy to develop an SoC because you can create all the needed hardware directly in your FPGA chip. Such a system is called a system on a programmable chip (SOPC).

5. EMBEDDED COMPUTER SYSTEM

In an embedded computer system you have an application running on a platform [1]. The platform is the system's hardware and operating system. Smaller systems might not need an operating system. The application can be seen as the software that controls what the hardware is supposed to do.

5.1 Embedded FPGA systems

On the FPGA you can today easily synthesize exactly the parts you need for your system. For example, if your system needs a lot of memory you can access external memory, such as SRAM, by synthesizing an SRAM controller on your FPGA chip. This SRAM controller is then connected to the external memory pins using a standard interface. If we need to connect the system to a network we can synthesize a TCP/IP controller, and so on. Inside the FPGA, a CPU and a system-specific bus are synthesized. This enables all the different peripherals, such as the memory controller, to communicate with each other. This way you can design your own system in a matter of minutes, where just a few years ago you might have needed years.

5.2 Development of FPGA systems

Today the development software for FPGAs is much easier to use; it is drag and drop. You select the components you need in your FPGA system and the tool then synthesizes the chip. If, for example, we need to connect a monitor, we add a VGA controller. This can be done in several ways; the following are the three major ones.

5.2.1 Reuse component

The first way is to reuse a controller you have used before. If you already have a developed VGA controller you can easily reuse it in the new system. And because you have used it before, you know that it works and how to use it.

5.2.2 Buy component

A new way to add components has recently seen the light of day: buying finished components from other companies or developers. We are not talking about buying a physical chip; instead we buy the hardware description code needed to synthesize the component in our FPGA system. The hardware description code can be sent by e-mail or downloaded from the developer's homepage. This is a simple and fast way to get hold of new hardware for our FPGA system. Once again, because the component has been used before, it is well tested and documented. One important aspect of development is getting the product onto the market fast, before another company releases a similar product. Today it is easy to buy knowledge from other companies by buying components instead of developing them within the company.
This could even be cheaper than hiring a consultant for the development.

5.2.3 Develop component yourself

The third way is to develop the component yourself. This can be done by writing the component in a hardware description language such as VHDL. It has become much easier to connect such a new component to the rest of the system. The development software can automatically build a wrapper around the new component so that it can be connected to a standard bus. A wrapper is a way to give different components a standard interface. With this standard interface all the different components can be connected to the bus and integrated inside our FPGA chip.

6. SOFT PROCESSOR

A soft processor is a processor implemented in the FPGA [6]. It is synthesized in the FPGA just like any other hardware. The soft processor does not take up the entire FPGA chip, so there is still room for other hardware. We can therefore easily combine our software program with our application-specific hardware. This way we get the best of both worlds: the ease of programming a standard CPU together with the speed of pure hardware.

6.1 Normal CPU solution

Today it is simple to develop new software for a computer system with a standard CPU. Code can be written, reused and altered, and it can be copied between systems as long as there is a compiler that can produce the binary code for the specific hardware. The programming languages can easily be learned by developers.

6.2 Hardware solution

As just mentioned, writing code for a CPU is often easy. Writing VHDL for an FPGA quickly gets complex when the system grows. Even with new development tools it can be hard to write a whole system. The main advantage of an FPGA is its speed. For tasks that can be parallelized, an FPGA can be many times faster than a CPU.

6.3 Combine hardware and soft processor

The latest development is to build a soft processor in the FPGA and run software written in C on that processor.
This software does not run as fast as a pure hardware solution, but it runs at about the same speed as on a normal CPU. The main advantage is that it is much easier to develop C code than to solve the task entirely in hardware. So why not use only a hardware CPU? First of all, we can optimize our soft CPU for our task. Another big advantage is that we can combine the soft CPU with components built in VHDL; these components are of course very fast because they are implemented entirely in hardware. This way we can combine the two.

6.4 Development methods

A good way to develop a new system is to first write it completely in software and check whether the idea works. This shortens the development time. If the design works, we then check whether the performance is sufficient. If the system's performance is not good enough, we must optimize the system. The optimization is done by taking parts of the software and converting them into a hardware solution. The two main advantages of this are that the system gets much faster and that, if the CPU crashes, the hardware will still be running. For some soft CPUs there is even a function in the development tools that automatically converts a software solution into hardware. For its Nios II soft CPU, Altera has developed a C-to-Hardware Acceleration Compiler [10]. This software compiles C code directly to hardware without the user having to write a line of VHDL.

6.5 The Altera Nios II family

One of the soft processors on the market is made by Altera and is called Nios II. It has a general-purpose RISC processor core with a full 32-bit instruction set, data path and address space [9]. Altera has three different models in the Nios II family: Nios II/f, Nios II/s and Nios II/e.

6.5.1 Nios II/f

The most complete model is the Nios II/f ("fast") model. This model has all the functionality that the Nios II family has to offer. It is designed for fast performance. This core has the most configuration options, allowing the processor to be fine-tuned for performance.

6.5.2 Nios II/s

The middle version is the standard version, Nios II/s. It is designed for a smaller core size while retaining reasonable performance: on-chip logic and memory resources are conserved at the price of lower performance. The performance is about 40% lower than that of the fast edition, while the core is about 20% smaller. It is optimized for cost-sensitive, medium-performance applications. This model has an instruction cache but no data cache.

6.5.3 Nios II/e

Last, but not least, is the economy version, Nios II/e. It is slower than the two other models and has somewhat fewer features; many of the settings are not available in the economy version. It was designed to attain the smallest possible core size. It reduces resource utilization while maintaining compatibility with the Nios II instruction set. Hardware resources are reduced at the cost of performance. This model has neither an instruction cache nor a data cache.

You might think this version is just a cheaper and simpler version of its bigger brothers, but it has its own advantage: the core size. Every soft CPU core uses a lot of logic elements (LEs), and an FPGA has only a limited number of them, so it is often necessary to use a simpler soft CPU in order to fit more functions in hardware. The Nios II/e is about half the size of the more advanced models, and its performance is reduced by even more. The fast and standard versions of Nios II use somewhere around 1200 to 1800 LEs, while the economy version uses around 600 to 700 LEs. This can be a major factor when deciding which model to use. All models can access up to 2 gigabytes of external memory. The fast and standard editions are pipelined.

7. APPLICATIONS OF FPGA

When it is critical that a product hits the market quickly, using an FPGA instead of a regular ASIC can be preferable. This is because the production time for ASICs is long.
Also, even when an ASIC would be the optimal solution, it can be better to use an FPGA if the volumes are low, in order to reduce the cost per unit. If an ASIC is the final goal, the FPGA can be used for prototyping and debugging to verify the HDL code before manufacturing. This helps prevent bugs in the final product, since they surface when the design is tested on the FPGA.

If a task demands heavy computation, an FPGA can be of great use; one such task is the manipulation of video signals. The Swedish company Lyyn has produced a video enhancement product. The product processes video taken in unfavorable conditions, such as fog or a snowstorm, in real time to produce clear images. This is a computationally heavy task, which they have implemented on an FPGA [4]. Choosing an FPGA over an ASIC for the implementation also helps them protect the central algorithm of the product against black-market copies. This is because an FPGA, if correctly encrypted, is immune to the methods used to copy ASICs, for example X-ray. Other application areas for FPGAs include audio manipulation, sorting and searching algorithms, and random number generation, among many others.

A new and developing area for FPGAs today is soft processors: a microprocessor integrated as a component in an FPGA. Such a processor can be modified to fit a specific application. Unnecessary parts can be excluded and new parts integrated, and the basic architecture can be modified. Simulation tools can be used to locate recurring parts of the software code that can be implemented in hardware to speed up execution. Researchers in this area are developing soft processors that autonomously perform this procedure during execution [6]. More information about this area and about soft processors can be found in Sections 6 and 8.

Another interesting area is using smaller and cheaper FPGAs to replace more expensive microcontrollers, and larger FPGAs, in tasks that are not deeply parallel.
The system is constructed from two small FPGAs, a memory, and a sequencer that shifts execution between the two FPGAs and reprograms the non-executing FPGA with the next code sequence to be executed [7].

8. DEVELOPING AREAS AND RESEARCH

8.1 Customization of FPGA soft processors

In [5] the authors explore the ability to optimize a soft processor for a specific application by tuning several parameters of the processor. The tool used to create these custom soft processors is SPREE, which lets the user specify the processor in a high-level, text-based description and then generates it in Verilog, an HDL. The performance of the generated soft processor is then tested in a simulation tool, and other figures such as power consumption and clock speed are obtained from a CAD tool. The processor parameters considered are the logical shifter, multiplication support, pipeline depth, pipeline organization and forwarding. Processors with various combinations of these are created and tested by executing different benchmark programs, in an attempt to find the processor that is optimal for each benchmark. The best processors for the different benchmarks are then compared to the commercially available Nios II soft processors, which have also been tuned for the specific application. Averaged over all benchmarks in the test, the SPREE-generated processors are 14.1% better in performance per unit area.

Next the authors optimize the processor's instruction set architecture (ISA). The SPREE tool is used to determine which instructions the different benchmarks use, and the unused instructions and their associated logic are removed. This optimization alone gives on average 16.2% better performance per area; combined with the previous optimization, the gain is 8.6% in performance per area. It is worth noting that some benchmarks achieve their best performance per area with only one of the two optimization methods in use. The total improvement using both techniques is on average 24.5%.
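
The instruction-set subsetting step described above can be sketched as follows. This is only an illustration of the idea, not the SPREE tool or its interface: the instruction names, the `FULL_ISA` set and the `subset_isa` helper are hypothetical, and a real flow would also remove the hardware associated with each unused instruction.

```python
# Hypothetical instruction set of a small soft processor (illustrative names).
FULL_ISA = frozenset(
    {"add", "sub", "mul", "div", "sll", "srl", "lw", "sw", "beq", "jal"}
)


def subset_isa(trace, full_isa=FULL_ISA):
    """Split the ISA into instructions a benchmark uses and ones it never uses.

    `trace` is an iterable of instruction mnemonics taken from the benchmark.
    Returns (used, removable) as two sets.
    """
    used = set(trace)
    unknown = used - full_isa
    if unknown:
        # The benchmark contains something the processor cannot execute.
        raise ValueError(f"instructions not in the ISA: {sorted(unknown)}")
    removable = set(full_isa) - used
    return used, removable


if __name__ == "__main__":
    # Hypothetical instruction trace of one benchmark.
    benchmark_trace = ["lw", "add", "add", "sw", "beq", "lw", "add"]
    used, removable = subset_isa(benchmark_trace)
    print("kept:   ", sorted(used))
    print("removed:", sorted(removable))
```

In this assumed example the benchmark never multiplies, divides or shifts, so the corresponding units could be left out of the generated processor, which is the source of the area saving.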

Figure 2. [5]. Performance-per-area of tuning, subsetting, and their combination.

8.2 Dynamic converting of software to hardware

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

One of the strengths of an FPGA today, when replacing a hard-core processor with a soft processor, is the possibility of converting repetitive parts of the code to hardware. Today this is done by the compilation tool before the bitstream is downloaded to the FPGA; the authors of this paper [6] explore the ability to perform this warping on chip during execution. The method uses a MicroBlaze soft processor. When execution starts, a component called the profiler monitors the addresses accessed on the instruction bus; whenever a backward branch occurs, it is recorded in a memory. This information is then used to evaluate which parts of the code can be successfully transformed to hardware. That section of machine code is then compiled into hardware code that is mapped, placed and routed to create a bitstream that can be downloaded to the FPGA's warp configurable logic architecture (WCLA). The software code is then updated to move the execution of the warped code into hardware.

The speedup this technique achieves is highly application dependent, giving significant speedup on applications with compact code kernels. The average speedup of the warped MicroBlaze compared to the original MicroBlaze is 360%, and the average energy saving is 49%. A selection of hard microprocessors with higher clock frequencies is also compared to the low-frequency MicroBlaze soft processor; with the warping technique it can, for some applications, be used as an alternative to these.

8.3 Hardware context switching

Executing large algorithms on low-capacity FPGAs using flowpath partitioning and runtime reconfiguration

In this paper [7] the authors develop a method to compile algorithms written in Java to VHDL flow paths. Such a flow path can then be executed on an FPGA, if the FPGA is large enough. To be able to execute a large algorithm on a smaller FPGA, a partitioning algorithm is developed that partitions the flow path into atomic units. An atomic unit is the smallest part of the code that must execute, in hardware, during one clock cycle. The size of the largest atomic unit then limits the size of the smallest FPGA that can be used to execute the flow path of the algorithm. Once the flow path has been partitioned into smaller pieces, the execution can be performed by a smaller FPGA: to execute the entire flow path, all one needs to do is save the computed data, reconfigure the FPGA with the subsequent atomic unit(s) and continue the execution.

To keep the execution as continuous as possible, the authors use a system consisting of two FPGAs that share the workload: while one is executing, the other is programmed with new atomic unit(s). The process is controlled by a sequencer that reprograms the non-executing FPGA and also gives the executing FPGA access to the memory. The two FPGAs are connected to each other so that variables used within the flow path can be transferred when execution is handed over to the other FPGA.

Figure 3. [7]. A system for executing flow paths generated by large programs using two low-capacity FPGAs, a sequencer, data RAM, and a ROM for bit files.

The setback encountered is the time it takes to reconfigure the FPGAs available today, which produces a delay: the executing FPGA has completed its execution while the non-executing FPGA is still being reconfigured. The authors have calculated how the system could perform if the FPGA used in the experiment could be reconfigured 32 bits at a time at a clock speed of 100 MHz. The time it would take for the system to execute the example program would then be 2.58 seconds, compared to 1 second for the far more complex and expensive JStamp microcontroller; the actual execution time was 20.64 seconds.

9. CONCLUSION

We can see strong development in the area of soft processors. FPGA systems have a strong future ahead and may well take over from other systems. FPGAs are becoming more and more flexible and dynamic, and can now be used to solve much more complex functions without making the development too complex. The combination of a soft CPU and pure hardware can lead to the development of advanced and fast new systems. One example is hardware context switching, provided the reconfiguration overhead can be reduced.

10. REFERENCES

[1] Lindh, L.; Klevin, T., HW/SW Embedded System Design with FPGA Technology, AGSTU AB, 2008.
[2] Grant, Tina, International Directory of Company Histories, Vol. 16, St. James, 1997, ISBN 1558622195.
[3] Brown, S.; Rose, J., "FPGA and CPLD architectures: a tutorial," Design & Test of Computers, IEEE, vol. 13, no. 2, pp. 42-57, Summer 1996.
[4] www.lyyn.com/visibility/qa.html Accessed 2008-10-20.
[5] Yiannacouras, P.; Steffan, J. G.; Rose, J., "Exploration and Customization of FPGA-Based Soft Processors," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 26, no. 2, pp. 266-277, Feb. 2007.
[6] Lysecky, R.; Vahid, F., "A study of the speedups and competitiveness of FPGA soft processor cores using dynamic hardware/software partitioning," Design, Automation and Test in Europe, 2005. Proceedings, Vol. 1, pp. 18-23, 7-11 March 2005.
[7] Hanna, D. M.; DuChene, M., "Executing large algorithms on low-capacity FPGAs using flowpath partitioning and runtime reconfiguration," Microprocess. Microsyst. 31, 5 (Aug. 2007), pp. 302-312.
[8] Reasoning about naming systems. ACM Trans. Program. Lang. Syst. 15, 5 (Nov. 1993), pp. 795-825.
[9] Altera, Nios II Processor Reference Handbook, Rev 8.0, 2008.
[10] Altera, Automated Generation of Hardware Accelerators With Direct Memory Access From ANSI/ISO Standard C Functions, 2006.