Benchmark hardware support for virtual machines

Master thesis report

Abstract

This report identifies and investigates the performance gain that hardware acceleration techniques (VFPv3, SMP, NEON, and Thumb2) can give a virtual machine, with or without the use of a JIT. The best performance gain is obtained when a combination of techniques is used (SMP support together with a VM compiled with VFPv3/NEON and using a JIT that can produce VFPv3 instructions). The impact on memory usage and CPU load is minimal when using these techniques. Executed programs must use the parts of the VM that have been affected by the acceleration techniques to gain any performance boost. Continued testing in the area of hardware accelerated VMs is of great interest.

Authors: Håkan Ternby, Eskil Algéus
Supervisor: Peter Andersson, Supervisor at ST-Ericsson
Henrik Svensson, Manager at ST-Ericsson
Jonas Skeppstedt, Examiner from,

Project division of labor

The work in this project was divided as follows. Eskil investigated and tested how the hardware acceleration techniques affected performance with and without the JIT. He also investigated SMP support and how memory and CPU load were affected by the techniques. Håkan investigated and tested Thumb2 and VFPv3/NEON. Håkan also investigated the VFP in more detail by examining code, and he measured CPU and memory load. Both of us read through all the reference material. For the report, each of us wrote the parts we had worked on and had the other person proofread them.

Acknowledgement

We would like to thank ST-Ericsson for contributing the workspace and test equipment used in this project. We also want to thank (in alphabetical order): Andersson, Peter (our supervisor at ST-Ericsson, for guidance in this project and on this report), Fagerstedt, Axel (ST-Ericsson, for debugging and code help), Nilsson, Anders (ST-Ericsson, for help with code and programs), Skeppstedt, Jonas (our supervisor at CS-LTH, for input on this report), Strand, Henrik (ST-Ericsson, for helping out with hardware and questions), Svensson, Henrik (ST-Ericsson, our manager in this project), and the rest of the team under Henrik Svensson at ST-Ericsson.

Contents

LIST OF FIGURES
LIST OF DIAGRAMS
1 INTRODUCTION
BACKGROUND AND PURPOSE
PROJECT DELIMITATIONS AND LIMITATIONS
PROJECT SCOPE
REPORT OUTLINE
THEORY
BACKGROUND
THE VIRTUAL MACHINE
  JIT (Just In Time) compiler
JAVA
  The J2ME platform
  The CLDC version of J2ME
  The CDC version of J2ME
THE ARM PLATFORM
  The ARM Architecture
  Cortex Family
  Multi-Processing Core
ARM PLATFORM EXTENSIONS
  Thumb
  Thumb2
  Thumb2EE
  Jazelle
  Jazelle RCT
  Jazelle DBX
  Vector Floating Point extension
  Single Instruction, Multiple Data
  Neon
BENCHMARK PROGRAMS
  Grinderbench
  SciMark 2.0
  The Monte Carlo integration
  Successive Over Relaxation
METHODS
WORK PLAN
SYSTEM SETUP
  The Virtual Machine
  Compiler flags
RUNNING TESTS AND OBTAINING RESULTS
  Test result evaluation methods
RESULTS AND DISCUSSION
HARDWARE ACCELERATION TECHNIQUES COMPARISON
  JIT_OFF
  JIT_ON
  JIT_HW_FP
  Hardware acceleration techniques discussion
  JIT_OFF vs. JIT_ON vs. JIT_HW_FP
  JIT discussion

4.2 SMP SUPPORT
  SMP discussion
INSTRUCTION SET COMPARISON AND PERFORMANCE
  Thumb
  SWP instruction
  Thumb2 discussion
  Jazelle discussion
  Vector Floating Point
  Tracing the mathematical Java method arcsine
  Modified SciMark2 SOR method with Java method arcsine
  Code comparison from e_asin.o with and without VFP
  JIT_ON and JIT_HW_FP code comparison
  VFP discussion
  Neon discussion
  CPU/Memory usage
  SMP discussion
  General findings and discussion
CONCLUSION
CREDIBILITY ANALYSIS
FUTURE WORK

List of Figures

FIGURE 1: DIFFERENCES IN CLDC AND CDC [14]
FIGURE 2: THE ARM ARCHITECTURE EXTENSIONS FOR DIFFERENT ARCHITECTURE VERSIONS. [23]
FIGURE 3: SPEED VERSUS POWER CONSUMPTION CHART OF THE CORTEX A9 MPCORE. [27]
FIGURE 4: PERFORMANCE VERSUS CODE DENSITY COMPARISON OF THREE INSTRUCTION SETS. [34]
FIGURE 5: THE DIFFERENT DECODING STAGES. [39]

List of Diagrams

DIAGRAM 1: COMPARISON OF TECHNIQUES WITH JIT_OFF
DIAGRAM 2: COMPARISON OF TECHNIQUES WITH JIT_ON
DIAGRAM 3: COMPARISON OF TECHNIQUES WITH JIT_HW_FP
DIAGRAM 4: COMPARISON OF JIT_OFF, JIT_ON AND JIT_HW_FP
DIAGRAM 5: SMP COMPARISON ON SCIMARK
DIAGRAM 6: SMP COMPARISON ON GRINDERBENCH
DIAGRAM 7: CVM SIZE COMPARISON WITH AND WITHOUT THUMB
DIAGRAM 8: PERFORMANCE COMPARISON OF THE MODIFIED SOR METHODS
DIAGRAM 9: CPU LOAD WITH JIT_OFF AND NO_SMP ON SCIMARK
DIAGRAM 10: CPU LOAD WITH JIT_OFF AND SMP ON SCIMARK
DIAGRAM 11: CPU LOAD WITH JIT_OFF AND NO_SMP ON GRINDERBENCH
DIAGRAM 12: CPU LOAD WITH JIT_OFF AND SMP ON GRINDERBENCH

1 Introduction

An interesting task is to investigate the benefits of hardware acceleration for a virtual machine (VM) and to see what performance gain can be made. The use of hardware acceleration techniques can make the difference between a good and a bad user experience when running Java programs on a mobile unit. Hardware acceleration helps to speed up execution and thus lower the power consumption. The VM's execution speed is directly related to limiting factors of the mobile unit such as memory size, battery capacity, processor speed, and cost requirements [1]. Speeding up the VM is also interesting from a company point of view, as Java is the leading mobile application environment and has the highest penetration of devices worldwide with the most available applications [2].

1.1 Background and purpose

ST-Ericsson is a global leader in wireless technologies. As such, it is crucial for the company to investigate new techniques. One area of interest is the VM and the hardware acceleration techniques available for it. ST-Ericsson decided to start this project as a master thesis to identify and investigate available hardware acceleration techniques for VMs. The overall purpose is to find out whether the VM gains any performance from hardware acceleration techniques compared with not using any. The work and testing were done on site at ST-Ericsson in Lund, Sweden. ST-Ericsson helped out by providing the target platform, the workspace, and personal support such as setting up user accounts, answering questions, and giving guidance along the way.

1.2 Project delimitations and limitations

The research in this project does not take into consideration the economic costs and benefits of using these hardware acceleration techniques. This project also does not test the techniques on different VMs, nor does it investigate the possibility of doing so. Performance tests are run on only one hardware target platform, as the goal is to compare the different hardware techniques against each other under the same hardware circumstances. A limitation of this project is that not all hardware/software solutions can be tested, as some of the solutions may be under license or may not be supported by the software or the target platform. Furthermore, the results of the benchmark programs are not valid on systems running other kernels or when other applications are running simultaneously with the VM. For system setups other than the one used in this project, the results can only serve as rough guidance.

1.3 Project scope

The idea is to test hardware solutions that speed up the VM and to compare whether these hardware solutions have an impact on the execution speed when running a Java program. To be able to compare the hardware solutions against each other, results from standardized benchmark programs will be used. Measurements of the CPU load, RAM usage, and flash footprint will also be done. The goal is to find answers to the following questions:

- Which hardware acceleration technique gives the best performance boost?
- Can any software technique compete or perform better with the use of hardware acceleration techniques?
- Can combinations of techniques give better performance gain than when working stand-alone?
- Are CPU load, RAM load, and flash footprint affected by the hardware acceleration techniques, and how?
- Does the VM need to be able to handle multitasking to gain further performance boost?

1.4 Report outline

The chapters in this report have the following topics:

- Chapter 2, theory of the techniques and hardware platform
- Chapter 3, the method is described
- Chapter 4, results of the tests and discussion of them
- Chapter 5, conclusion and suggestions for further work

2 Theory

The purpose of this chapter is to show the background theory and present the hardware acceleration techniques, benchmark programs, and hardware used in this project.

2.1 Background

A good user experience is crucial as a selling point for mobile units. One thing that contributes to the user experience is how fast programs execute on the unit [3]. This has led to investigations into how to speed up the execution of Java programs on mobile units, since leading mobile operators are expanding their use of Java programs in their networks [4]. A Java program is platform independent when compiled into Java byte-code. When executing Java byte-code the VM compiles it into native instructions. Optimizing the execution speed of the VM can therefore increase the overall performance of a Java program. For this reason manufacturers have found different improvements to speed up the VM. These improvements can be made in both hardware and software.

There have been various works and projects by others investigating improvement techniques for the VM. The project Evaluation of a hardware accelerated java virtual machine on embedded devices [5] shows that in some cases hardware acceleration can give better performance. It also concluded that further investigations of hardware acceleration are needed, as the project only looked at one type of acceleration technique. Another work, Hardware support for embedded java [6], presents hardware techniques used to accelerate Java binary translation through the extension of embedded processor pipelines. Techniques for RISC processors, such as ARM, are presented and investigated. The conclusion of that project is that improvements can be made when using hardware extensions, but that the result depends on the embedded system.

2.2 The Virtual Machine

A Virtual Machine (VM) executes software code like a physical machine would do. The VM has an instruction set and is allowed to manipulate memory areas at runtime. It creates threads and gives these program counters and native stacks [7]. VMs can provide an Instruction Set Architecture (ISA) that is different from the underlying hardware ISA, with high-level abstraction and performance like compiled programming languages. There are two types of VMs: system VMs and process VMs. System VMs provide virtualization only at the ISA level [8]. An example is the SUN CLDC HI.

A process VM is a type of VM that can only run one process at a time. Multiple instances of a process VM are needed to run multiple processes. PhoneME Advanced is a process VM [8].

JIT (Just In Time) compiler

A JIT is a dynamic compiler with the goal of producing fast code with the smallest possible compile time [9]. It compiles the most frequently executed methods to native code while the program is running. This means that portability is kept intact, as native code compilation is done at runtime instead of before the program is run [10]. For this to work, the VM must initially interpret the program and analyze how it runs by looking for the most frequently executed portions of byte-code. These portions of byte-code are then compiled into optimized native code during program execution [11]. Since compilation happens while the program is executing, the compilation time adds to the program's total running time. A JIT can be implemented so that it takes advantage of a hardware acceleration technique by using the instruction set belonging to that technique. For example, a JIT compiler can use a floating-point unit by emitting special floating-point instructions and using floating-point registers [12].

2.3 Java

The Java platform currently consists of three editions: Java 2 Enterprise Edition (J2EE), Java 2 Standard Edition (J2SE), and Java 2 Micro Edition (J2ME) [13].

Figure 1: Differences in CLDC and CDC [14]
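The hot-method detection described in the JIT compiler section above can be illustrated with a toy model. This is only a sketch and not how PhoneME Advanced implements its JIT: the class name `TinyVM`, the method `invoke`, and the threshold value are hypothetical, and "compiling" is reduced to remembering that a method has been compiled.

```python
class TinyVM:
    """Toy VM: interprets a method until it becomes hot, then 'compiles' it."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # hypothetical tuning value
        self.call_counts = {}               # method name -> invocation count
        self.compiled = set()               # methods already turned into native code

    def invoke(self, name):
        """Return the mode in which this invocation of `name` was executed."""
        if name in self.compiled:
            return "native"
        count = self.call_counts.get(name, 0) + 1
        self.call_counts[name] = count
        if count >= self.hot_threshold:
            # The compile cost is paid here, while the program is running.
            self.compiled.add(name)
            return "compiled"
        return "interpreted"


vm = TinyVM(hot_threshold=3)
modes = [vm.invoke("inner_loop") for _ in range(5)]
print(modes)
```

The first invocations are interpreted, the invocation that crosses the threshold pays the compilation cost, and later invocations run the already compiled code. This mirrors the statement that compilation time adds to the total running time.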

Java code is compiled to byte-code and saved as a Java class file. The class file is then interpreted at runtime in the Java VM (JVM). The JVM and the Java class files are defined in the Java Virtual Machine Specification [15]. Java byte-code is compiled using either of two techniques: Ahead-Of-Time (AOT) and Just-In-Time (JIT) [10]. AOT compilation compiles the Java byte-code into a system dependent binary and provides faster start-up time. AOT performs the compilation before the actual execution, at the cost of flash memory. JIT, on the other hand, converts code at runtime and in most cases gives an overall performance that is better than AOT. In brief, the code is compiled into native machine code. JIT lacks predictability in performance, because when the JIT cannot find the needed code in the memory cache it must start compiling it, which creates overhead.

The J2ME platform

J2ME primarily targets consumer products with limited resources and is a collection of technologies and specifications that can be combined to construct a complete Java runtime environment [7]. In the J2ME architecture, configurations and profiles were introduced to address the problems with limited resources on different mobile platforms [16]. A configuration defines the Java language features and the core Java libraries of the JVM. Different device limitations make the use of different configurations necessary. There are two such configurations for J2ME: the Connected Limited Device Configuration (CLDC) and the Connected Device Configuration (CDC). A profile is an extension to the configuration and is a set of standard APIs that support a narrower category of devices within the framework of a chosen configuration [17].

The CLDC version of J2ME

The latest version of the Connected Limited Device Configuration (CLDC) cleared by the Java Community Process (JCP) has the release name Java Specification Request (JSR) 139, also called CLDC 1.1 [18]. CLDC is a minimal runtime environment. The CLDC specification defines three things:

- A VM with reduced capabilities: CLDC does not handle thread groups and lacks dynamic class loading.
- A very small subset of the J2SE 1.3 classes.
- A new API set for input/output called the Generic Connection Framework (GCF).

The CLDC does not define APIs for user interfaces or how applications are loaded and activated on the device.

The CDC version of J2ME

The Connected Device Configuration (CDC) provides a much more conventional Java 2 runtime environment than CLDC. The latest release of CDC is the JSR 218 release [19]. The CDC does not require pre-verification of classes, even though pre-verified classes can be used. The CDC specification defines:

- The capabilities of the VM, which is a full-featured JVM.
- A subset of the J2SE classes [20].
- The Generic Connection Framework API.
- Supported file and datagram based I/O that uses both the GCF and the ordinary java.io and java.net.

The CDC does not define specifications for user interface classes or how applications are loaded and activated on the device.

2.4 The ARM Platform

The ARM processor is a 32-bit Reduced Instruction Set Computer (RISC) microprocessor architecture for embedded use [21]. ARM does not manufacture processors itself, but instead sells intellectual property (IP) licenses to different manufacturers, who produce the processors. ARM offers a broad range of processors categorized as Application processors, Embedded processors, and SecureCores. The ARM processor families are built on different ARM architecture versions.

2.4.1 The ARM Architecture

The ARM processor architecture provides support for the 32-bit ARM and 16-bit Thumb Instruction Set Architectures (ISAs) along with architecture extensions that provide support for Java acceleration (Jazelle), security (TrustZone), SIMD, and NEON technologies [22].

Figure 2: The ARM architecture extensions for different architecture versions. [23]

Cortex Family

The ARM architecture version for the Cortex family is ARMv7. The Cortex family consists of three series, which all include the Thumb2 instruction set [24]:

- The ARM Cortex-A Series is a family of application processors with support for the ARM, Thumb, Thumb2, and Thumb2EE instruction sets. It also has VFP, Jazelle RCT, NEON, and SMP support.
- The ARM Cortex-R Series is a family of embedded processors for real-time systems. These processors support the ARM, Thumb, and Thumb2 instruction sets.
- The ARM Cortex-M Series is a family of deeply embedded processors. These processors only support the Thumb2 instruction set.

2.4.2 Multi-Processing Core

With a Multi-Processing Core (MPCore) the theoretical maximum performance of an n-processor device is n*100% of the performance of a single processor [25]. An MPCore can also reduce power consumption by up to 85% when all CPUs are in standby mode, compared to when all CPUs are running at the highest capacity [26]. It provides scalability because more CPUs can be added as the workload increases. There are solutions scalable from 1 to 4 CPU cores in which the memory system and sub-systems are optimized for multi-processing. Two techniques are used in multiprocessing: Symmetric Multi-Processing (SMP) and Asymmetric Multi-Processing (AMP).

Figure 3: Speed versus power consumption chart of the Cortex-A9 MPCore. [27]

SMP is a load-distributed software architecture, which means that the load is dynamically distributed across the CPU cores. The processors are identical and are connected to a single shared memory and input/output system. AMP is similar to SMP, but the processors are not perfectly symmetrical. The different CPUs might run different software or have dedicated input/output such as interrupt signals.

The ARM MPCores support both SMP and AMP and combinations of them. Each processor may be independently configured for its cache sizes. The interrupt controller is designed for distribution across multiple cores [28]. On SMP-aware Operating Systems (OS) there is automated load balancing and distribution across the available cores. This applies to processes, applications, threads, and interrupts. Linux 2.6 and later provide SMP support. There is also automatic power saving, where adaptive power management for workload variations is used. The ARM MPCore power modes are running, dormant, stand-by, and power-off.

One way of measuring performance is with Dhrystone results. Dhrystone is a measurement of the average time the processor takes to perform many iterations of a single loop containing a fixed sequence of instructions. The result is referred to as DMIPS or Dhrystone MIPS/MHz [29].

Benefits of Multi-Processing [30]:

- Higher performance. ARM11 MPCore: 650 DMIPS -> 2600 DMIPS. Cortex A9 MPCore: 2000 DMIPS -> 8000 DMIPS.
- Less power consumption than you get from the equivalent performance throughput of a single processor.
- More CPUs that run at lower frequency, with the ability of individual power-off.
- Add/enable additional CPUs for on-demand performance increase.
- Scalable system expansion to leverage next-generation system requirements.
- Flexible, readily available programming models to suit application requirements.
- Isolate real-time requirements from high-performance application deployment.
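The DMIPS figures in the list above are consistent with the n*100% rule for the theoretical maximum of an n-processor device, if one assumes that the larger number in each pair refers to a four-core configuration (the text only states that solutions scale from 1 to 4 cores). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def theoretical_peak_dmips(single_core_dmips, cores):
    """Upper bound on throughput if the load scales perfectly across all cores."""
    return single_core_dmips * cores


# Assumption: the quoted upper figures correspond to four cores.
print(theoretical_peak_dmips(650, 4))   # ARM11 MPCore figure from the list
print(theoretical_peak_dmips(2000, 4))  # Cortex A9 MPCore figure from the list
```

This is a best case only; real workloads reach it only if they can be split evenly over all cores.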

2.5 ARM platform extensions

ARM's instruction sets are run in different modes/states: ARM, Thumb, ThumbEE, and Jazelle. Some ARM architectures include hardware extension support for Vector Floating Point calculations.

Thumb

Thumb [31] technology can give a 31% code size reduction compared to 32-bit ARM instructions, but at the expense of performance. The ARM instructions can perform up to 38% better than Thumb instructions, and therefore the equivalent performance loss for Thumb instructions will be 28%, according to ARM [32]. Thumb is a 16-bit instruction set that extends the 32-bit ARM architecture. A processor is operating in Thumb-state when executing Thumb instructions. These instructions are a subset of the most commonly used 32-bit ARM instructions, compressed into 16-bit operation codes. During execution, these instructions are decoded to enable the same functionality as the ARM instructions.
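The relation between the 38% gain and the 28% loss quoted above follows from changing the baseline: if ARM code runs at 1.38 times the speed of Thumb code, then Thumb code runs at 1/1.38 of the speed of ARM code. A small sketch of that conversion (the function name is ours, not taken from ARM's material):

```python
def equivalent_loss_percent(gain_percent):
    """Convert 'A is gain_percent faster than B' into 'B is X percent slower than A'."""
    faster = 1.0 + gain_percent / 100.0
    return (1.0 - 1.0 / faster) * 100.0


loss = equivalent_loss_percent(38)
print(round(loss, 1))  # about 27.5, which rounds to the 28% quoted in the text
```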

2.5.2 Thumb2

Thumb2 [33] technology can give a 31% code size reduction compared to ARM instructions, and performance up to 38% better than when using the Thumb instruction set [32]. Thumb2 is a set of 16- and 32-bit instructions that extends the ARM architecture to improve the Thumb instruction set. It provides almost exactly the same functionality as the ARM instruction set. A processor is operating in Thumb-state when executing Thumb2 instructions. Thumb2 consists of the existing 16-bit Thumb instructions and new 16-bit instructions for improved program flow. There are also new 32-bit instructions derived from their ARM instruction equivalents. The new instructions are for co-processor access, privileged instructions, bit-field manipulation, table branches, conditional execution, and special functions like Single Instruction, Multiple Data (SIMD).

Figure 4: Performance versus code density comparison of three instruction sets. [34]

Thumb2EE

The Thumb2 Execution Environment (Thumb2EE) instruction set is intended for dynamically generated code. It helps reduce the size of compiled code and therefore the memory footprint [35]. This also means that recompiled methods can be kept in memory, which results in better performance and almost no startup delays. This instruction set is based on Thumb2 but has some changes and additions that make it a better target for dynamically generated code techniques like JIT and AOT. It is a set of 16- and 32-bit instructions. A processor is operating in ThumbEE-state when executing Thumb2EE instructions [36].

2.5.3 Jazelle

Jazelle provides hardware acceleration for some of the most commonly used managed execution environments, like Java, and outperforms a software-only interpreter [37]. This is because Jazelle executes a significant amount of Java byte-code in hardware. It extends the processor-states with a Jazelle-state. The processor also maintains the Jazelle operand stack. Jazelle allows designers and developers to deliver more features to the devices while still maintaining power and performance characteristics.

Figure 5: The different decoding stages. [39]

Jazelle RCT

A compiler that uses Jazelle Run-time Compilation Target (RCT) can have an overhead of only 10% when converting from byte-code to 16-bit Thumb2EE instructions [38], and still match the performance of Thumb2. There is almost no increase in size from the existing byte-code to the compiled code. In Jazelle RCT mode, also known as Thumb2EE mode [36], some Thumb2 instructions are changed to make compilation more efficient by combining these instructions with byte-code instructions. The processor-state in which Jazelle RCT instructions are executed is called ThumbEE-state. Jazelle RCT supports AOT and JIT compilation with Java and other execution environments like the .NET Compact Framework technology. The instruction set that Jazelle RCT uses is called Thumb2EE and is a superset of the existing Thumb2 instruction set. There are also instructions for changing between Jazelle RCT mode and Thumb2 mode. Implicit null-pointer tests and fast array range checking improve the performance [35]. It also provides 16-bit instructions for commonly used AOT/JIT compilation routines.

Jazelle DBX

Jazelle Direct Byte-code eXecution (DBX) technology has important benefits when it comes to power consumption and performance compared to co-processor or dedicated processor solutions [39]. Other hardware solutions for accelerating Java execution, like a co-processor or a dedicated processor, would typically require additional silicon footprint and consume extra power to operate. They also require external memory, which means that they do not maximize speed [39]. Jazelle DBX technology introduces a new instruction set, Java byte-code, to the processor [40]. In this state the processor fetches and decodes Java byte-code directly. These Java instructions are pausable, which means that an interrupt can take place in the middle of an executing Java instruction without affecting the interrupt latency, which ensures real-time interrupt performance. Jazelle DBX has the disadvantage that only Java byte-code is supported.

Vector Floating Point extension

The Vector Floating Point (VFP) extension is a coprocessor extension that provides hardware acceleration of single and double precision floating-point arithmetic [41]. VFP increases throughput in graphics and signal-processing applications. The implemented VFP extension follows the IEEE 754 [42] standard for binary floating-point arithmetic. The VFP supports the execution of short vector instructions, allowing Single Instruction, Multiple Data (SIMD) parallelism. There are different implementation versions of the VFP on the ARM architecture, and all versions need support code to trap exceptions. The only version that can trap floating-point exceptions is version VFPv3U. In addition, there can be extra registers used by the VFP coprocessor hardware that describe exceptional conditions that may need to be considered.

Single Instruction, Multiple Data

Single Instruction, Multiple Data (SIMD) parallelism is used for repetitive operations done on multiple data [43]. SIMD uses packed vectors of data and, unlike traditional vectors, a SIMD packed vector can be used as an argument for a specific instruction. This instruction is then performed on all the elements in the vector simultaneously. The vector size directly affects the performance, as does the type of instruction performed on the vector. A SIMD architecture often uses a special set of CPU registers where the parallel processing takes place. Real SIMD computers have a mixture of Single Instruction, Single Data (SISD) and SIMD instructions, which is the case in the ARM implementation.

Neon

Neon is the name for the ARM Advanced SIMD extension, which has a comprehensive instruction set, an independent pipeline, separate register files, and independent execution hardware. It was developed to accelerate the performance of multimedia and signal processing applications for video encode/decode, 3D graphics, and more [44]. NEON supports 8-, 16-, 32-, and 64-bit integer and single-precision floating-point data and operates in SIMD fashion, where it can handle up to 16 operations at the same time. According to ARM: "Processors that implement the ARMv7-A architecture profile have two options for handling single-precision floating point; VFPv3 and NEON technology. VFPv3 supports full IEEE754 compliant single-precision and double-precision handling completely in hardware. The NEON engine operates on single-precision floating-point numbers only, and its handling of denormalled numbers and NaNs (Not a Number) is not IEEE754 compliant. The NEON engine processing of floating-point numbers is compliant with the standards of most modern programming languages, including C and C++" [45]. In Cortex-A9 an enhancement called Media Processing Engine (MPE) has been added to NEON. This extends the floating-point unit (FPU) to provide a quad-MAC and additional 64-bit and 128-bit register sets [46]. NEON is encoded in the ARM and Thumb2 instruction sets, providing high performance with optimized code density.
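The "up to 16 operations at the same time" figure can be related to the register and element sizes mentioned above: a 128-bit register holds sixteen 8-bit elements. The sketch below only models this in plain Python to show the idea of a packed operation. It does not use NEON, and the function names are illustrative assumptions.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def packed_add(a, b):
    """Model of one SIMD add: the same operation is applied to every lane."""
    if len(a) != len(b):
        raise ValueError("both packed vectors must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


print(lanes(128, 8))    # sixteen 8-bit lanes in a 128-bit register
print(lanes(128, 32))   # four 32-bit lanes, e.g. single-precision floats
print(packed_add([1, 2, 3, 4], [10, 20, 30, 40]))
```

On real hardware the whole `packed_add` would be a single instruction, which is why wider elements (fewer lanes) give less parallelism per instruction.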

2.6 Benchmark programs

Grinderbench

Grinderbench is a benchmarking suite that approximates the performance of J2ME. It includes five benchmarking programs [47]:

- Chess: a chess playing engine that performs the logical parts of a chess game but without any graphical output.
- Crypto: cryptographic algorithms that are calculated.
- kxml: parsing of an XML document and/or manipulation of a Document Object Model (DOM) tree.
- Parallel: multiple threads running at the same time with thread switching and synchronization.
- PNG: a PNG image that is decoded.

This benchmark focuses on CLDC 1.0 but has a MIDP 1.0 wrapper so it can be run on devices with MIDP 1.0. The benchmark only uses integer calculations, as floating-point calculation is not available on the J2ME edition [48]. The GrinderMark score is calculated as the geometric mean of the five individual benchmark application scores [49].

SciMark 2.0

SciMark 2.0 performs five numerical tests that are common in scientific and engineering applications: Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo integration, sparse matrix multiply, and dense LU matrix factorization [50]. SciMark 2.0 works with floating-point calculations. The composite score presented by SciMark is the average score of the five tests. The scores are presented in MFLOPS.

The Monte Carlo integration

The Monte Carlo integration in SciMark 2.0 approximates PI by computing the integral of the quarter circle y = sqrt(1 - x^2) [50].
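The Monte Carlo approach can be sketched as follows: random points are drawn in the unit square, and the fraction that falls under the quarter circle approximates its area, PI/4. This is a minimal illustration of the technique and not the SciMark 2.0 source; the function name, the seed, and the sample count are our own choices.

```python
import random


def monte_carlo_pi(samples, seed=0):
    """Estimate PI from the fraction of random points under y = sqrt(1 - x^2)."""
    rng = random.Random(seed)  # fixed seed so that repeated runs are comparable
    under_curve = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            under_curve += 1
    # The quarter circle has area PI/4 inside a unit square of area 1.
    return 4.0 * under_curve / samples


print(monte_carlo_pi(100_000))
```

The loop is dominated by floating-point multiplications, additions, and comparisons, which is the kind of work a hardware floating-point unit would be expected to accelerate.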

Successive Over Relaxation

The SOR method [51] in SciMark 2.0 uses the Jacobi Iteration [52] to operate on a 100x100 matrix. The algorithm exercises basic "grid averaging" memory patterns, where each A(i,j) is assigned an average weighting of its four nearest neighbors [50].
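The grid averaging pattern can be sketched as below. This is an illustration of a common SOR formulation written for this text, not a copy of the SciMark 2.0 code: the in-place update order, the relaxation factor `omega`, and the function name are assumptions.

```python
def sor(grid, omega, iterations):
    """Relax the interior points of a 2D grid in place and return the grid.

    Each interior point becomes a weighted mix of the average of its four
    nearest neighbours and its own previous value.
    """
    rows, cols = len(grid), len(grid[0])
    omega_over_four = omega * 0.25
    one_minus_omega = 1.0 - omega
    for _ in range(iterations):
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                neighbours = (grid[i - 1][j] + grid[i + 1][j]
                              + grid[i][j - 1] + grid[i][j + 1])
                grid[i][j] = (omega_over_four * neighbours
                              + one_minus_omega * grid[i][j])
    return grid


# A 3x3 grid with a hot border and a cold centre; omega = 1.0 is plain averaging.
g = [[4.0] * 3 for _ in range(3)]
g[1][1] = 0.0
sor(g, omega=1.0, iterations=1)
print(g[1][1])
```

Every update reads four neighbouring values and performs a handful of floating-point operations, so the benchmark stresses both memory access and floating-point arithmetic.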

3 Methods

To achieve the main goal of this project, the work was planned and organized as described in this chapter.

3.1 Work plan

The work was organized in three phases.

1. Find documents and literature regarding the subject and evaluate their usefulness for this project.
2. Gain comfort with the target platform and software.
3. Run tests with benchmark programs to obtain measurements for the different hardware acceleration techniques. Compare scores and draw conclusions.

Documentation was done continuously throughout the work.

3.2 System setup

The system setup was a target platform based on the ARM architecture ARMv7-A. The kernel running on the platform was based on Linux and was run as pure as possible. No applications other than the VM were run on the target platform at the same time. A PC with software such as PuTTY [53] was used to communicate with the target platform and the benchmark programs. The target platform and the PC were connected via the serial port.

The Virtual Machine

The VM used in this project was an open source version of Java 2 ME CDC called PhoneME Advanced [54], which has been developed with the mobile phone in mind. To compile the VM, the open source compiler GCC [55] was used together with a tool library for cross-compiling to the ARM/Linux platform setup. PhoneME makes use of GNU Make [56] for building and compiling the VM. This makes it easy to build and compile the VM and to change build flags via make files. The make file used for this project was GNUmakefile, located in the folder phoneme_advanced_mr2/cdc/build/linux-arm-generic. This file contains the different compiler flags for the hardware acceleration techniques. For this project it was also altered to include options for Thumb2.

Compiler flags

To compile for the hardware acceleration techniques, different flags must be set to enable or disable the techniques. The compiler flags used are presented in Appendix B.3 Compiler flags. More information on GCC's ARM flags can be found at gcc.gnu.org [57]. How to set the debug and VM flags for this VM is described in the CDC Build System Guide [12].

3.3 Running tests and obtaining results

The tests were done according to this plan:

- Compile the VMs with the hardware flags set as shown in Appendix B.4 Build options.
- Boot Linux on the target platform and run the benchmark programs using six different scripts (Appendix C.2 Scripts for running tests):
  1. ss.sh, running one instance of SciMark 2.0
  2. ds.sh, running two instances of SciMark 2.0 in parallel
  3. ts.sh, running three instances of SciMark 2.0 in parallel
  4. sg.sh, running one instance of Grinderbench
  5. dg.sh, running two instances of Grinderbench in parallel
  6. tg.sh, running three instances of Grinderbench in parallel
  Each script was run twice for each VM.
- Collect the test results and evaluate them.

3.3.1 Test result evaluation methods

The benchmark programs SciMark 2.0 and GrinderBench are well known and are used in the mobile industry to test the performance of mobile units. Both programs report their results as scores, and these scores are sufficiently accurate for this project. The results can also be compared with other hardware platforms and software VMs on the market that have been benchmarked with SciMark 2.0 and GrinderBench.

The benchmark scores were processed and compared using Excel and diagrams. Only the composite score from SciMark 2.0 and the GrinderMark from GrinderBench were used for comparison, to reduce the influence of measurement faults and score fluctuations in the individual tests inside the benchmark programs. The results are shown as percentages relative to a reference VM, whose performance is defined as 100%.

To measure flash footprint, a Java program [Appendix C.1 Flash footprint comparison program] was written that compares the sizes of the object files against each other. CPU load and RAM load were measured with the Linux command top. The output from top was piped to a data file and then passed through a filter [Appendix C.3 Data filter program] to extract the useful data. This data was then processed in Excel.
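The percent-based presentation described above can be sketched in a few lines of Java. This is only an illustration, not the Excel processing used in the project: the class name RelativeScore, its methods, and the scores in main are hypothetical, and averaging the two runs of each script is an assumption (the report only states that each script was run twice).

```java
// Hypothetical helper: expresses a benchmark score relative to a reference VM.
final class RelativeScore {
    private RelativeScore() {}

    /** Returns the score as a percentage of the reference score (reference = 100%). */
    static double percentOfReference(double score, double referenceScore) {
        if (referenceScore <= 0.0) {
            throw new IllegalArgumentException("reference score must be positive");
        }
        return 100.0 * score / referenceScore;
    }

    /** Arithmetic mean of repeated runs (assumption: repeated runs are averaged). */
    static double mean(double... scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        // Made-up composite scores, for illustration only.
        double reference = mean(10.0, 10.4);
        double candidate = mean(12.1, 12.5);
        System.out.printf("%.1f%% of reference%n", percentOfReference(candidate, reference));
    }
}
```

A value above 100 means the VM under test scored higher than the reference VM, and a value below 100 means it scored lower.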

4 Results and discussion

The results presented in this chapter were obtained with the method described in chapter three. This chapter also presents our discussion of the findings and results.

4.1 Hardware acceleration techniques comparison

To compare the hardware acceleration techniques against each other, the tests were arranged in three major test cases: JIT_OFF, JIT_ON and JIT_HW_FP. This makes it possible to see how the hardware acceleration techniques performed with and without the influence of the software technique JIT.

JIT_OFF

In this test case the VMs were built without JIT. The build flags for the different VMs are found in appendix B.4 VM#: 1-6. This makes it possible to compare how the hardware acceleration techniques performed when the Java byte-code is processed in interpreted mode. The reference VM (Cortex A9) was built with no JIT and no hardware acceleration techniques [Appendix B.4 VM#: 1].

Diagram 1: Comparison of techniques with JIT_OFF

The results show a performance boost, especially in SciMark, when the NEON or VFPv3 techniques are used. This is because SciMark's internal tests are based on floating point calculations, which NEON and VFPv3 are designed to speed up.

JIT_ON

This test case compares the hardware acceleration techniques against each other under the influence of a JIT. The build flags for the different VMs are found in appendix B.4 VM#: . The reference VM was compiled with no hardware acceleration and with JIT [Appendix B.4 VM#: 7].

Diagram 2: Comparison of techniques with JIT_ON

Under the influence of JIT the hardware acceleration techniques do not have the same impact. This is because the JIT does not emit any of the instructions provided by the hardware techniques and therefore cannot benefit from them.

4.1.3 JIT_HW_FP

Here the VMs are built with the JIT_HW_FP option. This option enables the JIT to use floating point instructions when compiling Java byte-code to native instructions. The build flags for the different VMs are found in appendix B.4 VM#: . The reference VM was compiled with no hardware acceleration technique but with JIT_HW_FP [Appendix B.4 VM#: 13].

Diagram 3: Comparison of techniques with JIT_HW_FP

In this test the JIT can use the floating point instruction set, but as seen here the VM itself does not gain much from the different techniques. The differences are too small to tell whether the techniques really had any impact on performance.

Hardware acceleration techniques discussion

Testing the hardware acceleration techniques against each other under the test cases JIT_OFF, JIT_ON and JIT_HW_FP shows interesting results. When no JIT is used, the effect of each hardware technique is clearly visible. When JIT is enabled, the hardware acceleration techniques do not have any significant impact on performance. The boosts and drops of 1-3% cannot be used to evaluate the techniques themselves, as +/- 5% can be considered within the error margin because of background threads in the Java environment [58]. The same holds when JIT_HW_FP is used.
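The reasoning about the error margin can be made concrete with a small check. The sketch below is not part of the project's tooling: the class and method names are hypothetical, and only the 5% threshold is taken from the discussion above. A result counts as significant only if it deviates from the 100% reference by more than the margin.

```java
// Hypothetical helper: decides whether a relative score lies outside the error margin.
final class ErrorMargin {
    /** Error margin in percentage points, as stated in the discussion above. */
    static final double MARGIN_PERCENT = 5.0;

    private ErrorMargin() {}

    /**
     * Returns true if a result, given as a percentage of the reference VM
     * (reference = 100%), differs from the reference by more than the margin.
     */
    static boolean isSignificant(double percentOfReference) {
        return Math.abs(percentOfReference - 100.0) > MARGIN_PERCENT;
    }

    public static void main(String[] args) {
        // Illustrative values only: a 3% boost is inside the margin, a 50% boost is not.
        System.out.println(isSignificant(103.0));
        System.out.println(isSignificant(150.0));
    }
}
```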

We found that the SciMark2 tests did not call any of the functions in the VM that were affected by the techniques. The mathematical operations used in the tests were mostly addition, subtraction, multiplication and division. The routines that implement these operations are located in the libgcc.a library. As we did not compile this library with the different techniques, the performance would probably improve if it were also compiled with them. That is why we looked further into the VFPv3 case in chapter and added one instruction that we knew was affected when compiling the VM with VFPv3 instructions.

We also found it interesting to compare the three major test cases with each other, to see whether there is any gain in performance from the hardware acceleration technique for floating point calculations when JIT is enabled. This has been done in chapter .

JIT_OFF vs. JIT_ON vs. JIT_HW_FP

This test case compares the use of JIT and JIT_HW_FP to the case without JIT for each hardware acceleration technique. The reference VMs are compiled for each hardware technique without JIT [Appendix B.4 VM#: 1-6].

Diagram 4: Comparison of JIT_OFF, JIT_ON and JIT_HW_FP

The diagram shows an enormous boost when the JIT can use floating point instructions.

JIT discussion

Even though JIT is a software acceleration technique, we wanted to check whether its impact on performance would be larger than that of interpreted mode combined with hardware acceleration techniques. The diagram clearly shows that JIT alone boosts the performance far more than any hardware technique alone in interpreted mode. When the JIT works with floating point instructions an even greater performance boost can be achieved. In SciMark 2.0 the performance with JIT_ON was about three to four times better than with JIT_OFF. With JIT_HW_FP, however, the performance was nearly 20 times better than without JIT. This is probably because the JIT can produce code that runs on the floating point hardware. In the GrinderBench test there was no noticeable change in performance between JIT_ON and JIT_HW_FP. This is probably because, as stated before, GrinderBench does not use floating point computations and therefore cannot benefit from the floating point hardware.

4.2 SMP support

To test SMP support, the scores were measured when running one, two or three instances of a benchmark program at the same time. The scores for the reference VMs are those obtained when running one instance of the benchmark programs without SMP support on the target platform. The test runs were done via the scripts shown in appendix C.2. The target platform used in this project has a dual-core ARM Cortex-A9.

Diagram 5: SMP comparison on SciMark2

Diagram 5 shows a clear performance drop when running more than one instance on a single processor (NO_SMP). When SMP support is enabled, the performance for two instances is almost the same as for one instance. When running two or three instances with SMP support, the performance is almost 100% better than when running on a single core.

Diagram 6: SMP comparison on GrinderBench

Similar results for SMP support can be observed when GrinderBench is used. There is a 15% performance gain when running one instance of GrinderBench with SMP support. This is due to the parallel test inside GrinderBench, which takes advantage of the SMP support.

SMP discussion

The results confirmed our expectations for SMP support: using more than one core gives a performance boost. The results clearly show that when two instances of a benchmark program run at the same time, each gives about the same score as one instance running on one processor. On this target platform we only have SMP support with two cores, but we can draw the conclusion that the more cores are available, the better the unit will handle multiple tasks.
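The idea of running one, two or three instances at the same time can be illustrated with the sketch below. Note the differences from the project's setup: the project started separate VM processes from shell scripts, whereas this sketch swaps in threads inside a single JVM, and the workload is a stand-in loop, not a SciMark or GrinderBench kernel. All names are hypothetical. On a machine with at least as many free cores as instances, the elapsed time should stay roughly constant as instances are added; on a single core it should grow roughly in proportion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: runs several copies of a CPU-bound workload concurrently.
final class ParallelInstances {
    private ParallelInstances() {}

    /** Stand-in CPU-bound workload (not a real benchmark kernel). */
    static long workload(int iterations) {
        long acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += ((long) i * i) % 7;
        }
        return acc;
    }

    /** Runs the given number of instances concurrently; returns elapsed wall-clock nanoseconds. */
    static long runInstances(int instances, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            Callable<Long> task = () -> workload(iterations);
            List<Future<Long>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int i = 0; i < instances; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<Long> f : futures) {
                f.get(); // wait for every instance to finish
            }
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new IllegalStateException("instance failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " instance(s): " + runInstances(n, 50_000_000) / 1_000_000 + " ms");
        }
    }
}
```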

4.3 Instruction set comparison and performance

Thumb2

Compiling the VM with the Thumb2 instruction set introduced problems with the SWP instruction [Chapter: ]. This instruction is an atomic read-modify-write operation that is not supported by the Thumb2 instruction set. To solve this problem, some files were excluded when using the Thumb2 instruction set. The excluded files are listed in the variable UNTHUMBABLE in Appendix B.

Some files became larger when using the Thumb2 instruction set. This is because in some cases several Thumb2 instructions must be combined to get the same functionality as one ARM instruction. The average increase in size for those files is about 1.5%. The files are found in appendix B.

There was no significant increase or decrease in performance when comparing the VM with [Appendix B.4 VM#: 2] or without [Appendix B.4 VM#: 1] the Thumb2 option. The decrease in size of the overall VM was about 3% with JIT_ON and about 2.5% with JIT_OFF. The average decrease in size of the files that got smaller is about 11.7%, and those files are found in Appendix B.

The diagram below shows the difference in size of the VM compiled with different techniques. The reference VM is compiled with no acceleration techniques in both the JIT_OFF and the JIT_ON case.

Diagram 7: CVM size comparison with and without Thumb2
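The size comparison behind these percentages can be sketched as follows. This is not the program from Appendix C.1, which is not reproduced here: the class name, the method names and the directory layout (two build directories that each contain .o files with matching names) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: compares object file sizes between two build directories.
final class FootprintCompare {
    private FootprintCompare() {}

    /** Size change in percent from reference to candidate; negative means smaller. */
    static double percentChange(long referenceBytes, long candidateBytes) {
        if (referenceBytes <= 0) {
            throw new IllegalArgumentException("reference size must be positive");
        }
        return 100.0 * (candidateBytes - referenceBytes) / referenceBytes;
    }

    /** Maps each non-empty .o file present in both directories to its size change in percent. */
    static Map<String, Double> compare(Path referenceDir, Path candidateDir) {
        Map<String, Double> changes = new TreeMap<>();
        try (DirectoryStream<Path> objects = Files.newDirectoryStream(referenceDir, "*.o")) {
            for (Path reference : objects) {
                Path candidate = candidateDir.resolve(reference.getFileName());
                long referenceSize = Files.size(reference);
                if (referenceSize > 0 && Files.isRegularFile(candidate)) {
                    changes.put(reference.getFileName().toString(),
                            percentChange(referenceSize, Files.size(candidate)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return changes;
    }

    public static void main(String[] args) {
        // Usage: java FootprintCompare.java <referenceDir> <candidateDir>
        compare(Paths.get(args[0]), Paths.get(args[1]))
                .forEach((name, change) -> System.out.printf("%s: %+.1f%%%n", name, change));
    }
}
```

Files that exist in only one of the two directories are skipped, so the output only describes files that can actually be compared.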

SWP instruction

SWP is an atomic instruction. An atomic instruction always runs to completion without being interrupted by another operation. It is often used to implement semaphores, which guarantee that no other process interferes while a protected operation is executing. The SWP instruction can be replaced with other instructions, but it is hard to find a good thread-safe solution.

The problem with the SWP instruction first appeared when the VM was compiled with the Thumb2 instruction set, because Thumb2 does not support the SWP instruction. Therefore we had to exclude some C files from using Thumb2. The second time SWP caused problems was when we ran the VM with JIT support, because the JIT was producing SWP instructions. To solve this problem, a workaround was used that enables the target platform to accept these instructions. One problem with this solution is that the SWP instruction locks out both the processor and the memory while it is executing. This can have a negative impact on performance, especially when running in multi-core mode.

Thumb2 discussion

We observed that the overall size of the VM did not become much smaller. This is because most of the content of the C files that are compiled and linked together consists of C structs and other pure data. C structs and pure data do not contain any instructions and can therefore not be made smaller by another instruction set like Thumb2.

Jazelle discussion

We searched the internet for ways to enable this option on the reference VM. The search criterion was that the solution had to be license free in order to be valid. No such results were found. We then widened the search to include other VMs implementing Jazelle and found that phoneME Feature MR4 included Jazelle RCT [59]. After downloading this VM we could confirm that it contained a library with implementations for Jazelle RCT. However, the only search results for Jazelle RCT stated that the open source VM phoneME Advanced does not support Jazelle RCT, and the support for Jazelle DBX is under license, which meant that we had no opportunity to test the Jazelle feature in this project.
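The atomic swap semantics described in the SWP instruction section can be illustrated in Java. The sketch below is not code from the VM: it builds a simple spin lock on AtomicInteger.getAndSet, which atomically stores a new value and returns the previous one, the same read-and-write-in-one-step behaviour that SWP provides. The class and method names are hypothetical, and how the JVM implements getAndSet on a given ARM core is outside the scope of this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a spin lock built on an atomic swap operation.
final class SwapLock {
    private static final int FREE = 0;
    private static final int HELD = 1;

    private final AtomicInteger state = new AtomicInteger(FREE);

    void lock() {
        // Atomically write HELD and read the old value; if it was already HELD,
        // another thread owns the lock and we retry.
        while (state.getAndSet(HELD) == HELD) {
            Thread.yield();
        }
    }

    void unlock() {
        state.set(FREE);
    }

    /** Increments a shared counter from several threads, protected by the lock. */
    static int countWithLock(int threads, int incrementsPerThread) {
        SwapLock lock = new SwapLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 10_000));
    }
}
```

Because every increment happens while the lock is held, no updates are lost, which is the guarantee an atomic swap is meant to provide.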

4.3.3 Vector Floating Point

Further work was done on the VFP option. Based on the SciMark scores, we decided to run more tests on a modified version of the SOR method in SciMark 2.0. Because the performance gains from compiling the VM with VFP were poor compared to no VFP, we decided that a SciMark 2.0 method had to be modified to really test whether the VFP instructions give any performance gain.

As suspected earlier, the original SciMark 2.0 contains no relevant code segments that exercise native functions affected by compiling the VM with the VFP flag. In the original SciMark2 methods, most calls to mathematical functions go to the library libgcc.a, whose mathematical functions are, in our case, NOT affected by the VFP flag. To solve this problem, a small piece of code [Appendix A.1] in the SOR method was modified to call a VM native mathematical function that is affected by the VFP flag. This function is the Java mathematical method arcsine, and it is traced to its VM native counterpart by following the steps in the sequence diagram presented in the next chapter.

Tracing the mathematical Java method arcsine

Modified SciMark2 SOR method with Java method arcsine

Diagram 8 shows a slight increase in performance in the original version of the SOR method when JIT is turned off. In the cases with JIT_ON and JIT_HW_FP there is no significant increase or decrease in performance.

Diagram 8: Performance comparison of the modified SOR methods. The pure VM with JIT_OFF [Appendix B.4 VM#: 1], JIT_ON [Appendix B.4 VM#: 7] and JIT_HW_FP [Appendix B.4 VM#: 13] is the base, compared with the VM compiled with VFP with JIT_OFF [Appendix B.4 VM#: 4], JIT_ON [Appendix B.4 VM#: 10] and JIT_HW_FP [Appendix B.4 VM#: 16].

The difference in the SciMark2 result is small when comparing the VM without [Appendix B.4 VM#: 1] and with [Appendix B.4 VM#: 4] VFP when JIT is turned off. When VFP is turned off the VM uses software emulated floating point computation, which may be why there is no big improvement in performance. It may also be that there is so much other code, e.g. overhead, when running in interpreted mode that the overall impact on performance is negligible. Compiling the VM with VFP gives only a 3% increase in performance compared to compiling without it, even though there are big differences in the e_asin.o file.

When JIT was turned on there was a 50% performance increase between compiling the VM with [Appendix B.4 VM#: 10] and without [Appendix B.4 VM#: 7] VFP. This is probably because the JIT keeps the compiled methods in memory while executing and can call them much faster and with less overhead, which means that the performance impact of the code from e_asin.o becomes much larger.

Code comparison from e_asin.o with and without VFP

The difference between the two e_asin.o files when compiled with [Appendix B.4 VM#: 4] or without [Appendix B.4 VM#: 1] VFP is:

e_asin.o    A9        VFP
Rows
Size        8803kB    3664kB

A small, corresponding code example from e_asin.o, with [Appendix B.4 VM#: 4] and without [Appendix B.4 VM#: 1] VFP:

A9 [Appendix B.4 VM#: 1]: Row 214 in e_asin.o

.L7:
    ldrd r4, [sp, #16]
    mov r2, r0
    mov r3, r1
    mov r0, r4
    mov r1, r5
    bl __aeabi_dmul    (Call the subroutine __aeabi_dmul. [61]) [Appendix A.2.3]
    mov r2, r0
    mov r3, r1
    mov r0, r4
    mov r1, r5
    bl __aeabi_dadd    (Call the subroutine __aeabi_dadd. [61])
    mov r4, r0
    mov r5, r1
    b .L4

VFP [Appendix B.4 VM#: 4]: Row 94 in e_asin.o

.L7:
    fldd d0, [sp, #0]
    fmacd d0, d0, d7
    b .L4

The differences in the assembly code above when VFP is used are clearly visible: the two subroutine calls and the register moves around them are replaced by a single multiply-accumulate instruction. The ARM assembler syntax is described in the ARM Developer Suite Assembler Guide [60].

JIT_ON and JIT_HW_FP code comparison

When JIT_HW_FP [Appendix B.4 VM#: 13] is enabled on the VM, the JIT can also produce floating point instructions, e.g. faddd to add or fmuld to multiply two registers [Appendix A.2.1] [60]. With JIT_ON [Appendix B.4 VM#: 7], the JIT must call the native methods CVMCCMruntimeDAdd to add and CVMCCMruntimeDMul to multiply [Appendix A.2.2] [61]. These native methods are usually much larger than the corresponding floating point instructions. The native method that is called to multiply is in this case __aeabi_dmul [Appendix A.2.3]. This method is in the file _arm_muldivdf3.o, which is located in the libgcc.a library. To get the code from this file, the library was first unpacked and the file _arm_muldivdf3.o was then disassembled. The disassembled file shows that the __aeabi_dmul [Appendix A.2.3] method is 155 rows long and contains several loops.

The code shown in appendix A.2.2 and in appendix A.2.3 is the equivalent, with [Appendix B.4 VM#: 13] and without [Appendix B.4 VM#: 7] hardware support for floating point operations, of the statement if (x*x + y*y <= 1.0) in the MonteCarlo Integrate [Appendix A.2] file. The pure VM with JIT_HW_FP performs about 100% better than the pure VM with JIT_ON.

VFP discussion

The relevant finding is that there is a performance increase when the program actually uses methods that have been affected by the VFP option. This is the case for JIT_ON with the VFP VM and JIT_HW_FP with the VFP VM, and they give the best performance increase compared to the pure VM case.

Neon discussion

There is no gain in using NEON over VFP when running GrinderBench and SciMark 2.0. This can probably be explained by the fact that NEON is a SIMD extension [44] for floating point calculations and that the benchmark programs do not contain any test that can benefit from the NEON acceleration.
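To show where the statement if (x*x + y*y <= 1.0) from the JIT_ON and JIT_HW_FP code comparison sits, the sketch below gives a Monte Carlo estimate of pi in the style of the SciMark 2.0 MonteCarlo kernel. It is not the code from Appendix A.2: the class name is hypothetical, and java.util.Random is swapped in for SciMark's own random number generator. Each loop iteration performs two double multiplications, one addition and one comparison, which are the operations that a JIT with hardware floating point support can emit as instructions such as fmuld and faddd instead of calls to runtime helper methods.

```java
import java.util.Random;

// Hypothetical sketch in the style of the SciMark 2.0 MonteCarlo kernel.
final class MonteCarloSketch {
    private MonteCarloSketch() {}

    /** Estimates pi by sampling points in the unit square and counting hits in the quarter circle. */
    static double integrate(int numSamples, long seed) {
        Random rng = new Random(seed); // assumption: stands in for SciMark's own generator
        int underCurve = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            // Two multiplications, one addition and one comparison on doubles.
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        return 4.0 * underCurve / numSamples;
    }

    public static void main(String[] args) {
        System.out.println(integrate(1_000_000, 113L));
    }
}
```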

Jazelle ARM. By: Adrian Cretzu & Sabine Loebner

Jazelle ARM. By: Adrian Cretzu & Sabine Loebner Jazelle ARM By: Adrian Cretzu & Sabine Loebner Table of Contents Java o Challenge o Acceleration Techniques ARM Overview o RISC o ISA o Background Jazelle o Background o Jazelle mode o bytecode execution

More information

Chapter 5. Introduction ARM Cortex series

Chapter 5. Introduction ARM Cortex series Chapter 5 Introduction ARM Cortex series 5.1 ARM Cortex series variants 5.2 ARM Cortex A series 5.3 ARM Cortex R series 5.4 ARM Cortex M series 5.5 Comparison of Cortex M series with 8/16 bit MCUs 51 5.1

More information

GrinderBench. software benchmark data book.

GrinderBench. software benchmark data book. GrinderBench software benchmark data book Table of Contents Calculating the Grindermark...2 Chess...3 Crypto...5 kxml...6 Parallel...7 PNG...9 1 Name: Calculating the Grindermark The Grindermark and the

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

GrinderBench for the Java Platform Micro Edition Java ME

GrinderBench for the Java Platform Micro Edition Java ME GrinderBench for the Java Platform Micro Edition Java ME WHITE PAPER May 2003 Updated April 2006 Protagoras, the leading Greek Sophist, was quoted as saying, "Man is the measure of all things," by which

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Amber Baruffa Vincent Varouh

Amber Baruffa Vincent Varouh Amber Baruffa Vincent Varouh Advanced RISC Machine 1979 Acorn Computers Created 1985 first RISC processor (ARM1) 25,000 transistors 32-bit instruction set 16 general purpose registers Load/Store Multiple

More information

Java Performance Analysis for Scientific Computing

Java Performance Analysis for Scientific Computing Java Performance Analysis for Scientific Computing Roldan Pozo Leader, Mathematical Software Group National Institute of Standards and Technology USA UKHEC: Java for High End Computing Nov. 20th, 2000

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 3 September 2015 Announcements HW#1 will be posted today, due next Thursday. I will send out

More information

Cortex-A9 MPCore Software Development

Cortex-A9 MPCore Software Development Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop

More information

Contents of this presentation: Some words about the ARM company

Contents of this presentation: Some words about the ARM company The architecture of the ARM cores Contents of this presentation: Some words about the ARM company The ARM's Core Families and their benefits Explanation of the ARM architecture Architecture details, features

More information

Chapter 1: Introduction to Computers and Java

Chapter 1: Introduction to Computers and Java Chapter 1: Introduction to Computers and Java Starting Out with Java: From Control Structures through Objects Fifth Edition by Tony Gaddis Chapter Topics Chapter 1 discusses the following main topics:

More information

8/23/2014. Chapter Topics. Introduction. Java History. Why Program? Java Applications and Applets. Chapter 1: Introduction to Computers and Java

8/23/2014. Chapter Topics. Introduction. Java History. Why Program? Java Applications and Applets. Chapter 1: Introduction to Computers and Java Chapter 1: Introduction to Computers and Java Starting Out with Java: From Control Structures through Objects Fifth Edition by Tony Gaddis Chapter Topics Chapter 1 discusses the following main topics:

More information

An Overview of the BLITZ System

An Overview of the BLITZ System An Overview of the BLITZ System Harry H. Porter III Department of Computer Science Portland State University Introduction The BLITZ System is a collection of software designed to support a university-level

More information

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

Virtual Machines and Dynamic Translation: Implementing ISAs in Software Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application

More information

Cortex-R5 Software Development

Cortex-R5 Software Development Cortex-R5 Software Development Course Description Cortex-R5 software development is a three days ARM official course. The course goes into great depth, and provides all necessary know-how to develop software

More information

Universität Dortmund. ARM Architecture

Universität Dortmund. ARM Architecture ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture

More information

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email: wylin@mail.cgu.edu.tw March 2014 Agenda

More information

Cortex-A15 MPCore Software Development

Cortex-A15 MPCore Software Development Cortex-A15 MPCore Software Development Course Description Cortex-A15 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 4 September 2014 Announcements HW#1 will be posted tomorrow (Friday), due next Thursday Working

More information

ARMv8-A Software Development

ARMv8-A Software Development ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2007 Lecture 14: Virtual Machines 563 L14.1 Fall 2009 Outline Types of Virtual Machine User-level (or Process VMs) System-level Techniques for implementing all

More information

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref.

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Introduction to the ARM Architecture or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Manual Glance into the past Initial ARM Processor developed by Acorn Computers,

More information

ARM Ltd. ! Founded in November 1990! Spun out of Acorn Computers

ARM Ltd. ! Founded in November 1990! Spun out of Acorn Computers ARM Architecture ARM Ltd! Founded in November 1990! Spun out of Acorn Computers! Designs the ARM range of RISC processor cores! Licenses ARM core designs to semiconductor partners who fabricate and sell

More information

ARM ARCHITECTURE. Contents at a glance:

ARM ARCHITECTURE. Contents at a glance: UNIT-III ARM ARCHITECTURE Contents at a glance: RISC Design Philosophy ARM Design Philosophy Registers Current Program Status Register(CPSR) Instruction Pipeline Interrupts and Vector Table Architecture

More information

Job Posting (Aug. 19) ECE 425. ARM7 Block Diagram. ARM Programming. Assembly Language Programming. ARM Architecture 9/7/2017. Microprocessor Systems

Job Posting (Aug. 19) ECE 425. ARM7 Block Diagram. ARM Programming. Assembly Language Programming. ARM Architecture 9/7/2017. Microprocessor Systems Job Posting (Aug. 19) ECE 425 Microprocessor Systems TECHNICAL SKILLS: Use software development tools for microcontrollers. Must have experience with verification test languages such as Vera, Specman,

More information

Chapter 15 ARM Architecture, Programming and Development Tools

Chapter 15 ARM Architecture, Programming and Development Tools Chapter 15 ARM Architecture, Programming and Development Tools Lesson 07 ARM Cortex CPU and Microcontrollers 2 Microcontroller CORTEX M3 Core 32-bit RALU, single cycle MUL, 2-12 divide, ETM interface,

More information

The ARM Cortex-A9 Processors

The ARM Cortex-A9 Processors The ARM Cortex-A9 Processors This whitepaper describes the details of the latest high performance processor design within the common ARM Cortex applications profile ARM Cortex-A9 MPCore processor: A multicore

More information

ECE 471 Embedded Systems Lecture 3

ECE 471 Embedded Systems Lecture 3 ECE 471 Embedded Systems Lecture 3 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 10 September 2018 Announcements New classroom: Stevens 365 HW#1 was posted, due Friday Reminder:

More information

CISC RISC. Compiler. Compiler. Processor. Processor

CISC RISC. Compiler. Compiler. Processor. Processor Q1. Explain briefly the RISC design philosophy. Answer: RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed. The RISC

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

Evolution of Virtual Machine Technologies for Portability and Application Capture. Bob Vandette Java Hotspot VM Engineering Sept 2004

Evolution of Virtual Machine Technologies for Portability and Application Capture. Bob Vandette Java Hotspot VM Engineering Sept 2004 Evolution of Virtual Machine Technologies for Portability and Application Capture Bob Vandette Java Hotspot VM Engineering Sept 2004 Topics Virtual Machine Evolution Timeline & Products Trends forcing

More information

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying

More information

KeyStone II. CorePac Overview

KeyStone II. CorePac Overview KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB

More information

Design of CPU Simulation Software for ARMv7 Instruction Set Architecture

Design of CPU Simulation Software for ARMv7 Instruction Set Architecture Design of CPU Simulation Software for ARMv7 Instruction Set Architecture Author: Dillon Tellier Advisor: Dr. Christopher Lupo Date: June 2014 1 INTRODUCTION Simulations have long been a part of the engineering

More information

Growth outside Cell Phone Applications

Growth outside Cell Phone Applications ARM Introduction Growth outside Cell Phone Applications ~1B units shipped into non-mobile applications Embedded segment now accounts for 13% of ARM shipments Automotive, microcontroller and smartcards

More information
