AN INTEGRATED DEVELOPMENT / RUN-TIME ENVIRONMENT

William Cave & Robert Wassmer - May 12, 2012

INTRODUCTION

The need for an integrated software development / run-time environment is motivated by requirements to significantly improve productivity and run-time speed on parallel processors. Trying to meet such objectives using current programming languages is a challenge that software developers no longer need to confront. This paper describes an approach to building software that follows from engineering principles. Experience with this system (known as VisiSoft) makes it clear that the design of the development environment must be integrated with that of the run-time environment. The approach described here achieves the following objectives.

- Substantially simplify software development for parallel processors.
- Create software that runs much faster on single as well as parallel processors.
- Control the growing complexity of a software system as it is expanded.
- Create modules that can be changed with minimal effects on the rest of the system.

SOFTWARE-HARDWARE ENVIRONMENT - FUNCTIONAL REQUIREMENTS

Figure 1 illustrates a decomposition of the software-hardware environment from application requirements to results. To address the objectives in the introduction, we must consider the individual requirements of each element in the chain that makes up the software-hardware environment. These are described below; refer to [1].

[Figure 1. Overall software-hardware environment. Blocks shown: Application Requirements; Design Architecture & Produce Code; Development Environment; Application Software; Run-Time Environment; Run-Time System (RTS); Virtual Parallel Operating System (VPOS); Hardware; Results.]

The authors are with Visual Software International.

Software Architecture Page 1

Application Requirements

This approach addresses large, complex parallel processor applications requiring a team effort. For the purposes of this paper, applications are divided into three types:

1. Embarrassingly Parallel - Applications that may be split into independent tasks that run concurrently with effectively no exchange of information.
2. Partially Independent - A single task that may be split into independent modules that must exchange information during the task, but where the processing time for information exchanges is small compared to the processing time inside the modules.
3. Effectively Sequential - A single task where most instructions follow from the prior ones, providing little chance for concurrent processing.

This paper addresses partially independent applications, i.e., those with a reasonable amount of inherent parallelism, that may be run effectively on a parallel processor to meet stringent run-time speed requirements. Examples are real-time planning and control systems used in large manufacturing plants, or simulations of many platforms (e.g., aircraft) exchanging information by radio that affects their future behavior. The applications addressed also require high reliability, rapid enhancement to support new features, and room for growth in complexity.

Architectural Design

In the case of applications to be run on a parallel processor, the architect must decompose the application into sets of relatively independent modules that represent the inherent parallelism of the system. By this we mean that processing within modules far exceeds communication between modules, which is the defining property of a partially independent system. These modules may then be placed on separate processors to run efficiently.
For complex systems requiring special skills to produce a design (e.g., systems requiring detailed engineering knowledge, special experience, or historic statistical knowledge), subject area experts must be able to understand both software architectures and code with minimal help from programmers. They must also be able to help design architectures that take full advantage of the inherent parallelism in the application system, since they may be the only ones who have that knowledge.

Development Environment

The development environment must support high productivity to minimize the time and cost of development, validation, and testing. This implies rapid translation of application requirements into software architectures that reflect the inherent parallelism in the system, particularly during post-development upgrades and support. It also implies that architectures can be easily inspected visually, using engineering drawings of connectivity (they are not flow charts), so that the architect maintains full control over the design, and that the language effectively supports this architectural breakout. In addition, the language must be easily read directly by subject area experts, so they can understand and validate complex algorithms representing the system as well as the architectural breakout.

The development environment must also produce the information needed by the run-time environment to ensure that full advantage is taken of the architectural characteristics of the application software. This is especially true when trying to achieve high run-time speeds on a parallel processor while minimizing the machine resources required to achieve that speed. This information includes designation of the independent modules that may be assigned to separate processors. It must also include the connectivity properties between modules, so that modules that communicate may be located on physically adjacent processors to minimize communication delays. When the use of these connectivity properties is nonstationary (modules may communicate with different modules as they operate), modules may be migrated to reduce communication delays during run time.

To support the above, the development environment must produce the application software object code in segments corresponding to the independent modules defined by the architecture. Similarly, it must produce the database describing the independent module architecture, along with management software to interface with the OS. Given this information, the OS can take maximum advantage of a (potentially simplified) hardware architecture.

Application Software

It is essential that the resulting application software be able to run fast on a single or parallel processor while using minimum machine resources. This implies that the machine code is organized such that hardware resource management, and in particular memory management, is simplified. It further implies that the chunks of code to be managed are well defined and as few in number as possible. This is another architectural design problem that depends heavily on the language used in the development environment to describe the databases.
Run-Time System (RTS)

The run-time system must translate architectural information from the development environment into calls to the OS during run time. It is the architectural design that minimizes the movement of instruction memory as well as data memory at run time. As indicated above, architectural information can be used to optimize processor allocation so as to minimize memory boundary crossing delays. Use of this information by the run-time system is critical to effective use of parallel processors.

Virtual Parallel Operating System (VPOS)

VPOS must be designed to take full advantage of the information provided by the run-time system. Specifically, it must be designed to allocate and assign hardware machine resources to make maximum effective use of this information. This includes minimizing overhead and memory sharing delays to achieve maximum run-time speed. This can only be achieved by allocating processors and memory to independent modules based upon the architectural information, including the possible migration of partially independent modules when the time constants of nonstationary inter-module communications permit.
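
As a rough illustration of how connectivity information could guide processor allocation, the Python sketch below searches for a placement of modules on a row of processors that minimizes traffic-weighted distance. The module names, the traffic weights, the linear processor layout, and the exhaustive search are all assumptions made for illustration; the paper does not describe how VPOS actually performs allocation.

```python
from itertools import permutations


def placement_cost(order, traffic):
    """Total communication cost of one placement.

    order:   tuple of module names; the index is the processor position
             on an assumed one-dimensional row of processors.
    traffic: dict mapping (module_a, module_b) to a relative traffic weight.
    """
    position = {module: index for index, module in enumerate(order)}
    return sum(weight * abs(position[a] - position[b])
               for (a, b), weight in traffic.items())


def best_placement(modules, traffic):
    """Exhaustive search over all orderings; only practical for a few modules."""
    return min(permutations(modules),
               key=lambda order: placement_cost(order, traffic))


# Hypothetical connectivity data, standing in for the architecture database.
modules = ["RADAR", "TRACKER", "PLANNER", "DISPLAY"]
traffic = {
    ("RADAR", "TRACKER"): 10,   # heavy traffic: should end up adjacent
    ("TRACKER", "PLANNER"): 5,
    ("PLANNER", "DISPLAY"): 1,
    ("RADAR", "DISPLAY"): 1,
}

best = best_placement(modules, traffic)
print(best, placement_cost(best, traffic))
```

With nonstationary traffic, the same cost function could in principle be re-evaluated as the weights change to decide whether migrating a module is worthwhile, although a practical system would need a cheaper search than trying every ordering.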

Hardware

In applications where run-time speeds are critical and parallel processors are required to support a single task, the hardware design must support the run-time system and corresponding OS requirements. In general, one typically trades memory for speed, duplicating instruction sets and stationary databases on separate processors to avoid swapping and paging. With the approach to architecture described here, hardware designers can focus on the essentials of minimizing overhead and memory sharing delays to achieve maximum speeds on a parallel processor, with little concern for the inherent architecture of an application software system. This is because full knowledge of the inherent parallelism of the system is embedded in the architectural design and automatically transferred to the run-time system.

With the integrated approach described here, the software development environment directly impacts the design of the run-time environment, including the OS. This, in turn, can be used to simplify the design of multi-core chips. Specifically, the combination of language facilities and architecture eliminates the need for the hardware facilities listed below, opening up chip real estate for better use, e.g., more memory.

- Cache coherency
- Thread synchronization
- Stack facilities
- Special instruction swapping facilities

Where parallel processors are dedicated to algorithm-intensive or memory-intensive applications that consume substantial processor time, they may be connected to server chips via shared memory, as illustrated in Figure 2. When properly housed within a shared memory server environment, parallel processor chips need not interface directly with disks, communication channels, graphics, work stations, etc. One-way memory transfers to and from the server replace the need for special DMA channels or device interfaces.
ACHIEVING SPEED INCREASES

Design of the language for VisiSoft was driven by speed and accuracy for discrete event simulations of physical systems, typically with a high degree of inherent parallelism. The principal requirement was to develop a language that made it easy to build complex software for parallel processors and that subject area experts could easily understand.

The first step in the design was to separate data from instructions at the coding level. Known as the Separation Principle, this makes it simpler to track which sets of instructions share which data sets. Minimizing the number of data elements to be tracked requires the ability to support large hierarchical data structures. Similarly, one wants large hierarchical rule sets within a single process (a group of assembler instructions).

Given that blocks of data are separated from blocks of instructions at the language level, one can easily build independent modules that map into the inherent parallelism of an application. As a by-product, this provides the ability to visualize the design using engineering drawings showing the connectivity of blocks of instructions with blocks of data (they are not flow charts). These blocks are represented by icons that are grouped into hierarchical modules that form an independent module at the top layer; see Figure 3.
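
The connectivity between instruction blocks and data blocks described above can be pictured as a simple data structure. The Python sketch below is not VisiSoft's language or representation; it is a hypothetical model, with made-up module, process, and resource names, in which each process lists the data blocks it touches and two modules are treated as independent when they share no data blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A block of instructions plus the names of the data blocks it uses."""
    name: str
    resources: frozenset


@dataclass(frozen=True)
class Module:
    """A named group of processes (hypothetical representation)."""
    name: str
    processes: tuple

    def resources(self):
        """Union of all data blocks used by the module's processes."""
        used = set()
        for process in self.processes:
            used |= process.resources
        return used


def shared_resources(a, b):
    """Data blocks referenced by both modules."""
    return a.resources() & b.resources()


def are_independent(a, b):
    """Assumed test: modules are independent if they share no data blocks."""
    return not shared_resources(a, b)


# Illustrative, made-up architecture.
receiver = Module("RECEIVER", (
    Process("DECODE_SIGNAL", frozenset({"SIGNAL_BUFFER", "MESSAGE_QUEUE"})),
    Process("ROUTE_MESSAGE", frozenset({"MESSAGE_QUEUE", "ROUTING_TABLE"})),
))
terrain = Module("TERRAIN", (
    Process("LOAD_TILE", frozenset({"TERRAIN_GRID"})),
))
logger = Module("LOGGER", (
    Process("WRITE_LOG", frozenset({"MESSAGE_QUEUE", "LOG_FILE"})),
))

print(are_independent(receiver, terrain))          # True
print(sorted(shared_resources(receiver, logger)))  # ['MESSAGE_QUEUE']
```

Because every process declares its data blocks explicitly, the check reduces to a set intersection, which is the kind of bookkeeping that separating data from instructions makes straightforward.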

[Figure 2. Server environment with parallel processors. Labels shown: General Parallel Processor Facility; Master Controller & Backups; Servers; Master OS; Server OS-1 through Server OS-6; Run-Time Master_1 through Run-Time Master_3; Parallel Processors.]

[Figure 3. Illustration of editing processes and resources on the drawing. Module shown: PROPAGATION_PREDICTION.]

Software Decomposition - Creating Independent Modules

The decomposition of a software system into independent modules implies drawing boundaries around the elements in a system that comprise a specified module. Any system can be decomposed into a set of modules. Furthermore, as modules get large, they can be decomposed hierarchically into submodules, and so on. Creating modules that can run concurrently on a parallel processor places explicit requirements on module design. Two modules can run concurrently only if they are independent. This implies that they share no data; otherwise they risk incoherent use of that data. The independence property also contributes substantially to the other requirements stated above. To determine the independence of modules, one needs a map of the data shared between the processes (groups of instructions). This leads to the concept of software architecture as shown in Figure 3, a totally new approach to software design.

Multipliers On The Speed Multipliers

Because large data structures can be easily defined and referenced, as illustrated in Figure 3, they may be moved using a single instruction fetch into another shared structure that defines the details of all of the elements. This provides significant increases in speed when working with algorithms requiring large state vectors or databases, as has been borne out by a substantial number of case histories and experiments.

Given the speed multipliers that VisiSoft has generated on single processors, one may expect to use fewer processors (fewer by as much as a factor of 10) simply by using the VisiSoft environment to build the software. Using the architectural features of VisiSoft, one may create larger independent modules that will run faster (using less overhead), provided that each processor has sufficient adjacent memory. This new architectural approach affords speed increases that require fewer processors to achieve the same speed multiplier.
Using fewer processors reduces the distance between processors, further increasing the speed multiplier. This is clearly a nonlinear function, where speed increases with fewer processors. Conversely, speed will decrease nonlinearly with more processors if they increase the overhead. This has been shown to be true in many parallel processor experiments.

Ensuring Data Coherency

When one independent module wants to communicate with another, it simply copies the shared data structure into a similar system data structure that ensures coherency of the data. The system data structure is part of the run-time system, which places interlocks on processes that share the data structure. This includes the timing of thread scheduling within each independent module.
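
The copy-through-a-system-structure idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the VisiSoft run-time system: the class name `SystemBuffer`, its `send` and `receive` methods, and the use of a mutex as the "interlock" are all hypothetical, and the example data is made up.

```python
import copy
import threading


class SystemBuffer:
    """Hypothetical stand-in for a run-time system data structure.

    Modules never share their own data directly. A sender copies its
    structure in, and a receiver copies it out, both under a lock.
    """

    def __init__(self):
        self._lock = threading.Lock()  # assumed form of the interlock
        self._data = None

    def send(self, structure):
        """Store a private snapshot of the sender's structure."""
        snapshot = copy.deepcopy(structure)
        with self._lock:
            self._data = snapshot

    def receive(self):
        """Return a private copy of the most recent snapshot."""
        with self._lock:
            return copy.deepcopy(self._data)


# Made-up example: one module reports a position to another.
channel = SystemBuffer()
aircraft_state = {"id": 7, "position": [10.0, 20.0]}

channel.send(aircraft_state)
aircraft_state["position"][0] = 99.0  # sender keeps working on its own copy

received = channel.receive()
print(received)  # {'id': 7, 'position': [10.0, 20.0]}
```

Since each side only ever works on its own copy, later changes by the sender cannot alter what the receiver already holds, which is the coherency property the paragraph above describes.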

Scheduling Of Threads

Using the integrated approach, a thread must be contained within a single independent module. Threads within an independent module cannot run concurrently since they are on a single processor. Threads in one independent module may schedule threads in another (or the same) independent module. All threads are controlled by a run-time system scheduler that ensures they do not get out of synchronization, including those on separate processors. Thus, the developer need not be concerned with thread synchronization or the corresponding race conditions. Special facilities allow timing to be out of sync by up to a ΔT when using a discrete event simulation clock, where ΔT is determined by comparing error distributions of simulated results with live test data or single processor simulations.

SUMMARY

This paper describes the application of concepts and principles derived from engineering to support the design of large complex software systems for parallel processors. These principles include the properties of independence derived from separating data from instructions (the Separation Principle). These properties lead to increased speed while reducing the effort required to develop software and to support enhancements that increase complexity, particularly when using parallel processors. This approach automates thread synchronization and eliminates the need for hardware (cache) coherency checks.

The independence properties of modular architectures and the understandability of complex algorithms have been confirmed on many large software projects. The language is easily read directly by subject area experts who must understand and validate complex algorithms representing the system as well as the architectural breakout. This approach simplifies the development of large software systems, particularly those whose complexity is high and constantly increasing, as well as those requiring the speed of a parallel processor.

REFERENCES

[1] Cave, W.C. et al., Time is of the Essence: Software Engineering for Parallel Processors, Visual Software International, Spring Lake, NJ, Dec
