Design Exploration and Implementation of Simplex Algorithm over Reconfigurable Computing Platforms


Sparsh Mittal, Department of Electrical and Computer Engg., ISU Ames, USA
Lizhi Wang, Department of Electrical and Computer Engg., ISU Ames, USA
Amit Pande, Department of Computer Science, UC Davis, California, USA, amit@cs.ucdavis.edu
Praveen Kumar, Department of Computer Science, GRIET Hyderabad, India, praveen.kverma@gmail.com

This work is partially supported by the National Science Foundation under Grant # to the Computing Research Association for the CIFellows Project.

Abstract: Linear programming (LP) is an important tool for many inter-disciplinary optimization problems. The Simplex method is the most widely used algorithm for solving LP problems and has had immense impact on developments in various fields. With the development of public domain and commercial software solvers, it has been automated and made available for use. A serious bottleneck for the Simplex algorithm is the lack of efficient implementations on application-specific processors and parallel hardware platforms such as Field Programmable Gate Arrays (FPGAs). Such implementations could result in a drastic speed-up in the execution of linear programming models. In this paper, we implement the Simplex algorithm on FPGA with both a low-level design language, namely VHDL, and high-level design and modeling packages for hardware generation. In addition, we have modeled the design in Simulink to serve as an intermediate design for migration from software to hardware. A comparison with existing works promises large speed-ups.

Keywords: Linear Programming; Simplex; Simulink; Xilinx System Generator; VHDL programming

I. INTRODUCTION

Linear programming refers to optimization techniques in which both the objective function and the constraints are linear. Linear programming started in 1947 with the discovery of the Simplex method by Dantzig [1]. It allows mechanical solution of optimization problems with a large number of constraints and variables. The Simplex method is a simple, elegant, yet powerful tool for solving linear programming problems. It requires only function evaluations, not derivatives, and can be solved efficiently in software.

Although different algorithms have been proposed for solving LP problems, Simplex remains a popular choice. With the availability of many Simplex-based solvers on many general-purpose processing platforms, it is used extensively in diverse engineering domains. However, the computation-intensive nature of the problem and of the algorithm calls for greater processing power and greater speed when solving real-time problems. Recently, great speed-ups have been achieved for several algorithms by efficient implementation in dedicated hardware such as Application-Specific Integrated Circuits (ASICs). However, long time-to-market has been a bottleneck for ASICs. The evolution of FPGAs, along with high-level design tools such as those from Altera and the Xilinx System Generator, has given high-level programmers a valuable and effective means of achieving better execution times on reconfigurable hardware. FPGAs cut the time lag between hardware design and shipping of the circuit from 2-3 years to a few weeks.

In this paper, we implement the Simplex algorithm on FPGA using both VHDL (a low-level programming language) and XSG (a high-level visual tool for hardware generation) for small-sized problems, and we also model and simulate the algorithm in Simulink. The key contributions of our work are as follows:
1) To the best of our knowledge, this is the first model of Simplex in Simulink for ease of visualization and simulation.
2) We are also the first to implement Simplex in System Generator for FPGA design.
3) We have also developed Simplex on FPGA using direct design in VHDL to achieve a fast implementation.
4) We discuss the parallelization obtained by an efficient tableau-based representation. The clock frequency achieved by this design is compared with that of general-purpose software.

The paper is organized as follows: Section II discusses the basics of the simplex method. Section III reviews the existing literature, while Section IV discusses some design languages for hardware implementation on FPGA. Two such implementations are then discussed: the Simulink-based design in Section V and the VHDL-based coding in Section VI. Section VII gives conclusions and future work in this direction.

II. BACKGROUND

A linear program is represented in the standard form in matrix notation:

$$\max\; C^{T} x \quad \text{s.t.} \quad A x + w = b, \quad x, w \ge 0$$

Here $C \in \mathbb{R}^{n}$, $b \in \mathbb{R}^{m}$ and $A \in \mathbb{R}^{m \times n}$ are parameters, and $x \in \mathbb{R}^{n}$ and $w \in \mathbb{R}^{m}$ are decision variables. For the special case of three variables (n = 3) and three constraints (m = 3), it can be written explicitly as:

$$\max\; C_1 x_1 + C_2 x_2 + C_3 x_3$$
$$\text{s.t.} \quad a_{i1} x_1 + a_{i2} x_2 + a_{i3} x_3 + w_i = b_i, \quad i = 1, 2, 3$$
$$x_1, x_2, x_3, w_1, w_2, w_3 \ge 0$$

where $a_{ij}$ denotes the entries of $A$.

In what follows, we briefly explain the working of Simplex. The basic idea of simplex is based on the observation that the optimal solution to an LP, if it exists, occurs at an extreme point of the feasible region (called a "basic solution"). Based on this observation, we can find the optimal solution by (i) starting from a feasible corner point, and (ii) moving to a better corner point until the current one is already optimal. If we cannot find a starting point, then the LP is infeasible; if we can increase the objective value to infinity, then the LP is unbounded.

We write the problem in the following matrix form:

$$\max\, \{ \bar{C}^{T} \bar{x} : \bar{A} \bar{x} = b,\ \bar{x} \ge 0 \}$$

Here $\bar{C} = [\, C;\ 0_{m \times 1} \,]$, $\bar{A} = [\, A \;\; I_{m \times m} \,]$ and $\bar{x} = [\, x;\ w \,]$, so that $\bar{x}$ has $n + m$ components.

We need $n + m$ linearly independent active constraints to uniquely determine a basic solution. The $m$ equality constraints supply $m$ of them, so $n$ of the constraints in $\bar{x} \ge 0$ must hold at equality. Define N as the indices of the constraints in $\bar{x} \ge 0$ that are set to hold at equality, and B as the indices of the other constraints in $\bar{x} \ge 0$. Such a pair (B, N) is an exclusive and exhaustive partition of the set $\{1, 2, \ldots, m + n\}$. The above conditions are only necessary for a basic solution. To find the sufficient condition for a basic solution, we rewrite the problem using the definitions of N and B:

$$\max\; \bar{C}_B^{T} \bar{x}_B + \bar{C}_N^{T} \bar{x}_N \quad \text{s.t.} \quad \bar{A}_B \bar{x}_B + \bar{A}_N \bar{x}_N = b, \quad \bar{x}_B \ge 0,\ \bar{x}_N \ge 0$$

where $\bar{A}_B$ is the collection of columns in $\bar{A}$ whose indices are in the set B, and $\bar{x}_N$ and $\bar{C}_N$ are the collections of elements in $\bar{x}$ and $\bar{C}$, respectively, whose indices are in the set N. Then the necessary and sufficient conditions for a basic solution are: the $m$ elements in the set B should be chosen such that $\bar{A}_B$ is invertible, and the $n$ elements in the set N are then determined by $N = \{1, 2, \ldots, m + n\} \setminus B$. Such a partition is called a basic partition.
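The definition of a basic partition can be illustrated with a short Python sketch. It is not taken from the paper: the function names `solve_square` and `basic_solution`, the zero-based indexing of variables, and the sample data are all assumptions made for illustration. Given a candidate set B, the sketch builds the augmented matrix [A I], checks that the selected columns are invertible, sets the variables indexed by N to zero, and solves for the remaining ones using exact fractions.

```python
from fractions import Fraction


def solve_square(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination.

    Returns None when M is singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot_row is None:
            return None
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def basic_solution(A, b, C, basis):
    """Return (x_bar, objective, feasible) for the index set `basis` (the set B).

    Indices are zero-based here: 0..n-1 are the x variables and
    n..n+m-1 are the slack variables w (an illustrative convention)."""
    m, n = len(A), len(A[0])
    basis = sorted(set(basis))
    if len(basis) != m:
        raise ValueError("B must contain exactly m distinct indices")
    # Augmented data: A_bar = [A I], C_bar = [C 0].
    A_bar = [list(A[i]) + [int(i == j) for j in range(m)] for i in range(m)]
    C_bar = list(C) + [0] * m
    # Columns of A_bar selected by B; they must form an invertible matrix.
    A_B = [[A_bar[i][j] for j in basis] for i in range(m)]
    x_B = solve_square(A_B, b)
    if x_B is None:
        raise ValueError("A_B is singular, so (B, N) is not a basic partition")
    # Variables indexed by N stay at zero; those indexed by B take the solved values.
    x_bar = [Fraction(0)] * (n + m)
    for j, value in zip(basis, x_B):
        x_bar[j] = value
    objective = sum(c * x for c, x in zip(C_bar, x_bar))
    feasible = all(x >= 0 for x in x_bar)
    return x_bar, objective, feasible


# Illustrative 3x3 data (not from the paper).
A = [[2, 3, 1], [4, 1, 2], [3, 4, 2]]
b = [5, 11, 8]
C = [5, 4, 3]
x_bar, objective, feasible = basic_solution(A, b, C, basis=[3, 4, 5])
```

With the all-slack set B = {3, 4, 5}, the selected columns form the identity matrix, so the basic solution is simply w = b with objective value 0. This is the usual feasible starting corner point when b is non-negative.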
In any iteration with a feasible basic partition (B, N) that is not optimal, the partition is updated by selecting an entering variable and a leaving variable. The rule for selecting the entering and leaving variables is called a pivoting rule. We have used Bland's pivoting rule. After the entering and leaving variables are chosen, we get an updated partition. This process is repeated until an optimal partition and solution are found. In the worst case, Simplex may require exponentially many iterations to examine each of the basic solutions, and other methods (such as the ellipsoid method) exist that are theoretically guaranteed to be polynomial. However, the practical performance of the simplex algorithm is in general much better than that of the ellipsoid method, which is one of the reasons the simplex algorithm is widely used.

III. LITERATURE REVIEW

In the literature, several techniques have been proposed for the solution of linear programs. For example, feasible direction methods are proposed by Brown and Koopmans [2], as well as by Murty and Faithi [3], among others. Megiddo [4] reduces the number of constraints through a multidimensional search technique. The ellipsoid algorithm [5] first established that linear programming problems can be solved in polynomial time, but it performs poorly in practice. Karmarkar [6] developed a polynomial-time projection approach that is used in some applications. However, the simplex algorithm remains the underlying algorithm used by most commercial linear programming packages. Even though the simplex algorithm is not polynomial, in practice it is found to be efficient enough to be used, and Borgwardt [7] proved that its expected number of iterations is polynomial when it is applied to practical problems. The main computational disadvantage of the simplex algorithm is that the total number of iterations cannot be predicted. As the dimension n increases, the worst-case computational time grows exponentially.
To improve efficiency, parallel implementations of linear programming algorithms have been studied extensively in recent years [8,9,10]. Linear programming is applied in a large variety of scientific and industrial computing applications that involve optimization problems. A few application areas include real-time motion analysis [11] and MIMO detection and decoding [12]. In these applications, linear programming is preferred over nonlinear programming because of its efficiency and other problem-specific advantages. Many variants of Simplex have been developed that are more efficient than naive simplex, such as Cosine Simplex; they offer improvements such as a reduction in the number of simplex iterations and in the number of computations per iteration.

We also discuss the work done on implementing Simplex in hardware. Majumdar [13] implements integer linear programming on FPGA and shows a speed-up over a software implementation. Their design is composed of both a software unit and a hardware unit. The software unit accepts the input file, scans it for the problem size, the objective and the other components of the input, and sends them to the hardware unit, where they are stored in the Zero Bus Turnaround (ZBT) memory of the Virtex-II and passed to the processing module. The processing module processes the data and sends the solution to the output module, which stores it in the ZBT memory. They have used a dictionary-based representation of the problem, whereas we have used a tableau-based representation for efficient computation. Due to large hardware requirements and the lack of pipelining, their implementation is slow compared to ours, as shown by its very poor clock frequency.

Klindworth and Schutz [14] present a hardware realization of Simplex. They discuss the solution of problems in which many operands (coefficients in A and b) are zeros. The hardware is based on a parallel architecture and employs standard FPUs, RAMs and custom VLSI chips. They use a VLSI chip model that is somewhat like a multicore chip. However, implementation on FPGA has its own advantages. They use eight processing units to obtain parallelism, whereas by efficiently exploiting the parallelism of the FPGA we promise much higher parallelism (such as 28, or 100 or more, as shown in Section VI).
The short time-to-market of FPGAs compared with VLSI models is the reason for the popularity of FPGAs in the current market. Besides, none of the current systems uses a modeling or simulation language for visualization and demonstration of this algorithm to enhance learning. Moreover, even though commercial and public domain software packages for Simplex exist and are widely used, the immense potential of hardware has hardly been used to enhance the performance of this computation-intensive algorithm. In this paper, we address these limitations.

IV. DESIGN LANGUAGES FOR IMPLEMENTATION

The salient features of FPGAs that make them superior in speed to conventional general-purpose hardware such as Pentiums are their greater I/O bandwidth to local memory, pipelining, parallelism and the availability of optimizing compilers. Complex tasks that involve multiple image operators run much faster on FPGAs than on Pentiums; in fact, Bruce (2003) reports an 800-fold speed-up on FPGA using SA-C. There are several reasons for such large speed-ups of FPGAs over PCs. In comparison to an FPGA, hardware such as a Pentium runs at memory speed, not at cache speed. So, even though it runs at a much higher clock frequency and has cache memory, it responds much more slowly than a comparable FPGA. The operating frequency of hardware such as a Pentium can be increased up to a point to raise performance or the data rate required to process the image data, but increasing the frequency beyond certain limits causes system-level and board-level issues that become a bottleneck in the design.

Choosing an appropriate tool for FPGA design is of crucial importance, as it affects the cost, development time and various other aspects of the design. Simulink is a platform for multi-domain simulation and Model-Based Design for dynamic systems. It provides an interactive graphical environment and a set of block libraries, and can be extended for different specialized applications.
Using Simulink, one can quickly build models from libraries of pre-built blocks. For high-level design we have chosen Xilinx System Generator (XSG) for DSP, a design tool from Xilinx that enables the use of the MathWorks model-based design environment Simulink for FPGA design. It offers block libraries that plug into Simulink and contain bit-true and cycle-accurate models of the FPGA's particular math, logic and DSP functions. It is a system-level modeling tool in which designs are captured in the DSP-friendly Simulink modeling environment using a Xilinx-specific blockset. All of the downstream FPGA implementation steps, including synthesis and place and route, are performed automatically to generate an FPGA programming file. Over 90 DSP building blocks are provided in the Xilinx DSP blockset for Simulink.

V. SYSTEM DESIGN

Figure 1 shows our model of the Simplex solver in Simulink. Simplex iteratively searches for the optimal solution, checking the vertices of the feasible region, until one is found. The values of the coefficients at the end of one step act as the starting point for the next step of pivot computation. Thus, in a visual data-flow environment, this is represented by a feedback network or by a memory element that remembers the previous values of the coefficients in the Simplex tableau. We have implemented models using both approaches. For the sake of brevity, we omit the figure employing a feedback network to update the values. The model in Figure 1 uses persistent variables for this purpose. They have the special property that they need to be initialized only once, during the first function call, and they retain their values across subsequent function calls. The value of the objective function can be read from the display for both the current step and the optimal (final) step. The simulation stops automatically on finding the optimal value. Note that, since this design does not use Xilinx blocksets, it cannot be directly implemented in hardware.

Figure 1: Simplex Model in Simulink

Figure 2: Simplex Model in Xilinx System Generator
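The step-by-step behaviour described for the Simulink model, in which the tableau produced by one pivot is remembered and used as the starting point of the next, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Simulink or VHDL design: the class name `TableauSimplex`, its methods, and the sample data are hypothetical, the object attributes stand in for the persistent variables, and the sketch assumes b is non-negative so that the slack variables give a feasible starting point. Entering and leaving variables are chosen with Bland's rule, the pivoting rule named in Section II.

```python
from fractions import Fraction


class TableauSimplex:
    """Keeps the simplex tableau between calls to step(), playing the
    role of the persistent variables in the description above."""

    def __init__(self, A, b, C):
        m, n = len(A), len(A[0])
        if any(v < 0 for v in b):
            raise ValueError("this sketch assumes b >= 0 (slack basis feasible)")
        self.m, self.n = m, n
        # Constraint rows of the tableau: [A | I | b].
        self.rows = [
            [Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])]
            for i in range(m)
        ]
        # Objective row: [-C | 0 | 0]; its last entry is the current objective value.
        self.obj = [Fraction(-c) for c in C] + [Fraction(0)] * (m + 1)
        self.basis = list(range(n, n + m))  # start from the all-slack basis
        self.status = "running"
        self.pivots = 0

    def step(self):
        """Perform at most one pivot and return the resulting status."""
        if self.status != "running":
            return self.status
        # Bland's rule, entering variable: lowest index with a negative entry.
        enter = next((j for j in range(self.n + self.m) if self.obj[j] < 0), None)
        if enter is None:
            self.status = "optimal"
            return self.status
        # Ratio test; ties are broken by the lowest basic-variable index (Bland).
        candidates = [
            (row[-1] / row[enter], self.basis[i], i)
            for i, row in enumerate(self.rows)
            if row[enter] > 0
        ]
        if not candidates:
            self.status = "unbounded"
            return self.status
        _, _, leave_row = min(candidates)
        pivot = self.rows[leave_row][enter]
        pivot_row = [v / pivot for v in self.rows[leave_row]]
        self.rows[leave_row] = pivot_row
        # Each remaining row is updated independently of the others, which is
        # the kind of row-wise work that hardware can carry out in parallel.
        for i in range(self.m):
            if i != leave_row:
                factor = self.rows[i][enter]
                self.rows[i] = [v - factor * p for v, p in zip(self.rows[i], pivot_row)]
        factor = self.obj[enter]
        self.obj = [v - factor * p for v, p in zip(self.obj, pivot_row)]
        self.basis[leave_row] = enter
        self.pivots += 1
        return self.status

    def solve(self):
        """Call step() until the status is no longer 'running'."""
        while self.step() == "running":
            pass
        return self.status

    @property
    def objective(self):
        return self.obj[-1]

    def solution(self):
        """Values of the n original variables at the current basic solution."""
        x_bar = [Fraction(0)] * (self.n + self.m)
        for i, j in enumerate(self.basis):
            x_bar[j] = self.rows[i][-1]
        return x_bar[: self.n]


# Illustrative 3x3 data (not from the paper).
solver = TableauSimplex([[2, 3, 1], [4, 1, 2], [3, 4, 2]], [5, 11, 8], [5, 4, 3])
solver.solve()
```

For the sample data, the solver stops after two pivots with objective value 13 at x = (2, 0, 1). Calling `step()` repeatedly instead of `solve()` exposes the intermediate objective value after each pivot, which mirrors the display of the current-step and final values in the Simulink model.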

Figure 2 shows the model of Simplex in System Generator. The input and output interface blocks interface between the signals produced by Simulink sources and those used by Xilinx blocks, and vice versa. Except for sources and sinks (for display of results), this design is composed entirely of Xilinx blocks and hence can be used to generate the hardware at the click of a button.

VI. VHDL IMPLEMENTATION

VHDL, or Very High Speed Integrated Circuit Hardware Description Language, has been the choice of commercial and military consumers for digital hardware design (Kief, 2008) in the past and continues to dominate the commercial market, due to optimized implementation on hardware and the availability of a large number of free IP cores. After studying the solution of the Simplex method using Simulink, we demonstrated its hardware feasibility and visual interface through Xilinx System Generator. However, many blocks of custom MATLAB code (.m files) were needed for the design, and the hardware generated for these blocks was not optimized. In this section we present the details of the design implemented in VHDL and later synthesized in Xilinx ISE. Xilinx ISE is a design tool provided by Xilinx to help build bitstreams that are ported directly onto FPGA boards. The Xilinx ISE tool performs several optimizations before synthesizing the design. We targeted the Xilinx Virtex-5 XCVLX330-LX board. The hardware usage of the FPGA is presented in Table 1. The hardware was pipelined to shorten the critical path of the design and increase the clock frequency. The multipliers were implemented in hardware with the help of XtremeDSP slices, while the divider IP core was generated using the Xilinx Core Generator software. The hardware implementation details are presented in Table 2. A clock frequency of 644 MHz was achieved with a latency of 3 cycles. This implies that we can move from one optimal solution to another in roughly 4.7 ns (3 cycles at 644 MHz).

Table 1. Hardware usage statistics of the FPGA

Slice Logic Utilization:
  Number of Slice Registers           : 1029 /
  Number of Slice LUTs                : 1018 /
    Number used as Logic              : 1012 /
    Number used as Memory             : 6 /
Slice Logic Distribution:
  Number of LUT Flip Flop pairs used  : 1591

Table 2. Hardware implementation details on the Xilinx FPGA

# Multipliers                 : 27
  16x16-bit multiplier        : 27
# Adders/Subtractors          :
  bit adder                   :
  bit subtractor              : 28
# Registers                   :
  bit register                : 91
  3-bit register              : 1
# Latches                     : 20
  1-bit latch                 : 20
# Comparators                 :
  bit comparator greater      : 3
  16-bit comparator less      : 26
# Multiplexers                : 3
  16-bit 8-to-1 multiplexer   : 3
# Xors                        : 10
  1-bit xor2                  : 10
# Dividers                    : 8

It can be observed from Table 1 that the implementation (a standard LP with 3 variables and 3 constraints) leaves most of the FPGA hardware unutilized. Therefore, we can increase the number of variables to a very large value and still get a reasonably good implementation. As the number of variables and constraints increases, there is a quadratic increase in the usage of hardware resources (slice registers). However, since most of the multiplication operations are done in parallel, in a row/column-wise manner, the clock frequency decreases only linearly. As we increase the number of variables, the clock frequency of the FPGA-based implementation will decrease owing to the longer signal propagation time through interconnects; however, we expect that the performance will still be better than that of software-based implementations, where an increase in the number of variables cannot be accompanied by increased resource utilization.

VII. CONCLUSIONS AND FUTURE WORK

Advances in FPGA technology, along with the development of elaborate and efficient tools for modeling, simulation and synthesis, have made FPGAs a highly useful platform. With a graphical environment based on Simulink and a predefined blockset of Xilinx DSP cores, System Generator meets the needs of both system architects, who need to integrate the components of a complete design, and hardware designers, who need to optimize implementations. We have implemented Simplex in Simulink and on FPGA using Xilinx System Generator for a problem size of three variables and three constraints. We presented the synthesis results for implementation on the Virtex-5 XCVLX330 FPGA board. A high clock frequency of 644 MHz was obtained. Future work will focus on the development of a visually enhanced implementation of Simplex in Simulink and its generalization to an arbitrarily large number of variables, using the powerful graphical functions of Simulink. We also plan to conduct a survey among undergraduate and graduate students learning the Simplex algorithm to assess how a graphical implementation of Simplex assists and augments their learning process.

REFERENCES

[1] Dantzig, G.B. Maximization of a linear function of variables subject to linear inequalities. In T.C. Koopmans, editor, Activity Analysis of Production and Allocation, number 13 in Cowles Commission Monographs, pages 339-347, John Wiley & Sons, Inc.
[2] Brown, G.W. and Koopmans, T.C. Computational suggestions for maximizing a linear function subject to linear inequalities. In T.C. Koopmans, editor, Activity Analysis of Production and Allocation, John Wiley, New York (1951).
[3] Murty, K.G. and Faithi, Y. A feasible direction method for linear programming. Operations Research Letters 3, (1984).
[4] Megiddo, N. Linear programming in linear time when the dimension is fixed. Journal of the Association of Computing Machinery 31, (1984).
[5] Bland, R.G., Goldfarb, D. and Todd, M.J. The ellipsoid method: a survey. Operations Research 29, (1981).
[6] Karmarkar, N. A new polynomial-time algorithm for linear programming. Combinatorica 4, (1984).
[7] Borgwardt, K.H. Some distribution independent results about the asymptotic order of the average number of pivot steps in the simplex method. Mathematics of Operations Research, vol. 7, no. 3, pp.
[8] Maros, I. and Mitra, G. Investigating the sparse simplex method on a distributed memory multiprocessor. Parallel Computing, vol. 26, pp.
[9] Klabjan, D., Johnson, L.E. and Nemhauser, L.G. A parallel primal-dual simplex algorithm. Operations Research Letters, vol. 27, no. 2, pp.
[10] Eckstein, J., Bodurglu, I., Polymenakos, L. and Goldfarb, D. Data-Parallel Implementations of Dense Simplex Methods on the Connection Machine CM-2. ORSA Journal on Computing, vol. 7, no. 4, pp.
[11] Ben-Ezra, M., Peleg, S. and Werman, M. Real-time motion analysis with linear programming. Computer Vision and Image Understanding, vol. 78, no. 1, pp. 32-52, April
[12] Cui, T., Ho, T. and Tellambura, C. Linear Programming Detection and Decoding for MIMO Systems. IEEE International Symposium on Information Theory, pp. , July
[13] Majumdar, A. FPGA Implementation of Integer Linear Programming Accelerator. International Conference on Systemics, Cybernetics and Informatics (ICSCI), Jan
[14] Klindworth, A. and Schutz, B. A VLSI-Chip-Set for a Hardware-Accelerator for the Simplex-Method. Proc. 5th Ann. IEEE International ASIC Conference, Rochester, NY, Sept. 1992, pp.
