
Sparsh Mittal

Department of Electrical and Computer Engg., ISU Ames, USA sparsh@iastate.edu

Amit Pande

Department of Computer Science, UC Davis, California, USA amit@cs.ucdavis.edu

Lizhi Wang

Department of Electrical and Computer Engg., ISU Ames, USA lzwang@iastate.edu

Abstract- Linear programming (LP) is an important tool for many inter-disciplinary optimization problems. The Simplex method is the most widely used algorithm for solving LP problems and has had an immense impact on developments in various fields. With the development of public-domain and commercial software solvers, it has been automated and made widely available. A remaining challenge is the efficient implementation of the Simplex algorithm on application-specific processors and parallel hardware platforms such as Field Programmable Gate Arrays (FPGAs). Such an implementation could result in a drastic speed-up in the execution of linear programming models. In this paper, we implement the Simplex algorithm on an FPGA with both a low-level design language, namely VHDL, and high-level design and modeling packages for hardware generation. In addition, we have modeled the design in Simulink to serve as an intermediate design for migration from software to hardware. A comparison with existing works promises large speed-ups.

Keywords- Linear Programming; Simplex; Simulink; Xilinx System Generator; VHDL programming;

Praveen Kumar

Department of Computer Science, GRIET Hyderabad, India praveen.kverma@gmail.com

I. INTRODUCTION

Linear programming refers to optimization techniques in which both the objective function and the constraints are linear. Linear programming started in 1947 with the discovery of the Simplex method by Dantzig [1]. The method allows mechanical solution of optimization problems with a large number of constraints and variables. The Simplex method is a simple, elegant, yet powerful tool for solving linear programming problems. It requires only function evaluations, not derivatives, and can be implemented efficiently in software. Although different algorithms have been proposed for solving LP problems, Simplex remains a popular choice. With the availability of many Simplex-based solvers on general-purpose processing platforms, it is used extensively in diverse engineering domains. However, the computation-intensive nature of the problem and the algorithm calls for greater processing power and greater speed for efficient computation in solving real-time problems.

Recently, great speed-ups have been achieved for several algorithms by efficient implementation in dedicated hardware such as Application-Specific Integrated Circuits (ASICs). However, a long time-to-market has been a bottleneck for ASICs. The evolution of Field Programmable Gate Arrays (FPGAs), along with high-level design tools such as those from Altera and the Xilinx System Generator (XSG), has given high-level programmers a valuable and effective means of achieving better execution times on reconfigurable hardware. FPGAs shorten the lag between hardware design and shipping of the circuit from 2-3 years to a few weeks.

In this paper, we implement the Simplex algorithm on an FPGA using both VHDL (a low-level design language) and XSG (a high-level visual tool for hardware generation) for small-sized problems, and we also model and simulate the algorithm in Simulink. The key contributions of our work are as follows:

1) To the best of our knowledge, this is the first model of Simplex in Simulink for ease of visualization and simulation.
2) We are also the first to implement Simplex in System Generator for FPGA design.
3) We have also developed Simplex on FPGA using direct design in VHDL to achieve a fast implementation.
4) We discuss the parallelization obtained by an efficient tableau-based representation. The clock frequency achieved by this design is compared with that in general-purpose software.

The paper is organized as follows: Section II covers the basics of the simplex method. Section III reviews the existing literature, while Section IV discusses some design languages for hardware implementation on FPGA. Two implementations are then discussed: the Simulink-based design in Section V and the VHDL-based design in Section VI. Section VII gives conclusions and future work in this direction.

This work is partially supported by the National Science Foundation under Grant #1019343 to the Computing Research Association for the CIFellows Project.

II. BACKGROUND

A linear program is represented in the standard form in matrix notation:

\[
\max \; C^{T} x \quad \text{s.t.} \quad A x + w = b, \quad x, w \ge 0
\]

Here $C \in \mathbb{R}^{n}$, $b \in \mathbb{R}^{m}$, and $A \in \mathbb{R}^{m \times n}$ are parameters, and $x \in \mathbb{R}^{n}$, $w \in \mathbb{R}^{m}$ are decision variables. For the special case of three variables (n = 3) and three constraints (m = 3), it can be explicitly written as:

\[
\begin{aligned}
\max \quad & C_1 x_1 + C_2 x_2 + C_3 x_3 \\
\text{s.t.} \quad & A_{11} x_1 + A_{12} x_2 + A_{13} x_3 + w_1 = b_1 \\
& A_{21} x_1 + A_{22} x_2 + A_{23} x_3 + w_2 = b_2 \\
& A_{31} x_1 + A_{32} x_2 + A_{33} x_3 + w_3 = b_3 \\
& x_1, x_2, x_3, w_1, w_2, w_3 \ge 0
\end{aligned}
\]

In what follows, we briefly explain the working of the Simplex. We write the problem in the following matrix form:

\[
\max \; \{ \bar{C}^{T} \bar{x} : \bar{A} \bar{x} = b, \; \bar{x} \ge 0 \}
\]

Here,

\[
\bar{A} = [\, A \;\; I_{m \times m} \,], \qquad
\bar{C} = [\, C^{T} \;\; 0_{m \times 1}^{T} \,]^{T}, \qquad
\bar{x} = [\, x^{T} \;\; w^{T} \,]^{T} \in \mathbb{R}^{n+m}
\]

The basic idea of simplex is based on the observation that the optimal solution to an LP, if it exists, occurs at an extreme point of the feasible region (called a "basic solution"). Based on this observation, we can find the optimal solution by (i) starting from a feasible corner point, and (ii) moving to a better corner point until the current one is already optimal. If we cannot find a starting point, then the LP is infeasible. If we can optimize the objective value to infinity, then the LP is unbounded.

We need $n + m$ linearly independent active constraints to uniquely determine a basic solution. Since the $m$ equality constraints are always active, $n$ of the constraints in $\bar{x} \ge 0$ must hold at equality. Define $N$ as the indices of constraints in $\bar{x} \ge 0$ that are set to hold at equality, and $B$ as the indices of the other constraints in $\bar{x} \ge 0$. Such a pair $(B, N)$ is an exclusive and exhaustive partition of the set $\{1, 2, \ldots, m+n\}$. The above conditions are only necessary for a basic solution. To find the sufficient condition for a basic solution, we rewrite the problem using the definition of $N$ and $B$:

\[
\begin{aligned}
\max \quad & C_B^{T} x_B + C_N^{T} x_N \\
\text{s.t.} \quad & A_B x_B + A_N x_N = b \\
& x_B \ge 0, \; x_N \ge 0
\end{aligned}
\]

where $A_B$ is the collection of columns in $\bar{A}$ whose indices are in the set $B$, $A_N$ is the collection of columns whose indices are in the set $N$, and $x_B$, $C_B$, $x_N$, and $C_N$ are the collections of elements in $\bar{x}$ and $\bar{C}$ whose indices are in $B$ and $N$, respectively. Then, the necessary and sufficient conditions for a basic solution are: the $m$ elements in the set $B$ should be chosen such that $A_B$ is invertible, and the $n$ elements in the set $N$ are then determined by $N = \{1, 2, \ldots, m+n\} \setminus B$. Such a partition is called a basic partition.

In any iteration with a feasible basic partition $(B, N)$ which is not optimal, the partition is updated by selecting an entering variable and a leaving variable. The rule for selecting the entering and leaving variables is called a pivoting rule. We have used Bland's pivoting rule. After the entering and leaving variables are chosen, we get an updated partition. This process is repeated till an optimal partition and solution is found.

III. LITERATURE REVIEW

In the literature, several techniques have been proposed for the solution of linear programs. For example, feasible direction methods are proposed by Brown and Koopmans [2], as well as by Murty and Faithi [3]. To improve the efficiency, Megiddo [4] reduces the number of constraints through a multidimensional search technique. The ellipsoid algorithm [5] first established that linear programming problems can be solved in polynomial time, but it performs poorly in practice. Karmarkar [6] developed a polynomial projection approach that is used in some applications. However, the simplex algorithm remains the underlying algorithm utilized by most commercial linear programming packages.

The main computational disadvantage of the simplex algorithm is that the total number of iterations cannot be predicted. In the worst case, Simplex may require exponentially many iterations to examine each of the basic solutions. As the dimension n increases, the computational time rises exponentially. However, in practice it is found to be efficient enough to be used, and Borgwardt [7] proved that its expected number of iterations is polynomial when it is applied to practical problems. Even though the simplex algorithm is not polynomial, and other methods (such as ellipsoid) exist which are theoretically guaranteed to be polynomial, the practical performance of the simplex algorithm is in general much better than that of the ellipsoid method, and that is one of the reasons the simplex algorithm is widely used. Moreover, parallel implementations of linear programming algorithms have been studied extensively in recent years ([8, 9, 10]).

Linear programming is applied to a large variety of scientific and industrial computing applications employing optimization problems. A few application areas include real-time motion analysis ([11]) and MIMO detection and decoding ([12]), among others.

In these applications, linear programming is preferred over nonlinear programming because of its efficiency and other problem-specific advantages. There are many variants of Simplex, such as Cosine Simplex, that have been developed and are more efficient than the naïve simplex; they offer improvements such as a reduction in the number of simplex iterations and in the number of computations in each iteration. However, implementation on FPGA has its own advantages.

We also discuss the work done on implementing Simplex in hardware. Majumdar [13] implements integer linear programming on an FPGA and shows a speed-up over a software implementation. Their design is composed of both a software and a hardware unit. The software unit accepts the input file, scans it for the problem size, objective, and the different components of the input, and sends them to the hardware unit, where they are stored in the Zero Bus Turnaround (ZBT) memory of a Virtex-II and passed to the processing module. The processing module processes the data and sends the solution to the output module, where it is stored in the ZBT. They have used a dictionary-based representation of the problem; however, we have used a tableau-based representation for efficient computations. Moreover, by efficiently exploiting the parallelism of the FPGA, we promise very high parallelism (as shown in Section VI, such as 28 or 100 or more).

Klindworth and Schutz [14] present a hardware realization of Simplex. They use a VLSI chip model which is somewhat like a multicore chip. They use eight processing units to get parallelism. The hardware is based on a parallel architecture and employs standard FPUs, RAMs, and custom VLSI chips. They discuss the solution of problems where many operands (coefficients in A and b) are zeros. Due to large hardware requirements and lack of pipelining, their implementation is slow compared to ours, as shown by its very poor clock frequency.

Note that, even though commercial and public-domain software packages for Simplex exist and are widely used, the immense potential of hardware has hardly been utilized for enhancing the performance of this computation-intensive algorithm. Besides, none of the current systems uses any modeling or simulation language for visualization and demonstration of this algorithm to enhance learning. In this paper, we address these limitations.

IV. DESIGN LANGUAGES FOR IMPLEMENTATION

The salient features of FPGAs that make them superior in speed to conventional general-purpose hardware such as Pentiums are their greater I/O bandwidth to local memory, pipelining, parallelism, and the availability of optimizing compilers. Complex tasks, which involve multiple image operators, run much faster on FPGAs than on Pentiums; in fact, Bruce (2003) reports an 800-times speed-up by an FPGA using SA-C. There are several reasons for the large speed-up that FPGAs have over PCs. The frequency of operation of hardware such as a Pentium can be increased up to a certain extent to increase the performance or the required data rate to process the image data, but increasing the frequency above certain limits causes system-level and board-level issues that become a bottleneck in the design. In comparison to an FPGA, hardware such as a Pentium runs at memory speed, not at cache speed. So, even running at a much higher clock frequency and having the facility of cache memory, it responds much slower than a comparable FPGA. The short time-to-market of FPGAs compared with VLSI designs is the reason for the popular choice of FPGAs in the current market.

Choosing an appropriate tool for FPGA design is of crucial importance, as it affects the cost, development time, and various other aspects of the design. For high-level design we have chosen Xilinx System Generator. Xilinx System Generator (XSG) for DSP is a tool which offers block libraries that plug into the Simulink tool (containing bit-true and cycle-accurate models of the FPGA's particular math, logic, and DSP functions). It is a DSP design tool from Xilinx that enables the use of the MathWorks model-based design environment Simulink for FPGA design. It is a system-level modeling tool in which designs are captured in the DSP-friendly Simulink modeling environment using a Xilinx-specific blockset. Over 90 DSP building blocks are provided in the Xilinx DSP blockset for Simulink. All of the downstream FPGA implementation steps, including synthesis and place and route, are automatically performed to generate an FPGA programming file. With a graphical environment based on Simulink and a pre-defined block set of Xilinx DSP cores, System Generator meets the needs of both system architects who need to integrate the components of a complete design and hardware designers who need to optimize implementations.

Simulink is a platform for multi-domain simulation and Model-Based Design for dynamic systems. It provides an interactive graphical environment and a set of block libraries, and can be extended for different specialized applications. Using Simulink one can quickly build up models from libraries of pre-built blocks.

V. SYSTEM DESIGN

Figure 1 shows our model of the Simplex solver in Simulink. Simplex iteratively searches for the optimal solution till one is found and checks the vertices of the feasible region for its computation. The values of the coefficients at the end of one step act as the starting point for the next step of pivot computation. Thus, in a visual data flow environment, this is represented by a feedback network or memory element that remembers the previous value of the coefficients in the Simplex tableau. We have implemented models using both approaches. The model in Figure 1 uses "persistent" variables for this purpose. They have the special property that they need to be initialized only once, during the first function call, and remember their values during subsequent function calls. For the sake of brevity, we omit the figure employing a feedback network to update the values. The simulation automatically stops on finding the optimal value. The value of the objective function can be inferred from the display for both the current step and the optimal (final) step.
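The idea of tableau coefficients that survive from one pivot step to the next can be sketched in software. The following is a minimal illustration only, not the authors' Simulink model: a Python closure stands in for MATLAB "persistent" variables, the entering and leaving variables are chosen by Bland's rule, and the names `make_simplex_stepper` and `step` as well as the example data are hypothetical. It assumes b >= 0, so that the slack variables give a feasible starting basis.

```python
from fractions import Fraction


def make_simplex_stepper(A, b, c):
    """Return step(); the tableau T persists between calls (assumes b >= 0)."""
    m, n = len(A), len(c)
    # Rows 0..m-1 hold [A | I | b]; the last row holds [-c | 0 | 0].
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    T.append([-Fraction(v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start in the basis

    def step():
        """Perform one pivot and return (status, current objective value)."""
        # Bland's rule: entering variable is the lowest-index column
        # with a negative entry in the objective row.
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            return "optimal", T[m][-1]
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            return "unbounded", None
        # Ratio test; ties are broken by the lowest basic-variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):
            if i != row:
                f = T[i][col]
                T[i] = [v - f * p for v, p in zip(T[i], T[row])]
        basis[row] = col
        return "pivoted", T[m][-1]

    return step


# Illustrative 3-variable, 3-constraint LP (example data, not from the paper):
# max 5*x1 + 4*x2 + 3*x3 subject to A x <= b, x >= 0.
A = [[2, 3, 1], [4, 1, 2], [3, 4, 2]]
b = [5, 11, 8]
c = [5, 4, 3]

step = make_simplex_stepper(A, b, c)
status, value = step()
while status == "pivoted":
    status, value = step()
print(status, value)  # optimal 13
```

Each call to `step()` plays the role of one pass through the feedback loop: it reads the tableau left by the previous call, performs one pivot, and reports the current objective value, stopping once no entering variable remains.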

Since this design does not use Xilinx blocksets, it cannot be directly implemented in hardware. After studying the solution of the Simplex method using Simulink, we demonstrated its hardware feasibility and visual interface through Xilinx System Generator. Figure 2 shows the model of Simplex in System Generator. Except for sources and sinks (for display of results), this design is composed entirely of Xilinx blocks and hence can be used to generate the hardware at the click of a button. The input and output interface blocks interface between the signals produced by Simulink sources and those used by Xilinx blocks, and vice versa. Many blocks of custom MATLAB code (.m files) were, however, needed for the design, and the hardware generated for these blocks was not optimized.

Figure 1: Simplex Model in Simulink

Figure 2: Simplex Model in Xilinx System Generator

VI. VHDL IMPLEMENTATION

VHDL, or Very High Speed Integrated Circuit Hardware Description Language, has long been the choice of commercial and military consumers for digital hardware design (Kief, 2008) and continues to dominate the commercial market due to optimized implementation on hardware and the availability of a large number of free IP cores. In this section we present the details of the design implemented in VHDL and later synthesized in Xilinx ISE. Xilinx ISE is a design tool provided by Xilinx to help build bitstreams that can be ported directly onto FPGA boards. The Xilinx ISE tool performs several optimizations when synthesizing the design. We targeted the Xilinx Virtex-5 XCVLX330-LX board; the hardware usage of the FPGA is presented in Table 1. The hardware was pipelined to shorten the critical path of the design and thereby increase the clock frequency. The multipliers were implemented in hardware with the help of XtremeDSP slices, while the divider IP core was generated using the Xilinx Core Generator software.
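A tableau pivot needs multiplications and a division for each updated entry, which is consistent with a design built from hardware multipliers and dividers. As a sketch, assuming the textbook tableau update (the source does not spell out its datapath), let $T$ be the tableau, $r$ the pivot row, and $c$ the pivot column. The updated tableau $T'$ is then:

\[
T'_{rj} = \frac{T_{rj}}{T_{rc}}, \qquad
T'_{ij} = T_{ij} - T_{ic}\,\frac{T_{rj}}{T_{rc}} \quad (i \ne r)
\]

Every new entry depends only on values of the previous tableau, so in principle all entries can be computed concurrently. Under the assumed layout of $m$ constraint rows plus one objective row, and $n + m$ variable columns plus one right-hand-side column, the tableau has $(m+1)(n+m+1)$ entries, which is $4 \times 7 = 28$ for $m = n = 3$.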

The hardware implementation details are presented in Table 2. A clock frequency of 644 MHz was achieved with a latency of 3 cycles. This implies that we can move from one optimal solution to another in about 4.5 ns.

Table 1. The hardware usage statistics of the FPGA

Slice Logic Utilization:
  Number of Slice Registers:            1029 / 207360
  Number of Slice LUTs:                 1018 / 207360
    Number used as Logic:               1012 / 207360
    Number used as Memory:                 6 / 54720
Slice Logic Distribution:
  Number of LUT Flip Flop pairs used:   1591

Table 2. The hardware implementation details on the Xilinx FPGA

# Multipliers                     : 27
  16x16-bit multiplier            : 27
# Adders/Subtractors              : 58
  16-bit adder                    : 30
  16-bit subtractor               : 28
# Registers                       : 92
  16-bit register                 : 91
  3-bit register                  : 1
# Latches                         : 20
  1-bit latch                     : 20
# Comparators                     : 29
  16-bit comparator greater       : 3
  16-bit comparator less          : 26
# Multiplexers                    : 3
  16-bit 8-to-1 multiplexer       : 3
# Xors                            : 10
  1-bit xor2                      : 10
# Dividers                        : 8

It can be observed from Table 1 that the implementation (a standard LP with 3 variables and 3 constraints) leaves most of the FPGA hardware unutilized. Therefore, we can increase the number of variables to a very large value and still get a reasonably good implementation. As the number of variables and constraints increases, there is a quadratic increase in hardware resource (slice register) usage, since most of the multiplication operations are done in parallel and in a row/column-wise manner. As we increase the number of variables, the clock frequency decreases linearly, owing to the longer signal propagation time through interconnects. The clock frequency of the FPGA-based implementation will therefore decrease; however, we expect that the performance will still be better than that of software-based implementations, where an increase in the number of variables cannot be accompanied by increased resource utilization.

VII. CONCLUSIONS AND FUTURE WORK

Advances in FPGA technology, along with the development of elaborate and efficient tools for modeling, simulation, and synthesis, have made FPGAs a highly useful platform. We have implemented Simplex in Simulink and on FPGA using Xilinx System Generator for a problem size of three variables and three constraints. We presented the synthesis results for implementation on the Virtex-5 XCVLX330 FPGA board. A high clock frequency of 644 MHz was obtained. Future work will focus on the development of a visually enhanced implementation of Simplex in Simulink and its generalization to an arbitrarily large number of variables, using the powerful graphical functions of Simulink. We also plan to conduct a survey among undergraduate and graduate students learning the Simplex algorithm, to assess how a graphical implementation of Simplex assists in and augments their learning process.

REFERENCES

[1] Dantzig, G.B. Maximization of a linear function of variables subject to linear inequalities. In T.C. Koopmans, editor, Activity Analysis of Production and Allocation, pages 339-347. John Wiley & Sons, Inc., New York, 1951.
[2] Brown, G.W. and Koopmans, T.C. Computational suggestions for maximizing a linear function subject to linear inequalities. In T.C. Koopmans, editor, Activity Analysis of Production and Allocation, number 13 in Cowles Commission Monographs, John Wiley, New York, 377-380 (1951).
[3] Murty, K.G. and Faithi, Y. A feasible direction method for linear programming. Operations Research Letters 3, 123-127 (1984).
[4] Megiddo, N. Linear programming in linear time when the dimension is fixed. Journal of the Association of Computing Machinery 31, 114-127 (1984).
[5] Bland, R.G., Goldfarb, D., and Todd, M.J. "The ellipsoid method: a survey." Operations Research 29, 1039-1091 (1981).
[6] Karmarkar, N. A new polynomial-time algorithm for linear programming. Combinatorica 4, 373-395 (1984).
[7] Borgwardt, K.H. Some distribution independent results about the asymptotic order of the average number of pivot steps in the simplex method. Mathematics of Operations Research, vol. 7, no. 3, pp. 441-462, 1982.
[8] Eckstein, J., Bodurglu, I.I., Polymenakos, L., and Goldfarb, D. Data-Parallel Implementations of Dense Simplex Methods on the Connection Machine CM-2. ORSA Journal on Computing, vol. 7, no. 4, pp. 402-416, 1995.
[9] Maros, I. and Mitra, G. Investigating the sparse simplex method on a distributed memory multiprocessor. Parallel Computing, vol. 26, pp. 151-170, 2000.
[10] Klabjan, D., Johnson, E.L., and Nemhauser, G.L. A parallel primal-dual simplex algorithm. Operations Research Letters, vol. 27, no. 2, pp. 47-55, 2000.
[11] Ben-Ezra, M., Peleg, S., and Werman, M. Real-time motion analysis with linear programming. Computer Vision and Image Understanding, vol. 78, no. 1, pp. 32-52, April 2000.
[12] Cui, T., Ho, T., and Tellambura, C. Linear Programming Detection and Decoding for MIMO Systems. IEEE International Symposium on Information Theory, July 2006, pp. 1783-1787.
[13] Majumdar. FPGA Implementation of Integer Linear Programming Accelerator. International Conference on Systemics, Cybernetics and Informatics (ICSCI), Jan 2006, pp. 553-559.
[14] Klindworth, A. and Schutz. A VLSI-Chip-Set for a Hardware-Accelerator for the Simplex-Method. Proc. 5th Ann. IEEE International ASIC Conference, Rochester, NY, Sept. 1992.
