Parallel Algorithms for Logic Synthesis using the MIS Approach

Kaushik De†
John A. Chandy‡
Sumit Roy‡
Steven Parkes§
Prithviraj Banerjee‡

† LSI Logic Corporation, Milpitas, CA 95035, kaushik@lsil.com
‡ University of Illinois, Urbana, IL 61801, {jchandy,sroy,banerjee}@crhc.uiuc.edu
§ Sierra Vista Research, Santa Clara, CA 95053758, parkes@sierravista.com

Abstract
Combinational logic synthesis is a very important but computationally expensive phase of VLSI system design. Parallel processing offers an attractive solution to reduce this design cycle time. In this paper, we describe ProperMIS, a portable parallel algorithm for logic synthesis based on the MIS multi-level logic synthesis system. As part of this work, we have developed novel parallel algorithms for the different logic transformations of the MIS system. Our algorithm uses an asynchronous message-driven computing model with no synchronizing barriers separating phases of parallel computation. The algorithm is portable across a wide variety of parallel architectures, and is built around a well-defined sequential algorithm interface, so that we can benefit from future expansion of the sequential algorithm. We present results on several MCNC and ISCAS benchmark circuits for a variety of shared memory and distributed processing architectures. Our implementation produces speedups of an average of 4 on 8 processors.

1 Introduction

Combinational logic synthesis is the optimization of a logic design to realize a specific combinational function in either two-level or multilevel form, and typically optimizes the area or delay of the resultant circuit. Efficient algorithms for two-level logic minimization include ESPRESSO [1] and, for multilevel logic optimization, SOCRATES [2], MIS [3], SYLON-XTRANS [4], and BOLD [5]. Since logic synthesis is very compute intensive, parallel processing is fast becoming a desirable solution to reduce the large amounts of time spent in VLSI circuit design. This has been recognized by several researchers in VLSI CAD, as many have started to investigate parallel algorithms for problems in logic synthesis and verification [6, 7, 8, 9]. We recently developed a portable parallel algorithm for the transduction method [4] of logic synthesis [10], and results of the parallel algorithm were presented for a variety of parallel platforms. Even though we obtained reasonably good speedups using that algorithm, the original sequential algorithm using the transduction method has very large run times. The more popular logic synthesis algorithm is MIS-II, which is based on iterative factoring and simplification of nodes in a Boolean network. This algorithm forms the core of numerous university and industrial logic synthesis systems. A previous attempt to parallelize MIS-II resulted in poor speedups and significant loss in quality, because the MIS-II algorithm is inherently sequential in nature and extremely hard to parallelize [7]. In this paper, we therefore present ProperMIS, a new parallel MIS-II based algorithm for logic synthesis that uses an asynchronous message-driven computing model with no synchronizing barriers separating phases of parallel computation. Using the ProperCAD II system, the algorithm is portable across a wide variety of parallel architectures.

This research was supported in part by the National Science Foundation under grant MIP-9320854, the Semiconductor Research Corporation under grant SRC 94-DP-109, and the Advanced Research Projects Agency under contract DAA-H04-94-G-0273 administered by the Army Research Office.

2 ProperCAD II Overview

Much of the work in parallel CAD reported to date suffers from a major limitation in that the proposed parallel algorithms are designed with a specific underlying architecture in mind. As a result, these applications perform poorly on architectures other than the one for which they were designed. Just as importantly, incompatibilities in programming environments make it difficult to port these programs across different parallel architectures. This limitation has serious consequences, since a parallel algorithm needs to be developed afresh for every target MIMD architecture. One of the primary concerns of the ProperCAD project [11] is to address this portability problem by designing algorithms to run on a range of parallel machines including shared memory multiprocessors, distributed memory multicomputers, and networks of workstations. The ProperCAD approach to the design of parallel CAD algorithms is illustrated in Figure 1. A parallel algorithm is designed around an existing uniprocessor algorithm by identifying modules in the uniprocessor code and designing a well-defined interface between the parallel and sequential code. The project has undergone two distinct phases, the first of which, ProperCAD I, involved the use of the C-based Charm language and runtime system [12]. The second phase, ProperCAD II [13, 14], entailed the creation of a C++ library which provided an object-oriented parallel interface based on the actor model of concurrent object-oriented computing. The ProperCAD II library provides the mechanisms necessary for parallel execution through the use of a fundamental object called an actor [15]. An actor object consists of a thread of control that communicates with other actors by sending messages, and all actor actions are in response to these messages. Specific actor methods are invoked to process each type of message, and actors are not allowed to block or explicitly make receive requests from other processors.
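The message-driven style described above can be illustrated with a small sketch. This is not the ProperCAD II C++ interface; the `Runtime`, `post`, and `on_<kind>` handler names are hypothetical and chosen only for illustration. Each message is dispatched to the handler for its type, a handler runs to completion, and an actor never issues a blocking receive: it can only post further messages.

```python
from collections import deque


class Runtime:
    """Hypothetical single-processor message loop (not the ProperCAD II API)."""

    def __init__(self):
        self.queue = deque()

    def post(self, target, kind, payload=None):
        # Sending is asynchronous: the message is queued, never delivered inline.
        self.queue.append((target, kind, payload))

    def run(self):
        # Each message triggers exactly one handler, which runs to completion.
        while self.queue:
            target, kind, payload = self.queue.popleft()
            getattr(target, "on_" + kind)(payload)


class Counter:
    """Toy actor: all of its actions are responses to messages."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.total = 0

    def on_add(self, n):
        self.total += n

    def on_add_twice(self, n):
        # No blocking receive: the actor just posts two follow-up messages.
        self.runtime.post(self, "add", n)
        self.runtime.post(self, "add", n)


runtime = Runtime()
counter = Counter(runtime)
runtime.post(counter, "add_twice", 3)
runtime.run()
print(counter.total)  # 6
```

A real runtime would hold one such queue per processor and route messages between processors; the sketch keeps a single queue to stay self-contained.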


Proceedings of the 9th International Parallel Processing Symposium (IPPS '95) 1063-7133/95 $10.00 © 1995 IEEE

The runtime system on each processor picks the next available actor thread with some priority and that thread is then allowed to run to completion without interruption. As part of ProperCAD, a suite of parallel applications has been developed that addresses the most significant tasks in VLSI design automation including circuit extraction, cell placement, test generation, fault simulation, and logic synthesis.

Figure 1: An overview of the ProperCAD project

3 MIS-II Overview

3.1 Definitions

In this section, we will review some basic definitions as given in [3] to be used later in this paper.

A variable is a symbol representing a single coordinate of the Boolean space. A literal is a variable or its negation. A cube is a set $C$ of literals such that $x \in C$ implies $\bar{x} \notin C$. An expression is a set $f$ of cubes. The support of an expression $f$ is the set of literals $sup(f)$ which are present in the sum-of-products expression of $f$. An expression $f$ is cube-free if no cube divides $f$ evenly. The primary divisors of an expression $f$ form a set of expressions $D(f) = \{ f/C \mid C \text{ is a cube} \}$. The kernels of an expression $f$ are the expressions $K(f) = \{ g \mid g \in D(f) \text{ and } g \text{ is cube-free} \}$. In other words, the kernels of an expression $f$ are the cube-free primary divisors of $f$. The cube $C$ used to obtain kernel $k = f/C$ is called the co-kernel of $k$.

A Boolean network is a directed acyclic graph (DAG) where each node $i$ is associated with 1) a variable $y_i$ and 2) a representation $f_i$ of a logic function. In the graph, an arc connects node $i$ to node $j$ if $y_i \in sup(f_j)$. The fan-in of a node $i$ is the set of all of the nodes pointing to node $i$. The fan-out of a node $i$ is the set of all nodes which node $i$ points to. The satisfiability don't-care set of a node $i$ is defined to be $DSAT_i = y_i \oplus f_i = y_i \bar{f}_i + \bar{y}_i f_i$, where $y_i$ is the variable representing the node $i$ and $f_i$ is the logic function for that node.

3.2 Types of Logic Transformations

The aim of kernel extraction is to search for multiple-cube common divisors and extract those divisors. There exists a multiple-cube common divisor $\delta$ for $l$ nodes $\eta_1, \eta_2, \ldots, \eta_l$ if $\exists\, k_1 \in K(\eta_1), k_2 \in K(\eta_2), \ldots, k_l \in K(\eta_l)$ such that $k_1 \cap k_2 \cap \ldots \cap k_l = \delta$. For example, given the equations in a Boolean network

$$F = af + bf + ag + cg + ade + bde + cde, \quad G = af + bf + ace + bce, \quad H = ade + cde \qquad (1)$$

the best multiple-cube divisor that can be found is $a + b$. After creating a new node $X$ in the network, the equations in the network become

$$F = deX + fX + ag + cg + cde, \quad G = ceX + fX, \quad H = ade + cde, \quad X = a + b \qquad (2)$$

The original network had 33 literals, and the modified network after the kernel extraction has 25 literals.

Cube extraction searches for single-cube common divisors and extracts those divisors. There exists a single-cube common divisor $\delta$ for $m$ nodes $\eta_1, \eta_2, \ldots, \eta_m$ if $\exists\, c_1 \in F(\eta_1), c_2 \in F(\eta_2), \ldots, c_m \in F(\eta_m)$ such that $c_1 \cap c_2 \cap \ldots \cap c_m = \delta$, where $F(\eta_i)$ represents the Boolean expression for the node $\eta_i$. For example, given the following Boolean network equations

$$F = abc + abd + eg, \quad G = abfg \qquad (3)$$

the best single-cube common divisor that can be found is $ab$. After creating a new node $X$ for the common divisor in the network, the equations in the modified network become

$$F = Xc + Xd + eg, \quad G = Xfg, \quad X = ab \qquad (4)$$

This transformation reduces the number of literals from 12 to 11.

Resubstitution is used to check if an existing function itself is a divisor of another function. For example, consider the network:

$$x = ac + ad + bc + bd + e \quad \text{and} \quad y = a + b \qquad (5)$$

The function $y$ itself is a divisor of the function $x$. Hence, it can be used to simplify the function $x$, which can be rewritten as:

$$x = y(c + d) + e \qquad (6)$$

Each node of a Boolean network is a Boolean function, expressed in the sum-of-products format (two-level logic). Therefore, two-level minimization algorithms such as ESPRESSO [1] can be used to minimize each node of the Boolean network. That process is called simplification.

4 Parallelization Methodology

An immediately apparent approach to parallel logic synthesis is to divide the logic circuit into several partitions and then synthesize those partitions independently in parallel [16]. Unfortunately, because of the lack of global information, that results in loss in quality of the synthesized circuit. Instead of using such a naive method, in ProperMIS, we partition the circuit for the purpose of division of work among different processors.
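The quality loss caused by missing global information can be made concrete with a literal count on the network of Eq. 1. The sketch below is only an illustration under stated assumptions: the split into a partition {F} and a partition {G, H}, and the private divisor named `Y`, are hypothetical and are not an experiment from this paper. Literals are counted as letter occurrences in the sum-of-products strings.

```python
import re


def literals(network):
    """Count literal occurrences (letters) over all node expressions."""
    return sum(len(re.findall(r"[A-Za-z]", sop)) for sop in network.values())


# Network of Eq. 1.
original = {"F": "af+bf+ag+cg+ade+bde+cde", "G": "af+bf+ace+bce", "H": "ade+cde"}

# Network of Eq. 2: one shared divisor X = a + b found with global information.
shared = {"F": "deX+fX+ag+cg+cde", "G": "ceX+fX", "H": "ade+cde", "X": "a+b"}

# Hypothetical independent synthesis: partition {F} creates X = a + b and
# partition {G, H} creates its own private copy Y = a + b.
independent = {
    "F": "deX+fX+ag+cg+cde", "X": "a+b",
    "G": "ceY+fY", "H": "ade+cde", "Y": "a+b",
}

print(literals(original), literals(shared), literals(independent))
```

Under these assumptions the duplicated divisor costs two extra literals compared with the shared one, which is the kind of loss that operating on the entire network avoids.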

However, we do not synthesize those partitions independently. Partitioning is used only for distribution of work among different processors, so it does not have any effect on the quality of the synthesized circuit. Logic minimization is performed on the entire network, so as not to lose the quality of the synthesized logic. We create many small partitions such that load balancing is good, but we make sure that the amount of computation required by each partition is roughly an order of magnitude higher than the communication time for sending a message between objects. A very simple strategy based on the input cones of the primary outputs is used to create the partitions.

In order to allow different processors to simultaneously perform conflicting optimizations on the network, we provide coherence among the parallel applications of these optimizations by using version numbers for the nodes. Every node in the Boolean network has a version number that is incremented by one whenever the functionality of the node changes due to some transformation. In our parallel algorithm, when a processor finds a possible transformation on a node $\eta$, it asks permission from a designated master processor to make the transformation and it provides the version number of $\eta$ in the request. The master processor checks if the version number provided in the request is the same as the current version number of $\eta$. If so, permission is granted because the functionality of $\eta$ has not changed since it was checked for the possible transformation. Otherwise, permission is denied.

We assign priorities to different transformations, with initial priorities ordered as follows (highest to lowest): simplification, kernel extraction, cube extraction and resubstitution. For example, a processor will not pick up any message regarding kernel extraction if some other message regarding simplification is waiting to be processed in that processor. These priorities are used to guide the synthesis process such that it avoids local minima.

After reading in the circuit, for each partition $\pi$, an actor object denoted as partition($\pi$) is created. The ProperCAD II run-time system automatically distributes these objects to different processors. Each partition($\pi$) actor then initiates the optimization procedure by starting with the simplification procedure which is detailed in Section 4.3. When all of the nodes in a partition $\pi$ are simplified, the partition($\pi$) actor is sent a message to generate kernels for the nodes in $\pi$. At this point the other transformations, kernel extraction, cube extraction, and resubstitution, are begun as explained in Sections 4.1 and 4.2.

4.1 Parallel Algorithm for Extraction

In Section 3, we defined the serial kernel extraction process. Finding useful intersections of kernels is facilitated with a data structure called the co-kernel cube matrix, as described in [17]. A row in this matrix corresponds to a kernel (and its associated co-kernel), and a column corresponds to a cube which is present in some kernel. The entry at position $(i, j)$ is nonzero if kernel $i$ contains the cube $j$. Consider the equations given in Eq. 1. The kernels (co-kernels) of the equation $F$ are $de + f + g$ $(a)$, $de + f$ $(b)$, $a + b + c$ $(de)$, $a + b$ $(f)$, $de + g$ $(c)$ and $a + c$ $(g)$. The kernels of $G$ are $ce + f$ $(a, b)$ and $a + b$ $(f, ce)$, and the only kernel of $H$ is $a + c$ $(de)$. For reference, the cubes of the original expressions in Eq. 1 are numbered from 1 to 13.

Figure 2: Two co-kernel cube matrices for Eq. 1

In the sequential algorithm, when a new kernel (co-kernel) is generated, a new row is assigned for that kernel, and the row number corresponding to that kernel is noted in a table. Similarly, when a unique kernel-cube is generated, a new column is assigned for that cube and the column number corresponding to that cube is noted in a table. In a parallel environment, however, the creation of the co-kernel cube matrix is a more involved process. In our parallel algorithm, all of the possible kernels for each node are generated concurrently. Using the kernel generation algorithm described in [17], each partition actor will in parallel generate the kernels for all the nodes in its partition, and will then broadcast that information to all of the processors. We want to create the co-kernel cube matrix in all of the processors, and the matrices have to be the same in all processors. Since kernels are generated in parallel and then broadcast to all of the processors, the order in which the kernels are received may vary in different processors. For example, one processor may receive the kernels in the order $F$, $G$ and $H$. In that case, the co-kernel cube matrix will be the same as that given in Figure 2(a). However, if another processor receives the kernels in the order $G$, $H$ and $F$, the co-kernel cube matrix in that processor will resemble the one given in Figure 2(b), which has different labeling for the rows and the columns.

To keep the row and column labeling consistent across all processors, each node is assigned a distinct interval of labels. For example, we can assign the intervals [1-1000], [1001-2000] and [2001-3000] to the nodes $F$, $G$ and $H$ respectively. So any kernel (co-kernel) generated for the node $G$ will be labeled as a row in the co-kernel cube matrix with a number starting from 1001.
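The kernels listed above and the interval-based row labeling can be checked with a short sketch. It is a minimal illustration, not the ProperMIS implementation: the helper names are hypothetical, the co-kernels are taken from the text rather than generated, each letter is treated as one literal, and the order in which a node's owner enumerates its co-kernels is an assumption. The point is that labels are fixed by the owning node's interval, so two processors that receive the broadcasts in different orders still build identical rows.

```python
from itertools import count


def parse(sop):
    """Parse 'af+bf' into a set of cubes; each letter is one literal."""
    return {frozenset(term.strip()) for term in sop.split("+")}


def divide(expr, cube):
    """Algebraic quotient of a sum-of-products expression by a single cube."""
    cube = frozenset(cube)
    return {term - cube for term in expr if cube <= term}


def is_cube_free(expr):
    """True if there are several cubes and no literal common to all of them."""
    return len(expr) > 1 and not frozenset.intersection(*expr)


NETWORK = {
    "F": parse("af+bf+ag+cg+ade+bde+cde"),
    "G": parse("af+bf+ace+bce"),
    "H": parse("ade+cde"),
}
# Co-kernels as listed in the text (enumeration order is an assumption).
COKERNELS = {"F": ["a", "b", "de", "f", "c", "g"], "G": ["a", "b", "ce", "f"], "H": ["de"]}
# Intervals [1-1000], [1001-2000], [2001-3000] from the text.
INTERVAL_START = {"F": 1, "G": 1001, "H": 2001}


def labeled_kernels(node):
    """Rows produced by the owner of `node`, labeled inside the node's interval."""
    labels = count(INTERVAL_START[node])
    rows = []
    for cokernel in COKERNELS[node]:
        kernel = divide(NETWORK[node], cokernel)
        assert is_cube_free(kernel)
        rows.append((next(labels), node, cokernel, frozenset(kernel)))
    return rows


def build_rows(broadcasts):
    """What one processor stores after receiving broadcasts in some order."""
    return {label: (node, cokernel, kernel)
            for batch in broadcasts
            for label, node, cokernel, kernel in batch}


batches = {node: labeled_kernels(node) for node in NETWORK}
proc_a = build_rows([batches["F"], batches["G"], batches["H"]])
proc_b = build_rows([batches["G"], batches["H"], batches["F"]])
print(proc_a == proc_b)  # True
```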

Using a similar strategy for column labeling will produce a co-kernel cube matrix as in Figure 3. Hence, the labeling of the rows will be consistent in all of the processors irrespective of the order in which the kernels are received.

Figure 3: Consistent co-kernel cube matrix for Eq. 1

A rectangle of the co-kernel cube matrix identifies an intersection of kernels, that is, a common subexpression in the network. The columns of the rectangle identify the cubes in the subexpression, and the rows of the rectangle identify the particular functions that the subexpression divides. For example, the prime rectangle ({3, 4, 1003, 1004}, {1, 2}) in Figure 3 (shown in bold) identifies the subexpression $a + b$ which divides the functions $F$ and $G$ as in Eq. 2. The cubes numbered 1, 2, 5, 6, 8, 9, 10, and 11 from the original set of functions are covered by this rectangle. The value of a rectangle measures the reduction of the number of literals in the network if the particular rectangle is selected. Rectangle covering is performed by generating all possible rectangles and then selecting the maximum-valued rectangle.

The search for the maximum-valued rectangle can be performed in parallel as conceptually illustrated in Figure 4 and illustrated with the help of a co-kernel cube matrix in Figure 5. After the generation of all of the kernels, a designated master processor creates an actor called kernel-extract to compute the maximum-valued rectangle by generating all of the prime rectangles. When a new kernel-extract actor is created, the following information is passed: the submatrix of the co-kernel cube matrix ($M$), the rectangle generated thus far ($rect$) and the index of the column from which it should search onwards for new prime rectangles ($index$). The responsibility of this kernel-extract actor is to generate all of the prime rectangles with fewer rows but more columns than $rect$. That is performed by adding to $rect$ columns with labels more than or equal to $index$. Conceptually, it is equivalent to generating all of the possible subexpressions which can be multiple-cube common divisors to the given set of equations by adding more kernel cubes (columns of $M$) to the given subexpression (represented by $rect$). The kernel-extract actor also computes the value of $rect$ and records the information if it is the best rectangle seen by this actor. For the root kernel-extract actor, $M$ is the co-kernel cube matrix itself, $rect$ is an empty rectangle and the value of $index$ is 0 as shown in Figure 5(i).

If the number of columns in the submatrix $M$ is less than a user-defined threshold, all of the possible prime rectangles are generated by applying the sequential algorithm described in [17]. Otherwise, new child actors are created and the search space is divided among those children. Each column $c$ with a label greater than $index$ is examined as a column to include in the rectangle. The submatrix $M_1$ of the original matrix $M$ is created by selecting only the rows in which column $c$ has a nonzero value, and a new rectangle $rect_1$ is formed from the columns of the old rectangle $rect$ and the rows for which $c$ has a nonzero value. At this point, a new child kernel-extract actor is created with the following arguments: the submatrix $M_1$, the new rectangle $rect_1$ and the index $c$. For example, by adding column 1 to the empty rectangle in the root object, we create a new rectangle $rect_1$ with one column and six rows ({3, 4, 6, 1003, 1004, 2001}, {1}) as shown in Figure 5(ii). Conceptually, it is equivalent to adding the kernel cube $a$ to the null subexpression in the root object as shown in Figure 4(ii). Hence, a new child kernel-extract actor is created. This actor receives the submatrix $M_1$ and the rectangle $rect_1$ as shown in Figure 5(ii), and the value of the parameter $index$ will be 1.

Figure 4: Conceptual search space for finding the maximum-valued rectangle

Figure 5: Kernel-extract actor search tree and associated matrices

The leaf nodes of the search tree generate prime rectangles by using the sequential algorithm and then report the best rectangle seen to their parents after the computation is performed. Non-leaf nodes, which created child actors, must wait for the best rectangles seen by their children before they can then forward the best rectangle to their parents.
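The column-by-column search can be sketched sequentially, with each recursive call standing in for a child kernel-extract actor. Several details are assumptions made for this illustration: the column labels beyond a = 1 and b = 2, the row order within each node's interval, and the rectangle value formula (literals of the covered cubes, minus one literal plus the co-kernel literals per row, minus the kernel-cube literals per column), which is a commonly used literal-saving weight and is not spelled out in this paper. The sketch also omits the user-defined threshold and the actor messaging.

```python
# Column label -> kernel cube (each letter is one literal); order is assumed.
COLS = {1: "a", 2: "b", 3: "c", 4: "ce", 5: "de", 6: "f", 7: "g"}

# Row label -> (co-kernel cube, column labels present in that kernel) for Eq. 1.
ROWS = {
    1: ("a", {5, 6, 7}), 2: ("b", {5, 6}), 3: ("de", {1, 2, 3}),
    4: ("f", {1, 2}), 5: ("c", {5, 7}), 6: ("g", {1, 3}),
    1001: ("a", {4, 6}), 1002: ("b", {4, 6}),
    1003: ("ce", {1, 2}), 1004: ("f", {1, 2}),
    2001: ("de", {1, 3}),
}


def value(rows, cols):
    """Assumed literal saving if the rectangle (rows, cols) is extracted."""
    covered = sum(len(ROWS[r][0]) + len(COLS[c]) for r in rows for c in cols)
    row_cost = sum(len(ROWS[r][0]) + 1 for r in rows)  # co-kernel times new node
    col_cost = sum(len(COLS[c]) for c in cols)         # body of the new node
    return covered - row_cost - col_cost


def search(rows, cols, index):
    """Best (value, rows, cols) reachable by adding columns labeled above index."""
    best = (value(rows, cols), rows, cols) if cols else (0, frozenset(), frozenset())
    for c in sorted(COLS):
        if c <= index:
            continue
        # Submatrix M1: keep only the rows in which column c is nonzero.
        sub_rows = frozenset(r for r in rows if c in ROWS[r][1])
        if not sub_rows:
            continue
        # A child kernel-extract actor would receive (M1, rect1, c) here.
        candidate = search(sub_rows, cols | {c}, c)
        if candidate[0] > best[0]:
            best = candidate
    return best


best_value, best_rows, best_cols = search(frozenset(ROWS), frozenset(), 0)
print(best_value, sorted(best_rows), sorted(best_cols))
```

With these assumptions the search selects the rectangle for $a + b$, and its value matches the drop from 33 to 25 literals reported in Section 3.2.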

The root kernel-extract actor finally reports the best rectangle information to the master processor. When the master processor receives the information regarding the maximum-valued rectangle, it computes the subexpression corresponding to that rectangle from the co-kernel cube matrix, and broadcasts that information to the other processors, which then remove the kernels of the affected expressions from the co-kernel cube matrix. The version numbers of the affected nodes are incremented by 1. A new root kernel-extract actor is created to process the new co-kernel cube matrix, and the process continues until no more positive-valued rectangles can be found.

The parallel algorithm for cube extraction is similar to that for kernel extraction, and is not described further for lack of space. More details are available in [18].

4.2 Parallel Algorithm for Resubstitution

As described in Section 3.2, the resubstitution transformation is used to check if an existing expression itself is a divisor of other expressions. The resubstitution operation can be performed in parallel by creating for each node $\eta$ a resub($\eta$) actor; these actors are then distributed across all of the processors by the ProperCAD II runtime system as shown in Figure 6. The responsibility of resub($\eta$) is to search for possible resubstitution by other nodes in the node $\eta$. The search is restricted to algebraic divisors as the search for Boolean division is very expensive. A node $\nu$ can be an algebraic divisor of the node $\eta$ if the support of $\nu$ is a subset of the support of $\eta$. Hence, resub($\eta$) restricts its search for divisors to the nodes whose supports are subsets of the support of $\eta$. Anytime it finds a divisor $\nu$ and the division is going to reduce the literal count of the network, permission for the transformation must be requested from the master processor. The master processor verifies the currentness of the version number and, if permission is granted, increments the version number. Since the functionality of the node $\eta$ has changed, the master processor creates a new resub actor for the node $\eta$.

Figure 6: Creation of objects for resubstitution

4.3 Parallel Algorithm for Simplification

Two-level minimization is a much more developed science than multilevel minimization, and very efficient algorithms exist for finding minimal two-level representations of Boolean functions. In the context of a multilevel network, two-level minimization can be made more powerful by providing the minimizer with a set of functions known as don't-care sets. These sets are derived from the structure of the network as well, and their size can determine how thorough and how fast the minimization process will be.

Each partition actor creates for each node $\eta$ in its partition a simp($\eta$) actor that is responsible for simplification of that node. As with the resub actors, the simp actors are distributed across different processors by the ProperCAD II run-time system.

Conceptually, the satisfiability don't-care set of any node can be used for simplification of another node as long as it does not form a cycle in the network. In a sequential algorithm for simplification, since only one node is simplified at a time, the don't-care set of any other node can be used in the simplification process without error. Since we are performing node simplifications in parallel, there must be constraints on which don't-cares can be used to ensure the results are correct. For example, consider the set of expressions

$$X = ab + \bar{a}\bar{b} \quad \text{and} \quad Y = ab + \bar{a}\bar{b}$$

If both of the nodes are simplified concurrently and they use each other's satisfiability don't-care sets, then the result could be

$$X = Y \quad \text{and} \quad Y = X$$

which is wrong. But if they are simplified one at a time, that will never happen. Then we would have

$$X = ab + \bar{a}\bar{b} \text{ and } Y = X, \quad \text{or} \quad X = Y \text{ and } Y = ab + \bar{a}\bar{b}$$

This simple example shows that two nodes being simplified concurrently should not use each other's satisfiability don't-care sets.

To perform the simplification of nodes in parallel, we must find the largest set of nodes such that simplification of those nodes can be performed concurrently. Since this problem is NP-hard, we use a heuristic as described below. To make concurrent node simplification possible, we have to introduce the concept of locking a node. A node is said to be Boolean locked if simplification of that node is not allowed, but all algebraic operations such as kernel extraction, cube extraction and resubstitution are allowed. A node is said to be algebraically locked if no change in functionality is allowed to that node. Initially all of the nodes are unlocked.

Upon creation, each simp($\eta$) actor asks for permission from the master processor for the simplification of $\eta$. When the master processor picks up the permission request, it checks if $\eta$ is Boolean locked. If so, the master processor denies permission and asks simp($\eta$) to try later. Otherwise, the don't-care set is checked to determine if any of the nodes are algebraically locked. If one is, the master processor will deny permission or simply supply a reduced don't-care set, depending on the type of don't-care set used. If permission is granted, the nodes in the set are Boolean or algebraically locked, again depending on the don't-care set. After simplification is complete, the other processors are notified and the master processor will unlock the appropriate locks. Hence, the don't-care sets of those nodes will be available for use, potentially improving the quality of node simplification.

It is very apparent from the description of the simplification procedure that the first node to receive permission can use all of the don't-care sets it asks for. Therefore, for better simplification, we assigned different priorities proportional to the size of the nodes (in terms of the literal count) to different simp actors. With such a procedure, if two simp actors ask permission from the master processor simultaneously, the larger node will be processed first.

5 Results and Discussion

In this section, we will discuss results obtained by applying ProperMIS on various MCNC and ISCAS benchmark circuits. In Table 1, we compare the quality of circuits obtained by MIS-II with that of the ProperMIS algorithm in a uniprocessor environment. The first 3 columns compare the literal counts of the initial circuits, the circuits produced by MIS-II and the circuits produced by ProperMIS, respectively. The last two columns compare the runtimes for MIS-II and ProperMIS for a Sun 4/690MP (runtimes are given in seconds). The parameters and don't-care sets supplied to MIS-II and ProperMIS are identical for any circuit. The runtimes and qualities are not identical due to the nondeterminism in the order of applying various transformations; however, in most cases the runtimes are comparable. The quality of the circuits produced by ProperMIS is almost the same as (sometimes better than) that of circuits produced by MIS-II.

Table 1: Comparison between MIS-II and ProperMIS

Circuit   Literal Count                  Run Time (sec)
          Initial   MIS   ProperMIS      MIS    ProperMIS
bw        424       186   178            30.2   32.4

Tables 2, 3 and 5 present the results obtained on a distributed memory MIMD machine, the Intel Paragon, as well as two shared memory machines, an 8-processor Encore Multimax and a 4-processor Sun 4/690MP server. Table 4 shows the results for a network of SPARCstation 5 workstations. For different numbers of processors, the quality of the synthesized circuits in terms of the literal count and the run times in seconds and speedups are shown for some benchmark circuits. From the tables it is clear that the ProperMIS algorithm produces good speedups, and maintains quality comparable to that of the MIS-II approach. One can observe that the maximum degradation of quality with multiple processors is generally less than 10%. The quality is sometimes improved with a larger number of processors, again due to nondeterminism. Some of the superlinear speedups are due to anomalies in parallel search techniques.

Table 2: Results for the Intel Paragon

Table 3: Results for the Encore Multimax

Table 4: Results for a cluster of SPARCstation 5s

Table 5: Results for a Sun 4/690MP

Measurements of the parallel characteristics of ProperMIS are shown in Table 6 for C499 on the Sun 4/690MP. While the user time to system time ratio is good, load balancing is poor, and we will be addressing that issue in future research.

Table 6: Characteristics of ProperMIS

Overly large circuits cannot be handled by ProperMIS due to excessive memory requirements. In light of this, we have conducted research to partition those circuits and then synthesize each partition separately in parallel. In doing so, we have designed innovative partitioning and redistribution methods that do not sacrifice the quality of the synthesized logic [16].

By retaining about 80% of the sequential code from MIS-II, we were easily able to update ProperMIS to the latest version of MIS, now part of SIS 1.2 [19]. Preliminary results using the new version are shown in Table 7. We are in the process of evaluating the reasons for the difference in results between ProperMIS and ProperSIS. We are also currently investigating parallelizing other aspects of logic synthesis present in SIS 1.2.

Table 7: ProperSIS Results for a Sun 4/690MP

6 Conclusions

In this paper, we have presented an asynchronous portable parallel algorithm for logic synthesis, called ProperMIS, implemented as part of the ProperCAD project. As part of this work, we have developed novel parallel algorithms for the kernel extraction, cube extraction, resubstitution, and node simplification procedures of the MIS-II system. ProperMIS uses an asynchronous message-driven actor model of computation; no explicit synchronizing barriers are allowed in the algorithm. We have implemented the algorithm and have evaluated it on various MCNC and ISCAS synthesis benchmark circuits. The results, reported on a variety of parallel architectures, show good speedups with almost no degradation in the quality of the synthesized network over the uniprocessor algorithm.

7 Acknowledgement

We are grateful to Dr. Balkrishna Ramkumar for various discussions on the development of parallel algorithms, and to John G. Holm for his assistance in characterizing the parallel behavior of ProperMIS. We are also thankful to the San Diego Supercomputing Center for granting us access to their Intel Paragon.

References

[1] R. K. Brayton, G. D. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli, "ESPRESSO-II: A new logic minimizer for programmable logic arrays," in Proceedings of the Custom Integrated Circuits Conference, June 1984.
[2] A. J. de Geus and W. Cohen, "A rule-based system for optimizing combinational logic," IEEE Design & Test of Computers, pp. 22-32, Aug. 1985.
[3] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. R. Wang, "MIS: A multiple-level logic optimization system," IEEE Trans. Computer-Aided Design, vol. CAD-6, pp. 1062-1081, Nov. 1987.
[4] X. Xiang, "Multilevel logic network synthesis systems, SYLON-XTRANS," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 1990.
[5] D. Bostick, G. D. Hachtel, R. Jacoby, M. R. Lightner, P. Moceyunas, C. R. Morrison, and D. Ravenscroft, "The Boulder Optimal Logic Design system," in Digest of Papers, International Conference on Computer-Aided Design, (Santa Clara, CA), pp. 62-65, Nov. 1987.
[6] R. Galivanche and S. M. Reddy, "A parallel PLA minimization program," in Proceedings of the Design Automation Conference, (Miami Beach, FL), pp. 600-607, June 1987.
[7] D. J. Zipfel, "A parallel algorithm for algebraic factorization with application to multi-level logic synthesis," M.S. thesis, University of Illinois at Urbana-Champaign, June 1991. Tech. Rep. CRHC-91-24/UILU-ENG-91-2134.
[8] G. D. Hachtel and P. H. Moceyunas, "Parallel algorithms for boolean tautology checking," in Digest of Papers, International Conference on Computer-Aided Design, (Santa Clara, CA), pp. 422-425, Nov. 1987.
[9] H.-K. T. Ma, S. Devadas, R. Wei, and A. Sangiovanni-Vincentelli, "Logic verification algorithms and their parallel implementations," in Proceedings of the Design Automation Conference, (Miami Beach, FL), pp. 283-290, June 1987.
[10] K. De, B. Ramkumar, and P. Banerjee, "A portable parallel algorithm for logic synthesis using transduction," IEEE Trans. Computer-Aided Design, vol. 13, pp. 566-580, May 1994.
[11] B. Ramkumar and P. Banerjee, "ProperCAD: A portable object-oriented parallel environment for VLSI CAD," IEEE Trans. Computer-Aided Design, vol. 13, pp. 839-842, July 1994.
[12] L. V. Kale, "The Chare Kernel parallel programming system," in Proceedings of the International Conference on Parallel Processing, (St. Charles, IL), pp. 17-25, Aug. 1990.
[13] S. Parkes, J. A. Chandy, and P. Banerjee, "A library-based approach to portable, parallel, object-oriented programming: Interface, implementation, and application," in Supercomputing '94, (Washington, DC), pp. 69-78, Nov. 1994.
[14] S. Parkes, "A class library approach to concurrent object-oriented programming with applications to VLSI CAD," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Sept. 1994. Tech. Rep. CRHC-94-20/UILU-ENG-94-2235.
[15] G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems. Cambridge, MA: The MIT Press, 1986.
[16] K. De and P. Banerjee, "Parallel logic synthesis using partitioning," in Proceedings of the International Conference on Parallel Processing, (St. Charles, IL), pp. III: 135-142, Aug. 1994.
[17] R. Rudell, "Logic synthesis for VLSI design," Ph.D. Dissertation, University of California, Berkeley, 1989.
[18] K. De, "Parallel algorithms for logic synthesis," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Aug. 1993. Tech. Rep. CRHC-93-20/UILU-ENG-93-2235.
[19] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A system for sequential circuit synthesis," Tech. Rep. UCB/ERL M92/41, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA, May 1992.