IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 4, APRIL 2016

Triple Patterning Lithography Aware Optimization and Detailed Placement Algorithms for Standard Cell-Based Designs

Jian Kuang, Wing-Kai Chow, and Evangeline F. Y. Young

Abstract: Triple patterning lithography (TPL) is regarded as a promising technique to handle the manufacturing challenges in the 14nm technology node and beyond. It is necessary to consider TPL in early design stages to make the layout more TPL friendly and reduce the manufacturing cost. In this paper, we propose a flow to co-optimize cell layout decomposition and detailed placement. Our cell decomposition approach can enumerate all coloring solutions with the minimum number of stitches. The experimental results show that our approach outperforms the existing work in stitch number, half-perimeter wirelength (HPWL), and running time. We further extend our placer to consider the displacement of cells as a constraint and as an objective, respectively, which helps to preserve the quality of the input placement. The effectiveness of the extensions is verified by the experiments.

Index Terms: Design for manufacturability, displacement, layout decomposition, placement, triple patterning lithography (TPL).

I. INTRODUCTION

Triple patterning lithography (TPL) is regarded as a highly likely substitute for double patterning lithography (DPL) [2], [3] in the 14/10nm technology node, especially while the next-generation lithography technologies, such as extreme ultraviolet and electron beam lithography, are still not ready for mass production. One of the most crucial problems that has to be solved before TPL becomes a reality is layout decomposition for TPL, which decomposes a layout into three parts such that they can be assigned to different masks. This problem has received extensive research attention in recent years [4]-[10].
Manuscript received February 26, 2015; revised May 5, 2015; accepted July 10, Date of publication August 18, 2015; date of current version March 18, This work was supported by the Research Grants Council, University Grants Committee, Hong Kong, under Project CUHK The authors are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong (jkuang@cse.cuhk.edu.hk; wkchow@cse.cuhk.edu.hk; ffyoung@cse.cuhk.edu.hk). The preliminary version of this paper was presented in ICCAD 2014 [1]. Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TVLSI

However, considering TPL decomposition only after placement and routing can be too late [13], for the following reasons.
1) If TPL is not considered in the physical design stage, many conflicts can be created because some TPL-unfriendly cells may be placed too close to each other, or the routed wires may be incompatible with TPL.
2) Native conflicts may exist inside a cell, and a placement unaware of this may produce a layout with many native conflicts because a cell can appear many times in a circuit.
3) A conflict reported after decomposition can only be resolved by modifying the layout, which is very expensive. For the same reason as in 2), the same modification may need to be repeated many times.
4) As layout decomposition for TPL is NP-complete in general [4], decomposing the whole layout, which has a large problem size, is very time consuming.

There are some previous works such as [11] and [12] that consider TPL in detailed routing, which can help decomposition of the routing layers. Reference [13] was the first systematic work aiming at TPL aware cell compliance and placement for standard cell-based design. However, their cell compliance/decomposition approach is based on manual cell modification and a backtracking algorithm, which tends to be inefficient and not robust.
Besides, only the row-based cell shifting step is considered for TPL decomposition in their approach, while other steps in detailed placement, such as cell reordering, vertical move, and global move, are not considered. Considering TPL in the whole detailed placement process can explore a larger solution space and further optimize the placement objectives, including TPL decomposability. Even though we consider TPL in the whole detailed placement flow, our approach still runs faster than [13] and obtains much higher quality.

Tian et al. [15] consider TPL aware placement with the constraint that the same cell must have the same coloring solution. Their work considers only linear placement, and their MAX-SAT-based algorithm may not be scalable for complicated designs with a large number of conflicts. Lin and Chu [16] proposed a mixed integer linear programming-based approach and a tree-based heuristic for TPL aware placement. Their work is also limited to linear placement and has the same constraint as [15] that the same cell must have the same coloring.

In this paper, we present a comprehensive study of the co-optimization of cell decomposition and TPL aware placement for standard cell-based designs. Our main contributions can be summarized as follows.
1) We present a systematic study on the native conflict problem in TPL.
2) We propose a fast and robust algorithm to enumerate all coloring solutions of a cell.

3) We propose an efficient TPL aware compaction algorithm and an efficient TPL aware legalization method based on dynamic programming. Our TPL aware placer integrated with these algorithms outperforms the previous work.
4) We further extend our placer to consider displacement constraint and to minimize the total displacement.

The rest of this paper is organized as follows. Section II introduces some preliminaries. Section III discusses the standard cell decomposition problem. Section IV describes our TPL aware placement algorithms. Section V presents our TPL aware placement method with displacement constraint. Section VI introduces the TPL aware placement algorithm to minimize the total displacement. Section VII reports the experimental results. Finally, the conclusion is drawn in Section VIII.

Fig. 1. Standard cell-based design. (a) Example for a standard cell. (b)-(c) Different design styles. (d) Filler cell.

II. PRELIMINARIES

Preliminaries of standard cell-based design and the layout decomposition for TPL that are related to this paper are introduced in this section.

A. Standard Cell-Based Design

There is a predesigned cell library in standard cell-based design. All the cells in the library have the same height, and the power and ground rails are located at the top and bottom of a cell. An example is shown in Fig. 1(a). The power and ground rails go from the very left of the cell to the very right. There is always a margin between the internal features and the left and right boundaries of a cell. Cells are placed in rows in the layout, aligned to each other by the power and ground rails. There are two different design styles [17]. One is shown in Fig. 1(b) in which the neighboring rows are separated. Another type is shown in Fig. 1(c) in which the rows are abutting and the power and ground rails of the neighboring rows are merged.
In the second case, cells in every other row need to be flipped vertically. Note that the cells in the same row are not necessarily abutting horizontally. When there are spaces between two cells in a row, filler cells will be inserted to connect the power and ground rails [Fig. 1(d)]. Therefore, the cells cannot be placed arbitrarily. Instead, they should align to the placement sites. The cell widths and the distance between any two neighboring cells must be multiples of the width of the smallest filler cell.

B. Layout Decomposition for TPL With Solution Graph

Layout decomposition for TPL needs to assign each feature in the layout to a mask, such that two features that are too close (i.e., at a distance smaller than a threshold $C_{\min}$) must be assigned to different masks. Theoretically, a conflict graph can be constructed [Fig. 2(a)], and the problem can be transformed into a three-coloring problem [4]. In this paper, we focus on the M1 layer of the standard cells, which is the most complicated one and needs TPL.

Fig. 2. Layout decomposition for TPL in [18]. (a) Conflict graph. (b) Division of the layout. (c) Solution graph in which all the edges are from left to right.

A polynomial time algorithm to decompose a standard cell-based row-structure layout was proposed in [18]. The basic idea is to divide the whole problem into subproblems according to the left boundaries of the features. An example is shown in Fig. 2(b) in which the whole layout is divided into three cutting sets (three subproblems). Note that one feature may appear in multiple sets. All coloring solutions of each set can be enumerated easily. Then, a solution graph [Fig. 2(c)] can be constructed in such a way that each node in the graph represents one coloring solution of a cutting set. An edge exists between two coloring solutions if the two solutions are compatible with each other.
With the solution graph constructed as described, any path from the virtual source s to the virtual sink t represents a legal coloring solution for every cell in the layout under decomposition.

III. STANDARD CELL DECOMPOSITION

The cell decomposition problem is discussed in this section. An effective method is proposed, and the coloring solutions obtained will be used in the detailed placement stage.
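Because the rest of this section builds on the solution graph of Section II-B, a minimal sketch of that model may be helpful. It is not the implementation of [18]: the function names, the dictionary representation of a coloring, and the brute-force enumeration per cutting set are assumptions made for illustration. It also assumes at least one cutting set, that every conflict edge lies inside some cutting set, and that a feature shared by several sets appears in consecutive sets.

```python
from itertools import product

def set_colorings(features, conflicts):
    """Enumerate every legal 3-coloring of one cutting set (one layer of nodes)."""
    legal = []
    for colors in product(range(3), repeat=len(features)):
        assign = dict(zip(features, colors))
        if all(assign[u] != assign[v] for u, v in conflicts
               if u in assign and v in assign):
            legal.append(assign)
    return legal

def compatible(left, right):
    """Two nodes get an edge if they agree on every feature they share."""
    return all(left[f] == right[f] for f in left.keys() & right.keys())

def decompose(cutting_sets, conflicts):
    """Return one coloring found along a source-to-sink path, or None."""
    layers = [set_colorings(s, conflicts) for s in cutting_sets]
    # Each entry: (node in current layer, coloring merged along one path to it).
    reach = [(sol, dict(sol)) for sol in layers[0]]
    for layer in layers[1:]:
        nxt = []
        for sol in layer:
            for prev, merged in reach:
                if compatible(prev, sol):
                    nxt.append((sol, {**merged, **sol}))
                    break  # one predecessor is enough to prove reachability
        reach = nxt
    return reach[0][1] if reach else None
```

If no node of the last layer is reachable, `decompose` returns `None`, which corresponds to the case where no source-to-sink path exists in the solution graph.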

Fig. 3. Impact of coloring solution of a cell on its neighbors. (a) There is a conflict between A and B. (b) Coloring of A is changed. (c) A is flipped horizontally.

Fig. 4. Illustration for stitch insertion. The conflict graph between the four features forms a K4, so stitching is necessary.

A. Issues on Cell Decomposition

1) Differences With General Decomposition: The decomposition of a cell is similar to the decomposition of a general layout, but there are two major differences.

1) In standard cell-based designs, as the power rail goes from the very left of the whole layout to the very right, the power rails in different cells are preferred to have the same color. This is also true for the ground rails. Thus, there are two choices: one is to assign the power and ground rails of every cell the same color, and the other is to assign the two rails different colors. We assume the former in this paper. This constraint can be effectively integrated into the solution graph model. Since the leftmost cutting set of any cell layout must contain only the power and ground rails [because of the cell margin, see Fig. 1(a)], the coloring constraint on the power and ground rails can be satisfied by deleting the edges from node s to some nodes in the first cutting set (nodes corresponding to the solutions in which the power and ground rails have different colors).
2) We want to obtain many coloring solutions for each cell, since different colorings will be useful on different occasions depending on the neighboring cells. An example is shown in Fig. 3(a), in which the cell A conflicts with the cell B. If the coloring solution of the cell A is changed, the conflict can be resolved easily [Fig. 3(b)].
For each coloring solution of a cell, flipping (or mirroring) it horizontally will also be considered as another solution, because flipped and unflipped instances of a cell with a certain coloring may have different safe distances with respect to its two neighbors. For example, the conflict in Fig. 3(a) can also be resolved by flipping the cell A, as shown in Fig. 3(c).

2) Redundancy Among Coloring Solutions: We first define a redundant solution as follows.

Definition 1 (Redundant Coloring Solution): A coloring solution is redundant if it differs from another solution only in features that will never conflict with the features in another cell.

The features that will never conflict with the features in another cell are called immune features in [13]. In [13], the features at distances larger than $C_{\min}$ from the two vertical boundaries are counted as immune features. However, as pointed out in Section II-A, there is always a margin on the boundary of a cell. Therefore, as long as the distances of a feature from the two boundaries are both larger than $(C_{\min} - M_{\min})$, where $M_{\min}$ is the minimum cell margin among all the cells, it can be considered an immune feature. In this way, more immune features can be identified and the redundancy between different coloring solutions can be further reduced.

3) Solutions With Fewest Stitches: For some complex layouts, stitches are needed to resolve the conflicts. As shown in Fig. 4, a stitch is inserted into the bottom feature to split it into two such that the two parts can be assigned different colors in order to resolve the conflicts between features. To support cell decomposition with stitches, an effective way is to reuse the solution graph described in Section II-B as follows. First, all potential stitches are identified and all features are split into subfeatures according to the stitches. Then, each subfeature is regarded as an independent feature, and a conflict graph and a solution graph can be constructed as before.
The only difference is that the edges in the solution graph are now weighted by the numbers of stitches [18]. After the feature splitting step described above, there will be many more features than before. As a consequence, the total number of coloring solutions will increase dramatically. In our experiment, there can be millions of solutions in total even when the redundant solutions are eliminated. Therefore, we will only keep those solutions with the minimum number of stitches, i.e., the smallest number of stitches with which a cell can be decomposed successfully. If a cell can be decomposed without using any stitch, the minimum number of stitches for that cell will be 0. We found from the experiments that restricting the colorings of each cell to the minimum number of stitches can still provide enough choices of coloring solutions to optimize the placement globally. There are several additional advantages of using this approach.
1) The number of solutions can be reduced from millions to tens. Enumeration of all the solutions can then be done much more efficiently.
2) This guarantees a placement solution with the fewest stitches if a solution can be produced.
3) In [13], a weighted sum of stitch number and half-perimeter wirelength (HPWL) is used as the cost of a placement, but it is usually hard to set a good weight. In our approach, there is no need to consider stitch minimization in detailed placement since the number of stitches is guaranteed to be minimum, which simplifies the problem significantly. At the same time, there are still sufficient coloring solutions to finish placement successfully.
4) In [13], a parameter maxs was used to guide the solution enumeration such that only solutions with stitch number no more than maxs will be kept. However, before checking all the coloring solutions for each cell, it is hard to set a proper value for maxs.
In our approach, the minimum stitch number for each cell can be found easily and there is no need for such a parameter.
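The two filters discussed in this subsection, keeping only minimum-stitch solutions and dropping redundant ones, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `is_immune` and `prune`, the representation of a solution as a (coloring dictionary, stitch count) pair, and the per-feature boundary gaps are all hypothetical.

```python
def is_immune(left_gap, right_gap, c_min, m_min):
    """A feature is treated as immune if both boundary gaps exceed C_min - M_min."""
    threshold = c_min - m_min
    return left_gap > threshold and right_gap > threshold

def prune(solutions, immune):
    """Keep minimum-stitch solutions that differ on at least one non-immune feature.

    solutions: list of (coloring, stitch_count), coloring maps feature -> mask.
    immune:    set of feature names that can never conflict with another cell.
    """
    if not solutions:
        return []
    s_min = min(stitches for _, stitches in solutions)
    seen, kept = set(), []
    for coloring, stitches in solutions:
        if stitches != s_min:
            continue  # more stitches than necessary
        # Solutions that agree on all non-immune features are redundant.
        key = tuple(sorted((f, c) for f, c in coloring.items() if f not in immune))
        if key not in seen:
            seen.add(key)
            kept.append(coloring)
    return kept
```

Of a group of mutually redundant solutions, the sketch keeps the first one encountered; the source does not say which representative should be kept.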

Fig. 5. Flow of the cell decomposition process.

Fig. 6. (a)-(e) Clusters of points. The width of feature b is smaller than $2F_{\min}$ in (a). A K4 subgraph is highlighted in (b) and (c).

4) Native Conflict: There can be native conflicts inside a standard cell, which must be removed before coloring. Both native conflict checking and removal are still open problems. In [13], the native conflicts are removed manually. In the following, we will propose a systematic method to check and remove the native conflicts. To the best of our knowledge, both appear in the literature for the first time. Although, theoretically, not all conflicts can be identified, the methods perform well in practice.

B. Problem Formulation

We formulate the decomposition problem for the standard cells as follows.

Problem 1: Given a standard cell layout, remove all detected native conflicts and find all nonredundant coloring solutions using the minimum number of stitches.

C. Decomposition Flow

As shown in Fig. 5, with the input layout of a cell, we will first check whether there is any native conflict caused by K4. If any native conflict is detected, we will remove it by modifying the layout as slightly as possible. We will then construct a conflict graph and a solution graph (Fig. 2). After that, we will try to find a path in the solution graph. If a path exists, the cell can be decomposed without using any stitch. Otherwise, we will identify all the stitch positions, and split the features in the layout into subfeatures. The updated layout will then be input into the decomposer again. Graph construction and path finding will be performed again. In the solution graph, we will find a path with the minimum number of stitches. If any path can be found, the cell can be decomposed with some number of stitches.
After finding the minimum stitch number (0 or more), all the solutions with that number of stitches will be enumerated. After that, the redundant solutions will be identified and removed. Finally, all the remaining solutions will be output. We will describe each step in more detail below.

D. Native Conflict Checking

It was observed in [8] that a K4 between four points, e.g., the four red points labeled 1, 2, 3, and 4 in Fig. 6(a), is a very common case of native conflict. Note that here, we refer to a K4 between four points instead of between four features. A K4 between four features (Fig. 4) may be resolvable by stitches but a K4 between four points will be a native conflict. In the following, K4 always refers to a complete graph between four points. K4 can also be found in subgraphs [Fig. 6(b) and (c)] and the corresponding layouts will be undecomposable as well. Fig. 6(d) does not contain any K4 and is three-colorable. We found that checking K4 between points can be an effective way to identify many native conflicts. Besides, two points not separable by any stitch should be considered as the same point. For example, points 4 and 5 in Fig. 6(a) should be considered as the same point. This helps us to identify more native conflicts.

Given a cell layout, we first generate the conflict graph, which is then simplified with the methods in [6]. Note that in our case, the power and ground rails will be colored the same, so we will merge the nodes representing the power and ground rails. Corresponding edges will also be merged. After simplification, we will divide the features into grids such that all the points in one grid are not separable by a stitch. Suppose G is the simplified conflict graph and GR is the set of all the grids. We define the set of all quadruples as $\{(a, b, c, d) \mid a, b, c, d \in GR\}$.
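The quadruple search set up here can be sketched as a brute-force enumeration. The sketch assumes the two conditions stated in the text that follows: the four grid points lie in four different, pairwise conflicting features, and every pair of points is closer than $C_{\min}$. Representing each grid by a single (feature, coordinate) point, the name `find_k4`, and the use of Euclidean distance are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist

def find_k4(points, feature_conflicts, c_min):
    """Return every quadruple of points that forms a K4.

    points:            list of (feature_name, (x, y)), one entry per grid.
    feature_conflicts: set of frozenset({f1, f2}) edges of the conflict graph G.
    c_min:             minimum distance at which two points may share a mask.
    """
    hits = []
    for quad in combinations(points, 4):
        if all(
            p[0] != q[0]                                   # different features
            and frozenset((p[0], q[0])) in feature_conflicts  # conflict edge in G
            and dist(p[1], q[1]) < c_min                   # points too close
            for p, q in combinations(quad, 2)
        ):
            hits.append(quad)
    return hits
```

The enumeration is quartic in the number of grids, which is tolerable here only because a single standard cell contains few features.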
We try to find, through enumeration, a quadruple (a, b, c, d) that forms a K4 and thus possibly causes a native conflict, subject to the following constraints: 1) a, b, c, and d are in different features A, B, C, and D, and there is a conflict edge between every pair of A, B, C, and D in G and 2) the distance between every pair of a, b, c, and d is smaller than $C_{\min}$. This enumeration only takes seconds for each cell in our experiments.

Theoretically, not all the native conflicts are related to K4, e.g., the cluster of points shown in Fig. 6(e) also has a native conflict but it does not contain any K4. Identification of all possible native conflicts is still an open problem. However, the existence of K4 leads to most of the native conflict cases. For example, in our experiments, all the cells with native conflicts can be identified by this method of checking K4 (this is confirmed because all the other cells in the library can be decomposed successfully). If there is a native conflict not caused by K4, it cannot be detected by our native conflict detection method, and thus it cannot be removed by our conflict removal method. However, we will be aware of this because we will not be able to find any legal solution after we build a solution graph. In this case, we can call another TPL decomposer such as [8] to get a decomposition result with unresolved conflicts.

E. Native Conflict Removal

If a cell layout has a native conflict, we need to modify its layout as slightly as possible to get rid of the conflict before it can be decomposed. The modified layout must also
satisfy the following design rules: 1) minimum spacing rule between features (MSR) and 2) minimum boundary margin rule (MBMR). Manual modification as in [13] may only handle simpler layouts and can easily lead to violations of the above two rules for complicated ones.

Algorithm 1. Native Conflict Removal.

Fig. 7. Illustration for native conflict removal. (a) Conflict points, critical points and critical edges. (b) Prohibited region and legal region.

In our approach, we will try to move only one feature if possible to get rid of each K4 conflict subgraph. In any K4 conflict subgraph, there are four conflict points. Note that we only need to delete one edge from this K4 to remove it. To achieve this, we will look at all pairs of conflict points. For each pair, we will select one feature, called x, to be moved. For example, in Fig. 7(a), we may select feature c in the conflict pair (point 1, point 4) to be moved (where points 1-4 form a K4). First, we will locate the feature edges that will potentially conflict with c with respect to the MSR when c is moved. We call these feature edges critical edges. To identify these critical edges, we will move x in the directions away from the corresponding conflict point. For the example in Fig. 7(a), feature c will be moved upward and leftward away from point 4 and we will be able to identify the two critical edges. Then, we will identify the points on x that are closest to these critical edges and we call them critical points. The critical point corresponding to the two critical edges is highlighted in Fig. 7(a). For each critical edge, we will compute a legal region for feature x with respect to the conflict point on it, according to the relative position between the critical edge and the corresponding critical point.
Taking the intersection of the legal regions with respect to all critical edges gives a final legal region for the conflict point on feature x, i.e., point 1 in the example of Fig. 7. We also need to compute a prohibited region [Fig. 7(b)], which is the region of all points within a distance of $C_{\min}$ from the other conflict point of the pair in the K4, i.e., point 4 in the example of Fig. 7. Finally, the prohibited region is subtracted from the legal region to give the feasible region for x. The flow is shown in Algorithm 1. Notice that the MBMR can be easily satisfied by adding two virtual features on the left and right boundaries of a cell.

The advantage of obtaining the feasible regions instead of computing a single position is that many criteria can be considered simultaneously when shifting the feature. For example, we can choose a position by considering the distance moved, decomposability after modification (e.g., whether another native conflict is generated and the number of stitches needed for decomposition), size of the moved feature (a smaller one is preferred), and the impact on the cell functionality (e.g., timing). In our implementation, our choice is based on the first two criteria, i.e., distance moved and decomposability, and all the native conflicts can be removed successfully. However, if there is no feasible region at all, it means that moving only one feature cannot resolve the problem. In this case, assuming feature x has no feasible region because moving x will violate MSR with another feature y, we may consider moving y first to give x more freedom.

After modifying the M1 layer, other layers need to be checked to make sure that the moved wires are connected to other layers properly, and we have verified this for every modified cell. It is true that the timing will change with the modified metal layout.
However, from the experiments, we can see that there are only a small number of cells that need to be modified, and the changes we made to the standard cells are also very small. To keep the timing change as small as possible, different modifying solutions, i.e., different positions in the feasible region calculated by Algorithm 1, can be evaluated with a timer. Actually, it has been shown in [13] that the delay change due to some similar modifications is <1%. Note that if we choose not to modify the standard cells with native conflicts, it will result in many native conflicts in the layout, which is highly undesirable. The best way to solve this problem is to design the standard cell libraries with TPL awareness.

F. Stitch Identification

Complex layouts need stitching to be decomposed. Most previous works including [13] use the projection method to identify candidate stitches but many stitches in TPL may be missed [6]. A stitch finding method was proposed in [8]. For each feature, all the stitches that are legal in TPL were found and then one stitch was chosen to solve the conflict. However, for some very complicated layouts, the whole layout or even one feature may need multiple stitches. The approach in [8] is used to find all the stitches. Then, we consider the distance d between two neighboring stitches in the same feature. If $d < F_{\min}$ (minimum feature size), the two stitches cannot be chosen at the same time
because, otherwise, a feature that is too small will be generated [Fig. 8(a)]. According to [8], all positions within a projection segment are equivalent. Therefore, if two neighboring stitches are too close, they will be shifted to increase the distance in between to at least $F_{\min}$ [Fig. 8(b)]. If this is impossible, we will simply forbid them to be used at the same time by selecting one arbitrarily.

Fig. 8. (a) Two stitches that are too close will violate the design rule. (b) Position of a stitch can be shifted within a segment.

G. Coloring Solutions

After finding all the stitches, a conflict graph and an edge-weighted solution graph will be constructed. To find the minimum stitch number ($S_{\min}$) for a particular cell, we can simply find a shortest path in the solution graph. To find all the paths using $S_{\min}$ stitches, Algorithm 2 is proposed. In Algorithm 2, we enumerate all the solutions by finding all the paths in the given solution graph. Finding all the paths is done by iteratively propagating from one node to all of its successors. Solutions with more than $S_{\min}$ stitches are pruned in the process of propagation. After finding all the paths with the same stitch number $S_{\min}$, the redundant ones according to the discussion in Section III-A2 will be removed.

Algorithm 2. Path Enumeration.

H. Speedup Technique

As shown in Fig. 3(a) and (b), one coloring solution for the cell A can be easily obtained from another solution by exchanging two colors. Observing this, we reduce the size of the solution graph by removing the nodes that can be obtained by exchanging two colors of the other nodes. This accelerates the decomposition process significantly. Also, note that we do not need to run the decomposition algorithm again for the mirrored instance of a cell.

IV. TPL AWARE DETAILED PLACEMENT

A. Lookup Table Construction

After generating the coloring solution set for each cell in the library, we will precompute and store in a table the minimum distance required between any two cells and colorings such that no conflict results (similar to [13]). However, we do not need to calculate the exact distance. Instead, we only need to try multiples of the width of the minimum filler cell from 0 to $C_{\min}$, where $C_{\min}$ is the minimum colorable distance between two features. This method of constructing the lookup table is still applicable when multiple patterning lithography is applied to multiple layers simultaneously, e.g., when the M1 layer needs TPL and the contact layer is also very dense, requiring DPL or even TPL, in the 14/10nm technology node. In this case, the M1 and the contact layers of a standard cell are decomposed separately first. When calculating the required distance between two coloring solutions, we compute a safe distance for M1 layers and contact layers, respectively, and take the larger one.

B. Placement Flow

Instead of considering TPL in the linear placement only as in [13], we propose a framework to consider TPL in the whole detailed placement process. Algorithm 3 shows the flow of our placer, which is developed and extended from [19] and [20]. The input is a legal placement without overlapping but there may be coloring conflicts, so we will first call TPL aware compaction to remove the conflicts by shifting cells in a row and to minimize HPWL at the same time.

Algorithm 3. TPL Aware Detailed Placement.

After this step,
we will maintain a nonconflicting placement all the way. The next two stages will be GlobalMove and LocalMove. GlobalMove globally moves each cell to its optimal region as in [14], [19], and [20], while LocalMove, including VerticalMove, TPL aware reordering, and TPL aware compaction, will locally move a cell to its nearby placement sites or rows to optimize wirelength. Both stages will stop when the improvement in wirelength is too small. One point to note is that whenever we move a cell to a new position, we will legalize the placement to ensure that there is neither overlapping nor coloring conflict. This will be done by our TPL aware legalization. Compared with the approach in [13], we will try to move a cell to its optimal region by GlobalMove and LocalMove. But in [13], a cell will be placed in its original row as much as possible, which may not be good in terms of HPWL and stitch number minimization.

C. TPL Aware Compaction

Given a row of cells, we will decide the coloring solution and the position of each cell simultaneously, such that the HPWL is minimized and the layout is decomposable. In [13], some graph-based approaches were proposed that consider every placement site using a graph of very large size. In the following, we will present an effective TPL aware compaction method based on dynamic programming. Besides, as stitch minimization has already been done in the cell decomposition step, we can obtain further speedup.

Algorithm 4. TPL Aware Compaction.

Fig. 9. Graph models for TPL aware compaction. (a) Graph model for colorability checking. (b) Graph model for cell merging.

The procedure of our compaction algorithm can be found in Algorithm 4. It is based on the classical clumping algorithm [21] and uses the concept of optimal range. The optimal range can be found easily because the HPWL function of the x-coordinate of each cell is convex.
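To illustrate why convexity makes the optimal range easy to find, the following sketch computes it for a single cell under simplifying assumptions that are not taken from the paper: the cell has one pin per net at its own x-coordinate, and each net is summarized by the interval [lo, hi] spanned by its other pins. The net then contributes max(0, lo - x) + max(0, x - hi) to the x-direction HPWL, a convex piecewise-linear function, and the sum over k nets is minimized on the interval between the two middle values of the 2k sorted endpoints. The function names are hypothetical.

```python
def hpwl_increase(x, nets):
    """Extra x-direction HPWL caused by placing the cell's pins at x."""
    return sum(max(0, lo - x) + max(0, x - hi) for lo, hi in nets)

def optimal_range(nets):
    """Return (left, right), the interval of x values minimizing hpwl_increase.

    nets: list of (lo, hi) bounding intervals, one per net, with lo <= hi.
    """
    if not nets:
        raise ValueError("a cell with no nets has no preferred position")
    endpoints = sorted(v for interval in nets for v in interval)
    k = len(nets)
    # With 2k sorted endpoints the slope of the cost is zero between the
    # k-th and (k+1)-th smallest values.
    return endpoints[k - 1], endpoints[k]
```

For three nets with intervals (0, 4), (2, 10), and (6, 8), the sorted endpoints are 0, 2, 4, 6, 8, 10, so the optimal range is [4, 6].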
The major difference between our compaction algorithm and the classical clumping algorithm is that when we put a cluster (which can be one or more cells) at its optimal range (in terms of HPWL minimization), we will not only check for cell overlapping but also make sure that all the placed cells on the left of the current cluster can be colored without any conflict. This is called colorability check (procedure IsColorable). If this is not the case, we will merge the current cluster with its left neighbor to form a new cluster (procedure MergeCell) as in the classical clumping algorithm and continue with the placement process. However, this merging process also needs to consider coloring. Details of the colorability check and this merging step are elaborated below.

Colorability check (procedure IsColorable) is based on a graph model as shown in Fig. 9(a), in which each column of nodes represents the coloring solutions of a cell. For any two neighboring nodes (they represent coloring solutions of two neighboring cells), if the current distance between the two cells is not less than the minimum distance required between the two corresponding coloring solutions, there will be a directed edge between them. After constructing this graph, we will trace the graph, starting from a virtual source s, until reaching the first node of the row or a node that has an edge connected to every coloring solution of its left neighbor, e.g., node k in Fig. 9(a). Reaching such a node means that the corresponding cell with such a coloring solution is at a safe distance from its left neighbor no matter what coloring solution its left neighbor is taking. If such a path can be found, the procedure IsColorable will return true; otherwise, it will return false.

When we need to merge a set of cells (procedure MergeCell) to form a cluster, we need to determine the colors and the inter-cell distances at the same time. A simple graph model as in Fig.
9(b) is constructed in which each cell is represented by a column of nodes that correspond to its different coloring solutions. An edge exists between every pair of neighboring nodes with a weight representing the minimum distance required between the two corresponding coloring solutions. A shortest path algorithm is then applied to find the colors of the cells giving the smallest total width of the cluster, e.g., a shortest path is highlighted in Fig. 9(b). It is possible that some cells are out of the boundary after compaction. In this case, these cells will be marked and moved to other rows in the steps of GlobalMove and VerticalMove. We now analyze the time complexity of Algorithm 4 by assuming that the total number of input cells is n, the maximum number of coloring solutions of one input cell is t, and the total number of the nets connected to all the input cells

8 1326 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 4, APRIL 2016 Fig. 10. TPL aware reordering. (a) Left and right boundaries in conventional reordering. (b) Old and new boundaries in TPL aware reordering. c 1 c 3 are placed evenly within old boundaries, and a conflict between c 1 and c 2 is caused. (c) c 1 is shifted within the new boundaries to resolve the conflicts. is m. It is clear that at most n 1 clusters could be formed by merging cells. Every time when a new cluster is formed, it means that we have done the following: 1) placing a cluster in its optimal range; 2) checking for overlap; 3) procedure IsColorable; and 4) procedure MergeCell. Before placing a cluster in its optimal range, the optimal range should be calculated by considering the bounding box of every net connected to this cluster, and the time complexity of this step is linear to m [19]. Checking for overlap takes constant time. The time complexities of procedures IsColorable and MergeCell are both O(nt 2 ), which can be derived from the graph models in Fig. 9. Thus, the time complexity of Algorithm 4 is O(n(m + nt 2 )). D. TPL Aware Reordering Similar to [20], in TPL aware reordering, we try all six possible orders of every three consecutive cells and use the best one with the smallest HPWL. In the preliminary work of this paper [1], the cells are not made colorable immediately after choosing an order, and compaction is used to make the cells of the whole row colorable after reordering. It is possible that an order is chosen but the three cells or adjacent cells are not colorable, so the benefit of HPWL reduction cannot be obtained after compaction. To solve this problem, in our new method, we always maintain the colorability during the process of reordering. 
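The enumeration of the six orders can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation of this paper: each cell is a single pin at its left edge, the cells keep fixed colorings, the three cells are spaced evenly between fixed left and right boundaries, and min_gap(a, b) is a hypothetical callback that returns the minimum edge-to-edge spacing required when cell a is placed immediately to the left of cell b. An order that violates the required spacing is simply rejected here; no shifting within extended boundaries is attempted.

```python
from itertools import permutations


def net_cost(x, net_boxes):
    """x-direction HPWL of a cell's nets with the cell pin at x."""
    return sum(max(right, x) - min(left, x) for left, right in net_boxes)


def best_order(cells, left, right, min_gap):
    """Try every order of the cells and keep the feasible one with least cost.

    cells: dict mapping a cell name to (width, net_boxes).
    Returns (cost, order, positions), or None if every order conflicts.
    """
    total_width = sum(width for width, _ in cells.values())
    # Even placement: identical spacing between adjacent cells.
    gap = (right - left - total_width) / (len(cells) - 1)
    best = None
    for order in permutations(cells):
        # Reject the order if any adjacent pair is closer than required.
        if any(gap < min_gap(a, b) for a, b in zip(order, order[1:])):
            continue
        x, cost, positions = left, 0, {}
        for name in order:
            width, boxes = cells[name]
            positions[name] = x
            cost += net_cost(x, boxes)
            x += width + gap
        if best is None or cost < best[0]:
            best = (cost, order, positions)
    return best


cells = {"a": (2, [(20, 22)]), "b": (2, [(0, 1)]), "c": (2, [(4, 5)])}
free = best_order(cells, 0, 10, lambda a, b: 0)
blocked = best_order(cells, 0, 10,
                     lambda a, b: 3 if {a, b} == {"b", "c"} else 0)
print(free[1], blocked[1])
```

In the second call, the hypothetical spacing rule forbids b and c from being adjacent at the even spacing, so the cheapest order overall is rejected and the next best feasible order is returned.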
In conventional reordering, when we try to place three consecutive cells in a specified order, the positions of the left boundary and the right boundary are maintained in order not to affect the other cells [Fig. 10(a)], where the left (right) boundary is at the left (right) boundary of the leftmost (rightmost) cell. The three cells are then placed evenly within the boundaries (called old boundaries). In TPL aware reordering, we keep the colors of the cells and extend the left and right boundaries to give more flexibility to resolve possible coloring conflicts. As shown in Fig. 10(b), the new left (right) boundary is at a safe distance from $c_l$ ($c_r$) according to their colorings. In this way, no matter how $c_1$ to $c_3$ are placed, as long as they are within the new boundaries, $c_l$ and $c_r$ will not be affected. We still try to place the three cells evenly within the old boundaries first. If there is any conflict, we shift $c_1$ or $c_3$ (or both) slightly within the new boundaries to resolve the conflict. If this fails, the corresponding order will not be chosen.

Fig. 11. Illustration for TPL aware legalization. (a) Two cases that need to be legalized. (b) Graph model for TPL aware legalization. The path in blue optimizes the cells to the left of $c_i$ and the path in green optimizes the cells to the right of $c_i$.

E. TPL Aware Legalization

TPL aware legalization is invoked when a cell is moved to a new position, which happens in GlobalMove and VerticalMove [19], [20]. In this legalization, we should shift the cells on the left and on the right of the newly inserted cell $c_i$ as little as possible to remove overlapping and coloring conflicts. As shown in Fig. 11(a), when a cell $c_i$ is inserted, it may overlap or have a coloring conflict with, say, its left neighbor $c_{i-1}$; we then shift $c_{i-1}$ to the left. After shifting, we check whether $c_{i-1}$ overlaps or has coloring conflicts with its left neighbor $c_{i-2}$. If this is the case, we further shift $c_{i-2}$. This process continues until we reach the first cell or a cell that does not need to be shifted because neither overlapping nor coloring conflict occurs. Shifting of the cells on the right of $c_i$ is similar. Here, we need to consider the cell colors when shifting cells, as the colors of the shifted cells affect the displacement required. Therefore, we need to perform cell shifting and color selection at the same time, to minimize the total displacement from the original placement as much as possible. In the following, we introduce the details of our algorithm that achieves this.

Our legalization algorithm is again based on dynamic programming using a graph model, as shown in Fig. 11(b). Here, $c_i$ is the newly inserted cell. The graph is similar to that in Fig. 9(b) in that each column of nodes represents the different coloring solutions of a cell and every pair of neighboring nodes is connected by a directed edge. Starting from the virtual source s, we process the cells from right to left. Based on the concept of dynamic programming, at each node x, we calculate the minimum total displacement incurred if the coloring of node x is used, based on the costs of the different coloring solutions of its right neighbor. This computation stops when we reach the first cell or a cell $c_j$ such that we need neither to shift $c_j$ nor to change the color of $c_j$. In the former case, if the first cell is moved out of the row boundary, the newly inserted cell cannot be legalized and the process is terminated. In the latter case, all the cells to the left of $c_j$ will not be affected. After this computation, we locate the path with the minimum cost, i.e., the minimum total displacement. With this path, we can trace back the coloring solutions and positions of all the shifted cells, including the newly inserted cell $c_i$. By now, the cells to the left of $c_i$ have been legalized and colored [see Fig. 11(b) (blue path)].

Then, we repeat the same process for the cells to the right of $c_i$, using the path in green in Fig. 11(b). Note that when we process the cells to the right of $c_i$, the coloring of $c_i$ is already fixed, so the shaded node in Fig. 11(b) is chosen as the source. By now, all the cells that may be affected by the insertion of $c_i$ have been processed. To optimize further, we perform the whole process again, but this time we first process the cells to the right of $c_i$ and then the cells to the left of $c_i$. Processing the left side first and processing the right side first may give different solutions, and we choose the one with the smaller total displacement.

V. TPL AWARE DETAILED PLACEMENT WITH DISPLACEMENT CONSTRAINT

Modern global placers typically target multiple objectives, such as wirelength, routability, timing, and power consumption, which is too complicated for a detailed placer that does not change a placement globally. Our detailed placer targets wirelength and TPL decomposability but tries to minimize the change made to the global placement result. A displacement constraint is introduced for this purpose [14], [20]. The displacement of a cell from its original position is calculated in Manhattan distance. For any cell $c$, the displacement should not exceed a threshold $Disp$, i.e., $\forall c,\ |x_c - x'_c| + |y_c - y'_c| \le Disp$, where $(x_c, y_c)$ and $(x'_c, y'_c)$ are the current coordinates and the original coordinates of $c$, respectively. It is easy to see that the movable region of a cell $c$ with displacement constraint, called the displacement region, will typically be in a diamond shape, as shown in Fig. 12(a).
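The diamond shape follows directly from the Manhattan distance bound. The sketch below is only an illustration under assumed names (within_displacement and row_span are hypothetical helpers, and coordinates are taken as plain numbers): it checks the constraint for one cell and computes the x-interval that the displacement region leaves available on a row at a given y-coordinate, which narrows linearly as the row moves away from the original position.

```python
def within_displacement(cur, orig, disp):
    """True if the Manhattan distance from orig to cur does not exceed disp."""
    return abs(cur[0] - orig[0]) + abs(cur[1] - orig[1]) <= disp


def row_span(orig, row_y, disp):
    """x-interval of the displacement region on the row at row_y, or None.

    The vertical move to row_y uses up part of the budget; whatever is
    left can be spent moving left or right, which yields a diamond.
    """
    slack = disp - abs(row_y - orig[1])
    if slack < 0:
        return None  # the row lies outside the displacement region
    return orig[0] - slack, orig[0] + slack


origin = (10, 0)
for y in (0, 3, 5, 6):
    print(y, row_span(origin, y, 5))
```

With a budget of 5, the span shrinks from width 10 on the original row to a single point at a vertical distance of 5, and no position is available beyond that.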
In the step of global move, the constrained optimal region for a cell is the overlapped region between its displacement region and its optimal region, as shown in Fig. 12(b), and we need to find a position for $c$ within the constrained optimal region. (Throughout this paper, the coordinate of a cell refers to the coordinate of its bottom left corner, shown as the red point in Fig. 12(a).) In TPL aware compaction, we define the movable range for a cell $c$ in a row under displacement constraint as the displacement range [Fig. 12(c)]. The displacement range of a cluster is a range that considers the movable ranges of all the constituent cells under displacement constraint. The constrained optimal range of a cluster is then calculated by taking the overlapped part of the optimal range and the displacement range [Fig. 12(d)]. In the process of compaction, a cluster is placed in the leftmost position of its constrained optimal range. When a cell is moved for other reasons (vertical move, reordering, or legalization), we check whether the displacement constraint is satisfied after the move. The move is not carried out if any violation occurs.

Fig. 12. Illustration for TPL aware detailed placement with displacement constraint. (a) Displacement region in global move. (b) Constrained optimal region. (c) Displacement range in compaction. (d) Constrained optimal range.

Fig. 13. Example in which there is no legal position for the cluster.

Fig. 14. Example in which there is no legal and colorable placement solution for TPL aware compaction with displacement constraint. (a) The input placement. (b) The placement to avoid conflicts.

Note that it is possible that there is no legal and colorable placement solution for TPL aware compaction with displacement constraint. For example, as shown in Fig. 13, if cluster 2 conflicts with cluster 1 when we place cluster 2, we combine them to form $cluster_{new}$. It may be impossible to place $cluster_{new}$ in such a way that both cluster 1 and cluster 2 are in their displacement ranges. A more intuitive example is shown in Fig. 14. Fig. 14(a) shows the input placement, in which there is a conflict but no overlapping between cells. In order to avoid the coloring conflict, the distance between $c_1$ and $c_n$ must be increased by $d$ [Fig. 14(b)]. Because $c_1$ ($c_n$) can be moved left (right) by at most $Disp$, if $d > 2 \cdot Disp$, it is impossible to have a legal placement with neither a coloring conflict nor a violation of the displacement constraint.

Interestingly, compaction with displacement constraint but without coloring constraint will never encounter the situation in Fig. 13. In the process of displacement constrained compaction, if the coloring constraint is not considered and the input placement is legal, then when two clusters are combined to form a new cluster $cluster_{new}$, the displacement range for $cluster_{new}$ will not be empty.

As shown in Algorithm 3, there are two situations where TPL aware displacement-constrained compaction is invoked. The first situation is that the compaction is called for the first time and the placement input to compaction may not be colorable (see the third line of Algorithm 3). In this case, if no legal and colorable placement can be found for some rows under displacement constraint, these rows are marked and we try to resolve the conflicts in the later stages of global move and vertical move. If the conflicts cannot be resolved even with global move and vertical move, the user is informed that the displacement constraint should be relaxed. The second situation is that the compaction is called after the first round, so the input placement to compaction is colorable (see the second last line of Algorithm 3). In this case, if no legal and colorable placement can be found for a row, the compaction process for this row is skipped.

VI. TPL AWARE DETAILED PLACEMENT TO MINIMIZE DISPLACEMENT

In some cases, the input placement may have already been optimized in terms of wirelength, timing, and so on. The users are thus interested in making the circuit TPL decomposable while preserving the locations of the cells as much as possible. We further extend our TPL aware compaction to handle this by minimizing the total perturbation. We first formulate the problem as follows.

Problem 2: Given a placement of a row, decide the locations and coloring solutions of the cells such that: 1) the order of the cells is kept; 2) no cells are moved out of the row; 3) there is no TPL conflict; and 4) the total displacement of the cells is minimized.

Fig. 15. Model for minimizing total displacement. (a) Model for individual cells. (b) Model for a cluster of cells.

We propose the following model to transform this problem into a problem of minimizing HPWL. As shown in Fig. 15(a), we create a set of nets such that each cell is connected by one and only one net to its original location; minimizing the total displacement is then equivalent to minimizing the total HPWL of the created nets. Any TPL aware algorithm that targets HPWL minimization can be applied here, so we can directly employ the compaction algorithm for this problem. When several cells are combined into a cluster, similar to TPL aware compaction, multiple nets are considered simultaneously to find the optimal range for the cluster, as shown in Fig. 15(b). Given a placement of a design with multiple rows, if we solve Problem 2 for each row using our method, the total displacement of the whole design can be minimized.

VII. EXPERIMENTAL RESULTS

A. Comparison With the Previous Work

We implemented the proposed approaches of standard cell decomposition and TPL aware detailed placement in C++ on a 3.4GHz Linux machine with 32GB memory, and compare them with [13]. For cell decomposition, we use the M1 layers of the Nangate 45nm standard cell library [22], scaled down to 16 nm for comparison with [13]. We also use the same $C_{\min}$ as in [13], and the other parameters are derived from the library. First, we check all the cells in the library for native conflicts; 11 out of 128 cells are found to have native conflicts, while all the others can be decomposed. Then, all the native conflicts are removed using our native conflict removal technique. From the experiments, the maximum displacement is about $C_{\min}/3$. This automatic process is more efficient than the manual approach in [13].
Then, we enumerate all the coloring solutions with the minimum number of stitches for each cell. In most cases, coloring a standard cell takes less than 0.1 s. The advantages of this cell decomposition step can be seen in the placement result.

To evaluate our TPL aware detailed placer, we test it on the benchmarks used in [13]. The statistics of these benchmarks are shown in Table I, where Row#, Cell#, Net#, and Site# are the numbers of rows, movable cells, nets, and placement sites of a row in a benchmark, respectively, and Density is the total area of movable cells divided by the total placeable area in a circuit. The result comparisons are also shown in Table I, where S# represents the stitch number, WL Red. denotes the percentage reduction in HPWL compared with the input placement, and Time is the wall clock time in seconds. The results in the table are in general better than those of the preliminary work [1] because we improved the implementation. Note that a negative HPWL reduction means that the HPWL is increased. S# and WL Red. of [13] are the same as the published results (there are two sets of results in [13], and we choose the one with a better tradeoff between runtime and quality). To compare runtime fairly, we run the executable file provided by Yu et al. [13] on our machine.

It can be seen that for S#, as our approach guarantees the fewest stitches by the cell decomposition step, Yu et al. [13] generate 167% more stitches than us on average. To compare the HPWL and the runtime, we implement two versions of our placer: Compact, which applies only TPL aware compaction, and Full, which is a full run of our placement algorithm. It can be seen that even with only compaction, we can reduce 0.15% more on the HPWL and achieve speedup on average. With the full placer, we can further improve the HPWL and achieve 4.77% more reduction.
The speedup is This much shorter running time is mainly coming from our efficient dynamic programming-based TPL aware compaction and TPL aware legalization. For some benchmarks, [13] will worsen the HPWL, which never happens for both versions of our placer.

TABLE I. STATISTICS OF AND RESULT COMPARISON ON THE BENCHMARKS USED IN THE PREVIOUS WORK

Fig. 16. Performance analysis for TPL aware detailed placement with different displacement constraints. (a) Wirelength. (b) Runtime. (c) Actual maximum displacement. (d) Average displacement.

B. Experiments With Displacement Constraint

To analyze the performance of the TPL aware placer with different displacement constraints, we compare in Table II the placement results under a small and a large displacement constraint with the result without displacement constraint, where MaxD is the actual maximum displacement and AvgD is the average displacement of the cells in a circuit, and the displacement constraints are given in units of placement sites. As the stitch number has been minimized when decomposing the cells, the stitch numbers for the placement results with different displacement constraints are all the same. The conflict numbers for all the testcases are 0.

TABLE II. RESULT COMPARISON WITH SMALL/LARGE DISPLACEMENT CONSTRAINT AND WITHOUT DISPLACEMENT CONSTRAINT

For these data sets, the height of a standard cell row is 8.3 placement sites, so when displacement constraint = 10, a cell may only be moved within its original row or to adjacent rows, whereas when displacement constraint = 100, a cell may be moved ten rows away. It can be seen from the table that when the displacement constraint is small, wirelength may be sacrificed for some data sets in order to make the placement conflict free. When the displacement constraint is large, compared with the case without displacement constraint, the placer achieves 91% of the wirelength reduction with only 69% of the running time, 45% of the actual maximum displacement, and 88% of the average displacement.

We further analyze the effect of a wider range of displacement constraint values on three data sets: alu-70, ecc-80, and top-90. These have relatively small size and low density, medium size and moderate density, and large size and high density, respectively. The result is shown in Fig. 16. It can be seen that the wirelength reduction, running time, and actual maximum displacement are almost always monotonically nondecreasing with the displacement constraint value. The only exception is that for top-90, the running time when displacement constraint = 90 is even shorter than when displacement constraint = 70. This is because the running time is mainly determined by the problem size and the number of iterations. When displacement constraint = 90, fewer iterations are needed to converge, and the running time may thus be shorter.
It can also be seen that the actual displacement value is always very similar to the given displacement constraint, which demonstrates the robustness of our placer. For the average displacement, it is interesting to see that the average displacement with a larger constraint may be even smaller than the average displacement with a smaller constraint. A possible reason is that when the displacement constraint is smaller, more movement of the cells may be needed to make the placement colorable and to obtain a good wirelength.

TABLE III. STATISTICS OF THE LARGE BENCHMARKS

C. Experiments With Large Benchmarks

As the benchmarks from [13] are all small, we generate some large benchmarks to further demonstrate the efficiency and effectiveness of our placer. We map the standard cell library [22] to the suite A benchmarks of the ISPD2014 placement contest [23] so that we have the layout information of the standard cells in the placement. We only take the benchmarks with density not larger than 90%, as otherwise the layout may be too dense to be colorable. The benchmarks are first placed with the global placer Ripple2 [24] and the detailed placer RippleDP [20], and then input to our placer. The statistics of the benchmarks are shown in Table III.

The results on the benchmarks in Table III are shown in Table IV, where C# is the number of conflicts between cells. (We do not compare with [13] on the large benchmarks because the provided binary cannot handle these benchmarks.) There are again two versions of our placer: 1) Full and 2) Compact. It can be seen that with only compaction, some dense and large-scale data sets are uncolorable, with a large number of remaining conflicts, which shows the necessity of the other steps in our placer. As these benchmarks are much larger than those in Table II, we set the displacement constraint to 40 placement sites, which is the height of four rows. A full run of our placer can always robustly satisfy the displacement constraints while making the placement colorable. Without the constraint, it can achieve a larger wirelength reduction, but the perturbation to the input placement is also more severe.

TABLE IV. RESULTS OF THE LARGE BENCHMARKS

TABLE V. ROUTABILITY OF THE PLACEMENT RESULTS

Then, we analyze the significance of the displacement constraint. As Ripple2 is a routability-driven placer, we use the congestion estimator in NCTUgr [25] to evaluate the congestion of the placement results generated by the TPL aware placer with displacement constraint = 40 and without displacement constraint. The results are shown in Table V, where Ov. is the number of overflows and Diff. is the difference in overflows. It can be seen that with the displacement constraint, the routability of the input placement is preserved well, with a 1.84% reduction in congestion on average, while without the displacement constraint, the congestion is increased by 35.5% on average. This is because with the displacement constraint, the perturbation to the input placement is under control. The slight reduction in congestion may come from the optimization of wirelength.

TABLE VI. RESULTS FOR MINIMIZING TOTAL DISPLACEMENT

Fig. 17. Examples for cross row coloring conflicts. (a) The case when stitch is not considered. (b) The case when stitch is considered.

D. Experiments on Minimizing Total Displacement

In this section, we demonstrate the efficiency and effectiveness of our TPL aware compaction method on minimizing total displacement.
The results are shown in Table VI, where TotD is the total displacement, Method1 is our method that minimizes the total displacement, and Method2 is one round of TPL aware compaction without minimization of the total displacement (normally there are several rounds of compaction until the improvement is insignificant). The first six benchmarks are generated by randomly taking a sequence of cells from a row or a part of a row of pci_bridge32_1, and the other benchmarks are those in Table IV that can be made colorable with compaction only. It can be seen that the total displacement of Method1 is much smaller than that of Method2 even for very small benchmarks. The running time of Method1 for matrix_mult is longer than that of Method2 because there are more cells that need clustering in Method1 for this particular benchmark.

VIII. CONCLUSION

In this paper, we propose a flow to co-optimize the cell layout decomposition and detailed placement. Its efficiency and effectiveness are verified by extensive experiments. To the best of our knowledge, all the existing TPL aware placers assume no conflict between cells in different rows.


However, with the scaling of the minimum feature size, coloring conflicts may occur across standard cell rows. The minimum possible distance between two features in different rows is $d_1 = 2d_{\min} + 2w_{pg}$, as shown in Fig. 17(a), where $d_{\min}$ is the minimum distance between features with respect to the design rules and $w_{pg}$ is the width of the power/ground rail. $w_{pg}$ varies from one to two times the minimum feature width in different standard cell libraries. It can be seen that if $d_1$ is smaller than the minimum colorable distance $C_{\min}$, there will be cross row conflicts. This is possible in advanced nodes. The situation is even worse when stitches are inserted, as shown in Fig. 17(b), where $d_2$ is only $2w_{pg}$. In the near future, we will extend our placer to consider cross row TPL conflicts. Besides, as identification of all possible native conflicts is still an open problem, we will try to solve this problem in our future work.

REFERENCES

[1] J. Kuang, W.-K. Chow, and E. F. Y. Young, "Triple patterning lithography aware optimization for standard cell based design," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2014, pp
[2] A. B. Kahng, C.-H. Park, X. Xu, and H. Yao, "Layout decomposition for double patterning lithography," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2008, pp
[3] D. Abercrombie et al., "Double patterning from design enablement to verification," Proc. SPIE, vol. 8166, pp X X-14, Oct
[4] B. Yu, K. Yuan, B. Zhang, D. Ding, and D. Z. Pan, "Layout decomposition for triple patterning lithography," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2011, pp
[5] B. Yu, K. Yuan, B. Zhang, D. Ding, and D. Z. Pan, "Layout decomposition for triple patterning lithography," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 3, pp , Mar
[6] S.-Y. Fang, Y.-W. Chang, and W.-Y. Chen, "A novel layout decomposition algorithm for triple patterning lithography," in Proc. ACM/EDAC/IEEE Design Autom. Conf., Jun. 2012, pp
[7] S.-Y. Fang, Y.-W. Chang, and W.-Y. Chen, "A novel layout decomposition algorithm for triple patterning lithography," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 3, pp , Mar
[8] J. Kuang and E. F. Y. Young, "An efficient layout decomposition approach for triple patterning lithography," in Proc. ACM/EDAC/IEEE Design Autom. Conf., Jun. 2013, pp. 69:1-69:6.
[9] B. Yu, Y.-H. Lin, G. Luk-Pat, D. Ding, K. Lucas, and D. Z. Pan, "A high-performance triple patterning layout decomposer with balanced density," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2013, pp
[10] H. Tian, Y. Du, H. Zhang, Z. Xiao, and M. D. F. Wong, "Constrained pattern assignment for standard cell based triple patterning lithography," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2013, pp
[11] Q. Ma, H. Zhang, and M. D. F. Wong, "Triple patterning aware routing and its comparison with double patterning aware routing in 14nm technology," in Proc. ACM/EDAC/IEEE Design Autom. Conf., Jun. 2012, pp
[12] Y.-H. Lin, B. Yu, D. Z. Pan, and Y.-L. Li, "TRIAD: A triple patterning lithography aware detailed router," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2012, pp
[13] B. Yu, X. Xu, J.-R. Gao, and D. Z. Pan, "Methodology for standard cell compliance and detailed placement for triple patterning lithography," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2013, pp
[14] B. Yu et al., "Methodology for standard cell compliance and detailed placement for triple patterning lithography," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 5, pp , May
[15] H. Tian, Y. Du, H. Zhang, Z. Xiao, and M. D. F. Wong, "Triple patterning aware detailed placement with constrained pattern assignment," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2014, pp
[16] T. Lin and C. Chu, "TPL-aware displacement-driven detailed placement refinement with coloring constraints," in Proc. ACM Int. Symp. Phys. Design, Mar. 2015, pp
[17] A. G. Wassal, H. Sharaf, and S. Hammouda, "Placement-aware decomposition of a digital standard cells library for double patterning lithography," Proc. SPIE, vol. 8522, pp , Dec
[18] H. Tian, H. Zhang, Q. Ma, Z. Xiao, and M. D. F. Wong, "A polynomial time triple patterning algorithm for cell based row-structure layout," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2012, pp
[19] M. Pan, N. Viswanathan, and C. Chu, "An efficient and effective detailed placement algorithm," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2005, pp
[20] W.-K. Chow, J. Kuang, X. He, W. Cai, and E. F. Y. Young, "Cell density-driven detailed placement with displacement constraint," in Proc. ACM Int. Symp. Phys. Design, Mar. 2014, pp
[21] A. B. Kahng, P. Tucker, and A. Zelikovsky, "Optimization of linear placements for wirelength minimization with free sites," in Proc. IEEE Asia South Pacific Design Autom. Conf., Jan. 1999, pp
[22] NanGate FreePDK45 Generic Open Cell Library. [Online]. Available: accessed Aug. 7,
[23] ISPD 2014 Detailed Routing-Driven Placement Contest. [Online]. Available: accessed Aug. 7,
[24] X. He et al., "Ripple 2.0: High quality routability-driven placement via global router integration," in Proc. ACM/EDAC/IEEE Design Autom. Conf., Jun. 2013, pp. 152:1-152:6.
[25] W.-H. Liu, Y.-L. Li, and C.-K. Koh, "A fast maze-free routing congestion estimator with hybrid unilateral monotonic routing," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2012, pp

Jian Kuang received the B.E. degree from Sun Yat-sen University, Guangzhou, China, in He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
He was a Software Engineering Intern with Cadence Design Systems, San Jose, CA, USA, in His current research interests include VLSI computer-aided design, physical design automation, and design for manufacturability. Wing-Kai Chow received the B.Sc. degree from The Hong Kong Polytechnic University, Hong Kong, in 2009, and the M.Sc. degree in computer science from The Chinese University of Hong Kong, Hong Kong, in 2012, where he is currently pursuing the Ph.D. degree. He was a Research Assistant with The Hong Kong Polytechnic University in 2010, and The Chinese University of Hong Kong, from 2010 to His current research interests are design automation of VLSI, including placement and routing. Evangeline F. Y. Young received the B.Sc. and M.Phil. degrees in computer science from The Chinese University of Hong Kong (CUHK), Hong Kong, and the Ph.D. degree from The University of Texas at Austin, Austin, TX, USA, in She is currently a Professor with the Department of Computer Science and Engineering, CUHK. She is actively working on floorplanning, placement, routing, design for manufacturability, and algorithmic designs. Her current research interests include algorithms and computer-aided design of VLSI circuits. Dr. Young served on the organization committees of the International Symposium on Physical Design, the International Symposium on Applied Reconfigurable Computing, and the International Conference on Field Programmable Technology, and the program committees of several major conferences, including the Design Automation Conference, the International Conference on Computer-Aided Design, the International Symposium on Physical Design, the Asia and South Pacific Design Automation Conference, the Design, Automation & Test in Europe Conference, and the Great Lakes Symposium on VLSI. 
She also served on the Editorial Boards of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, ACM Transactions on Design Automation of Electronic Systems, and Integration, the VLSI Journal.
