Kernelization for Cycle Transversal Problems

Kernelization for Cycle Transversal Problems Ge Xia Yong Zhang Abstract We present new kernelization results for two problems, s-cycle transversal and ( s)- cycle transversal, when s is 4or 5. We showthat 4-cycle transversal and 4-cycle transversal admit 6k 2 vertex kernels in general graphs. We then prove NP-completeness of s-cycle transversal and ( s)-cycle transversal in planar graphs for s > 3. We show the following linear vertex kernels in planar graphs a 74k vertex kernel for 4-cycle transversal; a 32k vertex kernel for ( 4)-cycle transversal; a 266k vertex kernel for ( 5)-cycle transversal. Keywords: Kernelization, Planar Graphs, Cycle Transversal 1 Introduction Graphs that are free of cycles (not just induced cycles) of a given length s are extensively studied in extremal graph theory [2, 9, 10, 13, 17]. When s is small(s 5), such graphs are examined for their chromatic numbers [19, 20], adapted list coloring [3], and M-degrees [7]. For small s value, such graphs also have interesting applications. For example, the class of 4-cycle-free Taner graphs plays an important role in designing Low-Density Parity-Check (LDPC) codes [18]. In this paper we study the problem of obtaining a maximum subgraph without cycles of a given length s through edge deletions. This problem is equivalent to the following edge transversal problem. We say an edge e covers a cycle C if e is one of the edges in C. s-cycle transversal: Given an undirected graph G and an integer k, is there a set S of at most k edges in G such that every cycle in G of length s is covered by at least one edge in S? We shall call S a transversal set. s-cycle transversal is known to be NP-complete in general graphs [23]. Approximation algorithms for s-cycle transversal have been studied in Krivelevich [16] and Kortsarz et al. [15]. Krivelevich [16] presented a linear programming-based 2-approximation algorithm for 3- cycle-transversal. Kortsarz et al. [15] showed that the approximation ratio 2 is likely the best possible by showing that a (2 ǫ)-approximation algorithm for 3-cycle-transversal implies a (2 ǫ)-approximation algorithm for vertex cover, which is highly unlikely [14]. Kortsarz et al. [15] also gave a generalized (s 1)-approximation algorithm for s-cycle transversal where s is any odd number. We also study a related problem ( s)-cycle transversal, where one asks for a minimum edge set to cover all cycles of length s for a given s. The approximation algorithms for ( s)-cycle transversal has been studied in [15]. A preliminary version of this paper was published in Proceedings of the 6th International Conference on Algorithmic Aspects in Information and Management (AAIM 2010), Weihai, China, July 2010. Department of Computer Science, Lafayette College, Easton, PA 18042. Email: gexia@cs.lafayette.edu Department of Computer Science, Kutztown University, Kutztown, PA 19530. Email: zhang@kutztown.edu 1

s-cycle transversal has been studied in the context of parameterized complexity. A parameterized problem is a set of instances of the form (x,k), wherexis the inputinstance and k is a nonnegative integer called the parameter. A parameterized problem is said to be fixed parameter tractable if there is an algorithm that solves the problem in time f(k) x O(1), where f is a computable function solely dependent on k, and x is the size of the input instance. When dealing with NP-hard problems in practice, kernelization is a very useful preprocessing technique. The idea of kernelization is to design data reduction rules to reduce the input instance to an equivalent kernel of smaller size. Formally, the kernelization of a parameterized problem is a reduction to a problem kernel, that is, to apply a polynomial-time algorithm to transform any input instance (x,k) to an equivalent reduced instance (x,k ) such that k k and x g(k) for some function g solely dependent on k. It is known that a parameterized problem is fixed parameter tractable if and only if the problem is kernelizable. We refer interested readers to [6, 11] for more details. Note that all the kernel sizes discussed in this paper refer to the number of vertices in the kernels. Brügmann et al. [4] first studied the kernelization of 3-cycle-transversal. They designed data reduction rules to obtain a 6k vertex kernel for 3-cycle-transversal in general graphs. They also proved the NP-completeness of 3-cycle-transversal in planar graphs and gave a 11k/3 kernel for the problem. The vertex version of 3-cycle transversal, where one asks for a minimum vertex set to cover all cycles of length 3, was also studied in the literatures. In particular, Abu-Khzam [1] obtained a quadratic kernel for this problem by reducing it to 3- Hitting Set. Fernau [8] showed this problem can be solved in running time O( V 3 +2.1788 k ). Wahlström [21] further improved the running time to O (2.0755 k ). Using the crown decomposition technique, Abu-Khzam [1] obtained an O(k s 1 ) kernel for the s-hitting set problem. Since s-cycle transversal and( s)-cycle transversal naturally reduce to s-hitting set, Abu-Kham s result for s-hitting set implies O(k s 1 ) vertex kernels for s-cycle transversal and ( s)-cycle transversal in general graphs for s > 3. In this paper we improve this kernelization upper bound for the case s = 4, i.e., 4-cycle transversal and ( 4)-cycle transversal admit 6k 2 vertex kernels in general graphs. Note that our result is hard to obtain by a direct application of 4-Hitting set kernelization, since in the case of 4-hitting set a quadratic bound in the number of vertices in the hypergraph will lead to an O(k 4 ǫ ) bound on the overall kernel size, which is very unlikely due to a hardness result in Dell and van Melkebeek [5]. Next we study the complexities of s-cycle transversal and ( s)-cycle transversal in planar graphs. We show that when s 3 these two problems remain NP-complete in planar graphs, by generalizing the NP-completeness proof of 3-cycle-transversal in planar graphs [4]. We then present a set of kernelization results for s-cycle transversal and ( s)-cycle transversal in planar graphs a 74k vertex kernel for 4-cycle transversal; a 32k vertex kernel for ( 4)-cycle transversal; a 266k vertex kernel for ( 5)-cycle transversal. All our kernels in planar graphs are linear in size. Our kernelization results in planar graphs are based on the region-decomposition technique developed by Guo and Niedermeier [12]. Using this technique we can decompose a planar graph into a number of regions (linear in k), and bound the number of vertices in each region by a constant. However, when s > 5, it seems the standard region-decomposition technique is not strong enough to obtain linear kernels for ( s)- cycle transversal. In a separate paper [22], we developed an enhanced region-decomposition technique, in which the decomposition is based on a special set of shortest paths called witness paths, to deal with this problem. Using the enhanced region-decomposition technique, we are able to obtain a linear kernel of size 36s 3 k for ( s)-cycle transversal when s > 5. Next, we present several necessary definitions and some backgrounds. We only consider simple 2

and undirected graphs. All paths and cycles considered in this paper are simple. A path P in a graph G is a sequence of vertices P = (v 0,v 1,,v l ) such that v i 1 and v i are adjacent for all 1 i l. The length of P is the number of edges in P. A path Q is called a sub-path of P if Q is a subsequence of P; or equivalently, we say that P contains Q. A cycle is a closed path where the first vertex is the same as the last vertex in the sequence. For example C = (a,b,c,d,a) is a cycle of length 4 containing edges (a,b), (b,c), (c,d), and (d,a). For a set of edges T, if both cycles C 1 and C 2 contain T, we say T is shared by C 1 and C 2. If C 1 and C 2 have no other common edges, we say C 1 and C 2 are edge-disjoint-sharing T. Let W be a set of cycles in G, we use E(W) to represent the set of edges in cycles in W, and G[W] to represent the subgraph induced by vertices in cycles in W. For all the s-cycle transversal problems with various s values discussed in this paper, we assume the input graph has been preprocessed by removing all vertices and edges that are not contained in any s-cycle. It is clear that this preprocessing is correct and can be done in polynomial time. 2 Kernels in General Graphs First we show that 4-cycle transversalin general graphs admits a 6k 2 kernel. Let G = (V,E) be the input graph after the above mentioned preprocessing. We construct a witness structure W in G, which is a subgraph of G whose size does not exceed our target kernel bound. Let W be a maximal set of 4-cycles in G that share pairwise at most one edge. We enumerate all 4-cycles in G and compute W using a simple greedy algorithm. Reduction Rule 1: For any edge e in a 4-cycle in W, if e is shared by k other 4-cycles in W, then delete e from G, remove all 4-cycles containing e from W, and decrease k by 1. It is clear that Reduction Rule 1 can be applied in polynomial time. Lemma 2.1. Reduction Rule 1 is correct. Proof. Since W is a maximal set of 4-cycles that share pairwise at most one edge, the k + 1 4-cycles are pairwise edge-disjoint-sharing e. If G has a transversal set S of size k, then e must be in S, since otherwise at least k+1 edges are needed to cover the k+1 4-cycles. Thus we can safely delete e and decrease k by 1. Let G be the reduced graph that cannot be further reduced by Reduction Rule 1. Let Q := G G[W]. We show that G has at most 6k 2 vertices, by bounding the sizes of both W and Q. Theorem 2.2. 4-cycle transversal admits a 6k 2 problem kernel. Proof. We show that if G has more than 6k 2 vertices, then G does not have a transversal set of size k. First observe that if W contains more than k 2 4-cycles, then G does not have a transversal set of size k, since by Reduction Rule 1 any edge in G covers at most k 4-cycles in W. This implies that G[W] has no more than 4k 2 vertices, because otherwise W contains more than k 2 4-cycles. In the rest of the proof, we show that Q has no more than 2k 2 vertices. The total number of vertices in G is at most 4k 2 +2k 2 = 6k 2. First observe that Q induces an independent set. Suppose that there is an edge e in Q. e must belong to a 4-cycle C. Since both end points of e are not in W, at most two vertices of C 3

are in W, which implies that C shares at most one edge with any 4-cycle in W. Thus C should be included in W due to the maximality of W. Let v be a vertex in Q and let C v be a 4-cycle containing v. There exists a 4-cycle C W that shares at least two edges with C v. We call v a Q-neighbor of C in this case. Note that if v is a Q-neighbor of a 4-cycle C = (a,b,c,d,a) W, then v must be connected to either both a and c, or to both b and d. If C has more than two Q-neighbors, then at least two of them, denoted by x and y, are connected to the same vertex pair {a,c} or {b,d}. Without loss of generality, assume that both x and y are connected to both a and c. Then (x,a,y,c,x) is a 4-cycle that does not share an edge with 4-cycles in W and thus should be included in W. This contradicts the that x and y are not in W. Since every 4-cycle C in W has at most two Q-neighbors, and each vertex in Q is a Q-neighbor of some 4-cycle in W, the number of vertices in Q is at most 2 W 2k 2. This completes the proof. For an input graph G of ( 4)-cycle transversal, we compute a witness structure W, which is a maximal set of 3-cycles and 4-cycles that share pairwise at most one edge. For any edge e in a 3-cycle or 4-cycle in W, if e is shared by k other 3 or 4-cycles in W, we delete e from G and decrease k by 1. The arguments in the proof of Theorem 2.2 still apply. Thus we have Corollary 2.3. ( 4)-cycle transversal admits a 6k 2 problem kernel. Our quadratic kernels improve the O(k 3 ) kernel implied by Abu-Khzam s kernelization algorithm for s-hitting set [1]. The concept of a witness structure and Reduction Rule 1 are similar to the concept of weakly related edges and the high occurrence rule used in [1]. The use of problem-specific structures (e.g., Q-neighbours) is crucial in getting the improved kernel upper bounds. 3 NP-Completeness in Planar Graphs s-cycle transversal for any fixed s 3 is known to be NP-complete in general graphs [23]. Brügmann et al. [4] showed it is NP-complete in planar graphs when s = 3, through a reduction from Vertex Cover in cubic planar graphs, a known NP-complete problem. For an input planar cubic graph G of Vertex Cover, they constructed a graph G by replacing vertices and edges in G with special vertex gadgets and edge gadgets such that finding a minimum edge set to cover all triangles in G is equivalent to finding a minimum vertex cover for G. Their proof can be generalized to yield the NP-completeness for s-cycle transversal in planar graphs for any fixed s 3. In particular, we modify the vertex and edge gadget construction to obtain a graph G such that finding a minimum edge set to cover all cycles of length s in G is equivalent to finding a minimum vertex cover for G. Theorem 3.1. s-cycle transversal for any fixed s 3 is NP-complete in planar graphs of maximum degree seven. Proof. (Sketch) Let (G,k) be the input of Vertex Cover, where G = (V,E) is a planar cubic graph. We build a graph G as follows. For each u V, construct a vertex gadget B u as shown in Figure 1(a). A solid edge in B u represents a single edge, a dashed edge represents a chain of length s 2, and the dotted edge between b and e represents a chain of length s 3. The three edges (a,c),(b,d),(c,d) are docking edges that are used to connect with the edge gadgets to be defined as below. For each edge {u,v} E, construct an edge gadget, which is a cycle C {u,v} = {x,y,z,...} of length s. The vertex gadgets are then connected through edge gadgets as 4

a x z y e b f g h c x x d y z y z (a) (b) (c) Figure 1: The vertex and edge gadgets used in the reduction from Vertex Cover to s-cycle transversal. illustrated in Figure 1(b). For example, a portion of G as depicted in Figure 1(c) is represented in G by a structure shown in Figure 1(b). Note that in Figure 1(c) the end vertices of two rightmost edges are not shown; the vertex gadgets for those two vertices are not shown either in Figure 1(b). Again the dashed edges and dotted edges represent chains of length s 2 and s 3, respectively. Since G is a planar cubic graph, G is a planar graph with maximum degree seven. Using similar arguments as in Theorem 2.1 in [4], we can show that (G,k) is a YES-instance of Vertex Cover if and only if (G,k ) is a YES-instance of s-cycle transversal, where k = 6 V +k. Since there is no cycle of length < s in the construction, the above proof can be naturally extended to show NP-completeness of s-cycle transversal in planar graphs. Corollary 3.2. ( s)-cycle transversal for any fixed s 3 is NP-complete in planar graphs of maximum degree seven. 4 Linear Kernels in Planar Graphs We first present a 74k kernel for 4-cycle transversal in planar graphs. Given an input graph G, we say that a 4-cycle C in G is dangling if only one edge of C is shared with other 4-cycles in G. Reduction Rule 2: If there is a dangling 4-cycle C in G, then delete all four edges in C and decrease k by 1. It is easy to verify the following lemma. Lemma 4.1. Reduction Rule 2 is correct and can be applied in polynomial time. Let the reduced graph be G = (V,E). To bound the size of G, we will use the regiondecomposition frameworkdevelopedbyguoandniedermeier[12]. SupposethatG hasatransversal set S of size k. Let V(S) be the set of endpoints of the edges in S. V(S) have at most 2k vertices. We call a cycle an uncover cycle if it is not covered by S. In order to apply the region-decomposition framework, a graph problem has to admit the following distance property. For two vertices u and v, we say the distance d(u,v) is the length of 5

a shortest path between u and v. Similarly, for an edge e = (u,v) and a vertex w, the distance d(e,w) is the minimum of d(u,w) and d(v,w). Definition 4.2 ([12]). A graph problem on input G = (V,E) is said to admit the distance property with constants C V and C E if, for every solution set S with vertex set V(S), it holds that, for every vertex u V, there exists a vertex v V(S) with d(u,v) C V, and, for every edge e E, there exists a vertex v V(S) with d(e,v) C E. With respect to V(S), it is clear that 4-cycle transversal admits the distance property (C V = 1,C E = 1). Therefore we can decompose G into a set of regions. Definition 4.3 ([12]). A region R(u, v) between two distinct vertices u, v V(S) is a closed subset of the plane with the following properties: 1. The boundary of R(u,v) is formed by two paths between u and v of length at most 3. The two paths do not need to be disjoint or simple. A vertex is said to be inside R(u,v) if it lies either on the boundary or strictly inside R(u, v). 2. All vertices inside R(u,v) have distance at most 1 to at least one of the vertices u and v. Similarly, all edges whose both endpoints are inside R(u,v) have distance at most 1 to at least one of the vertices u and v. 3. With the exception of u and v, none of the vertices inside R(u,v) are from V(S). A V(S)-region decomposition of G is a set R of regions such that no vertex lie strictly inside more than one region from R. The following Lemma directly follows from Lemma 1 in [12]. Lemma 4.4. There is a maximal V(S)-region decomposition R for the graph G that consists of at most 6k 6 regions. Then we show that there are constant number of vertices inside each region in R and constant number of vertices outside all regions in R. Lemma 4.5. Every region R(u,v) in R contains at most 12 vertices which are not in V(S). Proof. Consider a region R(u, v). We distinguish two cases. Case 1: (u,v) / E, or (u,v) E but (u,v) / S. There may not be an edge between u and v. If there is an edge between u and v, then this edge is not in S, which means u and v are endpoints of two edges e,e S, separately, where both e and e are outside R(u,v). This implies that no edge from S is inside R(u,v). First, there are no degree-one vertices in R(u,v) since they are not involved in any 4-cycle. Second, there are no edges with both end points strictly inside R(u,v). Suppose there is such an edge e with both end points strictly inside R(u,v), e must form a 4-cycle with two other vertices inside R(u,v), thus all the edges in this 4-cycle must be inside R(u,v). This contradicts the fact that no edge from S is inside R(u,v). Therefore all vertices strictly inside R(u,v) must have degree at least two and must connect to boundary vertices. As shown in Fig 2(a), R(u,v) has at most 6 boundary vertices including u and v. For every pair of boundary vertices, there can be only one vertex strictly inside R(u,v) connecting to both of them, otherwise they will form an uncovered 4-cycle strictly inside R(u,v). Out of the 15 pair of boundary vertices, each of the pairs {u,b}, {u,d}, {a,v}, {c,v} cannot share a common neighbor vertex strictly inside R(u,v) since these will result in uncovered 4-cycles; each of the pairs {a,b}, {c,d} cannot 6

a b a b e f u v u v c d c d (a) (b) Figure 2: Two cases in the region decomposition (a) case 1; (b) case 2 share common neighbor vertices since the neighbor vertices will have distance larger than 1 from u and v; at most one of the pairs {u,v}, {a,d}, {b,c} can have a common neighbor due to planarity of the region. Therefore, there can be at most 5 vertices strictly inside R(u,v), as shown in Fig 2(a). Case 2: (u,v) S. The edge (u,v) is either inside R(u,v) or inside a different region between u and v. The analysis for case 1 still applies except that there can be edges with both end points strictly inside R(u,v), such as (e,f) in Fig 2(b) which is contained in a 4-cycle (u,e,f,v,u). Since G is reduced by Reduction Rule 2, this 4-cycle cannot be dangling, therefore at least one of the edges (e,u), (e,f), and (f,v) has to be involved in another 4-cycle. First, it is not possible for (e,f) to be involved in another 4-cycle. If (e,f) forms another 4-cycle (u,f,e,v,u) with (u,v), then we have an uncovered 4-cycle (u,e,v,f,u) insider(u,v). If (e,f) forms another 4-cycle with an edgeother than (u,v) in R(u,v), again this 4-cycle will be uncovered. Without loss of generality, assume that (e,u) is involved in another 4-cycle. It is not possible for (e,u) to be involved in another 4-cycle that contains (u,v). If there is a 4-cycle (u,e,w,v,u) where w is a vertex inside R(u,v), then (e,w,v,f,e) will be uncovered in R(u,v). So (e,u) has to be involved in a 4-cycle that is covered by an edge outside R(u,v). In order to form such a 4-cycle, e must be connected to a boundary vertex, either a, c, or v (connecting to b or d will result in uncovered 4-cycles). Based on this observation, there can be at most three such edges with both end points strictly inside R(u,v) forming 4-cycles with (u,v), see Fig 2(b). It is easy to verify that one more such edge will either result in a dangling 4-cycle or an uncovered 4-cycle in R(u,v). In this case, there are at most 8 vertices strictly inside the region R(u,v). Therefore, there are at most 8 vertices strictly inside R(u,v) and at most 4 vertices on the boundary of R(u,v). Overall, there are at most 12 vertices in R(u,v) that are not in V(S). Lemma 4.6. There are at most 6k 6 vertices lying outside the regions in R. Proof. Let w be a vertex outside any region in R. w has to be involved in some 4-cycle C. Let (u,v) S be the edge that covers C. Without loss of generality, assume that w is adjacent to u, i.e., there exists another vertex w such that C = (u,w,w,v,u). If w is also outside any region, then the path (u,w,w,v) either forms a new region between u and v or should be included in 7

an existing region between u and v, contradicting the maximality of R. Therefore w must be a boundary vertex for some region R(u,v) between u and v. Let (u,a,b,v) and (u,c,d,v) be the boundary paths of R(u,v). w cannot be b or d since otherwise (u,w,w,a,u) or (u,w,w,c,u) are uncovered 4-cycles. Therefore w has to be either a or c. This implies that either (a,v) E or (c,v) E. Note that in this case, w is the only vertex outside all regions that is adjacent to boundary vertices of R(u,v). If there is another such vertex w, then this will result in either uncovered 4-cycles or violation of the planarity of R(u, v). Overall, there can be one outside vertex for each region, this gives the 6k 6 bound. Theorem 4.7. 4-cycle transversal admits a 74k kernel in planar graphs. Proof. To count the total number of vertices, we do not need to count the vertices outside all regions in R. From the analysis in Lemma 4.6, a vertex outside the region R(u,v) will always be adjacent to either u and one of the boundary vertices adjacent to u, or v and one of the boundary vertices adjacent to v. If such an outside vertex exists, we cannot have a vertex strictly inside R(u,v) that is adjacent to the same two vertices since this will result in an uncovered 4-cycle. When we count the total number of vertices, we only need to count them once. There are at most 2k vertices in V(S). By Lemma 4.4 and Lemma 4.5, there are at most 6k 6 regions and at most 12 vertices for each region. Therefore the total number of vertices in G is at most (6k 6) 12+2k = 74k 72. The proof of Theorem 4.7 can be easily adapted to show the following kernelization result for ( 4)-cycle transversal in planar graphs. Corollary 4.8. ( 4)-cycle transversal admits a 32k kernel in planar graphs. Proof. Using the same argument as in the proof of Theorem 4.7, we decompose the planar input graph G into 6k 6 regions. The arguments in Lemma 4.5 still apply. Since there cannot be any uncovered 3-cycle in a region R(u,v), we can further reduce the number of vertices in each region. For case 1 (Figure 2(a)), four of the five vertices strictly inside R(u,v) (except the vertex adjacent to both u and v) form uncovered 3-cycles with boundary edges of R(u, v). Therefore they cannot exist in R(u,v) and there is at most one vertex strictly inside R(u,v). Similarly, for case 2 (Figure 2(b)), each vertex strictly inside R(u, v) either form uncovered 3-cycles, such as e, or cannot exist independently, such as f. The only possible vertex strictly inside R(u, v) will be a vertex that is adjacent to both u and v. So in this case there is also at most one vertex strictly inside R(u,v). Overall, the numbers of vertices inside each region R(u,v) is at most 5, one strictly inside R(u,v) and four on the boundary. We claim there are no vertices lying outside the regions in R. Suppose there exists such a vertex w. If w forms in a 3-cycle with u and v, then w should be included as a boundary vertex of R(u,v). If w forms a 4-cycle which is covered by the edge (u,v), then we follow the arguments in Lemma 4.6. With the additional requirement that w cannot be form an uncovered 3-cycle, we see that such w cannot exist. Therefore the total number of vertices in G is at most (6k 6) 5+2k = 32k 30. Next we consider ( 5)-cycle transversal. We say a cycle C is a small cycle if C has length 5. A small cycle C is dangling if only one edge of C is shared with other small cycles. Given two vertices u and v, we call a simple path L from u to v a chain if all vertices in L except u and v are of degree two. We use L to denote the length of L. We assume the input graph G = (V,E) for ( 5)-cycle transversal has been preprocessed such that all edges and vertices which are not involved in any small cycle are deleted. We repeatedly apply each of the 8

following reduction rules until the graph cannot be reduced further. Reduction Rule 3.0: If there is a dangling small cycle C in G, then delete all edges in C and decrease k by 1. Clearly Reduction Rule 3.0 is correct. Reduction Rule 3.1: For any two vertices u and v with (u,v) / E, if there are 2 chains L and L from u to v such that L + L 5 and L L, then delete L and reduce k by 1. Lemma 4.9. Reduction Rule 3.1 is correct. Proof. We shall prove that if the graph G has a transversal set of size k, then there exists a transversal set of size k that contains an edge in L. Since L and L form a small cycle, if no edge in L is included in a transversal set, then at least one edge in L must be included. Suppose there is a transversal set S of size k that contains an edge e in L. Let e be an arbitrary edge in L. Form another set S by substituting e with e, i.e., S = S e +e. We claim that S is also a transveral set of size k for G. Obviously S = S = k. If there exists a small cycle C in G S, then C must contain L and a path P from u to v. This implies that P and L also form a small cycle in G S which is not covered by S. This contradicts that S is a transversal set for G. Reduction Rule 3.2: For any two vertices u and v with (u,v) / E, if there are two chains L and L from u to v such that L + L > 5 and L L, delete L. Lemma 4.10. Reduction Rule 3.2 is correct. Proof. LetGandG bethegraphsbeforeandafterl isdeleted. WeshowthatGhasatransversal set of size k if and only if G has one. Since G is a subgraph of G, the only-if part is obvious. Suppose that S is a transversal set of size k for G. Case 1: S does not contain any edge in L. This implies that S is also a transversal set for G. Suppose this is not true, then there exists a small cycle C in G S. C must contain L and a path P from u to v. Then P and L also form a small cycle in G which is not covered by S. This contradicts that S is a transversal set for G. Case 2: S contains an edge e in L and S is not a transversal set for G. This implies that there exists a small cycle C that contains L and a path P from u to v and is not covered by S. Since L + L > 5 (by Reduction Rule 4.1) and P + L 5, we have P < L. Moreover, we show C is the only small cycle that is not covered by S in G, that is, the path P is unique. If there is another path P from u to v such that P and L also form a small cycle in G not covered by S, by the same argument we have P < L. Since P and P are two different paths from u to v and P + P < P + L P + L 5, P and P must form a small cycle (not necessarily including u or v) that is not covered by S. This contradicts that S is a transversal set in G. Pick an edge e in P and form a set S = S e+e, we claim that S is also a transversal set of size k for G. Suppose this is not true and there is a small cycle C that is not covered by S. Then C must contain L and another path P from u to v, which is different from P. Since P < L, P and P must form a small cycle not covered by S. This again contradicts that S is a transversal set in G. 9

Now that S is a transversal set of size k for G and S does not contain any edge in L, by case 1, S is also a transversal set for G. Reduction Rule 3.3: For an edge (u,v) E, if there is a small cycle C containing (u,v) such that for any edge e (u,v) in C, all small cycles containing e must go through both u and v, then delete (u,v) and decrease k by 1. Lemma 4.11. Reduction Rule 3.3 is correct. Proof. Let G and G be the graphs before and after the deletion of (u,v). We show that G has a transversal set of size k if and only if G has a transversal set of size k 1. If S is a transversal set of size k 1 for G, then clearly S (u,v) is a transversal set of size k for G. On the other hand, let S be a transversal set of size k for G, if S contains (u,v), then S (u,v) is a transversal set of size k 1 for G. If S does not contain (u,v), to cover the small cycle C, at least one of the edges other than (u,v) has to be in S. Call this edge e. Then let S = S e + (u,v). We claim S is also a transversal set for G. Suppose there is a cycle C that is not covered by S, then C must contain e and thus goes through both u and v, and C does not contain (u,v). C can be broken into two paths P 1 and P 2 both connecting u and v. Without loss of generality, suppose P 1 contains e, then P 2 and (u,v) form an even smaller cycle, and this small cycle is not covered by S. This contradicts that S is a transversal set for G. Since S is a transversal set for G and S contains (u,v), S (u,v) is a transversal set of size k 1 for G. Clearly allthereductionrulescanbeappliedinpolynomialtime. LetG = (V,E) bethegraph in which all the reduction rules cannot beapplied further. Supposethat G has a transversal set S of size k. Let V(S) be the set of endpoints of the edges in S. V(S) has at most 2k vertices. With respect to V(S), ( 5)-cycle transversal admits the distance property (C V = 2,C E = 1). By the region-decomposition framework (Lemma 1 in [12]), we can decompose G into a set of regions. Lemma 4.12. There is a maximal V(S)-region decomposition R for the graph G that consists of at most 12k 12 regions. Next we bound the number of vertices inside each region and outside all the regions in R. Lemma 4.13. Every region R(u,v) in R contains at most 22 vertices which are not in V(S). Proof. We distinguish two cases. Case 1: (u,v) / E, or (u,v) E but (u,v) / S. Any vertex w strictly inside R(u,v) must belong to a small cycle C. C cannot be totally inside R(u,v), since otherwise C will not be covered by S. Since w is strictly inside R(u,v), C must have either one or two vertices outside R(u,v), which means w belongs to a path of length 2 or 3 and this path connects two boundary vertices of R(u,v). To bound the number of vertices strictly inside R(u, v), we consider all possible paths of length 2 or 3which are strictly insider(u,v) andconnect two boundaryvertices. First we consider paths of length 2. To avoid uncovered small cycle inside R(u,v), a path of length 2 can be only between vertex pairs {u,v}, {a,f}, {b,e}, {c,d}. Due to planarity, only paths between one such pair can exist. Also, only one such path can be between the vertex pair since two such paths will form a 10

a b c R 1 a b c u R 2 v u v R 3 d e f (a) d e f (b) Figure 3: Two cases in the region decomposition (a) case 1; (b) case 2, note that only the structure of the subregion R 2 is drawn here. 4-cycle. This implies that there can be only one vertex w strictly inside R(u,v) such that w only belongs to a path of length 2 connecting two boundary vertices. For paths of length 3, one of the boundary vertices has to be either u or v, due to the distance restriction C V = 2 in the decomposition. Therefore they can be between the following vertex pairs: {u,c}, {u,v}, {u,f}, {a,v}, {d,v}. We decompose R(u,v) further into subregions between any of these vertex pairs (see Figure 3 (a)). Due to planarity, at most three subregions can exist inside R(u, v). Let P 1 be the lowest path of length 3 connecting u and c. Let R 1 be the subregion enclosed by the path (u,a,b,c) and P 1. We claim that there are only two vertices strictly inside R 1. Any vertex w strictly inside R 1 also belongs to a path of length 3 connecting two boundary vertices of R(u,v). To avoid forming uncovered small cycles in R 1, the only possibility is that w belongs to a path P 1 of length 3 connecting u and c. Also, P 1 has to be a chain between u and c. By Reduction Rule 3.1 and 3.2, P 1 is the only such chain strictly inside R 1. Let P 2 and P 3 be the highest and lowest paths of length 3 connecting u and v. Let R 2 be the subregion enclosed by P 2 and P 3. P 1 and P 2 may share boundary vertices. Based on the definitions of R 1 and R 2, there are no vertex between subregion R 1 and R 2. We define and analyze other subregions in a similar fashion. By the same argument, there are at most 2 vertices strictly inside each subregion. So, in total there are at most 20 vertices in R(u,v) except u and v. Case 2: (u,v) S. For any vertex w strictly inside R(u,v), in addition to paths of length 2 or 3 connecting two boundary vertices (as in case 1), w can also belong to a path of length 4 connecting u and v (this path forms a 5-cycle with the edge (u,v)). As in case 1, we decompose R(u,v) into subregions. The definition and analysis of all subregions remain the same except the subregion R 2 between u and v (see Figure 3 (b)). Note that in Figure 3 (b) only the structure of the subregion R 2 is drawn. Let P 1 and P 2 be the highest and lowest paths (strictly inside R(u,v)) of length 3 or 4 connecting u and v. R 2 is the subregion enclosed by P 1 and P 2. We claim that there are no paths of length 2 to 4 strictly inside R 2 connecting u and v, because such path will result in the 11

application of Reduction Rule 3.3. In particular, if such path P exists, then P and (u,v) forms a small cycle. For each edge e on P, all the small cycles containing e must go through both u and v. If there is a small cycle C containing e and C does not go through u and v, i.e., C intersects with P 1 or P 2 at a vertex other than u and v, then this will always result in an uncovered small cycle in R 2. Therefore such C cannot exist and Reduction Rule 3.3 applies. In the worst case, both P 1 and P 2 are of length 4. It is still possible to have paths of length 3 inside R 2 connecting vertex pairs {u,c }, {u,f }, {a,v}, {d,v}. To avoid uncovered small cycles, paths connecting at most two such vertex pairs can exist in R 2 and such paths has to also be chains. By Reduction Rule 3.1 and 3.2, there can be only one such path between a certain vertex pair. Therefore there are at most 10 vertices inside R 2 except u and v. For the other two subregions that can possibly exist in R(u, v), there are at most 7 vertices inside each subregion. Furthermore, for the paths P 1 and P 2 that enclose R 2, at least one vertex on each path is shared with other subregions, since otherwise Reduction Rule 3.3 applies. So there are in total at most 22 vertices inside R(u,v) except u and v. Lemma 4.14. There are no vertices outside the regions in R. Proof. Suppose w is a vertex outside any region in R. w has to be in a small cycle C. Let (u,v) be the edge in S that covers C and R(u,v) be the region that contains (u,v). Then w is on a path P of length at most four between u and v. Clearly P has to intercept with R(u,v). It is not hard to check that no matter how P intercepts with R(u,v), P should either be included in R(u,v) or from an uncovered small cycle with the boundary paths of R(u,v). Therefore such w cannot exist. Theorem 4.15. ( 5)-cycle transversal admits a 266k kernel on planar graphs. Proof. There are at most 2k vertices in V(S). By Lemma 4.12, the reduced graph G can be decomposed into at most 12k 12 regions. By Lemma 4.13 and Lemma 4.14, there are at most 22 vertices inside each region and there are no vertex outside all the regions. Therefore the total number of vertices in G is at most (12k 12) 22+2k = 266k 264. Acknowledgement We thank anonymous referees for many valuable suggestions. References [1] F. N. Abu-Khzam. A kernelization algorithm for d-hitting set. Journal of Computer and System Sciences, 76(7):524 531, 2010. [2] N. Alon. Bipartite subgraphs. Combinatorica, 16:301 311, 1996. [3] O. V. Borodin, A. V. Kostochka, N. N. Sheikh, and G. Yu. M-degrees of quadrangle-free planar graphs. Journal of Graph Theory, 60(1):80 85, 2009. [4] D. Brügmann, C. Komusiewicz, and H. Moser. On generating triangle-free graphs. Electronic Notes in Discrete Mathematics, 32:51 58, 2009. 12

[5] H. Dell and D. van Melkebeek. Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses. In Proceedings of the 42nd ACM symposium on Theory of computing, STOC 10, pages 251 260, 2010. [6] R. Downey and M. Fellows. Parameterized Complexity. Springer, 1999. [7] L. Esperet, M. Montassier, and X. Zhu. Adapted list coloring of planar graphs. Journal of Graph Theory, 62(2):127 138, 2009. [8] H. Fernau. A top-down approach to search-trees: Improved algorithmics for 3-hitting set. Algorithmica, 57:97 118, 2010. [9] Z. Füredi. On the number of edges of quadrilateral-free graphs. Journal of Combinatorial Theory, Series B, 68(1):1 6, 1996. [10] Z. Füredi, A. Naor, and J. Verstraete. On the Turan number for the hexagon. Advances in Mathematics, 203(2):476 496, 2006. [11] J. Guo and R. Niedermeier. Invitation to data reduction and problem kernelization. SIGACT News, 38(1):31 45, 2007. [12] J. Guo and R. Niedermeier. Linear problem kernels for NP-hard problems on planar graphs. In Proceedings of the 34th International Colloquium on Automata, Languages and Programming (ICALP), pages 375 386, 2007. [13] E. Györi and N. Lemons. Hypergraphs with no odd cycle of given length. Electronic Notes in Discrete Mathematics, 34:359 362, 2009. [14] S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2-ǫ. Journal of Computer and System Sciences, 74(3):335 349, 2008. [15] G. Kortsarz, M. Langberg, and Z. Nutov. Approximating maximum subgraphs without short cycles. SIAM Journal on Discrete Mathematics, 24:255 269, 2010. [16] M. Krivelevich. On a conjecture of Tuza about packing and covering of triangles. Discrete Math., 142(1-3):281 286, 1995. [17] A. Naor and J. Verstraëte. A note on bipartite graphs without 2k-cycles. Combinatorics Probability & Computing, 14(5-6):845 849, 2005. [18] R. H. Thomas, J. G. Alex, and M. C. Keith. Which codes have 4-cycle-free Tanner graphs? IEEE Transactions on Information Theory, 52(9):4219 4223, 2006. [19] C. Thomassen. On the chromatic number of triangle-free graphs of large minimum degree. Combinatorica, 22(4):591 596, 2002. [20] C. Thomassen. On the chromatic number of pentagon-free graphs of large minimum degree. Combinatorica, 27(2):241 243, 2007. [21] M. Wahlström. Algorithms, Measures and Upper Bounds for Satisfiability and Related Problems. PhD thesis, Department of Computer and Information Science, Linköpings universitet, Sweden, 2007. 13

[22] G. Xia and Y. Zhang. On the small cycle transversal of planar graphs. Theoretical Computer Science, 412:3501 3509, 2011. [23] M. Yannakakis. Node-and edge-deletion NP-complete problems. In Proceedings of the tenth annual ACM symposium on Theory of computing (STOC), pages 253 264, 1978. 14