Flexible Colorig Xiaozhou (Steve) Li, Atri Rudra, Ram Swamiatha HP Laboratories HPL-2010-177 Keyword(s): graph colorig; hardess of approximatio Abstract: Motivated b y reliability cosideratios i data deduplicatio for storage s ystems, w e itroduce the problem of flexible colorig. Give a hypergraph H ad the umber of allowable colors k, a exible colorig of H is a assigmet o f o e or more colors to each vertex s uch that, for each hyperedge, it is possible to choose a color from each vertex's color list so that this hyperedge i s strogly colored (i.e., each vertex has a differet color). Differet colors for the same vertex ca be chose for di fferet i cidet hyperedges ( hece the term flexible). The goal is to miimize color cosumptio, amely, the total umber of colors assiged, coutig multiplicities. Flexible colorig is N P-hard ad trivially approximable, w here s is t he size o f t he l argest hyperedge, ad is the umber of vertices. Usig a recet result by Basal ad K hot, we s how that if k is costat, the it is UGC-hard to approximate to withi a factor of s - ɛ, for arbitrarily small c ostat ɛ > 0. Lastly, we preset a algorithm with a where k' is umber of colors used by a strog colorig algorithm for H. approximatio ratio, Exteral Postig Date: March 11, 2 011 [Fulltext] A pproved for Exteral Publicatio Iteral Postig Date: March 11, 2 011 [Fulltext] Additioal Publicatio Iformatio: to be published i I formatio Process Letters. Copyright 2011 Hewlett-Packard Developmet C ompay, L.P.
Xiaozhou (Steve) Li xiaozhou.li@hp.com HP Labs 1501 Page Mill Road Palo Alto, CA 94304 Flexible Colorig Atri Rudra atri@buffalo.edu Computer Sc. & Egg. Dept. SUNY Buffalo Buffalo, NY 14260 Ram Swamiatha ram.swamiatha@hp.com HP Labs 1501 Page Mill Road Palo Alto, CA 94304 Hewlett-Packard Laboratories Techical Report HPL-2010-177 Abstract Motivated by reliability cosideratios i data deduplicatio for storage systems, we itroduce the problem of flexible colorig. Give a hypergraph H ad the umber of allowable colors k, a flexible colorig of H is a assigmet of oe or more colors to each vertex such that, for each hyperedge, it is possible to choose a color from each vertex s color list so that this hyperedge is strogly colored (i.e., each vertex has a differet color). Differet colors for the same vertex ca be chose for differet icidet hyperedges (hece the term flexible). The goal is to miimize color cosumptio, amely, the total umber of colors assiged, coutig multiplicities. Flexible colorig is NP-hard ad trivially s (s 1)k approximable, where s is the size of the largest hyperedge, ad is the umber of vertices. Usig a recet result by Basal ad Khot, we show that if k is costat, the it is UGC-hard to approximate to withi a factor of s ε, for arbitrarily small costat ε > 0. Lastly, we preset a algorithm with a s (s 1)k k approximatio ratio, where k is umber of colors used by a strog colorig algorithm for H. 1 Itroductio Data deduplicatio is a storage systems techique that aims to reduce the storig of multiple copies of the same data, thereby savig storage space. The followig optimizatio problem arises i data deduplicatio. A large umber of data objects (biary strigs for our purposes) are to be stored o some umber of disks. Each object cosists of a umber of blocks. For reliability cosideratios, blocks belogig to the same object should be stored o distict disks so that the failure of a disk oly affects oe block. I storage systems, it is commo that objects have idetical blocks. To save space, idetical blocks eed ot be stored multiple times. The goal is to store the fewest umber of blocks without violatig the distict-disks rule, omittig cosideratios such as disk capacities. To uderstad the problem better, cosider the followig simple example. Suppose we eed to store three objects o two disks. Each object cosists of two blocks: {A, B}, {B, C}, ad {A, C}, respectively. The the most ecoomical way to store these objects is to store four blocks, say, A, C o the first disk, ad B, C o the secod. This placemet is legitimate because the first object cosists of A 1 (meaig block A o disk 1) ad B 2, the secod cosists of B 2 ad C 1, ad the This research was supported by a 2010 HP Labs Iovatio Research Program grat. 1
third cosists of A 1 ad C 2. It is easy to verify that storig oly three blocks A, B, ad C, will violate the distict-disks rule stated above. I this paper, we formulate the above problem ito a optimizatio problem called flexible colorig. Sectio 2 presets the problem formulatio ad some simple observatios. Sectio 3 presets a hardess of approximatio result. Sectio 4 derives some combiatorial properties of flexible colorig o graphs. Sectio 5 presets a simple approximatio algorithm. 2 Problem formulatio We formulate this optimizatio problem as the followig graph-theoretic problem which we call flexible colorig. Give a hypergraph H ad the umber of allowable colors k, a flexible colorig of H is a assigmet of oe or more colors to each vertex such that, for each hyperedge, it is possible to choose a color from each vertex s color list so that this hyperedge is strogly colored (i.e., each vertex has a differet color). Differet colors for the same vertex ca be chose for differet icidet hyperedges (hece the term flexible). The goal is to miimize color cosumptio, amely, the total umber of colors assiged, coutig multiplicities. Clearly, for flexible colorig to be feasible, we eed k s, where s is the size of the largest hyperedge. It is easy to see that there is o eed to assig a vertex more tha s colors, because assigig s colors to a vertex gives the vertex eough flexibility to choose a color for ay hyperedge to which the vertex belogs. Clearly, i the above formulatio, a vertex i H correspods to a block, a hyperedge i H correspods to a object, k correspods to the umber of disks, ad color cosumptio correspods to storage space cosumptio. I the example described earlier, the hypergraph cosists of three vertices {A, B, C}, three hyperedges {A, B}, {B, C}, {C, A}, ad s = k = 2. A legitimate flexible colorig is colorig A with color 1, B with color 2, ad C with colors 1 ad 2. The total color cosumptio is four. For s = 2, computig the optimal flexible colorig is equivalet to the problem of fidig the maximum k-colorable iduced subgraph, but for geeral s, we are ot aware of a similar problem. Fially, we poit out that give a valid assigmet of colors to every vertex, a valid assigmet of colors to the edpoits of a hyperedge ca easily be obtaied by solvig a bipartite maximum matchig problem (where the ed-poits form the left vertices, the colors {1,..., k} form the right side, ad (v, c) is a edge if the color c is assiged to the ed-poit v). 3 Hardess of approximatio Flexible colorig is NP-hard, because it cotais graph colorig as a special case. To see this, take a istace of the graph colorig problem (G, k), where we wish to determie if graph G is k- colorable. Aswerig the questio Ca G be flexibly colored with k allowable colors ad cosume oly V (G) colors? solves the graph colorig problem, because the requiremet of cosumig oly V (G) colors forces flexible colorig to assig exactly oe color to each vertex. Followig the termiology for graph colorig, we say a hypergraph H is (k, l)-flex-colorable if there is a flexible colorig for H that uses k allowable colors ad cosumes at most l colors, coutig multiplicities. Give H ad k, we call the smallest umber l such that H is (k, l)-flexcolorable the cosumptio umber of H ad k, deoted by ϕ(h, k). Observe that ϕ(h, k), where = V (H), because every vertex is assiged at least oe color. O the other had, H is trivially (k, s (s 1)k)-flex-colorable, because we ca assig oe distict color to (ay) k vertices 2
ad s colors to the remaiig k vertices. These two simple observatios idicate that flexible colorig is trivially (s (s 1)k )-approximable. I what follows, we show that, if k is costat, the s is essetially the best ratio that ca be achieved. To prove this, we use the followig recet result by Basal ad Khot. Theorem 1 ([1]) Assumig the Uique Games Cojecture (UGC), for ay iteger s 2 ad arbitrary costats α, β > 0, give a s-uiform hypergraph H = (V, E), distiguishig betwee the followig cases is NP-hard: (YES): there exist disjoit subsets V 1,..., V s V, satisfyig V i 1 α s V ad such that o hyperedge cotais at least two vertices from some V i. (NO): every vertex cover has size at least (1 β) V. We ow state ad prove the mai theorem of this sectio. Theorem 2 If k is a costat, the it is UGC-hard to approximate the cosumptio umber to withi a factor of s ε, for arbitrarily small costat ε > 0. Proof: Recall that i order for flexible colorig to be feasible, we eed s k. Therefore, if k is give to be costat, the so is s. Cosider a s-uiform hypergraph H = (V, E). Let = V. Suppose H is i the YES case i Theorem 1. Let V = V \(V 1 V 2 V s ). Sice V i 1 α s for all i, we have V α. Because the V i s are disjoit ad o V i cotais more tha oe vertex from ay hyperedge, we ca color the vertices i the V i s with oe color, say color i, ad color those i V with (ay) s colors. Sice o V i cotais more tha oe vertex from ay hyperedge, this colorig is a legitimate flexible colorig. The total color cosumptio is s i=1 V i + s V + (s 1) α. Therefore, ϕ(h, k) (1 + (s 1)α). For ay give costat ε 1, we ca choose a α small eough so that ϕ(h, k) (1 + ε 1 ), because s is costat. O the other had, suppose H is i the NO case i Theorem 1. Recall that a vertex cover of a hypergraph is a subset of the vertices such that it cotais at least oe vertex i every hyperedge. Cosider a flexible colorig o H. For each color set T {1, 2,..., k} such that 1 T < s, let V T be the set of vertices that are colored with T. We observe that V T does ot cotai ay hyperedge etirely, because a hyperedge is of size s but T < s. This implies that V \ V T is a vertex cover. By Theorem 1, V T β. Summig up over all the possible color sets T such that 1 T < s, we obtai a upper boud o the umber of vertices that are assiged less tha s colors: s 1 ( k i=1 i ) β. For ay costat ε 2, we ca choose a β small eough so that the above summatio is at most ε 2, because s ad k are costats. Therefore, the umber of vertices that are assiged s colors is at least (1 ε 2 ) (recall that o vertex eeds to be assiged more tha s colors), ad the color cosumptio o these vertices (ad hece o all vertices) is at least (1 ε 2 ) s. I other words, if H is i the NO case i Theorem 1, the ϕ(h, k) (1 ε 2 ) s. Therefore, if there is a approximatio algorithm that achieves a ratio better tha 1 ε 2 1+ε 1 s, the we will be able to tell whether H is i the YES case or the NO case, a cotradictio to Theorem 1. Sice ε 1 ad ε 2 ca be arbitrarily small, we coclude that it is UGC-hard to approximate flexible colorig to withi a factor of s ε, for arbitrarily small costat ε > 0. We remark that the the result above ca be stregtheed to obtai a hardess of approximatio factor of s α γ, where the optimal umber of colors used is α for ay 1 + ε α s ε (where ε = Θ(γ)). The proof follows by addig roughly (α 1) dummy odes to the graph produced i the reductio above ad addig hyperedges so that each dummy ode has the maximum possible umber of icidet hyperedges. (Note that i such a case the dummy odes have to be assiged s colors.) 3
4 Flexible colorig o graphs I this sectio, we focus o the special case of flexible colorig o graphs. For a give k ad G = (V, E), where V =, we derive tight upper ad lower bouds of ϕ(g, k) i terms of the chromatic umber of G, deoted by χ(g). It is easy to see that if k χ(g), we have ϕ(g, k) =. For k < χ(g), the followig two theorems establish tight upper ad lower bouds for ϕ(g, k). Theorem 3 For 2 k < χ(g), ϕ(g, k) + (χ(g) k) χ(g), ad this boud is tight. The ituitio behid this theorem is quite simple. Sice χ(g) is the the most ecoomical umber of colors that allows us to assig oe color to each vertex, if we are allowed fewer, we have to icrease color cosumptio by assigig two colors to some vertices. This theorem says that for each allowable color fewer, we have to cosume at most more colors, the average umber of vertices assiged a particular color i regular colorig. We will also costruct a example that precisely cosumes this may more colors. Proof: Let i = χ(g) k. Startig from a regular colorig that uses χ(g) colors, we obtai a flexible colorig that uses k allowable colors as follows. A regular colorig of G with χ(g) colors categorize the vertices ito χ(g) idepedet sets. We sort these idepedet sets i icreasig order of their sizes, ad assig (ay) two colors to the first i idepedet sets, whose total size is χ(g) at most i. It is easy to verify that this is a legitimate flexible colorig with k allowable colors. The total color cosumptio is at most + i χ(g). We claim that the above boud is tight. To see this, cosider a graph where the vertices are orgaized ito c colums ad r rows, hece = r c. Each vertex is coected to all the vertices ot i its ow colum. Therefore, the vertices i each colum forms a idepedet set, ad ay set of vertices from distict colums form a complete graph. For this graph, it is easy to see that χ(g) = c, because otherwise if χ(g) < c, the there exist two vertices from two colums that are assiged the same color, yet these two vertices are eighbors. By the argumet i the previous paragraph, this graph is (c i, + i r)-flex-colorable. Next we prove by cotradictio that we caot cosume fewer tha + i r colors for flexible colorig. Suppose we ca, the there exist more tha (c i) r vertices that are assiged oe color, which implies that these vertices spread across at least c i + 1 colums. Therefore, there exists a (c i + 1)-complete graph amog some of these sigle-colored odes, which is (c i)-colored, a cotradictio. Therefore, the boud is tight. χ(g) Theorem 4 For 2 k < χ(g), ϕ(g, k) + (χ(g) k) ad this boud is tight. Agai, the ituitio for this theorem is quite simple. Sice we are allowed to use less tha χ(g) colors, we expect to assig two colors to some vertices. This theorem says that for each allowable color fewer tha χ(g), we eed to assig two colors to at least oe vertex. We will costruct a example that precisely achieves this boud. Proof: Let i = χ(g) k. We prove by cotradictio that a flexible colorig with k colors will cosume at least + i colors. Suppose that is ot the case. Do a regular colorig o the graph G with χ(g) colors. Group the vertices ito χ(g) groups accordig to their colors. Let these groups be V 1, V 2,..., V χ(g), where group V i is colored with color i. 4
Next, do a flexible colorig o the graph G with k allowable colors. Suppose the flexible colorig cosumes less tha +i colors, i.e., there are less tha i two-colored vertices i the flexible colorig. By the pigeohole priciple, there exist at least k + 1 groups amog all V i s (i regular colorig above) that cotai o two-colored vertices i flexible colorig. Without loss of geerality, let these k + 1 groups be V 1,..., V k+1. Let V = V 1 V k+1 ad V = V k+2 V χ(g). Observe that V cotais oly sigle-colored vertices i flexible colorig, which uses k allowable colors. Therefore, we ca take the regular colorig above ad re-color V usig k colors {1, 2,..., k}. We kow such a re-colorig is possible because flexible colorig uses oly k colors for V without causig coflicts amog the vertices i V ad the re-colorig does ot cause coflicts betwee V ad V, because the k colors used by V are differet from the χ(g) k 1 colors used by V. I other words, we have ow obtaied a χ(g) 1 regular colorig for G, which cotradicts the defiitio of χ(g). We ext show that this boud is tight. Cosider c+1 colums of vertices, where oe colum has r vertices, ad the other c colums have oly oe vertex. A vertex is coected to all the vertices i other colums. So for this graph, χ(g) = c + 1. Give k allowable colors, oe way to do flexible colorig is to assig some sigle color to the log colum ad k 1 short colums, ad assig two colors to the remaiig c + 1 k colums, cosumig r + (k 1) + 2(c + 1 k) = + (c + 1 k) = + χ(g) k colors. It ca be prove (usig the same argumet as i Theorem 3) that this color cosumptio is miimum. 5 A approximatio algorithm The proof of Theorem 3 immediately suggests the followig approximatio algorithm for a hypergraph H. Do a strog colorig o H, usig ay strog colorig algorithm, with o restrictios o the umber of colors used. Therefore, the algorithm ca use k colors, which may be greater tha k. If k k, we ca assig oe color to every vertex ad fiish flexible colorig with a optimal color cosumptio of. If k > k, the we orgaize the vertices ito k groups based o their colors ad we sort the groups i icreasig order of size. We the re-assig the vertices the first k k groups ad assig s colors to these vertices. These s colors ca be ay s used by groups k k + 1 to k. It ( is ot hard to see that this procedure produces a legitimate flexible colorig, ad it cosumes s (s 1)k ) k colors, which yields a approximatio ratio of s (s 1)k k. The above algorithm, which makes use of existig strog colorig algorithms, establishes a coectio betwee the strog chromatic umber ad the cosumptio umber. How well we ca approximate flexible colorig ow depeds o how well we ca approximate strog colorig. For graphs, strog colorig is just regular colorig, a well-studied problem. For example, we ca use the strog colorig algorithm by Agarsso ad Halldórsso [2] or the regular colorig algorithm o graphs by Halldórsso [3]. However, sice colorig is i geeral a hard problem to approximate, this boud is ot ecessarily attractive. We ca also iterpret the s (s 1)k k boud i terms of other graph parameters. For example, a graph G ca be greedily colored by (G)+1 colors, where (G) k is the maximum degree of G. Therefore, the above algorithm also achieves a ratio of 2 (G)+1 for G. As metioed earlier, for the special case of s = 2, flexible colorig is equivalet to the problem of maximum ( k-colorable ) iduced subgraph, for which Halldórsso [4] obtaied a approximatio ratio of 1 2 k + 1 whe > k. A simple calculatio shows that if xk 1 2 ( + k), where x is the size of the maximum k-colorable iduced subgraph, the our boud is better, assumig 5
that Halldórsso s algorithm does ot provide a better boud for certai cases. For example, if k + 1 ad x 2, or if k 1 2 ( + k), the our boud is better. Whe the hypergraph is sparse, the above algorithm ca be improved. As a illustratio, cosider the special case where s = 2 (i.e., graphs). We ca assume that all vertices are of degree at least 1 because isolated vertices ca be arbitrarily sigle-colored. Suppose 2m < k, wher m is the umber of edges i the graph, the by a simple averagig argumet, there are at least k 2m k 1 vertices that are of degree at most k 1. Observe that these low degree vertices ca always be sigly colored. Therefore, we ca (1) exclude the low-degree vertices, (2) color the remaiig iduced subgraph usig the above flexible colorig algorithm, ad (3) add back the low-degree vertices ad ) sigle color them. This algorithm results i at most (1 k k 2m k 1 vertices (as opposed to the ) earlier (1 k k ) beig doubly colored. We ote that step (1) above ca be repeated multiple times. We coclude by remarkig that there is room for improvemet for our algorithm. For example, re-assigig s colors to a vertex is brute force. A more refied method would be to first aalyze whether a smaller color set is possible (e.g., a vertex with at most k 1 eighbors ca be sigly colored). Refereces [1] N. Basal, S. Khot, Iapproximability of hypergraph vertex cover ad applicatios to schedulig problems, i: Proceedigs of the 37th Iteratioal Colloquium o Automata, Laguages ad Programmig (ICALP), 2010, pp. 250 261. [2] G. Agarsso, M. M. Halldórsso, Strog colorigs of hypergraphs, i: Proceedigs of the Third Workshop o Olie ad Approximatio Algorithms (WAOA), 2005, pp. 253 266. [3] M. M. Halldórsso, A still better performace guaratee for approximate graph colorig, Iformatio Processig Letters 45 (1993) 19 23. [4] M. M. Halldórsso, Approximatig discrete collectios via local improvemets, i: Proceedigs of the Sixth ACM-SIAM Symposium o Discrete Algorithms (SODA), 1995, pp. 160 169. 6