Property Testing
Lecture and notes by: Nate Chenette, Brent Myers, Hari Prasad
November 8


1 Introduction

Broadly, property testing is the study of the following class of problems: given the ability to perform (local) queries concerning a particular object (e.g., a function, or a graph), the task is to determine whether the object has a predetermined (global) property (e.g., linearity or bipartiteness), or is far from having the property. The task should be performed by inspecting only a small (possibly randomly selected) part of the whole object, where a small probability of failure is allowed. [2]

1.1 Testing Linearity: A Motivating Example

To give the reader an intuitive sense of why and how property testing is useful, we begin with a motivating example. Let F be a finite field. Consider the definition of functional linearity:

Definition. A function f : F^m → F is linear if and only if for every x, y ∈ F^m, f(x) + f(y) = f(x + y).

Now, suppose you are given a function f : F^m → F and are asked to determine whether it is linear. More specifically, you are given a sort of black box representing the function, into which you can feed a value x ∈ F^m and receive back the value f(x) ∈ F (such an action is called a query). A naive approach to this problem might be to take every pair of values x, y in F^m and, using the black box, test whether f(x) + f(y) = f(x + y). In fact, if we wish to know with certainty whether f is linear or not, then something like this naive approach is necessary. But one can readily see that this approach is impractical if F is of nontrivial size. Besides, if we feed every value of F^m into the black box, then this actually determines the function f completely, which seems like an exorbitantly strong intermediate result for deciding a single property of f (linearity).

However, suppose we are faced with a relaxation of the problem: given that you can compute any value of f, determine with high probability whether f is linear or far from linear. Now we can exploit the black box model to answer this question quickly, using the following approach.

Algorithm (Sketch). Pick a set S of n random elements of F^m. Check the linearity of the sample by comparing f(x) + f(y) and f(x + y) for every pair x, y ∈ S. If any such comparison is an inequality, answer REJECT. Otherwise, if all comparisons are equalities, answer ACCEPT.

It is easy to see that the larger S is, the more accurate our answer can be; in particular, the probability of success can be brought arbitrarily close to 1 by increasing |S|. A few more notes about this particular algorithm:

- The algorithm has one-sided error, meaning that if f is linear, then the algorithm always answers ACCEPT. Equivalently, by the contrapositive, if the algorithm answers REJECT, then f is not linear with probability 1.
- If f is determined to be not linear, then the algorithm even provides a witness, i.e., two elements x, y ∈ F^m such that f(x) + f(y) ≠ f(x + y).
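To make the sketch concrete, here is a minimal Python rendering of it. The concrete choices are illustrative assumptions rather than part of the notes: the field F is modeled as Z_p for a small prime p, the domain F^m as length-m tuples, and the black box as an ordinary Python callable.

```python
import itertools
import random

# Toy sketch of the sampled linearity check above (not the formal tester
# from Section 1.3). All constants and names here are illustrative.

P = 7   # field size (a prime), so F = Z_p
M = 3   # dimension m of the domain F^m

def random_point():
    return tuple(random.randrange(P) for _ in range(M))

def add(x, y):
    return tuple((a + b) % P for a, b in zip(x, y))

def sampled_linearity_check(f, n=20):
    """Pick n random points; compare f(x) + f(y) with f(x + y) for every pair.
    Returns (verdict, witness), where witness is a violating pair if one was found."""
    sample = [random_point() for _ in range(n)]
    for x, y in itertools.combinations(sample, 2):
        if (f(x) + f(y)) % P != f(add(x, y)):
            return "REJECT", (x, y)          # one-sided error: (x, y) is a witness
    return "ACCEPT", None

if __name__ == "__main__":
    linear = lambda x: sum((i + 1) * xi for i, xi in enumerate(x)) % P
    almost = lambda x: (linear(x) + 1) % P if x == (0, 0, 0) else linear(x)
    print(sampled_linearity_check(linear))   # always ACCEPT
    print(sampled_linearity_check(almost))   # usually ACCEPT: `almost` is close to linear
```

Note that a function differing from a linear one on only a few points is usually accepted, which is exactly why the relaxed "linear or far from linear" formulation is the right one to aim for.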

1.2 Property Testing Definitions

We see that the above motivating example fits into the broad definition of property testing given at the beginning of these notes. The object in question is a function, and we wish to determine whether it has, or is far from having, the property of linearity. Our algorithm randomly selected a small part of the object (namely, the values of f on a sample S) to inspect, and determined an answer that is correct with high probability.

Let us formalize some of the notions in our definition of property testing. Let U be a family of objects of the same type and representation (for example, U could be the family of graphs under the adjacency-matrix model). Let us also define a distance function on elements of U, δ : U × U → [0, 1]. Let P be a property defined on elements of U. Then for any ε ∈ [0, 1], we say an object X ∈ U is ε-far from having property P if δ(X, Y) > ε for all objects Y ∈ U having property P. Otherwise, X is ε-close.

A property testing algorithm for a property P is given a distance parameter ε and an object. The algorithm should accept with probability at least p > 1/2 every object that has property P, and should reject with probability at least q > 1/2 every object that is ε-far from having property P. (Note: p and q are usually arbitrarily set to 2/3. Recall that, as with randomized algorithms in general, we can repeat the algorithm in order to achieve an arbitrarily high success probability.) If one of p or q is 1, then we say the algorithm has one-sided error; if both are strictly less than 1, the algorithm has two-sided error.

The efficiency of a property testing algorithm can be measured using two parameters: the running time and the query complexity (the number of queries the algorithm performs on the object). Most property testing algorithms actually achieve query and time complexities that depend only on ε rather than on the size of the object!

1.3 Formalization of the Linearity Testing Example

Here is a formalization of the algorithm for testing the linearity of a function f : F^m → F. We say that a pair of elements x, y is a violating pair if f(x) + f(y) ≠ f(x + y). The distance of a function from linearity is defined as the fraction of function values that must be changed in order to make the function linear.

Linearity Test
1. Uniformly and independently select m = Θ(1/ε) pairs of elements x, y ∈ F^m.
2. For every pair of elements selected, check whether f(x) + f(y) = f(x + y).
3. If any of the selected pairs is a violating pair, then reject; otherwise, accept.
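A minimal sketch of this test in code, reusing the toy Z_p^m setup from the earlier snippet; the constant hidden in Θ(1/ε) is an arbitrary illustrative choice, not the one dictated by the analysis below.

```python
import random

P, M = 7, 3   # toy field Z_p and dimension m, as in the previous sketch

def random_point():
    return tuple(random.randrange(P) for _ in range(M))

def add(x, y):
    return tuple((a + b) % P for a, b in zip(x, y))

def linearity_test(f, eps):
    """Select Θ(1/ε) independent uniform pairs; reject iff some pair is violating."""
    trials = int(3 / eps) + 1                 # hypothetical constant factor
    for _ in range(trials):
        x, y = random_point(), random_point()
        if (f(x) + f(y)) % P != f(add(x, y)):
            return "REJECT"                   # the pair (x, y) is a witness
    return "ACCEPT"
```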

By the definition of linearity, if f is linear then it is always accepted. It thus remains to prove:

Theorem 1. If f is ε-far from linear, then with probability at least 2/3, the Linearity Test rejects it.

Proof. Recall that the distance from linearity of f is the fraction of points on which f must be changed in order to make it a linear function. Let δ denote the (exact) distance of f from linearity (so that in particular, δ > ε). We will prove the theorem for the case where δ is bounded away from 1/2, i.e., δ ≤ 1/2 − γ for some constant γ > 0.

We shall show that the probability that a single uniformly selected pair of elements is a violating pair is at least 3δ(1 − 2δ). For δ bounded away from 1/2, this probability is Ω(δ). Since the test selects Θ(1/ε) = Ω(1/δ) pairs, the probability that no violating pair is selected is at most 1/3 (for the appropriate choice of the constant in the Ω(·) notation).

Let g be a linear function at distance δ from f. Let G = {x : f(x) = g(x)} be the set of good elements in F^m on which f and g agree. For any pair x, y, if among the three elements x, y, and x + y, two of them belong to G while the third does not, then x, y is a violating pair. Hence,

  Pr[x, y is a violating pair] ≥ Pr[x ∉ G, y ∈ G, (x + y) ∈ G]
                               + Pr[x ∈ G, y ∉ G, (x + y) ∈ G]
                               + Pr[x ∈ G, y ∈ G, (x + y) ∉ G].    (1)

Consider the first probability in the above sum (the treatment of the other two is analogous, since the important property of the triple x, y, x + y is that any two of its elements are independent and uniformly distributed):

  Pr[x ∉ G, y ∈ G, (x + y) ∈ G] = Pr[x ∉ G] · Pr[y ∈ G, (x + y) ∈ G | x ∉ G]
                                = δ · (1 − Pr[y ∉ G or (x + y) ∉ G | x ∉ G]).    (2)

By a union bound, and the fact that both y and (x + y) are uniformly distributed,

  1 − Pr[y ∉ G or (x + y) ∉ G | x ∉ G] ≥ 1 − 2 · Pr[y ∉ G | x ∉ G].    (3)

Since x and y are chosen independently,

  Pr[y ∉ G | x ∉ G] = Pr[y ∉ G] = δ,    (4)

and so by combining Equations (1)–(4), the probability of selecting a violating pair is at least 3δ(1 − 2δ).

2 Property Testing in Graphs

2.1 Graph Representations

2.1.1 Adjacency Matrix Model

One way to represent a graph is by its adjacency matrix: a 0–1 matrix with a_ij = 1 if and only if (i, j) ∈ E. A property testing algorithm under the adjacency-matrix model is allowed to query whether there is an edge between any two vertices of its choice. That is, the algorithm queries a pair of vertices i, j and is given the value a_ij = a_ji. In this representation, the distance between graphs is the fraction of entries in the adjacency matrix on which the two graphs differ. By this definition, for a given distance parameter ε, the algorithm should reject every graph that requires more than (ε/2)|V|² edge modifications in order to acquire the tested property (the factor of 1/2 is because each edge is represented twice in the matrix). This representation is most appropriate for dense graphs, and the results for testing in this model are most meaningful for dense graphs.
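As a small illustration of this query model (the class and function names below are ours, not from [1]), a graph can be wrapped in an oracle that answers single edge queries, and the distance between two graphs is simply the fraction of matrix entries on which their oracles disagree.

```python
# Toy illustration of the adjacency-matrix model: the tester sees the graph only
# through edge queries; distance is the fraction of matrix entries that differ.

class AdjacencyOracle:
    def __init__(self, n, edges):
        self.n = n
        self.adj = [[0] * n for _ in range(n)]
        for i, j in edges:
            self.adj[i][j] = self.adj[j][i] = 1

    def query(self, i, j):
        """Answer a single edge query: is (i, j) an edge?"""
        return self.adj[i][j]

def matrix_distance(g1, g2):
    """Fraction of adjacency-matrix entries on which the two graphs differ."""
    n = g1.n
    diff = sum(g1.query(i, j) != g2.query(i, j) for i in range(n) for j in range(n))
    return diff / (n * n)

# Example: a triangle vs. a path on 3 vertices differ in the two entries for edge (0, 2).
triangle = AdjacencyOracle(3, [(0, 1), (1, 2), (0, 2)])
path = AdjacencyOracle(3, [(0, 1), (1, 2)])
print(matrix_distance(triangle, path))   # 2/9
```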

2.1.2 Incidence-List Model

In this model, a graph is represented by incidence lists of length d, where d is a bound on the degree of the graph. Here, the testing algorithm can query, for every vertex v and index i ∈ {1, ..., d}, which vertex is the ith neighbor of v. If no such neighbor exists, then the answer is 0. Analogously to the adjacency-matrix model, the distance between graphs is defined to be the fraction of entries on which the graphs differ in this representation. Since the total number of incidence-list entries is d|V|, a graph should be rejected if the number of edge modifications required in order to obtain the property is greater than (ε/2)·d|V|. (Once again, the factor of 1/2 is because each edge (u, v) is represented twice, once in the list for u and once in the list for v.) This model is best for sparse graphs that have low or bounded maximum degree.

A variant of the above model allows the incidence lists to be of varying lengths. In such a case, the distance between graphs is defined with respect to the total number of edges in the graph (or an upper bound on this number). This model is suitable for testing graphs that are not dense but for which there is a large variance in the degrees of the vertices. Furthermore, some problems are more interesting in this model, in the sense that removing the degree bound makes them less restricted. For example, testing whether a graph has diameter at most some bounded size is less interesting in the bounded-degree model, since a bound d on the degree implies a lower bound on the diameter of a graph. Intuitively, testing in this model is at least as hard as testing in the bounded-degree model described above, and in fact in some cases it is strictly harder.

2.2 Summary of Results

The following graph properties were studied in [1] and were shown to have testing algorithms with query complexity poly(1/ε) and time complexity at most exp(poly(1/ε)). In what follows, N denotes the number of graph vertices, and Õ(·) is the same as O(·) up to logarithmic factors.

1. Bipartiteness. The algorithm has query complexity and running time Õ(1/ε³). Alon and Krivelevich [3] improved the analysis of the algorithm and obtained a bound of Õ(1/ε²) on the query complexity and running time.

2. k-Colorability, k ≥ 3. The algorithm has query complexity Õ(k⁴/ε⁶) and running time exp(Õ(k²/ε³)). Alon and Krivelevich [3] improved the analysis of the algorithm and obtained a bound of Õ(k²/ε⁴) on the query complexity and exp(Õ(k/ε²)) on the running time.

3. ρ-Clique. The property is having a clique of size ρN, where 0 < ρ < 1 is a constant. The query complexity of the algorithm is Õ(ρ²/ε⁶) and the running time is exp(Õ(ρ/ε²)).

4. ρ-Cut. The property is having a 2-way cut with at least ρN² crossing edges. The query complexity of the algorithm is Õ(1/ε⁷) and the running time is exp(Õ(1/ε³)). The algorithm generalizes to k-way cuts, at a multiplicative cost of O(log²k) in the query complexity and in the exponent of the running time. The algorithm can also be modified to test ρ-bisection; this property is similar to ρ-cut except that the partition must be into equal-size subsets. The query complexity is Õ(1/ε⁸) and the running time is exp(Õ(1/ε³)).

For all the above properties (except bipartiteness) it is very unlikely that there is a testing algorithm with running time poly(1/ε). If such an algorithm existed, then by setting ε ≈ 1/N² (so that being ε-far means even a single edge modification is needed) we would obtain an exact (randomized) decision procedure that runs in polynomial time, and this would imply that NP ⊆ BPP. (BPP is the class of decision problems solvable by a probabilistic Turing machine in polynomial time, with an error probability of at most 1/3 on all instances.)

The bipartiteness and k-colorability algorithms have one-sided error. Furthermore, whenever a graph is rejected, the algorithm supplies evidence that it does not have the property, in the form of a subgraph that is not bipartite (respectively, not k-colorable). All other algorithms have two-sided error, and this can be shown to be unavoidable within o(N) query complexity.

2.3 Testing Bipartiteness

We would like to be able to use testing algorithms to test properties of large graphs. One common property of graphs is bipartiteness.

Definition. A graph is bipartite if its vertex set can be partitioned into two disjoint subsets such that there are no edges within either subset.

Recall that a graph is bipartite if and only if it has no odd-length cycles. Clearly bipartiteness can be decided exactly in O(|V| + |E|) time by running BFS and looking for odd cycles. However, we can speed this up considerably if we can afford some inaccuracy.

If we just take a random sample of vertices from the graph and test the induced subgraph for bipartiteness, what do we know about the full graph? If any induced subgraph contains an odd cycle, then the full graph must also contain an odd cycle, and any induced subgraph of a bipartite graph is bipartite. These two facts are encouraging for the sampling approach: it will always correctly label bipartite graphs, and if we find that the subgraph is not bipartite, the sample will contain proof that the graph is not bipartite. However, we still have to show that we find such proof with high probability. Fortunately, we do not have to reject every non-bipartite graph; only graphs that are ε-far from bipartite must be rejected. The distance from bipartiteness is defined in terms of the number of violating edges in the graph.

Definition. For some two-way partition of a graph, an edge is a violating edge if both of its endpoints are in the same part.

Definition. A two-way partition of a graph is ε-bad if the number of violating edges is greater than εN². Otherwise it is ε-good.

Definition. A graph is ε-far from bipartite if all possible partitions are ε-bad.

If a partition were known, how many pairs of vertices would we need to sample in order to find a violating edge with high probability? In an ε-bad partition of the graph there are more than εN² violating edges, by definition, while there are at most N(N − 1)/2 < N² pairs of vertices in any simple graph. The probability that a uniformly sampled pair of vertices is a violating edge is therefore at least εN²/N² = ε, so to find a violating edge with high probability it suffices to sample O(1/ε) pairs. Unfortunately we do not know the partition, so we must sample enough vertices to find a violating edge in every partition; done naively, this results in a sample size linear in the number of vertices. It turns out that a more careful analysis yields a query complexity independent of N. First, let us state the algorithm more formally.

Algorithm 1 (bipartiteness tester).
1. Uniformly and independently select m = Θ(log(1/ε)/ε²) vertices (the value of m is justified by the analysis below).
2. Obtain the subgraph induced by the sampled vertices.
3. Use BFS to determine whether the induced subgraph is bipartite. If it is bipartite, ACCEPT; otherwise, REJECT.
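Here is a minimal sketch of Algorithm 1 in code, assuming the adjacency-oracle interface from the earlier snippet (`graph.n` for the number of vertices, `graph.query(i, j)` for an edge query); the constant hidden in Θ(log(1/ε)/ε²) is an illustrative guess rather than the one required by the analysis.

```python
import math
import random
from collections import deque

def is_bipartite(vertices, edge):
    """BFS 2-coloring of the induced subgraph given by an edge predicate."""
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v != u and edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False          # odd cycle found in the sample
    return True

def bipartiteness_test(graph, eps):
    """Sketch of Algorithm 1 against an adjacency oracle (hypothetical constants)."""
    m = int(10 * math.log(1 / eps) / eps ** 2) + 1
    sample = random.sample(range(graph.n), min(m, graph.n))
    return "ACCEPT" if is_bipartite(sample, graph.query) else "REJECT"
```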

Theorem 2. Algorithm 1 is a testing algorithm for bipartiteness.

Proof. We have already discussed the one-sided error of the algorithm above. We only need to show that the algorithm rejects graphs that are ε-far from bipartite with probability at least 2/3.

Definition. A vertex is influential if its degree is at least (ε/4)N. Otherwise it is non-influential.

In a bipartite graph, |E| ≤ mn where m and n are the sizes of the two parts; this is maximized when the parts have equal size, so |E| ≤ N²/4 for every bipartite graph. Since there are at most N non-influential vertices in a graph, there are at most (ε/4)N² edges incident to non-influential vertices. This gives us the intuition that influential vertices will tell us much more than non-influential vertices about the bipartiteness of the graph.

Definition. A set of vertices U covers a vertex v if v has at least one neighbor in U.

Lemma 3. Let U be a set of Θ(log(1/ε)/ε) vertices selected uniformly and independently from a simple graph G. With probability at least 5/6 over the choice of the vertices in U, all but at most (ε/4)N of the influential vertices are covered by U.

Proof. Let v be an influential vertex. If we pick a set U of vertices uniformly and independently at random, what is the probability that U covers v? Since v has at least (ε/4)N neighbors, the probability that U contains none of them is at most (1 − ε/4)^|U| < exp(−(ε/4)|U|). If we let |U| = (4/ε) ln(24/ε), then this probability is at most ε/24. Since there are at most N influential vertices, the expected number of influential vertices not covered by U is at most (ε/24)N. Applying Markov's inequality shows that the probability that more than (ε/4)N influential vertices remain uncovered is at most 1/6.

Lemma 4. Let G = (V, E) be a graph that is ε-far from bipartite, U a subset of V that covers all but at most (ε/4)N of the influential vertices in G, and (U1, U2) a fixed partition of U. Let S be a uniformly and independently selected sample of Θ(|U|/ε) vertices. Then, with probability at least 1 − 2^(−|U|)/6 over the choice of S, for every partition (S1, S2) of S there is some edge between vertices in U ∪ S that is violating with respect to (U1 ∪ S1, U2 ∪ S2).

Proof. Let C be the set of vertices in V that are covered by U and let R be the remaining vertices; by the assumption on U, R contains at most (ε/4)N influential vertices. The fixed partition (U1, U2) of U induces a partition (C1, C2) of C based on the cover sets: let C2 be the set of vertices covered by U1, and let C1 = C \ C2. Also partition R into (R1, R2) with U1 ∩ R ⊆ R1 and U2 ∩ R ⊆ R2, and the rest of R distributed arbitrarily. The partition (C1 ∪ R1, C2 ∪ R2) must be ε-bad, since the graph is ε-far from bipartite.

Which edges of the graph are violating with respect to this partition? There are at most (ε/4)N influential vertices in R, each incident to at most N edges, and at most N non-influential vertices in the entire graph, each incident to fewer than (ε/4)N edges. So the number of edges incident to R is at most (ε/4)N·N + N·(ε/4)N = (ε/2)N². This shows that for every partition of R, at most (ε/2)N² violating edges are incident to R. This leaves more than εN² − (ε/2)N² = (ε/2)N² violating edges that are not incident to R and hence lie within C1 or within C2.

If S is viewed as |S|/2 disjoint pairs of vertices, then the probability that a given pair is such a violating edge with respect to (C1, C2) is at least ε/2. So the probability that no pair of S is a violating edge is at most (1 − ε/2)^(|S|/2). If |S| = 16|U|/ε, this probability is less than 2^(−|U|)/6.

To complete the claim, we need to show that if S contains such a pair, then it is not possible to partition S into (S1, S2) so that (U1 ∪ S1, U2 ∪ S2) has no violating edges. Consider an edge (v, w) with v, w ∈ S that violates (C1, C2); without loss of generality, assume v, w ∈ C2. If we put both vertices in S1, or both in S2, then (v, w) is violating with respect to (S1, S2). On the other hand, since v and w belong to C2, by the definition of (C1, C2), v has some neighbor u ∈ U1 and w has some neighbor u′ ∈ U1. Therefore, if we put v ∈ S1 and w ∈ S2, then the edge (u, v) is violating, and if we put w ∈ S1 and v ∈ S2, then the edge (u′, w) is violating. The lemma follows.

Combining Lemma 4 with the fact that there are 2^|U| partitions of U, it follows that with probability at least 5/6 over the choice of S, for every partition (U1, U2) of U and every partition (S1, S2) of S, the sample contains an edge that violates (U1 ∪ S1, U2 ∪ S2). In other words, the sample U ∪ S cannot be partitioned without violations. Combining this with Lemma 3, the theorem follows.

Figure 1: Diagram of the graph structure induced by (U1, U2). Vertices in C2 have at least one neighbor in U1, and vertices in C1 have at least one neighbor in U2; vertices in R have no neighbors in U, and R is partitioned into (R1, R2) as in the proof. Edges incident to R (at most (ε/2)N² of them) are dotted; violating edges within C1 or within C2 (at least (ε/2)N² of them) are dashed.

As described, Algorithm 1 has query and time complexities that are quadratic in the size of the sample, that is, Θ(log²(1/ε)/ε⁴). However, given the analysis, we can slightly improve on this bound. Assume the algorithm actually partitions the sample into two parts, U and S, of sizes m1 = Θ(log(1/ε)/ε) and m2 = Θ(log(1/ε)/ε²), respectively. It views S as consisting of m2/2 disjoint pairs of vertices, and queries only the pairs with at least one endpoint in U, together with the m2/2 pairs within S. It then checks whether the resulting subgraph is bipartite. By the analysis above, this suffices to obtain the desired success probability while decreasing the complexities to Θ(log²(1/ε)/ε³).
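The saving comes entirely from which pairs are queried. A sketch of that query pattern, under the same illustrative adjacency-oracle assumption and with arbitrary constants:

```python
import math
import random

def improved_bipartiteness_queries(graph, eps):
    """Query only pairs with an endpoint in U, plus m2/2 disjoint pairs inside S."""
    m1 = int(4 * math.log(1 / eps) / eps) + 1              # |U| = Θ(log(1/ε)/ε)
    m2 = 2 * (int(8 * math.log(1 / eps) / eps ** 2) + 1)   # |S| = Θ(log(1/ε)/ε²), even
    sample = random.sample(range(graph.n), min(m1 + m2, graph.n))
    U, S = sample[:m1], sample[m1:]

    edges = []
    for u in U:                                   # all pairs in U × (U ∪ S)
        for x in sample:
            if u != x and graph.query(u, x):
                edges.append((u, x))
    for a, b in zip(S[0::2], S[1::2]):            # the m2/2 disjoint pairs within S
        if graph.query(a, b):
            edges.append((a, b))
    # Run the BFS bipartiteness check from the Algorithm 1 sketch on (sample, edges).
    return sample, edges
```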

3 Other Applications

3.1 Constructing (Good) Graph Partitions

For many graph properties, if the graph has the desired property, the testing algorithm can output some auxiliary information which allows one to construct, in poly(1/ε)·N time, a partition that approximately exhibits the property. For example, for ρ-clique, the algorithm will find a subset of vertices of size ρN such that at most εN² edges need to be added in order to turn it into a clique. In the case of ρ-cut, the algorithm will construct a partition with at least (ρ − ε)N² crossing edges. The basic idea is that the partition of the sample that caused the algorithm to accept is used to partition the whole graph.

3.2 A General Class of Partition Properties

The 1998 paper of Goldreich, Goldwasser, and Ron [1], which introduced the notion of graph property testing, also provided a property testing algorithm and bounds for an entire class of problems they called partition problems. What follows is a statement of this result, without a description of the algorithm or its proof (which are quite involved), together with a short list of problems to which the algorithm applies.

The following framework of a general partition problem captures any graph property which requires the existence of a partition satisfying certain fixed density constraints. These constraints may refer both to the number of vertices in each component of the partition and to the number of edges between each pair of components. Let

  Φ = {ρ_j^LB, ρ_j^UB}_{j=1..k} ∪ {ϱ_{j,j'}^LB, ϱ_{j,j'}^UB}_{j,j'=1..k}

be a set of non-negative parameters such that ρ_j^LB ≤ ρ_j^UB for all j, and ϱ_{j,j'}^LB ≤ ϱ_{j,j'}^UB for all j and j'. (LB stands for Lower Bound, UB stands for Upper Bound.) Let GP_Φ be the class of graphs which have a k-way partition (V1, ..., Vk) satisfying

  for all j:      ρ_j^LB · N ≤ |V_j| ≤ ρ_j^UB · N, and
  for all j, j':  ϱ_{j,j'}^LB · N² ≤ |E(V_j, V_j')| ≤ ϱ_{j,j'}^UB · N²,

where E(V_j, V_j') denotes the set of edges between vertices in V_j and vertices in V_j'. That is, the first relation places lower and upper bounds on the relative sizes of the various components of the partition, whereas the second imposes lower and upper bounds on the edge densities between the various pairs of components.

Theorem 5. There exists an algorithm A such that, for every given set of parameters Φ, A is a property testing algorithm for the class GP_Φ with query complexity (O(k²)/ε)^(k+5) · k² · log(k/(εδ)) and running time exp((O(k²)/ε)^(k+2) · log(k/(εδ))), where δ denotes the allowed failure probability of the tester.

For a description of the algorithm and a proof of this theorem, see [1].
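The testing algorithm itself is involved, but the constraints defining GP_Φ are easy to state operationally. The following sketch (names and representation are illustrative, not from [1]) merely checks whether a given k-way partition of a fully known graph satisfies a parameter set Φ, i.e., the trivial verification step rather than the tester of Theorem 5.

```python
def edges_between(parts, edges, j, jp):
    """|E(V_j, V_jp)|: edges with one endpoint in V_j and the other in V_jp."""
    A, B = set(parts[j]), set(parts[jp])
    return sum(1 for (u, v) in edges if (u in A and v in B) or (u in B and v in A))

def satisfies_phi(parts, edges, n, rho_lb, rho_ub, vrho_lb, vrho_ub):
    """parts: list of k vertex lists; edges: list of (u, v) pairs.
    rho_*[j] bound |V_j|/N; vrho_*[j][jp] bound |E(V_j, V_jp)|/N^2."""
    k = len(parts)
    for j in range(k):
        if not rho_lb[j] * n <= len(parts[j]) <= rho_ub[j] * n:
            return False
    for j in range(k):
        for jp in range(k):
            m = edges_between(parts, edges, j, jp)
            if not vrho_lb[j][jp] * n * n <= m <= vrho_ub[j][jp] * n * n:
                return False
    return True

# Bipartiteness as an instance of Φ (see item 1 in the list below): no edges inside
# either part, with sizes and crossing edges unrestricted.
bip_rho_lb, bip_rho_ub = [0, 0], [1, 1]
bip_vrho_lb = [[0, 0], [0, 0]]
bip_vrho_ub = [[0, 1], [1, 0]]
```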

What follows is a list of property testing problems (along with values for the density constraints) to which the above theorem applies, with the stated bounds.

1. Bipartiteness. We set k = 2 (as we are interested in a two-way partition); ρ_1^LB = ρ_2^LB = 0 and ρ_1^UB = ρ_2^UB = 1 (there are no restrictions on the sizes of the two parts); ϱ_{1,1}^LB = ϱ_{2,2}^LB = ϱ_{1,1}^UB = ϱ_{2,2}^UB = 0, enforcing the main constraint that there be no edges within either part; and finally ϱ_{1,2}^LB = 0 and ϱ_{1,2}^UB = 1, since the number of edges between the two parts is not restricted.

2. k-Colorability, k ≥ 3. This is a generalization of bipartiteness, and the important density bounds are ϱ_{j,j}^UB = 0 for every 1 ≤ j ≤ k. All lower bounds are 0 and all other upper bounds are 1.

3. ρ-Clique. Here k = 2, ρ_1^LB = ρ_1^UB = ρ, enforcing the restriction that one part have size exactly ρN, and ϱ_{1,1}^LB = (ρ² − ρ/N)/2, enforcing the restriction that this part be a clique (a clique on ρN vertices has exactly (ρ²N² − ρN)/2 edges); all other bounds are unrestricted.

4. ρ-Cut. Here k = 2 and ϱ_{1,2}^LB = ρ, with all other bounds unrestricted. For the case of ρ-bisection we add the constraints ρ_j^LB = ρ_j^UB = 1/2 for j ∈ {1, 2}.

4 Conclusion and Open Problems

Property testing is a useful tool that allows for sublinear algorithms when some accuracy can be sacrificed. However, there are many properties that are hard to test; first-order graph properties provide examples. Let A(x_1, ..., x_t, y_1, ..., y_s) be a quantifier-free graph expression, that is, an expression built from equality of vertices, adjacency relations between vertices, and boolean connectives. Properties expressed by quantifying such an expression are known as first-order graph properties. There are several classes of first-order graph properties, some of which are hard to test. For example, an expression defines an EA first-order graph property if it is of the form

  ∃x_1, ..., x_t ∀y_1, ..., y_s A(x_1, ..., x_t, y_1, ..., y_s);

properties of type AE are defined analogously. k-Colorability is known to be equivalent to an EA-type problem. Properties that can be described by an EA statement have been shown to have query complexity and running time independent of the size of the graph; however, the best known upper bound on the number of queries is a tower of towers of height poly(1/ε), and it is not known whether this can be improved. [4]

References

[1] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998.

[2] D. Ron. Property testing. In Handbook on Randomization, Volume II (S. Rajasekaran, P. M. Pardalos, J. H. Reif, and J. D. P. Rolim, editors).

[3] N. Alon and M. Krivelevich. Testing k-colorability. Manuscript, 1999.

[4] N. Alon, E. Fischer, M. Krivelevich, and M. Szegedy. Efficient testing of large graphs. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pages 645–655, 1999.