Mining Closed Itemsets: A Review


1 Department of Computer Science, Faculty of Informatics, Mahasarakham University, Mahasarakham, 44150, Thailand, panida.s@msu.ac.th
2 National Centre of Excellence in Mathematics, PERDO, Bangkok, 10400, Thailand

Abstract

Closed itemset mining is a popular research topic in data mining. It was proposed to avoid the large number of redundant itemsets produced by frequent itemset mining. Various algorithms with efficient strategies have been proposed to generate closed itemsets. This paper studies the existing algorithms for mining closed itemsets and presents and analyzes the strategies they use.

Keywords: Closed Itemset, Frequent Itemset, Data Mining

1. Introduction

Association rule mining was first introduced by Agrawal et al. [18, 19] to determine relationships among items in a very large database and to create useful rules, called association rules. Association rule mining is exploited in many applications, such as the analysis of customer behavior, the prediction of web access patterns, scientific experiments, disease treatment, and natural disasters. Association rule mining is divided into two sub-problems [13]. The first is the generation of all itemsets whose frequency is greater than or equal to a user-defined threshold, called frequent itemsets. The second uses the frequent itemsets to generate association rules. If a large number of frequent itemsets is generated, a large number of association rules is also produced, which makes the analysis very complex: users have to search for the most relevant rules among a huge number of association rules. In addition, the performance of frequent itemset mining decreases drastically on large databases. To overcome these problems, Pasquier et al. [13, 14] proposed closed itemset mining to reduce the number of frequent itemsets without information loss.
Closed itemset mining generates only the frequent itemsets that have no superset with the same support, based on the Galois connection [11, 18, 21]. Mining closed itemsets gives not only a compact and complete result set but also better efficiency than mining frequent itemsets.

The remainder of the paper is organized as follows. Section 2 gives the basic definitions of closed itemset mining. Section 3 summarizes the strategies used in the algorithms. Section 4 briefly describes the algorithms for mining closed itemsets. Section 5 concludes the study.

2. Basic definitions

Definition 1 A data mining context is a triple $C = (D, I, R)$. $D$ is a non-empty finite set of all transactions in the database, and $I$ is a non-empty finite set of distinct items appearing in the database. $R \subseteq D \times I$ is a binary relation between transactions and items. Each couple $(d, i) \in R$ denotes the fact that the transaction $d \in D$ has the item $i \in I$.

Definition 2 Let $C = (D, I, R)$ be a data mining context, let a set of transactions $T$ be a non-empty subset of $D$, and let an itemset (a set of items) $X$ be a non-empty subset of $I$. The concept of closed itemset mining is based on the functions $f$ and $g$ [20, 21], defined as follows:

$f(T) = \{ i \in I \mid \forall d \in T, (d, i) \in R \}$ (1)
$g(X) = \{ d \in D \mid \forall i \in X, (d, i) \in R \}$ (2)

International Journal of Advancements in Computing Technology (IJACT), Volume 4, Number 5, March 2012, doi: /ijact.vol4.issue

The function $f$ returns the largest itemset included in all transactions belonging to $T$, and the function $g$ returns the set of transactions supporting a given itemset $X$.

Definition 3 An itemset $X$ is closed if and only if $\gamma(X) = f(g(X)) = (f \circ g)(X) = X$, where the composite function $\gamma = f \circ g$ is called a Galois operator or closure operator.

Example 1 In table 1, the letters a-d represent items in the database. An itemset $\{i_1, i_2, \ldots, i_k\}$ is written in the form $i_1 i_2 \ldots i_k$. The itemset bc is not closed because $\gamma(bc) = f(g(bc)) = f(\{1, 5\}) = abcd$. The itemset ac is closed because $\gamma(ac) = f(g(ac)) = f(\{1, 2, 5\}) = ac$.

Table 1. The example database
Transaction id   Itemset
1                a, b, c, d
2                a, c
3                c, d
4                a, b, d
5                a, b, c, d

Definition 4 The support of itemset $X$ is the number of transaction ids containing $X$, that is $|g(X)|$, denoted as $supp(X)$.

Definition 5 The length of itemset $X$ is the number of items contained in $X$. An itemset having length $l$ is denoted as an l-itemset.

Definition 6 Itemset $X$ is a sub-itemset of itemset $Y$ if and only if $X$ is a subset of $Y$. $Y$ is called a super-itemset of $X$, denoted as $X \subseteq Y$. Observation: If $X \subset Y$ and $supp(X) = supp(Y)$, $X$ will be absorbed by $Y$.

Definition 7 Given a user-defined threshold or minimum support $min\_supp$, an itemset $X$ is called a frequent itemset if and only if $supp(X) \geq min\_supp$.

Definition 8 An itemset $X$ is called a frequent closed itemset if and only if $supp(X) \geq min\_supp$ and $\gamma(X) = X$.

Figure 1. The frequent itemset lattice (itemset:support, computed from table 1 with min_supp = 2; the top-down direction runs from level 4 to level 1 and the bottom-up direction from level 1 to level 4)
Level 4: abcd:2
Level 3: abc:2 abd:3 acd:2 bcd:2
Level 2: ab:3 ac:3 ad:3 bc:2 bd:3 cd:3
Level 1: a:4 b:3 c:4 d:4

Example 2 Figure 1 shows the lattice of frequent itemsets when $min\_supp = 2$. The closure operator defines a set of equivalence classes over the lattice: two frequent itemsets belong to the same equivalence class if and only if they have the same closure. For example, ab and ad are in the same equivalence class, because $\gamma(ab) = \gamma(ad) = abd$.
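The functions $f$ and $g$ and the closure operator can be tried out on the example database. The following is a minimal Python sketch, assuming the database of table 1 is held as a dictionary from transaction id to item set; the names DATABASE, ITEMS, closure and is_closed are illustrative and do not come from any of the reviewed algorithms.

```python
# Illustrative encoding of table 1: transaction id -> set of items (assumed layout).
DATABASE = {
    1: {"a", "b", "c", "d"},
    2: {"a", "c"},
    3: {"c", "d"},
    4: {"a", "b", "d"},
    5: {"a", "b", "c", "d"},
}
ITEMS = frozenset().union(*DATABASE.values())


def f(tids):
    """Equation (1): the items shared by every transaction in `tids`."""
    shared = set(ITEMS)
    for tid in tids:
        shared &= DATABASE[tid]
    return frozenset(shared)


def g(itemset):
    """Equation (2): the ids of the transactions containing every item of `itemset`."""
    wanted = set(itemset)
    return frozenset(tid for tid, items in DATABASE.items() if wanted <= items)


def closure(itemset):
    """The closure operator of Definition 3, i.e. f applied to g(itemset)."""
    return f(g(itemset))


def is_closed(itemset):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset) == frozenset(itemset)


if __name__ == "__main__":
    print(sorted(g("bc")), "".join(sorted(closure("bc"))))  # tids and closure of bc
    print(is_closed("ac"))
```

Itemsets are passed as strings such as "bc" only for brevity; any iterable of item names works with this sketch.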
Each equivalence class contains elements sharing the

same support, and the frequent closed itemsets are the maximal elements of each equivalence class. Therefore, with the supports computed from table 1, the set of frequent closed itemsets is {a:4, c:4, d:4, ac:3, cd:3, abd:3, abcd:2}, while the set of frequent itemsets is {a:4, b:3, c:4, d:4, ab:3, ac:3, ad:3, bc:2, bd:3, cd:3, abc:2, abd:3, acd:2, bcd:2, abcd:2}, which is larger. Although the number of closed itemsets is smaller than the number of frequent itemsets, the closed itemsets can be used to derive the complete set of frequent itemsets.

3. Strategies for mining closed itemsets

This section discusses the strategies of the existing algorithms for mining closed itemsets. The strategies are analyzed and grouped in terms of verification of closed itemsets, search space pruning, search space travel, closed itemset growing, and dataset format.

3.1. Verification of closed itemsets

Intersection transactions: This technique finds closed itemsets by intersecting transactions. If the intersection of all transactions containing itemset $X$ is $X$, then $X$ is closed. The intersection is costly if the number of transactions containing $X$ is very large. Moreover, the intersection has to be performed for every itemset to check whether it is closed. Furthermore, this technique leads to redundant computation because two generators may lead to the same closed itemset, and it needs to maintain the already mined frequent itemsets in order to identify generators.

Subset checking: This technique mines closed itemsets by checking the subsets of an itemset and their supports. If $X \subset Y$ and $supp(X) = supp(Y)$, then $X$ is not closed. That is, if a newly found itemset is a subset of an already found closed itemset candidate with the same support, the new itemset is not closed. This technique has to store the already mined itemsets in memory to check whether they are really closed. Thus, it may consume much memory if the number of mined itemsets is very large.
Closure calculation: This technique produces closed itemsets directly: if $g(X) \subseteq g(i)$ and $i \notin X$, then $i \in \gamma(X)$. Therefore, this technique does not need to maintain candidates to check whether they are really closed. However, it may produce the same closed itemset from different itemsets. Therefore, before computing the closure, the redundant itemsets have to be detected and discarded.

3.2. Search space pruning

Itemset merging: This technique prunes the search space by considering transactions. If every transaction containing itemset $X$ also contains itemset $Y$ but not any proper superset of $Y$, then $X \cup Y$ forms a frequent closed itemset and there is no need to search any itemset containing $X$ but not $Y$. Checking whether the transactions of $X$ contain $Y$ is costly if many itemsets are produced.

Sub-itemset pruning: If a prefix itemset $X$ is a proper subset of an already found frequent closed itemset $Y$ and $supp(X) = supp(Y)$, the prefix $X$ can be safely pruned from the search space. This technique needs to maintain the already found closed itemsets to prune the search space.

Tid subset testing: If $g(X) \subseteq g(i)$, where $i$ is any frequent item preceding $X$ in lexicographic order, then $X$ and its possible extensions are pruned, because all itemsets beginning with $X$ are absorbed by itemsets beginning with $i$. Unlike itemset merging, this technique only checks whether the transaction ids of $X$ are included in those of a preceding frequent item. Therefore, it is less expensive than itemset merging.

3.3. Search space travel

Breadth-first: This approach searches level by level. It is suitable for mining closed itemsets with a length constraint, because such patterns are found only among the itemsets at the minimum length level. Breadth-first search can avoid the traversal of long branches, and the minimum support can be raised faster.

However, the breadth-first approach may encounter difficulties because there can be many candidates and many database scans, and termination by using equivalence classes is limited. It is not efficient for pruning the search space. Therefore, the breadth-first search approach is not suitable for mining long itemsets [10].

Depth-first: The idea of this approach is to travel as deep as possible from neighbor to neighbor before backtracking. This traversal tends to generate long itemsets. The sub-tree of an itemset is searched only if the itemset is not included in other itemsets. If the itemset is included, the sub-tree is quickly pruned. Therefore, it is more efficient than breadth-first search for mining closed itemsets, because closed itemsets are quickly found and unnecessary itemsets are quickly pruned.

Best-first: This approach visits candidates in highest-priority-first order. It is used to find closed itemsets in descending order of their supports. Therefore, best-first search is a good approach for mining the most interesting itemsets.

3.4. Closed itemset growing

Top-down: This strategy produces the longest itemsets first and then generates the shorter ones, as shown in figure 1 from top to bottom. Top-down is an efficient strategy for finding closed itemsets, because a long itemset absorbs its sub-itemsets when their supports are the same. However, this strategy first generates itemsets with low support, which may not be frequent. Although some of the generated itemsets are not frequent, they are kept to produce the shorter ones, because a shorter itemset may have a higher support than the itemset it was generated from and may therefore be frequent. In addition, a new database scan or a new calculation may be needed to find the support of an itemset, because it cannot be derived from the supports of the already generated itemsets.
Bottom-up: This strategy produces the shortest itemsets (single items) with high support first and then extends them to longer ones, as shown in figure 1 from bottom to top. It checks frequent itemsets efficiently because it first generates itemsets with high support, which are very likely to be frequent. Moreover, the support of a new itemset can be calculated from those of the already generated itemsets. However, it is not easy to check closed itemsets. A generated itemset may not be closed and is eliminated later if a new itemset is a superset of it with the same support. Therefore, some algorithms, such as CLOSET, CLOSET+, FPCLOSE, CHARM, and CLOSPAN, need memory to store the mined itemsets.

3.5. Dataset format

The characteristics of datasets differ. Therefore, datasets may be clustered before the mining process [4], or transformed to another format to improve the mining process. The dataset formats are divided into two groups as follows.

Vertical format: This format keeps a set of transaction ids for every item, as shown in table 2. The support of an itemset is computed by intersecting the sets of transaction ids, which is both simple and fast. In addition, irrelevant information is pruned automatically: only the transaction ids relevant for frequency determination remain after each intersection [11]. For databases with long transactions, the database is scanned once, and a simple cost model indicates that the vertical approach reduces the number of I/O operations. However, if the database is dense, the sets of transaction ids of frequent itemsets become large and the intersections become expensive. Moreover, data compression and temporary disk space may be required to hold large sets of transaction ids.

Horizontal format: Each transaction is recorded as a list of items as shown in table 1.
This format produces many local frequent items, which can be used to grow the prefix itemset and generate frequent itemsets, whereas an intersection of transaction ids in the vertical format finds only one frequent itemset. However, if the database contains many transactions, the horizontal format may require an expensive computation over all transactions to find closed itemsets.
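The relation between the two formats can be illustrated with a short Python sketch. It assumes the horizontal database of table 1 is given as a dictionary of item lists; the helper names to_vertical, tidset and support are hypothetical and chosen only for this illustration. Converting to the vertical format yields the transaction id sets listed in table 2, and the support of an itemset is then the size of the intersection of its items' transaction id sets.

```python
from functools import reduce

# Horizontal format (table 1): transaction id -> list of items.
HORIZONTAL = {
    1: ["a", "b", "c", "d"],
    2: ["a", "c"],
    3: ["c", "d"],
    4: ["a", "b", "d"],
    5: ["a", "b", "c", "d"],
}


def to_vertical(horizontal):
    """Build the vertical format: item -> set of ids of the transactions containing it."""
    vertical = {}
    for tid, items in horizontal.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def tidset(vertical, itemset):
    """Transaction ids containing every item of a non-empty itemset."""
    return reduce(set.intersection, (vertical[item] for item in itemset))


def support(vertical, itemset):
    """Support of a non-empty itemset, computed by tidset intersection."""
    return len(tidset(vertical, itemset))


if __name__ == "__main__":
    vertical = to_vertical(HORIZONTAL)
    for item in sorted(vertical):
        print(item, sorted(vertical[item]))
    print("supp(ab) =", support(vertical, "ab"))
```

In this sketch each support query touches only the transaction id sets of the items involved, which is the automatic pruning of irrelevant information described above; the cost grows with the size of those sets, as noted for dense databases.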

Table 2. Vertical format dataset
Itemset   Transaction ids
a         1, 2, 4, 5
b         1, 4, 5
c         1, 2, 3, 5
d         1, 3, 4, 5

4. Algorithms for mining closed itemsets

CLOSE [13] and A-CLOSE [14] were introduced to mine closed itemsets by applying the closure operator. The smallest frequent itemsets of all equivalence classes are determined as generators to produce closed itemsets. If the support of an itemset is equal to the support of any of its subsets, then the itemset is pruned because it is not a generator. The closure of each generator is computed by intersecting all transactions containing it. Unfortunately, computing a closure from the smallest frequent itemsets may lead to redundant computation, because two generators may lead to the same closed itemset. For example, with the supports computed from table 1, the closed itemset (abcd) has to be computed twice, from the two generators (bc) and (acd). Moreover, the closure operation is expensive if there are many transactions containing a generator. In addition, the already mined frequent itemsets have to be maintained in order to identify generators.

To avoid the redundant computation of the same frequent closed itemsets, the CLOSET [9] and CLOSET+ [10] algorithms adopt an FP-tree (Frequent Pattern tree) [7] to store the itemsets of the database in main memory. Itemsets sharing the same subset of items share common prefix paths from the root in the FP-tree. After the FP-tree has been constructed, CLOSET and CLOSET+ use a divide-and-conquer technique to mine closed itemsets on the FP-tree. They split the extraction context into smaller sub-contexts and recursively perform the closed itemset mining process on these sub-contexts. CLOSET mines closed itemsets by closure climbing and grows frequent closed itemsets with items having the same support. It verifies a closed itemset by using subset checking. CLOSET was improved to CLOSET+ by using different methods for mining different datasets.
CLOSET+ mines frequent closed itemsets in a top-down manner on the FP-tree for sparse datasets and in a bottom-up manner for dense datasets. During the mining process, CLOSET+ uses the item merging and sub-itemset pruning methods to prune the search space. For verifying closed itemsets, it uses an upward checking method for sparse datasets and adopts subset checking for dense datasets.

FPCLOSE [5] is another algorithm exploiting the FP-tree to mine closed itemsets. It stores the previously mined closed itemsets in recursively constructed CFI-trees (Closed Frequent Itemset trees). Consequently, its subset checking is less expensive than that of CLOSET and CLOSET+. In addition, a simple array structure is used to reduce the traversal time.

Unlike the previously mentioned algorithms, the CHARM [12] algorithm explores the search space in a depth-first manner without splitting the extraction context into smaller sub-contexts. It proposed a new technique to mine closed itemsets by exploiting the vertical data format. The vertical data format reduces the cost of the mining process, but the storage space is still large [6] because CHARM explores not only the itemset space but also the transaction space. CHARM introduces a data structure, called the IT-tree (Itemset Tidset Tree), for mining closed itemsets. Each node in the IT-tree contains a frequent itemset and the set of ids of the transactions that contain it. As soon as a frequent itemset is generated, its set of transaction ids is compared with those of the other itemsets having the same parent. If one set of transaction ids includes another, the associated nodes are merged and the frequent itemsets are finally combined into a closed itemset. However, if the set of transactions in the database is large, the intersection computation becomes expensive. To speed up the computation, CHARM stores diffsets (sets of transaction ids in which a node differs from its parent) in each node instead of the full set of transaction ids, except for the root node.
In addition, the number of mismatches between two diffsets is used for subset checking.

The above algorithms enumerate closed itemsets by pruning unnecessary itemsets. Therefore, they need to keep the already mined closed itemset candidates in order to do subset checking. If a large number of candidates is generated, they consume much memory. Moreover, some of the algorithms, namely CLOSE, A-CLOSE, and CHARM, may compute the same closed itemset more than once. To avoid these problems, the DCI_CLOSED [3, 2] and LCM [20, 21] algorithms try to mine closed itemsets directly, without storing candidates and without redundant computation of the same closed itemset.
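The diffsets mentioned above for CHARM can be illustrated with a small Python sketch. It assumes the common definition of a diffset, namely the transaction ids of a parent itemset that do not contain the extending item, and uses the transaction id sets of table 2. The function names are hypothetical and are not taken from the CHARM implementation.

```python
# Transaction id sets of the single items (table 2).
TIDSETS = {
    "a": {1, 2, 4, 5},
    "b": {1, 4, 5},
    "c": {1, 2, 3, 5},
    "d": {1, 3, 4, 5},
}


def diffset_from_tidsets(parent_tids, item_tids):
    """Diffset of (parent + item): tids of the parent that do not contain the item."""
    return parent_tids - item_tids


def diffset_from_diffsets(d_px, d_py):
    """Diffset of PXY relative to PX, given the diffsets of PX and PY relative to P."""
    return d_py - d_px


def support_from_diffset(parent_support, diffset):
    """Every tid in the diffset is one transaction lost when extending the parent."""
    return parent_support - len(diffset)


if __name__ == "__main__":
    supp_a = len(TIDSETS["a"])
    d_ab = diffset_from_tidsets(TIDSETS["a"], TIDSETS["b"])
    d_ac = diffset_from_tidsets(TIDSETS["a"], TIDSETS["c"])
    supp_ab = support_from_diffset(supp_a, d_ab)
    d_abc = diffset_from_diffsets(d_ab, d_ac)
    print("d(ab) =", sorted(d_ab), "supp(ab) =", supp_ab)
    print("d(abc) =", sorted(d_abc), "supp(abc) =", support_from_diffset(supp_ab, d_abc))
```

Under this assumption an empty diffset means that the extension loses no transaction, so the parent and the extended itemset have the same support and belong to the same equivalence class; for instance, extending ab with d gives an empty diffset, in line with $\gamma(ab) = abd$.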

DCI_CLOSED was proposed with a mining concept similar to that of CLOSE and CHARM. DCI_CLOSED uses closure climbing to browse the search space, find generators, and compute their closures using transaction ids. It detects and discards generators whose closures were already mined, based on lexicographic order. Moreover, DCI_CLOSED finds closed itemsets without keeping the set of mined closed itemsets in main memory. After discovering a closed itemset $f$, a new itemset $g$ is grown by extending $f$ with a frequent item $i$, $i \notin f$. If the set of transaction ids of $g$ is a subset of that of any preceding frequent item, $g$ is not an order preserving generator of a new closed itemset, and it is discarded. Otherwise, the closure of $g$ is computed by adding to it the succeeding frequent items whose sets of transaction ids include that of $g$.

The LCM (Linear time Closed itemset Miner) algorithm is similar to the DCI_CLOSED algorithm. It finds closed itemsets from a transaction database without storing the already mined closed itemsets. LCM proposed the prefix-preserving closure extension of closed itemsets to search all frequent closed itemsets: a new closed itemset is extended from a previously obtained closed itemset.

The TFP algorithm [8, 11] was proposed to retrieve the k most frequent closed itemsets having a length no less than a minimal length. TFP mines the closed itemsets on the FP-tree structure. Top-down and bottom-up traversal strategies are combined to speed up the discovery of closed itemsets: the top-down strategy easily finds itemsets of at least the minimum length, whereas the bottom-up strategy is efficient for finding itemsets with high support. In addition, a hash-based closed itemset verification scheme is employed to perform subset checking. However, TFP needs to maintain the set of already mined frequent closed itemset candidates to ensure that every frequent closed itemset is really closed. If a lot of candidates are generated, it consumes much memory.
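The duplicate test and closure step described above for DCI_CLOSED can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published implementation: it works on plain Python sets of transaction ids in the vertical format of table 2, orders items alphabetically, and the names mine_closed and expand are hypothetical.

```python
def mine_closed(tidsets, min_supp):
    """Return {closed itemset: support} using tid subset tests only.

    `tidsets` maps each item to its set of transaction ids (vertical format).
    No previously found closed itemset is consulted to detect duplicates.
    """
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_supp)
    all_tids = set().union(*tidsets.values())
    found = {}  # used only to collect the output

    def expand(closed, tids, pre, post):
        pre = list(pre)  # items already tried at this level or above
        for pos, item in enumerate(post):
            new_tids = tids & tidsets[item]
            if len(new_tids) < min_supp:
                continue  # the extension is not frequent
            # Duplicate test: a preceding item covers all tids of the generator,
            # so its closure was (or will be) reached from another branch.
            if any(new_tids <= tidsets[p] for p in pre):
                continue
            # Closure: absorb every succeeding item whose tidset covers new_tids.
            new_closed = closed | {item}
            new_post = []
            for later in post[pos + 1:]:
                if new_tids <= tidsets[later]:
                    new_closed = new_closed | {later}
                else:
                    new_post.append(later)
            found[frozenset(new_closed)] = len(new_tids)
            expand(new_closed, new_tids, pre, new_post)
            pre.append(item)

    expand(frozenset(), all_tids, [], items)
    return found


if __name__ == "__main__":
    table2 = {"a": {1, 2, 4, 5}, "b": {1, 4, 5}, "c": {1, 2, 3, 5}, "d": {1, 3, 4, 5}}
    for itemset, supp in sorted(mine_closed(table2, 2).items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(itemset)), supp)
```

In this sketch each closed itemset is produced from exactly one generator, because a generator whose transaction ids are covered by a preceding item is skipped. The dictionary found only collects the results and is never read during the search, which mirrors the memory argument made for DCI_CLOSED and LCM.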
The TopKMiner algorithm [1] combines the LCM algorithm and a priority queue to mine the k most frequent closed itemsets without generating candidates. LCM is exploited to generate closed itemsets without keeping candidates, and the priority queue is used to produce the closed itemsets with high support first. Songram et al. [16] also proposed the TOPK_CLOSED algorithm to retrieve the k most frequent closed itemsets. They adopt a best-first search to discover closed itemsets in descending order of support, and adapt DCI_CLOSED to avoid generating candidates. The TOPK_CLOSED algorithm was later improved to the NCLOSED algorithm, which retrieves the N l-itemsets with the highest supports for each length l = 1 up to a certain $l_{max}$ value, where $l_{max}$ is the upper bound on the length of itemsets and N is the desired number of l-itemsets [17].

The algorithms briefly described above and their strategies are summarized in table 3.

Table 3. Strategies of the most known algorithms
Strategies: Algorithms
1. Verification of closed itemsets
- Intersection transactions: CLOSE and A-CLOSE
- Subset checking: CLOSET, CLOSET+, CHARM and TFP
- Closure calculation: DCI_CLOSED, LCM, TopKMiner, TOPK_CLOSED and NCLOSED
2. Search space pruning
- Itemset merging: CLOSET, CLOSET+, CHARM and TFP
- Sub-itemset pruning: CLOSE, A-CLOSE and CLOSET+
- Tid subset testing: DCI_CLOSED, LCM, TopKMiner, TOPK_CLOSED and NCLOSED
3. Search space travel
- Breadth-first: A-CLOSE and CLOSE
- Depth-first: CLOSET, CLOSET+, FPCLOSE, CHARM, CLOSPAN, DCI_CLOSED, LCM and TFP
- Best-first: TopKMiner, TOPK_CLOSED and NCLOSED
4. Closed itemset growing
- Top-down: -
- Bottom-up: CLOSET, CLOSET+, FPCLOSE, CHARM, CLOSPAN, DCI_CLOSED, CLOSE, A-CLOSE and LCM
5. Dataset format
- Vertical format: CHARM, DCI_CLOSED, LCM, TopKMiner, TOPK_CLOSED and NCLOSED
- Horizontal format: CLOSE, A-CLOSE, CLOSET, CLOSET+, FPCLOSE and TFP

5. Conclusion

This paper studies the existing algorithms for mining closed itemsets. The methods of the algorithms are briefly presented and discussed. Moreover, the strategies of the algorithms are divided and discussed in terms of verification of closed itemsets, search space pruning, search space travel, closed itemset growing, and dataset format. From the review of the strategies, to mine closed itemsets efficiently, breadth-first search is adopted to travel itemsets, and the itemsets are grown by using the bottom-up strategy, because the search space is quickly pruned and frequent itemsets are quickly found. Moreover, the vertical dataset format is exploited to generate closed itemsets directly in order to avoid generating candidates in memory. These strategies tend to be exploited for mining closed itemsets in the future.

6.
References

[1] Andrea Pietracaprina and Fabio Vandin, Efficient Incremental Mining of Top-k Frequent Closed Itemsets, In Proceedings of the 10th International Conference on Discovery Science, pp. , Sendai, Japan, 1-4 October,
[2] Claudio Lucchese, Salvatore Orlando, and Raffaele Perego, DCI-Closed: A Fast and Memory Efficient Algorithm to Mine Frequent Closed Itemsets, In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November,
[3] Claudio Lucchese, Salvatore Orlando, and Raffaele Perego, Fast and Memory Efficient Mining of Frequent Closed Itemsets, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 18, issue 1, pp. , January
[4] Fan Lilin, Research on Classification Mining Method of Frequent Itemset, Journal of Convergence Information Technology, Vol. 5, No. 8, pp. ,
[5] Gosta Grahne and Jianfei Zhu, Efficiently Using Prefix-trees in Mining Frequent Itemsets, In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, vol. 8, issue 1, pp. 103, Melbourne, Florida, USA, 19 November
[6] Haitao He, Shasha Feng, Jiadong Ren, and Qian Wang, The algorithm of mining frequent closed itemsets based on index array, Advances in Information Science and Service Sciences, Vol. 3, No. 9, pp. , 2011
[7] Jiawei Han, Jian Pei, and Yiwen Yin, Mining Frequent Itemsets without Candidate Generation, In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1-12, May
[8] Jiawei Han, Jianyong Wang, Ying Lu and Petre Tzvetkov, Mining Top-k Frequent Closed Itemsets without Minimum Support, In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), pp. , Washington, DC, USA,
[9] Jian Pei, Jiawei Han, and Runying Mao, Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets, In Proceedings of the ACM-SIGMOD International Workshop on Data Mining and Knowledge Discovery (DMKD 2000), pp. , Dallas, Texas, USA,
[10] Jianyong Wang, Jiawei Han, and Jian Pei, Closet+: Searching for the Best Strategies for Mining Frequent Closed Itemsets, In Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp. , Washington D.C., USA, August
[11] Jian Wang, Jiawei Han, Ying Lu and Petre Tzvetkov, TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets, Journal of IEEE Transaction on knowledge and data mining, Vol. 17, No. 5, pp. , May
[12] Mohammed J. Zaki and Ching-jui Hsiao, CHARM: An Efficient Algorithm for Closed Itemset Mining, In Proceedings of the 2nd SIAM International Conference on Data Mining, pp. , Arlington, Virginia, USA, April
[13] Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal, Efficient Mining of Association Rules Using Closed Itemset Lattices, Journal of Information Systems, vol. 24, issue 1, pp. 25-46,
[14] Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal, Discovering Frequent Closed Itemsets for Association Rules, In Proceedings of the 7th International Conference on Database Theory (ICDT 1999), LNCS, Vol. 1540, pp. , Springer-Verlag, Jerusalem, Israel, January
[15] Rafik Taouil, Nicolas Pasquier, Yves Bastide, and Lotfi Lakhal, Mining Based for Associate Rules Using Galois Closed Sets, In Proceedings of the 16th International Conference on Data Engineering (ICDE 2000), pp. 307,
[16] and Veera Boonjing, N-Most Interesting Closed Itemset Mining, In Proceedings of the 3rd Int. Conf. on Convergence and Hybrid Information Technology, pp. , November, Korea,
[17] and Veera Boonjing, Mining Top-K Closed Itemsets Using Best-First Search, In Proceedings of the 8th IEEE Int. Conf. on Computer and Information Technology (CIT 08), pp. , Sydney, Australia, 8-11 July,
[18] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami, Mining association rules between sets of items in large databases, In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. , May
[19] Rakesh Agrawal and Ramakrishnan Srikant, Fast Algorithm for Mining Association Rules, In Proceedings of the 20th Very Large Database, pp. ,
[20] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura, An Efficient Algorithm for Enumerating Closed Itemsets in Transaction Databases, In Proceedings of the 7th International Conference on Discovery Science, pp. , Padova, Italy, 2-5 October
[21] Takeaki Uno, Masashi Kiyomi, and Hiroki Arimura, LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets, In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November
