An Efficient Interval Query Algorithm Based on Inverted List in Cloud Environment *


Proceeding of the IEEE International Conference on Information and Automation, Shenyang, China, June 2012

Zhiqiong Wang, Ke Gong, Shikai Jin, Wenjun Li and Zixi Liu
Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, P.R. China
wangzq@bmie.neu.edu.cn

Abstract - Interval overlap query plays an increasingly significant role in genomics research and in the development of biomedicine. However, traditional query approaches based on a single computer are limited in query speed. CNCList+, an algorithm based on cloud computing technology, has been proposed to increase query speed. Nevertheless, because CNCList+ must scan the data of subgroups sequentially in every query, the speed-up it achieves is limited. Considering the significant role of inverted lists in data indexing, this paper combines the inverted list with cloud computing to form an efficient query algorithm named IQIL that further increases query speed. Detailed comparison experiments between IQIL and CNCList+ show that IQIL achieves higher query speed, demonstrating its ability to alleviate the query-speed limitation of interval overlap query.

Index Terms - Cloud Computing; Inverted List; Interval Overlap Query; Performance.

I. INTRODUCTION

Interval overlap query has played a significant role in the development of modern genomics and has proved to be a fundamental and essential tool in theoretical research and practical biomedical applications.
One application of interval overlap query is disease risk prediction: by comparing the degree of overlap between a sequence under test and the target sequence interval associated with a specific disease, one can predict the risk that a person or animal carrying that sequence will develop the disease, which helps early diagnosis and clinical treatment. Interval overlap query is therefore of great importance in modern biomedicine. Nevertheless, several problems hinder its effective use, the most prominent being the limited query speed of traditional query methods. CNCList+, an efficient interval query method for genome alignment and interval databases in a cloud environment [1], accelerates queries dramatically by exploiting the capability of cloud computing to process large amounts of data. However, because CNCList+ must scan the data of subgroups sequentially in every query, its speed-up is limited. Considering the importance of inverted indexes in data indexing, we combine the inverted index method with cloud computing to further speed up interval overlap query.

The major contributions of this paper are as follows:
1) We propose IQIL, an efficient interval overlap query algorithm based on inverted lists, which combines the inverted list with the capacity of cloud computing to handle massive data volumes.
2) We conduct detailed experiments showing that IQIL outperforms CNCList+ in query speed.

The rest of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 describes the general idea of CNCList+. Section 4 introduces the details and process of IQIL.
Detailed comparison experiments are presented and their results analyzed in Section 5. Finally, we conclude the paper in Section 6.

II. RELATED WORK

Since interval overlap query has become increasingly important in biomedicine, many approaches to it have been attempted. The human genome browser (HGB) at UCSC [2] provides access to the sequence and annotations of the human genome. GALA [3], a database of genomic DNA sequence alignments and annotations, was developed to execute complex queries across multiple forms of information or multiple genes simultaneously. The Segment R-tree [4] was designed as an indexing technique for multi-dimensional interval data. The conventional multi-column B-tree has also been used, but it is suitable only for small databases. Besides these conventional methods, several techniques from the related field of spatio-temporal indexing have appeared. The Relational Interval Tree [5] was proposed for any relational or object-relational table containing intervals, and a join algorithm for interval data [6] was built on it. The MV3R-tree [7] is a structure that combines the concepts of multi-version B-trees and 3D R-trees. NCList [8] is an interval overlap query algorithm designed to accelerate interval queries on genome alignment and interval databases. However, all of these methods run on a single computer, which limits their query speed.

CNCList and its improved version CNCList+ [1] are efficient query algorithms based on cloud computing. CNCList assigns all sequences to several big groups such that the sequences inside each big group are in ascending order. The query is then carried out by distributing the task of querying the sequences in each big group among several commodity computers. Because all query processes run simultaneously, query speed improves dramatically. CNCList+ adds two optimization strategies to CNCList, subgroup formation and the boundary interval filter, which further enhance query speed.

Besides these query algorithms, many techniques related to inverted lists have emerged because of their central role in database query. Structured Value Ranking, a ranking paradigm for relational databases, is supported by a family of inverted list indices and associated query algorithms [9]. An Apriori algorithm for mining frequent patterns based on inverted lists was presented in [10], and a combination-tree algorithm for the same task was proposed in [11]. By studying efficient query processing in distributed web search engines with a global index organization, [12] proposed an optimized inverted list assignment for distributed search engine architectures. In addition, [13] proposed a character-based indexing algorithm that records all locations of the target text in inverted lists in bit form.

* This work was supported by the Liaoning Provincial Natural Science Foundation of China (No ) and the Overseas Distinguished Foreign Expert Project of Universities directly under the Ministry of Education (No. MS2011DBDX021).

III. CNCLIST+

First, all sequences are assigned to several original big groups such that the sequences inside each big group are in ascending order, which means that no sequence in a big group contains another. Two optimization strategies, subgroup formation and the boundary interval filter, are then applied to further increase query speed. Subgroup formation divides each original big group into several subgroups according to two rules, the maximum efficiency length rule and the adjacent gap rule, to optimize the query process. The boundary interval filter treats every big group and subgroup as an interval and checks its relationship with the target query interval. If a big group or subgroup is completely contained in the target query interval, all sequences within it are result sequences. If it lies entirely outside the target query interval, all sequences inside it are discarded. If it is neither fully contained nor fully excluded, that is, it partially intersects the target query interval, the query is executed only on that big group or subgroup. Owing to these optimizations and to cloud computing, CNCList+ achieves shorter query times than traditional algorithms. However, because CNCList+ must scan the data of subgroups sequentially in every query, its speed-up is limited. If the ability of cloud computing to process massive data volumes were exploited more fully, query speed could be increased further.
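To make the CNCList+ description concrete, here is a minimal single-machine Python sketch. It assumes half-open [start, end) intervals and a first-fit rule for building the containment-free big groups; the paper does not state how sequences are assigned to groups, and the subgroup rules (maximum efficiency length, adjacent gap) are omitted because their details are not given here. All names and data are hypothetical. Note that the partial-intersection branch scans its group sequentially, which is the per-query cost that motivates IQIL.

```python
# Hypothetical half-open [start, end) sequences; not data from the paper.
SEQUENCES = [(0, 10), (2, 14), (4, 8), (6, 18), (16, 20)]


def assign_groups(intervals):
    """First-fit assignment into big groups with no containment (assumed rule).

    After sorting by ascending start (descending end on ties), an interval may
    join a group only if its end exceeds the group's last end, so starts and
    ends both stay strictly ascending inside every group.
    """
    groups = []
    for start, end in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        for group in groups:
            if group[-1][1] < end:
                group.append((start, end))
                break
        else:
            groups.append([(start, end)])
    return groups


def overlaps(a, b):
    """True if half-open intervals a and b share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


def boundary_filter(group, q_start, q_end):
    """Apply the boundary interval filter to one group."""
    lo, hi = group[0][0], group[-1][1]  # ends ascend, so the last end is the maximum
    if q_start <= lo and hi <= q_end:   # group span inside T: every sequence qualifies
        return list(group)
    if hi <= q_start or q_end <= lo:    # group span disjoint from T: discard all
        return []
    # Partial intersection: scan the group sequentially.
    return [iv for iv in group if overlaps(iv, (q_start, q_end))]


def query(groups, q_start, q_end):
    """Collect the sequences overlapping [q_start, q_end) across all groups."""
    return sorted(iv for group in groups for iv in boundary_filter(group, q_start, q_end))


groups = assign_groups(SEQUENCES)
print(groups)               # [[(0, 10), (2, 14), (6, 18), (16, 20)], [(4, 8)]]
print(query(groups, 5, 9))  # [(0, 10), (2, 14), (4, 8), (6, 18)]
```

In a distributed setting each group would be queried on a different machine; this sketch runs them in one loop only for illustration.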
Additionally, since inverted lists are of great importance in the database field, we combine cloud computing with inverted lists to address the problem of limited query speed, forming a new interval overlap query algorithm based on inverted lists (IQIL).

IV. INTERVAL QUERY BASED ON INVERTED LIST

A. Sub-intervals Formation

During sub-interval formation, all sequences are cut into fragments. Scanning starts from the sequence whose start is leftmost. When the start or end of any sequence is encountered, the segment from the leftmost start to that point is cut off as a sub-interval. Scanning then resumes from that point until the next start or end of a sequence is encountered, and the segment between the two points is cut off in the same way. Sub-interval formation continues until the rightmost end of all sequences has been scanned.

As shown in Figure 1, we label the sequences S1, S2, S3, S4 and S5 in ascending order of their starts. Scanning begins with S1, whose start is leftmost, and proceeds from left to right until the first start or end of a sequence is encountered, which is the start of S2. According to the sub-interval formation rule, the segment from the start of S1 to the start of S2 is cut off, forming sub-interval A, as shown in Figure 2. Scanning then resumes from the start of S2 until the next start or end of a sequence is encountered, which is the start of S3, and the segment from the start of S2 to the start of S3 is cut off as sub-interval B. The process continues until the rightmost end, that of S5 in this example, has been scanned. As shown in Figure 2, nine sub-intervals, A, B, C, ..., I, are formed.

Fig. 1 There are five sequences marked as S1, S2, S3, S4 and S5 in the process of sub-intervals formation.

Fig. 2 The first interval forms between the start of S1 and the first encountered start or end of a sequence, which is the start of S2. The next interval forms from the start of S2 to the second encountered start or end of a sequence, which is the start of S3. Sub-interval formation continues until all the sequences are scanned.

B. Intersection Checking Between Sub-intervals and Target Query Interval

Fig. 3 All the sub-intervals are stored after sub-interval formation. The intersection between the target query interval T and each sub-interval is then checked; sub-intervals C, D and E intersect T. These qualifying sub-intervals are used to trace back to the result sequences, which are S1, S2, S3 and S4.

Since all sequences have been cut into sub-intervals, all the sub-intervals are stored. To find the result sequences that overlap the target query interval, we only need to check which sub-intervals intersect the target query interval.
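The formation rule of Section A and the stored sub-intervals referred to here can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each sequence is a half-open integer interval [start, end) keyed by name (the paper does not fix an endpoint convention), and cutting at every start and end point stands in for the left-to-right scan. The helper names `form_sub_intervals` and `build_inverted_list` are hypothetical; the latter records, for each sub-interval label, the sequences that contain it. The coordinates in `SEQUENCES` are invented so that five sequences yield nine sub-intervals as in Figure 2; they are not read from the figures.

```python
from collections import defaultdict

# Hypothetical coordinates: half-open [start, end) intervals, starts in ascending order.
SEQUENCES = {
    "S1": (0, 10),
    "S2": (2, 14),
    "S3": (4, 8),
    "S4": (6, 18),
    "S5": (16, 20),
}


def form_sub_intervals(sequences):
    """Cut the covered range at every start and end point.

    Returns the sorted boundary list and the sub-intervals between
    consecutive boundaries; sub-interval i is labelled i (0 = A, 1 = B, ...).
    """
    boundaries = sorted({p for start, end in sequences.values() for p in (start, end)})
    sub_intervals = list(zip(boundaries, boundaries[1:]))
    return boundaries, sub_intervals


def build_inverted_list(sequences, boundaries):
    """Map each sub-interval label to the names of the sequences containing it."""
    position = {b: i for i, b in enumerate(boundaries)}
    inverted = defaultdict(list)
    for name, (start, end) in sequences.items():
        # Both endpoints are boundaries, so the sequence covers exactly the
        # sub-intervals whose labels run from position[start] to position[end] - 1.
        for label in range(position[start], position[end]):
            inverted[label].append(name)
    return dict(inverted)


boundaries, sub_intervals = form_sub_intervals(SEQUENCES)
inverted = build_inverted_list(SEQUENCES, boundaries)
print(len(sub_intervals))  # 9
print(inverted[3])         # ['S1', 'S2', 'S3', 'S4']
```

With these coordinates, labels 2, 3 and 4 (C, D and E) are the sub-intervals that intersect a query such as T = [5, 9), and the union of their lists is S1 to S4, the same outcome described for Figure 3. Because each sequence's labels form a contiguous range, building the table costs time proportional to its output size.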
If a sub-interval intersects the target query interval, every sequence that contains that sub-interval is a result sequence. As shown in Figure 3, sequences S1 to S5 have been cut into nine sub-intervals, A, B, C, ..., I, all of which are stored. To find the sequences that overlap the target query interval T in our example, we check each sub-interval for intersection with T. In Figure 3, sub-intervals C, D and E intersect T and are therefore the qualifying sub-intervals. The result sequences are then traced back as those containing at least one of these three qualifying sub-intervals: S1, S2, S3 and S4, as shown in Figure 3.

C. Process of IQIL

As shown in Algorithm I in the Appendix, all sequences are first cut into sub-intervals according to the rules above, and every unit-length segment of a sub-interval is marked with that sub-interval's label (see Function 1). Next, the intersection between the target query interval and the sub-intervals is checked, and all qualifying sub-intervals are obtained through their labels (see Function 2). Using the inverted table that has been built, the map function traces the qualifying sub-intervals back to the sequences containing them and passes these to the reduce function (see Function 3). Finally, the reduce function collects all result sequences from the data received from the map function (see Function 4).

V. EXPERIMENTAL EVALUATION

A. Setup

In our experiments, the operating system is Ubuntu 10.04 (Linux generic kernel) and the platform is Hadoop 0.21. Each machine has 2.0 GB of RAM, and the switch is a net-core NSD1016D (16-port Fast Ethernet switch, 10M/100M). The CPUs and remaining disk space of the machines are listed in Table I.
TABLE I
Machine Configuration
CPU                          Remaining Disk Space
Intel Core 2 Duo E GHz       26.3 GB
Intel Core 2 Duo E GHz       261.1 GB
Intel Core 2 Duo E GHz       27.4 GB
Intel Core 2 Duo E GHz       23.9 GB
Intel Core 2 Duo E GHz       27.9 GB
Intel Core 2 Duo E GHz       27.3 GB
Intel Core 2 Duo E GHz       27.0 GB
Intel Core 2 Duo E GHz       27.7 GB

B. Different Number of Machines

Since the capability of cloud computing comes from many commodity computers operating simultaneously, sharing computational resources and forming a large distributed cluster, the number of machines is a key factor in the final query time. The following experiment tests the relationship between the final query time and the number of machines. The variable is the number of machines, and the amount of test data is fixed at 10 GB.

Fig. 4 Relationship between number of machines and query time.

Result analysis: 1) Figure 4 shows that for both CNCList+ and IQIL, query time decreases as the number of machines increases; 2) for the same number of machines, the query time of IQIL is considerably shorter than that of CNCList+. Thus, IQIL achieves higher query speed than CNCList+ with the same number of machines.

C. Different Amount of Data

With the platform configuration and the number of machines fixed, we examine whether the amount of data affects the final query time. The final experiment therefore tests the influence of data volume on query time. The variable is the amount of data, and the number of machines is set to eight. All other settings are kept the same to ensure a fair comparison. The results are shown in Figure 5.

Fig. 5 Relationship between the amount of data and query time.

Result analysis: the trend of the curves shows that: 1) the amount of data has no remarkable effect on the query time of IQIL, and in some cases a larger data set is even queried faster than a smaller one. For example, the query time is 35 seconds for 8 GB of data but only 28 seconds for 10 GB. The reason is that a query checks which sub-intervals intersect the target interval and then traces back to all result sequences containing them. The amount of data corresponds to the number of sequences. Since tracing back to the result sequences takes very little time, data size does not significantly contribute to the query time. The number of sub-intervals, however, has a major impact, because the time spent checking sub-intervals for intersection grows with their total number. A smaller data set can therefore take longer to query than a larger one if it produces more sub-intervals, which would explain why the query time for 10 GB is shorter than that for 8 GB in Figure 5; 2) for CNCList+, the larger the amount of data, the longer the query time; 3) for the same amount of data, the query time of IQIL is distinctly shorter than that of CNCList+.

D. Summary

The above experiments show that IQIL achieves higher query speed than CNCList+ for both the same number of machines and the same amount of data. In addition, query time can be reduced by increasing the number of machines. Thus, for researchers in genomics and biomedicine, IQIL, an algorithm based on cloud computing, is an attractive choice for improving interval query efficiency and reducing query time when interval queries over massive data volumes must be executed.

VI. CONCLUSION

Interval overlap query has become an increasingly crucial tool for data mining and clinical diagnosis in biomedicine. Many biomedical researchers have studied this problem, and many approaches have been proposed, but they all run on a single computer. CNCList+, an efficient algorithm based on cloud computing, has been proposed and increases query speed dramatically. However, since the data of subgroups must be scanned sequentially in every query of CNCList+, its speed-up is limited.
Considering the great importance of inverted lists in data indexing, this paper combines the inverted index method with cloud computing to form IQIL, a new efficient interval query algorithm that further speeds up interval overlap query. Detailed experiments show that IQIL outperforms CNCList+ in query speed. Consequently, IQIL can benefit biomedical researchers in data mining and clinical diagnosis and make interval overlap query a more practical tool in biomedicine.

REFERENCES

[1] Z. Wang, K. Gong, S. Jin, W. Li and Z. Liu, "Efficient Interval Query of Genome Alignment and Interval Databases in Cloud Environment," ICCIP 2012, in press.
[2] W. J. Kent, C. W. Sugnet, T. S. Furey, et al., "The human genome browser at UCSC," Genome Res., 12, (2002).
[3] B. Giardine, L. Elnitski, C. Riemer, et al., "GALA, a database for genomic sequence alignments and annotations," Genome Res., 13, (2003).
[4] C. P. Kolovson and M. Stonebraker, "Segment indexes: dynamic indexing techniques for multi-dimensional interval data," In SIGMOD Conference (1991).
[5] H. P. Kriegel, M. Pötke and T. Seidl, "Managing intervals efficiently in object-relational databases," In Proc. 26th International Conference on VLDB, Cairo (2000).
[6] J. Enderle, M. Hampel and T. Seidl, "Joining interval data in relational databases," In Proc. ACM SIGMOD Conference on Management of Data, Paris (2004).
[7] Y. Tao and D. Papadias, "MV3R-tree: a spatio-temporal access method for timestamp and interval queries," In Proc. 27th VLDB Conference, Roma (2001).
[8] A. V. Alekseyenko and C. J. Lee, "Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases," Bioinformatics 23(11), (2007).
[9] L. Guo, J. Shanmugasundaram, K. Beyer and E. Shekita, "Efficient Inverted Lists and Query Algorithms for Structured Value Ranking in Update-Intensive Relational Databases," In Proc. of the International Conference on Data Engineering (2005).
[10] Y. Liu and Y. Hu, "Mining Frequent Patterns Based on Inverted List," In Proc. of the International Conference on Machine Learning and Cybernetics (2006).
[11] Y. Liu and Y. Hu, "Combination Tree for Mining Frequent Patterns Based on Inverted List," In Proc. of the International Conference on Computational Intelligence and Security (2006).
[12] J. Zhang and T. Suel, "Optimized Inverted List Assignment in Distributed Search Engine Architectures," In Proc. of the IEEE International Symposium on Parallel and Distributed Processing.
[13] C. Khancome and V. Boonjing, "Character-Based Indexing Using Inverted Lists," In Proc. of the International Conference on Computer Technology and Development.

APPENDIX

Algorithm I

//Function 1: mark every unit-length element of each sub-interval with that sub-interval's label
encode_every_intervals (allintervals):
    int i = 0;
    FOR_each (interval m in allintervals)
        i++;
        FOR_each (int element in m)
            Mark (element, i);
        END
    END

//Function 2: collect the labels of all sub-intervals between the marks of the query's begin and end
locate_queried_interval (queried_interval):
    intervalset = intervals_between (queried_interval.begin.mark, queried_interval.end.mark);
    return intervalset;

//Function 3: look up the sequences of one qualifying sub-interval in the inverted table
Map:
    geneinterval = inverted_table (one interval of intervalset);
    pass_to_reduce (geneinterval);

//Function 4: write the sequences received from map to the result file
Reduce:
    pass_to_resultfile (data_received_from_map);

//Function main
Main:
    encode_every_intervals (allintervals);
    intervalset = locate_queried_interval (queried_interval);
    setup_mapreduce (map, reduce);
    mapreduce_beginwork ();
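The query side of Algorithm I (Functions 2 to 4) can be illustrated with a single-process Python sketch. It is an assumption-laden stand-in, not the authors' Hadoop implementation: `locate_labels` uses binary search over the sorted sub-interval boundaries, which yields the same labels that the per-unit marks of Function 1 would give without materializing them; the map and reduce steps are simulated in memory rather than run on Hadoop; and `BOUNDARIES` and `INVERTED` are hypothetical data (half-open sequences S1 = [0, 10), S2 = [2, 14), S3 = [4, 8), S4 = [6, 18), S5 = [16, 20) cut into nine sub-intervals).

```python
import bisect
from collections import defaultdict

# Hypothetical sub-interval boundaries and inverted table (label -> sequences).
BOUNDARIES = [0, 2, 4, 6, 8, 10, 14, 16, 18, 20]
INVERTED = {
    0: ["S1"], 1: ["S1", "S2"], 2: ["S1", "S2", "S3"],
    3: ["S1", "S2", "S3", "S4"], 4: ["S1", "S2", "S4"], 5: ["S2", "S4"],
    6: ["S4"], 7: ["S4", "S5"], 8: ["S5"],
}


def locate_labels(boundaries, q_start, q_end):
    """Labels of the sub-intervals that intersect the half-open query [q_start, q_end)."""
    n = len(boundaries) - 1  # number of sub-intervals
    if n <= 0 or q_start >= q_end or q_end <= boundaries[0] or q_start >= boundaries[-1]:
        return range(0)
    first = max(bisect.bisect_right(boundaries, q_start) - 1, 0)
    last = min(bisect.bisect_left(boundaries, q_end) - 1, n - 1)
    return range(first, last + 1)


def map_fn(label, inverted):
    """Emit (sequence, label) pairs for one qualifying sub-interval."""
    return [(seq, label) for seq in inverted.get(label, [])]


def reduce_fn(seq, labels):
    """Each distinct key reaching the reducer is one result sequence."""
    return seq


def run_query(boundaries, inverted, q_start, q_end):
    """Simulate locate -> map -> shuffle -> reduce in a single process."""
    shuffled = defaultdict(list)  # simulated shuffle: group map output by key
    for label in locate_labels(boundaries, q_start, q_end):
        for seq, lab in map_fn(label, inverted):
            shuffled[seq].append(lab)
    return sorted(reduce_fn(seq, labels) for seq, labels in shuffled.items())


print(run_query(BOUNDARIES, INVERTED, 5, 9))  # ['S1', 'S2', 'S3', 'S4']
```

For the query [5, 9), the located labels are 2, 3 and 4 (C, D and E), and the reducer deduplicates the sequences they map to, giving S1 to S4. In a real deployment each map task would handle a share of the located labels; that partitioning is not modeled here.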


More information

Using MPI One-sided Communication to Accelerate Bioinformatics Applications

Using MPI One-sided Communication to Accelerate Bioinformatics Applications Using MPI One-sided Communication to Accelerate Bioinformatics Applications Hao Wang (hwang121@vt.edu) Department of Computer Science, Virginia Tech Next-Generation Sequencing (NGS) Data Analysis NGS Data

More information

Integrated Usage of Heterogeneous Databases for Novice Users

Integrated Usage of Heterogeneous Databases for Novice Users International Journal of Networked and Distributed Computing, Vol. 3, No. 2 (April 2015), 109-118 Integrated Usage of Heterogeneous Databases for Novice Users Ayano Terakawa Dept. of Information Science,

More information

Best Keyword Cover Search

Best Keyword Cover Search Vennapusa Mahesh Kumar Reddy Dept of CSE, Benaiah Institute of Technology and Science. Best Keyword Cover Search Sudhakar Babu Pendhurthi Assistant Professor, Benaiah Institute of Technology and Science.

More information

A Method of Hyper-sphere Cover in Multidimensional Space for Human Mocap Data Retrieval

A Method of Hyper-sphere Cover in Multidimensional Space for Human Mocap Data Retrieval Journal of Human Kinetics volume 28/2011, 133-139 DOI: 10.2478/v10078-011-0030-0 133 Section III Sport, Physical Education & Recreation A Method of Hyper-sphere Cover in Multidimensional Space for Human

More information

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS Mariam Rehman Lahore College for Women University Lahore, Pakistan mariam.rehman321@gmail.com Syed Atif Mehdi University of Management and Technology Lahore,

More information

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan Abstract String matching is the most

More information

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Preliminary Research on Distributed Cluster Monitoring of G/S Model Available online at www.sciencedirect.com Physics Procedia 25 (2012 ) 860 867 2012 International Conference on Solid State Devices and Materials Science Preliminary Research on Distributed Cluster Monitoring

More information

A Practical Distributed String Matching Algorithm Architecture and Implementation

A Practical Distributed String Matching Algorithm Architecture and Implementation A Practical Distributed String Matching Algorithm Architecture and Implementation Bi Kun, Gu Nai-jie, Tu Kun, Liu Xiao-hu, and Liu Gang International Science Index, Computer and Information Engineering

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Comparative Study of Subspace Clustering Algorithms

Comparative Study of Subspace Clustering Algorithms Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory

More information

arxiv: v1 [cs.cv] 6 Jun 2017

arxiv: v1 [cs.cv] 6 Jun 2017 Volume Calculation of CT lung Lesions based on Halton Low-discrepancy Sequences Liansheng Wang a, Shusheng Li a, and Shuo Li b a Department of Computer Science, Xiamen University, Xiamen, China b Dept.

More information

A Review on Cache Memory with Multiprocessor System

A Review on Cache Memory with Multiprocessor System A Review on Cache Memory with Multiprocessor System Chirag R. Patel 1, Rajesh H. Davda 2 1,2 Computer Engineering Department, C. U. Shah College of Engineering & Technology, Wadhwan (Gujarat) Abstract

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

A Batched GPU Algorithm for Set Intersection

A Batched GPU Algorithm for Set Intersection A Batched GPU Algorithm for Set Intersection Di Wu, Fan Zhang, Naiyong Ao, Fang Wang, Xiaoguang Liu, Gang Wang Nankai-Baidu Joint Lab, College of Information Technical Science, Nankai University Weijin

More information

A Row-and-Column Generation Method to a Batch Machine Scheduling Problem

A Row-and-Column Generation Method to a Batch Machine Scheduling Problem The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 301 308 A Row-and-Column Generation

More information

Drawing Bipartite Graphs as Anchored Maps

Drawing Bipartite Graphs as Anchored Maps Drawing Bipartite Graphs as Anchored Maps Kazuo Misue Graduate School of Systems and Information Engineering University of Tsukuba 1-1-1 Tennoudai, Tsukuba, 305-8573 Japan misue@cs.tsukuba.ac.jp Abstract

More information

FSRM Feedback Algorithm based on Learning Theory

FSRM Feedback Algorithm based on Learning Theory Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 699-703 699 FSRM Feedback Algorithm based on Learning Theory Open Access Zhang Shui-Li *, Dong

More information

Storage access optimization with virtual machine migration during execution of parallel data processing on a virtual machine PC cluster

Storage access optimization with virtual machine migration during execution of parallel data processing on a virtual machine PC cluster Storage access optimization with virtual machine migration during execution of parallel data processing on a virtual machine PC cluster Shiori Toyoshima Ochanomizu University 2 1 1, Otsuka, Bunkyo-ku Tokyo

More information

V Conclusions. V.1 Related work

V Conclusions. V.1 Related work V Conclusions V.1 Related work Even though MapReduce appears to be constructed specifically for performing group-by aggregations, there are also many interesting research work being done on studying critical

More information

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data Wei Yang 1, Tinghua Ai 1, Wei Lu 1, Tong Zhang 2 1 School of Resource and Environment Sciences,

More information

A Database Redo Log System Based on Virtual Memory Disk*

A Database Redo Log System Based on Virtual Memory Disk* A Database Redo Log System Based on Virtual Memory Disk* Haiping Wu, Hongliang Yu, Bigang Li, Xue Wei, and Weimin Zheng Department of Computer Science and Technology, Tsinghua University, 100084, Beijing,

More information

Design Considerations on Implementing an Indoor Moving Objects Management System

Design Considerations on Implementing an Indoor Moving Objects Management System , pp.60-64 http://dx.doi.org/10.14257/astl.2014.45.12 Design Considerations on Implementing an s Management System Qian Wang, Qianyuan Li, Na Wang, Peiquan Jin School of Computer Science and Technology,

More information

A METHOD FOR CONTENT-BASED SEARCHING OF 3D MODEL DATABASES

A METHOD FOR CONTENT-BASED SEARCHING OF 3D MODEL DATABASES A METHOD FOR CONTENT-BASED SEARCHING OF 3D MODEL DATABASES Jiale Wang *, Hongming Cai 2 and Yuanjun He * Department of Computer Science & Technology, Shanghai Jiaotong University, China Email: wjl8026@yahoo.com.cn

More information

A New Model of Search Engine based on Cloud Computing

A New Model of Search Engine based on Cloud Computing A New Model of Search Engine based on Cloud Computing DING Jian-li 1,2, YANG Bo 1 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China 2. Tianjin Key

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Related Work The Concept of the Signaling. In the mobile communication system, in addition to transmit the necessary user information (usually voice

Related Work The Concept of the Signaling. In the mobile communication system, in addition to transmit the necessary user information (usually voice International Conference on Information Science and Computer Applications (ISCA 2013) The Research and Design of Personalization preferences Based on Signaling analysis ZhiQiang Wei 1,a, YiYan Zhang 1,b,

More information

A paralleled algorithm based on multimedia retrieval

A paralleled algorithm based on multimedia retrieval A paralleled algorithm based on multimedia retrieval Changhong Guo Teaching and Researching Department of Basic Course, Jilin Institute of Physical Education, Changchun 130022, Jilin, China Abstract With

More information

Compressing and Decoding Term Statistics Time Series

Compressing and Decoding Term Statistics Time Series Compressing and Decoding Term Statistics Time Series Jinfeng Rao 1,XingNiu 1,andJimmyLin 2(B) 1 University of Maryland, College Park, USA {jinfeng,xingniu}@cs.umd.edu 2 University of Waterloo, Waterloo,

More information

Retrieval of Highly Related Documents Containing Gene-Disease Association

Retrieval of Highly Related Documents Containing Gene-Disease Association Retrieval of Highly Related Documents Containing Gene-Disease Association K. Santhosh kumar 1, P. Sudhakar 2 Department of Computer Science & Engineering Annamalai University Annamalai Nagar, India. santhosh09539@gmail.com,

More information

Data Mining and Data Warehousing Introduction to Data Mining

Data Mining and Data Warehousing Introduction to Data Mining Data Mining and Data Warehousing Introduction to Data Mining Quiz Easy Q1. Which of the following is a data warehouse? a. Can be updated by end users. b. Contains numerous naming conventions and formats.

More information

Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment

Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment Case Study Order Number: 334534-002US Ordering Information Contact your local Intel sales representative for ordering

More information

SMCCSE: PaaS Platform for processing large amounts of social media

SMCCSE: PaaS Platform for processing large amounts of social media KSII The first International Conference on Internet (ICONI) 2011, December 2011 1 Copyright c 2011 KSII SMCCSE: PaaS Platform for processing large amounts of social media Myoungjin Kim 1, Hanku Lee 2 and

More information

Secure Remote Storage Using Oblivious RAM

Secure Remote Storage Using Oblivious RAM Secure Remote Storage Using Oblivious RAM Giovanni Malloy Mentors: Georgios Kellaris, Kobbi Nissim August 11, 2016 Abstract Oblivious RAM (ORAM) is a protocol that allows a user to access the data she

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

The Performance Analysis of a Service Deployment System Based on the Centralized Storage

The Performance Analysis of a Service Deployment System Based on the Centralized Storage The Performance Analysis of a Service Deployment System Based on the Centralized Storage Zhu Xu Dong School of Computer Science and Information Engineering Zhejiang Gongshang University 310018 Hangzhou,

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

An Improved KNN Classification Algorithm based on Sampling

An Improved KNN Classification Algorithm based on Sampling International Conference on Advances in Materials, Machinery, Electrical Engineering (AMMEE 017) An Improved KNN Classification Algorithm based on Sampling Zhiwei Cheng1, a, Caisen Chen1, b, Xuehuan Qiu1,

More information

Channel Allocation for Social Networking Features on Publish/Subscribe-based Mobile Application

Channel Allocation for Social Networking Features on Publish/Subscribe-based Mobile Application Allocation for Social Networking Features on Publish/Subscribe-based Mobile Application Alfian Ramadhan, Achmad Imam Kistijantoro Laboratory of Distributed System School of Electrical Engineering and Informatics,

More information

An Efficient Technique for Distance Computation in Road Networks

An Efficient Technique for Distance Computation in Road Networks Fifth International Conference on Information Technology: New Generations An Efficient Technique for Distance Computation in Road Networks Xu Jianqiu 1, Victor Almeida 2, Qin Xiaolin 1 1 Nanjing University

More information

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid Demin Wang 2, Hong Zhu 1, and Xin Liu 2 1 College of Computer Science and Technology, Jilin University, Changchun

More information

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many

More information

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

An Evaluation of Client-Side Dependencies of Search Engines by Load Testing

An Evaluation of Client-Side Dependencies of Search Engines by Load Testing An Evaluation of Client-Side Dependencies of by Load Testing Emine Sefer, Sinem Aykanat TUBITAK BILGEM YTKDM Kocaeli, Turkey emine.sefer@tubitak.gov.tr sinem.aykanat@tubitak.gov.tr Abstract Nowadays, web

More information

GPU-Accelerated Apriori Algorithm

GPU-Accelerated Apriori Algorithm GPU-Accelerated Apriori Algorithm Hao JIANG a, Chen-Wei XU b, Zhi-Yong LIU c, and Li-Yan YU d School of Computer Science and Engineering, Southeast University, Nanjing, China a hjiang@seu.edu.cn, b wei1517@126.com,

More information

SKETCHES ON SINGLE BOARD COMPUTERS

SKETCHES ON SINGLE BOARD COMPUTERS Sabancı University Program for Undergraduate Research (PURE) Summer 17-1 SKETCHES ON SINGLE BOARD COMPUTERS Ali Osman Berk Şapçı Computer Science and Engineering, 1 Egemen Ertuğrul Computer Science and

More information

Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1

Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Acta Technica 62 No. 3B/2017, 141 148 c 2017 Institute of Thermomechanics CAS, v.v.i. Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Zhang Fan 2, 3, Tan Yuegang

More information

CONTRIBUTION TO THE INVESTIGATION OF STOPPING SIGHT DISTANCE IN THREE-DIMENSIONAL SPACE

CONTRIBUTION TO THE INVESTIGATION OF STOPPING SIGHT DISTANCE IN THREE-DIMENSIONAL SPACE National Technical University of Athens School of Civil Engineering Department of Transportation Planning and Engineering Doctoral Dissertation CONTRIBUTION TO THE INVESTIGATION OF STOPPING SIGHT DISTANCE

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Utilizing Concurrency: A New Theory for Memory Wall

Utilizing Concurrency: A New Theory for Memory Wall Utilizing Concurrency: A New Theory for Memory Wall Xian-He Sun (&) and Yu-Hang Liu Illinois Institute of Technology, Chicago, USA {sun,yuhang.liu}@iit.edu Abstract. In addition to locality, data access

More information

EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES

EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES B. GEETHA KUMARI M. Tech (CSE) Email-id: Geetha.bapr07@gmail.com JAGETI PADMAVTHI M. Tech (CSE) Email-id: jageti.padmavathi4@gmail.com ABSTRACT:

More information

A Data Classification Algorithm of Internet of Things Based on Neural Network

A Data Classification Algorithm of Internet of Things Based on Neural Network A Data Classification Algorithm of Internet of Things Based on Neural Network https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To

More information

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts. Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred

More information

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

DELL EMC POWER EDGE R940 MAKES DE NOVO ASSEMBLY EASIER

DELL EMC POWER EDGE R940 MAKES DE NOVO ASSEMBLY EASIER DELL EMC POWER EDGE R940 MAKES DE NOVO ASSEMBLY EASIER Genome Assembly on Deep Sequencing data with SOAPdenovo2 ABSTRACT De novo assemblies are memory intensive since the assembly algorithms need to compare

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Performance Extrapolation for Load Testing Results of Mixture of Applications

Performance Extrapolation for Load Testing Results of Mixture of Applications Performance Extrapolation for Load Testing Results of Mixture of Applications Subhasri Duttagupta, Manoj Nambiar Tata Innovation Labs, Performance Engineering Research Center Tata Consulting Services Mumbai,

More information

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c 2016 Joint International Conference on Service Science, Management and Engineering (SSME 2016) and International Conference on Information Science and Technology (IST 2016) ISBN: 978-1-60595-379-3 Dynamic

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

Efficient Common Items Extraction from Multiple Sorted Lists

Efficient Common Items Extraction from Multiple Sorted Lists 00 th International Asia-Pacific Web Conference Efficient Common Items Extraction from Multiple Sorted Lists Wei Lu,, Chuitian Rong,, Jinchuan Chen, Xiaoyong Du,, Gabriel Pui Cheong Fung, Xiaofang Zhou

More information

Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce

Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce Maria Malek and Hubert Kadima EISTI-LARIS laboratory, Ave du Parc, 95011 Cergy-Pontoise, FRANCE {maria.malek,hubert.kadima}@eisti.fr

More information

AIIA shot boundary detection at TRECVID 2006

AIIA shot boundary detection at TRECVID 2006 AIIA shot boundary detection at TRECVID 6 Z. Černeková, N. Nikolaidis and I. Pitas Artificial Intelligence and Information Analysis Laboratory Department of Informatics Aristotle University of Thessaloniki

More information