Web Data Mining based on Cloud Computing
Liangfei Xue (1), Dongfeng Yuan (2), Mingyan Jiang (3)

(1) Liangfei Xue, School of Information Science and Engineering, Shandong University, Jinan, China. feiliangxue@gmail.com
(2) Dongfeng Yuan, School of Information Science and Engineering, Shandong University, Jinan, China. dfyuan@sdu.edu.cn
(3) Mingyan Jiang, School of Information Science and Engineering, Shandong University, Jinan, China. jiangmingyan@sdu.edu.cn

Foundation: the National Natural Science Foundation of Shandong Province of China under Grant No. ZR2010FM040; Special Funding Project for Independent Innovation Achievements Transform of Shandong Province under Grant No. 2009ZHZX1A0108, No. 2010ZHZX1A

Abstract
With the recent success of cloud computing, data mining is becoming more accessible thanks to easier access to less expensive computational resources. In this paper, we use virtualization technology, which is key to cloud computing, to build a Web data mining cloud model. The model consists of a Storage Cloud and a Calculation Cloud, established mainly through parallel storage technology and parallel computing technology. Finally, the paper describes a specific instance of Web data mining combined with cloud computing. This instance shows that the proposed method can satisfy the information demands of a large number of users concurrently, quickly, in real time and efficiently.

Keywords: Web data mining, Cloud Computing, Cloud Model, Storage Cloud, Calculation Cloud.

1 Introduction
The wide adoption of the Internet has fundamentally altered the ways in which we communicate, gather information, conduct business and make purchases. How to extract precisely the useful information from such a wide variety of data is now one of the most important problems, and it shapes the development of society. This makes Web data mining interesting and challenging. Cloud computing is a new concept that emerged from the development of parallel computing. It now allows many customers to mine massive data at low cost, which is of significant scientific and commercial value. The potential of cloud computing has attracted attention from Google, IBM and other foreign firms, as well as domestic companies such as Inspur and Baidu [1,2]. Recently, research on distributed data mining has focused on grid computing and has produced some results. Reference [3] proposed a distributed data mining model based on the OGSI.net framework and gave a software deployment scheme for the model, but the model was not actually applied to experimental projects. Reference [4] analyzes a data mining system in a grid environment that is made up of personal computers, and describes the data mining processes and the work each process should complete. Foreign studies have shown that data mining based on cloud computing has low power consumption. This paper combines Web data mining and cloud computing through a study of the theory of both. As a result, we build a Web data mining cloud model that provides a rapid, real-time method for mining massive data on the Web.

2 Data Mining Using Cloud Computing Model
2.1 Web Data Mining
Web data mining is the data processing technology that meets this need: it applies traditional data mining ideas and methods, relying on crawling a large number of Web pages, to dig out useful information. Web data mining tasks can be divided into three main types: Web structure mining, Web content mining and Web usage mining [5]. Accordingly, the purpose of Web data mining is to extract useful information from Web links, from Web content and structure, and from user logs.
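The three mining types above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the method used in this paper: the class `PageParser`, the functions `mine_page` and `mine_usage`, the sample HTML and the two-field log format (`user path`) are all hypothetical. Links stand in for structure mining, term counts for content mining, and per-path request counts for usage mining.

```python
from collections import Counter
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects outgoing links (structure) and visible words (content)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Structure mining input: every href found on an <a> tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Content mining input: lower-cased words of the page text.
        self.words.extend(data.lower().split())


def mine_page(html):
    """Return (links, term counts) for one page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.links, Counter(parser.words)


def mine_usage(log_lines):
    """Usage mining: count requests per path (assumed format: 'user path')."""
    paths = (line.split()[1] for line in log_lines if len(line.split()) >= 2)
    return Counter(paths)


# Hypothetical sample data, for illustration only.
SAMPLE_HTML = (
    "<html><body><h1>Cotton planting</h1>"
    '<a href="/pests">Cotton pests</a></body></html>'
)
SAMPLE_LOG = ["u1 /cotton", "u2 /cotton", "u1 /pests"]

links, term_counts = mine_page(SAMPLE_HTML)
page_hits = mine_usage(SAMPLE_LOG)
```

A real crawler would also fetch pages, follow the extracted links and normalise the text; those steps are left out to keep the three inputs visible.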
As far as Web data mining is concerned, with the spread and development of the Internet the quantity of information keeps increasing and the information changes over time, so data collection is a difficult task, especially in Web structure mining and Web content mining, which require crawling a great many Web pages. The rise of cloud computing has brought a great many applications. The development of parallel computing makes cloud computing a widely applicable approach to mining massive data. In a society where information on the World Wide Web is exploding, the need for precise and effective Web data collection has spurred the adoption of cloud computing. According to the current understanding, cloud computing is a Web-based parallel computation mode in which the masses participate. Cloud computing resources include computing power, expandable storage capacity and virtualization, and can provide related services to a large number of cloud users to handle complicated, large-scale task requests.

2.2 Distributed Storage Technology
In cloud computing, data storage is implemented with distributed data storage technology. This ensures highly reliable and highly available file storage and guarantees efficient use of resources. Google's GFS (Google File System) and Hadoop's open source HDFS (Hadoop Distributed File System) are the most popular distributed data storage technologies in cloud computing. Redundant storage is used to ensure high reliability of data storage. During storage, the information is split into several data segments, and backups are created on different physical nodes. This uses software reliability to compensate for hardware deficiencies. In this way we can address the urgent problems in Web data mining, namely how to mine Web data cheaply and quickly. In addition, distributed data storage in cloud computing offers high throughput and a high transfer rate. Nodes are flexible and easy to manage, so a cloud computing system can satisfy user demand, serve a large number of users and adapt to a changing user group. Here, we elaborate the principles and technological advantages of cloud computing distributed storage, taking the open source HDFS as an example.

2.3 Distributed Computing Technology
In the discussion of distributed storage, we noted that computation should be moved to where the data is stored. Data is stored in a distributed manner across different DataNodes, so we inevitably turn to distributed computing technology. Map/Reduce was introduced by Google.
The basic requirement is that the data set to be processed can be broken down into many small data sets, each of which can be processed completely in parallel. Map/Reduce means "task decomposition and results aggregation". It divides all operations on the data into two steps, the Map stage and the Reduce stage. Distributed computing technology turns a task into many finer-grained sub-tasks, which can be scheduled whenever spare nodes are available. In this way the task as a whole is processed faster.
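The "task decomposition and results aggregation" idea can be sketched in a few lines. This is a minimal single-process illustration and not the Hadoop API: the function names (`split_into_blocks`, `map_block`, `reduce_counts`, `map_reduce`), the block size and the sample records are assumptions, and term counting is used only as a stand-in for a mining task. In a real cluster each `map_block` call would be scheduled on a spare node that holds the block.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Task decomposition: break the data set into small data sets."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map stage: count terms in one block, independently of other blocks."""
    counts = Counter()
    for record in block:
        counts.update(record.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce stage: merge two partial results into one."""
    return left + right


def map_reduce(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # Run sequentially here; each call is independent and could run in parallel.
    partials = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical records, for illustration only.
pages = ["cotton seed price", "cotton pest control", "wheat seed", "cotton harvest"]
totals = map_reduce(pages)
```

Because `map_block` never looks outside its own block, the partial results can be produced in any order and merged afterwards, which is what makes the sub-tasks schedulable on whichever nodes are free.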
2.4 Virtualization Technology
Virtualization of resources yields the storage cloud. The storage cloud is the virtualized resource pool. It provides data storage services and management operations for the calculation cloud. It is not a file system itself, but relies on the local file system to provide its services. For the safety of data files, the storage cloud can monitor the number of files anytime and anywhere. Besides being easy to use, the storage cloud has stronger fault-tolerance and disaster recovery mechanisms, ensuring the consistency of the whole file system.
Virtualization of services yields the calculation cloud. The calculation cloud is the virtualized service pool. It uses the mass data stored in the storage cloud according to the request of the user's program, runs Map/Reduce parallel computing over the mass data, and obtains and presents the final results. In cloud computing, it is because of the calculation cloud and the storage cloud that the resources in the cloud can be dynamically expanded and configured. The characteristic of cloud computing, appearing logically as a single whole, can thus finally be realized, which makes data mining tasks more convenient to complete. As a result, virtualization is the most critical and central driving technology in Web data mining.

3 Cloud Model of Web Data Mining
3.1 Cloud Model
Although Web data mining can satisfy users' information service requests, the process is cumbersome and energy-intensive. Cloud computing offers efficient information processing and low energy consumption, so applying it to Web data mining is a very good solution to this problem. The data mining software adopts a parallel data mining mode based on cloud computing. The same algorithm can be distributed across multiple nodes, multiple algorithms can be executed in parallel, and resources are allocated according to need.
The distributed computing model is the same as the cloud computing model that uses virtualization. Data processing likewise uses the distributed file system in cloud computing. The data set required by the cloud model is already preprocessed; it comes from our storage cloud and is ready for data mining. Thus, we can define our own functions in the calculation cloud for our data mining process. The output of the calculation cloud is again stored in a distributed manner in our resource pool. Here, we define data preprocessing, storing data in the storage cloud, computing on data in the calculation cloud and storing results in the storage cloud as the Web data mining cloud model. The model is shown in figure 1.

Fig. 1 Web data mining cloud model (diagram labels: data sets from mass Web information; data set preprocessing; client request; cloud server command; original data; computing result; return the result to the client. The storage cloud stores the data sets and the output produced by the calculation cloud. The calculation cloud receives the command from the cloud server.)

In the cloud model, the storage cloud and the calculation cloud are the same computer cluster. When data collection is complete, we regard the cluster's resources as a virtual storage cloud. When a user request arrives, the cluster runs the data mining algorithm; at this point the virtual storage cloud becomes the calculation cloud. After the results are stored back in the data blocks, the cluster is again called the storage cloud. The cloud server returns the final combined results to the user through the global control algorithm.

The Web data mining model based on cloud computing has three main modules.
The global algorithm module. This module processes the overall algorithm, coordinates the distributed data mining process and finally synthesizes the mining results.
The local algorithm module. This module processes the local algorithm. Each data block produces local data mining results using the local data mining algorithm.
The data management module. This module manages the data blocks and the local data mining results.
The local algorithm module is shown in figure 2.
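The division of labour among the three modules can be sketched as follows. This is a hedged illustration rather than the system described in this paper: the names `DataManager`, `local_algorithm` and `global_algorithm`, the keyword filter used as the "local data mining algorithm", and the sample blocks are all assumptions, and a thread pool stands in for the nodes of the computer cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class DataManager:
    """Data management module: holds data blocks and local mining results."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.local_results = {}

    def save_local_result(self, block_id, result):
        self.local_results[block_id] = result


def local_algorithm(block, keyword):
    """Local algorithm module: mine one data block (here, a keyword filter)."""
    return [doc for doc in block if keyword in doc.lower()]


def global_algorithm(manager, keyword):
    """Global algorithm module: coordinate local runs, then synthesize results."""
    with ThreadPoolExecutor() as pool:
        # One local mining task per data block; threads stand in for nodes.
        futures = {
            block_id: pool.submit(local_algorithm, block, keyword)
            for block_id, block in enumerate(manager.blocks)
        }
        for block_id, future in futures.items():
            manager.save_local_result(block_id, future.result())
    # Synthesis step: merge local results in block order.
    merged = []
    for block_id in sorted(manager.local_results):
        merged.extend(manager.local_results[block_id])
    return merged


# Hypothetical data blocks, for illustration only.
manager = DataManager([
    ["Cotton cultivation in spring", "Wheat irrigation"],
    ["Rice pests", "Cotton pest control"],
])
answer = global_algorithm(manager, "cotton")
```

Keeping the local results inside the data manager mirrors the description above, where results are stored back alongside the data blocks before the global module combines them.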
Fig. 2 Local algorithm module (diagram labels: data block; local data mining algorithm; local data mining result)

3.2 Contrast
In building the cloud model, we must establish the communication system and the data exchange mechanism between the component parts. We combine the functions of each component and finally establish a Web data mining system based on cloud computing. The differences between traditional Web data mining and the cloud model are shown in table 1.

Table 1 Web data mining comparison
Comparison item            Traditional    Cloud model
Data storage management    Cumbersome     Convenient
Data storage speed         Slow           Fast
Data mining speed          Slow           Fast
Timeliness of the results  Bad            Good

4 Application Examples
Web data mining has a wide range of applications in real life. We need to extract valuable information from mass data every day. Here, we put forward a low-cost, highly reliable and interactive rural informatization construction system. We use this application example to illustrate the great advantage of cloud computing in Web data mining. The system framework consists of [...], the network and the server. The system architecture is shown in figure 3.

Fig. 3 Rural information-based system architecture (diagram labels: server cluster; Internet)

In this application, the data in the storage cloud comes from existing professional agriculture websites related to agricultural production. These data sets are preprocessed collections of information. The calculation cloud runs the algorithm required to answer the user. For example, a user submits a query about cotton cultivation technology. The cloud control server sends commands to the calculation cloud. The calculation cloud uses a specific data mining method to retrieve information from the data in the storage cloud, and the retrieved information is returned to the user. Traditional Web data mining takes a long time and consumes a lot of energy. Figure 4 shows the crawl statistics for this case.

Fig. 4 Traditional web data mining statistics

In the cloud model trial there are many virtual nodes. When the task runs on the distributed nodes, the data mining time and the crawl statistics are as shown in figure 5. From the statistics we can see that Web data mining with the cloud model completes the task faster and closer to real time. This model can also reduce energy consumption.
Fig. 5 Web data mining cloud model mining statistics

Through mathematical analysis, we obtain the comparison in table 2. The original data set is the same in each case.

Table 2 Cloud model comparison
Comparison item        2 nodes      5 nodes       8 nodes
Average time consumed  5 hours      20 minutes    17 minutes
Data storage speed     10 KB/sec    35 KB/sec     43 KB/sec

We can see from the table that multiple nodes significantly improve data processing speed. Combining Web data mining and cloud computing organically makes full use of the advantages of cloud computing in mass data processing. In the cloud model, the backbone is the storage cloud and the calculation cloud. They are the essence of the whole model and provide new ideas for future Web data mining. With the storage cloud, mass data storage is no longer the bottleneck of the system, and data management and the speed and accuracy of data use are greatly improved. The calculation cloud uses distributed computing technology to mine mass data rapidly. This improves the quality of data mining and shortens the response time of the service. The cloud model greatly reduces the energy consumed in data storage and in computation. This model fits green economy theory.

5 Conclusions
This paper addresses the problem of how to mine useful information from the vast amount of information on the Internet. We brought the idea of cloud computing to Web data mining and established the Web data mining cloud model. This model can meet the needs of most users and provide reliable information services on request. With the development of cloud computing, this approach is a natural outcome for Web data mining. Web data mining algorithms are varied. Future research will focus on finding algorithms that better and more effectively fit the features of parallel computing technology (Map/Reduce) for data mining in the cloud model, so as to better solve real-life problems. Using cloud computing virtualization technology, information services can be provided to a large number of users through distributed storage, which can quickly store mass information. Through the calculation cloud we can develop efficient Web data mining to satisfy people's growing information needs.

References
1. Liu B. Web Data Mining[M]. New York: Springer-Verlag,
2. H. Roth, J. Schiefer, H. Obweger, and S. Rozsnyai. Event Data Warehousing for Complex Event Processing. In 4th International Conference on Research Challenges in Information Science
3. Szabolcs Rozsnyai, Aleksander Slominski, Yurdaer Doganata. Large-Scale Distributed Storage System for Business Provenance[C]. Cloud Computing, 2011:516~
4. Wu, K. L., Yu, P. S., Ballman, A. A Web usage mining and analysis tool. IBM Systems Journal,
5. S. Rozsnyai, R. Vecera, J. Schiefer, and A. Schatten. Event Cloud-Searching for Correlated Business Events. In Proceedings of the 9th IEEE International Conference on E-Commerce Technology and The 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services (CEC-EEE 2007), pages IEEE Computer Society,
6. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC'10). ACM, New York, NY, USA,
7. Borthakur D. The Hadoop Distributed File System: Architecture and Design[EB/OL]. ( )
8. Amazon, Inc. Amazon Simple Storage Service (Amazon S3)[EB/OL]. ( )
9. CHANG F, DEAN J, GHEMAWAT S. Bigtable: A distributed storage system for structured data[J]. ACM Transactions on Computer Systems, 2008, 26(2):
10. John Shafer, Rakesh Agrawal, Manish Mehta. SPRINT: A Scalable Parallel Classifier for Data Mining[C]. U.S.: IBM Almaden Research Center, 1996:544~
11. WANG JZ, WAN JG, LIU Z, WANG P. Data Mining of Mass Storage based on Cloud Computing[C]. Grid and Cooperative Computing (GCC), 2010:426~
12. T. R. Gopalakrishnan Nair, K. Lakshmi Madhuri. Data mining using hierarchical virtual k-means approach integrating data fragments in cloud computing environment[C]. Cloud Computing and Intelligence Systems (CCIS), 2011:230~
13. Pieter Noordhuis, Michiel Heijkoop, Alexander Lazovik. Mining Twitter in the Cloud: A Case Study[C]. Cloud Computing, 2010:107~
14. Raymond Kosala, Hendrik Blockeel. Web Mining Research: A Survey. In ACM SIGKDD, July
15. Chen, M. S., Han, J. and Yu, P. S. Data Mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp: , 1996
More informationD DAVID PUBLISHING. Big Data; Definition and Challenges. 1. Introduction. Shirin Abbasi
Journal of Energy and Power Engineering 10 (2016) 405-410 doi: 10.17265/1934-8975/2016.07.004 D DAVID PUBLISHING Shirin Abbasi Computer Department, Islamic Azad University-Tehran Center Branch, Tehran
More informationMulti-path based Algorithms for Data Transfer in the Grid Environment
New Generation Computing, 28(2010)129-136 Ohmsha, Ltd. and Springer Multi-path based Algorithms for Data Transfer in the Grid Environment Muzhou XIONG 1,2, Dan CHEN 2,3, Hai JIN 1 and Song WU 1 1 School
More informationI ++ Mapreduce: Incremental Mapreduce for Mining the Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for
More informationCAMPSNA: A Cloud Assisted Mobile Peer to Peer Social Network Architecture
CAMPSNA: A Cloud Assisted Mobile Peer to Peer Social Network Architecture Yuan-ni Liu Hong Tang, Guo-feng Zhao The School of Communication and Information Engineering of ChongQing University of Posts and
More informationApplication of Redundant Backup Technology in Network Security
2018 2nd International Conference on Systems, Computing, and Applications (SYSTCA 2018) Application of Redundant Backup Technology in Network Security Shuwen Deng1, Siping Hu*, 1, Dianhua Wang1, Limin
More informationA Software-Defined Networking Security Controller Architecture. Fengjun Shang, Qiang Fu
4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) A Software-Defined Networking Security Controller Architecture Fengjun Shang, Qiang Fu College of Computer Science
More informationSurvey on MapReduce Scheduling Algorithms
Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used
More informationMitigating Data Skew Using Map Reduce Application
Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,
More informationSTATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns
STATS 700-002 Data Analysis using Python Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns Unit 3: parallel processing and big data The next few lectures will focus on big
More informationANN-Based Modeling for Load and Main Steam Pressure Characteristics of a 600MW Supercritical Power Generating Unit
ANN-Based Modeling for Load and Main Steam Pressure Characteristics of a 600MW Supercritical Power Generating Unit Liangyu Ma, Zhiyuan Gao Automation Department, School of Control and Computer Engineering
More informationAnalyzing and Improving Load Balancing Algorithm of MooseFS
, pp. 169-176 http://dx.doi.org/10.14257/ijgdc.2014.7.4.16 Analyzing and Improving Load Balancing Algorithm of MooseFS Zhang Baojun 1, Pan Ruifang 1 and Ye Fujun 2 1. New Media Institute, Zhejiang University
More informationUnstructured Data Migration and Dump Technology of Large-scale Enterprises
2018 2nd International Conference on Systems, Computing, and Applications (SYSTCA 2018) Unstructured Data Migration and Dump Technology of Large-scale Enterprises Shuo Chen1,*, Shixin Fan2, Zhao Li1, Xinliu
More informationAnalysis Range-Free Node Location Algorithm in WSN
International Conference on Education, Management and Computer Science (ICEMC 2016) Analysis Range-Free Node Location Algorithm in WSN Xiaojun Liu1, a and Jianyu Wang1 1 School of Transportation Huanggang
More informationSystem For Product Recommendation In E-Commerce Applications
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 11, Issue 05 (May 2015), PP.52-56 System For Product Recommendation In E-Commerce
More informationStudy on A Recommendation Algorithm of Crossing Ranking in E- commerce
International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying
More informationDesign and Realization of Agricultural Information Intelligent Processing and Application Platform
Design and Realization of Agricultural Information Intelligent Processing and Application Platform Dan Wang 1,2 1 Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing
More informationThe Construction of Open Source Cloud Storage System for Digital Resources
2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 The Construction of Open Source Cloud Storage System for Digital Resources
More informationA Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems
A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems Tiejian Luo tjluo@ucas.ac.cn Zhu Wang wangzhubj@gmail.com Xiang Wang wangxiang11@mails.ucas.ac.cn ABSTRACT The indexing
More informationPROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP
ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationResearch on Social Relationship Network System based on MongoDB
Research on Social Relationship Network System based on MongoDB Yingyan Long School of Educational Sciences Shaanxi University of Technology Han Zhong, Shaanxi, China Abstract The relationship between
More informationVideo annotation based on adaptive annular spatial partition scheme
Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationAN WIRELESS COLLECTION AND MONITORING SYSTEM DESIGN BASED ON ARDUINO. Lu Shaokun 1,e*
Advanced Materials Research Online: 2014-06-25 ISSN: 1662-8985, Vols. 971-973, pp 1076-1080 doi:10.4028/www.scientific.net/amr.971-973.1076 2014 Trans Tech Publications, Switzerland AN WIRELESS COLLECTION
More informationExploration of Fault Diagnosis Technology for Air Compressor Based on Internet of Things
Exploration of Fault Diagnosis Technology for Air Compressor Based on Internet of Things Zheng Yue-zhai and Chen Xiao-ying Abstract With the development of network and communication technology, this article
More informationCombining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,
More informationPreliminary Research on Distributed Cluster Monitoring of G/S Model
Available online at www.sciencedirect.com Physics Procedia 25 (2012 ) 860 867 2012 International Conference on Solid State Devices and Materials Science Preliminary Research on Distributed Cluster Monitoring
More informationAn improved MapReduce Design of Kmeans for clustering very large datasets
An improved MapReduce Design of Kmeans for clustering very large datasets Amira Boukhdhir Laboratoire SOlE Higher Institute of management Tunis Tunis, Tunisia Boukhdhir _ amira@yahoo.fr Oussama Lachiheb
More informationResearch on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1,
More informationOpen Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents
Send Orders for Reprints to reprints@benthamscience.ae 676 The Open Automation and Control Systems Journal, 2014, 6, 676-683 Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving
More informationAN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang
International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 3, June 2017 pp. 1037 1046 AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA
More informationWhen, Where & Why to Use NoSQL?
When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More informationSurvey Paper on Traditional Hadoop and Pipelined Map Reduce
International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,
More informationLOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS
LOG FILE ANALYSIS USING HADOOP AND ITS ECOSYSTEMS Vandita Jain 1, Prof. Tripti Saxena 2, Dr. Vineet Richhariya 3 1 M.Tech(CSE)*,LNCT, Bhopal(M.P.)(India) 2 Prof. Dept. of CSE, LNCT, Bhopal(M.P.)(India)
More informationIMPLEMENTATION OF INFORMATION RETRIEVAL (IR) ALGORITHM FOR CLOUD COMPUTING: A COMPARATIVE STUDY BETWEEN WITH AND WITHOUT MAPREDUCE MECHANISM *
Journal of Contemporary Issues in Business Research ISSN 2305-8277 (Online), 2012, Vol. 1, No. 2, 42-56. Copyright of the Academic Journals JCIBR All rights reserved. IMPLEMENTATION OF INFORMATION RETRIEVAL
More informationStudy on XML-based Heterogeneous Agriculture Database Sharing Platform
Study on XML-based Heterogeneous Agriculture Database Sharing Platform Qiulan Wu, Yongxiang Sun, Xiaoxia Yang, Yong Liang,Xia Geng School of Information Science and Engineering, Shandong Agricultural University,
More informationEFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD
EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath
More informationSurvey on Incremental MapReduce for Data Mining
Survey on Incremental MapReduce for Data Mining Trupti M. Shinde 1, Prof.S.V.Chobe 2 1 Research Scholar, Computer Engineering Dept., Dr. D. Y. Patil Institute of Engineering &Technology, 2 Associate Professor,
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationDesign of Labour Agency Platform Based on Agent Technology of JADE *
Design of Labour Agency Platform Based on Agent Technology of JADE * Xiaobin Qiu **, Nan Zhou, and Xin Wang Network Center, China Agriculture University, Beijing 100083, P.R. China qxb@cau.edu.cn Abstract.
More informationCloud Computing. Hwajung Lee. Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University
Cloud Computing Hwajung Lee Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University Cloud Computing Cloud Introduction Cloud Service Model Big Data Hadoop MapReduce HDFS (Hadoop Distributed
More informationResearch on Heterogeneous Data resource Management Model in Cloud Environment
, pp.141-152 http://dx.doi.org/10.14257/ijdta.2013.6.5.13 Research on Heterogeneous Data resource Management Model in Cloud Environment Tao Sun 1,2 and Xinjun Wang 1 1 School of Computer Science and Technology,
More informationActiveScale Erasure Coding and Self Protecting Technologies
WHITE PAPER AUGUST 2018 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage
More informationResearch and Design of Data Storage Scheme for Electric Power Big Data
3rd International Conference on Management, Education, Information and Control (MEICI 2015) Research and Design of Data Storage Scheme for Electric Power Big Data Wenfeng Song 1,a, Wanqing Yang 2,b*, Jingzhao
More information4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)
4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,
More informationTwitter data Analytics using Distributed Computing
Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE
More information