DYNAMIC DATA STORAGE AND PLACEMENT SYSTEM BASED ON THE CATEGORY AND POPULARITY
- Blaise Foster
- 5 years ago
Miss. Radhika Jaju 1, Prof. Priya Deshpande 2
1 Student (MITCOE, Pune)
2 Asst. Professor (MITCOE, Pune)
International Journal of Computer Engineering & Technology (IJCET), Volume 6, Issue 6, June (2015), IAEME
Journal Impact Factor (2015): (calculated by GISI)

ABSTRACT

Storage and management of data in distributed computing is a widely researched topic nowadays. Various replication strategies have been developed to address storage and management issues. We have developed one such system, the S&P System, which gives better results than HDFS for parameters such as performance, access time and memory utilization. In this paper, the S&P system divides data by category and stores it on the node assigned to that category. This reduces access time and total cost. Data is then replicated on the basis of the access history, popularity and replication factor of each file. The replication factor is calculated by a formula. This improves memory utilization.

Keywords: Category, Access History, Popularity.

I. INTRODUCTION

Distributed computing and its parameters have been popular research topics in recent years, and big data storage is one of them. Many experiments take place every day, and some of them show effective results. Big data can be of any kind and of any size, and storing and managing such large amounts of data is becoming a more challenging task day by day. Many strategies and algorithms have been derived for this purpose; some of them showed better results and enhanced overall performance. Hadoop is the basic technology, and it uses a simple strategy for storage and placement of data. Hadoop stores data in the form of chunks, where all the chunks are of equal size.
A large amount of data is generated regularly, and it must be stored properly for efficient access and to prevent data loss, memory loss, data redundancy and data duplication. So how should the data be placed? Where should it be placed? Why should it be placed? [1] Hadoop is an open source framework that addresses the storage problem, and it plays an important role in distributed computing systems. The following diagram shows the process of storing data in Hadoop.
This shows the storage of data in the Hadoop framework. The Hadoop structure has a Namenode, which acts as the master, and a group of Datanodes attached to it, which act as slaves. The user performs storage operations on the datanodes connected to the namenode. A client asks the namenode to store data; the namenode checks the availability of the datanodes and sends an acknowledgement to the client accordingly. The datanodes to which the client wants to write are pipelined together, and several copies of the stored data are created and stored on other nodes of the pipeline. The number of copies is a fixed, configurable value in Hadoop; the HDFS default is 3. Because of these copies, storage is safe: if any datanode crashes, the data is still available on another node and can be recovered from there. Along with these advantages, the basic Hadoop strategy has disadvantages such as redundancy, overheads, data loading time and retrieval time. To overcome them, a number of strategies have been implemented, and a few of them perform well. We also apply a strategy that extends Hadoop: starting from basic Hadoop and adding some new concepts, we derive a new algorithm that shows effective results and overcomes these disadvantages. Our strategy stores data according to the category it belongs to, and then places the copies where they will be used most, so that less time is spent accessing them.

II. RELATED WORK

In [3], Wuqing Zhao et al. introduce DORS (Dynamic Optimal Replication Strategy). They first decide whether a file needs to be replicated, on the basis of earlier work. An importance value is then assigned to each file, calculated from the number of occurrences of the file and its access history.
According to this value, the less important files are the ones replaced. The algorithm showed better results in simulation. In [4], Alexis M. Soosai et al. propose LVR (Least Value Replacement), a method based on future value prediction. Whenever the storage of a data grid site is full, the LVR framework automatically decides which file should be deleted, using information about the access frequency of the file, the file's future value and the free space on the storage element. Simulated results for LVR were obtained with the OptorSim simulator. In [5], Myunghoon Jeon et al. define DRS (Dynamic Replication Strategy) for improved data access. Data access performance is closely related to the data access pattern. Traditional strategies are designed for one particular access pattern and are less effective for others, so DRS adapts its strategy to the pattern. Because the pattern changes dynamically, the frequency count of the files is also adapted dynamically. In [6], Chang proposes LALW (Latest Access Largest Weight), in which the largest weight is applied to the most recently accessed file. Similarly, Sato et al. present a small modification to simple replication algorithms on the basis of file access pattern and network capacity. DRCP (Dynamic Replica Creation and Placement) proposes a placement policy and replica selection to reduce execution time and bandwidth consumption. Its replication is based on the popularity of the file, and the strategy is implemented using the data grid simulator OptorSim [7, 8].

In this paper we derive a data placement strategy in which data is stored by category and placed on nodes depending on the access history, replication factor and popularity of each file. The rest of the paper is organized as follows. Section III describes the system architecture and the storage and placement of data according to its popularity and category. Section IV presents the results and evaluation. Section V gives the conclusion and future work, and Section VI lists the references.

III. SYSTEM ARCHITECTURE

The figure gives the overall idea of the proposed strategy. Jobs from different clients are submitted to the job broker, which plays the role of the Hadoop namenode.
The job broker then runs the K-means algorithm to find the category of the submitted data, and the data is stored on the node assigned to that category.
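The categorization step can be illustrated with a minimal sketch. It assumes each submitted item has already been turned into a numeric feature vector (the paper does not say how), and the names `kmeans`, `route` and `CATEGORY_NODES` are hypothetical, chosen only for this illustration. It is a plain textbook K-means, not necessarily the exact variant used in the S&P system.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            updated.append(old)  # an empty cluster keeps its previous centroid
            continue
        updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return updated

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [nearest_centroid(p, centroids) for p in points]
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical mapping from category index to the datanode assigned to it.
CATEGORY_NODES = {0: "datanode-A", 1: "datanode-B"}

def route(feature_vector, centroids):
    """Pick the datanode of the category whose centroid is closest."""
    return CATEGORY_NODES[nearest_centroid(feature_vector, centroids)]

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, [(0, 0), (10, 10)])
target_node = route((9, 9), centroids)
```

With K fixed in advance (here K = 2), a new item is routed to the node of whichever category centroid it is closest to, which is the behaviour the architecture describes for the job broker.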
A. Dynamic Data Distribution

When a client requests data storage, the data must be stored efficiently so that it can be accessed easily. We therefore store data in a form that makes it easy to locate, which also reduces access cost. In our strategy, jobs are submitted to the job broker to store the data. The job broker divides the data by category; a category may in turn be divided into sub-categories, and so on. The data is then stored on the datanode assigned to that category. This fragmentation makes the data convenient to store and access: data is retrieved easily and file transfer traffic is low, which in turn improves the performance and cost of file transfer. For example, the data of a plastic factory has categories such as vendor data and supplier data. Different strategies have been studied for dividing such data by category. K-means is one of the algorithms used for data categorization. It is a well-known partitioning algorithm in which each object is assigned to one of K groups, where K is fixed a priori. Membership is decided by the centroid of each cluster, that is, the multidimensional mean of its members: an object is assigned to the group with the closest centroid [9, 10]. K-means works by repeatedly calculating the centroid of each cluster, and it is cost effective. The basic K-means algorithm is:

[Algorithm listing: basic K-means]

Whenever data arrives at the job tracker, the job tracker invokes the K-means algorithm. K-means divides the data by category, and the categorized data is stored on the node assigned to that category. If that storage is full, newly arriving data automatically propagates to the nearest node.

B. Data Placement

If data is already being stored, why is data placement needed?
Clients store unstructured data of any size and kind, so storage may fill up. What happens if the storage element of a particular site is full? How is new data stored on the site? How is important data that is already stored protected from deletion? In Hadoop, when this situation occurs, files are deleted at random and replaced with the new one. Hadoop may delete one or two files, depending on how much free space the new file needs. But there is a chance of deleting a file that will be in high demand in future. Our strategy is designed to deal with this problem. It is based on the popularity and access history of the file. Whether a file is copied depends on the replication factor: if a file has fewer copies than the replication factor, it is copied; otherwise it is not. The replication factor is the ratio of the capacity of all the nodes to the total size of all the files:

R = C / W [11]

where R is the replication factor, C is the capacity of all the nodes, and W is the total size of all files in the data grid. R decides whether a file is replicated: if the number of copies of a file is less than R, the file is replicated; otherwise it is not [12]. In our system, a replication manager, centrally located and connected to all the nodes, maintains the replica count logs and the access history count logs. These logs are cleared after a fixed time interval, so we obtain an updated replica count and a new replication factor after every interval. Once we have the popularity count of all the files for an interval, we replicate files according to that count: the most popular file gets the first chance of replication, and so on.

IV. PERFORMANCE EVALUATION

To show the improved results of our system (S&P System), we compared it with HDFS.

A. Popularity Comparison

The popularity comparison between the two systems is shown in Fig. 3 and Fig. 4. We used four files of the same size for both systems. S&P records the request count of each file at every interval, and the most popular file gets the first chance of replication. HDFS does not consider popularity, which means that all files are replicated randomly.

B. Data Placement

The replica comparison at fixed time intervals is shown in graphs 5 and 6. At every interval, Hadoop replicates all the files requested at that time, whereas the S&P system replicates only the files that are popular. This results in efficient access time.
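The replication rule from Section III.B, on which this comparison relies, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names and the dictionary-based logs are hypothetical, and popularity is taken to be the raw access count within one interval.

```python
def replication_factor(node_capacities, file_sizes):
    """R = C / W: total capacity of all nodes over total size of all files."""
    total_capacity = sum(node_capacities)
    total_size = sum(file_sizes)
    if total_size == 0:
        raise ValueError("no files stored, so R is undefined")
    return total_capacity / total_size

def files_to_replicate(access_counts, replica_counts, r):
    """Files with fewer than R copies, ordered from most to least popular."""
    eligible = [f for f in access_counts if replica_counts.get(f, 0) < r]
    return sorted(eligible, key=lambda f: access_counts[f], reverse=True)

def eviction_candidate(access_counts):
    """Least popular file, the deletion candidate when a node's storage is full."""
    return min(access_counts, key=access_counts.get)

# Hypothetical logs kept by the replication manager for one interval.
capacities = [100, 100, 100]           # capacity of each node
sizes = [50, 50, 25, 25]               # size of each stored file
access_counts = {"a": 12, "b": 3, "c": 7, "d": 1}
replica_counts = {"a": 1, "b": 1, "c": 2, "d": 1}

r = replication_factor(capacities, sizes)
queue = files_to_replicate(access_counts, replica_counts, r)
victim = eviction_candidate(access_counts)
```

In this example R is 2.0, so file c, which already has two copies, is skipped, and the remaining files are queued for replication in order of popularity. Recomputing R and the counts at every interval matches the fixed-interval log reset described for the replication manager.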
C. If Memory Gets Full

If the storage of a node is full, a new file replaces an old one. In HDFS the file to be deleted is selected randomly, whereas in the S&P system the least popular file is deleted. The scenario is shown in Fig. 7 and Fig. 8.

D. Mean Access Time

As we uploaded files of different types and sizes, we obtained different results. We compared our results with basic Hadoop, and they show that our system's access time is less than that of the original HDFS. Mean access time is calculated according to size. We obtained two different values for two data sets of different sizes. They are as follows.
E. Memory Utilization

Memory utilization depends on the strategy used by the system and relates to the system's storage capacity. Here we compare the original HDFS with our data storage and placement system. The utilization factor depends on the number of jobs executed.

V. CONCLUSION

In this paper, our data distribution strategy helps to improve data access time. The K-means algorithm divides the data by category, and the data is sent to the node assigned to that category. When a user requests or stores a file, K-means runs and the operation is performed on the node for the corresponding category. Our replication strategy is based on the popularity and access history of the file, and it shows better results than the traditional HDFS system. In future we will also consider scheduling criteria, load balancing and recovery, so that the strategy covers the whole system and gives better results.

VI. REFERENCES

1. W. Zhao, et al., A Dynamic Optimal Replication Strategy in Data Grid Environment, International Conference on Internet Technology and Applications, pp. 1-4.
2. Tom White, Hadoop: The Definitive Guide, Sebastopol: O'Reilly.
3. Wuqing Zhao, Xianbin Xu, Zhuowei Wang, Yuping Zhang, Shuibing He, A Dynamic Optimal Replication Strategy in Data Grid Environment, School of Computer, Wuhan University, Wuhan, IEEE.
4. Alexis M. Soosai, et al., Dynamic Replica Replacement Strategy in Data Grids, Department of CS, University of Malaysia.
5. Myunghoon Jeon, Kwang-Ho Lim, Hyun Ahn, Byoung-Dai Lee, Dynamic Data Replication Scheme in Cloud Computing, IEEE.
6. R.S. Chang, H.P. Chang, A Dynamic Data Replication Strategy Using Access-Weights in Data Grids, Supercomputing, Vol. 45, No. 3.
7. K. Sashi, A. Selvadoss Thanamani, A New Replica Creation and Placement Algorithm for Data Grid Environment, IEEE International Conference on Data Storage and Data Engineering (2010).
8. K. Sashi, A. Selvadoss Thanamani, Dynamic Replication in a Data Grid using a Modified BHR Region Based Algorithm, Elsevier Future Generation Computer Systems (2011).
9. Chen G., Jaradat S., Banerjee N., Tanaka T., Ko M., and Zhang M., Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data, Statistica Sinica, Vol. 12.
10. Osama Abu Abbas, Comparisons Between Data Clustering Algorithms, The International Arab Journal of Information Technology, Vol. 5, No. 3, July.
11. Wolfgang Hoschek, Francisco Javier Jaén-Martínez, Asad Samar, Heinz Stockinger, and Kurt Stockinger, Data Management in an International Data Grid Project, Proceedings of the First IEEE/ACM International Workshop on Grid Computing, Springer-Verlag, 2000.
12. Radhika Jaju, et al., Dynamic Data Storage and Replication Based on the Category and Data Access Patterns, MITCOE, Pune University, IJSWS.
13. D. Sai Anuhya and Smriti Agrawal, 3-D Holographic Data Storage, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 6, 2013.
14. Yaser Fuad Al-Dubai and Dr. Khamitkar S.D, A Proposed Model for Data Storage Security in Cloud Computing Using Kerberos Authentication Service, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 6, 2013.
15. D. Pratiba and Dr. G. Shobha, Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013.
More informationFast and Effective System for Name Entity Recognition on Big Data
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-3, Issue-2 E-ISSN: 2347-2693 Fast and Effective System for Name Entity Recognition on Big Data Jigyasa Nigam
More informationSurvey Paper on Traditional Hadoop and Pipelined Map Reduce
International Journal of Computational Engineering Research Vol, 03 Issue, 12 Survey Paper on Traditional Hadoop and Pipelined Map Reduce Dhole Poonam B 1, Gunjal Baisa L 2 1 M.E.ComputerAVCOE, Sangamner,
More informationUnsupervised learning on Color Images
Unsupervised learning on Color Images Sindhuja Vakkalagadda 1, Prasanthi Dhavala 2 1 Computer Science and Systems Engineering, Andhra University, AP, India 2 Computer Science and Systems Engineering, Andhra
More information4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS
W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and
More informationLOAD BALANCING IN CLOUD COMPUTING USING ANT COLONY OPTIMIZATION
International Journal of Computer Engineering & Technology (IJCET) Volume 8, Issue 6, Nov-Dec 2017, pp. 54 59, Article ID: IJCET_08_06_006 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=8&itype=6
More informationGoogle File System (GFS) and Hadoop Distributed File System (HDFS)
Google File System (GFS) and Hadoop Distributed File System (HDFS) 1 Hadoop: Architectural Design Principles Linear scalability More nodes can do more work within the same time Linear on data size, linear
More informationISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com
More informationEnhanced Hadoop with Search and MapReduce Concurrency Optimization
Volume 114 No. 12 2017, 323-331 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Enhanced Hadoop with Search and MapReduce Concurrency Optimization
More informationResearch on Mass Image Storage Platform Based on Cloud Computing
6th International Conference on Sensor Network and Computer Engineering (ICSNCE 2016) Research on Mass Image Storage Platform Based on Cloud Computing Xiaoqing Zhou1, a *, Jiaxiu Sun2, b and Zhiyong Zhou1,
More informationCLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationAnalysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark
Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark PL.Marichamy 1, M.Phil Research Scholar, Department of Computer Application, Alagappa University, Karaikudi,
More informationThe Design of Distributed File System Based on HDFS Yannan Wang 1, a, Shudong Zhang 2, b, Hui Liu 3, c
Applied Mechanics and Materials Online: 2013-09-27 ISSN: 1662-7482, Vols. 423-426, pp 2733-2736 doi:10.4028/www.scientific.net/amm.423-426.2733 2013 Trans Tech Publications, Switzerland The Design of Distributed
More informationProposed System. Start. Search parameter definition. User search criteria (input) usefulness score > 0.5. Retrieve results
, Impact Factor- 5.343 Hybrid Approach For Efficient Diversification on Cloud Stored Large Dataset Geetanjali Mohite 1, Prof. Gauri Rao 2 1 Student, Department of Computer Engineering, B.V.D.U.C.O.E, Pune,
More informationBatch Inherence of Map Reduce Framework
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287
More informationMining Distributed Frequent Itemset with Hadoop
Mining Distributed Frequent Itemset with Hadoop Ms. Poonam Modgi, PG student, Parul Institute of Technology, GTU. Prof. Dinesh Vaghela, Parul Institute of Technology, GTU. Abstract: In the current scenario
More informationMANAGEMENT AND PLACEMENT OF REPLICAS IN A HIERARCHICAL DATA GRID
MANAGEMENT AND PLACEMENT OF REPLICAS IN A HIERARCHICAL DATA GRID Ghalem Belalem 1 and Bakhta Meroufel 2 1 Department of Computer Science, Faculty of Sciences, University of Oran (Es Senia), Algeria ghalem1dz@gmail.com
More informationFigure 1: Virtualization
Volume 6, Issue 9, September 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Profitable
More informationDynamic Replication Management Scheme for Cloud Storage
Dynamic Replication Management Scheme for Cloud Storage May Phyo Thu, Khine Moe Nwe, Kyar Nyo Aye University of Computer Studies, Yangon mayphyothu.mpt1@gmail.com, khinemoenwe@ucsy.edu.mm, kyarnyoaye@gmail.com
More informationPROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP
ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge
More informationEvaluation of Apache Hadoop for parallel data analysis with ROOT
Evaluation of Apache Hadoop for parallel data analysis with ROOT S Lehrack, G Duckeck, J Ebke Ludwigs-Maximilians-University Munich, Chair of elementary particle physics, Am Coulombwall 1, D-85748 Garching,
More informationApril Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map
More informationResearch Article A Two-Level Cache for Distributed Information Retrieval in Search Engines
The Scientific World Journal Volume 2013, Article ID 596724, 6 pages http://dx.doi.org/10.1155/2013/596724 Research Article A Two-Level Cache for Distributed Information Retrieval in Search Engines Weizhe
More informationA Level-wise Priority Based Task Scheduling for Heterogeneous Systems
International Journal of Information and Education Technology, Vol., No. 5, December A Level-wise Priority Based Task Scheduling for Heterogeneous Systems R. Eswari and S. Nickolas, Member IACSIT Abstract
More informationA SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING
Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s
More informationA New Platform NIDS Based On WEMA
I.J. Information Technology and Computer Science, 2015, 06, 52-58 Published Online May 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2015.06.07 A New Platform NIDS Based On WEMA Adnan A.
More informationTop 25 Big Data Interview Questions And Answers
Top 25 Big Data Interview Questions And Answers By: Neeru Jain - Big Data The era of big data has just begun. With more companies inclined towards big data to run their operations, the demand for talent
More informationAutonomic Data Replication in Cloud Environment
International Journal of Electronics and Computer Science Engineering 459 Available Online at www.ijecse.org ISSN- 2277-1956 Autonomic Data Replication in Cloud Environment Dhananjaya Gupt, Mrs.Anju Bala
More informationObtaining Rough Set Approximation using MapReduce Technique in Data Mining
Obtaining Rough Set Approximation using MapReduce Technique in Data Mining Varda Dhande 1, Dr. B. K. Sarkar 2 1 M.E II yr student, Dept of Computer Engg, P.V.P.I.T Collage of Engineering Pune, Maharashtra,
More informationInverted Index for Fast Nearest Neighbour
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationA Case Study on Cloud Based Hybrid Adaptive Mobile Streaming: Performance Evaluation
A Case Study on Cloud Based Hybrid Adaptive Mobile Streaming: Performance Evaluation T. Mahesh kumar 1, Dr. k. Santhisree 2, M. Bharat 3, V. Pruthvi Chaithanya Varshu 4 Student member of IEEE, M. tech
More informationCPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University
CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network
More informationImplementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b
International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory
More informationOpen Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments
Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing
More informationPERSONAL communications service (PCS) provides
646 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 5, OCTOBER 1997 Dynamic Hierarchical Database Architecture for Location Management in PCS Networks Joseph S. M. Ho, Member, IEEE, and Ian F. Akyildiz,
More informationADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS Radhakrishnan R 1, Karthik
More informationAdaptive replica consistency policy for Kafka
Adaptive replica consistency policy for Kafka Zonghuai Guo 1,2,*, Shiwang Ding 1,2 1 Chongqing University of Posts and Telecommunications, 400065, Nan'an District, Chongqing, P.R.China 2 Chongqing Mobile
More informationResearch on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1,
More informationEfficient Algorithm for Frequent Itemset Generation in Big Data
Efficient Algorithm for Frequent Itemset Generation in Big Data Anbumalar Smilin V, Siddique Ibrahim S.P, Dr.M.Sivabalakrishnan P.G. Student, Department of Computer Science and Engineering, Kumaraguru
More information