International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

Similar documents
Data Mining Techniques

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Question Bank. 4) It is the source of information later delivered to data marts.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

TIM 50 - Business Information Systems

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

1. Inroduction to Data Mininig

Dynamic Data in terms of Data Mining Streams

Data Mining and Warehousing

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

International Journal of Mechatronics, Electrical and Computer Technology

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)

DATA MINING AND WAREHOUSING

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

TIM 50 - Business Information Systems

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

The CRISP-DM Process Model

An Improved Apriori Algorithm for Association Rules

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

DATA MINING TRANSACTION

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

Integrating Logistic Regression with Knowledge Discovery Systems

The Data Mining usage in Production System Management

Performance Analysis of Data Mining Classification Techniques

Meaning & Concepts of Databases

Data Mining & Data Warehouse

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Lluis Belanche + Alfredo Vellido Data Mining II An Introduction to Mining (2)

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6 VIDEO CASES

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Web Mining Evolution & Comparative Study with Data Mining

Data Mining. Associate Professor Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Data warehouse and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining. Data Analysis and Knowledge Discovery

Introduction to Data Mining

DATA MINING II - 1DL460

Overview of Web Mining Techniques and its Application towards Web

Building Data Mining Application for Customer Relationship Management

Knowledge Modelling and Management. Part B (9)

The Study on Data Warehouse and Data Mining for Birth Registration System of the Surat City

Data Mining & Machine Learning F2.4DN1/F2.9DM1

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

[Shingarwade, 6(2): February 2019] ISSN DOI /zenodo Impact Factor

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

DATA MINING II - 1DL460

Data warehouse architecture consists of the following interconnected layers:

Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI)

Parametric Comparisons of Classification Techniques in Data Mining Applications

IJMIE Volume 2, Issue 9 ISSN:

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

A Comparative Study of Selected Classification Algorithms of Data Mining

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms

Data Mining: Approach Towards The Accuracy Using Teradata!

International Journal of Advanced Research in Computer Science and Software Engineering

Data Mining Techniques Methods Algorithms and Tools

K-Mean Clustering Algorithm Implemented To E-Banking

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

Chapter 1, Introduction

1 (eagle_eye) and Naeem Latif

Data Mining in the Application of E-Commerce Website

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

Knowledge Discovery and Data Mining

Business Intelligence Roadmap HDT923 Three Days

Data Mining An Overview ITEV, F /18

Oracle Database 11g: Data Warehousing Fundamentals

Analysis of Fragment Mining on Indian Financial Market

Log Information Mining Using Association Rules Technique: A Case Study Of Utusan Education Portal

This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used

Correlation Based Feature Selection with Irrelevant Feature Removal

Statistical Learning and Data Mining CS 363D/ SSC 358

Chapter 3. Databases and Data Warehouses: Building Business Intelligence

INTRODUCTION TO DATA MINING

SCHEME OF COURSE WORK. Data Warehousing and Data mining

Disquisition of a Novel Approach to Enhance Security in Data Mining

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

Data warehousing in telecom Industry

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data Warehouse and Data Mining

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Research Article ISSN:

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

Data Mining Concepts

Transcription:

The Survey Of Data Mining And Warehousing Architha.S, A.Kishore Kumar Department of Computer Engineering Department of computer engineering city engineering college VTU Bangalore, India ABSTRACT: Data mining is the practice of examining large pre-existing databases in order to generate new information. The paper discusses few of the data mining techniques, applications and some of the organizations which have adapted data mining technology to improve their businesses and found excellent results. Data mining on large data bases have been a very big concern in research community due to the difficulty of analyzing large volumes of data using only traditional OLAP tools. This sort of process takes up a lot of computational power, memory and disk input and output, which can only be provided by parallel computers. Thus here we present to you a brief discussion on how the data base technology can be integrated to data mining techniques. We also learn about the life cycle of data mining. Keywords: Data mining, warehousing, life cycle, applications. INTRODUCTION Data mining, the extraction of hidden information from large databases, is one of the powerful technologies which has a great potential to help other companies focus on the most important information in their data warehouses. The concept of data mining tools predict the future trends and behaviors which allows businesses to make knowledge driven and clever decisions. Data mining techniques have increasingly been studied a lot, especially because of their application in real world databases. The only big problem that arises is that the databases tend to be very large and these topics repeatedly scan the entire set. Even though the term data mining was introduced in the 90 s, the concept of data mining goes back to many years. Data mining has reached its current form after going through many phases of research and studies. what we have today here is a competitive world of information and hence all are trying to make the best use of their data to make business profitable and successful. Storage and collection of data on computers, tapes and disks started in the 60 s. next major and evolutionary step in data mining happened during the 80 s with the introduction of relational databases and structured query languages. 115

This sure helped the users to know more about the data stored in relational databases using structured query languages. The next main introduction was about data warehousing which happened during the 90 s. online analytic processing and multidimensional databases contributed to the growth of data warehousing. But now, data mining is the emerging technique. If we analyze each step of evolution of data mining, we get to observe clearly that each step is built upon the previous step. In addition to large data warehouses and basic statistics, a major part of data mining is artificial intelligence, commonly known as AI. This artificial intelligence was started in the 80 s with a set of algorithms which were designed to teach a computer how to learn by itself. As they developed, these algorithms became valuable data manipulation tools and hence they were applied to a large sets of data. The data mining software, combined with the artificial intelligence technology was able to generate its own relationship between the data. AI gave way to machine learning. It is defined as the ability of a machine to improve its performance based on previous results. DATA WAREHOUSING The term data warehouse was first coined by Bill Inmon in 1990. According to Bill, a subject oriented, integrated, non volatile, time variant collection of data. This data plays a major role for analysts to take informed decisions in an organization. Everyday, an operational database undergoes frequent changes on account of the transactions that take place. If a business executive wants to analyze any previous feedback on any data such as a supplier, product, or any consumer data, then the executive will not have any data available to analyze because all the previous data has been updated due to transactions. A data warehouse provides us some generalized and consolidated data in multidimensional view. It also provides us Online Analytical Processing Tools (OLAP). These tools help us in interactive and effective data analysis in a multidimensional space. This type of analysis will result in data generalization and data mining. Data mining functions classification, clustering, association, prediction can all be integrated using OLPA operations to enhance the interactive mining of knowledge at multiple level abstraction. This is why data warehousing has become a very important platform for data analysis and online analytical processing. Figure: 1. Bill Inmon Figure: 2. Data Warehousing 116

DATA MINING LIFE CYCLE The life cycle of data mining project consists of six phases[2,4]. The sequence of the phases is not rigid, moving back forth between phases is always required. It depends on the outcome of each phase. The main phases are: Business Understanding: This phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives. Data Understanding: It starts with an initial data collection, to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets to form hypothesis for hidden information. Data Preparation: In this stage, it collects all the different data sets and construct the varieties of activities basing on the internal raw data. Modeling: In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Figure: 3. Data Mining Cycle Figure: 4. Data Mining Process Deployment: The purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it. The deployment phase can be as 117

simple as generating a report or as complex as implementing a repeatable data mining process acrosstheenterprise. 118

Visualizing Data Mining Model Evaluation: In this stage the model is thoroughly evaluated and reviewed. The steps executed to construct the model to be certain it properly achieves the business objectives. At the end of this phase, a decision on the use of the data mining results should be reached. The main objective of data visualization is the overall idea about the data mining model.in data mining most of the times we are retrieving the data from the repositories which are in the hidden form. This is the difficult task for a user. So this visualization of the data mining model helps us to provide utmost levels of understanding and what the data mining process has discovered, it is a much bigger leap to take the output of the system and translate it into an actionable solution to a business problem. The data mining models are of two types Predictive and Descriptive. The predictive model makes prediction about unknown data values by using the known values. Ex. Classification, Regression, Time series analysis, Prediction etc. The descriptive model identifies the patterns or relationships in data and explores the properties of the data examined. Ex. Clustering, Summarization, Association rule, Sequence discovery etc. The term clustering means analyzes the different data objects without consulting a known class levels. It is also referred to as unsupervised learning or segmentation. It is the partitioning or segmentation of the data in to groups or clusters. The clusters are defined by studying the behavior of the data by the domain experts. The term segmentation is used in very specific context; it is a process of partitioning of database into disjoint grouping of similar tuples. The association rule finds the association between the different attributes. Data Mining Applications In this section, we have focused some of the applications of data mining and its techniques analyzed respectively. Data mining is used for market basket analysis Data mining technique is used in MBA(Market Basket Analysis).When the customer want to buying some products then this technique helps us finding the associations between different items that the customer put in their shopping buckets. Here the discovery of such associations that promotes the business technique.in this way the retailers uses the data mining technique so that they can identify that which customers intension (buying the different pattern).in this way this technique is used for profits of the business and also helps to purchase the related items. Data Mining Applications can be generic or domain specific. Data mining system can be applied for generic or domain specific. Some generic data mining applications cannot take its own these decisions but guide users for selection of data, selection of data mining method and for the interpretation of the results. The multi agent based data mining application [8, 10] has capability of automatic selection item sets. Sequence discovery is the process of finding new sequences in data 119

. Data Mining methods are used in the Web Education Data mining methods are used in the web Education which is used to improve courseware. The relationships are discovered among the usage data picked up during students sessions. This knowledge is very useful for the teacher or the author of the course, who could decide what modifications will be the most appropriate to improve the effectiveness of the course. In the 21 st century the beginners are using the data mining techniques which is one of the best learning method in this era. This makes it possible to increase the awareness of learners. Web Education which will rapidly growth in the application of data mining methods to educational chats which is both feasible and can be improvement in learning environments in the 21 st century. Conclusion In this paper we briefly reviewed the various data mining applications. This review would be helpful to researchers to focus on the various issues of data mining. In future course, we will review the various classification algorithms and significance of evolutionary computing (genetic programming) approach in designing of efficient classification algorithms for data mining. Most of the previous studies on data mining applications in various fields use the variety of data types range from text to images and stores in variety of databases and data structures. The different methods of data mining are used to extract the patterns and thus the knowledge from this variety databases. Selection of data and methods for data mining is an important task in this process and needs the knowledge of the domain. REFERENCES [1] Introduction to Data Mining and Knowledge Discovery, Third Edition ISBN: 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999. [2] Larose, D. T., Discovering Knowledge in Data: An Introduction to Data Mining, ISBN 0-471-66657-2, ohn Wiley & Sons, Inc, 2005. [3] Dunham, M. H., Sridhar S., Data Mining: Introductory and Advanced Topics, Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1 st Edition, 2006 [4] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R... CRISP-DM 1.0 : Step-by-step data mining guide, NCR Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), SPSS Inc. (USA) and OHRA Verzekeringenen Bank Group B.V (The Netherlands), 2000. [5] Fayyad, U., Piatetsky-Shapiro, G., and Smyth P., From Data Mining to Knowledge Discovery in Databases, AI Magazine, American Association for Artificial Intelligence, 1996. [6] Baazaoui, Z., H., Faiz, S., and Ben 120

Ghezala, H., A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data, volume 5, ISSN 1307-6884, Proceedings of World Academy of Science, Engineering and Technology, April 2005. [7] Botia, J. A., Garijo, M. y Velasco, J. R., Skarmeta, A. F., A Generic Data mining System basic design and implementation guidelines, A Technical Project Report of CYCYTprojectofSpanishGovernment.199 8.WebSite: http://citeseerx.ist.psu.edu/viewdoc/summa ry?doi=10.1.1.53.1935 Authors Architha.S: Presently Architha is pursuing B.E. in Computer Science Engineering at City Engineering College Bangalore, India A.Kishore Kumar: Presently Kishore is pursuing B.E. in Computer Science Engineering at Engineering College Bangalore, India City Corresponding Address- A.Kishore Kumar- #1180, Renuka Nilaya, 2 nd main, 7 th block, hosakerehalli, bsk 3 rd Bangalore- 85 Mobile number- +919845745456 Email id- kishore.kumar.kishore1997@gmail.com stage, Architha.S- #330/10 ground floor, 23 rd cross, 6 th block, jayanagar, Bangalore- 70 Mobile number- +918123870278 Email id- architha96@gmail.com 121