A LITERATURE SURVEY: EDUCATIONAL DATA MINING AND WEB DATA MINING

Similar documents
Context Based Indexing in Search Engines: A Review

A New Context Based Indexing in Search Engines Using Binary Search Tree

An Improved Indexing Mechanism Based On Homonym Using Hierarchical Clustering in Search Engine *

Inverted Indexing Mechanism for Search Engine

Context Based Web Indexing For Semantic Web

Educational Data Mining: Performance Evaluation of Decision Tree and Clustering Techniques using WEKA Platform

Overview of Web Mining Techniques and its Application towards Web

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Prediction of Student Performance using MTSD algorithm

Inferring User Search for Feedback Sessions

IJMIE Volume 2, Issue 9 ISSN:

Correlation Based Feature Selection with Irrelevant Feature Removal

BEST PRACTICES FOR ADAPTATION OF DATA MINING TECHNIQUES IN EDUCATION SECTOR Jikitsha Sheth, Dr. Bankim Patel

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

Web Mining Evolution & Comparative Study with Data Mining

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Clustering of Data with Mixed Attributes based on Unified Similarity Metric

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

INTRODUCTION. Chapter GENERAL

Review on Text Mining

Performance Analysis of Data Mining Classification Techniques

Data Mining of Web Access Logs Using Classification Techniques

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining

Chapter 27 Introduction to Information Retrieval and Web Search

Dynamic Clustering of Data with Modified K-Means Algorithm

A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering

Recommendation on the Web Search by Using Co-Occurrence

A Framework for Hierarchical Clustering Based Indexing in Search Engines

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

Review Paper onbuilding Prediction based model for cloud-based data mining

Competitive Intelligence and Web Mining:

2013, IJARCSSE All Rights Reserved Page 628

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

An Approach To Web Content Mining

Parametric Comparisons of Classification Techniques in Data Mining Applications

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

1. Inroduction to Data Mininig

A Study on K-Means Clustering in Text Mining Using Python

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

SURVEY ON STUDENT INFORMATION ANALYSIS

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Index Terms Data Mining, Classification, Rapid Miner. Fig.1. RapidMiner User Interface

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

A Comparative Study of Selected Classification Algorithms of Data Mining

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN

Information Retrieval

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Template Extraction from Heterogeneous Web Pages

Taccumulation of the social network data has raised

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Relevance Feature Discovery for Text Mining

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Web Data mining-a Research area in Web usage mining

GETTING STARTED WITH DATA MINING

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia

Text Document Clustering Using DPM with Concept and Feature Analysis

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

Customer Clustering using RFM analysis

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm

Keywords Data alignment, Data annotation, Web database, Search Result Record

DATA MINING II - 1DL460. Spring 2014"

Adaptive and Personalized System for Semantic Web Mining

Information mining and information retrieval : methods and applications

A Survey on k-means Clustering Algorithm Using Different Ranking Methods in Data Mining

Web Information Retrieval using WordNet

Domain Specific Search Engine for Students

Educational Data Mining: An Exploration of students' Academic performance Through Association Rule Mining set

RECOMMENDATION SYSTEM BASED ON ASSOCIATION RULES FOR DISTRIBUTED E-LEARNING MANAGEMENT SYSTEMS

SK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

Semantic Web Mining and its application in Human Resource Management

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Twitter data Analytics using Distributed Computing

A New Technique to Optimize User s Browsing Session using Data Mining

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS

PREDICTING UPCOMING STUDENTS PERFORMANCE USING MINING TECHNIQUE

Web Usage Mining: A Research Area in Web Mining

COMP 465 Special Topics: Data Mining

Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm

Department of Business Information Technology

Text Mining. Representation of Text Documents

An Efficient Methodology for Image Rich Information Retrieval

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

Focused crawling: a new approach to topic-specific Web resource discovery. Authors

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

Optimization of Query Processing in XML Document Using Association and Path Based Indexing

Transcription:

A LITERATURE SURVEY: EDUCATIONAL DATA MINING AND WEB DATA MINING Sarvesh Tendulkar, Swarada Adavadkar Department of Computer Engineering MIT College of Engineering Pune, India ABSTRACT: Data Mining (DM) is extracting information from large set of data and obtaining meaningful results from it. Data Mining has evolved into diverse fields which requires development of data collection, creation, data management (including data storage retrieval and database transaction processing) and advanced data analysis (data warehousing and mining).this paper explains about two of Data Mining fields : Educational Data Mining (EDM) and Web Data Mining (WDM). Educational Data Mining (EDM) is an emerging field of data mining which applies various data mining algorithms for betterment of students and improving quality of student courseware. Web Data Mining (WDM) is a technique used to crawl through various web resources to collect information which an individual or a company use to promote business. In this survey, different published research papers on methodologies used in EDM and WDM are compared. Also, it gives an insight into techniques used in EDM and WDM. Keywords: Keyword Indexing, Search engines, document frequency, term weights, data mining, neural network, predicting school failure, job trends analysis PART I: EDM [1] INTRODUCTION Educational Data Mining (EDM) is a new trend in the data mining and Knowledge Discovery in Databases (KDD) field which focuses in mining useful patterns and discovering useful knowledge from the educational information systems, such as, admissions systems, registration systems, course management systems (moodle, blackboard, etc ), and any other systems dealing with students at different levels of education, from schools, to colleges and universities. As interest in EDM continued to increase, EDM researchers established an academic journal in 2009, the Journal of Educational Data Mining, for sharing and disseminating research results. In 2011, EDM researchers established the International Educational Data Mining Society to connect EDM researchers and continue to grow the field. Goals for EDM in educational field: Baker and Yacef [1] describes the following four goals of EDM: 1) Predicting student's future learning behaviour Sarvesh Tendulkar and Swarada Adavadkar 1

A LITERATURE SURVEY: EDUCATIONAL DATA MINING AND WEB DATA MINING 2) Discovering or improving domain models 3) Studying the effects of educational support 4) Advancing scientific knowledge about learning and learners [2] EDM METHODS Classification It is a two-way technique (training and testing) which maps data into a predefined class. This technique is useful for success analysis with low, medium, high risk students used in [2], student monitoring systems [3], predicting student performance, misuse detection used in [4] etc. Statistics It is a technique to identify outlier fields, record using mean, mode etc. and hypothetical testing. This technique is useful to improve the course management system & student response system [5]. Clustering It is a technique to group similar data into clusters in a way that groups are not predefined. This technique is useful to distinguish learner with their preference in using interactive multimedia system used in [6], Students comprehensive character analysis used in [7] and suitable for collaborative learning. Prediction It is a technique which predicts a future state rather than a current state. This technique is useful to predict success rate, drop out used in Dekker et al. [7], and retention management used in [8] of students. Neural Network It is a technique to improve the interpretability of the learned network by using extracted rules for learning networks. This technique is useful to determine residency, ethnicity used in [9], to predict academic performance used in [10], accuracy prediction in the branch selection used in and explores learning performance in a TESL based e-learning system. Association Rule Mining It is a technique to identify specific relationships among data. This technique is useful to identify students failure patterns [11], parameters related to the admission process, migration, contribution of alumni, student assessment, co-relation between different group of students, to guide a search for a better fitting transfer model of student learning etc. Apart from the above methods, two new methods i.e. distillation of data for human judgment and discovery with models to analyze the behavioral impact of students in learning environments. [3] APPLICATIONS 1. Analysis and Visualization of Data 2. Predicting Student Performance 3. Enrolment Management 4. Grouping Students 5. Predicting Students Profiling 6. Planning and scheduling 7. Organization of Syllabus 8. Detecting Cheating in Online Examination Sarvesh Tendulkar and Swarada Adavadkar 2

[5] LITERATURE SURVEY OF EDM TRENDS Author(s) and Year Dutta Borah et al., 2011 [13] Lin S.H., 2012 [8] Merceron and Yacef. 2005 [14] Amjad Abu Saa 2016 [15] Carlos Márquez- Vera, Cristóbal Romero Morales, and Sebastián Ventura Soto, 2013 [12] Data Mining Software(s) SPSS Clementine 11.1 WEKA Excel, Clementine, Tada-Ed, SODAS RapidMiner, WEKA WEKA Data Mining Algorithm(s) Classification- Decision Tree Decision Tree Algorithm Classification- DT, Clustering, Association Mining Decision Tree - C4.5, CART, CHAID, ID3 Decision Tree, Rule Induction, Synthetic Minority Oversampling Technique(SMOTE) Input Dataset AIEEE 2007 5943 records of 1st year students of Biola University Logic-ITA Student Data survey distributed to students within their daily classes and as an online survey using Google Forms A specific survey, a general survey, The Department of School Services Educational Outcome Predict student s enrolment decision Student Retention Management Student/teachers' performance monitoring system Students Performance Prediction Predicting the academic status of students PART II: WDM [6] INTRODUCTION The term Web Data Mining is a technique used to crawl through various web resources to collect required information, which enables an individual or a company to promote business, understanding marketing dynamics, new promotions floating on the Internet, etc. Web mining refers to the automatic discovery of interesting and useful patterns from the Sarvesh Tendulkar and Swarada Adavadkar 3

A LITERATURE SURVEY: EDUCATIONAL DATA MINING AND WEB DATA MINING data associated with the usage, content, and the linkage structure of Web resources. Web mining lies in between and copes with semi structured data and/or unstructured data. Web mining calls for creative use of data mining and/or text mining techniques and its distinctive approaches. Mining the web data is one of the most challenging tasks for the data mining and data management scholars because there are huge heterogeneous, less structured data available on the web and we can easily get overwhelmed with data.[18] Web data mining can be broadly classified into 3 types: Web usage mining Web content mining Web structure mining [7] WDM APPLICATIONS Applications: 1. E-commerce to do personalized marketing 2. Analyze feedback for particular product 3. Government agencies to identify threat and fight terrorism 4. Companies to keep better relation with customers 5. Analyze current market trends for better performance[20] [9] LITERATURE SURVEY: INDEXING TECHNIQUES Sr.N o. Author & Year 1 Changsha ng Zhou, Wei Ding, Na Yang 2006[24] 2 Fabrizio Silvestri, Raffaele Perego and Salvatore Orlando 2004[23] Indexing Techniques Double Indexing Mechanism of Search Engine Assigning Document Identifiers to Enhance Compressibilit y of Web Search Engines Indexes Methodology It has document index as well as word index. The document index is based on the clustering, and ordered by the position in each document. In the retrieval, the search engine first gets the document id of the word in the word index, and then goes to the position of corresponding word in the document index. It was the reordering algorithm which partitions the set of documents into k ordered clusters on the basis of similarity measure. According to this algorithm, the biggest document is selected as centroid of the first cluster and n/k1 most similar documents are assigned to this cluster. Then the biggest document is selected and the same process repeats Limitations Time consuming as the index exists at two levels. It is not effective in clustering the most similar documents. The biggest document may not have similarity with any of the documents but still it is taken as the representative of the cluster. 3 L. Huilin, K. Efficiently Crawling Focused search engine comprising of two components; filtering and link forecasting. Focused search engine does not Sarvesh Tendulkar and Swarada Adavadkar 4

Chunhua and W. Guangxin g 2007[22] 4 Rada Mihalcea and Dan Moldovan, 2000[21] Strategy for Focused Searching Engine Semantic Indexing using WordNet senses The first component filters out the irrelevant documents from previously fetched list of documents and the second component envisage the best matched links form related documents to assign them to crawler for further downloading. It shows improved effectiveness over the classic word based indexing techniques. Herein a Boolean Information retrieval system adds word semantics to the classic word based indexing. Two of the core tasks of the system, namely the indexing and retrieval components, use a combined wordbased and sense-based approach consider the context of the keywords of the user query. Representation is dense, so hard to index based on individual dimensions. 5 X. Chen, X. Zhang, 2008[20] 6 Parul Gupta, Dr. A.K. Sharma, 2010[16] HAWK: A Focused Crawler with Content and Link Analysis Context based Indexing in Search Engines using Ontology It is based on the content of the document and on link analysis. It is implemented using user-defined relevant formula, shark-search and Page-Rank. An index is built on the basis of context of the document rather than on the terms basis using ontology. The context of the documents being collected by the crawler in the repository is being extracted by the indexer using the context repository, thesaurus and ontology repository and then documents are indexed according to their respective context. It is does not make use of domain ontologies to represent topical maps and link Web pages. It considered terms only in title of the document with maximum frequency. User has to pass the context along with query keyword which creates problem with naive users. 7 P.Mudgil, A.K.Shar ma, P.Gupta 2013[17] An Improved Indexing Mechanism to Index Web Documents This indexing considers the senses of the keyword. Different users type the same query to get the different results according to their interest. It will provide the different senses to the user which will help in generating the more relevant results to the user. The processing of file includes: removal of stop words, stemming, adding of weights and indexing of file. [10] CONCLUSION Data Mining technology has emerged to remedy the issues of management and analysis of massive volumes of data. In this paper, a comparison of trends and techniques of EDM and WDM is presented. This work focuses on various trends used in educational data mining and Sarvesh Tendulkar and Swarada Adavadkar 5

A LITERATURE SURVEY: EDUCATIONAL DATA MINING AND WEB DATA MINING different technologies used for web data mining. It is observed that the educational outcome varies according to input dataset provided for the given experiment. From above survey it can also be inferred that context based indexing has proved to give more relevant results efficiently. REFERENCES [1] Baker, R.S.; Yacef, K (2009). "The state of educational data mining in 2009: A review and future visions". JEDM-Journal of Educational Data Mining 1 (1): 2017 [2] Vandamme, J.P. et al. (2007), Predicting academic performance by Data Mining methods, Taylor and Francis group Journal Education Economics.Vol.15, No.4, pp.405-419. [3] Luan, J., (2002), Data mining, knowledge management in higher education, potential applications,in Workshop associate of institutional research international conference. [4] Baker, R.S, Corbett, A.T., Koedinger, K.R (2004), Detecting Student Misuse of Intelligent Tutoring Systems in Proc. Lecture Notes in Computer ScienceVol.3220,531-540. [5] Campbell, J.P., and Oblinger, D.G. (2007,Oct.). Academic Analytics. EDUCAUSE, Washington, D.C. [Online].Available http://net.educause.edu/ir/library/pdf/pub6101.pdf,2007. [6] Chrysostomu K. el al.(2009), Investigation of users preference in interactive multimedia learning systems: a data mining approach,taylor and Francis online journal Interactive learning environments. Vol. 17,No. 2. [7] Zhang Y.et al. (2010), Using data mining to improve student retention in HE: a case study, in Proc.12th Int. Conf. on Enterprise Information Systems, Volume 1: Databases and Information Systems Integration.Portugal, pp.190-197. [8] Lin, S.H.(2012), Data Mining for student retention management ACM journal of Computing Sciences in CollegesI. Vol.27,No.4,pp. 92-99. [9] Yu et al. (2010), A data mining approach for identifying predictors of student retention from sophomore to junior year, Journal of Data Science.Vol.8, pp.307-325. [10] Vandamme, J.P. et al. (2007), Predicting academic performance by Data Mining methods, Taylor and Francis group Journal Education Economics.Vol.15, No.4, pp.405-419. [11] Oladipupo,O.O.,Oyelade,O.J.(2009), Knowledge Discovery from Students Result Repository: Association Rule Mining Approach, International Journal of Computer Science & Security,Vol.4,No.2,pp.199-207. [12] Carlos Mrquez-Vera, Cristbal Romero Morales, and Sebastin Ventura Soto "Predicting School Failure and Dropout by Using Data Mining Techniques", IEEE Journal Of Latin- American Learning Technologies, vol. 8, no. 1,February 2013. [13] Dutta Borah, M., Jindal, R., Gupta,D.,Deka,G.C (2011), Application of knowledge based decision technique to Predict student s enrolment decision. in Proc. Int. Conf. on Recent Trends in Information Systems, India:IEEE,pp.180-184,DOI :10.1109/ReTIS.2011.6146864. [14] Mercheron, A., and Yacef, K. (2005), Educational Data Mining: a case study in Proc. Conf. on Artificial Intelligence in Education Supporting Learning through Intelligent and Socially Informed Technology. IOS Press, Amsterdam, The Netherlands, pp. 467-474. [15] Amjad Abu Saa " Educational Data Mining & Students Performance Prediction", International Journal of Advanced Computer Science and Applications, vol. 7, no. 5, 2016. [16] P.Gupta and Dr. A.K.Sharma Context based Indexing in Search Engines using Ontology, International Journal of Computer Applications, Volume 1 No. 14 [17] P. Mudgil, A.K.Sharma and P.Gupta, An Improved Indexing Mechanism to Index Web Documents, 5th International Conference on Computational Intelligence and Communication Networks [18] Smith, D., & Ali, A., Analyzing computer programming job trend using web data mining Issue in Informing Science and Information Technology, 11, 203-214 [19] http://www.ieeefinalyearprojects.org/web_mining_projects_in_java_dotnet.html [20] X. Chen, X. Zhang, HAWK: A Focused Crawler with Content and Link Analysis, IEEE International Conference on e-business Engineering. pp- 677-680, 2008. Sarvesh Tendulkar and Swarada Adavadkar 6

[21] Rada Mihalcea and Dan Moldovan, Semantic Indexing using WordNet senses, in proceedings of ACL workshop on IR and NLP, Hong Kong, 2000. [22] ]L. Huilin, K. Chunhua and W. Guangxing, Efficiently Crawling Strategy for Focused Searching Engine, Advances in Web and Network Technologies and Information Management [23] Fabrizio Silvestri, Raffaele Perego and Salvatore Orlando. Assigning Document Identifiers to Enhance Compressibility of Web Search Engines Indexes,. In the proceedings of SAC,2004. [24] Changshang Zhou, Wei Ding, Na Yang, Double Indexing Mechanism of Search Engine based on Campus Net,Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06) Sarvesh Tendulkar and Swarada Adavadkar 7