Project on Data Analytics CIS 660 Sunnie S Chung
2 Person Group Project: 20%
Presentation of a project with related research papers: 10%

You can choose one of the following projects or create your own. You can change details of the project you choose from the list as needed.

For some of the projects on social network sites, you will need account approval from Twitter, Yelp, Facebook, or LinkedIn to register your project (app) as a developer before you can download data from the site. Check the Developer/App/Tool options on those sites for this process. Give the class project site as the App URL during registration. Alternatively, you can choose any set of web sites, system log files, or any other data that you can obtain to process for your project. Some of the available data sets are listed below.

For those who want to work with NoSQL systems on Hadoop, you may use any Hadoop-related apps/tools to create projects (see the CIS612 Project List for guides). More instructions for downloading and installing them will be given on request. However, this option is not recommended for those who have no experience with Hadoop or NoSQL systems. Please take CIS612 for that.

Submit a 1-2 page proposal on the project your group chooses by the Phase 1 deadline. The proposal should specify your data, major tasks, and the data analytics systems/tools you will use, and include a timeline.

Each group (2-person group) will give a 20-minute presentation on its project and the related research paper it chose (including the tasks and tools used for the project) during the last class sessions. Presentation scheduling will be done after the midterm. Groups presenting in the first session will get 5-10% extra credit (this is not applicable for any summer semester).
Project Specification

Phase 1: Planning
Plan your project by researching data sets and data mining algorithms/tools to create your data mining project. Submit a 1-2 page proposal per group.

Phase 2: Data Cleaning/Preprocessing/Transformation
Obtain your data and preprocess it. Create a data mining project with your data set using a data mining system or tools of your choice. For this project, you can use any data mining tools or any open source implementations of the data mining techniques covered in class, with any data set of your choice given below or any data that you obtain from the suggested links.

Phase 3: Implement/Apply Data Mining, Validate Your Results, and Presentation
Implement or run data mining algorithms to get results. Validate your results using the cross-validation tool available in the system you chose. Visualize your results and prepare your presentation.

See the deadline for each phase on the class webpage.
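The cross-validation step in Phase 3 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the tool of any particular data mining system; the majority-class "model", the function names, and the sample labels are all hypothetical stand-ins for whatever classifier and data your project uses.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def majority_class(train_labels):
    """Toy baseline model: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validate(labels, k=5, seed=0):
    """Return the mean held-out accuracy of the baseline over k folds."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        # Train only on records that are NOT in the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held]
        prediction = majority_class(train)
        correct = sum(1 for i in held_out if labels[i] == prediction)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Hypothetical labels: an imbalanced two-class data set.
labels = ["spam"] * 8 + ["ham"] * 2
mean_accuracy = cross_validate(labels, k=5)
```

Each record is held out exactly once, so the averaged accuracy estimates how the model behaves on data it was not trained on. In a real project you would replace `majority_class` with your trained classifier.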
Project List

You can create your own data mining project, or choose a project from the suggested project list below and papers from the suggested research topics and paper list here. You can also choose one of the topics and papers on the conference sites below or on the related resource sites listed here. You can change the details of the project as you wish.

Examples of Selective Current Research Topics in Big Data Analytics/Data Mining

1. Text Mining of Social Network Data: Twitter, Yelp, Facebook, LinkedIn, and more
Sentiment Analysis of Product Reviews
Social Network Data Analysis

One of the most common data analytics tasks is mining text data, which is unstructured or semi-structured. Common examples of such data are message logs from social media sites and system-generated log files. One way to mine such data is to transform the unstructured or semi-structured logging format into structured files to process. You can also create a database or collections from the transformed files to query for data mining. Such structured files could be tables in an RDBMS, key-value stores (in JSON format), CSV (comma-separated values), TSV (tab-separated values), or a document collection for common NoSQL systems like MongoDB, Hive, or Cassandra in HDFS. You can use HBase or Pig as well.

There are useful open source tools such as Tweepy, FacePager, and Flume, among others. They can be used to download a stream of data from the Twitter/Facebook site to your system or to any HDFS system.

Once you transform your text data into a structured file, you can apply any data mining tools/algorithms to the transformed data for classification (Decision Tree, PEBLS, Neural Network, SVM) or clustering.

Text data you can download from:
Twitter
Yelp
Facebook
LinkedIn
You can also download any web pages or any data sets (see the Resource List below).

One available data set, used for the examples in this section, contains the text of 219 State of the Union addresses of U.S. Presidents between 1790 and 2006.
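The transformation of semi-structured log text into a structured file, described in topic 1 above, can be sketched as follows. This is a minimal example that assumes the input follows the Common Log Format used by many web servers; the sample lines, host names, and function names are hypothetical, and a real log may need a different pattern.

```python
import csv
import io
import re

# Common Log Format: host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)
FIELDS = ["host", "timestamp", "method", "path", "status", "size"]

def parse_line(line):
    """Turn one log line into a dict of fields, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    parts = row.pop("request").split()
    row["method"] = parts[0] if parts else ""
    row["path"] = parts[1] if len(parts) > 1 else ""
    row["status"] = int(row["status"])
    # A "-" in the size field means no body was sent; store it as 0.
    row["size"] = int(row["size"]) if row["size"].isdigit() else 0
    return row

def logs_to_csv(lines):
    """Write every parseable line as one CSV row; skip malformed lines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()

# Hypothetical sample input.
SAMPLE = [
    'host1.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'host2.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /images/logo.gif HTTP/1.0" 304 -',
    'this line is malformed and is skipped',
]
csv_text = logs_to_csv(SAMPLE)
```

The resulting CSV can be loaded into an RDBMS table or passed directly to a data mining tool. Writing the same dictionaries with `json.dumps` instead would give the key-value (JSON) form.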
See the project guide below for more detail: an example of a Text Mining project with R.

Twitter Data Analysis

Any public Facebook sites: New York Times, Washington Post, Boston Tribune

Facebook data transformation into one of the following platforms:
- Tables in an RDBMS (MS SQL Server or any database server with Java/JDBC)
- Key-value stores (JSON file format), CSV (comma-separated values), or TSV (tab-separated values)

Twitter message data transformation into one of the following platforms:
- Tables in an RDBMS (MS SQL Server or any database server with Java/JDBC)
- Key-value stores in JSON file format, CSV (comma-separated values), or TSV (tab-separated values)

Related paper: The Unified Logging Infrastructure for Data Analytics at Twitter. George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy. Twitter, Inc.

Webpage or Document Processing for Text Analysis
Document Clustering, Phrase Search
Generate Word2Vec vectors for each word in a Wikipedia or webpage collection, and Paragraph2vec vectors for each document, to do similarity search for document (webpage) clustering or sentiment analysis. See the Natural Language Processing in Unstructured Text Mining section of the Class Lecture Notes for the details, tutorial sites, papers, and data sets.

Text Mining (Sentiment Analysis) with SVM using the Yelp Review Data Set or movie reviews from the Rotten Tomatoes site. Implement the sentiment analysis described in the papers below. (See me for more guidance on this.)

Review Data Sources for Sentiment Analysis
Amazon Product Review Data:
Movie Review Data
Yelp Data Set
Question Answering System
Question Answering Data on Amazon Product Reviews

Papers: Some related papers to start:
User-Level Sentiment Analysis Incorporating Social Networks in Twitter (Yahoo)
Good Research Project on Sentiment Analysis Sites:
Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach

Text Mining Using MS Analysis Service with Association Rule Mining
See the detailed guides in the Project Section on the class webpage or the following links.

This section examines two particularly interesting data flow transformations that facilitate text mining: Term Extraction and Term Lookup. SQL Server Data Mining supports the TEXT data type, but that data type is not enough to perform meaningful text analysis. From the algorithm's perspective, columns having the TEXT data type are treated just like discrete columns that have the LONG data type: as a collection of various distinct states, without any way to directly access the content of a text value.
To perform text mining with SQL Server Data Mining, you must first bring the text into a form that can be consumed by the algorithms. The solution included in the product is to represent each piece of text as a collection of words and phrases, and perform data mining based on the occurrence of certain key words and phrases inside a document (and possibly some frequency-related scores). A document is therefore modeled much like a shopping basket that contains (or does not contain) certain items, which happen to be key words and phrases.

After each document is represented as a collection of key phrases, you can perform data mining using one of the following model types:
- Classification models that use the key words and phrases nested table as input to predict the class of a document
- Clustering models that find similar documents based on common occurrences
- Association models that detect cross-correlations between key words and phrases

The steps are:
1. Build a dictionary of key words and phrases over a collection of representative documents. This task is usually accomplished using the Term Extraction transformation.
2. Based on the dictionary, extract the list of significant key words and phrases for each document to be analyzed. This task is usually accomplished using the Term Lookup transformation.
3. Train mining models on top of the transformed data.

NOTE
More data sources for text mining:
- State of the Union
- Any electronic books available on the web
- About 500 webpages on the Wikipedia site
- Fortune 500 Company
- Any newspaper or magazine site

Instead of using the MS Data Tool, you can build the Document Frequency and Inverted Index described in the Lecture Notes on Information Retrieval to compute Term Frequency and Document Frequency for cosine similarity. The lecture notes show how cosine similarity is adopted as vector space scoring for document ranking.
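The vector space scoring mentioned above can be sketched as follows. This is a minimal illustration that assumes one common weighting variant, log-scaled term frequency multiplied by inverse document frequency; the lecture notes may use a different variant, and the documents, query, and function names here are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_idf(documents):
    """idf(t) = log10(N / df(t)), where df counts documents containing t."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Weight each known term by (1 + log10(tf)) * idf."""
    counts = Counter(tokenize(text))
    return {t: (1 + math.log10(c)) * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    idf = build_idf(documents)
    q = tfidf_vector(query, idf)
    scores = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(documents)]
    return sorted(scores, reverse=True)

# Hypothetical three-document collection.
documents = [
    "the union is strong",
    "data mining finds patterns in data",
    "text mining is data mining on text",
]
ranking = rank("text mining", documents)
```

Dividing by the vector lengths is the cosine normalization step: it keeps long documents from scoring higher simply because they contain more words.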
The part that was not done in Lab 2 (I did not ask for it in Lab 2) is building the weight matrix by calculating weighted scores based on tf-idf, as described in the lecture notes. You can then calculate the cosine similarity between the documents and the keywords using the tf-idf weights you built. At the end of the lecture notes, there are variations of the scoring matrix to optimize, including cosine normalization. You can use any electronic books on the web or more than 500 webpages on the web.

2. Fraud Detection or Intrusion Detection using Data Mining
Intrusion Detection
- Process system log files to build a database to query
- Transform log files from any system into a CSV file or a table, then apply any data mining techniques for anomaly detection with classification (e.g., SVM), clustering (K-Means), etc.

Two data sets are available on request:
- NASA web server log file (old data set from 1990). See the example project guide for details on getting the NASA HTTP access logs.
- Wireless network log file (new data set from 2015)
For the data set and papers, see the Anomaly Detection section in the Class Lecture Notes on the class webpage.
Related paper: Networks-Empirical-Evaluation-of-Threats.pdf

3. Recommendation System
o Item-to-Item Collaborative Filtering in Recommendation System
o Implement data transformation (binarization of basket item sets) to apply the SVM data mining algorithm.
Data Source:
Related papers from Amazon Recommendation System:
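The binarization of basket item sets mentioned under the Recommendation System topic can be sketched as follows. This is a minimal example with hypothetical baskets and a hypothetical function name. Each basket becomes one row of 0/1 values, one column per distinct item, which is the fixed-width numeric input that algorithms such as SVM expect.

```python
def binarize_baskets(baskets):
    """Return (sorted item list, 0/1 matrix with one row per basket)."""
    items = sorted({item for basket in baskets for item in basket})
    matrix = []
    for basket in baskets:
        present = set(basket)
        # 1 if the basket contains the item, 0 otherwise.
        matrix.append([1 if item in present else 0 for item in items])
    return items, matrix

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["milk", "eggs", "bread"],
    ["eggs"],
]
items, matrix = binarize_baskets(baskets)
```

The same representation applies to text mining when each document is treated as a basket of key words and phrases: the columns are then dictionary terms instead of products.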
4. IBM Research Project: Building Data Analytic Artificial Intelligence: IBM Watson DeepQA Project

5. Crime Forecasting Using Clustering Techniques
NIJ (National Institute of Justice) Crime Forecasting Challenges and data set

6. Image Data Analytics
Deep Learning for Image Recognition
See the Image Recognition section at the end of the Class Lecture Notes for the details, tutorial sites, papers, and data sets.
Face Recognition Research
Image Data Processing Tutorial Sites:
Data Source:
Related Research Papers:
- ImageNet Classification with Deep Convolutional Neural Networks
- Going Deeper with Convolutions
Data Source: IMDB, Instagram
Web Scraping with XPath in Python
E.md tics.py
Other Related References:
Image Data Sources:
Suggested Data Sources

The suggested public social media sites and known data collection sites for data analytics are listed below with related industry research papers. You can deploy your big data infrastructure on the cloud.

Data transformation into one of the HDFS-based NoSQL systems, or into both an HDFS platform and an RDBMS:
1-1) XML, key-value stores, or JSON files in a document collection for MongoDB or Cassandra; or CSV (comma-separated values) or TSV (tab-separated values) in Hive, Pig Latin, or VoltDB in HDFS
1-2) Big Table in HBase in HDFS
1-3) RDD in Spark in HDFS to use pipelining
1-4) Tables in an RDBMS (MS SQL Server in Data Integration Service/Data Analysis Service using LINQ, or any RDBMS database server)

1. LinkedIn
Related papers to read:
- Avatara: OLAP for Webscale Analytics Products. Lili Wu, Roshan Sumbaly, Chris Riccomini, Gordon Koo, Hyung Jin Kim, Jay Kreps, Sam Shah. LinkedIn
- The Big Data Ecosystem at LinkedIn. Roshan Sumbaly, Jay Kreps, and Sam Shah. LinkedIn

2. Any well-known newspaper or magazine sites on Facebook
Related papers to read:
- Petabyte Scale Databases and Storage Systems Deployed at Facebook. Dhruba Borthakur
- Data Warehousing and Analytics Infrastructure at Facebook, in SIGMOD 2010, by Ashish Thusoo (Facebook), et al.
3. Twitter message data transformation
Related papers to read (more will be given):
- The Unified Logging Infrastructure for Data Analytics at Twitter. George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy. Twitter, Inc.
- Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture. Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin. Twitter, Inc.

Yelp Data Challenge: Business Data set

6. Transform log files from any system into one of the platforms above
Related papers to read: will be given

7. Webpage or Document Processing for Text Analysis
Document Clustering, Phrase Search
See the Natural Language Processing in Unstructured Text Mining section of the Class Lecture Notes for the details, tutorial sites, papers, and data sets. Alternatively, download all the webpages in one domain of any well-known public news site of your choice and extract only the text body using an XPath library in any web browser or language. You can also download preprocessed Wikipedia texts (in XML) here
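Extracting only the text body with XPath, as suggested in item 7, can be sketched as follows. This minimal example uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` and assumes the page is well-formed XML. The sample page, its `id` values, and the function name are hypothetical. Real-world HTML is often not well-formed, in which case a dedicated HTML parser with XPath support (for example the third-party lxml package) would be needed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page: navigation plus an article body.
PAGE = """<html>
  <head><title>Sample news page</title></head>
  <body>
    <div id="nav"><p>Home | Sports | Weather</p></div>
    <div id="article">
      <h1>Headline</h1>
      <p>First paragraph of the story.</p>
      <p>Second paragraph with <b>bold</b> text.</p>
    </div>
  </body>
</html>"""

def extract_body_text(xml_text, xpath=".//div[@id='article']/p"):
    """Return the text of every element matched by the XPath expression."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for element in root.findall(xpath):
        # itertext() also collects text inside nested tags such as <b>.
        text = "".join(element.itertext())
        paragraphs.append(" ".join(text.split()))
    return paragraphs

body = extract_body_text(PAGE)
```

The XPath expression selects only paragraphs inside the article container, so navigation text is left out. The expression has to be adapted to the markup of the site you download.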
Arxiv research paper repository to download

You can download the data used for the examples in this section. This download contains the text of 219 State of the Union addresses of U.S. Presidents between 1790 and 2006.

IMDB Movie Review Collection for Sentiment Analysis:

9. WordNet

You can build the Document Frequency and Inverted Index described in the Lecture Notes on Information Retrieval to compute any IR-related metric in an algorithm or to apply an association rule mining algorithm. The lecture notes show how cosine similarity is adopted as vector space scoring for document ranking. One example is building the weight matrix by calculating weighted scores based on tf-idf, as described in the lecture notes. You can then calculate the cosine similarity between the documents and the keywords using the tf-idf weights you built. At the end of the lecture notes, there are variations of the scoring matrix to optimize; cosine normalization is one of them.

8. Transform any electronic books or online documents for text processing analysis
Any electronic book online. See item 7, Webpage Processing, above for processing.

9. Text Mining with the Data Source in 7 for Association Rule Mining Using MS Analysis Service
See the detailed guides in the Project Section on the class webpage or the following links.

This section examines two particularly interesting data flow transformations that facilitate text mining: Term Extraction and Term Lookup. SQL Server Data Mining supports the TEXT data type, but that data type is not enough to perform meaningful text analysis. From the algorithm's perspective, columns having the TEXT data type are treated just like discrete columns that have the LONG data type: as a collection of various distinct states, without any way to directly access the content of a text value.

To perform text mining with SQL Server Data Mining, you must first bring the text into a form that can be consumed by the algorithms. The solution included in the product is to represent each piece of text as a collection of words and phrases, and perform data mining based on the occurrence of certain key words and phrases inside a document (and possibly some frequency-related scores). A document is therefore modeled much like a shopping basket that contains (or does not contain) certain items, which happen to be key words and phrases.

After each document is represented as a collection of key phrases, you can perform data mining using one of the following model types:
- Classification models that use the key words and phrases nested table as input to predict the class of a document
- Clustering models that find similar documents based on common occurrences
- Association models that detect cross-correlations between key words and phrases

The steps are:
1. Build a dictionary of key words and phrases over a collection of representative documents. This task is usually accomplished using the Term Extraction transformation.
2. Based on the dictionary, extract the list of significant key words and phrases for each document to be analyzed. This task is usually accomplished using the Term Lookup transformation.
3. Train mining models on top of the transformed data.

Data source for text mining: you can use any electronic books on the web or more than 500 webpages on the web.

10. Image Data Analytics
Deep Learning for Image Recognition
See the Deep Learning for Image Recognition section at the end of the Class Lecture Notes for the details, tutorial sites, papers, and data sets.
Data Sets: ImageNet

11. Building Social Network Graph into a store
Facebook Friends Social Network (Graph API) data transformation
Related papers to read: will be given

12. Implement any data mining metric you learned in class with a cube and dimensions using Microsoft DW. Create dimensions with a set of attributes and define a measure in terms of similarity, distance, or correlation between any two records in the vtargetmail data set for clustering.

13. Minority Class Detection with Decision Tree with adapted measure and weight
You can implement your own metric, as specified in the paper below, that can be used in a decision tree algorithm, and test it with the Adventure Data Set.
Paper: A Robust Decision Tree Algorithm for Imbalanced Data Sets

14. Information Retrieval for finding the most related documents with keywords, using any set of webpages or Wikipedia webpages.

15. Any Data Mining Project using Data Warehouse/OLAP with MDX and DMX
See the DW Tutorial and the MDX, DMX Tutorial in the Lab3 section for this.

16. Building Social Network Graph into a store
Facebook Friends Social Network (Graph API) data transformation into one of the following platforms:
- Tables in an RDBMS (MS SQL Server or any database server with Java/JDBC)
- Key-value stores in JSON file format, CSV (comma-separated values), or TSV (tab-separated values)
Process the JSON file into a table or CSV files with a user id column and edge columns, then apply data mining queries.

17. Any GIS data mining

18. Any papers on one of the following topics:
- Stream data mining using Spark
- Sequential pattern mining, sequence classification and clustering
- Time-series analysis, regression and trend analysis
- Biological sequence analysis and biological data mining
- Graph pattern mining, graph classification and clustering
- Social network analysis
- Information network analysis
- Spatial, spatiotemporal and moving object data mining
- Multimedia data mining
- Mining computer systems and sensor networks
- Mining software programs
- Statistical data mining methods

Other Useful Data Sources:
Other Related Sites:

Useful Resources
R and Weka each provide a collection of machine learning and data mining algorithms for data mining tasks. Weka's algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
R Programming:
SQL Server Analysis Services (SSAS) Data Tools:
You can use R in SQL Server 2016 or a standalone R Server
R Hadoop System:
Weka:

Good Conference Sites to Search:
KDD
Top research data mining conferences: KDD, IEEE ICDE, IEEE ICDM, CIKM, and SIAM SDM.
ACM SIGMOD:
VLDB (IEEE):
ICDE (IEEE)

Cyber Security:
- Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition (Mahmood Sharif, Carnegie Mellon University, at SIGMOD 2016)
- AmpPot: Monitoring and Defending Against Amplification DDoS Attacks
- A Privacy Protection Technique for Publishing Data Mining Models and Research Data
- Privacy-Preserving Data Mining through Knowledge Model Sharing
- IMR based Anonymization for Privacy Preservation in Data Mining
- Hiding a Needle in a Haystack: Privacy Preserving Apriori Algorithm in MapReduce Framework

Artificial Intelligence and Machine Learning:
o Deep Face Recognition by Omkar M Parkhi

Some Research Resources (will be updated)

Major conference proceedings that will be used:
1. DM conferences: ACM SIGKDD (KDD), ICDM (IEEE, Int. Conf. Data Mining), SDM (SIAM Data Mining), PKDD (Principles KDD)/ECML, PAKDD (Pacific-Asia)
2. DB conferences: ACM SIGMOD, VLDB, ICDE
3. ML conferences: NIPS, ICML
4. IR conferences: SIGIR, CIKM
5. Web conferences: WWW, WSDM
6. Other related conferences and journals
7. IEEE TKDE, ACM TKDD, DMKD, ML

Recommended Reference Books
1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
2. S. Chakrabarti, Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data, Morgan Kaufmann.
3. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag.
4. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer.
5. D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge Univ. Press.
6. M. Newman, Networks: An Introduction, Oxford Univ. Press, 2010.
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationOracle Big Data Fundamentals Ed 1
Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data
More informationFall Principles of Knowledge Discovery in Databases. University of Alberta
Principles of Knowledge Discovery in Databases Fall 1999 Dr. Osmar R. Zaïane 2 1 Class and Office Hours Class: Mondays, Wednesdays and Fridays from 10:00 to 10:50 Office Hours: Tuesdays from 11:00 to 11:55
More informationCS 412 Intro. to Data Mining
CS 412 Intro. to Data Mining Chapter 1. Introduction Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 August 28, 2017 Data Mining: Concepts and Techniques 2 August 28, 2017 Data
More informationOracle Big Data Fundamentals Ed 2
Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies
More informationAn Introduction to Apache Spark
An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations
More informationData Mining: Dynamic Past and Promising Future
SDM@10 Anniversary Panel: Data Mining: A Decade of Progress and Future Outlook Data Mining: Dynamic Past and Promising Future Jiawei Han Department of Computer Science University of Illinois at Urbana
More informationBig Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka
Course Curriculum: Your 10 Module Learning Plan Big Data and Hadoop About Edureka Edureka is a leading e-learning platform providing live instructor-led interactive online training. We cater to professionals
More informationOracle Big Data Science
Oracle Big Data Science Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com @VlamisSoftware Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationHadoop Development Introduction
Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand
More informationR Language for the SQL Server DBA
R Language for the SQL Server DBA Beginning with R Ing. Eduardo Castro, PhD, Principal Data Analyst Architect, LP Consulting Moderated By: Jose Rolando Guay Paz Thank You microsoft.com idera.com attunity.com
More informationData Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei
Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional
More informationDATA MINING II - 1DL460. Spring 2014"
DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationCluster Computing Architecture. Intel Labs
Intel Labs Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED
More information4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)
4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,
More informationBest practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP
Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationCS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University
CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More informationEffective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar
Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours
More informationHadoop Online Training
Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the
More informationINTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING
CS 7265 BIG DATA ANALYTICS INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, PhD Computer Science,
More informationOverview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer
Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What
More informationThe 2018 (14th) International Conference on Data Science (ICDATA)
CALL FOR PAPERS LATE BREAKING PAPERS, POSITION PAPERS, ABSTRACTS, POSTERS Paper Submission Deadline: May 20, 2018 The 2018 (14th) International Conference on Data Science (ICDATA) (former International
More informationBased on Big Data: Hype or Hallelujah? by Elena Baralis
Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of
More informationContents PART I: CLOUD, BIG DATA, AND COGNITIVE COMPUTING 1
Preface xiii PART I: CLOUD, BIG DATA, AND COGNITIVE COMPUTING 1 1 Princi ples of Cloud Computing Systems 3 1.1 Elastic Cloud Systems for Scalable Computing 3 1.1.1 Enabling Technologies for Cloud Computing
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationElection Analysis and Prediction Using Big Data Analytics
Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationSURVEY ON STUDENT INFORMATION ANALYSIS
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationBIG DATA TESTING: A UNIFIED VIEW
http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationData Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Slides for Textbook Chapter 1 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser University, Canada
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING 1: Introduction Instructor: Yizhou Sun yzsun@cs.ucla.edu (Instructor for Today s class: Ting Chen) April 9, 2017 Course Information Course homepage: http://web.cs.ucla.edu/~yzsun/classes/2017spr
More informationACHIEVEMENTS FROM TRAINING
LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM
More informationCS-490WIR Web Information Retrieval and Management. Luo Si
CS490W: Web Information Retrieval & Management CS-490WIR Web Information Retrieval and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationProcessing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.
Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most
More informationWeb Mining TEAM 8. Professor Anita Wasilewska CSE 634 Data Mining
Web Mining TEAM 8 Paper - You Are What You Tweet : Analyzing Twitter for Public Health Authors : Paul, Michael J., and Mark Dredze. Conference : AAAI Publications, Fifth International AAAI Conference on
More informationBig Data Hadoop Stack
Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware
More information1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda
Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:
More informationOracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA
Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationA data-driven framework for archiving and exploring social media data
A data-driven framework for archiving and exploring social media data Qunying Huang and Chen Xu Yongqi An, 20599957 Oct 18, 2016 Introduction Social media applications are widely deployed in various platforms
More informationCSE 444: Database Internals. Lecture 23 Spark
CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei
More informationWebinar Series TMIP VISION
Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing
More informationOracle Big Data Science IOUG Collaborate 16
Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationIntroduction to Information Retrieval. Hongning Wang
Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an
More informationHadoop course content
course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail
More information745: Advanced Database Systems
745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.
More information