Project on Data Analytics CIS 660 Sunnie S Chung


2-Person Group Project: 20%
Presentation of a project with related research papers: 10%

You can choose one of the following projects or create your own. You may change details of the project you choose from the list as needed. For some of the projects on social network sites, you will need to get account approval from Twitter, Yelp, Facebook, or LinkedIn to register your project (app) as a developer before you can download data from the site. Check the Developer/App/Tool options on those sites for this process, and give the class project site as the App URL.

Alternatively, you can choose any set of websites, system log files, or any other data that you can obtain to process for your project. Some of the available data sets are listed below.

Those who want to work with NoSQL systems on Hadoop may use any Hadoop-related apps/tools to create their projects (see the CIS612 Project List for guides). More instructions for downloading and installing them will be given on request. However, this option is not recommended for those who have no experience with Hadoop or NoSQL systems; please take CIS612 for that.

By the Phase 1 deadline, submit a 1-2 page proposal on the project your group chooses, specifying your data, major tasks, the data analytic systems/tools to use, and a timeline. Each group (2-person group) will give a 20-minute presentation on its project and the related research paper it chose (including the tasks and tools used for the project) during the last class sessions. Presentation scheduling will be done after the midterm. Groups presenting in the first session will get 5-10% extra credit (not applicable to summer semesters).

Project Specification

Phase 1: Planning
Plan your project by researching data sets and data mining algorithms/tools to create your data mining project. Submit a 1-2 page proposal per group.

Phase 2: Data Cleaning/Preprocessing/Transformation
Obtain your data and preprocess it. Create a data mining project with your data set using a data mining system or tools of your choice. For this project, you can use any data mining tools or any open-source implementations of the data mining techniques covered in class, with any data set given below or any data that you obtain from the suggested links.

Phase 3: Implement/Apply Data Mining, Validate Your Results, and Presentation
Implement/perform data mining algorithms to get results. Validate your results using the cross-validation tool available in your chosen system. Visualize your results and prepare your presentation.

See the deadline for each phase on the class webpage.
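Phase 3 asks you to validate results with the cross-validation tool in your chosen system. To make concrete what that tool does, here is a minimal k-fold cross-validation sketch in Python. It is an illustration only: the 1-nearest-neighbor classifier, the toy two-cluster data, and the names `k_fold_indices` and `cross_validate` are assumptions, and in practice you would use the built-in cross-validation in Weka, R, or SSAS.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbor_predict(train, x):
    """1-NN with squared Euclidean distance (a stand-in for any classifier)."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]


def cross_validate(data, k=5, seed=0):
    """Return per-fold accuracies; each record is held out exactly once."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Toy data (hypothetical): two well-separated 2-D clusters labelled "a" and "b".
data = (
    [((i * 0.1, i * 0.2), "a") for i in range(10)]
    + [((10 + i * 0.1, 10 - i * 0.2), "b") for i in range(10)]
)
print(cross_validate(data, k=5))
```

The mean of the per-fold accuracies estimates accuracy on unseen data; replace `nearest_neighbor_predict` with the classifier you are actually validating.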

Project List

You can create your own data mining project, or you can choose a project from the suggested project list below and papers from the suggested research topics and paper list here. You can also choose one of the topics and papers on the conference sites below or on the related resource sites listed here. You can change the details of the project as you wish.

Examples of Selected Current Research Topics in Big Data Analytics/Data Mining

1. Text Mining of Social Network Data: Twitter, Yelp, Facebook, LinkedIn, and more
o Sentiment Analysis of Product Reviews
o Social Network Data Analysis

One of the most common data analytics tasks is mining text data, which is unstructured or semi-structured. Common examples of such data are message logs from social media sites and system-generated log files. One way to mine such data is to transform the unstructured/semi-structured logging format into structured files for processing. You can also create a database or collections from the transformed files to query for data mining. Such structured files could be tables in an RDBMS, key-value stores (in JSON format), CSV (comma-separated values), TSV (tab-separated values), or a document collection for common NoSQL systems such as MongoDB, Hive, or Cassandra in HDFS. You can use HBase or Pig as well.

There are useful open-source tools such as Tweepy, FacePager, and Flume, among others. They can be used to download a stream of data from Twitter/Facebook to your system or to any HDFS system. Once you transform your text data into a structured file, you can apply any data mining tool or algorithm to the transformed data for classification (Decision Tree, PEBLS, Neural Network, SVM) or clustering.

Text data you can download from: Twitter, Yelp, Facebook, LinkedIn. You can also download any web pages or any data sets.
(See the resource list below.) One available data set, used for the examples in this section, contains the text of 219 State of the Union addresses by U.S. Presidents between 1790 and 2006.
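As a hedged sketch of the transformation step described above (semi-structured messages into a structured CSV file), the Python below reads JSON Lines input, flattens each record to fixed columns, skips malformed lines, and writes CSV. The field names (`id`, `user.screen_name`, `created_at`, `text`) and the sample records are assumed for illustration and are not a guaranteed Twitter/Facebook API schema; adapt them to what your download tool actually emits.

```python
import csv
import io
import json

# Hypothetical sample: one JSON object per line, loosely modeled on a
# tweet-like message record. Field names are assumptions for illustration.
RAW_LOG = """\
{"id": 1, "user": {"screen_name": "alice"}, "created_at": "2016-03-01", "text": "Loving the new phone!"}
{"id": 2, "user": {"screen_name": "bob"}, "created_at": "2016-03-02", "text": "Battery life is bad,\\nreally bad"}
not a json line
"""

FIELDS = ["id", "screen_name", "created_at", "text"]


def flatten(record):
    """Map one nested message record to a flat row with fixed columns."""
    return {
        "id": record.get("id"),
        "screen_name": record.get("user", {}).get("screen_name"),
        "created_at": record.get("created_at"),
        # Collapse whitespace (including embedded newlines) in the message body.
        "text": " ".join(record.get("text", "").split()),
    }


def json_lines_to_csv(lines, out):
    """Parse JSON Lines, skip malformed lines, write CSV rows to `out`.

    Returns the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # semi-structured logs often contain junk lines
        writer.writerow(flatten(record))
        written += 1
    return written


buffer = io.StringIO()
count = json_lines_to_csv(RAW_LOG.splitlines(), buffer)
print(count)
print(buffer.getvalue())
```

The same flattened rows could instead be inserted into an RDBMS table or written as TSV by passing `delimiter="\t"` to `csv.DictWriter`.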

See the project guide below for more detail. An example project for text mining with R: Twitter Data Analysis.

o Any public Facebook sites: New York Times, Washington Post, Boston Tribune
o Facebook data transformation into one of the following platforms: tables in an RDBMS (MS SQL Server or any database server with Java/JDBC), key-value stores (JSON file format), CSV (comma-separated values), or TSV (tab-separated values)
o Twitter message data transformation into one of the following platforms: tables in an RDBMS (MS SQL Server or any database server with Java/JDBC), key-value stores in JSON file format, CSV, or TSV
  Related paper: The Unified Logging Infrastructure for Data Analytics at Twitter. George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy (Twitter, Inc.)
o Webpage or document processing for text analysis: document clustering, phrase search
o Generating Word2Vec vectors for each word in Wikipedia or a webpage collection, and generating Paragraph2vec vectors for each document, to do similarity search for document (webpage) clustering or sentiment analysis. See the Natural Language Processing in Unstructured Text Mining section of the class lecture notes for details, tutorial sites, papers, and data sets.
o Text mining (sentiment analysis) with SVM using the Yelp review data set or movie reviews from the Rotten Tomatoes site. Implement the sentiment analysis in the papers below. (See me for more guidance on this.)

Review Data Sources for Sentiment Analysis: Amazon Product Review Data; Movie Review Data
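The SVM sentiment project above can start from a very small baseline. The sketch below is not the method of the referenced papers; it is a minimal linear SVM trained with the Pegasos stochastic sub-gradient algorithm on binary bag-of-words features, written in plain Python so each step is visible. The toy reviews, the whitespace tokenizer, and the hyperparameters `lam` and `epochs` are assumptions; for real Yelp or movie review data, an SVM implementation in Weka, R, or a similar library is the more practical choice.

```python
import random


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def features(text):
    """Binary bag-of-words features as a sparse dict."""
    return {token: 1.0 for token in tokenize(text)}


def train_pegasos(examples, lam=0.1, epochs=50, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos.

    Labels must be +1 or -1. There is no bias term."""
    rng = random.Random(seed)
    data = [(features(text), label) for text, label in examples]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is measured with the weights before this step's update.
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Shrink all weights: sub-gradient of the L2 regularizer.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss step only when the example violates the margin.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (positive) or -1 (negative); unseen words contribute 0."""
    score = sum(w.get(f, 0.0) * v for f, v in features(text).items())
    return 1 if score >= 0 else -1


# Hypothetical toy reviews: +1 = positive, -1 = negative.
reviews = [
    ("great movie loved it", 1),
    ("wonderful acting great plot", 1),
    ("loved the wonderful story", 1),
    ("terrible movie hated it", -1),
    ("boring acting awful plot", -1),
    ("hated the boring story", -1),
]
weights = train_pegasos(reviews)
print(predict(weights, "great wonderful loved"), predict(weights, "terrible boring awful"))
```

Omitting the bias keeps the update identical to basic Pegasos; appending a constant feature to every document is a common way to add one.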

Yelp Data Set
Question Answering System: Question Answering Data on Amazon Product Reviews

Papers: some related papers to start with:
o User-Level Sentiment Analysis Incorporating Social Networks, in Twitter (Yahoo)
o Good research project on sentiment analysis (sites):
o Topic Sentiment Analysis in Twitter: A Graph-Based Hashtag Sentiment Classification Approach

Text Mining Using MS Analysis Services with Association Rule Mining
See the detailed guides in the Project section on the class webpage or the following links.

This section examines two particularly interesting data flow transformations that facilitate text mining: Term Extraction and Term Lookup. SQL Server Data Mining supports the TEXT data type, but that data type is not enough to perform meaningful text analysis. From the algorithm's perspective, columns of the TEXT data type are treated just like discrete columns of the LONG data type: as a collection of distinct states, with no way to directly access the content of a text value.

To perform text mining with SQL Server Data Mining, you must first bring the text into a form that the algorithms can consume. The solution included in the product is to represent each piece of text as a collection of words and phrases, and to perform data mining based on the occurrence of certain key words and phrases inside a document (and possibly some frequency-related scores). A document is therefore modeled much like a shopping basket that contains (or does not contain) certain items, which happen to be key words and phrases. After each document is represented as a collection of key phrases, you can perform data mining using one of the following model types:
o Classification models that use the nested table of key words and phrases as input to predict the class of a document
o Clustering models that find similar documents based on common occurrences
o Association models that detect cross-correlations between key words and phrases

The steps are:
1. Build a dictionary of key words and phrases over a collection of representative documents. This task is usually accomplished using the Term Extraction transformation.
2. Based on the dictionary, extract the list of significant key words and phrases for each document to be analyzed. This task is usually accomplished using the Term Lookup transformation.
3. Train mining models on top of the transformed data.

NOTE: More data sources for text mining:
o State of the Union addresses
o Any electronic books available on the web
o About 500 webpages on the Wikipedia site
o Fortune 500 companies
o Any newspaper or magazine site

Instead of using the MS Data Tools, you can build the document frequency and inverted index described in the Lecture Notes on Information Retrieval to compute term frequency and document frequency for cosine similarity. The lecture notes show how cosine similarity is adopted as vector space scoring for document ranking.
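A minimal Python sketch of the inverted index, tf-idf weighting, and cosine scoring mentioned above. It assumes the common weighting (1 + log10 tf) * log10(N / df) for both documents and the query; the variant in the lecture notes may differ, and the function names and toy documents are illustrative.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} (a minimal inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tfidf_vectors(docs, index):
    """Weight each term in each document as (1 + log10 tf) * log10(N / df)."""
    n = len(docs)
    vectors = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log10(n / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = (1 + math.log10(tf)) * idf
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Rank documents by cosine similarity to a keyword query."""
    index = build_inverted_index(docs)
    vectors = tfidf_vectors(docs, index)
    n = len(docs)
    # Query weights use the collection's idf; terms not in the index are dropped.
    q = {}
    for term, tf in Counter(query.lower().split()).items():
        if term in index:
            q[term] = (1 + math.log10(tf)) * math.log10(n / len(index[term]))
    scores = {doc_id: cosine(q, vectors.get(doc_id, {})) for doc_id in docs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "the union address of the president",
    "d3": "text mining and sentiment analysis",
}
for doc_id, score in rank("text mining", docs):
    print(doc_id, round(score, 3))
```

Dividing by the vector lengths (cosine normalization) keeps long documents from scoring higher just because they contain more terms.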
One step not done in Lab 2 (I did not ask for it in Lab 2) is building a weight matrix by calculating a weighted score based on tf-idf, as described in the lecture notes. You can then calculate the cosine similarity between documents and the keyword query using the tf-idf weight scores you built. At the end of the lecture notes, there are variations of the scoring matrix to optimize, including cosine normalization. You can use any electronic books on the web or more than 500 webpages.

2. Fraud Detection or Intrusion Detection Using Data Mining
Intrusion Detection:
o Process system logging files to build a database to query.
o Transform log files from any system into a CSV file or a table to apply data mining techniques for anomaly detection with classification (e.g., SVM), clustering (K-Means), etc.
Two data sets are available on request:
o NASA web server log file (old data set from 1990). See the example project guide for details on obtaining the NASA HTTP access logs.
o Wireless network log file (new data set from 2015)
For the data sets and papers, see the Anomaly Detection section in the class lecture notes on the class webpage.
Related paper: Networks-Empirical-Evaluation-of-Threats.pdf

3. Recommendation System
o Item-to-item collaborative filtering in recommendation systems
o Implement data transformation (binarization of basket item sets) to apply the SVM data mining algorithm.
Data Source:
Related papers from the Amazon recommendation system:
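For the recommendation project, here is a minimal sketch of item-to-item collaborative filtering on binarized baskets. It assumes cosine similarity between binary item vectors (customers as dimensions) and scores each unseen item by its summed similarity to the items the customer already has; the baskets and function names are hypothetical. The handout's SVM step is not shown: this sketch uses the binarized data for item similarity instead.

```python
import math
from collections import defaultdict

# Hypothetical purchase baskets (customer -> items bought).
BASKETS = {
    "c1": ["book", "lamp", "pen"],
    "c2": ["book", "pen"],
    "c3": ["lamp", "desk"],
    "c4": ["book", "lamp"],
    "c5": ["desk"],
}


def binarize(baskets):
    """Binarize baskets: item -> set of customers who bought it (1 = bought)."""
    item_customers = defaultdict(set)
    for customer, items in baskets.items():
        for item in set(items):
            item_customers[item].add(customer)
    return item_customers


def item_similarity(a, b, item_customers):
    """Cosine similarity between two binary item vectors over customers."""
    ca, cb = item_customers.get(a, set()), item_customers.get(b, set())
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / math.sqrt(len(ca) * len(cb))


def recommend(customer, baskets, top_n=2):
    """Rank items the customer lacks by summed similarity to owned items."""
    item_customers = binarize(baskets)
    owned = set(baskets[customer])
    scores = {}
    for candidate in item_customers:
        if candidate in owned:
            continue
        scores[candidate] = sum(
            item_similarity(candidate, o, item_customers) for o in owned
        )
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("c2", BASKETS))
```

Because item-item similarities depend only on the baskets, a production system would typically precompute them offline and look them up at recommendation time.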

4. IBM Research Project: Building Data Analytic Artificial Intelligence
IBM Watson DeepQA Project

5. Crime Forecasting Using Clustering Techniques
NIJ (National Institute of Justice) Crime Forecasting Challenge and data set

6. Image Data Analytics
Deep Learning for Image Recognition: see the Image Recognition section at the end of the class lecture notes for details, tutorial sites, papers, and data sets.
Face Recognition Research
Image Data Processing Tutorial Sites:
Data Source:

Related Research Papers:
o ImageNet Classification with Deep Convolutional Neural Networks
o Going Deeper with Convolutions
Data Source: IMDB, Instagram
Web Scraping with XPath in Python
Other Related References:
Image Data Sources:

Suggested Data Sources

The suggested public social media sites and well-known data collection sites for data analytics are listed below with related industry research papers. You can deploy your big data infrastructure on the cloud. Transform the data into one of the HDFS-based NoSQL systems, or into both an HDFS platform and an RDBMS:
1-1) XML, key-value stores, or JSON files in a document collection for MongoDB or Cassandra; or CSV (comma-separated values) or TSV (tab-separated values) in Hive, Pig Latin, or VoltDB in HDFS
1-2) Big Table in HBase in HDFS
1-3) RDDs in Spark in HDFS to use pipelining
1-4) Tables in an RDBMS (MS SQL Server with Data Integration Services/Data Analysis Services using LINQ, or any RDBMS database server)

1. LinkedIn
Related papers to read:
o Avatara: OLAP for Webscale Analytics Products. Lili Wu, Roshan Sumbaly, Chris Riccomini, Gordon Koo, Hyung Jin Kim, Jay Kreps, and Sam Shah (LinkedIn)
o The Big Data Ecosystem at LinkedIn. Roshan Sumbaly, Jay Kreps, and Sam Shah (LinkedIn)

2. Any well-known newspaper or magazine sites on Facebook
Related papers to read:
o Petabyte Scale Databases and Storage Systems Deployed at Facebook. Dhruba Borthakur
o Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo (Facebook) et al., SIGMOD 2010

3. Twitter Message Data Transformation
Related papers to read (more will be given):
o The Unified Logging Infrastructure for Data Analytics at Twitter. George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy (Twitter, Inc.)
o Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture. Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, and Jimmy Lin (Twitter, Inc.)

Yelp Data Challenge: Business Data Set

6. Transform log files from any system into one of the platforms above
Related papers to read: will be given.

7. Webpage or Document Processing for Text Analysis
Document clustering, phrase search. See the Natural Language Processing in Unstructured Text Mining section of the class lecture notes for details, tutorial sites, papers, and data sets. Or download all the webpages in one domain of any well-known public news site of your choice and extract only the text body using an XPath library in any web browser or language. Or you can download preprocessed Wikipedia texts (in XML) here.
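A hedged sketch of the XPath text-body extraction step in item 7, using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup. The sample page and the `article-body` class name are hypothetical; real news pages are rarely well-formed XML, so a full XPath engine such as `lxml.html` is usually the practical choice.

```python
import xml.etree.ElementTree as ET

# A small, well-formed XHTML page (hypothetical). ElementTree needs
# well-formed markup; real-world HTML usually needs lxml.html instead.
PAGE = """<html>
  <body>
    <div class="nav"><a href="/">Home</a></div>
    <div class="article-body">
      <p>First paragraph of the story.</p>
      <p>Second <b>paragraph</b> here.</p>
    </div>
  </body>
</html>"""


def extract_body_text(markup, xpath=".//div[@class='article-body']//p"):
    """Return the text of every element matched by a (limited) XPath."""
    root = ET.fromstring(markup)
    paragraphs = []
    for node in root.findall(xpath):
        # itertext() also collects text from nested tags such as <b>.
        text = " ".join("".join(node.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


print(extract_body_text(PAGE))
```

Restricting the path to the article container is what drops navigation links and other page chrome from the extracted text.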

Arxiv research paper repository to download
State of the Union addresses: you can download the data used for the examples in this section. This download contains the text of 219 State of the Union addresses by U.S. Presidents between 1790 and 2006.
IMDB Movie Review Collection for Sentiment Analysis:
WordNet

You can build the document frequency and inverted index described in the Lecture Notes on Information Retrieval to compute any IR-related metric in an algorithm or to apply an association rule mining algorithm. The lecture notes show how cosine similarity is adopted as vector space scoring for document ranking. One example is building a weight matrix by calculating a weighted score based on tf-idf, as described in the lecture notes. You can then calculate the cosine similarity between documents and the keywords using the tf-idf weight scores you built. At the end of the lecture notes, there are variations of the scoring matrix to optimize; cosine normalization is one of them.

8. Transform Any Electronic Books or Online Documents for Text Processing Analysis
Any electronic book online. See item 7, Webpage or Document Processing, above for processing.

9. Text Mining with the Data Sources in 7 for Association Rule Mining Using MS Analysis Services
See the detailed guides in the Project section on the class webpage or the following links. This project uses the Term Extraction and Term Lookup transformations described under Text Mining Using MS Analysis Services above: build a dictionary of key words and phrases over representative documents with Term Extraction, extract the significant key words and phrases of each document with Term Lookup, and then train classification, clustering, or association models on the transformed data.
Data sources for text mining: any electronic books on the web, or more than 500 webpages on the web.

10. Image Data Analytics
Deep Learning for Image Recognition: see the Deep Learning for Image Recognition section at the end of the class lecture notes for details, tutorial sites, papers, and data sets.
Data Sets: ImageNet

11. Building Social Network Graph into a Store

Facebook Friends Social Network (Graph API) data transformation
Related papers to read: will be given.

12. Implement any data mining metric you learned in class with a cube and dimensions using Microsoft DW. Create dimensions with a set of attributes and define a measure in terms of similarity, distance, or correlation between any two records in the vTargetMail data set for clustering.

13. Minority Class Detection with a Decision Tree with an Adapted Measure and Weight
You can implement your own metric, as specified in the paper below, for use in a decision tree algorithm and test it with the Adventure Works data set.
o A Robust Decision Tree Algorithm for Imbalanced Data Sets

14. Information retrieval for finding the most related documents for given keywords, using any set of webpages or Wikipedia webpages.

15. Any Data Mining Project Using Data Warehouse/OLAP with MDX and DMX
See the DW Tutorial and the MDX/DMX Tutorial in the Lab 3 section for this.

16. Building Social Network Graph into a Store
Facebook Friends Social Network (Graph API) data transformation into one of the following platforms: tables in an RDBMS (MS SQL Server or any database server with Java/JDBC), key-value stores in JSON file format, CSV (comma-separated values), or TSV (tab-separated values). Process the JSON file into a table or CSV file of user IDs with edge columns, then apply data mining queries.

17. Any GIS data mining

18. Any papers on one of the following topics:
o Stream data mining using Spark
o Sequential pattern mining, sequence classification and clustering
o Time-series analysis, regression and trend analysis
o Biological sequence analysis and biological data mining
o Graph pattern mining, graph classification and clustering
o Social network analysis

o Information network analysis
o Spatial, spatiotemporal and moving object data mining
o Multimedia data mining
o Mining computer systems and sensor networks
o Mining software programs
o Statistical data mining methods

Other Useful Data Sources:
Other Related Sites:

Useful Resources
R or Weka: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
R Programming:

SQL Server Analysis Services (SSAS) Data Tools:
You can use R in SQL Server 2016 or a standalone R Server.
R Hadoop System:
Weka:

Good Conference Sites to Search:
KDD
Top research data mining conferences: KDD, IEEE ICDE, IEEE ICDM, CIKM, and SIAM SDM.
ACM SIGMOD, VLDB, IEEE ICDE

Cyber Security:
o Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. Mahmood Sharif (Carnegie Mellon University) et al., ACM CCS 2016
o AmpPot: Monitoring and Defending Against Amplification DDoS Attacks

o A Privacy Protection Technique for Publishing Data Mining Models and Research Data
o Privacy-Preserving Data Mining through Knowledge Model Sharing
o IMR Based Anonymization for Privacy Preservation in Data Mining
o Hiding a Needle in a Haystack: Privacy Preserving Apriori Algorithm in MapReduce Framework

Artificial Intelligence and Machine Learning:
o Deep Face Recognition, by Omkar M. Parkhi

Some Research Resources (will be updated)
Major conference proceedings that will be used:
1. DM conferences: ACM SIGKDD (KDD), ICDM (IEEE Int. Conf. on Data Mining), SDM (SIAM Data Mining), PKDD (Principles of KDD)/ECML, PAKDD (Pacific-Asia)
2. DB conferences: ACM SIGMOD, VLDB, ICDE
3. ML conferences: NIPS, ICML
4. IR conferences: SIGIR, CIKM
5. Web conferences: WWW, WSDM
6. Other related conferences and journals
7. IEEE TKDE, ACM TKDD, DMKD, ML

Recommended Reference Books
1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
2. S. Chakrabarti, Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data, Morgan Kaufmann.
3. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag.
4. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer.
5. D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge Univ. Press.
6. M. Newman, Networks: An Introduction, Oxford Univ. Press, 2010.


More information

Data Architectures in Azure for Analytics & Big Data

Data Architectures in Azure for Analytics & Big Data Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A

More information

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

Hadoop, Yarn and Beyond

Hadoop, Yarn and Beyond Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets

More information

The age of Big Data Big Data for Oracle Database Professionals

The age of Big Data Big Data for Oracle Database Professionals The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

Big Data Analytics. Description:

Big Data Analytics. Description: Big Data Analytics Description: With the advance of IT storage, pcoressing, computation, and sensing technologies, Big Data has become a novel norm of life. Only until recently, computers are able to capture

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Oracle Big Data Fundamentals Ed 1

Oracle Big Data Fundamentals Ed 1 Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data

More information

Fall Principles of Knowledge Discovery in Databases. University of Alberta

Fall Principles of Knowledge Discovery in Databases. University of Alberta Principles of Knowledge Discovery in Databases Fall 1999 Dr. Osmar R. Zaïane 2 1 Class and Office Hours Class: Mondays, Wednesdays and Fridays from 10:00 to 10:50 Office Hours: Tuesdays from 11:00 to 11:55

More information

CS 412 Intro. to Data Mining

CS 412 Intro. to Data Mining CS 412 Intro. to Data Mining Chapter 1. Introduction Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 August 28, 2017 Data Mining: Concepts and Techniques 2 August 28, 2017 Data

More information

Oracle Big Data Fundamentals Ed 2

Oracle Big Data Fundamentals Ed 2 Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies

More information

An Introduction to Apache Spark

An Introduction to Apache Spark An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations

More information

Data Mining: Dynamic Past and Promising Future

Data Mining: Dynamic Past and Promising Future SDM@10 Anniversary Panel: Data Mining: A Decade of Progress and Future Outlook Data Mining: Dynamic Past and Promising Future Jiawei Han Department of Computer Science University of Illinois at Urbana

More information

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka Course Curriculum: Your 10 Module Learning Plan Big Data and Hadoop About Edureka Edureka is a leading e-learning platform providing live instructor-led interactive online training. We cater to professionals

More information

Oracle Big Data Science

Oracle Big Data Science Oracle Big Data Science Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com @VlamisSoftware Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Hadoop Development Introduction

Hadoop Development Introduction Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

More information

R Language for the SQL Server DBA

R Language for the SQL Server DBA R Language for the SQL Server DBA Beginning with R Ing. Eduardo Castro, PhD, Principal Data Analyst Architect, LP Consulting Moderated By: Jose Rolando Guay Paz Thank You microsoft.com idera.com attunity.com

More information

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional

More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Cluster Computing Architecture. Intel Labs

Cluster Computing Architecture. Intel Labs Intel Labs Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018 Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/

More information

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and

More information

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours

More information

Hadoop Online Training

Hadoop Online Training Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the

More information

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING CS 7265 BIG DATA ANALYTICS INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, PhD Computer Science,

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

The 2018 (14th) International Conference on Data Science (ICDATA)

The 2018 (14th) International Conference on Data Science (ICDATA) CALL FOR PAPERS LATE BREAKING PAPERS, POSITION PAPERS, ABSTRACTS, POSTERS Paper Submission Deadline: May 20, 2018 The 2018 (14th) International Conference on Data Science (ICDATA) (former International

More information

Based on Big Data: Hype or Hallelujah? by Elena Baralis

Based on Big Data: Hype or Hallelujah? by Elena Baralis Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of

More information

Contents PART I: CLOUD, BIG DATA, AND COGNITIVE COMPUTING 1

Contents PART I: CLOUD, BIG DATA, AND COGNITIVE COMPUTING 1 Preface xiii PART I: CLOUD, BIG DATA, AND COGNITIVE COMPUTING 1 1 Princi ples of Cloud Computing Systems 3 1.1 Elastic Cloud Systems for Scalable Computing 3 1.1.1 Enabling Technologies for Cloud Computing

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

SURVEY ON STUDENT INFORMATION ANALYSIS

SURVEY ON STUDENT INFORMATION ANALYSIS Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Data Mining: Concepts and Techniques Slides for Textbook Chapter 1 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser University, Canada

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING 1: Introduction Instructor: Yizhou Sun yzsun@cs.ucla.edu (Instructor for Today s class: Ting Chen) April 9, 2017 Course Information Course homepage: http://web.cs.ucla.edu/~yzsun/classes/2017spr

More information

ACHIEVEMENTS FROM TRAINING

ACHIEVEMENTS FROM TRAINING LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM

More information

CS-490WIR Web Information Retrieval and Management. Luo Si

CS-490WIR Web Information Retrieval and Management. Luo Si CS490W: Web Information Retrieval & Management CS-490WIR Web Information Retrieval and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most

More information

Web Mining TEAM 8. Professor Anita Wasilewska CSE 634 Data Mining

Web Mining TEAM 8. Professor Anita Wasilewska CSE 634 Data Mining Web Mining TEAM 8 Paper - You Are What You Tweet : Analyzing Twitter for Public Health Authors : Paul, Michael J., and Mark Dredze. Conference : AAAI Publications, Fifth International AAAI Conference on

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle

More information

DATA SCIENCE USING SPARK: AN INTRODUCTION

DATA SCIENCE USING SPARK: AN INTRODUCTION DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data

More information

A data-driven framework for archiving and exploring social media data

A data-driven framework for archiving and exploring social media data A data-driven framework for archiving and exploring social media data Qunying Huang and Chen Xu Yongqi An, 20599957 Oct 18, 2016 Introduction Social media applications are widely deployed in various platforms

More information

CSE 444: Database Internals. Lecture 23 Spark

CSE 444: Database Internals. Lecture 23 Spark CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei

More information

Webinar Series TMIP VISION

Webinar Series TMIP VISION Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing

More information

Oracle Big Data Science IOUG Collaborate 16

Oracle Big Data Science IOUG Collaborate 16 Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

Introduction to Information Retrieval. Hongning Wang

Introduction to Information Retrieval. Hongning Wang Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an

More information

Hadoop course content

Hadoop course content course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information