Lluis Belanche + Alfredo Vellido Data Mining II An Introduction to Mining (2)


1 Lluis Belanche + Alfredo Vellido Data Mining II An Introduction to Mining (2)

2 On dates & evaluation
Lectures expected to end in the week of 14-18 Dec.
Likely essay deadline & presentation: 15th, 22nd Jan.

3 What's MINING?: A historicist viewpoint

4 MINING as a methodology

5 CRISP: a DM methodology
CRoss-Industry Standard Process for Data Mining: a methodology that is neutral with respect to industry, tool and application (free & non-proprietary).
Pete Chapman, Randy Kerber (NCR); Julian Clinton, Thomas Khabaza, Colin Shearer (SPSS); Thomas Reinartz, Rüdiger Wirth (DaimlerChrysler).
CRISP-DM was conceived in 1996.
DaimlerChrysler: leaders in industrial application. SPSS: leaders in product development (Clementine, 1994). NCR: owners of large (huge!) databases (Teradata).
Financed by the EU. Version 1.0 was officially released in 1999.

6 CRISP: Hierarchic structure of the methodology

7 CRISP: The virtuous loop of methodology phases

8 CRISP: Description of phases
Problem understanding: study of targets and requirements from the business/problem viewpoint, and definition of the task as a DM problem.
Data understanding: data collection; getting to know the data, trying to detect both quality problems and interesting features.
Data preparation: preparing the data set to be modelled, starting from raw data. This is an iterative and exploratory process: selection of files, tables, variables and record samples, plus data cleaning.
Modelling: data analysis using modelling techniques suited to the problem at hand. Includes experimenting with the models, tuning their parameters, etc.
Evaluation: all previous steps must be evaluated as a whole (as a unitary process), and we must decide whether the deliverables so far meet the DM challenge.
Implementation: all the knowledge acquired to this point must be organized and presented to the client in a usable form. We must define, together with the client, a protocol to reliably deploy the DM findings.
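The phase sequence and its loop can be sketched as a tiny state machine. This is only an illustration: the names `Phase`, `next_phase` and `run_cycle` are hypothetical, and the rule that a failed evaluation sends the project back to problem understanding is an assumption made for the sketch, not something the slides specify.

```python
from enum import Enum


class Phase(Enum):
    PROBLEM_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    IMPLEMENTATION = 6


def next_phase(current, goals_met=True):
    """Return the phase that follows `current`.

    Assumption: if the evaluation says the goals are not met, the project
    goes back to problem understanding; after implementation the loop
    starts again.
    """
    if current is Phase.EVALUATION and not goals_met:
        return Phase.PROBLEM_UNDERSTANDING
    if current is Phase.IMPLEMENTATION:
        return Phase.PROBLEM_UNDERSTANDING
    return Phase(current.value + 1)


def run_cycle(evaluation_outcomes):
    """Walk the phases until implementation is reached.

    Each visit to EVALUATION consumes one boolean from
    `evaluation_outcomes`; once the list is exhausted, evaluation passes.
    """
    outcomes = iter(evaluation_outcomes)
    trace = [Phase.PROBLEM_UNDERSTANDING]
    while trace[-1] is not Phase.IMPLEMENTATION:
        current = trace[-1]
        goals_met = next(outcomes, True) if current is Phase.EVALUATION else True
        trace.append(next_phase(current, goals_met))
    return trace


# One failed evaluation followed by a successful one.
trace = run_cycle([False, True])
```

Here `trace` visits the first five phases twice before reaching implementation, which is the iterative behaviour the methodology describes.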

9 CRISP: The virtuous loop of methodology phases

10 Use of DM methodologies
Enterprise Miner(TM): SEMMA
The acronym SEMMA (Sample, Explore, Modify, Model, Assess) refers to the core process of conducting data mining. Beginning with a statistically representative sample of your data, SEMMA makes it easy to apply exploratory statistical and visualization techniques, select and transform the most significant predictive variables, model the variables to predict outcomes, and confirm a model's accuracy.
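The five SEMMA steps can be illustrated with a toy pipeline. This is a minimal sketch using only the Python standard library, not Enterprise Miner code: the function names, the noise-free data (y = 2x + 1), the simple random sampling and the least-squares line are all assumptions chosen to keep the example short.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a simple random subset of the rows."""
    rng = random.Random(seed)
    size = max(2, int(len(rows) * fraction))
    return rng.sample(rows, size)


def explore(rows):
    """Explore: basic summary statistics of predictor and target."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {
        "x_mean": statistics.mean(xs),
        "x_stdev": statistics.stdev(xs),
        "y_mean": statistics.mean(ys),
    }


def modify(rows, x_mean):
    """Modify: transform the predictor (here, centre it on x_mean)."""
    return [(x - x_mean, y) for x, y in rows]


def model(rows):
    """Model: fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def assess(rows, intercept, slope):
    """Assess: mean absolute error of the fitted line on `rows`."""
    return statistics.mean(abs(y - (intercept + slope * x)) for x, y in rows)


# Hypothetical toy data set: y = 2x + 1, no noise.
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
subset = sample(data, fraction=0.3)
summary = explore(subset)
prepared = modify(subset, summary["x_mean"])
intercept, slope = model(prepared)
# Assess on the full data set, applying the same transformation.
error = assess(modify(data, summary["x_mean"]), intercept, slope)
```

The point of the sketch is the order of the steps: the transformation learned on the sample (`x_mean`) is reused unchanged when the model is assessed on data outside the sample.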

11 Use of DM methodologies

12 CRISP: Phases: Problem understanding
Phases: PROBLEM UNDERSTANDING, DATA UNDERSTANDING, DATA PREPARATION, MODELLING, EVALUATION, IMPLEMENTATION
DETERMINE PROBLEM GOAL: background; problem goals; success criteria
ASSESS SITUATION: inventory of resources; requirements, assumptions and limitations; risks and contingencies; terminology; costs & benefits
DETERMINE DM GOALS: DM goals; DM success criteria
PRODUCE PROJECT PLAN: project plan; initial selection of tools

13 DM application areas ('06 -> '09)

14 CRISP: Phases: Data understanding
OBTAIN INITIAL DATA: initial data report
DATA DESCRIPTION: descriptive report
DATA EXPLORATION: exploration report
DATA QUALITY VERIFICATION: quality report

15 METROFANG: a real story about data understanding (1)

16 METROFANG: a real story about data understanding (2)
Plots: inlet flow ("caudal entrada") and motor torque of Dryer A ("Par motor Secador A"), shown as time series.
Annotations: Missing data; Seasonality; Outliers; Weekend?; FORUM???
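Two of the issues flagged on this slide, missing data and outliers, can be checked mechanically during data understanding. The sketch below is an assumption-laden illustration, not the analysis used in the METROFANG project: the readings are invented, `quality_report` is a hypothetical helper, and outliers are defined by the common 1.5 x IQR rule.

```python
import statistics


def quality_report(series):
    """Count missing readings and flag outliers in a list of values.

    Missing readings are represented as None. Outliers are values outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR], a rule of thumb chosen for this sketch.
    """
    present = [v for v in series if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [
        (i, v)
        for i, v in enumerate(series)
        if v is not None and not (low <= v <= high)
    ]
    return {
        "n": len(series),
        "missing": len(series) - len(present),
        "outliers": outliers,
    }


# Invented sensor readings with two gaps and one spike.
readings = [100, 102, None, 98, 101, 99, 350, 100, None, 97]
report = quality_report(readings)
```

A report like this only points at suspicious records; deciding whether a spike is a sensor fault or a real event (a weekend, a special occasion) still requires knowledge of the process behind the data.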

17 Storing data ('07)
Poll: What did you use for data storage for significant data mining projects in the past year? [142 voters, 284 votes]
Text files (e.g. tab or comma delimited): 75 (52.8%)
Data mining system format (SAS, SPSS, arff): 57 (40.1%)
Excel: 28 (19.7%)
Oracle: 25 (17.6%)
SQL Server: 15 (10.6%)
MySQL: 12 (8.5%)
Other format: 10 (7.0%)
Other commercial DBMS: 7 (4.9%)
Other free DBMS: 4 (2.8%)
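The percentages in this poll are computed over the 142 voters rather than over the votes cast, which is why they add up to more than 100% (each voter could pick several options). A short check, with the counts copied from the slide and shortened option labels that are my own:

```python
# Vote counts as listed on the slide; labels abbreviated for the sketch.
voters = 142
votes = {
    "Text files": 75,
    "Data mining system format": 57,
    "Excel": 28,
    "Oracle": 25,
    "SQL Server": 15,
    "MySQL": 12,
    "Other format": 10,
    "Other commercial DBMS": 7,
    "Other free DBMS": 4,
}

# Share of voters who selected each option, rounded to one decimal place.
percentages = {
    option: round(100 * count / voters, 1) for option, count in votes.items()
}

# Sum of the counts listed for the individual options.
listed_votes = sum(votes.values())
```

Every value in `percentages` reproduces the figure printed next to the corresponding option.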

18 CRISP: Phases: Data preparation
DATA SELECTION: arguments for selection
DATA CLEANING: data cleaning report
RECONSTRUCT DATA: derived variables; observations generated
INTEGRATE DATA: integrated data
DATA FORMATTING: data with new format

19 Is data preparation that important?

20 Common data types analyzed ('07)
Compared to the 2005 KDnuggets poll on types of data analyzed/mined in the last 12 months, the biggest increase was in anonymized data (perhaps an indicator of the increasing importance of privacy issues).

21 Common data types analyzed ('09)

22 How big is yours? ('06 -> '09)

23 Data manipulation tools ('07)

24 CRISP: Phases: Modelling
SELECT MODELLING TECHNIQUE: selected technique
CREATE TEST DESIGN: test design
BUILD MODEL: parameter selection; model; model description
VALIDATE MODEL: model validation

25 CRISP: Selection of techniques
UNIVERSE OF TECHNIQUES (defined by tools)
TECHNIQUES SUITED TO A PROBLEM
POLITICAL REQUIREMENTS (business, executive)
LIMITATIONS: money, time, human resources; data types, knowledge
SELECTED TOOL(S)

26 Commonly used models/techniques ('05)

27 Commonly used models/techniques ('07)

28 CRISP: Phases: Evaluation
EVALUATE RESULTS: evaluation of DM results; approved models
REVISE PROCESSES: revision of the process
DETERMINE NEXT STEPS: list of possible actions; decisions

29 CRISP: Phases: Deployment
PLAN IMPLEMENTATION: implementation plan
PLAN MONITORING & MAINTENANCE: monitoring & maintenance plan
GENERATE FINAL REPORT: final report; final presentation
REVISE PROJECT: documentation of experience

30 How do you deploy it? ('06 -> '09)
Cloud computing: computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. An example: Google Apps.

31 Software popularity ('07)
Free vs. commercial: debate

32 Software popularity ('09)

33 Why?
Many changes have occurred in the business application of data mining since CRISP-DM 1.0 was published. Emerging issues and requirements include:
The availability of new types of data (text, Web, and attitudinal data, for example) along with new techniques for pre-processing, analyzing, and combining them with related case data.
Integration and deployment of results with operational systems such as call centers and Web sites.
Far more demanding requirements for scalability and for deployment into real-time environments.
The need to package analytical tasks for non-analytical end users and integrate these tasks in business workflows.
The need to seamlessly integrate the deployment of results and closed-loop feedback with existing business processes.
The need to mine large-scale databases in situ, rather than exporting an analytical dataset.
Organizations' increasing reliance on teams, making it important to educate greater numbers of people on the processes and best practices associated with data mining and predictive analytics.
In July 2006 the consortium announced that it was going to start working towards a second version of CRISP-DM. On 26 September 2006, the CRISP-DM SIG met to discuss potential enhancements for CRISP-DM 2.0 and the subsequent roadmap. However, these efforts appear to have stalled: the SIG has not met, updated the CRISP website, or communicated anything to members since early 2007.


More information

Eight units must be completed and passed to be awarded the Diploma.

Eight units must be completed and passed to be awarded the Diploma. Diploma of Computing Course Outline Campus Intake CRICOS Course Duration Teaching Methods Assessment Course Structure Units Melbourne Burwood Campus / Jakarta Campus, Indonesia March, June, October 022638B

More information

Dr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R

Dr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R Dr. SubraMANI Paramasivam Think & Work like a Data Scientist with SQL 2016 & R About the Speaker Group Leader Dr. SubraMANI Paramasivam PhD., MVP, MCT, MCSE (x2), MCITP (x2), MCP, MCTS (x3), MCSA CEO,

More information

Creating an Intranet using Lotus Web Content Management. Part 2 Project Planning

Creating an Intranet using Lotus Web Content Management. Part 2 Project Planning Creating an Intranet using Lotus Web Content Management Introduction Part 2 Project Planning Many projects have failed due to poor project planning. The following article gives an overview of the typical

More information

Introducing Oracle R Enterprise 1.4 -

Introducing Oracle R Enterprise 1.4 - Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I

More information

Using ArcGIS Online to Release an AODA Compliant Application

Using ArcGIS Online to Release an AODA Compliant Application Using ArcGIS Online to Release an AODA Compliant Application Presentation for the Esri Canada User Conference (Toronto 2017) Mitchell Knight Ontario Ministry of the Environment and Climate Change October

More information

COCKPIT FP Citizens Collaboration and Co-Creation in Public Service Delivery. Deliverable D2.4.1

COCKPIT FP Citizens Collaboration and Co-Creation in Public Service Delivery. Deliverable D2.4.1 : 0.3, Date: 31/05/2012 COCKPIT FP7-248222 Citizens Collaboration and Co-Creation in Public Service Delivery Deliverable D2.4.1 Citizens Deliberative Engagement Platform 2 nd Editor(s): Responsible Partner:

More information

Units. Unit 4: Internet. Year 1 Unit 1: Course Overview

Units. Unit 4: Internet. Year 1 Unit 1: Course Overview ITGS SL Units All Pamoja courses are written by experienced subject matter experts and integrate the principles of TOK and the approaches to learning of the IB learner profile. This course has been authorised

More information

Embarking on the next stage of hosted desktop delivery for international events management company

Embarking on the next stage of hosted desktop delivery for international events management company Embarking on the next stage of hosted desktop delivery for international events management company Richmond Events is an international events management company, delivering a diverse range of forums and

More information

Advanced Data Modeling: Be Happier, Add More Value and Be More Valued

Advanced Data Modeling: Be Happier, Add More Value and Be More Valued Advanced Data Modeling: Be Happier, Add More Value and Be More Valued Karen Lopez Karen López, A frequent speaker on data modeling, data-driven methodologies and pattern data models. SQL Server MVP She

More information

Il caso della Prescriptive Maintenance

Il caso della Prescriptive Maintenance Reimagine 2018 HPE Pointnext AI at Work Il caso della Prescriptive Maintenance nel mondo industriale Edmondo De Salvo WW AI, Data & Emerging Technology Center of Excellence 24 maggio 2018 We live in a

More information

Managing Data Resources

Managing Data Resources Chapter 7 OBJECTIVES Describe basic file organization concepts and the problems of managing data resources in a traditional file environment Managing Data Resources Describe how a database management system

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Chapter 17: INTERNATIONAL DATA PRODUCTS

Chapter 17: INTERNATIONAL DATA PRODUCTS Chapter 17: INTERNATIONAL DATA PRODUCTS After the data processing and data analysis, a series of data products were delivered to the OECD. These included public use data files and codebooks, compendia

More information

NCHRP Project Impacts of Connected Vehicles and Automated Vehicles on State and Local Transportation Agencies

NCHRP Project Impacts of Connected Vehicles and Automated Vehicles on State and Local Transportation Agencies NCHRP Project 20-102 Impacts of Connected Vehicles and Automated Vehicles on State and Local Transportation Agencies Announcement of New Tasks, August 2016 The National Cooperative Highway Research Program

More information

CTL.SC4x Technology and Systems

CTL.SC4x Technology and Systems in Supply Chain Management CTL.SC4x Technology and Systems Key Concepts Document This document contains the Key Concepts for the SC4x course, Weeks 1 and 2. These are meant to complement, not replace,

More information

09/07: Project Plan. The Capstone Experience. Dr. Wayne Dyksen Department of Computer Science and Engineering Michigan State University Fall 2016

09/07: Project Plan. The Capstone Experience. Dr. Wayne Dyksen Department of Computer Science and Engineering Michigan State University Fall 2016 09/07: Project Plan The Capstone Experience Dr. Wayne Dyksen Department of Computer Science and Engineering Michigan State University Fall 2016 From Students to Professionals Project Plan Functional Specifications

More information

Creating a Departmental Standard SAS Enterprise Guide Template

Creating a Departmental Standard SAS Enterprise Guide Template Paper 1288-2017 Creating a Departmental Standard SAS Enterprise Guide Template ABSTRACT Amanda Pasch and Chris Koppenhafer, Kaiser Permanente This paper describes an ongoing effort to standardize and simplify

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Department of Computer Science and Information Systems, College of Business and Technology, Morehead State University

Department of Computer Science and Information Systems, College of Business and Technology, Morehead State University 1 Department of Computer Science and Information Systems, College of Business and Technology, Morehead State University Lecture 3 Part A CIS 311 Introduction to Management Information Systems (Spring 2017)

More information

Oracle Big Data Science IOUG Collaborate 16

Oracle Big Data Science IOUG Collaborate 16 Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle

More information

Seamless Dynamic Web (and Smart Device!) Reporting with SAS D.J. Penix, Pinnacle Solutions, Indianapolis, IN

Seamless Dynamic Web (and Smart Device!) Reporting with SAS D.J. Penix, Pinnacle Solutions, Indianapolis, IN Paper RIV05 Seamless Dynamic Web (and Smart Device!) Reporting with SAS D.J. Penix, Pinnacle Solutions, Indianapolis, IN ABSTRACT The SAS Business Intelligence platform provides a wide variety of reporting

More information

Continuous Delivery and Team Foundation Server Ognjen Bajić Ana Roje Ivančić Ekobit

Continuous Delivery and Team Foundation Server Ognjen Bajić Ana Roje Ivančić Ekobit Continuous Delivery and Team Foundation Server 2013 Ognjen Bajić Ana Roje Ivančić Ekobit Turn off your mobile. Thank you. Agenda Continuous Delivery Challenges Automated Build with Build Verification Tests

More information

SOFTWARE DEVELOPMENT: DATA SCIENCE

SOFTWARE DEVELOPMENT: DATA SCIENCE PROFESSIONAL CAREER TRAINING INSTITUTE SOFTWARE DEVELOPMENT: DATA SCIENCE www.pcti.edu/data-science applicant@pcti.edu 832-484-9100 PROGRAM OVERVIEW Prepare for a life changing career as a data scientist

More information

Technology Strategy and Roadmap. October 2015

Technology Strategy and Roadmap. October 2015 Technology Strategy and Roadmap October 2015 1 STREAM & EVOLUTION of TECHNOLOGY 2 User interaction across all devices Klopotek STREAM is a platform for user interaction across computers and portables.

More information

Units. Year 1 Unit 1: Course Overview. Unit 4: Internet

Units. Year 1 Unit 1: Course Overview. Unit 4: Internet ITGS HL Units All Pamoja courses are written by experienced subject matter experts and integrate the principles of TOK and the approaches to learning of the IB learner profile. This course has been authorised

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information