Analysis of Big Data Tim Miller, Sr. Analytics Consultant Teradata Alexander Kolovos, Ph.D., Advanced Analytics Software Engineer Teradata


1 Analysis of Big Data Tim Miller, Sr. Analytics Consultant Teradata Alexander Kolovos, Ph.D., Advanced Analytics Software Engineer Teradata March 28, 2017

2 Your Presenters
Tim Miller, Senior Analytics Consultant, Teradata Corporation
- Expertise in advanced analytic software, systems and methodologies
- Principal engineer for the first commercial in-database data mining system, Teradata Warehouse Miner
- Consultant to Teradata analytic partners (SAS, SPSS, etc.) and customers
- Retired youth basketball, football, baseball, softball, soccer, etc. coach
Alexander Kolovos, Ph.D., Advanced Analytics Software Engineer, Teradata Corporation
- Expertise in analytical methodologies and platforms
- Specialization in space-time data and stochastic predictive analysis
- Ph.D. in Sciences and Engineering
- 4 years at Teradata (Analytics Engineer)
- 6 years at SAS (Spatial software expert)
- Loves language, theater, and rock music
Teradata

3 Overview
- Big Data: Brief review
- Storage and Availability
  - Information in your hands. Wanna keep it where?
- Analysis
  - Exploratory and summary. Is this enough? Can pretty pictures tell stories?
  - Recreate the world around you
  - Tools and tricks of the trade
- Walking the walk
  - The foundation, the strategy, the tools
- Managing the big story
  - Model management
  - Application management


5 Big Data: Brief Review
The last half-century was marked by rapid technological advances that:
- gradually provided unprecedented computing power, speeding up calculations that were previously time-consuming or even time-prohibitive
- enabled progressively more monitoring, recording and storing of empirical information, with a specific focus on data measurements
[Figure: schematic depiction of Moore's Law: transistor counts, and with them computational power, double roughly every two years, until physical limits are reached]

6 Big Data: Brief Review
A more sober approach appears to have followed the initial frenzy about the availability of large volumes of information. According to Hortonworks Inc., a vendor of an Apache Hadoop distribution: "Big data describes the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored or siloed due to the limitations of traditional data management technologies."

7 Big Data: Brief Review
Big Data is the 3 Vs:
- Variety: Structured and unstructured data
- Volume: Tera- (10^12), peta- (10^15) and even exabytes (10^18) of data
- Velocity: Data flows into your organization at an increasing rate

8 Big Data: Brief Review
Big Data brings forward the issue of scaling:
- Solve problems, old or new, trivial or elaborate, in entirely new frameworks characterized by increasing data sizes
- Face new challenges:
  - Conceive and apply appropriate methodologies
  - Maintain competitive performance
  - Engineer effective hardware architectures
  - Generate suitable software solutions
- Example: Handle matrix inversions for increasingly large matrix sizes
- Example: Deal with sparse data in very large dimensions
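To make the sparse-data example concrete, here is a minimal sketch of one common idea: store only the non-zero entries of a very high-dimensional vector, so that memory and arithmetic scale with the number of non-zeros rather than with the dimension. The function names and the sample vectors are assumptions made for illustration; they do not come from the presentation.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a dense list as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product that only visits the indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[i] for i, value in a.items() if i in b)


# Two vectors in a million-dimensional space, each with three non-zero entries.
# Only six numbers are stored instead of two million.
u = {3: 2.0, 500_000: 1.5, 999_999: -4.0}
v = {3: 0.5, 42: 7.0, 999_999: 0.5}

result = sparse_dot(u, v)  # only indices 3 and 999_999 overlap
print(result)
```

Production systems typically rely on dedicated sparse matrix formats rather than plain dictionaries; the dictionary form is used here only to keep the sketch self-contained.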


10 Storing Your Information
So you have this dazzling amount of information. Where will you keep it?

11 Storing Your Information
Nowadays, most options can be summarized as follows:
- Locally: Computer / Drive
- Locally: Company Server
- Cloud: Remote Server

12 Storing Your Information
Nowadays, most options can be summarized as follows:
- Cloud: Remote Server. [Typically] somebody else's computer! No matter whether it is called Dropbox, Box, iCloud, AWS, Samsung Cloud, etc., it is what it is: somebody else's computer.
- Privacy concerns and data safety: Huge topics in the era of Big Data!

13 Availability and Safekeeping
If your data are not in your hands, then where are they?
- Cloud servers can be hardware located anywhere in the world.
- Data can be stored in multiple copies, possibly in different locations too, to prevent loss in case of hardware or network failures.
- Example: Amazon Web Services Global Infrastructure

14 Availability and Safekeeping
What steps can be taken to protect your data?
- Formal legislation
- Data encryption, safety protocols, restricted access
A very sensitive topic in the nascent steps of new tech. Technology offers great business opportunities, but caution is needed to avoid putting the cart before the horse.


16 Now What?
Assume your data is kept safe somewhere. What comes next?

17 The Next Step
- In academic environments: Use data to answer scientific questions about a phenomenon or to study an attribute of interest.
- In a business context: Gain insight into a problem in order to understand the market, optimize operations, increase profit, etc.; commonly expressed as aiming to increase Business Intelligence.

18 Preliminary Data Exploration
Big Data analysis is conceptually similar to any other data analysis.
First step: Perform preliminary exploratory analysis
- Obtain data
  - Make records available within a data processing environment
  - Data may be accessed at the storage location or brought over locally
- Ensure all analysis-relevant datasets are present
  - Such as unprocessed / raw data and the different contributing data collections
[Figure: Server and Desktop, with a small credit-risk decision tree graphic]

19 Preliminary Data Exploration
Big Data analysis is conceptually similar to any other data analysis.
First step: Perform preliminary exploratory analysis
- Inspect data
  - For missing values and errors
  - Check for type mismatch
  - Missing values: Remove the record? Assign a default?
- Additional quality checks
  - Check for out-of-bound values (Ex.: Remove negative records from a positive variable)
  - Transform variables as needed (Ex.: Isolate street number from address string)
  - Derive secondary variables as needed (Ex.: Time duration from date range)
  - Isolate and handle extremes, if it makes sense to do so
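The inspection and quality checks on this slide can be sketched in a few lines. The records, field names and the `clean` helper below are hypothetical and invented purely for illustration; the sketch assumes that a missing amount is either dropped or replaced by a caller-supplied default, which is one possible policy among several.

```python
import re
from datetime import date

# Hypothetical raw records; every field name and value here is made up.
raw = [
    {"id": 1, "amount": "120.50", "address": "42 Elm Street",
     "start": "2017-01-10", "end": "2017-03-28"},
    {"id": 2, "amount": "-15.00", "address": "7 Oak Avenue",
     "start": "2017-02-01", "end": "2017-02-15"},
    {"id": 3, "amount": None, "address": "Main Road",
     "start": "2017-03-01", "end": "2017-03-02"},
    {"id": 4, "amount": "abc", "address": "1600 Pine Blvd",
     "start": "2017-01-01", "end": "2017-01-31"},
]


def clean(record, default_amount=None):
    """Return a cleaned copy of the record, or None if it should be dropped."""
    amount = record["amount"]
    if amount is None:                      # missing value: drop or assign default
        if default_amount is None:
            return None
        amount = default_amount
    try:
        amount = float(amount)              # type mismatch check
    except (TypeError, ValueError):
        return None
    if amount < 0:                          # out-of-bound check for a positive variable
        return None
    match = re.match(r"\s*(\d+)", record["address"])
    street_number = int(match.group(1)) if match else None   # transformed variable
    start = date.fromisoformat(record["start"])
    end = date.fromisoformat(record["end"])
    return {
        "id": record["id"],
        "amount": amount,
        "street_number": street_number,
        "duration_days": (end - start).days,                  # secondary variable
    }


cleaned = [row for row in (clean(r) for r in raw) if row is not None]
print(cleaned)
```

Whether to drop a record or assign a default is a modeling decision; the `default_amount` parameter only makes that choice explicit.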

20 Preliminary Data Exploration
Big Data analysis is conceptually similar to any other data analysis.
First step: Perform preliminary exploratory analysis
- Possibly run summary calculations
  - For example, compute minimum, maximum, average, etc.
- Insight from charts, plots, maps
  - Visualization can be a huge aid for cognitive understanding
[Figure: visual comparison; labels not recoverable]
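As a small illustration of the summary calculations mentioned above, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The sample values are invented for the example.

```python
import statistics

# Invented sample measurements.
values = [12.0, 15.5, 9.0, 22.5, 15.5, 21.5]

summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),   # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

On large tables the same aggregates would normally be computed where the data live rather than in a local list, but the quantities are the same.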

21 Preliminary Data Exploration
Scale matters in Big Data. Visualization can be all the more elaborate and important when exploring complexity and multiple dimensions in data. At the end of the day, are summary statistics and pretty pictures enough?

22 Transition To Data Workbench
Most typically, one needs deeper insight to help drive decision-making. Very often, one may want to:
- predict a variable. Examples: Assess sales volume; project the number of passengers for an airline
- select between multiple options when taking action. Examples: Accept or reject a transaction? Has a threshold been exceeded?
- classify or rank a series of items. Examples: How can a business distribute retail stores? Which customers are loyal?
To answer such questions, one must seek to understand the behavior of a target variable as a function of the data features.

23 Transition To Data Workbench
One starts with the output of the first exploratory step, the so-called Analytics Data Set (ADS). The ADS contains all the features (variables) that are assessed to be relevant to the target variable. In each problem, one ultimately seeks to build a representation of reality that is accurate enough to enable inference about the target variable. This is done in the next step.
Second step: Data modeling
- A model is a representation of reality.
- Data modeling serves to discover and establish associations among the data in order to describe the target variable accurately.

24 Transition To Data Workbench
How do you fit a model to data? Keep in mind the Indetermination Thesis: In principle, there may exist an infinite number of curves that satisfy a given dataset.


27 Transition To Data Workbench
How do you fit a model to data? A theoretical model can eliminate the bulk of possibilities and provide us with a few meaningful ones.
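The Indetermination Thesis can be demonstrated numerically. The sketch below is an illustration under assumed data, not the presenters' construction: it builds the interpolating polynomial through three made-up points, then adds a term c * (x - x1)(x - x2)(x - x3), which is zero at every data point. Each value of c gives a different curve that fits the data exactly.

```python
def lagrange(points):
    """Lowest-degree polynomial through the points, returned as a function."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def perturbed(points, c):
    """Another exact fit: add c * prod(x - xi), which vanishes at each data point."""
    base = lagrange(points)
    def q(x):
        bump = c
        for xi, _ in points:
            bump *= (x - xi)
        return base(x) + bump
    return q


# Made-up observations.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
curves = [perturbed(points, c) for c in (0.0, 1.0, -5.0)]

for curve in curves:
    fitted = [curve(x) for x, _ in points]   # identical at the data points
    between = curve(1.5)                     # different between the data points
    print(fitted, between)
```

A theoretical model narrows this family down: for instance, insisting on a polynomial of degree two or lower leaves only the c = 0 curve.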

28 Data Modeling: Prediction
We distinguish 2 main data modeling categories: Prediction and clustering.
Prediction: A model is trained on data to develop data-driven behavior
- Training data: The subset used to train and validate the model behavior
- Testing/scoring data: The subset used by the model to yield predictions
- Also known as: Supervised learning
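A minimal sketch of the training versus testing/scoring split, using a one-feature least-squares line as the model. The synthetic data (a line with intercept 2.0 and slope 0.5 plus small noise), the 30/10 split and the function name `fit_line` are all assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic data: y = 2.0 + 0.5 * x plus Gaussian noise (seeded for repeatability).
rng = random.Random(0)
data = [(float(x), 2.0 + 0.5 * x + rng.gauss(0, 0.1)) for x in range(40)]

# Shuffle, then hold out a quarter of the records for testing/scoring.
rng.shuffle(data)
train, test = data[:30], data[30:]

a, b = fit_line([x for x, _ in train], [y for _, y in train])

# Score the held-out records and measure the prediction error.
predictions = [a + b * x for x, _ in test]
rmse = (sum((p - y) ** 2 for p, (_, y) in zip(predictions, test)) / len(test)) ** 0.5
print(f"intercept={a:.3f} slope={b:.3f} test RMSE={rmse:.3f}")
```

The model never sees the test subset during fitting, so the test error gives a fairer picture of how predictions behave on new data.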

29 Data Modeling: Prediction
We distinguish 2 main data modeling categories: Prediction and clustering.
Prediction: Commonly used methodologies include:
- Regression: Family of techniques suitable for a variety of tasks; some types are
  - Linear: Suitable for continuous variable prediction
  - Logistic: Suitable for prediction of binary outcomes or class variables (discrete values)
  - k-Nearest Neighbor: Prediction based on the averaged response of the k nearest samples
  - Ridge: Linear regression with a penalty on coefficient size; an option for ill-posed problems (for example, collinear features) and for reducing overfitting
- Neural Networks: Parametric, layered models inspired by how neurons work
- Decision Trees: Node-based algorithm for classification or regression. Each node makes a decision by using a condition on one of the input features.
- Random Forests: An ensemble of multiple decision trees. Each tree is fit on a random subset of the data; the forest then averages the answers from the trees (or takes a majority vote for classification)
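Of the methods listed, k-nearest neighbor prediction is the easiest to sketch from scratch. The version below assumes Euclidean distance and a plain (unweighted) average of the k nearest responses; the training rows are invented for the example.

```python
def knn_predict(train, query, k=3):
    """Average the responses of the k training samples closest to the query.

    train is a list of (features, response) pairs; distance is squared Euclidean.
    """
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    nearest = by_distance[:k]
    return sum(response for _, response in nearest) / len(nearest)


# Invented training samples: two features and one numeric response each.
train = [
    ((1.0, 1.0), 10.0),
    ((1.5, 2.0), 12.0),
    ((3.0, 4.0), 20.0),
    ((5.0, 7.0), 35.0),
    ((3.5, 5.0), 24.0),
]

prediction = knn_predict(train, (3.0, 4.5), k=2)
print(prediction)  # average of the two closest responses, 20.0 and 24.0
```

Sorting every training row per query is fine for a toy dataset; at Big Data scale, implementations usually depend on spatial indexes or approximate search instead.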

30 Data Modeling: Clustering
We distinguish 2 main data modeling categories: Prediction and clustering.
Clustering: Associate and categorize data in groups (clusters) on the basis of specified group characteristics
- Model can be used to split a data set into a desired number of clusters
- Also known as: Unsupervised learning

31 Data Modeling: Clustering
We distinguish 2 main data modeling categories: Prediction and clustering.
Clustering: Commonly used methodologies include:
- K-Means Algorithm: Data are separated into a specified number of clusters around center points called centroids. The objective is to minimize the distance of each cluster's data from the cluster centroid. This is done by minimizing a criterion called inertia (the within-cluster sum of squared distances).
- Hierarchical Clustering: Family of clustering algorithms. The objective is to build nested clusters in the form of a dendrogram tree. The tree root is a unique cluster that gathers all the samples; the leaves are clusters with a single sample.
- Variations / combinations of the above include:
  - Spectral Clustering: Performs a low-dimensional embedding of the affinity matrix between samples; then applies K-Means.
  - Interactive Clustering: First uses K-Means to create small clusters; then applies hierarchical clustering to each one of those.
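The K-Means description can be sketched as the classic assign-then-update loop. To keep the example deterministic, the initial centroids are passed in explicitly; that is a simplification, since practical implementations usually pick starting centroids at random and restart several times. The points and helper names are invented for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    # Inertia: within-cluster sum of squared distances to the centroid.
    inertia = sum(dist2(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, inertia


# Two invented, well-separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.3, 7.9), (7.9, 8.2)]
centroids, labels, inertia = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels, round(inertia, 3))
```

Each iteration can only lower the inertia or leave it unchanged, which is why the loop settles; it settles on a local minimum, though, so the result depends on the starting centroids.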

32 Data Modeling: Contemporary Trends
Aside from the core statistical techniques adopted for data modeling, Big Data analysis also gave rise to some very popular concepts in the field:
- Machine Learning: ML is actually a Computer Science domain, very similar to Computational Statistics. It focuses on using and combining statistical methodologies to solve problems through the use of computers alone.
- Artificial Intelligence: A neighboring concept; relies on machine-only intelligence.
- Deep Learning: A class of ML algorithms that
  - cascade multiple nonlinear processing units for feature transformation and/or extraction
  - learn multiple levels of data features, often (though not only) in an unsupervised manner
  - build a hierarchical representation: higher-level features are derived from lower-level ones
  - take a multiple-level approach: different levels of abstraction form a hierarchy of concepts

33 Data Modeling: Business Applications
Often enough, specific methodologies might be recommended more than others for particular business tasks.
Business Issue | Data Mining Analytical Approaches
Customer Segmentation | Clustering, Factor Analysis, Ranking and Tiering
Propensity to Buy | Induction Trees, Logistic Regression, Neural Nets
Attrition | Induction Trees, Logistic Regression, Neural Nets
Lifetime Value | Net Present Value, Structural Equation Modeling
Purchase Sequence | Association/Affinity and Sequence Analysis, Time Series
Sales Forecasting | Time Series, Neural Nets, Linear Regression
Customer Acquisition and Prospecting | Induction Trees, Logistic Regression, Neural Nets
Profitability Analysis | Activity-based costing, process-based costing
Campaign Effectiveness Assessment | Neural Nets, Rule Induction, Logistic Regression, Discriminant Analysis


35 Big Data Analysis Setup: Foundation
Everything we saw to this point requires a core framework to be built on...
Hardware, including:
- Servers
- Storage devices
- Network infrastructure
- Connectors
- Computing resources

36 Big Data Analysis Setup: Foundation
Everything we saw to this point requires a core framework to be built on...
Software:
- for the engineering foundation to operate on
- to enable data transfers
- for hardware setup
- for communication between hardware, and for processing requests

37 Big Data Analysis Setup: Architecture
Everything we saw to this point requires an architectural strategy. Example of a very simple strategy: I can do everything from the comfort of my multi-core computer!

38 Big Data Analysis Setup: Architecture
Everything we saw to this point requires an architectural strategy. Often in practice, one of the following computing architectures is adopted:
- Side-by-side: A data extract moves from the database to the PC client
- Distributed Computing: The PC client sends requests to, and receives results from, a server or cluster of servers, which works on a data extract from the database
- In-Database: The PC client sends requests to, and receives results from, the database itself

39 Big Data Analysis Setup: Architecture
Everything we saw to this point requires an architectural strategy. In-Database architecture may carry significant speed advantages.
[Figure: Desktop and Server Analytic Architecture (sample data, request, processing, results) compared with In-Database Analytic Architecture. Both panels show a credit-risk decision tree with splits on "Debt < 10% of Income", "Debt = 0%" and "Income > $40K", and leaves "Good Credit Risks" / "Bad Credit Risks". Figure label: "Exponential Performance Improvement"]
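The contrast between extracting data and computing in the database can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here, an assumption made to keep the example self-contained; the table name, columns and rows are invented. The point is how many rows cross the database boundary in each approach.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 15.0), ("west", 10.0)],
)

# Extract approach: pull every row to the client, then aggregate locally.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals = {}
for region, amount in rows:
    total, count = totals.get(region, (0.0, 0))
    totals[region] = (total + amount, count + 1)
client_avg = {region: total / count for region, (total, count) in totals.items()}

# In-database approach: the database aggregates and returns one row per region.
in_db_rows = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"
).fetchall()
in_db_avg = dict(in_db_rows)

print(f"extract: {len(rows)} rows transferred -> {client_avg}")
print(f"in-db:   {len(in_db_rows)} rows transferred -> {in_db_avg}")
conn.close()
```

Both approaches give the same averages; with five rows the difference is invisible, but the number of rows moved grows with the table in the first case and only with the number of groups in the second.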

40 Big Data Analysis Setup: Architecture
Everything we saw to this point requires an architectural strategy. In business applications, the architecture strategy further implies how data are accessed to enable lower cost and higher speed in decision-making.
[Figure: Fragmented vs. integrated data approach. Operational systems and decision makers linked directly, compared with both linked through an Integrated Data Warehouse (IDW)]

41 Big Data Analysis Setup: Tools
Everything we saw to this point requires programming and software. Programming languages and software packages provide the building blocks for implementing suitable algorithms and analysis with Big Data.

42 Big Data Analysis Setup: Tools
Everything we saw to this point requires programming and software. Analytical frameworks and interfaces provide creative programming platforms that facilitate data analysis and understanding.

43 Big Data Analysis Setup: People
Above all, everything we saw to this point requires people at the helm. Sign of the times? The industry no longer needs just a mathematician, a statistician, a programmer, or a general scientist. The new superstar in the age of Big Data is the Data Scientist: a person with a most versatile skillset who performs all-in-one tasks such as
- handling datasets of any size computationally
- possessing statistical prowess and modeling skills
- understanding and programming relational data storage (databases)
- solving problems; visualizing and communicating well


45 In Production: Model Management Aspects
Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however:
- Data refresh: Customer data might be in constant flow. The deliverable must account for streaming data flows and continuous model use.
[Figure: users (Data Scientists, Business Analysts, Marketing, Front-Line Workers, Engineers, Customers / Partners, Executives, Operational Systems) and tool categories (Languages, Math & Stats, Data Mining, Business Intelligence, Applications)]

46 In Production: Model Management Aspects
Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however:
- Scoring, validation, and model health: The model must be kept current. With time, data characteristics might change and prediction may deteriorate. Mechanisms should perform quality checks and support model retraining.
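One possible shape for such a quality-check mechanism is sketched below: track the absolute prediction error over a sliding window and flag the model for retraining once the recent average exceeds a multiple of the error measured at training time. The class name, the window size and the tolerance factor are hypothetical choices made for illustration, not a prescribed method.

```python
from collections import deque


class ModelHealthMonitor:
    """Flags a model for retraining when recent errors drift above a baseline."""

    def __init__(self, baseline_error, tolerance=1.5, window=5):
        self.baseline_error = baseline_error   # error observed at training time
        self.tolerance = tolerance             # allowed multiple of the baseline
        self.errors = deque(maxlen=window)     # most recent absolute errors

    def record(self, predicted, actual):
        """Store the absolute error of one scored record."""
        self.errors.append(abs(predicted - actual))

    def needs_retraining(self):
        """True once a full window of errors averages above the tolerated level."""
        if len(self.errors) < self.errors.maxlen:
            return False
        recent = sum(self.errors) / len(self.errors)
        return recent > self.tolerance * self.baseline_error


monitor = ModelHealthMonitor(baseline_error=1.0, tolerance=1.5, window=3)

# Early on, predictions stay close to the observed values.
for predicted, actual in [(10.0, 10.5), (12.0, 11.0), (9.0, 9.5)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

# Later, the data characteristics change and the errors grow.
for predicted, actual in [(10.0, 14.0), (11.0, 16.0), (12.0, 18.0)]:
    monitor.record(predicted, actual)
drifted = monitor.needs_retraining()

print(healthy, drifted)
```

Requiring a full window before raising the flag avoids reacting to a single bad prediction.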

47 In Production: Model Management Aspects
Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however:
- Promotion: The model may need to circulate across an organization. Different teams might require training to learn about a model.
- Champion model: Analysis may indicate multiple prevailing solutions. Different teams might need to select their own champion model for use among the challenger ones. In addition, this might be a repeatable, periodic process.
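A champion/challenger comparison can be sketched as follows. The selection rule is an assumption made for illustration: a challenger replaces the champion only if its mean absolute error on validation data is lower by at least a minimum relative gain. The models, their names and the data are all invented.

```python
def mean_abs_error(model, validation):
    """Mean absolute error of a callable model over (x, y) validation pairs."""
    return sum(abs(model(x) - y) for x, y in validation) / len(validation)


def pick_champion(champion, challengers, validation, min_gain=0.05):
    """Return the name of the model to use.

    champion is a (name, model) pair; challengers maps names to models.
    The best challenger wins only if it cuts the champion's error by
    at least the min_gain fraction.
    """
    champion_name, champion_model = champion
    champion_error = mean_abs_error(champion_model, validation)
    best_name, best_error = champion_name, champion_error
    for name, model in challengers.items():
        error = mean_abs_error(model, validation)
        if error < best_error:
            best_name, best_error = name, error
    if best_error < champion_error * (1 - min_gain):
        return best_name
    return champion_name


# Invented validation data following y = 2x + 1.
validation = [(float(x), 2.0 * x + 1.0) for x in range(5)]

champion = ("current", lambda x: 2.0 * x)
challengers = {
    "with_intercept": lambda x: 2.0 * x + 1.0,
    "steeper_slope": lambda x: 2.1 * x,
}

selected = pick_champion(champion, challengers, validation)
print(selected)
```

Because the comparison is a plain function of the validation data, it can be rerun on every data refresh, which matches the repeatable, periodic process described on the slide.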

48 In Production: Application Management Aspects
In industrial/commercial settings, a successful model could be converted into an application for broader adoption. Topics pertaining to this case are:
- Application development: Fitting a new application into the existing ecosystem. Effort to retain model functionality and provide users with control over parameters and variables to specify.
- Deployment and extensions: Distributing, maintaining and extending the application.

49 Review: Flow Overview For Big Data Analysis
What do you want to do with Big Data? A conceptual analytical framework.
[Figure: framework diagram. Recoverable labels:
- Model Management: Data Refresh, Scoring / Validation, Champion & Challenger, Health Statistics, Promotion, Version Control
- Application Management: Application Deployment, Application Extensions
- Other labels, in reading order: Edge Systems, Streams, IoT, Workflow Management, Data Integration, Interface, Data Profiling, Provisioning, Lifecycle, Dashboard, Tools, Algorithms, Feature Engineering, ADS, Data Science Lab, Data Science Workbench, Production Analytics, Common Scoring Library, Analytics, Batch / Stream Data Ingest, Modeling, Analytic Data Set Repository, ADS Metadata, Data Stores, Scoring Data Set Repository, Scoring Metadata]


More information

Customer Clustering using RFM analysis

Customer Clustering using RFM analysis Customer Clustering using RFM analysis VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras

More information

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and AI and Visual Analytics: Machine Learning in Business Operations Steven Hillion Senior Director, Data Science Anshuman Mishra Principal Data Scientist DISCLAIMER During the course of this presentation,

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

Overview and Practical Application of Machine Learning in Pricing

Overview and Practical Application of Machine Learning in Pricing Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company)

More information

Virtuoso Infotech Pvt. Ltd.

Virtuoso Infotech Pvt. Ltd. Virtuoso Infotech Pvt. Ltd. About Virtuoso Infotech Fastest growing IT firm; Offers the flexibility of a small firm and robustness of over 30 years experience collectively within the leadership team Technology

More information

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,

More information

Learning Objectives for Data Concept and Visualization

Learning Objectives for Data Concept and Visualization Learning Objectives for Data Concept and Visualization Assignment 1: Data Quality Concept and Impact of Data Quality Summarize concepts of data quality. Understand and describe the impact of data on actuarial

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Big Data Analytics The Data Mining process. Roger Bohn March. 2016 1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material

More information

towards advanced HR Analytics by Arie-Jan Baan and Bram Eigenhuis

towards advanced HR Analytics by Arie-Jan Baan and Bram Eigenhuis towards advanced HR Analytics by Arie-Jan Baan and Bram Eigenhuis Content #1 advanced Data Analytics (?) #2 data Science Process #3 a case study #4 your case #5 Q&A Who is who? and what is your expectation?

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Automate Transform Analyze

Automate Transform Analyze Competitive Intelligence 2.0 Turning the Web s Big Data into Big Insights Automate Transform Analyze Introduction Today, the web continues to grow at a dizzying pace. There are more than 1 billion websites

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

BIG DATA SCIENCE PROFESSIONAL Certification. Big Data Science Professional

BIG DATA SCIENCE PROFESSIONAL Certification. Big Data Science Professional BIG DATA SCIENCE PROFESSIONAL Certification Big Data Science Professional Big Data Science Professional (BDSCP) certifications are formal accreditations that prove proficiency in specific areas of Big

More information

The Emerging Data Lake IT Strategy

The Emerging Data Lake IT Strategy The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,

More information

Data Mining and Analytics. Introduction

Data Mining and Analytics. Introduction Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data

More information

Hybrid Data Platform

Hybrid Data Platform UniConnect-Powered Data Aggregation Across Enterprise Data Warehouses and Big Data Storage Platforms A Percipient Technology White Paper Author: Ai Meun Lim Chief Product Officer Updated Aug 2017 2017,

More information

ACHIEVEMENTS FROM TRAINING

ACHIEVEMENTS FROM TRAINING LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM

More information

Machine Learning in Python. Rohith Mohan GradQuant Spring 2018

Machine Learning in Python. Rohith Mohan GradQuant Spring 2018 Machine Learning in Python Rohith Mohan GradQuant Spring 2018 What is Machine Learning? https://twitter.com/myusuf3/status/995425049170489344 Traditional Programming Data Computer Program Output Getting

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Getting Started with Advanced Analytics in Finance, Marketing, and Operations

Getting Started with Advanced Analytics in Finance, Marketing, and Operations Getting Started with Advanced Analytics in Finance, Marketing, and Operations Southwest Regional Oracle Applications User Group Dan Vlamis February 24, 2017 @VlamisSoftware Vlamis Software Solutions Vlamis

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Oracle Machine Learning Notebook

Oracle Machine Learning Notebook Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com

More information

Data Mining: Approach Towards The Accuracy Using Teradata!

Data Mining: Approach Towards The Accuracy Using Teradata! Data Mining: Approach Towards The Accuracy Using Teradata! Shubhangi Pharande Department of MCA NBNSSOCS,Sinhgad Institute Simantini Nalawade Department of MCA NBNSSOCS,Sinhgad Institute Ajay Nalawade

More information

Clustering algorithms and autoencoders for anomaly detection

Clustering algorithms and autoencoders for anomaly detection Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms

More information

Putting it all together: Creating a Big Data Analytic Workflow with Spotfire

Putting it all together: Creating a Big Data Analytic Workflow with Spotfire Putting it all together: Creating a Big Data Analytic Workflow with Spotfire Authors: David Katz and Mike Alperin, TIBCO Data Science Team In a previous blog, we showed how ultra-fast visualization of

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

Transforming Utility Grid Operations with the Internet of Things

Transforming Utility Grid Operations with the Internet of Things Solution Brief Internet of Things Energy Industry Transforming Utility Grid Operations with the Internet of Things Access key process data in real time to increase situational awareness of grid operations.

More information

GETTING STARTED WITH DATA MINING

GETTING STARTED WITH DATA MINING GETTING STARTED WITH DATA MINING Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIR Forum 2017 Washington, D.C. 1 Using Data

More information

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa AIR/SPSS Professional Development Series Background Covering variety

More information

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course: DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business

More information

data-based banking customer analytics

data-based banking customer analytics icare: A framework for big data-based banking customer analytics Authors: N.Sun, J.G. Morris, J. Xu, X.Zhu, M. Xie Presented By: Hardik Sahi Overview 1. 2. 3. 4. 5. 6. Why Big Data? Traditional versus

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with

More information

8/3/2017. Contour Assessment for Quality Assurance and Data Mining. Objective. Outline. Tom Purdie, PhD, MCCPM

8/3/2017. Contour Assessment for Quality Assurance and Data Mining. Objective. Outline. Tom Purdie, PhD, MCCPM Contour Assessment for Quality Assurance and Data Mining Tom Purdie, PhD, MCCPM Objective Understand the state-of-the-art in contour assessment for quality assurance including data mining-based techniques

More information

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

Ambition Market Insights

Ambition Market Insights The second half of 2017 has seen strong hiring activities, driven by a number of key factors, across the technology sector. Many organisations were embracing technology to make their business more efficient

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

More information

Q1) Describe business intelligence system development phases? (6 marks)

Q1) Describe business intelligence system development phases? (6 marks) BUISINESS ANALYTICS AND INTELLIGENCE SOLVED QUESTIONS Q1) Describe business intelligence system development phases? (6 marks) The 4 phases of BI system development are as follow: Analysis phase Design

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

CONFIDENTLY INTEGRATE VMWARE CLOUD ON AWS WITH INTELLIGENT OPERATIONS

CONFIDENTLY INTEGRATE VMWARE CLOUD ON AWS WITH INTELLIGENT OPERATIONS SOLUTION OVERVIEW CONFIDENTLY INTEGRATE VMWARE WITH INTELLIGENT OPERATIONS VMware Cloud TM on AWS brings VMware s enterprise class Software-Defined Data Center (SDDC) software to the AWS Cloud, with optimized

More information

Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI)

Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI) Think & Work like a Data Scientist with SQL 2016 & R DR. SUBRAMANI PARAMASIVAM (MANI) About the Speaker Dr. SubraMANI Paramasivam PhD., MCT, MCSE, MCITP, MCP, MCTS, MCSA CEO, Principal Consultant & Trainer

More information

SAS Enterprise Miner : What does the future hold?

SAS Enterprise Miner : What does the future hold? SAS Enterprise Miner : What does the future hold? David Duling EM Development Director SAS Inc. Sascha Schubert Product Manager Data Mining SAS International Topics for Discussion: EM 4.2/SAS 9.0 AF/SCL

More information

Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition

Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition What s the BIG deal?! 2011 2011 2008 2010 2012 What s the BIG deal?! (Gartner Hype Cycle) What s the

More information

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

A Systematic Overview of Data Mining Algorithms

A Systematic Overview of Data Mining Algorithms A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a

More information

Big Data Integration BIG DATA 9/15/2017. Business Performance

Big Data Integration BIG DATA 9/15/2017. Business Performance BIG DATA Business Performance Big Data Integration Big data is often about doing things that weren t widely possible because the technology was not advanced enough or the cost of doing so was prohibitive.

More information

Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC

Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC 2018 Storage Developer Conference. Dell EMC. All Rights Reserved. 1 Data Center

More information

Enterprise Data Architect

Enterprise Data Architect Enterprise Data Architect Position Summary Farmer Mac maintains a considerable repository of financial data that spans over two decades. Farmer Mac is looking for a hands-on technologist and data architect

More information

SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines

SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines Boriana Milenova, Joseph Yarmus, Marcos Campos Data Mining Technologies Oracle Overview Support Vector

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata

More information

How a Metadata Repository enables dynamism and automation in SDTM-like dataset generation

How a Metadata Repository enables dynamism and automation in SDTM-like dataset generation Paper DH05 How a Metadata Repository enables dynamism and automation in SDTM-like dataset generation Judith Goud, Akana, Bennekom, The Netherlands Priya Shetty, Intelent, Princeton, USA ABSTRACT The traditional

More information

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

CS535 Big Data Fall 2017 Colorado State University   10/10/2017 Sangmi Lee Pallickara Week 8- A. CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE

More information

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V) Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V) Based on Industry Cases, Live Exercises, & Industry Executed Projects Module (I) Analytics Essentials 81 hrs 1. Statistics

More information