Data Mining. Chapter 2: Basic Statistics. Assoc. Prof. Dr. Xiao-dong Zhu, Business School, University of Shanghai for Science & Technology
1 Chapter 2: Basic Statistics. Business School, University of Shanghai for Science & Technology. nd Semester, Spring 2017
2 Contents of chapter 1 1 recording data using computers
10 DBMS: recording data using computers. Some famous database management systems:
  1. Oracle: Oracle 12c (recent version)
  2. Microsoft: Microsoft SQL Server 2014
  3. IBM: IBM DB (recent version)
  4. MySQL (developed by the Swedish company MySQL AB, now owned by Oracle; 5.5 recent version)
  5. Excel, a good and simple office application for recording data
  6. Others
12 Some famous data analysis software
  1. SPSS
  2. Matlab
  3. WEKA
  4. Tableau
  5. Others (Excel is a good and simple office application for analyzing data)
We will learn the above four tools in later classes and experiments.
19 Some famous programming languages for data analysis
  1. Python
  2. R
  3. Matlab scripts
  4. Java
  5. C#
  6. Others
29 Starting from buying a bicycle
What would you consider when buying a second-hand bike?
  1. Brand (Trek, Raleigh)
  2. Type (road, mountain, racer)
  3. Components (Shimano, no name)
  4. Age
  5. Condition (excellent, good, poor)
  6. Price
  7. Frame size
  8. Number of gears
30 Table: data fields (data variables)
  Brand (Trek, Raleigh); Type (road, mountain, racer); Components (Shimano, no name); Age; Condition (excellent, good, poor); Price; Frame size; Number of gears
31 Variables
Samples are made up of individuals, and all individuals have characteristics. Members of a sample differ on certain characteristics. We call the characteristics that vary amongst individuals variable characteristics, or variables for short.
32 Types of scales
Table: data fields (data variables) with one example record
  Field                                 Example value
  Brand (Trek, Raleigh)                 Trek
  Type (road, mountain, racer)          road
  Components (Shimano, no name)         Shimano
  Age                                   22
  Condition (excellent, good, poor)     Excellent
  Price                                 $500
  Frame size                            45 cm
  Number of gears                       21
34 Types of scales
  Nominal: objects or people are categorized according to some criterion (gender, job category).
  Ordinal: categories are ranked according to some characteristic (income: low, moderate, high).
  Interval: equal distances between units of measure, but no true zero (calendar years, temperature in °C or °F).
  Ratio: an absolute zero and consistent intervals (distance, weight).
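The practical consequence of the four scales is which summaries make sense for a variable. The following is a minimal sketch in Python, using invented bike records (the values and variable names are assumptions for illustration, not data from the slides).

```python
from statistics import mode, median_low, mean

# Hypothetical second-hand bike records (values invented for illustration).
brands = ["Trek", "Raleigh", "Trek", "Trek"]          # nominal
conditions = ["good", "poor", "excellent", "good"]    # ordinal
years = [2012, 2015, 2009, 2014]                      # interval (calendar year)
prices = [500.0, 120.0, 250.0, 400.0]                 # ratio

# Nominal: categories can only be counted, so the mode is the natural summary.
most_common_brand = mode(brands)

# Ordinal: categories have an order, so a median of the ranks is meaningful.
rank = {"poor": 0, "good": 1, "excellent": 2}
label = {value: key for key, value in rank.items()}
median_condition = label[median_low(rank[c] for c in conditions)]

# Interval: differences and means are meaningful, ratios are not.
mean_year = mean(years)

# Ratio: a true zero exists, so "twice as expensive" is a valid statement.
price_ratio = prices[0] / prices[2]

print(most_common_brand, median_condition, mean_year, price_ratio)
```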
35 Scales in variable view of SPSS Figure: Variable View in SPSS tool
37 Population VS Sample Figure: census and sample
39 Population vs. sample
  Population: all the cases to which a researcher wants the estimates to apply. Examples: white mice, light bulb life, students.
  Sample: used because it is normally impossible to study all the members of a population.
  Descriptive statistics simply summarize a sample.
  Inferential statistics generalize from a sample to the wider population.
40 Population & Sample
43 Descriptive vs. inferential statistics: example
  "I cycle about 50 km per week on average." (descriptive statistics)
  "We can expect a lot of rain this time of year." (inferential statistics)
44 Terms of descriptive statistics: minimum, maximum, sum, mean, variance, range, standard deviation, distribution shape (kurtosis, skewness)
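As a rough illustration, most of these terms can be computed with the Python standard library. The weekly distances below are invented for the sketch, and the population forms of variance and standard deviation are an assumed choice.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical weekly cycling distances in km (invented for illustration).
distances = [42, 55, 48, 61, 44]

summary = {
    "minimum": min(distances),
    "maximum": max(distances),
    "sum": sum(distances),
    "mean": mean(distances),
    "range": max(distances) - min(distances),
    "variance": pvariance(distances),          # population variance
    "standard deviation": pstdev(distances),   # square root of the variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual alternatives.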
45 Variance, σ²
  1. Variance is the expectation of the squared deviation of a random variable from its mean; informally, it measures how far a set of (random) numbers is spread out from their mean.
  2. It is often represented by σ².
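Written out, the definition reads as follows. The symbols μ, N and x_i are standard textbook notation introduced here for illustration; they do not appear on the slide.

```latex
% Variance as the expected squared deviation from the mean
\[
  \sigma^2 = \operatorname{Var}(X) = E\!\left[(X - \mu)^2\right],
  \qquad \mu = E[X]
\]
% For a finite population of N values x_1, ..., x_N
\[
  \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2,
  \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\]
```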
46 Standard deviation (SD, σ)
The standard deviation is a measure used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set; a high standard deviation indicates that the data points are spread out over a wider range of values.
47 Kurtosis. Figure: the Pearson type VII distribution with excess kurtosis of infinity (red), 2 (blue), and 0 (black).
48 Skewness. Figure: comparison of mean, median and mode of two log-normal distributions with different skewness.
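The slides give no formulas for these two shape measures. One common moment-based definition can be sketched as below; the choice of population moments and the example data are assumptions, and other software may apply bias corrections that give slightly different values.

```python
from statistics import mean

def central_moment(values, k):
    """k-th central moment: average of (value - mean) ** k."""
    m = mean(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by variance squared, minus 3 (normal = 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]        # invented data, symmetric around 3
right_tailed = [1, 1, 2, 2, 10]    # invented data with one large value

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

A symmetric data set has skewness 0, while a long right tail gives a positive value.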
52 Topics: Bayes theorem, hypothesis testing, Chi-square testing, regression, correlation, ...
54 Basic concepts and formulas
  Posterior probability: P(h_1 | x_i)
  Prior probability: P(h_1)
Bayes theorem:
\[ P(x_i) = \sum_{j=1}^{m} P(x_i \mid h_j)\, P(h_j) \tag{1} \]
\[ P(h_1 \mid x_i) = \frac{P(x_i \mid h_1)\, P(h_1)}{P(x_i)} \tag{2} \]
55 Bayes example
Credit authorizations (hypotheses):
  1. h_1 = authorize purchase
  2. h_2 = authorize after further identification
  3. h_3 = do not authorize
  4. h_4 = do not authorize but contact police
56 Bayes example (cont'd): assign twelve data values for all combinations of credit and income
Table: data values x_i, one row per credit rating and one column per income level
  Excellent   x_1   x_2    x_3    x_4
  Good        x_5   x_6    x_7    x_8
  Bad         x_9   x_10   x_11   x_12
58 Bayes example (cont'd): 10 training data
Table: training data
  ID   Income   Credit      Class   x_i
  1    4        Excellent   h_1     …
  2    …        Good        h_1     …
  3    …        Excellent   h_1     …
  4    …        Good        h_1     …
  5    …        Good        h_1     …
  6    …        Excellent   h_1     …
  7    …        Bad         h_2     …
  8    …        Bad         h_2     …
  9    …        Bad         h_3     …
  10   …        Bad         h_4     x_9
From training data: P(h_1) = 60%; P(h_2) = 20%; P(h_3) = 10%; P(h_4) = 10%.
60 Bayes example (cont'd)
Calculate P(x_i | h_j) and P(x_i):
  P(x_7 | h_1) = 2/6; P(x_4 | h_1) = 1/6; P(x_2 | h_1) = 2/6; P(x_8 | h_1) = 1/6; P(x_i | h_1) = 0 for all other x_i.
Predict the class for x_4:
  Calculate P(h_j | x_4) for all h_j and place x_4 in the class with the largest value.
  Example: P(h_1 | x_4) = P(x_4 | h_1) P(h_1) / P(x_4) = (1/6)(0.6)/0.1 = 1, so x_4 is placed in class h_1.
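The calculation for x_4 can be retraced in a few lines of Python. This is a sketch: the priors and P(x_4 | h_1) = 1/6 come from the slides, while the zero likelihoods for h_2, h_3 and h_4 are an assumption, chosen because the training examples in those classes all have Bad credit (x_4 is an Excellent-credit value) and because they reproduce the stated P(x_4) = 0.1.

```python
from fractions import Fraction

# Prior probabilities P(h_j) taken from the training data.
priors = {
    "h1": Fraction(6, 10),
    "h2": Fraction(2, 10),
    "h3": Fraction(1, 10),
    "h4": Fraction(1, 10),
}

# Likelihoods P(x_4 | h_j). Only the h1 value is stated on the slides;
# the zeros for h2..h4 are an assumption (see the text above).
likelihood_x4 = {
    "h1": Fraction(1, 6),
    "h2": Fraction(0),
    "h3": Fraction(0),
    "h4": Fraction(0),
}

# Equation (1): total probability of observing x_4.
p_x4 = sum(likelihood_x4[h] * priors[h] for h in priors)

# Equation (2): posterior probability of each hypothesis given x_4.
posteriors = {h: likelihood_x4[h] * priors[h] / p_x4 for h in priors}

# Classify x_4 into the hypothesis with the largest posterior.
predicted = max(posteriors, key=posteriors.get)

print(p_x4, posteriors, predicted)
```

Exact fractions are used so that the result matches the hand calculation without rounding error.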
61 What is hypothesis testing?
Hypothesis testing: find a model to explain behavior by creating and then testing a hypothesis about the data.
  H_0: null hypothesis, the hypothesis to be tested.
  H_1: alternative hypothesis.
62 Chi-square statistic
  O: observed value
  E: expected value based on the hypothesis (usually denoted by H_0)
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \tag{3} \]
Example: O = 50, 93, 67, 78, 87 and E = 75 give χ² = 1166/75 ≈ 15.55, and the result is therefore significant.
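The example can be checked numerically. In this sketch the 5% significance level and the degrees of freedom (number of observations minus one) are assumptions, since the slide does not state them; 9.488 is the standard tabulated critical value for 4 degrees of freedom at that level.

```python
# Observed values and the common expected value from the slide's example.
observed = [50, 93, 67, 78, 87]
expected = 75

# Equation (3): sum of squared residuals, each scaled by the expected value.
chi_square = sum((o - expected) ** 2 / expected for o in observed)

# Assumptions for the decision: 5% level, df = number of observations - 1.
degrees_of_freedom = len(observed) - 1
critical_value = 9.488  # tabulated chi-square critical value, df = 4, alpha = 0.05

significant = chi_square > critical_value

print(round(chi_square, 2), degrees_of_freedom, significant)
```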
63 Principle of the chi-square statistic
  1. (O - E) is a residual.
  2. (O - E) represents the degree of deviation of an observed value from the theoretical value. Its shortcoming is that (O - E) may be negative, so the residuals may sum to zero. We therefore use (O - E)².
  3. On the other hand, the size of (O - E) is relative. If E is 10, a residual of 20 is intolerably large; if E is 1000, a residual of 20 looks very small. So before summation, we divide by E and use (O - E)²/E.
65 Principle of the chi-square statistic (cont'd)
  1. If χ² is small, we tend to accept H_0.
  2. If χ² is large, we tend to reject H_0.
  3. How large must χ² be? To decide, we use the χ² distribution.
So the χ² statistic can be applied to hypothesis testing.
67 Regression
  1. Predict future values based on past values.
  2. Linear regression assumes that a linear relationship exists:
     \[ y = c_0 + c_1 x_1 + \cdots + c_n x_n \]
  3. Find the values c_0, ..., c_n that best fit the data.
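For the single-predictor case y = c_0 + c_1 x, "best fit" is commonly taken to mean ordinary least squares. The slides do not name a fitting method, so that choice, the helper name `fit_line` and the data below are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = c0 + c1 * x for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    c1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c0 = y_bar - c1 * x_bar
    return c0, c1

# Invented past values that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

c0, c1 = fit_line(xs, ys)
prediction = c0 + c1 * 6   # predict a future value at x = 6

print(c0, c1, prediction)
```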
69 Correlation
Correlation: examine the degree to which the values of two variables behave similarly.
Correlation coefficient r:
  1. 1 = perfect correlation
  2. -1 = perfect but opposite correlation
  3. 0 = no correlation
  4. 0 < r < 1 or -1 < r < 0 denotes the degree of correlation
Formula:
\[ r = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum (x_i - \bar{X})^2 \sum (y_i - \bar{Y})^2}} \tag{4} \]
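Formula (4) translates directly into code. The following sketch uses a hypothetical helper `pearson_r` and small invented data sets that hit the three reference values 1, -1 and 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r as in formula (4)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data sets for the three reference cases.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # y rises with x
r_opposite = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r([1, 2, 3], [1, 0, 1])             # no linear relationship

print(r_perfect, r_opposite, r_none)
```

The function divides by zero if either variable is constant, because r is undefined in that case.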
70 Demo: SPSS operation will be demonstrated in the next class.
71 Homework Find an article (a paper, or a book) from websites, download it and read it.
More informationCHAPTER 3: Data Description
CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a
More informationChapters 5-6: Statistical Inference Methods
Chapters 5-6: Statistical Inference Methods Chapter 5: Estimation (of population parameters) Ex. Based on GSS data, we re 95% confident that the population mean of the variable LONELY (no. of days in past
More informationChapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data
Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically