User Behaviour and Platform Performance in Mobile Multiplayer Environments


HELSINKI UNIVERSITY OF TECHNOLOGY
Systems Analysis Laboratory

Ilkka Hirvonen 51555K

1 Introduction

As mobile technologies advance and more games and applications are designed for mobile devices, a need has arisen for methods to analyse the data gathered by mobile application platforms. This study discusses this field and gives several examples of how statistical methods can be used to draw conclusions about user behaviour from simple data gathered by server-based multiplayer application platforms. The focus is on multiplayer games.

Due to the nature of the mobile environment, games running on mobile platforms are subject to several restrictions. Because the game device is mobile, the user can be physically at any location. Network failures may afflict the user, and latencies can vary significantly. Compared to fixed networks, latencies are larger and data transfer speeds are lower. The cost of data transfer may vary depending on the operator's tariffs. These factors, among others, need to be taken into account when designing mobile games and applications as well as when analysing data gathered by mobile platforms. There have been studies on game platforms and user behaviour in fixed network environments (see Henderson and Bhatti, 2001; McCoy et al., 2004), but this study shifts the focus to the mobile environment and mobile game devices.

The objective of this study is to find patterns in user behaviour and to give recommendations on how these patterns can be used for marketing purposes and for improving mobile game design, in order to develop games that better serve the identified segments. The objective can be divided into two sub-objectives:

1. Segmentation of users. The importance of this analysis stems from the needs of marketing and game design. Information on user segments can be used directly to focus marketing efforts as well as to design more attractive games that conform to the characteristics of different segments. Users' behavioural patterns are derived from the characteristics of their play sessions by means of simple statistical key figures and autocorrelation analysis.

2. Forecasting future server load. Estimates of future server loads help optimise server capacity and avoid unnecessary performance problems.

First, some useful statistical methods are described. After that, applications of those methods are sketched. The focus is on simplicity of implementation, as the analysing software may be running in real time.

Finally, a case example is presented. The Multi-User Publishing Environment (MUPE) is an open-source application platform for context-aware multiplayer mobile applications (see References). The platform is released under the Nokia Open Source Licence (NOKOS). From a data set logged by MUPE, some key figures are calculated with the intention of segmenting users and of keeping track of, e.g., the cumulative amount of data sent per user and the average latencies the users are experiencing. All this information can be used to develop better games and other multiplayer applications. Moreover, if the available data is adequate and the internal structure of the application is known, conclusions about, e.g., optimal game strategies can be drawn. The data could also be used for forecasting purposes and for analysing the platform's performance; these areas are beyond the scope of this study, but may provide fruitful topics for future research.

1.1 Objectives of segmentation

User segmentation is the process of dividing users into different groups depending on a particular user's position on certain segmentation dimensions, for example how frequently the user uses the mobile application (Kotler and Keller, 2005). Two partly interrelated objectives of segmentation discussed in this study are (i) segmentation for marketing purposes and (ii) segmentation for game design purposes.

For companies that develop or offer mobile games, marketing costs are usually significant. To focus marketing efforts efficiently, it is imperative to have a good understanding of the user segments and their characteristics. One dimension on which users can be segmented is temporal, i.e. for how long, how often and when the users use the applications. Information on the user segments is also useful for game developers: game design is very different if users play a game for one minute ten times a day, compared to a game that is played only a few times a day but for longer at a time.

Answers to these questions can be mined from the data gathered by the application platform. The methods depend on the structure and content of the data; in this study, sets of statistical key figures, autocorrelations and covariances are used.

1.2 Objectives of forecasting

The number of users of mobile applications is not constant throughout the day or across the days of the week, and server loads vary accordingly. Therefore, forecasting the number of users online (or the value of any similar variable that significantly affects server load) may prove useful when deciding how much server capacity should be reserved for the mobile applications. A lack of sufficient server capacity would show up as higher latencies and a generally unsatisfying user experience.

2 Elements of logged data

In this section, some variables that provide useful input for data analysis are proposed. Without the information carried by these variables, relevant results are difficult, if not impossible, to derive. Typically, an application platform writes down data whenever events caused by users or the system occur. For simplicity, it is assumed that the platform (the server) logs data every time a user triggers a method call. When a method call occurs, the platform writes down a set of values of different attributes or variables. By mining this data it should be possible to distinguish some fundamental key elements or objects, at least the different users and the users' play sessions with some internal characteristics.

A user is an entity that uses a certain application running on the platform. Usually users are human beings, though an artificial intelligence could in some cases be considered a user. A play session belongs to a single user. Also in multiplayer games, each user has their own play sessions; naturally, different users' play sessions can overlap. A play session has a beginning and an end, which together determine its duration. During a play session the user causes events, and these events shape the characteristics of the play session in question.

The life cycle of a play session is a series of three states. This way, the start and end criteria of a play session can be defined according to the requirements of the case in question.

    State     Trigger
    START     A certain method call indicating a new play session;
              any method call from a new user
    SESSION   Automatically from the START state
    END       A certain method call indicating the end of the session;
              no method calls in the SESSION state for N minutes (e.g. for 5 minutes);
              loss of connection

Figure 1: Play sessions of four different users. Play sessions have a certain duration, and some of them may overlap with other users' sessions. (Source: Riku Suomela, Nokia Research Center)
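As an illustration of these state transitions, the sketch below splits one user's logged method calls into play sessions using an inactivity timeout (Python; the event format, the function name and the five-minute timeout are assumptions made for illustration, not features of any particular platform).

    from datetime import datetime, timedelta

    # Inactivity that ends a session (the N minutes of the table above; 5 minutes assumed here).
    SESSION_TIMEOUT = timedelta(minutes=5)

    def split_into_sessions(timestamps):
        """Group one user's method-call timestamps into play sessions.

        Returns a list of (start, end) tuples; a new session starts whenever the gap
        between consecutive method calls exceeds SESSION_TIMEOUT.
        """
        sessions = []
        start = prev = None
        for ts in sorted(timestamps):
            if start is None:
                start = prev = ts               # START: first method call from a new user
            elif ts - prev > SESSION_TIMEOUT:
                sessions.append((start, prev))  # END: timeout reached, close the session
                start = prev = ts               # START: this call opens a new session
            else:
                prev = ts                       # SESSION: the session continues
        if start is not None:
            sessions.append((start, prev))
        return sessions

    # Example: three method calls, the last one after a long pause, give two sessions.
    calls = [datetime(2005, 12, 1, 18, 0), datetime(2005, 12, 1, 18, 2),
             datetime(2005, 12, 1, 19, 30)]
    print(split_into_sessions(calls))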

This study is restricted to platforms and contexts where it is logically relevant to split the logged data into users and the play sessions of these users.

2.1 Required variables

For the analysis, each data set stored by the logger should contain at least the values of the following attributes:

- User identification number; identifies the originator of the method call and the play session to which the data set belongs.
- Timestamp of the method call; the time at which the method call was initiated.
- Server processing time; how long it took the server to process the method call.
- Latency; the delay between making the previous method call and the time the user receives a reply from the platform for that method call.
- Size of the message sent by the user.
- Size of the response sent by the server.
- Name of the method called.

Unique identification numbers are used to distinguish different users. Latencies provide evidence of how real-time and fluent the user's game experience has been; long latencies are generally unpleasant to the user. Message sizes indicate how much data has been transferred over the network, which is interesting because (i) it is one of the factors showing how much load there is on the network and (ii) operators providing networking services usually bill according to the amount of data transferred.

It is also assumed that plenty of data sets are available. With very limited data it is not reasonable to perform some of the analyses, or the results might at least be biased. For example, time series analysis is not considered applicable to series shorter than 100 periods.
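For concreteness, one possible in-memory representation of a single logged data set is sketched below (a minimal Python dataclass; the field names are illustrative assumptions rather than a format prescribed by any particular platform).

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class LogRecord:
        """One data set written by the logger for a single method call."""
        user_id: int            # unique identification number of the user
        timestamp: datetime     # when the method call was initiated
        processing_time: float  # server processing time, in seconds
        latency: float          # delay until the user received the reply, in seconds
        request_size: int       # size of the message sent by the user, in bytes
        response_size: int      # size of the response sent by the server, in bytes
        method: str             # name of the method called

    record = LogRecord(user_id=42, timestamp=datetime(2005, 12, 1, 18, 0, 3),
                       processing_time=0.12, latency=1.8,
                       request_size=310, response_size=950, method="move")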

3 Methods of data analysis

3.1 Statistical key figures

A set of simple statistical key figures describing a play session (or a set of play sessions) on a general level could include at least
- the arithmetic mean,
- the variance or standard deviation,
- correlations, and
- minimum and maximum values.

The logged data is divided into finite sequences, one for each of the stored parameters such as latencies, processing times and data packet sizes. The key figures listed above are calculated from these sequences with the formulae presented in Laininen (1998). Arithmetic means give a sequence's average value, e.g. the average length of a play session. Variances or standard deviations show how constant the values in the sequence have been. The correlation coefficient is used to find interdependencies between variables. Minimum and maximum values of a sequence show its range or peak values. Calculated for appropriate variables, these statistical key figures give information on the distinguishing characteristics of users' play sessions, and thereby depict the users' behaviour as well. A short computational sketch is given below.
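As a minimal illustration (plain Python; the latency and message-size values are invented), the key figures of a single sequence can be computed directly with the standard statistics module:

    import statistics

    latencies = [1.2, 0.9, 1.4, 3.8, 1.1]    # latencies of one play session, in seconds (invented)
    sizes = [300, 250, 320, 900, 280]         # corresponding message sizes, in bytes (invented)

    key_figures = {
        "mean": statistics.mean(latencies),
        "stdev": statistics.stdev(latencies),
        "min": min(latencies),
        "max": max(latencies),
        # Correlation between latency and message size (statistics.correlation needs Python 3.10+).
        "corr_latency_size": statistics.correlation(latencies, sizes),
    }
    print(key_figures)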

3.2 Autocorrelations

A time series is a sequence of data points with uniform time intervals, for example daily temperature measurements, the opening prices of a share of stock, or the amount of data transferred per hour over a two-month period. With appropriate methods the future values of such a series can be predicted (Pindyck and Rubinfeld, 1998). In this study, autocorrelation is the correlation of a discrete time series with its time-shifted version. It is usually calculated for several time shifts (lags) and plotted as a bar graph with the lags on the x-axis and the correlations on the y-axis. The time series has to be stationary, meaning that the probability distribution of the random variable of which the series consists does not change over time.

The autocorrelation R(k) of a time series X_t, k being the lag, is calculated with the formula

    R(k) = \frac{E[(X_i - \mu)(X_{i+k} - \mu)]}{\sigma^2},

where \mu and \sigma^2 are the mean and the variance of the series. The formula is similar to the ordinary correlation coefficient, the only difference being that the second variable is replaced with the time-shifted version of X itself. Autocorrelations can be used to identify repeating patterns in a time series: a high correlation at a certain lag indicates a pattern repeating at that interval. A heart rate is a good example of this phenomenon: if the patient's heart rate is 120 beats per minute, there is a peak in the autocorrelation graph at a lag of 0.5 seconds. High but declining values at the beginning of an autocorrelation graph indicate that changes in the original series are rather small and new values are close to (i.e. highly correlated with) the previous ones.

3.3 Time series forecasting

Time series forecasting is an attempt to predict the future values of a time series. To obtain justified estimates of future values, one can use a mathematical model based on the previous data points. The previous data points contain valuable information about the underlying process, and that information can be used to select and fit an appropriate mathematical model. The process of analysing the previous data points and finding the appropriate model is called identification (Pindyck and Rubinfeld, 1998).

There are two widely used classes of forecasting models: autoregressive (AR) models and moving average (MA) models. These two can be combined, producing the family of ARMA models; ARMA(p, q) refers to an ARMA model with an autoregressive part of order p and a moving average part of order q. If needed, the ARMA model can be developed further by adding a seasonal part (SARMA) and an integrated part (SARIMA) to obtain stationarity. However, those models are not covered in this study.

An autoregressive model of order p, AR(p), is simply the sum of a constant c (sometimes omitted), a weighted sum of the p previous values X_{t-i}, and an error term \varepsilon_t:

    X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t.

The autoregressive model is linear and can be fitted to the data at hand with a least squares algorithm. A moving average model of order q, MA(q), linear like the AR(p) model, is written as

    X_t = \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i},

where \varepsilon_t, \varepsilon_{t-1}, ... are error terms. By combining these two models we obtain the ARMA(p, q) model

    X_t = \varepsilon_t + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}.

The coefficients \phi_1, ..., \phi_p and \theta_1, ..., \theta_q are the parameters of the model. The error-minimising values of these parameters are usually found with a least squares algorithm; this is called model fitting. Choosing the correct orders p and q is often crucial for fitting the model; as a rule, the smallest values of p and q that produce an adequate fit to the data are chosen.

In this context seasonality is highly probable: there are clearly more users playing the games during the daytime than at night or very early in the morning. With that characteristic in mind, the data would probably require a seasonal ARIMA model, which uses, in addition to the parameters described above, seasonal components with a seasonal lag S. An educated guess for the length of the season would be 24 (hours). More information on advanced time series forecasting and on fitting the models can be found in the literature. Tools for time series analysis can be downloaded, e.g., from the home page of the U.S. Census Bureau (see References).
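To make these formulas concrete, the sketch below computes sample autocorrelations for lags 1 to 24 and fits an ARMA model to a simulated hourly series; it assumes the NumPy and statsmodels packages (statsmodels' ARIMA class with a zero differencing order gives an ARMA fit), which are a convenient choice rather than anything prescribed by this study.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def autocorrelations(x, max_lag=24):
        """Sample autocorrelations R(k) = E[(X_i - mu)(X_{i+k} - mu)] / sigma^2, k = 1..max_lag."""
        x = np.asarray(x, dtype=float)
        mu, var = x.mean(), x.var()
        return [((x[:-k] - mu) * (x[k:] - mu)).mean() / var for k in range(1, max_lag + 1)]

    # Hourly counts of method calls over 30 days: simulated data with a daily pattern plus noise.
    rng = np.random.default_rng(0)
    hours = np.arange(24 * 30)
    series = 10 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

    print(autocorrelations(series)[23])     # a high value at lag 24 reveals the daily pattern

    # Fit an ARMA(2, 1) model (order=(p, 0, q), i.e. no differencing) and forecast the next day.
    model = ARIMA(series, order=(2, 0, 1)).fit()
    print(model.forecast(steps=24))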

4 Applications of data analysis methods

4.1 User segmentation

Sets of statistical key figures

One of the simplest ways to analyse data stored by a data logger is to implement a feature that calculates sets of statistical key figures; from a programmer's point of view this should not be a time-consuming task. The sets could include key figures calculated either from variables related to users' play sessions (user behaviour) or from variables related to the server's performance. In these cases the figures could be exploited, for example, as follows:

- Arithmetic means and standard deviations (or variances) could be calculated to obtain average values for server processing times, latencies and the sizes of messages sent between the users and the server. Here, e.g. a large deviation of latencies could be a sign of an unreliable network connection.
- Correlations between, e.g., latencies, message sizes and server processing times might reveal useful information about the internal performance of the application. The correlation between the number of users online and latencies and/or server processing times could also provide useful information about the server's capability to handle a larger load.
- Minimum and maximum values can be used to find peak values, for example the largest latencies users have experienced; this value, together with average latencies, can be used as one measure of the users' perceived quality.
- The average number of a user's method calls per play session, or the number of play sessions per day, can be used to classify the user, e.g., as a heavy user or a random user.

Temporal segmentation with autocorrelations

Besides their important role in identifying time series models, autocorrelations can be used on their own to find certain characteristics of user behaviour. This section describes a way to use autocorrelations for temporal segmentation of users. Users are segmented depending on the time of day when they use the application: in the morning, during afternoon coffee breaks, in the evening, or evenly throughout the day.

The number of method calls triggered by a user per hour is used to determine how active the user has been during that time of day.

As stated in previous sections, a time series is a sequence of data points with uniform intervals. The problem in this application is that users do not trigger method calls regularly at uniform time intervals (Figure 2). Therefore, the values of the attributes stored by the data logger cannot be handled as a time series without first manipulating the data. One way to obtain a feasible time series from the data at hand is to count the method calls, or cumulative values of variables, over limited time intervals. For example, the total data transferred per hour or the number of method calls a user triggered in one hour could be used as sources for data points (Figure 3); a sketch of this manipulation is given below.

Figure 2: Uneven distribution of method calls. Method calls occur randomly rather than at uniform time intervals, and thus require data manipulation to form a feasible time series.
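A minimal sketch of this binning step (plain Python; the record objects follow the hypothetical LogRecord fields of Section 2.1 and are an assumption):

    from collections import Counter
    from datetime import datetime

    def hourly_method_call_counts(records, start, hours):
        """Build time series A for each user: the number of method calls per hour.

        `records` is an iterable of objects with user_id and timestamp attributes;
        the result maps user_id -> a list of `hours` integers, one per hour since `start`.
        """
        records = list(records)
        counts = Counter()
        for r in records:
            bucket = int((r.timestamp - start).total_seconds() // 3600)
            if 0 <= bucket < hours:
                counts[(r.user_id, bucket)] += 1
        return {u: [counts[(u, h)] for h in range(hours)]
                for u in {r.user_id for r in records}}

    # A 30-day observation period starting on 1 Dec 2005 gives 24 * 30 = 720 data points per user.
    series_a_by_user = hourly_method_call_counts(records=[], start=datetime(2005, 12, 1), hours=24 * 30)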

Figure 3: Number of method calls as a histogram. The number of method calls during a certain time interval (one hour in this example) has been counted and visualised as a histogram. This data can now be used as input for the analysis methods.

With a small number of users one can draw conclusions visually from such histograms, but a more scalable way is to analyse the data automatically using autocorrelations. For this analysis to be reliable, the amount of data has to be large enough; in practice more than 20 days of data should be enough, and with more data the analysis will be more accurate. The steps to implement a segmentation feature in an analyser application are as follows (a code sketch is given after the list):

1. For each user, count the total number of method calls during a constant time period, for example one hour, over the whole observation period, for example 30 days, thus obtaining a time series with 24 * 30 data points (time series A). The user identification numbers and the timestamps of the method calls are needed to complete this task.

2. For each user, calculate the autocorrelations of time series A for lags from 1 to 24.

3. Analyse the autocorrelations:
   a. If there is a significant peak around lag 24, the user is using the application daily around the same time of day. What is considered a significant peak can be determined e.g. by comparing single autocorrelations to their arithmetic mean or by selecting a predetermined threshold value; in the latter case, e.g. values over 0.50 are considered significant. (See Figure 4.)
   b. If the autocorrelations are nearly the same for all lags, the user is using the application randomly during the day and is segmented as a random user. In this case skip the following steps and return to step 2 to analyse the next user.

4. To find the time of day that caused the high autocorrelation, count the total number of method calls that have occurred during each hour of the day over the whole observation period. The result is a series of 24 data points (time series B).

5. If the values of the data points before noon are larger than the values in the afternoon or in the evening, i.e. if their difference from the arithmetic mean exceeds a certain predetermined threshold, the user is segmented as a morning user (or an afternoon or evening user, respectively).
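The following sketch implements steps 2-5 for one user (Python with NumPy; the 0.50 peak threshold, the morning/afternoon/evening hour boundaries, the time-of-day threshold and the fallback label are illustrative assumptions, and the series is assumed to start at midnight).

    import numpy as np

    def segment_user(series_a, peak_threshold=0.50, tod_threshold=1.2):
        """Classify one user from hourly method-call counts (time series A, 24 * days values)."""
        a = np.asarray(series_a, dtype=float)
        mu, var = a.mean(), a.var()
        r = [((a[:-k] - mu) * (a[k:] - mu)).mean() / var for k in range(1, 25)]  # step 2
        if max(r[22:24]) < peak_threshold:            # step 3b: no clear peak near lag 24
            return "random user"
        series_b = a.reshape(-1, 24).sum(axis=0)      # step 4: calls per hour of day
        parts = {"morning user": series_b[6:12].mean(),    # step 5: compare parts of the day
                 "afternoon user": series_b[12:18].mean(),
                 "evening user": series_b[18:24].mean()}
        label, value = max(parts.items(), key=lambda kv: kv[1])
        # Fallback label when no part of the day clearly dominates (not defined in the text).
        return label if value > tod_threshold * series_b.mean() else "regular user"

    # Example with simulated data: a user active mainly around midday.
    rng = np.random.default_rng(1)
    demo = 5 + 4 * np.sin(2 * np.pi * (np.arange(24 * 30) - 6) / 24) + rng.normal(0, 0.5, 24 * 30)
    print(segment_user(demo))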

Figure 4: Autocorrelations of a user's time series A (from simulated data). The correlation is significantly high around lag 24, which implies that there is a pattern occurring with an interval of 24 hours; in other words, the user seems to be sending method calls regularly around the same time of day.

Deeper inspection of the autocorrelations can reveal more information on the user's behaviour. For example, if the user is using the application twice a day, first around 7-8 a.m. and then again around 10-11 a.m., there would be, in addition to the peak at lag 24, two slightly lower peaks around lags 3 and 21. Finding and deciphering this information may, however, be too complicated on its own, but together with other means of analysis it could give useful results.

Two users' simultaneous play sessions

To find out whether two players' play sessions often overlap, an exact algorithm could be designed. While such an algorithm might be rather easy to design, it may be time-consuming to execute, as the number of pairwise comparisons is (n^2 - n) / 2, where n is the number of users. Calculating covariances, by contrast, is quite a simple task; in that sense, a statistical method is more desirable, albeit slightly less accurate.

The covariance is calculated between two users' time series A (from step 1 in the previous list). If the covariance is significant (larger than a certain threshold), the users often send similar numbers of method calls around the same time of day, which may imply that they are using the application simultaneously to play multiplayer games together. However, there is a risk of false results because of at least two factors:

- The users may play the games with different strategies. User A's strategy might demand sending a lot of method calls, while user B sends fewer method calls. In that case the covariance is small and the simultaneity would be overlooked. Counting play sessions instead of method calls could reduce this risk.
- A high covariance between two users' time series does not automatically imply that they play the game together, even though they are playing around the same time of day.

A combination of statistical analysis and a tailored algorithm, supported by the required built-in features of the platform's data logger, would give optimal results. A minimal covariance check is sketched below.
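A minimal sketch of that covariance check (Python with NumPy; the value of the "significant" threshold is an assumption that would need tuning against real data):

    import numpy as np

    def play_together_candidates(series_by_user, threshold=2.0):
        """Return pairs of users whose hourly method-call series have a large covariance.

        `series_by_user` maps user_id -> time series A (equal-length lists of hourly counts).
        """
        users = sorted(series_by_user)
        pairs = []
        for i, u in enumerate(users):
            for v in users[i + 1:]:
                cov = np.cov(series_by_user[u], series_by_user[v])[0, 1]
                if cov > threshold:
                    pairs.append((u, v, cov))
        return pairs

    # Two users who are active during the same hours produce a large covariance.
    a = [0, 0, 5, 6, 0, 0]
    b = [0, 1, 4, 7, 0, 0]
    print(play_together_candidates({"A": a, "B": b}))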

4.2 Forecasting

Several elements of mobile games and user behaviour can be forecast, for example the amount of data to be transferred in the near future or the number of players online next Saturday evening. As the process of selecting an appropriate model and fitting it is rather complex, it might prove difficult to implement a built-in time series forecasting feature in the analyser. A more practical way to exploit time series forecasting is to use ready-made statistical applications with time series features; their built-in tests can be used to select the model.

If the structure of the time series seems stable over time, it may be appropriate to keep using the same model once it has been identified; in that case it is enough to refit the parameters when needed. If re-selection of the model is unnecessary, it is within the limits of a reasonable amount of programming work to implement only a least squares fitting algorithm that obtains new parameter values, and to run it regularly, for example once a week to get the next week's estimates. More information on least squares algorithms is available in the literature. Such a refitting step is sketched below.
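A minimal sketch of such a refitting step for a pure AR(p) model (Python with NumPy, using ordinary least squares via numpy.linalg.lstsq; this is one convenient choice of fitting routine, not the only one):

    import numpy as np

    def refit_ar(series, p=24):
        """Re-estimate the parameters c, phi_1..phi_p of an AR(p) model by least squares."""
        x = np.asarray(series, dtype=float)
        # Design matrix: a constant column plus the p previous values for each observation.
        rows = [np.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))]
        params, *_ = np.linalg.lstsq(np.array(rows), x[p:], rcond=None)
        return params               # params[0] = c, params[1:] = phi_1..phi_p

    def forecast_ar(series, params, steps=24):
        """Forecast `steps` future values with the fitted AR parameters."""
        history = list(series)
        p = len(params) - 1
        for _ in range(steps):
            history.append(params[0] + np.dot(params[1:], history[-1:-p - 1:-1]))
        return history[-steps:]

    # Weekly use: refit on the latest hourly counts, then forecast the next 24 hours, e.g.
    # params = refit_ar(latest_hourly_counts); print(forecast_ar(latest_hourly_counts, params))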

4.3 Case MUPE

MUPE, the Multi-User Publishing Environment, is an open-source platform for context-aware multiplayer mobile applications, released under the Nokia Open Source Licence (NOKOS). The model described above is customised and used to evaluate the usage of MUPE. The data logger of MUPE stores a line of text every time a user sends a method call. The variables stored are the timestamp, user identification number, server processing time, lag of the previous method call (-1 if the method call was the user's first one), size of the message sent by the user and size of the response message.

Suggested analysis methods

Useful methods of analysis include at least statistical key figures and autocorrelation analysis. There are two goals: the first is to get a view of a single user's play sessions and to be able to compare them to that user's previous play sessions and to other users' play sessions in general. The second is to segment users based on the time of day at which they usually play mobile games.

The set of statistical key figures described in Section 3.1 is directly applicable to case MUPE. For one play session the set could include (a computational sketch follows the list)
- start time, end time and duration,
- total amount of data sent and received during the session,
- highest, lowest and average latency and the standard deviation of latencies,
- highest, lowest and average server processing times and their standard deviations.
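A minimal sketch of computing such a per-session set from a MUPE-style log (plain Python; the whitespace-separated field order of the log line and the dictionary keys are assumptions, since the exact file format is not specified here):

    import statistics
    from datetime import datetime

    def parse_log_line(line):
        """Parse one logged method call; the field order is an assumption, not the MUPE format."""
        ts, user_id, proc_time, lag, sent, received = line.split()
        return {"timestamp": datetime.fromisoformat(ts), "user_id": user_id,
                "processing_time": float(proc_time), "latency": float(lag),
                "sent": int(sent), "received": int(received)}

    def session_key_figures(calls):
        """Key figures for one play session, given its method calls in chronological order."""
        latencies = [c["latency"] for c in calls if c["latency"] >= 0]   # -1 marks a first call
        return {
            "start": calls[0]["timestamp"],
            "end": calls[-1]["timestamp"],
            "duration": calls[-1]["timestamp"] - calls[0]["timestamp"],
            "data_sent": sum(c["sent"] for c in calls),
            "data_received": sum(c["received"] for c in calls),
            "latency_min": min(latencies, default=None),
            "latency_max": max(latencies, default=None),
            "latency_mean": statistics.mean(latencies) if latencies else None,
            "latency_stdev": statistics.stdev(latencies) if len(latencies) > 1 else None,
        }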

By comparing these sets of statistical key figures, game developers can draw conclusions, to some extent, about the users' perceived quality of experience and about the users' behaviour. For all play sessions together, the set could include
- start time, end time and duration of the observation period,
- number of players,
- number of play sessions,
- average number of play sessions per user and average time between play sessions,
- minimum, maximum and average values of latencies, server processing times, etc.,
- average amount of data sent and received per session, and the total amount,
- correlation between the number of open play sessions and server processing times and/or latencies.

Analysing these sets can provide useful information on both the performance of the MUPE platform and the users' behaviour. Autocorrelation analysis and covariance analysis, as described in the previous sections, can be used as such to segment users. However, it is crucial that enough data is available for the analysis: the recommended minimum duration of the observation period is one month. Without enough data the results cannot be considered reliable.

5 Discussion

The methods described in this study have not yet been tested and validated with real data, and they should therefore be considered a suggestion of how this field of study could be approached. Furthermore, autocorrelation and covariance analysis are not exact methods and the results obtained from them are only approximate; this should be kept in mind when exploiting the results.

Time series analysis requires an external application for the model selection and fitting process. The process is rather time-consuming and hard to automate or to build into an analyser application as such, but performed manually every once in a while it may give useful results. Regression analysis could also provide sound estimates of future server load. Most importantly, the selection of the variables that are stored has a strong impact on which analyses can be performed. Many of the calculations could be replaced with built-in features of the server's data logger. For example, keeping a record of online users directly in the server itself would require additional computing capacity, but in that way exact results could be obtained. However, if the data logger is rather simple and all information must be derived from very limited data, the methods described in this study may well give adequate results.

References

Henderson, T. and Bhatti, S. (2001). Modelling User Behaviour in Networked Games. In Proceedings of ACM Multimedia, Ottawa, Canada.

Kotler, P. and Keller, K.L. (2005). Marketing Management, 12th edition. Prentice Hall.

Laininen, P. (1998). Todennäköisyys ja sen tilastollinen soveltaminen [Probability and its statistical application]. Otatieto, book number 586.

McCoy, A., Delaney, D., McLoone, S. and Ward, T. (2004). Towards Statistical Client Prediction: Analysis of User Behaviour in Distributed Interactive Media. In Proceedings of the 5th Game-On International Conference, MS Campus, Reading, UK.

Multi-User Publishing Environment (MUPE), official web site. (Link verified on December 21, 2005.)

Pindyck, R.S. and Rubinfeld, D.L. (1998). Econometric Models and Economic Forecasts. McGraw-Hill, New York.

U.S. Census Bureau. The X-12-ARIMA Seasonal Adjustment Program. (Link verified on December 21, 2005.)
