MATLAB COMPUTATIONAL FINANCE CONFERENCE Quantitative Sports Analytics using MATLAB

MATLAB COMPUTATIONAL FINANCE CONFERENCE 2017
Quantitative Sports Analytics using MATLAB
Robert Kissell, PhD
Robert.Kissell@KissellResearch.com
September 28, 2017

Important Email and Web Addresses
AlgoSports23/MATLAB Competition: Are you smarter than the Algo?
Email: AlgoSports23@gmail.com
Website: AlgoSports23.com
Please check the website for data updates, and contact AlgoSports23@gmail.com for further information.


Quantitative Sports Modeling
Modeling techniques from: Optimal Sports Math, Statistics, and Fantasy

Presentation Outline
- Probability Models
- Rank Sports Teams
- Estimate Winning Probability
- Calculate Winning Margin
- Computing Probability of Beating a Spread
- AlgoSports23/MATLAB Competition: Are you smarter than the Algo!

Transaction Cost Analysis and Algorithmic Trading
A suite of TCA models and optimizers has been fully integrated into MATLAB's Trading Toolbox. These tools are used for algorithmic trading and portfolio management, and include:
- Market Impact Estimation
- Pre-Trade
- Post-Trade
- Trade Schedule Optimization
- Liquidation Cost Analysis
- Portfolio Optimization with TCA
Various libraries are available: a full suite of TCA libraries and MI (market impact) data can be accessed upon request. Contact: info@kissellresearch.com or Robert.Kissell@KissellResearch.com

Optimal Sports Math, Statistics, and Fantasy
Key items addressed include:
- Accurately rank sports teams
- Compute winning probability
- Demystify the black-box world of computer models
- Provide insight into the BCS and RPI selection process
- Select the optimal mix of players for a fantasy league competition
- Evaluate player skill and forecast future player performance
- Select team rosters
- Assist in salary negotiation
- Determine Hall of Fame eligibility
Sabermetrics on Steroids!

What is Quantitative Finance?
Quantitative Finance is the application of methods and analyses from the different sciences to solve financial problems. These include: Math, Statistics, Physics, Engineering, Economics, Computer Science, Biology, Psychology, Business, etc. Quantitative Finance is all about the proper use of the Scientific Method and drawing statistically significant conclusions.


Scientist or Engineer
A Scientist is someone who loves surprises. A surprise is an opportunity to learn and make further advancements. The goal is to learn, improve, and progress.
An Engineer is someone who hates surprises. Surprises are usually an indication that something has failed or gone wrong, and they often result in a loss or a slowing of progress.

What about a Quant?
A Quant is someone who learns from a proper application of the scientific method by finding scientific surprises and profit opportunities. Quants go to great lengths to learn the cause of these surprises and to ensure that these relationships are statistically significant. Quants then seek to implement these scientific surprises without suffering any engineering surprises and losses.


The Scientific Method in Practice
- Scientist: Data, Data, Data → Statistically Significant Conclusion
- Attorney: Desired Outcome → Find supporting data (Data, Data, Data) → Data Mining
- Doctor: Data? Data? Data? → Educated Guess → Test Data → Worst-Case Scenario?


Moral of the Story: Be a Scientist! Don't be that Anti-Scientist!

Quantitative Sports Modeling

What is Quantitative Sports Modeling?
The application of quantitative tools and analytics, and sound scientific methods, to sports-related problems and questions. Quantitative sports modeling uses the same tools as quantitative finance: mathematics, statistics, engineering, machine learning, economics, business, etc. Sports modeling is based on the same framework as quantitative finance, but solves a different set of problems.

What do we want to solve?
- Expected Winning Team
- Probability of Winning
- Expected Winning Margin
- Probability of Beating a Specified Margin
- Future Player Performance
- Roster of Players (Best Set of Complementary Players)
- Best Mix of Players given Opponent
- Salaries & Salary Negotiation

Sports Modeling Data: What we want to Predict (LHS)
- Win/Loss
- Win Margin
- Probability of winning by more than X points
- Player Statistics (Fantasy Sports)
- Evaluating Player Ability
- Roster Selection
- Salary and Salary Negotiations
- Line-up and Match-ups
- Player Trades
- Hall of Fame Selection

Sports Modeling Data: Explanatory Factors Data (RHS)
- Win/Loss Result
- Game Scores
- Game Data
- Team Statistics (AVG, OBP, ERA, HR, Comp. Ratio)
- Venue Location (Home Field Advantage)
- Momentum
- Players, Injuries
- Career Statistics
- Salary
- Age
- Teammates & Roster
- Principal Component Analysis

Different Sports Prediction Models
- Probability Models
- Non-Linear Regression
- Non-Parametric Statistics
- Neural Networks / Machine Learning
Sabermetrics on Steroids!

Head-to-Head Competitions: How do we Rank Teams?
[Figure: head-to-head results among teams A-F] Ranking: A; B & C; D & E; F
[Figure: head-to-head results among teams A, B, C] Ranking: A; B; C
[Figure: teams A-F plus team G] Possible rankings: A & G; B & C; D & E; F, or A; B & C & G; D & E; F
[Figure: teams A-F plus team H] Possible rankings: A; B & C; D & E; F & H, or A; B & C; D & E & H; F

Sports Models To Discuss Today

Probability Models: Probability(X > Y)
- Power Function: $\frac{\lambda_x}{\lambda_x + \lambda_y}$
- Logit Regression: $b_0 + b_h - b_a = \ln\left(\frac{F(z)}{1 - F(z)}\right)$
In probability models, the LHS variable lies in (0,1)!

Power Function

The Power Function is derived from the Exponential Distribution. Let $f(x) \sim \lambda_x e^{-\lambda_x t}$ and $f(y) \sim \lambda_y e^{-\lambda_y t}$. Then
$Prob(x > y) = \frac{\lambda_x}{\lambda_x + \lambda_y}$, where $\lambda_k$ = Team k Rating.
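One standard way to see where this formula comes from, under an assumed reading that the slide does not spell out: treat $T_x$ and $T_y$ as independent exponential waiting times with rates $\lambda_x$ and $\lambda_y$, and read "x > y" as "team x's event happens first", i.e. $T_x < T_y$. Conditioning on $T_x = t$ gives:

```latex
P(T_x < T_y) = \int_0^\infty \lambda_x e^{-\lambda_x t}\, P(T_y > t)\, dt
             = \int_0^\infty \lambda_x e^{-\lambda_x t}\, e^{-\lambda_y t}\, dt
             = \frac{\lambda_x}{\lambda_x + \lambda_y}
```

Under the same reading, adding a home-field rate $\lambda_0$ to the home team's rate gives the home-field version $\frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$.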

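Power Function with Home Field Advantage
Let X be the Home Team: $Prob(X > Y) = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
Let Y be the Away Team: $Prob(Y > X) = \frac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0}$
where $\lambda_k$ = Team k Rating and $\lambda_0$ = Home Field Advantage (HFA)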

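Power Function: Solving Parameters
Function: $G = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$ if the home team wins the game, and $G = \frac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0}$ if the away team wins the game
Maximize $L = \prod_i G_i$, or equivalently $\log L = \sum_i \log G_i$
Solve using Maximum Likelihood Estimates (MLE)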

Power Function: Estimate Spread
Run a second regression: $Spread = d_0 + d_1 \cdot Probability$
Results: $d_0$, $d_1$, $sey$ (the regression standard error)

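MATLAB: Solving Power Function Parameters
% Power Function Model
% num   = matrix of winning team and location (HFA column set if the home team won)
% denom = matrix of all teams in each game, including HFA
% b0, lb, ub, options = starting point, bounds and fmincon options (set up beforehand)
[b,fval,exitflag,output] = fmincon(@(b) mypower(b,num,denom), ...
    b0,[],[],[],[],lb,ub, ...
    [], ...
    options);
exitflag

function f = mypower(b,num,denom)
    % Negative log-likelihood: sum of -log(G_i) over all games
    z = (num*b)./(denom*b);
    f = -sum(log(z));
end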

Steps to Solve Power Function
1. Set up the objective function: estimate team ratings using MLE.
2. Compute winning probabilities using the Power Function formula.
3. Run a regression of the home team win margin (Spread) as a function of the predicted home team winning probability (Prob): $Spread = d_0 + d_1 \cdot Prob$.
This provides:
1) Probability that the Home Team wins the game
2) Expected Home Team win margin
3) Team rankings based on the model parameter (from highest to lowest)
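The three steps can be sketched end to end in base MATLAB. This is a minimal illustration under stated assumptions, not the author's implementation: the `games` matrix is hypothetical toy data, `fminsearch` over log-ratings stands in for the bounded `fmincon` call, and team 1's rating is pinned at 1 because scaling every $\lambda$ (including $\lambda_0$) by the same constant leaves every probability unchanged.

```matlab
% Hypothetical toy data: one row per game, [homeTeam awayTeam homePts awayPts]
games = [1 2 24 17;  2 3 21 20;  3 1 27 24;  1 3 31 10;
         2 1 13 20;  3 2 17 23;  4 1 10 28;  2 4 14 17;
         4 3 20 13;  3 4 30 21;  1 4 35 14;  4 2 16 24];
nTeams = 4;
nGames = size(games, 1);
homeWon = games(:,3) > games(:,4);

% num:   winning team's column, plus the HFA column (last) when the home team won
% denom: both teams' columns plus the HFA column
num   = zeros(nGames, nTeams + 1);
denom = zeros(nGames, nTeams + 1);
for g = 1:nGames
    hT = games(g,1); aT = games(g,2);
    denom(g, [hT aT nTeams+1]) = 1;
    if homeWon(g)
        num(g, [hT nTeams+1]) = 1;
    else
        num(g, aT) = 1;
    end
end

% Step 1: MLE. Ratings must be positive, so optimise over log-ratings;
% team 1 is pinned at lambda = 1 (the model is scale invariant).
toLambda = @(theta) exp([0; theta(:)]);          % [lambda_1 .. lambda_n; lambda_0]
nll = @(theta) -sum(log((num*toLambda(theta)) ./ (denom*toLambda(theta))));
theta0 = zeros(nTeams, 1);                        % start with all ratings equal
opts = optimset('MaxFunEvals', 1e4, 'MaxIter', 1e4, 'TolX', 1e-8, 'TolFun', 1e-10);
[thetaHat, fval] = fminsearch(nll, theta0, opts);
lambda  = toLambda(thetaHat);
ratings = lambda(1:nTeams);
lambda0 = lambda(end);

% Step 2: home-team win probability for every game
h = games(:,1); a = games(:,2);
prob = (ratings(h) + lambda0) ./ (ratings(h) + ratings(a) + lambda0);

% Step 3: regress the home-team margin on the predicted probability
spread = games(:,3) - games(:,4);
Xreg = [ones(nGames,1) prob];
d = Xreg \ spread;                                % d(1) = d0, d(2) = d1
resid = spread - Xreg*d;
sey = sqrt(sum(resid.^2) / (nGames - 2));         % regression standard error

% Rank teams from highest to lowest rating
[~, ranking] = sort(ratings, 'descend');
```

Pinning one rating is only one way to remove the scale freedom; bounds as in the `fmincon` call, or normalising the ratings after the fit, would serve the same purpose.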

Logit Regression

Logit Regression Model
Start with the Logistic Distribution Function: $\frac{1}{1 + \exp(-(b_0 + b_h - b_a))} = z_1$
- $s$ = Home Pts - Away Pts = Home Team Spread, $(-\infty, +\infty)$
- $z = \frac{s - avg(s)}{stdev(s)}$, $(-\infty, +\infty)$
- $z_1 = F(z) = \text{normcdf}(z)$, $(0,1)$

Logit Regression Model
We transform the logistic function into the logit regression: $b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$
- $s$ = Home Team Spread, $(-\infty, +\infty)$
- $z = \frac{s - avg(s)}{stdev(s)}$, $(-\infty, +\infty)$
- $z_1 = F(z) = \text{normcdf}(z)$, $(0,1)$

Steps to Solve Logit Spread Regression (Part 1)
1. Calculate the LHS spread value: $s$ = Home Team Spread, $(-\infty, +\infty)$; $z = \frac{s - avg(s)}{stdev(s)}$, $(-\infty, +\infty)$; $z_1 = F(z) = \text{normcdf}(z)$, $(0,1)$.
2. Solve the parameters by OLS: $b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$.
3. Estimate the Home Team win margin: $z_1 = \frac{1}{1 + \exp(-(b_0 + b_h - b_a))}$, $z = \text{norminv}(z_1)$, $\hat{s} = z \cdot stdev(s) + avg(s)$.

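Steps to Solve Logit Spread Regression (Part 2)
1. Run a second regression, Actual Spread = $d_0 + d_1 \cdot$ Estimated Spread: $Y = d_0 + d_1 \hat{s}$, giving $d_0$, $d_1$ and $sey$ (the regression standard error).
2. Compute the Home Team win probability: $Prob(Spread > 0) = Prob(Y > 0)$, with $Y \sim N(d_0 + d_1 \hat{s},\ sey)$.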
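Both parts can be sketched in base MATLAB. This is a hedged illustration, not the author's code: the `games` matrix is hypothetical toy data, `normcdf`/`norminv` are replaced by equivalent `erf`/`erfinv` expressions so no toolbox is needed, and because the +1/-1 team columns are linearly dependent, `pinv` is used to pick the minimum-norm solution (team ratings summing to zero), which is one choice of normalisation among several.

```matlab
% Standard normal CDF and inverse CDF from base-MATLAB erf/erfinv
Phi    = @(x) 0.5 * (1 + erf(x / sqrt(2)));      % same as normcdf(x)
PhiInv = @(p) sqrt(2) * erfinv(2*p - 1);          % same as norminv(p)

% Hypothetical toy data: one row per game, [homeTeam awayTeam homePts awayPts]
games = [1 2 24 17;  2 3 21 20;  3 1 27 24;  1 3 31 10;
         2 1 13 20;  3 2 17 23;  4 1 10 28;  2 4 14 17;
         4 3 20 13;  3 4 30 21;  1 4 35 14;  4 2 16 24];
nTeams = 4;
nGames = size(games, 1);

% Part 1: transform the spread into the LHS variable
s     = games(:,3) - games(:,4);                  % home-team spread
mu    = mean(s);
sigma = std(s);
z     = (s - mu) / sigma;
z1    = Phi(z);                                   % in (0,1)
y     = log(z1 ./ (1 - z1));                      % logit transform

% Design matrix: intercept b0, then +1 for home team, -1 for away team
X = zeros(nGames, nTeams);
for g = 1:nGames
    X(g, games(g,1)) =  1;
    X(g, games(g,2)) = -1;
end
A = [ones(nGames,1) X];
b = pinv(A) * y;                                  % minimum-norm OLS solution
b0 = b(1);
teamRating = b(2:end);

% Estimated home-team margin: invert the logistic, then the z-score
eta   = A * b;                                    % b0 + b_h - b_a per game
z1Hat = 1 ./ (1 + exp(-eta));
sHat  = PhiInv(z1Hat) * sigma + mu;

% Part 2: second regression of actual on estimated spread
B = [ones(nGames,1) sHat];
d = B \ s;                                        % d(1) = d0, d(2) = d1
resid = s - B*d;
sey = sqrt(sum(resid.^2) / (nGames - 2));

% Home win probability for a hypothetical matchup: team 1 at home vs team 4
etaNew   = b0 + teamRating(1) - teamRating(4);
sNew     = PhiInv(1 / (1 + exp(-etaNew))) * sigma + mu;
yHat     = d(1) + d(2) * sNew;
pHomeWin = Phi(yHat / sey);                       % Prob(Y > 0), Y ~ N(yHat, sey)
```

Dropping one team's column instead of using `pinv` would fix that team as a reference with rating 0; the estimated spreads and probabilities are the same either way.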

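MATLAB: Logit Regression
% Logit Regression
% s = home team win margin (column vector, one entry per game)
%   s > 0: home team won the game by s
%   s < 0: home team lost the game by |s|
% X = matrix of games, home team = +1, away team = -1
%   (the team columns are linearly dependent because each row sums to zero;
%    dropping one team's column sets that team as the reference with rating 0)
z = zscore(s);                    % (s - mean(s)) / std(s)
Finv = normcdf(z);                % values in (0,1)
y = log(Finv./(1-Finv));          % element-wise logit transform
whichstats = {'beta','tstat','r','yhat','mse','rsquare'};
mystats = regstats(y,X,'linear',whichstats);
beta = mystats.tstat.beta;
beta = [beta(2:end); beta(1)];    % team ratings first, intercept b0 last
TeamRating = beta;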

NFL

NFL Data: Only Three Weeks of Games (47 Games)


Power Function: Estimating Spreads
$prob = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
$spread = d_0 + d_1 \cdot prob$

NFL - Power Function
Estimating Home Team Win Probability: $prob = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
Estimating Home Team Spread: $s = d_0 + d_1 \cdot prob = -12.601 + 28.154 \cdot prob$

Example: Power Function
New England (Home) vs. Carolina (Away): New England = 28.954, Carolina = 5.1099, HFA = 0.01
$prob = \frac{28.954 + 0.01}{28.954 + 5.1099 + 0.01} = 85\%$
Estimating Home Team Spread: $s = -12.601 + 28.154 \cdot 0.85 = +11.3$ (need to adjust)
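The New England vs. Carolina arithmetic can be checked in a few lines. The ratings and regression slope are the values reported on the slide; the intercept is taken as -12.601, which is an inference: a minus sign appears to have been lost in extraction, and -12.601 is the value that reproduces the slide's +11.3 result.

```matlab
lambdaNE  = 28.954;   % New England rating (home)
lambdaCAR = 5.1099;   % Carolina rating (away)
lambda0   = 0.01;     % home field advantage

probHome = (lambdaNE + lambda0) / (lambdaNE + lambdaCAR + lambda0);
probAway = lambdaCAR / (lambdaNE + lambdaCAR + lambda0);

% Spread regression coefficients (intercept sign inferred, see above)
d0 = -12.601;
d1 = 28.154;
spreadHome = d0 + d1 * probHome;

fprintf('P(home win) = %.3f, P(away win) = %.3f, spread = %+.1f\n', ...
        probHome, probAway, spreadHome);
% prints: P(home win) = 0.850, P(away win) = 0.150, spread = +11.3
```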

Logit Regression: Estimating Spreads
Est. Spread $= b_0 + b_h - b_a$
Act. Spread $= d_0 + d_1 \cdot$ Est. Spread

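NFL Logit Regression
Estimating Home Team Win Probability: $\ln\left(\frac{z_1}{1 - z_1}\right) = b_0 + b_h - b_a$
Estimating Home Team Spread: $Y$ (Actual Spread) $= d_0 + d_1 \cdot \hat{s}$ (Estimated Spread), giving $d_0$, $d_1$, $sey$
$Prob(Y > 0) = 1 - \text{normcdf}(0, \hat{Y}, sey)$, where $\hat{Y} = d_0 + d_1 \hat{s}$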


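Example: Logit Regression
New England (Home) vs. Carolina (Away): New England = 1.0079, Carolina = 0.4869, HFA = -0.0592
Estimating Home Team Spread: $z_1 = \frac{1}{1 + \exp(-(1.0079 - 0.4869 - 0.0592))} \approx 0.613$, which maps back through $z = \text{norminv}(z_1)$ and $s = z \cdot stdev(s) + avg(s)$ to $s = +6.7$
Estimating Home Team Win Probability: $p = Prob(Y > 0) = 74\%$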
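The first stage of this logit example can be reproduced from the reported coefficients alone. This sketch stops at $z$: the final +6.7 also needs the sample mean and standard deviation of the spreads, and the 74% needs $d_0$, $d_1$ and $sey$, none of which appear on the slide. `norminv` is written with base-MATLAB `erfinv` so the snippet needs no toolbox.

```matlab
b_h = 1.0079;    % New England rating (home)
b_a = 0.4869;    % Carolina rating (away)
b0  = -0.0592;   % home-field term, as reported

eta = b0 + b_h - b_a;              % linear predictor, 0.4618
z1  = 1 / (1 + exp(-eta));         % logistic function, about 0.613
z   = sqrt(2) * erfinv(2*z1 - 1);  % norminv(z1), about 0.288

% Next step (not reproducible here): s = z*stdev(s) + avg(s),
% using the mean and standard deviation of the observed spreads.
fprintf('eta = %.4f, z1 = %.3f, z = %.3f\n', eta, z1, z);
```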

NFL - Predictions

NCAA College Football

College Football: Only Four Weeks of Games (286 Games), Games with Division I FBS Teams Only

NCAA Football: Only Four Weeks of Games

NCAA Football - FBS: Model Results

NCAA Football - FBS: Algorithmic Rankings (after 4 weeks)

NCAA Football - FBS: Week 5 Predictions (Part 1)

NCAA Football - FBS: Week 5 Predictions (Part 2)

AlgoSports23/MATLAB Competition


AlgoSports23 / MATLAB Competition Are you Smarter than the Algo! Can you Beat the Algo!

AlgoSports23 / MATLAB Competition Two Important Emails: Robert.Kissell@KissellResearch.com AlgoSports23@gmail.com

AlgoSports23 / MATLAB Competition
Rules of the Competition:
- All analysis & programming in MATLAB
- Game results data will be posted weekly
- Game prediction file will be posted weekly
- Return model predictions by the specified date
- The top 23 performing algorithms each week will be included in the AlgoSports23 Computer Rankings and Predictions
National Media Attention! Are you smarter than the Algo?

AlgoSports23 / MATLAB Competition
Your program and submission need to include the following:
1) Ranking of teams
2) Prediction of the home team winning margin for all games in a week
Models are measured based on:
1) RMSE
2) Avg Difference
3) Number of Wins
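The three scoring measures are not defined on the slide, so the following is only a sketch under assumed definitions: RMSE over predicted vs. actual home margins, "Avg Difference" taken as the mean absolute difference, and "Number of Wins" taken as the count of games where the predicted and actual margins have the same sign (the predicted winner was correct). The prediction and result vectors are hypothetical.

```matlab
pred   = [ 7.5; -3.0; 10.0;  1.5; -6.0];   % hypothetical predicted home margins
actual = [10.0;  3.0;  7.0; -2.0; -9.0];   % hypothetical actual home margins

err     = pred - actual;
rmse    = sqrt(mean(err.^2));               % root mean squared error
avgDiff = mean(abs(err));                   % assumed: mean absolute difference
nWins   = sum(sign(pred) == sign(actual));  % assumed: correct winner picked

fprintf('RMSE = %.3f, Avg Difference = %.2f, Wins = %d\n', rmse, avgDiff, nWins);
```

Under this sign-based definition a predicted margin of exactly 0 never counts as a win unless the game also ends level; the competition's actual tie rule is not stated.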

AlgoSports23 / MATLAB Competition
The top 23 performing algorithms each week will be included in the AlgoSports23 Computer Rankings and Predictions! National Media Attention! Bragging Rights!