MATLAB COMPUTATIONAL FINANCE CONFERENCE 2017 Quantitative Sports Analytics using MATLAB Robert Kissell, PhD Robert.Kissell@KissellResearch.com September 28, 2017
Important Email and Web Addresses AlgoSports23/MATLAB Competition Are you smarter than the Algo? Email: AlgoSports23@gmail.com Website: AlgoSports23.com Please check the website for data updates, and contact AlgoSports23@gmail.com for further information.
Quantitative Sports Modeling. Modeling techniques from: Optimal Sports Math, Statistics, and Fantasy. Presentation outline: Probability Models; Rank Sports Teams; Estimate Winning Probability; Calculate Winning Margin; Compute Probability of Beating a Spread; AlgoSports23/MATLAB Competition. Are you smarter than the Algo?
Transaction Cost Analysis and Algorithmic Trading. A suite of TCA models and optimizers has been fully integrated into MATLAB's Trading Toolbox. These tools are used for algorithmic trading and portfolio management. They include: Market Impact Estimation; Pre-Trade; Post-Trade; Trade Schedule Optimization; Liquidation Cost Analysis; Portfolio Optimization with TCA. Various libraries are available: access to a full suite of TCA libraries and MI data is available upon request. Contact: info@kissellresearch.com or Robert.Kissell@KissellResearch.com
Optimal Sports Math, Statistics, and Fantasy. Key items addressed include: accurately rank sports teams; compute winning probability; demystify the black-box world of computer models; provide insight into the BCS and RPI selection process; select the optimal mix of players for a fantasy league competition; evaluate player skill and forecast future player performance; select team rosters; assist in salary negotiation; determine Hall of Fame eligibility. Sabermetrics on Steroids!
What is Quantitative Finance? Quantitative Finance is the application of methods and analyses from the different sciences to solve financial problems. This includes: Math, Statistics, Physics, Engineering, Economics, Computer Science, Biology, Psychology, Business, etc. Quantitative Finance is all about proper use of the Scientific Method and drawing statistically significant conclusions.
Scientist or Engineer? A Scientist is someone who loves surprises. A surprise is an opportunity to learn and make further advancements. The goal is to learn, improve, and progress. An Engineer is someone who hates surprises. Surprises are usually an indication that something has failed or gone wrong, and they often result in a loss or a slowing of progress.
What about a Quant? A Quant is someone who learns from a proper application of the scientific method by finding Scientific surprises and profit opportunities. Quants go to great lengths to learn the cause of these surprises and to ensure that these relationships are statistically significant. Quants then seek to implement these scientific surprises without suffering any Engineering surprises and losses.
The Scientific Method in Practice. Scientist: Data, Data, Data; Statistically Significant Conclusion. Attorney: Data, Data, Data; Desired Outcome; Find supporting data; Data Mining. Doctor: Data? Data? Data?; Educated Guess; Test Data; Worst-Case Scenario?
Moral of the Story: Be a Scientist! Don't be that Anti-Scientist!
Quantitative Sports Modeling
What is Quantitative Sports Modeling? The application of quantitative tools and analytics, and sound scientific methods, to sports-related problems and questions. Quantitative sports modeling uses the same tools as quantitative finance and draws on mathematics, statistics, engineering, machine learning, economics, business, etc. Sports Modeling is based on the same framework as Quantitative Finance, but solves a different set of problems.
What do we want to solve? Expected Winning Team Probability of Winning Expected Winning Margin Probability of Beating a Specified Margin Future Player Performance Roster of Players (Best set of Complementary Players) Best Mix of Players given Opponent Salaries & Salary Negotiation
Sports Modeling Data: What we want to Predict (LHS) Win/Loss Win Margin Probability of winning by more than X points Player Statistics (Fantasy Sports) Evaluating Player Ability Roster Selection Salary and Salary Negotiations Line-up and Match-ups Player Trades Hall of Fame Selection
Sports Modeling Data: Explanatory Factors Data (RHS) Win/Loss Result Game Scores Game Data Team Statistics (AVG, OBP, ERA, HR, Comp. Ratio) Venue Location (Home Field Advantage) Momentum Players, Injuries Career Statistics Salary Age Teammates & Roster Principal Component Analysis
Different Sports Prediction Models Probability Models Non-Linear Regression Non-Parametric Statistics Neural Networks / Machine Learning Sabermetrics on Steroids!
Head-to-Head Competitions: How do we Rank Teams? [Diagram of head-to-head results among teams A, B, C, D, E, F] Ranking: A; B & C; D & E; F
Head-to-Head Competitions: How do we Rank Teams? [Diagram of head-to-head results among teams A, B, C] Ranking: A, B, C
Head-to-Head Competitions: How do we Rank Teams? [Diagram of head-to-head results among teams A, B, C, D, E, F, with team G added] Ranking: A & G; B & C; D & E; F. Or ranking: A; B & C & G; D & E; F
Head-to-Head Competitions: How do we Rank Teams? [Diagram of head-to-head results among teams A, B, C, D, E, F, with team H added] Ranking: A; B & C; D & E; F & H. Or ranking: A; B & C; D & E & H; F
Sports Models To Discuss Today
Probability Models: Probability (X > Y). Power Function: $\text{Prob}(X > Y) = \frac{\lambda_x}{\lambda_x + \lambda_y}$. Logit Regression: $b_0 + b_h - b_a = \ln\left(\frac{F^{-1}(z)}{1 - F^{-1}(z)}\right)$. In probability models, the LHS variable lies in (0,1)!
Power Function
Power Function. The Power function is derived from the Exponential Distribution. Let $f_x \sim \lambda_x e^{-\lambda_x t}$ and $f_y \sim \lambda_y e^{-\lambda_y t}$. Then $\text{Prob}(x > y) = \frac{\lambda_x}{\lambda_x + \lambda_y}$, where $\lambda_k$ = Team $k$ Rating.
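Power Function with Home Field Advantage. Let X be the Home Team: $\text{Prob}(X > Y) = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$. Let Y be the Away Team: $\text{Prob}(Y > X) = \frac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0}$. Here $\lambda_k$ = Team $k$ Rating and $\lambda_0$ = Home Field Advantage (HFA).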
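Power Function: Solving Parameters. Function: $G = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$ if the home team wins the game; $G = \frac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0}$ if the away team wins the game. Maximize $L = \prod_i G_i$, or equivalently maximize $\log L = \sum_i \log G_i$. Solve using Maximum Likelihood Estimates (MLE).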
Power Function: Estimate Spread. Run a second regression: $\text{Spread} = d_0 + d_1 \cdot \text{Probability}$. Results: $d_0$, $d_1$, sey.
MATLAB Solving Power Function Parameters
% Power Function Model
% num   = matrix of winning team and location (HFA if at home)
% denom = matrix of all teams including HFA
% b0, lb, ub, and options must be defined before this call
[b,fval,exitflag,output] = fmincon(@(b) mypower(b,num,denom), ...
    b0,[],[],[],[],lb,ub, ...
    [], ...
    options);

function f = mypower(b,num,denom)
% Negative log-likelihood of the observed game outcomes
z = (num*b)./(denom*b);
f = -sum(log(z));
end
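A minimal sketch of how the num and denom matrices might be built and passed to fmincon. The six-game, three-team schedule, the starting values, and the bounds are hypothetical illustration values, not data from the presentation. The objective is written as an anonymous function equivalent to mypower so that the script is self-contained.

```matlab
% Hypothetical schedule: [homeTeam awayTeam homeWon] (illustration only)
games  = [1 2 1; 2 3 1; 3 1 1; 2 1 0; 3 2 1; 1 3 0];
nTeams = 3;
nGames = size(games,1);
hfaCol = nTeams + 1;                 % last parameter is the HFA term

num   = zeros(nGames, nTeams+1);     % winner (plus HFA if winner is home)
denom = zeros(nGames, nTeams+1);     % both teams plus HFA
for g = 1:nGames
    h = games(g,1);
    a = games(g,2);
    denom(g,[h a hfaCol]) = 1;
    if games(g,3) == 1
        num(g,[h hfaCol]) = 1;       % home team won
    else
        num(g,a) = 1;                % away team won
    end
end

% Negative log-likelihood, same form as mypower
negLogLik = @(b) -sum(log((num*b)./(denom*b)));

b0 = ones(nTeams+1,1);               % assumed starting values
lb = 0.01*ones(nTeams+1,1);          % assumed lower bounds (keep ratings positive)
ub = 100*ones(nTeams+1,1);           % assumed upper bounds
options = optimoptions('fmincon','Display','off');

[b,fval,exitflag] = fmincon(negLogLik,b0,[],[],[],[],lb,ub,[],options);

probWinner = (num*b)./(denom*b);     % fitted probability of each observed result
disp(b');
```

The likelihood is unchanged when every parameter is multiplied by the same constant, so the fitted ratings are only meaningful relative to one another; the bounds keep the search in a finite region.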
Steps to Solve Power Function. Set up the objective function and estimate team ratings using MLE. Compute winning probabilities using the Power Function formula. Run a regression of home team win margin (Spread) as a function of predicted home team winning probability (Prob): $\text{Spread} = d_0 + d_1 \cdot \text{Prob}$. This provides: 1) Probability that the Home Team Wins the Game 2) Expected Home Team Win Margin 3) Teams can be ranked based on the Model Parameter (from highest to lowest)
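The last two steps can be sketched as follows. The ratings, probabilities, and win margins below are hypothetical illustration values, and sey is computed here as the regression standard error with n - 2 degrees of freedom, which is an assumption about how the slide defines it.

```matlab
% Hypothetical fitted ratings and per-game values (illustration only)
teams  = {'A','B','C'};
lambda = [2.5; 0.8; 1.6];                        % team ratings
prob   = [0.72; 0.35; 0.58; 0.44; 0.81; 0.63];   % predicted home win probability
spread = [   7;   -6;    3;   -3;   14;    1];   % actual home win margin

% Second regression: Spread = d0 + d1*Prob, solved by least squares
X   = [ones(numel(prob),1) prob];
d   = X\spread;                                  % d(1) = d0, d(2) = d1
res = spread - X*d;
sey = sqrt(sum(res.^2)/(numel(prob) - 2));       % assumed definition of sey

% Rank teams from highest to lowest rating
[~, order] = sort(lambda,'descend');
ranking = teams(order);

fprintf('d0 = %.3f, d1 = %.3f, sey = %.3f\n', d(1), d(2), sey);
disp(ranking);
```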
Logit Regression
Logit Regression Model. Start with the Logistic Distribution Function: $\frac{1}{1 + \exp(-(b_0 + b_h - b_a))} = z_1$. Here $s$ = Home Pts $-$ Away Pts = Home Team Spread, $(-\infty, +\infty)$; $z = \frac{s - \text{avg}(s)}{\text{stdev}(s)}$, $(-\infty, +\infty)$; $z_1 = F^{-1}(z) = \text{normcdf}(z)$, $(0,1)$.
Logit Regression Model. We transform the logistic function into the logit regression: $b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$. Here $s$ = Home Team Spread, $(-\infty, +\infty)$; $z = \frac{s - \text{avg}(s)}{\text{stdev}(s)}$, $(-\infty, +\infty)$; $z_1 = F^{-1}(z) = \text{normcdf}(z)$, $(0,1)$.
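Steps to Solve Logit Spread Regression (Part 1). Calculate the LHS spread value: $s$ = Home Team Spread, $(-\infty, +\infty)$; $z = \frac{s - \text{avg}(s)}{\text{stdev}(s)}$, $(-\infty, +\infty)$; $z_1 = F^{-1}(z) = \text{normcdf}(z)$, $(0,1)$. Solve parameters from OLS: $b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$. Estimate the home team win margin: $z_1 = \frac{1}{1 + \exp(-(b_0 + b_h - b_a))}$; $z = \text{norminv}(z_1)$; $\hat{s} = z \cdot \text{stdev}(s) + \text{avg}(s)$, where $\hat{s}$ is the estimated spread.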
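Steps to Solve Logit Spread Regression (Part 2). Run a second regression: Actual Spread $= d_0 + d_1 \cdot$ Estimated Spread, that is, $Y = d_0 + d_1 \hat{s}$, where $\hat{s}$ is the estimated spread. Results: $d_0$, $d_1$, sey. Compute the home team win probability: $\text{Prob}(\text{Spread} > 0) = \text{Prob}(Y > 0)$, with $Y \sim N(\hat{s}, \text{sey})$.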
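A minimal end-to-end sketch of Parts 1 and 2, assuming a hypothetical six-game schedule. Phi and PhiInv are stand-ins for normcdf and norminv built from base functions, and pinv is used for the OLS step because the +1/-1 team columns identify ratings only up to an additive constant; both choices are assumptions of this sketch, not the presenter's implementation.

```matlab
% Hypothetical results: [homeTeam awayTeam homePts awayPts] (illustration only)
games  = [1 2 24 17; 2 3 10 20; 3 1 21 27; 2 1 14 13; 3 2 31 10; 1 3 17 23];
nTeams = 3;
nGames = size(games,1);

% Stand-ins for normcdf / norminv using base functions
Phi    = @(x) 0.5*erfc(-x/sqrt(2));
PhiInv = @(p) -sqrt(2)*erfcinv(2*p);

% Part 1: LHS value
s  = games(:,3) - games(:,4);        % home team spread
mu = mean(s);
sd = std(s);
z  = (s - mu)/sd;
z1 = Phi(z);                         % mapped into (0,1)
y  = log(z1./(1 - z1));              % logit of z1

% Design matrix: column 1 = intercept (b0), then +1 home / -1 away
X = zeros(nGames, nTeams+1);
X(:,1) = 1;
for g = 1:nGames
    X(g, 1+games(g,1)) =  1;
    X(g, 1+games(g,2)) = -1;
end
b = pinv(X)*y;                       % minimum-norm least-squares solution

% Back-transform to an estimated spread
z1hat = 1./(1 + exp(-X*b));
sHat  = PhiInv(z1hat)*sd + mu;

% Part 2: Actual Spread = d0 + d1 * Estimated Spread
D   = [ones(nGames,1) sHat];
d   = D\s;
res = s - D*d;
sey = sqrt(sum(res.^2)/(nGames - 2));

% Home win probability, following the slide's Y ~ N(sHat, sey)
probHome = 1 - Phi((0 - sHat)/sey);
disp([sHat probHome]);
```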
MATLAB Logit Regression
% Logit Regression
% s = home team win margin
%   s>0, home team won game by s
%   s<0, home team lost game by |s|
% x = matrix of games, home team = +1, away team = -1
mu = mean(s);
stdev = std(s);
z = zscore(s);                   % z = (s - mu)/stdev
Finv = normcdf(z);               % maps z into (0,1)
y = log(Finv./(1-Finv));         % LHS value of the logit regression
whichstats = {'beta','tstat','r','yhat','mse','rsquare'};
mystats = regstats(y,x,'linear',whichstats);
beta = mystats.tstat.beta;
beta = [beta(2:end); beta(1)];   % move the intercept (b0) to the end
TeamRating = beta;
NFL
NFL Data: Only Three Weeks of Games (47 Games)
Power Function: Estimating Spreads. $\text{prob} = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$; $\text{spread} = d_0 + d_1 \cdot \text{prob}$
NFL - Power Function. Estimating Home Team Win Probability: $\text{prob} = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$. Estimating Home Team Spread: $s = d_0 + d_1 \cdot \text{prob} = -12.601 + 28.154 \cdot \text{prob}$
Example: Power Function. New England (Home) vs. Carolina (Away). Ratings: New England = 28.954, Carolina = 5.1099, HFA = 0.01. $\text{prob} = \frac{28.954 + 0.01}{28.954 + 5.1099 + 0.01} = 85\%$. Estimating Home Team Spread: $s = -12.601 + 28.154 \times 0.85 = +11.3$ (need to adjust)
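The arithmetic of this example can be checked with a few lines of MATLAB. The ratings and slope are the values shown on the slide; the intercept is taken as -12.601 on the assumption that its minus sign was lost in extraction, since that is the sign that reproduces the slide's +11.3.

```matlab
% Values from the slide
lambdaHome = 28.954;     % New England (home)
lambdaAway = 5.1099;     % Carolina (away)
lambda0    = 0.01;       % home field advantage
d0 = -12.601;            % assumed sign (reproduces the slide's +11.3)
d1 = 28.154;

prob   = (lambdaHome + lambda0) / (lambdaHome + lambdaAway + lambda0);
spread = d0 + d1*prob;

fprintf('prob = %.1f%%, spread = %+.1f\n', 100*prob, spread);
```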
Logit Regression: Estimating Spreads. Est. Spread $= b_0 + b_h - b_a$; Act. Spread $= d_0 + d_1 \cdot$ Est. Spread
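NFL Logit Regression. Estimating Home Team Win Probability: $\ln\left(\frac{z_1}{1 - z_1}\right) = b_0 + b_h - b_a$. Estimating Home Team Spread: $Y$ (Actual Spread) $= d_0 + d_1 \hat{s}$, where $\hat{s}$ is the Estimated Spread. Results: $d_0$, $d_1$, sey. $\text{Prob}(Y > 0) = 1 - \text{normcdf}(0, \hat{s}, \text{sey})$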
Example: Logit Regression. New England (Home) vs. Carolina (Away). Parameters: New England = 1.0079, Carolina = 0.4869, HFA = -0.0592. Estimating Home Team Spread: $z_1 = \frac{1}{1 + \exp(-(1.0079 - 0.4869 - 0.0592))}$; $s = \text{norminv}(z_1) \cdot \text{stdev}(s) + \text{avg}(s) = +6.7$. Estimating Home Team Win Probability: $p = f(6.7) = 74\%$
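The first part of this calculation can be sketched as follows, using the three parameter values from the slide. The slide does not state avg(s), stdev(s), or sey, so the sketch stops at the standardized value z and does not attempt to reproduce the +6.7 spread or the 74% probability. The erfcinv expression is a base-function stand-in for norminv.

```matlab
% Parameter values from the slide
b0    = -0.0592;    % HFA (intercept)
bHome =  1.0079;    % New England
bAway =  0.4869;    % Carolina

linpred = b0 + bHome - bAway;          % b0 + bh - ba
z1 = 1/(1 + exp(-linpred));            % logistic transform, in (0,1)
z  = -sqrt(2)*erfcinv(2*z1);           % stand-in for norminv(z1)

% The spread would then be z*stdev(s) + avg(s); those values are not given.
fprintf('b0+bh-ba = %.4f, z1 = %.4f, z = %.4f\n', linpred, z1, z);
```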
NFL - Predictions
NCAA College Football
College Football: Only Four Weeks of Games (286 Games) Games with Div 1- FBS Teams Only
NCAA Football: Only Four Weeks of Games
NCAA Football - FBS: Model Results
NCAA Football - FBS: Algorithmic Rankings (after 4 weeks)
NCAA Football - FBS: Week 5 Predictions (Part 1)
NCAA Football - FBS: Week 5 Predictions (Part 2)
AlgoSports23/MATLAB Competition
AlgoSports23 / MATLAB Competition Are you Smarter than the Algo! Can you Beat the Algo!
AlgoSports23 / MATLAB Competition Two Important Emails: Robert.Kissell@KissellResearch.com AlgoSports23@gmail.com
AlgoSports23 / MATLAB Competition. Rules of the Competition: All Analysis & Programming in MATLAB; Game Results Data will be Posted Weekly; Game Prediction File will be Posted Weekly; Return Model Predictions by Specified Date. Top 23 performing Algorithms each week will be included in the AlgoSports23 Computer Rankings and Prediction. National Media Attention! Are you smarter than the Algo?
AlgoSports23 / MATLAB Competition. Your program and submission need to include the following: 1) Ranking of Teams 2) Prediction of Home Team Winning Margin for all games in a week. Models are measured based on: 1) RMSE 2) Avg Difference 3) Number of Wins
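A minimal sketch of how the three measures might be computed for one week of predictions. The margins are hypothetical, and two readings are assumptions of this sketch rather than stated competition rules: "Avg Difference" is taken as the mean absolute error, and "Number of Wins" as the count of games where the predicted and actual home margins have the same sign.

```matlab
% Hypothetical home team win margins (illustration only)
actual    = [ 7; -3; 10; -14;  3];
predicted = [ 4;  2;  6; -10; -1];

err        = predicted - actual;
rmse       = sqrt(mean(err.^2));                      % root mean squared error
avgAbsDiff = mean(abs(err));                          % assumed meaning of "Avg Difference"
nWins      = sum(sign(predicted) == sign(actual));    % assumed meaning of "Number of Wins"

fprintf('RMSE = %.3f, Avg Diff = %.3f, Wins = %d of %d\n', ...
    rmse, avgAbsDiff, nWins, numel(actual));
```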
AlgoSports23 / MATLAB Competition. Top 23 performing Algorithms each week will be included in the AlgoSports23 Computer Rankings and Prediction! National Media Attention! Bragging Rights!