Maximum Likelihood estimation: Stata vs. Gauss


Index
- Motivation
- Objective
- The Maximum Likelihood Method
- Capabilities: Stata vs Gauss
- Conclusions

Motivation
Stata is a powerful and flexible statistical package for researchers who need to compute ML estimators that are not available as prepackaged routines. Prospective and advanced users want to know:
I. The ML estimation facilities in Stata and GAUSS.
II. The main advantages of Stata compared with GAUSS.
III. What is still needed, and what might be refined, to implement the whole ML methodology in Stata.

Objective
The main purpose of this presentation is to compare the ML routines and capabilities offered by Stata and GAUSS.

The Maximum Likelihood Method
The foundation for the theory and practice of maximum likelihood estimation is a probability model: Z ~ F(θ), θ ∈ Θ, where Z is a random variable distributed according to a cumulative probability distribution function F() with parameter vector θ drawn from Θ, the parameter space for F().

The Maximum Likelihood Method
Typically, there is more than one variable of interest, so the model (Z1, Z2, ..., ZK) ~ F(θ) describes the joint distribution of the K random variables. Using F(), we can compute probabilities for values of the Zs given values of the parameters.

The Maximum Likelihood Method
Given observed values z of the variables, the likelihood function is L(θ; z) = f(z; θ), where f() is the probability density function corresponding to F(). We are interested in the element (vector) of Θ that was used to generate z. We denote this vector by θ0.

The Maximum Likelihood Method
Data typically consist of multiple observations on the relevant variables, so we denote the dataset by the N × K matrix Z. Each of its N rows, zj, consists of jointly observed values of the relevant variables: Z = (z1', z2', ..., zN')'.

The Maximum Likelihood Method
f() is now the joint-distribution function of the data-generating process. This means that the method by which the data were collected now plays a role in the functional form of f(). We introduce the assumption that observations are independent and identically distributed (i.i.d.) and rewrite the likelihood as: L(θ; Z) = ∏ j=1..N f(zj; θ).

The Maximum Likelihood Method
The maximum likelihood estimate of θ is the value θ^ such that: L(θ^; Z) ≥ L(θ; Z) for all θ ∈ Θ.
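The defining inequality above can be checked numerically. The following is a minimal Python sketch (the data are invented; the closed-form normal ML estimates are standard): it computes the ML estimates of μ and σ for a small i.i.d. normal sample and verifies that no nearby parameter value attains a higher log likelihood.

```python
import math

def normal_loglik(mu, sigma, z):
    """log L(theta; Z) = sum of log f(z_j; mu, sigma) under i.i.d."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - 0.5 * (zj - mu) ** 2 / sigma ** 2 for zj in z)

z = [1.2, 0.7, 2.1, 1.5, 0.9]          # made-up sample
n = len(z)

# Closed-form ML estimates for the normal model: the sample mean and
# the (1/N, not 1/(N-1)) standard deviation.
mu_hat = sum(z) / n
sigma_hat = math.sqrt(sum((zj - mu_hat) ** 2 for zj in z) / n)

# theta_hat attains the maximum: no nearby theta does better.
best = normal_loglik(mu_hat, sigma_hat, z)
for dmu in (-0.1, 0.0, 0.1):
    for dsig in (-0.1, 0.0, 0.1):
        assert normal_loglik(mu_hat + dmu, sigma_hat + dsig, z) <= best + 1e-12
```

The grid check is only a sanity test of the inequality, not a maximization method; the slides' `ml maximize` and `maxlik` perform the actual numerical search.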

The Maximum Likelihood Method
Because we are dealing with likelihood functions that are continuous in their parameters, let's define some differential operators to simplify the notation. For any real-valued function a(t) of the vector t, D a(t) = ∂a/∂t' denotes the gradient vector of first derivatives, and D2 a(t) = ∂²a/∂t∂t' denotes the Hessian matrix of second derivatives.
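As an illustration of the D and D2 operators, the Python sketch below implements them by central differences (the data and step sizes are arbitrary choices, not from the slides) and verifies the first- and second-order conditions at the normal ML estimate: the gradient vanishes and the Hessian is negative definite.

```python
import math

def loglik(theta, z):
    """Normal log likelihood at theta = (mu, sigma)."""
    mu, sigma = theta
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - 0.5 * (x - mu) ** 2 / sigma ** 2 for x in z)

def D(a, t, h=1e-5):
    """Gradient vector of a() at t, by central differences."""
    g = []
    for i in range(len(t)):
        tp, tm = list(t), list(t)
        tp[i] += h
        tm[i] -= h
        g.append((a(tp) - a(tm)) / (2 * h))
    return g

def D2(a, t, h=1e-4):
    """Hessian matrix of a() at t: the numerical gradient of the gradient."""
    return [D(lambda u, i=i: D(a, u, h)[i], t, h) for i in range(len(t))]

z = [1.2, 0.7, 2.1, 1.5, 0.9]
mu_hat = sum(z) / len(z)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in z) / len(z))

a = lambda t: loglik(t, z)
g = D(a, [mu_hat, sigma_hat])
H = D2(a, [mu_hat, sigma_hat])

# First-order condition: D lnL = 0 at the ML estimate;
# second-order condition: D2 lnL is negative definite there.
assert all(abs(gi) < 1e-4 for gi in g)
assert H[0][0] < 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0
```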

Maximum Likelihood: Commonly Used Densities in Microeconometrics

MODEL        RANGE OF y     DENSITY f(y)                                      COMMON PARAMETERIZATION
Normal       (-∞, ∞)        (2πσ²)^(-1/2) exp(-(y-μ)²/(2σ²))                  μ = x'β
Bernoulli    0 or 1         p^y (1-p)^(1-y)                                   Logit: p = e^(x'β)/(1+e^(x'β))
Exponential  (0, ∞)         λ e^(-λy)                                         λ = e^(x'β)
Poisson      0, 1, 2, ...   e^(-λ) λ^y / y!                                   λ = e^(x'β)

Source: Cameron, A. Colin and Pravin K. Trivedi, Microeconometrics: Methods and Applications, Cambridge, 2007, p. 140.

Comparison: Stata and Gauss

The Log-Likelihood Function in Stata

SYNTAX
args lnf mu sigma
quietly replace `lnf' = ln(normalden($ML_y1, `mu', `sigma'))

The Log-Likelihood Function in Gauss
ln L(θ) = Σ i=1..N wi ln P(Yi; θ), where N is the number of observations, P(Yi; θ) is the probability of Yi given θ, a vector of parameters, and wi is the weight of the i-th observation.

proc loglik(theta, z);
    local y, x, b, s;
    x = z[., 1:cols(z)-1];        /* regressors */
    y = z[., cols(z)];            /* dependent variable */
    b = theta[1:cols(x)];         /* coefficient vector */
    s = theta[cols(x)+1];         /* standard deviation */
    retp(-0.5*ln(s^2) - 0.5*(y - x*b)^2/s^2);
endp;
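For readers without GAUSS, here is a minimal Python transcription of the proc above (an illustrative sketch, not part of the slides; the function name and data layout simply mirror the GAUSS code). Like the GAUSS version, it returns per-observation log-likelihood contributions and drops the additive constant -0.5·ln(2π).

```python
import math

def loglik_obs(theta, z):
    """Per-observation normal-linear-model log likelihood, mirroring
    the GAUSS proc: the last column of each row of z is y, the rest
    is x, and theta stacks (b, s). The constant -0.5*ln(2*pi) is
    dropped, as on the slide."""
    k = len(z[0]) - 1
    b, s = theta[:k], theta[k]
    out = []
    for row in z:
        x, y = row[:k], row[k]
        xb = sum(xi * bi for xi, bi in zip(x, b))
        out.append(-0.5 * math.log(s ** 2) - 0.5 * (y - xb) ** 2 / s ** 2)
    return out

# Tiny made-up example: one regressor, two observations.
vals = loglik_obs([3.0, 1.0], [[1.0, 2.0], [1.0, 4.0]])
```

Returning one contribution per observation (rather than their sum) is what lets maxlik compute BHHH-style score cross-products; Stata's lf evaluators follow the same convention via `replace \`lnf'`.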

Procedure to Maximize a Likelihood Function in Stata

Syntax:
ml model lf my_normal (y1 = x1 x2) /sigma, technique(bhhh) vce(oim) waldtest(0)
ml maximize

Where:
- bhhh: the Berndt-Hall-Hall-Hausman optimization method.
- oim: the observed-information-matrix variance-covariance estimate (the inverse of the negative Hessian matrix).
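A hedged numerical sketch of what vce(oim) computes, using the i.i.d. normal model and its standard analytical Hessian (the data are invented):

```python
import math

# The observed-information variance estimate is V = (-H)^{-1}, the
# inverse of the negative Hessian of the log likelihood at the ML
# estimate. For the i.i.d. normal model the Hessian evaluated at
# (mu_hat, sigma_hat) is diagonal:
#   d2 lnL / dmu2    = -N / sigma_hat^2
#   d2 lnL / dsigma2 = -2N / sigma_hat^2
z = [1.2, 0.7, 2.1, 1.5, 0.9]
N = len(z)
mu_hat = sum(z) / N
sigma2_hat = sum((x - mu_hat) ** 2 for x in z) / N

H = [[-N / sigma2_hat, 0.0],
     [0.0, -2 * N / sigma2_hat]]

# Invert -H; the matrix is diagonal here, so invert elementwise.
V = [[-1.0 / H[0][0], 0.0],
     [0.0, -1.0 / H[1][1]]]

se_mu = math.sqrt(V[0][0])   # recovers the familiar sigma_hat / sqrt(N)
```

The standard error Stata reports for the mean equation is exactly this square root of the (1,1) element of the inverted negative Hessian.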

Procedure to Maximize a Likelihood Function in Gauss

Capabilities: Stata vs Gauss
- Methods
- Optimization Algorithms
- Covariance Matrix
- Data and Inference
- Initial Values
- Report of the Coefficients
- Other

Stata's Capabilities: Methods

CAPABILITY                                      STATA   GAUSS
Linear form (lf)                                YES     NO
No analytical derivatives (d0)                  YES     YES
Analytical gradients / first derivatives (d1)   YES     YES
Analytical Hessian / second derivatives (d2)    YES     YES

Stata's Capabilities: Optimization Algorithms

ALGORITHM                                  STATA   GAUSS
Newton-Raphson (NR)                        YES     YES
Berndt-Hall-Hall-Hausman (BHHH)            YES     YES
Davidon-Fletcher-Powell (DFP)              YES     YES
Broyden-Fletcher-Goldfarb-Shanno (BFGS)    YES     YES
Steepest descent (SD)                      NO      YES
Polak-Ribiere conjugate gradient           NO      YES

Stata's Capabilities: Covariance Matrix

CAPABILITY                                                      STATA   GAUSS
Inverse of the final information matrix from the optimization   YES     YES
Inverse of the cross-product of the first derivatives           YES     YES
Heteroskedasticity-consistent covariance matrix                 YES     YES
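The three rows of the table can be sketched in Python for the simplest possible case: a one-parameter Poisson model whose mean λ is estimated by the sample average (the data are made up; this illustrates the formulas, not either package's implementation).

```python
# Per-observation Poisson score (constant ln(y!) term has zero
# derivative): s_i = y_i / lam - 1; Hessian: -sum(y) / lam^2.
y = [2, 0, 3, 1, 2]
n = len(y)
lam = sum(y) / n                           # ML estimate of the mean

scores = [yi / lam - 1.0 for yi in y]      # first derivatives, obs by obs
hessian = -sum(y) / lam ** 2               # d2 lnL / dlam2

V_oim = -1.0 / hessian                                 # inverse of the information matrix
V_opg = 1.0 / sum(s * s for s in scores)               # inverse score cross-product (BHHH)
V_rob = V_oim * sum(s * s for s in scores) * V_oim     # robust "sandwich" estimate

# At the ML estimate the scores sum to zero, and the OIM estimate
# reduces to the textbook lambda_hat / n.
assert abs(sum(scores)) < 1e-12
assert abs(V_oim - lam / n) < 1e-12
```

When the model is correctly specified the three estimates converge to the same limit; they differ in finite samples and under misspecification, which is why both packages offer all three.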

Stata's Capabilities: Data and Inference

CAPABILITY                       STATA   GAUSS
Robust                           YES     YES
Cluster                          YES     NO
Weights*                         YES     YES
Survey data (svy)                YES     NO
Modification of the subsample    YES     NO
Constraints                      YES     NO
Wald test                        YES     YES
Switching**                      YES     YES

* Stata supports frequency, probability, analytic, and importance weights; Gauss has only frequency weights.
** Switching between algorithms.

Stata's Capabilities: Initial Values

CAPABILITY        STATA   GAUSS
Initial values*   YES     YES
Plot              YES     YES

* Stata searches for feasible initial values with ml search.

Stata's Capabilities: Report of the Coefficients

CAPABILITY                    STATA*   GAUSS
Hazard ratios (hr)            YES      NO
Incidence-rate ratios (irr)   YES      NO
Odds ratios (or)              YES      NO
Relative-risk ratios (rrr)    YES      NO

* Reported with the ml display command.

Stata's Capabilities: Other
- Convergence
- Syntax errors
- Graph: the log-likelihood values

Example with the Poisson Model

ESTIMATION IN STATA

program define my_poisson
        version 9.0
        args lnf xb
        quietly replace `lnf' = $ML_y1*`xb' - exp(`xb') - lnfactorial($ML_y1)
end
ml model lf my_poisson (y1 = x1 x2), technique(bhhh) vce(oim)
ml maximize

(With method lf, the evaluator receives the linear index x'b, the log of the Poisson mean, matching the GAUSS code below where m = x*b.)

ESTIMATION IN GAUSS

library maxlik;
maxset;
proc lpsn(b,z);     /* per-observation log likelihood, Poisson regression */
    local m;
    m = z[.,2:4]*b;                  /* log mean: m = x'b */
    retp(z[.,1].*m - exp(m));
endp;
proc lgd(b,z);      /* analytical gradient */
    retp((z[.,1] - exp(z[.,2:4]*b)).*z[.,2:4]);
endp;
x0 = { .5, .5, .5 };
_max_gradproc = &lgd;
_max_gradchecktol = 1e-3;
{ x,f0,g,h,retcode } = MAXLIK("psn", 0, &lpsn, x0);
call MAXPrt(x,f0,g,h,retcode);
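The two GAUSS procedures can be mirrored in Python to see exactly what maxlik receives (an illustrative sketch with invented data, not part of the slides). The sketch also performs the analytical-vs-numerical gradient consistency check that the _max_gradchecktol global asks maxlik to run.

```python
import math

def lpsn(b, X, y):
    """Per-observation Poisson log likelihood with log-mean m = x'b
    (the constant ln(y!) is dropped, as in the Stata evaluator)."""
    out = []
    for xi, yi in zip(X, y):
        m = sum(xij * bj for xij, bj in zip(xi, b))
        out.append(yi * m - math.exp(m))
    return out

def lgd(b, X, y):
    """Analytical gradient, summed over observations:
    sum_i (y_i - exp(x_i'b)) * x_i."""
    g = [0.0] * len(b)
    for xi, yi in zip(X, y):
        m = sum(xij * bj for xij, bj in zip(xi, b))
        for j, xij in enumerate(xi):
            g[j] += (yi - math.exp(m)) * xij
    return g

# Made-up data: constant plus one regressor, three observations.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 2, 4]
b = [0.5, 0.5]

# Gradient consistency check: the analytical gradient must match a
# central-difference approximation of the summed log likelihood.
h = 1e-6
for j in range(len(b)):
    bp, bm = list(b), list(b)
    bp[j] += h
    bm[j] -= h
    num = (sum(lpsn(bp, X, y)) - sum(lpsn(bm, X, y))) / (2 * h)
    assert abs(num - lgd(b, X, y)[j]) < 1e-4
```

Supplying an analytical gradient (as the slide does via _max_gradproc) speeds up BHHH considerably, since the algorithm builds its Hessian approximation from the per-observation scores.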

Conclusions
Stata's features seem best suited for analyzing specific models of decision-making processes and other microeconometric applications. Gauss is ideal for analyzing a broader range of statistical issues based on maximum likelihood estimation.