layout: true --- class: middle, inverse ## .center[Estimating Models with {logitr}] .leftcol40[ <center> <img src="https://jhelvy.github.io/2023-qux-conf-conjoint/images/logo.png" width=300> </center> ] .rightcol60[ ###
John Paul Helveston ###
The George Washington University | Dept. of Engineering Management and Systems Engineering ###
June 15, 2023 ] --- # Many FOSS options model estimation R packages: - [{logitr}](https://github.com/jhelvy/logitr): Fastest, mixed logit, WTP space. - [{apollo}](http://www.apollochoicemodelling.com/): Most flexible, great documentation. - [{mlogit}](https://www.jstatsoft.org/article/view/v095i11): The OG R package. - [{gmnl}](https://www.jstatsoft.org/article/view/v079i02): Generalized logit model (though slow). - [{mixl}](https://github.com/joemolloy/fast-mixed-mnl): Good for big datasets (uses C for speed). Python packages: - [{xlogit}](https://xlogit.readthedocs.io/en/latest/): Basically Python version of {logitr}. [Stan](https://www.inwt-statistics.com/blog/understand-customer-decision-making-discrete-choice-models-with-rstan): For the Bayesians. --- # Many FOSS options model estimation R packages: - [{logitr}](https://github.com/jhelvy/logitr): Fastest, mixed logit, WTP space. .red[<- I wrote this one, so I'm showcasing it!] - [{apollo}](http://www.apollochoicemodelling.com/): Most flexible, great documentation. - [{mlogit}](https://www.jstatsoft.org/article/view/v095i11): The OG R package. - [{gmnl}](https://www.jstatsoft.org/article/view/v079i02): Generalized logit model (though slow). - [{mixl}](https://github.com/joemolloy/fast-mixed-mnl): Good for big datasets (uses C for speed). Python packages: - [{xlogit}](https://xlogit.readthedocs.io/en/latest/): Basically Python version of {logitr}. [Stan](https://www.inwt-statistics.com/blog/understand-customer-decision-making-discrete-choice-models-with-rstan): For the Bayesians. --- class: center background-color: #fff ## {logitr} is fast! <center> <img src="https://jhelvy.github.io/logitr/articles/benchmark.png" width=750> </center> --- class: center ## {logitr} supports two common forms of utility models .leftcol[ ## Preference Space <center> <img src="images/utilityPreference2.png" width=500> </center> ] .rightcol[ ## WTP Space <center> <img src="images/utilityWtp2.png" width=520> </center> ] --- ## .center[{logitr} has a similar UI with {cbcTools}] .center[({cbcTools} uses {logitr} to simulate choices and assess power)] .leftcol[ ## .center[{cbcTools}] ```r power <- cbc_power( nbreaks = 10, n_q = 6, data = data, obsID = "obsID", outcome = "choice", pars = c("price", "type", "freshness") ) ``` ] .rightcol[ ## .center[{logitr}] ```r model <- logitr( data = data, obsID = "obsID", outcome = "choice", pars = c("price", "type", "freshness") ) ``` ] --- class: inverse, middle, center # Utility model refresher --- class: center # Which would you choose? .cols4[ ## .center[$2.49] <center> <img src="images/dannon.png" width=300> </center> ] .cols4[ ## .center[$2.99] <center> <img src="images/yoplait.png" width=300> </center> ] .cols4[ ## .center[$1.99] <center> <img src="images/hiland.png" width=300> </center> ] .cols4[ ## .center[$3.99] <center> <img src="images/weight.png" width=300> </center> ] --- # .center[Estimate marginal utilities] </br> <center> <img src="images/utilityPreference.png" width="1000"> </center> -- .code100[ ``` #> Estimate Std. Error z-value Pr(>|z|) #> price -0.3886257 0.02426923 -16.01311 0 #> brandhiland -3.1167063 0.14496806 -21.49926 0 #> brandyoplait 1.4463603 0.08869767 16.30663 0 #> branddannon 0.6440868 0.05435965 11.84862 0 ``` ] --- # .center[Convert marginal _utilities_ to marginal _WTPs_] </br> <center> <img src="images/wtpHatComputed.png" width=250> </center> -- .code100[ ``` #> Estimate Std. Error z-value Pr(>|z|) #> brandhiland -8.01982 0.46096 -17.3980 < 2.2e-16 *** #> brandyoplait 3.72173 0.15890 23.4214 < 2.2e-16 *** #> branddannon 1.65734 0.16832 9.8463 < 2.2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- class: center ## Alternative approach: **Estimate a WTP-Space Model** .leftcol30[ ## Substitutions: <center> <img src="images/wtpComputed.png" width=200></br> <img src="images/lambda.png" width=160> </center> ] .rightcol70[ ## "Preference Space" <center> <img src="images/utilityPreference2.png" width="370"> </center> ## "WTP Space" <center> <img src="images/utilityWtp2.png" width=400> </center> ] --- class: center, middle, inverse # What's the difference? .leftcol[ ## Preference Space <center> <img src="images/utilityPreference2Inverse.png" width=500> </center> .font200[↓] <center> <img src="images/wtpHatComputedInverse.png" width=200></br> </center> ] .rightcol[ ## WTP Space </br></br> <center> <img src="images/utilityWtp2Inverse.png" width=520> </center> ] --- class: center ## **Mixed logit**: ## Unreasonably large WTP variance across population </br> .leftcol40[ <center> <img src="images/utilityPreference2.png" width=500> </center> .center[.font200[↓]] <center> <img src="images/wtpHatComputed.png" width=200></br> </center> ] .rightcol60[ <center> <img src="images/betaNormal.png" width=380> </br> <img src="images/alphaNormal.png" width=380> </center> ] --- background-color: #fff class: center ### Preference space model produces unreasonably large variance in WTP .cols4[‍] .cols4[ <center> <b>Preference Space</b> <img src="images/betaNormal.png" width=200> </center> ] .cols4[ <center> <b>WTP Space</b> <img src="images/omegaNormal.png" width=200> </center> </br> ] .cols4[‍] -- <center> <img src="images/wtpCompare.png" width=1100> </center> --- class: center, middle, inverse # .fancy[Practical Considerations] --- class: center ## .fancy[Practical Considerations] > ### WTP space models produce immediately interpretable results -- .leftcol[ Unit: "Utility" (relative) <center> <img src="images/utilityPreference2.png" width="250"> </center> .code50[ ``` #> Estimate Std. Error z-value Pr(>|z|) #> price -0.3886257 0.02426923 -16.01311 0 #> brandhiland -3.1167063 0.14496806 -21.49926 0 #> brandyoplait 1.4463603 0.08869767 16.30663 0 #> branddannon 0.6440868 0.05435965 11.84862 0 ``` ]] -- .rightcol[ Units: $ (absolute) <center> <img src="images/utilityWtp2.png" width="300"> </center> .code50[ ``` #> Estimate Std. Error z-value Pr(>|z|) #> scalePar 0.388626 0.024399 15.9280 < 2.2e-16 *** #> brandhiland -8.019815 0.460961 -17.3980 < 2.2e-16 *** #> brandyoplait 3.721731 0.158903 23.4214 < 2.2e-16 *** #> branddannon 1.657345 0.168321 9.8463 < 2.2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ]] --- class: center ## .fancy[Practical Considerations] > ### WTPs can be directly compared across different models</br>(even estimates from different data sets) -- <center> <img src="images/utility.png" width=700> </center> -- .leftcol[ **Preference Space**</br>Parameters proportional to `\(\sigma\)` <center> <img src="images/utilityPreferenceScaled2.png" width=420> <img src="images/utilityPreference2.png" width=300> </center> ] -- .rightcol[ **WTP Space**</br>Parameters independent of `\(\sigma\)` <center> <img src="images/utilityWtpScaled2.png" width=450> <img src="images/utilityWtp2.png" width=350> </center> ] --- ## .center[.fancy[Practical Considerations]] > ### .center[Neither space systematically predicts choice better] .leftcol80[ - **Train and Weeks (2005)** and **Sonnier et al. (2007)** found preference space model fit data better. - **Das et al. (2009)** found nearly identical model fit on out-of-sample predictions with each model specification. ] --- class: center, middle, inverse # ...but most software is built for <center> <img src="images/utilityPreference2Inverse.png" width="370"> </center> # not <center> <img src="images/utilityWtp2Inverse.png" width=400> </center> --- class: center, middle, inverse # `logitr` to the rescue! <center> <img src="images/logitr-hex.png" width=250> </center> --- # The logitr
Package <a href='https://jhelvy.github.io/logitr/'><img src='images/logitr-hex.png' align="right" height="240"/></a> ### Estimation of multinomial and mixed logit models in with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. - Multinomial logit (MNL) models - Mixed logit (MXL) models with normal and log-normal parameter distributions. - Preference space and WTP space utility parameterizations. - Weighted models to differentially weight individual observations. - Uncorrelated or correlated heterogeneity covariances for mixed logit models. - Functions for computing WTP from preference space models. - Functions for predicting expected probabilities and outcomes for sets of alternatives based on an estimated model. - A parallelized multistart optimization loop that uses different random starting points in each iteration to search for different local minima (useful for non-convex problems like MXL models or models with WTP space parameterizations). --- # .center[Data format] Data must be arranged in a "long" format: - Each row is an alternative from a choice observation. - Choice observations do _not_ have to be symmetric. Required variables: - `outcome`: A dummy variable for the chosen alternative (`1` or `0`). - `obsID`: A sequence of repeated numbers identifying each unique choice observation, e.g. `1, 1, 2, 2, 3, 3`. - `pars`: Any other variables to use as model covariates. --- # .center[Data format] .leftcol[ ```r head(yogurt, 10) ``` ``` #> choice obsID alt price brand #> 1 0 1 1 8.1 dannon #> 2 0 1 2 6.1 hiland #> 3 1 1 3 7.9 weight #> 4 0 1 4 10.8 yoplait #> 5 1 2 1 9.8 dannon #> 6 0 2 2 6.4 hiland #> 7 0 2 3 7.5 weight #> 8 0 2 4 10.8 yoplait #> 9 1 3 1 9.8 dannon #> 10 0 3 2 6.1 hiland ``` ] .rightcol[ - `outcome = "choice"` - `obsID = "obsID"` - `pars = c("price", "brand")` ] --- # .center[Multinomial logit in **Preference Space**] .leftcol45[ ```r mnl_pref <- logitr( data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", "brand") ) *summary(mnl_pref) ``` <center> <img src="images/utilityPreference2.png" width=300> </center> ] -- .rightcol55[.code50[ ``` #> ================================================= #> #> Model estimated on: Fri Jun 09 10:12:27 2023 #> #> Using logitr version: 1.1.0 #> #> Call: #> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", #> "brand")) #> #> Frequencies of alternatives: #> 1 2 3 4 #> 0.402156 0.029436 0.229270 0.339138 #> #> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached. #> #> Model Type: Multinomial Logit #> Model Space: Preference #> Model Run: 1 of 1 #> Iterations: 20 #> Elapsed Time: 0h:0m:0.01s #> Algorithm: NLOPT_LD_LBFGS #> Weights Used?: FALSE #> Robust? FALSE #> #> Model Coefficients: #> Estimate Std. Error z-value Pr(>|z|) #> price -0.388626 0.024269 -16.013 < 2.2e-16 *** #> brandhiland -3.116706 0.144968 -21.499 < 2.2e-16 *** #> brandyoplait 1.446360 0.088698 16.307 < 2.2e-16 *** #> branddannon 0.644087 0.054360 11.849 < 2.2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> Log-Likelihood: -2665.1101915 #> Null Log-Likelihood: -3343.7419990 #> AIC: 5338.2203830 #> BIC: 5361.3732000 #> McFadden R2: 0.2029558 #> Adj McFadden R2: 0.2017595 #> Number of Observations: 2412.0000000 ``` ]] --- # .center[Multinomial logit in **WTP Space**] .leftcol45[ ```r library(logitr) mnl_wtp <- logitr( data = yogurt, outcome = "choice", obsID = "obsID", * pars = "brand", * scalePar = "price" ) summary(mnl_wtp) ``` <center> <img src="images/utilityWtp2.png" width=300> </center> ] -- .rightcol55[.code50[ ``` #> ================================================= #> #> Model estimated on: Fri Jun 09 10:12:41 2023 #> #> Using logitr version: 1.1.0 #> #> Call: #> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = "brand", #> scalePar = "price") #> #> Frequencies of alternatives: #> 1 2 3 4 #> 0.402156 0.029436 0.229270 0.339138 #> #> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached. #> #> Model Type: Multinomial Logit #> Model Space: Willingness-to-Pay #> Model Run: 1 of 1 #> Iterations: 40 #> Elapsed Time: 0h:0m:0.02s #> Algorithm: NLOPT_LD_LBFGS #> Weights Used?: FALSE #> Robust? FALSE #> #> Model Coefficients: #> Estimate Std. Error z-value Pr(>|z|) #> scalePar 0.388633 0.024269 16.013 < 2.2e-16 *** #> brandhiland -8.019717 0.455549 -17.605 < 2.2e-16 *** #> brandyoplait 3.721711 0.157655 23.607 < 2.2e-16 *** #> branddannon 1.657290 0.165712 10.001 < 2.2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> Log-Likelihood: -2665.1101915 #> Null Log-Likelihood: -3343.7419990 #> AIC: 5338.2203830 #> BIC: 5361.3732000 #> McFadden R2: 0.2029558 #> Adj McFadden R2: 0.2017595 #> Number of Observations: 2412.0000000 ``` ]] --- class: center, middle ## **.red[Caution]**</br> ## Log-likelihood function for WTP space models is</br>**non-convex** 😔 --- # .center[Use a Multistart] .leftcol45[ ```r mnl_wtp <- logitr( data = yogurt, outcome = "choice", obsID = "obsID", pars = "brand", scalePar = "price", * numMultiStarts = 10 ) summary(mnl_wtp) ``` <center> <img src="images/utilityWtp2.png" width=300> </center> ] -- .rightcol55[.code50[ ``` #> ================================================= #> #> Model estimated on: Fri Jun 09 10:13:43 2023 #> #> Using logitr version: 1.1.0 #> #> Call: #> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = "brand", #> scalePar = "price", numMultiStarts = 10) #> #> Frequencies of alternatives: #> 1 2 3 4 #> 0.402156 0.029436 0.229270 0.339138 #> #> Summary Of Multistart Runs: #> Log Likelihood Iterations Exit Status #> 1 -2665.11 40 3 #> 2 -2665.11 39 3 #> 3 -2665.11 43 3 #> 4 -2665.11 47 3 #> 5 -2665.11 54 3 #> 6 -2665.11 42 3 #> 7 -2665.11 39 3 #> 8 -2665.11 44 3 #> 9 -2665.11 39 3 #> 10 -2665.11 38 3 #> #> Use statusCodes() to view the meaning of each status code #> #> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached. #> #> Model Type: Multinomial Logit #> Model Space: Willingness-to-Pay #> Model Run: 10 of 10 #> Iterations: 38 #> Elapsed Time: 0h:0m:0.02s #> Algorithm: NLOPT_LD_LBFGS #> Weights Used?: FALSE #> Robust? FALSE #> #> Model Coefficients: #> Estimate Std. Error z-value Pr(>|z|) #> scalePar 0.388631 0.024269 16.013 < 2.2e-16 *** #> brandhiland -8.019767 0.455557 -17.604 < 2.2e-16 *** #> brandyoplait 3.721702 0.157655 23.607 < 2.2e-16 *** #> branddannon 1.657316 0.165715 10.001 < 2.2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> Log-Likelihood: -2665.1101915 #> Null Log-Likelihood: -3343.7419990 #> AIC: 5338.2203830 #> BIC: 5361.3732000 #> McFadden R2: 0.2029558 #> Adj McFadden R2: 0.2017595 #> Number of Observations: 2412.0000000 ``` ]] --- .leftcol[ ## .center[Mixed logit in<br>**Preference Space**] ```r mxl_pref <- logitr( data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", "brand"), * randPars = c(brand = "n"), numMultiStarts = 10 ) ``` <center> <img src="images/utilityPreference2.png" width=300> <img src="images/betaNormal.png" width=200> </center> ] .rightcol[ ## .center[Mixed logit in<br>**WTP Space**] ```r mxl_wtp <- logitr( data = yogurt, outcome = "choice", obsID = "obsID", pars = "brand", scalePar = "price", * randPars = c(brand = "n"), * randScale = "ln", numMultiStarts = 10 ) ``` <center> <img src="images/utilityWtp2.png" width=300> <img src="images/omegaNormal.png" width=200> </center> ] --- class: center, middle, inverse # .fancy[Convenient helper functions] --- ### .center[`predict()`: Expected shares for a set of alternatives] </br> .leftcol[ Define a set of alternatives ```r data <- subset( yogurt, obsID == 42, select = c('price', 'brand', 'obsID')) data ``` ``` #> price brand obsID #> 1 6.3 dannon 42 #> 2 6.1 hiland 42 #> 3 7.9 weight 42 #> 4 11.5 yoplait 42 ``` ] -- .rightcol[ Predict probabilities ```r predict( mnl_pref, newdata = data, obsID = "obsID", returnData = TRUE ) ``` ``` #> obsID predicted_prob price brand #> 1 42 0.62391435 6.3 dannon #> 2 42 0.01568877 6.1 hiland #> 3 42 0.17593683 7.9 weight #> 4 42 0.18446005 11.5 yoplait ``` ] --- class: inverse
−
+
10
:
00
# Your turn - Be sure to have downloaded and unzipped the [practice code](https://jhelvy.github.io/2023-qux-conf-conjoint/practice/2023-qux-conf-conjoint.zip). - Open the `2023-qux-conf-conjoint.Rproj` file to open RStudio. - In RStudio, open the `estimating-models.R` file. - Experiment with estimating different models (use either one of the example datasets included in the package, or simulate your own data!) --- class: inverse <br> ## .center[{logitr} documentation:<br>https://jhelvy.github.io/logitr/] ## .center[Back to workshop website:<br> https://jhelvy.github.io/2023-qux-conf-conjoint/] .footer-large[ .right[ @JohnHelveston
<br> @jhelvy
<br> @jhelvy
<br> jhelvy.com
<br> jph@gwu.edu
]]