Once a model has been estimated, it can be used to predict choices for a set of alternatives. This vignette demonstrates how to do so using the `predictChoices()` function along with the results of an estimated model.
To predict choices, you first need to define a set of alternatives for which you want to make predictions. Each row should be an alternative, and each column an attribute. I will predict choices on the full `yogurt` data set, which was also used to estimate each of the models in this example.
This example uses the yogurt data set from Jain et al. (1994). The data set contains 2,412 choice observations from a series of yogurt purchases by a panel of 100 households in Springfield, Missouri, over a roughly two-year period. The data were collected by optical scanners and contain information about the price, brand, and a “feature” variable, which identifies whether a newspaper advertisement was shown to the customer. There are four brands of yogurt: Yoplait, Dannon, Weight Watchers, and Hiland, with market shares of 34%, 40%, 23% and 3%, respectively.
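As a quick sanity check, the reported market shares can be recovered from the data itself. A minimal sketch, assuming the `yogurt` data frame that ships with the logitr package:

```r
# Minimal sketch: recover brand market shares from the chosen rows.
# Assumes the yogurt data frame from the logitr package is available.
library(logitr)
chosen <- subset(yogurt, choice == 1)
round(100 * prop.table(table(chosen$brand)))
```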
```r
head(yogurt)
#>   id obsID alt choice price feat   brand dannon hiland weight yoplait
#> 1  1     1   1      0   8.1    0  dannon      1      0      0       0
#> 2  1     1   2      0   6.1    0  hiland      0      1      0       0
#> 3  1     1   3      1   7.9    0  weight      0      0      1       0
#> 4  1     1   4      0  10.8    0 yoplait      0      0      0       1
#> 5  1     2   1      1   9.8    0  dannon      1      0      0       0
#> 6  1     2   2      0   6.4    0  hiland      0      1      0       0
```
In the example below, I estimate a preference space MNL model called `mnl_pref`. I can then use the `predictChoices()` function with the `mnl_pref` model to predict the choices for each set of alternatives in the `yogurt` data set:
```r
# Estimate the model
mnl_pref <- logitr(
  data   = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('price', 'feat', 'brand')
)

# Predict choices
choices_mnl_pref <- predictChoices(
  model = mnl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
```
The `choices_mnl_pref` data frame contains the same columns as the `alts` data frame plus an additional column, `choice_predict`, which contains the predicted choices. You can quickly compute the prediction accuracy by dividing the number of correctly predicted choices by the total number of choice observations:
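A minimal sketch of that accuracy calculation, assuming the `choices_mnl_pref` data frame from above (since every choice observation has exactly one chosen alternative, it suffices to compare predictions on the chosen rows):

```r
# Keep only the rows that were actually chosen (one per obsID),
# then compute the share whose predicted choice matches the actual one.
chosen <- subset(choices_mnl_pref, choice == 1)
sum(chosen$choice_predict == chosen$choice) / nrow(chosen)
```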
You can also use WTP space models to predict choices. For example, here are the results from an equivalent model but in the WTP space:
```r
# Estimate the model
mnl_wtp <- logitr(
  data  = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('feat', 'brand'),
  price  = 'price',
  modelSpace     = 'wtp',
  numMultiStarts = 10
)

# Make predictions
choices_mnl_wtp <- predictChoices(
  model = mnl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
```
```
#> NOTE: Using results from run 8 of 10 multistart runs
#> (the run with the largest log-likelihood value)
```
You can also use mixed logit models to predict choices. Heterogeneity is accounted for by simulating draws from the estimated population distributions of the random parameters. Here is an example using a preference space mixed logit model:
```r
# Estimate the model
mxl_pref <- logitr(
  data   = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('price', 'feat', 'brand'),
  randPars = c(feat = 'n', brand = 'n'),
  numMultiStarts = 5
)

# Make predictions
choices_mxl_pref <- predictChoices(
  model = mxl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
```
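To build intuition for the simulation step, here is an illustrative sketch (not logitr's internal code): for each draw of the coefficients from the population distribution, compute simple logit probabilities, then average across draws. All parameter values and attribute levels below are hypothetical.

```r
# Illustrative only: average logit probabilities over coefficient draws.
set.seed(456)
X <- rbind(c(8.1, 0), c(6.1, 1))   # hypothetical alternatives: price, feat
beta_mean <- c(-0.4, 0.3)          # hypothetical population means
beta_sd   <- c(0, 0.2)             # hypothetical standard deviations
n_draws <- 1000
probs <- rowMeans(replicate(n_draws, {
  b <- rnorm(2, mean = beta_mean, sd = beta_sd)  # one draw of coefficients
  v <- as.vector(X %*% b)                        # utilities for each alt
  exp(v) / sum(exp(v))                           # logit probabilities
}))
probs  # simulated choice probabilities for the two alternatives
```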
Likewise, mixed logit WTP space models can also be used to predict choices:
```r
# Estimate the model
mxl_wtp <- logitr(
  data   = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('feat', 'brand'),
  price  = 'price',
  randPars = c(feat = 'n', brand = 'n'),
  modelSpace     = 'wtp',
  numMultiStarts = 5
)

# Make predictions
choices_mxl_wtp <- predictChoices(
  model = mxl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
```
```r
library(dplyr)

# Combine models into one data frame
choices <- rbind(
  choices_mnl_pref, choices_mnl_wtp, choices_mxl_pref, choices_mxl_wtp)
choices$model <- c(
  rep("mnl_pref", nrow(choices_mnl_pref)),
  rep("mnl_wtp",  nrow(choices_mnl_wtp)),
  rep("mxl_pref", nrow(choices_mxl_pref)),
  rep("mxl_wtp",  nrow(choices_mxl_wtp)))

# Compute prediction accuracy by model
choices %>%
  filter(choice == 1) %>%
  mutate(predict_correct = (choice_predict == choice)) %>%
  group_by(model) %>%
  summarise(p_correct = sum(predict_correct) / n())
#> # A tibble: 4 × 2
#>   model    p_correct
#>   <chr>        <dbl>
#> 1 mnl_pref     0.390
#> 2 mnl_wtp      0.362
#> 3 mxl_pref     0.390
#> 4 mxl_wtp      0.379
```
The models all perform similarly, correctly predicting roughly 38% of choices. This is substantially better than random guessing, which with four alternatives per choice observation would be correct about 25% of the time.
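To make "better than random" concrete, a one-sample proportion test against the 25% baseline can be run. A sketch using the approximate numbers reported above (~38% accuracy on 2,412 choice observations); the counts are rounded from the reported results, not exact:

```r
# Hedged sketch: test whether ~38% accuracy beats the 25% random baseline.
# Counts are approximated from the results reported above.
prop.test(x = round(0.38 * 2412), n = 2412, p = 0.25)
```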