Use this function to estimate multinomial (MNL) and mixed logit (MXL) models with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. The function includes an option to run a multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the nloptr() function to minimize the negative log-likelihood function.

logitr(
  data,
  outcome,
  obsID,
  pars,
  scalePar = NULL,
  randPars = NULL,
  randScale = NULL,
  modelSpace = NULL,
  weights = NULL,
  panelID = NULL,
  clusterID = NULL,
  robust = FALSE,
  correlation = FALSE,
  startValBounds = c(-1, 1),
  startVals = NULL,
  numMultiStarts = 1,
  useAnalyticGrad = TRUE,
  scaleInputs = TRUE,
  standardDraws = NULL,
  drawType = "halton",
  numDraws = 50,
  numCores = NULL,
  vcov = FALSE,
  predict = TRUE,
  options = list(print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06, ftol_rel = 1e-06,
    ftol_abs = 1e-06, maxeval = 1000, algorithm = "NLOPT_LD_LBFGS"),
  price,
  randPrice,
  choice,
  parNames,
  choiceName,
  obsIDName,
  priceName,
  weightsName,
  clusterName,
  cluster
)

Arguments

data

The data, formatted as a data.frame object.

outcome

The name of the column that identifies the outcome variable, which should be coded with a 1 for TRUE and 0 for FALSE.

obsID

The name of the column that identifies each observation.

pars

The names of the parameters to be estimated in the model. Must be the same as the column names in the data argument. For WTP space models, do not include the scalePar variable in pars.

scalePar

The name of the column that identifies the scale variable, which is typically "price" for WTP space models, but could be any continuous variable, such as "time". Defaults to NULL.
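To make the link between the two parameterizations concrete, the sketch below assumes one common formulation: preference-space utility is `X %*% beta - alpha * price`, and WTP-space utility is `lambda * (X %*% omega - price)` with `lambda = alpha` and `omega = beta / alpha`. The attribute values and coefficients are made up for illustration, and this is not logitr's internal code.

```r
# Hypothetical attribute values for three alternatives (not from a real data set)
X     <- cbind(feat = c(0, 1, 0), brand = c(1, 0, 0))
price <- c(8.1, 6.1, 7.9)

# Assumed preference-space coefficients: utility = X %*% beta - alpha * price
beta  <- c(feat = 0.5, brand = 1.2)
alpha <- 0.4

# Equivalent WTP-space parameters: utility = lambda * (X %*% omega - price)
lambda <- alpha
omega  <- beta / alpha

u_pref <- as.vector(X %*% beta - alpha * price)
u_wtp  <- as.vector(lambda * (X %*% omega - price))

# Both parameterizations give the same utilities, hence the same choice probabilities
all.equal(u_pref, u_wtp)
```

Under this assumption the scale variable is factored out of the utility expression, which is consistent with the instruction to leave it out of pars and pass it as scalePar.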

randPars

A named vector whose names are the random parameters and values the distribution: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Defaults to NULL.
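As a rough illustration of what the three distribution codes mean, the sketch below transforms standard normal draws in the usual textbook way. The values of `mu` and `sigma` are hypothetical, and the code is an assumption about the standard definitions rather than a copy of logitr's implementation.

```r
set.seed(123)          # for a reproducible illustration only
z     <- rnorm(1000)   # standard normal draws
mu    <- 0.5           # hypothetical mean parameter
sigma <- 1.0           # hypothetical standard deviation parameter

draws_n  <- mu + sigma * z            # 'n': normal
draws_ln <- exp(mu + sigma * z)       # 'ln': log-normal, always positive
draws_cn <- pmax(0, mu + sigma * z)   # 'cn': normal censored below at zero

# Named vector in the form described above (same as the example on this page)
randPars <- c(feat = "n")
```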

randScale

The random distribution for the scale parameter: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Only used for WTP space MXL models. Defaults to NULL.

modelSpace

This argument is no longer needed as of v0.7.0. The model space is now determined based on the scalePar argument: if NULL (the default), the model will be in the preference space, otherwise it will be in the WTP space. Defaults to NULL.

weights

The name of the column that identifies the weights to be used in model estimation. Defaults to NULL.

panelID

The name of the column that identifies the individual (for panel data where multiple observations are recorded for each individual). Defaults to NULL.

clusterID

The name of the column that identifies the cluster groups to be used in model estimation. Defaults to NULL.

robust

Determines whether or not a robust covariance matrix is estimated. Defaults to FALSE. Specifying a clusterID or weights overrides the user setting and sets this to TRUE (a warning is displayed in this case). Replicates the functionality of Stata's cmmixlogit.

correlation

Set to TRUE to account for correlation across random parameters (correlated heterogeneity). Defaults to FALSE.

startValBounds

Sets the lower and upper bounds for the starting parameter values for each optimization run, which are generated by runif(n, lower, upper). Defaults to c(-1, 1).

startVals

A vector of values to be used as starting values for the optimization. Only used for the first run if numMultiStarts > 1. Defaults to NULL.

numMultiStarts

The number of times to run the optimization loop, each time starting from a different random starting point for each parameter, drawn between the startValBounds. Recommended for non-convex models, such as WTP space models and mixed logit models. Defaults to 1.
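The multistart idea can be sketched in a few lines of base R. The example below is only an illustration: it swaps in stats::optim() for nloptr(), uses a toy non-convex function in place of a negative log-likelihood, and the names `objective`, `numPars`, and `runs` are hypothetical.

```r
# Toy non-convex objective with more than one local minimum
# (a stand-in for a negative log-likelihood)
objective <- function(x) sum(x^4 - 3 * x^2 + 0.5 * x)

startValBounds <- c(-1, 1)
numMultiStarts <- 5
numPars        <- 2

set.seed(42)
runs <- lapply(seq_len(numMultiStarts), function(i) {
  # Random starting point for each parameter, drawn between the bounds
  start <- runif(numPars, startValBounds[1], startValBounds[2])
  fit   <- optim(start, objective, method = "BFGS")
  list(start = start, value = fit$value, par = fit$par)
})

# Keep the run with the lowest objective value
values <- vapply(runs, function(r) r$value, numeric(1))
best   <- runs[[which.min(values)]]
best$par
```

Different starting points can converge to different local minima, which is why comparing the objective value across runs is useful for non-convex problems.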

useAnalyticGrad

Set to FALSE to use numerically approximated gradients instead of analytic gradients during estimation. For now, using the analytic gradient is faster for MNL models but slower for MXL models. Defaults to TRUE.

scaleInputs

By default each variable in data is scaled to be between 0 and 1 before running the optimization routine because it usually helps with stability, especially if some of the variables have very large or very small values (e.g. > 10^3 or < 10^-3). Set to FALSE to turn this feature off. Defaults to TRUE.

standardDraws

By default, a new set of standard normal draws is generated during each call to logitr (the same draws are used during each multistart iteration). The user can override those draws by providing a matrix of standard normal draws if desired. Defaults to NULL.

drawType

Specify the draw type as a character: "halton" (the default) or "sobol" (recommended for models with more than 5 random parameters).

numDraws

The number of draws (Halton or Sobol, depending on drawType) to use for MXL models for the maximum simulated likelihood. Defaults to 50.

numCores

The number of cores to use for parallel processing of the multistart. Set to 1 to serially run the multistart. Defaults to NULL, in which case the number of cores is set to parallel::detectCores() - 1. Max cores allowed is capped at parallel::detectCores().
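The default described above can be approximated as follows. This is a sketch, not logitr's source: the variable `requested` is hypothetical, and the lower bound of 1 and the handling of an NA result from detectCores() are assumptions added so the code behaves sensibly on single-core machines.

```r
requested <- NULL  # mimics the default numCores = NULL

available <- parallel::detectCores()
if (is.na(available)) available <- 1  # detectCores() can return NA on some platforms

# Default: one fewer than the number of detected cores
numCores <- if (is.null(requested)) available - 1 else requested

# Keep the result between 1 and the number of detected cores
numCores <- max(1, min(numCores, available))
numCores
```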

vcov

Set to TRUE to evaluate and include the variance-covariance matrix and coefficient standard errors in the returned object. Defaults to FALSE.

predict

If FALSE, predicted probabilities, fitted values, and residuals are not included in the returned object. Defaults to TRUE.

options

A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details.

price

No longer used as of v0.7.0 - if provided, this is passed to the scalePar argument and a warning is displayed.

randPrice

No longer used as of v0.7.0 - if provided, this is passed to the randScale argument and a warning is displayed.

choice

No longer used as of v0.4.0 - if provided, this is passed to the outcome argument and a warning is displayed.

parNames

No longer used as of v0.2.3 - if provided, this is passed to the pars argument and a warning is displayed.

choiceName

No longer used as of v0.2.3 - if provided, this is passed to the outcome argument and a warning is displayed.

obsIDName

No longer used as of v0.2.3 - if provided, this is passed to the obsID argument and a warning is displayed.

priceName

No longer used as of v0.2.3 - if provided, this is passed to the scalePar argument and a warning is displayed.

weightsName

No longer used as of v0.2.3 - if provided, this is passed to the weights argument and a warning is displayed.

clusterName

No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.

cluster

No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.

Value

The function returns a list object containing the following objects.

| Value | Description |
|---|---|
| coefficients | The model coefficients at convergence. |
| logLik | The log-likelihood value at convergence. |
| nullLogLik | The null log-likelihood value (if all coefficients are 0). |
| gradient | The gradient of the log-likelihood at convergence. |
| hessian | The hessian of the log-likelihood at convergence. |
| probabilities | Predicted probabilities. Not returned if predict = FALSE. |
| fitted.values | Fitted values. Not returned if predict = FALSE. |
| residuals | Residuals. Not returned if predict = FALSE. |
| startVals | The starting values used. |
| multistartNumber | The multistart run number for this model. |
| multistartSummary | A summary of the log-likelihood values for each multistart run (if more than one multistart was used). |
| time | The user, system, and elapsed time to run the optimization. |
| iterations | The number of iterations until convergence. |
| message | A more informative message with the status of the optimization result. |
| status | An integer value with the status of the optimization (positive values are successes). Use statusCodes() for a detailed description. |
| call | The matched call to logitr(). |
| inputs | A list of the original inputs to logitr(). |
| data | A list of the original data provided to logitr() broken up into components used during model estimation. |
| numObs | The number of observations. |
| numParams | The number of model parameters. |
| freq | The frequency counts of each alternative. |
| modelType | The model type, 'mnl' for multinomial logit or 'mxl' for mixed logit. |
| weightsUsed | TRUE or FALSE for whether weights were used in the model. |
| numClusters | The number of clusters. |
| parSetup | A summary of the distributional assumptions on each model parameter ("f"="fixed", "n"="normal distribution", "ln"="log-normal distribution"). |
| parIDs | A list identifying the indices of each parameter in coefficients by a variety of types. |
| scaleFactors | A vector of the scaling factors used to scale each coefficient during estimation. |
| standardDraws | The draws used during maximum simulated likelihood (for MXL models). |
| options | A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details. |
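To illustrate what the null log-likelihood represents, the sketch below computes it by hand for a small made-up data set, assuming an unweighted multinomial logit model. With all coefficients at 0, every alternative in a choice observation has the same utility and therefore the same probability. The data and variable names are hypothetical.

```r
# Hypothetical long-format data: 3 choice observations with 3, 2, and 3 alternatives
obsID  <- c(1, 1, 1, 2, 2, 3, 3, 3)
choice <- c(0, 1, 0, 1, 0, 0, 0, 1)

# With all coefficients at 0, utilities are equal within an observation,
# so each alternative has probability 1 / (number of alternatives)
numAlts <- ave(choice, obsID, FUN = length)
probs   <- 1 / numAlts

# Log-likelihood: sum of the log probabilities of the chosen alternatives
nullLogLik <- sum(log(probs[choice == 1]))
nullLogLik
```

Comparing logLik against this baseline shows how much the estimated coefficients improve on equal-probability guessing.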

Details

The options argument is used to control the detailed behavior of the optimization and must be passed as a list, e.g. options = list(...). Below is a list of the default options, but other options can be included. Run nloptr::nloptr.print.options() for more details.

| Argument | Description | Default |
|---|---|---|
| xtol_rel | The relative x tolerance for the nloptr optimization loop. | 1.0e-6 |
| xtol_abs | The absolute x tolerance for the nloptr optimization loop. | 1.0e-6 |
| ftol_rel | The relative f tolerance for the nloptr optimization loop. | 1.0e-6 |
| ftol_abs | The absolute f tolerance for the nloptr optimization loop. | 1.0e-6 |
| maxeval | The maximum number of function evaluations for the nloptr optimization loop. | 1000 |
| algorithm | The optimization algorithm that nloptr uses. | "NLOPT_LD_LBFGS" |
| print_level | The print level of the nloptr optimization loop. | 0 |
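One way to change a few settings while keeping the rest at their documented defaults is to start from the full default list and override selected entries. The sketch below uses base R's modifyList(); the override values and the names `default_options` and `my_options` are hypothetical.

```r
# Defaults copied from the usage signature on this page
default_options <- list(
  print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06,
  ftol_rel = 1e-06, ftol_abs = 1e-06, maxeval = 1000,
  algorithm = "NLOPT_LD_LBFGS"
)

# Hypothetical overrides: tighter function tolerance and a larger evaluation budget
my_options <- modifyList(default_options, list(ftol_rel = 1e-08, maxeval = 5000))

# my_options could then be passed as logitr(..., options = my_options)
str(my_options)
```

Building the complete list explicitly avoids relying on any assumption about how unspecified options are filled in.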

Examples

# For more detailed examples, visit
# https://jhelvy.github.io/logitr/articles/

library(logitr)

# Estimate an MNL model in the Preference space
mnl_pref <- logitr(
  data    = yogurt,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "feat", "brand")
)
#> Running model...
#> Done!

# Estimate an MNL model in the WTP space, using a 5-run multistart
mnl_wtp <- logitr(
  data           = yogurt,
  outcome        = "choice",
  obsID          = "obsID",
  pars           = c("feat", "brand"),
  scalePar       = "price",
  numMultiStarts = 5
)
#> Running multistart...
#>   Random starting point iterations: 5
#>   Number of cores: 3
#> Done!

# Estimate an MXL model in the Preference space with "feat"
# following a normal distribution
# Panel structure is accounted for in this example using "panelID"
mxl_pref <- logitr(
  data     = yogurt,
  outcome  = "choice",
  obsID    = "obsID",
  panelID  = "id",
  pars     = c("price", "feat", "brand"),
  randPars = c(feat = "n")
)
#> Running model...
#> Done!