The main function for estimating logit models

Use this function to estimate multinomial (MNL) and mixed logit (MXL) models with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. The function includes an option to run a multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the nloptr() function to minimize the negative log-likelihood function.
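As a sketch of the two parameterizations named above, the utility of alternative $j$ can be written as follows. The notation is an assumption made here for illustration ($x_j$ for the non-scale attributes, $p_j$ for the scale variable such as price, $\varepsilon_j$ for the error term) and is not fixed by the function's interface.

```latex
% Preference space: coefficients are marginal utilities
u_j = \beta^\top x_j - \alpha p_j + \varepsilon_j

% WTP space: coefficients are marginal willingness to pay
u_j = \lambda \left( \omega^\top x_j - p_j \right) + \varepsilon_j,
\qquad \omega = \beta / \alpha, \quad \lambda = \alpha
```

In the WTP space the scale parameter multiplies the other coefficients, so the utility is nonlinear in the parameters and the log-likelihood is not guaranteed to be concave. This is why a multistart is suggested for these models.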

logitr(
  data,
  outcome,
  obsID,
  pars,
  scalePar = NULL,
  randPars = NULL,
  randScale = NULL,
  modelSpace = NULL,
  weights = NULL,
  panelID = NULL,
  clusterID = NULL,
  robust = FALSE,
  correlation = FALSE,
  startValBounds = c(-1, 1),
  startVals = NULL,
  numMultiStarts = 1,
  useAnalyticGrad = TRUE,
  scaleInputs = TRUE,
  standardDraws = NULL,
  drawType = "halton",
  numDraws = 50,
  numCores = NULL,
  vcov = FALSE,
  predict = TRUE,
  options = list(print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06, ftol_rel = 1e-06,
    ftol_abs = 1e-06, maxeval = 1000, algorithm = "NLOPT_LD_LBFGS"),
  price,
  randPrice,
  choice,
  parNames,
  choiceName,
  obsIDName,
  priceName,
  weightsName,
  clusterName,
  cluster
)

Arguments

data: The data, formatted as a data.frame object.
outcome: The name of the column that identifies the outcome variable, which should be coded with a 1 for TRUE and 0 for FALSE.
obsID: The name of the column that identifies each observation.
pars: The names of the parameters to be estimated in the model. Must be the same as the column names in the data argument. For WTP space models, do not include the scalePar variable in pars.
scalePar: The name of the column that identifies the scale variable, which is typically "price" for WTP space models, but could be any continuous variable, such as "time". Defaults to NULL.
randPars: A named vector whose names are the random parameters and values the distribution: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Defaults to NULL.
randScale: The random distribution for the scale parameter: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Only used for WTP space MXL models. Defaults to NULL.
modelSpace: This argument is no longer needed as of v0.7.0. The model space is now determined based on the scalePar argument: if NULL (the default), the model will be in the preference space, otherwise it will be in the WTP space. Defaults to NULL.
weights: The name of the column that identifies the weights to be used in model estimation. Defaults to NULL.
panelID: The name of the column that identifies the individual (for panel data where multiple observations are recorded for each individual). Defaults to NULL.
clusterID: The name of the column that identifies the cluster groups to be used in model estimation. Defaults to NULL.
robust: Determines whether or not a robust covariance matrix is estimated. Defaults to FALSE. Specification of a clusterID or weights will override the user setting and set this to `TRUE` (a warning will be displayed in this case). Replicates the functionality of Stata's cmmixlogit.
correlation: Set to TRUE to account for correlation across random parameters (correlated heterogeneity). Defaults to FALSE.
startValBounds: Sets the lower and upper bounds for the starting parameter values for each optimization run, which are generated by runif(n, lower, upper). Defaults to c(-1, 1).
startVals: A vector of values to be used as starting values for the optimization. Only used for the first run if numMultiStarts > 1. Defaults to NULL.
numMultiStarts: The number of times to run the optimization loop, each time starting from a different random starting point for each parameter between startValBounds. Recommended for non-convex models, such as WTP space models and mixed logit models. Defaults to 1.
useAnalyticGrad: Set to FALSE to use numerically approximated gradients instead of analytic gradients during estimation. For now, using the analytic gradient is faster for MNL models but slower for MXL models. Defaults to TRUE.
scaleInputs: By default each variable in data is scaled to be between 0 and 1 before running the optimization routine because it usually helps with stability, especially if some of the variables have very large or very small values (e.g. > 10^3 or < 10^-3). Set to FALSE to turn this feature off. Defaults to TRUE.
standardDraws: By default, a new set of standard normal draws is generated during each call to logitr (the same draws are used during each multistart iteration). The user can override those draws by providing a matrix of standard normal draws if desired. Defaults to NULL.
drawType: Specify the draw type as a character: "halton" (the default) or "sobol" (recommended for models with more than 5 random parameters).
numDraws: The number of draws (of the type set by drawType) to use in the maximum simulated likelihood estimation of MXL models. Defaults to 50.
numCores: The number of cores to use for parallel processing of the multistart. Set to 1 to serially run the multistart. Defaults to NULL, in which case the number of cores is set to parallel::detectCores() - 1. Max cores allowed is capped at parallel::detectCores().
vcov: Set to TRUE to evaluate and include the variance-covariance matrix and coefficient standard errors in the returned object. Defaults to FALSE.
predict: If FALSE, predicted probabilities, fitted values, and residuals are not included in the returned object. Defaults to TRUE.
options: A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details.
price: No longer used as of v0.7.0 - if provided, this is passed to the scalePar argument and a warning is displayed.
randPrice: No longer used as of v0.7.0 - if provided, this is passed to the randScale argument and a warning is displayed.
choice: No longer used as of v0.4.0 - if provided, this is passed to the outcome argument and a warning is displayed.
parNames: No longer used as of v0.2.3 - if provided, this is passed to the pars argument and a warning is displayed.
choiceName: No longer used as of v0.2.3 - if provided, this is passed to the outcome argument and a warning is displayed.
obsIDName: No longer used as of v0.2.3 - if provided, this is passed to the obsID argument and a warning is displayed.
priceName: No longer used as of v0.2.3 - if provided, this is passed to the scalePar argument and a warning is displayed.
weightsName: No longer used as of v0.2.3 - if provided, this is passed to the weights argument and a warning is displayed.
clusterName: No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.
cluster: No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.
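
The startValBounds and numMultiStarts arguments together determine how random starting points are drawn. The following base-R sketch illustrates that sampling step with runif(n, lower, upper) and the default bounds. The helper name `draw_start_vals` and the choice of five parameters are hypothetical; this is not logitr's internal code.

```r
# Hypothetical helper (not part of logitr): draw one vector of random
# starting values per multistart run, uniformly within startValBounds.
draw_start_vals <- function(numPars, numMultiStarts, startValBounds = c(-1, 1)) {
  lower <- startValBounds[1]
  upper <- startValBounds[2]
  # One row per multistart run, one column per parameter
  matrix(
    runif(numPars * numMultiStarts, min = lower, max = upper),
    nrow = numMultiStarts,
    ncol = numPars
  )
}

set.seed(123)  # only to make this illustration reproducible
starts <- draw_start_vals(numPars = 5, numMultiStarts = 3)
dim(starts)    # 3 runs by 5 parameters
range(starts)  # all values lie within the bounds c(-1, 1)
```

Widening startValBounds spreads the starting points further apart, which can help explore a non-convex likelihood but may also place some runs in regions where the optimizer struggles to converge.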

Value

The function returns a list containing the following elements.

Value	Description
`coefficients`	The model coefficients at convergence.
`logLik`	The log-likelihood value at convergence.
`nullLogLik`	The null log-likelihood value (if all coefficients are 0).
`gradient`	The gradient of the log-likelihood at convergence.
`hessian`	The hessian of the log-likelihood at convergence.
`probabilities`	Predicted probabilities. Not returned if `predict = FALSE`.
`fitted.values`	Fitted values. Not returned if `predict = FALSE`.
`residuals`	Residuals. Not returned if `predict = FALSE`.
`startVals`	The starting values used.
`multistartNumber`	The multistart run number for this model.
`multistartSummary`	A summary of the log-likelihood values for each multistart run (if more than one multistart was used).
`time`	The user, system, and elapsed time to run the optimization.
`iterations`	The number of iterations until convergence.
`message`	A more informative message with the status of the optimization result.
`status`	An integer value with the status of the optimization (positive values are successes). Use `statusCodes()` for a detailed description.
`call`	The matched call to `logitr()`.
`inputs`	A list of the original inputs to `logitr()`.
`data`	A list of the original data provided to `logitr()` broken up into components used during model estimation.
`numObs`	The number of observations.
`numParams`	The number of model parameters.
`freq`	The frequency counts of each alternative.
`modelType`	The model type, `'mnl'` for multinomial logit or `'mxl'` for mixed logit.
`weightsUsed`	`TRUE` or `FALSE` for whether weights were used in the model.
`numClusters`	The number of clusters.
`parSetup`	A summary of the distributional assumptions on each model parameter (`"f"`="fixed", `"n"`="normal distribution", `"ln"`="log-normal distribution").
`parIDs`	A list identifying the indices of each parameter in `coefficients` by a variety of types.
`scaleFactors`	A vector of the scaling factors used to scale each coefficient during estimation.
`standardDraws`	The draws used during maximum simulated likelihood (for MXL models).
`options`	A list of options for controlling the `nloptr()` optimization. Run `nloptr::nloptr.print.options()` for details.
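
Because the returned object is a list, the elements in the table above can be read by name. The following is a minimal sketch of checking an estimated model, assuming the logitr package is installed and using its bundled yogurt data; the object name `model` is arbitrary.

```r
library(logitr)

# Estimate a simple MNL model to have an object to inspect
model <- logitr(
  data    = yogurt,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "feat", "brand")
)

# Elements are accessed by name, as with any R list
model$status        # positive values indicate success
model$message       # text description of the optimizer status
model$logLik        # log-likelihood at convergence
model$coefficients  # estimated coefficients

# Look up what each status code means
statusCodes()
```

Checking `status` and `message` before interpreting the coefficients is a reasonable habit, particularly for multistart runs where some starting points may fail to converge.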

Details

The options argument is used to control the detailed behavior of the optimization and must be passed as a list, e.g. options = list(...). Below is a list of the default options, but other options can be included. Run nloptr::nloptr.print.options() for more details.

Argument	Description	Default
`xtol_rel`	The relative `x` tolerance for the `nloptr` optimization loop.	`1.0e-6`
`xtol_abs`	The absolute `x` tolerance for the `nloptr` optimization loop.	`1.0e-6`
`ftol_rel`	The relative `f` tolerance for the `nloptr` optimization loop.	`1.0e-6`
`ftol_abs`	The absolute `f` tolerance for the `nloptr` optimization loop.	`1.0e-6`
`maxeval`	The maximum number of function evaluations for the `nloptr` optimization loop.	`1000`
`algorithm`	The optimization algorithm that `nloptr` uses.	`"NLOPT_LD_LBFGS"`
`print_level`	The print level of the `nloptr` optimization loop.	`0`
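
As an illustration of overriding these defaults, the sketch below tightens the tolerances and raises the evaluation limit. It assumes the logitr package is installed and uses its bundled yogurt data; the names `opts` and `mnl_tight` and the specific values are arbitrary choices for the example, not recommendations. The full list is spelled out so the sketch does not rely on how unspecified entries are filled in.

```r
library(logitr)

# Start from the documented defaults and change a few settings
opts <- list(
  print_level = 0,
  xtol_rel    = 1e-8,   # tighter than the 1e-6 default
  xtol_abs    = 1e-8,
  ftol_rel    = 1e-8,
  ftol_abs    = 1e-8,
  maxeval     = 2000,   # allow more evaluations than the default 1000
  algorithm   = "NLOPT_LD_LBFGS"
)

mnl_tight <- logitr(
  data    = yogurt,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "feat", "brand"),
  options = opts
)

# Check how the optimization ended under the new settings
mnl_tight$status
mnl_tight$iterations
```

Tighter tolerances generally cost more iterations, so comparing `iterations` and `logLik` against a default run shows whether the extra precision changes the result in practice.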

Examples

# For more detailed examples, visit
# https://jhelvy.github.io/logitr/articles/

library(logitr)

# Estimate a MNL model in the Preference space
mnl_pref <- logitr(
  data    = yogurt,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "feat", "brand")
)
#> Running model...
#> Done!

# Estimate a MNL model in the WTP space, using a 5-run multistart
mnl_wtp <- logitr(
  data           = yogurt,
  outcome        = "choice",
  obsID          = "obsID",
  pars           = c("feat", "brand"),
  scalePar       = "price",
  numMultiStarts = 5
)
#> Running multistart...
#>   Random starting point iterations: 5
#>   Number of cores: 2
#> Done!

# Estimate a MXL model in the Preference space with "feat"
# following a normal distribution
# Panel structure is accounted for in this example using "panelID"
mxl_pref <- logitr(
  data     = yogurt,
  outcome  = "choice",
  obsID    = "obsID",
  panelID  = "id",
  pars     = c("price", "feat", "brand"),
  randPars = c(feat = "n")
)
#> Running model...
#> Done!