Use this function to estimate multinomial (MNL) and mixed logit (MXL)
models with "Preference" space or "Willingness-to-pay" (WTP) space utility
parameterizations. The function includes an option to run a multistart
optimization loop with random starting points in each iteration, which is
useful for non-convex problems like MXL models or models with WTP space
utility parameterizations. The main optimization loop uses the nloptr()
function to minimize the negative log-likelihood function.
logitr(
data,
outcome,
obsID,
pars,
scalePar = NULL,
randPars = NULL,
randScale = NULL,
modelSpace = NULL,
weights = NULL,
panelID = NULL,
clusterID = NULL,
robust = FALSE,
correlation = FALSE,
startValBounds = c(-1, 1),
startVals = NULL,
numMultiStarts = 1,
useAnalyticGrad = TRUE,
scaleInputs = TRUE,
standardDraws = NULL,
drawType = "halton",
numDraws = 50,
numCores = NULL,
vcov = FALSE,
predict = TRUE,
options = list(print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06, ftol_rel = 1e-06,
ftol_abs = 1e-06, maxeval = 1000, algorithm = "NLOPT_LD_LBFGS"),
price,
randPrice,
choice,
parNames,
choiceName,
obsIDName,
priceName,
weightsName,
clusterName,
cluster
)
The data, formatted as a data.frame
object.
The name of the column that identifies the outcome variable,
which should be coded with a 1
for TRUE
and 0
for FALSE
.
The name of the column that identifies each observation.
The names of the parameters to be estimated in the model.
Must be the same as the column names in the data
argument. For WTP space
models, do not include the scalePar
variable in pars
.
The name of the column that identifies the scale variable,
which is typically "price" for WTP space models, but could be any
continuous variable, such as "time". Defaults to NULL
.
A named vector whose names are the random parameters and
values the distribution: 'n'
for normal, 'ln'
for log-normal, or
'cn'
for zero-censored normal. Defaults to NULL
.
The random distribution for the scale parameter: 'n'
for
normal, 'ln'
for log-normal, or 'cn'
for zero-censored normal. Only used
for WTP space MXL models. Defaults to NULL
.
This argument is no longer needed as of v0.7.0. The model
space is now determined based on the scalePar
argument:
if NULL
(the default), the model will be in the preference space,
otherwise it will be in the WTP space. Defaults to NULL
.
The name of the column that identifies the weights to be
used in model estimation. Defaults to NULL
.
The name of the column that identifies the individual (for
panel data where multiple observations are recorded for each individual).
Defaults to NULL
.
The name of the column that identifies the cluster
groups to be used in model estimation. Defaults to NULL
.
Determines whether or not a robust covariance matrix is
estimated. Defaults to FALSE
. Specification of a clusterID
or
weights
will override the user setting and set this to `TRUE' (a
warning will be displayed in this case). Replicates the functionality of
Stata's cmcmmixlogit.
Set to TRUE
to account for correlation across random
parameters (correlated heterogeneity). Defaults to FALSE
.
sets the lower
and upper
bounds for the starting
parameter values for each optimization run, which are generated by
runif(n, lower, upper)
. Defaults to c(-1, 1)
.
is vector of values to be used as starting values for the
optimization. Only used for the first run if numMultiStarts > 1
. Defaults
to NULL
.
is the number of times to run the optimization loop,
each time starting from a different random starting point for each parameter
between startValBounds
. Recommended for non-convex models, such as WTP
space models and mixed logit models. Defaults to 1
.
Set to FALSE
to use numerically approximated
gradients instead of analytic gradients during estimation. For now, using
the analytic gradient is faster for MNL models but slower for MXL models.
Defaults to TRUE
.
By default each variable in data
is scaled to be
between 0 and 1 before running the optimization routine because it usually
helps with stability, especially if some of the variables have very large or
very small values (e.g. > 10^3
or < 10^-3
). Set to FALSE
to turn this
feature off. Defaults to TRUE
.
By default, a new set of standard normal draws are
generated during each call to logitr
(the same draws are used during each
multistart iteration). The user can override those draws by providing a
matrix of standard normal draws if desired. Defaults to NULL
.
Specify the draw type as a character: "halton"
(the default) or "sobol"
(recommended for models with more than 5
random parameters).
The number of Halton draws to use for MXL models for the
maximum simulated likelihood. Defaults to 50
.
The number of cores to use for parallel processing of the
multistart. Set to 1
to serially run the multistart. Defaults to NULL
,
in which case the number of cores is set to parallel::detectCores() - 1
.
Max cores allowed is capped at parallel::detectCores()
.
Set to TRUE
to evaluate and include the variance-covariance
matrix and coefficient standard errors in the returned object.
Defaults to FALSE
.
If FALSE
, predicted probabilities, fitted values, and
residuals are not included in the returned object. Defaults to TRUE
.
A list of options for controlling the nloptr()
optimization.
Run nloptr::nloptr.print.options()
for details.
No longer used as of v0.7.0 - if provided, this is passed
to the scalePar
argument and a warning is displayed.
No longer used as of v0.7.0 - if provided, this is passed
to the randScale
argument and a warning is displayed.
No longer used as of v0.4.0 - if provided, this is passed
to the outcome
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the pars
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the outcome
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the obsID
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the scalePar
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the weights
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the clusterID
argument and a warning is displayed.
No longer used as of v0.2.3 - if provided, this is passed
to the clusterID
argument and a warning is displayed.
The function returns a list object containing the following objects.
Value | Description |
coefficients | The model coefficients at convergence. |
logLik | The log-likelihood value at convergence. |
nullLogLik | The null log-likelihood value (if all coefficients are 0). |
gradient | The gradient of the log-likelihood at convergence. |
hessian | The hessian of the log-likelihood at convergence. |
probabilities | Predicted probabilities. Not returned if predict = FALSE . |
fitted.values | Fitted values. Not returned if predict = FALSE . |
residuals | Residuals. Not returned if predict = FALSE . |
startVals | The starting values used. |
multistartNumber | The multistart run number for this model. |
multistartSummary | A summary of the log-likelihood values for each multistart run (if more than one multistart was used). |
time | The user, system, and elapsed time to run the optimization. |
iterations | The number of iterations until convergence. |
message | A more informative message with the status of the optimization result. |
status | An integer value with the status of the optimization (positive values are successes). Use statusCodes() for a detailed description. |
call | The matched call to logitr() . |
inputs | A list of the original inputs to logitr() . |
data | A list of the original data provided to logitr() broken up into components used during model estimation. |
numObs | The number of observations. |
numParams | The number of model parameters. |
freq | The frequency counts of each alternative. |
modelType | The model type, 'mnl' for multinomial logit or 'mxl' for mixed logit. |
weightsUsed | TRUE or FALSE for whether weights were used in the model. |
numClusters | The number of clusters. |
parSetup | A summary of the distributional assumptions on each model parameter ("f" ="fixed", "n" ="normal distribution", "ln" ="log-normal distribution"). |
parIDs | A list identifying the indices of each parameter in coefficients by a variety of types. |
scaleFactors | A vector of the scaling factors used to scale each coefficient during estimation. |
standardDraws | The draws used during maximum simulated likelihood (for MXL models). |
options | A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details. |
The the options
argument is used to control the detailed behavior of the
optimization and must be passed as a list, e.g. options = list(...)
.
Below are a list of the default options, but other options can be included.
Run nloptr::nloptr.print.options()
for more details.
Argument | Description | Default |
xtol_rel | The relative x tolerance for the nloptr optimization loop. | 1.0e-6 |
xtol_abs | The absolute x tolerance for the nloptr optimization loop. | 1.0e-6 |
ftol_rel | The relative f tolerance for the nloptr optimization loop. | 1.0e-6 |
ftol_abs | The absolute f tolerance for the nloptr optimization loop. | 1.0e-6 |
maxeval | The maximum number of function evaluations for the nloptr optimization loop. | 1000 |
algorithm | The optimization algorithm that nloptr uses. | "NLOPT_LD_LBFGS" |
print_level | The print level of the nloptr optimization loop. | 0 |
# For more detailed examples, visit
# https://jhelvy.github.io/logitr/articles/
library(logitr)
# Estimate a MNL model in the Preference space
mnl_pref <- logitr(
data = yogurt,
outcome = "choice",
obsID = "obsID",
pars = c("price", "feat", "brand")
)
#> Running model...
#> Done!
# Estimate a MNL model in the WTP space, using a 5-run multistart
mnl_wtp <- logitr(
data = yogurt,
outcome = "choice",
obsID = "obsID",
pars = c("feat", "brand"),
scalePar = "price",
numMultiStarts = 5
)
#> Running multistart...
#> Random starting point iterations: 5
#> Number of cores: 2
#> Done!
# Estimate a MXL model in the Preference space with "feat"
# following a normal distribution
# Panel structure is accounted for in this example using "panelID"
mxl_pref <- logitr(
data = yogurt,
outcome = "choice",
obsID = "obsID",
panelID = "id",
pars = c("price", "feat", "brand"),
randPars = c(feat = "n")
)
#> Running model...
#> Done!