Estimate the same model on different size subsets of a data set

This function estimates the same model multiple times using different-sized subsets of a set of choice data. The number of models to run is set by the `nbreaks` argument, which breaks up the data into groups of increasing sample sizes. All models are estimated using the logitr package.
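As a rough illustration of how `nbreaks` could translate into subsets of increasing size, the sketch below splits a toy data set by observation ID. The splitting rule (evenly spaced cut points on `obsID`) and the toy data are assumptions made for illustration; the package's internal logic may differ.

```r
# Toy stand-in for choice data: 12 observations (choice questions),
# each with 3 alternatives (rows). Not real survey data.
toy <- data.frame(obsID = rep(1:12, each = 3))
nbreaks <- 4

# Assumed rule: evenly spaced, increasing numbers of observations,
# ending at the full data set
maxObs <- max(toy$obsID)
nObs   <- ceiling(seq(maxObs / nbreaks, maxObs, length.out = nbreaks))

# Each subset keeps the rows belonging to the first nObs[i] observations
subsets <- lapply(nObs, function(n) toy[toy$obsID <= n, , drop = FALSE])

# Number of rows in each subset
sapply(subsets, nrow)
```

Under this assumed rule, one model would be estimated per subset, so `nbreaks = 4` yields four models.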

estimateModels(
  nbreaks = 10,
  nQPerResp = 1,
  data,
  outcome,
  obsID,
  pars,
  price = NULL,
  randPars = NULL,
  randPrice = NULL,
  modelSpace = "pref",
  weights = NULL,
  panelID = NULL,
  clusterID = NULL,
  robust = FALSE,
  startParBounds = c(-1, 1),
  startVals = NULL,
  numMultiStarts = 1,
  useAnalyticGrad = TRUE,
  scaleInputs = TRUE,
  standardDraws = NULL,
  numDraws = 50,
  vcov = FALSE,
  predict = FALSE,
  options = list(print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06, ftol_rel = 1e-06,
    ftol_abs = 1e-06, maxeval = 1000, algorithm = "NLOPT_LD_LBFGS")
)

Arguments

nbreaks	The number of different sample size groups. Defaults to `10`.
nQPerResp	Number of questions per respondent. Defaults to `1` if not specified.
data	The data, formatted as a `data.frame` object.
outcome	The name of the column that identifies the outcome variable, which should be coded with a `1` for `TRUE` and `0` for `FALSE`.
obsID	The name of the column that identifies each observation.
pars	The names of the parameters to be estimated in the model. Must be the same as the column names in the `data` argument. For WTP space models, do not include price in `pars`.
price	The name of the column that identifies the price variable. Required for WTP space models. Defaults to `NULL`.
randPars	A named vector whose names are the random parameters and values the distribution: `'n'` for normal or `'ln'` for log-normal. Defaults to `NULL`.
randPrice	The random distribution for the price parameter: `'n'` for normal or `'ln'` for log-normal. Only used for WTP space MXL models. Defaults to `NULL`.
modelSpace	Set to `"wtp"` for WTP space models or `"pref"` for preference space models. Defaults to `"pref"`.
weights	The name of the column that identifies the weights to be used in model estimation. Defaults to `NULL`.
panelID	The name of the column that identifies the individual (for panel data where multiple observations are recorded for each individual). Defaults to `NULL`.
clusterID	The name of the column that identifies the cluster groups to be used in model estimation. Defaults to `NULL`.
robust	Determines whether or not a robust covariance matrix is estimated. Defaults to `FALSE`. Specifying a `clusterID` or `weights` overrides the user setting and sets this to `TRUE` (a warning is displayed in this case). Replicates the functionality of Stata's cmmixlogit.
startParBounds	Sets the `lower` and `upper` bounds for the starting parameters for each optimization run, which are generated by `runif(n, lower, upper)`. Defaults to `c(-1, 1)`.
startVals	A vector of values to be used as starting values for the optimization. Only used for the first run if `numMultiStarts > 1`. Defaults to `NULL`.
numMultiStarts	The number of times to run the optimization loop, each time starting from a different random starting point for each parameter within `startParBounds`. Recommended for non-convex models, such as WTP space models and mixed logit models. Defaults to `1`.
useAnalyticGrad	Set to `FALSE` to use numerically approximated gradients instead of analytic gradients during estimation. For now, using the analytic gradient is faster for MNL models but slower for MXL models. Defaults to `TRUE`.
scaleInputs	By default each variable in `data` is scaled to be between 0 and 1 before running the optimization routine because it usually helps with stability, especially if some of the variables have very large or very small values (e.g. `> 10^3` or `< 10^-3`). Set to `FALSE` to turn this feature off. Defaults to `TRUE`.
standardDraws	By default, a new set of standard normal draws is generated during each call to `logitr` (the same draws are used during each multistart iteration). The user can override those draws by providing a matrix of standard normal draws if desired. Defaults to `NULL`.
numDraws	The number of Halton draws to use for MXL models for the maximum simulated likelihood. Defaults to `50`.
vcov	Set to `TRUE` to evaluate and include the variance-covariance matrix and coefficient standard errors in the returned object. Defaults to `FALSE`.
predict	Set to `TRUE` to include predicted probabilities, fitted values, and residuals in the returned object. Defaults to `FALSE`.
options	A list of options for controlling the `nloptr()` optimization. Run `nloptr::nloptr.print.options()` for details.
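
To make a couple of the argument formats above concrete, the following sketch builds a `randPars` vector and draws one set of random starting values in the way the `startParBounds` description suggests. The parameter name `price`, the count `numPars`, and the seed are hypothetical values chosen for illustration; this is not the package's internal code.

```r
# randPars: a named vector whose names are the random parameters and
# whose values are the distributions ("n" = normal, "ln" = log-normal).
# "price" is an assumed parameter name used only for illustration.
randPars <- c(price = "n")

# Random starting values drawn uniformly between the bounds, as in
# runif(n, lower, upper). numPars is a hypothetical parameter count.
set.seed(123)
startParBounds <- c(-1, 1)
numPars <- 3
startVals <- runif(numPars, startParBounds[1], startParBounds[2])
startVals
```

With `numMultiStarts > 1`, a fresh draw like `startVals` would be generated for each additional optimization run.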

Value

Returns a nested data frame with each estimated model object stored in the `model` column.
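
For readers unfamiliar with nested data frames, the sketch below shows the general idea of a list column holding one fitted model per row. It uses `lm()` fits on the built-in `mtcars` data as a stand-in; the `sampleSize` column name and the models themselves are assumptions, and the actual object returned by `estimateModels()` may contain different columns.

```r
# Stand-in models: three lm() fits on increasing subsets of mtcars.
# These are NOT logitr models; they only illustrate the structure.
sizes <- c(10, 20, 30)
fits  <- lapply(sizes, function(n) lm(mpg ~ wt, data = mtcars[seq_len(n), ]))

# A data frame with one row per sample size and a list column of models
results <- data.frame(sampleSize = sizes)
results$model <- fits

# Pull out a single model object, or summarize across all of them
firstModel <- results$model[[1]]
wtCoefs    <- sapply(results$model, function(m) coef(m)[["wt"]])
```

The same pattern (indexing the list column with `[[ ]]`, or mapping a function over it) should apply to any data frame that stores model objects in a column.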

Examples

library(conjointTools)

# Define the attributes and levels
levels <- list(
  price     = seq(1, 4, 0.5), # $ per pound
  type      = c('Fuji', 'Gala', 'Honeycrisp', 'Pink Lady', 'Red Delicious'),
  freshness = c('Excellent', 'Average', 'Poor')
)

# Make a full-factorial design of experiment and recode the levels
doe <- makeDoe(levels)
doe <- recodeDoe(doe, levels)

# Make the survey
survey <- makeSurvey(
    doe       = doe,  # Design of experiment
    nResp     = 2000, # Total number of respondents (upper bound)
    nAltsPerQ = 3,    # Number of alternatives per question
    nQPerResp = 6     # Number of questions per respondent
)

# Simulate random choices for the survey
data <- simulateChoices(
    survey = survey,
    obsID  = "obsID"
)

# Estimate models with different sample sizes
models <- estimateModels(
    nbreaks = 10,
    data    = data,
    pars    = c("price", "type", "freshness"),
    outcome = "choice",
    obsID   = "obsID"
)