Estimate the same model on different size subsets of data

This function estimates the same model multiple times on subsets of a set of choice data of increasing size, then returns a data frame of the estimated model coefficients and standard errors for each sample size. This is useful for determining the sample size required to obtain a desired level of statistical power on each coefficient. The number of models to estimate is set by the nbreaks argument, which breaks up the data into groups of increasing sample sizes. All models are estimated using the 'logitr' package. For more details, see the JSS article on the 'logitr' package (Helveston, 2023).
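
The subsetting idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the toy data, the `respID` column, and the evenly spaced break rule are all hypothetical.

```r
# Toy data: 30 respondents x 2 questions x 3 alternatives (all hypothetical)
n_resp <- 30
n_q    <- 2
n_alts <- 3
toy <- data.frame(
  respID = rep(seq_len(n_resp), each = n_q * n_alts),
  obsID  = rep(seq_len(n_resp * n_q), each = n_alts)
)

nbreaks <- 5

# Assumed rule: evenly spaced respondent counts ending at the full sample
sizes <- ceiling(seq(n_resp / nbreaks, n_resp, length.out = nbreaks))

# One subset per break: keep the first `n` respondents
subsets <- lapply(sizes, function(n) toy[toy$respID <= n, ])

# Number of rows in each subset grows with the sample size
sapply(subsets, nrow)
```

Each element of `subsets` would then be passed to the same model specification, so that the coefficients and standard errors can be compared across sample sizes.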

cbc_power(
  data,
  outcome,
  obsID,
  pars,
  randPars = NULL,
  nbreaks = 10,
  n_q = 1,
  return_models = FALSE,
  panelID = NULL,
  clusterID = NULL,
  robust = FALSE,
  predict = FALSE,
  n_cores = NULL,
  ...
)

Arguments

data: The data, formatted as a data.frame object.
outcome: The name of the column that identifies the outcome variable, which should be coded with a 1 for TRUE and 0 for FALSE.
obsID: The name of the column that identifies each observation.
pars: The names of the parameters to be estimated in the model. Must be the same as the column names in the data argument.
randPars: A named vector whose names are the random parameters and whose values are their distributions: 'n' for normal or 'ln' for log-normal. Defaults to NULL.
nbreaks: The number of different sample size groups.
n_q: Number of questions per respondent. Defaults to 1 if not specified.
return_models: If TRUE, a list of all estimated models is returned. This can be useful if you want to extract other outputs from each model, such as the variance-covariance matrix, etc. Defaults to FALSE.
panelID: The name of the column that identifies the individual (for panel data where multiple observations are recorded for each individual). Defaults to NULL.
clusterID: The name of the column that identifies the cluster groups to be used in model estimation. Defaults to NULL.
robust: Determines whether or not a robust covariance matrix is estimated. Defaults to FALSE. Specifying a clusterID overrides the user setting and sets this to TRUE (a warning is displayed in this case). Replicates the functionality of Stata's cmmixlogit.
predict: If TRUE, predicted probabilities, fitted values, and residuals are also included in the returned model objects. Defaults to FALSE.
n_cores: The number of cores to use for parallel processing. Set to 1 to run serially. Defaults to NULL, in which case the number of cores is set to parallel::detectCores() - 1. The number of cores is capped at parallel::detectCores().
...: Other arguments that are passed to logitr::logitr() for model estimation. See the logitr documentation for details about other available arguments.
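
Two of the arguments above may be easier to read as code. The sketch below shows the shape of a randPars vector and one way the n_cores rule could be expressed. `resolve_cores()` is a hypothetical helper written for illustration, not a function exported by 'cbcTools', and the floor of 1 core is an added guard rather than documented behavior.

```r
# randPars: names are the random parameters, values are the distributions
# ('n' = normal, 'ln' = log-normal). The parameter choice is illustrative.
randPars <- c(price = "n", freshness = "n")
names(randPars)   # the random parameters
unname(randPars)  # their distributions

# Hypothetical helper mirroring the n_cores description:
# NULL -> all but one available core; otherwise cap at the available cores.
resolve_cores <- function(n_cores = NULL,
                          available = parallel::detectCores()) {
  if (is.null(n_cores)) {
    return(max(1, available - 1)) # assumed guard: never fewer than 1 core
  }
  min(n_cores, available)
}

resolve_cores(NULL, available = 8) # default: one core left free
resolve_cores(16, available = 8)   # request above the cap
```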

Value

Returns a data frame of estimated model coefficients and standard errors for the same model estimated on subsets of the data with increasing sample sizes.
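
As a rough illustration of how such a result might be used, the sketch below finds the smallest sample size at which each coefficient's standard error drops below a target. The data frame is a synthetic stand-in with made-up values, and the column names `sampleSize`, `coef`, `est`, and `se` are assumptions about the returned structure that should be checked against the actual output.

```r
# Synthetic stand-in for the returned data frame (values are made up)
sizes <- seq(30, 300, by = 30)
power <- data.frame(
  sampleSize = rep(sizes, each = 2),
  coef       = rep(c("price", "typeGala"), times = length(sizes)),
  est        = rep(c(-0.5, 0.3), times = length(sizes))
)

# Standard errors shrink roughly with 1 / sqrt(n) in this toy example
power$se <- rep(c(1.0, 1.5), times = length(sizes)) / sqrt(power$sampleSize)

# Smallest sample size per coefficient that meets the target standard error
target_se <- 0.1
ok <- power[power$se <= target_se, ]
required <- tapply(ok$sampleSize, ok$coef, min)
required
```

The largest value in `required` is then a conservative sample size for the whole model, since it satisfies the target for every coefficient.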

References

Helveston, J. P. (2023). logitr: Fast Estimation of Multinomial and Mixed Logit Models with Preference Space and Willingness-to-Pay Space Utility Parameterizations. Journal of Statistical Software, 105(10), 1–37. doi:10.18637/jss.v105.i10

Examples

library(cbcTools)

# A simple conjoint experiment about apples

# Generate all possible profiles
profiles <- cbc_profiles(
  price     = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5),
  type      = c("Fuji", "Gala", "Honeycrisp"),
  freshness = c('Poor', 'Average', 'Excellent')
)

# Make a survey design from all possible profiles
# (This is the default setting where method = 'full' for "full factorial")
design <- cbc_design(
  profiles = profiles,
  n_resp   = 300, # Number of respondents
  n_alts   = 3,   # Number of alternatives per question
  n_q      = 6    # Number of questions per respondent
)

# Simulate random choices
data <- cbc_choices(
  design = design,
  obsID  = "obsID"
)

# Conduct a power analysis
power <- cbc_power(
  data    = data,
  pars    = c("price", "type", "freshness"),
  outcome = "choice",
  obsID   = "obsID",
  nbreaks = 10,
  n_q     = 6,
  n_cores = 2
)
#> Estimating models using 2 cores...
#> done!