Creates experimental designs for choice-based conjoint (CBC) experiments. Supported approaches include random sampling, frequency-based greedy methods, and D-error optimization.

Usage

cbc_design(
  profiles,
  method = "random",
  priors = NULL,
  n_alts,
  n_q,
  n_resp = 100,
  n_blocks = 1,
  n_cores = NULL,
  no_choice = FALSE,
  label = NULL,
  balance_by = NULL,
  randomize_questions = TRUE,
  randomize_alts = TRUE,
  remove_dominant = FALSE,
  dominance_types = c("total", "partial"),
  dominance_threshold = 0.8,
  max_dominance_attempts = 50,
  max_iter = 50,
  n_start = 5,
  include_probs = FALSE,
  use_idefix = TRUE
)

Arguments

profiles

A data frame of class cbc_profiles created using cbc_profiles()

method

Choose the design method: "random", "shortcut", "minoverlap", "balanced", "stochastic", "modfed", or "cea". Defaults to "random"

priors

A cbc_priors object created by cbc_priors(), or NULL for random/shortcut designs

n_alts

Number of alternatives per choice question

n_q

Number of questions per respondent (or per block)

n_resp

Number of respondents (for random/shortcut designs) or 1 (for optimized designs that get repeated)

n_blocks

Number of blocks in the design. Defaults to 1

n_cores

Number of cores to use for parallel processing in the design search. Defaults to NULL, in which case it is set to the number of available cores minus 1.

no_choice

Include a "no choice" option? Defaults to FALSE

label

The name of the variable to use in a "labeled" design. Defaults to NULL

balance_by

Character vector of attribute names to balance sampling across. Ensures balanced representation across levels of specified attributes. Only compatible with "random", "shortcut", "minoverlap", and "balanced" methods. Cannot be used with labeled designs or D-optimal methods ("stochastic", "modfed", "cea"). Defaults to NULL

randomize_questions

Randomize question order for each respondent? Applies to optimized methods only. Defaults to TRUE

randomize_alts

Randomize alternative order within questions? Applies to optimized methods only. Defaults to TRUE

remove_dominant

Remove choice sets with dominant alternatives? Defaults to FALSE

dominance_types

Types of dominance to check: "total" and/or "partial"

dominance_threshold

Threshold for total dominance detection. Defaults to 0.8

max_dominance_attempts

Maximum attempts to replace dominant choice sets. Defaults to 50.

max_iter

Maximum iterations for optimized designs. Defaults to 50

n_start

Number of random starts for optimized designs. Defaults to 5

include_probs

Include predicted probabilities in resulting design? Requires priors. Defaults to FALSE

use_idefix

If TRUE (the default), the idefix package will be used to find optimal designs, which is faster. Only valid with "cea" and "modfed" methods.

Value

A cbc_design object containing the experimental design

Details

Design Methods

The method argument determines the design approach used:

  • "random": Creates designs by randomly sampling profiles for each respondent independently

  • "shortcut": Frequency-based greedy algorithm that balances attribute level usage

  • "minoverlap": Greedy algorithm that minimizes attribute overlap within choice sets

  • "balanced": Greedy algorithm that maximizes overall attribute balance across the design

  • "stochastic": Stochastic profile swapping with D-error optimization (first improvement found)

  • "modfed": Modified Fedorov algorithm with exhaustive profile swapping for D-error optimization

  • "cea": Coordinate Exchange Algorithm with attribute-by-attribute D-error optimization

Method Compatibility

The table below summarizes method compatibility with design features:

Method         No choice?   Labeled designs?   Restricted profiles?   balance_by?   Blocking?   Interactions?   Dominance removal?
"random"       Yes          Yes                Yes                    Yes           No          Yes             Yes
"shortcut"     Yes          Yes                Yes                    Yes           No          No              Yes
"minoverlap"   Yes          Yes                Yes                    Yes           No          No              Yes
"balanced"     Yes          Yes                Yes                    Yes           No          No              Yes
"stochastic"   Yes          Yes                Yes                    No            Yes         Yes             Yes
"modfed"       Yes          Yes                Yes                    No            Yes         Yes             Yes
"cea"          Yes          Yes                No                     No            Yes         Yes             Yes

Design Quality Assurance

All methods ensure the following criteria are met:

  1. No duplicate profiles within any choice set

  2. No duplicate choice sets within any respondent

  3. If remove_dominant = TRUE, choice sets with dominant alternatives are eliminated (optimization methods only)
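The first two criteria can be checked by hand on any design data frame. The base R sketch below is only an illustration: check_design is a hypothetical helper, not part of cbcTools. It assumes the column names shown in the printed design (profileID, respID, obsID), that obsID identifies one choice set, and that two choice sets count as duplicates when they contain the same profiles in any order.

```r
# Hypothetical helper (not a cbcTools function): checks criteria 1 and 2
check_design <- function(design) {
  # Criterion 1: no profile appears twice within a choice set (obsID)
  dup_profiles <- any(duplicated(design[c("obsID", "profileID")]))

  # Criterion 2: no respondent sees the same set of profiles twice.
  # Each choice set is summarized by its sorted profile IDs.
  set_keys <- tapply(
    design$profileID, design$obsID,
    function(ids) paste(sort(ids), collapse = "-")
  )
  resp_of_set <- tapply(design$respID, design$obsID, function(r) r[1])
  sets <- data.frame(
    resp = as.vector(resp_of_set),
    key = as.vector(set_keys)
  )
  dup_sets <- any(duplicated(sets))

  c(duplicate_profiles = dup_profiles, duplicate_sets = dup_sets)
}

# Toy design: one respondent, two questions, three alternatives each
design <- data.frame(
  profileID = c(22, 9, 34, 18, 3, 7),
  respID = 1,
  qID = rep(1:2, each = 3),
  altID = rep(1:3, times = 2),
  obsID = rep(1:2, each = 3)
)

check_design(design)
```

Both elements of the result are FALSE for a design that satisfies the two criteria.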

Balanced Sampling with balance_by

The balance_by argument enables balanced sampling across the levels of specified attributes. It addresses designs in which some attributes apply only to certain alternatives, which can leave the profile set imbalanced. For example, consider an experiment on alternative vehicle powertrains with a "powertrain" attribute for gas and electric vehicles. An "electric_vehicle_range" attribute should be 0 for non-electric powertrains, but enforcing this with restrictions leaves more electric profiles than gas profiles, so electric vehicles can be over-represented. Using balance_by = "powertrain" ensures that each choice question samples proportionally from gas and electric powertrains, maintaining balance even when electric vehicles have additional attributes.

Multiple attributes can be balanced simultaneously using balance_by = c("attr1", "attr2"), which creates groups based on unique combinations of the specified attributes.
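To make the idea concrete, the base R sketch below builds a restricted profile set for the powertrain example and compares naive sampling with group-balanced sampling. It is a simplified illustration under stated assumptions, not the sampling code cbc_design() uses: sample_balanced is a hypothetical helper, the attribute levels are made up, and groups are drawn with equal probability.

```r
# Made-up restricted profile set: gas vehicles always have range 0,
# electric vehicles take one of three ranges
profiles <- rbind(
  expand.grid(
    price = c(15, 20, 25), powertrain = "gas",
    electric_vehicle_range = 0, stringsAsFactors = FALSE
  ),
  expand.grid(
    price = c(15, 20, 25), powertrain = "electric",
    electric_vehicle_range = c(100, 200, 300), stringsAsFactors = FALSE
  )
)

# Share of electric profiles in the restricted set: 9 of 12 rows
naive_share <- mean(profiles$powertrain == "electric")

# Hypothetical helper: pick a group uniformly, then a profile within it
sample_balanced <- function(profiles, balance_by, n) {
  # One group per unique combination of the balance_by attributes
  group <- interaction(profiles[balance_by], drop = TRUE)
  groups <- split(seq_len(nrow(profiles)), group)
  picked <- sample(length(groups), n, replace = TRUE)
  vapply(picked, function(g) {
    idx <- groups[[g]]
    idx[sample.int(length(idx), 1)]
  }, integer(1))
}

set.seed(42)
rows <- sample_balanced(profiles, "powertrain", 10000)
balanced_share <- mean(profiles$powertrain[rows] == "electric")
```

Here naive_share is 0.75, while balanced_share should land close to 0.5. Passing c("powertrain", "price") as balance_by would instead form six groups, one per unique combination.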

Method Details

Random Method

Creates designs where each respondent sees completely independent, randomly generated choice sets.

Greedy Methods (shortcut, minoverlap, balanced)

These methods use frequency-based algorithms that make locally optimal choices:

  • Shortcut: Balances attribute level usage within questions and across the overall design

  • Minoverlap: Minimizes attribute overlap within choice sets while allowing some overlap for balance

  • Balanced: Maximizes overall attribute balance, prioritizing level distribution over overlap reduction

These methods provide good level balance without requiring priors or D-error calculations and offer fast execution suitable for large designs.
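The quantities these methods trade off can be tabulated directly. The base R sketch below counts level usage (balance) and unique levels per question (overlap) for one attribute of a small, made-up decoded design. It only shows the raw counts; the balance and overlap scores reported by cbc_inspect() are computed by the package and are not reproduced here.

```r
# Made-up decoded design: three questions with three alternatives each
design <- data.frame(
  obsID = rep(1:3, each = 3),
  type = c(
    "Fuji", "Gala", "Honeycrisp", # no overlap
    "Fuji", "Fuji", "Gala",       # partial overlap
    "Gala", "Gala", "Gala"        # complete overlap
  )
)

# Balance: how often each level appears across the whole design
level_counts <- table(design$type)

# Overlap: number of unique levels shown within each question
unique_per_q <- tapply(
  design$type, design$obsID,
  function(x) length(unique(x))
)

# Questions grouped by their number of unique levels
overlap_table <- table(unique_per_q)
avg_unique <- mean(unique_per_q)
```

A greedy method that favors balance would try to even out level_counts, while one that favors low overlap would push unique_per_q toward the number of alternatives.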

D-Error Optimization Methods (stochastic, modfed, cea)

These methods minimize D-error to create statistically efficient designs:

  • Stochastic: Random profile sampling with first improvement acceptance

  • Modfed: Exhaustive profile testing for best improvement (slower but thorough)

  • CEA: Coordinate exchange testing attribute levels individually (requires full factorial profiles)
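As a rough illustration of what D-error optimization involves, the base R sketch below computes a D-error for a multinomial logit design and runs a first-improvement random swap search in the spirit of the "stochastic" method. The names mnl_d_error and stochastic_search are hypothetical, the D-error uses the textbook form det(I)^(-1/K) for K parameters, and the search skips the duplicate-profile checks. cbc_design() may scale D-error or organize its search differently.

```r
# Hypothetical helper: D-error of a multinomial logit design.
# X is the coded design matrix, set_id marks the choice set of each row.
mnl_d_error <- function(X, set_id, beta) {
  K <- ncol(X)
  info <- matrix(0, K, K)
  for (s in unique(set_id)) {
    Xs <- X[set_id == s, , drop = FALSE]
    v <- as.vector(Xs %*% beta)
    p <- exp(v - max(v))
    p <- p / sum(p) # logit choice probabilities within the set
    # Fisher information contribution of this choice set
    info <- info + t(Xs) %*% (diag(p, nrow = length(p)) - tcrossprod(p)) %*% Xs
  }
  d <- det(info)
  if (d <= 0) Inf else d^(-1 / K)
}

# Hypothetical helper: swap one random row at a time and keep the
# swap only if it lowers the D-error (first improvement)
stochastic_search <- function(candidates, n_q, n_alts, beta, max_iter = 50) {
  n_rows <- n_q * n_alts
  set_id <- rep(seq_len(n_q), each = n_alts)
  rows <- sample(nrow(candidates), n_rows, replace = TRUE)
  best <- mnl_d_error(candidates[rows, , drop = FALSE], set_id, beta)
  for (iter in seq_len(max_iter)) {
    trial <- rows
    trial[sample(n_rows, 1)] <- sample(nrow(candidates), 1)
    err <- mnl_d_error(candidates[trial, , drop = FALSE], set_id, beta)
    if (err < best) {
      rows <- trial
      best <- err
    }
  }
  list(rows = rows, d_error = best)
}

# Candidate profiles: continuous price plus dummy-coded type
grid <- expand.grid(price = c(1, 2, 3), type = c("Fuji", "Gala", "Honeycrisp"))
candidates <- model.matrix(~ price + type, data = grid)[, -1]
beta <- c(-0.25, 0.5, 1.0) # assumed priors: price, typeGala, typeHoneycrisp

set.seed(1)
result <- stochastic_search(candidates, n_q = 6, n_alts = 3, beta = beta)
result$d_error
```

A Modfed-style search would instead try every candidate profile at each position and keep the best one, and a coordinate exchange search would change one attribute level at a time rather than whole profiles.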

idefix Integration

When use_idefix = TRUE (the default), the function leverages the highly optimized algorithms from the idefix package for the "cea" and "modfed" design generation methods. This can provide significant speed improvements, especially for larger problems.

Key benefits of idefix integration:

  • Faster optimization algorithms with C++ implementation

  • Better handling of large candidate sets

  • Optimized parallel processing

  • Advanced blocking capabilities for multi-block designs

Examples

library(cbcTools)

# Create profiles for an apple choice experiment
profiles <- cbc_profiles(
    price = c(1, 1.5, 2, 2.5, 3),
    type = c("Fuji", "Gala", "Honeycrisp"),
    freshness = c("Poor", "Average", "Excellent")
)

# Basic random design
design_random <- cbc_design(
    profiles = profiles,
    n_alts = 3,
    n_q = 6,
    n_resp = 100
)

head(design_random)
#> Design method: random
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#> 
#> 💡 Use cbc_inspect() for a more detailed summary
#> 
#> First few rows of design:
#>   profileID respID qID altID obsID price typeGala typeHoneycrisp
#> 1        22      1   1     1     1   1.5        1              0
#> 2         9      1   1     2     1   2.5        1              0
#> 3        34      1   1     3     1   2.5        0              0
#> 4        18      1   2     1     2   2.0        0              0
#> 5         3      1   2     2     2   2.0        0              0
#> 6         7      1   2     3     2   1.5        1              0
#>   freshnessAverage freshnessExcellent
#> 1                1                  0
#> 2                0                  0
#> 3                0                  1
#> 4                1                  0
#> 5                0                  0
#> 6                0                  0

# Inspect design
cbc_inspect(design_random)
#> DESIGN SUMMARY
#> =========================
#> 
#> STRUCTURE
#> ================
#> Method: random
#> Created: 2025-09-23 16:54:07
#> Respondents: 100
#> Questions per respondent: 6
#> Alternatives per question: 3
#> Total choice sets: 600
#> Profile usage: 45/45 (100.0%)
#> 
#> SUMMARY METRICS
#> =================
#> D-error calculation not available for random designs
#> Overall balance score: 0.736 (higher is better)
#> Overall overlap score: 0.255 (lower is better)
#> 
#> VARIABLE ENCODING
#> =================
#> Format: Dummy-coded (type, freshness)
#> 💡 Use cbc_decode_design() to convert to categorical format
#> 
#> ATTRIBUTE BALANCE
#> =================
#> Overall balance score: 0.736 (higher is better)
#> 
#> Individual attribute level counts:
#> 
#> price:
#> 
#>   1 1.5   2 2.5   3 
#> 380 350 361 381 328 
#>   Balance score: 0.942 (higher is better)
#> 
#> typeGala:
#> 
#>    0    1 
#> 1180  620 
#>   Balance score: 0.694 (higher is better)
#> 
#> typeHoneycrisp:
#> 
#>    0    1 
#> 1223  577 
#>   Balance score: 0.663 (higher is better)
#> 
#> freshnessAverage:
#> 
#>    0    1 
#> 1194  606 
#>   Balance score: 0.684 (higher is better)
#> 
#> freshnessExcellent:
#> 
#>    0    1 
#> 1179  621 
#>   Balance score: 0.695 (higher is better)
#> 
#> ATTRIBUTE OVERLAP
#> =================
#> Overall overlap score: 0.255 (lower is better)
#> 
#> Counts of attribute overlap:
#> (# of questions with N unique levels)
#> 
#> price: Continuous variable
#>   Questions by # unique levels:
#>   1 (complete overlap):   2.5%  (15 / 600 questions)
#>   2 (partial overlap):   47.2%  (283 / 600 questions)
#>   3 (partial overlap):   50.3%  (302 / 600 questions)
#>   4 (partial overlap):    0.0%  (0 / 600 questions)
#>   5 (no overlap):         0.0%  (0 / 600 questions)
#>   Average unique levels per question: 2.48
#> 
#> typeGala: Continuous variable
#>   Questions by # unique levels:
#>   1 (complete overlap):  29.2%  (175 / 600 questions)
#>   2 (no overlap):        70.8%  (425 / 600 questions)
#>   Average unique levels per question: 1.71
#> 
#> typeHoneycrisp: Continuous variable
#>   Questions by # unique levels:
#>   1 (complete overlap):  33.8%  (203 / 600 questions)
#>   2 (no overlap):        66.2%  (397 / 600 questions)
#>   Average unique levels per question: 1.66
#> 
#> freshnessAverage: Continuous variable
#>   Questions by # unique levels:
#>   1 (complete overlap):  28.8%  (173 / 600 questions)
#>   2 (no overlap):        71.2%  (427 / 600 questions)
#>   Average unique levels per question: 1.71
#> 
#> freshnessExcellent: Continuous variable
#>   Questions by # unique levels:
#>   1 (complete overlap):  33.0%  (198 / 600 questions)
#>   2 (no overlap):        67.0%  (402 / 600 questions)
#>   Average unique levels per question: 1.67
#> 
#> 

# Greedy design with balanced frequency
design_balanced <- cbc_design(
    profiles = profiles,
    method = "balanced",
    n_alts = 3,
    n_q = 6,
    n_resp = 100
)
#> Generating balanced design for 100 respondents using 3 cores...

# Design with priors using D-optimal method
priors <- cbc_priors(
    profiles = profiles,
    price = -0.25,
    type = c("Gala" = 0.5, "Honeycrisp" = 1.0),
    freshness = c("Average" = 0.6, "Excellent" = 1.2)
)

design_optimal <- cbc_design(
    profiles = profiles,
    method = "stochastic",
    priors = priors,
    n_alts = 3,
    n_q = 6,
    n_resp = 100,
    n_start = 3
)
#> Stochastic design will be optimized into 1 design block, then allocated across 100 respondents
#> Running 3 design searches using 3 cores...
#> 
#> D-error results from all starts:
#> Start 2: 0.889353   (Best)
#> Start 1: 0.909082 
#> Start 3: 0.997963 

# Compare designs
cbc_compare(
    "Random" = design_random,
    "Balanced" = design_balanced,
    "D-optimal" = design_optimal
)
#> CBC Design Comparison
#> =====================
#> Designs compared: 3
#> Metrics: structure, efficiency, balance, overlap
#> Sorted by: d_error (ascending)
#> 
#> Structure
#> =====================
#>     Design     Method respondents questions
#>  D-optimal stochastic         100         6
#>     Random     random         100         6
#>   Balanced   balanced         100         6
#>  Alternatives Blocks Profile Usage
#>             3      1 (16/45) 35.6%
#>             3      1  (45/45) 100%
#>             3      1  (45/45) 100%
#>  No Choice Labeled?
#>         No       No
#>         No       No
#>         No       No
#> 
#> Design Metrics
#> =====================
#>     Design     Method D-Error (Null) D-Error (Prior) Balance Overlap
#>  D-optimal stochastic       0.785048        0.889353   0.643   0.100
#>     Random     random             NA              NA   0.736   0.255
#>   Balanced   balanced             NA              NA   0.742   0.000
#> 
#> Interpretation:
#> - D-Error: Lower is better (design efficiency)
#> - Balance: Higher is better (level distribution)
#> - Overlap: Lower is better (attribute variation)
#> - Profile Usage: Higher means more profiles used
#> 
#> Best performers:
#> - D-Error: D-optimal (0.889353)
#> - Balance: Balanced (0.742)
#> - Overlap: Balanced (0.000)
#> - Profile Usage: Random (100.0%)
#> 
#> Use summary() for detailed information on any one design.