
Generate survey designs for choice experiments (Updated Implementation)
Source: R/design.R
This function creates experimental designs for choice-based conjoint experiments using multiple design approaches including optimization and frequency-based methods.
Usage
cbc_design(
  profiles,
  method = "random",
  priors = NULL,
  n_alts,
  n_q,
  n_resp = 100,
  n_blocks = 1,
  n_cores = NULL,
  no_choice = FALSE,
  label = NULL,
  balance_by = NULL,
  randomize_questions = TRUE,
  randomize_alts = TRUE,
  remove_dominant = FALSE,
  dominance_types = c("total", "partial"),
  dominance_threshold = 0.8,
  max_dominance_attempts = 50,
  max_iter = 50,
  n_start = 5,
  include_probs = FALSE,
  use_idefix = TRUE
)
Arguments
- profiles
A data frame of class cbc_profiles created using cbc_profiles()
- method
Choose the design method: "random", "shortcut", "minoverlap", "balanced", "stochastic", "modfed", or "cea". Defaults to "random"
- priors
A cbc_priors object created by cbc_priors(), or NULL for random/shortcut designs
- n_alts
Number of alternatives per choice question
- n_q
Number of questions per respondent (or per block)
- n_resp
Number of respondents (for random/shortcut designs) or 1 (for optimized designs that get repeated)
- n_blocks
Number of blocks in the design. Defaults to 1
- n_cores
Number of cores to use for parallel processing in the design search. Defaults to NULL, in which case it is set to the number of available cores minus 1.
- no_choice
Include a "no choice" option? Defaults to FALSE
- label
The name of the variable to use in a "labeled" design. Defaults to NULL
- balance_by
Character vector of attribute names to balance sampling across. Ensures balanced representation across levels of specified attributes. Only compatible with "random", "shortcut", "minoverlap", and "balanced" methods. Cannot be used with labeled designs or D-optimal methods ("stochastic", "modfed", "cea"). Defaults to NULL
- randomize_questions
Randomize question order for each respondent? Applies to optimized methods only. Defaults to TRUE
- randomize_alts
Randomize alternative order within questions? Applies to optimized methods only. Defaults to TRUE
- remove_dominant
Remove choice sets with dominant alternatives? Defaults to FALSE
- dominance_types
Types of dominance to check: "total" and/or "partial"
- dominance_threshold
Threshold for total dominance detection. Defaults to 0.8
- max_dominance_attempts
Maximum attempts to replace dominant choice sets. Defaults to 50.
- max_iter
Maximum iterations for optimized designs. Defaults to 50
- n_start
Number of random starts for optimized designs. Defaults to 5
- include_probs
Include predicted probabilities in resulting design? Requires priors. Defaults to FALSE
- use_idefix
If TRUE (the default), the idefix package will be used to find optimal designs, which is faster. Only valid with "cea" and "modfed" methods.
Details
Design Methods
The method argument determines the design approach used:
- "random": Creates designs by randomly sampling profiles for each respondent independently
- "shortcut": Frequency-based greedy algorithm that balances attribute level usage
- "minoverlap": Greedy algorithm that minimizes attribute overlap within choice sets
- "balanced": Greedy algorithm that maximizes overall attribute balance across the design
- "stochastic": Stochastic profile swapping with D-error optimization (first improvement found)
- "modfed": Modified Fedorov algorithm with exhaustive profile swapping for D-error optimization
- "cea": Coordinate Exchange Algorithm with attribute-by-attribute D-error optimization
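As a rough illustration of the quantity that the "stochastic", "modfed", and "cea" methods minimize, the sketch below computes a D-error for a multinomial logit design in base R. It assumes the common definition det(I)^(-1/K), where I is the Fisher information summed over choice sets and K is the number of parameters. The function name mnl_d_error and the toy designs are hypothetical, and the package's internal calculation may differ in its details.

```r
# Sketch only: D-error of a multinomial logit design, assuming
# D-error = det(I)^(-1/K). `mnl_d_error` is a hypothetical helper,
# not a cbcTools function.
mnl_d_error <- function(X, set_id, beta) {
  K <- ncol(X)
  info <- matrix(0, K, K)
  for (s in unique(set_id)) {
    Xs <- X[set_id == s, , drop = FALSE]
    v <- as.vector(Xs %*% beta)          # utilities under the priors
    p <- exp(v - max(v))
    p <- p / sum(p)                      # logit choice probabilities
    # Information contributed by this choice set: X' (diag(p) - p p') X
    info <- info + t(Xs) %*% (diag(p, nrow = length(p)) - tcrossprod(p)) %*% Xs
  }
  det(info)^(-1 / K)
}

# Two choice sets with two alternatives each and one numeric attribute
set_id <- c(1, 1, 2, 2)
narrow <- matrix(c(0, 1, 0, 1), ncol = 1)  # levels differ by 1 within a set
wide   <- matrix(c(0, 2, 0, 2), ncol = 1)  # levels differ by 2 within a set

d_narrow <- mnl_d_error(narrow, set_id, beta = 0)  # 2
d_wide   <- mnl_d_error(wide, set_id, beta = 0)    # 0.5
```

With zero priors, the wider level spread carries more information about the coefficient, so its D-error is lower.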
Method Compatibility
The table below summarizes method compatibility with design features:
Method | No choice? | Labeled designs? | Restricted profiles? | balance_by? | Blocking? | Interactions? | Dominance removal? |
"random" | Yes | Yes | Yes | Yes | No | Yes | Yes |
"shortcut" | Yes | Yes | Yes | Yes | No | No | Yes |
"minoverlap" | Yes | Yes | Yes | Yes | No | No | Yes |
"balanced" | Yes | Yes | Yes | Yes | No | No | Yes |
"stochastic" | Yes | Yes | Yes | No | Yes | Yes | Yes |
"modfed" | Yes | Yes | Yes | No | Yes | Yes | Yes |
"cea" | Yes | Yes | No | No | Yes | Yes | Yes |
Design Quality Assurance
All methods ensure the following criteria are met:
- No duplicate profiles within any choice set
- No duplicate choice sets within any respondent
- If remove_dominant = TRUE, choice sets with dominant alternatives are eliminated (optimization methods only)
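The first two criteria can be checked by hand on any design data frame. The sketch below uses a small hypothetical design with the respID, obsID, and profileID columns that appear in the example output; treating two choice sets as duplicates regardless of alternative order is an assumption of this sketch, not a statement about how the package defines it.

```r
# Toy design (hypothetical values): one respondent, three questions,
# two alternatives per question.
design <- data.frame(
  respID    = c(1, 1, 1, 1, 1, 1),
  qID       = c(1, 1, 2, 2, 3, 3),
  altID     = c(1, 2, 1, 2, 1, 2),
  obsID     = c(1, 1, 2, 2, 3, 3),
  profileID = c(4, 9, 9, 4, 7, 7)
)

# Check 1: does any choice set contain the same profile twice?
dup_in_set <- tapply(
  design$profileID, design$obsID,
  function(x) anyDuplicated(x) > 0
)

# Check 2: does a respondent see the same set of profiles more than once?
# Each set is reduced to a key built from its sorted profile IDs
# (assumption: alternative order is ignored).
set_key <- tapply(
  design$profileID, design$obsID,
  function(x) paste(sort(x), collapse = "-")
)
set_resp <- tapply(design$respID, design$obsID, function(x) x[1])
dup_set <- duplicated(
  data.frame(resp = as.vector(set_resp), key = as.vector(set_key))
)

as.vector(dup_in_set)  # FALSE FALSE TRUE: question 3 repeats profile 7
dup_set                # FALSE TRUE FALSE: question 2 repeats question 1
```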
Balanced Sampling with balance_by
The balance_by argument enables balanced sampling across specified attributes, solving the problem of attribute-specific features that create imbalanced designs.
For example, consider an experiment on alternative vehicle powertrains with a "powertrain" attribute for gas and electric vehicles. If you had an "electric_vehicle_range" attribute, it should be 0 for non-electric powertrains, but using restrictions can lead to over-representation of electric vehicles. Using balance_by = "powertrain" ensures that each choice question samples proportionally from gas and electric powertrains, maintaining balance even when electric vehicles have additional attributes.
Multiple attributes can be balanced simultaneously using balance_by = c("attr1", "attr2"), which creates groups based on unique combinations of the specified attributes.
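The over-representation problem can be seen in a small base R sketch. The profile set, the attribute values, and the helper sample_balanced are all hypothetical, and the two-stage sampling shown here (pick a group, then a profile within it) is only one plausible way to balance across groups; it is not taken from the package source.

```r
set.seed(1)

# Hypothetical restricted profile set: gas vehicles always have range 0,
# electric vehicles take one of three ranges.
profiles <- rbind(
  expand.grid(powertrain = "gas", range = 0,
              price = c(20, 30, 40), stringsAsFactors = FALSE),
  expand.grid(powertrain = "electric", range = c(100, 200, 300),
              price = c(20, 30, 40), stringsAsFactors = FALSE)
)

# Sampling rows uniformly favors electric profiles (9 of 12 rows)
share_uniform <- prop.table(table(profiles$powertrain))

# Two-stage sampling: draw a group first, then a profile within that group
sample_balanced <- function(profiles, by, n) {
  groups <- split(seq_len(nrow(profiles)), profiles[[by]])
  picked <- sample(names(groups), n, replace = TRUE)
  rows <- vapply(picked, function(g) {
    idx <- groups[[g]]
    idx[sample.int(length(idx), 1)]
  }, integer(1))
  profiles[unname(rows), , drop = FALSE]
}

draws <- sample_balanced(profiles, by = "powertrain", n = 10000)
share_balanced <- prop.table(table(draws$powertrain))

share_uniform   # electric 0.75, gas 0.25
share_balanced  # close to an even split
```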
Method Details
Random Method
Creates designs where each respondent sees completely independent, randomly generated choice sets.
Greedy Methods (shortcut, minoverlap, balanced)
These methods use frequency-based algorithms that make locally optimal choices:
- Shortcut: Balances attribute level usage within questions and across the overall design
- Minoverlap: Minimizes attribute overlap within choice sets while allowing some overlap for balance
- Balanced: Maximizes overall attribute balance, prioritizing level distribution over overlap reduction
These methods provide good level balance without requiring priors or D-error calculations and offer fast execution suitable for large designs.
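The two quantities these methods trade off, level balance and within-set overlap, can be tallied directly from a design. The sketch below does so in base R for a small hypothetical design; it mirrors the raw counts that cbc_inspect() prints, not the summary balance and overlap scores, whose formulas are not given here.

```r
# Toy design (hypothetical values): four questions, three alternatives each
design <- data.frame(
  obsID = rep(1:4, each = 3),
  price = c(1, 2, 3,  1, 1, 2,  2, 2, 2,  3, 1, 2)
)

# Level balance: how often each price level appears overall
level_counts <- table(design$price)

# Overlap: number of unique price levels shown in each question
# (1 = complete overlap, 3 = no overlap with three alternatives)
n_unique <- tapply(design$price, design$obsID, function(x) length(unique(x)))

level_counts        # levels 1, 2, 3 appear 4, 6, 2 times
as.vector(n_unique) # 3 2 1 3
mean(n_unique)      # 2.25 unique levels per question on average
```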
D-Error Optimization Methods (stochastic, modfed, cea)
These methods minimize D-error to create statistically efficient designs:
- Stochastic: Random profile sampling with first improvement acceptance
- Modfed: Exhaustive profile testing for best improvement (slower but thorough)
- CEA: Coordinate exchange testing attribute levels individually (requires full factorial profiles)
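The first-improvement idea behind the "stochastic" method can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions (zero priors, two numeric attributes, hypothetical names such as d_error and ids), not the package's implementation, and it omits the check for duplicate choice sets.

```r
set.seed(42)

# Candidate profiles: two numeric attributes (hypothetical, already coded)
profiles <- as.matrix(expand.grid(x1 = c(0, 1, 2), x2 = c(0, 1)))

# D-error under zero priors. `ids` is an n_q x n_alts matrix of row
# indices into `profiles`. With equal choice probabilities, each set's
# information matrix reduces to the centered cross-product divided by
# the number of alternatives.
d_error <- function(ids) {
  K <- ncol(profiles)
  info <- matrix(0, K, K)
  for (q in seq_len(nrow(ids))) {
    X <- profiles[ids[q, ], , drop = FALSE]
    Xc <- sweep(X, 2, colMeans(X))
    info <- info + crossprod(Xc) / nrow(X)
  }
  d <- det(info)
  if (d <= 1e-12) Inf else d^(-1 / K)  # singular designs get infinite error
}

# Random starting design: no repeated profile within a question
n_q <- 4
n_alts <- 2
ids <- t(replicate(n_q, sample(nrow(profiles), n_alts)))

start_error <- d_error(ids)
best <- start_error
for (iter in seq_len(200)) {
  q <- sample(n_q, 1)
  a <- sample(n_alts, 1)
  cand <- ids
  cand[q, a] <- sample(nrow(profiles), 1)  # propose one random profile swap
  if (anyDuplicated(cand[q, ])) next       # keep profiles unique within a set
  err <- d_error(cand)
  if (err < best) {                        # accept the first improvement found
    ids <- cand
    best <- err
  }
}
```

A Modified Fedorov search would instead try every candidate profile in each position and keep the best one, which is why it is slower but more thorough.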
idefix Integration
When use_idefix = TRUE (the default), the function leverages the highly optimized algorithms from the idefix package for the "cea" and "modfed" design generation methods. This can provide significant speed improvements, especially for larger problems.
Key benefits of idefix integration:
- Faster optimization algorithms with C++ implementation
- Better handling of large candidate sets
- Optimized parallel processing
- Advanced blocking capabilities for multi-block designs
Examples
library(cbcTools)
# Create profiles for an apple choice experiment
profiles <- cbc_profiles(
  price = c(1, 1.5, 2, 2.5, 3),
  type = c("Fuji", "Gala", "Honeycrisp"),
  freshness = c("Poor", "Average", "Excellent")
)
# Basic random design
design_random <- cbc_design(
  profiles = profiles,
  n_alts = 3,
  n_q = 6,
  n_resp = 100
)
head(design_random)
#> Design method: random
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#>
#> First few rows of design:
#> profileID respID qID altID obsID price typeGala typeHoneycrisp
#> 1 22 1 1 1 1 1.5 1 0
#> 2 9 1 1 2 1 2.5 1 0
#> 3 34 1 1 3 1 2.5 0 0
#> 4 18 1 2 1 2 2.0 0 0
#> 5 3 1 2 2 2 2.0 0 0
#> 6 7 1 2 3 2 1.5 1 0
#> freshnessAverage freshnessExcellent
#> 1 1 0
#> 2 0 0
#> 3 0 1
#> 4 1 0
#> 5 0 0
#> 6 0 0
# Inspect design
cbc_inspect(design_random)
#> DESIGN SUMMARY
#> =========================
#>
#> STRUCTURE
#> ================
#> Method: random
#> Created: 2025-09-23 16:54:07
#> Respondents: 100
#> Questions per respondent: 6
#> Alternatives per question: 3
#> Total choice sets: 600
#> Profile usage: 45/45 (100.0%)
#>
#> SUMMARY METRICS
#> =================
#> D-error calculation not available for random designs
#> Overall balance score: 0.736 (higher is better)
#> Overall overlap score: 0.255 (lower is better)
#>
#> VARIABLE ENCODING
#> =================
#> Format: Dummy-coded (type, freshness)
#> 💡 Use cbc_decode_design() to convert to categorical format
#>
#> ATTRIBUTE BALANCE
#> =================
#> Overall balance score: 0.736 (higher is better)
#>
#> Individual attribute level counts:
#>
#> price:
#>
#> 1 1.5 2 2.5 3
#> 380 350 361 381 328
#> Balance score: 0.942 (higher is better)
#>
#> typeGala:
#>
#> 0 1
#> 1180 620
#> Balance score: 0.694 (higher is better)
#>
#> typeHoneycrisp:
#>
#> 0 1
#> 1223 577
#> Balance score: 0.663 (higher is better)
#>
#> freshnessAverage:
#>
#> 0 1
#> 1194 606
#> Balance score: 0.684 (higher is better)
#>
#> freshnessExcellent:
#>
#> 0 1
#> 1179 621
#> Balance score: 0.695 (higher is better)
#>
#> ATTRIBUTE OVERLAP
#> =================
#> Overall overlap score: 0.255 (lower is better)
#>
#> Counts of attribute overlap:
#> (# of questions with N unique levels)
#>
#> price: Continuous variable
#> Questions by # unique levels:
#> 1 (complete overlap): 2.5% (15 / 600 questions)
#> 2 (partial overlap): 47.2% (283 / 600 questions)
#> 3 (partial overlap): 50.3% (302 / 600 questions)
#> 4 (partial overlap): 0.0% (0 / 600 questions)
#> 5 (no overlap): 0.0% (0 / 600 questions)
#> Average unique levels per question: 2.48
#>
#> typeGala: Continuous variable
#> Questions by # unique levels:
#> 1 (complete overlap): 29.2% (175 / 600 questions)
#> 2 (no overlap): 70.8% (425 / 600 questions)
#> Average unique levels per question: 1.71
#>
#> typeHoneycrisp: Continuous variable
#> Questions by # unique levels:
#> 1 (complete overlap): 33.8% (203 / 600 questions)
#> 2 (no overlap): 66.2% (397 / 600 questions)
#> Average unique levels per question: 1.66
#>
#> freshnessAverage: Continuous variable
#> Questions by # unique levels:
#> 1 (complete overlap): 28.8% (173 / 600 questions)
#> 2 (no overlap): 71.2% (427 / 600 questions)
#> Average unique levels per question: 1.71
#>
#> freshnessExcellent: Continuous variable
#> Questions by # unique levels:
#> 1 (complete overlap): 33.0% (198 / 600 questions)
#> 2 (no overlap): 67.0% (402 / 600 questions)
#> Average unique levels per question: 1.67
#>
#>
# Greedy design with balanced frequency
design_balanced <- cbc_design(
  profiles = profiles,
  method = "balanced",
  n_alts = 3,
  n_q = 6,
  n_resp = 100
)
#> Generating balanced design for 100 respondents using 3 cores...
# Design with priors using D-optimal method
priors <- cbc_priors(
  profiles = profiles,
  price = -0.25,
  type = c("Gala" = 0.5, "Honeycrisp" = 1.0),
  freshness = c("Average" = 0.6, "Excellent" = 1.2)
)
design_optimal <- cbc_design(
  profiles = profiles,
  method = "stochastic",
  priors = priors,
  n_alts = 3,
  n_q = 6,
  n_resp = 100,
  n_start = 3
)
#> Stochastic design will be optimized into 1 design block, then allocated across 100 respondents
#> Running 3 design searches using 3 cores...
#>
#> D-error results from all starts:
#> Start 2: 0.889353 (Best)
#> Start 1: 0.909082
#> Start 3: 0.997963
# Compare designs
cbc_compare(
  "Random" = design_random,
  "Balanced" = design_balanced,
  "D-optimal" = design_optimal
)
#> CBC Design Comparison
#> =====================
#> Designs compared: 3
#> Metrics: structure, efficiency, balance, overlap
#> Sorted by: d_error (ascending)
#>
#> Structure
#> =====================
#> Design Method respondents questions
#> D-optimal stochastic 100 6
#> Random random 100 6
#> Balanced balanced 100 6
#> Alternatives Blocks Profile Usage
#> 3 1 (16/45) 35.6%
#> 3 1 (45/45) 100%
#> 3 1 (45/45) 100%
#> No Choice Labeled?
#> No No
#> No No
#> No No
#>
#> Design Metrics
#> =====================
#> Design Method D-Error (Null) D-Error (Prior) Balance Overlap
#> D-optimal stochastic 0.785048 0.889353 0.643 0.100
#> Random random NA NA 0.736 0.255
#> Balanced balanced NA NA 0.742 0.000
#>
#> Interpretation:
#> - D-Error: Lower is better (design efficiency)
#> - Balance: Higher is better (level distribution)
#> - Overlap: Lower is better (attribute variation)
#> - Profile Usage: Higher means more profiles used
#>
#> Best performers:
#> - D-Error: D-optimal (0.889353)
#> - Balance: Balanced (0.742)
#> - Overlap: Balanced (0.000)
#> - Profile Usage: Random (100.0%)
#>
#> Use summary() for detailed information on any one design.