This article demonstrates how to convert between different encoding schemes for categorical variables in choice-based conjoint designs using the cbc_encode() function.
Overview
Choice-based conjoint data can use different encoding schemes for categorical variables:
- Standard encoding: Categorical variables represented as factors or characters
- Dummy coding: Binary indicators with a reference category (all zeros)
- Effects coding: Coded as -1, 0, or 1 to ensure coefficients sum to zero
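The difference between dummy and effects coding can be illustrated with base R's model.matrix() function. This is only an illustration of the coding schemes, not part of cbcTools; note also that base R's "contr.sum" treats the last factor level as the reference, while cbc_encode() defaults to the first:

```r
type <- factor(c("Fuji", "Gala", "Honeycrisp"))

# Dummy coding: first level (Fuji) is the reference, coded as all zeros
model.matrix(~ type, contrasts.arg = list(type = "contr.treatment"))

# Effects coding: the reference level is coded as -1 in every column
# (contr.sum uses the *last* level, Honeycrisp, as the reference)
model.matrix(~ type, contrasts.arg = list(type = "contr.sum"))
```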
The cbc_encode() function allows you to convert between these encodings and customize reference levels.
Basic Encoding Conversion
Creating a Design
Let’s start by creating a simple design:
library(cbcTools)
# Create profiles
profiles <- cbc_profiles(
  price = c(1, 1.5, 2, 2.5, 3),
  type = c("Fuji", "Gala", "Honeycrisp"),
  freshness = c("Poor", "Average", "Excellent")
)
# Create design (uses standard encoding by default)
design <- cbc_design(
  profiles = profiles,
  n_alts = 3,
  n_q = 6,
  n_resp = 100,
  method = "random"
)
head(design)
#> Design method: random
#> Encoding: standard
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#>
#> First few rows of design:
#> profileID respID qID altID obsID price type freshness
#> 1 45 1 1 1 1 3.0 Honeycrisp Excellent
#> 2 23 1 1 2 1 2.0 Gala Average
#> 3 12 1 1 3 1 1.5 Honeycrisp Poor
#> 4 37 1 2 1 2 1.5 Gala Excellent
#> 5 38 1 2 2 2 2.0 Gala Excellent
#> 6 31 1 2 3 2 1.0 Fuji Excellent
By default, designs are created with standard encoding where categorical variables remain as factors.
Converting to Dummy Coding
Convert to dummy coding for model estimation:
design_dummy <- cbc_encode(design, coding = "dummy")
head(design_dummy)
#> Design method: random
#> Encoding: dummy
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#> 💡 Use cbc_encode(design, 'standard') to view categorical format
#>
#> First few rows of design:
#> profileID respID qID altID obsID price typeGala typeHoneycrisp
#> 1 45 1 1 1 1 3.0 0 1
#> 2 23 1 1 2 1 2.0 1 0
#> 3 12 1 1 3 1 1.5 0 1
#> 4 37 1 2 1 2 1.5 1 0
#> 5 38 1 2 2 2 2.0 1 0
#> 6 31 1 2 3 2 1.0 0 0
#> freshnessAverage freshnessExcellent
#> 1 0 1
#> 2 1 0
#> 3 0 0
#> 4 0 1
#> 5 0 1
#> 6 0 1
Notice that:
- The type variable is replaced with typeGala and typeHoneycrisp
- The freshness variable is replaced with freshnessAverage and freshnessExcellent
- Fuji and Poor are the reference levels (represented when all dummy variables = 0)
- Continuous variables like price remain unchanged
Converting to Effects Coding
Effects coding uses -1 for the reference level:
design_effects <- cbc_encode(design, coding = "effects")
head(design_effects)
#> Design method: random
#> Encoding: effects
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#> 💡 Use cbc_encode(design, 'standard') to view categorical format
#>
#> First few rows of design:
#> profileID respID qID altID obsID price typeGala typeHoneycrisp
#> 1 45 1 1 1 1 3.0 0 1
#> 2 23 1 1 2 1 2.0 1 0
#> 3 12 1 1 3 1 1.5 0 1
#> 4 37 1 2 1 2 1.5 1 0
#> 5 38 1 2 2 2 2.0 1 0
#> 6 31 1 2 3 2 1.0 -1 -1
#> freshnessAverage freshnessExcellent
#> 1 0 1
#> 2 1 0
#> 3 -1 -1
#> 4 0 1
#> 5 0 1
#> 6 0 1
In effects coding:
- Non-reference levels are coded as 0 or 1 (same as dummy coding)
- Reference-level rows have -1 for all level indicators
- This ensures coefficients sum to zero
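A practical consequence of effects coding is that the reference level's effect is not estimated directly but is implied as the negative sum of the estimated coefficients. A sketch with made-up numbers:

```r
# Hypothetical effects-coded estimates for the type attribute
b_gala       <- 0.2
b_honeycrisp <- 0.5

# The implied effect for the reference level (Fuji) is the negative sum,
# so the three level effects sum to zero
b_fuji <- -(b_gala + b_honeycrisp)
b_fuji + b_gala + b_honeycrisp  # 0
```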
Converting Back to Standard
Convert back to categorical variables:
design_standard <- cbc_encode(design_dummy, coding = "standard")
head(design_standard)
#> Design method: random
#> Encoding: standard
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#>
#> First few rows of design:
#> profileID respID qID altID obsID price type freshness
#> 1 45 1 1 1 1 3.0 Honeycrisp Excellent
#> 2 23 1 1 2 1 2.0 Gala Average
#> 3 12 1 1 3 1 1.5 Honeycrisp Poor
#> 4 37 1 2 1 2 1.5 Gala Excellent
#> 5 38 1 2 2 2 2.0 Gala Excellent
#> 6 31 1 2 3 2 1.0 Fuji Excellent
Customizing Reference Levels
By default, the first level of each categorical variable is used as the reference. You can specify different reference levels using the ref_levels argument.
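For comparison, in base R you would change a factor's reference level with relevel() before building a model matrix; the ref_levels argument plays the same role within cbc_encode(). A base-R illustration (not part of cbcTools):

```r
type <- factor(c("Fuji", "Gala", "Honeycrisp"))

# Make Honeycrisp the reference level
type <- relevel(type, ref = "Honeycrisp")

# The dummy columns are now typeFuji and typeGala
colnames(model.matrix(~ type))
```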
Setting Custom References
# Use "Honeycrisp" as reference for type, "Excellent" for freshness
design_custom <- cbc_encode(
  design,
  coding = "dummy",
  ref_levels = list(
    type = "Honeycrisp",
    freshness = "Excellent"
  )
)
head(design_custom)
#> Design method: random
#> Encoding: dummy
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#> 💡 Use cbc_encode(design, 'standard') to view categorical format
#>
#> First few rows of design:
#> profileID respID qID altID obsID price typeFuji typeGala freshnessPoor
#> 1 45 1 1 1 1 3.0 0 0 0
#> 2 23 1 1 2 1 2.0 0 1 0
#> 3 12 1 1 3 1 1.5 0 0 1
#> 4 37 1 2 1 2 1.5 0 1 0
#> 5 38 1 2 2 2 2.0 0 1 0
#> 6 31 1 2 3 2 1.0 1 0 0
#> freshnessAverage
#> 1 0
#> 2 1
#> 3 0
#> 4 0
#> 5 0
#> 6 0
Now Honeycrisp and Excellent are the reference categories.
Updating References Without Changing Encoding
You can update reference levels while keeping the current encoding:
# Start with dummy coding
design_dummy <- cbc_encode(design, coding = "dummy")

# Update reference levels only (keeps dummy coding)
design_updated <- cbc_encode(
  design_dummy,
  ref_levels = list(type = "Gala")
)
head(design_updated)
#> Design method: random
#> Encoding: dummy
#> Structure: 100 respondents × 6 questions × 3 alternatives
#> Profile usage: 45/45 (100.0%)
#>
#> 💡 Use cbc_inspect() for a more detailed summary
#> 💡 Use cbc_encode(design, 'standard') to view categorical format
#>
#> First few rows of design:
#> profileID respID qID altID obsID price typeFuji typeHoneycrisp
#> 1 45 1 1 1 1 3.0 0 1
#> 2 23 1 1 2 1 2.0 0 0
#> 3 12 1 1 3 1 1.5 0 1
#> 4 37 1 2 1 2 1.5 0 0
#> 5 38 1 2 2 2 2.0 0 0
#> 6 31 1 2 3 2 1.0 1 0
#> freshnessAverage freshnessExcellent
#> 1 0 1
#> 2 1 0
#> 3 0 0
#> 4 0 1
#> 5 0 1
#> 6 0 1
Working with No-Choice Options
When using designs with no-choice options, you should convert to dummy coding before power analysis or model estimation:
# Create profiles
profiles_nc <- cbc_profiles(
  price = c(1, 2, 3),
  quality = c("Low", "High")
)

# Create priors including no-choice
priors_nc <- cbc_priors(
  profiles = profiles_nc,
  price = -0.1,
  quality = c("High" = 0.5),
  no_choice = -1.5
)

# Create design with no-choice
design_nc <- cbc_design(
  profiles = profiles_nc,
  priors = priors_nc,
  n_alts = 2,
  n_q = 4,
  n_resp = 50,
  no_choice = TRUE,
  method = "random"
)
# Simulate choices
choices_nc <- cbc_choices(design_nc, priors_nc)
head(choices_nc)
#> CBC Choice Data
#> ===============
#> Encoding: standard
#> Observations: 2 choice tasks
#> Alternatives per task: 3
#> Total choices made: 2
#>
#> Simulation method: utility_based
#> Priors: Used for utility-based simulation
#> Simulated at: 2025-10-12 18:06:01
#>
#> Choice rates by alternative:
#> Alt 1: 50.0% (1 choices)
#> Alt 2: 50.0% (1 choices)
#> Alt 3: 0.0% (0 choices)
#>
#> No-choice rate: 0.0%
#>
#> First few rows:
#> profileID respID qID altID obsID price quality no_choice choice
#> 1 2 1 1 1 1 2 Low 0 0
#> 2 6 1 1 2 1 3 High 0 1
#> 3 0 1 1 3 1 NA <NA> 1 0
#> 4 6 1 2 1 2 3 High 0 1
#> 5 1 1 2 2 2 1 Low 0 0
#> 6 0 1 2 3 2 NA <NA> 1 0
For modeling or power analysis with no-choice data, convert to dummy or effects coding:
# Convert to dummy coding for power analysis
choices_dummy <- cbc_encode(choices_nc, coding = "dummy")

# Run power analysis
power_result <- cbc_power(
  data = choices_dummy,
  n_breaks = 5
)
power_result
Use Cases
For Model Estimation
While dummy or effects coding is not required by the logitr package, converting the data before estimation can be helpful: it makes coefficients easier to interpret and gives you greater control over which levels enter the model:
library(logitr)

# (assumes 'choices' is simulated choice data, e.g., from cbc_choices())

# Convert to dummy coding
choices_dummy <- cbc_encode(choices, coding = "dummy")

# Estimate model
model <- logitr(
  data = choices_dummy,
  outcome = "choice",
  obsID = "obsID",
  pars = c(
    "price", "typeGala", "typeHoneycrisp",
    "freshnessAverage", "freshnessExcellent"
  )
)
For Data Inspection
It is generally easier to inspect your data when using standard encoding:
# Work with categorical variables
choices_standard <- design

# Take a random sample of rows to examine
chosen <- choices_standard[sample(1:nrow(choices_standard), 100), ]
# Examine choice frequencies by category
table(chosen$type)
#>
#> Fuji Gala Honeycrisp
#> 45 25 30
table(chosen$freshness)
#>
#> Poor Average Excellent
#> 33 37 30
# Use cbc_inspect
cbc_inspect(choices_standard, sections = 'balance')
#> DESIGN SUMMARY
#> =========================
#>
#> ATTRIBUTE BALANCE
#> =================
#> Overall balance score: 0.972 (higher is better)
#>
#> Individual attribute level counts:
#>
#> price:
#>
#> 1 1.5 2 2.5 3
#> 372 362 348 362 356
#> Balance score: 0.976 (higher is better)
#>
#> type:
#>
#> Fuji Gala Honeycrisp
#> 605 612 583
#> Balance score: 0.975 (higher is better)
#>
#> freshness:
#>
#> Poor Average Excellent
#> 624 593 583
#> Balance score: 0.966 (higher is better)
For Power Analysis
You can use either encoding, but results differ:
# Dummy coding: estimates an effect for each non-reference level
power_dummy <- cbc_power(
  cbc_encode(choices, coding = "dummy"),
  n_breaks = 5
)

# Standard coding: estimates effects for the categorical attributes directly
power_standard <- cbc_power(
  cbc_encode(choices, coding = "standard"),
  pars = c("price", "type", "freshness"),
  n_breaks = 5
)