Feature Flags & Experimentation
Bosca includes a feature-flag and experimentation system for delivering software changes to defined audiences, measuring whether they actually moved a metric, and rolling them out gradually. Flags decide what value each user sees; targeting rules decide which users get which split; experiments observe the split and report statistical lift on conversion goals.
What you get
- Feature Flags: Boolean, percentage, string, and JSON-valued flags with stable per-user bucket assignments.
- Targeted Rollouts: Top-to-bottom targeting rules with segment, principal, profile-attribute, and device-attribute conditions.
- Weighted Variations: Per-rule rollouts that split matched users across multiple variations by weight, with bucket-range stability so widening a rollout doesn't reshuffle existing users.
- Experiments: Observation and statistics attached to a single targeting rule, comparing variations on conversion goals.
- Conversion Goals: Match analytics events by type, element, and page, with two metric types (`UNIQUE_CONVERSION` for binary "did they convert?" questions and `EVENT_COUNT` for "how often per user?" questions).
- Live Distribution: Per-flag panel showing the actual variation distribution among recently active users, with a date-range picker.
- Mutual Exclusion: Exclusion layers prevent the same user from being assigned to conflicting experiments at once.
- AI Analysis: Automated experiment summaries with significance, lift, and a recommendation.
Core Concepts
Feature Flags
A FeatureFlag is a named decision the system can make at runtime. Each flag has:
- A stable key clients use to evaluate it (e.g. `checkout-redesign`).
- A type declaring what shape its values take.
- A palette of variations — the candidate values the flag can resolve to.
- A default variation served when no targeting rule matches.
- An ordered list of targeting rules that can override the default for matched users.
- A per-flag salt that mixes into bucket-assignment hashes; regenerating it reshuffles every user.
- A status controlling whether the flag is currently evaluated.
Flag Types
| Type | Variation values are |
|---|---|
| BOOLEAN | true / false |
| PERCENTAGE | numbers 0–100 |
| STRING | arbitrary strings |
| JSON | arbitrary JSON payloads |
Flag Status
| Status | Description |
|---|---|
| DRAFT | Being built; not yet evaluated |
| ENABLED | Evaluating normally per targeting rules |
| DISABLED | Turned off; everyone gets the default variation |
| ARCHIVED | Retired; hidden from the main UI |
DISABLED is the "stop serving this flag" state used both for planned pauses and on-call incident response — flipping a flag to DISABLED makes evaluation short-circuit to the default variation regardless of rules.
Variations
A Variation is one entry in a flag's palette of candidate values. Each variation has:
- A stable key (e.g. `on`, `control`, `treatment-a`) referenced by targeting rules and bucket assignments. Treat keys as forever-stable once a flag is in production — renaming reshuffles bucket ranges.
- A human-readable name shown in dashboards.
- An optional description.
- A value returned to clients when this variation is served.
Targeting Rules
A TargetingRule is one entry in a flag's ordered rule list. Each rule has:
- A stable id (UUID string) referenced by attached experiments and by the bucket-assignment hash. Reordering rules in the UI does not change ids.
- An optional name and description for human readers.
- A list of conditions that all must match for the rule to apply (AND semantics).
- A rollout describing how matched users are split across variations.
Rules are evaluated top-to-bottom. The first rule whose conditions all match wins, and matched users are bucketed across that rule's rollout. If no rule matches, the flag's default variation is returned.
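For orientation, here is a minimal sketch of how the flag, variation, and rule shapes described so far fit together. All field and type names are assumptions for illustration; the real interfaces live in `backend/framework/core-experimentation`.

```kotlin
// Illustrative data model only; names are assumed, not the actual core-experimentation types.
enum class FlagType { BOOLEAN, PERCENTAGE, STRING, JSON }
enum class FlagStatus { DRAFT, ENABLED, DISABLED, ARCHIVED }

data class Variation(
    val key: String,          // stable, e.g. "on", "control", "treatment-a"
    val name: String,         // human-readable, shown in dashboards
    val description: String?,
    val value: Any?,          // returned to clients when this variation is served
)

data class Condition(
    val type: String,         // Segment | Principal | ProfileAttribute | DeviceAttribute
    val negate: Boolean = false,
)

data class Rollout(val weights: Map<String, Double>)   // variationKey -> unitless weight

data class TargetingRule(
    val id: String,                    // stable UUID; part of the bucket hash
    val name: String?,
    val description: String?,
    val conditions: List<Condition>,   // all must match (AND)
    val rollout: Rollout,
)

data class FeatureFlag(
    val key: String,
    val type: FlagType,
    val status: FlagStatus,
    val salt: String,                  // regenerate to reshuffle every user
    val variations: List<Variation>,
    val defaultVariationKey: String,
    val rules: List<TargetingRule>,    // evaluated top-to-bottom; first match wins
)
```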
Conditions
| Condition Type | Matches when | Requires |
|---|---|---|
| Segment | The user is a member of the named segment | Authenticated principal |
| Principal | The user's principal id is in the supplied list | Authenticated principal |
| ProfileAttribute | A value inside a profile attribute's JSON object matches via an operator | Authenticated principal |
| DeviceAttribute | A field on the analytics Device payload matches via an operator | Client-supplied device |
Every condition type supports a negate flag, which inverts the match result without needing a separate NOT_X operator.
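As a concrete picture of the AND-plus-negate semantics, here is a minimal sketch of how a rule's condition list could be evaluated. The `matches` lambda stands in for the per-type checks (segment membership, principal list, attribute operators) and is an assumption, not the service's actual code.

```kotlin
// Sketch only: each condition carries a type-specific check plus an optional negate.
data class EvaluatedCondition(
    val matches: (EvaluationContext) -> Boolean,   // segment / principal / attribute check
    val negate: Boolean = false,
)

data class EvaluationContext(
    val principalId: String?,                      // present for authenticated users
    val deviceAttributes: Map<String, String>,     // client-supplied Device payload
)

// A rule applies only if every condition passes; negate inverts a single condition.
// An empty condition list matches every user (all {} over an empty list is true).
fun ruleMatches(conditions: List<EvaluatedCondition>, ctx: EvaluationContext): Boolean =
    conditions.all { condition ->
        val result = condition.matches(ctx)
        if (condition.negate) !result else result
    }
```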
DeviceAttribute keys
DeviceAttribute conditions resolve their key against the analytics SDK's Device class — the same object the SDK ships when emitting analytics events. The recognized keys (with common aliases) are:
| Key | Aliases |
|---|---|
| installationId | installation_id |
| manufacturer | |
| model | |
| platform | |
| primaryLocale | primary_locale, locale |
| systemName | system_name, os |
| timezone | tz |
| type | device_type, deviceType |
| version | os_version, osVersion |
The browser SDK populates platform, primaryLocale (from navigator.language), timezone, and version (from navigator.userAgent) automatically when calling evaluate. Server-side and non-browser SDKs may pass any subset; unknown keys silently fail to match.
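The alias handling can be pictured as a small normalization step before the Device payload lookup. This is a sketch, assuming aliases are folded into a canonical key first; unknown keys resolve to nothing, so the condition fails to match.

```kotlin
// Sketch: normalize a DeviceAttribute condition key to the canonical Device field name.
private val deviceKeyAliases = mapOf(
    "installation_id" to "installationId",
    "primary_locale" to "primaryLocale", "locale" to "primaryLocale",
    "system_name" to "systemName", "os" to "systemName",
    "tz" to "timezone",
    "device_type" to "type", "deviceType" to "type",
    "os_version" to "version", "osVersion" to "version",
)

private val canonicalDeviceKeys = setOf(
    "installationId", "manufacturer", "model", "platform",
    "primaryLocale", "systemName", "timezone", "type", "version",
)

// Returns null for unrecognized keys, which makes the condition silently fail to match.
fun resolveDeviceKey(key: String): String? {
    val canonical = deviceKeyAliases[key] ?: key
    return canonical.takeIf { it in canonicalDeviceKeys }
}
```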
Rollouts and Bucketing
A Rollout is a list of (variationKey, weight) pairs. Weights are unitless and normalized at evaluation time, so [(on, 9), (off, 1)] is a 90/10 split, [(on, 1), (off, 1)] is 50/50, and weights don't need to sum to 100.
Bucket assignment is deterministic per user:
```
bucket = sha256(flagKey + ":" + salt + ":" + ruleId + ":" + identifier) % totalWeight
```
The hash inputs are deliberate:
| Input | Why it's there |
|---|---|
| flagKey | Independent buckets across flags — being "on" for flag A doesn't correlate with flag B |
| salt | Per-flag random salt; regenerate to reshuffle every user |
| ruleId | Independent buckets across rules; each rule has its own bucket space |
| identifier | Authenticated principal id (if present), otherwise installation id; the only user input |
What's intentionally not in the hash:
- Conditions — two users who match the same rule's conditions are bucketed identically based on their identifier alone, not based on why they qualified.
- Variation weights — increasing one variation's weight absorbs users from adjacent ranges rather than reshuffling everyone. This is what enables smooth gradual rollouts.
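A hedged sketch of the bucketing just described: hash the four inputs with SHA-256, normalize the bucket into [0, 1), and walk contiguous ranges ordered alphabetically by variation key. Helper names here are illustrative; the actual logic lives in `ExperimentServiceImpl.bucketRollout`.

```kotlin
import java.security.MessageDigest

// Sketch of deterministic bucket assignment. Only flagKey, salt, ruleId and the
// identifier feed the hash; conditions and weights deliberately do not.
fun assignVariation(
    flagKey: String,
    salt: String,
    ruleId: String,
    identifier: String,             // principal id if present, else installation id
    weights: Map<String, Double>,   // variationKey -> unitless weight
): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest("$flagKey:$salt:$ruleId:$identifier".toByteArray())

    // Fold the first 8 bytes into an unsigned value and normalize to [0, 1).
    val raw = digest.take(8).fold(0L) { acc, b -> (acc shl 8) or (b.toLong() and 0xFF) }
    val bucket = (raw ushr 1).toDouble() / Long.MAX_VALUE.toDouble()

    // Contiguous ranges, alphabetical by key, so widening one weight only
    // moves users in the newly extended range.
    val sorted = weights.toSortedMap()
    val total = sorted.values.sum()
    var cumulative = 0.0
    for ((key, weight) in sorted) {
        cumulative += weight / total
        if (bucket < cumulative) return key
    }
    return sorted.lastKey()   // guard against floating-point edge cases
}
```

With weights off=80 and on=20 this reproduces the 0 – 0.80 and 0.80 – 1.00 ranges in the worked example below.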
Bucket-range stability
Variation weights are sorted alphabetically by key, then assigned contiguous bucket ranges. With sorted keys [off, on] and weights off=80, on=20:
- `off` gets bucket range 0 – 0.80
- `on` gets bucket range 0.80 – 1.00
Increase on from 20 to 40:
- `off` gets 0 – 0.667 (80 / (80+40))
- `on` gets 0.667 – 1.00
A user whose hash falls at 0.5 was in off before AND is still in off after — no movement. A user at 0.72 was in off (0.72 < 0.80) and is now in on (0.72 > 0.667) — moves on next eval. A user at 0.95 was in on and stays in on. Only users in the newly-extended range move; everyone else keeps their bucket. This is what makes gradual rollouts safe to widen.
Regenerating the per-flag salt is the explicit "I want fresh assignments" knob and reshuffles every user.
Conditions and Bucketing Are Two Separate Steps
A common confusion: conditions and bucketing both determine what variation a user gets, but they do completely different things.
```
for each rule in rules:
    if rule.conditions ALL match the user:    ← condition gate
        bucket the user against rule.rollout  ← bucket assignment
        return the resulting variation
return defaultVariation
```
Conditions are the gate. They decide whether the rule applies to this user. They're a hard yes/no filter — all conditions on a rule must match (AND), each can be negated, and any one failure skips the rule entirely. The user is never bucketed against a rule whose conditions they don't satisfy.
Bucketing is the within-rule assignment. Once conditions pass, the deterministic hash above decides which of the rule's variations the user falls into. Conditions are not part of the hash, so two users who happen to qualify for the same rule are bucketed identically against that rule's space.
Practical implications:
- A user moving in or out of a segment changes which rule they match but doesn't reshuffle their bucket within either rule. If they were in `on` under rule 1 and they leave the segment, rule 1 stops matching for them and they fall through to rule 2 — rule 2 will assign them an independent bucket based on its own `ruleId`.
- The identifier is principal or installation, not both. If an anonymous user logs in mid-session, their identifier flips from `installationId` to `principalId.toString()` and they're re-bucketed across every rule and every flag. This is a known property of identifier-based bucketing — there's no built-in consolidation between anonymous and authenticated identities.
- Conditions are evaluated per call. The server doesn't cache "does this user match this rule's conditions"; it walks the conditions on every `evaluate`. Add a user to a segment and the next `evaluate` sees the new membership and re-routes them.
Live Distribution
Every evaluate call updates a per-(flag, user) row in the flag_exposures table recording the variation that user is currently bucketed into. The flag detail page shows this as a Live Distribution panel with two side-by-side views:
Actual
Aggregated counts of users currently active in each variation, scoped to a date range picked from the same date-range picker as the analytics dashboards. The window is what makes the panel honest about dormant users:
A user who hasn't evaluated the flag since a rollout change isn't really in any bucket right now — they're a historical record. The window filter excludes them by construction, so the panel shows "users actively being served each variant during this period" instead of "every user we've ever bucketed".
After a rollout edit, the actual distribution converges to the new weights as users re-evaluate within the window.
Configured
Computed client-side from the flag's targeting rules and default-variation key. For each rule, normalized rollout weights as percentages; for the default path, a single 100% entry pointing at the default variation. This is the answer to "if every user evaluated right now, what would the rollout configuration produce?" — independent of who's actually active.
Comparing the two surfaces drift between intent and reality. If the configured panel says rule 1 is 40/60 but the actual panel shows 50/50, traffic isn't matching the intent — maybe the rule isn't matching as many users as expected, or the default path is absorbing more than intended.
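The configured panel's math is plain weight normalization. A minimal sketch, assuming the rollout is available client-side as key/weight pairs:

```kotlin
// Sketch: turn unitless rollout weights into the percentages the Configured panel shows.
fun configuredPercentages(weights: Map<String, Double>): Map<String, Double> {
    val total = weights.values.sum()
    return weights.mapValues { (_, weight) -> weight / total * 100.0 }
}

// configuredPercentages(mapOf("on" to 9.0, "off" to 1.0)) == {on=90.0, off=10.0}
// The default path is rendered as a single 100% entry for the default variation key.
```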
Lifecycle hooks
The exposure table self-heals when flag state changes:
- Variation value change (`on.value: true → false`): no effect on the exposure rows. They reference variation keys, not values.
- Variation key rename or removal: the `edit` mutation prunes orphaned rows whose `variation_key` is no longer in the palette.
- Salt regeneration ("Reshuffle Buckets"): wipes all exposure rows for the flag — the operator's explicit intent is fresh assignments, so leaving stale counts would be misleading. The table refills naturally as users re-evaluate.
- Rule weight change: no immediate write to exposures. Users move between variations gradually as they re-evaluate, with bucket-range stability preserving most existing assignments.
Experiments
An Experiment is observation and statistics attached to a single targeting rule on a feature flag. It does not own its own values — those come from the rule's rollout. An experiment's only job is to:
- Record assignments when users hit the attached rule.
- Aggregate per-variation metrics from the analytics events table.
- Optionally produce an AI summary of the result.
Experiment Status
| Status | Description |
|---|---|
| DRAFT | Being configured; assignments and aggregation are not running |
| RUNNING | Assignments are persisted on every evaluation that hits the attached rule |
| PAUSED | Temporarily stopped; existing assignments remain but no new ones land |
| COMPLETED | Finalized; results are frozen |
| ARCHIVED | Hidden from the main UI |
An experiment must be RUNNING for assignments to be persisted. A common pitfall: leaving the experiment in DRAFT, calling evaluate, seeing the right variation come back, and then wondering why the impressions count is zero. Click Start in the experiment detail page to enter RUNNING.
Conversion Goals
A ConversionGoal defines what counts as a "success" for an experiment. Each goal has:
- A name for display.
- An optional eventType filter (one of `Session`, `Interaction`, `Impression`, `Completion`, `Installation`, `Error`). Null matches any event type — useful for "did total activity per user grow?" metrics.
- An optional elementType filter (e.g. `button`, `click`).
- An optional elementId filter.
- An optional pagePath filter matched against `page.path` on the analytics events table.
- A metricType selecting how the goal is analyzed.
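Taken together, a goal is a conjunction of optional filters where null means "match anything". A sketch of that predicate, with field names assumed for illustration:

```kotlin
// Sketch: null filters match anything; non-null filters must match the event field.
enum class MetricType { UNIQUE_CONVERSION, EVENT_COUNT }

data class ConversionGoal(
    val name: String,
    val eventType: String?,     // Session, Interaction, Impression, Completion, Installation, Error
    val elementType: String?,
    val elementId: String?,
    val pagePath: String?,
    val metricType: MetricType,
)

data class AnalyticsEvent(
    val type: String,
    val elementType: String?,
    val elementId: String?,
    val pagePath: String?,
)

fun matchesGoal(goal: ConversionGoal, event: AnalyticsEvent): Boolean =
    (goal.eventType == null || goal.eventType == event.type) &&
    (goal.elementType == null || goal.elementType == event.elementType) &&
    (goal.elementId == null || goal.elementId == event.elementId) &&
    (goal.pagePath == null || goal.pagePath == event.pagePath)
```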
Metric Types
| Metric | What it answers | Statistical test |
|---|---|---|
| UNIQUE_CONVERSION | "Did the user do this at all?" | Chi-squared |
| EVENT_COUNT | "How many times per user did they do it?" | Welch's t-test |
`UNIQUE_CONVERSION` counts each converting user at most once (regardless of how many matching events they emitted) and reports a proportion in [0, 1]. `EVENT_COUNT` totals matching events per user, reports a per-user mean and variance, and uses Welch's t-test to detect a difference of means. Picking the wrong metric type for the question will quietly produce misleading results — chi-squared on a count metric isn't even mathematically valid because the proportion can exceed 1.
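To make the distinction concrete, here is a hedged sketch of the two per-variation summaries and the Welch's t statistic computed from the count summaries. The formulas are standard; the surrounding names are assumptions, not the aggregation job's actual code.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// UNIQUE_CONVERSION summary: each user counts at most once, so the rate stays in [0, 1].
data class UniqueConversionStats(val users: Int, val converters: Int) {
    val rate: Double get() = if (users == 0) 0.0 else converters.toDouble() / users
}

// EVENT_COUNT summary: per-user totals, Bessel-corrected sample variance.
data class EventCountStats(val perUserCounts: List<Double>) {
    val n: Int get() = perUserCounts.size
    val mean: Double get() = perUserCounts.average()
    val variance: Double
        get() {
            if (n < 2) return 0.0
            val m = mean
            return perUserCounts.sumOf { (it - m).pow(2) } / (n - 1)
        }
}

// Welch's t statistic for a difference of means between treatment and control.
fun welchT(treatment: EventCountStats, control: EventCountStats): Double {
    val se = sqrt(treatment.variance / treatment.n + control.variance / control.n)
    return if (se == 0.0) 0.0 else (treatment.mean - control.mean) / se
}
```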
Aggregation
The Aggregate action enqueues a background job that recomputes the experiment_results table for the experiment. For each (variation, goal) combination, it:
- Counts impressions from the assignments table.
- Queries the analytics events table for matching events, and joins each converting `client_id` against the assignments table to attribute it to a variation.
- For `UNIQUE_CONVERSION` goals, computes conversion rate, chi-squared confidence, and percentage lift over the control variation.
- For `EVENT_COUNT` goals, computes per-user mean, variance via Bessel-corrected sample variance, Welch's t-test confidence, and lift over the control mean.
- Upserts one row per `(variation, goal)` into `experiment_results`.
The control variation is the alphabetically first variation key — the same convention bucket assignment uses, so the "control" is stable across runs.
The job is gated on RUNNING and skips goals whose Trino query fails (e.g., a missing column in the warehouse) without overwriting prior good results with zeros.
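The control convention and the lift calculation reduce to a couple of lines; a sketch with illustrative names, not the job's actual implementation:

```kotlin
// Sketch: the control is the alphabetically first variation key (the same ordering
// bucketing uses), and lift is the percentage change relative to that control.
fun controlKey(variationKeys: Collection<String>): String =
    variationKeys.minOrNull() ?: error("flag has no variations")

fun percentLift(metric: Double, controlMetric: Double): Double? =
    if (controlMetric == 0.0) null   // lift is undefined against a zero control
    else (metric - controlMetric) / controlMetric * 100.0

// Example: control rate 0.10, treatment rate 0.12 -> +20.0% lift.
```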
Mutual Exclusion Layers
An ExclusionLayer prevents a single user from being assigned to two conflicting experiments at the same time. When an experiment is attached to a layer, the assignment write checks whether the user already has an assignment in any other experiment in that layer; if so, the new assignment is rejected. Useful for preventing interaction effects between simultaneous tests on related surfaces.
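The layer check is a guard on the assignment write. A sketch, assuming existing assignments can be looked up per user; the repository interface shown here is hypothetical.

```kotlin
// Hypothetical lookup interface for illustration only.
interface AssignmentLookup {
    fun experimentsAssignedTo(userId: String): Set<String>   // experiment ids with an assignment
}

// Sketch: reject the new assignment if the user already sits in another
// experiment that shares the same exclusion layer.
fun canAssign(
    userId: String,
    experimentId: String,
    layerMembers: Set<String>,    // experiment ids attached to the layer
    lookup: AssignmentLookup,
): Boolean {
    val conflicting = lookup.experimentsAssignedTo(userId) intersect layerMembers
    return conflicting.isEmpty() || conflicting == setOf(experimentId)
}
```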
Putting It Together
A typical end-to-end flow:
- Create a flag. Pick a type, define the variation palette, set the default variation. Leave the targeting rules empty for now.
- Add a targeting rule. Set its conditions (if any) and its rollout weights. Single-variation rollouts ("everyone matching the rule gets `on`") are fine for plain feature gating.
- Set status to `ENABLED`. Clients calling `evaluate` will now bucket users against the rule.
- Watch the Live Distribution panel to confirm the actual split is matching the configured intent.
- (Optional) Attach an experiment to the rule for A/B observation. Add conversion goals scoped to the events you want to count. Set status to `RUNNING`.
- Click Aggregate after enough time has passed for events to land. The Results section shows per-variation impressions, conversions/mean, confidence, and lift.
- Click Analyze for an AI summary of the result and a recommendation.
Common Pitfalls
- Experiment is in `DRAFT` or `PAUSED`. Assignments are only persisted while the experiment is `RUNNING`. The `evaluate` response will still return the correct variation, but no row lands in `assignments` and aggregation reports zero impressions.
- Goal filter is too tight. A goal with `elementType: button` won't match events the SDK emits with `element.type: click`. The simplest sanity-check goal has `eventType: Interaction` and no other filters.
- Targeting rule has a Segment condition that the test devices aren't in. The rule won't match, evaluation falls through to the default path, and no assignments land against the experiment. For testing, use a rule with no conditions or add the test devices to the segment.
- Anonymous user becomes authenticated mid-session. Their identifier flips from installation id to principal id, and they're re-bucketed across every flag and rule. The exposure row is updated in place.
- Variation key was renamed without realising. Old assignments and exposures still reference the old key for users who haven't re-evaluated. The exposure table prunes orphans on `edit`, but the `experiment_results` table doesn't — re-aggregate to refresh.
- Salt was regenerated mid-experiment. Every user is reshuffled. The exposure table is wiped (intentional). Existing assignments are kept (they don't cascade-clear), but new evaluations land against new buckets, which can muddy in-flight experiments.
For developers
Related modules:
- Core interfaces: `backend/framework/core-experimentation`
- Implementation: `backend/framework/experimentation`
- GraphQL schema: `backend/framework/experimentation/src/main/resources/graphql/experimentation.graphqls`
- Migrations: `backend/framework/experimentation/src/main/resources/db/migrations/V1__experimentation.sql`
- Aggregation job: `backend/framework/experimentation/src/main/kotlin/bosca/experimentation/jobs/ExperimentResultAggregation.kt`
- Bucket assignment: `ExperimentServiceImpl.bucketRollout` in `backend/framework/experimentation/src/main/kotlin/bosca/experimentation/service/ExperimentServiceImpl.kt`
Related docs:
- Segments referenced by `Segment` conditions: Segmentation & Campaigns
- Analytics events the conversion goals match against: Analytics
- Typed `Device` class used by `DeviceAttribute` conditions: Devices & Push
- Profiles backing `ProfileAttribute` conditions and segment membership: Profiles