Recommendations

Deliver personalized content recommendations powered by analytics, audience segments, and machine learning.

Bosca includes a recommendation engine that surfaces the right content for each user. It combines analytics-driven strategies, semantic similarity, audience segmentation, and machine learning to generate personalized suggestions for both metadata items and collections.

What you get

Multiple strategy types — trending, segment-based, content similarity, curated lists, collaborative filtering, and ML-powered predictions
Placement system — define named display slots in your UI and control which strategies feed each one
Segment targeting — strategies can target specific audience segments or apply to all users
Scheduled evaluation — strategies re-evaluate automatically on cron schedules to keep recommendations fresh
Dismissals — users can dismiss content they don't want to see, and undo dismissals later
Diversity controls — category caps and freshness boosting prevent filter bubbles
TensorFlow Recommenders — train two-tower retrieval models on your interaction data and serve predictions via TF Serving
Works for anonymous users — trending and curated strategies serve content without requiring a profile

Strategy Types

Recommendations are generated by strategies, each using a different approach to match content with users.

Type	How it works	Data source
`TRENDING`	Ranks content by recent interaction velocity across all users	Trino analytics queries over Iceberg events
`SEGMENT_BASED`	Surfaces content popular among users in the same audience segments	Trino queries filtered by segment membership
`CONTENT_BASED`	Finds content similar to what a user has interacted with, using categories, labels, and vector embeddings	Meilisearch hybrid search with semantic similarity
`CURATED`	Admin-managed content lists for editorial picks or featured content	Manual selection, no analytics query needed
`COLLABORATIVE`	Recommends content that similar users engaged with but the target user hasn't seen	Trino co-occurrence query with IDF weighting (seeded by the default installer)
`ML_MODEL`	Uses a trained TensorFlow Recommenders model to predict personalized content rankings	TF Serving REST API backed by a TFRS two-tower model
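The `COLLABORATIVE` row can be illustrated with a small in-memory sketch. Bosca runs this as a Trino query over Iceberg events; the Python below only mirrors the general idea under stated assumptions. Each other user who shares items with the target is treated as a neighbour. Every shared item adds `log(N / df)` to that neighbour's similarity, where `N` is the number of users and `df` is how many users touched the item, so overlap on rare items counts more than overlap on popular ones. The function name, the input shape, and this exact weighting are hypothetical, not the seeded query.

```python
import math
from collections import defaultdict


def collaborative_scores(histories: dict[str, set[str]], target: str) -> dict[str, float]:
    """Score items the target hasn't seen, using IDF-weighted user overlap (hypothetical sketch)."""
    n_users = len(histories)

    # Document frequency: how many users interacted with each item.
    df: dict[str, int] = defaultdict(int)
    for items in histories.values():
        for item in items:
            df[item] += 1

    seen = histories.get(target, set())
    scores: dict[str, float] = defaultdict(float)
    for user, items in histories.items():
        if user == target:
            continue
        shared = seen & items
        if not shared:
            continue
        # Rare shared items are stronger evidence that the two users are similar.
        similarity = sum(math.log(n_users / df[item]) for item in shared)
        # Credit every item the neighbour has that the target hasn't seen yet.
        for item in items - seen:
            scores[item] += similarity
    return dict(scores)
```

With `histories = {"ana": {"a", "b"}, "ben": {"a", "b", "c"}, "cy": {"a", "d"}}`, item `c` outranks `d` for `ana`, because `ben` shares two items with her (one of them less common) while `cy` shares only the widely seen `a`.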

Strategy Lifecycle

Status	Description
`DRAFT`	Being configured, not yet generating recommendations
`ACTIVE`	Live and producing recommendations on its evaluation schedule
`PAUSED`	Temporarily suspended
`ARCHIVED`	Retired and no longer evaluated

Placements

A placement is a named location in your application where recommendations are displayed — for example, home_feed, article_sidebar, or post_read_next. Each placement links to one or more strategies in priority order and defines a maximum number of items.

When a client requests recommendations for a placement, the system:

Resolves the strategies linked to that placement
Filters to strategies applicable to the requesting user's segments
Queries pre-computed recommendations from each strategy
Runs the results through the assembler pipeline
Returns a ranked list
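The first two steps (resolve linked strategies, keep those that apply to the user) can be sketched as a filter. This assumes, hypothetically, that a strategy with no segment IDs applies to everyone (matching `segmentIds: []` in the admin example), that only `ACTIVE` strategies take part, and that the placement's linked list is already in priority order, so the filter preserves that order rather than re-sorting.

```python
from dataclasses import dataclass, field


@dataclass
class Strategy:
    name: str
    status: str  # DRAFT, ACTIVE, PAUSED, or ARCHIVED
    segment_ids: set[str] = field(default_factory=set)  # empty set = all users (assumption)


def applicable_strategies(linked: list[Strategy], user_segments: set[str]) -> list[Strategy]:
    """Keep ACTIVE strategies that target all users or share a segment with this user."""
    return [
        s for s in linked
        if s.status == "ACTIVE" and (not s.segment_ids or s.segment_ids & user_segments)
    ]
```

An anonymous user has no segments, so only the untargeted strategies survive, which is why trending and curated placements still work without a profile.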

Recommendation Assembly

Raw recommendations from multiple strategies pass through an assembly pipeline before being served:

Dismissed filtering — content the user has dismissed is removed
Deduplication — when multiple strategies recommend the same content, only the highest-scoring entry is kept
Freshness boost — scores are adjusted by a time-decay factor so recently generated recommendations surface above stale ones
Category diversity cap — no single category dominates the result set (configurable, default 3 items per category)
Final ranking — sorted by adjusted score, top N returned
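The five assembly stages can be expressed as one function. This is a sketch, not Bosca's assembler: the `Candidate` shape, the exponential decay with a 24-hour half-life, and the greedy way the category cap is applied (walk the list in adjusted-score order and skip items whose category is full) are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    content_id: str
    category: str
    score: float
    age_hours: float  # time since this recommendation was generated


def assemble(candidates: list[Candidate], dismissed: set[str], limit: int = 10,
             category_cap: int = 3, half_life_hours: float = 24.0) -> list[Candidate]:
    # 1. Dismissed filtering.
    kept = [c for c in candidates if c.content_id not in dismissed]

    # 2. Deduplication: keep only the highest-scoring entry per content ID.
    best: dict[str, Candidate] = {}
    for c in kept:
        if c.content_id not in best or c.score > best[c.content_id].score:
            best[c.content_id] = c

    # 3. Freshness boost: halve the score every `half_life_hours` (assumed decay shape).
    def adjusted(c: Candidate) -> float:
        return c.score * 0.5 ** (c.age_hours / half_life_hours)

    ranked = sorted(best.values(), key=adjusted, reverse=True)

    # 4 + 5. Category cap applied greedily in adjusted-score order, then top N.
    per_category: dict[str, int] = {}
    result: list[Candidate] = []
    for c in ranked:
        if per_category.get(c.category, 0) >= category_cap:
            continue
        per_category[c.category] = per_category.get(c.category, 0) + 1
        result.append(c)
        if len(result) == limit:
            break
    return result
```

Applying the cap after sorting means the items dropped from an over-represented category are always its lowest-scoring ones.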

How It Stays Fresh

Recommendations update automatically through scheduled evaluation:

Each strategy can have an evaluation schedule (cron expression) that triggers periodic re-evaluation
Trending strategies typically refresh hourly, segment-based every 6 hours, ML models daily
The scheduler runs the strategy's analytics query against the latest Iceberg event data and upserts fresh scores
Expired recommendations are cleaned up automatically by a periodic job
Content-based similarity is always real-time — it queries Meilisearch on demand, so new content is discoverable immediately after indexing
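The upsert-then-expire cycle can be pictured with an in-memory store. The store class, its `(strategy, profile, content)` key, and the per-row TTL are hypothetical stand-ins for Bosca's persisted recommendation table; the sketch only shows why re-running an evaluation replaces stale scores while a separate periodic job removes rows that were never refreshed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredRecommendation:
    score: float
    expires_at: datetime


class RecommendationStore:
    """Hypothetical stand-in for the pre-computed recommendation table."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, str, str], StoredRecommendation] = {}

    def upsert(self, strategy: str, profile: str, scores: dict[str, float],
               now: datetime, ttl: timedelta) -> None:
        """Write one evaluation run's scores, overwriting earlier values for the same key."""
        for content_id, score in scores.items():
            self.rows[(strategy, profile, content_id)] = StoredRecommendation(score, now + ttl)

    def cleanup_expired(self, now: datetime) -> int:
        """Periodic cleanup: delete rows past their expiry and return how many were removed."""
        expired = [key for key, row in self.rows.items() if row.expires_at <= now]
        for key in expired:
            del self.rows[key]
        return len(expired)
```

Content that drops out of a strategy's results simply stops being refreshed, so it ages out without the evaluator needing to issue explicit deletes.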

Machine Learning with TensorFlow Recommenders

For platforms with sufficient interaction data, Bosca supports ML-powered recommendations using TensorFlow Recommenders (TFRS).

Architecture

recommendation-trainer (Python)
  1. Connect to Trino
  2. Load interaction history
  3. Train two-tower model
  4. Upload model to Bosca storage

recommendation-model-loader (sidecar)
  Polls Bosca for new models
  Downloads to TF Serving

tf-serving (Google)
  Loads trained SavedModel
  Serves predictions via REST
  Hot-reloads new model versions

bosca-server (Kotlin)
  ML_MODEL strategy evaluation
  Calls TF Serving
  Maps predictions to recommendations
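The bosca-server side of this diagram (call TF Serving, map predictions to recommendations) is written in Kotlin; the Python below only sketches the same exchange against TF Serving's REST predict endpoint (`POST /v1/models/<name>:predict`, default port 8501, row-format `instances` in the request, `predictions` in the response). The model name `recommendations`, the `user_id` input, and the `ids`/`scores` output names are assumptions that depend on how the SavedModel's serving signature was exported.

```python
import json
import urllib.request

# Assumed model name and host; 8501 is TF Serving's default REST port.
TF_SERVING_URL = "http://localhost:8501/v1/models/recommendations:predict"


def build_predict_body(profile_id: str) -> bytes:
    # Row format: one instance per user; "user_id" is a hypothetical input name.
    return json.dumps({"instances": [{"user_id": profile_id}]}).encode("utf-8")


def parse_predictions(body: bytes, strategy: str) -> list[dict]:
    # With named outputs, each prediction is a dict keyed by output name (assumed "ids"/"scores").
    prediction = json.loads(body)["predictions"][0]
    return [
        {"content_id": cid, "score": float(score), "strategy": strategy}
        for cid, score in zip(prediction["ids"], prediction["scores"])
    ]


def predict(profile_id: str, strategy: str = "ml-model") -> list[dict]:
    request = urllib.request.Request(
        TF_SERVING_URL,
        data=build_predict_body(profile_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_predictions(response.read(), strategy)
```

Keeping request building and response parsing separate from the HTTP call makes the mapping testable without a running TF Serving instance.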

Two-Tower Model

The TFRS model uses a two-tower architecture:

User tower — maps user IDs to embedding vectors based on interaction patterns
Content tower — maps content IDs plus features (content type, language, categories) to embedding vectors
Scoring — the dot product of user and content embeddings predicts relevance
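A toy version of the two towers makes the scoring rule concrete. The embeddings below are hand-written illustrations (a trained model learns them), and combining the content ID vector with feature vectors by summation is one simple choice made for this sketch; the actual TFRS model may concatenate features and project them through dense layers instead.

```python
DIM = 3
ZERO = [0.0] * DIM

# Hypothetical, hand-written embeddings for illustration only.
USER_EMBEDDINGS = {"u1": [0.9, 0.1, 0.0]}
CONTENT_ID_EMBEDDINGS = {"c1": [0.8, 0.0, 0.1], "c2": [0.0, 0.7, 0.2]}
FEATURE_EMBEDDINGS = {
    "type:article": [0.1, 0.0, 0.0],
    "type:video": [0.0, 0.1, 0.0],
    "lang:en": [0.0, 0.0, 0.1],
}


def user_tower(user_id: str) -> list[float]:
    return USER_EMBEDDINGS.get(user_id, ZERO)


def content_tower(content_id: str, features: list[str]) -> list[float]:
    # Unknown IDs start from zeros, so metadata features alone still yield a vector.
    vec = list(CONTENT_ID_EMBEDDINGS.get(content_id, ZERO))
    for feature in features:
        for i, x in enumerate(FEATURE_EMBEDDINGS.get(feature, ZERO)):
            vec[i] += x
    return vec


def relevance(user_id: str, content_id: str, features: list[str]) -> float:
    # Scoring: dot product of the user and content embeddings.
    u, c = user_tower(user_id), content_tower(content_id, features)
    return sum(a * b for a, b in zip(u, c))
```

An item ID the model has never seen (for example `c3` with only `["type:article"]`) still gets a non-zero score from its feature embeddings, which is the property the cold-start behaviour relies on.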

The model is trained on implicit feedback from Bosca's analytics events (impressions, interactions, completions) loaded directly from Trino.
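Implicit feedback has no explicit ratings, so event rows are typically collapsed into weighted user/content pairs before training. The sketch below assumes rows of `(user_id, content_id, event_type)` as a Trino query might return them; the event-type names and the weights are hypothetical, chosen only so that completions count more than interactions, which count more than impressions.

```python
from collections import defaultdict

# Hypothetical weights; Bosca's trainer may weight (or not weight) events differently.
EVENT_WEIGHTS = {"impression": 0.1, "interaction": 1.0, "completion": 2.0}


def to_training_examples(rows):
    """Collapse (user_id, content_id, event_type) rows into (user, content, weight) pairs."""
    weights: dict[tuple[str, str], float] = defaultdict(float)
    for user_id, content_id, event_type in rows:
        # Unrecognised event types contribute nothing.
        weights[(user_id, content_id)] += EVENT_WEIGHTS.get(event_type, 0.0)
    return [(user, content, w) for (user, content), w in weights.items() if w > 0]
```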

Training Pipeline

Training is triggered by Bosca's TrainModelJob, which calls the trainer service's HTTP endpoint. The trainer:

Queries Trino for interaction data and content features
Trains the two-tower retrieval model
Builds a ScaNN index for fast approximate nearest neighbor search
Exports the SavedModel and uploads it to Bosca's content storage
TF Serving detects the new version and hot-reloads it

Model artifacts are stored as Bosca metadata content — versioned, access-controlled, and backed up with everything else.
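TF Serving discovers new versions by watching its model base path for numbered subdirectories and, under its default version policy, serves the highest number. A loader sidecar therefore only needs to place each downloaded SavedModel in the next numbered directory. The sketch assumes a local filesystem layout with hypothetical paths and omits the download from Bosca; it copies into a non-numeric staging directory first and then renames it, so TF Serving never sees a half-written model.

```python
import shutil
from pathlib import Path


def next_version_dir(model_base: Path) -> Path:
    """Return <model_base>/<N+1>, where N is the highest existing numeric version (0 if none)."""
    versions = [int(p.name) for p in model_base.iterdir() if p.is_dir() and p.name.isdigit()]
    return model_base / str(max(versions, default=0) + 1)


def install_model(saved_model: Path, model_base: Path) -> Path:
    model_base.mkdir(parents=True, exist_ok=True)
    target = next_version_dir(model_base)
    # Non-numeric name, so TF Serving ignores it while the copy is in progress.
    staging = model_base / f".staging-{target.name}"
    if staging.exists():
        shutil.rmtree(staging)  # leftover from an interrupted run
    shutil.copytree(saved_model, staging)
    staging.rename(target)  # atomic when both paths are on the same filesystem
    return target
```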

Cold Start

Users and content with little or no interaction history (the cold-start case) are handled as follows:

New users get trending and curated content, plus segment-based recommendations if they belong to any segments
New content is immediately discoverable via Meilisearch similarity (embeddings are generated at index time) and appears in trending feeds once it accumulates interactions
ML models use content features (type, language, categories) alongside IDs, so items with zero interaction history still receive predictions based on their metadata
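From the user side, the list above amounts to a fallback rule. The function below is a hypothetical illustration of which strategy types can contribute results for a given user; it is not a Bosca API, and in practice strategies that have nothing for a user may simply return no rows.

```python
def strategy_types_with_results(has_history: bool, segment_ids: set[str]) -> list[str]:
    """Hypothetical: which strategy types can produce results for this user."""
    # Trending and curated content need no profile, so they always apply.
    types = ["TRENDING", "CURATED"]
    if segment_ids:
        types.append("SEGMENT_BASED")
    if has_history:
        # These rely on the user's own interactions (or a learned user embedding).
        types += ["CONTENT_BASED", "COLLABORATIVE", "ML_MODEL"]
    return types
```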

User Engagement Tracking

User interactions with recommended content (views, clicks, completions) are tracked through Bosca's existing analytics event pipeline — not a separate tracking system. When a client displays a recommendation, it should include attribution context in the analytics event's extras field so that strategy effectiveness can be measured:

{
  "type": "Interaction",
  "element": {
    "type": "recommendation",
    "content": [{"id": "content-uuid", "type": "metadata"}],
    "extras": {
      "recommendation_strategy": "trending-24h",
      "recommendation_position": 3
    }
  }
}

This data feeds back into strategy evaluation on the next cycle, creating a continuous improvement loop.
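With attribution in `extras`, measuring a strategy reduces to grouping events by `recommendation_strategy`. The sketch computes a click-through rate per strategy; it assumes impressions arrive as events with `"type": "Impression"` carrying the same extras, which is a hypothetical event name (only the `Interaction` shape appears above).

```python
from collections import Counter


def strategy_ctr(events: list[dict]) -> dict[str, float]:
    """Attributed interactions divided by attributed impressions, per strategy."""
    impressions: Counter = Counter()
    interactions: Counter = Counter()
    for event in events:
        extras = event.get("element", {}).get("extras", {})
        strategy = extras.get("recommendation_strategy")
        if strategy is None:
            continue  # event not attributed to a recommendation
        if event["type"] == "Impression":  # assumed event type name
            impressions[strategy] += 1
        elif event["type"] == "Interaction":
            interactions[strategy] += 1
    return {s: interactions[s] / n for s, n in impressions.items()}
```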

Dismissals are the one exception — they are stored as a persistent user preference (not an analytics event) because they need to be:

Queried at serving time with low latency
Revocable (users can undo a dismissal)
Filtered in real-time, not on a batch schedule
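Those three requirements map onto a per-profile set: membership checks are cheap at serving time, undo is a delete, and filtering happens on each request rather than in a batch job. The class below is a hypothetical in-memory stand-in; Bosca persists dismissals, which this sketch omits.

```python
class DismissalStore:
    """Hypothetical in-memory stand-in for persisted per-profile dismissals."""

    def __init__(self) -> None:
        self._dismissed: dict[str, set[str]] = {}

    def dismiss(self, profile_id: str, metadata_id: str) -> None:
        self._dismissed.setdefault(profile_id, set()).add(metadata_id)

    def undismiss(self, profile_id: str, metadata_id: str) -> None:
        # discard() is a no-op when the item was never dismissed.
        self._dismissed.get(profile_id, set()).discard(metadata_id)

    def filter(self, profile_id: str, content_ids: list[str]) -> list[str]:
        """Drop dismissed items at request time, preserving the incoming order."""
        dismissed = self._dismissed.get(profile_id, set())
        return [c for c in content_ids if c not in dismissed]
```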

GraphQL API

Querying Recommendations

# Personalized feed for a profile
query {
  recommendation {
    profile(profileId: "...", offset: 0, limit: 10) {
      metadata { id name contentType }
      collection { id name }
      score
      reason
      strategy { name type }
    }
  }
}

# Recommendations for a specific UI placement
query {
  recommendation {
    placement(profileId: "...", placementSlug: "home_feed", limit: 5) {
      metadata { id name }
      score
    }
  }
}

# Similar content (real-time vector search)
query {
  recommendation {
    similar(metadataId: "...", limit: 5) {
      metadata { id name }
      score
    }
  }
}

# Trending (works without authentication)
query {
  recommendation {
    trending(offset: 0, limit: 10) {
      metadata { id name }
      score
    }
  }
}
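A client can send these queries as a standard GraphQL-over-HTTP POST with a JSON `{"query": ...}` body. The sketch below fetches a placement; the endpoint URL is a hypothetical placeholder and authentication is omitted. Arguments are inlined with `json.dumps`, whose quoted, escaped output is also a valid GraphQL string literal; this avoids guessing the schema's variable types.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your Bosca server's GraphQL URL and add auth headers.
GRAPHQL_URL = "http://localhost:8080/graphql"


def placement_query(profile_id: str, slug: str, limit: int = 5) -> str:
    # json.dumps quotes and escapes the strings so they are safe GraphQL literals.
    return f"""
    query {{
      recommendation {{
        placement(profileId: {json.dumps(profile_id)}, placementSlug: {json.dumps(slug)}, limit: {int(limit)}) {{
          metadata {{ id name }}
          score
        }}
      }}
    }}
    """


def fetch_placement(profile_id: str, slug: str, limit: int = 5) -> list[dict]:
    body = json.dumps({"query": placement_query(profile_id, slug, limit)}).encode("utf-8")
    request = urllib.request.Request(
        GRAPHQL_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.loads(response.read())
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["recommendation"]["placement"]
```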

Managing Dismissals

mutation {
  recommendation {
    dismiss(profileId: "...", metadataId: "...")
    undismiss(profileId: "...", metadataId: "...")
  }
}

Administration

Strategies and placements are managed through admin-only fields nested under recommendation:

# Create a trending strategy with hourly refresh
mutation {
  recommendation {
    strategies {
      add(
        strategy: {
          name: "Trending Content"
          type: TRENDING
          status: ACTIVE
          analyticsQueryId: "..."
          evaluationSchedule: "0 * * * *"
          priority: 5
          maxRecommendations: 20
        }
        segmentIds: []
      ) { id name evaluationSchedule }
    }
  }
}
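`evaluationSchedule` takes a cron expression; `"0 * * * *"` fires at minute 0 of every hour. Assuming Bosca's scheduler uses the standard five-field syntax (minute, hour, day of month, month, day of week), the sketch below checks when an expression fires. It supports only `*`, `*/step`, and comma-separated numbers, and it ANDs all fields, ignoring cron's special OR rule when both day fields are restricted.

```python
from datetime import datetime

# Lowest legal value for minute, hour, day-of-month, month, day-of-week.
_FIELD_MINIMUMS = (0, 0, 1, 1, 0)


def _field_matches(spec: str, value: int, minimum: int) -> bool:
    """Match one field; supports '*', '*/step', and comma-separated numbers only."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the field's minimum (e.g. day-of-month */2 = 1, 3, 5, ...).
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if a five-field cron expression fires at `when` (minute precision)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    cron_weekday = (when.weekday() + 1) % 7  # cron: Sunday=0; Python: Monday=0
    values = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(
        _field_matches(spec, value, lo)
        for spec, value, lo in zip(fields, values, _FIELD_MINIMUMS)
    )
```

Under these assumptions, `"0 */6 * * *"` matches the 6-hour cadence mentioned for segment-based strategies and `"0 0 * * *"` a daily ML refresh.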

# Create a placement that blends multiple strategies
mutation {
  recommendation {
    placements {
      add(
        placement: {
          name: "Home Feed"
          slug: "home_feed"
          maxItems: 10
        }
        strategyIds: ["strategy-1", "strategy-2"]
      ) { id slug }
    }
  }
}

# Browse strategies and placements
query {
  recommendation {
    strategies {
      all(offset: 0, limit: 10) { id name type status }
    }
    placements {
      all { id name slug maxItems }
    }
  }
}

Default Setup

On first startup, Bosca's package installer seeds:

Trending Content analytics query (interaction velocity over 7 days with recency weighting)
Popular Content analytics query (interaction counts over 14 days)
Collaborative Filtering analytics query (co-occurrence with IDF weighting over 30 days)
A Trending strategy with hourly evaluation
A Popular strategy with 6-hour evaluation
A Collaborative Filtering strategy with daily evaluation
A home_feed placement linking all strategies
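The seeded trending query is described as interaction velocity over 7 days with recency weighting. One way to realise that idea is to give each interaction inside the window an exponentially decaying weight and sum per item. The Python below is a sketch of that idea, not the seeded Trino SQL; the 24-hour half-life is a hypothetical parameter.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)
HALF_LIFE_HOURS = 24.0  # hypothetical; the seeded query's weighting may differ


def trending_scores(interactions, now: datetime) -> dict[str, float]:
    """interactions: iterable of (content_id, timestamp). Recent events weigh more."""
    scores: dict[str, float] = defaultdict(float)
    for content_id, ts in interactions:
        age = now - ts
        if age < timedelta(0) or age > WINDOW:
            continue  # outside the 7-day window (or timestamped in the future)
        age_hours = age.total_seconds() / 3600
        scores[content_id] += 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return dict(scores)
```

Three interactions from the last hour outscore five from six days ago, which is the "velocity" behaviour: recent bursts rise quickly, and older activity fades before it leaves the window entirely.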

These defaults provide working recommendations out of the box as soon as analytics data starts flowing.

For developers

Related modules:

Core interfaces: backend/framework/core-recommendations
Implementation: backend/framework/recommendations
ML pipeline: ml/recommendation-trainer
GraphQL schema: backend/framework/recommendations/src/main/resources/graphql/recommendations.graphqls

Related documentation:

Analytics: Event tracking and Trino queries
Segmentation: Audience segments for targeting
Search: Meilisearch and vector similarity
Profiles: User profiles and attributes
Scheduler: Cron-based job scheduling
