Recommendations
Bosca includes a recommendation engine that surfaces the right content for each user. It combines analytics-driven strategies, semantic similarity, audience segmentation, and machine learning to generate personalized suggestions for both metadata items and collections.
What you get
- Multiple strategy types — trending, segment-based, content similarity, curated lists, collaborative filtering, and ML-powered predictions
- Placement system — define named display slots in your UI and control which strategies feed each one
- Segment targeting — strategies can target specific audience segments or apply to all users
- Scheduled evaluation — strategies re-evaluate automatically on cron schedules to keep recommendations fresh
- Dismissals — users can dismiss content they don't want to see, and undo dismissals later
- Diversity controls — category caps and freshness boosting prevent filter bubbles
- TensorFlow Recommenders — train two-tower retrieval models on your interaction data and serve predictions via TF Serving
- Works for anonymous users — trending and curated strategies serve content without requiring a profile
Strategy Types
Recommendations are generated by strategies, each using a different approach to match content with users.
| Type | How it works | Data source |
|---|---|---|
| TRENDING | Ranks content by recent interaction velocity across all users | Trino analytics queries over Iceberg events |
| SEGMENT_BASED | Surfaces content popular among users in the same audience segments | Trino queries filtered by segment membership |
| CONTENT_BASED | Finds content similar to what a user has interacted with, using categories, labels, and vector embeddings | Meilisearch hybrid search with semantic similarity |
| CURATED | Admin-managed content lists for editorial picks or featured content | Manual selection, no analytics query needed |
| COLLABORATIVE | Recommends content that similar users engaged with but the target user hasn't seen | Trino co-occurrence query with IDF weighting (seeded by default installer) |
| ML_MODEL | Uses a trained TensorFlow Recommenders model to predict personalized content rankings | TF Serving REST API backed by a TFRS two-tower model |
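To make the COLLABORATIVE row more concrete, here is a minimal Python sketch of co-occurrence scoring with IDF weighting. Bosca runs this as a Trino query; the function name, the in-memory data shape, and the exact weighting formula (`log(users / users_who_touched_item)`) are assumptions chosen for illustration, not the seeded query.

```python
import math
from collections import defaultdict


def collaborative_scores(interactions, target_user):
    """Rank items the target user has not seen by IDF-weighted co-occurrence.

    interactions: dict mapping user id -> set of content ids (hypothetical shape).
    """
    seen = interactions.get(target_user, set())
    n_users = len(interactions)

    # Count how many users interacted with each item.
    item_users = defaultdict(int)
    for items in interactions.values():
        for item in items:
            item_users[item] += 1

    scores = defaultdict(float)
    for user, items in interactions.items():
        if user == target_user:
            continue
        shared = seen & items
        if not shared:
            continue
        # Overlap on rarely touched items counts for more than overlap on popular ones.
        similarity = sum(math.log(n_users / item_users[i]) for i in shared)
        for item in items - seen:
            scores[item] += similarity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Items that nearly every user has touched contribute a weight close to zero, which keeps broadly popular content from dominating the similarity signal.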
Strategy Lifecycle
| Status | Description |
|---|---|
| DRAFT | Being configured, not yet generating recommendations |
| ACTIVE | Live and producing recommendations on its evaluation schedule |
| PAUSED | Temporarily suspended |
| ARCHIVED | Retired and no longer evaluated |
Placements
A placement is a named location in your application where recommendations are displayed — for example, home_feed, article_sidebar, or post_read_next. Each placement links to one or more strategies in priority order and defines a maximum number of items.
When a client requests recommendations for a placement, the system:
- Resolves the strategies linked to that placement
- Filters to strategies applicable to the requesting user's segments
- Queries pre-computed recommendations from each strategy
- Runs the results through the assembler pipeline
- Returns a ranked list
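The resolve, filter, and query steps above can be sketched in Python as follows. The `Strategy` class, the `precomputed` lookup, and the convention that an empty segment set means "all users" are hypothetical stand-ins for Bosca's internal types; the strategy list is assumed to arrive already in placement priority order.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    # Empty set is assumed to mean "applies to all users".
    segment_ids: frozenset = frozenset()


def gather_candidates(strategies, user_segments, precomputed):
    """Collect pre-computed recommendations from every applicable strategy."""
    candidates = []
    for strategy in strategies:  # assumed to be in priority order already
        targets_everyone = not strategy.segment_ids
        shares_segment = bool(strategy.segment_ids & user_segments)
        if targets_everyone or shares_segment:
            candidates.extend(precomputed.get(strategy.name, []))
    return candidates
```

The returned list would then be handed to the assembler, which is responsible for ranking and trimming.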
Recommendation Assembly
Raw recommendations from multiple strategies pass through an assembly pipeline before being served:
- Dismissed filtering — content the user has dismissed is removed
- Deduplication — when multiple strategies recommend the same content, only the highest-scoring entry is kept
- Freshness boost — scores are adjusted by a time-decay factor so recently generated recommendations surface above stale ones
- Category diversity cap — no single category dominates the result set (configurable, default 3 items per category)
- Final ranking — sorted by adjusted score, top N returned
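A minimal Python sketch of these five stages is shown below. The `Candidate` fields, the exponential half-life form of the freshness boost, and the 24-hour half-life are assumptions made for illustration; only the stage order and the default cap of 3 items per category come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content_id: str
    category: str
    score: float
    age_hours: float  # hours since the recommendation was generated


def assemble(candidates, dismissed, limit, per_category=3, half_life_hours=24.0):
    # Dismissed filtering
    kept = [c for c in candidates if c.content_id not in dismissed]

    # Deduplication: keep the highest-scoring entry per content id
    best = {}
    for c in kept:
        current = best.get(c.content_id)
        if current is None or c.score > current.score:
            best[c.content_id] = c

    # Freshness boost: halve the score every half_life_hours (assumed decay form)
    def adjusted(c):
        return c.score * 0.5 ** (c.age_hours / half_life_hours)

    ranked = sorted(best.values(), key=adjusted, reverse=True)

    # Category diversity cap, then final top-N cut
    result, per_cat = [], {}
    for c in ranked:
        if len(result) >= limit:
            break
        if per_cat.get(c.category, 0) >= per_category:
            continue
        per_cat[c.category] = per_cat.get(c.category, 0) + 1
        result.append(c)
    return result
```

Applying the cap after sorting means a capped category keeps its best-scoring items rather than an arbitrary three.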
How It Stays Fresh
Recommendations update automatically through scheduled evaluation:
- Each strategy can have an evaluation schedule (cron expression) that triggers periodic re-evaluation
- Trending strategies typically refresh hourly, segment-based every 6 hours, ML models daily
- The scheduler runs the strategy's analytics query against the latest Iceberg event data and upserts fresh scores
- Expired recommendations are cleaned up automatically by a periodic job
- Content-based similarity is always real-time — it queries Meilisearch on demand, so new content is discoverable immediately after indexing
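To illustrate how a cron expression maps to the cadences above, here is a small Python sketch of a matcher. It is not Bosca's scheduler: it supports only `*`, `*/n`, and single integers, and it requires every field to match. Under those assumptions, `0 * * * *` fires hourly, `0 */6 * * *` every 6 hours, and `0 3 * * *` once a day (the last two expressions are illustrative choices).

```python
from datetime import datetime


def field_matches(expr, value, start=0):
    """Match one cron field. Supports '*', '*/n', and a single integer only."""
    if expr == "*":
        return True
    if expr.startswith("*/"):
        # Steps count from the start of the field's range (0 or 1).
        return (value - start) % int(expr[2:]) == 0
    return value == int(expr)


def is_due(cron, when):
    """Return True if the five-field cron expression matches the given minute."""
    minute, hour, day, month, weekday = cron.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day, start=1)
        and field_matches(month, when.month, start=1)
        and field_matches(weekday, cron_weekday)
    )
```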
Machine Learning with TensorFlow Recommenders
For platforms with sufficient interaction data, Bosca supports ML-powered recommendations using TensorFlow Recommenders (TFRS).
Architecture
Components and their responsibilities:
- recommendation-trainer (Python): connects to Trino, loads interaction history, trains the two-tower model, and uploads the model to Bosca storage
- tf-serving (Google): loads the trained SavedModel, serves predictions via REST, and hot-reloads new model versions
- recommendation-model-loader (sidecar): polls Bosca for new models and downloads them to TF Serving
- bosca-server (Kotlin): evaluates ML_MODEL strategies by calling TF Serving and mapping predictions to recommendations
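As an illustration of the call from bosca-server to TF Serving, the Python sketch below builds a request for TF Serving's REST predict endpoint (`/v1/models/<name>:predict` with an `instances` array). The host, the model name `recommendations`, and the `user_id` input key are assumptions; the real values depend on how the model is deployed and on its serving signature.

```python
import json
import urllib.request


def build_predict_request(user_id, host="http://localhost:8501", model="recommendations"):
    """Build (but do not send) a TF Serving REST predict request.

    The model name and the "user_id" input key are hypothetical.
    """
    url = f"{host}/v1/models/{model}:predict"
    body = json.dumps({"instances": [{"user_id": user_id}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` returns a JSON document whose `predictions` array holds one entry per instance, which the caller would then map to recommendation rows.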
Two-Tower Model
The TFRS model uses a two-tower architecture:
- User tower — maps user IDs to embedding vectors based on interaction patterns
- Content tower — maps content IDs plus features (content type, language, categories) to embedding vectors
- Scoring — the dot product of user and content embeddings predicts relevance
The model is trained on implicit feedback from Bosca's analytics events (impressions, interactions, completions) loaded directly from Trino.
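The scoring step can be illustrated without TensorFlow. The Python sketch below ranks content by the dot product of a user embedding and each content embedding; the function names and the toy two-dimensional vectors are illustrative only, and a trained model would produce the embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_content(user_embedding, content_embeddings, top_k=3):
    """Return the top_k (content_id, score) pairs by dot-product relevance."""
    scored = [
        (content_id, dot(user_embedding, embedding))
        for content_id, embedding in content_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: the user vector points along the first dimension.
user = [1.0, 0.0]
content = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
top = rank_content(user, content, top_k=2)
```

Here `top` contains `a` first and `c` second, because their embeddings align most closely with the user vector.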
Training Pipeline
Training is triggered by Bosca's TrainModelJob, which calls the trainer service's HTTP endpoint. The pipeline then runs in order:
- The trainer queries Trino for interaction data and content features
- It trains the two-tower retrieval model
- It builds a ScaNN index for fast approximate nearest neighbor search
- It exports the SavedModel and uploads it to Bosca's content storage
- TF Serving detects the new version and hot-reloads it
Model artifacts are stored as Bosca metadata content — versioned, access-controlled, and backed up with everything else.
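TF Serving picks up new versions by watching a model base directory for numbered subdirectories and, by default, serving the highest number. The Python sketch below shows one way a loader could choose the next version directory; how Bosca's sidecar actually assigns version numbers is not described here, so treat the helper and its name as hypothetical.

```python
from pathlib import Path


def next_version_dir(model_base):
    """Return the path of the next numeric version directory under model_base."""
    base = Path(model_base)
    versions = []
    if base.exists():
        # Only purely numeric directory names count as model versions.
        versions = [
            int(p.name) for p in base.iterdir() if p.is_dir() and p.name.isdigit()
        ]
    return base / str(max(versions, default=0) + 1)
```

A loader would download the SavedModel into the returned directory, after which TF Serving can discover it on its next poll of the base path.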
Cold Start
Each cold-start case has a fallback:
- New users get trending and curated content, plus segment-based recommendations if they belong to any segments
- New content is immediately discoverable via Meilisearch similarity (embeddings are generated at index time) and appears in trending feeds once it accumulates interactions
- ML models use content features (type, language, categories) alongside IDs, so items with zero interaction history still receive predictions based on their metadata
User Engagement Tracking
User interactions with recommended content (views, clicks, completions) are tracked through Bosca's existing analytics event pipeline — not a separate tracking system. When a client displays a recommendation, it should include attribution context in the analytics event's extras field so that strategy effectiveness can be measured:
{
  "type": "Interaction",
  "element": {
    "type": "recommendation",
    "content": [{"id": "content-uuid", "type": "metadata"}],
    "extras": {
      "recommendation_strategy": "trending-24h",
      "recommendation_position": 3
    }
  }
}
This data feeds back into strategy evaluation on the next cycle, creating a continuous improvement loop.
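A client could build the attributed event with a small helper like the Python sketch below. The function name is hypothetical; the payload shape mirrors the JSON example above, and how the event is submitted to the analytics pipeline is outside the scope of this sketch.

```python
def recommendation_event(content_id, strategy, position, content_type="metadata"):
    """Build an Interaction event carrying recommendation attribution in extras."""
    return {
        "type": "Interaction",
        "element": {
            "type": "recommendation",
            "content": [{"id": content_id, "type": content_type}],
            "extras": {
                # Strategy that produced the item, and its position in the list.
                "recommendation_strategy": strategy,
                "recommendation_position": position,
            },
        },
    }
```

For example, `recommendation_event("content-uuid", "trending-24h", 3)` produces the same structure as the JSON example.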
Dismissals are the one exception — they are stored as a persistent user preference (not an analytics event) because they need to be:
- Queried at serving time with low latency
- Revocable (users can undo a dismissal)
- Filtered in real-time, not on a batch schedule
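The three requirements above suggest a keyed lookup rather than an event log. The Python sketch below models that idea with an in-memory store; the class and its methods are hypothetical, and Bosca persists dismissals as user preferences rather than in process memory.

```python
class DismissalStore:
    """In-memory stand-in for persistent per-profile dismissals."""

    def __init__(self):
        self._dismissed = {}  # profile id -> set of dismissed content ids

    def dismiss(self, profile_id, content_id):
        self._dismissed.setdefault(profile_id, set()).add(content_id)

    def undismiss(self, profile_id, content_id):
        # discard() is a no-op if the item was never dismissed.
        self._dismissed.get(profile_id, set()).discard(content_id)

    def filter(self, profile_id, content_ids):
        """Drop dismissed items at serving time, preserving order."""
        hidden = self._dismissed.get(profile_id, set())
        return [c for c in content_ids if c not in hidden]
```

Because filtering is a set lookup at request time, an undo takes effect on the very next request without waiting for a strategy to re-evaluate.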
GraphQL API
Querying Recommendations
# Personalized feed for a profile
query {
  recommendation {
    profile(profileId: "...", offset: 0, limit: 10) {
      metadata { id name contentType }
      collection { id name }
      score
      reason
      strategy { name type }
    }
  }
}
# Recommendations for a specific UI placement
query {
  recommendation {
    placement(profileId: "...", placementSlug: "home_feed", limit: 5) {
      metadata { id name }
      score
    }
  }
}
# Similar content (real-time vector search)
query {
  recommendation {
    similar(metadataId: "...", limit: 5) {
      metadata { id name }
      score
    }
  }
}
# Trending (works without authentication)
query {
  recommendation {
    trending(offset: 0, limit: 10) {
      metadata { id name }
      score
    }
  }
}
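From a script or backend service, the trending query can be sent as a standard GraphQL-over-HTTP POST. The Python sketch below only builds the request; the endpoint URL is a placeholder assumption, so substitute the address of your Bosca deployment.

```python
import json
import urllib.request

TRENDING_QUERY = """
query {
  recommendation {
    trending(offset: 0, limit: 10) {
      metadata { id name }
      score
    }
  }
}
"""


def build_graphql_request(query, endpoint="http://localhost:8000/graphql"):
    """Build (but do not send) a GraphQL POST request. The endpoint is hypothetical."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` returns a JSON response with the results under its `data` key. Queries that take a `profileId` will typically also need whatever authentication your deployment requires.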
Managing Dismissals
# Dismiss an item for a profile; undismiss reverses an earlier dismissal
mutation {
  recommendation {
    dismiss(profileId: "...", metadataId: "...")
    undismiss(profileId: "...", metadataId: "...")
  }
}
Administration
Strategies and placements are managed through admin-only fields nested under recommendation:
# Create a trending strategy with hourly refresh
mutation {
  recommendation {
    strategies {
      add(
        strategy: {
          name: "Trending Content"
          type: TRENDING
          status: ACTIVE
          analyticsQueryId: "..."
          evaluationSchedule: "0 * * * *"
          priority: 5
          maxRecommendations: 20
        }
        segmentIds: []
      ) { id name evaluationSchedule }
    }
  }
}
# Create a placement that blends multiple strategies
mutation {
  recommendation {
    placements {
      add(
        placement: {
          name: "Home Feed"
          slug: "home_feed"
          maxItems: 10
        }
        strategyIds: ["strategy-1", "strategy-2"]
      ) { id slug }
    }
  }
}
# Browse strategies and placements
query {
  recommendation {
    strategies {
      all(offset: 0, limit: 10) { id name type status }
    }
    placements {
      all { id name slug maxItems }
    }
  }
}
Default Setup
On first startup, Bosca's package installer seeds:
- Trending Content analytics query (interaction velocity over 7 days with recency weighting)
- Popular Content analytics query (interaction counts over 14 days)
- Collaborative Filtering analytics query (co-occurrence with IDF weighting over 30 days)
- A Trending strategy with hourly evaluation
- A Popular strategy with 6-hour evaluation
- A Collaborative Filtering strategy with daily evaluation
- A home_feed placement linking all strategies
These defaults provide working recommendations out of the box as soon as analytics data starts flowing.
For developers
Related modules:
- Core interfaces: backend/framework/core-recommendations
- Implementation: backend/framework/recommendations
- ML pipeline: ml/recommendation-trainer
- GraphQL schema: backend/framework/recommendations/src/main/resources/graphql/recommendations.graphqls
Related:
- Analytics: Event tracking and Trino queries
- Segmentation: Audience segments for targeting
- Search: Meilisearch and vector similarity
- Profiles: User profiles and attributes
- Scheduler: Cron-based job scheduling