Analytics
Bosca Analytics helps you see what resonates with your audience so you can make smarter decisions, faster.
What you get
- Clear insight into content performance and engagement
- First-party data collection you control
- Signals that power personalization and AI enrichment
- Dashboards your teams can actually use
- SQL-based data exploration through Trino
Where your data comes from
- Lightweight browser and app SDKs capture key events (page views, actions, completions)
- Server-side events fill in the gaps where needed
- The Analytics SDK is distributed via NPM for easy integration
Auto-Instrumentation
The Analytics SDK includes automatic instrumentation that tracks user behavior with zero extra setup:
- Click tracking — clicks, touch events, and rage-click detection
- Scroll depth — milestone-based tracking (25%, 50%, 75%, 90%, 100%)
- Page views — full page loads and SPA navigations via History API hooks
- Form interactions — submissions, field focus/blur timing, and abandonment detection
- Mouse movement — sampling-based position tracking
- Element visibility — IntersectionObserver-based impression tracking with configurable dwell time
This data is suitable for building heat maps and understanding user behavior patterns across your content.
Data pipeline
Events flow through a two-stage pipeline: the Analytics Collector ingests and enriches events, then the Analytics Processor applies dynamic transforms and writes them to long-term storage.
Event data structures
Every batch sent to the collector is an Events payload containing:
- EventContext — app ID, app version, session ID, optional user ID, plus nested structures for:
  - Device — installation ID, manufacturer, model, platform, locale, timezone, and OS details
  - Browser — user agent string
  - Geo — city, country, continent, region, coordinates, postal code, and timezone (populated by the collector via Cloudflare headers)
- Event — one or more events, each with:
  - type — one of session, interaction, impression, completion, installation, or error
  - created / created_micros — client-side creation timestamp
  - Element — the UI element involved, with an ID, type, optional content references, and a freeform extras JSON map
  - ErrorInfo — message, type, stack trace, fatal flag, and error code (for error events)
- sent / sent_micros — when the client dispatched the batch
- received / received_micros — stamped by the collector on arrival
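For orientation, here is a minimal Kotlin sketch of that shape. The class and field names are inferred from the description above, not copied from the code; the actual definitions live in backend/framework/analytics-models and may differ.

```kotlin
import kotlinx.serialization.Serializable

// Sketch of the batch payload described above. Field names are inferred,
// not taken from analytics-models.
@Serializable
data class Events(
    val context: EventContext,
    val events: List<Event>,
    val sent: String,              // client dispatch time (with sentMicros)
    val sentMicros: Long,
    val received: String? = null,  // stamped by the collector on arrival
    val receivedMicros: Long? = null,
)

@Serializable
data class EventContext(
    val appId: String,
    val appVersion: String,
    val sessionId: String,
    val userId: String? = null,
    val device: Device? = null,
    val browser: Browser? = null,
    val geo: Geo? = null,          // populated server-side from Cloudflare headers
)

@Serializable
data class Device(
    val installationId: String, val manufacturer: String? = null, val model: String? = null,
    val platform: String? = null, val locale: String? = null, val timezone: String? = null,
    val osVersion: String? = null,
)

@Serializable
data class Browser(val userAgent: String)

@Serializable
data class Geo(
    val city: String? = null, val country: String? = null, val continent: String? = null,
    val region: String? = null, val latitude: Double? = null, val longitude: Double? = null,
    val postalCode: String? = null, val timezone: String? = null,
)

@Serializable
data class Event(
    val type: String,              // session, interaction, impression, completion, installation, error
    val created: String,
    val createdMicros: Long,
    val element: Element? = null,
    val error: ErrorInfo? = null,  // only for error events
)

@Serializable
data class Element(
    val id: String,
    val type: String,
    val content: List<String> = emptyList(),      // optional content references
    val extras: Map<String, String> = emptyMap(), // freeform extras map
)

@Serializable
data class ErrorInfo(
    val message: String,
    val type: String? = null,
    val stackTrace: String? = null,               // capped at 8,192 characters
    val fatal: Boolean = false,
    val code: String? = null,
)
```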
Collector
The Analytics Collector (backend/servers/analytics-collector) is a lightweight Ktor/Netty service that accepts event batches over HTTP.
Endpoints:
- POST /api/v1/events (or legacy POST /events) — submit an event batch
- POST /api/v1/installation — register a new device installation and receive a ULID-based installation ID
- GET /api/v1/events/flush — admin-only, forces buffered events to storage
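A hedged Ktor routing sketch of that surface is shown below. The service and handler names (EventService, InstallationService, submit, flush) are illustrative stand-ins, not the collector's real components, and the Events type is the data-class sketch from the previous section.

```kotlin
import io.ktor.http.*
import io.ktor.server.application.*
import io.ktor.server.request.*
import io.ktor.server.response.*
import io.ktor.server.routing.*

// Illustrative stand-ins for the collector's real components.
interface EventService {
    suspend fun submit(batch: Events)   // stamp received timestamp, run transforms, store
    suspend fun flush()
}
interface InstallationService {
    suspend fun register(): String      // returns a ULID-based installation id
}

// Assumes ContentNegotiation with JSON is installed on the application.
fun Route.analyticsRoutes(events: EventService, installations: InstallationService) {
    post("/api/v1/events") {
        events.submit(call.receive<Events>())
        call.respond(HttpStatusCode.Accepted)
    }
    post("/events") {                    // legacy path, same behavior
        events.submit(call.receive<Events>())
        call.respond(HttpStatusCode.Accepted)
    }
    post("/api/v1/installation") {
        call.respond(mapOf("installationId" to installations.register()))
    }
    get("/api/v1/events/flush") {        // admin-only in the real service
        events.flush()
        call.respond(HttpStatusCode.OK)
    }
}
```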
When a batch arrives the collector stamps it with a received timestamp and passes it through its transform chain:
- Geo-enrichment — reads Cloudflare headers (cf-ipcity, cf-ipcountry, cf-ipcontinent, etc.) and populates the Geo structure on the event context.
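A rough sketch of that enrichment step, reusing the data classes sketched earlier; the function itself is hypothetical (the real transform class lives in the collector), and the header lookup is passed in as a plain function so the sketch stays framework-agnostic.

```kotlin
// Hypothetical enrichment step. `header` is any lookup over the incoming
// request headers; the cf-* names are Cloudflare's visitor-location headers.
fun enrichGeo(batch: Events, header: (String) -> String?): Events {
    val geo = Geo(
        city = header("cf-ipcity"),
        country = header("cf-ipcountry"),
        continent = header("cf-ipcontinent"),
        region = header("cf-region"),
        latitude = header("cf-iplatitude")?.toDoubleOrNull(),
        longitude = header("cf-iplongitude")?.toDoubleOrNull(),
        postalCode = header("cf-postal-code"),
        timezone = header("cf-timezone"),
    )
    return batch.copy(context = batch.context.copy(geo = geo))
}
```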
After transforms complete, events are handed to the configured EventRepository:
- NATS mode (eventRepository.type: nats) — publishes each event as a JSON message to the analytics.events JetStream stream for downstream processing by the Analytics Processor. This is the default for native-image deployments.
- Iceberg mode (eventRepository.type: iceberg) — writes events directly to an Apache Iceberg table backed by S3-compatible object storage. This is the default when the type is unset.
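Conceptually the collector picks its sink from that one setting. A rough sketch, with interface and class names that are illustrative rather than the real ones:

```kotlin
// Illustrative only: the real repositories are wired from application.yaml.
interface EventRepository {
    suspend fun store(batch: Events)
}

class NatsEventRepository : EventRepository {
    override suspend fun store(batch: Events) { /* publish JSON to analytics.events */ }
}

class IcebergEventRepository : EventRepository {
    override suspend fun store(batch: Events) { /* append rows to the Iceberg table */ }
}

fun selectRepository(type: String?): EventRepository = when (type) {
    "nats" -> NatsEventRepository()
    "iceberg", null -> IcebergEventRepository()   // iceberg is used when type is unset
    else -> error("Unknown eventRepository.type: $type")
}
```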
The collector is built with GraalVM native image support, producing a fast-starting, low-memory binary ideal for edge deployment. Because native images cannot run dynamic scripting, script-based transforms are deferred to the processor.
Processor
The Analytics Processor (backend/servers/analytics-processor) is a JVM service that consumes events from the NATS JetStream stream and applies the dynamic transforms that require a full runtime.
The processor's NatsEventConsumer pulls messages in configurable batches, dispatches them to a pool of worker coroutines, and runs each batch through the processor transform chain:
- Script transforms — for each event type (session, interaction, impression, etc.), a ScriptTransformPipelineTransform invokes the matching script hook (e.g., analytics.transform.session). Scripts can modify the event batch inline before it continues through the pipeline.
- Script triggers — for each event type, a ScriptTriggerPipelineTransform fires the matching notification hook (e.g., analytics.notify.session). These execute asynchronously via the job queue and cannot modify events — they are fire-and-forget side effects.
After transforms complete, the processor writes events to the Iceberg event repository, which converts them to Apache Parquet files and commits them to the Iceberg table.
Each message is acknowledged only after the full pipeline succeeds. On failure the message is NAK'd so NATS can redeliver it.
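A stripped-down sketch of that consume loop using the jnats client and coroutines; processBatch stands in for the transform chain and Iceberg write, and the real NatsEventConsumer carries more error handling and configuration.

```kotlin
import io.nats.client.Message
import io.nats.client.Nats
import io.nats.client.PullSubscribeOptions
import java.time.Duration
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

// Sketch only: processBatch() stands in for the transform chain plus the Iceberg write.
suspend fun consumeEvents(natsUrl: String, processBatch: suspend (ByteArray) -> Unit) = coroutineScope {
    val connection = Nats.connect(natsUrl)
    val jetStream = connection.jetStream()
    val subscription = jetStream.subscribe(
        "analytics.events",
        PullSubscribeOptions.builder().durable("analytics-processor").build(),
    )

    // Bounded channel gives backpressure (eventProcessing.channelCapacity).
    val channel = Channel<Message>(capacity = 5000)

    // Worker pool draining the channel (eventProcessing.workerCount).
    repeat(16) {
        launch(Dispatchers.Default) {
            for (message in channel) {
                try {
                    processBatch(message.data)   // transforms, triggers, Iceberg write
                    message.ack()                // ack only after the full pipeline succeeds
                } catch (e: Exception) {
                    message.nak()                // NAK so JetStream redelivers
                }
            }
        }
    }

    // Fetch loop (eventProcessing.natsBatchSize / natsFetchTimeoutSeconds).
    while (isActive) {
        val batch = withContext(Dispatchers.IO) {
            subscription.fetch(100, Duration.ofSeconds(5))
        }
        batch.forEach { channel.send(it) }
    }
}
```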
Data flow summary
Client SDK
│ POST /api/v1/events (JSON batch)
▼
Analytics Collector (native image or JVM)
│ 1. Stamp received timestamp
│ 2. Geo-enrichment (Cloudflare headers)
│ 3. Publish to NATS JetStream ──or── Write directly to Iceberg
▼
NATS JetStream (analytics.events stream)
│ WorkQueue retention, durable consumer
▼
Analytics Processor (JVM)
│ 1. Script transforms (inline, per event type)
│ 2. Script triggers (fire-and-forget, per event type)
│ 3. Write to Iceberg (Parquet → S3)
▼
Apache Iceberg table
│ Queryable via Trino SQL
▼
Dashboards, AI agents, reports
Processing configuration
Both the collector and processor expose tuning knobs via application.yaml:
| Setting | Default | Description |
|---|---|---|
| eventProcessing.workerCount | 16 | Coroutine workers draining the processing channel |
| eventProcessing.channelCapacity | 5000 | Bounded channel capacity for backpressure |
| eventProcessing.natsBatchSize | 100 | Messages pulled per NATS fetch call (processor) |
| eventProcessing.natsFetchTimeoutSeconds | 5 | Timeout per NATS fetch (processor) |
| eventRepository.type | iceberg | nats to forward events, iceberg to write directly |
Iceberg schema
Events are stored in an Iceberg table with the following top-level columns:
| Column | Type | Description |
|---|---|---|
| id | UUID | Unique event identifier (generated server-side) |
| client_id | string | Client-provided identifier |
| type | string | Event type (session, interaction, impression, etc.) |
| sent / sent_micros | timestamp + long | When the client dispatched the batch |
| received / received_micros | timestamp + long | When the collector received the batch |
| created / created_micros | timestamp + long | When the event was created on the client |
| context | struct | Nested EventContext (app, device, browser, geo, session) |
| element | struct | Nested Element (id, type, content refs, extras JSON) |
| error | struct | Nested ErrorInfo (message, type, stack trace, fatal, code) |
Native image support
The Analytics Collector supports GraalVM native image compilation for fast startup and low memory footprint. The build uses the graalvm Gradle plugin with configurations in src/main/resources/META-INF/native-image/:
- native-image.properties — build-time initialization for logging and XML, run-time initialization for Netty SSL/compression/epoll, monitoring support
- reflect-config.json — reflection metadata for serialization classes
- reachability-metadata.json — reachability overrides for native compilation
- resource-config.json — resource inclusion patterns
Because GraalVM native images cannot execute dynamic scripts, the collector is designed to handle only static transforms (geo-enrichment) and forward events to NATS. The processor runs on a standard JVM where scripting is fully supported.
Querying with Trino
Bosca integrates Trino as a distributed SQL query engine for analytics workloads. Trino connects to the Iceberg catalog and S3 storage, enabling standard SQL queries over your event data.
AI agents can also query analytics data through Trino, allowing natural language data exploration via the MCP server.
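As an example, a per-country interaction count over the last week might look like the following Kotlin/JDBC sketch. The connection URL, credentials, and the events table and schema names are placeholders to adjust for your deployment; the column references follow the Iceberg schema above.

```kotlin
import java.sql.DriverManager

fun main() {
    // Placeholder connection details: adjust host, catalog, and schema for your Trino setup.
    val url = "jdbc:trino://trino.example.com:8080/iceberg/analytics"
    DriverManager.getConnection(url, "analyst", null).use { conn ->
        val sql = """
            SELECT context.geo.country AS country, count(*) AS interactions
            FROM events
            WHERE type = 'interaction'
              AND received > current_timestamp - INTERVAL '7' DAY
            GROUP BY 1
            ORDER BY 2 DESC
            LIMIT 20
        """.trimIndent()
        conn.createStatement().executeQuery(sql).use { rs ->
            while (rs.next()) {
                println("${rs.getString("country")}: ${rs.getLong("interactions")}")
            }
        }
    }
}
```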
Dashboards
Bosca includes configurable dashboards that surface important metrics directly in the Administration UI, which is also where dashboard configurations are managed.
Visualization Types
Dashboards support a range of visualization types:
- Number — single metric display
- Label — text-based display
- Bar, Line, Pie, Doughnut — standard chart types
- Bubble, Scatter — distribution and correlation charts
- Table — tabular data display
- Date Picker — interactive date filtering
- TopoJSON Map — geographic data visualization
Parameterized Queries
Analytics queries support typed parameters (string, integer, float, boolean, date, time, datetime) that can be shared across visualizations within a dashboard. This allows dashboard-level filters to control multiple charts at once.
Access Control
Dashboards integrate with the platform's permission model, so you can control which groups have access to specific dashboards.
Script transforms and triggers
Analytics events can be processed by user-defined scripts at two stages in the pipeline:
Script transforms (inline)
Script transforms modify events as they flow through the pipeline. Each event type has a dedicated hook name following the pattern analytics.transform.<type> (e.g., analytics.transform.session, analytics.transform.interaction). The script receives the full event batch wrapped in an AnalyticsScriptEvent and returns a modified version. If the scripting module is unavailable, the transform is a no-op.
Script triggers (fire-and-forget)
Script triggers fire asynchronously after transforms complete. They follow the naming pattern analytics.notify.<type> (e.g., analytics.notify.impression, analytics.notify.completion). Triggers are dispatched via the job queue and cannot modify the event data — they are intended for side effects such as sending notifications, updating external systems, or triggering workflows.
Both script transforms and triggers are applied in the Analytics Processor, not the collector, because they require a full JVM runtime.
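A rough sketch of how the per-type hook names are resolved and applied; the ScriptRunner and JobQueue interfaces here are illustrative stand-ins for the platform's scripting and job-queue modules, and the Events type is the data-class sketch from earlier.

```kotlin
// Illustrative stand-ins for ScriptTransformPipelineTransform and
// ScriptTriggerPipelineTransform in the analytics framework.
interface ScriptRunner {
    // Runs a script hook inline and returns the (possibly modified) batch.
    suspend fun transform(hook: String, batch: Events): Events
}

interface JobQueue {
    // Enqueues a fire-and-forget notification job.
    suspend fun enqueue(hook: String, batch: Events)
}

suspend fun applyScriptStages(batch: Events, scripts: ScriptRunner?, jobs: JobQueue): Events {
    var current = batch
    for (type in current.events.map { it.type }.distinct()) {
        // Inline transform: analytics.transform.<type>; no-op when scripting is unavailable.
        current = scripts?.transform("analytics.transform.$type", current) ?: current
        // Async trigger: analytics.notify.<type>; cannot modify the batch.
        jobs.enqueue("analytics.notify.$type", current)
    }
    return current
}
```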
Privacy and ownership
- You own your data — fully first-party
- Choose what to collect and how long to keep it
- The client SDK never captures form field values, only identifiers and timing
- Stack traces are capped at 8,192 characters; element text is truncated to 100 characters
- Use alongside tools like Google Analytics without duplication
For developers
Related modules:
- Analytics collector: backend/servers/analytics-collector
- Analytics processor: backend/servers/analytics-processor
- Analytics models: backend/framework/analytics-models
- Analytics framework: backend/framework/analytics
- Analytics AI: backend/framework/analytics-ai
- Core analytics: backend/framework/core-analytics
- Web SDK: web/analytics
Related:
- Architecture: Analytics overview
- AI: SQL agents and data exploration
- Experimentation: Feature flags & A/B testing — conversion goals query the events table to compute per-variation metrics
- Recommendations: Analytics-powered content recommendations