Analytics

Turn first-party event data into practical insights and personalized experiences.

Bosca Analytics helps you see what resonates with your audience so you can make smarter decisions, faster.

What you get

  • Clear insight into content performance and engagement
  • First-party data collection you control
  • Signals that power personalization and AI enrichment
  • Dashboards your teams can actually use
  • SQL-based data exploration through Trino

Where your data comes from

  • Lightweight browser and app SDKs capture key events (page views, actions, completions)
  • Server-side events fill in the gaps where needed
  • The Analytics SDK is distributed via NPM for easy integration

Auto-Instrumentation

The Analytics SDK includes automatic instrumentation that tracks user behavior with zero extra setup:

  • Click tracking — clicks, touch events, and rage-click detection
  • Scroll depth — milestone-based tracking (25%, 50%, 75%, 90%, 100%)
  • Page views — full page loads and SPA navigations via History API hooks
  • Form interactions — submissions, field focus/blur timing, and abandonment detection
  • Mouse movement — sampling-based position tracking
  • Element visibility — IntersectionObserver-based impression tracking with configurable dwell time

This data is suitable for building heat maps and understanding user behavior patterns across your content.
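As a sketch of how milestone-based scroll tracking works, the following computes which milestones a scroll position has newly crossed; the function and variable names are illustrative, not the SDK's actual API:

```typescript
// Milestone-based scroll-depth tracking, as described above.
// Names (MILESTONES, crossedMilestones) are illustrative only.
const MILESTONES = [25, 50, 75, 90, 100];

/**
 * Given the current scroll position, return the milestones that have been
 * newly crossed, recording them so each milestone fires only once per page.
 */
function crossedMilestones(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number,
  alreadyFired: Set<number>
): number[] {
  const maxScrollable = Math.max(pageHeight - viewportHeight, 1);
  const depthPercent = Math.min((scrollTop / maxScrollable) * 100, 100);
  const crossed = MILESTONES.filter(
    (m) => depthPercent >= m && !alreadyFired.has(m)
  );
  crossed.forEach((m) => alreadyFired.add(m));
  return crossed;
}
```

In a browser this would be driven by a (throttled) scroll listener, emitting one analytics event per newly crossed milestone.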

Data pipeline

Events flow through a two-stage pipeline: the Analytics Collector ingests and enriches events, then the Analytics Processor applies dynamic transforms and writes them to long-term storage.

Event data structures

Every batch sent to the collector is an Events payload containing:

  • EventContext — app ID, app version, session ID, optional user ID, plus nested structures for:
    • Device — installation ID, manufacturer, model, platform, locale, timezone, and OS details
    • Browser — user agent string
    • Geo — city, country, continent, region, coordinates, postal code, and timezone (populated by the collector via Cloudflare headers)
  • Event — one or more events, each with:
    • type — one of session, interaction, impression, completion, installation, or error
    • created / created_micros — client-side creation timestamp
    • Element — the UI element involved, with an ID, type, optional content references, and a freeform extras JSON map
    • ErrorInfo — message, type, stack trace, fatal flag, and error code (for error events)
  • sent / sent_micros — when the client dispatched the batch
  • received / received_micros — stamped by the collector on arrival
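The payload above can be sketched as TypeScript types. The fields below follow the structures described in this section; the exact wire-format names (casing, nesting) are assumptions:

```typescript
// Illustrative shapes for an Events batch; field names follow the
// descriptions above and may differ from the actual wire format.
interface Geo {
  city?: string;
  country?: string;
  continent?: string;
  region?: string;
  latitude?: number;
  longitude?: number;
  postalCode?: string;
  timezone?: string;
}

interface EventContext {
  appId: string;
  appVersion: string;
  sessionId: string;
  userId?: string;
  device?: { installationId: string; manufacturer?: string; model?: string; platform?: string; locale?: string; timezone?: string; os?: string };
  browser?: { userAgent: string };
  geo?: Geo; // populated by the collector, not the client
}

interface AnalyticsEvent {
  type: "session" | "interaction" | "impression" | "completion" | "installation" | "error";
  created: string;       // client-side creation timestamp
  createdMicros: number;
  element?: { id: string; type: string; content?: string[]; extras?: Record<string, unknown> };
  error?: { message: string; type?: string; stack?: string; fatal?: boolean; code?: string };
}

interface EventsPayload {
  context: EventContext;
  events: AnalyticsEvent[];
  sent: string;          // when the client dispatched the batch
  sentMicros: number;
  // received / receivedMicros are stamped by the collector on arrival
}
```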

Collector

The Analytics Collector (backend/servers/analytics-collector) is a lightweight Ktor/Netty service that accepts event batches over HTTP.

Endpoints:

  • POST /api/v1/events (or legacy POST /events) — submit an event batch
  • POST /api/v1/installation — register a new device installation and receive a ULID-based installation ID
  • GET /api/v1/events/flush — admin-only, forces buffered events to storage
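A minimal client-side sketch of submitting a batch, assuming a fetch-capable runtime. Only the /api/v1/events path comes from the endpoint list above; the helper name and payload are illustrative:

```typescript
// Build a POST request for the collector's batch endpoint.
// buildEventsRequest is an illustrative helper, not the SDK's API.
function buildEventsRequest(
  baseUrl: string,
  batch: unknown
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/api/v1/events`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(batch),
    },
  };
}

// Usage, assuming a collector at collector.example.com:
// const { url, init } = buildEventsRequest("https://collector.example.com", batch);
// await fetch(url, init);
```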

When a batch arrives, the collector stamps it with a received timestamp and passes it through its transform chain:

  1. Geo-enrichment — reads Cloudflare headers (cf-ipcity, cf-ipcountry, cf-ipcontinent, etc.) and populates the Geo structure on the event context.
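The geo-enrichment step can be sketched as a pure header-to-struct mapping. The cf-ipcity, cf-ipcountry, and cf-ipcontinent header names come from the text above; the function name and the minimal Geo subset are assumptions:

```typescript
// Map Cloudflare request headers onto a (minimal) Geo structure.
interface Geo {
  city?: string;
  country?: string;
  continent?: string;
}

function geoFromHeaders(headers: Record<string, string>): Geo {
  return {
    city: headers["cf-ipcity"],
    country: headers["cf-ipcountry"],
    continent: headers["cf-ipcontinent"],
  };
}
```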

After transforms complete, events are handed to the configured EventRepository:

  • NATS mode (eventRepository.type: nats) — publishes each event as a JSON message to the analytics.events JetStream stream for downstream processing by the Analytics Processor. This is the default for native-image deployments.
  • Iceberg mode (eventRepository.type: iceberg) — writes events directly to an Apache Iceberg table backed by S3-compatible object storage. This is the default when the type is unset.
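For example, a collector that forwards events to the processor would select NATS mode in its application.yaml. The nesting below is an assumption; only the eventRepository.type key and its values come from this document:

```yaml
# Assumed application.yaml nesting for the eventRepository.type setting.
# "nats" forwards events to JetStream; "iceberg" writes directly to the table.
eventRepository:
  type: nats
```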

The collector is built with GraalVM native image support, producing a fast-starting, low-memory binary ideal for edge deployment. Because native images cannot run dynamic scripting, script-based transforms are deferred to the processor.

Processor

The Analytics Processor (backend/servers/analytics-processor) is a JVM service that consumes events from the NATS JetStream stream and applies the dynamic transforms that require a full runtime.

The processor's NatsEventConsumer pulls messages in configurable batches, dispatches them to a pool of worker coroutines, and runs each batch through the processor transform chain:

  1. Script transforms — for each event type (session, interaction, impression, etc.), a ScriptTransformPipelineTransform invokes the matching script hook (e.g., analytics.transform.session). Scripts can modify the event batch inline before it continues through the pipeline.
  2. Script triggers — for each event type, a ScriptTriggerPipelineTransform fires the matching notification hook (e.g., analytics.notify.session). These execute asynchronously via the job queue and cannot modify events — they are fire-and-forget side effects.

After transforms complete, the processor writes events to the Iceberg event repository, which converts them to Apache Parquet files and commits them to the Iceberg table.

Each message is acknowledged only after the full pipeline succeeds. On failure the message is NAK'd so NATS can redeliver it.
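The ack/nak contract can be sketched as follows; the message shape and function names are illustrative, not the actual NatsEventConsumer API:

```typescript
// Run each message through the full pipeline; ack on success,
// NAK on failure so NATS redelivers it. Illustrative shapes only.
interface Msg {
  data: string;
  acked: boolean;
  nakked: boolean;
}

function processBatch(messages: Msg[], pipeline: (data: string) => void): void {
  for (const msg of messages) {
    try {
      pipeline(msg.data); // script transforms, triggers, then Iceberg write
      msg.acked = true;   // acknowledge only after the full pipeline succeeds
    } catch {
      msg.nakked = true;  // NAK so NATS can redeliver
    }
  }
}
```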

Data flow summary

Client SDK
  │  POST /api/v1/events (JSON batch)
  ▼
Analytics Collector (native image or JVM)
  │  1. Stamp received timestamp
  │  2. Geo-enrichment (Cloudflare headers)
  │  3. Publish to NATS JetStream  ──or──  Write directly to Iceberg
  ▼
NATS JetStream (analytics.events stream)
  │  WorkQueue retention, durable consumer
  ▼
Analytics Processor (JVM)
  │  1. Script transforms (inline, per event type)
  │  2. Script triggers (fire-and-forget, per event type)
  │  3. Write to Iceberg (Parquet → S3)
  ▼
Apache Iceberg table
  │  Queryable via Trino SQL
  ▼
Dashboards, AI agents, reports

Processing configuration

Both the collector and processor expose tuning knobs via application.yaml:

Setting                                   | Default | Description
------------------------------------------|---------|---------------------------------------------------
eventProcessing.workerCount               | 16      | Coroutine workers draining the processing channel
eventProcessing.channelCapacity           | 5000    | Bounded channel capacity for backpressure
eventProcessing.natsBatchSize             | 100     | Messages pulled per NATS fetch call (processor)
eventProcessing.natsFetchTimeoutSeconds   | 5       | Timeout per NATS fetch (processor)
eventRepository.type                      | iceberg | nats to forward events, iceberg to write directly

Iceberg schema

Events are stored in an Iceberg table with the following top-level columns:

Column                     | Type             | Description
---------------------------|------------------|----------------------------------------------------------
id                         | UUID             | Unique event identifier (generated server-side)
client_id                  | string           | Client-provided identifier
type                       | string           | Event type (session, interaction, impression, etc.)
sent / sent_micros         | timestamp + long | When the client dispatched the batch
received / received_micros | timestamp + long | When the collector received the batch
created / created_micros   | timestamp + long | When the event was created on the client
context                    | struct           | Nested EventContext (app, device, browser, geo, session)
element                    | struct           | Nested Element (id, type, content refs, extras JSON)
error                      | struct           | Nested ErrorInfo (message, type, stack trace, fatal, code)

Native image support

The Analytics Collector supports GraalVM native image compilation for fast startup and low memory footprint. The build uses the graalvm Gradle plugin with configurations in src/main/resources/META-INF/native-image/:

  • native-image.properties — build-time initialization for logging and XML, run-time initialization for Netty SSL/compression/epoll, monitoring support
  • reflect-config.json — reflection metadata for serialization classes
  • reachability-metadata.json — reachability overrides for native compilation
  • resource-config.json — resource inclusion patterns

Because GraalVM native images cannot execute dynamic scripts, the collector is designed to handle only static transforms (geo-enrichment) and forward events to NATS. The processor runs on a standard JVM where scripting is fully supported.

Querying with Trino

Bosca integrates Trino as a distributed SQL query engine for analytics workloads. Trino connects to the Iceberg catalog and S3 storage, enabling standard SQL queries over your event data.

AI agents can also query analytics data through Trino, allowing natural language data exploration via the MCP server.
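For example, a daily event count by type could be expressed in standard Trino SQL. The iceberg.analytics.events catalog, schema, and table names below are assumptions; only the column names come from the Iceberg schema described in this document:

```sql
-- Daily event counts by type over the events table.
SELECT
  date_trunc('day', received) AS day,
  type,
  count(*) AS events
FROM iceberg.analytics.events
GROUP BY 1, 2
ORDER BY day DESC, events DESC
```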

Dashboards

Bosca includes configurable dashboards that surface key metrics directly in the Administration UI, where dashboard configurations are also created and managed.

Visualization Types

Dashboards support a range of visualization types:

  • Number — single metric display
  • Label — text-based display
  • Bar, Line, Pie, Doughnut — standard chart types
  • Bubble, Scatter — distribution and correlation charts
  • Table — tabular data display
  • Date Picker — interactive date filtering
  • TopoJSON Map — geographic data visualization

Parameterized Queries

Analytics queries support typed parameters (string, integer, float, boolean, date, time, datetime) that can be shared across visualizations within a dashboard. This allows dashboard-level filters to control multiple charts at once.

Access Control

Dashboards integrate with the platform's permission model, so you can control which groups have access to specific dashboards.

Script transforms and triggers

Analytics events can be processed by user-defined scripts at two stages in the pipeline:

Script transforms (inline)

Script transforms modify events as they flow through the pipeline. Each event type has a dedicated hook name following the pattern analytics.transform.<type> (e.g., analytics.transform.session, analytics.transform.interaction). The script receives the full event batch wrapped in an AnalyticsScriptEvent and returns a modified version. If the scripting module is unavailable, the transform is a no-op.
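A transform hook might look like the following sketch; the register helper and the event shape are assumptions, and only the analytics.transform.session hook name comes from this document:

```typescript
// Illustrative transform hook: receives an event batch and returns a
// modified version. register() and the event shape are assumptions.
type ScriptEvent = { events: Array<{ type: string; extras?: Record<string, unknown> }> };

const hooks = new Map<string, (e: ScriptEvent) => ScriptEvent>();

function register(hook: string, fn: (e: ScriptEvent) => ScriptEvent): void {
  hooks.set(hook, fn);
}

register("analytics.transform.session", (batch) => {
  // Example inline modification: tag every event in the batch.
  for (const ev of batch.events) {
    ev.extras = { ...ev.extras, enriched: true };
  }
  return batch;
});
```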

Script triggers (fire-and-forget)

Script triggers fire asynchronously after transforms complete. They follow the naming pattern analytics.notify.<type> (e.g., analytics.notify.impression, analytics.notify.completion). Triggers are dispatched via the job queue and cannot modify the event data — they are intended for side effects such as sending notifications, updating external systems, or triggering workflows.

Both script transforms and triggers are applied in the Analytics Processor, not the collector, because they require a full JVM runtime.

Privacy and ownership

  • You own your data — fully first-party
  • Choose what to collect and how long to keep it
  • The client SDK never captures form field values, only identifiers and timing
  • Stack traces are capped at 8,192 characters; element text is truncated to 100 characters
  • Use alongside tools like Google Analytics without duplication

For developers

Related modules:

  • Analytics collector: backend/servers/analytics-collector
  • Analytics processor: backend/servers/analytics-processor
  • Analytics models: backend/framework/analytics-models
  • Analytics framework: backend/framework/analytics
  • Analytics AI: backend/framework/analytics-ai
  • Core analytics: backend/framework/core-analytics
  • Web SDK: web/analytics

Related: