Analytics

Turn first-party event data into practical insights and personalized experiences.

Bosca Analytics helps you see what resonates with your audience so you can make smarter decisions, faster.

What you get

  • Clear insight into content performance and engagement
  • First-party data collection you control
  • Signals that power personalization and AI enrichment
  • Dashboards your teams can actually use
  • SQL-based data exploration through Trino

Where your data comes from

  • Lightweight browser and app SDKs capture key events (page views, actions, completions)
  • Server-side events fill in the gaps where needed
  • The Analytics SDK is distributed via NPM for easy integration

Auto-Instrumentation

The Analytics SDK includes automatic instrumentation that tracks user behavior with zero extra setup:

  • Click tracking — clicks, touch events, and rage-click detection
  • Scroll depth — milestone-based tracking (25%, 50%, 75%, 90%, 100%)
  • Page views — full page loads and SPA navigations via History API hooks
  • Form interactions — submissions, field focus/blur timing, and abandonment detection
  • Mouse movement — sampling-based position tracking
  • Element visibility — IntersectionObserver-based impression tracking with configurable dwell time

This data is suitable for building heat maps and understanding user behavior patterns across your content.
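As a sketch of how milestone-based scroll tracking works, the following computes which milestones a scroll position has newly crossed; the function and variable names are illustrative, not the SDK's actual API:

```typescript
// Milestone-based scroll-depth tracking, as described above.
// Names (MILESTONES, crossedMilestones) are illustrative only.
const MILESTONES = [25, 50, 75, 90, 100];

/**
 * Given the current scroll position, return the milestones that have been
 * newly crossed, recording them so each milestone fires only once per page.
 */
function crossedMilestones(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number,
  alreadyFired: Set<number>
): number[] {
  const maxScrollable = Math.max(pageHeight - viewportHeight, 1);
  const depthPercent = Math.min((scrollTop / maxScrollable) * 100, 100);
  const crossed = MILESTONES.filter(
    (m) => depthPercent >= m && !alreadyFired.has(m)
  );
  crossed.forEach((m) => alreadyFired.add(m));
  return crossed;
}
```

In a browser this would be driven by a (throttled) scroll listener, emitting one analytics event per newly crossed milestone.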

Data pipeline

Events flow through a two-stage pipeline: the Analytics Collector ingests and enriches events, then the Analytics Processor applies dynamic transforms and writes them to long-term storage.

Event data structures

Every batch sent to the collector is an Events payload containing:

  • EventContext — app ID, app version, session ID, optional user ID, plus nested structures for:
    • Device — installation ID, manufacturer, model, platform, locale, timezone, and OS details
    • Browser — user agent string
    • Geo — city, country, continent, region, coordinates, postal code, and timezone (populated by the collector via Cloudflare headers)
  • Event — one or more events, each with:
    • type — one of session, interaction, impression, completion, installation, or error
    • created / created_micros — client-side creation timestamp
    • Element — the UI element involved, with an ID, type, optional content references, and a freeform extras JSON map
    • ErrorInfo — message, type, stack trace, fatal flag, and error code (for error events)
  • sent / sent_micros — when the client dispatched the batch
  • received / received_micros — stamped by the collector on arrival
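The payload above can be sketched as TypeScript types. The fields below follow the structures described in this section; the exact wire-format names (casing, nesting) are assumptions:

```typescript
// Illustrative shapes for an Events batch; field names follow the
// descriptions above and may differ from the actual wire format.
interface Geo {
  city?: string;
  country?: string;
  continent?: string;
  region?: string;
  latitude?: number;
  longitude?: number;
  postalCode?: string;
  timezone?: string;
}

interface EventContext {
  appId: string;
  appVersion: string;
  sessionId: string;
  userId?: string;
  device?: { installationId: string; manufacturer?: string; model?: string; platform?: string; locale?: string; timezone?: string; os?: string };
  browser?: { userAgent: string };
  geo?: Geo; // populated by the collector, not the client
}

interface AnalyticsEvent {
  type: "session" | "interaction" | "impression" | "completion" | "installation" | "error";
  created: string;       // client-side creation timestamp
  createdMicros: number;
  element?: { id: string; type: string; content?: string[]; extras?: Record<string, unknown> };
  error?: { message: string; type?: string; stack?: string; fatal?: boolean; code?: string };
}

interface EventsPayload {
  context: EventContext;
  events: AnalyticsEvent[];
  sent: string;          // when the client dispatched the batch
  sentMicros: number;
  // received / receivedMicros are stamped by the collector on arrival
}
```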

Collector

The Analytics Collector (backend/servers/analytics-collector) is a lightweight Ktor/Netty service that accepts event batches over HTTP.

Endpoints:

  • POST /api/v1/events (or legacy POST /events) — submit an event batch
  • POST /api/v1/installation — register a new device installation and receive a ULID-based installation ID
  • GET /api/v1/events/flush — admin-only, forces buffered events to storage
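A minimal client-side sketch of submitting a batch, assuming a fetch-capable runtime. Only the /api/v1/events path comes from the endpoint list above; the helper name and payload are illustrative:

```typescript
// Build a POST request for the collector's batch endpoint.
// buildEventsRequest is an illustrative helper, not the SDK's API.
function buildEventsRequest(
  baseUrl: string,
  batch: unknown
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/api/v1/events`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(batch),
    },
  };
}

// Usage, assuming a collector at collector.example.com:
// const { url, init } = buildEventsRequest("https://collector.example.com", batch);
// await fetch(url, init);
```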

When a batch arrives, the collector stamps it with a received timestamp and passes it through its transform chain:

  1. Geo-enrichment — reads Cloudflare headers (cf-ipcity, cf-ipcountry, cf-ipcontinent, etc.) and populates the Geo structure on the event context.
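The geo-enrichment step can be sketched as a pure header-to-struct mapping. The cf-ipcity, cf-ipcountry, and cf-ipcontinent header names come from the text above; the function name and the minimal Geo subset are assumptions:

```typescript
// Map Cloudflare request headers onto a (minimal) Geo structure.
interface Geo {
  city?: string;
  country?: string;
  continent?: string;
}

function geoFromHeaders(headers: Record<string, string>): Geo {
  return {
    city: headers["cf-ipcity"],
    country: headers["cf-ipcountry"],
    continent: headers["cf-ipcontinent"],
  };
}
```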

After transforms complete, events are handed to the configured EventRepository:

  • NATS mode (eventRepository.type: nats) — publishes each event as a JSON message to the analytics.events JetStream stream for downstream processing by the Analytics Processor. This is the default for native-image deployments.
  • Iceberg mode (eventRepository.type: iceberg) — writes events directly to an Apache Iceberg table backed by S3-compatible object storage. This is the default when the type is unset.
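For example, a collector that forwards events to the processor would select NATS mode in its application.yaml. The nesting below is an assumption; only the eventRepository.type key and its values come from this document:

```yaml
# Assumed application.yaml nesting for the eventRepository.type setting.
# "nats" forwards events to JetStream; "iceberg" writes directly to the table.
eventRepository:
  type: nats
```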

The collector is built with GraalVM native image support, producing a fast-starting, low-memory binary ideal for edge deployment. Because native images cannot run dynamic scripting, script-based transforms are deferred to the processor.

Processor

The Analytics Processor (backend/servers/analytics-processor) is a JVM service that consumes events from the NATS JetStream stream and applies the dynamic transforms that require a full runtime.

The processor's NatsEventConsumer pulls messages in configurable batches, dispatches them to a pool of worker coroutines, and runs each batch through the processor transform chain:

  1. Script transforms — for each event type (session, interaction, impression, etc.), a ScriptTransformPipelineTransform invokes the matching script hook (e.g., analytics.transform.session). Scripts can modify the event batch inline before it continues through the pipeline.
  2. Script triggers — for each event type, a ScriptTriggerPipelineTransform fires the matching notification hook (e.g., analytics.notify.session). These execute asynchronously via the job queue and cannot modify events — they are fire-and-forget side effects.

After transforms complete, the processor writes events to the Iceberg event repository, which converts them to Apache Parquet files and commits them to the Iceberg table.

Each message is acknowledged only after the full pipeline succeeds. On failure the message is NAK'd so NATS can redeliver it.
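The ack/nak contract can be sketched as follows; the message shape and function names are illustrative, not the actual NatsEventConsumer API:

```typescript
// Run each message through the full pipeline; ack on success,
// NAK on failure so NATS redelivers it. Illustrative shapes only.
interface Msg {
  data: string;
  acked: boolean;
  nakked: boolean;
}

function processBatch(messages: Msg[], pipeline: (data: string) => void): void {
  for (const msg of messages) {
    try {
      pipeline(msg.data); // script transforms, triggers, then Iceberg write
      msg.acked = true;   // acknowledge only after the full pipeline succeeds
    } catch {
      msg.nakked = true;  // NAK so NATS can redeliver
    }
  }
}
```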

Data flow summary

Client SDK
  │  POST /api/v1/events (JSON batch)
  ▼
Analytics Collector (native image or JVM)
  │  1. Stamp received timestamp
  │  2. Geo-enrichment (Cloudflare headers)
  │  3. Publish to NATS JetStream  ──or──  Write directly to Iceberg
  ▼
NATS JetStream (analytics.events stream)
  │  WorkQueue retention, durable consumer
  ▼
Analytics Processor (JVM)
  │  1. Script transforms (inline, per event type)
  │  2. Script triggers (fire-and-forget, per event type)
  │  3. Write to Iceberg (Parquet → S3)
  ▼
Apache Iceberg table
  │  Queryable via Trino SQL
  ▼
Dashboards, AI agents, reports

Processing configuration

Both the collector and processor expose tuning knobs via application.yaml:

Setting                                   | Default | Description
------------------------------------------|---------|---------------------------------------------------
eventProcessing.workerCount               | 16      | Coroutine workers draining the processing channel
eventProcessing.channelCapacity           | 5000    | Bounded channel capacity for backpressure
eventProcessing.natsBatchSize             | 100     | Messages pulled per NATS fetch call (processor)
eventProcessing.natsFetchTimeoutSeconds   | 5       | Timeout per NATS fetch (processor)
eventRepository.type                      | iceberg | nats to forward events, iceberg to write directly

Iceberg schema

Events are stored in an Iceberg table with the following top-level columns:

Column                     | Type             | Description
---------------------------|------------------|----------------------------------------------------------
id                         | UUID             | Unique event identifier (generated server-side)
client_id                  | string           | Client-provided identifier
type                       | string           | Event type (session, interaction, impression, etc.)
sent / sent_micros         | timestamp + long | When the client dispatched the batch
received / received_micros | timestamp + long | When the collector received the batch
created / created_micros   | timestamp + long | When the event was created on the client
context                    | struct           | Nested EventContext (app, device, browser, geo, session)
element                    | struct           | Nested Element (id, type, content refs, extras JSON)
error                      | struct           | Nested ErrorInfo (message, type, stack trace, fatal, code)

Native image support

The Analytics Collector supports GraalVM native image compilation for fast startup and low memory footprint. The build uses the graalvm Gradle plugin with configurations in src/main/resources/META-INF/native-image/:

  • native-image.properties — build-time initialization for logging and XML, run-time initialization for Netty SSL/compression/epoll, monitoring support
  • reflect-config.json — reflection metadata for serialization classes
  • reachability-metadata.json — reachability overrides for native compilation
  • resource-config.json — resource inclusion patterns

Because GraalVM native images cannot execute dynamic scripts, the collector is designed to handle only static transforms (geo-enrichment) and forward events to NATS. The processor runs on a standard JVM where scripting is fully supported.

Querying with Trino

Bosca integrates Trino as a distributed SQL query engine for analytics workloads. Trino connects to the Iceberg catalog and S3 storage, enabling standard SQL queries over your event data.

AI agents can also query analytics data through Trino, allowing natural language data exploration via the MCP server.
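For example, a daily event count by type could be expressed in standard Trino SQL. The iceberg.analytics.events catalog, schema, and table names below are assumptions; only the column names come from the Iceberg schema described in this document:

```sql
-- Daily event counts by type over the events table.
SELECT
  date_trunc('day', received) AS day,
  type,
  count(*) AS events
FROM iceberg.analytics.events
GROUP BY 1, 2
ORDER BY day DESC, events DESC
```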

Dashboards

Bosca includes configurable dashboards that surface key metrics directly in the Administration UI, where dashboard configurations are also created and managed.

Visualization Types

Dashboards support a range of visualization types:

  • Number — single metric display
  • Label — text-based display
  • Bar, Line, Pie, Doughnut — standard chart types
  • Bubble, Scatter — distribution and correlation charts
  • Table — tabular data display
  • Date Picker — interactive date filtering
  • TopoJSON Map — geographic data visualization

Parameterized Queries

Analytics queries support typed parameters (string, integer, float, boolean, date, time, datetime) that can be shared across visualizations within a dashboard. This allows dashboard-level filters to control multiple charts at once.

Access Control

Dashboards integrate with the platform's permission model, so you can control which groups have access to specific dashboards.

Script transforms and triggers

Analytics events can be processed by user-defined scripts at two stages in the pipeline:

Script transforms (inline)

Script transforms modify events as they flow through the pipeline. Each event type has a dedicated hook name following the pattern analytics.transform.<type> (e.g., analytics.transform.session, analytics.transform.interaction). The script receives the full event batch wrapped in an AnalyticsScriptEvent and returns a modified version. If the scripting module is unavailable, the transform is a no-op.
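A transform hook might look like the following sketch; the register helper and the event shape are assumptions, and only the analytics.transform.session hook name comes from this document:

```typescript
// Illustrative transform hook: receives an event batch and returns a
// modified version. register() and the event shape are assumptions.
type ScriptEvent = { events: Array<{ type: string; extras?: Record<string, unknown> }> };

const hooks = new Map<string, (e: ScriptEvent) => ScriptEvent>();

function register(hook: string, fn: (e: ScriptEvent) => ScriptEvent): void {
  hooks.set(hook, fn);
}

register("analytics.transform.session", (batch) => {
  // Example inline modification: tag every event in the batch.
  for (const ev of batch.events) {
    ev.extras = { ...ev.extras, enriched: true };
  }
  return batch;
});
```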

Script triggers (fire-and-forget)

Script triggers fire asynchronously after transforms complete. They follow the naming pattern analytics.notify.<type> (e.g., analytics.notify.impression, analytics.notify.completion). Triggers are dispatched via the job queue and cannot modify the event data — they are intended for side effects such as sending notifications, updating external systems, or triggering workflows.

Both script transforms and triggers are applied in the Analytics Processor, not the collector, because they require a full JVM runtime.

Privacy and ownership

  • You own your data — fully first-party
  • Choose what to collect and how long to keep it
  • The client SDK never captures form field values, only identifiers and timing
  • Stack traces are capped at 8,192 characters; element text is truncated to 100 characters
  • Use alongside tools like Google Analytics without duplication

For developers

Related modules:

  • Analytics collector: backend/servers/analytics-collector
  • Analytics processor: backend/servers/analytics-processor
  • Analytics models: backend/framework/analytics-models
  • Analytics framework: backend/framework/analytics
  • Analytics AI: backend/framework/analytics-ai
  • Core analytics: backend/framework/core-analytics
  • Web SDK: web/analytics

Related: