Architecture

A small, focused set of components that keeps Bosca simple, reliable, and scalable.

Bosca uses a small, thoughtfully designed set of components. Fewer moving parts mean fewer surprises, easier operations, and predictable scaling.

You can start simple: many core functions run in a single server. As your needs grow, you can split responsibilities and scale components independently.

At a glance, this approach helps you:

Keep operations straightforward for small teams
Add capabilities gradually with modular growth
Balance performance, cost, and reliability

Component Organization

Bosca's components are grouped into key functional areas to maintain clarity and ensure effective modularization:

Object Storage
Structured Storage
Search
Caching
Workflows
AI/ML
Analytics
General Operations

These functional areas allow us to design and organize the system in a way that is both efficient and scalable. In the following sections, we will explore these components in greater depth and explain how they work together.

Ingress

Component Type: General Operations

Bosca is agnostic about the ingress method you choose for deployment, but we recommend nginx as a starting point.

Kubernetes Deployment: We use nginx ingress because we have run it at scale and found it suitable.
Docker Compose Deployments: All services are routed through nginx, which handles SSL termination and load balancing.

Analytics

Component Type: Analytics, AI/ML, Workflows

The analytics system uses a two-stage pipeline to capture, enrich, and store first-party events.

Analytics Collector

The Analytics Collector (:backend:servers:analytics-collector) is a lightweight Netty service that ingests event batches over HTTP. It stamps events with a received timestamp, enriches them with geo-location data from Cloudflare headers, and forwards them to the next stage.
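
As a rough illustration of the static enrichment step, the sketch below stamps each event in a batch with a received timestamp and copies geo data from Cloudflare request headers. It is not the collector's actual code (the collector is a Netty service on the JVM). The function name, the `received` and `geo` field names, and the choice of headers are assumptions made for this example. CF-IPCountry is a standard Cloudflare header; CF-IPCity is only sent when Cloudflare's visitor location headers are enabled.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional

# Assumed mapping from (lowercased) Cloudflare header names to event fields.
GEO_HEADERS = {
    "cf-ipcountry": "country",
    "cf-ipcity": "city",
}


def enrich_batch(
    events: list[dict[str, Any]],
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> list[dict[str, Any]]:
    """Return copies of the events with a received timestamp and geo data."""
    received = (now or datetime.now(timezone.utc)).isoformat()
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    geo = {
        field: lowered[header]
        for header, field in GEO_HEADERS.items()
        if header in lowered
    }
    return [{**event, "received": received, "geo": geo} for event in events]
```

Everything here is static: no user-supplied script runs during enrichment, which is the property that keeps this stage compatible with a native image build.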

The collector supports GraalVM native image compilation for fast startup and low memory usage, making it well-suited for edge deployments. Because native images cannot run dynamic scripts, the collector handles only static enrichment and delegates scripting to the processor.

When configured with eventRepository.type: nats (the default for native-image deployments), the collector publishes enriched events to a NATS JetStream stream. Alternatively, it can write directly to Iceberg when set to iceberg mode.

Analytics Processor

The Analytics Processor (:backend:servers:analytics-processor) is a JVM service that consumes events from the NATS JetStream stream. It runs the dynamic transform chain — script transforms that can modify events inline, and script triggers that fire asynchronous side effects — then writes the final events to an Apache Iceberg table backed by S3-compatible object storage.
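
The difference between transforms and triggers can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the processor's implementation: `process_event` and its parameters are hypothetical names, transforms are modeled as plain functions that return the modified event, and triggers are submitted to a thread pool to stand in for asynchronous side effects. Writing the result to Iceberg is left out.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

Event = dict[str, Any]
Transform = Callable[[Event], Event]  # runs inline, returns the modified event
Trigger = Callable[[Event], None]     # side effect only, result is ignored


def process_event(
    event: Event,
    transforms: list[Transform],
    triggers: list[Trigger],
    pool: ThreadPoolExecutor,
) -> tuple[Event, list[Future]]:
    """Apply transforms in order, then fire triggers without waiting on them."""
    for transform in transforms:
        # Pass a copy so a transform cannot mutate the caller's event.
        event = transform(dict(event))
    # Each trigger receives its own copy of the final event.
    pending = [pool.submit(trigger, dict(event)) for trigger in triggers]
    return event, pending
```

A caller would persist the returned event immediately and only inspect the pending futures if it needs to log trigger failures.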

The processor also runs a job queue for analytics-related background work, including scheduled SQL queries and AI-powered data exploration.

Storage

Events are persisted as Parquet files in an Iceberg table, making them suitable for both real-time insights and long-term batch processing. Trino queries the Iceberg catalog for dashboards, reports, and AI agent data exploration.

We still recommend using third-party analytics alongside Bosca for validation and redundancy. Running both lets you cross-check your numbers, use advanced capabilities from either system, and keep a safety net in case new privacy laws unexpectedly change how you can use third-party systems through platforms like the App Store or Play Store.

If you don't want to use Bosca's analytics system, you can bypass these components.

Bosca Server

Component Type: General Operations

The Bosca Server serves as the backbone of the Bosca platform, offering GraphQL interfaces to manage and interact with your content. It handles critical functions, including workflow state transitions, authentication, permissions, profiles, collections, metadata, supplementary content, documents, guides, AI agents, scripting, backup & restore, configuration, and more.

The server also includes an optional MCP server that allows AI clients like Claude to discover and query your GraphQL API.

Other Servers

Analytics Collector: event ingestion and geo-enrichment (:backend:servers:analytics-collector), supports GraalVM native image
Analytics Processor: script transforms, triggers, and Iceberg storage (:backend:servers:analytics-processor)

Job Runners

Component Type: General Operations, Workflows

Bosca job runners process background work such as indexing, transition validation, and content processing. Runners are part of the same server binary and can be enabled or isolated by configuration, allowing you to separate API traffic from background processing when needed.

Job runners support distributed locking for clustered environments, preventing duplicate execution of the same job across multiple instances.
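
The idea behind the locking can be sketched with a set-if-absent lock that expires after a time-to-live. The example below is an illustration only: `InMemoryLockStore` and `run_once` are hypothetical names, and the in-memory dictionary stands in for a shared backend such as Redis or NATS. It does not show how Bosca's runners are actually implemented.

```python
import threading
import time
from typing import Callable


class InMemoryLockStore:
    """Stand-in for a shared lock backend; one instance plays the cluster-wide store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._locks: dict[str, tuple[str, float]] = {}
        self._clock = clock
        self._mutex = threading.Lock()

    def acquire(self, key: str, owner: str, ttl_seconds: float) -> bool:
        """Take the lock if it is free or expired; otherwise report failure."""
        with self._mutex:
            now = self._clock()
            current = self._locks.get(key)
            if current is not None and current[1] > now:
                return False
            self._locks[key] = (owner, now + ttl_seconds)
            return True

    def release(self, key: str, owner: str) -> None:
        """Release the lock only if this owner still holds it."""
        with self._mutex:
            current = self._locks.get(key)
            if current is not None and current[0] == owner:
                del self._locks[key]


def run_once(store: InMemoryLockStore, job_id: str, runner_id: str,
             work: Callable[[], None], ttl_seconds: float = 30.0) -> bool:
    """Run the job only if this runner wins the lock; return whether it ran."""
    key = f"job:{job_id}"
    if not store.acquire(key, runner_id, ttl_seconds):
        return False
    try:
        work()
        return True
    finally:
        store.release(key, runner_id)
```

The time-to-live matters because a runner that crashes mid-job never releases its lock; expiry lets another instance pick the job up later.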

PostgreSQL

Component Type: General Operations, Structured Storage

Bosca uses two PostgreSQL instances:

Primary database (default port 5433 in development): Stores operational data including content, profiles, security, workflows, and configuration.
Analytics warehouse (default port 5434 in development): A separate database for analytics data, used by the Iceberg catalog and Trino for batch queries.

Most major cloud providers offer managed PostgreSQL services, which allow low-overhead backups and scaling (for example, through read replicas). There are also several PostgreSQL-compatible databases, such as CockroachDB and YugabyteDB, that allow other scaling approaches. We typically use CloudNativePG to manage our PostgreSQL deployments.

Trino

Component Type: Analytics, AI/ML

Trino is a distributed SQL query engine used for analytics workloads. It connects to the analytics warehouse and S3 object storage, enabling SQL queries over Iceberg tables for reporting and data exploration.

AI agents can use Trino for natural language data queries, allowing non-technical users to explore analytics through chat interfaces.

Meilisearch

Component Type: General Operations, Search

Meilisearch is our preferred search index. It is written in Rust, has a very reasonable memory footprint, and is very fast. It also has many advanced features. The Meilisearch team has traded away some functionality to achieve these capabilities, but we have found the trade-offs acceptable in most cases. With its vector store, features like semantic search are easy to integrate and manage.

Search indexing supports JSONata transformations for customizable field mapping, giving you fine-grained control over what gets indexed and how.

While Meilisearch doesn't have native clustering, there are easy ways to achieve eventually consistent read replicas via Bosca Workflows. Combined with Kubernetes load balancing, this is a practical way to scale search efficiently.

Redis or NATS

Component Type: General Operations, Caching, Messaging

Bosca supports either Redis or NATS (with JetStream) as the backend for caching, pub/sub, job queues, and distributed locking. You can choose whichever fits your infrastructure best — both are fully supported and interchangeable for these roles.

Most cloud providers offer managed Redis services. NATS is lightweight and well-suited for event-driven architectures. Either option works for small or large-scale deployments.

Object Storage (S3 or Cloud Storage)

Component Type: General Operations, Object Storage

Bosca uses S3-compatible object storage for assets and analytics data. It also supports Cloud Storage and standard file systems. In development, an S3 proxy provides local S3-compatible storage.

Text Extractor

Component Type: General Operations, Content Processing

The text extractor is a standalone service used for extracting text from uploaded documents. It runs as a separate container in local development.

Image Processor

Component Type: General Operations

Publishing images often requires creating multiple size and format variants. The image processor handles tasks such as resizing, format conversion, optimization, and more. Default size variants include thumbnail (480x270), small (960x540), medium (1440x810), and large (1920x1080).
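
All four default variants share a 16:9 aspect ratio. As a sketch of how target dimensions might be derived for an arbitrary source image, the example below fits the image inside each variant's box while preserving its aspect ratio and never upscaling. The fit-within rule, the no-upscale rule, and the function names are assumptions for illustration; the image processor's actual resizing behavior may differ.

```python
# Default variant boxes as (max width, max height), taken from the text above.
VARIANTS = {
    "thumbnail": (480, 270),
    "small": (960, 540),
    "medium": (1440, 810),
    "large": (1920, 1080),
}


def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside the box, keeping the aspect ratio."""
    # Cap the scale at 1.0 so small images are never enlarged (an assumption).
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def variant_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Compute the output size of every default variant for one source image."""
    return {name: fit_within(width, height, *box) for name, box in VARIANTS.items()}
```

For a 3840x2160 source, each variant comes out at exactly its nominal size; a square 1000x1000 source is limited by the box height instead, giving 270x270 for the thumbnail.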

OpenTelemetry

Component Type: General Operations, Telemetry

Bosca includes OpenTelemetry instrumentation and can export traces to any OpenTelemetry-compatible backend.

Backup & Restore

Export and import your platform data with full backup and restore capabilities.

Deployment

Bosca supports multiple deployment options so you can start small and scale with confidence.