xarvio – BASF Digital Farming aims to give farmers worldwide advanced digital tools for agronomic decision-making. The xarvio FIELD MANAGER platform is central to that goal: it delivers actionable insights derived from geospatial assets such as satellite imagery, drone data, and sprayer application maps.
This article describes how a scalable geospatial data solution was built on AWS to catalog, manage, and visualize raster and vector datasets on the web. The solution is built on the SpatioTemporal Asset Catalog (STAC) specification and the open-source eoAPI ecosystem. The article covers the architecture, the key technologies, and lessons from deployment. It expands on a previous post about efficient satellite imagery ingestion using AWS Serverless and addresses the complete lifecycle of large-scale geospatial data management.
Geospatial Data Solution Requirements
The xarvio FIELD MANAGER platform by BASF Digital Farming operates at very large scale. It processes hundreds of millions of satellite images, each cataloged as a STAC item, which together comprise billions of individual geospatial artifacts. Unlike traditional satellite data providers with structured data flows, xarvio operates in a dynamic agricultural setting, ingesting near-daily satellite imagery per field from many sensors and providers worldwide. Supporting farmers worldwide with digital agronomic advice requires a robust, cloud-based infrastructure that can absorb this data velocity and volume while applying quality assurance such as cloud and anomaly detection. The platform’s core value comes from its machine learning (ML) pipelines, which convert raw satellite data into actionable insights. For instance, precise estimates of absolute biomass and of related indicators such as Leaf Area Index (LAI) help farmers make data-driven agronomic decisions that optimize crop yield and resource use.
STAC and eoAPI Ecosystem
To manage its growing geospatial data archive, the platform adopted the SpatioTemporal Asset Catalog (STAC) specification. This open standard offers a common language for describing and cataloging raster and vector datasets. STAC standardizes metadata across sources such as satellite imagery, UAV datasets, and prescription maps, which simplifies asset search, filtering, and retrieval across the platform.
The platform was developed with the eoAPI ecosystem, an integrated collection of open-source tools for geospatial data management in the cloud. Central to it is pgSTAC, a PostgreSQL schema and function library (used together with PostGIS) that serves as a high-performance backend for the STAC API. pgSTAC indexes millions of STAC items efficiently and supports spatial, temporal, and attribute-based filtering at scale.
Tiles in PostGIS (TiPG) serves tiled vector data directly from the PostGIS database. Field boundaries, management zones, and application histories can therefore be visualized in real time as lightweight Mapbox Vector Tiles (MVT) without a separate tile server. For raster assets such as satellite and drone imagery, the platform uses TiTiler, a dynamic tile server designed for Cloud Optimized GeoTIFFs (COGs). TiTiler streams imagery on demand as WMTS or XYZ tiles, renders derived products dynamically (e.g., NDVI or false color composites), and integrates with web and mobile applications.
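To illustrate the spatial and temporal filtering that a STAC API exposes, the following minimal sketch builds the JSON body for an item search (`POST /search`) using the core `collections`, `bbox`, `datetime`, and `limit` fields. The collection id and coordinates are hypothetical and are not taken from the xarvio catalog.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_body(collection, bbox, start, end, limit=100):
    """Build the request body for a STAC API item search (POST /search).

    bbox is (west, south, east, north) in WGS 84 degrees.
    """
    west, south, east, north = bbox
    if south > north:
        raise ValueError("bbox must be ordered as (west, south, east, north)")
    return {
        "collections": [collection],
        "bbox": [west, south, east, north],
        # A closed interval is written as "start/end".
        "datetime": f"{to_rfc3339(start)}/{to_rfc3339(end)}",
        "limit": limit,
    }


# Hypothetical collection id and field extent, for illustration only.
body = build_search_body(
    "field-imagery",
    (7.58, 51.93, 7.66, 51.98),
    datetime(2024, 5, 1, tzinfo=timezone.utc),
    datetime(2024, 5, 31, tzinfo=timezone.utc),
)
print(body["datetime"])
```

The resulting dictionary can be serialized with `json.dumps` and posted to the `/search` endpoint of any STAC API implementation.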
Solution Overview
The architecture diagram below illustrates the implementation of the geospatial data platform on AWS. This section describes each architectural component and its role in processing millions of satellite images and geospatial assets daily. The solution leverages Amazon Elastic Kubernetes Service (Amazon EKS) as its primary computing platform, Amazon Simple Storage Service (Amazon S3) for data storage, and Amazon Relational Database Service (Amazon RDS) for metadata management. The architecture is structured into four main layers: core services, storage, database, and ingestion.

Core Services Layer
An EKS cluster hosts three essential services within the solution:
- stac-service – This service implements the STAC API specification, cataloging and serving metadata for both raster and vector datasets.
- raster-service – Utilizing TiTiler, this service dynamically renders and tiles cloud-optimized raster data (e.g., COGs) for smooth integration into web and mobile maps.
- vector-service – Developed with TiPG, this component delivers vector data (e.g., boundaries or application zones) as tiled MVT layers, either directly from the database or from Amazon S3.
These services are containerized and managed within Kubernetes, ensuring high availability, modularity, and streamlined continuous integration and delivery (CI/CD) processes.
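As a rough sketch of how a client might request a dynamically rendered NDVI tile from a TiTiler-based raster service, the helper below assembles a tile URL using TiTiler's `url`, `expression`, `rescale`, and `colormap_name` query parameters. The host name, bucket path, and band order are hypothetical, and the exact path layout (here with a `WebMercatorQuad` tile matrix set segment) depends on the TiTiler version deployed.

```python
from urllib.parse import urlencode


def ndvi_tile_url(base_url, cog_url, z, x, y, red_band=1, nir_band=2):
    """Build an XYZ tile URL that asks TiTiler to render NDVI on the fly.

    NDVI = (NIR - Red) / (NIR + Red), expressed with band names b1, b2, ...
    The band indices are assumptions about the COG's band order.
    """
    expression = f"(b{nir_band}-b{red_band})/(b{nir_band}+b{red_band})"
    query = urlencode(
        {
            "url": cog_url,             # location of the COG to read
            "expression": expression,   # band math evaluated per pixel
            "rescale": "-1,1",          # NDVI value range mapped to colors
            "colormap_name": "viridis",
        }
    )
    return f"{base_url.rstrip('/')}/cog/tiles/WebMercatorQuad/{z}/{x}/{y}.png?{query}"


# Hypothetical endpoint and asset location.
url = ndvi_tile_url(
    "https://tiles.example.com",
    "s3://example-bucket/field-123/2024-05-12.tif",
    z=12, x=2200, y=1343,
)
print(url)
```

Because the rendering parameters travel in the query string, the same COG can back several map layers (true color, NDVI, false color) without storing any derived rasters.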
KEDA-based Automatic Scaling
Kubernetes Event-Driven Autoscaling (KEDA) is employed to dynamically scale platform services according to real-time workloads. KEDA enables the scaling of individual pods based on specific event-driven metrics, such as STAC ingestion queue depth or visualization request load. This ensures responsive performance during peak activity while optimizing resource usage during idle times, meeting the demand for elasticity in a data-intensive, variable-load environment.
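The scaling behavior can be approximated with a small calculation. KEDA supplies external metrics to the Kubernetes Horizontal Pod Autoscaler, which for an average-value target divides the total metric by the per-replica target and rounds up. The sketch below mirrors that rule only; it ignores tolerances, stabilization windows, and cooldown periods, and the queue target and replica limits are hypothetical.

```python
import math


def desired_replicas(queue_depth, target_per_replica, min_replicas=0, max_replicas=20):
    """Approximate the replica count for a queue-length trigger.

    queue_depth: messages currently waiting (e.g., pending STAC items).
    target_per_replica: messages one pod is expected to handle.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


for depth in (0, 250, 10_000):
    print(depth, desired_replicas(depth, target_per_replica=100))
```

With a target of 100 messages per pod, an empty queue scales to the minimum, 250 waiting messages call for three pods, and a large backlog is capped at the configured maximum.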
Geospatial Asset Storage Layer
All raw and processed geospatial assets are stored in S3 buckets, optimized for performance and durability. This layer contains COGs for raster imagery and formats like FlatGeobuf for vector data. These formats are selected for their support of streaming access, indexing, and cloud-based performance.
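The streaming access that COGs allow follows from their internal tiling: a reader looks up the byte offsets of only the tiles that overlap the requested window and fetches them with HTTP range requests. The sketch below shows that lookup for a single resolution level, assuming row-major tile numbering and offset and byte-count tables like those held in the TIFF `TileOffsets` and `TileByteCounts` tags. The image size and offsets are made up for illustration.

```python
import math


def tiles_for_window(width, height, tile_size, col_off, row_off, win_w, win_h):
    """Return row-major indices of the internal tiles covering a pixel window."""
    if col_off < 0 or row_off < 0 or col_off + win_w > width or row_off + win_h > height:
        raise ValueError("window must lie inside the image")
    tiles_across = math.ceil(width / tile_size)
    first_col, last_col = col_off // tile_size, (col_off + win_w - 1) // tile_size
    first_row, last_row = row_off // tile_size, (row_off + win_h - 1) // tile_size
    return [
        row * tiles_across + col
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_headers(tile_indices, offsets, byte_counts):
    """Build one HTTP Range header value per tile (byte ranges are inclusive)."""
    return [f"bytes={offsets[i]}-{offsets[i] + byte_counts[i] - 1}" for i in tile_indices]


# Hypothetical 2048 x 2048 image with 512-pixel tiles (16 tiles in total).
offsets = [1000 + i * 2000 for i in range(16)]
byte_counts = [2000] * 16
needed = tiles_for_window(2048, 2048, 512, col_off=500, row_off=500, win_w=100, win_h=100)
print(needed)
print(range_headers(needed, offsets, byte_counts))
```

In this example a 100 x 100 pixel window touches four of the sixteen tiles, so only a quarter of the pixel data needs to be transferred.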
Database Layer
A PostgreSQL database, hosted on Amazon RDS and extended with the pgSTAC schema and functions, forms the system’s metadata backbone. This configuration allows efficient indexing and querying of millions of STAC items and collections. Amazon RDS Proxy sits in front of the database and provides connection pooling and resilience, particularly under the bursty, highly concurrent access patterns typical of geospatial applications.
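A back-of-the-envelope calculation shows why a pooling layer matters once services scale out: every pod keeps its own client-side connection pool, so the number of connections requested from the database grows with the replica count. All figures below (replica counts, pool sizes, and the connection limit) are hypothetical.

```python
def peak_client_connections(replicas, pool_size):
    """Sum of (pod replicas x per-pod pool size) across services."""
    return sum(count * pool_size[name] for name, count in replicas.items())


# Hypothetical peak replica counts and per-pod pool sizes.
replicas_at_peak = {"stac-service": 40, "vector-service": 20}
pool_size = {"stac-service": 10, "vector-service": 5}
MAX_DB_CONNECTIONS = 400  # hypothetical limit of the database instance

demand = peak_client_connections(replicas_at_peak, pool_size)
print(demand, demand > MAX_DB_CONNECTIONS)  # prints "500 True"
```

A proxy that multiplexes many client connections onto a smaller set of database connections keeps such a peak from exhausting the instance's connection limit.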
Ingestion Layer
An independent ingestion component manages batch or streaming geospatial data inputs. This component processes satellite imagery, drone data, or prescription maps, then pushes relevant metadata into the STAC API and stores assets in Amazon S3. The ingestion engine is separate from the serving infrastructure, facilitating asynchronous and large-scale data loading.
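The ingestion step can be pictured as two small operations: describing each stored asset as a STAC item, and grouping items into batches for bulk loading. The following sketch assumes a rectangular footprint and a single COG asset per item. The identifiers, bucket paths, and batch size are hypothetical, and it does not reproduce the custom ingestor itself.

```python
from itertools import islice


def make_item(item_id, collection, bbox, acquired, cog_href):
    """Build a minimal STAC item (a GeoJSON Feature) for one COG asset."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "collection": collection,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Exterior ring in counterclockwise order, closed on the first point.
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": acquired},
        "links": [],
        "assets": {
            "image": {
                "href": cog_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Hypothetical scenes; each batch would go to a bulk-load call.
items = (
    make_item(f"scene-{i}", "field-imagery", (7.58, 51.93, 7.66, 51.98),
              "2024-05-12T10:30:00Z", f"s3://example-bucket/scenes/{i}.tif")
    for i in range(2500)
)
print([len(batch) for batch in batched(items, 1000)])
```

Batching keeps each database round trip large enough to be efficient while bounding memory use, and a generator pipeline like this one never holds more than one batch at a time.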
Amazon API Gateway and Clients
Public access to the platform is managed via Amazon API Gateway. This enables clients, including browser-based and mobile applications, to interact securely with the services. The API Gateway acts as a unified entry point, applying rate limiting, authorization, and routing policies.
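Amazon API Gateway throttles requests using a token bucket algorithm, in which a steady-state rate refills the bucket and a burst limit caps its size. The sketch below is a simplified, single-client illustration of that algorithm with an explicit clock so the behavior is easy to follow. It is not API Gateway's implementation, and the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2, burst=3)
print([bucket.allow(0.0) for _ in range(4)])  # a burst of 4 at the same instant
print(bucket.allow(0.5))                      # half a second later
```

The burst limit absorbs short spikes, such as a map client requesting many tiles at once, while the refill rate bounds sustained load on the services behind the gateway.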
Solution Benefits
- Rapid Onboarding with STAC Standardization – Adhering to the STAC specification has significantly shortened the time required to onboard new data domains, such as sprayer application maps. Compared with the legacy system, metadata modeling and integration are now standardized and automated, so new geospatial data products can be made available to clients in days rather than weeks or months.
- Optimized Storage with COGs and Amazon S3 – Storing raster and vector assets in Amazon S3 using cloud-optimized formats (e.g., COGs for imagery or FlatGeobuf for vectors) lowers storage costs while enabling low-latency, streaming access. This eliminates the need for extensive preprocessing or extract, transform, and load (ETL) pipelines, simplifying client delivery.
- Large-Scale Ingestion with a Batch STAC Ingestor – The custom STAC ingestor supports both real-time and batch-mode operations. This capability allows for bulk onboarding of satellite constellations, drone imagery, and historical datasets without interrupting active services. The ingestion service utilizes optimized database ingestion functions, capable of ingesting thousands of items per second, ensuring high-throughput and reliable data integration at scale.
- PostgreSQL, pgSTAC, and Amazon RDS Proxy for a Scalable Metadata Backbone – The combination of pgSTAC and Amazon RDS Proxy provides advanced spatial-temporal querying while ensuring graceful database connection management, even under high concurrency. This pairing delivers reliability without sacrificing performance.
- Scalable Deployment with Amazon EKS – Hosting the solution on Amazon EKS offers comprehensive control over deployments, resource tuning, and service orchestration. Coupled with automatic scaling, compute capacity can be dynamically adjusted based on demand, enhancing resilience and cost-efficiency.
Learnings
- RDS Proxy is Essential for Automatically Scaled Environments – When pods in Amazon EKS scale automatically, RDS Proxy is crucial. It pools connections and protects the underlying PostgreSQL database from connection exhaustion during sudden scale-up events. Without it, load spikes during high-ingest periods led to failed requests and blocked connections.
- Batch STAC Ingestor is a Core Component – The custom STAC ingestor proved to be an indispensable part of the system. It directly interacts with pgSTAC to perform large-scale, automated ingestions of geospatial metadata from streams and archives. Without this tool, onboarding data providers or processing legacy imagery at scale would have been labor-intensive and prone to errors.
- COGs are Non-Negotiable – For rapid, scalable visualization of large raster datasets, Cloud Optimized GeoTIFFs (COGs) are essential, especially when datasets exceed several gigabytes. They facilitate efficient HTTP range requests, reduce the need for preprocessing, and integrate seamlessly with TiTiler for real-time tile rendering. Non-COG formats resulted in noticeably slower performance and were not suitable for cloud-based visualization.
- Serverless-Compatible, Optimized for Amazon EKS (for now) – Although the architecture is designed to be serverless-compatible, an Amazon EKS-first approach was chosen because of the existing application landscape. Components like TiTiler and TiPG benefit from persistent, memory-tuned environments, which are harder to achieve in a serverless runtime. Nevertheless, the solution remains modular and stateless, so certain subsystems (e.g., ingestion triggers, notifications, or monitoring) are candidates for future serverless migration to further improve elasticity and reduce operational overhead.
Conclusion
BASF Digital Farming GmbH successfully implemented a STAC-based geospatial data platform on Amazon EKS. This platform enables efficient management and visualization of satellite imagery, drone data, and application maps. The architecture facilitates the onboarding of new data sources within weeks instead of months. The new platform also processes double the data volume daily while reducing costs by 50%, attributed to streamlined data handling via the STAC schema and the efficiencies of automatic scaling. Adopting the STAC standard enhances data discoverability, decreases search latency, and supports more efficient analytical workflows.
Organizations aiming to develop similar geospatial data solutions can combine AWS services such as Amazon EKS, Amazon S3, and Amazon RDS with the open STAC specification and open-source tools such as eoAPI to build scalable, cost-effective solutions. Further information on building containerized applications on AWS is available at Containers on AWS.

