6,000 AWS accounts, three people, one platform: Lessons learned

As software-as-a-service (SaaS) platforms expand, maintaining a balance between innovation speed, strong security, and tenant data isolation becomes critical. While AWS Identity and Access Management (IAM) mechanisms secure both shared and dedicated environments, establishing a hard security boundary is often simpler in an account-per-tenant model, where the AWS account itself acts as the isolation boundary. In contrast, shared-account deployments rely on resource-level boundaries like tenant-scoped IAM policies and data partitioning. This multi-tenancy increases architectural and operational complexity and can introduce security challenges if safeguards are not properly designed and enforced. Adopting an account-per-tenant model on Amazon Web Services (AWS) can lead to clearer security boundaries, streamlined service ownership, and more transparent cost attribution, but this requires increased investment in platform automation.

ProGlove develops smart wearable barcode scanning solutions that connect frontline workers to digital workflows. Its scanners integrate with Insight, an AWS-based SaaS platform, to provide real-time process visibility. This helps customers in manufacturing, logistics, and retail enhance productivity, reduce errors, and improve ergonomics.

This article describes why an account-per-tenant approach was chosen for a serverless SaaS architecture and how it impacts the operational model. It covers anticipated challenges related to automation, observability, and cost. The discussion also touches upon how this approach can affect other operational models, such as those in an enterprise context.

Why a multi-account strategy?

Many SaaS providers start with a straightforward, dedicated deployment model, often using one AWS account per tenant. This approach simplifies initial implementation and limits the scope of issues. However, as the platform scales, operational overhead and inefficiencies from idle or underutilized resources can increase. Serverless architectures can mitigate these inefficiencies by scaling automatically with demand. Over time, providers often consider shared or multi-tenant models to consolidate operations and improve cost efficiency. Yet, this shift introduces new challenges as the number of tenants and services grows:

Blast radius – An accidental misconfiguration or vulnerability could expose multiple tenants.
Quota limits – Tenants in a single AWS account share the same quotas.
Operational complexity – Shared infrastructure makes it difficult to determine resource ownership.
Customization limits – Making changes for one tenant risks impacting others.
Cost visibility – Attributing resource usage to individual tenants is challenging.

The choice between a dedicated or shared model is ultimately a trade-off. Dedicated deployments are simpler to build but demand investment in SaaS operations and orchestration for management at scale, whereas shared models reduce operational overhead but increase architectural and management complexity.

AWS recommends a multi-account strategy for organizing an AWS environment. At scale, the AWS account boundary is the easiest way to implement isolation. Accounts are fully isolated containers for compute, storage, networking, and more, with no shared scope unless explicitly configured.

Based on its use case, ProGlove adopted this model to its logical extreme: every tenant receives their own AWS account, and the services they consume are deployed directly into that account. The full set of microservices required by the tenant is deployed, running exclusively with that tenant’s data and configuration. At its current scale, ProGlove manages approximately:

6,000 tenant accounts, with about 50% active and deployed
40 microservices, each with multiple AWS Lambda functions and supporting resources
70 internal accounts for continuous integration and continuous delivery (CI/CD), observability, and shared tooling

This translates to over 120,000 deployed service instances and roughly 1,000,000 Lambda functions in production. The following diagram provides an overview of the main services used in the platform.
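The scale figures above can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in this section; the functions-per-instance value is derived here for illustration and is not a figure reported by ProGlove.

```python
# Back-of-the-envelope check of the scale figures quoted above.
TENANT_ACCOUNTS = 6_000
ACTIVE_SHARE = 0.5            # "about 50% active and deployed"
MICROSERVICES = 40
LAMBDA_FUNCTIONS = 1_000_000  # "roughly 1,000,000", as quoted

active_accounts = int(TENANT_ACCOUNTS * ACTIVE_SHARE)
service_instances = active_accounts * MICROSERVICES
# Derived value (an estimate, not a reported number).
functions_per_instance = LAMBDA_FUNCTIONS / service_instances

print(active_accounts)                   # active tenant accounts
print(service_instances)                 # deployed service instances
print(round(functions_per_instance, 1))  # average Lambda functions per instance
```

Because the inputs are rounded, the result lands at the quoted order of magnitude rather than matching the article's "over 120,000" exactly.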

Benefits of the account-per-tenant model

This model offers several benefits that directly support security, agility, and operational clarity, including a strong isolation model, a simplified mental model, per-tenant customization, and transparent cost attribution.

Tenant data is not co-located; each account has its own storage, compute, and permissions. If a security issue, runaway process, or misconfiguration occurs, the impact is limited to that tenant’s account, leaving other tenants unaffected.

For developers, the need to consider multi-tenancy is reduced, as a deployed service instance always belongs to exactly one tenant. This lowers cognitive load and simplifies debugging. Developers can easily be provided with isolated, production-like tenant accounts to eliminate the gap between development and production environments.

Individual accounts can be modified, tested, and migrated independently, allowing for tailored deployments, such as activating premium features for specific tenants, without impacting the overall system.

AWS Cost Explorer and linked accounts make it straightforward to report and charge back costs on a per-tenant basis. For SaaS providers with consumption-based pricing models, this becomes a significant advantage.
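As a minimal sketch of per-tenant chargeback, the function below sums costs per linked account. It assumes a response dictionary shaped like the result of a Cost Explorer cost-and-usage query grouped by linked account; the function name, the sample account IDs, and the amounts are made up for illustration.

```python
from decimal import Decimal


def cost_per_account(response: dict) -> dict:
    """Sum the UnblendedCost amounts per linked account over all periods."""
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            account_id = group["Keys"][0]  # first group key: the account ID
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[account_id] = totals.get(account_id, Decimal("0")) + amount
    return totals


# Made-up sample data (assumed response shape, two reporting periods).
sample_response = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["111111111111"],
             "Metrics": {"UnblendedCost": {"Amount": "12.50", "Unit": "USD"}}},
            {"Keys": ["222222222222"],
             "Metrics": {"UnblendedCost": {"Amount": "3.25", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["111111111111"],
             "Metrics": {"UnblendedCost": {"Amount": "10.00", "Unit": "USD"}}},
        ]},
    ]
}

tenant_costs = cost_per_account(sample_response)
for account_id, total in sorted(tenant_costs.items()):
    print(account_id, total)
```

Decimal is used instead of float so that summed currency amounts do not pick up binary rounding errors.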

During an AWS Well-Architected Framework review conducted with AWS, many items from the operational excellence and security pillars were found to be inapplicable to this setup, making those review sections quick and straightforward to complete.

Challenges and trade-offs

The account-per-tenant model, like most architectural choices, involves trade-offs. While it provides strong isolation, it introduces challenges in platform operations. The approach shifts complexity from application development to platform development.

Provisioning, configuring, and managing thousands of accounts manually is not feasible. Automation of account creation, baseline setup, IAM roles, guardrails, and service enablement is mandatory. The platform relies on AWS Organizations, its service control policies (SCPs), and AWS CloudFormation StackSets, as well as custom tooling to handle these tasks.
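To make the guardrail idea concrete, here is a minimal sketch of a service control policy built as a Python dictionary. The two statements are illustrative assumptions, not ProGlove's actual policies: one prevents a tenant account from leaving the organization, the other blocks the creation of IAM users and access keys.

```python
import json


def build_guardrail_scp() -> dict:
    """Return an illustrative SCP document (hypothetical, for demonstration)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                # Blocks long-lived credentials in tenant accounts.
                "Sid": "DenyLongLivedCredentials",
                "Effect": "Deny",
                "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
                "Resource": "*",
            },
        ],
    }


policy = build_guardrail_scp()
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Generating the document in code rather than by hand makes it easier to review, test, and roll out one baseline to every tenant account.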

Some workflows are well-suited for automation, while others can be implemented more effectively using traditional scripting and manual operations, provided the introduced overhead is low enough. For instance, account creation is a fully automated process using AWS Step Functions, but the retirement and closure of accounts are performed manually through regularly run scripts.

Some AWS services are billed per provisioned resource, independent of utilization, rather than scaling to zero when not in use. Notable examples include Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS), where resources must be provisioned to use the service. Even the smallest EC2 instance type costs around USD 3 per month, which adds up to USD 3,000 per month when deployed into 1,000 accounts. In contrast, serverless offerings such as AWS Lambda and Amazon DynamoDB scale automatically with actual usage, minimizing idle resource costs. While per-invocation or per-request pricing for serverless services might appear higher, these models often offset the operational overhead and resource wastage associated with always-on infrastructure. In any case, costs should be carefully modeled, measured, and optimized.
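The multiplier effect of per-account fixed costs can be sketched in a few lines. The example assumes the quoted instance price is a monthly figure; the function name and the second account count (the roughly 3,000 active accounts mentioned earlier) are illustrative choices.

```python
def fleet_monthly_cost(unit_cost_usd: int, accounts: int) -> int:
    """Monthly cost of one always-on resource replicated into every account."""
    return unit_cost_usd * accounts


SMALLEST_INSTANCE_USD = 3  # approximate price quoted above (assumed monthly)

cost_1k = fleet_monthly_cost(SMALLEST_INSTANCE_USD, 1_000)
cost_active = fleet_monthly_cost(SMALLEST_INSTANCE_USD, 3_000)

print(cost_1k)      # cost across 1,000 accounts
print(cost_active)  # cost across roughly 3,000 active accounts
```

The same function applies to any resource with a fixed per-account charge, which is why such charges deserve scrutiny before they are added to the account baseline.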

Monitoring infrastructure across thousands of accounts and Regions is significantly more challenging than monitoring a few accounts. Observability tooling should be centralized, but without reintroducing the very risks that accounts are designed to isolate. Amazon CloudWatch now offers significantly improved cross-account observability features compared to when the platform began, such as the Observability Access Manager.

Developers, operations teams, and platform services and tools need to operate across accounts daily. This requires a robust identity model with IAM roles and cross-account trust policies. If not carefully designed, this can become a source of complexity and security risk. It is also crucial to follow the best practice of avoiding long-lived credentials, as these introduce a major security threat and monitoring effort when deployed into many accounts.

AWS service limits are enforced per account. In a shared-account model, a single set of quotas is monitored. In an account-per-tenant setup, quota management becomes distributed and harder to predict. Proactive quota requests and monitoring are essential. For example, AWS Lambda has a quota for the number of concurrent executions that functions in a single account share. If a tenant experiences heavier load, its corresponding account is likely to encounter throttling errors for Lambda functions, making a single pane of glass view essential to track quota usage and adapt as necessary.

Although multi-account strategies are common at the enterprise level, adopting them at the SaaS tenant level is less common. Patterns, tooling, and reference architectures are still evolving, which means building custom solutions may be necessary. It is advisable to research available resources and consult AWS to avoid reinventing the wheel.
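The distributed quota monitoring described above can be sketched as a small aggregation step. The example assumes that usage and limit values have already been collected centrally for each account; the QuotaSample type, the 80% threshold, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaSample:
    account_id: str
    quota_name: str
    used: float
    limit: float

    @property
    def utilization(self) -> float:
        """Fraction of the quota in use (0.0 when no limit is recorded)."""
        return self.used / self.limit if self.limit else 0.0


def accounts_near_limit(samples, threshold=0.8):
    """Return samples at or above the threshold, highest utilization first."""
    flagged = [s for s in samples if s.utilization >= threshold]
    return sorted(flagged, key=lambda s: s.utilization, reverse=True)


# Made-up readings for a concurrency-style quota in three tenant accounts.
samples = [
    QuotaSample("111111111111", "lambda-concurrent-executions", 950, 1000),
    QuotaSample("222222222222", "lambda-concurrent-executions", 100, 1000),
    QuotaSample("333333333333", "lambda-concurrent-executions", 800, 1000),
]

flagged = accounts_near_limit(samples)
for sample in flagged:
    print(sample.account_id, f"{sample.utilization:.0%}")
```

Sorting by utilization puts the accounts most likely to be throttled at the top, which is the ordering a central quota view would typically need.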

Scaling observability across tenants

Observability can become a challenge in this architecture. If each tenant account emits its own logs, metrics, and traces, operational visibility becomes fragmented. For enhanced cross-account capabilities, a third-party observability solution was used. For example, telemetry (logs and metrics) is forwarded to a central application where multi-alerts can be configured once and applied to individual tenant accounts. This not only reduces cost but also simplifies the operational experience. Engineers interact with a single view, while underlying telemetry still originates from isolated accounts.

It is vital to use tags whenever possible to correlate telemetry data and to employ consistent tagging and naming conventions. Depending on the scale of operations, consider using AWS Organizations tag policies to enforce a consistent scheme. For example, fields for the source AWS account ID are included in most metrics and logs to ensure easy drill-down into data for a particular tenant.
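As a rough illustration of defining an alert once and applying it per tenant, the sketch below groups centrally collected log records by their source account ID and evaluates a single error-rate rule for every account. The record fields (`account_id`, `level`) and the threshold are assumptions made for this example, not the schema of any particular observability product.

```python
from collections import defaultdict


def error_rate_by_account(records):
    """Compute the share of ERROR-level records per source account ID."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        account_id = record["account_id"]
        totals[account_id] += 1
        if record["level"] == "ERROR":
            errors[account_id] += 1
    return {acc: errors[acc] / totals[acc] for acc in totals}


def evaluate_rule(rates, max_error_rate):
    """Apply one rule to all accounts; return the accounts that breach it."""
    return sorted(acc for acc, rate in rates.items() if rate > max_error_rate)


# Made-up telemetry, already forwarded to a central location.
records = [
    {"account_id": "111111111111", "level": "ERROR"},
    {"account_id": "111111111111", "level": "INFO"},
    {"account_id": "111111111111", "level": "ERROR"},
    {"account_id": "111111111111", "level": "INFO"},
    {"account_id": "222222222222", "level": "INFO"},
    {"account_id": "222222222222", "level": "INFO"},
]

rates = error_rate_by_account(records)
breaching = evaluate_rule(rates, max_error_rate=0.25)
print(breaching)
```

Because the account ID travels with each record, the rule is written once and the drill-down to a single tenant remains a simple filter.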

Key takeaways:

Do not replicate per-account alarms blindly. Use streaming and aggregation.
Use tags for consistent context across thousands of instances.
Stay current with AWS feature releases via the AWS News Blog: metric streams, Amazon EventBridge integrations, Amazon CloudWatch Observability Access Manager, and other offerings can streamline an observability stack.
Follow the What’s New with AWS feed.

CI/CD and deployment at scale

Deploying microservices into one AWS account is straightforward. Deploying the same service into thousands of accounts requires a different approach. The application code is stored in a monorepo, which helps enforce the same version of libraries or Lambda layers, among other things. The following diagram illustrates how many tenant accounts are updated using AWS CodePipeline combined with AWS CloudFormation StackSets to deploy applications. Each pipeline execution updates many target accounts in parallel, with only a single StackSet update operation in a central account.

While this provides the necessary scale, it also introduces new failure modes:

Partial rollouts – If one account fails to deploy, rollback or retry strategies need to be defined and tested.
Pipeline duration – Large-scale updates can take significant time to propagate.
Tooling maturity – StackSets are powerful but still evolving, and operational edge cases are possible.

In practice, this necessitates investment in platform engineering. A dedicated team builds and maintains internal tools that abstract deployment complexity away from service developers. Developers remain focused on business logic, and the platform team ensures consistency and reliability across accounts.
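One way to handle partial rollouts is to summarize per-account results after each deployment and decide whether to retry the failed accounts or halt the rollout. The sketch below is a simplified illustration; the status labels, the failure tolerance value, and the function name are hypothetical and do not reflect the StackSets API or ProGlove's internal tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutDecision:
    action: str            # "done", "retry", or "halt"
    failed_accounts: list  # account IDs that need attention
    failure_rate: float


def plan_next_step(results: dict, failure_tolerance: float) -> RolloutDecision:
    """Decide what to do after a rollout, given a status per account ID."""
    failed = sorted(acc for acc, status in results.items() if status == "FAILED")
    failure_rate = len(failed) / len(results) if results else 0.0
    if not failed:
        action = "done"
    elif failure_rate > failure_tolerance:
        action = "halt"   # too many failures: stop and investigate
    else:
        action = "retry"  # few failures: retry only the failed accounts
    return RolloutDecision(action, failed, failure_rate)


# Made-up results for a rollout to five tenant accounts.
results = {
    "111111111111": "SUCCEEDED",
    "222222222222": "SUCCEEDED",
    "333333333333": "FAILED",
    "444444444444": "SUCCEEDED",
    "555555555555": "SUCCEEDED",
}

decision = plan_next_step(results, failure_tolerance=0.25)
print(decision.action, decision.failed_accounts, decision.failure_rate)
```

Keeping the decision logic separate from the deployment call makes the retry and halt thresholds easy to test before they are exercised against thousands of accounts.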

Cost management

Cost modeling changes significantly with this architecture. In a shared account, many costs are pooled, making per-tenant attribution difficult. In an account-per-tenant model, costs are naturally segmented by account. On the positive side, tenant-specific cost reporting is trivial. SaaS providers can align billing directly with AWS usage and even receive monthly reporting per tenant automatically through AWS billing.

Costs that scale per account need careful consideration. At scale, even small charges per resource become significant. For example, collecting metrics from thousands of accounts requires careful planning, and the chosen approach greatly influences costs. At this scale, using standard observability tooling out of the box is not feasible because the volume of collected data can make per-account costs economically unsustainable. Instead, focus on understanding which metrics are essential to monitor and select an observability approach that allows for their implementation. As a recommendation, evaluate cost multipliers early. Services that scale linearly with the number of accounts should be avoided where possible. It is crucial to verify assumptions with actual measurements.

Operational considerations

Succeeding with this model requires investment in platform capabilities:

Account management – Automate everything from creation to decommissioning.
Baseline guardrails – Enforce compliance and security controls using SCPs and strict IAM management.
Developer training – Ensure teams understand the scope and boundaries of their services.
CI/CD investment – Pipelines need to scale to thousands of accounts without hindering innovation.
Observability discipline – Monitoring needs to be consistent, centralized, and cost-effective.

Conclusion

This article described how ProGlove implemented a large-scale account-per-tenant model on AWS and how that model shifts complexity from service code to platform operations. This is a trade-off that demands more platform automation, scalable CI/CD pipelines, and disciplined observability practices. The benefits include strong tenant and workload isolation, transparent costs, and a greatly reduced blast radius. These benefits are key for platform providers operating at scale with a strictly limited operations team size.

Managing thousands of AWS accounts with a small team might seem impossible. However, with the right architectural choices, each new workload adds only marginal operational load, so the team size can remain constant while the number of accounts grows. If security, compliance, and clarity are top priorities, this approach can serve as a strong foundation for a platform. Working backward from these requirements can help achieve the same balance: drastically scaling a tenant base without scaling the operations team at the same rate.
