Intelligent document processing (IDP) automates the extraction of useful information from unstructured documents such as invoices, contracts, and reports. This article shows how to build an IDP solution programmatically with the Strands Agents SDK, Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Bases, and Amazon Bedrock Data Automation (BDA). The solution is delivered as a Jupyter notebook that lets users upload multi-modal business documents and extract insights from them. BDA serves as the parser for the knowledge base, and the relevant chunks retrieved from it augment the prompt sent to a foundation model (FM). The example use case retrieves context about public school districts from the Nation’s Report Card, published by the U.S. Department of Education.
Amazon Bedrock Data Automation can function as a standalone feature or as a parser when configuring a knowledge base for Retrieval-Augmented Generation (RAG) workflows. BDA is capable of generating valuable insights from unstructured, multi-modal content, including documents, images, video, and audio. With BDA, automated IDP and RAG workflows can be built quickly and cost-effectively. For RAG workflows, Amazon OpenSearch Service can be used to store vector embeddings of necessary documents. In this context, Bedrock AgentCore leverages BDA through tools to perform multi-modal RAG for the IDP solution.
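To make the "BDA as a parser" option more concrete, the following is a minimal sketch of how a knowledge base data source might be configured to use it through boto3. The helper names, the default data source name, and the example identifiers are hypothetical, and the request fields reflect the `bedrock-agent` `create_data_source` API as understood at the time of writing. Check them against the current API reference; the notebook may take a different route.

```python
def build_data_source_request(knowledge_base_id: str, bucket_arn: str,
                              name: str = "idp-documents") -> dict:
    """Build keyword arguments for a data source that uses BDA as its parser.

    The default name is a placeholder chosen for this sketch.
    """
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        # Documents are read from an S3 bucket.
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        # Ask the knowledge base to parse documents with Bedrock Data Automation.
        "vectorIngestionConfiguration": {
            "parsingConfiguration": {
                "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
                "bedrockDataAutomationConfiguration": {
                    "parsingModality": "MULTIMODAL"
                },
            }
        },
    }


def create_data_source(request: dict, region: str) -> str:
    """Send the request to Amazon Bedrock and return the new data source ID."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("bedrock-agent", region_name=region)
    response = client.create_data_source(**request)
    return response["dataSource"]["dataSourceId"]
```

Keeping the request builder separate from the API call makes it possible to inspect or log the configuration before any resource is created.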
Amazon Bedrock AgentCore is a fully managed service for building, deploying, and configuring autonomous agents. Developers can create and deploy agents using popular frameworks and a variety of models from Amazon Bedrock, Anthropic, Google, and OpenAI, without managing the underlying infrastructure.
Strands Agents SDK is a sophisticated open-source toolkit that enhances artificial intelligence (AI) agent development through a model-driven approach. Developers can define a Strands Agent with a prompt (specifying agent behavior) and a list of tools. A large language model (LLM) handles the reasoning, autonomously deciding optimal actions and tool usage based on context and task. This workflow supports complex systems, reducing the code typically required for orchestrating multi-agent collaboration. Strands SDK is used here for creating the agent and defining the tools necessary for intelligent document processing.
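As an illustration of the prompt-plus-tools pattern described above, here is a minimal sketch of a Strands agent with a single tool. The tool (`extract_percentages`), the system prompt, and the `build_agent` helper are hypothetical and are not taken from the notebook. The sketch assumes the `strands-agents` package is installed and that the default model provider is reachable with the configured AWS credentials.

```python
import re

# Hypothetical system prompt describing the agent's behavior.
SYSTEM_PROMPT = (
    "You are a document analyst. Use the available tools to pull figures "
    "out of document text before answering."
)


def extract_percentages(text: str) -> list[float]:
    """Return every percentage figure (such as '42%' or '3.5 %') found in the text."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]


def build_agent():
    """Wrap the function above as a Strands tool and attach it to an agent."""
    # Lazy import: requires the strands-agents package.
    from strands import Agent, tool

    return Agent(system_prompt=SYSTEM_PROMPT, tools=[tool(extract_percentages)])
```

Calling `build_agent()` and then passing a question to the returned agent lets the model decide when to invoke the tool. The tool itself is an ordinary Python function, so it can be unit tested without any AWS access.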
The following prerequisites and step-by-step implementations guide the deployment of this solution in an AWS environment.
Prerequisites
To follow the example use cases, the following prerequisites are necessary:
- AWS credentials with appropriate permissions
- Access to GitHub
- Git installed locally; for instructions, refer to Getting Started – Installing Git
Architecture
The solution incorporates the following AWS services:
- Amazon S3 for document storage and upload capabilities
- Bedrock Knowledge Bases to convert S3-stored objects into a RAG-ready workflow
- Amazon OpenSearch Service for storing vector embeddings
- Amazon Bedrock AgentCore for the IDP workflow
- Strands Agents SDK, the open-source framework used to define the tools that perform IDP
- Bedrock Data Automation (BDA) to extract structured insights from documents

The workflow consists of the following steps:
- Relevant documents are uploaded to Amazon S3.
- An Amazon Bedrock Knowledge Base is created, and the S3 data source is parsed using Amazon Bedrock Data Automation.
- Document chunks are stored as vector embeddings in Amazon OpenSearch Service.
- A Strands Agent deployed on Amazon Bedrock AgentCore Runtime performs RAG to answer user questions.
- The end user receives a response.
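The retrieval and prompt-augmentation part of this workflow can be sketched as follows. This is an illustrative outline rather than the notebook's code: the function names, the prompt wording, and the `top_k` default are assumptions, and the `retrieve` call uses the `bedrock-agent-runtime` client as understood at the time of writing.

```python
def build_augmented_prompt(question: str, chunks: list[str]) -> str:
    """Place retrieved chunks ahead of the question so the model answers from them."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def retrieve_chunks(knowledge_base_id: str, query: str, region: str,
                    top_k: int = 5) -> list[str]:
    """Query a Bedrock knowledge base and return the text of the top matches."""
    import boto3  # imported lazily so the prompt builder works without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    # Keep only results that carry text; other content types are skipped here.
    return [
        result["content"]["text"]
        for result in response["retrievalResults"]
        if "text" in result.get("content", {})
    ]
```

Numbering the chunks in the prompt gives the model a simple way to cite which passage supported its answer.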
Configure the AWS CLI
Use the following command to configure the AWS Command Line Interface (AWS CLI) with the credentials for your AWS account and your AWS Region. Before starting, check Amazon Bedrock Data Automation for Region availability and pricing:
aws configure
Clone and build the GitHub repository locally
git clone https://github.com/aws-samples/sample-for-amazon-bda-agents
cd sample-for-amazon-bda-agents
Open the Jupyter notebook named:
bedrock-data-automation-with-agents.ipynb
Bedrock Data Automation with AgentCore Notebook instructions:
This notebook illustrates how to create an IDP solution using BDA with Amazon Bedrock AgentCore Runtime. Instead of a traditional Amazon Bedrock agent, a Strands Agent is deployed through AgentCore, which provides managed, enterprise-grade hosting while leaving the choice of agent framework open. More specific instructions are provided within the Jupyter notebook. The steps below outline how to set up Amazon Bedrock Knowledge Bases with BDA as the parser and use it with Amazon Bedrock AgentCore.
Steps:
- Import libraries and set up AgentCore capabilities
- Create the Knowledge Base for Amazon Bedrock with BDA
- Upload the academic reports dataset to Amazon S3
- Deploy the Strands Agent using AgentCore Runtime
- Test the AgentCore-hosted agent
- Clean up all resources
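For the deployment step, an AgentCore Runtime entry point for a Strands agent might look like the sketch below. It is a simplified outline under stated assumptions, not the notebook's implementation: the `handle_request` helper, the `prompt` payload key, the reply shape, and the system prompt are all hypothetical, and `main()` assumes the `bedrock-agentcore` and `strands-agents` packages are installed.

```python
def handle_request(payload: dict, agent) -> dict:
    """Turn an invocation payload into an agent call and a JSON-serializable reply."""
    # The 'prompt' key is an assumed convention for this sketch.
    prompt = str(payload.get("prompt", "")).strip()
    if not prompt:
        return {"error": "payload must include a non-empty 'prompt'"}
    # The agent is any callable that takes a prompt; its result is rendered as text.
    return {"result": str(agent(prompt))}


def main() -> None:
    """Serve the handler through AgentCore Runtime."""
    # Lazy imports: these packages are only needed when the app is actually served.
    from bedrock_agentcore.runtime import BedrockAgentCoreApp
    from strands import Agent

    app = BedrockAgentCoreApp()
    agent = Agent(system_prompt="Answer questions about the uploaded documents.")

    @app.entrypoint
    def invoke(payload):
        return handle_request(payload, agent)

    app.run()
```

Because `handle_request` accepts any callable as the agent, the request handling can be exercised locally with a stub before calling `main()` in the deployed container.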
Security considerations
The implementation incorporates several security guardrails, such as:
- Secure file upload handling
- Identity and Access Management (IAM) role-based access control
- Input validation and error handling
Note: This implementation is for demonstration purposes. Additional security controls, testing, and architectural reviews are required before deployment in a production environment.
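As one illustration of the upload handling and input validation listed above, the following sketch checks a file before it is sent to Amazon S3. The allow-list of extensions, the size cap, and the `uploads/` key prefix are assumptions chosen for the example, not values taken from the notebook.

```python
from pathlib import PurePosixPath

# Assumed allow-list and size cap; adjust to the document types actually supported.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".docx"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MiB


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return a safe object key for the upload, or raise ValueError."""
    # Drop any directory components so a crafted name cannot escape the prefix.
    name = PurePosixPath(filename.replace("\\", "/")).name
    if not name or name in {".", ".."}:
        raise ValueError("missing file name")
    ext = PurePosixPath(name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    if not 0 < size_bytes <= MAX_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    return f"uploads/{name}"
```

The returned key can then be passed to the S3 upload call. Rejecting the file before upload keeps unexpected content out of the knowledge base's data source.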
Benefits and use cases
This solution offers particular value for:
- Automated document processing workflows
- Intelligent document analysis on large-scale datasets
- Question-answering systems based on document content
- Multi-modal content processing
Conclusion
This solution demonstrates how to use Amazon Bedrock AgentCore to build intelligent document processing applications. Strands Agents whose tools draw on Amazon Bedrock Data Automation can understand and interact with multi-modal document content. With Amazon Bedrock Data Automation, the RAG experience extends to more complex data formats, including visually rich documents, images, audio, and video.
Additional resources
For more information, visit Amazon Bedrock.
Service User Guides:
- Amazon Bedrock Knowledge Bases User Guide
- Amazon Bedrock AgentCore User Guide
- Strands Agents: Open Source AI Agents SDK
- Amazon Bedrock Data Automation User Guide