AWS Bedrock AgentCore: Monitoring Architecture for AI-Driven Merchant Banking Application Intake
Observability patterns for agentic tool performance and compliance attribution
The Research That Started This
A paper crossed my desk recently: "AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation" (arXiv:2512.12597).
The core insight is simple but powerful: AI agents call tools, but we can't explain which tools actually mattered for the response. The paper applies game theory—running agents with different tool subsets and measuring response changes to compute fair attribution scores.
For banking, this isn't academic. When a regulator asks "How do you know your OFAC screening actually influenced the decision?", you need better than "the logs show it was called."
This got me thinking about a perfect application.
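The estimator behind this kind of attribution is compact enough to sketch. Below, `response_quality` is a hypothetical scoring function that runs the agent with only a subset of tools enabled and scores the response; the permutation-sampling loop is a standard Monte Carlo Shapley estimate, not the paper's exact implementation.

```python
import random

def shapley_tool_importance(tools, response_quality, samples=2000, seed=0):
    """Monte Carlo estimate of each tool's Shapley value.

    For each sampled permutation of tools, credit each tool with its
    marginal contribution to response quality as it is added.
    """
    rng = random.Random(seed)
    scores = {t: 0.0 for t in tools}
    for _ in range(samples):
        order = tools[:]
        rng.shuffle(order)
        enabled = set()
        prev = response_quality(frozenset(enabled))
        for t in order:
            enabled.add(t)
            cur = response_quality(frozenset(enabled))
            scores[t] += cur - prev  # marginal contribution of t in this order
            prev = cur
    return {t: s / samples for t, s in scores.items()}
```

With a toy quality function that only rewards responses where OFAC screening ran, the estimator assigns all importance to `ScreenOFAC` and none to the irrelevant tool.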
The Merchant Banking Problem
Merchant acquiring—enabling businesses to accept card payments—has an intake problem that's been unsolved for decades.
Here's what typically happens:
┌─────────────────────────────────────────────────────────────┐
│ CURRENT STATE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Business owner visits website │
│ │ │
│ ▼ │
│ Fills "Request a Call" form │
│ │ │
│ ▼ ◀── 4-24 hour wait │
│ Sales rep calls back │
│ │ │
│ ▼ ◀── 25-45 minutes on phone │
│ Rep collects business info, owner details, volume estimates│
│ │ │
│ ▼ ◀── Manual data entry, error-prone │
│ Rep enters data into internal systems │
│ │ │
│ ▼ ◀── Days of email back-and-forth │
│ Chase missing documents │
│ │ │
│ ▼ ◀── Finally reaches underwriting │
│ Application reviewed │
│ │
└─────────────────────────────────────────────────────────────┘
Industry experience suggests these ranges are common:
| Metric | What It Means |
|---|---|
| 60-70% abandonment | Merchants start applications but don't finish—forms are complex, and they rarely have their EIN or articles of incorporation handy |
| $150-300 internal cost | Bank's cost per application: sales rep time, systems, follow-up calls, data entry labor |
| 15-20% rework rate | Data quality issues discovered in underwriting: TIN mismatch, address validation failure, identity verification issues |
| 5-10 business days | First contact to underwriting decision |
Meanwhile, fintech competitors onboard merchants in minutes. The gap is existential.
Why an AI Agent Fits
The intake process isn't a simple form—it's a branching conversation:
Business type determines product recommendations
Expected volume determines pricing tier
Risk profile determines documentation requirements
State of incorporation affects compliance checks
A rigid web form can't handle this. A human rep can, but at $150-300 per application.
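The branching a static form can't express is easy to see in code. This is a hypothetical rule sketch (the function name, thresholds, and document names are all illustrative, not from any real underwriting policy):

```python
# Hypothetical branching rules: what a rigid form struggles to encode,
# but an agent (or a rules engine behind one) handles naturally.
def documentation_requirements(business_type: str,
                               monthly_volume: float,
                               risk_tier: str) -> list:
    docs = ["articles_of_incorporation", "owner_id"]
    if business_type == "sole_prop":
        docs = ["owner_id", "business_license"]  # no articles to file
    if monthly_volume > 100_000:
        docs.append("bank_statements_3mo")       # volume-based tier
    if risk_tier == "high":
        docs.append("processing_history")        # risk-based extras
    return docs
```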
An AI agent guided by tools can:
Recommend products based on business type
Validate data in real-time (catch the TIN mismatch during conversation, not in underwriting)
Run compliance checks in the background
Collect documents via chat with OCR extraction
Submit complete, validated applications
The result: 5-8 minutes instead of 45, $15-30 instead of $150-300, 24/7 availability, and data validated at the point of entry.
But here's the catch: you need to prove the agent is doing what it should.
Enter AWS Bedrock AgentCore
Bedrock AgentCore provides the managed infrastructure. Let me show you the configurations that matter for this use case.
Action Groups: Organizing Tools
Tools are grouped by function. This affects how the model reasons about selection:
PRODUCT SELECTION          BUSINESS VERIFICATION       KYC & COMPLIANCE
├─ RecommendProduct        ├─ ValidateTIN              ├─ VerifyIdentity
├─ GetProductDetails       ├─ VerifyBusinessReg        │  └─ Integrates with
├─ CalculatePricing        └─ LookupMCC                │     third-party KYC
└─ ValidateAddress                                     │     (Jumio, Onfido, etc.)
                                                       ├─ ScreenOFAC
DOCUMENTS                  APPLICATION MGMT            ├─ CheckPEP
├─ CheckExistingDocs       ├─ CheckExistingApps        └─ AssessRisk
│  └─ Is doc already       │  └─ Prevent duplicates
│     on file?             ├─ SaveProgress
├─ CompareDocuments        ├─ ResumeApplication
│  └─ Old vs new changes?  └─ SubmitApplication
├─ ProcessDocument (OCR)
└─ ValidateDocumentData
Key tools often missed:
CheckExistingDocuments → Before requesting a driver's license, check if one is already on file
CompareDocuments → If a document exists, compare old vs. new for material changes
CheckExistingApplications → Detect pending or recently completed applications before starting a new one
VerifyIdentity → KYC via third-party providers, not just OFAC
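Each of these tools is a Lambda behind an Action Group. A minimal sketch of CheckExistingDocuments is below; the event and response shapes follow the Bedrock Agents function-details Lambda contract as I understand it, and `DOCS_ON_FILE` is a stand-in for the real DynamoDB lookup:

```python
import json

# Stand-in for the DynamoDB documents table (illustrative data).
DOCS_ON_FILE = {
    ("MERCH-001", "drivers_license"): {"doc_id": "DOC-17", "address": "123 Main St"},
}

def lambda_handler(event, context):
    """Sketch of the CheckExistingDocuments action-group Lambda."""
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    found = DOCS_ON_FILE.get((params["merchant_id"], params["document_type"]))
    body = {"on_file": found is not None, "document": found}
    # Function-details response format for Bedrock Agents action groups.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(body)}}
            },
        },
    }
```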
Guardrails: PII and Compliance Controls
This is where Bedrock shines for financial services:
Guardrail:
  SensitiveInformationPolicy:
    PIIEntities:
      - Type: SSN
        Action: ANONYMIZE    # Replace with [SSN] in responses
      - Type: CREDIT_CARD_NUMBER
        Action: BLOCK        # Never include
      - Type: BANK_ACCOUNT_NUMBER
        Action: ANONYMIZE
    RegexPatterns:
      - Name: EIN_PATTERN
        Pattern: '\b\d{2}-\d{7}\b'
        Action: ANONYMIZE
  TopicPolicy:
    DeniedTopics:
      - Name: COMPETITOR_DISCUSSION
        Definition: "Discussion of competitor products or pricing"
        Action: BLOCK
      - Name: INVESTMENT_ADVICE
        Action: BLOCK
  WordPolicy:
    CustomWordLists:
      - Words: ["guaranteed approval", "instant approval"]
        Action: BLOCK        # Compliance violation
The agent collects SSNs, EINs, bank accounts—these never appear in responses, even when confirming back to the merchant. That's configuration, not code.
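To make the ANONYMIZE action concrete, here is a local illustration of the behavior (this is not the Bedrock runtime, just regex patterns mirroring the config above; the SSN pattern is my assumption of the standard `NNN-NN-NNNN` shape):

```python
import re

# Illustrative patterns mirroring the guardrail config: matched spans are
# replaced with placeholders before any text reaches the merchant.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),  # SSN (assumed format)
    (re.compile(r"\b\d{2}-\d{7}\b"), "[EIN]"),        # EIN_PATTERN from config
]

def anonymize(text: str) -> str:
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```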
The Monitoring Architecture
Here's where AgentSHAP principles become practical. You need to prove tools actually influenced outcomes.
Layer 1: Comprehensive Tool Logging
Every invocation generates structured logs:
{
  "timestamp": "2025-01-13T14:32:18Z",
  "session_id": "ses_abc123",
  "application_id": "APP-2025-00892",
  "phase": "kyc_verification",
  "tool_invocation": {
    "action_group": "KYCCompliance",
    "tool_name": "ScreenOFAC",
    "input_hash": "sha256:a1b2c3...",  // Hash, NOT actual PII
    "execution_ms": 892,
    "output_summary": {
      "match_found": false,
      "lists_checked": ["SDN", "CONS"]
    }
  },
  "preceding_context": {
    "user_message_type": "owner_info_provided",
    "agent_intent": "run_compliance_screening"
  }
}
Critical: the record stores an input_hash, never the raw input, so PII never lands in logs.
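A minimal sketch of the log emitter, using only the standard library (the field names match the record above; the canonical-JSON hashing step is my assumption of how to make the hash stable across key orderings):

```python
import hashlib
import json
from datetime import datetime, timezone

def log_tool_invocation(session_id, application_id, phase,
                        action_group, tool_name, tool_input,
                        execution_ms, output_summary):
    """Build the structured log record; the raw tool input is hashed, never stored."""
    # Canonicalize input before hashing so equal inputs hash equally.
    canonical = json.dumps(tool_input, sort_keys=True).encode()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "session_id": session_id,
        "application_id": application_id,
        "phase": phase,
        "tool_invocation": {
            "action_group": action_group,
            "tool_name": tool_name,
            "input_hash": "sha256:" + hashlib.sha256(canonical).hexdigest(),
            "execution_ms": execution_ms,
            "output_summary": output_summary,
        },
    }
    return json.dumps(record)
```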
Layer 2: Analytics Pipeline
CloudWatch Logs
│
▼
Kinesis Firehose (transforms to Parquet)
│
▼
S3 (partitioned by date)
│
▼
Athena (SQL queries)
│
▼
QuickSight (dashboards)
Layer 3: QuickSight Dashboards
Dashboard 1: Application Funnel
Started vs. completed by day
Abandonment by phase (where do merchants drop?)
Time spent per phase
Dashboard 2: Tool Performance
Invocation frequency by action group
Latency distribution (p50, p95, p99)
Error rates
Dashboard 3: Compliance Monitoring
OFAC screening completion rate (must be 100%)
KYC verification outcomes
Manual review triggers
Dashboard 4: Tool Attribution (AgentSHAP-inspired)
-- Which tools correlate with successful applications?
SELECT
  tool_name,
  phase,
  COUNT(*) AS invocations,
  AVG(CASE WHEN a.outcome = 'APPROVED' THEN 1.0 ELSE 0.0 END) AS approval_rate
FROM tool_invocations t
JOIN applications a ON t.application_id = a.id
GROUP BY tool_name, phase
ORDER BY phase, approval_rate DESC;
The Attribution Alert System
This is where AgentSHAP thinking pays off. Define expected importance by phase:
| Phase | Must Be HIGH Importance | Alert If LOW |
|---|---|---|
| KYC/Compliance | VerifyIdentity, ScreenOFAC | Either one |
| Business Verification | ValidateTIN, VerifyBusinessReg | ValidateTIN |
| Document Collection | ProcessDocument | - |
| Submission | CheckExistingApplications | - |
If OFAC shows low importance during KYC phase → investigate immediately.
Either:
Tool isn't being called (agent bug)
Tool output isn't influencing agent behavior (prompt issue)
Both are compliance failures. Both are now detectable.
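The alert check itself is a few lines. The phase names, tool lists, and the 0.2 threshold here are hypothetical placeholders; in practice they come from the expected-importance table and whatever baseline the attribution analysis establishes:

```python
# Hypothetical expected-high-importance tools per phase (from the table above).
EXPECTED_HIGH = {
    "kyc_verification": ["VerifyIdentity", "ScreenOFAC"],
    "business_verification": ["ValidateTIN"],
}

def attribution_alerts(phase, importance, threshold=0.2):
    """Return compliance-critical tools whose attribution fell below threshold."""
    return [
        tool
        for tool in EXPECTED_HIGH.get(phase, ())
        if importance.get(tool, 0.0) < threshold
    ]
```

A tool that was never invoked gets importance 0.0 by default, so both failure modes (not called, or called but ignored) surface through the same check.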
Real Conversations the Agent Handles
Duplicate application detected:
Agent: [Calls CheckExistingApplications]
Agent: "I found an existing application for Tasty Tacos LLC
submitted January 8th, currently in underwriting.
Would you like to check that status, or start
a new application for a different location?"
Document already on file:
Merchant: [Uploads driver's license]
Agent: [Calls CheckExistingDocuments → found]
[Calls CompareDocuments → address changed]
Agent: "I have a license on file from your previous application,
but the address is different—123 Main St vs 456 Oak St.
Should I update your records with the new address?"
KYC verification fails:
Agent: [Calls VerifyIdentity → FAILED, confidence 0.3]
Agent: "I couldn't verify your identity with the information
provided. This sometimes happens with:
• SSN typos
• Name variations (maiden name, suffix)
• Address mismatches
Would you like to re-enter, or speak with a specialist?"
Architecture Summary
┌────────────────────────────────────────────────────────────────────────┐
│ MERCHANT INTAKE AGENT │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CloudFront + S3 (React) ──▶ API Gateway (WebSocket) │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ BEDROCK AGENTCORE │ │
│ │ • Claude 3.5 Sonnet (orchestration) │ │
│ │ • Action Groups (tools via Lambda) │ │
│ │ • Knowledge Base (products, pricing, MCCs) │ │
│ │ • Guardrails (PII filtering, topic blocking) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ DynamoDB Textract External APIs │
│ (sessions, (OCR) (IRS, KYC, OFAC, │
│ applications) State SOS) │
│ │
├────────────────────────────────────────────────────────────────────────┤
│ MONITORING LAYER │
│ │
│ CloudWatch ──▶ Firehose ──▶ S3 ──▶ Athena ──▶ QuickSight │
│ │ │ │
│ └──────────────────────────────────────────────┘ │
│ Attribution Analysis │
│ │
└────────────────────────────────────────────────────────────────────────┘
Key Takeaways
Tools need structure — Action Groups affect how the model reasons about tool selection
Guardrails are configuration — PII filtering, topic blocking defined declaratively, not in code
Observability enables attribution — Log tool invocations with context; hash PII
Compliance needs proof — AgentSHAP principles turn "we logged it" into "we can prove it mattered"
Handle edge cases explicitly — Existing applications, duplicate documents, verification failures
Monitor tool importance by phase — Alert when compliance tools show unexpectedly low attribution
The Business Impact
Estimated improvements based on industry implementations:
| Metric | Before (Manual) | After (Agent) |
|---|---|---|
| Time per application | 25-45 minutes | 5-8 minutes |
| Internal cost | $150-300 | $15-30 |
| Abandonment rate | 60-70% | 20-30% |
| Rework rate | 15-20% | <5% |
| Availability | Business hours | 24/7 |
| Compliance auditability | "We have logs" | Provable attribution |
Resources:
Research: AgentSHAP (arXiv:2512.12597)
Research Code: github.com/GenAISHAP/TokenSHAP