AWS Bedrock AgentCore: Monitoring Architecture for AI-Driven Merchant Banking Application Intake
Observability patterns for agentic tool performance and compliance attribution
The Research That Started This
A paper crossed my desk recently: "AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation" (arXiv:2512.12597).
The core insight is simple but powerful: AI agents call tools, but we can't explain which tools actually mattered for the response. The paper applies game theory—running agents with different tool subsets and measuring response changes to compute fair attribution scores.
For banking, this isn't academic. When a regulator asks "How do you know your OFAC screening actually influenced the decision?", you need better than "the logs show it was called."
This got me thinking about a perfect application.
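The estimator behind this kind of attribution is compact enough to sketch. Below, `response_quality` is a hypothetical scoring function that runs the agent with only a subset of tools enabled and scores the response; the permutation-sampling loop is a standard Monte Carlo Shapley estimate, not the paper's exact implementation.

```python
import random

def shapley_tool_importance(tools, response_quality, samples=2000, seed=0):
    """Monte Carlo estimate of each tool's Shapley value.

    For each sampled permutation of tools, credit each tool with its
    marginal contribution to response quality as it is added.
    """
    rng = random.Random(seed)
    scores = {t: 0.0 for t in tools}
    for _ in range(samples):
        order = tools[:]
        rng.shuffle(order)
        enabled = set()
        prev = response_quality(frozenset(enabled))
        for t in order:
            enabled.add(t)
            cur = response_quality(frozenset(enabled))
            scores[t] += cur - prev  # marginal contribution of t in this order
            prev = cur
    return {t: s / samples for t, s in scores.items()}
```

With a toy quality function that only rewards responses where OFAC screening ran, the estimator assigns all importance to `ScreenOFAC` and none to the irrelevant tool.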
The Merchant Banking Problem
Merchant acquiring—enabling businesses to accept card payments—has an intake problem that's been unsolved for decades.
Here's what typically happens:
┌─────────────────────────────────────────────────────────────┐
│ CURRENT STATE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Business owner visits website │
│ │ │
│ ▼ │
│ Fills "Request a Call" form │
│ │ │
│ ▼ ◀── 4-24 hour wait │
│ Sales rep calls back │
│ │ │
│ ▼ ◀── 25-45 minutes on phone │
│ Rep collects business info, owner details, volume estimates│
│ │ │
│ ▼ ◀── Manual data entry, error-prone │
│ Rep enters data into internal systems │
│ │ │
│ ▼ ◀── Days of email back-and-forth │
│ Chase missing documents │
│ │ │
│ ▼ ◀── Finally reaches underwriting │
│ Application reviewed │
│ │
└─────────────────────────────────────────────────────────────┘
Industry experience suggests these ranges are common:
| Metric | What It Means |
|---|---|
| 60-70% abandonment | Merchants start applications but don't finish—forms are complex, and they rarely have their EIN or articles of incorporation handy |
| $150-300 internal cost | Bank's cost per application: sales rep time, systems, follow-up calls, data entry labor |
| 15-20% rework rate | Data quality issues discovered in underwriting: TIN mismatch, address validation failure, identity verification issues |
| 5-10 business days | First contact to underwriting decision |
Meanwhile, fintech competitors onboard merchants in minutes. The gap is existential.
Why an AI Agent Fits
The intake process isn't a simple form—it's a branching conversation:
Business type determines product recommendations
Expected volume determines pricing tier
Risk profile determines documentation requirements
State of incorporation affects compliance checks
A rigid web form can't handle this. A human rep can, but at $150-300 per application.
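The branching a static form can't express is easy to see in code. This is a hypothetical rule sketch (the function name, thresholds, and document names are all illustrative, not from any real underwriting policy):

```python
# Hypothetical branching rules: what a rigid form struggles to encode,
# but an agent (or a rules engine behind one) handles naturally.
def documentation_requirements(business_type: str,
                               monthly_volume: float,
                               risk_tier: str) -> list:
    docs = ["articles_of_incorporation", "owner_id"]
    if business_type == "sole_prop":
        docs = ["owner_id", "business_license"]  # no articles to file
    if monthly_volume > 100_000:
        docs.append("bank_statements_3mo")       # volume-based tier
    if risk_tier == "high":
        docs.append("processing_history")        # risk-based extras
    return docs
```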
An AI agent guided by tools can:
Recommend products based on business type
Validate data in real-time (catch the TIN mismatch during conversation, not in underwriting)
Run compliance checks in the background
Collect documents via chat with OCR extraction
Submit complete, validated applications
The result: 5-8 minutes instead of 45, $15-30 instead of $150-300, 24/7 availability, and data validated at the point of entry.
But here's the catch: you need to prove the agent is doing what it should.
Enter AWS Bedrock AgentCore
Bedrock AgentCore provides the managed infrastructure. Let me show you the configurations that matter for this use case.
Action Groups: Organizing Tools
Tools are grouped by function. This affects how the model reasons about selection:
PRODUCT SELECTION          BUSINESS VERIFICATION       KYC & COMPLIANCE
├─ RecommendProduct        ├─ ValidateTIN              ├─ VerifyIdentity
├─ GetProductDetails       ├─ VerifyBusinessReg        │  └─ Integrates with
├─ CalculatePricing        └─ LookupMCC                │     third-party KYC
└─ ValidateAddress                                     │     (Jumio, Onfido, etc.)
                                                       ├─ ScreenOFAC
DOCUMENTS                  APPLICATION MGMT            ├─ CheckPEP
├─ CheckExistingDocs       ├─ CheckExistingApps        └─ AssessRisk
│  └─ Is doc already       │  └─ Prevent duplicates
│     on file?             ├─ SaveProgress
├─ CompareDocuments        ├─ ResumeApplication
│  └─ Old vs new changes?  └─ SubmitApplication
├─ ProcessDocument (OCR)
└─ ValidateDocumentData
Key tools often missed:
CheckExistingDocuments → Before requesting a driver's license, check if one is already on file
CompareDocuments → If a document exists, compare old vs. new for material changes
CheckExistingApplications → Detect pending or recently completed applications before starting a new one
VerifyIdentity → KYC via third-party providers, not just OFAC
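Each of these tools is a Lambda behind an Action Group. A minimal sketch of CheckExistingDocuments is below; the event and response shapes follow the Bedrock Agents function-details Lambda contract as I understand it, and `DOCS_ON_FILE` is a stand-in for the real DynamoDB lookup:

```python
import json

# Stand-in for the DynamoDB documents table (illustrative data).
DOCS_ON_FILE = {
    ("MERCH-001", "drivers_license"): {"doc_id": "DOC-17", "address": "123 Main St"},
}

def lambda_handler(event, context):
    """Sketch of the CheckExistingDocuments action-group Lambda."""
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    found = DOCS_ON_FILE.get((params["merchant_id"], params["document_type"]))
    body = {"on_file": found is not None, "document": found}
    # Function-details response format for Bedrock Agents action groups.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(body)}}
            },
        },
    }
```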
Guardrails: PII and Compliance Controls
This is where Bedrock shines for financial services:
Guardrail:
  SensitiveInformationPolicy:
    PIIEntities:
      - Type: SSN
        Action: ANONYMIZE    # Replace with [SSN] in responses
      - Type: CREDIT_CARD_NUMBER
        Action: BLOCK        # Never include
      - Type: BANK_ACCOUNT_NUMBER
        Action: ANONYMIZE
    RegexPatterns:
      - Name: EIN_PATTERN
        Pattern: '\b\d{2}-\d{7}\b'
        Action: ANONYMIZE
  TopicPolicy:
    DeniedTopics:
      - Name: COMPETITOR_DISCUSSION
        Definition: "Discussion of competitor products or pricing"
        Action: BLOCK
      - Name: INVESTMENT_ADVICE
        Action: BLOCK
  WordPolicy:
    CustomWordLists:
      - Words: ["guaranteed approval", "instant approval"]
        Action: BLOCK        # Compliance violation
The agent collects SSNs, EINs, bank accounts—these never appear in responses, even when confirming back to the merchant. That's configuration, not code.
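To make the ANONYMIZE action concrete, here is a local illustration of the behavior (this is not the Bedrock runtime, just regex patterns mirroring the config above; the SSN pattern is my assumption of the standard `NNN-NN-NNNN` shape):

```python
import re

# Illustrative patterns mirroring the guardrail config: matched spans are
# replaced with placeholders before any text reaches the merchant.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),  # SSN (assumed format)
    (re.compile(r"\b\d{2}-\d{7}\b"), "[EIN]"),        # EIN_PATTERN from config
]

def anonymize(text: str) -> str:
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```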
The Monitoring Architecture
Here's where AgentSHAP principles become practical. You need to prove tools actually influenced outcomes.
Layer 1: Comprehensive Tool Logging
Every invocation generates structured logs:
{
  "timestamp": "2025-01-13T14:32:18Z",
  "session_id": "ses_abc123",
  "application_id": "APP-2025-00892",
  "phase": "kyc_verification",
  "tool_invocation": {
    "action_group": "KYCCompliance",
    "tool_name": "ScreenOFAC",
    "input_hash": "sha256:a1b2c3...",  // Hash, NOT actual PII
    "execution_ms": 892,
    "output_summary": {
      "match_found": false,
      "lists_checked": ["SDN", "CONS"]
    }
  },
  "preceding_context": {
    "user_message_type": "owner_info_provided",
    "agent_intent": "run_compliance_screening"
  }
}
Critical: the record stores an input_hash, never the raw input, so PII never lands in logs.
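A minimal sketch of the log emitter, using only the standard library (the field names match the record above; the canonical-JSON hashing step is my assumption of how to make the hash stable across key orderings):

```python
import hashlib
import json
from datetime import datetime, timezone

def log_tool_invocation(session_id, application_id, phase,
                        action_group, tool_name, tool_input,
                        execution_ms, output_summary):
    """Build the structured log record; the raw tool input is hashed, never stored."""
    # Canonicalize input before hashing so equal inputs hash equally.
    canonical = json.dumps(tool_input, sort_keys=True).encode()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "session_id": session_id,
        "application_id": application_id,
        "phase": phase,
        "tool_invocation": {
            "action_group": action_group,
            "tool_name": tool_name,
            "input_hash": "sha256:" + hashlib.sha256(canonical).hexdigest(),
            "execution_ms": execution_ms,
            "output_summary": output_summary,
        },
    }
    return json.dumps(record)
```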
Layer 2: Analytics Pipeline
CloudWatch Logs
│
▼
Kinesis Firehose (transforms to Parquet)
│
▼
S3 (partitioned by date)
│
▼
Athena (SQL queries)
│
▼
QuickSight (dashboards)
Layer 3: QuickSight Dashboards
Dashboard 1: Application Funnel
Started vs. completed by day
Abandonment by phase (where do merchants drop?)
Time spent per phase
Dashboard 2: Tool Performance
Invocation frequency by action group
Latency distribution (p50, p95, p99)
Error rates
Dashboard 3: Compliance Monitoring
OFAC screening completion rate (must be 100%)
KYC verification outcomes
Manual review triggers
Dashboard 4: Tool Attribution (AgentSHAP-inspired)
-- Which tools correlate with successful applications?
SELECT
  tool_name,
  phase,
  COUNT(*) AS invocations,
  AVG(CASE WHEN a.outcome = 'APPROVED' THEN 1.0 ELSE 0.0 END) AS approval_rate
FROM tool_invocations t
JOIN applications a ON t.application_id = a.id
GROUP BY tool_name, phase
ORDER BY phase, approval_rate DESC;
The Attribution Alert System
This is where AgentSHAP thinking pays off. Define expected importance by phase:
| Phase | Must Be HIGH Importance | Alert If LOW |
|---|---|---|
| KYC/Compliance | VerifyIdentity, ScreenOFAC | Either one |
| Business Verification | ValidateTIN, VerifyBusinessReg | ValidateTIN |
| Document Collection | ProcessDocument | - |
| Submission | CheckExistingApplications | - |
If OFAC shows low importance during KYC phase → investigate immediately.
Either:
Tool isn't being called (agent bug)
Tool output isn't influencing agent behavior (prompt issue)
Both are compliance failures. Both are now detectable.
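The alert check itself is a few lines. The phase names, tool lists, and the 0.2 threshold here are hypothetical placeholders; in practice they come from the expected-importance table and whatever baseline the attribution analysis establishes:

```python
# Hypothetical expected-high-importance tools per phase (from the table above).
EXPECTED_HIGH = {
    "kyc_verification": ["VerifyIdentity", "ScreenOFAC"],
    "business_verification": ["ValidateTIN"],
}

def attribution_alerts(phase, importance, threshold=0.2):
    """Return compliance-critical tools whose attribution fell below threshold."""
    return [
        tool
        for tool in EXPECTED_HIGH.get(phase, ())
        if importance.get(tool, 0.0) < threshold
    ]
```

A tool that was never invoked gets importance 0.0 by default, so both failure modes (not called, or called but ignored) surface through the same check.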
Real Conversations the Agent Handles
Duplicate application detected:
Agent: [Calls CheckExistingApplications]
Agent: "I found an existing application for Tasty Tacos LLC
submitted January 8th, currently in underwriting.
Would you like to check that status, or start
a new application for a different location?"
Document already on file:
Merchant: [Uploads driver's license]
Agent: [Calls CheckExistingDocuments → found]
[Calls CompareDocuments → address changed]
Agent: "I have a license on file from your previous application,
but the address is different—123 Main St vs 456 Oak St.
Should I update your records with the new address?"
KYC verification fails:
Agent: [Calls VerifyIdentity → FAILED, confidence 0.3]
Agent: "I couldn't verify your identity with the information
provided. This sometimes happens with:
• SSN typos
• Name variations (maiden name, suffix)
• Address mismatches
Would you like to re-enter, or speak with a specialist?"
Architecture Summary
┌────────────────────────────────────────────────────────────────────────┐
│ MERCHANT INTAKE AGENT │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CloudFront + S3 (React) ──▶ API Gateway (WebSocket) │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ BEDROCK AGENTCORE │ │
│ │ • Claude 3.5 Sonnet (orchestration) │ │
│ │ • Action Groups (tools via Lambda) │ │
│ │ • Knowledge Base (products, pricing, MCCs) │ │
│ │ • Guardrails (PII filtering, topic blocking) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ DynamoDB Textract External APIs │
│ (sessions, (OCR) (IRS, KYC, OFAC, │
│ applications) State SOS) │
│ │
├────────────────────────────────────────────────────────────────────────┤
│ MONITORING LAYER │
│ │
│ CloudWatch ──▶ Firehose ──▶ S3 ──▶ Athena ──▶ QuickSight │
│ │ │ │
│ └──────────────────────────────────────────────┘ │
│ Attribution Analysis │
│ │
└────────────────────────────────────────────────────────────────────────┘
Key Takeaways
Tools need structure — Action Groups affect how the model reasons about tool selection
Guardrails are configuration — PII filtering, topic blocking defined declaratively, not in code
Observability enables attribution — Log tool invocations with context; hash PII
Compliance needs proof — AgentSHAP principles turn "we logged it" into "we can prove it mattered"
Handle edge cases explicitly — Existing applications, duplicate documents, verification failures
Monitor tool importance by phase — Alert when compliance tools show unexpectedly low attribution
The Business Impact
Estimated improvements based on industry implementations:
| Metric | Before (Manual) | After (Agent) |
|---|---|---|
| Time per application | 25-45 minutes | 5-8 minutes |
| Internal cost | $150-300 | $15-30 |
| Abandonment rate | 60-70% | 20-30% |
| Rework rate | 15-20% | <5% |
| Availability | Business hours | 24/7 |
| Compliance auditability | "We have logs" | Provable attribution |
Resources:
Research: AgentSHAP (arXiv:2512.12597)
Research Code: github.com/GenAISHAP/TokenSHAP