
AWS Bedrock AgentCore: Monitoring Architecture for AI-Driven Merchant Banking Application Intake

Observability patterns for agentic tool performance and compliance attribution

Reading time: 5 minutes

The Research That Started This

A paper crossed my desk recently: "AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation" (arXiv:2512.12597).

The core insight is simple but powerful: AI agents call tools, but we can't explain which tools actually mattered for the response. The paper applies game theory—running agents with different tool subsets and measuring response changes to compute fair attribution scores.
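The estimation itself is easy to sketch. A minimal Monte Carlo version, assuming a hypothetical `score_fn` harness that re-runs the agent with only a given tool subset enabled and returns a response-quality score (the harness, not the math, is the hard part):

```python
import random

def monte_carlo_shapley(tools, score_fn, samples=200, seed=0):
    """Estimate each tool's Shapley value by sampling random tool orderings.

    score_fn(subset) is assumed to run the agent with only that tool subset
    enabled and return a response-quality score (hypothetical harness).
    """
    rng = random.Random(seed)
    values = {t: 0.0 for t in tools}
    for _ in range(samples):
        order = tools[:]
        rng.shuffle(order)
        enabled = []
        prev = score_fn(frozenset())  # baseline: agent with no tools
        for tool in order:
            enabled.append(tool)
            curr = score_fn(frozenset(enabled))
            values[tool] += curr - prev  # marginal contribution in this ordering
            prev = curr
    return {t: v / samples for t, v in values.items()}
```

Averaging marginal contributions over random orderings is exactly the Shapley definition; sampling makes it tractable when 2^n subsets is too many.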

For banking, this isn't academic. When a regulator asks "How do you know your OFAC screening actually influenced the decision?", you need better than "the logs show it was called."

This got me thinking about a perfect application.

The Merchant Banking Problem

Merchant acquiring—enabling businesses to accept card payments—has an intake problem that's been unsolved for decades.

Here's what typically happens:

┌─────────────────────────────────────────────────────────────┐
│                    CURRENT STATE                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Business owner visits website                              │
│              │                                              │
│              ▼                                              │
│  Fills "Request a Call" form                                │
│              │                                              │
│              ▼  ◀── 4-24 hour wait                          │
│  Sales rep calls back                                       │
│              │                                              │
│              ▼  ◀── 25-45 minutes on phone                  │
│  Rep collects business info, owner details, volume estimates│
│              │                                              │
│              ▼  ◀── Manual data entry, error-prone          │
│  Rep enters data into internal systems                      │
│              │                                              │
│              ▼  ◀── Days of email back-and-forth            │
│  Chase missing documents                                    │
│              │                                              │
│              ▼  ◀── Finally reaches underwriting            │
│  Application reviewed                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Industry experience suggests these ranges are common:

| Metric | What It Means |
| --- | --- |
| 60-70% abandonment | Merchants start applications but don't finish: forms are complex, and owners rarely have an EIN or articles of incorporation handy |
| $150-300 internal cost | Bank's cost per application: sales rep time, systems, follow-up calls, data entry labor |
| 15-20% rework rate | Data quality issues discovered in underwriting: TIN mismatch, address validation failure, identity verification issues |
| 5-10 business days | First contact to underwriting decision |

Meanwhile, fintech competitors onboard merchants in minutes. The gap is existential.

Why an AI Agent Fits

The intake process isn't a simple form—it's a branching conversation:

  • Business type determines product recommendations

  • Expected volume determines pricing tier

  • Risk profile determines documentation requirements

  • State of incorporation affects compliance checks

A rigid web form can't handle this. A human rep can, but at $150-300 per application.

An AI agent guided by tools can:

  • Recommend products based on business type

  • Validate data in real-time (catch the TIN mismatch during conversation, not in underwriting)

  • Run compliance checks in the background

  • Collect documents via chat with OCR extraction

  • Submit complete, validated applications

The result: 5-8 minutes instead of 45. $15-30 instead of $150-300. 24/7 availability. Data validated at the point of entry rather than discovered broken in underwriting.

But here's the catch: you need to prove the agent is doing what it should.

Enter AWS Bedrock AgentCore

Bedrock AgentCore provides the managed infrastructure. Let me show you the configurations that matter for this use case.

Action Groups: Organizing Tools

Tools are grouped by function. This affects how the model reasons about selection:

PRODUCT SELECTION          BUSINESS VERIFICATION      KYC & COMPLIANCE
├─ RecommendProduct        ├─ ValidateTIN             ├─ VerifyIdentity
├─ GetProductDetails       ├─ VerifyBusinessReg       │   └─ Integrates with
├─ CalculatePricing        ├─ LookupMCC               │      third-party KYC
                           └─ ValidateAddress         │      (Jumio, Onfido, etc.)
                                                      ├─ ScreenOFAC
DOCUMENTS                  APPLICATION MGMT           ├─ CheckPEP
├─ CheckExistingDocs       ├─ CheckExistingApps       └─ AssessRisk
│   └─ Is doc already      │   └─ Prevent duplicates
│      on file?            ├─ SaveProgress
├─ CompareDocuments        ├─ ResumeApplication
│   └─ Old vs new changes? └─ SubmitApplication
├─ ProcessDocument (OCR)
└─ ValidateDocumentData

Key tools often missed:

  • CheckExistingDocuments → Before requesting a driver's license, check if one is already on file

  • CompareDocuments → If document exists, compare old vs. new for material changes

  • CheckExistingApplications → Detect pending or recently completed applications before starting new one

  • VerifyIdentity → KYC via third-party providers, not just OFAC
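In a Bedrock deployment each action group resolves to a Lambda executor. A local sketch of the same grouping and dispatch, with stub handlers standing in for the real integrations (all handler bodies here are hypothetical):

```python
from typing import Callable, Dict

# Local registry mirroring the action-group layout above; in Bedrock each
# group maps to a Lambda executor rather than an in-process dict.
ACTION_GROUPS: Dict[str, Dict[str, Callable[[dict], dict]]] = {
    "KYCCompliance": {
        # Stubs: real handlers call OFAC screening and third-party KYC APIs.
        "ScreenOFAC": lambda params: {"match_found": False,
                                      "lists_checked": ["SDN", "CONS"]},
        "VerifyIdentity": lambda params: {"verified": True, "confidence": 0.97},
    },
    "ApplicationMgmt": {
        "CheckExistingApps": lambda params: {"pending": []},
    },
}

def dispatch(action_group: str, tool_name: str, params: dict) -> dict:
    """Route a tool invocation to its handler; fail loudly on unknown tools."""
    try:
        handler = ACTION_GROUPS[action_group][tool_name]
    except KeyError:
        raise ValueError(f"Unknown tool {action_group}/{tool_name}")
    return handler(params)
```

Keeping the grouping explicit in one structure also gives monitoring a single source of truth for "which tools exist in which phase."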

Guardrails: PII and Compliance Controls

This is where Bedrock shines for financial services:

Guardrail:
  SensitiveInformationPolicy:
    PIIEntities:
      - Type: SSN
        Action: ANONYMIZE    # Replace with [SSN] in responses
      - Type: CREDIT_CARD_NUMBER
        Action: BLOCK        # Never include
      - Type: BANK_ACCOUNT_NUMBER
        Action: ANONYMIZE
    
    RegexPatterns:
      - Name: EIN_PATTERN
        Pattern: '\b\d{2}-\d{7}\b'
        Action: ANONYMIZE

  TopicPolicy:
    DeniedTopics:
      - Name: COMPETITOR_DISCUSSION
        Definition: "Discussion of competitor products or pricing"
        Action: BLOCK
      - Name: INVESTMENT_ADVICE
        Action: BLOCK

  WordPolicy:
    CustomWordLists:
      - Words: ["guaranteed approval", "instant approval"]
        Action: BLOCK  # Compliance violation

The agent collects SSNs, EINs, bank accounts—these never appear in responses, even when confirming back to the merchant. That's configuration, not code.
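The guardrail runs inside Bedrock, not in your code, but the ANONYMIZE behavior is easy to illustrate locally. A sketch mirroring the SSN rule and the EIN_PATTERN regex above (an illustration, not the managed implementation):

```python
import re

# Illustrative local equivalent of the ANONYMIZE action; the managed
# guardrail applies these filters inside Bedrock before any response leaves.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),  # SSN: 123-45-6789
    (re.compile(r"\b\d{2}-\d{7}\b"), "[EIN]"),        # matches EIN_PATTERN above
]

def anonymize(text: str) -> str:
    """Replace matched PII with placeholders, leaving other text untouched."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```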

The Monitoring Architecture

Here's where AgentSHAP principles become practical. You need to prove tools actually influenced outcomes.

Layer 1: Comprehensive Tool Logging

Every invocation generates structured logs:

{
  "timestamp": "2025-01-13T14:32:18Z",
  "session_id": "ses_abc123",
  "application_id": "APP-2025-00892",
  "phase": "kyc_verification",
  
  "tool_invocation": {
    "action_group": "KYCCompliance",
    "tool_name": "ScreenOFAC",
    "input_hash": "sha256:a1b2c3...",  // Hash, NOT actual PII
    "execution_ms": 892,
    "output_summary": {
      "match_found": false,
      "lists_checked": ["SDN", "CONS"]
    }
  },
  
  "preceding_context": {
    "user_message_type": "owner_info_provided",
    "agent_intent": "run_compliance_screening"
  }
}

Critical: log the input_hash, never the raw input. PII never lands in logs.
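A sketch of the log emitter that enforces this. Field names follow the JSON example above; `tool_log_entry` is a hypothetical helper, not an AgentCore API:

```python
import hashlib
import json
from datetime import datetime, timezone

def tool_log_entry(session_id, application_id, phase, action_group,
                   tool_name, tool_input: dict, execution_ms: int,
                   output_summary: dict) -> str:
    """Emit one structured log line; the tool input is hashed, never stored."""
    digest = hashlib.sha256(
        json.dumps(tool_input, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "session_id": session_id,
        "application_id": application_id,
        "phase": phase,
        "tool_invocation": {
            "action_group": action_group,
            "tool_name": tool_name,
            "input_hash": f"sha256:{digest}",  # hash, NOT actual PII
            "execution_ms": execution_ms,
            "output_summary": output_summary,
        },
    })
```

The hash still lets you prove two invocations received identical inputs, which matters for audit, without ever persisting the SSN itself.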

Layer 2: Analytics Pipeline

CloudWatch Logs
      │
      ▼
Kinesis Firehose (transforms to Parquet)
      │
      ▼
S3 (partitioned by date)
      │
      ▼
Athena (SQL queries)
      │
      ▼
QuickSight (dashboards)
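Most hops in this pipeline are managed; the code you typically own is the Firehose transformation Lambda that flattens the nested log before Parquet conversion. A sketch using the standard Firehose data-transformation record contract (the flattened field names assume the log schema above):

```python
import base64
import json

def lambda_handler(event, context):
    """Kinesis Firehose transformation: flatten nested tool logs for Parquet."""
    out = []
    for record in event["records"]:
        raw = json.loads(base64.b64decode(record["data"]))
        inv = raw.get("tool_invocation", {})
        flat = {
            "timestamp": raw.get("timestamp"),
            "session_id": raw.get("session_id"),
            "application_id": raw.get("application_id"),
            "phase": raw.get("phase"),
            "action_group": inv.get("action_group"),
            "tool_name": inv.get("tool_name"),
            "execution_ms": inv.get("execution_ms"),
        }
        out.append({
            "recordId": record["recordId"],  # must echo Firehose's recordId
            "result": "Ok",
            "data": base64.b64encode((json.dumps(flat) + "\n").encode()).decode(),
        })
    return {"records": out}
```

Flattening here is what makes the Athena queries below simple: every attribution query hits plain columns instead of nested JSON.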

Layer 3: QuickSight Dashboards

Dashboard 1: Application Funnel

  • Started vs. completed by day

  • Abandonment by phase (where do merchants drop?)

  • Time spent per phase

Dashboard 2: Tool Performance

  • Invocation frequency by action group

  • Latency distribution (p50, p95, p99)

  • Error rates

Dashboard 3: Compliance Monitoring

  • OFAC screening completion rate (must be 100%)

  • KYC verification outcomes

  • Manual review triggers

Dashboard 4: Tool Attribution (AgentSHAP-inspired)

-- Which tools correlate with successful applications?
SELECT 
  tool_name,
  phase,
  COUNT(*) as invocations,
  AVG(CASE WHEN outcome='APPROVED' THEN 1.0 ELSE 0.0 END) as approval_rate
FROM tool_invocations t
JOIN applications a ON t.application_id = a.id
GROUP BY tool_name, phase
ORDER BY phase, approval_rate DESC;

The Attribution Alert System

This is where AgentSHAP thinking pays off. Define expected importance by phase:

| Phase | Must Be HIGH Importance | Alert If LOW |
| --- | --- | --- |
| KYC/Compliance | VerifyIdentity, ScreenOFAC | Either one |
| Business Verification | ValidateTIN, VerifyBusinessReg | ValidateTIN |
| Document Collection | ProcessDocument | - |
| Submission | CheckExistingApplications | - |

If OFAC shows low importance during KYC phase → investigate immediately.

Either:

  1. Tool isn't being called (agent bug)

  2. Tool output isn't influencing agent behavior (prompt issue)

Both are compliance failures. Both are now detectable.
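A minimal alerting sketch, assuming attribution scores are already computed per phase. The `EXPECTED_HIGH` map mirrors the table above; the 0.1 floor is an illustrative threshold, not a prescription:

```python
# Phase -> tools that must show high attribution, per the table above.
EXPECTED_HIGH = {
    "kyc_verification": ["VerifyIdentity", "ScreenOFAC"],
    "business_verification": ["ValidateTIN"],
}

IMPORTANCE_FLOOR = 0.1  # assumed threshold; tune against historical scores

def attribution_alerts(phase: str, importance: dict) -> list:
    """Return alert messages for compliance tools scoring below the floor."""
    alerts = []
    for tool in EXPECTED_HIGH.get(phase, []):
        score = importance.get(tool, 0.0)  # missing score == tool never ran
        if score < IMPORTANCE_FLOOR:
            alerts.append(
                f"{phase}: {tool} importance {score:.2f} "
                f"below floor {IMPORTANCE_FLOOR}"
            )
    return alerts
```

Treating an absent score as zero is deliberate: "the tool was never invoked" and "the tool was invoked but ignored" both surface as the same alert.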

Real Conversations the Agent Handles

Duplicate application detected:

Agent: [Calls CheckExistingApplications]

Agent: "I found an existing application for Tasty Tacos LLC 
        submitted January 8th, currently in underwriting.
        
        Would you like to check that status, or start 
        a new application for a different location?"

Document already on file:

Merchant: [Uploads driver's license]

Agent: [Calls CheckExistingDocuments → found]
       [Calls CompareDocuments → address changed]

Agent: "I have a license on file from your previous application,
        but the address is different—123 Main St vs 456 Oak St.
        
        Should I update your records with the new address?"

KYC verification fails:

Agent: [Calls VerifyIdentity → FAILED, confidence 0.3]

Agent: "I couldn't verify your identity with the information 
        provided. This sometimes happens with:
        • SSN typos
        • Name variations (maiden name, suffix)
        • Address mismatches
        
        Would you like to re-enter, or speak with a specialist?"

Architecture Summary

┌────────────────────────────────────────────────────────────────────────┐
│                     MERCHANT INTAKE AGENT                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  CloudFront + S3 (React) ──▶ API Gateway (WebSocket)                  │
│                                      │                                 │
│                                      ▼                                 │
│  ┌────────────────────────────────────────────────────────────────┐   │
│  │                  BEDROCK AGENTCORE                             │   │
│  │  • Claude 3.5 Sonnet (orchestration)                          │   │
│  │  • Action Groups (tools via Lambda)                           │   │
│  │  • Knowledge Base (products, pricing, MCCs)                   │   │
│  │  • Guardrails (PII filtering, topic blocking)                 │   │
│  └────────────────────────────────────────────────────────────────┘   │
│                              │                                         │
│              ┌───────────────┼───────────────┐                        │
│              ▼               ▼               ▼                        │
│         DynamoDB         Textract       External APIs                 │
│         (sessions,         (OCR)        (IRS, KYC, OFAC,             │
│          applications)                   State SOS)                   │
│                                                                        │
├────────────────────────────────────────────────────────────────────────┤
│                    MONITORING LAYER                                    │
│                                                                        │
│  CloudWatch ──▶ Firehose ──▶ S3 ──▶ Athena ──▶ QuickSight           │
│       │                                              │                 │
│       └──────────────────────────────────────────────┘                │
│                    Attribution Analysis                                │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Key Takeaways

  1. Tools need structure — Action Groups affect how the model reasons about tool selection

  2. Guardrails are configuration — PII filtering, topic blocking defined declaratively, not in code

  3. Observability enables attribution — Log tool invocations with context; hash PII

  4. Compliance needs proof — AgentSHAP principles turn "we logged it" into "we can prove it mattered"

  5. Handle edge cases explicitly — Existing applications, duplicate documents, verification failures

  6. Monitor tool importance by phase — Alert when compliance tools show unexpectedly low attribution

The Business Impact

Estimated improvements based on industry implementations:

| Metric | Before (Manual) | After (Agent) |
| --- | --- | --- |
| Time per application | 25-45 minutes | 5-8 minutes |
| Internal cost | $150-300 | $15-30 |
| Abandonment rate | 60-70% | 20-30% |
| Rework rate | 15-20% | <5% |
| Availability | Business hours | 24/7 |
| Compliance auditability | "We have logs" | Provable attribution |
