I Asked an AI to Build Me a Bank Data Platform. Here's What Happened.

Claude wrote this post for me and we reviewed and revised it together. I know this won't be popular with at least a few of my regular readers, but it felt like the right thing to do given the project it's describing. 

Lambdas don't play well with RDS databases. Connection pooling, VPC cold starts, idle timeouts — it's a well-documented headache. But a Lambda is the obvious choice for a cloud-native Open Banking API implementation. And RDS is usually the best choice when you want to query and analyse data. I'd hit this tension before and never resolved it cleanly.

So I wanted to see if Apache Iceberg could be the answer. It works with Lambdas (a write is just Parquet files on S3 plus a small metadata commit, with no database connection to hold open), and it can be queried like a relational database via Spark or Athena. I was also keen to see what the Lambda integration actually looked like in practice.

I decided to pair with Claude Code on the build — treating it as a junior engineer who never sleeps and never complains about writing Terraform tests.

Why a Lakehouse?

Bank data arrives as events. A customer grants consent, accounts are discovered, transactions are fetched. You need two things at once — operational tracking (where are we in the import?) and analytical storage (let me query all my transactions over time).

I settled on a two-store pattern. DynamoDB handles the ephemeral state: connection lifecycle, import progress, and token storage. It's fast, cheap, and purpose-built for key-value access patterns. Apache Iceberg on S3 holds the settled data: accounts, balances, and transactions. It supports schema evolution, queries cleanly via Spark or Athena, and keeps an immutable append-only history.
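As a rough illustration of the ephemeral side, the sketch below shows how connection state and import progress could share one DynamoDB table. The key layout (`CONN#...` partition keys, `STATE` and `IMPORT#...` sort keys), the attribute names and the status values are all hypothetical, chosen for the example rather than taken from the repo.

```python
import time

# Hypothetical import statuses and the transitions allowed between them.
TRANSITIONS = {
    "PENDING": {"FETCHING"},
    "FETCHING": {"COMPLETE", "FAILED"},
    "FAILED": {"FETCHING"},  # a retry re-enters FETCHING
    "COMPLETE": set(),
}


def connection_key(connection_id):
    """Primary key for the item holding a connection's lifecycle state."""
    return {"pk": f"CONN#{connection_id}", "sk": "STATE"}


def import_key(connection_id, account_id):
    """Primary key for one account's import progress under a connection."""
    return {"pk": f"CONN#{connection_id}", "sk": f"IMPORT#{account_id}"}


def can_transition(current, new):
    """True if moving from `current` to `new` is an allowed status change."""
    return new in TRANSITIONS.get(current, set())


def progress_item(connection_id, account_id, status, now=None):
    """Build the item a Lambda could pass to boto3's Table.put_item(Item=...)."""
    item = import_key(connection_id, account_id)
    item["status"] = status
    item["updated_at"] = int(now if now is not None else time.time())
    return item
```

Because every item for a connection shares a partition key, a single query on `pk` returns the connection state and all of its import progress records together.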

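The architecture, in outline: the consent flow drops a message onto an SQS queue, a processing Lambda picks it up and pulls the bank data, import progress is tracked in DynamoDB, and the settled data lands in Iceberg tables on S3.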

SQS sits in the middle on purpose — it decouples the consent flow from the heavy lifting, so a slow data fetch never blocks a customer's browser.
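A minimal sketch of that hand-off, assuming a consent-callback Lambda behind API Gateway: the handler only enqueues a small message and redirects the browser straight away. The message fields, the `IMPORT_QUEUE_URL` environment variable, the redirect path and the idea of carrying the connection id in the OAuth `state` parameter are assumptions for illustration, not details from the repo.

```python
import json
import os


def build_import_message(connection_id):
    """Body of the SQS message the processor will pick up later."""
    return json.dumps({"type": "import_requested", "connection_id": connection_id})


def enqueue_import(sqs_client, queue_url, connection_id):
    """Queue the import and return immediately; no bank calls happen here."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_import_message(connection_id),
    )
    return response["MessageId"]


def handler(event, context):
    """Hypothetical consent-callback Lambda (HTTP API, payload format v2)."""
    import boto3  # imported lazily so the helpers above can be tested offline

    params = event.get("queryStringParameters") or {}
    connection_id = params["state"]  # assumption: connection id rides in `state`
    enqueue_import(boto3.client("sqs"), os.environ["IMPORT_QUEUE_URL"], connection_id)
    # Redirect straight away; the slow fetch runs in a separate Lambda.
    return {"statusCode": 302, "headers": {"Location": "/connected.html"}}
```

Passing the SQS client in as a parameter keeps the enqueue step easy to exercise with a stub in unit tests.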

The Open Banking Detour

I originally designed the whole system around GoCardless Bank Account Data. They offered free Open Banking API access to UK bank accounts, a webhook-driven model, and good documentation — exactly what I needed. The architecture doc, the implementation plan, and the first few modules were all built around their API.

Then I discovered, mid-build, that GoCardless had quietly stopped accepting new signups. No announcement. No deprecation notice. The login page had no link to register — just a message saying new signups were closed. It appears to be a quiet wind-down of the free self-service tier while existing users retain access.

So I evaluated alternatives:

Provider | Verdict
Tink (Visa) | Free sandbox, but states "business use only"
Plaid | Sandbox free, but more US-centric
Yapily | Enterprise-only, no self-service free tier
Enable Banking | UK coverage uncertain, requires sales quote
TrueLayer | UK-native, free sandbox, no limits, Mock Bank with test credentials

TrueLayer won. It's UK-native, the sandbox is genuinely free with no expiry, and you don't need a commercial agreement or FCA registration to use it. The Mock Bank provides a full end-to-end flow with test credentials.

However, it's a fundamentally different integration model. GoCardless uses webhooks — they push events to you when accounts are discovered and transactions are ready. TrueLayer's Data API is synchronous — after consent, you pull the data yourself.

The architecture survived the pivot because the queue-based decoupling meant only "what triggers the message" changed, not "what processes it." The webhook Lambda became unnecessary, but the SQS-to-processor pattern stayed. The lesson: loose coupling paid off on day one.
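The processor side of that pattern can be sketched as below. The handler only sees SQS records, so it does not care whether a webhook or a consent callback produced them. `fetch_and_store` is a hypothetical stand-in for the TrueLayer pull and the Iceberg write, the message shape is assumed, and returning `batchItemFailures` only has an effect if the event source mapping has `ReportBatchItemFailures` enabled.

```python
import json


def fetch_and_store(connection_id):
    """Hypothetical placeholder: pull accounts and transactions, then persist them."""
    raise NotImplementedError


def process_batch(event, process=fetch_and_store):
    """Handle an SQS batch, reporting only the messages that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            message = json.loads(record["body"])
            process(message["connection_id"])
        except Exception:
            # Failed messages return to the queue and, after enough
            # receives, move on to the dead-letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def handler(event, context):
    """Lambda entry point for the SQS trigger."""
    return process_batch(event)
```

Swapping the trigger then means changing whatever calls `send_message`; nothing in this handler needs to know.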

How Claude Code Built It

The build happened in roughly five stages across two days. I'll go deep on the most interesting moment — the provider pivot — because that's where AI-assisted development felt most different from working alone.

Design phase. I described the requirements in plain English. Claude produced the architecture document and a phased implementation plan, both of which are still in the repo. This was the most valuable output — not code, but structure. Ten phases, each with test-first development, clear module boundaries, and explicit verification steps.

Scaffold and modules. I built the Terraform modules one at a time, following the plan. Each module came with tests — terraform test for infrastructure and vitest for the Node.js Lambdas. The data-processing Lambda turned out to need Python (more on that shortly), so pytest joined the toolchain. TDD meant fewer "it compiles but doesn't work" surprises.

The pivot. When I discovered GoCardless was unavailable, I described the constraint to Claude. It produced the ADR (architecture decision record), rewrote the implementation plan, and migrated all the Lambda code. The connection-based model — replacing GoCardless's requisition-based approach — was its suggestion. A fundamental architecture change, handled calmly and systematically. No panic, no throwaway prototypes. Just: here's the decision, here's the rationale, here's the new code.

This is what felt different. A solo developer hitting a dead end mid-build has to context-switch between disappointment, research, decision-making, and re-implementation. Claude compressed that into a single focused session: evaluate alternatives, pick one, document why, rewrite the code.

Integration and debugging. The first deploy had issues. Iceberg write failures (schema mismatches between TrueLayer's response format and my table definitions), CORS problems, and callback URL wiring all needed sorting out. Claude diagnosed and fixed from error logs — I'd paste the CloudWatch error, and it would explain what was wrong and produce a fix.
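One way to catch that class of mismatch before the write is to coerce each API record against an explicit column list, dropping fields the table does not know about and failing loudly on missing or malformed values. The sketch below is illustrative only: the column names, converters and required flags are assumptions, not the repo's table definitions or TrueLayer's full response shape.

```python
from datetime import datetime
from decimal import Decimal


def parse_timestamp(value):
    """Parse an ISO 8601 string; older Pythons reject a trailing "Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Hypothetical table definition: column -> (converter, required)
SCHEMA = {
    "transaction_id": (str, True),
    "timestamp": (parse_timestamp, True),
    "amount": (lambda v: Decimal(str(v)), True),
    "currency": (str, True),
    "merchant_name": (str, False),
}


def coerce_record(raw):
    """Return a row matching SCHEMA, or raise ValueError naming the bad column."""
    row = {}
    for column, (convert, required) in SCHEMA.items():
        value = raw.get(column)
        if value is None:
            if required:
                raise ValueError(f"missing required column: {column}")
            row[column] = None
            continue
        try:
            row[column] = convert(value)
        except (TypeError, ValueError, ArithmeticError) as exc:
            raise ValueError(f"bad value for {column}: {value!r}") from exc
    return row  # keys not listed in SCHEMA are silently dropped
```

A check like this turns a failed Iceberg commit into an error message that names the offending column, which is easier to act on than a stack trace in CloudWatch.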

The accounts page. The next morning, I added an end-to-end feature in a single session: a query Lambda that reads from Iceberg, an API Gateway route, and a frontend page that displays the data. Planned, built, and deployed in about an hour.
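A query Lambda of that kind could be sketched as follows. This is an illustration under stated assumptions rather than the repo's code: the Glue catalog configuration, the `bank.accounts` table, the `connection_id` column and the query parameter name are all hypothetical. The PyIceberg calls used are `load_catalog`, `load_table` and `scan(...).to_arrow()`.

```python
import json


def read_accounts(connection_id, table_name="bank.accounts"):
    """Scan an Iceberg table for one connection's accounts."""
    # Lazy imports keep the response helper below usable without PyIceberg.
    from pyiceberg.catalog import load_catalog
    from pyiceberg.expressions import EqualTo

    catalog = load_catalog("glue", **{"type": "glue"})
    table = catalog.load_table(table_name)
    scan = table.scan(row_filter=EqualTo("connection_id", connection_id))
    return scan.to_arrow().to_pylist()


def http_response(status, payload):
    """Shape a result for API Gateway's HTTP API (payload format v2)."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        # default=str covers dates and decimals that json cannot encode natively
        "body": json.dumps(payload, default=str),
    }


def handler(event, context):
    """Hypothetical GET handler: ?connection_id=... returns that connection's accounts."""
    params = event.get("queryStringParameters") or {}
    connection_id = params.get("connection_id")
    if not connection_id:
        return http_response(400, {"error": "connection_id is required"})
    return http_response(200, {"accounts": read_accounts(connection_id)})
```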

What Worked Well

  • The original hypothesis held — Lambdas can write to Iceberg cleanly, and the resulting banking data is queryable from Spark or Athena exactly as if it lived in an RDS instance. No connection pools, no VPC cold starts, no idle timeouts.
  • Structured planning before coding — the implementation plan prevented scope creep and gave both of us a shared reference point.
  • TDD approach — caught integration issues early, particularly around DynamoDB access patterns and Iceberg schema definitions.
  • Handling the pivot — the ADR plus the rewritten plan meant nothing was lost in the transition; the reasoning is documented.
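
To give a flavour of the first point, a Lambda-side write with PyIceberg might look roughly like the sketch below. It is not the code from the repo: the Glue catalog configuration, the `bank.transactions` table, the column names and the input field names are assumptions for illustration, and the table is assumed to exist already with matching optional columns.

```python
from decimal import Decimal

# Hypothetical column order for an assumed `bank.transactions` table.
COLUMNS = ("transaction_id", "account_id", "amount", "currency", "booked_at")


def build_columns(account_id, transactions):
    """Turn a list of transaction dicts into column-oriented data."""
    columns = {name: [] for name in COLUMNS}
    for tx in transactions:
        columns["transaction_id"].append(tx["transaction_id"])
        columns["account_id"].append(account_id)
        # Go via str() so float inputs do not leak binary rounding noise.
        columns["amount"].append(Decimal(str(tx["amount"])).quantize(Decimal("0.01")))
        columns["currency"].append(tx["currency"])
        columns["booked_at"].append(tx["timestamp"])  # kept as an ISO 8601 string
    return columns


def append_to_iceberg(columns, table_name="bank.transactions"):
    """Append one batch to an existing Iceberg table registered in Glue."""
    # Imported lazily so the pure helper above works without these packages.
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog

    schema = pa.schema([
        ("transaction_id", pa.string()),
        ("account_id", pa.string()),
        ("amount", pa.decimal128(18, 2)),
        ("currency", pa.string()),
        ("booked_at", pa.string()),
    ])
    batch = pa.Table.from_pydict(columns, schema=schema)
    catalog = load_catalog("glue", **{"type": "glue"})
    table = catalog.load_table(table_name)
    table.append(batch)  # writes Parquet to S3 and commits a new snapshot
```

There is no connection to open, pool or time out here: each invocation loads the table metadata, writes its files and commits.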

What Needed Human Judgement

  • Choosing TrueLayer over the alternatives — that needed domain knowledge about the UK Open Banking landscape and what "free sandbox" actually means in practice.
  • Deciding sandbox-only was fine — a product scope decision, not a technical one.
  • Validating the end-to-end flow in a real browser — clicking through the consent redirect, checking the email arrived, and viewing the accounts page.
  • AWS account setup, SES verification, and SSM secrets — the "real world" bits that aren't in any codebase.

The Technical Stack

Layer | Technology | Why
Infrastructure | Terraform | Reproducible, modular, testable
Compute | AWS Lambda (Node.js + Python) | Node for API-facing (fast cold start), Python for data processing (PyIceberg)
Queue | SQS + DLQ | Decouples consent from processing, handles retries
State | DynamoDB | Single-table design, pay-per-request, point-in-time recovery
Storage | S3 + Iceberg + Glue | Analytics-ready, schema evolution, standard format
API | API Gateway (HTTP API) | Cheap, auto-deploy, payload v2
Frontend | Static HTML/JS on S3/CloudFront | No framework, no build step
Email | SES | Transactional notifications
Open Banking | TrueLayer (sandbox) | Free, UK-native, good developer experience
AI | Claude Code | Design, implementation, debugging

Note the deliberate absence of: React, Docker (except for the Python layer build), Kubernetes, relational databases-as-a-service, API frameworks. The simplest thing that works.

Closing

The system works. It connects to a real (sandbox) bank via TrueLayer, pulls accounts, transactions, and balances, writes them into Apache Iceberg tables on S3, emails me when it's done, and lets me view the data in a browser. All infrastructure-as-code, all test-driven, all from a monorepo. Built in a single day, plus about an hour the next morning for the accounts page.

Claude Code didn't replace engineering judgement — it amplified it. I still made the decisions: what to build, which provider to use, sandbox-only scope, plain HTML over a framework. But the implementation velocity was genuinely different. A system that would have taken two or three weekends of scattered effort took one focused day of pairing.

Bear in mind what this isn't. There's no auth beyond "type your email." No token refresh for expired connections. No monitoring or alerting. No CI/CD pipeline. It's sandbox-only — the transactions are synthetic test data from TrueLayer's Mock Bank. The goal was to prove the pattern end-to-end, not to build something production-ready.

The repo is open if you want to poke around. The architecture doc, implementation plan, and ADR are all in docs/ — they're as much a record of the process as the code itself.
