My AI Journey — A Timeline of Tinkering, Models, and Machines
If you know me from finance, you know the background: twenty-plus years in risk management roles across investment banks. FRTB specialist.
What you don't know is what I was doing on evenings and weekends. And that's the story I want to tell — because a lot of what's becoming mainstream now in AI agents is something I was building and using much earlier. Not better — the UI/UX was terrible and I was the only user — but the capabilities were there.
Before GPT: The Foundations
I studied Computing at Imperial College London, and the early part of my career was developing software in the financial industry during the dotcom era. What happened next is what happens to a lot of engineers who move into senior corporate roles. For the last 15 to 20 years, I've been in risk management roles in investment banks — positions where you're managing teams and frameworks, not writing code day-to-day. But I never stopped. On evenings and weekends, I kept tinkering, building side projects, entering hackathons, getting involved in startups, and keeping up with the technology.
I went deep into Haskell — a natural fit for someone with a quant and CS background. That led me into the Nix ecosystem, as any good Haskeller would. I drank the pure functional Kool-Aid and never quite put the glass down. I still use Nix today, even though I barely use Haskell anymore. The languages have shifted to Python and JavaScript — entirely because of the AI ecosystem — but the Nix infrastructure remains.
I built an algorithmic trading system in Haskell using functional reactive programming during the Bitcoin and crypto hype. I was also involved in startups and tried to build a FinTech product when I worked in Copenhagen. That introduced me to Flutter for mobile development — which turned out to be remarkably pertinent for my current apps.
My home setup is admittedly esoteric. My main server runs WSL2 and Nix. I'm a Windows guy — the corporate job demands it. Excel is non-negotiable for anyone in finance, and anyone who deals with trading platforms (Interactive Brokers, NinjaTrader) needs a Windows machine. So my world is Windows on the outside, Nix on the inside. I also have a Mac Mini — essential for iOS development — and I'm slowly moving more towards the Apple ecosystem, as all AI tinkerers seem to be doing these days.
The ChatGPT Moment
Then GPT-3.5 landed, and ChatGPT showed the world something most people hadn't considered: AIs can understand language and speak back coherently. For most people, this was the awakening. For me, it was confirmation that the trajectory I'd been watching was accelerating faster than anyone expected.
But the real spark came shortly after.
March/April 2023: The Awakening — AutoGPT and BabyAGI
Around the release of GPT-4, the open-source community produced something that grabbed my attention far more than chatbots.
AutoGPT hit 100,000 GitHub stars in weeks. BabyAGI showed that an AI could plan and execute multi-step tasks autonomously. These were among the first ReAct agents — precursors of the coding agents and tools we have now. The concept was correct, but the model capabilities weren't there yet. GPT-4 was the best available, but it cost an arm and a leg in those days.
I started tinkering with AutoGPT, burning tokens on exercises that often went nowhere. Infinite loops, hallucinated plans, wasted API credits. But with sufficient guardrails and enough tokens burnt, you could get productive output — particularly in coding.
I forked AutoGPT and started tinkering with it directly — modifying prompts, adding guardrails, trying to make it actually useful rather than just impressive.
Key models: GPT-4 (March 2023). Expensive. Dominant. Nothing else came close.
July 2023: The Birth of AI Assistant
Within months, I wasn't just running other people's agents — I was building my own.
In July 2023, ai_assistant was born. For context: Claude Code didn't exist. Cursor was just getting started. Codex was an API, not a CLI. There was no "AI coding agent" category yet. I was building something that wouldn't have a name for another year.
At first it was a hodgepodge of all my AI ReAct-based tooling — a documentation summariser, a web browser, a code generator. But it grew fast.
Most of the code was eventually abstracted into separate repositories:
- llm_utils — my custom LiteLLM-like library to abstract across providers
- shin-web-agent — an autonomous web browser agent, inspired by projects like BrowserUse
This was the height of my reading the seminal prompt engineering papers of the time. The AI assistant had implementations of the key techniques: Tree of Thoughts, Chain of Thought, ReAct. I also built RAG querying using ChromaDB, adding a planning module inspired by BabyAGI.
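The ReAct pattern at the heart of all this is simple enough to sketch in a few lines: the model alternates "thoughts" and tool calls, and each tool result is fed back as an observation until it emits a final answer. Below is a minimal, illustrative loop, not the assistant's actual implementation; `call_llm` and the `lookup` tool are stand-ins for a real model and real tools:

```python
import re

def react_loop(call_llm, tools, question, max_steps=5):
    """Minimal ReAct loop: Thought -> Action -> Observation, until Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)  # model emits a Thought plus an Action, or a Final Answer
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.+)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", reply)
        if action:
            name, arg = action.groups()
            observation = tools[name](arg)  # run the tool, feed the result back in
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted -- exactly the infinite-loop guardrail AutoGPT needed
```

The `max_steps` budget is the crudest of the guardrails mentioned earlier: without it, a looping model burns tokens forever.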
Early on, the assistant also had its own context management system — commands to add files, URLs, and goals to the working context — essentially manual context engineering before the term existed. It could also generate git commit messages from diffs and automate GitHub issue-to-PR workflows in a single command.
RAG was the big paradigm at the time — Retrieval-Augmented Generation was everywhere in mid-2023, and for good reason. I had a clear use case: querying large regulatory documents at home to facilitate my work research. Cross-jurisdictional analysis on things like the CRR (Capital Requirements Regulation) — massive documents, dense legal language, cross-referencing requirements across jurisdictions. These were genuinely challenging RAG problems. Today the conversation has moved to agentic RAG, but back then, getting basic retrieval to work reliably on financial regulation was hard enough.
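To give a flavour of the retrieval side, here is the shape of that pipeline as a toy sketch. It deliberately swaps real embeddings and ChromaDB for a naive bag-of-words similarity, purely to illustrate the retrieve-then-stuff pattern; the documents and query are made up:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. A real pipeline uses an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Rank document chunks by similarity to the query; return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Stuff the retrieved chunks into the prompt for the generator model."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The hard part on regulatory text was never this skeleton; it was chunking dense legal cross-references so the right article actually surfaced in the top k.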
Key models: GPT-4 dominant, Claude 2 just launched from Anthropic. Early function calling. Everyone building LangChain wrappers.
Summer 2023: Self-Hosted LLMs
The self-hosting era began that same summer. I bought a new machine and two GPUs — a 4090 and a 3090 — and started running models locally.

I used ExLlamaV2 as the inference platform and TabbyAPI for an OpenAI-compatible endpoint. I tinkered with every new model that dropped: Llama 2, Mistral 7B, Mixtral 8x7B — quantised, optimised, doing everything I could to squeeze performance out of consumer hardware.
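The point of an OpenAI-compatible endpoint like TabbyAPI is that client code barely changes: the request body is the standard chat-completions shape, and only the base URL moves to localhost. A small illustrative sketch; the model name, host, and port below are placeholders, so check your own TabbyAPI config:

```python
import json

def chat_payload(model, user_msg, temperature=0.2):
    """Build a standard OpenAI-style /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }

# With an OpenAI-compatible server, only the endpoint changes, e.g.:
#   POST http://localhost:5000/v1/chat/completions   (host/port are your TabbyAPI config)
body = json.dumps(chat_payload("mixtral-8x7b-exl2", "Summarise this clause in one line."))
```

This compatibility is what made it painless to flip the same agent code between local models and the frontier APIs.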
The appeal was obvious: privacy, no API costs, full control, no rate limits. The reality was more nuanced. Local models in mid-to-late 2023 were good enough for simple tasks but couldn't handle the complex multi-step reasoning I needed for serious agent work.
I learned a lot about quantisation, GGUF formats, and the practical limits of local inference. Eventually I went API-first for production use cases — the quality gap was too large. But I still believe the "bitter lesson" will close that gap.
Key models: Llama 2, Mistral 7B, Mixtral 8x7B. The quantisation revolution. The self-hosted AI movement in full swing.
December 2023: The Aider-Inspired Era — AI Coding Gets Real
Inspired by the Aider project, I decided to unify my whole hodgepodge of utilities into a coherent system.
As an avid Vim user, I built an ex-based interface. Instead of slash commands, I used colon commands — like any vimmer would. I could run any AI LLM orchestration imperatively from the command line. I also added a Gradio web frontend for when I wanted a visual interface.
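A colon-command dispatcher of this kind is tiny at its core: parse `:name arg1 arg2`, look up the handler, run it. This is an illustrative sketch, not the assistant's actual parser, and the command names are examples:

```python
def make_dispatcher(commands):
    """Parse Vim-style ':name arg1 arg2' colon commands and dispatch to handlers."""
    def dispatch(line):
        if not line.startswith(":"):
            raise ValueError("commands start with ':'")
        name, *args = line[1:].split()
        if name not in commands:
            raise KeyError(f"unknown command :{name}")
        return commands[name](*args)
    return dispatch

# Usage: each command maps a name to a callable, just like mapping names to workflow scripts.
dispatch = make_dispatcher({
    "ai_fix_test": lambda path: f"fixing {path}",
    "ai_code_review": lambda: "reviewing staged diff",
})
```

The appeal of this shape is that adding a capability is a one-line registration, which is exactly what let the system grow command by command.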
All this time, the system was building itself. People might call it vibe coding, but the better term is vibe engineering — end-to-end guardrails to ensure built-in quality at all times: testing, linting, build validation, CI pipelines. The AI wrote the code, but the engineering discipline was mine.
The system also introduced escalating bug-fixing strategies — a simple fix agent for straightforward failures, and a mixture-of-agents approach for tricky bugs that used multiple models to reach consensus on the fix. Interactive and automated code review workflows rounded it out.
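The mixture-of-agents escalation reduces to "ask several models, take the majority". A simplified sketch, with stub callables standing in for real model API calls:

```python
from collections import Counter

def consensus_fix(proposals):
    """Pick the fix most proposals agree on; ties fall back to the earliest proposal."""
    fix, votes = Counter(proposals).most_common(1)[0]
    return fix, votes

def mixture_of_agents(models, bug_report):
    """Ask several models for a fix to the same bug, then take the majority answer."""
    proposals = [model(bug_report) for model in models]
    return consensus_fix(proposals)
```

The escalation logic sits outside this: try the cheap single-model fix agent first, and only fan out to the expensive multi-model vote when it fails.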
Key models: GPT-4 Turbo, early Claude 3 previews. Models getting cheaper and faster. Aider proving the concept of LLM-driven code editing.
2024: Multi-Modality Changes Everything
GPT-4o and the Gemini series brought multi-modality — and it became a big deal, particularly from November 2024 onward as costs dropped.
Voice Agent for My Son: Using GPT-4o and LiveKit, I created a voice agent — a personal AI voice friend for my son. Built a complete mobile app as a POC. This was pure joy to build.
Voice TODO App: I also built a voice-based TODO app to manage my own task list by speaking to it.
Web Browser Agent: With the Gemini 2 series of models (late November/December 2024) making multi-modal inference more cost-effective, I built a Playwright-based browser agent that used the accessibility tree, screenshots, and naive Monte Carlo tree search for web navigation. Most of my time was spent trying to implement stealth techniques to avoid bot detection.
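The accessibility tree is what makes this tractable: instead of raw pixels, the agent sees a structured outline of the page and can enumerate its actionable targets. A small sketch of flattening a Playwright-style accessibility snapshot into candidate actions; the traversal below is illustrative, not my agent's actual code, and the sample tree is made up:

```python
def actionable_nodes(node, out=None):
    """Flatten an accessibility-tree snapshot into clickable/typable targets."""
    if out is None:
        out = []
    if node.get("role") in {"button", "link", "textbox"}:
        out.append((node["role"], node.get("name", "")))
    for child in node.get("children", []):
        actionable_nodes(child, out)
    return out
```

Each (role, name) pair becomes a candidate move for the search; the Monte Carlo part is then just sampling sequences of these moves and scoring where they lead.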
Key models: GPT-4o (multi-modal breakthrough), Gemini 2.0 Flash (cost-effective multi-modal). The era of seeing, hearing, and speaking.
January – April 2025: Reasoning Models and the DSL Revolution
The o1/o3 reasoning models changed what was possible in complex planning and code generation.
In January 2025, I started building a full DSL for the AI assistant — as any Haskeller would! This allowed me to craft precise coding workflows: use an expensive model (like o1) to plan and architect, then have cheaper models execute the plan. Central agent plans, satellite agents execute.
Here's a small snippet of the DSL — these are some of the commands from init.lm, the entrypoint that wired everything together. The full init.lm is available as a gist if you want to see the complete picture:
# Feature development — from small edits to large multi-file features
user_command "ai_quick_edit" "llm_scripts/workflows/ai_quick_edit.lm"
user_command "ai_add_feature" "llm_scripts/workflows/ai_add_feature.lm"
user_command "ai_add_large_feature" "llm_scripts/workflows/ai_add_large_feature.lm"
# Bug fixing — escalating levels of sophistication
user_command "ai_fix_test" "llm_scripts/agents/smart_fix_test_agent.lm"
user_command "ai_fix_error" "llm_scripts/workflows/fix_error.lm"
user_command "ai_hard_test_fix" "llm_scripts/workflows/hard_test_fix.lm"
# Code review — interactive or automated
user_command "ai_interactive_review" "llm_scripts/workflows/interactive_code_review.lm"
user_command "ai_code_review" "llm_scripts/agents/code_review_agent.lm"
# GitHub integration — issue to PR in one command
user_command "ai_gh_issue_pr" "llm_scripts/workflows/ai_gh_issue_pr.lm"
# Research, memory, and context management
user_command "ai_research" "llm_scripts/workflows/ai_research_topic.lm"
user_command "create_memory" "llm_scripts/create_memory.lm"
user_command "extract_knowledge" "llm_scripts/extract_knowledge.lm"
# Mixture of Agents — consensus-driven problem solving
user_command "ai_moa" "llm_scripts/agents/mixture_of_agents.lm"
Every command mapped to a workflow script. Every workflow had guardrails, context management, and model selection built in. This was Claude Code before Claude Code — but written in a custom DSL instead of TypeScript.
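The expensive-planner, cheap-executor split behind these workflows reduces to a small control loop: one model turns the task into steps, another carries each step out. A minimal sketch, with stub functions standing in for the two models:

```python
def plan_then_execute(planner, executor, task):
    """An expensive model produces a step list; a cheaper model executes each step."""
    plan = planner(f"Break this task into numbered steps:\n{task}")
    steps = [s for s in plan.splitlines() if s.strip()]
    return [executor(f"Carry out this step:\n{s}") for s in steps]
```

The economics are the whole point: the planner is called once per task, while the executor is called per step, so the cheap model absorbs most of the token volume.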
Around this time, my role narrowed to code review and nothing else — I had stopped coding directly. I would brainstorm with the AI, review its plans, approve the diffs, and decide whether to continue or address issues. The AI assistant continued to evolve — gaining an autopilot mode for applying a series of code changes without approval, and adding support for new models as they came out.
The assistant transitioned to multi-agent orchestration using tmux and git worktrees — multiple agents working on different branches simultaneously, coordinated from a single terminal. It also gained a persistent memory system — the ability to save and recall context across sessions — and task continuity, so I could pause work, come back the next day, and pick up exactly where I left off.
Gemini 2.5 Pro was the standout model of this period — particularly its ability to consume large chunks of a codebase and produce coherent plans.
Key models: o1/o3 (reasoning leap), Gemini 2.5 Pro (massive context + planning), Claude 3.5 Sonnet (coding quality breakthrough).
Mid-2025: The Leap
Big changes were happening at my corporate job at the same time as all this tinkering. I decided I could give it a go — building something of my own. At the time, I had no idea what I would build, but I knew these capabilities could multiply one person's output 15x.
In June 2025, I went to the AI Engineer World's Fair in San Francisco. My eyes were opened. Not just to the technology — I'd been deep in that for two years — but to the ecosystem, the community, and the sheer pace of what was coming.
I left Barclays. Not because my finance career was over — but because I needed to be at the forefront of this change. It felt like the dotcom era all over again. Back then, I knew the internet would reshape the world and the future of work. This felt the same, but bigger.
The AI Assistant's Legacy
The ai_assistant is, for sure, a casualty of the bitter lesson. Commercial tools — Claude Code, Codex, Cursor — have caught up and surpassed what one person could maintain alongside everything else. I'm admittedly a little jealous that everyone now has the level of capability I've had for a while — but that's how progress works. I'll be open-sourcing it soon, once I've cleaned it up, so people can see the approach in detail.
However, its precise AI coding orchestration is still unmatched in some ways, I believe. The DSL, the guardrails, the workflow automation — these were designed for my specific way of working. But the gap has closed enough that it's no longer necessary, at least for my current projects.
The real legacy isn't the code. It's the thinking. Every principle that went into the AI assistant — structured workflows, human checkpoints, quality gates, multi-model orchestration — lives on in what I'm building now. And the fact that these ideas are now showing up in mainstream tools tells me the instincts were right, even if the implementation was rough around the edges.
The Bigger Picture
Here's what most people miss: coding was just the first domain. AI agents took hold in software development first because code is easy to verify — you run the tests, the build passes or it doesn't. The feedback loop is tight and the rewards are clear.
But the same trajectory is coming for every white-collar discipline. Compliance analysis. Risk reporting. Document review. Financial modelling. Project management. Any structured knowledge work where you can define what "good" looks like is on the same path that coding just went through.
I know this because I'm already living it. Think about what happens in a large corporation: management layers orchestrate the work below them. A CEO sets direction. A COO runs operations. VPs plan delivery. Directors review quality. That's exactly what my AI empire does — except the management layers are agents. A Chief of Staff plans product delivery. An XO runs operations. A Critic reviews quality. Workers execute. I sit at the top, approving the key decisions.
It's not fully there yet — the system is still building itself, continuously improving. But the shape of it is unmistakable. One person, operating like a CEO of a company, with AI agents filling the roles that would normally require a team of 50.
For anyone coming from a corporate background — and I spent 20 years in that world — this is the shift to pay attention to. Not because AI is coming for your job tomorrow, but because the people who learn to orchestrate agents effectively will operate at a scale that's simply not possible manually. That's not a tech story — it's an operating model.
I'll go deeper on Mission Control and the HITL App in upcoming posts.
What's Next
Agent orchestration is the TODO app of 2026. Everyone will build one. I've already rolled my own — Mission Control, which powers everything I do today. I'll cover the architecture and philosophy in a future post.
I wrote this post because the timeline matters. What's happening in coding right now is a preview of what's coming everywhere else. I've seen the trajectory from the inside — from the first ReAct agents that could barely hold a conversation, to a system that runs an entire product empire. The pattern is the same. Only the domain changes.
This is the first post on my personal blog. If you're interested in what happens when two decades of corporate experience meets the agentic revolution — and what it means for the future of work — stick around. There's a lot more to tell.