Building the AI Product Assistant: Context Injection, Multi-Turn Chat, and Cross-Product Retrieval

The previous posts focused on search. This one turns to the AI assistant — a floating chat widget that answers product questions, recommends complementary gear, and builds camping loadouts on request. Under the hood, it is a multi-turn conversation system with history, context injection when viewing a product, and dynamic retrieval when the query requires cross-product knowledge.

What the assistant does

Three core capabilities: Product Q&A — user is viewing a tent, asks “Is this waterproof?”, assistant answers from the product description without retrieving anything. ...

April 15, 2026 · 11 min · Tyler
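The excerpt above does not include the post's code, so here is a minimal sketch of the context-injection idea it describes, assuming the Anthropic Python SDK; the model name, the shape of the `product` dict, and the system prompt are illustrative assumptions, not the post's actual implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(history: list[dict], question: str, product: dict | None = None) -> str:
    """Answer with multi-turn history and optional context injection.

    `history` is a list of {"role": ..., "content": ...} turns; `product` is
    the product the user is currently viewing, if any (hypothetical shape).
    """
    system = "You are a camping-gear shopping assistant."
    if product is not None:
        # Context injection: the viewed product rides along in the system
        # prompt, so "Is this waterproof?" needs no retrieval at all.
        system += f"\n\nThe user is viewing:\n{product['name']}\n{product['description']}"

    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption; substitute any current model
        max_tokens=500,
        system=system,
        messages=history + [{"role": "user", "content": question}],
    )
    return response.content[0].text
```

Cross-product questions would instead go through a retrieval step before the call; that branch is omitted here.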

Building the Catalog and Ingestion Pipeline: Archetypes, Embeddings, and ChromaDB

The first post covered architecture. Here the focus shifts to data: how to generate a realistic product catalog at scale, why description quality matters for RAG, and how the ingestion pipeline embeds everything into ChromaDB. The pipeline produced 1,180 products with rich descriptions, embedded them in 39 seconds, and returned retrieval results that actually held up.

The archetype strategy

Writing 1,180 product descriptions by hand is infeasible. Having Claude write them one at a time is slow and produces inconsistent output. The solution: archetype-based generation. ...

April 13, 2026 · 9 min · Tyler
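The embed-and-ingest step that post describes might look like the sketch below, using ChromaDB's documented client API; the collection name, the `products` list, and its fields are hypothetical stand-ins for the generator's output.

```python
import chromadb

# PersistentClient writes the index to disk; with no embedding function
# specified, ChromaDB embeds documents with its default (all-MiniLM-L6-v2).
client = chromadb.PersistentClient(path="./catalog_db")
collection = client.get_or_create_collection(name="products")

# Hypothetical output of the archetype generator.
products = [
    {"id": "tent-001", "name": "Ridgeline 2P Tent",
     "description": "A three-season backpacking tent...", "category": "shelter"},
]

collection.add(
    ids=[p["id"] for p in products],
    documents=[f"{p['name']}\n{p['description']}" for p in products],
    metadatas=[{"category": p["category"]} for p in products],
)

# Retrieval sanity check: nearest neighbors for a natural-language query.
results = collection.query(query_texts=["lightweight tent for two people"], n_results=3)
print(results["ids"][0])
```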

AI-Powered QA Testing with playwright-cli and GitHub Copilot

Most AI-assisted QA workflows assume you have access to everything: Playwright MCP configured in VS Code, Copilot Vision enabled, the embedded browser panel working. In an enterprise environment, those assumptions often don’t hold. Security policies restrict which tools can connect to which services. Features get disabled. The standard setup isn’t available. This post documents an approach that works within those constraints: playwright-cli for browser interaction, GitHub Copilot CLI for the agent loop, and a plain natural language prompt describing what to test. No MCP. No generated test files. No vision model. Just a coding agent running shell commands against a real browser. ...

April 9, 2026 · 6 min · Tyler

What I Learned Building a LangGraph Agent From Scratch

I wanted to understand what it actually takes to build something that makes real decisions. So I built a job research agent using LangGraph: give it a company name, and it autonomously gathers information from multiple sources, evaluates whether it has enough to work with, and loops back if it doesn’t. This post is about what that process taught me about state, nodes, and conditional edges.

The Problem With Linear Pipelines

A typical “agent” pattern looks like this: ...

March 30, 2026 · 5 min · Tyler
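The gather, evaluate, loop-back shape that post describes maps directly onto LangGraph's graph API. A minimal sketch: the node body is a stub and the "enough findings" criterion and attempt cap are illustrative, not the post's.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    company: str
    findings: list[str]
    attempts: int

def gather(state: ResearchState) -> dict:
    # Stub: a real node would call search tools or APIs here.
    return {"findings": state["findings"] + [f"fact about {state['company']}"],
            "attempts": state["attempts"] + 1}

def enough(state: ResearchState) -> str:
    # Conditional edge: loop back to gather until there is enough to work
    # with, with an attempt cap so the loop always terminates.
    if len(state["findings"]) >= 3 or state["attempts"] >= 5:
        return "done"
    return "again"

graph = StateGraph(ResearchState)
graph.add_node("gather", gather)
graph.set_entry_point("gather")
graph.add_conditional_edges("gather", enough, {"again": "gather", "done": END})

app = graph.compile()
result = app.invoke({"company": "Acme", "findings": [], "attempts": 0})
print(result["findings"])
```

The conditional edge is what distinguishes this from a linear pipeline: the routing function inspects state and decides where control goes next.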

Your MCP Server Is Only as Good as Its Docstrings

I built a college football data MCP server that connects Claude to CollegeFootballData.com, a free API with deep historical stats, advanced metrics, recruiting data, and play-by-play going back decades. Its data goes beyond what frontier AI models are trained on. Getting it working was straightforward — there’s a gofastmcp.com tutorial for that. Getting Claude to use it well required understanding something that’s easy to overlook: the key interface between an LLM and a tool is the docstring. ...

March 15, 2026 · 9 min · Tyler
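The point about docstrings is easy to show with FastMCP, the framework the gofastmcp.com tutorial covers. In this sketch the tool name, parameters, and the sibling tool it mentions are all illustrative, not the server's real interface; what matters is that the docstring carries usage guidance, because it is the part the model actually reads.

```python
from fastmcp import FastMCP

mcp = FastMCP("college-football-data")

@mcp.tool
def get_team_season_stats(team: str, year: int) -> dict:
    """Get season-level stats for one college football team.

    Use this for questions about a single team's performance in a single
    season (wins, offensive/defensive totals). For game-by-game results,
    use get_team_games instead. `team` is a school name like "Georgia",
    not a mascot or abbreviation; `year` is the season's starting
    calendar year.
    """
    # Stub: the real tool would call the CollegeFootballData.com API here.
    return {"team": team, "year": year, "stats": {}}

if __name__ == "__main__":
    mcp.run()
```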

How a Simple Power Automate Workflow Automated 250+ Hours of Work Per Month

Not every automation has to be sophisticated to matter. This one is a form, an AI analysis step, and a spreadsheet output. It runs on roughly 1,500 products a month, and each run saves about 10 minutes of manual work. That adds up to ~250 hours per month (1,500 × 10 minutes) and an estimated ~$500,000 per year in labor cost that no longer exists. This is a post about how it works, what made it possible, and what the experience taught me about where AI automation creates real business value. ...

February 20, 2026 · 6 min · Tyler

Scoring RAG Answer Quality with an LLM Judge

The previous post in this series built an eval harness that scores retrieval quality: does the right documentation page appear in the retrieved chunks? 7/8 passing, 88%. A useful signal. But retrieval quality and answer quality are different things. A test can pass retrieval scoring and still produce a bad answer. A test can fail retrieval scoring and still produce a correct one. Source URL retrieval is a proxy — a fast, cheap proxy that catches a lot of problems, but not all of them. ...

January 26, 2026 · 9 min · Tyler
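An LLM-judge scoring step along the lines that post describes might look like this sketch, assuming the Anthropic SDK; the rubric, the 1–5 scale, and the JSON output format are illustrative choices, not necessarily the post's.

```python
import json
import anthropic

client = anthropic.Anthropic()

JUDGE_PROMPT = """You are grading a RAG system's answer.

Question: {question}
Reference answer: {reference}
System's answer: {answer}

Score the system's answer from 1 (wrong) to 5 (fully correct and grounded).
Respond with JSON only: {{"score": <int>, "reason": "<one sentence>"}}"""

def judge(question: str, reference: str, answer: str) -> dict:
    """Ask a model to grade an answer against a hand-written reference."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption; any capable model can judge
        max_tokens=200,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
    )
    # Sketch-level parsing; production code would validate the JSON.
    return json.loads(response.content[0].text)
```

Judging answers directly closes the gap the teaser names: retrieval can succeed while the answer fails, and vice versa.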

How to Design RAG Eval Test Cases

Building a working RAG pipeline is easy. Knowing whether it will keep working after you change something is harder, and most projects skip that part entirely. Here the focus is designing an eval harness that catches real problems, using the Anthropic docs RAG agent as the example.

What an eval harness does

An eval harness is a script that runs a fixed set of test cases against your pipeline and produces a pass/fail score. Run it before and after a change — if the score drops, the change broke something. If it improves, the change helped. ...

January 24, 2026 · 7 min · Tyler
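The harness shape described above reduces to a small script. A minimal sketch, assuming a hypothetical `retrieve()` function that returns chunks carrying a `source_url` field; the test cases and URLs are illustrative.

```python
# Hypothetical test cases: each pairs a question with the docs page that
# should appear somewhere in the retrieved chunks.
TEST_CASES = [
    {"question": "How do I set max_tokens?",
     "expected_url": "https://docs.anthropic.com/en/api/messages"},
    {"question": "What is prompt caching?",
     "expected_url": "https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching"},
]

def run_evals(retrieve) -> float:
    """Run every test case against the pipeline and return the pass rate."""
    passed = 0
    for case in TEST_CASES:
        chunks = retrieve(case["question"], k=5)  # `retrieve` is assumed, not shown
        hit = any(c["source_url"] == case["expected_url"] for c in chunks)
        passed += hit
        print(("PASS" if hit else "FAIL"), case["question"])
    return passed / len(TEST_CASES)
```

Run it before and after a change and compare the two pass rates; the fixed test set is what makes the comparison meaningful.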

RAG Retrieval: Chunking, Embeddings, Reranking, and an Eval

This series covers building a RAG pipeline to answer questions about the Anthropic documentation. A RAG agent answers questions by first searching a private knowledge base, then passing the relevant excerpts to an LLM as context — the model reads the actual source material before it responds, rather than guessing from training data. Here the focus is the retrieval layer: how to chunk text, embed it, retrieve it, and measure whether retrieval is actually working. ...

January 22, 2026 · 9 min · Tyler
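The retrieval layer's moving parts fit in one short sketch. This assumes sentence-transformers for embeddings and fixed-size character chunking; the model name, chunk size, and overlap are illustrative, and the reranking pass the post covers is noted but omitted.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption; any embedding model works

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size chunks with overlap, so an idea split across a boundary
    still appears whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

docs = ["...full documentation pages would go here..."]
chunks = [c for d in docs for c in chunk(d)]

# Embed every chunk once, up front.
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Top-k chunks by cosine similarity; a reranker would re-score these."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # normalized vectors: dot product equals cosine similarity
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```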