I built Ozark Ridge, a mock outdoor gear retail site with AI-powered product search and a Rufus-style product assistant. The project exists to demonstrate RAG (Retrieval-Augmented Generation) in a realistic e-commerce context.

This is the first post in a series documenting the build. This one covers the architecture and stack decisions. Later posts cover the RAG pipeline, keyword vs semantic search comparison, and building the AI assistant.


What it does

Two features:

AI Search — natural language product search using semantic similarity instead of keyword matching. “Something warm for cold nights at camp” returns sleeping bags and insulated jackets rated for cold weather, even though those products don’t contain the words “warm” or “nights” in their descriptions.

Product Q&A assistant — a floating chat widget (like Amazon’s Rufus) that answers product questions, suggests complementary gear, and builds camping loadouts on request. Ask “what sleeping bag pairs with this tent?” and it retrieves related products from the vector store and recommends specific options with reasoning.

The catalog is 1180 products across tents, sleeping bags, footwear, apparel, packs, and camp kitchen gear. All generated from 20 product archetypes using a template-based system. The descriptions are rich enough to make semantic search interesting — specific attributes, use cases, technical specs, all in natural language.


The stack

| Layer | Tool | Why |
| --- | --- | --- |
| Frontend | React (Vite) | Standard, fast, nothing fancy |
| Backend | FastAPI | Async-native Python, automatic API docs |
| Database | Neon (Postgres) | Free tier Postgres, source of truth for product data |
| Vector store | ChromaDB | Local, persistent, zero setup |
| Indexing/retrieval | LlamaIndex | RAG framework — handles embedding and retrieval |
| Embeddings | BAAI/bge-small-en-v1.5 | Local HuggingFace model, no API cost |
| Inference | Anthropic API (Claude) | AI search summaries + assistant responses |
| Images | Placeholder images | Avoiding API rate limits and cost |

Why RAG

The core problem: AI (Claude) doesn’t know your product catalog. You can’t fine-tune it every time a product changes or inventory shifts. You could paste your entire catalog into every prompt, but that’s expensive, slow, and hits context window limits fast with a catalog of any real size.

RAG solves this by retrieving only the relevant subset of products at query time and injecting them into Claude’s context. The three-step pattern:

  1. Index — at ingest time, embed each product’s text into a vector and store it in ChromaDB
  2. Retrieve — at query time, embed the user’s query and find the closest matching product vectors
  3. Generate — pass the retrieved products + user query to Claude, get a response

LlamaIndex handles steps 1 and 2. Anthropic handles step 3.
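
A minimal sketch of the pattern. The product text, IDs, and model name below are illustrative placeholders, not the project's actual data:

```python
import anthropic
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model: steps 1 and 2 never call an external API
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 1. Index: embed each product's text into a vector (in-memory store for brevity)
docs = [
    Document(text="Down sleeping bag rated to -10C...", metadata={"product_id": 17}),
    Document(text="Insulated parka for sub-freezing camp mornings...", metadata={"product_id": 42}),
]
index = VectorStoreIndex.from_documents(docs)

# 2. Retrieve: embed the query, find the closest product vectors
query = "something warm for cold nights at camp"
nodes = index.as_retriever(similarity_top_k=5).retrieve(query)

# 3. Generate: pass the retrieved products + query to Claude
context = "\n".join(n.node.get_content() for n in nodes)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model ID
    max_tokens=300,
    messages=[{"role": "user", "content": f"Products:\n{context}\n\nQuery: {query}"}],
)
print(response.content[0].text)
```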

This is the dominant pattern for production AI applications that need domain-specific knowledge.


Architecture diagram

```
React Frontend (localhost:5173)
     ├── GET /products                ────► FastAPI (localhost:8000)
     ├── GET /products/:id            ────►       │
     ├── GET /search/keyword?q=...    ────►       ├── Neon (Postgres)
     ├── POST /search/ai              ────►       ├── ChromaDB (local)
     └── POST /assistant              ────►       ├── LlamaIndex
                                                  └── Anthropic API
```

The separation matters, and it's the standard production pattern. Neon is the source of truth — prices change, descriptions update, inventory moves. Other teams could query it directly. ChromaDB is an index built from a snapshot of that data, used only for the AI features. If a product description changes in Neon, you re-run ingestion to update ChromaDB.
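
A sketch of what that ingestion step can look like. The products schema, connection handling, and paths here are assumptions for illustration, not the project's actual code:

```python
import os

import chromadb
import psycopg2
from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Snapshot the source of truth (hypothetical products table)
conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn.cursor() as cur:
    cur.execute("SELECT id, name, category, price, description FROM products")
    rows = cur.fetchall()
conn.close()

# One document per product; product_id metadata links each vector back to Neon
docs = [
    Document(
        text=f"{name}. {description}",
        metadata={"product_id": pid, "category": category, "price": float(price)},
    )
    for pid, name, category, price, description in rows
]

# Persist vectors locally; drop the collection and re-run to rebuild from fresh data
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("products")
storage = StorageContext.from_defaults(vector_store=ChromaVectorStore(chroma_collection=collection))
index = VectorStoreIndex.from_documents(docs, storage_context=storage)
```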


Why these tools

Neon over local Postgres — the free tier is generous, it’s serverless, and connection strings work exactly like regular Postgres. No Docker, no local database management.

ChromaDB over Pinecone/Weaviate — local, file-based, zero configuration. For a portfolio project it’s the right choice. In production you’d swap it for a hosted vector store, but the LlamaIndex abstraction makes that a one-line change.
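
To make the "one-line change" concrete: the vector store is injected at construction, so only that construction moves. Pinecone below is purely an illustrative alternative, not part of this project:

```python
from llama_index.vector_stores.chroma import ChromaVectorStore

# Local development: ChromaDB (collection created as in the ingestion sketch)
vector_store = ChromaVectorStore(chroma_collection=collection)

# Hosted production store, e.g. Pinecone: only this construction changes
# from llama_index.vector_stores.pinecone import PineconeVectorStore
# vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```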

LlamaIndex over building retrieval manually — LlamaIndex handles embedding, indexing, and retrieval with a consistent API. You could write this yourself with sentence-transformers and raw ChromaDB calls, but LlamaIndex eliminates boilerplate without hiding what’s happening under the hood.

bge-small-en-v1.5 over OpenAI embeddings — runs locally via HuggingFace, no API cost, ~130MB model download. First run takes an extra 30 seconds to download and cache the model, then it’s instant. For a demo with 1180 products, it embedded everything in 39 seconds on CPU.

FastAPI over Flask — async-native Python means database calls don’t block other requests. For a demo it doesn’t matter much, but it’s the correct production pattern and interviewers notice.


Request flow

Here’s what happens when someone searches for “waterproof tent for 2 people under $300”:

  1. React sends POST /search/ai with {"query": "..."}
  2. FastAPI handler calls the LlamaIndex retriever
  3. LlamaIndex embeds the query using bge-small-en-v1.5 (local, no API call)
  4. ChromaDB performs cosine similarity search, returns top-10 product vectors + scores
  5. FastAPI fetches full product records from Neon using the product_id metadata from ChromaDB
  6. FastAPI builds a prompt: system prompt + query + retrieved product details
  7. Claude returns structured JSON: {"summary": "...", "product_ids": [...]}
  8. FastAPI returns summary + full product objects to frontend
  9. React renders AI summary card above product grid

The key architectural point: ChromaDB returns product IDs, not full product data. You use those IDs to fetch fresh records from Neon. The vector store is an index, not a database. Prices might have changed, inventory might have shifted — Neon is the source of truth.
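
A condensed sketch of that handler. The fetch_products_by_ids helper is hypothetical, the model ID is an example, and index comes from the ingestion sketch above:

```python
import json

import anthropic
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
claude = anthropic.AsyncAnthropic()  # async client so requests don't block the event loop

class AISearchRequest(BaseModel):
    query: str

@app.post("/search/ai")
async def ai_search(req: AISearchRequest):
    # Steps 2-4: embed the query locally, cosine-search ChromaDB for the top 10
    nodes = index.as_retriever(similarity_top_k=10).retrieve(req.query)
    ids = [n.node.metadata["product_id"] for n in nodes]

    # Step 5: the vector store only hands back IDs; fetch fresh rows from Neon
    products = await fetch_products_by_ids(ids)  # hypothetical async Postgres helper

    # Steps 6-7: ground Claude in the retrieved products, ask for structured JSON
    response = await claude.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model ID
        max_tokens=500,
        system='Recommend only from the provided products. '
               'Reply as JSON: {"summary": "...", "product_ids": [...]}',
        messages=[{
            "role": "user",
            "content": f"Query: {req.query}\n\nProducts:\n{json.dumps(products)}",
        }],
    )
    result = json.loads(response.content[0].text)

    # Steps 8-9: return the summary plus full product objects for the grid
    chosen = [p for p in products if p["id"] in set(result["product_ids"])]
    return {"summary": result["summary"], "products": chosen}
```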


What makes this realistic

Most RAG tutorials index a handful of documents and call it done. This project makes different choices that push it closer to production:

Volume — 1180 products is small for retail but large enough that retrieval quality matters. Bad chunking or poor embeddings surface as visible problems at this scale.

Archetype-based generation — products aren’t manually written. They’re generated from templates with realistic variation in attributes, descriptions, and use cases. That’s closer to how enterprise catalogs actually work.

Keyword search for contrast — building both keyword and AI search side-by-side lets you demonstrate why semantic search matters. “Something warm for cold nights at camp” fails keyword search entirely and succeeds with AI. That’s the demo.

Dual-purpose database — Neon isn’t just for the AI features. It’s structured so that other services could query it too. The schema includes indexes for category filtering, full-text search vectors, and standard relational patterns.


What I’d add in v2

Filters — let users constrain AI search by category, price range, or brand. The retrieval logic would apply metadata filters to the ChromaDB similarity search, so only matching products get ranked, as sketched below.
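
With LlamaIndex this is a retriever-level change, assuming category and price were stored as metadata at ingest (as in the earlier ingestion sketch):

```python
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Constrain the candidate set first; similarity ranks only the survivors
filters = MetadataFilters(filters=[
    MetadataFilter(key="category", value="tents", operator=FilterOperator.EQ),
    MetadataFilter(key="price", value=300, operator=FilterOperator.LTE),
])
nodes = index.as_retriever(similarity_top_k=10, filters=filters).retrieve(
    "waterproof tent for 2 people"
)
```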

Hybrid search — combine vector similarity with keyword matching. Some queries (“Big Agnes tent”) have an exact brand match; pure semantic search might miss it if the description doesn’t emphasize the brand enough.
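
One simple fusion approach, sketched here as plain reciprocal rank fusion over the two ID lists rather than any particular library's hybrid mode:

```python
def reciprocal_rank_fusion(ranked_lists: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of product IDs; items ranked high anywhere float up."""
    scores: dict[int, float] = {}
    for ranked in ranked_lists:
        for rank, pid in enumerate(ranked):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)

keyword_ids = [3, 7, 12]   # e.g. Postgres full-text hits for "Big Agnes tent"
vector_ids = [7, 5, 3, 9]  # e.g. ChromaDB nearest neighbors for the same query
print(reciprocal_rank_fusion([keyword_ids, vector_ids]))  # [7, 3, 5, 12, 9]
```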

Re-ranking — retrieve top-20 by vector similarity, then re-rank to top-5 using a cross-encoder model. Improves precision for ambiguous queries. Hosted rerank APIs like Voyage AI’s are a common choice here.
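
The shape of retrieve-then-rerank, shown with a local cross-encoder from sentence-transformers to keep the example keyless (the checkpoint name is a common public model, not the project's choice):

```python
from sentence_transformers import CrossEncoder

query = "lightweight tent that handles heavy rain"
nodes = index.as_retriever(similarity_top_k=20).retrieve(query)  # index from the ingestion sketch

# A cross-encoder scores each (query, document) pair jointly: slower, more precise
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, n.node.get_content()) for n in nodes])

# Keep the five best under the sharper scoring
reranked = sorted(zip(scores, nodes), key=lambda pair: pair[0], reverse=True)
top_5 = [node for _, node in reranked[:5]]
```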

Evaluation harness — build a test suite that checks whether the right products surface for known queries. Without this, every change is a guess. With it, you can measure whether top-k changes or embedding model swaps improve or degrade retrieval quality.
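
Even a small golden set goes a long way. A sketch with hypothetical query-to-ID expectations, runnable under pytest against the index from the ingestion sketch:

```python
# Hypothetical golden set: queries mapped to product IDs that must surface
GOLDEN = {
    "something warm for cold nights at camp": {17, 42},
    "waterproof tent for 2 people under $300": {3, 7},
}

def recall_at_k(index, query: str, expected: set[int], k: int = 10) -> float:
    nodes = index.as_retriever(similarity_top_k=k).retrieve(query)
    retrieved = {n.node.metadata["product_id"] for n in nodes}
    return len(retrieved & expected) / len(expected)

def test_retrieval_quality():
    # Re-run after any embedding-model or top-k change; a drop here is a regression
    for query, expected in GOLDEN.items():
        assert recall_at_k(index, query, expected) >= 0.5, query
```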


The broader point

RAG is compelling for e-commerce because product catalogs change constantly. New products arrive, descriptions update, prices shift, inventory moves. Fine-tuning a model every time something changes is impractical. Pasting the entire catalog into every prompt doesn’t scale.

Retrieval sidesteps both problems. The vector index rebuilds overnight as products change. Each query pulls only the relevant subset. Claude generates answers grounded in current, accurate product data without needing to know the full catalog.

The challenge isn’t the framework — LlamaIndex makes the mechanics trivial. The challenge is understanding what makes retrieval work: how embeddings capture meaning, why cosine similarity measures it geometrically, when to re-rank, how to choose top-k, and how to prevent Claude from hallucinating products that weren’t retrieved.

Those are the questions every AI Solutions Engineer interview will ask.

Series navigation

Next: Building the Catalog and Ingestion Pipeline


Source code

Full project: github.com/tylerwellss/ozark-ridge

LlamaIndex docs: docs.llamaindex.ai

ChromaDB docs: docs.trychroma.com

Anthropic API docs: docs.anthropic.com