I built Ozark Ridge, a mock outdoor gear retail site with AI-powered product search and a Rufus-style product assistant. The project exists to demonstrate RAG (Retrieval-Augmented Generation) in a realistic e-commerce context.

This is the first post in a series documenting the build. This one covers the architecture and stack decisions. Later posts cover the RAG pipeline, keyword vs semantic search comparison, and building the AI assistant.


What it does

Two features:

AI Search — natural language product search using semantic similarity instead of keyword matching. “Something warm for cold nights at camp” returns sleeping bags and insulated jackets rated for cold weather, even though those products don’t contain the words “warm” or “nights” in their descriptions.

Product Q&A assistant — a floating chat widget (like Amazon’s Rufus) that answers product questions, suggests complementary gear, and builds camping loadouts on request. Ask “what sleeping bag pairs with this tent?” and it retrieves related products from the vector store and recommends specific options with reasoning.

The catalog is 1180 products across tents, sleeping bags, footwear, apparel, packs, and camp kitchen gear. All generated from 20 product archetypes using a template-based system. The descriptions are rich enough to make semantic search interesting — specific attributes, use cases, technical specs, all in natural language.


The stack

| Layer | Tool | Why |
| --- | --- | --- |
| Frontend | React (Vite) | Standard, fast, nothing fancy |
| Backend | FastAPI | Async-native Python, automatic API docs |
| Database | Neon (Postgres) | Free tier Postgres, source of truth for product data |
| Vector store | ChromaDB | Local, persistent, zero setup |
| Indexing/retrieval | LlamaIndex | RAG framework — handles embedding and retrieval |
| Embeddings | BAAI/bge-small-en-v1.5 | Local HuggingFace model, no API cost |
| Inference | Anthropic API (Claude) | AI search summaries + assistant responses |
| Images | Placeholder images | Avoiding API rate limits and cost |

Why RAG

The core problem: AI (Claude) doesn’t know your product catalog. You can’t fine-tune it every time a product changes or inventory shifts. You could paste your entire catalog into every prompt, but that’s expensive, slow, and hits context window limits fast with a catalog of any real size.

RAG solves this by retrieving only the relevant subset of products at query time and injecting them into Claude’s context. The three-step pattern:

  1. Index — at ingest time, embed each product’s text into a vector and store it in ChromaDB
  2. Retrieve — at query time, embed the user’s query and find the closest matching product vectors
  3. Generate — pass the retrieved products + user query to Claude, get a response

LlamaIndex handles steps 1 and 2. Anthropic handles step 3.
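
A minimal sketch of the pattern. The product text, IDs, and model name below are illustrative placeholders, not the project's actual data:

```python
import anthropic
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model: steps 1 and 2 never call an external API
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 1. Index: embed each product's text into a vector (in-memory store for brevity)
docs = [
    Document(text="Down sleeping bag rated to -10C...", metadata={"product_id": 17}),
    Document(text="Insulated parka for sub-freezing camp mornings...", metadata={"product_id": 42}),
]
index = VectorStoreIndex.from_documents(docs)

# 2. Retrieve: embed the query, find the closest product vectors
query = "something warm for cold nights at camp"
nodes = index.as_retriever(similarity_top_k=5).retrieve(query)

# 3. Generate: pass the retrieved products + query to Claude
context = "\n".join(n.node.get_content() for n in nodes)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model ID
    max_tokens=300,
    messages=[{"role": "user", "content": f"Products:\n{context}\n\nQuery: {query}"}],
)
print(response.content[0].text)
```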

This is the dominant pattern for production AI applications that need domain-specific knowledge.


Architecture diagram

```
React Frontend (localhost:5173)
     ├── GET /products                ────► FastAPI (localhost:8000)
     ├── GET /products/:id            ────►       │
     ├── GET /search/keyword?q=...    ────►       ├── Neon (Postgres)
     ├── POST /search/ai              ────►       ├── ChromaDB (local)
     └── POST /assistant              ────►       ├── LlamaIndex
                                                  └── Anthropic API
```

The separation matters, and it's the standard production pattern. Neon is the source of truth — prices change, descriptions update, inventory moves. Other teams could query it directly. ChromaDB is an index built from a snapshot of that data, used only for the AI features. If a product description changes in Neon, you re-run ingestion to update ChromaDB.
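
A sketch of what that ingestion step can look like. The products schema, connection handling, and paths here are assumptions for illustration, not the project's actual code:

```python
import os

import chromadb
import psycopg2
from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Snapshot the source of truth (hypothetical products table)
conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn.cursor() as cur:
    cur.execute("SELECT id, name, category, price, description FROM products")
    rows = cur.fetchall()
conn.close()

# One document per product; product_id metadata links each vector back to Neon
docs = [
    Document(
        text=f"{name}. {description}",
        metadata={"product_id": pid, "category": category, "price": float(price)},
    )
    for pid, name, category, price, description in rows
]

# Persist vectors locally; drop the collection and re-run to rebuild from fresh data
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("products")
storage = StorageContext.from_defaults(vector_store=ChromaVectorStore(chroma_collection=collection))
index = VectorStoreIndex.from_documents(docs, storage_context=storage)
```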


Why these tools

Neon over local Postgres — the free tier is generous, it’s serverless, and connection strings work exactly like regular Postgres. No Docker, no local database management.

ChromaDB over Pinecone/Weaviate — local, file-based, zero configuration. For a portfolio project it’s the right choice. In production you’d swap it for a hosted vector store, but the LlamaIndex abstraction makes that a one-line change.
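
To make the "one-line change" concrete: the vector store is injected at construction, so only that construction moves. Pinecone below is purely an illustrative alternative, not part of this project:

```python
from llama_index.vector_stores.chroma import ChromaVectorStore

# Local development: ChromaDB (collection created as in the ingestion sketch)
vector_store = ChromaVectorStore(chroma_collection=collection)

# Hosted production store, e.g. Pinecone: only this construction changes
# from llama_index.vector_stores.pinecone import PineconeVectorStore
# vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```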

LlamaIndex over building retrieval manually — LlamaIndex handles embedding, indexing, and retrieval with a consistent API. You could write this yourself with sentence-transformers and raw ChromaDB calls, but LlamaIndex eliminates boilerplate without hiding what’s happening under the hood.

bge-small-en-v1.5 over OpenAI embeddings — runs locally via HuggingFace, no API cost, ~130MB model download. First run takes an extra 30 seconds to download and cache the model, then it’s instant. For a demo with 1180 products, it embedded everything in 39 seconds on CPU.

FastAPI over Flask — async-native Python means database calls don’t block other requests. For a demo it doesn’t matter much, but it’s the correct production pattern and interviewers notice.


Request flow

Here’s what happens when someone searches for “waterproof tent for 2 people under $300”:

  1. React sends POST /search/ai with {"query": "..."}
  2. FastAPI handler calls the LlamaIndex retriever
  3. LlamaIndex embeds the query using bge-small-en-v1.5 (local, no API call)
  4. ChromaDB performs cosine similarity search, returns top-10 product vectors + scores
  5. FastAPI fetches full product records from Neon using the product_id metadata from ChromaDB
  6. FastAPI builds a prompt: system prompt + query + retrieved product details
  7. Claude returns structured JSON: {"summary": "...", "product_ids": [...]}
  8. FastAPI returns summary + full product objects to frontend
  9. React renders AI summary card above product grid

The key architectural point: ChromaDB returns product IDs, not full product data. You use those IDs to fetch fresh records from Neon. The vector store is an index, not a database. Prices might have changed, inventory might have shifted — Neon is the source of truth.
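
A condensed sketch of that handler. The fetch_products_by_ids helper is hypothetical, the model ID is an example, and index comes from the ingestion sketch above:

```python
import json

import anthropic
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
claude = anthropic.AsyncAnthropic()  # async client so requests don't block the event loop

class AISearchRequest(BaseModel):
    query: str

@app.post("/search/ai")
async def ai_search(req: AISearchRequest):
    # Steps 2-4: embed the query locally, cosine-search ChromaDB for the top 10
    nodes = index.as_retriever(similarity_top_k=10).retrieve(req.query)
    ids = [n.node.metadata["product_id"] for n in nodes]

    # Step 5: the vector store only hands back IDs; fetch fresh rows from Neon
    products = await fetch_products_by_ids(ids)  # hypothetical async Postgres helper

    # Steps 6-7: ground Claude in the retrieved products, ask for structured JSON
    response = await claude.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model ID
        max_tokens=500,
        system='Recommend only from the provided products. '
               'Reply as JSON: {"summary": "...", "product_ids": [...]}',
        messages=[{
            "role": "user",
            "content": f"Query: {req.query}\n\nProducts:\n{json.dumps(products)}",
        }],
    )
    result = json.loads(response.content[0].text)

    # Steps 8-9: return the summary plus full product objects for the grid
    chosen = [p for p in products if p["id"] in set(result["product_ids"])]
    return {"summary": result["summary"], "products": chosen}
```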


What makes this realistic

Most RAG tutorials index a handful of documents and call it done. This project makes different choices that push it closer to production:

Volume — 1180 products is small for retail but large enough that retrieval quality matters. Bad chunking or poor embeddings surface as visible problems at this scale.

Archetype-based generation — products aren’t manually written. They’re generated from templates with realistic variation in attributes, descriptions, and use cases. That’s closer to how enterprise catalogs actually work.

Keyword search for contrast — building both keyword and AI search side-by-side lets you demonstrate why semantic search matters. “Something warm for cold nights at camp” fails keyword search entirely and succeeds with AI. That’s the demo.

Dual-purpose database — Neon isn’t just for the AI features. It’s structured so that other services could query it too. The schema includes indexes for category filtering, full-text search vectors, and standard relational patterns.


What I’d add in v2

Filters — let users constrain AI search by category, price range, or brand. The retrieval logic would apply metadata filters to the ChromaDB similarity search, so only matching products get ranked, as sketched below.
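
With LlamaIndex this is a retriever-level change, assuming category and price were stored as metadata at ingest (as in the earlier ingestion sketch):

```python
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Constrain the candidate set first; similarity ranks only the survivors
filters = MetadataFilters(filters=[
    MetadataFilter(key="category", value="tents", operator=FilterOperator.EQ),
    MetadataFilter(key="price", value=300, operator=FilterOperator.LTE),
])
nodes = index.as_retriever(similarity_top_k=10, filters=filters).retrieve(
    "waterproof tent for 2 people"
)
```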

Hybrid search — combine vector similarity with keyword matching. Some queries (“Big Agnes tent”) have an exact brand match; pure semantic search might miss it if the description doesn’t emphasize the brand enough.
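
One simple fusion approach, sketched here as plain reciprocal rank fusion over the two ID lists rather than any particular library's hybrid mode:

```python
def reciprocal_rank_fusion(ranked_lists: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of product IDs; items ranked high anywhere float up."""
    scores: dict[int, float] = {}
    for ranked in ranked_lists:
        for rank, pid in enumerate(ranked):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)

keyword_ids = [3, 7, 12]   # e.g. Postgres full-text hits for "Big Agnes tent"
vector_ids = [7, 5, 3, 9]  # e.g. ChromaDB nearest neighbors for the same query
print(reciprocal_rank_fusion([keyword_ids, vector_ids]))  # [7, 3, 5, 12, 9]
```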

Re-ranking — retrieve top-20 by vector similarity, then re-rank to top-5 using a cross-encoder model. Improves precision for ambiguous queries. Hosted rerank APIs like Voyage AI’s are a common choice here.
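
The shape of retrieve-then-rerank, shown with a local cross-encoder from sentence-transformers to keep the example keyless (the checkpoint name is a common public model, not the project's choice):

```python
from sentence_transformers import CrossEncoder

query = "lightweight tent that handles heavy rain"
nodes = index.as_retriever(similarity_top_k=20).retrieve(query)  # index from the ingestion sketch

# A cross-encoder scores each (query, document) pair jointly: slower, more precise
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, n.node.get_content()) for n in nodes])

# Keep the five best under the sharper scoring
reranked = sorted(zip(scores, nodes), key=lambda pair: pair[0], reverse=True)
top_5 = [node for _, node in reranked[:5]]
```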

Evaluation harness — build a test suite that checks whether the right products surface for known queries. Without this, every change is a guess. With it, you can measure whether top-k changes or embedding model swaps improve or degrade retrieval quality.
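
Even a small golden set goes a long way. A sketch with hypothetical query-to-ID expectations, runnable under pytest against the index from the ingestion sketch:

```python
# Hypothetical golden set: queries mapped to product IDs that must surface
GOLDEN = {
    "something warm for cold nights at camp": {17, 42},
    "waterproof tent for 2 people under $300": {3, 7},
}

def recall_at_k(index, query: str, expected: set[int], k: int = 10) -> float:
    nodes = index.as_retriever(similarity_top_k=k).retrieve(query)
    retrieved = {n.node.metadata["product_id"] for n in nodes}
    return len(retrieved & expected) / len(expected)

def test_retrieval_quality():
    # Re-run after any embedding-model or top-k change; a drop here is a regression
    for query, expected in GOLDEN.items():
        assert recall_at_k(index, query, expected) >= 0.5, query
```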


The broader point

RAG is compelling for e-commerce because product catalogs change constantly. New products arrive, descriptions update, prices shift, inventory moves. Fine-tuning a model every time something changes is impractical. Pasting the entire catalog into every prompt doesn’t scale.

Retrieval sidesteps both problems. The vector index rebuilds overnight as products change. Each query pulls only the relevant subset. Claude generates answers grounded in current, accurate product data without needing to know the full catalog.

The challenge isn’t the framework — LlamaIndex makes the mechanics trivial. The challenge is understanding what makes retrieval work: how embeddings capture meaning, why cosine similarity measures it geometrically, when to re-rank, how to choose top-k, and how to prevent Claude from hallucinating products that weren’t retrieved.

Those are the questions every AI Solutions Engineer interview will ask.

Series navigation

Next: Building the Catalog and Ingestion Pipeline


Source code

Full project: github.com/tylerwellss/ozark-ridge

LlamaIndex docs: docs.llamaindex.ai

ChromaDB docs: docs.trychroma.com

Anthropic API docs: docs.anthropic.com