The previous posts focused on search. This one turns to the AI assistant — a floating chat widget that answers product questions, recommends complementary gear, and builds camping loadouts on request.
Under the hood, it is a multi-turn conversation system with history, context injection when viewing a product, and dynamic retrieval when the query requires cross-product knowledge.
What the assistant does
Three core capabilities:
Product Q&A — user is viewing a tent, asks “Is this waterproof?”, assistant answers from the product description without retrieving anything.
Cross-product recommendations — user is viewing a tent, asks “What sleeping bag pairs with this?”, assistant retrieves sleeping bags from ChromaDB and recommends specific options with reasoning.
Loadout building — user asks “Build me a camping setup for a family of 4, budget $800”, assistant retrieves across categories (tents, sleeping bags, pads, etc.) and constructs a complete kit with total price.
The first requires only the current product context. The second and third require retrieval. The challenge is deciding which queries need retrieval and which don’t.
The state management problem
Claude is stateless. Every API call is independent — it has no memory of previous messages. To build a multi-turn conversation, the frontend must store the full message history and send it with every request.
The React state:
const [messages, setMessages] = useState([]);
const [currentProductId, setCurrentProductId] = useState(null);
const [isLoading, setIsLoading] = useState(false);
When the user sends a message:
const handleSend = async (message) => {
  // Add user message to state
  const newMessages = [...messages, { role: "user", content: message }];
  setMessages(newMessages);
  setIsLoading(true);

  // Send full history + current product context to backend
  const response = await fetch("/api/assistant", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message,
      history: messages, // Full conversation history
      current_product_id: currentProductId
    })
  });

  const data = await response.json();

  // Add assistant response to state
  setMessages([...newMessages, { role: "assistant", content: data.response }]);
  setIsLoading(false);
};
Every request includes:
- The new user message
- The full conversation history
- The current product ID (if on a product detail page)
The backend receives this and passes it all to Claude. From Claude’s perspective, each call is independent but contains the full conversation context.
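On the backend, that handoff to Claude can be sketched with the official anthropic Python SDK. This is a hypothetical sketch, not the project's actual code: the helper names and the model id are placeholders.

```python
def build_messages(history: list[dict], message: str) -> list[dict]:
    # The frontend's history is already in Claude's messages format
    # ({"role", "content"} dicts); the new user message simply goes last.
    return history + [{"role": "user", "content": message}]

def call_assistant(system_prompt: str, history: list[dict], message: str) -> str:
    import anthropic  # deferred import so build_messages is usable standalone
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=500,
        system=system_prompt,  # conversation context lives here, not in messages
        messages=build_messages(history, message),
    )
    return response.content[0].text
```

Note that the system prompt carries the injected product context while the messages array carries only the conversation itself, so the two concerns stay separate.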
Context injection: knowing what product the user is viewing
When the user is on a product detail page, the assistant needs to know that without the user having to say “I’m looking at the Big Agnes Ridgeline 2-Person Tent.”
The frontend tracks this with currentProductId state, set when navigating to a product page:
// In ProductDetail.jsx
useEffect(() => {
  setCurrentProductId(product.id);
  return () => setCurrentProductId(null); // Clear on unmount
}, [product.id]);
The backend receives this ID and fetches the product from Neon:
@router.post("/assistant")
async def assistant(
    request: AssistantRequest,
    db: AsyncSession = Depends(get_db)
):
    current_product = None
    if request.current_product_id:
        result = await db.execute(
            text("SELECT * FROM products WHERE id = :id"),
            {"id": request.current_product_id}
        )
        current_product = result.mappings().first()

    # Build system prompt with product context if available
    system_prompt = build_assistant_prompt(current_product)
    # ...
The system prompt changes based on whether a product is in context:
def build_assistant_prompt(current_product: dict | None) -> str:
    base = """You are a helpful outdoor gear expert for Ozark Ridge.

Your role:
- Answer questions about products accurately and concisely
- Suggest complementary products when asked
- Build gear loadouts when asked
- Compare products when asked

Rules:
- Only recommend products from the context provided
- Never make up product names, prices, or specs
- Keep responses 2-4 sentences unless more detail is needed"""

    if current_product:
        base += f"""

The customer is currently viewing:
{current_product['name']} by {current_product['brand']} (${current_product['price']})
{current_product['description']}
Tags: {', '.join(current_product.get('tags', []))}"""

    return base
When the user asks “Is this tent waterproof?”, Claude sees the full product description in the system prompt and answers without retrieval. When the user navigates away, current_product becomes None and the context disappears.
The retrieval decision: when to query ChromaDB
Not every user message needs retrieval. “What’s the weight of this tent?” can be answered from the current product context. “What sleeping bag pairs with this tent?” cannot.
The heuristic:
RETRIEVAL_TRIGGERS = [
    "recommend", "suggest", "pair", "compare", "alternative",
    "similar", "goes with", "loadout", "kit", "bundle",
    "other", "what else", "anything", "instead", "versus"
]

def needs_retrieval(message: str) -> bool:
    """Check if the message suggests cross-product intent."""
    message_lower = message.lower()
    return any(trigger in message_lower for trigger in RETRIEVAL_TRIGGERS)
Simple keyword matching. If the message contains words that imply looking beyond the current product, trigger retrieval. Otherwise, answer from context alone.
Examples that trigger retrieval:
- “What sleeping bag pairs with this?”
- “Suggest alternatives under $200”
- “Build me a camping kit”
- “Compare this to other 2-person tents”
Examples that don’t:
- “Is this waterproof?”
- “What’s the packed size?”
- “How much does it weigh?”
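Those examples double as a quick sanity check for the heuristic. A minimal, self-contained test, reusing the trigger list from above:

```python
# Same trigger list and function as shown above, repeated here so the
# check runs standalone.
RETRIEVAL_TRIGGERS = [
    "recommend", "suggest", "pair", "compare", "alternative",
    "similar", "goes with", "loadout", "kit", "bundle",
    "other", "what else", "anything", "instead", "versus",
]

def needs_retrieval(message: str) -> bool:
    message_lower = message.lower()
    return any(trigger in message_lower for trigger in RETRIEVAL_TRIGGERS)

# Cross-product questions trigger retrieval...
assert needs_retrieval("What sleeping bag pairs with this?")
assert needs_retrieval("Build me a camping kit")
# ...single-product questions don't.
assert not needs_retrieval("Is this waterproof?")
assert not needs_retrieval("How much does it weigh?")
```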
The heuristic isn’t perfect — “What other colors does this come in?” triggers retrieval even though it shouldn’t. But for a v1 it works well enough, and adding more products later gives the assistant legitimate alternative products to suggest even if the query was just about color.
A more sophisticated approach: ask Claude itself whether the query needs retrieval by doing a lightweight classification call first. But that’s a second API call per message, which doubles latency and cost. The heuristic is fast and correct >90% of the time.
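For reference, that classifier call could be sketched like this, assuming the anthropic SDK; the model id and prompt wording are placeholders, not anything the project actually ships:

```python
def parse_decision(text: str) -> bool:
    # The classifier is instructed to reply with exactly "yes" or "no".
    return text.strip().lower().startswith("yes")

def needs_retrieval_llm(message: str) -> bool:
    import anthropic  # deferred so parse_decision works without the SDK installed
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder: a small, fast model
        max_tokens=5,
        system=(
            "Decide whether answering the customer's message requires searching "
            "the product catalog beyond the product they are currently viewing. "
            "Reply with exactly 'yes' or 'no'."
        ),
        messages=[{"role": "user", "content": message}],
    )
    return parse_decision(response.content[0].text)
```

Even with a small model, this adds a full network round trip before the main call, which is exactly the latency cost the heuristic avoids.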
Cross-product retrieval: pairing and loadout building
When retrieval is triggered, the flow changes:
@router.post("/assistant")
async def assistant(request: AssistantRequest, db: AsyncSession = Depends(get_db)):
    # ... fetch current_product if available ...

    # Check if retrieval is needed
    if needs_retrieval(request.message):
        # Retrieve top-5 related products
        retrieved = retrieve_products(request.message, top_k=5)
        product_ids = [r["product_id"] for r in retrieved]

        # Fetch full records from Neon
        placeholders = ", ".join([f":id{i}" for i in range(len(product_ids))])
        result = await db.execute(
            text(f"SELECT * FROM products WHERE id IN ({placeholders})"),
            {f"id{i}": pid for i, pid in enumerate(product_ids)}
        )
        retrieved_products = [dict(row) for row in result.mappings().all()]
    else:
        retrieved_products = []

    # Build the full context for Claude
    system_prompt = build_assistant_prompt(current_product, retrieved_products)

    # Call Claude with full history + new message
    # ...
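The post doesn't show retrieve_products itself. A plausible sketch using the chromadb client, where the storage path, collection name, and metadata keys are all assumptions:

```python
def shape_results(results: dict) -> list[dict]:
    # ChromaDB returns parallel lists-of-lists (one inner list per query text);
    # flatten the single query's results into {"product_id", "distance"} dicts.
    return [
        {"product_id": meta["product_id"], "distance": dist}
        for meta, dist in zip(results["metadatas"][0], results["distances"][0])
    ]

def retrieve_products(query: str, top_k: int = 5) -> list[dict]:
    import chromadb  # deferred so shape_results is usable standalone
    client = chromadb.PersistentClient(path="./chroma_data")  # assumed path
    collection = client.get_collection("products")  # assumed collection name
    results = collection.query(query_texts=[query], n_results=top_k)
    return shape_results(results)
```

Only the IDs come back from ChromaDB; the authoritative records (current price, full description) are then fetched from Neon, as in the handler above.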
The system prompt now includes both the current product and the retrieved products:
def build_assistant_prompt(
    current_product: dict | None,
    retrieved_products: list[dict]
) -> str:
    prompt = base_prompt  # the base instructions shown earlier
    if current_product:
        prompt += f"\n\nCurrently viewing:\n{format_product(current_product)}"
    if retrieved_products:
        prompt += "\n\nRelated products:\n"
        for p in retrieved_products:
            prompt += format_product(p) + "\n"
    return prompt

def format_product(p: dict) -> str:
    return f"{p['name']} by {p['brand']} (${p['price']})\n{p['description']}"
Now when the user asks “What sleeping bag pairs with this tent?”, Claude sees:
- The tent they’re viewing (from current product context)
- 5 sleeping bags retrieved from ChromaDB
- The full conversation history
- The new question
Claude can reason about temperature ratings, pack size, weight budgets, and price to recommend a specific bag with justification.
Loadout building: multi-category retrieval
The most complex queries are loadout requests: “Build me a camping setup for a family of 4, budget $800.”
This requires:
- Retrieving across multiple categories (tents, sleeping bags, pads, etc.)
- Staying within budget
- Ensuring compatibility (4-person tent + 4 sleeping bags + 4 pads)
The retrieval query sent to ChromaDB is the user’s original message: “camping setup for a family of 4, budget $800”. The embedding model understands this implies multiple product types.
ChromaDB returns the top-5 most semantically similar products. Because the corpus includes tents, sleeping bags, pads, and other gear, the top-5 will naturally span categories if the query is broad enough.
Result for the family camping query:
1. Big Agnes Meadow 6-Person Camping Tent ($335)
2. Kelty Trailmix 34°F Insulated Bag ($91)
3. REI Co-op Strato Sleeping Pad ($89)
4. Coleman Basecamp 6-Person Camping Tent ($198)
5. NEMO Cumulus Sleeping Pad ($129)
The mix of tents, bags, and pads gives Claude enough context to build a loadout. Claude’s response:
For a family of 4 camping setup under $800, I’d recommend:
- Coleman Basecamp 6-Person Tent ($198) — spacious enough for 4 with room for gear
- 4x Kelty Trailmix 34°F Sleeping Bags ($364 total) — warm enough for 3-season camping
- 4x REI Co-op Strato Sleeping Pads ($356 total) — comfortable and affordable
Total: $918
This is slightly over budget. To stay under $800, consider the Coleman tent ($198) + 4x Kelty bags ($364) + budget foam pads instead of inflatables.
Claude did the math, noticed it exceeded budget, and suggested an alternative. This is the value of passing full product details (including price) in the context — Claude can reason about constraints the retrieval system can’t.
Conversation history in practice
Multi-turn conversations only work if history is maintained correctly. Example flow:
Turn 1
User: “What’s a good tent for weekend backpacking?”
Assistant: [retrieves tents, recommends 3 options]

Turn 2
User: “What about something cheaper?”
Assistant: [sees history, understands “cheaper” means cheaper than the previous recommendations, retrieves again with price constraint]

Turn 3
User: “Does the second one pack small?”
Assistant: [sees history, knows “the second one” refers to the second tent recommended in Turn 1, answers from that product’s description]
This requires sending the full history array with every request:
messages = [
    {"role": "user", "content": "What's a good tent for weekend backpacking?"},
    {"role": "assistant", "content": "I'd recommend..."},
    {"role": "user", "content": "What about something cheaper?"},
    {"role": "assistant", "content": "For a more budget-friendly option..."},
    {"role": "user", "content": "Does the second one pack small?"}
]
Claude receives all of this in the API call for Turn 3. It can see “the second one” refers to a tent mentioned earlier and answer accordingly.
Without history, “Does the second one pack small?” is ambiguous and unanswerable.
The floating widget UI
The assistant is always available, not just on product pages. It’s a fixed-position React component that overlays the page:
// ChatWidget.jsx
const [isOpen, setIsOpen] = useState(false);
const [messages, setMessages] = useState([]);
const [input, setInput] = useState("");
const [isLoading, setIsLoading] = useState(false);

return (
  <div className="chat-widget">
    {!isOpen && (
      <button onClick={() => setIsOpen(true)} className="chat-button">
        💬 Ask about gear
      </button>
    )}
    {isOpen && (
      <div className="chat-panel">
        <div className="chat-header">
          <span>Ozark Ridge Assistant</span>
          <button onClick={() => setIsOpen(false)}>✕</button>
        </div>
        <div className="chat-messages">
          {messages.map((msg, i) => (
            <Message key={i} role={msg.role} content={msg.content} />
          ))}
          {isLoading && <TypingIndicator />}
        </div>
        <div className="chat-input">
          <input
            value={input}
            onChange={(e) => setInput(e.target.value)}
            onKeyDown={(e) => e.key === 'Enter' && handleSend(input)}
            placeholder="Ask about this product..."
          />
          <button onClick={() => handleSend(input)}>Send</button>
        </div>
      </div>
    )}
  </div>
);
The collapsed state is a small button in the corner. Clicking it expands the panel. Messages persist as the user browses, until the widget is closed or, as described below, the user navigates to a different product.
One UX detail worth noting: when the user navigates to a different product, the chat should either:
- Clear and start fresh
- Offer a “New conversation” button
This project does #1 — navigating to a new product clears the chat. The reasoning: continuing a conversation about the previous product while viewing a new one creates confusion. “Is this waterproof?” becomes ambiguous.
An alternative: keep the conversation but inject a system message when the product changes: “The user is now viewing [new product]. Adjust context accordingly.” That’s more complex but preserves conversation history across products.
Preventing hallucination
The assistant can only recommend products that exist in the retrieved context. The guardrails:
1. Explicit instruction in the system prompt:
“Only recommend products that appear in the product context provided below. Never make up product names, prices, or specs.”
2. Retrieval-first architecture: The assistant can’t access the full catalog. It only sees:
- The current product (if on a PDP)
- The top-5 retrieved products (if retrieval was triggered)
If a product wasn’t retrieved, the assistant can’t recommend it.
3. ID-based references: When building loadouts, Claude returns product names. The backend validates these against the retrieved set before returning to the frontend. If Claude invented a product, it won’t match anything and gets filtered out.
This isn’t perfect — Claude could still recommend “the Big Agnes tent” when there are multiple Big Agnes tents in the retrieved set and it’s unclear which one was meant. But it prevents Claude from inventing the “Big Agnes UltraDream 5000” which doesn’t exist.
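Guardrail #3 can be as simple as a name lookup against the retrieved set. A hypothetical helper, where the case-insensitive exact-match strategy is my assumption, not the project's:

```python
def filter_hallucinated(recommended_names: list[str],
                        retrieved_products: list[dict]) -> list[dict]:
    # Keep only recommendations whose name matches a retrieved product.
    # Case-insensitive exact match; a real version might fuzzy-match to
    # tolerate small wording differences in Claude's output.
    by_name = {p["name"].lower(): p for p in retrieved_products}
    return [
        by_name[name.lower()]
        for name in recommended_names
        if name.lower() in by_name
    ]
```

Anything Claude invents simply fails the lookup and is dropped before the response reaches the frontend.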
What I’d add next
Confidence scoring — have Claude rate its confidence in each recommendation (1-5 scale). Surface low-confidence responses to the user differently: “I think the X might work, but I’m not certain — here’s why.”
Conversation reset button — let users clear the chat without navigating away. Useful if the conversation goes off-track or they want to start fresh on the same product.
Suggested follow-ups — after the assistant responds, show 2-3 suggested next questions: “What about a cheaper option?”, “What sleeping bag pairs with this?”, “Show me alternatives.” Reduces typing and guides users toward productive queries.
Streaming responses — use Claude’s streaming API to render the assistant’s response token-by-token instead of waiting for the full response. Makes the assistant feel more responsive even though the total latency is the same.
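A rough backend sketch of that streaming path, assuming the anthropic SDK's messages.stream helper and server-sent-events framing (the model id and framing choice are placeholders); on the FastAPI side the generator would be wrapped in a StreamingResponse:

```python
def sse_event(token: str) -> str:
    # Server-sent-events framing: each token becomes one "data:" block.
    return f"data: {token}\n\n"

def stream_tokens(system_prompt: str, messages: list[dict]):
    import anthropic  # deferred so sse_event is usable standalone
    client = anthropic.Anthropic()
    with client.messages.stream(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=500,
        system=system_prompt,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            yield sse_event(text)
```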
The architectural lesson
The assistant is fundamentally a RAG application with conversation state. The pattern:
- Maintain conversation history in the frontend
- Send full history + current context with every request
- Decide whether to retrieve based on query type
- Pass retrieved products + current product + history to Claude
- Return response, append to history, repeat
This pattern generalizes to any conversational assistant that needs domain-specific knowledge. The product catalog could be documentation, internal policies, customer data, inventory — the architecture stays the same.
The challenge isn’t the mechanics (LlamaIndex and Claude make that trivial). The challenge is:
- Deciding when to retrieve vs. answer from context
- Maintaining conversation state correctly across page navigations
- Preventing hallucination when the model doesn’t have the right information
- Making the latency acceptable (1-3 seconds per response)
Those are design problems, not implementation problems.
Series navigation
Previous: Keyword Search vs Semantic Search
Next: Building Ozark Ridge: Lessons Learned and What I’d Do Differently
Source code
Full project: github.com/tylerwellss/ozark-ridge
Anthropic API docs: docs.anthropic.com
LlamaIndex docs: docs.llamaindex.ai