The first post covered why I split document analysis into multiple agents. This one covers how the specialists are actually built.
The Python code is not the hard part.
The specialist behavior mostly comes from:
- the system prompt
- the output schema
- the boundaries around what the agent should ignore
The code is intentionally repetitive. Once you’ve written a couple of agents, the next one is a breeze.
The shared base class
Every agent needs the same basic execution logic:
- receive text
- call Claude
- clean the response
- parse JSON
That belongs in one place.
import json

class BaseAgent:
    def __init__(self, client, model, system_prompt, max_tokens=1024):
        self.client = client
        self.model = model
        self.system_prompt = system_prompt
        self.max_tokens = max_tokens

    def analyze(self, text: str) -> dict:
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            system=self.system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": text,
                }
            ],
        )
        response_text = message.content[0].text
        # clean_json_response strips markdown fences from the model's reply
        cleaned = clean_json_response(response_text)
        return json.loads(cleaned)
This is the reusable part.
Everything else is a prompt.
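The clean_json_response helper isn’t shown here. Its only job is stripping markdown fences when the model wraps its JSON in a code block (as noted later, it does not repair malformed JSON). A minimal sketch, assuming fence-stripping is all it does:

def clean_json_response(response_text: str) -> str:
    # Strip markdown fences if the model wrapped its JSON in a code block.
    cleaned = response_text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (possibly "```json").
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
    if cleaned.endswith("```"):
        cleaned = cleaned[:-3].rstrip()
    return cleaned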
TechnicalAnalyzer
The technical agent looks for things like:
- missing architecture details
- API ambiguity
- integration gaps
- scalability problems
- security implementation gaps
- performance requirements without test criteria
The important thing is what it is not supposed to do.
If the technical agent also analyzes every legal, financial, and delivery issue, the separation breaks down. It becomes a generalist again.
TECHNICAL_ANALYZER_SYSTEM_PROMPT = """
You are a technical requirements analyst.
Analyze RFPs, contracts, and technical specifications.
Focus on:
- architecture ambiguity
- integration requirements
- API requirements
- performance and scalability
- security implementation details
- missing technical specifications
Do not analyze budget, delivery timeline, vendor/legal risk, or general project
management risk unless it directly affects technical feasibility.
Return JSON only in this format:
{
  "agent": "technical",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "evidence": ""
    }
  ]
}
"""
The first version of this prompt drifted. It flagged budget problems and timeline issues. Those findings were useful, but they belonged to other agents.
That was the first lesson: specialist prompts need negative instructions, not just positive instructions.
“Focus on technical issues” is not enough. You also need “do not analyze cost unless it directly affects technical feasibility.”
RiskAnalyzer
The risk agent’s output needs a different shape.
A technical finding can be as simple as a missing API spec. A risk finding needs likelihood, impact, and mitigation.
RISK_ANALYZER_SYSTEM_PROMPT = """
You are a risk analyst.
Analyze RFPs, contracts, and technical specifications for potential risks.
Focus on:
- legal risk
- compliance risk
- operational risk
- financial risk
- delivery risk
- vendor risk
- security risk
Return JSON only in this format:
{
  "agent": "risk",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "likelihood": "",
      "impact": "",
      "evidence": "",
      "mitigation": ""
    }
  ]
}
"""
This output format forces the model to think like a risk analyst instead of just listing concerns.
For example, “data migration is unspecified” becomes:
{
  "category": "Legal Risk",
  "description": "Data migration risks are acknowledged but no contractual mechanism assigns responsibility or liability.",
  "severity": "High",
  "likelihood": "High",
  "impact": "Data loss, corruption, compliance violations, warranty disputes, litigation",
  "evidence": "RFP acknowledges 'Data migration risks' but lacks validation, rollback, liability, or acceptance criteria.",
  "mitigation": "Define a migration plan with validation checkpoints, rollback procedures, and vendor liability for data loss."
}
That is more useful than “migration is risky.”
The schema matters because it forces the model to produce decision-relevant fields.
CostAnalyzer
The cost agent looks at the same document and asks a different question:
What will cost money, and what cost information is missing?
The first version of this schema had a generic cost_impact field. That was too vague.
The better version uses:
{
  "category": "",
  "cost_type": "",
  "description": "",
  "severity": "",
  "evidence": "",
  "estimated_impact": "",
  "mitigation": "",
  "question_for_vendor": ""
}
The most valuable field is question_for_vendor.
RFP analysis is often about uncertainty. The system should not invent precise numbers when the document does not support them. But it should identify what needs to be clarified.
Example:
{
  "category": "Licensing Model",
  "cost_type": "Ongoing Operations",
  "description": "The RFP does not specify whether licensing is per-user, per-document, subscription, or perpetual.",
  "severity": "Medium",
  "evidence": "The budget includes implementation, licensing, and support but does not specify licensing model.",
  "estimated_impact": "Unknown — licensing model directly affects Year 2+ costs.",
  "question_for_vendor": "What is your licensing model? How do costs scale with user count and document volume?"
}
This is the right behavior. The system is not pretending to know the answer. It is surfacing the uncertainty and turning it into a vendor question.
That makes the output actionable.
TimelineAnalyzer
The timeline agent looks for:
- milestone realism
- dependency gaps
- compressed schedules
- missing acceptance gates
- sequencing conflicts
- deployment assumptions
The prompt follows the same pattern:
TIMELINE_ANALYZER_SYSTEM_PROMPT = """
You are a timeline and delivery analyst.
Analyze RFPs, contracts, and technical specifications for schedule feasibility.
Focus on:
- milestones and deadlines
- dependencies between workstreams
- unrealistic delivery windows
- missing implementation phases
- testing, rollout, and migration sequencing
- acceptance gates and go/no-go criteria
Do not analyze general technical feasibility, cost, or legal risk unless it
directly affects schedule realism.
Return JSON only in this format:
{
  "agent": "timeline",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "evidence": "",
      "dependency": "",
      "schedule_impact": "",
      "recommended_adjustment": ""
    }
  ]
}
"""
This is where RFPs often fall apart. They ask for:
- discovery
- design
- implementation
- integrations
- migration
- compliance validation
- UAT
- training
- production deployment
Then they put “go live in 6 months” at the bottom.
The timeline agent’s job is to make that tension explicit.
The class pattern
Each specialist is tiny:
class CostAnalyzer(BaseAgent):
    def __init__(self, client, model="claude-haiku-4-5"):
        super().__init__(
            client=client,
            model=model,
            system_prompt=COST_ANALYZER_SYSTEM_PROMPT,
            max_tokens=2048,
        )
That is why creating new specialists starts to feel routine.
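For completeness, here is roughly what calling a specialist looks like, assuming the standard anthropic Python SDK with an API key in the environment. The file name is a stand-in for whatever document you load:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("sample_rfp.txt") as f:  # hypothetical sample document
    rfp_text = f.read()

analyzer = CostAnalyzer(client)
result = analyzer.analyze(rfp_text)  # parsed JSON dict, per the schema above

for finding in result["findings"]:
    print(finding["severity"], "-", finding["category"])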
The point of the abstraction is not to show off inheritance. It is to make the repeated part disappear so the design work shifts to:
- what should this agent care about?
- what should it ignore?
- what fields make its output useful?
- how will the coordinator use the result?
That is where the real design work happens.
Output comparison
On the sample RFP, the agents found different aspects of the same underlying issues.
Technical:
    REST API requirement lacks interface specifications: rate limiting,
    authentication, response formats, versioning strategy, and endpoint SLA.
Risk:
    Complex multi-system integration with limited technical requirements detail
    creates a high likelihood of integration failure, data sync issues, and delays.
Cost:
    Salesforce, data warehouse, and Okta integrations may require custom API
    development and testing. Scope is undefined, creating cost uncertainty.
Timeline:
    The 6-month delivery window does not appear to include enough time for
    integration discovery, testing, migration validation, and production rollout.
Same underlying issue. Four useful lenses.
Overlap is not always bad
At first, I wanted strict separation. Technical should only do technical. Risk should only do risk. Cost should only do cost.
That is mostly right, but not completely.
Some overlap is useful because it gives the coordinator confidence. If integration ambiguity shows up in technical, risk, cost, and timeline findings, that is probably a top concern.
The goal is controlled overlap:
- enough overlap to reveal cross-cutting issues
- not so much overlap that every agent says the same thing
That is why the prompts need boundaries.
What I would improve next
Pydantic schemas. Right now the agents ask for JSON and parse it. That works for a demo, but it is not enough for production. Each agent should have a Pydantic model validating required fields, allowed severity levels, and known enum values.
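A sketch of what that could look like for the risk agent, assuming Pydantic v2. The exact enum values are a guess:

from typing import Literal
from pydantic import BaseModel

# Assumed allowed levels; adjust to whatever the prompts actually permit.
Level = Literal["Low", "Medium", "High", "Critical"]

class RiskFinding(BaseModel):
    category: str
    description: str
    severity: Level
    likelihood: Level
    impact: str
    evidence: str
    mitigation: str

class RiskReport(BaseModel):
    agent: Literal["risk"]
    findings: list[RiskFinding]

# Validation would replace the bare json.loads call:
# report = RiskReport.model_validate(json.loads(cleaned))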
Retry logic. If JSON parsing fails, retry with the parsing error and ask the model to repair the output. The current cleaner handles markdown fences but not malformed JSON.
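A sketch of that repair loop as a method alongside analyze on the base class. The repair prompt wording is illustrative:

def analyze_with_retry(self, text: str, max_retries: int = 2) -> dict:
    # Same call as analyze(), but on a parse failure the raw output and
    # the parser's error message are fed back so the model can repair it.
    messages = [{"role": "user", "content": text}]
    for attempt in range(max_retries + 1):
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            system=self.system_prompt,
            messages=messages,
        )
        raw = message.content[0].text
        try:
            return json.loads(clean_json_response(raw))
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": (
                    f"That response was not valid JSON ({err}). "
                    "Return only the corrected JSON."
                ),
            })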
Finding IDs. Each finding should have a stable ID so the coordinator can cite it. Right now the coordinator knows which agent produced a finding, but not exactly which finding triggered each top concern.
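A small post-processing helper would cover it; the ID format here is made up:

def tag_findings(agent_name: str, result: dict) -> dict:
    # Stamp each finding with a citable ID like "cost-003".
    for i, finding in enumerate(result["findings"], start=1):
        finding["id"] = f"{agent_name}-{i:03d}"
    return result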
Evaluation. The next serious step is an eval harness with sample RFPs and expected findings. Without that, prompt changes are guesswork.
The broader lesson
The specialist agents are not complex because the code is complex.
They are complex because each one needs a clear analytical contract.
A good specialist has:
- a narrow focus
- explicit boundaries
- a schema that matches its job
- evidence fields to keep it grounded
- enough structure for the coordinator to use
The code is just the delivery mechanism.
Series navigation
Previous: Why Multi-Agent Systems Beat Single Agents for Complex Documents
Next: Coordinating Multiple LLM Agents: Cross-Domain Synthesis