The first post covered why I split document analysis into multiple agents. This one covers how the specialists are actually built.
The Python code is not the hard part.
The specialist behavior mostly comes from:
- the system prompt
- the output schema
- the boundaries around what the agent should ignore
The code is intentionally repetitive. Once you’ve written a couple of agents, the next one is a breeze.
The shared base class
Every agent needs the same basic execution logic:
- receive text
- call Claude
- clean the response
- parse JSON
That belongs in one place.
import json

class BaseAgent:
    def __init__(self, client, model, system_prompt, max_tokens=1024):
        self.client = client
        self.model = model
        self.system_prompt = system_prompt
        self.max_tokens = max_tokens

    def analyze(self, text: str) -> dict:
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            system=self.system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": text,
                }
            ],
        )
        response_text = message.content[0].text
        # clean_json_response strips markdown fences from the model's reply
        cleaned = clean_json_response(response_text)
        return json.loads(cleaned)
This is the reusable part.
Everything else is a prompt.
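The clean_json_response helper isn’t shown here. Its only job is stripping markdown fences when the model wraps its JSON in a code block (as noted later, it does not repair malformed JSON). A minimal sketch, assuming fence-stripping is all it does:

def clean_json_response(response_text: str) -> str:
    # Strip markdown fences if the model wrapped its JSON in a code block.
    cleaned = response_text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (possibly "```json").
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
    if cleaned.endswith("```"):
        cleaned = cleaned[:-3].rstrip()
    return cleaned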
TechnicalAnalyzer
The technical agent looks for things like:
- missing architecture details
- API ambiguity
- integration gaps
- scalability problems
- security implementation gaps
- performance requirements without test criteria
The important thing is what it is not supposed to do.
If the technical agent also analyzes every legal, financial, and delivery issue, the separation breaks down. It becomes a generalist again.
TECHNICAL_ANALYZER_SYSTEM_PROMPT = """
You are a technical requirements analyst.
Analyze RFPs, contracts, and technical specifications.
Focus on:
- architecture ambiguity
- integration requirements
- API requirements
- performance and scalability
- security implementation details
- missing technical specifications
Do not analyze budget, delivery timeline, vendor/legal risk, or general project
management risk unless it directly affects technical feasibility.
Return JSON only in this format:
{
  "agent": "technical",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "evidence": ""
    }
  ]
}
"""
The first version of this prompt drifted. It flagged budget problems and timeline issues. Those findings were useful, but they belonged to other agents.
That was the first lesson: specialist prompts need negative instructions, not just positive instructions.
“Focus on technical issues” is not enough. You also need “do not analyze cost unless it directly affects technical feasibility.”
RiskAnalyzer
The risk agent’s output needs a different shape.
A technical finding can be as simple as a missing API spec. A risk finding needs likelihood, impact, and mitigation.
RISK_ANALYZER_SYSTEM_PROMPT = """
You are a risk analyst.
Analyze RFPs, contracts, and technical specifications for potential risks.
Focus on:
- legal risk
- compliance risk
- operational risk
- financial risk
- delivery risk
- vendor risk
- security risk
Return JSON only in this format:
{
  "agent": "risk",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "likelihood": "",
      "impact": "",
      "evidence": "",
      "mitigation": ""
    }
  ]
}
"""
This output format forces the model to think like a risk analyst instead of just listing concerns.
For example, “data migration is unspecified” becomes:
{
  "category": "Legal Risk",
  "description": "Data migration risks are acknowledged but no contractual mechanism assigns responsibility or liability.",
  "severity": "High",
  "likelihood": "High",
  "impact": "Data loss, corruption, compliance violations, warranty disputes, litigation",
  "evidence": "RFP acknowledges 'Data migration risks' but lacks validation, rollback, liability, or acceptance criteria.",
  "mitigation": "Define a migration plan with validation checkpoints, rollback procedures, and vendor liability for data loss."
}
That is more useful than “migration is risky.”
The schema matters because it forces the model to produce decision-relevant fields.
CostAnalyzer
The cost agent looks at the same document and asks a different question:
What will cost money, and what cost information is missing?
The first version of this schema had a generic cost_impact field. That was too vague.
The better version uses:
{
  "category": "",
  "cost_type": "",
  "description": "",
  "severity": "",
  "evidence": "",
  "estimated_impact": "",
  "mitigation": "",
  "question_for_vendor": ""
}
The most valuable field is question_for_vendor.
RFP analysis is often about uncertainty. The system should not invent precise numbers when the document does not support them. But it should identify what needs to be clarified.
Example:
{
  "category": "Licensing Model",
  "cost_type": "Ongoing Operations",
  "description": "The RFP does not specify whether licensing is per-user, per-document, subscription, or perpetual.",
  "severity": "Medium",
  "evidence": "The budget includes implementation, licensing, and support but does not specify licensing model.",
  "estimated_impact": "Unknown — licensing model directly affects Year 2+ costs.",
  "question_for_vendor": "What is your licensing model? How do costs scale with user count and document volume?"
}
This is the right behavior. The system is not pretending to know the answer. It is surfacing the uncertainty and turning it into a vendor question.
That makes the output actionable.
TimelineAnalyzer
The timeline agent looks for:
- milestone realism
- dependency gaps
- compressed schedules
- missing acceptance gates
- sequencing conflicts
- deployment assumptions
The prompt follows the same pattern:
TIMELINE_ANALYZER_SYSTEM_PROMPT = """
You are a timeline and delivery analyst.
Analyze RFPs, contracts, and technical specifications for schedule feasibility.
Focus on:
- milestones and deadlines
- dependencies between workstreams
- unrealistic delivery windows
- missing implementation phases
- testing, rollout, and migration sequencing
- acceptance gates and go/no-go criteria
Do not analyze general technical feasibility, cost, or legal risk unless it
directly affects schedule realism.
Return JSON only in this format:
{
  "agent": "timeline",
  "findings": [
    {
      "category": "",
      "description": "",
      "severity": "",
      "evidence": "",
      "dependency": "",
      "schedule_impact": "",
      "recommended_adjustment": ""
    }
  ]
}
"""
This is where RFPs often fall apart. They ask for:
- discovery
- design
- implementation
- integrations
- migration
- compliance validation
- UAT
- training
- production deployment
Then they put “go live in 6 months” at the bottom.
The timeline agent’s job is to make that tension explicit.
The class pattern
Each specialist is tiny:
class CostAnalyzer(BaseAgent):
    def __init__(self, client, model="claude-haiku-4-5"):
        super().__init__(
            client=client,
            model=model,
            system_prompt=COST_ANALYZER_SYSTEM_PROMPT,
            max_tokens=2048,
        )
That is why creating new specialists starts to feel routine.
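For completeness, here is roughly what calling a specialist looks like, assuming the standard anthropic Python SDK with an API key in the environment. The file name is a stand-in for whatever document you load:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("sample_rfp.txt") as f:  # hypothetical sample document
    rfp_text = f.read()

analyzer = CostAnalyzer(client)
result = analyzer.analyze(rfp_text)  # parsed JSON dict, per the schema above

for finding in result["findings"]:
    print(finding["severity"], "-", finding["category"])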
The point of the abstraction is not to show off inheritance. It is to make the repeated part disappear so the design work shifts to:
- what should this agent care about?
- what should it ignore?
- what fields make its output useful?
- how will the coordinator use the result?
That is where the real design work happens.
Output comparison
On the sample RFP, the agents found different aspects of the same underlying issues.
Technical:
    REST API requirement lacks interface specifications: rate limiting,
    authentication, response formats, versioning strategy, and endpoint SLA.
Risk:
    Complex multi-system integration with limited technical requirements detail
    creates a high likelihood of integration failure, data sync issues, and delays.
Cost:
    Salesforce, data warehouse, and Okta integrations may require custom API
    development and testing. Scope is undefined, creating cost uncertainty.
Timeline:
    The 6-month delivery window does not appear to include enough time for
    integration discovery, testing, migration validation, and production rollout.
Same underlying issue. Four useful lenses.
Overlap is not always bad
At first, I wanted strict separation. Technical should only do technical. Risk should only do risk. Cost should only do cost.
That is mostly right, but not completely.
Some overlap is useful because it gives the coordinator confidence. If integration ambiguity shows up in technical, risk, cost, and timeline findings, that is probably a top concern.
The goal is controlled overlap:
- enough overlap to reveal cross-cutting issues
- not so much overlap that every agent says the same thing
That is why the prompts need boundaries.
What I would improve next
Pydantic schemas. Right now the agents ask for JSON and parse it. That works for a demo, but it is not enough for production. Each agent should have a Pydantic model validating required fields, allowed severity levels, and known enum values.
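A sketch of what that could look like for the risk agent, assuming Pydantic v2. The exact enum values are a guess:

from typing import Literal
from pydantic import BaseModel

# Assumed allowed levels; adjust to whatever the prompts actually permit.
Level = Literal["Low", "Medium", "High", "Critical"]

class RiskFinding(BaseModel):
    category: str
    description: str
    severity: Level
    likelihood: Level
    impact: str
    evidence: str
    mitigation: str

class RiskReport(BaseModel):
    agent: Literal["risk"]
    findings: list[RiskFinding]

# Validation would replace the bare json.loads call:
# report = RiskReport.model_validate(json.loads(cleaned))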
Retry logic. If JSON parsing fails, retry with the parsing error and ask the model to repair the output. The current cleaner handles markdown fences but not malformed JSON.
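A sketch of that repair loop as a method alongside analyze on the base class. The repair prompt wording is illustrative:

def analyze_with_retry(self, text: str, max_retries: int = 2) -> dict:
    # Same call as analyze(), but on a parse failure the raw output and
    # the parser's error message are fed back so the model can repair it.
    messages = [{"role": "user", "content": text}]
    for attempt in range(max_retries + 1):
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            system=self.system_prompt,
            messages=messages,
        )
        raw = message.content[0].text
        try:
            return json.loads(clean_json_response(raw))
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": (
                    f"That response was not valid JSON ({err}). "
                    "Return only the corrected JSON."
                ),
            })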
Finding IDs. Each finding should have a stable ID so the coordinator can cite it. Right now the coordinator knows which agent produced a finding, but not exactly which finding triggered each top concern.
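A small post-processing helper would cover it; the ID format here is made up:

def tag_findings(agent_name: str, result: dict) -> dict:
    # Stamp each finding with a citable ID like "cost-003".
    for i, finding in enumerate(result["findings"], start=1):
        finding["id"] = f"{agent_name}-{i:03d}"
    return result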
Evaluation. The next serious step is an eval harness with sample RFPs and expected findings. Without that, prompt changes are guesswork.
The broader lesson
The specialist agents are not complex because the code is complex.
They are complex because each one needs a clear analytical contract.
A good specialist has:
- a narrow focus
- explicit boundaries
- a schema that matches its job
- evidence fields to keep it grounded
- enough structure for the coordinator to use
The code is just the delivery mechanism.
Series navigation
Previous: Why Multi-Agent Systems Beat Single Agents for Complex Documents
Next: Coordinating Multiple LLM Agents: Cross-Domain Synthesis