After I built the specialist agents, the output looked impressive.
It was not useful enough.
The system produced:
- 12 technical findings
- 14 risk findings
- 10 cost findings
- timeline findings
That is a lot of analysis. It is also a lot to read.
The coordinator is the piece that turns those separate findings into something a person can act on.
Aggregation is not synthesis
The first version of the coordinator just ran the agents and returned their results.
class Coordinator:
    def __init__(self, client):
        self.technical_agent = TechnicalAnalyzer(client=client)
        self.risk_agent = RiskAnalyzer(client=client)
        self.cost_agent = CostAnalyzer(client=client)
        self.timeline_agent = TimelineAnalyzer(client=client)

    def analyze(self, rfp_text: str) -> dict:
        technical = self.technical_agent.analyze(rfp_text)
        risk = self.risk_agent.analyze(rfp_text)
        cost = self.cost_agent.analyze(rfp_text)
        timeline = self.timeline_agent.analyze(rfp_text)
        return {
            "technical": technical,
            "risk": risk,
            "cost": cost,
            "timeline": timeline,
        }
This is orchestration, but it is not synthesis.
It answers:
What did each agent find?
It does not answer:
What matters most?
That second question is the coordinator’s job.
The synthesis prompt
The coordinator gets the structured findings from each specialist and looks for cross-domain patterns.
SUMMARY_SYSTEM_PROMPT = """
You are a senior analyst synthesizing findings from multiple specialists.
You are given:
- technical findings
- risk findings
- cost findings
- timeline findings
Your job:
- identify the most important cross-cutting issues
- avoid duplicates
- highlight the biggest risks to success
- recommend concrete next steps
Do not invent facts not supported by the specialist findings.
Do not state specific dollar ranges or industry benchmarks unless supported by input.
When uncertainty exists, label it as uncertainty instead of guessing.
Return JSON only:
{
  "summary": {
    "overall_assessment": "",
    "top_concerns": [],
    "recommended_next_steps": []
  }
}
"""
The constraints matter. Without them, the coordinator becomes too confident. It starts adding industry benchmarks, cost ranges, and legal conclusions that are not grounded in the input.
That happened in testing.
The coordinator produced a strong-looking report with made-up cost ranges. It looked polished. That made it more dangerous.
This is the most important failure mode in LLM systems: confident output that sounds useful but is not grounded.
The coordinator method
The implementation is simple:
def synthesize(self, technical, risk, cost, timeline):
    prompt = f"""
Technical Findings:
{technical}
Risk Findings:
{risk}
Cost Findings:
{cost}
Timeline Findings:
{timeline}
"""
    message = self.client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=SUMMARY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": prompt}],
    )
    text = message.content[0].text
    cleaned = clean_json_response(text)
    return json.loads(cleaned)
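A minimal sketch of what clean_json_response needs to do, assuming the failure mode is the model wrapping its JSON in markdown fences:

def clean_json_response(text: str) -> str:
    """Strip markdown code fences the model sometimes wraps around JSON."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (which may carry a "json" language tag)...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ...and the closing fence.
        cleaned = cleaned.rsplit("```", 1)[0]
    return cleaned.strip()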
The code is straightforward.
What matters is the behavior.
What the coordinator sees that specialists don’t
Specialists see domain-specific problems.
The coordinator sees compound problems.
Example:
Technical:
The Salesforce, data warehouse, and Okta integrations lack sync frequency,
data mapping, authentication, and error handling specifications.
Risk:
Undefined integrations create high likelihood of delivery delays, data sync
issues, and post-award disputes.
Cost:
Integration costs are uncertain because the RFP does not say whether pre-built
connectors exist or custom development is required.
Timeline:
The schedule does not allocate time for integration discovery, testing, and
rollback planning.
Any one of those is useful.
Together, they become a top concern:
Integration requirements are insufficient to evaluate vendor proposals.
That is synthesis.
The coordinator can recommend:
Before issuing the RFP, document integration requirements for each system:
data direction, sync frequency, expected volume, API version compatibility,
error handling, and whether pre-built connectors are acceptable.
That is decision guidance.
Example output
On the sample EDMS RFP, the coordinator produced a summary like this:
The RFP contains critical gaps across technical, compliance, operational,
and financial dimensions. The most severe issues span timeline realism,
compliance conflicts, security underspecification, infrastructure ambiguity,
cost opacity, and integration risk. The RFP should not proceed to vendor
selection without resolving these deficiencies.
The top concerns were cross-domain:
1. Unrealistic 6-month deployment timeline
2. GDPR / US data residency / audit retention conflict
3. Security and encryption key management underspecified
4. Infrastructure requirements do not support the SLA
5. Budget lacks enough detail for vendor comparison
6. Integration specifications are insufficient
This is much more useful than four separate agent outputs.
Why the coordinator should be a class
At first, it is tempting to put this logic in a script.
technical = TechnicalAnalyzer(client).analyze(text)
risk = RiskAnalyzer(client).analyze(text)
cost = CostAnalyzer(client).analyze(text)
timeline = TimelineAnalyzer(client).analyze(text)
summary = synthesize(technical, risk, cost, timeline)
That works for a test file.
It does not scale well as the workflow grows.
The coordinator owns:
- agent initialization
- execution order
- chunking decisions
- partial failure handling
- aggregation
- synthesis
- final report shape
That belongs in one object.
Scripts should run the system. They should not contain the system.
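Concretely, the script shrinks to a thin entry point (a sketch; the module and file names are hypothetical):

# run_analysis.py: the script runs the system, the Coordinator is the system.
import json
import sys

from anthropic import Anthropic
from coordinator import Coordinator  # hypothetical module name

if __name__ == "__main__":
    rfp_text = open(sys.argv[1]).read()
    report = Coordinator(client=Anthropic()).analyze(rfp_text)
    print(json.dumps(report, indent=2))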
Sequential vs parallel execution
The current implementation runs agents sequentially:
Technical → Risk → Cost → Timeline → Coordinator
That is simple and easy to debug. It is also slow.
The obvious improvement is parallel execution:
results = await asyncio.gather(
    technical_agent.analyze_async(text),
    risk_agent.analyze_async(text),
    cost_agent.analyze_async(text),
    timeline_agent.analyze_async(text),
)
I did not start there because sequential execution is easier to reason about while building.
Parallelism is a performance optimization, not a design requirement. The architecture already supports it because each specialist is independent.
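And if the specialists only expose synchronous analyze methods, a thread pool gets most of the same win without an async rewrite (a sketch, assuming each agent is safe to call from its own thread):

from concurrent.futures import ThreadPoolExecutor

def analyze_parallel(self, rfp_text: str) -> dict:
    agents = {
        "technical": self.technical_agent,
        "risk": self.risk_agent,
        "cost": self.cost_agent,
        "timeline": self.timeline_agent,
    }
    # One worker per agent; the calls are independent, so ordering does not matter.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent.analyze, rfp_text) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}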
That independence is one of the benefits of the multi-agent design.
Partial failure handling
Right now, if one agent fails, the whole run can fail.
That is not ideal.
The coordinator should eventually return partial results:
{
  "summary": {
    "overall_assessment": "Partial analysis completed. Technical analysis failed.",
    "top_concerns": [],
    "recommended_next_steps": []
  },
  "agent_status": {
    "technical": "failed",
    "risk": "success",
    "cost": "success",
    "timeline": "success"
  }
}
If the technical agent fails but risk, cost, and timeline succeed, the system can still produce a useful report. It should just be explicit about what is missing.
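Getting there is a small change to the coordinator (a sketch; the placeholder string and blanket except are mine, not the current code):

def analyze_with_status(self, rfp_text: str) -> dict:
    agents = {
        "technical": self.technical_agent,
        "risk": self.risk_agent,
        "cost": self.cost_agent,
        "timeline": self.timeline_agent,
    }
    findings, status = {}, {}
    for name, agent in agents.items():
        try:
            findings[name] = agent.analyze(rfp_text)
            status[name] = "success"
        except Exception:
            # A failed specialist degrades the report; it should not kill the run.
            findings[name] = f"({name} analysis unavailable)"
            status[name] = "failed"
    summary = self.synthesize(
        findings["technical"], findings["risk"], findings["cost"], findings["timeline"]
    )
    return {"summary": summary, "agent_status": status}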
This is the same pattern as RAG evals and scraping pipelines: failure should be visible and recoverable, not hidden.
The hard part: controlling confidence
The coordinator is powerful because it synthesizes.
That is also the danger.
A specialist might make a small inference. The coordinator can amplify it into a major recommendation.
For example, a cost agent might say:
The budget may be insufficient because licensing and infrastructure are not itemized.
A bad coordinator turns that into:
This project will cost $5M.
That is not acceptable unless the input supports it.
The fix is prompt discipline:
- distinguish facts from inference
- avoid specific estimates unless supported
- label uncertainty
- cite specialist evidence
- prefer vendor questions over made-up precision
In production, I would go further and require the coordinator to attach source finding IDs to every top concern.
{
  "concern": "Integration specifications are insufficient",
  "supporting_findings": [
    "TECH-004",
    "RISK-005",
    "COST-004",
    "TIME-003"
  ]
}
That would make the output auditable.
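The audit itself is mechanical once the IDs exist (a sketch; it assumes each specialist finding carries an ID like TECH-004):

def audit_concerns(concerns: list[dict], known_ids: set[str]) -> list[str]:
    """Return every finding ID cited by the coordinator that no specialist produced."""
    cited = {
        finding_id
        for concern in concerns
        for finding_id in concern.get("supporting_findings", [])
    }
    return sorted(cited - known_ids)

Any ID this returns is a citation the coordinator invented, which is exactly the ungrounded confidence the prompt constraints are meant to catch.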
When coordination overhead is worth it
The coordinator adds another LLM call. For short or simple documents, that might be unnecessary.
It is worth it when:
- the document is long
- findings overlap across domains
- recommendations require prioritization
- the user needs an executive summary
- raw findings are too numerous to act on
It is not worth it when:
- you only need extraction
- the schema is simple
- there is only one analytical lens
- the answer can be produced reliably in one pass
This project is in the first category. RFPs and contracts are cross-domain by nature. Technical requirements affect cost. Cost constraints affect delivery risk. Legal requirements affect architecture. Timeline compression affects everything.
That is exactly where synthesis matters.
What I learned
The coordinator is the simplest code and the most important component.
The specialists produce observations.
The coordinator produces judgment.
That is the difference between:
Here are 40 findings.
and:
Do not issue this RFP until these six issues are fixed.
Series navigation
Previous: Building Specialist LLM Agents: Technical, Risk, Cost, and Timeline Analysis
Next: What I Learned Building a Multi-Agent Document Analysis System