After I built the specialist agents, the output looked impressive.
It was not useful enough.
The system produced:
- 12 technical findings
- 14 risk findings
- 10 cost findings
- timeline findings
That is a lot of analysis. It is also a lot to read.
The coordinator is the piece that turns those separate findings into something a person can act on.
Aggregation is not synthesis
The first version of the coordinator just ran the agents and returned their results.
class Coordinator:
    def __init__(self, client):
        self.technical_agent = TechnicalAnalyzer(client=client)
        self.risk_agent = RiskAnalyzer(client=client)
        self.cost_agent = CostAnalyzer(client=client)
        self.timeline_agent = TimelineAnalyzer(client=client)

    def analyze(self, rfp_text: str) -> dict:
        technical = self.technical_agent.analyze(rfp_text)
        risk = self.risk_agent.analyze(rfp_text)
        cost = self.cost_agent.analyze(rfp_text)
        timeline = self.timeline_agent.analyze(rfp_text)
        return {
            "technical": technical,
            "risk": risk,
            "cost": cost,
            "timeline": timeline,
        }
This is orchestration, but it is not synthesis.
It answers:
What did each agent find?
It does not answer:
What matters most?
That second question is the coordinator’s job.
The synthesis prompt
The coordinator gets the structured findings from each specialist and looks for cross-domain patterns.
SUMMARY_SYSTEM_PROMPT = """
You are a senior analyst synthesizing findings from multiple specialists.
You are given:
- technical findings
- risk findings
- cost findings
- timeline findings
Your job:
- identify the most important cross-cutting issues
- avoid duplicates
- highlight the biggest risks to success
- recommend concrete next steps
Do not invent facts not supported by the specialist findings.
Do not state specific dollar ranges or industry benchmarks unless supported by input.
When uncertainty exists, label it as uncertainty instead of guessing.
Return JSON only:
{
  "summary": {
    "overall_assessment": "",
    "top_concerns": [],
    "recommended_next_steps": []
  }
}
"""
The constraints matter. Without them, the coordinator becomes too confident. It starts adding industry benchmarks, cost ranges, and legal conclusions that are not grounded in the input.
That happened in testing.
The coordinator produced a strong-looking report with made-up cost ranges. It looked polished. That made it more dangerous.
This is the most important failure mode in LLM systems: confident output that sounds useful but is not grounded.
The coordinator method
The implementation is simple:
def synthesize(self, technical, risk, cost, timeline):
    prompt = f"""
Technical Findings:
{technical}
Risk Findings:
{risk}
Cost Findings:
{cost}
Timeline Findings:
{timeline}
"""
    message = self.client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=SUMMARY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": prompt}],
    )
    text = message.content[0].text
    cleaned = clean_json_response(text)
    return json.loads(cleaned)
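A minimal sketch of what clean_json_response needs to do, assuming the failure mode is the model wrapping its JSON in markdown fences:

def clean_json_response(text: str) -> str:
    """Strip markdown code fences the model sometimes wraps around JSON."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (which may carry a "json" language tag)...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ...and the closing fence.
        cleaned = cleaned.rsplit("```", 1)[0]
    return cleaned.strip()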
The code is straightforward.
What matters is the behavior.
What the coordinator sees that specialists don’t
Specialists see domain-specific problems.
The coordinator sees compound problems.
Example:
Technical:
The Salesforce, data warehouse, and Okta integrations lack sync frequency,
data mapping, authentication, and error handling specifications.
Risk:
Undefined integrations create high likelihood of delivery delays, data sync
issues, and post-award disputes.
Cost:
Integration costs are uncertain because the RFP does not say whether pre-built
connectors exist or custom development is required.
Timeline:
The schedule does not allocate time for integration discovery, testing, and
rollback planning.
Any one of those is useful.
Together, they become a top concern:
Integration requirements are insufficient to evaluate vendor proposals.
That is synthesis.
The coordinator can recommend:
Before issuing the RFP, document integration requirements for each system:
data direction, sync frequency, expected volume, API version compatibility,
error handling, and whether pre-built connectors are acceptable.
That is decision guidance.
Example output
On the sample EDMS RFP, the coordinator produced a summary like this:
The RFP contains critical gaps across technical, compliance, operational,
and financial dimensions. The most severe issues span timeline realism,
compliance conflicts, security underspecification, infrastructure ambiguity,
cost opacity, and integration risk. The RFP should not proceed to vendor
selection without resolving these deficiencies.
The top concerns were cross-domain:
1. Unrealistic 6-month deployment timeline
2. GDPR / US data residency / audit retention conflict
3. Security and encryption key management underspecified
4. Infrastructure requirements do not support the SLA
5. Budget lacks enough detail for vendor comparison
6. Integration specifications are insufficient
This is much more useful than four separate agent outputs.
Why the coordinator should be a class
At first, it is tempting to put this logic in a script.
technical = TechnicalAnalyzer(client).analyze(text)
risk = RiskAnalyzer(client).analyze(text)
cost = CostAnalyzer(client).analyze(text)
timeline = TimelineAnalyzer(client).analyze(text)
summary = synthesize(technical, risk, cost, timeline)
That works for a test file.
It does not scale well as the workflow grows.
The coordinator owns:
- agent initialization
- execution order
- chunking decisions
- partial failure handling
- aggregation
- synthesis
- final report shape
That belongs in one object.
Scripts should run the system. They should not contain the system.
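Concretely, the script shrinks to a thin entry point (a sketch; the module and file names are hypothetical):

# run_analysis.py: the script runs the system, the Coordinator is the system.
import json
import sys

from anthropic import Anthropic
from coordinator import Coordinator  # hypothetical module name

if __name__ == "__main__":
    rfp_text = open(sys.argv[1]).read()
    report = Coordinator(client=Anthropic()).analyze(rfp_text)
    print(json.dumps(report, indent=2))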
Sequential vs parallel execution
The current implementation runs agents sequentially:
Technical → Risk → Cost → Timeline → Coordinator
That is simple and easy to debug. It is also slow.
The obvious improvement is parallel execution:
results = await asyncio.gather(
    technical_agent.analyze_async(text),
    risk_agent.analyze_async(text),
    cost_agent.analyze_async(text),
    timeline_agent.analyze_async(text),
)
I did not start there because sequential execution is easier to reason about while building.
Parallelism is a performance optimization, not a design requirement. The architecture already supports it because each specialist is independent.
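And if the specialists only expose synchronous analyze methods, a thread pool gets most of the same win without an async rewrite (a sketch, assuming each agent is safe to call from its own thread):

from concurrent.futures import ThreadPoolExecutor

def analyze_parallel(self, rfp_text: str) -> dict:
    agents = {
        "technical": self.technical_agent,
        "risk": self.risk_agent,
        "cost": self.cost_agent,
        "timeline": self.timeline_agent,
    }
    # One worker per agent; the calls are independent, so ordering does not matter.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent.analyze, rfp_text) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}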
That independence is one of the benefits of the multi-agent design.
Partial failure handling
Right now, if one agent fails, the whole run can fail.
That is not ideal.
The coordinator should eventually return partial results:
{
  "summary": {
    "overall_assessment": "Partial analysis completed. Technical analysis failed.",
    "top_concerns": [],
    "recommended_next_steps": []
  },
  "agent_status": {
    "technical": "failed",
    "risk": "success",
    "cost": "success",
    "timeline": "success"
  }
}
If the technical agent fails but risk, cost, and timeline succeed, the system can still produce a useful report. It should just be explicit about what is missing.
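Getting there is a small change to the coordinator (a sketch; the placeholder string and blanket except are mine, not the current code):

def analyze_with_status(self, rfp_text: str) -> dict:
    agents = {
        "technical": self.technical_agent,
        "risk": self.risk_agent,
        "cost": self.cost_agent,
        "timeline": self.timeline_agent,
    }
    findings, status = {}, {}
    for name, agent in agents.items():
        try:
            findings[name] = agent.analyze(rfp_text)
            status[name] = "success"
        except Exception:
            # A failed specialist degrades the report; it should not kill the run.
            findings[name] = f"({name} analysis unavailable)"
            status[name] = "failed"
    summary = self.synthesize(
        findings["technical"], findings["risk"], findings["cost"], findings["timeline"]
    )
    return {"summary": summary, "agent_status": status}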
This is the same pattern as RAG evals and scraping pipelines: failure should be visible and recoverable, not hidden.
The hard part: controlling confidence
The coordinator is powerful because it synthesizes.
That is also the danger.
A specialist might make a small inference. The coordinator can amplify it into a major recommendation.
For example, a cost agent might say:
The budget may be insufficient because licensing and infrastructure are not itemized.
A bad coordinator turns that into:
This project will cost $5M.
That is not acceptable unless the input supports it.
The fix is prompt discipline:
- distinguish facts from inference
- avoid specific estimates unless supported
- label uncertainty
- cite specialist evidence
- prefer vendor questions over made-up precision
In production, I would go further and require the coordinator to attach source finding IDs to every top concern.
{
  "concern": "Integration specifications are insufficient",
  "supporting_findings": [
    "TECH-004",
    "RISK-005",
    "COST-004",
    "TIME-003"
  ]
}
That would make the output auditable.
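The audit itself is mechanical once the IDs exist (a sketch; it assumes each specialist finding carries an ID like TECH-004):

def audit_concerns(concerns: list[dict], known_ids: set[str]) -> list[str]:
    """Return every finding ID cited by the coordinator that no specialist produced."""
    cited = {
        finding_id
        for concern in concerns
        for finding_id in concern.get("supporting_findings", [])
    }
    return sorted(cited - known_ids)

Any ID this returns is a citation the coordinator invented, which is exactly the ungrounded confidence the prompt constraints are meant to catch.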
When coordination overhead is worth it
The coordinator adds another LLM call. For short or simple documents, that might be unnecessary.
It is worth it when:
- the document is long
- findings overlap across domains
- recommendations require prioritization
- the user needs an executive summary
- raw findings are too numerous to act on
It is not worth it when:
- you only need extraction
- the schema is simple
- there is only one analytical lens
- the answer can be produced reliably in one pass
This project is in the first category. RFPs and contracts are cross-domain by nature. Technical requirements affect cost. Cost constraints affect delivery risk. Legal requirements affect architecture. Timeline compression affects everything.
That is exactly where synthesis matters.
What I learned
The coordinator is the simplest code and the most important component.
The specialists produce observations.
The coordinator produces judgment.
That is the difference between:
Here are 40 findings.
and:
Do not issue this RFP until these six issues are fixed.
Series navigation
Previous: Building Specialist LLM Agents: Technical, Risk, Cost, and Timeline Analysis
Next: What I Learned Building a Multi-Agent Document Analysis System