A while back I was on an integration team building the backend for a bank's mobile app. One of the first screens the product team wanted was a "Customer 360" view: open the app, and you immediately see your accounts, your investments, your loans, your insurance policies, and your cards on a single screen. Sounds simple. The catch is that every one of those data domains lives in a different system, and none of them talk to each other.
The naive first cut called each backend in turn. The core-banking system answered in about 800ms, the wealth platform took 1.2 seconds, the mortgage system 600ms, the insurance platform 900ms, the card system 500ms. Add those up and you get four seconds of staring at a spinner before anything renders. On mobile, that is a user who has already switched apps. The fix is conceptually obvious — fire all five calls at once — but the way you express "at once" in Mule 4 matters a great deal, and the patterns that look interchangeable in a demo behave very differently the first time a downstream system has a bad day.
This post is about the three parallel-execution tools I reach for in Mule 4 — Scatter-Gather, Parallel For Each, and Async — and, more importantly, the decisions and gotchas that separate a demo from something you'd put behind an SLA.
Scatter-Gather: one message, N different routes
Scatter-Gather is the right tool when you need data from several distinct systems as fast as possible. It takes the incoming message, hands a copy to each route, runs all routes concurrently on their own threads, waits for every one to finish, and then collects the results into a single Java Map keyed by route index — "0", "1", "2", and so on. For the Customer 360 flow, that means five routes, each a plain HTTP request to one backend, and the whole scope completes in roughly the time of the slowest route rather than the sum of all of them. In our case that took the screen from ~4s down to a P95 around 1.3s, which was the difference between shipping and not shipping.
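As a rough sketch, the five-route scope might look like the Mule XML below. It is only a fragment: the enclosing mule element, the namespace declarations, and the connector configurations are left out. The flow name, every config-ref, the paths, and the timeout value are hypothetical placeholders rather than the project's real ones. How the customer identifier reaches each backend is also omitted.

```xml
<!-- Fragment only: flow name, config-ref names, paths and timeout are illustrative assumptions. -->
<flow name="customer-360-flow">
  <http:listener config-ref="api-http-listener" path="/customer-360"/>

  <!-- Every route receives its own copy of the message and runs concurrently. -->
  <scatter-gather timeout="3000">
    <route> <!-- result key "0": core banking -->
      <http:request method="GET" config-ref="core-banking-api" path="/accounts"/>
    </route>
    <route> <!-- result key "1": wealth platform -->
      <http:request method="GET" config-ref="wealth-api" path="/holdings"/>
    </route>
    <route> <!-- result key "2": mortgage system -->
      <http:request method="GET" config-ref="mortgage-api" path="/loans"/>
    </route>
    <route> <!-- result key "3": insurance platform -->
      <http:request method="GET" config-ref="insurance-api" path="/policies"/>
    </route>
    <route> <!-- result key "4": card system -->
      <http:request method="GET" config-ref="cards-api" path="/cards"/>
    </route>
  </scatter-gather>

  <!-- From here on, payload is an object keyed "0" to "4";
       each value carries one route's payload and attributes. -->
</flow>
```

The comments on each route record the index that route will occupy in the result, which is the only thing tying the routes to whatever reads the output.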
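Two details about how the output is shaped trip people up. First, each route gets its own copy of the message, so there is no shared mutable payload and no race condition to reason about. Variables are isolated while the routes run, too: a variable set in one route is never visible to its siblings. What happens after the gather is less intuitive. Per the Scatter-Gather documentation, variables added or changed inside routes are merged back into the outgoing message, and a variable touched by more than one route comes back as a list of every route's value. That is a poor channel for results, so do not rely on it. Once the scope completes, your payload is that index-keyed Map, and you access results positionally, through payload."1".payload, not through some vars.something you hoped would survive intact.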
Second, the merge step is pure DataWeave and worth getting right. After the gather I transform the map into the actual 360 object the app expects, pulling each domain out by its route index — accounts from payload."0", holdings from payload."1", and so on. Because route order is fixed and deterministic, this positional access is safe; it only becomes fragile if someone reorders the routes without updating the transform, so keep them next to each other and comment the indices.
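A minimal sketch of that merge step is below, written as a Transform Message fragment. The output field names and the orEmpty helper are hypothetical. The mapping of indices "2" to "4" assumes the routes are declared in the order the backends were listed earlier (mortgage, insurance, cards). Only "0" for accounts and "1" for holdings come from the description above.

```xml
<!-- Place this directly after the scatter-gather so routes and indices stay side by side. -->
<ee:transform>
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
output application/json
// Hypothetical helper: fall back to an empty list when a route produced no payload.
fun orEmpty(value) = value default []
---
{
  accounts: orEmpty(payload."0".payload), // route 0: core banking
  holdings: orEmpty(payload."1".payload), // route 1: wealth platform
  loans:    orEmpty(payload."2".payload), // route 2: mortgage system (assumed order)
  policies: orEmpty(payload."3".payload), // route 3: insurance platform (assumed order)
  cards:    orEmpty(payload."4".payload)  // route 4: card system (assumed order)
}]]></ee:set-payload>
  </ee:message>
</ee:transform>
```

If someone inserts a new route in the middle, every index after it shifts, which is why the per-line comments are worth the clutter.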
The failure mode nobody plans for: one route takes down the whole thing
Here is the part that bites everyone exactly once. Scatter-Gather waits for all routes and has no short-circuit, which is fine — but if any single route fails, the entire scope throws a MULE:COMPOSITE_ROUTING error. So picture the insurance platform going down for maintenance. Four of your five systems answer perfectly, the user could easily see 80% of their data, and instead the API returns a 500 and a blank screen. That is almost never the behavior the business wants for a read-only aggregation view.
There are two ways to make this graceful, and I use both depending on the route. For best-effort domains, wrap the individual route's request in a Try scope with an on-error-continue that swaps the failure for a sentinel payload like { error: 'insurance-unavailable' }. Now that route "succeeds" with a marker, the gather completes normally, and the front end can decide how to render a missing section. Alternatively, or in addition, catch MULE:COMPOSITE_ROUTING around the Scatter-Gather itself. The scope has no error handler of its own, so in practice that means an enclosing Try scope or the flow's error handler. The error carries the result map of the routes that did succeed (error.errorMessage.payload.results, alongside error.errorMessage.payload.failures), so you can still assemble a partial response instead of a hard failure. In my merge transform I lean on a small helper that returns an empty list when a route's payload is null, so a degraded backend produces an empty section rather than an exception.
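Both techniques can be sketched in one fragment. It is trimmed to two routes for brevity, so the indices here are "0" and "1" rather than the full five. The config-ref names, paths, timeout, and output field names are assumptions, and catching ANY inside the route is a deliberate simplification; in practice you would probably list the specific HTTP error types you consider recoverable.

```xml
<!-- Fragment only: names, paths and the timeout value are illustrative assumptions. -->
<try>
  <scatter-gather timeout="3000">
    <route> <!-- "0": no local handler, so a failure here surfaces as MULE:COMPOSITE_ROUTING -->
      <http:request method="GET" config-ref="core-banking-api" path="/accounts"/>
    </route>
    <route> <!-- "1": best-effort route, failure is replaced by a sentinel payload -->
      <try>
        <http:request method="GET" config-ref="insurance-api" path="/policies"/>
        <error-handler>
          <on-error-continue type="ANY">
            <set-payload value="#[{ error: 'insurance-unavailable' }]"/>
          </on-error-continue>
        </error-handler>
      </try>
    </route>
  </scatter-gather>
  <error-handler>
    <!-- At least one unprotected route failed: build a partial response from the survivors. -->
    <on-error-continue type="MULE:COMPOSITE_ROUTING">
      <ee:transform>
        <ee:message>
          <ee:set-payload><![CDATA[%dw 2.0
output application/json
// Results of the routes that completed, keyed by route index.
var ok = error.errorMessage.payload.results
---
{
  accounts: ok."0".payload default [],
  policies: ok."1".payload default [],
  partial: true
}]]></ee:set-payload>
        </ee:message>
      </ee:transform>
    </on-error-continue>
  </error-handler>
</try>
```

The per-route Try keeps the decision local to the domain that is allowed to be missing, while the outer handler is the safety net for everything else.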
The other non-negotiable is the timeout. Scatter-Gather with no timeout attribute waits forever. One backend that hangs rather than fails will hold the flow open, and held flows pin HTTP listener threads, and a drained listener pool means your entire API service stops accepting requests — a single slow dependency cascading into a full outage. Always set an explicit timeout on any Scatter-Gather doing outbound HTTP, and set it comfortably below the client-facing HTTP listener timeout so you fail in a controlled way before the client gives up on you.
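A sketch of the timeout settings, with made-up numbers: the scope-level timeout caps the whole fan-out, and each request's responseTimeout sits a little below it so a hung backend fails as an ordinary HTTP timeout first. Both values are assumptions and would need to be chosen against whatever the client-facing side actually tolerates.

```xml
<!-- Illustrative values in milliseconds; config-ref names and paths are assumptions. -->
<scatter-gather timeout="3000">
  <route>
    <http:request method="GET" config-ref="core-banking-api" path="/accounts"
                  responseTimeout="2500"/>
  </route>
  <route>
    <http:request method="GET" config-ref="wealth-api" path="/holdings"
                  responseTimeout="2500"/>
  </route>
</scatter-gather>
```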
Parallel For Each is a different animal
Scatter-Gather is N different routes over one message. Parallel For Each is the opposite shape: one piece of logic applied across N items of a collection, concurrently. On the same project we needed to enrich a list of accounts pulled from core banking with real-time balances from a separate microservice. Done sequentially, fifty accounts at ~100ms each is five seconds; with Parallel For Each and a maxConcurrency of 10, it comes in around half a second.
Watch the output shape, because it differs from an ordinary For Each. A regular For Each hands the original payload back unchanged once the loop finishes; Parallel For Each builds a new array where each element wraps the payload and attributes of that iteration, so you'll typically follow it with a payload map ((item) -> item.payload) to flatten back to a clean list. Result order is preserved by input index, which is reassuring, but the timing of side effects is not. If each iteration writes to a database, the inserts happen in non-deterministic order. So never use Parallel For Each where you need auto-increment IDs assigned in sequence or an audit trail with a faithful timeline. Reach for the sequential variant there and accept the latency cost, because correctness wins.
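Put together, the balance lookup and the flattening step might look like the fragment below. The config-ref, the path, and the accountId field are hypothetical, and merging each balance back onto its account record is left out.

```xml
<!-- Assumes payload is the list of accounts, each carrying an accountId field. -->
<parallel-foreach collection="#[payload]" maxConcurrency="10">
  <!-- Inside the scope, payload is the current account. -->
  <http:request method="GET" config-ref="balance-service-api"
                path="/accounts/{accountId}/balance">
    <http:uri-params><![CDATA[#[{ accountId: payload.accountId }]]]></http:uri-params>
  </http:request>
</parallel-foreach>

<!-- The scope returns one message per item, in input order; unwrap to plain payloads. -->
<ee:transform>
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload map ((item) -> item.payload)]]></ee:set-payload>
  </ee:message>
</ee:transform>
```

Because results come back in input order, position N in the flattened list lines up with account N in the original list, which makes a later join by index straightforward.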
Async: when the client genuinely should not wait
The third pattern answers a different question entirely: not "how do I go faster," but "why is the client waiting for this at all?" Consider a payment flow. Posting the transaction to core banking is critical and must block — the user cannot get a success response before the money actually moves. But the audit-log write, the SMS confirmation, and the analytics event pushed to Kafka are all side effects the user has no reason to wait on. Wrapping those in an Async scope spins them off onto a separate thread pool; the main flow continues straight to building the response, and the client sees ~700ms even though the side effects keep churning for another second in the background. (And yes, you can nest a Scatter-Gather inside the Async to run those three side effects in parallel too.)
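A sketch of that payment flow follows. The three flow-ref targets, the config names, the paths, the timeout, and the response body are all hypothetical. Error handling inside the Async scope is deliberately left out of this fragment.

```xml
<!-- Fragment only: all names, paths and values are illustrative assumptions. -->
<flow name="payment-flow">
  <http:listener config-ref="api-http-listener" path="/payments"/>

  <!-- Critical path: the flow blocks here until core banking has posted the transaction. -->
  <http:request method="POST" config-ref="core-banking-api" path="/transactions"/>

  <!-- Side effects are dispatched with a copy of the message; the flow does not wait. -->
  <async>
    <scatter-gather timeout="5000">
      <route><flow-ref name="write-audit-log"/></route>
      <route><flow-ref name="send-sms-confirmation"/></route>
      <route><flow-ref name="publish-analytics-event"/></route>
    </scatter-gather>
  </async>

  <!-- Runs as soon as the async scope has been handed off. -->
  <set-payload value="#[output application/json --- { status: 'SUCCESS' }]"/>
</flow>
```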
Async earns its keep for audit logging, notifications, cache warm-ups, analytics, and non-critical webhooks: anything where the result does not shape the client response. But it comes with a sharp edge: an exception thrown inside an Async does not bubble up to the parent flow's error handler. It dies silently. So every Async I write wraps its work in a Try scope (or calls a flow) with its own error handler that at minimum logs at ERROR level into whatever observability stack we're using; otherwise failures vanish without a trace. And when "fire and forget" isn't good enough, when you actually need guaranteed delivery, Async is the wrong tool, because it isn't persistent and a runtime restart loses the in-flight task. That's when you publish to a persistent VM queue, Anypoint MQ, or Kafka and let a separate listener flow consume it, with a dead-letter queue and retries doing the heavy lifting.
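The two options can be sketched side by side. The first fragment logs failures from inside the Async scope; the second hands the work to a VM queue instead. The flow names, the vm-config reference, and the queue name are hypothetical. The second fragment assumes that vm-config declares payment-events as a persistent queue, and it leaves out the retry and dead-letter configuration.

```xml
<!-- Option 1 (processor inside a flow): fire and forget, but never silent. -->
<async>
  <try>
    <flow-ref name="write-audit-log"/>
    <error-handler>
      <on-error-continue type="ANY">
        <logger level="ERROR"
                message="#['Audit log side effect failed: ' ++ (error.description default 'no description')]"/>
      </on-error-continue>
    </error-handler>
  </try>
</async>

<!-- Option 2 (processor inside a flow): publish to a queue when the work must not be lost. -->
<vm:publish config-ref="vm-config" queueName="payment-events"/>

<!-- Option 2, continued (top-level flow): a separate flow consumes the queued work. -->
<flow name="payment-events-consumer">
  <vm:listener config-ref="vm-config" queueName="payment-events"/>
  <flow-ref name="write-audit-log"/>
</flow>
```

With the queue variant the publishing flow still returns quickly, but whether the side effect eventually happens is now governed by the queue and its consumer rather than by a thread that may not survive a restart.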
Closing principle
The pattern almost picks itself once you name the intent honestly. Data from several different systems, as fast as possible, is Scatter-Gather. The same logic over a list of items, in parallel, is Parallel For Each. Work the client shouldn't wait on is Async — or a real message queue when you can't afford to lose it. What separates production code from a tutorial isn't knowing which scope to drop in; it's the timeout you set before a backend hangs, the per-route try/catch you add before a dependency fails, the error handler you tuck inside the Async before something dies quietly, and the maxConcurrency you cap before a fan-out-inside-a-fan-out exhausts your thread pool. Parallelism is cheap to add and expensive to add carelessly. Decide your failure behavior first, then go fast.








