SapotaCorp

DataWeave fundamentals: header, body, and your first real transform

A core-banking system speaks XML over a message queue; the mobile team wants clean JSON. DataWeave is the layer that does the translating, and most of the pain comes from misunderstanding three things: where the header ends, why type coercion fails silently, and how a single expression has to carry the whole transform. Here is what I wish I'd internalized before my first production mapping.

Key takeaways

  • Every DataWeave script is a header of directives plus exactly one body expression separated by `---`, so when you need many output fields you build a single object or array literal rather than a sequence of statements.
  • DataWeave will not turn a String field into a Number or Date in the output on its own; XML and CSV text arrives as String, and forgetting `as Number` or `as Date {format: ...}` is the most common reason a downstream system rejects an otherwise correct-looking payload.
  • Coercion is a failure point on real data: an empty element coerced with `as Number` does not become a usable number (in DataWeave 2.x the cast typically fails with a "Cannot coerce" error), so guard optional inputs with `default` before casting to keep bad records from reaching the target system.
  • Once the basics click, real transforms are just `filter` then `map` over arrays, and ordering `filter` before `map` matters at batch scale because you avoid transforming records you are about to discard.

A while back I was sitting between two systems that genuinely could not talk to each other. On one side was a core-banking platform — the kind of legacy system that has run for two decades and exposes account data as XML pushed over a message queue. On the other side was a mobile team building a banking app in React Native, and they wanted JSON over a plain REST API. Nobody was going to rewrite the core. Nobody was going to make the app parse XML. The integration layer had to absorb the entire impedance mismatch, and inside that layer the actual translation work falls to DataWeave.

If you are coming to MuleSoft from a general-purpose language, DataWeave looks deceptively familiar and then surprises you in three specific places. It is a functional, declarative transformation language — there is no for loop, no return, no mutating a variable in place. A script is split into a header and a body, and the entire body is a single expression. Once those two facts sink in, most of the confusion disappears. This post is the version of the fundamentals I wish someone had handed me before my first production mapping, built around that bank-to-mobile scenario.

The job, concretely: the core system answers a balance query with an XML document wrapping a transaction header and an account-info block, and the mobile app expects a nested JSON object with txnId, a timestamp, and an account sub-object carrying a balance breakdown and a boolean active flag. That is the transform we are going to build.

The header is metadata, and it produces nothing

Every DataWeave script is divided by a line of three dashes (---). Everything above it is the header; everything below is the body. The header declares metadata and, crucially, emits no output of its own. The two directives you should always write are the version and the output type (the input line shown with them is optional):

%dw 2.0
output application/json
input payload application/xml
---

%dw 2.0 pins the language version: 2.0 is the Mule 4 line, and the 1.x dialect from Mule 3 is a different language you can mostly ignore now. output application/json tells the writer what MIME type to serialize to, and this single line is what makes DataWeave format-agnostic: change it to application/xml or application/csv and the same body is serialized in that format, provided its structure fits (XML needs a single root element, CSV an array of flat objects). The input payload application/xml line is optional, because the runtime normally takes the payload format from the message's MIME type, but I keep it in because it gives the Studio editor enough information to autocomplete the selector paths, which is worth a lot when you are navigating a deep XML tree.

The header is also where you declare reusable pieces: var for local values, fun for functions, import for modules from dw::core, and ns for XML namespaces. You do not have to use any of them, but pulling logic up into a named fun is the single biggest readability win once a script crosses about a hundred lines.
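As a rough illustration of those header declarations, here is a minimal sketch. The namespace URI, the variable sourceSystem, and the helper toAmount are hypothetical names chosen for this example; capitalize is imported from the standard dw::core::Strings module, and the selector paths assume the T24Response document from the bank-to-mobile scenario.

```dataweave
%dw 2.0
output application/json
// Import a single function from a standard module
import capitalize from dw::core::Strings
// Hypothetical namespace prefix (declared for illustration, not used below)
ns t24 http://example.com/t24
// A constant that several fields could share
var sourceSystem = "T24"
// Hypothetical helper: fall back to "0" when the field is null, then cast
fun toAmount(raw) = (raw default "0") as Number
---
{
  source: sourceSystem,
  holder: capitalize(payload.T24Response.AccountInfo.CustomerName default ""),
  total: toAmount(payload.T24Response.AccountInfo.Balance)
}
```

Nothing above the --- appears in the output; the header only supplies names that the single body expression can use.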

The body is one expression — design around that

Below the --- you get exactly one expression. Not a sequence of statements, no semicolons, no explicit return. This trips people up immediately, because the natural instinct is to "assign txnId, then assign timestamp, then…". DataWeave does not work that way. When you need to emit many fields, you express them as a single object or array literal, and that literal is the one expression:

%dw 2.0
output application/json
input payload application/xml
---
{
  txnId: payload.T24Response.Header.TransactionId,
  timestamp: payload.T24Response.Header.Timestamp,
  account: {
    number: payload.T24Response.AccountInfo.AccountNumber,
    holder: payload.T24Response.AccountInfo.CustomerName,
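    balance: {
      currency: payload.T24Response.AccountInfo.Currency,
      // XML text arrives as String, so cast explicitly to emit JSON numbers
      total: payload.T24Response.AccountInfo.Balance as Number,
      available: payload.T24Response.AccountInfo.AvailableBalance as Number
    },
    // The comparison yields a Boolean: true only when Status is exactly "ACTIVE"
    active: payload.T24Response.AccountInfo.Status == "ACTIVE"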
  }
}

A few things are worth pointing out. The dotted paths like payload.T24Response.Header.TransactionId are selectors that walk into the parsed XML, starting from the root element. DataWeave gives you a small selector vocabulary that pays off constantly: .field for a single value, .*field to pull every sibling with the same name into an array, ..field to search descendants at any depth, and .@attr to read an XML attribute. When the core system returns repeating <order> elements, payload.orders.*order hands you a clean array without any looping construct.
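The four selectors can be seen side by side in a small sketch. It assumes a hypothetical XML input whose root element orders carries a currency attribute and contains repeating order elements, each with an id child; none of those names come from the core system described here.

```dataweave
%dw 2.0
output application/json
---
{
  // .field: the id of the first <order> only
  firstId: payload.orders.order.id,
  // .*field: every <order> sibling collected into an array
  allOrders: payload.orders.*order,
  // ..field: every <id> found at any depth below <orders>
  everyId: payload.orders..id,
  // .@attr: the currency attribute on the <orders> element
  currency: payload.orders.@currency
}
```

The distinction that matters most in practice is between the first two: a plain .order on a repeating element quietly returns one match, while .*order returns all of them.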

The body also supports tricks that keep output honest. A conditional field, written (age: payload.age) if (payload.age != null), omits the key entirely when the condition is false, which is genuinely different from emitting age: null. Wrapping an existing object in parentheses inside an object literal, as in { (payload.user), source: "T24" }, merges that object's fields into the new one; DataWeave has no JavaScript-style ... spread operator, and ++ does the same job between two objects. And a dynamic key needs parentheses, (keyName): value, otherwise DataWeave treats the name as a literal string rather than evaluating it.
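A minimal sketch of those three object-building tricks together, using inline sample values so it does not depend on any payload. The variables user, age, and keyName and their contents are made up for illustration. Merging is written here by wrapping the object in parentheses, which is the DataWeave 2.0 form I am confident of.

```dataweave
%dw 2.0
output application/json
// Hypothetical sample values
var user = { name: "An", tier: "gold" }
var age = null
var keyName = "segment"
---
{
  // Parentheses around an object merge its fields into this one
  (user),
  // Conditional field: the key is omitted because age is null
  (age: age) if (age != null),
  // Dynamic key: the variable's value, not the word keyName, becomes the key
  (keyName): "retail"
}
```

With these values the result should carry name, tier, and segment, and no age key at all.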

Type coercion is explicit, and that is on purpose

Here is the part that bites everyone at least once. DataWeave does not convert a field to the type your consumer expects. XML text content arrives as String. CSV cells arrive as String. If the mobile app expects total to be a number and you forget the cast, the output is the string "15750000.00", the app's JSON parser hands the UI a string, and someone files a bug about formatting that has nothing to do with formatting. That is why Balance as Number and AvailableBalance as Number appear in the body above: the as operator is the conversion mechanism you will use almost everywhere.

Coercion takes options, and you will reach for them constantly in financial work. 1234567.89 as String {format: "#,###.00"} produces a thousands-separated display string, and adding locale: "vi-VN" flips the separators so thousands use . and the decimal uses , which matters enormously when you are rendering currency for a specific market. Dates are the same story: the core system might hand you "06/03/2026" in day-month-year order, and as Date with no format throws because it assumes ISO 8601. You have to spell it out: as Date {format: "dd/MM/yyyy"}. Get the format pattern wrong, say MM/dd/yyyy, and whenever the day is 12 or less you do not get an error: you get the sixth of March silently turned into the third of June.
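The formatting and date cases can be sketched as follows. The variable rawDate is a stand-in for the value the core system would send. Each parsed date is converted back to an ISO string explicitly, on the assumption that this makes the difference between the two patterns easiest to see in the output.

```dataweave
%dw 2.0
output application/json
// Stand-in for a day-month-year date string from the source system
var rawDate = "06/03/2026"
---
{
  // Thousands-separated display string
  display: 1234567.89 as String {format: "#,###.00"},
  // Correct pattern: day first, so this is 6 March 2026
  valueDate: (rawDate as Date {format: "dd/MM/yyyy"}) as String {format: "yyyy-MM-dd"},
  // Wrong pattern: parses without complaint, but as 3 June 2026
  swapped: (rawDate as Date {format: "MM/dd/yyyy"}) as String {format: "yyyy-MM-dd"}
}
```

Both date fields evaluate without an error, which is exactly why a swapped pattern tends to be caught by a reconciliation report rather than by the integration itself.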

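The genuinely dangerous failure is the empty value. The core system occasionally returns an empty <Balance></Balance> for a freshly opened account. Depending on the XML reader settings, that element reaches the script as null or as an empty string, and neither converts to a usable number: in DataWeave 2.x an unguarded as Number on it typically fails with a "Cannot coerce" error and takes the whole message down, while an uncast field quietly passes null into the target system and breaks something downstream. The defensive habit is to supply a fallback before casting: (payload.T24Response.AccountInfo.Balance default "0") as Number. Note that default only replaces null, so an empty string needs an explicit isBlank or isEmpty check. The default operator is your null-safety seatbelt throughout DataWeave, and on real-world inputs you want it on every coercion that touches an optional field.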
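The guard described above can be sketched as a small helper. The function name toNumberOr and the variable info are hypothetical; isBlank is a built-in core function that returns true for null, empty, and whitespace-only text. Because default only replaces null, the helper treats an empty string as missing too, and it assumes a fallback of 0 is acceptable for the target system.

```dataweave
%dw 2.0
output application/json
// Hypothetical helper: use the fallback for null, empty, or whitespace-only
// text, otherwise cast the text to Number
fun toNumberOr(raw: String | Null, fallback: Number): Number =
  if (isBlank(raw)) fallback else (raw as Number)
var info = payload.T24Response.AccountInfo
---
{
  // Covers both a null and an empty <Balance></Balance>
  total: toNumberOr(info.Balance, 0),
  // The shorter form from the text: covers null only
  available: (info.AvailableBalance default "0") as Number
}
```

Whether a silent 0 is the right fallback is a business decision; for some fields it is safer to let the cast fail and route the record to an error queue.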

Real transforms are filter, then map

Single-record mapping is the warm-up. The work that actually fills your day is reshaping arrays, and almost all of it is two operators. map walks an array and transforms each element, always returning an array of the same length; its lambda receives the item and an index, or you can use the terse $ and $$ placeholders. filter walks an array and keeps only the elements whose predicate returns true, so the result is the same length or shorter.
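A minimal sketch of both operators on an inline array, so it runs without any payload. The variable balances and its values are made up for illustration.

```dataweave
%dw 2.0
output application/json
// Hypothetical sample data
var balances = [1200, 0, 560]
---
{
  // Named lambda parameters: item first, index second
  named: balances map ((item, index) -> { position: index, amount: item }),
  // Same transform with the anonymous placeholders: $ is the item, $$ the index
  terse: balances map { position: $$, amount: $ },
  // filter keeps only the elements whose predicate is true
  nonZero: balances filter ((item) -> item > 0)
}
```

The two map forms produce the same three objects, and nonZero keeps 1200 and 560. The named form is usually easier to read once lambdas nest.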

Take a nightly batch that syncs accounts from the core system into Salesforce Financial Services Cloud. The source uses snake_case keys and numeric ISO currency codes (704 for VND, 840 for USD); Salesforce wants its FinServ__ field names and alphabetic currency codes. A lookup table in a var plus a map covers the whole translation:

%dw 2.0
output application/json
var currencyLookup = { "704": "VND", "840": "USD", "978": "EUR" }
var statusLookup = { "A": "Active", "C": "Closed", "D": "Dormant" }
---
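// Keep only active accounts first, so map runs on records we actually send
payload.accounts
  filter ((acc) -> acc.status_code == "A")
  map ((acc) -> {
    FinServ__FinancialAccountNumber__c: acc.account_no,
    FinServ__PrimaryOwner__c: acc.customer_id,
    // Lookup keys are strings, so this assumes currency_code arrives as a String;
    // unknown codes fall back to "VND" instead of emitting null
    CurrencyIsoCode: currencyLookup[acc.currency_code] default "VND",
    FinServ__Balance__c: acc.balance as Number,
    Status__c: statusLookup[acc.status_code] default "Unknown",
    // Stable external key: the same account always yields the same ID
    ExternalId__c: "T24-" ++ acc.account_no
  })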

Two decisions in there are deliberate. The inline lookup tables mean you are not making an external call per record just to translate a code, and the default on each lookup keeps a stray unknown code from dropping a raw null into Salesforce. The ordering, filter before map, is not cosmetic either. With a hundred records it is irrelevant, but the same pipeline runs over six-figure batches, and transforming twenty thousand records only to throw them away afterward is exactly the kind of waste that shows up in your nightly job's runtime. The ++ operator builds ExternalId__c by prefixing the account number with "T24-"; because that value is the same on every run, a Salesforce upsert keyed on it stays idempotent across reruns. When the source nests accounts under customers, flatMap collapses the one-to-many into a single flat array in one pass instead of a map followed by a separate flatten.
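The nested case can be sketched like this. It assumes a hypothetical input where payload.customers is an array and each customer has a customer_id and an optional accounts array; those field names are illustrative and not taken from the batch described above.

```dataweave
%dw 2.0
output application/json
---
// One output row per account, with the parent's ID copied onto each row
payload.customers flatMap ((cust) ->
  // default [] keeps a customer with no accounts from contributing a null
  (cust.accounts default []) map ((acc) -> {
    customerId: cust.customer_id,
    accountNo: acc.account_no
  })
)
```

The inner map still returns one array per customer; flatMap is what joins those arrays into the single flat list a bulk upsert expects.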

The throughline across all of this is that DataWeave rewards being explicit. It will not guess that your string is really a number, it will not invent a missing field, and it will not loop behind your back. That feels like friction the first week and like a safety rail every week after, because the bugs it forces you to confront at design time are precisely the ones that would otherwise surface as a corrupted record in production at six in the morning. Learn the header-body split, internalize the single-expression rule, and treat every coercion as a place a real payload might be empty — do that, and the rest of DataWeave is just composition.

