A while back I was building the middleware layer for a financial-services client who wanted customer data to stay consistent across three systems that had nothing in common. The core-banking system was the source of truth for customer master records and balances, and it spoke XML — heavily namespaced, full-close-tag, the kind of XML that legacy banking platforms have emitted unchanged for two decades. The CRM that the relationship managers actually worked in was a Salesforce org talking REST and JSON. And a separate wealth-management platform consumed SOAP, which is to say more XML, with its own namespace prefixes.
The flow that mattered most went like this: a relationship manager edits a customer in the CRM, Salesforce fires a JSON webhook at our Mule app, we enrich that payload by querying the core-banking system (XML response), then we write an audit record into the wealth platform (SOAP XML). That is three format conversions inside a single flow, and a fourth — CSV — showed up the moment the accounting team asked for an end-of-day export over SFTP.
What I want to share is not a syntax reference. It is the set of decisions and gotchas that DataWeave's format support actually forces on you in production, because the transformation logic is rarely the hard part. The hard part is the reader options, the writer options, and the dozen ways a payload can come out looking right and be wrong.
Reading and writing are two separate problems
The single most useful mental model is that DataWeave splits a transformation into an input side and an output side, and each carries its own options that have nothing to do with the other. The output application/json directive is a writer instruction — it controls serialization. An input payload application/csv directive is a reader instruction — it controls parsing. You can, and constantly do, read one format with one set of options and write a completely different format with another.
This matters because the defaults are opinionated. JSON output is pretty-printed by default (indent=true), which is pleasant when you are reading a payload in a log but means you pay for whitespace on every message. Leave indentation on for debugging and set indent=false for the wire. Encoding defaults to UTF-8 everywhere, which is what you want, right up until a legacy system hands you Windows-1252 and you have to say so explicitly on the reader.
The option that has burned me hardest is skipNullOn, because it is not a formatting choice — it is a semantic one. When you write JSON destined for a Salesforce PATCH, an absent field and a null field mean two different things: absent means "don't touch this," null means "clear it." We had a flow that was helpfully serializing "middleName": null and quietly wiping middle names on every sync until we set skipNullOn="objects". The reader/writer options are where the real behavior lives, so treat them as code, not decoration.
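A minimal sketch of that null policy, assuming a hypothetical CRM payload with firstName, middleName, and lastName fields (the field names are illustrative, not taken from the real flow):

```dataweave
%dw 2.0
// Writer options only: compact output, and drop keys whose value is null
// so the PATCH leaves those fields untouched instead of clearing them.
output application/json indent=false, skipNullOn="objects"
---
{
    firstName: payload.firstName,
    // Hypothetical field: when the source has no middle name this is null,
    // and skipNullOn="objects" omits the key from the serialized JSON.
    middleName: payload.middleName,
    lastName: payload.lastName
}
```

The mapping itself is identical with or without the option. Only the header line decides whether a missing middle name reaches Salesforce as an absent key or as an explicit null.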
XML is mostly a fight with namespaces and attributes
Every piece of production XML I have touched has namespaces, and namespaces are where DataWeave selectors silently fail. If the core-banking response wraps everything in a t24: prefix and you write payload.CustomerResponse.Customer, you get back null — not an error, just null — because the actual element is <t24:Customer> and your selector is looking for an unprefixed one. The fix is to declare the prefix in the header with ns t24 http://... and then select with the # syntax: payload.t24#CustomerResponse.t24#Customer. The # separates prefix from local name. Miss it on one segment of a deep path and the whole expression collapses to null.
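As a rough sketch of the namespace-qualified selector, assuming a placeholder namespace URI (the real one has to be copied from the xmlns:t24 declaration in the actual response):

```dataweave
%dw 2.0
output application/json
// Placeholder URI, assumed for illustration. It must match the
// xmlns:t24 value in the real response character for character.
ns t24 http://example.com/t24
---
{
    // Every segment of the path carries the prefix.
    customer: payload.t24#CustomerResponse.t24#Customer
}
```

The prefix you declare in the script does not have to match the prefix used in the document. What DataWeave compares is the URI behind it.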
Attributes are the second trap. In DataWeave's model every XML element is an object, and attributes live in special keys prefixed with @. So <t24:Branch code="HN001">Hoan Kiem</t24:Branch> gives you the text content directly when you select the element, but the branch code only comes back through payload...t24#Branch.@code. When an element has attributes and text and child elements all at once, the raw text node hides under the __text key. Building XML works the same way in reverse: you attach attributes with @(key: value) placed immediately after the key name, like Account @(number: "1900-12345", type: "SAVINGS"): {...}. Forget that syntax and you get a plain <Account> element with the data crammed into the body instead of where the receiving system expects it.
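Here is a hedged sketch of both directions in one script. The namespace URI is a placeholder, the assumption that Branch sits directly under Customer is mine, and the AuditRecord, BranchCode, and BranchName element names are hypothetical:

```dataweave
%dw 2.0
output application/xml
// Placeholder URI, assumed for illustration.
ns t24 http://example.com/t24
// Assumed path: Branch as a direct child of Customer.
var branch = payload.t24#CustomerResponse.t24#Customer.t24#Branch
---
AuditRecord: {
    // Writing: attributes go in @(...) right after the key name.
    Account @(number: "1900-12345", type: "SAVINGS"): {
        // Reading: .@code returns the attribute value.
        BranchCode: branch.@code,
        // Reading: the element itself yields its text content.
        BranchName: branch
    }
}
```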
Two more XML facts worth internalizing. First, XML has exactly one root: if your script returns an object with several top-level keys, the writer fails with a second-root error (the message reads along the lines of "Trying to output second root"), and the fix is to wrap everything in a single outer element. Second, writeDeclaration and inlineCloseOn exist for a reason: turn off the <?xml ?> declaration when you are dropping the result into a SOAP body that already has one, and force full close tags (<Foo></Foo> rather than <Foo/>) when the downstream parser is strict about it. Legacy core-banking parsers very often are.
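A small sketch of the single-root wrapper with the declaration switched off. The element and field names here are hypothetical stand-ins for whatever the audit record actually carries:

```dataweave
%dw 2.0
// No <?xml ?> declaration, because this fragment is meant to be
// embedded inside a SOAP body that already has one.
output application/xml writeDeclaration=false
---
// A single outer key becomes the one and only XML root element.
AuditRecord: {
    CustomerId: payload.id,
    ChangedBy: payload.lastModifiedBy
}
```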
CSV is simple until you forget everything is a string
CSV is not dead. The end-of-day SOA file the bank sent merchants was CSV over SFTP, and a separate retail client pushed a thousand-dealer billing feed the same way. It survives because it is trivial to eyeball in Excel and to diff in git, and DataWeave handles it cleanly — as long as you remember the one rule that catches everyone: after parsing, every single value is a String. The reader gives you an array of objects keyed by header name, and that creditLimit of 500000000 is text, not a number.
That means any arithmetic needs an explicit as Number, and any date logic needs as Date, applied per value. On the dealer billing job I built, the aggregation was a groupBy then pluck to roll orders up per dealer, and every quantity and unit price had to be coerced inside the map before it could be summed. Skip a coercion and DataWeave will happily string-concatenate or produce garbage totals without complaining.
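The shape of that aggregation looks roughly like this. It is a sketch, not the production script, and the column names dealerId, quantity, and unitPrice are assumed:

```dataweave
%dw 2.0
output application/json
---
// Assumed CSV columns: dealerId, quantity, unitPrice.
payload
    groupBy ((row) -> row.dealerId)
    pluck ((rows, dealerId) -> {
        dealerId: dealerId as String,
        // CSV values arrive as strings, so coerce each one
        // before multiplying and summing.
        total: sum(rows map ((row) ->
            (row.quantity as Number) * (row.unitPrice as Number)))
    })
```

groupBy turns the array of rows into an object keyed by dealer, and pluck turns that object back into an array with one entry per dealer. The coercions sit inside the innermost map so that every value is a Number before it reaches sum.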
The separator is a locale landmine too. Excel in a comma-decimal locale opens CSV with semicolons, so an export aimed at the accounting team needs separator=";" even though the source file used commas. I also set quoteValues=true on those accounting exports specifically so that numbers stay quoted and Excel does not reinterpret an account number as a float and eat the leading zeros.
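For reference, the writer side of such an export can be sketched as follows, with hypothetical column names:

```dataweave
%dw 2.0
// Writer options: semicolon-delimited, every value quoted.
output application/csv separator=";", quoteValues=true
---
// Hypothetical columns for the accounting export.
payload map ((row) -> {
    accountNumber: row.accountNumber,
    closingBalance: row.closingBalance
})
```

Note that these options describe only the file being written. If the incoming file used commas, that is a reader concern and is configured separately.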
Java is for staying inside the JVM, and class loaders will fight you
When a Mule flow calls into a Java component — a legacy banking SDK, an old library the client insists on — the payload has to be actual Java objects, not a JSON string. The output application/java directive serializes into java.util.LinkedHashMap, ArrayList, Long, BigDecimal, ZonedDateTime, and so on. That is usually enough, but when the SDK expects a specific class — a JAXB type, a Hibernate entity — you append a class hint: ... as Object {class: "com.example.dto.CustomerDTO"}. The runtime instantiates that class via its default constructor and setters, which means the POJO needs a no-arg constructor and JavaBean setters or it won't populate. The failure mode here is almost never DataWeave; it is a ClassNotFoundException because the JAR was dropped into a lib/ folder by hand instead of being declared as a proper pom.xml dependency, and parent-first versus child-first class loading does the rest.
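A minimal sketch of the class hint, assuming the DTO exposes setCustomerId and setFullName (those property names are my invention for illustration):

```dataweave
%dw 2.0
output application/java
---
{
    // Keys are matched to JavaBean properties on the target class,
    // so these assume setCustomerId(...) and setFullName(...) exist.
    customerId: payload.id,
    fullName: payload.name
} as Object {class: "com.example.dto.CustomerDTO"}
```

Without the trailing as Object {class: ...}, the same script hands the Java component a plain map rather than a typed object.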
DataWeave also reads application/multipart for file uploads — customer ID-document images coming off a mobile app, where you reach into payload.parts.<name>.content and the per-part headers — but JSON, XML, CSV, and Java are the four you live in day to day.
The pitfalls that actually show up in production
A handful of failures recur often enough that I now check for them by reflex. Silent nulls from missing XML namespaces top the list, followed by string-not-number bugs in CSV aggregation. Encoding is the third: Vietnamese names turning into mojibake because the source was Windows-1258 and the reader assumed UTF-8, or the reverse where a clean UTF-8 export gets mangled by an old Excel. I confirm the real encoding with file -I on the actual file (that is the macOS flag; on Linux it is file -i) rather than trusting anyone's description of it, and I set encoding explicitly on both sides.
The last one is memory. A reader without streaming=true materializes the whole file, and a 500MB batch will OOM the worker. Turning on streaming helps, but groupBy and orderBy force the stream to materialize anyway because they need to see every row — so on genuinely large files you either buffer deliberately mid-flow or rethink the aggregation. The general lesson: log the payload size right after the reader so production tells you when a file has outgrown your assumptions before the heap does.
If there is one principle to carry away, it is that DataWeave's format support rewards treating the reader and writer options as the real surface area of your integration. The mapping between fields is the easy, visible part. The namespaces you forgot to declare, the strings you forgot to coerce, the null policy that quietly wipes data, and the encoding nobody verified — that is where correctness actually lives, and where production will find you out.