After discovery on a typical mid-market SFMC engagement, you find out the client has subscribers spread across three systems - CRM, an e-commerce platform, and a loyalty program. Each one uses a different primary key to identify a customer. If you skip the data modeling step and start creating Data Extensions as the data arrives, you end up with DEs that can't be joined, segments that can't be built, and reports that can't be produced.
A 30-minute exercise before anyone opens the Contact Builder UI fixes this.
The setup to map first
Map what each source system actually stores, including its identifier column:
Three different primary keys. Three different customer records describing the same person.
The unifying identifier
You need one common identifier to match a person across systems. In this case, email works - it's present in all three. If email isn't common (common when SMS-first or multi-channel identity), fall back to a deterministic hash or a CDP-provided canonical ID.
In SFMC, this unified identifier becomes the Subscriber Key - one customer, one Subscriber Key, regardless of how many systems they appear in.
The master DE pattern
Don't dump everything into one giant Data Extension. Build a Master_Customer_DE that holds the identifiers + commonly-segmented attributes, and let satellite DEs hold the detail:
Satellite DEs referenced by Master:
Join from Master to satellites via AMPscript Lookup() or LookupRows() at send time, or pre-compute joins via Automation Studio SQL into a ready-to-send DE.
Two principles that save you from a rewrite
Principle 1: One Subscriber Key per human
Even if the same person is a Contact in CRM, a Customer on Shopify, and a Member in Loyalty - they get one Subscriber Key in SFMC. Otherwise the same person receives three welcome emails, one for each identity, and unsubscribing from one doesn't stop the other two.
The Subscriber Key should be stable over time. Don't use fields that change (like Shopify internal IDs that can reset on a migration). Use the most durable identifier the client controls - usually their CRM ContactID or a purpose-built customer UUID.
Principle 2: Split large DEs by function
A DE with 60 columns is a DE you can't maintain. When you inherit one, good luck knowing which attributes are still populated and which are stale.
Break by function:
- Master_Customer_DE: identifiers + 10-15 most-queried segmentation fields
- Order_History_DE: transaction detail, grows over time
- Preference_DE: channel/content preferences
- Consent_DE: opt-in records with timestamps
Join on demand instead of duplicating columns.
Common mistakes we fix on audit
- No Subscriber Key strategy - each DE uses its own primary key, join impossible. Fix: pick a Subscriber Key, retrofit.
- Email as Subscriber Key - same person with multiple addresses = multiple subscribers. Fix: use CustomerID.
- One mega-DE - 80-column monstrosity, unknown which columns are fresh. Fix: split by function, use lookups.
- Dropping raw CRM fields into DEs untouched - Contact__c_External_ID__pc as a column name in SFMC reporting is a bad day. Fix: rename on import, keep SFMC-facing names clean.
Takeaway
The data model lives on a whiteboard (or a Miro board) before any DE is created. Get the identifier story right, split DEs by function, and you avoid the single biggest source of rework on SFMC engagements. When onboarding a new client, asking "can I see your data model?" within the first week often reveals that there isn't one - which is both the problem and the opportunity.
Planning an SFMC data architecture? Our Salesforce team designs data models, DE layouts, and multi-source identity resolution on production engagements. Get in touch ->
See our full platform services for the stack we cover.