Three out of every four Power Platform projects we start have the same first technical question: "How do we get the existing data in?" The answer is never "one way." Dataverse exposes at least five import mechanisms, and the right one depends on whether you are doing a one-time historical load, migrating off a legacy CRM, or setting up a recurring sync from a billing system.
Here is the matrix we run on every project.
The options, briefly
- Import from Excel (maker portal, Import Data): quick wizard that takes a CSV or XLSX. Good for one-off loads under 50,000 rows.
- Dataflows: visual ETL pipelines, refresh on schedule or demand. Good for recurring sync from file, SQL, SharePoint, or API sources.
- Power Platform CLI (
pac data import): command-line, scriptable. Good for pipeline-integrated loads. - SDK / Web API bulk operations: code (C# or scripting) using
ExecuteMultipleRequestor batch operations. Good for high-volume, transform-heavy imports. - Third-party tools (KingswaySoft, XrmToolBox Data Import): GUI tools with richer features than the built-in wizard.
The decision splits first on "one-off or recurring," then on volume, then on transform complexity.
Matrix
| Scenario | Row count | Frequency | Best tool |
|---|---|---|---|
| Reference data seed (countries, roles) | < 500 | Once per env setup | pac data import (scriptable, repeatable) |
| Historical account/contact load | 5,000 - 50,000 | Once | Dataflow with Excel source |
| Legacy CRM cutover | 100,000+ | Once (with rehearsals) | SDK bulk operations in code |
| Nightly sync from ERP | Any | Recurring | Dataflow on schedule, or SDK-based integration |
| Ad-hoc admin fix | < 100 | Ad-hoc | Excel template download, edit, re-import |
Pattern we use for one-off historical loads
For a 20,000-row account load from a legacy system:
- Export to CSV from the source, one CSV per target table. Include only the columns needed.
- Add a staging column on the target table named
acme_legacy_id(string). This holds the source system's primary key for every imported row. Becomes the alternate key for the load. - First pass: import parent tables. Accounts before contacts (so contacts' lookup-to-account can resolve). Use the
acme_legacy_idas the alternate key on the parent; contacts reference the parent's legacy ID. - Second pass: import dependent tables with lookup columns populated using the legacy IDs.
- Validate row counts and spot-check. Compare source count to Dataverse count; pull 20 random rows and diff.
- Clean up staging columns after the load is stable (optional - keeping them is useful for audit).
The key insight is step 2 - introducing a legacy_id column so the load is idempotent and re-runnable. If the load fails halfway, you can re-run without creating duplicates, because the alternate key blocks them.
Pattern for recurring sync from file drop
A client drops a CSV in SharePoint weekly; Dataverse should absorb it and update records.
Dataflow is the right answer here. Steps:
- Create a dataflow with the SharePoint file as source.
- Apply Power Query transforms - rename columns to match target schema, convert types, handle nulls.
- Map the dataflow output to the target Dataverse table. Pick the alternate key (usually
acme_legacy_id) as the upsert key. - Schedule refresh (hourly / daily / weekly) or trigger on file change.
Gotchas we hit:
- Dataflow transactions are batched, not row-by-row. A failure in row 500 does not roll back rows 1-499. You need to reconcile post-run.
- Source CSV schema drift silently breaks dataflows. The new column the client added on their side does not fail the refresh - it just doesn't populate. Add a column-count assertion as the first transform step.
- Delete handling. Dataflows upsert but do not delete. If the source removes a row, the target keeps it. Either add a
is_activeflag in the source and filter on it, or add a separate deletion reconciliation step.
Pattern for high-volume cutover
Legacy CRM with 300,000 accounts, 1.2 million contacts, 4 million activity records. Timeline is a single weekend.
Neither the wizard nor dataflows scale here. We write a C# migration tool using the Dataverse SDK with ExecuteMultipleRequest batching.
Framework we reuse:
var batchSize = 500; // Dataverse hard limit is 1000 per ExecuteMultipleRequest
var settings = new ExecuteMultipleSettings {
ContinueOnError = true,
ReturnResponses = false
};
foreach (var batch in accounts.Chunk(batchSize)) {
var request = new ExecuteMultipleRequest {
Settings = settings,
Requests = new OrganizationRequestCollection()
};
foreach (var account in batch) {
request.Requests.Add(new UpsertRequest {
Target = BuildEntity("account", account)
});
}
var response = (ExecuteMultipleResponse)service.Execute(request);
LogFailures(response, batch);
}
Hardening that matters:
- Parallelism: run multiple batches concurrently against separate Dataverse connections. Dataverse's concurrency limits allow 52 connections per user; tune based on throughput measurement.
- Failure logging:
ContinueOnError = truelets the batch proceed past individual failures. Log the failures with the original source row so you can re-run only the failures. - Idempotency via upsert:
UpsertRequestwith an alternate key means re-running the tool produces the same result, crucial for rehearsals. - Rehearsals: run the full migration against a clone of UAT environment twice before the real cutover. Time it, measure failure rates, fix.
A 2-million-row cutover we ran last year took 6 hours of wall clock time on the real run, after 3 rehearsal runs found and fixed 4 data quality issues that would have failed the migration.
Things we do not use
Dynamics 365 Data Loader (the in-app bulk import for Sales/Customer Service): limited, undocumented quirks, slow. Skip it unless you are already in the Dynamics stack for other reasons.
Manual Excel re-import in a loop. Teams sometimes do this for recurring loads - download the template, edit, upload. It is fragile (schema drift on the template), manual (no audit trail), and does not scale. Replace with a dataflow as soon as the load becomes weekly or more.
The checklist we run pre-import
On every load above 1,000 rows:
- Alternate key defined on the target for idempotency
- Source data cleaned: no null alt-key values, no duplicates, no trailing whitespace
- Schema validated: source column count and types match target
- Rehearsal against a Dev or Test copy first
- Validation queries ready (row count, spot checks, relationship integrity)
- Rollback plan: either idempotent re-run, or bulk delete criteria ready to hand
The checklist is short and boring. The imports that fail are the ones where two of these items were missed.








