Three out of every four Power Platform projects we start have the same first technical question: "How do we get the existing data in?" The answer is never "one way." Dataverse exposes at least five import mechanisms, and the right one depends on whether you are doing a one-time historical load, migrating off a legacy CRM, or setting up a recurring sync from a billing system.
Here is the matrix we run on every project.
The options, briefly
- Import from Excel (maker portal, Import Data): quick wizard that takes a CSV or XLSX. Good for one-off loads under 50,000 rows.
- Dataflows: visual ETL pipelines that refresh on a schedule or on demand. Good for recurring sync from file, SQL, SharePoint, or API sources.
- Power Platform CLI (pac data import): command-line, scriptable. Good for pipeline-integrated loads.
- SDK / Web API bulk operations: code (C# or scripting) using ExecuteMultipleRequest or batch operations. Good for high-volume, transform-heavy imports.
- Third-party tools (KingswaySoft, XrmToolBox Data Import): GUI tools with richer features than the built-in wizard.
The decision splits first on "one-off or recurring," then on volume, then on transform complexity.
Matrix
| Scenario | Row count | Frequency | Best tool |
| --- | --- | --- | --- |
| Reference data seed (countries, roles) | < 500 | Once per env setup | pac data import (scriptable, repeatable) |
| Historical account/contact load | 5,000 - 50,000 | Once | Dataflow with Excel source |
| Legacy CRM cutover | 100,000+ | Once (with rehearsals) | SDK bulk operations in code |
| Nightly sync from ERP | Any | Recurring | Dataflow on schedule, or SDK-based integration |
| Ad-hoc admin fix | < 100 | Ad-hoc | Excel template download, edit, re-import |
Pattern we use for one-off historical loads
For a 20,000-row account load from a legacy system:
1. Export to CSV from the source, one CSV per target table. Include only the columns you actually need.
2. Add a staging column on the target table named acme_legacy_id (string). It holds the source system's primary key for every imported row, and it is configured as an alternate key so it can drive upserts during the load.
3. First pass: import parent tables - accounts before contacts, so each contact's lookup to its account can resolve. The acme_legacy_id alternate key identifies the parent; contact rows carry the parent's legacy ID.
4. Second pass: import dependent tables, populating lookup columns via the legacy IDs.
5. Validate row counts and spot-check. Compare the source count to the Dataverse count; pull 20 random rows and diff them.
6. Clean up the staging columns once the load is stable (optional - keeping them is useful for audit).
The key insight is step 2 - introducing the acme_legacy_id column makes the load idempotent and re-runnable. If the load fails halfway, you can re-run it without creating duplicates, because an upsert against the alternate key updates the rows that already landed instead of inserting them again.
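In Web API terms, the idempotency falls out of addressing rows by the alternate key: a PATCH against the key address either creates or updates, never duplicates. A minimal Python sketch of that addressing - the org URL is a placeholder, the acme_legacy_id column comes from the pattern above, and authentication is omitted:

```python
import json
import urllib.parse

# Hypothetical environment URL - substitute your org's.
API = "https://yourorg.crm.dynamics.com/api/data/v9.2"

def upsert_url(table: str, key_column: str, key_value: str) -> str:
    """Address a row by its alternate key; a PATCH here is create-or-update."""
    return f"{API}/{table}({key_column}='{urllib.parse.quote(key_value)}')"

# One row from the legacy export, keyed on the staging column:
row = {"acme_legacy_id": "LEG-0042", "name": "Contoso Ltd"}
url = upsert_url("accounts", "acme_legacy_id", row["acme_legacy_id"])
payload = json.dumps({"name": row["name"]})

# With an access token in hand, the actual call would be roughly:
#   requests.patch(url, data=payload, headers={
#       "Authorization": f"Bearer {token}",
#       "Content-Type": "application/json",
#   })
```

Re-running the load replays the same PATCHes against the same key addresses, which is exactly why a half-finished load can be restarted without cleanup.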
Pattern for recurring sync from file drop
A client drops a CSV in SharePoint weekly; Dataverse should absorb it and update records.
Dataflow is the right answer here. Steps:
- Create a dataflow with the SharePoint file as source.
- Apply Power Query transforms - rename columns to match target schema, convert types, handle nulls.
- Map the dataflow output to the target Dataverse table. Pick the alternate key (usually acme_legacy_id) as the upsert key.
- Schedule refresh (hourly / daily / weekly) or trigger on file change.
Gotchas we hit:
- Dataflow transactions are batched, not row-by-row. A failure in row 500 does not roll back rows 1-499. You need to reconcile post-run.
- Source CSV schema drift silently breaks dataflows. The new column the client added on their side does not fail the refresh - it just doesn't populate. Add a column-count assertion as the first transform step.
- Delete handling. Dataflows upsert but do not delete. If the source removes a row, the target keeps it. Either add a is_active flag in the source and filter on it, or add a separate deletion reconciliation step.
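The schema-drift gotcha is cheap to guard against outside the dataflow as well. A small pre-flight sketch in Python that compares the dropped file's header to the columns the mappings were built against (the expected column list is a made-up example):

```python
import csv
import io

# Columns the dataflow's mappings were built against (hypothetical).
EXPECTED = ["acme_legacy_id", "name", "city", "is_active"]

def check_schema(csv_text: str, expected: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the header matches."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        # These are the ones a dataflow silently ignores.
        problems.append(f"unexpected columns: {extra}")
    return problems
```

Run it against the file before the refresh and fail loudly on any problem, instead of letting the refresh "succeed" with unmapped columns.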
Pattern for high-volume cutover
Legacy CRM with 300,000 accounts, 1.2 million contacts, 4 million activity records. Timeline is a single weekend.
Neither the wizard nor dataflows scale here. We write a C# migration tool using the Dataverse SDK with ExecuteMultipleRequest batching.
The framework we reuse is a thin loop: read source rows, map them to entities, batch them into ExecuteMultipleRequest calls, and record the outcome of every batch. The hardening is what actually matters:
- Parallelism: run multiple batches concurrently against separate Dataverse connections. Dataverse's concurrency limits allow 52 connections per user; tune based on throughput measurement.
- Failure logging: ContinueOnError = true lets the batch proceed past individual failures. Log the failures with the original source row so you can re-run only the failures.
- Idempotency via upsert: UpsertRequest with an alternate key means re-running the tool produces the same result, crucial for rehearsals.
- Rehearsals: run the full migration against a clone of UAT environment twice before the real cutover. Time it, measure failure rates, fix.
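Language aside (the real tool is C#), the batching-with-continue-on-error shape is easy to show. A Python sketch with a stubbed submit_batch standing in for the ExecuteMultipleRequest call - the batch size, the stub's contract, and the failure-log shape are all assumptions:

```python
from typing import Callable

BATCH_SIZE = 1000  # ExecuteMultipleRequest caps at 1,000 requests per call

def run_migration(rows: list[dict],
                  submit_batch: Callable[[list[dict]], list[int]]) -> list[dict]:
    """Submit rows in batches; submit_batch returns the indices (within
    the batch) that failed. Failures are logged together with the original
    source row so that only they need re-running - mirroring the
    ContinueOnError = true behavior described above."""
    failures = []
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        for i in submit_batch(batch):  # indices of failed rows in this batch
            failures.append({"source_index": start + i, "row": batch[i]})
    return failures
```

In the real tool each parallel worker runs this loop on its own Dataverse connection; combined with alternate-key upserts, re-submitting just the failures list is safe.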
A 2-million-row cutover we ran last year took 6 hours of wall clock time on the real run, after 3 rehearsal runs found and fixed 4 data quality issues that would have failed the migration.
Things we do not use
Dynamics 365 Data Loader (the in-app bulk import for Sales/Customer Service): limited, undocumented quirks, slow. Skip it unless you are already in the Dynamics stack for other reasons.
Manual Excel re-import in a loop. Teams sometimes do this for recurring loads - download the template, edit, upload. It is fragile (schema drift on the template), manual (no audit trail), and does not scale. Replace with a dataflow as soon as the load becomes weekly or more.
The checklist we run pre-import
On every load above 1,000 rows:
- Alternate key defined on the target for idempotency
- Source data cleaned: no null alt-key values, no duplicates, no trailing whitespace
- Schema validated: source column count and types match target
- Rehearsal against a Dev or Test copy first
- Validation queries ready (row count, spot checks, relationship integrity)
- Rollback plan: either idempotent re-run, or bulk delete criteria ready to hand
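The "source data cleaned" line item is the one worth scripting. A sketch of those three checks in Python, operating on rows already parsed into dicts (the column name and data shape are illustrative):

```python
def clean_check(rows: list[dict], alt_key: str = "acme_legacy_id") -> list[str]:
    """Flag null alt-key values, duplicate alt-keys, and trailing whitespace."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        key = row.get(alt_key)
        if key is None or key == "":
            problems.append(f"row {i}: empty {alt_key}")
            continue
        if key in seen:
            problems.append(f"row {i}: duplicate {alt_key} {key!r}")
        seen.add(key)
        for col, val in row.items():
            if isinstance(val, str) and val != val.strip():
                problems.append(f"row {i}: whitespace in {col}")
    return problems
```

An empty return is the green light; anything else goes back to whoever owns the source before the import runs.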
The checklist is short and boring. The imports that fail are the ones where two of these items were missed.