Half the production incidents we have been called in to debug on Power Platform projects trace back to the same root cause: something in the release process was skipped. A missing dependency solution. A deployment settings file with an unset environment variable. A plugin assembly built from the wrong branch. Individually these are trivial mistakes. Together, they are what separates a deploy that works from a weekend spent in recovery mode.
Here is the 12-item checklist we run before every Prod release. Eight of the items are automated; four require human judgment.
The checklist
Automated (pipeline-enforced)
- Solution builds cleanly from source. The unpacked XML in git packs into a valid solution zip. The pipeline's first job fails the release if this does not hold.
- Solution Checker passes with zero Error-level findings. Warnings are allowed with documentation in the release notes; Errors block the deploy. The pipeline enforces this as a gate.
- All tests pass. For projects with plugin unit tests or PCF tests, the test suite runs on PR and must pass before merge. The release is cut from a commit that already has green tests.
- Dependencies are on the correct version in the target environment. We maintain a dependencies.json per solution listing required dependency versions. A pre-deploy pipeline step queries the target and fails if any dependency is missing or below the required version.
- Deployment settings file exists for the target and has all required values. The file deployment-settings/prod.json is validated against the solution's environment variable definitions; missing schema names or empty required values fail the pipeline.
- Service principal authentication works against the target. A 30-second smoke test in the pipeline authenticates as the deploy SP and queries a trivial table. If auth is broken, the pipeline fails before we attempt the real import.
- Target environment is within maintenance capacity. Dataverse has API limits; a release during peak hours can get throttled. The pipeline checks current API consumption via the admin API and warns if usage is above 80% of the hourly limit.
- Rollback artifact exists. Before the pipeline imports the new managed solution, it downloads the currently installed version of the same solution and stores it as a pipeline artifact. That artifact is what you would import if you needed to roll back.
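Two of the gates above, the dependency version check and the deployment settings check, reduce to comparing two small data structures. The following is a minimal sketch, not our actual pipeline code. It assumes dependencies.json maps each dependency's unique name to a minimum version (a hypothetical shape), that the installed versions have already been queried from the target, and that the settings file uses an EnvironmentVariables list of SchemaName/Value pairs. All solution and variable names are made up for illustration.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.4.0.12' into (1, 4, 0, 12), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def check_dependencies(required: dict[str, str], installed: dict[str, str]) -> list[str]:
    """Return one message per dependency that is missing or below the minimum."""
    failures = []
    for name, minimum in required.items():
        found = installed.get(name)
        if found is None:
            failures.append(f"{name}: not installed (need >= {minimum})")
        elif parse_version(found) < parse_version(minimum):
            failures.append(f"{name}: installed {found}, need >= {minimum}")
    return failures


def check_settings(settings: dict, required_schema_names: set[str]) -> list[str]:
    """Return one message per environment variable that is missing or empty."""
    values = {
        item["SchemaName"]: item.get("Value")
        for item in settings.get("EnvironmentVariables", [])
    }
    failures = []
    for schema_name in sorted(required_schema_names):
        if schema_name not in values:
            failures.append(f"{schema_name}: missing from settings file")
        elif values[schema_name] in (None, ""):
            failures.append(f"{schema_name}: value is empty")
    return failures


# Hypothetical inputs; in a pipeline these would come from dependencies.json,
# a query against the target, and deployment-settings/prod.json.
required = {"contoso_core": "1.4.0.0", "contoso_shared": "2.0.0.0"}
installed = {"contoso_core": "1.3.9.0"}
settings = {"EnvironmentVariables": [{"SchemaName": "contoso_ApiUrl", "Value": ""}]}

problems = check_dependencies(required, installed) + check_settings(
    settings, {"contoso_ApiUrl", "contoso_TenantId"}
)
for problem in problems:
    print(problem)
```

Returning a list of messages rather than failing on the first problem means one pipeline run reports every gap at once. A pipeline step would fail the release whenever the list is non-empty.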
Human review (release reviewer signoff)
- Release notes accurately describe the change. One person writes them, a different person reads them and confirms they match the commits in the release. Mismatches caught here are far cheaper than in production.
- Breaking changes are called out and agreed. A column removal, a required-field change, a plugin behavior change - each needs explicit acknowledgment from the business stakeholder that they accept the change going live.
- A responsible human is on call for the next two hours. Not a specific name - any qualified engineer. The release window is planned; the on-call coverage is confirmed.
- Business is aware and on standby for smoke test. Someone from the business team is expected to log in within 30 minutes of the deploy completing and run a basic happy-path check. If they are not available, the release is deferred.
The pipeline structure
The eight automated checks map to pipeline stages.
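One way to picture that mapping is the hypothetical sketch below. The stage names and the grouping of items into stages are assumptions for illustration, not a copy of the real pipeline definition. The one behavior taken directly from the checklist is that item 7 only warns, while the other seven block the release.

```python
from __future__ import annotations

from typing import NamedTuple


class Gate(NamedTuple):
    item: int          # checklist item number
    stage: str         # hypothetical pipeline stage name
    description: str
    blocking: bool     # False means "warn and continue"


GATES = [
    Gate(1, "build", "Solution packs from source", True),
    Gate(2, "build", "Solution Checker has zero Error-level findings", True),
    Gate(3, "build", "Tests are green on the release commit", True),
    Gate(4, "pre_deploy", "Dependencies present at required versions", True),
    Gate(5, "pre_deploy", "Deployment settings complete", True),
    Gate(6, "pre_deploy", "Service principal auth smoke test", True),
    Gate(7, "pre_deploy", "API consumption below 80% of limit", False),
    Gate(8, "pre_deploy", "Rollback artifact stored", True),
]


def evaluate(gates: list[Gate], results: dict[int, bool]) -> tuple[bool, list[str]]:
    """Walk the gates in order; stop at the first blocking failure.

    `results` maps item number to pass/fail. A missing entry counts as a failure.
    """
    messages = []
    for gate in gates:
        if results.get(gate.item, False):
            continue
        if gate.blocking:
            messages.append(f"FAIL item {gate.item} ({gate.stage}): {gate.description}")
            return False, messages
        messages.append(f"WARN item {gate.item} ({gate.stage}): {gate.description}")
    return True, messages


# Example: everything passes except the capacity check, which only warns.
ok, notes = evaluate(GATES, {n: n != 7 for n in range(1, 9)})
print(ok, notes)
```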
The manual approval gate is a built-in Azure DevOps feature. The reviewer has to affirm each of items 9-12 in the approval comment before clicking approve. This is not fancy tooling; it is a cultural practice that prevents the "approved without reading" pattern.
What we automate next
As tooling matures, we chip away at the human-review list:
- Release notes validation via a pre-commit hook that requires a CHANGELOG entry on every PR. The reviewer at release time still reads them, but the hook ensures they exist.
- Breaking change detection via a pre-packaged tool that diffs the old and new solution XML and flags schema-breaking changes (removed columns, required-level increases).
- On-call roster integration with PagerDuty or similar, so the pipeline queries the current on-call engineer and posts the release to their chat channel.
Items 9-11 are partly automatable; item 12 is genuinely human. "Is a business user available" will always be a coordination decision.
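The breaking-change diff is the most mechanical of these. Below is a minimal sketch of the idea, assuming a deliberately simplified XML shape (entity and attribute elements with name and required attributes) rather than the real unpacked solution format. A real tool would have to read the actual customizations XML, and the three-level ranking here is also a simplification.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Simplified ordering of required levels; unknown levels are treated as "none".
REQUIRED_RANK = {"none": 0, "recommended": 1, "required": 2}


def read_columns(xml_text: str) -> dict[tuple[str, str], str]:
    """Map (table, column) to its required level for the simplified XML shape."""
    root = ET.fromstring(xml_text)
    columns = {}
    for entity in root.iter("entity"):
        for attribute in entity.iter("attribute"):
            key = (entity.get("name"), attribute.get("name"))
            columns[key] = attribute.get("required", "none")
    return columns


def find_breaking_changes(old_xml: str, new_xml: str) -> list[str]:
    """Flag removed columns and required-level increases between two versions."""
    old, new = read_columns(old_xml), read_columns(new_xml)
    findings = []
    for (table, column), old_level in sorted(old.items()):
        if (table, column) not in new:
            findings.append(f"{table}.{column}: column removed")
            continue
        new_level = new[(table, column)]
        if REQUIRED_RANK.get(new_level, 0) > REQUIRED_RANK.get(old_level, 0):
            findings.append(
                f"{table}.{column}: required level raised from {old_level} to {new_level}"
            )
    return findings


# Hypothetical before/after snapshots.
OLD = """<solution><entity name="account">
  <attribute name="new_tier" required="none"/>
  <attribute name="new_legacycode" required="none"/>
</entity></solution>"""
NEW = """<solution><entity name="account">
  <attribute name="new_tier" required="required"/>
</entity></solution>"""

for finding in find_breaking_changes(OLD, NEW):
    print(finding)
```

New columns and relaxed required levels are ignored on purpose, since they do not break existing consumers. Whatever the tool flags still goes to the business stakeholder for acknowledgment.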
The post-deploy smoke test
After the import succeeds, the pipeline runs a small set of automated checks:
- A plugin step fires correctly on a test record (the test record carries a test-mode flag, which the plugin honors only in Prod).
- A key flow triggers and completes successfully.
- A model-driven app loads without error.
These take 60-90 seconds and catch ~70% of deploys where the import reported success but something subtle broke. The remaining 30% are caught by the business smoke test in item 12, which is why the business contact still matters.
Rollback decisions
The rollback artifact from item 8 is the first option if the deploy goes wrong. The decision tree:
- Rollback is needed and immediate: reimport the previous managed solution via the pipeline's "Rollback" job. This is the cleanest path. Takes 5-15 minutes depending on solution size.
- Partial issue, unclear severity: keep the deploy, open a hotfix ticket, fix forward in the next release. Rollback imposes its own risk (uninstalling components can drop data in edge cases); a forward fix is safer when the issue is contained.
- Data corruption detected: stop. Get a second engineer. Do not roll back blindly - understand the scope first. Rolling back a managed solution on corrupted data can make the corruption worse.
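Written as code, the decision tree is short. This is only a sketch: the function name and inputs are hypothetical, and checking for data corruption first is our reading of "do not roll back blindly", not a rule stated separately above.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "reimport the previous managed solution via the Rollback job"
    FIX_FORWARD = "keep the deploy, open a hotfix ticket, fix in the next release"
    STOP_AND_ESCALATE = "stop, get a second engineer, understand the scope first"


def decide(data_corruption: bool, rollback_needed_now: bool) -> Action:
    """Pick a response to a bad deploy.

    Corruption is checked first so that it always overrides an urge to roll back.
    """
    if data_corruption:
        return Action.STOP_AND_ESCALATE
    if rollback_needed_now:
        return Action.ROLLBACK
    # Partial issue or unclear severity: a contained problem is safer to fix forward.
    return Action.FIX_FORWARD


print(decide(data_corruption=True, rollback_needed_now=True).name)
```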
Frequency of the full checklist
For standard feature releases: every deploy. No exceptions.
For urgent hotfixes: items 1-7 and 10 are non-negotiable. Items 8 (rollback artifact), 9 (release notes), 11 (on-call), 12 (business) can be compressed or skipped if the situation genuinely requires it, but the team lead documents why.
In the past year, we have skipped items on a hotfix exactly twice. Both were due to security-sensitive customer-facing bugs where speed mattered more than process. Both were followed by a retrospective that confirmed the decision; neither caused a secondary incident.
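The hotfix rule can be encoded so that the pipeline, rather than memory, decides which items may be skipped. A minimal sketch follows, with a hypothetical function name and a free-text reason standing in for the team lead's documentation.

```python
from __future__ import annotations

ALL_ITEMS = set(range(1, 13))
HOTFIX_MANDATORY = {1, 2, 3, 4, 5, 6, 7, 10}  # items 8, 9, 11, 12 may be skipped


def validate_signoff(kind: str, completed: set[int], skip_reason: str = "") -> list[str]:
    """Return the reasons a release may not proceed; an empty list means go."""
    if kind not in ("standard", "hotfix"):
        raise ValueError(f"unknown release kind: {kind}")
    required = ALL_ITEMS if kind == "standard" else HOTFIX_MANDATORY
    errors = []
    missing = sorted(required - completed)
    if missing:
        errors.append(f"{kind} release is missing required items: {missing}")
    # Skippable items exist only for hotfixes, and skipping them needs a written reason.
    skipped_optional = sorted((ALL_ITEMS - required) - completed)
    if skipped_optional and not skip_reason.strip():
        errors.append(f"items {skipped_optional} skipped without a documented reason")
    return errors


print(validate_signoff("hotfix", HOTFIX_MANDATORY))
print(validate_signoff("hotfix", HOTFIX_MANDATORY, "customer-facing security bug"))
```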
The checklist is tedious exactly as long as nothing has ever gone wrong. The first time it catches a missing dependency at 4pm on a Friday is the day the tedium is paid back in full.