Ask most teams what their business-continuity plan for a Power Platform solution is, and the answer is some version of "it's in the cloud, Microsoft handles it." That answer is half right, which is what makes it dangerous. Microsoft genuinely does handle the part it owns: the data centres, the hardware, the platform's infrastructure resilience, the regional availability of the service. What Microsoft does not handle is the category of failure that actually takes solutions down in practice, and that category is entirely yours. A deployment that breaks production, an environment deleted by mistake, an integration that goes down and strands a process, a data load that overwrites good records with bad ones: none of these is an infrastructure failure, and none of them is Microsoft's to recover.
So the architect's job on continuity is to find the line between what the platform covers and what the solution owns, and to design for the solution side, because that is the side where the real outages live. "It's in the cloud" is a statement about infrastructure. Business continuity is a statement about the solution, and the two are not the same thing.
What Microsoft owns, and why it is not enough
It is worth being clear about what you genuinely get for free, because it is real and it is the foundation. Microsoft runs the infrastructure with the resilience you would expect from a major cloud: hardware failure, data-centre issues, and the underlying platform's availability are Microsoft's responsibility, handled at a scale and reliability no individual project would build itself. For the infrastructure layer, the cloud assumption is correct, and you should not be designing your own data-centre failover.
The trap is generalising that to "continuity is handled," because infrastructure resilience does not protect you from anything that happens at the solution level. Microsoft keeping the service running does not undo a deployment that introduced a breaking change, does not restore an environment someone removed, does not fix an integration whose downstream system is down, and does not roll back a bad data import. Those are the failures that take a business process offline, and they sit squarely on the customer side of the shared-responsibility line. Recognising that line is the whole starting point: the platform is resilient, but your solution's continuity is still yours to design.
Design for the failures you actually own
Once you accept that the real risks are solution-level, continuity becomes a concrete design exercise around a handful of failure types.
A bad deployment. The most common self-inflicted outage is a change that breaks production. Continuity here is ALM (application lifecycle management) discipline: solutions are versioned and deployed through a pipeline, with the ability to roll back to the last known-good version rather than scrambling to hand-fix production. If your only path after a bad deployment is forward, you do not have continuity; you have hope. A tested rollback path is the answer.
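As an illustration of what "the ability to roll back to the last known-good version" requires, the sketch below keeps a record of what the pipeline deployed and which releases passed post-deployment checks, then picks the rollback target. It is a minimal Python model and not a Power Platform API. `Release`, `ReleaseHistory`, and the `ContosoSales` file names are hypothetical. How the redeploy is actually executed, and whether your tooling lets you re-import an older solution version directly, depends on your pipeline and is deliberately not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    """One production deployment of a solution (hypothetical model)."""
    version: str
    artifact: str             # the exported solution file the pipeline deployed
    known_good: bool = False  # set only after post-deployment checks pass


@dataclass
class ReleaseHistory:
    """What the pipeline has deployed to production, oldest first."""
    releases: List[Release] = field(default_factory=list)

    def record(self, release: Release) -> None:
        self.releases.append(release)

    def mark_known_good(self, version: str) -> None:
        for release in self.releases:
            if release.version == version:
                release.known_good = True
                return
        raise KeyError(f"no release recorded for version {version}")

    def rollback_target(self, failed_version: str) -> Optional[Release]:
        """Most recent known-good release deployed before the failed one.

        None means there is no tested way back: the only path is forward.
        """
        versions = [r.version for r in self.releases]
        failed_index = versions.index(failed_version)  # ValueError if never deployed
        for release in reversed(self.releases[:failed_index]):
            if release.known_good:
                return release
        return None


history = ReleaseHistory()
history.record(Release("1.0.0.0", "ContosoSales_1_0_0_0_managed.zip"))
history.mark_known_good("1.0.0.0")
history.record(Release("1.1.0.0", "ContosoSales_1_1_0_0_managed.zip"))
history.mark_known_good("1.1.0.0")
history.record(Release("1.2.0.0", "ContosoSales_1_2_0_0_managed.zip"))  # broke production

target = history.rollback_target("1.2.0.0")
print(target.artifact if target else "no known-good release: roll forward only")
# prints: ContosoSales_1_1_0_0_managed.zip
```

A `None` result is the "only path is forward" case. Nothing earlier was ever confirmed good, so there is nothing tested to go back to.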
A lost or corrupted environment. Environments can be deleted and the data within them corrupted, and the answer to both is backup and restore. That means knowing what backups exist for your environments and Dataverse, both the system-managed ones and any you take deliberately. Crucially, it also means having tested that a restore actually works and how long it takes. An untested backup is a guess.
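A restore test is only useful if it produces the number you will later compare against your recovery targets: how long it took until the data was usable. The sketch below assumes two hypothetical callables that you supply. `restore` stands for however you actually perform the restore, ideally into a non-production environment. `verify` stands for whatever check tells you the restored data is usable. Neither is a real Power Platform function.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RestoreDrillResult:
    succeeded: bool
    elapsed_seconds: float
    within_rto: bool


def run_restore_drill(
    restore: Callable[[], None],
    verify: Callable[[], bool],
    rto_seconds: float,
) -> RestoreDrillResult:
    """Restore into a non-production target, confirm the data is usable, and time it.

    `restore` and `verify` are hypothetical placeholders for however your team
    actually performs the restore and checks the result.
    """
    start = time.monotonic()
    try:
        restore()
        succeeded = verify()  # the clock keeps running: recovery ends when data is usable
    except Exception:
        succeeded = False     # a restore that errors out counts as a failed drill
    elapsed = time.monotonic() - start
    return RestoreDrillResult(
        succeeded=succeeded,
        elapsed_seconds=elapsed,
        within_rto=succeeded and elapsed <= rto_seconds,
    )


def failing_restore() -> None:
    raise RuntimeError("backup not found for the selected point in time")


ok = run_restore_drill(restore=lambda: None, verify=lambda: True, rto_seconds=4 * 3600)
bad = run_restore_drill(restore=failing_restore, verify=lambda: True, rto_seconds=4 * 3600)
print(ok.within_rto, bad.succeeded)  # prints: True False
```

The timing includes verification on purpose. The business is not back when the restore job reports completion; it is back when the data has been checked.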
A dependency failure. Power Platform solutions usually depend on other systems through integrations, and those systems fail or slow down. Continuity means designing for that: what the solution does when an integration is unavailable, whether it queues, degrades gracefully, or alerts, rather than silently breaking a process. A solution that assumes its dependencies are always up will go down whenever they do.
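The "queue, degrade gracefully, or alert" choice can be made explicit rather than left to chance. Below is a minimal Python sketch of the queue-and-alert option. It assumes a hypothetical integration client that raises `DependencyUnavailable` when the downstream system is down. In a real solution the same pattern would live in whatever runs the integration, and every name here is illustrative only.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Message = Dict[str, Any]


class DependencyUnavailable(Exception):
    """Raised by the (hypothetical) integration client when the downstream system is down."""


class ResilientSender:
    """Send to a downstream system; if it is down, queue and alert instead of dropping."""

    def __init__(self, send: Callable[[Message], None], alert: Callable[[str], None]):
        self._send = send
        self._alert = alert
        self.pending: Deque[Message] = deque()

    def submit(self, message: Message) -> str:
        if self.pending:
            # Preserve ordering: never overtake messages already waiting.
            self.pending.append(message)
            return "queued"
        try:
            self._send(message)
            return "sent"
        except DependencyUnavailable as exc:
            self.pending.append(message)
            self._alert(f"downstream unavailable ({exc}); {len(self.pending)} message(s) queued")
            return "queued"

    def drain(self) -> int:
        """Retry queued messages oldest first; stop at the first failure."""
        sent = 0
        while self.pending:
            try:
                self._send(self.pending[0])
            except DependencyUnavailable:
                break
            self.pending.popleft()
            sent += 1
        return sent


downstream_up = False
delivered = []
alerts = []


def send_to_erp(message: Message) -> None:
    # Stand-in for a real integration call to a hypothetical ERP system.
    if not downstream_up:
        raise DependencyUnavailable("ERP endpoint timed out")
    delivered.append(message)


sender = ResilientSender(send_to_erp, alerts.append)
print(sender.submit({"order": 1}))  # prints: queued
downstream_up = True
print(sender.drain(), delivered)    # prints: 1 [{'order': 1}]
```

New messages queue behind any that are already waiting, so a recovered downstream system receives them in the original order. Whether ordering matters is a decision to make per integration.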
A bad data load. Imports and migrations can write wrong or duplicate data over good data, and recovering means being able to get back to the pre-load state, which again comes back to backup and to running risky loads in a way you can reverse. We have seen test migrations reveal exactly this risk before production; the recovery plan is what makes the risk survivable.
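One way to make a risky load reversible is to capture the pre-load state of exactly the rows it will touch. The sketch below models a table as an in-memory dictionary keyed by primary key, with flat records. `apply_load` and `rollback_load` are hypothetical helpers, not Dataverse operations, and the account data is invented for illustration. The sketch assumes nothing else writes to those rows between the load and the rollback, which is why it complements a backup rather than replacing one.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]
Table = Dict[str, Record]             # primary key -> flat record
Snapshot = Dict[str, Optional[Record]]


def apply_load(table: Table, incoming: Table) -> Snapshot:
    """Upsert incoming rows, returning the pre-load state of every key touched."""
    snapshot: Snapshot = {}
    for key, row in incoming.items():
        # None marks a row the load created, so rollback knows to delete it.
        snapshot[key] = dict(table[key]) if key in table else None
        table[key] = dict(row)
    return snapshot


def rollback_load(table: Table, snapshot: Snapshot) -> None:
    """Put every touched key back exactly as it was before the load."""
    for key, before in snapshot.items():
        if before is None:
            table.pop(key, None)
        else:
            table[key] = dict(before)


accounts: Table = {"ACC-1": {"name": "Contoso", "credit_limit": 50000}}
original = {k: dict(v) for k, v in accounts.items()}

bad_load = {
    "ACC-1": {"name": "Contoso", "credit_limit": 0},      # overwrites a good value
    "ACC-2": {"name": "Contoso Ltd", "credit_limit": 0},  # likely duplicate
}
snapshot = apply_load(accounts, bad_load)
rollback_load(accounts, snapshot)
print(accounts == original)  # prints: True
```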
Each of these is a failure you own, and each has a concrete continuity measure. Designing for them is what turns "it's in the cloud" into an actual plan.
Frame it with RTO and RPO
The way to make these decisions concrete rather than vague is the standard continuity framing: recovery time objective (RTO) and recovery point objective (RPO). RTO is how long this solution can be down before it hurts the business. RPO is how much data, measured as time since the last recoverable point, the business can afford to lose if you have to restore. Those two numbers, asked per solution, turn continuity from a checkbox into a design.
A solution that can be down for a day and lose a few hours of data needs far less than one that must be back in minutes with no data loss, and pretending every solution is one or the other is how you either over-engineer trivial apps or under-protect critical ones. The architect captures RTO and RPO as part of requirements, the same way as any other non-functional requirement, and designs the backup frequency, the rollback path, and the dependency handling to meet them. Without those numbers, "business continuity" is an aspiration; with them, it is a specification you can build and test against.
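The comparison the architect ends up making can be written down directly. The sketch below makes several assumptions:

- RTO and RPO are captured in minutes per solution.
- Worst-case data loss equals the interval between recoverable points, which is a simplification.
- Restore and rollback times come from drills rather than estimates.

All class and field names are hypothetical, and the "Claims intake" figures are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContinuityRequirement:
    solution: str
    rto_minutes: float  # longest tolerable outage
    rpo_minutes: float  # most data loss tolerable, as time since the last recoverable point


@dataclass
class ContinuityDesign:
    backup_interval_minutes: float  # time between recoverable points
    restore_minutes: float          # measured in a restore drill, not estimated
    rollback_minutes: float         # measured time to redeploy the last known-good release


def continuity_gaps(req: ContinuityRequirement, design: ContinuityDesign) -> List[str]:
    """List every way the design fails to meet the stated RTO and RPO."""
    gaps = []
    # Worst case: the failure lands just before the next backup would have run.
    if design.backup_interval_minutes > req.rpo_minutes:
        gaps.append(
            f"{req.solution}: backups every {design.backup_interval_minutes} min "
            f"exceed the RPO of {req.rpo_minutes} min"
        )
    if design.restore_minutes > req.rto_minutes:
        gaps.append(
            f"{req.solution}: restore takes {design.restore_minutes} min, "
            f"RTO is {req.rto_minutes} min"
        )
    if design.rollback_minutes > req.rto_minutes:
        gaps.append(
            f"{req.solution}: rollback takes {design.rollback_minutes} min, "
            f"RTO is {req.rto_minutes} min"
        )
    return gaps


# A critical solution: back within 2 hours, lose at most 15 minutes of data.
claims = ContinuityRequirement("Claims intake", rto_minutes=120, rpo_minutes=15)
design = ContinuityDesign(backup_interval_minutes=24 * 60, restore_minutes=90, rollback_minutes=30)
for gap in continuity_gaps(claims, design):
    print(gap)
# prints: Claims intake: backups every 1440 min exceed the RPO of 15 min
```

Here the design fails on RPO, because daily backups cannot meet a 15-minute RPO. Either the requirement or the design has to change, and that conversation belongs before the outage, not during it.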
The plan an architect should be able to state
A Power Platform solution with real continuity has answers, not assumptions. It knows what Microsoft covers at the infrastructure level and does not rely on that for solution-level recovery. It has an ALM pipeline with a tested rollback for bad deployments, a known and tested backup-and-restore story for environments and Dataverse, a defined behaviour when integrations fail rather than a silent break, and a way back from a bad data load. And it has RTO and RPO numbers that those measures were designed to meet, captured as requirements rather than discovered during the first real outage.
The mistake to leave behind is the comfortable one: assuming the cloud handles continuity because it handles infrastructure. It handles the data centre. It does not handle your deployment, your environment, your integration, or your data, and those are exactly the things that go wrong. Continuity for them is the architect's to design, and the time to design it is before the outage, not during.