
Dataverse webhooks to Service Bus: retry and dead-lettering

Dataverse can call a webhook on every table change. Azure Service Bus can accept that webhook and buffer it for downstream consumers. Plugging them together correctly - with retry on transient failures and dead-letter on permanent ones - is the difference between reliable integration and nightly pager alerts.


A Dataverse row change needs to trigger downstream processing - update an ERP, index into a search engine, notify a mobile app. The "sync plugin that makes HTTP calls" path works for demos and fails in production the first time the downstream service has a bad hour. Plug-ins with external HTTP calls hit the 2-minute plugin timeout, block the user's save, and accumulate retries that saturate everything downstream.

The production-grade pattern: Dataverse fires a webhook into Azure Service Bus on the row change. Service Bus holds the message durably. A consumer (Azure Function, Logic App, or custom service) pulls from the queue on its own schedule, processes at its own pace, and dead-letters failures for later inspection.

Here is the full pipeline, the retry semantics at each stage, and the gotchas we have debugged in production.

The shape

Three queues total: the main queue, the dead-letter queue (automatic subqueue of the main), and optionally a "retry" queue for messages that failed once but might succeed on a delay.

Stage 1: Dataverse webhook plugin

Dataverse's built-in Service Endpoint mechanism posts messages to Azure Service Bus natively. You register a Service Endpoint pointing at your Service Bus queue's URL and SAS token. Plugins registered with ServiceEndpoint as the destination post automatically.

Alternatively, a traditional async plugin that uses the Azure SDK to send to Service Bus works the same way and gives you finer control (batch sends, custom message properties, etc.). We use this for high-volume scenarios.

Plugin registered as: post-operation, async, on the table's Update message (or whatever event triggers the downstream work).

The plugin's job is tiny:

  1. Build a message body from the current row (or just include the row's primary key plus enough metadata for the consumer to fetch the full row).
  2. Send to Service Bus.
  3. Return.
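The payload from step 1 can be sketched as follows. This is an illustrative model of the message shape, not the plugin itself (which runs as C# in the Dataverse sandbox); the `build_message` helper and field names are assumptions, and the actual send would go through the registered Service Endpoint or the Azure SDK.

```python
import json
import uuid

def build_message(row: dict, entity: str) -> dict:
    """Build a minimal webhook payload: the row's primary key plus enough
    metadata for the consumer to fetch the full row from Dataverse."""
    return {
        "entity": entity,                     # logical table name
        "id": row["id"],                      # Dataverse row primary key
        "event": "Update",
        "correlation_id": str(uuid.uuid4()),  # traced across every stage
    }

# The send itself would use the Azure SDK (omitted so the sketch stays
# self-contained), roughly:
#   sender.send_messages(ServiceBusMessage(json.dumps(payload),
#                                          message_id=payload["correlation_id"]))

payload = build_message({"id": "a1b2", "name": "Contoso"}, "account")
body = json.dumps(payload)
```

Keeping the body small (key + metadata, not the full row) sidesteps message-size limits and guarantees the consumer always works from the current row state.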

Durations measured: typically 100-300ms per execution. The async service handles this on its own schedule; user saves never wait on it.

Stage 2: Service Bus queue configuration

Queue settings that matter:

  • Max delivery count: how many times the queue will re-deliver a message before moving it to dead-letter. Default 10. We set to 5 for most scenarios - enough retries for transient issues, few enough that persistent failures don't churn for an hour.
  • Message time-to-live: maximum time a message stays queued. Default 14 days. We set to 7 days - long enough for a weekend outage to recover.
  • Lock duration: how long a consumer holds a message before visibility returns. Default 30 seconds. Set to maximum expected processing time + buffer (e.g., if processing takes 10 seconds max, set lock to 30 seconds).
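The settings above, collected into a small config sketch. The dictionary keys are illustrative, not the exact ARM or SDK property names; a real deployment would apply these via Bicep/ARM or the Service Bus administration API.

```python
from datetime import timedelta

# Queue settings from this article; illustrative names only.
QUEUE_SETTINGS = {
    "max_delivery_count": 5,                            # default is 10
    "default_message_time_to_live": timedelta(days=7),  # default is 14 days
    "lock_duration": timedelta(seconds=30),             # default is 30 seconds
}

def lock_duration_for(max_processing_seconds: int,
                      buffer_seconds: int = 20) -> timedelta:
    """Size the lock to the worst-case processing time plus a buffer."""
    return timedelta(seconds=max_processing_seconds + buffer_seconds)
```

With the sizing rule from the bullet above, a consumer that takes at most 10 seconds gets a 30-second lock.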

We usually leave the Sessions and Partitioning features disabled. Sessions are for order-preserving scenarios; partitioning adds complexity we don't need at our volumes.

Stage 3: consumer (Azure Function with Service Bus trigger)

A Function App with the Service Bus Queue trigger pulls messages automatically.

Three completion paths:

  • Complete: message processed successfully, removed from queue. This is the happy path.
  • Dead-letter: permanent failure (bad data, unknown row, business logic rejection). Moved to dead-letter queue for manual review.
  • Throw/abandon: transient failure. Message returns to the queue for retry until max delivery count is reached, then auto-moved to dead-letter.

Categorizing exceptions into "permanent" and "transient" is the single most important decision in the consumer. Transient failures should retry; permanent ones should not churn the retry counter.
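The three completion paths and the transient/permanent split can be sketched as below. The exception classes and the `handle` wrapper are illustrative; a real consumer would map its own downstream error types onto the same two buckets.

```python
class PermanentError(Exception):
    """Bad data, unknown row, business-rule rejection - never retry."""

class TransientError(Exception):
    """Timeout, throttling, downstream blip - safe to retry."""

def handle(message, process):
    """Return the completion action the queue trigger should take."""
    try:
        process(message)
        return "complete"      # happy path: remove from queue
    except PermanentError:
        return "dead_letter"   # straight to DLQ, no retry churn
    except TransientError:
        raise                  # abandon: back to queue, delivery count +1

def succeeds(msg):
    pass

def rejects(msg):
    raise PermanentError("unknown row")
```

The key property: a permanent failure never touches the retry counter, so the dead-letter queue fills with genuinely actionable messages instead of exhausted retries.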

Retry behavior in detail

When the consumer throws (transient path), Service Bus behavior:

  1. First delivery fails. The message is abandoned (or its lock expires) and becomes visible to receivers again.
  2. Next poll pulls the message. Delivery count is 2. Same failure? Same outcome.
  3. Repeat until delivery count hits max (5 in our config).
  4. Message auto-moves to dead-letter queue with system-reason "MaxDeliveryCountExceeded."
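The four steps above, as a toy simulation of the broker's redelivery loop (a model for reasoning, not the SDK):

```python
def deliver_until_dead_letter(max_delivery_count: int = 5) -> dict:
    """Simulate a message whose consumer throws on every delivery:
    it is redelivered until the max delivery count, then dead-lettered."""
    delivery_count = 0
    while True:
        delivery_count += 1   # broker increments on each delivery attempt
        consumer_threw = True # transient failure every time, in this model
        if consumer_threw and delivery_count >= max_delivery_count:
            # auto-move to the dead-letter subqueue with the system reason
            return {"reason": "MaxDeliveryCountExceeded",
                    "delivery_count": delivery_count}

dead_lettered = deliver_until_dead_letter()
```

With our max delivery count of 5, a persistently failing message makes exactly five trips before landing in the DLQ.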

Service Bus itself adds no delay between redeliveries - an abandoned message is available again immediately. Any back-off comes from the consumer side: the Function host's retry policy (fixed-delay or exponential) or an explicit scheduled re-enqueue. The Function App Service Bus trigger has reasonable defaults; tune only if your failure characteristics require it.

Dead-letter queue handling

Messages in DLQ are the alerts that matter. Options:

  • Manual review dashboard: a simple admin page that lists DLQ messages, shows their content, lets an operator decide: retry, discard, fix-data-then-retry.
  • Automated re-enqueue after a delay: for certain dead-letter reasons, moving the message back to the main queue on a schedule. Be careful with this - if the failure is actually permanent, you create a loop.
  • Automated alerting: DLQ depth > N → PagerDuty. For critical integrations, operators should know within minutes that messages are dead-lettering.

We use a combination: alerting for DLQ depth above zero (for critical queues) plus a manual review dashboard that captures the specific message and lets operators take action.
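The triage decision behind that dashboard can be sketched as a small rule table. The reason strings and the two buckets are illustrative assumptions; the point is that only reasons believed transient are ever re-enqueued automatically, which is what prevents the loop the previous section warns about.

```python
# Hypothetical triage rules for a DLQ review tool.
AUTO_RETRY_REASONS = {"MaxDeliveryCountExceeded"}    # likely transient
MANUAL_REASONS = {"ValidationFailed", "UnknownRow"}  # permanent: fix data first

def triage(dead_letter_reason: str) -> str:
    """Decide what happens to a dead-lettered message."""
    if dead_letter_reason in AUTO_RETRY_REASONS:
        return "re_enqueue"    # back to the main queue on a schedule
    return "manual_review"     # operator decides: retry, discard, fix-then-retry
```

Anything not explicitly known to be transient defaults to manual review; a re-enqueue loop is worse than a waiting operator.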

Idempotency requirements on the consumer

Because retries happen, the consumer must be idempotent: processing the same message twice should produce the same result as processing once.

Common patterns:

  • Upsert at the destination: if updating a row in ERP, use the order ID as a natural key and upsert. Same message twice yields the same end state.
  • Dedupe by message ID: each Service Bus message has a unique MessageId. Track processed IDs in a fast store (Redis, Cosmos); skip repeats.
  • Compare-and-swap patterns: for sequence-sensitive updates, include a version number; reject updates with stale version.

Without idempotency, every retry is a chance for corruption. We audit this explicitly on every new consumer.
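The dedupe-by-MessageId pattern, sketched with an in-memory set standing in for Redis or Cosmos, and a natural-key upsert as the destination write. All names here are illustrative.

```python
processed_ids: set = set()   # stand-in for Redis/Cosmos in production

def process_once(message_id: str, body: dict, apply) -> bool:
    """Apply the message exactly once; repeat deliveries are no-ops."""
    if message_id in processed_ids:
        return False                # duplicate delivery: skip
    apply(body)
    processed_ids.add(message_id)   # record only after a successful apply
    return True

erp_rows = {}

def upsert(body: dict) -> None:
    # order ID as the natural key: same message twice = same end state
    erp_rows[body["order_id"]] = body

first = process_once("m-1", {"order_id": "o-42", "total": 10}, upsert)
second = process_once("m-1", {"order_id": "o-42", "total": 10}, upsert)
```

Recording the ID only after a successful apply matters: recording it first would turn a mid-apply crash into a silently dropped message.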

Monitoring the whole chain

Observability per stage:

  1. Dataverse plugin: trace logs, System Jobs list. Failures here are rare once the plugin is stable, but Azure Service Bus connectivity blips show up as plugin failures.
  2. Service Bus queue: Azure Monitor metrics for message count, DLQ depth, incoming rate, active connections. Alert on DLQ depth > 0 for critical queues.
  3. Consumer Function: Application Insights for exceptions, duration, failure rate. Alert on error rate > 1% or duration > SLA.
  4. Downstream system: whatever native monitoring the target has (ERP dashboards, search engine metrics).

Correlate via MessageId or a custom correlation header set at plugin send time. Every log entry across stages includes the correlation ID; one query traces a message from Dataverse to final destination.
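A minimal sketch of the correlation convention: every stage emits structured log entries carrying the same correlation ID, so one query on `cid` reconstructs the chain. The field names and stage labels are assumptions, not a prescribed schema.

```python
import json
import logging

logger = logging.getLogger("pipeline")

def log_stage(stage: str, correlation_id: str, detail: str) -> str:
    """Emit one structured log line; the cid ties stages together."""
    entry = json.dumps({"stage": stage, "cid": correlation_id, "msg": detail})
    logger.info(entry)
    return entry

# The same cid appears at every hop: plugin send, queue, consumer, downstream.
line = log_stage("consumer", "cid-123", "processed order o-42")
```

In Application Insights the equivalent is a custom property on every telemetry item, queried with a single `where` clause across stages.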

The pattern vs the alternatives

Why not Power Automate?

  • Cost at scale: at one flow run per Dataverse change and a million changes per month, Power Automate licensing gets expensive.
  • Retry and dead-letter semantics are less mature: Power Automate has retry but less explicit dead-letter; operations visibility is weaker.
  • Concurrency limits differ: an Azure Function App can scale to hundreds of concurrent instances; a Power Automate flow has per-flow throughput limits.

Why not direct plugin-to-ERP?

  • 2-minute plugin timeout kills long integrations.
  • Synchronous retry blocks users.
  • Failures become support tickets instead of operations dashboard alerts.

When Power Automate is enough (low volume, simple routing), use it. When the integration is core to the business and failures have operational consequences, the Dataverse → Service Bus → Function pattern pays its complexity back within the first outage it absorbs gracefully.
