Loading...

SOQL Selectivity: Avoiding Full Table Scans on Million-Row Objects

A query that runs in 200ms against 100K records can run for 30 seconds against 5M records. The difference is selectivity. The fix is index design, query shape, and a few patterns that keep Apex fast as the org grows.

SOQL Selectivity: Avoiding Full Table Scans on Million-Row Objects

A Salesforce org grew from 50,000 Cases to 5 million Cases over four years. Apex code that processed Cases used to complete in 200ms. The same code now takes 30 seconds when it completes at all; sometimes it times out. The team blames "the platform getting slower." The actual cause is almost always SOQL selectivity. The query no longer hits an index. The platform is scanning the full table.

Selectivity is the single most important performance concern in Apex code touching large objects. The fix is mechanical when caught early and expensive when caught after the data has accumulated. Sapota's Salesforce team treats LDV (large data volume) audits as a recurring engagement type because the failure mode keeps recurring.

What selectivity actually means

A SOQL query is selective when Salesforce can use an index to find the matching records quickly. A query is non-selective when Salesforce has to scan the full table.

Salesforce indexes:

  • All standard fields that are primary keys (Id, Name on standard objects).
  • All foreign key fields (lookups, master-details).
  • Any field marked as External ID.
  • Any field marked as Unique.
  • Custom indexes on specific fields (requested via support; not self-serve).

For an index to be used, the WHERE clause must filter on an indexed field, AND the filter must be selective enough. The selectivity threshold:

  • Standard indexed field: query returns less than 30 percent of total records, OR less than 1 million records (whichever is lower).
  • Custom indexed field: query returns less than 10 percent of total records, OR less than 333,333 records.

Cross these thresholds and Salesforce falls back to a full table scan, regardless of whether the field is technically indexed.

What non-selective queries look like

Patterns that fail at high volume:

// Status = 'Open' might match 40 percent of all Cases. // Non-selective on a 5M-row Case table. SELECT Id, Subject FROM Case WHERE Status = 'Open' `// LIKE with leading wildcard cannot use the index.` `SELECT Id FROM Account WHERE Name LIKE '%Acme%'` // NOT operators on indexed fields disable the index. SELECT Id FROM Lead WHERE IsConverted != TRUE `// OR across non-indexed fields produces a scan.` `SELECT Id FROM Opportunity` `WHERE StageName = 'Closed Won' OR Probability > 90` // No filter at all (yes, this exists in production code). SELECT Id, Amount FROM Opportunity

All of these may run fine in a developer org with 1,000 records. They become incidents when the data grows.

The selective rewrite patterns

The same queries, refactored:

// Add an indexed field to narrow the scope. // CreatedDate is indexed; pair it with Status. SELECT Id, Subject FROM Case WHERE Status = 'Open' AND CreatedDate = LAST_N_DAYS:30 `// Use full-text search (SOSL) for substring matching,` `// or store a normalized prefix in an indexed field.` `SELECT Id FROM Account WHERE Name LIKE 'Acme%'` // Rewrite to avoid NOT on indexed fields. SELECT Id FROM Lead WHERE IsConverted = FALSE `// Split the OR into separate queries against indexed fields.` `List won = [SELECT Id FROM Opportunity` ` WHERE StageName = 'Closed Won'];` `List highProb = [SELECT Id FROM Opportunity` ` WHERE Probability > 90` ` AND StageName != 'Closed Lost'];` // Always include a date or owner filter on large objects. SELECT Id, Amount FROM Opportunity WHERE CloseDate = THIS_QUARTER

The pattern: narrow the result set with an indexed filter as the primary criterion. Use additional filters to refine within that subset.

How to know if a query is selective

Two diagnostic tools:

Query Plan Tool in Developer Console. Type the query, click "Query Plan." Returns a cost estimate and notes whether the query is selective. A "cost" below 1.0 generally means the query is selective; above 1.0 means the platform will scan more than the threshold.

EXPLAIN-like analysis with sObjectDescribe and indexes. For custom fields, check the field's customIndexed attribute or the field setup screen. Indexed fields show an "Indexed" indicator.

In production, the symptom is timing. A SOQL query that takes more than a few seconds against a single object table is almost always non-selective. The fix starts with identifying the indexed field the query should use.

Skinny tables and external IDs

For very high-volume objects (often 5M+ rows) with frequent reporting queries, Salesforce offers two additional optimizations:

Skinny tables are a feature requested through support. Salesforce creates a read-only copy of an object that contains only the columns you specify. Reports and queries that touch only those columns run against the skinny table, bypassing the full record table. Massive performance improvement for predictable query patterns.

The catch: skinny tables only apply for specific use cases (read-only reporting, specific SOQL patterns). Custom Apex queries must be written to hit the skinny table specifically. Engagement with Salesforce support is required to set them up.

External IDs and unique indexes. Marking a custom field as External ID makes it indexed. Marking it Unique also indexes it. Both are self-serve in Setup. Use these proactively for fields that filter or join in Apex. Adding an External ID to a custom field after the data has grown requires a one-time index build (Salesforce manages this).

When to use Database.query vs static SOQL

Static SOQL (inside square brackets) is checked at compile time and has better tooling support. Dynamic SOQL (Database.query(...)) is needed for queries built at runtime.

For LDV concerns, both produce the same query plan. Selectivity is a property of the query string, not how it was constructed. Prefer static SOQL when the query shape is known; use dynamic SOQL only when the shape genuinely varies.

A common LDV anti-pattern with dynamic SOQL: building queries from user input that produces non-selective filters. The user-typed search hits the unindexed field, the query scans the full table, the page times out. Always validate that user-driven queries use indexed fields with selective filters.

Batch Apex for genuinely large operations

When a single SOQL query truly needs to process millions of records, no amount of selectivity makes it fit in one transaction. The right tool is Batch Apex.

Batch Apex chunks the query into batches of 200 records by default (configurable up to 2,000). Each batch runs in its own transaction with its own governor limits. The framework handles the iteration.

The pattern:

public class CaseAgingBatch implements Database.Batchable { public Database.QueryLocator start(Database.BatchableContext bc) { // Selective by CreatedDate to bound the result. return Database.getQueryLocator( 'SELECT Id, Status FROM Case ' + 'WHERE CreatedDate < LAST_N_DAYS:90 ' + 'AND Status = \'Open\'' ); } ` public void execute(Database.BatchableContext bc,` ` List scope) {` ` // Process up to 200 cases per batch.` ` }` public void finish(Database.BatchableContext bc) {} }

Even Batch Apex requires a selective start query. The framework cannot turn a non-selective query into a fast one; it can only chunk the result.

What good LDV practices look like

A Salesforce org healthy with respect to large data volumes:

  • Every Apex SOQL query against high-volume objects (Case, Task, Opportunity, custom objects above 1M rows) uses an indexed field in the WHERE clause.
  • Date-bounded filters (CreatedDate = LAST_N_DAYS:30) are the default on bulk processing code.
  • External ID fields configured on every custom field used in joins or lookups across Apex.
  • Skinny tables requested for high-volume reporting use cases.
  • Batch Apex used for operations touching more than 50K records.
  • Query Plan tool runs against new SOQL during code review.

Sapota's Salesforce team holds the Platform Developer I credential and runs LDV audits as a defined engagement type. The audit produces a list of non-selective queries, expected impact at the org's projected data growth, and the refactor path for each. Investing in selectivity before data accumulates is dramatically cheaper than after.


Auditing Apex SOQL for selectivity or running an LDV-aware refactor? Sapota's Salesforce team handles performance audits, indexing strategy, and Batch Apex design on production engagements. Get in touch ->

See our full platform services for the stack we cover.

Contact Us Now

Share Your Story

We build trust by delivering what we promise – the first time and every time!

We'd love to hear your vision. Our IT experts will reach out to you during business hours to discuss making it happen.

WHY CHOOSE US

"Collaborate, Elevate, Celebrate where Associates - Create Project Excellence"

SapotaCorp beyond the IT industry standard, we are

  • Certificated
  • Assured quality
  • Extra maintenance

Tell us about your project