The Impact of Wildcards on SQL Performance

SQL wildcards (% and _) can be powerful for pattern matching, but they can also have performance implications, especially on large datasets. Here’s a breakdown of their usage and impact on performance:


Leading Wildcards (%pattern)

WHERE column LIKE '%pattern'

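Performance Impact: A leading % prevents the database from seeking into a standard B-tree index on the column, because matching values can start with anything. The database must examine every row instead (a full table or full index scan), so the cost grows with the size of the table. Avoid leading wildcards where possible, especially on large datasets.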

Trailing Wildcards (pattern%)

WHERE column LIKE 'pattern%'

Performance Impact: A trailing % lets the database use an index on the column (if one exists), because the fixed prefix identifies a contiguous range of index entries to read. This kind of search is much faster than one with a leading wildcard.

Mid-Pattern Wildcards (%pattern%)

WHERE column LIKE '%pattern%'

Performance Impact: A pattern wrapped in wildcards matches the text anywhere in the value. Because it also begins with %, it cannot use an index seek, and the database has to check the whole text of every row. This is often the most resource-intensive wildcard search and should be minimized or optimized when possible.

No Wildcard

WHERE column LIKE 'pattern'

or

WHERE column = 'pattern'

Performance Impact: Fastest, as it can leverage the index fully for an exact match.

Summary of Expected Performance

| Query Type | Query Example | Index Use | Expected Performance |
|---|---|---|---|
| Exact Match | WHERE example_column = 'pattern' | Yes | Fastest |
| Trailing Wildcard | WHERE example_column LIKE 'pattern%' | Yes | Fast |
| Leading Wildcard | WHERE example_column LIKE '%pattern' | No | Slow |
| Mid-Pattern Wildcard | WHERE example_column LIKE '%pattern%' | No | Slowest |
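
The summary above can be checked against a real query planner. The following is a minimal sketch using Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The table `items`, its columns, and the index `idx_items_name` are made up for illustration. It assumes SQLite specifically: SQLite only applies its prefix optimization for LIKE when the indexed column uses NOCASE collation (or case-sensitive LIKE is enabled), which is why the column is declared that way here. Other databases have their own conditions and their own plan output.

```python
import sqlite3


def plan_for(conn, where_clause):
    """Return the plan text SQLite reports for a query with this WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM items WHERE {where_clause}"
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table: NOCASE collation is an SQLite-specific requirement
# for the LIKE prefix optimization with default (case-insensitive) LIKE.
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE, note TEXT)"
)
conn.execute("CREATE INDEX idx_items_name ON items (name)")
conn.executemany(
    "INSERT INTO items (name, note) VALUES (?, ?)",
    [(f"item{i:05d}", "x") for i in range(1000)],
)

plans = {
    "exact": plan_for(conn, "name = 'item00042'"),
    "trailing": plan_for(conn, "name LIKE 'item0004%'"),
    "leading": plan_for(conn, "name LIKE '%00042'"),
    "mid": plan_for(conn, "name LIKE '%0004%'"),
}
for label, plan in plans.items():
    print(f"{label:9s} {plan}")
```

The first two plans should begin with SEARCH and mention the index, while the last two should begin with SCAN. The exact wording of the plan text varies between SQLite versions, so treat the output as indicative rather than as a fixed format.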

Why don't leading and mid-pattern wildcards use an index?

Leading and mid-pattern wildcards (%pattern or %pattern%) prevent index usage because indexes, typically structured as B-trees, store entries in sorted order and are searched from the beginning of each entry, left to right. When a pattern starts with a wildcard, the matching entries can begin with any characters, so they are scattered throughout the index rather than grouped together, and the database must perform a full scan instead of navigating the index efficiently. With a fixed prefix (e.g., pattern%), the database can jump directly to the entries that start with that prefix and stop once the prefix no longer matches, which keeps the lookup fast.
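
To make the B-tree argument concrete, here is a rough Python sketch that uses a sorted list as a stand-in for an index. A real B-tree is more elaborate, but the ordering property that matters is the same. The helper names `prefix_search` and `suffix_search` are hypothetical. Each function returns the matches together with the number of keys it had to examine.

```python
from bisect import bisect_left


def prefix_search(sorted_keys, prefix):
    """Emulate LIKE 'prefix%': seek to the first candidate, then read a contiguous run."""
    i = bisect_left(sorted_keys, prefix)  # binary search, O(log n)
    matches = []
    examined = 0
    while i < len(sorted_keys):
        examined += 1
        if not sorted_keys[i].startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(sorted_keys[i])
        i += 1
    return matches, examined


def suffix_search(sorted_keys, suffix):
    """Emulate LIKE '%suffix': matches can be anywhere, so every key is examined."""
    matches = [key for key in sorted_keys if key.endswith(suffix)]
    return matches, len(sorted_keys)


keys = sorted(["apple", "apricot", "banana", "grape", "pineapple", "plum"])

print(prefix_search(keys, "ap"))     # (['apple', 'apricot'], 3)
print(suffix_search(keys, "apple"))  # (['apple', 'pineapple'], 6)
```

The prefix search touches only the matching run plus one extra key, while the suffix search has to look at all six keys because "apple" and "pineapple" sit far apart in sorted order.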

Conclusion

Trailing wildcards (LIKE 'pattern%') provide the best performance after exact matches because they can use indexes, while leading (LIKE '%pattern') and mid-pattern (LIKE '%pattern%') wildcards significantly degrade performance because they require a full scan. For large datasets, avoid leading and mid-pattern wildcards whenever possible. For more complex text-matching needs, consider Full-Text Search, which is better optimized for searching within text.
