SQL wildcards (% and _) can be powerful for pattern matching, but they can also have performance implications, especially on large datasets. Here’s a breakdown of their usage and impact on performance:
Leading Wildcards (%pattern)
WHERE column LIKE '%pattern'
Performance Impact: A leading % prevents the database from using an index on the column to locate matches, so it must perform a full scan. Avoid leading wildcards when possible, especially on large datasets, because they are resource-intensive.
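You can confirm this by asking the database for its query plan. The sketch below is an illustration, not a benchmark. It uses SQLite through Python's built-in `sqlite3` module as a stand-in engine, and the `customers` table, its `name` column, and the `idx_customers_name` index are hypothetical. SQLite's `EXPLAIN QUERY PLAN` reports `SEARCH` when it seeks into an index and `SCAN` when it reads every row. Other databases expose the same information through their own `EXPLAIN` commands, with different output.

```python
import sqlite3

# Hypothetical table and index, created in memory only to inspect the query plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")


def query_plan(sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Leading wildcard: the index gives no starting point, so SQLite plans a SCAN
# of every row instead of a SEARCH into idx_customers_name.
leading_plan = query_plan("SELECT * FROM customers WHERE name LIKE '%son'")
print(leading_plan)
```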
Trailing Wildcards (pattern%)
WHERE column LIKE 'pattern%'
Performance Impact: A trailing % lets the database use an index on the column, if one exists, because it only needs to match the beginning of the text. This makes it faster and more efficient than a leading wildcard. Some engines also require the index's collation or operator class to be compatible with LIKE before they will use it.
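The sketch below checks this in SQLite through Python's `sqlite3` module, using a hypothetical `customers` table and `idx_customers_name` index. It also shows an engine-specific caveat. SQLite's LIKE is case-insensitive by default, so its prefix optimization only applies when the index uses the NOCASE collation. PostgreSQL has a similar rule: in a non-C locale, a LIKE prefix search needs an index built with the `text_pattern_ops` operator class.

```python
import sqlite3


def trailing_wildcard_plan(index_collation: str) -> str:
    """Build a hypothetical customers table whose name index uses the given
    collation, then return SQLite's plan for a trailing-wildcard LIKE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute(
        f"CREATE INDEX idx_customers_name ON customers (name COLLATE {index_collation})"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE name LIKE 'Smi%'"
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)


# SQLite's default LIKE is case-insensitive, so only a NOCASE index lets it
# turn 'Smi%' into a range seek on name.
nocase_plan = trailing_wildcard_plan("NOCASE")
binary_plan = trailing_wildcard_plan("BINARY")
print(nocase_plan)  # a SEARCH with a range constraint on name
print(binary_plan)  # a SCAN: this index's collation does not match LIKE
```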
Mid-Pattern Wildcards (%pattern%)
WHERE column LIKE '%pattern%'
Performance Impact: A pattern with % on both sides matches the term anywhere in the value, so the database cannot use an index to narrow the search. It must check the full text of every row, which means a full table scan. This is often the most resource-intensive wildcard search and should be minimized or optimized where possible.
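The sketch below runs this search in SQLite through Python's `sqlite3` module. The table, index, and sample rows are hypothetical. A `%smith%` search finds the term anywhere in the value, but the plan is still a `SCAN`. Because the query selects only the indexed column, SQLite may read the smaller index instead of the table. Even so, it still reads every entry. The index serves only as a compact copy of the column, not as a way to skip rows.

```python
import sqlite3

# Hypothetical table, index, and rows; selecting only name lets the index cover the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Smith", "Leeds"), ("Goldsmith", "York"), ("Jones", "Bath")],
)

plan = " | ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE name LIKE '%smith%'"
    )
)
# LIKE is case-insensitive in SQLite by default, so 'Smith' matches too.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE name LIKE '%smith%' ORDER BY name"
    )
]
print(plan)     # a SCAN: every row (or index entry) is read and tested
print(matches)  # ['Goldsmith', 'Smith']
```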
No Wildcard
WHERE column LIKE 'pattern'
or
WHERE column = 'pattern'
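Performance Impact: Fastest, because the database can use an index to go directly to the matching entries. Where possible, prefer the = form, because not every engine optimizes a wildcard-free LIKE as well as an equality test.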
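The minimal SQLite sketch below compares the two forms, using Python's `sqlite3` with a hypothetical table and index. With an ordinary index, the `=` query is a `SEARCH` on the index. The wildcard-free `LIKE` is not. SQLite's `LIKE` is case-insensitive by default and only uses an index built with the NOCASE collation, so here it falls back to a `SCAN`. Other engines may treat `LIKE 'pattern'` like an equality test, but `=` is the form that predictably gets the index.

```python
import sqlite3

# Hypothetical table with an ordinary (BINARY collation) index on name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")


def query_plan(sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Equality: SQLite seeks straight to the matching entries in the index.
equality_plan = query_plan("SELECT * FROM customers WHERE name = 'Smith'")
# Wildcard-free LIKE: case-insensitive by default, so this case-sensitive
# index cannot serve it and SQLite scans instead.
like_plan = query_plan("SELECT * FROM customers WHERE name LIKE 'Smith'")
print(equality_plan)
print(like_plan)
```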
Summary of Expected Performance
Query Type | Query Example | Index Use | Expected Performance
---|---|---|---
Exact Match | WHERE example_column = 'pattern' | Yes | Fastest
Trailing Wildcard | WHERE example_column LIKE 'pattern%' | Yes | Fast
Leading Wildcard | WHERE example_column LIKE '%pattern' | No | Slow
Mid-Pattern Wildcard | WHERE example_column LIKE '%pattern%' | No | Slowest
Why don't leading and mid-pattern wildcards use an index?
Leading and mid-pattern wildcards (%pattern and %pattern%) prevent index seeks. Indexes are typically structured as B-trees, which keep their entries sorted by value, compared character by character from the left. With a fixed prefix such as pattern%, the database can jump straight to the first entry that starts with that prefix and read forward until entries stop matching, so performance stays good. When the pattern starts with a wildcard, as both %pattern and %pattern% do, the sort order gives no starting point. The database then has to examine every row (or every index entry) instead of navigating the index.
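The difference can be sketched without a database. In the illustration below, a sorted Python list stands in for the ordered entries of a B-tree index. This is a simplifying assumption, since a real B-tree spreads its entries across pages. The helper names `prefix_seek` and `suffix_scan` and the sample names are hypothetical. A prefix search binary-searches to the first candidate and stops as soon as entries stop matching. A suffix search has to test every entry.

```python
import bisect

# A sorted list stands in for the ordered leaf entries of a B-tree index.
# The names are made-up sample data.
index_keys = sorted(
    ["Smith", "Jackson", "Adams", "Smythe", "Carson", "Smithers", "Baker", "Johnson"]
)


def prefix_seek(keys: list[str], prefix: str) -> tuple[list[str], int]:
    """LIKE 'prefix%': binary-search to the first candidate, then read forward
    while entries still start with the prefix.
    Returns (matches, entries read after the seek)."""
    matches: list[str] = []
    examined = 0
    for key in keys[bisect.bisect_left(keys, prefix):]:
        examined += 1
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches, examined


def suffix_scan(keys: list[str], suffix: str) -> tuple[list[str], int]:
    """LIKE '%suffix': sort order says nothing about endings, so every entry is tested."""
    return [key for key in keys if key.endswith(suffix)], len(keys)


print(prefix_seek(index_keys, "Smi"))  # (['Smith', 'Smithers'], 3)
print(suffix_scan(index_keys, "son"))  # (['Carson', 'Jackson', 'Johnson'], 8)
```

The prefix search touches only the contiguous run of candidates. The suffix search's cost grows with the size of the whole index.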
Conclusion
Trailing wildcards (LIKE 'pattern%') give the best performance after exact matches because they can use indexes. Leading (LIKE '%pattern') and mid-pattern (LIKE '%pattern%') wildcards degrade performance significantly because they force a full scan. On large datasets, avoid leading and mid-pattern wildcards whenever possible. To search for words inside longer text, consider your database's Full-Text Search feature instead. It maintains an index of the words in each value, so word and word-prefix lookups avoid the full scan. Note that it matches whole words rather than arbitrary substrings.
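As one example of full-text search, the sketch below uses SQLite's FTS5 extension through Python's `sqlite3` module. It assumes FTS5 is compiled into the bundled SQLite, which is true for most current builds. The `articles` table and its rows are made up. Other engines offer the same idea with different syntax, such as MySQL's `FULLTEXT` indexes with `MATCH ... AGAINST` and PostgreSQL's `tsvector` with GIN indexes. Full-text search matches words and word prefixes, not arbitrary substrings, so it is not a drop-in replacement for every `%pattern%` query.

```python
import sqlite3

# Assumes FTS5 is available in the bundled SQLite; table and rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Indexing basics", "B-tree indexes speed up prefix lookups"),
        ("Wildcards", "A leading wildcard forces a full scan"),
        ("Tuning", "Full-text search handles word lookups inside long text"),
    ],
)


def search(query: str) -> list[str]:
    """Return matching titles, best match first according to FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


print(search("scan"))     # ['Wildcards']: a whole-word match
print(search("lookup*"))  # prefix query: both rows containing "lookups"
print(search("can"))      # []: FTS matches tokens, not substrings like '%can%'
```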