Query Exercise Answers: Solving the 201 Buckets Problem


In this week’s Query Exercise challenge, I explained SQL Server’s 201 buckets problem. SQL Server’s statistics only handle up to ~201 outliers, which means that outliers ~202-300 get wildly inaccurate estimates.

In our example, I had an index on Location and perfectly accurate statistics, but even still, this query gets bad estimates because Lithuania is in outliers ~202-300:

CREATE INDEX Location ON dbo.Users(Location); GO SELECT * FROM dbo.Users WHERE Location = N’lithuania’ ORDER BY Reputation DESC;

SQL Server estimates that only 8 rows will be found for Lithuania, when in reality 2,554 rows come back:

This under-estimation isn’t really