Don’t use DISTINCT as a “join-fixer”


 

I’ve quietly resolved performance issues by re-writing slow queries to avoid DISTINCT. Often, the DISTINCT is there only to serve as a “join-fixer,” and I can explain what that means using an example.

Let’s say we have the following grossly simplified schema, representing customers, products, and product categories:

CREATE TABLE dbo.Customers ( CustomerID int NOT NULL, Name nvarchar(255) NOT NULL, CONSTRAINT PK_Customers PRIMARY KEY (CustomerID) ); CREATE TABLE dbo.Categories ( CategoryID int NOT NULL, Name nvarchar(255) NOT NULL, CONSTRAINT PK_Categories PRIMARY KEY (CategoryID), CONSTRAINT UQ_Categories UNIQUE (Name) ); CREATE TABLE dbo.Products ( ProductID int NOT NULL, CategoryID int NOT NULL,