Find Posts with the Wrong CommentCount: Answers & Discussion


Your Query Exercise was to find denormalization accuracy problems: checking the accuracy of a reporting column, Posts.CommentCount. There were two parts: finding the top 100 most problematic Posts with the biggest variances, and thinking about a long term solution to keep the CommentCount accuracy as high as practical.

Question 1: Finding the Problematic Posts

Your first attempt might have looked something like this:

SELECT TOP 100 p.Id, p.CommentCount, COUNT(distinct c.Id) as comments_actual, p.CommentCount – COUNT(distinct c.Id) as comments_variance FROM dbo.Posts p INNER JOIN dbo.Comments c ON p.Id = c.PostId GROUP BY p.id, p.CommentCount HAVING p.CommentCount <> COUNT(distinct c.Id) ORDER BY p.CommentCount