I'm trying to get the top 5 comments by score for each Reddit post. I only want to retrieve the top N comments by score for each post title.
Example: I only would want comment 1 and 2 for each post.
Post 1 | Comment 1 | Comment Score 10
Post 1 | Comment 2 | Comment Score 9
Post 1 | Comment 3 | Comment Score 8
Post 2 | Comment 1 | Comment Score 10
Post 2 | Comment 2 | Comment Score 9
Post 2 | Comment 3 | Comment Score 8
StandardSQL
SELECT
posts.title,
posts.url,
posts.score AS postsscore,
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH),
SUBSTR(comments.body, 0, 80),
comments.score AS commentsscore,
comments.id
FROM
`fh-bigquery.reddit_posts.2015*` AS posts
JOIN `fh-bigquery.reddit_comments.2015*` AS comments
ON posts.id = SUBSTR(comments.link_id, 4)
WHERE
posts.subreddit = 'Showerthoughts'
AND posts.score >100
AND comments.score >100
ORDER BY
posts.score DESC,
posts.title DESC,
comments.score DESC
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `fh-bigquery.reddit_comments.2015*` ADD INDEX `fhbigquery_reddit_idx_score` (`score`);
ALTER TABLE `fh-bigquery.reddit_posts.2015*` ADD INDEX `fhbigquery_reddit_idx_subreddit_score` (`subreddit`,`score`);
SELECT
posts.title,
posts.url,
posts.score AS postsscore,
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)),
MONTH),
SUBSTR(comments.body,
0,
80),
comments.score AS commentsscore,
comments.id
FROM
`fh-bigquery.reddit_posts.2015*` AS posts
JOIN
`fh-bigquery.reddit_comments.2015*` AS comments
ON posts.id = SUBSTR(comments.link_id,
4)
WHERE
posts.subreddit = 'Showerthoughts'
AND posts.score > 100
AND comments.score > 100
ORDER BY
posts.score DESC,
posts.title DESC,
comments.score DESC