【Question title】: Having trouble joining multiple Reddit tables with an AS and ON clause
【Posted】: 2019-01-29 03:01:38
【Question】:

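I'm trying to join comments to posts across multiple tables. I need an AS clause because the posts tables and the comments tables both have a "score" column.

My goal is to use the data from all of these tables to find the top comments on the top posts.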

#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore, 
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
comments.body, comments.score AS commentsscore, comments.id

FROM

fh-bigquery.reddit_posts.2015_12, fh-bigquery.reddit_posts.2016_01, fh-bigquery.reddit_posts.2016_02, fh-bigquery.reddit_posts.2016_03, fh-bigquery.reddit_posts.2016_04, fh-bigquery.reddit_posts.2016_05, fh-bigquery.reddit_posts.2016_06, fh-bigquery.reddit_posts.2016_07, fh-bigquery.reddit_posts.2016_08, fh-bigquery.reddit_posts.2016_09, fh-bigquery.reddit_posts.2016_10, fh-bigquery.reddit_posts.2016_11, fh-bigquery.reddit_posts.2016_12, fh-bigquery.reddit_posts.2017_01, fh-bigquery.reddit_posts.2017_02, fh-bigquery.reddit_posts.2017_03, fh-bigquery.reddit_posts.2017_04, fh-bigquery.reddit_posts.2017_05, fh-bigquery.reddit_posts.2017_06, fh-bigquery.reddit_posts.2017_07, fh-bigquery.reddit_posts.2017_08, fh-bigquery.reddit_posts.2017_09, fh-bigquery.reddit_posts.2017_10, fh-bigquery.reddit_posts.2017_11, fh-bigquery.reddit_posts.2017_12, fh-bigquery.reddit_posts.2018_01, fh-bigquery.reddit_posts.2018_02, fh-bigquery.reddit_posts.2018_03, fh-bigquery.reddit_posts.2018_04, fh-bigquery.reddit_posts.2018_05, fh-bigquery.reddit_posts.2018_06, fh-bigquery.reddit_posts.2018_07, fh-bigquery.reddit_posts.2018_08, fh-bigquery.reddit_posts.2018_09, fh-bigquery.reddit_posts.2018_10

AS posts

JOIN

fh-bigquery.reddit_comments.2015_12, fh-bigquery.reddit_comments.2016_01, fh-bigquery.reddit_comments.2016_02, fh-bigquery.reddit_comments.2016_03, fh-bigquery.reddit_comments.2016_04, fh-bigquery.reddit_comments.2016_05, fh-bigquery.reddit_comments.2016_06, fh-bigquery.reddit_comments.2016_07, fh-bigquery.reddit_comments.2016_08, fh-bigquery.reddit_comments.2016_09, fh-bigquery.reddit_comments.2016_10, fh-bigquery.reddit_comments.2016_11, fh-bigquery.reddit_comments.2016_12, fh-bigquery.reddit_comments.2017_01, fh-bigquery.reddit_comments.2017_02, fh-bigquery.reddit_comments.2017_03, fh-bigquery.reddit_comments.2017_04, fh-bigquery.reddit_comments.2017_05, fh-bigquery.reddit_comments.2017_06, fh-bigquery.reddit_comments.2017_07, fh-bigquery.reddit_comments.2017_08, fh-bigquery.reddit_comments.2017_09, fh-bigquery.reddit_comments.2017_10, fh-bigquery.reddit_comments.2017_11, fh-bigquery.reddit_comments.2017_12, fh-bigquery.reddit_comments.2018_01, fh-bigquery.reddit_comments.2018_02, fh-bigquery.reddit_comments.2018_03, fh-bigquery.reddit_comments.2018_04, fh-bigquery.reddit_comments.2018_05, fh-bigquery.reddit_comments.2018_06, fh-bigquery.reddit_comments.2018_07, fh-bigquery.reddit_comments.2018_08, fh-bigquery.reddit_comments.2018_09, fh-bigquery.reddit_comments.2018_10

AS comments

ON posts.id = SUBSTR(comments.link_id, 4)

WHERE posts.subreddit = 'Showerthoughts' AND posts.score >100 AND comments.score >100
ORDER BY posts.score DESC


【Question comments】:

    Tags: google-bigquery reddit


    【Solution 1】:

    OK, the problems with this query:

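    • Careful! This query will process a lot of data. I could re-cluster the tables to make this kind of query much more efficient, but I haven't done so yet.
    • In #standardSQL, a comma between tables means JOIN (a cross join), not UNION. So you need to UNION the tables.
    • Shortcut: you can append * to the end of a table name prefix to expand it to all matching tables.
    • Escape table names with backticks.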
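To make the comma-versus-UNION point concrete, here is a minimal sketch that stacks just two of the monthly posts tables with UNION ALL and only then applies the alias. The column names (id, title, score, subreddit) are taken from the question's query and are assumed to exist in both tables; this is an illustration, not a tested query.

```sql
#standardSQL
-- Sketch only: stack two monthly tables with UNION ALL, then alias the result.
-- A comma between the two table names would cross join them instead.
SELECT posts.title, posts.score AS postsscore
FROM (
  SELECT id, title, score, subreddit FROM `fh-bigquery.reddit_posts.2015_12`
  UNION ALL
  SELECT id, title, score, subreddit FROM `fh-bigquery.reddit_posts.2016_01`
) AS posts
WHERE posts.subreddit = 'Showerthoughts'
  AND posts.score > 100
ORDER BY posts.score DESC
```

Listing every table this way gets long quickly, which is why the wildcard shortcut is usually the more practical choice for dozens of monthly tables.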

    That said, a working query would be:

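    #standardSQL
    SELECT posts.title, posts.url, posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    -- SUBSTR positions are 1-based; keep only the first 80 characters of the body
    SUBSTR(comments.body, 1, 80), comments.score AS commentsscore, comments.id
    
    -- The 2015* wildcard expands to every table whose name starts with 2015
    FROM `fh-bigquery.reddit_posts.2015*` AS posts
    JOIN `fh-bigquery.reddit_comments.2015*` AS comments
    
    -- link_id carries a 3-character prefix before the post id, so compare from position 4
    ON posts.id = SUBSTR(comments.link_id, 4)
    
    WHERE posts.subreddit = 'Showerthoughts' 
    AND posts.score > 100 
    AND comments.score > 100
    ORDER BY posts.score DESC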
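The query above only matches tables whose names start with 2015. To cover the range in the question (2015_12 through 2018_10), one option is a broader wildcard combined with a filter on the _TABLE_SUFFIX pseudo column. The sketch below assumes that every table matched by the 20* prefix has a compatible schema and that the monthly tables follow the YYYY_MM naming shown in the question; the post_month and comment_preview aliases are added here for readability and are not part of the original answer. It will scan far more data than the 2015-only version.

```sql
#standardSQL
-- Sketch only: the wildcard matches table names starting with 20, and
-- _TABLE_SUFFIX (the part of the name after the prefix) narrows the range.
SELECT
  posts.title,
  posts.url,
  posts.score AS postsscore,
  DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH) AS post_month,
  SUBSTR(comments.body, 1, 80) AS comment_preview,
  comments.score AS commentsscore,
  comments.id
FROM `fh-bigquery.reddit_posts.20*` AS posts
JOIN `fh-bigquery.reddit_comments.20*` AS comments
  ON posts.id = SUBSTR(comments.link_id, 4)
-- With two wildcard tables, qualify _TABLE_SUFFIX with each alias.
WHERE posts._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND comments._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND posts.subreddit = 'Showerthoughts'
  AND posts.score > 100
  AND comments.score > 100
ORDER BY posts.score DESC
```

The suffix comparison is a plain string comparison, so '15_12' through '18_10' selects the monthly tables from December 2015 to October 2018 and skips any table whose suffix sorts outside that range.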
    

    【Discussion】:
