YouTube comments extractor infinite loop if there are too many comments: answer

【Question title】: YouTube comments extractor infinite loop if there are too many comments
【Posted】: 2021-07-21 12:31:07
【Question】:

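I wrote a script that extracts the comments of a YouTube video and stores them in a file, given the video ID. If the video has fewer than 10-15 comments there is no problem and the script works fine, but when there are more, it goes into an infinite loop and I don't know why.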

from googleapiclient.discovery import build 
import os
api_key = '...'

def video_comments(video_id): 
    # empty file for storing comments
    outputFile = open("comments_"+video_id+".txt", "w", encoding='utf-8')

    # empty list to store one dictionary per comment
    commentsDict = []

    # empty list for storing reply 
    replies = [] 

    # creating youtube resource object 
    youtube = build('youtube', 'v3', 
                    developerKey=api_key) 

    # retrieve youtube video results 
    video_response=youtube.commentThreads().list( 
    part='snippet,replies', 
    videoId=video_id 
    ).execute() 

    # iterate video response 
    while video_response: 
        
        # extracting required info 
        # from each result object 
        for item in video_response['items']: 
            # Extracting comments 
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay'] 
            commentEntrie = {"comment": comment, 'replies': []}
            
            # counting number of reply of comment 
            replycount = item['snippet']['totalReplyCount'] 

            # if reply is there 
            if replycount>0: 
                
                # iterate through all reply 
                for reply in item['replies']['comments']: 
                    
                    # Extract reply 
                    reply = reply['snippet']['textDisplay'] 
                    
                    # Store reply in list 
                    replies.append(reply) 
                    commentEntrie['replies'].append(reply)
                    
            # print comment with list of reply 
            print(comment, replies, end = '\n\n')
            outputFile.write("%s" % comment)
            outputFile.write("%s\n" % replies)
            commentsDict.append(commentEntrie)
            # empty reply list 
            replies = [] 

        # Again repeat 
        if 'nextPageToken' in video_response: 
            video_response = youtube.commentThreads().list( 
                    part = 'snippet,replies', 
                    videoId = video_id 
                ).execute() 
        else: 
            break
    outputFile.close()
    print(commentsDict)

# Enter video id 
video_id = "aDHYbM9OqUc" 

# Call function 
video_comments(video_id)

Here are two video IDs: LVgKlfw4DHc works fine, but aDHYbM9OqUc ends in an infinite loop. Any ideas?

[EDIT] I think nextPageToken is always present, so the loop runs forever.

【Question discussion】:

Tags: python loops youtube-api youtube-data-api

【Solution 1】:

Your loop while video_response: becomes infinite because of this code:

if 'nextPageToken' in video_response: 
    video_response = youtube.commentThreads().list( 
        part = 'snippet,replies', 
        videoId = video_id 
    ).execute() 
else: 
    break

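If the first video_response contains the property nextPageToken, the call to CommentThreads.list inside the loop is exactly the same as the call outside the loop. Because it passes no pageToken, the API returns the first page again. The second call therefore gives you exactly the same video_response as the previous one. That response again contains nextPageToken, so the loop never exits.

The correct implementation is: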
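To see the effect without any network access, here is a minimal offline simulation. PAGES, fake_list, buggy_loop and fixed_loop are hypothetical names. fake_list stands in for youtube.commentThreads().list(...).execute(). The request cap in buggy_loop exists only so the demo terminates.

```python
# Hypothetical stand-in for a paginated API: three pages keyed by page token.
PAGES = {
    None: {"items": ["c1", "c2"], "nextPageToken": "p2"},
    "p2": {"items": ["c3", "c4"], "nextPageToken": "p3"},
    "p3": {"items": ["c5"]},  # last page: no nextPageToken
}


def fake_list(page_token=None):
    """Return the page for page_token, like commentThreads().list().execute()."""
    return PAGES[page_token]


def buggy_loop(max_requests=10):
    """Mirrors the question's loop: the follow-up request omits the token."""
    response = fake_list()
    seen, requests = [], 1
    while response:
        seen.extend(response["items"])
        if "nextPageToken" in response:
            if requests >= max_requests:  # safety cap so the demo terminates
                return seen, requests
            response = fake_list()  # BUG: identical request, so page 1 again
            requests += 1
        else:
            break
    return seen, requests


def fixed_loop():
    """Passes the previous response's nextPageToken to get the next page."""
    response = fake_list()
    seen, requests = [], 1
    while True:
        seen.extend(response["items"])
        token = response.get("nextPageToken")
        if token is None:  # no more pages
            return seen, requests
        response = fake_list(page_token=token)
        requests += 1
```

buggy_loop() only ever sees c1 and c2, repeated until the cap of 10 requests is hit. fixed_loop() returns all five items after three requests.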

if 'nextPageToken' in video_response: 
    video_response = youtube.commentThreads().list( 
        pageToken = video_response['nextPageToken'],
        part = 'snippet,replies', 
        videoId = video_id 
    ).execute() 
else: 
    break
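The same fix can be packaged as a generator, so the caller simply iterates over threads. This is a sketch, and it assumes youtube is the client returned by build('youtube', 'v3', developerKey=api_key). The function name iter_comment_threads and its page_size parameter are hypothetical.

The sketch also passes maxResults, which commentThreads.list accepts in the range 1 to 100 (the default is 20). That default page size of 20 threads is likely why videos with only a few comments never needed a second request.

```python
def iter_comment_threads(youtube, video_id, page_size=100):
    """Yield every commentThread resource for video_id, one page at a time.

    Assumes `youtube` is the client built with
    googleapiclient.discovery.build('youtube', 'v3', developerKey=...).
    """
    page_token = None
    while True:
        params = {"part": "snippet,replies", "videoId": video_id,
                  "maxResults": page_size}
        if page_token:
            # Ask for the page *after* the one just processed.
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        yield from response.get("items", [])
        page_token = response.get("nextPageToken")
        if not page_token:  # last page reached
            break
```

For example, for item in iter_comment_threads(youtube, 'aDHYbM9OqUc'): ... processes each thread once and stops after the first page that has no nextPageToken.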

Since you are using Google's APIs Client Library for Python, the pythonic way to implement result set pagination on the CommentThreads.list API endpoint looks like this:

request = youtube.commentThreads().list(
    part = 'snippet,replies', 
    videoId = video_id 
)

while request:
    response = request.execute()

    for item in response['items']:
        ...

    request = youtube.commentThreads().list_next(
        request, response)

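Thanks to the way the Python client library is implemented, this is simple: there is no need to handle the nextPageToken property of the API response object or the pageToken request parameter explicitly. list_next builds the request for the next page from the previous request and response. It returns None when there are no more pages, which ends the while request: loop.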
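Putting it together, here is a sketch of the question's video_comments logic rewritten around list_next. It assumes youtube is the client created in the question with build('youtube', 'v3', developerKey=api_key). The name fetch_comments is hypothetical. The function returns the data instead of writing the file, so the pagination logic stays easy to check.

Note that a thread's replies part does not necessarily contain every reply. Retrieving all replies would need comments.list with parentId.

```python
def fetch_comments(youtube, video_id):
    """Collect top-level comments and their inline replies for video_id.

    Relies on list_next(), which builds the follow-up request from the
    previous response's nextPageToken and returns None after the last page.
    """
    entries = []
    request = youtube.commentThreads().list(
        part="snippet,replies", videoId=video_id, maxResults=100
    )
    while request is not None:
        response = request.execute()
        for item in response.get("items", []):
            snippet = item["snippet"]
            comment = snippet["topLevelComment"]["snippet"]["textDisplay"]
            # 'replies' is only present when the thread has replies, and it
            # may hold only a subset of them.
            replies = [
                r["snippet"]["textDisplay"]
                for r in item.get("replies", {}).get("comments", [])
            ]
            entries.append({"comment": comment, "replies": replies})
        request = youtube.commentThreads().list_next(request, response)
    return entries
```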

【Discussion】:

Thanks for your answer, and for the details!