Returning the time stamps of YouTube videos based on a list of video IDs

【Question title】: Returning the time stamps of YouTube videos based on a list of video IDs
【Posted】: 2022-01-21 14:39:16
【Question】:

You can run my code in this Google Colab file --> https://colab.research.google.com/drive/1Tfoa5y13GPLxbS-wFNmZpvtQDogyh1Rg?usp=sharing

I wrote a script that takes the video ID of a YouTube video, for example:

VideoID = '3c584TGG7jQ'

Based on this VideoID, my script returns the YouTube transcript (the video's captions) as a list of dictionaries, for example:

data = [{'text': 'Hello World', 'start': 0.19, 'duration': 4.21}, ...]

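Finally, I wrote a function that takes the user's input, i.e. the word or sentence to search for, and returns the matching timestamps with their hyperlinks.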

def search_dictionary(user_input, dictionary):
        MY_CODE_SEE_GOOGLE_COLAB_NOTEBOOK


search_dictionary(user_input, dictionary)

Input: "stolen"

Output: 
the 2 million packages that are stolen... 0.0 min und 39.0 sec :: https://youtu.be/3c584TGG7jQ?t=38s
stolen and the fifth is this outer... 3.0 min und 13.0 sec :: https://youtu.be/3c584TGG7jQ?t=192s

Now my question: how can I apply this to a list of video IDs? For example:

list_of_video_ids = ['pXDx6DjNLDU', '8HEfIJlcFbs', '3c584TGG7jQ', ...]

Expected output:

Title_0, timestamp, hyperlink
Title_0, timestamp, hyperlink
Title_1, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink

So I want every mention across all video IDs, not just within a single one.

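【Comments】:

You can loop over the strings in list_of_video_ids and, in each iteration, append the response of transcript.fetch() to your data variable. Search for how to append the results each time.
@MarcoAurelioFernandezReyes Could you copy and paste my code into your own Google Colab and apply your solution? I'm a bit stuck. I also thought looping over it would be easy.
I'm still learning Python myself, but reading your code, you almost have it. If the elements are not being appended to the given list/dictionary, try looping over the response and adding the results to a separate list variable. Maybe I can take a look later, but not right now, sorry.
OK, no problem. Take your time.
Thanks for sharing your code. I have made changes to it and hope it helps. Cheers.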

Tags: python python-3.x youtube youtube-api

【Solution 1】:

I have checked your code; you just needed more time and testing.

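As I commented, you need to append the result of transcript.fetch() to a global variable each time you loop over the elements of list_of_video_ids. Then, inside the search_dictionary function you created, you can iterate over the transcripts.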
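The collection step described above can be sketched as follows. This is only a minimal illustration, not the notebook's actual code: build_all_transcripts and its fetch_transcript parameter are hypothetical names, and fetch_transcript stands in for whatever call returns the list of caption dictionaries for one video (in the notebook, the code around transcript.fetch()).

```python
def build_all_transcripts(video_ids, fetch_transcript):
    """Collect the captions of every video into one flat list.

    fetch_transcript is a hypothetical callable (an assumption of this
    sketch): it takes a video ID and returns a list of caption dicts
    such as {'text': ..., 'start': ..., 'duration': ...}.
    """
    all_transcripts = []
    for video_id in video_ids:
        try:
            captions = fetch_transcript(video_id)
        except Exception as err:
            # A video without captions should not stop the whole run.
            print(f"Skipping {video_id}: {err}")
            continue
        for caption in captions:
            # Tag each caption with its video ID so that search results
            # can later be linked back to the right video.
            all_transcripts.append({"video_id": video_id, **caption})
    return all_transcripts
```

Putting "video_id" first in each dictionary matters for the search function in this answer, which reads the video ID before it checks the other values.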

Here is the main code:

# Get user input here: 
# N.B.: You should validate the input to avoid a blank line or other invalid input...
user_input = input("Enter a word or sentence: ")
user_input = user_input.lower()

# We use the global list "all_transcripts" here: 
dictionary = all_transcripts

# Function to loop over all transcripts and search for the captions that contain the 
# user input.
# TO-DO: Validate when no data is found.
def search_dictionary(user_input, dictionary): 
  link = 'https://youtu.be/'

  # Get the video_id: 
  v_id  = ""

  # The matching lines are collected here: 
  lst_results = []

  # string body:
  matched_line = ""

  # You're really looping a list of dictionaries: 
  for i in range(len(dictionary)): # <= this is really a "list".
    try:
      #print(type(dictionary[i])) # <= this is really a "dictionary".
      #print(dictionary[i])

      # Now you can iterate over the "dictionary" here: 
      for x, y in dictionary[i].items():
        #print(x, y)
        if (x == "video_id"): 
          v_id = y
        if (user_input in str(y) and len(v_id) > 0):
          matched_line = str(dictionary[i]['text']) + '...' + str(dictionary[i]['start']) + ' min und ' + str(dictionary[i]['duration']) + ' sec :: ' + link + v_id + '?t=' + str(int(dictionary[i]['start'] - 1)) + 's'
          #matched_line = "text: " + y + " -- found in video_id = " + v_id
          
          # Append the line only if it is not already in the list of results: 
          if matched_line not in lst_results: 
            lst_results.append(matched_line)

    except Exception as err: 
      print('Unexpected error - see details below:')
      print(err)

  # Just an example to show "no results":
  if (len(lst_results) == 0):
    print("No results found with input (" + user_input + ")")
  else: 
    print("Results: ")
    print("\n".join(lst_results)) # <= this shows the results separated by line breaks.
# Function ends here.

# Call function: 
search_dictionary(user_input, dictionary) 

# Show message - indicating end of the program - just informative :)
print("End of the program")
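For comparison, the same search can be written more compactly. The sketch below is an alternative, not the code from the notebook: find_matches is a hypothetical name, it assumes every caption dictionary carries "video_id", "text" and "start" keys, it matches only against the caption text (case-insensitively) rather than against every value, and it returns the lines instead of printing them.

```python
def find_matches(user_input, all_transcripts, link="https://youtu.be/"):
    """Return one result line per caption whose text contains user_input."""
    needle = user_input.lower()
    results = []
    for caption in all_transcripts:
        if needle not in caption["text"].lower():
            continue
        # Start one second early, but never before the start of the video.
        start = max(int(caption["start"]) - 1, 0)
        line = f'{caption["text"]}... {link}{caption["video_id"]}?t={start}s'
        # Skip exact duplicates.
        if line not in results:
            results.append(line)
    return results
```

Returning a list keeps the printing, including the "no results" message, in the caller, which makes the function easier to test.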

Following this line of thought, I modified your code; this is the link to your modified Google Colab file.

This is the Google Colab public notebook link.

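In summary:

Your variable naming needs to change: while testing, I could not tell what data types I was working with, lists or dictionaries. It turns out there are both, as you will see when you read the modified code.
I suggest you organize the code and pay attention to spacing: in Google Colab the lines are too long to read. This may be a matter of personal preference, though.
As you can see from the comments I added to your code, I encourage you to add comments to your own code to help others understand it.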

To test the modified code, try entering teach:

The result is:

Enter a word or sentence: teach
Results: 
teacher and set up a class or even...626.0 min und 4.079 sec :: https://youtu.be/pXDx6DjNLDU?t=625s
teach this process and where you watch...738.399 min und 3.68 sec :: https://youtu.be/8HEfIJlcFbs?t=737s
few times a year i teach a month-long...418.8 min und 3.44 sec :: https://youtu.be/3c584TGG7jQ?t=417s
End of the program
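Note that in the output above the caption's start value is printed under the "min" label and its duration under "sec". Assuming start is measured in seconds, as the ?t=625s links suggest, a small helper along these lines could turn it into minutes and seconds. format_timestamp is a hypothetical name and is not part of the notebook.

```python
def format_timestamp(start_seconds):
    """Convert a start offset in seconds into a 'M min S sec' string."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes} min {seconds} sec"


print(format_timestamp(626.0))    # 10 min 26 sec
print(format_timestamp(738.399))  # 12 min 18 sec
```

Its return value could replace the start and duration parts of matched_line.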

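【Comments】:

Thank you so much for taking the time to help me with this. Much appreciated :)
Just one thing... could you make the Google Colab notebook public so anyone can view it? By default, Colab notebooks are private and can only be accessed with a password.
@MaximilianFreitag Sure, this is the public link. Please check it and let me know about any issues with this notebook. I'm glad my solution worked.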