【Question title】: Can't access a tweet ID with Beautiful Soup
【Posted】: 2020-04-17 12:47:27
【Question】:

My goal is to retrieve the IDs of tweets from a Twitter search as they are posted. So far, my code looks like this:

import requests
from bs4 import BeautifulSoup

keys = some_key_words + " -filter:retweets AND -filter:replies"
query = "https://twitter.com/search?f=tweets&vertical=default&q=" + keys + "&src=typd&lang=es"
req = requests.get(query).text
soup = BeautifulSoup(req, "lxml")

for tweets in soup.findAll("li",{"class":"js-stream-item stream-item stream-item"}):
    print(tweets)

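However, this returns nothing. Is there a problem with the code itself, or am I looking in the wrong place in the page source? I know the ID should be stored here: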

<div class="stream">
  <ol class="stream-items js-navigable-stream" id="stream-items-id">
    <li class="js-stream-item stream-item stream-item" data-item-id="1210306781806833664" id="stream-item-tweet-1210306781806833664" data-item-type="tweet">

【Comments】:

Tags: python html twitter beautifulsoup


【Solution 1】:
from bs4 import BeautifulSoup

# Sample markup copied from the question.
data = """
<div class="stream">
    <ol class="stream-items js-navigable-stream" id="stream-items-id">
        <li class="js-stream-item stream-item stream-item"
            data-item-id="1210306781806833664"
            id="stream-item-tweet-1210306781806833664"
            data-item-type="tweet">
        ...
"""

soup = BeautifulSoup(data, 'html.parser')

# Find each tweet <li> and print the ID stored in its data-item-id attribute.
for item in soup.find_all("li", {'class': 'js-stream-item stream-item stream-item'}):
    print(item.get("data-item-id"))

Output:

1210306781806833664
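
Matching on the full class string only works when the `class` attribute is exactly that sequence of classes. As a rough sketch of a more tolerant variant, a CSS selector can match on one class plus the presence of the attribute. This assumes the attribute is named `data-item-id`; `extract_tweet_ids` and `SAMPLE_HTML` are hypothetical names introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the snippet in the question.
SAMPLE_HTML = """
<div class="stream">
  <ol class="stream-items js-navigable-stream" id="stream-items-id">
    <li class="js-stream-item stream-item stream-item"
        data-item-id="1210306781806833664"
        id="stream-item-tweet-1210306781806833664"
        data-item-type="tweet"></li>
  </ol>
</div>
"""


def extract_tweet_ids(html):
    """Return the data-item-id of every matching <li> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on a single class plus the attribute, so class order and
    # extra whitespace inside the class attribute do not matter.
    items = soup.select("li.js-stream-item[data-item-id]")
    return [li["data-item-id"] for li in items]


print(extract_tweet_ids(SAMPLE_HTML))  # ['1210306781806833664']
```

An empty list means no matching element exists in the HTML that was parsed, which is a different problem from a wrong attribute lookup.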

【Comments】:

  • This does work for the `data` variable, but it stops working when I fetch the results from the Twitter page.
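
One way to narrow down the situation described in the comment above is to check whether the HTML returned by `requests` contains the expected markup at all. The sketch below is only a diagnostic. `summarize_response` is a hypothetical helper, and the idea that the server may return a different page than the browser shows (for example, one whose content is filled in by JavaScript) is an assumption that this thread does not confirm.

```python
from bs4 import BeautifulSoup


def summarize_response(html):
    """Report whether the expected tweet markup is present (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Any <li> carrying a data-item-id attribute, regardless of its classes.
    items = soup.find_all("li", attrs={"data-item-id": True})
    return {
        "length": len(html),
        "has_stream_container": soup.find(id="stream-items-id") is not None,
        "tweet_ids": [li["data-item-id"] for li in items],
    }


# A page without the tweet list yields an empty result.
print(summarize_response("<html><body><p>placeholder</p></body></html>"))

# Usage with the request from the question (not run here):
# import requests
# print(summarize_response(requests.get(query).text))
```

If `has_stream_container` is false for the fetched page, the parsing code is not at fault; the response simply does not contain the list seen in the browser's inspector.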