How do I get the link inside href? Answers

【Question Title】: How do I get the link inside href?
【Posted】: 2020-04-07 13:55:21
【Question】:

I'm building a bot and want to extract the href part, i.e. /VegSpringRoll/status/1205121838302420993, from the twitter.com HTML below:

<a class="css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0" title="9:46 PM · Dec 12, 2019" href="/VegSpringRoll/status/1205121838302420993" dir="auto" aria-label="Dec 12" role="link" data-focusable="true"</a>
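
For reference, the href can be pulled out of the markup itself without a browser. This is a minimal sketch using Python's standard-library html.parser, assuming the snippet above with its missing ">" restored and its class list shortened; StatusLinkParser and extract_status_links are hypothetical names, and the "/status/" filter is an assumption about Twitter's URL layout.

```python
from html.parser import HTMLParser


class StatusLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like tweet permalinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs
        href = dict(attrs).get("href")
        # Assumption: tweet permalinks have the shape /<user>/status/<id>
        if href and "/status/" in href:
            self.links.append(href)


def extract_status_links(html):
    parser = StatusLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Shortened, repaired copy of the snippet from the question
snippet = ('<a class="css-4rbku5 r-qvutc0" title="9:46 PM · Dec 12, 2019" '
           'href="/VegSpringRoll/status/1205121838302420993" role="link">Dec 12</a>')
print(extract_status_links(snippet))
```

This only helps once you have the page source (for example from bot.page_source); it does not replace the Selenium locator fix discussed in the answer.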

My script is:

import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys


class TwitterBot:
    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.bot = webdriver.Firefox()

    def login(self):
        bot = self.bot
        bot.get('https://twitter.com/login')
        time.sleep(1)
        email = bot.find_element_by_class_name('js-username-field.email-input.js-initial-focus')
        password = bot.find_element_by_class_name('js-password-field')
        email.clear()
        password.clear()
        email.send_keys(self.username)
        password.send_keys(self.password)
        password.send_keys(Keys.RETURN)
        time.sleep(1)  # time.sleep() requires an argument

    def like_tweet(self, hashtag):
        bot = self.bot
        bot.get('https://twitter.com/search?q=%23' + hashtag + '&src=type')
        time.sleep(1)
        for i in range(1, 10):
            bot.execute_script('window.scrollTo(0,document.body.scrollHeight)')  # this only scrolls once
            time.sleep(1)

            tweets = bot.find_elements_by_class_name('css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0')
            links = [elem.get_attribute('href') for elem in tweets]
            print(links)

Everything works fine up to the tweets part.

But nothing gets printed. Can anyone help?

【Comments】:

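Does tweets contain anything? What is this bot? What kind of object is it?
First check what you actually get in tweets. Some pages use different, randomly generated class names every time the page is (re)loaded.
The HTML looks invalid.
Hi @Reznik and furas, I've updated the OP and pasted the whole script there. DebanjanB, it was copied directly from Twitter.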
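
If the css-*/r-* class names really are regenerated, a locator that depends on the link's URL instead of its classes is more robust. A minimal sketch, assuming tweet permalinks contain "/status/"; collect_status_links and STATUS_LINK_SELECTOR are hypothetical names. It passes "css selector", the string value of Selenium's By.CSS_SELECTOR, so the function itself does not need to import selenium.

```python
# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"

# Hypothetical locator: any <a> whose href contains "/status/".
# It relies on the URL shape, not on the generated class names.
STATUS_LINK_SELECTOR = 'a[href*="/status/"]'


def collect_status_links(driver):
    """Return the href of every matching link currently in the DOM."""
    elements = driver.find_elements(CSS_SELECTOR, STATUS_LINK_SELECTOR)
    return [el.get_attribute("href") for el in elements]
```

With the script above this would be called as collect_status_links(self.bot) inside the scroll loop; the selector may also match other links (quoted tweets, for example), so filtering may still be needed.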

Tags: python selenium twitter bots

【Solution 1】:

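Selenium does not allow compound class names: find_elements_by_class_name expects a single class name, so passing several space-separated classes won't find the element. Use a CSS selector or XPath instead. The following code should work: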

tweets = bot.find_elements_by_css_selector('.css-4rbku5.css-18t94o4.css-901oao.r-1re7ezh.r-1loqt21.r-1q142lx.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-3s2u2q.r-qvutc0')
links = [elem.get_attribute('href') for elem in tweets]
print(links)
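
Rather than inserting the dots by hand, the selector in the answer can be derived from the class attribute as copied from the page. A minimal sketch with two hypothetical helpers: class_string_to_css builds the compound selector, and href_path strips the scheme and host, since Selenium's get_attribute('href') typically returns the resolved absolute URL (https://twitter.com/...) while the question asks only for the path.

```python
from urllib.parse import urlparse


def class_string_to_css(class_attr, tag=""):
    """'a b c' -> '.a.b.c' (or 'tag.a.b.c' if a tag name is given)."""
    # split() with no argument also collapses repeated whitespace
    return tag + "".join("." + name for name in class_attr.split())


def href_path(url):
    """Return only the path part of an absolute URL."""
    return urlparse(url).path


selector = class_string_to_css("css-4rbku5 css-18t94o4 r-qvutc0")
print(selector)
print(href_path("https://twitter.com/VegSpringRoll/status/1205121838302420993"))
```

In the script, selector would be passed to bot.find_elements_by_css_selector(...), and href_path applied to each entry of links.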

Please read this discussion for more information.

【Comments】:

Thanks @Naeem, could you elaborate? I googled it but still don't get it. I did find an answer like this one, stackoverflow.com/questions/33155454/…, but I don't see how to apply it to my case.
I rewrote it as: tweets = bot.find_element_by_xpath('//*[@class="css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0"]'), and it raises the error 'FirefoxWebElement' object is not iterable.
@yts61 I gave you code that uses a CSS selector. Did you try it?
Unfortunately it didn't work, but never mind, I found a solution!
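
The "not iterable" error comes from find_element_by_xpath (singular), which returns one WebElement rather than a list; find_elements_by_xpath (plural) returns a list. Note also that @class="..." only matches when the whole attribute string is identical, including order and spacing. A minimal sketch of an XPath 1.0 builder that matches each class as a separate token; xpath_has_classes is a hypothetical helper.

```python
def xpath_has_classes(class_attr, tag="*"):
    """Build an XPath 1.0 expression matching elements that carry every
    class in class_attr, in any order, even if they have extra classes."""
    # Padding @class with spaces makes ' name ' match whole tokens only
    conditions = [
        "contains(concat(' ', normalize-space(@class), ' '), ' {} ')".format(name)
        for name in class_attr.split()
    ]
    return "//{}[{}]".format(tag, " and ".join(conditions))


print(xpath_has_classes("css-4rbku5 r-qvutc0", tag="a"))
```

The result would be passed to bot.find_elements_by_xpath(...), whose return value can be iterated in the list comprehension from the question.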