【Question title】:Is there any alternative for \ in an f-string in Python?
【Posted】:2021-02-15 17:14:48
【Question description】:

I'm scraping this website, https://www.americanexpress.com/in/credit-cards/payback-card/, using Beautiful Soup and Python.

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')

details = []

for span in soup.select(".why-amex__subtitle span"):
    details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True)}')

print(details)

Output:

['EARN POINTS: Earn multiple Points from more than 50 PAYBACK partners2and 2 PAYBACK Points from American\xa0Express PAYBACK Credit\xa0Card for every Rs.\xa0100 spent', 'WELCOME GIFT: Get Flipkart voucher worth Rs. 7503on taking 3 transactions within 60 days of Cardmembership', 'MILESTONE BENEFITS: Flipkart vouchers4worth Rs. 7,000 on spending Rs. 2.5 lacs in a Cardmembership yearYou will earn a Flipkart voucher4worth Rs. 2,000 on spending Rs. 1.25 lacs in a Cardmembership year. Additionally, you will earn a Flipkart voucher4worth Rs. 5,000 on spending Rs. 2.5 lacs in a Cardmembership year.']

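As you can see in the output, there are \xa0 characters that I want to remove from the strings.

I tried using replace, but it doesn't work inside the f-string because the expression contains a \.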

details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True).replace("\xa0","")}')

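Is there any other way to get around this?

Any help is much appreciated!

【Question comments】:

  • This is what you're looking for: stackoverflow.com/questions/10993612/…
  • Reopened. The supposed duplicate doesn't work inside an f-string and doesn't address f-strings.
  • @smitpatel No, it doesn't answer my question. I'm looking for a solution for my existing code, which uses an f-string.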

Tags: python web-scraping beautifulsoup string-formatting scrapinghub


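【Solution 1】:

You can use unicodedata to get rid of the \xa0 characters; NFKD normalization turns each one into a regular space. It won't run when placed inside the f-string, but this will: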

from unicodedata import normalize
from urllib.request import urlopen

from bs4 import BeautifulSoup

link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')

details = []

for span in soup.select(".why-amex__subtitle span"):
    a = normalize('NFKD', span.get_text(strip=True))
    b = normalize('NFKD',span.find_next("span").get_text(strip=True))
    details.append(f'{a}: {b}')

print(details)
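
To see what the normalization step does without hitting the network, here is a minimal sketch that applies it to a fragment copied from the output in the question. The variable names raw and clean are only for illustration. Note that NFKD is broader than a plain replace: it also decomposes other compatibility characters (accented letters, ligatures and so on), which may or may not be what you want.

```python
from unicodedata import normalize

# Fragment copied from the scraped output shown in the question.
raw = "American\xa0Express PAYBACK Credit\xa0Card for every Rs.\xa0100 spent"

# NFKD compatibility decomposition maps U+00A0 (no-break space) to U+0020.
clean = normalize("NFKD", raw)

print(repr(clean))
```

If only the no-break spaces should change and everything else must stay byte-for-byte identical, a targeted str.replace is probably the safer choice.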

【Comments】:

    【Solution 2】:

    This may be a temporary workaround: since .replace("\xa0","") doesn't work inside the f-string, do the replacement outside it first:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
    html = urlopen(link)
    soup = BeautifulSoup(html, 'lxml')
    
    details = []
    
    for span in soup.select(".why-amex__subtitle span"):
    
        element = span.get_text(strip=True).replace("\xa0","")
        next_element = span.find_next("span").get_text(strip=True).replace("\xa0","")
        details.append(f'{element}: {next_element}')
    
    print(details)
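
If the replace really has to stay inside the f-string, the backslash itself can be moved out of the expression. The sketch below shows two ways, assuming hypothetical helper names (NBSP, describe, describe_chr) that are not part of the original code. It also swaps in a space instead of the empty string used above, so that "American\xa0Express" does not collapse into "AmericanExpress"; that substitution is my assumption about the desired result.

```python
# Hypothetical constant: the escape sequence lives outside the f-string,
# so the expression inside the braces contains no backslash.
NBSP = "\xa0"


def describe(title: str, body: str) -> str:
    """Format one entry, replacing no-break spaces inside the f-string."""
    return f'{title}: {body.replace(NBSP, " ")}'


def describe_chr(title: str, body: str) -> str:
    """Same result, using chr(0xA0) instead of a named constant."""
    return f'{title}: {body.replace(chr(0xA0), " ")}'


print(describe("EARN POINTS", "Rs.\xa0100 spent"))
print(describe_chr("EARN POINTS", "Rs.\xa0100 spent"))
```

As far as I know, the restriction on backslashes inside f-string expressions was lifted in Python 3.12 (PEP 701), so the original one-liner from the question should run unchanged there; the approaches above matter for earlier versions.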
    

    【Comments】:
