【Posted】: 2021-02-22 22:36:00
【Question】:
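This question has been asked several times before, but each time the answer is "just add UTF-8" and everything is supposedly fine. As far as I can tell, the case I am dealing with cannot be solved by the UTF-8 hack. My program scrapes data from a website, and that data contains special European characters such as "č, š, ř". After adding encoding="UTF-8" the error goes away, but the resulting CSV file contains completely garbled characters where the special characters should be. That corrupts the whole file and makes it unusable.
I have not been able to find a solution online and I don't know how to handle this. I need these special characters written to the file as they are. One more requirement: the script has to be cross-platform. I don't want to make it Windows-specific just to "get rid of the error".
Here is my code: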
with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)
with open('ScrapedContent.csv', 'w+', newline='') as write:
    writer = csv.writer(write)
    for link in link_list:
        driver.get(', '.join(link))
        title = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.page-title span.text.ng-binding")))
        offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.switcher.ng-binding.ng-scope span.ng-binding.ng-scope")))
        address = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "tr.c-aginfo__table__row td.ng-binding")))
        try:
            wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "button.value.link.ng-binding.ng-scope"))).click()
            phone_number = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.phone.ng-binding")))
        except TimeoutException:
            pass
        try:
            wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "button.value.link.ng-binding"))).click()
            email = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.value.link.ng-binding")))
        except TimeoutException:
            pass
        try:
            phone_number = phone_number.text
        except AttributeError:
            phone_number = ""
        try:
            email = email.text
        except AttributeError:
            email = ""
        print(title.text, " ", offers.text, " ", address.text, " ", phone_number, " ", email)
        writer.writerow([title.text, offers.text, address.text, phone_number, email])
driver.quit()
I can't find anything in the code that would cause this. Any suggestions on how to fix it would be much appreciated!
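One way to narrow this down is to check whether the file itself is damaged or only displayed wrongly. The sketch below is a guess at the cause, not something the post confirms: it assumes the CSV is valid UTF-8 but is being opened by a viewer (for example a spreadsheet application) that falls back to a legacy code page when no byte order mark is present. It writes with the standard `utf-8-sig` codec, which prepends a BOM, and reads the file back to confirm the characters survive. The sample rows and the helper names `write_rows` and `read_rows` are hypothetical stand-ins for the scraped data.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the scraped text.
rows = [["Průměrná nabídka", "č š ř", "Žižkov"]]


def write_rows(path, rows, encoding="utf-8-sig"):
    """Write rows as CSV with an explicit encoding instead of the platform default."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8-sig"):
    """Read the CSV back with the same explicit encoding (the BOM is stripped)."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "ScrapedContent.csv")
write_rows(path, rows)
round_trip = read_rows(path)

# Inspect the raw bytes: a UTF-8 BOM is the three bytes EF BB BF.
with open(path, "rb") as f:
    has_bom = f.read(3) == b"\xef\xbb\xbf"

print(round_trip == rows, has_bom)
```

If the round trip succeeds, the data on disk is intact and the garbling comes from how the file is opened afterwards. Passing `encoding` explicitly to `open()` also keeps the behaviour the same across platforms, since the default otherwise depends on the system locale.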
【Discussion】:
Tags: python python-3.x selenium selenium-webdriver geckodriver