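【Question title】: Python: how do I split a string on a special character when an encoding error is returned?
【Posted】: 2015-03-13 03:37:44
【Question】:

I am using the Selenium web driver to parse all the text from a Facebook profile, i.e. data mining. I need to split the text on a special character, but I get an error when I try. I don't know why, because I encode the character before using it, yet the error is still returned.

The character I am trying to search for is '·'

If I split on this character, each post will be split into lines.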

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get("https://www.facebook.com/userprofilelink")
inputEmail = driver.find_element_by_id("email")
inputEmail.send_keys("fbemail")
inputPass = driver.find_element_by_id("pass")
inputPass.send_keys("fbpasswd")
inputPass.submit()
page_text = (driver.page_source).encode('utf-8')
soup = BeautifulSoup(page_text)
parse_data = soup.get_text().encode('utf-8').split('Name how it appears on post John Doe')
latest_message = parse_data[3]

This is where my error occurs. I get:

SyntaxError: Non-ASCII character '\xc2' in file  C:\Users\Administraor\workspace\NagioPlugins\selinium_test.py on line 19, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Code:

search_string = ('·').encode('utf-8')
latest_message = parse_data[3].split(search_string)
print latest_message
driver.close()

print latest_message

【Comments】:

Tags: python python-2.7 selenium


【Solution 1】:

Figured it out: I had to set the script's encoding to UTF-8 by adding a coding declaration at the top of the file.

#!/usr/bin/python
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get("https://www.facebook.com/fbusername")
inputEmail = driver.find_element_by_id("email")
inputEmail.send_keys("fbemail")
inputPass = driver.find_element_by_id("pass")
inputPass.send_keys("fbpasswd")
inputPass.submit()
page_text = (driver.page_source).encode('utf-8')
soup = BeautifulSoup(page_text)
parse_data = soup.get_text().encode('utf-8').split('·')
for i,v in enumerate(parse_data):
    print i,v

parse_data = soup.get_text().encode('utf-8').split('First Last')
for i,v in enumerate(parse_data):
    print i,v

latest_message = parse_data[4]
latest_message = parse_data[4].split('·')

driver.close()
print latest_message
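
For context, the '\xc2' in the error message is the first byte of the UTF-8 encoding of '·' (U+00B7, bytes 0xC2 0xB7). Python 2 treats source files as ASCII unless a coding declaration is present, which is why the literal itself triggered the SyntaxError before any of the code ran. As a rough alternative sketch, the character can also be written as a Unicode escape and the split done on decoded text rather than on encoded bytes. The names MIDDLE_DOT and split_posts and the sample text below are made up for illustration and are not part of the original script:

```python
# A sketch only: MIDDLE_DOT, split_posts and the sample text are illustrative.
MIDDLE_DOT = u"\u00b7"  # the '·' separator, written as an escape so the file stays ASCII


def split_posts(text, author):
    """Split page text into chunks by author name, then split each chunk on the middle dot."""
    # Accept either UTF-8 bytes (e.g. the result of .encode('utf-8')) or text.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    posts = []
    # Everything before the first occurrence of the author name is skipped.
    for chunk in text.split(author)[1:]:
        fields = [field.strip() for field in chunk.split(MIDDLE_DOT)]
        posts.append([field for field in fields if field])
    return posts


sample = u"header John Doe Hello world \u00b7 Like \u00b7 Comment John Doe Second post \u00b7 Share"
for post in split_posts(sample, u"John Doe"):
    print(post)
```

With the page text from the answer, this would presumably be called on soup.get_text() directly, without the .encode('utf-8') step, so that both the text and the separator are Unicode when split.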

【Discussion】:
