【Question title】: NoneType object has no attribute 'encode' (Web Scraping)
【Posted】: 2016-05-15 14:34:53
【Question description】:

I get the error

'NoneType' object has no attribute 'encode'

when I run this code:

url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})

fobj = open(r'D:\Scraping\parveen_urls.txt', 'w')

for getting in url:
    fobj.write(getting.string.encode('utf8'))

But when I use find instead of findAll, I get one URL. How can I get all the URLs from the object with findAll?

【Question comments】:

What if you use .text instead of .string?
@alecxe Yes, it works, but can you tell me why?

Tags: python web-scraping beautifulsoup

【Solution 1】:

I found that the problem was caused by null data.

I fixed it by filtering out the null data.
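
One way to read "filtering out the null data" is to skip every value that is None before calling encode. A minimal sketch, assuming the values have already been pulled out of the tags into a list; `encode_non_null` and `extracted` are hypothetical names, not part of the original code:

```python
def encode_non_null(values):
    """Encode each string as UTF-8, skipping None entries."""
    return [value.encode('utf8') for value in values if value is not None]


# Hypothetical data: None stands for a tag whose .string was None
extracted = ['http://teste.com', None, 'http://teste2.com']
encoded = encode_non_null(extracted)
print(encoded)
```

Note that this avoids the exception by silently dropping every element that has no single string, so some text may be lost.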

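【Discussion】:

【Solution 2】:

'NoneType' object has no attribute 'encode'

You are using .string. If a tag has more than one child, .string will be None (docs):

If a tag's only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child.

Use .get_text() instead.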
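
The difference can be seen on a small document. This is only a sketch, assuming Beautiful Soup 4 (`bs4`) with the built-in `html.parser`; the markup is made up for illustration, with a second inner div that has several children:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second inner div contains text plus a <b> tag
html = ('<div class="entry-content">'
        '<div>http://teste.com</div>'
        '<div>http://<b>teste2</b>.com</div>'
        '</div>')
soup = BeautifulSoup(html, 'html.parser')
inner = soup.find('div', attrs={'class': 'entry-content'}).find_all('div')

# .string is None for the div with several children
strings = [tag.string for tag in inner]
# get_text() joins all the text inside the tag, so it is never None
texts = [tag.get_text() for tag in inner]

print(strings)
print(texts)
```

Calling `.encode('utf8')` on the second entry of `strings` would raise the reported AttributeError, while every entry of `texts` is a plain string.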

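【Discussion】:

【Solution 3】:

Below are two examples and one possible solution:

Example 1 shows a working example.
Example 2 shows a non-working example that fails with a NoneType error like the one you reported.
Solution shows a possible fix.

Example 1: the HTML has the expected div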

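# Assumes Beautiful Soup 4 (bs4)
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><div class="entry-content"><div>http://teste.com</div>',
       '<div>http://teste2.com</div></div></body>',
       '</html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')
# Select the inner divs that have no class attribute
url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})
# Binary mode, because encode() returns bytes on Python 3
fobj = open(r'.\parveen_urls.txt', 'wb')
for getting in url:
    fobj.write(getting.string.encode('utf8'))
fobj.close()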

Example 2: the HTML does not contain the expected div

# Assumes Beautiful Soup 4 (bs4)
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><div class="entry"><div>http://teste.com</div>',
       '<div>http://teste2.com</div></div></body>',
       '</html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')

# The error is raised on the next line: the first find() matches nothing
# and returns None, so calling findAll on that None raises
# AttributeError: 'NoneType' object has no attribute 'findAll'
url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})
fobj = open(r'.\parveen_urls2.txt', 'wb')
for getting in url:
    fobj.write(getting.string.encode('utf8'))

Possible solution:

# Assumes Beautiful Soup 4 (bs4)
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><div class="entry"><div>http://teste.com</div>',
       '<div>http://teste2.com</div></div></body>',
       '</html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')
url = soup.find('div', attrs={"class": "entry-content"})

# Deal with documents that do not have the expected HTML structure
if url is not None:
    url = url.findAll('div', attrs={"class": None})
    # Binary mode, because encode() returns bytes on Python 3
    fobj = open(r'.\parveen_urls2.txt', 'wb')
    for getting in url:
        fobj.write(getting.string.encode('utf8'))
    fobj.close()
else:
    print("The html source does not comply with expected structure")
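
The None check above can be combined with the .get_text() suggestion from Solution 2, so that neither a missing container nor a div with several children raises an error. A rough sketch, assuming Beautiful Soup 4 (`bs4`); `extract_urls`, `good` and `bad` are hypothetical names used only for illustration:

```python
from bs4 import BeautifulSoup


def extract_urls(html):
    """Return the text of each class-less div inside div.entry-content."""
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.find('div', attrs={'class': 'entry-content'})
    if container is None:
        # Unexpected structure: there is nothing to extract
        return []
    # get_text() returns a string even when a tag has several children
    return [div.get_text()
            for div in container.find_all('div', attrs={'class': None})]


# Hypothetical inputs: one with the expected container, one without it
good = ('<div class="entry-content"><div>http://teste.com</div>'
        '<div>http://<b>teste2</b>.com</div></div>')
bad = '<div class="entry"><div>http://teste.com</div></div>'

print(extract_urls(good))
print(extract_urls(bad))
```

Returning an empty list for the unexpected structure is a design choice made here; printing a message as in the solution above, or raising an exception, would work just as well.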

【Discussion】: