Python Urllib KeyError: Answers

【Question title】: Python Urllib keyerror
【Posted】: 2016-02-11 03:23:29
【Question】:

I want to count all the words in the text at a specific URL:

import urllib.request
url = 'http://www.py4inf.com/code/romeo.txt'
fhand = urllib.request.Request(url)
resp = urllib.request.urlopen(fhand)
counts = dict()
for line in resp:
    words = line.split()
    print (words)
    for word in words:
        counts[word] = counts[word] +1
print (counts)

When I run this program, it prints the words of the first line and then fails with an error:

[b'But', b'soft', b'what', b'light', b'through', b'yonder', b'window', b'breaks']

Traceback (most recent call last):
  File "C:/Python/Hello/Exercise.py", line 13
    counts[word] = counts[word] +1
KeyError: b'But'

Why is every word (or every line) prefixed with b'? If I use the same code to read from a file, it works fine.
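For context, the b'' prefix marks a bytes object: iterating over the response returned by urllib.request.urlopen yields raw bytes lines, while a file opened in text mode yields str lines. The sketch below decodes each line before splitting it. It assumes the body is UTF-8 (the real charset may differ), uses io.BytesIO as an offline stand-in for the network response, and names its helper count_words purely for illustration. Decoding only removes the b prefix; it does not by itself fix the KeyError, so the sketch also uses dict.get, one of the approaches discussed in the answers.

```python
import io


def count_words(lines, encoding="utf-8"):
    """Hypothetical helper: count words in an iterable of bytes lines.

    The UTF-8 default is an assumption; a real response may use another charset.
    """
    counts = {}
    for raw in lines:
        # bytes.decode turns b'But soft' into the str 'But soft'
        for word in raw.decode(encoding).split():
            # get() returns 0 for unseen words, so no KeyError is raised
            counts[word] = counts.get(word, 0) + 1
    return counts


# io.BytesIO stands in for urlopen(...): iterating it also yields bytes lines
fake_response = io.BytesIO(b"But soft what light\nthrough yonder window breaks\nBut soft\n")
print(b"But soft".split())  # [b'But', b'soft']
print(count_words(fake_response))
```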

【Question discussion】:

Tags: python urllib keyerror

【Answer 1】:

It seems that every day there is a question whose answer is defaultdict.

import urllib.request
from collections import defaultdict

url = 'http://www.py4inf.com/code/romeo.txt'
fhand = urllib.request.Request(url)
resp = urllib.request.urlopen(fhand)
counts = defaultdict(int) # pass a default type in, int() == 0
for line in resp:
    words = line.split()
    print (words)
    for word in words:
        counts[word] = counts[word] +1
print (counts)

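With a regular dictionary, counts[word] does not exist yet the first time a word is seen, so reading it raises a KeyError. A simple implementation of defaultdict might look something like this: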

class defaultdict(dict):
    def __init__(self, default_type, *args, **kwargs):
        # this allows for the regular dictionary constructor to be used
        dict.__init__(self, *args, **kwargs) 
        self._type = default_type

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            dict.__setitem__(self, key, self._type())
            return dict.__getitem__(self, key)

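I'm sure there are better ways to do this, but it should behave in roughly the same way. In counts[word] = counts[word] + 1, the right-hand side calls the overridden __getitem__, which stores and returns the default value when the key is missing; the inherited dict.__setitem__ then stores the incremented value.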
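One cleaner option is the __missing__ hook: when a dict subclass defines __missing__, dict.__getitem__ calls it for absent keys instead of raising KeyError, which is also how collections.defaultdict behaves. The minimal sketch below uses the hypothetical class name DefaultDict to avoid shadowing the standard one; only the [] lookup triggers __missing__, so `in` and .get() still see the dict as it really is.

```python
class DefaultDict(dict):
    """Hypothetical, minimal stand-in for collections.defaultdict."""

    def __init__(self, default_factory, *args, **kwargs):
        # Pass any remaining arguments to the regular dict constructor
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when key is absent
        value = self.default_factory()
        self[key] = value
        return value


counts = DefaultDict(int)
for word in [b"But", b"soft", b"But"]:
    counts[word] = counts[word] + 1
print(counts)  # {b'But': 2, b'soft': 1}
```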

【Discussion】:

【Answer 2】:

You are trying to increment a key before it exists. For example:

counts = {}
counts["test"] = counts["test"] + 1 # counts["test"] does not exist...

Because "test" is not in counts yet, reading counts["test"] raises a KeyError.
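A minimal, self-contained reproduction of that failure: the right-hand side is evaluated before the assignment, so the lookup fails and nothing is ever stored.

```python
counts = {}

try:
    # The right-hand side reads counts["test"] before anything has been stored
    counts["test"] = counts["test"] + 1
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'test'

# The assignment never ran, so the dict is still empty
print(counts)  # {}
```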

The simple fix is to check whether the key is already there, and assign 1 if it is not:

import urllib.request
url = 'http://www.py4inf.com/code/romeo.txt'
fhand = urllib.request.Request(url)
resp = urllib.request.urlopen(fhand)
counts = dict()
for line in resp:
    words = line.split()
    print (words)
    for word in words:
        counts[word] = counts[word]+1 if word in counts else 1
print (counts)

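【Discussion】:

【Answer 3】:

I found the problem. Although I declared counts as a dictionary, I was adding to it as if it were a list.

For the dictionary, I tried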

counts[word] = counts.get(word,0) +1

and it worked.

【Discussion】:

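This works because .get does not raise a KeyError: when the dictionary does not contain the key, it returns its second argument, default, or None if no second argument is passed.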
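A short sketch of dict.get's behaviour with and without a default. Unlike defaultdict, get never inserts the missing key; only the explicit assignment stores a value.

```python
counts = {}

print(counts.get("missing"))     # None: no default passed
print(counts.get("missing", 0))  # 0: the default is returned
print(counts)                    # {}: get never inserts the key

for word in ["But", "soft", "But"]:
    # get returns 0 for a new word, so its first occurrence stores 1
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'But': 2, 'soft': 1}
```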