【Question Title】: How to make a histogram from a list of strings in Python?
【Posted】: 2015-04-09 17:40:47
【Question】:

I have a list of strings:

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

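I want to make a histogram that shows the frequency distribution of the letters. I can build a list containing the count of each letter with the following code: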

from itertools import groupby
b = [len(list(group)) for key, group in groupby(a)]

How do I make the histogram? My list a may contain a million such elements.

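【Comments】:

from collections import Counter; histogram = Counter(text)
So what is a histogram to you?
First of all, you should use Counter ... groupby will fail you on ['a','a','b','b','a'] (among other things).
Possible duplicate of "Making a histogram of string values in python".
By the way, what you want is a bar chart, not a histogram.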

Tags: python string histogram

【Solution 1】:

Take a look at matplotlib.pyplot.bar. There is also numpy.histogram, which is more flexible if you want wider bins.
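A minimal sketch of the matplotlib.pyplot.bar route, assuming the counts are first tallied with collections.Counter. The non-interactive "Agg" backend and the output file name letter_counts.png are assumptions made so the example runs without a display; they are not part of the original answer.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

counts = Counter(a)
labels = sorted(counts)                        # one bar per distinct string
heights = [counts[label] for label in labels]  # bar height = frequency

positions = list(range(len(labels)))
fig, ax = plt.subplots()
ax.bar(positions, heights, align="center")
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_ylabel("count")

# Hypothetical file name; use plt.show() instead when a display is available.
fig.savefig("letter_counts.png")
```

Because the data is counted before plotting, only one bar per distinct string is drawn, no matter how long the input list is.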

【Comments】:

【Solution 2】:

Don't use groupby() (which requires your input to be sorted); use collections.Counter() instead. It doesn't have to create intermediate lists just to count the inputs:

from collections import Counter

counts = Counter(a)
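To illustrate why groupby() needs sorted input, here is a small sketch using the unsorted list mentioned in the comments under the question. The variable names are only for illustration.

```python
from collections import Counter
from itertools import groupby

unsorted = ['a', 'a', 'b', 'b', 'a']

# groupby only merges *adjacent* equal items, so 'a' appears as two separate runs.
run_lengths = [(key, len(list(group))) for key, group in groupby(unsorted)]
print(run_lengths)    # [('a', 2), ('b', 2), ('a', 1)]

# Sorting first makes groupby correct, at the cost of an O(n log n) sort.
sorted_counts = [(key, len(list(group))) for key, group in groupby(sorted(unsorted))]
print(sorted_counts)  # [('a', 3), ('b', 2)]

# Counter tallies in a single pass, regardless of input order.
tally = Counter(unsorted)
print(tally['a'], tally['b'])  # 3 2
```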

You haven't really specified what you consider to be a "histogram". Let's assume you want to do this on the terminal:

width = 120  # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)

for key, size in sorted(counts.items()):
    print('{}: {}'.format(key, int(size * scale) * '*'))

Demo:

>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120  # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
...     print('{}: {}'.format(key, int(size * scale) * '*'))
... 
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************

More sophisticated tools can be found in the numpy.histogram() and matplotlib.pyplot.hist() functions. These do the tallying for you, and matplotlib.pyplot.hist() also gives you graphical output.
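numpy.histogram() bins numeric data, so string input has to be mapped to numbers first. One possible approach (an assumption, not something the answer spells out) is to encode each string as an integer code with numpy.unique(..., return_inverse=True) and then use one bin per code:

```python
import numpy as np

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

# labels: sorted distinct strings; codes: index into labels for every element of a.
labels, codes = np.unique(a, return_inverse=True)

# Bin edges 0, 1, ..., len(labels) give exactly one bin per label.
edges = np.arange(len(labels) + 1)
counts, _ = np.histogram(codes, bins=edges)

for label, count in zip(labels, counts):
    print(label, count)
```

Wider bins (several labels per bin) can be obtained by passing coarser edges, although grouping arbitrary strings that way is only meaningful if their sort order is.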

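【Comments】:

Thank you, Martijin! That's a clever approach, but how do I make a printable chart?
And how would I use numpy.histogram() to solve this? Sorry, I'm not a programmer.
@Gray: To be honest, I don't know, and I don't have the time to find out. The libraries have tutorials; I suggest you go follow them! :-)
Thank you very much for taking the time to answer my question, Martijin!
This is the best solution if you only have the Python standard library at hand. NumPy, Pandas, and matplotlib can be overkill in some cases.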

【Solution 3】:

Very easy with Pandas.

import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')

Note that Counter is already computing the frequencies, so our plot kind is 'bar', not 'hist'.

【Comments】:

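Cool, not confusing! But how do I make a continuous histogram? Do I just change kind = bar to kind = hist?
I have more than 1 million such elements in my list, so I guess a bar chart would have some difficulty showing the frequencies.
@Gray, if you want to smooth it, I suggest kind='area'
Nice, although using a Series object instead of a DataFrame may be even simpler, and it avoids the spurious 0 in the plot: pandas.Series(Counter(a)).plot(kind='bar').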
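On the concern about a million elements: a bar chart gets crowded when there are many distinct values, not when there are many elements. If that turns out to be the problem, one option (an assumption, not something suggested in this thread) is to plot only the most frequent values and lump the rest together. top_n_with_other and the "(other)" label below are hypothetical names for illustration.

```python
from collections import Counter

def top_n_with_other(items, n, other_label="(other)"):
    """Return the n most common (value, count) pairs plus one lumped remainder."""
    counts = Counter(items)
    top = counts.most_common(n)
    rest = sum(counts.values()) - sum(count for _, count in top)
    if rest:
        top.append((other_label, rest))
    return top

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
print(top_n_with_other(a, 3))
# [('e', 5), ('a', 4), ('c', 3), ('(other)', 3)]
```

The resulting pairs can be passed to any of the bar-plotting approaches in this thread.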

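【Solution 4】:

As @notconfusing pointed out above, this can be solved with Pandas and Counter. If for any reason you cannot use Pandas, you can get by with matplotlib alone, using the function in the following code: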

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)

def plot_bar_from_counter(counter, ax=None):
    """
    This function creates a bar plot from a counter.

    :param counter: This is a counter object, a dictionary with the item as the key
     and the frequency as the value
    :param ax: an axis of matplotlib
    :return: the axis with the object in it
    """

    if ax is None:
        fig = plt.figure()
        ax = fig.add_subplot(111)

    # Convert the dict views to lists so they can be indexed under Python 3.
    frequencies = list(counter.values())
    names = list(counter.keys())

    x_coordinates = np.arange(len(counter))
    ax.bar(x_coordinates, frequencies, align='center')

    ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
    ax.xaxis.set_major_formatter(plt.FixedFormatter(names))

    return ax

plot_bar_from_counter(letter_counts)
plt.show()

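This produces a bar chart with one bar per letter.

【Comments】:

【Solution 5】:

A simple and effective way to make a character histogram in Python: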

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Count every character in the file (whitespace and newlines included).
num = Counter()
filename = input("Enter file name: ")  # raw_input() in Python 2
with open(filename, 'r') as f:
    for line in f:
        num.update(line)  # a string is an iterable of its characters

counts = list(num.values())
letters = list(num.keys())

x_coordinates = np.arange(len(letters))
plt.bar(x_coordinates, counts)
plt.xticks(x_coordinates, letters)
plt.show()
print(counts, letters)

【Comments】:

【Solution 6】:

Here is a concise all-pandas approach:

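import pandas as pd

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
# Pass kind as a keyword; recent pandas versions reject it as a positional argument.
pd.Series(a).value_counts().plot(kind='bar')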

【Comments】:

This is the most concise answer. I would generalize it to data_frame.attribute_name.value_counts().plot.bar()
How do I add a title to this plot?
@fireball.1 If you do import matplotlib.pyplot as plt, then you can call plt.title("will add title to current plot")
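As an alternative to plt.title(), the pandas plot call returns a matplotlib Axes object, so the title can also be set on that object directly. A small sketch; the title text, axis labels, "Agg" backend, and output file name are all assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import pandas as pd

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

# Series.plot returns the matplotlib Axes the bars were drawn on.
ax = pd.Series(a).value_counts().plot(kind='bar')
ax.set_title("Letter frequencies")
ax.set_xlabel("letter")
ax.set_ylabel("count")

# Hypothetical file name; use matplotlib.pyplot.show() when a display is available.
ax.figure.savefig("letter_frequencies.png")
```

Working with the returned Axes avoids relying on matplotlib's notion of the "current plot", which matters once more than one figure is open.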

【Solution 7】:

Using numpy

With numpy 1.9 or later:

import numpy as np
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
labels, counts = np.unique(a,return_counts=True)

This can be plotted using:

import matplotlib.pyplot as plt
ticks = range(len(counts))
plt.bar(ticks, counts, align='center')
plt.xticks(ticks, labels)
plt.show()

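【Comments】:

【Solution 8】:

This was asked a while ago, so I'm not sure whether you still need help, but other people might, so here I am. If you are allowed to use matplotlib, I think there is a much simpler solution!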

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

import matplotlib.pyplot as plt
plt.hist(a) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot

This should give you a nice histogram! If you like, you can also make further tweaks to clean up the chart.

【Comments】: