【Question title】: Python: compute the histogram of a set of data
【Posted】: 2016-07-19 13:35:38
【Question】:

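The Python function below computes a histogram of the data using equal-sized bins. I want to get the correct result

[1, 6, 4, 6]

but when I run the code, I get

[7, 12, 17, 17]

which is incorrect. Does anyone know how to fix it?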

# Computes the histogram of a set of data
def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span / num_bins

    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)

    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))

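【Comments】:

  • What makes you think it is incorrect?
  • Your code is not valid Python. Please edit it and fix the indentation.
  • @Tichodroma: Thanks for the edit.
  • @Donkey Kong: I want to get the correct result [1, 6, 4, 6]

Tags: python histogram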


【Answer 1】:

If you just want a histogram, use numpy:

import numpy as np
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4)
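The call above returns a tuple rather than a bare list of counts. A minimal sketch of unpacking it, assuming numpy is installed (the variable names `counts` and `bin_edges` are only illustrative):

```python
import numpy as np

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]

# np.histogram returns a (counts, bin_edges) tuple, not just the counts.
counts, bin_edges = np.histogram(data, bins=4)

print(counts.tolist())  # [1, 6, 4, 6]
print(len(bin_edges))   # 5, one more edge than there are bins
```

Note that numpy treats every bin except the last as half-open, so a value lying exactly on an interior edge is counted in the bin to its right.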

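【Comments】:

【Answer 2】:

You have just two logic errors:

(1) The thresholds are computed from 0 instead of from min(data).

(2) The inner for loop needs a break once the matching bin is found.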

    def histogram(data, num_bins):
      span = max(data) - min(data)
      bin_size = float(span) / num_bins
      thresholds = [0] * num_bins
    
      for i in range(num_bins):
        # Changed: thresholds are measured from min(data), not from 0
        thresholds[i] = min(data) + bin_size * (i+1)
    
      counts = [0] * num_bins
      for datum in data:
        for bin_index, threshold in enumerate(thresholds):
          if datum <= threshold:
            counts[bin_index] += 1
            # Added: stop at the first bin whose threshold covers the datum
            break
      return counts
    
    data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
    print("Correct result:\t" + str([1, 6, 4, 6]))
    print("Your result:\t" + str(histogram(data, num_bins=4)))
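To see how each of the two errors contributes to the wrong output, the sketch below isolates them. The helper `bin_counts` and its `stop_at_first_match` flag are hypothetical names introduced only for this illustration; they are not part of the question or the answer.

```python
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4
bin_size = float(max(data) - min(data)) / num_bins

# Thresholds as in the question (measured from 0) and as in the fix
# (measured from min(data)).
from_zero = [bin_size * (i + 1) for i in range(num_bins)]
from_min = [min(data) + bin_size * (i + 1) for i in range(num_bins)]


def bin_counts(data, thresholds, stop_at_first_match):
    """Count data per bin; stop_at_first_match toggles the break."""
    counts = [0] * len(thresholds)
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                if stop_at_first_match:
                    break
    return counts


print(bin_counts(data, from_zero, False))  # [7, 12, 17, 17]
print(bin_counts(data, from_zero, True))   # [7, 5, 5, 0]
print(bin_counts(data, from_min, True))    # [1, 6, 4, 6]
```

Without the break, every datum increments all bins whose threshold is at or above it, so the counts are cumulative. Adding the break alone still leaves the bins shifted, because the thresholds start at 0 rather than at the smallest value.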
    

【Comments】:

【Answer 3】:

Check the threshold definition and the if statement. This works:

      def histogram(data, num_bins):

          # Find what range the data spans, and use it to calculate the bin size.
          lowest = min(data)
          span = max(data) - lowest
          bin_size = span / float(num_bins)

          # Calculate the num_bins + 1 bin edges, starting from the smallest value.
          thresholds = [lowest + bin_size * i for i in range(num_bins + 1)]
          # Pin the last edge so rounding error cannot exclude the maximum.
          thresholds[-1] = max(data)

          print(thresholds)
          # Compute the histogram
          counts = [0 for i in range(num_bins)]
          for datum in data:
              # Increment the count of the one bin that the datum falls in
              for bin_index in range(num_bins):
                  if thresholds[bin_index] <= datum <= thresholds[bin_index + 1]:
                      counts[bin_index] += 1
                      break
          return counts
      

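【Comments】:

【Answer 4】:

First, if you just want to histogram your data, numpy provides this. However, you asked how to do it yourself. Your code suggests you lost track of what you were trying to do, so break your function into smaller functions. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better yet use numpy.linspace. Doing so will draw your attention to the fact that you are incrementing relative to 0 rather than min(data), and, if you are lucky, may remind you not to rely on exact floating-point accumulation. So you might end up with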

        def thresholds(xmin, xmax, nbins):
            span = (xmax - xmin) / float(nbins)
            thresholds = [xmin + (i+1)*span for i in range(nbins)]
            thresholds[-1] = xmax
            return thresholds
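The reason for setting the last threshold to xmax explicitly can be seen in a small experiment. This is only a sketch; `accumulated_edges` and `pinned_edges` are hypothetical helper names, and the range 0.0 to 1.0 with 10 bins is chosen purely because it exposes the rounding error.

```python
def accumulated_edges(xmin, xmax, nbins):
    """Upper edges built by repeatedly adding the step (error accumulates)."""
    step = (xmax - xmin) / float(nbins)
    edges, edge = [], xmin
    for _ in range(nbins):
        edge += step
        edges.append(edge)
    return edges


def pinned_edges(xmin, xmax, nbins):
    """Upper edges computed directly, with the last one set to xmax."""
    step = (xmax - xmin) / float(nbins)
    edges = [xmin + (i + 1) * step for i in range(nbins)]
    edges[-1] = xmax
    return edges


print(accumulated_edges(0.0, 1.0, 10)[-1] == 1.0)  # False
print(pinned_edges(0.0, 1.0, 10)[-1] == 1.0)       # True
```

With the accumulated edges, a datum equal to the maximum would compare greater than the last threshold and would not land in any bin.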
        

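Next, you need the bin counts. Again, you could just use numpy.digitize. The important difference from your code is that each datum must increment exactly one bin. You might end up with something like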

        def counts(data, bounds):
            counts = [0] * len(bounds)
            for datum in data:
                bin = min(i for i,bound in enumerate(bounds) if bound >= datum)
                counts[bin] += 1
            return counts
        

Now you are ready:

        def histogram02(data, num_bins):
            xmin = min(data)
            xmax = max(data)
            th = thresholds(xmin, xmax, num_bins)
            return counts(data, th)
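For comparison, here is one possible way to combine the numpy.linspace and numpy.digitize calls mentioned in this answer. It is a sketch under the assumption that numpy is installed; the use of np.bincount to tally the indices is an addition for illustration, not something the answer prescribes.

```python
import numpy as np

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4

# num_bins + 1 evenly spaced edges; linspace includes both endpoints.
edges = np.linspace(min(data), max(data), num_bins + 1)

# With right=True, digitize returns i such that bins[i-1] < x <= bins[i],
# so passing only the upper edges maps each datum to exactly one bin index.
bin_index = np.digitize(data, edges[1:], right=True)

# Tally how many data points landed in each bin.
counts = np.bincount(bin_index, minlength=num_bins)
print(counts.tolist())  # [1, 6, 4, 6]
```

Using right=True mirrors the `bound >= datum` test in the pure-Python counts function, so a value on an interior edge goes to the lower bin.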
        

【Comments】:
