Benford's law program: answers

【Question title】: Benford's law program
【Posted】: 2013-04-27 03:43:47
【Question】:

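I have to write a program that demonstrates Benford's law for two lists of data. I think most of my code is written, but I believe I am missing some small errors. I'm sorry if this is not how the site is meant to be used, but I really need help. Here is my code.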

def getData(fileName):

    data = []
    f = open(fileName,'r')
    for line in f:
        data.append(line)
    f.close()

    return data

def getLeadDigitCounts(data):

    counts = [0,0,0,0,0,0,0,0,0]

    for i in data:
        pop = i[1]
        digits = pop[0]
        int(digits)
        counts[digits-1] += 1

    return counts

def showResults(counts):

    percentage = 0
    Sum = 0
    num = 0
    Total = 0

    for i in counts:
        Total += i

    print"number of data points:",Sum
    print
    print"digit number percentage"
    for i in counts:
        Sum += i
        percentage = counts[i]/float(Sum)
        num = counts[i]
        print"5%d 6%d %f"%(i,num,percentage)


def showLeadingDigits(digit,data):

    print"Showing data with a leading",digit
    for i in data:
        if digit == i[i][1]:
            print i

def processFile(name):

    data = getData(name)
    counts = getLeadDigitCounts(data)
    showResults(counts)

    digit = input('Enter leading digit: ')
    showLeadingDigits(digit, data)

def main():

    processFile('TexasCountyPop2010.txt')
    processFile('MilesofTexasRoad.txt')

main()

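Again, sorry if this is not how I am supposed to use this site. Also, I can only use the programming techniques our professor has shown us, so I would appreciate any advice on cleaning up the code as it is.

Also, here are a few lines from my data.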

Anderson County     58458
Andrews County  14786
Angelina County     86771
Aransas County  23158
Archer County   9054
Armstrong County    1901

【Question comments】:

It would be useful if you posted a few (2-5) lines of the file you are checking.

Tags: python benfords-law

【Answer 1】:

Your error comes from this line:

int(digits)

This does not actually do anything to digits. If you want to convert digits to an integer, you have to reassign the variable:

digits = int(digits)
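
A quick way to see the difference is the small sketch below. It is written for Python 3, and the names still_a_string and counts are only there for illustration. int() returns a new value and leaves the string it was given untouched, so the result has to be stored somewhere.

```python
digits = "5"
int(digits)              # the result is computed and then thrown away
still_a_string = digits  # digits is still the string "5"

digits = int(digits)     # rebinding the name keeps the converted value
counts = [0] * 9
counts[digits - 1] += 1  # now the index arithmetic works
```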

Also, to parse your data correctly, I would do this:

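for line in data:
    # Split off the last whitespace-separated field (the number).
    place, number = line.rsplit(None, 1)
    # The leading digit is the first character of that field.
    digit = int(number[0])
    counts[digit - 1] += 1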
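
Put together as a complete function, the parsing step might look like the sketch below. It is written for Python 3, the name count_leading_digits is hypothetical, and it assumes every non-blank line ends with the number of interest, as in the sample lines from the question.

```python
def count_leading_digits(lines):
    """Return a list of nine counts for leading digits 1 through 9."""
    counts = [0] * 9
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # The last whitespace-separated field is assumed to be the number.
        _place, value = line.rsplit(None, 1)
        lead = int(value[0])
        if lead != 0:  # a leading 0 has no slot in the table
            counts[lead - 1] += 1
    return counts


# The six sample lines posted in the question.
sample = [
    "Anderson County     58458\n",
    "Andrews County  14786\n",
    "Angelina County     86771\n",
    "Aransas County  23158\n",
    "Archer County   9054\n",
    "Armstrong County    1901\n",
]
sample_counts = count_leading_digits(sample)
print(sample_counts)  # [2, 1, 0, 0, 1, 0, 0, 1, 1]
```

Because an open file yields its lines one at a time, the same function could be given the file object directly instead of a list.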

【Comments】:

【Answer 2】:

Let's walk through one cycle of your code, and I think you'll see what the problem is. I'll use this file as the data here:

An, 10, 22
In, 33, 44
Out, 3, 99

Now getData returns (trailing newlines omitted):

["An, 10, 22",
"In, 33, 44",
"Out, 3, 99"]

Now look at the first pass through the loop:

for i in data:
    # i = "An, 10, 22"
    pop = i[1]
    # pop = 'n', the second character of i
    digits = pop[0]
    # digits = 'n', the first character of pop
    int(digits)
    # Error here, but you probably wanted digits = int(digits)
    counts[digits-1] += 1

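Depending on how your data is structured, you need to work out the logic that extracts the number you want from the file. That logic would probably be better placed in the getData function, but that mostly depends on the specifics of your data.

【Comments】:

Ah, I was under the impression that the list would look like [['a','b'],['c','d']]. Is there anything I can do to make the list look like that?
Right, print it out and see what it looks like.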
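
One way to get the nested-list shape asked about above is to split each line while reading it. The sketch below is written for Python 3, the name parse_rows is hypothetical, and it assumes the "name, then number" layout of the sample lines in the question rather than the comma-separated example file used in this answer.

```python
def parse_rows(lines):
    """Turn 'Name   12345' lines into [name, number_text] pairs."""
    rows = []
    for line in lines:
        if line.strip():  # ignore blank lines
            name, value = line.rsplit(None, 1)
            rows.append([name, value])
    return rows


rows = parse_rows(["Anderson County     58458\n", "Archer County   9054\n"])
print(rows)  # [['Anderson County', '58458'], ['Archer County', '9054']]

# With this shape, row[1] is the number text and row[1][0] its first digit.
first_digit = int(rows[0][1][0])
```

With the data in this form, the i[1] and pop[0] indexing in the original loop would pick out the number and its leading digit as intended.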

【Answer 3】:

Just sharing a different (perhaps more step-by-step) piece of code here. It's in Ruby.

The thing is, Benford's Law does not show up when the random data is drawn uniformly from one fixed range. The upper bound of the range you draw from has to vary from draw to draw, or be unbounded.

In other words, say you used a random number generator with a fixed range to draw from, e.g. 1-100. You would certainly end up with a random dataset, but 1 would appear as a first digit about as often as 9 or any other digit.
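
The fixed-range case can be checked without any randomness by tallying the leading digit of every integer in the range. This is a Python 3 sketch, and the range 1 to 999 is an assumption chosen so that every decade is complete. A uniform draw from that range picks each integer with equal probability, so these tallies are proportional to the leading-digit probabilities.

```python
from collections import Counter

# Tally the first digit of every integer from 1 to 999.
tally = Counter(int(str(n)[0]) for n in range(1, 1000))

for digit in range(1, 10):
    print(digit, tally[digit])  # every digit shows up 111 times
```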

The interesting part happens when you let a computer (or nature) decide randomly, on each draw, how large the random number is allowed to be. Then you get a two-level random dataset whose leading digits lean strongly toward the small values, much as Benford's Law predicts. I have written this Ruby code for you to demonstrate it.

Take a look at this bit of code I've put together.
It's a bit WET (repetitive), but I'm sure it'll explain.

dataset = []

999.times do
  random = rand(999)
  dataset << rand(random)
end

startwith1 = []
startwith2 = []
startwith3 = []
startwith4 = []
startwith5 = []
startwith6 = []
startwith7 = []
startwith8 = []
startwith9 = []

dataset.each do |element|
  case element.to_s.split('')[0].to_i
  when 1 then startwith1 << element
  when 2 then startwith2 << element
  when 3 then startwith3 << element
  when 4 then startwith4 << element
  when 5 then startwith5 << element
  when 6 then startwith6 << element
  when 7 then startwith7 << element
  when 8 then startwith8 << element
  when 9 then startwith9 << element
  end
end

a = startwith1.length
b = startwith2.length
c = startwith3.length
d = startwith4.length
e = startwith5.length
f = startwith6.length
g = startwith7.length
h = startwith8.length
i = startwith9.length

sum = a + b + c + d + e + f + g + h + i

p "#{a} times first digit = 1; equating #{(a * 100) / sum}%"
p "#{b} times first digit = 2; equating #{(b * 100) / sum}%"
p "#{c} times first digit = 3; equating #{(c * 100) / sum}%"
p "#{d} times first digit = 4; equating #{(d * 100) / sum}%"
p "#{e} times first digit = 5; equating #{(e * 100) / sum}%"
p "#{f} times first digit = 6; equating #{(f * 100) / sum}%"
p "#{g} times first digit = 7; equating #{(g * 100) / sum}%"
p "#{h} times first digit = 8; equating #{(h * 100) / sum}%"
p "#{i} times first digit = 9; equating #{(i * 100) / sum}%"
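
For readers following along in Python, here is a rough Python 3 port of the same experiment, placed next to the share that Benford's law predicts for each digit, log10(1 + 1/d). The fixed seed, the sample size of 100,000 and all variable names are assumptions made for this sketch, not part of the Ruby code above.

```python
import math
import random

rng = random.Random(0)  # fixed seed so repeated runs match
counts = [0] * 9
for _ in range(100_000):
    upper = rng.randrange(1, 1000)  # random upper bound, 1..999
    value = rng.randrange(upper)    # random value below that bound
    if value > 0:                   # 0 has no leading digit 1-9
        counts[int(str(value)[0]) - 1] += 1

total = sum(counts)
observed = [c / total for c in counts]
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

for d in range(1, 10):
    print(f"{d}: observed {observed[d - 1]:.3f}, Benford {benford[d - 1]:.3f}")
```

Printing both columns side by side makes it easy to judge how closely this two-level sampling tracks the predicted shares, rather than taking the match on faith.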

【Comments】: