【Question title】: Array out of bounds in ID3 algorithm
【Posted】: 2018-04-09 18:08:55
【Question description】:
import numpy as np
training_set = np.array([[0, 1, 0, 1, 0, 1],[0, 0, 0, 1, 0, 0],[0, 0, 0, 0, 1, 0],[1, 0, 1, 0, 1, 0],[0, 1, 1, 1, 0, 1],[0, 1, 0, 0, 1, 1],[1, 1, 1, 0, 0, 0],[1, 1, 1, 1, 0, 1],[0, 1, 1, 0, 1, 0],[1, 1, 0, 0, 0, 1],[1, 0, 0, 0, 1, 0]])

def p(X):
   Fx = X[:,X.shape[1]-1]
   x0= 0
   x1= 0
   for i in range(len(Fx)):
       if Fx[i-1] == 1:
           x0 = x0+1
       else:
           x1 = x1+1
   P0 = x0/len(Fx)
   P1 = x1/len(Fx)      
   return(P0,P1)    

def H(X):
    result = -p(X)[0]*np.log(p(X)[0])-p(X)[1]*np.log(p(X)[1]) #needs to be log2
    print("1 = pure, 0 = unpure 1/2 = decision can be random:  Calculating Entropy: -" + str(p(X)[0]) + "*" + str(np.log(p(X)[0])) + "-" + str(p(X)[1]) + "*" + str(np.log(p(X)[1])) )
    return result

def Q(X,i):
    Xi = X[:,i]
    result0= 0
    result1= 0
    for j in range(len(Xi)):
        if Xi[j] == 1:
            result1 = result1 + len(X[i,:])
        else: result0 = result0 + len(X[i,:])
    result1 = result1/len(X)
    result0 = result0/len(X)
    return(result0,result1) 

def X_column(X,i,v):
    list = X[np.where(X[:,i] == v)]
    return list

def IG(X,i):
    result = H(X)-Q(X,i)[0]*H(X_column(X,i,0))-Q(X,i)[1]*H(X_column(X,i,1))
    return result

# To train a decision tree on a learning set S, we use the following algorithm (ID3):
#    1. Start with the example set S.
#    2. If |{f(x) : (x, f(x)) ∈ S}| = 1, create a leaf with label f(x).
#    3. For i = 1,2,...,n calculate the value IG(S,i).
#    4. Let j be the index of the largest of the calculated values.
#    5. Create a node with label Xj.
#    6. For the subsets
#        S0 = {(x, f(x)) ∈ S : xj = 0}
#        and
#        S1 = {(x, f(x)) ∈ S : xj = 1}
#        run the algorithm recursively (for S ← S0 and S ← S1) and add the new nodes as children of the node with label Xj.

tree = np.array([])
def ID3(S, recursion = 0):
    result = np.array([])
    recursion += 1
    rows = S.shape[0]
    columns = S.shape[1]      
    if S[:,columns-1].all() == True:
        leaf_label = S[0,columns-1]
        tree.put(recursion, leaf_label)
        return 
    for i in range(rows):
        ig = IG(S,i)
        result.put(i, ig)
    j = result.max()
    leaf_label2 = S[0,j] 
    tree.put(recursion, leaf_label2)
    S0 = X_column(S,i,0)
    S1 = X_column(S,i,1)
    ID3(S0,recursion+1)    
    ID3(S1,recursion+2)    
    return tree

ID3(training_set)

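result.put(i, ig) IndexError: index 0 is out of bounds for axis 0 with size 0

I'm used to JavaScript, so I can't work out how to put something into an array without presetting its size (I don't know how big the decision tree will be). Is there a function for this, or a better solution I haven't heard of?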

【Question discussion】:

    Tags: python python-3.x numpy machine-learning id3


    【Solution 1】:

    Two problems found so far.

    1. np.put

      Reproducing the error:

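      import numpy as np
      
      li = np.array([])
      
      li.put(0, 123)  # raises the IndexError from your description: the array is empty, so index 0 does not exist
      
      # Correct way of appending: np.append returns a new, longer array
      
      li = np.array([])
      
      li = np.append(li, 123)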
      
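    2. Function Q(X,i): index out of range

      ID3 passes i as a row number of X (it loops over range(rows)), but Q uses it as a column index, so the index goes out of range as soon as i reaches the number of columns.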

      def Q(X,i):
          Xi = X[:,i] # i is used as a column index.
          ...
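
Both points above can be combined in one rough sketch: the gains are collected in an ordinary Python list, which grows on `append` much like a JavaScript array, and the loop runs over feature columns rather than rows. This is not the asker's code. The helper names `entropy`, `information_gain` and `best_feature` are hypothetical, and the sketch assumes binary 0/1 features, the label in the last column, and plain lists of rows (for example `training_set.tolist()`).

```python
import math

def entropy(labels):
    """Base-2 entropy of a list of 0/1 labels; an empty or pure list gives 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for v in (0, 1):
        p = labels.count(v) / n
        if p > 0:  # skip p == 0 so log2(0) is never evaluated
            h -= p * math.log2(p)
    return h

def information_gain(rows, i):
    """Entropy of the labels minus the weighted entropy after splitting on column i."""
    gain = entropy([r[-1] for r in rows])
    for v in (0, 1):
        subset = [r[-1] for r in rows if r[i] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def best_feature(rows):
    """Return (index of the feature with the largest gain, list of all gains)."""
    n_features = len(rows[0]) - 1  # assumption: the last column holds the label
    gains = []                     # no preset size needed; the list grows on append
    for i in range(n_features):    # iterate over columns, not rows
        gains.append(information_gain(rows, i))
    j = max(range(n_features), key=lambda k: gains[k])  # an index, not the max value
    return j, gains
```

Note that `j` here is the position of the largest gain. In the question, `result.max()` returns the gain value itself, which cannot be used as a column index.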
      

    【Discussion】:
