【Question title】: Converting a long list of sequences of 0's and 1's into a numpy array or pandas dataframe
【Posted】: 2019-02-15 09:26:43
【Question】:

I have a long list of sequences (say, each of length 16) made up of 0s and 1s. For example:

s = ['0100100000010111', '1100100010010101', '1100100000010000', '0111100011110111', '1111100011010111']

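Now I want to treat each bit as a feature, so I need to convert this into a numpy array or a pandas dataframe. To do that, I would need to separate all the bits in each sequence with commas, which is impractical for a large dataset.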

So what I tried was to generate all the positions in the string:

slices = []
for j in range(len(s[0])):
    slices.append((j,j+1)) 

print(slices)
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 16)]


new = []
for i in range(len(s)):
    seq = s[i]
    for j in range(len(s[i])):
        ## I have tried both of these LOC but couldn't figure out
        ## how it could be done
        new.append([s[slice(*slc)] for slc in slices])
        new.append(s[j:j+1])
print(new)

Expected output:

new = [[0,1,0,0,1,0,0,0,0,0,0,1,0,1,1,1], [1,1,0,0,1,0,0,0,1,0,0,1,0,1,0,1], [1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0], [0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1], [1,1,1,1,1,0,0,0,1,1,0,1,0,1,1,1]]

Thanks in advance!!

【Comments】:

    Tags: python string python-3.x pandas numpy


    【Solution 1】:

    Use the np.array constructor with a list comprehension:

    np.array([list(row) for row in s], dtype=int)
    

    array([[0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
           [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
           [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
           [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1],
           [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]])
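
    Since the question also asks about pandas, here is a minimal self-contained sketch that builds the same array and wraps it in a DataFrame. The column names `bit_0` through `bit_15` are an assumption made for illustration; they do not come from the question.

```python
import numpy as np
import pandas as pd

s = ['0100100000010111', '1100100010010101', '1100100000010000',
     '0111100011110111', '1111100011010111']

# Split each string into single characters, then let NumPy parse them as integers.
arr = np.array([list(row) for row in s], dtype=int)

# One column per bit; the "bit_<i>" labels are hypothetical names for the features.
df = pd.DataFrame(arr, columns=[f"bit_{i}" for i in range(arr.shape[1])])

print(arr.shape)
print(df.head())
```

    Each row of `df` corresponds to one input string and each column to one bit position, so the frame can be used as a feature matrix.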
    

    【Comments】:

    • @user9905807 If this answer helped you, please consider accepting it!
    【Solution 2】:

    A one-liner, with no for loop:

    np.array(s).view('<U1').astype(int).reshape(len(s), -1)
    
    array([[0, 1, 0, ..., 1, 1, 1],
           [1, 1, 0, ..., 1, 0, 1],
           [1, 1, 0, ..., 0, 0, 0],
           [0, 1, 1, ..., 1, 1, 1],
           [1, 1, 1, ..., 1, 1, 1]])
    

    It is still a bit slower than the list comprehension, though.

    【Comments】:
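
    The one-liner relies on every string having the same length: np.array(s) stores them as fixed-width unicode, and view('<U1') reinterprets that buffer as single characters. The sketch below spells out the intermediate steps and times both approaches with timeit. The repeat factor of 2000 and the helper names `with_view` and `with_list` are arbitrary choices for illustration; timings depend on the machine and NumPy version, so no numbers are claimed here.

```python
import timeit
import numpy as np

s = ['0100100000010111', '1100100010010101', '1100100000010000',
     '0111100011110111', '1111100011010111']

# Step 1: fixed-width string array, viewed as a flat array of single characters.
chars = np.array(s).view('<U1')
# Step 2: parse the characters as integers and restore one row per input string.
via_view = chars.astype(int).reshape(len(s), -1)

# Reference result from the list-comprehension approach.
via_list = np.array([list(row) for row in s], dtype=int)
assert (via_view == via_list).all()

# A larger input for timing (hypothetical size, chosen only for illustration).
big = s * 2000

def with_view():
    return np.array(big).view('<U1').astype(int).reshape(len(big), -1)

def with_list():
    return np.array([list(row) for row in big], dtype=int)

t_view = timeit.timeit(with_view, number=3)
t_list = timeit.timeit(with_list, number=3)
print(f"view: {t_view:.4f}s  list comprehension: {t_list:.4f}s")
```

    If the strings differ in length, the view-based version will not produce a clean rectangular result, so checking the lengths first is a reasonable precaution.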
