[Question title]: Python - Convert dataframe elements to np.arrays
[Posted]: 2017-10-24 14:07:23
[Question]:

I have a dataframe df3 in which one column has the following format:

df3
Out[196]: 
                                              Utterances
0                   23825 141520 79229147 135 1951822935
1                                       15162091514 2015
2                                      1851315229147 114
3                                  225189625 141135 1144
4                                    1325 31854920 31184
5                                         31854920 31184
6           2085185-5719 19151352089147 1514 1325 229191
7      2085185 919 114 11618 129115 2015 3151329145 2...
8      185351420 193113 21815115 9142015 1325 3151316...
9      851216 2015 7520 1325 6152118 8211441854 41512...
10                                     31143512 15184518
11                                     31143512 15184518
12                                 13315211420 172151825
13                                                229191
14                                        16518191514112
15     9 15235 1514 19516205132518 20235142025 149142...
16     9 14554 2015 69144 152120 2385185 1325 14523 3...

I need to create a numpy array arr with the following format:

array=[[23825, 141520, 79229147, 135, 1951822935], [15162091514, 2015], [1851315229147, 114], [.....]]

Also, df3.values will not work, because its output looks like this:

array([['23825 141520 79229147 135 1951822935'],
       ['15162091514 2015'],
       ['1851315229147 114'],
       [....]],

Any help is appreciated, thanks.

[Question comments]:

  • Just did, sorry.

Tags: python arrays pandas numpy dataframe


[Answer 1]:

Use:

In [5639]: np.array(list(map(str.split, df.Utterances.values)), dtype=object)
Out[5639]:
array([['23825', '141520', '79229147', '135', '1951822935'],
       ['15162091514', '2015'], ['1851315229147', '114'],
       ['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
       ['31854920', '31184'],
       ['2085185-5719', '19151352089147', '1514', '1325', '229191'],
       ['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
       ['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
       ['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
       ['31143512', '15184518'], ['31143512', '15184518'],
       ['13315211420', '172151825'], ['229191'], ['16518191514112'],
       ['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
       ['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']], 
       dtype=object)
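The underlying issue is that each cell of Utterances holds one space-separated string, so .values returns an (n, 1) array of strings. A minimal sketch of a pandas-native variant of the same split, using a hypothetical three-row sample of the column: Series.str.split() with no arguments splits each string on whitespace and returns one list per row.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample of the question's Utterances column
df = pd.DataFrame({"Utterances": [
    "23825 141520 79229147 135 1951822935",
    "15162091514 2015",
    "1851315229147 114",
]})

# .values keeps each row as a single space-separated string
print(df.values.shape)  # (3, 1)

# Series.str.split() with no arguments splits every string on whitespace
rows = df["Utterances"].str.split().tolist()

# The rows have different lengths, so NumPy needs dtype=object
arr = np.array(rows, dtype=object)
print(arr.shape)  # (3,)
```

The entries are still strings at this point; converting them to numbers is a separate step.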

Or:

In [5642]: np.array([x.split() for x in df.Utterances.values], dtype=object)
Out[5642]:
array([['23825', '141520', '79229147', '135', '1951822935'],
       ['15162091514', '2015'], ['1851315229147', '114'],
       ['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
       ['31854920', '31184'],
       ['2085185-5719', '19151352089147', '1514', '1325', '229191'],
       ['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
       ['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
       ['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
       ['31143512', '15184518'], ['31143512', '15184518'],
       ['13315211420', '172151825'], ['229191'], ['16518191514112'],
       ['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
       ['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']], 
       dtype=object)
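Both snippets rely on np.array turning rows of unequal length into a 1-D array of lists. If every row happens to split into the same number of tokens, NumPy builds a 2-D array instead, and since NumPy 1.24 ragged input without an explicit dtype=object raises a ValueError. A sketch that always yields one list per element, using hypothetical equal-length rows to show the difference:

```python
import numpy as np

# Hypothetical rows that happen to have the same length
rows = [["23825", "141520"], ["15162091514", "2015"]]

# np.array infers a 2-D shape here, even with dtype=object
print(np.array(rows, dtype=object).shape)  # (2, 2)

# Pre-allocating a 1-D object array and filling it element by element
# stores each Python list as a single object
arr = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    arr[i] = row

print(arr.shape)  # (2,)
```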

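[Comments]:

  • Neither of them works. When I try to fit a k-means model with these arrays, I get: ValueError: setting an array element with a sequence.
  • @user37143, the arrays produced by the solutions above have entries of type str. The solutions need to be modified to convert them to a numeric type.
  • Ah, but one row contains 2085185-5719, so that may not work.
  • If df is clean: np.array([[int(v) for v in x.split()] for x in df.Utterances.values])
  • Or: np.array([list(map(int, x.split())) for x in df.Utterances.values])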
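The comments point at two separate problems: the tokens are strings, and the rows have different lengths. k-means implementations such as scikit-learn's KMeans typically expect a rectangular 2-D numeric array, so an object array of integer lists will most likely still raise the same ValueError. A minimal sketch, assuming that tokens which are not plain integers (such as 2085185-5719) can be dropped and that zero-padding shorter rows is acceptable for your model; the helper to_ints and the padding value 0 are hypothetical choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the third row contains the hyphenated token from the question
df = pd.DataFrame({"Utterances": [
    "23825 141520 79229147 135 1951822935",
    "15162091514 2015",
    "2085185-5719 1514 1325",
]})


def to_ints(text):
    """Hypothetical helper: split on whitespace and keep only integer tokens.

    Assumption: tokens such as '2085185-5719' are dropped; you may prefer
    to clean or split them instead.
    """
    out = []
    for tok in text.split():
        try:
            out.append(int(tok))
        except ValueError:
            pass  # skip tokens that do not parse as integers
    return out


rows = [to_ints(s) for s in df["Utterances"]]

# Build a rectangular array by padding shorter rows.
# Assumption: 0 is an acceptable padding value for the clustering.
width = max((len(r) for r in rows), default=0)
padded = np.zeros((len(rows), width), dtype=np.int64)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r

print(padded.shape)  # (3, 5)
```

Whether padding (or another fixed-length encoding) is meaningful for these utterances is a modeling decision; the sketch only shows how to obtain an array shape that k-means can accept.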