【Posted】: 2018-08-30 04:39:45
【Question】:
I have code that goes through every row/item in a Series and converts it into bigrams/trigrams. The code is as follows:
import nltk
import numpy as np
import pandas as pd

# stop_wrds (a collection of stop words) is defined elsewhere in the script
def splitting(txt, gram=2):
    tx1 = txt.str.replace('[^\w\s]', '').str.split().tolist()[0]
    if len(tx1) == 0:
        return np.nan
    txlis = [w for w in tx1 if w.lower() not in stop_wrds]
    if gram == 2:
        return map(tuple, set(map(frozenset, list(nltk.bigrams(txlis)))))
    else:
        return map(tuple, set(map(frozenset, list(nltk.trigrams(txlis)))))

#pdb.set_trace()
print len(namedat)
prop_data = pd.DataFrame(namedat.apply(splitting, axis=1))
The error occurs on the last line when I apply the function to namedat (a one-column DataFrame), which looks like this:
0 inter-burgo ansan
1 dogo glory condo
2 w hotel
3 onyang grand hotel
4 onyang hot spring hotel
5 onyang cheil hotel (ex. onyang palace hotel)
6 springhill suites paso robles atascadero
7 best western plus colony inn
8 hesse
9 ibis styles aachen city
10 pullman aachen quellenhof
11 mercure aachen europaplatz
12 leonardo hotel aachen
13 aquis grana cityhotel
14 buschhausen
... ...
[166295 rows x 1 columns]
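ValueError: could not broadcast input array from shape (2) into shape (1) when using df.apply
I have tried debugging: txt and the bigrams are both generated successfully, so the splitting function itself seems to be fine. I don't know how to fix this. Please help.
Full error message: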
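To separate the per-row logic from pandas, here is a minimal pure-Python sketch of what splitting computes for a single string, written for Python 3. Assumptions: splitting_one, ngrams and STOP_WORDS are hypothetical names introduced for illustration, STOP_WORDS is a placeholder for the question's stop_wrds, and ngrams builds consecutive windows of tokens, which is what nltk.bigrams and nltk.trigrams yield.

```python
import re

# Placeholder stop-word set (hypothetical); the question's stop_wrds is defined elsewhere.
STOP_WORDS = {"the", "of", "and"}


def ngrams(tokens, n):
    """Consecutive n-token windows, e.g. bigrams for n=2, trigrams for n=3."""
    return list(zip(*(tokens[i:] for i in range(n))))


def splitting_one(text, gram=2, stop_words=STOP_WORDS):
    """Per-string version of splitting(): strip punctuation, split on whitespace,
    drop stop words, build n-grams and de-duplicate them as unordered sets."""
    tokens = re.sub(r"[^\w\s]", "", text).split()
    if not tokens:
        return None  # the original returns np.nan in this case
    tokens = [w for w in tokens if w.lower() not in stop_words]
    # frozenset makes ('a', 'b') and ('b', 'a') count as the same n-gram
    unique = set(frozenset(g) for g in ngrams(tokens, gram))
    return [tuple(g) for g in unique]


for name in ["shaba boutique hotel", "w hotel", "hesse"]:
    print(name, len(splitting_one(name)))
```

With the placeholder stop-word set, the number of n-grams returned differs from row to row (2 for "shaba boutique hotel", 1 for "w hotel", 0 for "hesse"). That is consistent with the shapes (2) and (1) in the error message, although the sketch alone does not prove it is the cause.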
Traceback (most recent call last):
File "data_playground.py", line 163, in <module>
main()
File "data_playground.py", line 156, in main
createparams(db.hotelbeds_properties,"hotelbeds")
File "data_playground.py", line 139, in createparams
prop_params = analyze(prop_subdf)
File "data_playground.py", line 110, in analyze
prop_data = pd.DataFrame(namedat.apply(splitting,axis=1))
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
result = self._constructor(data=results, index=index)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 330, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 461, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
construction_error(len(arrays), arrays[0].shape, axes, e)
File "/home/shubhang/.virtualenvs/pa/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4604, in construction_error
raise e
ValueError: could not broadcast input array from shape (2) into shape (1)
Example of what my code does: it takes one row from the table shown above, for example:
name shaba boutique hotel
Name: 166278, dtype: object
and returns the bigrams generated from it:
[(u'shaba', u'boutique'), (u'boutique', u'hotel')]
If I run a simple for loop (using iterrows), the function works and I get a list. I don't understand why the apply function fails.
【Comments】:
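For comparison, a minimal sketch of the iterrows loop next to applying a function to the single column instead of the whole frame, assuming Python 3 and a recent pandas. It assumes the column is called name (as in the printed row above); bigrams_of is a hypothetical simplified stand-in for splitting, and the three-row frame is a toy stand-in for namedat. Series.apply stores whatever each call returns (here a list) as one object cell, so rows that yield different numbers of bigrams do not need a common shape.

```python
import re

import pandas as pd


def bigrams_of(text):
    """Hypothetical helper: strip punctuation, split on whitespace, pair neighbours."""
    tokens = re.sub(r"[^\w\s]", "", text).split()
    return list(zip(tokens, tokens[1:]))


# Toy stand-in for namedat: one column called "name".
namedat = pd.DataFrame({"name": ["shaba boutique hotel", "w hotel", "hesse"]})

# 1) Row-by-row loop, like the iterrows version described in the question.
loop_result = [bigrams_of(row["name"]) for _, row in namedat.iterrows()]

# 2) Apply to the single column (a Series): each returned list becomes one cell.
series_result = namedat["name"].apply(bigrams_of)

print(series_result.tolist())
# [[('shaba', 'boutique'), ('boutique', 'hotel')], [('w', 'hotel')], []]
```

Whether DataFrame.apply(..., axis=1) accepts list results of differing lengths depends on the pandas version. pandas 0.23 added a result_type argument, and result_type='reduce' asks for a Series rather than expanding list-like results; whether that resolves the original traceback has not been tested here.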
-
Please include the complete error message and a minimal example.
-
Hey, thanks @DyZ! I've added the full error message and an example of what the code does.