【Posted】: 2019-12-10 03:47:20
【Question】:
I have two dataframes.
The city_state dataframe:
   city        state
0  huntsville  alabama
1  montgomery  alabama
2  birmingham  alabama
3  mobile      alabama
4  dothan      alabama
5  chicago     illinois
6  boise       idaho
7  des moines  iowa
And the sentence dataframe:
   sentence
0  marthy was born in dothan
1  michelle reads some books at her home
2  hasan is highschool student in chicago
3  hartford of the west is the nickname of des moines
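I want to add a new feature to the sentence dataframe: a column named city, extracted from the sentence column. If a sentence contains one of the names in city_state['city'], the value is that city name; if it contains none of them, the value is Null.
The expected new dataframe looks like this: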
   sentence                                            city
0  marthy was born in dothan                           dothan
1  michelle reads some books at her home               Null
2  hasan is highschool student in chicago              chicago
3  hartford of the west is the nickname of des moines  des moines
I have run this code:
sentence['city'] ={}

for city in city_state.city:
    for text in sentence.sentence:
        words = text.split()
        for word in words:
            if word == city:
                sentence['city'].append(city)
                break
        else:
            sentence['city'].append(None)
But this code fails with:
ValueError: Length of values does not match length of index
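If you have feature-engineering experience with a similar case, could you give me some advice on how to write correct code for the expected result?
Thanks.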
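One possible way to get the expected column, offered as a sketch rather than a definitive answer: build a single regular expression from the city names and let `Series.str.extract` pull the first match out of each sentence. The two dataframes below are rebuilt from the samples in the question; the names `cities` and `pattern` are illustrative only. Rows without a match come out as NaN, which stands in for the Null in the expected output.

```python
import re

import pandas as pd

# Sample data copied from the question
city_state = pd.DataFrame({
    "city": ["huntsville", "montgomery", "birmingham", "mobile",
             "dothan", "chicago", "boise", "des moines"],
    "state": ["alabama"] * 5 + ["illinois", "idaho", "iowa"],
})
sentence = pd.DataFrame({
    "sentence": [
        "marthy was born in dothan",
        "michelle reads some books at her home",
        "hasan is highschool student in chicago",
        "hartford of the west is the nickname of des moines",
    ]
})

# Longest names first, so that at a given position a longer name
# is tried before a shorter name that is its prefix
cities = sorted(city_state["city"], key=len, reverse=True)

# One alternation of escaped names, bounded by \b so that only
# whole words (or whole phrases such as "des moines") match
pattern = r"\b(" + "|".join(re.escape(c) for c in cities) + r")\b"

# expand=False returns a Series; rows with no match become NaN
sentence["city"] = sentence["sentence"].str.extract(pattern, expand=False)
print(sentence)
```

Matching against the whole sentence, instead of against the words from `text.split()`, is what lets a two-word name such as "des moines" be found.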
Note: this is the full error log:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-205-8a9038a015ee> in <module>
----> 1 sentence['city'] ={}
      2
      3 for city in city_state.city:
      4     for text in sentence.sentence:
      5         words = text.split()

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3117         else:
   3118             # set column
-> 3119             self._set_item(key, value)
   3120
   3121     def _setitem_slice(self, key, value):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3192
   3193         self._ensure_valid_index(value)
-> 3194         value = self._sanitize_column(key, value)
   3195         NDFrame._set_item(self, key, value)
   3196

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3389
   3390             # turn me into an ndarray
-> 3391             value = _sanitize_index(value, self.index, copy=False)
   3392             if not isinstance(value, (np.ndarray, Index)):
   3393                 if isinstance(value, list) and len(value) > 0:

~\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
   3999
   4000     if len(data) != len(index):
-> 4001         raise ValueError('Length of values does not match length of ' 'index')
   4002
   4003     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
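The arrow in the log points at line 1, so the error is raised by the assignment `sentence['city'] ={}` itself, before the loops run: in the pandas version shown, the empty dict has length 0 while the dataframe index has length 4. Two further problems would surface afterwards. A dict has no `append` method, and putting the loop over cities on the outside would produce one entry per (city, sentence) pair instead of one per sentence. A minimal pure-Python sketch of the loop logic, collecting exactly one result per sentence in a plain list, could look like this (the helper name `first_city` is hypothetical, and the lists are copied from the question):

```python
import re

# Sample data copied from the question
cities = ["huntsville", "montgomery", "birmingham", "mobile",
          "dothan", "chicago", "boise", "des moines"]
sentences = [
    "marthy was born in dothan",
    "michelle reads some books at her home",
    "hasan is highschool student in chicago",
    "hartford of the west is the nickname of des moines",
]


def first_city(text, cities):
    """Return the first name in `cities` that occurs in `text`
    as a whole word or phrase, or None if no name occurs."""
    for city in cities:
        if re.search(r"\b" + re.escape(city) + r"\b", text):
            return city
    return None


# Outer loop over sentences: exactly one entry per sentence
found = [first_city(text, cities) for text in sentences]
print(found)  # ['dothan', None, 'chicago', 'des moines']
```

Because `found` has one entry per sentence, a list built the same way from `sentence.sentence` and `city_state.city` has the same length as the dataframe index and can be assigned to `sentence['city']` without the length mismatch.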
【Comments】:
Tags: python pandas dataframe machine-learning feature-extraction