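【Posted】:2017-11-29 00:11:41
【Question】:
For example, df1 has shape (533, 2176) with index labels such as Elkford (5901003) DM 01010, and df2 has shape (743, 12) with index values such as 5901003. The number in parentheses in df1's index is what should match df2's index. As the shapes suggest, some index values have no match at all. I want a dataset of shape (533, 2176+12), that is, all rows of df1 kept, with df2's columns added wherever the index matches.
Loading the data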
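The goal described above can be sketched on toy data. This is only a minimal illustration, assuming the numeric code always appears as digits inside parentheses; the frames, column names (`profile_col`, `count`) and values below are hypothetical stand-ins, not the real census data.

```python
import pandas as pd

# Hypothetical miniature stand-ins for df1 and df2 (values are made up).
df1 = pd.DataFrame(
    {"profile_col": [1, 2, 3]},
    index=[
        "Elkford (5901003) DM 01010",
        "Sparwood (5901006) DM 01010",
        "Fernie (5901012) CY 01010",
    ],
)
df2 = pd.DataFrame({"count": [10, 20, 30]}, index=[5901003, 5901012, 5999999])

# Pull the digits inside the parentheses out of each df1 index label
# and convert them to integers so they are comparable with df2's index.
ids = df1.index.str.extract(r"\((\d+)\)", expand=False).astype("int64")
df1_keyed = df1.set_index(ids)

# A left join keeps every row of df1 and appends df2's columns;
# rows without a match in df2 get NaN in the new columns.
merged = df1_keyed.join(df2, how="left")
print(merged.shape)
```

A left join is used here because the desired result keeps all 533 rows of df1; an inner join would instead drop the rows that have no counterpart in df2.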
import pandas as pd
from tabulate import tabulate

if __name__ == '__main__':
    # Read data
    census_subdivision_profile = pd.read_excel('../data/census_subdivision_profile.xlsx', sheetname='Data',
                                               index_col='Geography', encoding='utf-8').T
    print(tabulate(census_subdivision_profile.head(), headers="keys", index_col='CNSSSBDVSN', tablefmt='psql'))
    print(census_subdivision_profile.shape)
    census_subdivision_count = pd.read_csv('../data/augmented/census_subdivision.csv', encoding='utf-8')
    print(tabulate(census_subdivision_count.head(), headers='keys', tablefmt='psql'))
    print(census_subdivision_count.shape)
Using the first answer, I got this error:
Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg/ongoing/economy_vs_tourism.py", line 26, in <module>
    census_subdivision_profile.index = census_subdivision_profile.index.map(extract_id)
  File "/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2727, in map
    mapped_values = self._arrmap(self.values, mapper)
  File "pandas/_libs/algos_common_helper.pxi", line 1212, in pandas._libs.algos.arrmap_object (pandas/_libs/algos.c:31954)
  File "/Users/Chu/Documents/dssg/ongoing/economy_vs_tourism.py", line 10, in extract_id
    return int(m.group(0)[1:-1])
ValueError: invalid literal for int() with base 10: 'Part 1) (5917054'
That is simply because one index looks like this:
Index([u'Canada (01) 20000',
u'British Columbia / Colombie-Britannique (59) 21010',
u'East Kootenay (5901) 01010', u'Elkford (5901003) DM 01010',
u'Sparwood (5901006) DM 01010', u'Fernie (5901012) CY 01010',
u'East Kootenay A (5901017) RDA 02020',
u'East Kootenay B (5901019) RDA 01020', u'Cranbrook (5901022) CY 01011',
u'Kimberley (5901028) CY 01010',
and the other looks like this:
Int64Index([5931813, 5941833, 5949832, 5919012, 5923033, 5924836, 5941016,
5955040, 5923809, 5941801,
The data frames are too large to include here.
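The ValueError above suggests that at least one label contains two parenthesized groups, so a pattern that grabs everything between the first "(" and the last ")" returns text rather than a number. The pattern used by the first answer is not shown here, so the greedy pattern below is an assumption, and the label is a hypothetical reconstruction built only from the fragment in the error message. A minimal sketch of a digits-only alternative:

```python
import re

# Hypothetical label: only the fragment 'Part 1) (5917054' is known
# from the error message; the rest of the string is made up.
label = "Example Name (Part 1) (5917054) 01010"

# Assumed greedy pattern: spans from the first "(" to the last ")".
greedy = re.search(r"\(.*\)", label).group(0)[1:-1]


def extract_id(text):
    """Return the first all-digit parenthesized group as an int, else None."""
    m = re.search(r"\((\d+)\)", text)
    return int(m.group(1)) if m else None


print(greedy)             # the non-numeric text that int() cannot parse
print(extract_id(label))  # the numeric code only
```

Requiring `\d+` between the parentheses skips groups such as "(Part 1)", and returning None for labels without a code avoids an exception inside `Index.map`.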
【Comments】:
-
Please make this question clearer with a better example. Read [MCVE](stackoverflow.com/help/mcve)
Tags: python pandas dataframe indexing merge