Concatenate multiple columns in a row: answers

【Question title】: Concatenate multiple columns in a row
【Posted】: 2017-04-11 04:33:39
【Question】:

I built a matrix with the code below and stored some data in it:

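df = []                    # list of rows; each row is a list of cell strings
n_rows = 5000
n_cols = 50
for i in range(n_rows):    # the original used Python 2's xrange
    row = [''] * n_cols    # one row of empty cells
    df.append(row)         # the original appended to an undefined name, table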

so that the matrix looks like this:

    0     1          2                 3        4    5     6    7   ...
3   NaN   Nestlé     Africa            Import   
4   NaN   Nutella    Europe            Report   2010 to    2011 
5   Shell            USA               Revenues      2017

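Since each row has an uneven number of filled columns, I'm not sure how to concatenate all columns into a single column and then drop the unnecessary empty columns, so that the result looks like this: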

    1
3.  Nestlé Africa Import
4.  Nutella Europe Report 2010 to 2011
5.  Shell USA Revenues 2017
etc.

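If this is easier to do in a pandas.DataFrame (e.g. df2 = pd.DataFrame(df)), that works for me too.

【Comments on the question】:

I'm not sure where your data comes from or why it is uneven. Concatenating with the ''.join() method is easy; just let me know where data such as Nestlé and Africa come from and why it is uneven.
Hi Abid, the data comes from OCR-processed PDF documents; the uneven row lengths in the tables produce these results. The values shown here are made up, though; they only illustrate my problem.
Then why can't you use the length of the array to determine where to drop your columns?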
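
The ''.join() suggestion from the comments can be sketched without pandas, working directly on a list-of-lists matrix like the one in the question. This is only a minimal sketch: the sample rows are made up to mirror the question's example, the helper name join_row is hypothetical, and it assumes that empty cells are stored as '' (or None).

```python
# Hypothetical sample rows standing in for the OCR output; '' marks an empty cell.
table = [
    ["", "Nestlé", "Africa", "Import", "", "", ""],
    ["", "Nutella", "Europe", "Report", "2010", "to", "2011"],
    ["Shell", "", "USA", "Revenues", "", "2017", ""],
]


def join_row(row, sep=" "):
    """Join the non-empty cells of one row into a single string."""
    return sep.join(
        str(cell).strip()
        for cell in row
        if cell is not None and str(cell).strip()  # skip None, '' and whitespace-only cells
    )


# One concatenated string per row, in the original row order.
joined = [join_row(row) for row in table]
for line in joined:
    print(line)
```

Because empty cells are filtered out before joining, the differing number of filled cells per row does not matter, and no columns have to be dropped afterwards.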

Tags: python pandas concatenation

【Solution 1】:

With pandas, you can join the non-null columns of each row, for example:

Code:

# Python 3: str replaces the Python 2 unicode used in the original answer
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if not pd.isnull(y)]), axis=1)

Test code:

import pandas as pd
from io import StringIO
df = pd.read_fwf(StringIO(u"""
    0     1          2                 3        4    5     6
3   NaN   Nestlé     Africa            Import   
4   NaN   Nutella    Europe            Report   2010 to    2011 
5   Shell            USA               Revenues      2017"""),
    skiprows=0, header=1, index_col=0)
print(df)

# Skip empty strings as well as NaN; str is the Python 3 equivalent of unicode
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)

print(df['concat'])

Results:

       0        1       2         3     4     5     6
3          Nestlé  Africa    Import                  
4         Nutella  Europe    Report  2010    to  2011
5  Shell              USA  Revenues        2017      

3                      Nestlé Africa Import
4    Nutella Europe Report 2010.0 to 2011.0
5                   Shell USA Revenues 2017
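
The "2010.0" and "2011.0" in the output above most likely appear because pandas inferred a float dtype for columns that mix numbers with NaN. If the cells are kept as text, the years survive unchanged. The following is a minimal sketch under that assumption: the sample rows are made up to mirror the question, the DataFrame is built from a list of lists rather than with read_fwf, and the helper name join_non_empty is hypothetical.

```python
import pandas as pd

# Hypothetical rows mirroring the question; None marks an empty cell.
rows = [
    [None, "Nestlé", "Africa", "Import", None, None, None],
    [None, "Nutella", "Europe", "Report", "2010", "to", "2011"],
    ["Shell", None, "USA", "Revenues", None, "2017", None],
]

# dtype=object keeps every cell as the Python object it was given (here: str or None),
# so "2010" is never converted to the float 2010.0.
df = pd.DataFrame(rows, index=[3, 4, 5], dtype=object)


def join_non_empty(row):
    """Join the cells of one row, skipping missing and blank values."""
    return " ".join(str(v) for v in row if pd.notna(v) and str(v).strip())


# axis=1 applies the function to each row; the result is a Series indexed like df.
concat = df.apply(join_non_empty, axis=1)
print(concat)
```

With these sample rows, row 4 becomes "Nutella Europe Report 2010 to 2011", which matches the layout requested in the question.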

【Discussion】: