【Question title】: Replicate/Duplicate row by index row Pandas/Numpy
【Posted】: 2021-03-03 19:09:04
【Question description】:

I am trying to replicate/duplicate a few rows of a DataFrame by their index, but I have not managed to do it.

Given this DataFrame:

[image: DataFrame Sample]

My code:

def duplicateDealers(self, data):
    pd.options.display.width = 0
    counter = 0
    for index, row in data.iterrows():
        brandColumn = 'Brand' + str(counter)
        # print(index, row[brandColumn])
        if str(row[brandColumn]) == 'Cadillac':
            newData = pd.DataFrame(np.repeat(data.loc[int(index)], 1))
            newData['Repeated'] = 'Yes'
    print(newData.columns)
    print(type(pd.DataFrame(np.repeat(data.loc[int(index)], 1))))
    print(newData)

If I use the following code instead:

newData = pd.DataFrame(np.repeat(data.loc[int(index)], 1, axis=0))

I get this error:

ValueError: the 'axis' parameter is not supported in the pandas implementation of repeat()
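
A likely explanation, offered as a sketch rather than a confirmed diagnosis: `data.loc[int(index)]` with a scalar label returns the row as a Series, so `np.repeat` hands the call to the pandas `Series.repeat`, which rejects `axis`. Wrapping that Series in `pd.DataFrame` then yields a single column named after the row label, which would account for a column called "4108". Selecting with a list of labels keeps the row as a one-row DataFrame with its original columns. The small frame and its index labels below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; 4108 and 4109 are made-up row labels.
data = pd.DataFrame(
    {"Source_ID": [123456, 987654], "Brand0": ["Cadillac", "Buick"]},
    index=[4108, 4109],
)

# A scalar label returns a Series whose index holds the column names.
row_as_series = data.loc[4108]

# Wrapping that Series gives one column named after the row label.
transposed = pd.DataFrame(row_as_series)

# A list of labels returns a DataFrame and keeps the original column layout.
row_as_frame = data.loc[[4108]]

# Repeating the label, not the row values, duplicates the row in its original shape.
duplicated = data.loc[np.repeat([4108], 2)].copy()
duplicated["Repeated"] = "Yes"

print(transposed.columns.tolist())
print(duplicated)
```

Repeating labels and indexing with `.loc` avoids `np.repeat` ever touching a pandas object, so the `axis` restriction does not come into play.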

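What am I trying to achieve with this code?

I iterate over the rows and columns to find the word "Cadillac" in the column "Brand0". If the condition is True, I want to duplicate the entire row by its index, keeping the row's original format, and then manipulate the new row's data as needed.

The output is as follows (the column name "4108" is a random index; the DataFrame is large, more than 5k records):

[image: Actual Output]

The output I want is:

[image: Desired Output]

What am I doing wrong?

Regards and thanks.

Edit:

Here is some sample data: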


Source_ID | SecondaryName                      | PrimaryName                        | Address                | City        | State | Full_Postal_Code | Postal_Code | Country | Telephone  | Brand0
123456    | JACK SCHMITT CADILLAC, INC.        | JACK SCHMITT CADILLAC, INC.        | 915 W HWY 50           | O FALLON    | IL    | 62269            | 62269       | USA     | 6186321001 | Cadillac
987654    | JAMES E. BLACK CADILLAC            | JAMES E. BLACK CADILLAC            | 3929 ADMIRAL PEARY HWY | EBENSBURG   | PA    | 15931            | 15931       | USA     | 8144729553 | Cadillac
753951    | COLE-VALLEY MOTOR COMPANY          | COLE-VALLEY MOTOR COMPANY          | 4111 ELM ROAD NE       | WARREN      | OH    | 44483            | 44483       | USA     | 3303721668 | Cadillac
159357    | MCDONALD GMC-CADILLAC, INC.        | MCDONALD GMC-CADILLAC, INC.        | 5155 STATE ST          | SAGINAW     | MI    | 48603            | 48603       | USA     | 9897905154 | Cadillac
456987    | DAVID BRUCE AUTO CENTER, INC.      | DAVID BRUCE AUTO CENTER, INC.      | 555 LATHAM DR          | BOURBONNAIS | IL    | 60914            | 60914       | USA     | 8159337709 | Cadillac
321456    | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST        | BELVIDERE   | IL    | 61008            | 61008       | USA     | 8155443403 | Cadillac

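Edit 2:

Here are more details about what I am trying to achieve:

Each row may have several BrandX columns. Depending on their content, I will duplicate the row and append the brand name and other things to Source_ID, so that I end up with the right number of records for each dealer's brands.

DataFrame: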

Source_ID | SecondaryName                      | PrimaryName                        | Address                | City        | State | Full_Postal_Code | Postal_Code | Country | Telephone  | Brand0   | Brand1    | Brand2
123456    | JACK SCHMITT CADILLAC, INC.        | JACK SCHMITT CADILLAC, INC.        | 915 W HWY 50           | O FALLON    | IL    | 62269            | 62269       | USA     | 6186321001 | Cadillac | GMC       | Buick
987654    | JAMES E. BLACK CADILLAC            | JAMES E. BLACK CADILLAC            | 3929 ADMIRAL PEARY HWY | EBENSBURG   | PA    | 15931            | 15931       | USA     | 8144729553 | Cadillac | NaN       | GMC
753951    | COLE-VALLEY MOTOR COMPANY          | COLE-VALLEY MOTOR COMPANY          | 4111 ELM ROAD NE       | WARREN      | OH    | 44483            | 44483       | USA     | 3303721668 | Cadillac | Buick     | NaN
159357    | MCDONALD GMC-CADILLAC, INC.        | MCDONALD GMC-CADILLAC, INC.        | 5155 STATE ST          | SAGINAW     | MI    | 48603            | 48603       | USA     | 9897905154 | Cadillac | Buick     | NaN
456987    | DAVID BRUCE AUTO CENTER, INC.      | DAVID BRUCE AUTO CENTER, INC.      | 555 LATHAM DR          | BOURBONNAIS | IL    | 60914            | 60914       | USA     | 8159337709 | Cadillac | Chevrolet | GMC
321456    | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST        | BELVIDERE   | IL    | 61008            | 61008       | USA     | 8155443403 | Cadillac | NaN       | NaN

Expected output:

Source_ID        | SecondaryName                      | PrimaryName                        | Address                | City        | State | Full_Postal_Code | Postal_Code | Country | Telephone  | Brand0   | Brand1    | Brand2
123456_Cadillac  | JACK SCHMITT CADILLAC, INC.        | JACK SCHMITT CADILLAC, INC.        | 915 W HWY 50           | O FALLON    | IL    | 62269            | 62269       | USA     | 6186321001 | Cadillac | GMC       | Buick
123456_GMC       | JACK SCHMITT CADILLAC, INC.        | JACK SCHMITT CADILLAC, INC.        | 915 W HWY 50           | O FALLON    | IL    | 62269            | 62269       | USA     | 6186321001 | Cadillac | GMC       | Buick
123456_Buick     | JACK SCHMITT CADILLAC, INC.        | JACK SCHMITT CADILLAC, INC.        | 915 W HWY 50           | O FALLON    | IL    | 62269            | 62269       | USA     | 6186321001 | Cadillac | GMC       | Buick
987654_Cadillac  | JAMES E. BLACK CADILLAC            | JAMES E. BLACK CADILLAC            | 3929 ADMIRAL PEARY HWY | EBENSBURG   | PA    | 15931            | 15931       | USA     | 8144729553 | Cadillac | NaN       | GMC
987654_GMC       | JAMES E. BLACK CADILLAC            | JAMES E. BLACK CADILLAC            | 3929 ADMIRAL PEARY HWY | EBENSBURG   | PA    | 15931            | 15931       | USA     | 8144729553 | Cadillac | NaN       | GMC
753951_Cadillac  | COLE-VALLEY MOTOR COMPANY          | COLE-VALLEY MOTOR COMPANY          | 4111 ELM ROAD NE       | WARREN      | OH    | 44483            | 44483       | USA     | 3303721668 | Cadillac | Buick     | NaN
753951_GMC       | COLE-VALLEY MOTOR COMPANY          | COLE-VALLEY MOTOR COMPANY          | 4111 ELM ROAD NE       | WARREN      | OH    | 44483            | 44483       | USA     | 3303721668 | Cadillac | Buick     | NaN
159357_Cadillac  | MCDONALD GMC-CADILLAC, INC.        | MCDONALD GMC-CADILLAC, INC.        | 5155 STATE ST          | SAGINAW     | MI    | 48603            | 48603       | USA     | 9897905154 | Cadillac | Buick     | NaN
159357_Buick     | MCDONALD GMC-CADILLAC, INC.        | MCDONALD GMC-CADILLAC, INC.        | 5155 STATE ST          | SAGINAW     | MI    | 48603            | 48603       | USA     | 9897905154 | Cadillac | Buick     | NaN
456987_Cadillac  | DAVID BRUCE AUTO CENTER, INC.      | DAVID BRUCE AUTO CENTER, INC.      | 555 LATHAM DR          | BOURBONNAIS | IL    | 60914            | 60914       | USA     | 8159337709 | Cadillac | Chevrolet | GMC
456987_Chevrolet | DAVID BRUCE AUTO CENTER, INC.      | DAVID BRUCE AUTO CENTER, INC.      | 555 LATHAM DR          | BOURBONNAIS | IL    | 60914            | 60914       | USA     | 8159337709 | Cadillac | Chevrolet | GMC
456987_GMC       | DAVID BRUCE AUTO CENTER, INC.      | DAVID BRUCE AUTO CENTER, INC.      | 555 LATHAM DR          | BOURBONNAIS | IL    | 60914            | 60914       | USA     | 8159337709 | Cadillac | Chevrolet | GMC
321456_Cadillac  | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST        | BELVIDERE   | IL    | 61008            | 61008       | USA     | 8155443403 | Cadillac | NaN       | NaN
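
One way the expected output above might be produced, as a minimal sketch and not the asker's or answerer's code: take one copy of the rows for each BrandX column where that brand is present, append the brand to `Source_ID`, then stitch the copies back together in the original row order. Only three of the sample rows and two of the descriptive columns are reproduced here, and the names `brand_cols`, `pieces` and `out` are hypothetical. Note that the posted expected output lists `753951_GMC` although that row's brand columns contain Cadillac and Buick; this sketch follows the brand columns.

```python
import numpy as np
import pandas as pd

# Three rows taken from the sample data in the question (columns abbreviated).
df = pd.DataFrame({
    "Source_ID": [123456, 987654, 321456],
    "City": ["O FALLON", "EBENSBURG", "BELVIDERE"],
    "Brand0": ["Cadillac", "Cadillac", "Cadillac"],
    "Brand1": ["GMC", np.nan, np.nan],
    "Brand2": ["Buick", "GMC", np.nan],
})

brand_cols = [c for c in df.columns if c.startswith("Brand")]

pieces = []
for col in brand_cols:
    # Keep only the rows that actually have a brand in this column.
    part = df[df[col].notna()].copy()
    # Tag this copy of the row with the brand it stands for.
    part["Source_ID"] = part["Source_ID"].astype(str) + "_" + part[col]
    pieces.append(part)

# A stable sort on the original index groups the copies of each row together
# while keeping the Brand0, Brand1, Brand2 order within each group.
out = pd.concat(pieces).sort_index(kind="mergesort").reset_index(drop=True)

print(out)
```

Because each pass over a brand column yields its own frame, any further per-brand changes can be made on `part` before it is appended.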

【Question comments】:

  • Could you please share the data instead of screenshots? It would help to reproduce your problem and to solve it.

Tags: python pandas numpy


【Solution 1】:

The following does this.

Modules

import numpy as np
import pandas as pd

Sample data

df = pd.DataFrame({'Source':[111347,115742,100007], 'Brand0':['Cadillac', 'Cadillac', 'Alternative']})

Solution using np.repeat: each index label is repeated twice where Brand0 is 'Cadillac' and once otherwise.

df.loc[np.repeat(df.index.values, list(df['Brand0'].isin(['Cadillac'])+1))]
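
To make the one-liner easier to follow, here is a sketch that spells out the repeat counts and marks which rows are the added copies. The `Repeated` column name comes from the question; deriving it with `Index.duplicated` is an assumption of this sketch, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Source': [111347, 115742, 100007],
                   'Brand0': ['Cadillac', 'Cadillac', 'Alternative']})

# 2 for rows whose Brand0 is 'Cadillac', 1 for every other row.
repeats = df['Brand0'].eq('Cadillac').astype(int) + 1

# Repeat the index labels, then select those labels to duplicate whole rows.
result = df.loc[np.repeat(df.index.values, repeats.to_numpy())].copy()

# The second occurrence of an index label is the duplicated copy.
result['Repeated'] = result.index.duplicated(keep='first')

print(result)
```

The original index labels are kept on purpose, since they are what ties each copy back to its source row.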

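【Comments】:

  • This does work, but it duplicates all the rows at once, and I lose visibility of each duplicated row, which I need in order to change some of its cells. Suppose I need to change "Source" according to the brand detected in my condition (it could be Cadillac, Buick, Chevrolet, etc.) and then make some changes, for example: Row 1 = Source['111347_GMC'] | Row 2 = Source['111347_Cadillac']. That is why I need to duplicate the rows one by one, so I can control each duplicated row and make the necessary changes in it.
  • Not sure I understand your request. Wouldn't it be easier to slice the data row by row with df.iloc[0:2, :] after duplicating all the data at once?
  • @RuthgerRighart, maybe adding df['Repeated'] = df.duplicated() to your answer would meet the OP's requirement.
  • @RuthgerRighart Sorry if I was not very clear; I have updated the post again.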
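
The concern about losing track of individual copies after a bulk duplication can possibly be handled by numbering the copies, as in the sketch below. The `copy_no` column and the rule "append the brand to the second copy only" are hypothetical, chosen just to show that each copy stays individually addressable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Source': [111347, 115742, 100007],
                   'Brand0': ['Cadillac', 'Cadillac', 'Alternative']})

# Duplicate only the Cadillac rows, all in one step.
counts = np.where(df['Brand0'] == 'Cadillac', 2, 1)
dup = df.loc[df.index.repeat(counts)].copy()

# Number the copies of each original row: 0 is the first, 1 is the second.
dup['copy_no'] = dup.groupby(level=0).cumcount().to_numpy()

# Change only the second copy; the first copy keeps its plain Source value.
source = dup['Source'].astype(str)
dup['Source'] = np.where(dup['copy_no'] == 1, source + '_' + dup['Brand0'], source)

dup = dup.reset_index(drop=True)
print(dup)
```

With a copy number in place, any condition of the form `dup['copy_no'] == k` selects exactly the k-th copy of every row, so no row-by-row loop is needed.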