【Question Title】: Reading two columns for matching values
【Posted】: 2022-01-28 08:20:41
【Question】:

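I have two columns in a CSV file and am trying to use pandas to read it and tell my program to remove the players that have matching letters in the "Fpts" and "Value" columns. Mainly, I want to remove the E:E, C:C, D:E, and C:E matches from the two columns.

I tried to set up something like this, but I am very new to Python: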

import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
cdf = pd.read_csv(csv_filepath)
for i in range(len(cdf)):
    if cdf[(cdf["Name"].isin(cdf.loc[(cdf.Fpts == "C"), "Value"])) & (cdf.Value == "C")]:
        optimizer.remove_player(player)

The CSV looks like this:

|Name               |Position |Salary |Fpts |Value |Team |Matchup |Team total
|Georges Niang      |PF/SF    |3700   |D    |C     |PHI  |LAL     |111
|Andre Drummond     |C        |4400   |D    |C     |PHI  |LAL     |111
|Karl-Anthony Towns |C        |9300   |A    |C     |MIN  |GSW     |112
|Avery Bradley      |SG       |3700   |D    |C     |LAL  |PHI     |106
|Carmelo Anthony    |SF/PF    |5200   |C    |C     |LAL  |PHI     |106
|Anthony Davis      |PF/C     |8900   |B    |C     |LAL  |PHI     |106
|Jordan Poole       |PG/SG    |5300   |C    |C     |GSW  |MIN     |119
|Otto Porter Jr.    |SF/PF    |5700   |C    |C     |GSW  |MIN     |119
|Malik Beasley      |SF/SG    |3800   |D    |D     |MIN  |GSW     |112
|Jaden McDaniels    |PF       |3900   |D    |D     |MIN  |GSW     |112
|Taurean Prince     |SF/PF    |3500   |E    |D     |MIN  |GSW     |112
|Klay Thompson      |SG       |6200   |C    |D     |GSW  |MIN     |119
|Damion Lee         |SG       |3700   |E    |D     |GSW  |MIN     |119
|Nemanja Bjelica    |PF       |4000   |D    |D     |GSW  |MIN     |119
|Isaiah Joe         |PG       |3500   |E    |E     |PHI  |LAL     |111
|Danny Green        |SG/SF    |3600   |E    |E     |PHI  |LAL     |111

【Comments】:

Tags: python pandas csv


【Answer 1】:

Drop the rows that have the same value in both columns.

import pandas as pd
csv_filepath = '/home/joe/Downloads/NBA FD Rankings.csv'
df = pd.read_csv(csv_filepath)
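# Keep only the rows where the two grade columns differ
# (column names match the CSV header: "Fpts" and "Value").
df = df[df["Fpts"] != df["Value"]]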

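【Comments】:

  • How do I specify only the D:D, D:E, C:D, D,E matches?
  • Use ordinary boolean expressions. In pandas, boolean expressions on columns use the bitwise operators &, |, and ~ for AND, OR, and NOT. You should also wrap each comparison in parentheses to avoid ambiguity.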
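
As a rough illustration of the comment above, here is a minimal sketch that combines masks with &, |, and ~. The frame is made up for this example (only the `Fpts` and `Value` column names come from the question), and the pairs are the D:D, D:E, and C:D combinations mentioned in the first comment.

```python
import pandas as pd

# Tiny illustrative frame; these rows are made up for the sketch.
df = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4"],
    "Fpts": ["D", "D", "C", "A"],
    "Value": ["D", "E", "D", "C"],
})

# Each comparison is parenthesized because & and | bind more tightly
# than == in Python.
is_dd = (df["Fpts"] == "D") & (df["Value"] == "D")
is_de = (df["Fpts"] == "D") & (df["Value"] == "E")
is_cd = (df["Fpts"] == "C") & (df["Value"] == "D")

# ~ negates the combined mask, so only the non-matching rows are kept.
kept = df[~(is_dd | is_de | is_cd)]
print(kept["Name"].tolist())  # ['p4']
```

Without the parentheses, an expression such as `df["Fpts"] == "D" & df["Value"] == "D"` is parsed differently and typically raises an error.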
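【Answer 2】:

There are several ways to solve this. pandas offers multiple ways to express logic and select rows based on it. Basically, here I extract the index values of the rows that match the logic and drop those rows from the original DataFrame.

Note that the conditional operator & means AND and | means OR.

Given this dataset: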

import pandas as pd
from pandas.api.types import CategoricalDtype

cdf = pd.DataFrame( 
[['Georges Niang',     'PF/SF',   '3700',   'D',      'C',  'PHI',    'LAL',    '111'],
['Andre Drummond',   'C',       '4400',   'D',      'C',  'PHI',    'LAL',    '111'],
['Karl-Anthony Towns', 'C',      '9300',   'A',      'C',  'MIN',    'GSW',    '112'],
['Avery Bradley',      'SG', '3700',       'D',      'C',  'LAL',    'PHI',    '106'],
['Carmelo Anthony',    'SF/PF',  '5200',   'C',      'C',  'LAL',    'PHI',    '106'],
['Anthony Davis',      'PF/C',   '8900',   'B',      'C',  'LAL ',   'PHI',    '106'],
['Jordan Poole',       'PG/SG',  '5300',   'C',      'C',  'GSW',    'MIN',    '119'],
['Otto Porter Jr.',    'SF/PF',  '5700',   'C',      'C',  'GSW',    'MIN',    '119'],
['Malik Beasley',      'SF/SG',  '3800',   'D',      'D',  'MIN',    'GSW',    '112'],
['Jaden McDaniels',    'PF',     '3900',   'D',      'D',  'MIN',    'GSW',    '112'],
['Taurean Prince',     'SF/PF',  '3500',   'E',      'D',  'MIN',    'GSW',    '112'],
['Klay Thompson',      'SG',     '6200',   'C',      'D',  'GSW',    'MIN',    '119'],
['Damion Lee',         'SG',    '3700',   'E',      'D',  'GSW',    'MIN',    '119'],
['Nemanja Bjelica',    'PF',     '4000',   'D',      'D',  'GSW',    'MIN',    '119'],
['Isaiah Joe',       'PG',     '3500',   'E',      'E',  'PHI',    'LAL',    '111'],
['Danny Green',        'SG/SF',  '3600',   'E',      'E',  'PHI',    'LAL',    '111']],
columns = ['Name','Position','Salary','Fpts','Value','Team','Matchup','Team total'])

Option A: hard-code the combinations

remove_index = cdf[   (cdf['Fpts'] == 'C') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'C')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'D') & (cdf['Value'] == 'E')
                   |  (cdf['Fpts'] == 'E') & (cdf['Value'] == 'D')
                   |  (cdf['Fpts'] == 'E') & (cdf['Value'] == 'E')].index

filtered_cdf = cdf.drop(remove_index)
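
If the list of combinations grows, the hard-coded chain above can get hard to read. One possible alternative, sketched here under the assumption that the pairs can be listed as `(Fpts, Value)` tuples, is to test each row's pair against a Python set. The frame and the name `remove_pairs` are made up for this example.

```python
import pandas as pd

# Small illustrative frame; these rows are made up for the sketch.
cdf = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Fpts": ["C", "D", "E", "A", "C"],
    "Value": ["D", "C", "E", "C", "C"],
})

# (Fpts, Value) pairs to remove: the same six pairs as the hard-coded filter.
remove_pairs = {("C", "D"), ("D", "C"), ("D", "D"),
                ("D", "E"), ("E", "D"), ("E", "E")}

# Boolean mask aligned with cdf's index: True where the row's pair is listed.
mask = pd.Series(
    [pair in remove_pairs for pair in zip(cdf["Fpts"], cdf["Value"])],
    index=cdf.index,
)

filtered_cdf = cdf[~mask]
print(filtered_cdf["Name"].tolist())  # ['p4', 'p5']
```

Adding or removing a combination then only means editing the set, not the boolean expression.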

Option B: convert your "grades" to ordered categorical columns and put some logic into the filter

# List the grades, then reverse it so that "A" is considered "bigger"/better than "B"
grades = ["A", "B", "C", "D", "E"]
grades.reverse()
cat_type = CategoricalDtype(categories=grades, ordered=True)
cdf[['Fpts', 'Value']] = cdf[['Fpts', 'Value']].astype(cat_type)

# Find index rows that match the logic
remove_index = cdf[(cdf['Fpts'] <= 'C') & (cdf['Value'] <= 'D') |
                   (cdf['Value'] <= 'C') & (cdf['Fpts'] <= 'D')].index
filtered_cdf = cdf.drop(remove_index)

Output:

print(filtered_cdf)
                 Name Position Salary Fpts Value  Team Matchup Team total
2  Karl-Anthony Towns        C   9300    A     C   MIN     GSW        112
4     Carmelo Anthony    SF/PF   5200    C     C   LAL     PHI        106
5       Anthony Davis     PF/C   8900    B     C  LAL      PHI        106
6        Jordan Poole    PG/SG   5300    C     C   GSW     MIN        119
7     Otto Porter Jr.    SF/PF   5700    C     C   GSW     MIN        119
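
To see why the reversed grade list makes a comparison such as `<= 'C'` select C and everything worse, here is a small standalone sketch of how an ordered categorical compares against a scalar. The series `grades` is made up for the illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Categories listed from worst to best, so "E" < "D" < "C" < "B" < "A".
cat_type = CategoricalDtype(categories=["E", "D", "C", "B", "A"], ordered=True)

grades = pd.Series(["A", "B", "C", "D", "E"]).astype(cat_type)

# "<= 'C'" is True for C and for every grade ranked below C.
at_most_c = grades[grades <= "C"].tolist()
print(at_most_c)  # ['C', 'D', 'E']
```

Under this ordering, the ordered-categorical filter above also matches C:E and E:C pairs, which the hard-coded version does not list; neither pair occurs in the sample data, so both approaches give the same output here.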

【Comments】:
