【Question title】: Filtering data through command line
【Posted】: 2022-01-30 19:28:31
【Question】:

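I'm currently working on a program that analyzes data from a TSV file. I have the basic functionality in place, but I need to filter the data frame further. The file has carrier, origin, and date columns that I need to filter on. This is my approach so far: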

import argparse
import pandas as pd
# Parsing arguments. You must not modify these lines!
parser = argparse.ArgumentParser()
parser.add_argument("statistic", choices=["avg", "max"], help="Which statistic should be run?")
parser.add_argument("variable", choices=["distance", "delay"], help="What variable should be used for the calculation?")
parser.add_argument("tsvfile", help="Name of data file to be analyzed")
parser.add_argument("--carrier", dest="carrier", help="Comma-separated list of airline codes for those airlines whose flights should be included")
#parser.add_argument("--date", dest="date", help="Departure dates for flights to be included")
#parser.add_argument("--origin", dest="origin", help="Departure dates for flights to be included")
args = parser.parse_args()

# Start here with the rest of the program....

#accessing the values
stats = args.statistic
var = args.variable
car = args.carrier
#the_date = args.date
#origin = args.origin

#opening the file
file = pd.read_csv(args.tsvfile,  sep='\t')



#printing the max distance
if stats == "max" and var == "distance":
    print(max(file["DISTANCE"]))

#printing the max delay
if stats == "max" and var =="delay":
    print(max(file["DEPARTURE_DELAY"]))

#printing the avg delay
if stats == "avg" and var == "delay":
    no_of_planes_delay = 0
    sum_delay = 0
    for number in file["DEPARTURE_DELAY"]:
        if number > 0:
            no_of_planes_delay += 1
            sum_delay += number
        elif number <= 0:
            # early or on-time departures count as a flight with zero delay
            no_of_planes_delay += 1
    average_delay = sum_delay / no_of_planes_delay
    print(round(average_delay, 1))

#printing the avg distance
if stats == "avg" and var == "distance":
    sum_distance = 0
    no_of_planes = 0
    for number in file["DISTANCE"]:
        no_of_planes += 1
        sum_distance += number
    average_distance = sum_distance / no_of_planes
    print(round(average_distance, 1))

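So I need to apply these filters from the command line, for example: python flight.py --carrier AA,DL --origin JFK avg delay flight.tsv
Does anyone know how I can keep using my functions and filter the data frame further?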

【Comments】:

    Tags: python pandas dataframe command-line-arguments argparse


    【Solution 1】:

    For filtering, see "Pandas filter rows based on multiple conditions".

    # "AA,DL" -> ["AA", "DL"]; adjust Carrier to the column name used in your file
    carrier_list = args.carrier.split(',')
    file = file[file.Carrier.isin(carrier_list)]
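
The same idea extends to the other filters. Here is a minimal sketch, assuming the file has columns named CARRIER, ORIGIN and DATE (these names are guesses, since the question does not show the header row) and that a filter is simply skipped when its option is not given. The helper name apply_filters and the sample rows are made up for illustration.

```python
import pandas as pd

def apply_filters(df, carrier=None, origin=None, date=None):
    """Return only the rows matching every filter that was supplied.

    CARRIER, ORIGIN and DATE are assumed column names; replace them
    with the headers that actually appear in your TSV file.
    """
    # Start with "keep everything" and narrow it down per filter.
    mask = pd.Series(True, index=df.index)
    if carrier:
        mask &= df["CARRIER"].isin(carrier.split(","))
    if origin:
        mask &= df["ORIGIN"].isin(origin.split(","))
    if date:
        mask &= df["DATE"].astype(str) == date
    return df[mask]

# Made-up rows, only to show the behaviour.
flights = pd.DataFrame({
    "CARRIER": ["AA", "DL", "UA", "AA"],
    "ORIGIN": ["JFK", "JFK", "SFO", "LAX"],
    "DATE": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-02"],
    "DEPARTURE_DELAY": [10, -3, 25, 0],
})

# Equivalent to passing --carrier AA,DL --origin JFK on the command line.
filtered = apply_filters(flights, carrier="AA,DL", origin="JFK")
print(filtered)
```

In the real script you would call it as file = apply_filters(file, args.carrier, args.origin, args.date) once the commented-out --origin and --date arguments are enabled, and then run the statistic on the result.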
    

    You can use dictionaries to avoid those if conditions:

        var_to_col_mapping = {
           'distance':'DISTANCE',
           'delay':'DEPARTURE_DELAY'
        }
    
        def calc_avg(df):
           # logic to calculate average goes here
           pass
    
        def calc_max(df):
           # logic to calculate max goes here
           pass
    
        stat_to_func_mapping = {
           'max':calc_max,
           'avg':calc_avg
        }
    
        print( stat_to_func_mapping[stats](file[var_to_col_mapping[var]]) )
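
One way the two placeholder functions might be filled in is sketched below. It assumes each function receives a single column (a pandas Series), and that early departures should count as zero delay when averaging, as the loop in the question does. The wrapper run_statistic and the sample rows are hypothetical.

```python
import pandas as pd

VAR_TO_COL = {"distance": "DISTANCE", "delay": "DEPARTURE_DELAY"}

def calc_avg(series):
    # Mean of the column, rounded to one decimal like the question's output.
    return round(series.mean(), 1)

def calc_max(series):
    return series.max()

STAT_TO_FUNC = {"max": calc_max, "avg": calc_avg}

def run_statistic(df, stat, var):
    """Look up the column and the function by name, then apply one to the other."""
    column = df[VAR_TO_COL[var]]
    if stat == "avg" and var == "delay":
        # Mirror the question's loop: negative delays are treated as 0.
        column = column.clip(lower=0)
    return STAT_TO_FUNC[stat](column)

# Made-up rows, only to show the behaviour.
flights = pd.DataFrame({
    "DISTANCE": [100, 200, 300, 400],
    "DEPARTURE_DELAY": [10, -4, 20, 0],
})

print(run_statistic(flights, "avg", "delay"))
print(run_statistic(flights, "max", "distance"))
```

In the script itself the call would be print(run_statistic(file, stats, var)), applied after any filtering so the statistic only covers the selected flights.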
    

    【Comments】:
