【Question title】: How can I make this code more Pythonic?
【Posted】: 2015-09-07 15:24:34
【Question description】:

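I am reading a bunch of daily files and using glob to concatenate them into separate dataframes. I eventually join them together, basically creating one large file that I use to connect to a dashboard. I am not too familiar with Python, but I use pandas and sklearn often.

As you can see, I am basically just reading the last 60 (or more) days' worth of data (the last 60 files) and creating a dataframe for each. This works, but I am wondering whether there is a more Pythonic/better way. I watched a PyData video (about not being restricted by PEP 8 and making sure your code is Pythonic), which was interesting.

(FYI: the reason I need to read 60 days' worth of files is that customers can fill out a survey about a call that happened a long time ago. A customer fills out a survey today about a call that happened in July, and I need to know about that call: how long it lasted, what the topic was, etc.)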

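import os
import datetime as dt

import numpy as np
import pandas as pd
from pandas import Timestamp, Timedelta

os.chdir(r'C:\\Users\Documents\FTP\\')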
loc = r'C:\\Users\Documents\\'
rosterloc = r'\\mand\\'
splitsname = r'Splits.csv'
fcrname = r'global_disp_'
npsname = r'survey_'
ahtname = r'callbycall_'
rostername = 'Daily_Roster.csv'
vasname = r'vas_report_'
ext ='.csv'
startdate = dt.date.today() - Timedelta('60 day')
enddate = dt.date.today() 
daterange = Timestamp(enddate) - Timestamp(startdate)
daterange = (daterange / np.timedelta64(1, 'D')).astype(int)

data = []
frames = []
calls = []
bracket = []
try:
    for date_range in (Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):
        aht = pd.read_csv(ahtname+date_range.strftime('%Y_%m_%d')+ext)
        calls.append(aht)
except IOError:
        print('File does not exist:', ahtname+date_range.strftime('%Y_%m_%d')+ext)
aht = pd.concat(calls)
print('AHT Done')                 
try:
    for date_range in (Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):
        fcr = pd.read_csv(fcrname+date_range.strftime('%m_%d_%Y')+ext, parse_dates = ['call_time'])
        data.append(fcr)
except IOError:
        print('File does not exist:', fcrname+date_range.strftime('%m_%d_%Y')+ext)
fcr = pd.concat(data)
print('FCR Done')                                                
try:
    for date_range in (Timestamp(enddate) - dt.timedelta(n) for n in range(3)):
        nps = pd.read_csv(npsname+date_range.strftime('%m_%d_%Y')+ext, parse_dates = ['call_date','date_completed'])
        frames.append(nps)
except IOError:
        print('File does not exist:', npsname+date_range.strftime('%m_%d_%Y')+ext)
nps = pd.concat(frames)
print('NPS Done')                
try:
    for date_range in (Timestamp(startdate) + dt.timedelta(n) for n in range(daterange)):
        vas = pd.read_csv(vasname+date_range.strftime('%m_%d_%Y')+ext, parse_dates = ['Call_date'])
        bracket.append(vas)
except IOError:
        print('File does not exist:', vasname+date_range.strftime('%m_%d_%Y')+ext)
vas = pd.concat(bracket)
print('VAS Done')                 
roster = pd.read_csv(loc+rostername)
print('Roster Done')
splits = pd.read_csv(loc+splitsname)
print('Splits Done')      

【Comments】:

Tags: python pandas


【Solution 1】:

I did not change the names, but IMHO they should be more descriptive, e.g. pd == pandas? Not sure. Here is a more Pythonic way to write it:

from functools import partial
import logging
from operator import add, sub
import os
import datetime as dt
import contextlib

import pandas as pd
from pandas import Timestamp

os.chdir(r'C:\\Users\Documents\FTP\\')
location = r'C:\\Users\Documents\\'
roster_location = r'\\mand\\'
splits_name = r'Splits.csv'
fcr_name = r'global_disp_'
nps_name = r'survey_'
aht_name = r'callbycall_'
roster_name = 'Daily_Roster.csv'
vas_name = r'vas_report_'
ext = '.csv'
end_date = dt.date.today()
start_date = end_date - dt.timedelta(days=60)
daterange = (end_date - start_date).days    # number of daily files to read
logging.basicConfig(level=logging.DEBUG)    # without this, the root logger drops debug messages
logger = logging.getLogger()    # a logger is better than "print" once you have several levels to report; here: regular debug messages and errors


def timestamps_in_range(daterange, method=add, start=start_date):    # the operation is injected instead of using an "if" statement for the subtracting case
    for n in range(daterange):
        yield method(Timestamp(start), dt.timedelta(n))    # use a generator to create the series of dates on demand


def read_csv(name, date_range, date_format='%m_%d_%Y', **kwargs):    # wrap long, repetitive calls in a function to make them shorter and more readable
    return pd.read_csv(name + date_range.strftime(date_format) + ext, **kwargs)


def log_done(module):    # same idea: give the repeated logging call a name
    logger.debug("%s Done", module)


@contextlib.contextmanager    # a context manager is a good way to separate business logic from exception handling
def mapper(function, iterable):
    try:
        yield map(function, iterable)    # map instead of calling the function in a "for" loop; in Python 3 map is lazy, so the files are read when pd.concat consumes the result
    except IOError as err:    # an error raised inside the "with" body is thrown back in here, logged and suppressed
        logger.error('File does not exist: %s', err.filename or err)    # note: the frame assigned in the "with" body is then left unassigned


# The following code is visually tighter and cleaner.
# It shows only what is needed and hides the insignificant details and the repetition.

read_csv_aht = partial(read_csv, aht_name, date_format='%Y_%m_%d')    # partial pre-fills some of a function's arguments; useful for feeding "map", which needs a one-argument function to call on each element
with mapper(read_csv_aht, timestamps_in_range(daterange)) as calls:    # the context manager hides the "dangerous" part and shares only the "safe" result
    aht = pd.concat(calls)
    log_done('AHT')

read_csv_fcr = partial(read_csv, fcr_name, parse_dates=['call_time'])
with mapper(read_csv_fcr, timestamps_in_range(daterange)) as data:
    fcr = pd.concat(data)
    log_done('FCR')

read_csv_nps = partial(read_csv, nps_name, parse_dates=['call_date', 'date_completed'])
with mapper(read_csv_nps, timestamps_in_range(3, sub, start=end_date)) as frames:    # the last 3 days, counting back from end_date
    nps = pd.concat(frames)
    log_done('NPS')

read_csv_vas = partial(read_csv, vas_name, parse_dates=['Call_date'])
with mapper(read_csv_vas, timestamps_in_range(daterange)) as bracket:
    vas = pd.concat(bracket)
    log_done('VAS')

roster = pd.read_csv(location + roster_name)
log_done('Roster')

splits = pd.read_csv(location + splits_name)
log_done('Splits')
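
One thing worth noting about both versions above: the exception handling wraps the whole read, so the first missing file ends the read for that report. If the intent is to skip a missing day and keep going, a plain loop with the `try` inside it may be simpler. The sketch below is only an illustration of that idea, not part of the original answer: `daily_paths` and `load_existing` are hypothetical helper names, and `reader` stands in for whatever actually loads a file (for example `pd.read_csv`), so the block itself needs only the standard library.

```python
import datetime as dt


def daily_paths(prefix, start, days, date_format='%m_%d_%Y', ext='.csv'):
    """Yield one file name per day, starting at `start` and moving forward."""
    for n in range(days):
        yield prefix + (start + dt.timedelta(days=n)).strftime(date_format) + ext


def load_existing(paths, reader):
    """Call `reader` on each path, skipping paths that cannot be opened.

    Returns the loaded objects and the list of skipped paths.
    """
    loaded, missing = [], []
    for path in paths:
        try:
            loaded.append(reader(path))
        except IOError:  # alias of OSError in Python 3; FileNotFoundError is a subclass
            missing.append(path)
    return loaded, missing
```

With pandas this might be used as `frames, missing = load_existing(daily_paths('survey_', start_date, 60), pd.read_csv)`, followed by `pd.concat(frames)` when `frames` is not empty.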

【Comments】:

  • It would be helpful to cite and explain the specific changes you made to make the OP's code more "Pythonic".
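
As one concrete illustration of the changes in the answer, the sketch below isolates the `functools.partial` plus `map` pattern using only the standard library. `build_name` is a hypothetical stand-in for the answer's `read_csv` wrapper: it returns the file name instead of reading anything, and the dates are made up for the example.

```python
import datetime as dt
from functools import partial


def build_name(prefix, day, date_format='%m_%d_%Y', ext='.csv'):
    """Hypothetical stand-in for the read_csv wrapper: returns the file name only."""
    return prefix + day.strftime(date_format) + ext


# partial fixes the prefix and the date format, leaving a one-argument function of the day
aht_name_for = partial(build_name, 'callbycall_', date_format='%Y_%m_%d')

days = [dt.date(2015, 9, 6), dt.date(2015, 9, 7)]
names = list(map(aht_name_for, days))  # map calls aht_name_for(day) for each day
print(names)  # ['callbycall_2015_09_06.csv', 'callbycall_2015_09_07.csv']
```

Because `map` passes exactly one item per call, every other argument has to be bound beforehand, which is what `partial` does here.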