【Question title】: Loop over three lists to create combined output?
【Posted】: 2020-12-07 13:50:19
【Question】:

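I'm working on a MapReduce project. My input is (day, station, temperature), and my goal is to output the maximum and minimum temperature for each station on each day. So for this input, my output should basically look like this: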

Input:

20200101, station1, 35
20200101, station1, 44
20200101, station1, 77
20200101, station3, 66
20200101, station3, 99
20200102, station1, 54
20200102, station2, 55

Output:

20200101, station1, max(77) min(35)
20200101, station3, max(99) min(66)
20200102, station1, max(54) min(..)
20200102, station2, max(55) min(..)

What I've tried so far only works with 2 lists, not with 3: for each day, find each station, and for each station, each temperature...

Here is the code I've tried so far:

# Read the txt file in
file1 = open('bigdatatemp.txt', 'r') 
Lines = file1.readlines() 

Output of Lines (the important variables are Wban Number = station, YearMonthDay = day, Dry Bulb Temp = temperature):

['Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
 '03011,20070401,0050,AO2 ,-,SCT055                                       ,10SM   ,-,32,23,28,69  , 4   ,130,-,0  ,30.13,-,-,AA,-\n',
 '03011,20070401,0150,AO2 ,-,BKN055                                       ,10SM   ,-,32,23,28,69  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
 '03011,20070401,0250,AO2 ,-,OVC050                                       ,10SM   ,-,32,23,28,69  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
 '03011,20070401,0350,AO2 ,-,OVC050                                       ,10SM   ,-,34,23,30,64  , 3   ,120,-,0  ,30.12,-,-,AA,-\n',
 '03011,20070401,0450,AO2 ,-,BKN050                                       ,10SM   ,-,34,23,30,64  , 4   ,130,-,0  ,30.11,-,-,AA,-\n',
 '03011,20070401,0550,AO2 ,-,SCT050 SCT070                                ,10SM   ,-,32,25,28,75  , 3   ,150,-,0  ,30.10,-,-,AA,-\n',
 '03011,20070401,0650,AO2 ,-,SCT070                                       ,10SM   ,-,34,25,30,70  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
 '03012,20070401,0750,AO2 ,-,CLR                                          ,10SM   ,-,37,27,34,67  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
 '03011,20070401,0850,AO2 ,-,SCT060 BKN075                                ,10SM   ,-,41,27,36,58  , 0   ,000,-,0  ,30.13,-,-,AA,-\n',
 '03011,20070401,0950,AO2 ,-,SCT060 OVC075                                ,10SM   ,-,45,23,37,42  , 0   ,000,-,0  ,30.14,-,-,AA,-\n',
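Instead of unpacking all 21 fields by position, the rows can be read with Python's standard `csv` module and the three needed columns looked up by their header names. A minimal sketch, assuming the header row is exactly as printed above and that every record's Dry Bulb Temp is an integer (any other value would raise `ValueError`); `read_records` and the shortened two-record `sample` are hypothetical, for illustration only:

```python
import csv
import io

# Hypothetical sample: the header above plus two records (padding spaces shortened).
sample = io.StringIO(
    "Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n"
    "03011,20070401,0050,AO2 ,-,SCT055 ,10SM ,-,32,23,28,69 , 4 ,130,-,0 ,30.13,-,-,AA,-\n"
    "03012,20070401,0750,AO2 ,-,CLR ,10SM ,-,37,27,34,67 , 4 ,140,-,0 ,30.12,-,-,AA,-\n"
)


def read_records(f):
    """Yield (day, station, temperature) tuples from the weather file."""
    # skipinitialspace drops the blank after each comma in the header row
    reader = csv.reader(f, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    # Look the columns up by name instead of unpacking every field
    i_station = header.index("Wban Number")
    i_day = header.index("YearMonthDay")
    i_temp = header.index("Dry Bulb Temp")
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield row[i_day].strip(), row[i_station].strip(), int(row[i_temp])


records = list(read_records(sample))
print(records)  # [('20070401', '03011', 32), ('20070401', '03012', 37)]
```

For the real file, passing `open('bigdatatemp.txt', newline='')` in place of `sample` should produce the same kind of tuples for every row.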

Then I create a dictionary and 3 lists containing the variables I need (station, year, temperature):

# Create a dictionary
# Iterate each line
# If the key doesn't exist, create one equal to empty list
# Otherwise, append temperature to list
# This also uses an interim dictionary (tmp).
years = []
stations = []
temps = []

for line in Lines:
    fields = line.split(',')
    # Column 0 = Wban Number (station), 1 = YearMonthDay (day), 8 = Dry Bulb Temp
    stations.append(fields[0])
    years.append(fields[1])
    temps.append(fields[8])
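`zip()` is not limited to two iterables, so the three lists can be walked together and keyed on a `(day, station)` tuple. A minimal one-pass sketch, assuming the header row has been dropped (for example by looping over `Lines[1:]`) and that temperatures are converted to `int`, since comparing strings such as `'100'` and `'99'` gives the wrong order; the short sample lists and the name `min_max` are hypothetical:

```python
# Hypothetical stand-ins for the three lists built above (header row removed)
stations = ["03011", "03011", "03012", "03011"]
years = ["20070401", "20070401", "20070401", "20070402"]
temps = ["32", "45", "37", "41"]

# zip() pairs up any number of lists element by element.
# Key: (day, station); value: [min, max], updated as each reading arrives.
min_max = {}
for day, station, temp in zip(years, stations, temps):
    t = int(temp)  # compare numerically, not as strings
    key = (day, station)
    if key not in min_max:
        min_max[key] = [t, t]
    else:
        low, high = min_max[key]
        min_max[key] = [min(low, t), max(high, t)]

for (day, station), (low, high) in sorted(min_max.items()):
    print(f"{day}, {station}, max({high}) min({low})")
# 20070401, 03011, max(45) min(32)
# 20070401, 03012, max(37) min(37)
# 20070402, 03011, max(41) min(41)
```

A tuple key keeps the structure flat: every distinct station-and-day combination gets its own entry, so no nested loops are needed to reach the readings.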

Last but not least, here is where I'm stuck. I wrote a loop over 2 lists and iterate through them:

dayTemps = {d:[] for d in stations}
for d,t in zip(stations,temps): dayTemps[d].append(t)

print(dayTemps)

output:
{'Wban Number': [' Dry Bulb Temp'], '03011': ['32', '32', '32', '34', '34', '32', '34', '41', '45', '55', '54', '54', '52', '46', '43', '43', '43'], '03012': ['37', '46', '54', '46', '45', '43'], '03013': ['50', '52', '50', '46', '45'], '03014': ['45']}

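But I actually need the day variable as well, and I can't seem to get my head around it. Should it be a dictionary with the date as key and my dictionary above as the value? Also, how should I structure this so that I get the maximum and minimum temperature for each station: should that happen in one step, or in two or more steps?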
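A dictionary keyed by date whose values are per-station dictionaries (the structure asked about above) does work, and the max/min can be taken in a second step once the grouping is complete. A sketch using `collections.defaultdict`, with the example input from the question hard-coded as hypothetical `records` tuples (in the real script they would come from the three lists, header row skipped):

```python
from collections import defaultdict

# Hypothetical (day, station, temperature) records taken from the example input
records = [
    ("20200101", "station1", 35),
    ("20200101", "station1", 44),
    ("20200101", "station1", 77),
    ("20200101", "station3", 66),
    ("20200101", "station3", 99),
    ("20200102", "station1", 54),
    ("20200102", "station2", 55),
]

# Step 1: group. day -> station -> list of temperatures
by_day = defaultdict(lambda: defaultdict(list))
for day, station, temp in records:
    by_day[day][station].append(temp)

# Step 2: reduce each list to its max and min
for day, stations in by_day.items():
    for station, temps in stations.items():
        print(f"{day}, {station}, max({max(temps)}) min({min(temps)})")
# 20200101, station1, max(77) min(35)
# 20200101, station3, max(99) min(66)
# 20200102, station1, max(54) min(54)
# 20200102, station2, max(55) min(55)
```

Collecting first and reducing afterwards keeps every reading in memory; if only the extremes are needed, a running `[min, max]` pair per key can be updated in a single pass instead.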

【Comments】:

  • Have you considered using pandas for this? It would make this very simple.
  • @Chris Yes, but I don't think it works in a MapReduce script, or would it?
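In MapReduce terms the grouping question has a fairly direct answer: the mapper emits the composite key `(day, station)` with the temperature as its value, and the reducer receives every temperature for one key and returns the max and min. The sketch below only imitates that flow in plain Python, with sorting plus `itertools.groupby` standing in for the framework's shuffle phase; `mapper` and `reducer` are hypothetical names, not the API of Hadoop or any MapReduce library:

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: one input line -> ((day, station), temperature)."""
    # [:3] also tolerates a stray trailing comma after the temperature
    day, station, temp = (field.strip() for field in line.split(",")[:3])
    return (day, station), int(temp)


def reducer(key, temps):
    """Reduce phase: all temperatures for one (day, station) key -> max and min."""
    temps = list(temps)
    return key, max(temps), min(temps)


lines = [
    "20200101, station1, 35",
    "20200101, station1, 44",
    "20200101, station1, 77",
    "20200101, station3, 66",
    "20200101, station3, 99",
    "20200102, station1, 54",
    "20200102, station2, 55",
]

# Shuffle/sort: bring all pairs with the same key next to each other
pairs = sorted(mapper(line) for line in lines)
results = [
    reducer(key, (temp for _, temp in group))
    for key, group in groupby(pairs, key=itemgetter(0))
]

for (day, station), high, low in results:
    print(f"{day}, {station}, max({high}) min({low})")
```

Because the key already contains the day, the reducer never has to loop over days and stations separately; the framework's grouping does that work.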

Tags: python list loops dictionary


【Answer 1】:

More or less like this:

data = {}  # date -> station -> [min, max]
MIN = 0
MAX = 1
DATE = 0
STATION = 1
VALUE = 2
with open('in.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        fields = [field.strip() for field in line.split(',')]
        value = int(fields[VALUE])
        stations = data.setdefault(fields[DATE], {})
        if fields[STATION] not in stations:
            # First reading for this station on this day: it is both min and max
            stations[fields[STATION]] = [value, value]
        else:
            current = stations[fields[STATION]]
            current[MIN] = min(current[MIN], value)
            current[MAX] = max(current[MAX], value)


for date, stations in data.items():
    for station, values in stations.items():
        print(f'{date} {station} {values}')

in.txt

20200101, station1, 35
20200101, station1, 44
20200101, station1, 77
20200101, station3, 66
20200101, station3, 99
20200102, station1, 54
20200102, station2, 55

Output

20200101 station1 [35, 77]
20200101 station3 [66, 99]
20200102 station1 [54, 54]
20200102 station2 [55, 55]

【Comments】:

    【Answer 2】:
    lines = ['Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
     '03011,20070401,0050,AO2 ,-,SCT055                                       ,10SM   ,-,32,23,28,69  , 4   ,130,-,0  ,30.13,-,-,AA,-\n',
     '03011,20070401,0150,AO2 ,-,BKN055                                       ,10SM   ,-,32,23,28,69  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
     '03011,20070401,0250,AO2 ,-,OVC050                                       ,10SM   ,-,32,23,28,69  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
     '03011,20070401,0350,AO2 ,-,OVC050                                       ,10SM   ,-,34,23,30,64  , 3   ,120,-,0  ,30.12,-,-,AA,-\n',
     '03011,20070401,0450,AO2 ,-,BKN050                                       ,10SM   ,-,34,23,30,64  , 4   ,130,-,0  ,30.11,-,-,AA,-\n',
     '03011,20070401,0550,AO2 ,-,SCT050 SCT070                                ,10SM   ,-,32,25,28,75  , 3   ,150,-,0  ,30.10,-,-,AA,-\n',
     '03011,20070401,0650,AO2 ,-,SCT070                                       ,10SM   ,-,34,25,30,70  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
     '03012,20070401,0750,AO2 ,-,CLR                                          ,10SM   ,-,37,27,34,67  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
     '03011,20070401,0850,AO2 ,-,SCT060 BKN075                                ,10SM   ,-,41,27,36,58  , 0   ,000,-,0  ,30.13,-,-,AA,-\n',
     '03011,20070401,0950,AO2 ,-,SCT060 OVC075                                ,10SM   ,-,45,23,37,42  , 0   ,000,-,0  ,30.14,-,-,AA,-\n',]
    
    
    
    # Keep station, day and Dry Bulb Temp (column 8); convert the temperature to int
    # so that max()/min() compare numbers rather than strings
    lst = [(f[0], f[1], int(f[8])) for f in (i.split(',') for i in lines[1:])]
    
    # Distinct (day, station) pairs, so each station is reported separately for every day
    keys = sorted({(day, station) for station, day, _ in lst})
    
    for day, station in keys:
        temps = [t for s, d, t in lst if s == station and d == day]
        print(day, station, ' max(', max(temps), ')', ' min(', min(temps), ')')
    
    >>> 20070401 03011  max( 45 )  min( 32 )
        20070401 03012  max( 37 )  min( 37 )
    
    

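    Create sublists of the relevant columns.

    Then create another collection containing the distinct station numbers.

    Then iterate over each group to get the maximum and minimum.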

    【Comments】:
