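[Posted]: 2020-12-07 13:50:19
[Question]:
I'm working on a MapReduce project. My input is (day, station, temperature), and my goal is to output the maximum and minimum temperature for each station on each day. So for this input, my output should look like this:
Input: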
20200101, station1, 35
20200101, station1, 44
20200101, station1, 77
20200101, station3, 66
20200101, station3, 99
20200102, station1, 54
20200102, station2, 55
Output:
20200101, station1, max(77) min(35)
20200101, station3, max(99) min(66)
20200102, station1, max(54) min(..)
20200102, station2, max(55) min(..)
What I've tried so far only works for 2 lists, not for 3: for each day, find each station, and for each station, each temperature...
Here is the code I've tried so far:
# Read the text file in
file1 = open('bigdatatemp.txt', 'r')
Lines = file1.readlines()
Output of Lines (the important variables are Wban Number = station, YearMonthDay = day, Dry Bulb Temp = temperature):
['Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
'03011,20070401,0050,AO2 ,-,SCT055 ,10SM ,-,32,23,28,69 , 4 ,130,-,0 ,30.13,-,-,AA,-\n',
'03011,20070401,0150,AO2 ,-,BKN055 ,10SM ,-,32,23,28,69 , 4 ,140,-,0 ,30.12,-,-,AA,-\n',
'03011,20070401,0250,AO2 ,-,OVC050 ,10SM ,-,32,23,28,69 , 3 ,130,-,0 ,30.12,-,-,AA,-\n',
'03011,20070401,0350,AO2 ,-,OVC050 ,10SM ,-,34,23,30,64 , 3 ,120,-,0 ,30.12,-,-,AA,-\n',
'03011,20070401,0450,AO2 ,-,BKN050 ,10SM ,-,34,23,30,64 , 4 ,130,-,0 ,30.11,-,-,AA,-\n',
'03011,20070401,0550,AO2 ,-,SCT050 SCT070 ,10SM ,-,32,25,28,75 , 3 ,150,-,0 ,30.10,-,-,AA,-\n',
'03011,20070401,0650,AO2 ,-,SCT070 ,10SM ,-,34,25,30,70 , 3 ,130,-,0 ,30.12,-,-,AA,-\n',
'03012,20070401,0750,AO2 ,-,CLR ,10SM ,-,37,27,34,67 , 4 ,140,-,0 ,30.12,-,-,AA,-\n',
'03011,20070401,0850,AO2 ,-,SCT060 BKN075 ,10SM ,-,41,27,36,58 , 0 ,000,-,0 ,30.13,-,-,AA,-\n',
'03011,20070401,0950,AO2 ,-,SCT060 OVC075 ,10SM ,-,45,23,37,42 , 0 ,000,-,0 ,30.14,-,-,AA,-\n',
Then I create a dictionary and build 3 lists holding the variables I need (station, year, temperature):
# Create a dictionary
# Iterate each line
# If the key doesn't exist, create one equal to empty list
# Otherwise, append temperature to list
# This also uses an interim dictionary (tmp).
years = []
stations = []
temps = []
for line in Lines:
    (station, year, ac, ad, af, ag, ah, aj, temp, al, ae, ar, at, ay, au, ai, alc, ap, ax, av, an) = line.split(',')
    stations.append(station)
    years.append(year)
    temps.append(temp)
Last but not least, here is where I'm stuck. I created a loop over 2 of the lists and iterated through them:
dayTemps = {d:[] for d in stations}
for d,t in zip(stations,temps): dayTemps[d].append(t)
print(dayTemps)
output:
{'Wban Number': [' Dry Bulb Temp'], '03011': ['32', '32', '32', '34', '34', '32', '34', '41', '45', '55', '54', '54', '52', '46', '43', '43', '43'], '03012': ['37', '46', '54', '46', '45', '43'], '03013': ['50', '52', '50', '46', '45'], '03014': ['45']}
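But I actually need the day variable as well, and I can't seem to wrap my head around it. Should it be a dictionary keyed by date whose values are dictionaries like the one above? Also, how would I structure it so that I get the maximum and minimum temperature for each station, and should that happen in one step or in two or more steps?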
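One possible way to bring the day into the grouping, offered only as a sketch and not as the definitive approach: key a single dictionary by the pair (day, station) instead of nesting dictionaries. The function name `min_max_by_day_station` and the variable names below are hypothetical. The column positions (0 = Wban Number, 1 = YearMonthDay, 8 = Dry Bulb Temp) are taken from the header line shown above, and rows whose temperature is not an integer (such as the header) are assumed to be safe to skip.

```python
from collections import defaultdict


def min_max_by_day_station(lines):
    """Return {(day, station): (max_temp, min_temp)} for comma-separated rows."""
    grouped = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split(',')]
        if len(fields) < 9:
            continue  # row is too short to contain the temperature column
        station, day, temp = fields[0], fields[1], fields[8]
        try:
            value = int(temp)
        except ValueError:
            continue  # assumption: skip the header and non-numeric readings
        grouped[(day, station)].append(value)
    # Second step: reduce each group of readings to (max, min).
    return {key: (max(values), min(values)) for key, values in grouped.items()}


# The header and three data rows copied from the sample output above.
sample = [
    'Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, '
    'Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, '
    'Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, '
    'Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, '
    'Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
    '03011,20070401,0050,AO2 ,-,SCT055 ,10SM ,-,32,23,28,69 , 4 ,130,-,0 ,30.13,-,-,AA,-\n',
    '03012,20070401,0750,AO2 ,-,CLR ,10SM ,-,37,27,34,67 , 4 ,140,-,0 ,30.12,-,-,AA,-\n',
    '03011,20070401,0950,AO2 ,-,SCT060 OVC075 ,10SM ,-,45,23,37,42 , 0 ,000,-,0 ,30.14,-,-,AA,-\n',
]

result = min_max_by_day_station(sample)
for (day, station), (highest, lowest) in sorted(result.items()):
    print(f'{day}, {station}, max({highest}) min({lowest})')
```

Grouping and reducing are kept as two separate steps here because that mirrors the map and reduce phases. A station with a single reading on a given day gets the same value for both max and min.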
[Comments]:
- Have you considered using pandas for this? It would make this very simple.
- @Chris Yes, but I don't think it would work in a MapReduce script, or would it?
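On whether this needs anything beyond the standard library: below is a minimal sketch of the map, shuffle, and reduce phases simulated in plain Python, using the simplified (day, station, temperature) rows from the question. It is not tied to any particular MapReduce framework. The names `mapper`, `reducer`, and `run` are hypothetical, and sorting followed by `itertools.groupby` stands in for the shuffle step that a real framework would perform.

```python
from itertools import groupby


def mapper(line):
    """Map phase: emit ((day, station), temperature) for one input line."""
    # Dropping empty fields tolerates a stray trailing comma on a row.
    fields = [field.strip() for field in line.split(',') if field.strip()]
    if len(fields) < 3:
        return
    day, station, temp = fields[:3]
    yield (day, station), int(temp)


def reducer(key, temps):
    """Reduce phase: collapse all temperatures for one key to (max, min)."""
    temps = list(temps)
    day, station = key
    return day, station, max(temps), min(temps)


def run(lines):
    """Simulate the shuffle by sorting pairs so equal keys become adjacent."""
    pairs = sorted(pair for line in lines for pair in mapper(line))
    results = []
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        results.append(reducer(key, (temp for _, temp in group)))
    return results


rows = [
    '20200101, station1, 35',
    '20200101, station1, 44',
    '20200101, station1, 77',
    '20200101, station3, 66',
    '20200101, station3, 99',
    '20200102, station1, 54',
    '20200102, station2, 55',
]

output = run(rows)
for day, station, highest, lowest in output:
    print(f'{day}, {station}, max({highest}) min({lowest})')
```

The composite key (day, station) is what lets a single reduce step produce both values per station per day.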
Tags: python list loops dictionary