【Question title】: Python read text file for data then extract sub-strings from list of strings
【Posted】: 2019-12-13 15:34:18
【Question】:

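I have a weather data file containing high temperature, low temperature, rainfall and so on. I need to open the file and return data based on a range of years entered by the user. The user enters a start date and an end date, and I put the matching data into a list; the user can then search that sub-list for the highest temperature (HIGHTEMP), the lowest temperature (LOWTEMP), or the highest rainfall (PRCP). At the moment I can search for the strings, but I'm not sure how to identify, for example, the high temperatures, collect them into a sub-list, find the highest one, and return that data. The same goes for low temperature and rainfall.

Here is what I have so far: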

def openFile():
    begin = input("Enter your starting year in this format YYYY ")
    end = input("Enter your ending year for weather data in this format YYYY ")

    lines = tuple(open('/Users/jasontt/test/spokaneweatherdata.txt', 'r'))
    #print(lines)
    print("")
    #print(lines[1])
    print("")

    result = [i for i in lines if str(begin) in i]
    #print("This is begining data ", result)

    resultTwo = [i for i in lines if str(end) in i]
    #print("This is end of data ", resultTwo)
    #Combined list based on years entered
    ultimateList = [result + resultTwo]
    #Combined list of weather data for years selected
    print(ultimateList)


Test data:

STATION           STATION_NAME                                       ELEVATION  LATITUDE   LONGITUDE  DATE     PRCP     TEMPMAX     TEMPMIN
----------------- -------------------------------------------------- ---------- ---------- ---------- -------- -------- -------- --------
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490101 0.00     44       27
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490102 0.00     42       25
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490103 0.15     46       30
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490104 0.03     41       30
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490105 1.14     46       37
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490106 0.00     51       40
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490107 0.00     57       36
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490108 0.00     56       45
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490109 0.00     66       42
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490110 0.00     70       51
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490111 0.03     59       45
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490112 0.04     48       38
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490113 0.00     52       36
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490114 0.00     56       36
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490115 0.00     49       31
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490116 0.00     68       28
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490117 0.00     63       50
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490118 0.04     53       42
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490119 0.01     63       38
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490120 0.00     45       28
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490121 0.97     35       28
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490122 0.29     60       34
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490123 0.14     47       38
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490124 0.01     72       38
GHCND:USW00013741                     SPOKANE REGIONAL AIRPORT WA US      366.1   37.31667  -79.96667 19490125 0.05     66       49

【Comments】:

  • Work out exactly which characters in each line, by position, contain the temperatures you are looking for.
  • Thanks for the edit, Scott!

Tags: python list filter text-files


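【Solution 1】:

It's hard to tell from a copy-pasted data sample, but it looks like your file uses a "fixed-width" line format: each column in a line starts at a given position and ends at a given position. This was a fairly common "format" back in the day...

So what you want here is to write down each column's name together with its start and end positions, so that you can easily parse a line into fields, i.e.: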

FORMAT_MAP = {
    # fieldname : (start, end)
    "STATION": (0, 17),
    "STATION_NAME": (18, 68),
    "ELEVATION": (69, 79),
    # etc...
    }


def parse_line(line):
    return {name: line[start:end].strip() for name, (start, end) in FORMAT_MAP.items()}
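
If the dashed ruler line under the header survived the copy-paste intact, the remaining (start, end) pairs do not have to be counted by hand. The following is only a sketch under that assumption: `spans_from_ruler` is a hypothetical helper, and the column widths (17, 50, 10, 10, 10, 8, 8, 8, 8) are read off the dashes in the sample above, so check them against the real file before relying on them.

```python
import re


def spans_from_ruler(header_line, ruler_line):
    """Map each column name to the (start, end) span of its run of dashes."""
    spans = [m.span() for m in re.finditer(r"-+", ruler_line)]
    names = header_line.split()
    if len(names) != len(spans):
        raise ValueError("header and ruler disagree on the number of columns")
    return dict(zip(names, spans))


# Assumption: ruler rebuilt from the dash counts in the sample data.
HEADER = "STATION STATION_NAME ELEVATION LATITUDE LONGITUDE DATE PRCP TEMPMAX TEMPMIN"
RULER = " ".join("-" * width for width in (17, 50, 10, 10, 10, 8, 8, 8, 8))

FORMAT_MAP = spans_from_ruler(HEADER, RULER)
print(FORMAT_MAP["DATE"])     # (102, 110)
print(FORMAT_MAP["TEMPMAX"])  # (120, 128)
```

With the real file you would pass its first two lines instead of `HEADER` and `RULER`. This only works because none of the column names contains a space.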

Now you can parse the file into a sequence of field dicts:

def iter_parse_file(f, startyear, endyear):
    # startyear and endyear are "YYYY" strings, so the plain string
    # comparisons below order them correctly.
    # skip the first two header lines (column names and dashed ruler)
    next(f)
    next(f)

    for line in f:
        # we assume the lines are sorted on date, and that the
        # date format is YYYYMMDD.
        row = parse_line(line)
        year = row["DATE"][:4]
        if year < startyear:
            continue
        elif year > endyear:
            break
        yield row


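# the year range comes from the user, as in the question
startyear = input("Enter your starting year in this format YYYY ").strip()
endyear = input("Enter your ending year in this format YYYY ").strip()

with open("your/file.ext") as f:
    rows = list(iter_parse_file(f, startyear, endyear))

for row in rows:
    print("{DATE} : {TEMPMIN} - {TEMPMAX}".format(**row))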

You can also filter or sort on column values, build a pandas DataFrame, and so on.
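
As an illustration of searching on column values, here is a minimal sketch of the lookups the question asks for: the day with the highest TEMPMAX, the lowest TEMPMIN and the highest PRCP. It assumes `rows` is a list of dicts shaped like the output of `parse_line`, with every value still a string. The four rows are copied from the sample data, and the names `hottest`, `coldest` and `wettest` are only illustrative.

```python
# Assumption: rows look like the dicts produced by parse_line (all strings).
rows = [
    {"DATE": "19490109", "PRCP": "0.00", "TEMPMAX": "66", "TEMPMIN": "42"},
    {"DATE": "19490105", "PRCP": "1.14", "TEMPMAX": "46", "TEMPMIN": "37"},
    {"DATE": "19490124", "PRCP": "0.01", "TEMPMAX": "72", "TEMPMIN": "38"},
    {"DATE": "19490102", "PRCP": "0.00", "TEMPMAX": "42", "TEMPMIN": "25"},
]

# Convert inside the key function: comparing the raw strings would order
# them alphabetically, not numerically.
hottest = max(rows, key=lambda row: int(row["TEMPMAX"]))
coldest = min(rows, key=lambda row: int(row["TEMPMIN"]))
wettest = max(rows, key=lambda row: float(row["PRCP"]))

print("highest temperature:", hottest["TEMPMAX"], "on", hottest["DATE"])
print("lowest temperature:", coldest["TEMPMIN"], "on", coldest["DATE"])
print("highest rainfall:", wettest["PRCP"], "on", wettest["DATE"])
```

Because `max` and `min` return the whole row, the date and the other fields of the record come back together with the extreme value.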

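Note that you can (and probably want to) convert the data to the proper types during parsing. With the starting point above, you should be able to do that quite easily.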
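
One possible way to do that conversion is a second mapping from field name to converter function, applied to each parsed row. This is a sketch, not part of the original answer: `CONVERTERS` and `convert_row` are hypothetical names, and the code assumes every field holds a valid value. The sample does not show how missing readings are encoded, so real data may need extra handling.

```python
from datetime import date, datetime

# Assumption: one converter per field; fields not listed stay as strings.
CONVERTERS = {
    "ELEVATION": float,
    "LATITUDE": float,
    "LONGITUDE": float,
    "DATE": lambda text: datetime.strptime(text, "%Y%m%d").date(),
    "PRCP": float,
    "TEMPMAX": int,
    "TEMPMIN": int,
}


def convert_row(row):
    """Return a copy of row with each field converted to its proper type."""
    return {name: CONVERTERS.get(name, str)(value) for name, value in row.items()}


raw = {
    "STATION": "GHCND:USW00013741",
    "DATE": "19490103",
    "PRCP": "0.15",
    "TEMPMAX": "46",
    "TEMPMIN": "30",
}
typed = convert_row(raw)
print(typed["DATE"].year, typed["PRCP"], typed["TEMPMAX"])  # 1949 0.15 46
```

If the conversion happens inside the parsing step, the year filter would compare `row["DATE"].year` against integer years rather than slicing a string.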
