【Question title】: How to parse block data from a text file into a 2D array in Python?
【Posted】: 2012-09-01 00:31:13
【Question】:

I am trying to parse a text file with the following structure:

latitude                        5.0000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4      
  8.1
latitude                        4.3000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4       
  8.1
latitude                        4.0000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4       
  8.1
 ...

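Each distinct latitude value corresponds to a separate array row. number_of_data_values is the number of columns (it is the same throughout the file).

For this example, I want to read the file and output a 3 x 9 two-dimensional array like this: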

array = [[0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1],
         [0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1],
         [0.1,0.2,0.3,0.4,1.1,1.2,1.3,1.4,8.1]]

I tried doing this by looping over the lines, but I am looking for a more efficient way, since I may be dealing with large input files.

【Comments】:

    Tags: python arrays parsing text block


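    【Answer 1】:

    A line-by-line implementation is fairly simple and easy to follow. Assuming your latitude always starts on a new line (which is not what your example shows, but that may be a typo), you could do: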

    latitudes = []
    counts = []
    blocks = []
    current_block = []
    # `test` is any iterable of lines, e.g. an open file object
    for line in test:
        if line.startswith("latitude"):
            # New block: add the previous one to `blocks` and reset
            blocks.append(current_block)
            current_block = []
            latitudes.append(float(line.split()[-1]))
        elif line.startswith("number_of_data"):
            # Just append the current count to the list
            counts.append(int(line.split()[-1]))
        else:
            # Update the current block
            current_block += [float(f) for f in line.strip().split()]
    # Make sure to add the last block...
    blocks.append(current_block)
    # And to remove the first (empty) one
    blocks.pop(0)
    

    You can then check that all your blocks have the expected size:

    all(len(b)==c for (c,b) in zip(counts,blocks))
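
As a minimal sketch, the line-by-line approach can be wrapped in a function and run on the sample from the question. The names `parse_blocks`, `sample` and `sizes_ok` are made up for illustration, and the sketch assumes every data line comes after a `latitude` line.

```python
def parse_blocks(lines):
    """Parse an iterable of lines into (latitudes, counts, rows)."""
    latitudes, counts, rows = [], [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "latitude":
            latitudes.append(float(fields[-1]))
            rows.append([])  # start a new row for this block
        elif fields[0] == "number_of_data_values":
            counts.append(int(fields[-1]))
        else:
            # Data line: extend the row of the current block
            rows[-1].extend(float(f) for f in fields)
    return latitudes, counts, rows


sample = """\
latitude                        5.0000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4
  8.1
latitude                        4.3000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4
  8.1
latitude                        4.0000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4
  8.1
"""

latitudes, counts, array = parse_blocks(sample.splitlines())
# True when every row has as many values as its block announced
sizes_ok = all(len(row) == c for c, row in zip(counts, array))
```

Passing an open file object instead of `sample.splitlines()` should work the same way, since the function only iterates over lines.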
    

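    Alternative solution

    If you are worried about the loop, you may want to consider working on a memory-mapped version of the file. The idea is to find the positions of the lines that start with latitude. Once you have found one, find the next one, and you have a chunk of text: drop the first two lines (the one starting with latitude and the one starting with number_of_data), join the remaining lines and process them.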

    import mmap
    
    with open("crap.txt", "rb") as f:
        # Create the mapper (read-only; in Python 3 it works on bytes)
        mapper = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Initialize your output variables
        latitudes = []
        blocks = []
        # Find the beginning of the first block
        position = mapper.find(b"latitude")
        # `position` will be -1 if we can't find it
        while position >= 0:
            # Move to the beginning of the block
            mapper.seek(position)
            # Read the first line
            lat_line = mapper.readline().decode().strip()
            latitudes.append(float(lat_line.split()[-1]))
            # Read the second one
            zap = mapper.readline()
            # Where are we ?
            start = mapper.tell()
            # Where's the next block ? (search starts at the current position)
            position = mapper.find(b"latitude")
            # The last block runs to the end of the file
            end = position if position >= 0 else len(mapper)
            # Read the lines and combine them into a large string
            current_block = mapper.read(end - start).decode().replace("\n", " ")
            # Transform the string into a list of floats and update the block
            blocks.append([float(i) for i in current_block.split()])
        mapper.close()
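
One possible variation, sketched here under assumptions and not part of the answer itself: a compiled regular expression can locate block starts in the mapped file, which restricts matches to `latitude` at the beginning of a line. The names `parse_mapped`, `BLOCK_START` and `SAMPLE` are hypothetical, the input is assumed to be ASCII, and the demo writes two sample blocks to a temporary file.

```python
import mmap
import os
import re
import tempfile

SAMPLE = b"""latitude                        5.0000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4
  8.1
latitude                        4.3000
number_of_data_values             9
  0.1   0.2   0.3   0.4
  1.1   1.2   1.3   1.4
  8.1
"""

# Matches "latitude" only at the start of a line
BLOCK_START = re.compile(rb"^latitude", re.MULTILINE)


def parse_mapped(path):
    """Return (latitudes, blocks) parsed from the file at `path`."""
    latitudes, blocks = [], []
    with open(path, "rb") as f:
        mapper = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            starts = [m.start() for m in BLOCK_START.finditer(mapper)]
            # Each block ends where the next begins; the last ends at EOF
            ends = starts[1:] + [len(mapper)]
            for start, end in zip(starts, ends):
                lines = mapper[start:end].decode("ascii").splitlines()
                latitudes.append(float(lines[0].split()[-1]))
                # lines[1] is the number_of_data_values line; skip it
                values = " ".join(lines[2:]).split()
                blocks.append([float(v) for v in values])
        finally:
            mapper.close()
    return latitudes, blocks


# Demo on a temporary file holding two sample blocks
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(SAMPLE)
latitudes, blocks = parse_mapped(tmp.name)
os.remove(tmp.name)
```

Slicing the map copies only one block at a time into memory, so the whole file is never loaded as a single string.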
    

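    【Comments】:

      【Answer 2】:

      This looks fairly simple. The part that parses the numbers is just line.split(). The rest of the parsing can be made stricter or looser, depending on how stable the format of the input data is.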

      results = []
      latitude = None
      numbers_total = None
      value_list = []
      
      # `text` holds the whole file contents as a single string
      for line in text.splitlines():
        if line.startswith('latitude '):
          if latitude is not None:
            assert len(value_list) == numbers_total
            results.append((latitude, value_list))
            value_list = []
          latitude = line.split()[-1]
        elif line.startswith('number_of_data_values '):
          numbers_total = int(line.split()[-1])
        else:
          value_list.extend(line.split())
      
      # Make sure the last block gets added to the results.
      if latitude is not None:
        assert len(value_list) == numbers_total
        results.append((latitude, value_list))
        value_list = []
      
      for latitude, value_list in results:
        print('latitude %r: %r' % (latitude, value_list))
      

      This outputs:

      latitude '5.0000': ['0.1', '0.2', '0.3', '0.4', '1.1', '1.2', '1.3', '1.4', '8.1']
      latitude '4.3000': ['0.1', '0.2', '0.3', '0.4', '1.1', '1.2', '1.3', '1.4', '8.1']
      latitude '4.0000': ['0.1', '0.2', '0.3', '0.4', '1.1', '1.2', '1.3', '1.4', '8.1']
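
The values above are still strings, while the question asks for a numeric 2D array. A minimal sketch of the conversion follows, assuming a `results` list shaped like the one printed above; `to_rows` is a hypothetical helper that also raises `ValueError` instead of relying on `assert`, as one way of making the parsing stricter.

```python
def to_rows(results, expected_count):
    """Convert (latitude, values) string pairs to floats, checking row length."""
    latitudes, rows = [], []
    for latitude, values in results:
        if len(values) != expected_count:
            raise ValueError(
                "latitude %s: expected %d values, got %d"
                % (latitude, expected_count, len(values))
            )
        latitudes.append(float(latitude))
        rows.append([float(v) for v in values])
    return latitudes, rows


# Same shape as the `results` list printed above
values = ['0.1', '0.2', '0.3', '0.4', '1.1', '1.2', '1.3', '1.4', '8.1']
results = [('5.0000', values), ('4.3000', values), ('4.0000', values)]

latitudes, array = to_rows(results, expected_count=9)
```

Here `array` is a list of three rows with nine floats each, and `latitudes` keeps the row labels in the same order.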
      

      【Comments】:
