Skip chunks of lines while reading a file in Python

【Question title】: Skip chunks of lines while reading a file Python
【Posted】: 2019-04-07 14:39:06
【Question description】:

I have a file containing curve data in the following repeating structure:

numbersofsamples
Title
     data
     data
     data
      ...

For example:

999numberofsamples
title crvTitle
             0.0            0.866423
    0.0001001073           0.6336382
    0.0002002157           0.1561626
    0.0003000172          -0.1542121
             ...                 ...
1001numberofsamples
title nextCrv
    0.000000e+00        0.000000e+00
    1.001073e-04        1.330026e+03
    2.002157e-04        3.737352e+03
    3.000172e-04        7.578963e+03
             ...                 ...

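The file consists of many curves and can be up to 2 GB in size.

My task is to find and export a specific curve by skipping the blocks (curves) I am not interested in. I know the length of each curve (the number of samples), so there should be a way to jump to the next delimiter (e.g. numberofsamples) until I find the title I need?

I tried to do this with an iterator, unfortunately without any success. Is this the right way to approach the task?

If possible, I don't want to hold the data in memory.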

【Comments】:

Show the code you tried and explain how it did not work as expected.

Tags: python file iterator readfile skip

【Solution 1】:

Here is the general way to skip lines you don't care about:

# 'somefile.txt' is a placeholder path
with open('somefile.txt') as file:
    for line in file:
        if 'somepattern' not in line:
            continue
        # if we got here, 'somepattern' is in the line, so process it

【Comments】:

Yes, I know this approach, but wouldn't it be faster to make bigger "jumps" without checking a condition on every line?
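A minimal sketch of the bigger jump this comment asks about, under two assumptions that go beyond the question: every header line has the form `<count>numberofsamples`, and `<count>` is exactly the number of data lines that follow the title line. The names `find_curve` and `HEADER` are made up for this illustration. A text file with variable-length lines cannot be seeked by line count, so the skipped lines are still read from disk, but `itertools.islice` consumes them without running a Python-level test on each one. Whether that is measurably faster on a 2 GB file is worth timing rather than assuming.

```python
import re
from collections import deque
from itertools import islice

# Assumed header format: an integer immediately followed by "numberofsamples".
HEADER = re.compile(r"^\s*(\d+)numberofsamples\s*$")


def find_curve(path, wanted_title):
    """Yield the data lines of the curve whose title line equals wanted_title."""
    with open(path) as f:
        for line in f:
            match = HEADER.match(line)
            if match is None:
                continue  # stray line, for example a blank line between blocks
            count = int(match.group(1))
            title = next(f, "").strip()
            if title == wanted_title:
                # Hand out exactly `count` data lines, one at a time.
                yield from islice(f, count)
                return
            # Not the wanted curve: consume `count` lines without inspecting them.
            deque(islice(f, count), maxlen=0)
```

Because `find_curve` is a generator, the caller can write each line straight to an output file, so only one line is held in memory at a time. If the count in a header does not match the real number of data lines, the jump lands in the wrong place, which is the main risk of this approach.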

【Solution 2】:

You don't need to keep all the lines in memory. Skip ahead to the title you want, then save only the lines you want:

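with open('somefile.txt') as lines:
    # skip to the title line of the curve you want
    for line in lines:
        # strip() drops the trailing newline so the comparison can match
        if line.strip() == 'title youwant':
            break
    # collect data lines until the header of the next curve
    numbers = []
    for line in lines:
        if 'numberofsamples' in line:
            break  # next samples
        numbers.append(line)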
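Since the question asks to export the curve and to avoid holding data in memory, the same two-loop idea can write each line straight to an output file instead of appending it to a list. This is only a sketch: `export_curve` and its parameters are hypothetical names, and it assumes that the title line matches `wanted_title` once surrounding whitespace is stripped and that the next curve's header contains `numberofsamples`.

```python
def export_curve(src_path, dst_path, wanted_title):
    """Copy the data lines of one curve to dst_path and return how many were written."""
    written = 0
    with open(src_path) as src:
        # Skip ahead to the title line of the wanted curve.
        for line in src:
            if line.strip() == wanted_title:
                break
        else:
            return 0  # title not found, so no output file is created
        with open(dst_path, "w") as dst:
            # Stream data lines until the header of the next curve (or end of file).
            for line in src:
                if "numberofsamples" in line:
                    break
                dst.write(line)
                written += 1
    return written
```

A call such as `export_curve('somefile.txt', 'curve.txt', 'title youwant')` would copy that curve's data lines into `curve.txt` (both file names are placeholders).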

【Comments】: