【Question title】:Part 2 of a successful outcome regarding white-space filling
【Posted】:2023-03-28 15:41:01
【Question description】:

So, my first question was answered correctly. For reference, you can go here...

How to fill the white-space with info while leaving the rest unchanged?

In short, I needed this...

POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0


POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0

to become this...

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY

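That was done successfully with a Python script. Now I find that I need to remove the duplicated line, specifically the last point of each block. That line closes the polygon, but the build batch throws an error because it closes the polygon by itself. Basically, I need it all to end up like this...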

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
END_POLY

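And there are 3,415,978 lines to go through. Every other duplicate remover I tried strips out the white-space and all the wording as well. Hmm.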

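【Question discussion】:

  • Keep track of the last line you read, and don't write it out if the current line is END_POLY.
  • Do you want to preserve the order? Did you check my answer?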

Tags: python duplicates


【Solution 1】:

As pointed out in the comments, keep a reference to the previous line:

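with open('in.txt') as fin, open('out.txt', 'w') as fout:
    prev = None  # no line has been read yet
    for i, line in enumerate(fin):
        # Write the held-back line unless the current line is END_POLY;
        # this drops the duplicate closing point before each END_POLY.
        if prev is not None and line.strip() != 'END_POLY':
            fout.write(prev)
        prev = line
        if not i % 10000:
            print('Processing line {}'.format(i))
    # Flush the final line (the last END_POLY); skipped for an empty file.
    if prev is not None:
        fout.write(prev)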

【Discussion】:

  • Got this.... Traceback (most recent call last): File "K:\X-Plane 11\addpoly.py", line 9, in <module> fout.write(prev) TypeError: expected a string or other character buffer object
  • import itertools
    with open("y.txt") as f, open("yout.txt","w") as fw:
        fw.writelines(itertools.chain.from_iterable([[" BEGIN_POLYGON\n"]+list(v)+["END_POLY\n"] for k,v in itertools.groupby(f,key = lambda l : bool(l.strip())) if k]))
    with open ('yout.txt') as fin, open('yout1.txt', 'w') as fout:
        prev = None
        for i, line in enumerate(fin):
            if line.strip() != 'END_POLY':
                fout.write(prev)
            prev = line
            if not i % 10000:
                print('Processing line {}'.format(i))
        fout.write(line)
  • Added the None check.
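
The TypeError in the comment above comes from calling fout.write(prev) on the very first line, when prev is still None. As a minimal sketch of the same "hold back one line" idea, the generator below makes the None guard explicit. The name drop_line_before_end and its end_marker parameter are hypothetical, and the sketch assumes every block contains at least one point before END_POLY.

```python
def drop_line_before_end(lines, end_marker="END_POLY"):
    """Yield lines, skipping any line that directly precedes end_marker."""
    prev = None  # nothing has been read yet
    for line in lines:
        # Emit the held-back line only if the current line is not the marker.
        if prev is not None and line.strip() != end_marker:
            yield prev
        prev = line
    # The final line has no successor, so it is always emitted.
    if prev is not None:
        yield prev


sample = [
    "BEGIN_POLYGON\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "POLYGON_POINT -79.750000000217,42.016478251402,0\n",
    "POLYGON_POINT -79.750598748133,42.017193264943,0\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "END_POLY\n",
]
result = list(drop_line_before_end(sample))
```

Because it is a generator, it can be fed an open file directly, for example fout.writelines(drop_line_before_end(fin)), so the 3.4 million lines are never held in memory at once.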
【Solution 2】:

Although it isn't Python, this kind of edit is very simple if you use sed:

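sed '$!N;/\nEND_POLY$/!P;D' file.txt

Basically, this keeps a two-line window: N appends the next line to the pattern space, P prints the first line of the window unless the second line is END_POLY, and D discards that first line and starts over with the remaining one. The shorter pairwise form, sed 'N;s/.*\n\(END_POLY\)/\1/' file.txt, reads the file two lines at a time and deletes the first line of a pair when the second is END_POLY, but it misses END_POLY whenever it lands on the first line of a pair, which happens as soon as a block has an odd number of lines.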

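【Discussion】:

    【Solution 3】:

    If you don't want duplicated data, you can de-duplicate each block with collections.OrderedDict.fromkeys, which, unlike converting the list to a set and back, keeps the original order (a slight modification of @Jean-François Fabre's code from the other question):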

    import itertools, collections

    with open("file.txt") as f, open("fileout.txt", "w") as fw:
        # One group per run of non-blank lines; fromkeys drops repeats, keeps order.
        for k, v in itertools.groupby(f, key=lambda l: bool(l.strip())):
            if k:
                unique = list(collections.OrderedDict.fromkeys(v))
                fw.writelines(["BEGIN_POLYGON\n"] + unique + ["END_POLY\n"])
    

    As you can see, if you run:

    print(list(collections.OrderedDict.fromkeys([1,1,1,1,1,1,2,2,2,2,5,3,3,3,3,3]).keys()))
    

    the result is [1, 2, 5, 3], and the order is preserved.
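
On Python 3.7 or later, a plain dict also preserves insertion order, so the same trick works without collections. The sketch below assumes such an interpreter; unique_in_order is a hypothetical helper name. The traceback quoted under Solution 1 looks like it came from Python 2, where OrderedDict is still the right tool.

```python
def unique_in_order(items):
    """Return items with repeats removed, keeping first-seen order."""
    # Plain dicts preserve insertion order in Python 3.7 and later.
    return list(dict.fromkeys(items))


deduped = unique_in_order([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 5, 3, 3, 3, 3, 3])
print(deduped)  # [1, 2, 5, 3]
```

Note that either variant removes every repeated line inside a block, not only the closing point, so a polygon that legitimately visits the same coordinate twice would lose a vertex.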

    【Discussion】:

    • Wrong, turning the list into a set changes the order. Not sure the OP wants that.
    • @Jean-FrançoisFabre I edited the answer; it now preserves the order.
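
The question asks only for the closing point of each block to go, which is narrower than removing every repeated line. A rough sketch of that narrower rule, starting from the original blank-line-separated input, could look like the following. The helper names polygons, open_ring and convert are hypothetical, and the comparison assumes the closing point is textually identical to the first point once surrounding whitespace is stripped.

```python
import itertools


def polygons(lines):
    """Yield one list of stripped point lines per blank-line-separated block."""
    for has_text, group in itertools.groupby(lines, key=lambda l: bool(l.strip())):
        if has_text:
            yield [l.strip() for l in group]


def open_ring(points):
    """Drop the last point only when it repeats the first one."""
    if len(points) > 1 and points[-1] == points[0]:
        return points[:-1]
    return points


def convert(lines):
    """Wrap each block in BEGIN_POLYGON / END_POLY without its closing point."""
    out = []
    for pts in polygons(lines):
        out.append("BEGIN_POLYGON")
        out.extend(open_ring(pts))
        out.append("END_POLY")
    return out


raw = [
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "POLYGON_POINT -79.750000000217,42.016478251402,0\n",
    "POLYGON_POINT -79.750598748133,42.017193264943,0\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "\n",
    "\n",
    "POLYGON_POINT -79.750000000217,42.085882815878,0\n",
    "POLYGON_POINT -79.750000000217,42.082008734634,0\n",
    "POLYGON_POINT -79.750000000217,42.085882815878,0",
]
converted = convert(raw)
```

For a real file, passing the open file object to polygons keeps only one block in memory at a time; convert collects everything into a list here only to keep the sketch short.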