What is wrong in the code written in Python [duplicate]: answers

【Question title】: What is wrong in the code written in Python [duplicate]
【Posted】: 2013-09-07 17:07:11
【Question description】:

Given that infile contains:

aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB
aawerwefrewqa"pic02.jpg"bbbertebbbsize 100KB
atyrtyruraa"pic03.jpg"bbbwtrwtbbbsize 190KB

How can I get outfile to be:

pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB

My code is:

with open ('test.txt', 'r') as infile, open ('outfile.txt', 'w') as outfile:
    for line in infile:
        lines_set1 = line.split ('"')
        lines_set2 = line.split (' ')
        for item_set1 in lines_set1:
            for item_set2 in lines_set2:
                if item_set1.endswith ('.jpg'):
                    if item_set2.endswith ('KB'):
                            outfile.write (item_set1 + ' ' + item_set2 + '\n')

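But the code produces a blank file. What is wrong?

【Discussion】:

Why use nested for loops?
You should use a regular expression.
@Sebastian What would the fix look like?
Have you added print statements throughout the code to see what it is doing?

Tags: python string file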

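【Solution 1】:

Your code has only one major problem: the if item_set2.endswith ('KB') check does not work, because every line read from the file ends with a newline character. Replace it with (note the strip() call):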

if item_set2.strip().endswith('KB'):
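
To see where the newline ends up, it may help to print what the two split calls return for a single line. This is only a sketch: sample is the first line of the question's input with an explicit trailing '\n', and the names sample, by_quote and by_space are made up for illustration.

```python
# One line as it would be read from the file, including the trailing newline.
sample = 'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'

by_quote = sample.split('"')
by_space = sample.split(' ')

print(by_quote)  # ['aaaaaaa', 'pic01.jpg', 'bbbwrtwbbbsize 110KB\n']
print(by_space)  # ['aaaaaaa"pic01.jpg"bbbwrtwbbbsize', '110KB\n']

# The file name sits between the quotes, so it carries no newline ...
print(by_quote[1].endswith('.jpg'))        # True
# ... but the size is the last piece of the line, so it does.
print(by_space[1].endswith('KB'))          # False
print(by_space[1].strip().endswith('KB'))  # True
```

This is also why item_set1 needs no strip(): the file name comes from the middle of the line, while the size is the last piece and keeps the newline.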

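Also, item_set2 still carries that newline when it is written. Either write it unchanged and drop the + '\n', or strip it and keep the + '\n'. The second form also behaves correctly when the last line of the file has no trailing newline:

outfile.write (item_set1 + ' ' + item_set2.strip() + '\n')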
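
Putting both changes together, the nested-loop approach from the question might look roughly like this. The function name extract and the in-memory sample_lines list are assumptions so that the sketch runs without test.txt; the real script would loop over infile and call outfile.write instead.

```python
def extract(lines):
    """Return 'name size' strings using the question's nested-loop approach."""
    results = []
    for line in lines:
        for item_set1 in line.split('"'):
            for item_set2 in line.split(' '):
                size = item_set2.strip()  # drop the trailing newline before testing
                if item_set1.endswith('.jpg') and size.endswith('KB'):
                    results.append(item_set1 + ' ' + size)
    return results


# Hypothetical stand-in for the contents of test.txt.
sample_lines = [
    'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n',
    'aawerwefrewqa"pic02.jpg"bbbertebbbsize 100KB\n',
    'atyrtyruraa"pic03.jpg"bbbwtrwtbbbsize 190KB',  # last line without a newline
]
print('\n'.join(extract(sample_lines)))
```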

FYI, you can also use a regular expression with capturing groups to extract the data:

import re


with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        match = re.search(r'"(.*)"\w+\s(\w+)', line)
        outfile.write(' '.join(match.groups()) + "\n")

Contents of outfile.txt after running the code:

pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
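
One caveat with the regex version: re.search returns None for a line that does not fit the pattern, and calling match.groups() on it would then raise AttributeError. A possible guard is sketched below; the names PATTERN and parse are hypothetical, and the pattern itself is the same one used above.

```python
import re

# Same pattern as above: quoted name, the word before the space, then the size.
PATTERN = re.compile(r'"(.*)"\w+\s(\w+)')


def parse(line):
    """Return (filename, size), or None when the line has a different shape."""
    match = PATTERN.search(line)
    return match.groups() if match else None


print(parse('aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'))  # ('pic01.jpg', '110KB')
print(parse('a line without quotes\n'))                   # None
```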

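【Discussion】:

I want to solve the problem the way I wrote the code, without importing re @alecxe
@leanne OK, updated the answer, please check.
When you write "your code", do you mean the question or the other answer? Your answer would be better if you clarified that.
@alecxe Thanks. I don't understand why item_set2 needs the strip() call but item_set1 does not.
@leanne Yes, thanks. Someone just added a -1; it would be good to know why.

【Solution 2】:

A solution that does not import re. The two conditions can be combined into a single one-line condition.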

with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        filename = line.strip().split('"')[1]
        size = line.rsplit(None, 1)[-1]
        if filename.endswith('.jpg') and size.endswith('KB'):
            outfile.write('%s %s\n' % (filename, size))
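
The reason rsplit(None, 1) needs no extra strip() can be checked on a single line. A small sketch, with line set to the first sample line from the question:

```python
line = 'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'

# sep=None splits on runs of whitespace and ignores trailing whitespace.
print(line.rsplit(None, 1))  # ['aaaaaaa"pic01.jpg"bbbwrtwbbbsize', '110KB']
# An explicit ' ' separator keeps the newline attached to the last field.
print(line.rsplit(' ', 1))   # ['aaaaaaa"pic01.jpg"bbbwrtwbbbsize', '110KB\n']
```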

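【Discussion】:

size = line.rsplit(None, 1)[-1] is better.
Thanks, editing my answer now.

【Solution 3】:

You should use a regular expression, which will simplify your code. Something like this: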

import re
with open ('test.txt', 'r') as infile, open ('outfile.txt', 'w') as outfile:
    for line in infile:
        obj = re.match(r'.+"(.+\.jpg)".+\s(\d+KB)', line)
        if obj:
            outfile.write (obj.group(1) + ' ' + obj.group(2) + '\n')

outfile.txt produced by this script:

pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB

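【Discussion】:

I want to solve the problem the way I wrote the code, without importing re @Maxime Lorant
Your code is dirty. Why do you want to keep the nested-loop solution?

【Solution 4】:

First, split the line on whitespace and take the second item (index 1 in a 0-based list); this gives the size part.

Next, split the first item on " and take the second item. This gives the file name.

If you want to see how it is split, check the online demo.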

with open ('test.txt', 'r') as infile, open ('outfile.txt', 'w') as outfile:
    for line in infile:
        Parts = line.split()
        outfile.write (Parts[0].split('"')[1] + " " + Parts[1] + "\n")

Output:

pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB

Online demo:

http://ideone.com/EOcuXL
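
The two splits can also be traced locally. A small sketch, assuming line holds the first sample line from the question; the names parts and pieces are only for illustration:

```python
line = 'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'

parts = line.split()          # split on whitespace; the newline is dropped
print(parts)                  # ['aaaaaaa"pic01.jpg"bbbwrtwbbbsize', '110KB']

pieces = parts[0].split('"')  # break the first item at each double quote
print(pieces)                 # ['aaaaaaa', 'pic01.jpg', 'bbbwrtwbbbsize']

print(pieces[1] + ' ' + parts[1])  # pic01.jpg 110KB
```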

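【Discussion】:

I want to solve the problem the way I wrote the code, without importing re @thefourtheye
We are not using re now. Please check.
You still need to remove "bbbwrtwbbbsize from the first part.
@MaximeLorant Added a demo for you :)
Over the past few weeks I have used split() with a second argument of 1 too often, which is why I got a bit confused :-)

【Solution 5】:

Using sed: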

$ sed 's/.*"\(.*\)".*size \(.*\)/\1 \2/' foo.txt
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
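
For comparison, the same substitution could be written in Python with re.sub. This is a translation of the sed expression rather than part of the original answer; the names SED_LIKE and convert are hypothetical.

```python
import re

# Counterpart of the sed expression: sed writes groups as \( \), Python as ( ).
SED_LIKE = re.compile(r'.*"(.*)".*size (.*)')


def convert(line):
    # Replace the matched line with "group1 group2"; lines that do not match
    # pass through unchanged, as they would with sed.
    return SED_LIKE.sub(r'\1 \2', line.rstrip('\n'))


print(convert('aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'))  # pic01.jpg 110KB
```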

【Discussion】: