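【Question title】: Extracting data using regular expressions: Python
【Posted】: 2017-02-28 01:11:33
【Question】:

The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, then convert the extracted strings to integers and sum them.

I am having trouble appending to the list. With my code below, it only appends the first (index 0) match of the line. Please help me. Thank you.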

import re
hand = open ('a.txt')
lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line) 
    if len(stuff)!= 1  : continue
    num = int (stuff[0])
    lst.append(num)
print sum(lst)

【Comments】:

Tags: regex python-2.7 data-extraction


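【Solution 1】:

Great, thank you for including the whole txt file! Your main problem is in the if len(stuff)... line: it skips the line when stuff has zero items in it, but also when it has 2, 3, and so on. You were only keeping stuff lists of length 1. I put comments in the code, but please ask if anything is unclear.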

import re
hand = open ('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    #If we didn't find anything on this line then continue
    if len(stuff) == 0: continue
    #if len(stuff)!= 1: continue #<-- This line was wrong: it skips lists with more than 1 element

    #If we did find something, stuff will be a list of strings:
    #(e.g. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
    #For now let's just add this list onto our str_num_lst
    #without worrying about converting to int.
    #We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff

#Print out the str_num_lst to check if everything's ok
print str_num_lst

#Get an overall sum by looping over the string numbers in the str_num_lst
#Can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)

#Print sum
print 'Overall sum is:'
print overall_sum
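
To make the difference between the two checks concrete, here is a minimal sketch that runs both on the same input. The sample lines are invented for illustration (they are not taken from a.txt), and the sketch uses Python 3 print() rather than the Python 2 syntax above.

```python
import re

# Hypothetical sample lines, invented for illustration only
sample_lines = [
    "no digits here",
    "one number: 4563",
    "three numbers: 9607 4292 4498",
]

kept_by_original = []  # what `if len(stuff) != 1: continue` would keep
kept_by_fix = []       # what `if len(stuff) == 0: continue` would keep

for line in sample_lines:
    stuff = re.findall('[0-9]+', line)
    # Original check: only lines with exactly one match survive
    if len(stuff) == 1:
        kept_by_original.append(int(stuff[0]))
    # Fixed check: every match is kept; an empty list adds nothing
    kept_by_fix += [int(s) for s in stuff]

print(kept_by_original)  # [4563]
print(kept_by_fix)       # [4563, 9607, 4292, 4498]
```

With the original check, the three numbers on the last line never reach the list, so the sum comes out too small.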

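Edit:

You're right, reading the whole file in as a single string is a good solution, and it isn't hard to do. See this post. Here is what the code would look like.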

import re

hand = open('a.txt')
all_lines = hand.read() #Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+',all_lines)
hand.close() #<-- can close the file now since we've read it in

#Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)

print('Overall sum is:',tot) #editing to add ()
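
For Python 3, the same whole-file idea can be written more compactly. This is only a sketch: the function names sum_numbers and sum_numbers_in_file are hypothetical helpers introduced here, not part of the answer above, and the inline sample text stands in for a real file.

```python
import re

def sum_numbers(text):
    """Return the sum of every run of digits found in text."""
    return sum(int(match) for match in re.findall(r'[0-9]+', text))

def sum_numbers_in_file(path):
    """Read the whole file as one string and sum the integers in it."""
    # The with-block closes the file even if reading raises an error
    with open(path) as handle:
        return sum_numbers(handle.read())

# Hypothetical inline text standing in for the file contents
print('Overall sum is:', sum_numbers("3 apples\nand 12 pears, 100 plums"))
```

Calling sum_numbers_in_file('a.txt') would apply the same logic to the file from the question. Splitting out sum_numbers keeps the regex step testable without touching the disk.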

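【Comments】:

  • Thank you so much. I knew I was doing something wrong in the if len(stuff)... line, but I couldn't figure out how to fix it. '+=' was the right choice. Thanks for sharing. As an entry-level programmer, I'd like to know whether we can read the whole file as a single string with read() and extract the numbers from that string using '[0-9]+'?
  • Yes, good point! I've edited the answer to include that option.
  • That's great. Thank you very much.
【Solution 2】: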
import re

ls = []
text = open('C:/Users/pvkpu/Desktop/py4e/file1.txt')
for line in text:
    line = line.rstrip()
    l = re.findall('[0-9]+', line)
    if len(l) == 0:
        continue
    ls += l
for i in range(len(ls)):
    ls[i] = int(ls[i])
print(sum(ls))

【Comments】:
