How to extract characters and numbers from every line of a file? (Answers)

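【Question title】: How to extract characters and numbers from every line of a file?
【Posted】: 2014-07-23 13:32:34
【Question description】:

I am trying to extract the first character, the second number, and the third character of each entry on every line of a file, and store them in three variables named FirstChar, SecondNum, and ThirdChar.

Input file (MultiPointMutation.txt):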

P1T,C11F,E13T
L7A
E2W

Expected output:

FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"

My code:

import re

# read the file and strip surrounding whitespace from each line
with open('MultiPointMutation.txt', 'r') as f:
    ns = [line.strip() for line in f]

for line in ns:
    second = "".join(re.findall(r'\d+', line))  # extract second position numbers
    print(second)  # print second nums
    char = "".join(re.findall(r'[a-zA-Z]', line))  # extract all letters
    c = char.rstrip()
    First = 0
    Third = 1
    for index in range(len(c)):
        if index == First:
            FC = c[index]  # here I get all the first characters
            print(FC)
            First = First + 2
        if index == Third:
            TC = c[index]  # here I get all the third characters
            print(TC)
            Third = Third + 2

Output: here the first and third characters come out exactly right.

FirstChar:
          P
          C
          E
          L
          E
ThirdChar:
          T
          F
          T
          A
          W

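But the problem is getting SecondNum:

I want to extract the numbers as follows:

Note: I do not want to print them one by one here. I want to read the values of this SecondNum variable one by one for later use.

【Question discussion】:

Tags: python regex string file-io extraction

【Solution 1】: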

For SecondNum, you can simply change the line:

second = "".join(re.findall(r'\d+', line))  # extract second position numbers

to

second = "\n".join(re.findall(r'\d+', line))  # extract second position numbers
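To make the difference concrete, here is a small sketch using the first line of the question's input (the variable names are only illustrative). Joining with "" fuses the numbers together, joining with "\n" only changes how they print, and keeping the list returned by re.findall is what allows the values to be read one at a time later.

```python
import re

line = "P1T,C11F,E13T"  # first line of the question's input

numbers = re.findall(r'\d+', line)  # list of digit runs, as strings
fused = "".join(numbers)            # numbers run together, boundaries lost
per_line = "\n".join(numbers)       # one number per line when printed

print(fused)
print(per_line)

# Keeping the list lets you use each value later, for example as an int.
for n in numbers:
    print(int(n) + 1)
```

The joined strings are only useful for display; for later processing, the list is the more convenient form.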

But I think your first and third characters do not work properly. Going by the output you want, you should have something like this:

import re

x = """P1T,C11F,E13T
L7A
E2W"""

secondNum = []
firstChar = []
thirdChar = []
for line in x.split('\n'):
    # every run of digits on the line
    secondNum.extend(re.findall(r'\d+', line))

    firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))
    # extend() appends each element returned by re.findall to the firstChar list.
    # The regex looks for the start of the string (^) or a comma (,). This is a
    # non-capturing group (it starts with "(?:"), so whatever it matches is not
    # part of the returned result. The capturing group [a-zA-Z] then takes the
    # one letter right after the start or the comma, which is the first character.

    thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))
    # The third character works in much the same way, but the non-capturing group
    # matches the start of the string or a comma, followed by one character and
    # at least one digit (\d+). The letter after that number is the third
    # character, and it is again the captured group.

print('firstChar="' + str(firstChar) + '"')
print('secondNum="' + str(secondNum) + '"')
print('thirdChar="' + str(thirdChar) + '"')
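If you want exactly the strings shown in the question's expected output, the three lists can be joined afterwards. This is a minimal sketch on the question's sample data; the helper name extract_parts is hypothetical and not part of the original answer.

```python
import re

def extract_parts(text):
    """Collect first letters, numbers, and third letters from all lines."""
    first, second, third = [], [], []
    for line in text.splitlines():
        first.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))
        second.extend(re.findall(r'\d+', line))
        third.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))
    return first, second, third

sample = "P1T,C11F,E13T\nL7A\nE2W"
first, second, third = extract_parts(sample)

# Join the lists into the string layout used in the question.
FirstChar = "".join(first)
SecondNum = " ".join(second)
ThirdChar = "".join(third)
print(FirstChar, SecondNum, ThirdChar)
```

The lists remain available next to the joined strings, so the values can still be processed one by one.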

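Note, however, that what you call the third character really is the third character of L7A (where you want the A), but it is the fourth character of P1TQ (where you want the Q).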
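One way to avoid depending on character positions is to match each entry as a whole: a letter, a run of digits, then a letter. The following is a sketch under that assumption about the entry format. It is not from the original answer, and the name ENTRY is made up for illustration.

```python
import re

# Assumed entry format: one letter, one or more digits, one letter (e.g. C11F).
ENTRY = re.compile(r'([A-Za-z])(\d+)([A-Za-z])')

lines = ["P1T,C11F,E13T", "L7A", "E2W"]

triples = []
for line in lines:
    # With three groups, findall returns (letter, digits, letter) tuples.
    triples.extend(ENTRY.findall(line))

# Transpose the list of triples into three parallel tuples.
first_chars, numbers, third_chars = zip(*triples)
print(first_chars)
print([int(n) for n in numbers])
print(third_chars)
```

Because each entry is matched as one unit, the number may have any length without shifting the position of the letter that follows it.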

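【Discussion】:

Actually, I already did that. Printing the SecondNum variable with newlines only prints it, but I want to read the SecondNum values one by one for later use. I can read the FC and TC values one by one, but not the second one.
Thanks for your quick reply and your kind information, gaw. There was a small correction to the input.
I edited the code to build arrays of the elements you want, so you can process the elements one by one.
OK, I will check and let you know.
Could you explain the logic and the regular expressions you used to extract firstChar and thirdChar?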
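As a rough illustration of the two patterns, they can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the names first_pattern and third_pattern are hypothetical, and the third pattern is a rearranged but equivalent form of the one in the answer, with the shared "start or comma" part factored out.

```python
import re

line = "P1T,C11F,E13T"

# First character: a letter right after the start of the line or a comma.
first_pattern = re.compile(r"""
    (?:^|,)      # non-capturing: start of the string or a comma
    ([a-zA-Z])   # captured: the single letter that follows
""", re.VERBOSE)

# Third character: the letter after one word character plus digits.
third_pattern = re.compile(r"""
    (?:^|,)      # non-capturing: start of the string or a comma
    \w\d+        # first character and the number, matched but not captured
    ([a-zA-Z])   # captured: the letter that follows the number
""", re.VERBOSE)

print(first_pattern.findall(line))
print(third_pattern.findall(line))
```

When a pattern has exactly one capturing group, re.findall returns only the text of that group, which is why the commas and digits never show up in the results.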