【Question title】: Find strings and substrings from the wordlist
【Posted】: 2019-11-23 09:13:57
【Question description】:

I have a file test.txt containing a word list, and I want to find strings and substrings in it:

<aardwolf>
<Aargau>
<Aaronic>
<aac>
<akac>
<abaca>
<abactinal>
<abacus>  

The test.py file:

import sys  # the sys module
import os
import re
def hasattr(str,list):
    expr = re.compile(str)
    # yield the elements
    return [elem for elem in list if expr.match(elem)]

isword = {}
FH = open(sys.argv[1],'r',encoding="ISO-8859-1")
for strLine in FH.readlines():  isword.setdefault(''.join(sorted(strLine[1:strLine.find('>')].upper())),[]).append(strLine[:-1])
print (isword)
basestring=str()
for ARGV in sys.argv[2:]:
    print ("\n*** %s\n" %ARGV )#print Argv

diffpatletters = re.compile(u'[a-zA-Z]').findall(ARGV.upper())
#print (diffpatletters)
diffpat = '.*' + '(.*)'.join(sorted(diffpatletters)) + '.*'
#print (diffpat)
for KEY in hasattr(diffpat,isword.keys()):
#       print (KEY)
       SUBKEY = KEY
       for X in diffpatletters:
         #print (X)
         SUBKEY1 = SUBKEY.replace(X,'')
          #print (SUBKEY)
       if SUBKEY1 in isword:
           #print (SUBKEY)
           basestring+=  "%s -> %s" %(isword[KEY], isword[SUBKEY1])
print (basestring + "\n")

I run the file from the command line like this:

python test.py test.txt  aack aadfl

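The expected result is to find the matching strings and substrings for each argument from the second one onward (the arguments that follow the wordlist file name). However, my basestring is not printed.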

【Question discussion】:

  • You can carry the logic of the first answer over into basestring, and it will print.

Tags: python dictionary


【Solution 1】:

Do you have to use regular expressions? If not, is this the kind of result you want?

with open('test.txt', 'r') as f:
    s = f.read()
s = s.split('\n')
s

Out[1]:
['<aardwolf>',
 '<Aargau>',
 '<Aaronic>',
 '<aac>',
 '<akac>',
 '<abaca>',
 '<abactinal>',
 '<abacus>  ']

For a list-type result:

ARGVs = ['aard', 'onic', 'abacu']

matches = [x for x in s for arg in ARGVs if arg.lower() in x.lower()]
print(matches)

Out[2]:
['<aardwolf>', '<Aaronic>', '<abacus>  ']
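
Note that the comprehension above emits an entry once for every argument it matches, so overlapping arguments (for example 'aa' and 'aard') would list the same entry twice. A minimal sketch that avoids this, assuming a hypothetical helper named match_any that is not part of the original answer:

```python
def match_any(entries, needles):
    """Return the entries that contain at least one needle, ignoring case.

    match_any is a hypothetical helper name used only for this sketch.
    Each entry appears at most once, in its original order.
    """
    lowered = [needle.lower() for needle in needles]
    return [
        entry
        for entry in entries
        if any(needle in entry.lower() for needle in lowered)
    ]


if __name__ == "__main__":
    sample = ['<aardwolf>', '<Aaronic>', '<abacus>  ']
    print(match_any(sample, ['aa', 'aard']))
```

Using any() stops at the first matching argument for each entry, which is what keeps the result free of duplicates.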

For a dictionary-type result:

ARGVs = ['aard', 'onic', 'abacu', 'aaro', 'ac']

{key:[x for x in s if key in x] for key in ARGVs if len([x for x in s if key in x]) != 0}

Out[3]:

{'aard': ['<aardwolf>'],
 'onic': ['<Aaronic>'],
 'abacu': ['<abacus>  '],
 'ac': ['<aac>', '<akac>', '<abaca>', '<abactinal>', '<abacus>  ']}

Using regular expressions:

import re

with open('test.txt', 'r') as f:
    s = f.read()

ARGVs = ['wol','ac']
cond = '|'.join([rf'\w*{patt}\w*' for patt in ARGVs])  # raw f-string so \w is not treated as a string escape
re.findall(cond,s)  

Out[4]:
['aardwolf', 'aac', 'akac', 'abaca', 'abactinal', 'abacus']
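
As for the script in the question: the block that builds diffpat sits outside the `for ARGV` loop, so only the last argument is processed, and `SUBKEY1 = SUBKEY.replace(X,'')` restarts from SUBKEY on every pass, so only the last letter is removed (and every occurrence of it). The following is a sketch of one possible repair that keeps the regex match on sorted-letter keys. The function names letter_key, build_index, and find_pairs are hypothetical, and the key here keeps only the letters A-Z of each line, which is an assumption rather than the asker's exact slicing.

```python
import re
from collections import defaultdict


def letter_key(text):
    """Return the ASCII letters of text, uppercased, sorted, and joined."""
    return "".join(sorted(re.findall(r"[A-Z]", text.upper())))


def build_index(lines):
    """Map each sorted-letter key to the wordlist entries that share it."""
    index = defaultdict(list)
    for line in lines:
        entry = line.strip()
        if entry:
            index[letter_key(entry)].append(entry)
    return index


def find_pairs(index, arg):
    """Return (entries, remainder_entries) pairs for one argument.

    An entry qualifies when its letters include all letters of arg and the
    letters left over after removing arg form the key of another entry.
    """
    letters = letter_key(arg)
    if not letters:
        return []
    # Keys and letters are both sorted, so containment is an in-order match.
    pattern = re.compile(".*" + ".*".join(letters) + ".*")
    pairs = []
    for key in list(index):
        if not pattern.fullmatch(key):
            continue
        remainder = key
        for letter in letters:
            # Remove one occurrence per letter and keep the running result.
            remainder = remainder.replace(letter, "", 1)
        if remainder in index:
            pairs.append((index[key], index[remainder]))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data, chosen so that one pair is found.
    idx = build_index(["<abacus>", "<bus>", "<aac>", "<sub>"])
    for words, rest in find_pairs(idx, "aac"):
        print("%s -> %s" % (words, rest))
```

With the test.txt shown in the question, neither aack nor aadfl leaves a remainder that is itself a key in the list, so under this sketch no pairs would be printed for that input either.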

【Discussion】:

  • No, I need to do this with regex, but +1 for the suggestion.
  • OK, I see. I tried doing it with regular expressions and have updated the code.