How to find the number of strings between strings in a file/string - Python answers

【Question title】: How to find the number of strings between strings in a file/string - python
【Posted】: 2016-05-12 18:00:38
【Question description】:

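*********************************** SOLUTION ***********************************

After a lot of testing and some tweaking, I have managed to write working code!

I am sharing it with everyone in case someone is interested in doing the same thing I did. Thank you to everyone who helped! :)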

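with open('FileName.py') as f:
    stringToSearchIn = f.read()

def findBetween(file, firststring, laststring, findstring):
    start = 0
    countfinal = 0
    while True:
        try:
            # Next opening marker at or after the previous closing marker
            start = file.index(firststring, start)
            # Closing marker that follows it; skipping past firststring
            # avoids matching the same position twice
            end = file.index(laststring, start + len(firststring))
        except ValueError:
            # str.index raises ValueError when a marker is missing
            break
        countfinal += file[start:end].count(findstring)
        start = end
    return countfinal

print(findBetween(stringToSearchIn, "example", "file", "letters"))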

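********************************* END SOLUTION *********************************

I have been trying to solve this for quite a while, and I believe I am overcomplicating it. It is a little hard for me to describe, but I will do my best. If anything is unclear, feel free to ask!

Please do not write the code for me. I am here to learn, not to copy :)

For example: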

# This is the entire text I want to scan
s = open('test.py').read()
# I want to go through the entire file and find the string between these two strings:
stringStartToSearch = "example"
stringEndToSearch = "file"
# Next, I want to count the number of times a certain string is located
# between the previously found strings.
stringSearch = "letters"

To clarify further, suppose this is the string found in the "test.py" file:

#An example text that I have many letters in, just to give and example for a file.
#It's an example with many letters that I made especially for this file test.
#And these are many letters which should not be counted

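As you can see, the word "letters" appears 3 times in this file, but only 2 times between "example" and "file". That is what I want to count.

Does anyone know an efficient, Pythonic way to achieve this?

Thank you very much!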

For you sabbahillel

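The script does find the correct string between the 2 given strings, but it stops after finding it. I need it to keep searching the whole file instead of stopping after the first match. Also, once I have found the string between these two strings, I need to go through it and count how many times a certain word appears. Which commands can I use to do that?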

file = open('testfile.py').read()

def findBetween(file, firstWord, secondWord):
    # Text between the first firstWord and the next secondWord only
    start = file.index(firstWord) + len(firstWord)
    end = file.index(secondWord, start)
    return file[start:end]

print(findBetween(file, "example", "file"))

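【Comments】:

Checking out string.find() and string slicing would be a good start.
Also, what if there are consecutive examples or files in the text? I mean ... example ... letter ... example ... letter ... file ...
If the test string is example something file letters file, what should the result be?
@Lafexlos - There will be consecutive examples and files, and I want to count every time "letters" is found between them.
So you want all the letters between the first example and the last file?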
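
The two readings raised in these comments give different answers, which a short sketch with `str.find()` and slicing can make concrete. The helper names `count_nearest` and `count_outermost` are hypothetical and do not come from the thread. The first assumes each `example` is paired with the next `file` after it; the second assumes only the first `example` and the last `file` matter.

```python
def count_nearest(text, first, last, target):
    """Pair each `first` with the next `last` after it; count `target` inside."""
    total = 0
    pos = 0
    while True:
        start = text.find(first, pos)
        if start == -1:          # str.find returns -1 when nothing is found
            break
        start += len(first)
        end = text.find(last, start)
        if end == -1:
            break
        total += text[start:end].count(target)
        pos = end + len(last)    # continue searching after this closing marker
    return total


def count_outermost(text, first, last, target):
    """Count `target` between the first `first` and the last `last`."""
    start = text.find(first)
    end = text.rfind(last)
    if start == -1 or end == -1 or end < start + len(first):
        return 0
    return text[start + len(first):end].count(target)


sample = "example something file letters file"
print(count_nearest(sample, "example", "file", "letters"))    # 0
print(count_outermost(sample, "example", "file", "letters"))  # 1
```

Under the first reading the trailing `letters` is not counted because no `example` precedes it within the same pair; under the second it is.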

Tags: python string file python-3.x find

【Solution 1】:

Let us assume you have a list of strings like the one you provided.

Python Lists

list.index(x)

Returns the index of the first item in the list whose value is x. It is an error if there is no such item.
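
A minimal sketch of how `list.index()` behaves, using a made-up word list: the optional second argument sets where the search starts, and a missing value raises `ValueError` rather than returning a sentinel.

```python
words = ["string", "start", "string", "start", "string", "end", "string"]

first_start = words.index("start")                  # first occurrence
next_start = words.index("start", first_start + 1)  # search resumes after it
print(first_start, next_start)                      # 1 3

try:
    words.index("missing")
except ValueError:
    found = False   # no such item in the list
else:
    found = True
print(found)        # False
```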

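Get the index of the start and the index of the end. If both start and end exist, and the index of end is greater than the index of start, just process the range between the start and end indices to get the elements you want.

Of course, you must do appropriate error checking and decide what to do if you have a start indicator but reach the end of the list without an end indicator (as one example of an error case that has to be handled).

Note that list.index() finds the first occurrence of the start string. If there are more, start the next range from the first occurrence of the end string and do it again. This can be done in an appropriate do ... while loop, where the while checks whether there is another occurrence of the start string.

Note that if the start string appears again in the list, it is not treated as resetting the start but as just another entry.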

mylist = ['string', 'start', 'string', 'start', 'string', 'end', 'string']

will process

['start', 'string', 'start', 'string', 'end']

So we now have

start = 0

while True:
    try:
        # pass start to index() so the result is an absolute position
        start = mylist.index(firststring, start)
    except ValueError:
        # index did not find start string. nothing to do, force exit
        break
    try:
        # look for the end string after the start string just found
        end = mylist.index(laststring, start + 1)
        count = mylist[start:end].count(findstring)
        # process findstring
        start = end # set up for the next loop
    except ValueError:
        # index did not find end string but did find start
        count = mylist[start:].count(findstring)
        # process findstring
        break # reached the end of the list, exit the while

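Now you have the start and end indices

Indexing, Slicing, and Matrixes

Because lists are sequences, indexing and slicing work the same way for lists as they do for strings. So just use list[a:b].count(string) with the appropriate slice indices.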

list.count(obj)

Returns the number of times obj occurs in the list
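
Putting `list.index()`, slicing, and `list.count()` together, here is a small sketch on a hypothetical word list. The sentence and the names `words`, `begin`, `stop`, and `between` are illustrative only and do not come from the question.

```python
words = "an example with many letters and more letters in this file then letters".split()

begin = words.index("example")          # position of the start word
stop = words.index("file", begin + 1)   # first end word after it
between = words[begin + 1:stop]         # items strictly between the two

print(between.count("letters"))  # 2
print(words.count("letters"))    # 3
```

Unlike `str.count()`, `list.count()` compares whole items, so a word with punctuation attached (for example `"letters,"`) would not be counted here.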

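【Discussion】:

Thank you very much! This helped me solve my first problem, searching for a string between strings. Now I need to create the second part somehow. I have updated my main post. Please see the section under "For you sabbahillel". Thank you :)

【Solution 2】:

Use a regular expression to do the search: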

import re

example = """An example text that I have many letters in, just to give and example for a file.
It's an example with many letters that I made especially for this file test.
And these are many letters which should not be counted"""

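# Keep only the lines that contain example, letters, and file in that order
found_lines = re.findall('.+example.+letters.+file.+', example)

result = {}
for line in found_lines:
    # Offset just past 'example', then the offset of the next 'file'
    example_word = line.find('example') + len('example')
    file_word = line.find('file', example_word)
    # Number of characters between the two markers (not a count of 'letters')
    result[line] = file_word - example_word

print(result)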

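【Discussion】:

Hi, thank you for your help. "result" gives the whole string that was found in "text" and does not show how many times the word "letters" is located between "example" and "file". Unfortunately that is not what I was looking for, but thank you very much! :)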
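
One way to get the count this comment asks for with a regular expression is a non-greedy pattern that captures the text between each `example` and the next `file`, then counts `letters` in each captured span. This is a sketch and not the answerer's code: the function name `count_between` is hypothetical, and `re.DOTALL` reflects an assumption that a span may cross line breaks.

```python
import re


def count_between(text, first, last, target):
    """Count `target` inside every non-overlapping first...last span of text."""
    # (.*?) is non-greedy, so each span stops at the nearest `last`
    pattern = re.compile(re.escape(first) + r"(.*?)" + re.escape(last), re.DOTALL)
    # With one capture group, findall returns the captured spans as strings
    return sum(span.count(target) for span in pattern.findall(text))


example = """An example text that I have many letters in, just to give and example for a file.
It's an example with many letters that I made especially for this file test.
And these are many letters which should not be counted"""

print(count_between(example, "example", "file", "letters"))  # 2
```

`re.escape()` keeps the markers literal, so the same helper works when they contain characters such as `.` or `(`.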