Reading a string between specific characters using Python: answers

【Question title】: Reading a string inside a specific character using python
【Posted】: 2021-11-18 10:59:13
【Question】:

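I recently started learning programming, and I'm now using Python for data filtering. My question is: how do I get the string between specific characters? For example, a text file contains lines like this: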

5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:

I want the string between the 10th ; and the next ;, or between the 15th : and the next :.

I've already read the txt file and extracted some information, but I can't figure out this specific part. Here's what I have so far:

import zipfile

arq = zipfile.ZipFile('DSts.zip')

for file in arq.namelist():
    print(file)
    f = arq.open(file) 
    Lines = f.readlines()
    for line in Lines:
        print(f'{line[11:16]}')

【Question discussion】:

Tags: python string search printing character

【Solution 1】:

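Here is a solution you can integrate into your code. You would apply it to every line you read (or to every line you think needs to be parsed this way):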

def get_substring(input_string, delim, nth, delims):
    ''' Returns the substring between the nth delimiter in the
            string (counting every character in delims) and the
            next occurrence of delim; delims is a list of all
            delimiters to account for '''

    # Indices of all occurrences of delims
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    # Retrieve the index of the nth delim (Python lists start at 0)
    idx_nth = idx_delims[nth-1]
    # Find the index of the next occurrence of delim
    idx_nth_p1 = input_string.index(delim, idx_nth+1)
    # Return the substring between those two positions
    return input_string[idx_nth+1:idx_nth_p1]

orig_string = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'

print(orig_string)

# All delimiters
delims = [':', ';']

# Substring between the 10th delimiter (a ;) and the next ;
str_1 = get_substring(orig_string, ';', 10, delims)
print(str_1)  # 70
# Substring between the 15th delimiter (a :) and the next :
str_2 = get_substring(orig_string, ':', 15, delims)
print(str_2)  # 38

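This function first finds the positions of all characters in the input string that count as delimiters. It then picks the Nth of those delimiters, as requested, and finds the next occurrence of the chosen delimiter in the original string. It returns the string between the two positions.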
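To make those steps concrete, here is a sketch that runs them by hand on the example string for the 10th-delimiter ; case and prints the intermediate positions. The variable names simply mirror the ones inside get_substring.

```python
# Example line and delimiters taken from the question
orig_string = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
delims = [':', ';']
nth, delim = 10, ';'

# Step 1: positions of every delimiter, whichever kind it is
idx_delims = [i for i, x in enumerate(orig_string) if x in delims]

# Step 2: position of the nth delimiter (list index nth-1, since Python counts from 0)
idx_nth = idx_delims[nth - 1]

# Step 3: position of the next occurrence of delim after it
idx_nth_p1 = orig_string.index(delim, idx_nth + 1)

print(orig_string[idx_nth], idx_nth, idx_nth_p1)  # ; 29 32
print(orig_string[idx_nth + 1:idx_nth_p1])        # 70
```

Every field in this line is two characters wide, so the kth delimiter sits at index 3k - 1, which is why the 10th one lands at index 29.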

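In practice, this should include some checks, with related warnings or even raised exceptions (for example, whether delim exists at all, and whether it is at the requested nth position). It could also be written more concisely; I made it longer so it is easier to read and understand. Finally, you should remove the print statements from the final version.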
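A minimal sketch of those checks, assuming that raising ValueError is the behavior you want (get_substring_checked is a hypothetical name, not part of the answer). It keeps the same counting rule as get_substring, but verifies that delim is one of the delimiters, that the line has at least nth delimiters, that the nth one really is delim, and that a closing delim follows.

```python
def get_substring_checked(input_string, delim, nth, delims):
    """Like get_substring, but raises ValueError with a readable message.

    Returns the text between the nth delimiter (counting every character
    in delims, starting from 1) and the next occurrence of delim.
    """
    if delim not in delims:
        raise ValueError(f"{delim!r} is not one of the delimiters {delims!r}")
    if nth < 1:
        raise ValueError("nth must be 1 or greater")

    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    if nth > len(idx_delims):
        raise ValueError(f"only {len(idx_delims)} delimiters found, asked for #{nth}")

    idx_nth = idx_delims[nth - 1]
    # Check that the nth delimiter really is the requested character
    if input_string[idx_nth] != delim:
        raise ValueError(f"delimiter #{nth} is {input_string[idx_nth]!r}, not {delim!r}")

    # str.find returns -1 (instead of raising) when there is no closing delim
    idx_next = input_string.find(delim, idx_nth + 1)
    if idx_next == -1:
        raise ValueError(f"no {delim!r} after delimiter #{nth}")
    return input_string[idx_nth + 1:idx_next]


line = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
delims = [':', ';']
print(get_substring_checked(line, ';', 10, delims))  # 70
try:
    get_substring_checked(line, ';', 2, delims)
except ValueError as err:
    print(err)  # delimiter #2 is ':', not ';'
```

Note that the unchecked get_substring would not fail on (line, ';', 2, delims); it would silently return '9h:5t:7a:45', which is the kind of surprise these checks are meant to catch.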

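Update: here is the minimal code demonstrating the integration. You can test it on its own and then, in your original code, use this way of reading and processing the file instead of open and readlines. Neither approach is wrong, but: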

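open needs a matching close, while with open calls close for you behind the scenes, even if something crashes.
readlines reads the entire file at once. I often work with large files, so I'm used to saving memory by processing them line by line. It's up to you and the problem you're solving.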
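Since the question reads its files out of DSts.zip, here is a sketch of the same line-by-line approach applied to a zip archive. It assumes the member files are UTF-8 text; substrings_from_zip is a hypothetical helper name, and the demo builds a small archive in memory instead of touching DSts.zip. ZipFile.open() returns a binary stream, so io.TextIOWrapper is used to decode it into text lines without reading the whole member at once.

```python
import io
import zipfile


def get_substring(input_string, delim, nth, delims):
    """Text between the nth delimiter (counting all of delims) and the next delim."""
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    idx_nth = idx_delims[nth - 1]
    idx_nth_p1 = input_string.index(delim, idx_nth + 1)
    return input_string[idx_nth + 1:idx_nth_p1]


def substrings_from_zip(zip_source, delim, nth, delims, encoding='utf-8'):
    """Apply get_substring to every non-blank line of every file in a zip archive.

    zip_source can be a path such as 'DSts.zip' or a binary file-like object.
    """
    results = []
    # 'with' closes the archive and each member, even if an exception occurs
    with zipfile.ZipFile(zip_source) as arq:
        for name in arq.namelist():
            # arq.open() yields bytes; TextIOWrapper decodes them lazily, line by line
            with arq.open(name) as raw, io.TextIOWrapper(raw, encoding=encoding) as text:
                for line in text:
                    line = line.strip()
                    if line:  # skip blank lines
                        results.append(get_substring(line, delim, nth, delims))
    return results


# Demo: build a small archive in memory instead of reading DSts.zip from disk
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('sample.txt',
                '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:\n'
                '3d:7g:9i:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:\n')
buf.seek(0)
print(substrings_from_zip(buf, ':', 2, [':', ';']))  # ['9h', '9i']
```

Passing a path instead, as in substrings_from_zip('DSts.zip', ';', 10, [':', ';']), should work the same way, since zipfile.ZipFile accepts either a path or a binary file-like object.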

Here is the example:

def get_substring(input_string, delim, nth, delims):
    ''' Returns the substring between the nth delimiter in the
            string (counting every character in delims) and the
            next occurrence of delim; delims is a list of all
            delimiters to account for '''

    # Indices of all occurrences of delims
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    # Retrieve the index of the nth delim (Python lists start at 0)
    idx_nth = idx_delims[nth-1]
    # Find the index of the next occurrence of delim
    idx_nth_p1 = input_string.index(delim, idx_nth+1)
    # Return the substring between those two positions
    return input_string[idx_nth+1:idx_nth_p1]


# All delimiters
delims = [':', ';']

all_substrings = []
with open('testfile.txt', 'r') as fin:
    for line in fin:
        # Remove the leading and trailing whitespace
        line = line.strip()
        temp_str = get_substring(line, ':', 2, delims)
        all_substrings.append(temp_str)

print(all_substrings)

The code uses strip() to remove the trailing newline (and any other surrounding whitespace) and appends every substring to a list.

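Note: from the way you described the problem, it seems to me that you want to match a specific delimiter at a position counted among all delimiters. That is, in 5d:6g:9h:5t:7a:45;33:12:, the delimiter ; is the 6th delimiter, so the call becomes get_substring(line, ';', 6, delims). If that's not what you meant, let me know, but consider adapting it yourself as an exercise. It also means the call you mentioned in your comment should look like the one here, get_substring(line, ':', 2, delims), because the second delimiter is a :. Also keep in mind that Python indexing starts at 0, so this is actually position 1 in the idx_delims list.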
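To check which nth value to pass for a given line, a small sketch like this one numbers the delimiters from 1, the same way get_substring counts them (list_delimiters is a hypothetical helper, not part of the answer):

```python
def list_delimiters(line, delims=(':', ';')):
    """Return (nth, delimiter) pairs, numbered from 1 the way get_substring counts them."""
    found = [ch for ch in line if ch in delims]
    return list(enumerate(found, start=1))


for nth, ch in list_delimiters('5d:6g:9h:5t:7a:45;33:12:'):
    print(nth, ch)  # the line "6 ;" shows that ; is the 6th delimiter
```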

Finally, here is a minimal input file for testing:

5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
5d:6g:9h:4t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
3d:7g:9i:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
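To try the function on exactly these three lines without creating testfile.txt, here is a sketch that uses io.StringIO as a stand-in for the file and runs the call from the answer alongside the two cases from the question (the 10th ; and the 15th :).

```python
import io


def get_substring(input_string, delim, nth, delims):
    """Text between the nth delimiter (counting all of delims) and the next delim."""
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    idx_nth = idx_delims[nth - 1]
    idx_nth_p1 = input_string.index(delim, idx_nth + 1)
    return input_string[idx_nth + 1:idx_nth_p1]


delims = [':', ';']

# io.StringIO stands in for open('testfile.txt'): it is iterated line by line the same way
test_file = io.StringIO(
    '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:\n'
    '5d:6g:9h:4t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:\n'
    '3d:7g:9i:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:\n'
)

results = []
for line in test_file:
    line = line.strip()
    results.append((
        get_substring(line, ':', 2, delims),   # 2nd delimiter up to the next ':'
        get_substring(line, ';', 10, delims),  # 10th delimiter up to the next ';'
        get_substring(line, ':', 15, delims),  # 15th delimiter up to the next ':'
    ))

for row in results:
    print(row)
# ('9h', '70', '38')
# ('9h', '70', '38')
# ('9i', '70', '38')
```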

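【Discussion】:

Thank you very much for your help. I tried to include it in my code, but it didn't work. Since I'm reading a text file with many lines, I have to go through every line in the file, right? This is what I did: I put "orig_string" and "str_1" inside the loop like this (the function "get_substring" is defined above the line that opens the txt file): -> for line in Lines: orig_string = line str_1 = get_substring(orig_string, ';', 2, delims) print(str_1)
@ElleOliver I updated the answer; let me know if it's cleaner now.
Give it a try, and if you get stuck, post more details about your file (does it contain only entries like the one you posted? Does it have a header?)
Added a complete example; let me know if you need anything else.
Everything works the way I wanted, thank you so much!!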