Write all lines for each set of a range to a new file each time the range changes (Python 3.6): answers

【Question title】: Write all lines for each set of a range to new file each time the range changes Python 3.6
【Posted】: 2018-05-02 13:43:45
【Question】:

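I'm trying to find a way to make this process work Pythonically, or at all. I have a very long text file that is split into lines. Every x lines there is one that is mostly uppercase, which should roughly be the title of that particular section. Ideally, I'd like each title and everything after it, up to the next title, to go into a text file that uses the title as the filename. In this case that has to happen 3039 times, since that is how many titles there are. My process so far: I wrote a function that reads a piece of text and tells me whether it is mostly uppercase.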

import numpy as np

def mostly_uppercase(text):
    """Return True if at least 70% of the characters in text are uppercase."""
    threshold = 0.7
    if not text:
        # np.mean of an empty list is nan, so treat empty text as "not a headline".
        return False
    isupper_ints = [int(character.isupper()) for character in text]
    upper_percentage = np.mean(isupper_ints)
    return upper_percentage >= threshold

After that, I made a counter so I could build an index of the headline lines, and then I put it all together:

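headline_indices = []

# Record the index of every line that looks like a headline.
for counter, line in enumerate(page_text):
    if mostly_uppercase(line):
        print(line)
        headline_indices.append(counter)

headlines_with_articles = []
# Use len(page_text) as the final boundary; len(page_text) - 1 would drop the last line.
headline_indices_expanded = [0] + headline_indices + [len(page_text)]

# Each pair of neighbouring indices marks the start and end of one article.
for first, second in zip(headline_indices_expanded, headline_indices_expanded[1:]):
    article_text = page_text[first:second]
    headlines_with_articles.append(article_text)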

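As far as I can tell, all of this works fine. But when I try to write out the sections I want to file, all I manage to do is write the entire text into every txt file.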

for i in range(100):
    out_pathname = '/sharedfolder/temp_directory/' + 'new_file_' + str(i) + '.txt'
    with open(out_pathname, 'w') as fo:
        fo.write(articles_filtered[2])

Edit: this got me halfway there. Now I just need a way to name each file after its first line.

for i, text in enumerate(articles_filtered):
    # Note the trailing slash, so the files land inside temp_directory.
    with open('/sharedfolder/temp_directory/' + str(i + 1) + '.txt', 'w') as fo:
        fo.write(str(text))

【Comments on the question】:

Tags: python python-3.x nlp nltk

【Solution 1】:

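One conventional way to process a single input file is with a Python with statement and a for loop, as shown here. I've also adapted a good answer from someone else for counting uppercase characters, to get the fraction you need.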

def mostly_upper(text):
    threshold = 0.7
    if not text:
        # An empty line cannot be a headline; this also avoids dividing by zero.
        return False
    ## adapted from https://stackoverflow.com/a/18129868/131187
    upper_count = sum(1 for c in text if c.isupper())
    return upper_count/len(text) >= threshold

first = True
out_file = None
with open('some_uppers.txt') as some_uppers:
    for line in some_uppers:
        line = line.rstrip()
        if first or mostly_upper(line):
            first = False
            if out_file: out_file.close()
            out_file = open(line+'.txt', 'w')
        print(line, file=out_file)
# out_file is still None if the input file was empty.
if out_file: out_file.close()

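Inside the loop, we read each line and ask whether it is mostly uppercase. If it is, we close the file used for the previous group of lines and open a new file for the next group, using the content of the current line as the title.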
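The grouping step of that loop can also be sketched without touching the file system, which makes it easier to check on a small sample. This is only an illustration: `split_articles` and the `sample` list are hypothetical names that do not appear in the original code, and the 0.7 threshold is carried over from above.

```python
def mostly_upper(text, threshold=0.7):
    """Return True if at least `threshold` of the characters are uppercase."""
    if not text:
        return False
    return sum(1 for c in text if c.isupper()) / len(text) >= threshold


def split_articles(lines):
    """Yield lists of lines; a new list starts at every mostly-uppercase line."""
    current = []
    for raw in lines:
        line = raw.rstrip()
        # A headline closes the group collected so far, if there is one.
        if mostly_upper(line) and current:
            yield current
            current = []
        current.append(line)
    # Emit the final group once the input is exhausted.
    if current:
        yield current


# Hypothetical sample input: one leading non-headline line, then two articles.
sample = ["intro text", "FIRST HEADLINE", "body a", "body b", "SECOND HEADLINE", "body c"]
articles = list(split_articles(sample))
```

Each element of `articles` is one group of lines whose first entry is the line that would become the filename.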

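I've allowed for the possibility that the first line is not a title. In that case, the code creates a file named after the content of the first line and keeps writing everything it finds to that file until it does find a title line.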
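One caveat when a line of text becomes a filename: a headline may contain characters such as "/" that are not valid in a path, and two sections may share the same headline. The following is a minimal sketch of one way to handle both, assuming the sections are already available as lists of lines. `safe_filename`, `write_articles` and the `demo` data are hypothetical and not part of the original answer.

```python
import os
import re
import tempfile


def safe_filename(headline, max_length=100):
    """Turn a headline into a conservative filename stem (hypothetical helper)."""
    # Replace anything that is not a word character, hyphen or space with "_".
    cleaned = re.sub(r"[^\w\- ]", "_", headline).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned[:max_length] or "untitled"


def write_articles(articles, out_dir):
    """Write each list of lines to out_dir, named after its first line."""
    paths = []
    for number, lines in enumerate(articles, start=1):
        # Prefix a counter so repeated headlines do not overwrite each other.
        name = "{:04d}_{}.txt".format(number, safe_filename(lines[0]))
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as fo:
            fo.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths


# Hypothetical demo data: two sections that share the same headline.
demo = [["RATES RISE / MARKETS FALL", "body text"],
        ["RATES RISE / MARKETS FALL", "more text"]]
with tempfile.TemporaryDirectory() as tmp:
    written = [os.path.basename(p) for p in write_articles(demo, tmp)]
```

The counter prefix also keeps the files in their original order when the directory is listed alphabetically.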

【Comments】: