【Question title】: How to Convert a Text File into a List in Python3
【Posted】: 2020-03-31 12:46:59
【Question description】:

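In Python3, starting from an existing .txt file containing lyrics, subtitles, or similar text, I want to build a simple list (without any nesting) of the words in the file, with no spaces or punctuation marks.

Based on other StackExchange questions, I came up with this: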

import csv

crimefile = open('she_loves_you.txt', 'r')
reader = csv.reader(crimefile)
allRows = list(reader) # result is a list with nested lists

ultimate = []
for i in allRows:
    ultimate += i # result is a list with elements longer than one word

ultimate2 = []
for i in ultimate:
    ultimate2 += i # result is a list with elements which are single letters

The result I am hoping for would look like this:

['She', 'loves', 'you', 'yeah', 'yeah', 'yeah', 'She', 'loves', 'you', ...]

========================================================================

It would also be interesting to understand why this code (run as an extension of the code above):

import re
print (re.findall(r"[\w']+", ultimate))

raises the following error:

Traceback (most recent call last):
  File "4.4.4.csv.into.list.py", line 72, in <module>
    print (re.findall(r"[\w']+", ultimate))
  File "/usr/lib/python3.7/re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

【Question comments】:

  • The second argument to re.findall (ultimate in this case) should be a string. You are passing a list of strings.
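One way to act on that comment, without building a single big string first, is to call re.findall on each string in the list and flatten the results. The sketch below is only an illustration: the sample value of ultimate is a hypothetical stand-in for what the csv step produces from one line of lyrics.

```python
import re

# Hypothetical sample: roughly what csv.reader yields for one line of lyrics
ultimate = ['She loves you', ' yeah', ' yeah', ' yeah']

# Passing the list itself reproduces the error from the question
try:
    re.findall(r"[\w']+", ultimate)
    raised = False
except TypeError:
    raised = True

# Passing each string in turn works; the nested comprehension flattens the results
words = [word for chunk in ultimate for word in re.findall(r"[\w']+", chunk)]

print(raised)
print(words)
```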

Tags: python python-3.x list data-conversion


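【Solution 1】:

The error message is perfectly clear: "expected string or bytes-like object". It means that ultimate has to be converted to a string (str), and if you check the type of ultimate you will see that it is a list object.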

>>> type(ultimate)
<class 'list'>

# or 

>>> type([])
<class 'list'>

In your case:

print (re.findall(r"[\w']+", str(ultimate)))  # the list's printed form (brackets and quotes included)

# or

print (re.findall(r"[\w']+", ' '.join(ultimate)))  # joined words

【Comments】:

  • Wow, this works. Could I ask for a further explanation? print (re.findall(r"[\w']+", str(ultimate))) gives me things like "'She", 'loves', "you'", "'", "yeah'", "'", "yeah'", "'", "yeah'", "'She", 'loves', "you'", whereas print (re.findall(r"[\w']+", ' '.join(ultimate))) gives a different (cleaner) 'She', 'loves', 'you', 'yeah', 'yeah', 'yeah', 'She', 'loves', 'you', 'yeah', ... What is the difference?
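The difference asked about here comes from what each call hands to the regular expression. str(ultimate) produces the printed form of the list, including its brackets, commas, and the quote characters around every element, and because the pattern [\w']+ accepts apostrophes, those quotes end up glued to the words or matched on their own. ' '.join(ultimate) contains only the strings themselves. A minimal sketch, assuming a hypothetical sample value for ultimate:

```python
import re

# Hypothetical sample: roughly what csv.reader yields for one line of lyrics
ultimate = ['She loves you', ' yeah', ' yeah', ' yeah']

as_repr = str(ultimate)       # printed form of the list: brackets, quotes, commas
as_text = ' '.join(ultimate)  # only the strings themselves, separated by spaces

from_repr = re.findall(r"[\w']+", as_repr)
from_join = re.findall(r"[\w']+", as_text)

print(as_repr)
print(as_text)
print(from_repr)  # the quote characters of the list's printed form leak into the matches
print(from_join)  # plain words only
```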
【Solution 2】:

Try this:

import csv

crimefile = open('she_loves_you.txt', 'r')
reader = csv.reader(crimefile)
allRows = list(reader) # result is a list with nested lists

ultimate = []
for i in allRows:
    ultimate += i.split(" ")

【Comments】:

  • Well, here we get an error: $ python3 4.4.4.csv.into.list.py Traceback (most recent call last): File "4.4.4.csv.into.list.py", line 108, in <module> ultimate += i.split(" ") AttributeError: 'list' object has no attribute 'split'
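The AttributeError arises because every row returned by csv.reader is itself a list of strings, so split has to be applied to each cell rather than to the row. The following is a sketch of one possible correction, not the answerer's code: io.StringIO is used as a hypothetical stand-in for the opened file so that the example runs without she_loves_you.txt.

```python
import csv
import io

# Hypothetical stand-in for open('she_loves_you.txt', 'r')
crimefile = io.StringIO("She loves you, yeah, yeah, yeah\n")
reader = csv.reader(crimefile)
allRows = list(reader)  # each row is a list of comma-separated cells

ultimate = []
for row in allRows:
    for cell in row:
        # split() with no argument splits on any whitespace and drops empty strings
        ultimate += cell.split()

print(ultimate)
```

Calling split() without an argument is a deliberate choice here: ' yeah'.split(" ") would keep an empty string for the leading space, while ' yeah'.split() does not.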
【Solution 3】:

Below is the full output of the work I did on this problem:

import csv
import re
import json

#1 def1
def decomposition(file):
    '''
    opening the text file,
    and in 3 steps creating a list containing the single words that appear in the text file
    '''
    global list_of_words # the later steps read this module-level name

    with open(file, 'r') as crimefile:
        reader = csv.reader(crimefile)
        #step1 : list with nested lists
        allRows = list(reader) # result is a list with nested lists, on which we are going to work later

    #step2 : one list, with elements longer than one word
    ultimate = []
    for i in allRows:
        ultimate += i

    #step3 : one list, with elements which are the length of one word
    #print (re.findall(r"[\w']+", ultimate)) # does not work
    #print (re.findall(r"[\w']+", str(ultimate)))  # works
    list_of_words = re.findall(r"[\w']+", ' '.join(ultimate)) # works even better!
    return list_of_words


#2 def2
def saving():
    '''
    creating/opening a writable file,
    and saving 'list of words' into it
    '''

    with open('she_loves_you_list.txt', 'w') as fp:
        # Save as JSON
        json.dump(list_of_words, fp)


#3 def3
def lyric_to_frequencies(lyrics):
    '''
    you provide a list of words,
    and receive a dictionary that maps each unique word to the number of times it appears in the list
    '''

    myDict = {}
    for word in lyrics:
        if word in myDict:
            myDict[word] += 1
        else:
            myDict[word] = 1
    #print (myDict)
    return myDict

#4 def4
def most_common_words(freqs):
    '''
    you provide a dictionary of word frequencies ('freqs')
    and receive a tuple: the most frequent words and how often they appear
    '''

    values = freqs.values()
    best = max(values) #finding biggest value very easily
    words = []
    for k in freqs: # and here we are checking which entries have the biggest (best) value
        if freqs[k] == best:
            words.append(k) #just add it to the list
    print(words,best)
    return (words, best)

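#5 def5
def words_often(freqs, minTimes):
    '''
    you provide a dictionary of word frequencies ('freqs') AND minTimes, the minimum number of times a word has to appear in the file to be reported,
    and receive a list of (words, count) tuples, most frequent first
    (the reported words are deleted from 'freqs' along the way)
    '''

    result = []
    done = False
    while freqs and not done: # stop once 'freqs' is empty, because max() fails on an empty dictionary
        temp = most_common_words(freqs)
        if temp[1] >= minTimes:
            result.append(temp)
            for w in temp[0]:
                del(freqs[w])
        else:
            done = True
    return result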



#1
decomposition('she_loves_you.txt')

#2
saving()

#3
lyric_to_frequencies(list_of_words)

#4
most_common_words(lyric_to_frequencies(list_of_words))

#5
words_often(lyric_to_frequencies(list_of_words), 5)
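For comparison, the same steps can be sketched more compactly with collections.Counter from the standard library. This is an alternative under stated assumptions, not the code above: it skips the csv module and reads the file as plain text, the helper names words_from_text, words_from_file and frequent_words are hypothetical, UTF-8 encoding is assumed, and the sample string stands in for the contents of she_loves_you.txt. Unlike words_often, it returns one (word, count) pair per word rather than grouping tied words, and it does not modify its input.

```python
import re
from collections import Counter


def words_from_text(text):
    """Return every word in `text` as one flat list."""
    return re.findall(r"[\w']+", text)


def words_from_file(path, encoding="utf-8"):
    """Read the whole file at `path` and return its words as one flat list."""
    with open(path, "r", encoding=encoding) as handle:
        return words_from_text(handle.read())


def frequent_words(words, min_times):
    """Return (word, count) pairs seen at least `min_times` times, most frequent first."""
    return [(w, n) for w, n in Counter(words).most_common() if n >= min_times]


# Hypothetical sample text standing in for the contents of she_loves_you.txt
sample = "She loves you, yeah, yeah, yeah\nShe loves you, yeah, yeah, yeah\n"
words = words_from_text(sample)

print(words)
print(frequent_words(words, 3))
```

With a real file, words_from_file('she_loves_you.txt') would replace the sample string.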

【Comments】:
