[Posted]: 2021-04-29 02:37:53
[Question]:
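I'm a Python beginner. I'm learning it at university, and the professor gave us some work to do before the exam. I've been stuck on this program for almost 2 weeks now, and the rule is that we can't use any libraries.

Basically, I have a dictionary from an ancient language to English that gives several possible translations per word, a dictionary from English to Italian (exactly 1 key to 1 value), a text file in the ancient language and another text file in Italian. So far, I scan the ancient-language file, look up each word in the dictionary (after removing punctuation with the .strip(".,:;?!") method), and save every run of consecutive words found in the dictionary that is at least 2 words long in a list of strings.

Now the hard part: I need to try every possible combination of translations (the English values of each ancient-language word), translate each combination from English to Italian with the other dictionary, and check whether the resulting string appears in the Italian file. If it does, I save the result together with the paragraph where it was found. A match spread over different paragraphs doesn't count; it has to be in the same paragraph, which I determine with a small piece of code I wrote to count paragraphs. These are the problems I'm running into: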
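- In the strings I found, how do I replace the words while keeping the punctuation? The returned results must include all of the punctuation, otherwise the output is wrong.
- What should I do when a string spans 2 different lines of the text? For example, I have a 5-word string: the first 2 words are at the end of one line, and the other 3 are the first 3 words of the next line.
- As mentioned, the ancient-language-to-English dictionary is large, and each key (ancient-language word) can have up to 7 values (translations). Is there an efficient way to try all the combinations while searching whether the string exists in a text file? This is probably the hardest part. The best approach might be to scan word by word each time and, if the sequence breaks, somehow reset it and keep scanning the text file... Any ideas?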
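For the first and third points, here is a minimal sketch that uses no imports, in line with the no-libraries rule. split_punct separates a token's leading and trailing punctuation from its core word, italian_options looks the core up through both dictionaries and reattaches the punctuation, and combinations is a hand-written cartesian product. The function names and the tiny lexicons are hypothetical; the sketch assumes the dictionaries have the shapes built in the code below (Greek word to a tuple of English words, English word to one Italian word).

```python
PUNCT = ",.!?:;-"  # Same characters the question's code strips before a lookup


def split_punct(token):
    # Split a token into (leading punctuation, core word, trailing punctuation)
    lead_len = len(token) - len(token.lstrip(PUNCT))
    core = token.strip(PUNCT)
    return token[:lead_len], core, token[lead_len + len(core):]


def italian_options(token, gr_en, en_it):
    # Every Italian rendering of one Greek token, with its punctuation reattached.
    # Assumption: English words missing from en_it are simply skipped.
    lead, core, trail = split_punct(token)
    options = []
    for en in gr_en.get(core, ()):
        it = en_it.get(en)
        if it is not None and lead + it + trail not in options:
            options.append(lead + it + trail)
    return options


def combinations(option_lists):
    # Hand-written cartesian product (no itertools): yields one list per combination
    if not option_lists:
        yield []
        return
    for first in option_lists[0]:
        for rest in combinations(option_lists[1:]):
            yield [first] + rest


# Hypothetical mini-lexicons, only for illustration
gr_en = {"λόγος": ("word", "speech"), "καὶ": ("and",)}
en_it = {"word": "parola", "speech": "discorso", "and": "e"}

option_lists = [italian_options(t, gr_en, en_it) for t in "λόγος, καὶ".split()]
for combo in combinations(option_lists):
    print(" ".join(combo))  # prints "parola, e" then "discorso, e"
```

With up to 7 translations per word, a sequence of n words can have up to 7**n combinations, so for longer sequences it is cheaper to compare the translations against the Italian text one position at a time and give up at the first mismatch, which is essentially the word-by-word scan described above.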
Here is the code I've written so far, with comments:
```python
k = 2  # Arbitrary value: the whole program will become a function and "k" will differ each time
PUNCT = ",.!?:;-"  # Characters stripped before a dictionary lookup

# Possible translations from ancient Greek to English: a Greek word, then its English values
with open('lexicon-GR-EN.csv', encoding="utf8") as f:
    rows = [line.strip().split(';') for line in f if line.strip()]
gr_en = {words[0]: tuple(words[1:]) for words in rows}

# English to Italian: every row holds exactly one English and one Italian word (1 - 1)
en_it = {}
with open('lexicon-EN-IT.csv', encoding="utf8") as f:
    for row in f:
        if not row.strip():
            continue  # Skip empty rows, e.g. a trailing newline at the end of the file
        L = row.rstrip("\n").split(';')
        en_it[L[0]] = L[1]


def translatable(word):
    # The original test `x in gr_en != "[unavailable]"` is a chained comparison meaning
    # `x in gr_en and gr_en != "[unavailable]"`, so its second half was always True.
    # Assumption: "[unavailable]" is how the lexicon marks a word with no translation.
    key = word.strip(PUNCT)
    return key in gr_en and "[unavailable]" not in gr_en[key]


spacechecker = 0           # Counts non-blank lines: odd lines are read normally, even lines reversed
paragraph = 0              # Paragraph counter, starts at 0
paragraphspace = 0         # 1 if the previous line was blank, so double blank lines count once
foundwords = []            # Words of the sequence currently being built (punctuation kept)
completed_sequences = []   # All completed sequences of words
completed_paragraphs = []  # Paragraph in which each sequence of completed_sequences was found


def flush():
    # Save the current sequence if it has at least k words, then start a new one.
    # Testing len(foundwords) also avoids the old reference to `currword`, which was
    # undefined (NameError) when the first word of the file was not in the dictionary.
    if len(foundwords) >= k:
        completed_sequences.append(" ".join(foundwords))
        completed_paragraphs.append(paragraph)
    foundwords.clear()


with open('odyssey.txt', encoding="utf8") as f:
    for index, line in enumerate(f):  # Line-by-line scan of the txt file
        if line.isspace():
            paragraphspace = 1  # Blank line: the next non-blank line starts a new paragraph
            continue
        if index == 0 or paragraphspace == 1:
            flush()  # A sequence must not span two paragraphs
            paragraph += 1
            paragraphspace = 0
        spacechecker += 1
        words = line.split()
        if spacechecker % 2 == 0:
            # Even lines are written in reverse: reverse the word order and each word
            words = [w[::-1] for w in reversed(words)]
        for word in words:
            if translatable(word):
                foundwords.append(word)  # Sequences may continue onto the next line
            else:
                flush()
flush()  # Save a sequence that runs up to the end of the file

# Required output format: a list of (string, paragraph) tuples
result = list(zip(completed_sequences, completed_paragraphs))
```
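One way to handle sequences that cross a line break, and to avoid enumerating every combination, is to turn each paragraph into a single flat list of tokens and slide over it word by word, abandoning a start position as soon as one word is not among the allowed translations. This is a sketch under stated assumptions: paragraphs are separated by blank lines (as in the code above), comparisons ignore case and the characters in PUNCT, and the matched text is copied from the Italian file, so its own punctuation is preserved. paragraphs, find_matches and search are hypothetical names, and the Italian lines are made up.

```python
PUNCT = ",.!?:;-"  # Characters ignored when comparing words


def paragraphs(lines, reverse_even=False):
    # Group lines into paragraphs (one or more blank lines separate them) and return
    # each paragraph as ONE flat token list, so a sequence can run across a line break.
    # With reverse_even=True, every even non-blank line is un-reversed (Greek file).
    result, current, line_no = [], [], 0
    for line in lines:
        if not line.strip():
            if current:
                result.append(current)
                current = []
            continue
        line_no += 1
        words = line.split()
        if reverse_even and line_no % 2 == 0:
            words = [w[::-1] for w in reversed(words)]
        current.extend(words)
    if current:
        result.append(current)
    return result


def find_matches(option_lists, tokens):
    # option_lists[i] holds the possible Italian translations of the i-th word.
    # Each start position is abandoned at the first word that matches no option,
    # so the combinations are never enumerated one by one.
    allowed = [{o.strip(PUNCT).lower() for o in opts} for opts in option_lists]
    n = len(allowed)
    if n == 0:
        return []
    matches = []
    for start in range(len(tokens) - n + 1):
        for i in range(n):
            if tokens[start + i].strip(PUNCT).lower() not in allowed[i]:
                break
        else:  # No break: every position matched
            matches.append(" ".join(tokens[start:start + n]))  # Original punctuation kept
    return matches


def search(option_lists, lines):
    # (matched text, paragraph number) pairs, paragraphs numbered from 1
    found = []
    for number, tokens in enumerate(paragraphs(lines), start=1):
        for match in find_matches(option_lists, tokens):
            found.append((match, number))
    return found


italian = ["Disse la parola,", "e poi tacque.", "", "Altra parola e basta."]
print(search([["parola,", "discorso,"], ["e"]], italian))
# [('parola, e', 1), ('parola e', 2)]
```

The first match starts at the end of one line and finishes on the next, which is the cross-line case from the question. Each sequence costs roughly (number of tokens) × (sequence length) comparisons instead of one full search per combination.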
[Discussion]:
-
Could you post some of your code?
-
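@horcrux In case you're wondering why there's all that logic checking whether a line is odd or even: the ancient Greek text is written with one line in normal order and the next one reversed (both the letters of each word and the word order are reversed).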
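Since each even line is written completely backwards, reversing the whole whitespace-normalized line in one step gives the same result as reversing the word order and then each word, which is what the code above does; punctuation also ends up on the correct side of the word. A small sketch with hypothetical helper names and a made-up line. One caveat: if the file stores accents as separate combining characters rather than precomposed letters, reversing code points (with either approach) would move the accents onto the wrong letter.

```python
def unreverse_line(line):
    # Reversing the whole line flips the word order and the letters of every word
    # in one step; split()/join() first collapses runs of whitespace.
    return " ".join(line.split())[::-1]


def unreverse_words(line):
    # The question's approach: reverse the word order, then reverse each word
    return " ".join(w[::-1] for w in reversed(line.split()))


reversed_line = "aed o ,atnac"  # Made-up example: "canta, o dea" written backwards
print(unreverse_line(reversed_line))   # canta, o dea
print(unreverse_words(reversed_line))  # canta, o dea
```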
-
I've finished commenting the code; it should be easier to read now: pastebin.com/7eQEN5PG
Tags: python file dictionary