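【Question Title】: How would you convert this code from a for loop to a while loop?
【Posted】: 2023-01-22 11:38:28
【Question Description】:

Every time I try to convert this to a while loop, it loops endlessly; any ideas would be appreciated. It runs fine with a for loop, so I assumed that using a while loop and incrementing an index value would give the same result.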

from fuzzywuzzy import fuzz
import time
import fitz
from date_check import locate_date

## Each header is a list containing the header text and the form name ##
headers = [["header1", "Header1"]]

## cast to lowercase ##
for header in headers:
    header[0] = header[0].lower()

## One of the following is expected to be on the last page of the form ##
end_texts = ["Signature", "Signed"]
## cast to lowercase ##
for i in range(len(end_texts)):
    end_texts[i] = end_texts[i].lower()


## set variables ##
forms = []
first_page = 0
header = ""

## Scan entire document for headers ##
def scan_document(document):
    document = fitz.open(document)
    first_page = False
    last_page = False
    index = 0
    ## This is the loop in question ##
    for i in range(len(document)):
        page = document[i]
        text = page.get_text("text")
        text = text.lower()
        if first_page == False:
            for header in headers:
                if fuzz.partial_ratio(header[0], text) > 90:
                    first_page = i
                    ## Find the date on the page ##
                    date = locate_date(text)
                    forms.append([date, header[1], first_page])
                    break

        elif  first_page != False and last_page == False:
            for end_text in end_texts:
                if end_text in text:
                    last_page = i
                    forms[index].append(last_page)
                    first_page = False
                    last_page = False
                    index += 1
                    break


    ## Return forms list containing first and last page of each form as well as the header ##
    return(forms)

I tried using a while loop and stepping through the index, but the program hangs whenever I use it.

## set variables ##
forms = []
first_page = 0
header = ""

## Scan entire document for headers ##
def scan_document(document):
    document = fitz.open(document)
    first_page = False
    last_page = False
    page_num = 0
    index = 0
    
    while page_num <= len(document):
        page = document[page_num]
        text = page.get_text("text")
        text = text.lower()
        if first_page == False:
            for header in headers:
                if fuzz.partial_ratio(header[0], text) > 90:
                    first_page = page_num
                    ## Find the date on the page ##
                    date = locate_date(text)
                    forms.append([date, header[1], first_page])
                    page_num += 1
                    break

        elif  first_page != False and last_page == False:
            for end_text in end_texts:
                if end_text in text:
                    last_page = page_num
                    forms[index].append(last_page)
                    first_page = False
                    last_page = False
                    index += 1
                    page_num += 1
                    break
        else:
            page_num += 1

    ## Return forms list containing first and last page of each form as well as the header ##
    return(forms)

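【Question Comments】:

  • In some cases, none of your page_num += 1 statements is reached.
  • Increment page_num just once per loop iteration. Don't put it inside any conditional. It wasn't conditional before, so it makes no sense for it to be conditional now. Simply increment it on the last line of the loop.
  • Have you debugged your program to check whether it reaches the page_num += 1 lines in the first two conditions?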
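
The point made in these comments can be checked without a PDF. The following is a minimal sketch, not the asker's program: pages are plain strings, a substring test stands in for `fuzz.partial_ratio`, a single `in_form` flag replaces the `first_page` / `last_page` pair, and `find_stalled_page` and `max_steps` are hypothetical names, the latter a safety cap added only so the demonstration terminates. It replays the control flow of the while loop from the question and reports the page on which `page_num` stops advancing.

```python
def find_stalled_page(pages, headers, end_texts, max_steps=50):
    """Replay the question's while-loop control flow on plain strings.

    Returns the page index on which page_num stopped advancing,
    or None if the loop reached the end of the pages.
    """
    in_form = False   # simplified stand-in for the first_page / last_page flags
    page_num = 0
    steps = 0
    while page_num < len(pages):
        steps += 1
        if steps > max_steps:       # safety cap: the original loop has none
            return page_num
        text = pages[page_num].lower()
        if not in_form:
            for header in headers:
                if header in text:  # stand-in for fuzz.partial_ratio(...) > 90
                    in_form = True
                    page_num += 1   # only reached when a header matches
                    break
        else:
            for end_text in end_texts:
                if end_text in text:
                    in_form = False
                    page_num += 1   # only reached when an end text matches
                    break
        # The original's final `else: page_num += 1` is omitted: last_page is
        # reset to False right after it is set, so that branch never runs.
    return None


# A cover page without a header: the loop never leaves page 0.
print(find_stalled_page(["cover page", "header1 form", "signed here"],
                        ["header1"], ["signature", "signed"]))  # 0
# Every page matches something, so the loop finishes.
print(find_stalled_page(["header1 form", "signed here"],
                        ["header1"], ["signature", "signed"]))  # None
```

Any page that matches neither a header nor an end text leaves `page_num` unchanged, which is why the original program hangs.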

Tags: python


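【Solution 1】:

You need to increment page_num on every iteration, because on some pages none of the if branches that contain the increment is triggered.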
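
A minimal sketch of that fix, under stated assumptions: pages are passed in as a list of strings instead of a `fitz` document, a substring test replaces `fuzz.partial_ratio`, the date lookup via `locate_date` is left out, and `scan_pages` is a hypothetical name. The point is only the loop shape: `page_num` is incremented once, as the last statement of the loop body, outside every condition.

```python
def scan_pages(pages, headers, end_texts):
    """Return [form_name, first_page, last_page] entries found in pages.

    headers is a list of [header_text, form_name] pairs, as in the question.
    """
    forms = []
    first_page = None   # None rather than False, so page 0 is a valid start
    page_num = 0
    while page_num < len(pages):   # '<', not '<=': valid indexes stop at len - 1
        text = pages[page_num].lower()
        if first_page is None:
            for header_text, form_name in headers:
                if header_text in text:   # stand-in for fuzz.partial_ratio(...) > 90
                    first_page = page_num
                    forms.append([form_name, first_page])
                    break
        else:
            for end_text in end_texts:
                if end_text in text:
                    forms[-1].append(page_num)   # close the form that is open
                    first_page = None
                    break
        page_num += 1   # unconditional: runs exactly once per iteration
    return forms


pages = ["Cover", "header1 application", "details",
         "Signed by ...", "header1 again", "signature"]
print(scan_pages(pages, [["header1", "Header1"]], ["signature", "signed"]))
# [['Header1', 1, 3], ['Header1', 4, 5]]
```

With the increment at the bottom, the while loop visits exactly the same indexes as `for i in range(len(document))`.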

【Comments】:

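    【Solution 2】:

    There are cases where the page_num += 1 line is never reached. You can instead increment page_num first, right after entering the while loop, but remember to use page_num - 1 wherever the page index is needed.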

    ## set variables ##
    forms = []
    first_page = 0
    header = ""
    
    
    ## Scan entire document for headers ##
    def scan_document(document):
        document = fitz.open(document)
        first_page = False
        last_page = False
        page_num = 0
        index = 0
    
        while page_num < len(document):
            page_num += 1
            page = document[page_num - 1]
            text = page.get_text("text")
            text = text.lower()
            if first_page == False:
                for header in headers:
                    if fuzz.partial_ratio(header[0], text) > 90:
                        first_page = page_num - 1
                        ## Find the date on the page ##
                        date = locate_date(text)
                        forms.append([date, header[1], first_page])
                        break
    
            elif first_page != False and last_page == False:
                for end_text in end_texts:
                    if end_text in text:
                        last_page = page_num - 1
                        forms[index].append(last_page)
                        first_page = False
                        last_page = False
                        index += 1
                        break
    
        ## Return forms list containing first and last page of each form as well as the header ##
        return (forms)
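
One detail worth checking in both the for and the while version: `first_page` doubles as a flag and as a page index, and in Python `0 == False` evaluates to `True`. If a header matches on page 0, the test `first_page == False` still succeeds on the next iteration, so the loop keeps looking for a header instead of an end text. A small sketch of the pitfall, using `None` as the "not found" marker (the helper names are illustrative and not part of the original code):

```python
def is_unset_with_false(first_page):
    # Comparison used in the code above: treats page 0 as "no form open".
    return first_page == False


def is_unset_with_none(first_page):
    # Identity test against None keeps 0 available as a real page index.
    return first_page is None


print(is_unset_with_false(0))     # True: page 0 looks like "not found"
print(is_unset_with_false(3))     # False
print(is_unset_with_none(0))      # False: page 0 is recognised as a start page
print(is_unset_with_none(None))   # True
```

Initialising `first_page = None` and testing `first_page is None` avoids the ambiguity when a form can start on the first page of the document.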
    

    【Comments】:
