【Question title】: Getting a binary search to work in Python
【Posted】: 2016-05-15 11:27:53
【Question】:

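I'm trying to get a binary search to work in Python. I have a huge, sorted list of passwords. The plan is to take a password from the user and check whether it is in the list. Because of the size of the list, I decided to implement a binary search.

Here is my code: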

Found = False
Password = user_input("Enter a password: ")


with io.open('final.txt', encoding='latin-1') as myfile:

    data = myfile.readlines()
    low = 0
    high = (int(len(data))+1)
    while (low < high) and not Found:

        mid = int((low+high)/2)

        if data[mid] == Password:
            Found = True
            break
        elif Password < str(data[mid]):
            high = mid - 1
        elif Password > str(data[mid]):
            low = mid + 1

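I'm guessing it is because of the string comparison? Any ideas? The binary search never returns true, even when I explicitly search for something I know is in the list.

I used this code to sort the password list.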

import io

with io.open('result.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()

def partition(data, start, end):
    pivot = data[end]                          # Partition around the last value
    bottom = start-1                           # Start outside the area to be partitioned
    top = end                                  # Ditto

    done = 0
    while not done:                            # Until all elements are partitioned...

        while not done:                        # Until we find an out of place element...
            bottom = bottom+1                  # ... move the bottom up.

            if bottom == top:                  # If we hit the top...
                done = 1                       # ... we are done.
                break

            if data[bottom] > pivot:           # Is the bottom out of place?
                data[top] = data[bottom]       # Then put it at the top...
                break                          # ... and start searching from the top.

        while not done:                        # Until we find an out of place element...
            top = top-1                        # ... move the top down.

            if top == bottom:                  # If we hit the bottom...
                done = 1                       # ... we are done.
                break

            if data[top] < pivot:              # Is the top out of place?
                data[bottom] = data[top]       # Then put it at the bottom...
                break                          # ...and start searching from the bottom.

    data[top] = pivot                          # Put the pivot in its place.
    return top                                 # Return the split point


def quicksort(data, start, end):
    if start < end:                            # If there are two or more elements...
        split = partition(data, start, end)    # ... partition the sublist...
        quicksort(data, start, split-1)
        quicksort(data, split+1, end)


quicksort(data, 0, (int(len(data))-1))

with io.open('final.txt', 'w', encoding='latin-1') as f:
    for s in data:
        f.write(s)

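The sorted list looks like this: spaces, then symbols, then digits, then uppercase letters (sorted alphabetically), then lowercase letters (sorted alphabetically).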
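The ordering described above matches how Python compares strings, which is by Unicode code point. A minimal sketch, using made-up sample entries rather than anything from the real password file:

```python
# Hypothetical sample entries, chosen only to show the ordering.
samples = ["apple", "Zebra", "9lives", "#hash", " space"]

# str values compare by code point: space (32) comes before '#' (35),
# then digits (48-57), uppercase (65-90) and lowercase (97-122).
ordered = sorted(samples)
print(ordered)  # [' space', '#hash', '9lives', 'Zebra', 'apple']
```

As long as the search compares strings the same way the sort did, a binary search over this ordering is valid.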

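【Question comments】:

  • What's the problem?
  • The binary search never returns true, even when I explicitly search for something I know is in the list. After any search, printing high or low always returns 992352.
  • Apart from your algorithm problem, two remarks: 1) 99% of the execution time is spent reading the file, so a linear search is the best approach here. 2) If you keep the passwords in memory, a set is better than a list: with passwords = set(data), Password in passwords answers your question in O(1), whereas your approach is O(log n).

Tags: python string search binary-search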


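【Answer 1】:

Because of the way you set low and high, you are skipping parts of the list. As a result, low == high occurs after the update but before the check, so you leave the loop too early.

There are two simple solutions:

Either...

  • set high = mid or low = mid instead of mid -/+ 1, which triggers an extra iteration,

or...

  • check whether high == low and data[low] == Password once the loop has terminated, since you might still find Password there.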
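One way to read the two suggestions together is as a search over the half-open range [low, high) with an explicit check after the loop. The sketch below is an interpretation, not the answerer's code: the function name `contains` and the sample list are made up, and only `high = mid` is used, because `low = mid` combined with floor division can fail to terminate once `high == low + 1`.

```python
def contains(sorted_items, target):
    """Binary search over the half-open range [low, high)."""
    low, high = 0, len(sorted_items)
    while low < high:
        mid = (low + high) // 2
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be to the right of mid
        else:
            high = mid      # keep mid in range: it may be the target
    # The loop ends with low == high; check that position explicitly.
    return low < len(sorted_items) and sorted_items[low] == target


words = ["alpha", "bravo", "charlie", "delta"]
print(contains(words, "alpha"))  # True
print(contains(words, "delta"))  # True
print(contains(words, "zulu"))   # False
```

Note that `high` starts at `len(sorted_items)`, which is never used as an index, so the first and last elements are both reachable.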

【Comments】:

  • Ah, as @KIDJourney mentioned, changing your loop condition also fixes it.
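【Answer 2】:

There are two problems.

  1. Your binary search algorithm is wrong.

The loop condition should be

while (low <= high)

or you can't find the first and last elements.

  2. readlines() reads the trailing \n, but user_input() does not.

This means that 'Password' == 'Password\n' is always false.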
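The effect of the loop condition can be seen on a three-element list. This is a sketch under assumptions: `closed_search` and its `inclusive` flag are hypothetical names, and `high` starts at `len(items) - 1` (a closed interval), which differs from the `len(data) + 1` used in the question.

```python
def closed_search(items, target, inclusive):
    """Closed-interval binary search; `inclusive` picks `<=` over `<`."""
    low, high = 0, len(items) - 1
    while (low <= high) if inclusive else (low < high):
        mid = (low + high) // 2
        if items[mid] == target:
            return True
        if target < items[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False


items = ["a", "b", "c"]
for target in items:
    print(target, closed_search(items, target, False),
          closed_search(items, target, True))
# a False True
# b True True
# c False True
```

With `<`, the loop stops as soon as one candidate is left, so the first and last elements are never compared; with `<=`, that last candidate is still checked.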

【Comments】:

    【Answer 3】:

    Here is an example of a binary search:

    def binarySearch(alist, item):
            first = 0
            last = len(alist)-1
            found = False
    
            while first<=last and not found:
                midpoint = (first + last)//2
                if alist[midpoint] == item:
                    found = True
                else:
                    if item < alist[midpoint]:
                        last = midpoint-1
                    else:
                        first = midpoint+1
    
            return found
    
    mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
    print(binarySearch(mylist1, 3))
    print(binarySearch(mylist1, 13))
    
    mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
    print(binarySearch(mylist2, 2))
    print(binarySearch(mylist2, 42))
    

    I get:

    False
    False
    True
    True
    

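    And yes, as Eamon pointed out, I'm sure each password has a newline character at the end after calling readlines.

    【Comments】:

      【Answer 4】:

      You probably have a newline at the end of each password after calling readlines; use rstrip() to remove it.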

          Found = False
          Password = user_input("Enter a password: ")
      
      
          with io.open('final.txt', encoding='latin-1') as myfile:
      
              data = myfile.readlines()
              low = 0
              high = len(data)-1   #no need to cast to int, should be len()-1
              while (low <= high) and not Found:  #less than or equal to
      
                  mid = int((low+high)/2)
      
                  if data[mid].rstrip() == Password:   #Remove newline character before compare
                      Found = True
                      break
                  elif Password < str(data[mid]):
                      high = mid - 1
                  elif Password > str(data[mid]):
                      low = mid + 1
      

      【Comments】:

        【Answer 5】:

        If you only want to search for the password in your list, then in your code

        data = myfile.readlines()
        

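        you have already loaded all the passwords into memory. So if you only want to check whether a given password is present in your list, you can check directly with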

        if Password in data:
            print("yes it is present in the list")
        else:
            print("Not present in the list")
        

        Hope that helps.
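One caveat for the direct check: the lines returned by readlines() still end in "\n", so the membership test only succeeds once the line endings are stripped. A minimal sketch, using an in-memory `io.StringIO` object as a stand-in for 'final.txt' and made-up entries:

```python
import io

# Stand-in for the password file; the entries are invented for the demo.
fake_file = io.StringIO("alpha\nbravo\ncharlie\n")
data = fake_file.readlines()     # ['alpha\n', 'bravo\n', 'charlie\n']

password = "bravo"
print(password in data)          # False: 'bravo' != 'bravo\n'

# Strip the line endings once, then test membership.
cleaned = [line.rstrip("\n") for line in data]
print(password in cleaned)       # True

# A set gives average O(1) lookups when many passwords are checked.
print(password in set(cleaned))  # True
```

The list check scans every entry in the worst case, so the set only pays off when the file is loaded once and queried repeatedly.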

        【Comments】:

          【Answer 6】:

          Don't write your own binary search; they are a bit tricky to get right. Use the bisect module instead.

          from bisect import bisect_left
          def binary_search(lst, el):
             # returns lower bound of key `el` in list `lst`
             index = bisect_left(lst, el)
             # check that: (1) the lower bound is not at the end of the list and
             # (2) the element at the index matches `el`
             return index < len(lst) and lst[index] == el
          

          Usage:

          test = ["abc", "def", "ghi"]
          print(binary_search(test, "def")) # True
          print(binary_search(test, "xyz")) # False
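To see why binary_search needs both checks, it may help to look at what bisect_left returns by itself: the insertion index, whether or not the element is present. A short sketch on the same sample list ("dog" and "aaa" are extra probe values added for illustration):

```python
from bisect import bisect_left

test = ["abc", "def", "ghi"]

print(bisect_left(test, "def"))  # 1: exact match at index 1
print(bisect_left(test, "dog"))  # 2: not present, test[2] is "ghi"
print(bisect_left(test, "xyz"))  # 3: equals len(test), not a valid index
print(bisect_left(test, "aaa"))  # 0: not present, test[0] is "abc"
```

The `index < len(lst)` check guards against the "xyz" case, and the `lst[index] == el` check rejects the "dog" and "aaa" cases.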
          

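          【Comments】:

          • I'm well aware that there are plenty of ways to do this without reinventing the wheel. However, I'm doing this as an exercise in algorithmic thinking and was hoping you could help me debug my code.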