【Question title】: Finding median from two sorted arrays with different lengths
【Posted】: 2018-07-11 18:29:37
【Question description】:

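Finding the median of two given sorted arrays of the same length is a well-known and easy problem (it has been asked here many times before). It can be solved with a simple recursive algorithm.

My question is how to find the median efficiently when the two arrays have different lengths (that is, without merging them mergesort-style and then picking the median).

Also, how can one find the median of k sorted arrays of the same length? Is there an efficient algorithm?

I tried to answer the last question myself but did not find a good solution. Thanks!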

【Question discussion】:

    Tags: arrays algorithm sorting selection


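    【Solution 1】:

    If you pick a value from one array and binary search for it in the other array, you know how many values in each array lie above and below the picked value. That is enough to tell you how many values in the two arrays combined lie above and below it.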
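The counting step described above can be sketched as follows. The helper names (`lowerBound`, `upperBound`, `countAround`) are illustrative and not part of the original answer; the sketch assumes both arrays are sorted in ascending order and contain plain numbers.

```javascript
// Index of the first element >= value, i.e. the number of elements strictly below value.
function lowerBound(sorted, value) {
  let lo = 0, hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid] < value) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Index of the first element > value, i.e. the number of elements <= value.
function upperBound(sorted, value) {
  let lo = 0, hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid] <= value) lo = mid + 1
    else hi = mid
  }
  return lo
}

// How many elements of the two arrays combined lie strictly below / strictly above value.
function countAround(value, a, b) {
  const below = lowerBound(a, value) + lowerBound(b, value)
  const above = (a.length - upperBound(a, value)) + (b.length - upperBound(b, value))
  return { below, above }
}

// Usage: 5 appears once in each array; three values are smaller and four are larger.
console.log(countAround(5, [1, 3, 5, 7], [2, 5, 6, 8, 9]))
```

Each call costs two binary searches per array, so the count is obtained in logarithmic time without merging anything.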

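    So you can binary chop the first array to find which of its values is closest to the overall median, do the same on the second array, and one of the two arrays must contain the overall median (for an even total, each of the two middle values is found the same way).

    In the worst case this costs two outer binary chops, where every guess costs you an inner binary chop, so O(log^2(n)).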
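A minimal sketch of this nested binary chop follows. It is one possible reading of the description above, not the answerer's own code, and all function names are hypothetical. It looks for the element of rank k (0-based) first in one array and then in the other, and builds the median from one or two such ranks.

```javascript
// Number of leading elements of a sorted array for which isLeft(x) is true.
// isLeft must be true for a prefix of the array and false for the rest.
function partitionPoint(sorted, isLeft) {
  let lo = 0, hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (isLeft(sorted[mid])) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Outer chop over `a`; each guess costs inner chops over both arrays.
// Returns the element of 0-based rank k in the union if it occurs in `a`, else undefined.
function kthInFirst(a, b, k) {
  let lo = 0, hi = a.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1
    const v = a[mid]
    const below = partitionPoint(a, x => x < v) + partitionPoint(b, x => x < v)
    const notAbove = partitionPoint(a, x => x <= v) + partitionPoint(b, x => x <= v)
    if (k < below) hi = mid - 1          // v ranks too high
    else if (k >= notAbove) lo = mid + 1 // v ranks too low
    else return v                        // v occupies ranks below..notAbove-1
  }
  return undefined
}

function kthOfUnion(a, b, k) {
  const fromA = kthInFirst(a, b, k)
  return fromA !== undefined ? fromA : kthInFirst(b, a, k)
}

function medianByNestedSearch(a, b) {
  const n = a.length + b.length
  if (n === 0) return NaN
  const upperMiddle = kthOfUnion(a, b, n >> 1)
  if (n % 2 === 1) return upperMiddle
  return (kthOfUnion(a, b, (n >> 1) - 1) + upperMiddle) / 2
}

console.log(medianByNestedSearch([1, 3, 8], [2, 4, 6, 9, 10])) // union 1 2 3 4 6 8 9 10, median 5
```

The sketch trades speed for simplicity: it recomputes the counts inside `a` itself with a binary search so that duplicates are handled uniformly.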

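    A couple of ideas that should at least give a practical speedup:

    1) When doing the inner binary chop, you do not necessarily need to find an exact match. As soon as you have narrowed the interval of matching positions enough to tell whether the picked value lies above or below the target median, you can return any position in that range.

    2) You can check whether the interval returned by the previous call to the inner binary chop is a viable starting point for the current call. If it does not contain the value being searched for, an interval of the same size on one side or the other of it might.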
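One way to act on idea 2 is an exponential ("galloping") search that starts from the position found by the previous call. This is a swapped-in technique that follows the spirit of the idea rather than its exact wording; `lowerBoundFromHint` and its `hint` parameter are hypothetical names.

```javascript
// Index of the first element >= value, searching outward from `hint`
// (for example the result of the previous inner search) in doubling steps.
function lowerBoundFromHint(sorted, value, hint) {
  hint = Math.max(0, Math.min(hint, sorted.length))
  let lo, hi
  let step = 1
  if (hint < sorted.length && sorted[hint] < value) {
    // The answer lies to the right of hint: gallop right.
    lo = hint + 1
    hi = lo
    while (hi < sorted.length && sorted[hi] < value) {
      lo = hi + 1
      hi = Math.min(sorted.length, hi + step)
      step *= 2
    }
  } else {
    // The answer lies at or to the left of hint: gallop left.
    hi = hint
    lo = hi
    while (lo > 0 && sorted[lo - 1] >= value) {
      hi = lo - 1
      lo = Math.max(0, lo - step)
      step *= 2
    }
  }
  // Now every element before lo is < value and every element from hi on is >= value.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid] < value) lo = mid + 1
    else hi = mid
  }
  return lo
}

const data = [1, 3, 5, 7, 9, 11, 13]
console.log(lowerBoundFromHint(data, 9, 3)) // 4, found right next to the hint
```

When consecutive guesses of the outer chop land close together, the search finishes in a number of steps that grows with the distance from the hint instead of with the array length.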


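      【Solution 2】:

      You can find the median of the union of two sorted arrays of different lengths m and n in O(log(min(m, n))) time. Essentially, you search each array for a split point such that the two lower parts together contribute the same number of elements as the two upper parts. This identifies an equal number of elements below and above the median.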

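      The ideal split point can be found with binary search (the sort order guarantees that you can home in on it efficiently by checking whether you have overshot or undershot).
      Finding the split point in one array gives you the split point in the other array for free (because you know how many elements are needed to balance the ones taken from the first array).

      Once you have found the split points in both arrays that give "all elements below the median" and "all elements above the median", you can compute the median by inspecting the boundary between them (that is, take the middle element if the union has odd length, otherwise average the elements directly on the boundary).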
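To make the boundary check concrete, here is a small sketch that evaluates a single candidate cut. The function name `medianFromCut` is hypothetical, and the convention that the two lower parts together hold floor((m + n) / 2) elements is an assumption chosen for this illustration.

```javascript
// Evaluates one candidate cut on `a`: the first cutA elements of `a` go to the lower part.
// Returns the median if the implied partition is valid, otherwise null.
function medianFromCut(a, b, cutA) {
  const total = a.length + b.length
  // The cut on b follows from the cut on a: both lower parts hold floor(total / 2) elements.
  const cutB = Math.floor(total / 2) - cutA
  if (cutB < 0 || cutB > b.length) return null // b cannot balance this cut

  // Elements flanking each cut; a missing neighbour never blocks the comparison.
  const aLeft = cutA === 0 ? -Infinity : a[cutA - 1]
  const aRight = cutA === a.length ? Infinity : a[cutA]
  const bLeft = cutB === 0 ? -Infinity : b[cutB - 1]
  const bRight = cutB === b.length ? Infinity : b[cutB]

  // Valid only if everything in the lower parts is <= everything in the upper parts.
  if (aLeft > bRight || bLeft > aRight) return null

  const firstAbove = Math.min(aRight, bRight)
  if (total % 2 === 1) return firstAbove // odd: the upper side holds the extra element
  return (Math.max(aLeft, bLeft) + firstAbove) / 2
}

const a = [1, 3, 8]
const b = [2, 4, 6, 9, 10]
console.log(medianFromCut(a, b, 1)) // null: 6 in b's lower part exceeds 3 in a's upper part
console.log(medianFromCut(a, b, 2)) // 5: lower parts {1, 3} and {2, 4}, boundary between 4 and 6
```

A binary search over `cutA` only has to decide, for each invalid cut, which of the two comparisons failed and move in the matching direction.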

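      I translated the Python algorithm from the comments of this leetcode discussion into JavaScript (zzg_zzm's tweak on top of stellari's algorithm), but I chose more intuitive variable names and added comments.

      It has not been tested exhaustively, but it worked on the handful of inputs I tried.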

      function findUnionMedianSorted(smallArr, largeArr) {  
        // there are an equal number of elements below and above median
        // we need to find partitions on arr1 and arr2 such that arr1 and arr2
        // together contribute an equal number of submedian and supermedian elements
      
        // because the fitness of a partition point is monotonic (a cut that is too high
        // stays too high further right), we can use binary search to approach the optimal partition
      
        // we use the smaller array as a basis for finding the first partition,
        // since this eliminates situation where small array lacks enough elements to balance the partition
      
        // global median can then be calculated as:
        // avg(elementBelowMedian, elementAboveMedian)
        // so we must find also the elements that flank the median
      
        // ensure smallArr is the smaller array
        if (largeArr.length < smallArr.length) {
          return findUnionMedianSorted(largeArr, smallArr)
        }
      
        const unionArrLen = smallArr.length + largeArr.length
      
        // indices at which we would consider performing a cut
        let smallArrCutStartIx = 0, smallArrCutEndIx = smallArr.length
        while (smallArrCutStartIx <= smallArrCutEndIx) {
          // cut we are evaluating
          // midpoint of current search space of possible smallArr cuts
          const smallArrCutIx = Math.floor((smallArrCutStartIx + smallArrCutEndIx)/2)
          // the two left parts together must hold floor(unionArrLen/2) elements,
          // so the cut on smallArr fixes the cut on largeArr
          const largeArrCutIx = Math.floor(unionArrLen/2) - smallArrCutIx
      
          // smallArr and largeArr both submit a candidate for "what may be the element preceding the median"
          // this is the element preceding that array's cut
          // if there is no such element: we are cutting at an end of the array, so we have no element to offer
          // thus: we set extreme value such that comparisons favor the alternative (candidate from other array)
          // -Infinity / Infinity are used as sentinels so that any numeric input compares correctly
          const smallArrElementBeforeMedian = smallArrCutIx === 0
            ? -Infinity
            : smallArr[smallArrCutIx-1]
          const smallArrElementAfterMedian = smallArrCutIx === smallArr.length
            ? Infinity
            : smallArr[smallArrCutIx]
      
          const largeArrElementBeforeMedian = largeArrCutIx === 0
            ? -Infinity
            : largeArr[largeArrCutIx-1]
          const largeArrElementAfterMedian = largeArrCutIx === largeArr.length
            ? Infinity
            : largeArr[largeArrCutIx]
      
          // elements before median must be smaller than elements after median
          // this is already guaranteed within-array (elements are sorted)
          // but we check whether our proposed cut violates this across the two proposed arrays
          if (smallArrElementBeforeMedian > largeArrElementAfterMedian) {
            // our cut on smallArr is at too high an index
            // eliminate all cut locations equal to or greater than the cut index we tried
            smallArrCutEndIx = smallArrCutIx-1
            continue
          }
          if (smallArrElementAfterMedian < largeArrElementBeforeMedian) {
            // our cut on smallArr is at too low an index
            // eliminate all cut locations equal to or less than the cut index we tried
            smallArrCutStartIx = smallArrCutIx+1
            continue
          }
      
          // both candidates will be present in the union array,
          // but only the smaller one will be directly after the median
          const elementAfterMedian = Math.min(smallArrElementAfterMedian, largeArrElementAfterMedian)
      
          // does the union array have one middle or two?
          if (unionArrLen %2 === 1) {
            // odd length; one middle
      
            // why `elementAfterMedian` and not `elementBeforeMedian`?
            // the two left parts together hold floor(unionArrLen/2) elements, so for an odd
            // length the right side holds one element more than the left side;
            // the median is therefore the smallest element on the right side.
            return elementAfterMedian
          }
      
          // both candidates will be present in the union array,
          // but only the larger one will be directly before the median
          const elementBeforeMedian = Math.max(smallArrElementBeforeMedian, largeArrElementBeforeMedian)
      
          // average the two middles
          return (elementBeforeMedian + elementAfterMedian) / 2
        }
      }
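
Since the function above has only been tried on a few inputs, a slow but obviously correct reference can help with cross-checking. The sketch below merges the two arrays in linear time and reads off the median; `medianByMerging` and `findDisagreement` are hypothetical helper names, and the random test sizes are arbitrary.

```javascript
// Reference implementation: merge both sorted arrays, then read off the middle.
function medianByMerging(a, b) {
  const merged = []
  let i = 0, j = 0
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i] <= b[j])) merged.push(a[i++])
    else merged.push(b[j++])
  }
  const n = merged.length
  if (n === 0) return NaN
  const mid = Math.floor(n / 2)
  return n % 2 === 1 ? merged[mid] : (merged[mid - 1] + merged[mid]) / 2
}

// Runs candidate(a, b) against the reference on random small sorted inputs.
// Returns the first failing case, or null if every trial agreed.
function findDisagreement(candidate, trials = 1000) {
  const randomSorted = () =>
    Array.from({ length: Math.floor(Math.random() * 6) }, () => Math.floor(Math.random() * 20))
      .sort((x, y) => x - y)
  for (let t = 0; t < trials; t++) {
    const a = randomSorted(), b = randomSorted()
    if (a.length + b.length === 0) continue // the median of nothing is undefined
    const expected = medianByMerging(a, b)
    const actual = candidate(a, b)
    if (actual !== expected) return { a, b, expected, actual }
  }
  return null
}
```

Calling `findDisagreement(findUnionMedianSorted)` should return `null` if the binary-search version agrees with the reference on every random trial; otherwise it returns the offending pair of arrays.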
      

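      As for:

      Also, how can one find the median of k sorted arrays of the same length? Is there an efficient algorithm?

      That is big enough to deserve a separate question of its own.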
