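You can find the median of the union of two sorted arrays of different lengths m and n in O(log(min(m, n))) time. Essentially, you search each array for a split point such that the two low-side pieces together contribute the same number of elements as the two high-side pieces. This leaves an equal number of elements below and above the median.
You can use binary search to find the ideal split point (because the arrays are sorted, you can home in on it efficiently by checking whether you have overshot or undershot).
Finding the split point in one array gives you the split point in the other array for free (because you know how many elements are needed to balance the ones you selected from the first array).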
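To make the overshoot/undershoot check concrete, here is a minimal sketch. The helper names `matchingCut` and `classifyCut` are hypothetical (they do not appear in the original discussion), and the sketch assumes both arrays are sorted ascending and that the first argument is not the longer one.

```javascript
// Hypothetical helpers, for illustration only.

// Cut index into largeArr that pairs with a given cut on smallArr, so that
// the two low sides together hold floor(total / 2) elements.
function matchingCut(smallArr, largeArr, smallCutIx) {
  return Math.floor((smallArr.length + largeArr.length) / 2) - smallCutIx
}

// Reports whether a cut on smallArr is too high, too low, or balanced.
// Assumes both arrays are sorted ascending and smallArr is not the longer one.
function classifyCut(smallArr, largeArr, smallCutIx) {
  const largeCutIx = matchingCut(smallArr, largeArr, smallCutIx)
  // A missing neighbour is treated as -Infinity / Infinity so it never blocks a cut.
  const smallBefore = smallCutIx === 0 ? -Infinity : smallArr[smallCutIx - 1]
  const smallAfter = smallCutIx === smallArr.length ? Infinity : smallArr[smallCutIx]
  const largeBefore = largeCutIx === 0 ? -Infinity : largeArr[largeCutIx - 1]
  const largeAfter = largeCutIx === largeArr.length ? Infinity : largeArr[largeCutIx]
  if (smallBefore > largeAfter) return 'too high'
  if (smallAfter < largeBefore) return 'too low'
  return 'balanced'
}

const small = [1, 3]
const large = [2, 4, 6, 8]
for (let cut = 0; cut <= small.length; cut++) {
  // prints: 0 3 too low, then 1 2 too low, then 2 1 balanced
  console.log(cut, matchingCut(small, large, cut), classifyCut(small, large, cut))
}
```

A binary search only needs this three-way answer: "too high" discards the upper half of the candidate cuts, "too low" discards the lower half.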
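Once you have found the split points in each array that give you "all elements below the median" and "all elements above the median", you can compute the median by inspecting the boundary between them (i.e. take the middle element if the union has odd length, otherwise average the two elements directly on the boundary).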
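As a small sketch of that last step: the hypothetical helper `medianFromBoundary` below takes the elements directly before and after the cut in each array. It assumes the low sides together hold floor(unionLen / 2) elements, and that `-Infinity` / `Infinity` stand in where a cut sits at an end of an array.

```javascript
// Hypothetical helper, for illustration only.
// Arguments are the elements directly before and after the cut in each array
// (use -Infinity / Infinity where a side is empty).
// Assumes the low sides together hold floor(unionLen / 2) elements.
function medianFromBoundary(smallBefore, smallAfter, largeBefore, largeAfter, unionLen) {
  const firstAbove = Math.min(smallAfter, largeAfter)
  if (unionLen % 2 === 1) {
    // Odd length: the high side has one extra element, and its smallest is the median.
    return firstAbove
  }
  const lastBelow = Math.max(smallBefore, largeBefore)
  return (lastBelow + firstAbove) / 2
}

// [1, 3] | [] and [2] | [4, 6, 8]: the union has 6 elements
console.log(medianFromBoundary(3, Infinity, 2, 4, 6)) // 3.5
// [1] | [3] and [2] | [4, 6]: the union has 5 elements
console.log(medianFromBoundary(1, 3, 2, 4, 5)) // 3
```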
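I translated the Python algorithm from the comments of this leetcode discussion into JavaScript (zzg_zzm's tweak on top of stellari's algorithm), but I chose more intuitive variable names and added comments.
It has not been exhaustively tested, but it worked on the few inputs I tried.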
function findUnionMedianSorted(smallArr, largeArr) {
  // there are an equal number of elements below and above the median
  // we need to find cuts on smallArr and largeArr such that the two arrays
  // together contribute an equal number of submedian and supermedian elements
  // because the fitness of a cut is monotonic in its index,
  // we can use binary search to approach the optimal cut
  // we use the smaller array as the basis for finding the first cut,
  // since this eliminates the situation where the small array lacks enough elements to balance the partition
  // for an even-length union, the global median can then be calculated as:
  // avg(elementBeforeMedian, elementAfterMedian)
  // so we must also find the elements that flank the median

  // ensure smallArr is the smaller array
  if (largeArr.length < smallArr.length) {
    return findUnionMedianSorted(largeArr, smallArr)
  }
  const unionArrLen = smallArr.length + largeArr.length
  // indices at which we would consider performing a cut
  let smallArrCutStartIx = 0, smallArrCutEndIx = smallArr.length
  while (smallArrCutStartIx <= smallArrCutEndIx) {
    // cut we are evaluating
    // midpoint of current search space of possible smallArr cuts
    const smallArrCutIx = Math.floor((smallArrCutStartIx + smallArrCutEndIx) / 2)
    // the cut on largeArr is chosen so that both cuts together
    // leave floor(unionArrLen/2) elements on the low side
    const largeArrCutIx = Math.floor(unionArrLen / 2) - smallArrCutIx
    // smallArr and largeArr both submit a candidate for "what may be the element preceding the median"
    // this is the element preceding that array's cut
    // if there is no such element: we are cutting at an end of the array, so we have no element to offer
    // thus: we use -Infinity / Infinity so that comparisons favor the candidate from the other array
    // (Number.MIN_SAFE_INTEGER / MAX_SAFE_INTEGER would break for values outside the safe-integer range)
    const smallArrElementBeforeMedian = smallArrCutIx === 0
      ? -Infinity
      : smallArr[smallArrCutIx - 1]
    const smallArrElementAfterMedian = smallArrCutIx === smallArr.length
      ? Infinity
      : smallArr[smallArrCutIx]
    const largeArrElementBeforeMedian = largeArrCutIx === 0
      ? -Infinity
      : largeArr[largeArrCutIx - 1]
    const largeArrElementAfterMedian = largeArrCutIx === largeArr.length
      ? Infinity
      : largeArr[largeArrCutIx]
    // elements before the median must not be larger than elements after the median
    // this is already guaranteed within each array (elements are sorted)
    // but we check whether our proposed cuts violate this across the two arrays
    if (smallArrElementBeforeMedian > largeArrElementAfterMedian) {
      // our cut on smallArr is at too high an index
      // eliminate all cut locations equal to or greater than the cut index we tried
      smallArrCutEndIx = smallArrCutIx - 1
      continue
    }
    if (smallArrElementAfterMedian < largeArrElementBeforeMedian) {
      // our cut on smallArr is at too low an index
      // eliminate all cut locations equal to or less than the cut index we tried
      smallArrCutStartIx = smallArrCutIx + 1
      continue
    }
    // both candidates will be present in the union array,
    // but only the smaller one will be directly after the cut
    const elementAfterMedian = Math.min(smallArrElementAfterMedian, largeArrElementAfterMedian)
    // does the union array have one middle or two?
    if (unionArrLen % 2 === 1) {
      // odd length; one middle
      // why `elementAfterMedian` and not `elementBeforeMedian`?
      // the low side holds floor(unionArrLen/2) elements, so for an odd-length union
      // the high side holds one more, and its smallest element is the median itself
      return elementAfterMedian
    }
    // both candidates will be present in the union array,
    // but only the larger one will be directly before the cut
    const elementBeforeMedian = Math.max(smallArrElementBeforeMedian, largeArrElementBeforeMedian)
    // average the two middles
    return (elementBeforeMedian + elementAfterMedian) / 2
  }
  // only reachable if the inputs were not sorted
  throw new Error('findUnionMedianSorted: inputs must be sorted in ascending order')
}
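Since the function above was only lightly tested, a brute-force reference is a handy way to cross-check it. The sketch below is an assumption of how one might do that: `medianByMerging` is a hypothetical name, and it simply concatenates and sorts, so it runs in O((m + n) log(m + n)) and is meant for testing only.

```javascript
// Reference implementation for testing only (hypothetical helper).
// Concatenates both arrays, sorts numerically, and reads off the middle.
function medianByMerging(a, b) {
  const merged = [...a, ...b].sort((x, y) => x - y)
  const mid = Math.floor(merged.length / 2)
  return merged.length % 2 === 1
    ? merged[mid]
    : (merged[mid - 1] + merged[mid]) / 2
}

console.log(medianByMerging([1, 3], [2, 4, 6, 8])) // 3.5
console.log(medianByMerging([1, 3], [2])) // 2
console.log(medianByMerging([], [7])) // 7
```

For the same inputs, `findUnionMedianSorted([1, 3], [2, 4, 6, 8])` should also give 3.5; comparing the two functions over many random sorted arrays would give more confidence than a handful of hand-picked cases.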
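As for:
Also, how do you find the median of k sorted arrays of the same length? Is there an efficient algorithm?
That is big enough to deserve a separate question of its own.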