In this article, you will learn how to ding the median of two sorted arrays using the binary search approach. Before going ahead, let’s explain what a median is. In a sorted data, the median is the middle element if the total number of items is odd or the average of two middle elements if the length is even. Therefore, half of the values are above the median and half below it.
One approach to solve this problem is to combine the two sorted arrays and then access the middle element(s). However, merging takes O(n+m) time, where n
and m
are lengths of arrays. Not that it is a bad approach, but we can do better.
We can also find the median of two sorted arrays using the binary search approach. It is an efficient method and has a time of complexity of log(min(m, n)).
Intuition
As you know, the length of the combined array is n+m
. The intuition behind this approach is to have two parts (left and right) such that each portion contains an equal number of elements. If n+m
is odd, then the left side will have one more item. Assume that the combined array is sorted, then the median will be the last element in the left part or the average of it and the first element in the right half. Consider the examples below.
data = [ 3, 4, 7, 8, 10, 12, 15, 18] Total number of elements = k = 8 (even) left = [3, 4, 7, 8] right = [10, 12, 15, 18] median = (last element in the left part + first element in the right part)/2 = (8+10)/2 = 18/2=9
data = [ 3, 4, 7, 8, 10, 12, 15, 18, 20] k = 9 (odd) left = [3, 4, 7, 8, 10] right = [12, 15, 18, 20] median = last element in the left part = 10
Given two sorted arrays, arr1
and arr2
, having lengths n
and m
, respectively, we aim to find left and right parts such that all the values in the left half are less than or equal to all the values in the right half. These halves can contain values from both arrays. But, how do we find the partitions? How do we find the number of values contributed by arr1
and arr2
to the left and right parts?
For this, we can use the binary search approach. Moreover, we will keep moving around the partitions until we find the correct one, i.e., values in the left part are less than or equal to values in the right half.
The logic
We take an array that has a smaller size, let’s say arr1
. Doing so will keep our algorithm simple. Now, arr1
can contribute a minimum of 0
values and a maximum of n
values to the left part.
Let’s say arr1
contributes 1 item to the left half. Then, arr2
will contribute three elements because the left part contains four values.
If this is the correct partition, then a1 must be less than or equal to a2. It is true because arr1
is sorted. Also, b3 is less than or equal to b4. Moreover, a1 must be less than or equal to b4, and b3 must be less than or equal to a2. If these conditions satisfy, then the median will be the maximum of a1 and b3. If the combined array’s length is even, then we also have to take the first element from the right side, which will be the minimum of a2 and b4.
If that is not the case, then we need to move around our partitions. Let’s say, b3 > a2. Then, we have to move arr1’s partition to the right. This will also decrease arr2’s contribution size.
If a1 > b4, then we need to do the opposite, i.e., decrease arr1’s contribution size.
As you can see in the above image, both the boundaries are at the left and right corners. Therefore, we only need to compare b4 and a1. In coding, to handle the left edge case, we use the minimum integer value, and for the right edge case, we use the maximum integer value so that our answers don’t vary.
We will keep checking and move around the partitions till we get the right one.
Implementation
import math #helper function to find the first right element in an array (after the partition), if it does not exist, return positive infinity def getMinRight(arr, count): if count == len(arr): return math.inf else: return arr[count] #helper function to find the last left element in an array (before the partition), if it does not exist, return negative infinity def getMaxLeft(arr, count): if count == 0: return -math.inf else: return arr[count-1] #find the median of two sorted arrays def medianSortedArrays(arr1, arr2): if len(arr1) == 0 and len(arr2)==0: return None #if arr1 is empty, return the median of arr2 if len(arr1) == 0: l = len(arr2) if (l%2 == 0): return (arr2[l//2-1] + arr2[l//2])/2 else: return arr2[l//2] #if arr2 is empty, return the median of arr1 if len(arr2) == 0: l = len(arr1) if (l%2 == 0): return (arr1[l//2-1] + arr1[l//2])/2 else: return arr1[l//2] #we keep the arr1 as the smaller array, so if arr2 is smaller than arr1, then swap them if len(arr1) > len(arr2): arr1, arr2 = arr2, arr1 n = len(arr1) m = len(arr2) k = n+m #length of the combined array left_len = (n+m+1)//2 #length of the left part l = 0 #ar1 minimum contribution h = n #arr1 maximum contribution while l<=h: #find the partitions, i.e., no. of values contributed by arr1 and arr2 arr1_count = l + (h-l)//2 arr2_count = left_len - arr1_count maxLeft1 = getMaxLeft(arr1, arr1_count) #leftmost element in the arr1 maxLeft2 = getMaxLeft(arr2, arr2_count) #first right element in the arr1 minRight1 = getMinRight(arr1, arr1_count) #leftmost element in the arr2 minRight2 = getMinRight(arr2, arr2_count) #first right element in the arr1 #found the correct partition if (maxLeft1 <= minRight2 and maxLeft2 <= minRight1): med1 = max(maxLeft1, maxLeft2) if k%2 != 0: #odd length return med1 else: #even length med2 = min(minRight1, minRight2) return (med1+med2)/2 #reduce arr1 contribution size if maxLeft1 > minRight2: h = arr1_count-1 else: #increase arr1 contribution size l = arr1_count + 1 arr1 = [5, 7, 9, 10] arr2 = [4, 5, 6, 9 ] median = medianSortedArrays(arr1, arr2) print("The median of two sorted arrays is:", median)
Output
The median of two sorted arrays is: 6.5
arr1 = [3, 7, 10] arr2 = [4, 5, 6, 9 ] median = medianSortedArrays(arr1, arr2) print("The median of two sorted arrays is:", median)
Output
The median of two sorted arrays is: 6