Find the Median of Two Sorted Arrays using the Binary Search Approach

In this article, you will learn how to ding the median of two sorted arrays using the binary search approach. Before going ahead, let’s explain what a median is. In a sorted data, the median is the middle element if the total number of items is odd or the average of two middle elements if the length is even. Therefore, half of the values are above the median and half below it.

One approach to solve this problem is to combine the two sorted arrays and then access the middle element(s). However, merging takes O(n+m) time, where n and m are lengths of arrays. Not that it is a bad approach, but we can do better.

We can also find the median of two sorted arrays using the binary search approach. It is an efficient method and has a time of complexity of log(min(m, n)).

Intuition

As you know, the length of the combined array is n+m. The intuition behind this approach is to have two parts (left and right) such that each portion contains an equal number of elements. If n+m is odd, then the left side will have one more item. Assume that the combined array is sorted, then the median will be the last element in the left part or the average of it and the first element in the right half. Consider the examples below.

data = [ 3, 4, 7, 8, 10, 12, 15, 18]
Total number of elements  = k = 8 (even)
left = [3, 4, 7, 8]
right = [10, 12, 15, 18]
median = (last element in the left part + first element in the right part)/2 = (8+10)/2 = 18/2=9
data = [ 3, 4, 7, 8, 10, 12, 15, 18, 20]
k = 9 (odd)
left = [3, 4, 7, 8, 10]
right = [12, 15, 18, 20]
median = last element in the left part = 10

Given two sorted arrays, arr1 and arr2, having lengths n and m, respectively, we aim to find left and right parts such that all the values in the left half are less than or equal to all the values in the right half. These halves can contain values from both arrays. But, how do we find the partitions? How do we find the number of values contributed by arr1 and arr2 to the left and right parts?

For this, we can use the binary search approach. Moreover, we will keep moving around the partitions until we find the correct one, i.e., values in the left part are less than or equal to values in the right half.

The logic

We take an array that has a smaller size, let’s say arr1. Doing so will keep our algorithm simple. Now, arr1 can contribute a minimum of 0 values and a maximum of n values to the left part.

Let’s say arr1 contributes 1 item to the left half. Then, arr2 will contribute three elements because the left part contains four values.

If this is the correct partition, then a1 must be less than or equal to a2. It is true because arr1 is sorted. Also, b3 is less than or equal to b4. Moreover, a1 must be less than or equal to b4, and b3 must be less than or equal to a2. If these conditions satisfy, then the median will be the maximum of a1 and b3. If the combined array’s length is even, then we also have to take the first element from the right side, which will be the minimum of a2 and b4.


If that is not the case, then we need to move around our partitions. Let’s say, b3 > a2. Then, we have to move arr1’s partition to the right. This will also decrease arr2’s contribution size.

If a1 > b4, then we need to do the opposite, i.e., decrease arr1’s contribution size.

As you can see in the above image, both the boundaries are at the left and right corners. Therefore, we only need to compare b4 and a1. In coding, to handle the left edge case, we use the minimum integer value, and for the right edge case, we use the maximum integer value so that our answers don’t vary.

We will keep checking and move around the partitions till we get the right one.

Implementation

import math
#helper function to find the first right element in an array (after the partition), if it does not exist, return positive infinity
def getMinRight(arr, count):
        if count == len(arr):
            return math.inf
        else:
            return arr[count]
        
#helper function to find the last left element in an array (before the partition), if it does not exist, return negative infinity
def getMaxLeft(arr, count):
    if count == 0:
        return -math.inf
    else:
        return arr[count-1]

#find the median of two sorted arrays
def medianSortedArrays(arr1, arr2):
        if len(arr1) == 0 and len(arr2)==0:
            return None

        #if arr1 is empty, return the median of arr2
        if len(arr1) == 0:
            l = len(arr2)
            if (l%2 == 0):
                return (arr2[l//2-1] + arr2[l//2])/2
            else:
                return arr2[l//2]

        #if arr2 is empty, return the median of arr1
        if len(arr2) == 0:
            l = len(arr1)
            if (l%2 == 0):
                return (arr1[l//2-1] + arr1[l//2])/2
            else:
                return arr1[l//2]
        
        #we keep the arr1 as the smaller array, so if arr2 is smaller than arr1, then swap them
        if len(arr1) > len(arr2):
            arr1, arr2 = arr2, arr1
        
        n = len(arr1)
        m = len(arr2)
        k = n+m #length of the combined array
        left_len = (n+m+1)//2 #length of the left part 
        
        l = 0 #ar1 minimum contribution
        h = n #arr1 maximum contribution
        
        while l<=h:
           
            #find the partitions, i.e., no. of values contributed by arr1 and arr2
            arr1_count = l + (h-l)//2
            arr2_count = left_len - arr1_count
            
           
            maxLeft1 = getMaxLeft(arr1, arr1_count) #leftmost element in the arr1 
            maxLeft2 = getMaxLeft(arr2, arr2_count) #first right element in the arr1 
            minRight1 = getMinRight(arr1, arr1_count) #leftmost element in the arr2
            minRight2 = getMinRight(arr2, arr2_count) #first right element in the arr1 
            
            #found the correct partition 
            if (maxLeft1 <= minRight2 and maxLeft2 <= minRight1):
                med1 = max(maxLeft1, maxLeft2)
                if k%2 != 0: #odd length
                    return med1
                else: #even length
                    med2 = min(minRight1, minRight2)
                    return (med1+med2)/2
                
            #reduce arr1 contribution size
            if maxLeft1 > minRight2:
                h = arr1_count-1
            else: #increase arr1 contribution size
                l = arr1_count + 1
                


arr1 = [5, 7, 9, 10]
arr2 = [4, 5, 6, 9 ]
median = medianSortedArrays(arr1, arr2)
print("The median of two sorted arrays is:", median)

Output

The median of two sorted arrays is: 6.5

arr1 = [3, 7, 10]
arr2 = [4, 5, 6, 9 ]
median = medianSortedArrays(arr1, arr2)
print("The median of two sorted arrays is:", median)

Output

The median of two sorted arrays is: 6

Back to: Data Structure > Data Structure - Arrays

Leave a Reply

Your email address will not be published. Required fields are marked *