Find the Median of Elements in Two Arrays

In a sorted data, the median is the middle element if the total number of items is odd, and the average of two middle elements if the length is even. Therefore, half of the values are above the median and half below it.

Data = 4, 2, 3, 0, 9, 8, 7
n = total number of elements = 7
Median = 4
Data = 4, 2, 3, 0, 9, 8, 7, 1
n = 8
Median = 3.5

Now that we know what a median is, let’s move forward to our specific topic; finding the median of elements in two arrays.

The first step is pretty straightforward, i.e., merge two arrays to get a single unsorted array. Now, our problem is reduced to finding the median of an array.

Naïve Approach

A simple and intuitive approach is to sort the array first using some sorting algorithm, i.e., quick sort or merge sort, etc. Then, choose the element at n/2 index if n is odd or the average of elements at (n-1)/2 and n/2 indices, otherwise. Note that we are performing the integer division and assuming the index of an array starts from 0.

The problem with this approach is that it has a time complexity of O(nlogn). We can do better than this because we are sorting the complete array, which is unnecessary.

Quick Select Algorithm

The quick select algorithm is similar to quicksort, but it does not sort the complete data.

  • It starts by choosing a pivot element. Then, it partitions the array based on this pivot point to find its correct position. Everything less than the pivot gets shifted to the left, and larger values are moved to the right.
  • If the pivot is at the required position, i.e., n/2 index, then the recursion stops, and the pivot gets returned.
  • If the pivot’s position is less than n/2, then the right part gets considered for recursion only. Otherwise, the algorithm recurs for the left side.

Therefore, the algorithm recursively partitions the array until the pivot element is at the n/2 index.

One question that arises at this point is, how do you choose the pivot? Choosing a random pivot from the array gives us an average time complexity of O(n). However, it has O(n2) complexity in the worst case.

Consider the following program that finds the median of two arrays using the quick select algorithm with a random pivot element.

Implementation

import math
import random

def partition(array, l, r, index):
  pivot = array[index] #find the pivot element
  left=[] #elements less than or equal to pivot
  right=[] #elements greater than pivot
  for i in range(l, r+1): 
    if i==index: #leave the pivot element
      pass
    elif array[i] <= pivot:
      left.append(array[i])
    else:
      right.append(array[i])

  #make changes in the actual array
  end_left = l + len(left)
  array[l: end_left] = left 
  array[end_left] = pivot
  array[end_left+1:r+1] = right

  return end_left

def selectPivot(array, l, r): #find a random pivot index
  return random.randint(l, r)

def quickSelect(array, l, r, k): 
  if l <= r:
    index = selectPivot(array, l, r) #get the index of the pivot point
    pivot_pos = partition(array,l, r,  index) #partition the array based on the pivot
    if pivot_pos==k: #if the pivot is at n//2
      return array[k]
    elif pivot_pos < k: #consider the right part
        return quickSelect(array,pivot_pos+1, r, k)
    else:
        return quickSelect(array, l, pivot_pos-1, k) #consider the left part

def findMedian(array):
  n=len(array)
  if (n%2==0): #even number of element
    x = quickSelect(array, 0, n-1, (n-1)//2)
    y = quickSelect(array, 0, n-1, n//2)
    return (x+y)/2
  else: #odd number of elements
    return quickSelect(array, 0, n-1, n//2)


When the number of elements is odd:
import copy
arr1 = [4,2,3,0] #array 1
arr2 = [9,8,7] #array2
array = copy.copy(arr1)
array.extend(arr2) #combine both arrays
print("Array:", array)
median = findMedian(array) #find median
print("Median:", median)

Output

Array: [4, 2, 3, 0, 9, 8, 7] Median: 4

When the number of elements is even:

import copy
arr1 = [4,2,3,0] #array 1
arr2 = [9,8,7, 1] #array2
array = copy.copy(arr1)
array.extend(arr2) #combine both arrays
print("Array:", array)
median = findMedian(array) #find median
print("Median:", median)

Output

Array: [4, 2, 3, 0, 9, 8, 7, 1] Median: 3.5

Median of Medians

To have a worst-case linear time, we can choose the pivot point using the median of medians algorithm. It works in the following way.

  • It divides the array into chunks of a specific size, which is usually 5. The last group can have less than 5 elements.
  • Then, it finds the median of these chunks by sorting and store results in an array.
  • If the length of this array is 1, then it gets returned as a pivot element. Otherwise, the algorithm works again on the medians array until we get a single median.

Implementation

Its implementation is given below.

import math
import random

def partition(array, l, r, pivot):
  index = array.index(pivot)
  left=[] #elements less than or equal to pivot
  right=[] #elements greater than pivot
  for i in range(l, r+1): 
    if i==index: #leave the pivot element
      pass
    elif array[i] <= pivot:
      left.append(array[i])
    else:
      right.append(array[i])

  #make changes in the actual array
  end_left = l + len(left)
  array[l: end_left] = left 
  array[end_left] = pivot
  array[end_left+1:r+1] = right

  return end_left

def medianOfMedians(arr, l, r):
  groups=[]
  #create chunks of size 5
  temp = arr[l:r+1]
  for i in range(0, len(temp), 5):
      groups.append(temp[i:i+5])
  medians =[]

  #find median of every chunk
  for group in groups:
    medians.append(sorted(group)[(len(group))//2])
  l_med = len(medians)
  if l_med == 1:
    return medians[0]
  else:
    return medianOfMedians(medians, 0, l_med-1)

def quickSelect(array, l, r, k): 
  if l <= r:
    pivot = medianOfMedians(array, l, r) #get pivot point
    pivot_pos = partition(array,l, r,  pivot) #partition the array based on the pivot
    if pivot_pos==k: #if the pivot is at n//2
      return array[k]
    elif pivot_pos < k: #consider the right part
        return quickSelect(array,pivot_pos+1, r, k)
    else:
        return quickSelect(array, l, pivot_pos-1, k) #consider the left part

def findMedian(array):
  n=len(array)
  if (n%2==0): #even number of element
    x = quickSelect(array, 0, n-1, (n-1)//2)
    y = quickSelect(array, 0, n-1, n//2)
    return (x+y)/2
  else: #odd number of elements
    return quickSelect(array, 0, n-1, n//2)


import copy
arr1 = [4,2,3,0] #array 1
arr2 = [9,8,7] #array2
array = copy.copy(arr1)
array.extend(arr2) #combine both arrays
print("Array:", array)
median = findMedian(array) #find median
print("Median:", median)

Output

Array: [4, 2, 3, 0, 9, 8, 7] Median: 4

Back to: Data Structure > Data Structure - Arrays

Leave a Reply

Your email address will not be published. Required fields are marked *