Table of Contents
In a sorted data, the median is the middle element if the total number of items is odd, and the average of two middle elements if the length is even. Therefore, half of the values are above the median and half below it.
Data = 4, 2, 3, 0, 9, 8, 7 n = total number of elements = 7 Median = 4 Data = 4, 2, 3, 0, 9, 8, 7, 1 n = 8 Median = 3.5
Now that we know what a median is, let’s move forward to our specific topic; finding the median of elements in two arrays.
The first step is pretty straightforward, i.e., merge two arrays to get a single unsorted array. Now, our problem is reduced to finding the median of an array.
Naïve Approach
A simple and intuitive approach is to sort the array first using some sorting algorithm, i.e., quick sort or merge sort, etc. Then, choose the element at n/2 index if n is odd or the average of elements at (n-1)/2 and n/2 indices, otherwise. Note that we are performing the integer division and assuming the index of an array starts from 0.
The problem with this approach is that it has a time complexity of O(nlogn)
. We can do better than this because we are sorting the complete array, which is unnecessary.
Quick Select Algorithm
The quick select algorithm is similar to quicksort, but it does not sort the complete data.
- It starts by choosing a pivot element. Then, it partitions the array based on this pivot point to find its correct position. Everything less than the pivot gets shifted to the left, and larger values are moved to the right.
- If the pivot is at the required position, i.e., n/2 index, then the recursion stops, and the pivot gets returned.
- If the pivot’s position is less than n/2, then the right part gets considered for recursion only. Otherwise, the algorithm recurs for the left side.
Therefore, the algorithm recursively partitions the array until the pivot element is at the n/2 index.
One question that arises at this point is, how do you choose the pivot? Choosing a random pivot from the array gives us an average time complexity of O(n)
. However, it has O(n2)
complexity in the worst case.
Consider the following program that finds the median of two arrays using the quick select algorithm with a random pivot element.
Implementation
import math import random def partition(array, l, r, index): pivot = array[index] #find the pivot element left=[] #elements less than or equal to pivot right=[] #elements greater than pivot for i in range(l, r+1): if i==index: #leave the pivot element pass elif array[i] <= pivot: left.append(array[i]) else: right.append(array[i]) #make changes in the actual array end_left = l + len(left) array[l: end_left] = left array[end_left] = pivot array[end_left+1:r+1] = right return end_left def selectPivot(array, l, r): #find a random pivot index return random.randint(l, r) def quickSelect(array, l, r, k): if l <= r: index = selectPivot(array, l, r) #get the index of the pivot point pivot_pos = partition(array,l, r, index) #partition the array based on the pivot if pivot_pos==k: #if the pivot is at n//2 return array[k] elif pivot_pos < k: #consider the right part return quickSelect(array,pivot_pos+1, r, k) else: return quickSelect(array, l, pivot_pos-1, k) #consider the left part def findMedian(array): n=len(array) if (n%2==0): #even number of element x = quickSelect(array, 0, n-1, (n-1)//2) y = quickSelect(array, 0, n-1, n//2) return (x+y)/2 else: #odd number of elements return quickSelect(array, 0, n-1, n//2) When the number of elements is odd: import copy arr1 = [4,2,3,0] #array 1 arr2 = [9,8,7] #array2 array = copy.copy(arr1) array.extend(arr2) #combine both arrays print("Array:", array) median = findMedian(array) #find median print("Median:", median)
Output
Array: [4, 2, 3, 0, 9, 8, 7] Median: 4
When the number of elements is even:
import copy arr1 = [4,2,3,0] #array 1 arr2 = [9,8,7, 1] #array2 array = copy.copy(arr1) array.extend(arr2) #combine both arrays print("Array:", array) median = findMedian(array) #find median print("Median:", median)
Output
Array: [4, 2, 3, 0, 9, 8, 7, 1] Median: 3.5
Median of Medians
To have a worst-case linear time, we can choose the pivot point using the median of medians algorithm. It works in the following way.
- It divides the array into chunks of a specific size, which is usually 5. The last group can have less than 5 elements.
- Then, it finds the median of these chunks by sorting and store results in an array.
- If the length of this array is 1, then it gets returned as a pivot element. Otherwise, the algorithm works again on the medians array until we get a single median.
Implementation
Its implementation is given below.
import math import random def partition(array, l, r, pivot): index = array.index(pivot) left=[] #elements less than or equal to pivot right=[] #elements greater than pivot for i in range(l, r+1): if i==index: #leave the pivot element pass elif array[i] <= pivot: left.append(array[i]) else: right.append(array[i]) #make changes in the actual array end_left = l + len(left) array[l: end_left] = left array[end_left] = pivot array[end_left+1:r+1] = right return end_left def medianOfMedians(arr, l, r): groups=[] #create chunks of size 5 temp = arr[l:r+1] for i in range(0, len(temp), 5): groups.append(temp[i:i+5]) medians =[] #find median of every chunk for group in groups: medians.append(sorted(group)[(len(group))//2]) l_med = len(medians) if l_med == 1: return medians[0] else: return medianOfMedians(medians, 0, l_med-1) def quickSelect(array, l, r, k): if l <= r: pivot = medianOfMedians(array, l, r) #get pivot point pivot_pos = partition(array,l, r, pivot) #partition the array based on the pivot if pivot_pos==k: #if the pivot is at n//2 return array[k] elif pivot_pos < k: #consider the right part return quickSelect(array,pivot_pos+1, r, k) else: return quickSelect(array, l, pivot_pos-1, k) #consider the left part def findMedian(array): n=len(array) if (n%2==0): #even number of element x = quickSelect(array, 0, n-1, (n-1)//2) y = quickSelect(array, 0, n-1, n//2) return (x+y)/2 else: #odd number of elements return quickSelect(array, 0, n-1, n//2) import copy arr1 = [4,2,3,0] #array 1 arr2 = [9,8,7] #array2 array = copy.copy(arr1) array.extend(arr2) #combine both arrays print("Array:", array) median = findMedian(array) #find median print("Median:", median)
Output
Array: [4, 2, 3, 0, 9, 8, 7] Median: 4