Stokastik

Machine Learning, AI and Programming

Leetcode : Sliding Window Median

Problem Statement

Solution:

The brute force approach to solve this problem is to sort the numbers in each window and then compute the median for each window. Since there are K numbers in each window, thus the run-time complexity of the brute force approach is O(N*K*logK) because, there are N-K windows and sorting each window is O(K*logK).

When K=N/2, the worst case complexity is i.e. O(N2*logN). We know that we can always improve this run-time complexity by using a data structure which already indexes the numbers in each window in sorted order such as a Binary Search Tree.

When we shift the window to the right, the leftmost element in the last window gets deleted whereas the rightmost element in the current window gets added. Now if the tree is balanced, then insertion and deletion is both O(logK) operation. Thus the optimum run-time complexity comes out to be O(N*logK).

But a vanilla BST is not usually balanced. Thus the worst case time complexity can still come out to be O(N2*logN) with a completely skewed BST. We can try to use some self-balancing BST variants like AVL-Trees or Red-Black-Trees. But we can also come up with an alternative solution using two priority queues (heaps) with similar run-time complexity that avoids coding an AVL-Tree or a Red-Black Tree from scratch for this problem. In this solution we use two heaps approach - a min heap and a max heap.

Red Black Tree

Assuming that we have a sorted array of M numbers, then we store the first half i.e. first M/2 numbers in a max-heap and the second half in a min-heap. Then the median of the sorted array of M numbers is computed as follows:

  1. If M is even, then the median is computed by summing the root of the max-heap with the root of the min-heap and dividing by 2.0
  2. If M is odd, then the median of the M numbers is just the root of the min-heap.

Now when a new number X arrives, it is either added to the max-heap or the min-heap depending on its value. When X is less than equal to the root of the max-heap, it is added to the max-heap, else it is added to the min-heap. But in order to maintain the property that the max-heap stores the smallest half of the numbers and the min-heap the greatest half, we also need to re-adjust the heaps as follows:

  1. If X is added to max-heap, then remove the root of max-heap and add it to the min-heap.
  2. If X is added to min-heap, then remove the root of min-heap and add it to the max-heap.

The median is computed in the same way as defined above.

But with a sliding window, we also need to delete the leftmost element in the window. Deletion also takes place from either of the heaps depending on which element is to be deleted. After deletion, we do the similar kind of re-adjustment as above:

  1. If deleted from max-heap, then remove the root of min-heap and add it to the max-heap.
  2. If deleted from min-heap, then remove the root of max-heap and add it to the min-heap.

Instead of doing two separate re-adjustments i.e. one after insertion and one after deletion, we can combine them as follows:

  1. If X is added to max-heap and an element is deleted from min-heap, then remove the root of max-heap and add it to the min-heap.
  2. If X is added to min-heap and an element is deleted from max-heap, then remove the root of min-heap and add it to the max-heap.
  3. If X is added to max-heap and an element is deleted from max-heap, then do not make any re-adjustments.
  4. If X is added to min-heap and an element is deleted from min-heap, then do not make any re-adjustments.

Both insertion and deletion in a heap (or a priority queue) is O(logK), for K elements. Deletion in a heap can be accomplished efficiently by keeping a hash map pointing each element to the corresponding position in the heap as seen in one of my earlier post. Thus we can also achieve a run-time complexity of O(N*logK) using a combination of min-heap and max-heap to find the median.

But as we have seen that python heapq implementation do not have any arbitrary element deletion api. Thus to implement the code in Python, we cannot do the deletion operation as specified in the above algorithm. But we can take into account the deleted elements in the heap while doing the insertion operation. To do the re-adjustments we only need to know which of the heap contains the element to be deleted without actually deleting that element.

But whenever we encounter an already deleted element at the root of either of the heaps, we pop the element without any further consideration. Here is the Python code to achieve the same:

import heapq, math

class Solution(object):
    def medianSlidingWindow(self, nums, k):
        if k == 1:
            return [float(x) for x in nums]
        
        nums = zip(nums, range(len(nums)))
        pref = sorted(nums[:k], key=lambda k:k[0])
            
        max_heap, min_heap = pref[:int(k/2)], pref[int(k/2):]
        max_heap = [(-x, y) for x, y in max_heap]
        
        heapq.heapify(max_heap)
        heapq.heapify(min_heap)
        
        median = (-max_heap[0][0] + min_heap[0][0])/2.0 if k % 2 == 0 else float(min_heap[0][0])
        out = [median]
            
        for i in range(k, len(nums)):
            num, num_deleted = nums[i][0], nums[i-k][0]

            if num <= -max_heap[0][0]:
                heapq.heappush(max_heap, (-num, i))
                if num_deleted >= -max_heap[0][0]:
                    while max_heap[0][1] < i-k:
                        heapq.heappop(max_heap)

                    u = max_heap[0]
                    heapq.heappop(max_heap)
                    if u[1] > i-k:
                        heapq.heappush(min_heap, (-u[0], u[1]))

            else:
                heapq.heappush(min_heap, (num, i))
                if num_deleted <= min_heap[0][0]:
                    while min_heap[0][1] < i-k:
                        heapq.heappop(min_heap)

                    u = min_heap[0]
                    heapq.heappop(min_heap)
                    if u[1] > i-k:
                        heapq.heappush(max_heap, (-u[0], u[1]))
            
            while max_heap[0][1] <= i-k:
                heapq.heappop(max_heap)
                
            while min_heap[0][1] <= i-k:
                heapq.heappop(min_heap)
                
            median = (-max_heap[0][0] + min_heap[0][0])/2.0 if k % 2 == 0 else float(min_heap[0][0])
            out.append(median)
        
        return out

Since we are doing the insertion to the heaps without deleting the elements outside of the sliding window, we cannot say that the run-time complexity of the code is O(N*logK) but O(N*logN) because the maximum possible number of elements in the heap is N/2. Still it is better than O(N2*logN).

Categories: PROBLEM SOLVING

Tags: , , , , , , ,

Leave a Reply