Stokastik

Machine Learning, AI and Programming

Leetcode : Design O(1) data structure for insertion, deletion and get random

Problem Statement

Solution:

It is pretty obvious that O(1) insertion and removal can be designed using a Set or Hash-Map or a Doubly Linked List. When duplicates are prohibited, using a Set() data structure is very convenient, but when duplicates are allowed we cannot use Set. Instead we can use Hash-Map. The key of the Hash-Map are the unique integers while the values are the count of each integer. While inserting we increment the count for that integer if already present, else add a new key and set the count to 1. While removing an integer we decrement the counter and if counter becomes 0, we remove the key. The only problem we will face with the Hash-Map is that we cannot fetch a random integer uniformly in O(1) time.

Given that we need to get random numbers uniformly from the integers according to their counts distribution, we can achieve it in O(M) time where M is the number of unique integers. The following simple method can do that:

import random
def get_random(count_map, m):
    p = random.uniform(0, m)
    s = 0
    for h, count in count_map.items():
        s += count
        if p < s:
            return h
    return None

But we need to get random numbers in O(1) time complexity which can be made possible only if we use "random.choice" method on the list of all the integers including the duplicates. This is possible with two Hash-Maps but we are going to demonstrate it using a Doubly Linked List (DLL) which is conceptually simpler.

Recall that DLL allows O(1) insertion and removal. Insertion is pretty straightforward. Append the new Node at the tail of the DLL. Since in a DLL we have both previous and next pointer for each Node, once we know which Node to remove, we just set the next pointer of its previous Node to the next Node and similarly the previous pointer of its next Node to its previous Node.

Deletion of Node in DLL

But unlike array, Nodes in linked list are not indexed and thus we cannot access a Node to be removed in O(1) time complexity, we need to linearly scan for the Node which is O(N) time. Instead we can use a Hash-Map to map an integer to its corresponding Node pointer in the linked list in O(1) time and then delete the pointer. In case of duplicates, we map an integer to a set of Nodes in the linked list.

Here is the Python code for O(1) insertion and removal from DLL.

class Node(object):
    def __init__(self, val):
        self.val = val
        self.next, self.prev = None, None
        
class RandomizedCollection(object):

    def __init__(self):
        self.tail = None
        self.node_map = dict()

    def insert(self, val):
        node = Node(val)
        
        if self.tail is None:
            self.tail = node
            
        else:
            self.tail.next = node
            node.prev = self.tail
            self.tail = self.tail.next
        
        out = val not in self.node_map
        if out:
            self.node_map[val] = set()
            
        self.node_map[val].add(node)
        return out
        

    def remove(self, val):
        if val in self.node_map:
            node = self.node_map[val].pop()
            prev, nxt = node.prev, node.next
            
            if prev is None:
                self.head = nxt
            else:
                prev.next = nxt
                
            if nxt is None:
                self.tail = prev
            else:
                nxt.prev = prev
            
            if len(self.node_map[val]) == 0:
                self.node_map.pop(val)
            
            return True
        return False

Now coming to the get_random part, we can keep a list of the Nodes present in the DLL and then randomly select one and return its integer value. But this is fine if we do not remove any Nodes from in-between. If we are removing a Node from the list of Nodes, the list needs to be updated which is O(N) again. Deletion in DLL in O(1) but not in an array or in an indexed list.

Instead we delete the tail Node from the DLL and only change the value of the Node to be deleted to be equal to the value of the tail Node. Any change made to the Node pointers in the DLL is reflected in the Nodes list because Node is a pointer. Thus we only need to remove the last element from the Node list. Here is the Python code for the entire functionality - i.e. O(1) insertion, removal and get_random:

import collections, random
from random import randint

class Node(object):
    def __init__(self, val):
        self.val = val
        self.next, self.prev = None, None
        
class RandomizedCollection(object):

    def __init__(self):
        self.tail = None
        self.node_map = dict()
        self.node_list = []

    def insert(self, val):
        node = Node(val)
        
        if self.tail is None:
            self.tail = node
            
        else:
            self.tail.next = node
            node.prev = self.tail
            self.tail = self.tail.next
        
        out = val not in self.node_map
        if out:
            self.node_map[val] = set()
            
        self.node_map[val].add(node)
        self.node_list.append(node)
        
        return out
        

    def remove(self, val):
        if val in self.node_map:
            node = self.node_map[val].pop()
            node.val = self.tail.val
            self.node_map[node.val].add(node)
            self.node_map[node.val].remove(self.tail)
            
            self.tail = self.tail.prev
            
            if self.tail is not None:
                self.tail.next = None
            
            if len(self.node_list) > 0:
                self.node_list.pop()
                
            if len(self.node_map[val]) == 0:
                self.node_map.pop(val)
            
            return True
        return False
        

    def getRandom(self):
        if len(self.node_list) > 0:
            return random.choice(self.node_list).val
        return None

Categories: PROBLEM SOLVING

Tags: , , , , ,

Leave a Reply