Machine Learning, AI and Programming

Leetcode : Remove Duplicate Letters

Problem Statement


The problem looks to be easy on the outside but gets 'tricky' when we have to find the smallest possible string (alphabetically) with all unique letters. Removing duplicates is a very common problem that we usually encounter in our day to day software engineering. But in most use-cases it is sufficient to remove any one of the duplicates (first, last or randomly picked). Let's analyze the brute force approach for this problem. Assuming that the length of the string is N and there are 'n' unique letters and each letter is repeated 'm' times.

Thus n*m = N

In the brute force approach, each time we would pick one out of m possibilities for each letter and then evaluate the strings. Thus the total number of possible strings would be nm and thus the run-time complexity is also O(nm). This looks like a really bad approach.

We can solve this problem in O(N) time complexity assuming that n=26 is constant.

The idea is to scan the string S from start and fix the positions for some of the letters X such that we know if any x belonging to X occurs at a future position (if at all) then the resulting string would be alphabetically higher than the current one or x never occurs later at all.

One possible approach to fix these positions is to scan the string from start and whenever we encounter a letter Y which is the last occurrence of Y in the string and Y has not been fixed earlier, we fix Y (because if we don't fix Y now then Y will never occur in the final string).

...Y'='q' ... 'm' ... Y='t' ....

Once we fix Y, we can also fix some other letters that occur before Y in the string that has not been fixed yet. Note that from the position 'i' where we had last fixed a letter Y', to the position 'j' where Y is fixed (j > i) there could be letters that are alphabetically lower than Y. For e.g. if Y'= 'q' and Y = 't' and there is a 'm' in between which is not yet fixed (because 'm' also occurs in future) then fixing 'm' now will lead to a smaller string because 'qmt' is smaller than 'qtm' alphabetically.

...Y'='q' ... 'b' ... 'h' ... 'm' ...'p' ... 'r' ... 's' ... Y='t' ....

Similarly we can also choose other letters between Y' and Y which can further lower the final string. For e.g. find a letter between 'm' and 't' which is greater than 'm' but lower than 't' or a letter between 'q' and 'm' that is lower than 'm'. One algorithm is to find a sequence xt, xt+1, ..., xt+k between Y' and Y such that xt < xt+1 < ... < xt+k. This can be done in O(N) time complexity. For any letter in [xt, xt+k] if that letter is less than or equal to Y then add all these letters to the final string. For the remaining letters, carry it over to the next fixation point.

Word of advice : Try this exercise on example strings to understand why and how this works.

Here is the Python code:

from string import ascii_lowercase

class Solution(object):
    def removeDuplicateLetters(self, s):
        if len(s) == 0:
            return ''
        char_last_pos = {s[i]:i for i in range(len(s))}
        output_str, curr_out, visited, carry = [], [], set(), []
        for i in range(len(s)):
            if char_last_pos[s[i]] != i:
                if s[i] not in visited:
            elif s[i] not in visited:
                curr_out = carry + curr_out
                h, m, carry = [], -1, []
                for char in sorted(set(curr_out)): 
                    for j in range(m+1, len(curr_out)):
                        if curr_out[j] == char:
                            if curr_out[j] <= s[i]:
                            m = j
                curr_out = []
        return ''.join(output_str)

Although there is a nested for loop, but actually in each iteration we either add a letter to the 'curr_out' list or ignore or fix a letter present in 'curr_out' array and make it empty. Thus each letter is visited at-most twice. Thus the run-time complexity is O(N).


Tags: , , ,

Leave a Reply