Stokastik

Machine Learning, AI and Programming

Leetcode : K-Similar Strings

Problem Statement

Solution:

The brute force approach is to evaluate all possible swaps. Given the string is of length N, the number of possible pairs is U=0.5*N*(N-1). But observe that not just the swaps but the sequence of swaps also matters. The number of possible sequence of swaps is U!. Thus the run-time complexity is O(U!). For moderately sized value of N such as 20 as given by the problem definition, the brute force run-time is too huge. One can also come up with similar run-time if we visualize the steps on a tree.

The root of the tree is the original string A. At the first level we have all strings obtained by swapping one pair of characters. Since there are U possible pairs, thus at the first level of the tree we will have U number of strings. At the 2nd level we will have all possible strings obtained by one swap from each first level string. Thus the number of possible strings at 2nd level is U2. The height of the tree is O(U) which is the theoretical maximum possible number of swaps. Thus the total number of strings in the entire tree would be :

1 + U + U2 + U3 + ... U= O(UU+1) = O(UU)

Also note that U! <= (U)U = O(UU). Thus we obtain equivalent brute force run-times using a tree data structure.

But we can make use of one observation of the problem is that although the length of the string is N, the number of possible characters is limited to ['a', 'b', 'c', 'd', 'e', 'f'] . Thus only 6 possible characters. This will help to reduce the time-complexity.

Assuming that out of the N characters, H (<N) characters are same, rest all are different, then all swaps involving same characters do not count, because they all lead to the same configuration. For e.g. if there are H 'a's then we should omit all the swaps that involve two a's. Thus the total number of possible configurations is now :

(U - 0.5*H*(H-1))!

We can extend this to say that out of N characters, let's say that the number of 'a' is Ha, number of 'b' be Hb and so on till Hf, then the total number of possible swap configurations is:

(U - (Ua + Ub + ... + Uf))!, where Ua = 0.5*Ha*(Ha-1) and Ha + Hb + ... + Hf = N

Although we are able to reduce the time complexity by using the fact that there are repeated characters in the strings but the worst case complexity is still high in this case too (when N=20), which will occur when all the characters occurs equal number of times in the string A. Thus to make the swaps more efficient, we use some of the following observations:

  1. Do not swap characters in A if they are same.
  2. Do not swap characters in A if either of them is in its correct position in string B. For e.g. A="abcd" and B="bacd", then we should not swap "c" and "d" with any other positions, because they are already in correct positions.
  3. Use greedy approach to swap two characters first, for which the above two condition holds as well as after the swap at-least one of the character gets placed at its correct position according to string B.
  4. Goal is to place each character at its correct position according to string B.
  5. If a character is already at its correct position according to string B, then that character should not be swapped with any position.

Using the above two observations plus the fact that we should not swap same character positions in A, we can code the algorithm in Python as follows:

import collections
class Solution(object):
    def kSimilarity(self, A, B):
        chars, n = ['a', 'b', 'c', 'd', 'e', 'f'], len(A)
        
        queue = collections.deque([(A, 0, ''.join(chars))])
        visited = set([(A, ''.join(chars))])
        
        pos_b_dict = collections.defaultdict(list)
        for i in range(n):
            pos_b_dict[B[i]].append(i)
        
        min_cnt = float("Inf")
        while len(queue) > 0:
            front, cnt, all_chars = queue.popleft()
            
            if front == B:
                min_cnt = min(min_cnt, cnt)
            
            for char in all_chars:
                q = set(all_chars)
                q.remove(char)
                rem_chars = ''.join(sorted(q))
                new_cnt, taken  = 0, set()
                pos_a = [i for i in range(len(front)) if front[i] == char]
                
                new_front = front
                if len(pos_a) > 0:
                    for x in pos_a:
                        flag = False
                        for y in pos_b_dict[char]:
                            if new_front[y] == B[x] and new_front[x] != B[x] and new_front[y] != B[y] and new_front[x] != new_front[y] and y not in taken:
                                a, b = sorted((x, y))
                                new_front = new_front[:a] + new_front[b] + new_front[a+1:b] + new_front[a] + new_front[b+1:]
                                new_cnt += 1
                                flag = True
                                taken.add(y)
                                break

                            if flag:
                                break

                        if flag is False:
                            for y in pos_b_dict[char]:
                                if new_front[y] != B[y] and new_front[x] != B[x] and new_front[x] != new_front[y] and y not in taken:
                                    a, b = sorted((x, y))
                                    new_front = new_front[:a] + new_front[b] + new_front[a+1:b] + new_front[a] + new_front[b+1:]
                                    new_cnt += 1
                                    taken.add(y)
                                    break
                                    
                    if len(new_front) > 0 and (new_front, rem_chars) not in visited:
                        queue.append((new_front, cnt + new_cnt, rem_chars))
                        visited.add((new_front, rem_chars))
                
        return min_cnt

The algorithm works as follows:

  • Starting at the root of the tree with string A, generate all possible configurations obtained by placing the character "a" at its correct positions according to string B. Repeat this for all remaining characters "b", "c", "d", "e" and "f".
  • Use the above heuristics to generate these configurations.
  • We do the re-arrangement for all characters at each level because we do not know, that placing which character first at their respective correct positions will give us the shortest K.
  • Each node in the first level of the tree represents the configuration obtained by placing each of the characters "a", "b", ... "f" at their correct positions.
  • For the 2nd level, exclude the characters already placed at their correct respective positions in the 1st level nodes and only place the remaining characters at their respective correct positions according to string B.
  • Repeat this for all levels till either we have reached the configuration for string B, i.e. no further possible re-arrangement can be made.

At each level of the tree, we are doing O(6N) comparisons, because for each character, we need to place that character at its correct positions in O(N) time complexity and there are 6 characters. Thus the asymptotic run-time should be theoretically :

6N + (6N)2 + ... + (6N)6 = O(N7)

Because the depth of the tree should be 6 as there are maximum 6 possible characters. These run-time can be further improved by traversing in both directions i.e. from A to B and also from B to A, in which case the run-time complexity would reduce to O(N4).

Following is the python code to traverse the BFS tree in both directions.

import collections
class Solution(object):
    def get_strings_at_depth(self, A, B, max_depth):
        chars, n = ['a', 'b', 'c', 'd', 'e', 'f'], len(A)
        
        queue = collections.deque([(A, 0, 0, ''.join(chars))])
        visited, cache = set([(A, ''.join(chars))]), {A : 0}
        
        pos_b_dict = collections.defaultdict(list)
        
        for i in range(n):
            pos_b_dict[B[i]].append(i)
        
        while len(queue) > 0:
            front, cnt, depth, all_chars = queue.popleft()
            
            if depth > max_depth:
                return cache
            
            for char in all_chars:
                q = set(all_chars)
                q.remove(char)
                rem_chars = ''.join(sorted(q))
                new_cnt, taken  = 0, set()
                pos_a = [i for i in range(len(front)) if front[i] == char]
                
                new_front = front
                if len(pos_a) > 0:
                    for x in pos_a:
                        flag = False
                        for y in pos_b_dict[char]:
                            if new_front[y] == B[x] and new_front[x] != B[x] and new_front[y] != B[y] and new_front[x] != new_front[y] and y not in taken:
                                a, b = sorted((x, y))
                                new_front = new_front[:a] + new_front[b] + new_front[a+1:b] + new_front[a] + new_front[b+1:]
                                new_cnt += 1
                                flag = True
                                taken.add(y)
                                break

                            if flag:
                                break

                        if flag is False:
                            for y in pos_b_dict[char]:
                                if new_front[y] != B[y] and new_front[x] != B[x] and new_front[x] != new_front[y] and y not in taken:
                                    a, b = sorted((x, y))
                                    new_front = new_front[:a] + new_front[b] + new_front[a+1:b] + new_front[a] + new_front[b+1:]
                                    new_cnt += 1
                                    taken.add(y)
                                    break
                                    
                    if len(new_front) > 0 and (new_front, rem_chars) not in visited:
                        queue.append((new_front, cnt + new_cnt, depth + 1, rem_chars))
                        visited.add((new_front, rem_chars))
                        
                        if new_front not in cache:
                            cache[new_front] = cnt + new_cnt
                        else:
                            cache[new_front] = min(cache[new_front], cnt + new_cnt)
        return cache
                        
                        
                        
    def kSimilarity(self, A, B):
        cache_a = self.get_strings_at_depth(A, B, max_depth=4)
        cache_b = self.get_strings_at_depth(B, A, max_depth=3)
        
        min_cnt = float("Inf")
        for mystr, cnt in cache_a.items():
            if mystr in cache_b:
                min_cnt = min(min_cnt, cnt + cache_b[mystr])
        
        return min_cnt

Find the minimum swap-distances of each string till depth 4 from string A, and also the minimum swap-distances of each string till depth 3 from string B. Then find all the common strings and among these common strings, return the sum of the minimum swap-distances from A and B. The above code has run-time complexity of O(N4).

Categories: PROBLEM SOLVING

Tags: , , , ,

Leave a Reply