Stokastik

Machine Learning, AI and Programming

Leetcode : Count The Repetitions

Problem Statement

Solution:

One naive approach is to estimate the value of M from the fact that the maximum value of M possible is:

Q=(n1*len(s1))/(n2*len(s2))

Let s(x) denote the string s repeated 'x' times.

Using binary search between 0 to Q, check if the string s2(n2*m) is present in the string s1(n1) for some m. If s2(n2*m) is present in s1(n1), then search to the right half of m else to the left half of m. Continue searching until we find some M for which any value m' > M, s2(n2*m') is NOT present in s1(n1). Searching for whether the string s2(n2*m) is present in s1(n1) is of the order O(n1*len(s1)) because in the worst case we need to scan the entire string s1(n1), which is of length n1*len(s1). The overall run-time is thus (factoring in the time to do binary search):

O(n1*len(s1)*log(Q))

If we denote n1*len(s1) by N, then the run-time is O(N*logN). Since N is of the order of 108, the run-time can be really huge and so this method is not effecient at all. Can we come up with an effecient approach ?

One of the question that we should ask is that, can we compute how many times the string s2 occurs in the string s1(n1) without needing to scan the entire string. One observation is that if the string s2 occurs between the indices [i, j] in s1(n1) then there exists some 'p' for which s2 will also occur between the indices [i+p, j+p], [i+2p, j+2p], ... [i+kp, j+kp] in s1(n1) where j + kp <= n1*len(s1)-1 for some factor 'k'.

'p' is known as the period. The period 'p' is in-fact multiple of len(s1). Thus if we find two occurrences of s2 such that they start/end at the same character position in s1, then we have found the period 'p' because (j + kp) mod len(s1) = j mod len(s1), since p is a multiple of len(s1).

For e.g. if s1='nlhqgllunmelayl' and n1=1000 and s2='lnl', then one occurrence of s2 would be [6,11], but observe that s2 will also occur at [21,26], [36,41] ... and so on. Thus period 'p' is 15 = len(s1).

Also if there are more than one occurrences of s2 between [i, j] and [i+p, j+p] then these occurrences will also have a period of 'p'. Observe that the maximum value of the period 'p' is len(s1)*len(s2), because the maximum number of occurrences of s1 required for s2 to be present in it is len(s1)*len(s2)

For e.g. s1="aaabaa" and s2="bbb", then we need al-least 3 occurrences of s1 for s2 to be present, if at all s2 should be present.

To find the number of occurrences of s2 in s1(n1) , find the minimum value of 'p' first in O(len(s1)*len(s2)) time complexity and then for all j' between j and j+p, find the respective k', such that j'+k'p <= n1*len(s1)-1:

k' = (n1*len(s1)-1-j')/p

Add up all these k' to get the number of times s2 occurs in s1(n1). The number of such k's would be O(len(s1)).

The python code for the solution is as follows:

class Solution(object):
    def get_period(self, s1, s2):
        end_positions, end_pos_map = [], dict()
        i, j = 0, 0
        n, m = len(s1), len(s2)
        period = 0
        
        while True:
            if s1[i%n] == s2[j%m]:
                i += 1
                j += 1
                if j % m == 0:
                    if (i-1) % n in end_pos_map:
                        period = i-1-end_pos_map[(i-1)%n]
                        break
                    end_positions.append(i-1)
                    end_pos_map[(i-1)%n] = i-1
            else:
                i += 1
        
        return end_positions, period

    def getMaxRepetitions(self, s1, n1, s2, n2):
        if n1 == 0 or len(set(s2).difference(set(s1))) > 0:
            return 0
        
        positions, period = self.get_period(s1, s2)
        
        max_pos, sums = n1*len(s1)-1, len(positions)
        
        for x in positions:
            sums += (max_pos-x)/period
            
        return sums/n2

The complexity of this approach is O(len(s1)*len(s2)), which is much better than what we had got earlier because len(s1) or len(s2) is of O(102) only.

Categories: PROBLEM SOLVING

Tags: , , , , ,