Stokastik

Machine Learning, AI and Programming

LeetCode: Three Equal Parts

Problem Statement

Solution:

The problem can be approached in multiple different ways. One trivial approach is to find the decimal representation for all binary sequences [i, j], where j >= i. Once we find the decimal representations for all [i, j], we can then find the indices i' and j' such that the decimal representation [0, i'], [i'+1, j'-1] and [j', N-1] are equal where N is the length of the input. Let us denote D(i, j) as the decimal representation for [i, j]. Then one can compute the values of D(i, j) using dynamic programming as follows:

D(i, j) = 2*D(i, j-1)+A[j], where A is the input array

Then [i', j'] is a solution if D(0, i') = D(i'+1, j'-1) = D(j', N-1).

There are three problems with this approach:

  1. Run time complexity is O(N2) which can be 'bettered'.
  2. The maximum length of the input is N=30K, thus we need a matrix of size 9x108 to store the decimal representations. Matrix of this big size causes MemoryError.
  3. The maximum length of the input is N=30K, thus the maximum decimal representation is of the order of 230,000, which is too huge a number and will cause integer overflow errors.

Let's tackle problems 1 and 2 first. We observe that the decimal representation D(i, j) can also be computed using the following formula:

D(i, j) = D(0, j) - 2j-i*D(0, i-1)

Thus instead of storing decimal representations for all [i, j], it is enough to store the prefix decimal representations F(i)'s which is the decimal representation for the sequence [0, i].

F(i) = 2*F(i-1) + A[i]

F(i)'s can be computed in linear time complexity O(N). But we also need to compute the indices [i', j'] for which the decimal representation of the three parts are equal. This can be done in the following way. For each position i, compute the prefix D(0, i) and suffix D(i+1, N-1) decimal representation. The prefix is straightforward while the suffixes are computed by applying the above formula, i.e.

D(0, i) = F(i), and

D(i+1, N-1) = F(N-1) - 2N-i-1*F(i)

For some i=j'-1, if the suffix D(j', N-1) has already been seen as a prefix (for some i' < j'-1), lets call it D(0, i'), i.e. D(0, i') = D(j', N-1) and also the intermediate D(i'+1, j'-1) has the same representation, then we return [i', j'] as the solution. We keep a dictionary or map to store the already seen prefix decimal representations mapping to the position in the array. Here is the python code for that:

class Solution(object):
    def threeEqualParts(self, A):
        n = len(A)
        prefix_sums = [0]*n
        
        for i in range(n):
            if i == 0:
                prefix_sums[i] = A[i]
            else:
                prefix_sums[i] = 2*prefix_sums[i-1] + A[i]
                
        pos = dict()
        for i in range(n):
            a, b = prefix_sums[i], prefix_sums[n-1] - prefix_sums[i]*(2**(n-i-1))
            if b in pos and a - prefix_sums[pos[b]]*(2**(i-pos[b])) == b:
                return [pos[b], i+1]
            pos[a] = i
                
        return [-1,-1]

Both the run-time and space complexity is O(N) for this approach. But we have still not solved for the integer overflow issue that will arise with this approach when N becomes large ~30K. In order to handle that, we observe that we can solve this problem without even computing the decimal representations of the binary sequences. The idea is very simple:

  • If the input binary sequence can be divide into 3 equal parts of same decimal value then the number of 1's in the sequence should be a multiple of 3. If the number of 1's is not a multiple of 3 we return [-1, -1] immediately.
    • Thus each part should have M/3 1's in it where M is the total number of 1's in the input sequence
  • To find the binary sequence of the decimal number for which the solution exists, we observe that if we start to look from the end of the input sequence, then once we have encountered M/3 1's from the end, we have got our desired binary representation (reversed).
    • This approach will not work if we look from the beginning because we do not know how many trailing zeros should be there. But the input sequence should always end with the desired binary sequence.
  • Once we have the binary representation we check the remaining array to see if we can find sub-arrays equal to the representation. This is just a linear scan over the input sequence. For e.g. if the desired binary sequence is represented as X, then the input sequence must look like:

0*X0*X0*X, where 0* implies zero or more 0's

Here is the Python code for same:

import numpy as np

class Solution(object):
    def threeEqualParts(self, A):
        n, m = len(A), np.sum(A)
        
        if m % 3 != 0:
            return [-1,-1]
        elif m == 0:
            return [0,n-1]
        
        num_ones_each, rep = m/3, []
        
        i = n-1
        while num_ones_each > 0:
            rep.append(A[i])
            if A[i] == 1:
                num_ones_each -= 1
            i -= 1
        
        rep = rep[::-1]
        u, end = len(rep), i+1
        
        i, pos = 0, []
        while i < end:
            if A[i] == 1:
                if A[i:(i+u)] == rep:
                    pos.append(i+u-1)
                    i += u
                else:
                    return [-1,-1]
            else:
                i += 1
        
        return [pos[0], pos[1]+1]

Edit:

Given that if the binary sequence can be split into 3 equal parts, then it can be expressed as following:

0*X0*X0*X, where 0* implies zero or more 0's

So we can also use regular expressions to capture the matches by first converting the array to a string of 0s and 1s. Here is a more elegant Python code to do the same:

import re

class Solution(object):
    def threeEqualParts(self, A):
        str_a = ''.join([str(x) for x in A])
        
        if re.match(r'^0{3,}$', str_a):
            return [0, len(A)-1]
        
        p = re.compile(r'^0*(1.*)0*(\1)0*(\1)$')
        m = p.search(str_a)
        
        i, j = -1, -1
        
        if m is not None:
            i = m.span(1)[1] - 1
            j = m.span(2)[1]
        
        return [i,j]

Although the code looks elegant but the run-time complexity is not as good as our previous approach since regular expression matching is very slow as it uses back-tracking to match the regex which may not be actually linear time complexity.

Categories: PROBLEM SOLVING

Tags: , , , , ,