Stokastik

Machine Learning, AI and Programming

LeetCode : Maximal Rectangle

Problem Statement

Solution :

A simple brute force approach would require for each pair of points in the matrix, whether the points form the top left and bottom right corner of a rectangle (all 1's) and compute its area. Since there are O(N2) points, thus there are O(N4) pairs of points. Checking whether there is a rectangle between them and computing its area takes O(N2) comparisons. Thus the total run-time is O(N6), too bad !!! Definitely we can re-use some of our computations.

For example, for a pair of points (X1, Y1) and (X2, Y2), use the intermediate points (X1, Y) and (X2, Y) for some Y in range [Y1, Y2], to determine whether there is a rectangle between (X1, Y1) and (X2, Y2).

Rectangles

There is a rectangle between (X1, Y1) and (X2, Y2) if for some Y in the range [Y1, Y2], there is a rectangle between (X1, Y1) and (X2, Y) (red dots) and a rectangle between (X1, Y) and (X2, Y2) (yellow dots). The run-time with this optimization is O(N4), still bad !!! But we got the idea that this requires some kind of dynamic programming approach.

Instead of sticking with the above approach of checking whether there exist a rectangle or not, let's modify our approach by assuming that there is a rectangle starting at a point (X, Y) with the value 1 in the matrix, because there will always be a rectangle starting at a point having value of 1 (even if it's a rectangle of size 1x1).

Rows in matrix w.r.t. a single column

Let's say that currently we are at the top left cell with a 1 as in the above diagram, then any rectangle starting at that cell, will have a maximum height equal to the maximum number of continuous 1's along the same column starting at that cell. In the above diagram it is 5.

But for each of those rows, the number of continuous 1's along the width can be different (as shown in the diagram), which suggests that multiple possible rectangles could be formed.

For only the first row, area of rectangle = 1x5 = 5.

For the first 2 rows, area of rectangle = 2x5 = 10 (because width of rectangle has to be minimum of all the rows).

For the first 3 rows, area of rectangle = 3x3 = 9.

For the first 4 rows, area of rectangle = 4x3 = 12.

For the first 5 rows, area of rectangle = 5x3 = 15.

Note that the above area of rectangles are all rectangles starting at the top left cell. Rectangles can start anywhere in the matrix with a cell value of 1. Among all possible rectangles in the above diagram, maximum area is 15, which co-incidentally starts at the top left cell and ends in the last row.

Longest running histogram

 

In the above diagram, the maximum area of rectangle is 3x7 = 21, starting at the start of the 2nd row and ending at end of the 4th row.

Thus for each cell (X, Y) in the matrix, compute the width towards right (same row), starting from the column of that cell (i.e. number of continuous 1's along the same row towards right, but starting from (X, Y)).

Then iterate towards the bottom of the matrix starting from (X, Y) but along the same column.

While iterating each row towards the bottom, compute the width of each row (similar to the starting row). If the cell value of a bottom row (same column) is 1 then continue to go down else stop iterating further.

If the width of this row is greater than the current minimum width then considering the minimum width, compute the area of the rectangle of height equal to the distance of this row from the start row and width equal to the current minimum width. For e.g. In the first matrix diagram, if the blue row is our starting row and the orange row is our current row, then height = 4 and width = 3.

Else if the width of a bottom row is less than the current minimum width, then update the current minimum width and compute the area of the rectangle.

Python code for the above approach :

class Solution(object):
def maximalRectangle(self, matrix):
if len(matrix) == 0:
return 0
max_len_rt = [[0] * (len(matrix[0]) + 1) for _ in range(len(matrix) + 1)]
max_area = 0
for row in reversed(range(len(matrix))):
for col in reversed(range(len(matrix[0]))):
if matrix[row][col] == "1":
x = max_len_rt[row][col + 1] + 1
max_len_rt[row][col] = x
max_area = max(max_area, x)
for row2 in range(row + 1, len(matrix)):
if matrix[row2][col] == "1":
y = max_len_rt[row2][col]
if y <= x:
area = (row2 - row + 1) * y
x = y
else:
area = (row2 - row + 1) * x
max_area = max(max_area, area)
else:
break
return max_area

Note that in order to compute the width along the right side starting from a particular cell, we are using a dynamic programming approach. If a cell value is 1, then the width along the right side starting from the cell is equal to :

width(X, Y) = 1 + width(X, Y+1), if matrix[X][Y] = 1

Since we are iterating starting from the cell at the bottom right corner, thus when we compute width(X, Y), we already have computed width(X, Y+1). In the above code 'max_len_rt' is our width variable.

The run-time of the above code is O(N3), since for each cell, we do one more iteration towards the bottom. The above code is accepted by the LeetCode online judge.

But some of the solutions presented by submitters runs in O(N2) time complexity. So I took up the additional effort of solving this problem in O(N2). To do that we will revisit one of our earlier LeetCode problem solution. The Largest Rectangle in Histogram problem. But why ?

Note that the matrix diagrams presented above when rotated by 90 degrees ant-clockwise, presents itself as histograms of width 1.

Matrix transformed into Histograms.

Finding the largest rectangle in the matrix, is equivalent to finding the largest rectangle in a histogram, for each cell in the matrix and then taking the max out. The above is one such histogram for the cell in blue at the bottom left corner. The height of a bin is same as that of a row width defined in our earlier approach.

In our solution to the largest rectangle in a histogram problem, for a histogram with N bins, the time complexity to search for the longest "compatible" bins along both the left and right side of a bin was O(logN) and thus the total time taken for N such bins was O(N*logN).

In-fact, the total time taken could be bettered to O(N) by using a different technique, which we did not discuss there. The idea is to find the closest incompatible bin for each bin.

So for a bin of height 'h1', if the next bin is of height 'h2', is smaller than h1, then we are done, else if h2 > h1, we go to the location of the closest incompatible bin for h2. Now since h2 > h1, thus the closest incompatible bin for h2 should be located before the closest incompatible bin for h1. If the height h3 of the closest incompatible bin for h2 is less than h1, then we are done, else repeat the above process with h3.

Closest incompatible bin

At first this does not look like an O(N) solution because for each bin, we are hopping across bins on the right or left. But careful analysis reveals that the total number of hops is also O(N). In the worst case the number of hops is 2N.

Below is the python code for finding the longest "compatible" distance along the right side for each bin. Uses the logic described above.

def get_max_flow(bins):
flows = [-1] * len(bins)
for idx in reversed(range(len(bins))):
if idx == len(bins) - 1 or bins[idx] > bins[idx + 1]:
flows[idx] = idx + 1
else:
y = idx + 1
while True:
x = flows[y]
if x == len(bins) or bins[idx] > bins[x]:
flows[idx] = x
break
else:
y = x
return flows

In order to convince myself that the run time of the above code is indeed O(N), I did a benchmark analysis of the code with random inputs of sizes 1000 to 10000 and got the below time analysis.

Timing analysis of the above code. Linear Trend

Note that the trend is linear but there are multiple spikes in between due to the variation from O(N) to O(2N) due to the input patterns.

Thus convinced that finding the largest rectangle in histogram indeed takes O(N) time complexity, i applied the same algorithm above, for each cell in the matrix having a value of 1. The total time complexity of finding the longest compatible heights along each direction for a particular cell in the matrix is O(N2) (using dynamic programming).

Below is the final code for finding the largest rectangle in the matrix :

class Solution(object):
def get_column_flow(self, matrix, max_len_rt, direction=1):
flow = [[0] * (len(matrix[0])) for _ in range(len(matrix))]
n, m = range(len(matrix)), range(len(matrix[0]))
r_iter = n[::-1] if direction == 1 else n
c_iter = m[::-1] if direction == 1 else m
edge_row = len(matrix) - 1 if direction == 1 else 0
for row in r_iter:
for col in c_iter:
if matrix[row][col] == "1":
if row == edge_row or max_len_rt[row][col] > max_len_rt[row + direction][col]:
flow[row][col] = row + direction
else:
y = row + direction
while True:
x = flow[y][col]
if x == edge_row + direction or max_len_rt[row][col] > max_len_rt[x][col]:
flow[row][col] = x
break
else:
y = x
return flow
def maximalRectangle(self, matrix):
if len(matrix) == 0:
return 0
max_len_rt = [[0] * (len(matrix[0]) + 1) for _ in range(len(matrix) + 1)]
for row in reversed(range(len(matrix))):
for col in reversed(range(len(matrix[0]))):
if matrix[row][col] == "1":
max_len_rt[row][col] = max_len_rt[row][col + 1] + 1
fwd_flow = self.get_column_flow(matrix, max_len_rt, 1)
bwd_flow = self.get_column_flow(matrix, max_len_rt, -1)
max_area = 0
for row in range(len(matrix)):
for col in range(len(matrix[0])):
if matrix[row][col] == "1":
width = max_len_rt[row][col]
a = fwd_flow[row][col] - row
b = row - bwd_flow[row][col]
height = a + b - 1
max_area = max(max_area, width * height)
return max_area

Looks like we have achieved our goal of O(N2) complexity.

Categories: PROBLEM SOLVING

Tags: , , , ,