# Stokastik

## Machine Learning, AI and Programming

Problem Statement: Given two words 'beginWord' and 'endWord' and a list of valid words, find all shortest transformation sequences from 'beginWord' to 'endWord', changing exactly one letter at each step, such that every intermediate word is in the list.

Solution:

This is one of the problems asked frequently in coding interviews, but it has a relatively straightforward solution once one gets hold of the right data structures. The idea is simple: starting from 'beginWord', find all words that are 1-distance away and also in the list of valid words. Repeat this step until we hit 'endWord'. But since the problem asks us to list all possible intermediate paths from 'beginWord' to 'endWord', and not just one path or the shortest path length, we need to track all possible shortest paths to 'endWord'.

As we know, the shortest path can be found using a BFS approach, but since we also need to track the paths taken during the BFS, we additionally store, for each word, the words at 1-distance away from it, in such a way that we can easily reconstruct all the paths.

Here is the Python code to implement the same. Note that we are using a FIFO queue data structure to implement the BFS and an adjacency list to store the words at 1-distance away.

```
import collections, string

class Solution(object):
    def reconstruct(self, adjacency_list, beginWord, endWord):
        # Recursively stitch together all shortest paths recorded in
        # the adjacency list, from beginWord down to endWord.
        if beginWord == endWord:
            return [[beginWord]]
        if beginWord not in adjacency_list:
            return []
        else:
            out = []
            new_lst = adjacency_list[beginWord]
            for x in new_lst:
                w = self.reconstruct(adjacency_list, x, endWord)
                for y in w:
                    out.append([beginWord] + y)
            return out

    def findLadders(self, beginWord, endWord, wordList):
        word_set = set(wordList)

        if endWord not in word_set:
            return []

        # FIFO queue for the BFS; 'visited' maps each word to the depth
        # at which it is first seen; 'adjacency_list' stores, for each
        # word, its 1-distance neighbours lying on some shortest path.
        queue, adjacency_list, visited = collections.deque([(beginWord, 0)]), collections.defaultdict(list), dict()
        visited[beginWord] = 0

        found = False

        while len(queue) > 0:
            first, depth = queue.popleft()

            if first != endWord:
                for pos in range(len(first)):
                    prefix, suffix = first[:pos], first[pos+1:]
                    for alphabet in list(string.ascii_lowercase):
                        if alphabet != first[pos]:
                            new_str = prefix + alphabet + suffix
                            if new_str in word_set:
                                if new_str not in visited:
                                    queue.append((new_str, depth+1))
                                    visited[new_str] = depth+1

                                # Keep only edges that go one level deeper,
                                # so every stored path is a shortest path.
                                if visited[new_str] == depth+1:
                                    adjacency_list[first].append(new_str)
            else:
                found = True

        if found is False:
            return []

        return self.reconstruct(adjacency_list, beginWord, endWord)
```

To search for all words at 1-distance from current word, there are two possible approaches:

• Scan each word in the vocabulary list, compute its distance from the current word and keep only those words that are at 1-distance. Assuming there are N words in the list and each word is of length O(D), it takes O(N*D) time to find the words at 1-distance from the current word.
• For each position in the current word, replace the letter at that position with every letter from 'a' to 'z' except the actual letter at that position, then check whether the new word is present in the 'wordList'. The time complexity of this approach is O(K*D), where K=26 is a constant.

Thus we see that the second approach is more efficient whenever the vocabulary is much larger than 26 words.
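The second approach can be sketched on its own as a small helper. This is a minimal illustration, and the function name 'neighbours' and the tiny word set are assumptions for the example, not part of the original solution:

```python
import string

def neighbours(word, word_set):
    # Approach 2: for each position, try every letter 'a'..'z' other than
    # the current one, and keep candidates present in the word set.
    # This costs O(26 * D) set lookups instead of scanning all N words.
    out = []
    for pos in range(len(word)):
        for ch in string.ascii_lowercase:
            if ch != word[pos]:
                candidate = word[:pos] + ch + word[pos+1:]
                if candidate in word_set:
                    out.append(candidate)
    return out

words = {"hot", "dot", "dog", "lot", "log", "cog"}
print(neighbours("hot", words))  # -> ['dot', 'lot']
```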

Observe that we keep a map for the variable 'visited', mapping each word to the 'depth' (i.e. its minimum distance from 'beginWord') at which it is first seen. We could also have used a 'set' data structure instead of a 'map', but in that case the adjacency list would contain cyclic references and we would need to handle them separately in the 'reconstruct' recursive method.

Observe that for the current word, we only store words from the wordList at 1-distance that are at a 'depth' of 1 + the 'depth' of the current word, and not the ones which have already been seen at lesser 'depths'. This is due to the condition that we only want to find minimum-length paths: a path that steps back to a lower depth cannot be a shortest path.
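The effect of this depth condition can be seen on a toy graph. The small hand-built edge map below is an assumed example (it includes a deliberate back-edge from 'hot' to 'hit'), showing that BFS keeps an edge u → v only when depth(v) == depth(u) + 1:

```python
import collections

# Toy word graph with an explicit back-edge from 'hot' to 'hit'.
edges = {
    "hit": ["hot"],
    "hot": ["dot", "lot", "hit"],
    "dot": ["dog", "hot"],
    "lot": ["log", "hot"],
    "dog": ["cog", "dot"],
    "log": ["cog", "lot"],
    "cog": [],
}

depth = {"hit": 0}
shortest_edges = collections.defaultdict(list)
queue = collections.deque(["hit"])

while queue:
    u = queue.popleft()
    for v in edges[u]:
        if v not in depth:
            depth[v] = depth[u] + 1
            queue.append(v)
        # Keep only edges that descend exactly one level; edges back to
        # shallower (already seen) words are dropped.
        if depth[v] == depth[u] + 1:
            shortest_edges[u].append(v)

print(shortest_edges["hot"])  # -> ['dot', 'lot']; the back-edge to 'hit' is gone
```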

If the 'endWord' is at a minimum distance of L from 'beginWord', then assuming there are at most 2 words at 1-distance from every word, we need to scan O(2^L) words, which is O(N) since L=O(logN), where N is the number of words. In the worst case, we need to scan each and every word in the list, thus the time complexity for constructing the adjacency list is O(K*N*D), where K=26 is a constant.

The complexity of the 'reconstruct' method is O(L*M), where L is the length of the shortest path and M is the number of possible shortest paths. To estimate the number of possible paths, assume that we have a binary tree, where each word has 2 words at 1-distance from it; then at each 'depth' in the tree we have 2 choices, thus the total number of paths is O(2^L). The total time complexity of 'reconstruct' is thus O(L*2^L), which is O(N*logN). This is true for any 'n-ary' tree where n>1. In the best case, where each word has only one word at 1-distance, the complexity is O(N).

Thus the total complexity of the above approach = O(K*N*D + N*logN)

Now observe that if we can go from 'beginWord' to 'endWord', then we can also go from 'endWord' to 'beginWord' in the reverse direction in the same number of minimum steps. Assuming we are traversing a tree, if we go from the root ('beginWord') towards 'endWord' and simultaneously go from 'endWord' towards root ('beginWord'), then they would cross paths at around half-way in the tree.

In that case we would only need to scan O(2^(L/2)) words from each direction, so the number of words scanned would be O(2^(L/2+1)). This in turn comes out to a complexity of O(N^(1/2)). Thus the overall complexity of constructing the adjacency lists in each direction is O(K*N^(1/2)*D). The complexity of the 'reconstruct' method remains the same, since the length of the shortest path and the number of shortest paths remain the same. This can also be shown directly: given that the number of 'half-paths' of length L/2 from each direction is O(2^(L/2)), we need to merge each pair of 'half-paths' to get all possible full-paths, so the number of 'full-paths' again comes out to be O(2^L).

Improved run-time complexity = O(K*N^(1/2)*D + N*logN)
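A quick back-of-the-envelope check makes the savings concrete. This is a rough illustration under the assumed branching factor of 2, comparing the ~2^L words scanned one-directionally against the ~2^(L/2+1) scanned bidirectionally:

```python
# Words scanned for branching factor 2 and shortest-path length L:
# one-directional BFS ~ 2^L, bidirectional BFS ~ 2^(L/2 + 1).
for L in (10, 20, 30):
    one_directional = 2 ** L
    bidirectional = 2 ** (L // 2 + 1)
    print(L, one_directional, bidirectional)
```

For L = 20 this is roughly a million words versus about two thousand, which is the O(N) versus O(N^(1/2)) gap claimed above.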

Here is the updated Python code to traverse in both directions simultaneously:

```
import collections, string

class Solution(object):
    def reconstruct(self, adjacency_list, beginWord, endWord):
        # Recursively stitch together all shortest paths recorded in
        # the adjacency list, from beginWord down to endWord.
        if beginWord == endWord:
            return [[beginWord]]
        if beginWord not in adjacency_list:
            return []
        else:
            out = []
            new_lst = adjacency_list[beginWord]
            for x in new_lst:
                w = self.reconstruct(adjacency_list, x, endWord)
                for y in w:
                    out.append([beginWord] + y)
            return out

    def get_adjacency_list(self, word, word_set, max_depth, reverse=False):
        # BFS from 'word' up to 'max_depth' levels, recording shortest-path
        # edges. With reverse=True the edges are stored in the reverse
        # direction, which is used for the BFS starting at 'endWord'.
        queue, adjacency_list, visited = collections.deque([(word, 0)]), collections.defaultdict(list), dict()
        visited[word] = 0

        while len(queue) > 0:
            first, depth = queue.popleft()

            if depth < max_depth:
                for pos in range(len(first)):
                    prefix, suffix = first[:pos], first[pos+1:]
                    for alphabet in list(string.ascii_lowercase):
                        if alphabet != first[pos]:
                            new_str = prefix + alphabet + suffix
                            if new_str in word_set:
                                if new_str not in visited:
                                    queue.append((new_str, depth+1))
                                    visited[new_str] = depth+1

                                if visited[new_str] == depth+1:
                                    if reverse:
                                        adjacency_list[new_str].append(first)
                                    else:
                                        adjacency_list[first].append(new_str)

        return adjacency_list
    def min_distance(self, beginWord, endWord, word_set):
        # Bidirectional BFS: expand one word from each end per iteration
        # and stop as soon as the two frontiers meet.
        queue1, queue2 = collections.deque([(beginWord, 0)]), collections.deque([(endWord, 0)])
        visited1, visited2 = dict(), dict()

        visited1[beginWord] = 0
        visited2[endWord] = 0

        while len(queue1) > 0 and len(queue2) > 0:
            first1, depth1 = queue1.popleft()
            first2, depth2 = queue2.popleft()

            if first1 == first2:
                return depth1+depth2

            for pos in range(len(first1)):
                prefix, suffix = first1[:pos], first1[pos+1:]
                for alphabet in list(string.ascii_lowercase):
                    if alphabet != first1[pos]:
                        new_str = prefix + alphabet + suffix
                        if new_str in word_set and new_str not in visited1:
                            if new_str in visited2:
                                return depth1+visited2[new_str]+1

                            queue1.append((new_str, depth1+1))
                            visited1[new_str] = depth1+1

            for pos in range(len(first2)):
                prefix, suffix = first2[:pos], first2[pos+1:]
                for alphabet in list(string.ascii_lowercase):
                    if alphabet != first2[pos]:
                        new_str = prefix + alphabet + suffix
                        if new_str in word_set and new_str not in visited2:
                            if new_str in visited1:
                                return depth2+visited1[new_str]+1

                            queue2.append((new_str, depth2+1))
                            visited2[new_str] = depth2+1

        return -1

    def findLadders(self, beginWord, endWord, wordList):
        word_set = set(wordList)

        if endWord not in word_set:
            return []

        dist = self.min_distance(beginWord, endWord, word_set)
        if dist == -1:
            return []

        # Split the minimum distance into the depths of the forward and
        # backward BFS (integer division keeps this Python 3 compatible).
        if dist % 2 == 0:
            a, b = dist//2, dist//2
        else:
            a, b = dist//2+1, dist//2