# Heard in the Cafeteria - ML Interview Questions - Part 3

Probability & Statistics Puzzles, Bayes' Theorem

1. A group of 60 students is randomly split into 3 classes of equal size. All partitions are equally likely. Jack and Jill are two students belonging to that group. What is the probability that Jack and Jill will end up in the same class ?
2. Jack is having two coins in his hand. Out of the two coins, one is a real coin and the second one is a faulty one with Tails on both sides. He blindfolds himself to choose a random coin and tosses it in the air. The coin falls down with Tails facing upwards. What is the probability that this tail is shown by the faulty coin ?
3. Rob has fever and the doctor suspects it to be typhoid. To be sure, the doctor wants to conduct the test. The test results positive when the patient actually has typhoid 80% of the time. The test gives positive when the patient does not have typhoid 10% of the time. If 1% of the population has typhoid, what is the probability that Rob has typhoid provided he tested positive?
4. About 30% of human twins are identical, and the rest are fraternal. Identical twins are necessarily the same sex, half are males and the other half are females. One-quarter of fraternal twins are both males, one-quarter both female, and one-half are mixed: one male, one female. You have just become a parent of twins and are told they are both girls. Given this information, what is the probability that they are identical?
5. The students of a particular class were given two tests for evaluation. Twenty-five percent of the class cleared both the tests and forty-five percent of the students were able to clear the first test. Calculate the percentage of students who passed the second test given that they were also able to pass the first test.
6. Hospital records show that 75% of patients suffering from a disease die due to that disease. What is the probability that 4 out of the 6 randomly selected patients recover?
7. The prior probability of anyone having HIV is 0.00148. The true positive rate for Elisa is 93% and the true negative rate is 99%.
• What is the probability that a recruit has HIV, given he tested positive on first Elisa test?
• What is the probability of having HIV, given he tested positive on Elisa the second time as well ?
8. Ahmed is playing a lottery game where he must pick 2 numbers from 0 to 9 followed by an English alphabet (from 26-letters). He may choose the same number both times.If his ticket matches the 2 numbers and 1 letter drawn in order, he wins the grand prize and receives $10405. If just his letter matches but one or both of the numbers do not match, he wins$100. Under any other circumstance, he wins nothing. The game costs him $5 to play. Suppose he has chosen 04R to play.What is the expected net profit from playing this ticket? 9. In a class of 30 students, approximately what is the probability that two of the students have their birthday on the same day (defined by same day and month) (assuming it’s not a leap year)? 10. A coin of diameter 1-inches is thrown on a table covered with a grid of lines each two inches apart. What is the probability that the coin lands inside a square without touching any of the lines of the grid? You can assume that the person throwing has no skill in throwing the coin and is throwing it randomly.You can assume that the person throwing has no skill in throwing the coin and is throwing it randomly. 11. Suppose you were interviewed for a technical role. 50% of the people who sat for the first interview received the call for second interview. 95% of the people who got a call for second interview felt good about their first interview. 75% of people who did not receive a second call, also felt good about their first interview. If you felt good after your first interview, what is the probability that you will receive a second interview call? 12. Some test scores follow a normal distribution with a mean of 18 and a standard deviation of 6. What proportion of test takers have scored between 18 and 24? 13. A roulette wheel has 38 slots, 18 are red, 18 are black, and 2 are green. You play five games and always bet on red. • What is the probability that you win all the 5 games? • How many games can you expect to win? 14. Suppose a life insurance company sells a$240,000 one year term life insurance policy to a 25-year old female for $210. The probability that the female survives the year is .999592. Find the expected value of this policy for the insurance company. 15. Three friends in Seattle told you it’s rainy. Each has a probability of 1/3 of lying. What’s the probability of Seattle is rainy? 16. A miner is trapped in a mine containing 3 doors. The first door leads to a tunnel that will take him to safety after 3 hours of travel. The second door leads to a tunnel that will return him to the mine after 5 hours of travel. The third door leads to a tunnel that will return him to the mine after 7 hours. If we assume that the miner is at all times equally likely to choose any one of doors, what is the expected length of time until he reaches safety? 17. Suppose that we are to be presented with n distinct prizes, in sequence. After being presented with a prize, we must immediately decide whether to accepted or to reject it and consider the next prize. The only information we are given when deciding whether to accept a prize is the relative rank of that prize compared to ones already seen. That is, for instance, when the fifth prize is presented, we learn how it compares with the four prizes we have already seen. Suppose that once a prize is rejected, it is lost, and that our objective is to maximize the probability of obtaining the best prize. Assuming that all n! orderings of the prizes are equally likely, how well can we do? 18. Imagine there are a 100 people in line to board a plane that seats 100. The first person in line realizes he lost his boarding pass so when he boards he decides to take a random seat instead. Every person that boards the plane after him will either take their "proper" seat, or if that seat is taken, a random seat instead. What is the probability that the last person that boards will end up in his/her proper seat ? 19. A bowl of spaghetti contains n strands. Thor picks two ends at random and joins them together. He does this until no ends remain. What is the • Expected number of spaghetti loops in the bowl? • Expected average length of the loops? (in strands) • Expected number of k-hoops? ( a k-hoop is a loop made from k strands) 20. Imagine you are given two envelopes, one of which contains twice as much money as the other. You're allowed to pick one envelope and keep the money inside. But just before you open your chosen envelope you are given the chance to change your mind. Should you stick with the envelope you picked first or switch? 21. In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country? 22. Flip a fair coin repeatedly until you get two heads in a row (HH). On average, how many flips should this take? What if we flip until we get heads followed by tails (HT)? Are the answers the same? 23. Flip a fair coin repeatedly until you get two heads in a row (HH). What is the probability of getting HH in at-most N (N > 1) tosses ? 24. Flip a coin repeatedly until either HH emerges (I win) or HT emerges (you win). Is the game fair? 25. There are N applicants for a secretarial position. The applicants are interviewed in random order, and you must accept or reject a candidate immediately after interviewing them. After you reject someone, there's no way to bring them back. There is only one position available, so as soon as you accept a candidate you are done. What strategy should you pursue in order to maximize the likelihood of hiring the best candidate out of the N applicants? 26. Let's play a game. I pick a probability distribution on the real line. You know nothing about it, except that it's everywhere nonzero. Using my probability distribution, I generate two distinct numbers. Then I flip a fair coin to determine which number to show you -- call it A, and call the other number B. Then I ask you whether A < B or B < A ? Certainly, you can just guess. And you'd have a 50 percent chance of winning.But is there a strategy to do better? 27. Suppose you are at a dinner party. The host wants to give out a door prize that is wrapped in a box. Everyone (including the host) sits around a circular table and each person is given a fair coin. Initially the host is holding the box. He flips his coin. If it is heads, he passes to the right. If it is tails, he passes to the left. The process is repeated by whichever guest is holding the box. (Heads, they pass right; tails, they pass left.) The game ends when all person has held the prize at-least once. That last person gets to keep the box as the winner of the game. Where should you sit to maximize your chance of winning? 28. Let's say A keep tossing a fair coin, until he get 2 consecutive heads, define X to be the number of tosses for this process; B keep tossing another fair coin, until he get 3 consecutive heads, define Y to be the number of the tosses for this process. Calculate P{X>Y} 29. Suppose n fair 6-sided dice are rolled simultaneously. What is the expected value of the score on the highest valued die? 30. The probability of a car passing a certain intersection in a 20 minute windows is 0.9. What is the probability of a car passing the intersection in a 5 minute window? 31. A stick is broken into 3 pieces, by randomly choosing two points along its unit length. What is the probability that the 3 parts form a triangle ? 32. There are n letters and n envelopes. Your servant puts the letters randomly in the envelopes so that each letter is in one envelope and all envelopes have exactly one letter. (Effectively a random permutation of n numbers chosen uniformly). Calculate the expected number of envelopes with correct letter inside them. 33. A soda company is holding a contest where everyone who collects one each of N different coupons wins some prize. You get a coupon with each purchase of a soda, and each coupon is equally likely. What’s the expected number of soda bottles you have to buy in order to collect all the coupons? 34. Given the set of numbers from 1 to n: { 1, 2, 3 .. n }. We draw n numbers randomly (with uniform distribution) from this set (with replacement). What is the expected number of distinct values that we would draw? 35. There are 64 teams who play single elimination tournament, hence 6 rounds, and you have to predict all the winners in all 63 games. Your score is then computed as follows: 32 points for correctly predicting the final winner, 16 points for each correct finalist, and so on, down to 1 point for every correctly predicted winner for the first round. (The maximum number of points you can get is thus 192.) Knowing nothing about any team, you flip fair coins to decide every one of your 63 bets. Compute the expected number of points. 36. You are in possession of n pairs of socks (hence a total of 2n socks) ranging in shades of grey, labeled from 1 (white) to n (black). Take the socks blindly from a drawer and pair them at random. What is the probability that they are paired so that the colors of any pair differ by at most 1? 37. You have 50 red marbles, 50 blue marbles and 2 jars. One of the jars is chosen at random and then one marble will be chosen from that jar at random. How would you maximize the chance of drawing a red marble? What is the probability of doing so? 38. If a university’s football team has a 10% chance of winning game 1 and a 30% chance of winning game 2, and a 65% chance of losing both games, what are their chances of winning exactly once? 39. Imagine a island that is inhabited by 10 people, and the politics is such that each day an islander is chosen at random to be chief for exactly one day; after the day has elapsed another islander is chosen at random (so the same islander who was just chief has a 1/10 chance of being chief again). The question to be solved: on average, how many days would have to elapse before each islander would have been chief at least once? 40. There are 75 multiple choice questions in an exam. Each question contains 4 possible answers only 1 is correct. The exam pass mark is 50%. What are the chances of passing the exam by guessing each answer? Answers and Hints 1. There are 2 ways to solve this: • Method 1 - Total number of ways of creating 3 groups of 20 each $\frac{60!}{3!20!20!20!}$. Removing Jack and Jill we are left with 58 students. Select a group with 18 students in 58C18 ways and arrange the remaining 40 students in 2 groups of 20 each in $\frac{40!}{2!20!20!}$ ways. Thus total ways of arranging the groups = 58C18*$\frac{40!}{2!20!20!}$. Thus probability = 19/59. Add back Jack and Jill into the group with 18 students. • Method 2 - Assign random numbers from 1 to 60 to the 60 students. Let 1st group contain students with numbers 1 to 20, 2nd group with numbers 21 to 40 and the remaining group with 41 to 60. After Jack is assigned a random number, there are 59 numbers left and Jill can be assigned 19 numbers out of 59 for her to be in the same group as Jack. Thus probability is 19/59. For e.g. if Jack is assigned 17 then Jill can only be assigned from (1-16, 18-20), similarly if Jack is assigned 25 Jill can be assigned only (21-24, 26-40) and so on. 2. Using Bayes' Theorem P(Faulty|Tails) = $\frac{\text{P}(\text{Tails}|\text{Faulty})\text{P}(\text{Faulty})}{\text{P}(\text{Tails})}$ • P(Tails|Faulty) = 1 • P(Faulty) = 0.5 • P(Tails) = P(Tails|Faulty)P(Faulty) + P(Tails|Non-Faulty)P(Non-Faulty) = 1*0.5+0.5*0.5=0.75 • P(Faulty|Tails) = 0.5/0.75 = 0.667 3. Using Bayes' Theorem P(Typhoid|Positive) = $\frac{\text{P}(\text{Positive}|\text{Typhoid})\text{P}(\text{Typhoid})}{\text{P}(\text{Positive})}$ • P(Positive|Typhoid) = 0.8 • P(Typhoid) = 0.01 • P(Positive) = P(Positive|Typhoid)P(Typhoid) + P(Positive|Non-Typhoid)P(Non-Typhoid) = 0.8*0.01+0.1*0.99 • P(Typhoid|Positive) = 0.0747 4. Let GG denote that both children are girls. • P(Identical|GG) = $\frac{\text{P}(\text{GG}|\text{Identical})\text{P}(\text{Identical})}{\text{P}(\text{GG})}$ • P(GG|Identical) = 0.5 • P(Identical) = 0.3 • P(GG) = P(GG|Identical)P(Identical) + P(GG|Fraternal)P(Fraternal) = 0.5*0.3+0.25*0.7 • P(Identical|GG) = 0.4615 5. P(Pass-2|Pass-1) = $\frac{\text{P}(\text{Pass-1},\text{Pass-2})}{\text{P}(\text{Pass-1})}$ • P(Pass-1, Pass-2) = 0.25 • P(Pass-1) = 0.45 • P(Pass-2|Pass-1) = 0.555 6. 6C4*0.254*0.752= 0.033 7. P(HIV) = 0.00148. P(Elisa+ve|HIV) = 0.93 and P(Elisa-ve|No HIV) = 0.99 • P(Elisa+ve) = P(Elisa+ve|HIV)P(HIV) + P(Elisa+ve|No HIV)P(No HIV) • P(No HIV) = 1-0.00148 • P(Elisa+ve|No HIV) = 1-P(Elisa-ve|No HIV) = 1-0.99 • P(Elisa+ve) = 0.93*0.00148 + (1-0.99)*(1-0.00148) • P(HIV|Elisa+ve) = 0.121 • P(Elisa+ve, Elisa+ve|HIV) = 0.93*0.93 • P(Elisa+ve, Elisa+ve) = P(Elisa+ve, Elisa+ve|HIV)P(HIV) + P(Elisa+ve, Elisa+ve|No HIV)P(No HIV) • P(Elisa+ve, Elisa+ve|No HIV) = (1-0.99)*(1-0.99) = 0.01*0.01 • P(HIV|Elisa+ve, Elisa+ve) = 0.93*0.93*0.00148/(0.93*0.93*0.00148+0.01*0.01*(1-0.00148)) = 0.927 8. E(profit) = 10405*P(all matches) + 100*P(only letter matches) - 5 • P(all matches) = 1/2600 • P(only letter matches) = (9*9+2C1*9)/2600 = 99/2600 • E(profit) = 10405/2600 + 9900/2600 - 5 =$2.81
9. Total number of ways = 36530. Select 2 students in 30C2 ways and select a birthday for them in 365 ways. The remaining 28 students will have different birthdays in 364*363*...(365-28) ways. Thus probability 2 students have same birthday = 30C2*365*364*363*...(365-28)/36530 = 0.38.
10. The center of the coin should lie inside a 1-inch square inside the 2-inch square in order to not touch the lines of the 2-inch square. Thus probability = area of 1-inch square/area of 2-inch square = 1/4.
11. Again using Bayes' Theorem P(2nd|1st) = 0.5, P(Good|2nd) = 0.95, P(Good|No 2nd) = 0.75
• P(Good) = 0.95*0.5 + 0.75*0.5
• P(2nd|Good) = 0.95/P(Good) = 0.56
12. import scipy.stats
scipy.stats.norm(18, 6).cdf(24)-scipy.stats.norm(18, 6).cdf(18)

13. P(R) = 18/38, P(B) = 18/38, P(G) = 2/38.
• P(RRRRR) = (18/28)5 = 0.0238
• E = 1*5C1*P*(1-P)4 + 2*5C2*P2*(1-P)3 + 3*5C3*P3*(1-P)2 + 4*5C4*P4*(1-P) + 5*5C5*P5 = 2.37
14. 210*0.999592-240000*(1-0.999592) = \$112
15. Let R denote random variable that it is actually Rainy. A denote the event that 1st friend has told that that it is raining, B denote the event that 2nd friend has told that that it is raining and C denote event that 3rd friend has told that that it is raining.
• P(R|A, B, C) = P(A, B, C|R)P(R)/P(A, B, C)
• P(A|R) = P(B|R) = P(C|R) = 2/3 because each friend has 2/3 probability of telling truth i.e. saying it is raining given that it is actually raining.
• P(R) = 0.5 (can assume anything reasonable here)
• P(A, B, C|R) = 8/27
• P(A, B, C|NR) = 1/27
• P(A, B, C) = P(A, B, C|R)P(R) + P(A, B, C|NR)P(NR) = 0.5*(8/27+1/27) = 1/6
• P(R|A, B, C) = 24/27 = 8/9
16. Let the expected time of exit be X hours, then we can derive the following recursion relation:
• X = (1/3)(3) + (1/3)(X+5) + (1/3)(X+7)
• Solving for X = 3+5+7 = 15 hours
17. Strategy is to reject the first k prizes and then take the one that is greater than the highest prize in those k rejected prizes.
• Suppose I take the i-th prize, then the probability that the i-th prize is the best prize is P(best|X=i)P(X=i)
• Probability of choosing the best prize :
• P(best) = P(best|X=1)P(X=1) + P(best|X=2)P(X=2) + ... + P(best|X=n)P(X=n)
• P(X=i) = 1/n
• P(best|X=i) probability that if I choose the i-th prize then it is the best prize
• It means that all prizes from k+1-th prize to i-1-th prize are smaller or equal to the highest prize in the first k prizes, then only we will select the i-th prize which is higher than the highest prize in the first k prizes.
• Or in other words the best prize of the first i-1 prizes is in the first k prizes.
• Thus P(best|X=i) = k/(i-1)
• If i <= k, the P(best|X=i) = 0 because we reject all prizes in the first k prizes thus we do not get the best prize.
• P(best) = 1/n*(1 + k/(k+1) + k/(k+2) + ... + k/(n-1))
• Choose k for which the above summation is highest.
18. The probability of the N-th passenger getting his/her own seat is 0.5
• The 1st passenger can choose either the 1st seat or any other seat k > 1.
• If the 1st passenger chooses the 1st seat, then all remaining passengers take their own seats else if 1st passenger chooses seat k > 1, then all passengers from 2 to k-1 gets their own seat and k-th passenger can either choose seat 1 or any seat k' > k.
• If the k-th passenger chooses seat 1 then all passengers from k+1 to N gets their own seats. Else if he chooses seat k' > k then again all passengers from k+1 to k'-1 gets their own seats and the k'-th passenger gets to choose either seat 1 or any seat k'' > k'.
• Thus any passenger 1 < P < N, will have 3 choices, either seat 1 or his own seat or any seat K > P.
• For the last passenger N, he cannot choose any seat > N because there are no seats > N. Thus he has only option either seat 1 or his own seat. Thus probability is 0.5.
19. There are 2n ends. Once I select an end, I can select another end such that it forms a loop with a probability of 1/(2n-1) because only the other end of the strand will form a loop. Thus there are n-1 strands (2n-2 ends) left to do the same. But if I select 2 ends which does not form a loop with probability (2n-2)/(2n-1) then also the number of open strands left is n-1 because the open ends are still 2n-2.
• The expected number of loops with n strands : E(n) = 1/(2n-1)*(1 + E(n-1)) + (2n-2)/(2n-1)*E(n-1) = 1/(2n-1) + E(n-1)
• E(n) = 1 + 1/3 + 1/5 + .... + 1/(2n-1)
• Expected average length of loops = Total number of ends/Expected number of loops = 2n/E(n).
20. TBD
21. Let the probability of having a boy/girl be 0.5. Since each family will have only one boy after infinite time, thus expected number of boys is 1.
• We will have the following scenarios : B, GB, GGB, GGGB, .....
• Thus expected number of girls S = 1*(1/2)2 + 2*(1/2)3 + 3*(1/2)4 + .....
• Then (1/2)*S = (1/2)2 + (1/2)3 + (1/2)4 + .... = 1/2
• Thus expected number of girls S = 1
• Thus expected proportion is 1.
• Logically it would seem that the expected number of girls should be higher than the expected number of boys.
• The above assumes that the number of families is infinite.
• Thus the the probability of the 1st scenario {B} is same as the combined probability of the remaining scenarios {GB, GGB, GGGB,...}. Hence the number of scenarios of 1st boy nullifies the effects of all the other scenarios of higher number of girls. Think like this that for every 'GGGB' there are two 'B' scenarios.
• Thus the effective proportion of boys to girls remains as 1.
22. To obtain HH at the end, the number of different scenarios can be expressed as a recursion relation.
• Let x be the expected number of flips.
• If the first 2 flips are HH, then we are done. (2)
• If the first flip is tails T, then we have still x more number of flips to go again. (x+1)
• If the first two flips are HT, then also we have still x more number of flips to go again. (x+2)
• Thus x = (1/4)*2 + (1/2)*(x+1) + (1/4)*(x+2)
• Solving for x = 6.
• For HT:
• If the 1st two flips are HT, then are done. (2)
• If the first flip is T, then we have still x more number of flips to go again. (x+1)
• The remaining cases are HHT, HHHT, HHHHT, ..... and so on. Observe that we cannot form recursion for this case as earlier.
• Thus x = (1/4)*2 + (1/2)*(x+1) + 3*(1/2)3 + 4*(1/2)4 + ....
• Solving for x = 4
• This is intuitive because in the first case HH, if the first toss was T or first 2 tosses were HT, they were going 'waste' and we needed to start over. But in this case only if the 1st toss is T, it is going waste, if it is HH, then still we have obtained half of the desired sequence.
23. Let F(n, H) be the number of sequences of length n ending with HH and beginning with H and F(n, T) be the number of sequences of length n ending with HH and beginning with T.
• Thus F(n+1, T) = F(n, H) + F(n, T) because if the sequence begins with T then the remaining sequence can begin with either H or T.
• F(n+1, H) = F(n, T) because if the sequence begins with H then the remaining sequence can begin only with T (HH is forbidden).
• Let G(n+1) = F(n+1, H) + F(n+1, T) = F(n, T) + F(n, H) + F(n, T) = F(n-1, H) + F(n-1, T) + F(n, H) + F(n, T)
• G(n+1) = G(n) + G(n-1)
• G(1) = 0, G(2) = 1
• Probability P(n) = (1/2)2*G(2) + (1/2)3*G(3) + (1/2)4*G(4) + .... + (1/2)n*G(n)
24. If the 1st toss is H then both (HH, HT) have equal chances in the second toss. But if 1st toss is T, then we start over implying that again both have equal chances of winning. Thus this game is fair.
• Let x be the probability of HH winning.
• x = (1/2)*(1/2) + (1/2)*x
• x = 1/2
• Similarly for HT winning.
25. Strategy is reject the first K secretaries and then hire the next best one which is better than the best secretary among those K.
• P(N, K) = 1/N*(1 + K/(K+1) + K/(K+2) + ... + K/(N-1))
• Choose K which maximizes the above equation.
26. Strategy is to pick a random number C and if A > C then predict that A > B, else if A < C then predict A < B.
• Let the 2 numbers picked be X and Y.
• Following are the possible cases with X, Y and C:
• {X<Y<C, X<C<Y, C<X<Y, Y<X<C, Y<C<X, C<Y<X}
• P(wins) = P(wins, X<Y<C) + P(wins, X<C<Y) + ... + P(wins, C<Y<X)
• P(wins) = P(wins|X<Y<C)*P(X<Y<C) + P(wins|X<C<Y)*P(X<C<Y) + ...
• P(wins|X<Y<C) = 0.5 because if the number picked after toss is X then X<Y<C is a success because X < C and X < Y strategy works. But if the number picked after toss is Y then it is failure because Y < C but Y > X strategy fails.
• Similarly P(wins|C<X<Y), P(wins|Y<X<C), P(wins|C<Y<X) are all 0.5 by same reasoning as above.
• But P(wins|X<C<Y) = 1 because if the number picked after toss is X, then it is success since X < C and X < Y strategy works. If the number picked after toss is Y, then it is also a success since Y > C and Y > X strategy works.
• Similarly P(wins|Y<C<X) = 1.
• P(wins) = 0.5*[P(X<Y<C) + P(C<X<Y) + P(Y<X<C) + P(C<Y<X)] + 1.0*[P(X<C<Y) + P(Y<C<X)]
• Rewriting the above:
• P(wins) = 0.5*[P(X<Y<C) + P(C<X<Y) + P(Y<X<C) + P(C<Y<X) + P(X<C<Y) + P(Y<C<X)] + 0.5*[P(X<C<Y) + P(Y<C<X)]
• P(X<Y<C) + P(C<X<Y) + P(Y<X<C) = P(X<Y)
• P(C<Y<X) + P(X<C<Y) + P(Y<C<X) = P(X>Y)
• P(wins) = 0.5*[P(X<Y) + P(X>Y)] + 0.5*[P(X<C<Y) + P(Y<C<X)]
• P(X<Y) + P(X>Y) = 1
• P(wins) = 0.5 + 0.5*[P(X<C<Y) + P(Y<C<X)] > 0.5
• Thus probability of winning is greater than 1/2.
27. All positions have the equal probability of winning.
• Assuming that there are N person and consider the case of the K-th person in clockwise direction from host or N-K th in the anti-clockwise direction
• If clockwise direction is denoted by heads H and anti-clockwise by tails T, the the K-th person wins if a sequence of H's and T's ends with either K H's or N-K T's. With infinite number of passes, there would always be a sequence with K H's or N-K T's. Thus probability the K-th person wins is 1/N.
28. P(X>Y) = [P(X=4, Y=3)] + [P(X=5, Y=3) + P(X=5, Y=4)] + [P(X=6, Y=3) + P(X=6, Y=4) + P(X=6, Y=5)] + ....
• Since minimum number of tosses for 3 H's is 3, thus for X > Y, X must start from 4.
• We can write P(X=i, Y=j) = P(X=i)*P(Y=j) since they are independent
• Let Q(Y=k) = P(Y=3) + P(Y=4) + ... + P(Y=k) is the cumulative probability
• Then P(X>Y) = P(X=4)*Q(Y=3) + P(X=5)*Q(Y=4) + P(X=6)*Q(Y=5) + .....
• P(X=n) = G(n)*(1/2n) where G(n) = G(n-1)+G(n-2)
• P(Y=n) = H(n)*(1/2n) where H(n) = H(n-1)+H(n-2)+H(n-3)
• Solving for the above we get P(X>Y) = 0.2125
• Python code is as follows:
• n = 1000

hh = [0 for i in range(n)]
hhh = [0 for i in range(n)]
c_hhh = [0 for i in range(n)]

for i in range(2, n):
if i == 2:
hh[i] = 1
else:
hh[i] = hh[i-1] + hh[i-2]

for i in range(3, n):
if i == 3:
hhh[i] = 1
else:
hhh[i] = hhh[i-1] + hhh[i-2] + hhh[i-3]

c_hhh[i] = c_hhh[i-1] + hhh[i]/float(2**i)

s = 0
for i in range(4, n):
a = hh[i]/float(2**i)
b = c_hhh[i-1]
s += a*b

29. Assuming that the highest number on the n dice be K, then the number of possible ways to get K has the highest number = Number of ways n dice can come up 1 to K with at-least one-die having K score = Number of ways n dice can come up 1 to K - No dice coming up K (i.e. All dice coming up from 1 to K-1)
• P(K) = (Kn-K-1n)/6n
• E = P(1) + 2*P(2) + 3*P(3) + 4*P(4) + 5*P(5) + 6*P(6)
30. Probability of not seeing the car in time X to X+20 = Probability of not seeing the car in time X to X+5 and Probability of not seeing the car in time X+5 to X+10 and Probability of not seeing the car in time X+10 to X+15 and Probability of not seeing the car in time X+15 to X+20
• Assuming that each interval [X, X+5], [X+5, X+10], [X+10, X+15] and [X+15, X+20] are equally probable (uniform distribution) to see a car.
•  Let y be the probability of not seeing the car in the 5 minute window.
• Then probability of not seeing the car in time X to X+20 = y4
• Thus probability of seeing the car in time X to X+20 = 0.9 = 1-y4
• Thus y = 0.10.25
• Probability of seeing the car in the 5 minute window = 1-y = 0.437
31. The 3 parts can form a triangle only if the longest part is less than half the length of the stick. This is due to the property that for the 3 parts X, Y and Z it must satisfy X+Y > Z, X+Z > Y and Y+Z > X.
• If the length of a part is greater than half the length of rod then the remaining 2 parts will be less than half the length.
• Thus there can be 4 possible cases:
• 1st part is greater than half length
• 2nd part is greater than half length
• 3rd part is greater than half length
• All parts are less than half length
• All 4 cases are equally probable and only in the last case a triangle can be formed. Thus probability is 1/4 = 0.25
32. If we write down all permutations of 1 to N, then each number has equal probability of occuring in all positions. Thus 1 occurs in 1st position 1/N times, 2 occurs in 2nd position in 1/N times and so on. Thus each number has 1/N probablity of occuring in its corresponding position. Probability that a number do not occur in its respective position is thus (N-1)/N
• E = E(1) + E(2) + ... + E(N)
• E(i) is the expectation that i-th letter occurs in its respective position
• E(i) = 1*(1/N) + 0*(N-1/N) = 1/N
• E = 1
33. Let the N coupons when ordered by the time when it was obtained for the 1st time be C1, C2, ..., CN. i.e. Ci was obtained before any Ci+1.
• Let Xi denote the random variable of the number of trials after Ci is obtained and before Ci+1 is obtained.
• E(Xi) is the expected number of trials for the i-th interval.
• E = E(X0) + E(X1) + ... + E(XN-1)
• Probability of observing a success in the i-th interval = (N-i)/N because we have already obtained i coupons in the first i-1 intervals.
• P(Xi) = (N-i)/N
• E(Xi) = 1/P(Xi) = N/(N-i)
• E = N*[1 + 1/2 + 1/3 + ... + 1/N] approximates to N*logN when N is large.
• There is another way to solve the problem
• Let G(K) be the probability of obtaining all the N coupons in exactly K trials.
• Minimum number of trials for N coupons is N.
• Then E = N*G(N) + (N+1)*G(N+1) + (N+2)*G(N+2) + ....
• Question is how to compute G(K) ?
• Let F(N, K) be the number of distinct sequences of length K (>= N) made up with integers from 1 to N with each integer occurring at-least once.
• F(N, K) can be computed recursively as follows:
• F(N, K) = N*[F(N, K-1) + F(N-1, K-1)] because for each integer D occurring at the K-th position, we can obtain F(N, K) from F(N, K-1) of length K-1 by adding D at the end and also since D already occurs we can also consider all sequences which do not have the integer D of length K-1 i.e. F(N-1, K-1). This is true for all integers D from 1 to N thus we multiply by N.
• To obtain G(K) we observe that the coupon obtained in the K-th trial occurs for the 1st time at the K-th trial i.e. except for one of the coupon we obtain the remaining N-1 coupons in any order in the first K-1 trials.
• G(K) = N*F(N-1, K-1)/NK
• Following is a Python implementation:
• from scipy.special import factorial

def get_counts(N, K):
cache = [[0]*(K+1) for i in range(N+1)]

for i in range(1, N+1):
for j in range(i, K+1):
if i == 1:
cache[i][j] = 1
elif i == j:
cache[i][j] = int(factorial(i, exact=True))
else:
cache[i][j] = i*(cache[i][j-1] + cache[i-1][j-1])

return cache

N, M = 4, 50
cache = get_counts(N, M)

s = 0
for i in range(N, M+1):
s += i*N*cache[N-1][i-1]/float(N**i)

print s

s1 = 0
for i in range(1, N+1):
s1 += 1.0/i

s1 *= N

print s1

We compare the results of both the implementations with N=4 and verify that they are infact same.

• F(N, K) can also be computed using Principal of Inclusion and Exclusion (PIE)
• Number of K-length sequences using {1,2,...N} with each integer occurring at-least once = Number of all possible K-length sequences - Number of K-length sequences with N-1 integers + Number of K-length sequences with N-2 integers - ...
• F(N, K) = NK - NCN-1*(N-1)K + NCN-2*(N-2)K - NCN-3*(N-3)K + ....
• from scipy.special import comb

N, K = 4, 10
s, j = 0, 0
for i in range(N, 0, -1):
s += (-1)**j*int(comb(N, i, exact=True))*i**K
j += 1
34. Let us denote the integers by 1, 2, ... N. Let Xi = 1 if the the integer i is selected else 0.
• Expected number of distinct elements = E(X1) + E(X2) + ... + E(XN)
• P(Xi=1) = 1-Probability of not selecting i = 1-((N-1)/N)^N
• E(Xi) = 1*P(Xi=1) + 0*P(Xi=0) = P(Xi=1) = 1-((N-1)/N)^N
• E = N*(1-((N-1)/N)^N)
• This can also be solved using PIE technique.
• Let F(N, K) denote the number of samplings of length N with K distinct elements.
• Number of samplings with exactly K distinct elements = Number of samplings with K elements any number of times - Number of samplings with only K-1 elements any number of times + Number of samplings with only K-2 elements any number of times ....
• F(N, K) = KN - KCK-1*(K-1)N + KCK-2*(K-2)N - KCK-3*(K-3)N + ....
• G(K) = NCK*F(N, K)/NN
• E = G(1) + 2*G(2) + 3*G(3) + ....
• from scipy.special import comb

def distinct(K, N):
s, j = 0, 0
for i in range(K, 0, -1):
s += (-1)**j*int(comb(K, K-i, exact=True))*i**N
j += 1
return s

N = 10
expectation = 0
for i in range(1, N+1):
expectation += i*distinct(i, N)*int(comb(N, i, exact=True))/float(N**N)

print expectation

print N*(1.0-(float(N-1)/N)**N)

35. Let E(Xi) be the expected number of correctly predicted games in round i.
• Expected points = E(X1) + 2*E(X2) + 4*E(X3) + 8*E(X4) + 16*E(X5) + 32*E(X6)
• In each round k, let Yi=1 if I bet on the winning team in game i else Yi=0.
• There are 32 games in round 1, 16 games in round 2, 8 in round 3 and so on.
• Let Pk(Yi=1) denote the probability of correctly betting on the winning team in round k in game number i.
• Pk(Yi=1) = Pk(Yi=1 | Team is playing in round k)*P(Team is playing in round k)
• P(Team is playing in round k) = P(Team has won in all k-1 previous rounds) = 1/2k-1
• Pk(Yi=1 | Team is playing in round k) = 1/2
• Pk(Yi=1) = 1/2k
• Ek(Yi) = 1*Pk(Yi=1) + 0*Pk(Yi=0) = 1/2k
• E(X1) = E1(Y1) + E1(Y2) + ... + E1(Y32) = 32*(1/2)
• Similarly E(X2) = 16*(1/2), ..., E(X6) = (1/2)
• Thus E = 63/2
36. Let us denote the first socks in each pair by {X1, X2, ..., Xn} and the 2nd socks in the corresponding pairs as {Y1, Y2, ..., Yn}.
• Let F(n) be the number of ways to pair the socks such that Xi can be paired only with Yi or Yi-1 or Yi+1.
• If we pair Xi with Yi, then we have n-1 pairs left i.e. F(n-1)
• If we pair Xi with Yi-1 then Yi can only pair with Xi-1, similarly if we pair Xi with Yi+1, then we can pair Yi with only Xi+1, thus we have n-2 pairs remaining for these 2 cases.
• F(n) = F(n-1) + 2*F(n-2)
• Number of all possible ways to select the pairs G = (2n)!/(2n*n!)
• Because there are 2n socks and for each permutation of 2n socks we create n pairs by taking the consecutive socks.
• For each pair (A, B) it is same as (B, A) thus we divide by 2 and there are n pairs thus we divide by 2n. Also if there are 2 pairs (A, B) and (C,D) then it doesn't matter it which order I select these 2 pairs thus divide by n!
• P(n) = F(n)/G
37. Put 1 red ball in jar 1 and 49 red balls and 50 blue balls in jar 2.
• P(red ball) = P(red ball | jar 1)*P(jar 1) + P(red ball | jar 2)*P(jar 2)
• P(red ball) = 1*0.5 + 0.5*(49/99) = 0.7474
38. Probability of winning at-least one game = 1-Probability of losing both games = 0.35
39. Same as Coupon Collection Problem.
40. Let X be the number of correct answers.
• Probability of passing = P(X=38) + P(X=39) + ... + P(X=75)
• P(X=i) = 75Ci*0.25i*0.75(75-i)
• Probability of passing = 1.574*10-6

Sources of Problems and Other Resources:

Categories: MACHINE LEARNING, PROBLEM SOLVING