Machine Learning, AI and Programming

Designing a Cab Hailing Service like Uber - Pooling

In the previous post of this series, we had looked into how to design a system to match a passenger with a cab and the driver for single passenger trips (like Uber Go, or Uber Premium in India). The challenge there was matching a cab with a single passenger and matching a passenger with a single cab. But when we talk about cab pooling, we need to match a cab with multiple passengers and match a passenger with a cab and other passengers. As you can rightly imagine that the number of constraints will now increase.

I took some inspiration from these series of posts by Lyft's Director of Engineering on Medium.

Source : ü

Matching a passenger with a cab and a driver alone will have similar design challenges and constraints as seen in the previous post, additionally,

Following are some of the design challenges and constraints that comes to mind immediately while matching a passenger to other passengers :

  • Deciding the optimal route for a cab so as to :
    • Minimize detours.
    • Minimize the total seating time of all passengers in a single trip.
    • Minimize the total distance travelled taken by a cab for pickup and drop-of of passengers.
  • Matching a passenger with other passengers with same gender majority (especially for female passengers).
  • Matching passengers with other passengers of similar ratings.

A cab can be matched to either a single pooling request or multiple pooling requests. While a cab is being matched to a pooling request, it can either be an empty cab or there could passengers already in it.

Let's consider the scenario where a cab is empty and a single pooling request comes. This is same as our single driver condition we had designed for in the previous post.

The cost function for such a case is just the waiting time of the passenger (and yes the cab must be located within a distance of 1.5 km).

Next we consider the scenario, where a single pooling request is matched to the cab but the cab already has one passenger in it. Let the source and destination for the existing passenger be denoted by S1 and D1 and the source and destination for the incoming request be S2 and D2. The current location of the cab be denoted as Q.

Then the possible routes for the cab are :

  1. S1 to Q to S2 to D1 to D2
  2. S1 to Q to S2 to D2 to D1

The cost of a particular route is the total delay to all passengers. For a single trip, the only delay to a passenger is the waiting time to pickup, but in case of pooling, the delays are :

  • The time to a pickup : This time is added towards all passengers.
  • The time to a drop-off : This time is added towards all passengers not being dropped.

In the first option, the delay to the first passenger P1 is :

C1 = [T(@, S1) + T(S1, Q) + T(Q, S2) + T(S2, D1)] - [T(S1, D1)]

where, T(X, Y) is the shortest time to travel from X to Y and T(@, S1) is the time taken to reach S1 from the empty cab's location '@'.

The delay for the second passenger P2 is :

C2 = [T(Q, S2) + T(S2, D1) + T(D1, D2)] - [T(S2, D2)]

In case of the second passenger, the times T(@, S1) and T(S1, Q) are inconsequential because his/her delay is only due to the time to his/her pickup and the additional drop-off time for the first passenger.

Thus the total cost to delay for the first option is :

C1 + C2 = [T(@, S1) + T(S1, Q) + 2T(Q, S2) + 2T(S2, D1) + T(D1, D2)] - [T(S1, D1) + T(S2, D2)]

Similarly the delay to passenger P1 in the second option above is :

C1 = [T(@, S1) + T(S1, Q) + T(Q, S2) + T(S2, D2) + T(D2, D1)] - [T(S1, D1)]

and for passenger P2 :

C2 = T(Q, S2), because 2nd passenger is the first to be dropped.

Thus the total cost to delay for the second option is :

C1 + C2 = [T(@, S1) + T(S1, Q) + 2T(Q, S2) + T(S2, D2) + T(D2, D1)] - [T(S1, D1)]

Additionally, we want to cap the delay to a passenger due to pickup and drop-off of other passengers to 20% of the estimated time if the passenger was traveling alone.

i.e. for the first passenger if the estimated time to D1 was 30 minutes before the second request came in, then after the second match, the delay to D1 should not be more than 6 minutes.

Thus, for the 1st passenger, we want to have :

C1 - T(@, S1) <= 0.2 * T(S1, D1)

and for the 2nd passenger, we want to have :

C2 - T(Q, S2) <= 0.2 * T(S2, D2)

The next scenario is where the cab is initially empty but 2 booking requests match the cab. This is similar to the scenario above but, the options are more since the pickup of the two passengers can also be ordered now :

  1. S1 to S2 to D1 to D2
  2. S1 to S2 to D2 to D1
  3. S2 to S1 to D1 to D2
  4. S2 to S1 to D2 to D1

The other scenarios are generalization of the above, where the number of passengers already seated could be more than one and number of requests matched to a cab can also be more than one.

Note that, although there could be more than 7 to 8 matching requests that can come to a cab, but the number of passengers actually picked up should be less than or equal to the remaining seats available in the cab.

When the driver needs to pickup N more passengers, then number of possible route options are :

Noptions = (N!)2

Because, the N sources can be permuted in N! ways and then for each permutation of sources, N destinations can be permuted in N! ways (assuming that no pickup happens after first drop).

When N is small < 5, the computation is still manageable with 14400 possible options for N=5, but when N starts to increase beyond 5, the possible options are far too large to be computed within 1-2 seconds.

Finding the optimal route for a generic N is a NP-Hard problem and is known as the Vehicle Routing Problem.

The total cost should be weighted cost of each possible routing option. More logical options should be given more weight. For e.g. one way to weight is to :

Cost = 0.3 * (S1, S2, D1, D2) + 0.2 * (S1, S2, D2, D1) + 0.2 * (S2, S1, D1, D2) + 0.3 * (S2, S1, D2, D1)

In a practical scenario, 10 passengers will be matched to 3 cabs, with each cab having a different number of seats available, say [2, 1, 3]. Thus only 6 out 10 will be allocated into the 3 cabs.

We need to evaluate the cost of each possible assignment of 6 out of 10 passengers into the 3 cabs, then we take the assignment with the minimum total cost and accordingly assign the cabs to the passengers.

One can use Genetic Programming technique to find an approximately good solution that minimises the above cost.

In the last post, we had seen how to track the cabs : Indexing the cabs into a Redis database with an index on GPS coordinates.

After modification to some of the fields, the updated cab properties to track :

  • Current location of all the cabs. Refresh and update location after every second.
  • Number of passengers currently in the cab.
  • Maximum number of passengers the cab can accommodate.
  • If number of passengers in cab is greater than 0, then flag to indicate whether it is a pooled cab or not.
  • Last trip start and end time. If cab is on-trip then start time of current trip and estimated end time.

A pooling cab is available if :

  • If the number of passengers in the cab is 0.
  • If the number of passengers in the cab is greater than 0 but less than the maximum number of passengers it can accommodate and it is a pooled cab.

Similar to our previous implementation, whenever a booking request comes in for a pool, queue the request and update the maps CP and PC.

From the time a request comes, use a separate thread to track the duration and after about 10 seconds, remove the request from the head of the queue, retrieve the passenger id Pi, retrieve the cab ids for this passenger id from the map CP. Then for each of these cab id, retrieve the list of other passenger ids from the map PC.

Out of all these cab ids and passenger ids, find the assignment that minimises the total cost and satisfying the constraints as explained above. All possible assignments to evaluate must include the current passenger id Pi, implying that even if the number of passengers is more than the number of seats available in the cabs, passenger id Pi has to be assigned a cab even if some other assignment without putting Pi inside any cab has a lower cost.

Note that, whichever cab passenger id Pi has been assigned to, if it has more than one seat available and the assignment which minimizes the cost, also assigns some there passenger id Pj to the same cab, then remove Pi from the queue and all other passenger ids assigned the same cab and update the maps CP and PC.


Tags: , , , ,

Leave a Reply