Towards Data-Driven On-Demand Transport

On-demand transport has been disrupted by Uber and other providers, which are challenging the traditional approach adopted by taxi services. Instead of using fixed passenger pricing and driver payments, there is now the possibility of adaptation to changes in demand and supply. Properly designed, this new approach can lead to desirable tradeoffs between passenger prices, individual driver profits and provider revenue. However, pricing and allocation, together known as mechanisms, are challenging problems falling in the intersection of economics and computer science. In this paper, we develop a general framework to classify mechanisms in on-demand transport. Moreover, we show that data is key to optimizing each mechanism and analyze a dataset provided by a real-world on-demand transport provider. This analysis provides valuable new insights into efficient pricing and allocation in on-demand transport.
Received on 09 January 2018; accepted on 30 March 2018; published on 27 June 2018


Introduction
Catching a cab has changed over the last few years. In New York City at the turn of the millennium, hailing a cab from the side of the road was the norm. In Sydney, it was common to call a taxi dispatcher for a journey to the airport. Now in 2017, it is increasingly common to instead open an app on a smartphone to book transport via Uber, Lyft, or one of several other on-demand transport services.
What has changed? After the now widespread adoption of internet-enabled smartphones and secure online payments, it is possible for both potential passengers and drivers to easily communicate with a provider or directly with each other. This offers the potential for passengers to be transported, irrespective of their location and the locations of available drivers. The problem that remains is to determine which driver should transport which passenger and how much they should pay: the allocation and pricing problem.
The high level of connectivity that we now take for granted can also assist in allocation and pricing. In particular, the use of smartphone apps has made it possible for providers to collect huge amounts of data, such as the time and location of each passenger's pickup and drop-off, as well as the prices that passengers are prepared to pay and the fares drivers are willing to accept. The combination of this data is leading to new insights into the spatial-temporal profile of on-demand transport systems [1][2][3][4] and ultimately opens the way for increasingly efficient allocation and pricing.
* Corresponding author. Email: malcom.egan@insa-lyon.fr
In fact, data-driven allocation and pricing is now an active area of research and plays a key role in real-world on-demand transport services [5][6][7][8][9][10]. By accounting for the real-time behavior of passengers and drivers, it is possible to improve the reliability of on-demand transport services in terms of waiting times, driver profits, and also provider revenue.
In this paper, we develop a general framework to design allocation and pricing algorithms within on-demand transport, with a focus on profit-driven services; that is, we focus on taxi-like services rather than on dial-a-ride services [11]. In particular, we show that existing on-demand transport mechanisms are different variants of two-sided markets. Our framework provides a unifying perspective for existing allocation and pricing algorithms, which can be viewed as implementations of market mechanisms from economics in the context of on-demand transport. Key examples are auctions [12] and posted price mechanisms [13].
A key feature of our framework is that it clarifies the requirements of different mechanisms, which forms a basis for the mechanism selection problem that providers face. In fact, we identify allocation and pricing algorithms within our framework that have desirable properties, but have not yet received significant attention.
The algorithms in our framework all rely on information either obtained directly from real-time passenger requests and driver reports, or from historical data collected from previous requests. In this sense, effective mechanisms for pricing and allocation in on-demand transport are data-driven. It is therefore important to develop approaches to obtain and exploit this data.
There are two aspects of the mechanisms that require data. The first aspect is the optimization of the pricing and payments for each mechanism in our framework. In auction-based mechanisms, data is obtained directly from passenger and driver reports (or bids). However, in practice an alternative approach based on posted price mechanisms, where the provider makes offers that are accepted or rejected, has proven most popular. In the posted price mechanism, the prices are selected based on historical transactions. When there is limited data, an online pricing approach is desirable, where the price is adapted as new data becomes available.
The second aspect is the optimization of allocations of passengers and drivers. While each market mechanism in our framework provides an allocation algorithm for a given set of passengers and drivers, it does not provide a means of selecting this set, which is known as the market formation problem [14]. Due to the huge scale and heterogeneity of on-demand transport systems, it is not possible to allocate all passengers and drivers simultaneously. As such, market formation is an important practical issue.
At present, there are limited data-driven methods available for the market formation problem, with most commonly used approaches reliant on heuristics guided by experienced practitioners. To gain insight into efficient market formation, we apply machine learning methods to real-world data from the provider Liftago in the Czech Republic. In particular, we identify key features that determine which drivers are willing to be allocated to a passenger. The features include not only factors such as distance but also driver-dependent factors such as historical acceptance rates. A key observation is also that the accuracy of the feature ranking is highly dependent on the availability of data.
In the remainder of the paper, we first detail our classification framework for on-demand transport mechanisms in Section 2. Our framework provides a way to systematically identify the requirements of each pricing and allocation algorithm, which forms a basis for mechanism selection by on-demand transport providers. In Section 3, we discuss the market formation problem and present the results of the analysis of the Liftago dataset. We conclude and outline future directions in Section 4.

On-Demand Transport Market Mechanisms: A Classification
Two key problems in on-demand transport are: (1) how to allocate passengers to drivers; and (2) how to determine passenger prices and driver payments. For traditional taxi services, the allocation is based on passengers hailing from the side of the road (the "hackney carriage" model) or through a phone call to a dispatcher who then selects a driver near the passenger (the "dispatcher" model).
Pricing is typically determined either by regulation or through a negotiation between the driver and each potential passenger.
On the other hand, new on-demand transport services provided by companies such as Uber and Lyft are using a different method. By exploiting a smartphone app, passengers are connected to a number of drivers within their region and are able to select the one they prefer. How much passengers pay and the commission drivers receive is determined by the provider through a pricing mechanism. This allocation and pricing scheme is an example of a posted price mechanism [13], where the provider determines the price and passengers or drivers are able to accept or reject their offer.
Posted price mechanisms in on-demand transport are themselves a class of two-sided market, where a service is provided by one side (i.e., the drivers) and bought by the other side (i.e., the passengers). Two-sided markets are an active area of research in both economics and computer science, aiming to efficiently allocate resources or services [15]. In this section, we develop a framework to unify the different approaches based on the fact that they are different variants of two-sided markets. Table 1 provides an overview of the framework, presenting the structure and challenges of the four classes of mechanisms: posted price, double auction, and the two hybrids.

Posted-Price Mechanisms
Posted price (PP) mechanisms (illustrated in Fig. 1) are currently the most popular approach to two-sided markets within on-demand transport. In this approach, passengers and drivers are each offered a price and a payment, respectively, which they can either accept or reject. More generally, PP mechanisms play an important role in a range of economic problems. The posted price mechanism is summarized in Algorithm 1.
Figure 1. In posted price mechanisms, a single passenger is offered a price by the provider and is able to accept or reject the offer. A number of drivers are then offered the passenger's journey and a commission. If a driver accepts the journey, the passenger is served by the driver.
Algorithm 1 Posted price mechanism.
Step 1. For each passenger request, the provider selects a driver.
Step 2. The provider offers the passenger a price p and the driver a payment η.
(a) If both the passenger and driver accept ($p_i \ge p$ and $\eta \ge \eta_j$), passenger $i$ is transported by driver $j$.
(b) Otherwise, the passenger request is withdrawn.
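The acceptance rule in Step 2 can be illustrated with a minimal Python sketch. The class and function names (`Passenger`, `Driver`, `posted_price_round`) are hypothetical and chosen for illustration only; the sketch assumes the driver in Step 1 has already been selected and that each agent accepts exactly when the offer meets their private threshold.

```python
from dataclasses import dataclass


@dataclass
class Passenger:
    max_price: float    # p_i: the most passenger i is willing to pay


@dataclass
class Driver:
    min_payment: float  # eta_j: the least driver j is willing to accept


def posted_price_round(passenger: Passenger, driver: Driver,
                       price: float, payment: float) -> bool:
    """Step 2 of the posted price mechanism for one passenger-driver pair.

    Returns True if the journey goes ahead and False if the request
    is withdrawn.
    """
    passenger_accepts = passenger.max_price >= price  # p_i >= p
    driver_accepts = payment >= driver.min_payment    # eta >= eta_j
    return passenger_accepts and driver_accepts
```

Note that neither agent reports a value to the provider: the thresholds are only used to decide whether an offer is accepted, which is why the provider must learn suitable offers from historical data.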
A key advantage of PP mechanisms is that they require limited information from each passenger and driver. Instead, the prices and payments are set based on historical data. This is important as each passenger and driver is not always well informed about the state of the whole on-demand transport system. As such, the price calculations can be done by the provider which can simplify the use of the service for passengers and drivers.
PP mechanisms are commonly studied as the multiperiod pricing problem [16,17]. Here, the revenue and the proportion of served passengers are optimized, accounting for parameters including the number of taxis, average passenger demand, and average waiting times. The basis for this optimization problem is a model relating the provider's revenue to these parameters. An example is the model developed in [18], where the average waiting time of passengers, $W$, is given by
$$W = \frac{\omega}{pN_T - \frac{DL}{\gamma}},$$
where $\omega$ depends on the density of taxi stands, $p$ is the proportion of available taxis, $N_T$ is the total number of taxis, $\gamma$ is the average number of passengers per journey, $L$ is the average travel time, and $D$ is the average number of passengers served by the whole taxi system. As such, $pN_T - \frac{DL}{\gamma}$ represents the average number of available taxis. We remark that [18] focuses on the role of taxi regulation; however, this is under the assumption of a PP mechanism. Recently, this model has also been extended to account for strategic behavior of individual drivers [6,19].
A drawback of the model introduced in [18] is that the preferences of individual passengers are not explicitly modeled. In fact, the model relies on the average behavior of passengers and drivers. As such, the full statistical information in historical data is not exploited. To remedy this problem, another class of models has been proposed in [7,20] where each passenger $i$ is treated as an agent, with a maximum price $p_i$ and a maximum waiting time $\delta_i$ they are willing to accept, which are random variables drawn from beta distributions. The beta distribution is selected because it has bounded support (passengers are not willing to pay arbitrarily high prices) and is flexible: it is parameterized by three parameters and includes many standard distributions as special cases (e.g., uniform).
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
In general socio-economic systems, it is often challenging to understand the underlying processes that cause agents to behave in certain ways. As such, an important methodology is to instead consider stylized facts, which are qualitative trends satisfied by the system [21,22]. In the context of on-demand transport, the model in [7] is motivated by the two following stylized facts, which form common qualitative trends in on-demand transport systems:
Stylized Fact 1. The maximum price a given passenger will pay for a journey and her maximum waiting time $\delta$ do not vary significantly for passengers that regularly use on-demand transport services. In other words, these parameters will in general vary from passenger to passenger, but not for the same passenger.
Stylized Fact 2. The probability that a passenger will accept an offer decreases when either the price increases with the waiting time fixed, or the waiting time increases with the price fixed.
The first stylized fact ensures that passengers have a well-defined maximum price and the second stylized fact captures the intuitive notion that if there is a better deal, more passengers will accept. These stylized facts provide an important basis for the passenger models. We note that they are not guaranteed to hold in every situation, and for such scenarios the algorithms may require additional tuning. Nevertheless, these algorithms can still provide improvements over just considering the average behavior as in [6].
As a consequence of the model in [7,20], the provider can set the price in the PP mechanism to maximize its expected revenue for short waiting times. In particular, the expected revenue is a function of the offered price $p$ and of $\Pr(p \le p_i)$, the probability that passenger $i$ accepts the offered price $p$. A similar approach can be applied to optimize the provider's percentage commission, $\eta$, from a journey of driver $j$ by maximizing the provider's expected commission $C(\eta)$, which depends on $\Pr(\eta > \eta_j)$, the probability that the driver is willing to accept a commission $\eta$, where $\eta_j$ is the minimum acceptable percentage commission of driver $j$.
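To make the price optimization concrete, the following Python sketch searches for a revenue-maximizing posted price. It rests on two assumptions that are not taken from [7,20]: the expected revenue is taken to be $p \Pr(p \le p_i)$, and the maximum price is modeled as $p_i = p_{\max} X$ with $X$ drawn from a Beta($a$, $b$) distribution, where $p_{\max}$, $a$ and $b$ are hypothetical parameters. The acceptance probability is approximated by numerical integration so that only the standard library is needed.

```python
import math


def acceptance_probability(price, p_max, a, b, steps=500):
    """Pr(p <= p_i) when p_i = p_max * X with X ~ Beta(a, b).

    The integral of the beta density over [price / p_max, 1] is
    approximated with the midpoint rule.
    """
    lower = min(max(price / p_max, 0.0), 1.0)
    if lower >= 1.0:
        return 0.0
    width = (1.0 - lower) / steps
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += norm * x ** (a - 1) * (1.0 - x) ** (b - 1)
    return total * width


def best_price(p_max, a, b, grid=200):
    """Grid search for the price maximizing p * Pr(p <= p_i)."""
    candidates = [p_max * k / grid for k in range(1, grid)]
    return max(candidates,
               key=lambda p: p * acceptance_probability(p, p_max, a, b))
```

In the uniform special case ($a = b = 1$), the acceptance probability is $1 - p/p_{\max}$, so the search returns a price of $p_{\max}/2$ under the assumed revenue form.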
Observe that the PP mechanism relies on historical data through the distribution of the maximum price $p_i$. As the distribution is modeled via the beta distribution, the problem reduces to estimating three parameters. However, it may not always be possible for providers to have sufficient historical data to reliably estimate the parameters for the distributions of passenger maximum payments and driver minimum percentage commissions. In this situation, the price and commission need to be adapted as new data becomes available.
A promising approach to exploit new data is to use algorithms for the multi-armed bandit problem. In particular, passengers are offered a price within the set $\mathcal{P} = \{p_1, \ldots, p_N\}$. Using historical observations of the acceptance of each price, the price for the next passenger is updated according to the UCB1 algorithm, originally developed in [23]. This approach was proposed in a pure economic context in [24] and can be adopted in the on-demand transport setting as detailed in Algorithm 2.
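Algorithm 2 Posted price selection based on the UCB1 algorithm [23].
Initialization: Offer each price in $\mathcal{P}$ once.
Loop: Offer the next passenger the price $p_j$ that maximizes $\bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$, where $\bar{x}_j$ is the average profit received so far from using price $p_j$, $n_j$ is the number of times price $p_j$ has been allocated, and $n$ is the total number of requests.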
Intuitively, the algorithm trades off exploration of new (potentially higher) prices with exploitation of prices that are known to be successful. The analysis of the performance of multi-armed bandit pricing algorithms in on-demand transport remains to be fully explored.
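The exploration and exploitation tradeoff can be illustrated with a short Python sketch of UCB1 applied to price selection. This is a sketch under stated assumptions rather than the implementation of [24]: the passenger model `accepts` is a hypothetical callable supplied by the user, and the profit from an accepted offer is taken to be the price itself, scaled to $[0, 1]$ because UCB1 assumes bounded rewards.

```python
import math
import random


def ucb1_pricing(prices, accepts, n_requests, seed=0):
    """Offer prices to n_requests passengers, choosing each price by UCB1.

    accepts(price, rng) -> bool is a hypothetical passenger model.
    Returns (counts, means): how often each price was offered (n_j) and
    its average normalized profit (x_j).
    """
    rng = random.Random(seed)
    top = max(prices)
    counts = [0] * len(prices)
    means = [0.0] * len(prices)
    for request in range(n_requests):
        if request < len(prices):
            j = request  # initialization: offer every price once
        else:
            n = request  # number of requests served so far
            j = max(range(len(prices)),
                    key=lambda k: means[k]
                    + math.sqrt(2.0 * math.log(n) / counts[k]))
        # Profit is scaled to [0, 1], as UCB1 assumes bounded rewards.
        reward = prices[j] / top if accepts(prices[j], rng) else 0.0
        counts[j] += 1
        means[j] += (reward - means[j]) / counts[j]
    return counts, means
```

For example, if every passenger's maximum price lies between 10 and 14 and the candidate prices are 4, 10 and 20, the price of 10 is always accepted and yields the highest profit, so the algorithm offers it in the large majority of rounds while still occasionally testing the alternatives.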

Double Auction Mechanisms
In economics, a single-sided auction is a means of pricing goods or services where buyers (or sellers) bid the price they are prepared to pay (or accept) [25]. Classical examples are the English or sealed bid auctions used in property sales or for antiques and artwork. More generally, two-sided or double auctions can be applied in situations where there are both multiple buyers and multiple sellers.
In the context of on-demand transport, a double auction (illustrated in Fig. 2) allows passengers to disclose the price they are prepared to pay and drivers to set the fare they are prepared to accept in order to provide transport. In contrast, PP mechanisms only allow passengers and drivers to accept or reject prices set by the provider. For this reason, double auctions can lead to more efficient allocation of services. In fact, the role of a double auction mechanism (to simultaneously price and allocate both passengers and drivers) implemented by a provider is to decide when passengers and drivers can be matched and how much they should pay, given their bids.
Figure 2. Illustration of double auction mechanisms. Passengers and drivers report bids, and the provider then acts as a matchmaker to determine which passenger is served by each driver.
A key problem in double auctions is to account for the strategic behavior of the agents (i.e., passengers and drivers) involved. In fact, unlike traditional taxi models [18], the individual preferences of passengers and drivers influence the design of the mechanism. This is caused by the provider's lack of knowledge concerning how much passengers are prepared to pay and the minimum payment drivers are willing to accept, which requires the provider to elicit these preferences from both passengers and drivers via reporting (or bidding).
In the case of PP mechanisms, agents either accept or reject their offers, which means that agents have no incentive to report (or bid) untruthfully; clearly, accepting an unwanted service is not a rational strategy. However, this is not necessarily the case in double auctions, and it is desirable to ensure that agents bid truthfully, the provider does not lose money (weak budget balance), and agents that bid truthfully have an incentive to participate (individual rationality). For a detailed discussion of these concepts, see [25].
Double auction mechanisms have been extensively studied within economics and computer science, and fall into two main classes: static and online. In the static case, all passengers and drivers to be allocated are known to the provider. In the online case, passengers and drivers can arrive at different times, which must be accounted for by the mechanism. A key desideratum in both cases is to ensure that the mechanism is efficient, which means that the total reward for buyers and sellers for receiving or providing the service is maximized. In general, optimal efficiency is not possible to achieve while also ensuring truthfulness, budget balance and individual rationality [26].
The application of classical double auctions in on-demand transport has not proven straightforward. One reason for this is that the purpose of the market can differ compared with the classical economic setting. In particular, the provider seeks to make a long-term profit, while the main related criterion in double auction theory is budget balance, which only accounts for the short-term profit. As such, verifying the profitability of double auction mechanisms in on-demand transport relies heavily on simulation. This means that specialized on-demand transport simulation tools such as the Mobility Services Testbed [27] are required.
Another difficulty facing the application of double auctions is that allocations depend heavily on the financial resources of each passenger. For this reason, the passengers that are allocated may not be those that desire the service the most. As such, it is likely that a key application of double auction mechanisms to on-demand transport is for services targeted at businesses (to transport staff within a city) to minimize the effect of financial inequality.
At present, the application of double auctions in on-demand transport has focused on the static case where all passengers and drivers are known before the allocation and pricing begins. In particular, the mechanism is based on the approach by McAfee [26], which is known to have the desirable properties of truthfulness, budget balance and individual rationality. Specifically, the work in [8][9][10] adapted the McAfee mechanism and investigated the average number of transactions and efficiency for each mechanism run.
There are two stages of the static double auction mechanism in [8]:
1. Decompose the market consisting of all passengers and drivers into smaller markets.
2. Allocate passengers to drivers and set prices.
The first stage is necessary to ensure that the locations of passengers and drivers in each market lie within the same region. The reason for this is that the McAfee double auction mechanism requires homogeneous goods, which means that the passenger initial locations and requests as well as the initial locations of the drivers are similar.
The second stage of the static double auction approach in [8] is to allocate passengers to drivers using the McAfee mechanism. The allocation algorithm is detailed in Algorithm 3. In the algorithm, Steps 1 and 2 sort the passengers and drivers based on their reported maximum prices and minimum payments, respectively. Note that it is not possible to directly compare the reports of the passengers and drivers, because drivers report their minimum profit. As such, in Step 1 the price of each passenger is modified to account for the distance of their journey and the initial distance to each driver. Steps 3-5 then determine the prices and payments based on the McAfee mechanism [26].
Algorithm 3 Double auction mechanism pricing and allocation based on the McAfee mechanism [26].
Require: Passengers broadcast $\{p_i\}_{i=1}^{K}$ (the maximum price they are prepared to pay for their journey), and drivers broadcast $\{s_j\}_{j=1}^{N}$ (the minimum profit they are prepared to receive for their next journey).
Step 1. For each passenger, compute $p_i \leftarrow p_i - cR_{\max} - cR_{0,\max}$, where $c$ is the cost per kilometer for each driver, $R_{\max}$ is the maximum journey distance and $R_{0,\max}$ is the maximum distance of the driver from a passenger.
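As a rough illustration of the pricing rule that Algorithm 3 builds on, the following Python sketch implements the standard McAfee double auction [26] for a single market. It is a sketch under assumptions, not the adapted mechanism of [8]: the passenger bids are assumed to have already been adjusted as in Step 1, passengers and drivers are paired by rank (which is reasonable only when the market is homogeneous), and when no losing pair exists to set the price the sketch falls back to excluding the least competitive trading pair. The function name is hypothetical.

```python
def mcafee_double_auction(bids, asks):
    """Standard McAfee double auction for one homogeneous market.

    bids: adjusted maximum prices of passengers.
    asks: minimum profits requested by drivers.
    Returns (matches, passenger_price, driver_payment), where matches is
    a list of (passenger_index, driver_index) pairs.
    """
    buyers = sorted(range(len(bids)), key=lambda i: -bids[i])   # descending
    sellers = sorted(range(len(asks)), key=lambda j: asks[j])   # ascending
    limit = min(len(buyers), len(sellers))
    k = 0  # number of pairs whose bid is at least the ask
    while k < limit and bids[buyers[k]] >= asks[sellers[k]]:
        k += 1
    if k == 0:
        return [], None, None
    b_k, s_k = bids[buyers[k - 1]], asks[sellers[k - 1]]
    if k < limit:
        # Candidate price set by the first pair that does not trade.
        p0 = (bids[buyers[k]] + asks[sellers[k]]) / 2
        if s_k <= p0 <= b_k:
            return list(zip(buyers[:k], sellers[:k])), p0, p0
    # Otherwise drop the k-th pair: passengers pay b_k, drivers receive s_k.
    return list(zip(buyers[:k - 1], sellers[:k - 1])), b_k, s_k
```

In the second branch, the provider keeps the difference between the passenger price and the driver payment, which is never negative, so the provider does not lose money. The cost of this property is that one feasible trade is excluded.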
To the best of our knowledge there is no work in theory or practice applying online double auctions to on-demand transport. However, in certain applications it may be desirable to adopt an online approach, for which there is a firm basis in the economics and computer science literature (see, e.g., [15]).

Hybrid Mechanisms
So far, we have examined allocation and pricing for on-demand transport where a PP mechanism or a double auction is applied to both passengers and drivers. It is also possible to consider the class of mechanisms where passengers are priced based on a PP mechanism and drivers based on a single-sided auction mechanism (the hybrid PP/A mechanism), or vice-versa (the hybrid A/PP mechanism). However, investigations into hybrid mechanisms remain ongoing work [28].
A particularly interesting hybrid mechanism is the application of a PP mechanism to price passengers and a single-sided auction to set driver payments (the hybrid PP/A mechanism, illustrated in Fig. 3). Although there has been no formal analysis of this mechanism, it has been put into practice by Liftago in the Czech Republic. Liftago's mechanism (illustrated in Fig. 4) proceeds as detailed in Algorithm 4.
Figure 3. Illustration of the hybrid PP/A mechanism. In hybrid PP/A mechanisms, a passenger is offered a price by the provider and is free to accept or reject the offer. Each driver then bids for the passenger's journey, where the bid consists of the commission the driver is prepared to accept.
Algorithm 4 Liftago mechanism description.
Step 1: A passenger makes a request to Liftago.
Step 2: Liftago identifies a subset of drivers to potentially serve the passenger.
Step 3: Each driver bids to serve the passenger, or ignores the request.
Step 4: The passenger selects one of the drivers and pays the corresponding bid.
Even in competition with traditional taxis and other providers such as Uber, Liftago has managed to remain financially sustainable using the hybrid mechanism. This suggests that a fruitful avenue of research is an analysis of how the hybrid mechanism compares with traditional taxis and other market mechanisms. An initial investigation is currently being carried out in [28].
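The four steps of Algorithm 4 can be sketched in a few lines of Python. The function and its three callable arguments (`form_market`, `bid`, `choose`) are hypothetical placeholders for the provider's driver selection, each driver's bidding behavior, and the passenger's choice; none of them reflects Liftago's actual implementation.

```python
def liftago_round(drivers, form_market, bid, choose):
    """One passenger request (Step 1) handled by the hybrid mechanism.

    form_market(drivers) -> subset of drivers offered the journey.
    bid(driver) -> the driver's bid, or None if the request is ignored.
    choose(bids) -> the driver selected by the passenger.
    Returns (driver, price), or None if no driver bids.
    """
    market = form_market(drivers)   # Step 2: subset of drivers
    bids = {}
    for driver in market:           # Step 3: each driver bids or ignores
        offer = bid(driver)
        if offer is not None:
            bids[driver] = offer
    if not bids:
        return None
    winner = choose(bids)           # Step 4: the passenger selects a driver
    return winner, bids[winner]
```

In this sketch a passenger who always picks the cheapest bid would pass `lambda bids: min(bids, key=bids.get)` as `choose`; in practice the choice may also depend on factors other than price.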
While Liftago's mechanism has several desirable features, a key challenge is the design of algorithms for Step 2. This step is known as market formation and is analogous to the first stage of the double auction mechanism in [8] detailed in Section 2.2. However, unlike in the double auction mechanism the provider does not have bids from both passengers and drivers. As such, there is a reliance on historical transaction data. In the next section, we focus on data-driven aspects of the market formation problem in the context of on-demand transport (see [14] for a more general perspective).

The Market Formation Problem
Each market mechanism has different features that make it appropriate for different applications of on-demand transport, whether it be a PP mechanism, double auction, or a hybrid. However, all the mechanisms have a common feature: they require a market. In particular, not every passenger and driver can be potentially matched. As such, it is necessary for the provider to make a selection of passengers and drivers that are allocated at the same time.
Ideally, market formation [14] should be performed so that the number of drivers that compete for each passenger is minimized, while still ensuring that each passenger is able to be serviced. The most basic market formation approach simply ensures that hard constraints are met. These hard constraints include the maximum travel time between the initial locations of drivers and the pick-up locations of potential passengers. Other hard constraints may be based on other preferences of passengers, which may limit the vehicle class of each driver, or driver reputation. Simply accounting for hard constraints ensures that passengers are not offered a service from any driver that does not satisfy these hard constraints.
In general, hard constraints such as maximum travel time do not rule out a large number of potential drivers [8]. As such, drivers may need to respond to many requests. This can reduce the safety of drivers as they need to concentrate on responding to requests, rather than focusing on driving. Moreover, for a given passenger request, many drivers that are offered a passenger's journey may not want to serve the passenger. This may be due to passenger characteristics such as desired drop-off location or passenger reputation.
To further reduce the number of drivers that can be potentially matched to each passenger, it is possible to exploit heuristics based on the advice of experienced practitioners. For instance, provider experience may suggest that a fixed number of drivers are allowed to be matched with each passenger, where each driver is selected randomly from those that satisfy the hard constraints. More sophisticated heuristics may be based on passenger features such as drop-off location as, for example, passengers that wish to travel to transport hubs (e.g., airports) are significantly more desirable than passengers that wish to travel to locations where there is unlikely to be another passenger [29]. In these cases, it may be desirable to offer more drivers the journey of passengers with undesirable drop-off locations to ensure that the chance they are served is maximized.
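The combination of hard constraints and the simple random heuristic described above can be sketched as follows. The `DriverState` fields, the function name and the parameter `k` (the fixed number of drivers offered each journey) are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DriverState:
    driver_id: str
    travel_time_min: float  # estimated travel time to the pick-up location
    vehicle_class: str


def form_market(drivers, max_travel_time_min, allowed_classes, k, seed=None):
    """Select at most k drivers for one passenger request.

    Drivers violating a hard constraint are removed first; k drivers are
    then drawn uniformly at random from those that remain.
    """
    eligible = [d for d in drivers
                if d.travel_time_min <= max_travel_time_min
                and d.vehicle_class in allowed_classes]
    if len(eligible) <= k:
        return eligible
    return random.Random(seed).sample(eligible, k)
```

A heuristic that favors passengers with undesirable drop-off locations could be expressed in the same structure by letting `k` depend on the drop-off location.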
Heuristics based on expert support are built using the experience of providers. This approach can be improved by exploiting the availability of historical data, which leads to data-driven market formation. In data-driven market formation, systematic machine learning techniques are applied to select drivers and passengers that can be potentially matched.
At present, there are limited data-driven algorithms for the market formation problem (initial work in this direction can be found in [3,4], designed for PP mechanisms). The first step to developing these algorithms is to apply machine learning techniques to real-world data. To this end, we exploit a dataset provided by Liftago, which employs a hybrid mechanism as described in Section 2.3. Although this data is specific to Liftago, the analysis methodology can be applied to any provider's data set and the features provide important insights into the role of both per journey factors such as distance and long-term factors such as the historical acceptance rate of each driver.
Our aim in analyzing the Liftago data set is to understand the contribution of features that determine whether or not a driver is willing to bid for a given passenger's journey. Each feature is detailed in Table 2. We performed a ranking of the features on Liftago's dataset consisting of 253687 passenger requests. The ranking was based on a binary classification model able to predict the probability that a driver will accept and bid for a passenger's request based on the features in Table 2. The classifier is based on the Random Decision Forest Ensemble (RDFE) [31], implemented with the scikit-learn package [30], which provides a means of ranking features by averaging over the expected fraction of samples affected by tree nodes over all 200 trees.
The result of the feature ranking is shown in Fig. 5. Observe that all features have a non-negligible contribution, which implies that highly efficient market formation algorithms should account for a range of factors, both on a per journey basis and also based on long-term historical data. The analysis also revealed that the most significant factors are the distance from a driver's initial location to the passenger's pick-up location and the mean acceptance rate of the drivers. It is important to note that the pick-up distance is a per journey factor, while the mean acceptance rate is based on long-term driver history.
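The ranking procedure can be illustrated on synthetic data with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The sketch below does not use the Liftago dataset or the features of Table 2: the three feature names and the rule generating the labels are assumptions made purely for illustration, with the hour of day deliberately left unrelated to the label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(n_samples=2000, n_trees=200, seed=0):
    """Rank hypothetical features by random forest importance."""
    rng = np.random.default_rng(seed)
    pickup_km = rng.uniform(0.0, 10.0, n_samples)
    accept_rate = rng.uniform(0.0, 1.0, n_samples)
    hour_of_day = rng.integers(0, 24, n_samples).astype(float)  # noise only
    # Synthetic rule: drivers bid more often for nearby pick-ups and when
    # their historical acceptance rate is high.
    logit = 6.0 * (accept_rate - 0.5) - 1.2 * (pickup_km - 5.0)
    bids = rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logit))
    features = np.column_stack([pickup_km, accept_rate, hour_of_day])
    names = ["pickup_km", "accept_rate", "hour_of_day"]
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(features, bids)
    return sorted(zip(names, forest.feature_importances_),
                  key=lambda item: -item[1])
```

The returned importances sum to one, so each value can be read as the share of the forest's splitting power attributed to a feature. Repeating the call with smaller `n_samples` and different seeds gives a simple way to examine how stable a ranking is when little data is available.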
In practice, the training data depends on how long the provider has been operating. As such, it may not be possible for providers to initially have access to a large dataset. To understand the effect of the size of the training set, the accuracy of the model was evaluated on small training sets. Fig. 6 is a boxplot of the accuracy for different dataset sizes based on 8 runs of the RDFE with different seeds. Observe that the accuracy is highly dependent on the number of training samples. This suggests that building large training sets is important for market formation in on-demand transport.

Conclusions and Future Directions
Data-driven approaches to on-demand transport are now gaining traction as Uber and other providers are adopting market-based mechanisms to allocate and price passengers and drivers. By exploiting historical datasets, it is now possible to improve the efficiency of on-demand transport to ensure it is easier than ever to get from point A to B while maintaining financial sustainability of the provider. Due to the fundamentally different approach to allocation and pricing, there are new challenges in understanding and improving allocation and pricing in on-demand transport. In this paper, we have provided a framework to classify the different market mechanisms that can be applied and how market formation can be performed. We also have identified key features for market formation based on an analysis of Liftago's data set, which reveal that both journey-specific and long-term driver behavior should be accounted for. However, there are a number of other aspects that deserve attention.
In particular, the role of subsidies in on-demand transport market mechanisms is not well-understood. As providers need to ensure they have both sufficient supply (drivers) and demand (passengers), it can be necessary for the mechanism to further adjust pricing to ensure that journey offers are accepted by both passengers and drivers. Selecting and optimizing mechanisms for on-demand transport is challenging due to the huge scale and heterogeneity of the system. For this reason, obtaining large datasets and developing effective simulation tools are key aspects of improving performance. This is important not just for providers, but also for municipalities and governments to ensure that there is a fair tradeoff between provider revenue, individual driver profits and passenger prices.
2 The accuracy is a good measure of performance in the case that the classes are well balanced. In the Liftago dataset this is indeed the case with 45% positive and 55% negative.
Exploiting available data will require effective machine learning techniques at very large scales. Combining these machine learning techniques with economic theory is leading to many new open problems. However, given the importance of transportation, developing efficient data-driven on-demand transport is a valuable challenge.