Choose Early or Choose Wisely: A Chinese Restaurant Game Approach

Agents in a network often face situations requiring them to make decisions without sufficient information. In such situations, they may postpone their decisions in order to observe and collect more information by learning from other agents. In this paper, we discuss the advantages of the postponement strategy from a game-theoretic perspective. We propose an extension to the Chinese Restaurant Game, a general framework for social learning. In the proposed extension, rational agents may change their decision order at will. We find that two important elements of the Chinese Restaurant Game, social learning and negative network externality, still dominate the agents' decision process and the postponement strategy. We study a two-player case in detail. Through simulations, we find that the signal quality and the table size ratio greatly influence whether a rational agent applies the postponement strategy. In some cases, rational agents postpone their decisions in response to some, but not all, of the signals they may receive. We observe that such a strategy is itself informative, which helps other agents improve their strategies accordingly.


INTRODUCTION
Rational agents in a social network may encounter incidents that require them to make decisions without complete information. On such occasions, they learn the required knowledge from external information sources. Signals revealed by other agents, such as chats, reviews, and ratings, are the main sources of information for agents to learn from [1][2][3][4]. Sometimes agents require a period of time to collect and analyze these signals, which forces them to postpone their decisions. Postponing a decision, or the "wait and see" strategy, is widely seen in the real world, especially when the outcome is highly uncertain. Nevertheless, postponing can be costly to agents who are in competition with others. How rational agents balance the accuracy and the delay of a decision is the main focus of this paper.
The Chinese Restaurant Game is a general framework for modeling individual decisions in a network with negative network externality [5][6][7]. The original framework addresses two different effects: the advantage of learning from other agents' experiences and the disadvantage of competing with them. Nevertheless, the decision order of agents in the legacy Chinese Restaurant Game is fixed, which means that agents cannot postpone their decisions at will. In light of this, we extend the legacy Chinese Restaurant Game to support postponement, that is, agents (customers) may postpone their decisions at will in exchange for collecting more informative signals. Such an extension significantly expands the degrees of freedom of an agent's strategy and makes the model more suitable for real-world applications.
In this paper, we introduce an extension to the Chinese Restaurant Game that allows customers to postpone their decisions. We present a detailed discussion of the two-player case, in which two customers and two tables in the restaurant are considered. Specifically, we illustrate how the Nash equilibria of the proposed game can be classified into four types. We propose an iterative-update algorithm to identify the pure-strategy Nash equilibrium. We demonstrate the critical influence of the decision order and how rational agents choose when the order can be altered at will. We discuss these influences through simulations, where we show how the table size and the signal quality affect the final decision of each agent in the game.

SYSTEM MODEL
Let us consider a Chinese restaurant with a set of tables and customers. Each table has infinitely many seats but a different size, and customers request a table for a meal one at a time, in sequential order. We model the table selection problem as a game, similar to the original Chinese Restaurant Game [6]. Specifically, the table set is $X = \{1, \ldots, K\}$, which is also the table selection action set of each customer. The choice of customer $i$ is $x_i \in X$. The utility function of customer $i$ is $U(R_{x_i}(\theta), n_{x_i})$, where $n_{x_i}$ is the number of customers choosing table $x_i$ at the end of the game and $R_{x_i}(\theta)$ is the size of table $x_i$. We let the state $\theta$ be an objective parameter, which means that $\theta$ changes only when the restaurant is remodeled, and we let the table size function $R_j(\theta)$ be fixed. Finally, we denote the number of customers at table $j$ as $n_j$ and call $\mathbf{n} = \{n_1, n_2, \ldots, n_K\}$ the grouping of customers in the restaurant [5].
We assume that customers do not know the exact state $\theta$, and they need to collect information, such as reviews and recommendations, to help them identify the true state of the restaurant. The information that customers gather in this process is called signals. We assume that all customers know the prior distribution of the state $\theta$, denoted as $g_0 = \{g_{0,l} \mid g_{0,l} = \Pr(\theta = l), \forall l \in \Theta\}$. Each customer $i$ receives a signal $s_i \in S$ generated from a predefined distribution $f(s \mid \theta)$.
A rational customer can estimate the system state $\theta$ through his belief [6]. We assume all customers choose tables sequentially, and each customer reveals his signal to the others after he chooses a table. Let $h_i = \{s\}$ denote the signals that customer $i$ has learned from others, excluding his own signal $s_i$. Customer $i$ can then estimate the current system state probabilistically with the belief defined as
$$g_i(h_i, s_i) = \{ g_{i,l}(h_i, s_i) \mid g_{i,l}(h_i, s_i) = \Pr(\theta = l \mid h_i, s_i, g_0), \forall l \in \Theta \}.$$
According to this definition, $g_{i,l}(h_i, s_i)$ represents the probability that the system state $\theta$ equals $l$, conditioned on the collected signals $h_i$, the received signal $s_i$, the prior probability $g_0$, and the conditional distribution $f(s \mid \theta)$. Since we assume that customers are fully rational, they update their beliefs by Bayes' rule as follows [6,7]:
$$g_{i,l}(h_i, s_i) = \frac{g_{0,l} \Pr(h_i, s_i \mid \theta = l)}{\sum_{l' \in \Theta} g_{0,l'} \Pr(h_i, s_i \mid \theta = l')}.$$
In this paper, we propose an extension to the Chinese Restaurant Game that gives each customer in the line a chance to change his position in the decision order. Specifically, any customer except customer $N$ may voluntarily leave and rejoin the line as the last customer after he has received his signal and right before he is asked to select a table. In that case, he gives up the chance to choose a table earlier in exchange for more signals revealed by other customers. Mathematically, the action of customer $i$ is denoted by $a_i = (p_i, x_i) \in \{0,1\} \times X$, $i < N$, where $p_i \in \{0,1\}$ indicates whether customer $i$ chooses to postpone his decision and rejoin the line when he is asked to select a table. The observed signals $h_i$ change accordingly.
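As a minimal sketch of the belief update described above, the Python function below applies Bayes' rule over a finite state set. It assumes that signals are conditionally independent given the state, which this section does not state explicitly; the names `posterior`, `prior`, and `likelihood`, and the two-signal toy distribution, are illustrative and do not come from the paper.

```python
from math import prod

def posterior(prior, likelihood, signals):
    """Belief over states after observing `signals`.

    prior:      dict mapping state l -> Pr(theta = l)
    likelihood: dict mapping state l -> {signal s: f(s | theta = l)}
    signals:    iterable with the revealed signals plus the customer's own
    """
    # Unnormalized weight of each state: prior times the product of the
    # per-signal likelihoods (conditional independence is assumed here).
    weights = {l: prior[l] * prod(likelihood[l][s] for s in signals) for l in prior}
    total = sum(weights.values())
    return {l: w / total for l, w in weights.items()}

# Toy example with two states and two hypothetical signal labels.
prior = {1: 0.5, 2: 0.5}
likelihood = {1: {"hi": 0.8, "lo": 0.2}, 2: {"hi": 0.2, "lo": 0.8}}
belief_one = posterior(prior, likelihood, ["hi"])        # one supporting signal
belief_two = posterior(prior, likelihood, ["hi", "lo"])  # two conflicting signals
```

In this toy example a single "hi" signal moves the belief in state 1 from 0.5 to 0.8, while two conflicting signals of equal strength cancel out and return the belief to the prior.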
A rational customer predicts the expected utility he may gain from postponing his decision, given the signals $h_i$ revealed so far, his own signal $s_i$, the decisions of previous customers, and the expected reactions of the other customers. He chooses to decide later if he believes that doing so will lead him to the right table and maximize his utility. We assume that customers who decide after a given customer have full knowledge of the decisions made before them.

EQUILIBRIUM ANALYSIS
As mentioned above, the customers are rational and follow the Bayesian learning rule. Given the information they collect, customers choose tables and decide whether to postpone their decisions so as to maximize their own expected utility. We first let $\mathbf{n}_i = (n_{i,1}, n_{i,2}, \ldots, n_{i,K})$ be the current grouping observed by customer $i$ before he makes his decision, where $n_{i,j}$ is the number of customers who chose table $j$ before customer $i$.
Then, let $h_i = \{s_1, s_2, \ldots, s_{i-1}\}$ be the history of signals revealed before customer $i$. In such a case, the best response of customer $i$ can be written as follows [6]:
$$BE_i(\mathbf{n}_i, h_i, s_i) = \arg\max_{j} E[U(R_j(\theta), n_j) \mid \mathbf{n}_i, h_i, s_i, x_i = j]. \quad (1)$$
In [6], a recursive approach is proposed to compute the expected utility in (1) under the assumption that the decision order is fixed. That approach is not directly applicable here, since customers may postpone their decisions and rejoin the line when it is beneficial.
Due to the page limit, we present only the two-player case. We will show that even under this simple setting we can derive interesting insights from the game. We consider a two-player game with two customers A and B and two tables $\{1,2\}$ in a Chinese restaurant under a system state $\theta$. Without loss of generality, customer A chooses first and customer B chooses later. Nevertheless, customer A has a chance to postpone his decision after receiving his own signal. In this regard, the actions of customers A and B are $(p_A, x_A)$ and $x_B$, respectively. Both customers aim to choose the table that maximizes their own utility. The game ends when both customers have chosen a table. The utility of a customer choosing table $j$ is then given by $U(R_j(\theta), n_j)$, where $n_j$ is the number of customers choosing table $j$ at the end.

Iterative Solution for Two-Player Case
We propose an iterative-update method to derive the Nash equilibrium of the proposed game model. The key concept is to update the best response function of each customer iteratively. The Nash equilibrium is obtained when the algorithm terminates.

Initialization:
We first initialize the best response function of each customer using the recursive method proposed in [6], with customer A as the first customer and customer B as the second customer. Here we denote the best response functions of the first and second customers as $BE_1(s_1)$ and $BE_2(\mathbf{n}_2, h_2, s_2)$, respectively. Note that the first customer need not be customer A, nor the second customer B. Since we have only two customers, the best response function of the second customer simplifies to $BE_2(x_1, s_1, s_2)$. We then have $BE_{1,A}(s_1) = BE_{1,B}(s_1) = BE_1(s_1)$, where the first two denote the best responses of customers A and B, respectively, when they are the first to choose a table. Finally, we denote by $\pi(s_A)$ the postponement strategy of customer A. Each iteration consists of two stages.
1st Stage: In the first stage, customer A compares the expected utility of keeping his position with the expected utility of moving to the position of customer B, conditioned on the signal $s_A$ he received and under the assumption that customer B keeps the strategy $BE_{1,B}(s_B)$. Both expected utilities can be derived by the recursive approach in [6]. The expected utility of customer A when he keeps his position is evaluated with his own choice $x_A = BE_{1,A}(s_A)$ and customer B responding with $BE_2(x_A, s_A, s_B)$. The expected utility of customer A when he postpones his decision, under the assumption that customer B follows the same table selection strategy as he would, conditioned on his received signal $s_A$, is
$$E[U_A \mid s_A, p_A = 1] = \sum_{s_B \in S} \Pr(s_B \mid s_A) \, E[U(R_j(\theta), n_j) \mid \mathbf{n}_2, s_A, s_B], \quad j = BE_2(x_B, s_B, s_A),$$
where $\mathbf{n}_2$ is the grouping after customer B selects a table $x_B$ according to $BE_1(s_B)$.
Notice here that the best responses of customers A and B are interchanged when their order is exchanged. In such a case, customer A observes both signals $s_A$ and $s_B$ thanks to the postponement. In sum, customer A postpones, i.e., $\pi(s_A) = 1$, when the expected utility of postponing is higher than that of choosing first, and $\pi(s_A) = 0$ otherwise. We denote by $S_p = \{s_A \mid \pi(s_A) = 1\}$ the set of postponing signals.
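The initialization and the first stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a uniform prior over two states, four signals 2, 1, -1, -2 with probabilities `q` under state 1 and the mirrored probabilities under state 2, conditionally independent signals, table sizes 100 and 100r that swap between the two states, and the equal-sharing utility R/n. All function names are hypothetical.

```python
STATES, TABLES, SIGNALS = (1, 2), (1, 2), (2, 1, -1, -2)

def make_model(q, r):
    """Assumed model: q lists Pr(s | theta=1) for s = 2, 1, -1, -2 (mirrored for
    theta=2); the table matching the state has size 100, the other 100*r."""
    like = {1: dict(zip(SIGNALS, q)), 2: dict(zip(SIGNALS, reversed(q)))}
    size = {(j, l): 100.0 if j == l else 100.0 * r for j in TABLES for l in STATES}
    return like, size

def belief(like, signals):
    """Posterior over states from a uniform prior and independent signals."""
    w = {l: 0.5 for l in STATES}
    for s in signals:
        for l in STATES:
            w[l] *= like[l][s]
    total = sum(w.values())
    return {l: w[l] / total for l in STATES}

def be2(like, size, x1, s1, s2):
    """Second chooser's table, knowing the first choice x1 and both signals."""
    g = belief(like, (s1, s2))
    return max(TABLES, key=lambda j: sum(
        g[l] * size[j, l] / (2 if j == x1 else 1) for l in STATES))

def eu_first(like, size, j, s1):
    """Expected utility of choosing table j first with own signal s1."""
    g = belief(like, (s1,))
    total = 0.0
    for l in STATES:
        for s2 in SIGNALS:
            n = 2 if be2(like, size, j, s1, s2) == j else 1
            total += g[l] * like[l][s2] * size[j, l] / n
    return total

def be1(like, size, s1):
    """First chooser's table given only his own signal."""
    return max(TABLES, key=lambda j: eu_first(like, size, j, s1))

def eu_postpone(like, size, s_a):
    """Expected utility of A if he lets B choose first with be1 and then responds."""
    g = belief(like, (s_a,))
    total = 0.0
    for l in STATES:
        for s_b in SIGNALS:
            x_b = be1(like, size, s_b)
            j = be2(like, size, x_b, s_b, s_a)
            n = 2 if j == x_b else 1
            total += g[l] * like[l][s_b] * size[j, l] / n
    return total

def postponing_signals(like, size):
    """Signals for which postponing strictly beats choosing first."""
    return {s for s in SIGNALS
            if eu_postpone(like, size, s) > eu_first(like, size, be1(like, size, s), s)}

# Two illustrative parameter choices (hypothetical values).
low_externality = postponing_signals(*make_model((0.4, 0.3, 0.2, 0.1), 0.6))
winner_takes_all = postponing_signals(*make_model((0.6, 0.25, 0.1, 0.05), 0.0))
```

Under these assumptions, customer A never postpones in the first setting (r = 0.6), because whoever chooses second always takes the empty table, and he postpones for every signal in the second setting (r = 0 with informative signals). Ties are broken in favor of choosing first, which is a design choice of this sketch.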

2nd Stage:
In this stage, customer B has a chance to alter his strategy if he is forced to select a table first. In the previous stage, customer A might choose to decide later when receiving any signal or only a subset of signals. There are three possible cases:
Case 1: customer A does not postpone his decision for any signal. In such a case, customer B follows his original best response function.
Case 2: customer A postpones his decision for every signal, so the postponement reveals no new information to customer B. In such a case, customer B has no choice but to follow the best response function he originally applies.
Case 3: customer A postpones his decision only when he receives a signal from a certain subset. In such a case, the act of postponing itself reveals new information to customer B: customer A must have received a signal from that subset. Customer B may then use this information to alter his strategy when he is forced to choose first.
For customer B, the knowledge that customer A postpones only when receiving a certain subset of signals can be used to improve 1) his belief about the system state and 2) his prediction of customer A's table choice. Specifically, the postponement event implies that customer A received a signal $s_A \in S_p$, which reduces the signal space that customer B must consider. Although customer B does not know the exact signal customer A received, he can use Bayes' theorem to estimate the probability of each signal customer A may have received and update his expected utility for each table $j$ accordingly, using the indicator $I_j$, where $I_j = 1$ if $BE_2(j, s_A, s_B) = j$, that is, if customer A would join him at table $j$, and $I_j = 0$ otherwise.
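The inference customer B draws from a postponement can be sketched as a restricted Bayesian update. The Python function below is an illustration only: it assumes conditionally independent signals, and the names `signal_posterior` and `postponing`, as well as the numerical signal distribution, are hypothetical.

```python
def signal_posterior(prior, like, s_b, postponing):
    """Pr(s_A = s | s_B = s_b, s_A in postponing) for each postponing signal s.

    prior:      dict state l -> Pr(theta = l)
    like:       dict state l -> {signal s: f(s | theta = l)}
    s_b:        customer B's own signal
    postponing: set of signals for which customer A is known to postpone
    """
    # Step 1: B's belief about the state from his own signal.
    w = {l: prior[l] * like[l][s_b] for l in prior}
    total = sum(w.values())
    g = {l: w[l] / total for l in w}
    # Step 2: predictive probability of each candidate signal of A,
    # restricted to the postponing set and renormalized.
    pred = {s: sum(g[l] * like[l][s] for l in g) for s in postponing}
    z = sum(pred.values())
    return {s: p / z for s, p in pred.items()}

# Hypothetical four-signal distribution, mirrored between the two states.
prior = {1: 0.5, 2: 0.5}
like = {1: {2: 0.4, 1: 0.3, -1: 0.2, -2: 0.1},
        2: {2: 0.1, 1: 0.2, -1: 0.3, -2: 0.4}}
post = signal_posterior(prior, like, s_b=2, postponing={2, -2})
```

In this example customer B holds signal 2 and knows that customer A postpones only on signals 2 and -2; he then assigns probability 0.68 to customer A holding signal 2 and 0.32 to signal -2, and can weight customer A's anticipated response by these probabilities.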
The best response of customer B can then be updated as follows:
$$BE_{1,B}(s_B) = \arg\max_{j} E[U(R_j(\theta), n_j) \mid s_B].$$

Termination:
The proposed method terminates when both $BE_1(s_1)$ and $BE_2(\mathbf{n}_2, h_2, s_2)$ remain unchanged after an iteration. When this condition is met, a Nash equilibrium is reached. We output $BE_1(s_1)$, $BE_2(\mathbf{n}_2, h_2, s_2)$, and the postponement strategy.
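The termination rule amounts to a fixed-point check on the strategy tables. A minimal Python sketch of the outer loop is given below; the two stage functions are passed in as arguments, and the iteration cap `max_iters` is an assumption of this sketch, since the text gives no bound on the number of iterations. The toy stage functions only exercise the loop and are not the stages of the game.

```python
def iterate_until_stable(strategies, stage1, stage2, max_iters=100):
    """Alternate the two stages until no strategy changes in a full iteration.

    strategies: any value comparable with ==, e.g. a dict of lookup tables
    stage1:     function updating customer A's postponement strategy
    stage2:     function updating customer B's first-mover best response
    max_iters:  safety cap (an assumption; the text gives no convergence bound)
    """
    for iteration in range(1, max_iters + 1):
        updated = stage2(stage1(strategies))
        if updated == strategies:
            # Nothing changed during this iteration: fixed point reached.
            return updated, iteration
        strategies = updated
    raise RuntimeError("no fixed point within max_iters iterations")

# Toy stand-ins for the two stages, only to exercise the loop.
toy_stage1 = lambda s: {**s, "postpone": s["be_b"] == 2}
toy_stage2 = lambda s: {**s, "be_b": 2 if s["postpone"] else 1}
result, rounds = iterate_until_stable(
    {"postpone": False, "be_b": 2}, toy_stage1, toy_stage2)
```

Comparing whole strategy tables after both stages, rather than after each stage separately, mirrors the stated condition that the best responses remain unchanged after an iteration.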

SIMULATION RESULTS
We now study the proposed two-player case through simulations. In this section, we consider two customers A and B and two tables $\{1,2\}$ in a Chinese restaurant with two possible states $\theta \in \{1,2\}$. The size of table $j$ is given by $R_j(\theta)$, where $j$ is the table index and $\theta$ is the state of the restaurant; $R_j(\theta)$ is unknown to all customers. When $\theta = 1$, the size of table 1 is $R_1(1) = 100$ and the size of table 2 is $R_2(1) = 100r$, where $r$ is the ratio of table sizes. The signal distribution is characterized by four probabilities $q_1, q_2, q_3, q_4$, where $q_1 + q_2 + q_3 + q_4 = 1$ and $1 > q_1 > q_2 > q_3 > q_4 > 0$, which can be regarded as the quality of signals. When the quality of signals is higher, the corresponding signal is more likely to reflect the true state $\theta$. The utility of a customer choosing table $j$ is given by $U(R_j(\theta), n_j) = R_j(\theta)/n_j$, where $n_j$ is the number of customers choosing table $j$ at the end of the game. We investigate how the quality of signals and the ratio of table sizes each influence the decision order and the utility of customers.
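The trade-off between table size and crowding in this setup can be checked with a few lines of Python. The sketch below assumes that the table sizes are mirrored in the second state (the larger table is then table 2), which is a hypothetical choice for illustration; the function names are illustrative as well.

```python
def table_size(table, state, r):
    """Size of `table` in `state`: 100 for the larger table, 100*r otherwise.
    The mirrored layout for state 2 is an assumption of this sketch."""
    return 100.0 if table == state else 100.0 * r

def utility(size, n):
    """Equal-sharing utility U(R, n) = R / n."""
    return size / n

# State 1, ratio r = 0.6: sharing the larger table vs. sitting alone at the smaller one.
shared_large = utility(table_size(1, 1, 0.6), 2)
alone_small = utility(table_size(2, 1, 0.6), 1)
# Ratio r = 0: the smaller table is worthless, so only the larger table pays off.
alone_small_r0 = utility(table_size(2, 1, 0.0), 1)
```

With a ratio of 0.6, sitting alone at the smaller table (utility 60) beats sharing the larger one (utility 50), so negative network externality outweighs table size. With a ratio of 0, the smaller table yields nothing, and sharing the larger table (utility 50) becomes the better outcome.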
First, we show the results for two settings: table size ratio $r = 0.6$ and $r = 0$. When $r = 0.6$, the size of the second table is 60. The utility of customers under different signal qualities is illustrated in Figure 1. Customer A always decides to choose first regardless of the signal he receives, and he has a significantly higher expected utility than customer B. In this case, the network externality effect gives the first customer a larger advantage than the learning process gives the second.
When $r = 0$, the size of the second table is 0, which means that both customers want to choose the right table. The result differs from that of the original Chinese Restaurant Game, where the first customer should have lower utility. From Figure 2, we can see that customer A still has higher utility than customer B. This shows that offering the option to postpone his decision changes the outcome and gives the first customer a significantly larger advantage.
In detail, we list the Nash equilibrium of the game given the received signals when $r = 0$ in Figure 3(a) and 3(b). There are four possible types of pure-strategy Nash equilibrium. Type 1 and Type 4 represent the cases in which customer A always chooses first and always chooses later, respectively. In Type 2 and Type 3, customer A postpones his decision only when certain signals are received; the two types differ in how customer B responds.
Figure 3(a) shows an example of Type 1, that is, customer A always chooses to decide first. When the highest signal quality $q_1$ is low ($q_1 < 0.3$), customer A follows the signal he received, decides first, and chooses the table expected to be bigger. Customer B, as the second customer, chooses the table that is still empty. In this case, the low quality of the signal leads customer B to believe that the expected utilities of both tables are similar. Given that the signal quality is low, the benefit of choosing later is very limited, which causes customer A to always choose first.
However, in Figure 3(b), we can see Type 4, where customer A always decides later regardless of the signal he received. When $q_1$ is high ($q_1 > 0.57$), the best response of customer A is the opposite, i.e., he decides later and chooses the table indicated as the larger one by the signal he received. The postponement strategy shows its value under this setting: when imperfect but high-quality signals are given, the uncertainty about the table size is low, which leads to clearly different expected sizes for the two tables. In such a case, customer A prefers to decide later in order to collect more signals and identify the larger table, because it provides a higher expected utility than choosing the smaller table alone. Learning from previous signals gives the later customer a significant advantage. Therefore, customer A postpones his decision to gain the advantage of learning.
Under some specific settings, we observe Type 2 and Type 3. In Type 2, customer A decides later only when he receives a signal from a certain subset; customer B then becomes the first customer and decides first without changing his strategy. In Type 3, customer A also decides later only when he receives a signal from a certain subset, but customer B, on becoming the first customer, changes his strategy. At $r = 0$, when $0.29 < q_1 < 0.58$, the quality of the other signals may affect the strategy applied by customer A. For instance, in Figure 4, we can see that when $q_1 = 0.33$, all four types of Nash equilibrium appear. When $q_2$ is not close to $q_1$ and $q_3$ is not close to $q_4$, signals 2 and -2 are more informative than signals 1 and -1. In such a case, customer A chooses to postpone his decision only when receiving signal 2 or -2. The reason is that when customer A receives one of these informative signals, he can safely assume that he has already found the larger table, and he predicts that either both customers will choose the same large table or customer B will choose the other table. In such a case, choosing a table later does no harm, and he can collect more signals to identify the larger table more accurately if he postpones his decision. In sum, customer A postpones his decision only when receiving a more informative signal. For customer B, signals 1 and -1 lead to a different belief about the state than signals 2 and -2. Therefore, customer B changes his strategy only when signals 1 and -1 are strong enough to give him confidence that the other table is the right table. In such a case, the Type 3 outcome is reached. Otherwise, the Type 2 outcome is expected.

CONCLUSION
In this paper, we study the influence of the decision order and how agents may benefit from the postponement strategy from a game-theoretic perspective. We first present a general model and then study a two-player case. By analyzing the model, we classify the Nash equilibria into four types according to the execution of the postponement strategy and its influence on the opponent's strategy. We propose an iterative-update algorithm to derive the Nash equilibrium of the proposed game. Through simulations, we show that rational agents strategically postpone their decisions when the advantage of learning is significant enough to cover the loss from negative network externality.