A New Mechanism of Dynamic Spectrum Access Based on Restless Bandit Allocation Indices
Zhu Jiang

Based on the theory of the Restless Multi-Armed Bandit (RMAB) model, a novel dynamic spectrum access mechanism is proposed to address the problem of coordinating multiple users that access multiple idle channels. Firstly, because sensing errors are unavoidable in a practical network, a Whittle index policy that handles sensing errors effectively is derived. Under this policy, each user maintains a belief value for every channel based on accumulated historical experience and, using these belief values, chooses the channels to sense and access by weighing both immediate and future rewards. Secondly, a multi-bid auction algorithm is used to resolve collisions among secondary users when they select channels, which improves spectrum utilization. The simulation results demonstrate that, in the same environment, cognitive users under the proposed mechanism achieve higher throughput than under mechanisms that do not handle sensing errors or do not use the multi-bid auction.


INTRODUCTION
Wireless communication networks currently face two major problems. Firstly, because spectrum resources are scarce, efficient use of the spectrum is very important [1]. Secondly, it is very difficult to obtain complete information about the network environment. The key question is therefore how to optimize the usage of spectrum resources with limited information about the network environment. In wireless networks, spectrum access based on cognitive radio has become a research hotspot [2][3]. The RMAB model has been adopted in much of the literature because its optimization does not depend on information about the environment [4]. In [5], the problem of multichannel allocation in single-hop mobile networks with multiple service classes was formulated as an RMAB, and sufficient conditions for the optimality of a myopic-type index policy were established. The authors of [6] proposed a spectrum access mechanism based on Whittle's index policy, proved the indexability of Whittle's index policy, and obtained the optimal solution of Whittle's index policy under Lagrangian relaxation. Existing spectrum access research based on the RMAB model has two limitations: on the one hand, it does not consider the impact of a user's current channel access on its future access behavior; on the other hand, it provides no mechanism or rule for handling the collisions that occur when multiple users access the same channel. Therefore, we propose a spectrum access mechanism that considers the impact of the user's current channel access on future access behavior, derives a Whittle's index policy that handles sensing errors effectively, and improves the channel access strategy. Meanwhile, we optimize the spectrum allocation through a multi-bid auction mechanism that handles user collisions. Finally, we verify that the proposed mechanism realizes an approximately optimal spectrum allocation in the network and achieves better spectrum efficiency.

Channel Model
We consider a cognitive network with $K$ users that can dynamically use a spectrum divided into $N$ channels. Users are indexed by $k = 1, \dots, K$ and channels by $n = 1, \dots, N$. Each channel is in one of two states, 0 (busy) or 1 (idle). The state $s_n(t)$ of channel $n$ is modeled as a Markov chain with transition probability $\Pr\{s_n(t) \to s_n(t+1)\}$, and the $N$ channels are modeled as independent [7]. Because of fading, noise, differences in user location and so on, different channels have different qualities for the same user, and the same channel has different qualities for different users. The quality of a given channel for a given user varies with time, but its average can be regarded as time-invariant. Therefore, if channel $n$ is allocated to user $k$, user $k$ obtains the average transfer rate $g_{n,k}$.
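The channel model above can be illustrated with a minimal simulation sketch. It assumes, purely for illustration, that all channels share the same two transition probabilities; the names `p01` (busy to idle) and `p11` (idle to idle) and their numerical values are hypothetical and do not come from this paper.

```python
import random


def step_channels(states, p01, p11, rng):
    """Advance every channel by one time slot.

    states: list of 0/1 values (0 = busy, 1 = idle)
    p01:    assumed probability of moving from busy (0) to idle (1)
    p11:    assumed probability of staying idle (1 -> 1)
    """
    next_states = []
    for s in states:
        p_idle = p11 if s == 1 else p01
        next_states.append(1 if rng.random() < p_idle else 0)
    return next_states


def stationary_idle_probability(p01, p11):
    """Long-run fraction of slots in which one channel is idle."""
    return p01 / (p01 + (1.0 - p11))


# Simulate N = 15 independent channels over many slots.
rng = random.Random(0)
states = [0] * 15
num_slots = 20000
idle_count = 0
for _ in range(num_slots):
    states = step_channels(states, 0.2, 0.8, rng)
    idle_count += sum(states)
empirical_idle = idle_count / (num_slots * len(states))
```

The empirical idle fraction can be compared with `stationary_idle_probability(0.2, 0.8)` as a sanity check of the simulated chain.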

Protocol Architecture
As shown in Fig. 1, in time slot $t$ every user $k$ has a chance to access $M_k(t)$ channels. We use a multi-bid auction mechanism to handle collisions among users. In the RMAB model, the users update the belief values of the channels according to historical information; the belief value reflects the probability that a channel is idle. Each user then computes an index that measures how attractive it is to activate a particular arm in its current state, and activates the $M_k(t)$ arms with the largest indices in each time slot. Next, each user submits bids, based on its belief values, for the channels it has chosen; the seller determines the optimal channel allocation with the winner decision algorithm and transmits the outcome to the winners. We define the set of channels allocated to user $k$ as $C_k(t)$. User $k$ senses the channels in this set and, if the sensing outcome is 1, accesses the channel and transmits data. After the transmission, the receiver and transmitter of user $k$ exchange an ACK or a NAK to indicate the success or failure of the communication, respectively. At the end of the time slot, the users update the belief values of the channels according to the proposed algorithm.

Description of RMAB Model and Belief Value Update
As the protocol shows, at the beginning of time slot $t$ the users have not yet sensed the channels, so they do not know the channel states. However, each user can compute the conditional probability that each channel is in state 1 given all past decisions and observations. This probability is a sufficient statistic for optimal decision making [8] and is called the belief value in this paper. We define a belief vector for user $k$ over the $N$ channels, together with a vector that denotes the sensing outcome of user $k$ for the $N$ channels. We consider the sensing errors of users and assume that the false alarm probability and the missed detection probability are independent of $t$ and $n$. The false alarm probability is the probability that the sensing outcome is 0 when the channel state is 1, and the missed detection probability $\delta$ is the probability that the sensing outcome is 1 when the channel state is 0. Since user $k$ does not know the real states of the channels, it verifies the accuracy of the channel sensing outcome using the response information obtained after data transmission [8] (the response information is assumed to be completely correct). Taking the false alarm probability and the missed detection probability into account, user $k$ updates the belief value through equation (2) at time slot $t+1$ according to the set of allocated channels $C_k(t)$, the sensing outcome and the response information. The immediate reward is the reward obtained from the selected channels.
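To make the belief update concrete, the following is a minimal Bayesian sketch. It is not a reproduction of equation (2): it assumes that a transmission succeeds (ACK) exactly when the channel is idle, that the response information is error-free, and that a channel sensed as busy is not accessed. The function and parameter names (`propagate`, `update_belief`, `p01`, `p11`, `false_alarm`, `miss`) are hypothetical.

```python
def propagate(belief, p01, p11):
    """One-step Markov prediction of the idle probability."""
    return belief * p11 + (1.0 - belief) * p01


def update_belief(belief, selected, sensed_idle, ack,
                  p01, p11, false_alarm, miss):
    """Return the belief (idle probability) for the next slot.

    belief:      current probability that the channel is idle (state 1)
    selected:    True if the channel is in the user's allocated set
    sensed_idle: True if the sensing outcome is 1
    ack:         True if an ACK followed the transmission
    false_alarm: Pr(sensed busy | channel idle), assumed known
    miss:        Pr(sensed idle | channel busy), assumed known
    """
    if not selected:
        # No observation: keep the prior before propagating.
        posterior = belief
    elif sensed_idle:
        # The user transmitted, so the ACK/NAK reveals the true state.
        posterior = 1.0 if ack else 0.0
    else:
        # Sensed busy, no transmission: Bayes rule with the sensing errors.
        num = belief * false_alarm
        den = num + (1.0 - belief) * (1.0 - miss)
        posterior = num / den if den > 0 else belief
    return propagate(posterior, p01, p11)


# Example: a channel sensed busy with prior belief 0.5.
next_belief = update_belief(0.5, True, False, False,
                            p01=0.2, p11=0.8, false_alarm=0.1, miss=0.05)
```

In this sketch a "busy" sensing outcome lowers the belief without driving it to zero, because a false alarm remains possible.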

Whittle's Index Policy with Sensing Error
We have modeled the dynamic spectrum access problem as an RMAB. Unfortunately, the optimal solution to an RMAB problem is often intractable: the problem has been shown to be PSPACE-hard [9]. We therefore adopt Whittle's index policy, proposed by Whittle in 1988, to solve the RMAB problem. This policy computes an index value for each channel from its belief value, which measures the attractiveness of choosing that channel; user $k$ then chooses the $M_k(t)$ channels with the largest indices as the set of channels to sense in each time slot. In addition, a subsidy $m$ is granted for not choosing a channel, so that the passive action is weighed against the active one and the throughput is made as large as possible. Whittle also introduced the notion of indexability of an RMAB under Whittle's index policy [8]. In this paper, we use the following two lemmas to establish indexability and the expression of Whittle's index.
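The selection step of the index policy can be sketched as follows. The helper name `select_channels` and the example index values are hypothetical; how the indices themselves are computed is left to the rest of this section.

```python
def select_channels(indices, m):
    """Return the positions of the m largest index values.

    indices: list of index values, one per channel (0-based positions)
    m:       number of channels the user may sense in this slot
    Ties are broken in favour of the lower channel position.
    """
    ranked = sorted(range(len(indices)),
                    key=lambda n: indices[n], reverse=True)
    return sorted(ranked[:m])


# Example with four channels and M_k(t) = 2.
chosen = select_channels([0.2, 0.9, 0.5, 0.7], 2)
```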

Lemma 1: An arm is indexable if the passive set $U(m)$ of the corresponding single-armed bandit process with subsidy $m$ monotonically increases from $\emptyset$ to the whole state space as $m$ increases from $-\infty$ to $+\infty$. An RMAB is indexable if every arm is indexable [9].

Lemma 2: If an arm is indexable, its Whittle index $W(\omega)$ of the state $\omega_{n,k}(t)$ is the infimum subsidy $m$ such that it is optimal to make the arm passive (not selected) at $\omega_{n,k}(t)$. Equivalently, $W(\omega)$ is the infimum subsidy $m$ that makes the passive and active actions equally rewarding:

$$W(\omega) = \inf_m \{\, m : V_m(\omega; 0) = V_m(\omega; 1) \,\}$$

where $V_m(\omega; u)$ denotes the expected reward of taking action $u$ (0 for passive, 1 for active) in state $\omega$ under subsidy $m$.

Through a series of analysis and reasoning, Whittle's index has been obtained for the ideal situation [9].

Theorem 1: Whittle's index with sensing error is given as follows:
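Independently of the closed form in Theorem 1, Lemma 2 suggests a numerical way to obtain the index: search for the subsidy at which the passive and active actions become equally rewarding. The sketch below does this by bisection for a generic finite-state arm. It is an illustration only, not the belief-state derivation of this paper; it assumes the arm is indexable, uses a discounted reward with an assumed factor `beta`, and all names (`q_values`, `whittle_index`, `reward`, `p_passive`, `p_active`) are hypothetical.

```python
def q_values(reward, p_passive, p_active, subsidy, beta, iters=500):
    """Value iteration for a single arm with a subsidy for passivity.

    Returns (q_passive, q_active), two lists indexed by state.
    """
    n = len(reward)
    v = [0.0] * n
    q0, q1 = list(v), list(v)
    for _ in range(iters):
        q0 = [subsidy + beta * sum(p_passive[s][j] * v[j] for j in range(n))
              for s in range(n)]
        q1 = [reward[s] + beta * sum(p_active[s][j] * v[j] for j in range(n))
              for s in range(n)]
        v = [max(a, b) for a, b in zip(q0, q1)]
    return q0, q1


def whittle_index(state, reward, p_passive, p_active,
                  beta=0.9, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection on the subsidy m, assuming the index lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q0, q1 = q_values(reward, p_passive, p_active, mid, beta)
        if q0[state] >= q1[state]:
            hi = mid  # passive is already optimal, so the index is <= mid
        else:
            lo = mid  # active is still better, so the index is > mid
    return 0.5 * (lo + hi)


# Sanity case: when the action does not affect the transitions,
# the index reduces to the immediate reward of the state.
P = [[0.7, 0.3], [0.4, 0.6]]
example_index = whittle_index(1, [0.0, 1.0], P, P)
```

Bisection relies on the monotonicity stated in Lemma 1: once the passive action is optimal for some subsidy, it stays optimal for every larger subsidy.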

Multi-Bid Auction Mechanism
According to the protocol frame, collisions occur when several users choose the same channel. We therefore use a multi-bid auction mechanism to resolve the collisions and achieve reasonable utilization of spectrum resources. We regard the environment of the users as a marketplace in which the goods are the two-dimensional time-frequency channel resources. Each auction round corresponds to a time slot, and the key steps are the bid, the winner decision and the levy. The concrete process is as follows.

a) Bid
Let $B_k(t)$ denote the bid vector of user $k$ at time slot $t$, and let $B_{-k}(t)$ denote the bid vectors of all users except user $k$ at time slot $t$. The element $b_{n,k}(t) \in B_k(t)$ denotes the bid paid by user $k$ for channel $n$.

b) Winner Decision
After the bids of round $t$ are submitted, each channel is allocated to the user determined by the winner decision. The basic criterion of the winner decision is to maximize the auctioneer's revenue.

c) Levy Strategy
Because users are rational and selfish, some users may try to obtain certain channels by cheating. For example, user $k$ may obtain channel $n$ by inflating its bid for channel $n$. We therefore use a levy strategy to deal with misreported bids.
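As a rough illustration of the winner decision and of a charge that discourages inflated bids, the sketch below allocates each channel to its highest bidder and charges the second-highest bid. The payment rule is the standard second-price (Vickrey) rule, swapped in as a stand-in; it is not the levy strategy of this paper. The sketch also assumes that a user may win any number of channels, so that revenue can be maximized channel by channel. The function name `run_auction` and the example bids are hypothetical.

```python
def run_auction(bids):
    """Per-channel winner decision with a second-price charge.

    bids: dict mapping user -> dict mapping channel -> bid value
    Returns a dict mapping channel -> (winner, price).
    """
    offers_per_channel = {}
    for user, user_bids in bids.items():
        for channel, bid in user_bids.items():
            offers_per_channel.setdefault(channel, []).append((bid, user))

    outcome = {}
    for channel, offers in offers_per_channel.items():
        # Highest bid first; ties keep the order in which users were listed.
        offers.sort(key=lambda offer: offer[0], reverse=True)
        winner = offers[0][1]
        # Stand-in charge: the second-highest bid, or 0 with a single bidder.
        price = offers[1][0] if len(offers) > 1 else 0.0
        outcome[channel] = (winner, price)
    return outcome


example_bids = {
    "u1": {1: 0.9, 2: 0.4},
    "u2": {1: 0.6, 3: 0.7},
    "u3": {2: 0.5},
}
example_outcome = run_auction(example_bids)
```

Under this stand-in rule the price a winner pays does not depend on its own bid, which removes the incentive to raise a bid merely to secure a channel.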

SIMULATION ANALYSIS
In this paper, we mainly simulate two environments with different missed detection probabilities ($\delta = 0.05$ and $\delta = 0.12$). The labels in the simulation diagrams are as follows: 'W + error + bid' denotes the mechanism that combines the Whittle's index policy considering sensing error with the multi-bid auction. Finally, we use an enumerative algorithm to obtain the approximate system traffic limit as the baseline for judging mechanism performance.
In Fig. 2 ($K = 7$, $N = 15$), we can see that the mechanism that handles sensing error keeps the throughput of the users unchanged under different missed detection probabilities, whereas the throughput of the mechanism that does not handle sensing errors declines. In addition, the total system throughput declines as the missed detection probability increases in both mechanisms.
In Fig. 3 and Fig. 4 we use different missed detection probabilities: $\delta = 0.05$ in Fig. 3 and $\delta = 0.12$ in Fig. 4. We can see that the Whittle's index policy that handles sensing error lets each user estimate the idle probability of a channel in the current slot accurately and thereby improve its own throughput. Meanwhile, the performance of the access mechanism that does not handle sensing error declines as the missed detection probability increases.

SUMMARY
In this paper, we have considered the problem that users do not know the states and the usage of the channels when they enter a new network area. We proposed an adaptive channel access mechanism based on the RMAB model for the situation in which multiple users can access multiple channels simultaneously. In addition, we combined Whittle's index policy and the multi-bid auction under the RMAB model to achieve the optimal channel bandwidth reward (throughput) for the users. The simulation results show that the proposed mechanism lets the users obtain higher channel bandwidth than other access mechanisms, and that when the channel is modeled as a continuous switching process, the users can still learn the state transitions accurately from historical experience and effectively improve the channel utilization rate.

Fig. 1: Protocol frame.

The belief value is the idle probability of each channel as estimated by user $k$ at time slot $t$. The dynamic multi-channel access problem is modeled as an RMAB, whose value function denotes the maximum expected reward obtained at time slot $t$ under the belief values.

Fig. 2: The impact of missed detection probabilities.

Fig. 3