EVOLVING AND CONTROLLING PERIMETER, RENDEZVOUS, AND FORAGING BEHAVIORS IN A COMPUTATION-FREE ROBOT SWARM

Designing and controlling simple collective robot behaviors often requires complex range and bearing sensors and peer-to-peer communication strategies. Recent work studying swarms of robots that have no computational power has shown that complex behaviors such as aggregation and item clustering can be produced from extremely simple control policies and sensing capabilities. We extend previous work on computation-free swarm behaviors and show that it is possible to evolve simple control policies to form a perimeter around a target, rendezvous at a specific location, and perform foraging. We also demonstrate that simple manipulations of the environment provide a form of stigmergic control, whereby these collective behaviors can be controlled. The robustness and expressiveness of these behaviors, combined with the simple requirements for control and sensing, demonstrate the feasibility of implementing swarm behaviors at small scales or in extreme environments.


INTRODUCTION
Flocks of birds, schools of fish, and colonies of ants, bees, and termites exhibit remarkable robustness and resilience, despite the limited capabilities of each individual. Recently, research into bio-inspired swarm robotics has been gaining popularity due to the low-cost, robust, redundant, and distributed nature of swarms [3]. Potential applications for robot swarms include search and rescue, construction, and chemical spill clean-up, as well as nano-medical applications such as finding tumors [10]. Many of these applications would benefit from simple, cheap, disposable swarms of robots that can accomplish these tasks quickly and without much human supervision.
While there has been a lot of work on different swarming algorithms and technologies, many still require localization, mapping, complex coordination algorithms, and precise identification of neighboring robots' orientations and relative positions. This often results in swarm behaviors that are interesting but extremely difficult to implement on actual robotic platforms. For swarm applications in the nano-medical field, developing collective behaviors that use extremely simple controllers and sensors is especially important if these behaviors are to have any hope of being implemented on nano-robots [11].
Recently, Gauci et al. have shown that swarms of robots so dumb that they have no computational power (they cannot even add or subtract, and have no memory) can still collectively solve canonical multi-robot problems such as aggregation [6] and simple object clustering [5]. There are several key benefits to researching the capabilities of extremely dumb robot swarms: (1) the dumber the robot, the cheaper and more disposable it is, (2) the simpler the control algorithm, the easier it is to implement on real robots [8,6], and (3) even teams of smart robots may need a "Plan B" consisting of simple, robust algorithms that require only the most basic capabilities in case of malfunctions and failure.
We extend the work of Gauci et al. by showing that many other interesting behaviors can be achieved using swarms of computation-free robots. Our work starts with the simple robot model proposed in [6] and adds a form of stigmergic control: changing the environment both expands the set of possible behaviors and allows collective behaviors to be controlled. We investigate what behaviors are possible given limited control over the placement of a small number of objects in the swarm's environment. We use a genetic algorithm approach to design these swarm behaviors by first defining a fitness function that describes a desired collective behavior, and then searching the space of simple controllers for those that best achieve this behavior. This approach allows us to evolve successively better controllers, using a robot simulator to evaluate potential controllers. We present successful results on three behaviors that are possible using very simple sensors and controllers: forming a perimeter, rendezvous, and foraging. We additionally show that simple manipulations of the environment allow these behaviors to be controlled.

RELATED WORK
Trianni et al. evolved a neural-network-based controller that performs aggregation using swarms of S-bots [13]. However, each S-bot uses eight infrared proximity sensors, three microphones, three sensors for detecting connections on the body, and a gripper sensor. Baldassarre et al. evolved a controller that aggregates a group of robots and then moves them towards a light source [1]. Their controller utilizes a neural network that takes eight infrared proximity sensor readings, four directional light source sensor readings, and four directional sound sensor readings as control inputs. Gauci et al. introduced the concept of robots that can't compute [4]. They showed that a simple reactive controller could be used to allow a swarm of computation-free robots with a single line-of-sight sensor to perform aggregation [6] and clustering [5].
Other work has looked at controlling collective behaviors. Rubenstein et al. [12] studied how to collectively transport items using simple control signals and behaviors. Others have looked at controlling more complex collective transport problems [2] or using termite-inspired stigmergic control to build complex structures [14]. However, none of this work considers the extreme conditions of a single line-of-sight sensor and zero computation.

PROBLEM FORMULATION
This paper investigates what collective behaviors are possible given a swarm of extremely simple robots operating in a simple environment. For our experiments we consider a circularly bounded 2D environment that is homogeneous and contains no obstacles. Throughout this space n circular robots are randomly distributed and randomly rotated such that each robot faces a random direction. Robots learn to interact with immovable targets, movable objects, and other robots to achieve global behaviors. All entities (robots, targets, and objects) are rigid, such that no two entities can occupy the same space at the same time.
We define simple robots as agents that are memoryless, cannot perform computations, and have limited input/output capabilities. Specifically, we look at robots that are equipped only with a line-of-sight sensor and two wheels for differential drive. The line-of-sight sensor can only detect the presence or absence of entities and outputs a ternary value, where s = 2 corresponds to a target or object in line of sight, s = 1 corresponds to a robot in line of sight, and s = 0 corresponds to nothing in line of sight.
Simple robots are reactive in nature because they cannot remember past inputs or actions. As a result, a simple robot controller can be reduced to a sequential series of if-statements that assign left and right wheel velocities based on the current sensor reading. This controller can be represented as a set of six wheel velocities, where $v_{l0}$/$v_{r0}$ are the velocities of the left/right wheel when there is nothing in the sensor's current line of sight, $v_{l1}$/$v_{r1}$ are the respective velocities when a robot is within the line of sight of the sensor, and $v_{l2}$/$v_{r2}$ are the respective velocities when a target or object is within the line of sight of the sensor. Velocities are normalized such that $v \in [-1, 1]$, where 1 corresponds to a wheel spinning forward at full speed and −1 corresponds to a wheel spinning backwards at full speed.
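The reactive controller described above can be sketched as a short chain of if-statements. This is a minimal illustration rather than the authors' implementation: the function name `wheel_velocities` and the ordering of the six velocities as [v_l0, v_r0, v_l1, v_r1, v_l2, v_r2] are assumptions made for the example.

```python
from typing import Sequence, Tuple

# Sensor readings as defined in the text.
NOTHING, ROBOT, TARGET_OR_OBJECT = 0, 1, 2


def wheel_velocities(V: Sequence[float], s: int) -> Tuple[float, float]:
    """Return (left, right) normalized wheel velocities for sensor reading s.

    Assumes V is ordered [v_l0, v_r0, v_l1, v_r1, v_l2, v_r2]; this ordering
    is an assumption of the sketch.
    """
    if len(V) != 6:
        raise ValueError("controller must contain six velocities")
    if any(not -1.0 <= v <= 1.0 for v in V):
        raise ValueError("velocities must lie in [-1, 1]")
    # No memory, no arithmetic: the current reading selects one fixed pair.
    if s == NOTHING:
        return V[0], V[1]
    if s == ROBOT:
        return V[2], V[3]
    if s == TARGET_OR_OBJECT:
        return V[4], V[5]
    raise ValueError("sensor reading must be 0, 1, or 2")
```

Because the mapping has only three fixed cases, it could in principle be realized without any processor at all, which is the point of the computation-free model.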

BEHAVIORS
This section explores several global behaviors learned using evolutionary optimization techniques. We discover global behaviors by optimizing a universal robotic controller according to a behavior-dependent fitness function. Each potential robot controller is evaluated by running a swarm simulation and calculating the fitness function at every time step to generate a fitness score, which is given by
$$U = \sum_{t=1}^{T} t \, u(t),$$
where T is the number of time steps in the simulation and u(t) is the fitness function. Multiplying the fitness function by the time step rewards controllers that achieve the desired behavior quickly. The robot controllers are optimized using the average fitness score over multiple simulations to reduce the effect of noise.
All simulations are run on the Enki 2.0 robot simulator, which is able to simulate hundreds of robots in a 2D environment faster than real time [9]. For our experiments the simulation physics are updated 100 times per second and the robot controller is updated 10 times per second. Robots are simulated using Enki's e-puck model, which has a diameter of 7.4 cm, an inter-wheel distance of 5.1 cm, and a weight of 152 g. Targets and objects are simulated as cylinders with a diameter of 10 cm using Enki's physical object model. Objects have a mass of 35 g and a coefficient of friction of 0.58. Targets have a sufficiently large mass and coefficient of friction to ensure that they are immobile.
Robot controllers are optimized using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [7]. This evolutionary optimization technique adapts a covariance matrix over the genes to generate mutations between generations. Earlier work by Gauci et al. has shown that CMA-ES can effectively optimize simple robotic controllers [5]. CMA-ES optimizes across all real numbers, which can result in genes outside the normalized range. To avoid this, we constrain each gene x optimized by CMA-ES to the normalized range by applying a sigmoid function. For our experiments we utilized the following CMA-ES parameters: a population size of 13, an initial step size of σ(0) = 0.72, and a starting controller of V = [0, 0, 0, 0, 0, 0].
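The time-weighted scoring described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that steps are counted from 1 and that u(t) is a cost the optimizer minimizes, as the distance-based fitness functions in this section suggest.

```python
from typing import Sequence


def fitness_score(u: Sequence[float]) -> float:
    """Time-weighted sum of the per-step fitness values u(1), ..., u(T).

    Step t is weighted by t, so a poor value late in the run costs more
    than the same value early on (assuming lower values are better).
    """
    return sum(t * u_t for t, u_t in enumerate(u, start=1))


def mean_fitness_score(runs: Sequence[Sequence[float]]) -> float:
    """Average the score over several simulations to damp noise."""
    if not runs:
        raise ValueError("need at least one simulation run")
    return sum(fitness_score(u) for u in runs) / len(runs)
```

For example, a run whose cost is incurred only at the first of three steps, `[3, 0, 0]`, scores 3, while a run that incurs the same cost only at the last step, `[0, 0, 3]`, scores 9, so the controller that reaches the desired configuration sooner is preferred.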
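The two update rates mean that each controller output is held for ten physics steps. A minimal sketch of such a two-rate loop for a single robot is given below; it does not use Enki's API, and `run`, `sense`, `controller`, and `step_physics` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Tuple

PHYSICS_HZ = 100  # physics updates per second (from the text)
CONTROL_HZ = 10   # controller updates per second (from the text)
STEPS_PER_CONTROL = PHYSICS_HZ // CONTROL_HZ  # 10 physics steps per command

Command = Tuple[float, float]


def run(duration_s: float,
        sense: Callable[[], int],
        controller: Callable[[int], Command],
        step_physics: Callable[[Command, float], None]) -> int:
    """Run the two-rate loop and return the number of controller updates."""
    dt = 1.0 / PHYSICS_HZ
    command: Command = (0.0, 0.0)
    updates = 0
    for step in range(round(duration_s * PHYSICS_HZ)):
        if step % STEPS_PER_CONTROL == 0:
            # Re-read the sensor and pick new wheel velocities.
            command = controller(sense())
            updates += 1
        # Physics always advances, reusing the most recent command.
        step_physics(command, dt)
    return updates
```

With a one-second run, the controller callback fires 10 times while the physics callback fires 100 times.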
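The gene-constraining step can be sketched as below. The exact sigmoid is not reproduced here, so the form (1 − e^{−x}) / (1 + e^{−x}), which equals tanh(x/2), is an assumption chosen because it maps the real line onto the normalized velocity range; the function names are likewise hypothetical.

```python
import math
from typing import List, Sequence


def constrain(x: float) -> float:
    """Map an unbounded gene to the normalized velocity range.

    Assumed form: (1 - e^-x) / (1 + e^-x) == tanh(x / 2). Mathematically
    the output lies in (-1, 1); in floating point it saturates to +/-1
    for very large |x|.
    """
    return math.tanh(x / 2.0)


def genes_to_controller(genes: Sequence[float]) -> List[float]:
    """Convert six unbounded CMA-ES genes into six wheel velocities."""
    if len(genes) != 6:
        raise ValueError("expected six genes")
    return [constrain(x) for x in genes]
```

Under this assumed mapping, the all-zero starting point corresponds to a controller whose wheels are stationary for every sensor reading, and the optimizer is free to propose any real-valued gene without producing an out-of-range velocity.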

Aggregating to a Target
We first investigate what is possible when a single stationary target is placed in the environment. In this behavior the fitness function is computed from the distance between each robot and the target, where ‖·‖ is the Euclidean norm. This fitness function rewards solutions where the robots are close to the target's location.
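One plausible per-step fitness function of this kind is sketched below. The plain sum of Euclidean distances is an assumption; the original may use a squared or normalized variant, and the name `target_distance_fitness` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def target_distance_fitness(robots: Sequence[Point], target: Point) -> float:
    """Sum of Euclidean distances from each robot to the target.

    Lower values mean the swarm is closer to the target.
    """
    return sum(math.dist(p, target) for p in robots)
```

Evaluating this at every time step yields the u(t) sequence that the overall fitness score is built from.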

Perimeter Formation
The resulting evolved controller is V = [1.0, 0.37, 1.0, 1.0, −1.0, 0.83]. This solution results in the robots aggregating to the target and forming a perimeter around it. We also experimented with changing the location of the target mid-simulation. The results are shown in Figure 2. The robots converge to the target and form a perimeter. When the target is placed in a new location, the entire swarm quickly moves to the new location and reforms the perimeter. This behavior is very robust and is automatic: the robots are purely reactive, so they can be controlled simply by changing the environment, removing the need to broadcast information to the swarm or have additional control logic.
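To get a feel for what the evolved perimeter controller does, its three wheel-velocity pairs can be converted into body motion using standard differential-drive kinematics. This is a sketch under stated assumptions: the ordering of V as [v_l0, v_r0, v_l1, v_r1, v_l2, v_r2] is assumed, `MAX_SPEED_CM_S` is a hypothetical full wheel speed picked for illustration, and wheel slip and collisions are ignored.

```python
from typing import Dict, Tuple

INTER_WHEEL_CM = 5.1   # e-puck inter-wheel distance stated in the text
MAX_SPEED_CM_S = 12.8  # hypothetical full wheel speed, for illustration only

# Evolved perimeter controller from the text (ordering is an assumption).
V_PERIMETER = [1.0, 0.37, 1.0, 1.0, -1.0, 0.83]


def body_motion(v_left: float, v_right: float) -> Tuple[float, float]:
    """Return (forward speed in cm/s, turn rate in rad/s).

    Positive turn rate is counter-clockwise; inputs are normalized
    wheel velocities in [-1, 1].
    """
    vl = v_left * MAX_SPEED_CM_S
    vr = v_right * MAX_SPEED_CM_S
    forward = (vl + vr) / 2.0
    turn = (vr - vl) / INTER_WHEEL_CM
    return forward, turn


# Motion for each sensor reading s = 0 (nothing), 1 (robot), 2 (target/object).
motions: Dict[int, Tuple[float, float]] = {
    s: body_motion(V_PERIMETER[2 * s], V_PERIMETER[2 * s + 1])
    for s in (0, 1, 2)
}
```

Under these assumptions, a robot that sees nothing drives forward along a clockwise arc, a robot that sees another robot drives straight ahead, and a robot that sees the target turns counter-clockwise almost in place while backing up slightly.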

Rendezvous
We are also interested in having every member of the swarm gather as close as possible to the target, rather than just circle around it. Rendezvous is an important behavior for swarms because it sets the stage for more complicated behaviors by assembling a group of robots at a specific desired location. We first tried to find a controller for the rendezvous problem using the fitness function described above; however, all trials resulted in controllers in which robots would form a circle around a target.
To solve this problem we seeded the starting controller with an aggregation solution from Gauci et al.'s earlier work.

Foraging
In this behavior objects and robots are distributed randomly throughout the environment and the robots must gather the objects to a specified target location. Earlier work by Gauci et al. found an optimal controller for clustering objects [5]; we extend their work by showing that this controller can be used for foraging. The clustering fitness function rewards global behaviors that minimize the total distance between each object and the center of the cluster of objects. Let $o_i(t)$ represent the position of object i at time step t and $\bar{o}(t)$ represent the center of the object cluster. Then the fitness function aggregates the distances $\lVert o_i(t) - \bar{o}(t) \rVert$ over all objects $i = 1, \dots, m$, where m is the number of objects. Using the clustering fitness function we evolved the controller V = [0.72, 1.00, 0.40, 0.31, 0.53, −1.00], which causes the robots to circle around the objects and slowly nudge them towards a central point as they pass.
The foraging behavior occurs when we place one or several fixed targets in the environment. Figure 4 shows the classic foraging problem where there is a "nest" location (shown in green) where all of the items must be gathered. Figure 5 shows an alternative foraging scheme where multiple stationary targets are placed in the environment. The convex hull of these targets defines the region into which the objects will be harvested. Similar to the previous behaviors, the foraging behavior can be controlled simply by changing the location of the targets. The robots will then move the items to the new desired location, as shown in Figure 5.
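A clustering fitness function of this kind can be sketched as follows. Taking the cluster center to be the mean object position and using the plain sum of distances are both assumptions (the original may square or normalize the distances), and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of the objects, used here as the cluster center."""
    if not points:
        raise ValueError("need at least one object")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def clustering_fitness(objects: Sequence[Point]) -> float:
    """Total Euclidean distance from each object to the cluster center.

    Lower values mean the objects are more tightly clustered.
    """
    center = centroid(objects)
    return sum(math.dist(o, center) for o in objects)
```

Two objects at (0, 0) and (2, 0) have a center at (1, 0) and a total distance of 2, while objects that all sit at the same point score 0.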
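When several targets are used, one way to check whether an object has reached the harvesting region is to test it against the convex hull of the target positions. The sketch below uses Andrew's monotone chain algorithm; it is an illustrative check, not a measurement procedure taken from the experiments, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def inside_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p is inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear targets")
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))
```

For four targets at the corners of a 4 x 4 square, an object at (1, 1) is inside the region and an object at (5, 1) is not. With a single target the hull degenerates to a point, so a distance threshold would be needed instead.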

CONCLUSIONS AND FUTURE WORK
A large amount of research has been dedicated to developing multi-agent systems that perform complex behaviors. We show that swarms of robots that can't compute can perform complex behaviors such as rendezvous at a desired location, simple perimeter monitoring of a desired location, and foraging in changing environments. Our results demonstrate that complex behaviors can be evolved from simple interactions between agents and that these behaviors can be controlled during execution by simply changing the environment. We also note that these controllers are so simple that they could be hardwired, requiring no computational capabilities. We believe that this research is an important step towards swarm behaviors that can be easily implemented in hardware and produced at small, maybe even nano, scales. In the future we plan to apply these behaviors to actual robots, explore virtual targets and other forms of stigmergic control, and more rigorously explore the space of possible behaviors given our computation-free assumptions.

Figure 2: Perimeter formation around a dynamic target.