Coevolution of Multiagent Systems using NEAT
Abstract

This experiment uses NeuroEvolution of Augmenting Topologies (NEAT) to create a multiagent system that coevolves cooperative learning agents on a learning task in which three predators hunt a prey. Since the prey is hardcoded in a way that eludes the nearest predator, methods to capture it elude supervised learning techniques, i.e. cooperation is required to succeed. This work is in part an extension of Yong and Miikkulainen's experiment Coevolution of Role-Based Cooperation in Multiagent Systems, except that the multiagent system uses NEAT as the adaptive method rather than Symbiotic, Adaptive NeuroEvolution Enforced SubPopulations (SANE ESP). The NEAT multiagent system coevolves three heterogeneous networks with a common fitness function that rewards being close to the prey at capture or at the imposed time limit. Fitness scores among the population grew promisingly, but stagnation proved a large obstacle in later generations. Emergent teamwork and clear evidence of heterogeneity leading to different roles were found in both behavior and topology. This experiment builds on past research to create a rudimentary NEAT system that could not have accomplished the same goal with a homogeneous or single-agent system.
1 Introduction

Cooperative multiagent problem solving involves agents working together toward a shared goal. By working in parallel, multiagent systems can solve problems with more efficiency, robustness, and flexibility than single-agent systems. Implementing cooperation requires considering factors such as whether to coordinate the agents with a single controller or with separate, diverse controllers, i.e. homogeneity vs. heterogeneity, and whether communication is truly necessary.
Yong and Miikkulainen demonstrate in their paper Coevolution of Role-Based Cooperation in Multiagent Systems a multiagent system that solves a predator-prey capture task, with 3 predators chasing 1 prey, using role-based cooperation, which is defined as non-communicating cooperation based on stigmergic coordination. The cooperative learning task is solved by evolving neural networks in a number of ways to determine the most effective setup. The optimal setup was found to be coevolving heterogeneous, autonomous networks without communication.
It is the goal of this paper to recreate Yong and Miikkulainen's multiagent system using a different adaptive method, NEAT¹, rather than SANE ESP. The learning task is the same: 3 predators evolving to chase and eventually capture a prey hardcoded to run away from the predators. A simulator was created to copy the task domain of Yong and Miikkulainen's discrete grid world. However, the simulator uses a toroidal world with a continuous coordinate system. This is arguably better because steps become more natural, since the predators can head in any direction. The main variable changed for analysis is simply the type of neural network used to coevolve the predators. By changing the type of neural network while possibly recreating the same result of role-based coordination, we can confirm the robustness of applying the concepts of heterogeneity and stigmergy.

¹We use the Python implementation by Kenneth O. Stanley, which can be found on his website: www.cs.ucf.edu/~kstanley/neat.html
SANE ESP, or Symbiotic, Adaptive NeuroEvolution Enforced SubPopulations, is an extension of SANE, which evolves a population of neurons by having each chromosome represent the connections of one neuron rather than an entire network. SANE ESP evolves a separate population for each hidden-layer neuron in the network, and the neuron populations can be evolved simultaneously. ESP acts as a cooperative coevolution method, with neurons ideally converging to the roles that bring the highest fitness.
NEAT, or NeuroEvolution of Augmenting Topologies, is an adaptive method that evolves networks starting with a topology of only inputs and outputs. Over generations of evolution, new hidden nodes and connections are added. By complexifying the space of potential solutions and elaborating on simple strategies to form more complex ones, the network is more likely to build on existing solutions rather than alter them. Since the experiment uses a complexifying method in NEAT, no incremental task is needed for the learning agents to converge to a solution. For example, the speed of the prey does not need to be incremented to let the predators become accustomed first. However, SANE ESP should prove more efficient, since each of its chromosomes only has to represent the connections of one neuron rather than those of a whole neural network, as in this experiment. Even with a bulkier chromosome population, we can hope to converge to a high-level solution.
Stanley and Miikkulainen's paper Competitive Coevolution through Evolutionary Complexification shows that complexification, defined as the incremental elaboration of solutions through adding new structure, allows NEAT to develop more complex strategies that reach more optimal solutions.
Communication in the context of this experiment consists of giving each predator the position offsets of the other predators as inputs to its neural network. The question arises when determining the inputs we give our learning agents: in addition to the prey offsets, do we have the predators broadcast their locations to each other?
Yong and Miikkulainen concluded in their paper, by direct comparison, that communication is actually detrimental to performance when stigmergic coordination is possible. Stigmergy refers to indirect coordination between agents, achieved by observing the changes in the environment caused by the agents' actions. When stigmergy can be achieved, cooperation becomes much less costly, because communication requires additional resources to extract meaningful signals from largely noisy inputs. Since this experiment also makes room for stigmergic coordination, the NEAT networks will be coevolved without communication and will instead rely on stigmergy.
Heterogeneity in developing learning agents means evolving the cooperative agents separately, whereas homogeneity means using the same single network for every agent. Heterogeneity has been found to be more effective, since it opens up the possibility of different strategies among the agents in the form of specialized roles.
In his paper Behavioral Diversity in Learning Robot Teams, Balch concludes that heterogeneity works better in situations where there has to be a division of labor, i.e. the task cannot be solved by one agent alone. The skills required to accomplish the goal, e.g. capturing the prey in our predator-prey task, involve cornering the prey in a way that requires multiple agents to move in for the kill at once.
More relevant is Yong and Miikkulainen's test of robustness with heterogeneous agents. Homogeneous agents can only coordinate through communication, since their behavior is otherwise too similar, especially when their initial positions are close together. With homogeneous teams there was a success rate of only 3%, since establishing roles and breaking symmetry both proved too difficult: the homogeneous teams could not differentiate well enough to break off and go their own ways, while heterogeneous agents were found to be naturally effective at breaking symmetry.

The main hypothesis of this paper is that it is possible to demonstrate role-based cooperation when coevolving multiagent systems on the same learning task but with a different adaptive method, NEAT, effectively emulating Yong and Miikkulainen's most effective system of heterogeneous, non-communicating coevolution. The hypothesis that the costs of communication outweigh the benefits will also be tested.
Two important differences in the implementation are (1) the switch from a discrete grid to a continuous world in which each agent can move in any direction at each step, rather than only North, South, East, or West, and (2) taking advantage of NEAT's complexification by using a non-incremental task, which requires less supervision from the user and is a step forward in neuroevolution. Using a different adaptive method to achieve the level of cooperation in Yong and Miikkulainen's experiment would do well to confirm the effectiveness of applying the underlying concepts of heterogeneity and stigmergy, as well as the very idea of using multiagent systems.
The evaluation of this hypothesis will be done by looking for cases of heterogeneity and cooperation in the emergent behaviors of the multiagent system. There are too many differences in the implementation to be able to directly compare the results to those of Yong and Miikkulainen's experiment.
2 Experimental Setup
2.1 Prey-Capture Task

The learning task follows the lead of Yong and Miikkulainen's experiment, with three predators starting in the bottom-left corner and chasing the prey. Yong and Miikkulainen used a toroidal, discrete 100 x 100 grid in which agents are allowed to overlap and, at each step, either move North, South, East, or West, or not move at all. This experiment instead uses a 300 x 300 toroidal world, created using the Zelle graphics library, with a continuous coordinate system in which agents of radius 10 can move in any direction.
Commands are given to the agents as an ordered pair of numbers representing translation and rotation. These commands act as the outputs of the NEAT networks used to control each predator. The prey is captured when it comes into contact with one of the predator agents, i.e. when a predator overlaps with the prey. The prey is hardcoded to run away from the closest predator.
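As a concrete illustration, the motion and capture rules above can be sketched in Python. The names and the exact command semantics (rotation applied before translation, no clamping of command values) are assumptions, since the text does not specify them:

```python
import math

WORLD = 300.0    # side length of the toroidal world (Section 2.1)
RADIUS = 10.0    # agent radius (Section 2.1)

def step_agent(x, y, heading, translation, rotation):
    """Apply one (translation, rotation) command, wrapping around the torus.

    Assumption: rotation updates the heading first, then the agent moves
    `translation` units along the new heading.
    """
    heading = (heading + rotation) % (2 * math.pi)
    x = (x + translation * math.cos(heading)) % WORLD
    y = (y + translation * math.sin(heading)) % WORLD
    return x, y, heading

def captured(pred, prey):
    """Prey is caught when a predator overlaps it: toroidal distance < 2 * RADIUS."""
    dx = abs(pred[0] - prey[0]); dx = min(dx, WORLD - dx)
    dy = abs(pred[1] - prey[1]); dy = min(dy, WORLD - dy)
    return math.hypot(dx, dy) < 2 * RADIUS
```

Note that the capture check measures distance across the wrap-around edges, so a predator at x = 295 and the prey at x = 2 count as overlapping.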
To speed up evolution, Yong and Miikkulainen incremented the difficulty of capturing the prey, with the prey going, in terms of probabilities, from never moving to always moving away from the closest predator.

Figure 1: In this experiment, the predators are initially in the bottom left corner with the prey in the center of the board. The predators are rewarded for being close to the prey and for capture.
This experiment differs in that the prey always runs away from the nearest predator from the very beginning, but at 70% of the predators' speed. This is a good deal harder to accomplish, but the hope is that the complexification involved in NEAT will allow the predators to do better than with SANE ESP.
A large difference is that the initial position of the prey is fixed at the center of the board, as in Figure 1, unlike in Yong and Miikkulainen's experiment, where the prey was placed randomly in one of nine boxes of area in their 100 x 100 discrete grid. Those nine areas also served as benchmark tests for how robust their predators were. Because of the difficulty and time involved in evolving such a robust system, the NEAT multiagent system will instead be confined to one simple goal: it must be able to track down the prey, which always starts at the center of the board and is thus, because of the toroidal world, at the maximum initial distance.
The toroidal world adds a layer of difficulty by ensuring that simply running after the prey won't work. The prey has to be cornered and approached from different directions at once. A toroidal world requires the offsets, distances between agents, and angles between agents to account for the fact that the shortest path may wrap around the edges of the grid. The prey offsets used as NEAT inputs account for this as well.
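The wrap-aware offsets and the hardcoded prey policy described above might look like the following sketch. The function names are hypothetical; only the 70% speed ratio and the flee-from-nearest rule come from the text:

```python
import math

WORLD = 300.0  # side length of the toroidal world

def toroidal_offset(a, b):
    """Shortest (dx, dy) vector from a to b, allowing wrap-around at the edges."""
    dx = (b[0] - a[0] + WORLD / 2) % WORLD - WORLD / 2
    dy = (b[1] - a[1] + WORLD / 2) % WORLD - WORLD / 2
    return dx, dy

def prey_step(prey, predators, predator_speed):
    """Flee directly away from the nearest predator at 70% of predator speed."""
    dx, dy = min((toroidal_offset(prey, p) for p in predators),
                 key=lambda o: math.hypot(*o))
    d = math.hypot(dx, dy) or 1.0          # avoid division by zero on contact
    s = 0.7 * predator_speed
    return ((prey[0] - dx / d * s) % WORLD,
            (prey[1] - dy / d * s) % WORLD)
```

The modulo trick in `toroidal_offset` maps every offset into the range [-150, 150), so the returned vector is always the shortest of the direct and wrap-around paths.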
There is also a time limit of 75 steps, which was found to be the time it takes a predator to go from corner to corner. This eliminates accidental captures, which would award bonuses to solutions that deserve a lower score. With a time limit, the predators are forced to evolve a strategy that doesn't waste time. This also follows Yong and Miikkulainen's lead.
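The text does not give the exact fitness formula, only that the shared fitness rewards closeness to the prey at capture or at the time limit. One plausible sketch, where the normalization and the capture bonus are entirely assumptions, is:

```python
import math

WORLD = 300.0
TIME_LIMIT = 75  # steps; roughly the corner-to-corner travel time

def team_fitness(final_dists, caught):
    """Hypothetical shared fitness: average closeness to the prey at episode end,
    plus a flat bonus if the prey was captured (exact shaping is an assumption)."""
    max_dist = (WORLD / 2) * math.sqrt(2)  # largest possible toroidal separation
    closeness = sum(1.0 - d / max_dist for d in final_dists) / len(final_dists)
    return closeness + (1.0 if caught else 0.0)
```

Because the fitness is shared by all three predators, a network is rewarded when its teammates corner the prey even if it was not the one to make contact, which is what allows role specialization to be selected for.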
2.2 Coevolution of NEAT Populations

Each NEAT network has 2 inputs and 2 outputs, with the parameters shown in Table 2.2. The two inputs are the prey's x and y offsets (accounting for toroidal distance), and the two outputs are the translation and rotation used to update the predator's position at each step.
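Wiring the 2-input/2-output networks into one evaluation episode could be sketched as below. The starting positions, the reward shaping, and the omission of prey movement are simplifying assumptions; `net` stands in for any NEAT-evolved controller mapping the prey offsets to a command:

```python
import math

WORLD, STEPS = 300.0, 75  # world size and time limit from Section 2.1

def toroidal_offset(a, b):
    """Shortest (dx, dy) from a to b on the torus."""
    dx = (b[0] - a[0] + WORLD / 2) % WORLD - WORLD / 2
    dy = (b[1] - a[1] + WORLD / 2) % WORLD - WORLD / 2
    return dx, dy

def evaluate_team(nets):
    """One episode: three heterogeneous controllers share a single fitness.

    Each net maps the prey's (dx, dy) offsets to (translation, rotation).
    Prey movement is omitted for brevity in this sketch.
    """
    preds = [[10.0, 10.0], [20.0, 10.0], [10.0, 20.0]]  # bottom-left start (assumed)
    headings = [0.0, 0.0, 0.0]
    prey = (150.0, 150.0)                               # center of the board
    for _ in range(STEPS):
        for i, net in enumerate(nets):
            dx, dy = toroidal_offset(preds[i], prey)
            t, r = net(dx, dy)
            headings[i] = (headings[i] + r) % (2 * math.pi)
            preds[i][0] = (preds[i][0] + t * math.cos(headings[i])) % WORLD
            preds[i][1] = (preds[i][1] + t * math.sin(headings[i])) % WORLD
            if math.hypot(*toroidal_offset(preds[i], prey)) < 20.0:
                return 2.0                              # captured: full reward
    dists = [math.hypot(*toroidal_offset(p, prey)) for p in preds]
    return 1.0 - min(dists) / (WORLD / 2 * math.sqrt(2))  # closeness reward
```

In the real experiment each `net` would be one of the three coevolving NEAT networks, and the returned score would be assigned to all three genomes, since the fitness is common to the team.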