forge.blade.core.realm module

forge.blade.core.realm.valToRGB(x)

Maps a value x in the range [0, 1] to an RGB color.

class forge.blade.core.realm.Packet

Bases: object

Wrapper for state, reward, done signals

class forge.blade.core.realm.Spawner(config)

Bases: object

Manager class responsible for agent spawning logic

spawn(realm, iden, pop, name)

Adds an entity to the given environment

Parameters
  • realm – An environment Realm object

  • iden – An identifier index to assign to the new agent

  • pop – A population index to assign to the new agent

  • name – A string to prepend to iden as an agent name

cull(pop)

Decrement the agent counter for the specified population

Parameters

pop – A population index
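
A minimal sketch of driving the Spawner bookkeeping by hand, assuming config is a Neural MMO configuration object and realm an existing Realm instance (documented below); in practice the Realm manages its Spawner internally.

# Hypothetical driver for Spawner; `config` and `realm` are assumed to already exist.
from forge.blade.core.realm import Spawner

spawner = Spawner(config)

# Add agent 1 to population 0; 'Neural_' is prepended to iden as its name.
spawner.spawn(realm, iden=1, pop=0, name='Neural_')

# On the agent's death, decrement population 0's counter.
spawner.cull(pop=0)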

class forge.blade.core.realm.Realm(config, idx=0)

Bases: forge.trinity.ascend.Timed

Neural MMO environment class implementing the OpenAI Gym API function signatures. The actual (ob, reward, done, info) data contents returned by the canonical reset() and step(action) methods conform to RLlib’s Gym extensions in order to support multiple and variably sized agent populations. This means you cannot use preexisting optimizer implementations that expect the OpenAI Gym API. We recommend PyTorch+RLlib to take advantage of our prebuilt baseline implementations, but any framework that supports RLlib’s fairly popular environment API and extended OpenAI gym.spaces observation/action definitions works as well.
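
A minimal usage sketch under stated assumptions: config stands in for a Neural MMO configuration object and compute_decisions for your policy code producing the action dictionary documented under step() below; neither is provided by this module.

# Minimal usage sketch; `config` and `compute_decisions` are placeholders.
from forge.blade.core.realm import Realm

env = Realm(config, idx=0)
obs = env.reset()                               # call once; the world is persistent

for tick in range(1000):
   decisions = compute_decisions(obs)           # {agentID: {ActionType: [args]}}
   obs, rewards, dones, _ = env.step(decisions)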

step(decisions)

OpenAI Gym API step function simulating one game tick or timestep

Parameters

decisions

A dictionary of agent action choices of format:

{
   agent_1: {
      action_1: [arg_1, arg_2],
      action_2: [...],
      ...
   },
   agent_2: {
      ...
   },
   ...
}

Where agent_i is the integer index of the i’th agent

You do not need to provide actions for every agent, nor an action of each type for each agent. Only provided actions for provided agents are evaluated; unprovided action types are interpreted as no-ops, and invalid actions are ignored.

It is also possible to specify invalid combinations of valid actions, such as two movements or two attacks. In this case, one action will be selected arbitrarily from each incompatible set.

A well-formed algorithm should do none of the above. We only perform this conditional processing to make batched action computation easier.
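
For illustration, a hedged sketch of assembling this structure. Move, Attack, north, and target are hypothetical placeholders rather than names exported by this module; substitute the action types your Neural MMO version defines.

# Hypothetical sketch of the decisions structure described above.
decisions = {
   1: {                       # agent 1 moves and attacks this tick
      Move:   [north],
      Attack: [target],
   },
   2: {                       # agent 2 only moves; other action types are no-ops
      Move:   [north],
   },
   # agents omitted entirely take no actions this tick
}
obs, rewards, dones, infos = env.step(decisions)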

Returns

observations:

A dictionary of agent observations of format:

{
   agent_1: obs_1,
   agent_2: obs_2,
   ...
}

Where agent_i is the integer index of the i’th agent and obs_i is the observation of the i’th agent. Note that obs_i is a structured datatype, not a flat tensor. It is automatically interpretable under an extended OpenAI gym.spaces API. Our demo code shows how to do this in RLlib. Other frameworks must implement the same extended gym.spaces API to do the same.

rewards:

A dictionary of agent rewards of format:

{
   agent_1: reward_1,
   agent_2: reward_2,
   ...
}

Where agent_i is the integer index of the i’th agent and reward_i is the reward of the i’th agent.

By default, agents receive -1 reward for dying and 0 reward in all other circumstances. The Realm.reward hook provides an interface for creating custom reward functions using full game state.

dones:

A dictionary of agent done booleans of format:

{
   agent_1: done_1,
   agent_2: done_2,
   ...
}

Where agent_i is the integer index of the i’th agent and done_i is a boolean denoting whether the i’th agent has died.

Note that obs_i will be a garbage placeholder if done_i is true. This is provided only for conformity with OpenAI Gym. Your algorithm should not attempt to leverage observations outside of trajectory bounds.

infos:

An empty dictionary provided only for conformity with OpenAI Gym.

Return type

(dict, dict, dict, None)
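
A short sketch of consuming these return values, continuing the earlier examples; the only contract assumed is the (observations, rewards, dones, infos) structure documented above.

obs, rewards, dones, infos = env.step(decisions)

for agentID, done in dones.items():
   if done:
      # obs[agentID] is a garbage placeholder once done is True;
      # drop it rather than feeding it to your model.
      del obs[agentID]
      print('Agent {} died with reward {}'.format(agentID, rewards[agentID]))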

reset()

Instantiates the environment and returns initial observations

Neural MMO simulates a persistent world. It is best practice to call reset() once per environment upon initialization and never again. Treating agent lifetimes as episodes enables training with all on-policy and off-policy reinforcement learning algorithms.

We provide this function for conformity with OpenAI Gym and compatibility with various existing off-the-shelf reinforcement learning algorithms that expect a hard environment reset. If you absolutely must call this method after the first initialization, we suggest using very long (1000+) timestep environment simulations.

Returns

observations, as documented by step()
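
Sketched under the same assumptions as the examples above, the recommended pattern is to reset once at startup and treat agent lifetimes, delimited by the dones dictionary, as episodes rather than resetting the world.

obs = env.reset()                        # once, at initialization

while True:
   decisions = compute_decisions(obs)    # placeholder for your policy code
   obs, rewards, dones, _ = env.step(decisions)
   # Agents with dones[agentID] == True end their "episode" here;
   # the simulation itself keeps running without another reset().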

reward(entID)

Computes the reward for the specified agent

You can override this method to create custom reward functions. This method has access to the full environment state via self. The baselines do not modify this method; you should specify any changes you make to it when comparing against the baselines.

Returns

reward:

The reward for the actions on the previous timestep of the entity identified by entID.

Return type

float
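
A hedged sketch of a custom reward obtained by subclassing Realm and overriding reward(). The self.desciples registry of living agents is an assumption about internal state; adapt the check to whatever your Neural MMO version exposes.

from forge.blade.core.realm import Realm

class SurvivalRealm(Realm):
   def reward(self, entID):
      # Keep the default -1 penalty for dying ...
      if entID not in self.desciples:    # hypothetical registry of living agents
         return -1.0
      # ... and grant a small per-tick survival bonus otherwise.
      return 0.01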

spawn()

Called when an agent is added to the environment

You can override this method to specify custom spawning behavior with full access to the environment state via self.

Returns

entID:

An integer used to uniquely identify the entity

popID:

An integer used to identify membership within a population

prefix:

The agent will be named prefix + entID

Return type

(int, int, str)

Notes

This API hook is mainly intended for population-based research. In particular, it allows you to define behavior that selectively spawns agents into particular populations based on the current game state – for example, current population sizes or performance.
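
A hedged sketch of population-balancing spawn logic. super().spawn() supplies the default (entID, popID, prefix) triple documented above; self.spawner.pops is a hypothetical per-population counter whose real attribute name may differ.

from forge.blade.core.realm import Realm

class BalancedRealm(Realm):
   def spawn(self):
      entID, popID, prefix = super().spawn()   # default identifiers and name prefix
      # Route the new agent to the currently smallest population (hypothetical counters).
      popID = min(self.spawner.pops, key=self.spawner.pops.get)
      return entID, popID, prefix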

clientData()

Data packet used by the renderer

Returns

A packet of data for the client

Return type

packet

act(actions)

Execute agent actions

Parameters

actions – A dictionary of agent actions

prioritize(decisions)

Reorders actions according to their priorities

Parameters

decisions – A dictionary of agent actions

Returns

Reprioritized actions

Notes

Only saves the first action of each priority

stepEnv()

Advances the environment

stepEnts(decisions)

Advances agents

Parameters

decisions – A dictionary of agent actions

Returns

packets:

State-reward-done packets

dones:

A list of dead agent IDs

Return type

packets

postmortem(ent, dead)

Add agent to the graveyard if it is dead

Parameters
  • ent – An agent object

  • dead – A list of dead agents

Returns

Whether the agent is dead

Return type

bool

cullDead(dead)

Deletes the specified list of agents

Parameters

dead – A list of dead agent IDs to remove

getStims()

Gets agent stimuli from the environment

Parameters

packets – A dictionary of Packet objects

Returns

The packet dictionary populated with agent data

property size

Returns the size of the game map

You can override this property with full access to the environment state via self. The baselines do not modify this property; you should specify any changes you make to it when comparing against the baselines.

Returns

size:

The size of the map as (rows, columns)

Return type

tuple(int, int)

registerOverlay(overlay, name)

Registers an overlay to be sent to the client

The registered overlay is included in the client data passed to the renderer and is typically used to send value maps computed using getValStim to the client in order to render them as an overlay.

Parameters

overlay – A map-sized (self.size) array of floating point values
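
A short sketch of registering an overlay; env.size supplies the map dimensions documented above, and the overlay name 'exploration' is arbitrary.

import numpy as np

rows, cols = env.size                        # (rows, columns) of the game map
overlay = np.zeros((rows, cols), dtype=np.float32)
# ... fill `overlay` with per-tile floating point values ...
env.registerOverlay(overlay, 'exploration')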

getValStim()

Simulates an agent on every tile and returns observations

This method is used to compute per-tile visualizations across the entire map simultaneously. To do so, we spawn agents on each tile one at a time. We compute the observation for each agent, delete that agent, and go on to the next one. In this fashion, each agent receives an observation where it is the only agent alive. This allows us to isolate potential influences from observations of nearby agents.

This function is slow, and anything you do with it is probably slower. As a concrete example, consider that we would like to visualize a learned agent value function for the entire map. This would require computing a forward pass for one agent per tile. To cut down on computation costs, we omit lava tiles from this method.

Returns

observations:

A dictionary of agent observations as specified by step()

stimuli:

A dictionary of raw game object observations as follows:

{
   agent_1: (tiles, agent),
   agent_2: (tiles, agent),
   ...
}

Where agent_i is the integer index of the i’th agent, tiles is an array of observed game tiles, and agent is the game object corresponding to agent_i

Return type

(dict, dict)
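
A hedged sketch of turning getValStim() into a value-function overlay. value_function is a placeholder for your model's value head, and agent.pos is a hypothetical position attribute of the returned game object; adjust both to your setup.

import numpy as np

obs, stims = env.getValStim()
values = np.zeros(env.size, dtype=np.float32)

for agentID, ob in obs.items():
   tiles, agent = stims[agentID]
   r, c = agent.pos                   # hypothetical position attribute
   values[r, c] = value_function(ob)  # placeholder for your model's value head

env.registerOverlay(values, 'value')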