Reinforcement Learning
The rl module provides reinforcement learning utilities for queueing control
and optimization problems.
Key function categories:
- Environment setup: rl_env(), rl_env_general()
- TD agents: rl_td_agent(), rl_td_agent_general()
Reinforcement learning environments for queueing networks.
This module provides reinforcement learning (RL) environments that
integrate with LINE queueing network models, enabling RL agents
to learn control policies for queueing systems.
Key classes:
- RlEnv: Basic RL environment for queueing networks
- RlEnvGeneral: General-purpose RL environment
- RlTDAgent: Temporal difference (TD) learning agent
- RlTDAgentGeneral: General-purpose TD agent with tabular and function-approximation solvers
These environments support research into adaptive control of queueing
systems using reinforcement learning techniques.
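As a quick orientation, the following end-to-end sketch builds a small open queueing model, wraps it in an RlEnv, and trains an RlTDAgent on it. The model-building calls assume the line_solver Python API; the rl import path, node indices, and parameter values are illustrative assumptions rather than prescribed usage.
>>> from line_solver import Network, Source, Queue, Sink, OpenClass, Exp, SchedStrategy
>>> model = Network('rl_demo')
>>> source = Source(model, 'Source')
>>> queue = Queue(model, 'Queue1', SchedStrategy.FCFS)
>>> sink = Sink(model, 'Sink')
>>> jobclass = OpenClass(model, 'Class1')
>>> source.setArrival(jobclass, Exp(1.0))  # Poisson arrivals at rate 1.0
>>> queue.setService(jobclass, Exp(2.0))   # exponential service at rate 2.0
>>> model.link(Network.serialRouting(source, queue, sink))
>>> from line_solver.rl import RlEnv, RlTDAgent  # hypothetical import path
>>> env = RlEnv(model, idx_of_queue_in_nodes=[1], idx_of_source_in_nodes=[0],
...             state_size=10, gamma=0.99)
>>> agent = RlTDAgent(lr=0.05, epsilon=1.0, eps_decay=0.99)
>>> agent.reset(env)
>>> agent.solve(env)
>>> V = agent.get_value_function()  # learned values as a numpy array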
-
class RlEnv(model, idx_of_queue_in_nodes, idx_of_source_in_nodes, state_size, gamma)[source]
Bases: object
Basic RL environment for queueing networks.
-
__init__(model, idx_of_queue_in_nodes, idx_of_source_in_nodes, state_size, gamma)[source]
Initialize the RL environment with the network model, the indices of the queue and source nodes, the state-space size, and the discount factor gamma.
-
property model
Get the network model.
-
property action_size
Get the number of possible actions.
-
is_in_state_space(nodes)[source]
Check whether the given node configuration belongs to the state space.
-
is_in_action_space(nodes)[source]
Check whether the given node configuration belongs to the action space.
-
sample()[source]
Sample a state and action from the environment.
-
update(new_state)[source]
Update the environment with the new state.
-
reset()[source]
Reset the environment to its initial state.
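A minimal interaction sketch, reusing the env built in the module-level example above. The assumption that sample() returns a (state, action) pair is based only on its docstring, and passing the sampled items to the membership checks is likewise illustrative.
>>> env.reset()
>>> state, action = env.sample()      # assumed (state, action) pair
>>> env.action_size                   # number of possible actions
>>> env.is_in_state_space(state)      # membership check on the configuration
>>> env.is_in_action_space(action)
>>> env.update(state)                 # advance the environment to the new state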
-
class RlEnvGeneral(model, idx_of_queue_in_nodes, idx_of_action_nodes, state_size, gamma)[source]
Bases: object
General-purpose RL environment for queueing networks.
-
__init__(model, idx_of_queue_in_nodes, idx_of_action_nodes, state_size, gamma)[source]
Initialize the general RL environment with the network model, the indices of the queue nodes, the indices of the action nodes, the state-space size, and the discount factor gamma.
-
property model
Get the network model.
-
property nqueues
Get the number of queues.
-
property action_space
Get the action space mapping.
-
is_in_state_space(state)[source]
Check whether the given state belongs to the state space.
-
is_in_action_space(state)[source]
Check whether the given state belongs to the action space.
-
sample()[source]
Draw a sample from the environment.
-
update(sample)[source]
Update the environment with the given sample.
-
reset()[source]
Reset the environment to its initial state.
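The corresponding sketch for RlEnvGeneral, which takes the indices of arbitrary action nodes rather than a single source node. The index values are illustrative, and feeding the result of sample() straight back into update() is an assumption based on the update(sample) signature.
>>> env = RlEnvGeneral(model, idx_of_queue_in_nodes=[1, 2],
...                    idx_of_action_nodes=[0], state_size=10, gamma=0.99)
>>> env.nqueues        # number of queues in the environment
>>> env.action_space   # mapping describing the available actions
>>> env.reset()
>>> s = env.sample()   # draw a sample from the environment
>>> env.update(s)      # feed it back, per the update(sample) signature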
-
class RlTDAgent(lr=0.05, epsilon=1.0, eps_decay=0.99)[source]
Bases: object
Temporal difference (TD) learning agent.
-
__init__(lr=0.05, epsilon=1.0, eps_decay=0.99)[source]
Initialize the TD agent with learning rate lr, initial exploration rate epsilon, and exploration decay factor eps_decay.
-
reset(env)[source]
Reset the agent for the given environment.
-
get_value_function()[source]
Get the value function as a numpy array.
-
get_q_function()[source]
Get the Q-function as a numpy array.
-
solve(env)[source]
Solve the RL problem for the given environment.
-
static create_greedy_policy(state_q, epsilon, n_a)[source]
Create an epsilon-greedy policy from Q-values.
-
static get_state_from_loc(obj_size, loc)[source]
Get state vector from location indices.
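Once solve() has run, the learned tables can be inspected and turned into a policy. The sketch below assumes that create_greedy_policy returns a length-n_a probability vector over actions for the supplied row of Q-values; that return convention is not documented here and is purely an assumption.
>>> import numpy as np
>>> Q = agent.get_q_function()      # Q-values as a numpy array
>>> V = agent.get_value_function()  # state values as a numpy array
>>> probs = RlTDAgent.create_greedy_policy(Q[0], epsilon=0.1, n_a=env.action_size)
>>> greedy_action = int(np.argmax(probs))  # assumes a probability-vector return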
-
class RlTDAgentGeneral(lr=0.1, epsilon=1.0, eps_decay=0.9999)[source]
Bases: object
General-purpose TD learning agent.
-
__init__(lr=0.1, epsilon=1.0, eps_decay=0.9999)[source]
Initialize the general TD agent with learning rate lr, initial exploration rate epsilon, and exploration decay factor eps_decay.
-
reset(env)[source]
Reset the agent for the given environment.
-
get_value_function()[source]
Get the value function as a numpy array.
-
solve_for_fixed_policy(env, num_episodes=10000)[source]
Solve for a fixed policy over the given number of episodes.
-
solve(env, num_episodes=10000)[source]
Solve the RL problem over the given number of episodes.
-
solve_by_hashmap(env, num_episodes=10000)[source]
Solve using hashmap-based value iteration.
-
solve_by_linear(env, num_episodes=10000)[source]
Solve using linear function approximation.
-
solve_by_quad(env, num_episodes=10000)[source]
Solve using quadratic function approximation.
-
static create_greedy_policy(state_q, epsilon, n_a)[source]
Create an epsilon-greedy policy from Q-values.
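Finally, a sketch contrasting the solver variants, all of which take an environment and an episode budget per the signatures above. The comments merely restate the documented method descriptions; which variant suits a given model, and whether solve() is the intended default path, is not stated here.
>>> agent = RlTDAgentGeneral(lr=0.1, epsilon=1.0, eps_decay=0.9999)
>>> agent.reset(env)
>>> agent.solve(env, num_episodes=10000)
>>> V = agent.get_value_function()
Any of the variants can be substituted for the solve() call above:
>>> # agent.solve_for_fixed_policy(env, num_episodes=10000)  # fixed-policy evaluation
>>> # agent.solve_by_hashmap(env, num_episodes=10000)  # hashmap-based value iteration
>>> # agent.solve_by_linear(env, num_episodes=10000)   # linear function approximation
>>> # agent.solve_by_quad(env, num_episodes=10000)     # quadratic function approximation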