Class RlTdAgentGeneral
-
public final class RlTdAgentGeneral
General TD learning agent for queueing network control.
This agent operates with RlEnvGeneral environments and supports:
Value function evaluation for a fixed policy (solveForFixedPolicy)
Policy optimization using tabular TD control (solve)
Sparse state space exploration using HashMap-based value functions (solveByHashmap)
Linear and quadratic value function approximation (solveByLinear, solveByQuad)
The agent uses average-reward TD(0) updates where the cost at each step is the total number of jobs in the system multiplied by the elapsed time.
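The update described above can be sketched as follows. This is a minimal illustration in Java, assuming a generic flat state index and hypothetical names (`TdUpdateSketch`, `step`); the agent's actual internals are not shown on this page.

```java
// Illustrative average-reward TD(0) update (hypothetical helper, not the
// library's implementation). The per-step cost is the number of jobs in
// the system times the elapsed time; the value table and the running
// average-cost estimate are updated in place.
public final class TdUpdateSketch {
    double[] v;          // flat value table, one entry per state
    double avgCost;      // running estimate of the average cost rate
    final double lr;     // learning rate

    TdUpdateSketch(int numStates, double lr) {
        this.v = new double[numStates];
        this.lr = lr;
    }

    /** One TD(0) step from state s to successor state sNext. */
    void step(int s, int sNext, int jobsInSystem, double elapsedTime) {
        double cost = jobsInSystem * elapsedTime;
        // Average-reward TD error: incurred cost minus the average-cost
        // baseline over the elapsed time, plus the undiscounted value of
        // the successor state, minus the current value.
        double tdError = cost - avgCost * elapsedTime + v[sNext] - v[s];
        v[s] += lr * tdError;
        avgCost += lr * tdError; // also track the average cost rate
    }
}
```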
-
-
Nested Class Summary
public final class RlTdAgentGeneral.HashmapResult: Result of the HashMap-based TD control solve.
public final class RlTdAgentGeneral.ApproximationResult: Result of value function approximation.
-
Constructor Summary
RlTdAgentGeneral(Double lr, Double epsilon, Double epsDecay)
-
Method Summary
final DoubleArray getV(): Value function stored as a flat array (N-dimensional table).
final Unit setV(DoubleArray v): Value function stored as a flat array (N-dimensional table).
final IntArray getVSize(): Shape of the value function array.
final Unit setVSize(IntArray vSize): Shape of the value function array.
final Double getLr(): Learning rate.
final Double getEpsilon(): Exploration rate.
final Unit setEpsilon(Double epsilon): Exploration rate.
final Double getEpsDecay(): Per-episode epsilon decay factor.
final Unit reset(RlEnvGeneral env): Resets the agent and environment.
final DoubleArray getValueFunction(): Returns the learned value function.
final DoubleArray solveForFixedPolicy(RlEnvGeneral env, Integer numEpisodes): Evaluates the value function for the current (fixed) routing policy.
final DoubleArray solve(RlEnvGeneral env, Integer numEpisodes): Learns an optimal routing policy using tabular TD control.
final RlTdAgentGeneral.HashmapResult solveByHashmap(RlEnvGeneral env, Integer numEpisodes): Learns a routing policy using a HashMap-based sparse value function.
final RlTdAgentGeneral.ApproximationResult solveByLinear(RlEnvGeneral env, Integer numEpisodes): Learns a routing policy and fits a linear value function approximator.
final RlTdAgentGeneral.ApproximationResult solveByQuad(RlEnvGeneral env, Integer numEpisodes): Learns a routing policy and fits a quadratic value function approximator.
-
Method Detail
-
getV
final DoubleArray getV()
Value function stored as a flat array (N-dimensional table).
-
setV
final Unit setV(DoubleArray v)
Value function stored as a flat array (N-dimensional table).
-
getEpsilon
final Double getEpsilon()
-
setEpsilon
final Unit setEpsilon(Double epsilon)
- Parameters:
epsilon - initial exploration rate (0 to 1)
-
getEpsDecay
final Double getEpsDecay()
-
reset
final Unit reset(RlEnvGeneral env)
Resets the agent and environment.
- Parameters:
env - the general RL environment
-
getValueFunction
final DoubleArray getValueFunction()
Returns the learned value function.
- Returns:
the value function as a flat array
-
solveForFixedPolicy
final DoubleArray solveForFixedPolicy(RlEnvGeneral env, Integer numEpisodes)
Evaluates the value function for the current (fixed) routing policy.
This method runs TD(0) learning without modifying routing decisions. Events are sampled from the environment and the model's existing routing is used. The value function V(s) is updated to reflect the average cost under the current policy.
This is useful for evaluating heuristic policies (e.g., JSQ, round-robin) before attempting policy improvement.
- Parameters:
env - the general RL environment
numEpisodes - number of episodes to run (typically 10^4)
- Returns:
the learned value function as a flat array
-
solve
final DoubleArray solve(RlEnvGeneral env, Integer numEpisodes)
Learns an optimal routing policy using tabular TD control.
In each episode, the agent:
Samples an event from the environment
Processes departures from queue nodes
If the departure is from an action node and the state is in the action space, selects a routing action using epsilon-greedy policy based on the value of successor states
Processes arrivals at queue nodes
Updates the value function using average-reward TD(0)
The epsilon parameter decays by epsDecay each episode for gradual exploitation.
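The action-selection and decay steps above can be sketched as follows. This is an illustrative Java fragment with hypothetical names (`EpsilonGreedySketch`, `selectAction`, `endEpisode`), assuming that successor-state values represent costs, so the greedy choice is the minimum.

```java
import java.util.Random;

// Illustrative epsilon-greedy action selection with per-episode decay
// (hypothetical names; the agent's actual internals are not documented here).
public final class EpsilonGreedySketch {
    private double epsilon;
    private final double epsDecay;
    private final Random rng;

    EpsilonGreedySketch(double epsilon, double epsDecay, long seed) {
        this.epsilon = epsilon;
        this.epsDecay = epsDecay;
        this.rng = new Random(seed);
    }

    /** Pick the successor with the lowest value, exploring with probability epsilon. */
    int selectAction(double[] successorValues) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(successorValues.length); // explore: random action
        }
        int best = 0;
        for (int a = 1; a < successorValues.length; a++) {
            if (successorValues[a] < successorValues[best]) best = a; // costs: lower is better
        }
        return best;
    }

    /** Called once per episode: shrink epsilon toward pure exploitation. */
    void endEpisode() {
        epsilon *= epsDecay;
    }

    double epsilon() { return epsilon; }
}
```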
- Parameters:
env - the general RL environment
numEpisodes - number of episodes to run (typically 10^4)
- Returns:
the learned value function as a flat array
-
solveByHashmap
final RlTdAgentGeneral.HashmapResult solveByHashmap(RlEnvGeneral env, Integer numEpisodes)
Learns a routing policy using a HashMap-based sparse value function.
Instead of allocating a full N-dimensional table, this method stores value function entries only for states actually visited during learning. States not in the map use an "external" default value.
This is efficient for large state spaces where only a fraction of states are reachable.
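The sparse-storage idea can be sketched as follows: an illustrative Java helper (hypothetical names `SparseValueFunction`, `externalValue`) that keys states by their queue-length vector and falls back to a single default value for unvisited states.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sparse value function backed by a HashMap (a sketch, not
// the library's HashmapResult machinery). Only states actually visited
// get an entry; everything else reads as the "external" default value.
public final class SparseValueFunction {
    private final Map<List<Integer>, Double> v = new HashMap<>();
    private final double externalValue;

    SparseValueFunction(double externalValue) {
        this.externalValue = externalValue;
    }

    /** Value of a state; unvisited states use the external default. */
    double get(List<Integer> state) {
        return v.getOrDefault(state, externalValue);
    }

    /** TD update; inserts the state on first visit. */
    void update(List<Integer> state, double lr, double tdError) {
        v.put(state, get(state) + lr * tdError);
    }

    int visitedStates() {
        return v.size();
    }
}
```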
- Parameters:
env - the general RL environment
numEpisodes - number of episodes to run
- Returns:
HashmapResult containing the feature matrix X and value vector Y
-
solveByLinear
final RlTdAgentGeneral.ApproximationResult solveByLinear(RlEnvGeneral env, Integer numEpisodes)
Learns a routing policy and fits a linear value function approximator.
Runs HashMap-based TD control, then fits a linear model: V(q1, q2, ..., qn) = w0 + w1*q1 + w2*q2 + ... + wn*qn
The regression is performed using ordinary least squares (OLS): coefficients = (X^T X)^{-1} X^T Y
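The OLS step can be sketched as follows: a self-contained Java fragment (hypothetical helper `OlsSketch.fit`) that forms the normal equations (X^T X) w = X^T Y and solves them by Gauss-Jordan elimination, under the assumption that each row of X is [1, q1, ..., qn].

```java
// Illustrative ordinary least squares fit (a sketch, not the library's
// regression code). Rows of x are feature vectors [1, q1, ..., qn];
// y holds the learned state values.
public final class OlsSketch {
    /** Returns coefficients [w0, w1, ..., wn] solving (X^T X) w = X^T y. */
    static double[] fit(double[][] x, double[] y) {
        int m = x.length, n = x[0].length;
        double[][] a = new double[n][n + 1]; // augmented system [X^T X | X^T y]
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                for (int r = 0; r < m; r++) a[i][j] += x[r][i] * x[r][j];
            for (int r = 0; r < m; r++) a[i][n] += x[r][i] * y[r];
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) w[i] = a[i][n] / a[i][i];
        return w;
    }
}
```

Fitting exactly linear data, e.g. y = 2 + 3q, recovers the coefficients [2, 3].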
- Parameters:
env - the general RL environment
numEpisodes - number of episodes to run
- Returns:
ApproximationResult with feature matrix, values, and regression coefficients
-
solveByQuad
final RlTdAgentGeneral.ApproximationResult solveByQuad(RlEnvGeneral env, Integer numEpisodes)
Learns a routing policy and fits a quadratic value function approximator.
Runs HashMap-based TD control, then fits a quadratic model: V(q1, ..., qn) = sum_{i,j} w_{ij} * q_i * q_j + linear terms + intercept
The feature matrix is augmented with all pairwise products of the original features (including self-products q_i^2).
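The augmentation step can be sketched as follows: an illustrative Java helper (hypothetical name `QuadFeaturesSketch.augment`) that extends a queue-length vector with all pairwise products q_i * q_j for i <= j, including the squares q_i^2. The same OLS fit as in the linear case can then be run on the augmented matrix.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative quadratic feature augmentation (a sketch, not the
// library's feature construction).
public final class QuadFeaturesSketch {
    /** Maps [q1, ..., qn] to [q1, ..., qn, q1*q1, q1*q2, ..., qn*qn]. */
    static double[] augment(double[] q) {
        List<Double> out = new ArrayList<>();
        for (double qi : q) out.add(qi);         // original linear terms
        for (int i = 0; i < q.length; i++)
            for (int j = i; j < q.length; j++)
                out.add(q[i] * q[j]);            // pairwise products, incl. squares
        double[] arr = new double[out.size()];
        for (int k = 0; k < arr.length; k++) arr[k] = out.get(k);
        return arr;
    }
}
```

For example, [2, 3] becomes [2, 3, 4, 6, 9].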
- Parameters:
env - the general RL environment
numEpisodes - number of episodes to run
- Returns:
ApproximationResult with augmented feature matrix, values, and regression coefficients
-