Package jline.api.rl

Class RlTdAgent

    public final class RlTdAgent

    TD learning agent for queueing network routing decisions.

    This agent learns an optimal routing policy using average-reward TD(0) learning. When a new job arrives (departure from source), the agent selects a queue using an epsilon-greedy policy derived from the value function. When a job departs from a queue, the queue length is decremented.

    The value function is normalized after each update so that V(0,...,0) = 0, ensuring the differential value function interpretation.
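    The normalization step can be illustrated with a minimal sketch. It assumes the flat value array uses index 0 for the empty state (0,...,0), consistent with the column-major indexing used by this class; the class and method names below are hypothetical, not part of the library.

```java
// Minimal sketch of differential value-function normalization:
// subtract V(empty state) from every entry so that V(0,...,0) = 0.
// Assumes flat index 0 corresponds to the empty state.
public final class ValueNormalizationSketch {
    public static void normalize(double[] v) {
        double offset = v[0];        // value of the empty state
        for (int i = 0; i < v.length; i++) {
            v[i] -= offset;          // shift every entry by the same constant
        }
    }
}
```

    Subtracting a constant leaves greedy action choices unchanged, which is why the differential interpretation is safe to enforce after every update.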

    • Constructor Detail

      • RlTdAgent

        RlTdAgent(Double lr, Double epsilon, Double epsDecay)
        Parameters:
        lr - learning rate for value function updates
        epsilon - initial exploration rate for the epsilon-greedy policy (0 to 1)
        epsDecay - decay factor applied to epsilon after each episode
    • Method Detail

      • getV

         final DoubleArray getV()

        Value function stored as a flat array (N-dimensional table).

      • setV

         final Unit setV(DoubleArray v)

        Value function stored as a flat array (N-dimensional table).

      • getQ

         final DoubleArray getQ()

        Q-function stored as a flat array ((N+1)-dimensional table).

      • setQ

         final Unit setQ(DoubleArray q)

        Q-function stored as a flat array ((N+1)-dimensional table).

      • getVSize

         final IntArray getVSize()

        Shape of the value function array (one entry per dimension).

      • setVSize

         final Unit setVSize(IntArray vSize)

        Shape of the value function array (one entry per dimension).

      • getQSize

         final IntArray getQSize()

        Shape of the Q-function array (one entry per dimension, last is actionSize).

      • setQSize

         final Unit setQSize(IntArray qSize)

        Shape of the Q-function array (one entry per dimension, last is actionSize).

      • setEpsilon

         final Unit setEpsilon(Double epsilon)
        Parameters:
        epsilon - initial exploration rate for the epsilon-greedy policy (0 to 1)
      • reset

         final Unit reset(RlEnv env)

        Resets the agent and environment to their initial states.

        Clears the value function and Q-function, then resets the environment.

        Parameters:
        env - the RL environment
      • getValueFunction

         final DoubleArray getValueFunction()

        Returns the learned value function.

        Returns:

        the value function as a flat array

      • getQFunction

         final DoubleArray getQFunction()

        Returns the learned Q-function.

        Returns:

        the Q-function as a flat array

      • solve

         final Unit solve(RlEnv env)

        Trains the agent using average-reward TD(0) learning.

        Runs the TD learning algorithm for 10,000 episodes (matching the MATLAB default). In each episode:

        • An event is sampled from the environment.

        • If a new job arrives (source departure), the agent selects a queue using the epsilon-greedy policy (or join-the-shortest-queue if the state lies outside the action space).

        • If a job completes (queue departure), the corresponding queue length is decremented.

        • If the resulting state is valid, the value function is updated using the TD(0) rule.

        The average cost rate is estimated using exponentially weighted sums of costs and times.

        Parameters:
        env - the RL environment to train on
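        The per-event update in the loop above can be sketched as follows. The signature and names (cost, dt, avgCostRate) are illustrative assumptions, not the library's actual API; the final renormalization keeps V(0,...,0) = 0 as described in the class overview.

```java
// Hypothetical sketch of one average-reward TD(0) update; s and sNext are
// flat (column-major) state indices, and flat index 0 is assumed to be the
// empty state used for renormalization.
public final class TdUpdateSketch {
    public static void tdUpdate(double[] v, int s, int sNext,
                                double cost, double dt,
                                double avgCostRate, double lr) {
        // Differential TD error: immediate cost minus the average cost
        // accrued over the elapsed time dt, plus the change in value.
        double tdError = (cost - avgCostRate * dt) + v[sNext] - v[s];
        v[s] += lr * tdError;
        // Renormalize so the empty state keeps value zero.
        double offset = v[0];
        for (int i = 0; i < v.length; i++) v[i] -= offset;
    }
}
```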
      • createGreedyPolicy

         final static DoubleArray createGreedyPolicy(DoubleArray stateQ, Double epsilon, Integer nA)

        Creates an epsilon-greedy policy from state-action values.

        Each action gets a base probability of epsilon/nA. The remaining probability mass (1-epsilon) is distributed equally among all actions whose value is within FineTol of the minimum value (cost minimization).

        Parameters:
        stateQ - array of state-action values (one per action)
        epsilon - exploration probability
        nA - number of actions
        Returns:

        probability distribution over actions
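        The construction described above can be sketched in plain Java; FINE_TOL here is an assumed tolerance constant standing in for the library's FineTol.

```java
// Sketch of the epsilon-greedy policy over state-action costs: every action
// gets a base probability of epsilon/nA, and the remaining (1 - epsilon)
// mass is split equally among actions within FINE_TOL of the minimum
// (cost minimization).
public final class GreedyPolicySketch {
    private static final double FINE_TOL = 1e-8;   // assumed tolerance

    public static double[] greedyPolicy(double[] stateQ, double epsilon, int nA) {
        double min = Double.POSITIVE_INFINITY;
        for (int a = 0; a < nA; a++) min = Math.min(min, stateQ[a]);
        int nBest = 0;
        for (int a = 0; a < nA; a++) if (stateQ[a] <= min + FINE_TOL) nBest++;
        double[] policy = new double[nA];
        for (int a = 0; a < nA; a++) {
            policy[a] = epsilon / nA;                   // exploration mass
            if (stateQ[a] <= min + FINE_TOL) {
                policy[a] += (1.0 - epsilon) / nBest;   // greedy mass over ties
            }
        }
        return policy;
    }
}
```

        For example, with stateQ = {3, 1, 1}, epsilon = 0.3, and nA = 3, each action receives 0.1 of exploration mass and the two tied minimal actions split the remaining 0.7, giving {0.1, 0.45, 0.45}.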

      • getStateFromLoc

         final static Integer getStateFromLoc(IntArray objSize, IntArray loc)

        Converts a multi-dimensional location to a linear index (column-major order).

        This mirrors MATLAB's column-major (Fortran-order) linear indexing: index = (loc0-1) + (loc1-1)*size0 + (loc2-1)*size0*size1 + ...

        Note: locations are 1-based (as in MATLAB), converted to 0-based internally.

        Parameters:
        objSize - shape of the array (size of each dimension)
        loc - multi-dimensional location (1-based indices)
        Returns:

        linear index (0-based) into the flat array
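        The indexing formula can be written out directly; this is an illustrative reimplementation, not the library source.

```java
// Column-major (MATLAB-style) linearization of a 1-based multi-index:
// index = (loc0-1) + (loc1-1)*size0 + (loc2-1)*size0*size1 + ...
public final class ColumnMajorIndexSketch {
    public static int stateFromLoc(int[] objSize, int[] loc) {
        int index = 0;
        int stride = 1;
        for (int d = 0; d < objSize.length; d++) {
            index += (loc[d] - 1) * stride;  // 1-based -> 0-based per dimension
            stride *= objSize[d];
        }
        return index;
    }
}
```

        For example, in a 3x4 table the 1-based location (2, 3) maps to (2-1) + (3-1)*3 = 7, and (1, 1) maps to 0.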

      • getStateFromLocs

         final static IntArray getStateFromLocs(IntArray objSize, Array<IntArray> locs)

        Converts multiple multi-dimensional locations to linear indices.

        Parameters:
        objSize - shape of the array
        locs - array of locations (each row is one multi-dimensional location)
        Returns:

        array of linear indices
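        The batch version amounts to applying the single-location conversion to each row; again an illustrative sketch, not the library source.

```java
// Sketch: convert each 1-based multi-dimensional location (one per row)
// to a 0-based column-major linear index.
public final class BatchIndexSketch {
    public static int[] statesFromLocs(int[] objSize, int[][] locs) {
        int[] out = new int[locs.length];
        for (int r = 0; r < locs.length; r++) {
            int index = 0, stride = 1;
            for (int d = 0; d < objSize.length; d++) {
                index += (locs[r][d] - 1) * stride;  // column-major order
                stride *= objSize[d];
            }
            out[r] = index;
        }
        return out;
    }
}
```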