
Q-learning algorithm

Introduction

Q-learning keeps a utility table of Q-values, indexed by state and action.

The best part of Q-learning: it is guaranteed to converge to an optimal policy (given enough exploration).

What's Q?

Q is the value function that the algorithm computes: $Q[s,a]$ measures the value of taking action $a$ in state $s$.

$Q[s,a]$ = immediate reward + discounted future rewards

  • Short-term reward: daily return
  • Long-term reward: cumulative future return

How to use Q?

$\Pi(s) = \arg\max_a Q[s,a]$

The optimal policy:

$\Pi^*(s) = \arg\max_a Q^*[s,a]$
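With a tabular Q, applying the policy is just an argmax over the action dimension. A minimal sketch with NumPy, assuming Q is a 2-D array indexed by [state, action] (the table shape and action names are illustrative):

```python
import numpy as np

# Hypothetical Q-table: 10 discretized states x 3 actions (e.g. BUY, HOLD, SELL)
Q = np.random.random((10, 3))

def policy(s):
    """Greedy policy: the action with the highest Q-value in state s."""
    return int(np.argmax(Q[s]))

# Greedy action for every state at once
greedy_actions = np.argmax(Q, axis=1)
```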

Update Rule

$Q'[s,a] = (1 - \alpha) \, Q[s,a] + \alpha \cdot (\text{improved estimate})$

where improved estimate

$= r + \gamma \cdot (\text{later rewards})$

$= r + \gamma \cdot Q[s', \arg\max_{a'} Q[s', a']]$

$\alpha$: learning rate, in $[0, 1]$

$\gamma$: discount rate, in $[0, 1]$

A reward $i$ steps in the future is weighted by $\gamma^i$, so a small $\gamma$ makes the learner favor near-term rewards.
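The update rule maps directly to one line of array code. A sketch, assuming a NumPy Q-table and the hyperparameters above (the function name `q_update` is illustrative):

```python
import numpy as np

alpha = 0.2   # learning rate
gamma = 0.9   # discount rate

def q_update(Q, s, a, s_prime, r):
    """Blend the old estimate with the improved estimate r + gamma * Q[s', argmax_a' Q[s', a']]."""
    improved_estimate = r + gamma * Q[s_prime, np.argmax(Q[s_prime])]
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * improved_estimate
    return Q
```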

State

Factors that can be used as state:

  • Adjusted close/SMA
  • Bollinger Band Value
  • P/E Ratio
  • Holding stock
  • Return since entry

Creating the state

  • The state is a single integer
  • Discretize each factor
  • Combine all the discretized factors into one number (see the sketch below)
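One common way to combine factors is to treat each discretized factor's bin index as a digit of the state integer. A sketch, assuming three factors each discretized into 10 bins (0-9); any other bijective encoding works just as well:

```python
def combine_state(x1, x2, x3):
    """Stack three single-digit factor bins into one integer state, e.g. (2, 5, 9) -> 259."""
    return x1 * 100 + x2 * 10 + x3

# e.g. adjusted-close/SMA bin = 2, Bollinger Band bin = 5, P/E bin = 9
state = combine_state(2, 5, 9)
print(state)  # 259
```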

Discretizing

Convert a real-valued factor into an integer, e.g., by binning its observed range into a fixed number of buckets.
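A simple approach is to sort the in-sample values of a factor and use evenly spaced quantiles as bin thresholds. A sketch with NumPy; the bin count and the placeholder data are assumptions:

```python
import numpy as np

def make_thresholds(values, steps=10):
    """Quantile-based bin edges: each bin holds roughly the same number of samples."""
    values = np.sort(values)
    step_size = len(values) // steps
    return values[step_size::step_size][:steps - 1]

def discretize(x, thresholds):
    """Map a real value to an integer bin index in [0, steps - 1]."""
    return int(np.searchsorted(thresholds, x))

data = np.random.randn(1000)            # placeholder factor values
thresholds = make_thresholds(data)      # 9 thresholds -> 10 bins
print(discretize(0.3, thresholds))      # an integer between 0 and 9
```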

Summary

Q-learning is a model-free algorithm: it does not need to know the transition matrix $T$ or the reward function $R$.

Build a model

  • Define states, actions, and rewards
  • Choose an in-sample training period
  • Iterate: update the Q-table over the training data
  • Backtest; repeat until performance converges

Steps:

  1. Initialize the Q-table

  2. Observe $s$

  3. Execute $a$, observe $s'$ and $r$

  4. Update Q with $\langle s, a, s', r \rangle$, then repeat from step 2
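Putting the steps together, the training loop might look like the sketch below. It assumes a toy environment exposing `env.reset() -> s` and `env.step(a) -> (s', r, done)`, and uses an epsilon-greedy exploration rule, which is an illustrative choice rather than part of the notes above:

```python
import numpy as np

def train(env, num_states, num_actions, episodes=500,
          alpha=0.2, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes a toy environment with env.reset() -> s and env.step(a) -> (s', r, done).
    """
    Q = np.zeros((num_states, num_actions))              # 1. init Q-table
    for _ in range(episodes):
        s = env.reset()                                   # 2. observe s
        done = False
        while not done:
            # choose a: explore with probability epsilon, otherwise act greedily
            if np.random.random() < epsilon:
                a = np.random.randint(num_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_prime, r, done = env.step(a)                # 3. execute a, observe s', r
            # 4. update Q with <s, a, s', r>
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * np.max(Q[s_prime]))
            s = s_prime
    return Q
```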

Testing a model

  • Backtest on later, out-of-sample data.

Dyna-Q

Dyna-Q builds up a transition matrix $T$ and a reward matrix $R$ so that simulated experience can speed up convergence of Q-learning.

Interacting with the real world is expensive, so after each real interaction we "hallucinate" many additional simulated interactions (around 100 rounds) using the learned model.

Learning T

$T[s,a,s']$ = probability that taking action $a$ in state $s$ leads to state $s'$

Initialize $T_c[\,] = 0.00001$ (a small count, so no transition probability is ever exactly zero)

While executing, observe $s, a, s'$

Increment $T_c[s,a,s']$

$T[s,a,s'] = T_c[s,a,s'] \, / \, \sum_i T_c[s,a,i]$
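In code, learning $T$ amounts to keeping a 3-D count array and normalizing over destination states. A minimal sketch with NumPy (the state and action counts are placeholders):

```python
import numpy as np

num_states, num_actions = 100, 3
T_c = np.full((num_states, num_actions, num_states), 0.00001)  # small prior counts

def observe_transition(s, a, s_prime):
    """Record one real transition s, a -> s'."""
    T_c[s, a, s_prime] += 1

def transition_probs(s, a):
    """T[s, a, :] = counts normalized over all destination states."""
    return T_c[s, a] / T_c[s, a].sum()
```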

Learning R

$R'[s,a] = (1 - \alpha) \, R[s,a] + \alpha \cdot r$

$r$ = immediate reward.

$R[s,a]$ = expected reward for taking action $a$ in state $s$.

Dyna-Q Algorithm

Update $T[s,a,s']$ (as above)

Update $R[s,a]$ (as above)

Then hallucinate an experience:

  • $s$ = random state
  • $a$ = random action
  • $s'$ = inferred (sampled) from $T[\,]$
  • $r$ = $R[s,a]$

Update Q with $\langle s, a, s', r \rangle$
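A sketch of the hallucination loop, assuming the Q-update rule from earlier, a normalized transition matrix $T$, and an expected-reward table $R$ (the array shapes and the 100-round default follow the notes above but are otherwise assumptions):

```python
import numpy as np

def hallucinate(Q, T, R, rounds=100, alpha=0.2, gamma=0.9):
    """Dyna-Q planning step: replay simulated experience drawn from the learned model.

    Q: (num_states, num_actions) Q-table
    T: (num_states, num_actions, num_states) transition probabilities
    R: (num_states, num_actions) expected rewards
    """
    num_states, num_actions = Q.shape
    for _ in range(rounds):
        s = np.random.randint(num_states)                  # s = random
        a = np.random.randint(num_actions)                 # a = random
        s_prime = np.random.choice(num_states, p=T[s, a])  # s' sampled from T
        r = R[s, a]                                        # r = R[s, a]
        # same update rule as for real experience, with <s, a, s', r>
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * np.max(Q[s_prime]))
    return Q
```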
