Skip to main content

Reinforcement Learning introduction

Definition

A problem: The goal is to choose an action in response to each data point.

RL Robot

Robot: Sense Think Act cycle.

What's the process?

Environment

Action -> Transition function -> State of Env

Agent/Robot

State -> Policy: $\Pi(s)$ -> Collect Rewards, Action

The goal

How to find $\Pi$ to maximize ?

Trading Analog

Environment = Market
Action = Buy/Sell
State = Factors of stocks, e.g. P/E, Rollinger Band Value, etc.
Rewards = money returns
Policy: $\Pi$ = Trading strategy

Algorithm type

Model-Based

Use the transition T or the rewards R in the model.

Model Free

It does not know or use the models of the transitions T or the rewards R.

Fundamental Iterative methods

Value iteration
Policy iteration

How To Code The Value Iteration Algorithm For Reinforcement Learning Fundamental Iterative Methods of Reinforcement Learning

Reference

Reinforcement Learning: A Survey

Or Reinforcement Learning: A Survey-PDF

Section 8.2 - Reinforcement Learning: An Introduction

Section 9 - Reinforcement Learning: An Introduction PPT

Definition
What's the process?
Algorithm type
- Model-Based
- Model Free
Fundamental Iterative methods
Reference