
Reinforcement Learning introduction

Definition

The problem: choose an action in response to each data point (each observed state).

RL Robot

Robot: Sense-Think-Act cycle.

What's the process?

Environment

Action -> Transition function -> State of the environment

Agent/Robot

State -> Policy: Π(s) -> Collect Rewards, Action
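The loop between environment and agent can be sketched in a few lines. This is a minimal illustration, not something from the notes: the five-cell corridor, the reward of 1.0 at the goal, and the names `transition`, `reward`, `random_policy`, and `run_episode` are all hypothetical.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4; cell 4 is the goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left, move right


def transition(state, action):
    """Transition function: maps (state, action) to the next state."""
    return min(max(state + action, 0), GOAL)


def reward(state, next_state):
    """Reward signal (assumed): 1.0 on arriving at the goal, else 0.0."""
    return 1.0 if next_state == GOAL and state != GOAL else 0.0


def random_policy(state, rng):
    """A deliberately naive policy Pi(s): ignore the state, act at random."""
    return rng.choice(ACTIONS)


def run_episode(policy_fn, max_steps=100, seed=0):
    """Run one sense-think-act loop and return (final_state, total_reward)."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = policy_fn(state, rng)           # think: Pi(s) -> action
        next_state = transition(state, action)   # act: the environment moves
        total += reward(state, next_state)       # collect the reward
        state = next_state                       # sense: observe the new state
        if state == GOAL:
            break
    return state, total


final_state, total_reward = run_episode(lambda s, rng: +1)  # always move right
```

Swapping `random_policy` in for the always-right lambda changes only the "think" step; the environment side of the loop stays the same.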

The goal

How do we find the policy Π that maximizes the rewards collected?
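One common way to make "maximize" precise is the expected discounted sum of rewards. The discount factor γ in [0, 1) and the time index t are assumptions here and do not appear in the notes:

```latex
\Pi^{*} = \arg\max_{\Pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; a_{t} = \Pi(s_{t}) \right]
```

Here s_t is the state at step t, a_t the action the policy picks in that state, and r_t the reward collected for it.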

Trading Analogy

  • Environment = Market
  • Action = Buy/Sell
  • State = Factors of stocks, e.g. P/E, Bollinger Band value, etc.
  • Rewards = Monetary returns
  • Policy: Π = Trading strategy
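The mapping above can be written down as plain types and functions. This is only a sketch under stated assumptions: the `MarketState` fields, the threshold rule in `policy`, the extra `HOLD` action, and the one-step return in `reward` are hypothetical illustrations, not a trading strategy taken from the notes.

```python
from dataclasses import dataclass

BUY, SELL, HOLD = "buy", "sell", "hold"  # HOLD is an added assumption


@dataclass(frozen=True)
class MarketState:
    """State: a few stock factors observed at one time step (hypothetical)."""
    pe_ratio: float
    bollinger_value: float  # price position relative to the bands


def policy(state: MarketState) -> str:
    """Policy Pi(s): a toy rule, buy below the lower band, sell above the upper."""
    if state.bollinger_value < -1.0:
        return BUY
    if state.bollinger_value > 1.0:
        return SELL
    return HOLD


def reward(action: str, price_now: float, price_next: float) -> float:
    """Reward: fractional return of the position over one step."""
    position = {BUY: 1, SELL: -1, HOLD: 0}[action]
    return position * (price_next - price_now) / price_now


action = policy(MarketState(pe_ratio=15.0, bollinger_value=-1.5))
step_reward = reward(action, price_now=100.0, price_next=110.0)
```

The market itself plays the role of the environment: it supplies the next price and the next set of factors, whatever the agent does.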

Algorithm types

Model-Based

The algorithm uses a model of the transitions T and the rewards R.

Model-Free

The algorithm does not know or use a model of the transitions T or the rewards R; it learns from the experience it gathers by acting.
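As an illustration of the model-free idea, here is a minimal sketch of tabular Q-learning, one well-known model-free method that the notes do not name. The learner only ever calls `step(state, action)` and sees the outcome; it never inspects T or R. The corridor environment and all parameter values (`alpha`, `gamma`, `epsilon`, episode counts) are hypothetical.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start_state, episodes=500, alpha=0.5,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Learn Q(s, a) from sampled experience only (no access to T or R)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                # Exploit, breaking ties at random so early episodes wander.
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions
                                     if q[(state, a)] == best])
            next_state, r, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)]
                                             for a in actions)
            # Move Q(s, a) toward the observed one-step target.
            q[(state, action)] += alpha * (r + gamma * best_next
                                           - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def corridor_step(state, action):
    """Hypothetical black-box environment: cells 0..4, reward 1.0 at cell 4."""
    next_state = min(max(state + action, 0), 4)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q = q_learning(corridor_step, actions=(-1, +1), start_state=0)
greedy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
```

After training, `greedy` holds the action with the larger learned value in each non-goal cell, which in this corridor should be the move toward the goal.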

Fundamental iterative methods

  • Value iteration
  • Policy iteration
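Value iteration can be sketched as follows, assuming a small deterministic problem where the transition function T and reward function R are known (so this is a model-based example). The corridor environment, the discount `gamma`, the tolerance `tol`, and the function names are hypothetical; the sweep updates values in place, which is one common variant.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Best one-step lookahead using the known model (T and R).
            best = max(reward(s, a) + gamma * values[transition(s, a)]
                       for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update
        if delta < tol:
            break
    # Read off the greedy policy Pi(s) from the converged values.
    policy = {
        s: max(actions,
               key=lambda a: reward(s, a) + gamma * values[transition(s, a)])
        for s in states
    }
    return values, policy


GOAL = 4  # hypothetical corridor: cells 0..4, cell 4 is an absorbing goal


def corridor_transition(state, action):
    return GOAL if state == GOAL else min(max(state + action, 0), GOAL)


def corridor_reward(state, action):
    arrives = state != GOAL and corridor_transition(state, action) == GOAL
    return 1.0 if arrives else 0.0


values, policy = value_iteration(range(5), (-1, +1),
                                 corridor_transition, corridor_reward)
```

Policy iteration differs in that it alternates between evaluating a fixed policy and improving it, rather than taking the maximum over actions in every backup.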

How To Code The Value Iteration Algorithm For Reinforcement Learning

Fundamental Iterative Methods of Reinforcement Learning

Reference

Reinforcement Learning: A Survey

Reinforcement Learning: A Survey (PDF)

Section 8.2 - Reinforcement Learning: An Introduction

Section 9 - Reinforcement Learning: An Introduction PPT