Overview
Inverse reinforcement learning (IRL) deals with learning the reward function for a situation or activity in which the optimal behavior is known.
Environment Details
- Mountain Car env: a car (agent) learning how to reach the top of the mountain in the fewest timesteps.
- The goal of the agent is to reach the goal position, represented by the flag at position = 0.5 units.
- The state space is a vector [position, velocity] with
- -1.2 ≤ position ≤ 0.6
- -0.07 ≤ velocity ≤ 0.07
- The above implies that the state space is continuous and infinite.
- The agent can perform 3 actions (0, 1, 2):
- 0: accelerate to the left
- 1: zero acceleration
- 2: accelerate to the right
- Acceleration = 0.001 units
- The reward function built into the environment is $R(s,a) = -1$ for each timestep and $0$ if it reaches the goal (see the sketch after this list).
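The environment details above can be checked directly; below is a minimal sketch, assuming the classic OpenAI Gym `MountainCar-v0` environment and the pre-0.26 Gym API (`reset` returning only the observation, `step` returning a 4-tuple). Newer Gymnasium versions use slightly different return signatures.

```python
# Minimal sketch: inspect MountainCar-v0 (assumes the pre-0.26 Gym API).
import gym

env = gym.make("MountainCar-v0")

# State bounds: [position, velocity]
print(env.observation_space.low)   # expected: [-1.2, -0.07]
print(env.observation_space.high)  # expected: [ 0.6,  0.07]

# Discrete actions: 0 = accelerate left, 1 = no acceleration, 2 = accelerate right
print(env.action_space.n)          # expected: 3

state = env.reset()
next_state, reward, done, info = env.step(2)  # accelerate to the right
print(reward)                                 # built-in reward: -1 per timestep
```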
Ng-Abbeel Paper vs Ours (Similarities)
- Both use a linear function approximator for the reward function, with 26 basis functions.
- Both use a penalty function with penalty constant = 2 in the linear-programming updates.
- Both use equally spaced Gaussian functions as the reward basis functions.
- In both, the reward function depends only on the 'position' feature of the state (a sketch of this reward parameterization follows this list).
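The shared reward parameterization can be illustrated with a short sketch: 26 equally spaced Gaussian basis functions of the position, combined linearly, plus a penalty with constant 2 for the LP objective. The exact centers, bandwidth, and penalty form below are illustrative assumptions, not taken verbatim from either implementation.

```python
# Illustrative sketch of a position-only linear reward approximator with
# 26 equally spaced Gaussian basis functions (centers/width are assumptions).
import numpy as np

N_BASIS = 26
centers = np.linspace(-1.2, 0.6, N_BASIS)  # equally spaced over the position range
width = centers[1] - centers[0]            # assumed bandwidth ~ spacing of centers

def phi(position):
    """Gaussian basis features evaluated at a single position."""
    return np.exp(-0.5 * ((position - centers) / width) ** 2)

def reward(position, alpha):
    """Linear reward approximator R(s) = alpha . phi(position)."""
    return float(alpha @ phi(position))

def penalty(x, c=2.0):
    """Assumed penalty form for the LP objective: x if x >= 0, else c * x (c = 2)."""
    return x if x >= 0 else c * x

alpha = np.zeros(N_BASIS)  # weights to be learned by the LP-based IRL procedure
print(reward(-0.5, alpha), penalty(-1.0))
```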
Differences:
- The test in the paper used the naive approach (Algo 2), while ours uses Algo 3.
- The paper discretized the state space $s = [position,~ velocity]$ into $120 \times 120$ discrete states, which makes the number of states finite (see the sketch after this list).
- They created a model based on the discretized state space; we did not.
- They evaluated the linear-programming maximization over a sample of 5000 states.
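For contrast with our approach, which keeps the state space continuous, here is a hedged sketch of the paper-style discretization: binning the continuous [position, velocity] state into a 120 x 120 grid. The bin edges below are illustrative, derived from the state bounds listed earlier.

```python
# Illustrative sketch of discretizing [position, velocity] into a 120 x 120 grid,
# as described for the paper (our approach skips this step).
import numpy as np

N_BINS = 120
pos_edges = np.linspace(-1.2, 0.6, N_BINS + 1)
vel_edges = np.linspace(-0.07, 0.07, N_BINS + 1)

def discretize(position, velocity):
    """Map a continuous state to (row, col) indices of the 120 x 120 grid."""
    i = int(np.clip(np.digitize(position, pos_edges) - 1, 0, N_BINS - 1))
    j = int(np.clip(np.digitize(velocity, vel_edges) - 1, 0, N_BINS - 1))
    return i, j

print(discretize(-0.5, 0.0))  # example: position -0.5, zero velocity
```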