Slides | Varshil Gandhi

Inverse RL on Mountain Car Slides (Wed, 01 Apr 2020) https://vjg28.github.io/slides/irl-slides/
<h1 id="irl-experiment-on-mountain-car">IRL Experiment on Mountain Car</h1>
<p><a href="https://github.com/vjg28/Linear-Inverse-RL-algorithms" target="_blank">Code Application</a></p>
<hr />
<h2 id="overview">Overview</h2>
<p><strong>Inverse reinforcement learning</strong> (IRL) is the problem of recovering the reward function of a task or activity when the optimal behavior for it is already known. <figure> <a data-fancybox="" href="https://vjg28.github.io/project/linear-inverse-rl/irl.png" > <img src="https://vjg28.github.io/project/linear-inverse-rl/irl.png" alt="" ></a> </figure> </p>
<hr />
<h2 id="environment-details">Environment Details</h2>
<figure> <a data-fancybox="" href="https://vjg28.github.io/project/linear-inverse-rl/car.png" > <img src="https://vjg28.github.io/project/linear-inverse-rl/car.png" alt="" ></a> </figure>
<hr />
<ul>
<li><strong>Mountain Car env</strong>: a car (the agent) learns to reach the top of the hill in as few timesteps as possible.</li>
<li>The agent's goal is to reach the flag at position = 0.5 units.</li>
<li>The state is a vector [position, velocity] with
<ul>
<li>-1.2 ≤ position ≤ 0.6</li>
<li>velocity ∈ [-0.07, 0.07]</li>
<li>Both components are real-valued, so the state space is continuous and contains infinitely many states.</li>
</ul></li>
</ul>
<hr />
<ul>
<li>The agent can take 3 discrete actions:
<ul>
<li>0: accelerate to the left</li>
<li>1: zero acceleration</li>
<li>2: accelerate to the right</li>
</ul></li>
<li>Acceleration magnitude = 0.001 units</li>
<li>The environment's built-in reward is $R(s,a) = -1$ for each timestep and 0 once the car reaches the goal.</li>
</ul>
<hr />
<h2 id="experiment-details">Experiment
Details</h2>
<hr />
<h3 id="ng-abbel-paper-vs-ours-similarities">Ng-Abbeel Paper vs. Ours (Similarities)</h3>
<ul>
<li><span class="fragment " > A linear function approximator for the reward function, with 26 basis functions. </span></li>
<li><span class="fragment " > A penalty function with penalty constant 2 in the linear program. </span></li>
<li><span class="fragment " > Equally spaced Gaussian functions as the reward basis functions. </span></li>
<li><span class="fragment " > A reward function that depends only on the ‘position’ component of the state. </span></li>
</ul>
<hr />
<h3 id="differences">Differences</h3>
<ul>
<li><span class="fragment " > The paper tested a naive approach (Algo 2), while we use Algo 3. </span></li>
<li><span class="fragment " > The paper discretized the state space $s = [\text{position},~\text{velocity}]$ into $120 \times 120$ discrete states, which makes the number of states finite. </span><br /></li>
<li><span class="fragment " > They built a model of the environment on the discretized state space; we did not. </span></li>
<li><span class="fragment " > They evaluated the linear-programming maximization over a set of 5000 states. </span></li>
</ul>
<hr />
<h2 id="results">Results</h2>
<ul>
<li>To be added</li>
</ul>
<hr />
<h1 id="questions">Questions?</h1>
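<hr />
<p>The environment on the Environment Details slides can be sketched in plain Python. This is a minimal sketch under stated assumptions: the bounds, goal, action set, and 0.001 acceleration come from the slides, while the gravity constant 0.0025, the hill shape $\cos(3 \cdot \text{position})$, and the start distribution are assumed from the classic Gym <code>MountainCar-v0</code>. The reward follows the slides' description (-1 per timestep, 0 at the goal). The names <code>reset</code> and <code>step</code> are illustrative.</p>

```python
import math
import random

# Bounds, goal and acceleration are from the slides; GRAVITY and the
# cos(3x) hill shape are assumptions taken from the classic Gym MountainCar-v0.
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.5
FORCE = 0.001      # acceleration magnitude per action
GRAVITY = 0.0025   # assumption, not stated in the slides


def reset(rng=random):
    """Start near the valley floor with zero velocity (assumed Gym convention)."""
    return [rng.uniform(-0.6, -0.4), 0.0]


def step(state, action):
    """Apply action 0 (left), 1 (none) or 2 (right); return (state, reward, done)."""
    position, velocity = state
    # Action pushes by +/- FORCE; gravity pulls along the slope of the hill.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops dead against the left wall
    done = position >= GOAL_POS
    reward = 0.0 if done else -1.0  # slides: -1 per timestep, 0 at the goal
    return [position, velocity], reward, done
```

<p>As a quick check, the heuristic of accelerating in the direction of motion (action 2 when velocity ≥ 0, otherwise 0) pumps energy into the car and should carry it to the flag within a few hundred steps. It only exercises the sketch; it is not necessarily the expert policy used in the experiment.</p>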
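<hr />
<p>The shared reward model, a linear combination of 26 equally spaced Gaussian basis functions of position only, is $R(s) = \sum_{i=1}^{26} \alpha_i \phi_i(\text{position})$ with $\phi_i(x) = \exp\left(-\frac{(x - \mu_i)^2}{2\sigma^2}\right)$. The Python below is a minimal sketch of it. Placing the centers $\mu_i$ across the full range [-1.2, 0.6] and setting the width $\sigma$ to one center spacing are assumptions; the weights <code>alpha</code> are the unknowns the linear program solves for.</p>

```python
import math

N_BASIS = 26
MIN_POS, MAX_POS = -1.2, 0.6
# Centers equally spaced over the position range (assumed placement).
CENTERS = [MIN_POS + i * (MAX_POS - MIN_POS) / (N_BASIS - 1) for i in range(N_BASIS)]
SIGMA = (MAX_POS - MIN_POS) / (N_BASIS - 1)  # assumed width: one center spacing


def basis(position):
    """Return the 26 Gaussian features phi_i(position)."""
    return [math.exp(-((position - c) ** 2) / (2 * SIGMA ** 2)) for c in CENTERS]


def reward(state, alpha):
    """Linear reward R(s) = sum_i alpha_i * phi_i(position); velocity is ignored."""
    position, _velocity = state
    return sum(a * phi for a, phi in zip(alpha, basis(position)))
```

<p>Because velocity is ignored, any two states with the same position receive the same reward, matching the similarity listed on the comparison slide.</p>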
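<hr />
<p>The penalty with constant 2 can be illustrated with a piecewise-linear penalty $p(x) = x$ for $x \ge 0$ and $p(x) = 2x$ for $x &lt; 0$, so that states where a non-expert action looks better than the expert action are penalized twice as heavily. The sketch below assumes this form and an objective that sums, over sampled states, the worst penalized value gap between the expert action and each alternative action. It only evaluates the objective for given weights; the per-basis value gaps in <code>gaps</code> are hypothetical precomputed inputs, and an actual run would pass the maximization to an LP solver, with auxiliary variables and bounds on $\alpha$ (for example $|\alpha_i| \le 1$, assumed).</p>

```python
PENALTY = 2.0  # penalty constant from the slides


def penalty(x, c=PENALTY):
    """Piecewise-linear penalty: gains count once, violations count c times."""
    return x if x >= 0 else c * x


def lp_objective(alpha, gaps):
    """Evaluate sum over states s of min over actions a of p(alpha . gaps[s][a]).

    gaps[s][a][i] is a hypothetical precomputed value gap between the expert
    action and non-expert action a at sampled state s, measured under basis
    reward phi_i. Since the value function is linear in alpha, the gap under
    the combined reward is the dot product alpha . gaps[s][a].
    """
    total = 0.0
    for per_action in gaps:
        total += min(penalty(sum(w * g for w, g in zip(alpha, gap))) for gap in per_action)
    return total
```

<p>Taking the minimum over alternative actions makes each state count only through its least-separated action, and the steeper slope for negative gaps pushes the weights toward rewards under which the expert's choice is optimal.</p>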
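<hr />
<p>The paper's discretization of $s = [\text{position},~\text{velocity}]$ into $120 \times 120$ states can be sketched as a mapping from a continuous state to one of 14,400 cell indices. Uniform bins over the bounds given on the Environment Details slides are an assumption; the slides do not say how the grid was laid out. The helper names <code>to_bin</code> and <code>state_index</code> are hypothetical.</p>

```python
N_BINS = 120
POS_RANGE = (-1.2, 0.6)
VEL_RANGE = (-0.07, 0.07)


def to_bin(value, low, high, n=N_BINS):
    """Map a value in [low, high] to an integer bin 0..n-1 (uniform bins, assumed)."""
    frac = (value - low) / (high - low)
    return min(n - 1, max(0, int(frac * n)))  # clamp the upper edge and any overshoot


def state_index(state):
    """Flatten (position bin, velocity bin) into a single index 0..n*n-1."""
    position, velocity = state
    i = to_bin(position, *POS_RANGE)
    j = to_bin(velocity, *VEL_RANGE)
    return i * N_BINS + j
```

<p>With the state space reduced to 14,400 cells, transitions between cells could be tabulated into a model; per the slides, the paper built such a model and this project did not.</p>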