Master Deep Q Learning with PyTorch
Table of Contents
- Introduction
- The Basics of Deep Q Learning
- What is Deep Q Learning?
- How does Deep Q Learning Work?
- Setting up the Environment
- Installing the Required Packages
- Importing the Necessary Libraries
- Creating the Deep Q Network
- Building the Deep Neural Network Model
- Defining the Loss Function and Optimizer
- Implementing the Agent
- Initializing the Agent
- Choosing Actions
- Learning from Experiences
- Running the Deep Q Learning Algorithm
- Setting the Hyperparameters
- Training the Agent
- Analyzing the Performance
- Conclusion
Introduction
Deep Q learning is a popular reinforcement learning algorithm that has been widely used to solve complex decision-making tasks. In this article, we will explore the basics of deep Q learning and implement it from scratch using PyTorch and OpenAI Gym.
The Basics of Deep Q Learning
What is Deep Q Learning?
Deep Q learning is a variant of the Q learning algorithm that utilizes a deep neural network as a function approximator to estimate the action-value function. It combines the power of deep learning with the reinforcement learning framework, making it capable of learning directly from high-dimensional input spaces.
How does Deep Q Learning Work?
Deep Q learning works by training a deep neural network to approximate the action-value function Q(s, a), which represents the expected cumulative discounted reward of taking action a in state s and acting according to the current policy thereafter. The network takes the current state as input and outputs an estimated value for each possible action. The agent then (usually) chooses the action with the highest predicted value to interact with the environment. As it acts, the agent collects experiences consisting of the current state, the action taken, the resulting reward, and the next state. These experiences are used to update the network's parameters with a variant of the Q-learning update rule. This process is repeated over many episodes until the agent reaches a satisfactory level of performance.
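Concretely, for a single transition (s, a, r, s'), the network's current estimate Q(s, a) is nudged toward the bootstrap target

y = r + γ · max_a' Q(s', a')    (with y = r if s' is terminal),

and the parameters are updated by gradient descent on the squared error (Q(s, a) - y)^2. This is exactly the loss we will compute in the learn method later in the article.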
Setting up the Environment
Installing the Required Packages
Before we dive into coding, we need to make sure that the necessary packages are installed. LunarLander additionally needs the Box2D physics engine, and the code in this article uses the classic Gym reset/step API, so we pin gym below version 0.26. Open your terminal and run the following command to install the required packages:
pip install torch "gym[box2d]<0.26" numpy matplotlib
Importing the Necessary Libraries
Next, we need to import the required libraries for our implementation. We will be using the following libraries:
import gym
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
Creating the Deep Q Network
Building the Deep Neural Network Model
In this step, we will define the structure of our deep neural network model. We will use a simple architecture consisting of fully connected (linear) layers. Let's define a class called DeepQNetwork that extends the nn.Module class from the PyTorch library:
class DeepQNetwork(nn.Module):
    def __init__(self, input_dims, fc1_dims, fc2_dims, n_actions):
        super(DeepQNetwork, self).__init__()
        self.fc1 = nn.Linear(input_dims, fc1_dims)
        self.fc2 = nn.Linear(fc1_dims, fc2_dims)
        self.fc3 = nn.Linear(fc2_dims, n_actions)
        # Run on a GPU if one is available; the agent code below moves tensors to this device
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.to(self.device)
Defining the Loss Function and Optimizer
The network also needs a forward pass that maps a state to one Q value per action. For training we will use the mean squared error (MSE) loss and the Adam optimizer; since they operate on the network's predictions and parameters, we create them inside the agent class below. First, let's complete the forward method:
class DeepQNetwork(nn.Module):
    # ... (previous code)

    def forward(self, state):
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
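As a quick sanity check (a hypothetical smoke test, not part of the walkthrough itself), we can instantiate the network with LunarLander-like dimensions, 8 observation features and 4 discrete actions, and push a random state through it:

# Hypothetical smoke test: 8-dimensional state, 4 actions (LunarLander-sized)
net = DeepQNetwork(input_dims=8, fc1_dims=256, fc2_dims=256, n_actions=4)
dummy_state = torch.rand(1, 8).to(net.device)  # batch containing one random state
q_values = net.forward(dummy_state)
print(q_values.shape)  # torch.Size([1, 4]) -- one Q value per action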
We also create the LunarLander environment from OpenAI Gym, which the agent will be trained on:
env = gym.make('LunarLander-v2')
Implementing the Agent
Initializing the Agent
The agent class will handle all the operations related to the agent, including choosing actions, learning from experiences, and interacting with the environment. Besides storing the hyperparameters it receives, the constructor below also sets up the action space, an illustrative epsilon-decay schedule, the Q network with its Adam optimizer, and the MSE loss; the replay memory buffers are elided for brevity. Let's define a class called DQNAgent:
class DQNAgent:
    def __init__(self, gamma, epsilon, learning_rate, input_dims, batch_size, n_actions):
        self.gamma = gamma
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.action_space = [i for i in range(n_actions)]
        # Illustrative epsilon-decay schedule so exploration decreases over time
        self.eps_min = 0.01
        self.eps_dec = 5e-4
        # ... (replay memory buffers and other variables)
        self.q_eval = DeepQNetwork(input_dims, 256, 256, n_actions)
        self.optimizer = optim.Adam(self.q_eval.parameters(), lr=learning_rate)
        self.loss = nn.MSELoss()
Choosing Actions
In the DQNAgent class, we need a method to choose actions based on the current state. We use an epsilon-greedy policy: with probability epsilon the agent picks a random action to explore, and otherwise it picks the action with the highest predicted Q value. Let's define a method called choose_action:
class DQNAgent:
    # ... (previous code)

    def choose_action(self, observation):
        if np.random.random() > self.epsilon:
            # Exploit: pick the action with the highest predicted Q value
            state = torch.tensor([observation], dtype=torch.float).to(self.q_eval.device)
            actions = self.q_eval.forward(state)
            action = torch.argmax(actions).item()
        else:
            # Explore: pick a random action
            action = np.random.choice(self.action_space)
        return action
Learning from Experiences
The most crucial part of the DQNAgent class is the method for learning from experiences. Let's define a method called learn:
class DQNAgent:
    # ... (previous code)

    def learn(self):
        # Wait until the replay memory holds at least one full batch
        if self.mem_cntr < self.batch_size:
            return
        self.optimizer.zero_grad()

        # Sample a random batch of transitions from the replay memory
        max_mem = min(self.mem_cntr, self.mem_size)
        batch = np.random.choice(max_mem, self.batch_size, replace=False)
        state_batch = self.state_memory[batch]
        action_batch = self.action_memory[batch]
        reward_batch = self.reward_memory[batch]
        new_state_batch = self.new_state_memory[batch]
        done_batch = self.terminal_memory[batch]

        # Convert the numpy arrays to tensors on the network's device
        state_batch = torch.tensor(state_batch, dtype=torch.float).to(self.q_eval.device)
        action_batch = torch.tensor(action_batch, dtype=torch.long).to(self.q_eval.device)
        reward_batch = torch.tensor(reward_batch, dtype=torch.float).to(self.q_eval.device)
        new_state_batch = torch.tensor(new_state_batch, dtype=torch.float).to(self.q_eval.device)
        done_batch = torch.tensor(done_batch, dtype=torch.bool).to(self.q_eval.device)

        # Q values of the actions that were actually taken
        q_eval = self.q_eval.forward(state_batch).gather(1, action_batch.unsqueeze(1)).squeeze(1)
        # Bootstrap target: reward plus discounted value of the best next action
        # (detached so gradients do not flow through the target)
        q_next = self.q_eval.forward(new_state_batch).max(1)[0].detach()
        q_next[done_batch] = 0.0
        q_target = reward_batch + self.gamma * q_next

        loss = self.loss(q_eval, q_target)
        loss.backward()
        self.optimizer.step()

        # Decay epsilon so the agent gradually shifts from exploration to exploitation
        self.epsilon = max(self.epsilon - self.eps_dec, self.eps_min)
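The learn method indexes into replay memory arrays (state_memory, action_memory, and so on), and the training loop later calls agent.store_transition, but those pieces are hidden behind the "# ... (other variables)" comments above. As a rough sketch of what they could look like, where the attribute names match how learn uses them but the buffer size, dtypes, and the init_memory helper are assumptions, the memory could be set up and filled like this:

class DQNAgent:
    # ... (previous code)

    def init_memory(self, mem_size, input_dims):
        # Pre-allocated circular replay buffer (a sketch; call this from __init__)
        self.mem_size = mem_size
        self.mem_cntr = 0
        self.state_memory = np.zeros((mem_size, input_dims), dtype=np.float32)
        self.new_state_memory = np.zeros((mem_size, input_dims), dtype=np.float32)
        self.action_memory = np.zeros(mem_size, dtype=np.int64)
        self.reward_memory = np.zeros(mem_size, dtype=np.float32)
        self.terminal_memory = np.zeros(mem_size, dtype=bool)

    def store_transition(self, state, action, reward, new_state, done):
        # Overwrite the oldest entry once the buffer is full
        index = self.mem_cntr % self.mem_size
        self.state_memory[index] = state
        self.action_memory[index] = action
        self.reward_memory[index] = reward
        self.new_state_memory[index] = new_state
        self.terminal_memory[index] = done
        self.mem_cntr += 1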
Running the Deep Q Learning Algorithm
Setting the Hyperparameters
Before we train our agent, we need to set the hyperparameters. These include the discount factor (gamma), exploration rate (epsilon), learning rate, input dimensions, batch size, and number of actions. Let's define these hyperparameters:
gamma = 0.99
epsilon = 1.0
learning_rate = 0.003
input_dims = env.observation_space.shape[0]
batch_size = 64
n_actions = env.action_space.n
agent = DQNAgent(gamma, epsilon, learning_rate, input_dims, batch_size, n_actions)
Training the Agent
To train the agent, we need to iterate over multiple episodes and interact with the environment. Let's define the training loop:
scores = []
eps_history = []

for i in range(500):
    score = 0
    done = False
    observation = env.reset()
    while not done:
        action = agent.choose_action(observation)
        new_observation, reward, done, _ = env.step(action)
        agent.store_transition(observation, action, reward, new_observation, done)
        agent.learn()
        observation = new_observation
        score += reward
    scores.append(score)
    eps_history.append(agent.epsilon)
    if i % 100 == 0:
        average_score = np.mean(scores[-100:])
        print(f'Episode {i}: Average Score = {average_score}, Epsilon = {agent.epsilon}')

env.close()
Analyzing the Performance
To analyze the agent's performance, we can plot the learning curve using the plot_learning_curve function from the utils module:
from utils import plot_learning_curve
x = [i + 1 for i in range(len(scores))]
plot_learning_curve(x, scores, eps_history, "Lunar Lander", "Episodes", "Score/Epsilon")
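Note that utils is our own file, not a published package. A minimal version of plot_learning_curve using matplotlib, matching the signature used above (treating the three string arguments as the figure title and axis labels, which is an assumption), might look like this:

# utils.py -- a minimal sketch of the plotting helper
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curve(x, scores, epsilons, title, xlabel, ylabel):
    # Running average of the last 100 scores, plotted alongside epsilon
    running_avg = [np.mean(scores[max(0, i - 99):i + 1]) for i in range(len(scores))]
    fig, ax1 = plt.subplots()
    ax1.plot(x, running_avg, color='C0', label='100-episode average score')
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel(ylabel)
    ax1.set_title(title)
    ax2 = ax1.twinx()
    ax2.plot(x, epsilons, color='C1', label='epsilon')
    ax2.set_ylabel('Epsilon')
    fig.tight_layout()
    plt.show()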
Conclusion
In this article, we implemented the deep Q learning algorithm from scratch to solve the Lunar Lander environment in OpenAI Gym. We started by setting up the environment, then created the deep Q network model and the agent class. We trained the agent by interacting with the environment and updating the Q values based on the experiences. Finally, we analyzed the agent's performance by plotting the learning curve.
Deep Q learning is a powerful technique that can be applied to many complex reinforcement learning problems. By understanding the basics of deep Q learning and how to implement it, you can start exploring more advanced RL algorithms and tackle more challenging tasks.