0. Introduction
In this tutorial I’ll show how to develop a simple bot that uses reinforcement learning to win a game. I used Python 2.7, PyGame and TensorFlow, with GPU acceleration enabled for faster training.
The neural network training script was developed by wh33ler; I changed it a little to make it simpler and more readable. For this example I developed a simple Flappy-Bird-like game.
1. Libraries installation
We’ll need to install NumPy, PyGame and TensorFlow. NumPy and PyGame can be installed with pip:
sudo pip install numpy
sudo pip install pygame
The TensorFlow installation is a bit different: we first need to install the CUDA drivers, as described here. Once that is done, we can install TensorFlow with:
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
2. Testing the installation
Now that everything is installed, we have to check that the software works. Download the code here:
https://github.com/SimoneNascivera/Avoid_obstacle_RL
Inside this repo you’ll find two folders: “Avoid_obstacle_easier” and “Avoid_obstacle”. The first one is a simple game where the little ship has to find the hole to pass through, but the hole is always in the same place. The second one is more like the Flappy Bird game itself, with the hole randomly located around the middle of the screen.
If you open the terminal and type:
python2.7 evaluate.py
you should get something like this:
If not, ask me in the comments below or mail me.
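Before asking, it can help to find out which library is actually failing to import. The snippet below is a small sketch of my own and not part of the repo: `check_modules` is a hypothetical helper name, and it only relies on the standard `importlib` module, so it should run on Python 2.7 as well as Python 3.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string.

    The value is "unknown" if the module imports but exposes no
    __version__ attribute, and None if the import fails.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is missing or one of its dependencies is broken.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `check_modules(["numpy", "pygame", "tensorflow"])` in a Python shell returns a dictionary; any entry that is `None` points at the library to reinstall.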
3. Understanding the code
The code is quite simple, but it may be a little difficult to follow if you’re not familiar with Python and PyGame.
The first lines are used to import the libraries and to initialize the variables.
import pygame #helps us make GUI games in python
import random #used to pick the random position of the hole between the obstacles
#size of our window
WINDOW_WIDTH = 400
WINDOW_HEIGHT = 400
#size of our OBSTACLE
OBSTACLE_WIDTH = 10
OBSTACLE_HEIGHT = 180
OBSTACLE_SPACE = 30
#size of our SHIP
SHIP_WIDTH = 10
SHIP_HEIGHT = 10
#distance from the edge of the window
SHIP_BUFFER = 10
SHIP_SPEED = 5
#RGB colors for our ship and obstacles
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
The next lines initialize the game and draw the obstacle and the little ship. To draw something on the screen with pygame, we use:
obstacle = pygame.Rect(WINDOW_WIDTH/2, center + OBSTACLE_SPACE, OBSTACLE_WIDTH, WINDOW_HEIGHT-center-OBSTACLE_SPACE)
pygame.draw.rect(screen, WHITE, obstacle)
Here “screen” is the display surface created with pygame.display.set_mode, “WHITE” is a constant we defined before and “obstacle” is a rectangle.
“drawScore(score, neg)” and “drawInfos(infos, action)” are used to display useful data on the screen.
So the code looks like this:
#initialize our screen using width and height vars
screen = pygame.display.set_mode((WINDOW_WIDTH, WINDOW_HEIGHT))
def drawObstacle(center):
    obstacle = pygame.Rect(WINDOW_WIDTH/2, center + OBSTACLE_SPACE, OBSTACLE_WIDTH, WINDOW_HEIGHT-center-OBSTACLE_SPACE)
    obstacle1 = pygame.Rect(WINDOW_WIDTH/2, 0, OBSTACLE_WIDTH, center-OBSTACLE_SPACE)
    pygame.draw.rect(screen, WHITE, obstacle)
    pygame.draw.rect(screen, WHITE, obstacle1)
def drawShip(shipYPos, shipXPos):
    #create it
    paddle1 = pygame.Rect(shipXPos, shipYPos, SHIP_WIDTH, SHIP_HEIGHT)
    #draw it
    pygame.draw.rect(screen, WHITE, paddle1)
def drawScore(score, neg):
    font = pygame.font.Font(None, 28)
    scorelabel = font.render("Score " + str(score), 1, WHITE)
    screen.blit(scorelabel, (30 , 10))
    font = pygame.font.Font(None, 28)
    scorelabel = font.render("Failed " + str(neg), 1, WHITE)
    screen.blit(scorelabel, (30 , 50))
def drawInfos(infos, action):
    font = pygame.font.Font(None, 15)
    if(infos[3] != 'model only'):
        label = font.render("step " + str(infos[0]) + " ["+str(infos[3])+"]", 1, WHITE)
        screen.blit(label, (30 , 30))
        label = font.render("epsilon " + str(infos[2]), 1, WHITE)
        screen.blit(label, (30 , 45))
    label = font.render("q_max " + str(infos[1]), 1, WHITE)
    screen.blit(label, (30 , 60))
    actionText = "--"
    if (action[1] == 1):
        actionText = "Up"
    if (action[2] == 1):
        actionText = "Down"
    label = font.render("action " + actionText, 1, WHITE)
    screen.blit(label, (30 , 75))
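The geometry in drawObstacle can be checked without opening a window. The following is only a sketch: `obstacle_rects` and `gap_height` are names I made up for illustration, and the tuples simply mirror the arguments passed to pygame.Rect above, using the same constants.

```python
WINDOW_WIDTH = 400
WINDOW_HEIGHT = 400
OBSTACLE_WIDTH = 10
OBSTACLE_SPACE = 30


def obstacle_rects(center):
    """Return (upper, lower) as (x, y, width, height) tuples.

    These are the same values drawObstacle hands to pygame.Rect.
    """
    x = WINDOW_WIDTH // 2  # integer division, as "/" behaves in Python 2.7
    upper = (x, 0, OBSTACLE_WIDTH, center - OBSTACLE_SPACE)
    lower = (x, center + OBSTACLE_SPACE, OBSTACLE_WIDTH,
             WINDOW_HEIGHT - center - OBSTACLE_SPACE)
    return upper, lower


def gap_height(center):
    """Vertical size of the hole between the two obstacles."""
    upper, lower = obstacle_rects(center)
    bottom_of_upper = upper[1] + upper[3]
    top_of_lower = lower[1]
    return top_of_lower - bottom_of_upper
```

For any `center` the hole is `2 * OBSTACLE_SPACE` pixels tall. Note that for a `center` smaller than `OBSTACLE_SPACE` the computed height of the upper rectangle becomes negative.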
Now comes the main code. The function returns a score of “-1” when the ship hits the wall, “0” while it is still “flying”, and “1” when it successfully passes between the two walls. If the action is [0,1,0] the ship moves up; if it is [0,0,1] the ship moves down.
def updateShip(action, shipYPos, shipXPos, center):
    #if move up
    if (action[1] == 1):
        shipYPos = shipYPos - SHIP_SPEED
    #if move down
    if (action[2] == 1):
        shipYPos = shipYPos + SHIP_SPEED
    #don't let it move off the screen
    if (shipYPos < 0):
        shipYPos = 0
    if (shipYPos > WINDOW_HEIGHT - SHIP_HEIGHT):
        shipYPos = WINDOW_HEIGHT - SHIP_HEIGHT
    shipXPos = shipXPos + SHIP_SPEED
    score = 0
    #if the ship hits the walls, the score is -1
    if((shipXPos >= WINDOW_WIDTH/2) and ((shipYPos> (center + OBSTACLE_SPACE-15)) or (shipYPos< (center -( OBSTACLE_SPACE-15))))):
        center = random.randint(0, 370)
        shipXPos = 0
        shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
        score = -1
    #if the ship pass inside the walls, the score is 1
    if((shipXPos >= WINDOW_WIDTH/2+50) and (shipYPos< (center + OBSTACLE_SPACE-15)) and (shipYPos> (center -( OBSTACLE_SPACE-15)))):
        center = random.randint(0, 370)
        shipXPos = 0
        shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
        score = 1
    return shipYPos, shipXPos, center, score
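To make the scoring rule easier to test on its own, it can be pulled out into a pure function. This is a sketch under my own naming (`reward` and `MARGIN` do not appear in the repo); the conditions are copied from updateShip, without the position reset.

```python
WINDOW_WIDTH = 400
OBSTACLE_SPACE = 30
MARGIN = OBSTACLE_SPACE - 15  # same offset that updateShip uses


def reward(shipXPos, shipYPos, center):
    """Return -1 for a crash, 1 for a successful pass, 0 otherwise."""
    wall_x = WINDOW_WIDTH // 2
    outside_hole = shipYPos > center + MARGIN or shipYPos < center - MARGIN
    inside_hole = center - MARGIN < shipYPos < center + MARGIN
    if shipXPos >= wall_x and outside_hole:
        return -1
    if shipXPos >= wall_x + 50 and inside_hole:
        return 1
    return 0
```

For example, `reward(250, 200, 200)` gives 1 and `reward(200, 300, 200)` gives -1. With these constants the accepted band is 15 pixels either side of `center`, which is narrower than the 30 pixels either side that drawObstacle leaves open.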
The next part initializes the game itself and updates the frame on each iteration. “getPresentFrame(self)” is called only once, for the first frame: it sets up the screen and returns only the window screenshot. The most important function is “getNextFrame(self, action, infos)”, which takes the calculated action and the infos to display as input parameters and returns the score and the window screenshot.
class AvoidObstacle:
    def __init__(self):
        pygame.font.init()
        #random number for initial direction of ball (not used in the code shown here)
        num = random.randint(0,9)
        #keep score
        self.neg = 0
        self.tally = 0
        #initialize the position of the ship
        self.shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
    #
    def getPresentFrame(self):
        #for each frame, calls the event queue, like if the main window needs to be repainted
        pygame.event.pump()
        #make the background black
        screen.fill(BLACK)
        #draw obstacles
        self.center = random.randint(0, 370)
        drawObstacle(self.center)
        #draw our ship
        self.shipXPos = SHIP_BUFFER
        drawShip(self.shipYPos, self.shipXPos)
        #draw the score
        drawScore(self.tally, self.neg)
        #copies the pixels from our surface to a 3D array. we'll use this for RL
        image_data = pygame.surfarray.array3d(pygame.display.get_surface())
        #updates the window
        pygame.display.flip()
        #return our surface data
        return image_data
    #update our screen
    def getNextFrame(self, action, infos):
        pygame.event.pump()
        score = 0
        screen.fill(BLACK)
        #update our ship
        self.shipYPos, self.shipXPos, self.center, score = updateShip(action, self.shipYPos, self.shipXPos, self.center)
        if(score == -1):
            self.neg = self.neg + 1
        drawObstacle(self.center)
        if(self.shipXPos > WINDOW_WIDTH):
            self.shipXPos = SHIP_BUFFER
        drawShip(self.shipYPos, self.shipXPos)
        #get the surface data
        image_data = pygame.surfarray.array3d(pygame.display.get_surface())
        drawScore(self.tally, self.neg)
        drawInfos(infos, action)
        #update the window
        pygame.display.flip()
        #record the total score
        self.tally = self.tally + score
        #return the score and the surface data
        return [score, image_data]
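The two arguments of getNextFrame are plain lists, so they are easy to build by hand when you want to drive the game without the training script. The helpers below are hypothetical names of mine, not part of the repo. I am assuming that index 0 of the action means "do nothing", since the code above only checks indices 1 and 2, and the order of the infos fields is taken from drawInfos.

```python
def one_hot_action(index):
    """Build an action list: 0 = no move (assumed), 1 = up, 2 = down."""
    if index not in (0, 1, 2):
        raise ValueError("index must be 0, 1 or 2")
    action = [0, 0, 0]
    action[index] = 1
    return action


def make_infos(step, q_max, epsilon, mode):
    """Order the fields the way drawInfos reads them: infos[0]..infos[3]."""
    return [step, q_max, epsilon, mode]


def describe(action):
    """Return the same label drawInfos prints for an action."""
    text = "--"
    if action[1] == 1:
        text = "Up"
    if action[2] == 1:
        text = "Down"
    return text


# With the tutorial's game code in scope (this part needs pygame):
#   game = AvoidObstacle()
#   game.getPresentFrame()
#   score, frame = game.getNextFrame(one_hot_action(1),
#                                    make_infos(0, 0.0, 1.0, "manual"))
# "manual" is just a placeholder mode string for this sketch.
```

Passing `'model only'` as the mode hides the step and epsilon lines, because drawInfos checks for that exact string.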
4. Training
Once you understand how the game works, let’s start training to get a successful bot. We first have to edit the RL.py script at line 22 and change it like this:
USE_MODEL = False
If you don’t change it, the bot will just use the neural network with random weights and biases, and it won’t train.
If you did everything right, when you run it from the terminal:
python2.7 RL.py
you should get something like this:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
And the game should start:
The training software is designed to back up the trained weights every 5000 iterations, and when you restart the training it automatically uses the last backup. Once you think the network has trained enough, you can evaluate it with:
python2.7 evaluate.py
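The backup-and-resume behaviour described above follows a common pattern, which can be sketched in a few lines. This is not the code from RL.py and it does not use TensorFlow: I am using `pickle` and a dummy "weights" dictionary as stand-ins, and all the function names are hypothetical. Only the interval of 5000 iterations comes from the tutorial.

```python
import os
import pickle

SAVE_EVERY = 5000  # backup interval mentioned in the tutorial


def load_checkpoint(path):
    """Return (step, weights) from the last backup, or (0, {}) if there is none."""
    if not os.path.exists(path):
        return 0, {}
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    return state["step"], state["weights"]


def save_checkpoint(path, step, weights):
    """Write the current step counter and weights to disk."""
    with open(path, "wb") as handle:
        pickle.dump({"step": step, "weights": weights}, handle)


def train(path, iterations):
    """Run dummy updates, resuming from `path` and backing up to it."""
    step, weights = load_checkpoint(path)
    for _ in range(iterations):
        step += 1
        # Stand-in for one real training update.
        weights["updates"] = weights.get("updates", 0) + 1
        if step % SAVE_EVERY == 0:
            save_checkpoint(path, step, weights)
    return step
```

One consequence of this pattern is that stopping between two backups loses the progress made since the last one: after 7000 iterations the file on disk still holds the state from iteration 5000.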
After 20 hours of training I got this result:
I hope this tutorial helped some of you. For questions and more, mail me.
November 22, 2018 at 7:34 am
what kind of language programming?
February 1, 2019 at 7:47 am
It’s Python with TensorFlow