0. Introduction

In this tutorial I’ll show how to develop a simple bot that uses reinforcement learning to learn to play a simple game. I used Python 2.7, PyGame and TensorFlow, with GPU acceleration enabled to speed up training.

The neural network training script was developed by wh33ler; I changed it a little to make it simpler and more readable. The game in this example is a simple Flappy-Bird-like game that I developed.

1. Libraries installation

We’ll need to install NumPy, PyGame and TensorFlow. NumPy and PyGame can be installed with pip:

sudo pip install numpy
sudo pip install pygame

The TensorFlow installation is a bit different: we first need to install the CUDA drivers, as described here. Once that is done, we can install TensorFlow with:

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
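Before moving on, a quick way to confirm that the three libraries can be imported is a small helper like the one below. This is only a sketch: the name check_modules is my own, and the __version__ attribute is read defensively because I am not assuming every module exposes it.

```python
import importlib

def check_modules(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # the library is missing or could not be loaded
            found[name] = None
            continue
        # fall back to a placeholder if the module has no __version__
        found[name] = str(getattr(module, "__version__", "unknown"))
    return found

# Example: print(check_modules(["numpy", "pygame", "tensorflow"]))
```

Any entry that comes back as None points at a library that still has to be installed.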

2. Testing the installation

Now that everything is installed, we have to check that the software works. Download the code here:

https://github.com/SimoneNascivera/Avoid_obstacle_RL

Inside this repo you’ll find two folders: “Avoid_obstacle_easier” and “Avoid_obstacle”. The first one is a simpler game where the little ship has to find the hole to pass through, but the hole is always in the same place. The second one is more like Flappy Bird itself: the obstacle is in the middle of the screen and the hole is placed at a random height.

If you open the terminal and type:

python2.7 evaluate.py

you should get something like this:

If not, ask me in the comments below or mail me.

3. Understanding the code

The code is quite simple, but it could be a little difficult to follow if you’re not familiar with Python and PyGame.

The first lines are used to import the libraries and to initialize the variables.

import pygame #helps us make GUI games in Python
import random #used to pick a random position for the hole in the obstacle

#size of our window
WINDOW_WIDTH = 400
WINDOW_HEIGHT = 400

#size of our OBSTACLE
OBSTACLE_WIDTH = 10
OBSTACLE_HEIGHT = 180
OBSTACLE_SPACE = 30
#size of our SHIP
SHIP_WIDTH = 10
SHIP_HEIGHT = 10
#distance from the edge of the window
SHIP_BUFFER = 10
SHIP_SPEED = 5

#RGB colors for the ship, the obstacle and the background
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

The next lines initialize the game and draw the obstacle and the little ship. To draw something on the screen with PyGame, we build a rectangle and pass it to pygame.draw.rect:

obstacle = pygame.Rect(WINDOW_WIDTH/2,  center + OBSTACLE_SPACE, OBSTACLE_WIDTH, WINDOW_HEIGHT-center-OBSTACLE_SPACE)
pygame.draw.rect(screen, WHITE, obstacle)

Here “screen” is the window surface created with pygame.display.set_mode, “WHITE” is a constant we defined earlier and “obstacle” is the rectangle to draw.
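To make the geometry concrete, the sketch below computes the same (left, top, width, height) values that are passed to pygame.Rect for the two halves of the obstacle, without needing PyGame at all. The function name obstacle_rects is hypothetical; the constants are the ones defined earlier, and // is used so that the division stays an integer on Python 3 as well.

```python
WINDOW_WIDTH = 400
WINDOW_HEIGHT = 400
OBSTACLE_WIDTH = 10
OBSTACLE_SPACE = 30

def obstacle_rects(center):
    """Return (upper, lower) as (left, top, width, height) tuples."""
    left = WINDOW_WIDTH // 2
    # upper wall: from the top of the window down to the start of the hole
    upper = (left, 0, OBSTACLE_WIDTH, center - OBSTACLE_SPACE)
    # lower wall: from the end of the hole down to the bottom of the window
    lower = (left, center + OBSTACLE_SPACE, OBSTACLE_WIDTH,
             WINDOW_HEIGHT - center - OBSTACLE_SPACE)
    return upper, lower

upper, lower = obstacle_rects(200)
print(upper)  # (200, 0, 10, 170)
print(lower)  # (200, 230, 10, 170)
```

With these constants the drawn hole is 2 * OBSTACLE_SPACE = 60 pixels tall and is centered on “center”.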

“drawScore(score, neg)” and “drawInfos(infos, action)” are used to display useful data on the screen.

So the code looks like:

#initialize our screen using width and height vars
screen = pygame.display.set_mode((WINDOW_WIDTH, WINDOW_HEIGHT))

def drawObstacle(center):
    obstacle = pygame.Rect(WINDOW_WIDTH/2,  center + OBSTACLE_SPACE, OBSTACLE_WIDTH, WINDOW_HEIGHT-center-OBSTACLE_SPACE)
    obstacle1 = pygame.Rect(WINDOW_WIDTH/2, 0, OBSTACLE_WIDTH, center-OBSTACLE_SPACE)
    pygame.draw.rect(screen, WHITE, obstacle)
    pygame.draw.rect(screen, WHITE, obstacle1)

def drawShip(shipYPos, shipXPos):
    #create it
    paddle1 = pygame.Rect(shipXPos, shipYPos, SHIP_WIDTH, SHIP_HEIGHT)
    #draw it
    pygame.draw.rect(screen, WHITE, paddle1)

def drawScore(score, neg):    
    font = pygame.font.Font(None, 28)    
    scorelabel = font.render("Score " + str(score), 1, WHITE)
    screen.blit(scorelabel, (30 , 10))
    font = pygame.font.Font(None, 28)    
    scorelabel = font.render("Failed " + str(neg), 1, WHITE)
    screen.blit(scorelabel, (30 , 50))

def drawInfos(infos, action):
    font = pygame.font.Font(None, 15)
    if(infos[3] != 'model only'):        
        label = font.render("step " + str(infos[0]) + " ["+str(infos[3])+"]", 1, WHITE)
        screen.blit(label, (30 , 30))
        label = font.render("epsilon " + str(infos[2]), 1, WHITE)
        screen.blit(label, (30 , 45))
        label = font.render("q_max " + str(infos[1]), 1, WHITE)
        screen.blit(label, (30 , 60))
        actionText = "--"
        if (action[1] == 1):
            actionText = "Up"
        if (action[2] == 1):
            actionText = "Down"
        label = font.render("action " + actionText, 1, WHITE)
        screen.blit(label, (30 , 75))

The main code comes next. updateShip returns a score of “-1” when the ship hits the wall, “0” while it is still “flying” and “1” when it successfully passes between the two walls. If the action is [0,1,0] the ship moves up; if it is [0,0,1] it moves down.

def updateShip(action, shipYPos, shipXPos, center):
    #if move up
    if (action[1] == 1):
        shipYPos = shipYPos - SHIP_SPEED
    #if move down
    if (action[2] == 1):
        shipYPos = shipYPos + SHIP_SPEED

    #don't let it move off the screen
    if (shipYPos < 0):
        shipYPos = 0
    if (shipYPos > WINDOW_HEIGHT - SHIP_HEIGHT):
        shipYPos = WINDOW_HEIGHT - SHIP_HEIGHT

    shipXPos = shipXPos + SHIP_SPEED
    score = 0
    #if the ship hits the walls, the score is -1
    if((shipXPos >= WINDOW_WIDTH/2) and ((shipYPos> (center + OBSTACLE_SPACE-15)) or (shipYPos< (center -( OBSTACLE_SPACE-15))))):      
        center = random.randint(0, 370)
        shipXPos = 0
        shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
        score = -1
    #if the ship passes between the walls, the score is 1
    if((shipXPos >= WINDOW_WIDTH/2+50) and (shipYPos< (center + OBSTACLE_SPACE-15)) and (shipYPos> (center -( OBSTACLE_SPACE-15)))):
        center = random.randint(0, 370)
        shipXPos = 0
        shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
        score = 1
    return shipYPos, shipXPos, center, score
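The scoring rule is easier to check in isolation. The sketch below mirrors the two conditions of updateShip as a pure function, so it can be tried with plain numbers. The names score_for and MARGIN are my own and do not appear in the repo; the thresholds are the ones in the code above.

```python
WINDOW_WIDTH = 400
OBSTACLE_SPACE = 30
MARGIN = OBSTACLE_SPACE - 15  # the same "OBSTACLE_SPACE-15" used in updateShip

def score_for(shipXPos, shipYPos, center):
    """Score for a ship position, mirroring the two checks in updateShip."""
    wall_x = WINDOW_WIDTH // 2
    inside_band = center - MARGIN < shipYPos < center + MARGIN
    outside_band = shipYPos > center + MARGIN or shipYPos < center - MARGIN
    if shipXPos >= wall_x and outside_band:
        return -1  # hit one of the walls
    if shipXPos >= wall_x + 50 and inside_band:
        return 1   # passed through the hole
    return 0       # still flying

print(score_for(100, 200, 200))  # 0, the ship has not reached the wall yet
print(score_for(200, 300, 200))  # -1, at the wall and below the band
print(score_for(250, 200, 200))  # 1, past the wall and inside the band
```

Note that the test is made on the top edge of the ship with a band of 15 pixels on each side of “center”, which is narrower than the drawn hole of 30 pixels on each side, so the 10-pixel-tall ship stays inside the hole whenever it scores.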

The next part initializes the game itself and updates the frame in each iteration. “getPresentFrame(self)” is called only for the first frame: it sets up the screen and returns only a screenshot of the window. The most important function is “getNextFrame(self, action, infos)”, which takes the calculated action and the infos shown on screen by drawInfos as input, and returns the score and a screenshot of the window.

class AvoidObstacle:
    def __init__(self):
        pygame.font.init()
        #random number (not used anywhere else in this class)
        num = random.randint(0,9)
        #keep score
        self.neg = 0
        self.tally = 0
        #initialize the vertical position of the ship
        self.shipYPos = WINDOW_HEIGHT / 2 - SHIP_HEIGHT / 2
    #
    def getPresentFrame(self):
        #for each frame, calls the event queue, like if the main window needs to be repainted
        pygame.event.pump()
        #make the background black
        screen.fill(BLACK)
        #draw obstacles
        self.center = random.randint(0, 370)
        drawObstacle(self.center)
        #draw our ship
        self.shipXPos = SHIP_BUFFER
        drawShip(self.shipYPos, self.shipXPos)
        #draw the score
        drawScore(self.tally, self.neg)  
        #copies the pixels from our surface to a 3D array. we'll use this for RL
        image_data = pygame.surfarray.array3d(pygame.display.get_surface())
        #updates the window
        pygame.display.flip()

        #return our surface data
        return image_data

    #update our screen
    def getNextFrame(self, action, infos):
        pygame.event.pump()
        score = 0
        screen.fill(BLACK)
        #update our ship
        self.shipYPos, self.shipXPos, self.center, score = updateShip(action, self.shipYPos, self.shipXPos, self.center)
        if(score == -1):
            self.neg = self.neg + 1
        drawObstacle(self.center)
        if(self.shipXPos > WINDOW_WIDTH):
            self.shipXPos = SHIP_BUFFER
        drawShip(self.shipYPos, self.shipXPos)
        #get the surface data
        image_data = pygame.surfarray.array3d(pygame.display.get_surface())
        drawScore(self.tally, self.neg)  
        drawInfos(infos, action)
        #update the window
        pygame.display.flip()
        #record the total score
        self.tally = self.tally + score     
        #return the score and the surface data
        return [score, image_data]
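The action passed to getNextFrame comes from RL.py, which is not reproduced in this tutorial. Since drawInfos displays “epsilon” and “q_max”, one plausible way to picture the choice is an epsilon-greedy rule, sketched below. Everything here is an assumption made for illustration: the function name choose_action and the q_values list are not taken from RL.py. The only parts grounded in the code above are the one-hot format, with index 1 for up and index 2 for down, and the fact that updateShip leaves the height unchanged for [1,0,0].

```python
import random

N_ACTIONS = 3  # 0 = height unchanged, 1 = up, 2 = down

def choose_action(q_values, epsilon, rng=random):
    """Return a one-hot action list such as [0, 1, 0]."""
    if rng.random() < epsilon:
        # explore: pick any action at random
        index = rng.randrange(N_ACTIONS)
    else:
        # exploit: pick the action with the highest estimated value
        index = max(range(N_ACTIONS), key=lambda i: q_values[i])
    action = [0] * N_ACTIONS
    action[index] = 1
    return action

print(choose_action([0.1, 0.7, 0.2], epsilon=0.0))  # [0, 1, 0]
```

With epsilon at 0 the highest-valued action is always taken; with epsilon at 1 every action is random, which is the usual starting point before the value is lowered during training.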

4. Training

Once you understand how the game works, let’s start training in order to get a successful bot. First we have to edit the RL.py script at line 22 and change it to:

USE_MODEL = False

If you don’t change it, the bot will just play with the neural network as it is, with random weights and biases, and it won’t train.

If you did everything right, when you run it from the terminal:

python2.7 RL.py

you should get something like this:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

And the game should start:

The training software is designed to back up the trained weights every 5000 iterations, and when you restart the training it automatically uses the last backup. Once you think the network has trained enough, you can evaluate it with:

python2.7 evaluate.py

After 20 hours of training I got this result:

I hope this tutorial helped some of you. For questions and more, mail me.