Neural Network with heuristic vs Deep Q Network

Artificial Intelligence 7750 - Graduate final project

Relevant papers

Demo video


Summary

This project compares a neural network guided by a heuristic against Deep Q-Network (DQN) learning, measuring how well each agent improves itself at playing a Snake game.

Environment

Actions

The snake's 3 possible actions are represented with one-hot encoding, where 1 marks the action to take and 0 marks the actions not taken.

[1,0,0] = forward (continues in current direction)

[0,1,0] = turn right

[0,0,1] = turn left
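
For illustration, here is a minimal sketch of how a one-hot action can be mapped onto the snake's next heading; the clockwise direction list and the helper name are assumptions for this sketch, not taken from the repository.

    # Headings in clockwise order; turning right steps forward in this list,
    # turning left steps backward.
    CLOCKWISE = ["right", "down", "left", "up"]

    def apply_action(current_direction, action):
        """Map a one-hot action to the snake's new absolute heading."""
        idx = CLOCKWISE.index(current_direction)
        if action == [1, 0, 0]:              # forward: keep the current heading
            return current_direction
        if action == [0, 1, 0]:              # turn right: next clockwise heading
            return CLOCKWISE[(idx + 1) % 4]
        return CLOCKWISE[(idx - 1) % 4]      # turn left: previous clockwise heading

For example, apply_action("down", [0, 1, 0]) returns "left": a right turn while heading down (in screen coordinates) points the snake to the left.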

State

state = a vector of 11 binary flags, with 1 = condition met and 0 = condition unmet.

  • Whether danger (the snake colliding with its own body or the game window boundary) lies forward, to the right, and/or to the left of the snake.
  • Whether the snake's current direction is left, right, up, or down.
  • Whether the mice (food) is to the left, right, up, and/or down of the snake (two of these flags are set when it is diagonal).
    [danger_forward, danger_right_turn, danger_left_turn,
     going_left, going_right, going_up, going_down,
     mice_left, mice_right, mice_up, mice_down]
    

Ex: state = [0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0] = danger on a left turn, snake moving downward, and the mice (food) to the right of and above the snake.
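
A minimal sketch of how such an 11-value state could be assembled is shown below; the game attributes (head, food, direction) and the is_collision helper are assumed names for illustration and may differ from the actual code.

    def get_state(game, block_size=20):
        head = game.head                            # (x, y) of the snake's head
        # Points one block away in each absolute direction
        point_l = (head[0] - block_size, head[1])
        point_r = (head[0] + block_size, head[1])
        point_u = (head[0], head[1] - block_size)
        point_d = (head[0], head[1] + block_size)

        going_l = game.direction == "left"
        going_r = game.direction == "right"
        going_u = game.direction == "up"
        going_d = game.direction == "down"

        state = [
            # Danger straight ahead
            (going_r and game.is_collision(point_r)) or
            (going_l and game.is_collision(point_l)) or
            (going_u and game.is_collision(point_u)) or
            (going_d and game.is_collision(point_d)),
            # Danger on a right turn
            (going_u and game.is_collision(point_r)) or
            (going_d and game.is_collision(point_l)) or
            (going_l and game.is_collision(point_u)) or
            (going_r and game.is_collision(point_d)),
            # Danger on a left turn
            (going_d and game.is_collision(point_r)) or
            (going_u and game.is_collision(point_l)) or
            (going_r and game.is_collision(point_u)) or
            (going_l and game.is_collision(point_d)),
            # Current heading (exactly one of the four is 1)
            going_l, going_r, going_u, going_d,
            # Mice (food) location relative to the head
            game.food[0] < head[0],   # mice left
            game.food[0] > head[0],   # mice right
            game.food[1] < head[1],   # mice up
            game.food[1] > head[1],   # mice down
        ]
        return [int(v) for v in state]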

Model


Neural Network with heuristic

Uses a heuristic function to determine the target action to take (sketched after the list below):

  1. decided_action = a direction in which there is no danger
  2. decided_action = if the mice is in the same direction the snake is heading, return the "go forward" action
  3. decided_action = if the mice is in a direction the snake can turn towards, return that turn
  4. decided_action = if no previous condition matched (danger everywhere), return a random action
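
A minimal sketch of that decision rule, written directly against the 11-value state above; the turn-direction tables, index layout, and direction strings are assumptions for illustration.

    import random

    FORWARD, RIGHT, LEFT = [1, 0, 0], [0, 1, 0], [0, 0, 1]
    # Absolute heading that results from turning right/left out of each heading
    RIGHT_OF = {"right": "down", "down": "left", "left": "up", "up": "right"}
    LEFT_OF = {"right": "up", "up": "left", "left": "down", "down": "right"}

    def heuristic_action(state, current_direction):
        # current_direction is one of "left", "right", "up", "down"
        # state[0..2] = danger forward / on right turn / on left turn
        # state[7..10] = mice left / right / up / down of the head
        mice = {"left": state[7], "right": state[8], "up": state[9], "down": state[10]}

        # 1. Keep only the actions that do not run into danger
        safe = [a for a, dangerous in zip((FORWARD, RIGHT, LEFT), state[:3]) if not dangerous]

        # 2. Mice lies in the direction the snake is already heading -> go forward
        if FORWARD in safe and mice[current_direction]:
            return FORWARD

        # 3. A safe turn points toward the mice -> take that turn
        if RIGHT in safe and mice[RIGHT_OF[current_direction]]:
            return RIGHT
        if LEFT in safe and mice[LEFT_OF[current_direction]]:
            return LEFT

        # 4. Nothing matched (or danger everywhere) -> fall back to a random action
        return random.choice(safe) if safe else random.choice((FORWARD, RIGHT, LEFT))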


DQN

Reward & Penalty

  • eat_mice = +10
  • game_over = -10
  • idle_steps_after_long_time = -10 (the idle/useless step limit is proportional to the snake's length × 100; sketched below)
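
A minimal per-step reward sketch following the values above; the argument names and the idle-step cutoff variable are assumptions for illustration.

    def step_reward(ate_mice, game_over, idle_steps, snake_length):
        """Reward for one game step, per the scheme above."""
        if game_over:
            return -10
        if ate_mice:
            return +10
        if idle_steps > 100 * snake_length:   # wandering too long without eating
            return -10
        return 0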

Q learning

Uses the Bellman equation to calculate new Q values.
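
The equation image is not reproduced here; in the standard DQN formulation that this setup follows, the updated Q value for the action taken is the immediate reward plus the discounted best Q value of the next state, i.e. Q_new(s, a) = r + gamma * max_a' Q(s', a'), and just r at game over. A minimal sketch of that target, assuming a PyTorch model that maps a state to the three action values:

    import torch

    def q_target(model, reward, next_state, done, gamma=0.9):
        # Bellman target for the action that was taken:
        #   Q_new = r                           if the episode ended
        #   Q_new = r + gamma * max_a Q(s', a)  otherwise
        if done:
            return reward
        with torch.no_grad():
            next_q = model(torch.tensor(next_state, dtype=torch.float))
        return reward + gamma * torch.max(next_q).item()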

Gamma & Epsilon

epsilon = 80 - m. The agent explores with a random action if randint(0, 200) < epsilon, otherwise it exploits (sketched below).

gamma = 0.9. Results fare better when gamma, aka the discount factor, is set closer to 1 (i.e., future rewards are valued almost as much as immediate rewards).
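
A sketch of this epsilon-greedy choice; here m is assumed to be the number of games played so far, and a PyTorch model is assumed, so adjust both to the repository's actual code.

    import random
    import torch

    def choose_action(model, state, n_games):
        epsilon = 80 - n_games   # assumes m = games played; exploration shrinks over time
        action = [0, 0, 0]
        if random.randint(0, 200) < epsilon:
            # Explore: pick a random move
            action[random.randint(0, 2)] = 1
        else:
            # Exploit: pick the move with the highest predicted Q value
            with torch.no_grad():
                prediction = model(torch.tensor(state, dtype=torch.float))
            action[torch.argmax(prediction).item()] = 1
        return action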

Results comparison

Both experiments ran for 10 minutes.

Conclusion

As can be seen, the Neural Network with heuristic approach improves quickly at first, but its performance plateaus as time passes. The DQN does not improve as quickly initially, but as time goes on it shows a clear, continuous increase in performance, with no sign of plateauing even after 10 minutes.

Neural Network with heuristic

(performance plots)

DQN

(performance plots)

References
