Unverified Commit 396048cf authored by Mohammad Haghighipanah's avatar Mohammad Haghighipanah Committed by GitHub

Experimental Random Crawl task implemented with Reinforcement Learning (DQN) (#683)

* added random crawl task to nav2 with reinforcement learning

* factored out dqn class and parameters

* modified readme files

* removed extra file

* added tol to laser append

* addressing reviewer comments

* fixed spacing format

* added setup cfg file and updated readme

* throttling down while true loop

* fixed shutdown issue to exit cleanly

* renamed package name and fixed get pkg directory

* fixed few typos
# Overview
The ROS 2 Experimental Reinforcement Learning (RL) package enables training and controlling a robot with RL techniques. It uses the Gazebo simulator and ROS 2 to train a TurtleBot 3 robot.
# Contributing
We have currently implemented a Random Crawl task as a starting point for RL and plan to add more RL-related tasks to this package. The Random Crawl task can be used as a tutorial for getting started with RL.
# Building the source
For instructions on how to build this repo, see the [BUILD.md](https://github.intel.com/haghighi/Experimental_RL/blob/master/doc/BUILD.md) file.
### Random Crawl
Random Crawl is a task that enables TurtleBot3 to crawl randomly around the environment without hitting obstacles.
The following use cases can be achieved with the Random Crawl task:
1. Map Exploration
2. Localization test
3. Security Monitoring
#### Model
TODO. A sketch of the current network is shown below.
- Inputs
- Outputs
- DQN
- Parameters
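The network and its hyperparameters are defined in `dqn.py` and `parameters.py` in this package. The state is the laser scan reduced to 8 sector minima (see `turtlebot3_env.py`), and the output is one Q-value per action (forward, steer left, steer right). A minimal sketch mirroring that code:
```python
# Sketch of the Q-network as built in dqn.py (same layers and optimizer).
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

observation_space = 8  # 8 laser-scan sector minima (see turtlebot3_env.py)
action_size = 3        # forward, steer left, steer right

model = Sequential()
model.add(Dense(8, input_shape=(observation_space,), activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(action_size, activation="linear"))
model.compile(loss="mse", optimizer=Adam(lr=0.005))  # LEARNING_RATE in parameters.py
```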
#### Training
To train Random Crawl:
First, run Gazebo:
```sh
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:<directory_for_turtlebot3_ws>/src/turtlebot3/turtlebot3_simulations/turtlebot3_gazebo/models
export TURTLEBOT3_MODEL=waffle   # or burger
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
```
Then, source the navigation2 workspace:
```sh
cd <navigation2_ws>
source install/setup.bash
```
Run:
```sh
ros2 run nav2_turtlebot3_rl random_crawl_train
```
#### Running
To run the Random Crawl task with a trained model:
First, run Gazebo:
```sh
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:<directory_for_turtlebot3_ws>/src/turtlebot3/turtlebot3_simulations/turtlebot3_gazebo/models
export TURTLEBOT3_MODEL=waffle   # or burger
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
```
Then, source the navigation2 workspace:
```sh
cd <navigation2_ws>
source install/setup.bash
```
Run:
```sh
ros2 run nav2_turtlebot3_rl random_crawl
```
# Build Instructions
### Steps
First, install ROS 2 by following the ROS 2 installation instructions.
Second, install the following dependencies:
1. Install TensorFlow by following instructions from: [tensorflow](https://www.tensorflow.org/install/pip)
2. Install Keras: `sudo pip install keras`
3. Install `TurtleBot3` by following instructions from here: [TurtleBot3 ROS2 Packages](http://emanual.robotis.com/docs/en/platform/turtlebot3/ros2/#setup) Section 15.1.1.3
4. The TurtleBot3 installation also installs gazebo-ros-pkgs from the ros2 branch. However, at this time the needed services are not available on the ros2 branch. To be able to use the RL package, check out the crystal branch:
5. ```sh
cd ~/turtlebot3_ws/src/gazebo/gazebo_ros_pkgs
git checkout crystal
```
6. ```sh
cd <turtlebot3_ws>
colcon build --symlink-install
```
### Build Experimental RL
```sh
cd <directory_ros2_ws>
source install/setup.bash
cd <turtlebot3_ws>
source install/setup.bash
cd <navigation2_ws>
colcon build --symlink-install
```
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import numpy as np
import tensorflow as tf
import parameters
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.models import load_model
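# DQN agent: epsilon-greedy action selection, an experience-replay buffer,
# and a target model used to compute bootstrapped Q-targets.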
class DQN:
def __init__(self, observation_space, action_size):
self.action_size = action_size
self.memory = deque(maxlen=parameters.MEMORY_SIZE)
self.build_model(observation_space, False)
self.target_model = self.model
self.step = 0
def build_model(self, observation_space, load):
if load:
self.model = load_model('random_crawl_model.h5')
else:
self.model = Sequential()
self.model.add(Dense(8, input_shape=(observation_space,), activation="relu"))
self.model.add(Dense(8, activation="relu"))
self.model.add(Dense(8, activation="relu"))
self.model.add(Dense(self.action_size, activation="linear"))
self.model.compile(loss="mse", optimizer=Adam(lr=parameters.LEARNING_RATE))
def save_to_memory(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
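# Epsilon decays linearly from EXPLORATION_MAX to EXPLORATION_MIN over the first
# EXPLORATION_TARGET_STEP steps, then stays at EXPLORATION_MIN.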
def get_linear_decay_epsilon(self, step):
if step <= parameters.EXPLORATION_TARGET_STEP:
epsilon = parameters.EXPLORATION_MAX - step * ((parameters.EXPLORATION_MAX -\
parameters.EXPLORATION_MIN)/parameters.EXPLORATION_TARGET_STEP)
else:
epsilon = parameters.EXPLORATION_MIN
return epsilon
def get_action(self, state):
exploration_rate = self.get_linear_decay_epsilon(self.step)
# Explore
if np.random.rand() <= exploration_rate:
action = random.randrange(self.action_size)
return action
# Exploit
q_values = self.model.predict(state,batch_size=1)
action = np.argmax(q_values)
return action
# Train with previous experiences
def experience_replay(self):
if len(self.memory) < parameters.BATCH_SIZE:
return
mini_batch = random.sample(self.memory, parameters.BATCH_SIZE)
for state, action, reward, next_state, done in mini_batch:
q_target = reward if done else reward +\
parameters.GAMMA * np.amax(self.target_model.predict(next_state,batch_size=1))
q_values = self.model.predict(state,batch_size=1)
q_values[0][action] = q_target
self.model.fit(state, q_values, verbose=0)
# update target model every TARGET_MODEL_UPDATE_STEP steps
def save_load_model_weights(self):
self.model.save_weights('target_model.h5')
self.target_model.load_weights('target_model.h5', by_name=False)
<?xml version="1.0"?>
<?xml-model href="http://download.ros.org/schema/package_format2.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="2">
<name>nav2_turtlebot3_rl</name>
<version>0.0.1</version>
<description>This package enables Reinforcement Learning with Gazebo and TurtleBot3</description>
<maintainer email="mohammad.haghighipanah@intel.com">Mohammad Haghighipanah</maintainer>
<license>Apache License 2.0</license>
<exec_depend>rclpy</exec_depend>
<exec_depend>std_msgs</exec_depend>
<exec_depend>tensorflow</exec_depend>
<exec_depend>tensorflow-gpu</exec_depend>
<exec_depend>keras</exec_depend>
<test_depend>ament_copyright</test_depend>
<test_depend>ament_flake8</test_depend>
<test_depend>ament_pep257</test_depend>
<test_depend>python3-pytest</test_depend>
<test_depend>tensorflow</test_depend>
<test_depend>tensorflow-gpu</test_depend>
<test_depend>keras</test_depend>
<export>
<build_type>ament_python</build_type>
</export>
</package>
# dqn parameters
GAMMA = 0.90                    # discount factor for future rewards
LEARNING_RATE = 0.005           # Adam learning rate
MEMORY_SIZE = 1000000           # replay buffer capacity
BATCH_SIZE = 64                 # mini-batch size for experience replay
EXPLORATION_MIN = 0.25          # final epsilon after decay
EXPLORATION_MAX = 1.0           # initial epsilon
EXPLORATION_TARGET_STEP = 1000  # steps over which epsilon decays linearly
# model train parameters
TARGET_MODEL_UPDATE_STEP = 100 # Number of steps before updating the target model
EPISODES = 10000 # Desired number of episodes to run
LOOP_RATE = 0.2 # Control loop period in seconds (sleep between steps)
# Turtlebot 3 environment parameters
LINEAR_FWD_VELOCITY = 0.16   # linear velocity when driving forward (m/s)
ANGULAR_FWD_VELOCITY = 0.0   # angular velocity when driving forward (rad/s)
LINEAR_STR_VELOCITY = 0.06   # linear velocity while steering (m/s)
ANGULAR_VELOCITY = 0.6       # angular velocity while steering (rad/s)
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path
from ament_index_python.packages import get_package_share_directory
from turtlebot3_env import TurtlebotEnv
import numpy as np
import rclpy
import parameters
from rclpy.node import Node
from time import sleep
from keras.models import load_model
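# Loads the pretrained model shipped in the package share directory and drives the
# robot with greedy (argmax Q) actions in a loop until ROS shuts down.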
def loadModel(env):
state = env.reset()
observation_space = len(state)
state = np.reshape(state, [1, observation_space])
pkg_share_directory = get_package_share_directory('nav2_turtlebot3_rl')
path = os.path.join(pkg_share_directory,"saved_models/random_crawl_waffle.h5")
model = load_model(path)
while rclpy.ok():
q_values = model.predict(state)
action = np.argmax(q_values)
next_state, reward, terminal = env.step(action)
next_state = np.reshape(next_state, [1, observation_space])
state = next_state
sleep(parameters.LOOP_RATE)
def main(args=None):
rclpy.init(args=args)
env = TurtlebotEnv()
action_size = env.action_space()
# Ctrl-C doesn't make rclpy.ok() to return false. Thus, we catch the exception with
# `finally` to shutdown ros and terminate the background thread cleanly.
try:
loadModel(env)
except KeyboardInterrupt:
pass
finally:
rclpy.shutdown()
env.cleanup()
return
if __name__ == "__main__":
main()
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from turtlebot3_env import TurtlebotEnv
from dqn import DQN
import random
import numpy as np
import tensorflow as tf
import rclpy
from rclpy.node import Node
from time import sleep
import parameters
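# Training loop: for each episode, act epsilon-greedily, store transitions, run
# experience replay on each non-terminal step, sync the target model every
# TARGET_MODEL_UPDATE_STEP steps, and finally save the model to random_crawl_model.h5.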
def trainModel(env, action_size):
state = env.reset()
observation_space = len(state)
agent = DQN(observation_space, action_size)
target_model_update_counter = 0
agent.step = 0
for _ in range(parameters.EPISODES):
print("Episode number: " + str(_))
state = env.reset()
observation_size = len(state)
state = np.reshape(state, [1, observation_size])
done = False
while not done and rclpy.ok():
agent.step += 1
target_model_update_counter += 1
if target_model_update_counter%parameters.TARGET_MODEL_UPDATE_STEP == 0:
agent.save_load_model_weights()
target_model_update_counter = 0
action = agent.get_action(state)
next_state, reward, done = env.step(action)
next_state = np.reshape(next_state, [1, observation_space])
agent.save_to_memory(state, action, reward, next_state, done)
state = next_state
if not done:
agent.experience_replay()
sleep(parameters.LOOP_RATE)
agent.model.save('random_crawl_model.h5')
def main(args=None):
rclpy.init(args=args)
env = TurtlebotEnv()
action_size = env.action_space()
# Ctrl-C doesn't make rclpy.ok() to return false. Thus, we catch the exception with
# `finally` to shutdown ros and terminate the background thread cleanly.
try:
trainModel(env, action_size)
except KeyboardInterrupt:
pass
finally:
rclpy.shutdown()
env.cleanup()
return
if __name__ == "__main__":
main()
[develop]
script-dir=$base/lib/nav2_turtlebot3_rl
[install]
install-scripts=$base/lib/nav2_turtlebot3_rl
from setuptools import setup
package_name = 'nav2_turtlebot3_rl'
setup(
name=package_name,
version='0.0.1',
packages=[],
py_modules=[
'random_crawl_train',
'random_crawl',
'turtlebot3_env',
'dqn',
'parameters',
],
data_files=[
('share/ament_index/resource_index/packages',
['resource/' + package_name]),
('share/' + package_name, ['package.xml']),
('share/' + package_name + '/saved_models',\
['saved_models/random_crawl_burger.h5','saved_models/random_crawl_waffle.h5']),
],
install_requires=['setuptools'],
zip_safe=True,
author='Mohammad Haghighipanah',
author_email='mohammad.haghighipanah@intel.com',
maintainer='Mohammad Haghighipanah',
maintainer_email='mohammad.haghighipanah@intel.com',
keywords=['ROS'],
classifiers=[
'License :: OSI Approved :: Apache Software License',
'Programming Language :: Python',
'Topic :: Software Development',
],
description='Examples running Turtlebot3 in gazebo with RL',
license='Apache License, Version 2.0',
tests_require=['pytest'],
entry_points={
'console_scripts': [
'random_crawl_train = random_crawl_train:main',
'random_crawl = random_crawl:main',
'turtlebot3_env = turtlebot3_env',
],
},
)
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from time import sleep
from threading import Thread
import rclpy
from rclpy.node import Node
from rclpy.duration import Duration
import numpy as np
import math
import random
import pandas
import parameters
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan
from std_srvs.srv import Empty
from std_msgs.msg import String
from gazebo_msgs.srv import GetEntityState, SetEntityState
class TurtlebotEnv():
def __init__(self):
self.node_ = rclpy.create_node('turtlebot3_env')
self.act = 0
self.done = False
self.actions = [[parameters.LINEAR_FWD_VELOCITY, parameters.ANGULAR_FWD_VELOCITY],
[parameters.LINEAR_STR_VELOCITY, parameters.ANGULAR_VELOCITY],
[parameters.LINEAR_STR_VELOCITY, -parameters.ANGULAR_VELOCITY]]
self.bonous_reward = 0
self.collision = False
self.collision_tol = 0.125
self.laser_scan_range = [0] * 360
self.states_input = [3.5] * 8
self.zero_div_tol = 0.01
self.range_min = 0.0
self.pub_cmd_vel = self.node_.create_publisher(Twist, 'cmd_vel')
self.sub_scan = self.node_.create_subscription(LaserScan, 'scan', self.scan_callback)
self.reset_simulation = self.node_.create_client(Empty, 'reset_simulation')
self.reset_world = self.node_.create_client(Empty, 'reset_world')
self.unpause_proxy = self.node_.create_client(Empty, 'unpause_physics')
self.pause_proxy = self.node_.create_client(Empty, 'pause_physics')
self.get_entity_state = self.node_.create_client(GetEntityState, 'get_entity_state')
self.set_entity_state = self.node_.create_client(SetEntityState, 'set_entity_state')
self.scan_msg_received = False
self.t = Thread(target=rclpy.spin, args=[self.node_])
self.t.start()
def cleanup(self):
self.t.join()
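# Reward shaping: -10 on a near collision; driving forward earns a clearance-based
# bonus that accumulates across steps; steering decays the accumulated bonus.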
def get_reward(self):
reward = 0
if self.collision == True:
reward = -10
self.done = True
return reward, self.done
elif self.collision == False and self.act == 0:
if abs(min(self.states_input)) >= self.zero_div_tol:
reward = 0.08 - (1/(min(self.states_input)**2))*0.005
else:
reward = -10
if reward > 0:
self.bonous_reward += reward
reward = self.bonous_reward
else:
bonous_discount_factor = 0.6
self.bonous_reward *= bonous_discount_factor
if abs(min(self.states_input)) >= self.zero_div_tol:
reward = 0.02 - (1/min(self.states_input))*0.005
else:
reward = -10
return reward, self.done
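# Replaces Inf readings with range_max and too-close readings with range_min +
# collision_tol, flags near collisions, and reduces the scan to 8 sector minima
# used as the state vector.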
def scan_callback(self, LaserScan):
self.scan_msg_received = True
self.laser_scan_range = []
self.range_min = LaserScan.range_min
range_max = LaserScan.range_max
for i in range(len(LaserScan.ranges)):
if LaserScan.ranges[i] == float('Inf'):
self.laser_scan_range.append(range_max)
elif LaserScan.ranges[i] < self.range_min:
self.laser_scan_range.append(self.range_min + self.collision_tol)
else:
self.laser_scan_range.append(LaserScan.ranges[i])
if self.check_collision():
self.collision = True
self.done = True
self.states_input = []
for i in range(8):
step = int(len(LaserScan.ranges)/8)
self.states_input.append(min(self.laser_scan_range[i*step:(i+1)*step], default=0))
def action_space(self):
return len(self.actions)
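# Executes one control step: publishes the Twist command for the chosen action
# and returns (state, reward, done).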
def step(self, action):
vel_cmd = Twist()
self.act = action
vel_cmd.linear.x = self.actions[action][0]
vel_cmd.angular.z = self.actions[action][1]
self.pub_cmd_vel.publish(vel_cmd)
vel_cmd.linear.x = 0.0
vel_cmd.angular.z = 0.0
get_reward = self.get_reward()
return self.states_input, get_reward[0], self.done
def check_collision(self):
if min(self.laser_scan_range) < self.range_min + self.collision_tol:
print("Near collision detected... " + str(min(self.laser_scan_range)))
return True
return False
def reset(self):
self.scan_msg_received = False
vel_cmd = Twist()
vel_cmd.linear.x = 0.0
vel_cmd.angular.z = 0.0
self.pub_cmd_vel.publish(vel_cmd)
while not self.reset_world.wait_for_service(timeout_sec=1.0):
print('Reset world service is not available...')
self.reset_world.call_async(Empty.Request())
while not self.reset_simulation.wait_for_service(timeout_sec=1.0):
print('Reset simulation service is not available...')
self.reset_simulation.call_async(Empty.Request())
self.laser_scan_range = [0] * 360
self.states_input = [3.5] * 8
while not self.scan_msg_received and rclpy.ok():
sleep(0.1)
self.collision = False
self.done = False
self.bonous_reward = 0
return self.states_input