Unverified Commit 396048cf authored by Mohammad Haghighipanah's avatar Mohammad Haghighipanah Committed by GitHub

Experimental Random Crawl task implemented with Reinforcement Learning (DQN) (#683)

* added random crawl task to nav2 with reinforcement learning

* factored out dqn class and parameters

* modified readme files

* removed extra file

* added tol to laser append

* addressing reviewer comments

* fixed spacing format

* added setup cfg file and updated readme

* throttling down while true loop

* fixed shutdown issue to exit cleanly

* renamed package name and fixed get pkg directory

* fixed few typos
# Overview
The ROS 2 Experimental Reinforcement Learning (RL) package enables training and controlling a robot with RL techniques. It uses the Gazebo simulator and ROS 2 to train a TurtleBot 3 robot.
# Contributing
We have currently implemented a Random Crawl task as a starting point for RL and plan to add more RL-related tasks to this package. The Random Crawl task can be used as a tutorial for getting started with RL.
# Building the source
For instructions on how to build this repo, see the [BUILD.md](https://github.intel.com/haghighi/Experimental_RL/blob/master/doc/BUILD.md) file.
### Random Crawl
Random Crawl is a task that enables TurtleBot3 to crawl randomly around the environment without hitting obstacles.
The following use cases can be achieved with the Random Crawl task:
1. Map Exploration
2. Localization test
3. Security Monitoring
#### Model
TODO. A sketch of the current network is shown below.
- Inputs
- Outputs
- DQN
- Parameters
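The network and its hyperparameters are defined in `dqn.py` and `parameters.py` in this package. The state is the laser scan reduced to 8 sector minima (see `turtlebot3_env.py`), and the output is one Q-value per action (forward, steer left, steer right). A minimal sketch mirroring that code:
```python
# Sketch of the Q-network as built in dqn.py (same layers and optimizer).
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

observation_space = 8  # 8 laser-scan sector minima (see turtlebot3_env.py)
action_size = 3        # forward, steer left, steer right

model = Sequential()
model.add(Dense(8, input_shape=(observation_space,), activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(action_size, activation="linear"))
model.compile(loss="mse", optimizer=Adam(lr=0.005))  # LEARNING_RATE in parameters.py
```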
#### Training
To train Random Crawl:
First, run Gazebo:
```sh
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:<directory_for_turtlebot3_ws>/src/turtlebot3/turtlebot3_simulations/turtlebot3_gazebo/models
export TURTLEBOT3_MODEL=waffle   # or burger
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
```
Then, source the navigation2 workspace:
```sh
cd <navigation2_ws>
source install/setup.bash
```
Run:
```sh
ros2 run nav2_turtlebot3_rl random_crawl_train
```
#### Running
To run the Random Crawl task with a trained model:
First, run Gazebo:
```sh
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:<directory_for_turtlebot3_ws>/src/turtlebot3/turtlebot3_simulations/turtlebot3_gazebo/models
export TURTLEBOT3_MODEL=waffle   # or burger
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
```
Then, source the navigation2 workspace:
```sh
cd <navigation2_ws>
source install/setup.bash
```
Run:
```sh
ros2 run nav2_turtlebot3_rl random_crawl
```
# Build Instructions
### Steps
First, install ROS 2 by following the ROS 2 installation instructions.
Second, install the following dependencies:
1. Install TensorFlow by following instructions from: [tensorflow](https://www.tensorflow.org/install/pip)
2. Install Keras: `sudo pip install keras`
3. Install `TurtleBot3` by following instructions from here: [TurtleBot3 ROS2 Packages](http://emanual.robotis.com/docs/en/platform/turtlebot3/ros2/#setup) Section 15.1.1.3
4. The TurtleBot3 installation also installs gazebo-ros-pkgs from the ros2 branch. However, at this time the needed services are not available on the ros2 branch. To be able to use the RL package, check out the crystal branch:
5. ```sh
cd ~/turtlebot3_ws/src/gazebo/gazebo_ros_pkgs
git checkout crystal
```
6. ```sh
cd <turtlebot3_ws>
colcon build --symlink-install
```
### Build Experimental RL
```sh
cd <directory_ros2_ws>
source install/setup.bash
cd <turtlebot3_ws>
source install/setup.bash
cd <navigation2_ws>
colcon build --symlink-install
```
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import numpy as np
import tensorflow as tf
import parameters
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.models import load_model
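# DQN agent: epsilon-greedy action selection, an experience-replay buffer,
# and a target model used to compute bootstrapped Q-targets.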
class DQN:
def __init__(self, observation_space, action_size):
self.action_size = action_size
self.memory = deque(maxlen=parameters.MEMORY_SIZE)
self.build_model(observation_space, False)
self.target_model = self.model
self.step = 0
def build_model(self, observation_space, load):
if load:
self.model = load_model('random_crawl_model.h5')
else:
self.model = Sequential()
self.model.add(Dense(8, input_shape=(observation_space,), activation="relu"))
self.model.add(Dense(8, activation="relu"))
self.model.add(Dense(8, activation="relu"))
self.model.add(Dense(self.action_size, activation="linear"))
self.model.compile(loss="mse", optimizer=Adam(lr=parameters.LEARNING_RATE))
def save_to_memory(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
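# Epsilon decays linearly from EXPLORATION_MAX to EXPLORATION_MIN over the first
# EXPLORATION_TARGET_STEP steps, then stays at EXPLORATION_MIN.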
def get_linear_decay_epsilon(self, step):
if step <= parameters.EXPLORATION_TARGET_STEP:
epsilon = parameters.EXPLORATION_MAX - step * ((parameters.EXPLORATION_MAX -\
parameters.EXPLORATION_MIN)/parameters.EXPLORATION_TARGET_STEP)
else:
epsilon = parameters.EXPLORATION_MIN
return epsilon
def get_action(self, state):
exploration_rate = self.get_linear_decay_epsilon(self.step)
# Explore
if np.random.rand() <= exploration_rate:
action = random.randrange(self.action_size)
return action
# Exploit
q_values = self.model.predict(state,batch_size=1)
action = np.argmax(q_values)
return action
# Train with previous experiences
def experience_replay(self):
if len(self.memory) < parameters.BATCH_SIZE:
return
mini_batch = random.sample(self.memory, parameters.BATCH_SIZE)
for state, action, reward, next_state, done in mini_batch:
q_target = reward if done else reward +\
parameters.GAMMA * np.amax(self.target_model.predict(next_state,batch_size=1))
q_values = self.model.predict(state,batch_size=1)
q_values[0][action] = q_target
self.model.fit(state, q_values, verbose=0)
# update target model every TARGET_MODEL_UPDATE_STEP steps
def save_load_model_weights(self):
self.model.save_weights('target_model.h5')
self.target_model.load_weights('target_model.h5', by_name=False)
<?xml version="1.0"?>
<?xml-model href="http://download.ros.org/schema/package_format2.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="2">
<name>nav2_turtlebot3_rl</name>
<version>0.0.1</version>
<description>This package enables Reinforcement Learning with Gazebo and TurtleBot3</description>
<maintainer email="mohammad.haghighipanah@intel.com">Mohammad Haghighipanah</maintainer>
<license>Apache License 2.0</license>
<exec_depend>rclpy</exec_depend>
<exec_depend>std_msgs</exec_depend>
<exec_depend>tensorflow</exec_depend>
<exec_depend>tensorflow-gpu</exec_depend>
<exec_depend>keras</exec_depend>
<test_depend>ament_copyright</test_depend>
<test_depend>ament_flake8</test_depend>
<test_depend>ament_pep257</test_depend>
<test_depend>python3-pytest</test_depend>
<test_depend>tensorflow</test_depend>
<test_depend>tensorflow-gpu</test_depend>
<test_depend>keras</test_depend>
<export>
<build_type>ament_python</build_type>
</export>
</package>
# dqn parameters
GAMMA = 0.90                    # discount factor for future rewards
LEARNING_RATE = 0.005           # Adam learning rate
MEMORY_SIZE = 1000000           # replay buffer capacity
BATCH_SIZE = 64                 # mini-batch size for experience replay
EXPLORATION_MIN = 0.25          # final epsilon after decay
EXPLORATION_MAX = 1.0           # initial epsilon
EXPLORATION_TARGET_STEP = 1000  # steps over which epsilon decays linearly
# model train parameters
TARGET_MODEL_UPDATE_STEP = 100 # Number of steps before updating the target model
EPISODES = 10000 # Desired number of episodes to run
LOOP_RATE = 0.2 # Control loop period in seconds (sleep between steps)
# Turtlebot 3 environment parameters
LINEAR_FWD_VELOCITY = 0.16   # linear velocity when driving forward (m/s)
ANGULAR_FWD_VELOCITY = 0.0   # angular velocity when driving forward (rad/s)
LINEAR_STR_VELOCITY = 0.06   # linear velocity while steering (m/s)
ANGULAR_VELOCITY = 0.6       # angular velocity while steering (rad/s)
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path
from ament_index_python.packages import get_package_share_directory
from turtlebot3_env import TurtlebotEnv
import numpy as np
import rclpy
import parameters
from rclpy.node import Node
from time import sleep
from keras.models import load_model
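# Loads the pretrained model shipped in the package share directory and drives the
# robot with greedy (argmax Q) actions in a loop until ROS shuts down.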
def loadModel(env):
state = env.reset()
observation_space = len(state)
state = np.reshape(state, [1, observation_space])
pkg_share_directory = get_package_share_directory('nav2_turtlebot3_rl')
path = os.path.join(pkg_share_directory,"saved_models/random_crawl_waffle.h5")
model = load_model(path)
while rclpy.ok():
q_values = model.predict(state)
action = np.argmax(q_values)
next_state, reward, terminal = env.step(action)
next_state = np.reshape(next_state, [1, observation_space])
state = next_state
sleep(parameters.LOOP_RATE)
def main(args=None):
rclpy.init(args=args)
env = TurtlebotEnv()
action_size = env.action_space()
# Ctrl-C doesn't make rclpy.ok() to return false. Thus, we catch the exception with
# `finally` to shutdown ros and terminate the background thread cleanly.
try:
loadModel(env)
except KeyboardInterrupt:
pass
finally:
rclpy.shutdown()
env.cleanup()
return
if __name__ == "__main__":
main()
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from turtlebot3_env import TurtlebotEnv
from dqn import DQN
import random
import numpy as np
import tensorflow as tf
import rclpy
from rclpy.node import Node
from time import sleep
import parameters
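# Training loop: for each episode, act epsilon-greedily, store transitions, run
# experience replay on each non-terminal step, sync the target model every
# TARGET_MODEL_UPDATE_STEP steps, and finally save the model to random_crawl_model.h5.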
def trainModel(env, action_size):
state = env.reset()
observation_space = len(state)
agent = DQN(observation_space, action_size)
target_model_update_counter = 0
agent.step = 0
for _ in range(parameters.EPISODES):
print("Episode number: " + str(_))
state = env.reset()
observation_size = len(state)
state = np.reshape(state, [1, observation_size])
done = False
while not done and rclpy.ok():
agent.step += 1
target_model_update_counter += 1
if target_model_update_counter%parameters.TARGET_MODEL_UPDATE_STEP == 0:
agent.save_load_model_weights()
target_model_update_counter = 0
action = agent.get_action(state)
next_state, reward, done = env.step(action)
next_state = np.reshape(next_state, [1, observation_space])
agent.save_to_memory(state, action, reward, next_state, done)
state = next_state
if not done:
agent.experience_replay()
sleep(parameters.LOOP_RATE)
agent.model.save('random_crawl_model.h5')
def main(args=None):
rclpy.init(args=args)
env = TurtlebotEnv()
action_size = env.action_space()
# Ctrl-C doesn't make rclpy.ok() to return false. Thus, we catch the exception with
# `finally` to shutdown ros and terminate the background thread cleanly.
try:
trainModel(env, action_size)
except KeyboardInterrupt:
pass
finally:
rclpy.shutdown()
env.cleanup()
return
if __name__ == "__main__":
main()
[develop]
script-dir=$base/lib/nav2_turtlebot3_rl
[install]
install-scripts=$base/lib/nav2_turtlebot3_rl
from setuptools import setup
package_name = 'nav2_turtlebot3_rl'
setup(
name=package_name,
version='0.0.1',
packages=[],
py_modules=[
'random_crawl_train',
'random_crawl',
'turtlebot3_env',
'dqn',
'parameters',
],
data_files=[
('share/ament_index/resource_index/packages',
['resource/' + package_name]),
('share/' + package_name, ['package.xml']),
('share/' + package_name + '/saved_models',\
['saved_models/random_crawl_burger.h5','saved_models/random_crawl_waffle.h5']),
],
install_requires=['setuptools'],
zip_safe=True,
author='Mohammad Haghighipanah',
author_email='mohammad.haghighipanah@intel.com',
maintainer='Mohammad Haghighipanah',
maintainer_email='mohammad.haghighipanah@intel.com',
keywords=['ROS'],
classifiers=[
'License :: OSI Approved :: Apache Software License',
'Programming Language :: Python',
'Topic :: Software Development',
],
description='Examples running Turtlebot3 in gazebo with RL',
license='Apache License, Version 2.0',
tests_require=['pytest'],
entry_points={
'console_scripts': [
'random_crawl_train = random_crawl_train:main',
'random_crawl = random_crawl:main',
'turtlebot3_env = turtlebot3_env',
],
},
)
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from time import sleep
from threading import Thread
import rclpy
from rclpy.node import Node
from rclpy.duration import Duration
import numpy as np
import math
import random
import pandas
import parameters
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan
from std_srvs.srv import Empty
from std_msgs.msg import String
from gazebo_msgs.srv import GetEntityState, SetEntityState
class TurtlebotEnv():
def __init__(self):
self.node_ = rclpy.create_node('turtlebot3_env')
self.act = 0
self.done = False
self.actions = [[parameters.LINEAR_FWD_VELOCITY, parameters.ANGULAR_FWD_VELOCITY],
[parameters.LINEAR_STR_VELOCITY, parameters.ANGULAR_VELOCITY],
[parameters.LINEAR_STR_VELOCITY, -parameters.ANGULAR_VELOCITY]]
self.bonous_reward = 0
self.collision = False
self.collision_tol = 0.125
self.laser_scan_range = [0] * 360
self.states_input = [3.5] * 8
self.zero_div_tol = 0.01
self.range_min = 0.0
self.pub_cmd_vel = self.node_.create_publisher(Twist, 'cmd_vel')
self.sub_scan = self.node_.create_subscription(LaserScan, 'scan', self.scan_callback)
self.reset_simulation = self.node_.create_client(Empty, 'reset_simulation')
self.reset_world = self.node_.create_client(Empty, 'reset_world')
self.unpause_proxy = self.node_.create_client(Empty, 'unpause_physics')
self.pause_proxy = self.node_.create_client(Empty, 'pause_physics')
self.get_entity_state = self.node_.create_client(GetEntityState, 'get_entity_state')
self.set_entity_state = self.node_.create_client(SetEntityState, 'set_entity_state')
self.scan_msg_received = False
self.t = Thread(target=rclpy.spin, args=[self.node_])
self.t.start()
def cleanup(self):
self.t.join()
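# Reward shaping: -10 on a near collision; driving forward earns a clearance-based
# bonus that accumulates across steps; steering decays the accumulated bonus.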
def get_reward(self):
reward = 0
if self.collision == True:
reward = -10
self.done = True
return reward, self.done
elif self.collision == False and self.act == 0:
if abs(min(self.states_input)) >= self.zero_div_tol:
reward = 0.08 - (1/(min(self.states_input)**2))*0.005
else:
reward = -10
if reward > 0:
self.bonous_reward += reward
reward = self.bonous_reward
else:
bonous_discount_factor = 0.6
self.bonous_reward *= bonous_discount_factor
if abs(min(self.states_input)) >= self.zero_div_tol:
reward = 0.02 - (1/min(self.states_input))*0.005
else:
reward = -10
return reward, self.done
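# Replaces Inf readings with range_max and too-close readings with range_min +
# collision_tol, flags near collisions, and reduces the scan to 8 sector minima
# used as the state vector.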
def scan_callback(self, LaserScan):
self.scan_msg_received = True
self.laser_scan_range = []
self.range_min = LaserScan.range_min
range_max = LaserScan.range_max
for i in range(len(LaserScan.ranges)):
if LaserScan.ranges[i] == float('Inf'):
self.laser_scan_range.append(range_max)
elif LaserScan.ranges[i] < self.range_min:
self.laser_scan_range.append(self.range_min + self.collision_tol)
else:
self.laser_scan_range.append(LaserScan.ranges[i])
if self.check_collision():
self.collision = True
self.done = True
self.states_input = []
for i in range(8):
step = int(len(LaserScan.ranges)/8)
self.states_input.append(min(self.laser_scan_range[i*step:(i+1)*step], default=0))
def action_space(self):
return len(self.actions)
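# Executes one control step: publishes the Twist command for the chosen action
# and returns (state, reward, done).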
def step(self, action):
vel_cmd = Twist()
self.act = action
vel_cmd.linear.x = self.actions[action][0]
vel_cmd.angular.z = self.actions[action][1]
self.pub_cmd_vel.publish(vel_cmd)
vel_cmd.linear.x = 0.0
vel_cmd.angular.z = 0.0
get_reward = self.get_reward()
return self.states_input, get_reward[0], self.done
def check_collision(self):
if min(self.laser_scan_range) < self.range_min + self.collision_tol:
print("Near collision detected... " + str(min(self.laser_scan_range)))
return True
return False
def reset(self):
self.scan_msg_received = False
vel_cmd = Twist()
vel_cmd.linear.x = 0.0
vel_cmd.angular.z = 0.0
self.pub_cmd_vel.publish(vel_cmd)
while not self.reset_world.wait_for_service(timeout_sec=1.0):
print('Reset world service is not available...')
self.reset_world.call_async(Empty.Request())
while not self.reset_simulation.wait_for_service(timeout_sec=1.0):
print('Reset simulation service is not available...')
self.reset_simulation.call_async(Empty.Request())
self.laser_scan_range = [0] * 360
self.states_input = [3.5] * 8
while not self.scan_msg_received and rclpy.ok():
sleep(0.1)
self.collision = False
self.done = False
self.bonous_reward = 0
return self.states_input