diff --git a/research/cv/single_path_nas/README.md b/research/cv/single_path_nas/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f4899b7ae70006667cc6c6ccbc8d89eb3e8b0e0f
--- /dev/null
+++ b/research/cv/single_path_nas/README.md
@@ -0,0 +1,338 @@
+# Contents
+
+<!-- TOC -->
+
+- [Contents](#contents)
+- [Single-path-nas description](#single-path-nas-description)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Quick Start](#quick-start)
+- [Scripts Description](#scripts-description)
+    - [Scripts and sample code](#scripts-and-sample-code)
+    - [Script parameters](#script-parameters)
+    - [Training process](#training-process)
+        - [Standalone training](#standalone-training)
+        - [Distributed training](#distributed-training)
+    - [Evaluation process](#evaluation-process)
+        - [Evaluate](#evaluate)
+    - [Export process](#export-process)
+        - [Export](#export)
+    - [Inference process](#inference-process)
+        - [Inference](#inference)
+- [Model description](#model-description)
+    - [Performance](#performance)
+        - [Training performance](#training-performance)
+            - [Single-Path-NAS on ImageNet-1k](#single-path-nas-on-imagenet-1k)
+        - [Inference performance](#inference-performance)
+            - [Single-Path-NAS on ImageNet-1k](#single-path-nas-on-imagenet-1k-1)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+<!-- /TOC -->
+
+# Single-path-nas description
+
+The authors of Single-Path-NAS use a single large 7x7 convolution to represent the three candidate convolutions
+of 3x3, 5x5, and 7x7. The weights of the smaller kernels are shared with the larger ones, so the largest kernel
+becomes a "superkernel". This way, training the model does not require choosing between different paths:
+the data is passed through one layer whose sub-kernels share weights. The search space is a block-based,
+straight structure. As in ProxylessNAS and FBNet, the Inverted Bottleneck block is used as the cell,
+and the number of layers is 22, as in MobileNetV2. Each layer has only two searchable hyper-parameters:
+the expansion ratio and the kernel size. The other hyper-parameters are fixed; for example, the number of
+filters in each of the 22 layers is fixed and, as in FBNet, slightly changed from MobileNetV2. The kernel sizes
+used in the paper are only 3x3 and 5x5, as in FBNet and ProxylessNAS; 7x7 kernels are not used. The expansion
+ratio has only the two choices of 3 and 6, so both searchable hyper-parameters are binary decisions.
+Single-Path-NAS uses the techniques described in the LightNN paper: in particular, a continuous smooth function
+represents the discrete choice, with the threshold given by a group-Lasso term. The paper uses the same
+technique as ProxylessNAS to express skip connections, which are represented by a zero layer.
+
+Paper: [Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours](https://arxiv.org/abs/1904.02877)
+
+Overview in Chinese: https://zhuanlan.zhihu.com/p/63605721
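+Below is a minimal NumPy sketch of the shared-weight "superkernel" idea; it is an illustration only,
+not the actual network code from `src/spnasnet.py`:
+
+```python
+import numpy as np
+
+
+def effective_kernel(w5x5, use_5x5):
+    """Build the effective kernel from a single shared 5x5 weight tensor.
+
+    The central 3x3 patch doubles as the 3x3 candidate kernel, so the 3x3
+    and 5x5 choices share weights instead of being separate paths.
+    """
+    if use_5x5:
+        return w5x5
+    mask = np.zeros_like(w5x5)
+    mask[..., 1:4, 1:4] = 1.0  # keep only the central 3x3 window
+    return w5x5 * mask
+
+
+w = np.random.randn(32, 16, 5, 5)        # (out_ch, in_ch, kH, kW)
+k3 = effective_kernel(w, use_5x5=False)  # behaves like a 3x3 kernel
+k5 = effective_kernel(w, use_5x5=True)   # full 5x5 superkernel
+```
+
+In the paper, the hard `use_5x5` decision is relaxed into a smooth indicator that compares the norm of the
+outer weight ring to a group-Lasso threshold, which keeps the architecture search differentiable.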
+# Dataset
+
+Dataset used: [ImageNet2012](http://www.image-net.org/)
+
+- Dataset size: 1000 categories of 224\*224 color images in total
+    - Training set: 1,281,167 images in total
+    - Test set: 50,000 images in total
+- Data format: JPEG
+    - Note: The data is processed in dataset.py.
+- Download the dataset and prepare the directory structure as follows:
+
+```text
+└─dataset
+    ├─train                 # Training dataset
+    └─val                   # Evaluation dataset
+```
+
+# Features
+
+## Mixed Precision
+
+The [mixed-precision](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_mixed_precision.html)
+training method uses single-precision and half-precision data to improve the training speed of
+deep learning neural networks while maintaining the accuracy achievable with single-precision training.
+Mixed-precision training increases computing speed and reduces memory usage, which makes it possible to train
+larger models or use larger batches on specific hardware.
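+The snippet below is a hedged sketch of how mixed precision is enabled in this repository's `train.py`
+(`amp_level="O3"` with FP32 batch norm and a loss scale manager); the tiny network, loss, and optimizer
+here are stand-ins for illustration only:
+
+```python
+import mindspore.nn as nn
+from mindspore.train.loss_scale_manager import DynamicLossScaleManager
+from mindspore.train.model import Model
+
+net = nn.Dense(10, 2)  # stand-in for the Single-Path-NAS network
+loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+opt = nn.Momentum(net.trainable_params(), learning_rate=0.26, momentum=0.9)
+
+# O3 casts the network to float16 while keeping batch norm in float32;
+# dynamic loss scaling compensates for the narrower float16 range.
+model = Model(net, loss_fn=loss, optimizer=opt,
+              amp_level="O3", keep_batchnorm_fp32=True,
+              loss_scale_manager=DynamicLossScaleManager(init_loss_scale=1024))
+```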
+# Environment Requirements
+
+- Hardware (Ascend, GPU)
+    - Prepare a hardware environment with an Ascend processor or a CUDA-based GPU.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the links below:
+    - [MindSpore tutorials](https://www.mindspore.cn/tutorials/zh-CN/r1.3/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/r1.3/index.html)
+
+# Quick Start
+
+After installing MindSpore through the official website, you can follow the steps below for training and evaluation:
+
+- For the Ascend hardware
+
+    ```bash
+    # Run the training example
+    python train.py --device_id=0 > train.log 2>&1 &
+
+    # Run a distributed training example
+    bash ./scripts/run_distribute_train.sh [RANK_TABLE_FILE]
+
+    # Run evaluation example
+    python eval.py --checkpoint_path ./ckpt_0 > ./eval.log 2>&1 &
+
+    # Run the inference example
+    bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
+    ```
+
+    For distributed training, you need to create an **hccl** configuration file in JSON format in advance.
+    Please follow the instructions in the link below:
+    <https://gitee.com/mindspore/models/tree/master/utils/hccl_tools>
+
+- For the GPU hardware
+
+    ```bash
+    # Run the training example
+    python train.py --device_target="GPU" --data_path="/path/to/imagenet/train/" --lr_init=0.26 > train.log 2>&1 &
+
+    # Run a distributed training example
+    bash ./scripts/run_distribute_train_gpu.sh "/path/to/imagenet/train/"
+
+    # Run evaluation example
+    python eval.py --device_target="GPU" --val_data_path="/path/to/imagenet/val/" --checkpoint_path ./ckpt_0 > ./eval.log 2>&1 &
+    ```
+
+# Scripts Description
+
+## Scripts and sample code
+
+```bash
+├── model_zoo
+    ├── scripts
+    │   ├──run_distribute_train.sh        // Shell script for running the Ascend distributed training
+    │   ├──run_distribute_train_gpu.sh    // Shell script for running the GPU distributed training
+    │   ├──run_standalone_train.sh        // Shell script for running the Ascend standalone training
+    │   ├──run_standalone_train_gpu.sh    // Shell script for running the GPU standalone training
+    │   ├──run_eval.sh                    // Shell script for running the Ascend evaluation
+    │   ├──run_eval_gpu.sh                // Shell script for running the GPU evaluation
+    │   ├──run_infer_310.sh               // Shell script for running the Ascend 310 inference
+    ├── src
+    │   ├──lr_scheduler
+    │   │   ├──__init__.py
+    │   │   ├──linear_warmup.py                // Definitions for the warm-up functionality
+    │   │   ├──warmup_cosine_annealing_lr.py   // Definitions for the cosine annealing learning rate schedule
+    │   │   ├──warmup_step_lr.py               // Definitions for the exponential learning rate schedule
+    │   ├──__init__.py
+    │   ├──config.py                  // Parameters configuration
+    │   ├──CrossEntropySmooth.py      // Definitions for the cross entropy loss function
+    │   ├──dataset.py                 // Functions for creating a dataset
+    │   ├──spnasnet.py                // Single-Path-NAS architecture
+    │   ├──utils.py                   // Auxiliary functions
+    ├── create_imagenet2012_label.py  // Creating ImageNet labels
+    ├── eval.py                       // Evaluate the trained model
+    ├── export.py                     // Export the model to other formats
+    ├── postprocess.py                // Postprocessing for the Ascend 310 inference
+    ├── README.md                     // Single-Path-NAS description in English
+    ├── README_CN.md                  // Single-Path-NAS description in Chinese
+    ├── train.py                      // Train the model
+```
+
+## Script parameters
+
+Training parameters and evaluation parameters can be configured in the `config.py` file.
+
+- Parameters of a Single-Path-NAS model for the ImageNet-1k dataset.
+
+    ```python
+    'name': 'imagenet'          # dataset name
+    'pre_trained': 'False'      # whether to start from a pre-trained model
+    'num_classes': 1000         # number of classes in the dataset
+    'lr_init': 0.26             # initial learning rate: 0.26 for single-card training, 1.5 for eight-card parallel training
+    'batch_size': 128           # training batch size
+    'epoch_size': 180           # number of epochs
+    'momentum': 0.9             # momentum
+    'weight_decay': 1e-5        # weight decay value
+    'image_height': 224         # height of the model input image
+    'image_width': 224          # width of the model input image
+    'data_path': '/data/ILSVRC2012_train/'      # absolute path to the training dataset
+    'val_data_path': '/data/ILSVRC2012_val/'    # absolute path to the validation dataset
+    'device_target': 'Ascend'   # device
+    'device_id': 0              # ID of the device used for training/evaluation
+    'keep_checkpoint_max': 40   # number of checkpoints to keep
+    'checkpoint_path': None     # absolute path to a checkpoint file or to a directory where checkpoints are saved
+
+    'lr_scheduler': 'cosine_annealing'  # learning rate scheduler ['cosine_annealing', 'exponential']
+    'lr_epochs': [30, 60, 90]   # milestones for the exponential scheduler
+    'lr_gamma': 0.3             # learning rate decay for the exponential scheduler
+    'eta_min': 0.0              # minimal learning rate
+    'T_max': 180                # number of epochs for the cosine annealing scheduler
+    'warmup_epochs': 0          # number of warm-up epochs
+    'is_dynamic_loss_scale': 1  # use a dynamic loss scale manager (the scale manager is not used for GPU)
+    'loss_scale': 1024          # loss scale value
+    'label_smooth_factor': 0.1  # label smoothing factor
+    'use_label_smooth': True    # use label smoothing
+    ```
+
+For more configuration details, please refer to the script `config.py`.
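+As a hedged illustration of how these parameters are consumed (abridged values, helper name assumed),
+`config.py` stores them in an `EasyDict`, and `train.py` overrides selected entries with CLI flags:
+
+```python
+from easydict import EasyDict as edict
+
+# abridged excerpt in the style of src/config.py
+imagenet_cfg = edict({
+    'lr_init': 0.26,
+    'batch_size': 128,
+    'epoch_size': 180,
+    'device_target': 'Ascend',
+})
+
+
+def apply_cli_overrides(cfg, lr_init=None, device_target=None):
+    """Mirror the pattern in train.py: a flag overrides the config value
+    only when it was explicitly passed on the command line."""
+    if lr_init is not None:
+        cfg.lr_init = lr_init
+    if device_target is not None:
+        cfg.device_target = device_target
+    return cfg
+
+
+# e.g. eight-card GPU training bumps the learning rate to 1.5
+apply_cli_overrides(imagenet_cfg, lr_init=1.5, device_target='GPU')
+```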
+## Training process
+
+### Standalone training
+
+- Using an Ascend processor environment
+
+    ```bash
+    python train.py --device_id=0 > train.log 2>&1 &
+    ```
+
+    The above Python command runs in the background; its output can be inspected in the generated train.log file.
+
+- Using a GPU environment
+
+    ```bash
+    python train.py --device_target='GPU' --data_path="/path/to/imagenet/train/" --lr_init=0.26 > train.log 2>&1 &
+    ```
+
+    The above Python command runs in the background; its output can be inspected in the generated train.log file.
+
+### Distributed training
+
+- Using an Ascend processor environment
+
+    ```bash
+    bash ./scripts/run_distribute_train.sh [RANK_TABLE_FILE]
+    ```
+
+    The above shell script runs distributed training in the background.
+
+- Using a GPU environment
+
+    ```bash
+    bash ./scripts/run_distribute_train_gpu.sh [TRAIN_PATH](optional)
+    ```
+
+> TRAIN_PATH - Path to the directory with the training subset of the dataset.
+
+The above shell scripts run the distributed training in the background.
+A `train_parallel` folder is also created, where a copy of the code, the training logs,
+and the checkpoints are stored.
+
+## Evaluation process
+
+### Evaluate
+
+- Evaluate the model on the ImageNet-1k dataset using the Ascend environment
+
+    `./ckpt_0` is the directory where the trained models are saved in the .ckpt format.
+
+    ```bash
+    python eval.py --checkpoint_path ./ckpt_0 > ./eval.log 2>&1 &
+    OR
+    bash ./scripts/run_eval.sh [CKPT_FILE_OR_DIR]
+    ```
+
+- Evaluate the model on the ImageNet-1k dataset using the GPU environment
+
+    `./ckpt_0` is the directory where the trained models are saved in the .ckpt format.
+
+    ```bash
+    python eval.py --device_target="GPU" --checkpoint_path ./ckpt_0 > ./eval.log 2>&1 &
+    OR
+    bash ./scripts/run_eval_gpu.sh [CKPT_FILE_OR_DIR] [VALIDATION_DATASET](optional)
+    ```
+
+> CKPT_FILE_OR_DIR - Path to the trained model checkpoint or to a directory containing checkpoints.
+>
+> VALIDATION_DATASET - (optional) Path to the validation subset of the dataset.
+
+## Export process
+
+### Export
+
+    ```shell
+    python export.py --ckpt_file [CKPT_FILE] --device_target [DEVICE_TARGET]
+    ```
+
+> DEVICE_TARGET: Ascend or GPU
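+For reference, here is a hedged sketch of what the export step amounts to internally
+(the checkpoint file name is assumed for illustration):
+
+```python
+import numpy as np
+from mindspore import Tensor, context
+from mindspore.train.serialization import export, load_checkpoint, load_param_into_net
+
+import src.spnasnet as spnasnet
+
+context.set_context(mode=context.GRAPH_MODE)
+net = spnasnet.spnasnet(num_classes=1000)
+load_param_into_net(net, load_checkpoint("single-path-nas.ckpt"))
+
+# Trace the graph with a dummy NCHW input and serialize it as MINDIR.
+dummy_input = Tensor(np.zeros((1, 3, 224, 224), dtype=np.float32))
+export(net, dummy_input, file_name="single-path-nas", file_format="MINDIR")
+```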
+## Inference process
+
+### Inference
+
+Before running inference, we need to export the model first.
+A MINDIR model can be exported in any environment, while an AIR model can only be exported in an Ascend 910 environment.
+The following example uses the MINDIR model for inference.
+
+- Use the ImageNet-1k dataset for inference on the Ascend 310
+
+    The inference results are stored in the scripts directory,
+    and results similar to the following can be found in the acc.log file.
+
+    ```shell
+    # Ascend 310 inference
+    bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
+    Total data: 50000, top1 accuracy: 0.74214, top5 accuracy: 0.91652.
+    ```
+
+# Model description
+
+## Performance
+
+### Training performance
+
+#### Single-Path-NAS on ImageNet-1k
+
+| Parameter               | Ascend                                                                                 | GPU                                                                                    |
+| ----------------------- | -------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
+| Model                   | single-path-nas                                                                        | single-path-nas                                                                        |
+| Resource                | Ascend 910                                                                             | V100 GPU, Intel Xeon Gold 6226R CPU @ 2.90GHz                                          |
+| Upload date             | 2021-06-27                                                                             | -                                                                                      |
+| MindSpore version       | 1.2.0                                                                                  | 1.5.0                                                                                  |
+| Dataset                 | ImageNet-1k Train, 1,281,167 images in total                                           | ImageNet-1k Train, 1,281,167 images in total                                           |
+| Training parameters     | epoch=180, batch_size=128, lr_init=0.26 (0.26 for a single card, 1.5 for eight cards)  | epoch=180, batch_size=128, lr_init=0.26 (0.26 for a single card, 1.5 for eight cards)  |
+| Optimizer               | Momentum                                                                               | Momentum                                                                               |
+| Loss function           | Softmax cross entropy                                                                  | Softmax cross entropy                                                                  |
+| Output                  | Probability                                                                            | Probability                                                                            |
+| Classification accuracy | Eight cards: top1=74.21%, top5=91.712%                                                 | Single card: top1=73.9%, top5=91.62%; eight cards: top1=74.01%, top5=91.66%            |
+| Speed                   | Single card: -; eight cards: 87.173 ms/step                                            | Single card: 221 ms/step; eight cards: 263 ms/step                                     |
+
+### Inference performance
+
+#### Single-Path-NAS on ImageNet-1k
+
+| Parameter               | Ascend                                          | GPU (8 cards)                              | GPU (1 card)                               |
+| ----------------------- | ----------------------------------------------- | ------------------------------------------ | ------------------------------------------ |
+| Model                   | single-path-nas                                 | single-path-nas                            | single-path-nas                            |
+| Resource                | Ascend 310                                      | V100 GPU                                   | V100 GPU                                   |
+| Upload date             | 2021-06-27                                      | -                                          | -                                          |
+| MindSpore version       | 1.2.0                                           | 1.5.0                                      | 1.5.0                                      |
+| Dataset                 | ImageNet-1k Val, 50,000 images in total         | ImageNet-1k Val, 50,000 images in total    | ImageNet-1k Val, 50,000 images in total    |
+| Classification accuracy | top1: 74.214%, top5: 91.652%                    | top1: 74.01%, top5: 91.66%                 | top1: 73.9%, top5: 91.62%                  |
+| Speed                   | Average 7.67324 ms over 50,000 inferences       | 1285 images/second                         | 1285 images/second                         |
+
+# ModelZoo homepage
+
+Please visit the official [homepage](https://gitee.com/mindspore/models).
diff --git a/research/cv/single_path_nas/README_CN.md b/research/cv/single_path_nas/README_CN.md
index 53920d81c1901ed262a27caffdeda9432aef4f1f..3c71cfe5396fdacd71cbfa3d2e9e0f09b7b94ebb 100644
--- a/research/cv/single_path_nas/README_CN.md
+++ b/research/cv/single_path_nas/README_CN.md
@@ -101,19 +101,33 @@ The authors of single-path-nas use one large 7x7 convolution to represent the 3x3, 5x5, and 7x7
 
 ```bash
 ├── model_zoo
-    ├── README_CN.md                      // Single-Path-NAS description (in Chinese)
     ├── scripts
-    │   ├──run_train.sh                   // Shell script for distributed training on Ascend
-    │   ├──run_eval.sh                    // Evaluation script
-    │   ├──run_infer_310.sh               // Ascend 310 inference script
+    │   ├──run_distribute_train.sh        // Shell script for distributed training on Ascend
+    │   ├──run_distribute_train_gpu.sh    // Shell script for running the GPU distributed training
+    │   ├──run_standalone_train.sh        // Shell script for running the Ascend standalone training
+    │   ├──run_standalone_train_gpu.sh    // Shell script for running the GPU standalone training
+    │   ├──run_eval.sh                    // Evaluation script
+    │   ├──run_eval_gpu.sh                // Shell script for running the GPU evaluation
+    │   ├──run_infer_310.sh               // Ascend 310 inference script
     ├── src
-    │   ├──lr_scheduler                   // Learning-rate folder with the .py files of the LR schedules
-    │   ├──dataset.py                     // Dataset creation
-    │   ├──CrossEntropySmooth.py          // Loss function
-    │   ├──spnasnet.py                    // Single-Path-NAS network architecture
-    │   ├──config.py                      // Parameter configuration
-    │   ├──utils.py                       // Custom network modules for spnasnet.py
-    ├── train.py                          // Training and testing file
+    │   ├──lr_scheduler                   // Learning-rate folder with the .py files of the LR schedules
+    │   │   ├──__init__.py
+    │   │   ├──linear_warmup.py                // Definitions for the warm-up functionality
+    │   │   ├──warmup_cosine_annealing_lr.py   // Definitions for the cosine annealing learning rate schedule
+    │   │   ├──warmup_step_lr.py               // Definitions for the exponential learning rate schedule
+    │   ├──__init__.py
+    │   ├──dataset.py                     // Dataset creation
+    │   ├──CrossEntropySmooth.py          // Loss function
+    │   ├──spnasnet.py                    // Single-Path-NAS network architecture
+    │   ├──config.py                      // Parameter configuration
+    │   ├──utils.py                       // Custom network modules for spnasnet.py
+    ├── create_imagenet2012_label.py      // Creating ImageNet labels
+    ├── eval.py                           // Evaluate the trained model
+    ├── export.py                         // Export the model to other formats
+    ├── postprocess.py                    // Postprocessing for the Ascend 310 inference
+    ├── README.md                         // Single-Path-NAS description in English
+    ├── README_CN.md                      // Single-Path-NAS description in Chinese
+    ├── train.py                          // Training and testing file
 ```
 
 ## Script parameters
 
diff --git a/research/cv/single_path_nas/eval.py b/research/cv/single_path_nas/eval.py
index a671c7a34afb9c87f8aa31e89b713ddfc71eb440..cbba5a025c30569a7e5a4635550b8378be370a24 100644
--- a/research/cv/single_path_nas/eval.py
+++ b/research/cv/single_path_nas/eval.py
@@ -27,7 +27,8 @@ from mindspore.nn.loss.loss import _Loss
 from mindspore.ops import functional as F
 from mindspore.ops import operations as P
 from mindspore.train.model import Model
-from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.train.serialization import load_checkpoint
+from mindspore.train.serialization import load_param_into_net
 
 import src.spnasnet as spnasnet
 from src.config import imagenet_cfg
@@ -38,6 +39,10 @@ set_seed(1)
 parser = argparse.ArgumentParser(description='single-path-nas')
 parser.add_argument('--dataset_name', type=str, default='imagenet', choices=['imagenet',],
                     help='dataset name.')
+parser.add_argument('--val_data_path', type=str, default=None,
+                    help='Path to the validation dataset (e.g. "/datasets/imagenet/val/")')
+parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU', 'CPU'],
+                    default=None, help='Target device: Ascend, GPU or CPU')
 parser.add_argument('--checkpoint_path', type=str, default='./ckpt_0', help='Checkpoint file path or dir path')
 parser.add_argument('--device_id', type=int, default=None, help='device id of Ascend. (Default: None)')
 args_opt = parser.parse_args()
 
@@ -65,18 +70,27 @@ if __name__ == '__main__':
 
     if args_opt.dataset_name == "imagenet":
         cfg = imagenet_cfg
-        dataset = create_dataset_imagenet(cfg.val_data_path, 1, False)
-        if not cfg.use_label_smooth:
-            cfg.label_smooth_factor = 0.0
-        loss = CrossEntropySmooth(sparse=True, reduction="mean",
-                                  smooth_factor=cfg.label_smooth_factor, num_classes=cfg.num_classes)
-        net = spnasnet.spnasnet(num_classes=cfg.num_classes)
-        model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
+
+        if args_opt.val_data_path is not None:
+            cfg.val_data_path = args_opt.val_data_path
+
+        if args_opt.device_target is not None:
+            cfg.device_target = args_opt.device_target
+
+        device_target = cfg.device_target
+        # Drop incomplete batches on the GPU to keep input shapes static.
+        dataset_drop_remainder = (device_target == 'GPU')
+
+        dataset = create_dataset_imagenet(cfg.val_data_path, 1, False, drop_remainder=dataset_drop_remainder)
     else:
         raise ValueError("dataset is not support.")
 
-    device_target = cfg.device_target
+    if not cfg.use_label_smooth:
+        cfg.label_smooth_factor = 0.0
+    loss = CrossEntropySmooth(sparse=True, reduction="mean",
+                              smooth_factor=cfg.label_smooth_factor, num_classes=cfg.num_classes)
+    net = spnasnet.spnasnet(num_classes=cfg.num_classes)
+    model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
+
     context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target)
 
     if device_target == "Ascend":
         if args_opt.device_id is not None:
@@ -84,14 +98,16 @@ if __name__ == '__main__':
         else:
             context.set_context(device_id=cfg.device_id)
 
+    print(f'Checkpoint path: {args_opt.checkpoint_path}')
+
     if os.path.isfile(args_opt.checkpoint_path) and args_opt.checkpoint_path.endswith('.ckpt'):
         param_dict = load_checkpoint(args_opt.checkpoint_path)
         load_param_into_net(net, param_dict)
         net.set_train(False)
         acc = model.eval(dataset)
-        print(f"model {args_opt.checkpoint_path}'s accuracy is {acc}")
+        print(f"model {args_opt.checkpoint_path}'s accuracy is {acc}", flush=True)
     elif os.path.isdir(args_opt.checkpoint_path):
-        file_list = os.listdir(args_opt.checkpoint_path)
+        file_list = sorted(os.listdir(args_opt.checkpoint_path))
         for filename in file_list:
             de_path = os.path.join(args_opt.checkpoint_path, filename)
             if de_path.endswith('.ckpt'):
@@ -100,6 +116,6 @@ if __name__ == '__main__':
 
                 net.set_train(False)
                 acc = model.eval(dataset)
-                print(f"model {de_path}'s accuracy is {acc}")
+                print(f"model {de_path}'s accuracy is {acc}", flush=True)
     else:
         raise ValueError("args_opt.checkpoint_path must be a checkpoint file or dir contains checkpoint(s)")
diff --git a/research/cv/single_path_nas/export.py b/research/cv/single_path_nas/export.py
index 1c2b73de6e210c7cc184c6ee35f19a2fc5bc0c04..e4caf65fab02dafea8ea388178c6fc11886edd24 100644
--- a/research/cv/single_path_nas/export.py
+++ b/research/cv/single_path_nas/export.py
@@ -33,11 +33,11 @@ parser.add_argument('--width', type=int, default=224, help='input width')
 parser.add_argument('--height', type=int, default=224, help='input height')
 parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="MINDIR", help="file format")
 parser.add_argument("--device_target", type=str, default="Ascend",
-                    choices=["Ascend",], help="device target(default: Ascend)")
+                    choices=["Ascend", "GPU"], help="device target (default: Ascend)")
 args = parser.parse_args()
 
 context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
-if args.device_target == "Ascend":
+if args.device_target in ["Ascend", "GPU"]:
     context.set_context(device_id=args.device_id)
 else:
     raise ValueError("Unsupported platform.")
diff --git a/research/cv/single_path_nas/scripts/run_distribute_train.sh b/research/cv/single_path_nas/scripts/run_distribute_train.sh
index f4b030f79165e42af0db99491f2dab8978a351fb..fab9920c8223c639bf50adbc5e9284cdb9cf1165 100644
--- a/research/cv/single_path_nas/scripts/run_distribute_train.sh
+++ b/research/cv/single_path_nas/scripts/run_distribute_train.sh
@@ -16,7 +16,7 @@
 
 if [ $# != 1 ]
 then
-    echo "Usage: sh run_train.sh [RANK_TABLE_FILE]"
+    echo "Usage: bash run_train.sh [RANK_TABLE_FILE]"
     exit 1
 fi
diff --git a/research/cv/single_path_nas/scripts/run_distribute_train_gpu.sh b/research/cv/single_path_nas/scripts/run_distribute_train_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..0a64c51536dbe8a692542178305f803053b40f87
--- /dev/null
+++ b/research/cv/single_path_nas/scripts/run_distribute_train_gpu.sh
@@ -0,0 +1,62 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+
+if [ $# != 0 ] && [ $# != 1 ]
+then
+    echo "Usage: bash run_distribute_train_gpu.sh [TRAIN_DATASET](optional)"
+    exit 1
+fi
+
+if [ $# == 1 ] && [ ! -d "$1" ]
+then
+    echo "error: TRAIN_DATASET=$1 is not a directory"
+    exit 1
+fi
+
+ulimit -u unlimited
+
+# Re-create a clean working copy of the code for this run.
+rm -rf ./train_parallel
+mkdir ./train_parallel
+cp ./*.py ./train_parallel
+cp -r ./src ./train_parallel
+cd ./train_parallel || exit
+env > env.log
+
+if [ $# == 0 ]
+then
+    mpirun -n 8 \
+           --allow-run-as-root \
+           --output-filename 'log_output' \
+           --merge-stderr-to-stdout \
+        python ./train.py \
+           --use_gpu_distributed=1 \
+           --device_target='GPU' \
+           --lr_init=1.5 > log.txt 2>&1 &
+fi
+
+if [ $# == 1 ]
+then
+    mpirun -n 8 \
+           --allow-run-as-root \
+           --output-filename 'log_output' \
+           --merge-stderr-to-stdout \
+        python ./train.py \
+           --use_gpu_distributed=1 \
+           --device_target='GPU' \
+           --data_path="$1" \
+           --lr_init=1.5 > log.txt 2>&1 &
+fi
diff --git a/research/cv/single_path_nas/scripts/run_eval.sh b/research/cv/single_path_nas/scripts/run_eval.sh
index 5d30b6166f349f903da8609d096fcc4c1d428436..6cef937954c9c3281d56cf2fca4a56ee75fe9b54 100644
--- a/research/cv/single_path_nas/scripts/run_eval.sh
+++ b/research/cv/single_path_nas/scripts/run_eval.sh
@@ -16,7 +16,7 @@
 
 if [ $# != 1 ]
 then
-    echo "Usage: sh run_eval.sh checkpoint_path_dir/checkpoint_path_file"
+    echo "Usage: bash run_eval.sh checkpoint_path_dir/checkpoint_path_file"
     exit 1
 fi
diff --git a/research/cv/single_path_nas/scripts/run_eval_gpu.sh b/research/cv/single_path_nas/scripts/run_eval_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..f30886a8269eb6728a7528e6a9191dd2d0cb2f20
--- /dev/null
+++ b/research/cv/single_path_nas/scripts/run_eval_gpu.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 1 ] && [ $# != 2 ]
+then
+    echo "Usage: bash run_eval_gpu.sh [CKPT_FILE_OR_DIR] [VALIDATION_DATASET](optional)"
+    exit 1
+fi
+
+
+if [ ! -d "$1" ] && [ ! -f "$1" ]
+then
+    echo "error: CKPT_FILE_OR_DIR=$1 is neither a directory nor a file"
+    exit 1
+fi
+
+if [ $# == 2 ] && [ ! -d "$2" ]
+then
+    echo "error: VALIDATION_DATASET=$2 is not a directory"
+    exit 1
+fi
+
+ulimit -u unlimited
+
+if [ $# == 1 ]
+then
+    GLOG_v=3 python eval.py \
+        --checkpoint_path="$1" \
+        --device_target="GPU" > "./eval.log" 2>&1 &
+fi
+
+if [ $# == 2 ]
+then
+    GLOG_v=3 python eval.py \
+        --checkpoint_path="$1" \
+        --val_data_path="$2" \
+        --device_target="GPU" > "./eval.log" 2>&1 &
+fi
diff --git a/research/cv/single_path_nas/scripts/run_infer_310.sh b/research/cv/single_path_nas/scripts/run_infer_310.sh
index ab9d81199e91a9de1db5077add1c5d4d8e17abae..7e7cbfee7dd7f1fea612843c0ec1671e202186d9 100644
--- a/research/cv/single_path_nas/scripts/run_infer_310.sh
+++ b/research/cv/single_path_nas/scripts/run_infer_310.sh
@@ -15,7 +15,7 @@
 # ============================================================================
 
 if [[ $# -lt 2 || $# -gt 3 ]]; then
-    echo "Usage: sh run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
+    echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
     DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
     exit 1
 fi
@@ -59,7 +59,7 @@ function compile_app()
     if [ -f "Makefile" ]; then
         make clean
     fi
-    sh build.sh &> build.log
+    bash build.sh &> build.log
 }
 
 function infer()
diff --git a/research/cv/single_path_nas/scripts/run_standalone_train.sh b/research/cv/single_path_nas/scripts/run_standalone_train.sh
index af523897117b1bbd63cd9d7e5b25cff5e1a93f8d..884c99f8b3fa8d0f5d664725c78f9e7466e9d069 100644
--- a/research/cv/single_path_nas/scripts/run_standalone_train.sh
+++ b/research/cv/single_path_nas/scripts/run_standalone_train.sh
@@ -16,7 +16,7 @@
 
 if [ $# != 0 ]
 then
-    echo "Usage: sh run_train.sh"
+    echo "Usage: bash run_train.sh"
     exit 1
 fi
diff --git a/research/cv/single_path_nas/scripts/run_standalone_train_gpu.sh b/research/cv/single_path_nas/scripts/run_standalone_train_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..16b3e7fbd4c0d5cf3e7e42a649c9c0b0d9ac22d4
--- /dev/null
+++ b/research/cv/single_path_nas/scripts/run_standalone_train_gpu.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 0 ] && [ $# != 1 ]
+then
+    echo "Usage: bash run_standalone_train_gpu.sh [TRAIN_DATASET](optional)"
+    exit 1
+fi
+
+if [ $# == 1 ] && [ ! -d "$1" ]
+then
+    echo "error: TRAIN_DATASET=$1 is not a directory"
+    exit 1
+fi
+
+ulimit -u unlimited
+
+# Re-create a clean working copy of the code for this run.
+rm -rf ./train_standalone
+mkdir ./train_standalone
+cp ./*.py ./train_standalone
+cp -r ./src ./train_standalone
+cd ./train_standalone || exit
+env > env.log
+
+if [ $# == 0 ]
+then
+    python train.py --device_target='GPU' --lr_init=0.26 > log.txt 2>&1 &
+fi
+
+if [ $# == 1 ]
+then
+    python train.py --device_target='GPU' --data_path="$1" --lr_init=0.26 > log.txt 2>&1 &
+fi
diff --git a/research/cv/single_path_nas/src/config.py b/research/cv/single_path_nas/src/config.py
index fba997b6f099de1d098f958e1bacf175f3413a8d..54e9594cfae81bcbae84b0519dc1fd0739ab2fe9 100644
--- a/research/cv/single_path_nas/src/config.py
+++ b/research/cv/single_path_nas/src/config.py
@@ -42,7 +42,7 @@ imagenet_cfg = edict({
     'lr_epochs': [30, 60, 90],
     'lr_gamma': 0.3,
     'eta_min': 0.0,
-    'T_max': 150,
+    'T_max': 180,
     'warmup_epochs': 0,
 
     # loss related
diff --git a/research/cv/single_path_nas/src/dataset.py b/research/cv/single_path_nas/src/dataset.py
index 97b64529478e7e81af657edf1a6abcb64e370dd0..ac51ad69e82410668aef968c4396d816be056806 100644
--- a/research/cv/single_path_nas/src/dataset.py
+++ b/research/cv/single_path_nas/src/dataset.py
@@ -15,7 +15,6 @@
 """
 Data operations, will be used in train.py and eval.py
 """
-import os
 
 import mindspore.common.dtype as mstype
 import mindspore.dataset as ds
@@ -26,28 +25,27 @@ from src.config import imagenet_cfg
 
 def create_dataset_imagenet(dataset_path, repeat_num=1, training=True,
-                            num_parallel_workers=None, shuffle=True):
+                            num_parallel_workers=None, shuffle=True,
+                            device_num=None, rank_id=None, drop_remainder=False):
     """
     create a train or eval imagenet2012 dataset for resnet50
 
     Args:
         dataset_path(string): the path of dataset.
-        do_train(bool): whether dataset is used for train or eval.
         repeat_num(int): the repeat times of dataset. Default: 1
-        batch_size(int): the batch size of dataset. Default: 32
-        target(str): the device target. Default: Ascend
-
+        training(bool): whether the dataset is used for training or evaluation. Default: True.
+        num_parallel_workers(int): number of parallel workers. Default: None.
+        shuffle(bool): whether to shuffle the dataset. Default: True.
+        device_num(int): number of devices for the distributed training. Default: None
+        rank_id(int): rank of the process for the distributed training. Default: None
+        drop_remainder(bool): drop the last incomplete batch if the dataset size
+            is not divisible by the batch size. Default: False
     Returns:
         dataset
     """
-    device_num, rank_id = _get_rank_info()
-
-    if device_num == 1:
-        data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=num_parallel_workers, shuffle=shuffle)
-    else:
-        data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=num_parallel_workers, shuffle=shuffle,
-                                         num_shards=device_num, shard_id=rank_id)
+    data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=num_parallel_workers,
+                                     shuffle=shuffle, num_shards=device_num, shard_id=rank_id)
 
     assert imagenet_cfg.image_height == imagenet_cfg.image_width, "image_height not equal image_width"
     image_size = imagenet_cfg.image_height
@@ -73,32 +71,14 @@ def create_dataset_imagenet(dataset_path, repeat_num=1, training=True,
     ]
 
     transform_label = [C.TypeCast(mstype.int32)]
 
-    if training:
-        data_set = data_set.map(input_columns="image", num_parallel_workers=16, operations=transform_img)
-        data_set = data_set.map(input_columns="label", num_parallel_workers=4, operations=transform_label)
-    else:
-        data_set = data_set.map(input_columns="image", num_parallel_workers=16, operations=transform_img)
-        data_set = data_set.map(input_columns="label", num_parallel_workers=4, operations=transform_label)
+    data_set = data_set.map(input_columns="image", num_parallel_workers=16,
+                            operations=transform_img, python_multiprocessing=True)
+    data_set = data_set.map(input_columns="label", num_parallel_workers=4,
+                            operations=transform_label)
 
     # apply batch operations
-    data_set = data_set.batch(imagenet_cfg.batch_size, drop_remainder=False)
+    data_set = data_set.batch(imagenet_cfg.batch_size, drop_remainder=drop_remainder)
 
     # apply dataset repeat operation
     data_set = data_set.repeat(repeat_num)
 
     return data_set
-
-
-def _get_rank_info():
-    """
-    get rank size and rank id
-    """
-    rank_size = int(os.environ.get("RANK_SIZE", 1))
-
-    if rank_size > 1:
-        from mindspore.communication.management import get_rank, get_group_size
-        rank_size = get_group_size()
-        rank_id = get_rank()
-    else:
-        rank_size = rank_id = None
-
-    return rank_size, rank_id
diff --git a/research/cv/single_path_nas/train.py b/research/cv/single_path_nas/train.py
index 1bc22b118e8b1f71ad1b604943a53c49d27eb789..3a54e28f3a794b31935b0e345e90aaff1207955f 100644
--- a/research/cv/single_path_nas/train.py
+++ b/research/cv/single_path_nas/train.py
@@ -22,11 +22,17 @@ import os
 
 from mindspore import Tensor
 from mindspore import context
 from mindspore.common import set_seed
-from mindspore.communication.management import init, get_rank
+from mindspore.communication.management import get_group_size
+from mindspore.communication.management import get_rank
+from mindspore.communication.management import init
 from mindspore.context import ParallelMode
 from mindspore.nn.optim.momentum import Momentum
-from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
-from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager
+from mindspore.train.callback import CheckpointConfig
+from mindspore.train.callback import LossMonitor
+from mindspore.train.callback import ModelCheckpoint
+from mindspore.train.callback import TimeMonitor
+from mindspore.train.loss_scale_manager import DynamicLossScaleManager
+from mindspore.train.loss_scale_manager import FixedLossScaleManager
 from mindspore.train.model import Model
 
 from src import spnasnet
@@ -64,10 +70,21 @@ def lr_steps_imagenet(_cfg, steps_per_epoch):
 
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='Single-Path-NAS Training')
-    parser.add_argument('--dataset_name', type=str, default='imagenet', choices=['imagenet',],
+    parser.add_argument('--dataset_name', type=str, default='imagenet', choices=['imagenet'],
                         help='dataset name.')
-    parser.add_argument('--filter_prefix', type=str, default='huawei', help='filter_prefix name.')
-    parser.add_argument('--device_id', type=int, default=None, help='device id of Ascend. (Default: None)')
+    parser.add_argument('--filter_prefix', type=str, default='huawei',
+                        help='filter_prefix name.')
+    parser.add_argument('--lr_init', type=float, default=None,
+                        help='Override the learning rate value in the configuration file')
+    parser.add_argument('--device_id', type=int, default=None,
+                        help='device id of Ascend. (Default: None)')
+    parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU'],
+                        default=None, help='Target device: Ascend or GPU')
+    parser.add_argument('--use_gpu_distributed', type=int, default=0,
+                        help='Enable distributed GPU training.')
+    parser.add_argument('--data_path', type=str, default=None,
+                        help='Path to the training dataset (e.g. "/datasets/imagenet/train/")')
+
     args_opt = parser.parse_args()
 
     if args_opt.dataset_name == "imagenet":
@@ -75,9 +92,23 @@ if __name__ == '__main__':
     else:
         raise ValueError("Unsupported dataset.")
 
+    if args_opt.data_path is not None:
+        cfg.data_path = args_opt.data_path
+
     # set context
+    if args_opt.device_target is not None:
+        cfg.device_target = args_opt.device_target
+
+    if args_opt.lr_init is not None:
+        cfg.lr_init = args_opt.lr_init
+
     device_target = cfg.device_target
-    context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target, enable_graph_kernel=True)
+
+    # Enable the graph kernel fusion only for the Ascend device.
+    enable_graph_kernel = (device_target == 'Ascend')
+
+    context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target,
+                        enable_graph_kernel=enable_graph_kernel)
 
     device_num = int(os.environ.get("DEVICE_NUM", 1))
 
@@ -90,15 +121,37 @@ if __name__ == '__main__':
 
         if device_num > 1:
             context.reset_auto_parallel_context()
-            context.set_auto_parallel_context(device_num=device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
+            context.set_auto_parallel_context(device_num=device_num,
+                                              parallel_mode=ParallelMode.DATA_PARALLEL,
                                               gradients_mean=True)
             init()
             rank = get_rank()
+    elif device_target == "GPU":
+        # Use the rank and device count provided by the communication module.
+        if args_opt.use_gpu_distributed == 1:
+            init('nccl')
+            device_num = get_group_size()
+            rank = get_rank()
+            context.reset_auto_parallel_context()
+            context.set_auto_parallel_context(device_num=device_num,
+                                              parallel_mode=ParallelMode.DATA_PARALLEL,
+                                              gradients_mean=True)
+        else:
+            device_num = 1
+
     else:
         raise ValueError("Unsupported platform.")
 
+    # Drop incomplete batches on the GPU to keep input shapes static.
+    dataset_drop_remainder = (device_target == 'GPU')
+
     if args_opt.dataset_name == "imagenet":
-        dataset = create_dataset_imagenet(cfg.data_path, 1)
+        if device_num > 1:
+            dataset = create_dataset_imagenet(cfg.data_path, 1, num_parallel_workers=8,
+                                              device_num=device_num, rank_id=rank,
+                                              drop_remainder=dataset_drop_remainder)
+        else:
+            dataset = create_dataset_imagenet(cfg.data_path, 1, num_parallel_workers=8,
+                                              drop_remainder=dataset_drop_remainder)
     else:
         raise ValueError("Unsupported dataset.")
 
@@ -111,7 +164,6 @@ if __name__ == '__main__':
 
     if args_opt.dataset_name == 'imagenet':
         lr = lr_steps_imagenet(cfg, batch_num)
-
     def get_param_groups(network):
         """ get param groups """
         decay_params = []
@@ -152,6 +204,9 @@ if __name__ == '__main__':
         else:
             loss_scale_manager = FixedLossScaleManager(cfg.loss_scale, drop_overflow_update=False)
 
+    else:
+        raise ValueError("Unsupported dataset.")
+
     model = Model(net, loss_fn=loss, optimizer=opt, metrics={'top_1_accuracy', 'top_5_accuracy', 'loss'},
                   amp_level="O3", keep_batchnorm_fp32=True, loss_scale_manager=loss_scale_manager)