Unverified Commit efa6920a authored by i-robot's avatar i-robot Committed by Gitee

!2061 AlphaPose GPU training

Merge pull request !2061 from Alexander Melekhin/alphapose-gpu
parents 830a6601 628c784b
# Contents
<!-- TOC -->
- [Alphapose Description](#alphapose-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [mixed precision](#mixed-precision)
- [Environmental requirements](#environmental-requirements)
- [Quick start](#quick-start)
- [Script description](#script-description)
- [Scripts and sample code](#scripts-and-sample-code)
- [Script parameters](#script-parameters)
- [Training process](#training-process)
- [Evaluation process](#evaluation-process)
- [310 Inference Process](#310-inference-process)
- [Model description](#model-description)
- [Performance](#performance)
- [Evaluation performance](#evaluation-performance)
- [Inference performance](#inference-performance)
- [Random Seed Description](#random-seed-description)
- [ModelZoo Homepage](#modelzoo-homepage)
<!-- /TOC -->
# Alphapose Description
## Overview
AlphaPose was proposed by Cewu Lu's team at Shanghai Jiao Tong University. The authors introduced the Regional Multi-Person Pose Estimation (RMPE) framework, which combines a Symmetric Spatial Transformer Network (SSTN), parametric pose Non-Maximum Suppression (p-NMS), and a Pose-Guided Proposals Generator (PGPG) to solve the problem of multi-person pose estimation in complex scenes.
For details of the AlphaPose network, please refer to [Paper 1](https://arxiv.org/pdf/1612.00137.pdf). The MindSpore implementation of the AlphaPose network is based on the PyTorch version released by Cewu Lu's team at Shanghai Jiao Tong University; for details, see <https://github.com/MVIG-SJTU/AlphaPose>.
## Paper
1. [Paper](https://arxiv.org/pdf/1612.00137.pdf): Fang H. S., Xie S., Tai Y. W., et al. RMPE: Regional Multi-Person Pose Estimation
# Model Architecture
The overall network architecture of AlphaPose is as follows:
[Link](https://arxiv.org/abs/1612.00137)
# Dataset
Dataset used: [COCO2017](https://cocodataset.org/#download)
- Dataset size:
- Training set: 19.56G, 118,287 images
- Test set: 825MB, 5,000 images
- Data format: JPG file
- Note: Data is processed in src/dataset.py
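A typical directory layout expected by the training and evaluation scripts, inferred from the default configuration (file names follow `default_config.yaml`; the detector bbox file is only needed when ground-truth boxes are not used):
```text
COCO2017/                                           # DATASET_ROOT
 ├── train2017/                                     # training images (DATASET_TRAIN_SET)
 ├── val2017/                                       # validation images (DATASET_TEST_SET)
 ├── annotations/
 │   ├── person_keypoints_train2017.json            # DATASET_TRAIN_JSON
 │   └── person_keypoints_val2017.json              # DATASET_TEST_JSON
 └── COCO_val2017_detections_AP_H_56_person.json    # TEST_COCO_BBOX_FILE (person detections)
```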
# Features
## mixed precision
Training with [mixed precision](https://www.mindspore.cn/docs/programming_guide/en/r1.6/enable_mixed_precision.html) uses both single-precision and half-precision data to speed up deep neural network training while preserving the accuracy achievable with pure single-precision training. Mixed precision increases computational throughput and reduces memory usage, which makes it possible to train larger models or use larger batch sizes on specific hardware.
Taking the FP16 operator as an example, if the input data type is FP32, the MindSpore backend automatically reduces the precision to process the data. You can open the INFO log and search for "reduce precision" to view operators whose precision was reduced.
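The snippet below is a minimal sketch (not part of this repository) of how mixed precision is enabled: `train.py` passes `amp_level="O2"` when building the `Model`; the tiny network and loss here are placeholders only.
```python
import mindspore.nn as nn
from mindspore import Model

# Placeholder network, loss and optimizer, only to keep the snippet self-contained.
net = nn.SequentialCell([nn.Conv2d(3, 17, 3, pad_mode='same')])
loss = nn.MSELoss()
optimizer = nn.Adam(net.trainable_params(), learning_rate=0.001)

# amp_level="O2" keeps most of the network in float16 while BatchNorm and the loss
# stay in float32, matching the Model(..., amp_level="O2") call in train.py.
model = Model(net, loss_fn=loss, optimizer=optimizer, amp_level="O2")
```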
# Environmental requirements
- Hardware (Ascend/GPU)
    - Prepare the Ascend or GPU processor to set up the hardware environment.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For details, see the following resources:
- [MindSpore Tutorial](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick start
After installing MindSpore through the official website, you can follow the steps below for training and evaluation:
- Pre-trained models
The AlphaPose model uses a ResNet-50 network trained on ImageNet as its backbone. You can run the ResNet training script in the [official model zoo](https://gitee.com/mindspore/models/tree/master/official/cv/resnet) to obtain the model weight file, or download a trained checkpoint from [here](https://download.mindspore.cn/model_zoo/r1.3/resnet50_ascend_v130_imagenet2012_official_cv_bs32_acc77.06/). The pre-trained file should be named resnet50.ckpt.
- Dataset preparation
The AlphaPose network model uses the COCO2017 dataset for training and inference. The dataset can be downloaded from the [official website](https://cocodataset.org/).
- Configuration
Set the desired configuration in ```default_config.yaml``` or create a new one. Every top-level key in the YAML file can also be overridden on the command line (for example, `--TRAIN_BATCH_SIZE 64`).
- Ascend processor environment to run
```bash
# Distributed training
bash scripts/run_distribute_train.sh [RANK_TABLE]
# Stand-alone training
bash scripts/run_standalone_train.sh [DEVICE_ID]
# Run the evaluation example
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
# run demo
bash scripts/run_demo.sh
```
- GPU environment to run
```bash
# Distributed training
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [config_file] [dataset_dir] [pretrained_backbone]
# Stand-alone training
bash scripts/run_standalone_train_gpu.sh [config_file] [dataset_dir] [pretrained_backbone]
# Run the evaluation example
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
```
# Script description
## Scripts and sample code
```text
└── AlphaPose
    ├── README.md
    ├── scripts
    │   ├── run_distribute_train.sh      # Start Ascend distributed training (8 cards)
    │   ├── run_distribute_train_gpu.sh  # Start GPU distributed training (8 cards)
    │   ├── run_demo.sh                  # Start the demo (single card)
    │   ├── run_eval.sh                  # Start evaluation (Ascend or GPU)
    │   ├── run_standalone_train.sh      # Start Ascend stand-alone training (single card)
    │   └── run_standalone_train_gpu.sh  # Start GPU stand-alone training (single card)
    ├── src
    │   ├── utils
    │   │   ├── coco.py                  # COCO dataset evaluation tools
    │   │   ├── fn.py                    # Draw human poses based on keypoints
    │   │   ├── inference.py             # Keypoint prediction from heatmaps
    │   │   ├── nms.py                   # NMS
    │   │   └── transforms.py            # Image processing transforms
    │   ├── config.py                    # Parameter configuration
    │   ├── dataset.py                   # Data preprocessing
    │   ├── DUC.py                       # DUC network block
    │   ├── FastPose.py                  # FastPose network definition
    │   ├── network_with_loss.py         # Loss function definition
    │   ├── SE_module.py                 # SE network block
    │   └── SE_Resnet.py                 # ResNet-50 network block
    ├── demo.py                          # Demo
    ├── data_to_bin.py                   # Convert dataset images to binary
    ├── default_config.yaml              # Default configuration file
    ├── requirements.txt                 # pip requirements
    ├── export.py                        # Convert a ckpt model file to MINDIR
    ├── postprocess.py                   # 310 inference post-processing (accuracy calculation)
    ├── eval.py                          # Evaluate the network
    └── train.py                         # Train the network
```
## Script parameters
Configure relevant parameters in ```default_config.yaml```.
- Configure model related parameters:
```python
MODEL_INIT_WEIGHTS = True # Initialize model weights
MODEL_PRETRAINED = 'resnet50.ckpt' # pretrained model
MODEL_NUM_JOINTS = 17 # number of key points
MODEL_IMAGE_SIZE = [192, 256] # image size
```
- Configure network related parameters:
```python
NETWORK_NUM_LAYERS = 50 # Resnet backbone network layers
NETWORK_DECONV_WITH_BIAS = False # network deconvolution bias
NETWORK_NUM_DECONV_LAYERS = 3 # The number of network deconvolution layers
NETWORK_NUM_DECONV_FILTERS = [256, 256, 256] # Deconvolution layer filter size
NETWORK_NUM_DECONV_KERNELS = [4, 4, 4] # Deconvolution layer kernel size
NETWORK_FINAL_CONV_KERNEL = 1 # Final convolutional layer kernel size
NETWORK_HEATMAP_SIZE = [48, 64] # Heatmap size
```
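For illustration, the sketch below shows how a Gaussian training target of size NETWORK_HEATMAP_SIZE with NETWORK_SIGMA could be generated for one joint; it is a simplified stand-in for the target generation in `src/dataset.py`, not a copy of it.
```python
import numpy as np

def gaussian_target(joint_xy, heatmap_size=(48, 64), sigma=2):
    """A 2-D Gaussian centred on one joint, in heatmap coordinates (x, y)."""
    width, height = heatmap_size
    xs = np.arange(width, dtype=np.float32)
    ys = np.arange(height, dtype=np.float32)[:, None]
    x0, y0 = joint_xy
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

heatmap = gaussian_target((24, 32))  # peak value 1.0 at heatmap position x=24, y=32
```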
- Configure training related parameters:
```python
TRAIN_SHUFFLE = True # training data in random order
TRAIN_BATCH_SIZE = 64 # training batch size
DATASET_FLIP = True # The dataset is randomly flipped
DATASET_SCALE_FACTOR = 0.3 # dataset random scale factor
DATASET_ROT_FACTOR = 40 # Dataset random rotation factor
TRAIN_BEGIN_EPOCH = 0 # starting epoch
TRAIN_END_EPOCH = 270 # final epoch
TRAIN_LR = 0.001 # initial learning rate
TRAIN_LR_FACTOR = 0.1 # Learning rate reduction factor
```
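The schedule built by `get_lr` in `train.py` appears to be a per-step piecewise-constant schedule: the rate starts at TRAIN_LR and is multiplied by TRAIN_LR_FACTOR at every epoch listed in TRAIN_LR_STEP. A rough equivalent (an assumed sketch, not a copy of the repository code):
```python
import numpy as np

def step_lr(begin_epoch=0, end_epoch=270, steps_per_epoch=292,
            lr_init=0.001, factor=0.1, drop_epochs=(170, 200)):
    """Per-step learning rates with a step decay at each epoch in drop_epochs."""
    rates = []
    for epoch in range(begin_epoch, end_epoch):
        scale = factor ** sum(epoch >= e for e in drop_epochs)
        rates.extend([lr_init * scale] * steps_per_epoch)
    return np.array(rates, dtype=np.float32)

lr = step_lr()  # 0.001 until epoch 170, then 0.0001, then 0.00001 from epoch 200
```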
- Configure test related parameters:
```python
TEST_BATCH_SIZE = 32 # test batch size
TEST_FLIP_TEST = True # flip validation
TEST_USE_GT_BBOX = False # Use gt boxes
```
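When TEST_FLIP_TEST is enabled, evaluation also runs the network on the horizontally flipped image and averages the two heatmap predictions (see `eval.py`). A simplified NumPy sketch of that logic, with `model_fn` standing in for the network:
```python
import numpy as np

# Left/right joint pairs that swap under a horizontal flip (same pairs as src/dataset.py).
FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]

def flip_test(model_fn, images):
    """Average NKHW heatmaps predicted for an NCHW batch and its mirror image."""
    output = model_fn(images)
    flipped = model_fn(images[:, :, :, ::-1])[:, :, :, ::-1]  # predict on mirror, mirror back
    for left, right in FLIP_PAIRS:                            # swap left/right joint channels
        flipped[:, [left, right]] = flipped[:, [right, left]]
    flipped[:, :, :, 1:] = flipped[:, :, :, :-1]              # TEST_SHIFT_HEATMAP alignment shift
    return (output + flipped) * 0.5
```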
- Configure nms related parameters:
```python
TEST_OKS_THRE = 0.9 # OKS threshold
TEST_IN_VIS_THRE = 0.2 # keypoint visibility threshold
TEST_BBOX_THRE = 1.0 # candidate box threshold
TEST_IMAGE_THRE = 0.0 # image threshold
TEST_NMS_THRE = 1.0 # nms threshold
```
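TEST_OKS_THRE is the threshold used for pose-level non-maximum suppression based on Object Keypoint Similarity (OKS). A simplified sketch of OKS-based greedy NMS is shown below; the per-keypoint constants follow the COCO keypoint evaluation protocol, while the actual implementation lives in `src/utils/nms.py`.
```python
import numpy as np

# COCO per-keypoint falloff constants (nose, eyes, ears, shoulders, ..., ankles).
KPT_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                       .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0

def oks(kpts_a, kpts_b, area, vis):
    """Object Keypoint Similarity between two (17, 2) keypoint arrays."""
    d2 = np.sum((kpts_a - kpts_b) ** 2, axis=1)
    variances = (2.0 * KPT_SIGMAS) ** 2
    ks = np.exp(-d2 / (2.0 * variances * (area + np.spacing(1))))
    visible = vis > 0
    return ks[visible].mean() if visible.any() else 0.0

def oks_nms(poses, scores, areas, vis, thre=0.9):
    """Greedily keep high-scoring poses; drop poses whose OKS with a kept pose exceeds thre."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        sims = np.array([oks(poses[best], poses[j], areas[j], vis[j]) for j in order[1:]])
        order = order[1:][sims <= thre]
    return keep
```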
- Configure demo related parameters:
```python
detect_image = "images/1.jpg" # Detect pictures
yolo_image_size = [416, 416] # yolo network input image size
yolo_ckpt = "yolo/yolo.ckpt" # yolo network weight
fast_pose_ckpt = "fastpose.ckpt" # fastpose network weights
yolo_threshold = 0.1 # bbox threshold
```
## Training process
### Usage
#### Ascend processor environment running
```bash
# Distributed training
bash scripts/run_distribute_train.sh [RANK_TABLE]
# Stand-alone training
bash scripts/run_standalone_train.sh [DEVICE_ID]
# Run the evaluation example
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
```
#### GPU environment
```bash
# Distributed training
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [config_file] [dataset_dir] [pretrained_backbone]
# Stand-alone training
bash scripts/run_standalone_train_gpu.sh [config_file] [dataset_dir] [pretrained_backbone]
# Run the evaluation example
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
```
### Result
- Train Alphapose with COCO2017 dataset
```text
Distributed training results (8P)
epoch:1 step:292, loss is 0.001391
epoch:2 step:292, loss is 0.001326
epoch:3 step:292, loss is 0.001001
epoch:4 step:292, loss is 0.0007763
epoch:5 step:292, loss is 0.0006757
...
epoch:268 step:292, loss is 0.0002837
epoch:269 step:292, loss is 0.0002367
epoch:270 step:292, loss is 0.0002532
```
## Evaluation process
### Usage
#### Ascend processor environment running
The corresponding model can be evaluated by changing the "TEST_MODEL_FILE" parameter in the config file.
```bash
# evaluate
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
```
#### GPU environment
```bash
# Run the evaluation example
bash scripts/run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
```
### Result
AlphaPose is evaluated on val2017 from the COCO2017 dataset folder; the results are as follows:
```text
coco eval results saved to /cache/train_output/multi_train_poseresnet_v5_2-140_2340/keypoints_results.pkl
AP: 0.723
```
## 310 Inference Process
### Usage
#### Export model
```bash
# export model
python export.py --ckpt_url [ckpt_url] --device_target [device_target] --device_id [device_id] --file_name [file_name] --file_format [file_format]
```
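The command above wraps `export.py`; in essence the script does the following (a condensed sketch of the code shown later in this commit, with the checkpoint path as a placeholder):
```python
import numpy as np
import mindspore.common.dtype as ms
from mindspore import Tensor, context, export, load_checkpoint, load_param_into_net
from src.FastPose import createModel

context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
net = createModel()
load_param_into_net(net, load_checkpoint("FastPose.ckpt"))  # placeholder checkpoint path
# The network expects NCHW input at the 256x192 resolution used for training.
dummy_input = Tensor(np.ones([1, 3, 256, 192]), ms.float32)
export(net, dummy_input, file_name="simple_baselines", file_format="MINDIR")
```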
#### Ascend310 processor environment running
```bash
# 310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [NEED_PREPROCESS] [DEVICE_ID]
```
#### Acquire accuracy
```bash
# Acquire accuracy
more acc.log
```
### Result
```text
AP: 0.723
```
# Model description
## Performance
### Evaluation performance
#### Performance parameters on coco2017
| parameter | Ascend | GPU |
| -------------------------- | ------------------------------------------ | ------------------ |
| model version | ResNet50 | ResNet50 |
| resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory: 755 GB | 8p RTX 3090 24GB |
| upload date | 2020-12-16 | 2022-02-16 |
| MindSpore version | 1.3 | 1.6 |
| data set | coco2017 | coco2017 |
| training parameters | epoch=270, steps=2336, batch_size = 64, lr=0.001 | epoch=270, batch_size = 128, lr=0.001 |
| optimizer | Adam | Adam |
| loss function | Mean Squared Error | Mean Squared Error |
| output | heatmap | heatmap |
| loss | 0.00025 | 0.00026 |
| speed | 1p: 138.9 ms/step; 8p: 147.28 ms/step | 8p: 441 ms/step |
| total duration | 1p: 24h 22m 36s; 8p: 3h 13m 31s | 8p: 04h 48m 00s |
| parameter(M) | 13.0 | 13.0 |
| Fine-tune checkpoints | 389.64 MB (.ckpt file) | 338 MB (.ckpt) |
| inference model | 57.26 MB (.om file), 112.76 MB (.MINDIR file) | - |
### Inference performance
#### Performance parameters on coco2017
| parameter | Ascend | GPU |
| ------------------- | ----------------------- | ------------ |
| model version | ResNet50 | ResNet50 |
| resource | Ascend 910 | RTX 3090 24 GB |
| upload date | 2020-12-16 | 2022-02-16 |
| MindSpore Version | 1.3 | 1.6 |
| data set | coco2017 | coco2017 |
| batch_size | 32 | 32 |
| output | heatmap | heatmap |
| accuracy | 1p: 72.3%; 8p: 72.5% | 72.2 % |
| inference model | 389.64 MB (.ckpt file) | 338 MB (.ckpt) |
# Random Seed Description
Random seeds are set from the configuration: `src/dataset.py` calls `ds.config.set_seed` with DATASET_SEED, while `train.py` and `eval.py` call `set_seed` with TRAIN_SEED and EVAL_SEED respectively; the initial network weights are set when the model is created in `src/FastPose.py`.
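For reference, the entry scripts set these seeds through `mindspore.common.set_seed`, for example:
```python
from mindspore.common import set_seed
from src.config import config

set_seed(config.TRAIN_SEED)  # train.py; eval.py uses config.EVAL_SEED instead
```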
# ModelZoo Homepage
Please visit the official website [homepage](https://gitee.com/mindspore/models).
# general
DEVICE_ID: 0
DEVICE_TARGET: ''
VERSION: 'commit'
TRAIN_SEED: 1
EVAL_SEED: 1
DATASET_SEED: 1
RUN_DISTRIBUTE: False
SUMMARY_DIR: './summary'
# modelarts
MODELARTS_IS_MODEL_ARTS: False
MODELARTS_DATA_URL: ''
MODELARTS_TRAIN_URL: ''
MODELARTS_CACHE_INPUT: '/cache/data_tzh/'
MODELARTS_CACHE_OUTPUT: '/cache/train_out/'
# model network parameters
MODEL_IS_TRAINED: False # Initially True
MODEL_INIT_WEIGHTS: True
MODEL_PRETRAINED: 'resnet50.ckpt'
MODEL_NUM_JOINTS: 17
MODEL_IMAGE_SIZE: [192, 256] # input image size
# network
NETWORK_NUM_LAYERS: 50
NETWORK_DECONV_WITH_BIAS: False
NETWORK_NUM_DECONV_LAYERS: 3
NETWORK_NUM_DECONV_FILTERS: [256, 256, 256]
NETWORK_NUM_DECONV_KERNELS: [4, 4, 4]
NETWORK_FINAL_CONV_KERNEL: 1
NETWORK_REVERSE: True
NETWORK_TARGET_TYPE: 'gaussian'
NETWORK_HEATMAP_SIZE: [48, 64]
NETWORK_SIGMA: 2
# loss
LOSS_USE_TARGET_WEIGHT: True
# dataset
DATASET_TYPE: 'COCO'
DATASET_ROOT: ''
DATASET_TRAIN_SET: 'train2017'
DATASET_TRAIN_JSON: 'annotations/person_keypoints_train2017.json'
DATASET_TEST_SET: 'val2017'
DATASET_TEST_JSON: 'annotations/person_keypoints_val2017.json'
# training data augmentation
DATASET_FLIP: True
DATASET_SCALE_FACTOR: 0.3
DATASET_ROT_FACTOR: 40
# train
TRAIN_SHUFFLE: True
TRAIN_BATCH_SIZE: 128 # 128 in original paper
TRAIN_BEGIN_EPOCH: 0
TRAIN_END_EPOCH: 270 # 140 in original paper
TRAIN_LR: 0.001
TRAIN_LR_FACTOR: 0.1
TRAIN_LR_STEP: [170, 200]
TRAIN_NUM_PARALLEL_WORKERS: 6
TRAIN_SAVE_CKPT: True
TRAIN_nClasses: 17
TRAIN_CKPT_PATH: "./"
# valid
TEST_device_target: "Ascend"
TEST_device_id: 0
TEST_BATCH_SIZE: 32
TEST_FLIP_TEST: True
TEST_POST_PROCESS: True
TEST_SHIFT_HEATMAP: True
TEST_USE_GT_BBOX: False
TEST_NUM_PARALLEL_WORKERS: 2
TEST_MODEL_FILE: "FastPose.ckpt"
TEST_COCO_BBOX_FILE: '/COCO_BBOX_FILE/COCO_val2017_detections_AP_H_56_person.json'
TEST_OUTPUT_DIR: 'results/'
# export
file_name: 'simple_baselines'
ckpt_url: 'FastPose.ckpt'
file_format: 'MINDIR'
device_target: 'CPU'
device_id: 0
# demo
detect_image: "demo.jpg"
yolo_image_size: [416, 416]
yolo_ckpt: "yolov3.ckpt"
fast_pose_ckpt: "FastPose.ckpt"
# eval
checkpoint_path: ''
# confidence under ignore_threshold means no object when training
yolo_threshold: 0.1
save_bbox_image: True
result_path: "demo_result/"
# nms
TEST_OKS_THRE: 0.9
TEST_IN_VIS_THRE: 0.2
TEST_BBOX_THRE: 1.0
TEST_IMAGE_THRE: 0.0
TEST_NMS_THRE: 1.0
# 310 infer-related
INFER_PRE_RESULT_PATH: '_/preprocess_Result'
INFER_POST_RESULT_PATH: '_/result_Files'
@@ -66,15 +66,15 @@ def inference(bboxes):
'''
inference
'''
image_width = config.MODEL.IMAGE_SIZE[0]
image_height = config.MODEL.IMAGE_SIZE[1]
image_width = config.MODEL_IMAGE_SIZE[0]
image_height = config.MODEL_IMAGE_SIZE[1]
aspect_ratio = image_width * 1.0 / image_height
scales, centers = bbox2sc(bboxes, aspect_ratio)
model = createModel()
ckpt_name = config.fast_pose_ckpt
print('loading model fastpose_ckpt from {}'.format(ckpt_name))
load_param_into_net(model, load_checkpoint(ckpt_name))
image_size = np.array(config.MODEL.IMAGE_SIZE, dtype=np.int32)
image_size = np.array(config.MODEL_IMAGE_SIZE, dtype=np.int32)
data_numpy = cv.imread(config.detect_image, cv.IMREAD_COLOR | cv.IMREAD_IGNORE_ORIENTATION)
@@ -82,7 +82,7 @@ def inference(bboxes):
inputs = []
bbox_num = bboxes.shape[0]
image_size = np.array(config.MODEL.IMAGE_SIZE, dtype=np.int32)
image_size = np.array(config.MODEL_IMAGE_SIZE, dtype=np.int32)
for i in range(bbox_num):
s, c = scales[i], centers[i]
trans = get_affine_transform(c, s, 0, image_size, inv=0)
@@ -91,12 +91,12 @@ def inference(bboxes):
inputs.append(image_data)
inputs = np.array(inputs, dtype=np.float32)
output = model(Tensor(inputs, float32)).asnumpy()
if config.TEST.FLIP_TEST:
if config.TEST_FLIP_TEST:
inputs_flipped = Tensor(inputs[:, :, :, ::-1], float32)
output_flipped = model(inputs_flipped)
output_flipped = flip_back(output_flipped.asnumpy(), flip_pairs)
if config.TEST.SHIFT_HEATMAP:
if config.TEST_SHIFT_HEATMAP:
output_flipped[:, :, :, 1:] = \
output_flipped.copy()[:, :, :, 0:-1]
@@ -127,7 +127,7 @@ def DataWrite(result):
def main():
context.set_context(mode=context.GRAPH_MODE,
device_target="Ascend", save_graphs=False, device_id=5)
device_target=config.DEVICE_TARGET, save_graphs=False)
bboxes = detect_bbox()
pose_preds, pose_scores = inference(bboxes)
......
@@ -17,7 +17,6 @@ This file evaluates the model used.
'''
from __future__ import division
import argparse
import os
import time
import numpy as np
@@ -34,25 +34,20 @@ from src.utils.coco import evaluate
from src.utils.transforms import flip_back
from src.utils.inference import get_final_preds
if config.MODELARTS.IS_MODEL_ARTS:
if config.MODELARTS_IS_MODEL_ARTS:
import moxing as mox
set_seed(config.GENERAL.EVAL_SEED)
device_id = int(os.getenv('DEVICE_ID'))
set_seed(config.EVAL_SEED)
device_id = int(os.getenv('DEVICE_ID', '0'))
def parse_args():
parser = argparse.ArgumentParser(description='Evaluate')
parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
args = parser.parse_args()
return args
def validate(cfg, val_dataset, model, output_dir, ann_path):
'''
validate
'''
model.set_train(False)
num_samples = val_dataset.get_dataset_size() * cfg.TEST.BATCH_SIZE
all_preds = np.zeros((num_samples, cfg.MODEL.NUM_JOINTS, 3),
num_samples = val_dataset.get_dataset_size() * cfg.TEST_BATCH_SIZE
all_preds = np.zeros((num_samples, cfg.MODEL_NUM_JOINTS, 3),
dtype=np.float32)
all_boxes = np.zeros((num_samples, 2))
image_id = []
@@ -62,12 +56,12 @@ def validate(cfg, val_dataset, model, output_dir, ann_path):
for item in val_dataset.create_dict_iterator():
inputs = item['image'].asnumpy()
output = model(Tensor(inputs, float32)).asnumpy()
if cfg.TEST.FLIP_TEST:
if cfg.TEST_FLIP_TEST:
inputs_flipped = Tensor(inputs[:, :, :, ::-1], float32)
output_flipped = model(inputs_flipped)
output_flipped = flip_back(output_flipped.asnumpy(), flip_pairs)
if cfg.TEST.SHIFT_HEATMAP:
if cfg.TEST_SHIFT_HEATMAP:
output_flipped[:, :, :, 1:] = \
output_flipped.copy()[:, :, :, 0:-1]
@@ -98,25 +92,25 @@ def validate(cfg, val_dataset, model, output_dir, ann_path):
def main():
args = parse_args()
context.set_context(mode=context.GRAPH_MODE,
device_target=config.TEST.device_target,
device_id=config.TEST.device_id)
device_target=config.TEST_device_target,
device_id=config.TEST_device_id)
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=args.data_url, dst_url=config.MODELARTS.CACHE_INPUT)
if config.MODELARTS_IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS_DATA_URL,
dst_url=config.MODELARTS_CACHE_INPUT)
model = createModel()
if config.MODELARTS.IS_MODEL_ARTS:
ckpt_name = config.MODELARTS.CACHE_INPUT
if config.MODELARTS_IS_MODEL_ARTS:
ckpt_name = config.MODELARTS_CACHE_INPUT
else:
ckpt_name = config.DATASET.ROOT
ckpt_name = ckpt_name + config.TEST.MODEL_FILE
ckpt_name = ''
ckpt_name = ckpt_name + config.TEST_MODEL_FILE
if args.checkpoint_path is not None:
param_dict = load_checkpoint(args.checkpoint_path)
print("load checkpoint from [{}].".format(args.checkpoint_path))
if config.checkpoint_path != '':
param_dict = load_checkpoint(config.checkpoint_path)
print("load checkpoint from [{}].".format(config.checkpoint_path))
else:
param_dict = load_checkpoint(ckpt_name)
print("load checkpoint from [{}].".format(ckpt_name))
@@ -125,25 +119,26 @@ def main():
valid_dataset = CreateDatasetCoco(
train_mode=False,
num_parallel_workers=config.TEST.NUM_PARALLEL_WORKERS,
num_parallel_workers=config.TEST_NUM_PARALLEL_WORKERS,
)
ckpt_name = ckpt_name.split('/')
ckpt_name = ckpt_name[len(ckpt_name) - 1]
ckpt_name = ckpt_name.split('.')[0]
if config.MODELARTS.IS_MODEL_ARTS:
output_dir = config.MODELARTS.CACHE_OUTPUT
ann_path = config.MODELARTS.CACHE_INPUT
if config.MODELARTS_IS_MODEL_ARTS:
output_dir = config.MODELARTS_CACHE_OUTPUT
ann_path = config.MODELARTS_CACHE_INPUT
else:
output_dir = config.TEST.OUTPUT_DIR
ann_path = config.DATASET.ROOT
output_dir = config.TEST_OUTPUT_DIR
ann_path = config.DATASET_ROOT
output_dir = output_dir + ckpt_name
ann_path = ann_path + config.DATASET.TEST_JSON
ann_path = os.path.join(ann_path, config.DATASET_TEST_JSON)
validate(config, valid_dataset, model, output_dir, ann_path)
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS.CACHE_OUTPUT, dst_url=args.train_url)
if config.MODELARTS_IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS_CACHE_OUTPUT,
dst_url=config.MODELARTS_TRAIN_URL)
if __name__ == '__main__':
main()
@@ -16,34 +16,23 @@
##############export checkpoint file into air, onnx, mindir models#################
python export.py
"""
import argparse
import numpy as np
import mindspore.common.dtype as ms
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.FastPose import createModel
parser = argparse.ArgumentParser(description='simple_baselines')
parser.add_argument("--device_target", type=str, choices=["Ascend", "GPU", "CPU"], default="CPU",
help="device target")
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--ckpt_url",
default="FastPose.ckpt",
help="Checkpoint file path.")
parser.add_argument("--file_name", type=str,
default="simple_baselines", help="output file name.")
parser.add_argument('--file_format', type=str, choices=["MINDIR", "AIR"],
default='MINDIR', help='file format')
args = parser.parse_args()
from src.config import config
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target,
device_id=args.device_id)
device_target=config.device_target,
device_id=config.device_id)
if __name__ == '__main__':
net = createModel()
# assert cfg.checkpoint_dir is not None, "cfg.checkpoint_dir is None."
param_dict = load_checkpoint(args.ckpt_url)
param_dict = load_checkpoint(config.ckpt_url)
load_param_into_net(net, param_dict)
input_arr = Tensor(np.ones([1, 3, 256, 192]), ms.float32)
export(net, input_arr, file_name=args.file_name,
file_format=args.file_format)
export(net, input_arr, file_name=config.file_name,
file_format=config.file_format)
@@ -58,11 +58,11 @@ def get_acc(cfg, result_path, npy_path):
for i in range(num_samples):
f1 = os.path.join(result_path, str(i)+"_0.bin")
output = np.fromfile(f1, np.float32).reshape(out_shape)
if cfg.TEST.FLIP_TEST:
if cfg.TEST_FLIP_TEST:
f2 = os.path.join(result_path, "flipped"+str(i)+"_0.bin")
output_flipped = np.fromfile(f2, np.float32).reshape(out_shape)
output_flipped = flip_back(output_flipped, flip_pairs)
if cfg.TEST.SHIFT_HEATMAP:
if cfg.TEST_SHIFT_HEATMAP:
output_flipped[:, :, :, 1:] = \
output_flipped.copy()[:, :, :, 0:-1]
@@ -83,7 +83,7 @@ def get_acc(cfg, result_path, npy_path):
idx += num_images
output_dir = "result/"
ann_path = config.DATASET.ROOT + config.DATASET.TEST_JSON
ann_path = config.DATASET_ROOT + config.DATASET_TEST_JSON
_, perf_indicator = evaluate(cfg, all_preds[:idx], output_dir, all_boxes[:idx], image_id, ann_path)
print("AP:", perf_indicator)
return perf_indicator
......
opencv-python
pycocotools
easydict
PyYAML
@@ -16,8 +16,8 @@
echo "========================================================================"
echo "Please run the script as: "
echo "bash run.sh RANK_TABLE"
echo "For example: bash run_distribute.sh RANK_TABLE"
echo "bash run_distribute_train.sh RANK_TABLE"
echo "For example: bash run_distribute_train.sh RANK_TABLE"
echo "It is better to use the absolute path."
echo "========================================================================"
set -e
@@ -50,6 +50,7 @@ do
cd src
mkdir utils
cd ../../../
cp ./default_config.yaml ./distribute_train/device$i
cp ./train.py ./distribute_train/device$i
cp ./src/*.py ./distribute_train/device$i/src
cp ./src/utils/*.py ./distribute_train/device$i/src/utils
@@ -58,7 +59,7 @@ do
export RANK_ID=$i
echo "start training for device $i"
env > env$i.log
python train.py --is_model_arts False --run_distribute True > train$i.log 2>&1 &
python train.py --DEVICE_TARGET Ascend --MODELARTS_IS_MODEL_ARTS False --RUN_DISTRIBUTE True > train$i.log 2>&1 &
echo "$i finish"
cd ../
done
......
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 5 ]; then
echo "Usage:
bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [config_file] [dataset_dir] [pretrained_backbone]
"
exit 1
fi
get_real_path() {
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
BASE_PATH=$(cd ./"`dirname $0`" || exit; pwd)
CONFIG=$(get_real_path $3)
echo "CONFIG: "$CONFIG
DATASET=$(get_real_path $4)
echo "DATASET: "$DATASET
BACKBONE=$(get_real_path $5)
echo "BACKBONE: "$BACKBONE
if [ ! -f $CONFIG ]
then
echo "error: config=$CONFIG is not a file."
exit 1
fi
if [ ! -d $DATASET ]
then
echo "error: dataset_root=$DATASET is not a directory."
exit 1
fi
if [ ! -f $BACKBONE ]
then
echo "error: pretrained_backbone=$BACKBONE is not a file."
exit 1
fi
if [ -d "$BASE_PATH/../train_parallel" ];
then
rm -rf $BASE_PATH/../train_parallel
fi
mkdir $BASE_PATH/../train_parallel
cd $BASE_PATH/../train_parallel || exit
export CUDA_VISIBLE_DEVICES="$2"
export PYTHONPATH=${BASE_PATH}:$PYTHONPATH
echo "start training on multiple GPUs"
env > env.log
echo
mpirun -n $1 --allow-run-as-root --output-filename log_output --merge-stderr-to-stdout \
python -u ${BASE_PATH}/../train.py --DEVICE_TARGET GPU --RUN_DISTRIBUTE True \
--config_path $CONFIG --DATASET_ROOT $DATASET --MODEL_PRETRAINED $BACKBONE &> train.log &
@@ -13,6 +13,62 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
CKPT_PATH=$1
export DEVICE_ID=$2
python eval.py --checkpoint_path $CKPT_PATH > eval_log$2.txt 2>&1 &
if [ $# != 4 ]; then
echo "Usage:
bash run_eval.sh [DEVICE_TARGET] [CONFIG] [CKPT_PATH] [DATASET]
"
exit 1
fi
get_real_path() {
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
BASE_PATH=$(cd ./"`dirname $0`" || exit; pwd)
DEVICE_TARGET=$1
CONFIG=$(get_real_path $2)
echo "CONFIG: "$CONFIG
CKPT_PATH=$(get_real_path $3)
echo "CKPT_PATH: "$CKPT_PATH
DATASET=$(get_real_path $4)
echo "CKPT_PATH: "$DATASET
if [ ! -f $CONFIG ]
then
echo "error: config=$CONFIG is not a file."
exit 1
fi
if [ ! -d $DATASET ]
then
echo "error: dataset_root=$DATASET is not a directory."
exit 1
fi
if [ ! -f $CKPT_PATH ]
then
echo "error: CKPT_PATH=$CKPT_PATH is not a file."
exit 1
fi
if [ -d "$BASE_PATH/../eval" ];
then
rm -rf $BASE_PATH/../eval
fi
mkdir $BASE_PATH/../eval
cd $BASE_PATH/../eval || exit
export PYTHONPATH=${BASE_PATH}:$PYTHONPATH
echo "start eval"
env > env.log
echo
python $BASE_PATH/../eval.py --TEST_device_target $DEVICE_TARGET --config_path $CONFIG --checkpoint_path $CKPT_PATH --MODEL_PRETRAINED $CKPT_PATH --DATASET_ROOT $DATASET &> eval.log &
@@ -21,5 +21,5 @@ echo "It is better to use the absolute path."
echo "========================================================================"
echo "start training for device $DEVICE_ID"
export DEVICE_ID=$1
python -u ../train.py --device_id ${DEVICE_ID} > train${DEVICE_ID}.log 2>&1 &
echo "finish"
\ No newline at end of file
python -u ../train.py --DEVICE_TARGET Ascend --DEVICE_ID ${DEVICE_ID} > train${DEVICE_ID}.log 2>&1 &
echo "finish"
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 3 ]; then
echo "Usage:
bash run_standalone_train_gpu.sh [config_file] [dataset_dir] [pretrained_backbone]
"
exit 1
fi
get_real_path() {
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
BASE_PATH=$(cd ./"`dirname $0`" || exit; pwd)
CONFIG=$(get_real_path $1)
echo "CONFIG: "$CONFIG
DATASET=$(get_real_path $2)
echo "DATASET: "$DATASET
BACKBONE=$(get_real_path $3)
echo "BACKBONE: "$BACKBONE
if [ ! -f $CONFIG ]
then
echo "error: config=$CONFIG is not a file."
exit 1
fi
if [ ! -d $DATASET ]
then
echo "error: dataset_root=$DATASET is not a directory."
exit 1
fi
if [ ! -f $BACKBONE ]
then
echo "error: pretrained_backbone=$BACKBONE is not a file."
exit 1
fi
if [ -d "$BASE_PATH/../train" ];
then
rm -rf $BASE_PATH/../train
fi
mkdir $BASE_PATH/../train
cd $BASE_PATH/../train || exit
export PYTHONPATH=${BASE_PATH}:$PYTHONPATH
echo "start training on single GPU"
env > env.log
echo
python -u ${BASE_PATH}/../train.py --DEVICE_TARGET GPU --config_path $CONFIG \
--DATASET_ROOT $DATASET --MODEL_PRETRAINED $BACKBONE &> train.log &
@@ -15,17 +15,21 @@
'''
Alphapose network
'''
import os
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import load_checkpoint, load_param_into_net
from src.DUC import DUC
from src.SE_Resnet import SEResnet
from src.config import config
if config.MODELARTS.IS_MODEL_ARTS:
pretrained = config.MODELARTS.CACHE_INPUT + config.MODEL.PRETRAINED
if config.MODELARTS_IS_MODEL_ARTS:
pretrained = os.path.join(config.MODELARTS_CACHE_INPUT, config.MODEL_PRETRAINED)
else:
pretrained = config.TRAIN.CKPT_PATH + config.MODEL.PRETRAINED
pretrained = os.path.join(config.MODEL_PRETRAINED)
def createModel():
'''
@@ -49,7 +53,7 @@ class FastPose_SE(nn.Cell):
self.duc2 = DUC(256, 512, upscale_factor=2)
self.conv_out = nn.Conv2d(
self.conv_dim, config.TRAIN.nClasses, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True)
self.conv_dim, config.TRAIN_nClasses, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True)
def construct(self, x):
'''
construct
......
@@ -12,127 +12,116 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
config
'''
from easydict import EasyDict as edict
config = edict()
# general
config.GENERAL = edict()
config.GENERAL.VERSION = 'commit'
config.GENERAL.TRAIN_SEED = 1
config.GENERAL.EVAL_SEED = 1
config.GENERAL.DATASET_SEED = 1
config.GENERAL.RUN_DISTRIBUTE = True
# model arts
config.MODELARTS = edict()
config.MODELARTS.IS_MODEL_ARTS = False
config.MODELARTS.CACHE_INPUT = '/cache/data_tzh/'
config.MODELARTS.CACHE_OUTPUT = '/cache/train_out/'
# model 网络参数
config.MODEL = edict()
config.MODEL.IS_TRAINED = False # 初始是True
config.MODEL.INIT_WEIGHTS = True
config.MODEL.PRETRAINED = 'resnet50.ckpt'
config.MODEL.NUM_JOINTS = 17
config.MODEL.IMAGE_SIZE = [192, 256] # 输入图像大小
#config.MODEL.IMAGE_SIZE = [256,320]
# network
config.NETWORK = edict()
config.NETWORK.NUM_LAYERS = 50
config.NETWORK.DECONV_WITH_BIAS = False
config.NETWORK.NUM_DECONV_LAYERS = 3
config.NETWORK.NUM_DECONV_FILTERS = [256, 256, 256]
config.NETWORK.NUM_DECONV_KERNELS = [4, 4, 4]
config.NETWORK.FINAL_CONV_KERNEL = 1
config.NETWORK.REVERSE = True
config.NETWORK.TARGET_TYPE = 'gaussian'
config.NETWORK.HEATMAP_SIZE = [48, 64]
#config.NETWORK.HEATMAP_SIZE = [64, 80]
config.NETWORK.SIGMA = 2
# loss
config.LOSS = edict()
config.LOSS.USE_TARGET_WEIGHT = True
# dataset
config.DATASET = edict()
config.DATASET.TYPE = 'COCO'
config.DATASET.ROOT = '/COCO2017/'
config.DATASET.TRAIN_SET = 'images'
config.DATASET.TRAIN_JSON = 'annotations/person_keypoints_train2017.json'
config.DATASET.TEST_SET = 'images'
config.DATASET.TEST_JSON = 'annotations/person_keypoints_val2017.json'
# training data augmentation
config.DATASET.FLIP = True
config.DATASET.SCALE_FACTOR = 0.3
config.DATASET.ROT_FACTOR = 40
# train
config.TRAIN = edict()
config.TRAIN.SHUFFLE = True
config.TRAIN.BATCH_SIZE = 64
config.TRAIN.BEGIN_EPOCH = 0
config.TRAIN.END_EPOCH = 270
config.TRAIN.LR = 0.001
config.TRAIN.LR_FACTOR = 0.1
config.TRAIN.LR_STEP = [90, 120]
config.TRAIN.NUM_PARALLEL_WORKERS = 8
config.TRAIN.SAVE_CKPT = True
config.TRAIN.nClasses = 17
config.TRAIN.CKPT_PATH = "/CKPT_PATH/"
# valid
config.TEST = edict()
config.TEST.device_target = "Ascend"
config.TEST.device_id = 7
config.TEST.BATCH_SIZE = 32
config.TEST.FLIP_TEST = True
config.TEST.POST_PROCESS = True
config.TEST.SHIFT_HEATMAP = True
config.TEST.USE_GT_BBOX = False
config.TEST.NUM_PARALLEL_WORKERS = 2
config.TEST.MODEL_FILE = "FastPose.ckpt"
config.TEST.COCO_BBOX_FILE = '/COCO_BBOX_FILE/COCO_val2017_detections_AP_H_56_person.json'
config.TEST.OUTPUT_DIR = 'results/'
# demo
config.detect_image = "demo.jpg"
config.yolo_image_size = [416, 416]
config.yolo_ckpt = "yolov3.ckpt"
config.fast_pose_ckpt = "FastPose.ckpt"
# confidence under ignore_threshold means no object when training
config.yolo_threshold = 0.1
config.save_bbox_image = True
config.result_path = "demo_result/"
# nms
config.TEST.OKS_THRE = 0.9
config.TEST.IN_VIS_THRE = 0.2
config.TEST.BBOX_THRE = 1.0
config.TEST.IMAGE_THRE = 0.0
config.TEST.NMS_THRE = 1.0
# 310 infer-related
config.INFER = edict()
config.INFER.PRE_RESULT_PATH = './preprocess_Result'
config.INFER.POST_RESULT_PATH = './result_Files'
# Help description for each configuration
config.enable_modelarts = "Whether training on modelarts, default: False"
config.data_url = "Url for modelarts"
config.train_url = "Url for modelarts"
config.data_path = "The location of the input data."
config.output_path = "The location of the output file."
config.device_target = "Running platform, choose from Ascend, GPU or CPU, and default is Ascend."
config.enable_profiling = 'Whether enable profiling while training, default: False'
# Parameters that can be modified at the terminal
config.ckpt_save_dir = "ckpt path to save"
config.batch_size = "training batch size"
config.run_distribute = "Run distribute, default is false."
"""Parse arguments"""
import os
import ast
import argparse
from pprint import pformat
import yaml
class Config:
"""
Configuration namespace. Convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser: Parent parser.
cfg: Base configuration.
helper: Helper description.
cfg_path: Path to the default yaml config.
"""
parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
parents=[parser])
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
Args:
yaml_path: Path to the yaml config.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = [x for x in cfgs]
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
print(cfg_helper)
except:
raise ValueError("Failed to parse yaml")
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args: Command line arguments.
cfg: Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
parser = argparse.ArgumentParser(description="default name", add_help=False)
current_dir = os.path.dirname(os.path.abspath(__file__))
parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../default_config.yaml"),
help="Config file path")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
default = Config(merge(args, default))
return default
config = get_config()
@@ -30,7 +30,7 @@ import mindspore.dataset.vision as C
from src.utils.transforms import fliplr_joints, get_affine_transform, affine_transform
from src.config import config
ds.config.set_seed(config.GENERAL.DATASET_SEED) # Set Random Seed
ds.config.set_seed(config.DATASET_SEED) # Set Random Seed
flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8],
[9, 10], [11, 12], [13, 14], [15, 16]]
@@ -39,17 +39,17 @@ class CocoDatasetGenerator:
About the specific operations of coco2017 data set processing
'''
def __init__(self, cfg, is_train=False):
self.image_thre = cfg.TEST.IMAGE_THRE
self.image_size = np.array(cfg.MODEL.IMAGE_SIZE, dtype=np.int32)
self.image_width = cfg.MODEL.IMAGE_SIZE[0]
self.image_height = cfg.MODEL.IMAGE_SIZE[1]
self.image_thre = cfg.TEST_IMAGE_THRE
self.image_size = np.array(cfg.MODEL_IMAGE_SIZE, dtype=np.int32)
self.image_width = cfg.MODEL_IMAGE_SIZE[0]
self.image_height = cfg.MODEL_IMAGE_SIZE[1]
self.aspect_ratio = self.image_width * 1.0 / self.image_height
self.heatmap_size = np.array(cfg.NETWORK.HEATMAP_SIZE, dtype=np.int32)
self.sigma = cfg.NETWORK.SIGMA
self.target_type = cfg.NETWORK.TARGET_TYPE
self.scale_factor = cfg.DATASET.SCALE_FACTOR
self.rotation_factor = cfg.DATASET.ROT_FACTOR
self.flip = cfg.DATASET.FLIP
self.heatmap_size = np.array(cfg.NETWORK_HEATMAP_SIZE, dtype=np.int32)
self.sigma = cfg.NETWORK_SIGMA
self.target_type = cfg.NETWORK_TARGET_TYPE
self.scale_factor = cfg.DATASET_SCALE_FACTOR
self.rotation_factor = cfg.DATASET_ROT_FACTOR
self.flip = cfg.DATASET_FLIP
self.db = []
self.is_train = is_train
self.flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8],
@@ -310,34 +310,34 @@ def CreateDatasetCoco(rank=0,
'''
CreateDatasetCoco
'''
per_batch_size = config.TRAIN.BATCH_SIZE if train_mode else config.TEST.BATCH_SIZE
per_batch_size = config.TRAIN_BATCH_SIZE if train_mode else config.TEST_BATCH_SIZE
image_path = ''
ann_file = ''
bbox_file = ''
if config.MODELARTS.IS_MODEL_ARTS:
image_path = config.MODELARTS.CACHE_INPUT
ann_file = config.MODELARTS.CACHE_INPUT
bbox_file = config.MODELARTS.CACHE_INPUT
if config.MODELARTS_IS_MODEL_ARTS:
image_path = config.MODELARTS_CACHE_INPUT
ann_file = config.MODELARTS_CACHE_INPUT
bbox_file = config.MODELARTS_CACHE_INPUT
else:
image_path = config.DATASET.ROOT
ann_file = config.DATASET.ROOT
bbox_file = config.DATASET.ROOT
image_path = config.DATASET_ROOT
ann_file = config.DATASET_ROOT
bbox_file = config.DATASET_ROOT
if train_mode:
image_path = image_path + config.DATASET.TRAIN_SET
ann_file = ann_file + config.DATASET.TRAIN_JSON
image_path = os.path.join(image_path, config.DATASET_TRAIN_SET)
ann_file = os.path.join(ann_file, config.DATASET_TRAIN_JSON)
else:
image_path = image_path + config.DATASET.TEST_SET
ann_file = ann_file + config.DATASET.TEST_JSON
bbox_file = bbox_file + config.TEST.COCO_BBOX_FILE
image_path = os.path.join(image_path, config.DATASET_TEST_SET)
ann_file = os.path.join(ann_file, config.DATASET_TEST_JSON)
bbox_file = os.path.join(bbox_file, config.TEST_COCO_BBOX_FILE)
print('loading dataset from {}'.format(image_path))
shuffle = shuffle if shuffle is not None else train_mode
dataset_generator = CocoDatasetGenerator(config, is_train=train_mode)
if not train_mode and config.TEST.USE_GT_BBOX:
if not train_mode and config.TEST_USE_GT_BBOX:
print('loading bbox file from {}'.format(bbox_file))
dataset_generator.load_detect_dataset(image_path, ann_file, bbox_file)
else:
......
@@ -103,9 +103,9 @@ def evaluate(cfg, preds, output_dir, all_boxes, img_id, ann_path):
})
# rescoring and oks nms
num_joints = cfg.MODEL.NUM_JOINTS
in_vis_thre = cfg.TEST.IN_VIS_THRE
oks_thre = cfg.TEST.OKS_THRE
num_joints = cfg.MODEL_NUM_JOINTS
in_vis_thre = cfg.TEST_IN_VIS_THRE
oks_thre = cfg.TEST_OKS_THRE
oks_nmsed_kpts = {}
for img, items in img_kpts_dict.items():
for item in items:
@@ -126,10 +126,10 @@ def evaluate(cfg, preds, output_dir, all_boxes, img_id, ann_path):
oks_nmsed_kpts[img] = [items[kep] for kep in keep]
# evaluate and save
image_set = cfg.DATASET.TEST_SET
image_set = cfg.DATASET_TEST_SET
_write_coco_keypoint_results(oks_nmsed_kpts, num_joints, res_file)
if 'test' not in image_set and has_coco:
ann_path = ann_path if ann_path else os.path.join(cfg.DATASET.ROOT, 'annotations',
ann_path = ann_path if ann_path else os.path.join(cfg.DATASET_ROOT, 'annotations',
'person_keypoints_' + image_set + '.json')
info_str = _do_python_keypoint_eval(res_file, output_dir, ann_path)
name_value = OrderedDict(info_str)
......
@@ -18,24 +18,21 @@ train
from __future__ import division
import os
import ast
import argparse
import numpy as np
from mindspore import context, Tensor
from mindspore.context import ParallelMode
from mindspore.communication.management import init
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import TimeMonitor, LossMonitor, ModelCheckpoint, CheckpointConfig
from mindspore.train.callback import TimeMonitor, LossMonitor, ModelCheckpoint,\
CheckpointConfig, SummaryCollector
from mindspore.nn.optim import Adam
from mindspore.common import set_seed
from src.dataset import CreateDatasetCoco
from src.config import config
from src.network_with_loss import JointsMSELoss, PoseResNetWithLoss
from src.FastPose import createModel
if config.MODELARTS.IS_MODEL_ARTS:
import moxing as mox
set_seed(config.GENERAL.TRAIN_SEED)
def get_lr(begin_epoch,
@@ -61,104 +58,102 @@ def get_lr(begin_epoch,
return learning_rate
def parse_args():
'''
parse_args
'''
parser = argparse.ArgumentParser(description="Simpleposenet training")
parser.add_argument('--data_url', required=False,
default=None, help='Location of data.')
parser.add_argument('--train_url', required=False,
default=None, help='Location of training outputs.')
parser.add_argument('--device_id', required=False, default=0,
type=int, help='Location of training outputs.')
parser.add_argument('--run_distribute', type=ast.literal_eval,
default=False, help='Location of training outputs.')
parser.add_argument('--is_model_arts', type=ast.literal_eval,
default=False, help='Location of training outputs.')
args = parser.parse_args()
return args
def main():
print("loading parse...")
args = parse_args()
device_id = args.device_id
config.GENERAL.RUN_DISTRIBUTE = args.run_distribute
config.MODELARTS.IS_MODEL_ARTS = args.is_model_arts
if config.GENERAL.RUN_DISTRIBUTE or config.MODELARTS.IS_MODEL_ARTS:
device_id = int(os.getenv('DEVICE_ID'))
context.set_context(mode=context.GRAPH_MODE,
device_target="Ascend",
save_graphs=False,
device_id=device_id)
if config.GENERAL.RUN_DISTRIBUTE:
init()
rank = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
device_target=config.DEVICE_TARGET,
save_graphs=False)
if config.DEVICE_TARGET == "Ascend":
device_id = int(os.getenv('DEVICE_ID', '0'))
context.set_context(device_id=device_id)
if config.RUN_DISTRIBUTE:
if config.DEVICE_TARGET == 'Ascend':
init()
rank = get_rank()
device_num = get_group_size()
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True,
parameter_broadcast=True)
elif config.DEVICE_TARGET == 'GPU':
init("nccl")
rank = get_rank()
device_num = get_group_size()
context.reset_auto_parallel_context()
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
else:
raise NotImplementedError("Only GPU and Ascend training supported")
else:
rank = 0
device_num = 1
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=args.data_url,
dst_url=config.MODELARTS.CACHE_INPUT)
if config.MODELARTS_IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS_DATA_URL,
dst_url=config.MODELARTS_CACHE_INPUT)
print(f"Running on {config.DEVICE_TARGET}, device num: {device_num}, rank: {rank}")
dataset = CreateDatasetCoco(rank=rank,
group_size=device_num,
train_mode=True,
num_parallel_workers=config.TRAIN.NUM_PARALLEL_WORKERS,
num_parallel_workers=config.TRAIN_NUM_PARALLEL_WORKERS,
)
m = createModel()
loss = JointsMSELoss(config.LOSS.USE_TARGET_WEIGHT)
loss = JointsMSELoss(config.LOSS_USE_TARGET_WEIGHT)
net_with_loss = PoseResNetWithLoss(m, loss)
dataset_size = dataset.get_dataset_size()
lr = Tensor(get_lr(config.TRAIN.BEGIN_EPOCH,
config.TRAIN.END_EPOCH,
print(f"Dataset size = {dataset_size}")
lr = Tensor(get_lr(config.TRAIN_BEGIN_EPOCH,
config.TRAIN_END_EPOCH,
dataset_size,
lr_init=config.TRAIN.LR,
factor=config.TRAIN.LR_FACTOR,
epoch_number_to_drop=config.TRAIN.LR_STEP))
lr_init=config.TRAIN_LR,
factor=config.TRAIN_LR_FACTOR,
epoch_number_to_drop=config.TRAIN_LR_STEP))
optim = Adam(m.trainable_params(), learning_rate=lr)
time_cb = TimeMonitor(data_size=dataset_size)
loss_cb = LossMonitor()
cb = [time_cb, loss_cb]
if config.TRAIN.SAVE_CKPT:
summary_cb = SummaryCollector(os.path.join(config.SUMMARY_DIR, f'rank_{rank}'))
cb = [time_cb, loss_cb, summary_cb]
if config.TRAIN_SAVE_CKPT:
config_ck = CheckpointConfig(
save_checkpoint_steps=dataset_size, keep_checkpoint_max=2)
prefix = ''
if config.GENERAL.RUN_DISTRIBUTE:
if config.RUN_DISTRIBUTE:
prefix = 'multi_' + 'train_fastpose_' + \
config.GENERAL.VERSION + '_' + os.getenv('DEVICE_ID')
config.VERSION + '_' + str(rank)
else:
prefix = 'single_' + 'train_fastpose_' + config.GENERAL.VERSION
prefix = 'single_' + 'train_fastpose_' + config.VERSION
directory = ''
if config.MODELARTS.IS_MODEL_ARTS:
directory = config.MODELARTS.CACHE_OUTPUT + \
'device_' + os.getenv('DEVICE_ID')
elif config.GENERAL.RUN_DISTRIBUTE:
directory = config.TRAIN.CKPT_PATH + \
'device_' + os.getenv('DEVICE_ID')
if config.MODELARTS_IS_MODEL_ARTS:
directory = config.MODELARTS_CACHE_OUTPUT + \
'device_' + str(rank)
elif config.RUN_DISTRIBUTE:
directory = config.TRAIN_CKPT_PATH + \
'device_' + str(rank)
else:
directory = config.TRAIN.CKPT_PATH + 'device'
directory = config.TRAIN_CKPT_PATH + 'device'
ckpoint_cb = ModelCheckpoint(
prefix=prefix, directory=directory, config=config_ck)
cb.append(ckpoint_cb)
model = Model(net_with_loss, optimizer=optim, amp_level="O2")
epoch_size = config.TRAIN.END_EPOCH - config.TRAIN.BEGIN_EPOCH
epoch_size = config.TRAIN_END_EPOCH - config.TRAIN_BEGIN_EPOCH
print("************ Start training now ************")
print('start training, epoch size = %d' % epoch_size)
model.train(epoch_size, dataset, callbacks=cb)
print(f'start training, {epoch_size} epochs, {dataset_size} steps per epoch')
model.train(epoch_size, dataset, callbacks=cb, dataset_sink_mode=True)
if config.MODELARTS.IS_MODEL_ARTS:
if config.MODELARTS_IS_MODEL_ARTS:
mox.file.copy_parallel(
src_url=config.MODELARTS.CACHE_OUTPUT, dst_url=args.train_url)
src_url=config.MODELARTS_CACHE_OUTPUT, dst_url=config.MODELARTS_TRAIN_URL)
if __name__ == '__main__':
if config.MODELARTS_IS_MODEL_ARTS:
import moxing as mox
set_seed(config.TRAIN_SEED)
main()