diff --git a/research/cv/JDE/DATASET_ZOO.md b/research/cv/JDE/DATASET_ZOO.md
new file mode 100644
index 0000000000000000000000000000000000000000..9ea8c96fa044d8be5af788b1130aa58bacadd7f9
--- /dev/null
+++ b/research/cv/JDE/DATASET_ZOO.md
@@ -0,0 +1,304 @@
+# Dataset Zoo
+
+Dataset preparation follows [Towards-Realtime-MOT](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md).
+
+## Data Format
+
+The root folder of the datasets has the following structure:
+
+```text
+.
+└─datasets
+  ├─Caltech
+  ├─Cityscapes
+  ├─CUHKSYSU
+  ├─ETHZ
+  ├─MOT16
+  ├─MOT17
+  └─PRW
+```
+
+Every image has a corresponding annotation text file. Given an image path,
+the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes a bounding box and has the following format:
+
+```text
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+
+The field `[class]` should be `0`. Only single-class multi-object tracking is supported.
+
+The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` if the box has no identity annotation.
+
+Note that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
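+
+For illustration, the image-to-annotation path mapping and the label format above can be exercised with a few lines of plain Python. This is only a sketch; the example image path is hypothetical.
+
+```python
+from pathlib import Path
+
+
+def label_path_for(image_path: str) -> str:
+    """Map an image path to its annotation path (images -> labels_with_ids, .jpg -> .txt)."""
+    return image_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')
+
+
+def read_labels(image_path: str):
+    """Parse an annotation file into (class, identity, x_center, y_center, width, height) tuples."""
+    label_file = Path(label_path_for(image_path))
+    if not label_file.exists():
+        return []
+    boxes = []
+    for line in label_file.read_text().splitlines():
+        cls, identity, x_c, y_c, w, h = line.split()
+        boxes.append((int(cls), int(identity), float(x_c), float(y_c), float(w), float(h)))
+    return boxes
+
+
+# Hypothetical image path inside the dataset root shown above
+print(read_labels('datasets/MOT17/images/train/MOT17-02-SDP/img1/000001.jpg'))
+```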
+
+## Download
+
+### Caltech Pedestrian
+
+Download all `set**.tar` archives from [this page](https://drive.google.com/drive/folders/1IBlcJP8YsCaT81LwQ2YwQJac8bf1q8xF?usp=sharing) and extract them to `Caltech/data`.
+
+Download the [annotations](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing) and unzip them to `Caltech/data/labels_with_ids`.
+
+Download [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to images.
+Move the tool's `scripts` folder into the `Caltech` folder and run:
+
+```bash
+python scripts/convert_seqs.py
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─Caltech
+  └─data
+    ├─images
+    │ └─***
+    └─labels_with_ids
+      └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### CityPersons
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
+[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
+[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
+[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
+
+Download the `.zip` archives from the links above and use the following commands:
+
+```bash
+zip --FF Citypersons --out c.zip
+unzip c.zip
+mv Citypersons Cityscapes
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─Cityscapes
+  ├─images
+  │ ├─train
+  │ └─val
+  └─labels_with_ids
+    ├─train
+    └─val
+```
+
+### CUHK-SYSU
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
+
+Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
+
+Download the dataset, unzip it and use the command below.
+
+```bash
+mv CUHK-SYSU CUHKSYSU
+```
+
+The structure of the dataset will be the following:
+
+```text
+.
+└─CUHKSYSU
+  ├─images
+  │ └─***
+  └─labels_with_ids
+    └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### PRW
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─PRW
+  ├─images
+  │ └─***
+  └─labels_with_ids
+    └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### ETHZ (overlapping with MOT-16 removed)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
+
+Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─ETHZ
+  ├─eth01
+  │ ├─images
+  │ │ └─***
+  │ └─labels_with_ids
+  │   └─***
+  ├─eth02
+  ├─eth03
+  ├─eth05
+  └─eth07
+```
+
+Note: `***` stands for the data itself (images or annotations). Every `eth*` folder has the same structure as `eth01`.
+
+### MOT-17
+
+Official link:
+[[0]](https://motchallenge.net/data/MOT17.zip)
+
+Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
+
+After downloading, unzip the archive and use the `prepare_mot17.py` script from the `data` folder:
+
+```bash
+python data/prepare_mot17.py --seq_root /path/to/MOT17/train
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─MOT17
+  ├─images
+  │ └─train
+  └─labels_with_ids
+    └─train
+```
+
+### MOT-16 (for evaluation)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
+
+Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
+
+Download link: [MOT-16.zip](https://motchallenge.net/data/MOT16.zip)
+
+> See the "Download" section at the bottom of the web page, link "Get all data".
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─MOT16
+  └─train
+```
+
+## Data Config
+
+Download the [schema files](https://github.com/Zhongdao/Towards-Realtime-MOT/tree/master/data) of the training data (lists of relative paths for every image, divided into train/val parts) and move them into the `data` folder:
+
+```text
+.
+└── data
+    ├── caltech.10k.val
+    ├── caltech.train
+    ├── caltech.val
+    ├── citypersons.train
+    ├── citypersons.val
+    ├── cuhksysu.train
+    ├── cuhksysu.val
+    ├── eth.train
+    ├── mot17.train
+    ├── prw.train
+    └── prw.val
+```
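+
+Before training, it can be useful to check that the schema files and the extracted datasets agree. The sketch below assumes each schema file is a plain-text list of image paths relative to the dataset root (one path per line); the dataset root path and the chosen schema names are examples.
+
+```python
+from pathlib import Path
+
+DATASET_ROOT = Path('/path/to/datasets/root/folder')  # same value as --dataset_root
+
+
+def check_schema(schema_file: str) -> None:
+    """Count listed images that are missing or have no corresponding label file."""
+    missing_images = missing_labels = 0
+    for rel_path in Path(schema_file).read_text().split():
+        image = DATASET_ROOT / rel_path
+        label = Path(str(image).replace('images', 'labels_with_ids')).with_suffix('.txt')
+        missing_images += not image.exists()
+        missing_labels += not label.exists()
+    print(f'{schema_file}: {missing_images} missing images, {missing_labels} missing labels')
+
+
+for schema in ('data/caltech.train', 'data/citypersons.train', 'data/mot17.train'):
+    check_schema(schema)
+```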
+
+## Citation
+
+Caltech:
+
+```text
+@inproceedings{dollarCVPR09peds,
+  author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
+  title = "Pedestrian Detection: A Benchmark",
+  booktitle = "CVPR",
+  month = "June",
+  year = "2009",
+  city = "Miami",
+}
+```
+
+Citypersons:
+
+```text
+@INPROCEEDINGS{Shanshan2017CVPR,
+  author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
+  title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
+  booktitle = {CVPR},
+  year = {2017}
+}
+
+@INPROCEEDINGS{Cordts2016Cityscapes,
+  title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
+  author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
+  booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2016}
+}
+```
+
+CUHK-SYSU:
+
+```text
+@inproceedings{xiaoli2017joint,
+  title={Joint Detection and Identification Feature Learning for Person Search},
+  author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
+  booktitle={CVPR},
+  year={2017}
+}
+```
+
+PRW:
+
+```text
+@inproceedings{zheng2017person,
+  title={Person re-identification in the wild},
+  author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={1367--1376},
+  year={2017}
+}
+```
+
+ETHZ:
+
+```text
+@InProceedings{eth_biwi_00534,
+  author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
+  title = {A Mobile Vision System for Robust Multi-Person Tracking},
+  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
+  year = {2008},
+  month = {June},
+  publisher = {IEEE Press},
+}
+```
+
+MOT-16&17:
+
+```text
+@article{milan2016mot16,
+  title={MOT16: A benchmark for multi-object tracking},
+  author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
+  journal={arXiv preprint arXiv:1603.00831},
+  year={2016}
+}
+```
diff --git a/research/cv/JDE/README.md b/research/cv/JDE/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..35e9734c7f9dec167d6526796ad60fce13961f47
--- /dev/null
+++ b/research/cv/JDE/README.md
@@ -0,0 +1,392 @@
+# Contents
+
+- [Contents](#contents)
+    - [JDE Description](#jde-description)
+    - [Model Architecture](#model-architecture)
+    - [Dataset](#dataset)
+    - [Environment Requirements](#environment-requirements)
+    - [Quick Start](#quick-start)
+    - [Script Description](#script-description)
+        - [Script and Sample Code](#script-and-sample-code)
+        - [Script Parameters](#script-parameters)
+        - [Training Process](#training-process)
+            - [Standalone Training](#standalone-training)
+            - [Distribute Training](#distribute-training)
+        - [Evaluation Process](#evaluation-process)
+            - [Evaluation](#evaluation)
+        - [Inference Process](#inference-process)
+            - [Usage](#usage)
+            - [Result](#result)
+    - [Model Description](#model-description)
+        - [Performance](#performance)
+            - [Training Performance](#training-performance)
+            - [Evaluation Performance](#evaluation-performance)
+    - [ModelZoo Homepage](#modelzoo-homepage)
+
+## [JDE Description](#contents)
+
+The paper that introduces the JDE model is dedicated to improving the efficiency of an MOT system.
+It presents an early attempt to jointly learn the Detector and Embedding model (JDE) in a single-shot deep network.
+In other words, the proposed JDE employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes.
+In comparison, SDE methods and two-stage methods are characterized by re-sampled pixels (bounding boxes) and feature maps, respectively.
+Both the bounding boxes and feature maps are fed into a separate re-ID model for appearance feature extraction.
+The method is near real-time while being almost as accurate as the SDE methods.
+
+[Paper](https://arxiv.org/pdf/1909.12605.pdf): Towards Real-Time Multi-Object Tracking. Department of Electronic Engineering, Tsinghua University
+
+## [Model Architecture](#contents)
+
+The architecture of JDE is a Feature Pyramid Network (FPN).
+FPN makes predictions from multiple scales, thus bringing improvement in pedestrian detection, where the scale of targets varies a lot.
+An input video frame first undergoes a forward pass through a backbone network to obtain feature maps at three scales, namely, scales with 1/32, 1/16 and 1/8 down-sampling rate, respectively.
+Then, the feature map with the smallest size (also the semantically strongest features) is up-sampled and fused with the feature map from the second smallest scale by a skip connection, and the same goes for the other scales.
+Finally, prediction heads are added upon the fused feature maps at all three scales.
+A prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A + D) × H × W, where A is the number of anchor templates assigned to this scale, and D is the dimension of the embedding.
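+
+As a quick sanity check of this output size, the number of prediction channels per head can be derived from the values in `default_config.yaml` (12 anchor scales shared over the 3 feature maps, `embedding_dim = 512`). The split of the 6A detection channels below follows the JDE paper; the numbers are illustrative arithmetic, not output of the training code.
+
+```python
+# Channel arithmetic for one (6A + D) x H x W prediction head.
+num_anchor_boxes = 12      # anchor_scales in default_config.yaml: 12 boxes in total
+num_feature_maps = 3       # 1/32, 1/16 and 1/8 scales
+embedding_dim = 512        # D (embedding_dim in default_config.yaml)
+
+anchors_per_scale = num_anchor_boxes // num_feature_maps   # A = 4
+box_regression = 4 * anchors_per_scale                     # 4A regression coefficients
+box_classification = 2 * anchors_per_scale                 # 2A classification scores
+detection_channels = box_regression + box_classification   # 6A = 24, matches out_channel
+head_channels = detection_channels + embedding_dim         # 6A + D = 536
+
+print(anchors_per_scale, detection_channels, head_channels)  # 4 24 536
+```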
+
+## [Dataset](#contents)
+
+A large-scale training set is used, built by putting together six publicly available datasets on pedestrian detection, MOT and person search.
+
+These datasets can be categorized into two types: ones that only contain bounding box annotations, and ones that have both bounding box and identity annotations.
+The first category includes the ETH dataset and the CityPersons (CP) dataset. The second category includes the CalTech (CT) dataset, MOT16 (M16) dataset, CUHK-SYSU (CS) dataset and PRW dataset.
+Training subsets of all these datasets are gathered to form the joint training set, and videos in the ETH dataset that overlap with the MOT-16 test set are excluded for fair evaluation.
+
+Dataset preparation is described in [DATASET_ZOO.md](DATASET_ZOO.md).
+
+Datasets size: 134G, 1 object category (pedestrian).
+
+Note: `--dataset_root` is used as an entry point for all datasets used for training and evaluating this model.
+
+Organize your dataset structure as follows:
+
+```text
+.
+└─dataset_root/
+  ├─Caltech/
+  ├─Cityscapes/
+  ├─CUHKSYSU/
+  ├─ETHZ/
+  ├─MOT16/
+  ├─MOT17/
+  └─PRW/
+```
+
+Information about the train part of the dataset:
+
+| Dataset | ETH | CP  | CT   | M16  | CS  | PRW  | Total |
+| :------:|:---:|:---:|:----:|:----:|:---:|:----:|:-----:|
+| # img   | 2K  | 3K  | 27K  | 53K  | 11K | 6K   | 54K   |
+| # box   | 17K | 21K | 46K  | 112K | 55K | 18K  | 270K  |
+| # ID    | -   | -   | 0.6K | 0.5K | 7K  | 0.5K | 8.7K  |
+
+## [Environment Requirements](#contents)
+
+- Hardware (GPU)
+    - Prepare hardware environment with GPU processor.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
+
+## [Quick Start](#contents)
+
+After installing MindSpore through the official website, you can follow the steps below for training and evaluation.
+In particular, before training you need to install the requirements with `pip install -r requirements.txt`.
+
+> If an error occurs, update pip with `pip install --upgrade pip` and try again.
+> If that does not help, install the packages manually with `pip install {package from requirements.txt}`.
+
+Note: PyTorch is used only for checkpoint conversion.
+
+All trainings start from a pre-trained backbone: [download](https://drive.google.com/file/d/1keZwVIfcWmxfTiswzOKUwkUz2xjvTvfm/view) the DarkNet53 backbone pre-trained on ImageNet and convert it with the commands below:
+
+```bash
+# From the root model directory run
+python -m src.convert_checkpoint --ckpt_url [PATH_TO_PYTORCH_CHECKPOINT]
+```
+
+- PATH_TO_PYTORCH_CHECKPOINT - Path to the downloaded darknet53 PyTorch checkpoint.
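+
+To make sure the conversion succeeded, the resulting checkpoint can be loaded back into the DarkNet53 backbone defined in `src/darknet.py`. This is only a hedged convenience sketch built from classes and config values already present in this repository (mirroring how `eval.py` builds the backbone); the checkpoint path is an assumed example.
+
+```python
+# Hedged sketch: verify that the converted checkpoint loads into the backbone.
+from mindspore import load_checkpoint, load_param_into_net
+
+from cfg.config import config
+from src.darknet import DarkNet, ResidualBlock
+
+backbone = DarkNet(
+    ResidualBlock,
+    config.backbone_layers,
+    config.backbone_input_shape,
+    config.backbone_shape,
+    detect=True,
+)
+
+params = load_checkpoint('darknet53.ckpt')  # file produced by src/convert_checkpoint.py
+not_loaded = load_param_into_net(backbone, params)
+print('Parameters not loaded:', not_loaded)
+```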
+
+After converting the checkpoint and installing the requirements, you can run the training scripts:
+
+```bash
+# Run standalone training example
+bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+
+# Run distribute training example
+bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- DEVICE_NUM - Number of devices used for distributed training.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+## [Script Description](#contents)
+
+### [Script and Sample Code](#contents)
+
+```text
+.
+└─JDE
+  ├─data
+  │ └─prepare_mot17.py              # MOT17 data preparation script
+  ├─cfg
+  │ ├─ccmcpe.json                   # paths to dataset schemas (defining relative paths structure)
+  │ └─config.py                     # parameter parser
+  ├─scripts
+  │ ├─run_distribute_train_gpu.sh   # launch distribute train on GPU
+  │ ├─run_eval_gpu.sh               # launch evaluation on GPU
+  │ └─run_standalone_train_gpu.sh   # launch standalone train on GPU
+  ├─src
+  │ ├─__init__.py
+  │ ├─convert_checkpoint.py         # backbone checkpoint converter (torch to mindspore)
+  │ ├─darknet.py                    # backbone of the network
+  │ ├─dataset.py                    # create dataset
+  │ ├─evaluation.py                 # motmetrics evaluator
+  │ ├─io.py                         # MOT evaluation utils
+  │ ├─kalman_filter.py              # kalman filter script
+  │ ├─log.py                        # logger script
+  │ ├─model.py                      # create model script
+  │ ├─timer.py                      # timer script
+  │ ├─utils.py                      # utilities used in other scripts
+  │ └─visualization.py              # visualization for inference
+  ├─tracker
+  │ ├─__init__.py
+  │ ├─basetrack.py                  # base class for tracking
+  │ ├─matching.py                   # matching for tracking script
+  │ └─multitracker.py               # tracker init script
+  ├─DATASET_ZOO.md                  # dataset preparation description
+  ├─README.md
+  ├─default_config.yaml             # default configs
+  ├─eval.py                         # evaluation script
+  ├─eval_detect.py                  # detector evaluation script
+  ├─export.py                       # export to MINDIR script
+  ├─infer.py                        # inference script
+  ├─requirements.txt
+  └─train.py                        # training script
+```
+
+### [Script Parameters](#contents)
+
+```text
+Parameters in config.py and default_config.yaml.
+Includes arguments for training/evaluation/inference.
+
+--config_path             Path to default_config.yaml with hyperparameters and defaults
+--data_cfg_url            Path to .json with paths to dataset schemas
+--momentum                Momentum for SGD optimizer
+--decay                   Weight decay for SGD optimizer
+--lr                      Initial learning rate
+--epochs                  Number of epochs to train
+--batch_size              Batch size per device
+--num_classes             Number of object classes
+--k_max                   Max predictions per one map (made for optimization of FC layer embedding computation)
+--img_size                Size of input images
+--track_buffer            Tracking buffer
+--keep_checkpoint_max     Keep saved last N checkpoints
+--backbone_input_shape    Input filters of backbone layers
+--backbone_shape          Input filters of backbone layers
+--backbone_layers         Output filters of backbone layers
+--out_channel             Number of channels for detection
+--embedding_dim           Number of channels for embeddings
+--iou_thres               IOU thresholds
+--conf_thres              Confidence threshold
+--nms_thres               Threshold for Non-max suppression
+--min_box_area            Filter out tiny boxes
+--anchor_scales           12 predefined anchor boxes, 4 different ones per each of the 3 feature maps
+--col_names_train         Names of columns for training GeneratorDataset
+--col_names_val           Names of columns for validation GeneratorDataset
+--is_distributed          Distribute training or not
+--dataset_root            Path to datasets root folder
+--device_target           Device GPU or any
+--device_id               Device id of target device
+--device_start            Start device id
+--ckpt_url                Location of checkpoint
+--logs_dir                Dir to save logs and ckpt
+--input_video             Path to the input video
+--output_format           Expected output format
+--output_root             Expected output root path
+--save_images             Save tracking results (image)
+--save_videos             Save tracking results (video)
+```
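+
+All of these parameters are merged by `cfg/config.py`: the defaults come from `default_config.yaml`, and any value can be overridden on the command line of `train.py`, `eval.py` and the other entry points. A minimal, hedged sketch of inspecting the merged configuration:
+
+```python
+# Print a few merged configuration values (defaults from default_config.yaml
+# plus any command-line overrides parsed by cfg/config.py).
+from cfg.config import config
+
+print(config.lr, config.epochs, config.batch_size)
+print(len(config.anchor_scales), 'anchor boxes,', config.embedding_dim, 'embedding channels')
+```
+
+The same mechanism is used by the shell scripts, e.g. the standalone training script passes `--lr=0.00125` to `train.py`.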
+
+### [Training Process](#contents)
+
+#### Standalone Training
+
+Note: all trainings require the pre-trained DarkNet53 backbone.
+
+```bash
+bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The above command will run in the background; you can view the results through the generated standalone_train.log file.
+After training, you can find the training loss and time logs in the chosen LOGS_CKPT_DIR.
+
+The model checkpoints will be saved in the LOGS_CKPT_DIR directory.
+
+#### Distribute Training
+
+```bash
+bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_NUM - Number of devices.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The above shell script will run the distributed training in the background.
+Here is an example of the training logs:
+
+```text
+epoch: 30 step: 1612, loss is -4.7679796
+epoch: 30 step: 1612, loss is -5.816874
+epoch: 30 step: 1612, loss is -5.302864
+epoch: 30 step: 1612, loss is -5.775913
+epoch: 30 step: 1612, loss is -4.9537477
+epoch: 30 step: 1612, loss is -4.3535285
+epoch: 30 step: 1612, loss is -5.0773625
+epoch: 30 step: 1612, loss is -4.2019467
+epoch time: 2023042.925 ms, per step time: 1209.954 ms
+epoch time: 2023069.500 ms, per step time: 1209.970 ms
+epoch time: 2023097.331 ms, per step time: 1209.986 ms
+epoch time: 2023038.221 ms, per step time: 1209.951 ms
+epoch time: 2023098.113 ms, per step time: 1209.987 ms
+epoch time: 2023093.300 ms, per step time: 1209.984 ms
+epoch time: 2023078.631 ms, per step time: 1209.975 ms
+epoch time: 2017509.966 ms, per step time: 1206.645 ms
+train success
+train success
+train success
+train success
+train success
+train success
+train success
+train success
+```
+
+### [Evaluation Process](#contents)
+
+#### Evaluation
+
+The tracking ability of the model is tested on the train part of the MOT16 dataset (not used during training).
+
+To start the tracker evaluation, run the command below.
+
+```bash
+bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+> Note: the script expects the DATASET_ROOT directory to contain the MOT16 sub-folder.
+
+The above command will run in the background. The validation logs will be saved in "eval.log".
+
+For more details about `motmetrics`, you can refer to the [MOT benchmark](https://motchallenge.net/).
+
+```text
+DATE-DATE-DATE TIME:TIME:TIME [INFO]: Time elapsed: 240.54 seconds, FPS: 22.04
+          IDF1   IDP   IDR  Rcll  Prcn  GT  MT  PT ML   FP    FN  IDs   FM  MOTA  MOTP IDt IDa IDm
+MOT16-02 45.1% 49.9% 41.2% 71.0% 86.0%  54  17  31  6 2068  5172  425  619 57.0% 0.215 239  68  14
+MOT16-04 69.5% 75.5% 64.3% 80.6% 94.5%  83  45  24 14 2218  9234  175  383 75.6% 0.184  98  28   3
+MOT16-05 63.6% 68.1% 59.7% 82.0% 93.7% 125  67  49  9  376  1226  137  210 74.5% 0.203 113  40  40
+MOT16-09 55.2% 60.4% 50.8% 78.1% 92.9%  25  16   8  1  316  1152  108  147 70.0% 0.187  76  15  11
+MOT16-10 57.1% 59.9% 54.5% 80.1% 88.1%  54  28  26  0 1337  2446  376  569 66.2% 0.228 202  66  16
+MOT16-11 75.0% 76.4% 73.7% 89.6% 92.9%  69  50  16  3  626   953   78  137 81.9% 0.159  49  24  12
+MOT16-13 64.8% 69.9% 60.3% 78.5% 90.9% 107  58  43  6  900  2463  272  528 68.3% 0.223 200  59  48
+OVERALL  63.2% 68.1% 58.9% 79.5% 91.8% 517 281 197 39 7841 22646 1571 2593 71.0% 0.196 977 300 144
+```
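+
+In addition to the summary, the evaluation writes one result file per sequence in the MOT format produced by `write_results` in `eval.py` (`frame,id,x1,y1,w,h,1,-1,-1,-1`). The helper below is a small hedged sketch for loading such a file; the path is an example.
+
+```python
+import csv
+from collections import defaultdict
+
+
+def load_mot_results(path):
+    """Read a JDE results file into {frame_id: [(track_id, x1, y1, w, h), ...]}."""
+    tracks_per_frame = defaultdict(list)
+    with open(path, newline='') as f:
+        for row in csv.reader(f):
+            frame, track_id = int(row[0]), int(row[1])
+            x1, y1, w, h = map(float, row[2:6])
+            tracks_per_frame[frame].append((track_id, x1, y1, w, h))
+    return tracks_per_frame
+
+
+results = load_mot_results('results/exp_name/MOT16-02.txt')  # example path
+print(len(results), 'frames,', sum(len(v) for v in results.values()), 'boxes')
+```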
+
+To evaluate the detection ability of the model (mAP, Precision and Recall metrics), run the command below.
+
+```bash
+python eval_detect.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --dataset_root [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The evaluation results will be printed to the command line:
+
+```text
+      Image      Total          P          R        mAP
+       4000      30353      0.829      0.778      0.765     0.426s
+       8000      30353      0.863      0.798      0.788      0.42s
+      12000      30353      0.854      0.815      0.802     0.419s
+      16000      30353      0.857      0.821      0.809     0.582s
+      20000      30353      0.865      0.834      0.824     0.413s
+      24000      30353      0.868      0.841      0.832     0.415s
+      28000      30353      0.874      0.839       0.83     0.419s
+mean_mAP: 0.8225, mean_R: 0.8325, mean_P: 0.8700
+```
+
+### [Inference Process](#contents)
+
+#### Usage
+
+To compile a video from frames with predicted bounding boxes, you need to install `ffmpeg` with
+`sudo apt-get install ffmpeg`. Video compiling happens automatically.
+
+```bash
+python infer.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --input_video [INPUT_VIDEO]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- INPUT_VIDEO - Path to the input video for tracking.
+
+#### Result
+
+Results of the inference will be saved into the default `./results` folder; logs will be shown at the command line.
+
+## [Model Description](#contents)
+
+### [Performance](#contents)
+
+#### Training Performance
+
+| Parameters          | GPU (8p)                                                                            |
+| ------------------- | ----------------------------------------------------------------------------------- |
+| Model               | JDE (1088*608)                                                                      |
+| Hardware            | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz                              |
+| Upload Date         | 02/02/2022 (day/month/year)                                                         |
+| MindSpore Version   | 1.5.0                                                                               |
+| Dataset             | Joint Dataset (see `DATASET_ZOO.md`)                                                |
| Training Parameters | epoch=30, batch_size=4 (per device), lr=0.01, momentum=0.9, weight_decay=0.0001     |
+| Optimizer           | SGD                                                                                 |
+| Loss Function       | SmoothL1Loss, SoftmaxCrossEntropyWithLogits (with the auto-balancing loss strategy) |
+| Outputs             | Tensor of bbox coords, conf, class, emb                                             |
+| Speed               | Eight cards: ~1206 ms/step                                                          |
+| Total time          | Eight cards: ~17 hours                                                              |
+
+#### Evaluation Performance
+
+| Parameters        | GPU (1p)                                               |
+| ----------------- | ------------------------------------------------------ |
+| Model             | JDE (1088*608)                                         |
+| Resource          | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
+| Upload Date       | 02/02/2022 (day/month/year)                            |
+| MindSpore Version | 1.5.0                                                  |
+| Dataset           | MOT-16                                                 |
+| Batch_size        | 1                                                      |
+| Outputs           | Metrics, .txt predictions                              |
+| FPS               | 22.04                                                  |
+| Metrics           | mAP 82.2, MOTA 71.0%                                   |
+
+## [ModelZoo Homepage](#contents)
+
+Please check the official [homepage](https://gitee.com/mindspore/models).
diff --git a/research/cv/JDE/cfg/ccmcpe.json b/research/cv/JDE/cfg/ccmcpe.json
new file mode 100644
index 0000000000000000000000000000000000000000..ac1825ea5b45a7a62b8d527f7c899715e822fb40
--- /dev/null
+++ b/research/cv/JDE/cfg/ccmcpe.json
@@ -0,0 +1,22 @@
+{
+    "train":
+    {
+        "mot17":"./data/mot17.train",
+        "caltech":"./data/caltech.train",
+        "citypersons":"./data/citypersons.train",
+        "cuhksysu":"./data/cuhksysu.train",
+        "prw":"./data/prw.train",
+        "eth":"./data/eth.train"
+    },
+    "test_emb":
+    {
+        "caltech":"./data/caltech.10k.val",
+        "cuhksysu":"./data/cuhksysu.val",
+        "prw":"./data/prw.val"
+    },
+    "test":
+    {
+        "caltech":"./data/caltech.val",
+        "citypersons":"./data/citypersons.val"
+    }
+}
diff --git a/research/cv/JDE/cfg/config.py b/research/cv/JDE/cfg/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..8b4bc609818bd001e5dd0476999c8cc98e2cfb44
--- /dev/null
+++ b/research/cv/JDE/cfg/config.py
@@ -0,0 +1,129 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Parse arguments""" +import argparse +import ast +from pathlib import Path +from pprint import pformat + +import yaml + + +class Config: + """ + Configuration namespace, convert dictionary to members. + """ + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser (argparse.ArgumentParser): Parent parser. + cfg (dict): Base configuration. + helper (dict): Helper description. + choices (dict): Choices. + """ + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else f"Please reference to {cfg_path}" + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + """ + with open(yaml_path, 'r') as fin: + try: + cfgs_raw = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [] + for cf in cfgs_raw: + cfgs.append(cf) + + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml") + except ValueError("Failed to parse yaml") as err: + raise err + + return cfg, cfg_helper, cfg_choices + + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args (argparse.Namespace): Command line arguments. + cfg (dict): Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + + return cfg + + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. 
+ """ + curr_dir = Path(__file__).resolve().parent + parser = argparse.ArgumentParser(description="JDE config", add_help=False) + parser.add_argument("--config_path", type=str, default=str(curr_dir / "../default_config.yaml"), + help="Path to config.") + parser.add_argument("--data_cfg_url", type=str, default=str(curr_dir / "ccmcpe.json"), + help="Path to data config.") + path_args, _ = parser.parse_known_args() + default, helper, choices = parse_yaml(path_args.config_path) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices) + final_config = merge(args, default) + + return Config(final_config) + + +config = get_config() diff --git a/research/cv/JDE/data/prepare_mot17.py b/research/cv/JDE/data/prepare_mot17.py new file mode 100644 index 0000000000000000000000000000000000000000..b147b43b7269928b5525d88a32518934b590868c --- /dev/null +++ b/research/cv/JDE/data/prepare_mot17.py @@ -0,0 +1,79 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Prepare data.""" +import argparse +import os +import os.path as osp +import shutil +from pathlib import Path + +import numpy as np + + +def prepare(seq_root): + """Prepare MOT17 dataset for JDE training.""" + label_root = str(Path(Path(seq_root).parents[0], 'labels_with_ids', 'train')) + seqs = [s for s in os.listdir(seq_root) if s.endswith('SDP')] + + tid_curr = 0 + tid_last = -1 + + for seq in seqs: + with open(osp.join(seq_root, seq, 'seqinfo.ini')) as file: + seq_info = file.read() + + seq_width = int(seq_info[seq_info.find('imWidth=') + 8: seq_info.find('\nimHeight')]) + seq_height = int(seq_info[seq_info.find('imHeight=') + 9: seq_info.find('\nimExt')]) + + gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt') + gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',') + + seq_label_root = osp.join(label_root, seq, 'img1') + if not osp.exists(seq_label_root): + os.makedirs(seq_label_root) + + for fid, tid, x, y, w, h, mark, label, _ in gt: + if mark == 0 or not label == 1: + continue + fid = int(fid) + tid = int(tid) + if tid != tid_last: + tid_curr += 1 + tid_last = tid + x += w / 2 + y += h / 2 + label_fpath = osp.join(seq_label_root, '{:06d}.txt'.format(fid)) + label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format( + tid_curr, x / seq_width, y / seq_height, w / seq_width, h / seq_height) + with open(label_fpath, 'a') as f: + f.write(label_str) + + old_path = str(Path(seq_root, seq)) + new_path = str(Path(Path(seq_root).parents[0], 'images', 'train')) + + if not osp.exists(new_path): + os.makedirs(new_path) + + shutil.move(old_path, new_path) + + print('Done') + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--seq_root", required=True, help='Path to root dir of sequences') + + args = parser.parse_args() + prepare(args.seq_root) diff --git a/research/cv/JDE/default_config.yaml b/research/cv/JDE/default_config.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..c00ce6282dd499a55f167d516fbd3a347c10d379 --- /dev/null +++ b/research/cv/JDE/default_config.yaml @@ -0,0 +1,120 @@ +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) + +# hyperparameters of training +momentum: 0.9 +decay: 0.0001 +lr: 0.01 +epochs: 30 +batch_size: 4 + +# other +num_classes: 1 +k_max: 250 +img_size: [1088, 608] +track_buffer: 30 +keep_checkpoint_max: 6 + +# model initialization parameters +backbone_input_shape: [32, 64, 128, 256, 512] +backbone_shape: [64, 128, 256, 512, 1024] +backbone_layers: [1, 2, 8, 8, 4] +out_channel: 24 # 3 * (num_classes + 5) +embedding_dim: 512 + +# evaluation thresholds +iou_thres: 0.50 +conf_thres: 0.55 +nms_thres: 0.45 +min_box_area: 200 + +# h -> w +anchor_scales: [ + [8, 24], + [11, 34], + [16, 48], + [23, 68], + [32, 96], + [45, 135], + [64, 192], + [90, 271], + [128, 384], + [180, 540], + [256, 640], + [512, 640], +] + + +# data configs +col_names_train: [ + 'imgs', + 'tconf_s', + 'tbox_s', + 'tid_s', + 'tconf_m', + 'tbox_m', + 'tid_m', + 'tconf_b', + 'tbox_b', + 'tid_b', + 'emb_indices_s', + 'emb_indices_m', + 'emb_indices_b', +] + +col_names_val: [ + 'imgs', + 'targets', + 'lens', +] + + +# other +is_distributed: False +dataset_root: '/path/to/datasets/root/folder/' +device_target: 'GPU' +device_id: 0 +device_start: 0 +ckpt_url: '/path/to/checkpoint' +logs_dir: './logs' +input_video: '/path/to/input/video' +output_format: 'video' +output_root: './results' +save_images: False +save_videos: False + +--- +# Config description for each option +momentum: 'Momentum for SGD optimizer.' +decay: 'Weight_decay for SGD optimizer.' +lr: 'Init learning rate.' +epochs: 'Number of epochs to train.' +batch_size: 'Batch size per one device' +num_classes: 'Number of object classes.' +k_max: 'Max predictions per one map (made for optimization of FC layer embedding computation).' +img_size: 'Size of input images.' +track_buffer: 'Tracking buffer.' +keep_checkpoint_max: 'Keep saved last N checkpoints.' +backbone_input_shape: 'Input filters of backbone layers.' +backbone_shape: 'Input filters of backbone layers.' +backbone_layers: 'Output filters of backbone layers.' +out_channel: 'Number of channels for detection.' +embedding_dim: 'Number of channels for embeddings.' +iou_thres: 'IOU thresholds.' +conf_thres: 'Confidence threshold.' +nms_thres: 'Threshold for Non-max suppression.' +min_box_area: 'Filter out tiny boxes.' +anchor_scales: '12 predefined anchor boxes. Different 4 per each of 3 feature maps.' +col_names_train: 'Names of columns for training GeneratorDataset.' +col_names_val: 'Names of columns for validation GeneratorDataset.' +is_distributed: 'Distribute training or not.' +dataset_root: 'Path to datasets root folder.' +device_target: 'Device GPU or any.' +device_id: 'Device id of target device.' +device_start: 'Start device id.' +ckpt_url: 'Location of checkpoint.' +logs_dir: 'Dir to save logs and ckpt.' +input_video: 'Path to the input video.' +output_format: 'Expected output format.' +output_root: 'Expected output root path.' +save_images: 'Save tracking results (image).' +save_videos: 'Save tracking results (video).' 
diff --git a/research/cv/JDE/eval.py b/research/cv/JDE/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..064b160d07b4b7381e29b8d64971366504a6ae9a --- /dev/null +++ b/research/cv/JDE/eval.py @@ -0,0 +1,272 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Tracker evaluation script.""" +import logging +import os +import os.path as osp + +import cv2 +import motmetrics as mm +import numpy as np +from mindspore import Model +from mindspore import Tensor +from mindspore import context +from mindspore import dtype as mstype +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from src import visualization as vis +from src.darknet import DarkNet, ResidualBlock +from src.dataset import LoadImages +from src.evaluation import Evaluator +from src.log import logger +from src.model import JDEeval +from src.model import YOLOv3 +from src.timer import Timer +from src.utils import mkdir_if_missing +from tracker.multitracker import JDETracker + +_MOT16_VALIDATION_FOLDERS = ( + 'MOT16-02', + 'MOT16-04', + 'MOT16-05', + 'MOT16-09', + 'MOT16-10', + 'MOT16-11', + 'MOT16-13', +) + +_MOT16_DIR_FOR_TEST = 'MOT16/train' + + +def write_results(filename, results, data_type): + """ + Format for evaluation results. + """ + if data_type == 'mot': + save_format = '{frame},{id},{x1},{y1},{w},{h},1,-1,-1,-1\n' + elif data_type == 'kitti': + save_format = '{frame} {id} pedestrian 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n' + else: + raise ValueError(data_type) + + with open(filename, 'w') as f: + for frame_id, tlwhs, track_ids in results: + if data_type == 'kitti': + frame_id -= 1 + for tlwh, track_id in zip(tlwhs, track_ids): + if track_id < 0: + continue + x1, y1, w, h = tlwh + x2, y2 = x1 + w, y1 + h + line = save_format.format(frame=frame_id, id=track_id, x1=x1, y1=y1, x2=x2, y2=y2, w=w, h=h) + f.write(line) + logger.info('Save results to %s', filename) + + +def eval_seq( + opt, + dataloader, + data_type, + result_filename, + net, + save_dir=None, + frame_rate=30, +): + """ + Processes the video sequence given and provides the output + of tracking result (write the results in video file). + + It uses JDE model for getting information about the online targets present. + + Args: + opt (Any): Contains information passed as commandline arguments. + dataloader (Any): Fetching the image sequence and associated data. + data_type (str): Type of dataset corresponding(similar) to the given video. + result_filename (str): The name(path) of the file for storing results. + net (nn.Cell): Model. + save_dir (str): Path to output results. + frame_rate (int): Frame-rate of the given video. + + Returns: + frame_id (int): Sequence number of the last sequence. + average_time (int): Average time for frame. + calls (int): Num of timer calls. 
+ """ + if save_dir: + mkdir_if_missing(save_dir) + tracker = JDETracker(opt, net=net, frame_rate=frame_rate) + timer = Timer() + results = [] + frame_id = 0 + timer.tic() + timer.toc() + timer.calls -= 1 + + for img, img0 in dataloader: + if frame_id % 20 == 0: + log_info = f'Processing frame {frame_id} ({(1. / max(1e-5, timer.average_time)):.2f} fps)' + logger.info('%s', log_info) + + # except initialization step at time calculation + if frame_id != 0: + timer.tic() + + im_blob = Tensor(np.expand_dims(img, 0), mstype.float32) + online_targets = tracker.update(im_blob, img0) + online_tlwhs = [] + online_ids = [] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + vertical = tlwh[2] / tlwh[3] > 1.6 + if tlwh[2] * tlwh[3] > opt.min_box_area and not vertical: + online_tlwhs.append(tlwh) + online_ids.append(tid) + + if frame_id != 0: + timer.toc() + # save results + results.append((frame_id + 1, online_tlwhs, online_ids)) + if save_dir is not None: + online_im = vis.plot_tracking( + img0, + online_tlwhs, + online_ids, + frame_id=frame_id, + fps=1. / timer.average_time, + ) + + cv2.imwrite(os.path.join(save_dir, f'{frame_id:05}.jpg'), online_im) + frame_id += 1 + # save results + write_results(result_filename, results, data_type) + + return frame_id, timer.average_time, timer.calls - 1 + + +def main( + opt, + data_root, + seqs, + exp_name, + save_videos=False, +): + logger.setLevel(logging.INFO) + result_root = os.path.join(data_root, '..', 'results', exp_name) + mkdir_if_missing(result_root) + data_type = 'mot' + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + load_checkpoint(opt.ckpt_url, model) + model = Model(model) + + # Run tracking + n_frame = 0 + timer_avgs, timer_calls, accs = [], [], [] + + for seq in seqs: + output_dir = os.path.join(data_root, '..', 'outputs', exp_name, seq) if save_videos else None + + logger.info('start seq: %s', seq) + + dataloader = LoadImages(osp.join(data_root, seq, 'img1'), opt.anchor_scales, opt.img_size) + + result_filename = os.path.join(result_root, f'{seq}.txt') + + with open(os.path.join(data_root, seq, 'seqinfo.ini')) as f: + meta_info = f.read() + + frame_rate = int(meta_info[meta_info.find('frameRate') + 10:meta_info.find('\nseqLength')]) + + nf, ta, tc = eval_seq( + opt, + dataloader, + data_type, + result_filename, + net=model, + save_dir=output_dir, + frame_rate=frame_rate, + ) + + n_frame += nf + timer_avgs.append(ta) + timer_calls.append(tc) + + # eval + logger.info('Evaluate seq: %s', seq) + evaluator = Evaluator(data_root, seq, data_type) + accs.append(evaluator.eval_file(result_filename)) + if save_videos: + output_video_path = osp.join(output_dir, f'{seq}.mp4') + cmd_str = f'ffmpeg -f image2 -i {output_dir}/%05d.jpg -c:v copy {output_video_path}' + os.system(cmd_str) + + timer_avgs = np.asarray(timer_avgs) + timer_calls = np.asarray(timer_calls) + all_time = np.dot(timer_avgs, timer_calls) + avg_time = all_time / np.sum(timer_calls) + + log_info = f'Time elapsed: {all_time:.2f} seconds, FPS: {(1.0 / avg_time):.2f}' + logger.info('%s', log_info) + + # Get summary + metrics = mm.metrics.motchallenge_metrics + mh = mm.metrics.create() + summary = Evaluator.get_summary(accs, seqs, metrics) + strsummary = mm.io.render_summary( + summary, + formatters=mh.formatters, + namemap=mm.io.motchallenge_metric_names + 
) + + print(strsummary) + Evaluator.save_summary(summary, os.path.join(result_root, f'summary_{exp_name}.xlsx')) + + +if __name__ == '__main__': + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target='GPU') + context.set_context(device_id=config.device_id) + + data_root_path = os.path.join(config.dataset_root, _MOT16_DIR_FOR_TEST) + + if not os.path.isdir(data_root_path): + raise NotADirectoryError( + f'Cannot find "{_MOT16_DIR_FOR_TEST}" subdirectory ' + f'in the specified dataset root "{config.dataset_root}"' + ) + + main( + config, + data_root=data_root_path, + seqs=_MOT16_VALIDATION_FOLDERS, + exp_name=config.ckpt_url.split('/')[-2], + save_videos=config.save_videos, + ) diff --git a/research/cv/JDE/eval_detect.py b/research/cv/JDE/eval_detect.py new file mode 100644 index 0000000000000000000000000000000000000000..9425f78caca267f6a7615263f74993239195b6d7 --- /dev/null +++ b/research/cv/JDE/eval_detect.py @@ -0,0 +1,216 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation script.""" +import json +import time + +import numpy as np +from mindspore import Model +from mindspore import context +from mindspore import dataset as ds +from mindspore.common import set_seed +from mindspore.communication.management import get_group_size +from mindspore.communication.management import get_rank +from mindspore.dataset.vision import py_transforms as PY +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.dataset import JointDatasetDetection +from src.model import JDEeval +from src.model import YOLOv3 +from src.utils import ap_per_class +from src.utils import bbox_iou +from src.utils import non_max_suppression +from src.utils import xywh2xyxy + +set_seed(1) + + +def _get_rank_info(device_target): + """ + Get rank size and rank id. 
+ """ + if device_target == 'GPU': + rank_size = get_group_size() + rank_id = get_rank() + else: + raise ValueError("Unsupported platform.") + + return rank_size, rank_id + + +def main( + opt, + iou_thres, + conf_thres, + nms_thres, + nc, +): + img_size = opt.img_size + + with open(opt.data_cfg_url) as f: + data_config = json.load(f) + test_paths = data_config['test'] + + dataset = JointDatasetDetection( + opt.dataset_root, + test_paths, + augment=False, + transforms=PY.ToTensor(), + config=opt, + ) + + dataloader = ds.GeneratorDataset( + dataset, + column_names=opt.col_names_val, + shuffle=False, + num_parallel_workers=1, + max_rowsize=12, + ) + + dataloader = dataloader.batch(opt.batch_size, True) + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + + load_checkpoint(opt.ckpt_url, model) + print(f'Evaluation for {opt.ckpt_url}') + model = Model(model) + + mean_map, mean_r, mean_p, seen = 0.0, 0.0, 0.0, 0 + print('%11s' * 5 % ('Image', 'Total', 'P', 'R', 'mAP')) + maps, mr, mp = [], [], [] + ap_accum, ap_accum_count = np.zeros(nc), np.zeros(nc) + + for batch_i, inputs in enumerate(dataloader): + imgs, targets, targets_len = inputs + targets = targets.asnumpy() + targets_len = targets_len.asnumpy() + + t = time.time() + + raw_output, _ = model.predict(imgs) + output = non_max_suppression(raw_output.asnumpy(), conf_thres=conf_thres, nms_thres=nms_thres) + + for i, o in enumerate(output): + if o is not None: + output[i] = o[:, :6] + + # Compute average precision for each sample + targets = [targets[i][:int(l)] for i, l in enumerate(targets_len)] + for labels, detections in zip(targets, output): + seen += 1 + + if detections is None: + # If there are labels but no detections mark as zero ap + if labels.shape[0] != 0: + maps.append(0) + mr.append(0) + mp.append(0) + continue + + # Get detections sorted by decreasing confidence scores + detections = detections[np.argsort(-detections[:, 4])] + + # If no labels add number of detections as incorrect + correct = [] + if labels.shape[0] == 0: + maps.append(0) + mr.append(0) + mp.append(0) + continue + + target_cls = labels[:, 0] + + # Extract target boxes as (x1, y1, x2, y2) + target_boxes = xywh2xyxy(labels[:, 2:6]) + target_boxes[:, 0] *= img_size[0] + target_boxes[:, 2] *= img_size[0] + target_boxes[:, 1] *= img_size[1] + target_boxes[:, 3] *= img_size[1] + + detected = [] + for *pred_bbox, _, _ in detections: + obj_pred = 0 + pred_bbox = np.array(pred_bbox, dtype=np.float32).reshape(1, -1) + # Compute iou with target boxes + iou = bbox_iou(pred_bbox, target_boxes, x1y1x2y2=True)[0] + # Extract index of largest overlap + best_i = np.argmax(iou) + # If overlap exceeds threshold and classification is correct mark as correct + if iou[best_i] > iou_thres and obj_pred == labels[best_i, 0] and best_i not in detected: + correct.append(1) + detected.append(best_i) + else: + correct.append(0) + + # Compute Average Precision (ap) per class + ap, ap_class, r, p = ap_per_class( + tp=correct, + conf=detections[:, 4], + pred_cls=np.zeros_like(detections[:, 5]), # detections[:, 6] + target_cls=target_cls, + ) + + # Accumulate AP per class + ap_accum_count += np.bincount(ap_class, minlength=nc) + ap_accum += np.bincount(ap_class, minlength=nc, weights=ap) + + # Compute mean AP across all classes in this image, and append to image list + 
maps.append(ap.mean()) + mr.append(r.mean()) + mp.append(p.mean()) + + # Means of all images + mean_map = np.sum(maps) / (ap_accum_count + 1E-16) + mean_r = np.sum(mr) / (ap_accum_count + 1E-16) + mean_p = np.sum(mp) / (ap_accum_count + 1E-16) + + if (batch_i + 1) % 1000 == 0: + # Print image mAP and running mean mAP + print(('%11s%11s' + '%11.3g' * 4 + 's') % + (seen, dataset.nf, mean_p, mean_r, mean_map, time.time() - t)) + + # Print results + print(f'mean_mAP: {mean_map[0]:.4f}, mean_R: {mean_r[0]:.4f}, mean_P: {mean_p[0]:.4f}') + + +if __name__ == "__main__": + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + context.set_context(device_id=config.device_id) + + main( + opt=config, + iou_thres=0.5, + conf_thres=0.3, + nms_thres=0.45, + nc=config.num_classes, + ) diff --git a/research/cv/JDE/export.py b/research/cv/JDE/export.py new file mode 100644 index 0000000000000000000000000000000000000000..40a21a9a57a65a8e70989f3406bcaa4f32bc862c --- /dev/null +++ b/research/cv/JDE/export.py @@ -0,0 +1,67 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""run export""" +from pathlib import Path + +import numpy as np +from mindspore import Tensor +from mindspore import context +from mindspore import dtype as mstype +from mindspore import load_checkpoint +from mindspore.train.serialization import export + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.model import JDEeval +from src.model import YOLOv3 + + +def run_export(config): + """ + Export model to MINDIR. + """ + darknet53 = DarkNet( + ResidualBlock, + config.backbone_layers, + config.backbone_input_shape, + config.backbone_shape, + detect=True, + ) + + yolov3 = YOLOv3( + backbone=darknet53, + backbone_shape=config.backbone_shape, + out_channel=config.out_channel, + ) + + net = JDEeval(yolov3, default_config) + load_checkpoint(config.ckpt_url, net) + net.set_train(False) + + input_data = Tensor(np.zeros([1, 3, 1088, 608]), dtype=mstype.float32) + name = Path(config.ckpt_url).stem + + export(net, input_data, file_name=name, file_format='MINDIR') + print('Model exported successfully!') + + +if __name__ == "__main__": + context.set_context( + mode=context.GRAPH_MODE, + device_target=default_config.device_target, + device_id=default_config.device_id, + ) + + run_export(default_config) diff --git a/research/cv/JDE/infer.py b/research/cv/JDE/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..536ea096c48b708ba5372dc79f6b703fe0ec3e3f --- /dev/null +++ b/research/cv/JDE/infer.py @@ -0,0 +1,100 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Inference script.""" +import logging +import os +import os.path as osp + +from mindspore import Model +from mindspore import context +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from eval import eval_seq +from src.darknet import DarkNet, ResidualBlock +from src.dataset import LoadVideo +from src.log import logger +from src.model import JDEeval +from src.model import YOLOv3 +from src.utils import mkdir_if_missing + +logger.setLevel(logging.INFO) + +def track(opt): + """ + Inference of the input video. + + Save the results into output-root (video, annotations and frames.). + """ + + result_root = opt.output_root if opt.output_root != '' else '.' + mkdir_if_missing(result_root) + + anchors = opt.anchor_scales + + dataloader = LoadVideo( + opt.input_video, + anchor_scales=anchors, + img_size=opt.img_size, + ) + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + load_checkpoint(opt.ckpt_url, model) + model = Model(model) + logger.info('Starting tracking...') + + result_filename = os.path.join(result_root, 'results.txt') + frame_rate = dataloader.frame_rate + + frame_dir = None if opt.output_format == 'text' else osp.join(result_root, 'frame') + try: + eval_seq( + opt, + dataloader, + 'mot', + result_filename, + net=model, + save_dir=frame_dir, + frame_rate=frame_rate, + ) + except TypeError as e: + logger.info(e) + + if opt.output_format == 'video': + output_video_path = osp.join(result_root, 'result.mp4') + cmd_str = f"ffmpeg -f image2 -i {osp.join(result_root, 'frame')}/%05d.jpg -c:v copy {output_video_path}" + os.system(cmd_str) + + +if __name__ == '__main__': + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target='GPU') + context.set_context(device_id=config.device_id) + + track(config) diff --git a/research/cv/JDE/requirements.txt b/research/cv/JDE/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..1addfc94bccbd7988a7e7b1519496e295fd6b310 --- /dev/null +++ b/research/cv/JDE/requirements.txt @@ -0,0 +1,8 @@ +PyYAML +opencv-python>=4.5.5.62 +motmetrics>=1.2.0 +scipy>=1.7.2 +lap>=0.4.0 +Cython +cython-bbox>=0.1.3 +torch diff --git a/research/cv/JDE/scripts/run_distribute_train_gpu.sh b/research/cv/JDE/scripts/run_distribute_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..11247847b0d99d54baa1041ac2d46bdf47ecf922 --- /dev/null +++ b/research/cv/JDE/scripts/run_distribute_train_gpu.sh @@ -0,0 +1,53 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 4 ]]; then + echo "Usage: bash ./scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]" + exit 1; +fi + +export RANK_SIZE=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +LOGS_CKPT_DIR="$2" + +if [ ! -d "$LOGS_CKPT_DIR" ]; then + mkdir "$LOGS_CKPT_DIR" + mkdir "$LOGS_CKPT_DIR/training_configs" +fi + +DATASET_ROOT=$(get_real_path "$4") +CKPT_URL=$(get_real_path "$3") + +cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs +cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs + +mpirun -n $1 --allow-run-as-root\ + python train.py \ + --device_target="GPU" \ + --logs_dir="$LOGS_CKPT_DIR" \ + --dataset_root="$DATASET_ROOT" \ + --ckpt_url="$CKPT_URL" \ + --is_distributed=True \ + > ./"$LOGS_CKPT_DIR"/distribute_train.log 2>&1 & diff --git a/research/cv/JDE/scripts/run_eval_gpu.sh b/research/cv/JDE/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..b3499e339f8c12a25aac1ca5d68633a288b7f27b --- /dev/null +++ b/research/cv/JDE/scripts/run_eval_gpu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 3 ]]; then + echo "Usage: bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]" + exit 1; +fi + +export CUDA_VISIBLE_DEVICES=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +CKPT_URL=$(get_real_path "$2") +DATASET_ROOT=$(get_real_path "$3") + +if [ ! -d "$DATASET_ROOT" ]; then + echo "The specified dataset root is not a directory: $DATASET_ROOT" + exit 1; +fi + +if [ ! 
-f "$CKPT_URL" ]; then + echo "The specified checkpoint does not exist: $CKPT_URL" + exit 1; +fi + +python ./eval.py \ + --device_target="GPU" \ + --device_id=0 \ + --ckpt_url="$CKPT_URL" \ + --dataset_root="$DATASET_ROOT" \ + > ./eval.log 2>&1 & diff --git a/research/cv/JDE/scripts/run_standalone_train_gpu.sh b/research/cv/JDE/scripts/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..2be547033d6bb6243d9e651f888164da39442937 --- /dev/null +++ b/research/cv/JDE/scripts/run_standalone_train_gpu.sh @@ -0,0 +1,53 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 4 ]]; then + echo "Usage: bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]" + exit 1 +fi + +export CUDA_VISIBLE_DEVICES=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +LOGS_CKPT_DIR="$2" + +if [ ! -d "$LOGS_CKPT_DIR" ]; then + mkdir "$LOGS_CKPT_DIR" + mkdir "$LOGS_CKPT_DIR/training_configs" +fi + +DATASET_ROOT=$(get_real_path "$4") +CKPT_URL=$(get_real_path "$3") + +cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs +cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs + +python ./train.py \ + --device_target="GPU" \ + --device_id=0 \ + --logs_dir="$LOGS_CKPT_DIR" \ + --dataset_root="$DATASET_ROOT" \ + --ckpt_url="$CKPT_URL" \ + --lr=0.00125 \ + > ./"$2"/standalone_train.log 2>&1 & diff --git a/research/cv/JDE/src/__init__.py b/research/cv/JDE/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/JDE/src/convert_checkpoint.py b/research/cv/JDE/src/convert_checkpoint.py new file mode 100644 index 0000000000000000000000000000000000000000..fd24c6ab2b57538285c952cf06cade6995fd323b --- /dev/null +++ b/research/cv/JDE/src/convert_checkpoint.py @@ -0,0 +1,90 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Checkpoint import.""" +from pathlib import Path + +import torch +from mindspore import Parameter +from mindspore import Tensor +from mindspore import dtype as mstype +from mindspore import save_checkpoint + +from cfg.config import config +from src.darknet import DarkNet +from src.darknet import ResidualBlock + + +def convert(cfg): + """ + Init the DarkNet53 model, load PyTorch checkpoint, + change the keys order as well as in MindSpore and + save converted checkpoint with names, + corresponds to inited DarkNet model. + + Args: + cfg: Config parameters. + + Note: + Convert weights without last FC layer. + """ + darknet53 = DarkNet( + ResidualBlock, + cfg.backbone_layers, + cfg.backbone_input_shape, + cfg.backbone_shape, + detect=True, + ) + + # Get MindSpore names of parameters + ms_keys = list(darknet53.parameters_dict().keys()) + + # Get PyTorch weights and names + pt_weights = torch.load(cfg.ckpt_url, map_location=torch.device('cpu'))['state_dict'] + pt_keys = list(pt_weights.keys()) + + # Remove redundant keys + pt_keys_clear = [ + key + for key in pt_keys + if not key.endswith('tracked') + ] + + # One layer consist of 5 parameters + # Arrange PyTorch keys as well as in MindSpore + pt_keys_aligned = [] + for block_num in range(len(pt_keys_clear[:-2]) // 5): + layer = pt_keys_clear[block_num * 5:(block_num + 1) * 5] + pt_keys_aligned.append(layer[0]) # Conv weight + pt_keys_aligned.append(layer[3]) # BN moving mean + pt_keys_aligned.append(layer[4]) # BN moving var + pt_keys_aligned.append(layer[1]) # BN gamma + pt_keys_aligned.append(layer[2]) # BN beta + + ms_checkpoint = [] + for key_ms, key_pt in zip(ms_keys, pt_keys_aligned): + weight = Parameter(Tensor(pt_weights[key_pt].numpy(), mstype.float32)) + ms_checkpoint.append({'name': key_ms, 'data': weight}) + + checkpoint_name = str(Path(cfg.ckpt_url).resolve().parent / 'darknet53.ckpt') + save_checkpoint(ms_checkpoint, checkpoint_name) + + print(f'Checkpoint converted successfully! Location {checkpoint_name}') + + +if __name__ == '__main__': + if not Path(config.ckpt_url).exists(): + raise FileNotFoundError(f'Expect a path to the PyTorch checkpoint, but not found it at "{config.ckpt_url}"') + + convert(config) diff --git a/research/cv/JDE/src/darknet.py b/research/cv/JDE/src/darknet.py new file mode 100644 index 0000000000000000000000000000000000000000..c448975a802cdb421a1c1854a8a476f60a0daa76 --- /dev/null +++ b/research/cv/JDE/src/darknet.py @@ -0,0 +1,267 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""DarkNet model.""" +from mindspore import nn +from mindspore.ops import operations as P + + +def conv_block( + in_channels, + out_channels, + kernel_size, + stride, + dilation=1, +): + """ + Set a conv2d, BN and relu layer. 
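+
+    Note:
+        With pad_mode='same', a stride of 1 preserves the spatial size and a
+        stride of 2 roughly halves it: e.g. conv_block(32, 64, 3, 2) maps an
+        input of shape (N, 32, H, W) to (N, 64, H/2, W/2) for even H and W.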
+ """ + pad_mode = 'same' + padding = 0 + + dbl = nn.SequentialCell( + [ + nn.Conv2d( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + pad_mode=pad_mode, + ), + nn.BatchNorm2d(out_channels, momentum=0.1), + nn.ReLU(), + ] + ) + + return dbl + + +class ResidualBlock(nn.Cell): + """ + DarkNet V1 residual block definition. + + Args: + in_channels (int): Input channel. + out_channels (int): Output channel. + + Returns: + out (ms.Tensor): Output tensor. + + Examples: + ResidualBlock(3, 32) + """ + def __init__( + self, + in_channels, + out_channels, + ): + super().__init__() + out_chls = out_channels//2 + self.conv1 = conv_block(in_channels, out_chls, kernel_size=1, stride=1) + self.conv2 = conv_block(out_chls, out_channels, kernel_size=3, stride=1) + self.add = P.Add() + + def construct(self, x): + identity = x + out = self.conv1(x) + out = self.conv2(out) + out = self.add(out, identity) + + return out + + +class DarkNet(nn.Cell): + """ + DarkNet V1 network. + + Args: + block (cell): Block for network. + layer_nums (list): Numbers of different layers. + in_channels (list): Input channel. + out_channels (list): Output channel. + detect (bool): Whether detect or not. Default:False. + + Returns: + if detect = True: + c11 (ms.Tensor): Output from last layer. + + if detect = False: + c7, c9, c11 (ms.Tensor): Outputs from different layers (FPN). + + Examples: + DarkNet( + ResidualBlock, + [1, 2, 8, 8, 4], + [32, 64, 128, 256, 512], + [64, 128, 256, 512, 1024], + ) + """ + def __init__( + self, + block, + layer_nums, + in_channels, + out_channels, + detect=False, + ): + super().__init__() + + self.detect = detect + + if not len(layer_nums) == len(in_channels) == len(out_channels) == 5: + raise ValueError("the length of layer_num, inchannel, outchannel list must be 5!") + + self.conv0 = conv_block( + 3, + in_channels[0], + kernel_size=3, + stride=1, + ) + + self.conv1 = conv_block( + in_channels[0], + out_channels[0], + kernel_size=3, + stride=2, + ) + + self.layer1 = self._make_layer( + block, + layer_nums[0], + in_channel=out_channels[0], + out_channel=out_channels[0], + ) + + self.conv2 = conv_block( + in_channels[1], + out_channels[1], + kernel_size=3, + stride=2, + ) + + self.layer2 = self._make_layer( + block, + layer_nums[1], + in_channel=out_channels[1], + out_channel=out_channels[1], + ) + + self.conv3 = conv_block( + in_channels[2], + out_channels[2], + kernel_size=3, + stride=2, + ) + + self.layer3 = self._make_layer( + block, + layer_nums[2], + in_channel=out_channels[2], + out_channel=out_channels[2], + ) + + self.conv4 = conv_block( + in_channels[3], + out_channels[3], + kernel_size=3, + stride=2, + ) + + self.layer4 = self._make_layer( + block, + layer_nums[3], + in_channel=out_channels[3], + out_channel=out_channels[3], + ) + + self.conv5 = conv_block( + in_channels[4], + out_channels[4], + kernel_size=3, + stride=2, + ) + + self.layer5 = self._make_layer( + block, + layer_nums[4], + in_channel=out_channels[4], + out_channel=out_channels[4], + ) + + def _make_layer(self, block, layer_num, in_channel, out_channel): + """ + Make Layer for DarkNet. + + Args: + block (Cell): DarkNet block. + layer_num (int): Layer number. + in_channel (int): Input channel. + out_channel (int): Output channel. 
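+
+        Returns:
+            SequentialCell of `layer_num` chained residual blocks with the
+            given input/output channels.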
+ + Examples: + _make_layer(ConvBlock, 1, 128, 256) + """ + layers = [] + darkblk = block(in_channel, out_channel) + layers.append(darkblk) + + for _ in range(1, layer_num): + darkblk = block(out_channel, out_channel) + layers.append(darkblk) + + return nn.SequentialCell(layers) + + def construct(self, x): + """ + Feed forward image. + """ + c1 = self.conv0(x) + c2 = self.conv1(c1) + c3 = self.layer1(c2) + c4 = self.conv2(c3) + c5 = self.layer2(c4) + c6 = self.conv3(c5) + c7 = self.layer3(c6) + c8 = self.conv4(c7) + c9 = self.layer4(c8) + c10 = self.conv5(c9) + c11 = self.layer5(c10) + + if self.detect: + return c7, c9, c11 + + return c11 + + +def darknet53(): + """ + Get DarkNet53 neural network. + + Returns: + Cell, cell instance of DarkNet53 neural network. + + Examples: + darknet53() + """ + + darknet = DarkNet( + block=ResidualBlock, + layer_nums=[1, 2, 8, 8, 4], + in_channels=[32, 64, 128, 256, 512], + out_channels=[64, 128, 256, 512, 1024], + ) + + return darknet diff --git a/research/cv/JDE/src/dataset.py b/research/cv/JDE/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..38277136c66707a9ed4d40b846678578bd71cff5 --- /dev/null +++ b/research/cv/JDE/src/dataset.py @@ -0,0 +1,529 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Dataloader script.""" +import math +import os +import os.path as osp +import random +from collections import OrderedDict +from pathlib import Path + +import cv2 +import numpy as np + +from src.utils import build_thresholds +from src.utils import create_anchors_vec +from src.utils import xyxy2xywh + + +class LoadImages: + """ + Loader for inference. + + Args: + path (str): Path to the directory, containing images. + img_size (list): Size of output image. + + Returns: + img (np.array): Processed image. + img0 (np.array): Original image. + """ + def __init__(self, path, anchor_scales, img_size=(1088, 608)): + path = Path(path) + if not path.is_dir(): + raise NotADirectoryError(f'Expected a path to the directory with images, got "{path}"') + + self.files = sorted(path.glob('*.jpg')) + + self.anchors, self.strides = create_anchors_vec(anchor_scales) + self.nf = len(self.files) # Number of img files. 
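+        # Each iteration yields (img, img0): `img` is the letterboxed, RGB,
+        # CHW, [0, 1]-scaled network input; `img0` is the original BGR frame
+        # kept for visualization and for rescaling detections back.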
+ self.width = img_size[0] + self.height = img_size[1] + self.count = 0 + + assert self.nf > 0, 'No images found in ' + path + + def __iter__(self): + self.count = -1 + return self + + def __next__(self): + self.count += 1 + if self.count == self.nf: + raise StopIteration + img_path = str(self.files[self.count]) + + # Read image + img0 = cv2.imread(img_path) # BGR + assert img0 is not None, 'Failed to load ' + img_path + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __getitem__(self, idx): + idx = idx % self.nf + img_path = self.files[idx] + + # Read image + img0 = cv2.imread(img_path) # BGR + assert img0 is not None, 'Failed to load ' + img_path + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __len__(self): + return self.nf # number of files + + +class LoadVideo: + """ + Video loader for inference. + + Args: + path (str): Path to video. + img_size (tuple): Size of output images size. + + Returns: + count (int): Number of frame. + img (np.array): Processed image. + img0 (np.array): Original image. + """ + def __init__(self, path, anchor_scales, img_size=(1088, 608)): + if not os.path.isfile(path): + raise FileExistsError + + self.cap = cv2.VideoCapture(path) + self.frame_rate = int(round(self.cap.get(cv2.CAP_PROP_FPS))) + self.vw = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH)) + self.vh = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + self.vn = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT)) + + self.anchors, self.strides = create_anchors_vec(anchor_scales) + + self.width = img_size[0] + self.height = img_size[1] + self.count = 0 + + self.w, self.h = self.get_size(self.vw, self.vh, self.width, self.height) + print(f'Lenth of the video: {self.vn:d} frames') + + def get_size(self, vw, vh, dw, dh): + wa, ha = float(dw) / vw, float(dh) / vh + a = min(wa, ha) + return int(vw * a), int(vh * a) + + def __iter__(self): + self.count = -1 + return self + + def __next__(self): + self.count += 1 + if self.count == len(self): + raise StopIteration + # Read image + _, img0 = self.cap.read() # BGR + assert img0 is not None, f'Failed to load frame {self.count:d}' + img0 = cv2.resize(img0, (self.w, self.h)) + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __len__(self): + return self.vn # number of files + + +class JointDataset: + """ + Loader for all datasets. + + Args: + root (str): Absolute path to datasets. + paths (dict): Relative paths for datasets. + img_size (list): Size of output image. + augment (bool): Augment images or not. + transforms: Transform methods. + config (class): Config with hyperparameters. + + Returns: + imgs (np_array): Prepared image. Shape (C, H, W) + tconf (s, m, b) (np_array): Mask with bg (0), gt (1) and ign (-1) indices. Shape (nA, nGh, nGw). + tbox (s, m, b) (np_array): Targets delta bbox values. Shape (nA, nGh, nGw, 4). + tid (s, m, b) (np_array): Grid with id for every cell. Shape (nA, nGh, nGw). 
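+
+    Example:
+        A minimal usage sketch (the root and path file below are placeholders;
+        `config` is the project config from cfg.config):
+
+        paths = {'mot17': 'data/mot17.train'}
+        dataset = JointDataset('/datasets', paths, img_size=(1088, 608),
+                               augment=True, config=config)
+        first_sample = dataset[0]  # image plus per-scale targets and embedding indices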
+ """ + def __init__( + self, + root, + paths, + img_size=(1088, 608), + k_max=200, + augment=False, + transforms=None, + config=None, + ): + self.img_files = OrderedDict() + self.label_files = OrderedDict() + self.tid_num = OrderedDict() + self.tid_start_index = OrderedDict() + self.config = config + self.anchors, self.strides = create_anchors_vec(config.anchor_scales) + self.k_max = k_max + + # Iterate for all of datasets to prepare paths to labels + for ds, img_path in paths.items(): + with open(img_path, 'r') as file: + self.img_files[ds] = file.readlines() + self.img_files[ds] = [osp.join(root, x.strip()) for x in self.img_files[ds]] + self.img_files[ds] = list(filter(lambda x: len(x) > 0, self.img_files[ds])) + + self.label_files[ds] = [ + x.replace('images', 'labels_with_ids').replace('.png', '.txt').replace('.jpg', '.txt') + for x in self.img_files[ds]] + + # Search for max pedestrian id in dataset + for ds, label_paths in self.label_files.items(): + max_index = -1 + for lp in label_paths: + lb = np.loadtxt(lp) + if lb.shape[0] < 1: + continue + if lb.ndim < 2: + img_max = lb[1] + else: + img_max = np.max(lb[:, 1]) + if img_max > max_index: + max_index = img_max + self.tid_num[ds] = max_index + 1 + + last_index = 0 + for k, v in self.tid_num.items(): + self.tid_start_index[k] = last_index + last_index += v + + self.nid = int(last_index + 1) + self.nds = [len(x) for x in self.img_files.values()] + self.cds = [sum(self.nds[:i]) for i in range(len(self.nds))] + self.nf = sum(self.nds) + self.width = img_size[0] + self.height = img_size[1] + self.augment = augment + self.transforms = transforms + + print('=' * 40) + print('dataset summary') + print(self.tid_num) + print('total # identities:', self.nid) + print('start index') + print(self.tid_start_index) + print('=' * 40) + + def get_data(self, img_path, label_path): + """ + Get and prepare data (augment img). 
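+
+        Args:
+            img_path (str): Path to the image.
+            label_path (str): Path to the matching `labels_with_ids` annotation file.
+
+        Returns:
+            img (np.array): Letterboxed (and optionally augmented) RGB image.
+            labels (np.array): Targets of shape (n, 6), possibly empty, in
+                [class, id, x_center, y_center, w, h] format, normalized to
+                the letterboxed image size.
+            img_path (str): The input image path.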
+ """ + height = self.height + width = self.width + img = cv2.imread(img_path) # BGR + if img is None: + raise ValueError(f'File corrupt {img_path}') + augment_hsv = True + if self.augment and augment_hsv: + # SV augmentation by 50% + fraction = 0.50 + img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) + s = img_hsv[:, :, 1].astype(np.float32) + v = img_hsv[:, :, 2].astype(np.float32) + + a = (random.random() * 2 - 1) * fraction + 1 + s *= a + if a > 1: + np.clip(s, a_min=0, a_max=255, out=s) + + a = (random.random() * 2 - 1) * fraction + 1 + v *= a + if a > 1: + np.clip(v, a_min=0, a_max=255, out=v) + + img_hsv[:, :, 1] = s.astype(np.uint8) + img_hsv[:, :, 2] = v.astype(np.uint8) + cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img) + + h, w, _ = img.shape + img, ratio, padw, padh = letterbox(img, height=height, width=width) + + # Load labels + if os.path.isfile(label_path): + labels0 = np.loadtxt(label_path, dtype=np.float32).reshape(-1, 6) + + # Normalized xywh to pixel xyxy format + labels = labels0.copy() + labels[:, 2] = ratio * w * (labels0[:, 2] - labels0[:, 4] / 2) + padw + labels[:, 3] = ratio * h * (labels0[:, 3] - labels0[:, 5] / 2) + padh + labels[:, 4] = ratio * w * (labels0[:, 2] + labels0[:, 4] / 2) + padw + labels[:, 5] = ratio * h * (labels0[:, 3] + labels0[:, 5] / 2) + padh + else: + labels = np.array([]) + + # Augment image and labels + if self.augment: + img, labels, _ = random_affine(img, labels, degrees=(-5, 5), translate=(0.10, 0.10), scale=(0.50, 1.20)) + + nlbls = len(labels) + if nlbls > 0: + # convert xyxy to xywh + labels[:, 2:6] = xyxy2xywh(labels[:, 2:6].copy()) # / height + labels[:, 2] /= width + labels[:, 3] /= height + labels[:, 4] /= width + labels[:, 5] /= height + if self.augment: + # random left-right flip + lr_flip = True + if lr_flip & (random.random() > 0.5): + img = np.fliplr(img) + if nlbls > 0: + labels[:, 2] = 1 - labels[:, 2] + + img = np.ascontiguousarray(img[:, :, ::-1]) # BGR to RGB + if self.transforms is not None: + img = self.transforms(img) + + return img, labels, img_path + + def __getitem__(self, files_index): + """ + Iterator function for train dataset + """ + for i, c in enumerate(self.cds): + if files_index >= c: + ds = list(self.label_files.keys())[i] + start_index = c + img_path = self.img_files[ds][files_index - start_index] + label_path = self.label_files[ds][files_index - start_index] + + imgs, labels, img_path = self.get_data(img_path, label_path) + for i, _ in enumerate(labels): + if labels[i, 1] > -1: + labels[i, 1] += self.tid_start_index[ds] + + # Graph mode in Mindspore uses constant shapes + # Thus, it is necessary to fill targets to max possible ids in image + to_fill = 100 - labels.shape[0] + padding = np.zeros((to_fill, 6), dtype=np.float32) + labels = np.concatenate((labels, padding), axis=0) + + # Calculate confidence mask, bbox delta and ids for every map size + small, medium, big = build_thresholds( + labels=labels, + anchor_vec_s=self.anchors[0], + anchor_vec_m=self.anchors[1], + anchor_vec_b=self.anchors[2], + k_max=self.k_max, + ) + + tconf_s, tbox_s, tid_s, emb_indices_s = small + tconf_m, tbox_m, tid_m, emb_indices_m = medium + tconf_b, tbox_b, tid_b, emb_indices_b = big + + total_values = ( + imgs.astype(np.float32), + tconf_s, + tbox_s, + tid_s, + tconf_m, + tbox_m, + tid_m, + tconf_b, + tbox_b, + tid_b, + emb_indices_s, + emb_indices_m, + emb_indices_b, + ) + return total_values + + def __len__(self): + return self.nf # number of batches + + +class JointDatasetDetection(JointDataset): + """ + Joint dataset for 
evaluation. + """ + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __getitem__(self, files_index): + """ + Iterator function for train dataset. + """ + for i, c in enumerate(self.cds): + if files_index >= c: + ds = list(self.label_files.keys())[i] + start_index = c + img_path = self.img_files[ds][files_index - start_index] + label_path = self.label_files[ds][files_index - start_index] + + imgs, labels, img_path = self.get_data(img_path, label_path) + for i, _ in enumerate(labels): + if labels[i, 1] > -1: + labels[i, 1] += self.tid_start_index[ds] + + targets_size = labels.shape[0] + + # Graph mode in Mindspore uses constant shapes + # Thus, it is necessary to fill targets to max possible ids in image. + to_fill = 100 - labels.shape[0] + padding = np.zeros((to_fill, 6), dtype=np.float32) + labels = np.concatenate((labels, padding), axis=0) + + output = (imgs.astype(np.float32), labels, targets_size) + + return output + + +def letterbox( + img, + height=608, + width=1088, + color=(127.5, 127.5, 127.5), +): + """ + Resize a rectangular image to a padded rectangular + and fill padded border with color. + """ + shape = img.shape[:2] # shape = [height, width] + ratio = min(float(height) / shape[0], float(width) / shape[1]) + new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height] + dw = (width - new_shape[0]) / 2 # width padding + dh = (height - new_shape[1]) / 2 # height padding + top, bottom = round(dh - 0.1), round(dh + 0.1) + left, right = round(dw - 0.1), round(dw + 0.1) + img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # padded rectangular + + return img, ratio, dw, dh + + +def random_affine( + img, + targets=None, + degrees=(-10, 10), + translate=(.1, .1), + scale=(.9, 1.1), + shear=(-2, 2), + border_value=(127.5, 127.5, 127.5), +): + """ + Apply several data augmentation techniques, + such as random rotation, random scale, color jittering + to reduce overfitting. + + Every rotation and scaling and etc. + is also applied to targets bbox cords. + """ + border = 0 # width of added border (optional) + height = img.shape[0] + width = img.shape[1] + + # Rotation and Scale + r = np.eye(3) + a = random.random() * (degrees[1] - degrees[0]) + degrees[0] + s = random.random() * (scale[1] - scale[0]) + scale[0] + r[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s) + + # Translation + t = np.eye(3) + t[0, 2] = (random.random() * 2 - 1) * translate[0] * img.shape[0] + border # x translation (pixels) + t[1, 2] = (random.random() * 2 - 1) * translate[1] * img.shape[1] + border # y translation (pixels) + + # Shear + s = np.eye(3) + s[0, 1] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # x shear (deg) + s[1, 0] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # y shear (deg) + + m = s @ t @ r # Combined rotation matrix. ORDER IS IMPORTANT HERE! 
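+    # The third row of m stays [0, 0, 1], so the perspective warp below is
+    # effectively an affine transform (rotation, scale, translation, shear).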
+ imw = cv2.warpPerspective(img, m, dsize=(width, height), flags=cv2.INTER_LINEAR, + borderValue=border_value) # BGR order borderValue + + # Return warped points also + if targets is not None: + if targets.shape[0] > 0: + n = targets.shape[0] + points = targets[:, 2:6].copy() + area0 = (points[:, 2] - points[:, 0]) * (points[:, 3] - points[:, 1]) + + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = points[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + xy = (xy @ m.T)[:, :2].reshape(n, 8) + + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + + # apply angle-based reduction + radians = a * math.pi / 180 + reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5 + x = (xy[:, 2] + xy[:, 0]) / 2 + y = (xy[:, 3] + xy[:, 1]) / 2 + w = (xy[:, 2] - xy[:, 0]) * reduction + h = (xy[:, 3] - xy[:, 1]) * reduction + xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T + + # reject warped points outside of image + np.clip(xy[:, 0], 0, width, out=xy[:, 0]) + np.clip(xy[:, 2], 0, width, out=xy[:, 2]) + np.clip(xy[:, 1], 0, height, out=xy[:, 1]) + np.clip(xy[:, 3], 0, height, out=xy[:, 3]) + w = xy[:, 2] - xy[:, 0] + h = xy[:, 3] - xy[:, 1] + area = w * h + ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16)) + i = (w > 4) & (h > 4) & (area / (area0 + 1e-16) > 0.1) & (ar < 10) + + targets = targets[i] + targets[:, 2:6] = xy[i] + + return imw, targets, m + + return imw diff --git a/research/cv/JDE/src/evaluation.py b/research/cv/JDE/src/evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..8ba837c57689e985d42e630a492495aa006f5069 --- /dev/null +++ b/research/cv/JDE/src/evaluation.py @@ -0,0 +1,135 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation scripts.""" +import copy +import os + +import motmetrics as mm +import numpy as np +import pandas as pd + +from src.io import read_results, unzip_objs + +mm.lap.default_solver = 'lap' + + +class Evaluator: + """ + Evaluation for tracking with motmetrics. + """ + def __init__(self, data_root, seq_name, data_type): + self.data_root = data_root + self.seq_name = seq_name + self.data_type = data_type + + self.load_annotations() + self.reset_accumulator() + + def load_annotations(self): + """Load groundtruths.""" + assert self.data_type == 'mot' + + gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', 'gt.txt') + self.gt_frame_dict = read_results(gt_filename, self.data_type, is_gt=True) + self.gt_ignore_frame_dict = read_results(gt_filename, self.data_type, is_ignore=True) + + def reset_accumulator(self): + self.acc = mm.MOTAccumulator(auto_id=True) + + def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False): + """ + Eval one frame. 
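+
+        Args:
+            frame_id (int): Frame number used as key into the ground-truth dicts.
+            trk_tlwhs (np.array): Tracker boxes in (top-left x, top-left y, w, h) format.
+            trk_ids (np.array): Track identities matching `trk_tlwhs`.
+            rtn_events (bool): Whether to return per-frame MOT events.
+
+        Returns:
+            `motmetrics` events for the frame when `rtn_events` is True and
+            they are available, otherwise None.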
+ """ + # results + trk_tlwhs = np.copy(trk_tlwhs) + trk_ids = np.copy(trk_ids) + + # gts + gt_objs = self.gt_frame_dict.get(frame_id, []) + gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2] + + # ignore boxes + ignore_objs = self.gt_ignore_frame_dict.get(frame_id, []) + ignore_tlwhs = unzip_objs(ignore_objs)[0] + + # remove ignored results + keep = np.ones(len(trk_tlwhs), dtype=bool) + iou_distance = mm.distances.iou_matrix(ignore_tlwhs, trk_tlwhs, max_iou=0.5) + if iou_distance.size > 0: + match_is, match_js = mm.lap.linear_sum_assignment(iou_distance) + match_is, match_js = map(lambda a: np.asarray(a, dtype=int), [match_is, match_js]) + match_ious = iou_distance[match_is, match_js] + + match_js = np.asarray(match_js, dtype=int) + match_js = match_js[np.logical_not(np.isnan(match_ious))] + keep[match_js] = False + trk_tlwhs = trk_tlwhs[keep] + trk_ids = trk_ids[keep] + + # get distance matrix + iou_distance = mm.distances.iou_matrix(gt_tlwhs, trk_tlwhs, max_iou=0.5) + + # acc + self.acc.update(gt_ids, trk_ids, iou_distance) + + if rtn_events and iou_distance.size > 0 and hasattr(self.acc, 'last_mot_events'): + events = self.acc.last_mot_events + else: + events = None + return events + + def eval_file(self, filename): + """ + Eval file. + """ + self.reset_accumulator() + + result_frame_dict = read_results(filename, self.data_type, is_gt=False) + frames = sorted(list(set(self.gt_frame_dict.keys()) | set(result_frame_dict.keys()))) + for frame_id in frames: + trk_objs = result_frame_dict.get(frame_id, []) + trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2] + self.eval_frame(frame_id, trk_tlwhs, trk_ids, rtn_events=False) + + return self.acc + + @staticmethod + def get_summary(accs, names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')): + """ + Get MOT summary. + """ + names = copy.deepcopy(names) + if metrics is None: + metrics = mm.metrics.motchallenge_metrics + metrics = copy.deepcopy(metrics) + + mh = mm.metrics.create() + summary = mh.compute_many( + accs, + metrics=metrics, + names=names, + generate_overall=True + ) + + return summary + + @staticmethod + def save_summary(summary, filename): + """ + Save evaluation summary. + """ + writer = pd.ExcelWriter(filename) + summary.to_excel(writer) + writer.save() diff --git a/research/cv/JDE/src/io.py b/research/cv/JDE/src/io.py new file mode 100644 index 0000000000000000000000000000000000000000..6975fad9459449b657dbe2c965c1cc01c6237d61 --- /dev/null +++ b/research/cv/JDE/src/io.py @@ -0,0 +1,88 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""MOT utils.""" +import os + +import numpy as np + + +def read_results(filename, data_type: str, is_gt=False, is_ignore=False): + """ + Read results. 
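+
+    Lines are expected in the MOT text format
+    `frame, id, x, y, w, h, score_or_mark, label, visibility, ...`;
+    rows are grouped by frame into a dict of
+    {frame_id: [((x, y, w, h), target_id, score), ...]}.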
+ """ + if data_type in ('mot', 'lab'): + read_fun = read_mot_results + else: + raise ValueError('Unknown data type: {data_type}') + + return read_fun(filename, is_gt, is_ignore) + + +def read_mot_results(filename, is_gt, is_ignore): + """ + Read MOT results. + """ + valid_labels = {1} + ignore_labels = {2, 7, 8, 12} + results_dict = {} + if os.path.isfile(filename): + with open(filename, 'r') as f: + for line in f.readlines(): + linelist = line.split(',') + if len(linelist) < 7: + continue + fid = int(linelist[0]) + if fid < 1: + continue + results_dict.setdefault(fid, []) + + if is_gt: + if 'MOT16-' in filename or 'MOT17-' in filename: + label = int(float(linelist[7])) + mark = int(float(linelist[6])) + if mark == 0 or label not in valid_labels: + continue + score = 1 + elif is_ignore: + if 'MOT16-' in filename or 'MOT17-' in filename: + label = int(float(linelist[7])) + vis_ratio = float(linelist[8]) + if label not in ignore_labels and vis_ratio >= 0: + continue + else: + continue + score = 1 + else: + score = float(linelist[6]) + + tlwh = tuple(map(float, linelist[2:6])) + target_id = int(linelist[1]) + + results_dict[fid].append((tlwh, target_id, score)) + + return results_dict + + +def unzip_objs(objs): + """ + Unzip objects. + """ + if objs: + tlwhs, ids, scores = zip(*objs) + else: + tlwhs, ids, scores = [], [], [] + tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4) + + return tlwhs, ids, scores diff --git a/research/cv/JDE/src/kalman_filter.py b/research/cv/JDE/src/kalman_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..c1444a38037d0fdd587c267f2846deb7b9ab59c6 --- /dev/null +++ b/research/cv/JDE/src/kalman_filter.py @@ -0,0 +1,258 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Kalman filter scripts.""" +import numpy as np +import scipy.linalg + + + +# Table for the 0.95 quantile of the chi-square distribution with N degrees of +# freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv +# function and used as Mahalanobis gating threshold. + +chi2inv95 = { + 1: 3.8415, + 2: 5.9915, + 3: 7.8147, + 4: 9.4877, + 5: 11.070, + 6: 12.592, + 7: 14.067, + 8: 15.507, + 9: 16.919} + + +class KalmanFilter: + """ + A simple Kalman filter for tracking bounding boxes in image space. + + The 8-dimensional state space (x, y, a, h, vx, vy, va, vh) + contains the bounding box center position (x, y), aspect ratio a, height h, + and their respective velocities. + + Object motion follows a constant velocity model. The bounding box location + (x, y, a, h) is taken as direct observation of the state space (linear + observation model). + """ + + def __init__(self): + ndim, dt = 4, 1. + + # Create Kalman filter model matrices. 
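+        # Constant-velocity model: an 8x8 transition matrix (identity plus dt
+        # on the position -> velocity terms) and a 4x8 projection that maps
+        # the state back to the observed (x, y, a, h).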
+ self._motion_mat = np.eye(2 * ndim, 2 * ndim) + for i in range(ndim): + self._motion_mat[i, ndim + i] = dt + self._update_mat = np.eye(ndim, 2 * ndim) + + # Motion and observation uncertainty are chosen relative + # to the current state estimate. These weights control + # the amount of uncertainty in the model. + self._std_weight_position = 1. / 20 + self._std_weight_velocity = 1. / 160 + + def initiate(self, measurement): + """ + Create track from unassociated measurement. + + Args: + measurement (np.array): Bbox coords (x, y, a, h), center (x, y), aspect ratio a, and height h. + + Returns: + mean (np.array): Mean vector (8 dimensional) + covariance (np.array): Covariance matrix (8x8) of the new track. + """ + mean_pos = measurement + mean_vel = np.zeros_like(mean_pos) + mean = np.r_[mean_pos, mean_vel] + + std = [ + 2 * self._std_weight_position * measurement[3], + 2 * self._std_weight_position * measurement[3], + 1e-2, + 2 * self._std_weight_position * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 1e-5, + 10 * self._std_weight_velocity * measurement[3]] + covariance = np.diag(np.square(std)) + + return mean, covariance + + def predict(self, mean, covariance): + """ + Run Kalman filter prediction step. + + Args: + mean (np.array): The 8 dimensional mean vector of the object state at the previous time step. + covariance (np.array): The 8x8 dimensional covariance matrix of the object state at the previous time step. + + Returns: + mean (np.array): Mean vector of the predicted state. + covariance (np.array): Covariance matrix of the predicted state. + + Note: + Unobserved velocities are initialized to 0 mean. + """ + std_pos = [ + self._std_weight_position * mean[3], + self._std_weight_position * mean[3], + 1e-2, + self._std_weight_position * mean[3]] + std_vel = [ + self._std_weight_velocity * mean[3], + self._std_weight_velocity * mean[3], + 1e-5, + self._std_weight_velocity * mean[3]] + motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) + + mean = np.dot(mean, self._motion_mat.T) + covariance = np.linalg.multi_dot(( + self._motion_mat, covariance, self._motion_mat.T)) + motion_cov + + return mean, covariance + + def project(self, mean, covariance): + """ + Project state distribution to measurement space. + + Args: + mean (np.array): The state's mean vector (8 dimensional array). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + + Returns: + mean (np.array): Projected mean of the given state estimate. + covariance (np.array): Projected covariance matrix of the given state estimate. + """ + std = [ + self._std_weight_position * mean[3], + self._std_weight_position * mean[3], + 1e-1, + self._std_weight_position * mean[3]] + innovation_cov = np.diag(np.square(std)) + + mean = np.dot(self._update_mat, mean) + covariance = np.linalg.multi_dot(( + self._update_mat, covariance, self._update_mat.T)) + return mean, covariance + innovation_cov + + def multi_predict(self, mean, covariance): + """ + Run Kalman filter prediction step (Vectorized version). + + Args: + mean (np.array): The Nx8 dim mean matrix of the object states at the previous step. + covariance (np.array): The Nx8x8 dime covariance matrix of the object states at the previous step. + + Returns: + mean (np.array): Mean vector of the predicted state. + covariance (np.array): Covariance matrix of the predicted state. + + Note: + Unobserved velocities are initialized to 0 mean. 
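+
+        Example:
+            A shape sanity-check sketch with two dummy tracks:
+
+            kf = KalmanFilter()
+            m1, c1 = kf.initiate(np.array([100., 200., 0.5, 80.]))
+            m2, c2 = kf.initiate(np.array([300., 220., 0.5, 90.]))
+            means, covs = kf.multi_predict(np.stack([m1, m2]), np.stack([c1, c2]))
+            # means.shape == (2, 8), covs.shape == (2, 8, 8)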
+ """ + std_pos = [ + self._std_weight_position * mean[:, 3], + self._std_weight_position * mean[:, 3], + 1e-2 * np.ones_like(mean[:, 3]), + self._std_weight_position * mean[:, 3]] + std_vel = [ + self._std_weight_velocity * mean[:, 3], + self._std_weight_velocity * mean[:, 3], + 1e-5 * np.ones_like(mean[:, 3]), + self._std_weight_velocity * mean[:, 3]] + sqr = np.square(np.r_[std_pos, std_vel]).T + + motion_cov = [] + for i in range(len(mean)): + motion_cov.append(np.diag(sqr[i])) + motion_cov = np.asarray(motion_cov) + + mean = np.dot(mean, self._motion_mat.T) + left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2)) + covariance = np.dot(left, self._motion_mat.T) + motion_cov + + return mean, covariance + + def update(self, mean, covariance, measurement): + """ + Run Kalman filter correction step. + + Args: + mean (np.array): The predicted state's mean vector (8 dimensional). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + measurement (np.array): The 4 dimensional measurement vector (x, y, a, h), + where (x, y) is the center position, a the aspect ratio, + and h the height of the bounding box. + + Returns: + new_mean (np.array): Measurement-corrected state distribution. + new_covariance (np.array): Measurement-corrected state distribution. + """ + projected_mean, projected_cov = self.project(mean, covariance) + + chol_factor, lower = scipy.linalg.cho_factor( + projected_cov, lower=True, check_finite=False) + kalman_gain = scipy.linalg.cho_solve( + (chol_factor, lower), np.dot(covariance, self._update_mat.T).T, + check_finite=False).T + innovation = measurement - projected_mean + + new_mean = mean + np.dot(innovation, kalman_gain.T) + new_covariance = covariance - np.linalg.multi_dot(( + kalman_gain, projected_cov, kalman_gain.T)) + return new_mean, new_covariance + + def gating_distance(self, mean, covariance, measurements, only_position=False, metric='maha'): + """ + Compute gating distance between state distribution and measurements. + + A suitable distance threshold can be obtained from `chi2inv95`. If + `only_position` is False, the chi-square distribution has 4 degrees of + freedom, otherwise 2. + + Args: + mean (np.array): The predicted state's mean vector (8 dimensional). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + measurements (np.array): An Nx4 dimensional matrix of N measurements, + each in format (x, y, a, h) where (x, y) is the bounding box center + position, a the aspect ratio, and h the height. + only_position (bool): If True, distance computation is done with + respect to the bounding box center position only. + metric (str): Compute selected metric. + + Returns: + (np.array): Array of length N, where the i-th element contains the + squared Mahalanobis distance between (mean, covariance) and + `measurements[i]`. 
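+
+        Example:
+            Gating dummy candidate boxes at the 0.95 chi-square level (sketch):
+
+            kf = KalmanFilter()
+            mean, cov = kf.initiate(np.array([100., 200., 0.5, 80.]))
+            candidates = np.array([[102., 198., 0.5, 82.], [400., 50., 0.4, 60.]])
+            dist = kf.gating_distance(mean, cov, candidates)
+            admissible = dist <= chi2inv95[4]  # 4 degrees of freedom for (x, y, a, h)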
+ + """ + mean, covariance = self.project(mean, covariance) + if only_position: + mean, covariance = mean[:2], covariance[:2, :2] + measurements = measurements[:, :2] + + d = measurements - mean + if metric == 'gaussian': + return np.sum(d * d, axis=1) + + if metric == 'maha': + cholesky_factor = np.linalg.cholesky(covariance) + z = scipy.linalg.solve_triangular( + cholesky_factor, d.T, lower=True, check_finite=False, + overwrite_b=True) + squared_maha = np.sum(z * z, axis=0) + return squared_maha + + raise ValueError('invalid distance metric') diff --git a/research/cv/JDE/src/log.py b/research/cv/JDE/src/log.py new file mode 100644 index 0000000000000000000000000000000000000000..516cdc7eb438bcaafc2de728ec563af541089fd4 --- /dev/null +++ b/research/cv/JDE/src/log.py @@ -0,0 +1,36 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Logger.""" +import logging + + +def get_logger(name='root'): + """ + Get Logger. + """ + formatter = logging.Formatter( + fmt='%(asctime)s [%(levelname)s]: %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + + handler = logging.StreamHandler() + handler.setFormatter(formatter) + + logg = logging.getLogger(name) + logg.setLevel(logging.DEBUG) + logg.addHandler(handler) + + return logg + + +logger = get_logger('root') diff --git a/research/cv/JDE/src/model.py b/research/cv/JDE/src/model.py new file mode 100644 index 0000000000000000000000000000000000000000..5b62133971988c73f7cb2298b755254d1f465297 --- /dev/null +++ b/research/cv/JDE/src/model.py @@ -0,0 +1,534 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""YOLOv3 based on DarkNet.""" +import math + +import mindspore as ms +import mindspore.numpy as msnp +from mindspore import nn +from mindspore import ops +from mindspore.ops import constexpr +from mindspore.ops import operations as P + +from cfg.config import config as default_config +from src.utils import DecodeDeltaMap +from src.utils import SoftmaxCE +from src.utils import create_anchors_vec + + +def _conv_bn_relu( + in_channel, + out_channel, + ksize, + stride=1, + padding=0, + dilation=1, + alpha=0.1, + momentum=0.9, + eps=1e-5, + pad_mode="same", +): + """ + Set a conv2d, BN and relu layer. 
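+
+    Note:
+        The activation is LeakyReLU with negative slope `alpha`
+        (0.1 by default), as is standard for YOLOv3-style blocks.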
+ """ + dbl = nn.SequentialCell( + [ + nn.Conv2d( + in_channel, + out_channel, + kernel_size=ksize, + stride=stride, + padding=padding, + dilation=dilation, + pad_mode=pad_mode, + ), + nn.BatchNorm2d(out_channel, momentum=momentum, eps=eps), + nn.LeakyReLU(alpha), + ] + ) + + return dbl + + +@constexpr +def batch_index(batch_size): + """ + Construct index for each image in batch. + + Example: + if batch_size = 2, returns ms.Tensor([[0], [1]]) + """ + batch_i = ms.Tensor(msnp.arange(batch_size).reshape(-1, 1), dtype=ms.int32) + + return batch_i + + +class YoloBlock(nn.Cell): + """ + YoloBlock for YOLOv3. + + Args: + in_channels (int): Input channel. + out_chls (int): Middle channel. + out_channels (int): Output channel. + config (class): Config with model and training params. + + Returns: + c5 (ms.Tensor): Feature map to feed at next layers. + out (ms.Tensor): Output feature map. + emb (ms.Tensor): Output embeddings. + + Examples: + YoloBlock(1024, 512, 24) + """ + + def __init__( + self, + in_channels, + out_chls, + out_channels, + config=default_config, + ): + super().__init__() + out_chls_2 = out_chls * 2 + + emb_dim = config.embedding_dim + + self.conv0 = _conv_bn_relu(in_channels, out_chls, ksize=1) + self.conv1 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv2 = _conv_bn_relu(out_chls_2, out_chls, ksize=1) + self.conv3 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv4 = _conv_bn_relu(out_chls_2, out_chls, ksize=1) + self.conv5 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv6 = nn.Conv2d(out_chls_2, out_channels, kernel_size=1, stride=1, has_bias=True) + + self.emb_conv = nn.Conv2d(out_chls, emb_dim, kernel_size=3, stride=1, has_bias=True) + + def construct(self, x): + """ + Feed forward feature map to YOLOv3 block + to get detections and embeddings. + """ + c1 = self.conv0(x) + c2 = self.conv1(c1) + + c3 = self.conv2(c2) + c4 = self.conv3(c3) + + c5 = self.conv4(c4) + c6 = self.conv5(c5) + + emb = self.emb_conv(c5) + + out = self.conv6(c6) + + return c5, out, emb + + +class YOLOv3(nn.Cell): + """ + YOLOv3 Network. + + Note: + backbone = darknet53 + + Args: + backbone_shape (list): Darknet output channels shape. + backbone (nn.Cell): Backbone Network. + out_channel (int): Output channel. + + Returns: + small_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[2], h/8, w/8). + medium_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[3], h/16, w/16). + big_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[4], h/32, w/32). 
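+
+    Note:
+        Each returned feature map is the channel-wise concatenation of the
+        detection output (`out_channel` channels) and the embedding map, so
+        its channel count is out_channel + embedding_dim.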
+ + Examples: + YOLOv3( + backbone_shape=[64, 128, 256, 512, 1024] + backbone=darknet53(), + out_channel=24, + ) + """ + + def __init__(self, backbone_shape, backbone, out_channel): + super().__init__() + self.out_channel = out_channel + self.backbone = backbone + self.backblock0 = YoloBlock( + in_channels=backbone_shape[-1], # 1024 + out_chls=backbone_shape[-2], # 512 + out_channels=out_channel, # 24 + ) + + self.conv1 = _conv_bn_relu( + in_channel=backbone_shape[-2], # 1024 + out_channel=backbone_shape[-2] // 2, # 512 + ksize=1, + ) + self.backblock1 = YoloBlock( + in_channels=backbone_shape[-2] + backbone_shape[-3], # 768 + out_chls=backbone_shape[-3], # 256 + out_channels=out_channel, # 24 + ) + + self.conv2 = _conv_bn_relu( + in_channel=backbone_shape[-3], # 256 + out_channel=backbone_shape[-3] // 2, # 128 + ksize=1, + ) + self.backblock2 = YoloBlock( + in_channels=backbone_shape[-3] + backbone_shape[-4], # 384 + out_chls=backbone_shape[-4], # 128 + out_channels=out_channel, # 24 + ) + self.concat = P.Concat(axis=1) + + self.freeze_bn() + + def freeze_bn(self): + """Freeze batch norms.""" + for _, cell in self.cells_and_names(): + if isinstance(cell, nn.BatchNorm2d): + cell.beta.requires_grad = False + cell.gamma.requires_grad = False + + def construct(self, x): + """ + Feed forward image to FPN to get + 3 feature maps from different scales. + """ + # input_shape of x is (batch_size, 3, h, w) + img_hight = P.Shape()(x)[2] + img_width = P.Shape()(x)[3] + feature_map1, feature_map2, feature_map3 = self.backbone(x) + con1, small_object_output, sml_emb = self.backblock0(feature_map3) + + con1 = self.conv1(con1) + ups1 = P.ResizeNearestNeighbor((img_hight // 16, img_width // 16))(con1) + con1 = self.concat((ups1, feature_map2)) + con2, medium_object_output, med_emb = self.backblock1(con1) + + con2 = self.conv2(con2) + ups2 = P.ResizeNearestNeighbor((img_hight // 8, img_width // 8))(con2) + con3 = self.concat((ups2, feature_map1)) + _, big_object_output, big_emb = self.backblock2(con3) + + small_feature = self.concat((small_object_output, sml_emb)) + medium_feature = self.concat((medium_object_output, med_emb)) + big_feature = self.concat((big_object_output, big_emb)) + + return small_feature, medium_feature, big_feature + + +class YOLOLayer(nn.Cell): + """ + Head for loss calculation of classification confidence, + bbox regression and ids embedding learning . + + Args: + anchors (list): Absolute sizes of anchors (w, h). + nid (int): Number of identities in whole train datasets. + emb_dim (int): Size of embedding. + nc (int): Number of ground truth classes. + + Returns: + loss (ms.Tensor): Auto balanced loss, calculated from conf, bbox and ids. 
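+
+    Note:
+        The three terms are balanced with learnable uncertainty weights
+        (s_c, s_r, s_id):
+        loss = 0.5 * (exp(-s_r) * l_box + exp(-s_c) * l_conf
+                      + exp(-s_id) * l_id + s_r + s_c + s_id).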
+ """ + + def __init__( + self, + anchors, + nid, + emb_dim, + nc=default_config.num_classes, + ): + super().__init__() + self.anchors = ms.Tensor(anchors, ms.float32) + self.na = len(anchors) # Number of anchors (4) + self.nc = nc # Number of classes (1) + self.nid = nid # Number of identities + self.emb_dim = emb_dim + + # Set necessary operations and constants + self.normalize = ops.L2Normalize(axis=1, epsilon=1e-12) + self.argmax = ops.ArgMaxWithValue(axis=1) + self.expand_dims = ops.ExpandDims() + self.reduce_sum = ops.ReduceSum() + self.fill = ops.Fill() + self.exp = ops.Exp() + self.zero_tensor = ms.Tensor([0]) + + # Set eps to escape division by zero + self.eps = ms.Tensor(1e-16, dtype=ms.float32) + + self.smooth_l1_loss = nn.SmoothL1Loss() + self.softmax_loss = SoftmaxCE() + self.id_loss = SoftmaxCE() + + # Set trainable parameters for loss computation + self.s_c = ms.Parameter(-4.15 * ms.Tensor([1])) # -4.15 + self.s_r = ms.Parameter(-4.85 * ms.Tensor([1])) # -4.85 + self.s_id = ms.Parameter(-2.3 * ms.Tensor([1])) # -2.3 + + self.emb_scale = math.sqrt(2) * math.log(self.nid - 1) + + def construct(self, p_cat, tconf, tbox, tids, emb_indices, classifier): + """ + Feed forward output from the FPN, + calculate confidence loss, bbox regression loss, target id loss, + apply auto-balancing loss strategy. + """ + # Get detections and embeddings from model concatenated output. + p, p_emb = p_cat[:, :24, ...], p_cat[:, 24:, ...] + nb, ngh, ngw = p.shape[0], p.shape[-2], p.shape[-1] + + p = p.view(nb, self.na, self.nc + 5, ngh, ngw).transpose(0, 1, 3, 4, 2) # prediction + p_emb = p_emb.transpose(0, 2, 3, 1) + p_box = p[..., :4] + p_conf = p[..., 4:6].transpose(0, 4, 1, 2, 3) + + mask = (tconf > 0).astype('float32') + + # Compute losses + nm = self.reduce_sum(mask) # number of anchors (assigned to targets) + p_box = p_box * self.expand_dims(mask, -1) + tbox = tbox * self.expand_dims(mask, -1) + lbox = self.smooth_l1_loss(p_box, tbox) + lbox = lbox * self.expand_dims(mask, -1) + lbox = self.reduce_sum(lbox) / (nm * 4 + self.eps) + + lconf = self.softmax_loss(p_conf.transpose(0, 2, 3, 4, 1), tconf, ignore_index=-1) + + # Construct indices for selecting embeddings + # from the flattened view of the model output + # (corresponding to the embeddings prediction). + # + # Set flattened mask to existing detections + # and apply it to flattened indices to nullify if it is no detection. + emb_indices_batch_stride = emb_indices + batch_index(nb) * ngh * ngw # Shape (nb, k_max) + emb_indices_mask_flat = (emb_indices.reshape(-1) > 0).astype('float32') # Shape (nb x k_max) + emb_indices_flat = (emb_indices_batch_stride.reshape(-1) * emb_indices_mask_flat).astype('int32') + + # Flatten embs and take which is associate to flattened emb index + emb_flat = p_emb.view(-1, self.emb_dim) # Shape (nb x ngh x ngw, emb_dim) + embedding = emb_flat[emb_indices_flat] # Shape (nb x k_max, emb_dim) + embedding = self.emb_scale * self.normalize(embedding) + + # Flatten max tids and take according to index + _, tids = self.argmax(tids.astype('float32')) # Shape (nb, ngh, ngw) + tids_flat = tids.view(-1)[emb_indices_flat] # Shape (nb x k_max) + + # Apply flattened emb mask for nullify if it is no detections + # and subtract 1 where no detection to apply ignore mask into loss calculation. + tids_flat_masked = tids_flat * emb_indices_mask_flat + tids_flat_with_ignore = tids_flat_masked + (emb_indices_mask_flat - 1) + + # Apply FC layer to embeddings + # and compute loss by custom loss with ignore index = -1. 
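+        # `classifier` is the nn.Dense(emb_dim, nid) shared by all three heads
+        # (owned by the JDE cell), so identities live in one embedding space.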
+ logits = classifier(embedding) + lid = self.id_loss(logits, tids_flat_with_ignore.astype('int32'), ignore_index=-1) + + # Apply auto-balancing loss strategy + loss = self.exp((-1) * self.s_r) * lbox + \ + self.exp((-1) * self.s_c) * lconf + \ + self.exp((-1) * self.s_id) * lid + \ + (self.s_r + self.s_c + self.s_id) + loss *= 0.5 + + return loss.squeeze() + + +class JDE(nn.Cell): + """ + JDE Network. + + Args: + extractor (nn.Cell): Backbone, which extracts feature maps. + config (class): Config with model and training params. + nid (int): Number of identities in whole train datasets. + ne (int): Size of embedding. + + Returns: + loss (ms.Tensor): Sum of 3 losses from each head. + + Note: + backbone = YOLOv3 with darknet53 + head = 3 similar heads for each feature map size + """ + + def __init__(self, extractor, config, nid, ne): + super().__init__() + anchors = config.anchor_scales + anchors1 = anchors[0:4] + anchors2 = anchors[4:8] + anchors3 = anchors[8:12] + + self.backbone = extractor + + # Set loss cell layers for different scales + self.head_s = YOLOLayer(anchors3, nid, ne) + self.head_m = YOLOLayer(anchors2, nid, ne) + self.head_b = YOLOLayer(anchors1, nid, ne) + + # Set classifier for embeddings + self.classifier = nn.Dense(ne, nid) + + def construct( + self, + images, + tconf_s, + tbox_s, + tid_s, + tconf_m, + tbox_m, + tid_m, + tconf_b, + tbox_b, + tid_b, + mask_s, + mask_m, + mask_b, + ): + """ + Feed forward image to FPN, get 3 feature maps with different sizes, + put it into 3 heads, corresponding to size, + get auto-balanced losses, summarize them. + """ + # Apply FPN to image to get 3 feature map with different scales + small, medium, big = self.backbone(images) + + # Calculate losses for each feature map + out_s = self.head_s(small, tconf_s, tbox_s, tid_s, mask_s, self.classifier) + out_m = self.head_m(medium, tconf_m, tbox_m, tid_m, mask_m, self.classifier) + out_b = self.head_b(big, tconf_b, tbox_b, tid_b, mask_b, self.classifier) + + loss = (out_s + out_m + out_b) / 3 + + return loss + + +class YOLOLayerEval(nn.Cell): + """ + Head for detection and tracking. + + Args: + anchor (list): Absolute sizes of anchors (w, h). + nc (int): Number of ground truth classes. + + Returns: + prediction (ms.Tensor): Model predictions for confidences, boxes and embeddings. + """ + + def __init__( + self, + anchor, + stride, + nc=default_config.num_classes, + ): + super().__init__() + self.na = len(anchor) # number of anchors (4) + self.nc = nc # number of classes (1) + self.anchor_vec = anchor + self.stride = stride + + self.argmax = ops.ArgMaxWithValue(axis=1) + self.expand_dims = ops.ExpandDims() + self.softmax = nn.Softmax(axis=1) + self.normalize = ops.L2Normalize(axis=-1, epsilon=1e-12) + self.tile = ops.Tile() + self.fill = ops.Fill() + self.concat = ops.Concat(axis=-1) + + self.decode_map = DecodeDeltaMap() + + def construct(self, p_cat): + """ + Feed forward output from the FPN, + calculate prediction corresponding to anchor. + """ + p, p_emb = p_cat[:, :24, ...], p_cat[:, 24:, ...] 
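+        # The first 24 channels are the detection output (na * (nc + 5) = 4 * 6);
+        # the rest is the appearance embedding map.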
+ nb, ngh, ngw = p.shape[0], p.shape[-2], p.shape[-1] + + p = p.view(nb, self.na, self.nc + 5, ngh, ngw).transpose(0, 1, 3, 4, 2) # prediction + p_emb = p_emb.transpose(0, 2, 3, 1) + p_box = p[..., :4] + p_conf = p[..., 4:6].transpose(0, 4, 1, 2, 3) # conf + p_conf = self.expand_dims(self.softmax(p_conf)[:, 1, ...], -1) + p_emb = self.normalize(self.tile(self.expand_dims(p_emb, 1), (1, self.na, 1, 1, 1))) + + p_cls = self.fill(ms.float32, (nb, self.na, ngh, ngw, 1), 0) # temp + p = self.concat((p_box, p_conf, p_cls, p_emb)) + + # Decode bbox delta to the absolute cords + p_1 = self.decode_map(p[..., :4], self.anchor_vec) + p_1 = p_1 * self.stride + + p = self.concat((p_1.astype('float32'), p[..., 4:])) + prediction = p.reshape(nb, -1, p.shape[-1]) + + return prediction + + +class JDEeval(nn.Cell): + """ + JDE Network. + + Note: + backbone = YOLOv3 with darknet53. + head = 3 similar heads for each feature map size. + + Returns: + output (ms.Tensor): Tensor with concatenated outputs from each head. + output_top_k (ms.Tensor): Output tensor of top_k best proposals by confidence. + + """ + + def __init__(self, extractor, config): + super().__init__() + anchors, strides = create_anchors_vec(config.anchor_scales) + anchors = ms.Tensor(anchors, dtype=ms.float32) + strides = ms.Tensor(strides, dtype=ms.float32) + + self.backbone = extractor + + self.head_s = YOLOLayerEval(anchors[0], strides[0]) + self.head_m = YOLOLayerEval(anchors[1], strides[1]) + self.head_b = YOLOLayerEval(anchors[2], strides[2]) + + self.concatenate = ops.Concat(axis=1) + self.top_k = ops.TopK(sorted=False) + self.k = 800 + + def construct(self, images): + """ + Feed forward image to FPN, get 3 feature maps with different sizes, + put them into 3 heads, corresponding to size, + get concatenated output of proposals. + """ + small, medium, big = self.backbone(images) + + out_s = self.head_s(small) + out_m = self.head_m(medium) + out_b = self.head_b(big) + + output = self.concatenate((out_s, out_m, out_b)) + + _, top_k_indices = self.top_k(output[:, :, 4], self.k) + output_top_k = output[0][top_k_indices] + + return output, output_top_k diff --git a/research/cv/JDE/src/timer.py b/research/cv/JDE/src/timer.py new file mode 100644 index 0000000000000000000000000000000000000000..2350b224ed0ba9af1fbdfe7411464d3957d1eec1 --- /dev/null +++ b/research/cv/JDE/src/timer.py @@ -0,0 +1,61 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Simple timer script.""" +import time + + +class Timer: + """ + A simple timer. + """ + def __init__(self): + self.total_time = 0. + self.calls = 0 + self.start_time = 0. + self.diff = 0. + self.average_time = 0. + + self.duration = 0. + + def tic(self): + """ + Get the start time. 
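+
+        Typical usage: call tic() right before the timed block and toc()
+        right after it; toc(average=True) returns the running average over
+        all tic/toc pairs since the last clear().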
+ """ + self.start_time = time.time() + + def toc(self, average=True): + """ + Compute duration of the period + """ + self.diff = time.time() - self.start_time + self.total_time += self.diff + self.calls += 1 + self.average_time = self.total_time / self.calls + if average: + self.duration = self.average_time + else: + self.duration = self.diff + return self.duration + + def clear(self): + """ + Clear values. + """ + self.total_time = 0. + self.calls = 0 + self.start_time = 0. + self.diff = 0. + self.average_time = 0. + self.duration = 0. diff --git a/research/cv/JDE/src/utils.py b/research/cv/JDE/src/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8605f5eda17e3a7f6252e5d9e9675a86393868af --- /dev/null +++ b/research/cv/JDE/src/utils.py @@ -0,0 +1,537 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Auxiliary utils.""" +import os + +import numpy as np +from mindspore import Tensor +from mindspore import dtype as mstype +from mindspore import nn +from mindspore import numpy as msnp +from mindspore import ops +from mindspore.ops import functional as F + + +def mkdir_if_missing(directory): + os.makedirs(directory, exist_ok=True) + + +def xyxy2xywh(x): + """ + Convert bounding box format from [x1, y1, x2, y2] to [x, y, w, h], + where x, y are coordinates of center, (x1, y1) and (x2, y2) + are coordinates of bottom left and top right respectively. + """ + y = np.zeros_like(x) + y[:, 0] = (x[:, 0] + x[:, 2]) / 2 # x center + y[:, 1] = (x[:, 1] + x[:, 3]) / 2 # y center + y[:, 2] = x[:, 2] - x[:, 0] # width + y[:, 3] = x[:, 3] - x[:, 1] # height + return y + + +def xywh2xyxy(x): + """ + Convert bounding box format from [x, y, w, h] to [x1, y1, x2, y2], + where x, y are coordinates of center, (x1, y1) and (x2, y2) + are coordinates of bottom left and top right respectively. + """ + y = np.zeros_like(x) + y[:, 0] = (x[:, 0] - x[:, 2] / 2) # Bottom left x + y[:, 1] = (x[:, 1] - x[:, 3] / 2) # Bottom left y + y[:, 2] = (x[:, 0] + x[:, 2] / 2) # Top right x + y[:, 3] = (x[:, 1] + x[:, 3] / 2) # Top right y + return y + + +def scale_coords(img_size, coords, img0_shape): + """ + Rescale x1, y1, x2, y2 to image size. + """ + gain_w = float(img_size[0]) / img0_shape[1] # gain = old / new + gain_h = float(img_size[1]) / img0_shape[0] + gain = min(gain_w, gain_h) + pad_x = (img_size[0] - img0_shape[1] * gain) / 2 # width padding + pad_y = (img_size[1] - img0_shape[0] * gain) / 2 # height padding + coords[:, [0, 2]] -= pad_x + coords[:, [1, 3]] -= pad_y + coords[:, 0:4] /= gain + cords_max = np.max(coords[:, :4]) + coords[:, :4] = np.clip(coords[:, :4], a_min=0, a_max=cords_max) + return coords + + +class SoftmaxCE(nn.Cell): + """ + Original nn.SoftmaxCrossEntropyWithLogits with modifications: + 1) Set ignore index = -1. + 2) Reshape labels and logits to (n, C). + 3) Calculate mean by mask. 
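+
+    Labels equal to ignore_index are masked out, and the loss is averaged
+    only over the remaining entries (eps guards against division by zero
+    when every label is ignored).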
+ """ + def __init__(self): + super().__init__() + # Set necessary operations and constants + self.soft_ce = ops.SoftmaxCrossEntropyWithLogits() + self.expand_dim = ops.ExpandDims() + self.transpose = ops.Transpose() + self.reshape = ops.Reshape() + self.one_hot = ops.OneHot() + self.sum = ops.ReduceSum() + self.one = Tensor(1, mstype.float32) + self.zero = Tensor(0, mstype.float32) + + # Set eps to escape division by zero + self.eps = Tensor(1e-16, dtype=mstype.float32) + + def construct(self, logits, labels, ignore_index): + """ + Calculate softmax loss between logits and labels with ignore mask. + """ + # Ignore indices which have not exactly recognized iou + mask = labels != ignore_index + mask = mask.astype('float32') + channels = F.shape(logits)[-1] + + # One-hot labels for total identities in dataset + labels_one_hot = self.one_hot(labels.flatten(), channels, self.one, self.zero) + raw_loss, _ = self.soft_ce( + self.reshape(logits, (-1, channels)), + self.reshape(labels_one_hot, (-1, channels)), + ) + + # Apply mask and take mean of losses + result = raw_loss * mask.reshape(raw_loss.shape) + result = self.sum(result) / (self.sum(mask) + self.eps) + + return result + + +def build_targets_thres(target, anchor_wh, na, ngh, ngw, k_max): + """ + Build grid of targets confidence mask, bbox delta and id with thresholds. + + Args: + target (np_array): Targets bbox cords and ids. + anchor_wh (np_array): Resized anchors for map size. + na (int): Number of anchors. + ngh (int): Map height. + ngw (int): Map width. + k_max (int): Limitation of max detections per image. + + Returns: + tconf (np_array): Mask with bg (0), gt (1) and ign (-1) indices. Shape (na, ngh, ngw). + tbox (np_array): Targets delta bbox values. Shape (na, ngh, ngw, 4). + tid (np_array): Grid with id for every cell. Shape (na, ngh, ngw). 
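+        t_indices (np_array): Flattened grid indices of cells assigned to a ground truth, zero-padded to k_max entries. Shape (k_max,).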
+ + """ + id_thresh = 0.5 + fg_thresh = 0.5 + bg_thresh = 0.4 + + bg_id = -1 # Background id + + tbox = np.zeros((na, ngh, ngw, 4), dtype=np.float32) # Fill grid with zeros bbox cords + tconf = np.zeros((na, ngh, ngw), dtype=np.int32) # Fill grid with zeros confidence + tid = np.full((na, ngh, ngw), bg_id, dtype=np.int32) # Fill grid with background id + + t = target + t_id = t[:, 1].copy().astype(np.int32) + t = t[:, [0, 2, 3, 4, 5]] + + # Convert relative cords for map size + gxy, gwh = t[:, 1:3].copy(), t[:, 3:5].copy() + gxy[:, 0] = gxy[:, 0] * ngw + gxy[:, 1] = gxy[:, 1] * ngh + gwh[:, 0] = gwh[:, 0] * ngw + gwh[:, 1] = gwh[:, 1] * ngh + gxy[:, 0] = np.clip(gxy[:, 0], a_min=0, a_max=ngw - 1) + gxy[:, 1] = np.clip(gxy[:, 1], a_min=0, a_max=ngh - 1) + + gt_boxes = np.concatenate((gxy, gwh), axis=1) # Shape (num of targets, 4), 4 is (xc, yc, w, h) + + # Apply anchor to each cell of the grid + anchor_mesh = generate_anchor(ngh, ngw, anchor_wh) # Shape (na, 4, ngh, ngw) + anchor_list = anchor_mesh.transpose(0, 2, 3, 1).reshape(-1, 4) # Shape (na x ngh x ngw, 4) + + # Compute anchor iou with ground truths bboxes + iou_pdist = bbox_iou(anchor_list, gt_boxes) # Shape (na x ngh x ngw, Ng) + max_gt_index = iou_pdist.argmax(axis=1) # Shape (na x ngh x ngw) + iou_max = iou_pdist.max(axis=1) # Shape (na x ngh x ngw) + + iou_map = iou_max.reshape(na, ngh, ngw) + gt_index_map = max_gt_index.reshape(na, ngh, ngw) + + # Fill tconf by thresholds + id_index = iou_map > id_thresh + fg_index = iou_map > fg_thresh + bg_index = iou_map < bg_thresh + ign_index = (iou_map < fg_thresh) * (iou_map > bg_thresh) # Search unclear cells + tconf[fg_index] = 1 + tconf[bg_index] = 0 + tconf[ign_index] = -1 # Index to ignore unclear cells + + # Take ground truths with mask + gt_index = gt_index_map[fg_index] + gt_box_list = gt_boxes[gt_index] + gt_id_list = t_id[gt_index_map[id_index]] + if np.sum(fg_index) > 0: + tid[id_index] = gt_id_list + fg_anchor_list = anchor_list.reshape((na, ngh, ngw, 4))[fg_index] + delta_target = encode_delta(gt_box_list, fg_anchor_list) + tbox[fg_index] = delta_target + + # Indices of cells with detections + tconf_max = tconf.max(0) + tid_max = tid.max(0) + indices = np.where((tconf_max.flatten() > 0) & (tid_max.flatten() >= 0))[0] + + # Fill indices with zeros if k < k_max + # Where k - is the detections per image + # k_max - max detections per image + k = len(indices) + t_indices = np.zeros(k_max) + t_indices[..., :min(k_max, k)] = indices[..., :min(k_max, k)] + + return tconf, tbox, tid, t_indices + + +def bbox_iou(box1, box2, x1y1x2y2=False): + """ + Returns the IoU of two bounding boxes. 
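+
+    box1 has shape (n, 4) and box2 has shape (m, 4); the result is the pairwise
+    IoU matrix of shape (n, m). When x1y1x2y2 is False, boxes are treated as
+    (xc, yc, w, h) and converted to corner coordinates first.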
+ """ + n, m = len(box1), len(box2) + if x1y1x2y2: + # Get the coordinates of bounding boxes + b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] + b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] + else: + # Transform from center and width to exact coordinates + b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 + b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 + b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 + b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 + + # Get the coordinates of the intersection rectangle + inter_rect_x1 = np.maximum(np.expand_dims(b1_x1, 1), b2_x1) + inter_rect_y1 = np.maximum(np.expand_dims(b1_y1, 1), b2_y1) + inter_rect_x2 = np.minimum(np.expand_dims(b1_x2, 1), b2_x2) + inter_rect_y2 = np.minimum(np.expand_dims(b1_y2, 1), b2_y2) + + # Intersection area + i_r_x = inter_rect_x2 - inter_rect_x1 + i_r_y = inter_rect_y2 - inter_rect_y1 + inter_area = np.clip(i_r_x, 0, np.max(i_r_x)) * np.clip(i_r_y, 0, np.max(i_r_y)) + + # Union Area + b1_area = np.broadcast_to(((b1_x2 - b1_x1) * (b1_y2 - b1_y1)).reshape(-1, 1), (n, m)) + b2_area = np.broadcast_to(((b2_x2 - b2_x1) * (b2_y2 - b2_y1)).reshape(1, -1), (n, m)) + + return inter_area / (b1_area + b2_area - inter_area + 1e-16) + + +def generate_anchor(ngh, ngw, anchor_wh): + """ + Generate anchor for every cell in grid. + """ + na = len(anchor_wh) + yy, xx = np.meshgrid(np.arange(ngh), np.arange(ngw), indexing='ij') + + mesh = np.stack([xx, yy], axis=0) # Shape 2, ngh, ngw + mesh = np.tile(np.expand_dims(mesh, 0), (na, 1, 1, 1)).astype(np.float32) # Shape na, 2, ngh, ngw + anchor_offset_mesh = np.tile(np.expand_dims(np.expand_dims(anchor_wh, -1), -1), (1, 1, ngh, ngw)) # Shape na, 2, ngh, ngw + anchor_mesh = np.concatenate((mesh, anchor_offset_mesh), axis=1) # Shape na, 4, ngh, ngw + return anchor_mesh + + +def encode_delta(gt_box_list, fg_anchor_list): + """ + Calculate delta for bbox center, width, height. + """ + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:, 1], \ + fg_anchor_list[:, 2], fg_anchor_list[:, 3] + gx, gy, gw, gh = gt_box_list[:, 0], gt_box_list[:, 1], \ + gt_box_list[:, 2], gt_box_list[:, 3] + dx = (gx - px) / pw + dy = (gy - py) / ph + dw = np.log(gw / pw) + dh = np.log(gh / ph) + + return np.stack([dx, dy, dw, dh], axis=1) + + +def create_grids(anchors, img_size, ngw): + """ + Resize anchor according to image size and feature map size. + + Note: + Ratio of feature maps dimensions if 1:3 such as anchors. + Thus, it's enough to calculate stride per one dimension. + """ + stride = img_size[0] / ngw + anchor_vec = np.array(anchors) / stride + + return anchor_vec, stride + + +def build_thresholds( + labels, + anchor_vec_s, + anchor_vec_m, + anchor_vec_b, + k_max, +): + """ + Build thresholds for all feature map sizes. + """ + s = build_targets_thres(labels, anchor_vec_s, 4, 19, 34, k_max) + m = build_targets_thres(labels, anchor_vec_m, 4, 38, 68, k_max) + b = build_targets_thres(labels, anchor_vec_b, 4, 76, 136, k_max) + + return s, m, b + + +def create_anchors_vec(anchors, img_size=(1088, 608)): + """ + Create anchor vectors for every feature map size. 
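+
+    For the default 1088x608 input, the grid widths 34, 68 and 136 correspond
+    to strides 32, 16 and 8; each anchor is divided by its stride, so anchor
+    sizes are expressed in feature-map cells.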
+ """ + anchors1 = anchors[0:4] + anchors2 = anchors[4:8] + anchors3 = anchors[8:12] + anchor_vec_s, stride_s = create_grids(anchors3, img_size, 34) + anchor_vec_m, stride_m = create_grids(anchors2, img_size, 68) + anchor_vec_b, stride_b = create_grids(anchors1, img_size, 136) + + anchors = (anchor_vec_s, anchor_vec_m, anchor_vec_b) + strides = (stride_s, stride_m, stride_b) + + return anchors, strides + + +class DecodeDeltaMap(nn.Cell): + """ + Network predicts delta for base anchors. + + Decodes predictions into relative bbox cords. + """ + def __init__(self): + super().__init__() + self.exp = ops.operations.Exp() + self.stack0 = ops.Stack(axis=0) + self.stack1 = ops.Stack(axis=1) + self.expand_dims = ops.ExpandDims() + self.reshape = ops.Reshape() + self.concat = ops.Concat(axis=2) + + def construct(self, delta_map, anchors): + """ + Decode delta of bbox predictions and summarize it with anchors. + """ + anchors = anchors.astype('float32') + nb, na, ngh, ngw, _ = delta_map.shape + yy, xx = msnp.meshgrid(msnp.arange(ngh), msnp.arange(ngw), indexing='ij') + + mesh = self.stack0([xx, yy]).astype('float32') # Shape (2, ngh, ngw) + mesh = msnp.tile(self.expand_dims(mesh, 0), (nb, na, 1, 1, 1)) # Shape (nb, na, 2, ngh, ngw) + anchors_unsqueezed = self.expand_dims(self.expand_dims(anchors, -1), -1) # Shape (na, 2, 1, 1) + anchor_offset_mesh = msnp.tile(anchors_unsqueezed, (nb, 1, 1, ngh, ngw)) # Shape (nb, na, 2, ngh, ngw) + anchor_mesh = self.concat((mesh, anchor_offset_mesh)) # Shape (nb, na, 4, ngh, ngw) + + anchor_mesh = anchor_mesh.transpose(0, 1, 3, 4, 2) + + delta = delta_map.reshape(-1, 4) + fg_anchor_list = anchor_mesh.reshape(-1, 4) + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:, 1], \ + fg_anchor_list[:, 2], fg_anchor_list[:, 3] + dx, dy, dw, dh = delta[:, 0], delta[:, 1], delta[:, 2], delta[:, 3] + gx = pw * dx + px + gy = ph * dy + py + gw = pw * self.exp(dw) + gh = ph * self.exp(dh) + + pred_list = self.stack1([gx, gy, gw, gh]) + + pred_map = pred_list.reshape(nb, na, ngh, ngw, 4) + + return pred_map + + +def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4): + """ + Removes detections with lower object confidence score than 'conf_thres' + Non-Maximum Suppression to further filter detections. + + Args: + prediction (np.array): All predictions from model output. + conf_thres (float): Threshold for confidence. + nms_thres (float): Threshold for iou into nms. + + Returns: + output (np.array): Predictions with shape (x1, y1, x2, y2, object_conf, class_score, class_pred) + """ + + output = [None for _ in range(len(prediction))] + for image_i, pred in enumerate(prediction): + # Filter out confidence scores below threshold + # Get score and class with highest confidence + + v = pred[:, 4] > conf_thres + v = np.squeeze(v.nonzero()) + if v.ndim == 0: + v = np.expand_dims(v, 0) + + pred = pred[v] + + # If none are remaining => process next image + npred = pred.shape[0] + if not npred: + continue + # From (center x, center y, width, height) to (x1, y1, x2, y2) + pred[:, :4] = xywh2xyxy(pred[:, :4]) + + # Non-maximum suppression + bboxes = np.concatenate((pred[:, :4], np.expand_dims(pred[:, 4], -1)), axis=1) + nms_indices = nms(bboxes, nms_thres) + det_max = pred[nms_indices] + + if det_max.size > 0: + # Add max detections to outputs + output[image_i] = det_max if output[image_i] is None else np.concatenate((output[image_i], det_max)) + + return output + + +def nms(dets, thresh): + """ + Non-maximum suppression with threshold. 
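+
+    dets is an (N, 5) array of (x1, y1, x2, y2, score); the returned list holds
+    the indices of the boxes to keep, visited in descending order of score.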
+ """ + x1 = dets[:, 0] + y1 = dets[:, 1] + x2 = dets[:, 2] + y2 = dets[:, 3] + scores = dets[:, 4] + + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + i = order[0] + keep.append(i) + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + ovr = inter / (areas[i] + areas[order[1:]] - inter) + + inds = np.where(ovr <= thresh)[0] + order = order[inds + 1] + + return keep + + +def ap_per_class(tp, conf, pred_cls, target_cls): + """ + Computes the average precision, given the recall and precision curves. + Method originally from https://github.com/rafaelpadilla/Object-Detection-Metrics. + + Args: + tp (list): True positives. + conf (list): Objectness value from 0-1. + pred_cls (np.array): Predicted object classes. + target_cls (np.array): True object classes. + + Returns: + ap (np.array): The average precision as computed in py-faster-rcnn. + unique classes (np.array): Classes of predictions. + r (np.array): Recall. + p (np.array): Precision. + """ + + # lists/pytorch to numpy + tp, conf, pred_cls, target_cls = np.array(tp), np.array(conf), np.array(pred_cls), np.array(target_cls) + + # Sort by objectness + i = np.argsort(-conf) + tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] + + # Find unique classes + unique_classes = np.unique(np.concatenate((pred_cls, target_cls), 0)) + + # Create Precision-Recall curve and compute AP for each class + ap, p, r = [], [], [] + for c in unique_classes: + i = pred_cls == c + n_gt = sum(target_cls == c) # Number of ground truth objects + n_p = sum(i) # Number of predicted objects + + if (n_p == 0) and (n_gt == 0): + continue + + if (n_p == 0) or (n_gt == 0): + ap.append(0) + r.append(0) + p.append(0) + else: + # Accumulate FPs and TPs + fpc = np.cumsum(1 - tp[i]) + tpc = np.cumsum(tp[i]) + + # Recall + recall_curve = tpc / (n_gt + 1e-16) + r.append(tpc[-1] / (n_gt + 1e-16)) + + # Precision + precision_curve = tpc / (tpc + fpc) + p.append(tpc[-1] / (tpc[-1] + fpc[-1])) + + # AP from recall-precision curve + ap.append(compute_ap(recall_curve, precision_curve)) + + return np.array(ap), unique_classes.astype('int32'), np.array(r), np.array(p) + + +def compute_ap(recall, precision): + """ + Computes the average precision, given the recall and precision curves. + Code originally from https://github.com/rbgirshick/py-faster-rcnn. + + Args: + recall (list): The recall curve. + precision (list): The precision curve. + + Returns: + ap (np.array): The average precision as computed in py-faster-rcnn. 
+ """ + + # correct AP calculation + # first append sentinel values at the end + mrec = np.concatenate(([0.], recall, [1.])) + mpre = np.concatenate(([0.], precision, [0.])) + + # compute the precision envelope + for i in range(mpre.size - 1, 0, -1): + mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) + + # to calculate area under PR curve, look for points + # where X axis (recall) changes value + i = np.where(mrec[1:] != mrec[:-1])[0] + + # and sum (\Delta recall) * prec + ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) + return ap diff --git a/research/cv/JDE/src/visualization.py b/research/cv/JDE/src/visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..7057b3c5fd9ea6a1a69d912f71edc17f1663459a --- /dev/null +++ b/research/cv/JDE/src/visualization.py @@ -0,0 +1,54 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Images visualization script.""" +import numpy as np +import cv2 + + +def get_color(idx): + """ + Set the color for unique pedestrian. + """ + idx = idx * 3 + color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255) + + return color + + +def plot_tracking(image, tlwhs, obj_ids, frame_id=0, fps=0., ids2=None): + """ + Show tracking results. + """ + im = np.ascontiguousarray(np.copy(image)) + + text_scale = max(1, image.shape[1] / 1600.) + text_thickness = 1 if text_scale > 1.1 else 1 + line_thickness = max(1, int(image.shape[1] / 500.)) + + cv2.putText(im, f'frame: {frame_id} fps: {fps:.2f} num: {len(tlwhs)}', + (0, int(15 * text_scale)), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), thickness=2) + + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + obj_id = int(obj_ids[i]) + id_text = f'{int(obj_id)}' + if ids2 is not None: + id_text = id_text + f', {int(ids2[i])}' + color = get_color(abs(obj_id)) + cv2.rectangle(im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness) + cv2.putText(im, id_text, (intbox[0], intbox[1] + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), + thickness=text_thickness) + return im diff --git a/research/cv/JDE/tracker/__init__.py b/research/cv/JDE/tracker/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/JDE/tracker/basetrack.py b/research/cv/JDE/tracker/basetrack.py new file mode 100644 index 0000000000000000000000000000000000000000..49c8c090ee98437902408f74375bcd8a69a74a5a --- /dev/null +++ b/research/cv/JDE/tracker/basetrack.py @@ -0,0 +1,71 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Init base track params.""" +from collections import OrderedDict + +import numpy as np + + +class TrackState: + new = 0 + tracked = 1 + lost = 2 + removed = 3 + + +class BaseTrack: + """ + Track class template. + """ + _count = 0 + + track_id = 0 + is_activated = False + state = TrackState.new + + history = OrderedDict() + features = [] + curr_feature = None + score = 0 + start_frame = 0 + frame_id = 0 + time_since_update = 0 + + # multi-camera + location = (np.inf, np.inf) + + @property + def end_frame(self): + return self.frame_id + + @staticmethod + def next_id(): + BaseTrack._count += 1 + return BaseTrack._count + + def activate(self, *args): + raise NotImplementedError + + def predict(self): + raise NotImplementedError + + def update(self, *args, **kwargs): + raise NotImplementedError + + def mark_lost(self): + self.state = TrackState.lost + + def mark_removed(self): + self.state = TrackState.removed diff --git a/research/cv/JDE/tracker/matching.py b/research/cv/JDE/tracker/matching.py new file mode 100644 index 0000000000000000000000000000000000000000..faf550f8f88e0a2d6cf1b2f31f43173c51480bd8 --- /dev/null +++ b/research/cv/JDE/tracker/matching.py @@ -0,0 +1,115 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Matching script.""" +import lap +import numpy as np +from cython_bbox import bbox_overlaps as bbox_ious +from scipy.spatial.distance import cdist + +from src import kalman_filter + + +def linear_assignment(cost_matrix, thresh): + """ + Linear assignment with threshold. + """ + if cost_matrix.size == 0: + out = ( + np.empty((0, 2), dtype=int), + tuple(range(cost_matrix.shape[0])), + tuple(range(cost_matrix.shape[1])), + ) + + return out + matches, unmatched_a, unmatched_b = [], [], [] + _, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh) + for ix, mx in enumerate(x): + if mx >= 0: + matches.append([ix, mx]) + unmatched_a = np.where(x < 0)[0] + unmatched_b = np.where(y < 0)[0] + matches = np.asarray(matches) + + return matches, unmatched_a, unmatched_b + + +def iou(atlbrs, btlbrs): + """ + Compute cost based on IoU. + """ + ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float) + if ious.size == 0: + return ious + + ious = bbox_ious( + np.ascontiguousarray(atlbrs, dtype=np.float), + np.ascontiguousarray(btlbrs, dtype=np.float) + ) + + return ious + + +def iou_distance(atracks, btracks): + """ + Compute cost based on IoU. 
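+
+    Accepts either lists of tracks (their tlbr boxes are used) or sequences of
+    raw (x1, y1, x2, y2) boxes, and returns the cost matrix 1 - IoU.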
+ """ + if (atracks and isinstance(atracks[0], np.ndarray)) or \ + (btracks and isinstance(btracks[0], np.ndarray)): + atlbrs = atracks + btlbrs = btracks + else: + atlbrs = [track.tlbr for track in atracks] + btlbrs = [track.tlbr for track in btracks] + + ious_val = iou(atlbrs, btlbrs) + cost_matrix = 1 - ious_val + + return cost_matrix + +def embedding_distance(tracks, detections): + """ + Compute embedding distance. + """ + cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float) + if cost_matrix.size == 0: + return cost_matrix + det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float) + track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float) + cost_matrix = np.maximum(0.0, cdist(track_features, det_features)) # Nomalized features + + return cost_matrix + + +def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98): + """ + Fuses motion objects. + """ + if cost_matrix.size == 0: + return cost_matrix + gating_dim = 2 if only_position else 4 + gating_threshold = kalman_filter.chi2inv95[gating_dim] + measurements = np.asarray([det.to_xyah() for det in detections]) + for row, track in enumerate(tracks): + gating_distance = kf.gating_distance( + track.mean, + track.covariance, + measurements, + only_position, + metric='maha', + ) + cost_matrix[row, gating_distance > gating_threshold] = np.inf + cost_matrix[row] = lambda_ * cost_matrix[row] + (1-lambda_)* gating_distance + + return cost_matrix diff --git a/research/cv/JDE/tracker/multitracker.py b/research/cv/JDE/tracker/multitracker.py new file mode 100644 index 0000000000000000000000000000000000000000..f712bb161b5cf8d80c853c44f97b1d55e910f2c8 --- /dev/null +++ b/research/cv/JDE/tracker/multitracker.py @@ -0,0 +1,447 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Multiple objects tracking.""" +from collections import deque + +import numpy as np + +from src.kalman_filter import KalmanFilter +from src.log import logger +from src.utils import non_max_suppression +from src.utils import scale_coords +from tracker import matching +from tracker.basetrack import BaseTrack, TrackState + + +class TrackS(BaseTrack): + """ + Compute stracks. + """ + def __init__(self, tlwh, score, temp_feat, buffer_size=30): + # wait activate + self._tlwh = np.asarray(tlwh, dtype=np.float) + self.kalman_filter = None + self.mean, self.covariance = None, None + self.is_activated = False + + self.score = score + self.tracklet_len = 0 + + self.smooth_feat = None + self.update_features(temp_feat) + self.features = deque([], maxlen=buffer_size) + self.alpha = 0.9 + + def update_features(self, feat): + """ + Update values. 
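+
+        The new embedding is L2-normalized and blended into smooth_feat with an
+        exponential moving average (alpha = 0.9); the smoothed feature is then
+        re-normalized.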
+ """ + feat /= np.linalg.norm(feat) + self.curr_feat = feat + if self.smooth_feat is None: + self.smooth_feat = feat + else: + self.smooth_feat = self.alpha * self.smooth_feat + (1 - self.alpha) * feat + self.features.append(feat) + self.smooth_feat /= np.linalg.norm(self.smooth_feat) + + def predict(self): + """ + Compute math distribution. + """ + mean_state = self.mean.copy() + if self.state != TrackState.tracked: + mean_state[7] = 0 + self.mean, self.covariance = self.kalman_filter.predict(mean_state, self.covariance) + + @staticmethod + def multi_predict(stracks, kalman_filter): + """ + Compute multi math distribution. + """ + if stracks: + multi_mean = np.asarray([st.mean.copy() for st in stracks]) + multi_covariance = np.asarray([st.covariance for st in stracks]) + for i, st in enumerate(stracks): + if st.state != TrackState.tracked: + multi_mean[i][7] = 0 + multi_mean, multi_covariance = kalman_filter.multi_predict(multi_mean, multi_covariance) + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + stracks[i].mean = mean + stracks[i].covariance = cov + + def activate(self, kalman_filter, frame_id): + """ + Start a new tracklet. + """ + self.kalman_filter = kalman_filter + self.track_id = self.next_id() + self.mean, self.covariance = self.kalman_filter.initiate(self.tlwh_to_xyah(self._tlwh)) + + self.tracklet_len = 0 + self.state = TrackState.tracked + self.frame_id = frame_id + self.start_frame = frame_id + + def re_activate(self, new_track, frame_id, new_id=False): + """ + Reactivate new tracks. + """ + self.mean, self.covariance = self.kalman_filter.update( + self.mean, + self.covariance, + self.tlwh_to_xyah(new_track.tlwh), + ) + + self.update_features(new_track.curr_feat) + self.tracklet_len = 0 + self.state = TrackState.tracked + self.is_activated = True + self.frame_id = frame_id + if new_id: + self.track_id = self.next_id() + + def update(self, new_track, frame_id, update_feature=True): + """ + Update a matched track. + + Args: + new_track (TrackS): New track frame. + frame_id (int): Number of current frame. + update_feature (bool): Update or not. + """ + self.frame_id = frame_id + self.tracklet_len += 1 + + new_tlwh = new_track.tlwh + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh)) + self.state = TrackState.tracked + self.is_activated = True + + self.score = new_track.score + if update_feature: + self.update_features(new_track.curr_feat) + + @property + def tlwh(self): + """ + Get current position in bounding box format + (top left x, top left y, width, height). + """ + if self.mean is None: + return self._tlwh.copy() + ret = self.mean[:4].copy() + ret[2] *= ret[3] + ret[:2] -= ret[2:] / 2 + return ret + + @property + def tlbr(self): + """ + Convert bounding box to format + (min x, min y, max x, max y), i.e., (top left, bottom right). + """ + ret = self.tlwh.copy() + ret[2:] += ret[:2] + return ret + + @staticmethod + def tlwh_to_xyah(tlwh): + """ + Convert bounding box to format + (center x, center y, aspect ratio, height), + where the aspect ratio is width / height. + """ + ret = np.asarray(tlwh).copy() + ret[:2] += ret[2:] / 2 + ret[2] /= ret[3] + return ret + + def to_xyah(self): + """ + Convert tlwh format to xyah. + """ + return self.tlwh_to_xyah(self.tlwh) + + @staticmethod + def tlbr_to_tlwh(tlbr): + """ + Convert tlbr format to tlwh. 
+ """ + ret = np.asarray(tlbr).copy() + ret[2:] -= ret[:2] + return ret + + @staticmethod + def tlwh_to_tlbr(tlwh): + """ + Convert tlwh format to tlbr. + """ + ret = np.asarray(tlwh).copy() + ret[2:] += ret[:2] + return ret + + def __repr__(self): + return f'OT_{self.track_id}_({self.start_frame}-{self.end_frame})' + + +class JDETracker: + """ + Compute track per frame and apply tracking. + """ + def __init__(self, opt, net, frame_rate=30): + self.opt = opt + + self.model = net + logger.info('Inference for: %s', opt.ckpt_url) + + self.tracked_stracks = [] # type: list[TrackS] + self.lost_stracks = [] # type: list[TrackS] + self.removed_stracks = [] # type: list[TrackS] + + self.frame_id = 0 + self.det_thresh = opt.conf_thres + self.buffer_size = int(frame_rate / 30.0 * opt.track_buffer) + self.max_time_lost = self.buffer_size + + self.kalman_filter = KalmanFilter() + + def tracking( + self, + activated_stracks, + refind_stracks, + lost_stracks, + removed_stracks, + unconfirmed, + tracked_stracks, + detections, + ): + """ + Apply tracking strategy. + """ + # Step 2: First association, with embedding. + # Combining currently tracked_stracks and lost_stracks + strack_pool = joint_stracks(tracked_stracks, self.lost_stracks) + # Predict the current location with kalman filter + TrackS.multi_predict(strack_pool, self.kalman_filter) + + # Compute distances of the detection with the tracks in strack_pool. + dists = matching.embedding_distance(strack_pool, detections) + dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections) + + matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7) + # The matches is the array for corresponding matches of the detection with the corresponding strack_pool. + + for itracked, idet in matches: + # itracked is the id of the track and idet is the detection + track = strack_pool[itracked] + det = detections[idet] + if track.state == TrackState.tracked: + # If the track is active, add the detection to the track + track.update(detections[idet], self.frame_id) + activated_stracks.append(track) + else: + # Detection from a track which is not active, hence put the track in refind_stracks list + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + + # Step 3: Second association, with IOU + detections = [detections[i] for i in u_detection] + # detections is now a list of the unmatched detections + r_tracked_stracks = [] # This is container for stracks which were tracked till the + # previous frame but no detection was found for it in the current frame + for i in u_track: + if strack_pool[i].state == TrackState.tracked: + r_tracked_stracks.append(strack_pool[i]) + dists = matching.iou_distance(r_tracked_stracks, detections) + matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5) + # matches is the list of detections which matched with corresponding tracks by IOU distance method + for itracked, idet in matches: + track = r_tracked_stracks[itracked] + det = detections[idet] + if track.state == TrackState.tracked: + track.update(det, self.frame_id) + activated_stracks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + # Same process done for some unmatched detections, but now considering IOU_distance as measure + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.lost: + track.mark_lost() + lost_stracks.append(track) + # If no detections are obtained for tracks (u_track), + # the tracks 
are added to lost_tracks list and are marked lost. + + # Deal with unconfirmed tracks, usually tracks with only one beginning frame + detections = [detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed, detections) + matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7) + for itracked, idet in matches: + unconfirmed[itracked].update(detections[idet], self.frame_id) + activated_stracks.append(unconfirmed[itracked]) + + # The tracks which are yet not matched + for it in u_unconfirmed: + track = unconfirmed[it] + track.mark_removed() + removed_stracks.append(track) + + # After all these confirmation steps, if a new detection is found, it is initialized for a new track + # Step 4: Init new stracks + for inew in u_detection: + track = detections[inew] + if track.score < self.det_thresh: + continue + track.activate(self.kalman_filter, self.frame_id) + activated_stracks.append(track) + + # Step 5: Update state + # If the tracks are lost for more frames than the threshold number, the tracks are removed. + for track in self.lost_stracks: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_stracks.append(track) + + # Update the self.tracked_stracks and self.lost_stracks using the updates in this step. + self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.tracked] + self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_stracks) + self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks) + self.lost_stracks.extend(lost_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks) + self.removed_stracks.extend(removed_stracks) + self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks) + + + def update(self, im_blob, img0): + """ + Processes the image frame and finds bounding box(detections). + + Associates the detection with corresponding tracklets + and also handles lost, removed, refound and active tracklets. + + Args: + im_blob (np.array): Tensor of image. By default, shape of this tensor is [1, 3, 608, 1088]. + img0 (np.array): Input image sequence. By default, shape is [608, 1080, 3]. + + Returns: + output_stracks (list of TrackS): Information regarding the online_tracklets for the received image tensor. + """ + self.frame_id += 1 + activated_stracks = [] # For storing active tracks, for the current frame. + refind_stracks = [] # Lost Tracks whose detections are obtained in the current frame. + lost_stracks = [] # The tracks which are not obtained in the current frame but are not removed. + removed_stracks = [] + unconfirmed = [] + tracked_stracks = [] # type: list[TrackS] + + # Step 1: Network forward, get detections & embeddings + _, pred = self.model.predict(im_blob) + + # Pred is tensor of all the proposals (default number of proposals: 54264). + # Proposals have information associated with the bounding box and embeddings. + pred = pred.asnumpy() + pred = pred[pred[:, :, 4] > self.opt.conf_thres] + # Pred now has lesser number of proposals. Proposals rejected on basis of object confidence score. + + if pred.size > 0: + dets = non_max_suppression(np.expand_dims(pred, 0), self.opt.conf_thres, self.opt.nms_thres)[0] + + # Final proposals are obtained in dets. Information of bounding box and embeddings also included. 
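+            # scale_coords below rescales the boxes in place from the padded
+            # network input size (opt.img_size) back to the original image resolution.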
+ # Next step changes the detection scales + scale_coords(self.opt.img_size, dets[:, :4], img0.shape).round() + + # Detections is list of (x1, y1, x2, y2, object_conf, class_score, class_pred) + # Class_pred is the embeddings. + detections = [TrackS(TrackS.tlbr_to_tlwh(tlbrs[:4]), tlbrs[4], f, 30) for + (tlbrs, f) in zip(dets[:, :5], dets[:, 6:])] + else: + detections = [] + + # Add newly detected tracklets to tracked_stracks + for track in self.tracked_stracks: + if not track.is_activated: + # previous tracks which are not active in the current frame are added in unconfirmed list + unconfirmed.append(track) + else: + # Active tracks are added to the local list 'tracked_stracks' + tracked_stracks.append(track) + + self.tracking( + activated_stracks, + refind_stracks, + lost_stracks, + removed_stracks, + unconfirmed, + tracked_stracks, + detections, + ) + + # get scores of lost tracks + output_stracks = [track for track in self.tracked_stracks if track.is_activated] + + return output_stracks + + +def joint_stracks(tlista, tlistb): + """ + Append stracks. + """ + exists = {} + res = [] + for t in tlista: + exists[t.track_id] = 1 + res.append(t) + for t in tlistb: + tid = t.track_id + if not exists.get(tid, 0): + exists[tid] = 1 + res.append(t) + return res + +def sub_stracks(tlista, tlistb): + """ + Delete stracks. + """ + stracks = {} + for t in tlista: + stracks[t.track_id] = t + for t in tlistb: + tid = t.track_id + if stracks.get(tid, 0): + del stracks[tid] + return list(stracks.values()) + +def remove_duplicate_stracks(stracksa, stracksb): + """ + Removes duplicate from stracks. + """ + pdist = matching.iou_distance(stracksa, stracksb) + pairs = np.where(pdist < 0.15) + dupa, dupb = [], [] + for p, q in zip(*pairs): + timep = stracksa[p].frame_id - stracksa[p].start_frame + timeq = stracksb[q].frame_id - stracksb[q].start_frame + if timep > timeq: + dupb.append(q) + else: + dupa.append(p) + resa = [t for i, t in enumerate(stracksa) if not i in dupa] + resb = [t for i, t in enumerate(stracksb) if not i in dupb] + return resa, resb diff --git a/research/cv/JDE/train.py b/research/cv/JDE/train.py new file mode 100644 index 0000000000000000000000000000000000000000..da70b37187a1fb6573c6841ec91f1d96cd4560d6 --- /dev/null +++ b/research/cv/JDE/train.py @@ -0,0 +1,253 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Train script.""" +import json + +import numpy as np +from mindspore import Model +from mindspore import context +from mindspore import dataset as ds +from mindspore import nn +from mindspore.common import set_seed +from mindspore.communication.management import get_group_size +from mindspore.communication.management import get_rank +from mindspore.communication.management import init +from mindspore.context import ParallelMode +from mindspore.dataset.vision import py_transforms as PY +from mindspore.train.callback import CheckpointConfig +from mindspore.train.callback import LossMonitor +from mindspore.train.callback import ModelCheckpoint +from mindspore.train.callback import TimeMonitor +from mindspore.train.serialization import load_checkpoint +from mindspore.train.serialization import load_param_into_net + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.dataset import JointDataset +from src.model import JDE +from src.model import YOLOv3 + +set_seed(1) + + +def lr_steps(cfg, steps_per_epoch): + """ + Init lr steps. + """ + learning_rate = warmup_lr( + cfg.lr, + steps_per_epoch, + cfg.epochs, + ) + + return learning_rate + + +def warmup_lr(lr5, steps_per_epoch, max_epoch): + """ + Set lr for training with warmup and freeze backbone. + + Args: + lr5 (float): Initialized learning rate. + steps_per_epoch (int): Num of steps per epoch on one device. + max_epoch (int): Num of training epochs. + + Returns: + lr_each_step (np.array): Lr for every step of training for model params. + """ + base_lr = lr5 + warmup_steps = 1000 + total_steps = int(max_epoch * steps_per_epoch) + milestone_1 = int(0.5 * max_epoch * steps_per_epoch) + milestone_2 = int(0.75 * max_epoch * steps_per_epoch) + + lr_each_step = [] + + for i in range(total_steps): + if i < warmup_steps: + lr5 = base_lr * ((i + 1) / warmup_steps) ** 4 + elif warmup_steps <= i < milestone_1: + lr5 = base_lr + elif milestone_1 <= i < milestone_2: + lr5 = base_lr * 0.1 + elif milestone_2 <= i: + lr5 = base_lr * 0.01 + + lr_each_step.append(lr5) + + lr_each_step = np.array(lr_each_step, dtype=np.float32) + + return lr_each_step + + +def set_context(cfg): + """ + Set process context. + + Args: + cfg: Config parameters. + + Returns: + dev_target (str): Device target platform. + dev_num (int): Amount of devices participating in process. + dev_id (int): Current process device id.. + """ + dev_target = cfg.device_target + context.set_context(mode=context.GRAPH_MODE, device_target=dev_target) + + if dev_target == 'GPU': + if cfg.is_distributed: + init(backend_name='nccl') + dev_num = get_group_size() + dev_id = get_rank() + context.reset_auto_parallel_context() + context.set_auto_parallel_context( + device_num=dev_num, + parallel_mode=ParallelMode.DATA_PARALLEL, + gradients_mean=True, + ) + else: + dev_num = 1 + dev_id = cfg.device_id + context.set_context(device_id=dev_id) + else: + raise ValueError("Unsupported platform.") + + return dev_num, dev_id + + +def init_callbacks(cfg, batch_number, dev_id): + """ + Initialize training callbacks. + + Args: + cfg: Config parameters. + batch_number: Number of batches into one epoch on one device. + dev_id: Current process device id. + + Returns: + cbs: Inited callbacks. 
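+
+    Note:
+        In distributed mode, only the device whose id equals cfg.device_start
+        saves checkpoints; the other devices get only loss and time monitors.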
+ """ + loss_cb = LossMonitor(per_print_times=100) + time_cb = TimeMonitor(data_size=batch_number) + + if cfg.is_distributed and dev_id != cfg.device_start: + cbs = [loss_cb, time_cb] + else: + config_ck = CheckpointConfig( + save_checkpoint_steps=batch_number, + keep_checkpoint_max=cfg.keep_checkpoint_max, + ) + + ckpt_cb = ModelCheckpoint( + prefix="JDE", + directory=cfg.logs_dir, + config=config_ck, + ) + + cbs = [loss_cb, time_cb, ckpt_cb] + + return cbs + + +if __name__ == "__main__": + config = default_config + device_target = config.device_target + + rank_size, rank_id = set_context(config) + + with open(config.data_cfg_url) as f: + data_config = json.load(f) + trainset_paths = data_config['train'] + + dataset = JointDataset( + config.dataset_root, + trainset_paths, + k_max=config.k_max, + augment=True, + transforms=PY.ToTensor(), + config=config, + ) + + dataloader = ds.GeneratorDataset( + dataset, + column_names=config.col_names_train, + shuffle=True, + num_parallel_workers=4, + num_shards=rank_size, + shard_id=rank_id, + max_rowsize=12, + python_multiprocessing=True, + ) + + dataloader = dataloader.batch(config.batch_size, True) + + batch_num = dataloader.get_dataset_size() + + # Initialize backbone + darknet53 = DarkNet( + ResidualBlock, + config.backbone_layers, + config.backbone_input_shape, + config.backbone_shape, + detect=True, + ) + + # Load weights into backbone + if config.ckpt_url is not None: + if config.ckpt_url.endswith(".ckpt"): + param_dict = load_checkpoint(config.ckpt_url) + else: + raise ValueError(f"Unsupported checkpoint extension: {config.ckpt_url}.") + + load_param_into_net(darknet53, param_dict) + print(f"Load pre-trained backbone from: {config.ckpt_url}") + else: + print("Start without pre-trained backbone.") + + # Initialize FPN with YOLOv3 head + yolov3 = YOLOv3( + backbone=darknet53, + backbone_shape=config.backbone_shape, + out_channel=config.out_channel, + ) + + # Initialize train model with loss cell + net = JDE(yolov3, default_config, dataset.nid, config.embedding_dim) + + # Initiate lr for training + lr = lr_steps(config, batch_num) + + params = net.trainable_params() + + # Set lr scheduler + group_params = [ + {'params': params, 'lr': lr}, + {'order_params': params}, + ] + + opt = nn.SGD( + params=group_params, + learning_rate=lr, + momentum=config.momentum, + weight_decay=config.decay, + ) + + model = Model(net, optimizer=opt) + + callbacks = init_callbacks(config, batch_num, rank_id) + + model.train(epoch=config.epochs, train_dataset=dataloader, callbacks=callbacks, dataset_sink_mode=False) + print("train success")