!1954 Models: JDE

Merge pull request !1954 from adenisov/models-pr-jde
# Dataset Zoo
Dataset preparation follows [Towards-Realtime-MOT](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md).
## Data Format
The root folder of datasets will have the following structure:
```text
.
└─datasets
├─Caltech
├─Cityscapes
├─CUHKSYSU
├─ETHZ
├─MOT16
├─MOT17
└─PRW
```
Every image has a corresponding annotation text. Given an image path,
the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box and has the following format:
```text
[class] [identity] [x_center] [y_center] [width] [height]
```
The field `[class]` should be `0`. Only single-class multi-object tracking is supported.
The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` if this box has no identity annotation.
- Note that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
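For example, a single annotation line for a pedestrian with identity `17` might look like this (values are illustrative):
```text
0 17 0.512 0.403 0.061 0.219
```
A minimal sketch of the path convention described above (the path itself is hypothetical):
```python
# Derive the label path from an image path, as described above (illustrative path).
image_path = "datasets/MOT17/images/train/MOT17-04-SDP/img1/000001.jpg"
label_path = image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")
print(label_path)  # datasets/MOT17/labels_with_ids/train/MOT17-04-SDP/img1/000001.txt
```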
## Download
### Caltech Pedestrian
Download all `set**.tar` archives from [this page](https://drive.google.com/drive/folders/1IBlcJP8YsCaT81LwQ2YwQJac8bf1q8xF?usp=sharing) and extract them to `Caltech/data`.
Download [annotations](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing) and unzip to `Caltech/data/labels_with_ids`.
Download [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to images.
Move the tool's `scripts` folder to the `Caltech` folder and run the command:
```bash
python scripts/convert_seqs.py
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─Caltech
└─data
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### CityPersons
Google Drive:
[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
Download the `.zip` archives from the links above and run the following commands.
```bash
zip -FF Citypersons --out c.zip
unzip c.zip
mv Citypersons Cityscapes
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─Cityscapes
├─images
│ ├─train
│ └─val
└─labels_with_ids
├─train
└─val
```
### CUHK-SYSU
Google Drive:
[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
Download the dataset, unzip it, and run the command below.
```bash
mv CUHK-SYSU CUHKSYSU
```
The structure of the dataset will be the following:
```text
.
└─CUHKSYSU
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### PRW
Google Drive:
[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─PRW
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### ETHZ (overlapping with MOT-16 removed)
Google Drive:
[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─ETHZ
├─eth01
│ ├─images
│ │ └─***
│ └─labels_with_ids
│ └─***
├─eth02
├─eth03
├─eth05
└─eth07
```
Note: `***` stands for the data files (images or annotations). Every `eth*` folder has the same structure.
### MOT-17
Official link:
[[0]](https://motchallenge.net/data/MOT17.zip)
Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
After downloading, unzip the archive and run the `prepare_mot17.py` script from the `data` folder:
```bash
python data/prepare_mot17.py --seq_root /path/to/MOT17/train
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─MOT17
├─images
│ └─train
└─labels_with_ids
└─train
```
### MOT-16 (for evaluation)
Google Drive:
[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
Download link: [MOT-16.zip](https://motchallenge.net/data/MOT16.zip)
> See the "Download" section at the bottom of the web page, link "Get all data".
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─MOT16
└─train
```
# Data config
Download the [schemas](https://github.com/Zhongdao/Towards-Realtime-MOT/tree/master/data) of the training data (text files with relative paths for every image, split into train/val parts) and move them into the `data` folder.
```text
.
└── data
├─ caltech.10k.val
├─ caltech.train
├─ caltech.val
├─ citypersons.train
├─ citypersons.val
├─ cuhksysu.train
├─ cuhksysu.val
├─ eth.train
├─ mot17.train
├─ prw.train
└─ prw.val
```
# Citation
Caltech:
```text
@inproceedings{ dollarCVPR09peds,
author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
title = "Pedestrian Detection: A Benchmark",
booktitle = "CVPR",
month = "June",
year = "2009",
city = "Miami",
}
```
Citypersons:
```text
@INPROCEEDINGS{Shanshan2017CVPR,
author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
booktitle = {CVPR},
year = {2017}
}
@INPROCEEDINGS{Cordts2016Cityscapes,
title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016}
}
```
CUHK-SYSU:
```text
@inproceedings{xiaoli2017joint,
title={Joint Detection and Identification Feature Learning for Person Search},
author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
booktitle={CVPR},
year={2017}
}
```
PRW:
```text
@inproceedings{zheng2017person,
title={Person re-identification in the wild},
author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={1367--1376},
year={2017}
}
```
ETHZ:
```text
@InProceedings{eth_biwi_00534,
author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
title = {A Mobile Vision System for Robust Multi-Person Tracking},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
year = {2008},
month = {June},
publisher = {IEEE Press},
}
```
MOT-16&17:
```text
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
```
# Contents
- [Contents](#contents)
    - [JDE Description](#jde-description)
    - [Model Architecture](#model-architecture)
    - [Dataset](#dataset)
    - [Environment Requirements](#environment-requirements)
    - [Quick Start](#quick-start)
    - [Script Description](#script-description)
        - [Script and Sample Code](#script-and-sample-code)
        - [Script Parameters](#script-parameters)
        - [Training Process](#training-process)
            - [Standalone Training](#standalone-training)
            - [Distribute Training](#distribute-training)
        - [Evaluation Process](#evaluation-process)
            - [Evaluation](#evaluation)
        - [Inference Process](#inference-process)
            - [Usage](#usage)
            - [Result](#result)
    - [Model Description](#model-description)
        - [Performance](#performance)
            - [Training Performance](#training-performance)
            - [Evaluation Performance](#evaluation-performance)
    - [ModelZoo Homepage](#modelzoo-homepage)
## [JDE Description](#contents)
The paper introducing the JDE model is dedicated to improving the efficiency of an MOT system.
It presents an early attempt to jointly learn the Detector and Embedding model (JDE) in a single-shot deep network.
In other words, the proposed JDE employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes.
In comparison, SDE methods and two-stage methods are characterized by re-sampled pixels (bounding boxes) and feature maps, respectively.
Both the bounding boxes and feature maps are fed into a separate re-ID model for appearance feature extraction.
The method runs in near real time while being almost as accurate as the SDE methods.
[Paper](https://arxiv.org/pdf/1909.12605.pdf): Towards Real-Time Multi-Object Tracking. Department of Electronic Engineering, Tsinghua University
## [Model Architecture](#contents)
The architecture of JDE is based on the Feature Pyramid Network (FPN).
FPN makes predictions from multiple scales, thus bringing improvement in pedestrian detection where the scale of targets varies a lot.
An input video frame first undergoes a forward pass through a backbone network to obtain feature maps at three scales, namely, scales with 1/32, 1/16 and 1/8 down-sampling rate, respectively.
Then, the feature map with the smallest size (also the semantically strongest features) is up-sampled and fused with the feature map from the second smallest scale by skip connection, and the same goes for the other scales.
Finally, prediction heads are added upon fused feature maps at all the three scales.
A prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A + D) × H × W, where A is the number of anchor templates assigned to this scale, and D is the dimension of the embedding.
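With the configuration used in this repository (4 anchor templates per scale, i.e. the 12 `anchor_scales` split over 3 feature maps, and `embedding_dim: 512`), this gives 536 output channels per prediction head; a quick sanity-check sketch:
```python
# Sanity check of the prediction-head depth described above.
# Values are taken from default_config.yaml; A is derived from the 12 anchors over 3 scales.
A = 4            # anchor templates assigned to one scale
D = 512          # embedding dimension (embedding_dim)
num_classes = 1  # pedestrians only
det_channels = A * (num_classes + 5)  # 24, matches out_channel in default_config.yaml
head_channels = 6 * A + D             # (6A + D) = 536 channels of the dense prediction map
print(det_channels, head_channels)    # 24 536
```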
## [Dataset](#contents)
The model is trained on a large-scale training set built by putting together six publicly available datasets on pedestrian detection, MOT and person search.
These datasets can be categorized into two types: ones that only contain bounding box annotations, and ones that have both bounding box and identity annotations.
The first category includes the ETH dataset and the CityPersons (CP) dataset. The second category includes the CalTech (CT) dataset, MOT16 (M16) dataset, CUHK-SYSU (CS) dataset and PRW dataset.
Training subsets of all these datasets are gathered to form the joint training set, and videos in the ETH dataset that overlap with the MOT-16 test set are excluded for fair evaluation.
Dataset preparation is described in [DATASET_ZOO.md](DATASET_ZOO.md).
Dataset size: 134 GB, 1 object category (pedestrian).
Note: `--dataset_root` is used as an entry point for all datasets, used for training and evaluating this model.
Organize your dataset structure as follows:
```text
.
└─dataset_root/
├─Caltech/
├─Cityscapes/
├─CUHKSYSU/
├─ETHZ/
├─MOT16/
├─MOT17/
└─PRW/
```
Statistics of the train parts of the datasets:
| Dataset | ETH | CP | CT | M16 | CS | PRW | Total |
| :------:|:---:|:---:|:---:|:---:|:---:|:---:|:-----:|
| # img   |2K   |3K   |27K  |5.3K |11K  |6K   |54K    |
| # box   |17K  |21K  |46K  |112K |55K  |18K  |270K   |
| # ID    |-    |-    |0.6K |0.5K |7K   |0.5K |8.7K   |
## [Environment Requirements](#contents)
- Hardware(GPU)
- Prepare hardware environment with GPU processor.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
## [Quick Start](#contents)
After installing MindSpore through the official website, you can follow the steps below for training and evaluation.
In particular, before training you need to install the requirements with `pip install -r requirements.txt`.
> If an error occurs, update pip with `pip install --upgrade pip` and try again.
> If that does not help, install the packages manually with `pip install {package from requirements.txt}`.
Note: PyTorch is used only for checkpoint conversion.
All trainings start from a pre-trained backbone:
[download](https://drive.google.com/file/d/1keZwVIfcWmxfTiswzOKUwkUz2xjvTvfm/view) the backbone pre-trained on
ImageNet and convert it with the commands below:
```bash
# From the root model directory run
python -m src.convert_checkpoint --ckpt_url [PATH_TO_PYTORCH_CHECKPOINT]
```
- PATH_TO_PYTORCH_CHECKPOINT - Path to the downloaded darknet53 PyTorch checkpoint.
After converting the checkpoint and installing the requirements, you can run the training scripts:
```bash
# Run standalone training example
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
# Run distribute training example
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - Device ID (for the standalone training example)
- DEVICE_NUM - Number of devices (for the distributed training example)
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
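For example, a standalone run on device 0 could look like this (all paths are illustrative):
```bash
bash scripts/run_standalone_train_gpu.sh 0 ./logs ./darknet53.ckpt /data/datasets
```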
## [Script Description](#contents)
### [Script and Sample Code](#contents)
```text
.
└─JDE
├─data
│ └─prepare_mot17.py # MOT17 data preparation script
├─cfg
│ ├─ccmcpe.json # paths to dataset schema (defining relative paths structure)
│ └─config.py # parameter parser
├─scripts
│ ├─run_distribute_train_gpu.sh # launch distribute train on GPU
│ ├─run_eval_gpu.sh # launch evaluation on GPU
│ └─run_standalone_train_gpu.sh # launch standalone train on GPU
├─src
│ ├─__init__.py
│ ├─convert_checkpoint.py # backbone checkpoint converter (torch to mindspore)
│ ├─darknet.py # backbone of network
│ ├─dataset.py # create dataset
│ ├─evaluation.py # motmetrics evaluator
│ ├─io.py # MOT evaluation utils
│ ├─kalman_filter.py # kalman filter script
│ ├─log.py # logger script
│ ├─model.py # create model script
│ ├─timer.py # timer script
│ ├─utils.py # utilities used in other scripts
│ └─visualization.py # visualization for inference
├─tracker
│ ├─__init__.py
│ ├─basetrack.py # base class for tracking
│ ├─matching.py # matching for tracking script
│ └─multitracker.py # tracker init script
├─DATASET_ZOO.md # dataset preparing description
├─README.md
├─default_config.yaml # default configs
├─eval.py # evaluation script
├─eval_detect.py # detector evaluation script
├─export.py # export to MINDIR script
├─infer.py # inference script
├─requirements.txt
└─train.py # training script
```
### [Script Parameters](#contents)
```text
Parameters in config.py and default_config.yaml.
Include arguments for Train/Evaluation/Inference.
--config_path Path to default_config.yaml with hyperparameters and defaults
--data_cfg_url Path to .json with paths to datasets schemas
--momentum Momentum for SGD optimizer
--decay Weight_decay for SGD optimizer
--lr Init learning rate
--epochs Number of epochs to train
--batch_size Batch size per one device
--num_classes Number of object classes
--k_max Max predictions per one map (made for optimization of FC layer embedding computation)
--img_size Size of input images
--track_buffer Tracking buffer
--keep_checkpoint_max Keep saved last N checkpoints
--backbone_input_shape Input filters of backbone layers
--backbone_shape Input filters of backbone layers
--backbone_layers Output filters of backbone layers
--out_channel Number of channels for detection
--embedding_dim Number of channels for embeddings
--iou_thres IOU thresholds
--conf_thres Confidence threshold
--nms_thres Threshold for Non-max suppression
--min_box_area Filter out tiny boxes
--anchor_scales 12 predefined anchor boxes, 4 different anchors for each of the 3 feature maps
--col_names_train Names of columns for training GeneratorDataset
--col_names_val Names of columns for validation GeneratorDataset
--is_distributed Distribute training or not
--dataset_root Path to datasets root folder
--device_target Device GPU or any
--device_id Device id of target device
--device_start Start device id
--ckpt_url Location of checkpoint
--logs_dir Dir to save logs and ckpt
--input_video Path to the input video
--output_format Expected output format
--output_root Expected output root path
--save_images Save tracking results (image)
--save_videos Save tracking results (video)
```
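Since the parser in `cfg/config.py` exposes every key of `default_config.yaml` as a command-line flag, any of these options can be overridden at launch time; for example (values are illustrative):
```bash
python train.py --lr=0.00125 --epochs=30 --dataset_root=/data/datasets --ckpt_url=./darknet53.ckpt --logs_dir=./logs
```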
### [Training Process](#contents)
#### Standalone Training
Note: all trainings require the pre-trained DarkNet53 backbone.
```bash
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - device ID
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
The above command will run in the background; you can view the results through the generated standalone_train.log file.
After training, you can find the training loss and time logs in the chosen LOGS_CKPT_DIR.
The model checkpoints will also be saved in the LOGS_CKPT_DIR directory.
#### Distribute Training
```bash
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_NUM - Number of devices to use for the distributed training.
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
The above shell script will run the distributed training in the background.
Here is an example of the training logs:
```text
epoch: 30 step: 1612, loss is -4.7679796
epoch: 30 step: 1612, loss is -5.816874
epoch: 30 step: 1612, loss is -5.302864
epoch: 30 step: 1612, loss is -5.775913
epoch: 30 step: 1612, loss is -4.9537477
epoch: 30 step: 1612, loss is -4.3535285
epoch: 30 step: 1612, loss is -5.0773625
epoch: 30 step: 1612, loss is -4.2019467
epoch time: 2023042.925 ms, per step time: 1209.954 ms
epoch time: 2023069.500 ms, per step time: 1209.970 ms
epoch time: 2023097.331 ms, per step time: 1209.986 ms
epoch time: 2023038.221 ms, per step time: 1209.951 ms
epoch time: 2023098.113 ms, per step time: 1209.987 ms
epoch time: 2023093.300 ms, per step time: 1209.984 ms
epoch time: 2023078.631 ms, per step time: 1209.975 ms
epoch time: 2017509.966 ms, per step time: 1206.645 ms
train success
train success
train success
train success
train success
train success
train success
train success
```
### [Evaluation Process](#contents)
#### Evaluation
The tracking ability of the model is tested on the train part of the MOT16 dataset (not used during training).
To start tracker evaluation run the command below.
```bash
bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
> Note: the script expects that the DATASET_ROOT directory contains the MOT16 sub-folder.
The above command will run in the background. The validation logs will be saved in `eval.log`.
For more details about `motmetrics`, you can refer to [MOT benchmark](https://motchallenge.net/).
```text
DATE-DATE-DATE TIME:TIME:TIME [INFO]: Time elapsed: 240.54 seconds, FPS: 22.04
IDF1 IDP IDR Rcll Prcn GT MT PT ML FP FN IDs FM MOTA MOTP IDt IDa IDm
MOT16-02 45.1% 49.9% 41.2% 71.0% 86.0% 54 17 31 6 2068 5172 425 619 57.0% 0.215 239 68 14
MOT16-04 69.5% 75.5% 64.3% 80.6% 94.5% 83 45 24 14 2218 9234 175 383 75.6% 0.184 98 28 3
MOT16-05 63.6% 68.1% 59.7% 82.0% 93.7% 125 67 49 9 376 1226 137 210 74.5% 0.203 113 40 40
MOT16-09 55.2% 60.4% 50.8% 78.1% 92.9% 25 16 8 1 316 1152 108 147 70.0% 0.187 76 15 11
MOT16-10 57.1% 59.9% 54.5% 80.1% 88.1% 54 28 26 0 1337 2446 376 569 66.2% 0.228 202 66 16
MOT16-11 75.0% 76.4% 73.7% 89.6% 92.9% 69 50 16 3 626 953 78 137 81.9% 0.159 49 24 12
MOT16-13 64.8% 69.9% 60.3% 78.5% 90.9% 107 58 43 6 900 2463 272 528 68.3% 0.223 200 59 48
OVERALL 63.2% 68.1% 58.9% 79.5% 91.8% 517 281 197 39 7841 22646 1571 2593 71.0% 0.196 977 300 144
```
To evaluate the detection ability of the model (mAP, precision and recall metrics), run the command below.
```bash
python eval_detect.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --dataset_root [DATASET_ROOT]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
Evaluation results will be printed to the command line.
```text
Image Total P R mAP
4000 30353 0.829 0.778 0.765 0.426s
8000 30353 0.863 0.798 0.788 0.42s
12000 30353 0.854 0.815 0.802 0.419s
16000 30353 0.857 0.821 0.809 0.582s
20000 30353 0.865 0.834 0.824 0.413s
24000 30353 0.868 0.841 0.832 0.415s
28000 30353 0.874 0.839 0.83 0.419s
mean_mAP: 0.8225, mean_R: 0.8325, mean_P: 0.8700
```
### [Inference Process](#contents)
#### Usage
To compile a video from the frames with predicted bounding boxes, you need to install `ffmpeg`
(e.g. `sudo apt-get install ffmpeg`). Video compilation happens automatically.
```bash
python infer.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --input_video [INPUT_VIDEO]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- INPUT_VIDEO - Path to the input video for tracking.
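For example (checkpoint and video paths are illustrative):
```bash
python infer.py --device_id 0 --ckpt_url ./logs/jde.ckpt --input_video ./videos/demo.mp4
```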
#### Result
Inference results will be saved into the default `./results` folder; logs will be shown in the command line.
## [Model Description](#contents)
### [Performance](#contents)
#### Training Performance
| Parameters | GPU (8p) |
| -------------------------- |----------------------------------------------------------------------------------- |
| Model | JDE (1088*608) |
| Hardware | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| Upload Date | 02/02/2022 (day/month/year) |
| MindSpore Version | 1.5.0 |
| Dataset | Joint Dataset (see `DATASET_ZOO.md`) |
| Training Parameters | epoch=30, batch_size=4 (per device), lr=0.01, momentum=0.9, weight_decay=0.0001 |
| Optimizer | SGD |
| Loss Function | SmoothL1Loss, SoftmaxCrossEntropyWithLogits (and apply auto-balancing loss strategy)|
| Outputs | Tensor of bbox coords, conf, class, emb |
| Speed | Eight cards: ~1206 ms/step |
| Total time | Eight cards: ~17 hours |
#### Evaluation Performance
| Parameters | GPU (1p) |
| ------------------- |--------------------------------------------------------|
| Model | JDE (1088*608) |
| Resource | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| Upload Date | 02/02/2022 (day/month/year) |
| MindSpore Version | 1.5.0 |
| Dataset | MOT-16 |
| Batch_size | 1 |
| Outputs | Metrics, .txt predictions |
| FPS | 22.04 |
| Metrics | mAP 82.2, MOTA 71.0% |
## [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/models).
{
"train":
{
"mot17":"./data/mot17.train",
"caltech":"./data/caltech.train",
"citypersons":"./data/citypersons.train",
"cuhksysu":"./data/cuhksysu.train",
"prw":"./data/prw.train",
"eth":"./data/eth.train"
},
"test_emb":
{
"caltech":"./data/caltech.10k.val",
"cuhksysu":"./data/cuhksysu.val",
"prw":"./data/prw.val"
},
"test":
{
"caltech":"./data/caltech.val",
"citypersons":"./data/citypersons.val"
}
}
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import argparse
import ast
from pathlib import Path
from pprint import pformat
import yaml
class Config:
"""
Configuration namespace, convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser (argparse.ArgumentParser): Parent parser.
cfg (dict): Base configuration.
helper (dict): Helper description.
choices (dict): Choices.
"""
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else f"Please reference to {cfg_path}"
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs_raw = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = []
for cf in cfgs_raw:
cfgs.append(cf)
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
        except yaml.YAMLError as err:
            raise ValueError("Failed to parse yaml") from err
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args (argparse.Namespace): Command line arguments.
cfg (dict): Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
curr_dir = Path(__file__).resolve().parent
parser = argparse.ArgumentParser(description="JDE config", add_help=False)
parser.add_argument("--config_path", type=str, default=str(curr_dir / "../default_config.yaml"),
help="Path to config.")
parser.add_argument("--data_cfg_url", type=str, default=str(curr_dir / "ccmcpe.json"),
help="Path to data config.")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices)
final_config = merge(args, default)
return Config(final_config)
config = get_config()
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Prepare data."""
import argparse
import os
import os.path as osp
import shutil
from pathlib import Path
import numpy as np
def prepare(seq_root):
"""Prepare MOT17 dataset for JDE training."""
label_root = str(Path(Path(seq_root).parents[0], 'labels_with_ids', 'train'))
seqs = [s for s in os.listdir(seq_root) if s.endswith('SDP')]
tid_curr = 0
tid_last = -1
for seq in seqs:
with open(osp.join(seq_root, seq, 'seqinfo.ini')) as file:
seq_info = file.read()
seq_width = int(seq_info[seq_info.find('imWidth=') + 8: seq_info.find('\nimHeight')])
seq_height = int(seq_info[seq_info.find('imHeight=') + 9: seq_info.find('\nimExt')])
gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt')
gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',')
seq_label_root = osp.join(label_root, seq, 'img1')
if not osp.exists(seq_label_root):
os.makedirs(seq_label_root)
for fid, tid, x, y, w, h, mark, label, _ in gt:
if mark == 0 or not label == 1:
continue
fid = int(fid)
tid = int(tid)
if tid != tid_last:
tid_curr += 1
tid_last = tid
x += w / 2
y += h / 2
label_fpath = osp.join(seq_label_root, '{:06d}.txt'.format(fid))
label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
tid_curr, x / seq_width, y / seq_height, w / seq_width, h / seq_height)
with open(label_fpath, 'a') as f:
f.write(label_str)
old_path = str(Path(seq_root, seq))
new_path = str(Path(Path(seq_root).parents[0], 'images', 'train'))
if not osp.exists(new_path):
os.makedirs(new_path)
shutil.move(old_path, new_path)
print('Done')
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--seq_root", required=True, help='Path to root dir of sequences')
args = parser.parse_args()
prepare(args.seq_root)
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
# hyperparameters of training
momentum: 0.9
decay: 0.0001
lr: 0.01
epochs: 30
batch_size: 4
# other
num_classes: 1
k_max: 250
img_size: [1088, 608]
track_buffer: 30
keep_checkpoint_max: 6
# model initialization parameters
backbone_input_shape: [32, 64, 128, 256, 512]
backbone_shape: [64, 128, 256, 512, 1024]
backbone_layers: [1, 2, 8, 8, 4]
out_channel: 24 # 4 * (num_classes + 5)
embedding_dim: 512
# evaluation thresholds
iou_thres: 0.50
conf_thres: 0.55
nms_thres: 0.45
min_box_area: 200
# h -> w
anchor_scales: [
[8, 24],
[11, 34],
[16, 48],
[23, 68],
[32, 96],
[45, 135],
[64, 192],
[90, 271],
[128, 384],
[180, 540],
[256, 640],
[512, 640],
]
# data configs
col_names_train: [
'imgs',
'tconf_s',
'tbox_s',
'tid_s',
'tconf_m',
'tbox_m',
'tid_m',
'tconf_b',
'tbox_b',
'tid_b',
'emb_indices_s',
'emb_indices_m',
'emb_indices_b',
]
col_names_val: [
'imgs',
'targets',
'lens',
]
# other
is_distributed: False
dataset_root: '/path/to/datasets/root/folder/'
device_target: 'GPU'
device_id: 0
device_start: 0
ckpt_url: '/path/to/checkpoint'
logs_dir: './logs'
input_video: '/path/to/input/video'
output_format: 'video'
output_root: './results'
save_images: False
save_videos: False
---
# Config description for each option
momentum: 'Momentum for SGD optimizer.'
decay: 'Weight_decay for SGD optimizer.'
lr: 'Init learning rate.'
epochs: 'Number of epochs to train.'
batch_size: 'Batch size per one device'
num_classes: 'Number of object classes.'
k_max: 'Max predictions per one map (made for optimization of FC layer embedding computation).'
img_size: 'Size of input images.'
track_buffer: 'Tracking buffer.'
keep_checkpoint_max: 'Keep saved last N checkpoints.'
backbone_input_shape: 'Input filters of backbone layers.'
backbone_shape: 'Input filters of backbone layers.'
backbone_layers: 'Output filters of backbone layers.'
out_channel: 'Number of channels for detection.'
embedding_dim: 'Number of channels for embeddings.'
iou_thres: 'IOU thresholds.'
conf_thres: 'Confidence threshold.'
nms_thres: 'Threshold for Non-max suppression.'
min_box_area: 'Filter out tiny boxes.'
anchor_scales: '12 predefined anchor boxes, 4 different anchors for each of the 3 feature maps.'
col_names_train: 'Names of columns for training GeneratorDataset.'
col_names_val: 'Names of columns for validation GeneratorDataset.'
is_distributed: 'Distribute training or not.'
dataset_root: 'Path to datasets root folder.'
device_target: 'Device GPU or any.'
device_id: 'Device id of target device.'
device_start: 'Start device id.'
ckpt_url: 'Location of checkpoint.'
logs_dir: 'Dir to save logs and ckpt.'
input_video: 'Path to the input video.'
output_format: 'Expected output format.'
output_root: 'Expected output root path.'
save_images: 'Save tracking results (image).'
save_videos: 'Save tracking results (video).'
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Tracker evaluation script."""
import logging
import os
import os.path as osp
import cv2
import motmetrics as mm
import numpy as np
from mindspore import Model
from mindspore import Tensor
from mindspore import context
from mindspore import dtype as mstype
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from src import visualization as vis
from src.darknet import DarkNet, ResidualBlock
from src.dataset import LoadImages
from src.evaluation import Evaluator
from src.log import logger
from src.model import JDEeval
from src.model import YOLOv3
from src.timer import Timer
from src.utils import mkdir_if_missing
from tracker.multitracker import JDETracker
_MOT16_VALIDATION_FOLDERS = (
'MOT16-02',
'MOT16-04',
'MOT16-05',
'MOT16-09',
'MOT16-10',
'MOT16-11',
'MOT16-13',
)
_MOT16_DIR_FOR_TEST = 'MOT16/train'
def write_results(filename, results, data_type):
"""
Format for evaluation results.
"""
if data_type == 'mot':
save_format = '{frame},{id},{x1},{y1},{w},{h},1,-1,-1,-1\n'
elif data_type == 'kitti':
save_format = '{frame} {id} pedestrian 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n'
else:
raise ValueError(data_type)
with open(filename, 'w') as f:
for frame_id, tlwhs, track_ids in results:
if data_type == 'kitti':
frame_id -= 1
for tlwh, track_id in zip(tlwhs, track_ids):
if track_id < 0:
continue
x1, y1, w, h = tlwh
x2, y2 = x1 + w, y1 + h
line = save_format.format(frame=frame_id, id=track_id, x1=x1, y1=y1, x2=x2, y2=y2, w=w, h=h)
f.write(line)
logger.info('Save results to %s', filename)
def eval_seq(
opt,
dataloader,
data_type,
result_filename,
net,
save_dir=None,
frame_rate=30,
):
"""
    Process the given video sequence and write the tracking results
    to the result file (and, optionally, annotated frames).
    It uses the JDE model to obtain information about the online targets.
Args:
opt (Any): Contains information passed as commandline arguments.
dataloader (Any): Fetching the image sequence and associated data.
        data_type (str): Type of the dataset corresponding to the given video.
result_filename (str): The name(path) of the file for storing results.
net (nn.Cell): Model.
save_dir (str): Path to output results.
frame_rate (int): Frame-rate of the given video.
Returns:
        frame_id (int): Number of processed frames.
        average_time (float): Average time per frame.
        calls (int): Number of timer calls.
"""
if save_dir:
mkdir_if_missing(save_dir)
tracker = JDETracker(opt, net=net, frame_rate=frame_rate)
timer = Timer()
results = []
frame_id = 0
timer.tic()
timer.toc()
timer.calls -= 1
for img, img0 in dataloader:
if frame_id % 20 == 0:
log_info = f'Processing frame {frame_id} ({(1. / max(1e-5, timer.average_time)):.2f} fps)'
logger.info('%s', log_info)
# except initialization step at time calculation
if frame_id != 0:
timer.tic()
im_blob = Tensor(np.expand_dims(img, 0), mstype.float32)
online_targets = tracker.update(im_blob, img0)
online_tlwhs = []
online_ids = []
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
vertical = tlwh[2] / tlwh[3] > 1.6
if tlwh[2] * tlwh[3] > opt.min_box_area and not vertical:
online_tlwhs.append(tlwh)
online_ids.append(tid)
if frame_id != 0:
timer.toc()
# save results
results.append((frame_id + 1, online_tlwhs, online_ids))
if save_dir is not None:
online_im = vis.plot_tracking(
img0,
online_tlwhs,
online_ids,
frame_id=frame_id,
fps=1. / timer.average_time,
)
cv2.imwrite(os.path.join(save_dir, f'{frame_id:05}.jpg'), online_im)
frame_id += 1
# save results
write_results(result_filename, results, data_type)
return frame_id, timer.average_time, timer.calls - 1
def main(
opt,
data_root,
seqs,
exp_name,
save_videos=False,
):
logger.setLevel(logging.INFO)
result_root = os.path.join(data_root, '..', 'results', exp_name)
mkdir_if_missing(result_root)
data_type = 'mot'
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
model = Model(model)
# Run tracking
n_frame = 0
timer_avgs, timer_calls, accs = [], [], []
for seq in seqs:
output_dir = os.path.join(data_root, '..', 'outputs', exp_name, seq) if save_videos else None
logger.info('start seq: %s', seq)
dataloader = LoadImages(osp.join(data_root, seq, 'img1'), opt.anchor_scales, opt.img_size)
result_filename = os.path.join(result_root, f'{seq}.txt')
with open(os.path.join(data_root, seq, 'seqinfo.ini')) as f:
meta_info = f.read()
frame_rate = int(meta_info[meta_info.find('frameRate') + 10:meta_info.find('\nseqLength')])
nf, ta, tc = eval_seq(
opt,
dataloader,
data_type,
result_filename,
net=model,
save_dir=output_dir,
frame_rate=frame_rate,
)
n_frame += nf
timer_avgs.append(ta)
timer_calls.append(tc)
# eval
logger.info('Evaluate seq: %s', seq)
evaluator = Evaluator(data_root, seq, data_type)
accs.append(evaluator.eval_file(result_filename))
if save_videos:
output_video_path = osp.join(output_dir, f'{seq}.mp4')
cmd_str = f'ffmpeg -f image2 -i {output_dir}/%05d.jpg -c:v copy {output_video_path}'
os.system(cmd_str)
timer_avgs = np.asarray(timer_avgs)
timer_calls = np.asarray(timer_calls)
all_time = np.dot(timer_avgs, timer_calls)
avg_time = all_time / np.sum(timer_calls)
log_info = f'Time elapsed: {all_time:.2f} seconds, FPS: {(1.0 / avg_time):.2f}'
logger.info('%s', log_info)
# Get summary
metrics = mm.metrics.motchallenge_metrics
mh = mm.metrics.create()
summary = Evaluator.get_summary(accs, seqs, metrics)
strsummary = mm.io.render_summary(
summary,
formatters=mh.formatters,
namemap=mm.io.motchallenge_metric_names
)
print(strsummary)
Evaluator.save_summary(summary, os.path.join(result_root, f'summary_{exp_name}.xlsx'))
if __name__ == '__main__':
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target='GPU')
context.set_context(device_id=config.device_id)
data_root_path = os.path.join(config.dataset_root, _MOT16_DIR_FOR_TEST)
if not os.path.isdir(data_root_path):
raise NotADirectoryError(
f'Cannot find "{_MOT16_DIR_FOR_TEST}" subdirectory '
f'in the specified dataset root "{config.dataset_root}"'
)
main(
config,
data_root=data_root_path,
seqs=_MOT16_VALIDATION_FOLDERS,
exp_name=config.ckpt_url.split('/')[-2],
save_videos=config.save_videos,
)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Evaluation script."""
import json
import time
import numpy as np
from mindspore import Model
from mindspore import context
from mindspore import dataset as ds
from mindspore.common import set_seed
from mindspore.communication.management import get_group_size
from mindspore.communication.management import get_rank
from mindspore.dataset.vision import py_transforms as PY
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from src.darknet import DarkNet, ResidualBlock
from src.dataset import JointDatasetDetection
from src.model import JDEeval
from src.model import YOLOv3
from src.utils import ap_per_class
from src.utils import bbox_iou
from src.utils import non_max_suppression
from src.utils import xywh2xyxy
set_seed(1)
def _get_rank_info(device_target):
"""
Get rank size and rank id.
"""
if device_target == 'GPU':
rank_size = get_group_size()
rank_id = get_rank()
else:
raise ValueError("Unsupported platform.")
return rank_size, rank_id
def main(
opt,
iou_thres,
conf_thres,
nms_thres,
nc,
):
img_size = opt.img_size
with open(opt.data_cfg_url) as f:
data_config = json.load(f)
test_paths = data_config['test']
dataset = JointDatasetDetection(
opt.dataset_root,
test_paths,
augment=False,
transforms=PY.ToTensor(),
config=opt,
)
dataloader = ds.GeneratorDataset(
dataset,
column_names=opt.col_names_val,
shuffle=False,
num_parallel_workers=1,
max_rowsize=12,
)
dataloader = dataloader.batch(opt.batch_size, True)
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
print(f'Evaluation for {opt.ckpt_url}')
model = Model(model)
mean_map, mean_r, mean_p, seen = 0.0, 0.0, 0.0, 0
print('%11s' * 5 % ('Image', 'Total', 'P', 'R', 'mAP'))
maps, mr, mp = [], [], []
ap_accum, ap_accum_count = np.zeros(nc), np.zeros(nc)
for batch_i, inputs in enumerate(dataloader):
imgs, targets, targets_len = inputs
targets = targets.asnumpy()
targets_len = targets_len.asnumpy()
t = time.time()
raw_output, _ = model.predict(imgs)
output = non_max_suppression(raw_output.asnumpy(), conf_thres=conf_thres, nms_thres=nms_thres)
for i, o in enumerate(output):
if o is not None:
output[i] = o[:, :6]
# Compute average precision for each sample
targets = [targets[i][:int(l)] for i, l in enumerate(targets_len)]
for labels, detections in zip(targets, output):
seen += 1
if detections is None:
# If there are labels but no detections mark as zero ap
if labels.shape[0] != 0:
maps.append(0)
mr.append(0)
mp.append(0)
continue
# Get detections sorted by decreasing confidence scores
detections = detections[np.argsort(-detections[:, 4])]
# If no labels add number of detections as incorrect
correct = []
if labels.shape[0] == 0:
maps.append(0)
mr.append(0)
mp.append(0)
continue
target_cls = labels[:, 0]
# Extract target boxes as (x1, y1, x2, y2)
target_boxes = xywh2xyxy(labels[:, 2:6])
target_boxes[:, 0] *= img_size[0]
target_boxes[:, 2] *= img_size[0]
target_boxes[:, 1] *= img_size[1]
target_boxes[:, 3] *= img_size[1]
detected = []
for *pred_bbox, _, _ in detections:
obj_pred = 0
pred_bbox = np.array(pred_bbox, dtype=np.float32).reshape(1, -1)
# Compute iou with target boxes
iou = bbox_iou(pred_bbox, target_boxes, x1y1x2y2=True)[0]
# Extract index of largest overlap
best_i = np.argmax(iou)
# If overlap exceeds threshold and classification is correct mark as correct
if iou[best_i] > iou_thres and obj_pred == labels[best_i, 0] and best_i not in detected:
correct.append(1)
detected.append(best_i)
else:
correct.append(0)
# Compute Average Precision (ap) per class
ap, ap_class, r, p = ap_per_class(
tp=correct,
conf=detections[:, 4],
pred_cls=np.zeros_like(detections[:, 5]), # detections[:, 6]
target_cls=target_cls,
)
# Accumulate AP per class
ap_accum_count += np.bincount(ap_class, minlength=nc)
ap_accum += np.bincount(ap_class, minlength=nc, weights=ap)
# Compute mean AP across all classes in this image, and append to image list
maps.append(ap.mean())
mr.append(r.mean())
mp.append(p.mean())
# Means of all images
mean_map = np.sum(maps) / (ap_accum_count + 1E-16)
mean_r = np.sum(mr) / (ap_accum_count + 1E-16)
mean_p = np.sum(mp) / (ap_accum_count + 1E-16)
if (batch_i + 1) % 1000 == 0:
# Print image mAP and running mean mAP
print(('%11s%11s' + '%11.3g' * 4 + 's') %
(seen, dataset.nf, mean_p, mean_r, mean_map, time.time() - t))
# Print results
print(f'mean_mAP: {mean_map[0]:.4f}, mean_R: {mean_r[0]:.4f}, mean_P: {mean_p[0]:.4f}')
if __name__ == "__main__":
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
context.set_context(device_id=config.device_id)
main(
opt=config,
iou_thres=0.5,
conf_thres=0.3,
nms_thres=0.45,
nc=config.num_classes,
)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""run export"""
from pathlib import Path
import numpy as np
from mindspore import Tensor
from mindspore import context
from mindspore import dtype as mstype
from mindspore import load_checkpoint
from mindspore.train.serialization import export
from cfg.config import config as default_config
from src.darknet import DarkNet, ResidualBlock
from src.model import JDEeval
from src.model import YOLOv3
def run_export(config):
"""
Export model to MINDIR.
"""
darknet53 = DarkNet(
ResidualBlock,
config.backbone_layers,
config.backbone_input_shape,
config.backbone_shape,
detect=True,
)
yolov3 = YOLOv3(
backbone=darknet53,
backbone_shape=config.backbone_shape,
out_channel=config.out_channel,
)
net = JDEeval(yolov3, default_config)
load_checkpoint(config.ckpt_url, net)
net.set_train(False)
input_data = Tensor(np.zeros([1, 3, 1088, 608]), dtype=mstype.float32)
name = Path(config.ckpt_url).stem
export(net, input_data, file_name=name, file_format='MINDIR')
print('Model exported successfully!')
if __name__ == "__main__":
context.set_context(
mode=context.GRAPH_MODE,
device_target=default_config.device_target,
device_id=default_config.device_id,
)
run_export(default_config)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Inference script."""
import logging
import os
import os.path as osp
from mindspore import Model
from mindspore import context
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from eval import eval_seq
from src.darknet import DarkNet, ResidualBlock
from src.dataset import LoadVideo
from src.log import logger
from src.model import JDEeval
from src.model import YOLOv3
from src.utils import mkdir_if_missing
logger.setLevel(logging.INFO)
def track(opt):
"""
Inference of the input video.
Save the results into output-root (video, annotations and frames.).
"""
result_root = opt.output_root if opt.output_root != '' else '.'
mkdir_if_missing(result_root)
anchors = opt.anchor_scales
dataloader = LoadVideo(
opt.input_video,
anchor_scales=anchors,
img_size=opt.img_size,
)
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
model = Model(model)
logger.info('Starting tracking...')
result_filename = os.path.join(result_root, 'results.txt')
frame_rate = dataloader.frame_rate
frame_dir = None if opt.output_format == 'text' else osp.join(result_root, 'frame')
try:
eval_seq(
opt,
dataloader,
'mot',
result_filename,
net=model,
save_dir=frame_dir,
frame_rate=frame_rate,
)
except TypeError as e:
logger.info(e)
if opt.output_format == 'video':
output_video_path = osp.join(result_root, 'result.mp4')
cmd_str = f"ffmpeg -f image2 -i {osp.join(result_root, 'frame')}/%05d.jpg -c:v copy {output_video_path}"
os.system(cmd_str)
if __name__ == '__main__':
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target='GPU')
context.set_context(device_id=config.device_id)
track(config)
PyYAML
opencv-python>=4.5.5.62
motmetrics>=1.2.0
scipy>=1.7.2
lap>=0.4.0
Cython
cython-bbox>=0.1.3
torch
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 4 ]]; then
echo "Usage: bash ./scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]"
exit 1;
fi
export RANK_SIZE=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
LOGS_CKPT_DIR="$2"
if [ ! -d "$LOGS_CKPT_DIR" ]; then
mkdir "$LOGS_CKPT_DIR"
mkdir "$LOGS_CKPT_DIR/training_configs"
fi
DATASET_ROOT=$(get_real_path "$4")
CKPT_URL=$(get_real_path "$3")
cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs
cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs
mpirun -n $1 --allow-run-as-root\
python train.py \
--device_target="GPU" \
--logs_dir="$LOGS_CKPT_DIR" \
--dataset_root="$DATASET_ROOT" \
--ckpt_url="$CKPT_URL" \
--is_distributed=True \
> ./"$LOGS_CKPT_DIR"/distribute_train.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 3 ]]; then
echo "Usage: bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]"
exit 1;
fi
export CUDA_VISIBLE_DEVICES=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
CKPT_URL=$(get_real_path "$2")
DATASET_ROOT=$(get_real_path "$3")
if [ ! -d "$DATASET_ROOT" ]; then
echo "The specified dataset root is not a directory: $DATASET_ROOT"
exit 1;
fi
if [ ! -f "$CKPT_URL" ]; then
echo "The specified checkpoint does not exist: $CKPT_URL"
exit 1;
fi
python ./eval.py \
--device_target="GPU" \
--device_id=0 \
--ckpt_url="$CKPT_URL" \
--dataset_root="$DATASET_ROOT" \
> ./eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 4 ]]; then
echo "Usage: bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]"
exit 1
fi
export CUDA_VISIBLE_DEVICES=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
LOGS_CKPT_DIR="$2"
if [ ! -d "$LOGS_CKPT_DIR" ]; then
mkdir "$LOGS_CKPT_DIR"
mkdir "$LOGS_CKPT_DIR/training_configs"
fi
DATASET_ROOT=$(get_real_path "$4")
CKPT_URL=$(get_real_path "$3")
cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs
cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs
python ./train.py \
--device_target="GPU" \
--device_id=0 \
--logs_dir="$LOGS_CKPT_DIR" \
--dataset_root="$DATASET_ROOT" \
--ckpt_url="$CKPT_URL" \
--lr=0.00125 \
> ./"$2"/standalone_train.log 2>&1 &
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Checkpoint import."""
from pathlib import Path
import torch
from mindspore import Parameter
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore import save_checkpoint
from cfg.config import config
from src.darknet import DarkNet
from src.darknet import ResidualBlock
def convert(cfg):
"""
    Initialize the DarkNet53 model, load the PyTorch checkpoint,
    reorder the keys to match the MindSpore ordering, and save the
    converted checkpoint with parameter names corresponding to the
    initialized DarkNet model.
Args:
cfg: Config parameters.
Note:
Convert weights without last FC layer.
"""
darknet53 = DarkNet(
ResidualBlock,
cfg.backbone_layers,
cfg.backbone_input_shape,
cfg.backbone_shape,
detect=True,
)
# Get MindSpore names of parameters
ms_keys = list(darknet53.parameters_dict().keys())
# Get PyTorch weights and names
pt_weights = torch.load(cfg.ckpt_url, map_location=torch.device('cpu'))['state_dict']
pt_keys = list(pt_weights.keys())
# Remove redundant keys
pt_keys_clear = [
key
for key in pt_keys
if not key.endswith('tracked')
]
# One layer consist of 5 parameters
# Arrange PyTorch keys as well as in MindSpore
pt_keys_aligned = []
for block_num in range(len(pt_keys_clear[:-2]) // 5):
layer = pt_keys_clear[block_num * 5:(block_num + 1) * 5]
pt_keys_aligned.append(layer[0]) # Conv weight
pt_keys_aligned.append(layer[3]) # BN moving mean
pt_keys_aligned.append(layer[4]) # BN moving var
pt_keys_aligned.append(layer[1]) # BN gamma
pt_keys_aligned.append(layer[2]) # BN beta
ms_checkpoint = []
for key_ms, key_pt in zip(ms_keys, pt_keys_aligned):
weight = Parameter(Tensor(pt_weights[key_pt].numpy(), mstype.float32))
ms_checkpoint.append({'name': key_ms, 'data': weight})
checkpoint_name = str(Path(cfg.ckpt_url).resolve().parent / 'darknet53.ckpt')
save_checkpoint(ms_checkpoint, checkpoint_name)
print(f'Checkpoint converted successfully! Location {checkpoint_name}')
if __name__ == '__main__':
if not Path(config.ckpt_url).exists():
        raise FileNotFoundError(f'Expected a path to the PyTorch checkpoint, but did not find it at "{config.ckpt_url}"')
convert(config)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""DarkNet model."""
from mindspore import nn
from mindspore.ops import operations as P
def conv_block(
in_channels,
out_channels,
kernel_size,
stride,
dilation=1,
):
"""
Set a conv2d, BN and relu layer.
"""
pad_mode = 'same'
padding = 0
dbl = nn.SequentialCell(
[
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
pad_mode=pad_mode,
),
nn.BatchNorm2d(out_channels, momentum=0.1),
nn.ReLU(),
]
)
return dbl
class ResidualBlock(nn.Cell):
"""
DarkNet V1 residual block definition.
Args:
in_channels (int): Input channel.
out_channels (int): Output channel.
Returns:
out (ms.Tensor): Output tensor.
Examples:
ResidualBlock(3, 32)
"""
def __init__(
self,
in_channels,
out_channels,
):
super().__init__()
out_chls = out_channels//2
self.conv1 = conv_block(in_channels, out_chls, kernel_size=1, stride=1)
self.conv2 = conv_block(out_chls, out_channels, kernel_size=3, stride=1)
self.add = P.Add()
def construct(self, x):
identity = x
out = self.conv1(x)
out = self.conv2(out)
out = self.add(out, identity)
return out
class DarkNet(nn.Cell):
"""
DarkNet V1 network.
Args:
block (cell): Block for network.
layer_nums (list): Numbers of different layers.
in_channels (list): Input channel.
out_channels (list): Output channel.
        detect (bool): Whether to return multi-scale feature maps for detection. Default: False.
    Returns:
        if detect = True:
            c7, c9, c11 (ms.Tensor): Outputs from different layers (FPN) with strides 8, 16 and 32.
        if detect = False:
            c11 (ms.Tensor): Output from the last layer (stride 32).
Examples:
DarkNet(
ResidualBlock,
[1, 2, 8, 8, 4],
[32, 64, 128, 256, 512],
[64, 128, 256, 512, 1024],
)
"""
def __init__(
self,
block,
layer_nums,
in_channels,
out_channels,
detect=False,
):
super().__init__()
self.detect = detect
if not len(layer_nums) == len(in_channels) == len(out_channels) == 5:
raise ValueError("the length of layer_num, inchannel, outchannel list must be 5!")
self.conv0 = conv_block(
3,
in_channels[0],
kernel_size=3,
stride=1,
)
self.conv1 = conv_block(
in_channels[0],
out_channels[0],
kernel_size=3,
stride=2,
)
self.layer1 = self._make_layer(
block,
layer_nums[0],
in_channel=out_channels[0],
out_channel=out_channels[0],
)
self.conv2 = conv_block(
in_channels[1],
out_channels[1],
kernel_size=3,
stride=2,
)
self.layer2 = self._make_layer(
block,
layer_nums[1],
in_channel=out_channels[1],
out_channel=out_channels[1],
)
self.conv3 = conv_block(
in_channels[2],
out_channels[2],
kernel_size=3,
stride=2,
)
self.layer3 = self._make_layer(
block,
layer_nums[2],
in_channel=out_channels[2],
out_channel=out_channels[2],
)
self.conv4 = conv_block(
in_channels[3],
out_channels[3],
kernel_size=3,
stride=2,
)
self.layer4 = self._make_layer(
block,
layer_nums[3],
in_channel=out_channels[3],
out_channel=out_channels[3],
)
self.conv5 = conv_block(
in_channels[4],
out_channels[4],
kernel_size=3,
stride=2,
)
self.layer5 = self._make_layer(
block,
layer_nums[4],
in_channel=out_channels[4],
out_channel=out_channels[4],
)
def _make_layer(self, block, layer_num, in_channel, out_channel):
"""
Make Layer for DarkNet.
Args:
block (Cell): DarkNet block.
layer_num (int): Layer number.
in_channel (int): Input channel.
out_channel (int): Output channel.
Examples:
_make_layer(ConvBlock, 1, 128, 256)
"""
layers = []
darkblk = block(in_channel, out_channel)
layers.append(darkblk)
for _ in range(1, layer_num):
darkblk = block(out_channel, out_channel)
layers.append(darkblk)
return nn.SequentialCell(layers)
def construct(self, x):
"""
Feed forward image.
"""
c1 = self.conv0(x)
c2 = self.conv1(c1)
c3 = self.layer1(c2)
c4 = self.conv2(c3)
c5 = self.layer2(c4)
c6 = self.conv3(c5)
c7 = self.layer3(c6)
c8 = self.conv4(c7)
c9 = self.layer4(c8)
c10 = self.conv5(c9)
c11 = self.layer5(c10)
if self.detect:
return c7, c9, c11
return c11
def darknet53():
"""
Get DarkNet53 neural network.
Returns:
Cell, cell instance of DarkNet53 neural network.
Examples:
darknet53()
"""
darknet = DarkNet(
block=ResidualBlock,
layer_nums=[1, 2, 8, 8, 4],
in_channels=[32, 64, 128, 256, 512],
out_channels=[64, 128, 256, 512, 1024],
)
return darknet
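# Minimal usage sketch (shapes are illustrative; assumes a configured MindSpore context):
#   import numpy as np
#   from mindspore import Tensor
#   net = darknet53()
#   x = Tensor(np.zeros((1, 3, 608, 1088), dtype=np.float32))
#   c11 = net(x)  # stride-32 feature map, e.g. (1, 1024, 19, 34) for a 608x1088 input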
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Dataloader script."""
import math
import os
import os.path as osp
import random
from collections import OrderedDict
from pathlib import Path
import cv2
import numpy as np
from src.utils import build_thresholds
from src.utils import create_anchors_vec
from src.utils import xyxy2xywh
class LoadImages:
"""
Loader for inference.
Args:
        path (str): Path to the directory containing images.
        anchor_scales (list): Anchor scales used to build the anchor grid.
        img_size (list): Size of the output image.
Returns:
img (np.array): Processed image.
img0 (np.array): Original image.
"""
def __init__(self, path, anchor_scales, img_size=(1088, 608)):
path = Path(path)
if not path.is_dir():
raise NotADirectoryError(f'Expected a path to the directory with images, got "{path}"')
self.files = sorted(path.glob('*.jpg'))
self.anchors, self.strides = create_anchors_vec(anchor_scales)
self.nf = len(self.files) # Number of img files.
self.width = img_size[0]
self.height = img_size[1]
self.count = 0
        assert self.nf > 0, f'No images found in {path}'
def __iter__(self):
self.count = -1
return self
def __next__(self):
self.count += 1
if self.count == self.nf:
raise StopIteration
img_path = str(self.files[self.count])
# Read image
img0 = cv2.imread(img_path) # BGR
assert img0 is not None, 'Failed to load ' + img_path
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __getitem__(self, idx):
idx = idx % self.nf
        img_path = str(self.files[idx])
# Read image
img0 = cv2.imread(img_path) # BGR
assert img0 is not None, 'Failed to load ' + img_path
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __len__(self):
return self.nf # number of files
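# Usage sketch (the path is illustrative; anchor_scales comes from the config):
#   loader = LoadImages('/path/to/frames', config.anchor_scales, img_size=(1088, 608))
#   for img, img0 in loader:
#       ...  # img is the letterboxed, normalized CHW array; img0 is the original BGR frame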
class LoadVideo:
"""
Video loader for inference.
Args:
        path (str): Path to the video file.
        anchor_scales (list): Anchor scales used to build the anchor grid.
        img_size (tuple): Size of the output image.
    Returns:
        img (np.array): Processed image.
img0 (np.array): Original image.
"""
def __init__(self, path, anchor_scales, img_size=(1088, 608)):
if not os.path.isfile(path):
            raise FileNotFoundError(f'Expected a path to a video file, got "{path}"')
self.cap = cv2.VideoCapture(path)
self.frame_rate = int(round(self.cap.get(cv2.CAP_PROP_FPS)))
self.vw = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH))
self.vh = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
self.vn = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))
self.anchors, self.strides = create_anchors_vec(anchor_scales)
self.width = img_size[0]
self.height = img_size[1]
self.count = 0
self.w, self.h = self.get_size(self.vw, self.vh, self.width, self.height)
        print(f'Length of the video: {self.vn:d} frames')
def get_size(self, vw, vh, dw, dh):
wa, ha = float(dw) / vw, float(dh) / vh
a = min(wa, ha)
return int(vw * a), int(vh * a)
def __iter__(self):
self.count = -1
return self
def __next__(self):
self.count += 1
if self.count == len(self):
raise StopIteration
# Read image
_, img0 = self.cap.read() # BGR
assert img0 is not None, f'Failed to load frame {self.count:d}'
img0 = cv2.resize(img0, (self.w, self.h))
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __len__(self):
return self.vn # number of files
class JointDataset:
"""
Loader for all datasets.
Args:
root (str): Absolute path to datasets.
paths (dict): Relative paths for datasets.
img_size (list): Size of output image.
augment (bool): Augment images or not.
transforms: Transform methods.
config (class): Config with hyperparameters.
Returns:
        imgs (np.array): Prepared image. Shape (C, H, W).
        tconf (s, m, b) (np.array): Mask with bg (0), gt (1) and ignore (-1) indices. Shape (nA, nGh, nGw).
        tbox (s, m, b) (np.array): Target delta bbox values. Shape (nA, nGh, nGw, 4).
        tid (s, m, b) (np.array): Grid with an id for every cell. Shape (nA, nGh, nGw).
"""
def __init__(
self,
root,
paths,
img_size=(1088, 608),
k_max=200,
augment=False,
transforms=None,
config=None,
):
self.img_files = OrderedDict()
self.label_files = OrderedDict()
self.tid_num = OrderedDict()
self.tid_start_index = OrderedDict()
self.config = config
self.anchors, self.strides = create_anchors_vec(config.anchor_scales)
self.k_max = k_max
        # Iterate over all datasets to prepare the paths to the labels
for ds, img_path in paths.items():
with open(img_path, 'r') as file:
self.img_files[ds] = file.readlines()
self.img_files[ds] = [osp.join(root, x.strip()) for x in self.img_files[ds]]
self.img_files[ds] = list(filter(lambda x: len(x) > 0, self.img_files[ds]))
self.label_files[ds] = [
x.replace('images', 'labels_with_ids').replace('.png', '.txt').replace('.jpg', '.txt')
for x in self.img_files[ds]]
# Search for max pedestrian id in dataset
for ds, label_paths in self.label_files.items():
max_index = -1
for lp in label_paths:
lb = np.loadtxt(lp)
if lb.shape[0] < 1:
continue
if lb.ndim < 2:
img_max = lb[1]
else:
img_max = np.max(lb[:, 1])
if img_max > max_index:
max_index = img_max
self.tid_num[ds] = max_index + 1
last_index = 0
for k, v in self.tid_num.items():
self.tid_start_index[k] = last_index
last_index += v
self.nid = int(last_index + 1)
self.nds = [len(x) for x in self.img_files.values()]
self.cds = [sum(self.nds[:i]) for i in range(len(self.nds))]
self.nf = sum(self.nds)
self.width = img_size[0]
self.height = img_size[1]
self.augment = augment
self.transforms = transforms
print('=' * 40)
print('dataset summary')
print(self.tid_num)
print('total # identities:', self.nid)
print('start index')
print(self.tid_start_index)
print('=' * 40)
def get_data(self, img_path, label_path):
"""
Get and prepare data (augment img).
"""
height = self.height
width = self.width
img = cv2.imread(img_path) # BGR
if img is None:
raise ValueError(f'File corrupt {img_path}')
augment_hsv = True
if self.augment and augment_hsv:
# SV augmentation by 50%
fraction = 0.50
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
s = img_hsv[:, :, 1].astype(np.float32)
v = img_hsv[:, :, 2].astype(np.float32)
a = (random.random() * 2 - 1) * fraction + 1
s *= a
if a > 1:
np.clip(s, a_min=0, a_max=255, out=s)
a = (random.random() * 2 - 1) * fraction + 1
v *= a
if a > 1:
np.clip(v, a_min=0, a_max=255, out=v)
img_hsv[:, :, 1] = s.astype(np.uint8)
img_hsv[:, :, 2] = v.astype(np.uint8)
cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
h, w, _ = img.shape
img, ratio, padw, padh = letterbox(img, height=height, width=width)
# Load labels
if os.path.isfile(label_path):
labels0 = np.loadtxt(label_path, dtype=np.float32).reshape(-1, 6)
# Normalized xywh to pixel xyxy format
labels = labels0.copy()
labels[:, 2] = ratio * w * (labels0[:, 2] - labels0[:, 4] / 2) + padw
labels[:, 3] = ratio * h * (labels0[:, 3] - labels0[:, 5] / 2) + padh
labels[:, 4] = ratio * w * (labels0[:, 2] + labels0[:, 4] / 2) + padw
labels[:, 5] = ratio * h * (labels0[:, 3] + labels0[:, 5] / 2) + padh
else:
            labels = np.zeros((0, 6), dtype=np.float32)  # keep a consistent (N, 6) shape for later padding
# Augment image and labels
if self.augment:
img, labels, _ = random_affine(img, labels, degrees=(-5, 5), translate=(0.10, 0.10), scale=(0.50, 1.20))
nlbls = len(labels)
if nlbls > 0:
# convert xyxy to xywh
labels[:, 2:6] = xyxy2xywh(labels[:, 2:6].copy()) # / height
labels[:, 2] /= width
labels[:, 3] /= height
labels[:, 4] /= width
labels[:, 5] /= height
if self.augment:
# random left-right flip
lr_flip = True
            if lr_flip and (random.random() > 0.5):
img = np.fliplr(img)
if nlbls > 0:
labels[:, 2] = 1 - labels[:, 2]
img = np.ascontiguousarray(img[:, :, ::-1]) # BGR to RGB
if self.transforms is not None:
img = self.transforms(img)
return img, labels, img_path
def __getitem__(self, files_index):
"""
Iterator function for train dataset
"""
for i, c in enumerate(self.cds):
if files_index >= c:
ds = list(self.label_files.keys())[i]
start_index = c
img_path = self.img_files[ds][files_index - start_index]
label_path = self.label_files[ds][files_index - start_index]
imgs, labels, img_path = self.get_data(img_path, label_path)
for i, _ in enumerate(labels):
if labels[i, 1] > -1:
labels[i, 1] += self.tid_start_index[ds]
        # Graph mode in MindSpore requires constant shapes,
        # so the targets are padded to the maximum possible number of objects per image.
to_fill = 100 - labels.shape[0]
padding = np.zeros((to_fill, 6), dtype=np.float32)
labels = np.concatenate((labels, padding), axis=0)
# Calculate confidence mask, bbox delta and ids for every map size
small, medium, big = build_thresholds(
labels=labels,
anchor_vec_s=self.anchors[0],
anchor_vec_m=self.anchors[1],
anchor_vec_b=self.anchors[2],
k_max=self.k_max,
)
tconf_s, tbox_s, tid_s, emb_indices_s = small
tconf_m, tbox_m, tid_m, emb_indices_m = medium
tconf_b, tbox_b, tid_b, emb_indices_b = big
total_values = (
imgs.astype(np.float32),
tconf_s,
tbox_s,
tid_s,
tconf_m,
tbox_m,
tid_m,
tconf_b,
tbox_b,
tid_b,
emb_indices_s,
emb_indices_m,
emb_indices_b,
)
return total_values
def __len__(self):
        return self.nf  # number of samples in all datasets
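# Typical wiring sketch (column names and flags are assumptions, not taken from train.py):
#   import mindspore.dataset as ds
#   dataset = JointDataset(root, paths, img_size=(1088, 608), augment=True, config=config)
#   loader = ds.GeneratorDataset(dataset, column_names=[...], shuffle=True)
#   loader = loader.batch(batch_size)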
class JointDatasetDetection(JointDataset):
"""
Joint dataset for evaluation.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __getitem__(self, files_index):
"""
Iterator function for train dataset.
"""
for i, c in enumerate(self.cds):
if files_index >= c:
ds = list(self.label_files.keys())[i]
start_index = c
img_path = self.img_files[ds][files_index - start_index]
label_path = self.label_files[ds][files_index - start_index]
imgs, labels, img_path = self.get_data(img_path, label_path)
for i, _ in enumerate(labels):
if labels[i, 1] > -1:
labels[i, 1] += self.tid_start_index[ds]
targets_size = labels.shape[0]
        # Graph mode in MindSpore requires constant shapes,
        # so the targets are padded to the maximum possible number of objects per image.
to_fill = 100 - labels.shape[0]
padding = np.zeros((to_fill, 6), dtype=np.float32)
labels = np.concatenate((labels, padding), axis=0)
output = (imgs.astype(np.float32), labels, targets_size)
return output
def letterbox(
img,
height=608,
width=1088,
color=(127.5, 127.5, 127.5),
):
"""
Resize a rectangular image to a padded rectangular
and fill padded border with color.
"""
shape = img.shape[:2] # shape = [height, width]
ratio = min(float(height) / shape[0], float(width) / shape[1])
new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height]
dw = (width - new_shape[0]) / 2 # width padding
dh = (height - new_shape[1]) / 2 # height padding
top, bottom = round(dh - 0.1), round(dh + 0.1)
left, right = round(dw - 0.1), round(dw + 0.1)
img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # padded rectangular
return img, ratio, dw, dh
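# Worked example (values are illustrative): a 1080x1920 (HxW) frame letterboxed to 608x1088:
#   ratio = min(608 / 1080, 1088 / 1920) ~= 0.563, so the frame is resized to 1081x608,
#   dw = (1088 - 1081) / 2 = 3.5 and dh = 0, i.e. 3 px are padded on the left and 4 px on the right.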
def random_affine(
img,
targets=None,
degrees=(-10, 10),
translate=(.1, .1),
scale=(.9, 1.1),
shear=(-2, 2),
border_value=(127.5, 127.5, 127.5),
):
"""
Apply several data augmentation techniques,
such as random rotation, random scale, color jittering
to reduce overfitting.
Every rotation and scaling and etc.
is also applied to targets bbox cords.
"""
border = 0 # width of added border (optional)
height = img.shape[0]
width = img.shape[1]
# Rotation and Scale
r = np.eye(3)
a = random.random() * (degrees[1] - degrees[0]) + degrees[0]
s = random.random() * (scale[1] - scale[0]) + scale[0]
r[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s)
# Translation
t = np.eye(3)
t[0, 2] = (random.random() * 2 - 1) * translate[0] * img.shape[0] + border # x translation (pixels)
t[1, 2] = (random.random() * 2 - 1) * translate[1] * img.shape[1] + border # y translation (pixels)
# Shear
s = np.eye(3)
s[0, 1] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # x shear (deg)
s[1, 0] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # y shear (deg)
    m = s @ t @ r  # Combined affine matrix (applied right-to-left: rotation/scale, then translation, then shear). ORDER IS IMPORTANT HERE!
imw = cv2.warpPerspective(img, m, dsize=(width, height), flags=cv2.INTER_LINEAR,
borderValue=border_value) # BGR order borderValue
# Return warped points also
if targets is not None:
if targets.shape[0] > 0:
n = targets.shape[0]
points = targets[:, 2:6].copy()
area0 = (points[:, 2] - points[:, 0]) * (points[:, 3] - points[:, 1])
# warp points
xy = np.ones((n * 4, 3))
xy[:, :2] = points[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
xy = (xy @ m.T)[:, :2].reshape(n, 8)
# create new boxes
x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
# apply angle-based reduction
radians = a * math.pi / 180
reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5
x = (xy[:, 2] + xy[:, 0]) / 2
y = (xy[:, 3] + xy[:, 1]) / 2
w = (xy[:, 2] - xy[:, 0]) * reduction
h = (xy[:, 3] - xy[:, 1]) * reduction
xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T
# reject warped points outside of image
np.clip(xy[:, 0], 0, width, out=xy[:, 0])
np.clip(xy[:, 2], 0, width, out=xy[:, 2])
np.clip(xy[:, 1], 0, height, out=xy[:, 1])
np.clip(xy[:, 3], 0, height, out=xy[:, 3])
w = xy[:, 2] - xy[:, 0]
h = xy[:, 3] - xy[:, 1]
area = w * h
ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))
i = (w > 4) & (h > 4) & (area / (area0 + 1e-16) > 0.1) & (ar < 10)
targets = targets[i]
            targets[:, 2:6] = xy[i]
        # Always return a 3-tuple when targets are provided (even if empty) so callers can unpack safely.
        return imw, targets, m
    return imw
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Evaluation scripts."""
import copy
import os
import motmetrics as mm
import numpy as np
import pandas as pd
from src.io import read_results, unzip_objs
mm.lap.default_solver = 'lap'
class Evaluator:
"""
Evaluation for tracking with motmetrics.
"""
def __init__(self, data_root, seq_name, data_type):
self.data_root = data_root
self.seq_name = seq_name
self.data_type = data_type
self.load_annotations()
self.reset_accumulator()
def load_annotations(self):
"""Load groundtruths."""
assert self.data_type == 'mot'
gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', 'gt.txt')
self.gt_frame_dict = read_results(gt_filename, self.data_type, is_gt=True)
self.gt_ignore_frame_dict = read_results(gt_filename, self.data_type, is_ignore=True)
def reset_accumulator(self):
self.acc = mm.MOTAccumulator(auto_id=True)
def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False):
"""
Eval one frame.
"""
# results
trk_tlwhs = np.copy(trk_tlwhs)
trk_ids = np.copy(trk_ids)
# gts
gt_objs = self.gt_frame_dict.get(frame_id, [])
gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2]
# ignore boxes
ignore_objs = self.gt_ignore_frame_dict.get(frame_id, [])
ignore_tlwhs = unzip_objs(ignore_objs)[0]
# remove ignored results
keep = np.ones(len(trk_tlwhs), dtype=bool)
iou_distance = mm.distances.iou_matrix(ignore_tlwhs, trk_tlwhs, max_iou=0.5)
if iou_distance.size > 0:
match_is, match_js = mm.lap.linear_sum_assignment(iou_distance)
match_is, match_js = map(lambda a: np.asarray(a, dtype=int), [match_is, match_js])
match_ious = iou_distance[match_is, match_js]
match_js = np.asarray(match_js, dtype=int)
match_js = match_js[np.logical_not(np.isnan(match_ious))]
keep[match_js] = False
trk_tlwhs = trk_tlwhs[keep]
trk_ids = trk_ids[keep]
# get distance matrix
iou_distance = mm.distances.iou_matrix(gt_tlwhs, trk_tlwhs, max_iou=0.5)
# acc
self.acc.update(gt_ids, trk_ids, iou_distance)
if rtn_events and iou_distance.size > 0 and hasattr(self.acc, 'last_mot_events'):
events = self.acc.last_mot_events
else:
events = None
return events
def eval_file(self, filename):
"""
Eval file.
"""
self.reset_accumulator()
result_frame_dict = read_results(filename, self.data_type, is_gt=False)
frames = sorted(list(set(self.gt_frame_dict.keys()) | set(result_frame_dict.keys())))
for frame_id in frames:
trk_objs = result_frame_dict.get(frame_id, [])
trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2]
self.eval_frame(frame_id, trk_tlwhs, trk_ids, rtn_events=False)
return self.acc
@staticmethod
def get_summary(accs, names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')):
"""
Get MOT summary.
"""
names = copy.deepcopy(names)
if metrics is None:
metrics = mm.metrics.motchallenge_metrics
metrics = copy.deepcopy(metrics)
mh = mm.metrics.create()
summary = mh.compute_many(
accs,
metrics=metrics,
names=names,
generate_overall=True
)
return summary
@staticmethod
def save_summary(summary, filename):
"""
Save evaluation summary.
"""
writer = pd.ExcelWriter(filename)
summary.to_excel(writer)
writer.save()
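# Usage sketch (the sequence name and paths are illustrative):
#   evaluator = Evaluator(data_root='/path/to/MOT16/train', seq_name='MOT16-02', data_type='mot')
#   acc = evaluator.eval_file('results/MOT16-02.txt')
#   summary = Evaluator.get_summary([acc], ['MOT16-02'])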
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""MOT utils."""
import os
import numpy as np
def read_results(filename, data_type: str, is_gt=False, is_ignore=False):
"""
Read results.
"""
if data_type in ('mot', 'lab'):
read_fun = read_mot_results
else:
        raise ValueError(f'Unknown data type: {data_type}')
return read_fun(filename, is_gt, is_ignore)
def read_mot_results(filename, is_gt, is_ignore):
"""
Read MOT results.
"""
valid_labels = {1}
ignore_labels = {2, 7, 8, 12}
results_dict = {}
if os.path.isfile(filename):
with open(filename, 'r') as f:
for line in f.readlines():
linelist = line.split(',')
if len(linelist) < 7:
continue
fid = int(linelist[0])
if fid < 1:
continue
results_dict.setdefault(fid, [])
if is_gt:
if 'MOT16-' in filename or 'MOT17-' in filename:
label = int(float(linelist[7]))
mark = int(float(linelist[6]))
if mark == 0 or label not in valid_labels:
continue
score = 1
elif is_ignore:
if 'MOT16-' in filename or 'MOT17-' in filename:
label = int(float(linelist[7]))
vis_ratio = float(linelist[8])
if label not in ignore_labels and vis_ratio >= 0:
continue
else:
continue
score = 1
else:
score = float(linelist[6])
tlwh = tuple(map(float, linelist[2:6]))
target_id = int(linelist[1])
results_dict[fid].append((tlwh, target_id, score))
return results_dict
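# Field layout parsed above (standard MOTChallenge text format; the trailing fields exist only in gt.txt):
#   <frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf/mark>,<class>,<visibility>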
def unzip_objs(objs):
"""
Unzip objects.
"""
if objs:
tlwhs, ids, scores = zip(*objs)
else:
tlwhs, ids, scores = [], [], []
tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4)
return tlwhs, ids, scores