!1954 Models: JDE

Merge pull request !1954 from adenisov/models-pr-jde
# Dataset Zoo
Dataset preparation follows [Towards-Realtime-MOT](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md).
## Data Format
The root folder of datasets will have the following structure:
```text
.
└─datasets
├─Caltech
├─Cityscapes
├─CUHKSYSU
├─ETHZ
├─MOT16
├─MOT17
└─PRW
```
Every image has a corresponding annotation text. Given an image path,
the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line describes a bounding box and has the following format:
```text
[class] [identity] [x_center] [y_center] [width] [height]
```
The field `[class]` should be `0`. Only single-class multi-object tracking is supported.
The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` if this box has no identity annotation.
- Note that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
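For example, a single annotation line for a pedestrian with identity `17` might look like this (values are illustrative):
```text
0 17 0.512 0.403 0.061 0.219
```
A minimal sketch of the path convention described above (the path itself is hypothetical):
```python
# Derive the label path from an image path, as described above (illustrative path).
image_path = "datasets/MOT17/images/train/MOT17-04-SDP/img1/000001.jpg"
label_path = image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")
print(label_path)  # datasets/MOT17/labels_with_ids/train/MOT17-04-SDP/img1/000001.txt
```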
## Download
### Caltech Pedestrian
Download all `set**.tar` archives from [this page](https://drive.google.com/drive/folders/1IBlcJP8YsCaT81LwQ2YwQJac8bf1q8xF?usp=sharing) and extract them to `Caltech/data`.
Download [annotations](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing) and unzip to `Caltech/data/labels_with_ids`.
Download [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to images.
Move the tool's `scripts` folder to the `Caltech` folder and run the command:
```bash
python scripts/convert_seqs.py
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─Caltech
└─data
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### CityPersons
Google Drive:
[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
Download the `.zip` archives from the links above and run the following commands.
```bash
zip -FF Citypersons --out c.zip
unzip c.zip
mv Citypersons Cityscapes
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─Cityscapes
├─images
│ ├─train
│ └─val
└─labels_with_ids
├─train
└─val
```
### CUHK-SYSU
Google Drive:
[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
Download the dataset, unzip it, and run the command below.
```bash
mv CUHK-SYSU CUHKSYSU
```
The structure of the dataset will be the following:
```text
.
└─CUHKSYSU
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### PRW
Google Drive:
[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─PRW
├─images
│ └─***
└─labels_with_ids
└─***
```
Note: `***` stands for the data files (images or annotations).
### ETHZ (overlapping with MOT-16 removed)
Google Drive:
[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─ETHZ
├─eth01
│ ├─images
│ │ └─***
│ └─labels_with_ids
│ └─***
├─eth02
├─eth03
├─eth05
└─eth07
```
Note: `***` stands for the data files (images or annotations). Every `eth*` folder has the same structure.
### MOT-17
Official link:
[[0]](https://motchallenge.net/data/MOT17.zip)
Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
After downloading, unzip the archive and run the `prepare_mot17.py` script from the `data` folder:
```bash
python data/prepare_mot17.py --seq_root /path/to/MOT17/train
```
The structure of the dataset after completing all steps will be the following:
```text
.
└─MOT17
├─images
│ └─train
└─labels_with_ids
└─train
```
### MOT-16 (for evaluation)
Google Drive:
[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
Download link: [MOT-16.zip](https://motchallenge.net/data/MOT16.zip)
> See the "Download" section at the bottom of the web page, link "Get all data".
Download dataset and unzip. The structure of the dataset will be the following:
```text
.
└─MOT16
└─train
```
# Data config
Download the [schemas](https://github.com/Zhongdao/Towards-Realtime-MOT/tree/master/data) of the training data (text files with relative paths for every image, split into train/val parts) and move them into the `data` folder.
```text
.
└── data
├─ caltech.10k.val
├─ caltech.train
├─ caltech.val
├─ citypersons.train
├─ citypersons.val
├─ cuhksysu.train
├─ cuhksysu.val
├─ eth.train
├─ mot17.train
├─ prw.train
└─ prw.val
```
# Citation
Caltech:
```text
@inproceedings{ dollarCVPR09peds,
author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
title = "Pedestrian Detection: A Benchmark",
booktitle = "CVPR",
month = "June",
year = "2009",
city = "Miami",
}
```
Citypersons:
```text
@INPROCEEDINGS{Shanshan2017CVPR,
author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
booktitle = {CVPR},
year = {2017}
}
@INPROCEEDINGS{Cordts2016Cityscapes,
title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016}
}
```
CUHK-SYSU:
```text
@inproceedings{xiaoli2017joint,
title={Joint Detection and Identification Feature Learning for Person Search},
author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
booktitle={CVPR},
year={2017}
}
```
PRW:
```text
@inproceedings{zheng2017person,
title={Person re-identification in the wild},
author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={1367--1376},
year={2017}
}
```
ETHZ:
```text
@InProceedings{eth_biwi_00534,
author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
title = {A Mobile Vision System for Robust Multi-Person Tracking},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
year = {2008},
month = {June},
publisher = {IEEE Press},
}
```
MOT-16&17:
```text
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
```
# Contents
- [Contents](#contents)
    - [JDE Description](#jde-description)
    - [Model Architecture](#model-architecture)
    - [Dataset](#dataset)
    - [Environment Requirements](#environment-requirements)
    - [Quick Start](#quick-start)
    - [Script Description](#script-description)
        - [Script and Sample Code](#script-and-sample-code)
        - [Script Parameters](#script-parameters)
        - [Training Process](#training-process)
            - [Standalone Training](#standalone-training)
            - [Distribute Training](#distribute-training)
        - [Evaluation Process](#evaluation-process)
            - [Evaluation](#evaluation)
        - [Inference Process](#inference-process)
            - [Usage](#usage)
            - [Result](#result)
    - [Model Description](#model-description)
        - [Performance](#performance)
            - [Training Performance](#training-performance)
            - [Evaluation Performance](#evaluation-performance)
    - [ModelZoo Homepage](#modelzoo-homepage)
## [JDE Description](#contents)
The paper introducing the JDE model is dedicated to improving the efficiency of an MOT system.
It presents an early attempt to jointly learn the Detector and Embedding model (JDE) in a single-shot deep network.
In other words, the proposed JDE employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes.
In comparison, SDE methods and two-stage methods are characterized by re-sampled pixels (bounding boxes) and feature maps, respectively.
Both the bounding boxes and feature maps are fed into a separate re-ID model for appearance feature extraction.
The method runs in near real time while being almost as accurate as the SDE methods.
[Paper](https://arxiv.org/pdf/1909.12605.pdf): Towards Real-Time Multi-Object Tracking. Department of Electronic Engineering, Tsinghua University
## [Model Architecture](#contents)
The architecture of JDE is based on the Feature Pyramid Network (FPN).
FPN makes predictions from multiple scales, thus bringing improvement in pedestrian detection where the scale of targets varies a lot.
An input video frame first undergoes a forward pass through a backbone network to obtain feature maps at three scales, namely, scales with 1/32, 1/16 and 1/8 down-sampling rate, respectively.
Then, the feature map with the smallest size (also the semantically strongest features) is up-sampled and fused with the feature map from the second smallest scale by skip connection, and the same goes for the other scales.
Finally, prediction heads are added upon fused feature maps at all the three scales.
A prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A + D) × H × W, where A is the number of anchor templates assigned to this scale, and D is the dimension of the embedding.
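With the configuration used in this repository (4 anchor templates per scale, i.e. the 12 `anchor_scales` split over 3 feature maps, and `embedding_dim: 512`), this gives 536 output channels per prediction head; a quick sanity-check sketch:
```python
# Sanity check of the prediction-head depth described above.
# Values are taken from default_config.yaml; A is derived from the 12 anchors over 3 scales.
A = 4            # anchor templates assigned to one scale
D = 512          # embedding dimension (embedding_dim)
num_classes = 1  # pedestrians only
det_channels = A * (num_classes + 5)  # 24, matches out_channel in default_config.yaml
head_channels = 6 * A + D             # (6A + D) = 536 channels of the dense prediction map
print(det_channels, head_channels)    # 24 536
```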
## [Dataset](#contents)
The model is trained on a large-scale training set built by putting together six publicly available datasets on pedestrian detection, MOT and person search.
These datasets can be categorized into two types: ones that only contain bounding box annotations, and ones that have both bounding box and identity annotations.
The first category includes the ETH dataset and the CityPersons (CP) dataset. The second category includes the CalTech (CT) dataset, MOT16 (M16) dataset, CUHK-SYSU (CS) dataset and PRW dataset.
Training subsets of all these datasets are gathered to form the joint training set, and videos in the ETH dataset that overlap with the MOT-16 test set are excluded for fair evaluation.
Dataset preparation is described in [DATASET_ZOO.md](DATASET_ZOO.md).
Dataset size: 134 GB, 1 object category (pedestrian).
Note: `--dataset_root` is used as an entry point for all datasets, used for training and evaluating this model.
Organize your dataset structure as follows:
```text
.
└─dataset_root/
├─Caltech/
├─Cityscapes/
├─CUHKSYSU/
├─ETHZ/
├─MOT16/
├─MOT17/
└─PRW/
```
Statistics of the train parts of the datasets:
| Dataset | ETH | CP | CT | M16 | CS | PRW | Total |
| :------:|:---:|:---:|:---:|:---:|:---:|:---:|:-----:|
| # img   |2K   |3K   |27K  |5.3K |11K  |6K   |54K    |
| # box   |17K  |21K  |46K  |112K |55K  |18K  |270K   |
| # ID    |-    |-    |0.6K |0.5K |7K   |0.5K |8.7K   |
## [Environment Requirements](#contents)
- Hardware(GPU)
- Prepare hardware environment with GPU processor.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
## [Quick Start](#contents)
After installing MindSpore through the official website, you can follow the steps below for training and evaluation.
In particular, before training you need to install the requirements with `pip install -r requirements.txt`.
> If an error occurs, update pip with `pip install --upgrade pip` and try again.
> If that does not help, install the packages manually with `pip install {package from requirements.txt}`.
Note: PyTorch is used only for checkpoint conversion.
All trainings start from a pre-trained backbone:
[download](https://drive.google.com/file/d/1keZwVIfcWmxfTiswzOKUwkUz2xjvTvfm/view) the backbone pre-trained on
ImageNet and convert it with the commands below:
```bash
# From the root model directory run
python -m src.convert_checkpoint --ckpt_url [PATH_TO_PYTORCH_CHECKPOINT]
```
- PATH_TO_PYTORCH_CHECKPOINT - Path to the downloaded darknet53 PyTorch checkpoint.
After converting the checkpoint and installing the requirements, you can run the training scripts:
```bash
# Run standalone training example
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
# Run distribute training example
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - Device ID (for the standalone training example)
- DEVICE_NUM - Number of devices (for the distributed training example)
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
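For example, a standalone run on device 0 could look like this (all paths are illustrative):
```bash
bash scripts/run_standalone_train_gpu.sh 0 ./logs ./darknet53.ckpt /data/datasets
```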
## [Script Description](#contents)
### [Script and Sample Code](#contents)
```text
.
└─JDE
├─data
│ └─prepare_mot17.py # MOT17 data preparation script
├─cfg
│ ├─ccmcpe.json # paths to dataset schema (defining relative paths structure)
│ └─config.py # parameter parser
├─scripts
│ ├─run_distribute_train_gpu.sh # launch distribute train on GPU
│ ├─run_eval_gpu.sh # launch evaluation on GPU
│ └─run_standalone_train_gpu.sh # launch standalone train on GPU
├─src
│ ├─__init__.py
│ ├─convert_checkpoint.py # backbone checkpoint converter (torch to mindspore)
│ ├─darknet.py # backbone of network
│ ├─dataset.py # create dataset
│ ├─evaluation.py # motmetrics evaluator
│ ├─io.py # MOT evaluation utils
│ ├─kalman_filter.py # kalman filter script
│ ├─log.py # logger script
│ ├─model.py # create model script
│ ├─timer.py # timer script
│ ├─utils.py # utilities used in other scripts
│ └─visualization.py # visualization for inference
├─tracker
│ ├─__init__.py
│ ├─basetrack.py # base class for tracking
│ ├─matching.py # matching for tracking script
│ └─multitracker.py # tracker init script
├─DATASET_ZOO.md # dataset preparing description
├─README.md
├─default_config.yaml # default configs
├─eval.py # evaluation script
├─eval_detect.py # detector evaluation script
├─export.py # export to MINDIR script
├─infer.py # inference script
├─requirements.txt
└─train.py # training script
```
### [Script Parameters](#contents)
```text
Parameters in config.py and default_config.yaml.
Include arguments for Train/Evaluation/Inference.
--config_path Path to default_config.yaml with hyperparameters and defaults
--data_cfg_url Path to .json with paths to datasets schemas
--momentum Momentum for SGD optimizer
--decay Weight_decay for SGD optimizer
--lr Init learning rate
--epochs Number of epochs to train
--batch_size Batch size per one device
--num_classes Number of object classes
--k_max Max predictions per one map (made for optimization of FC layer embedding computation)
--img_size Size of input images
--track_buffer Tracking buffer
--keep_checkpoint_max Keep saved last N checkpoints
--backbone_input_shape Input filters of backbone layers
--backbone_shape Input filters of backbone layers
--backbone_layers Output filters of backbone layers
--out_channel Number of channels for detection
--embedding_dim Number of channels for embeddings
--iou_thres IOU thresholds
--conf_thres Confidence threshold
--nms_thres Threshold for Non-max suppression
--min_box_area Filter out tiny boxes
--anchor_scales 12 predefined anchor boxes, 4 different anchors for each of the 3 feature maps
--col_names_train Names of columns for training GeneratorDataset
--col_names_val Names of columns for validation GeneratorDataset
--is_distributed Distribute training or not
--dataset_root Path to datasets root folder
--device_target Device GPU or any
--device_id Device id of target device
--device_start Start device id
--ckpt_url Location of checkpoint
--logs_dir Dir to save logs and ckpt
--input_video Path to the input video
--output_format Expected output format
--output_root Expected output root path
--save_images Save tracking results (image)
--save_videos Save tracking results (video)
```
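Since the parser in `cfg/config.py` exposes every key of `default_config.yaml` as a command-line flag, any of these options can be overridden at launch time; for example (values are illustrative):
```bash
python train.py --lr=0.00125 --epochs=30 --dataset_root=/data/datasets --ckpt_url=./darknet53.ckpt --logs_dir=./logs
```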
### [Training Process](#contents)
#### Standalone Training
Note: all trainings require the pre-trained DarkNet53 backbone.
```bash
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - device ID
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
The above command will run in the background; you can view the results through the generated standalone_train.log file.
After training, you can find the training loss and time logs in the chosen LOGS_CKPT_DIR.
The model checkpoints will also be saved in the LOGS_CKPT_DIR directory.
#### Distribute Training
```bash
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_NUM - Number of devices to use for the distributed training.
- LOGS_CKPT_DIR - path to the directory, where the training results will be stored.
- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md))
The above shell script will run the distributed training in the background.
Here is an example of the training logs:
```text
epoch: 30 step: 1612, loss is -4.7679796
epoch: 30 step: 1612, loss is -5.816874
epoch: 30 step: 1612, loss is -5.302864
epoch: 30 step: 1612, loss is -5.775913
epoch: 30 step: 1612, loss is -4.9537477
epoch: 30 step: 1612, loss is -4.3535285
epoch: 30 step: 1612, loss is -5.0773625
epoch: 30 step: 1612, loss is -4.2019467
epoch time: 2023042.925 ms, per step time: 1209.954 ms
epoch time: 2023069.500 ms, per step time: 1209.970 ms
epoch time: 2023097.331 ms, per step time: 1209.986 ms
epoch time: 2023038.221 ms, per step time: 1209.951 ms
epoch time: 2023098.113 ms, per step time: 1209.987 ms
epoch time: 2023093.300 ms, per step time: 1209.984 ms
epoch time: 2023078.631 ms, per step time: 1209.975 ms
epoch time: 2017509.966 ms, per step time: 1206.645 ms
train success
train success
train success
train success
train success
train success
train success
train success
```
### [Evaluation Process](#contents)
#### Evaluation
The tracking ability of the model is tested on the train part of the MOT16 dataset (not used during training).
To start tracker evaluation run the command below.
```bash
bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
> Note: the script expects that the DATASET_ROOT directory contains the MOT16 sub-folder.
The above command will run in the background. The validation logs will be saved in `eval.log`.
For more details about `motmetrics`, you can refer to [MOT benchmark](https://motchallenge.net/).
```text
DATE-DATE-DATE TIME:TIME:TIME [INFO]: Time elapsed: 240.54 seconds, FPS: 22.04
IDF1 IDP IDR Rcll Prcn GT MT PT ML FP FN IDs FM MOTA MOTP IDt IDa IDm
MOT16-02 45.1% 49.9% 41.2% 71.0% 86.0% 54 17 31 6 2068 5172 425 619 57.0% 0.215 239 68 14
MOT16-04 69.5% 75.5% 64.3% 80.6% 94.5% 83 45 24 14 2218 9234 175 383 75.6% 0.184 98 28 3
MOT16-05 63.6% 68.1% 59.7% 82.0% 93.7% 125 67 49 9 376 1226 137 210 74.5% 0.203 113 40 40
MOT16-09 55.2% 60.4% 50.8% 78.1% 92.9% 25 16 8 1 316 1152 108 147 70.0% 0.187 76 15 11
MOT16-10 57.1% 59.9% 54.5% 80.1% 88.1% 54 28 26 0 1337 2446 376 569 66.2% 0.228 202 66 16
MOT16-11 75.0% 76.4% 73.7% 89.6% 92.9% 69 50 16 3 626 953 78 137 81.9% 0.159 49 24 12
MOT16-13 64.8% 69.9% 60.3% 78.5% 90.9% 107 58 43 6 900 2463 272 528 68.3% 0.223 200 59 48
OVERALL 63.2% 68.1% 58.9% 79.5% 91.8% 517 281 197 39 7841 22646 1571 2593 71.0% 0.196 977 300 144
```
To evaluate the detection ability of the model (mAP, precision and recall metrics), run the command below.
```bash
python eval_detect.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --dataset_root [DATASET_ROOT]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
Evaluation results will be printed to the command line.
```text
Image Total P R mAP
4000 30353 0.829 0.778 0.765 0.426s
8000 30353 0.863 0.798 0.788 0.42s
12000 30353 0.854 0.815 0.802 0.419s
16000 30353 0.857 0.821 0.809 0.582s
20000 30353 0.865 0.834 0.824 0.413s
24000 30353 0.868 0.841 0.832 0.415s
28000 30353 0.874 0.839 0.83 0.419s
mean_mAP: 0.8225, mean_R: 0.8325, mean_P: 0.8700
```
### [Inference Process](#contents)
#### Usage
To compile a video from the frames with predicted bounding boxes, you need to install `ffmpeg`
(e.g. `sudo apt-get install ffmpeg`). Video compilation happens automatically.
```bash
python infer.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --input_video [INPUT_VIDEO]
```
- DEVICE_ID - Device ID.
- CKPT_URL - Path to the trained JDE model.
- INPUT_VIDEO - Path to the input video for tracking.
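For example (checkpoint and video paths are illustrative):
```bash
python infer.py --device_id 0 --ckpt_url ./logs/jde.ckpt --input_video ./videos/demo.mp4
```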
#### Result
Inference results will be saved into the default `./results` folder; logs will be shown in the command line.
## [Model Description](#contents)
### [Performance](#contents)
#### Training Performance
| Parameters | GPU (8p) |
| -------------------------- |----------------------------------------------------------------------------------- |
| Model | JDE (1088*608) |
| Hardware | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| Upload Date | 02/02/2022 (day/month/year) |
| MindSpore Version | 1.5.0 |
| Dataset | Joint Dataset (see `DATASET_ZOO.md`) |
| Training Parameters | epoch=30, batch_size=4 (per device), lr=0.01, momentum=0.9, weight_decay=0.0001 |
| Optimizer | SGD |
| Loss Function | SmoothL1Loss, SoftmaxCrossEntropyWithLogits (and apply auto-balancing loss strategy)|
| Outputs | Tensor of bbox coords, conf, class, emb |
| Speed | Eight cards: ~1206 ms/step |
| Total time | Eight cards: ~17 hours |
#### Evaluation Performance
| Parameters | GPU (1p) |
| ------------------- |--------------------------------------------------------|
| Model | JDE (1088*608) |
| Resource | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| Upload Date | 02/02/2022 (day/month/year) |
| MindSpore Version | 1.5.0 |
| Dataset | MOT-16 |
| Batch_size | 1 |
| Outputs | Metrics, .txt predictions |
| FPS | 22.04 |
| Metrics | mAP 82.2, MOTA 71.0% |
## [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/models).
{
"train":
{
"mot17":"./data/mot17.train",
"caltech":"./data/caltech.train",
"citypersons":"./data/citypersons.train",
"cuhksysu":"./data/cuhksysu.train",
"prw":"./data/prw.train",
"eth":"./data/eth.train"
},
"test_emb":
{
"caltech":"./data/caltech.10k.val",
"cuhksysu":"./data/cuhksysu.val",
"prw":"./data/prw.val"
},
"test":
{
"caltech":"./data/caltech.val",
"citypersons":"./data/citypersons.val"
}
}
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import argparse
import ast
from pathlib import Path
from pprint import pformat
import yaml
class Config:
"""
Configuration namespace, convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser (argparse.ArgumentParser): Parent parser.
cfg (dict): Base configuration.
helper (dict): Helper description.
choices (dict): Choices.
"""
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else f"Please reference to {cfg_path}"
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs_raw = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = []
for cf in cfgs_raw:
cfgs.append(cf)
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
        except yaml.YAMLError as err:
            raise ValueError("Failed to parse yaml") from err
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args (argparse.Namespace): Command line arguments.
cfg (dict): Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
curr_dir = Path(__file__).resolve().parent
parser = argparse.ArgumentParser(description="JDE config", add_help=False)
parser.add_argument("--config_path", type=str, default=str(curr_dir / "../default_config.yaml"),
help="Path to config.")
parser.add_argument("--data_cfg_url", type=str, default=str(curr_dir / "ccmcpe.json"),
help="Path to data config.")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices)
final_config = merge(args, default)
return Config(final_config)
config = get_config()
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Prepare data."""
import argparse
import os
import os.path as osp
import shutil
from pathlib import Path
import numpy as np
def prepare(seq_root):
"""Prepare MOT17 dataset for JDE training."""
label_root = str(Path(Path(seq_root).parents[0], 'labels_with_ids', 'train'))
seqs = [s for s in os.listdir(seq_root) if s.endswith('SDP')]
tid_curr = 0
tid_last = -1
for seq in seqs:
with open(osp.join(seq_root, seq, 'seqinfo.ini')) as file:
seq_info = file.read()
seq_width = int(seq_info[seq_info.find('imWidth=') + 8: seq_info.find('\nimHeight')])
seq_height = int(seq_info[seq_info.find('imHeight=') + 9: seq_info.find('\nimExt')])
gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt')
gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',')
seq_label_root = osp.join(label_root, seq, 'img1')
if not osp.exists(seq_label_root):
os.makedirs(seq_label_root)
for fid, tid, x, y, w, h, mark, label, _ in gt:
if mark == 0 or not label == 1:
continue
fid = int(fid)
tid = int(tid)
if tid != tid_last:
tid_curr += 1
tid_last = tid
x += w / 2
y += h / 2
label_fpath = osp.join(seq_label_root, '{:06d}.txt'.format(fid))
label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
tid_curr, x / seq_width, y / seq_height, w / seq_width, h / seq_height)
with open(label_fpath, 'a') as f:
f.write(label_str)
old_path = str(Path(seq_root, seq))
new_path = str(Path(Path(seq_root).parents[0], 'images', 'train'))
if not osp.exists(new_path):
os.makedirs(new_path)
shutil.move(old_path, new_path)
print('Done')
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--seq_root", required=True, help='Path to root dir of sequences')
args = parser.parse_args()
prepare(args.seq_root)
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
# hyperparameters of training
momentum: 0.9
decay: 0.0001
lr: 0.01
epochs: 30
batch_size: 4
# other
num_classes: 1
k_max: 250
img_size: [1088, 608]
track_buffer: 30
keep_checkpoint_max: 6
# model initialization parameters
backbone_input_shape: [32, 64, 128, 256, 512]
backbone_shape: [64, 128, 256, 512, 1024]
backbone_layers: [1, 2, 8, 8, 4]
out_channel: 24 # 4 * (num_classes + 5)
embedding_dim: 512
# evaluation thresholds
iou_thres: 0.50
conf_thres: 0.55
nms_thres: 0.45
min_box_area: 200
# h -> w
anchor_scales: [
[8, 24],
[11, 34],
[16, 48],
[23, 68],
[32, 96],
[45, 135],
[64, 192],
[90, 271],
[128, 384],
[180, 540],
[256, 640],
[512, 640],
]
# data configs
col_names_train: [
'imgs',
'tconf_s',
'tbox_s',
'tid_s',
'tconf_m',
'tbox_m',
'tid_m',
'tconf_b',
'tbox_b',
'tid_b',
'emb_indices_s',
'emb_indices_m',
'emb_indices_b',
]
col_names_val: [
'imgs',
'targets',
'lens',
]
# other
is_distributed: False
dataset_root: '/path/to/datasets/root/folder/'
device_target: 'GPU'
device_id: 0
device_start: 0
ckpt_url: '/path/to/checkpoint'
logs_dir: './logs'
input_video: '/path/to/input/video'
output_format: 'video'
output_root: './results'
save_images: False
save_videos: False
---
# Config description for each option
momentum: 'Momentum for SGD optimizer.'
decay: 'Weight_decay for SGD optimizer.'
lr: 'Init learning rate.'
epochs: 'Number of epochs to train.'
batch_size: 'Batch size per one device'
num_classes: 'Number of object classes.'
k_max: 'Max predictions per one map (made for optimization of FC layer embedding computation).'
img_size: 'Size of input images.'
track_buffer: 'Tracking buffer.'
keep_checkpoint_max: 'Keep saved last N checkpoints.'
backbone_input_shape: 'Input filters of backbone layers.'
backbone_shape: 'Input filters of backbone layers.'
backbone_layers: 'Output filters of backbone layers.'
out_channel: 'Number of channels for detection.'
embedding_dim: 'Number of channels for embeddings.'
iou_thres: 'IOU thresholds.'
conf_thres: 'Confidence threshold.'
nms_thres: 'Threshold for Non-max suppression.'
min_box_area: 'Filter out tiny boxes.'
anchor_scales: '12 predefined anchor boxes, 4 different anchors for each of the 3 feature maps.'
col_names_train: 'Names of columns for training GeneratorDataset.'
col_names_val: 'Names of columns for validation GeneratorDataset.'
is_distributed: 'Distribute training or not.'
dataset_root: 'Path to datasets root folder.'
device_target: 'Device GPU or any.'
device_id: 'Device id of target device.'
device_start: 'Start device id.'
ckpt_url: 'Location of checkpoint.'
logs_dir: 'Dir to save logs and ckpt.'
input_video: 'Path to the input video.'
output_format: 'Expected output format.'
output_root: 'Expected output root path.'
save_images: 'Save tracking results (image).'
save_videos: 'Save tracking results (video).'
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Tracker evaluation script."""
import logging
import os
import os.path as osp
import cv2
import motmetrics as mm
import numpy as np
from mindspore import Model
from mindspore import Tensor
from mindspore import context
from mindspore import dtype as mstype
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from src import visualization as vis
from src.darknet import DarkNet, ResidualBlock
from src.dataset import LoadImages
from src.evaluation import Evaluator
from src.log import logger
from src.model import JDEeval
from src.model import YOLOv3
from src.timer import Timer
from src.utils import mkdir_if_missing
from tracker.multitracker import JDETracker
_MOT16_VALIDATION_FOLDERS = (
'MOT16-02',
'MOT16-04',
'MOT16-05',
'MOT16-09',
'MOT16-10',
'MOT16-11',
'MOT16-13',
)
_MOT16_DIR_FOR_TEST = 'MOT16/train'
def write_results(filename, results, data_type):
"""
Format for evaluation results.
"""
if data_type == 'mot':
save_format = '{frame},{id},{x1},{y1},{w},{h},1,-1,-1,-1\n'
elif data_type == 'kitti':
save_format = '{frame} {id} pedestrian 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n'
else:
raise ValueError(data_type)
with open(filename, 'w') as f:
for frame_id, tlwhs, track_ids in results:
if data_type == 'kitti':
frame_id -= 1
for tlwh, track_id in zip(tlwhs, track_ids):
if track_id < 0:
continue
x1, y1, w, h = tlwh
x2, y2 = x1 + w, y1 + h
line = save_format.format(frame=frame_id, id=track_id, x1=x1, y1=y1, x2=x2, y2=y2, w=w, h=h)
f.write(line)
logger.info('Save results to %s', filename)
def eval_seq(
opt,
dataloader,
data_type,
result_filename,
net,
save_dir=None,
frame_rate=30,
):
"""
    Process the given video sequence and write the tracking results
    to the result file (and, optionally, annotated frames).
    It uses the JDE model to obtain information about the online targets.
Args:
opt (Any): Contains information passed as commandline arguments.
dataloader (Any): Fetching the image sequence and associated data.
        data_type (str): Type of the dataset corresponding to the given video.
result_filename (str): The name(path) of the file for storing results.
net (nn.Cell): Model.
save_dir (str): Path to output results.
frame_rate (int): Frame-rate of the given video.
Returns:
        frame_id (int): Number of processed frames.
        average_time (float): Average time per frame.
        calls (int): Number of timer calls.
"""
if save_dir:
mkdir_if_missing(save_dir)
tracker = JDETracker(opt, net=net, frame_rate=frame_rate)
timer = Timer()
results = []
frame_id = 0
timer.tic()
timer.toc()
timer.calls -= 1
for img, img0 in dataloader:
if frame_id % 20 == 0:
log_info = f'Processing frame {frame_id} ({(1. / max(1e-5, timer.average_time)):.2f} fps)'
logger.info('%s', log_info)
# except initialization step at time calculation
if frame_id != 0:
timer.tic()
im_blob = Tensor(np.expand_dims(img, 0), mstype.float32)
online_targets = tracker.update(im_blob, img0)
online_tlwhs = []
online_ids = []
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
vertical = tlwh[2] / tlwh[3] > 1.6
if tlwh[2] * tlwh[3] > opt.min_box_area and not vertical:
online_tlwhs.append(tlwh)
online_ids.append(tid)
if frame_id != 0:
timer.toc()
# save results
results.append((frame_id + 1, online_tlwhs, online_ids))
if save_dir is not None:
online_im = vis.plot_tracking(
img0,
online_tlwhs,
online_ids,
frame_id=frame_id,
fps=1. / timer.average_time,
)
cv2.imwrite(os.path.join(save_dir, f'{frame_id:05}.jpg'), online_im)
frame_id += 1
# save results
write_results(result_filename, results, data_type)
return frame_id, timer.average_time, timer.calls - 1
def main(
opt,
data_root,
seqs,
exp_name,
save_videos=False,
):
logger.setLevel(logging.INFO)
result_root = os.path.join(data_root, '..', 'results', exp_name)
mkdir_if_missing(result_root)
data_type = 'mot'
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
model = Model(model)
# Run tracking
n_frame = 0
timer_avgs, timer_calls, accs = [], [], []
for seq in seqs:
output_dir = os.path.join(data_root, '..', 'outputs', exp_name, seq) if save_videos else None
logger.info('start seq: %s', seq)
dataloader = LoadImages(osp.join(data_root, seq, 'img1'), opt.anchor_scales, opt.img_size)
result_filename = os.path.join(result_root, f'{seq}.txt')
with open(os.path.join(data_root, seq, 'seqinfo.ini')) as f:
meta_info = f.read()
frame_rate = int(meta_info[meta_info.find('frameRate') + 10:meta_info.find('\nseqLength')])
nf, ta, tc = eval_seq(
opt,
dataloader,
data_type,
result_filename,
net=model,
save_dir=output_dir,
frame_rate=frame_rate,
)
n_frame += nf
timer_avgs.append(ta)
timer_calls.append(tc)
# eval
logger.info('Evaluate seq: %s', seq)
evaluator = Evaluator(data_root, seq, data_type)
accs.append(evaluator.eval_file(result_filename))
if save_videos:
output_video_path = osp.join(output_dir, f'{seq}.mp4')
cmd_str = f'ffmpeg -f image2 -i {output_dir}/%05d.jpg -c:v copy {output_video_path}'
os.system(cmd_str)
timer_avgs = np.asarray(timer_avgs)
timer_calls = np.asarray(timer_calls)
all_time = np.dot(timer_avgs, timer_calls)
avg_time = all_time / np.sum(timer_calls)
log_info = f'Time elapsed: {all_time:.2f} seconds, FPS: {(1.0 / avg_time):.2f}'
logger.info('%s', log_info)
# Get summary
metrics = mm.metrics.motchallenge_metrics
mh = mm.metrics.create()
summary = Evaluator.get_summary(accs, seqs, metrics)
strsummary = mm.io.render_summary(
summary,
formatters=mh.formatters,
namemap=mm.io.motchallenge_metric_names
)
print(strsummary)
Evaluator.save_summary(summary, os.path.join(result_root, f'summary_{exp_name}.xlsx'))
if __name__ == '__main__':
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target='GPU')
context.set_context(device_id=config.device_id)
data_root_path = os.path.join(config.dataset_root, _MOT16_DIR_FOR_TEST)
if not os.path.isdir(data_root_path):
raise NotADirectoryError(
f'Cannot find "{_MOT16_DIR_FOR_TEST}" subdirectory '
f'in the specified dataset root "{config.dataset_root}"'
)
main(
config,
data_root=data_root_path,
seqs=_MOT16_VALIDATION_FOLDERS,
exp_name=config.ckpt_url.split('/')[-2],
save_videos=config.save_videos,
)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Evaluation script."""
import json
import time
import numpy as np
from mindspore import Model
from mindspore import context
from mindspore import dataset as ds
from mindspore.common import set_seed
from mindspore.communication.management import get_group_size
from mindspore.communication.management import get_rank
from mindspore.dataset.vision import py_transforms as PY
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from src.darknet import DarkNet, ResidualBlock
from src.dataset import JointDatasetDetection
from src.model import JDEeval
from src.model import YOLOv3
from src.utils import ap_per_class
from src.utils import bbox_iou
from src.utils import non_max_suppression
from src.utils import xywh2xyxy
set_seed(1)
def _get_rank_info(device_target):
"""
Get rank size and rank id.
"""
if device_target == 'GPU':
rank_size = get_group_size()
rank_id = get_rank()
else:
raise ValueError("Unsupported platform.")
return rank_size, rank_id
def main(
opt,
iou_thres,
conf_thres,
nms_thres,
nc,
):
img_size = opt.img_size
with open(opt.data_cfg_url) as f:
data_config = json.load(f)
test_paths = data_config['test']
dataset = JointDatasetDetection(
opt.dataset_root,
test_paths,
augment=False,
transforms=PY.ToTensor(),
config=opt,
)
dataloader = ds.GeneratorDataset(
dataset,
column_names=opt.col_names_val,
shuffle=False,
num_parallel_workers=1,
max_rowsize=12,
)
dataloader = dataloader.batch(opt.batch_size, True)
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
print(f'Evaluation for {opt.ckpt_url}')
model = Model(model)
mean_map, mean_r, mean_p, seen = 0.0, 0.0, 0.0, 0
print('%11s' * 5 % ('Image', 'Total', 'P', 'R', 'mAP'))
maps, mr, mp = [], [], []
ap_accum, ap_accum_count = np.zeros(nc), np.zeros(nc)
for batch_i, inputs in enumerate(dataloader):
imgs, targets, targets_len = inputs
targets = targets.asnumpy()
targets_len = targets_len.asnumpy()
t = time.time()
raw_output, _ = model.predict(imgs)
output = non_max_suppression(raw_output.asnumpy(), conf_thres=conf_thres, nms_thres=nms_thres)
for i, o in enumerate(output):
if o is not None:
output[i] = o[:, :6]
# Compute average precision for each sample
targets = [targets[i][:int(l)] for i, l in enumerate(targets_len)]
for labels, detections in zip(targets, output):
seen += 1
if detections is None:
# If there are labels but no detections mark as zero ap
if labels.shape[0] != 0:
maps.append(0)
mr.append(0)
mp.append(0)
continue
# Get detections sorted by decreasing confidence scores
detections = detections[np.argsort(-detections[:, 4])]
# If no labels add number of detections as incorrect
correct = []
if labels.shape[0] == 0:
maps.append(0)
mr.append(0)
mp.append(0)
continue
target_cls = labels[:, 0]
# Extract target boxes as (x1, y1, x2, y2)
target_boxes = xywh2xyxy(labels[:, 2:6])
target_boxes[:, 0] *= img_size[0]
target_boxes[:, 2] *= img_size[0]
target_boxes[:, 1] *= img_size[1]
target_boxes[:, 3] *= img_size[1]
detected = []
for *pred_bbox, _, _ in detections:
obj_pred = 0
pred_bbox = np.array(pred_bbox, dtype=np.float32).reshape(1, -1)
# Compute iou with target boxes
iou = bbox_iou(pred_bbox, target_boxes, x1y1x2y2=True)[0]
# Extract index of largest overlap
best_i = np.argmax(iou)
# If overlap exceeds threshold and classification is correct mark as correct
if iou[best_i] > iou_thres and obj_pred == labels[best_i, 0] and best_i not in detected:
correct.append(1)
detected.append(best_i)
else:
correct.append(0)
# Compute Average Precision (ap) per class
ap, ap_class, r, p = ap_per_class(
tp=correct,
conf=detections[:, 4],
pred_cls=np.zeros_like(detections[:, 5]), # detections[:, 6]
target_cls=target_cls,
)
# Accumulate AP per class
ap_accum_count += np.bincount(ap_class, minlength=nc)
ap_accum += np.bincount(ap_class, minlength=nc, weights=ap)
# Compute mean AP across all classes in this image, and append to image list
maps.append(ap.mean())
mr.append(r.mean())
mp.append(p.mean())
# Means of all images
mean_map = np.sum(maps) / (ap_accum_count + 1E-16)
mean_r = np.sum(mr) / (ap_accum_count + 1E-16)
mean_p = np.sum(mp) / (ap_accum_count + 1E-16)
if (batch_i + 1) % 1000 == 0:
# Print image mAP and running mean mAP
print(('%11s%11s' + '%11.3g' * 4 + 's') %
(seen, dataset.nf, mean_p, mean_r, mean_map, time.time() - t))
# Print results
print(f'mean_mAP: {mean_map[0]:.4f}, mean_R: {mean_r[0]:.4f}, mean_P: {mean_p[0]:.4f}')
if __name__ == "__main__":
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
context.set_context(device_id=config.device_id)
main(
opt=config,
iou_thres=0.5,
conf_thres=0.3,
nms_thres=0.45,
nc=config.num_classes,
)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""run export"""
from pathlib import Path
import numpy as np
from mindspore import Tensor
from mindspore import context
from mindspore import dtype as mstype
from mindspore import load_checkpoint
from mindspore.train.serialization import export
from cfg.config import config as default_config
from src.darknet import DarkNet, ResidualBlock
from src.model import JDEeval
from src.model import YOLOv3
def run_export(config):
"""
Export model to MINDIR.
"""
darknet53 = DarkNet(
ResidualBlock,
config.backbone_layers,
config.backbone_input_shape,
config.backbone_shape,
detect=True,
)
yolov3 = YOLOv3(
backbone=darknet53,
backbone_shape=config.backbone_shape,
out_channel=config.out_channel,
)
net = JDEeval(yolov3, default_config)
load_checkpoint(config.ckpt_url, net)
net.set_train(False)
input_data = Tensor(np.zeros([1, 3, 1088, 608]), dtype=mstype.float32)
name = Path(config.ckpt_url).stem
export(net, input_data, file_name=name, file_format='MINDIR')
print('Model exported successfully!')
if __name__ == "__main__":
context.set_context(
mode=context.GRAPH_MODE,
device_target=default_config.device_target,
device_id=default_config.device_id,
)
run_export(default_config)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Inference script."""
import logging
import os
import os.path as osp
from mindspore import Model
from mindspore import context
from mindspore.train.serialization import load_checkpoint
from cfg.config import config as default_config
from eval import eval_seq
from src.darknet import DarkNet, ResidualBlock
from src.dataset import LoadVideo
from src.log import logger
from src.model import JDEeval
from src.model import YOLOv3
from src.utils import mkdir_if_missing
logger.setLevel(logging.INFO)
def track(opt):
"""
Inference of the input video.
Save the results into output-root (video, annotations and frames.).
"""
result_root = opt.output_root if opt.output_root != '' else '.'
mkdir_if_missing(result_root)
anchors = opt.anchor_scales
dataloader = LoadVideo(
opt.input_video,
anchor_scales=anchors,
img_size=opt.img_size,
)
darknet53 = DarkNet(
ResidualBlock,
opt.backbone_layers,
opt.backbone_input_shape,
opt.backbone_shape,
detect=True,
)
model = YOLOv3(
backbone=darknet53,
backbone_shape=opt.backbone_shape,
out_channel=opt.out_channel,
)
model = JDEeval(model, opt)
load_checkpoint(opt.ckpt_url, model)
model = Model(model)
logger.info('Starting tracking...')
result_filename = os.path.join(result_root, 'results.txt')
frame_rate = dataloader.frame_rate
frame_dir = None if opt.output_format == 'text' else osp.join(result_root, 'frame')
try:
eval_seq(
opt,
dataloader,
'mot',
result_filename,
net=model,
save_dir=frame_dir,
frame_rate=frame_rate,
)
except TypeError as e:
logger.info(e)
if opt.output_format == 'video':
output_video_path = osp.join(result_root, 'result.mp4')
cmd_str = f"ffmpeg -f image2 -i {osp.join(result_root, 'frame')}/%05d.jpg -c:v copy {output_video_path}"
os.system(cmd_str)
if __name__ == '__main__':
config = default_config
context.set_context(mode=context.GRAPH_MODE, device_target='GPU')
context.set_context(device_id=config.device_id)
track(config)
PyYAML
opencv-python>=4.5.5.62
motmetrics>=1.2.0
scipy>=1.7.2
lap>=0.4.0
Cython
cython-bbox>=0.1.3
torch
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 4 ]]; then
echo "Usage: bash ./scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]"
exit 1;
fi
export RANK_SIZE=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
LOGS_CKPT_DIR="$2"
if [ ! -d "$LOGS_CKPT_DIR" ]; then
mkdir "$LOGS_CKPT_DIR"
mkdir "$LOGS_CKPT_DIR/training_configs"
fi
DATASET_ROOT=$(get_real_path "$4")
CKPT_URL=$(get_real_path "$3")
cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs
cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs
mpirun -n $1 --allow-run-as-root\
python train.py \
--device_target="GPU" \
--logs_dir="$LOGS_CKPT_DIR" \
--dataset_root="$DATASET_ROOT" \
--ckpt_url="$CKPT_URL" \
--is_distributed=True \
> ./"$LOGS_CKPT_DIR"/distribute_train.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 3 ]]; then
echo "Usage: bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]"
exit 1;
fi
export CUDA_VISIBLE_DEVICES=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
CKPT_URL=$(get_real_path "$2")
DATASET_ROOT=$(get_real_path "$3")
if [ ! -d "$DATASET_ROOT" ]; then
echo "The specified dataset root is not a directory: $DATASET_ROOT"
exit 1;
fi
if [ ! -f "$CKPT_URL" ]; then
echo "The specified checkpoint does not exist: $CKPT_URL"
exit 1;
fi
python ./eval.py \
--device_target="GPU" \
--device_id=0 \
--ckpt_url="$CKPT_URL" \
--dataset_root="$DATASET_ROOT" \
> ./eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -ne 4 ]]; then
echo "Usage: bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]"
exit 1
fi
export CUDA_VISIBLE_DEVICES=$1
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
realpath -m "$PWD/$1"
fi
}
LOGS_CKPT_DIR="$2"
if [ ! -d "$LOGS_CKPT_DIR" ]; then
mkdir "$LOGS_CKPT_DIR"
mkdir "$LOGS_CKPT_DIR/training_configs"
fi
DATASET_ROOT=$(get_real_path "$4")
CKPT_URL=$(get_real_path "$3")
cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs
cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs
cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs
python ./train.py \
--device_target="GPU" \
--device_id=0 \
--logs_dir="$LOGS_CKPT_DIR" \
--dataset_root="$DATASET_ROOT" \
--ckpt_url="$CKPT_URL" \
--lr=0.00125 \
> ./"$2"/standalone_train.log 2>&1 &
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Checkpoint import."""
from pathlib import Path
import torch
from mindspore import Parameter
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore import save_checkpoint
from cfg.config import config
from src.darknet import DarkNet
from src.darknet import ResidualBlock
def convert(cfg):
"""
    Initialize the DarkNet53 model, load the PyTorch checkpoint,
    reorder the keys to match the MindSpore ordering, and save the
    converted checkpoint with parameter names corresponding to the
    initialized DarkNet model.
Args:
cfg: Config parameters.
Note:
Convert weights without last FC layer.
"""
darknet53 = DarkNet(
ResidualBlock,
cfg.backbone_layers,
cfg.backbone_input_shape,
cfg.backbone_shape,
detect=True,
)
# Get MindSpore names of parameters
ms_keys = list(darknet53.parameters_dict().keys())
# Get PyTorch weights and names
pt_weights = torch.load(cfg.ckpt_url, map_location=torch.device('cpu'))['state_dict']
pt_keys = list(pt_weights.keys())
# Remove redundant keys
pt_keys_clear = [
key
for key in pt_keys
if not key.endswith('tracked')
]
# One layer consist of 5 parameters
# Arrange PyTorch keys as well as in MindSpore
pt_keys_aligned = []
for block_num in range(len(pt_keys_clear[:-2]) // 5):
layer = pt_keys_clear[block_num * 5:(block_num + 1) * 5]
pt_keys_aligned.append(layer[0]) # Conv weight
pt_keys_aligned.append(layer[3]) # BN moving mean
pt_keys_aligned.append(layer[4]) # BN moving var
pt_keys_aligned.append(layer[1]) # BN gamma
pt_keys_aligned.append(layer[2]) # BN beta
ms_checkpoint = []
for key_ms, key_pt in zip(ms_keys, pt_keys_aligned):
weight = Parameter(Tensor(pt_weights[key_pt].numpy(), mstype.float32))
ms_checkpoint.append({'name': key_ms, 'data': weight})
checkpoint_name = str(Path(cfg.ckpt_url).resolve().parent / 'darknet53.ckpt')
save_checkpoint(ms_checkpoint, checkpoint_name)
print(f'Checkpoint converted successfully! Location {checkpoint_name}')
if __name__ == '__main__':
if not Path(config.ckpt_url).exists():
        raise FileNotFoundError(f'Expected a path to the PyTorch checkpoint, but did not find it at "{config.ckpt_url}"')
convert(config)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""DarkNet model."""
from mindspore import nn
from mindspore.ops import operations as P
def conv_block(
in_channels,
out_channels,
kernel_size,
stride,
dilation=1,
):
"""
Set a conv2d, BN and relu layer.
"""
pad_mode = 'same'
padding = 0
dbl = nn.SequentialCell(
[
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
pad_mode=pad_mode,
),
nn.BatchNorm2d(out_channels, momentum=0.1),
nn.ReLU(),
]
)
return dbl
class ResidualBlock(nn.Cell):
"""
DarkNet V1 residual block definition.
Args:
in_channels (int): Input channel.
out_channels (int): Output channel.
Returns:
out (ms.Tensor): Output tensor.
Examples:
ResidualBlock(3, 32)
"""
def __init__(
self,
in_channels,
out_channels,
):
super().__init__()
out_chls = out_channels//2
self.conv1 = conv_block(in_channels, out_chls, kernel_size=1, stride=1)
self.conv2 = conv_block(out_chls, out_channels, kernel_size=3, stride=1)
self.add = P.Add()
def construct(self, x):
identity = x
out = self.conv1(x)
out = self.conv2(out)
out = self.add(out, identity)
return out
class DarkNet(nn.Cell):
"""
DarkNet V1 network.
Args:
block (cell): Block for network.
layer_nums (list): Numbers of different layers.
in_channels (list): Input channel.
out_channels (list): Output channel.
        detect (bool): Whether to return multi-scale feature maps for detection. Default: False.
    Returns:
        if detect = True:
            c7, c9, c11 (ms.Tensor): Outputs from different layers (FPN) with strides 8, 16 and 32.
        if detect = False:
            c11 (ms.Tensor): Output from the last layer (stride 32).
Examples:
DarkNet(
ResidualBlock,
[1, 2, 8, 8, 4],
[32, 64, 128, 256, 512],
[64, 128, 256, 512, 1024],
)
"""
def __init__(
self,
block,
layer_nums,
in_channels,
out_channels,
detect=False,
):
super().__init__()
self.detect = detect
if not len(layer_nums) == len(in_channels) == len(out_channels) == 5:
raise ValueError("the length of layer_num, inchannel, outchannel list must be 5!")
self.conv0 = conv_block(
3,
in_channels[0],
kernel_size=3,
stride=1,
)
self.conv1 = conv_block(
in_channels[0],
out_channels[0],
kernel_size=3,
stride=2,
)
self.layer1 = self._make_layer(
block,
layer_nums[0],
in_channel=out_channels[0],
out_channel=out_channels[0],
)
self.conv2 = conv_block(
in_channels[1],
out_channels[1],
kernel_size=3,
stride=2,
)
self.layer2 = self._make_layer(
block,
layer_nums[1],
in_channel=out_channels[1],
out_channel=out_channels[1],
)
self.conv3 = conv_block(
in_channels[2],
out_channels[2],
kernel_size=3,
stride=2,
)
self.layer3 = self._make_layer(
block,
layer_nums[2],
in_channel=out_channels[2],
out_channel=out_channels[2],
)
self.conv4 = conv_block(
in_channels[3],
out_channels[3],
kernel_size=3,
stride=2,
)
self.layer4 = self._make_layer(
block,
layer_nums[3],
in_channel=out_channels[3],
out_channel=out_channels[3],
)
self.conv5 = conv_block(
in_channels[4],
out_channels[4],
kernel_size=3,
stride=2,
)
self.layer5 = self._make_layer(
block,
layer_nums[4],
in_channel=out_channels[4],
out_channel=out_channels[4],
)
def _make_layer(self, block, layer_num, in_channel, out_channel):
"""
Make Layer for DarkNet.
Args:
block (Cell): DarkNet block.
layer_num (int): Layer number.
in_channel (int): Input channel.
out_channel (int): Output channel.
Examples:
_make_layer(ConvBlock, 1, 128, 256)
"""
layers = []
darkblk = block(in_channel, out_channel)
layers.append(darkblk)
for _ in range(1, layer_num):
darkblk = block(out_channel, out_channel)
layers.append(darkblk)
return nn.SequentialCell(layers)
def construct(self, x):
"""
Feed forward image.
"""
c1 = self.conv0(x)
c2 = self.conv1(c1)
c3 = self.layer1(c2)
c4 = self.conv2(c3)
c5 = self.layer2(c4)
c6 = self.conv3(c5)
c7 = self.layer3(c6)
c8 = self.conv4(c7)
c9 = self.layer4(c8)
c10 = self.conv5(c9)
c11 = self.layer5(c10)
if self.detect:
return c7, c9, c11
return c11
def darknet53():
"""
Get DarkNet53 neural network.
Returns:
Cell, cell instance of DarkNet53 neural network.
Examples:
darknet53()
"""
darknet = DarkNet(
block=ResidualBlock,
layer_nums=[1, 2, 8, 8, 4],
in_channels=[32, 64, 128, 256, 512],
out_channels=[64, 128, 256, 512, 1024],
)
return darknet
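# Minimal usage sketch (shapes are illustrative; assumes a configured MindSpore context):
#   import numpy as np
#   from mindspore import Tensor
#   net = darknet53()
#   x = Tensor(np.zeros((1, 3, 608, 1088), dtype=np.float32))
#   c11 = net(x)  # stride-32 feature map, e.g. (1, 1024, 19, 34) for a 608x1088 input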
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Dataloader script."""
import math
import os
import os.path as osp
import random
from collections import OrderedDict
from pathlib import Path
import cv2
import numpy as np
from src.utils import build_thresholds
from src.utils import create_anchors_vec
from src.utils import xyxy2xywh
class LoadImages:
"""
Loader for inference.
Args:
        path (str): Path to the directory containing images.
        anchor_scales (list): Anchor scales used to build the anchor grid.
        img_size (list): Size of the output image.
Returns:
img (np.array): Processed image.
img0 (np.array): Original image.
"""
def __init__(self, path, anchor_scales, img_size=(1088, 608)):
path = Path(path)
if not path.is_dir():
raise NotADirectoryError(f'Expected a path to the directory with images, got "{path}"')
self.files = sorted(path.glob('*.jpg'))
self.anchors, self.strides = create_anchors_vec(anchor_scales)
self.nf = len(self.files) # Number of img files.
self.width = img_size[0]
self.height = img_size[1]
self.count = 0
        assert self.nf > 0, f'No images found in {path}'
def __iter__(self):
self.count = -1
return self
def __next__(self):
self.count += 1
if self.count == self.nf:
raise StopIteration
img_path = str(self.files[self.count])
# Read image
img0 = cv2.imread(img_path) # BGR
assert img0 is not None, 'Failed to load ' + img_path
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __getitem__(self, idx):
idx = idx % self.nf
        img_path = str(self.files[idx])
# Read image
img0 = cv2.imread(img_path) # BGR
assert img0 is not None, 'Failed to load ' + img_path
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __len__(self):
return self.nf # number of files
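# Usage sketch (the path is illustrative; anchor_scales comes from the config):
#   loader = LoadImages('/path/to/frames', config.anchor_scales, img_size=(1088, 608))
#   for img, img0 in loader:
#       ...  # img is the letterboxed, normalized CHW array; img0 is the original BGR frame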
class LoadVideo:
"""
Video loader for inference.
Args:
        path (str): Path to the video file.
        anchor_scales (list): Anchor scales used to build the anchor grid.
        img_size (tuple): Size of the output image.
    Returns:
        img (np.array): Processed image.
img0 (np.array): Original image.
"""
def __init__(self, path, anchor_scales, img_size=(1088, 608)):
if not os.path.isfile(path):
            raise FileNotFoundError(f'Expected a path to a video file, got "{path}"')
self.cap = cv2.VideoCapture(path)
self.frame_rate = int(round(self.cap.get(cv2.CAP_PROP_FPS)))
self.vw = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH))
self.vh = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
self.vn = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))
self.anchors, self.strides = create_anchors_vec(anchor_scales)
self.width = img_size[0]
self.height = img_size[1]
self.count = 0
self.w, self.h = self.get_size(self.vw, self.vh, self.width, self.height)
        print(f'Length of the video: {self.vn:d} frames')
def get_size(self, vw, vh, dw, dh):
wa, ha = float(dw) / vw, float(dh) / vh
a = min(wa, ha)
return int(vw * a), int(vh * a)
def __iter__(self):
self.count = -1
return self
def __next__(self):
self.count += 1
if self.count == len(self):
raise StopIteration
# Read image
_, img0 = self.cap.read() # BGR
assert img0 is not None, f'Failed to load frame {self.count:d}'
img0 = cv2.resize(img0, (self.w, self.h))
# Padded resize
img, _, _, _ = letterbox(img0, height=self.height, width=self.width)
# Normalize RGB
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img, dtype=np.float32)
img /= 255.0
output = (img, img0)
return output
def __len__(self):
return self.vn # number of files
class JointDataset:
"""
Loader for all datasets.
Args:
root (str): Absolute path to datasets.
paths (dict): Relative paths for datasets.
img_size (list): Size of output image.
augment (bool): Augment images or not.
transforms: Transform methods.
config (class): Config with hyperparameters.
Returns:
        imgs (np.array): Prepared image. Shape (C, H, W).
        tconf (s, m, b) (np.array): Mask with bg (0), gt (1) and ignore (-1) indices. Shape (nA, nGh, nGw).
        tbox (s, m, b) (np.array): Target delta bbox values. Shape (nA, nGh, nGw, 4).
        tid (s, m, b) (np.array): Grid with an id for every cell. Shape (nA, nGh, nGw).
"""
def __init__(
self,
root,
paths,
img_size=(1088, 608),
k_max=200,
augment=False,
transforms=None,
config=None,
):
self.img_files = OrderedDict()
self.label_files = OrderedDict()
self.tid_num = OrderedDict()
self.tid_start_index = OrderedDict()
self.config = config
self.anchors, self.strides = create_anchors_vec(config.anchor_scales)
self.k_max = k_max
        # Iterate over all datasets to prepare the paths to the labels
for ds, img_path in paths.items():
with open(img_path, 'r') as file:
self.img_files[ds] = file.readlines()
self.img_files[ds] = [osp.join(root, x.strip()) for x in self.img_files[ds]]
self.img_files[ds] = list(filter(lambda x: len(x) > 0, self.img_files[ds]))
self.label_files[ds] = [
x.replace('images', 'labels_with_ids').replace('.png', '.txt').replace('.jpg', '.txt')
for x in self.img_files[ds]]
# Search for max pedestrian id in dataset
for ds, label_paths in self.label_files.items():
max_index = -1
for lp in label_paths:
lb = np.loadtxt(lp)
if lb.shape[0] < 1:
continue
if lb.ndim < 2:
img_max = lb[1]
else:
img_max = np.max(lb[:, 1])
if img_max > max_index:
max_index = img_max
self.tid_num[ds] = max_index + 1
last_index = 0
for k, v in self.tid_num.items():
self.tid_start_index[k] = last_index
last_index += v
self.nid = int(last_index + 1)
self.nds = [len(x) for x in self.img_files.values()]
self.cds = [sum(self.nds[:i]) for i in range(len(self.nds))]
self.nf = sum(self.nds)
self.width = img_size[0]
self.height = img_size[1]
self.augment = augment
self.transforms = transforms
print('=' * 40)
print('dataset summary')
print(self.tid_num)
print('total # identities:', self.nid)
print('start index')
print(self.tid_start_index)
print('=' * 40)
def get_data(self, img_path, label_path):
"""
Get and prepare data (augment img).
"""
height = self.height
width = self.width
img = cv2.imread(img_path) # BGR
if img is None:
raise ValueError(f'File corrupt {img_path}')
augment_hsv = True
if self.augment and augment_hsv:
# SV augmentation by 50%
fraction = 0.50
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
s = img_hsv[:, :, 1].astype(np.float32)
v = img_hsv[:, :, 2].astype(np.float32)
a = (random.random() * 2 - 1) * fraction + 1
s *= a
if a > 1:
np.clip(s, a_min=0, a_max=255, out=s)
a = (random.random() * 2 - 1) * fraction + 1
v *= a
if a > 1:
np.clip(v, a_min=0, a_max=255, out=v)
img_hsv[:, :, 1] = s.astype(np.uint8)
img_hsv[:, :, 2] = v.astype(np.uint8)
cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
h, w, _ = img.shape
img, ratio, padw, padh = letterbox(img, height=height, width=width)
# Load labels
if os.path.isfile(label_path):
labels0 = np.loadtxt(label_path, dtype=np.float32).reshape(-1, 6)
# Normalized xywh to pixel xyxy format
labels = labels0.copy()
labels[:, 2] = ratio * w * (labels0[:, 2] - labels0[:, 4] / 2) + padw
labels[:, 3] = ratio * h * (labels0[:, 3] - labels0[:, 5] / 2) + padh
labels[:, 4] = ratio * w * (labels0[:, 2] + labels0[:, 4] / 2) + padw
labels[:, 5] = ratio * h * (labels0[:, 3] + labels0[:, 5] / 2) + padh
else:
            labels = np.zeros((0, 6), dtype=np.float32)  # keep a consistent (N, 6) shape for later padding
# Augment image and labels
if self.augment:
img, labels, _ = random_affine(img, labels, degrees=(-5, 5), translate=(0.10, 0.10), scale=(0.50, 1.20))
nlbls = len(labels)
if nlbls > 0:
# convert xyxy to xywh
labels[:, 2:6] = xyxy2xywh(labels[:, 2:6].copy()) # / height
labels[:, 2] /= width
labels[:, 3] /= height
labels[:, 4] /= width
labels[:, 5] /= height
if self.augment:
# random left-right flip
lr_flip = True
            if lr_flip and (random.random() > 0.5):
img = np.fliplr(img)
if nlbls > 0:
labels[:, 2] = 1 - labels[:, 2]
img = np.ascontiguousarray(img[:, :, ::-1]) # BGR to RGB
if self.transforms is not None:
img = self.transforms(img)
return img, labels, img_path
def __getitem__(self, files_index):
"""
Iterator function for train dataset
"""
for i, c in enumerate(self.cds):
if files_index >= c:
ds = list(self.label_files.keys())[i]
start_index = c
img_path = self.img_files[ds][files_index - start_index]
label_path = self.label_files[ds][files_index - start_index]
imgs, labels, img_path = self.get_data(img_path, label_path)
for i, _ in enumerate(labels):
if labels[i, 1] > -1:
labels[i, 1] += self.tid_start_index[ds]
        # Graph mode in MindSpore requires constant shapes,
        # so the targets are padded to the maximum possible number of objects per image.
to_fill = 100 - labels.shape[0]
padding = np.zeros((to_fill, 6), dtype=np.float32)
labels = np.concatenate((labels, padding), axis=0)
# Calculate confidence mask, bbox delta and ids for every map size
small, medium, big = build_thresholds(
labels=labels,
anchor_vec_s=self.anchors[0],
anchor_vec_m=self.anchors[1],
anchor_vec_b=self.anchors[2],
k_max=self.k_max,
)
tconf_s, tbox_s, tid_s, emb_indices_s = small
tconf_m, tbox_m, tid_m, emb_indices_m = medium
tconf_b, tbox_b, tid_b, emb_indices_b = big
total_values = (
imgs.astype(np.float32),
tconf_s,
tbox_s,
tid_s,
tconf_m,
tbox_m,
tid_m,
tconf_b,
tbox_b,
tid_b,
emb_indices_s,
emb_indices_m,
emb_indices_b,
)
return total_values
def __len__(self):
        return self.nf  # number of samples in all datasets
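# Typical wiring sketch (column names and flags are assumptions, not taken from train.py):
#   import mindspore.dataset as ds
#   dataset = JointDataset(root, paths, img_size=(1088, 608), augment=True, config=config)
#   loader = ds.GeneratorDataset(dataset, column_names=[...], shuffle=True)
#   loader = loader.batch(batch_size)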
class JointDatasetDetection(JointDataset):
"""
Joint dataset for evaluation.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __getitem__(self, files_index):
"""
Iterator function for train dataset.
"""
for i, c in enumerate(self.cds):
if files_index >= c:
ds = list(self.label_files.keys())[i]
start_index = c
img_path = self.img_files[ds][files_index - start_index]
label_path = self.label_files[ds][files_index - start_index]
imgs, labels, img_path = self.get_data(img_path, label_path)
for i, _ in enumerate(labels):
if labels[i, 1] > -1:
labels[i, 1] += self.tid_start_index[ds]
targets_size = labels.shape[0]
        # Graph mode in MindSpore requires constant shapes,
        # so the targets are padded to the maximum possible number of objects per image.
to_fill = 100 - labels.shape[0]
padding = np.zeros((to_fill, 6), dtype=np.float32)
labels = np.concatenate((labels, padding), axis=0)
output = (imgs.astype(np.float32), labels, targets_size)
return output
def letterbox(
img,
height=608,
width=1088,
color=(127.5, 127.5, 127.5),
):
"""
Resize a rectangular image to a padded rectangular
and fill padded border with color.
"""
shape = img.shape[:2] # shape = [height, width]
ratio = min(float(height) / shape[0], float(width) / shape[1])
new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height]
dw = (width - new_shape[0]) / 2 # width padding
dh = (height - new_shape[1]) / 2 # height padding
top, bottom = round(dh - 0.1), round(dh + 0.1)
left, right = round(dw - 0.1), round(dw + 0.1)
img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # padded rectangular
return img, ratio, dw, dh
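# Worked example (values are illustrative): a 1080x1920 (HxW) frame letterboxed to 608x1088:
#   ratio = min(608 / 1080, 1088 / 1920) ~= 0.563, so the frame is resized to 1081x608,
#   dw = (1088 - 1081) / 2 = 3.5 and dh = 0, i.e. 3 px are padded on the left and 4 px on the right.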
def random_affine(
img,
targets=None,
degrees=(-10, 10),
translate=(.1, .1),
scale=(.9, 1.1),
shear=(-2, 2),
border_value=(127.5, 127.5, 127.5),
):
"""
Apply several data augmentation techniques,
such as random rotation, random scale, color jittering
to reduce overfitting.
Every rotation and scaling and etc.
is also applied to targets bbox cords.
"""
border = 0 # width of added border (optional)
height = img.shape[0]
width = img.shape[1]
# Rotation and Scale
r = np.eye(3)
a = random.random() * (degrees[1] - degrees[0]) + degrees[0]
s = random.random() * (scale[1] - scale[0]) + scale[0]
r[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s)
# Translation
t = np.eye(3)
t[0, 2] = (random.random() * 2 - 1) * translate[0] * img.shape[0] + border # x translation (pixels)
t[1, 2] = (random.random() * 2 - 1) * translate[1] * img.shape[1] + border # y translation (pixels)
# Shear
s = np.eye(3)
s[0, 1] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # x shear (deg)
s[1, 0] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # y shear (deg)
    m = s @ t @ r  # Combined affine matrix (applied right-to-left: rotation/scale, then translation, then shear). ORDER IS IMPORTANT HERE!
imw = cv2.warpPerspective(img, m, dsize=(width, height), flags=cv2.INTER_LINEAR,
borderValue=border_value) # BGR order borderValue
# Return warped points also
if targets is not None:
if targets.shape[0] > 0:
n = targets.shape[0]
points = targets[:, 2:6].copy()
area0 = (points[:, 2] - points[:, 0]) * (points[:, 3] - points[:, 1])
# warp points
xy = np.ones((n * 4, 3))
xy[:, :2] = points[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
xy = (xy @ m.T)[:, :2].reshape(n, 8)
# create new boxes
x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
# apply angle-based reduction
radians = a * math.pi / 180
reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5
x = (xy[:, 2] + xy[:, 0]) / 2
y = (xy[:, 3] + xy[:, 1]) / 2
w = (xy[:, 2] - xy[:, 0]) * reduction
h = (xy[:, 3] - xy[:, 1]) * reduction
xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T
# reject warped points outside of image
np.clip(xy[:, 0], 0, width, out=xy[:, 0])
np.clip(xy[:, 2], 0, width, out=xy[:, 2])
np.clip(xy[:, 1], 0, height, out=xy[:, 1])
np.clip(xy[:, 3], 0, height, out=xy[:, 3])
w = xy[:, 2] - xy[:, 0]
h = xy[:, 3] - xy[:, 1]
area = w * h
ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))
i = (w > 4) & (h > 4) & (area / (area0 + 1e-16) > 0.1) & (ar < 10)
targets = targets[i]
            targets[:, 2:6] = xy[i]
        # Always return a 3-tuple when targets are provided (even if empty) so callers can unpack safely.
        return imw, targets, m
    return imw
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Evaluation scripts."""
import copy
import os
import motmetrics as mm
import numpy as np
import pandas as pd
from src.io import read_results, unzip_objs
mm.lap.default_solver = 'lap'
class Evaluator:
"""
Evaluation for tracking with motmetrics.
"""
def __init__(self, data_root, seq_name, data_type):
self.data_root = data_root
self.seq_name = seq_name
self.data_type = data_type
self.load_annotations()
self.reset_accumulator()
def load_annotations(self):
"""Load groundtruths."""
assert self.data_type == 'mot'
gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', 'gt.txt')
self.gt_frame_dict = read_results(gt_filename, self.data_type, is_gt=True)
self.gt_ignore_frame_dict = read_results(gt_filename, self.data_type, is_ignore=True)
def reset_accumulator(self):
self.acc = mm.MOTAccumulator(auto_id=True)
def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False):
"""
Eval one frame.
"""
# results
trk_tlwhs = np.copy(trk_tlwhs)
trk_ids = np.copy(trk_ids)
# gts
gt_objs = self.gt_frame_dict.get(frame_id, [])
gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2]
# ignore boxes
ignore_objs = self.gt_ignore_frame_dict.get(frame_id, [])
ignore_tlwhs = unzip_objs(ignore_objs)[0]
# remove ignored results
keep = np.ones(len(trk_tlwhs), dtype=bool)
iou_distance = mm.distances.iou_matrix(ignore_tlwhs, trk_tlwhs, max_iou=0.5)
if iou_distance.size > 0:
match_is, match_js = mm.lap.linear_sum_assignment(iou_distance)
match_is, match_js = map(lambda a: np.asarray(a, dtype=int), [match_is, match_js])
match_ious = iou_distance[match_is, match_js]
match_js = np.asarray(match_js, dtype=int)
match_js = match_js[np.logical_not(np.isnan(match_ious))]
keep[match_js] = False
trk_tlwhs = trk_tlwhs[keep]
trk_ids = trk_ids[keep]
# get distance matrix
iou_distance = mm.distances.iou_matrix(gt_tlwhs, trk_tlwhs, max_iou=0.5)
# acc
self.acc.update(gt_ids, trk_ids, iou_distance)
if rtn_events and iou_distance.size > 0 and hasattr(self.acc, 'last_mot_events'):
events = self.acc.last_mot_events
else:
events = None
return events
def eval_file(self, filename):
"""
Eval file.
"""
self.reset_accumulator()
result_frame_dict = read_results(filename, self.data_type, is_gt=False)
frames = sorted(list(set(self.gt_frame_dict.keys()) | set(result_frame_dict.keys())))
for frame_id in frames:
trk_objs = result_frame_dict.get(frame_id, [])
trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2]
self.eval_frame(frame_id, trk_tlwhs, trk_ids, rtn_events=False)
return self.acc
@staticmethod
def get_summary(accs, names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')):
"""
Get MOT summary.
"""
names = copy.deepcopy(names)
if metrics is None:
metrics = mm.metrics.motchallenge_metrics
metrics = copy.deepcopy(metrics)
mh = mm.metrics.create()
summary = mh.compute_many(
accs,
metrics=metrics,
names=names,
generate_overall=True
)
return summary
@staticmethod
def save_summary(summary, filename):
"""
Save evaluation summary.
"""
writer = pd.ExcelWriter(filename)
summary.to_excel(writer)
writer.save()
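# Usage sketch (the sequence name and paths are illustrative):
#   evaluator = Evaluator(data_root='/path/to/MOT16/train', seq_name='MOT16-02', data_type='mot')
#   acc = evaluator.eval_file('results/MOT16-02.txt')
#   summary = Evaluator.get_summary([acc], ['MOT16-02'])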
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""MOT utils."""
import os
import numpy as np
def read_results(filename, data_type: str, is_gt=False, is_ignore=False):
"""
Read results.
"""
if data_type in ('mot', 'lab'):
read_fun = read_mot_results
else:
        raise ValueError(f'Unknown data type: {data_type}')
return read_fun(filename, is_gt, is_ignore)
def read_mot_results(filename, is_gt, is_ignore):
"""
Read MOT results.
"""
valid_labels = {1}
ignore_labels = {2, 7, 8, 12}
results_dict = {}
if os.path.isfile(filename):
with open(filename, 'r') as f:
for line in f.readlines():
linelist = line.split(',')
if len(linelist) < 7:
continue
fid = int(linelist[0])
if fid < 1:
continue
results_dict.setdefault(fid, [])
if is_gt:
if 'MOT16-' in filename or 'MOT17-' in filename:
label = int(float(linelist[7]))
mark = int(float(linelist[6]))
if mark == 0 or label not in valid_labels:
continue
score = 1
elif is_ignore:
if 'MOT16-' in filename or 'MOT17-' in filename:
label = int(float(linelist[7]))
vis_ratio = float(linelist[8])
if label not in ignore_labels and vis_ratio >= 0:
continue
else:
continue
score = 1
else:
score = float(linelist[6])
tlwh = tuple(map(float, linelist[2:6]))
target_id = int(linelist[1])
results_dict[fid].append((tlwh, target_id, score))
return results_dict
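# Field layout parsed above (standard MOTChallenge text format; the trailing fields exist only in gt.txt):
#   <frame>,<id>,<bb_left>,<bb_top>,<bb_width>,<bb_height>,<conf/mark>,<class>,<visibility>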
def unzip_objs(objs):
"""
Unzip objects.
"""
if objs:
tlwhs, ids, scores = zip(*objs)
else:
tlwhs, ids, scores = [], [], []
tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4)
return tlwhs, ids, scores