diff --git a/research/cv/JDE/DATASET_ZOO.md b/research/cv/JDE/DATASET_ZOO.md
new file mode 100644
index 0000000000000000000000000000000000000000..9ea8c96fa044d8be5af788b1130aa58bacadd7f9
--- /dev/null
+++ b/research/cv/JDE/DATASET_ZOO.md
@@ -0,0 +1,304 @@
+# Dataset Zoo
+
+Dataset preparation follows [Towards-Realtime-MOT](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md).
+
+## Data Format
+
+The root folder of the datasets has the following structure:
+
+```text
+.
+└─datasets
+  ├─Caltech
+  ├─Cityscapes
+  ├─CUHKSYSU
+  ├─ETHZ
+  ├─MOT16
+  ├─MOT17
+  └─PRW
+```
+
+Every image has a corresponding annotation text file. Given an image path,
+the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes a bounding box and has the following format:
+
+```text
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+
+The field `[class]` should be `0`. Only single-class multi-object tracking is supported.
+
+The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` if the box has no identity annotation.
+
+Note that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
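+
+For illustration, the image-to-annotation path mapping and the label format above can be exercised with a few lines of plain Python. This is only a sketch; the example image path is hypothetical.
+
+```python
+from pathlib import Path
+
+
+def label_path_for(image_path: str) -> str:
+    """Map an image path to its annotation path (images -> labels_with_ids, .jpg -> .txt)."""
+    return image_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')
+
+
+def read_labels(image_path: str):
+    """Parse an annotation file into (class, identity, x_center, y_center, width, height) tuples."""
+    label_file = Path(label_path_for(image_path))
+    if not label_file.exists():
+        return []
+    boxes = []
+    for line in label_file.read_text().splitlines():
+        cls, identity, x_c, y_c, w, h = line.split()
+        boxes.append((int(cls), int(identity), float(x_c), float(y_c), float(w), float(h)))
+    return boxes
+
+
+# Hypothetical image path inside the dataset root shown above
+print(read_labels('datasets/MOT17/images/train/MOT17-02-SDP/img1/000001.jpg'))
+```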
+
+## Download
+
+### Caltech Pedestrian
+
+Download all `set**.tar` archives from [this page](https://drive.google.com/drive/folders/1IBlcJP8YsCaT81LwQ2YwQJac8bf1q8xF?usp=sharing) and extract them to `Caltech/data`.
+
+Download the [annotations](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing) and unzip them to `Caltech/data/labels_with_ids`.
+
+Download [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to images.
+Move the tool's `scripts` folder into the `Caltech` folder and run:
+
+```bash
+python scripts/convert_seqs.py
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─Caltech
+  └─data
+    ├─images
+    │ └─***
+    └─labels_with_ids
+      └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### CityPersons
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
+[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
+[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
+[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
+
+Download the `.zip` archives from the links above and use the following commands:
+
+```bash
+zip --FF Citypersons --out c.zip
+unzip c.zip
+mv Citypersons Cityscapes
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─Cityscapes
+  ├─images
+  │ ├─train
+  │ └─val
+  └─labels_with_ids
+    ├─train
+    └─val
+```
+
+### CUHK-SYSU
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
+
+Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
+
+Download the dataset, unzip it and use the command below.
+
+```bash
+mv CUHK-SYSU CUHKSYSU
+```
+
+The structure of the dataset will be the following:
+
+```text
+.
+└─CUHKSYSU
+  ├─images
+  │ └─***
+  └─labels_with_ids
+    └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### PRW
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─PRW
+  ├─images
+  │ └─***
+  └─labels_with_ids
+    └─***
+```
+
+Note: `***` stands for the data itself (images or annotations).
+
+### ETHZ (overlapping with MOT-16 removed)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
+
+Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─ETHZ
+  ├─eth01
+  │ ├─images
+  │ │ └─***
+  │ └─labels_with_ids
+  │   └─***
+  ├─eth02
+  ├─eth03
+  ├─eth05
+  └─eth07
+```
+
+Note: `***` stands for the data itself (images or annotations). Every `eth*` folder has the same structure as `eth01`.
+
+### MOT-17
+
+Official link:
+[[0]](https://motchallenge.net/data/MOT17.zip)
+
+Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
+
+After downloading, unzip the archive and use the `prepare_mot17.py` script from the `data` folder:
+
+```bash
+python data/prepare_mot17.py --seq_root /path/to/MOT17/train
+```
+
+The structure of the dataset after completing all steps will be the following:
+
+```text
+.
+└─MOT17
+  ├─images
+  │ └─train
+  └─labels_with_ids
+    └─train
+```
+
+### MOT-16 (for evaluation)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
+
+Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
+
+Download link: [MOT-16.zip](https://motchallenge.net/data/MOT16.zip)
+
+> See the "Download" section at the bottom of the web page, link "Get all data".
+
+Download the dataset and unzip it. The structure of the dataset will be the following:
+
+```text
+.
+└─MOT16
+  └─train
+```
+
+## Data Config
+
+Download the [schema files](https://github.com/Zhongdao/Towards-Realtime-MOT/tree/master/data) of the training data (lists of relative paths for every image, divided into train/val parts) and move them into the `data` folder:
+
+```text
+.
+└── data
+    ├── caltech.10k.val
+    ├── caltech.train
+    ├── caltech.val
+    ├── citypersons.train
+    ├── citypersons.val
+    ├── cuhksysu.train
+    ├── cuhksysu.val
+    ├── eth.train
+    ├── mot17.train
+    ├── prw.train
+    └── prw.val
+```
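+
+Before training, it can be useful to check that the schema files and the extracted datasets agree. The sketch below assumes each schema file is a plain-text list of image paths relative to the dataset root (one path per line); the dataset root path and the chosen schema names are examples.
+
+```python
+from pathlib import Path
+
+DATASET_ROOT = Path('/path/to/datasets/root/folder')  # same value as --dataset_root
+
+
+def check_schema(schema_file: str) -> None:
+    """Count listed images that are missing or have no corresponding label file."""
+    missing_images = missing_labels = 0
+    for rel_path in Path(schema_file).read_text().split():
+        image = DATASET_ROOT / rel_path
+        label = Path(str(image).replace('images', 'labels_with_ids')).with_suffix('.txt')
+        missing_images += not image.exists()
+        missing_labels += not label.exists()
+    print(f'{schema_file}: {missing_images} missing images, {missing_labels} missing labels')
+
+
+for schema in ('data/caltech.train', 'data/citypersons.train', 'data/mot17.train'):
+    check_schema(schema)
+```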
+
+## Citation
+
+Caltech:
+
+```text
+@inproceedings{dollarCVPR09peds,
+  author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
+  title = "Pedestrian Detection: A Benchmark",
+  booktitle = "CVPR",
+  month = "June",
+  year = "2009",
+  city = "Miami",
+}
+```
+
+Citypersons:
+
+```text
+@INPROCEEDINGS{Shanshan2017CVPR,
+  author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
+  title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
+  booktitle = {CVPR},
+  year = {2017}
+}
+
+@INPROCEEDINGS{Cordts2016Cityscapes,
+  title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
+  author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
+  booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2016}
+}
+```
+
+CUHK-SYSU:
+
+```text
+@inproceedings{xiaoli2017joint,
+  title={Joint Detection and Identification Feature Learning for Person Search},
+  author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
+  booktitle={CVPR},
+  year={2017}
+}
+```
+
+PRW:
+
+```text
+@inproceedings{zheng2017person,
+  title={Person re-identification in the wild},
+  author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={1367--1376},
+  year={2017}
+}
+```
+
+ETHZ:
+
+```text
+@InProceedings{eth_biwi_00534,
+  author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
+  title = {A Mobile Vision System for Robust Multi-Person Tracking},
+  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
+  year = {2008},
+  month = {June},
+  publisher = {IEEE Press},
+}
+```
+
+MOT-16&17:
+
+```text
+@article{milan2016mot16,
+  title={MOT16: A benchmark for multi-object tracking},
+  author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
+  journal={arXiv preprint arXiv:1603.00831},
+  year={2016}
+}
+```
diff --git a/research/cv/JDE/README.md b/research/cv/JDE/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..35e9734c7f9dec167d6526796ad60fce13961f47
--- /dev/null
+++ b/research/cv/JDE/README.md
@@ -0,0 +1,392 @@
+# Contents
+
+- [Contents](#contents)
+    - [JDE Description](#jde-description)
+    - [Model Architecture](#model-architecture)
+    - [Dataset](#dataset)
+    - [Environment Requirements](#environment-requirements)
+    - [Quick Start](#quick-start)
+    - [Script Description](#script-description)
+        - [Script and Sample Code](#script-and-sample-code)
+        - [Script Parameters](#script-parameters)
+        - [Training Process](#training-process)
+            - [Standalone Training](#standalone-training)
+            - [Distribute Training](#distribute-training)
+        - [Evaluation Process](#evaluation-process)
+            - [Evaluation](#evaluation)
+        - [Inference Process](#inference-process)
+            - [Usage](#usage)
+            - [Result](#result)
+    - [Model Description](#model-description)
+        - [Performance](#performance)
+            - [Training Performance](#training-performance)
+            - [Evaluation Performance](#evaluation-performance)
+    - [ModelZoo Homepage](#modelzoo-homepage)
+
+## [JDE Description](#contents)
+
+The paper that introduces the JDE model is dedicated to improving the efficiency of an MOT system.
+It presents an early attempt to jointly learn the Detector and Embedding model (JDE) in a single-shot deep network.
+In other words, the proposed JDE employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes.
+In comparison, SDE methods and two-stage methods are characterized by re-sampled pixels (bounding boxes) and feature maps, respectively.
+Both the bounding boxes and feature maps are fed into a separate re-ID model for appearance feature extraction.
+The method is near real-time while being almost as accurate as the SDE methods.
+
+[Paper](https://arxiv.org/pdf/1909.12605.pdf): Towards Real-Time Multi-Object Tracking. Department of Electronic Engineering, Tsinghua University
+
+## [Model Architecture](#contents)
+
+The architecture of JDE is a Feature Pyramid Network (FPN).
+FPN makes predictions from multiple scales, thus bringing improvement in pedestrian detection, where the scale of targets varies a lot.
+An input video frame first undergoes a forward pass through a backbone network to obtain feature maps at three scales, namely, scales with 1/32, 1/16 and 1/8 down-sampling rate, respectively.
+Then, the feature map with the smallest size (also the semantically strongest features) is up-sampled and fused with the feature map from the second smallest scale by a skip connection, and the same goes for the other scales.
+Finally, prediction heads are added upon the fused feature maps at all three scales.
+A prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A + D) × H × W, where A is the number of anchor templates assigned to this scale, and D is the dimension of the embedding.
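+
+As a quick sanity check of this output size, the number of prediction channels per head can be derived from the values in `default_config.yaml` (12 anchor scales shared over the 3 feature maps, `embedding_dim = 512`). The split of the 6A detection channels below follows the JDE paper; the numbers are illustrative arithmetic, not output of the training code.
+
+```python
+# Channel arithmetic for one (6A + D) x H x W prediction head.
+num_anchor_boxes = 12      # anchor_scales in default_config.yaml: 12 boxes in total
+num_feature_maps = 3       # 1/32, 1/16 and 1/8 scales
+embedding_dim = 512        # D (embedding_dim in default_config.yaml)
+
+anchors_per_scale = num_anchor_boxes // num_feature_maps   # A = 4
+box_regression = 4 * anchors_per_scale                     # 4A regression coefficients
+box_classification = 2 * anchors_per_scale                 # 2A classification scores
+detection_channels = box_regression + box_classification   # 6A = 24, matches out_channel
+head_channels = detection_channels + embedding_dim         # 6A + D = 536
+
+print(anchors_per_scale, detection_channels, head_channels)  # 4 24 536
+```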
+
+## [Dataset](#contents)
+
+A large-scale training set is used, built by putting together six publicly available datasets on pedestrian detection, MOT and person search.
+
+These datasets can be categorized into two types: ones that only contain bounding box annotations, and ones that have both bounding box and identity annotations.
+The first category includes the ETH dataset and the CityPersons (CP) dataset. The second category includes the CalTech (CT) dataset, MOT16 (M16) dataset, CUHK-SYSU (CS) dataset and PRW dataset.
+Training subsets of all these datasets are gathered to form the joint training set, and videos in the ETH dataset that overlap with the MOT-16 test set are excluded for fair evaluation.
+
+Dataset preparation is described in [DATASET_ZOO.md](DATASET_ZOO.md).
+
+Datasets size: 134G, 1 object category (pedestrian).
+
+Note: `--dataset_root` is used as an entry point for all datasets used for training and evaluating this model.
+
+Organize your dataset structure as follows:
+
+```text
+.
+└─dataset_root/
+  ├─Caltech/
+  ├─Cityscapes/
+  ├─CUHKSYSU/
+  ├─ETHZ/
+  ├─MOT16/
+  ├─MOT17/
+  └─PRW/
+```
+
+Information about the train part of the dataset:
+
+| Dataset | ETH | CP  | CT   | M16  | CS  | PRW  | Total |
+| :------:|:---:|:---:|:----:|:----:|:---:|:----:|:-----:|
+| # img   | 2K  | 3K  | 27K  | 53K  | 11K | 6K   | 54K   |
+| # box   | 17K | 21K | 46K  | 112K | 55K | 18K  | 270K  |
+| # ID    | -   | -   | 0.6K | 0.5K | 7K  | 0.5K | 8.7K  |
+
+## [Environment Requirements](#contents)
+
+- Hardware (GPU)
+    - Prepare hardware environment with GPU processor.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
+
+## [Quick Start](#contents)
+
+After installing MindSpore through the official website, you can follow the steps below for training and evaluation.
+In particular, before training you need to install the requirements with `pip install -r requirements.txt`.
+
+> If an error occurs, update pip with `pip install --upgrade pip` and try again.
+> If that does not help, install the packages manually with `pip install {package from requirements.txt}`.
+
+Note: PyTorch is used only for checkpoint conversion.
+
+All trainings start from a pre-trained backbone: [download](https://drive.google.com/file/d/1keZwVIfcWmxfTiswzOKUwkUz2xjvTvfm/view) the DarkNet53 backbone pre-trained on ImageNet and convert it with the commands below:
+
+```bash
+# From the root model directory run
+python -m src.convert_checkpoint --ckpt_url [PATH_TO_PYTORCH_CHECKPOINT]
+```
+
+- PATH_TO_PYTORCH_CHECKPOINT - Path to the downloaded darknet53 PyTorch checkpoint.
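+
+To make sure the conversion succeeded, the resulting checkpoint can be loaded back into the DarkNet53 backbone defined in `src/darknet.py`. This is only a hedged convenience sketch built from classes and config values already present in this repository (mirroring how `eval.py` builds the backbone); the checkpoint path is an assumed example.
+
+```python
+# Hedged sketch: verify that the converted checkpoint loads into the backbone.
+from mindspore import load_checkpoint, load_param_into_net
+
+from cfg.config import config
+from src.darknet import DarkNet, ResidualBlock
+
+backbone = DarkNet(
+    ResidualBlock,
+    config.backbone_layers,
+    config.backbone_input_shape,
+    config.backbone_shape,
+    detect=True,
+)
+
+params = load_checkpoint('darknet53.ckpt')  # file produced by src/convert_checkpoint.py
+not_loaded = load_param_into_net(backbone, params)
+print('Parameters not loaded:', not_loaded)
+```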
+
+After converting the checkpoint and installing the requirements, you can run the training scripts:
+
+```bash
+# Run standalone training example
+bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+
+# Run distribute training example
+bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- DEVICE_NUM - Number of devices used for distributed training.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+## [Script Description](#contents)
+
+### [Script and Sample Code](#contents)
+
+```text
+.
+└─JDE
+  ├─data
+  │ └─prepare_mot17.py              # MOT17 data preparation script
+  ├─cfg
+  │ ├─ccmcpe.json                   # paths to dataset schemas (defining relative paths structure)
+  │ └─config.py                     # parameter parser
+  ├─scripts
+  │ ├─run_distribute_train_gpu.sh   # launch distribute train on GPU
+  │ ├─run_eval_gpu.sh               # launch evaluation on GPU
+  │ └─run_standalone_train_gpu.sh   # launch standalone train on GPU
+  ├─src
+  │ ├─__init__.py
+  │ ├─convert_checkpoint.py         # backbone checkpoint converter (torch to mindspore)
+  │ ├─darknet.py                    # backbone of the network
+  │ ├─dataset.py                    # create dataset
+  │ ├─evaluation.py                 # motmetrics evaluator
+  │ ├─io.py                         # MOT evaluation utils
+  │ ├─kalman_filter.py              # kalman filter script
+  │ ├─log.py                        # logger script
+  │ ├─model.py                      # create model script
+  │ ├─timer.py                      # timer script
+  │ ├─utils.py                      # utilities used in other scripts
+  │ └─visualization.py              # visualization for inference
+  ├─tracker
+  │ ├─__init__.py
+  │ ├─basetrack.py                  # base class for tracking
+  │ ├─matching.py                   # matching for tracking script
+  │ └─multitracker.py               # tracker init script
+  ├─DATASET_ZOO.md                  # dataset preparation description
+  ├─README.md
+  ├─default_config.yaml             # default configs
+  ├─eval.py                         # evaluation script
+  ├─eval_detect.py                  # detector evaluation script
+  ├─export.py                       # export to MINDIR script
+  ├─infer.py                        # inference script
+  ├─requirements.txt
+  └─train.py                        # training script
+```
+
+### [Script Parameters](#contents)
+
+```text
+Parameters in config.py and default_config.yaml.
+Includes arguments for training/evaluation/inference.
+
+--config_path             Path to default_config.yaml with hyperparameters and defaults
+--data_cfg_url            Path to .json with paths to dataset schemas
+--momentum                Momentum for SGD optimizer
+--decay                   Weight decay for SGD optimizer
+--lr                      Initial learning rate
+--epochs                  Number of epochs to train
+--batch_size              Batch size per device
+--num_classes             Number of object classes
+--k_max                   Max predictions per one map (made for optimization of FC layer embedding computation)
+--img_size                Size of input images
+--track_buffer            Tracking buffer
+--keep_checkpoint_max     Keep saved last N checkpoints
+--backbone_input_shape    Input filters of backbone layers
+--backbone_shape          Input filters of backbone layers
+--backbone_layers         Output filters of backbone layers
+--out_channel             Number of channels for detection
+--embedding_dim           Number of channels for embeddings
+--iou_thres               IOU thresholds
+--conf_thres              Confidence threshold
+--nms_thres               Threshold for Non-max suppression
+--min_box_area            Filter out tiny boxes
+--anchor_scales           12 predefined anchor boxes, 4 different ones per each of the 3 feature maps
+--col_names_train         Names of columns for training GeneratorDataset
+--col_names_val           Names of columns for validation GeneratorDataset
+--is_distributed          Distribute training or not
+--dataset_root            Path to datasets root folder
+--device_target           Device GPU or any
+--device_id               Device id of target device
+--device_start            Start device id
+--ckpt_url                Location of checkpoint
+--logs_dir                Dir to save logs and ckpt
+--input_video             Path to the input video
+--output_format           Expected output format
+--output_root             Expected output root path
+--save_images             Save tracking results (image)
+--save_videos             Save tracking results (video)
+```
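+
+All of these parameters are merged by `cfg/config.py`: the defaults come from `default_config.yaml`, and any value can be overridden on the command line of `train.py`, `eval.py` and the other entry points. A minimal, hedged sketch of inspecting the merged configuration:
+
+```python
+# Print a few merged configuration values (defaults from default_config.yaml
+# plus any command-line overrides parsed by cfg/config.py).
+from cfg.config import config
+
+print(config.lr, config.epochs, config.batch_size)
+print(len(config.anchor_scales), 'anchor boxes,', config.embedding_dim, 'embedding channels')
+```
+
+The same mechanism is used by the shell scripts, e.g. the standalone training script passes `--lr=0.00125` to `train.py`.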
+
+### [Training Process](#contents)
+
+#### Standalone Training
+
+Note: all trainings require the pre-trained DarkNet53 backbone.
+
+```bash
+bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The above command will run in the background; you can view the results through the generated standalone_train.log file.
+After training, you can find the training loss and time logs in the chosen LOGS_CKPT_DIR.
+
+The model checkpoints will be saved in the LOGS_CKPT_DIR directory.
+
+#### Distribute Training
+
+```bash
+bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_NUM - Number of devices.
+- LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
+- CKPT_URL - Path to the converted pre-trained DarkNet53 backbone.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The above shell script will run the distributed training in the background.
+Here is an example of the training logs:
+
+```text
+epoch: 30 step: 1612, loss is -4.7679796
+epoch: 30 step: 1612, loss is -5.816874
+epoch: 30 step: 1612, loss is -5.302864
+epoch: 30 step: 1612, loss is -5.775913
+epoch: 30 step: 1612, loss is -4.9537477
+epoch: 30 step: 1612, loss is -4.3535285
+epoch: 30 step: 1612, loss is -5.0773625
+epoch: 30 step: 1612, loss is -4.2019467
+epoch time: 2023042.925 ms, per step time: 1209.954 ms
+epoch time: 2023069.500 ms, per step time: 1209.970 ms
+epoch time: 2023097.331 ms, per step time: 1209.986 ms
+epoch time: 2023038.221 ms, per step time: 1209.951 ms
+epoch time: 2023098.113 ms, per step time: 1209.987 ms
+epoch time: 2023093.300 ms, per step time: 1209.984 ms
+epoch time: 2023078.631 ms, per step time: 1209.975 ms
+epoch time: 2017509.966 ms, per step time: 1206.645 ms
+train success
+train success
+train success
+train success
+train success
+train success
+train success
+train success
+```
+
+### [Evaluation Process](#contents)
+
+#### Evaluation
+
+The tracking ability of the model is tested on the train part of the MOT16 dataset (not used during training).
+
+To start the tracker evaluation, run the command below.
+
+```bash
+bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+> Note: the script expects the DATASET_ROOT directory to contain the MOT16 sub-folder.
+
+The above command will run in the background. The validation logs will be saved in "eval.log".
+
+For more details about `motmetrics`, you can refer to the [MOT benchmark](https://motchallenge.net/).
+
+```text
+DATE-DATE-DATE TIME:TIME:TIME [INFO]: Time elapsed: 240.54 seconds, FPS: 22.04
+          IDF1   IDP   IDR  Rcll  Prcn  GT  MT  PT ML   FP    FN  IDs   FM  MOTA  MOTP IDt IDa IDm
+MOT16-02 45.1% 49.9% 41.2% 71.0% 86.0%  54  17  31  6 2068  5172  425  619 57.0% 0.215 239  68  14
+MOT16-04 69.5% 75.5% 64.3% 80.6% 94.5%  83  45  24 14 2218  9234  175  383 75.6% 0.184  98  28   3
+MOT16-05 63.6% 68.1% 59.7% 82.0% 93.7% 125  67  49  9  376  1226  137  210 74.5% 0.203 113  40  40
+MOT16-09 55.2% 60.4% 50.8% 78.1% 92.9%  25  16   8  1  316  1152  108  147 70.0% 0.187  76  15  11
+MOT16-10 57.1% 59.9% 54.5% 80.1% 88.1%  54  28  26  0 1337  2446  376  569 66.2% 0.228 202  66  16
+MOT16-11 75.0% 76.4% 73.7% 89.6% 92.9%  69  50  16  3  626   953   78  137 81.9% 0.159  49  24  12
+MOT16-13 64.8% 69.9% 60.3% 78.5% 90.9% 107  58  43  6  900  2463  272  528 68.3% 0.223 200  59  48
+OVERALL  63.2% 68.1% 58.9% 79.5% 91.8% 517 281 197 39 7841 22646 1571 2593 71.0% 0.196 977 300 144
+```
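+
+In addition to the summary, the evaluation writes one result file per sequence in the MOT format produced by `write_results` in `eval.py` (`frame,id,x1,y1,w,h,1,-1,-1,-1`). The helper below is a small hedged sketch for loading such a file; the path is an example.
+
+```python
+import csv
+from collections import defaultdict
+
+
+def load_mot_results(path):
+    """Read a JDE results file into {frame_id: [(track_id, x1, y1, w, h), ...]}."""
+    tracks_per_frame = defaultdict(list)
+    with open(path, newline='') as f:
+        for row in csv.reader(f):
+            frame, track_id = int(row[0]), int(row[1])
+            x1, y1, w, h = map(float, row[2:6])
+            tracks_per_frame[frame].append((track_id, x1, y1, w, h))
+    return tracks_per_frame
+
+
+results = load_mot_results('results/exp_name/MOT16-02.txt')  # example path
+print(len(results), 'frames,', sum(len(v) for v in results.values()), 'boxes')
+```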
+
+To evaluate the detection ability of the model (mAP, Precision and Recall metrics), run the command below.
+
+```bash
+python eval_detect.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --dataset_root [DATASET_ROOT]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- DATASET_ROOT - Path to the dataset root directory (containing all dataset parts, described in [DATASET_ZOO.md](DATASET_ZOO.md)).
+
+The evaluation results will be printed to the command line:
+
+```text
+      Image      Total          P          R        mAP
+       4000      30353      0.829      0.778      0.765     0.426s
+       8000      30353      0.863      0.798      0.788      0.42s
+      12000      30353      0.854      0.815      0.802     0.419s
+      16000      30353      0.857      0.821      0.809     0.582s
+      20000      30353      0.865      0.834      0.824     0.413s
+      24000      30353      0.868      0.841      0.832     0.415s
+      28000      30353      0.874      0.839       0.83     0.419s
+mean_mAP: 0.8225, mean_R: 0.8325, mean_P: 0.8700
+```
+
+### [Inference Process](#contents)
+
+#### Usage
+
+To compile a video from frames with predicted bounding boxes, you need to install `ffmpeg` with
+`sudo apt-get install ffmpeg`. Video compiling happens automatically.
+
+```bash
+python infer.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --input_video [INPUT_VIDEO]
+```
+
+- DEVICE_ID - Device ID.
+- CKPT_URL - Path to the trained JDE model.
+- INPUT_VIDEO - Path to the input video for tracking.
+
+#### Result
+
+Results of the inference will be saved into the default `./results` folder; logs will be shown at the command line.
+
+## [Model Description](#contents)
+
+### [Performance](#contents)
+
+#### Training Performance
+
+| Parameters          | GPU (8p)                                                                            |
+| ------------------- | ----------------------------------------------------------------------------------- |
+| Model               | JDE (1088*608)                                                                      |
+| Hardware            | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz                              |
+| Upload Date         | 02/02/2022 (day/month/year)                                                         |
+| MindSpore Version   | 1.5.0                                                                               |
+| Dataset             | Joint Dataset (see `DATASET_ZOO.md`)                                                |
| Training Parameters | epoch=30, batch_size=4 (per device), lr=0.01, momentum=0.9, weight_decay=0.0001     |
+| Optimizer           | SGD                                                                                 |
+| Loss Function       | SmoothL1Loss, SoftmaxCrossEntropyWithLogits (with the auto-balancing loss strategy) |
+| Outputs             | Tensor of bbox coords, conf, class, emb                                             |
+| Speed               | Eight cards: ~1206 ms/step                                                          |
+| Total time          | Eight cards: ~17 hours                                                              |
+
+#### Evaluation Performance
+
+| Parameters        | GPU (1p)                                               |
+| ----------------- | ------------------------------------------------------ |
+| Model             | JDE (1088*608)                                         |
+| Resource          | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
+| Upload Date       | 02/02/2022 (day/month/year)                            |
+| MindSpore Version | 1.5.0                                                  |
+| Dataset           | MOT-16                                                 |
+| Batch_size        | 1                                                      |
+| Outputs           | Metrics, .txt predictions                              |
+| FPS               | 22.04                                                  |
+| Metrics           | mAP 82.2, MOTA 71.0%                                   |
+
+## [ModelZoo Homepage](#contents)
+
+Please check the official [homepage](https://gitee.com/mindspore/models).
diff --git a/research/cv/JDE/cfg/ccmcpe.json b/research/cv/JDE/cfg/ccmcpe.json
new file mode 100644
index 0000000000000000000000000000000000000000..ac1825ea5b45a7a62b8d527f7c899715e822fb40
--- /dev/null
+++ b/research/cv/JDE/cfg/ccmcpe.json
@@ -0,0 +1,22 @@
+{
+    "train":
+    {
+        "mot17":"./data/mot17.train",
+        "caltech":"./data/caltech.train",
+        "citypersons":"./data/citypersons.train",
+        "cuhksysu":"./data/cuhksysu.train",
+        "prw":"./data/prw.train",
+        "eth":"./data/eth.train"
+    },
+    "test_emb":
+    {
+        "caltech":"./data/caltech.10k.val",
+        "cuhksysu":"./data/cuhksysu.val",
+        "prw":"./data/prw.val"
+    },
+    "test":
+    {
+        "caltech":"./data/caltech.val",
+        "citypersons":"./data/citypersons.val"
+    }
+}
diff --git a/research/cv/JDE/cfg/config.py b/research/cv/JDE/cfg/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..8b4bc609818bd001e5dd0476999c8cc98e2cfb44
--- /dev/null
+++ b/research/cv/JDE/cfg/config.py
@@ -0,0 +1,129 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Parse arguments""" +import argparse +import ast +from pathlib import Path +from pprint import pformat + +import yaml + + +class Config: + """ + Configuration namespace, convert dictionary to members. + """ + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser (argparse.ArgumentParser): Parent parser. + cfg (dict): Base configuration. + helper (dict): Helper description. + choices (dict): Choices. + """ + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else f"Please reference to {cfg_path}" + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + """ + with open(yaml_path, 'r') as fin: + try: + cfgs_raw = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [] + for cf in cfgs_raw: + cfgs.append(cf) + + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml") + except ValueError("Failed to parse yaml") as err: + raise err + + return cfg, cfg_helper, cfg_choices + + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args (argparse.Namespace): Command line arguments. + cfg (dict): Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + + return cfg + + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. 
+ """ + curr_dir = Path(__file__).resolve().parent + parser = argparse.ArgumentParser(description="JDE config", add_help=False) + parser.add_argument("--config_path", type=str, default=str(curr_dir / "../default_config.yaml"), + help="Path to config.") + parser.add_argument("--data_cfg_url", type=str, default=str(curr_dir / "ccmcpe.json"), + help="Path to data config.") + path_args, _ = parser.parse_known_args() + default, helper, choices = parse_yaml(path_args.config_path) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices) + final_config = merge(args, default) + + return Config(final_config) + + +config = get_config() diff --git a/research/cv/JDE/data/prepare_mot17.py b/research/cv/JDE/data/prepare_mot17.py new file mode 100644 index 0000000000000000000000000000000000000000..b147b43b7269928b5525d88a32518934b590868c --- /dev/null +++ b/research/cv/JDE/data/prepare_mot17.py @@ -0,0 +1,79 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Prepare data.""" +import argparse +import os +import os.path as osp +import shutil +from pathlib import Path + +import numpy as np + + +def prepare(seq_root): + """Prepare MOT17 dataset for JDE training.""" + label_root = str(Path(Path(seq_root).parents[0], 'labels_with_ids', 'train')) + seqs = [s for s in os.listdir(seq_root) if s.endswith('SDP')] + + tid_curr = 0 + tid_last = -1 + + for seq in seqs: + with open(osp.join(seq_root, seq, 'seqinfo.ini')) as file: + seq_info = file.read() + + seq_width = int(seq_info[seq_info.find('imWidth=') + 8: seq_info.find('\nimHeight')]) + seq_height = int(seq_info[seq_info.find('imHeight=') + 9: seq_info.find('\nimExt')]) + + gt_txt = osp.join(seq_root, seq, 'gt', 'gt.txt') + gt = np.loadtxt(gt_txt, dtype=np.float64, delimiter=',') + + seq_label_root = osp.join(label_root, seq, 'img1') + if not osp.exists(seq_label_root): + os.makedirs(seq_label_root) + + for fid, tid, x, y, w, h, mark, label, _ in gt: + if mark == 0 or not label == 1: + continue + fid = int(fid) + tid = int(tid) + if tid != tid_last: + tid_curr += 1 + tid_last = tid + x += w / 2 + y += h / 2 + label_fpath = osp.join(seq_label_root, '{:06d}.txt'.format(fid)) + label_str = '0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format( + tid_curr, x / seq_width, y / seq_height, w / seq_width, h / seq_height) + with open(label_fpath, 'a') as f: + f.write(label_str) + + old_path = str(Path(seq_root, seq)) + new_path = str(Path(Path(seq_root).parents[0], 'images', 'train')) + + if not osp.exists(new_path): + os.makedirs(new_path) + + shutil.move(old_path, new_path) + + print('Done') + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--seq_root", required=True, help='Path to root dir of sequences') + + args = parser.parse_args() + prepare(args.seq_root) diff --git a/research/cv/JDE/default_config.yaml b/research/cv/JDE/default_config.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..c00ce6282dd499a55f167d516fbd3a347c10d379 --- /dev/null +++ b/research/cv/JDE/default_config.yaml @@ -0,0 +1,120 @@ +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) + +# hyperparameters of training +momentum: 0.9 +decay: 0.0001 +lr: 0.01 +epochs: 30 +batch_size: 4 + +# other +num_classes: 1 +k_max: 250 +img_size: [1088, 608] +track_buffer: 30 +keep_checkpoint_max: 6 + +# model initialization parameters +backbone_input_shape: [32, 64, 128, 256, 512] +backbone_shape: [64, 128, 256, 512, 1024] +backbone_layers: [1, 2, 8, 8, 4] +out_channel: 24 # 3 * (num_classes + 5) +embedding_dim: 512 + +# evaluation thresholds +iou_thres: 0.50 +conf_thres: 0.55 +nms_thres: 0.45 +min_box_area: 200 + +# h -> w +anchor_scales: [ + [8, 24], + [11, 34], + [16, 48], + [23, 68], + [32, 96], + [45, 135], + [64, 192], + [90, 271], + [128, 384], + [180, 540], + [256, 640], + [512, 640], +] + + +# data configs +col_names_train: [ + 'imgs', + 'tconf_s', + 'tbox_s', + 'tid_s', + 'tconf_m', + 'tbox_m', + 'tid_m', + 'tconf_b', + 'tbox_b', + 'tid_b', + 'emb_indices_s', + 'emb_indices_m', + 'emb_indices_b', +] + +col_names_val: [ + 'imgs', + 'targets', + 'lens', +] + + +# other +is_distributed: False +dataset_root: '/path/to/datasets/root/folder/' +device_target: 'GPU' +device_id: 0 +device_start: 0 +ckpt_url: '/path/to/checkpoint' +logs_dir: './logs' +input_video: '/path/to/input/video' +output_format: 'video' +output_root: './results' +save_images: False +save_videos: False + +--- +# Config description for each option +momentum: 'Momentum for SGD optimizer.' +decay: 'Weight_decay for SGD optimizer.' +lr: 'Init learning rate.' +epochs: 'Number of epochs to train.' +batch_size: 'Batch size per one device' +num_classes: 'Number of object classes.' +k_max: 'Max predictions per one map (made for optimization of FC layer embedding computation).' +img_size: 'Size of input images.' +track_buffer: 'Tracking buffer.' +keep_checkpoint_max: 'Keep saved last N checkpoints.' +backbone_input_shape: 'Input filters of backbone layers.' +backbone_shape: 'Input filters of backbone layers.' +backbone_layers: 'Output filters of backbone layers.' +out_channel: 'Number of channels for detection.' +embedding_dim: 'Number of channels for embeddings.' +iou_thres: 'IOU thresholds.' +conf_thres: 'Confidence threshold.' +nms_thres: 'Threshold for Non-max suppression.' +min_box_area: 'Filter out tiny boxes.' +anchor_scales: '12 predefined anchor boxes. Different 4 per each of 3 feature maps.' +col_names_train: 'Names of columns for training GeneratorDataset.' +col_names_val: 'Names of columns for validation GeneratorDataset.' +is_distributed: 'Distribute training or not.' +dataset_root: 'Path to datasets root folder.' +device_target: 'Device GPU or any.' +device_id: 'Device id of target device.' +device_start: 'Start device id.' +ckpt_url: 'Location of checkpoint.' +logs_dir: 'Dir to save logs and ckpt.' +input_video: 'Path to the input video.' +output_format: 'Expected output format.' +output_root: 'Expected output root path.' +save_images: 'Save tracking results (image).' +save_videos: 'Save tracking results (video).' 
diff --git a/research/cv/JDE/eval.py b/research/cv/JDE/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..064b160d07b4b7381e29b8d64971366504a6ae9a --- /dev/null +++ b/research/cv/JDE/eval.py @@ -0,0 +1,272 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Tracker evaluation script.""" +import logging +import os +import os.path as osp + +import cv2 +import motmetrics as mm +import numpy as np +from mindspore import Model +from mindspore import Tensor +from mindspore import context +from mindspore import dtype as mstype +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from src import visualization as vis +from src.darknet import DarkNet, ResidualBlock +from src.dataset import LoadImages +from src.evaluation import Evaluator +from src.log import logger +from src.model import JDEeval +from src.model import YOLOv3 +from src.timer import Timer +from src.utils import mkdir_if_missing +from tracker.multitracker import JDETracker + +_MOT16_VALIDATION_FOLDERS = ( + 'MOT16-02', + 'MOT16-04', + 'MOT16-05', + 'MOT16-09', + 'MOT16-10', + 'MOT16-11', + 'MOT16-13', +) + +_MOT16_DIR_FOR_TEST = 'MOT16/train' + + +def write_results(filename, results, data_type): + """ + Format for evaluation results. + """ + if data_type == 'mot': + save_format = '{frame},{id},{x1},{y1},{w},{h},1,-1,-1,-1\n' + elif data_type == 'kitti': + save_format = '{frame} {id} pedestrian 0 0 -10 {x1} {y1} {x2} {y2} -10 -10 -10 -1000 -1000 -1000 -10\n' + else: + raise ValueError(data_type) + + with open(filename, 'w') as f: + for frame_id, tlwhs, track_ids in results: + if data_type == 'kitti': + frame_id -= 1 + for tlwh, track_id in zip(tlwhs, track_ids): + if track_id < 0: + continue + x1, y1, w, h = tlwh + x2, y2 = x1 + w, y1 + h + line = save_format.format(frame=frame_id, id=track_id, x1=x1, y1=y1, x2=x2, y2=y2, w=w, h=h) + f.write(line) + logger.info('Save results to %s', filename) + + +def eval_seq( + opt, + dataloader, + data_type, + result_filename, + net, + save_dir=None, + frame_rate=30, +): + """ + Processes the video sequence given and provides the output + of tracking result (write the results in video file). + + It uses JDE model for getting information about the online targets present. + + Args: + opt (Any): Contains information passed as commandline arguments. + dataloader (Any): Fetching the image sequence and associated data. + data_type (str): Type of dataset corresponding(similar) to the given video. + result_filename (str): The name(path) of the file for storing results. + net (nn.Cell): Model. + save_dir (str): Path to output results. + frame_rate (int): Frame-rate of the given video. + + Returns: + frame_id (int): Sequence number of the last sequence. + average_time (int): Average time for frame. + calls (int): Num of timer calls. 
+ """ + if save_dir: + mkdir_if_missing(save_dir) + tracker = JDETracker(opt, net=net, frame_rate=frame_rate) + timer = Timer() + results = [] + frame_id = 0 + timer.tic() + timer.toc() + timer.calls -= 1 + + for img, img0 in dataloader: + if frame_id % 20 == 0: + log_info = f'Processing frame {frame_id} ({(1. / max(1e-5, timer.average_time)):.2f} fps)' + logger.info('%s', log_info) + + # except initialization step at time calculation + if frame_id != 0: + timer.tic() + + im_blob = Tensor(np.expand_dims(img, 0), mstype.float32) + online_targets = tracker.update(im_blob, img0) + online_tlwhs = [] + online_ids = [] + for t in online_targets: + tlwh = t.tlwh + tid = t.track_id + vertical = tlwh[2] / tlwh[3] > 1.6 + if tlwh[2] * tlwh[3] > opt.min_box_area and not vertical: + online_tlwhs.append(tlwh) + online_ids.append(tid) + + if frame_id != 0: + timer.toc() + # save results + results.append((frame_id + 1, online_tlwhs, online_ids)) + if save_dir is not None: + online_im = vis.plot_tracking( + img0, + online_tlwhs, + online_ids, + frame_id=frame_id, + fps=1. / timer.average_time, + ) + + cv2.imwrite(os.path.join(save_dir, f'{frame_id:05}.jpg'), online_im) + frame_id += 1 + # save results + write_results(result_filename, results, data_type) + + return frame_id, timer.average_time, timer.calls - 1 + + +def main( + opt, + data_root, + seqs, + exp_name, + save_videos=False, +): + logger.setLevel(logging.INFO) + result_root = os.path.join(data_root, '..', 'results', exp_name) + mkdir_if_missing(result_root) + data_type = 'mot' + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + load_checkpoint(opt.ckpt_url, model) + model = Model(model) + + # Run tracking + n_frame = 0 + timer_avgs, timer_calls, accs = [], [], [] + + for seq in seqs: + output_dir = os.path.join(data_root, '..', 'outputs', exp_name, seq) if save_videos else None + + logger.info('start seq: %s', seq) + + dataloader = LoadImages(osp.join(data_root, seq, 'img1'), opt.anchor_scales, opt.img_size) + + result_filename = os.path.join(result_root, f'{seq}.txt') + + with open(os.path.join(data_root, seq, 'seqinfo.ini')) as f: + meta_info = f.read() + + frame_rate = int(meta_info[meta_info.find('frameRate') + 10:meta_info.find('\nseqLength')]) + + nf, ta, tc = eval_seq( + opt, + dataloader, + data_type, + result_filename, + net=model, + save_dir=output_dir, + frame_rate=frame_rate, + ) + + n_frame += nf + timer_avgs.append(ta) + timer_calls.append(tc) + + # eval + logger.info('Evaluate seq: %s', seq) + evaluator = Evaluator(data_root, seq, data_type) + accs.append(evaluator.eval_file(result_filename)) + if save_videos: + output_video_path = osp.join(output_dir, f'{seq}.mp4') + cmd_str = f'ffmpeg -f image2 -i {output_dir}/%05d.jpg -c:v copy {output_video_path}' + os.system(cmd_str) + + timer_avgs = np.asarray(timer_avgs) + timer_calls = np.asarray(timer_calls) + all_time = np.dot(timer_avgs, timer_calls) + avg_time = all_time / np.sum(timer_calls) + + log_info = f'Time elapsed: {all_time:.2f} seconds, FPS: {(1.0 / avg_time):.2f}' + logger.info('%s', log_info) + + # Get summary + metrics = mm.metrics.motchallenge_metrics + mh = mm.metrics.create() + summary = Evaluator.get_summary(accs, seqs, metrics) + strsummary = mm.io.render_summary( + summary, + formatters=mh.formatters, + namemap=mm.io.motchallenge_metric_names + 
) + + print(strsummary) + Evaluator.save_summary(summary, os.path.join(result_root, f'summary_{exp_name}.xlsx')) + + +if __name__ == '__main__': + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target='GPU') + context.set_context(device_id=config.device_id) + + data_root_path = os.path.join(config.dataset_root, _MOT16_DIR_FOR_TEST) + + if not os.path.isdir(data_root_path): + raise NotADirectoryError( + f'Cannot find "{_MOT16_DIR_FOR_TEST}" subdirectory ' + f'in the specified dataset root "{config.dataset_root}"' + ) + + main( + config, + data_root=data_root_path, + seqs=_MOT16_VALIDATION_FOLDERS, + exp_name=config.ckpt_url.split('/')[-2], + save_videos=config.save_videos, + ) diff --git a/research/cv/JDE/eval_detect.py b/research/cv/JDE/eval_detect.py new file mode 100644 index 0000000000000000000000000000000000000000..9425f78caca267f6a7615263f74993239195b6d7 --- /dev/null +++ b/research/cv/JDE/eval_detect.py @@ -0,0 +1,216 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation script.""" +import json +import time + +import numpy as np +from mindspore import Model +from mindspore import context +from mindspore import dataset as ds +from mindspore.common import set_seed +from mindspore.communication.management import get_group_size +from mindspore.communication.management import get_rank +from mindspore.dataset.vision import py_transforms as PY +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.dataset import JointDatasetDetection +from src.model import JDEeval +from src.model import YOLOv3 +from src.utils import ap_per_class +from src.utils import bbox_iou +from src.utils import non_max_suppression +from src.utils import xywh2xyxy + +set_seed(1) + + +def _get_rank_info(device_target): + """ + Get rank size and rank id. 
+ """ + if device_target == 'GPU': + rank_size = get_group_size() + rank_id = get_rank() + else: + raise ValueError("Unsupported platform.") + + return rank_size, rank_id + + +def main( + opt, + iou_thres, + conf_thres, + nms_thres, + nc, +): + img_size = opt.img_size + + with open(opt.data_cfg_url) as f: + data_config = json.load(f) + test_paths = data_config['test'] + + dataset = JointDatasetDetection( + opt.dataset_root, + test_paths, + augment=False, + transforms=PY.ToTensor(), + config=opt, + ) + + dataloader = ds.GeneratorDataset( + dataset, + column_names=opt.col_names_val, + shuffle=False, + num_parallel_workers=1, + max_rowsize=12, + ) + + dataloader = dataloader.batch(opt.batch_size, True) + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + + load_checkpoint(opt.ckpt_url, model) + print(f'Evaluation for {opt.ckpt_url}') + model = Model(model) + + mean_map, mean_r, mean_p, seen = 0.0, 0.0, 0.0, 0 + print('%11s' * 5 % ('Image', 'Total', 'P', 'R', 'mAP')) + maps, mr, mp = [], [], [] + ap_accum, ap_accum_count = np.zeros(nc), np.zeros(nc) + + for batch_i, inputs in enumerate(dataloader): + imgs, targets, targets_len = inputs + targets = targets.asnumpy() + targets_len = targets_len.asnumpy() + + t = time.time() + + raw_output, _ = model.predict(imgs) + output = non_max_suppression(raw_output.asnumpy(), conf_thres=conf_thres, nms_thres=nms_thres) + + for i, o in enumerate(output): + if o is not None: + output[i] = o[:, :6] + + # Compute average precision for each sample + targets = [targets[i][:int(l)] for i, l in enumerate(targets_len)] + for labels, detections in zip(targets, output): + seen += 1 + + if detections is None: + # If there are labels but no detections mark as zero ap + if labels.shape[0] != 0: + maps.append(0) + mr.append(0) + mp.append(0) + continue + + # Get detections sorted by decreasing confidence scores + detections = detections[np.argsort(-detections[:, 4])] + + # If no labels add number of detections as incorrect + correct = [] + if labels.shape[0] == 0: + maps.append(0) + mr.append(0) + mp.append(0) + continue + + target_cls = labels[:, 0] + + # Extract target boxes as (x1, y1, x2, y2) + target_boxes = xywh2xyxy(labels[:, 2:6]) + target_boxes[:, 0] *= img_size[0] + target_boxes[:, 2] *= img_size[0] + target_boxes[:, 1] *= img_size[1] + target_boxes[:, 3] *= img_size[1] + + detected = [] + for *pred_bbox, _, _ in detections: + obj_pred = 0 + pred_bbox = np.array(pred_bbox, dtype=np.float32).reshape(1, -1) + # Compute iou with target boxes + iou = bbox_iou(pred_bbox, target_boxes, x1y1x2y2=True)[0] + # Extract index of largest overlap + best_i = np.argmax(iou) + # If overlap exceeds threshold and classification is correct mark as correct + if iou[best_i] > iou_thres and obj_pred == labels[best_i, 0] and best_i not in detected: + correct.append(1) + detected.append(best_i) + else: + correct.append(0) + + # Compute Average Precision (ap) per class + ap, ap_class, r, p = ap_per_class( + tp=correct, + conf=detections[:, 4], + pred_cls=np.zeros_like(detections[:, 5]), # detections[:, 6] + target_cls=target_cls, + ) + + # Accumulate AP per class + ap_accum_count += np.bincount(ap_class, minlength=nc) + ap_accum += np.bincount(ap_class, minlength=nc, weights=ap) + + # Compute mean AP across all classes in this image, and append to image list + 
maps.append(ap.mean()) + mr.append(r.mean()) + mp.append(p.mean()) + + # Means of all images + mean_map = np.sum(maps) / (ap_accum_count + 1E-16) + mean_r = np.sum(mr) / (ap_accum_count + 1E-16) + mean_p = np.sum(mp) / (ap_accum_count + 1E-16) + + if (batch_i + 1) % 1000 == 0: + # Print image mAP and running mean mAP + print(('%11s%11s' + '%11.3g' * 4 + 's') % + (seen, dataset.nf, mean_p, mean_r, mean_map, time.time() - t)) + + # Print results + print(f'mean_mAP: {mean_map[0]:.4f}, mean_R: {mean_r[0]:.4f}, mean_P: {mean_p[0]:.4f}') + + +if __name__ == "__main__": + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + context.set_context(device_id=config.device_id) + + main( + opt=config, + iou_thres=0.5, + conf_thres=0.3, + nms_thres=0.45, + nc=config.num_classes, + ) diff --git a/research/cv/JDE/export.py b/research/cv/JDE/export.py new file mode 100644 index 0000000000000000000000000000000000000000..40a21a9a57a65a8e70989f3406bcaa4f32bc862c --- /dev/null +++ b/research/cv/JDE/export.py @@ -0,0 +1,67 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""run export""" +from pathlib import Path + +import numpy as np +from mindspore import Tensor +from mindspore import context +from mindspore import dtype as mstype +from mindspore import load_checkpoint +from mindspore.train.serialization import export + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.model import JDEeval +from src.model import YOLOv3 + + +def run_export(config): + """ + Export model to MINDIR. + """ + darknet53 = DarkNet( + ResidualBlock, + config.backbone_layers, + config.backbone_input_shape, + config.backbone_shape, + detect=True, + ) + + yolov3 = YOLOv3( + backbone=darknet53, + backbone_shape=config.backbone_shape, + out_channel=config.out_channel, + ) + + net = JDEeval(yolov3, default_config) + load_checkpoint(config.ckpt_url, net) + net.set_train(False) + + input_data = Tensor(np.zeros([1, 3, 1088, 608]), dtype=mstype.float32) + name = Path(config.ckpt_url).stem + + export(net, input_data, file_name=name, file_format='MINDIR') + print('Model exported successfully!') + + +if __name__ == "__main__": + context.set_context( + mode=context.GRAPH_MODE, + device_target=default_config.device_target, + device_id=default_config.device_id, + ) + + run_export(default_config) diff --git a/research/cv/JDE/infer.py b/research/cv/JDE/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..536ea096c48b708ba5372dc79f6b703fe0ec3e3f --- /dev/null +++ b/research/cv/JDE/infer.py @@ -0,0 +1,100 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Inference script.""" +import logging +import os +import os.path as osp + +from mindspore import Model +from mindspore import context +from mindspore.train.serialization import load_checkpoint + +from cfg.config import config as default_config +from eval import eval_seq +from src.darknet import DarkNet, ResidualBlock +from src.dataset import LoadVideo +from src.log import logger +from src.model import JDEeval +from src.model import YOLOv3 +from src.utils import mkdir_if_missing + +logger.setLevel(logging.INFO) + +def track(opt): + """ + Inference of the input video. + + Save the results into output-root (video, annotations and frames.). + """ + + result_root = opt.output_root if opt.output_root != '' else '.' + mkdir_if_missing(result_root) + + anchors = opt.anchor_scales + + dataloader = LoadVideo( + opt.input_video, + anchor_scales=anchors, + img_size=opt.img_size, + ) + + darknet53 = DarkNet( + ResidualBlock, + opt.backbone_layers, + opt.backbone_input_shape, + opt.backbone_shape, + detect=True, + ) + model = YOLOv3( + backbone=darknet53, + backbone_shape=opt.backbone_shape, + out_channel=opt.out_channel, + ) + + model = JDEeval(model, opt) + load_checkpoint(opt.ckpt_url, model) + model = Model(model) + logger.info('Starting tracking...') + + result_filename = os.path.join(result_root, 'results.txt') + frame_rate = dataloader.frame_rate + + frame_dir = None if opt.output_format == 'text' else osp.join(result_root, 'frame') + try: + eval_seq( + opt, + dataloader, + 'mot', + result_filename, + net=model, + save_dir=frame_dir, + frame_rate=frame_rate, + ) + except TypeError as e: + logger.info(e) + + if opt.output_format == 'video': + output_video_path = osp.join(result_root, 'result.mp4') + cmd_str = f"ffmpeg -f image2 -i {osp.join(result_root, 'frame')}/%05d.jpg -c:v copy {output_video_path}" + os.system(cmd_str) + + +if __name__ == '__main__': + config = default_config + + context.set_context(mode=context.GRAPH_MODE, device_target='GPU') + context.set_context(device_id=config.device_id) + + track(config) diff --git a/research/cv/JDE/requirements.txt b/research/cv/JDE/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..1addfc94bccbd7988a7e7b1519496e295fd6b310 --- /dev/null +++ b/research/cv/JDE/requirements.txt @@ -0,0 +1,8 @@ +PyYAML +opencv-python>=4.5.5.62 +motmetrics>=1.2.0 +scipy>=1.7.2 +lap>=0.4.0 +Cython +cython-bbox>=0.1.3 +torch diff --git a/research/cv/JDE/scripts/run_distribute_train_gpu.sh b/research/cv/JDE/scripts/run_distribute_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..11247847b0d99d54baa1041ac2d46bdf47ecf922 --- /dev/null +++ b/research/cv/JDE/scripts/run_distribute_train_gpu.sh @@ -0,0 +1,53 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 4 ]]; then + echo "Usage: bash ./scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]" + exit 1; +fi + +export RANK_SIZE=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +LOGS_CKPT_DIR="$2" + +if [ ! -d "$LOGS_CKPT_DIR" ]; then + mkdir "$LOGS_CKPT_DIR" + mkdir "$LOGS_CKPT_DIR/training_configs" +fi + +DATASET_ROOT=$(get_real_path "$4") +CKPT_URL=$(get_real_path "$3") + +cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs +cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs + +mpirun -n $1 --allow-run-as-root\ + python train.py \ + --device_target="GPU" \ + --logs_dir="$LOGS_CKPT_DIR" \ + --dataset_root="$DATASET_ROOT" \ + --ckpt_url="$CKPT_URL" \ + --is_distributed=True \ + > ./"$LOGS_CKPT_DIR"/distribute_train.log 2>&1 & diff --git a/research/cv/JDE/scripts/run_eval_gpu.sh b/research/cv/JDE/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..b3499e339f8c12a25aac1ca5d68633a288b7f27b --- /dev/null +++ b/research/cv/JDE/scripts/run_eval_gpu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 3 ]]; then + echo "Usage: bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]" + exit 1; +fi + +export CUDA_VISIBLE_DEVICES=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +CKPT_URL=$(get_real_path "$2") +DATASET_ROOT=$(get_real_path "$3") + +if [ ! -d "$DATASET_ROOT" ]; then + echo "The specified dataset root is not a directory: $DATASET_ROOT" + exit 1; +fi + +if [ ! 
-f "$CKPT_URL" ]; then + echo "The specified checkpoint does not exist: $CKPT_URL" + exit 1; +fi + +python ./eval.py \ + --device_target="GPU" \ + --device_id=0 \ + --ckpt_url="$CKPT_URL" \ + --dataset_root="$DATASET_ROOT" \ + > ./eval.log 2>&1 & diff --git a/research/cv/JDE/scripts/run_standalone_train_gpu.sh b/research/cv/JDE/scripts/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..2be547033d6bb6243d9e651f888164da39442937 --- /dev/null +++ b/research/cv/JDE/scripts/run_standalone_train_gpu.sh @@ -0,0 +1,53 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +if [[ $# -ne 4 ]]; then + echo "Usage: bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [CKPT_URL] [DATASET_ROOT]" + exit 1 +fi + +export CUDA_VISIBLE_DEVICES=$1 + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + realpath -m "$PWD/$1" + fi +} + +LOGS_CKPT_DIR="$2" + +if [ ! -d "$LOGS_CKPT_DIR" ]; then + mkdir "$LOGS_CKPT_DIR" + mkdir "$LOGS_CKPT_DIR/training_configs" +fi + +DATASET_ROOT=$(get_real_path "$4") +CKPT_URL=$(get_real_path "$3") + +cp ./*.py ./"$LOGS_CKPT_DIR"/training_configs +cp ./*.yaml ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./cfg ./"$LOGS_CKPT_DIR"/training_configs +cp -r ./src ./"$LOGS_CKPT_DIR"/training_configs + +python ./train.py \ + --device_target="GPU" \ + --device_id=0 \ + --logs_dir="$LOGS_CKPT_DIR" \ + --dataset_root="$DATASET_ROOT" \ + --ckpt_url="$CKPT_URL" \ + --lr=0.00125 \ + > ./"$2"/standalone_train.log 2>&1 & diff --git a/research/cv/JDE/src/__init__.py b/research/cv/JDE/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/JDE/src/convert_checkpoint.py b/research/cv/JDE/src/convert_checkpoint.py new file mode 100644 index 0000000000000000000000000000000000000000..fd24c6ab2b57538285c952cf06cade6995fd323b --- /dev/null +++ b/research/cv/JDE/src/convert_checkpoint.py @@ -0,0 +1,90 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Checkpoint import.""" +from pathlib import Path + +import torch +from mindspore import Parameter +from mindspore import Tensor +from mindspore import dtype as mstype +from mindspore import save_checkpoint + +from cfg.config import config +from src.darknet import DarkNet +from src.darknet import ResidualBlock + + +def convert(cfg): + """ + Init the DarkNet53 model, load PyTorch checkpoint, + change the keys order as well as in MindSpore and + save converted checkpoint with names, + corresponds to inited DarkNet model. + + Args: + cfg: Config parameters. + + Note: + Convert weights without last FC layer. + """ + darknet53 = DarkNet( + ResidualBlock, + cfg.backbone_layers, + cfg.backbone_input_shape, + cfg.backbone_shape, + detect=True, + ) + + # Get MindSpore names of parameters + ms_keys = list(darknet53.parameters_dict().keys()) + + # Get PyTorch weights and names + pt_weights = torch.load(cfg.ckpt_url, map_location=torch.device('cpu'))['state_dict'] + pt_keys = list(pt_weights.keys()) + + # Remove redundant keys + pt_keys_clear = [ + key + for key in pt_keys + if not key.endswith('tracked') + ] + + # One layer consist of 5 parameters + # Arrange PyTorch keys as well as in MindSpore + pt_keys_aligned = [] + for block_num in range(len(pt_keys_clear[:-2]) // 5): + layer = pt_keys_clear[block_num * 5:(block_num + 1) * 5] + pt_keys_aligned.append(layer[0]) # Conv weight + pt_keys_aligned.append(layer[3]) # BN moving mean + pt_keys_aligned.append(layer[4]) # BN moving var + pt_keys_aligned.append(layer[1]) # BN gamma + pt_keys_aligned.append(layer[2]) # BN beta + + ms_checkpoint = [] + for key_ms, key_pt in zip(ms_keys, pt_keys_aligned): + weight = Parameter(Tensor(pt_weights[key_pt].numpy(), mstype.float32)) + ms_checkpoint.append({'name': key_ms, 'data': weight}) + + checkpoint_name = str(Path(cfg.ckpt_url).resolve().parent / 'darknet53.ckpt') + save_checkpoint(ms_checkpoint, checkpoint_name) + + print(f'Checkpoint converted successfully! Location {checkpoint_name}') + + +if __name__ == '__main__': + if not Path(config.ckpt_url).exists(): + raise FileNotFoundError(f'Expect a path to the PyTorch checkpoint, but not found it at "{config.ckpt_url}"') + + convert(config) diff --git a/research/cv/JDE/src/darknet.py b/research/cv/JDE/src/darknet.py new file mode 100644 index 0000000000000000000000000000000000000000..c448975a802cdb421a1c1854a8a476f60a0daa76 --- /dev/null +++ b/research/cv/JDE/src/darknet.py @@ -0,0 +1,267 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""DarkNet model.""" +from mindspore import nn +from mindspore.ops import operations as P + + +def conv_block( + in_channels, + out_channels, + kernel_size, + stride, + dilation=1, +): + """ + Set a conv2d, BN and relu layer. 
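+
+    Note:
+        With pad_mode='same', a stride of 1 preserves the spatial size and a
+        stride of 2 roughly halves it: e.g. conv_block(32, 64, 3, 2) maps an
+        input of shape (N, 32, H, W) to (N, 64, H/2, W/2) for even H and W.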
+ """ + pad_mode = 'same' + padding = 0 + + dbl = nn.SequentialCell( + [ + nn.Conv2d( + in_channels, + out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + pad_mode=pad_mode, + ), + nn.BatchNorm2d(out_channels, momentum=0.1), + nn.ReLU(), + ] + ) + + return dbl + + +class ResidualBlock(nn.Cell): + """ + DarkNet V1 residual block definition. + + Args: + in_channels (int): Input channel. + out_channels (int): Output channel. + + Returns: + out (ms.Tensor): Output tensor. + + Examples: + ResidualBlock(3, 32) + """ + def __init__( + self, + in_channels, + out_channels, + ): + super().__init__() + out_chls = out_channels//2 + self.conv1 = conv_block(in_channels, out_chls, kernel_size=1, stride=1) + self.conv2 = conv_block(out_chls, out_channels, kernel_size=3, stride=1) + self.add = P.Add() + + def construct(self, x): + identity = x + out = self.conv1(x) + out = self.conv2(out) + out = self.add(out, identity) + + return out + + +class DarkNet(nn.Cell): + """ + DarkNet V1 network. + + Args: + block (cell): Block for network. + layer_nums (list): Numbers of different layers. + in_channels (list): Input channel. + out_channels (list): Output channel. + detect (bool): Whether detect or not. Default:False. + + Returns: + if detect = True: + c11 (ms.Tensor): Output from last layer. + + if detect = False: + c7, c9, c11 (ms.Tensor): Outputs from different layers (FPN). + + Examples: + DarkNet( + ResidualBlock, + [1, 2, 8, 8, 4], + [32, 64, 128, 256, 512], + [64, 128, 256, 512, 1024], + ) + """ + def __init__( + self, + block, + layer_nums, + in_channels, + out_channels, + detect=False, + ): + super().__init__() + + self.detect = detect + + if not len(layer_nums) == len(in_channels) == len(out_channels) == 5: + raise ValueError("the length of layer_num, inchannel, outchannel list must be 5!") + + self.conv0 = conv_block( + 3, + in_channels[0], + kernel_size=3, + stride=1, + ) + + self.conv1 = conv_block( + in_channels[0], + out_channels[0], + kernel_size=3, + stride=2, + ) + + self.layer1 = self._make_layer( + block, + layer_nums[0], + in_channel=out_channels[0], + out_channel=out_channels[0], + ) + + self.conv2 = conv_block( + in_channels[1], + out_channels[1], + kernel_size=3, + stride=2, + ) + + self.layer2 = self._make_layer( + block, + layer_nums[1], + in_channel=out_channels[1], + out_channel=out_channels[1], + ) + + self.conv3 = conv_block( + in_channels[2], + out_channels[2], + kernel_size=3, + stride=2, + ) + + self.layer3 = self._make_layer( + block, + layer_nums[2], + in_channel=out_channels[2], + out_channel=out_channels[2], + ) + + self.conv4 = conv_block( + in_channels[3], + out_channels[3], + kernel_size=3, + stride=2, + ) + + self.layer4 = self._make_layer( + block, + layer_nums[3], + in_channel=out_channels[3], + out_channel=out_channels[3], + ) + + self.conv5 = conv_block( + in_channels[4], + out_channels[4], + kernel_size=3, + stride=2, + ) + + self.layer5 = self._make_layer( + block, + layer_nums[4], + in_channel=out_channels[4], + out_channel=out_channels[4], + ) + + def _make_layer(self, block, layer_num, in_channel, out_channel): + """ + Make Layer for DarkNet. + + Args: + block (Cell): DarkNet block. + layer_num (int): Layer number. + in_channel (int): Input channel. + out_channel (int): Output channel. 
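+
+        Returns:
+            SequentialCell of `layer_num` chained residual blocks with the
+            given input/output channels.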
+ + Examples: + _make_layer(ConvBlock, 1, 128, 256) + """ + layers = [] + darkblk = block(in_channel, out_channel) + layers.append(darkblk) + + for _ in range(1, layer_num): + darkblk = block(out_channel, out_channel) + layers.append(darkblk) + + return nn.SequentialCell(layers) + + def construct(self, x): + """ + Feed forward image. + """ + c1 = self.conv0(x) + c2 = self.conv1(c1) + c3 = self.layer1(c2) + c4 = self.conv2(c3) + c5 = self.layer2(c4) + c6 = self.conv3(c5) + c7 = self.layer3(c6) + c8 = self.conv4(c7) + c9 = self.layer4(c8) + c10 = self.conv5(c9) + c11 = self.layer5(c10) + + if self.detect: + return c7, c9, c11 + + return c11 + + +def darknet53(): + """ + Get DarkNet53 neural network. + + Returns: + Cell, cell instance of DarkNet53 neural network. + + Examples: + darknet53() + """ + + darknet = DarkNet( + block=ResidualBlock, + layer_nums=[1, 2, 8, 8, 4], + in_channels=[32, 64, 128, 256, 512], + out_channels=[64, 128, 256, 512, 1024], + ) + + return darknet diff --git a/research/cv/JDE/src/dataset.py b/research/cv/JDE/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..38277136c66707a9ed4d40b846678578bd71cff5 --- /dev/null +++ b/research/cv/JDE/src/dataset.py @@ -0,0 +1,529 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Dataloader script.""" +import math +import os +import os.path as osp +import random +from collections import OrderedDict +from pathlib import Path + +import cv2 +import numpy as np + +from src.utils import build_thresholds +from src.utils import create_anchors_vec +from src.utils import xyxy2xywh + + +class LoadImages: + """ + Loader for inference. + + Args: + path (str): Path to the directory, containing images. + img_size (list): Size of output image. + + Returns: + img (np.array): Processed image. + img0 (np.array): Original image. + """ + def __init__(self, path, anchor_scales, img_size=(1088, 608)): + path = Path(path) + if not path.is_dir(): + raise NotADirectoryError(f'Expected a path to the directory with images, got "{path}"') + + self.files = sorted(path.glob('*.jpg')) + + self.anchors, self.strides = create_anchors_vec(anchor_scales) + self.nf = len(self.files) # Number of img files. 
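+        # Each iteration yields (img, img0): `img` is the letterboxed, RGB,
+        # CHW, [0, 1]-scaled network input; `img0` is the original BGR frame
+        # kept for visualization and for rescaling detections back.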
+ self.width = img_size[0] + self.height = img_size[1] + self.count = 0 + + assert self.nf > 0, 'No images found in ' + path + + def __iter__(self): + self.count = -1 + return self + + def __next__(self): + self.count += 1 + if self.count == self.nf: + raise StopIteration + img_path = str(self.files[self.count]) + + # Read image + img0 = cv2.imread(img_path) # BGR + assert img0 is not None, 'Failed to load ' + img_path + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __getitem__(self, idx): + idx = idx % self.nf + img_path = self.files[idx] + + # Read image + img0 = cv2.imread(img_path) # BGR + assert img0 is not None, 'Failed to load ' + img_path + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __len__(self): + return self.nf # number of files + + +class LoadVideo: + """ + Video loader for inference. + + Args: + path (str): Path to video. + img_size (tuple): Size of output images size. + + Returns: + count (int): Number of frame. + img (np.array): Processed image. + img0 (np.array): Original image. + """ + def __init__(self, path, anchor_scales, img_size=(1088, 608)): + if not os.path.isfile(path): + raise FileExistsError + + self.cap = cv2.VideoCapture(path) + self.frame_rate = int(round(self.cap.get(cv2.CAP_PROP_FPS))) + self.vw = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH)) + self.vh = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + self.vn = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT)) + + self.anchors, self.strides = create_anchors_vec(anchor_scales) + + self.width = img_size[0] + self.height = img_size[1] + self.count = 0 + + self.w, self.h = self.get_size(self.vw, self.vh, self.width, self.height) + print(f'Lenth of the video: {self.vn:d} frames') + + def get_size(self, vw, vh, dw, dh): + wa, ha = float(dw) / vw, float(dh) / vh + a = min(wa, ha) + return int(vw * a), int(vh * a) + + def __iter__(self): + self.count = -1 + return self + + def __next__(self): + self.count += 1 + if self.count == len(self): + raise StopIteration + # Read image + _, img0 = self.cap.read() # BGR + assert img0 is not None, f'Failed to load frame {self.count:d}' + img0 = cv2.resize(img0, (self.w, self.h)) + + # Padded resize + img, _, _, _ = letterbox(img0, height=self.height, width=self.width) + + # Normalize RGB + img = img[:, :, ::-1].transpose(2, 0, 1) + img = np.ascontiguousarray(img, dtype=np.float32) + img /= 255.0 + + output = (img, img0) + + return output + + def __len__(self): + return self.vn # number of files + + +class JointDataset: + """ + Loader for all datasets. + + Args: + root (str): Absolute path to datasets. + paths (dict): Relative paths for datasets. + img_size (list): Size of output image. + augment (bool): Augment images or not. + transforms: Transform methods. + config (class): Config with hyperparameters. + + Returns: + imgs (np_array): Prepared image. Shape (C, H, W) + tconf (s, m, b) (np_array): Mask with bg (0), gt (1) and ign (-1) indices. Shape (nA, nGh, nGw). + tbox (s, m, b) (np_array): Targets delta bbox values. Shape (nA, nGh, nGw, 4). + tid (s, m, b) (np_array): Grid with id for every cell. Shape (nA, nGh, nGw). 
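+
+    Example:
+        A minimal usage sketch (the root and path file below are placeholders;
+        `config` is the project config from cfg.config):
+
+        paths = {'mot17': 'data/mot17.train'}
+        dataset = JointDataset('/datasets', paths, img_size=(1088, 608),
+                               augment=True, config=config)
+        first_sample = dataset[0]  # image plus per-scale targets and embedding indices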
+ """ + def __init__( + self, + root, + paths, + img_size=(1088, 608), + k_max=200, + augment=False, + transforms=None, + config=None, + ): + self.img_files = OrderedDict() + self.label_files = OrderedDict() + self.tid_num = OrderedDict() + self.tid_start_index = OrderedDict() + self.config = config + self.anchors, self.strides = create_anchors_vec(config.anchor_scales) + self.k_max = k_max + + # Iterate for all of datasets to prepare paths to labels + for ds, img_path in paths.items(): + with open(img_path, 'r') as file: + self.img_files[ds] = file.readlines() + self.img_files[ds] = [osp.join(root, x.strip()) for x in self.img_files[ds]] + self.img_files[ds] = list(filter(lambda x: len(x) > 0, self.img_files[ds])) + + self.label_files[ds] = [ + x.replace('images', 'labels_with_ids').replace('.png', '.txt').replace('.jpg', '.txt') + for x in self.img_files[ds]] + + # Search for max pedestrian id in dataset + for ds, label_paths in self.label_files.items(): + max_index = -1 + for lp in label_paths: + lb = np.loadtxt(lp) + if lb.shape[0] < 1: + continue + if lb.ndim < 2: + img_max = lb[1] + else: + img_max = np.max(lb[:, 1]) + if img_max > max_index: + max_index = img_max + self.tid_num[ds] = max_index + 1 + + last_index = 0 + for k, v in self.tid_num.items(): + self.tid_start_index[k] = last_index + last_index += v + + self.nid = int(last_index + 1) + self.nds = [len(x) for x in self.img_files.values()] + self.cds = [sum(self.nds[:i]) for i in range(len(self.nds))] + self.nf = sum(self.nds) + self.width = img_size[0] + self.height = img_size[1] + self.augment = augment + self.transforms = transforms + + print('=' * 40) + print('dataset summary') + print(self.tid_num) + print('total # identities:', self.nid) + print('start index') + print(self.tid_start_index) + print('=' * 40) + + def get_data(self, img_path, label_path): + """ + Get and prepare data (augment img). 
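+
+        Args:
+            img_path (str): Path to the image.
+            label_path (str): Path to the matching `labels_with_ids` annotation file.
+
+        Returns:
+            img (np.array): Letterboxed (and optionally augmented) RGB image.
+            labels (np.array): Targets of shape (n, 6), possibly empty, in
+                [class, id, x_center, y_center, w, h] format, normalized to
+                the letterboxed image size.
+            img_path (str): The input image path.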
+ """ + height = self.height + width = self.width + img = cv2.imread(img_path) # BGR + if img is None: + raise ValueError(f'File corrupt {img_path}') + augment_hsv = True + if self.augment and augment_hsv: + # SV augmentation by 50% + fraction = 0.50 + img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) + s = img_hsv[:, :, 1].astype(np.float32) + v = img_hsv[:, :, 2].astype(np.float32) + + a = (random.random() * 2 - 1) * fraction + 1 + s *= a + if a > 1: + np.clip(s, a_min=0, a_max=255, out=s) + + a = (random.random() * 2 - 1) * fraction + 1 + v *= a + if a > 1: + np.clip(v, a_min=0, a_max=255, out=v) + + img_hsv[:, :, 1] = s.astype(np.uint8) + img_hsv[:, :, 2] = v.astype(np.uint8) + cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img) + + h, w, _ = img.shape + img, ratio, padw, padh = letterbox(img, height=height, width=width) + + # Load labels + if os.path.isfile(label_path): + labels0 = np.loadtxt(label_path, dtype=np.float32).reshape(-1, 6) + + # Normalized xywh to pixel xyxy format + labels = labels0.copy() + labels[:, 2] = ratio * w * (labels0[:, 2] - labels0[:, 4] / 2) + padw + labels[:, 3] = ratio * h * (labels0[:, 3] - labels0[:, 5] / 2) + padh + labels[:, 4] = ratio * w * (labels0[:, 2] + labels0[:, 4] / 2) + padw + labels[:, 5] = ratio * h * (labels0[:, 3] + labels0[:, 5] / 2) + padh + else: + labels = np.array([]) + + # Augment image and labels + if self.augment: + img, labels, _ = random_affine(img, labels, degrees=(-5, 5), translate=(0.10, 0.10), scale=(0.50, 1.20)) + + nlbls = len(labels) + if nlbls > 0: + # convert xyxy to xywh + labels[:, 2:6] = xyxy2xywh(labels[:, 2:6].copy()) # / height + labels[:, 2] /= width + labels[:, 3] /= height + labels[:, 4] /= width + labels[:, 5] /= height + if self.augment: + # random left-right flip + lr_flip = True + if lr_flip & (random.random() > 0.5): + img = np.fliplr(img) + if nlbls > 0: + labels[:, 2] = 1 - labels[:, 2] + + img = np.ascontiguousarray(img[:, :, ::-1]) # BGR to RGB + if self.transforms is not None: + img = self.transforms(img) + + return img, labels, img_path + + def __getitem__(self, files_index): + """ + Iterator function for train dataset + """ + for i, c in enumerate(self.cds): + if files_index >= c: + ds = list(self.label_files.keys())[i] + start_index = c + img_path = self.img_files[ds][files_index - start_index] + label_path = self.label_files[ds][files_index - start_index] + + imgs, labels, img_path = self.get_data(img_path, label_path) + for i, _ in enumerate(labels): + if labels[i, 1] > -1: + labels[i, 1] += self.tid_start_index[ds] + + # Graph mode in Mindspore uses constant shapes + # Thus, it is necessary to fill targets to max possible ids in image + to_fill = 100 - labels.shape[0] + padding = np.zeros((to_fill, 6), dtype=np.float32) + labels = np.concatenate((labels, padding), axis=0) + + # Calculate confidence mask, bbox delta and ids for every map size + small, medium, big = build_thresholds( + labels=labels, + anchor_vec_s=self.anchors[0], + anchor_vec_m=self.anchors[1], + anchor_vec_b=self.anchors[2], + k_max=self.k_max, + ) + + tconf_s, tbox_s, tid_s, emb_indices_s = small + tconf_m, tbox_m, tid_m, emb_indices_m = medium + tconf_b, tbox_b, tid_b, emb_indices_b = big + + total_values = ( + imgs.astype(np.float32), + tconf_s, + tbox_s, + tid_s, + tconf_m, + tbox_m, + tid_m, + tconf_b, + tbox_b, + tid_b, + emb_indices_s, + emb_indices_m, + emb_indices_b, + ) + return total_values + + def __len__(self): + return self.nf # number of batches + + +class JointDatasetDetection(JointDataset): + """ + Joint dataset for 
evaluation. + """ + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __getitem__(self, files_index): + """ + Iterator function for train dataset. + """ + for i, c in enumerate(self.cds): + if files_index >= c: + ds = list(self.label_files.keys())[i] + start_index = c + img_path = self.img_files[ds][files_index - start_index] + label_path = self.label_files[ds][files_index - start_index] + + imgs, labels, img_path = self.get_data(img_path, label_path) + for i, _ in enumerate(labels): + if labels[i, 1] > -1: + labels[i, 1] += self.tid_start_index[ds] + + targets_size = labels.shape[0] + + # Graph mode in Mindspore uses constant shapes + # Thus, it is necessary to fill targets to max possible ids in image. + to_fill = 100 - labels.shape[0] + padding = np.zeros((to_fill, 6), dtype=np.float32) + labels = np.concatenate((labels, padding), axis=0) + + output = (imgs.astype(np.float32), labels, targets_size) + + return output + + +def letterbox( + img, + height=608, + width=1088, + color=(127.5, 127.5, 127.5), +): + """ + Resize a rectangular image to a padded rectangular + and fill padded border with color. + """ + shape = img.shape[:2] # shape = [height, width] + ratio = min(float(height) / shape[0], float(width) / shape[1]) + new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height] + dw = (width - new_shape[0]) / 2 # width padding + dh = (height - new_shape[1]) / 2 # height padding + top, bottom = round(dh - 0.1), round(dh + 0.1) + left, right = round(dw - 0.1), round(dw + 0.1) + img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border + img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # padded rectangular + + return img, ratio, dw, dh + + +def random_affine( + img, + targets=None, + degrees=(-10, 10), + translate=(.1, .1), + scale=(.9, 1.1), + shear=(-2, 2), + border_value=(127.5, 127.5, 127.5), +): + """ + Apply several data augmentation techniques, + such as random rotation, random scale, color jittering + to reduce overfitting. + + Every rotation and scaling and etc. + is also applied to targets bbox cords. + """ + border = 0 # width of added border (optional) + height = img.shape[0] + width = img.shape[1] + + # Rotation and Scale + r = np.eye(3) + a = random.random() * (degrees[1] - degrees[0]) + degrees[0] + s = random.random() * (scale[1] - scale[0]) + scale[0] + r[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s) + + # Translation + t = np.eye(3) + t[0, 2] = (random.random() * 2 - 1) * translate[0] * img.shape[0] + border # x translation (pixels) + t[1, 2] = (random.random() * 2 - 1) * translate[1] * img.shape[1] + border # y translation (pixels) + + # Shear + s = np.eye(3) + s[0, 1] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # x shear (deg) + s[1, 0] = math.tan((random.random() * (shear[1] - shear[0]) + shear[0]) * math.pi / 180) # y shear (deg) + + m = s @ t @ r # Combined rotation matrix. ORDER IS IMPORTANT HERE! 
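+    # The third row of m stays [0, 0, 1], so the perspective warp below is
+    # effectively an affine transform (rotation, scale, translation, shear).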
+ imw = cv2.warpPerspective(img, m, dsize=(width, height), flags=cv2.INTER_LINEAR, + borderValue=border_value) # BGR order borderValue + + # Return warped points also + if targets is not None: + if targets.shape[0] > 0: + n = targets.shape[0] + points = targets[:, 2:6].copy() + area0 = (points[:, 2] - points[:, 0]) * (points[:, 3] - points[:, 1]) + + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = points[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1 + xy = (xy @ m.T)[:, :2].reshape(n, 8) + + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + + # apply angle-based reduction + radians = a * math.pi / 180 + reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5 + x = (xy[:, 2] + xy[:, 0]) / 2 + y = (xy[:, 3] + xy[:, 1]) / 2 + w = (xy[:, 2] - xy[:, 0]) * reduction + h = (xy[:, 3] - xy[:, 1]) * reduction + xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T + + # reject warped points outside of image + np.clip(xy[:, 0], 0, width, out=xy[:, 0]) + np.clip(xy[:, 2], 0, width, out=xy[:, 2]) + np.clip(xy[:, 1], 0, height, out=xy[:, 1]) + np.clip(xy[:, 3], 0, height, out=xy[:, 3]) + w = xy[:, 2] - xy[:, 0] + h = xy[:, 3] - xy[:, 1] + area = w * h + ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16)) + i = (w > 4) & (h > 4) & (area / (area0 + 1e-16) > 0.1) & (ar < 10) + + targets = targets[i] + targets[:, 2:6] = xy[i] + + return imw, targets, m + + return imw diff --git a/research/cv/JDE/src/evaluation.py b/research/cv/JDE/src/evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..8ba837c57689e985d42e630a492495aa006f5069 --- /dev/null +++ b/research/cv/JDE/src/evaluation.py @@ -0,0 +1,135 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation scripts.""" +import copy +import os + +import motmetrics as mm +import numpy as np +import pandas as pd + +from src.io import read_results, unzip_objs + +mm.lap.default_solver = 'lap' + + +class Evaluator: + """ + Evaluation for tracking with motmetrics. + """ + def __init__(self, data_root, seq_name, data_type): + self.data_root = data_root + self.seq_name = seq_name + self.data_type = data_type + + self.load_annotations() + self.reset_accumulator() + + def load_annotations(self): + """Load groundtruths.""" + assert self.data_type == 'mot' + + gt_filename = os.path.join(self.data_root, self.seq_name, 'gt', 'gt.txt') + self.gt_frame_dict = read_results(gt_filename, self.data_type, is_gt=True) + self.gt_ignore_frame_dict = read_results(gt_filename, self.data_type, is_ignore=True) + + def reset_accumulator(self): + self.acc = mm.MOTAccumulator(auto_id=True) + + def eval_frame(self, frame_id, trk_tlwhs, trk_ids, rtn_events=False): + """ + Eval one frame. 
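+
+        Args:
+            frame_id (int): Frame number used as key into the ground-truth dicts.
+            trk_tlwhs (np.array): Tracker boxes in (top-left x, top-left y, w, h) format.
+            trk_ids (np.array): Track identities matching `trk_tlwhs`.
+            rtn_events (bool): Whether to return per-frame MOT events.
+
+        Returns:
+            `motmetrics` events for the frame when `rtn_events` is True and
+            they are available, otherwise None.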
+ """ + # results + trk_tlwhs = np.copy(trk_tlwhs) + trk_ids = np.copy(trk_ids) + + # gts + gt_objs = self.gt_frame_dict.get(frame_id, []) + gt_tlwhs, gt_ids = unzip_objs(gt_objs)[:2] + + # ignore boxes + ignore_objs = self.gt_ignore_frame_dict.get(frame_id, []) + ignore_tlwhs = unzip_objs(ignore_objs)[0] + + # remove ignored results + keep = np.ones(len(trk_tlwhs), dtype=bool) + iou_distance = mm.distances.iou_matrix(ignore_tlwhs, trk_tlwhs, max_iou=0.5) + if iou_distance.size > 0: + match_is, match_js = mm.lap.linear_sum_assignment(iou_distance) + match_is, match_js = map(lambda a: np.asarray(a, dtype=int), [match_is, match_js]) + match_ious = iou_distance[match_is, match_js] + + match_js = np.asarray(match_js, dtype=int) + match_js = match_js[np.logical_not(np.isnan(match_ious))] + keep[match_js] = False + trk_tlwhs = trk_tlwhs[keep] + trk_ids = trk_ids[keep] + + # get distance matrix + iou_distance = mm.distances.iou_matrix(gt_tlwhs, trk_tlwhs, max_iou=0.5) + + # acc + self.acc.update(gt_ids, trk_ids, iou_distance) + + if rtn_events and iou_distance.size > 0 and hasattr(self.acc, 'last_mot_events'): + events = self.acc.last_mot_events + else: + events = None + return events + + def eval_file(self, filename): + """ + Eval file. + """ + self.reset_accumulator() + + result_frame_dict = read_results(filename, self.data_type, is_gt=False) + frames = sorted(list(set(self.gt_frame_dict.keys()) | set(result_frame_dict.keys()))) + for frame_id in frames: + trk_objs = result_frame_dict.get(frame_id, []) + trk_tlwhs, trk_ids = unzip_objs(trk_objs)[:2] + self.eval_frame(frame_id, trk_tlwhs, trk_ids, rtn_events=False) + + return self.acc + + @staticmethod + def get_summary(accs, names, metrics=('mota', 'num_switches', 'idp', 'idr', 'idf1', 'precision', 'recall')): + """ + Get MOT summary. + """ + names = copy.deepcopy(names) + if metrics is None: + metrics = mm.metrics.motchallenge_metrics + metrics = copy.deepcopy(metrics) + + mh = mm.metrics.create() + summary = mh.compute_many( + accs, + metrics=metrics, + names=names, + generate_overall=True + ) + + return summary + + @staticmethod + def save_summary(summary, filename): + """ + Save evaluation summary. + """ + writer = pd.ExcelWriter(filename) + summary.to_excel(writer) + writer.save() diff --git a/research/cv/JDE/src/io.py b/research/cv/JDE/src/io.py new file mode 100644 index 0000000000000000000000000000000000000000..6975fad9459449b657dbe2c965c1cc01c6237d61 --- /dev/null +++ b/research/cv/JDE/src/io.py @@ -0,0 +1,88 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""MOT utils.""" +import os + +import numpy as np + + +def read_results(filename, data_type: str, is_gt=False, is_ignore=False): + """ + Read results. 
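+
+    Lines are expected in the MOT text format
+    `frame, id, x, y, w, h, score_or_mark, label, visibility, ...`;
+    rows are grouped by frame into a dict of
+    {frame_id: [((x, y, w, h), target_id, score), ...]}.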
+ """ + if data_type in ('mot', 'lab'): + read_fun = read_mot_results + else: + raise ValueError('Unknown data type: {data_type}') + + return read_fun(filename, is_gt, is_ignore) + + +def read_mot_results(filename, is_gt, is_ignore): + """ + Read MOT results. + """ + valid_labels = {1} + ignore_labels = {2, 7, 8, 12} + results_dict = {} + if os.path.isfile(filename): + with open(filename, 'r') as f: + for line in f.readlines(): + linelist = line.split(',') + if len(linelist) < 7: + continue + fid = int(linelist[0]) + if fid < 1: + continue + results_dict.setdefault(fid, []) + + if is_gt: + if 'MOT16-' in filename or 'MOT17-' in filename: + label = int(float(linelist[7])) + mark = int(float(linelist[6])) + if mark == 0 or label not in valid_labels: + continue + score = 1 + elif is_ignore: + if 'MOT16-' in filename or 'MOT17-' in filename: + label = int(float(linelist[7])) + vis_ratio = float(linelist[8]) + if label not in ignore_labels and vis_ratio >= 0: + continue + else: + continue + score = 1 + else: + score = float(linelist[6]) + + tlwh = tuple(map(float, linelist[2:6])) + target_id = int(linelist[1]) + + results_dict[fid].append((tlwh, target_id, score)) + + return results_dict + + +def unzip_objs(objs): + """ + Unzip objects. + """ + if objs: + tlwhs, ids, scores = zip(*objs) + else: + tlwhs, ids, scores = [], [], [] + tlwhs = np.asarray(tlwhs, dtype=float).reshape(-1, 4) + + return tlwhs, ids, scores diff --git a/research/cv/JDE/src/kalman_filter.py b/research/cv/JDE/src/kalman_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..c1444a38037d0fdd587c267f2846deb7b9ab59c6 --- /dev/null +++ b/research/cv/JDE/src/kalman_filter.py @@ -0,0 +1,258 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Kalman filter scripts.""" +import numpy as np +import scipy.linalg + + + +# Table for the 0.95 quantile of the chi-square distribution with N degrees of +# freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv +# function and used as Mahalanobis gating threshold. + +chi2inv95 = { + 1: 3.8415, + 2: 5.9915, + 3: 7.8147, + 4: 9.4877, + 5: 11.070, + 6: 12.592, + 7: 14.067, + 8: 15.507, + 9: 16.919} + + +class KalmanFilter: + """ + A simple Kalman filter for tracking bounding boxes in image space. + + The 8-dimensional state space (x, y, a, h, vx, vy, va, vh) + contains the bounding box center position (x, y), aspect ratio a, height h, + and their respective velocities. + + Object motion follows a constant velocity model. The bounding box location + (x, y, a, h) is taken as direct observation of the state space (linear + observation model). + """ + + def __init__(self): + ndim, dt = 4, 1. + + # Create Kalman filter model matrices. 
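+        # Constant-velocity model: an 8x8 transition matrix (identity plus dt
+        # on the position -> velocity terms) and a 4x8 projection that maps
+        # the state back to the observed (x, y, a, h).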
+ self._motion_mat = np.eye(2 * ndim, 2 * ndim) + for i in range(ndim): + self._motion_mat[i, ndim + i] = dt + self._update_mat = np.eye(ndim, 2 * ndim) + + # Motion and observation uncertainty are chosen relative + # to the current state estimate. These weights control + # the amount of uncertainty in the model. + self._std_weight_position = 1. / 20 + self._std_weight_velocity = 1. / 160 + + def initiate(self, measurement): + """ + Create track from unassociated measurement. + + Args: + measurement (np.array): Bbox coords (x, y, a, h), center (x, y), aspect ratio a, and height h. + + Returns: + mean (np.array): Mean vector (8 dimensional) + covariance (np.array): Covariance matrix (8x8) of the new track. + """ + mean_pos = measurement + mean_vel = np.zeros_like(mean_pos) + mean = np.r_[mean_pos, mean_vel] + + std = [ + 2 * self._std_weight_position * measurement[3], + 2 * self._std_weight_position * measurement[3], + 1e-2, + 2 * self._std_weight_position * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 10 * self._std_weight_velocity * measurement[3], + 1e-5, + 10 * self._std_weight_velocity * measurement[3]] + covariance = np.diag(np.square(std)) + + return mean, covariance + + def predict(self, mean, covariance): + """ + Run Kalman filter prediction step. + + Args: + mean (np.array): The 8 dimensional mean vector of the object state at the previous time step. + covariance (np.array): The 8x8 dimensional covariance matrix of the object state at the previous time step. + + Returns: + mean (np.array): Mean vector of the predicted state. + covariance (np.array): Covariance matrix of the predicted state. + + Note: + Unobserved velocities are initialized to 0 mean. + """ + std_pos = [ + self._std_weight_position * mean[3], + self._std_weight_position * mean[3], + 1e-2, + self._std_weight_position * mean[3]] + std_vel = [ + self._std_weight_velocity * mean[3], + self._std_weight_velocity * mean[3], + 1e-5, + self._std_weight_velocity * mean[3]] + motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) + + mean = np.dot(mean, self._motion_mat.T) + covariance = np.linalg.multi_dot(( + self._motion_mat, covariance, self._motion_mat.T)) + motion_cov + + return mean, covariance + + def project(self, mean, covariance): + """ + Project state distribution to measurement space. + + Args: + mean (np.array): The state's mean vector (8 dimensional array). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + + Returns: + mean (np.array): Projected mean of the given state estimate. + covariance (np.array): Projected covariance matrix of the given state estimate. + """ + std = [ + self._std_weight_position * mean[3], + self._std_weight_position * mean[3], + 1e-1, + self._std_weight_position * mean[3]] + innovation_cov = np.diag(np.square(std)) + + mean = np.dot(self._update_mat, mean) + covariance = np.linalg.multi_dot(( + self._update_mat, covariance, self._update_mat.T)) + return mean, covariance + innovation_cov + + def multi_predict(self, mean, covariance): + """ + Run Kalman filter prediction step (Vectorized version). + + Args: + mean (np.array): The Nx8 dim mean matrix of the object states at the previous step. + covariance (np.array): The Nx8x8 dime covariance matrix of the object states at the previous step. + + Returns: + mean (np.array): Mean vector of the predicted state. + covariance (np.array): Covariance matrix of the predicted state. + + Note: + Unobserved velocities are initialized to 0 mean. 
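+
+        Example:
+            A shape sanity-check sketch with two dummy tracks:
+
+            kf = KalmanFilter()
+            m1, c1 = kf.initiate(np.array([100., 200., 0.5, 80.]))
+            m2, c2 = kf.initiate(np.array([300., 220., 0.5, 90.]))
+            means, covs = kf.multi_predict(np.stack([m1, m2]), np.stack([c1, c2]))
+            # means.shape == (2, 8), covs.shape == (2, 8, 8)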
+ """ + std_pos = [ + self._std_weight_position * mean[:, 3], + self._std_weight_position * mean[:, 3], + 1e-2 * np.ones_like(mean[:, 3]), + self._std_weight_position * mean[:, 3]] + std_vel = [ + self._std_weight_velocity * mean[:, 3], + self._std_weight_velocity * mean[:, 3], + 1e-5 * np.ones_like(mean[:, 3]), + self._std_weight_velocity * mean[:, 3]] + sqr = np.square(np.r_[std_pos, std_vel]).T + + motion_cov = [] + for i in range(len(mean)): + motion_cov.append(np.diag(sqr[i])) + motion_cov = np.asarray(motion_cov) + + mean = np.dot(mean, self._motion_mat.T) + left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2)) + covariance = np.dot(left, self._motion_mat.T) + motion_cov + + return mean, covariance + + def update(self, mean, covariance, measurement): + """ + Run Kalman filter correction step. + + Args: + mean (np.array): The predicted state's mean vector (8 dimensional). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + measurement (np.array): The 4 dimensional measurement vector (x, y, a, h), + where (x, y) is the center position, a the aspect ratio, + and h the height of the bounding box. + + Returns: + new_mean (np.array): Measurement-corrected state distribution. + new_covariance (np.array): Measurement-corrected state distribution. + """ + projected_mean, projected_cov = self.project(mean, covariance) + + chol_factor, lower = scipy.linalg.cho_factor( + projected_cov, lower=True, check_finite=False) + kalman_gain = scipy.linalg.cho_solve( + (chol_factor, lower), np.dot(covariance, self._update_mat.T).T, + check_finite=False).T + innovation = measurement - projected_mean + + new_mean = mean + np.dot(innovation, kalman_gain.T) + new_covariance = covariance - np.linalg.multi_dot(( + kalman_gain, projected_cov, kalman_gain.T)) + return new_mean, new_covariance + + def gating_distance(self, mean, covariance, measurements, only_position=False, metric='maha'): + """ + Compute gating distance between state distribution and measurements. + + A suitable distance threshold can be obtained from `chi2inv95`. If + `only_position` is False, the chi-square distribution has 4 degrees of + freedom, otherwise 2. + + Args: + mean (np.array): The predicted state's mean vector (8 dimensional). + covariance (np.array): The state's covariance matrix (8x8 dimensional). + measurements (np.array): An Nx4 dimensional matrix of N measurements, + each in format (x, y, a, h) where (x, y) is the bounding box center + position, a the aspect ratio, and h the height. + only_position (bool): If True, distance computation is done with + respect to the bounding box center position only. + metric (str): Compute selected metric. + + Returns: + (np.array): Array of length N, where the i-th element contains the + squared Mahalanobis distance between (mean, covariance) and + `measurements[i]`. 
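+
+        Example:
+            Gating dummy candidate boxes at the 0.95 chi-square level (sketch):
+
+            kf = KalmanFilter()
+            mean, cov = kf.initiate(np.array([100., 200., 0.5, 80.]))
+            candidates = np.array([[102., 198., 0.5, 82.], [400., 50., 0.4, 60.]])
+            dist = kf.gating_distance(mean, cov, candidates)
+            admissible = dist <= chi2inv95[4]  # 4 degrees of freedom for (x, y, a, h)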
+ + """ + mean, covariance = self.project(mean, covariance) + if only_position: + mean, covariance = mean[:2], covariance[:2, :2] + measurements = measurements[:, :2] + + d = measurements - mean + if metric == 'gaussian': + return np.sum(d * d, axis=1) + + if metric == 'maha': + cholesky_factor = np.linalg.cholesky(covariance) + z = scipy.linalg.solve_triangular( + cholesky_factor, d.T, lower=True, check_finite=False, + overwrite_b=True) + squared_maha = np.sum(z * z, axis=0) + return squared_maha + + raise ValueError('invalid distance metric') diff --git a/research/cv/JDE/src/log.py b/research/cv/JDE/src/log.py new file mode 100644 index 0000000000000000000000000000000000000000..516cdc7eb438bcaafc2de728ec563af541089fd4 --- /dev/null +++ b/research/cv/JDE/src/log.py @@ -0,0 +1,36 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Logger.""" +import logging + + +def get_logger(name='root'): + """ + Get Logger. + """ + formatter = logging.Formatter( + fmt='%(asctime)s [%(levelname)s]: %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + + handler = logging.StreamHandler() + handler.setFormatter(formatter) + + logg = logging.getLogger(name) + logg.setLevel(logging.DEBUG) + logg.addHandler(handler) + + return logg + + +logger = get_logger('root') diff --git a/research/cv/JDE/src/model.py b/research/cv/JDE/src/model.py new file mode 100644 index 0000000000000000000000000000000000000000..5b62133971988c73f7cb2298b755254d1f465297 --- /dev/null +++ b/research/cv/JDE/src/model.py @@ -0,0 +1,534 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""YOLOv3 based on DarkNet.""" +import math + +import mindspore as ms +import mindspore.numpy as msnp +from mindspore import nn +from mindspore import ops +from mindspore.ops import constexpr +from mindspore.ops import operations as P + +from cfg.config import config as default_config +from src.utils import DecodeDeltaMap +from src.utils import SoftmaxCE +from src.utils import create_anchors_vec + + +def _conv_bn_relu( + in_channel, + out_channel, + ksize, + stride=1, + padding=0, + dilation=1, + alpha=0.1, + momentum=0.9, + eps=1e-5, + pad_mode="same", +): + """ + Set a conv2d, BN and relu layer. 
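+
+    Note:
+        The activation is LeakyReLU with negative slope `alpha`
+        (0.1 by default), as is standard for YOLOv3-style blocks.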
+ """ + dbl = nn.SequentialCell( + [ + nn.Conv2d( + in_channel, + out_channel, + kernel_size=ksize, + stride=stride, + padding=padding, + dilation=dilation, + pad_mode=pad_mode, + ), + nn.BatchNorm2d(out_channel, momentum=momentum, eps=eps), + nn.LeakyReLU(alpha), + ] + ) + + return dbl + + +@constexpr +def batch_index(batch_size): + """ + Construct index for each image in batch. + + Example: + if batch_size = 2, returns ms.Tensor([[0], [1]]) + """ + batch_i = ms.Tensor(msnp.arange(batch_size).reshape(-1, 1), dtype=ms.int32) + + return batch_i + + +class YoloBlock(nn.Cell): + """ + YoloBlock for YOLOv3. + + Args: + in_channels (int): Input channel. + out_chls (int): Middle channel. + out_channels (int): Output channel. + config (class): Config with model and training params. + + Returns: + c5 (ms.Tensor): Feature map to feed at next layers. + out (ms.Tensor): Output feature map. + emb (ms.Tensor): Output embeddings. + + Examples: + YoloBlock(1024, 512, 24) + """ + + def __init__( + self, + in_channels, + out_chls, + out_channels, + config=default_config, + ): + super().__init__() + out_chls_2 = out_chls * 2 + + emb_dim = config.embedding_dim + + self.conv0 = _conv_bn_relu(in_channels, out_chls, ksize=1) + self.conv1 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv2 = _conv_bn_relu(out_chls_2, out_chls, ksize=1) + self.conv3 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv4 = _conv_bn_relu(out_chls_2, out_chls, ksize=1) + self.conv5 = _conv_bn_relu(out_chls, out_chls_2, ksize=3) + + self.conv6 = nn.Conv2d(out_chls_2, out_channels, kernel_size=1, stride=1, has_bias=True) + + self.emb_conv = nn.Conv2d(out_chls, emb_dim, kernel_size=3, stride=1, has_bias=True) + + def construct(self, x): + """ + Feed forward feature map to YOLOv3 block + to get detections and embeddings. + """ + c1 = self.conv0(x) + c2 = self.conv1(c1) + + c3 = self.conv2(c2) + c4 = self.conv3(c3) + + c5 = self.conv4(c4) + c6 = self.conv5(c5) + + emb = self.emb_conv(c5) + + out = self.conv6(c6) + + return c5, out, emb + + +class YOLOv3(nn.Cell): + """ + YOLOv3 Network. + + Note: + backbone = darknet53 + + Args: + backbone_shape (list): Darknet output channels shape. + backbone (nn.Cell): Backbone Network. + out_channel (int): Output channel. + + Returns: + small_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[2], h/8, w/8). + medium_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[3], h/16, w/16). + big_feature (ms.Tensor): Feature_map with shape (batch_size, backbone_shape[4], h/32, w/32). 
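+
+    Note:
+        Each returned feature map is the channel-wise concatenation of the
+        detection output (`out_channel` channels) and the embedding map, so
+        its channel count is out_channel + embedding_dim.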
+ + Examples: + YOLOv3( + backbone_shape=[64, 128, 256, 512, 1024] + backbone=darknet53(), + out_channel=24, + ) + """ + + def __init__(self, backbone_shape, backbone, out_channel): + super().__init__() + self.out_channel = out_channel + self.backbone = backbone + self.backblock0 = YoloBlock( + in_channels=backbone_shape[-1], # 1024 + out_chls=backbone_shape[-2], # 512 + out_channels=out_channel, # 24 + ) + + self.conv1 = _conv_bn_relu( + in_channel=backbone_shape[-2], # 1024 + out_channel=backbone_shape[-2] // 2, # 512 + ksize=1, + ) + self.backblock1 = YoloBlock( + in_channels=backbone_shape[-2] + backbone_shape[-3], # 768 + out_chls=backbone_shape[-3], # 256 + out_channels=out_channel, # 24 + ) + + self.conv2 = _conv_bn_relu( + in_channel=backbone_shape[-3], # 256 + out_channel=backbone_shape[-3] // 2, # 128 + ksize=1, + ) + self.backblock2 = YoloBlock( + in_channels=backbone_shape[-3] + backbone_shape[-4], # 384 + out_chls=backbone_shape[-4], # 128 + out_channels=out_channel, # 24 + ) + self.concat = P.Concat(axis=1) + + self.freeze_bn() + + def freeze_bn(self): + """Freeze batch norms.""" + for _, cell in self.cells_and_names(): + if isinstance(cell, nn.BatchNorm2d): + cell.beta.requires_grad = False + cell.gamma.requires_grad = False + + def construct(self, x): + """ + Feed forward image to FPN to get + 3 feature maps from different scales. + """ + # input_shape of x is (batch_size, 3, h, w) + img_hight = P.Shape()(x)[2] + img_width = P.Shape()(x)[3] + feature_map1, feature_map2, feature_map3 = self.backbone(x) + con1, small_object_output, sml_emb = self.backblock0(feature_map3) + + con1 = self.conv1(con1) + ups1 = P.ResizeNearestNeighbor((img_hight // 16, img_width // 16))(con1) + con1 = self.concat((ups1, feature_map2)) + con2, medium_object_output, med_emb = self.backblock1(con1) + + con2 = self.conv2(con2) + ups2 = P.ResizeNearestNeighbor((img_hight // 8, img_width // 8))(con2) + con3 = self.concat((ups2, feature_map1)) + _, big_object_output, big_emb = self.backblock2(con3) + + small_feature = self.concat((small_object_output, sml_emb)) + medium_feature = self.concat((medium_object_output, med_emb)) + big_feature = self.concat((big_object_output, big_emb)) + + return small_feature, medium_feature, big_feature + + +class YOLOLayer(nn.Cell): + """ + Head for loss calculation of classification confidence, + bbox regression and ids embedding learning . + + Args: + anchors (list): Absolute sizes of anchors (w, h). + nid (int): Number of identities in whole train datasets. + emb_dim (int): Size of embedding. + nc (int): Number of ground truth classes. + + Returns: + loss (ms.Tensor): Auto balanced loss, calculated from conf, bbox and ids. 
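+
+    Note:
+        The three terms are balanced with learnable uncertainty weights
+        (s_c, s_r, s_id):
+        loss = 0.5 * (exp(-s_r) * l_box + exp(-s_c) * l_conf
+                      + exp(-s_id) * l_id + s_r + s_c + s_id).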
+ """ + + def __init__( + self, + anchors, + nid, + emb_dim, + nc=default_config.num_classes, + ): + super().__init__() + self.anchors = ms.Tensor(anchors, ms.float32) + self.na = len(anchors) # Number of anchors (4) + self.nc = nc # Number of classes (1) + self.nid = nid # Number of identities + self.emb_dim = emb_dim + + # Set necessary operations and constants + self.normalize = ops.L2Normalize(axis=1, epsilon=1e-12) + self.argmax = ops.ArgMaxWithValue(axis=1) + self.expand_dims = ops.ExpandDims() + self.reduce_sum = ops.ReduceSum() + self.fill = ops.Fill() + self.exp = ops.Exp() + self.zero_tensor = ms.Tensor([0]) + + # Set eps to escape division by zero + self.eps = ms.Tensor(1e-16, dtype=ms.float32) + + self.smooth_l1_loss = nn.SmoothL1Loss() + self.softmax_loss = SoftmaxCE() + self.id_loss = SoftmaxCE() + + # Set trainable parameters for loss computation + self.s_c = ms.Parameter(-4.15 * ms.Tensor([1])) # -4.15 + self.s_r = ms.Parameter(-4.85 * ms.Tensor([1])) # -4.85 + self.s_id = ms.Parameter(-2.3 * ms.Tensor([1])) # -2.3 + + self.emb_scale = math.sqrt(2) * math.log(self.nid - 1) + + def construct(self, p_cat, tconf, tbox, tids, emb_indices, classifier): + """ + Feed forward output from the FPN, + calculate confidence loss, bbox regression loss, target id loss, + apply auto-balancing loss strategy. + """ + # Get detections and embeddings from model concatenated output. + p, p_emb = p_cat[:, :24, ...], p_cat[:, 24:, ...] + nb, ngh, ngw = p.shape[0], p.shape[-2], p.shape[-1] + + p = p.view(nb, self.na, self.nc + 5, ngh, ngw).transpose(0, 1, 3, 4, 2) # prediction + p_emb = p_emb.transpose(0, 2, 3, 1) + p_box = p[..., :4] + p_conf = p[..., 4:6].transpose(0, 4, 1, 2, 3) + + mask = (tconf > 0).astype('float32') + + # Compute losses + nm = self.reduce_sum(mask) # number of anchors (assigned to targets) + p_box = p_box * self.expand_dims(mask, -1) + tbox = tbox * self.expand_dims(mask, -1) + lbox = self.smooth_l1_loss(p_box, tbox) + lbox = lbox * self.expand_dims(mask, -1) + lbox = self.reduce_sum(lbox) / (nm * 4 + self.eps) + + lconf = self.softmax_loss(p_conf.transpose(0, 2, 3, 4, 1), tconf, ignore_index=-1) + + # Construct indices for selecting embeddings + # from the flattened view of the model output + # (corresponding to the embeddings prediction). + # + # Set flattened mask to existing detections + # and apply it to flattened indices to nullify if it is no detection. + emb_indices_batch_stride = emb_indices + batch_index(nb) * ngh * ngw # Shape (nb, k_max) + emb_indices_mask_flat = (emb_indices.reshape(-1) > 0).astype('float32') # Shape (nb x k_max) + emb_indices_flat = (emb_indices_batch_stride.reshape(-1) * emb_indices_mask_flat).astype('int32') + + # Flatten embs and take which is associate to flattened emb index + emb_flat = p_emb.view(-1, self.emb_dim) # Shape (nb x ngh x ngw, emb_dim) + embedding = emb_flat[emb_indices_flat] # Shape (nb x k_max, emb_dim) + embedding = self.emb_scale * self.normalize(embedding) + + # Flatten max tids and take according to index + _, tids = self.argmax(tids.astype('float32')) # Shape (nb, ngh, ngw) + tids_flat = tids.view(-1)[emb_indices_flat] # Shape (nb x k_max) + + # Apply flattened emb mask for nullify if it is no detections + # and subtract 1 where no detection to apply ignore mask into loss calculation. + tids_flat_masked = tids_flat * emb_indices_mask_flat + tids_flat_with_ignore = tids_flat_masked + (emb_indices_mask_flat - 1) + + # Apply FC layer to embeddings + # and compute loss by custom loss with ignore index = -1. 
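+        # `classifier` is the nn.Dense(emb_dim, nid) shared by all three heads
+        # (owned by the JDE cell), so identities live in one embedding space.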
+ logits = classifier(embedding) + lid = self.id_loss(logits, tids_flat_with_ignore.astype('int32'), ignore_index=-1) + + # Apply auto-balancing loss strategy + loss = self.exp((-1) * self.s_r) * lbox + \ + self.exp((-1) * self.s_c) * lconf + \ + self.exp((-1) * self.s_id) * lid + \ + (self.s_r + self.s_c + self.s_id) + loss *= 0.5 + + return loss.squeeze() + + +class JDE(nn.Cell): + """ + JDE Network. + + Args: + extractor (nn.Cell): Backbone, which extracts feature maps. + config (class): Config with model and training params. + nid (int): Number of identities in whole train datasets. + ne (int): Size of embedding. + + Returns: + loss (ms.Tensor): Sum of 3 losses from each head. + + Note: + backbone = YOLOv3 with darknet53 + head = 3 similar heads for each feature map size + """ + + def __init__(self, extractor, config, nid, ne): + super().__init__() + anchors = config.anchor_scales + anchors1 = anchors[0:4] + anchors2 = anchors[4:8] + anchors3 = anchors[8:12] + + self.backbone = extractor + + # Set loss cell layers for different scales + self.head_s = YOLOLayer(anchors3, nid, ne) + self.head_m = YOLOLayer(anchors2, nid, ne) + self.head_b = YOLOLayer(anchors1, nid, ne) + + # Set classifier for embeddings + self.classifier = nn.Dense(ne, nid) + + def construct( + self, + images, + tconf_s, + tbox_s, + tid_s, + tconf_m, + tbox_m, + tid_m, + tconf_b, + tbox_b, + tid_b, + mask_s, + mask_m, + mask_b, + ): + """ + Feed forward image to FPN, get 3 feature maps with different sizes, + put it into 3 heads, corresponding to size, + get auto-balanced losses, summarize them. + """ + # Apply FPN to image to get 3 feature map with different scales + small, medium, big = self.backbone(images) + + # Calculate losses for each feature map + out_s = self.head_s(small, tconf_s, tbox_s, tid_s, mask_s, self.classifier) + out_m = self.head_m(medium, tconf_m, tbox_m, tid_m, mask_m, self.classifier) + out_b = self.head_b(big, tconf_b, tbox_b, tid_b, mask_b, self.classifier) + + loss = (out_s + out_m + out_b) / 3 + + return loss + + +class YOLOLayerEval(nn.Cell): + """ + Head for detection and tracking. + + Args: + anchor (list): Absolute sizes of anchors (w, h). + nc (int): Number of ground truth classes. + + Returns: + prediction (ms.Tensor): Model predictions for confidences, boxes and embeddings. + """ + + def __init__( + self, + anchor, + stride, + nc=default_config.num_classes, + ): + super().__init__() + self.na = len(anchor) # number of anchors (4) + self.nc = nc # number of classes (1) + self.anchor_vec = anchor + self.stride = stride + + self.argmax = ops.ArgMaxWithValue(axis=1) + self.expand_dims = ops.ExpandDims() + self.softmax = nn.Softmax(axis=1) + self.normalize = ops.L2Normalize(axis=-1, epsilon=1e-12) + self.tile = ops.Tile() + self.fill = ops.Fill() + self.concat = ops.Concat(axis=-1) + + self.decode_map = DecodeDeltaMap() + + def construct(self, p_cat): + """ + Feed forward output from the FPN, + calculate prediction corresponding to anchor. + """ + p, p_emb = p_cat[:, :24, ...], p_cat[:, 24:, ...] 
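+        # The first 24 channels are the detection output (na * (nc + 5) = 4 * 6);
+        # the rest is the appearance embedding map.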
+ nb, ngh, ngw = p.shape[0], p.shape[-2], p.shape[-1] + + p = p.view(nb, self.na, self.nc + 5, ngh, ngw).transpose(0, 1, 3, 4, 2) # prediction + p_emb = p_emb.transpose(0, 2, 3, 1) + p_box = p[..., :4] + p_conf = p[..., 4:6].transpose(0, 4, 1, 2, 3) # conf + p_conf = self.expand_dims(self.softmax(p_conf)[:, 1, ...], -1) + p_emb = self.normalize(self.tile(self.expand_dims(p_emb, 1), (1, self.na, 1, 1, 1))) + + p_cls = self.fill(ms.float32, (nb, self.na, ngh, ngw, 1), 0) # temp + p = self.concat((p_box, p_conf, p_cls, p_emb)) + + # Decode bbox delta to the absolute cords + p_1 = self.decode_map(p[..., :4], self.anchor_vec) + p_1 = p_1 * self.stride + + p = self.concat((p_1.astype('float32'), p[..., 4:])) + prediction = p.reshape(nb, -1, p.shape[-1]) + + return prediction + + +class JDEeval(nn.Cell): + """ + JDE Network. + + Note: + backbone = YOLOv3 with darknet53. + head = 3 similar heads for each feature map size. + + Returns: + output (ms.Tensor): Tensor with concatenated outputs from each head. + output_top_k (ms.Tensor): Output tensor of top_k best proposals by confidence. + + """ + + def __init__(self, extractor, config): + super().__init__() + anchors, strides = create_anchors_vec(config.anchor_scales) + anchors = ms.Tensor(anchors, dtype=ms.float32) + strides = ms.Tensor(strides, dtype=ms.float32) + + self.backbone = extractor + + self.head_s = YOLOLayerEval(anchors[0], strides[0]) + self.head_m = YOLOLayerEval(anchors[1], strides[1]) + self.head_b = YOLOLayerEval(anchors[2], strides[2]) + + self.concatenate = ops.Concat(axis=1) + self.top_k = ops.TopK(sorted=False) + self.k = 800 + + def construct(self, images): + """ + Feed forward image to FPN, get 3 feature maps with different sizes, + put them into 3 heads, corresponding to size, + get concatenated output of proposals. + """ + small, medium, big = self.backbone(images) + + out_s = self.head_s(small) + out_m = self.head_m(medium) + out_b = self.head_b(big) + + output = self.concatenate((out_s, out_m, out_b)) + + _, top_k_indices = self.top_k(output[:, :, 4], self.k) + output_top_k = output[0][top_k_indices] + + return output, output_top_k diff --git a/research/cv/JDE/src/timer.py b/research/cv/JDE/src/timer.py new file mode 100644 index 0000000000000000000000000000000000000000..2350b224ed0ba9af1fbdfe7411464d3957d1eec1 --- /dev/null +++ b/research/cv/JDE/src/timer.py @@ -0,0 +1,61 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Simple timer script.""" +import time + + +class Timer: + """ + A simple timer. + """ + def __init__(self): + self.total_time = 0. + self.calls = 0 + self.start_time = 0. + self.diff = 0. + self.average_time = 0. + + self.duration = 0. + + def tic(self): + """ + Get the start time. 
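+
+        Typical usage: call tic() right before the timed block and toc()
+        right after it; toc(average=True) returns the running average over
+        all tic/toc pairs since the last clear().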
+ """ + self.start_time = time.time() + + def toc(self, average=True): + """ + Compute duration of the period + """ + self.diff = time.time() - self.start_time + self.total_time += self.diff + self.calls += 1 + self.average_time = self.total_time / self.calls + if average: + self.duration = self.average_time + else: + self.duration = self.diff + return self.duration + + def clear(self): + """ + Clear values. + """ + self.total_time = 0. + self.calls = 0 + self.start_time = 0. + self.diff = 0. + self.average_time = 0. + self.duration = 0. diff --git a/research/cv/JDE/src/utils.py b/research/cv/JDE/src/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8605f5eda17e3a7f6252e5d9e9675a86393868af --- /dev/null +++ b/research/cv/JDE/src/utils.py @@ -0,0 +1,537 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Auxiliary utils.""" +import os + +import numpy as np +from mindspore import Tensor +from mindspore import dtype as mstype +from mindspore import nn +from mindspore import numpy as msnp +from mindspore import ops +from mindspore.ops import functional as F + + +def mkdir_if_missing(directory): + os.makedirs(directory, exist_ok=True) + + +def xyxy2xywh(x): + """ + Convert bounding box format from [x1, y1, x2, y2] to [x, y, w, h], + where x, y are coordinates of center, (x1, y1) and (x2, y2) + are coordinates of bottom left and top right respectively. + """ + y = np.zeros_like(x) + y[:, 0] = (x[:, 0] + x[:, 2]) / 2 # x center + y[:, 1] = (x[:, 1] + x[:, 3]) / 2 # y center + y[:, 2] = x[:, 2] - x[:, 0] # width + y[:, 3] = x[:, 3] - x[:, 1] # height + return y + + +def xywh2xyxy(x): + """ + Convert bounding box format from [x, y, w, h] to [x1, y1, x2, y2], + where x, y are coordinates of center, (x1, y1) and (x2, y2) + are coordinates of bottom left and top right respectively. + """ + y = np.zeros_like(x) + y[:, 0] = (x[:, 0] - x[:, 2] / 2) # Bottom left x + y[:, 1] = (x[:, 1] - x[:, 3] / 2) # Bottom left y + y[:, 2] = (x[:, 0] + x[:, 2] / 2) # Top right x + y[:, 3] = (x[:, 1] + x[:, 3] / 2) # Top right y + return y + + +def scale_coords(img_size, coords, img0_shape): + """ + Rescale x1, y1, x2, y2 to image size. + """ + gain_w = float(img_size[0]) / img0_shape[1] # gain = old / new + gain_h = float(img_size[1]) / img0_shape[0] + gain = min(gain_w, gain_h) + pad_x = (img_size[0] - img0_shape[1] * gain) / 2 # width padding + pad_y = (img_size[1] - img0_shape[0] * gain) / 2 # height padding + coords[:, [0, 2]] -= pad_x + coords[:, [1, 3]] -= pad_y + coords[:, 0:4] /= gain + cords_max = np.max(coords[:, :4]) + coords[:, :4] = np.clip(coords[:, :4], a_min=0, a_max=cords_max) + return coords + + +class SoftmaxCE(nn.Cell): + """ + Original nn.SoftmaxCrossEntropyWithLogits with modifications: + 1) Set ignore index = -1. + 2) Reshape labels and logits to (n, C). + 3) Calculate mean by mask. 
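+
+    Labels equal to ignore_index are masked out, and the loss is averaged
+    only over the remaining entries (eps guards against division by zero
+    when every label is ignored).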
+ """ + def __init__(self): + super().__init__() + # Set necessary operations and constants + self.soft_ce = ops.SoftmaxCrossEntropyWithLogits() + self.expand_dim = ops.ExpandDims() + self.transpose = ops.Transpose() + self.reshape = ops.Reshape() + self.one_hot = ops.OneHot() + self.sum = ops.ReduceSum() + self.one = Tensor(1, mstype.float32) + self.zero = Tensor(0, mstype.float32) + + # Set eps to escape division by zero + self.eps = Tensor(1e-16, dtype=mstype.float32) + + def construct(self, logits, labels, ignore_index): + """ + Calculate softmax loss between logits and labels with ignore mask. + """ + # Ignore indices which have not exactly recognized iou + mask = labels != ignore_index + mask = mask.astype('float32') + channels = F.shape(logits)[-1] + + # One-hot labels for total identities in dataset + labels_one_hot = self.one_hot(labels.flatten(), channels, self.one, self.zero) + raw_loss, _ = self.soft_ce( + self.reshape(logits, (-1, channels)), + self.reshape(labels_one_hot, (-1, channels)), + ) + + # Apply mask and take mean of losses + result = raw_loss * mask.reshape(raw_loss.shape) + result = self.sum(result) / (self.sum(mask) + self.eps) + + return result + + +def build_targets_thres(target, anchor_wh, na, ngh, ngw, k_max): + """ + Build grid of targets confidence mask, bbox delta and id with thresholds. + + Args: + target (np_array): Targets bbox cords and ids. + anchor_wh (np_array): Resized anchors for map size. + na (int): Number of anchors. + ngh (int): Map height. + ngw (int): Map width. + k_max (int): Limitation of max detections per image. + + Returns: + tconf (np_array): Mask with bg (0), gt (1) and ign (-1) indices. Shape (na, ngh, ngw). + tbox (np_array): Targets delta bbox values. Shape (na, ngh, ngw, 4). + tid (np_array): Grid with id for every cell. Shape (na, ngh, ngw). 
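+        t_indices (np_array): Flattened grid indices of cells assigned to a ground truth, zero-padded to k_max entries. Shape (k_max,).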
+ + """ + id_thresh = 0.5 + fg_thresh = 0.5 + bg_thresh = 0.4 + + bg_id = -1 # Background id + + tbox = np.zeros((na, ngh, ngw, 4), dtype=np.float32) # Fill grid with zeros bbox cords + tconf = np.zeros((na, ngh, ngw), dtype=np.int32) # Fill grid with zeros confidence + tid = np.full((na, ngh, ngw), bg_id, dtype=np.int32) # Fill grid with background id + + t = target + t_id = t[:, 1].copy().astype(np.int32) + t = t[:, [0, 2, 3, 4, 5]] + + # Convert relative cords for map size + gxy, gwh = t[:, 1:3].copy(), t[:, 3:5].copy() + gxy[:, 0] = gxy[:, 0] * ngw + gxy[:, 1] = gxy[:, 1] * ngh + gwh[:, 0] = gwh[:, 0] * ngw + gwh[:, 1] = gwh[:, 1] * ngh + gxy[:, 0] = np.clip(gxy[:, 0], a_min=0, a_max=ngw - 1) + gxy[:, 1] = np.clip(gxy[:, 1], a_min=0, a_max=ngh - 1) + + gt_boxes = np.concatenate((gxy, gwh), axis=1) # Shape (num of targets, 4), 4 is (xc, yc, w, h) + + # Apply anchor to each cell of the grid + anchor_mesh = generate_anchor(ngh, ngw, anchor_wh) # Shape (na, 4, ngh, ngw) + anchor_list = anchor_mesh.transpose(0, 2, 3, 1).reshape(-1, 4) # Shape (na x ngh x ngw, 4) + + # Compute anchor iou with ground truths bboxes + iou_pdist = bbox_iou(anchor_list, gt_boxes) # Shape (na x ngh x ngw, Ng) + max_gt_index = iou_pdist.argmax(axis=1) # Shape (na x ngh x ngw) + iou_max = iou_pdist.max(axis=1) # Shape (na x ngh x ngw) + + iou_map = iou_max.reshape(na, ngh, ngw) + gt_index_map = max_gt_index.reshape(na, ngh, ngw) + + # Fill tconf by thresholds + id_index = iou_map > id_thresh + fg_index = iou_map > fg_thresh + bg_index = iou_map < bg_thresh + ign_index = (iou_map < fg_thresh) * (iou_map > bg_thresh) # Search unclear cells + tconf[fg_index] = 1 + tconf[bg_index] = 0 + tconf[ign_index] = -1 # Index to ignore unclear cells + + # Take ground truths with mask + gt_index = gt_index_map[fg_index] + gt_box_list = gt_boxes[gt_index] + gt_id_list = t_id[gt_index_map[id_index]] + if np.sum(fg_index) > 0: + tid[id_index] = gt_id_list + fg_anchor_list = anchor_list.reshape((na, ngh, ngw, 4))[fg_index] + delta_target = encode_delta(gt_box_list, fg_anchor_list) + tbox[fg_index] = delta_target + + # Indices of cells with detections + tconf_max = tconf.max(0) + tid_max = tid.max(0) + indices = np.where((tconf_max.flatten() > 0) & (tid_max.flatten() >= 0))[0] + + # Fill indices with zeros if k < k_max + # Where k - is the detections per image + # k_max - max detections per image + k = len(indices) + t_indices = np.zeros(k_max) + t_indices[..., :min(k_max, k)] = indices[..., :min(k_max, k)] + + return tconf, tbox, tid, t_indices + + +def bbox_iou(box1, box2, x1y1x2y2=False): + """ + Returns the IoU of two bounding boxes. 
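+
+    box1 has shape (n, 4) and box2 has shape (m, 4); the result is the pairwise
+    IoU matrix of shape (n, m). When x1y1x2y2 is False, boxes are treated as
+    (xc, yc, w, h) and converted to corner coordinates first.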
+ """ + n, m = len(box1), len(box2) + if x1y1x2y2: + # Get the coordinates of bounding boxes + b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] + b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] + else: + # Transform from center and width to exact coordinates + b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 + b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 + b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 + b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 + + # Get the coordinates of the intersection rectangle + inter_rect_x1 = np.maximum(np.expand_dims(b1_x1, 1), b2_x1) + inter_rect_y1 = np.maximum(np.expand_dims(b1_y1, 1), b2_y1) + inter_rect_x2 = np.minimum(np.expand_dims(b1_x2, 1), b2_x2) + inter_rect_y2 = np.minimum(np.expand_dims(b1_y2, 1), b2_y2) + + # Intersection area + i_r_x = inter_rect_x2 - inter_rect_x1 + i_r_y = inter_rect_y2 - inter_rect_y1 + inter_area = np.clip(i_r_x, 0, np.max(i_r_x)) * np.clip(i_r_y, 0, np.max(i_r_y)) + + # Union Area + b1_area = np.broadcast_to(((b1_x2 - b1_x1) * (b1_y2 - b1_y1)).reshape(-1, 1), (n, m)) + b2_area = np.broadcast_to(((b2_x2 - b2_x1) * (b2_y2 - b2_y1)).reshape(1, -1), (n, m)) + + return inter_area / (b1_area + b2_area - inter_area + 1e-16) + + +def generate_anchor(ngh, ngw, anchor_wh): + """ + Generate anchor for every cell in grid. + """ + na = len(anchor_wh) + yy, xx = np.meshgrid(np.arange(ngh), np.arange(ngw), indexing='ij') + + mesh = np.stack([xx, yy], axis=0) # Shape 2, ngh, ngw + mesh = np.tile(np.expand_dims(mesh, 0), (na, 1, 1, 1)).astype(np.float32) # Shape na, 2, ngh, ngw + anchor_offset_mesh = np.tile(np.expand_dims(np.expand_dims(anchor_wh, -1), -1), (1, 1, ngh, ngw)) # Shape na, 2, ngh, ngw + anchor_mesh = np.concatenate((mesh, anchor_offset_mesh), axis=1) # Shape na, 4, ngh, ngw + return anchor_mesh + + +def encode_delta(gt_box_list, fg_anchor_list): + """ + Calculate delta for bbox center, width, height. + """ + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:, 1], \ + fg_anchor_list[:, 2], fg_anchor_list[:, 3] + gx, gy, gw, gh = gt_box_list[:, 0], gt_box_list[:, 1], \ + gt_box_list[:, 2], gt_box_list[:, 3] + dx = (gx - px) / pw + dy = (gy - py) / ph + dw = np.log(gw / pw) + dh = np.log(gh / ph) + + return np.stack([dx, dy, dw, dh], axis=1) + + +def create_grids(anchors, img_size, ngw): + """ + Resize anchor according to image size and feature map size. + + Note: + Ratio of feature maps dimensions if 1:3 such as anchors. + Thus, it's enough to calculate stride per one dimension. + """ + stride = img_size[0] / ngw + anchor_vec = np.array(anchors) / stride + + return anchor_vec, stride + + +def build_thresholds( + labels, + anchor_vec_s, + anchor_vec_m, + anchor_vec_b, + k_max, +): + """ + Build thresholds for all feature map sizes. + """ + s = build_targets_thres(labels, anchor_vec_s, 4, 19, 34, k_max) + m = build_targets_thres(labels, anchor_vec_m, 4, 38, 68, k_max) + b = build_targets_thres(labels, anchor_vec_b, 4, 76, 136, k_max) + + return s, m, b + + +def create_anchors_vec(anchors, img_size=(1088, 608)): + """ + Create anchor vectors for every feature map size. 
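+
+    For the default 1088x608 input, the grid widths 34, 68 and 136 correspond
+    to strides 32, 16 and 8; each anchor is divided by its stride, so anchor
+    sizes are expressed in feature-map cells.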
+ """ + anchors1 = anchors[0:4] + anchors2 = anchors[4:8] + anchors3 = anchors[8:12] + anchor_vec_s, stride_s = create_grids(anchors3, img_size, 34) + anchor_vec_m, stride_m = create_grids(anchors2, img_size, 68) + anchor_vec_b, stride_b = create_grids(anchors1, img_size, 136) + + anchors = (anchor_vec_s, anchor_vec_m, anchor_vec_b) + strides = (stride_s, stride_m, stride_b) + + return anchors, strides + + +class DecodeDeltaMap(nn.Cell): + """ + Network predicts delta for base anchors. + + Decodes predictions into relative bbox cords. + """ + def __init__(self): + super().__init__() + self.exp = ops.operations.Exp() + self.stack0 = ops.Stack(axis=0) + self.stack1 = ops.Stack(axis=1) + self.expand_dims = ops.ExpandDims() + self.reshape = ops.Reshape() + self.concat = ops.Concat(axis=2) + + def construct(self, delta_map, anchors): + """ + Decode delta of bbox predictions and summarize it with anchors. + """ + anchors = anchors.astype('float32') + nb, na, ngh, ngw, _ = delta_map.shape + yy, xx = msnp.meshgrid(msnp.arange(ngh), msnp.arange(ngw), indexing='ij') + + mesh = self.stack0([xx, yy]).astype('float32') # Shape (2, ngh, ngw) + mesh = msnp.tile(self.expand_dims(mesh, 0), (nb, na, 1, 1, 1)) # Shape (nb, na, 2, ngh, ngw) + anchors_unsqueezed = self.expand_dims(self.expand_dims(anchors, -1), -1) # Shape (na, 2, 1, 1) + anchor_offset_mesh = msnp.tile(anchors_unsqueezed, (nb, 1, 1, ngh, ngw)) # Shape (nb, na, 2, ngh, ngw) + anchor_mesh = self.concat((mesh, anchor_offset_mesh)) # Shape (nb, na, 4, ngh, ngw) + + anchor_mesh = anchor_mesh.transpose(0, 1, 3, 4, 2) + + delta = delta_map.reshape(-1, 4) + fg_anchor_list = anchor_mesh.reshape(-1, 4) + px, py, pw, ph = fg_anchor_list[:, 0], fg_anchor_list[:, 1], \ + fg_anchor_list[:, 2], fg_anchor_list[:, 3] + dx, dy, dw, dh = delta[:, 0], delta[:, 1], delta[:, 2], delta[:, 3] + gx = pw * dx + px + gy = ph * dy + py + gw = pw * self.exp(dw) + gh = ph * self.exp(dh) + + pred_list = self.stack1([gx, gy, gw, gh]) + + pred_map = pred_list.reshape(nb, na, ngh, ngw, 4) + + return pred_map + + +def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4): + """ + Removes detections with lower object confidence score than 'conf_thres' + Non-Maximum Suppression to further filter detections. + + Args: + prediction (np.array): All predictions from model output. + conf_thres (float): Threshold for confidence. + nms_thres (float): Threshold for iou into nms. + + Returns: + output (np.array): Predictions with shape (x1, y1, x2, y2, object_conf, class_score, class_pred) + """ + + output = [None for _ in range(len(prediction))] + for image_i, pred in enumerate(prediction): + # Filter out confidence scores below threshold + # Get score and class with highest confidence + + v = pred[:, 4] > conf_thres + v = np.squeeze(v.nonzero()) + if v.ndim == 0: + v = np.expand_dims(v, 0) + + pred = pred[v] + + # If none are remaining => process next image + npred = pred.shape[0] + if not npred: + continue + # From (center x, center y, width, height) to (x1, y1, x2, y2) + pred[:, :4] = xywh2xyxy(pred[:, :4]) + + # Non-maximum suppression + bboxes = np.concatenate((pred[:, :4], np.expand_dims(pred[:, 4], -1)), axis=1) + nms_indices = nms(bboxes, nms_thres) + det_max = pred[nms_indices] + + if det_max.size > 0: + # Add max detections to outputs + output[image_i] = det_max if output[image_i] is None else np.concatenate((output[image_i], det_max)) + + return output + + +def nms(dets, thresh): + """ + Non-maximum suppression with threshold. 
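+
+    dets is an (N, 5) array of (x1, y1, x2, y2, score); the returned list holds
+    the indices of the boxes to keep, visited in descending order of score.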
+ """ + x1 = dets[:, 0] + y1 = dets[:, 1] + x2 = dets[:, 2] + y2 = dets[:, 3] + scores = dets[:, 4] + + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + i = order[0] + keep.append(i) + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + ovr = inter / (areas[i] + areas[order[1:]] - inter) + + inds = np.where(ovr <= thresh)[0] + order = order[inds + 1] + + return keep + + +def ap_per_class(tp, conf, pred_cls, target_cls): + """ + Computes the average precision, given the recall and precision curves. + Method originally from https://github.com/rafaelpadilla/Object-Detection-Metrics. + + Args: + tp (list): True positives. + conf (list): Objectness value from 0-1. + pred_cls (np.array): Predicted object classes. + target_cls (np.array): True object classes. + + Returns: + ap (np.array): The average precision as computed in py-faster-rcnn. + unique classes (np.array): Classes of predictions. + r (np.array): Recall. + p (np.array): Precision. + """ + + # lists/pytorch to numpy + tp, conf, pred_cls, target_cls = np.array(tp), np.array(conf), np.array(pred_cls), np.array(target_cls) + + # Sort by objectness + i = np.argsort(-conf) + tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] + + # Find unique classes + unique_classes = np.unique(np.concatenate((pred_cls, target_cls), 0)) + + # Create Precision-Recall curve and compute AP for each class + ap, p, r = [], [], [] + for c in unique_classes: + i = pred_cls == c + n_gt = sum(target_cls == c) # Number of ground truth objects + n_p = sum(i) # Number of predicted objects + + if (n_p == 0) and (n_gt == 0): + continue + + if (n_p == 0) or (n_gt == 0): + ap.append(0) + r.append(0) + p.append(0) + else: + # Accumulate FPs and TPs + fpc = np.cumsum(1 - tp[i]) + tpc = np.cumsum(tp[i]) + + # Recall + recall_curve = tpc / (n_gt + 1e-16) + r.append(tpc[-1] / (n_gt + 1e-16)) + + # Precision + precision_curve = tpc / (tpc + fpc) + p.append(tpc[-1] / (tpc[-1] + fpc[-1])) + + # AP from recall-precision curve + ap.append(compute_ap(recall_curve, precision_curve)) + + return np.array(ap), unique_classes.astype('int32'), np.array(r), np.array(p) + + +def compute_ap(recall, precision): + """ + Computes the average precision, given the recall and precision curves. + Code originally from https://github.com/rbgirshick/py-faster-rcnn. + + Args: + recall (list): The recall curve. + precision (list): The precision curve. + + Returns: + ap (np.array): The average precision as computed in py-faster-rcnn. 
+ """ + + # correct AP calculation + # first append sentinel values at the end + mrec = np.concatenate(([0.], recall, [1.])) + mpre = np.concatenate(([0.], precision, [0.])) + + # compute the precision envelope + for i in range(mpre.size - 1, 0, -1): + mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) + + # to calculate area under PR curve, look for points + # where X axis (recall) changes value + i = np.where(mrec[1:] != mrec[:-1])[0] + + # and sum (\Delta recall) * prec + ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) + return ap diff --git a/research/cv/JDE/src/visualization.py b/research/cv/JDE/src/visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..7057b3c5fd9ea6a1a69d912f71edc17f1663459a --- /dev/null +++ b/research/cv/JDE/src/visualization.py @@ -0,0 +1,54 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Images visualization script.""" +import numpy as np +import cv2 + + +def get_color(idx): + """ + Set the color for unique pedestrian. + """ + idx = idx * 3 + color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255) + + return color + + +def plot_tracking(image, tlwhs, obj_ids, frame_id=0, fps=0., ids2=None): + """ + Show tracking results. + """ + im = np.ascontiguousarray(np.copy(image)) + + text_scale = max(1, image.shape[1] / 1600.) + text_thickness = 1 if text_scale > 1.1 else 1 + line_thickness = max(1, int(image.shape[1] / 500.)) + + cv2.putText(im, f'frame: {frame_id} fps: {fps:.2f} num: {len(tlwhs)}', + (0, int(15 * text_scale)), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), thickness=2) + + for i, tlwh in enumerate(tlwhs): + x1, y1, w, h = tlwh + intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h))) + obj_id = int(obj_ids[i]) + id_text = f'{int(obj_id)}' + if ids2 is not None: + id_text = id_text + f', {int(ids2[i])}' + color = get_color(abs(obj_id)) + cv2.rectangle(im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness) + cv2.putText(im, id_text, (intbox[0], intbox[1] + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), + thickness=text_thickness) + return im diff --git a/research/cv/JDE/tracker/__init__.py b/research/cv/JDE/tracker/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/JDE/tracker/basetrack.py b/research/cv/JDE/tracker/basetrack.py new file mode 100644 index 0000000000000000000000000000000000000000..49c8c090ee98437902408f74375bcd8a69a74a5a --- /dev/null +++ b/research/cv/JDE/tracker/basetrack.py @@ -0,0 +1,71 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Init base track params.""" +from collections import OrderedDict + +import numpy as np + + +class TrackState: + new = 0 + tracked = 1 + lost = 2 + removed = 3 + + +class BaseTrack: + """ + Track class template. + """ + _count = 0 + + track_id = 0 + is_activated = False + state = TrackState.new + + history = OrderedDict() + features = [] + curr_feature = None + score = 0 + start_frame = 0 + frame_id = 0 + time_since_update = 0 + + # multi-camera + location = (np.inf, np.inf) + + @property + def end_frame(self): + return self.frame_id + + @staticmethod + def next_id(): + BaseTrack._count += 1 + return BaseTrack._count + + def activate(self, *args): + raise NotImplementedError + + def predict(self): + raise NotImplementedError + + def update(self, *args, **kwargs): + raise NotImplementedError + + def mark_lost(self): + self.state = TrackState.lost + + def mark_removed(self): + self.state = TrackState.removed diff --git a/research/cv/JDE/tracker/matching.py b/research/cv/JDE/tracker/matching.py new file mode 100644 index 0000000000000000000000000000000000000000..faf550f8f88e0a2d6cf1b2f31f43173c51480bd8 --- /dev/null +++ b/research/cv/JDE/tracker/matching.py @@ -0,0 +1,115 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Matching script.""" +import lap +import numpy as np +from cython_bbox import bbox_overlaps as bbox_ious +from scipy.spatial.distance import cdist + +from src import kalman_filter + + +def linear_assignment(cost_matrix, thresh): + """ + Linear assignment with threshold. + """ + if cost_matrix.size == 0: + out = ( + np.empty((0, 2), dtype=int), + tuple(range(cost_matrix.shape[0])), + tuple(range(cost_matrix.shape[1])), + ) + + return out + matches, unmatched_a, unmatched_b = [], [], [] + _, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh) + for ix, mx in enumerate(x): + if mx >= 0: + matches.append([ix, mx]) + unmatched_a = np.where(x < 0)[0] + unmatched_b = np.where(y < 0)[0] + matches = np.asarray(matches) + + return matches, unmatched_a, unmatched_b + + +def iou(atlbrs, btlbrs): + """ + Compute cost based on IoU. + """ + ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float) + if ious.size == 0: + return ious + + ious = bbox_ious( + np.ascontiguousarray(atlbrs, dtype=np.float), + np.ascontiguousarray(btlbrs, dtype=np.float) + ) + + return ious + + +def iou_distance(atracks, btracks): + """ + Compute cost based on IoU. 
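+
+    Accepts either lists of tracks (their tlbr boxes are used) or sequences of
+    raw (x1, y1, x2, y2) boxes, and returns the cost matrix 1 - IoU.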
+ """ + if (atracks and isinstance(atracks[0], np.ndarray)) or \ + (btracks and isinstance(btracks[0], np.ndarray)): + atlbrs = atracks + btlbrs = btracks + else: + atlbrs = [track.tlbr for track in atracks] + btlbrs = [track.tlbr for track in btracks] + + ious_val = iou(atlbrs, btlbrs) + cost_matrix = 1 - ious_val + + return cost_matrix + +def embedding_distance(tracks, detections): + """ + Compute embedding distance. + """ + cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float) + if cost_matrix.size == 0: + return cost_matrix + det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float) + track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float) + cost_matrix = np.maximum(0.0, cdist(track_features, det_features)) # Nomalized features + + return cost_matrix + + +def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98): + """ + Fuses motion objects. + """ + if cost_matrix.size == 0: + return cost_matrix + gating_dim = 2 if only_position else 4 + gating_threshold = kalman_filter.chi2inv95[gating_dim] + measurements = np.asarray([det.to_xyah() for det in detections]) + for row, track in enumerate(tracks): + gating_distance = kf.gating_distance( + track.mean, + track.covariance, + measurements, + only_position, + metric='maha', + ) + cost_matrix[row, gating_distance > gating_threshold] = np.inf + cost_matrix[row] = lambda_ * cost_matrix[row] + (1-lambda_)* gating_distance + + return cost_matrix diff --git a/research/cv/JDE/tracker/multitracker.py b/research/cv/JDE/tracker/multitracker.py new file mode 100644 index 0000000000000000000000000000000000000000..f712bb161b5cf8d80c853c44f97b1d55e910f2c8 --- /dev/null +++ b/research/cv/JDE/tracker/multitracker.py @@ -0,0 +1,447 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Multiple objects tracking.""" +from collections import deque + +import numpy as np + +from src.kalman_filter import KalmanFilter +from src.log import logger +from src.utils import non_max_suppression +from src.utils import scale_coords +from tracker import matching +from tracker.basetrack import BaseTrack, TrackState + + +class TrackS(BaseTrack): + """ + Compute stracks. + """ + def __init__(self, tlwh, score, temp_feat, buffer_size=30): + # wait activate + self._tlwh = np.asarray(tlwh, dtype=np.float) + self.kalman_filter = None + self.mean, self.covariance = None, None + self.is_activated = False + + self.score = score + self.tracklet_len = 0 + + self.smooth_feat = None + self.update_features(temp_feat) + self.features = deque([], maxlen=buffer_size) + self.alpha = 0.9 + + def update_features(self, feat): + """ + Update values. 
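+
+        The new embedding is L2-normalized and blended into smooth_feat with an
+        exponential moving average (alpha = 0.9); the smoothed feature is then
+        re-normalized.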
+ """ + feat /= np.linalg.norm(feat) + self.curr_feat = feat + if self.smooth_feat is None: + self.smooth_feat = feat + else: + self.smooth_feat = self.alpha * self.smooth_feat + (1 - self.alpha) * feat + self.features.append(feat) + self.smooth_feat /= np.linalg.norm(self.smooth_feat) + + def predict(self): + """ + Compute math distribution. + """ + mean_state = self.mean.copy() + if self.state != TrackState.tracked: + mean_state[7] = 0 + self.mean, self.covariance = self.kalman_filter.predict(mean_state, self.covariance) + + @staticmethod + def multi_predict(stracks, kalman_filter): + """ + Compute multi math distribution. + """ + if stracks: + multi_mean = np.asarray([st.mean.copy() for st in stracks]) + multi_covariance = np.asarray([st.covariance for st in stracks]) + for i, st in enumerate(stracks): + if st.state != TrackState.tracked: + multi_mean[i][7] = 0 + multi_mean, multi_covariance = kalman_filter.multi_predict(multi_mean, multi_covariance) + for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)): + stracks[i].mean = mean + stracks[i].covariance = cov + + def activate(self, kalman_filter, frame_id): + """ + Start a new tracklet. + """ + self.kalman_filter = kalman_filter + self.track_id = self.next_id() + self.mean, self.covariance = self.kalman_filter.initiate(self.tlwh_to_xyah(self._tlwh)) + + self.tracklet_len = 0 + self.state = TrackState.tracked + self.frame_id = frame_id + self.start_frame = frame_id + + def re_activate(self, new_track, frame_id, new_id=False): + """ + Reactivate new tracks. + """ + self.mean, self.covariance = self.kalman_filter.update( + self.mean, + self.covariance, + self.tlwh_to_xyah(new_track.tlwh), + ) + + self.update_features(new_track.curr_feat) + self.tracklet_len = 0 + self.state = TrackState.tracked + self.is_activated = True + self.frame_id = frame_id + if new_id: + self.track_id = self.next_id() + + def update(self, new_track, frame_id, update_feature=True): + """ + Update a matched track. + + Args: + new_track (TrackS): New track frame. + frame_id (int): Number of current frame. + update_feature (bool): Update or not. + """ + self.frame_id = frame_id + self.tracklet_len += 1 + + new_tlwh = new_track.tlwh + self.mean, self.covariance = self.kalman_filter.update( + self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh)) + self.state = TrackState.tracked + self.is_activated = True + + self.score = new_track.score + if update_feature: + self.update_features(new_track.curr_feat) + + @property + def tlwh(self): + """ + Get current position in bounding box format + (top left x, top left y, width, height). + """ + if self.mean is None: + return self._tlwh.copy() + ret = self.mean[:4].copy() + ret[2] *= ret[3] + ret[:2] -= ret[2:] / 2 + return ret + + @property + def tlbr(self): + """ + Convert bounding box to format + (min x, min y, max x, max y), i.e., (top left, bottom right). + """ + ret = self.tlwh.copy() + ret[2:] += ret[:2] + return ret + + @staticmethod + def tlwh_to_xyah(tlwh): + """ + Convert bounding box to format + (center x, center y, aspect ratio, height), + where the aspect ratio is width / height. + """ + ret = np.asarray(tlwh).copy() + ret[:2] += ret[2:] / 2 + ret[2] /= ret[3] + return ret + + def to_xyah(self): + """ + Convert tlwh format to xyah. + """ + return self.tlwh_to_xyah(self.tlwh) + + @staticmethod + def tlbr_to_tlwh(tlbr): + """ + Convert tlbr format to tlwh. 
+ """ + ret = np.asarray(tlbr).copy() + ret[2:] -= ret[:2] + return ret + + @staticmethod + def tlwh_to_tlbr(tlwh): + """ + Convert tlwh format to tlbr. + """ + ret = np.asarray(tlwh).copy() + ret[2:] += ret[:2] + return ret + + def __repr__(self): + return f'OT_{self.track_id}_({self.start_frame}-{self.end_frame})' + + +class JDETracker: + """ + Compute track per frame and apply tracking. + """ + def __init__(self, opt, net, frame_rate=30): + self.opt = opt + + self.model = net + logger.info('Inference for: %s', opt.ckpt_url) + + self.tracked_stracks = [] # type: list[TrackS] + self.lost_stracks = [] # type: list[TrackS] + self.removed_stracks = [] # type: list[TrackS] + + self.frame_id = 0 + self.det_thresh = opt.conf_thres + self.buffer_size = int(frame_rate / 30.0 * opt.track_buffer) + self.max_time_lost = self.buffer_size + + self.kalman_filter = KalmanFilter() + + def tracking( + self, + activated_stracks, + refind_stracks, + lost_stracks, + removed_stracks, + unconfirmed, + tracked_stracks, + detections, + ): + """ + Apply tracking strategy. + """ + # Step 2: First association, with embedding. + # Combining currently tracked_stracks and lost_stracks + strack_pool = joint_stracks(tracked_stracks, self.lost_stracks) + # Predict the current location with kalman filter + TrackS.multi_predict(strack_pool, self.kalman_filter) + + # Compute distances of the detection with the tracks in strack_pool. + dists = matching.embedding_distance(strack_pool, detections) + dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections) + + matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7) + # The matches is the array for corresponding matches of the detection with the corresponding strack_pool. + + for itracked, idet in matches: + # itracked is the id of the track and idet is the detection + track = strack_pool[itracked] + det = detections[idet] + if track.state == TrackState.tracked: + # If the track is active, add the detection to the track + track.update(detections[idet], self.frame_id) + activated_stracks.append(track) + else: + # Detection from a track which is not active, hence put the track in refind_stracks list + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + + # Step 3: Second association, with IOU + detections = [detections[i] for i in u_detection] + # detections is now a list of the unmatched detections + r_tracked_stracks = [] # This is container for stracks which were tracked till the + # previous frame but no detection was found for it in the current frame + for i in u_track: + if strack_pool[i].state == TrackState.tracked: + r_tracked_stracks.append(strack_pool[i]) + dists = matching.iou_distance(r_tracked_stracks, detections) + matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5) + # matches is the list of detections which matched with corresponding tracks by IOU distance method + for itracked, idet in matches: + track = r_tracked_stracks[itracked] + det = detections[idet] + if track.state == TrackState.tracked: + track.update(det, self.frame_id) + activated_stracks.append(track) + else: + track.re_activate(det, self.frame_id, new_id=False) + refind_stracks.append(track) + # Same process done for some unmatched detections, but now considering IOU_distance as measure + + for it in u_track: + track = r_tracked_stracks[it] + if not track.state == TrackState.lost: + track.mark_lost() + lost_stracks.append(track) + # If no detections are obtained for tracks (u_track), + # the tracks 
are added to lost_tracks list and are marked lost. + + # Deal with unconfirmed tracks, usually tracks with only one beginning frame + detections = [detections[i] for i in u_detection] + dists = matching.iou_distance(unconfirmed, detections) + matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7) + for itracked, idet in matches: + unconfirmed[itracked].update(detections[idet], self.frame_id) + activated_stracks.append(unconfirmed[itracked]) + + # The tracks which are yet not matched + for it in u_unconfirmed: + track = unconfirmed[it] + track.mark_removed() + removed_stracks.append(track) + + # After all these confirmation steps, if a new detection is found, it is initialized for a new track + # Step 4: Init new stracks + for inew in u_detection: + track = detections[inew] + if track.score < self.det_thresh: + continue + track.activate(self.kalman_filter, self.frame_id) + activated_stracks.append(track) + + # Step 5: Update state + # If the tracks are lost for more frames than the threshold number, the tracks are removed. + for track in self.lost_stracks: + if self.frame_id - track.end_frame > self.max_time_lost: + track.mark_removed() + removed_stracks.append(track) + + # Update the self.tracked_stracks and self.lost_stracks using the updates in this step. + self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.tracked] + self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_stracks) + self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks) + self.lost_stracks.extend(lost_stracks) + self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks) + self.removed_stracks.extend(removed_stracks) + self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks) + + + def update(self, im_blob, img0): + """ + Processes the image frame and finds bounding box(detections). + + Associates the detection with corresponding tracklets + and also handles lost, removed, refound and active tracklets. + + Args: + im_blob (np.array): Tensor of image. By default, shape of this tensor is [1, 3, 608, 1088]. + img0 (np.array): Input image sequence. By default, shape is [608, 1080, 3]. + + Returns: + output_stracks (list of TrackS): Information regarding the online_tracklets for the received image tensor. + """ + self.frame_id += 1 + activated_stracks = [] # For storing active tracks, for the current frame. + refind_stracks = [] # Lost Tracks whose detections are obtained in the current frame. + lost_stracks = [] # The tracks which are not obtained in the current frame but are not removed. + removed_stracks = [] + unconfirmed = [] + tracked_stracks = [] # type: list[TrackS] + + # Step 1: Network forward, get detections & embeddings + _, pred = self.model.predict(im_blob) + + # Pred is tensor of all the proposals (default number of proposals: 54264). + # Proposals have information associated with the bounding box and embeddings. + pred = pred.asnumpy() + pred = pred[pred[:, :, 4] > self.opt.conf_thres] + # Pred now has lesser number of proposals. Proposals rejected on basis of object confidence score. + + if pred.size > 0: + dets = non_max_suppression(np.expand_dims(pred, 0), self.opt.conf_thres, self.opt.nms_thres)[0] + + # Final proposals are obtained in dets. Information of bounding box and embeddings also included. 
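+            # scale_coords below rescales the boxes in place from the padded
+            # network input size (opt.img_size) back to the original image resolution.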
+ # Next step changes the detection scales + scale_coords(self.opt.img_size, dets[:, :4], img0.shape).round() + + # Detections is list of (x1, y1, x2, y2, object_conf, class_score, class_pred) + # Class_pred is the embeddings. + detections = [TrackS(TrackS.tlbr_to_tlwh(tlbrs[:4]), tlbrs[4], f, 30) for + (tlbrs, f) in zip(dets[:, :5], dets[:, 6:])] + else: + detections = [] + + # Add newly detected tracklets to tracked_stracks + for track in self.tracked_stracks: + if not track.is_activated: + # previous tracks which are not active in the current frame are added in unconfirmed list + unconfirmed.append(track) + else: + # Active tracks are added to the local list 'tracked_stracks' + tracked_stracks.append(track) + + self.tracking( + activated_stracks, + refind_stracks, + lost_stracks, + removed_stracks, + unconfirmed, + tracked_stracks, + detections, + ) + + # get scores of lost tracks + output_stracks = [track for track in self.tracked_stracks if track.is_activated] + + return output_stracks + + +def joint_stracks(tlista, tlistb): + """ + Append stracks. + """ + exists = {} + res = [] + for t in tlista: + exists[t.track_id] = 1 + res.append(t) + for t in tlistb: + tid = t.track_id + if not exists.get(tid, 0): + exists[tid] = 1 + res.append(t) + return res + +def sub_stracks(tlista, tlistb): + """ + Delete stracks. + """ + stracks = {} + for t in tlista: + stracks[t.track_id] = t + for t in tlistb: + tid = t.track_id + if stracks.get(tid, 0): + del stracks[tid] + return list(stracks.values()) + +def remove_duplicate_stracks(stracksa, stracksb): + """ + Removes duplicate from stracks. + """ + pdist = matching.iou_distance(stracksa, stracksb) + pairs = np.where(pdist < 0.15) + dupa, dupb = [], [] + for p, q in zip(*pairs): + timep = stracksa[p].frame_id - stracksa[p].start_frame + timeq = stracksb[q].frame_id - stracksb[q].start_frame + if timep > timeq: + dupb.append(q) + else: + dupa.append(p) + resa = [t for i, t in enumerate(stracksa) if not i in dupa] + resb = [t for i, t in enumerate(stracksb) if not i in dupb] + return resa, resb diff --git a/research/cv/JDE/train.py b/research/cv/JDE/train.py new file mode 100644 index 0000000000000000000000000000000000000000..da70b37187a1fb6573c6841ec91f1d96cd4560d6 --- /dev/null +++ b/research/cv/JDE/train.py @@ -0,0 +1,253 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Train script.""" +import json + +import numpy as np +from mindspore import Model +from mindspore import context +from mindspore import dataset as ds +from mindspore import nn +from mindspore.common import set_seed +from mindspore.communication.management import get_group_size +from mindspore.communication.management import get_rank +from mindspore.communication.management import init +from mindspore.context import ParallelMode +from mindspore.dataset.vision import py_transforms as PY +from mindspore.train.callback import CheckpointConfig +from mindspore.train.callback import LossMonitor +from mindspore.train.callback import ModelCheckpoint +from mindspore.train.callback import TimeMonitor +from mindspore.train.serialization import load_checkpoint +from mindspore.train.serialization import load_param_into_net + +from cfg.config import config as default_config +from src.darknet import DarkNet, ResidualBlock +from src.dataset import JointDataset +from src.model import JDE +from src.model import YOLOv3 + +set_seed(1) + + +def lr_steps(cfg, steps_per_epoch): + """ + Init lr steps. + """ + learning_rate = warmup_lr( + cfg.lr, + steps_per_epoch, + cfg.epochs, + ) + + return learning_rate + + +def warmup_lr(lr5, steps_per_epoch, max_epoch): + """ + Set lr for training with warmup and freeze backbone. + + Args: + lr5 (float): Initialized learning rate. + steps_per_epoch (int): Num of steps per epoch on one device. + max_epoch (int): Num of training epochs. + + Returns: + lr_each_step (np.array): Lr for every step of training for model params. + """ + base_lr = lr5 + warmup_steps = 1000 + total_steps = int(max_epoch * steps_per_epoch) + milestone_1 = int(0.5 * max_epoch * steps_per_epoch) + milestone_2 = int(0.75 * max_epoch * steps_per_epoch) + + lr_each_step = [] + + for i in range(total_steps): + if i < warmup_steps: + lr5 = base_lr * ((i + 1) / warmup_steps) ** 4 + elif warmup_steps <= i < milestone_1: + lr5 = base_lr + elif milestone_1 <= i < milestone_2: + lr5 = base_lr * 0.1 + elif milestone_2 <= i: + lr5 = base_lr * 0.01 + + lr_each_step.append(lr5) + + lr_each_step = np.array(lr_each_step, dtype=np.float32) + + return lr_each_step + + +def set_context(cfg): + """ + Set process context. + + Args: + cfg: Config parameters. + + Returns: + dev_target (str): Device target platform. + dev_num (int): Amount of devices participating in process. + dev_id (int): Current process device id.. + """ + dev_target = cfg.device_target + context.set_context(mode=context.GRAPH_MODE, device_target=dev_target) + + if dev_target == 'GPU': + if cfg.is_distributed: + init(backend_name='nccl') + dev_num = get_group_size() + dev_id = get_rank() + context.reset_auto_parallel_context() + context.set_auto_parallel_context( + device_num=dev_num, + parallel_mode=ParallelMode.DATA_PARALLEL, + gradients_mean=True, + ) + else: + dev_num = 1 + dev_id = cfg.device_id + context.set_context(device_id=dev_id) + else: + raise ValueError("Unsupported platform.") + + return dev_num, dev_id + + +def init_callbacks(cfg, batch_number, dev_id): + """ + Initialize training callbacks. + + Args: + cfg: Config parameters. + batch_number: Number of batches into one epoch on one device. + dev_id: Current process device id. + + Returns: + cbs: Inited callbacks. 
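+
+    Note:
+        In distributed mode, only the device whose id equals cfg.device_start
+        saves checkpoints; the other devices get only loss and time monitors.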
+ """ + loss_cb = LossMonitor(per_print_times=100) + time_cb = TimeMonitor(data_size=batch_number) + + if cfg.is_distributed and dev_id != cfg.device_start: + cbs = [loss_cb, time_cb] + else: + config_ck = CheckpointConfig( + save_checkpoint_steps=batch_number, + keep_checkpoint_max=cfg.keep_checkpoint_max, + ) + + ckpt_cb = ModelCheckpoint( + prefix="JDE", + directory=cfg.logs_dir, + config=config_ck, + ) + + cbs = [loss_cb, time_cb, ckpt_cb] + + return cbs + + +if __name__ == "__main__": + config = default_config + device_target = config.device_target + + rank_size, rank_id = set_context(config) + + with open(config.data_cfg_url) as f: + data_config = json.load(f) + trainset_paths = data_config['train'] + + dataset = JointDataset( + config.dataset_root, + trainset_paths, + k_max=config.k_max, + augment=True, + transforms=PY.ToTensor(), + config=config, + ) + + dataloader = ds.GeneratorDataset( + dataset, + column_names=config.col_names_train, + shuffle=True, + num_parallel_workers=4, + num_shards=rank_size, + shard_id=rank_id, + max_rowsize=12, + python_multiprocessing=True, + ) + + dataloader = dataloader.batch(config.batch_size, True) + + batch_num = dataloader.get_dataset_size() + + # Initialize backbone + darknet53 = DarkNet( + ResidualBlock, + config.backbone_layers, + config.backbone_input_shape, + config.backbone_shape, + detect=True, + ) + + # Load weights into backbone + if config.ckpt_url is not None: + if config.ckpt_url.endswith(".ckpt"): + param_dict = load_checkpoint(config.ckpt_url) + else: + raise ValueError(f"Unsupported checkpoint extension: {config.ckpt_url}.") + + load_param_into_net(darknet53, param_dict) + print(f"Load pre-trained backbone from: {config.ckpt_url}") + else: + print("Start without pre-trained backbone.") + + # Initialize FPN with YOLOv3 head + yolov3 = YOLOv3( + backbone=darknet53, + backbone_shape=config.backbone_shape, + out_channel=config.out_channel, + ) + + # Initialize train model with loss cell + net = JDE(yolov3, default_config, dataset.nid, config.embedding_dim) + + # Initiate lr for training + lr = lr_steps(config, batch_num) + + params = net.trainable_params() + + # Set lr scheduler + group_params = [ + {'params': params, 'lr': lr}, + {'order_params': params}, + ] + + opt = nn.SGD( + params=group_params, + learning_rate=lr, + momentum=config.momentum, + weight_decay=config.decay, + ) + + model = Model(net, optimizer=opt) + + callbacks = init_callbacks(config, batch_num, rank_id) + + model.train(epoch=config.epochs, train_dataset=dataloader, callbacks=callbacks, dataset_sink_mode=False) + print("train success")