diff --git a/research/cv/flownet2/README.md b/research/cv/flownet2/README.md new file mode 100644 index 0000000000000000000000000000000000000000..34ebe2b864b4e1d9019e30248da860190cfd0178 --- /dev/null +++ b/research/cv/flownet2/README.md @@ -0,0 +1,426 @@ +# Contents + +- [Contents](#contents) +- [Flownet2 Description](#flownet2-description) +- [Model Architecture](#model-architecture) +- [Dataset](#dataset) +- [Environment Requirements](#environment-requirements) +- [Quick Start](#quick-start) +- [Script Description](#script-description) + - [Script and Sample Code](#script-and-sample-code) + - [Script Parameters](#script-parameters) + - [Training Process](#training-process) + - [Training](#training) + - [Distributed Training](#distributed-training) + - [Evaluation Process](#evaluation-process) + - [Evaluation](#evaluation) +- [Model Description](#model-description) + - [Performance](#performance) + - [Training Performance](#training-performance) + - [FlowNet2 train on FlyingChairs](#flownet2-train-on-flyingchairs) + - [Inference Performance](#inference-performance) + - [FLowNet2 infer on MpiSintelClean](#flownet2-infer-on-mpisintelclean) +- [ModelZoo Homepage](#modelzoo-homepage) + +# [Flownet2 Description](#contents) + +FlowNet2.0, a deep network proposed in 2017, which performs end-to-end learning on optical flow data. +It is optimized based on the FlowNet network , The large improvements in quality and +speed are caused by three major contributions: first, it +focus on the training data and show that the schedule of +presenting data during training is very important. Second, +it develop a stacked architecture that includes warping +of the second image with intermediate optical flow. Third, +it elaborate on small displacements by introducing a subnetwork specializing on small motions. + +Compared with the FLownet network, FlowNet 2.0 is only +marginally slower than the original FlowNet but decreases +the estimation error by more than 50%. + +[FlowNet2 paper](https://arxiv.org/abs/1612.01925 ):Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox + +[FlowNet paper](https://arxiv.org/abs/1504.06852 ):Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox + +# [Model Architecture](#contents) + +The FlowNet2 network is stacked by multiple flownet sub-modules. After the output of the previous network is processed by warp, it is used as the input of the second network. + +The model structure is flowNet2CSS and FlowNet2SD two sub-networks fuse the output through the FlownetFusion network, and the entire large network structure formed is FLowNet2 +The FlowNet2CSS network is a stack of FLowNet2C and two FLowNet2S. 
The specific structure can be further understood according to the paper + +This source code provides the following model structure, which can be configured and used in the yaml file: + +- FlowNet2S +- FlowNet2C +- FlowNet2CS +- FlowNet2CSS +- FlowNet2SD +- FlowNet2 + +# [Dataset](#contents) + +Dataset used: [FlyingChairs](https://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs.en.html) + +- Dataset size:31GB,22,872 pairs 512*384 colorful images +- Data format:PPM + - Note:Data will be processed in src/dataset.py +- you can download here [dataset package](https://lmb.informatik.uni-freiburg.de/data/FlyingChairs/FlyingChairs.zip) + +Dataset used: [ChairsSDHom](https://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs.en.html) + +- Dataset size: 51GB, 21,668 pairs 512*384 colorful images + - Train:20,965 pairs image + - Test: 703 pairs image +- Data format:PNG + - Note: Data will be processed in src/dataset.py +- you can download here [dataset package](https://lmb.informatik.uni-freiburg.de/data/FlowNet2/ChairsSDHom/ChairsSDHom.tar.gz) + +Dataset used: [MpiSintel](http://sintel.cs.washington.edu) + +- Dataset size: 536M, 1024 x 436 colorful images in 23 classes + - MpiSintelClean:1150 images + - MpiSintelFinal: 1150 images +- Data format:PNG + - Note: Data will be processed in src/dataset.py +- you can download here [dataset package](http://files.is.tue.mpg.de/sintel/MPI-Sintel-complete.zip) + +# [Environment Requirements](#contents) + +- Hardware(Ascend/GPU/CPU) + - Prepare hardware environment with Ascend/GPU/CPU processor. +- Framework + - [MindSpore](https://www.mindspore.cn/install/en) +- For more information, please check the resources below: + - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html) + +# [Quick Start](#contents) + +After installing MindSpore via the official website, you can start training and evaluation as follows: + +- download pretrained parameter + + FlowNet2 [620MB](https://drive.google.com/file/d/1hF8vS6YeHkx3j2pfCeQqqZGwA_PJq_Da/view?usp=sharing) + + FlowNet2-C [149MB](https://drive.google.com/file/d/1BFT6b7KgKJC8rA59RmOVAXRM_S7aSfKE/view?usp=sharing) + + FlowNet2-CS [297MB](https://drive.google.com/file/d/1iBJ1_o7PloaINpa8m7u_7TsLCX0Dt_jS/view?usp=sharing) + + FlowNet2-CSS [445MB](https://drive.google.com/file/d/157zuzVf4YMN6ABAQgZc8rRmR5cgWzSu8/view?usp=sharing) + + FlowNet2-CSS-ft-sd [445MB](https://drive.google.com/file/d/1R5xafCIzJCXc8ia4TGfC65irmTNiMg6u/view?usp=sharing) + + FlowNet2-S [148MB](https://drive.google.com/file/d/1V61dZjFomwlynwlYklJHC-TLfdFom3Lg/view?usp=sharing) + + FlowNet2-SD [173MB](https://drive.google.com/file/d/1QW03eyYG_vD-dT-Mx4wopYvtPu_msTKn/view?usp=sharing) + +- convert pretrained parameter (from pytorch pretrained parameter to mindspore pretained parameter,so the env should both installed torch and mindspore) + convert pytorch pretrained parameter to mindspore pretrained parameter + the pytorch pretrained parameter are supposed to be downloaded by above link + + ```text + bash scripts/run_ckpt_convert.sh [PYTORCH_FILE_PATH] [MINDSPORE_FILE_PATH] + # example: + bash scripts/run_ckpt_convert.sh /path/to/FlowNet2_checkpoint.pth.tar /path/to/flownet2.ckpt + ``` + +- compile custom operation Correlation and Resample2d + after execution,you can check the whether generate correlation.so and resample2d.so under path src/submodels/custom_ops/ + + ```text + bash scripts/run_compile_custom_ops.sh + ``` + +- 
config pretrained parameter path in yaml file + + ```text + pre_trained: # whether use pretrained parameter file 1 or 0 + pre_trained_ckpt_path: # pretrained checkpoint file path + # 实例: + pre_trained: 1 + pre_trained_ckpt_path: /path/checkpoint/flownet2.ckpt + ``` + +- config dataset name and path in yaml file + + ```text + train_data: [DATASET_NAME] # Name of dataset, 'FlyingChairs' or 'MpiSintelFinal' or 'MpiSintelClean' + train_data_path:[DATASET_PATH] # path of dataset + # example: + train_data: FlyingChairs + train_data_path: /path/to/FlyingChairs_release/data + ``` + +- running on GPU + + For running on GPU, please change `device_target` from `Ascend` to `GPU` in configuration file default_config.yaml + + ```python + # run training example + export CUDA_VISIBLE_DEVICES=0 + python train.py > train.log 2>&1 & + + # run distributed training example + bash scripts/run_train_gpu.sh 8 0,1,2,3,4,5,6,7 + + # run evaluation example + python eval.py --eval_checkpoint_path=[EVAL_CHECKPOINT_PATH] > eval.log 2>&1 & + OR + bash scripts/run_eval_gpu.sh [MpiSintelClean/MpiSintelFinal] [DATA_PATH] [MODEL_NAME] [CKPT_PATH] [DEVICE_ID] + ``` + +We use FlyingChairs dataset by default. Your can also pass `$dataset_type` to the scripts so that select different datasets. For more details, please refer the specify script. + +# [Script Description](#contents) + +## [Script and Sample Code](#contents) + +```text +├── model_zoo + ├── README.md // descriptions about all the models + ├── flownet2 + ├── README.md // descriptions about flownet2 + ├── scripts + │ ├── run_ckpt_convert.sh // shell script for converting pytorch ckpt file to pickle file on GPU + │ ├── run_compile_custom_ops.sh // shell script for compile ops + │ ├── run_eval_gpu.sh // shell script for eval on GPU + │ └── run_train_gpu.sh // shell script for training on GPU + ├── src + │ ├── dataset.py // creating dataset + │ ├── eval_callback.py // eval callback when training + │ ├── metric.py // metric to calculate mean error + │ ├── model_utils + │ │ ├── ckpt_convert.py // convert pytorch ckpt file to pickle file + │ │ ├── config.py // parameter configuration + │ │ ├── device_adapter.py // device adapter + │ │ ├── local_adapter.py // local adapter + │ │ ├── moxing_adapter.py // moxing adapter + │ │ ├── frame_utils.py // utils to read files of dataset + │ │ └── tools.py // tools to match class with paratmeter from config + │ ├── models.py // FlowNet2/FlowNet2CSS/FlowNet2CS/FlowNet2C/FlowNet2S/FlowNet2SD model + │ └── submodels + │ ├── custom_ops + │ │ ├── correlation.cu // cuda file for operation correlation + │ │ ├── resample2d.cu // cuda file for operation resample2d + │ │ └── custom_ops.py // definition of correlation and resample2d + │ ├── FlowNetC.py // FlowNetC model + │ ├── FlowNetFusion.py // FlowNetFusion model + │ ├── FlowNetS.py // FlowNetS model + │ ├── FlowNetSD.py // FlowNetSD model + │ └── submodules.py // submodules used in flownet model + ├── default_config.yaml // parameter configuration + ├── requirements.txt // requirements configuration + ├── eval.py // evaluation script + └── train.py // training script +``` + +## [Script Parameters](#contents) + +Parameters for both training and evaluation can be set in config.py + +- config for FLowNet2 + + ```text + # ============================================================================== + # Device + device_target: "GPU" + device_id: 0 + + # Dataset Setup + crop_type: Random # Type of cropping operation (Random and Center) + crop_size: [384, 512] # (Height, Width) of image when 
training + eval_size: [256, 256] # (Height, Width) of image when eval + + # Experiment Setup + model: "FlowNet2" # Name of model to be loaded + rgb_max: 255 # rgb channel used + batchNorm: False # boolean switch to whether add batchnorm before conv + lr: 1e-6 # Learning rate + num_parallel_workers: 2 # Number of CPU worker used to load data + max_rowsize: 2 # Number of max rowsize used to load data + batch_size: 2 # Numbers of image pairs in a mini-batch + epoch_size: 20 # Total number of epochs + pre_trained: 1 # Load pretrained network + pre_trained_ckpt_path: "/path/flownet2.ckpt" # Pretrained ckpt path + seed: 1 # Seed for reproducibility + is_dynamicLoss_scale: 0 # Using dynamicLoss scale or fix scale + scale: 1024 # Fix scale value + weight_decay: 0.00001 # Weight decay + train_data: "FlyingChairs" # Train Dataset name + train_data_path: "/path/ds/FlyingChairs_release/data" # Train Dataset path + + # Train Setup + run_distribute: 1 # Distributed training or not + is_save_on_master: 1 # Only save ckpt on master device + save_checkpoint: 1 # Is save ckpt while training + save_ckpt_interval: 1 # Saving ckpt interval + keep_checkpoint_max: 5 # Max ckpt file number + save_checkpoint_path: "/path/ckpt/" # Ckpt save path + + # eval Setup + eval_data: "MpiSintelClean" # Eval Dataset name + eval_data_path: "/home/shm/ds/training" # Eval Dataset path + eval_checkpoint_path: "/path/flownet2.ckpt" # Ckpt path used to eval + run_evalCallback: 1 # Is run evalCallBack while training + eval_start_epoch: 1 # EvalCallback start epoch + eval_interval: 5 # EvalCallback running interval + save_best_ckpt: 1 # Is save best ckpt + + # Export Setup + mindir_file_name: "Flownet2" # Save file path + file_format: "MINDIR" # Save file format + + # Modelarts Setup + enable_modelarts: 0 # Is training on modelarts + ``` + +For more configuration details, please refer the script `config.py`. + +## [Training Process](#contents) + +### Training + +- running on GPU + + ```python + export CUDA_VISIBLE_DEVICES=0 + python train.py > train.log 2>&1 & + ``` + + ```bash + bash scripts/run_train_gpu.sh 1 0 + ``` + + The python command above will run in the background, you can view the results through the file `train.log`. + + After training, you'll get some checkpoint files under the folder `${save_checkpoint_path}/ckpt_0/` by default. + +- train.log for flyingchairs + +```text +epoch: 1 step: 2859, loss is 1.0592992305755615 +epoch time: 2454542.145 ms, per step time: 858.532 ms +epoch: 2 step: 2859, loss is 1.074428915977478 +epoch time: 2416319.469 ms, per step time: 845.162 ms +epoch: 3 step: 2859, loss is 0.6141664981842041 +epoch time: 2412936.084 ms, per step time: 843.979 ms +``` + +- train.log for MpiSintel + +```text +epoch: 1 step: 131, loss is 0.3894098699092865 +epoch time: 114087.253 ms, per step time: 870.895 ms +epoch: 2 step: 131, loss is 1.822862982749939 +epoch time: 93423.045 ms, per step time: 713.153 ms +epoch: 3 step: 131, loss is 0.06125941127538681 +epoch time: 93837.971 ms, per step time: 716.320 ms +``` + +### Distributed Training + +- running on GPU + + ```bash + bash scripts/run_train_gpu.sh 8 0,1,2,3,4,5,6,7 + ``` + + The above shell script will run distribute training in the background. You can view the results through the file `train.log`. 
+ +- train.log for flyingchairs + +```text +epoch: 1 step: 358, loss is 1.1717915534973145 +epoch: 1 step: 358, loss is 0.6347103118896484 +epoch: 1 step: 358, loss is 1.4680955410003662 +epoch: 1 step: 358, loss is 1.7656424045562744 +epoch: 1 step: 358, loss is 1.1760812997817993 +epoch: 1 step: 358, loss is 0.8203185200691223 +epoch: 1 step: 358, loss is 2.2942874431610107 +epoch: 1 step: 358, loss is 1.3205347061157227 +epoch time: 858929.203 ms, per step time: 2399.244 ms +epoch time: 859414.930 ms, per step time: 2400.600 ms +epoch time: 859515.190 ms, per step time: 2400.880 ms +epoch time: 859614.460 ms, per step time: 2401.158 ms +epoch time: 859695.493 ms, per step time: 2401.384 ms +epoch time: 859799.146 ms, per step time: 2401.674 ms +epoch time: 859995.238 ms, per step time: 2402.221 ms +epoch time: 860035.718 ms, per step time: 2402.334 ms +``` + +## [Evaluation Process](#contents) + +### Evaluation + +- evaluation on MpiSintelClean dataset when running on GPU + + Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "path/flownet2/ckpt/flownet2-125_390.ckpt". + + ```python + python eval.py --eval_data=[DATASET_NAME] --eval_data_path=[DATASET_PATH]/ + --model=[MODEL_NAME] --eval_checkpoint_path=[CHECKPOINT_PATH] > eval.log 2>&1 & + ``` + + The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows: + + ```bash + # grep "mean error: " eval.log + flownet2 mean error: {'flownetEPE': 2.112366} + ``` + + OR, + + ```bash + bash scripts/run_eval_gpu.sh [MpiSintelClean/MpiSintelFinal] [DATA_PATH] [MODEL_NAME] [CKPT_PATH] [DEVICE_ID] + ``` + + The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of the test dataset will be as follows: + + ```text + # grep "mean error: " eval.log + flownet2 mean error: {'flownetEPE': 2.112366} + ``` + +# [Model Description](#contents) + +## [Performance](#contents) + +### Training Performance + +#### FlowNet2 train on FlyingChairs + +| Parameters | GPU | +|----------------------------|---------------------------------------------------------------------------------------------------| +| Model Version | Inception V1 | +| Resource | NV SMX2 V100-32G | +| uploaded Date | 04/05/2021 (month/day/year) | +| MindSpore Version | 1.7.0 | +| Dataset | FlyingChairs | +| Training Parameters | epoch=50, steps=2800, batch_size=8, lr=1e-6 | +| Optimizer | Adam | +| Loss Function | L1loss | +| outputs | flow | | +| Speed | 1pc: 152 ms/step; 8pcs: 171 ms/step | +| Total time | 8pcs: 8.8 hours | +| Parameters | 162,518,834 | +| Checkpoint for Fine tuning | 260M (.ckpt file) | +| Scripts | [flownet2 script](https://gitee.com/mindspore/models/tree/master/research/cv/flownet2) | + +### Inference Performance + +#### FlowNet2 infer on MpiSintelClean + +| Parameters | GPU | +|-------------------|-----------------------------| +| Model Version | Inception V1 | +| Resource | NV SMX2 V100-32G | +| Uploaded Date | 04/05/2022 (month/day/year) | +| MindSpore Version | 1.7.0 | +| Dataset | MpiSintelClean | +| batch_size | 8 | +| outputs | flow | +| Mean Error | 2.10 | + +# [ModelZoo Homepage](#contents) + + Please check the official [homepage](https://gitee.com/mindspore/models). 
diff --git a/research/cv/flownet2/default_config.yaml b/research/cv/flownet2/default_config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6a21bdd7d65681c0b2e0bb4ba31afe84272e9c8a --- /dev/null +++ b/research/cv/flownet2/default_config.yaml @@ -0,0 +1,48 @@ +# ============================================================================== +# Device +device_target: "GPU" +device_id: 0 + +# Dataset Setup +crop_type: Random # Type of cropping operation (Random and Center) when training +crop_size: [384, 512] # (Height, Width) of image when training +eval_size: [256, 256] # (Height, Width) of image when eval + +# Experiment Setup +model: "FlowNet2" # Name of model to be loaded +rgb_max: 255 # rgb channel used +batchNorm: False # boolean switch to whether add batchnorm before conv +lr: 0.0000001 # Learning rate +num_parallel_workers: 2 # Number of CPU worker used to load data +max_rowsize: 2 # Number of max rowsize used to load data +batch_size: 8 # Numbers of image pairs in a mini-batch +epoch_size: 20 # Total number of epochs +pre_trained: 1 # Load pretrained network +pre_trained_ckpt_path: "/path/flownet2.ckpt" # Pretrained ckpt path +seed: 1 # Seed for reproducibility +is_dynamicLoss_scale: 0 # Using dynamicLoss scale or fix scale +scale: 1024 # Fix scale value +weight_decay: 0.00001 # Weight decay +train_data: "FlyingChairs" # Train Dataset name +train_data_path: "/path/ds/FlyingChairs_release/data" # Train Dataset path + +# Train Setup +run_distribute: 0 # Distributed training or not +is_save_on_master: 1 # Only save ckpt on master device +save_checkpoint: 1 # Is save ckpt while training +save_ckpt_interval: 1 # Saving ckpt interval +keep_checkpoint_max: 5 # Max ckpt file number +save_checkpoint_path: "/path/ckpt/" # Ckpt save path + +# eval Setup +eval_data: "MpiSintelClean" # Eval Dataset name +eval_data_path: "/path/ds/training" # Eval Dataset path +eval_checkpoint_path: "/path/flownet2.ckpt" # Ckpt path used to eval +run_evalCallback: 1 # Is run evalCallBack while training +eval_start_epoch: 1 # EvalCallback start epoch +eval_interval: 1 # EvalCallback running interval +save_best_ckpt: 1 # Is save best ckpt + +# Export Setup +mindir_file_name: "Flownet2" # Save file path +file_format: "MINDIR" # Save file format diff --git a/research/cv/flownet2/eval.py b/research/cv/flownet2/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..0034d6a24effeac63647cbcee346a3f79bbeb363 --- /dev/null +++ b/research/cv/flownet2/eval.py @@ -0,0 +1,64 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import mindspore.nn as nn +import mindspore.dataset as ds +from mindspore import context +from mindspore.common import set_seed +from mindspore.context import ParallelMode +from mindspore.train.model import Model +from mindspore.train.serialization import load_checkpoint, load_param_into_net + + +import src.dataset as datasets +import src.models as models +from src.metric import FlowNetEPE +import src.model_utils.tools as tools +from src.model_utils.config import config + +def run_eval(): + set_seed(config.seed) + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target, save_graphs=False) + context.set_auto_parallel_context(parallel_mode=ParallelMode.STAND_ALONE, gradients_mean=True, device_num=1) + ds.config.set_enable_shared_mem(False) + # load dataset by config param + config.eval_dataset_class = tools.module_to_dict(datasets)[config.eval_data] + flownet_eval_gen = config.eval_dataset_class("Center", config.crop_size, config.eval_size, + config.eval_data_path) + eval_dataset = ds.GeneratorDataset(flownet_eval_gen, ["images", "flow"] + , num_parallel_workers=config.num_parallel_workers, + max_rowsize=config.max_rowsize) + eval_dataset = eval_dataset.batch(config.batch_size) + + # load model by config param + config.model_class = tools.module_to_dict(models)[config.model] + net = config.model_class(config.rgb_max, config.batchNorm) + + loss = nn.L1Loss() + + param_dict = load_checkpoint(config.eval_checkpoint_path) + print("load checkpoint from [{}].".format(config.eval_checkpoint_path)) + load_param_into_net(net, param_dict) + net.set_train(False) + + model = Model(net, loss_fn=loss, metrics={'flownetEPE': FlowNetEPE()}) + + mean_error = model.eval(eval_dataset, dataset_sink_mode=False) + + print("flownet2 mean error: ", mean_error) + + +if __name__ == '__main__': + run_eval() diff --git a/research/cv/flownet2/requirements.txt b/research/cv/flownet2/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..98b973b4425dc52838a4dfb5f6a8ab6464f5dc59 --- /dev/null +++ b/research/cv/flownet2/requirements.txt @@ -0,0 +1,8 @@ +Pillow +imageio +matplotlib +decorator +numpy +pytz +PyYAML +mindspore_gpu>=1.7.0 \ No newline at end of file diff --git a/research/cv/flownet2/scripts/run_ckpt_convert.sh b/research/cv/flownet2/scripts/run_ckpt_convert.sh new file mode 100644 index 0000000000000000000000000000000000000000..67f8f40f4194fc59c5e3ea0746d2f847a992cf4c --- /dev/null +++ b/research/cv/flownet2/scripts/run_ckpt_convert.sh @@ -0,0 +1,43 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# -lt 1 ] || [ $# -gt 2 ]; then + echo "Usage: bash run_ckpt_convert.sh [PYTORCH_FILE_PATH] [MINDSPORE_FILE_PATH] + PYTORCH_FILE_PATH is pytorch pretrained model ckpt file path. 
+ MINDSPORE_FILE_PATH is mindspore pretrained model ckpt file path." +exit 1 +fi + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} + +torch_file_path=$(get_real_path $1) + +if [ ! -f ${torch_file_path} ]; then + echo "Pytorch pretrained model ckpt file path does not exist." +exit 1 +fi + +mindspore_file_path=$(get_real_path $2) + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) + +python3 ${BASEPATH}/../src/model_utils/ckpt_convert.py ${torch_file_path} ${mindspore_file_path} diff --git a/research/cv/flownet2/scripts/run_compile_custom_ops.sh b/research/cv/flownet2/scripts/run_compile_custom_ops.sh new file mode 100644 index 0000000000000000000000000000000000000000..618255a887b81f018970218863aa3061962940ea --- /dev/null +++ b/research/cv/flownet2/scripts/run_compile_custom_ops.sh @@ -0,0 +1,22 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) + +CUSTOM_OP_PATH="${BASEPATH}/../src/submodels/custom_ops" + +nvcc --shared -Xcompiler -fPIC -o "${CUSTOM_OP_PATH}/correlation.so" "${CUSTOM_OP_PATH}/correlation.cu" +nvcc --shared -Xcompiler -fPIC -o "${CUSTOM_OP_PATH}/resample2d.so" "${CUSTOM_OP_PATH}/resample2d.cu" diff --git a/research/cv/flownet2/scripts/run_eval_gpu.sh b/research/cv/flownet2/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..fc4ef3681028b7a9063ef2fddff3b1620832ef06 --- /dev/null +++ b/research/cv/flownet2/scripts/run_eval_gpu.sh @@ -0,0 +1,42 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +if [ $# -lt 4 ] +then + echo "Usage: \ + bash run_eval_gpu.sh [MpiSintelClean/MpiSintelFinal] [DATA_PATH] [MODEL_NAME] [CKPT_PATH] [DEVICE_ID]\ + " +exit 1 +fi + + +export DATA_NAME=$1 +export DATA_PATH=$2 +export MODEL_NAME=$3 +export CKPT_PATH=$4 +export DEVICE_ID=$5 + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) + +ulimit -u unlimited + +CONFIG_PATH="${BASEPATH}/../default_config.yaml" +echo "config path is : ${CONFIG_PATH}" + + +python3 eval.py --config_path=$CONFIG_PATH --eval_data=$DATA_NAME \ + --eval_data_path=$DATA_PATH --model=$MODEL_NAME --eval_checkpoint_path=$CKPT_PATH \ + --device_id=$DEVICE_ID --device_target="GPU" > eval.log 2>&1 & diff --git a/research/cv/flownet2/scripts/run_train_gpu.sh b/research/cv/flownet2/scripts/run_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..e3c64c12ae81907a9bcfd79a3fd32049fe858011 --- /dev/null +++ b/research/cv/flownet2/scripts/run_train_gpu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# -lt 2 ] +then + echo "Usage: \ + bash run_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)]\ + " +exit 1 +fi + +if [ $1 -lt 1 ] && [ $1 -gt 8 ] +then + echo "error: DEVICE_NUM=$1 is not in (1-8)" +exit 1 +fi + +export DEVICE_NUM=$1 +export RANK_SIZE=$1 + +BASEPATH=$(cd "`dirname $0`" || exit; pwd) + +ulimit -u unlimited +export CUDA_VISIBLE_DEVICES="$2" + +CONFIG_PATH="${BASEPATH}/../default_config.yaml" +echo "config path is : ${CONFIG_PATH}" + + +if [ $1 -gt 1 ] +then + mpirun -n $1 --allow-run-as-root --output-filename log_output --merge-stderr-to-stdout \ + python3 train.py --config_path=$CONFIG_PATH --run_distribute=1 > train.log 2>&1 & +else + python3 train.py --config_path=$CONFIG_PATH --run_distribute=0 > train.log 2>&1 & +fi diff --git a/research/cv/flownet2/src/dataset.py b/research/cv/flownet2/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..c1326702b2983947ca5e41138d2bf89581ea0293 --- /dev/null +++ b/research/cv/flownet2/src/dataset.py @@ -0,0 +1,302 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import random +import math +from os.path import join +from os.path import isfile +from glob import glob +import numpy as np +from src.model_utils.frame_utils import read_gen + + +class StaticRandomCrop(): + def __init__(self, image_size, crop_size): + self.th, self.tw = crop_size + h, w = image_size + self.h1 = random.randint(0, h - self.th) + self.w1 = random.randint(0, w - self.tw) + + def __call__(self, img): + return img[self.h1:(self.h1 + self.th), self.w1:(self.w1 + self.tw), :] + + +class StaticCenterCrop(): + def __init__(self, image_size, crop_size): + self.th, self.tw = crop_size + self.h, self.w = image_size + + def __call__(self, img): + return img[(self.h - self.th) // 2:(self.h + self.th) // 2, (self.w - self.tw) // 2:(self.w + self.tw) // 2, :] + + +class DistributedSampler(): + """ + Distributed sampler + """ + + def __init__(self, dataset, rank, group_size, shuffle=True, seed=0): + self.dataset = dataset + self.rank = rank + self.group_size = group_size + self.dataset_length = len(self.dataset) + self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size)) + self.total_size = self.num_samples * self.group_size + self.shuffle = shuffle + self.seed = seed + + def __iter__(self): + if self.shuffle: + self.seed = (self.seed + 1) & 0xffffffff + np.random.seed(self.seed) + indices = np.random.permutation(self.dataset_length).tolist() + else: + indices = list(range(len(self.dataset_length))) + indices += indices[:(self.total_size - len(indices))] + assert len(indices) == self.total_size + indices = indices[self.rank::self.group_size] + assert len(indices) == self.num_samples + return iter(indices) + + def __len__(self): + return self.num_samples + + +class MpiSintel: + def __init__(self, crop_type='Random', crop_size=None, eval_size=None, + root='', dstype='clean', replicates=1): + self.crop_type = crop_type + if crop_size is None: + crop_size = [384, 512] + self.crop_size = crop_size + if eval_size is None: + eval_size = [256, 256] + self.render_size = eval_size + self.replicates = replicates + + flow_root = join(root, 'flow') + image_root = join(root, dstype) + + file_list = sorted(glob(join(flow_root, '*/*.flo'))) + + self.flow_list = [] + self.image_list = [] + + for file in file_list: + if 'test' in file: + # print file + continue + + fbase = file[len(flow_root) + 1:] + fprefix = fbase[:-8] + fnum = int(fbase[-8:-4]) + + img1 = join(image_root, fprefix + "%04d" % (fnum + 0) + '.png') + img2 = join(image_root, fprefix + "%04d" % (fnum + 1) + '.png') + + if not isfile(img1) or not isfile(img2) or not isfile(file): + continue + + self.image_list += [[img1, img2]] + self.flow_list += [file] + + self.size = len(self.image_list) + + self.frame_size = read_gen(self.image_list[0][0]).shape + + if (self.render_size[0] < 0) or (self.render_size[1] < 0) or (self.frame_size[0] % 64) or ( + self.frame_size[1] % 64): + self.render_size[0] = ((self.frame_size[0]) // 64) * 64 + self.render_size[1] = ((self.frame_size[1]) // 64) * 64 + + # args.eval_size = self.render_size + + assert len(self.image_list) == len(self.flow_list) + + def __getitem__(self, index): + + index = index % self.size + + img1 = read_gen(self.image_list[index][0]) + img2 = read_gen(self.image_list[index][1]) + + flow = read_gen(self.flow_list[index]) + + images = [img1, img2] + image_size = img1.shape[:2] + + if self.crop_type == 'Random': + cropper = StaticRandomCrop(image_size, self.crop_size) + elif self.crop_type == 
'Center': + cropper = StaticCenterCrop(image_size, self.render_size) + images = list(map(cropper, images)) + flow = cropper(flow) + + images = np.array(images).transpose(3, 0, 1, 2) + flow = flow.transpose(2, 0, 1) + + images = images.astype(np.float32) + flow = flow.astype(np.float32) + + return images, flow + + def __len__(self): + return self.size * self.replicates + + +class MpiSintelClean(MpiSintel): + def __init__(self, crop_type, crop_size, eval_size, root, replicates=1): + super(MpiSintelClean, self).__init__(crop_type=crop_type, crop_size=crop_size, eval_size=eval_size, + root=root, dstype='clean', replicates=replicates) + + +class MpiSintelFinal(MpiSintel): + def __init__(self, crop_type, crop_size, eval_size, root, replicates=1): + super(MpiSintelFinal, self).__init__(crop_type=crop_type, crop_size=crop_size, eval_size=eval_size, + root=root, dstype='final', replicates=replicates) + + +# definite a DatasetGenerator +class ChairsSDHom: + def __init__(self, crop_type, crop_size, eval_size, root='/path/to/chairssdhom/data', dstype='train', replicates=1): + self.crop_type = crop_type + self.crop_size = crop_size + self.render_size = eval_size + self.replicates = replicates + + image1 = sorted(glob(join(root, dstype, 't0/*.png'))) + image2 = sorted(glob(join(root, dstype, 't1/*.png'))) + self.flow_list = sorted(glob(join(root, dstype, 'flow/*.pfm'))) + + assert len(image1) == len(self.flow_list) + + self.image_list = [] + for i in range(len(self.flow_list)): + im1 = image1[i] + im2 = image2[i] + self.image_list += [[im1, im2]] + + assert len(self.image_list) == len(self.flow_list) + + self.size = len(self.image_list) + + self.frame_size = read_gen(self.image_list[0][0]).shape + + if (self.render_size[0] < 0) or (self.render_size[1] < 0) or (self.frame_size[0] % 64) or ( + self.frame_size[1] % 64): + self.render_size[0] = ((self.frame_size[0]) // 64) * 64 + self.render_size[1] = ((self.frame_size[1]) // 64) * 64 + + # args.eval_size = self.render_size + + def __getitem__(self, index): + index = index % self.size + + img1 = read_gen(self.image_list[index][0]) + img2 = read_gen(self.image_list[index][1]) + + flow = read_gen(self.flow_list[index]) + flow = flow[::-1, :, :] + + images = [img1, img2] + image_size = img1.shape[:2] + if self.crop_type == 'Random': + cropper = StaticRandomCrop(image_size, self.crop_size) + elif self.crop_type == 'Center': + cropper = StaticCenterCrop(image_size, self.render_size) + images = list(map(cropper, images)) + flow = cropper(flow) + + images = np.array(images).transpose(3, 0, 1, 2) + flow = flow.transpose(2, 0, 1) + + images = images.astype(np.float32) + flow = flow.astype(np.float32) + return images, flow + + def __len__(self): + return self.size * self.replicates + + +class ChairsSDHomTrain(ChairsSDHom): + def __init__(self, crop_type, crop_size, eval_size, root='', replicates=1): + super(ChairsSDHomTrain, self).__init__(crop_type=crop_type, crop_size=crop_size, eval_size=eval_size, + root=root, dstype='train', replicates=replicates) + + +class ChairsSDHomTest(ChairsSDHom): + def __init__(self, crop_type, crop_size, eval_size, root='', replicates=1): + super(ChairsSDHomTest, self).__init__(crop_type=crop_type, crop_size=crop_size, eval_size=eval_size, root=root, + dstype='test', replicates=replicates) + + +class FlyingChairs: + def __init__(self, crop_type, crop_size, eval_size, root='/path/to/FlyingChairs_release/data', replicates=1): + self.crop_type = crop_type + self.crop_size = crop_size + self.render_size = eval_size + self.replicates = 
replicates + + images = sorted(glob(join(root, '*.ppm'))) + + self.flow_list = sorted(glob(join(root, '*.flo'))) + + assert len(images) // 2 == len(self.flow_list) + + self.image_list = [] + for i in range(len(self.flow_list)): + im1 = images[2 * i] + im2 = images[2 * i + 1] + self.image_list += [[im1, im2]] + + assert len(self.image_list) == len(self.flow_list) + + self.size = len(self.image_list) + + self.frame_size = read_gen(self.image_list[0][0]).shape + + if (self.render_size[0] < 0) or (self.render_size[1] < 0) or (self.frame_size[0] % 64) or ( + self.frame_size[1] % 64): + self.render_size[0] = ((self.frame_size[0]) // 64) * 64 + self.render_size[1] = ((self.frame_size[1]) // 64) * 64 + + # args.eval_size = self.render_size + + def __getitem__(self, index): + index = index % self.size + + img1 = read_gen(self.image_list[index][0]) + img2 = read_gen(self.image_list[index][1]) + + flow = read_gen(self.flow_list[index]) + + images = [img1, img2] + image_size = img1.shape[:2] + if self.crop_type == 'Random': + cropper = StaticRandomCrop(image_size, self.crop_size) + elif self.crop_type == 'Center': + cropper = StaticCenterCrop(image_size, self.render_size) + images = list(map(cropper, images)) + flow = cropper(flow) + + images = np.array(images).transpose(3, 0, 1, 2) + flow = flow.transpose(2, 0, 1) + + images = images.astype(np.float32) + flow = flow.astype(np.float32) + return images, flow + + def __len__(self): + return self.size * self.replicates diff --git a/research/cv/flownet2/src/eval_callback.py b/research/cv/flownet2/src/eval_callback.py new file mode 100644 index 0000000000000000000000000000000000000000..9002a9341a64e3ee78317991298ca7140dd39540 --- /dev/null +++ b/research/cv/flownet2/src/eval_callback.py @@ -0,0 +1,93 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation callback when training""" +import os +import stat +import time +from mindspore import save_checkpoint +from mindspore import log as logger +from mindspore.train.callback import Callback + +class EvalCallBack(Callback): + """ + Evaluation callback when training. + + Args: + eval_function (function): evaluation function. + eval_param_dict (dict): evaluation parameters' configure dict. + interval (int): run evaluation interval, default is 1. + eval_start_epoch (int): evaluation start epoch, default is 1. + save_best_ckpt (bool): Whether to save best checkpoint, default is True. + besk_ckpt_name (str): bast checkpoint name, default is `best.ckpt`. + metrics_name (str): evaluation metrics name, default is `acc`. 
+ + Returns: + None + + Examples: + >>> EvalCallBack(eval_function, eval_param_dict) + """ + + def __init__(self, eval_function, eval_param_dict, interval=1, eval_start_epoch=1, save_best_ckpt=True, + ckpt_directory="./", besk_ckpt_name="best.ckpt", metrics_name="MeanError"): + super(EvalCallBack, self).__init__() + self.eval_param_dict = eval_param_dict + self.eval_function = eval_function + self.eval_start_epoch = eval_start_epoch + if interval < 1: + raise ValueError("interval should >= 1.") + self.interval = interval + self.save_best_ckpt = save_best_ckpt + self.best_res = 10 + self.best_epoch = 0 + if not os.path.isdir(ckpt_directory): + os.makedirs(ckpt_directory) + self.bast_ckpt_path = os.path.join(ckpt_directory, besk_ckpt_name) + self.metrics_name = metrics_name + + def remove_ckpoint_file(self, file_name): + """Remove the specified checkpoint file from this checkpoint manager and also from the directory.""" + try: + os.chmod(file_name, stat.S_IWRITE) + os.remove(file_name) + except OSError: + logger.warning("OSError, failed to remove the older ckpt file %s.", file_name) + except ValueError: + logger.warning("ValueError, failed to remove the older ckpt file %s.", file_name) + + def epoch_end(self, run_context): + """Callback when epoch end.""" + cb_params = run_context.original_args() + cur_epoch = cb_params.cur_epoch_num + if cur_epoch >= self.eval_start_epoch and (cur_epoch - self.eval_start_epoch) % self.interval == 0: + eval_start = time.time() + res = self.eval_function(self.eval_param_dict) + eval_cost = time.time() - eval_start + print("epoch: {}, {}: {}, eval_cost:{:.2f}".format(cur_epoch, self.metrics_name, res, eval_cost), + flush=True) + if res <= self.best_res: + self.best_res = res + self.best_epoch = cur_epoch + print("update best result: {}".format(res), flush=True) + if self.save_best_ckpt: + if os.path.exists(self.bast_ckpt_path): + self.remove_ckpoint_file(self.bast_ckpt_path) + save_checkpoint(cb_params.train_network, self.bast_ckpt_path) + print("update best checkpoint at: {}".format(self.bast_ckpt_path), flush=True) + + def end(self, run_context): + print("End training, the best {0} is: {1}, the best {0} epoch is {2}".format(self.metrics_name, + self.best_res, + self.best_epoch), flush=True) diff --git a/research/cv/flownet2/src/metric.py b/research/cv/flownet2/src/metric.py new file mode 100644 index 0000000000000000000000000000000000000000..2e71ead2172bfa6fe9237bad87f3975f00e7902f --- /dev/null +++ b/research/cv/flownet2/src/metric.py @@ -0,0 +1,44 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +import mindspore as ms +import mindspore.ops as ops +import mindspore.nn as nn +from mindspore.nn import rearrange_inputs +import numpy as np + +class FlowNetEPE(nn.Metric): + def __init__(self): + super(FlowNetEPE, self).__init__() + self.norm_op = nn.Norm(axis=1) + self.mean = ops.ReduceMean() + + def clear(self): + self._abs_error_sum = [] + self._samples_num = 0 + + @rearrange_inputs + def update(self, *inputs): + if len(inputs) != 2: + raise ValueError('The MAE needs 2 inputs (y_pred, y), but got {}'.format(len(inputs))) + y_pred = self._convert_data(inputs[0]) + y = self._convert_data(inputs[1]) + abs_error_sum = self.mean(self.norm_op(ms.Tensor(y) - ms.Tensor(y_pred))) + self._abs_error_sum.append(abs_error_sum.asnumpy().sum()) + self._samples_num += y.shape[0] + + def eval(self): + if self._samples_num == 0: + raise RuntimeError('The total number of samples must not be 0.') + return np.array(self._abs_error_sum).mean() diff --git a/research/cv/flownet2/src/model_utils/ckpt_convert.py b/research/cv/flownet2/src/model_utils/ckpt_convert.py new file mode 100644 index 0000000000000000000000000000000000000000..f8fb9f7689dbc6f42abf2c4af8a16780f33d52b0 --- /dev/null +++ b/research/cv/flownet2/src/model_utils/ckpt_convert.py @@ -0,0 +1,40 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import sys +from mindspore import Tensor +from mindspore.train.serialization import save_checkpoint +import torch + +def torch_to_mindspore(torch_file_path, mindspore_file_path): + ckpt = torch.load(torch_file_path, map_location=torch.device('cpu')) + mindspore_params_list = [] + par_dict = ckpt['state_dict'] + for name in par_dict: + print(name) + param_dict = {} + parameter = par_dict[name] + print(parameter.size()) + param_dict['name'] = name + param_dict['data'] = Tensor(parameter.numpy()) + mindspore_params_list.append(param_dict) + save_checkpoint(mindspore_params_list, mindspore_file_path) + print('convert pytorch ckpt file to mindspore ckpt file success !') + + +if __name__ == '__main__': + torch_ckpt_file_path = sys.argv[1] + mindspore_ckpt_file_path = sys.argv[2] + torch_to_mindspore(torch_ckpt_file_path, mindspore_ckpt_file_path) diff --git a/research/cv/flownet2/src/model_utils/config.py b/research/cv/flownet2/src/model_utils/config.py new file mode 100644 index 0000000000000000000000000000000000000000..79cf623c9348100fb931658a785ea0408193e55d --- /dev/null +++ b/research/cv/flownet2/src/model_utils/config.py @@ -0,0 +1,127 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Parse arguments""" + +import os +import ast +import argparse +from pprint import pprint, pformat +import yaml + +class Config: + """ + Configuration namespace. Convert dictionary to members. + """ + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser: Parent parser. + cfg: Base configuration. + helper: Helper description. + cfg_path: Path to the default yaml config. + """ + parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]", + parents=[parser]) + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path) + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + + Args: + yaml_path: Path to the yaml config. + """ + with open(yaml_path, 'r') as fin: + try: + cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [x for x in cfgs] + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml") + print(cfg_helper) + except: + raise ValueError("Failed to parse yaml") + return cfg, cfg_helper, cfg_choices + + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args: Command line arguments. + cfg: Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + return cfg + + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. 
+ """ + parser = argparse.ArgumentParser(description="default name", add_help=False) + current_dir = os.path.dirname(os.path.abspath(__file__)) + parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../../default_config.yaml"), + help="Config file path") + path_args, _ = parser.parse_known_args() + default, helper, choices = parse_yaml(path_args.config_path) + pprint(default) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path) + final_config = merge(args, default) + return Config(final_config) + +config = get_config() diff --git a/research/cv/flownet2/src/model_utils/frame_utils.py b/research/cv/flownet2/src/model_utils/frame_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..33c26443a2ea209c33c297b68d31088eda1c6885 --- /dev/null +++ b/research/cv/flownet2/src/model_utils/frame_utils.py @@ -0,0 +1,76 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import re +from os.path import splitext +import numpy as np +from imageio import imread + +def read_gen(file_name): + ext = splitext(file_name)[-1] + if ext in ('.png', '.jpeg', '.ppm', '.jpg'): + im = imread(file_name) + if im.shape[2] > 3: + return im[:, :, :3] + return im + if ext in ('.bin', '.raw'): + return np.load(file_name) + if ext == '.flo': + return readFlow(file_name).astype(np.float32) + if ext == '.pfm': + return readPFM(file_name).astype(np.float32) + return [] + +def readFlow(fn): + """ Read .flo file in Middlebury format""" + with open(fn, 'rb') as f: + magic = np.fromfile(f, np.float32, count=1) + if magic != 202021.25: + print('Magic number incorrect. 
Invalid .flo file') + return None + w = np.fromfile(f, np.int32, count=1) + h = np.fromfile(f, np.int32, count=1) + data = np.fromfile(f, np.float32, count=2*int(w)*int(h)) + return np.resize(data, (int(h), int(w), 2)) + +def readPFM(file): + file = open(file, 'rb') + header = file.readline().rstrip() + if header in ('PF', b'PF'): + color = True + elif header in ('Pf', b'Pf'): + color = False + else: + raise Exception('Not a PFM file.') + wh = bytes.decode(file.readline()) + dim_match = re.match(r'^(\d+)\s+(\d+)$', wh.strip()) + + if dim_match: + width, height = map(int, dim_match.groups()) + else: + raise Exception('Malformed PFM header.') + scale = float(file.readline().decode().rstrip()) + if scale < 0: # little-endian + endian = '<' + scale = -scale + else: + endian = '>' # big-endian + + data = np.fromfile(file, endian + 'f') + shape = (height, width, 3) if color else (height, width) + + data = np.reshape(data, shape) + data = np.flipud(data) + return data diff --git a/research/cv/flownet2/src/model_utils/local_adapter.py b/research/cv/flownet2/src/model_utils/local_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..0e7c529384330b0bf45a1a3f7c85a7244f3fdc3f --- /dev/null +++ b/research/cv/flownet2/src/model_utils/local_adapter.py @@ -0,0 +1,36 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Local adapter""" + +import os + +def get_device_id(): + device_id = os.getenv('DEVICE_ID', '0') + return int(device_id) + + +def get_device_num(): + device_num = os.getenv('RANK_SIZE', '1') + return int(device_num) + + +def get_rank_id(): + global_rank_id = os.getenv('RANK_ID', '0') + return int(global_rank_id) + + +def get_job_id(): + return "Local Job" diff --git a/research/cv/flownet2/src/model_utils/moxing_adapter.py b/research/cv/flownet2/src/model_utils/moxing_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..e68e3dfeb9ebd79bab3c099a26270b969135cc04 --- /dev/null +++ b/research/cv/flownet2/src/model_utils/moxing_adapter.py @@ -0,0 +1,116 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Moxing adapter for ModelArts""" + +import os +import functools +from mindspore import context +from config import config + +_global_sync_count = 0 + +def get_device_id(): + device_id = os.getenv('DEVICE_ID', '0') + return int(device_id) + + +def get_device_num(): + device_num = os.getenv('RANK_SIZE', '1') + return int(device_num) + + +def get_rank_id(): + global_rank_id = os.getenv('RANK_ID', '0') + return int(global_rank_id) + + +def get_job_id(): + job_id = os.getenv('JOB_ID') + job_id = job_id if job_id != "" else "default" + return job_id + +def sync_data(from_path, to_path): + """ + Download data from remote obs to local directory if the first url is remote url and the second one is local path + Upload data from local directory to remote obs in contrast. + """ + import moxing as mox + import time + global _global_sync_count + sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count) + _global_sync_count += 1 + + # Each server contains 8 devices as most. + if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): + print("from path: ", from_path) + print("to path: ", to_path) + mox.file.copy_parallel(from_path, to_path) + print("===finish data synchronization===") + try: + os.mknod(sync_lock) + except IOError: + pass + print("===save flag===") + + while True: + if os.path.exists(sync_lock): + break + time.sleep(1) + + print("Finish sync data from {} to {}.".format(from_path, to_path)) + + +def moxing_wrapper(pre_process=None, post_process=None): + """ + Moxing wrapper to download dataset and upload outputs. + """ + def wrapper(run_func): + @functools.wraps(run_func) + def wrapped_func(*args, **kwargs): + # Download data from data_url + if config.enable_modelarts: + if config.data_url: + sync_data(config.data_url, config.data_path) + print("Dataset downloaded: ", os.listdir(config.data_path)) + if config.checkpoint_url: + sync_data(config.checkpoint_url, config.load_path) + print("Preload downloaded: ", os.listdir(config.load_path)) + if config.train_url: + sync_data(config.train_url, config.output_path) + print("Workspace downloaded: ", os.listdir(config.output_path)) + + context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id()))) + config.device_num = get_device_num() + config.device_id = get_device_id() + if not os.path.exists(config.output_path): + os.makedirs(config.output_path) + + if pre_process: + pre_process() + + # Run the main function + run_func(*args, **kwargs) + + # Upload data to train_url + if config.enable_modelarts: + if post_process: + post_process() + + if config.train_url: + print("Start to copy output directory") + sync_data(config.output_path, config.train_url) + return wrapped_func + return wrapper diff --git a/research/cv/flownet2/src/model_utils/tools.py b/research/cv/flownet2/src/model_utils/tools.py new file mode 100644 index 0000000000000000000000000000000000000000..e78cba2782326373d45b55689b364f24c3d261fa --- /dev/null +++ b/research/cv/flownet2/src/model_utils/tools.py @@ -0,0 +1,24 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +from inspect import isclass + +def module_to_dict(module, exclude=None): + if exclude is None: + exclude = [] + module_dict = {} + for x in dir(module): + if isclass(getattr(module, x)) and x not in exclude and getattr(module, x) not in exclude: + module_dict[x] = getattr(module, x) + return module_dict diff --git a/research/cv/flownet2/src/models.py b/research/cv/flownet2/src/models.py new file mode 100644 index 0000000000000000000000000000000000000000..469b655acdf0489910563919272633ab285b6447 --- /dev/null +++ b/research/cv/flownet2/src/models.py @@ -0,0 +1,452 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import mindspore.nn as nn +import mindspore.ops as ops + +from .submodels import FlowNetC +from .submodels import FlowNetS +from .submodels import FlowNetSD +from .submodels import FlowNetFusion +from .submodels.custom_ops.custom_ops import Resample2D as Resample2d +from .submodels.submodules import ChannelNorm +from .submodels.submodules import Upsample + +Parameter_count = 162, 518, 834 + +class FlowNet2(nn.Cell): + + def __init__(self, rgb_max=255, batchNorm=False, div_flow=20.): + super(FlowNet2, self).__init__() + self.batchNorm = batchNorm + self.div_flow = div_flow + self.rgb_max = rgb_max + + self.channelnorm = ChannelNorm(axis=1) + + # First Block (FlowNetC) + self.flownetc = FlowNetC.FlowNetC(batchNorm=self.batchNorm) + + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + self.resample1 = Resample2d() + + # Block (FlowNetS1) + self.flownets_1 = FlowNetS.FlowNetS(batchNorm=self.batchNorm) + self.upsample2 = Upsample(scale_factor=4, mode='bilinear') + self.resample2 = Resample2d() + + # Block (FlowNetS2) + self.flownets_2 = FlowNetS.FlowNetS(batchNorm=self.batchNorm) + + # Block (FlowNetSD) + self.flownets_d = FlowNetSD.FlowNetSD(batchNorm=self.batchNorm) + + self.upsample3 = Upsample(scale_factor=4, mode='nearest') + self.upsample4 = Upsample(scale_factor=4, mode='nearest') + + self.resample3 = Resample2d() + + self.resample4 = Resample2d() + + # Block (FLowNetFusion) + self.flownetfusion = FlowNetFusion.FlowNetFusion(batchNorm=self.batchNorm) + + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' 
+ c.weight_init = 'XavierUniform' + + + def construct(self, inputs): + rgb_mean = inputs.view(inputs.shape[:2] + (-1,)).mean(axis=-1).view(inputs.shape[:2] + (1, 1, 1,)) + + x = (inputs - rgb_mean) / self.rgb_max + x1 = x[:, :, 0, :, :] + x2 = x[:, :, 1, :, :] + + x = self.concat_op((x1, x2)) + + # flownetc + flownetc_flow2 = self.flownetc(x)[0] + flownetc_flow = self.upsample1(flownetc_flow2 * self.div_flow) + + # warp img1 to img0; magnitude of diff between img0 and and warped_img1, + resampled_img1 = self.resample1(x[:, 3:, :, :], flownetc_flow) + diff_img0 = x[:, :3, :, :] - resampled_img1 + norm_diff_img0 = self.channelnorm(diff_img0) + + # concat img0, img1, img1->img0, flow, diff-mag ; + concat1 = self.concat_op((x, resampled_img1, flownetc_flow / self.div_flow, norm_diff_img0)) + + # flownets1 + flownets1_flow2 = self.flownets_1(concat1)[0] + flownets1_flow = self.upsample2(flownets1_flow2 * self.div_flow) + + # warp img1 to img0 using flownets1; magnitude of diff between img0 and and warped_img1 + resampled_img1 = self.resample2(x[:, 3:, :, :], flownets1_flow) + diff_img0 = x[:, :3, :, :] - resampled_img1 + norm_diff_img0 = self.channelnorm(diff_img0) + + # concat img0, img1, img1->img0, flow, diff-mag + concat2 = self.concat_op((x, resampled_img1, flownets1_flow / self.div_flow, norm_diff_img0)) + + # flownets2 + flownets2_flow2 = self.flownets_2(concat2)[0] + flownets2_flow = self.upsample4(flownets2_flow2 * self.div_flow) + norm_flownets2_flow = self.channelnorm(flownets2_flow) + + diff_flownets2_flow = self.resample4(x[:, 3:, :, :], flownets2_flow) + + diff_flownets2_img1 = self.channelnorm((x[:, :3, :, :] - diff_flownets2_flow)) + + # flownetsd + flownetsd_flow2 = self.flownets_d(x)[0] + flownetsd_flow = self.upsample3(flownetsd_flow2 / self.div_flow) + norm_flownetsd_flow = self.channelnorm(flownetsd_flow) + + diff_flownetsd_flow = self.resample3(x[:, 3:, :, :], flownetsd_flow) + + diff_flownetsd_img1 = self.channelnorm((x[:, :3, :, :] - diff_flownetsd_flow)) + + # concat img1 flownetsd, flownets2, norm_flownetsd, norm_flownets2, diff_flownetsd_img1, diff_flownets2_img1 + concat3 = self.concat_op( + (x[:, :3, :, :], flownetsd_flow, flownets2_flow, norm_flownetsd_flow, norm_flownets2_flow, + diff_flownetsd_img1, diff_flownets2_img1)) + flownetfusion_flow = self.flownetfusion(concat3) + + return flownetfusion_flow + + +class FlowNet2C(FlowNetC.FlowNetC): + def __init__(self, rgb_max, batchNorm=False, div_flow=20): + super(FlowNet2C, self).__init__(batchNorm=batchNorm, div_flow=div_flow) + self.rgb_max = rgb_max + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + def construct(self, inputs): + rgb_mean = self.mean(inputs.view(inputs.shape[:2] + (-1,)), -1).view(inputs.shape[:2] + (1, 1, 1,)) + + x = (inputs - rgb_mean) / self.rgb_max + x1 = x[:, :, 0, :, :] + x2 = x[:, :, 1, :, :] + + # FlownetC top input stream + out_conv1a = self.conv1(x1) + out_conv2a = self.conv2(out_conv1a) + out_conv3a = self.conv3(out_conv2a) + + # FlownetC bottom input stream + out_conv1b = self.conv1(x2) + + out_conv2b = self.conv2(out_conv1b) + out_conv3b = self.conv3(out_conv2b) + + # Merge streams + out_corr = self.corr(out_conv3a, out_conv3b) # False + out_corr = self.corr_activation(out_corr) + + # Redirect top input stream and concatenate + out_conv_redir = self.conv_redir(out_conv3a) + + in_conv3_1 = self.concat_op((out_conv_redir, out_corr)) + + # Merged conv layers + out_conv3_1 = self.conv3_1(in_conv3_1) + + out_conv4 = self.conv4_1(self.conv4(out_conv3_1)) + + out_conv5 = 
self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 = self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 = self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + + flow5 = self.predict_flow5(concat5) + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + + flow4 = self.predict_flow4(concat4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + concat3 = self.concat_op((out_conv3_1, out_deconv3, flow4_up)) + + flow3 = self.predict_flow3(concat3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + concat2 = self.concat_op((out_conv2a, out_deconv2, flow3_up)) + + flow2 = self.predict_flow2(concat2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return self.upsample1(flow2 * self.div_flow) + + +class FlowNet2S(FlowNetS.FlowNetS): + def __init__(self, rgb_max=255, batchNorm=False, div_flow=20): + super(FlowNet2S, self).__init__(input_channels=6, batchNorm=batchNorm) + self.rgb_max = rgb_max + self.div_flow = div_flow + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + def construct(self, inputs): + rgb_mean = self.mean(inputs.view(inputs.shape[:2] + (-1,)), -1).view(inputs.shape[:2] + (1, 1, 1,)) + x = (inputs - rgb_mean) / self.rgb_max + x = self.concat_op((x[:, :, 0, :, :], x[:, :, 1, :, :])) + + out_conv1 = self.conv1(x) + + out_conv2 = self.conv2(out_conv1) + out_conv3 = self.conv3_1(self.conv3(out_conv2)) + out_conv4 = self.conv4_1(self.conv4(out_conv3)) + out_conv5 = self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 = self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 = self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + flow5 = self.predict_flow5(concat5) + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + flow4 = self.predict_flow4(concat4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + + concat3 = self.concat_op((out_conv3, out_deconv3, flow4_up)) + flow3 = self.predict_flow3(concat3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + + concat2 = self.concat_op((out_conv2, out_deconv2, flow3_up)) + flow2 = self.predict_flow2(concat2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return self.upsample1(flow2 * self.div_flow) + + +class FlowNet2SD(FlowNetSD.FlowNetSD): + def __init__(self, rgb_max=255, batchNorm=False, div_flow=20): + super(FlowNet2SD, self).__init__(batchNorm=batchNorm) + self.rgb_max = rgb_max + self.div_flow = div_flow + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + def construct(self, inputs): + rgb_mean = self.mean(inputs.view(inputs.shape[:2] + (-1,)), -1).view(inputs.shape[:2] + (1, 1, 1,)) + x = (inputs - rgb_mean) / self.rgb_max + x = self.concat_op((x[:, :, 0, :, :], x[:, :, 1, :, :])) + + out_conv0 = self.conv0(x) + out_conv1 = self.conv1_1(self.conv1(out_conv0)) + out_conv2 = self.conv2_1(self.conv2(out_conv1)) + + out_conv3 = self.conv3_1(self.conv3(out_conv2)) + out_conv4 = self.conv4_1(self.conv4(out_conv3)) + out_conv5 = self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 
= self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 = self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + out_interconv5 = self.inter_conv5(concat5) + flow5 = self.predict_flow5(out_interconv5) + + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + out_interconv4 = self.inter_conv4(concat4) + flow4 = self.predict_flow4(out_interconv4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + + concat3 = self.concat_op((out_conv3, out_deconv3, flow4_up)) + out_interconv3 = self.inter_conv3(concat3) + flow3 = self.predict_flow3(out_interconv3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + + concat2 = self.concat_op((out_conv2, out_deconv2, flow3_up)) + out_interconv2 = self.inter_conv2(concat2) + flow2 = self.predict_flow2(out_interconv2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return self.upsample1(flow2 * self.div_flow) + + +class FlowNet2CS(nn.Cell): + + def __init__(self, rgb_max=255, batchNorm=False, div_flow=20.): + super(FlowNet2CS, self).__init__() + self.batchNorm = batchNorm + self.div_flow = div_flow + self.rgb_max = rgb_max + + self.channelnorm = ChannelNorm(axis=1) + + # First Block (FlowNetC) + self.flownetc = FlowNetC.FlowNetC(batchNorm=self.batchNorm) + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + self.resample1 = Resample2d() + + # Block (FlowNetS1) + self.flownets_1 = FlowNetS.FlowNetS(batchNorm=self.batchNorm) + self.upsample2 = Upsample(scale_factor=4, mode='bilinear') + + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + def construct(self, inputs): + rgb_mean = self.mean(inputs.view(inputs.shape[:2] + (-1,)), -1).view(inputs.shape[:2] + (1, 1, 1,)) + + x = (inputs - rgb_mean) / self.rgb_max + x1 = x[:, :, 0, :, :] + x2 = x[:, :, 1, :, :] + x = self.concat_op((x1, x2)) + + # flownetc + flownetc_flow2 = self.flownetc(x)[0] + flownetc_flow = self.upsample1(flownetc_flow2 * self.div_flow) + + # warp img1 to img0; magnitude of diff between img0 and and warped_img1, + resampled_img1 = self.resample1(x[:, 3:, :, :], flownetc_flow) + diff_img0 = x[:, :3, :, :] - resampled_img1 + norm_diff_img0 = self.channelnorm(diff_img0) + + # concat img0, img1, img1->img0, flow, diff-mag ; + concat1 = self.concat_op((x, resampled_img1, flownetc_flow / self.div_flow, norm_diff_img0)) + + # flownets1 + flownets1_flow2 = self.flownets_1(concat1)[0] + flownets1_flow = self.upsample2(flownets1_flow2 * self.div_flow) + + return flownets1_flow + + +class FlowNet2CSS(nn.Cell): + + def __init__(self, rgb_max=255, batchNorm=False, div_flow=20.): + super(FlowNet2CSS, self).__init__() + self.batchNorm = batchNorm + self.div_flow = div_flow + self.rgb_max = rgb_max + + self.channelnorm = ChannelNorm(axis=1) + + # First Block (FlowNetC) + self.flownetc = FlowNetC.FlowNetC(batchNorm=self.batchNorm) + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + self.resample1 = Resample2d() + + # Block (FlowNetS1) + self.flownets_1 = FlowNetS.FlowNetS(batchNorm=self.batchNorm) + self.upsample2 = Upsample(scale_factor=4, 
mode='bilinear') + + self.resample2 = Resample2d() + + # Block (FlowNetS2) + self.flownets_2 = FlowNetS.FlowNetS(batchNorm=self.batchNorm) + self.upsample3 = Upsample(scale_factor=4, mode='nearest') + + self.concat_op = ops.Concat(1) + self.mean = ops.ReduceMean() + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + + def construct(self, inputs): + rgb_mean = self.mean(inputs.view(inputs.shape[:2] + (-1,)), -1).view(inputs.shape[:2] + (1, 1, 1,)) + + x = (inputs - rgb_mean) / self.rgb_max + x1 = x[:, :, 0, :, :] + x2 = x[:, :, 1, :, :] + x = self.concat_op((x1, x2)) + + # flownetc + flownetc_flow2 = self.flownetc(x)[0] + flownetc_flow = self.upsample1(flownetc_flow2 * self.div_flow) + + # warp img1 to img0; magnitude of diff between img0 and and warped_img1, + resampled_img1 = self.resample1(x[:, 3:, :, :], flownetc_flow) + diff_img0 = x[:, :3, :, :] - resampled_img1 + norm_diff_img0 = self.channelnorm(diff_img0) + + # concat img0, img1, img1->img0, flow, diff-mag ; + concat1 = self.concat_op((x, resampled_img1, flownetc_flow / self.div_flow, norm_diff_img0)) + + # flownets1 + flownets1_flow2 = self.flownets_1(concat1)[0] + flownets1_flow = self.upsample2(flownets1_flow2 * self.div_flow) + + # warp img1 to img0 using flownets1; magnitude of diff between img0 and and warped_img1 + resampled_img1 = self.resample2(x[:, 3:, :, :], flownets1_flow) + diff_img0 = x[:, :3, :, :] - resampled_img1 + norm_diff_img0 = self.channelnorm(diff_img0) + + # concat img0, img1, img1->img0, flow, diff-mag + concat2 = self.concat_op((x, resampled_img1, flownets1_flow / self.div_flow, norm_diff_img0)) + + # flownets2 + flownets2_flow2 = self.flownets_2(concat2)[0] + flownets2_flow = self.upsample3(flownets2_flow2 * self.div_flow) + + return flownets2_flow diff --git a/research/cv/flownet2/src/submodels/FlowNetC.py b/research/cv/flownet2/src/submodels/FlowNetC.py new file mode 100644 index 0000000000000000000000000000000000000000..66f16e0df53be4d22c20d10d4c7b0357274a51ed --- /dev/null +++ b/research/cv/flownet2/src/submodels/FlowNetC.py @@ -0,0 +1,139 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import mindspore.nn as nn +import mindspore.ops as ops +from .custom_ops.custom_ops import Correlation +from .submodules import conv +from .submodules import predict_flow +from .submodules import deconv +from .submodules import Upsample + + +Parameter_count = 39, 175, 298 + + +class FlowNetC(nn.Cell): + def __init__(self, batchNorm=True, div_flow=20): + super(FlowNetC, self).__init__() + + self.batchNorm = batchNorm + self.div_flow = div_flow + + self.conv1 = conv(self.batchNorm, 3, 64, kernel_size=7, stride=2) + self.conv2 = conv(self.batchNorm, 64, 128, kernel_size=5, stride=2) + self.conv3 = conv(self.batchNorm, 128, 256, kernel_size=5, stride=2) + self.conv_redir = conv(self.batchNorm, 256, 32, kernel_size=1, stride=1) + + self.corr = Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2) + + self.corr_activation = nn.LeakyReLU(0.1) + self.conv3_1 = conv(self.batchNorm, 473, 256) + self.conv4 = conv(self.batchNorm, 256, 512, stride=2) + self.conv4_1 = conv(self.batchNorm, 512, 512) + self.conv5 = conv(self.batchNorm, 512, 512, stride=2) + self.conv5_1 = conv(self.batchNorm, 512, 512) + self.conv6 = conv(self.batchNorm, 512, 1024, stride=2) + self.conv6_1 = conv(self.batchNorm, 1024, 1024) + + self.deconv5 = deconv(1024, 512) + self.deconv4 = deconv(1026, 256) + self.deconv3 = deconv(770, 128) + self.deconv2 = deconv(386, 64) + + self.predict_flow6 = predict_flow(1024) + self.predict_flow5 = predict_flow(1026) + self.predict_flow4 = predict_flow(770) + self.predict_flow3 = predict_flow(386) + self.predict_flow2 = predict_flow(194) + + self.upsampled_flow6_to_5 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow5_to_4 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow4_to_3 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow3_to_2 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + + self.concat_op = ops.Concat(1) + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + def construct(self, x): + x1 = x[:, 0:3, :, :] + x2 = x[:, 3::, :, :] + + # FlownetC top input stream + out_conv1a = self.conv1(x1) + out_conv2a = self.conv2(out_conv1a) + out_conv3a = self.conv3(out_conv2a) + + # FlownetC bottom input stream + out_conv1b = self.conv1(x2) + + out_conv2b = self.conv2(out_conv1b) + out_conv3b = self.conv3(out_conv2b) + + # Merge streams + out_corr = self.corr(out_conv3a, + out_conv3b) # Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2) + out_corr = self.corr_activation(out_corr) # nn.LeakyReLU(0.1) + + # Redirect top input stream and concatenate + out_conv_redir = self.conv_redir(out_conv3a) # conv(self.batchNorm, 256, 32, kernel_size=1, stride=1) + + in_conv3_1 = self.concat_op((out_conv_redir, out_corr)) + + # Merged conv layers + out_conv3_1 = self.conv3_1(in_conv3_1) + + out_conv4 = self.conv4_1(self.conv4(out_conv3_1)) + + out_conv5 = self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 = self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 =
self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + + flow5 = self.predict_flow5(concat5) + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + + flow4 = self.predict_flow4(concat4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + concat3 = self.concat_op((out_conv3_1, out_deconv3, flow4_up)) + + flow3 = self.predict_flow3(concat3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + concat2 = self.concat_op((out_conv2a, out_deconv2, flow3_up)) + + flow2 = self.predict_flow2(concat2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return flow2, None diff --git a/research/cv/flownet2/src/submodels/FlowNetFusion.py b/research/cv/flownet2/src/submodels/FlowNetFusion.py new file mode 100644 index 0000000000000000000000000000000000000000..2440eac5c9f711babad59847140d8007c3456286 --- /dev/null +++ b/research/cv/flownet2/src/submodels/FlowNetFusion.py @@ -0,0 +1,82 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +from mindspore import nn +import mindspore.ops as ops +from .submodules import conv +from .submodules import deconv +from .submodules import i_conv +from .submodules import predict_flow + +Parameter_count = 581, 226 + + +class FlowNetFusion(nn.Cell): + def __init__(self, batchNorm=True): + super(FlowNetFusion, self).__init__() + + self.batchNorm = batchNorm + self.conv0 = conv(self.batchNorm, 11, 64) + self.conv1 = conv(self.batchNorm, 64, 64, stride=2) + self.conv1_1 = conv(self.batchNorm, 64, 128) + self.conv2 = conv(self.batchNorm, 128, 128, stride=2) + self.conv2_1 = conv(self.batchNorm, 128, 128) + + self.deconv1 = deconv(128, 32) + self.deconv0 = deconv(162, 16) + + self.inter_conv1 = i_conv(self.batchNorm, 162, 32) + self.inter_conv0 = i_conv(self.batchNorm, 82, 16) + + self.predict_flow2 = predict_flow(128) + self.predict_flow1 = predict_flow(32) + self.predict_flow0 = predict_flow(16) + + self.upsampled_flow2_to_1 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow1_to_0 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + + self.concat_op = ops.Concat(1) + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + def construct(self, x): + out_conv0 = self.conv0(x) + out_conv1 = self.conv1_1(self.conv1(out_conv0)) + out_conv2 = self.conv2_1(self.conv2(out_conv1)) + + flow2 = self.predict_flow2(out_conv2) + flow2_up = self.upsampled_flow2_to_1(flow2) + out_deconv1 = self.deconv1(out_conv2) + + concat1 = self.concat_op((out_conv1, out_deconv1, 
flow2_up)) + out_interconv1 = self.inter_conv1(concat1) + flow1 = self.predict_flow1(out_interconv1) + flow1_up = self.upsampled_flow1_to_0(flow1) + out_deconv0 = self.deconv0(concat1) + + concat0 = self.concat_op((out_conv0, out_deconv0, flow1_up)) + out_interconv0 = self.inter_conv0(concat0) + flow0 = self.predict_flow0(out_interconv0) + + return flow0 diff --git a/research/cv/flownet2/src/submodels/FlowNetS.py b/research/cv/flownet2/src/submodels/FlowNetS.py new file mode 100644 index 0000000000000000000000000000000000000000..0ee92ae23826af80cf21e8e4ba6f781043c5fccc --- /dev/null +++ b/research/cv/flownet2/src/submodels/FlowNetS.py @@ -0,0 +1,104 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +from mindspore import nn +import mindspore.ops as ops +from .submodules import conv +from .submodules import deconv +from .submodules import predict_flow +from .submodules import Upsample + + +class FlowNetS(nn.Cell): + def __init__(self, input_channels=12, batchNorm=True): + super(FlowNetS, self).__init__() + + self.batchNorm = batchNorm + self.conv1 = conv(self.batchNorm, input_channels, 64, kernel_size=7, stride=2) + self.conv2 = conv(self.batchNorm, 64, 128, kernel_size=5, stride=2) + self.conv3 = conv(self.batchNorm, 128, 256, kernel_size=5, stride=2) + self.conv3_1 = conv(self.batchNorm, 256, 256) + self.conv4 = conv(self.batchNorm, 256, 512, stride=2) + self.conv4_1 = conv(self.batchNorm, 512, 512) + self.conv5 = conv(self.batchNorm, 512, 512, stride=2) + self.conv5_1 = conv(self.batchNorm, 512, 512) + self.conv6 = conv(self.batchNorm, 512, 1024, stride=2) + self.conv6_1 = conv(self.batchNorm, 1024, 1024) + + self.deconv5 = deconv(1024, 512) + self.deconv4 = deconv(1026, 256) + self.deconv3 = deconv(770, 128) + self.deconv2 = deconv(386, 64) + + self.predict_flow6 = predict_flow(1024) + self.predict_flow5 = predict_flow(1026) + self.predict_flow4 = predict_flow(770) + self.predict_flow3 = predict_flow(386) + self.predict_flow2 = predict_flow(194) + + self.upsampled_flow6_to_5 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=False) + self.upsampled_flow5_to_4 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=False) + self.upsampled_flow4_to_3 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=False) + self.upsampled_flow3_to_2 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=False) + + self.concat_op = ops.Concat(1) + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + def construct(self, x): + out_conv1 = self.conv1(x) + + out_conv2 = self.conv2(out_conv1) + out_conv3 = 
self.conv3_1(self.conv3(out_conv2)) + out_conv4 = self.conv4_1(self.conv4(out_conv3)) + out_conv5 = self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 = self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 = self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + flow5 = self.predict_flow5(concat5) + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + flow4 = self.predict_flow4(concat4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + + concat3 = self.concat_op((out_conv3, out_deconv3, flow4_up)) + flow3 = self.predict_flow3(concat3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + + concat2 = self.concat_op((out_conv2, out_deconv2, flow3_up)) + flow2 = self.predict_flow2(concat2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return flow2, None diff --git a/research/cv/flownet2/src/submodels/FlowNetSD.py b/research/cv/flownet2/src/submodels/FlowNetSD.py new file mode 100644 index 0000000000000000000000000000000000000000..ba64a6a2671009c1ffb053c2f90f3cd5e5b2cdd1 --- /dev/null +++ b/research/cv/flownet2/src/submodels/FlowNetSD.py @@ -0,0 +1,122 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from mindspore import nn +import mindspore.ops as ops +from .submodules import conv +from .submodules import deconv +from .submodules import i_conv +from .submodules import predict_flow +from .submodules import Upsample + +Parameter_count = 45, 371, 666 + + +class FlowNetSD(nn.Cell): + def __init__(self, batchNorm=True): + super(FlowNetSD, self).__init__() + + self.batchNorm = batchNorm + self.conv0 = conv(self.batchNorm, 6, 64) + self.conv1 = conv(self.batchNorm, 64, 64, stride=2) + self.conv1_1 = conv(self.batchNorm, 64, 128) + self.conv2 = conv(self.batchNorm, 128, 128, stride=2) + self.conv2_1 = conv(self.batchNorm, 128, 128) + self.conv3 = conv(self.batchNorm, 128, 256, stride=2) + self.conv3_1 = conv(self.batchNorm, 256, 256) + self.conv4 = conv(self.batchNorm, 256, 512, stride=2) + self.conv4_1 = conv(self.batchNorm, 512, 512) + self.conv5 = conv(self.batchNorm, 512, 512, stride=2) + self.conv5_1 = conv(self.batchNorm, 512, 512) + self.conv6 = conv(self.batchNorm, 512, 1024, stride=2) + self.conv6_1 = conv(self.batchNorm, 1024, 1024) + + self.deconv5 = deconv(1024, 512) + self.deconv4 = deconv(1026, 256) + self.deconv3 = deconv(770, 128) + self.deconv2 = deconv(386, 64) + + self.inter_conv5 = i_conv(self.batchNorm, 1026, 512) + self.inter_conv4 = i_conv(self.batchNorm, 770, 256) + self.inter_conv3 = i_conv(self.batchNorm, 386, 128) + self.inter_conv2 = i_conv(self.batchNorm, 194, 64) + + self.predict_flow6 = predict_flow(1024) + self.predict_flow5 = predict_flow(512) + self.predict_flow4 = predict_flow(256) + self.predict_flow3 = predict_flow(128) + self.predict_flow2 = predict_flow(64) + + self.upsampled_flow6_to_5 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow5_to_4 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow4_to_3 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + self.upsampled_flow3_to_2 = nn.Conv2dTranspose(2, 2, 4, 2, pad_mode='pad', padding=1, has_bias=True) + + self.concat_op = ops.Concat(1) + + for c in self.cells(): + if isinstance(c, nn.Conv2d): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + if isinstance(c, nn.Conv2dTranspose): + if c.bias_init is not None: + c.bias_init = 'Uniform' + c.weight_init = 'XavierUniform' + + self.upsample1 = Upsample(scale_factor=4, mode='bilinear') + + def construct(self, x): + # print(x.shape) + out_conv0 = self.conv0(x) + out_conv1 = self.conv1_1(self.conv1(out_conv0)) + out_conv2 = self.conv2_1(self.conv2(out_conv1)) + + out_conv3 = self.conv3_1(self.conv3(out_conv2)) + out_conv4 = self.conv4_1(self.conv4(out_conv3)) + out_conv5 = self.conv5_1(self.conv5(out_conv4)) + out_conv6 = self.conv6_1(self.conv6(out_conv5)) + + flow6 = self.predict_flow6(out_conv6) + flow6_up = self.upsampled_flow6_to_5(flow6) + out_deconv5 = self.deconv5(out_conv6) + + concat5 = self.concat_op((out_conv5, out_deconv5, flow6_up)) + out_interconv5 = self.inter_conv5(concat5) + flow5 = self.predict_flow5(out_interconv5) + + flow5_up = self.upsampled_flow5_to_4(flow5) + out_deconv4 = self.deconv4(concat5) + + concat4 = self.concat_op((out_conv4, out_deconv4, flow5_up)) + out_interconv4 = self.inter_conv4(concat4) + flow4 = self.predict_flow4(out_interconv4) + flow4_up = self.upsampled_flow4_to_3(flow4) + out_deconv3 = self.deconv3(concat4) + + concat3 = self.concat_op((out_conv3, out_deconv3, flow4_up)) + 
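# Note: unlike FlowNetS, FlowNetSD refines each deconv/skip concatenation with an intermediate convolution (inter_conv*) before predicting the flow at that scale. +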
out_interconv3 = self.inter_conv3(concat3) + flow3 = self.predict_flow3(out_interconv3) + flow3_up = self.upsampled_flow3_to_2(flow3) + out_deconv2 = self.deconv2(concat3) + + concat2 = self.concat_op((out_conv2, out_deconv2, flow3_up)) + out_interconv2 = self.inter_conv2(concat2) + flow2 = self.predict_flow2(out_interconv2) + + if self.training: + return flow2, flow3, flow4, flow5, flow6 + return flow2, None diff --git a/research/cv/flownet2/src/submodels/__init__.py b/research/cv/flownet2/src/submodels/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/flownet2/src/submodels/custom_ops/correlation.cu b/research/cv/flownet2/src/submodels/custom_ops/correlation.cu new file mode 100644 index 0000000000000000000000000000000000000000..4e40a00f1435955b77a2daca2073996c6b4f435e --- /dev/null +++ b/research/cv/flownet2/src/submodels/custom_ops/correlation.cu @@ -0,0 +1,392 @@ +// Copyright 2022 Huawei Technologies Co., Ltd +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================ + +#include <thrust/reduce.h> +#include <stdio.h> +#include <algorithm> + +#define CUDA_NUM_THREADS 1024 +#define THREADS_PER_BLOCK 32 +#define FULL_MASK 0xffffffff + +__global__ void correlationInitKernel(size_t size_init, float *input) { + auto idx = blockIdx.x * CUDA_NUM_THREADS + threadIdx.x; + if (idx < size_init) { + input[idx] = static_cast<float>(.0); + } +} + + +__forceinline__ __device__ float warpReduceSum(float value) { + for (int offset = 16; offset > 0; offset /= 2) + value += __shfl_down_sync(FULL_MASK, value, offset); + return value; +} + + +__forceinline__ __device__ float blockReduceSum(float value) { + static __shared__ float shared[32]; + int lane = threadIdx.x % warpSize; + int windex = threadIdx.x / warpSize; + value = warpReduceSum(value); + if (lane == 0) + shared[windex] = value; + + __syncthreads(); + + value = (threadIdx.x < blockDim.x / warpSize) ? 
shared[lane] : 0; + + if (windex == 0) + value = warpReduceSum(value); + return value; +} + +__global__ void correlation_forward(float* output, + const int tdim_cyx, const int tdim_yx, const int tdim_x, + const float* padded_input1, const float* padded_input2, + const int pdim_yxc, const int pdim_xc, const int pdim_c, + const int kernel_size, const int max_displacement, const int stride1, const int stride2) { + int32_t kernel_radius = (kernel_size - 1) / 2; + int32_t displacement_radius = max_displacement / stride2; + int32_t displacement_size = 2 * displacement_radius + 1; + + int32_t nums = kernel_size * kernel_size * pdim_c; + + int32_t n = blockIdx.x; + int32_t y1 = blockIdx.y * stride1 + max_displacement; + int32_t x1 = blockIdx.z * stride1 + max_displacement; + int32_t c = threadIdx.x; + + // along channel axism, do element-wise product + for (int t_j = -displacement_radius; t_j <= displacement_radius; ++t_j) { + for (int t_i = -displacement_radius; t_i <= displacement_radius; ++t_i) { + int x2 = x1 + t_i * stride2; + int y2 = y1 + t_j * stride2; + float acc = 0.0f; + // add 2 feature kernel_radius + for (int j = -kernel_radius; j <= kernel_radius; ++j) { + for (int i = -kernel_radius; i <= kernel_radius; ++i) { + #pragma unroll + for (int ch = c; ch < pdim_c; ch += blockDim.x) { + int index1 = n * pdim_yxc + (y1 + j) * pdim_xc + (x1 + i) * pdim_c + ch; + int index2 = n * pdim_yxc + (y2 + j) * pdim_xc + (x2 + i) * pdim_c + ch; + acc += static_cast<float>(padded_input1[index1] * + padded_input2[index2]); + } + } + } + + if (blockDim.x == warpSize) { + __syncwarp(); + acc = warpReduceSum(acc); + } else { + __syncthreads(); + acc = blockReduceSum(acc); + } + + if (threadIdx.x == 0) { + int tc = (t_j + displacement_radius) * displacement_size + + (t_i + displacement_radius); + const int tindex = n * tdim_cyx + tc * tdim_yx + blockIdx.y * tdim_x + blockIdx.z; + output[tindex] = static_cast<float>(acc / nums); + } + } + } +} + +extern "C" int correlation(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, void *stream, + void *extra) { + cudaStream_t custream = static_cast<cudaStream_t>(stream); + constexpr int OUTPUT_INDEX = 2; + constexpr int INPUT_INDEX = 0; + constexpr int TOTAL_PARAM_NUM = 3; + if (nparam != TOTAL_PARAM_NUM) { + return 1; + } + // This is to check if the type of parameters the same as what the user wants. 
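+ // Expected tensors: padded_input1 and padded_input2 in NHWC layout (transposed and padded on the Python side), output in NCHW; all must be float32.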
+ for (int i = 0; i < nparam; i++) { + if (strcmp(dtypes[i], "float32") != 0) { + return 2; + } + } + // input1's index is 0, input2's index is 1 and output's index is 2 + float *input1 = static_cast<float *>(params[0]); + float *input2 = static_cast<float *>(params[1]); + float *output = static_cast<float *>(params[2]); + + int batchSize = shapes[OUTPUT_INDEX][0]; + int outputChannels = shapes[OUTPUT_INDEX][1]; + int outputHeight = shapes[OUTPUT_INDEX][2]; + int outputWidth = shapes[OUTPUT_INDEX][3]; + int inputChannels = shapes[INPUT_INDEX][3]; + int inputHeight = shapes[INPUT_INDEX][1]; + int inputWidth = shapes[INPUT_INDEX][2]; + + // notice: At Currently the parameter used in cuda is fixed because the interface have no place to pass parameters + // need to be changed in future + const int kernel_size = 1; + const int max_displacement = 20; + const int stride1 = 1; + const int stride2 = 2; + + int output_size = batchSize*outputChannels*outputWidth*outputHeight; + int n = output_size / CUDA_NUM_THREADS; + correlationInitKernel<<<n + 1, CUDA_NUM_THREADS, 0, custream>>>(output_size, output); + + dim3 threadsPerBlock(THREADS_PER_BLOCK); + dim3 totalBlocksCorr(batchSize, outputHeight, outputWidth); + + int32_t pdim_yxc = inputHeight * inputWidth * inputChannels; + int32_t pdim_xc = inputWidth * inputChannels; + int32_t pdim_c = inputChannels; + + int32_t tdim_cyx = outputChannels * outputHeight * outputWidth; + int32_t tdim_yx = outputHeight * outputWidth; + int32_t tdim_x = outputWidth; + + correlation_forward<<<totalBlocksCorr, threadsPerBlock, 0, custream>>> + (output, tdim_cyx, tdim_yx, tdim_x, + input1, input2, pdim_yxc, pdim_xc, pdim_c, + kernel_size, max_displacement, stride1, stride2); + return 0; +} + +// correlation_backward_input1 kernel +__global__ void correlation_backward_input1(int item, float *grad_input_1, + const int p_dim_yxc, const int p_dim_xc, const int p_dim_c, + const int o_dim_cyx, const int o_dim_yx, const int o_dim_x, + const float *gradOutput, int outputChannels, + int outputHeight, int outputWidth, + const float *padded_input2, int pad_size, + int kernel_size, int max_displacement, + int stride1, int stride2, int kernel_radius, int displacement_radius, + int displacement_size) { + // NCHW (bs,num of channels,height,width) + int n = item; + int y = blockIdx.x * stride1 + pad_size; + int x = blockIdx.y * stride1 + pad_size; + int c = blockIdx.z; + int tch_off = threadIdx.x; + + int t_dim_cyx = outputChannels * outputHeight * outputWidth; + int t_dim_yx = outputHeight * outputWidth; + int t_dim_x = outputWidth; + + int x_min = (x - kernel_radius - max_displacement) / stride1; + int y_min = (y - kernel_radius - max_displacement) / stride1; + int x_max = (x + kernel_radius - max_displacement) / stride1; + int y_max = (y + kernel_radius - max_displacement) / stride1; + + // grad_input_1 is zero filled + if (x_max < 0 || y_max < 0 || x_min >= outputWidth || y_min >= outputHeight + || x_min > x_max || y_min > y_max) { + return; + } + // add range limit of height and width to cal grad_input_1 + x_min = max(0, x_min); + x_max = min(outputWidth-1, x_max); + + y_min = max(0, y_min); + y_max = min(outputHeight-1, y_max); + + float nums = kernel_size * kernel_size * p_dim_c; + + __shared__ float temp_sum[THREADS_PER_BLOCK]; + temp_sum[tch_off] = 0; + // along channel axism + for (int tc = tch_off; tc < outputChannels; tc += THREADS_PER_BLOCK) { + int m_2 = (tc % displacement_size - displacement_radius) * stride2; + int n_2 = (tc / displacement_size - displacement_radius) * 
stride2; + int index2 = n * p_dim_yxc + (y + n_2) * p_dim_xc + (x + m_2) * p_dim_c + c; + + float val2 = padded_input2[index2]; + + for (int j = y_min; j <= y_max; ++j) { + for (int i = x_min; i <= x_max; ++i) { + int t_index = n * t_dim_cyx + tc * t_dim_yx + j * t_dim_x + i; + temp_sum[tch_off] += gradOutput[t_index] * val2; + } + } + } + __syncthreads(); + + if (tch_off == 0) { + float reduce_sum = 0; + for (int index = 0; index < THREADS_PER_BLOCK; index++) { + reduce_sum += temp_sum[index]; + } + const int index1 = n * o_dim_cyx + c * o_dim_yx + (y - pad_size) * o_dim_x + (x - pad_size); + grad_input_1[index1] = reduce_sum / nums; + } +} + +// correlation_backward_input2 kernel +__global__ void correlation_backward_input2(int item, float *grad_input_2, + const int p_dim_yxc, const int p_dim_xc, const int p_dim_c, + const int o_dim_cyx, const int o_dim_yx, const int o_dim_x, + const int t_dim_cyx, const int t_dim_yx, const int t_dim_x, + const float *gradOutput, int outputChannels, + int outputHeight, int outputWidth, + const float *padded_input1, int pad_size, + int kernel_size, int max_displacement, + int stride1, int stride2, int kernel_radius, int displacement_radius, + int displacement_size) { + // NCHW (bs,num of channels,height,width) + int n = item; + int y = blockIdx.x * stride1 + pad_size; + int x = blockIdx.y * stride1 + pad_size; + int c = blockIdx.z; + + int tch_off = threadIdx.x; + __shared__ float prod_sum[THREADS_PER_BLOCK]; + prod_sum[tch_off] = 0; + for (int tc = tch_off; tc < outputChannels; tc += THREADS_PER_BLOCK) { + int m_1 = (tc % displacement_size - displacement_radius) * stride2; + int n_1 = (tc / displacement_size - displacement_radius) * stride2; + + int x_min = (x - kernel_radius - max_displacement - m_1) / stride1; + int y_min = (y - kernel_radius - max_displacement - n_1) / stride1; + + int x_max = (x + kernel_radius - max_displacement - m_1) / stride1; + int y_max = (y + kernel_radius - max_displacement - n_1) / stride1; + + if (x_max < 0 || y_max < 0) { + continue; + } + if (x_min >= outputWidth || y_min >= outputHeight) { + continue; + } + if (x_min > x_max || y_min > y_max) { + continue; + } + + // add range limit of height and width to cal grad_input_2 + x_min = max(0, x_min); + x_max = min(outputWidth-1, x_max); + y_min = max(0, y_min); + y_max = min(outputHeight-1, y_max); + + // assign value of gradOutput to grad_input_2 + int index_1 = n * p_dim_yxc + (y - n_1) * p_dim_xc + (x - m_1) * p_dim_c + c; + float val_1 = padded_input1[index_1]; + for (int j = y_min; j <= y_max; ++j) { + for (int i = x_min; i <= x_max; ++i) { + int t_index = n * t_dim_cyx + tc * t_dim_yx + j * t_dim_x + i; + prod_sum[tch_off] += gradOutput[t_index] * val_1; + } + } + } + + __syncthreads(); + if (tch_off == 0) { + float reduce_sum = 0; + for (int index = 0; index < THREADS_PER_BLOCK; index++) { + reduce_sum += prod_sum[index]; + } + const int index_2 = n * o_dim_cyx + c * o_dim_yx + (y - pad_size) * o_dim_x + (x - pad_size); + float nums = kernel_size * kernel_size * p_dim_c; + grad_input_2[index_2] = reduce_sum / nums; + } +} + +extern "C" int correlationGrad(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, + void *stream, void *extra) { + cudaStream_t custream = static_cast<cudaStream_t>(stream); + constexpr int INPUT1_INDEX = 0; + constexpr int GRAD_OUTPUT_INDEX = 2; + constexpr int TOTAL_PARAM_NUM = 5; + + if (nparam != TOTAL_PARAM_NUM) { + return 1; + } + // This is to check if the type of parameters the same as what the user wants. 
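+ // Parameter order: padded_input1, padded_input2 (NHWC, padded), gradOutput (NCHW), gradInput1, gradInput2 (NCHW, unpadded); all must be float32.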
+ for (int i = 0; i < TOTAL_PARAM_NUM; i++) { + if (strcmp(dtypes[i], "float32") != 0) { + return 2; + } + } + + float *padded_input1 = static_cast<float *>(params[0]); + float *padded_input2 = static_cast<float *>(params[1]); + float *gradOutput = static_cast<float *>(params[2]); + float *gradInput1 = static_cast<float *>(params[3]); + float *gradInput2 = static_cast<float *>(params[4]); + + int batchSize = shapes[GRAD_OUTPUT_INDEX][0]; + int outputChannels = shapes[GRAD_OUTPUT_INDEX][1]; + int outputHeight = shapes[GRAD_OUTPUT_INDEX][2]; + int outputWidth = shapes[GRAD_OUTPUT_INDEX][3]; + + int inputChannels = shapes[INPUT1_INDEX][3]; + int p_inputHeight = shapes[INPUT1_INDEX][1]; + int p_inputWidth = shapes[INPUT1_INDEX][2]; + + // notice: these parameters are currently hard-coded because the custom-op interface provides no way to pass them; + // this needs to be changed in the future + const int pad_size = 20; + const int kernel_size = 1; + const int max_displacement = 20; + const int stride1 = 1; + const int stride2 = 2; + + int inputWidth = p_inputWidth - 2 * pad_size; + int inputHeight = p_inputHeight - 2 * pad_size; + + int kernel_radius = (kernel_size - 1) / 2; + int displacement_radius = max_displacement / stride2; + int displacement_size = 2 * displacement_radius + 1; + + int p_dim_yxc = p_inputHeight * p_inputWidth * inputChannels; + int p_dim_xc = p_inputWidth * inputChannels; + int p_dim_c = inputChannels; + + int t_dim_cyx = outputChannels * outputHeight * outputWidth; + int t_dim_yx = outputHeight * outputWidth; + int t_dim_x = outputWidth; + + int o_dim_cyx = inputChannels * inputHeight * inputWidth; + int o_dim_yx = inputHeight * inputWidth; + int o_dim_x = inputWidth; + + dim3 threadsPerBlock(THREADS_PER_BLOCK); + dim3 totalBlocksCorr(inputHeight, inputWidth, inputChannels); + + // initialize gradInput1 to zero + int gradInput1_size = batchSize*inputChannels*inputWidth*inputHeight; + correlationInitKernel<<<gradInput1_size / CUDA_NUM_THREADS + 1, CUDA_NUM_THREADS, + 0, custream>>>(gradInput1_size, gradInput1); + // call correlation_backward_input1 + for (int n = 0; n < batchSize; ++n) { + correlation_backward_input1<<<totalBlocksCorr, threadsPerBlock, 0, custream>>> ( + n, gradInput1, p_dim_yxc, p_dim_xc, p_dim_c, o_dim_cyx, o_dim_yx, o_dim_x, + gradOutput, outputChannels, outputHeight, outputWidth, + padded_input2, pad_size, kernel_size, max_displacement, stride1, stride2, + kernel_radius, displacement_radius, displacement_size); + } + // initialize gradInput2 to zero + int gradInput2_size = batchSize*inputChannels*inputWidth*inputHeight; + correlationInitKernel<<<gradInput2_size / CUDA_NUM_THREADS + 1, CUDA_NUM_THREADS, + 0, custream>>>(gradInput2_size, gradInput2); + // call correlation_backward_input2 + for (int n = 0; n < batchSize; n++) { + correlation_backward_input2<<<totalBlocksCorr, threadsPerBlock, 0, custream>>>( + n, gradInput2, p_dim_yxc, p_dim_xc, p_dim_c, o_dim_cyx, o_dim_yx, o_dim_x, + t_dim_cyx, t_dim_yx, t_dim_x, + gradOutput, outputChannels, outputHeight, outputWidth, + padded_input1, pad_size, kernel_size, max_displacement, stride1, stride2, + kernel_radius, displacement_radius, displacement_size); + } + return 0; +} diff --git a/research/cv/flownet2/src/submodels/custom_ops/custom_ops.py b/research/cv/flownet2/src/submodels/custom_ops/custom_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..c985f27de1939ae93e2111c897f2c65a5786b4b4 --- /dev/null +++ b/research/cv/flownet2/src/submodels/custom_ops/custom_ops.py @@ -0,0 +1,93 @@ +# Copyright
2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import math +from mindspore.common import dtype as mstype +from mindspore.nn import Cell +import mindspore.ops as ops + + + +class TransposeAndPad(Cell): + def __init__(self, pad_size): + super(TransposeAndPad, self).__init__() + self.tanspose = ops.Transpose() + self.pad = ops.Pad(((0, 0), (pad_size, pad_size), (pad_size, pad_size), (0, 0))) + + def construct(self, x): + x_tanspose = self.tanspose(x, (0, 2, 3, 1)) + x_tanspose_pad = self.pad(x_tanspose) + return x_tanspose_pad + + +class Correlation(Cell): + def __init__(self, pad_size=0, kernel_size=0, max_displacement=0, stride1=1, stride2=2): + super(Correlation, self).__init__() + self.pad_size = pad_size + self.max_displacement = max_displacement + self.kernel_size = kernel_size + self.stride1 = stride1 + self.stride2 = stride2 + self.transpose_pad = TransposeAndPad(pad_size) + dir_path = os.path.dirname(os.path.abspath(__file__)) + self.func_path = dir_path + "/correlation.so" + + def construct(self, x1, x2): + pad_x1 = self.transpose_pad(x1) + pad_x2 = self.transpose_pad(x2) + n_output_channels = (int(self.max_displacement / self.stride2) * 2 + 1) \ + * (int(self.max_displacement / self.stride2) * 2 + 1) + x1_shape = x1.shape + kernel_radius = (self.kernel_size - 1) / 2 + border_radius = kernel_radius + self.max_displacement + padded_height = x1_shape[2] + 2 * self.pad_size + padded_width = x1_shape[3] + 2 * self.pad_size + output_height = int(math.ceil((padded_height - 2 * border_radius) / self.stride1)) + output_width = int(math.ceil((padded_width - 2 * border_radius) / self.stride1)) + out_shape = (x1_shape[0], n_output_channels, output_height, output_width) + correlation_forward = ops.Custom(self.func_path + ":correlation", out_shape, mstype.float32, "aot") + output = correlation_forward(pad_x1, pad_x2) + return output + + def bprop(self, x1, x2, out, dout): + pad_x1 = self.transpose_pad(x1) + pad_x2 = self.transpose_pad(x2) + correlation_backward = ops.Custom(self.func_path + ":correlationGrad", (x1.shape, x2.shape), + (mstype.float32, mstype.float32), "aot") + dx1, dx2 = correlation_backward(pad_x1, pad_x2, dout) + return dx1, dx2 + + +class Resample2D(Cell): + def __init__(self, kernel_size=1, bilinear=True): + super(Resample2D, self).__init__() + self.kernel_saize = kernel_size + self.bilinear = bilinear + dir_path = os.path.dirname(os.path.abspath(__file__)) + self.func_path = dir_path + "/resample2d.so" + + + def construct(self, x1, x2): + out_shape = (x2.shape[0], x1.shape[1], x2.shape[2], x2.shape[3]) + resample2d_forward = ops.Custom(self.func_path + ":Resample2d", out_shape, mstype.float32, "aot") + output = resample2d_forward(x1, x2) + return output + + def bprop(self, x1, x2, out, dout): + Resample2d_backward = ops.Custom(self.func_path + ":Resample2dGrad", (x1.shape, x2.shape), + (mstype.float32, mstype.float32), "aot") + dx1, dx2 = 
Resample2d_backward(x1, x2, dout) + return dx1, dx2 diff --git a/research/cv/flownet2/src/submodels/custom_ops/resample2d.cu b/research/cv/flownet2/src/submodels/custom_ops/resample2d.cu new file mode 100644 index 0000000000000000000000000000000000000000..fa7dfa4f752046997986b8e477838fd05bfa98e3 --- /dev/null +++ b/research/cv/flownet2/src/submodels/custom_ops/resample2d.cu @@ -0,0 +1,418 @@ +// Copyright 2022 Huawei Technologies Co., Ltd +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================ +#define CUDA_NUM_THREADS 512 +#define THREADS_PER_BLOCK 64 + +#include <algorithm> + +__device__ __forceinline__ float MsAtomicAdd(float *address, const float val) { + return atomicAdd(address, val); +} + +__global__ void Resample2dInitKernel(size_t size_init, float *input) { + auto idx = blockIdx.x * CUDA_NUM_THREADS + threadIdx.x; + if (idx < size_init) { + input[idx] = static_cast<float>(.0); + } +} + + +__device__ int GET_INDEX(const int batch , const int channels, const int height, const int width, + const int batch_stride , const int channels_stride, const int height_stride) { + return batch*batch_stride+channels*channels_stride+height*height_stride+width; +} + +__device__ float DIM3_INDEX(const float *input, const int batch , const int channels, const int height, const int width, + const int batch_stride , const int channels_stride, const int height_stride) { + return input[batch*batch_stride+channels*channels_stride+height*height_stride+width]; +} + + +__global__ void Resample2dKernel(size_t size, const float *input1, const float *input2, float *out_data, + int batch_stride_x1, int channel_stride_x1, int height_stride_x1, + int batch_stride_x2, int channel_stride_x2, int height_stride_x2, + int batch_output, int channel_output, int height_output, int width_output, + int kernel_size, bool bilinear) { + int index = blockIdx.x * blockDim.x + threadIdx.x; + + if (index >= size) { + return; + } + + float val = 0.0; + + int dim_b = batch_output; + int dim_c = channel_output; + int dim_h = height_output; + int dim_w = width_output; + int dim_chw = dim_c * dim_h * dim_w; + int dim_hw = dim_h * dim_w; + int b = (index / dim_chw) % dim_b; + int c = (index / dim_hw) % dim_c; + int y = (index / dim_w) % dim_h; + int x = (index) % dim_w; + + float dx = DIM3_INDEX(input2, b, 0, y, x, batch_stride_x2, channel_stride_x2, height_stride_x2); + float dy = DIM3_INDEX(input2, b, 1, y, x, batch_stride_x2, channel_stride_x2, height_stride_x2); + + float xf = x + dx; + float yf = y + dy; // img+flow + float alpha = xf - (floor(xf)); // alpha + float beta = yf - (floor(yf)); // beta + if (bilinear) { + int xL = max(min(static_cast<int>(floor(xf)), dim_w-1), 0); + int xR = max(min(static_cast<int>(floor(xf)+1), dim_w -1), 0); + int yT = max(min(static_cast<int>(floor(yf)), dim_h-1), 0); + int yB = max(min(static_cast<int>(floor(yf)+1), dim_h-1), 0); + for (int fy = 0; fy < kernel_size; fy += 1) { + for (int fx = 0; fx < kernel_size; fx 
+= 1) { + float offTL = DIM3_INDEX(input1, b, c, yT + fy, xL + fx, + batch_stride_x1, channel_stride_x1, height_stride_x1); + float offTR = DIM3_INDEX(input1, b, c, yT + fy, xR + fx, + batch_stride_x1, channel_stride_x1, height_stride_x1); + float offBL = DIM3_INDEX(input1, b, c, yB + fy, xL + fx, + batch_stride_x1, channel_stride_x1, height_stride_x1); + float offBR = DIM3_INDEX(input1, b, c, yB + fy, xR + fx, + batch_stride_x1, channel_stride_x1, height_stride_x1); + val += (1. - alpha)*(1. - beta) * offTL; + val += (alpha)*(1. - beta) * offTR; + val += (1. - alpha)*(beta) * offBL; + val += (alpha)*(beta) * offBR; + } + } + out_data[index] = val; + } else { + int xN = max(min(static_cast<int>(floor(xf + 0.5)), dim_w - 1), 0); + int yN = max(min(static_cast<int>(floor(yf + 0.5)), dim_h - 1), 0); + out_data[index] = DIM3_INDEX(input1, b, c, yN, xN, batch_stride_x1, channel_stride_x1, height_stride_x1); + } +} + + + +extern "C" int Resample2d(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, void *stream, + void *extra) { + cudaStream_t custream = static_cast<cudaStream_t>(stream); + constexpr int INPUT1_INDEX = 0; + constexpr int INPUT2_INDEX = 1; + constexpr int OUTPUT_INDEX = 2; + constexpr int TOTAL_PARAM_NUM = 3; + + if (nparam != TOTAL_PARAM_NUM) { + return 1; + } + // This is to check if the type of parameters the same as what the user wants. + for (int i = 0; i < nparam; i++) { + if (strcmp(dtypes[i], "float32") != 0) { + return 2; + } + } + + float *x1 = static_cast<float *>(params[0]); + float *x2 = static_cast<float *>(params[1]); + + float *out_data = static_cast<float *>(params[2]); + +// int batch_x1 = shapes[INPUT1_INDEX][0]; + int channel_x1 = shapes[INPUT1_INDEX][1]; + int height_x1 = shapes[INPUT1_INDEX][2]; + int width_x1 = shapes[INPUT1_INDEX][3]; + +// int batch_x2 = shapes[INPUT2_INDEX][0]; + int channel_x2 = shapes[INPUT2_INDEX][1]; + int height_x2 = shapes[INPUT2_INDEX][2]; + int width_x2 = shapes[INPUT2_INDEX][3]; + + int batch_output = shapes[OUTPUT_INDEX][0]; + int channel_output = shapes[OUTPUT_INDEX][1]; + int height_output = shapes[OUTPUT_INDEX][2]; + int width_output = shapes[OUTPUT_INDEX][3]; + + // fix at now ,need to be changed in future + const int kernel_size = 1; + const bool bilinear = true; + + int batch_stride_x1 = channel_x1 * height_x1 * width_x1; + int channel_stride_x1 = height_x1 * width_x1; + int height_stride_x1 = width_x1; + int batch_stride_x2 = channel_x2 * height_x2 * width_x2; + int channel_stride_x2 = height_x2 * width_x2; + int height_stride_x2 = width_x2; + size_t size = batch_output * channel_output * height_output * width_output; + Resample2dInitKernel<<<size / CUDA_NUM_THREADS +1, CUDA_NUM_THREADS, 0, custream>>>(size, out_data); + + Resample2dKernel<<< (size + CUDA_NUM_THREADS - 1)/CUDA_NUM_THREADS, CUDA_NUM_THREADS, 0, custream>>> + (size, x1, x2, out_data, batch_stride_x1, channel_stride_x1, height_stride_x1, + batch_stride_x2, channel_stride_x2, height_stride_x2, batch_output, channel_output, + height_output, width_output, kernel_size , bilinear); + return 0; +} + + +__global__ void kernel_resample2d_grad_input1(size_t size, + const float* input1, int batch_input1, int channel_input1, int height_input1, int width_input1, + const float* input2, int batch_stride_input2, int channel_stride_input2, int height_stride_input2, + const float* gradOutput, int batch_gradOutput, int channel_gradOutput, int height_gradOutput, int width_gradOutput, + int batch_stride_gradOutput, int channel_stride_gradOutput, int 
height_stride_gradOutput, + float* gradInput, int batch_stride_gradInput, int channel_stride_gradInput, int height_stride_gradInput, + int kernel_size, bool bilinear) { + + int index = blockIdx.x * blockDim.x + threadIdx.x; + + if (index >= size) { + return; + } + + int dim_b = batch_gradOutput; + int dim_c = channel_gradOutput; + int dim_h = height_gradOutput; + int dim_w = width_gradOutput; + int dim_chw = dim_c * dim_h * dim_w; + int dim_hw = dim_h * dim_w; + + int b = (index / dim_chw) % dim_b; + int c = (index / dim_hw) % dim_c; + int y = (index / dim_w) % dim_h; + int x = (index) % dim_w; + + float dx = DIM3_INDEX(input2, b, 0, y, x, batch_stride_input2, channel_stride_input2, height_stride_input2); + float dy = DIM3_INDEX(input2, b, 1, y, x, batch_stride_input2, channel_stride_input2, height_stride_input2); + + float xf = x + dx; + float yf = y + dy; + float alpha = xf - static_cast<int>(xf); // alpha + float beta = yf - static_cast<int>(yf); // beta + + int idim_h = height_input1; + int idim_w = width_input1; + + int xL = max(min(static_cast<int>(floor(xf)), idim_w-1), 0); + int xR = max(min(static_cast<int>(floor(xf)+1), idim_w -1), 0); + int yT = max(min(static_cast<int>(floor(yf)), idim_h-1), 0); + int yB = max(min(static_cast<int>(floor(yf)+1), idim_h-1), 0); + + float w1, w2, w3, w4; + float num = 1.f; + w1 = (num-alpha)*(num-beta); + w2 = (alpha)*(num-beta); + w3 = (num-alpha)*(beta); + w4 = (alpha)*(beta); + + float gradnum = DIM3_INDEX(gradOutput, b, c, y, x, + batch_stride_gradOutput, channel_stride_gradOutput, height_stride_gradOutput); + for (int fy = 0; fy < kernel_size; fy += 1) { + for (int fx = 0; fx < kernel_size; fx += 1) { + int indexTL = GET_INDEX(b, c, (yT + fy), (xL + fx), + batch_stride_gradInput, channel_stride_gradInput, height_stride_gradInput); + MsAtomicAdd(&gradInput[indexTL], w1 * gradnum); + + int indexTR = GET_INDEX(b, c, (yT + fy), (xR + fx), + batch_stride_gradInput, channel_stride_gradInput, height_stride_gradInput); + MsAtomicAdd(&gradInput[indexTR], w2 * gradnum); + + int indexBL = GET_INDEX(b, c, (yB + fy), (xL + fx), + batch_stride_gradInput, channel_stride_gradInput, height_stride_gradInput); + MsAtomicAdd(&gradInput[indexBL], w3 * gradnum); + + int indexBR = GET_INDEX(b, c, (yB + fy), (xR + fx), + batch_stride_gradInput, channel_stride_gradInput, height_stride_gradInput); + MsAtomicAdd(&gradInput[indexBR], w4 * gradnum); + } + } +} + + +__global__ void kernel_resample2d_grad_input2(size_t size, + const float *input1, int batch_stride_input1, int channel_stride_input1, int height_stride_input1, + const float *input2, int batch_stride_input2, int channel_stride_input2, int height_stride_input2, + const float *gradOutput, int channel_gradOutput, int batch_stride_gradOutput, + int channel_stride_gradOutput, int height_stride_gradOutput, + float *gradInput, int batch_gradInput, int channel_gradInput, int height_gradInput, int width_gradInput, + int batch_stride_gradInput, int channel_stride_gradInput, int height_stride_gradInput, + int kernel_size, bool bilinear) { + int index = blockIdx.x * blockDim.x + threadIdx.x; + + if (index >= size) { + return; + } + + float output = 0.0; + int kernel_rad = (kernel_size - 1)/2; + + int dim_b = batch_gradInput; + int dim_c = channel_gradInput; + int dim_h = height_gradInput; + int dim_w = width_gradInput; + int dim_chw = dim_c * dim_h * dim_w; + int dim_hw = dim_h * dim_w; + + int b = (index / dim_chw) % dim_b; + int c = (index / dim_hw) % dim_c; + int y = (index / dim_w) % dim_h; + int x = (index) % dim_w; 
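+    // Note: each thread below produces one element of the flow gradient (dx2). For the
+    // horizontal-flow channel (c % 2 == 0) it differences the bilinear interpolation weights
+    // of input1 along x, for the vertical-flow channel (c % 2 == 1) along y, accumulating
+    // the contribution of every image channel of gradOutput at pixel (y, x).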
+ + int odim_c = channel_gradOutput; + + float dx = DIM3_INDEX(input2, b, 0, y, x, batch_stride_input2, channel_stride_input2, height_stride_input2); + float dy = DIM3_INDEX(input2, b, 1, y, x, batch_stride_input2, channel_stride_input2, height_stride_input2); + + float xf = x + dx; + float yf = y + dy; + + int xL = max(min(static_cast<int>(floor(xf)), dim_w-1), 0); + int xR = max(min(static_cast<int>(floor(xf)+1), dim_w -1), 0); + int yT = max(min(static_cast<int>(floor(yf)), dim_h-1), 0); + int yB = max(min(static_cast<int>(floor(yf)+1), dim_h-1), 0); + + if (c % 2) { + float gamma = 1 - (xf - floor(xf)); // alpha + for (int i = 0; i <= 2*kernel_rad ; ++i) { + for (int j = 0; j <= 2*kernel_rad; ++j) { + for (int ch = 0; ch < odim_c; ++ch) { + float gradout = DIM3_INDEX(gradOutput, b, ch, y, x, + batch_stride_gradOutput, channel_stride_gradOutput, height_stride_gradOutput); + output += (gamma) * gradout * DIM3_INDEX(input1, b, ch, (yB + j), (xL + i), + batch_stride_input1, channel_stride_input1, height_stride_input1); + output -= (gamma) * gradout * DIM3_INDEX(input1, b, ch, (yT + j), (xL + i), + batch_stride_input1, channel_stride_input1, height_stride_input1); + output += (1-gamma) * gradout * DIM3_INDEX(input1, b, ch, (yB + j), (xR + i), + batch_stride_input1, channel_stride_input1, height_stride_input1); + output -= (1-gamma) * gradout * DIM3_INDEX(input1, b, ch, (yT + j), (xR + i), + batch_stride_input1, channel_stride_input1, height_stride_input1); + } + } + } + } else { + float gamma = 1 - (yf - floor(yf)); // alpha + for (int i = 0; i <= 2*kernel_rad; ++i) { + for (int j = 0; j <= 2*kernel_rad; ++j) { + for (int ch = 0; ch < odim_c; ++ch) { + float gradout = static_cast<float>(DIM3_INDEX(gradOutput, b, ch, y, x, + batch_stride_gradOutput, channel_stride_gradOutput, height_stride_gradOutput)); + output += (gamma) * gradout * static_cast<float>(DIM3_INDEX(input1, b, ch, (yT + j), (xR + i), + batch_stride_input1, channel_stride_input1, height_stride_input1)); + output -= (gamma)* gradout * static_cast<float>(DIM3_INDEX(input1, b, ch, (yT + j), (xL + i), + batch_stride_input1, channel_stride_input1, height_stride_input1)); + output += (1-gamma)* gradout * static_cast<float>(DIM3_INDEX(input1, b, ch, (yB + j), (xR + i), + batch_stride_input1, channel_stride_input1, height_stride_input1)); + output -= (1-gamma) * gradout * static_cast<float>(DIM3_INDEX(input1, b, ch, (yB + j), (xL + i), + batch_stride_input1, channel_stride_input1, height_stride_input1)); + } + } + } + } + gradInput[index] = output; +} + + +extern "C" int Resample2dGrad(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, + void *stream, void *extra) { + cudaStream_t custream = static_cast<cudaStream_t>(stream); + constexpr int INPUT1_INDEX = 0; + constexpr int INPUT2_INDEX = 1; + constexpr int GRAD_OUTPUT_INDEX = 2; + constexpr int TOTAL_PARAM_NUM = 5; + + if (nparam != TOTAL_PARAM_NUM) { + return 1; + } + // This is to check if the type of parameters the same as what the user wants. 
+ for (int i = 0; i < nparam; i++) { + if (strcmp(dtypes[i], "float32") != 0) { + return 2; + } + } + + float *x1 = static_cast<float *>(params[0]); + float *x2 = static_cast<float *>(params[1]); + float *dout = static_cast<float *>(params[2]); + float *dx1 = static_cast<float *>(params[3]); + float *dx2 = static_cast<float *>(params[4]); + + int batch_x1 = shapes[INPUT1_INDEX][0]; + int channel_x1 = shapes[INPUT1_INDEX][1]; + int height_x1 = shapes[INPUT1_INDEX][2]; + int width_x1 = shapes[INPUT1_INDEX][3]; + + int batch_x2 = shapes[INPUT2_INDEX][0]; + int channel_x2 = shapes[INPUT2_INDEX][1]; + int height_x2 = shapes[INPUT2_INDEX][2]; + int width_x2 = shapes[INPUT2_INDEX][3]; + + int batch_dout = shapes[GRAD_OUTPUT_INDEX][0]; + int channel_dout = shapes[GRAD_OUTPUT_INDEX][1]; + int height_dout = shapes[GRAD_OUTPUT_INDEX][2]; + int width_dout = shapes[GRAD_OUTPUT_INDEX][3]; + + // fix at now ,need to be changed in future + const int kernel_size = 1; + const bool bilinear = true; + + int batch_dx1 = batch_x1; + int channel_dx1 = channel_x1; + int height_dx1 = height_x1; + int width_dx1 = width_x1; + int batch_dx2 = batch_x2; + int channel_dx2 = channel_x2; + int height_dx2 = height_x2; + int width_dx2 = width_x2; + int batch_stride_x1 = channel_x1 * height_x1 * width_x1; + int channel_stride_x1 = height_x1 * width_x1; + int height_stride_x1 = width_x1; +// int width_stride_x1 = 1; + int batch_stride_x2 = channel_x2 * height_x2 * width_x2; + int channel_stride_x2 = height_x2 * width_x2; + int height_stride_x2 = width_x2; +// int width_stride_x2 = 1; + int batch_stride_dx1 = batch_stride_x1; + int channel_stride_dx1 = channel_stride_x1; + int height_stride_dx1 = height_stride_x1; +// int width_stride_dx1 = width_stride_x1; + int batch_stride_dx2 = batch_stride_x2; + int channel_stride_dx2 = channel_stride_x2; + int height_stride_dx2 = height_stride_x2; +// int width_stride_dx2 = width_stride_x2; + int batch_stride_dout = channel_dout * height_dout * width_dout; + int channel_stride_dout = height_dout * width_dout; + int height_stride_dout = width_dout; +// int width_stride_dout = 1; + + size_t dx1_size = batch_dx1 * channel_dx1 * height_dx1 * width_dx1; + + Resample2dInitKernel<<<dx1_size / CUDA_NUM_THREADS +1, CUDA_NUM_THREADS, 0, custream>>>(dx1_size, dx1); + size_t dx2_size = batch_dx2 * channel_dx2 * height_dx2 * width_dx2; + Resample2dInitKernel<<<dx2_size / CUDA_NUM_THREADS +1, CUDA_NUM_THREADS, 0, custream>>>(dx2_size, dx2); + + size_t dout_size = batch_dout * channel_dout * height_dout * width_dout; + + kernel_resample2d_grad_input1<<<(dout_size + CUDA_NUM_THREADS - 1)/CUDA_NUM_THREADS, CUDA_NUM_THREADS, + 0, custream>>>(dout_size, + x1, batch_x1, channel_x1, height_x1, width_x1, + x2, batch_stride_x2, channel_stride_x2, height_stride_x2, + dout, batch_dout, channel_dout, height_dout, width_dout, + batch_stride_dout, channel_stride_dout, height_stride_dout, + dx1, batch_stride_dx1, channel_stride_dx1, height_stride_dx1, + kernel_size, bilinear); + + kernel_resample2d_grad_input2<<<(dx2_size + CUDA_NUM_THREADS - 1)/CUDA_NUM_THREADS, CUDA_NUM_THREADS, + 0, custream>>>(dx2_size, + x1, batch_stride_x1, channel_stride_x1, height_stride_x1, + x2, batch_stride_x2, channel_stride_x2, height_stride_x2, + dout, channel_dout, batch_stride_dout, channel_stride_dout, height_stride_dout, + dx2, batch_dx2, channel_dx2, height_dx2, width_dx2, + batch_stride_dx2, channel_stride_dx2, height_stride_dx2, + kernel_size, bilinear); + return 0; +} diff --git 
a/research/cv/flownet2/src/submodels/submodules.py b/research/cv/flownet2/src/submodels/submodules.py new file mode 100644 index 0000000000000000000000000000000000000000..c8adc05bcbe9c1fde372725fbd30a89ff8dcd521 --- /dev/null +++ b/research/cv/flownet2/src/submodels/submodules.py @@ -0,0 +1,98 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +import mindspore.nn as nn +import mindspore.ops as ops + +class Norm(nn.Cell): + def __init__(self, axis=1, keep_dims=False): + super(Norm, self).__init__() + self.axis = axis + self.keep_dims = keep_dims + self.reduce_sum = ops.ReduceSum(True) + self.sqrt = ops.Sqrt() + self.squeeze = ops.Squeeze(self.axis) + + def construct(self, x): + x = self.sqrt(ops.maximum(self.reduce_sum(ops.square(x), self.axis), 1e-7)) + + if not self.keep_dims: + x = self.squeeze(x) + return x + + +class ChannelNorm(nn.Cell): + def __init__(self, axis=1): + super(ChannelNorm, self).__init__() + self.axis = axis + self.add = ops.Add() + self.norm = Norm(axis) + + def construct(self, x): + output = self.norm(x) + output = output.reshape(output.shape[0], 1, output.shape[1], output.shape[2]) + return output + + +class Upsample(nn.Cell): + + def __init__(self, scale_factor=4, mode='bilinear'): + super(Upsample, self).__init__() + self.scale_factor = scale_factor + self.mode = mode + + def construct(self, x): + shape = x.shape + new_height = shape[2] * self.scale_factor + new_width = shape[3] * self.scale_factor + if self.mode == 'nearest': + upsample_op = ops.ResizeNearestNeighbor((new_height, new_width)) + else: + upsample_op = ops.ResizeBilinear((new_height, new_width)) + return upsample_op(x) + + +def conv(batchnorm, in_planes, out_planes, kernel_size=3, stride=1): + if batchnorm: + conv2d = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, pad_mode='pad', + padding=(kernel_size - 1) // 2, has_bias=False) + batchNorm2d = nn.BatchNorm2d(out_planes) + leakyReLU = nn.LeakyReLU(0.1) + return nn.SequentialCell([conv2d, batchNorm2d, leakyReLU]) + conv2d = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, pad_mode='pad', + padding=(kernel_size - 1) // 2, has_bias=True) + leakyReLU = nn.LeakyReLU(0.1) + return nn.SequentialCell([conv2d, leakyReLU]) + + +def i_conv(batchNorm, in_planes, out_planes, kernel_size=3, stride=1, bias=True): + if batchNorm: + conv2d = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, pad_mode='pad', + padding=(kernel_size - 1) // 2, has_bias=bias) + batchNorm2d = nn.BatchNorm2d(out_planes) + return nn.SequentialCell([conv2d, batchNorm2d]) + conv2d = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, pad_mode='pad', + padding=(kernel_size - 1) // 2, has_bias=bias) + return nn.SequentialCell([conv2d]) + + +def predict_flow(in_planes): + return nn.Conv2d(in_planes, 2, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True) + + +def 
deconv(in_planes, out_planes): + convTranspose2d = nn.Conv2dTranspose(in_planes, out_planes, kernel_size=4, stride=2, pad_mode='pad', padding=1, + has_bias=True) + leakyReLU = nn.LeakyReLU(0.1) + return nn.SequentialCell([convTranspose2d, leakyReLU]) diff --git a/research/cv/flownet2/train.py b/research/cv/flownet2/train.py new file mode 100644 index 0000000000000000000000000000000000000000..e6615a6b28898a2ea6802d7dfa1c4ed850e0ab87 --- /dev/null +++ b/research/cv/flownet2/train.py @@ -0,0 +1,162 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +import os +import datetime +import glob +import mindspore as ms +import mindspore.dataset as ds +import mindspore.log as logger +import mindspore.nn as nn +from mindspore.context import ParallelMode +from mindspore.nn.optim.adam import Adam +from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor +from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager +from mindspore.train.model import Model +from mindspore.train.serialization import load_checkpoint, load_param_into_net +from mindspore.communication.management import init, get_rank, get_group_size +from mindspore.common import set_seed + +from src.eval_callback import EvalCallBack +import src.dataset as datasets +import src.models as models +from src.metric import FlowNetEPE +import src.model_utils.tools as tools +from src.model_utils.config import config + + +def set_save_ckpt_dir(): + """set save ckpt dir""" + ckpt_save_dir = config.save_checkpoint_path + if config.run_distribute: + ckpt_save_dir = ckpt_save_dir + "/ckpt_" + str(get_rank()) + "/" + return ckpt_save_dir + +def apply_eval(eval_param): + eval_model = eval_param["model"] + eval_ds = eval_param["dataset"] + metrics_name = eval_param["metrics_name"] + res = eval_model.eval(eval_ds, dataset_sink_mode=False) + return res[metrics_name] + + +def load_pre_trained_checkpoint(net, pre_trained, checkpoint_path): + param_dict = None + if pre_trained: + if os.path.isdir(checkpoint_path): + ckpt_save_dir = os.path.join(checkpoint_path, "ckpt_0") + ckpt_pattern = os.path.join(ckpt_save_dir, "*.ckpt") + ckpt_files = glob.glob(ckpt_pattern) + if not ckpt_files: + logger.warning(f"There is no ckpt file in {ckpt_save_dir}, " + f"pre_trained is unsupported.") + else: + ckpt_files.sort(key=os.path.getmtime, reverse=True) + time_stamp = datetime.datetime.now() + print(f"time stamp {time_stamp.strftime('%Y.%m.%d-%H:%M:%S')}" + f" pre trained ckpt model {ckpt_files[0]} loading", + flush=True) + param_dict = load_checkpoint(ckpt_files[0]) + elif os.path.isfile(checkpoint_path): + param_dict = load_checkpoint(checkpoint_path) + else: + print(f"Invalid pre_trained {checkpoint_path} parameter.") + return + load_param_into_net(net, param_dict) + print(f"loaded param from {checkpoint_path} into net") + + +def add_ckpt_callback(step_size, ckpt_save_dir, cbs): + 
if config.save_checkpoint: + config_ck = CheckpointConfig(save_checkpoint_steps=step_size * config.save_ckpt_interval, + keep_checkpoint_max=config.keep_checkpoint_max) + ckpoint_cb = ModelCheckpoint(prefix="flownet2_", directory=ckpt_save_dir, config=config_ck) + cbs += [ckpoint_cb] + + +def add_eval_callback(model, ckpt_save_dir, cbs): + if config.run_evalCallback: + if config.eval_data_path is None or (not os.path.isdir(config.eval_data_path)): + raise ValueError("{} is not a existing path.".format(config.eval_data_path)) + + config.eval_dataset_class = tools.module_to_dict(datasets)[config.eval_data] + flownet_eval_gen = config.eval_dataset_class("Center", config.crop_size, config.eval_size, + config.eval_data_path) + eval_dataset = ds.GeneratorDataset(flownet_eval_gen, ["images", "flow"], + num_parallel_workers=config.num_parallel_workers, + max_rowsize=config.max_rowsize) + eval_dataset = eval_dataset.batch(config.batch_size) + + eval_param_dict = {"model": model, "dataset": eval_dataset, "metrics_name": "FlowNetEPE"} + eval_cb = EvalCallBack(apply_eval, eval_param_dict, interval=config.eval_interval, + eval_start_epoch=config.eval_start_epoch, save_best_ckpt=config.save_best_ckpt, + ckpt_directory=ckpt_save_dir, besk_ckpt_name="best_acc.ckpt", + metrics_name="FlowNetEPE") + cbs += [eval_cb] + + +def run_train(): + set_seed(config.seed) + ms.set_context(mode=ms.context.GRAPH_MODE, enable_graph_kernel=True, device_target=config.device_target) + ds.config.set_enable_shared_mem(False) + if config.device_target == "GPU": + if config.run_distribute: + init() + parallel_mode = ParallelMode.DATA_PARALLEL + rank = get_rank() + group_size = get_group_size() + else: + parallel_mode = ParallelMode.STAND_ALONE + rank = 0 + group_size = 1 + + ms.context.set_auto_parallel_context(parallel_mode=parallel_mode, gradients_mean=True, device_num=group_size) + + # load dataset by config param + config.training_dataset_class = tools.module_to_dict(datasets)[config.train_data] + flownet_train_gen = config.training_dataset_class(config.crop_type, config.crop_size, config.eval_size, + config.train_data_path) + sampler = datasets.DistributedSampler(flownet_train_gen, rank, group_size, shuffle=True) + train_dataset = ds.GeneratorDataset(flownet_train_gen, ["images", "flow"], + sampler=sampler, num_parallel_workers=config.num_parallel_workers) + train_dataset = train_dataset.batch(config.batch_size) + step_size = train_dataset.get_dataset_size() + + # load model by config param + config.model_class = tools.module_to_dict(models)[config.model] + net = config.model_class(config.rgb_max, config.batchNorm) + + loss = nn.L1Loss() + if config.is_dynamicLoss_scale == 1: + loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000) + else: + loss_scale_manager = FixedLossScaleManager(config.scale, drop_overflow_update=False) + + optim = Adam(params=net.trainable_params(), learning_rate=config.lr) + + load_pre_trained_checkpoint(net, config.pre_trained, config.pre_trained_ckpt_path) + model = Model(net, loss_fn=loss, optimizer=optim, metrics={'FlowNetEPE': FlowNetEPE()}, + amp_level="O0", keep_batchnorm_fp32=True, loss_scale_manager=loss_scale_manager) + # add callback + time_cb = TimeMonitor(data_size=step_size) + loss_cb = LossMonitor() + cbs = [time_cb, loss_cb] + ckpt_save_dir = set_save_ckpt_dir() + add_ckpt_callback(step_size, ckpt_save_dir, cbs) + add_eval_callback(model, ckpt_save_dir, cbs) + + model.train(config.epoch_size, train_dataset, callbacks=cbs, 
dataset_sink_mode=True) + +if __name__ == '__main__': + run_train()
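For reference, the warping computed by `Resample2dKernel` above (with the current defaults `kernel_size = 1`, `bilinear = true`) is plain backward warping with bilinear interpolation: each output pixel samples `input1` at the position displaced by the two-channel flow `input2`, with the four neighbouring sample positions clamped to the image border. The NumPy sketch below is not part of the repository and the name `resample2d_reference` is hypothetical; it only restates the same computation in Python and may be handy for spot-checking the compiled `resample2d.so` operator on small tensors.

```python
# Minimal NumPy sketch of the Resample2d forward pass (kernel_size=1, bilinear=True).
# Illustration only; not part of the FlowNet2 source tree.
import numpy as np


def resample2d_reference(input1, input2):
    """Warp input1 (N, C, H, W) by the flow field input2 (N, 2, H, W)."""
    n, _, h, w = input1.shape
    out = np.zeros_like(input1)
    for b in range(n):
        for y in range(h):
            for x in range(w):
                # Displaced sampling position: channel 0 of the flow is dx, channel 1 is dy.
                xf = x + input2[b, 0, y, x]
                yf = y + input2[b, 1, y, x]
                alpha = xf - np.floor(xf)
                beta = yf - np.floor(yf)
                # Clamp the four neighbours to the border, mirroring the max/min in the kernel.
                xl = int(np.clip(np.floor(xf), 0, w - 1))
                xr = int(np.clip(np.floor(xf) + 1, 0, w - 1))
                yt = int(np.clip(np.floor(yf), 0, h - 1))
                yb = int(np.clip(np.floor(yf) + 1, 0, h - 1))
                # Bilinear blend of the four neighbouring pixels for every channel at once.
                out[b, :, y, x] = ((1 - alpha) * (1 - beta) * input1[b, :, yt, xl]
                                   + alpha * (1 - beta) * input1[b, :, yt, xr]
                                   + (1 - alpha) * beta * input1[b, :, yb, xl]
                                   + alpha * beta * input1[b, :, yb, xr])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((1, 3, 8, 8)).astype(np.float32)
    flow = rng.standard_normal((1, 2, 8, 8)).astype(np.float32)
    print(resample2d_reference(img, flow).shape)  # (1, 3, 8, 8)
```

On a small random input pair like the one above, the output of the custom GPU operator should agree with this reference up to float32 rounding, which makes it a convenient sanity check after compiling the custom ops.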