Commit e4502e6b authored by i-robot, committed by Gitee

!3370 Ascend Zhongzhi - Xidian University - MindSpore ONNX - resnet3d

Merge pull request !3370 from cfreshgirl/resnet3d_onnx
parents 76bd0d1f 924f83dc
@@ -14,6 +14,7 @@
- [Script parameters](#脚本参数)
- [Training process](#训练过程)
- [Evaluation process](#评估过程)
- [ONNX evaluation](#ONNX评估)
- [Export process](#导出过程)
- [Export](#导出)
- [Inference process](#推理过程)
@@ -54,9 +55,11 @@ The overall network architecture of resnet3d is as follows:
- [MIT](http://moments.csail.mit.edu/)
    - Moments in Time, a new million-scale video understanding dataset released by the MIT-IBM Watson AI Lab, with 1,000,000 videos in total. Used for pre-training.
- [hmdb51](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#Downloads)
    - A small video action recognition dataset with 51 action classes and 6,849 videos in total; each class contains at least 51 videos. Used for fine-tuning; the Stabilized HMDB51 release is used here.
    - Labels: http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar
- [UCF101](https://www.crcv.ucf.edu/data/UCF101/UCF101.rar)
    - An action recognition dataset of realistic action videos collected from YouTube, with 101 action classes and 13,320 videos in total. Used for fine-tuning.
    - Labels: https://www.crcv.ucf.edu/data/UCF101/UCF101TrainTestSplits-RecognitionTask.zip
Pre-trained model download address:
[Link](https://github.com/kenshohara/3D-ResNets-PyTorch)
@@ -68,7 +71,7 @@ python pth_to_ckpt.py --pth_path=./pretrained.pth --ckpt_path=./pretrained.ckpt
Special notes:
Create the directory structure shown below. Extract the downloaded hmdb51_sta.rar and put the extracted folders into the videos directory. Extract the dataset's labels archive and move the extracted txt files into the labels directory.
```text
.
@@ -87,18 +90,18 @@ python pth_to_ckpt.py --pth_path=./pretrained.pth --ckpt_path=./pretrained.ckpt
└──json
```
Use src/generate_video_jpgs.py to convert the avi video files into jpg image files.
```text
cd ~/src
python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_path ~/dataset/hmdb51/jpg/
```
Use src/generate_hmdb51_json.py to generate the annotation file in json format.
```text
cd ~/src
python3 generate_hmdb51_json.py --dir_path ~/dataset/hmdb51/labels/ --video_path ~/dataset/hmdb51/jpg/ --dst_dir_path ~/dataset/hmdb51/json
```
# Features
@@ -115,7 +118,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
- [MindSpore](https://www.mindspore.cn/install/en)
- For more details, see the following resources:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start
@@ -146,6 +149,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
├── run_distribute_train.sh  # Launch distributed training on Ascend (8 devices)
├── run_eval.sh  # Launch evaluation on Ascend
├── run_standalone_train.sh  # Launch standalone training on Ascend (single device)
├── run_eval_onnx.sh  # Shell script for ONNX evaluation
├── src
├── __init__.py
├── config.py  # yaml file parsing
@@ -165,6 +169,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
└── videodataset_multiclips.py  # Custom dataset loading
├── pth_to_ckpt.py  # Convert the pre-trained model from pth to ckpt format
├── eval.py  # Evaluate the network
├── eval_onnx.py  # ONNX evaluation script
├── train.py  # Train the network
├── hmdb51_config.yaml  # Parameter configuration
└── ucf101_config.yaml  # Parameter configuration
@@ -180,6 +185,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
'result_path': './results/ucf101',  # Path for training and inference results
'pretrain_path': '~/your_path/pretrained.ckpt',  # Path of the pre-trained model file
'inference_ckpt_path': "~/your_path/results/ucf101/result.ckpt",  # Path of the model file used for inference
'onnx_path': "~/your_path/results/result-3d.onnx",  # Path of the ONNX model file used for inference
'n_classes': 101,  # Number of dataset classes
'sample_size': 112,  # Image resolution
'sample_duration': 16,  # Video clip length, in frames
@@ -303,6 +309,48 @@ clip: 66.5% top-1: 69.7% top-5: 93.8%
clip: 88.8% top-1: 92.7% top-5: 99.3%
```
## ONNX evaluation
### Export the ONNX model
```bash
python export.py --ckpt_file=/path/best.ckpt --file_format=ONNX --n_classes=51 --batch_size=1 --device_target=GPU
```
- `ckpt_file` Path of the ckpt file
- `file_format` Export format; ONNX here
- `n_classes` Number of classes in the dataset: 51 for hmdb51, 101 for ucf101
- `batch_size` Batch size, fixed at 1
- `device_target` Currently only GPU and CPU are supported; a quick sanity check of the exported model is sketched below
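Before running the full evaluation, the exported file can be sanity-checked directly with onnxruntime. The sketch below is illustrative only; it assumes the exported graph takes a single clip laid out as (batch, channel, frames, height, width) = (1, 3, 16, 112, 112), which follows sample_duration=16 and sample_size=112 above. Check `session.get_inputs()[0].shape` if your export differs.

```python
# Hedged sanity check: load the exported ONNX file and run one random clip.
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("resnet-3d.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_clip = np.random.rand(1, 3, 16, 112, 112).astype(np.float32)  # assumed input layout
logits = session.run(None, {input_name: dummy_clip})[0]
print(input_name, logits.shape)  # expect (1, 51) for hmdb51 or (1, 101) for ucf101
```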
### Run the ONNX model evaluation
```bash
Usage:   bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]
Example: bash run_eval_onnx.sh ucf101 /path/ucf101/jpg /path/ucf101/json/ucf101_01.json /path/resnet-3d.onnx
```
- `[ucf101|hmdb51]` Dataset to use
- `[VIDEO_PATH]` Path of the video frames
- `[ANNOTATION_PATH]` Path of the annotation file
- `[ONNX_PATH]` Path of the ONNX model
### Results
The evaluation results are saved to a log file; run_eval_onnx.sh writes it as eval_$DATASET.log (for example, eval_ucf101.log). You can find results such as the following in that log; the sketch after the result blocks illustrates how these metrics are computed:
- Evaluating resnet3d on the hmdb51 dataset
```text
clip: 66.5% top-1: 69.7% top-5: 93.8%
```
- Evaluating resnet3d on the ucf101 dataset
```text
clip: 88.8% top-1: 92.7% top-5: 99.3%
```
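For reference, the three numbers are produced as follows: clip accuracy scores every 16-frame clip independently, while top-1 and top-5 first average the clip scores of each video and then check whether the ground-truth class is among the 1 or 5 highest-scoring classes, mirroring what eval_onnx.py does with `average_scores` and `topk_`. The sketch below is illustrative only; the scores are random and the variable names are not taken from the repository.

```python
# Illustrative computation of clip / top-1 / top-5 accuracy for a single video.
import numpy as np

def topk_hit(scores, true_class, k):
    """Return 1 if true_class is among the k largest scores, else 0."""
    return int(true_class in np.argsort(scores)[::-1][:k])

clip_scores = np.random.rand(10, 51)     # 10 clips of one video, 51 classes (hmdb51)
true_class = 3
video_scores = clip_scores.mean(axis=0)  # video-level scores = mean over clip scores
clip_acc = np.mean([topk_hit(s, true_class, 1) for s in clip_scores])
print(clip_acc, topk_hit(video_scores, true_class, 1), topk_hit(video_scores, true_class, 5))
```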
## Export process
### Export
@@ -310,7 +358,7 @@ clip: 88.8% top-1: 92.7% top-5: 99.3%
When exporting, set n_classes to 51 for the hmdb51 dataset and to 101 for the ucf101 dataset; batch_size can only be set to 1.
```shell
python export.py --ckpt_file=./saved_model/best.ckpt --file_format=MINDIR --n_classes=51 --batch_size=1
```
## Inference process
...
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Eval.
"""
import time
import random
import json
from collections import defaultdict
import numpy as np
import onnxruntime
from mindspore import dataset as de
from mindspore.common import set_seed
from src.config import config as args_opt
from src.dataset import create_eval_dataset
from src.inference import (topk_, get_video_results, load_ground_truth, load_result,
                           remove_nonexistent_ground_truth, calculate_clip_acc)
from src.videodataset_multiclips import get_target_path
random.seed(1)
np.random.seed(1)
de.config.set_seed(1)
set_seed(1)
if __name__ == '__main__':
    t1_ = time.time()
    cfg = args_opt
    print(cfg)
    target = args_opt.device_target
    # Pick the ONNX Runtime execution provider that matches the requested device.
    if target == 'GPU':
        providers = ['CUDAExecutionProvider']
    elif target == 'CPU':
        providers = ['CPUExecutionProvider']
    else:
        raise ValueError(
            f'Unsupported target device {target}, '
            f'Expected one of: "CPU", "GPU"'
        )
    session = onnxruntime.InferenceSession(args_opt.onnx_path, providers=providers)
    predict_data = create_eval_dataset(
        cfg.video_path, cfg.annotation_path, cfg)
    size = predict_data.get_dataset_size()
    total_target_path = get_target_path(cfg.annotation_path)
    with total_target_path.open('r') as f:
        total_target_data = json.load(f)
    results = {'results': defaultdict(list)}
    count = 0
    for data in predict_data.create_dict_iterator(output_numpy=True):
        t1 = time.time()
        x, label = data['data'][0], data['label'].tolist()
        video_ids, segments = zip(
            *total_target_data['targets'][str(label[0])])
        # The model was exported with batch_size=1, so feed the clips one at a time.
        x_list = np.split(x, x.shape[0], axis=0)
        outputs = []
        for x in x_list:
            inputs = {session.get_inputs()[0].name: x}
            output = session.run(None, inputs)[0]
            outputs.append(output)
        outputs = np.concatenate(outputs, axis=0)
        _, locs = topk_(outputs, K=1)
        locs = locs.reshape(1, -1)
        t2 = time.time()
        print("[{} / {}] Net time: {} ms".format(count, size, (t2 - t1) * 1000))
        # Collect the per-clip scores of each video.
        for j in range(0, outputs.shape[0]):
            results['results'][video_ids[j]].append({
                'segment': segments[j],
                'output': outputs[j]
            })
        count += 1
    class_names = total_target_data['class_names']
    inference_results = {'results': {}}
    clips_inference_results = {'results': {}}
    # Average the clip scores of each video to obtain video-level predictions.
    for video_id, video_results in results['results'].items():
        video_outputs = [
            segment_result['output'] for segment_result in video_results
        ]
        video_outputs = np.stack(video_outputs, axis=0)
        average_scores = np.mean(video_outputs, axis=0)
        clips_inference_results['results'][video_id] = get_video_results(
            average_scores, class_names, 5)
        inference_results['results'][video_id] = []
        for segment_result in video_results:
            segment = segment_result['segment']
            result = get_video_results(segment_result['output'],
                                       class_names, 5)
            inference_results['results'][video_id].append({
                'segment': segment,
                'result': result
            })
    print('load ground truth')
    ground_truth, class_labels_map = load_ground_truth(
        cfg.annotation_path, "validation")
    print('number of ground truth: {}'.format(len(ground_truth)))
    n_ground_truth_top_1 = len(ground_truth)
    n_ground_truth_top_5 = len(ground_truth)
    result_top1, result_top5 = load_result(
        clips_inference_results, class_labels_map)
    ground_truth_top1 = remove_nonexistent_ground_truth(
        ground_truth, result_top1)
    ground_truth_top5 = remove_nonexistent_ground_truth(
        ground_truth, result_top5)
    if cfg.ignore:
        n_ground_truth_top_1 = len(ground_truth_top1)
        n_ground_truth_top_5 = len(ground_truth_top5)
    correct_top1 = [1 if line[1] in result_top1[line[0]]
                    else 0 for line in ground_truth_top1]
    correct_top5 = [1 if line[1] in result_top5[line[0]]
                    else 0 for line in ground_truth_top5]
    clip_acc = calculate_clip_acc(
        inference_results, ground_truth, class_labels_map)
    print(sum(correct_top1))
    print(n_ground_truth_top_1)
    print(sum(correct_top5))
    print(n_ground_truth_top_5)
    accuracy_top1 = float(sum(correct_top1)) / float(n_ground_truth_top_1)
    accuracy_top5 = float(sum(correct_top5)) / float(n_ground_truth_top_5)
    print('==================Accuracy=================\n'
          ' clip-acc : {} \ttop-1 : {} \ttop-5: {}'.format(clip_acc, accuracy_top1, accuracy_top5))
    t2_ = time.time()
    print("Total time : {} s".format(t2_ - t1_))
@@ -31,7 +31,7 @@ parser.add_argument('--ckpt_file', type=str, required=True,
parser.add_argument('--file_name', type=str,
                    default='resnet-3d', help='Output file name.')
parser.add_argument('--file_format', type=str,
                    choices=['AIR', 'MINDIR', 'ONNX'], default='MINDIR', help='File format.')
parser.add_argument('--device_target', type=str, choices=['Ascend', 'CPU', 'GPU'], default='Ascend',
                    help='Device target')
parser.add_argument('--sample_duration', type=int, default=16)
...
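For context, the added 'ONNX' choice is typically passed straight to MindSpore's export call. The sketch below is a hedged illustration only: `build_resnet3d` is a hypothetical stand-in for however this repository constructs the network, and the real export.py may differ in its details.

```python
# Hedged sketch of an ONNX export path; build_resnet3d is a hypothetical helper.
import numpy as np
import mindspore as ms

net = build_resnet3d(n_classes=51)                      # hypothetical network builder
ms.load_checkpoint("./saved_model/best.ckpt", net=net)  # load fine-tuned weights
dummy = ms.Tensor(np.zeros((1, 3, 16, 112, 112), np.float32))
ms.export(net, dummy, file_name="resnet-3d", file_format="ONNX")  # writes resnet-3d.onnx
```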
@@ -14,6 +14,7 @@ annotation_path: ""
result_path: "" result_path: ""
pretrain_path: "" pretrain_path: ""
inference_ckpt_path: "" inference_ckpt_path: ""
onnx_path: ""
n_classes: 51
sample_size: 112
sample_duration: 16
...
numpy 1.21.6
onnxruntime-gpu 1.11.1
pyyaml 6.0
Pillow 9.2.0
\ No newline at end of file
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]"
echo "For example:
bash run_eval_onnx.sh 0 ucf101 \\
/path/ucf101/jpg/ \\
/path/ucf101/json/ucf101_01.json \\
/path/resnet-3d.onnx"
echo "It is better to use the ABSOLUTE path."
echo "=============================================================================================================="
set -e
if [ $# != 4 ]
then
echo "Usage: bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]"
exit 1
fi
DATASET=$1
VIDEO_PATH=$2
ANNOTATION_PATH=$3
ONNX_PATH=$4
EXEC_PATH=$(pwd)
echo "$EXEC_PATH"
cd ..
env > env.log
echo "Eval begin"
python eval_onnx.py --is_modelarts False --config_path ./${DATASET}_config.yaml --video_path $VIDEO_PATH \
--annotation_path $ANNOTATION_PATH --onnx_path $ONNX_PATH --device_target GPU > eval_$DATASET.log 2>&1 &
echo "Evaling. Check it at eval_$DATASET.log"
@@ -33,7 +33,7 @@ class PILTrans:
            ratio=(opt.train_crop_min_ratio, 1.0 / opt.train_crop_min_ratio))
        self.random_horizontal_flip = vision.RandomHorizontalFlip(prob=0.5)
        self.color = vision.RandomColorAdjust(0.4, 0.4, 0.4, 0.1)
        self.normalize = vision.Normalize(mean=mean, std=std)
        self.to_tensor = vision.ToTensor()
        self.resize = vision.Resize(opt.sample_size)
        self.center_crop = vision.CenterCrop(opt.sample_size)
@@ -75,7 +75,7 @@ class EvalPILTrans:
        self.to_pil = vision.ToPIL()
        self.resize = vision.Resize(opt.sample_size)
        self.center_crop = vision.CenterCrop(opt.sample_size)
        self.normalize = vision.Normalize(mean=mean, std=std)
        self.to_tensor = vision.ToTensor()

    def __call__(self, data, labels, batchInfo):
...
@@ -14,6 +14,7 @@ annotation_path: ""
result_path: "" result_path: ""
pretrain_path: "" pretrain_path: ""
inference_ckpt_path: "" inference_ckpt_path: ""
onnx_path: ""
n_classes: 101
sample_size: 112
sample_duration: 16
...