Unverified commit e6e458d0, authored by i-robot, committed by Gitee

!3244 CLIFF init

Merge pull request !3244 from anzhengqi/master
parents 42c88451 d9d1f5c1
Showing 838 additions and 0 deletions
# Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Requirements](#requirements)
- [Quick Start](#quick-start)
- [ModelZoo Homepage](#modelzoo-homepage)
## Introduction
<img src="assets/teaser.gif" width="100%">
*(This test video is from the 3DPW test set and is processed frame by frame, without temporal smoothing.)*
This repo contains the CLIFF demo code (implemented in MindSpore) for the following paper.
> CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation. \
> Zhihao Li, Jianzhuang Liu, Zhensong Zhang, Songcen Xu, and Youliang Yan ⋆ \
> ECCV 2022 Oral
<img src="assets/arch.png" width="100%">
## Dataset
Not applicable.
## Requirements
```bash
conda create -n cliff python=3.9
pip install -r requirements.txt
```
Download the pretrained checkpoints and the test samples to run the demo. Note that `requirements.txt` only lists `opencv-python` and `numpy`; MindSpore itself must be installed separately.
[[Baidu Pan](https://pan.baidu.com/s/15v0jnoyEpKIXWhh2AjAZeQ?pwd=7777)]
[[Google Drive](https://drive.google.com/drive/folders/1_d12Q8Yj13TEvB_4vopAbMdwJ1-KVR0R?usp=sharing)]
Finally, arrange the downloaded files according to the directory structure below:
```text
${ROOT}
|-- ckpt
|   |-- cliff-hr48-PA43.0_MJE69.0_MVE81.2_3dpw.ckpt
|   |-- cliff-res50-PA45.7_MJE72.0_MVE85.3_3dpw.ckpt
|-- data
|   |-- im07937.png
|   |-- smpl_mean_params.npz
```
## Quick Start
```bash
python demo.py --input_path PATH --ckpt CKPT
```
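For example, using the checkpoint and the test image from the directory structure above (these are also the default arguments in `demo.py`):
```bash
python demo.py --input_path data/im07937.png --ckpt ckpt/cliff-res50-PA45.7_MJE72.0_MVE85.3_3dpw.ckpt
```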
<img src="assets/im08036/im08036.png" width="24%">
<img src="assets/im08036/im08036_bbox.jpg" width="24%">
<img src="assets/im08036/im08036_front_view_cliff_hr48.jpg" width="24%">
<img src="assets/im08036/im08036_side_view_cliff_hr48.jpg" width="24%">
<img src="assets/im00492/im00492.png" width="24%">
<img src="assets/im00492/im00492_bbox.jpg" width="24%">
<img src="assets/im00492/im00492_front_view_cliff_hr48.jpg" width="24%">
<img src="assets/im00492/im00492_side_view_cliff_hr48.jpg" width="24%">
The demo options can be changed on the command line; see the argument descriptions at the bottom of `demo.py`.
## ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/models).
# Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Requirements](#requirements)
- [Quick Start](#quick-start)
- [ModelZoo Homepage](#modelzoo-homepage)
## Introduction
<img src="assets/teaser.gif" width="100%">
*(This test video is from the 3DPW test set and is processed frame by frame, without temporal smoothing.)*
CLIFF (ECCV 2022 Oral) is a human motion capture algorithm based on monocular images, which achieves excellent results on several public datasets.
> CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation. \
> Zhihao Li, Jianzhuang Liu, Zhensong Zhang, Songcen Xu, and Youliang Yan ⋆ \
> ECCV 2022 Oral
<img src="assets/arch.png" width="100%">
## Dataset
Not applicable.
## Requirements
```bash
conda create -n cliff python=3.9
pip install -r requirements.txt
```
Download the pretrained checkpoints and the test samples to run the inference demo.
[[Baidu Pan](https://pan.baidu.com/s/15v0jnoyEpKIXWhh2AjAZeQ?pwd=7777)]
[[Google Drive](https://drive.google.com/drive/folders/1_d12Q8Yj13TEvB_4vopAbMdwJ1-KVR0R?usp=sharing)]
Place the pretrained checkpoints under the `ckpt` directory and the test samples under the `data` directory, forming the directory structure below:
```text
${ROOT}
|-- ckpt
|   |-- cliff-hr48-PA43.0_MJE69.0_MVE81.2_3dpw.ckpt
|   |-- cliff-res50-PA45.7_MJE72.0_MVE85.3_3dpw.ckpt
|-- data
|   |-- im07937.png
|   |-- smpl_mean_params.npz
```
## Quick Start
Run the script `demo.py` to perform inference:
```bash
python demo.py --input_path PATH --ckpt CKPT
```
<p float="left">
<img src="assets/im08036/im08036.png" width="24%">
<img src="assets/im08036/im08036_bbox.jpg" width="24%">
<img src="assets/im08036/im08036_front_view_cliff_hr48.jpg" width="24%">
<img src="assets/im08036/im08036_side_view_cliff_hr48.jpg" width="24%">
</p>
<p float="left">
<img src="assets/im00492/im00492.png" width="24%">
<img src="assets/im00492/im00492_bbox.jpg" width="24%">
<img src="assets/im00492/im00492_front_view_cliff_hr48.jpg" width="24%">
<img src="assets/im00492/im00492_side_view_cliff_hr48.jpg" width="24%">
</p>
The demo options can be changed on the command line; see the argument descriptions at the bottom of `demo.py`.
## ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/models).
New binary files:
research/cv/CLIFF/assets/arch.png (351 KiB)
research/cv/CLIFF/assets/im00492/im00492.png (206 KiB)
research/cv/CLIFF/assets/im00492/im00492_bbox.jpg (68.3 KiB)
research/cv/CLIFF/assets/im00492/im00492_front_view_cliff_hr48.jpg (67.7 KiB)
research/cv/CLIFF/assets/im00492/im00492_side_view_cliff_hr48.jpg (6.31 KiB)
research/cv/CLIFF/assets/im08036/im08036.png (403 KiB)
research/cv/CLIFF/assets/im08036/im08036_bbox.jpg (121 KiB)
research/cv/CLIFF/assets/im08036/im08036_front_view_cliff_hr48.jpg (115 KiB)
research/cv/CLIFF/assets/im08036/im08036_side_view_cliff_hr48.jpg (12.8 KiB)
research/cv/CLIFF/assets/teaser.gif (2.62 MiB)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from os.path import join
curr_dir = os.path.dirname(os.path.abspath(__file__))
SMPL_MEAN_PARAMS = join(curr_dir, '../data/smpl_mean_params.npz')
CROP_IMG_HEIGHT = 256
CROP_IMG_WIDTH = 192
CROP_ASPECT_RATIO = CROP_IMG_HEIGHT / float(CROP_IMG_WIDTH)
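# = 256 / 192 = 4/3: the crop is taller than it is wide (portrait aspect)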
# Mean and standard deviation for normalizing input image
IMG_NORM_MEAN = [0.485, 0.456, 0.406]
IMG_NORM_STD = [0.229, 0.224, 0.225]
# Copyright (c) 2019, University of Pennsylvania, Max Planck Institute for Intelligent Systems
# This script is borrowed and extended from SPIN
import cv2
import numpy as np
from common import constants
def get_transform(center, scale, res, rot=0):
    """Generate transformation matrix."""
    # res: (height, width), (rows, cols)
    crop_aspect_ratio = res[0] / float(res[1])
    h = 200 * scale
    w = h / crop_aspect_ratio
    t = np.zeros((3, 3))
    t[0, 0] = float(res[1]) / w
    t[1, 1] = float(res[0]) / h
    t[0, 2] = res[1] * (-float(center[0]) / w + .5)
    t[1, 2] = res[0] * (-float(center[1]) / h + .5)
    t[2, 2] = 1
    if rot != 0:
        rot = -rot  # To match direction of rotation from cropping
        rot_mat = np.zeros((3, 3))
        rot_rad = rot * np.pi / 180
        sn, cs = np.sin(rot_rad), np.cos(rot_rad)
        rot_mat[0, :2] = [cs, -sn]
        rot_mat[1, :2] = [sn, cs]
        rot_mat[2, 2] = 1
        # Need to rotate around center
        t_mat = np.eye(3)
        t_mat[0, 2] = -res[1] / 2
        t_mat[1, 2] = -res[0] / 2
        t_inv = t_mat.copy()
        t_inv[:2, 2] *= -1
        t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
    return t
def transform(pt, center, scale, res, invert=0, rot=0):
    """Transform pixel location to different reference."""
    t = get_transform(center, scale, res, rot=rot)
    if invert:
        t = np.linalg.inv(t)
    new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
    new_pt = np.dot(t, new_pt)
    return np.array([round(new_pt[0]), round(new_pt[1])], dtype=int) + 1
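# Example (hypothetical values): transform([321, 241], center=np.array([320, 240]),
# scale=1.2, res=(256, 192)) maps the bbox center to (approximately) the crop
# center, returning [97, 129] in 1-based coordinates.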
def crop(img, center, scale, res):
    """
    Crop image according to the supplied bounding box.
    res: [rows, cols]
    """
    # Upper left point
    ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
    # Bottom right point
    br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
    new_shape = [br[1] - ul[1], br[0] - ul[0]]
    if len(img.shape) > 2:
        new_shape += [img.shape[2]]
    new_img = np.zeros(new_shape, dtype=np.float32)
    # Range to fill new array
    new_x = max(0, -ul[0]), min(br[0], len(img[0])) - ul[0]
    new_y = max(0, -ul[1]), min(br[1], len(img)) - ul[1]
    # Range to sample from original image
    old_x = max(0, ul[0]), min(len(img[0]), br[0])
    old_y = max(0, ul[1]), min(len(img), br[1])
    new_img[new_y[0]:new_y[1], new_x[0]:new_x[1]] = img[old_y[0]:old_y[1], old_x[0]:old_x[1]]
    new_img = cv2.resize(new_img, (res[1], res[0]))  # (cols, rows)
    return new_img
def bbox_from_detector(bbox, rescale=1.1):
    """
    Get the center and scale of a bounding box.
    The expected format is [min_x, min_y, max_x, max_y].
    """
    # center
    center_x = (bbox[0] + bbox[2]) / 2.0
    center_y = (bbox[1] + bbox[3]) / 2.0
    center = np.array([center_x, center_y])
    # scale
    bbox_w = bbox[2] - bbox[0]
    bbox_h = bbox[3] - bbox[1]
    bbox_size = max(bbox_w * constants.CROP_ASPECT_RATIO, bbox_h)
    scale = bbox_size / 200.0
    # adjust bounding box tightness
    scale *= rescale
    return center, scale
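# Example (hypothetical box): bbox_from_detector([100, 200, 300, 600]) returns
# center=[200., 400.] and scale=2.2, since bbox_size = max(200 * 4/3, 400) = 400
# and 400 / 200 * 1.1 = 2.2.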
def process_image(orig_img_rgb, bbox,
                  crop_height=constants.CROP_IMG_HEIGHT,
                  crop_width=constants.CROP_IMG_WIDTH):
    """
    Do preprocessing and possibly crop the image according to the bounding box.
    If a bounding box is given, use it to crop the image.
    If no bounding box is specified, assume the person is centered in the image.
    """
    if bbox is not None:
        center, scale = bbox_from_detector(bbox)
    else:
        # Assume that the person is centered in the image
        height = orig_img_rgb.shape[0]
        width = orig_img_rgb.shape[1]
        center = np.array([width // 2, height // 2])
        scale = max(height, width * crop_height / float(crop_width)) / 200.
    img = crop(orig_img_rgb, center, scale, (crop_height, crop_width))
    img = img / 255.
    mean = np.array(constants.IMG_NORM_MEAN, dtype=np.float32)
    std = np.array(constants.IMG_NORM_STD, dtype=np.float32)
    norm_img = (img - mean) / std
    norm_img = np.transpose(norm_img, (2, 0, 1))
    return norm_img, center, scale
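# Example (hypothetical input): for a 1920x1080 RGB frame with bbox=None,
# norm_img has shape (3, 256, 192), center == [960, 540], and
# scale == max(1080, 1920 * 256 / 192) / 200 == 12.8.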
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import cv2
import mindspore
from mindspore import Tensor
import numpy as np
from models.cliff_res50 import MindSporeModel as cliff_res50
from common.imutils import process_image
from common import constants
def main(args):
    # load the model
    print("ckpt:", args.ckpt)
    cliff = cliff_res50()
    param_dict = mindspore.load_checkpoint(args.ckpt)
    mindspore.load_param_into_net(cliff, param_dict)
    # load and pre-process the image
    print("input_path:", args.input_path)
    img_bgr = cv2.imread(args.input_path)
    img_rgb = img_bgr[:, :, ::-1]
    norm_img, center, scale = process_image(img_rgb, bbox=None)
    norm_img = norm_img[np.newaxis, :, :, :]
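    # The bbox information below is (cx - w/2, cy - h/2, b): the bbox center
    # offset from the image center plus the bbox size, normalized by a focal
    # length estimated from the image diagonal, as described in the CLIFF paper.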
    # calculate the bbox info
    cx, cy, b = center[0], center[1], scale * 200
    img_h, img_w, _ = img_rgb.shape
    focal_length = (img_w * img_w + img_h * img_h) ** 0.5  # fov: 55 degree
    bbox_info = np.array([cx - img_w / 2., cy - img_h / 2., b], dtype=np.float32)
    bbox_info = bbox_info[np.newaxis, :]
    bbox_info[:, :2] = bbox_info[:, :2] / focal_length * 2.8  # [-1, 1]
    bbox_info[:, 2] = (bbox_info[:, 2] - 0.24 * focal_length) / (0.06 * focal_length)  # [-1, 1]
    # load the initial parameters
    mean_params = np.load(constants.SMPL_MEAN_PARAMS)
    init_pose = mean_params['pose'][np.newaxis, :].astype('float32')
    init_shape = mean_params['shape'][np.newaxis, :].astype('float32')
    init_cam = mean_params['cam'][np.newaxis, :].astype('float32')
    # feed-forward
    pred_rotmat_6d, pred_betas, pred_cam_crop = cliff(Tensor(norm_img), Tensor(bbox_info),
                                                      Tensor(init_pose), Tensor(init_shape), Tensor(init_cam))
    print("pred_rotmat_6d", pred_rotmat_6d)
    print("pred_betas", pred_betas)
    print("pred_cam_crop", pred_cam_crop)
    print("Inference finished successfully!")
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_path', default='data/im07937.png', help='path to the input data')
    parser.add_argument('--ckpt', default="ckpt/cliff-res50-PA45.7_MJE72.0_MVE85.3_3dpw.ckpt",
                        help='path to the pretrained checkpoint')
    arguments = parser.parse_args()
    main(arguments)
opencv-python>=4.6.0.66
numpy>=1.23.1