diff --git a/research/cv/delf/README_CN.md b/research/cv/delf/README_CN.md
index 956eac9a6d83178d9495fd74820dd70c7a392457..df3dd6716980232e1d2301eeb06b14349a9cddc1 100755
--- a/research/cv/delf/README_CN.md
+++ b/research/cv/delf/README_CN.md
@@ -86,6 +86,9 @@
 - For more details, see the following resources:
     - [MindSpore tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
     - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
+- Dependencies
+    - h5py version: 3.4.0
+    - See the requirements.txt file for the other dependencies

 # Quick Start
@@ -93,7 +96,7 @@
 - Dataset download and preprocessing

-    ```yaml
+    ```shell
     # Download the Google Landmarks Dataset v2 training set and convert it to mindrecord files
     # [Note] Prepare at least 1.1 TB of storage; if space is insufficient, set the optional parameter [NEED_ROMOVE_TAR] to 'y', which reduces usage to about 633 GB
     bash scripts/download_gldv2.sh 500 [DATASET_PATH] [NEED_ROMOVE_TAR]
@@ -104,9 +107,9 @@
     bash scripts/download_paris.sh [DATASET_PATH]
     ```

-- Transfer-learning pretrained weights download
+- Pretrained weights download

-    ```yaml
+    ```shell
     # Download the ImageNet-pretrained Resnet50 weights and the pretrained PCA dimensionality-reduction matrix
     bash scripts/download_pretrained.sh
     ```
@@ -121,12 +124,14 @@
     ```

     ```shell
+    # Make sure the pretrained weights from the previous steps have already been downloaded!
     # Run the training example; training proceeds in two stages
     # Fine-tuning stage:
-    python train.py --train_state=tuning > train_tuning.log 2>&1 &
+    bash scripts/run_1p_train.sh tuning
     # Attention training stage:
-    # Set checkpoint_path to the checkpoint obtained from the fine-tuning stage
-    python train.py --train_state=attn --checkpoint_path=/home/delf/ckpt/checkpoint_delf_tuning-1_4989.ckpt > train_attn.log 2>&1 &
+    # Replace [CHECKPOINT] with the path of the checkpoint file obtained from the fine-tuning stage
+    bash scripts/run_1p_train.sh attn [CHECKPOINT]
+    # example: bash scripts/run_1p_train.sh attn ./ckpt/checkpoint_delf_tuning-1_4989.ckpt

     # Run the distributed training example
     # Fine-tuning stage
@@ -208,6 +213,7 @@
 │   ├── download_oxf.sh                  # shell script for downloading Oxford5k
 │   ├── download_paris.sh                # shell script for downloading Paris6k
 │   ├── download_pretrained.sh           # shell script for downloading the pretrained weights
+│   ├── run_1p_train.sh                  # shell script for single-device training
 │   ├── run_8p_train.sh                  # shell script for distributed training
 │   ├── run_eval_match_images.sh         # shell script for image matching evaluation
 │   ├── run_eval_retrieval_images.sh     # shell script for image retrieval evaluation
@@ -350,18 +356,19 @@ bash scripts/download_paris.sh [DATASET_PATH]
 To make the model converge better and faster, the delf model is trained in two stages, a fine-tuning stage and an attention training stage. Command for the fine-tuning stage:

 ```bash
-python train.py --train_state=tuning > train_tuning.log 2>&1 &
+bash scripts/run_1p_train.sh tuning
 ```

 Command for the attention training stage:

 ```bash
-python train.py --train_state=attn --checkpoint_path=/home/delf/ckpt/checkpoint_delf_tuning-1_4989.ckpt > train_attn.log 2>&1 &
+bash scripts/run_1p_train.sh attn [CHECKPOINT]
+# example: bash scripts/run_1p_train.sh attn ./ckpt/checkpoint_delf_tuning-1_4989.ckpt
 ```

 Note that the attention training stage must load the checkpoint weights already trained in the fine-tuning stage.

-The above python command runs in the background; you can check the results in the train_tuning.log or train_attn.log file.
+The above command runs in the background; you can check the results in the train_tuning.log or train_attn.log file.

 After training, you can find the checkpoint files in the `ckpt` directory under the default script folder.

@@ -424,7 +431,7 @@ bash scripts/download_paris.sh [DATASET_PATH]
 bash scripts/run_eval_match_images.sh [IMAGES_PATH] [CHECKPOINT] [DEVICES]
 ```

-The above python command prints logs in the background; you can check feature extraction in the `extract_feature.log` file and feature matching in the `match_images.log` file. The resulting image showing the matched features of the two input images is saved as `eval_match.png` in the default script directory.
+The above command prints logs in the background; you can check feature extraction in the `extract_feature.log` file and feature matching in the `match_images.log` file. The resulting image showing the matched features of the two input images is saved as `eval_match.png` in the default script directory.
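As a usage sketch of the two-stage workflow above: since the attn stage needs the checkpoint produced by the tuning stage, the newest tuning checkpoint can be picked automatically instead of typing its path by hand. This assumes the default `./ckpt` naming shown in the example (`checkpoint_delf_tuning-*.ckpt`); the scratch directory and dummy files below only stand in for a real `./ckpt` directory.

```shell
# Sketch: select the most recently written tuning checkpoint for the attn stage.
# A scratch directory with dummy files stands in for the real ./ckpt directory.
ckpt_dir=$(mktemp -d)
touch -t 202201010000 "$ckpt_dir/checkpoint_delf_tuning-1_2000.ckpt"
touch -t 202201020000 "$ckpt_dir/checkpoint_delf_tuning-1_4989.ckpt"

# `ls -t` sorts by modification time, newest first
latest_ckpt=$(ls -t "$ckpt_dir"/checkpoint_delf_tuning-*.ckpt | head -n 1)
echo "would run: bash scripts/run_1p_train.sh attn $latest_ckpt"
```

The real invocation would then be `bash scripts/run_1p_train.sh attn "$latest_ckpt"` with `ckpt_dir` pointing at the actual `./ckpt` directory.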
 Note: for evaluation after distributed training, set checkpoint_path to the last saved checkpoint file, e.g. `/username/delf/train_parallel0/ckpt/attn-1-4898.ckpt`.

@@ -440,7 +447,7 @@ bash scripts/download_paris.sh [DATASET_PATH]
 bash scripts/run_eval_retrieval_images.sh [IMAGES_PATH] [GT_PATH] [CHECKPOINT] [DEVICES]
 ```

-The above python command prints logs in the background; you can check feature extraction in the `extract_feature.log` file, retrieval in the `./retrieval_dataset/process[X]/retrieval[X].log` files, and the mAP computation in the `calculate_mAP.log` file. The computed mAP is printed to the terminal once the script finishes and is also saved to the mAP.txt file in the default script directory:
+The above command prints logs in the background; you can check feature extraction in the `extract_feature.log` file, retrieval in the `./retrieval_dataset/process[X]/retrieval[X].log` files, and the mAP computation in the `calculate_mAP.log` file. The computed mAP is printed to the terminal once the script finishes and is also saved to the mAP.txt file in the default script directory:

 ```matlab
 # cat mAP.txt
diff --git a/research/cv/delf/delf_config.yaml b/research/cv/delf/delf_config.yaml
index a6588731324cb68f1b473318605fe1e9b35121e6..3066b9316a44fee583682d3160c25e40c619bdf5 100755
--- a/research/cv/delf/delf_config.yaml
+++ b/research/cv/delf/delf_config.yaml
@@ -25,7 +25,7 @@ attention_loss_weight: 1.0
 traindata_path: "/mass_data/gldv2_dataset/mindrecord/train.mindrecord000"
 keep_checkpoint_max: 1
-imagenet_checkpoint: "/home/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5"
+imagenet_checkpoint: "./resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5"
 checkpoint_path: ""
 save_ckpt: "./ckpt/"
 save_ckpt_step: 2000
diff --git a/research/cv/delf/requirements.txt b/research/cv/delf/requirements.txt
index cba88de4dea5048690a5f32dd05eef9e3dada21c..cd8ea1509c47a2e8073ef9557d57bf6d3bfef4d2 100755
--- a/research/cv/delf/requirements.txt
+++ b/research/cv/delf/requirements.txt
@@ -3,7 +3,7 @@ annoy
 pillow
 csv
 pandas
-h5py
+h5py==3.4.0
 matplotlib
 scipy
 scikit-image
\ No newline at end of file
diff --git a/research/cv/delf/scripts/run_1p_train.sh b/research/cv/delf/scripts/run_1p_train.sh
new file mode 100644
index 0000000000000000000000000000000000000000..0a32cebeb7f706eaf482430e4ed42016351dcf31
--- /dev/null
+++ b/research/cv/delf/scripts/run_1p_train.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [[ $# -lt 1 || $# -gt 2 ]]
+then
+    echo "Usage: bash scripts/run_1p_train.sh [TRAIN_STATE] [CHECKPOINT]"
+    exit 1
+fi
+
+if [ "$1" == "tuning" ] || [ "$1" == "attn" ]; then
+    train_state=$1
+else
+    echo "train_state must be one of [tuning, attn]"
+    exit 1
+fi
+
+checkpoint=""
+if [ $# == 2 ]
+then
+    checkpoint=$2
+fi
+
+cores=$(cat /proc/cpuinfo | grep "processor" | wc -l)
+echo "number of logical cores:" $cores
+
+EXECUTE_PATH=$(pwd)
+config_path="${EXECUTE_PATH}/delf_config.yaml"
+
+export DEVICE_ID=0
+export RANK_ID=0
+export RANK_SIZE=1
+
+echo "Start training for rank 0, device 0"
+
+python train.py \
+--config_path=$config_path \
+--train_state=$train_state \
+--checkpoint_path=$checkpoint > $train_state.log 2>&1 &
diff --git a/research/cv/delf/src/convert_h5_to_weight.py b/research/cv/delf/src/convert_h5_to_weight.py
index f69b1a13443c20b861f3bf7a1f13c0828ea6a1fc..8b1a52b44f87615eba361bd392478ae6b0ebfbd5 100755
--- a/research/cv/delf/src/convert_h5_to_weight.py
+++ b/research/cv/delf/src/convert_h5_to_weight.py
@@ -33,7 +33,7 @@ def translate_h5(h5path=''):
             param_name = param_name.replace('_beta_1:0', '.beta')
             param_name = param_name.replace('_running_mean_1:0', '.moving_mean')
             param_name = param_name.replace('_running_std_1:0', '.moving_variance')
-            data = f[i][name].value.astype(np.float32)
+            data = f[i][name][:].astype(np.float32)
             count += 1
             weights[param_name] = data
     parameter_dict = {}
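The API change behind the last hunk can be checked in isolation: h5py 3.x removed `Dataset.value`, so datasets are materialized as NumPy arrays by slicing with `[:]` (or `[()]` for scalars) instead. A minimal sketch, assuming h5py and numpy are installed; the dataset name is just an illustrative stand-in for the Keras-style weight names the converter handles:

```python
# Minimal sketch of the change in convert_h5_to_weight.py:
# h5py 3.x removed Dataset.value, so datasets are read via slicing.
import numpy as np
import h5py

# In-memory HDF5 file (core driver, nothing written to disk)
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("conv1_W_1:0", data=np.ones((2, 2), dtype=np.float64))
    # old (h5py < 3.0):  data = f["conv1_W_1:0"].value.astype(np.float32)
    data = f["conv1_W_1:0"][:].astype(np.float32)

print(data.dtype, data.shape)
```

This is also why requirements.txt pins `h5py==3.4.0`: with an unpinned 3.x install, the old `.value` access in the converter raises `AttributeError`.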