Commit d29ce081 authored by yuanAIhan

feat: the code of 310
# Contents

- [Contents](#contents)
- [ResNeXt152 Description](#resnext152-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Usage](#usage)
- [Launch](#launch)
- [Evaluation Process](#evaluation-process)
- [Usage](#usage-1)
- [Launch](#launch-1)
- [Result](#result)
- [Model Export](#model-export)
- [Inference Process](#inference-process)
- [Usage](#usage-2)
- [Result](#result-1)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Script Description](#contents)

## [Script and Sample Code](#contents)
```python
.
└─resnext152_64x4d
├─ascend310_infer # 310 inference code
├─inc
├─utils.h # Tool library header file
├─src
├─build.sh # Run script
├─CMakeLists.txt # cmake file
├─main_preprocess.cc # pre process
├─main.cc # the entry of main function
├─utils.cc # Tool library function implementation
├─README.md
├─scripts
├─run_standalone_train.sh # launch standalone training for ascend(1p)
├─run_standalone_train_gpu.sh # launch standalone training for gpu (1p)
├─run_distribute_train.sh # launch distributed training for ascend(8p)
├─run_distribute_train_gpu.sh # launch distributed training for gpu (8p)
├─run_eval.sh # launch evaluation
└─run_eval_gpu.sh # launch evaluating for gpu
├─src
├─backbone
├─__init__.py # initialize
├─resnet.py # resnext152 backbone
├─model_utils
├─config.py # Related parameters
├─device_adapter.py # Device adapter for ModelArts
├─local_adapter.py # Local adapter
├─moxing_adapter.py # Moxing adapter for ModelArts
├─utils
├─__init__.py # initialize
├─auto_mixed_precision.py # Mixed precision
├─cunstom_op.py # network operation
├─logging.py # print log
├─optimizers__init__.py # get parameters
├─dataset.py # data preprocessing
├─eval_callback.py # Inference during training
├─head.py # common head
├─lr_generator.py # Learning rate scheduler
├─image_classification.py # get ResNet
├─metric.py # Inference
├─linear_warmup.py # linear warmup learning rate
├─warmup_cosine_annealing.py # learning rate each step
├─warmup_step_lr.py # warmup step learning rate
├─create_imagenet2012_label.py # create label
├─default_config.yaml # parameters
├─eval.py # eval net
├─export.py # export mindir script
├─postprocess.py # 310 post-processing
├─train.py # train net
├─requirements.txt # Required python libraries
├─README.md # Documentation in English
├─README_CN.md # Documentation in Chinese
```
## [Script Parameters](#contents)
## [Model Export](#contents)
`EXPORT_FORMAT` should be in ["AIR", "ONNX", "MINDIR"]
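For orientation, the export step roughly amounts to restoring the trained checkpoint into the network and calling MindSpore's `export`. The sketch below is illustrative only; `get_network()` and the checkpoint path are placeholders, not names from this repository.

```python
# Hedged sketch of the export step (not the repository's export.py).
import numpy as np
from mindspore import Tensor, context, export, load_checkpoint, load_param_into_net

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

net = get_network()  # placeholder: build ResNeXt152 as src/image_classification.py does
load_param_into_net(net, load_checkpoint("/path/to/resnext152.ckpt"))  # example path
net.set_train(False)

# Batch size 1 and a 3x224x224 input, matching what the 310 pipeline expects
dummy_input = Tensor(np.zeros([1, 3, 224, 224], dtype=np.float32))
export(net, dummy_input, file_name="resnext152", file_format="MINDIR")
```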
## [Inference Process](#contents)
### Usage
Before performing inference, the MindIR file must be exported with export.py. Currently, only batch size 1 is supported.
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
```
`DEVICE_ID` is optional, default value is 0.
### Result
The inference result is saved in the current directory, and the final accuracy can be found in the acc.log file.
```shell
Total data: 50000, top1 accuracy: 0.79174, top5 accuracy: 0.94178.
```
# [Model Description](#contents)
## [Performance](#contents)
### Training Performance
| Parameters | ResNeXt152 | ResNeXt152 |
| -------------------------- | --------------------------------------------- | --------------------------------------------- |
| Resource | Ascend 910, cpu:2.60GHz 192cores, memory:755G | 8x V100, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| uploaded Date | 06/30/2021 | 06/30/2021 |
| MindSpore Version | 1.2 | 1.5.0 (docker build, CUDA 11.1) |
| Dataset | ImageNet | ImageNet |
| Training Parameters | src/config.py | src/config.py; lr=0.05, per_batch_size=16 |
| Optimizer | Momentum | Momentum |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| Loss | 1.28923 | 2.172222 |
| Accuracy | 80.08%(TOP1) | 79.36%(TOP1) (148 epoch, early stopping) |
| Total time                 | 7.8 h (8p)                                    | 2 days 45 minutes (8P, processes)              |
| Checkpoint for Fine tuning | 192 M(.ckpt file) | - |
### Inference Performance
| Parameters | | | |
| ----------------- | ---------------- | ---------------- | ---------------- |
| Resource | Ascend 910 | GPU V100 | Ascend 310 |
| uploaded Date | 06/20/2021 | 2021-10-27 | 2021-10-27 |
| MindSpore Version | 1.2 | 1.5.0, CUDA 11.1 | 1.3.0 |
| Dataset | ImageNet, 1.2W | ImageNet, 1.2W | ImageNet, 1.2W |
| batch_size | 1 | 32 | 1 |
| outputs | probability | probability | probability |
| Accuracy | acc=80.08%(TOP1) | acc=79.36%(TOP1) | acc=79.34%(TOP1) |
# [Description of Random Situation](#contents)
- [Launch](#launch-1)
- [Result](#result)
- [Model Export](#model-export)
- [Inference Process](#inference-process)
- [Usage](#usage-2)
- [Result](#result-1)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
## Script and Sample Code
```python
.
└─resnext152_64x4d
├─ascend310_infer # Ascend 310 inference code
├─inc
├─utils.h # utility library header file
├─src
├─build.sh # run script
├─CMakeLists.txt # cmake file
├─main_preprocess.cc # preprocessing
├─main.cc # main function entry
├─utils.cc # utility library implementation
├─README.md
├─scripts
├─run_standalone_train.sh # launch standalone training for Ascend (1p)
├─backbone
├─__init__.py # initialize
├─resnet.py # ResNeXt152 backbone
├─model_utils
├─config.py # related parameters
├─device_adapter.py # Device adapter for ModelArts
├─local_adapter.py # Local adapter
├─moxing_adapter.py # Moxing adapter for ModelArts
├─utils
├─__init__.py # initialize
├─auto_mixed_precision.py # mixed precision
├─cunstom_op.py # network operations
├─logging.py # print logs
├─optimizers__init__.py # get parameters
├─linear_warmup.py # linear warmup learning rate
├─warmup_cosine_annealing.py # learning rate for each step
├─warmup_step_lr.py # warmup step learning rate
├─create_imagenet2012_label.py # create labels
├─default_config.yaml # parameters
├─eval.py # evaluate the network
├─export.py # export mindir script
├─postprocess.py # 310 post-processing
├─train.py # train the network
├─requirements.txt # required Python libraries
├─README.md # Documentation in English
├─README_CN.md # Documentation in Chinese
```
Training and evaluation parameters can both be configured in config.py.
```config
"image_size": '224,224' # 图像大小
"num_classes": 1000, # 数据集类数
"per_batch_size": 128, # 输入张量的批次大小
acc=80.08%(TOP1)
acc=94.71%(TOP5)
```
Example for the GPU evaluation:
```text
...
[DATE/TIME]:INFO:load model /path/to/checkpoints/ckpt_0/0-148_10009.ckpt success
[DATE/TIME]:INFO:Inference Performance: 218.14 img/sec
[DATE/TIME]:INFO:before results=[[39666], [46445], [49984]]
[DATE/TIME]:INFO:after results=[[39666] [46445] [49984]]
[DATE/TIME]:INFO:after allreduce eval: top1_correct=39666, tot=49984,acc=79.36%(TOP1)
[DATE/TIME]:INFO:after allreduce eval: top5_correct=46445, tot=49984,acc=92.92%(TOP5)
```
## Model Export
```shell
python export.py --device_target [PLATFORM] --ckpt_file [CKPT_PATH] --file_format [EXPORT_FORMAT]
```

`EXPORT_FORMAT` should be in ["AIR", "ONNX", "MINDIR"].
## Inference Process
### Usage
Before performing inference, the MindIR file must be exported with export.py. Currently, only batch size 1 is supported.
```shell
# Ascend 310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
```
`MINDIR_PATH` is the path of the exported MindIR file, `DATA_PATH` is the ImageNet dataset path, and `DEVICE_ID` is optional with a default value of 0.
### Result
The inference result is saved in the current directory, and the final accuracy can be found in the acc.log file.
```shell
Total data: 50000, top1 accuracy: 0.79174, top5 accuracy: 0.94178.
```
# Model Description
## Performance
### Training Performance
| Parameters | ResNeXt152 | ResNeXt152 |
| -------------------------- | --------------------------------------------- | -------------------------------------------- |
| Resource | Ascend 910, cpu:2.60GHz 192cores, memory:755G | 8x V100, Intel Xeon Gold 6226R CPU @ 2.90GHz |
| Uploaded Date | 06/30/2021 | 06/30/2021 |
| MindSpore Version | 1.3 | 1.5.0 (docker build, CUDA 11.1) |
| Dataset | ImageNet | ImageNet |
| Training Parameters | src/config.py | src/config.py; lr=0.05, per_batch_size=16 |
| Optimizer | Momentum | Momentum |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| Loss | 1.28923 | 2.172222 |
| Accuracy | 80.08%(TOP1) | 79.36%(TOP1) (148 epoch, early stopping) |
| Total time | 7.8 h (8p) | 2 days 45 minutes (8P, processes) |
| Checkpoint for Fine tuning | 192 M (.ckpt file) | - |
### Inference Performance
| Parameters | | | |
| ----------------- | ---------------- | ---------------- | ---------------- |
| Resource | Ascend 910 | GPU V100 | Ascend 310 |
| Uploaded Date | 06/20/2021 | 2021-10-27 | 2021-10-27 |
| MindSpore Version | 1.2 | 1.5.0, CUDA 11.1 | 1.3.0 |
| Dataset | ImageNet, 1.2W | ImageNet, 1.2W | ImageNet, 1.2W |
| batch_size | 1 | 32 | 1 |
| outputs | probability | probability | probability |
| Accuracy | acc=80.08%(TOP1) | acc=79.36%(TOP1) | acc=79.34%(TOP1) |
# Description of Random Situation
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_UTILS_H_
#define MINDSPORE_INFERENCE_UTILS_H_
#include <sys/stat.h>
#include <dirent.h>
#include <vector>
#include <string>
#include <memory>
#include "include/api/types.h"
std::vector<std::string> GetAllFiles(std::string_view dirName);
DIR *OpenDir(std::string_view dirName);
std::string RealPath(std::string_view path);
mindspore::MSTensor ReadFileToTensor(const std::string &file);
int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
std::vector<std::string> GetAllFiles(std::string dir_name);
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name);
#endif
cmake_minimum_required(VERSION 3.14.1)
project(MindSporeCxxTestcase CXX)
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
option(MINDSPORE_PATH "mindspore install path" "")
include_directories(${MINDSPORE_PATH})
include_directories(${MINDSPORE_PATH}/include)
include_directories(${PROJECT_SRC_ROOT}/../)
find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
add_executable(main main.cc utils.cc)
target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)
# add_executable(main_preprocess main_preprocess.cc utils.cc)
# target_link_libraries(main_preprocess ${MS_LIB} gflags)
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
cmake . -DMINDSPORE_PATH="`pip show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
make
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <fstream>
#include <sstream>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "include/dataset/vision_ascend.h"
#include "include/dataset/execute.h"
#include "include/dataset/transforms.h"
#include "include/dataset/vision.h"
#include "inc/utils.h"
using mindspore::dataset::vision::Decode;
using mindspore::dataset::vision::Resize;
using mindspore::dataset::vision::CenterCrop;
using mindspore::dataset::vision::Normalize;
using mindspore::dataset::vision::HWC2CHW;
using mindspore::dataset::TensorTransform;
using mindspore::Context;
using mindspore::Serialization;
using mindspore::Model;
using mindspore::Status;
using mindspore::ModelType;
using mindspore::GraphCell;
using mindspore::kSuccess;
using mindspore::MSTensor;
using mindspore::dataset::Execute;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(dataset_path, ".", "dataset path");
DEFINE_int32(device_id, 0, "device id");
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
if (RealPath(FLAGS_mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(FLAGS_device_id);
ascend310->SetPrecisionMode("allow_fp32_to_fp16");
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(FLAGS_mindir_path, ModelType::kMindIR, &graph);
Model model;
Status ret = model.Build(GraphCell(graph), context);
if (ret != kSuccess) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
auto all_files = GetAllInputData(FLAGS_dataset_path);
if (all_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
std::map<double, double> costTime_map;
size_t size = all_files.size();
std::shared_ptr<TensorTransform> decode(new Decode());
std::shared_ptr<TensorTransform> resize(new Resize({256, 256}));
std::shared_ptr<TensorTransform> centercrop(new CenterCrop({224, 224}));
std::shared_ptr<TensorTransform> normalize(new Normalize({123.675, 116.28, 103.53},
{58.395, 57.12, 57.375}));
std::shared_ptr<TensorTransform> hwc2chw(new HWC2CHW());
std::vector<std::shared_ptr<TensorTransform>> trans_list;
trans_list = {decode, resize, centercrop, normalize, hwc2chw};
mindspore::dataset::Execute SingleOp(trans_list);
for (size_t i = 0; i < size; ++i) {
for (size_t j = 0; j < all_files[i].size(); ++j) {
struct timeval start = {0};
struct timeval end = {0};
double startTimeMs;
double endTimeMs;
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << all_files[i][j] <<std::endl;
auto imgDvpp = std::make_shared<MSTensor>();
SingleOp(ReadFileToTensor(all_files[i][j]), imgDvpp.get());
inputs.emplace_back(imgDvpp->Name(), imgDvpp->DataType(), imgDvpp->Shape(),
imgDvpp->Data().get(), imgDvpp->DataSize());
gettimeofday(&start, nullptr);
ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << all_files[i][j] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
WriteResult(all_files[i][j], outputs);
}
}
double average = 0.0;
int inferCount = 0;
for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
double diff = 0.0;
diff = iter->second - iter->first;
average += diff;
inferCount++;
}
average = average / inferCount;
std::stringstream timeCost;
timeCost << "NN inference cost average time: "<< average << " ms of infer_count " << inferCount << std::endl;
std::cout << "NN inference cost average time: "<< average << "ms of infer_count " << inferCount << std::endl;
std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
fileStream << timeCost.str();
fileStream.close();
costTime_map.clear();
return 0;
}
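The transform chain above (Decode, Resize to 256x256, CenterCrop 224, Normalize with the listed mean/std, HWC2CHW) is the evaluation-time preprocessing the exported MindIR expects. For comparison only, a rough Python equivalent using MindSpore 1.x dataset transforms might look like the following; the dataset path is a placeholder.

```python
# Sketch: the same eval-time preprocessing expressed with MindSpore's Python transforms.
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C

mean = [123.675, 116.28, 103.53]   # same values as in main.cc
std = [58.395, 57.12, 57.375]

transforms = [
    C.Decode(),                        # JPEG bytes -> HWC uint8
    C.Resize([256, 256]),              # mirrors Resize({256, 256}) above
    C.CenterCrop(224),
    C.Normalize(mean=mean, std=std),
    C.HWC2CHW(),
]

dataset = ds.ImageFolderDataset("/path/to/imagenet/val", shuffle=False)  # placeholder path
dataset = dataset.map(operations=transforms, input_columns="image").batch(1)
```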
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <fstream>
#include <sstream>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "inc/utils.h"
using mindspore::Context;
using mindspore::GraphCell;
using mindspore::Model;
using mindspore::ModelType;
using mindspore::MSTensor;
using mindspore::Serialization;
using mindspore::Status;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(dataset_path, ".", "dataset path");
DEFINE_string(image_path, ".", "image path");
DEFINE_int32(device_id, 0, "device id");
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
if (RealPath(FLAGS_mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(FLAGS_device_id);
ascend310->SetPrecisionMode("allow_fp32_to_fp16");
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(FLAGS_mindir_path, ModelType::kMindIR, &graph);
Model model;
Status ret = model.Build(GraphCell(graph), context);
if (ret.IsError()) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
std::cout << "Check if data preprocess exists: " << model.HasPreprocess() << std::endl;
// way 1, construct a common MSTensor
std::vector<MSTensor> inputs1 = {ReadFileToTensor(FLAGS_image_path)};
std::vector<MSTensor> outputs1;
ret = model.PredictWithPreprocess(inputs1, &outputs1);
if (ret.IsError()) {
std::cout << "ERROR: Predict failed." << std::endl;
return 1;
}
std::ofstream o1("result1.txt", std::ios::out);
o1.write(reinterpret_cast<const char *>(outputs1[0].MutableData()), std::streamsize(outputs1[0].DataSize()));
// way 2, construct a pointer of MSTensor, be careful of destroy
MSTensor *tensor = MSTensor::CreateImageTensor(FLAGS_image_path);
std::vector<MSTensor> inputs2 = {*tensor};
MSTensor::DestroyTensorPtr(tensor);
std::vector<MSTensor> outputs2;
ret = model.PredictWithPreprocess(inputs2, &outputs2);
if (ret.IsError()) {
std::cout << "ERROR: Predict failed." << std::endl;
return 1;
}
std::ofstream o2("result2.txt", std::ios::out);
o2.write(reinterpret_cast<const char *>(outputs2[0].MutableData()), std::streamsize(outputs2[0].DataSize()));
// way 3, split preprocess and predict
std::vector<MSTensor> inputs3 = {ReadFileToTensor(FLAGS_image_path)};
std::vector<MSTensor> outputs3;
ret = model.Preprocess(inputs3, &outputs3);
if (ret.IsError()) {
std::cout << "ERROR: Preprocess failed." << std::endl;
return 1;
}
std::vector<MSTensor> outputs4;
ret = model.Predict(outputs3, &outputs4);
if (ret.IsError()) {
std::cout << "ERROR: Preprocess failed." << std::endl;
return 1;
}
std::ofstream o3("result3.txt", std::ios::out);
o3.write(reinterpret_cast<const char *>(outputs4[0].MutableData()), std::streamsize(outputs4[0].DataSize()));
// check shape
auto shape = outputs1[0].Shape();
std::cout << "Output Shape: " << std::endl;
for (auto s : shape) {
std::cout << s << ", ";
}
std::cout << std::endl;
return 0;
}
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <fstream>
#include <algorithm>
#include <iostream>
#include "inc/utils.h"
using mindspore::MSTensor;
using mindspore::DataType;
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name) {
std::vector<std::vector<std::string>> ret;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
struct dirent *filename;
/* read all the files in the dir ~ */
std::vector<std::string> sub_dirs;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
// get rid of "." and ".."
if (d_name == "." || d_name == ".." || d_name.empty()) {
continue;
}
std::string dir_path = RealPath(std::string(dir_name) + "/" + filename->d_name);
struct stat s;
lstat(dir_path.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
continue;
}
sub_dirs.emplace_back(dir_path);
}
std::sort(sub_dirs.begin(), sub_dirs.end());
(void)std::transform(sub_dirs.begin(), sub_dirs.end(), std::back_inserter(ret),
[](const std::string &d) { return GetAllFiles(d); });
return ret;
}
std::vector<std::string> GetAllFiles(std::string dir_name) {
struct dirent *filename;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
if (d_name == "." || d_name == ".." || d_name.size() <= 3) {
continue;
}
res.emplace_back(std::string(dir_name) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
return res;
}
std::vector<std::string> GetAllFiles(std::string_view dirName) {
struct dirent *filename;
DIR *dir = OpenDir(dirName);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == ".." || filename->d_type != DT_REG) {
continue;
}
res.emplace_back(std::string(dirName) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
for (auto &f : res) {
std::cout << "image file: " << f << std::endl;
}
return res;
}
int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) {
std::string homePath = "./result_Files";
for (size_t i = 0; i < outputs.size(); ++i) {
size_t outputSize;
std::shared_ptr<const void> netOutput;
netOutput = outputs[i].Data();
outputSize = outputs[i].DataSize();
int pos = imageFile.rfind('/');
std::string fileName(imageFile, pos + 1);
fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin");
std::string outFileName = homePath + "/" + fileName;
FILE *outputFile = fopen(outFileName.c_str(), "wb");
    fwrite(netOutput.get(), sizeof(char), outputSize, outputFile);
fclose(outputFile);
outputFile = nullptr;
}
return 0;
}
mindspore::MSTensor ReadFileToTensor(const std::string &file) {
if (file.empty()) {
std::cout << "Pointer file is nullptr" << std::endl;
return mindspore::MSTensor();
}
std::ifstream ifs(file);
if (!ifs.good()) {
std::cout << "File: " << file << " is not exist" << std::endl;
return mindspore::MSTensor();
}
if (!ifs.is_open()) {
std::cout << "File: " << file << "open failed" << std::endl;
return mindspore::MSTensor();
}
ifs.seekg(0, std::ios::end);
size_t size = ifs.tellg();
mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size);
ifs.seekg(0, std::ios::beg);
ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size);
ifs.close();
return buffer;
}
DIR *OpenDir(std::string_view dirName) {
if (dirName.empty()) {
std::cout << " dirName is null ! " << std::endl;
return nullptr;
}
std::string realPath = RealPath(dirName);
struct stat s;
lstat(realPath.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
std::cout << "dirName is not a valid directory !" << std::endl;
return nullptr;
}
DIR *dir;
dir = opendir(realPath.c_str());
if (dir == nullptr) {
std::cout << "Can not open dir " << dirName << std::endl;
return nullptr;
}
std::cout << "Successfully opened the dir " << dirName << std::endl;
return dir;
}
std::string RealPath(std::string_view path) {
char realPathMem[PATH_MAX] = {0};
char *realPathRet = nullptr;
realPathRet = realpath(path.data(), realPathMem);
if (realPathRet == nullptr) {
std::cout << "File: " << path << " is not exist.";
return "";
}
std::string realPath(realPathMem);
std::cout << path << " realpath is: " << realPath << std::endl;
return realPath;
}
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""create_imagenet2012_label"""
import os
import json
import argparse
parser = argparse.ArgumentParser(description="resnet imagenet2012 label")
parser.add_argument("--img_path", type=str, required=True, help="imagenet2012 file path.")
args = parser.parse_args()
def create_label(file_path):
"""create label of imagenet."""
print("[WARNING] Create imagenet label. Currently only use for Imagenet2012!")
dirs = os.listdir(file_path)
file_list = []
for file in dirs:
file_list.append(file)
file_list = sorted(file_list)
total = 0
img_label = {}
for i, file_dir in enumerate(file_list):
files = os.listdir(os.path.join(file_path, file_dir))
for f in files:
img_label[f] = i
total += len(files)
with open("imagenet_label.json", "w+") as label:
json.dump(img_label, label)
print("[INFO] Completed! Total {} data.".format(total))
if __name__ == '__main__':
create_label(args.img_path)
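A quick, optional sanity check of the generated mapping (the file name printed below is only an example of an ImageNet validation image):

```python
# Read back imagenet_label.json and inspect one entry.
import json

with open("imagenet_label.json", "r") as f:
    labels = json.load(f)

print("total images:", len(labels))   # e.g. 50000 for the ImageNet2012 validation set
sample = next(iter(labels))
print(sample, "->", labels[sample])   # filename -> class index assigned above
```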
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
network: "resnext152"
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "Ascend"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: ""
# ==============================================================================
# Training options
image_size: [224, 224]
num_classes: 1000
batch_size: 1
lr: 0.4
lr_scheduler: "cosine_annealing"
lr_epochs: [30, 60, 90, 120]
lr_gamma: 0.1
eta_min: 0
T_max: 150
max_epoch: 150
warmup_epochs: 1
weight_decay: 0.0001
momentum: 0.9
is_dynamic_loss_scale: 0
loss_scale: 1024
label_smooth: 1
label_smooth_factor: 0.1
per_batch_size: 128
ckpt_interval: 5
ckpt_save_max: 5
ckpt_path: "output_demo/"
is_save_on_master: 1
rank: 0
group_size: 1
rank_save_ckpt_flag: 0
outputs_dir: ""
log_path: "./output_log"
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnext152"
file_format: "AIR"
result_path: ""
label_path: ""
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
train_url: "Training output url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint"
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."
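The learning-rate options above (`lr`, `warmup_epochs`, `T_max`, `eta_min`) drive the warmup cosine-annealing schedule listed in the script tree. As a rough reference, the standard form of that schedule is sketched below; the repository's warmup_cosine_annealing.py may differ in detail (for example, per-step versus per-epoch annealing).

```python
# Approximate linear-warmup + cosine-annealing schedule for the defaults above.
import math

def warmup_cosine_lr(base_lr=0.4, warmup_epochs=1, t_max=150, eta_min=0.0,
                     total_epochs=150, steps_per_epoch=100):
    lrs = []
    for step in range(total_epochs * steps_per_epoch):
        epoch = step / steps_per_epoch
        if epoch < warmup_epochs:
            # linear warmup from 0 to base_lr
            lr = base_lr * (step + 1) / (warmup_epochs * steps_per_epoch)
        else:
            # cosine annealing from base_lr down to eta_min over t_max epochs
            lr = eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2
        lrs.append(lr)
    return lrs

print(warmup_cosine_lr()[:2], warmup_cosine_lr()[-1])
```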
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""post process for 310 inference"""
import os
import json
import argparse
import numpy as np
parser = argparse.ArgumentParser(description="resnet inference")
parser.add_argument("--result_path", type=str, required=True, help="result files path.")
parser.add_argument("--label_path", type=str, required=True, help="image file path.")
args = parser.parse_args()
batch_size = 1
num_classes = 1000
def get_result(result_path, label_path):
"""calculate the result"""
files = os.listdir(result_path)
with open(label_path, "r") as label:
labels = json.load(label)
top1 = 0
top5 = 0
total_data = len(files)
for file in files:
img_ids_name = file.split('_0.')[0]
data_path = os.path.join(result_path, img_ids_name + "_0.bin")
result = np.fromfile(data_path, dtype=np.float16).reshape(batch_size, num_classes)
for batch in range(batch_size):
predict = np.argsort(-result[batch], axis=-1)
if labels[img_ids_name+".JPEG"] == predict[0]:
top1 += 1
if labels[img_ids_name+".JPEG"] in predict[:5]:
top5 += 1
print(f"Total data: {total_data}, top1 accuracy: {top1/total_data}, top5 accuracy: {top5/total_data}.")
if __name__ == '__main__':
get_result(args.result_path, args.label_path)
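To inspect a single network output produced by the 310 run, the `.bin` files can be read back the same way the script does; the file name below is only an example.

```python
# Read one result file (float16 logits for 1000 classes) and show its top-5 classes.
import numpy as np

result = np.fromfile("./result_Files/ILSVRC2012_val_00000001_0.bin",  # example file name
                     dtype=np.float16).reshape(1, 1000)
top5 = np.argsort(-result[0], axis=-1)[:5]
print("top-5 class indices:", top5)
```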
numpy
pillow
pyyaml
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -lt 2 || $# -gt 3 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
model=$(get_real_path $1)
data_path=$(get_real_path $2)
device_id=0
if [ $# == 3 ]; then
device_id=$3
fi
echo "mindir name: "$model
echo "dataset path: "$data_path
echo "device id: "$device_id
export ASCEND_HOME=/usr/local/Ascend/
if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then
export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH
export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp
else
export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH
export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/opp
fi
function compile_app()
{
cd ../ascend310_infer/src/ || exit
if [ -f "Makefile" ]; then
make clean
fi
bash build.sh &> build.log
}
function infer()
{
cd - || exit
if [ -d result_Files ]; then
rm -rf ./result_Files
fi
if [ -d time_Result ]; then
rm -rf ./time_Result
fi
mkdir result_Files
mkdir time_Result
../ascend310_infer/src/main --mindir_path=$model --dataset_path=$data_path --device_id=$device_id &> infer.log
}
function cal_acc()
{
python ../create_imagenet2012_label.py --img_path=$data_path
python ../postprocess.py --result_path=./result_Files --label_path=./imagenet_label.json &> acc.log &
}
compile_app
if [ $? -ne 0 ]; then
echo "compile app code failed"
exit 1
fi
infer
if [ $? -ne 0 ]; then
echo " execute inference failed"
exit 1
fi
cal_acc
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import os
import ast
import argparse
from pprint import pprint, pformat
import yaml
_config_path = "./default_config.yaml"
class Config:
"""
Configuration namespace. Convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser: Parent parser.
cfg: Base configuration.
helper: Helper description.
cfg_path: Path to the default yaml config.
"""
parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
parents=[parser])
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
Args:
yaml_path: Path to the yaml config.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = [x for x in cfgs]
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config description for help, choices) are supported in config yaml")
print(cfg_helper)
except:
raise ValueError("Failed to parse yaml")
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args: Command line arguments.
cfg: Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
parser = argparse.ArgumentParser(description="default name", add_help=False)
current_dir = os.path.dirname(os.path.abspath(__file__))
parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../../default_config.yaml"),
help="Config file path")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
final_config = merge(args, default)
pprint(final_config)
print("Please check the above information for the configurations", flush=True)
return Config(final_config)
config = get_config()
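In effect, every top-level key in default_config.yaml becomes both a command-line flag and an attribute of `config`. A small illustrative usage (the flag names come from the yaml shown earlier):

```python
# Values can be overridden on the command line, e.g.
#   python train.py --per_batch_size 64 --lr 0.05
# and are then read as attributes of the parsed config object.
from src.model_utils.config import config

print(config.per_batch_size, config.lr, config.image_size)
```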
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Device adapter for ModelArts"""
from src.model_utils.config import config
if config.enable_modelarts:
from src.model_utils.moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
else:
from src.model_utils.local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
__all__ = [
"get_device_id", "get_device_num", "get_rank_id", "get_job_id"
]
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Local adapter"""
import os
def get_device_id():
device_id = os.getenv('DEVICE_ID', '0')
return int(device_id)
def get_device_num():
device_num = os.getenv('RANK_SIZE', '1')
return int(device_num)
def get_rank_id():
global_rank_id = os.getenv('RANK_ID', '0')
return int(global_rank_id)
def get_job_id():
return "Local Job"
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Moxing adapter for ModelArts"""
import os
import functools
from mindspore import context
from src.model_utils.config import config
_global_sync_count = 0
def get_device_id():
device_id = os.getenv('DEVICE_ID', '0')
return int(device_id)
def get_device_num():
device_num = os.getenv('RANK_SIZE', '1')
return int(device_num)
def get_rank_id():
global_rank_id = os.getenv('RANK_ID', '0')
return int(global_rank_id)
def get_job_id():
job_id = os.getenv('JOB_ID')
    job_id = job_id if job_id else "default"
return job_id
def sync_data(from_path, to_path):
"""
Download data from remote obs to local directory if the first url is remote url and the second one is local path
Upload data from local directory to remote obs in contrast.
"""
import moxing as mox
import time
global _global_sync_count
sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)
_global_sync_count += 1
# Each server contains 8 devices as most.
if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):
print("from path: ", from_path)
print("to path: ", to_path)
mox.file.copy_parallel(from_path, to_path)
print("===finish data synchronization===")
try:
os.mknod(sync_lock)
except IOError:
pass
print("===save flag===")
while True:
if os.path.exists(sync_lock):
break
time.sleep(1)
print("Finish sync data from {} to {}.".format(from_path, to_path))
def moxing_wrapper(pre_process=None, post_process=None):
"""
Moxing wrapper to download dataset and upload outputs.
"""
def wrapper(run_func):
@functools.wraps(run_func)
def wrapped_func(*args, **kwargs):
# Download data from data_url
if config.enable_modelarts:
if config.data_url:
sync_data(config.data_url, config.data_path)
print("Dataset downloaded: ", os.listdir(config.data_path))
if config.checkpoint_url:
sync_data(config.checkpoint_url, config.load_path)
print("Preload downloaded: ", os.listdir(config.load_path))
if config.train_url:
sync_data(config.train_url, config.output_path)
print("Workspace downloaded: ", os.listdir(config.output_path))
context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))
config.device_num = get_device_num()
config.device_id = get_device_id()
if not os.path.exists(config.output_path):
os.makedirs(config.output_path)
if pre_process:
pre_process()
run_func(*args, **kwargs)
# Upload data to train_url
if config.enable_modelarts:
if post_process:
post_process()
if config.train_url:
print("Start to copy output directory")
sync_data(config.output_path, config.train_url)
return wrapped_func
return wrapper
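A typical way the decorator is used; `train_net` and the pre-process body below are placeholders for the real entry point in train.py, not code taken from this repository.

```python
# Sketch: moxing_wrapper syncs data_url/checkpoint_url to the local cache before the
# wrapped function runs and uploads output_path to train_url afterwards (ModelArts only).
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper

def modelarts_pre_process():
    # e.g. point outputs at the local cache directory prepared by ModelArts
    config.ckpt_path = config.output_path

@moxing_wrapper(pre_process=modelarts_pre_process)
def train_net():
    ...  # build the dataset and model, then run training

if __name__ == "__main__":
    train_net()
```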