diff --git a/official/cv/se_resnext50/README.md b/official/cv/se_resnext50/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9421b32f706b2fa6fb9d72570c5a32bb90ba8843
--- /dev/null
+++ b/official/cv/se_resnext50/README.md
@@ -0,0 +1,332 @@
+# Contents
+
+- [SE_ResNeXt50 Description](#se_resnext50-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+    - [Evaluation Process](#evaluation-process)
+    - [Model Export](#model-export)
+    - [Inference Process](#inference-process)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Training Performance](#training-performance)
+        - [Inference Performance](#inference-performance)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+# [SE_ResNeXt50 Description](#contents)
+
+SE-ResNeXt50 is a variant of ResNeXt50 that adds Squeeze-and-Excitation blocks; see [paper 1](https://arxiv.org/abs/1709.01507) below. ResNeXt50 is a simple, highly modularized network architecture for image classification. Its design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which the authors call "cardinality" (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. For ResNeXt50, see [paper 2](https://arxiv.org/abs/1611.05431) below.
+
+[paper1](https://arxiv.org/abs/1709.01507): Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu. "Squeeze-and-Excitation Networks"
+
+[paper2](https://arxiv.org/abs/1611.05431): Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. "Aggregated Residual Transformations for Deep Neural Networks"
+
+# [Model architecture](#contents)
+
+The overall network architecture of SE_ResNeXt50 is shown below:
+
+[Link](https://arxiv.org/abs/1709.01507)
+
+# [Dataset](#contents)
+
+Dataset used: [imagenet2012](http://www.image-net.org/)
+
+- Dataset size: ~125G, 224*224 color images in 1000 classes
+    - Train: 120G, 1281167 images
+    - Test: 5G, 50000 images
+- Data format: RGB images
+    - Note: Data will be processed in src/dataset.py
+
+# [Features](#contents)
+
+## [Mixed Precision](#contents)
+
+The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates deep neural network training by using both single-precision and half-precision data formats, while maintaining the network accuracy achieved by single-precision training. Mixed precision training speeds up computation, reduces memory usage, and enables a larger model or batch size to be trained on specific hardware.
+
+For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. You can check which operators run at reduced precision by enabling the INFO log and searching for 'reduce precision'.
+
+# [Environment Requirements](#contents)
+
+- Hardware (Ascend)
+    - Prepare hardware environment with Ascend processor.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/r1.3/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/r1.3/index.html)
+
+If you want to run on ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/). You can then start training and evaluation as follows:
+
+```bash
+# run distributed training on modelarts example
+# (1) First, perform a or b.
+#       a. Set "enable_modelarts=True" in the yaml file.
+#          Set any other parameters you need in the yaml file.
+#       b. Add "enable_modelarts=True" on the website UI interface.
+#          Add other parameters on the website UI interface.
+# (2) Set the code directory to "/path/SE_ResNeXt50" on the website UI interface.
+# (3) Set the startup file to "train.py" on the website UI interface.
+# (4) Set the "Dataset path", "Output file path" and "Job log path" to your paths on the website UI interface.
+# (5) Create your job.
+
+# run evaluation on modelarts example
+# (1) Copy or upload your trained model to the S3 bucket.
+# (2) Perform a or b.
+#       a. Set "enable_modelarts=True" in the yaml file.
+#          Set "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" in the yaml file.
+#          Set "checkpoint_url=/The path of checkpoint in S3/" in the yaml file.
+#       b. Add "enable_modelarts=True" on the website UI interface.
+#          Add "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
+#          Add "checkpoint_url=/The path of checkpoint in S3/" on the website UI interface.
+# (3) Set the code directory to "/path/se_resnext50" on the website UI interface.
+# (4) Set the startup file to "eval.py" on the website UI interface.
+# (5) Set the "Dataset path", "Output file path" and "Job log path" to your paths on the website UI interface.
+# (6) Create your job.
+```
+
+# [Script description](#contents)
+
+## [Script and sample code](#contents)
+
+```text
+.
+└─SE_ResNeXt50
+  ├─README.md
+  ├─README_CN.md
+  ├─scripts
+    ├─run_standalone_train.sh         # launch standalone training for ascend(1p)
+    ├─run_distribute_train.sh         # launch distributed training for ascend(8p)
+    └─run_eval.sh                     # launch evaluating
+  ├─src
+    ├─backbone
+      ├─_init_.py                     # initialize
+      ├─resnet.py                     # SE_ResNeXt50 backbone
+    ├─utils
+      ├─_init_.py                     # initialize
+      ├─cunstom_op.py                 # network operation
+      ├─logging.py                    # print log
+      ├─optimizers_init_.py           # get parameters
+      ├─sampler.py                    # distributed sampler
+      ├─var_init_.py                  # calculate gain value
+    ├─_init_.py                       # initialize
+    ├─config.py                       # parameter configuration
+    ├─crossentropy.py                 # CrossEntropy loss function
+    ├─dataset.py                      # data preprocessing
+    ├─head.py                         # common head
+    ├─image_classification.py         # get resnet
+    ├─linear_warmup.py                # linear warmup learning rate
+    ├─warmup_cosine_annealing.py      # learning rate each step
+    ├─warmup_step_lr.py               # warmup step learning rate
+    ├─model_utils
+      ├─config.py                     # parameter configuration
+      ├─device_adapter.py             # device adapter
+      ├─local_adapter.py              # local adapter
+      ├─moxing_adapter.py             # moxing adapter
+  ├─default_config.yaml               # parameter configuration
+  ├─eval.py                           # eval net
+  ├─train.py                          # train net
+  ├─export.py                         # export mindir script
+  ├─mindspore_hub_conf.py             # mindspore hub interface
+
+```
+
+## [Script Parameters](#contents)
+
+Parameters for both training and evaluation can be set in default_config.yaml.
+
+```yaml
+image_size: [224,224]                         # image size
+num_classes: 1000                             # dataset class number
+batch_size: 1                                 # batch size of input
+lr: 0.05                                      # base learning rate
+lr_scheduler: "cosine_annealing"              # learning rate mode
+lr_epochs: [30,60,90,120]                     # epochs at which lr changes
+lr_gamma: 0.1                                 # factor by which the exponential lr_scheduler decreases lr
+eta_min: 0                                    # eta_min in cosine_annealing scheduler
+T_max: 150                                    # T-max in cosine_annealing scheduler
+max_epoch: 150                                # max epoch num to train the model
+warmup_epochs: 1                              # warmup epoch
+weight_decay: 0.0001                          # weight decay
+momentum: 0.9                                 # momentum
+is_dynamic_loss_scale: 0                      # dynamic loss scale
+loss_scale: 1024                              # loss scale
+label_smooth: 1                               # label_smooth
+label_smooth_factor: 0.                       # label_smooth_factor
+per_batch_size: 128                           # batch size of input tensor
+ckpt_interval: 2000                           # ckpt_interval
+ckpt_save_max: 5                              # max of checkpoint save
+is_save_on_master: 1
+rank_save_ckpt_flag: 0                        # local rank of distributed
+outputs_dir: ""                               # output path
+log_path: "./output_log"                      # log path
+```
+
+## [Training Process](#contents)
+
+### Usage
+
+You can start training with the Python script:
+
+```bash
+python train.py --data_dir ~/imagenet/train/ --device_target Ascend --run_distribute 0
+```
+
+or with the shell scripts:
+
+```bash
+# Ascend
+# distributed training example (8p)
+sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
+# standalone training
+sh run_standalone_train.sh DEVICE_ID DATA_PATH
+```
+
+#### Launch
+
+```bash
+# distributed training example(8p) for Ascend
+sh run_distribute_train.sh RANK_TABLE_FILE /dataset/train
+# standalone training example for Ascend
+sh run_standalone_train.sh 0 /dataset/train
+```
+
+You can find the checkpoint file together with the result in the log.
+
+## [Evaluation Process](#contents)
+
+### Usage
+
+You can start evaluation with the Python script.
+
+Before running it, set the configuration item run_distribute in default_config.yaml to False.
+
+```bash
+python eval.py --data_path ~/imagenet/val/ --device_target Ascend --checkpoint_file_path se_resnext50.ckpt
+```
+
+or with the shell script:
+
+```bash
+# Evaluation
+sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH DEVICE_TARGET
+```
+
+`DEVICE_TARGET` can be Ascend; the default is Ascend.
+
+#### Launch
+
+```bash
+# Evaluation with checkpoint
+sh run_eval.sh 0 /opt/npu/datasets/classification/val /se_resnext50.ckpt Ascend
+```
+
+#### Result
+
+Evaluation results are stored in the scripts path. There, the log contains results like the following:
+
+```log
+acc=78.81%(TOP1)
+acc=94.40%(TOP5)
+```
+
+## [Model Export](#contents)
+
+```bash
+python export.py --device_target [DEVICE_TARGET] --checkpoint_file_path [CKPT_PATH] --file_format [EXPORT_FORMAT]
+```
+
+The `checkpoint_file_path` parameter is required.
+`EXPORT_FORMAT` should be in ["AIR", "MINDIR"].
+
+Export on ModelArts (if you want to run on ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/); you can then start as follows):
+
+```python
+# Export on ModelArts
+# (1) Perform a or b.
+#       a. Set "enable_modelarts=True" in the default_config.yaml file.
+#          Set "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" in the default_config.yaml file.
+#          Set "checkpoint_url='s3://dir_to_trained_ckpt/'" in the default_config.yaml file.
+#          Set "file_name='./resnext50'" in the default_config.yaml file.
+#          Set "file_format='AIR'" in the default_config.yaml file.
+#          Set any other parameters you need in the default_config.yaml file.
+#       b. Add "enable_modelarts=True" on the website UI interface.
+#          Add "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
+#          Add "checkpoint_url='s3://dir_to_trained_ckpt/'" on the website UI interface.
+#          Add "file_name='./SE_ResNeXt50'" on the website UI interface.
+#          Add "file_format='AIR'" on the website UI interface.
+#          Add other parameters on the website UI interface.
+# (2) Set the config_path="/path/yaml file" on the website UI interface.
+# (3) Set the code directory to "/path/SE_ResNeXt50" on the website UI interface.
+# (4) Set the startup file to "export.py" on the website UI interface.
+# (5) Set the "Output file path" and "Job log path" to your paths on the website UI interface.
+# (6) Create your job.
+```
+
+## [Inference Process](#contents)
+
+### Usage
+
+Before performing inference, the mindir file must be exported with export.py. Currently, only a batch size of 1 is supported.
+
+```bash
+# Ascend310 inference
+bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
+```
+
+`DEVICE_ID` is optional; the default value is 0.
+
+### Result
+
+The inference result is saved in the current path; you can find it in the acc.log file.
+
+```log
+Total data:50000, top1 accuracy:0.79174, top5 accuracy:0.94492.
+```
+
+# [Model description](#contents)
+
+## [Performance](#contents)
+
+### Training Performance
+
+| Parameters                 | SE_ResNeXt50                                                |
+| -------------------------- | ----------------------------------------------------------- |
+| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G; OS Euler2.8 |
+| Uploaded Date              | 08/31/2021                                                  |
+| MindSpore Version          | 1.3.0                                                       |
+| Dataset                    | ImageNet2012                                                |
+| Training Parameters        | default_config.yaml                                         |
+| Optimizer                  | Momentum                                                    |
+| Loss Function              | SoftmaxCrossEntropy                                         |
+| Loss                       | 1.4159617                                                   |
+| Accuracy                   | 78%(TOP1)                                                   |
+| Total time                 | 10 h (8p)                                                   |
+| Checkpoint for Fine tuning | 212 M (.ckpt file)                                          |
+
+### Inference Performance
+
+| Parameters        | Ascend 910              | Ascend 310   |
+| ----------------- | ----------------------- | ------------ |
+| Resource          | Ascend 910; OS Euler2.8 | Ascend 310   |
+| Uploaded Date     | 08/31/2021              | 08/31/2021   |
+| MindSpore Version | 1.3.0                   | 1.3.0        |
+| Dataset           | ImageNet2012            | ImageNet2012 |
+| batch_size        | 128                     | 1            |
+| outputs           | probability             | probability  |
+| Accuracy          | acc=78.61%(TOP1)        |              |
+
+# [Description of Random Situation](#contents)
+
+In dataset.py, we set the seed inside the "create_dataset" function.
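+We also use a random seed in train.py.
+
+# [ModelZoo Homepage](#contents)
+
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
diff --git a/official/cv/se_resnext50/README_CN.md b/official/cv/se_resnext50/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..51de7cd0d82474e7e988201e525b9186337737e7
--- /dev/null
+++ b/official/cv/se_resnext50/README_CN.md
@@ -0,0 +1,340 @@
+# Contents
+
+- [Contents](#contents)
+- [SE_ResNeXt50 Description](#se_resnext50-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+        - [Usage](#usage)
+        - [Example](#example)
+    - [Evaluation Process](#evaluation-process)
+        - [Usage](#usage-1)
+        - [Example](#example-1)
+        - [Result](#result)
+    - [Model Export](#model-export)
+    - [Inference Process](#inference-process)
+        - [Usage](#usage-2)
+        - [Result](#result-1)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Training Performance](#training-performance)
+        - [Inference Performance](#inference-performance)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+# SE_ResNeXt50 Description
+
+SE_ResNeXt50 is a variant of ResNeXt50; see paper 1. ResNeXt50 is a simple, highly modularized network architecture for image classification. ResNeXt50 is designed as a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy provides a new dimension, "cardinality" (the size of the set of transformations), which is an important factor in addition to the dimensions of depth and width. For ResNeXt50, see paper 2.
+
+Paper 1: Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu. "Squeeze-and-Excitation Networks"
+
+Paper 2: Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. "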
+Aggregated Residual Transformations for Deep Neural Networks"
+
+# Model Architecture
+
+The overall network architecture of SE_ResNeXt is shown below:
+
+[Link](https://arxiv.org/abs/1709.01507)
+
+# Dataset
+
+Dataset used: [ImageNet2012](http://www.image-net.org/)
+
+- Dataset size: about 125G, 224*224 color images in 1000 classes
+    - Training set: 120G, 1281167 images
+    - Test set: 5G, 50000 images
+- Data format: RGB images.
+    - Note: Data is processed in src/dataset.py.
+
+# Features
+
+## Mixed Precision
+
+The [mixed precision](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/enable_mixed_precision.html) training method uses both single-precision and half-precision data to speed up the training of deep neural networks, while maintaining the network accuracy achieved by single-precision training. Mixed precision training speeds up computation and reduces memory usage, and it enables training a larger model or a larger batch size on specific hardware.
+
+Taking FP16 operators as an example, if the input data type is FP32, the MindSpore backend automatically reduces the precision to process the data. You can enable the INFO log and search for "reduce precision" to see the operators whose precision was reduced.
+
+# Environment Requirements
+
+- Hardware (Ascend)
+    - Prepare the hardware environment with an Ascend processor.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/r1.3/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/r1.3/index.html)
+
+To train the model on ModelArts, refer to the official ModelArts documentation (https://support.huaweicloud.com/modelarts/)
+and start training and inference as follows:
+
+```bash
+# Example of distributed training on ModelArts:
+# (1) Choose either a or b.
+#       a. Set "enable_modelarts=True".
+#          Set the parameters the network needs in the yaml file.
+#       b. Add the "enable_modelarts=True" parameter on the ModelArts UI.
+#          Set the parameters the network needs on the ModelArts UI.
+# (2) Set the code path "/path/SE_ResNeXt50" on the ModelArts UI.
+# (3) Set the startup file "train.py" on the ModelArts UI.
+# (4) Set the data path "Dataset path",
+#     the output path "Output file path" and the log path "Job log path" on the ModelArts UI.
+# (5) Start training the model.
+
+# Example of model inference on ModelArts
+# (1) Put the trained model in the corresponding location of the bucket.
+# (2) Choose either a or b.
+#       a. Set "enable_modelarts=True".
+#          Set "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" in the yaml file.
+#          Set "checkpoint_url=/The path of checkpoint in S3/" in the yaml file.
+#       b.
+#          Add the "enable_modelarts=True" parameter on the ModelArts UI.
+#          Add the "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" parameter on the ModelArts UI.
+#          Add the "checkpoint_url=/The path of checkpoint in S3/" parameter on the ModelArts UI.
+# (3) Set the code path "/path/SE_ResNeXt50" on the ModelArts UI.
+# (4) Set the startup file "eval.py" on the ModelArts UI.
+# (5) Set the data path "Dataset path",
+#     the output path "Output file path" and the log path "Job log path" on the ModelArts UI.
+# (6) Start model inference.
+```
+
+# Script Description
+
+## Script and Sample Code
+
+```text
+.
+└─SE_ResNeXt50
+  ├─README.md
+  ├─README_CN.md
+  ├─scripts
+    ├─run_standalone_train.sh         # launch Ascend standalone training (1p)
+    ├─run_distribute_train.sh         # launch Ascend distributed training (8p)
+    └─run_eval.sh                     # launch evaluation
+  ├─src
+    ├─backbone
+      ├─_init_.py                     # initialize
+      ├─resnet.py                     # SE_ResNeXt50 backbone
+    ├─utils
+      ├─_init_.py                     # initialize
+      ├─cunstom_op.py                 # network operation
+      ├─logging.py                    # print log
+      ├─optimizers_init_.py           # get parameters
+      ├─sampler.py                    # distributed sampler
+      ├─var_init_.py                  # calculate gain value
+    ├─_init_.py                       # initialize
+    ├─config.py                       # parameter configuration
+    ├─crossentropy.py                 # cross-entropy loss function
+    ├─dataset.py                      # data preprocessing
+    ├─head.py                         # common head
+    ├─image_classification.py         # get ResNet
+    ├─linear_warmup.py                # linear warmup learning rate
+    ├─warmup_cosine_annealing.py      # learning rate of each step
+    ├─warmup_step_lr.py               # warmup step learning rate
+    ├─model_utils
+      ├─config.py                     # parameter configuration
+      ├─device_adapter.py             # device configuration
+      ├─local_adapter.py              # local device configuration
+      ├─moxing_adapter.py             # ModelArts device configuration
+  ├─default_config.yaml               # parameter configuration items
+  ├─eval.py                           # evaluate the network
+  ├─train.py                          # train the network
+  ├─mindspore_hub_conf.py             # MindSpore Hub interface
+```
+
+## Script Parameters
+
+Parameters for both training and evaluation can be set in default_config.yaml.
+
+```yaml
+# Training options
+image_size: [224,224]                         # image size
+num_classes: 1000                             # number of dataset classes
+batch_size: 1                                 # batch size of input data
+lr: 0.05                                      # base learning rate
+lr_scheduler: "cosine_annealing"              # learning rate mode
+lr_epochs: [30,60,90,120]                     # epochs at which lr changes
+lr_gamma: 0.1                                 # factor by which the exponential lr_scheduler decreases lr
+eta_min: 0                                    # eta_min in cosine_annealing scheduler
+T_max: 150                                    # T-max in cosine_annealing scheduler
+max_epoch: 150                                # max number of epochs to train the model
+warmup_epochs: 1                              #
+                                              # warmup epochs
+weight_decay: 0.0001                          # weight decay
+momentum: 0.9                                 # momentum
+is_dynamic_loss_scale: 0                      # dynamic loss scale
+loss_scale: 1024                              # loss scale
+label_smooth: 1                               # label smoothing
+label_smooth_factor: 0.                       # label smoothing factor
+per_batch_size: 128                           # batch size of input tensor
+ckpt_interval: 2000                           # checkpoint interval
+ckpt_save_max: 5                              # max number of checkpoints to save
+is_save_on_master: 1
+rank_save_ckpt_flag: 0                        # local rank in distributed training
+outputs_dir: ""                               # output path
+log_path: "./output_log"                      # log path
+```
+
+## Training Process
+
+### Usage
+
+You can start training with the Python script:
+
+```bash
+python train.py --data_dir ~/imagenet/train/ --device_target Ascend --run_distribute 0
+```
+
+or with the shell scripts:
+
+```bash
+# Ascend
+# distributed training example (8p)
+sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
+# standalone training
+sh run_standalone_train.sh DEVICE_ID DATA_PATH
+```
+
+### Example
+
+```bash
+# Ascend distributed training example (8p)
+sh run_distribute_train.sh RANK_TABLE_FILE /dataset/train
+# Ascend standalone training example
+sh run_standalone_train.sh 0 /dataset/train
+```
+
+You can find the checkpoint file and the result in the log.
+
+## Evaluation Process
+
+### Usage
+
+You can start evaluation with the Python script.
+
+Before running it, set the configuration item run_distribute in default_config.yaml to False.
+
+```bash
+python eval.py --data_path ~/imagenet/val/ --device_target Ascend --checkpoint_file_path se_resnext50.ckpt
+```
+
+or with the shell script:
+
+```bash
+# Evaluation
+sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH DEVICE_TARGET
+```
+
+`DEVICE_TARGET` can be Ascend; the default is Ascend.
+
+#### Example
+
+```bash
+# Evaluation with checkpoint
+sh run_eval.sh 0 /opt/npu/datasets/classification/val /se_resnext50.ckpt Ascend
+```
+
+#### Result
+
+Evaluation results are stored in the scripts path. The log contains results like the following:
+
+```log
+acc=78.81%(TOP1)
+acc=94.40%(TOP5)
+```
+
+## Model Export
+
+```bash
+python export.py --device_target [DEVICE_TARGET] --checkpoint_file_path [CKPT_PATH] --file_format [EXPORT_FORMAT]
+```
+
+The `checkpoint_file_path` parameter is required.
+`EXPORT_FORMAT` should be in ["AIR", "MINDIR"].
+
+Exporting mindir on ModelArts
+
+```python
+# (1) Put the trained model in the corresponding location of the bucket.
+# (2) Choose either a or b.
+#       a.
+#          Set "enable_modelarts=True".
+#          Set "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" in the yaml file.
+#          Set "checkpoint_url=/The path of checkpoint in S3/" in the yaml file.
+#          Set "file_name='./SE_ResNeXt50'" in the yaml file.
+#          Set "file_format='AIR'" in the yaml file.
+#       b. Add the "enable_modelarts=True" parameter on the ModelArts UI.
+#          Add the "checkpoint_file_path='/cache/checkpoint_path/model.ckpt'" parameter on the ModelArts UI.
+#          Add the "checkpoint_url=/The path of checkpoint in S3/" parameter on the ModelArts UI.
+#          Set the "file_name='./resnext50'" parameter on the ModelArts UI.
+#          Set the "file_format='AIR'" parameter on the ModelArts UI.
+# (3) Set the path of the network configuration file "config_path=/The path of config in S3/".
+# (4) Set the code path "/path/SE_ResNeXt50" on the ModelArts UI.
+# (5) Set the startup file "export.py" on the ModelArts UI, together with
+#     the output path "Output file path" and the log path "Job log path".
+# (6) Start exporting the mindir.
+```
+
+## [Inference Process](#contents)
+
+### Usage
+
+Before performing inference, the mindir file must be exported with export.py.
+Currently, only a batch size of 1 is supported.
+
+```bash
+# Ascend310 inference
+bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
+```
+
+`DEVICE_ID` is optional; the default value is 0.
+
+### Result
+
+The inference result is saved in the current path; the final accuracy can be found in acc.log.
+
+```log
+Total data:50000, top1 accuracy:0.79174, top5 accuracy:0.94492.
+```
+
+# Model Description
+
+## Performance
+
+### Training Performance
+
+| Parameters                 | SE_ResNeXt50                                                  |
+| -------------------------- | ------------------------------------------------------------- |
+| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; memory 755GB; OS Euler2.8 |
+| Uploaded Date              | 2021-8-31                                                     |
+| MindSpore Version          | 1.3.0                                                         |
+| Dataset                    | ImageNet2012                                                  |
+| Training Parameters        | default_config.yaml                                           |
+| Optimizer                  | Momentum                                                      |
+| Loss Function              | Softmax cross-entropy                                         |
+| Loss                       | 1.4159617                                                     |
+| Accuracy                   | 78%(TOP1)                                                     |
+| Total time                 | 10 hours (8p)                                                 |
+| Checkpoint for Fine tuning | 212 M (.ckpt file)                                            |
+
+### Inference Performance
+
+| Parameters        | Ascend 910              | Ascend 310   |
+| ----------------- | ----------------------- | ------------ |
+| Resource          | Ascend 910; OS Euler2.8 | Ascend 310   |
+| Uploaded Date     | 2021-8-31               | 2021-8-31    |
+| MindSpore Version | 1.3.0                   | 1.3.0        |
+| Dataset           | ImageNet2012            | ImageNet2012 |
+| batch_size        | 128                     | 1            |
+| outputs           | probability             | probability  |
+| Accuracy          | acc=78.61%(TOP1)        |              |
+
+# Description of Random Situation
+
+In dataset.py, the seed is set inside the "create_dataset" function; a random seed is also used in train.py.
+
+# ModelZoo Homepage
+
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
diff --git a/official/cv/se_resnext50/ascend310_infer/inc/utils.h b/official/cv/se_resnext50/ascend310_infer/inc/utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..f8ae1e5b473d869b77af8d725a280d7c7665527c
--- /dev/null
+++ b/official/cv/se_resnext50/ascend310_infer/inc/utils.h
@@ -0,0 +1,35 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef MINDSPORE_INFERENCE_UTILS_H_
+#define MINDSPORE_INFERENCE_UTILS_H_
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <vector>
+#include <string>
+#include <memory>
+#include "include/api/types.h"
+
+std::vector<std::string> GetAllFiles(std::string_view dirName);
+DIR *OpenDir(std::string_view dirName);
+std::string RealPath(std::string_view path);
+mindspore::MSTensor ReadFileToTensor(const std::string &file);
+int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
+std::vector<std::string> GetAllFiles(std::string dir_name);
+std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name);
+
+#endif
diff --git a/official/cv/se_resnext50/ascend310_infer/src/CMakeLists.txt b/official/cv/se_resnext50/ascend310_infer/src/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0397995b0e0b37c4fa7c39c93ebd41011d5bd936
--- /dev/null
+++ b/official/cv/se_resnext50/ascend310_infer/src/CMakeLists.txt
@@ -0,0 +1,14 @@
+cmake_minimum_required(VERSION 3.14.1)
+project(MindSporeCxxTestcase[CXX])
+add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
+set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
+option(MINDSPORE_PATH "mindspore install path" "")
+include_directories(${MINDSPORE_PATH})
+include_directories(${MINDSPORE_PATH}/include)
+include_directories(${PROJECT_SRC_ROOT}/../)
+find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
+file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
+
+add_executable(main main.cc utils.cc)
+target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)
diff --git a/official/cv/se_resnext50/ascend310_infer/src/build.sh b/official/cv/se_resnext50/ascend310_infer/src/build.sh
new file mode 100644
index 0000000000000000000000000000000000000000..7fac9cff3a98c83bce7e8f66053fab2ecebab86d
--- /dev/null
+++ 
b/official/cv/se_resnext50/ascend310_infer/src/build.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+cmake . -DMINDSPORE_PATH="`pip3.7 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
+make
\ No newline at end of file
diff --git a/official/cv/se_resnext50/ascend310_infer/src/main.cc b/official/cv/se_resnext50/ascend310_infer/src/main.cc
new file mode 100644
index 0000000000000000000000000000000000000000..d85816925229fde1d1eb784295ba57ba985d6c31
--- /dev/null
+++ b/official/cv/se_resnext50/ascend310_infer/src/main.cc
@@ -0,0 +1,145 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <sys/time.h>
+#include <gflags/gflags.h>
+#include <dirent.h>
+#include <iostream>
+#include <string>
+#include <algorithm>
+#include <map>
+#include <vector>
+#include <fstream>
+#include <sstream>
+
+#include "include/api/model.h"
+#include "include/api/context.h"
+#include "include/api/types.h"
+#include "include/api/serialization.h"
+#include "include/dataset/vision_ascend.h"
+#include "include/dataset/execute.h"
+#include "include/dataset/transforms.h"
+#include "include/dataset/vision.h"
+#include "inc/utils.h"
+
+using mindspore::dataset::vision::Decode;
+using mindspore::dataset::vision::Resize;
+using mindspore::dataset::vision::CenterCrop;
+using mindspore::dataset::vision::Normalize;
+using mindspore::dataset::vision::HWC2CHW;
+using mindspore::dataset::TensorTransform;
+using mindspore::Context;
+using mindspore::Serialization;
+using mindspore::Model;
+using mindspore::Status;
+using mindspore::ModelType;
+using mindspore::GraphCell;
+using mindspore::kSuccess;
+using mindspore::MSTensor;
+using mindspore::dataset::Execute;
+
+
+DEFINE_string(mindir_path, "", "mindir path");
+DEFINE_string(dataset_path, ".", "dataset path");
+DEFINE_int32(device_id, 0, "device id");
+
+int main(int argc, char **argv) {
+  gflags::ParseCommandLineFlags(&argc, &argv, true);
+  if (RealPath(FLAGS_mindir_path).empty()) {
+    std::cout << "Invalid mindir" << std::endl;
+    return 1;
+  }
+
+  // Bind the model to the requested Ascend 310 device.
+  auto context = std::make_shared<Context>();
+  auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
+  ascend310->SetDeviceID(FLAGS_device_id);
+  context->MutableDeviceInfo().push_back(ascend310);
+  mindspore::Graph graph;
+  Serialization::Load(FLAGS_mindir_path, ModelType::kMindIR, &graph);
+  Model model;
+  Status ret = model.Build(GraphCell(graph), context);
+  if (ret != kSuccess) {
+    std::cout << "ERROR: Build failed."
+              << std::endl;
+    return 1;
+  }
+
+  auto all_files = GetAllInputData(FLAGS_dataset_path);
+  if (all_files.empty()) {
+    std::cout << "ERROR: no input data." << std::endl;
+    return 1;
+  }
+
+  std::map<double, double> costTime_map;
+  size_t size = all_files.size();
+
+  // Preprocessing pipeline: decode, resize, center-crop, normalize, HWC -> CHW.
+  std::shared_ptr<TensorTransform> decode(new Decode());
+  std::shared_ptr<TensorTransform> resize(new Resize({256, 256}));
+  std::shared_ptr<TensorTransform> centercrop(new CenterCrop({224, 224}));
+  std::shared_ptr<TensorTransform> normalize(new Normalize({123.675, 116.28, 103.53},
+                                                           {58.395, 57.12, 57.375}));
+  std::shared_ptr<TensorTransform> hwc2chw(new HWC2CHW());
+
+  std::vector<std::shared_ptr<TensorTransform>> trans_list;
+  trans_list = {decode, resize, centercrop, normalize, hwc2chw};
+
+  mindspore::dataset::Execute SingleOp(trans_list);
+
+  for (size_t i = 0; i < size; ++i) {
+    for (size_t j = 0; j < all_files[i].size(); ++j) {
+      struct timeval start = {0};
+      struct timeval end = {0};
+      double startTimeMs;
+      double endTimeMs;
+      std::vector<MSTensor> inputs;
+      std::vector<MSTensor> outputs;
+      std::cout << "Start predict input files:" << all_files[i][j] << std::endl;
+      auto imgDvpp = std::make_shared<MSTensor>();
+      SingleOp(ReadFileToTensor(all_files[i][j]), imgDvpp.get());
+
+      inputs.emplace_back(imgDvpp->Name(), imgDvpp->DataType(), imgDvpp->Shape(),
+                          imgDvpp->Data().get(), imgDvpp->DataSize());
+      gettimeofday(&start, nullptr);
+      ret = model.Predict(inputs, &outputs);
+      gettimeofday(&end, nullptr);
+      if (ret != kSuccess) {
+        std::cout << "Predict " << all_files[i][j] << " failed."
+                  << std::endl;
+        return 1;
+      }
+      startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
+      endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
+      costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
+      WriteResult(all_files[i][j], outputs);
+    }
+  }
+  double average = 0.0;
+  int inferCount = 0;
+
+  for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
+    double diff = 0.0;
+    diff = iter->second - iter->first;
+    average += diff;
+    inferCount++;
+  }
+  // Guard against division by zero when no image was actually processed.
+  average = inferCount > 0 ? average / inferCount : 0.0;
+  std::stringstream timeCost;
+  timeCost << "NN inference cost average time: " << average << " ms of infer_count " << inferCount << std::endl;
+  std::cout << "NN inference cost average time: " << average << " ms of infer_count " << inferCount << std::endl;
+  std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
+  std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
+  fileStream << timeCost.str();
+  fileStream.close();
+  costTime_map.clear();
+  return 0;
+}
diff --git a/official/cv/se_resnext50/ascend310_infer/src/utils.cc b/official/cv/se_resnext50/ascend310_infer/src/utils.cc
new file mode 100644
index 0000000000000000000000000000000000000000..d71f388b83d23c2813d8bfc883dbcf2e7e0e4ef0
--- /dev/null
+++ b/official/cv/se_resnext50/ascend310_infer/src/utils.cc
@@ -0,0 +1,185 @@
+/**
+ * Copyright 2021 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ + +#include <fstream> +#include <algorithm> +#include <iostream> +#include "inc/utils.h" + +using mindspore::MSTensor; +using mindspore::DataType; + + +std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name) { + std::vector<std::vector<std::string>> ret; + + DIR *dir = OpenDir(dir_name); + if (dir == nullptr) { + return {}; + } + struct dirent *filename; + /* read all the files in the dir ~ */ + std::vector<std::string> sub_dirs; + while ((filename = readdir(dir)) != nullptr) { + std::string d_name = std::string(filename->d_name); + // get rid of "." and ".." + if (d_name == "." || d_name == ".." || d_name.empty()) { + continue; + } + std::string dir_path = RealPath(std::string(dir_name) + "/" + filename->d_name); + struct stat s; + lstat(dir_path.c_str(), &s); + if (!S_ISDIR(s.st_mode)) { + continue; + } + + sub_dirs.emplace_back(dir_path); + } + std::sort(sub_dirs.begin(), sub_dirs.end()); + + (void)std::transform(sub_dirs.begin(), sub_dirs.end(), std::back_inserter(ret), + [](const std::string &d) { return GetAllFiles(d); }); + + return ret; +} + + +std::vector<std::string> GetAllFiles(std::string dir_name) { + struct dirent *filename; + DIR *dir = OpenDir(dir_name); + if (dir == nullptr) { + return {}; + } + + std::vector<std::string> res; + while ((filename = readdir(dir)) != nullptr) { + std::string d_name = std::string(filename->d_name); + if (d_name == "." || d_name == ".." || d_name.size() <= 3) { + continue; + } + res.emplace_back(std::string(dir_name) + "/" + filename->d_name); + } + std::sort(res.begin(), res.end()); + + return res; +} + + +std::vector<std::string> GetAllFiles(std::string_view dirName) { + struct dirent *filename; + DIR *dir = OpenDir(dirName); + if (dir == nullptr) { + return {}; + } + std::vector<std::string> res; + while ((filename = readdir(dir)) != nullptr) { + std::string dName = std::string(filename->d_name); + if (dName == "." || dName == ".." 
|| filename->d_type != DT_REG) { + continue; + } + res.emplace_back(std::string(dirName) + "/" + filename->d_name); + } + std::sort(res.begin(), res.end()); + for (auto &f : res) { + std::cout << "image file: " << f << std::endl; + } + return res; +} + + +int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) { + std::string homePath = "./result_Files"; + for (size_t i = 0; i < outputs.size(); ++i) { + size_t outputSize; + std::shared_ptr<const void> netOutput; + netOutput = outputs[i].Data(); + outputSize = outputs[i].DataSize(); + int pos = imageFile.rfind('/'); + std::string fileName(imageFile, pos + 1); + fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin"); + std::string outFileName = homePath + "/" + fileName; + FILE *outputFile = fopen(outFileName.c_str(), "wb"); + fwrite(netOutput.get(), outputSize, sizeof(char), outputFile); + fclose(outputFile); + outputFile = nullptr; + } + return 0; +} + +mindspore::MSTensor ReadFileToTensor(const std::string &file) { + if (file.empty()) { + std::cout << "Pointer file is nullptr" << std::endl; + return mindspore::MSTensor(); + } + + std::ifstream ifs(file); + if (!ifs.good()) { + std::cout << "File: " << file << " is not exist" << std::endl; + return mindspore::MSTensor(); + } + + if (!ifs.is_open()) { + std::cout << "File: " << file << "open failed" << std::endl; + return mindspore::MSTensor(); + } + + ifs.seekg(0, std::ios::end); + size_t size = ifs.tellg(); + mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size); + + ifs.seekg(0, std::ios::beg); + ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size); + ifs.close(); + + return buffer; +} + + +DIR *OpenDir(std::string_view dirName) { + if (dirName.empty()) { + std::cout << " dirName is null ! 
" << std::endl; + return nullptr; + } + std::string realPath = RealPath(dirName); + struct stat s; + lstat(realPath.c_str(), &s); + if (!S_ISDIR(s.st_mode)) { + std::cout << "dirName is not a valid directory !" << std::endl; + return nullptr; + } + DIR *dir; + dir = opendir(realPath.c_str()); + if (dir == nullptr) { + std::cout << "Can not open dir " << dirName << std::endl; + return nullptr; + } + std::cout << "Successfully opened the dir " << dirName << std::endl; + return dir; +} + +std::string RealPath(std::string_view path) { + char realPathMem[PATH_MAX] = {0}; + char *realPathRet = nullptr; + realPathRet = realpath(path.data(), realPathMem); + if (realPathRet == nullptr) { + std::cout << "File: " << path << " is not exist."; + return ""; + } + + std::string realPath(realPathMem); + std::cout << path << " realpath is: " << realPath << std::endl; + return realPath; +} diff --git a/official/cv/se_resnext50/create_imagenet2012_label.py b/official/cv/se_resnext50/create_imagenet2012_label.py new file mode 100644 index 0000000000000000000000000000000000000000..773855964154b5558a863b7a309efaa4d68efad7 --- /dev/null +++ b/official/cv/se_resnext50/create_imagenet2012_label.py @@ -0,0 +1,43 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# less required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""create_imagenet2012_label""" +import os +import json +import argparse + +parser = argparse.ArgumentParser(description="resnet imagenet2012 label") +parser.add_argument("--img_path", type=str, required=True, help="imagenet2012 file path.") +args = parser.parse_args() + +def create_label(file_path): + print("[WARNING] Create imagenet label. Currently only use for Imagenet2012!") + dirs = os.listdir(file_path) + file_list = [] + for file in dirs: + file_list.append(file) + file_list = sorted(file_list) + total = 0 + img_label = {} + for i, file_dir in enumerate(file_list): + files = os.listdir(os.path.join(file_path, file_dir)) + for f in files: + img_label[f] = i + total += len(files) + with open("imagenet_label.json", "w+") as label: + json.dump(img_label, label) + print("[INFO] Completed! Total {} data.".format(total)) + +if __name__ == '__main__': + create_label(args.img_path) diff --git a/official/cv/se_resnext50/default_config.yaml b/official/cv/se_resnext50/default_config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..10696db17283b45847350687345992d5f787fc15 --- /dev/null +++ b/official/cv/se_resnext50/default_config.yaml @@ -0,0 +1,73 @@ +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) +enable_modelarts: False +network: "se_resnext50" +# Url for modelarts +data_url: "" +train_url: "" +checkpoint_url: "" +# Path for local +run_distribute: False +enable_profiling: False +data_path: "/cache/data" +output_path: "/cache/train" +load_path: "/cache/checkpoint_path/" +device_target: "Ascend" +checkpoint_path: "./checkpoint/" +checkpoint_file_path: "" + +# ============================================================================== +# Training options +image_size: [224,224] +num_classes: 1000 +batch_size: 1 + +lr: 0.4 +lr_scheduler: "cosine_annealing" +lr_epochs: [30,60,90,120] +lr_gamma: 0.1 +eta_min: 
0 +T_max: 150 +max_epoch: 150 +warmup_epochs: 1 + +weight_decay: 0.0001 +momentum: 0.9 +is_dynamic_loss_scale: 0 +loss_scale: 1024 +label_smooth: 1 +label_smooth_factor: 0.1 +per_batch_size: 128 + +ckpt_interval: 5 +ckpt_save_max: 5 +is_save_on_master: 1 +rank_save_ckpt_flag: 0 +outputs_dir: "" +log_path: "./output_log" + +# Export options +device_id: 0 +width: 224 +height: 224 +file_name: "se_resnext50" +file_format: "AIR" +result_path: "" +label_path: "" + +--- +# Help description for each configuration +enable_modelarts: "Whether training on modelarts, default: False" +data_url: "Dataset url for obs" +train_url: "Training output url for obs" +checkpoint_url: "The location of checkpoint for obs" +data_path: "Dataset path for local" +output_path: "Training output path for local" +load_path: "The location of checkpoint for obs" +device_target: "Target device type, available: [Ascend, CPU]" +enable_profiling: "Whether enable profiling while training, default: False" +num_classes: "Class for dataset" +batch_size: "Batch size for training and evaluation" +epoch_size: "Total training epochs." +keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint" +checkpoint_path: "The location of the checkpoint file." +checkpoint_file_path: "The location of the checkpoint file." diff --git a/official/cv/se_resnext50/eval.py b/official/cv/se_resnext50/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..e339710b4b4400cc96fb32033b0dc5109002385d --- /dev/null +++ b/official/cv/se_resnext50/eval.py @@ -0,0 +1,207 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Eval""" +import os +import time +import datetime +import glob +import numpy as np +import mindspore.nn as nn + +from mindspore import Tensor, context +from mindspore.context import ParallelMode +from mindspore.communication.management import init, get_rank, get_group_size, release +from mindspore.ops import operations as P +from mindspore.ops import functional as F +from mindspore.common import dtype as mstype + +from src.utils.logging import get_logger +from src.utils.auto_mixed_precision import auto_mixed_precision +from src.utils.var_init import load_pretrain_model +from src.image_classification import get_network +from src.dataset import classification_dataset +from src.model_utils.config import config +from src.model_utils.moxing_adapter import moxing_wrapper + + +class ParameterReduce(nn.Cell): + """ParameterReduce""" + def __init__(self): + super(ParameterReduce, self).__init__() + self.cast = P.Cast() + self.reduce = P.AllReduce() + + def construct(self, x): + one = self.cast(F.scalar_to_array(1.0), mstype.float32) + out = x * one + ret = self.reduce(out) + return ret + + +def set_parameters(): + """set_parameters""" + if config.run_distribute: + init() + config.rank = get_rank() + config.group_size = get_group_size() + else: + config.rank = 0 + config.group_size = 1 + + config.outputs_dir = os.path.join(config.log_path, + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) + + config.logger = get_logger(config.outputs_dir, config.rank) + return config + + 
+def get_top5_acc(top5_arg, gt_class): + sub_count = 0 + for top5, gt in zip(top5_arg, gt_class): + if gt in top5: + sub_count += 1 + return sub_count + + +def get_result(model, top1_correct, top5_correct, img_tot): + """calculate top1 and top5 value.""" + results = [[top1_correct], [top5_correct], [img_tot]] + config.logger.info('before results=%s', results) + if config.run_distribute: + model_md5 = model.replace('/', '') + tmp_dir = '/cache' + if not os.path.exists(tmp_dir): + os.mkdir(tmp_dir) + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(config.rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(config.rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(config.rank, model_md5) + np.save(top1_correct_npy, top1_correct) + np.save(top5_correct_npy, top5_correct) + np.save(img_tot_npy, img_tot) + while True: + rank_ok = True + for other_rank in range(config.group_size): + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5) + if not os.path.exists(top1_correct_npy) or not os.path.exists(top5_correct_npy) or \ + not os.path.exists(img_tot_npy): + rank_ok = False + if rank_ok: + break + + top1_correct_all = 0 + top5_correct_all = 0 + img_tot_all = 0 + for other_rank in range(config.group_size): + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5) + top1_correct_all += np.load(top1_correct_npy) + top5_correct_all += np.load(top5_correct_npy) + img_tot_all += np.load(img_tot_npy) + results = [[top1_correct_all], [top5_correct_all], [img_tot_all]] + results = np.array(results) + else: + results = np.array(results) + + config.logger.info('after results=%s', 
results) + return results + +@moxing_wrapper() +def test(): + """test""" + set_parameters() + context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, + device_target=config.device_target, save_graphs=False) + if os.getenv('DEVICE_ID', "not_set").isdigit(): + context.set_context(device_id=int(os.getenv('DEVICE_ID'))) + + # init distributed + if config.run_distribute: + parallel_mode = ParallelMode.DATA_PARALLEL + context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=config.group_size, + gradients_mean=True) + + config.logger.save_args(config) + + # network + config.logger.important_info('start create network') + if os.path.isdir(config.checkpoint_file_path): + models = list(glob.glob(os.path.join(config.checkpoint_file_path, '*.ckpt'))) + print(models) + if config.checkpoint_file_path: + f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('-')[-1].split('_')[0]) + else: + f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('_')[-1]) + config.models = sorted(models, key=f) + else: + config.models = [config.checkpoint_file_path,] + + for model in config.models: + de_dataset = classification_dataset(config.data_path, image_size=config.image_size, + per_batch_size=config.per_batch_size, + max_epoch=1, rank=config.rank, group_size=config.group_size, + mode='eval') + eval_dataloader = de_dataset.create_tuple_iterator(output_numpy=True, num_epochs=1) + network = get_network(network=config.network, num_classes=config.num_classes, platform=config.device_target) + + load_pretrain_model(model, network, config) + + img_tot = 0 + top1_correct = 0 + top5_correct = 0 + if config.device_target == "Ascend": + network.to_float(mstype.float16) + else: + auto_mixed_precision(network) + network.set_train(False) + t_end = time.time() + it = 0 + for data, gt_classes in eval_dataloader: + output = network(Tensor(data, mstype.float32)) + output = output.asnumpy() + + top1_output = np.argmax(output, (-1)) + 
top5_output = np.argsort(output)[:, -5:] + + t1_correct = np.equal(top1_output, gt_classes).sum() + top1_correct += t1_correct + top5_correct += get_top5_acc(top5_output, gt_classes) + img_tot += config.per_batch_size + + if config.rank == 0 and it == 0: + t_end = time.time() + it = 1 + if config.rank == 0: + time_used = time.time() - t_end + fps = (img_tot - config.per_batch_size) * config.group_size / time_used + config.logger.info('Inference Performance: {:.2f} img/sec'.format(fps)) + results = get_result(model, top1_correct, top5_correct, img_tot) + top1_correct = results[0, 0] + top5_correct = results[1, 0] + img_tot = results[2, 0] + acc1 = 100.0 * top1_correct / img_tot + acc5 = 100.0 * top5_correct / img_tot + config.logger.info('after allreduce eval: top1_correct={}, tot={},' + 'acc={:.2f}%(TOP1)'.format(top1_correct, img_tot, acc1)) + config.logger.info('after allreduce eval: top5_correct={}, tot={},' + 'acc={:.2f}%(TOP5)'.format(top5_correct, img_tot, acc5)) + if config.run_distribute: + release() + + +if __name__ == "__main__": + test() diff --git a/official/cv/se_resnext50/export.py b/official/cv/se_resnext50/export.py new file mode 100644 index 0000000000000000000000000000000000000000..ec7699f46b51f091fac9ffb66b4c44805f1d5972 --- /dev/null +++ b/official/cv/se_resnext50/export.py @@ -0,0 +1,53 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" +se_resnext50 export mindir. +""" +import os +import numpy as np +from mindspore.common import dtype as mstype +from mindspore import context, Tensor, load_checkpoint, load_param_into_net, export +from src.model_utils.config import config +from src.model_utils.moxing_adapter import moxing_wrapper +from src.image_classification import get_network +from src.utils.auto_mixed_precision import auto_mixed_precision + + +context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) +if config.device_target == "Ascend": + context.set_context(device_id=config.device_id) + +def modelarts_pre_process(): + '''modelarts pre process function.''' + config.file_name = os.path.join(config.output_path, config.file_name) + +@moxing_wrapper(pre_process=modelarts_pre_process) +def run_export(): + """run export.""" + network = get_network(network=config.network, num_classes=config.num_classes, platform=config.device_target) + + param_dict = load_checkpoint(config.checkpoint_file_path) + load_param_into_net(network, param_dict) + if config.device_target == "Ascend": + network.to_float(mstype.float16) + else: + auto_mixed_precision(network) + network.set_train(False) + input_shp = [config.batch_size, 3, config.height, config.width] + input_array = Tensor(np.random.uniform(-1.0, 1.0, size=input_shp).astype(np.float32)) + export(network, input_array, file_name=config.file_name, file_format=config.file_format) + +if __name__ == '__main__': + run_export() diff --git a/official/cv/se_resnext50/mindspore_hub_conf.py b/official/cv/se_resnext50/mindspore_hub_conf.py new file mode 100644 index 0000000000000000000000000000000000000000..20028866cb29e1f1002c13d4a1a326a6a264a7ac --- /dev/null +++ b/official/cv/se_resnext50/mindspore_hub_conf.py @@ -0,0 +1,22 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this 
file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""hub config.""" +from src.image_classification import get_network + +def create_network(name, *args, **kwargs): + if name != "renext50": + raise NotImplementedError(f"{name} is not implemented in the repo") + net = get_network(*args, **kwargs) + return net diff --git a/official/cv/se_resnext50/postprocess.py b/official/cv/se_resnext50/postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..62c84645a8a0f3eb92a59067d6416424e86077a3 --- /dev/null +++ b/official/cv/se_resnext50/postprocess.py @@ -0,0 +1,51 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""post process for 310 inference""" +import os +import json +import argparse +import numpy as np + +batch_size = 1 +parser = argparse.ArgumentParser(description="resnet inference") +parser.add_argument("--result_path", type=str, required=True, help="result files path.") +parser.add_argument("--label_path", type=str, required=True, help="label file path.") +args = parser.parse_args() + +num_classes = 1000 + +def get_result(result_path, label_path): + files = os.listdir(result_path) + with open(label_path, "r") as label: + labels = json.load(label) + + top1 = 0 + top5 = 0 + total_data = len(files) + for file in files: + img_ids_name = file.split('_0.')[0] + data_path = os.path.join(result_path, img_ids_name + "_0.bin") + result = np.fromfile(data_path, dtype=np.float16).reshape(batch_size, num_classes) + for batch in range(batch_size): + predict = np.argsort(-result[batch], axis=-1) + if labels[img_ids_name+".JPEG"] == predict[0]: + top1 += 1 + if labels[img_ids_name+".JPEG"] in predict[:5]: + top5 += 1 + print(f"Total data: {total_data}, top1 accuracy: {top1/total_data}, top5 accuracy: {top5/total_data}.") + + +if __name__ == '__main__': + get_result(args.result_path, args.label_path) diff --git a/official/cv/se_resnext50/requirements.txt b/official/cv/se_resnext50/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..2238567851e149895552b0629989b89dd3a03690 --- /dev/null +++ b/official/cv/se_resnext50/requirements.txt @@ -0,0 +1,2 @@ +pillow +pyyaml diff --git a/official/cv/se_resnext50/scripts/run_distribute_train.sh b/official/cv/se_resnext50/scripts/run_distribute_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..ebb768e160e0f7b86754a3eba3709cdafeeadcf0 --- /dev/null +++ b/official/cv/se_resnext50/scripts/run_distribute_train.sh @@ -0,0 +1,74 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed 
under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# != 2 ] && [ $# != 3 ] +then + echo "Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_CKPT_PATH](optional)" + exit 1 +fi + +DATA_DIR=$2 +export RANK_TABLE_FILE=$1 +export RANK_SIZE=8 +export HCCL_CONNECT_TIMEOUT=600 +echo "hccl connect time out has changed to 600 second" +PATH_CHECKPOINT="" +if [ $# == 3 ] +then + PATH_CHECKPOINT=$3 +fi + +if [ ! -d $2 ]; +then echo "DATA_DIR Does Not Exist!" +fi + +if [ ! -f $1 ]; +then echo "RANK_TABLE_FILE Does Not Exist!" 
+fi + +cores=`cat /proc/cpuinfo|grep "processor" |wc -l` +echo "the number of logical core" $cores +avg_core_per_rank=`expr $cores \/ $RANK_SIZE` +core_gap=`expr $avg_core_per_rank \- 1` +echo "avg_core_per_rank" $avg_core_per_rank +echo "core_gap" $core_gap +for((i=0;i<RANK_SIZE;i++)) +do + start=`expr $i \* $avg_core_per_rank` + export DEVICE_ID=$i + export RANK_ID=$i + export DEPLOY_MODE=0 + export GE_USE_STATIC_MEMORY=1 + end=`expr $start \+ $core_gap` + cmdopt=$start"-"$end + + rm -rf LOG$i + mkdir ./LOG$i + cp *.py ./LOG$i + cp *.yaml ./LOG$i + cp -r ./src ./LOG$i + cd ./LOG$i || exit + echo "start training for rank $i, device $DEVICE_ID" + + env > env.log + taskset -c $cmdopt python ../train.py \ + --run_distribute=1 \ + --device_id=$DEVICE_ID \ + --checkpoint_file_path=$PATH_CHECKPOINT \ + --data_path=$DATA_DIR \ + --output_path './output' > log.txt 2>&1 & + cd ../ +done diff --git a/official/cv/se_resnext50/scripts/run_eval.sh b/official/cv/se_resnext50/scripts/run_eval.sh new file mode 100644 index 0000000000000000000000000000000000000000..ac1f9e4980d37675c0d843aa08af0033bccdbb30 --- /dev/null +++ b/official/cv/se_resnext50/scripts/run_eval.sh @@ -0,0 +1,35 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +export DEVICE_ID=$1 +DATA_DIR=$2 +PATH_CHECKPOINT=$3 +PLATFORM=$4 + +if [ ! 
-d $2 ]; +then echo "DATA_DIR Does Not Exist!" +fi + +if [ ! -f $3 ]; +then echo "PATH_CHECKPOINT Does Not Exist!" +fi + + +python eval.py \ + --checkpoint_file_path=$PATH_CHECKPOINT \ + --device_target=$PLATFORM \ + --data_path=$DATA_DIR \ + --device_target=$PLATFORM > log.txt 2>&1 & diff --git a/official/cv/se_resnext50/scripts/run_infer_310.sh b/official/cv/se_resnext50/scripts/run_infer_310.sh new file mode 100644 index 0000000000000000000000000000000000000000..60584182677a2b6e5e2261d28fb1c37b1d259779 --- /dev/null +++ b/official/cv/se_resnext50/scripts/run_infer_310.sh @@ -0,0 +1,99 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +if [[ $# -lt 2 || $# -gt 3 ]]; then + echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID] + DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero" +exit 1 +fi + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} +model=$(get_real_path $1) +data_path=$(get_real_path $2) + +device_id=0 +if [ $# == 3 ]; then + device_id=$3 +fi + +echo "mindir name: "$model +echo "dataset path: "$data_path +echo "device id: "$device_id + +export ASCEND_HOME=/usr/local/Ascend/ +if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then + export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH + export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH + export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe + export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH + export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp +else + export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH + export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH + export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH + export ASCEND_OPP_PATH=$ASCEND_HOME/opp +fi + +function compile_app() +{ + cd 
../ascend310_infer/src/ || exit + if [ -f "Makefile" ]; then + make clean + fi + bash build.sh &> build.log +} + +function infer() +{ + cd - || exit + if [ -d result_Files ]; then + rm -rf ./result_Files + fi + if [ -d time_Result ]; then + rm -rf ./time_Result + fi + mkdir result_Files + mkdir time_Result + ../ascend310_infer/src/main --mindir_path=$model --dataset_path=$data_path --device_id=$device_id &> infer.log +} + +function cal_acc() +{ + python3.7 ../create_imagenet2012_label.py --img_path=$data_path + python3.7 ../postprocess.py --result_path=./result_Files --label_path=./imagenet_label.json &> acc.log & +} + +compile_app +if [ $? -ne 0 ]; then + echo "compile app code failed" + exit 1 +fi +infer +if [ $? -ne 0 ]; then + echo " execute inference failed" + exit 1 +fi +cal_acc +if [ $? -ne 0 ]; then + echo "calculate accuracy failed" + exit 1 +fi \ No newline at end of file diff --git a/official/cv/se_resnext50/scripts/run_standalone_train.sh b/official/cv/se_resnext50/scripts/run_standalone_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..19000ef870b60b18349e40f34c93cbec4f4311f6 --- /dev/null +++ b/official/cv/se_resnext50/scripts/run_standalone_train.sh @@ -0,0 +1,31 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +export DEVICE_ID=$1 +DATA_DIR=$2 +PATH_CHECKPOINT="" +if [ $# == 3 ] +then + PATH_CHECKPOINT=$3 +fi + +python train.py \ + --run_distribute=0 \ + --device_id=$DEVICE_ID \ + --checkpoint_file_path=$PATH_CHECKPOINT \ + --data_path=$DATA_DIR \ + --output_path './output' > log.txt 2>&1 & + diff --git a/official/cv/se_resnext50/src/__init__.py b/official/cv/se_resnext50/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/official/cv/se_resnext50/src/backbone/__init__.py b/official/cv/se_resnext50/src/backbone/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b29269af6feeafe91688523fc2a110102389aca1 --- /dev/null +++ b/official/cv/se_resnext50/src/backbone/__init__.py @@ -0,0 +1,16 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""resnet""" +from .resnet import * diff --git a/official/cv/se_resnext50/src/backbone/resnet.py b/official/cv/se_resnext50/src/backbone/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..72373c2c328833e4693f08732bb2873db5d95324 --- /dev/null +++ b/official/cv/se_resnext50/src/backbone/resnet.py @@ -0,0 +1,248 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" +ResNet based ResNext +""" +import mindspore.nn as nn +from mindspore.ops.operations import Add, Split, Concat +from mindspore.ops import operations as P +from mindspore.common.initializer import TruncatedNormal + +from src.utils.cunstom_op import SEBlock, GroupConv + + +__all__ = ['ResNet', 'se_resnext50'] + + +def weight_variable(shape, factor=0.1): + return TruncatedNormal(0.02) + +def conv7x7(in_channels, out_channels, stride=1, padding=3, has_bias=False, groups=1): + return nn.Conv2d(in_channels, out_channels, kernel_size=7, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad", group=groups) + +def conv3x3(in_channels, out_channels, stride=1, padding=1, has_bias=False, groups=1): + return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad", group=groups) + +def conv1x1(in_channels, out_channels, stride=1, padding=0, has_bias=False, groups=1): + return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad", group=groups) + +class _DownSample(nn.Cell): + """ + Downsample for ResNext-ResNet. + + Args: + in_channels (int): Input channels. + out_channels (int): Output channels. + stride (int): Stride size for the 1*1 convolutional layer. + + Returns: + Tensor, output tensor. + + Examples: + >>>DownSample(32, 64, 2) + """ + def __init__(self, in_channels, out_channels, stride): + super(_DownSample, self).__init__() + self.conv = conv1x1(in_channels, out_channels, stride=stride, padding=0) + self.bn = nn.BatchNorm2d(out_channels) + + def construct(self, x): + out = self.conv(x) + out = self.bn(out) + return out + +class BasicBlock(nn.Cell): + """ + ResNet basic block definition. + + Args: + in_channels (int): Input channels. + out_channels (int): Output channels. + stride (int): Stride size for the first convolutional layer. Default: 1. 
+ + Returns: + Tensor, output tensor. + + Examples: + >>>BasicBlock(32, 256, stride=2) + """ + expansion = 1 + + def __init__(self, in_channels, out_channels, stride=1, down_sample=None, use_se=True, + platform="Ascend", **kwargs): + super(BasicBlock, self).__init__() + self.conv1 = conv3x3(in_channels, out_channels, stride=stride) + self.bn1 = nn.BatchNorm2d(out_channels) + self.relu = P.ReLU() + self.conv2 = conv3x3(out_channels, out_channels, stride=1) + self.bn2 = nn.BatchNorm2d(out_channels) + self.use_se = use_se + if self.use_se: + self.se = SEBlock(out_channels) + self.down_sample_flag = False + if down_sample is not None: + self.down_sample = down_sample + self.down_sample_flag = True + self.add = Add() + + def construct(self, x): + identity = x + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + out = self.conv2(out) + out = self.bn2(out) + if self.use_se: + out = self.se(out) + if self.down_sample_flag: + identity = self.down_sample(x) + out = self.add(out, identity) + out = self.relu(out) + return out + +class Bottleneck(nn.Cell): + """ + ResNet Bottleneck block definition. + + Args: + in_channels (int): Input channels. + out_channels (int): Output channels. + stride (int): Stride size for the initial convolutional layer. Default: 1. + + Returns: + Tensor, the ResNet unit's output. 
+ + + >>>Bottleneck(3, 256, stride=2) + """ + expansion = 4 + + def __init__(self, in_channels, out_channels, stride=1, down_sample=None, + base_width=64, groups=1, use_se=True, platform="Ascend", **kwargs): + super(Bottleneck, self).__init__() + + width = int(out_channels * (base_width / 64.0)) * groups + self.groups = groups + self.conv1 = conv1x1(in_channels, width, stride=1) + self.bn1 = nn.BatchNorm2d(width) + self.relu = P.ReLU() + self.conv3x3s = nn.CellList() + self.conv2 = GroupConv(width, width, 3, stride, pad=1, groups=groups) + self.op_split = Split(axis=1, output_num=self.groups) + self.op_concat = Concat(axis=1) + self.bn2 = nn.BatchNorm2d(width) + self.conv3 = conv1x1(width, out_channels * self.expansion, stride=1) + self.bn3 = nn.BatchNorm2d(out_channels * self.expansion) + self.use_se = use_se + if self.use_se: + self.se = SEBlock(out_channels * self.expansion) + self.down_sample_flag = False + if down_sample is not None: + self.down_sample = down_sample + self.down_sample_flag = True + self.cast = P.Cast() + self.add = Add() + + def construct(self, x): + identity = x + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + out = self.conv3(out) + out = self.bn3(out) + if self.use_se: + out = self.se(out) + if self.down_sample_flag: + identity = self.down_sample(x) + out = self.add(out, identity) + out = self.relu(out) + return out + +class ResNet(nn.Cell): + """ + ResNet architecture. + + Args: + block (cell): Block for network. + layers (list): Numbers of block in different layers. + width_per_group (int): Width of every group. + groups (int): Groups number. + + Returns: + Tuple, output tensor tuple. 
+ + Examples: + >>>ResNet() + """ + def __init__(self, block, layers, width_per_group=64, groups=1, use_se=True, platform="Ascend"): + super(ResNet, self).__init__() + self.in_channels = 64 + self.groups = groups + self.base_width = width_per_group + self.conv = conv7x7(3, self.in_channels, stride=2, padding=3) + self.bn = nn.BatchNorm2d(self.in_channels) + self.relu = P.ReLU() + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') + self.layer1 = self._make_layer(block, 64, layers[0], use_se=use_se, platform=platform) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2, use_se=use_se, platform=platform) + self.layer3 = self._make_layer(block, 256, layers[2], stride=2, use_se=use_se, platform=platform) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2, use_se=use_se, platform=platform) + self.out_channels = 512 * block.expansion + self.cast = P.Cast() + + def construct(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.relu(x) + x = self.maxpool(x) + x = self.layer1(x) + x = self.layer2(x) + x = self.layer3(x) + x = self.layer4(x) + return x + + def _make_layer(self, block, out_channels, blocks_num, stride=1, use_se=True, platform="Ascend"): + """_make_layer""" + down_sample = None + if stride != 1 or self.in_channels != out_channels * block.expansion: + down_sample = _DownSample(self.in_channels, + out_channels * block.expansion, + stride=stride) + layers = [] + layers.append(block(self.in_channels, + out_channels, + stride=stride, + down_sample=down_sample, + base_width=self.base_width, + groups=self.groups, + use_se=use_se, + platform=platform)) + self.in_channels = out_channels * block.expansion + for _ in range(1, blocks_num): + layers.append(block(self.in_channels, out_channels, base_width=self.base_width, + groups=self.groups, use_se=use_se, platform=platform)) + return nn.SequentialCell(layers) + + def get_out_channels(self): + return self.out_channels + +def se_resnext50(platform="Ascend"): + return 
ResNet(Bottleneck, [3, 4, 6, 3], width_per_group=4, groups=32, platform=platform) diff --git a/official/cv/se_resnext50/src/crossentropy.py b/official/cv/se_resnext50/src/crossentropy.py new file mode 100644 index 0000000000000000000000000000000000000000..01039e39ea3e4fe4a7fa353a738a6d96ada2809b --- /dev/null +++ b/official/cv/se_resnext50/src/crossentropy.py @@ -0,0 +1,41 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +define loss function for network. +""" +from mindspore.nn.loss.loss import LossBase +from mindspore.ops import operations as P +from mindspore.ops import functional as F +from mindspore import Tensor +from mindspore.common import dtype as mstype +import mindspore.nn as nn + +class CrossEntropy(LossBase): + """ + the redefined loss function with SoftmaxCrossEntropyWithLogits. 
+ """ + def __init__(self, smooth_factor=0., num_classes=1000): + super(CrossEntropy, self).__init__() + self.onehot = P.OneHot() + self.on_value = Tensor(1.0 - smooth_factor, mstype.float32) + self.off_value = Tensor(1.0 * smooth_factor / (num_classes -1), mstype.float32) + self.ce = nn.SoftmaxCrossEntropyWithLogits() + self.mean = P.ReduceMean(False) + + def construct(self, logit, label): + one_hot_label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value) + loss = self.ce(logit, one_hot_label) + loss = self.mean(loss, 0) + return loss diff --git a/official/cv/se_resnext50/src/dataset.py b/official/cv/se_resnext50/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..4ce7b43c00efac7641a909d4ec4bdeb2feb72a76 --- /dev/null +++ b/official/cv/se_resnext50/src/dataset.py @@ -0,0 +1,158 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +dataset processing. +""" +import os +from PIL import Image, ImageFile +from mindspore.common import dtype as mstype +import mindspore.dataset as de +import mindspore.dataset.transforms.c_transforms as C +import mindspore.dataset.vision.c_transforms as V_C +from src.utils.sampler import DistributedSampler + +ImageFile.LOAD_TRUNCATED_IMAGES = True + + +class TxtDataset(): + """ + create txt dataset. + + Args: + Returns: + de_dataset. 
+ """ + + def __init__(self, root, txt_name): + super(TxtDataset, self).__init__() + self.imgs = [] + self.labels = [] + fin = open(txt_name, "r") + for line in fin: + img_name, label = line.strip().split(' ') + self.imgs.append(os.path.join(root, img_name)) + self.labels.append(int(label)) + fin.close() + + def __getitem__(self, index): + img = Image.open(self.imgs[index]).convert('RGB') + return img, self.labels[index] + + def __len__(self): + return len(self.imgs) + + +def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, + mode='train', + input_mode='folder', + root='', + num_parallel_workers=None, + shuffle=None, + sampler=None, + class_indexing=None, + drop_remainder=True, + transform=None, + target_transform=None): + """ + A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt". + If it is "folder", all images within one folder have the same label. If it is "txt", all paths of images + are written into a textfile. + + Args: + data_dir (str): Path to the root directory that contains the dataset for "input_mode="folder"". + Or path of the textfile that contains every image's path of the dataset. + image_size (Union(int, sequence)): Size of the input images. + per_batch_size (int): The batch size of every step during training. + max_epoch (int): The number of epochs. + rank (int): The shard ID within num_shards. + group_size (int): Number of shards that the dataset should be divided + into. + mode (str): "train" or others. Default: "train". + input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder". + root (str): The images path for "input_mode="txt"". Default: "". + num_parallel_workers (int): Number of workers to read the data. Default: None. + shuffle (bool): Whether or not to perform shuffle on the dataset + (default=None, performs shuffle). + sampler (Sampler): Object used to choose samples from the dataset. 
Default: None. + class_indexing (dict): A str-to-int mapping from folder name to index + (default=None, the folder names will be sorted + alphabetically and each class will be given a + unique index starting from 0). + + Examples: + >>> from src.dataset import classification_dataset + >>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images + >>> data_dir = "/path/to/imagefolder_directory" + >>> de_dataset = classification_dataset(data_dir, image_size=[224, 224], + ... per_batch_size=64, max_epoch=100, + ... rank=0, group_size=4) + >>> # Path of the textfile that contains every image's path of the dataset. + >>> data_dir = "/path/to/dataset/images/train.txt" + >>> images_dir = "/path/to/dataset/images" + >>> de_dataset = classification_dataset(data_dir, image_size=[224, 224], + ... per_batch_size=64, max_epoch=100, + ... rank=0, group_size=4, + ... input_mode="txt", root=images_dir) + """ + + mean = [0.485 * 255, 0.456 * 255, 0.406 * 255] + std = [0.229 * 255, 0.224 * 255, 0.225 * 255] + + if transform is None: + if mode == 'train': + transform_img = [ + V_C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)), + V_C.RandomHorizontalFlip(prob=0.5), + V_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4), + V_C.Normalize(mean=mean, std=std), + V_C.HWC2CHW() + ] + else: + transform_img = [ + V_C.Decode(), + V_C.Resize((256, 256)), + V_C.CenterCrop(image_size), + V_C.Normalize(mean=mean, std=std), + V_C.HWC2CHW() + ] + else: + transform_img = transform + + if target_transform is None: + transform_label = [C.TypeCast(mstype.int32)] + else: + transform_label = target_transform + + if input_mode == 'folder': + de_dataset = de.ImageFolderDataset(data_dir, num_parallel_workers=num_parallel_workers, + shuffle=shuffle, sampler=sampler, class_indexing=class_indexing, + num_shards=group_size, shard_id=rank) + else: + dataset = TxtDataset(root, data_dir) + sampler = 
DistributedSampler(dataset, rank, group_size, shuffle=shuffle) + de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler) + + de_dataset = de_dataset.map(operations=transform_img, input_columns="image", + num_parallel_workers=num_parallel_workers) + de_dataset = de_dataset.map(operations=transform_label, input_columns="label", + num_parallel_workers=num_parallel_workers) + + columns_to_project = ["image", "label"] + de_dataset = de_dataset.project(columns=columns_to_project) + + de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder) + de_dataset = de_dataset.repeat(max_epoch) + + return de_dataset diff --git a/official/cv/se_resnext50/src/head.py b/official/cv/se_resnext50/src/head.py new file mode 100644 index 0000000000000000000000000000000000000000..bfc63befc310ce7b355cde7dc288fb7cc5ed25ab --- /dev/null +++ b/official/cv/se_resnext50/src/head.py @@ -0,0 +1,42 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +common architecture. +""" +import mindspore.nn as nn +from src.utils.cunstom_op import GlobalAvgPooling + +__all__ = ['CommonHead'] + +class CommonHead(nn.Cell): + """ + common architecture definition. + + Args: + num_classes (int): Number of classes. + out_channels (int): Output channels. + + Returns: + Tensor, output tensor. 
+ """ + def __init__(self, num_classes, out_channels): + super(CommonHead, self).__init__() + self.avgpool = GlobalAvgPooling() + self.fc = nn.Dense(out_channels, num_classes, has_bias=True).add_flags_recursive(fp16=True) + + def construct(self, x): + x = self.avgpool(x) + x = self.fc(x) + return x diff --git a/official/cv/se_resnext50/src/image_classification.py b/official/cv/se_resnext50/src/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..cfacbf9e8f66beb734362842e953c728ac94a30a --- /dev/null +++ b/official/cv/se_resnext50/src/image_classification.py @@ -0,0 +1,100 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +Image classifiation. +""" +import math +import mindspore.nn as nn +from mindspore.common import initializer as init +import src.backbone as backbones +import src.head as heads +from src.utils.var_init import default_recurisive_init, KaimingNormal + + +class ImageClassificationNetwork(nn.Cell): + """ + architecture of image classification network. + + Args: + Returns: + Tensor, output tensor. 
+ """ + def __init__(self, backbone, head, include_top=True, activation="None"): + super(ImageClassificationNetwork, self).__init__() + self.backbone = backbone + self.include_top = include_top + self.need_activation = False + if self.include_top: + self.head = head + if activation != "None": + self.need_activation = True + if activation == "Sigmoid": + self.activation = nn.Sigmoid() + elif activation == "Softmax": + self.activation = nn.Softmax() + else: + raise NotImplementedError(f"The activation {activation} is not in [Sigmoid, Softmax].") + + def construct(self, x): + x = self.backbone(x) + if self.include_top: + x = self.head(x) + if self.need_activation: + x = self.activation(x) + return x + + +class Resnet(ImageClassificationNetwork): + """ + Resnet architecture. + Args: + backbone_name (string): backbone. + num_classes (int): number of classes. Default: 1000. + Returns: + Resnet. + """ + def __init__(self, backbone_name, num_classes=1000, platform="Ascend", include_top=True, activation="None"): + self.backbone_name = backbone_name + backbone = backbones.__dict__[self.backbone_name](platform=platform) + out_channels = backbone.get_out_channels() + head = heads.CommonHead(num_classes=num_classes, out_channels=out_channels) + super(Resnet, self).__init__(backbone, head, include_top, activation) + + default_recurisive_init(self) + + for _, cell in self.cells_and_names(): + if isinstance(cell, nn.Conv2d): + cell.weight.set_data(init.initializer( + KaimingNormal(a=math.sqrt(5), mode='fan_out', nonlinearity='relu'), + cell.weight.shape, cell.weight.dtype)) + elif isinstance(cell, nn.BatchNorm2d): + cell.gamma.set_data(init.initializer('ones', cell.gamma.shape)) + cell.beta.set_data(init.initializer('zeros', cell.beta.shape)) + + # Zero-initialize the last BN in each residual branch, + # so that the residual branch starts with zeros, and each residual block behaves like an identity. 
+ # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 + for _, cell in self.cells_and_names(): + if isinstance(cell, backbones.resnet.Bottleneck): + cell.bn3.gamma.set_data(init.initializer('zeros', cell.bn3.gamma.shape)) + elif isinstance(cell, backbones.resnet.BasicBlock): + cell.bn2.gamma.set_data(init.initializer('zeros', cell.bn2.gamma.shape)) + + + +def get_network(network, **kwargs): + if network not in ['se_resnext50']: + raise NotImplementedError(f"The network {network} is not in [se_resnext50].") + return Resnet('se_resnext50', **kwargs) diff --git a/official/cv/se_resnext50/src/lr_generator.py b/official/cv/se_resnext50/src/lr_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..0fcca9ad252f43e848f8af7af424eef127ac3b4f --- /dev/null +++ b/official/cv/se_resnext50/src/lr_generator.py @@ -0,0 +1,142 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +learning rate generator. +""" +import math +from collections import Counter +import numpy as np + + +def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr): + """ + Applies a linear increase to generate the learning rate for one step of the warmup stage. + + Args: + current_step(int): current step in warmup stage. + warmup_steps(int): all steps in warmup stage. + base_lr(float): learning rate reached at the end of warmup. 
+ init_lr(float): learning rate at the start of warmup + + Returns: + float, learning rate. + """ + lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps) + lr = float(init_lr) + lr_inc * current_step + return lr + + +def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0): + """ + Applies cosine decay to generate learning rate array with warmup. + + Args: + lr(float): base learning rate reached after warmup + steps_per_epoch(int): steps of one epoch + warmup_epochs(int): number of warmup epochs + max_epoch(int): total number of training epochs + T_max(int): number of epochs over which the cosine decays. + eta_min(float): minimum learning rate + + Returns: + np.array, learning rate array. + """ + base_lr = lr + warmup_init_lr = 0 + total_steps = int(max_epoch * steps_per_epoch) + warmup_steps = int(warmup_epochs * steps_per_epoch) + + lr_each_step = [] + for i in range(total_steps): + last_epoch = i // steps_per_epoch + if i < warmup_steps: + lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr) + else: + lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2 + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1): + """ + Applies step decay to generate learning rate array with warmup. + + Args: + lr(float): base learning rate reached after warmup + lr_epochs(list): epochs at which the learning rate decays + steps_per_epoch(int): steps of one epoch + warmup_epochs(int): number of warmup epochs + max_epoch(int): total number of training epochs + gamma(float): multiplicative decay factor applied at each milestone. + + Returns: + np.array, learning rate array. 
+ """ + base_lr = lr + warmup_init_lr = 0 + total_steps = int(max_epoch * steps_per_epoch) + warmup_steps = int(warmup_epochs * steps_per_epoch) + milestones = lr_epochs + milestones_steps = [] + for milestone in milestones: + milestones_step = milestone * steps_per_epoch + milestones_steps.append(milestones_step) + + lr_each_step = [] + lr = base_lr + milestones_steps_counter = Counter(milestones_steps) + for i in range(total_steps): + if i < warmup_steps: + lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr) + else: + lr = lr * gamma**milestones_steps_counter[i] + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1): + return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma) + + +def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1): + lr_epochs = [] + for i in range(1, max_epoch): + if i % epoch_size == 0: + lr_epochs.append(i) + return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma) + + +def get_lr(args): + """generate learning rate array.""" + if args.lr_scheduler == 'exponential': + lr = warmup_step_lr(args.lr, + args.lr_epochs, + args.steps_per_epoch, + args.warmup_epochs, + args.max_epoch, + gamma=args.lr_gamma, + ) + elif args.lr_scheduler == 'cosine_annealing': + lr = warmup_cosine_annealing_lr(args.lr, + args.steps_per_epoch, + args.warmup_epochs, + args.max_epoch, + args.T_max, + args.eta_min) + else: + raise NotImplementedError(args.lr_scheduler) + return lr diff --git a/official/cv/se_resnext50/src/model_utils/config.py b/official/cv/se_resnext50/src/model_utils/config.py new file mode 100644 index 0000000000000000000000000000000000000000..1611eb76d3bd15a0d5a1097ad1f98f61400e2a81 --- /dev/null +++ b/official/cv/se_resnext50/src/model_utils/config.py @@ -0,0 +1,125 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the 
"License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Parse arguments""" + +import os +import ast +import argparse +from pprint import pprint, pformat +import yaml + + +class Config: + """ + Configuration namespace. Convert dictionary to members. + """ + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser: Parent parser. + cfg: Base configuration. + helper: Helper description. + cfg_path: Path to the default yaml config. 
+ """ + parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]", + parents=[parser]) + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else "Please refer to {}".format(cfg_path) + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + + Args: + yaml_path: Path to the yaml config. + """ + with open(yaml_path, 'r') as fin: + try: + cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [x for x in cfgs] + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config description for help, choices) are supported in config yaml") + print(cfg_helper) + except Exception as err: + raise ValueError("Failed to parse yaml") from err + return cfg, cfg_helper, cfg_choices + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args: Command line arguments. + cfg: Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + return cfg + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. 
+ """ + parser = argparse.ArgumentParser(description="default name", add_help=False) + current_dir = os.path.dirname(os.path.abspath(__file__)) + parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../../default_config.yaml"), + help="Config file path") + path_args, _ = parser.parse_known_args() + default, helper, choices = parse_yaml(path_args.config_path) + pprint(default) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path) + final_config = merge(args, default) + return Config(final_config) + +config = get_config() diff --git a/official/cv/se_resnext50/src/model_utils/device_adapter.py b/official/cv/se_resnext50/src/model_utils/device_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..9c3d21d5e47c22617170887df9da97beff668495 --- /dev/null +++ b/official/cv/se_resnext50/src/model_utils/device_adapter.py @@ -0,0 +1,27 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Device adapter for ModelArts""" + +from src.model_utils.config import config + +if config.enable_modelarts: + from src.model_utils.moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id +else: + from src.model_utils.local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id + +__all__ = [ + "get_device_id", "get_device_num", "get_rank_id", "get_job_id" +] diff --git a/official/cv/se_resnext50/src/model_utils/local_adapter.py b/official/cv/se_resnext50/src/model_utils/local_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..769fa6dc78e59eb66dbc8e6773accdc1d08b649e --- /dev/null +++ b/official/cv/se_resnext50/src/model_utils/local_adapter.py @@ -0,0 +1,36 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Local adapter""" + +import os + +def get_device_id(): + device_id = os.getenv('DEVICE_ID', '0') + return int(device_id) + + +def get_device_num(): + device_num = os.getenv('RANK_SIZE', '1') + return int(device_num) + + +def get_rank_id(): + global_rank_id = os.getenv('RANK_ID', '0') + return int(global_rank_id) + + +def get_job_id(): + return "Local Job" diff --git a/official/cv/se_resnext50/src/model_utils/moxing_adapter.py b/official/cv/se_resnext50/src/model_utils/moxing_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..aabd5ac6cf1bde3ca20f3d6ea9cf3d5310169f1e --- /dev/null +++ b/official/cv/se_resnext50/src/model_utils/moxing_adapter.py @@ -0,0 +1,115 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Moxing adapter for ModelArts""" + +import os +import functools +from mindspore import context +from src.model_utils.config import config + +_global_sync_count = 0 + +def get_device_id(): + device_id = os.getenv('DEVICE_ID', '0') + return int(device_id) + + +def get_device_num(): + device_num = os.getenv('RANK_SIZE', '1') + return int(device_num) + + +def get_rank_id(): + global_rank_id = os.getenv('RANK_ID', '0') + return int(global_rank_id) + + +def get_job_id(): + job_id = os.getenv('JOB_ID') + job_id = job_id if job_id else "default" + return job_id + +def sync_data(from_path, to_path): + """ + Download data from remote OBS to a local directory if the first url is a remote url and the second one is a local path. + Otherwise, upload data from the local directory to remote OBS. + """ + import moxing as mox + import time + global _global_sync_count + sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count) + _global_sync_count += 1 + + # Each server contains 8 devices at most. + if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): + print("from path: ", from_path) + print("to path: ", to_path) + mox.file.copy_parallel(from_path, to_path) + print("===finish data synchronization===") + try: + os.mknod(sync_lock) + except IOError: + pass + print("===save flag===") + + while True: + if os.path.exists(sync_lock): + break + time.sleep(1) + + print("Finish sync data from {} to {}.".format(from_path, to_path)) + + +def moxing_wrapper(pre_process=None, post_process=None): + """ + Moxing wrapper to download dataset and upload outputs. 
+ """ + def wrapper(run_func): + @functools.wraps(run_func) + def wrapped_func(*args, **kwargs): + # Download data from data_url + if config.enable_modelarts: + if config.data_url: + sync_data(config.data_url, config.data_path) + print("Dataset downloaded: ", os.listdir(config.data_path)) + if config.checkpoint_url: + sync_data(config.checkpoint_url, config.load_path) + print("Preload downloaded: ", os.listdir(config.load_path)) + if config.train_url: + sync_data(config.train_url, config.output_path) + print("Workspace downloaded: ", os.listdir(config.output_path)) + + context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id()))) + config.device_num = get_device_num() + config.device_id = get_device_id() + if not os.path.exists(config.output_path): + os.makedirs(config.output_path) + + if pre_process: + pre_process() + + run_func(*args, **kwargs) + + # Upload data to train_url + if config.enable_modelarts: + if post_process: + post_process() + + if config.train_url: + print("Start to copy output directory") + sync_data(config.output_path, config.train_url) + return wrapped_func + return wrapper diff --git a/official/cv/se_resnext50/src/utils/__init__.py b/official/cv/se_resnext50/src/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/official/cv/se_resnext50/src/utils/auto_mixed_precision.py b/official/cv/se_resnext50/src/utils/auto_mixed_precision.py new file mode 100644 index 0000000000000000000000000000000000000000..6be124658a2cf3df2bcec766c981d6bdbee1e225 --- /dev/null +++ b/official/cv/se_resnext50/src/utils/auto_mixed_precision.py @@ -0,0 +1,53 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Auto mixed precision.""" +import mindspore.nn as nn +from mindspore.ops import functional as F +from mindspore._checkparam import Validator as validator +from mindspore.common import dtype as mstype + + +class OutputTo(nn.Cell): + "Cast cell output back to float16 or float32" + + def __init__(self, op, to_type=mstype.float16): + super(OutputTo, self).__init__(auto_prefix=False) + self._op = op + validator.check_type_name('to_type', to_type, [mstype.float16, mstype.float32], None) + self.to_type = to_type + + def construct(self, x): + return F.cast(self._op(x), self.to_type) + + +def auto_mixed_precision(network): + """Do keep batchnorm fp32.""" + cells = network.name_cells() + change = False + network.to_float(mstype.float16) + for name in cells: + subcell = cells[name] + if subcell == network: + continue + elif name == 'fc': + network.insert_child_to_cell(name, OutputTo(subcell, mstype.float32)) + change = True + elif isinstance(subcell, (nn.BatchNorm2d, nn.BatchNorm1d)): + network.insert_child_to_cell(name, OutputTo(subcell.to_float(mstype.float32), mstype.float16)) + change = True + else: + auto_mixed_precision(subcell) + if isinstance(network, nn.SequentialCell) and change: + network.cell_list = list(network.cells()) diff --git a/official/cv/se_resnext50/src/utils/cunstom_op.py b/official/cv/se_resnext50/src/utils/cunstom_op.py new file mode 100644 index 0000000000000000000000000000000000000000..b7567c28f56420cb5bf05f06c7ba9c9fed74c0a7 --- /dev/null +++ 
b/official/cv/se_resnext50/src/utils/cunstom_op.py @@ -0,0 +1,103 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +network operations +""" +import mindspore.nn as nn +from mindspore.ops import operations as P +from mindspore.common import dtype as mstype + + +class GlobalAvgPooling(nn.Cell): + """ + global average pooling feature map. + + Args: + mean (tuple): means for each channel. + """ + def __init__(self): + super(GlobalAvgPooling, self).__init__() + self.mean = P.ReduceMean(False) + + def construct(self, x): + x = self.mean(x, (2, 3)) + return x + + +class SEBlock(nn.Cell): + """ + squeeze and excitation block. + + Args: + channel (int): number of feature maps. + reduction (int): weight. 
+ """ + def __init__(self, channel, reduction=16): + super(SEBlock, self).__init__() + + self.avg_pool = GlobalAvgPooling() + self.fc1 = nn.Dense(channel, channel // reduction) + self.relu = P.ReLU() + self.fc2 = nn.Dense(channel // reduction, channel) + self.sigmoid = P.Sigmoid() + self.reshape = P.Reshape() + self.shape = P.Shape() + self.cast = P.Cast() + + def construct(self, x): + b, c, _, _ = self.shape(x) + y = self.avg_pool(x) + + y = self.reshape(y, (b, c)) + y = self.fc1(y) + y = self.relu(y) + y = self.fc2(y) + y = self.sigmoid(y) + y = self.reshape(y, (b, c, 1, 1)) + return x * y + +class GroupConv(nn.Cell): + """ + group convolution operation. + + Args: + in_channels (int): Input channels of feature map. + out_channels (int): Output channels of feature map. + kernel_size (int): Size of convolution kernel. + stride (int): Stride size for the group convolution layer. + + Returns: + tensor, output tensor. + """ + def __init__(self, in_channels, out_channels, kernel_size, stride, pad_mode="pad", pad=0, groups=1, has_bias=False): + super(GroupConv, self).__init__() + assert in_channels % groups == 0 and out_channels % groups == 0 + self.groups = groups + self.convs = nn.CellList() + self.op_split = P.Split(axis=1, output_num=self.groups) + self.op_concat = P.Concat(axis=1) + self.cast = P.Cast() + for _ in range(groups): + self.convs.append(nn.Conv2d(in_channels//groups, out_channels//groups, + kernel_size=kernel_size, stride=stride, has_bias=has_bias, + padding=pad, pad_mode=pad_mode, group=1)) + + def construct(self, x): + features = self.op_split(x) + outputs = () + for i in range(self.groups): + outputs = outputs + (self.convs[i](self.cast(features[i], mstype.float32)),) + out = self.op_concat(outputs) + return out diff --git a/official/cv/se_resnext50/src/utils/logging.py b/official/cv/se_resnext50/src/utils/logging.py new file mode 100644 index 0000000000000000000000000000000000000000..c17befd265b04d2c820c8ec71bf361273eacc31e --- /dev/null +++ 
b/official/cv/se_resnext50/src/utils/logging.py @@ -0,0 +1,82 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +get logger. +""" +import logging +import os +import sys +from datetime import datetime + +class LOGGER(logging.Logger): + """ + set up logging file. + + Args: + logger_name (string): logger name. + log_dir (string): path of logger. + + Returns: + string, logger path + """ + def __init__(self, logger_name, rank=0): + super(LOGGER, self).__init__(logger_name) + if rank % 8 == 0: + console = logging.StreamHandler(sys.stdout) + console.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + console.setFormatter(formatter) + self.addHandler(console) + + def setup_logging_file(self, log_dir, rank=0): + """set up log file""" + self.rank = rank + if not os.path.exists(log_dir): + os.makedirs(log_dir, exist_ok=True) + log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank) + self.log_fn = os.path.join(log_dir, log_name) + fh = logging.FileHandler(self.log_fn) + fh.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + fh.setFormatter(formatter) + self.addHandler(fh) + + def info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO): + self._log(logging.INFO, msg, args, **kwargs) + + def save_args(self, 
args): + self.info('Args:') + args_dict = vars(args) + for key in args_dict.keys(): + self.info('--> %s: %s', key, args_dict[key]) + self.info('') + + def important_info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO) and self.rank == 0: + line_width = 2 + important_msg = '\n' + important_msg += ('*'*70 + '\n')*line_width + important_msg += ('*'*line_width + '\n')*2 + important_msg += '*'*line_width + ' '*8 + msg + '\n' + important_msg += ('*'*line_width + '\n')*2 + important_msg += ('*'*70 + '\n')*line_width + self.info(important_msg, *args, **kwargs) + + +def get_logger(path, rank): + logger = LOGGER("mindversion", rank) + logger.setup_logging_file(path, rank) + return logger diff --git a/official/cv/se_resnext50/src/utils/optimizers__init__.py b/official/cv/se_resnext50/src/utils/optimizers__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c70a530f2c7c51fee15ba8949bd0c1c80f296008 --- /dev/null +++ b/official/cv/se_resnext50/src/utils/optimizers__init__.py @@ -0,0 +1,36 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +optimizer parameters. 
+""" +def get_param_groups(network): + """get param groups""" + decay_params = [] + no_decay_params = [] + for x in network.trainable_params(): + parameter_name = x.name + if parameter_name.endswith('.bias'): + # biases are excluded from weight decay + no_decay_params.append(x) + elif parameter_name.endswith('.gamma'): + # BatchNorm gamma (scale) is excluded from weight decay + no_decay_params.append(x) + elif parameter_name.endswith('.beta'): + # BatchNorm beta (shift) is excluded from weight decay + no_decay_params.append(x) + else: + decay_params.append(x) + + return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}] diff --git a/official/cv/se_resnext50/src/utils/sampler.py b/official/cv/se_resnext50/src/utils/sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..fd8e3f61ac09f2fa6e8148c1c56d20ffee8adee0 --- /dev/null +++ b/official/cv/se_resnext50/src/utils/sampler.py @@ -0,0 +1,52 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +choose samples from the dataset +""" +import math +import numpy as np + +class DistributedSampler(): + """ + sampling the dataset. + + Args: + Returns: + num_samples, number of samples. 
+ """ + def __init__(self, dataset, rank, group_size, shuffle=True, seed=0): + self.dataset = dataset + self.rank = rank + self.group_size = group_size + self.dataset_length = len(self.dataset) + self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size)) + self.total_size = self.num_samples * self.group_size + self.shuffle = shuffle + self.seed = seed + + def __iter__(self): + if self.shuffle: + self.seed = (self.seed + 1) & 0xffffffff + np.random.seed(self.seed) + indices = np.random.permutation(self.dataset_length).tolist() + else: + indices = list(range(self.dataset_length)) + + indices += indices[:(self.total_size - len(indices))] + indices = indices[self.rank::self.group_size] + return iter(indices) + + def __len__(self): + return self.num_samples diff --git a/official/cv/se_resnext50/src/utils/var_init.py b/official/cv/se_resnext50/src/utils/var_init.py new file mode 100644 index 0000000000000000000000000000000000000000..d2954978269a5b26bd286ff40e93b46448551b08 --- /dev/null +++ b/official/cv/se_resnext50/src/utils/var_init.py @@ -0,0 +1,228 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +Initialize. 
+""" +import os +import math +from functools import reduce +import numpy as np +import mindspore.nn as nn +from mindspore.common import initializer as init +from mindspore.train.serialization import load_checkpoint, load_param_into_net + +def _calculate_gain(nonlinearity, param=None): + r""" + Return the recommended gain value for the given nonlinearity function. + + The values are as follows: + ================= ==================================================== + nonlinearity gain + ================= ==================================================== + Linear / Identity :math:`1` + Conv{1,2,3}D :math:`1` + Sigmoid :math:`1` + Tanh :math:`\frac{5}{3}` + ReLU :math:`\sqrt{2}` + Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}` + ================= ==================================================== + + Args: + nonlinearity: the non-linear function + param: optional parameter for the non-linear function + + Examples: + >>> gain = calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 + """ + linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] + if nonlinearity in linear_fns or nonlinearity == 'sigmoid': + return 1 + if nonlinearity == 'tanh': + return 5.0 / 3 + if nonlinearity == 'relu': + return math.sqrt(2.0) + if nonlinearity == 'leaky_relu': + if param is None: + negative_slope = 0.01 + elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float): + negative_slope = param + else: + raise ValueError("negative_slope {} not a valid number".format(param)) + return math.sqrt(2.0 / (1 + negative_slope ** 2)) + + raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) + +def _assignment(arr, num): + """Assign the value of `num` to `arr`.""" + if arr.shape == (): + arr = arr.reshape((1)) + arr[:] = num + arr = arr.reshape(()) + else: + if isinstance(num, np.ndarray): + arr[:] = num[:] + else: + arr[:] = num + return arr + +def 
_calculate_in_and_out(arr): + """ + Calculate n_in and n_out. + + Args: + arr (Array): Input array. + + Returns: + Tuple, a tuple with two elements, the first element is `n_in` and the second element is `n_out`. + """ + dim = len(arr.shape) + if dim < 2: + raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.") + + n_in = arr.shape[1] + n_out = arr.shape[0] + + if dim > 2: + counter = reduce(lambda x, y: x * y, arr.shape[2:]) + n_in *= counter + n_out *= counter + return n_in, n_out + +def _select_fan(array, mode): + mode = mode.lower() + valid_modes = ['fan_in', 'fan_out'] + if mode not in valid_modes: + raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes)) + + fan_in, fan_out = _calculate_in_and_out(array) + return fan_in if mode == 'fan_in' else fan_out + +class KaimingInit(init.Initializer): + r""" + Base Class. Initialize the array with He kaiming algorithm. + + Args: + a: the negative slope of the rectifier used after this layer (only + used with ``'leaky_relu'``) + mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'`` + preserves the magnitude of the variance of the weights in the + forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the + backwards pass. + nonlinearity: the non-linear function, recommended to use only with + ``'relu'`` or ``'leaky_relu'`` (default). + """ + def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'): + super(KaimingInit, self).__init__() + self.mode = mode + self.gain = _calculate_gain(nonlinearity, a) + def _initialize(self, arr): + pass + + +class KaimingUniform(KaimingInit): + r""" + Initialize the array with He kaiming uniform algorithm. The resulting tensor will + have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where + + .. math:: + \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}} + + Input: + arr (Array): The array to be assigned. 
+ + Returns: + Array, assigned array. + + Examples: + >>> w = np.empty(3, 5) + >>> KaimingUniform(w, mode='fan_in', nonlinearity='relu') + """ + + def _initialize(self, arr): + fan = _select_fan(arr, self.mode) + bound = math.sqrt(3.0) * self.gain / math.sqrt(fan) + data = np.random.uniform(-bound, bound, arr.shape) + + _assignment(arr, data) + + +class KaimingNormal(KaimingInit): + r""" + Initialize the array with He kaiming normal algorithm. The resulting tensor will + have values sampled from :math:`\mathcal{N}(0, \text{std}^2)` where + + .. math:: + \text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}} + + Input: + arr (Array): The array to be assigned. + + Returns: + Array, assigned array. + + Examples: + >>> w = np.empty(3, 5) + >>> KaimingNormal(w, mode='fan_out', nonlinearity='relu') + """ + + def _initialize(self, arr): + fan = _select_fan(arr, self.mode) + std = self.gain / math.sqrt(fan) + data = np.random.normal(0, std, arr.shape) + + _assignment(arr, data) + + +def default_recurisive_init(custom_cell): + """default_recurisive_init""" + for _, cell in custom_cell.cells_and_names(): + if isinstance(cell, nn.Conv2d): + cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), + cell.weight.shape, + cell.weight.dtype)) + if cell.bias is not None: + fan_in, _ = _calculate_in_and_out(cell.weight) + bound = 1 / math.sqrt(fan_in) + cell.bias.set_data(init.initializer(init.Uniform(bound), + cell.bias.shape, + cell.bias.dtype)) + elif isinstance(cell, nn.Dense): + cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), + cell.weight.shape, + cell.weight.dtype)) + if cell.bias is not None: + fan_in, _ = _calculate_in_and_out(cell.weight) + bound = 1 / math.sqrt(fan_in) + cell.bias.set_data(init.initializer(init.Uniform(bound), + cell.bias.shape, + cell.bias.dtype)) + elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)): + pass + + +def load_pretrain_model(ckpt_file, network, args): + """load pretrain model.""" + if 
os.path.isfile(ckpt_file): + param_dict = load_checkpoint(ckpt_file) + param_dict_new = {} + for key, values in param_dict.items(): + if key.startswith('moments.'): + continue + elif key.startswith('network.'): + param_dict_new[key[8:]] = values + else: + param_dict_new[key] = values + load_param_into_net(network, param_dict_new) + args.logger.info('load model {} success'.format(ckpt_file)) diff --git a/official/cv/se_resnext50/train.py b/official/cv/se_resnext50/train.py new file mode 100644 index 0000000000000000000000000000000000000000..97ebb71fc89d67e96e69fd075e082e17a7e20c27 --- /dev/null +++ b/official/cv/se_resnext50/train.py @@ -0,0 +1,208 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""train ImageNet.""" +import os +import time +import datetime + +import mindspore.nn as nn +from mindspore import Tensor, context +from mindspore.context import ParallelMode +from mindspore.nn.optim import Momentum +from mindspore.communication.management import init, get_rank, get_group_size +from mindspore.train.callback import ModelCheckpoint +from mindspore.train.callback import CheckpointConfig, Callback +from mindspore.train.model import Model +from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager +from mindspore.common import set_seed + +from src.dataset import classification_dataset +from src.crossentropy import CrossEntropy +from src.lr_generator import get_lr +from src.utils.logging import get_logger +from src.utils.optimizers__init__ import get_param_groups +from src.utils.var_init import load_pretrain_model +from src.image_classification import get_network +from src.model_utils.config import config + +from src.model_utils.moxing_adapter import moxing_wrapper + + +set_seed(1) + + +class BuildTrainNetwork(nn.Cell): + """build training network""" + def __init__(self, network, criterion): + super(BuildTrainNetwork, self).__init__() + self.network = network + self.criterion = criterion + + def construct(self, input_data, label): + output = self.network(input_data) + loss = self.criterion(output, label) + return loss + + +class ProgressMonitor(Callback): + """monitor loss and time""" + def __init__(self, args): + super(ProgressMonitor, self).__init__() + self.me_epoch_start_time = 0 + self.me_epoch_start_step_num = 0 + self.args = args + self.ckpt_history = [] + + def begin(self, run_context): + self.args.logger.info('start network train...') + + def epoch_begin(self, run_context): + pass + + def epoch_end(self, run_context, *me_args): + cb_params = run_context.original_args() + me_step = cb_params.cur_step_num - 1 + + real_epoch = me_step 
// self.args.steps_per_epoch + time_used = time.time() - self.me_epoch_start_time + fps_mean = self.args.per_batch_size * (me_step-self.me_epoch_start_step_num) * self.args.group_size / time_used + self.args.logger.info('epoch[{}], iter[{}], loss:{}, mean_fps:{:.2f}' + 'imgs/sec'.format(real_epoch, me_step, cb_params.net_outputs, fps_mean)) + + if self.args.rank_save_ckpt_flag: + import glob + ckpts = glob.glob(os.path.join(self.args.outputs_dir, '*.ckpt')) + for ckpt in ckpts: + ckpt_fn = os.path.basename(ckpt) + if not ckpt_fn.startswith('{}-'.format(self.args.rank)): + continue + if ckpt in self.ckpt_history: + continue + self.ckpt_history.append(ckpt) + self.args.logger.info('epoch[{}], iter[{}], loss:{}, ckpt:{},' + 'ckpt_fn:{}'.format(real_epoch, me_step, cb_params.net_outputs, ckpt, ckpt_fn)) + + + self.me_epoch_start_step_num = me_step + self.me_epoch_start_time = time.time() + + def step_begin(self, run_context): + pass + + def step_end(self, run_context, *me_args): + pass + + def end(self, run_context): + self.args.logger.info('end network train...') + + +def set_parameters(): + """parameters""" + context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, + device_target=config.device_target, save_graphs=False) + # init distributed + if config.run_distribute: + init() + config.rank = get_rank() + config.group_size = get_group_size() + else: + config.rank = 0 + config.group_size = 1 + + if config.is_dynamic_loss_scale == 1: + config.loss_scale = 1 # for dynamic loss scale can not set loss scale in momentum opt + + # select for master rank save ckpt or all rank save, compatible for model parallel + config.rank_save_ckpt_flag = 0 + if config.is_save_on_master: + if config.rank == 0: + config.rank_save_ckpt_flag = 1 + else: + config.rank_save_ckpt_flag = 1 + + # logger + config.outputs_dir = os.path.join(config.output_path, + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) + config.logger = get_logger(config.outputs_dir, 
config.rank) + return config + +@moxing_wrapper() +def train(): + """training process""" + set_parameters() + if int(os.getenv('DEVICE_ID', "0")): + context.set_context(device_id=int(os.getenv('DEVICE_ID'))) + + # init distributed + if config.run_distribute: + parallel_mode = ParallelMode.DATA_PARALLEL + context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=config.group_size, + gradients_mean=True) + # dataloader + de_dataset = classification_dataset(config.data_path, config.image_size, + config.per_batch_size, 1, + config.rank, config.group_size, num_parallel_workers=8) + config.steps_per_epoch = de_dataset.get_dataset_size() + + config.logger.save_args(config) + + # network + config.logger.important_info('start create network') + # get network and init + network = get_network(network=config.network, num_classes=config.num_classes, platform=config.device_target) + + load_pretrain_model(config.checkpoint_file_path, network, config) + + # lr scheduler + lr = get_lr(config) + + # optimizer + opt = Momentum(params=get_param_groups(network), + learning_rate=Tensor(lr), + momentum=config.momentum, + weight_decay=config.weight_decay, + loss_scale=config.loss_scale) + + + # loss + if not config.label_smooth: + config.label_smooth_factor = 0.0 + loss = CrossEntropy(smooth_factor=config.label_smooth_factor, num_classes=config.num_classes) + + if config.is_dynamic_loss_scale == 1: + loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000) + else: + loss_scale_manager = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False) + + model = Model(network, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale_manager, + metrics={'acc'}, amp_level="O3") + + # checkpoint save + progress_cb = ProgressMonitor(config) + callbacks = [progress_cb,] + if config.rank_save_ckpt_flag: + ckpt_config = CheckpointConfig(save_checkpoint_steps=config.ckpt_interval * config.steps_per_epoch, + 
keep_checkpoint_max=config.ckpt_save_max) + save_ckpt_path = os.path.join(config.outputs_dir, 'ckpt_' + str(config.rank) + '/') + ckpt_cb = ModelCheckpoint(config=ckpt_config, + directory=save_ckpt_path, + prefix='{}'.format(config.rank)) + callbacks.append(ckpt_cb) + + model.train(config.max_epoch, de_dataset, callbacks=callbacks, dataset_sink_mode=True) + + +if __name__ == "__main__": + train()
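
Reviewer note: the padding and striding done by `DistributedSampler.__iter__` in `src/utils/sampler.py` can be checked in isolation. The snippet below is a minimal sketch of the non-shuffled path only. It is not part of the PR, the helper name `shard_indices` is hypothetical, and it uses only the standard library instead of NumPy.

```python
import math


def shard_indices(dataset_length, rank, group_size):
    """Return the dataset indices that one rank would iterate over (no shuffling).

    Hypothetical helper mirroring the arithmetic in DistributedSampler.
    """
    # Every rank receives the same number of samples.
    num_samples = math.ceil(dataset_length / group_size)
    total_size = num_samples * group_size

    indices = list(range(dataset_length))
    # Pad by wrapping around to the start so the list divides evenly.
    indices += indices[:total_size - len(indices)]
    # Rank r takes positions r, r + group_size, r + 2 * group_size, ...
    return indices[rank::group_size]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, shard_indices(10, rank, 4))
```

With 10 samples and 4 ranks the list is padded to 12 entries, so the last two ranks each pick up one index that another rank has already seen. Both the training and evaluation pipelines would see these duplicates if they share the sampler.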