diff --git a/official/nlp/transformer/infer/README.md b/official/nlp/transformer/infer/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0bdcd449a85b0c8bd68954848b79059b3ec2e6b2
--- /dev/null
+++ b/official/nlp/transformer/infer/README.md
@@ -0,0 +1,434 @@
+# Transformer MindX Inference and mxBase Inference
+
+- [Script Description](#script-description)
+    - [Scripts and Sample Code](#scripts-and-sample-code)
+    - [Preparing Inference Data](#preparing-inference-data)
+    - [Model Conversion](#model-conversion)
+    - [mxBase Inference](#mxbase-inference)
+    - [MindX SDK Inference](#mindx-sdk-inference)
+
+## Script Description
+
+### Scripts and Sample Code
+
+```text
+├── infer                                // inference code for the MindX high-performance pretrained model (new)
+    ├── convert                          // om model conversion command, AIPP
+       ├── air_to_om.sh
+    ├── data                             // model files, model input datasets, and model configuration files
+       ├── config                        // configuration files
+          ├── transformer.pipeline
+       ├── data                          // datasets required for inference
+          ├── 00_source_eos_ids
+          ├── 01_source_eos_mask         // preprocessed datasets
+          ├── vocab.bpe.32000            // dataset used for accuracy computation
+          ├── newstest2014.tok.de        // dataset used for accuracy computation
+          ├── test.all                   // raw dataset
+          ├── newstest2014-l128-mindrecord
+          ├── newstest2014-l128-mindrecord.db
+       ├── model                         // air and om model files
+          ├── transformer.air
+          ├── transformer.om
+    ├── mxbase                           // mxBase inference
+       ├── src
+          ├── transformer.cpp
+          ├── Transformer.h
+          ├── main.cpp
+       ├── build.sh
+       ├── CMakeLists.txt
+       ├── post_process.py
+    ├── sdk                              // SDK inference
+       ├── main.py
+    ├── docker_start_infer.sh            // script for starting the container
+    ├── multi-bleu.perl                  // accuracy computation script
+```
+
+### Preparing Inference Data
+
+Prepare the directories and data required for model conversion and model inference.
+
+1. Process the dataset.
+
+- Enter the Transformer/infer directory and run the following command to start the container:
+
+```Shell
+bash docker_start_infer.sh docker_image tag model_dir
+```
+
+**Table 2** Parameter description
+
+  | Parameter | Description |
+  | ----------- | ----------- |
+  | docker_image | Name of the inference image; set it to the actual image name. |
+  | tag | Image tag; set it according to the actual configuration, e.g., 21.0.2. |
+  | model_dir | Path of the inference code. |
+
+- When the container starts, the inference chips and the data path are mounted into it. You can choose which inference chips to mount by modifying the device options in **docker_start_infer.sh**.
+
+```Shell
+docker run -it
+--device=/dev/davinci0        # change the mounted NPU device as needed
+--device=/dev/davinci_manager
+```
+
+> **Note:**
+> The MindX SDK development kit (mxManufacture) is pre-installed in the base image at "/usr/local/sdk_home".
+
+2. Download the software packages.
+
+   Click "Download Model Script" and "Download Model" to download the required packages.
+
+3. Upload the model script and model to any directory on the inference server (e.g., "/home/Transformer") and decompress them:
+
+```shell
+# run on the server
+unzip Transformer_for_MindSpore_{version}_code.zip
+cd Transformer_for_MindSpore_{version}_code/infer && dos2unix `find .`
+unzip ../../Transformer_for_MindSpore_{version}_model.zip
+```
+
+4. After starting the container, enter the "Transformer" code directory and run:
+
+```Shell
+bash wmt16_en_de.sh
+```
+
+Assuming you have obtained the files listed below, move them into the "Transformer/infer/data/data/" directory:
+
+```text
+├── wmt16_en_de
+    vocab.bpe.32000
+    newstest2014.tok.bpe.32000.en
+    newstest2014.tok.bpe.32000.de
+    newstest2014.tok.de
+```
+
+Enter the "Transformer/infer/data/data/" directory and run:
+
+```Shell
+paste newstest2014.tok.bpe.32000.en newstest2014.tok.bpe.32000.de > test.all
+```
+
+In default_config.yaml, change bucket to bucket: [128]:
+
+```text
+# create_data.py
+input_file: ''
+num_splits: 16
+clip_to_max_len: False
+max_seq_length: 128
+bucket: [128]
+```
+
+Enter the "Transformer/" directory and run:
+
+```Shell
+python3 create_data.py --input_file ./infer/data/data/test.all --vocab_file ./infer/data/data/vocab.bpe.32000 --output_file ./infer/data/data/newstest2014-l128-mindrecord --num_splits 1 --max_seq_length 128 --clip_to_max_len True
+```
+
+Update the following parameters in default_config.yaml:
+
+```text
+#eval_config/cfg edict
+data_file: './infer/data/data/newstest2014-l128-mindrecord'
+...
+#'preprocess / from eval_config'
+result_path: "./infer/data/data"
+```
+
+Then run:
+
+```Shell
+python3 preprocess.py
+```
+
+Afterwards, the following folders are generated in the "Transformer/infer/data/data" directory:
+
+```txt
+├──data
+    00_source_eos_ids
+    01_source_eos_mask
+```
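+
+Each of these two folders holds one fixed-length sample per file. As a quick sanity check, you can load one sample pair with NumPy — a minimal sketch, assuming the `transformer_bs_1_<i>.bin` naming that the inference programs below expect and the `max_seq_length: 128` configured above:
+
+```python
+import numpy as np
+
+# Each .bin file holds a single int32 sequence of max_seq_length (128) values.
+ids = np.fromfile("./infer/data/data/00_source_eos_ids/transformer_bs_1_0.bin", dtype=np.int32)
+mask = np.fromfile("./infer/data/data/01_source_eos_mask/transformer_bs_1_0.bin", dtype=np.int32)
+assert ids.shape == (128,) and mask.shape == (128,)
+print("token ids:", ids[:10], "non-padding positions:", int(mask.sum()))
+```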
+### Model Conversion
+
+All of the following operations must be performed inside the container.
+
+1. Prepare the model file.
+
+- transformer.air
+
+- Place it in the Transformer/infer/data/model directory.
+
+2. Convert the model.
+
+   Enter the "Transformer/infer/convert" directory to perform the conversion. Details can be found in the conversion script and the corresponding AIPP configuration file. Configure the relevant parameters in the **air_to_om.sh** script:
+
+```Shell
+model_path=$1
+output_model_name=$2
+
+atc --model=$model_path \                  # path of the model to be converted
+    --framework=1 \                        # 1 means MindSpore
+    --output=$output_model_name \          # path of the output om model
+    --input_format=NCHW \                  # input format
+    --soc_version=Ascend310 \              # chip version used for conversion
+    --op_select_implmode=high_precision \  # operator implementation mode
+    --precision_mode=allow_fp32_to_fp16    # conversion precision mode
+```
+
+Run the conversion as follows:
+
+```Shell
+bash air_to_om.sh [input_path] [output_path]
+e.g.
+bash air_to_om.sh ../data/model/transformer.air ../data/model/transformer
+```
+
+**Table 3** Parameter description
+
+| Parameter | Description |
+| ----------- | ----------- |
+| input_path | Path of the AIR file. |
+| output_path | Name of the generated OM file; the conversion script appends the .om suffix to it. |
+
+### mxBase Inference
+
+This assumes you have already entered the inference container; for details, see "Preparing the Container Environment".
+
+1. Modify the save path of the inference results in Transformer.h as needed:
+
+```c
+ private:
+    std::shared_ptr<MxBase::ModelInferenceProcessor> model_;
+    MxBase::ModelDesc modelDesc_ = {};
+    uint32_t deviceId_ = 0;
+    std::string outputDataPath_ = "./result/result.txt";
+};
+
+#endif
+```
+
+2. In the "infer/mxbase" directory, build the project:
+
+```shell
+bash build.sh
+```
+
+After compilation, the following new files appear in the mxbase directory:
+
+```text
+├── mxbase
+    ├── build          // build output
+    ├── result         // empty folder for storing the inference results
+    ├── Transformer    // executable that runs the inference
+```
+
+   Run the inference service.
+
+   In the "infer/mxbase" directory, run the inference program:
+
+```shell
+./Transformer [model_path] [input_data_path/] [output_data_path]
+e.g.
+./Transformer ../data/model/transformer.om ../data/data ./result
+```
+
+**Table 4** Parameter description
+
+| Parameter | Description |
+| ---------------- | ------------ |
+| model_path | Path of the om model |
+| input_data_path | Path of the preprocessed data |
+| output_data_path | Path for the inference results |
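+
+With the default settings, each file `result/transformer_bs_1_<i>.txt` holds the 81 predicted token IDs for one source sentence, one ID per line (see `WriteResult` in `src/transformer.cpp`). A minimal sketch to peek at a single prediction before post-processing:
+
+```python
+# Read one mxBase prediction: predicted token IDs, one per line.
+with open("./result/transformer_bs_1_0.txt") as f:
+    token_ids = [int(line) for line in f if line.strip()]
+print(len(token_ids), token_ids[:10])
+```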
+3. Process the results.
+
+Modify the parameters in post_process.py:
+
+```python
+path = "./result"          # folder containing the inference results
+
+filenames = os.listdir(path)
+result = "./results.txt"   # path of the file produced from the inference results
+```
+
+Run the following in the "infer/mxbase" directory:
+
+```shell
+python3 post_process.py
+```
+
+This produces results.txt in the "infer/mxbase" directory.
+
+4. Inference accuracy.
+
+Enter the "Transformer/" directory and run:
+
+```shell
+bash scripts/process_output.sh REF_DATA EVAL_OUTPUT VOCAB_FILE
+e.g.
+bash scripts/process_output.sh ./infer/data/data/newstest2014.tok.de ./infer/mxbase/results.txt ./infer/data/data/vocab.bpe.32000
+```
+
+Enter the "Transformer/infer/" directory and run:
+
+```shell
+perl multi-bleu.perl REF_DATA.forbleu < EVAL_OUTPUT.forbleu
+e.g.
+perl multi-bleu.perl ./data/data/newstest2014.tok.de.forbleu < ./mxbase/results.txt.forbleu
+```
+
+The resulting BLEU score is 27.24.
+
+### MindX SDK Inference
+
+This assumes you have already entered the inference container; for details, see "Preparing the Container Environment".
+
+1. Prepare the model inference files.
+
+   (1) Go to the Transformer/infer/data/config directory. In the transformer.pipeline file, "modelPath": "../data/model/transformer.om" is the path of the om model; set it to the actual location:
+
+```txt
+{
+    "transformer": {
+        "stream_config": {
+            "deviceId": "0"
+        },
+        "appsrc0": {
+            "props": {
+                "blocksize": "409600"
+            },
+            "factory": "appsrc",
+            "next": "mxpi_tensorinfer0:0"
+        },
+        "appsrc1": {
+            "props": {
+                "blocksize": "409600"
+            },
+            "factory": "appsrc",
+            "next": "mxpi_tensorinfer0:1"
+        },
+        "mxpi_tensorinfer0": {
+            "props": {
+                "dataSource":"appsrc0,appsrc1",
+                "modelPath": "../data/model/transformer.om"
+            },
+            "factory": "mxpi_tensorinfer",
+            "next": "mxpi_dataserialize0"
+        },
+        "mxpi_dataserialize0": {
+            "props": {
+                "outputDataKeys": "mxpi_tensorinfer0"
+            },
+            "factory": "mxpi_dataserialize",
+            "next": "appsink0"
+        },
+        "appsink0": {
+            "props": {
+                "blocksize": "4096000"
+            },
+            "factory": "appsink"
+        }
+    }
+}
+```
+
+(2) Modify the **pipeline path**, **dataset paths**, and **inference result path** in main.py according to your setup:
+
+```python
+def run():
+    """
+    read pipeline and do infer
+    """
+    # init stream manager
+    stream_manager_api = StreamManagerApi()
+    ret = stream_manager_api.InitManager()
+    if ret != 0:
+        print("Failed to init Stream manager, ret=%s" % str(ret))
+        return
+
+    # create streams by pipeline config file
+    with open("../data/config/transformer.pipeline", 'rb') as f:    # pipeline path
+        pipelineStr = f.read()
+    ret = stream_manager_api.CreateMultipleStreams(pipelineStr)
+
+    if ret != 0:
+        print("Failed to create Stream, ret=%s" % str(ret))
+        return
+    stream_name = b'transformer'
+    predictions = []
+    path = '../data/data/00_source_eos_ids'      # dataset path
+    path1 = '../data/data/01_source_eos_mask'    # dataset path
+    files = os.listdir(path)
+    for i in range(len(files)):
+        full_file_path = os.path.join(path, "transformer_bs_1_" + str(i) + ".bin")
+        full_file_path1 = os.path.join(path1, "transformer_bs_1_" + str(i) + ".bin")
+        source_ids = np.fromfile(full_file_path, dtype=np.int32)
+        source_mask = np.fromfile(full_file_path1, dtype=np.int32)
+        source_ids = np.expand_dims(source_ids, 0)
+        source_mask = np.expand_dims(source_mask, 0)
+        print(source_ids)
+        print(source_mask)
+        if not send_source_data(0, source_ids, stream_name, stream_manager_api):
+            return
+        if not send_source_data(1, source_mask, stream_name, stream_manager_api):
+            return
+        # Obtain the inference result by specifying streamName and uniqueId.
+        key_vec = StringVector()
+        key_vec.push_back(b'mxpi_tensorinfer0')
+        infer_result = stream_manager_api.GetProtobuf(stream_name, 0, key_vec)
+        if infer_result.size() == 0:
+            print("inferResult is null")
+            return
+        if infer_result[0].errorCode != 0:
+            print("GetProtobuf error. errorCode=%d" % (infer_result[0].errorCode))
+            return
+        result = MxpiDataType.MxpiTensorPackageList()
+        result.ParseFromString(infer_result[0].messageBuf)
+        res = np.frombuffer(result.tensorPackageVec[0].tensorVec[0].dataStr, dtype=np.int32)
+        print(res)
+        predictions.append(res.reshape(1, 1, 81))
+    # decode and write to file
+    f = open('./results', 'w')    # inference result path
+    for batch_out in predictions:
+        token_ids = [str(x) for x in batch_out[0][0].tolist()]
+        f.write(" ".join(token_ids) + "\n")
+    f.close()
+    # destroy streams
+    stream_manager_api.DestroyAllStreams()
+```
+
+2. Run the inference service. Enter the "Transformer/infer/sdk" directory and run:
+
+```Shell
+python3 main.py
+```
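+
+main.py writes one line per input sample to `./results`, each line holding the predicted token IDs separated by spaces. Before computing the accuracy, you can verify that every sample produced a prediction — a minimal sketch, run from the "Transformer/infer/sdk" directory:
+
+```python
+import os
+
+# Expect one prediction line per preprocessed input file.
+num_inputs = len(os.listdir("../data/data/00_source_eos_ids"))
+with open("./results") as f:
+    num_preds = sum(1 for _ in f)
+assert num_preds == num_inputs, (num_preds, num_inputs)
+print("predictions written:", num_preds)
+```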
+3. Compute the inference accuracy.
+
+Enter the "Transformer/" directory and run:
+
+```shell
+bash scripts/process_output.sh REF_DATA EVAL_OUTPUT VOCAB_FILE
+e.g.
+bash scripts/process_output.sh ./infer/data/data/newstest2014.tok.de ./infer/sdk/results ./infer/data/data/vocab.bpe.32000
+```
+
+Enter the "Transformer/infer/" directory and run:
+
+```shell
+perl multi-bleu.perl REF_DATA.forbleu < EVAL_OUTPUT.forbleu
+e.g.
+perl multi-bleu.perl ./data/data/newstest2014.tok.de.forbleu < ./sdk/results.forbleu
+```
+
+The resulting BLEU score is 27.24.
\ No newline at end of file
diff --git a/official/nlp/transformer/infer/convert/air_to_om.sh b/official/nlp/transformer/infer/convert/air_to_om.sh
new file mode 100644
index 0000000000000000000000000000000000000000..bbee9e221df82507d96760c8999c4fbf78d6ea02
--- /dev/null
+++ b/official/nlp/transformer/infer/convert/air_to_om.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model_path=$1
+output_model_name=$2
+
+atc --model=$model_path \
+    --framework=1 \
+    --output=$output_model_name \
+    --input_format=NCHW \
+    --soc_version=Ascend310 \
+    --op_select_implmode=high_precision \
+    --precision_mode=allow_fp32_to_fp16
diff --git a/official/nlp/transformer/infer/data/config/transformer.pipeline b/official/nlp/transformer/infer/data/config/transformer.pipeline
new file mode 100644
index 0000000000000000000000000000000000000000..25c2b39aede8408721a9c8bcfc91e1386fbc5ce0
--- /dev/null
+++ b/official/nlp/transformer/infer/data/config/transformer.pipeline
@@ -0,0 +1,42 @@
+{
+    "transformer": {
+        "stream_config": {
+            "deviceId": "0"
+        },
+        "appsrc0": {
+            "props": {
+                "blocksize": "409600"
+            },
+            "factory": "appsrc",
+            "next": "mxpi_tensorinfer0:0"
+        },
+        "appsrc1": {
+            "props": {
+                "blocksize": "409600"
+            },
+            "factory": "appsrc",
+            "next": "mxpi_tensorinfer0:1"
+        },
+        "mxpi_tensorinfer0": {
+            "props": {
+                "dataSource":"appsrc0,appsrc1",
+                "modelPath": "../data/model/transformer.om"
+            },
+            "factory": "mxpi_tensorinfer",
+            "next": "mxpi_dataserialize0"
+        },
+        "mxpi_dataserialize0": {
+            "props": {
+                "outputDataKeys": "mxpi_tensorinfer0"
+            },
+            "factory": "mxpi_dataserialize",
+            "next": "appsink0"
+        },
+        "appsink0": {
+            "props": {
+                "blocksize": "4096000"
+            },
+            "factory": "appsink"
+        }
+    }
+}
diff --git a/official/nlp/transformer/infer/docker_start_infer.sh b/official/nlp/transformer/infer/docker_start_infer.sh
new file mode 100644
index 0000000000000000000000000000000000000000..c76879755acc27849fd8ca75ab0fcd76daea1658
--- /dev/null
+++ b/official/nlp/transformer/infer/docker_start_infer.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +docker_image=$1 +share_dir=$2 +data_dir=$3 +echo "$1" +echo "$2" +if [ -z "${docker_image}" ]; then + echo "please input docker_image" + exit 1 +fi + +if [ ! -d "${share_dir}" ]; then + echo "please input share directory that contains dataset, models and codes" + exit 1 +fi + + +docker run -it -u root \ + --device=/dev/davinci0 \ + --device=/dev/davinci_manager \ + --device=/dev/devmm_svm \ + --device=/dev/hisi_hdc \ + --privileged \ + -v //usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ + -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ + -v ${data_dir}:${data_dir} \ + -v ${share_dir}:${share_dir} \ + ${docker_image} \ + /bin/bash diff --git a/official/nlp/transformer/infer/mxbase/CMakeLists.txt b/official/nlp/transformer/infer/mxbase/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..979de41a2c96b8d0aff46ee5581901e1e95b9ec0 --- /dev/null +++ b/official/nlp/transformer/infer/mxbase/CMakeLists.txt @@ -0,0 +1,48 @@ +cmake_minimum_required(VERSION 3.14.0) +project(Transformer) +set(TARGET Transformer) +add_definitions(-DENABLE_DVPP_INTERFACE) +add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0) +add_definitions(-Dgoogle=mindxsdk_private) +add_compile_options(-std=c++11 -fPIE -fstack-protector-all -fPIC -Wall) +add_link_options(-Wl,-z,relro,-z,now,-z,noexecstack -s -pie) +# Check environment variable +if(NOT DEFINED ENV{ASCEND_HOME}) + message(FATAL_ERROR "please define environment variable:ASCEND_HOME") +endif() +if(NOT DEFINED ENV{ASCEND_VERSION}) + message(WARNING "please define environment variable:ASCEND_VERSION") +endif() +if(NOT DEFINED ENV{ARCH_PATTERN}) + message(WARNING "please define environment variable:ARCH_PATTERN") +endif() +set(ACL_INC_DIR $ENV{ASCEND_HOME}/$ENV{ASCEND_VERSION}/$ENV{ARCH_PATTERN}/acllib/include) +set(ACL_LIB_DIR $ENV{ASCEND_HOME}/$ENV{ASCEND_VERSION}/$ENV{ARCH_PATTERN}/acllib/lib64) +set(MXBASE_ROOT_DIR $ENV{MX_SDK_HOME}) +set(MXBASE_INC ${MXBASE_ROOT_DIR}/include) +set(MXBASE_LIB_DIR ${MXBASE_ROOT_DIR}/lib) +set(MXBASE_POST_LIB_DIR ${MXBASE_ROOT_DIR}/lib/modelpostprocessors) +set(MXBASE_POST_PROCESS_DIR ${MXBASE_ROOT_DIR}/include/MxBase/postprocess/include) + +if(NOT DEFINED ENV{MXSDK_OPENSOURCE_DIR}) + message(WARNING "please define environment variable:MXSDK_OPENSOURCE_DIR") +endif() + +set(OPENSOURCE_DIR $ENV{MXSDK_OPENSOURCE_DIR}) + +include_directories(${ACL_INC_DIR}) +include_directories(${OPENSOURCE_DIR}/include) +include_directories(${OPENSOURCE_DIR}/include/opencv4) + + +include_directories(${MXBASE_INC}) +include_directories(${MXBASE_POST_PROCESS_DIR}) + +link_directories(${ACL_LIB_DIR}) +link_directories(${OPENSOURCE_DIR}/lib) +link_directories(${MXBASE_LIB_DIR}) +link_directories(${MXBASE_POST_LIB_DIR}) + +add_executable(${TARGET} ./src/main.cpp ./src/transformer.cpp) +target_link_libraries(${TARGET} glog cpprest mxbase opencv_world stdc++fs) +install(TARGETS ${TARGET} RUNTIME DESTINATION ${PROJECT_SOURCE_DIR}/) \ No newline at end of file diff --git a/official/nlp/transformer/infer/mxbase/build.sh b/official/nlp/transformer/infer/mxbase/build.sh new file mode 100644 index 0000000000000000000000000000000000000000..0f250167cfea9bd853729795c0be283fc50d9bc6 --- /dev/null +++ b/official/nlp/transformer/infer/mxbase/build.sh @@ -0,0 +1,61 @@ +#!/bin/bash + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in 
compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +export ASCEND_HOME=/usr/local/Ascend +export ASCEND_VERSION=ascend-toolkit/latest +export ARCH_PATTERN=. +export MXSDK_OPENSOURCE_DIR=/usr/local/sdk_home/mxManufacture/opensource + +function check_env() +{ + # set ASCEND_VERSION to ascend-toolkit/latest when it was not specified by user + if [ ! "${ASCEND_VERSION}" ]; then + export ASCEND_VERSION=ascend-toolkit/latest + echo "Set ASCEND_VERSION to the default value: ${ASCEND_VERSION}" + else + echo "ASCEND_VERSION is set to ${ASCEND_VERSION} by user" + fi + + if [ ! "${ARCH_PATTERN}" ]; then + # set ARCH_PATTERN to ./ when it was not specified by user + export ARCH_PATTERN=./ + echo "ARCH_PATTERN is set to the default value: ${ARCH_PATTERN}" + else + echo "ARCH_PATTERN is set to ${ARCH_PATTERN} by user" + fi +} + +function build_transformer() +{ + cd . + rm -rf build + mkdir -p build + cd build + cmake .. + make + ret=$? + if [ ${ret} -ne 0 ]; then + echo "Failed to build transformer." + exit ${ret} + fi + make install +} + +rm -rf ./result +mkdir -p ./result + +check_env +build_transformer \ No newline at end of file diff --git a/official/nlp/transformer/infer/mxbase/post_process.py b/official/nlp/transformer/infer/mxbase/post_process.py new file mode 100644 index 0000000000000000000000000000000000000000..a4ce83b37ff2e0e10fdbb306abac930f1cab4d8b --- /dev/null +++ b/official/nlp/transformer/infer/mxbase/post_process.py @@ -0,0 +1,37 @@ +# coding=utf-8 + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+import os
+
+path = "./result"           # folder that holds the per-sample inference results
+
+filenames = os.listdir(path)
+result = "./results.txt"    # file that collects all predictions, one line per sample
+
+# Each file in ./result holds the predicted token IDs of one sample, one ID per
+# line; merge them into results.txt with one space-separated line per sample.
+with open(result, 'w+', encoding="utf-8") as file:
+    for i in range(len(filenames)):
+        filepath = os.path.join(path, 'transformer_bs_1_' + str(i) + '.txt')
+        with open(filepath) as originfile:
+            for line in originfile.readlines():
+                file.write(line.strip())
+                file.write(' ')
+        file.write('\n')
diff --git a/official/nlp/transformer/infer/mxbase/src/Transformer.h b/official/nlp/transformer/infer/mxbase/src/Transformer.h
new file mode 100644
index 0000000000000000000000000000000000000000..f27a0204a1b14da9ae3aeef59d6f3bda2e42d46c
--- /dev/null
+++ b/official/nlp/transformer/infer/mxbase/src/Transformer.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright 2022 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef MXBASE_TRANSFORMER_H
+#define MXBASE_TRANSFORMER_H
+
+#include <memory>
+#include <utility>
+#include <vector>
+#include <string>
+#include <map>
+#include "MxBase/ModelInfer/ModelInferenceProcessor.h"
+#include "MxBase/Tensor/TensorContext/TensorContext.h"
+
+extern std::vector<double> g_inferCost;
+
+struct InitParam {
+    uint32_t deviceId;
+    std::string modelPath;
+    std::string outputDataPath;
+};
+
+// Input tensor order expected by the om model.
+enum DataIndex {
+    source_ids = 0,
+    source_mask = 1,
+};
+
+class transformer {
+ public:
+    APP_ERROR Init(const InitParam &initParam);
+    APP_ERROR DeInit();
+    APP_ERROR Inference(const std::vector<MxBase::TensorBase> &inputs, std::vector<MxBase::TensorBase> *outputs);
+    APP_ERROR Process(const std::string &inferPath, const std::string &fileName);
+
+ protected:
+    APP_ERROR ReadTensorFromFile(const std::string &file, uint32_t *data, uint32_t size);
+    APP_ERROR ReadInputTensor(const std::string &fileName, uint32_t index, std::vector<MxBase::TensorBase> *inputs);
+    APP_ERROR WriteResult(const std::string &imageFile, std::vector<MxBase::TensorBase> outputs);
+
+ private:
+    std::shared_ptr<MxBase::ModelInferenceProcessor> model_;
+    MxBase::ModelDesc modelDesc_ = {};
+    uint32_t deviceId_ = 0;
+    std::string outputDataPath_ = "./result/result.txt";
+};
+
+#endif
diff --git a/official/nlp/transformer/infer/mxbase/src/main.cpp b/official/nlp/transformer/infer/mxbase/src/main.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..7515a983f07953d1c806ec08678f82f6dfb60434
--- /dev/null
+++ b/official/nlp/transformer/infer/mxbase/src/main.cpp
@@ -0,0 +1,77 @@
+
+/**
+ * Copyright 2022 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <unistd.h>
+#include <dirent.h>
+#include <iostream>
+#include <fstream>
+#include <vector>
+#include "Transformer.h"
+#include "MxBase/Log/Log.h"
+
+std::vector<double> g_inferCost;
+
+void InitTransformerParam(InitParam* initParam, const std::string &model_path, const std::string &output_data_path) {
+    initParam->deviceId = 0;
+    initParam->modelPath = model_path;
+    initParam->outputDataPath = output_data_path;
+}
+
+int main(int argc, char* argv[]) {
+    if (argc != 4) {
+        LogError << "Usage: ./Transformer [model_path] [input_data_path] [output_data_path]";
+        return APP_ERR_COMM_INVALID_PARAM;
+    }
+    LogInfo << "======================================= !!!Parameters setting!!!" << \
+    "========================================";
+    std::string model_path = argv[1];
+    LogInfo << "==========  loading model weights from: " << model_path;
+
+    std::string input_data_path = argv[2];
+    LogInfo << "==========  input data path = " << input_data_path;
+
+    std::string output_data_path = argv[3];
+    LogInfo << "==========  output data path = " << output_data_path << \
+    " WARNING: please make sure that this folder is created in advance!!!";
+
+    LogInfo << "======================================== !!!Parameters setting!!! " << \
+    "========================================";
+
+    InitParam initParam;
+    InitTransformerParam(&initParam, model_path, output_data_path);
+    auto model_ = std::make_shared<transformer>();
+    APP_ERROR ret = model_->Init(initParam);
+    if (ret != APP_ERR_OK) {
+        LogError << "transformer init failed, ret=" << ret << ".";
+        return ret;
+    }
+    // The preprocessed newstest2014 set yields 3003 samples, one .bin file each.
+    for (uint32_t i = 0; i < 3003; i++) {
+        ret = model_->Process(input_data_path, "transformer_bs_1_" + std::to_string(i) + ".bin");
+        if (ret != APP_ERR_OK) {
+            LogError << "transformer process failed, ret=" << ret << ".";
+            model_->DeInit();
+            return ret;
+        }
+    }
+    LogInfo << "infer succeed and write the result data with binary file !";
+
+    model_->DeInit();
+    double costSum = 0;
+    for (uint32_t i = 0; i < g_inferCost.size(); i++) {
+        costSum += g_inferCost[i];
+    }
+    LogInfo << "Infer sum " << g_inferCost.size() << ", cost total time: " << costSum << " ms.";
+    LogInfo << "The throughput: " << g_inferCost.size() * 1000 / costSum << " bin/sec.";
+    LogInfo << "==========  The infer result has been saved in ---> " << output_data_path;
+    return APP_ERR_OK;
+}
diff --git a/official/nlp/transformer/infer/mxbase/src/transformer.cpp b/official/nlp/transformer/infer/mxbase/src/transformer.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..0755d401444b5a1d90317bffe2e399a91926eb59
--- /dev/null
+++ b/official/nlp/transformer/infer/mxbase/src/transformer.cpp
@@ -0,0 +1,188 @@
+/**
+ * Copyright 2022 Huawei Technologies Co., Ltd
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ */ +#include "Transformer.h" +#include <unistd.h> +#include <sys/stat.h> +#include <map> +#include <fstream> +#include "MxBase/DeviceManager/DeviceManager.h" +#include "MxBase/Log/Log.h" + +const uint32_t MAX_LENGTH = 128; + +APP_ERROR transformer::Init(const InitParam &initParam) { + deviceId_ = initParam.deviceId; + outputDataPath_ = initParam.outputDataPath; + APP_ERROR ret = MxBase::DeviceManager::GetInstance()->InitDevices(); + if (ret != APP_ERR_OK) { + LogError << "Init devices failed, ret=" << ret << "."; + return ret; + } + + ret = MxBase::TensorContext::GetInstance()->SetContext(initParam.deviceId); + if (ret != APP_ERR_OK) { + LogError << "Set context failed, ret=" << ret << "."; + return ret; + } + + model_ = std::make_shared<MxBase::ModelInferenceProcessor>(); + ret = model_->Init(initParam.modelPath, modelDesc_); + if (ret != APP_ERR_OK) { + LogError << "ModelInferenceProcessor init failed, ret=" << ret << "."; + return ret; + } + + return APP_ERR_OK; +} + +APP_ERROR transformer::DeInit() { + model_->DeInit(); + MxBase::DeviceManager::GetInstance()->DestroyDevices(); + return APP_ERR_OK; +} + +APP_ERROR transformer::ReadTensorFromFile(const std::string &file, uint32_t *data, uint32_t size) { + if (data == NULL|| size < MAX_LENGTH) { + LogError << "input data is invalid."; + return APP_ERR_COMM_INVALID_POINTER; + } + + std::ifstream infile; + // open data file + infile.open(file, std::ios_base::in | std::ios_base::binary); + // check data file validity + if (infile.fail()) { + LogError << "Failed to open data file: " << file << "."; + return APP_ERR_COMM_OPEN_FAIL; + } + infile.read(reinterpret_cast<char*>(data), sizeof(uint32_t) * MAX_LENGTH); + infile.close(); + return APP_ERR_OK; +} + +APP_ERROR transformer::ReadInputTensor(const std::string &fileName, uint32_t index, + std::vector<MxBase::TensorBase> *inputs) { + uint32_t data[MAX_LENGTH] = {0}; + APP_ERROR ret = ReadTensorFromFile(fileName, data, MAX_LENGTH); + if (ret != APP_ERR_OK) { + LogError << "ReadTensorFromFile failed."; + return ret; + } + + const uint32_t dataSize = modelDesc_.inputTensors[index].tensorSize; + MxBase::MemoryData memoryDataDst(dataSize, MxBase::MemoryData::MEMORY_DEVICE, deviceId_); + MxBase::MemoryData memoryDataSrc(reinterpret_cast<void*>(data), dataSize, MxBase::MemoryData::MEMORY_HOST_MALLOC); + ret = MxBase::MemoryHelper::MxbsMallocAndCopy(memoryDataDst, memoryDataSrc); + if (ret != APP_ERR_OK) { + LogError << GetError(ret) << "Memory malloc and copy failed."; + return ret; + } + + std::vector<uint32_t> shape = {1, MAX_LENGTH}; + inputs->push_back(MxBase::TensorBase(memoryDataDst, false, shape, MxBase::TENSOR_DTYPE_UINT32)); + return APP_ERR_OK; +} + + +APP_ERROR transformer::Inference(const std::vector<MxBase::TensorBase> &inputs, + std::vector<MxBase::TensorBase> *outputs) { + auto dtypes = model_->GetOutputDataType(); + for (size_t i = 0; i < modelDesc_.outputTensors.size(); ++i) { + std::vector<uint32_t> shape = {}; + for (size_t j = 0; j < modelDesc_.outputTensors[i].tensorDims.size(); ++j) { + shape.push_back((uint32_t)modelDesc_.outputTensors[i].tensorDims[j]); + } + MxBase::TensorBase tensor(shape, dtypes[i], MxBase::MemoryData::MemoryType::MEMORY_DEVICE, deviceId_); + APP_ERROR ret = MxBase::TensorBase::TensorBaseMalloc(tensor); + if (ret != APP_ERR_OK) { + LogError << "TensorBaseMalloc failed, ret=" << ret << "."; + return ret; + } + outputs->push_back(tensor); + } + + MxBase::DynamicInfo dynamicInfo = {}; + dynamicInfo.dynamicType = MxBase::DynamicType::STATIC_BATCH; + auto 
startTime = std::chrono::high_resolution_clock::now(); + APP_ERROR ret = model_->ModelInference(inputs, *outputs, dynamicInfo); + auto endTime = std::chrono::high_resolution_clock::now(); + double costMs = std::chrono::duration<double, std::milli>(endTime - startTime).count(); + g_inferCost.push_back(costMs); + + if (ret != APP_ERR_OK) { + LogError << "ModelInference failed, ret=" << ret << "."; + return ret; + } + return APP_ERR_OK; +} + + +APP_ERROR transformer::WriteResult(const std::string &imageFile, std::vector<MxBase::TensorBase> outputs) { + APP_ERROR ret = outputs[0].ToHost(); + if (ret != APP_ERR_OK) { + LogError << GetError(ret) << "tohost fail."; + return ret; + } + auto dataptr = (uint32_t *)outputs[0].GetBuffer(); // NOLINT + int pos = imageFile.rfind('/'); + std::string fileName(imageFile, pos + 1); + fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), ".txt"); + std::string outFileName = this->outputDataPath_ + "/" + fileName; + + LogInfo << "file path for saving result: " << outFileName; + std::ofstream tfile(outFileName); + if (tfile.fail()) { + LogError << "Failed to open result file"; + return APP_ERR_COMM_FAILURE; + } + for (size_t i = 0; i < 81; ++i) { + tfile << *(dataptr + i) << std::endl; + } + tfile.close(); + return APP_ERR_OK; +} + + +APP_ERROR transformer::Process(const std::string &inferPath, const std::string &fileName) { + std::vector<MxBase::TensorBase> inputs = {}; + std::string inputIdsFile = inferPath + "/00_source_eos_ids/" + fileName; + APP_ERROR ret = ReadInputTensor(inputIdsFile, source_ids, &inputs); + if (ret != APP_ERR_OK) { + LogError << "Read source_ids failed, ret=" << ret << "."; + return ret; + } + std::string inputIdsFile1 = inferPath + "/01_source_eos_mask/" + fileName; + ret = ReadInputTensor(inputIdsFile1, source_mask, &inputs); + if (ret != APP_ERR_OK) { + LogError << "Read source_mask failed, ret=" << ret << "."; + return ret; + } + + std::vector<MxBase::TensorBase> outputs = {}; + ret = Inference(inputs, &outputs); + if (ret != APP_ERR_OK) { + LogError << "Inference failed, ret=" << ret << "."; + return ret; + } + + ret = WriteResult(fileName, outputs); + if (ret != APP_ERR_OK) { + LogError << "Write result failed, ret=" << ret << "."; + return ret; + } + return APP_ERR_OK; +} + diff --git a/official/nlp/transformer/infer/sdk/main.py b/official/nlp/transformer/infer/sdk/main.py new file mode 100644 index 0000000000000000000000000000000000000000..e02acb277435a16eb5d7021bac115c9a09eb0d0f --- /dev/null +++ b/official/nlp/transformer/infer/sdk/main.py @@ -0,0 +1,122 @@ +# coding=utf-8 + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import os +import MxpiDataType_pb2 as MxpiDataType +import numpy as np +from StreamManagerApi import StreamManagerApi, InProtobufVector, \ + MxProtobufIn, StringVector + + +def send_source_data(appsrc_id, tensor, stream_name, stream_manager): + """ + Construct the input of the stream, + send inputs data to a specified stream based on streamName. + Args: + appsrc_id: an RGB image:the appsrc component number for SendProtobuf + tensor: the tensor type of the input file + stream_name: stream Name + stream_manager:the StreamManagerApi + Returns: + bool: send data success or not + """ + tensor_package_list = MxpiDataType.MxpiTensorPackageList() + tensor_package = tensor_package_list.tensorPackageVec.add() + array_bytes = tensor.tobytes() + tensor_vec = tensor_package.tensorVec.add() + tensor_vec.deviceId = 0 + tensor_vec.memType = 0 + for i in tensor.shape: + tensor_vec.tensorShape.append(i) + tensor_vec.dataStr = array_bytes + tensor_vec.tensorDataSize = len(array_bytes) + key = "appsrc{}".format(appsrc_id).encode('utf-8') + protobuf_vec = InProtobufVector() + protobuf = MxProtobufIn() + protobuf.key = key + protobuf.type = b'MxTools.MxpiTensorPackageList' + protobuf.protobuf = tensor_package_list.SerializeToString() + protobuf_vec.push_back(protobuf) + + ret = stream_manager.SendProtobuf(stream_name, appsrc_id, protobuf_vec) + if ret < 0: + print("Failed to send data to stream.") + return False + return True + + +def run(): + """ + read pipeline and do infer + """ + # init stream manager + stream_manager_api = StreamManagerApi() + ret = stream_manager_api.InitManager() + if ret != 0: + print("Failed to init Stream manager, ret=%s" % str(ret)) + return + + # create streams by pipeline config file + with open("../data/config/transformer.pipeline", 'rb') as f: + pipelineStr = f.read() + ret = stream_manager_api.CreateMultipleStreams(pipelineStr) + + if ret != 0: + print("Failed to create Stream, ret=%s" % str(ret)) + return + stream_name = b'transformer' + predictions = [] + path = '../data/data/00_source_eos_ids' + path1 = '../data/data/01_source_eos_mask' + files = os.listdir(path) + for i in range(len(files)): + full_file_path = os.path.join(path, "transformer_bs_1_" + str(i) + ".bin") + full_file_path1 = os.path.join(path1, "transformer_bs_1_" + str(i) + ".bin") + source_ids = np.fromfile(full_file_path, dtype=np.int32) + source_mask = np.fromfile(full_file_path1, dtype=np.int32) + source_ids = np.expand_dims(source_ids, 0) + source_mask = np.expand_dims(source_mask, 0) + if not send_source_data(0, source_ids, stream_name, stream_manager_api): + return + if not send_source_data(1, source_mask, stream_name, stream_manager_api): + return + # Obtain the inference result by specifying streamName and uniqueId. + key_vec = StringVector() + key_vec.push_back(b'mxpi_tensorinfer0') + infer_result = stream_manager_api.GetProtobuf(stream_name, 0, key_vec) + if infer_result.size() == 0: + print("inferResult is null") + return + if infer_result[0].errorCode != 0: + print("GetProtobuf error. 
errorCode=%d" % (infer_result[0].errorCode)) + return + result = MxpiDataType.MxpiTensorPackageList() + result.ParseFromString(infer_result[0].messageBuf) + res = np.frombuffer(result.tensorPackageVec[0].tensorVec[0].dataStr, dtype=np.int32) + predictions.append(res.reshape(1, 1, 81)) + # decode and write to file + f = open('./results', 'w') + for batch_out in predictions: + token_ids = [str(x) for x in batch_out[0][0].tolist()] + f.write(" ".join(token_ids) + "\n") + f.close() + # destroy streams + stream_manager_api.DestroyAllStreams() + + +if __name__ == '__main__': + run() diff --git a/official/nlp/transformer/modelarts/train_modelarts.py b/official/nlp/transformer/modelarts/train_modelarts.py new file mode 100644 index 0000000000000000000000000000000000000000..2292379737cfd2c9e9262bed827db703ed658a6e --- /dev/null +++ b/official/nlp/transformer/modelarts/train_modelarts.py @@ -0,0 +1,233 @@ +# Copyright 2021-2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Transformer training script.""" + +import argparse +import os +import time +import ast +import numpy as np +from easydict import EasyDict as edict +import mindspore.common.dtype as mstype +from mindspore import Tensor, context +from mindspore.nn.optim import Adam +from mindspore.train.model import Model +from mindspore.train.loss_scale_manager import DynamicLossScaleManager +from mindspore.train.callback import CheckpointConfig, ModelCheckpoint +from mindspore.train.callback import Callback, TimeMonitor +from mindspore.train.serialization import load_checkpoint, load_param_into_net, export +import mindspore.communication.management as D +from mindspore.communication.management import get_rank +from mindspore.context import ParallelMode +from mindspore.common import set_seed +from src.transformer_model import TransformerModel +from src.transformer_for_train import TransformerTrainOneStepCell, TransformerNetworkWithLoss, \ + TransformerTrainOneStepWithLossScaleCell, \ + TransformerTrainAccumulationAllReducePostWithLossScaleCell +from src.dataset import create_transformer_dataset +from src.lr_schedule import create_dynamic_lr +from src.model_utils.config import config +from src.model_utils.moxing_adapter import moxing_wrapper +from src.model_utils.device_adapter import get_device_id +from eval import load_weights + +set_seed(1) +parser = argparse.ArgumentParser(description='transformer', formatter_class=argparse.ArgumentDefaultsHelpFormatter) +parser.add_argument('--data_url', type=str, default=None, help='Location of Data') +parser.add_argument('--epoch', type=int, default=None, help='epoch') +parser.add_argument('--train_url', type=str, default='', help='Location of training outputs') +parser.add_argument('--enable_modelarts', type=ast.literal_eval, default=True, help='choose modelarts') +args, unknown = parser.parse_known_args() + +def save_ckpt_to_air(save_ckpt_path, path): + config.batch_size = config.batch_size_ev + 
config.hidden_dropout_prob = config.hidden_dropout_prob_ev + config.attention_probs_dropout_prob = config.attention_probs_dropout_prob_ev + + tfm_model = TransformerModel(config=config, is_training=False, use_one_hot_embeddings=False) + + parameter_dict = load_weights(path) + load_param_into_net(tfm_model, parameter_dict) + + source_ids = Tensor(np.ones((config.batch_size, config.seq_length)).astype(np.int32)) + source_mask = Tensor(np.ones((config.batch_size, config.seq_length)).astype(np.int32)) + + export(tfm_model, source_ids, source_mask, file_name=save_ckpt_path+'transformer', file_format="AIR") + +def get_ms_timestamp(): + t = time.time() + return int(round(t * 1000)) + +time_stamp_init = False +time_stamp_first = 0 + +config.dtype = mstype.float32 +config.compute_type = mstype.float16 +config.lr_schedule = edict({ + 'learning_rate': 2.0, + 'warmup_steps': 8000, + 'start_decay_step': 16000, + 'min_lr': 0.0, + }) + +class LossCallBack(Callback): + """ + Monitor the loss in training. + If the loss is NAN or INF terminating training. + Note: + If per_print_times is 0 do not print loss. + Args: + per_print_times (int): Print loss every times. Default: 1. + """ + + def __init__(self, per_print_times=1, rank_id=0): + super(LossCallBack, self).__init__() + if not isinstance(per_print_times, int) or per_print_times < 0: + raise ValueError("print_step must be int and >= 0.") + self._per_print_times = per_print_times + self.rank_id = rank_id + global time_stamp_init, time_stamp_first + if not time_stamp_init: + time_stamp_first = get_ms_timestamp() + time_stamp_init = True + + def step_end(self, run_context): + """Monitor the loss in training.""" + global time_stamp_first + time_stamp_current = get_ms_timestamp() + cb_params = run_context.original_args() + print("time: {}, epoch: {}, step: {}, outputs are {}".format(time_stamp_current - time_stamp_first, + cb_params.cur_epoch_num, + cb_params.cur_step_num, + str(cb_params.net_outputs))) + loss_file = "./loss_{}.log" + if args.enable_modelarts: + loss_file = "/cache/train/loss_{}.log" + + with open(loss_file.format(self.rank_id), "a+") as f: + f.write("time: {}, epoch: {}, step: {}, loss: {}, overflow: {}, loss_scale: {}".format( + time_stamp_current - time_stamp_first, + cb_params.cur_epoch_num, + cb_params.cur_step_num, + str(cb_params.net_outputs[0].asnumpy()), + str(cb_params.net_outputs[1].asnumpy()), + str(cb_params.net_outputs[2].asnumpy()))) + f.write('\n') + + +def modelarts_pre_process(): + config.save_checkpoint_path = config.output_path + args.data_url = os.path.join(args.data_url, 'ende-l128-mindrecord') + +@moxing_wrapper(pre_process=modelarts_pre_process) +def run_transformer_train(): + """ + Transformer training. 
+ """ + if config.device_target == "Ascend": + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target, device_id=get_device_id()) + else: + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + context.set_context(reserve_class_name_in_scope=False) + + # Set mempool block size in PYNATIVE_MODE for improving memory utilization, which will not take effect in GRAPH_MODE + + if config.device_target == "GPU": + # Enable graph kernel + context.set_context(enable_graph_kernel=True, graph_kernel_flags="--enable_parallel_fusion") + if config.distribute == "true": + if config.device_target == "Ascend": + device_num = config.device_num + D.init('hccl') + else: + D.init('nccl') + device_num = D.get_group_size() + rank = get_rank() + config.device_id = rank + context.reset_auto_parallel_context() + context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True, + device_num=device_num) + rank_id = config.device_id % device_num + save_ckpt_path = os.path.join(config.save_checkpoint_path, 'ckpt_' + str(get_rank()) + '/') + else: + device_num = 1 + rank_id = 0 + save_ckpt_path = os.path.join(config.save_checkpoint_path, 'ckpt_0/') + dataset = create_transformer_dataset(rank_size=device_num, + rank_id=rank_id, + do_shuffle=config.do_shuffle, + dataset_path=args.data_url, + bucket_boundaries=config.bucket_boundaries, + device_target=config.device_target) + + netwithloss = TransformerNetworkWithLoss(config, True) + + if config.checkpoint_path: + parameter_dict = load_checkpoint(config.checkpoint_path) + load_param_into_net(netwithloss, parameter_dict) + + hidden_size = config.hidden_size + learning_rate = config.lr_schedule.learning_rate if config.device_target == "Ascend" else 1.0 + lr = Tensor(create_dynamic_lr(schedule="constant*rsqrt_hidden*linear_warmup*rsqrt_decay", + training_steps=dataset.get_dataset_size()*config.epoch_size, + learning_rate=learning_rate, + warmup_steps=config.lr_schedule.warmup_steps, + hidden_size=hidden_size, + start_decay_step=config.lr_schedule.start_decay_step, + min_lr=config.lr_schedule.min_lr), mstype.float32) + if config.device_target == "GPU" and config.transformer_network == "large": + optimizer = Adam(netwithloss.trainable_params(), lr, beta2=config.optimizer_adam_beta2) + else: + optimizer = Adam(netwithloss.trainable_params(), lr) + + callbacks = [TimeMonitor(dataset.get_dataset_size()), LossCallBack(rank_id=rank_id)] + if config.enable_save_ckpt == "true": + if device_num == 1 or (device_num > 1 and rank_id == 0): + if config.device_target == "Ascend": + ckpt_config = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps, + keep_checkpoint_max=config.save_checkpoint_num) + else: + ckpt_config = CheckpointConfig(save_checkpoint_steps=dataset.get_dataset_size(), + keep_checkpoint_max=config.save_checkpoint_num) + ckpoint_cb = ModelCheckpoint(prefix='transformer', directory=save_ckpt_path, config=ckpt_config) + callbacks.append(ckpoint_cb) + + if config.enable_lossscale == "true": + scale_manager = DynamicLossScaleManager(init_loss_scale=config.init_loss_scale_value, + scale_factor=config.scale_factor, + scale_window=config.scale_window) + update_cell = scale_manager.get_update_cell() + if config.accumulation_steps > 1: + netwithgrads = TransformerTrainAccumulationAllReducePostWithLossScaleCell(netwithloss, optimizer, + update_cell, + config.accumulation_steps) + else: + netwithgrads = TransformerTrainOneStepWithLossScaleCell(netwithloss, optimizer=optimizer, + 
scale_update_cell=update_cell) + else: + netwithgrads = TransformerTrainOneStepCell(netwithloss, optimizer=optimizer) + + netwithgrads.set_train(True) + model = Model(netwithgrads) + + print("============== Starting Training ==============") + model.train(args.epoch, dataset, callbacks=callbacks, dataset_sink_mode=False) + print("============== End Training ==============") + path = os.path.join(save_ckpt_path, 'transformer'+'-'+str(args.epoch)+'_'+str(dataset.get_dataset_size())+'.ckpt') + save_ckpt_to_air(save_ckpt_path, path) + + +if __name__ == '__main__': + run_transformer_train()