diff --git a/research/nlp/transformer_xl/README.md b/research/nlp/transformer_xl/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e60df4873c4c01e67d3b2a26d3ad83d8d31ce46
--- /dev/null
+++ b/research/nlp/transformer_xl/README.md
@@ -0,0 +1,298 @@
+# Contents
+
+- [Contents](#contents)
+    - [Transformer_XL Description](#transformer-xl-description)
+    - [Model Architecture](#model-architecture)
+    - [Dataset](#dataset)
+    - [Environment Requirements](#environment-requirements)
+    - [Quick Start](#quick-start)
+    - [Script Description](#script-description)
+        - [Script and Sample Code](#script-and-sample-code)
+        - [Script Parameters](#script-parameters)
+            - [Training Script Parameters](#training-script-parameters)
+            - [Running Options](#running-options)
+            - [Network Parameters](#network-parameters)
+        - [Dataset Preparation](#dataset-preparation)
+        - [Training Process](#training-process)
+        - [Evaluation Process](#evaluation-process)
+    - [Model Description](#model-description)
+        - [Performance](#performance)
+            - [Training Performance](#training-performance)
+            - [Evaluation Performance](#evaluation-performance)
+    - [Description of Random Situation](#description-of-random-situation)
+    - [ModelZoo Homepage](#modelzoo-homepage)
+
+## [Transformer_XL Description](#contents)
+
+Transformer-XL is an improvement on the Transformer that mainly targets long-sequence modeling. It combines the
+advantages of RNN sequence modeling with the Transformer's self-attention mechanism by introducing a recurrence
+mechanism and relative positional encoding: the Transformer attention module is applied to each segment of the input
+data, while the recurrence mechanism learns the dependencies between consecutive segments. With this design,
+Transformer-XL achieved SoTA results on language modeling datasets such as enwik8 and text8.
+
+[Paper](https://arxiv.org/abs/1901.02860): Dai Z, Yang Z, Yang Y, et al. Transformer-xl: Attentive language models
+beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
+
+## [Model Architecture](#contents)
+
+The backbone of Transformer-XL is the Transformer, extended with a recurrence mechanism and relative positional
+encoding.
+
+## [Dataset](#contents)
+
+The following two datasets provide both the training data and the evaluation data. It is recommended to download and preprocess them automatically with `bash getdata.sh`.
+
+- [enwik8](http://mattmahoney.net/dc/enwik8.zip)
+
+The enwik8 dataset is based on Wikipedia and is usually used to measure a model's ability to compress data. It contains 100MB of unprocessed Wikipedia text.
+
+If you download the enwik8 dataset directly from the link, also download and execute [prep_enwik8.py](https://raw.githubusercontent.com/salesforce/awd-lstm-lm/master/data/enwik8/prep_enwik8.py) to preprocess it.
+
+Dataset size:
+
+- Training set: 88,982,818 characters in total
+- Validation set: 4,945,742 characters in total
+- Test set: 36,191 characters in total
+
+Dataset format: TXT text
+
+Dataset directory structure:
+
+```text
+└─data
+  ├─enwik8
+    ├─train.txt          # Training set
+    ├─train.txt.raw      # Training set (unprocessed)
+    ├─valid.txt          # Validation set
+    ├─valid.txt.raw      # Validation set (unprocessed)
+    ├─test.txt           # Test set
+    └─test.txt.raw       # Test set (unprocessed)
+```
+
+- [text8](http://mattmahoney.net/dc/text8.zip)
+
+Text8 also contains 100MB of Wikipedia text. The difference is that, starting from the enwik8 data, all characters other than the 26 letters and the space are removed.
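+
+Both enwik8 and text8 are character-level benchmarks that measure how well a model compresses text, so results in this README are reported in bits per character (BPC) rather than perplexity. The snippet below is a minimal sketch of the conversion implemented in `src/metric/calc.py`: the loss is a mean negative log-likelihood in nats, and dividing it by ln 2 gives BPC (word-level corpora use perplexity instead).
+
+```python
+import math
+
+
+def bpc(loss):
+    """Convert a mean negative log-likelihood in nats to bits per character."""
+    return loss / math.log(2)
+
+
+def ppl(loss):
+    """Perplexity, used for word-level corpora instead of BPC."""
+    return math.exp(loss)
+
+
+# The BPC of 1.07906 reported in the performance tables below corresponds to a
+# loss of about 0.748 nats, which the tables round to 0.75.
+print(bpc(0.748))  # ~1.079
+```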
+
+If you download the text8 dataset directly from the link, execute prep_text8.py to preprocess it.
+
+Dataset size:
+
+- Training set: 89,999,999 characters in total
+- Validation set: 4,999,999 characters in total
+- Test set: 5,000,000 characters in total
+
+Dataset format: TXT text
+
+Dataset directory structure:
+
+```text
+└─data
+  ├─text8
+    ├─train.txt          # Training set
+    ├─train.txt.raw      # Training set (unprocessed)
+    ├─valid.txt          # Validation set
+    ├─valid.txt.raw      # Validation set (unprocessed)
+    ├─test.txt           # Test set
+    └─test.txt.raw       # Test set (unprocessed)
+```
+
+## [Environment Requirements](#contents)
+
+- Hardware (Ascend/GPU)
+    - Prepare the hardware environment with an Ascend or GPU processor.
+- Framework
+    - [MindSpore](https://gitee.com/mindspore/mindspore)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
+
+## [Quick Start](#contents)
+
+- Running on GPU
+
+After dataset preparation, you can start training and evaluation as follows:
+
+```bash
+# Hyperparameters can be fine-tuned in enwik8_base.yaml
+# [DATA_NAME] must be one of [enwik8, text8]
+# [TRAIN_URL] can be a simple name such as "experiments", in which case a training output directory like "/script/train/experiments-enwik8" is created automatically from that name, or a full path such as "/home/mindspore/transformer-xl/enwik8_8p", in which case the trained model is saved in that directory.
+
+# run training example
+bash run_standalone_train_gpu.sh [DEVICE_ID] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH]
+# for example: bash run_standalone_train_gpu.sh 0 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml
+
+# run distributed training example
+bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH]
+# for example: bash run_distribute_train_gpu.sh 4 0,1,2,3 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml
+
+# run evaluation example
+bash run_eval_gpu.sh [DATA_URL] [DATA_NAME] [CKPT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)]
+# for example: bash run_eval_gpu.sh /home/mindspore/transformer-xl/data/enwik8/ enwik8 /home/mindspore/transformer-xl/script/experiments-enwik8/20220416-140816/model7.ckpt /home/mindspore/transformer-xl/yaml/enwik8_base.yaml 0
+```
+
+## [Script Description](#contents)
+
+### [Script and Sample Code](#contents)
+
+```text
+.
+└─Transformer-XL + ├─README.md // descriptions about Transformer-XL + ├─README_CN.md // descriptions about Transformer-XL + ├─scripts + ├─run_distribute_train_gpu.sh // shell script for distributed training on GPU + ├─run_standalone_train_gpu.sh // shell script for training on GPU + └─run_eval_gpu.sh // shell script for testing on GPU + ├─src + ├─callback + ├─eval.py // callback function(eval) + ├─flag.py // callback function(flag) + └─log.py // callback function(log) + ├─loss_fn + └─ProjectedAdaptiveLogSoftmaxLoss.py // loss + ├─metric + └─calc.py // get bpc and ppl + ├─model + ├─attn.py // Attention code + ├─dataset.py // get dataset + ├─embedding.py // PositionalEmbedding and AdaptiveEmbedding + ├─layer.py // layer code + ├─mem_transformer.py // Transformer-XL model + ├─positionwiseFF.py // positionwiseFF + └─vocabulary.py // construct vocabulary + ├─model_utils + ├─config.py // parameter configuration + ├─device_adapter.py // device adapter + ├─local_adapter.py // local adapter + └─moxing_adapter.py // moxing adapter + ├─utils + ├─additional_algorithms.py // General method + ├─dataset_util.py // Interface to get dataset + └─nnUtils.py // Basic method + ├─yaml + ├─enwik8_base.yaml // parameter configuration on gpu + ├─enwik8_large.yaml // parameter configuration on gpu + └─text8_large.yaml // parameter configuration on gpu + ├─getdata.sh // shell script for preprocessing dataset + ├─eval.py // evaluation script + └─train.py // training script +``` + +### [Script Parameters](#contents) + +#### Training Script Parameters + +```text +usage: +train.py +If you need to set the parameters, you can modify the . /enwik8_base.yaml file to implement the parameters. +If you need to change the parameter configuration file, you can change the --config_path parameter of line130 in /src/model_utils/config.py. + +``` + +#### Network Parameters + +```text +Parameters for dataset and network (Training/Evaluation): + n_layer number of total layers: N, default is 12 + d_model dimension of model, default is 512 + n_head number of heads, default is 8 + d_head head dimension, default is 64 + d_inner inner dimension in FF, default is 2048 + dropout global dropout rate: Q, default is 0.1 + dropatt attention probability dropout rate: Q, default is 0.0 + max_step maximum of step: N, default is 400000 + tgt_len number of tokens to predict, default is 512 + mem_len length of the retained previous heads, default is 512 + eval_tgt_len number of tokens to predict for evaluation, default is 128 + batch_size batch size of input dataset: N, default is 22 + +Parameters for learning rate: + lr value of learning rate: Q, default is 0.00025 + warmup_step steps of the learning rate warm up: N, default is 0 +``` + +### [Dataset Preparation](#contents) + +- Download the dataset and configure DATA_PATH + +### [Training Process](#contents) + +- Set options in `enwik8_base.yaml`, including loss_scale, learning rate and network hyperparameters. + +- Run `run_standalone_train_gpu.sh` for training of Transformer-XL model. + + ``` + # run training example + bash run_standalone_train_gpu.sh [DEVICE_ID] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] + # for example: bash run_standalone_train_gpu.sh 0 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + ``` + +- Run `run_distribute_train_gpu.sh` for distributed training of Transformer-XL model. 
+ + ``` + # run distributed training example + bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] + # for example: bash run_distribute_train_gpu.sh 4 0,1,2,3 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + ``` + +### [Evaluation Process](#contents) + +- Set options in `enwik8_base.yaml`. Make sure the 'datadir' are set to your own path. + +- Run `run_eval_gpu.sh` for evaluation of Transformer model. + + ``` + # run evaluation example + bash run_eval_gpu.sh [DATA_URL] [DATA_NAME] [CKPT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)] + # for example: bash run_eval_gpu.sh /home/mindspore/transformer-xl/data/enwik8/ enwik8 /home/mindspore/transformer-xl/script/experiments-enwik8/20220416-140816/model7.ckpt /home/mindspore/transformer-xl/yaml/enwik8_base.yaml 0 + ``` + +## [Model Description](#contents) + +### [Performance](#contents) + +#### Training Performance + +| Parameters | GPU | +| -------------------------- | -------------------------------------- | +| Resource | MindSpore | +| uploaded Date | 22/04/2022 (month/day/year) | +| MindSpore Version | 1.6.1 | +| Dataset | enwik8 | +| Training Parameters | batch_size=22 | +| Optimizer | Adam | +| Loss Function | Softmax Cross Entropy | +| BPC Score | 1.07906 | +| Speed | 421.24ms/step(1p,bsz=8) | +| Loss | 0.75 | +| Checkpoint for inference | 1.45G(.ckpt文件) | +| Scripts | Transformer scripts | + +#### Evaluation Performance + +| Parameters | GPU | +| ------------------- | --------------------------- | +| Resource | MindSpore | +| Uploaded Date | 22/04/2022 (month/day/year) | +| MindSpore Version | 1.6.1 | +| Dataset | enwik8 | +| batch_size | 22 | +| outputs | loss,bpc | +| Loss | 0.75 | +| BPC Score | 1.07906 | + +## [Description of Random Situation](#contents) + +There are three random situations: + +- Shuffle of the dataset. +- Initialization of some model weights. +- Dropout operations. + +Some seeds have already been set in train.py to avoid the randomness of dataset shuffle and weight initialization. If +you want to disable dropout, please set the corresponding dropout_prob parameter to 0 in default_config.yaml. + +## [ModelZoo Homepage](#contents) + +Please check the official [homepage](https://gitee.com/mindspore/models). diff --git a/research/nlp/transformer_xl/README_CN.md b/research/nlp/transformer_xl/README_CN.md new file mode 100644 index 0000000000000000000000000000000000000000..6a4f3869df42c111c32b33d8fffe079866d4cc1b --- /dev/null +++ b/research/nlp/transformer_xl/README_CN.md @@ -0,0 +1,296 @@ +# 目录 + +- [目录](#目录) + - [Transformer-XL 概述](#transformer-xl-概述) + - [模型架构](#模型架构) + - [数据集](#数据集) + - [环境要求](#环境要求) + - [快速入门](#快速入门) + - [脚本说明](#脚本说明) + - [脚本和样例代码](#脚本和样例代码) + - [脚本参数](#脚本参数) + - [训练脚本参数](#训练脚本参数) + - [运行选项](#运行选项) + - [网络参数](#网络参数) + - [准备数据集](#准备数据集) + - [训练过程](#训练过程) + - [评估过程](#评估过程) + - [模型描述](#模型描述) + - [性能](#性能) + - [训练性能](#训练性能) + - [评估性能](#评估性能) + - [随机情况说明](#随机情况说明) + - [ModelZoo主页](#modelzoo主页) + +## Transformer-XL 概述 + +Transformer-XL是对Transformer的改进,主要是解决长序列的问题。同时结合了RNN序列建模和Transformer自注意力机制的优点,引入循环机制(Recurrence +Mechanism)和相对位置编码(Relative Positional +Encoding),在输入数据的每个段上使用Transformer的注意力模块,并使用循环机制来学习连续段之间的依赖关系。并成功在enwik8、text8等语言建模数据集上取得SoTA效果。 + +[论文](https://arxiv.org/abs/1901.02860): Dai Z, Yang Z, Yang Y, et al. Transformer-xl: Attentive language models beyond +a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019. 
+ +## 模型架构 + +Transformer-XL主干结构为Transformer,在原有基础上加入了循环机制(Recurrence Mechanism)和相对位置编码(Relative Positional Encoding) + +## 数据集 + +以下数据集包含训练数据集和评估数据集,数据集推荐使用 `bash getdata.sh` 的方式自动下载并预处理。 + +[enwik8](http://mattmahoney.net/dc/enwik8.zip) + +enwik8数据集基于维基百科,通常用于衡量模型压缩数据的能力。包含了100MB未处理的Wikipedia的文本。 + +如果直接通过链接下载enwik8数据集,请通过下载并执行 [prep_enwik8.py](https://raw.githubusercontent.com/salesforce/awd-lstm-lm/master/data/enwik8/prep_enwik8.py) 的方式对下载的数据集进行预处理。 + +数据集大小 + +- 训练集:共计88,982,818个字符 +- 验证集:共计4,945,742个字符 +- 测试集:共计36,191个字符 + +数据集格式:txt文本 + +数据集目录结构: + +```text +└─data + ├─enwik8 + ├─train.txt # 训练集 + ├─train.txt.raw # 训练集(未处理) + ├─valid.txt # 验证集 + ├─valid.txt.raw # 验证集(未处理) + ├─test.txt # 测试集 + └─test.txt.raw # 测试集(未处理) +``` + +- [text8](http://mattmahoney.net/dc/text8.zip) + +text8同样包含了100MB的Wikipedia文本,区别在于在enwik8数据集的基础上移除了26个字母和空格以外的其他字符。 + +如果直接通过链接下载text8数据集,请通过执行 prep_text8.py 的方式对下载的数据集进行预处理。 + +数据集大小: + +- 训练集:共计89,999,999个字符 +- 验证集:共计4,999,999个字符 +- 测试集:共计5,000,000个字符 + +数据集格式:txt文本 + +数据集目录结构: + +```text +└─data + ├─text8 + ├─train.txt # 训练集 + ├─train.txt.raw # 训练集(未处理) + ├─valid.txt # 验证集 + ├─valid.txt.raw # 验证集(未处理) + ├─test.txt # 测试集 + └─test.txt.raw # 测试集(未处理) +``` + +## 环境要求 + +- 硬件(Ascend处理器) + - 使用Ascend处理器准备硬件环境。 +- 框架 + - [MindSpore](https://gitee.com/mindspore/mindspore) +- 如需查看详情,请参见如下资源: + - [MindSpore教程](https://www.mindspore.cn/tutorials/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html) + +## 快速入门 + +- 在GPU上运行 + +数据集准备完成后,请按照如下步骤开始训练和评估: + +```bash +# 对参数进行微调: enwik8_base.yaml中对超参数进行调整 +# 其中[DATA_NAME]属于缺省参数[enwik8,text8] +# 其中[TRAIN_URL]参数可以设置为一个字符名称,这样会自动按照这个名称在/script/train/下面创建对应的模型训练文件,也可以设置为一个路径,例如 `"/home/mindspore/transformer-xl/enwik8_8p"` 这种方式会将训练的模型单独保存在这个目录下。 + +# 运行非分布式训练示例 +bash run_standalone_train_gpu.sh [DEVICE_ID] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] +# for example: bash run_standalone_train_gpu.sh 0 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + +# 运行分布式训练示例 +bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] +# for example: bash run_distribute_train_gpu.sh 4 0,1,2,3 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + +# 运行评估示例 +bash run_eval_gpu.sh [DATA_URL] [DATA_NAME] [CKPT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)] +# for example: bash run_eval_gpu.sh /home/mindspore/transformer-xl/data/enwik8/ enwik8 /home/mindspore/transformer-xl/script/experiments-enwik8/20220416-140816/model7.ckpt /home/mindspore/transformer-xl/yaml/enwik8_base.yaml 0 +``` + +## 脚本说明 + +### 脚本和样例代码 + +```text +. 
+└─Transformer-XL + ├─README.md // descriptions about Transformer-XL + ├─README_CN.md // descriptions about Transformer-XL + ├─scripts + ├─run_distribute_train_gpu.sh // shell script for distributed training on GPU + ├─run_standalone_train_gpu.sh // shell script for training on GPU + └─run_eval_gpu.sh // shell script for testing on GPU + ├─src + ├─callback + ├─eval.py // callback function(eval) + ├─flag.py // callback function(flag) + └─log.py // callback function(log) + ├─loss_fn + └─ProjectedAdaptiveLogSoftmaxLoss.py // loss + ├─metric + └─calc.py // get bpc and ppl + ├─model + ├─attn.py // Attention code + ├─dataset.py // get dataset + ├─embedding.py // PositionalEmbedding and AdaptiveEmbedding + ├─layer.py // layer code + ├─mem_transformer.py // Transformer-XL model + ├─positionwiseFF.py // positionwiseFF + └─vocabulary.py // construct vocabulary + ├─model_utils + ├─config.py // parameter configuration + ├─device_adapter.py // device adapter + ├─local_adapter.py // local adapter + └─moxing_adapter.py // moxing adapter + ├─utils + ├─additional_algorithms.py // General method + ├─dataset_util.py // Interface to get dataset + └─nnUtils.py // Basic method + ├─yaml + ├─enwik8_base.yaml // parameter configuration on gpu + ├─enwik8_large.yaml // parameter configuration on gpu + └─text8_large.yaml // parameter configuration on gpu + ├─getdata.sh // shell script for preprocessing dataset + ├─eval.py // evaluation script + └─train.py // training script +``` + +### 脚本参数 + +#### 训练脚本参数 + +```text +用法: +train.py +如果需要对参数进行设置,可以修改./enwik8_base.yaml文件中的参数实现。 +如果需要更改参数配置文件,可以更改/src/model_utils/config.py中line130的--config_path参数。 +``` + +#### 网络参数 + +```text +数据集和网络参数(训练/微调/评估): + n_layer 网络层数: N, 默认值为 12 + d_model 模型维度, 默认值为 512 + n_head 总的注意力头数, 默认值为 8 + d_head 注意力头的维度, 默认值为 64 + d_inner 前馈网络的维度, 默认值为 2048 + dropout 输出层的随机失活概率: Q, 默认值是 0.1 + dropatt 注意力层的随机失活概率: Q, default is 0.0 + max_step 迭代次数: N, 默认值为 400000 + tgt_len 标签特征维度大小, 默认值为 512 + mem_len 记忆特征维度大小, 默认值为 512 + eval_tgt_len 迭代任务中标签特征维度大小, 默认值为 128 + batch_size 输入数据集的批次大小: N, 默认值是 22 + +学习率参数: + lr 学习率: Q, 默认值为 0.00025 + warmup_step 热身学习率步数: N, 默认值为 0 +``` + +### 准备数据集 + +- 运行 `bash getdata.sh` , 脚本会创建 `./data` 目录并将数据集自动下载到该目录下 + +- 下载数据集并配置好DATA_PATH + +### 训练过程 + +- 通过直接用sh输入参数的方式输入路径,或在`enwik8_base.yaml`中设置选项,确保 'datadir' 路径为数据集路径。设置其他参数包括loss_scale、学习率和网络超参数。 + +- 运行`run_standalone_train_gpu.sh`,进行Transformer-XL模型的非分布式训练。 + + ``` + # 运行非分布式训练示例 + bash run_standalone_train_gpu.sh [DEVICE_ID] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] + # for example: bash run_standalone_train_gpu.sh 0 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + ``` + +- 运行`run_distribute_train_gpu.sh`,进行Transformer-XL模型的分布式训练。 + + ``` + # 运行分布式训练示例 + bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH] + # for example: bash run_distribute_train_gpu.sh 4 0,1,2,3 /home/mindspore/transformer-xl/data/enwik8/ enwik8 experiments /home/mindspore/transformer-xl/yaml/enwik8_base.yaml + ``` + +### 评估过程 + +- 通过直接用sh输入参数的方式输入路径,或在`enwik8_base.yaml`中设置选项,设置 'load_path' 文件路径。 + +- 运行`run_eval_gpu.sh`,评估Transformer-XL模型。 + + ``` + # 运行评估示例 + bash run_eval_gpu.sh [DATA_URL] [DATA_NAME] [CKPT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)] + # for example: bash run_eval_gpu.sh /home/mindspore/transformer-xl/data/enwik8/ enwik8 /home/mindspore/transformer-xl/script/experiments-enwik8/20220416-140816/model7.ckpt 
/home/mindspore/transformer-xl/yaml/enwik8_base.yaml 0 + ``` + +## 模型描述 + +### 性能 + +#### 训练性能 + +| 参数 | GPU | +| ------------- | ------------------------------ | +| 资源 | MindSpore | +| 上传日期 | 2022-04-22 | +| MindSpore版本 | 1.6.1 | +| 数据集 | enwik8 | +| 训练参数 | max_step=400000, batch_size=22 | +| 优化器 | Adam | +| 损失函数 | Softmax Cross Entropy | +| BPC分数 | 1.07906 | +| 速度 | 421.24ms/step(1p,bsz=8) | +| 损失 | 0.75 | +| 推理检查点 | 1.45G(.ckpt文件) | +| 脚本 | Transformer-XL script | + +#### 评估性能 + +| 参数 | GPU | +| ------------- | --------------------------- | +|资源 | MindSpore | +| 上传日期 | 2022-04-22 | +| MindSpore版本 | 1.6.1 | +| 数据集 | enwik8 | +| batch_size | 22 | +| 输出 | 损失loss,BPC分数 | +| 损失loss | 0.75 | +| BPC分数 | 1.07906 | + +## 随机情况说明 + +以下三种随机情况: + +- 轮换数据集 +- 初始化部分模型权重 +- 随机失活运行 + +train.py已经设置了一些种子,避免数据集轮换和权重初始化的随机性。若需关闭随机失活,将default_config.yaml中相应的dropout_prob参数设置为0。 + +## ModelZoo主页 + +请浏览官网[主页](https://gitee.com/mindspore/models)。 + diff --git a/research/nlp/transformer_xl/eval.py b/research/nlp/transformer_xl/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..b8e1bc7adaa3983841122e84b622c68ed95ed492 --- /dev/null +++ b/research/nlp/transformer_xl/eval.py @@ -0,0 +1,77 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import argparse +from mindspore import load_checkpoint, context +from mindspore.dataset import GeneratorDataset +from src.callback.eval import doEval +from src.metric.calc import bpc +from src.model.mem_transformer import MemTransformerLM +from src.model_utils.config import config +from src.utils.dataset_util import get_dataset + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Transformer-XL evaluation running') + parser.add_argument('--datadir', default='./data/enwik8', + help='Directory contains enwik8 dataset.') + parser.add_argument('--dataset', default='enwik8', + help='Dataset Name.', choices=["enwik8", "text8"]) + parser.add_argument('--ckpt_path', default="./model0.ckpt", help='Directory of model.') + parser.add_argument("--device", type=str, default="GPU", help="Device Target, default GPU", + choices=["Ascend", "GPU"]) + parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.") + + args = parser.parse_args() + datadir = args.datadir + dataset = args.dataset + device_id = args.device_id + + dataset = get_dataset(datadir, dataset) + ntokens = len(dataset.vocab) + + context.set_context(device_id=device_id) + context.set_context(mode=context.GRAPH_MODE, device_target="GPU", max_device_memory="39.0GB", + enable_graph_kernel=True) + + # Due to the mems mechanism, it is not possible to perform multi-card segmentation on the valid and test datasets + valid_dataset = GeneratorDataset(source=dataset.get_valid_generator(), column_names=['data', 'target'], + shuffle=False) + test_dataset = GeneratorDataset(source=dataset.get_test_generator(), column_names=['data', 'target'], + shuffle=False) + + # adaptive softmax / embedding + cutoffs = [] + net = MemTransformerLM(ntokens, config.n_layer, config.n_head, config.d_model, + config.d_head, config.d_inner, config.dropout, config.dropatt, batch_size=config.batch_size, + d_embed=config.d_embed, div_val=config.div_val, + pre_lnorm=config.pre_lnorm, tgt_len=config.tgt_len, + ext_len=config.ext_len, mem_len=config.mem_len, eval_tgt_len=config.eval_tgt_len, + cutoffs=cutoffs, same_length=config.same_length, clamp_len=config.clamp_len) + + # model_filename = os.path.join(config.load_path, args.ckpt_filename + '.ckpt') + model_filename = args.ckpt_path + print(model_filename) + load_checkpoint(net=net, ckpt_file_name=model_filename) + + valid_loss = doEval(net, valid_dataset, config.tgt_len, config.ext_len, config.mem_len, config.eval_tgt_len) + test_loss = doEval(net, test_dataset, config.tgt_len, config.ext_len, config.mem_len, config.eval_tgt_len) + + print('=' * 100) + if config.dataset in ['enwik8', 'text8']: + print('| End of valid | valid loss {:5.2f} | valid bpc {:9.5f}'.format( + valid_loss, bpc(valid_loss))) + print('| End of test | test loss {:5.2f} | test bpc {:9.5f}'.format( + test_loss, bpc(test_loss))) + print('=' * 100) diff --git a/research/nlp/transformer_xl/getdata.sh b/research/nlp/transformer_xl/getdata.sh new file mode 100644 index 0000000000000000000000000000000000000000..3aa90345c108048a22966b5a2782a6dda971aa38 --- /dev/null +++ b/research/nlp/transformer_xl/getdata.sh @@ -0,0 +1,43 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "=== Acquiring datasets ===" +echo "---" + +mkdir -p data +cd data + +echo "- Downloading enwik8 (Character)" +if [[ ! -d 'enwik8' ]]; then + mkdir -p enwik8 + cd enwik8 + wget --continue http://mattmahoney.net/dc/enwik8.zip --no-check-certificate + wget https://raw.githubusercontent.com/salesforce/awd-lstm-lm/master/data/enwik8/prep_enwik8.py --no-check-certificate + python3 prep_enwik8.py + cd .. +fi + +echo "- Downloading text8 (Character)" +if [[ ! -d 'text8' ]]; then + mkdir -p text8 + cd text8 + wget --continue http://mattmahoney.net/dc/text8.zip --no-check-certificate + python ../../prep_text8.py + cd .. +fi + +echo "---" +echo "Happy language modeling :)" diff --git a/research/nlp/transformer_xl/prep_text8.py b/research/nlp/transformer_xl/prep_text8.py new file mode 100644 index 0000000000000000000000000000000000000000..fa586ba0545d08188e4c22427100a06d3a39aed4 --- /dev/null +++ b/research/nlp/transformer_xl/prep_text8.py @@ -0,0 +1,46 @@ +#!/usr/bin/env python +# coding=utf-8 +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +import os +import sys +import zipfile + +from io import open + +if os.path.exists('train.txt'): + print('Tokenized text8 already exists - skipping processing') + sys.exit() + +zipfile.ZipFile('text8.zip').extractall() +data = open('text8', 'r', encoding='utf-8').read() + +print('Length of text8: {}'.format(len(data))) + +# Segment the text8 dataset according to the specification +num_test_chars = 5000000 + +train_data = data[: -2 * num_test_chars] +valid_data = data[-2 * num_test_chars: -num_test_chars] +test_data = data[-num_test_chars:] + +for fn, part in [('train.txt', train_data), ('valid.txt', valid_data), ('test.txt', test_data)]: + print('{} will have {} bytes'.format(fn, len(part))) + print('- Tokenizing...') + # Change space ' ' to underscore '_' + part_str = ' '.join(['_' if c == ' ' else c for c in part.strip()]) + print('- Writing...') + f = open(fn, 'w').write(part_str) + f = open(fn + '.raw', 'w', encoding='utf-8').write(part) diff --git a/research/nlp/transformer_xl/requirements.txt b/research/nlp/transformer_xl/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..ae46c79cafd7b9d321a9dc2c8eaf53608e53b0cd --- /dev/null +++ b/research/nlp/transformer_xl/requirements.txt @@ -0,0 +1,3 @@ +numpy +easydict +pyyaml diff --git a/research/nlp/transformer_xl/script/run_distribute_train_gpu.sh b/research/nlp/transformer_xl/script/run_distribute_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..83de2471622cc73c8c28001bdbc88e33f95fd396 --- /dev/null +++ b/research/nlp/transformer_xl/script/run_distribute_train_gpu.sh @@ -0,0 +1,63 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +if [ $# != 6 ]; then + echo "Usage: bash run_distributed_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] + [DATA_DIR] [DATA_NAME] [TRAIN_URL] [CONFIG_PATH]" +exit 1 +fi + +if [ $1 -lt 1 ] || [ $1 -gt 8 ]; then + echo "error: DEVICE_NUM=$1 is not in (1-8)" + exit 1 +fi + +DATA_DIR=$3 +DATA_NAME=$4 +TRAIN_URL=$5 +CONFIG_PATH=$6 + +echo "DATA_DIR="$DATA_DIR +echo "DATA_NAME="$DATA_NAME +echo "TRAIN_URL="$TRAIN_URL +echo "CONFIG_PATH="$CONFIG_PATH + +export CONFIG_PATH=${CONFIG_PATH} +export DEVICE_NUM=$1 +export RANK_SIZE=$1 + +BASEPATH=$( + cd "$(dirname $0)" || exit + pwd +) + +export PYTHONPATH=${BASEPATH}:$PYTHONPATH +if [ -d "./train" ]; then + rm -rf ./train +fi +mkdir ./train +cd ./train || exit + +export CUDA_VISIBLE_DEVICES="$2" + +echo "Start Training :)" + +if [ $1 -gt 1 ]; then + mpirun -np $1 --allow-run-as-root --output-filename log_output --merge-stderr-to-stdout \ + python ${BASEPATH}/../train.py --device="GPU" --datadir=$DATA_DIR --dataset=$DATA_NAME --train_url=$TRAIN_URL >train_gpu.log 2>&1 & +else + python ${BASEPATH}/../train.py --device="GPU" --datadir=$DATA_DIR --dataset=$DATA_NAME --train_url=$TRAIN_URL >train_gpu.log 2>&1 & +fi diff --git a/research/nlp/transformer_xl/script/run_eval_gpu.sh b/research/nlp/transformer_xl/script/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..7d25944f446dea3aa2b9b845d3010cc410d4c58f --- /dev/null +++ b/research/nlp/transformer_xl/script/run_eval_gpu.sh @@ -0,0 +1,69 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# -lt 4 ] || [ $# -gt 5 ] +then + echo "Usage: bash run_eval_gpu.sh [DATA_DIR] [DATA_NAME] [CKPT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)]" +exit 1 +fi + +export DEVICE_ID=0 + +if [ $# = 5 ] ; then + export DEVICE_ID=$5 +fi; + + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} + +DATA_DIR=$(get_real_path $1) + +if [ ! 
-d $DATA_DIR ] +then + echo "error: DATA_DIR=$DATA_DIR is not a directory" +exit 1 +fi + +DATA_NAME=$2 +CKPT_PATH=$3 +CONFIG_PATH=$4 + +echo "DATA_DIR="$DATA_DIR +echo "DATA_NAME="$DATA_NAME +echo "CKPT_PATH="$CKPT_PATH +echo "CONFIG_PATH="$CONFIG_PATH + +export CONFIG_PATH=${CONFIG_PATH} +export DEVICE_NUM=1 +export RANK_SIZE=$DEVICE_NUM +export RANK_ID=0 +if [ -d "eval" ]; +then + rm -rf ./eval +fi +mkdir ./eval + +env > env.log + +echo "Start evaluation for device $DEVICE_ID :)" + +python ../eval.py --device_id=$DEVICE_ID --datadir=$DATA_DIR --dataset=$DATA_NAME --ckpt_path=$CKPT_PATH --device="GPU" &> eval.log & diff --git a/research/nlp/transformer_xl/script/run_standalone_train_gpu.sh b/research/nlp/transformer_xl/script/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..c0021e54bea1fcf8e55c5833d81bea9087d65e2b --- /dev/null +++ b/research/nlp/transformer_xl/script/run_standalone_train_gpu.sh @@ -0,0 +1,44 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# -lt 4 ]; then + echo "Usage: bash run_standalone_train_gpu.sh [DEVICE_ID] [DATA_DIR] [DATA_NAME] + [TRAIN_URL] [CONFIG_PATH]" +exit 1 +fi + +DEVICE_ID=$1 +DATA_DIR=$2 +DATA_NAME=$3 +TRAIN_URL=$4 +CONFIG_PATH=$5 + +echo "DATA_DIR="$DATA_DIR +echo "DATA_NAME="$DATA_NAME +echo "TRAIN_URL="$TRAIN_URL +echo "CONFIG_PATH="$CONFIG_PATH + +export CONFIG_PATH=${CONFIG_PATH} + +if [ -d "./train_stand" ]; then + rm -rf ./train_stand +fi +mkdir ./train_stand +cd ./train_stand || exit + +echo "Start training for device $DEVICE_ID :)" + +CUDA_VISIBLE_DEVICES=$DEVICE_ID python ../../train.py --device="GPU" --datadir=$DATA_DIR --dataset=$DATA_NAME --train_url=$TRAIN_URL > train_stand_gpu.log 2>&1 & diff --git a/research/nlp/transformer_xl/src/__init__.py b/research/nlp/transformer_xl/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..602527cd720c8d268599dbaef190ba1cf1eb6f2b --- /dev/null +++ b/research/nlp/transformer_xl/src/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ diff --git a/research/nlp/transformer_xl/src/callback/eval.py b/research/nlp/transformer_xl/src/callback/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..f4408ce34b689901abfb382284bbc3bb1ac36f6c --- /dev/null +++ b/research/nlp/transformer_xl/src/callback/eval.py @@ -0,0 +1,86 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import time +import os +import numpy as np +from mindspore.train.callback import Callback +from mindspore import save_checkpoint +from src.model_utils.device_adapter import get_device_id +from src.model_utils.config import config +from src.metric.calc import bpc, ppl + + +def doEval(net, dataset, tgt_len, ext_len, mem_len, eval_tgt_len): + """Separate eval for valid and test""" + net.set_train(tgt_len, ext_len, mem_len, eval_tgt_len, False) + total_len, total_loss = 0, 0. + idx = 0 + for data, target in dataset.create_tuple_iterator(): + loss = net(data, target, idx) + idx = 1 + seq_len = target.shape[0] + total_loss += seq_len * loss + total_len += seq_len + if net.is_first_iteration: + net.add_flags_recursive(is_first_iteration=False) + + test_loss = total_loss / total_len + test_loss = np.mean(test_loss.asnumpy()) + net.set_train(tgt_len, ext_len, mem_len, eval_tgt_len, True) + return test_loss + + +class EvalDuringTrain(Callback): + def __init__(self, dataset, per_print_times, tgt_len, ext_len, mem_len, + eval_tgt_len): + super(EvalDuringTrain, self).__init__() + self.dataset = dataset + self._per_print_times = per_print_times + self.best_val_loss = None + self.tgt_len = tgt_len + self.ext_len = ext_len + self.mem_len = mem_len + self.eval_tgt_len = eval_tgt_len + + def step_end(self, run_context): + """Called after each step finished.""" + device_id = get_device_id() + cb_params = run_context.original_args() + train_step = cb_params.cur_epoch_num + if self._per_print_times != 0 and train_step % self._per_print_times == 0: + eval_start_time = time.time() + net = cb_params.network + + valid_loss = doEval(net, self.dataset, tgt_len=self.tgt_len, ext_len=self.ext_len, mem_len=self.mem_len, + eval_tgt_len=self.eval_tgt_len) + + print('-' * 100) + log_str = '| Eval {:3d} at step {:>8d} | time: {:5.2f}s ' \ + '| valid loss {:5.2f}'.format(train_step // self._per_print_times, train_step, + (time.time() - eval_start_time), valid_loss) + if config.dataset in ['enwik8', 'text8']: + log_str += ' | valid bpc {:9.5f}'.format(bpc(valid_loss)) + else: + log_str += ' | valid ppl {:9.3f}'.format(ppl(valid_loss)) + print(log_str) + print('-' * 100) + + if not self.best_val_loss or valid_loss < self.best_val_loss: + model_filename = os.path.join(config.train_url, 'model' + str(device_id) + '.ckpt') + optimizer_filename = os.path.join(config.train_url, 'optimizer' + str(device_id) + '.ckpt') + save_checkpoint(net, model_filename) + 
save_checkpoint(cb_params.optimizer, optimizer_filename) + self.best_val_loss = valid_loss diff --git a/research/nlp/transformer_xl/src/callback/flag.py b/research/nlp/transformer_xl/src/callback/flag.py new file mode 100644 index 0000000000000000000000000000000000000000..cffae73a469bb8c2a65db59da6116b72d4fd4371 --- /dev/null +++ b/research/nlp/transformer_xl/src/callback/flag.py @@ -0,0 +1,26 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +from mindspore.train.callback import Callback + + +class FlagModifiedCallback(Callback): + + def step_end(self, run_context): + """Called after each step finished.""" + cb_params = run_context.original_args() + net = cb_params.network + if net.is_first_iteration: + net.add_flags_recursive(is_first_iteration=False) diff --git a/research/nlp/transformer_xl/src/callback/log.py b/research/nlp/transformer_xl/src/callback/log.py new file mode 100644 index 0000000000000000000000000000000000000000..eff31b2e560ddd9fc208aaa9733089e7ebe219b7 --- /dev/null +++ b/research/nlp/transformer_xl/src/callback/log.py @@ -0,0 +1,68 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import math +import time +import numpy as np + +import mindspore as ms +from mindspore import Tensor +from mindspore.train.callback import LossMonitor + +from src.metric.calc import bpc, ppl +from src.model_utils.config import config + + +class TrainLogger(LossMonitor): + def __init__(self, per_print_times, n_batch): + super(TrainLogger, self).__init__(per_print_times) + self.log_start_time = 0 + self.n_batch = n_batch + self.train_loss = 0.0 + self.log_start_time = time.time() + + def step_end(self, run_context): + """Called after each step finished.""" + cb_params = run_context.original_args() + train_step = cb_params.cur_epoch_num + + loss = cb_params.net_outputs + + if isinstance(loss, (tuple, list)): + if isinstance(loss[0], Tensor) and isinstance(loss[0].asnumpy(), np.ndarray): + loss = loss[0] + + if isinstance(loss, Tensor) and isinstance(loss.asnumpy(), np.ndarray): + loss = np.mean(loss.asnumpy()) + + self.train_loss += loss + if self._per_print_times != 0 and train_step % self._per_print_times == 0: + epoch = math.ceil(train_step / self.n_batch) + cur_loss = self.train_loss / self._per_print_times + elapsed = time.time() - self.log_start_time + batch = train_step % (self.n_batch + 1) + (0 if epoch == 1 else 1) + optimizer = cb_params.optimizer + train_step_t = Tensor(train_step, ms.int32) + lr = optimizer.learning_rate(train_step_t).asnumpy() + log_str = '| epoch {:3d} step {:>8d} | {:>6d} batches | lr {:.3g} ' \ + '| ms/step {:5.2f} | loss {:5.2f}'.format(epoch, train_step, batch, lr, + elapsed * 1000 / self._per_print_times, cur_loss) + if config.dataset in ['enwik8', 'text8']: + log_str += ' | bpc {:9.5f}'.format(bpc(cur_loss)) + else: + log_str += ' | ppl {:9.3f}'.format(ppl(cur_loss)) + print(log_str) + self.train_loss = 0.0 + self.log_start_time = time.time() diff --git a/research/nlp/transformer_xl/src/loss_fn/ProjectedAdaptiveLogSoftmaxLoss.py b/research/nlp/transformer_xl/src/loss_fn/ProjectedAdaptiveLogSoftmaxLoss.py new file mode 100644 index 0000000000000000000000000000000000000000..7c37aeac10770e57f123eccb4f17e2c51987e2b1 --- /dev/null +++ b/research/nlp/transformer_xl/src/loss_fn/ProjectedAdaptiveLogSoftmaxLoss.py @@ -0,0 +1,95 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import mindspore as ms +import mindspore.nn as nn +import mindspore.ops as P +from mindspore.nn import LossBase +from mindspore.ops import Zeros +from mindspore.ops import ExpandDims, Concat, Squeeze +from src.utils.additional_algorithms import linear + + +class ProjectedAdaptiveLogSoftmaxLoss(LossBase): + def __init__(self, n_token, d_embed, d_proj, cutoffs, div_val=1, tie_projs=None, + keep_order=False): + super(ProjectedAdaptiveLogSoftmaxLoss, self).__init__() + self.squeeze_1 = Squeeze(1) + self.gather = P.GatherD() + self.zeros = Zeros() + self.expandDims = ExpandDims() + self.concat_0 = Concat(0) + self.log_softmax_n_1 = nn.LogSoftmax() + self.log_softmax_1 = nn.LogSoftmax(1) + if tie_projs is None: + tie_projs = [False] + self.n_token = n_token + self.d_embed = d_embed + self.d_proj = d_proj + + self.cutoffs = cutoffs + [n_token] + self.cutoff_ends = [0] + self.cutoffs + self.div_val = div_val + + self.shortlist_size = self.cutoffs[0] + self.n_clusters = len(self.cutoffs) - 1 + self.head_size = self.shortlist_size + self.n_clusters + + if self.n_clusters > 0: + self.cluster_weight = ms.Parameter(self.zeros((self.n_clusters, self.d_embed), ms.float32)) + self.cluster_bias = ms.Parameter(self.zeros(self.n_clusters, ms.float32)) + + self.out_layers = nn.CellList() + parameters = [] + + if div_val == 1: + for i in range(len(self.cutoffs)): + if d_proj != d_embed: + parameters.append( + ms.Parameter(self.zeros((d_proj, d_embed), ms.float32)) + ) + + self.out_layers.append(nn.Dense(d_embed, n_token)) + else: + for i in range(len(self.cutoffs)): + l_idx, r_idx = self.cutoff_ends[i], self.cutoff_ends[i + 1] + d_emb_i = d_embed // (div_val ** i) + + parameters.append( + ms.Parameter(self.zeros((d_proj, d_emb_i), ms.float32)) + ) + + self.out_layers.append(nn.Dense(d_emb_i, r_idx - l_idx)) + + self.out_projs = ms.ParameterTuple(parameters) + self.keep_order = keep_order + + def _compute_logit(self, hidden, weight, bias, proj=None): + if proj is None: + logit = linear(hidden, weight, bias) + else: + proj_hid = linear(hidden, proj.T) + logit = linear(proj_hid, weight, bias) + return logit + + def construct(self, hidden, target): + """ + hidden :: [len*bsz x d_proj] + target :: [len*bsz] + """ + + logit = self.out_layers[0](hidden) + nll = self.squeeze_1(self.gather(-self.log_softmax_n_1(logit), 1, self.expandDims(target, 1))) + return self.get_loss(nll) diff --git a/research/nlp/transformer_xl/src/metric/calc.py b/research/nlp/transformer_xl/src/metric/calc.py new file mode 100644 index 0000000000000000000000000000000000000000..3d5b8bd2d5f20bf6b1c69a511f58ce6e764921fc --- /dev/null +++ b/research/nlp/transformer_xl/src/metric/calc.py @@ -0,0 +1,59 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import math +from mindspore.nn import Metric + + +def bpc(loss): + return loss / math.log(2) + + +def ppl(loss): + return math.exp(loss) + + +class BPC(Metric): + def __init__(self): + super(BPC, self).__init__() + + self.loss = 0.0 + self.log_2 = math.log(2) + + def clear(self): + """Clears the internal evaluation result.""" + self.loss = 0.0 + + def update(self, loss): + self.loss = loss + + def eval(self): + return self.loss / self.log_2 + + +class PPL(Metric): + def __init__(self): + super(PPL, self).__init__() + self.loss = 0.0 + + def clear(self): + """Clears the internal evaluation result.""" + self.loss = 0.0 + + def update(self, loss): + self.loss = loss + + def eval(self): + return math.exp(self.loss) diff --git a/research/nlp/transformer_xl/src/model/attn.py b/research/nlp/transformer_xl/src/model/attn.py new file mode 100644 index 0000000000000000000000000000000000000000..58ecc7f145c5a77488b1fb142d2bee65a7c706eb --- /dev/null +++ b/research/nlp/transformer_xl/src/model/attn.py @@ -0,0 +1,165 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +# MindSpore r1.7 +# import os +# os.environ["PATH"] = os.environ["PATH"] + ":/usr/local/cuda/bin/" +# from mindspore.ops import Einsum +import mindspore as ms +import mindspore.nn as nn +from mindspore.nn import Tril, Triu +from mindspore.ops import Zeros, Ones +from mindspore.ops import ExpandDims, Concat, Split +from mindspore.ops import Transpose, BatchMatMul, Tile +from mindspore.ops import Softmax, Mul +from src.utils.additional_algorithms import MaskerFill + +class RelMultiHeadAttn(nn.Cell): + def __init__(self, n_head, d_model, d_head, dropout, dropatt=0.0, pre_lnorm=False): + super(RelMultiHeadAttn, self).__init__() + + self.zeros, self.ones = Zeros(), Ones() + self.expandDims, self.concat_0, self.concat_1 = ExpandDims(), Concat(0), Concat(1) + self.split_n_1_2, self.split_n_1_3 = Split(-1, 2), Split(-1, 3) + self.tril, self.triu = Tril(), Triu() + self.transpose = Transpose() + self.batchMatMul = BatchMatMul() + self.tile = Tile() + self.maskerFill = MaskerFill() + self.softmax_1 = Softmax(1) + self.mul = Mul() + self.n_head = n_head + self.d_model = d_model + self.d_head = d_head + self.dropout = dropout + + self.qkv_net = nn.Dense(d_model, 3 * n_head * d_head, has_bias=False) + self.drop = nn.Dropout(1 - dropout, dtype=ms.float32) + self.dropatt = nn.Dropout(1 - dropout, dtype=ms.float32) + self.o_net = nn.Dense(n_head * d_head, d_model, has_bias=False) + self.layer_norm = nn.LayerNorm([d_model]) + self.scale = 1 / (d_head ** 0.5) + self.pre_lnorm = pre_lnorm + self.negative_inf = -1e9 + + def _rel_shift(self, x, zero_triu=False): + zero_pad = self.zeros((x.shape[0], 1, x.shape[2], x.shape[3]), x.dtype) + x_padded = self.concat_1((zero_pad, x)) + x_padded = x_padded.view(x.shape[1] + 1, x.shape[0], x.shape[2], x.shape[3]) + 
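+        # The zero-pad and view above, together with the slice-and-reshape below, implement the
+        # standard relative-shift trick: prepending a column of zeros along the key axis and
+        # reinterpreting the memory layout shifts row i of the scores by i positions, so that
+        # entry (i, j) ends up holding the score for the relative offset between query i and
+        # key j without materializing an explicit qlen x klen relative index matrix.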
+ x = x_padded[1:].reshape(x.shape) + + if zero_triu: + _ones = self.ones((x.shape[0], x.shape[1])) + x = x * self.tril(_ones, x.shape[1] - x.shape[0])[:, :, None, None] + + return x + + def construct(self, w, r, r_w_bias, r_r_bias, mems=None, attn_mask=None): + raise NotImplementedError + + +class RelPartialLearnableMultiHeadAttn(RelMultiHeadAttn): + def __init__(self, *args, **kwargs): + super(RelPartialLearnableMultiHeadAttn, self).__init__(*args, **kwargs) + + self.r_net = nn.Dense(self.d_model, self.n_head * self.d_head, has_bias=False) + + def construct(self, w, r, r_w_bias, r_r_bias, mems=None, attn_mask=None): + qlen, rlen, bsz = w.shape[0], r.shape[0], w.shape[1] + + if not self.is_first_iteration: + cat = self.concat_0([mems, w]) + if self.pre_lnorm: + w_heads = self.qkv_net(self.layer_norm(cat)) + else: + w_heads = self.qkv_net(cat) + r_head_k = self.r_net(r) + + w_head_q, w_head_k, w_head_v = self.split_n_1_3(w_heads) + w_head_q = w_head_q[-qlen:] + else: + if self.pre_lnorm: + w_heads = self.qkv_net(self.layer_norm(w)) + else: + w_heads = self.qkv_net(w) + r_head_k = self.r_net(r) + w_head_q, w_head_k, w_head_v = self.split_n_1_3(w_heads) + + klen = w_head_k.shape[0] + + w_head_q = w_head_q.view(qlen, bsz, self.n_head, self.d_head) # qlen x bsz x n_head x d_head + w_head_k = w_head_k.view(klen, bsz, self.n_head, self.d_head) # qlen x bsz x n_head x d_head + w_head_v = w_head_v.view(klen, bsz, self.n_head, self.d_head) # qlen x bsz x n_head x d_head + + r_head_k = r_head_k.view(rlen, self.n_head, self.d_head) # qlen x n_head x d_head + + # compute attention score + rw_head_q = w_head_q + r_w_bias # qlen x bsz x n_head x d_head + rr_head_q = w_head_q + r_r_bias + + # qlen x klen x bsz x n_head + + AC = self.transpose( + self.batchMatMul(self.transpose(rw_head_q, (1, 2, 0, 3)), self.transpose(w_head_k, (1, 2, 3, 0))), + (2, 3, 0, 1)) + + rr_head_q_t = self.transpose(rr_head_q, (1, 2, 0, 3)) + r_head_k_t = self.transpose(r_head_k, (1, 2, 0)) + BD = self.transpose( + self.batchMatMul(rr_head_q_t, self.tile(self.expandDims(r_head_k_t, 0), (rr_head_q_t.shape[0], 1, 1, 1))), + (2, 3, 0, 1)) + + BD = self._rel_shift(BD) + + # [qlen x klen x bsz x n_head] + attn_score = AC + BD + + attn_score *= self.scale + + # compute attention probability + if attn_mask is not None: + if attn_mask.ndim == 2: + attn_mask_ = self.tile(self.expandDims(self.expandDims(attn_mask, -1), 0), + (1, 1, attn_score.shape[2], attn_score.shape[3])) + attn_score = self.maskerFill(attn_score, attn_mask_, self.negative_inf) + elif attn_mask.ndim == 3: + attn_mask_ = self.tile(self.expandDims(attn_mask, -1), (1, 1, attn_score.shape[2], attn_score.shape[3])) + attn_score = self.maskerFill(attn_score, attn_mask_, self.negative_inf) + + # [qlen x klen x bsz x n_head] + attn_prob = self.softmax_1(attn_score) + attn_prob = self.dropatt(attn_prob) + # compute attention vector + attn_vec = self.transpose( + self.batchMatMul(self.transpose(attn_prob, (2, 3, 0, 1)), self.transpose(w_head_v, (1, 2, 0, 3))), + (2, 0, 1, 3)) + + # [qlen x bsz x n_head x d_head] + attn_vec = attn_vec.reshape( + attn_vec.shape[0], attn_vec.shape[1], self.n_head * self.d_head) + + # linear projection + attn_out = self.o_net(attn_vec) + attn_out = self.drop(attn_out) + + if self.pre_lnorm: + # residual connection + output = w + attn_out + else: + # residual connection + layer normalization + output = self.layer_norm(w + attn_out) + + return output diff --git a/research/nlp/transformer_xl/src/model/dataset.py 
b/research/nlp/transformer_xl/src/model/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..2e314f6339c9a2dcbe68c3d1214dd2c86c646c5e --- /dev/null +++ b/research/nlp/transformer_xl/src/model/dataset.py @@ -0,0 +1,157 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import numpy as np +from src.model.vocabulary import Vocab + + +class Generator: + # LM1B dataset + def __init__(self, _data, batch_size, tgt_len, ext_len=None): + super(Generator, self).__init__() + self.bsz = batch_size + self.bptt = tgt_len + self.ext_len = ext_len if ext_len is not None else 0 + + # Work out how cleanly we can divide the dataset into bsz parts. + self.n_step = _data.size // self.bsz + + # Trim off any extra elements that wouldn't cleanly fit (remainders). + _data = _data[:self.n_step * self.bsz] + + # Evenly divide the data across the bsz batches. + self._data = _data.reshape(self.bsz, -1).T + self._data = self._data.astype(np.int32) + + # Number of mini-batches + self.n_batch = self.n_step // self.bptt + + def __getitem__(self, item): + item *= self.bptt + _seq_len = min(self.bptt, self._data.size - 1 - item) + + end_idx = item + _seq_len + beg_idx = max(0, item - self.ext_len) + + _data = self._data[beg_idx: end_idx] + _target = self._data[item + 1:item + 1 + _seq_len] + return _data, _target + + def __len__(self): + return self.n_batch + + +class VariableGenerator(Generator): + def __init__(self, _data, batch_size, tgt_len, ext_len=None, start=0, std=5, min_len=5, max_deviation=3): + super(VariableGenerator, self).__init__(_data, batch_size, tgt_len, ext_len) + self.start = start + self.std = std + self.min_len = min_len + self.max_deviation = max_deviation + self.max_len = self.bptt + max_deviation * std + + self.bptt_arr = [] + j = start + while j < self._data.size - 2: + bptt = self.bptt if np.random.random() < 0.95 else self.bptt / 2. 
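+            # Variable-length BPTT schedule: use the full target length 95% of the time and half
+            # of it otherwise, then jitter the result with Gaussian noise (std = `std`) and clamp
+            # it to [min_len, max_len] so that every generated segment has a valid length.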
+ bptt = min(self.max_len, max(self.min_len, int(np.random.normal(bptt, self.std)))) + self.bptt_arr.append(bptt) + _seq_len = min(bptt, self._data.size - 1 - j) + j += _seq_len + self.len = len(self.bptt_arr) + self.index = 0 + + def __getitem__(self, item): + bptt = self.bptt_arr[self.index] + self.index += 1 + _seq_len = min(bptt, len(self._data) - 1 - item) + + end_idx = item + _seq_len + beg_idx = max(0, item - self.ext_len) + + _data = self._data[beg_idx:end_idx] + _target = self._data[item + 1:item + 1 + _seq_len] + return _data, _target + + def __len__(self): + return self.len + + +class AbstractDataset: + def __init__(self, path, _dataset, *_args, **kwargs): + super(AbstractDataset, self).__init__() + self.path = path + self.dataset = _dataset + self.args = _args + self.kwargs = kwargs + + def write(self): + pass + + def get_train_generator(self): + return self.train_generator + + def get_valid_generator(self): + return self.valid_generator + + def get_test_generator(self): + return self.test_generator + + +class Enwik8_Dataset(AbstractDataset): + def __init__(self, path, _dataset, batch_size, tgt_len, *_args, ext_len=None, eval_tgt_len=None, varlen=False, + **kwargs): + super(Enwik8_Dataset, self).__init__(path, _dataset, *_args, **kwargs) + self.vocab = Vocab() + self.vocab.count_file(os.path.join(path, 'train.txt')) + self.vocab.count_file(os.path.join(path, 'valid.txt')) + self.vocab.count_file(os.path.join(path, 'test.txt')) + self.vocab.build_vocab() + self.train = self.vocab.encode_file( + os.path.join(path, 'train.txt'), ordered=True, add_eos=False) + self.valid = self.vocab.encode_file( + os.path.join(path, 'valid.txt'), ordered=True, add_eos=False) + self.test = self.vocab.encode_file( + os.path.join(path, 'test.txt'), ordered=True, add_eos=False) + self.train_generator = getGenerator(self.train, batch_size, tgt_len, ext_len, varlen) + self.valid_generator = getGenerator(self.valid, batch_size, eval_tgt_len, ext_len, varlen) + self.test_generator = getGenerator(self.test, batch_size, eval_tgt_len, ext_len, varlen) + + +class Text8_Dataset(AbstractDataset): + def __init__(self, path, _dataset, batch_size, tgt_len, *_args, ext_len=None, eval_tgt_len=None, varlen=False, + **kwargs): + super(Text8_Dataset, self).__init__(path, _dataset, *_args, **kwargs) + self.vocab = Vocab() + self.vocab.count_file(os.path.join(path, 'train.txt')) + self.vocab.count_file(os.path.join(path, 'valid.txt')) + self.vocab.count_file(os.path.join(path, 'test.txt')) + self.vocab.build_vocab() + self.train = self.vocab.encode_file( + os.path.join(path, 'train.txt'), ordered=True, add_eos=False) + self.valid = self.vocab.encode_file( + os.path.join(path, 'valid.txt'), ordered=True, add_eos=False) + self.test = self.vocab.encode_file( + os.path.join(path, 'test.txt'), ordered=True, add_eos=False) + self.train_generator = getGenerator(self.train, batch_size, tgt_len, ext_len, varlen) + self.valid_generator = getGenerator(self.valid, batch_size, eval_tgt_len, ext_len, varlen) + self.test_generator = getGenerator(self.test, batch_size, eval_tgt_len, ext_len, varlen) + + +def getGenerator(_data, batch_size, tgt_len, ext_len=None, varlen=False, start=0, std=5, min_len=5, max_deviation=3): + if varlen: + return VariableGenerator(_data, batch_size, tgt_len, ext_len, start, std, min_len, max_deviation) + return Generator(_data, batch_size, tgt_len, ext_len) diff --git a/research/nlp/transformer_xl/src/model/embedding.py b/research/nlp/transformer_xl/src/model/embedding.py new file mode 100644 index 
0000000000000000000000000000000000000000..f6e3869f47f6ebae38d75e3048e2988b8f79c3d0 --- /dev/null +++ b/research/nlp/transformer_xl/src/model/embedding.py @@ -0,0 +1,84 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import numpy as np +import mindspore as ms +import mindspore.nn as nn +from mindspore.ops import Zeros, Concat, BroadcastTo +from mindspore.ops import Sin, Cos +from mindspore.numpy import outer +from src.utils.additional_algorithms import linear + + +class PositionalEmbedding(nn.Cell): + def __init__(self, demb): + super(PositionalEmbedding, self).__init__() + self.concat_n_1 = Concat(-1) + self.sin = Sin() + self.cos = Cos() + self.demb = demb + self.inv_freq = ms.Tensor(1 / (10000 ** (np.arange(0.0, demb, 2.0) / demb)), ms.float32) + + def construct(self, pos_seq, bsz=None): + sinusoid_inp = outer(pos_seq, self.inv_freq) + pos_emb = self.concat_n_1([self.sin(sinusoid_inp), self.cos(sinusoid_inp)]) + + if bsz is not None: + return BroadcastTo(-1, bsz, -1)(pos_emb[:, None, :]) + return pos_emb[:, None, :] + + +class AdaptiveEmbedding(nn.Cell): + def __init__(self, n_token, d_embed, d_proj, cutoffs, div_val=1): + super(AdaptiveEmbedding, self).__init__() + self.zeros = Zeros() + self.n_token = n_token + self.d_embed = d_embed + + self.cutoffs = cutoffs + [n_token] + self.div_val = div_val + self.d_proj = d_proj + + self.emb_scale = d_proj ** 0.5 + + self.cutoff_ends = [0] + self.cutoffs + + self.emb_layers = nn.CellList() + parameters = [] + if div_val == 1: + self.emb_layers.append( + nn.Embedding(n_token, d_embed) + ) + if d_proj != d_embed: + parameters.append(ms.Parameter(self.zeros((d_proj, d_embed), ms.float32))) + else: + for i in range(len(self.cutoffs)): + l_idx, r_idx = self.cutoff_ends[i], self.cutoff_ends[i + 1] + d_emb_i = d_embed // (div_val ** i) + self.emb_layers.append(nn.Embedding(r_idx - l_idx, d_emb_i)) + parameters.append(ms.Parameter(self.zeros((d_proj, d_emb_i), ms.float32))) + self.emb_projs = ms.ParameterTuple(parameters) + + def construct(self, inp): + if self.div_val == 1: + embed = self.emb_layers[0](inp) + if self.d_proj != self.d_embed: + embed = linear(embed, self.emb_projs[0]) + else: + embed = self.emb_layers[0](inp) + + embed *= self.emb_scale + + return embed diff --git a/research/nlp/transformer_xl/src/model/layer.py b/research/nlp/transformer_xl/src/model/layer.py new file mode 100644 index 0000000000000000000000000000000000000000..d37898bfabed0327f39ddfcc2be280f23a619dbd --- /dev/null +++ b/research/nlp/transformer_xl/src/model/layer.py @@ -0,0 +1,32 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +from mindspore.nn import Cell +from src.model.attn import RelPartialLearnableMultiHeadAttn +from src.model.positionwiseFF import PositionwiseFF + + +class RelPartialLearnableDecoderLayer(Cell): + def __init__(self, n_head, d_model, d_head, d_inner, dropout, + **kwargs): + super(RelPartialLearnableDecoderLayer, self).__init__() + + self.attn = RelPartialLearnableMultiHeadAttn(n_head, d_model, d_head, dropout, **kwargs) + self.pos_ff = PositionwiseFF(d_model, d_inner, dropout, pre_lnorm=kwargs.get('pre_lnorm')) + + def construct(self, dec_inp, r, r_w_bias, r_r_bias, mems=None, attn_mask=None): + output = self.attn(dec_inp, r, r_w_bias, r_r_bias, attn_mask=attn_mask, mems=mems) + output = self.pos_ff(output) + return output diff --git a/research/nlp/transformer_xl/src/model/mem_transformer.py b/research/nlp/transformer_xl/src/model/mem_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..555d925017de91e598f5083a88c5339ee443ffaf --- /dev/null +++ b/research/nlp/transformer_xl/src/model/mem_transformer.py @@ -0,0 +1,233 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import mindspore.numpy as np +import mindspore as ms +import mindspore.nn as nn +import mindspore.ops as P +from mindspore import Tensor +from mindspore import Parameter +from mindspore.ops import Zeros, Ones +from mindspore.ops import ExpandDims, Concat +from mindspore.ops import clip_by_value +from mindspore.nn import Tril, Triu +from src.loss_fn.ProjectedAdaptiveLogSoftmaxLoss import ProjectedAdaptiveLogSoftmaxLoss +from src.model.embedding import AdaptiveEmbedding, PositionalEmbedding +from src.model.layer import RelPartialLearnableDecoderLayer + + +class MemTransformerLM(nn.Cell): + def __init__(self, n_token, n_layer, n_head, d_model, d_head, d_inner, + dropout, dropatt, batch_size, d_embed=None, + div_val=1, pre_lnorm=False, + tgt_len=None, ext_len=None, mem_len=None, eval_tgt_len=None, + cutoffs=None, sample_softmax=-1, tie_weight=True, tie_projs=None, + same_length=False, clamp_len=-1): + super(MemTransformerLM, self).__init__() + + if tie_projs is None: + tie_projs = [False] + if cutoffs is None: + cutoffs = [] + self.assign = P.Assign() + self.zeros, self.ones = Zeros(), Ones() + self.expandDims, self.concat_0, self.concat_1 = ExpandDims(), Concat(0), Concat(1) + self.tril, self.triu = Tril(), Triu() + + self.n_token = n_token + + d_embed = d_model if d_embed is None else d_embed + self.d_embed = d_embed + self.d_model = d_model + self.n_head = n_head + self.d_head = d_head + self.batch_size = batch_size + + self.word_emb = AdaptiveEmbedding(n_token, d_embed, d_model, cutoffs, + div_val=div_val) + self.drop = nn.Dropout(1 - dropout, dtype=ms.float32) + self.n_layer = n_layer + self.tgt_len = tgt_len + self.mem_len = mem_len + self.ext_len = ext_len + self.eval_tgt_len = eval_tgt_len + self.max_klen = tgt_len + ext_len + mem_len + self.layers = nn.CellList() + + for i in range(n_layer): + self.layers.append( + RelPartialLearnableDecoderLayer( + n_head, d_model, d_head, d_inner, dropout, dropatt=dropatt, pre_lnorm=pre_lnorm) + ) + + self.sample_softmax = sample_softmax + # use sampled softmax + if self.sample_softmax > 0: + self.out_layer = nn.Dense(d_model, n_token) + if tie_weight: + self.out_layer.weight = self.word_emb.emb_projs[0].embedding_table + self.tie_weight = tie_weight + + # use adaptive softmax (including standard softmax) + else: + self.crit = ProjectedAdaptiveLogSoftmaxLoss(n_token, d_embed, d_model, + cutoffs, div_val=div_val) + + if tie_weight: + for i in range(len(self.crit.out_layers)): + self.crit.out_layers[i].weight = self.word_emb.emb_layers[i].embedding_table + + if tie_projs: + for i, tie_proj in enumerate(tie_projs): + if tie_proj and div_val == 1 and d_model != d_embed: + self.crit.out_projs[i] = self.word_emb.emb_projs[0] + elif tie_proj and div_val != 1: + self.crit.out_projs[i] = self.word_emb.emb_projs[i] + + self.same_length = same_length + self.clamp_len = Tensor(clamp_len, ms.float32) + self.min_clamp_len = Tensor(0, ms.float32) + + self._create_params() + + self.add_flags_recursive(is_first_iteration=True) + + def backward_compatible(self): + self.sample_softmax = -1 + + def _create_params(self): + self.pos_emb = PositionalEmbedding(self.d_model) + self.r_w_bias = Parameter(self.zeros((self.n_head, self.d_head), ms.float32)) + self.r_r_bias = Parameter(self.zeros((self.n_head, self.d_head), ms.float32)) + self.mems = Parameter( + self.zeros((self.n_layer, self.mem_len, self.batch_size, self.d_model), ms.float32), + requires_grad=False) + self.valid_mems = 
Parameter( + self.zeros((self.n_layer, self.mem_len + self.tgt_len - self.eval_tgt_len, self.batch_size, self.d_model), + ms.float32), requires_grad=False) + self.empty_valid_mems = Parameter( + self.zeros((self.n_layer, self.mem_len + self.tgt_len - self.eval_tgt_len, self.batch_size, self.d_model), + ms.float32), requires_grad=False) + + def reset_length(self, tgt_len, ext_len, mem_len): + self.tgt_len = tgt_len + self.mem_len = mem_len + self.ext_len = ext_len + return True + + def _update_mems(self, hids, qlen, mlen): + if self.training: # update mems # + if self.mem_len > 0: + # There are `mlen + qlen` steps that can be cached into mems + # For the next step, the last `ext_len` of the `qlen` tokens + # will be used as the extended context. Hence, we only cache + # the tokens from `mlen + qlen - self.ext_len - self.mem_len` + # to `mlen + qlen - self.ext_len`. + for i, h in enumerate(hids): + hids[i] = self.expandDims(h, 0) + + # graph mode not support function max() + end_idx = mlen if qlen - self.ext_len < 0 else qlen - self.ext_len + mlen + beg_idx = 0 if end_idx - self.mem_len < 0 else end_idx - self.mem_len + cat = self.concat_0(hids) + cat = self.concat_1((self.mems, cat)) + cat = cat[:, beg_idx:end_idx] + self.assign(self.mems, cat) + else: # update mems # + if self.mem_len > 0: + for i, h in enumerate(hids): + hids[i] = self.expandDims(h, 0) + + if self.is_first_iteration: + cat = self.concat_0(hids) + cat = self.sameShape(cat, self.valid_mems) + self.assign(self.valid_mems, cat) + else: + end_idx = mlen if qlen - self.ext_len < 0 else qlen - self.ext_len + mlen + beg_idx = 0 if end_idx - self.mem_len < 0 else end_idx - self.mem_len + cat = self.concat_0(hids) + cat = self.concat_1((self.valid_mems, cat)) + cat = cat[:, beg_idx:end_idx] + self.assign(self.valid_mems, cat) + return True + + def sameShape(self, a, b): + c = self.zeros((a.shape[0], b.shape[1] - a.shape[1], a.shape[2], a.shape[3]), ms.float32) + a = self.concat_1((c, a)) + return a + + def set_train(self, tgt_len, ext_len, mem_len, eval_tgt_len, mode=True): + super(MemTransformerLM, self).set_train(mode=mode) + if mode: + # Switch back to the training mode + self.reset_length(tgt_len, ext_len, mem_len) + else: + # If the model does not use memory at all, make the ext_len longer. + # Otherwise, make the mem_len longer and keep the ext_len the same. 
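+            # Worked example (numbers are illustrative, not taken from any shipped config): with
+            # tgt_len=512, eval_tgt_len=128 and mem_len=512, evaluation calls
+            # reset_length(128, ext_len, 512 + 512 - 128), so mem_len grows to 896 and the total
+            # attended context eval_tgt_len + mem_len stays at 1024, the same as
+            # tgt_len + mem_len during training.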
+ self.assign(self.valid_mems, self.empty_valid_mems) + self.add_flags_recursive(is_first_iteration=True) + if mem_len == 0: + self.reset_length(eval_tgt_len, + ext_len + tgt_len - eval_tgt_len, mem_len) + else: + self.reset_length(eval_tgt_len, + ext_len, mem_len + tgt_len - eval_tgt_len) + return True + + def construct(self, data, target, idx=None): + + tgt_len = target.size + qlen, _ = data.shape + word_emb = self.word_emb(data) + + mems = self.mems if self.training else self.valid_mems + mlen = 0 if self.is_first_iteration \ + else (self.mem_len if self.training else self.mem_len + self.tgt_len - self.eval_tgt_len) + + klen = qlen + mlen + all_ones = np.ones((qlen, klen), ms.int16) + + if self.same_length: + mask_len = klen - self.mem_len + if mask_len > 0: + mask_shift_len = qlen - mask_len + else: + mask_shift_len = qlen + dec_attn_mask = np.expand_dims((np.triu((all_ones, 1 + mlen), ms.int16) + + np.tril((all_ones, -mask_shift_len), ms.int16)), -1) # -1 + else: + dec_attn_mask = np.expand_dims(np.triu(all_ones, 1 + mlen), -1) + + hids = [] + + pos_seq = np.arange(klen - 1, -1, -1, dtype=word_emb.dtype) + if self.clamp_len > 0: + pos_seq = clip_by_value(pos_seq, clip_value_min=self.min_clamp_len, clip_value_max=self.clamp_len) + pos_emb = self.pos_emb(pos_seq) + + core_out = self.drop(word_emb) + pos_emb = self.drop(pos_emb) + + for i, layer in enumerate(self.layers): + hids.append(core_out) + core_out = layer(core_out, pos_emb, self.r_w_bias, self.r_r_bias, attn_mask=dec_attn_mask, mems=mems[i]) + + hidden = self.drop(core_out) + + self._update_mems(hids, qlen, mlen) + + pred_hid = hidden[-tgt_len:] + loss = self.crit(pred_hid.reshape(-1, pred_hid.shape[-1]), target.reshape(-1)) + return loss diff --git a/research/nlp/transformer_xl/src/model/positionwiseFF.py b/research/nlp/transformer_xl/src/model/positionwiseFF.py new file mode 100644 index 0000000000000000000000000000000000000000..3e430947da64c028d6f3abd72589dfc096e13b46 --- /dev/null +++ b/research/nlp/transformer_xl/src/model/positionwiseFF.py @@ -0,0 +1,55 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import mindspore as ms +from mindspore.nn import Cell +import mindspore.nn as nn + + +class PositionwiseFF(Cell): + def __init__(self, d_model, d_inner, dropout, pre_lnorm=False): + super(PositionwiseFF, self).__init__() + + self.d_model = d_model + self.d_inner = d_inner + self.dropout = dropout + + if dropout == 0.0: + self.CoreNet = nn.SequentialCell( + nn.Dense(d_model, d_inner), nn.ReLU(), + nn.Dense(d_inner, d_model), + ) + else: + self.CoreNet = nn.SequentialCell( + nn.Dense(d_model, d_inner), nn.ReLU(), + nn.Dropout(1 - dropout, dtype=ms.float32), + nn.Dense(d_inner, d_model), + nn.Dropout(1 - dropout, dtype=ms.float32), + ) + self.layer_norm = nn.LayerNorm([d_model]) + self.pre_lnorm = pre_lnorm + + def construct(self, inp): + if self.pre_lnorm: + # layer normalization + positionwise feed-forward + core_out = self.CoreNet(self.layer_norm(inp)) + # residual connection + output = core_out + inp + else: + # positionwise feed-forward + core_out = self.CoreNet(inp) + # residual connection + layer normalization + output = self.layer_norm(inp + core_out) + return output diff --git a/research/nlp/transformer_xl/src/model/vocabulary.py b/research/nlp/transformer_xl/src/model/vocabulary.py new file mode 100644 index 0000000000000000000000000000000000000000..2ef036690415785e0b20898b4bf73c3cd9bda0d0 --- /dev/null +++ b/research/nlp/transformer_xl/src/model/vocabulary.py @@ -0,0 +1,186 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from collections import Counter, OrderedDict +import numpy as np +import mindspore as ms +from mindspore.ops import Concat + +class Vocab: + def __init__(self, special=None, min_freq=0, max_size=None, lower_case=True, + delimiter=None, vocab_file=None): + self.concat_0 = Concat(0) + if special is None: + special = [] + self.counter = Counter() + self.special = special + self.min_freq = min_freq + self.max_size = max_size + self.lower_case = lower_case + self.delimiter = delimiter + self.vocab_file = vocab_file + + self.sym2idx = OrderedDict() + self.unk_idx = None + self.idx2sym = [] + + def tokenize(self, line, add_eos=False, add_double_eos=False): + line = line.strip() + # convert to lower case + if self.lower_case: + line = line.lower() + + # empty delimiter '' will evaluate False + if self.delimiter == '': + symbols = line + else: + symbols = line.split(self.delimiter) + + if add_double_eos: # lm1b + symbols = ['<S>'] + symbols + ['<S>'] + elif add_eos: + symbols = symbols + ['<eos>'] + return symbols + + def count_file(self, path, verbose=False, add_eos=False): + if verbose: print('counting file {} ...'.format(path)) + + sents = [] + with open(path, 'r', encoding='utf-8') as f: + for idx, line in enumerate(f): + if verbose and idx > 0 and idx % 500000 == 0: + print(' line {}'.format(idx)) + symbols = self.tokenize(line, add_eos=add_eos) + self.counter.update(symbols) + sents.append(symbols) + + return sents + + def count_sents(self, sents, verbose=False): + """ + sents : a list of sentences, each a list of tokenized symbols + """ + if verbose: print('counting {} sents ...'.format(len(sents))) + for idx, symbols in enumerate(sents): + if verbose and idx > 0 and idx % 500000 == 0: + print(' line {}'.format(idx)) + self.counter.update(symbols) + + def _build_from_file(self, vocab_file): + self.idx2sym = [] + self.sym2idx = OrderedDict() + + with open(vocab_file, 'r', encoding='utf-8') as f: + for line in f: + symb = line.strip().split()[0] + self.add_symbol(symb) + self.unk_idx = self.sym2idx['<UNK>'] + + def build_vocab(self): + if self.vocab_file: + print('building vocab from {}'.format(self.vocab_file)) + self._build_from_file(self.vocab_file) + print('final vocab size {}'.format(len(self))) + else: + print('building vocab with min_freq={}, max_size={}'.format( + self.min_freq, self.max_size)) + self.idx2sym = [] + self.sym2idx = OrderedDict() + + for sym in self.special: + self.add_special(sym) + + for sym, cnt in self.counter.most_common(self.max_size): + if cnt < self.min_freq: break + self.add_symbol(sym) + + print('final vocab size {} from {} unique tokens'.format( + len(self), len(self.counter))) + + def encode_file(self, path, ordered=False, verbose=False, add_eos=True, + add_double_eos=False): + if verbose: print('encoding file {} ...'.format(path)) + + encoded = [] + if not ordered: + with open(path, 'r', encoding='utf-8') as f: + for idx, line in enumerate(f): + if verbose and idx > 0 and idx % 500000 == 0: + print(' line {}'.format(idx)) + symbols = self.tokenize(line, add_eos=add_eos, + add_double_eos=add_double_eos) + encoded.append(self.convert_to_tensor(symbols)) + + else: + with open(path, 'r', encoding='utf-8') as f: + for idx, line in enumerate(f): + if verbose and idx > 0 and idx % 500000 == 0: + print(' line {}'.format(idx)) + symbols = self.tokenize(line, add_eos=add_eos, + add_double_eos=add_double_eos) + symbols_indices = self.get_indices(symbols) + encoded.append(symbols_indices) 
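+                    # Each encoded line is appended here and the per-line lists are concatenated
+                    # below into one flat int64 array; the batching Generator later reshapes that
+                    # flat corpus into batch_size contiguous streams.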
+ encoded = np.concatenate(encoded, axis=0).astype(np.int64) + + return encoded + + def encode_sents(self, sents, ordered=False, verbose=False): + if verbose: print('encoding {} sents ...'.format(len(sents))) + encoded = [] + for idx, symbols in enumerate(sents): + if verbose and idx > 0 and idx % 500000 == 0: + print(' line {}'.format(idx)) + encoded.append(self.convert_to_tensor(symbols)) + + if ordered: + encoded = self.concat_0(encoded) + + return encoded + + def add_special(self, sym): + if sym not in self.sym2idx: + self.idx2sym.append(sym) + self.sym2idx[sym] = len(self.idx2sym) - 1 + setattr(self, '{}_idx'.format(sym.strip('<>')), self.sym2idx[sym]) + + def add_symbol(self, sym): + if sym not in self.sym2idx: + self.idx2sym.append(sym) + self.sym2idx[sym] = len(self.idx2sym) - 1 + + def get_sym(self, idx): + return self.idx2sym[idx] + + def get_idx(self, sym): + if sym in self.sym2idx: + return self.sym2idx[sym] + return self.sym2idx.get(sym, self.unk_idx) + + def get_symbols(self, indices): + return [self.get_sym(idx) for idx in indices] + + def get_indices(self, symbols): + return [self.get_idx(sym) for sym in symbols] + + def convert_to_tensor(self, symbols): + return ms.Tensor(self.get_indices(symbols), dtype=ms.int64) + + def convert_to_sent(self, indices, exclude=None): + if exclude is None: + return ' '.join([self.get_sym(idx) for idx in indices]) + return ' '.join([self.get_sym(idx) for idx in indices if idx not in exclude]) + + def __len__(self): + return len(self.idx2sym) diff --git a/research/nlp/transformer_xl/src/model_utils/__init__.py b/research/nlp/transformer_xl/src/model_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..602527cd720c8d268599dbaef190ba1cf1eb6f2b --- /dev/null +++ b/research/nlp/transformer_xl/src/model_utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ diff --git a/research/nlp/transformer_xl/src/model_utils/config.py b/research/nlp/transformer_xl/src/model_utils/config.py new file mode 100644 index 0000000000000000000000000000000000000000..dd048184190a3221428bb62f7920ea34cd4337dd --- /dev/null +++ b/research/nlp/transformer_xl/src/model_utils/config.py @@ -0,0 +1,139 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Parse arguments""" + +import os +import ast +import argparse +import time +from pprint import pformat +import yaml + + +class Config: + """ + Configuration namespace. Convert dictionary to members. + """ + + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser: Parent parser. + cfg: Base configuration. + helper: Helper description. + cfg_path: Path to the default yaml config. + """ + parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]", + parents=[parser]) + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path) + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + + Args: + yaml_path: Path to the yaml config. + """ + with open(yaml_path, 'r') as fin: + try: + cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [x for x in cfgs] + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml") + print(cfg_helper) + except: + raise ValueError("Failed to parse yaml") + return cfg, cfg_helper, cfg_choices + + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args: Command line arguments. + cfg: Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + return cfg + + +def reset_config(args): + if args.d_embed < 0: + args.d_embed = args.d_model + args.train_url = '{}-{}'.format(args.train_url, args.dataset) + args.train_url = os.path.join(args.train_url, time.strftime('%Y%m%d-%H%M%S')) + if not os.path.exists(args.train_url): + os.makedirs(args.train_url, exist_ok=True) + + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. 
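+
+    The yaml path is read from the CONFIG_PATH environment variable, and `config` is built at
+    import time, so CONFIG_PATH must be exported before this module is imported. A minimal
+    sketch (paths are placeholders, and it assumes the yaml declares the parameters that are
+    also passed on the command line):
+
+        export CONFIG_PATH=/path/to/enwik8_base.yaml
+        python train.py --datadir ./data/enwik8 --dataset enwik8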
+ """ + parser = argparse.ArgumentParser(description="Mindspore Transformer Language Model", add_help=False) + + config_path = os.environ['CONFIG_PATH'] + default, helper, choices = parse_yaml(config_path) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=config_path) + + reset_config(args) + final_config = merge(args, default) + return Config(final_config) + + +config = get_config() diff --git a/research/nlp/transformer_xl/src/model_utils/device_adapter.py b/research/nlp/transformer_xl/src/model_utils/device_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..98b8c7ac4807141e7a736893e743c341976e891f --- /dev/null +++ b/research/nlp/transformer_xl/src/model_utils/device_adapter.py @@ -0,0 +1,26 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Device adapter for ModelArts""" +from .config import config + +if config.enable_modelarts: + from .moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id +else: + from .local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id + +__all__ = [ + "get_device_id", "get_device_num", "get_rank_id", "get_job_id" +] diff --git a/research/nlp/transformer_xl/src/model_utils/local_adapter.py b/research/nlp/transformer_xl/src/model_utils/local_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..33edc2d0df14791f377be28ed4fe48fe9a681325 --- /dev/null +++ b/research/nlp/transformer_xl/src/model_utils/local_adapter.py @@ -0,0 +1,63 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Local adapter""" + +import os +from .config import config + + +def get_device_id(): + if config.device == "Ascend": + device_id = os.getenv('DEVICE_ID', '0') + elif config.device == "GPU": + device_id = os.getenv('OMPI_COMM_WORLD_LOCAL_RANK', '0') + else: + device_id = 0 + return int(device_id) + + +def get_device_num(): + if config.device == "Ascend": + local_device_num = os.getenv('RANK_SIZE', '1') + elif config.device == "GPU": + local_device_num = os.getenv('OMPI_COMM_WORLD_SIZE', '1') + else: + local_device_num = 1 + return int(local_device_num) + + +def get_local_device_num(): + if config.device == "Ascend": + local_device_num = min(get_device_num, 8) + elif config.device == "GPU": + local_device_num = os.getenv('OMPI_COMM_WORLD_LOCAL_SIZE', '1') + else: + local_device_num = 1 + return int(local_device_num) + + +def get_rank_id(): + if config.device == "Ascend": + global_rank_id = os.getenv('RANK_ID', '0') + elif config.device == "GPU": + global_rank_id = os.getenv('OMPI_COMM_WORLD_RANK', '0') + else: + global_rank_id = 0 + return int(global_rank_id) + + +def get_job_id(): + return "Local Job" diff --git a/research/nlp/transformer_xl/src/model_utils/moxing_adapter.py b/research/nlp/transformer_xl/src/model_utils/moxing_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..8edf03c97fe62fa63228e1f012a88d7a80a7be3a --- /dev/null +++ b/research/nlp/transformer_xl/src/model_utils/moxing_adapter.py @@ -0,0 +1,146 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Moxing adapter for ModelArts""" + +import os +import functools +from mindspore import context +from .config import config + +_global_sync_count = 0 + + +def get_device_id(): + if config.device == "ascend": + device_id = os.getenv('DEVICE_ID', '0') + elif config.device == "gpu": + device_id = os.getenv('OMPI_COMM_WORLD_LOCAL_RANK', '0') + else: + device_id = 0 + return int(device_id) + + +def get_device_num(): + if config.device == "ascend": + local_device_num = os.getenv('RANK_SIZE', '1') + elif config.device == "gpu": + local_device_num = os.getenv('OMPI_COMM_WORLD_SIZE', '1') + else: + local_device_num = 1 + return int(local_device_num) + + +def get_local_device_num(): + if config.device == "ascend": + local_device_num = min(get_device_num, 8) + elif config.device == "gpu": + local_device_num = os.getenv('OMPI_COMM_WORLD_LOCAL_SIZE', '1') + else: + local_device_num = 1 + return int(local_device_num) + + +def get_rank_id(): + if config.device == "ascend": + global_rank_id = os.getenv('RANK_ID', '0') + elif config.device == "gpu": + global_rank_id = os.getenv('OMPI_COMM_WORLD_RANK', '0') + else: + global_rank_id = 0 + return int(global_rank_id) + + +def get_job_id(): + job_id = os.getenv('JOB_ID') + job_id = job_id if job_id != "" else "default" + return job_id + + +def sync_data(from_path, to_path): + """ + Download data from remote obs to local directory if the first url is remote url and the second one is local path + Upload data from local directory to remote obs in contrast. + """ + import moxing as mox + import time + global _global_sync_count + sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count) + _global_sync_count += 1 + + # Each server contains 8 devices as most. + if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock): + print("from path: ", from_path) + print("to path: ", to_path) + mox.file.copy_parallel(from_path, to_path) + print("===finish data synchronization===") + try: + os.mknod(sync_lock) + except IOError: + pass + print("===save flag===") + + while True: + if os.path.exists(sync_lock): + break + time.sleep(1) + + print("Finish sync data from {} to {}.".format(from_path, to_path)) + + +def moxing_wrapper(pre_process=None, post_process=None): + """ + Moxing wrapper to download dataset and upload outputs. 
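+
+    Intended to be applied as a decorator around the training entry point. The snippet below is
+    only an illustrative sketch (the function name and the hook value are assumptions, not part
+    of this repository):
+
+        @moxing_wrapper(pre_process=None)
+        def run_train():
+            ...  # runs after data_url has been synced to data_path on ModelArts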
+ """ + + def wrapper(run_func): + @functools.wraps(run_func) + def wrapped_func(*args, **kwargs): + # Download data from data_url + if config.enable_modelarts: + if config.data_url: + sync_data(config.data_url, config.data_path) + print("Dataset downloaded: ", os.listdir(config.data_path)) + if config.checkpoint_url: + sync_data(config.checkpoint_url, config.load_path) + print("Preload downloaded: ", os.listdir(config.load_path)) + if config.train_url: + sync_data(config.train_url, config.output_path) + print("Workspace downloaded: ", os.listdir(config.output_path)) + + context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id()))) + config.device_num = get_device_num() + config.device_id = get_device_id() + if not os.path.exists(config.output_path): + os.makedirs(config.output_path) + + if pre_process: + pre_process() + + # Run the main function + run_func(*args, **kwargs) + + # Upload data to train_url + if config.enable_modelarts: + if post_process: + post_process() + + if config.train_url: + print("Start to copy output directory") + sync_data(config.output_path, config.train_url) + + return wrapped_func + + return wrapper diff --git a/research/nlp/transformer_xl/src/utils/additional_algorithms.py b/research/nlp/transformer_xl/src/utils/additional_algorithms.py new file mode 100644 index 0000000000000000000000000000000000000000..cb9dc40008e6d2b0734b37e11d93d772646ff093 --- /dev/null +++ b/research/nlp/transformer_xl/src/utils/additional_algorithms.py @@ -0,0 +1,82 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import math +import mindspore as ms +import mindspore.ops as P +import mindspore.common.dtype as mstype +from mindspore import Tensor, Parameter +from mindspore.nn import Cell, MatMul +from mindspore.nn.learning_rate_schedule import LearningRateSchedule + +matMul_tb = MatMul(transpose_x2=True) + + +def linear(_input, weight, bias=None): + r""" + Applies a linear transformation to the incoming data: :math:`y = xA^T + b`. 
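+
+    Implemented with mindspore.nn.MatMul(transpose_x2=True), so `weight` is expected in the same
+    (out_features, in_features) layout that torch.nn.functional.linear uses.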
+ + Shape: + + - Input: :math:`(N, *, in\_features)` where `*` means any number of + additional dimensions + - Weight: :math:`(out\_features, in\_features)` + - Bias: :math:`(out\_features)` + - Output: :math:`(N, *, out\_features)` + """ + output = matMul_tb(_input, weight) + if bias is not None: + output += bias + return output + + +class MaskerFill(Cell): + def __init__(self): + super(MaskerFill, self).__init__() + self.select = P.Select() + self.fill = P.Fill() + self.cast = P.Cast() + + def construct(self, inputs, mask, value): + mask = self.cast(mask, mstype.bool_) + masked_value = self.fill(ms.float32, inputs.shape, value) + output = self.select(mask, masked_value, inputs) + return output + + +class CosineAnnealingLR(LearningRateSchedule): + def __init__(self, total_step, lr, min_lr=0): + super(CosineAnnealingLR, self).__init__() + + self.min_lr = Parameter(Tensor(min_lr, ms.float32)) + self.lr = Parameter(Tensor(lr, ms.float32)) + self.max_lr = Parameter(Tensor(lr, ms.float32)) + self.T_max = Parameter(Tensor(total_step, ms.float32)) + + self.cos = P.Cos() + self.pi = Parameter(Tensor(math.pi, ms.float32)) + self.cast = P.Cast() + + def construct(self, global_step): + global_step = self.cast(global_step, ms.float32) + if global_step <= 0: + self.lr = self.max_lr + elif (global_step - 1 - self.T_max) % (2 * self.T_max) == 0: + self.lr += (self.max_lr - self.min_lr) * (1 - self.cos(self.pi / self.T_max)) / 2 + else: + self.lr = (1 + self.cos(self.pi * global_step / self.T_max)) /\ + (1 + self.cos(self.pi * (global_step - 1) / self.T_max)) * (self.lr - self.min_lr) + self.min_lr + + return self.lr diff --git a/research/nlp/transformer_xl/src/utils/dataset_util.py b/research/nlp/transformer_xl/src/utils/dataset_util.py new file mode 100644 index 0000000000000000000000000000000000000000..69285269cad2ce355c507d018a800d94cb560cb5 --- /dev/null +++ b/research/nlp/transformer_xl/src/utils/dataset_util.py @@ -0,0 +1,29 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from src.model_utils.config import config +import src.model.dataset as ds + + +def get_dataset(datadir, dataset): + if dataset == 'enwik8': + dataset = ds.Enwik8_Dataset(path=datadir, _dataset=dataset, batch_size=config.batch_size, + tgt_len=config.tgt_len, ext_len=config.ext_len, eval_tgt_len=config.eval_tgt_len, + varlen=config.varlen) + elif dataset == 'text8': + dataset = ds.Text8_Dataset(path=datadir, _dataset=dataset, batch_size=config.batch_size, tgt_len=config.tgt_len, + ext_len=config.ext_len, eval_tgt_len=config.eval_tgt_len, varlen=config.varlen) + + return dataset diff --git a/research/nlp/transformer_xl/src/utils/nnUtils.py b/research/nlp/transformer_xl/src/utils/nnUtils.py new file mode 100644 index 0000000000000000000000000000000000000000..a5dce844859627d4a2567791d4f6bc1adc900635 --- /dev/null +++ b/research/nlp/transformer_xl/src/utils/nnUtils.py @@ -0,0 +1,67 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import mindspore +from mindspore import Tensor +from mindspore.common import initializer as init + + +def uniform_(tensor, a=0., b=1.): + r"""Fills the input Tensor with values drawn from the uniform + distribution :math:`\mathcal{U}(a, b)`. + + Args: + tensor: an n-dimensional `torch.Tensor` + a: the lower bound of the uniform distribution + b: the upper bound of the uniform distribution + + Examples: + #>>> w = torch.empty(3, 5) + #>>> nn.init.uniform_(w) + """ + tensor += Tensor(dtype=mindspore.float32, init=init.Zero(), shape=tensor.shape).fill((b - a) / 2) + init.Uniform((b - a) / 2)(tensor.asnumpy()) + + +def normal_(tensor, mean=0., std=1.): + r"""Fills the input Tensor with values drawn from the normal + distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`. + + Args: + tensor: an n-dimensional `torch.Tensor` + mean: the mean of the normal distribution + std: the standard deviation of the normal distribution + + Examples: + #>>> w = torch.empty(3, 5) + #>>> nn.init.normal_(w) + """ + + init.Normal(mean=mean, sigma=std)(tensor.asnumpy()) + + +def constant_(tensor, val): + r"""Fills the input Tensor with the value :math:`\text{val}`. + + Args: + tensor: an n-dimensional `torch.Tensor` + val: the value to fill the tensor with + + Examples: + #>>> w = torch.empty(3, 5) + #>>> nn.init.constant_(w, 0.3) + """ + constant_init = init.Constant(value=val) + constant_init(tensor.asnumpy()) diff --git a/research/nlp/transformer_xl/train.py b/research/nlp/transformer_xl/train.py new file mode 100644 index 0000000000000000000000000000000000000000..2b6406a7683c545ec72dc7d7357462635d245107 --- /dev/null +++ b/research/nlp/transformer_xl/train.py @@ -0,0 +1,334 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Transformer training script.""" + +import math +import argparse +import numpy as np +import mindspore as ms +from mindspore.communication import init +import mindspore.nn.optim as optim +import mindspore.context as context +from mindspore.dataset import GeneratorDataset +from mindspore.train.model import Model +from src.callback.eval import EvalDuringTrain, doEval +from src.callback.log import TrainLogger +from src.callback.flag import FlagModifiedCallback +from src.model.mem_transformer import MemTransformerLM +from src.model_utils.device_adapter import get_device_id, get_device_num, get_rank_id +from src.model_utils.config import config +from src.utils.dataset_util import get_dataset +from src.utils.nnUtils import uniform_, normal_, constant_ +from src.metric.calc import bpc + + +def init_weight(weight, _config): + if _config.init == 'uniform': + uniform_(weight, -_config.init_range, _config.init_range) + elif _config.init == 'normal': + normal_(weight, 0.0, _config.init_std) + + +def init_bias(bias): + constant_(bias, 0.0) + + +def weights_init_Dense(m, config1): + if hasattr(m, 'weight') and m.weight is not None: + init_weight(m.weight, config1) + if hasattr(m, 'bias') and m.bias is not None: + init_bias(m.bias) + + +def weights_init_AdaptiveEmbedding(m, config1): + if hasattr(m, 'emb_projs'): + for i in range(len(m.emb_projs)): + if m.emb_projs[i] is not None: + normal_(m.emb_projs[i], 0.0, config1.proj_init_std) + + +def weights_init_ProjectedAdaptiveLogSoftmax(m, config1): + if hasattr(m, 'cluster_weight') and m.cluster_weight is not None: + init_weight(m.cluster_weight, config1) + if hasattr(m, 'cluster_bias') and m.cluster_bias is not None: + init_bias(m.cluster_bias) + if hasattr(m, 'out_projs'): + for i in range(len(m.out_projs)): + if m.out_projs[i] is not None: + normal_(m.out_projs[i], 0.0, config1.proj_init_std) + + +def weights_init_LayerNorm(m, config1): + if hasattr(m, 'weight'): + normal_(m.weight, 1.0, config1.init_std) + if hasattr(m, 'bias') and m.bias is not None: + init_bias(m.bias) + + +def weights_init_TransformerLM(m, config1): + if hasattr(m, 'r_emb'): + init_weight(m.r_emb, config1) + if hasattr(m, 'r_w_bias'): + init_weight(m.r_w_bias, config1) + if hasattr(m, 'r_r_bias'): + init_weight(m.r_r_bias, config1) + if hasattr(m, 'r_bias'): + init_bias(m.r_bias) + + +def weights_init(m, config1): + classname = m.__class__.__name__ + if classname.find('Dense') != -1: + weights_init_Dense(m, config1) + elif classname.find('AdaptiveEmbedding') != -1: + weights_init_AdaptiveEmbedding(m, config1) + elif classname.find('Embedding') != -1: + if hasattr(m, 'weight'): + init_weight(m.weight, config1) + elif classname.find('ProjectedAdaptiveLogSoftmax') != -1: + weights_init_ProjectedAdaptiveLogSoftmax(m, config1) + elif classname.find('LayerNorm') != -1: + weights_init_LayerNorm(m, config1) + elif classname.find('TransformerLM') != -1: + weights_init_TransformerLM(m, config1) + + +def get_optimizer(_config, net, scheduler): + """ + get optimizer: adam,sgd + Args: + _config: + net: + 
scheduler: + + Returns: + optimizer: + optimizer_sparse: default is None + """ + optimizer = optimizer_sparse = None + lr = dynamic_lr() + if _config.optim.lower() == 'sgd': + if _config.sample_softmax > 0: + dense_params, sparse_params = [], [] + for param in net.trainable_params(): + if len(param) == len(net.word_emb.embedding_table): + sparse_params.append(param) + else: + dense_params.append(param) + optimizer_sparse = optim.SGD(sparse_params, learning_rate=_config.lr * 2) + optimizer = optim.SGD(dense_params, learning_rate=_config.lr, momentum=_config.mom) + else: + optimizer = optim.SGD(net.trainable_params(), learning_rate=_config.lr, + momentum=_config.mom) + elif _config.optim.lower() == 'adam': + if _config.sample_softmax > 0: + dense_params, sparse_params = [], [] + for param in net.trainable_params(): + if len(param) == len(net.word_emb.embedding_table): + sparse_params.append(param) + else: + dense_params.append(param) + optimizer_sparse = optim.SparseAdam(sparse_params, lr=lr) + optimizer = optim.Adam(dense_params, learning_rate=lr) + else: + optimizer = optim.Adam(net.trainable_params(), learning_rate=lr) + elif _config.optim.lower() == 'adagrad': + optimizer = optim.Adagrad(net.trainable_params(), learning_rate=lr) + return optimizer, optimizer_sparse + + +def rsqrt_decay(warmup_steps, current_step): + return float(max([current_step, warmup_steps])) ** -0.5 + + +def linear_warmup_learning_rate(current_step, warmup_steps, base_lr, init_lr): + lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps) + learning_rate = float(init_lr) + lr_inc * current_step + return learning_rate + + +def a_cosine_learning_rate(current_step, base_lr, warmup_steps, total_steps): + decay_steps = total_steps - warmup_steps + linear_decay = (total_steps - current_step) / decay_steps + cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * current_step / decay_steps)) + decayed = linear_decay * cosine_decay + 0.00001 + learning_rate = decayed * base_lr + return learning_rate + + +def dynamic_lr(): + """dynamic learning rate generator""" + base_lr = config.lr + total_steps = int(config.max_step) + warmup_steps = int(config.warmup_step) + lr = [] + for i in range(total_steps): + if i < warmup_steps: + lr.append(linear_warmup_learning_rate(i, warmup_steps, base_lr, base_lr * config.warmup_ratio)) + else: + lr.append(a_cosine_learning_rate(i, base_lr, warmup_steps, total_steps)) + return lr + + +def get_scheduler(_config): + scheduler = scheduler_sparse = None + if _config.scheduler == 'cosine': + # here we do not set eta_min to lr_min to be backward compatible + # because in previous versions eta_min is default to 0 + # rather than the default value of lr_min 1e-6 + from src.utils.additional_algorithms import CosineAnnealingLR + + scheduler = CosineAnnealingLR(total_step=_config.max_step, lr=_config.lr, min_lr=_config.eta_min) + + elif _config.scheduler == 'inv_sqrt': + pass + + elif _config.scheduler == 'dev_perf': + pass + elif _config.scheduler == 'constant': + pass + return scheduler, scheduler_sparse + + +def set_seed(): + np.random.seed(config.seed) + ms.set_seed(config.seed) + + +def main(): + # Set the random seed manually for reproducibility. 
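+    # Note: set_seed() below seeds both numpy and MindSpore; the variable-length batching in
+    # VariableGenerator draws from numpy.random, so it is covered by the same seed.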
+ set_seed() + + parser = argparse.ArgumentParser(description='Transformer-XL train running') + parser.add_argument('--datadir', default='./data/enwik8', + help='Directory contains enwik8 dataset.') + parser.add_argument('--dataset', default='enwik8', + help='Dataset Name.', choices=["enwik8", "text8"]) + parser.add_argument('--train_url', default="./", help='Directory of training output.') + parser.add_argument("--device", type=str, default="GPU", help="Device Target, default GPU", + choices=["Ascend", "GPU"]) + + args = parser.parse_args() + datadir = args.datadir + dataset = args.dataset + + device_id = get_device_id() + device_num = get_device_num() + + if config.device == 'Ascend': + context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id) + if device_num > 1: + context.reset_auto_parallel_context() + context.set_auto_parallel_context(device_num=device_num, parallel_mode=context.ParallelMode.DATA_PARALLEL, + gradients_mean=True) + init() + + elif config.device == 'GPU': + context.set_context(mode=context.GRAPH_MODE, device_target="GPU", max_device_memory="39.0GB", + enable_graph_kernel=True) + if device_num > 1: + init() + context.reset_auto_parallel_context() + context.set_auto_parallel_context(device_num=device_num, parallel_mode=context.ParallelMode.DATA_PARALLEL, + gradients_mean=True) + else: + context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU") + + ############################################################################### + # Load data + ############################################################################### + + dataset = get_dataset(datadir, dataset) + ntokens = len(dataset.vocab) + config.n_token = ntokens + + # adaptive softmax / embedding + cutoffs = [] + + ############################################################################### + # Build the model + ############################################################################### + + net = MemTransformerLM(ntokens, config.n_layer, config.n_head, config.d_model, + config.d_head, config.d_inner, config.dropout, config.dropatt, + batch_size=config.batch_size, + d_embed=config.d_embed, div_val=config.div_val, + pre_lnorm=config.pre_lnorm, tgt_len=config.tgt_len, + ext_len=config.ext_len, mem_len=config.mem_len, eval_tgt_len=config.eval_tgt_len, + cutoffs=cutoffs, same_length=config.same_length, clamp_len=config.clamp_len) + + # ensure embedding init is not overridden by out_layer in case of weight sharing + weights_init(net, config) + weights_init(net.word_emb, config) + + config.n_all_param = sum([p.size for p in net.trainable_params()]) + config.n_nonemb_param = sum([p.size for p in net.layers.trainable_params()]) + + # scheduler + scheduler, _ = get_scheduler(config) + # optimizer + optimizer, _ = get_optimizer(config, net, scheduler) + + if device_id == 0: + print('=' * 100) + for k, v in config.__dict__.items(): + print(' - {} : {}'.format(k, v)) + print('=' * 100) + print('#params = {}'.format(config.n_all_param)) + print('#non emb params = {}'.format(config.n_nonemb_param)) + + ############################################################################### + # Training code + ############################################################################### + + config.n_batch = dataset.get_train_generator().n_batch + config.max_epoch = math.ceil(config.max_step / config.n_batch) + + rank_size, rank_id = get_device_num(), get_rank_id() + + train_dataset = GeneratorDataset(source=dataset.get_train_generator(), column_names=['data', 'target'], + 
                                     num_shards=rank_size, shard_id=rank_id, shuffle=False)
+    # Because of the mems mechanism, the valid and test datasets cannot be sharded across multiple devices
+    valid_dataset = GeneratorDataset(source=dataset.get_valid_generator(), column_names=['data', 'target'],
+                                     shuffle=False)
+    test_dataset = GeneratorDataset(source=dataset.get_test_generator(), column_names=['data', 'target'],
+                                    shuffle=False)
+
+    # Train #
+
+    flagModifiedCallback = FlagModifiedCallback()
+    train_log = TrainLogger(per_print_times=config.log_interval, n_batch=config.n_batch)
+    evalDuringTrain = EvalDuringTrain(dataset=valid_dataset, per_print_times=config.eval_interval,
+                                      tgt_len=config.tgt_len, ext_len=config.ext_len, mem_len=config.mem_len,
+                                      eval_tgt_len=config.eval_tgt_len)
+
+    model = Model(network=net, loss_fn=None, optimizer=optimizer, metrics=None)
+    model.train(config.max_step, train_dataset, sink_size=1,
+                callbacks=[flagModifiedCallback, train_log, evalDuringTrain])
+
+    # Test #
+
+    if device_id == 0:
+        test_loss = doEval(net=net, dataset=test_dataset, tgt_len=config.tgt_len, ext_len=config.ext_len,
+                           mem_len=config.mem_len, eval_tgt_len=config.eval_tgt_len)
+        print('=' * 100)
+        if config.dataset in ['enwik8', 'text8']:
+            print('| End of training | test loss {:5.2f} | test bpc {:9.5f}'.format(
+                test_loss, bpc(test_loss)))
+        print('=' * 100)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/research/nlp/transformer_xl/tran_model/Transformer-XL model transform.md b/research/nlp/transformer_xl/tran_model/Transformer-XL model transform.md
new file mode 100644
index 0000000000000000000000000000000000000000..b438acf601fa85458d4be533f5d1287ba48aacd2
--- /dev/null
+++ b/research/nlp/transformer_xl/tran_model/Transformer-XL model transform.md
@@ -0,0 +1,58 @@
+## Transformer-XL model transform
+
+The authors' source code on GitHub is provided in both PyTorch and TensorFlow versions. This document describes how to convert models trained with either framework into MindSpore .ckpt checkpoints, and lists the concrete steps.
+
+The original Transformer-XL code requires fairly old framework versions and does not run reliably under newer ones, so it is strongly recommended to set up the environment required by the authors' source code first. The conversion works as follows: in the environment required by the original code, train or download the corresponding model and dump its parameters to a NumPy-format .pkl file; then switch to the MindSpore environment, load the .pkl parameters into the MindSpore model and save it as a .ckpt file. To verify that the converted model runs correctly, inference on the test set is performed after the checkpoint is saved.
+
+Official source code of the paper (PyTorch and TensorFlow versions): [here](https://github.com/kimiyoung/transformer-xl)
+
+Links to the enwik8_large and text8_large models provided by the authors: [enwik8_large](http://curtis.ml.cmu.edu/datasets/pretrained_xl/tf_enwiki8/) ; [text8_large](http://curtis.ml.cmu.edu/datasets/pretrained_xl/tf_text8/)
+
+### PyTorch2MindSpore
+
+Required environment:
+
+- Python: 3.7.5
+
+- PyTorch: 0.4.0
+
+```shell
+# Step 1: copy /tran_model/torch_get_param.py and /tran_model/torch_get_param.sh into the /pytorch/ directory of the original source code
+cp "/home/transformer-xl/tran_model/torch_get_param.py" "/home/txl_author/pytorch/torch_get_param.py"
+cp "/home/transformer-xl/tran_model/torch_get_param.sh" "/home/txl_author/pytorch/torch_get_param.sh"
+# Step 2: run torch_get_param.sh under PyTorch 0.4.0 to extract the model parameters, convert them to numpy and save them as a .pkl file. [DATA_SET] is the dataset name, e.g. enwik8/text8; [WORK_DIR] is the directory containing the model (a model trained with PyTorch is named model.pt by default)
+cd /home/txl_author/pytorch/
+bash torch_get_param.sh [DATA_SET] [WORK_DIR]
+# bash torch_get_param.sh "enwik8" "/home/ganweikang/project/txl_torch/pytorch/LM-TFM-enwik8/20220322-202922/"
+# Step 3: switch to a newer PyTorch version and convert the parameters in model.state_dict to numpy. [WORK_DIR] is the directory containing the enwik8_base.pkl saved in Step 2
+cd /home/transformer-xl/tran_model/torch2msp
+bash torch2numpy.sh [DATA_SET] [WORK_DIR]
+# bash torch2numpy.sh "enwik8" "/home/ganweikang/project/txl_torch/pytorch/"
+# Step 4: switch to the MindSpore environment and run torch2msp.sh to load the numpy-format .pkl file into the MindSpore model, save it as a .ckpt file and run inference on the test set once
+cd /home/transformer-xl/tran_model/
+bash torch2msp.sh [DATA_DIR] [DATA_NAME] [TORCH_PT_PATH] [CONFIG_PATH] [DEVICE_ID(optional)]
+# bash torch2msp.sh "/home/ganweikang/project/transformer-xl/data/enwik8/" "enwik8" "/home/ganweikang/project/txl_0512/tran_model/torch2msp/enwik8_base.pkl" "/home/ganweikang/project/txl_0512/yaml/enwik8_base_eval.yaml"
+
+```
+
+### TensorFlow2MindSpore
+
+Required environment:
+
+- Python: 2.7
+
+- TensorFlow: 1.12.0
+
+```shell
+# Step 1: copy /tran_model/tf_get_param.py and /tran_model/tf_get_param.sh into the /tf/ directory of the original source code
+cp "/home/transformer-xl/tran_model/tf_get_param.py" "/home/txl_author/tf/tf_get_param.py"
+cp "/home/transformer-xl/tran_model/tf_get_param.sh" "/home/txl_author/tf/tf_get_param.sh"
+# Step 2: run tf_get_param.sh in the TensorFlow environment to extract the model parameters, convert them to numpy and save them as a .pkl file. [DATA_SET] is the dataset name, e.g. enwik8/text8.
+cd /home/txl_author/tf/
+bash tf_get_param.sh [DATA_SET]
+# Step 3: switch to the MindSpore environment and run tf2msp.sh to load the .pkl file into the MindSpore model, save it as a .ckpt file and run inference on the test set once
+bash tf2msp.sh "/home/transformer-xl/data/text8" "text8" "/home/txl_author/tf/text8_large.pkl" "/home/transformer-xl/yaml/text8_large_eval.yaml"
+```
diff --git a/research/nlp/transformer_xl/tran_model/key_mapping.py b/research/nlp/transformer_xl/tran_model/key_mapping.py
new file mode 100644
index 0000000000000000000000000000000000000000..44d736b5f46dca2119770baa2d7eda65b3015511
--- /dev/null
+++ b/research/nlp/transformer_xl/tran_model/key_mapping.py
@@ -0,0 +1,76 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""
+Build mapping file for model parameter transformation
+"""
+
+
+# mindspore -> pytorch
+def msp2torch():
+    param_dict = {}
+    param_dict["r_w_bias"] = 'r_w_bias'
+    param_dict["r_r_bias"] = 'r_r_bias'
+    param_dict['word_emb.emb_layers.0.embedding_table'] = 'word_emb.emb_layers.0.weight'
+    for i in range(0, 12):
+        param_dict[str(i) + '.attn.qkv_net.weight'] = 'layers.' + str(i) + '.dec_attn.qkv_net.weight'
+        param_dict[str(i) + '.attn.o_net.weight'] = 'layers.' + str(i) + '.dec_attn.o_net.weight'
+        param_dict[str(i) + '.attn.layer_norm.gamma'] = 'layers.' + str(i) + '.dec_attn.layer_norm.weight'
+        param_dict[str(i) + '.attn.layer_norm.beta'] = 'layers.' + str(i) + '.dec_attn.layer_norm.bias'
+        param_dict[str(i) + '.attn.r_net.weight'] = 'layers.' + str(i) + '.dec_attn.r_net.weight'
+        param_dict[str(i) + '.pos_ff.CoreNet.0.weight'] = 'layers.' + str(i) + '.pos_ff.CoreNet.0.weight'
+        param_dict[str(i) + '.pos_ff.CoreNet.0.bias'] = 'layers.' + str(i) + '.pos_ff.CoreNet.0.bias'
+        param_dict[str(i) + '.pos_ff.CoreNet.3.weight'] = 'layers.' + str(i) + '.pos_ff.CoreNet.3.weight'
+        param_dict[str(i) + '.pos_ff.CoreNet.3.bias'] = 'layers.' + str(i) + '.pos_ff.CoreNet.3.bias'
+        param_dict[str(i) + '.pos_ff.layer_norm.gamma'] = 'layers.' + str(i) + '.pos_ff.layer_norm.weight'
+        param_dict[str(i) + '.pos_ff.layer_norm.beta'] = 'layers.'
+ str(i) + '.pos_ff.layer_norm.bias' + param_dict['crit.out_layers.0.weight'] = 'crit.out_layers.0.weight' + param_dict['crit.out_layers.0.bias'] = 'crit.out_layers.0.bias' + param_dict['pos_emb.inv_freq'] = 'pos_emb.inv_freq' + with open('msp2torch_base.txt', 'w') as f: + for key, value in param_dict.items(): + line = '%s:%s\n' % (key, value) + f.write(line) + return param_dict + + +# tf -> msp +def tf2msp(): + param_dict = {} + param_dict["transformer/r_w_bias"] = 'r_w_bias' + param_dict["transformer/r_r_bias"] = 'r_r_bias' + param_dict['transformer/adaptive_embed/lookup_table'] = 'word_emb.emb_layers.0.embedding_table' + for i in range(0, 24): + param_dict['transformer/layer_' + str(i) + '/rel_attn/qkv/kernel'] = str(i) + '.attn.qkv_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/o/kernel'] = str(i) + '.attn.o_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/r/kernel'] = str(i) + '.attn.r_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/LayerNorm/gamma'] = str(i) + '.attn.layer_norm.gamma' + param_dict['transformer/layer_' + str(i) + '/rel_attn/LayerNorm/beta'] = str(i) + '.attn.layer_norm.beta' + param_dict['transformer/layer_' + str(i) + '/ff/layer_1/kernel'] = str(i) + '.pos_ff.CoreNet.0.weight' + param_dict['transformer/layer_' + str(i) + '/ff/layer_1/bias'] = str(i) + '.pos_ff.CoreNet.0.bias' + param_dict['transformer/layer_' + str(i) + '/ff/layer_2/kernel'] = str(i) + '.pos_ff.CoreNet.3.weight' + param_dict['transformer/layer_' + str(i) + '/ff/layer_2/bias'] = str(i) + '.pos_ff.CoreNet.3.bias' + param_dict['transformer/layer_' + str(i) + '/ff/LayerNorm/gamma'] = str(i) + '.pos_ff.layer_norm.gamma' + param_dict['transformer/layer_' + str(i) + '/ff/LayerNorm/beta'] = str(i) + '.pos_ff.layer_norm.beta' + with open('tf2msp_large.txt', 'w') as f: + for key, value in param_dict.items(): + line = '%s:%s\n' % (key, value) + f.write(line) + return param_dict + + +if __name__ == '__main__': + msp2torch() + tf2msp() diff --git a/research/nlp/transformer_xl/tran_model/tf2msp.sh b/research/nlp/transformer_xl/tran_model/tf2msp.sh new file mode 100644 index 0000000000000000000000000000000000000000..696b3a63b20155fbe80f46ffb329487c4ceaba87 --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/tf2msp.sh @@ -0,0 +1,77 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + + +echo 'Preprocess key mapping...' +python key_mapping.py + +echo 'Trans tensorflow model to mindspore model.' +if [ $# -lt 4 ] || [ $# -gt 5 ] +then + echo "Usage: bash tf2msp.sh [DATA_DIR] [DATA_NAME] [TF_PT_PATH] + [CONFIG_PATH] [DEVICE_ID(optional)]" +exit 1 +fi + +export DEVICE_ID=0 +if [ $# = 5 ] ; then + export DEVICE_ID=$5 +fi; + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} + +DATA_DIR=$(get_real_path $1) +if [ ! 
-d $DATA_DIR ] +then + echo "error: DATA_DIR=$DATA_DIR is not a directory" +exit 1 +fi + +DATA_NAME=$2 +PT_PATH=$3 +CONFIG_PATH=$4 + +echo "DATA_DIR="$DATA_DIR +echo "DATA_NAME="$DATA_NAME +echo "PT_PATH="$PT_PATH +echo "CONFIG_PATH="$CONFIG_PATH + +export CONFIG_PATH=${CONFIG_PATH} +export DEVICE_NUM=1 +export RANK_SIZE=$DEVICE_NUM +export RANK_ID=0 + +if [ ! -d "tf2msp_model" ]; +then + mkdir ./tf2msp_model +fi + +echo "Start evaluation for device $DEVICE_ID :)" + +python ./tf2msp/tf2msp.py \ + --device_id=$DEVICE_ID \ + --datadir=$DATA_DIR \ + --dataset=$DATA_NAME \ + --pt_path=$PT_PATH \ + --device="GPU" &> tf2msp_$DATA_NAME.log & + + diff --git a/research/nlp/transformer_xl/tran_model/tf2msp/tf2msp.py b/research/nlp/transformer_xl/tran_model/tf2msp/tf2msp.py new file mode 100644 index 0000000000000000000000000000000000000000..1fad7f9c2159b10b12fc0b50c0d59568aab2f960 --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/tf2msp/tf2msp.py @@ -0,0 +1,102 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import sys +import argparse +import pickle +import mindspore +import mindspore.ops as ops +from mindspore import context +from mindspore.dataset import GeneratorDataset +from mindspore import save_checkpoint +from src.metric.calc import bpc +from src.model.mem_transformer import MemTransformerLM +from src.model_utils.config import config +from src.utils.dataset_util import get_dataset +from src.callback.eval import doEval + +sys.path.insert(0, '../') + +parser = argparse.ArgumentParser(description='PyTorch Model Trans MindSpore Model.') +parser.add_argument('--datadir', default='./data/enwik8', + help='Directory contains enwik8 dataset.') +parser.add_argument('--dataset', default='enwik8', + help='Dataset Name.', choices=["enwik8", "text8"]) +parser.add_argument('--pt_path', default="./model.pt", help='Directory of model param.') +parser.add_argument("--device", type=str, default="GPU", help="Device Target, default GPU", + choices=["Ascend", "GPU"]) +parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.") +args = parser.parse_args() +datadir = args.datadir +dataset = args.dataset +pt_path = args.pt_path +device_id = args.device_id + +numpy_param_path = pt_path +with open(numpy_param_path, 'rb') as f: + tf_dict = pickle.load(f, encoding='bytes') + +dataset = get_dataset(datadir, dataset) +ntokens = len(dataset.vocab) + +context.set_context(device_id=device_id) +context.set_context(mode=context.GRAPH_MODE, device_target="GPU", max_device_memory="39.0GB", + enable_graph_kernel=True) + +test_dataset = GeneratorDataset(source=dataset.get_test_generator(), column_names=['data', 'target'], + shuffle=False) + +cutoffs = [] +net = MemTransformerLM(ntokens, config.n_layer, config.n_head, config.d_model, + config.d_head, config.d_inner, config.dropout, config.dropatt, batch_size=config.batch_size, + d_embed=config.d_embed, 
div_val=config.div_val, + pre_lnorm=config.pre_lnorm, tgt_len=config.tgt_len, + ext_len=config.ext_len, mem_len=config.mem_len, eval_tgt_len=config.eval_tgt_len, + cutoffs=cutoffs, same_length=config.same_length, clamp_len=config.clamp_len) + +net_dict = {} +with open('./tf2msp_large.txt', 'r') as f: + for line in f.readlines(): + tf_name, msp_name = line.strip().split(":") + net_dict[msp_name] = tf_dict[tf_name] + +transpose = ops.Transpose() + +for k in net.parameters_dict(): + if k in ('mems', 'valid_mems', 'empty_valid_mems'): + continue + if k in ('pos_emb.inv_freq', 'crit.out_layers.0.weight', 'crit.out_layers.0.bias'): + continue + if 'attn.qkv_net.weight' in k or 'attn.r_net.weight' in k or \ + 'attn.o_net.weight' in k or 'pos_ff.CoreNet.0.weight' in k or \ + 'pos_ff.CoreNet.3.weight' in k: + a = mindspore.Tensor(net_dict[k].transpose((1, 0))) + net.parameters_dict()[k].set_data(a) + else: + net.parameters_dict()[k].set_data(mindspore.Tensor(net_dict[k])) + +print('load net param') + +save_path = './tf2msp_model/' + str(args.dataset) + '_model.ckpt' +save_checkpoint(net, save_path) + +test_loss = doEval(net, test_dataset, config.tgt_len, config.ext_len, config.mem_len, config.eval_tgt_len) + +print('=' * 100) +if config.dataset in ['enwik8', 'text8']: + print('| End of test | test loss {:5.2f} | test bpc {:9.5f}'.format( + test_loss, bpc(test_loss))) + +print('=' * 100) diff --git a/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.py b/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.py new file mode 100644 index 0000000000000000000000000000000000000000..fd639512028c50127ce8170cb7142de17f7d078e --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.py @@ -0,0 +1,371 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf +import model +import data_utils +import numpy as np +from absl import flags +from gpu_utils import assign_to_gpu + +# GPU config +flags.DEFINE_integer("num_hosts", default=1, + help="Number of TPU hosts") +flags.DEFINE_integer("num_core_per_host", default=8, + help="Number of cores per host") + +# Experiment (data/checkpoint/directory) config +flags.DEFINE_string("data_dir", default="", + help="Path to tf-records directory.") +flags.DEFINE_string("record_info_dir", default="", + help="Path to local directory containing filenames.txt.") +flags.DEFINE_string("corpus_info_path", default="", + help="Path to corpus-info.json file.") +flags.DEFINE_string("model_dir", default=None, + help="Estimator model_dir.") +flags.DEFINE_bool("do_train", default=True, + help="Whether to run training.") +flags.DEFINE_bool("do_eval", default=False, + help="Whether to run eval on the dev set.") +flags.DEFINE_string("eval_ckpt_path", None, + help="Checkpoint path for do_test evaluation." 
+ "If set, model_dir will be ignored." + "If unset, will use the latest ckpt in model_dir.") +flags.DEFINE_string("warm_start_path", None, + help="Checkpoint path for warm start." + "If set, will clear Adam states." + "Note that the new model_dir should be different" + " from warm_start_path.") + +# Optimization config +flags.DEFINE_float("learning_rate", default=2.5e-4, + help="Maximum learning rate.") +flags.DEFINE_float("clip", default=0.25, + help="Gradient clipping value.") +# for cosine decay +flags.DEFINE_float("min_lr_ratio", default=0.004, + help="Minimum ratio learning rate.") +flags.DEFINE_integer("warmup_steps", default=0, + help="Number of steps for linear lr warmup.") + +# Training config +flags.DEFINE_integer("train_batch_size", default=60, + help="Size of train batch.") +flags.DEFINE_integer("eval_batch_size", default=60, + help="Size of valid batch.") +flags.DEFINE_integer("train_steps", default=100000, + help="Total number of training steps.") +flags.DEFINE_integer("iterations", default=500, + help="Number of iterations per repeat loop.") +flags.DEFINE_integer("save_steps", default=10000, + help="number of steps for model checkpointing.") + +# Evaluation config +flags.DEFINE_bool("do_test", default=False, + help="Run on the test set.") +flags.DEFINE_integer("max_eval_batch", default=-1, + help="Set -1 to turn off. Only used in test mode.") +flags.DEFINE_bool("do_eval_only", default=False, + help="Run evaluation only.") +flags.DEFINE_integer("start_eval_steps", default=10000, + help="Which checkpoint to start with in `do_eval_only` mode.") +flags.DEFINE_string("eval_split", "valid", + help="Which data split to evaluate.") + +# Model config +flags.DEFINE_integer("tgt_len", default=70, + help="Number of steps to predict") +flags.DEFINE_integer("mem_len", default=70, + help="Number of steps to cache") +flags.DEFINE_bool("same_length", default=False, + help="Same length attention") +flags.DEFINE_integer("clamp_len", default=-1, + help="Clamp length") + +flags.DEFINE_integer("n_layer", default=6, + help="Number of layers.") +flags.DEFINE_integer("d_model", default=500, + help="Dimension of the model.") +flags.DEFINE_integer("d_embed", default=500, + help="Dimension of the embeddings.") +flags.DEFINE_integer("n_head", default=10, + help="Number of attention heads.") +flags.DEFINE_integer("d_head", default=50, + help="Dimension of each attention head.") +flags.DEFINE_integer("d_inner", default=1000, + help="Dimension of inner hidden size in positionwise feed-forward.") +flags.DEFINE_float("dropout", default=0.1, + help="Dropout rate.") +flags.DEFINE_float("dropatt", default=0.1, + help="Attention dropout rate.") +flags.DEFINE_bool("untie_r", default=False, + help="untie r_w_bias and r_r_bias") + +# Adaptive Softmax / Embedding +flags.DEFINE_bool("tie_weight", default=True, + help="Tie embedding and softmax weight.") +flags.DEFINE_integer("div_val", default=1, + help="Divide the embedding size by this val for each bin") +flags.DEFINE_bool("proj_share_all_but_first", default=False, + help="True to share all but first projs, False not to share.") +flags.DEFINE_bool("proj_same_dim", default=True, + help="Project the bin with the same dimension.") + +# Parameter initialization +flags.DEFINE_enum("init", default="normal", + enum_values=["normal", "uniform"], + help="Initialization method.") +flags.DEFINE_float("init_std", default=0.02, + help="Initialization std when init is normal.") +flags.DEFINE_float("proj_init_std", default=0.01, + help="Initialization std for embedding 
projection.") +flags.DEFINE_float("init_range", default=0.1, + help="Initialization std when init is uniform.") + +FLAGS = flags.FLAGS + + +def get_model_fn(n_token, cutoffs): + def model_fn(inp, tgt, mems, is_training): + inp = tf.transpose(inp, [1, 0]) + tgt = tf.transpose(tgt, [1, 0]) + + if FLAGS.init == "uniform": + initializer = tf.initializers.random_uniform( + minval=-FLAGS.init_range, + maxval=FLAGS.init_range, + seed=None) + elif FLAGS.init == "normal": + initializer = tf.initializers.random_normal( + stddev=FLAGS.init_std, + seed=None) + proj_initializer = tf.initializers.random_normal( + stddev=FLAGS.proj_init_std, + seed=None) + + tie_projs = [False for _ in range(len(cutoffs) + 1)] + if FLAGS.proj_share_all_but_first: + for i in range(1, len(tie_projs)): + tie_projs[i] = True + + loss, new_mems = model.transformer( + dec_inp=inp, + target=tgt, + mems=mems, + n_token=n_token, + n_layer=FLAGS.n_layer, + d_model=FLAGS.d_model, + d_embed=FLAGS.d_embed, + n_head=FLAGS.n_head, + d_head=FLAGS.d_head, + d_inner=FLAGS.d_inner, + dropout=FLAGS.dropout, + dropatt=FLAGS.dropatt, + initializer=initializer, + proj_initializer=proj_initializer, + is_training=is_training, + mem_len=FLAGS.mem_len, + cutoffs=cutoffs, + div_val=FLAGS.div_val, + tie_projs=tie_projs, + input_perms=None, + target_perms=None, + head_target=None, + same_length=FLAGS.same_length, + clamp_len=FLAGS.clamp_len, + use_tpu=False, + untie_r=FLAGS.untie_r, + proj_same_dim=FLAGS.proj_same_dim) + + # number of parameters + num_params = sum([np.prod(v.shape) for v in tf.trainable_variables()]) + tf.logging.info('#params: {}'.format(num_params)) + + # format_str = '{{:<{0}s}}\t{{}}'.format( + # max([len(v.name) for v in tf.trainable_variables()])) + # for v in tf.trainable_variables(): + # tf.logging.info(format_str.format(v.name, v.get_shape())) + + if is_training: + all_vars = tf.trainable_variables() + grads = tf.gradients(loss, all_vars) + grads_and_vars = list(zip(grads, all_vars)) + return loss, new_mems, grads_and_vars + return loss, new_mems + + return model_fn + + +def single_core_graph(n_token, cutoffs, is_training, inp, tgt, mems): + model_fn = get_model_fn( + n_token=n_token, + cutoffs=cutoffs) + + model_ret = model_fn( + inp=inp, + tgt=tgt, + mems=mems, + is_training=is_training) + + return model_ret + + +def evaluate(n_token, cutoffs, ps_device): + ##### Get input function and model function + eval_input_fn, eval_record_info = data_utils.get_input_fn( + record_info_dir=FLAGS.record_info_dir, + split=FLAGS.eval_split, + per_host_bsz=FLAGS.eval_batch_size, + tgt_len=FLAGS.tgt_len, + num_core_per_host=FLAGS.num_core_per_host, + num_hosts=1, + use_tpu=False) + + num_batch = eval_record_info["num_batch"] + if FLAGS.max_eval_batch > 0: + num_batch = FLAGS.max_eval_batch + tf.logging.info("num of batches {}".format(num_batch)) + + ##### Create computational graph + eval_set = eval_input_fn({ + "batch_size": FLAGS.eval_batch_size, + "data_dir": FLAGS.data_dir}) + + input_feed, label_feed = eval_set.make_one_shot_iterator().get_next() + + inputs = tf.split(input_feed, FLAGS.num_core_per_host, 0) + labels = tf.split(label_feed, FLAGS.num_core_per_host, 0) + + per_core_bsz = FLAGS.eval_batch_size // FLAGS.num_core_per_host + tower_mems, tower_losses, tower_new_mems = [], [], [] + + for i in range(FLAGS.num_core_per_host): + with tf.device(assign_to_gpu(i, ps_device)), \ + tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE): + mems_i = [tf.placeholder(tf.float32, + [FLAGS.mem_len, per_core_bsz, FLAGS.d_model]) + 
for _ in range(FLAGS.n_layer)] + + loss_i, new_mems_i = single_core_graph( + n_token=n_token, + cutoffs=cutoffs, + is_training=False, + inp=inputs[i], + tgt=labels[i], + mems=mems_i) + + tower_mems.append(mems_i) + tower_losses.append(loss_i) + tower_new_mems.append(new_mems_i) + + saver = tf.train.Saver() + + with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess: + sess.run(tf.global_variables_initializer()) + + if FLAGS.eval_ckpt_path is None: + eval_ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) + else: + eval_ckpt_path = FLAGS.eval_ckpt_path + tf.logging.info("Evaluate {}".format(eval_ckpt_path)) + saver.restore(sess, eval_ckpt_path) + + print("=" * 100) + graph = sess.graph + # print([node.name for node in graph.as_graph_def().node]) + + # r_w_bias(8,128) --> transformer/r_w_bias(8,128) + # r_r_bias(8.128) --> transformer/r_r_bias(8,128) + + # 0.attn.qkv_net.weight(3072, 1024) --> transformer/layer_0/rel_attn/qkv/kernel(1024, 3072) + # 0.attn.o_net.weight(1024,1024) --> transformer/layer_0/rel_attn/o/kernel(1024, 1024) + # 0.attn.r_net.weight(1024,1024) --> transformer/layer_0/rel_attn/r/kernel(1024, 1024) + # 0.attn.layer_norm.gamma(1024,1) --> transformer/layer_0/rel_attn/LayerNorm/gamma(1024,1) + # 0.attn.layer_norm.beta(1024,1) --> transformer/layer_0/rel_attn/LayerNorm/beta(1024,1) + # 0.pos_ff.CoreNet.0.weight(3072, 1024) --> transformer/layer_0/ff/layer_1/kernel(3072, 1024) + # 0.pos_ff.CoreNet.0.bias(3072,1) --> transformer/layer_0/ff/layer_1/bias(1024,1) + # 0.pos_ff.CoreNet.3.weight(1024, 3072) --> transformer/layer_0/ff/layer_2/kernel(3072, 1024) + # 0.pos_ff.CoreNet.3.bias(1024,1) --> transformer/layer_0/ff/layer_2/bias(1024,1) + # 0.pos_ff.layer_norm.gamma(1024,1) --> transformer/layer_0/ff/LayerNorm/gamma(1024,1) + # 0.pos_ff.layer_norm.beta(1024,1) --> transformer/layer_0/ff/LayerNorm/beta(1024,1) + + # word_emb.emb_layers.0.embedding_table(204,1024) --> transformer/adaptive_embed/lookup_table(204, 1024) + # crit.out_layers.0.bias(204,) --> + + print("*" * 100) + param_dict = {} + param_dict["transformer/r_w_bias"] = 'r_w_bias' + param_dict["transformer/r_r_bias"] = 'r_r_bias' + param_dict['transformer/adaptive_embed/lookup_table'] = 'word_emb.emb_layers.0.embedding_table' + for i in range(0, 24): + param_dict['transformer/layer_' + str(i) + '/rel_attn/qkv/kernel'] = str(i) + '.attn.qkv_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/o/kernel'] = str(i) + '.attn.o_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/r/kernel'] = str(i) + '.attn.r_net.weight' + param_dict['transformer/layer_' + str(i) + '/rel_attn/LayerNorm/gamma'] = str(i) + '.attn.layer_norm.gamma' + param_dict['transformer/layer_' + str(i) + '/rel_attn/LayerNorm/beta'] = str(i) + '.attn.layer_norm.beta' + param_dict['transformer/layer_' + str(i) + '/ff/layer_1/kernel'] = str(i) + '.pos_ff.CoreNet.0.weight' + param_dict['transformer/layer_' + str(i) + '/ff/layer_1/bias'] = str(i) + '.pos_ff.CoreNet.0.bias' + ############### + param_dict['transformer/layer_' + str(i) + '/ff/layer_2/kernel'] = str(i) + '.pos_ff.CoreNet.3.weight' + param_dict['transformer/layer_' + str(i) + '/ff/layer_2/bias'] = str(i) + '.pos_ff.CoreNet.3.bias' + ############### + param_dict['transformer/layer_' + str(i) + '/ff/LayerNorm/gamma'] = str(i) + '.pos_ff.layer_norm.gamma' + param_dict['transformer/layer_' + str(i) + '/ff/LayerNorm/beta'] = str(i) + '.pos_ff.layer_norm.beta' + + tf_dict = {} + for node in graph.as_graph_def().node: + if node.name in 
param_dict.keys(): + print(node.name) + node_data = graph.get_operation_by_name(node.name).outputs[0] + data_np = sess.run(node_data) + print(type(data_np)) + print(data_np.shape) + print(data_np) + + tf_dict[node.name] = data_np + print("*" * 100) + + import pickle + + if 'enwik8' in FLAGS.model_dir: + with open('./enwik8_large.pkl', 'wb') as f: + pickle.dump(tf_dict, f) + if 'text8' in FLAGS.model_dir: + with open('./text8_large.pkl', 'wb') as f: + pickle.dump(tf_dict, f) + + print("=" * 100) + print("finish!") + + +def main(unused_argv): + del unused_argv # Unused + + tf.logging.set_verbosity(tf.logging.INFO) + + # Get corpus info + corpus_info = data_utils.get_corpus_info(FLAGS.corpus_info_path) + n_token = corpus_info["vocab_size"] + cutoffs = corpus_info["cutoffs"][1:-1] + tf.logging.info("n_token {}".format(n_token)) + + evaluate(n_token, cutoffs, "/gpu:0") + + +if __name__ == "__main__": + tf.app.run() diff --git a/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.sh b/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.sh new file mode 100644 index 0000000000000000000000000000000000000000..323c60aa88f5fa2688597fa9e7f899ca1df0804f --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/tf2msp/tf_get_param.sh @@ -0,0 +1,140 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +export CUDA_VISIBLE_DEVICES="0,1,2,3" + +echo 'Trans TensorFlow model to Numpy.' +if [ $# -lt 1 ] ; then + echo "Usage: bash torch_get_param.sh [DATA_SET]" +exit 1 +fi + +if [ "$1" == "enwik8" ]; then + # Data + DATA_ROOT=./ + DATA_DIR=${DATA_ROOT}/pretrained_xl/tf_enwik8/data + MODEL_DIR=${DATA_ROOT}/pretrained_xl/tf_enwik8/model + + # Model + N_LAYER=24 + D_MODEL=1024 + D_EMBED=1024 + N_HEAD=8 + D_HEAD=128 + D_INNER=3072 + + # Testing + TEST_TGT_LEN=128 + TEST_MEM_LEN=3800 + TEST_CLAMP_LEN=1000 + + TEST_CKPT_PATH=${MODEL_DIR}/model.ckpt-0 + TEST_BSZ=2 + TEST_NUM_CORE=2 + + echo 'Preprocess test set...' + python data_utils.py \ + --data_dir=${DATA_DIR}/ \ + --dataset=enwik8 \ + --tgt_len=${TEST_TGT_LEN} \ + --per_host_test_bsz=${TEST_BSZ} \ + --num_passes=1 \ + --use_tpu=False + + echo 'Run evaluation on test set...' 
+ python tf_get_param.py \ + --data_dir=${DATA_DIR}/tfrecords \ + --record_info_dir=${DATA_DIR}/tfrecords/ \ + --corpus_info_path=${DATA_DIR}/corpus-info.json \ + --eval_ckpt_path=${TEST_CKPT_PATH} \ + --model_dir=EXP-enwik8 \ + --n_layer=${N_LAYER} \ + --d_model=${D_MODEL} \ + --d_embed=${D_EMBED} \ + --n_head=${N_HEAD} \ + --d_head=${D_HEAD} \ + --d_inner=${D_INNER} \ + --dropout=0.0 \ + --dropatt=0.0 \ + --tgt_len=${TEST_TGT_LEN} \ + --mem_len=${TEST_MEM_LEN} \ + --clamp_len=${TEST_CLAMP_LEN} \ + --same_length=True \ + --eval_batch_size=${TEST_BSZ} \ + --num_core_per_host=${TEST_NUM_CORE} \ + --do_train=False \ + --do_eval=True \ + --eval_split=test + +fi + +if [ "$1" == "text8" ]; then + # Data + DATA_ROOT=./ + DATA_DIR=${DATA_ROOT}/pretrained_xl/tf_text8/data + MODEL_DIR=${DATA_ROOT}/pretrained_xl/tf_text8/model + + # Model + N_LAYER=24 + D_MODEL=1024 + D_EMBED=1024 + N_HEAD=8 + D_HEAD=128 + D_INNER=3072 + + # Testing + TEST_TGT_LEN=128 + TEST_MEM_LEN=3800 + TEST_CLAMP_LEN=1000 + + TEST_CKPT_PATH=${MODEL_DIR}/model.ckpt-0 + TEST_BSZ=2 + TEST_NUM_CORE=2 + + echo 'Preprocess test set...' + python data_utils.py \ + --data_dir=${DATA_DIR}/ \ + --dataset=text8 \ + --tgt_len=${TEST_TGT_LEN} \ + --per_host_test_bsz=${TEST_BSZ} \ + --num_passes=1 \ + --use_tpu=False + + echo 'Run evaluation on test set...' + python tf_get_param.py \ + --data_dir=${DATA_DIR}/tfrecords \ + --record_info_dir=${DATA_DIR}/tfrecords/ \ + --corpus_info_path=${DATA_DIR}/corpus-info.json \ + --eval_ckpt_path=${TEST_CKPT_PATH} \ + --model_dir=EXP-text8 \ + --n_layer=${N_LAYER} \ + --d_model=${D_MODEL} \ + --d_embed=${D_EMBED} \ + --n_head=${N_HEAD} \ + --d_head=${D_HEAD} \ + --d_inner=${D_INNER} \ + --dropout=0.0 \ + --dropatt=0.0 \ + --tgt_len=${TEST_TGT_LEN} \ + --mem_len=${TEST_MEM_LEN} \ + --clamp_len=${TEST_CLAMP_LEN} \ + --same_length=True \ + --eval_batch_size=${TEST_BSZ} \ + --num_core_per_host=${TEST_NUM_CORE} \ + --do_train=False \ + --do_eval=True \ + --eval_split=test +fi \ No newline at end of file diff --git a/research/nlp/transformer_xl/tran_model/torch2msp.sh b/research/nlp/transformer_xl/tran_model/torch2msp.sh new file mode 100644 index 0000000000000000000000000000000000000000..24d998976ad43c95e4604e6b1fe95de023a07ea0 --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/torch2msp.sh @@ -0,0 +1,77 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + + +echo 'Preprocess key mapping...' +python key_mapping.py + +echo 'Trans pytorch model to mindspore model.' +if [ $# -lt 4 ] || [ $# -gt 5 ] +then + echo "Usage: bash torch2msp.sh [DATA_DIR] [DATA_NAME] [TORCH_PT_PATH] + [CONFIG_PATH] [DEVICE_ID(optional)]" +exit 1 +fi + +export DEVICE_ID=0 +if [ $# = 5 ] ; then + export DEVICE_ID=$5 +fi; + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} + +DATA_DIR=$(get_real_path $1) +if [ ! 
-d $DATA_DIR ] +then + echo "error: DATA_DIR=$DATA_DIR is not a directory" +exit 1 +fi + +DATA_NAME=$2 +PT_PATH=$3 +CONFIG_PATH=$4 + +echo "DATA_DIR="$DATA_DIR +echo "DATA_NAME="$DATA_NAME +echo "PT_PATH="$PT_PATH +echo "CONFIG_PATH="$CONFIG_PATH + +export CONFIG_PATH=${CONFIG_PATH} +export DEVICE_NUM=1 +export RANK_SIZE=$DEVICE_NUM +export RANK_ID=0 + +if [ ! -d "torch2msp_model" ]; +then + mkdir ./torch2msp_model +fi + +echo "Start evaluation for device $DEVICE_ID :)" + +python ./torch2msp/torch2msp.py \ + --device_id=$DEVICE_ID \ + --datadir=$DATA_DIR \ + --dataset=$DATA_NAME \ + --pt_path=$PT_PATH \ + --device="GPU" &> torch2msp_$DATA_NAME.log & + + diff --git a/research/nlp/transformer_xl/tran_model/torch2msp/torch2msp.py b/research/nlp/transformer_xl/tran_model/torch2msp/torch2msp.py new file mode 100644 index 0000000000000000000000000000000000000000..8903f0854049d2d1259d16f9ebfd595d2a3a081e --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/torch2msp/torch2msp.py @@ -0,0 +1,91 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import sys +import argparse +import pickle +import mindspore +from mindspore import context +from mindspore import save_checkpoint +from mindspore.dataset import GeneratorDataset +from src.utils.dataset_util import get_dataset +from src.metric.calc import bpc +from src.model.mem_transformer import MemTransformerLM +from src.model_utils.config import config +from src.callback.eval import doEval + +sys.path.insert(0, '../') + +parser = argparse.ArgumentParser(description='PyTorch Model Trans MindSpore Model.') +parser.add_argument('--datadir', default='./data/enwik8', + help='Directory contains enwik8 dataset.') +parser.add_argument('--dataset', default='enwik8', + help='Dataset Name.', choices=["enwik8", "text8"]) +parser.add_argument('--pt_path', default="./model.pt", help='Directory of model param.') +parser.add_argument("--device", type=str, default="GPU", help="Device Target, default GPU", + choices=["Ascend", "GPU"]) +parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.") +args = parser.parse_args() +datadir = args.datadir +dataset = args.dataset +pt_path = args.pt_path +device_id = args.device_id + +numpy_param_path = pt_path +with open(numpy_param_path, 'rb') as f: + torch_dict = pickle.load(f) + +dataset = get_dataset(datadir, dataset) +ntokens = len(dataset.vocab) + +context.set_context(device_id=device_id) +context.set_context(mode=context.GRAPH_MODE, device_target="GPU", max_device_memory="39.0GB", + enable_graph_kernel=True) + +test_dataset = GeneratorDataset(source=dataset.get_test_generator(), column_names=['data', 'target'], + shuffle=False) + +cutoffs = [] +net = MemTransformerLM(ntokens, config.n_layer, config.n_head, config.d_model, + config.d_head, config.d_inner, config.dropout, config.dropatt, batch_size=config.batch_size, + d_embed=config.d_embed, div_val=config.div_val, + 
pre_lnorm=config.pre_lnorm, tgt_len=config.tgt_len, + ext_len=config.ext_len, mem_len=config.mem_len, eval_tgt_len=config.eval_tgt_len, + cutoffs=cutoffs, same_length=config.same_length, clamp_len=config.clamp_len) + +net_dict = {} +with open('./msp2torch_base.txt', 'r') as f: + for line in f.readlines(): + msp_name, pytorch_name = line.strip().split(":") + net_dict[msp_name] = torch_dict[pytorch_name] + +for k in net.parameters_dict(): + if k in ('mems', 'valid_mems', 'empty_valid_mems'): + continue + net.parameters_dict()[k].set_data(mindspore.Tensor(net_dict[k])) + +print('load net param') + +save_path = './torch2msp_model/' + str(args.dataset) + '_model.ckpt' +save_checkpoint(net, save_path) + +test_loss = doEval(net, test_dataset, config.tgt_len, config.ext_len, config.mem_len, config.eval_tgt_len) + +print('=' * 100) +if config.dataset in ['enwik8', 'text8']: + print('| End of test | test loss {:5.2f} | test bpc {:9.5f}'.format( + test_loss, bpc(test_loss))) + +print('=' * 100) diff --git a/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.py b/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.py new file mode 100644 index 0000000000000000000000000000000000000000..4fbfc06bdd90eef5a674b47d2c8b1737c7d239b7 --- /dev/null +++ b/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.py @@ -0,0 +1,48 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+import argparse
+import os
+import pickle
+import torch
+
+# PyTorch 1.x is required for the numpy conversion
+parser = argparse.ArgumentParser(description='PyTorch Model Trans numpy.')
+parser.add_argument('--dataset', default='enwik8',
+                    help='Dataset Name.', choices=["enwik8", "text8"])
+parser.add_argument('--work_dir', default="./enwik8_base.pkl", help='Directory of model param.')
+args = parser.parse_args()
+
+torch_model_path = args.work_dir
+torch_param = None
+if args.dataset == 'enwik8':
+    torch_param = torch.load(os.path.join(torch_model_path, 'enwik8_base.pkl'))
+if args.dataset == 'text8':
+    torch_param = torch.load(os.path.join(torch_model_path, 'text8_base.pkl'))
+
+if not torch_param:
+    print('no torch param model.')
+    exit()
+torch_dict = {}
+for key in torch_param.keys():
+    torch_dict[key] = torch_param[key].cpu().numpy()
+
+if args.dataset == 'enwik8':
+    with open('./enwik8_base.pkl', 'wb') as f:
+        pickle.dump(torch_dict, f)
+if args.dataset == 'text8':
+    with open('./text8_base.pkl', 'wb') as f:
+        pickle.dump(torch_dict, f)
+print('finish!')
diff --git a/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.sh b/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.sh
new file mode 100644
index 0000000000000000000000000000000000000000..eaa7b55d5907cf855a0c9e5edaa0e318e160addf
--- /dev/null
+++ b/research/nlp/transformer_xl/tran_model/torch2msp/torch2numpy.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+echo 'Trans pytorch dict model to Numpy.'
+if [ $# -lt 2 ] ; then
+  echo "Usage: bash torch2numpy.sh [DATA_SET] [WORK_DIR]"
+exit 1
+fi
+
+python torch2numpy.py \
+    --dataset $1 \
+    --work_dir $2
\ No newline at end of file
diff --git a/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.py b/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.py
new file mode 100644
index 0000000000000000000000000000000000000000..ce527889edd8a2cb68f2e45630c549028dddebde
--- /dev/null
+++ b/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.py
@@ -0,0 +1,44 @@
+# coding: utf-8
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+import argparse
+import os
+import torch
+
+parser = argparse.ArgumentParser(description='PyTorch Transformer Language Model')
+parser.add_argument('--dataset', type=str, default='wt103',
+                    choices=['wt103', 'lm1b', 'enwik8', 'text8'],
+                    help='dataset name')
+parser.add_argument('--cuda', action='store_true',
+                    help='use CUDA')
+parser.add_argument('--work_dir', type=str, required=True,
+                    help='path to the work_dir')
+
+args = parser.parse_args()
+
+device = torch.device("cuda" if args.cuda else "cpu")
+
+# Load the best saved model.
+with open(os.path.join(args.work_dir, 'model.pt'), 'rb') as f:
+    model = torch.load(f)
+model.backward_compatible()
+model = model.to(device)
+
+# Save the model parameters
+if 'enwik8' in args.dataset:
+    torch.save(model.state_dict(), 'enwik8_base.pkl')
+if 'text8' in args.dataset:
+    torch.save(model.state_dict(), 'text8_base.pkl')
+print('finish')
diff --git a/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.sh b/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.sh
new file mode 100644
index 0000000000000000000000000000000000000000..9ef29b4b6be7211782f4604ae76b05e0510d39c1
--- /dev/null
+++ b/research/nlp/transformer_xl/tran_model/torch2msp/torch_get_param.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+echo 'Trans pytorch model to dict_model.'
+if [ $# -lt 2 ] ; then
+  echo "Usage: bash torch_get_param.sh [DATA_SET] [WORK_DIR]"
+exit 1
+fi
+
+python torch_get_param.py \
+    --cuda \
+    --dataset $1 \
+    --work_dir $2
\ No newline at end of file
diff --git a/research/nlp/transformer_xl/yaml/enwik8_base.yaml b/research/nlp/transformer_xl/yaml/enwik8_base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..0473d99ad5a37f7c587df81b8353549f8e24e8dd
--- /dev/null
+++ b/research/nlp/transformer_xl/yaml/enwik8_base.yaml
@@ -0,0 +1,94 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================ + +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) +enable_modelarts: False +# Url for modelarts +data_url: "" +train_url: "experiments" +checkpoint_url: "experiments" + +# Path for local +datadir: "/home/mindspore/msp_txl/official/nlp/transformer_xl/data/enwik8" +dataset: "enwik8" +ckpt_path: "/home/mindspore/msp_txl/official/nlp/transformer_xl/script/experiments-enwik8/20220416-140816/model0.ckpt" +device: "GPU" +device_id: 0 + +# ============================================================================== +# Training options + +n_layer: 12 +n_head: 8 +d_head: 64 +d_embed: 512 +d_model: 512 +d_inner: 2048 +dropout: 0.1 +dropatt: 0.0 +optim: "adam" +scheduler: "cosine" +lr: 0.00025 +lr_min: 0.0 +warmup_step: 0 +max_step: 400000 +log-interval: 200 +eval-interval: 4000 +batch_size: 22 +tgt_len: 512 +eval_tgt_len: 128 +mem_len: 512 +clamp_len: -1 +init: "normal" +emb_init: "normal" +init_range: 0.1 +emb_init_range: 0.01 +init_std: 0.02 +proj_init_std: 0.01 +mom: 0.0 +decay_rate: 0.5 +clip: 0.25 +batch_chunk: 1 +seed: 1111 +div_val: 1 +attn_type: 0 +ext_len: 0 +eta_min: 0.0 +max_eval_steps: -1 +sample_softmax: -1 +patience: 0 +adaptive: False +varlen: False +pre_lnorm: False +same_length: False + +# Model Description + + + +--- +# Config description for each option +enable_modelarts: 'Whether training on modelarts, default: False' +data_url: 'Dataset url for obs' +train_url: 'Training output url for obs' +data_path: 'Dataset path for local' +output_path: 'Training output path for local' + +device_target: 'Target device type' +enable_profiling: 'Whether enable profiling while training, default: False' + + +--- +device_target: [ 'Ascend', 'GPU', 'CPU' ] diff --git a/research/nlp/transformer_xl/yaml/enwik8_large.yaml b/research/nlp/transformer_xl/yaml/enwik8_large.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b8e18c2eba8c032a306465915cbc65b9c6d22513 --- /dev/null +++ b/research/nlp/transformer_xl/yaml/enwik8_large.yaml @@ -0,0 +1,94 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) +enable_modelarts: False +# Url for modelarts +data_url: "" +train_url: "experiments" +checkpoint_url: "experiments" + +# Path for local +datadir: "/home/mindspore/msp_txl/official/nlp/transformer_xl/data/enwik8" +dataset: "enwik8" +ckpt_path: "/home/mindspore/msp_txl/official/nlp/transformer_xl/script/experiments-enwik8/20220416-140816/model0.ckpt" +device: "GPU" +device_id: 0 + +# ============================================================================== +# Training options + +n_layer: 24 +n_head: 8 +d_head: 128 +d_embed: -1 +d_model: 1024 +d_inner: 3072 +dropout: 0.15 +dropatt: 0.15 +optim: "adam" +scheduler: "cosine" +lr: 0.00025 +lr_min: 0.0 +warmup_step: 0 +max_step: 400000 +log-interval: 200 +eval-interval: 4000 +batch_size: 64 +tgt_len: 786 +eval_tgt_len: 128 +ext_len: 0 +mem_len: 786 +clamp_len: -1 +init: "normal" +emb_init: "normal" +init_range: 0.1 +emb_init_range: 0.01 +init_std: 0.02 +proj_init_std: 0.01 +mom: 0.0 +decay_rate: 0.5 +clip: 0.25 +batch_chunk: 1 +seed: 1111 +div_val: 1 +attn_type: 0 +eta_min: 0.0 +max_eval_steps: -1 +sample_softmax: -1 +patience: 0 +adaptive: False +varlen: False +pre_lnorm: False +same_length: False + +# Model Description + + + +--- +# Config description for each option +enable_modelarts: 'Whether training on modelarts, default: False' +data_url: 'Dataset url for obs' +train_url: 'Training output url for obs' +data_path: 'Dataset path for local' +output_path: 'Training output path for local' + +device_target: 'Target device type' +enable_profiling: 'Whether enable profiling while training, default: False' + + +--- +device_target: [ 'Ascend', 'GPU', 'CPU' ] diff --git a/research/nlp/transformer_xl/yaml/text8_large.yaml b/research/nlp/transformer_xl/yaml/text8_large.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b8e18c2eba8c032a306465915cbc65b9c6d22513 --- /dev/null +++ b/research/nlp/transformer_xl/yaml/text8_large.yaml @@ -0,0 +1,94 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing) +enable_modelarts: False +# Url for modelarts +data_url: "" +train_url: "experiments" +checkpoint_url: "experiments" + +# Path for local +datadir: "/home/mindspore/msp_txl/official/nlp/transformer_xl/data/enwik8" +dataset: "enwik8" +ckpt_path: "/home/mindspore/msp_txl/official/nlp/transformer_xl/script/experiments-enwik8/20220416-140816/model0.ckpt" +device: "GPU" +device_id: 0 + +# ============================================================================== +# Training options + +n_layer: 24 +n_head: 8 +d_head: 128 +d_embed: -1 +d_model: 1024 +d_inner: 3072 +dropout: 0.15 +dropatt: 0.15 +optim: "adam" +scheduler: "cosine" +lr: 0.00025 +lr_min: 0.0 +warmup_step: 0 +max_step: 400000 +log-interval: 200 +eval-interval: 4000 +batch_size: 64 +tgt_len: 786 +eval_tgt_len: 128 +ext_len: 0 +mem_len: 786 +clamp_len: -1 +init: "normal" +emb_init: "normal" +init_range: 0.1 +emb_init_range: 0.01 +init_std: 0.02 +proj_init_std: 0.01 +mom: 0.0 +decay_rate: 0.5 +clip: 0.25 +batch_chunk: 1 +seed: 1111 +div_val: 1 +attn_type: 0 +eta_min: 0.0 +max_eval_steps: -1 +sample_softmax: -1 +patience: 0 +adaptive: False +varlen: False +pre_lnorm: False +same_length: False + +# Model Description + + + +--- +# Config description for each option +enable_modelarts: 'Whether training on modelarts, default: False' +data_url: 'Dataset url for obs' +train_url: 'Training output url for obs' +data_path: 'Dataset path for local' +output_path: 'Training output path for local' + +device_target: 'Target device type' +enable_profiling: 'Whether enable profiling while training, default: False' + + +--- +device_target: [ 'Ascend', 'GPU', 'CPU' ]
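
To make the conversion flow described in `Transformer-XL model transform.md` concrete, the following is a minimal sketch of the final .pkl-to-.ckpt step. It is not part of the patch above: the tiny network, the parameter names, and the inline mapping table are stand-ins for `MemTransformerLM` and the mapping file written by `key_mapping.py`; only the general pattern (load numpy arrays from a pickle, rename them with a mapping, transpose kernels dumped from TensorFlow, then `set_data` and `save_checkpoint`) follows the scripts in this patch.

```python
# conversion_sketch.py -- illustrative only; names, shapes, and values are fabricated.
import pickle
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, save_checkpoint


class TinyNet(nn.Cell):
    """Stand-in for MemTransformerLM: one Dense layer is enough to show the flow."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Dense(4, 3)

    def construct(self, x):
        return self.proj(x)


# 1. Parameters dumped as numpy arrays in the source framework (here: fabricated).
#    A TensorFlow kernel is stored as (in_dim, out_dim), so it has to be transposed
#    before it fits a MindSpore Dense weight of shape (out_dim, in_dim).
src_params = {
    'layer_0/proj/kernel': np.random.randn(4, 3).astype(np.float32),
    'layer_0/proj/bias': np.zeros(3, dtype=np.float32),
}
with open('src_params.pkl', 'wb') as f:
    pickle.dump(src_params, f)

# 2. Name mapping of the kind key_mapping.py writes out, source_name -> mindspore_name.
mapping = {
    'layer_0/proj/kernel': 'proj.weight',
    'layer_0/proj/bias': 'proj.bias',
}

# 3. Load the pickle, rename the entries, transpose kernels, and push them into the network.
net = TinyNet()
with open('src_params.pkl', 'rb') as f:
    src = pickle.load(f)
for src_name, msp_name in mapping.items():
    value = src[src_name]
    if msp_name.endswith('.weight'):  # kernels need the (in, out) -> (out, in) flip
        value = value.transpose((1, 0))
    net.parameters_dict()[msp_name].set_data(Tensor(value))

# 4. Save a MindSpore checkpoint; the real scripts then run one pass over the test set.
save_checkpoint(net, 'tiny_net.ckpt')
print('saved tiny_net.ckpt')
```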