# Contents
[View English](./README.md)
<!-- TOC -->
- [Contents](#contents)
- [Jasper Description](#jasper-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description and Usage](#script-description-and-usage)
- [Code Directory Structure](#code-directory-structure)
- [Model Parameters](#model-parameters)
- [Training and Inference Process](#training-and-inference-process)
- [Export](#export)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Jasper Description](#contents)
Jasper is an end-to-end speech recognition model trained with CTC loss. The Jasper model uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. Training and evaluation are supported on CPU and GPU.
[Paper](https://arxiv.org/pdf/1904.03288v3.pdf): Jason Li, et al. Jasper: An End-to-End Convolutional Neural Acoustic Model.
## [Model Architecture](#contents)
Jasper is an end-to-end convolutional neural acoustic model. In the audio processing stage, each frame is transformed into mel-scale spectrogram features, which the acoustic model takes as input; for each frame it outputs a probability distribution over the vocabulary. The acoustic model has a modular block structure and can be parameterized accordingly: a Jasper BxR model has B blocks, each consisting of R repeating sub-blocks.
Each sub-block applies the following operations in sequence:
1D-convolution, batch normalization, ReLU activation, and dropout.
The input of each block is connected directly to the last sub-block of all following blocks via residual connections, which the paper calls dense residual. Blocks differ in kernel size and number of filters, which grow from the bottom to the top layers. Irrespective of the exact block configuration parameters B and R, every Jasper model has four additional convolutional blocks: one immediately after the input layer and three at the end of the B blocks.
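As an illustration of one sub-block, here is a minimal MindSpore sketch. It is not the repository's actual `model.py`; following the weight layout hinted at by `pt2mind.py`, the 1D convolution is emulated with a (1, k) 2D convolution, and the channel counts and `keep_prob` value are placeholders:

```python
import mindspore.nn as nn

class JasperSubBlock(nn.Cell):
    """One Jasper sub-block: 1D conv -> batch norm -> ReLU -> dropout."""
    def __init__(self, in_channels, out_channels, kernel_size, keep_prob=0.9):
        super(JasperSubBlock, self).__init__()
        # a (1, k) 2D conv over input of shape (batch, channels, 1, time)
        # behaves like a 1D conv over the time axis
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               (1, kernel_size), pad_mode='same')
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(keep_prob=keep_prob)  # MindSpore 1.8 API

    def construct(self, x):
        return self.dropout(self.relu(self.batchnorm(self.conv1(x))))
```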
## [Dataset](#contents)
You can run the scripts based on the dataset mentioned in the original paper or one widely used in the relevant domain/network architecture. The following sections describe how to run the scripts using the dataset below.
Dataset used: [LibriSpeech](<http://www.openslr.org/12>)
Training set:
train-clean-100: [6.3G] (100 hours of "clean" speech)
train-clean-360.tar.gz [23G] (360 hours of "clean" speech)
train-other-500.tar.gz [30G] (500 hours of noisier "other" speech)
Validation set:
dev-clean.tar.gz [337M] ("clean" speech)
dev-other.tar.gz [314M] ("other" speech)
Test set:
test-clean.tar.gz [346M] (test set, "clean" speech)
test-other.tar.gz [328M] (test set, "other" speech)
Data format: wav and txt files
## [Environment Requirements](#contents)
Hardware (GPU)
Prepare a hardware environment with a GPU processor.
Framework
[MindSpore](https://www.mindspore.cn/install/en)
For more information, please check the resources below:
[MindSpore tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
[MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
## [Script Description and Usage](#contents)
### [Code Directory Structure](#contents)
```path
.
└─audio
    └─jasper
        │  eval.py                          //inference script
        │  labels.json                      //character set used by the model
        │  pt2mind.py                       //converts a pth checkpoint to ckpt
        |  create_mindrecord.py             //converts the dataset to MindRecord
        │  README-CN.md                     //Chinese readme
        │  README.md                        //English readme
        │  requirements.txt                 //required libraries
        │  train.py                         //training script
        ├─scripts
        │      download_librispeech.sh      //downloads the dataset
        │      preprocess_librispeech.sh    //preprocesses the dataset
        │      run_distribute_train_gpu.sh  //8-GPU distributed training
        │      run_eval_cpu.sh              //CPU inference
        │      run_eval_gpu.sh              //GPU inference
        │      run_standalone_train_cpu.sh  //single-CPU training
        │      run_standalone_train_gpu.sh  //single-GPU training
        ├─src
        │      audio.py                     //audio processing
        │      callback.py                  //callbacks to monitor training
        │      cleaners.py                  //text cleaning
        │      config.py                    //jasper configuration
        │      dataset.py                   //data processing
        │      decoder.py                   //third-party decoder
        │      eval_callback.py             //evaluation callback
        │      greedydecoder.py             //greedy decoder adapted from MindSpore code
        │      jasper10x5dr_speca.yaml      //jasper network configuration
        │      lr_generator.py              //learning-rate generation
        │      model.py                     //training model
        │      model_test.py                //inference model
        │      number.py                    //number normalization
        │      text.py                      //text processing
        │      __init__.py
        └─utils
                convert_librispeech.py      //converts the dataset
                download_librispeech.py     //downloads the dataset
                download_utils.py           //download utilities
                inference_librispeech.csv   //links to inference datasets
                librispeech.csv             //links to the full dataset
                preprocessing_utils.py      //preprocessing utilities
                __init__.py
```
### [Model Parameters](#contents)
Parameters for training and inference can be set in the `config.py` file.
```text
config for training.
    epochs                  number of training epochs, default is 440
```
```text
config for dataloader.
    train_manifest          path of the manifest files used for training, default is 'data/libri_train_manifest.json'
    val_manifest            path of the manifest files used for testing, default is 'data/libri_val_manifest.json'
    batch_size              batch size, default is 64
    labels_path             path of the tokens json for model output, default is "./labels.json"
    sample_rate             sample rate of the data features, default is 16000
    window_size             window size for spectrogram generation (seconds), default is 0.02
    window_stride           window stride for spectrogram generation (seconds), default is 0.01
    window                  window type for spectrogram generation, default is 'hamming'
    speed_volume_perturb    use random tempo and gain perturbations, default is False, not used in the current model
    spec_augment            use simple spectral augmentation on mel spectrograms, default is False, not used in the current model
    noise_dir               directory from which noise is injected into the audio; the default '' disables noise injection, not used in the current model
    noise_prob              probability of noise being added per sample, default is 0.4, not used in the current model
    noise_min               minimum noise level to sample from (1.0 means all noise and none of the original signal), default is 0.0, not used in the current model
    noise_max               maximum noise level to sample from (maximum 1.0), default is 0.5, not used in the current model
```
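For orientation, `window_size` and `window_stride` are given in seconds, so at a 16 kHz sample rate they translate into the following frame sizes (plain arithmetic, not repository code):

```python
sample_rate = 16000
window_size, window_stride = 0.02, 0.01    # defaults above, in seconds
print(int(sample_rate * window_size))      # 320 samples per analysis window
print(int(sample_rate * window_stride))    # 160 samples between frame starts
```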
```text
config for optimizer.
    learning_rate           initial learning rate, default is 3e-4
    learning_anneal         annealing applied to the learning rate after each epoch, default is 1.1
    weight_decay            weight decay, default is 1e-5
    momentum                momentum, default is 0.9
    eps                     Adam eps, default is 1e-8
    betas                   Adam betas, default is (0.9, 0.999)
    loss_scale              loss scale, default is 1024
```
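The anneal factor divides the learning rate once per epoch; a minimal sketch of the implied schedule (the actual schedule is produced by `src/lr_generator.py` and may differ):

```python
initial_lr, learning_anneal = 3e-4, 1.1

def annealed_lr(epoch):
    # assumed schedule: divide by the anneal factor once per finished epoch
    return initial_lr / (learning_anneal ** epoch)

print([round(annealed_lr(e), 6) for e in range(5)])
# [0.0003, 0.000273, 0.000248, 0.000225, 0.000205]
```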
```text
config for checkpoint.
    ckpt_file_name_prefix   prefix of the ckpt file names, default is 'Jasper'
    ckpt_path               path where ckpt files are saved, default is 'checkpoints'
    keep_checkpoint_max     maximum number of ckpt files to keep (older checkpoints are deleted), default is 10
```
## [Training and Inference Process](#contents)
### Training
```text
usage: train.py [--use_pretrained USE_PRETRAINED]
                [--pre_trained_model_path PRE_TRAINED_MODEL_PATH]
                [--is_distributed IS_DISTRIBUTED]
                [--bidirectional BIDIRECTIONAL]
                [--device_target DEVICE_TARGET]
options:
    --pre_trained_model_path    pretrained model checkpoint path, default is ''
    --is_distributed            multi-device training, default is False
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --device_target             device on which the code runs: "GPU" | "CPU", default is "GPU"
```
### Inference
```text
usage: eval.py [--bidirectional BIDIRECTIONAL]
               [--pretrain_ckpt PRETRAIN_CKPT]
               [--device_target DEVICE_TARGET]
options:
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --pretrain_ckpt             checkpoint file path, default is ''
    --device_target             device on which the code runs: "GPU" | "CPU", default is "GPU"
```
Before training, the dataset should be downloaded and processed.
```bash
bash scripts/download_librispeech.sh
bash scripts/preprocess_librispeech.sh
python create_mindrecord.py  # convert the dataset to MindRecord format
```
After these steps finish, the data directory structure is as follows:
```path
.
|--LibriSpeech
│ |--train-clean-100-wav
│ │--train-clean-360-wav
│ │--train-other-500-wav
│ |--dev-clean-wav
│ |--dev-other-wav
│ |--test-clean-wav
│ |--test-other-wav
|--librispeech-train-clean-100-wav.json
|--librispeech-train-clean-360-wav.json
|--librispeech-train-other-500-wav.json
|--librispeech-dev-clean-wav.json
|--librispeech-dev-other-wav.json
|--librispeech-test-clean-wav.json
|--librispeech-test-other-wav.json
```
Set the dataset locations in `src/config.py`.
```shell
...
train config
"Data_dir": '/data/dataset',
"train_manifest": ['/data/dataset/librispeech-train-clean-100-wav.json',
'/data/dataset/librispeech-train-clean-360-wav.json',
'/data/dataset/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)]
eval config
"DataConfig":{
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
}
```
Before training, `librosa` and `Levenshtein` need to be installed.
After installing MindSpore via the official website and finishing dataset processing, you can start training as follows:
```shell
# standalone training on GPU
bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID]
# standalone training on CPU
bash ./scripts/run_standalone_train_cpu.sh
# distributed training on GPU
bash ./scripts/run_distribute_train_gpu.sh
```
Inference:
```shell
# evaluation on CPU
bash ./scripts/run_eval_cpu.sh [PATH_CHECKPOINT]
# evaluation on GPU
bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [PATH_CHECKPOINT]
```
## [Performance](#contents)
### [Training and Evaluation Performance](#contents)
#### Training Performance
| Parameters                 | Jasper                                                          |
| -------------------------- | --------------------------------------------------------------- |
| Resource                   | NV SMX2 V100-32G                                                 |
| Uploaded Date              | 2/7/2022 (month/day/year)                                        |
| MindSpore Version          | 1.8.0                                                            |
| Dataset                    | LibriSpeech                                                      |
| Training Parameters        | 8p, epoch=440, steps=1088 * epoch, batch_size=64, lr=3e-4        |
| Optimizer                  | Adam                                                             |
| Loss Function              | CTCLoss                                                          |
| Outputs                    | probability                                                      |
| Loss                       | 0.2-0.7                                                          |
| Speed                      | 8p 2.7s/step                                                     |
| Total training time        | 8p: around 194 h                                                 |
| Checkpoint size            | 991M (.ckpt file)                                                |
| Scripts                    | [Jasper script](https://gitee.com/mindspore/models/tree/master/research/audio/jasper) |
#### Inference Performance
| Parameters                 | Jasper                     |
| -------------------------- | -------------------------- |
| Resource                   | NV SMX2 V100-32G           |
| Uploaded Date              | 2/7/2022 (month/day/year)  |
| MindSpore Version          | 1.8.0                      |
| Dataset                    | LibriSpeech                |
| batch_size                 | 64                         |
| Outputs                    | probability                |
| Accuracy (dev-clean)       | 8p: WER: 5.754 CER: 2.151  |
| Accuracy (dev-other)       | 8p: WER: 19.213 CER: 9.393 |
| Model for inference        | 330M (.mindir file)        |
## [ModelZoo Homepage](#contents)
Please check the official [ModelZoo homepage](https://gitee.com/mindspore/models).
# Contents
- [Jasper Description](#jasper-description)
- [Model Architecture](#Model-Architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training and Eval Process](#training-and-eval-process)
- [Export](#Export)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Jasper Description](#contents)
Jasper is an end-to-end speech recognition model trained with CTC loss. The Jasper model uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. We support training and evaluation on CPU and GPU.
[Paper](https://arxiv.org/pdf/1904.03288v3.pdf): Jason Li, et al. Jasper: An End-to-End Convolutional Neural Acoustic Model.
## [Model Architecture](#contents)
Jasper is an end-to-end neural acoustic model that is based on convolutions. In the audio processing stage, each frame is transformed into mel-scale spectrogram features, which the acoustic model takes as input and outputs a probability distribution over the vocabulary for each frame. The acoustic model has a modular block structure and can be parametrized accordingly: a Jasper BxR model has B blocks, each consisting of R repeating sub-blocks.
Each sub-block applies the following operations in sequence: 1D-Convolution, Batch Normalization, ReLU activation, and Dropout.
Each block input is connected directly to the last subblock of all following blocks via a residual connection, which is referred to as dense residual in the paper. Every block differs in kernel size and number of filters, which are increasing in size from the bottom to the top layers. Irrespective of the exact block configuration parameters B and R, every Jasper model has four additional convolutional blocks: one immediately succeeding the input layer (Prologue) and three at the end of the B blocks (Epilogue).
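To make the dense-residual wiring concrete, here is a self-contained toy sketch (NumPy, with identity residual projections and the residual sum applied after the last sub-block for brevity; the real model projects each residual through a 1x1 convolution and adds it before the final activation and dropout):

```python
import numpy as np

def jasper_encoder_forward(x, blocks):
    """Dense residual: every earlier block input is summed into the
    output of the last sub-block of each later block."""
    residuals = []
    for sub_blocks in blocks:
        residuals.append(x)
        out = x
        for f in sub_blocks:
            out = f(out)
        x = out + sum(residuals)  # dense-residual sum
    return x

# toy usage: 3 blocks of 2 sub-blocks, each sub-block a fixed nonlinearity
relu = lambda t: np.maximum(t, 0.0)
blocks = [[relu, relu] for _ in range(3)]
print(jasper_encoder_forward(np.ones((2, 4)), blocks).shape)  # (2, 4)
```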
## [Dataset](#contents)
Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below.
Dataset used: [LibriSpeech](<http://www.openslr.org/12>)
Train Data:
train-clean-100: [6.3G] (training set of 100 hours "clean" speech)
train-clean-360.tar.gz [23G] (training set of 360 hours "clean" speech)
train-other-500.tar.gz [30G] (training set of 500 hours "other" speech)
Val Data:
dev-clean.tar.gz [337M] (development set, "clean" speech)
dev-other.tar.gz [314M] (development set, "other", more challenging, speech)
Test Data:
test-clean.tar.gz [346M] (test set, "clean" speech )
test-other.tar.gz [328M] (test set, "other" speech )
Data format:wav and txt files
## [Environment Requirements](#contents)
Hardware(GPU)
Prepare hardware environment with GPU processor.
Framework
[MindSpore](https://www.mindspore.cn/install/en)
For more information, please check the resources below:
[MindSpore tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
[MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
## [Script Description](#contents)
### [Script and Sample Code](#contents)
```path
.
└─audio
    └─jasper
        │  eval.py                          //inference file
        │  labels.json                      //label file
        │  pt2mind.py                       //pth transform to ckpt file
        |  create_mindrecord.py             //transform data to mindrecord
        │  README-CN.md                     //Chinese readme
        │  README.md                        //English readme
        │  requirements.txt                 //required library file
        │  train.py                         //train file
        ├─scripts
        │      download_librispeech.sh      //download data
        │      preprocess_librispeech.sh    //preprocess data
        │      run_distribute_train_gpu.sh  //8-GPU distributed training
        │      run_eval_cpu.sh              //CPU evaluate
        │      run_eval_gpu.sh              //GPU evaluate
        │      run_standalone_train_cpu.sh  //one CPU train
        │      run_standalone_train_gpu.sh  //one GPU train
        ├─src
        │      audio.py                     //preprocess data
        │      callback.py                  //callback
        │      cleaners.py                  //preprocess data
        │      config.py                    //jasper config
        │      dataset.py                   //preprocess data
        │      decoder.py                   //third-party decoder
        │      eval_callback.py             //evaluate callback
        │      greedydecoder.py             //refactored greedydecoder
        │      jasper10x5dr_speca.yaml      //jasper model's config
        │      lr_generator.py              //learning rate
        │      model.py                     //training model
        │      model_test.py                //inference model
        │      number.py                    //preprocess data
        │      text.py                      //preprocess data
        │      __init__.py
        └─utils
                convert_librispeech.py      //convert data
                download_librispeech.py     //download data
                download_utils.py           //download utils
                inference_librispeech.csv   //links to inference data
                librispeech.csv             //links to all data
                preprocessing_utils.py      //preprocessing utils
                __init__.py
```
### [Script Parameters](#contents)
#### Training
```text
usage: train.py [--use_pretrained USE_PRETRAINED]
                [--pre_trained_model_path PRE_TRAINED_MODEL_PATH]
                [--is_distributed IS_DISTRIBUTED]
                [--bidirectional BIDIRECTIONAL]
                [--device_target DEVICE_TARGET]
options:
    --pre_trained_model_path    pretrained checkpoint path, default is ''
    --is_distributed            distributed training, default is False
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --device_target             device where the code will be implemented: "GPU" | "CPU", default is "GPU"
```
#### Evaluation
```text
usage: eval.py [--bidirectional BIDIRECTIONAL]
               [--pretrain_ckpt PRETRAIN_CKPT]
               [--device_target DEVICE_TARGET]
options:
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --pretrain_ckpt             saved checkpoint path, default is ''
    --device_target             device where the code will be implemented: "GPU" | "CPU", default is "GPU"
```
#### Options and Parameters
Parameters for training and evaluation can be set in file `config.py`
```text
config for training.
    epochs                  number of training epochs, default is 440
```
```text
config for dataloader.
    train_manifest          train manifest path, default is 'data/libri_train_manifest.json'
    val_manifest            dev manifest path, default is 'data/libri_val_manifest.json'
    batch_size              batch size for training, default is 64
    labels_path             tokens json path for model output, default is "./labels.json"
    sample_rate             sample rate for the data/model features, default is 16000
    window_size             window size for spectrogram generation (seconds), default is 0.02
    window_stride           window stride for spectrogram generation (seconds), default is 0.01
    window                  window type for spectrogram generation, default is 'hamming'
    speed_volume_perturb    use random tempo and gain perturbations, default is False, not used in current model
    spec_augment            use simple spectral augmentation on mel spectrograms, default is False, not used in current model
    noise_dir               directory to inject noise from; the default '' disables noise injection, not used in current model
    noise_prob              probability of noise being added per sample, default is 0.4, not used in current model
    noise_min               minimum noise level to sample from (1.0 means all noise, none of the original signal), default is 0.0, not used in current model
    noise_max               maximum noise level to sample from (maximum 1.0), default is 0.5, not used in current model
```
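To see how the spectrogram parameters combine, here is a hypothetical feature-extraction call using `librosa` (listed in `requirements.txt`); the repository's real pipeline lives in `src/audio.py` and `src/dataset.py`:

```python
import numpy as np
import librosa

sample_rate = 16000
window_size, window_stride = 0.02, 0.01            # seconds, as above
n_fft = int(sample_rate * window_size)             # 320 samples per window
hop_length = int(sample_rate * window_stride)      # 160 samples per hop

samples = np.random.uniform(-1.0, 1.0, sample_rate)  # 1 s of dummy audio
mel = librosa.feature.melspectrogram(
    y=samples, sr=sample_rate, n_fft=n_fft,
    hop_length=hop_length, window='hamming', n_mels=64)
print(mel.shape)  # (64, 101): 64 mel bins, one frame every 10 ms
```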
```text
config for optimizer.
    learning_rate           initial learning rate, default is 3e-4
    learning_anneal         annealing applied to the learning rate after each epoch, default is 1.1
    weight_decay            weight decay, default is 1e-5
    momentum                momentum, default is 0.9
    eps                     Adam eps, default is 1e-8
    betas                   Adam betas, default is (0.9, 0.999)
    loss_scale              loss scale, default is 1024
```
```text
config for checkpoint.
    ckpt_file_name_prefix   prefix of the ckpt file names, default is 'Jasper'
    ckpt_path               path to save ckpt files, default is 'checkpoints'
    keep_checkpoint_max     max number of checkpoints to keep; older checkpoints are deleted, default is 10
```
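These three values map directly onto MindSpore's checkpoint callbacks; a hypothetical wiring (the repository's `train.py` performs the actual setup):

```python
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

ckpt_config = CheckpointConfig(keep_checkpoint_max=10)   # keep_checkpoint_max
ckpt_cb = ModelCheckpoint(prefix='Jasper',               # ckpt_file_name_prefix
                          directory='checkpoints',       # ckpt_path
                          config=ckpt_config)
# then: model.train(epochs, dataset, callbacks=[ckpt_cb, ...])
```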
## [Training and Eval process](#contents)
Before training, the dataset should be processed.
``` bash
bash scripts/download_librispeech.sh
bash scripts/preprocess_librispeech.sh
python create_mindrecord.py  # transform data to mindrecord
```
The dataset directory structure is as follows:
```path
.
|--LibriSpeech
│ |--train-clean-100-wav
│ │--train-clean-360-wav
│ │--train-other-500-wav
│ |--dev-clean-wav
│ |--dev-other-wav
│ |--test-clean-wav
│ |--test-other-wav
|--librispeech-train-clean-100-wav.json
|--librispeech-train-clean-360-wav.json
|--librispeech-train-other-500-wav.json
|--librispeech-dev-clean-wav.json
|--librispeech-dev-other-wav.json
|--librispeech-test-clean-wav.json
|--librispeech-test-other-wav.json
```
Each *.json manifest stores the absolute paths of the corresponding data. After obtaining the manifest files, you should modify the configurations in `src/config.py`: for the train config, `train_manifest` should list the training manifests, and for the eval config, `test_manifest` should point to the manifest of whichever subset you want to evaluate.
```shell
...
train config
"Data_dir": '/data/dataset',
"train_manifest": ['/data/dataset/librispeech-train-clean-100-wav.json',
'/data/dataset/librispeech-train-clean-360-wav.json',
'/data/dataset/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)]
eval config
"DataConfig":{
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
}
```
Before training, some requirements need to be installed, including `librosa` and `Levenshtein`.
After installing MindSpore via the official website and finishing dataset processing, you can start training as follows:
```shell
# standalone training gpu
bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID]
# standalone training cpu
bash ./scripts/run_standalone_train_cpu.sh
# distributed training gpu
bash ./scripts/run_distribute_train_gpu.sh
```
The following scripts are used to evaluate the model. Note that only the greedy decoder is supported for now:
```shell
# eval on cpu
bash ./scripts/run_eval_cpu.sh [PATH_CHECKPOINT]
# eval on gpu
bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [PATH_CHECKPOINT]
```
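For reference, greedy CTC decoding is just a per-frame argmax followed by collapsing repeats and dropping the blank token; a minimal NumPy sketch (the repository's real implementation is `src/greedydecoder.py`):

```python
import numpy as np

def greedy_ctc_decode(scores, symbols, blank_index):
    """scores: (time, num_symbols) array of per-frame scores."""
    best = np.argmax(scores, axis=1)
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_index:  # collapse repeats, drop blank
            out.append(symbols[idx])
        prev = idx
    return ''.join(out)

symbols = list("AB ") + ['_']                   # '_' is the CTC blank here
scores = np.array([[.9, 0, 0, .1], [.9, 0, 0, .1],
                   [.1, .8, 0, .1], [0, 0, 0, 1.]])
print(greedy_ctc_decode(scores, symbols, blank_index=3))  # "AB"
```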
## [Model Description](#contents)
### [Performance](#contents)
#### Training Performance
| Parameters | Jasper |
| -------------------- | ------------------------------------------------------------ |
| Resource | NV SMX2 V100-32G |
| uploaded Date | 2/7/2022 (month/day/year) |
| MindSpore Version | 1.8.0 |
| Dataset | LibriSpeech |
| Training Parameters  | 8p, epoch=440, steps=1088 * epoch, batch_size = 64, lr=3e-4  |
| Optimizer | Adam |
| Loss Function | CTCLoss |
| outputs | probability |
| Loss | 0.2-0.7 |
| Speed | 8p 2.7s/step |
| Total time: training | 8p: around 194 h; |
| Checkpoint | 991M (.ckpt file) |
| Scripts | [Jasper script](https://gitee.com/mindspore/models/tree/master/research/audio/jasper) |
#### Inference Performance
| Parameters | Jasper |
| ------------------- | -------------------------- |
| Resource | NV SMX2 V100-32G |
| uploaded Date | 2/7/2022 (month/day/year) |
| MindSpore Version | 1.8.0 |
| Dataset | LibriSpeech |
| batch_size | 64 |
| outputs | probability |
| Accuracy(dev-clean) | 8p: WER: 5.754 CER: 2.151 |
| Accuracy(dev-other) | 8p: WER: 19.213 CER: 9.393 |
| Model for inference | 330M (.mindir file) |
## [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/models).
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import os
from multiprocessing import Pool

import mindspore.dataset.engine as de
from mindspore.mindrecord import FileWriter

from src.dataset import ASRDataset
from src.config import train_config, symbols


def _exec_task(task_id):
    """
    Execute task with specified task id
    """
    print("exec task {}...".format(task_id))
    # each task writes one MindRecord shard
    writer = FileWriter(mindrecord_file.format(task_id), 1)
    writer.set_page_size(1 << 25)
    jasper_json = {
        "batch_spect": {"type": "float32", "shape": [1, 64, -1]},
        "batch_script": {"type": "int32", "shape": [-1,]}
    }
    writer.add_schema(jasper_json, "jasper_json")
    output_columns = ["batch_spect", "batch_script"]
    dataset = ASRDataset(data_dir=train_config.DataConfig.Data_dir,
                         manifest_fpaths=train_config.DataConfig.train_manifest,
                         labels=symbols,
                         batch_size=1,
                         train_mode=True)
    ds = de.GeneratorDataset(dataset, output_columns,
                             num_shards=num_tasks, shard_id=task_id)
    dataset_size = ds.get_dataset_size()
    for c, item in enumerate(ds.create_dict_iterator(output_numpy=True)):
        row = {"batch_spect": item["batch_spect"],
               "batch_script": item["batch_script"]}
        writer.write_raw_data([row])
        print(f"{c}/{dataset_size}", flush=True)
    writer.commit()


if __name__ == "__main__":
    mindrecord_file = train_config.DataConfig.mindrecord_format
    mindrecord_dir = os.path.dirname(mindrecord_file)
    if not os.path.isdir(mindrecord_dir):
        os.makedirs(mindrecord_dir)
    num_tasks = 8
    print("Write mindrecord ...")
    task_list = list(range(num_tasks))
    # Windows has no fork, so run the tasks sequentially there
    if os.name == 'nt':
        for window_task_id in task_list:
            _exec_task(window_task_id)
    elif num_tasks > 1:
        with Pool(num_tasks) as p:
            p.map(_exec_task, task_list)
    else:
        _exec_task(0)
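To spot-check the shards written above, one can read a shard back with `MindDataset`; a small verification sketch (the file path follows `train_config.DataConfig.mindrecord_files` and is assumed to exist):

```python
import mindspore.dataset as ds

# read one shard back and confirm the two columns round-trip
data = ds.MindDataset(dataset_files="/data/jasper_tr0.md",
                      columns_list=["batch_spect", "batch_script"])
for item in data.create_dict_iterator(output_numpy=True, num_epochs=1):
    print(item["batch_spect"].shape, item["batch_script"].shape)
    break
```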
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Eval for Japer
"""
import argparse
import json
import pickle
import numpy as np
from src.config import eval_config, symbols, encoder_kw, decoder_kw
from src.model_test import Jasper, PredictWithSoftmax
from src.dataset import create_eval_dataset
from src.decoder import GreedyDecoder
from mindspore import context
from mindspore.train.serialization import load_checkpoint, load_param_into_net
parser = argparse.ArgumentParser(description='jasper evaluation')
parser.add_argument('--pretrain_ckpt', type=str,
default='./checkpoint/ckpt_0/jasper10.ckpt', help='Pretrained checkpoint path')
parser.add_argument('--device_target', type=str, default="GPU", choices=("GPU", "CPU"),
help='Device target, support GPU and CPU, Default: GPU')
args = parser.parse_args()
if __name__ == '__main__':
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target, save_graphs=False)
config = eval_config
with open(config.DataConfig.labels_path) as label_file:
labels = json.load(label_file)
model = PredictWithSoftmax(
Jasper(encoder_kw=encoder_kw, decoder_kw=decoder_kw))
ds_eval = create_eval_dataset(data_dir=config.DataConfig.Data_dir,
manifest_filepath=config.DataConfig.test_manifest,
labels=symbols, batch_size=config.DataConfig.batch_size, train_mode=False)
param_dict = load_checkpoint(args.pretrain_ckpt)
load_param_into_net(model, param_dict)
print('Successfully loading the pre-trained model')
if config.LMConfig.decoder_type == 'greedy':
decoder = GreedyDecoder(labels=symbols, blank_index=len(symbols)-1)
else:
raise NotImplementedError("Only greedy decoder is supported now")
target_decoder = GreedyDecoder(symbols, blank_index=len(symbols)-1)
model.set_train(False)
total_cer, total_wer, num_tokens, num_chars = 0, 0, 0, 0
output_data = []
for data in ds_eval.create_dict_iterator():
inputs, input_length, target_indices, targets = data['inputs'], data['input_length'], data['target_indices'], \
data['targets']
split_targets = []
start, count, last_id = 0, 0, 0
target_indices, targets = target_indices.asnumpy(), targets.asnumpy()
for i in range(np.shape(targets)[0]):
if target_indices[i, 0] == last_id:
count += 1
else:
split_targets.append(list(targets[start:count]))
last_id += 1
start = count
count += 1
split_targets.append(list(targets[start:]))
out, output_sizes = model(inputs, input_length)
decoded_output, _ = decoder.decode(out, output_sizes)
target_strings = target_decoder.convert_to_strings(split_targets)
if config.save_output is not None:
output_data.append(
(out.asnumpy(), output_sizes.asnumpy(), target_strings))
for doutput, toutput in zip(decoded_output, target_strings):
transcript, reference = doutput[0], toutput[0]
wer_inst = decoder.wer(transcript, reference)
cer_inst = decoder.cer(transcript, reference)
total_wer += wer_inst
total_cer += cer_inst
num_tokens += len(reference.split())
num_chars += len(reference.replace(' ', ''))
if config.verbose:
print("Ref:", reference.lower())
print("Hyp:", transcript.lower())
print("WER:", float(wer_inst) / len(reference.split()),
"CER:", float(cer_inst) / len(reference.replace(' ', '')), "\n")
wer = float(total_wer) / num_tokens
cer = float(total_cer) / num_chars
print('Test Summary \t'
'Average WER {wer:.3f}\t'
'Average CER {cer:.3f}\t'.format(wer=wer * 100, cer=cer * 100))
if config.save_output is not None:
with open(config.save_output + '.bin', 'wb') as output:
pickle.dump(output_data, output)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
export checkpoint file to mindir model
"""
import json
import argparse
import numpy as np
import mindspore as ms
from mindspore import context, Tensor
from mindspore.train.serialization import load_checkpoint, load_param_into_net, export
from src.config import train_config, encoder_kw, decoder_kw
from src.model import Jasper
parser = argparse.ArgumentParser(
description='Export DeepSpeech model to Mindir')
parser.add_argument('--pre_trained_model_path', type=str,
default='', help=' existed checkpoint path')
parser.add_argument('--device_target', type=str, default="GPU", choices=("GPU", "CPU"),
help='Device target, support GPU and CPU, Default: GPU')
args = parser.parse_args()
if __name__ == '__main__':
config = train_config
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target, save_graphs=False)
with open(config.DataConfig.labels_path) as label_file:
labels = json.load(label_file)
jasper_net = Jasper(encoder_kw=encoder_kw,
decoder_kw=decoder_kw).to_float(ms.float16)
param_dict = load_checkpoint(args.pre_trained_model_path)
load_param_into_net(jasper_net, param_dict)
print('Successfully loading the pre-trained model')
# 3500 is the max length in evaluation dataset(LibriSpeech). This is consistent with that in dataset.py
# The length is fixed to this value because Mindspore does not support dynamic shape currently
input_np = np.random.uniform(
0.0, 1.0, size=[1, 64, 3500]).astype(np.float32)
length = np.array([100], dtype=np.int32)
export(jasper_net, Tensor(input_np), Tensor(length),
file_name="jasper.mindir", file_format='MINDIR')
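Once exported, the MINDIR can be loaded back for inference; a minimal sketch (input shapes must match the fixed export shape above):

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor

# load the exported graph and wrap it in a GraphCell for execution
graph = ms.load("jasper.mindir")
net = nn.GraphCell(graph)
spect = Tensor(np.random.uniform(0.0, 1.0, size=[1, 64, 3500]).astype(np.float32))
length = Tensor(np.array([100], dtype=np.int32))
out = net(spect, length)
```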
[
"'",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W",
"X",
"Y",
"Z",
" ",
"_"
]
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import re

import numpy as np
import torch
from mindspore import Tensor
from mindspore.train.serialization import save_checkpoint

parser = argparse.ArgumentParser(description='pth translate to ckpt')
parser.add_argument('--pth', type=str,
                    default='/data/Jasper_epoch10_checkpoint.pt', help='path of pth')
args = parser.parse_args()


def convert_v1_state_dict(state_dict):
    rules = [
        ('^jasper_encoder.encoder.', 'encoder.layers.'),
        ('^jasper_decoder.decoder_layers.', 'decoder.layers.'),
    ]
    ret = {}
    for k, v in state_dict.items():
        if k.startswith('acoustic_model.'):
            continue
        if k.startswith('audio_preprocessor.'):
            continue
        for pattern, to in rules:
            k = re.sub(pattern, to, k)
        ret[k] = v
    return ret


checkpoint = torch.load(args.pth, map_location="cpu")
state_dic = convert_v1_state_dict(checkpoint['state_dict'])
mydict = state_dic
newparams_list = []
names = [item for item in mydict if 'num_batches_tracked' not in item]
i = 0
# Parameters repeat in groups of five: conv weight, BN gamma, BN beta,
# BN moving mean, BN moving variance. Rename each to its MindSpore equivalent.
for name in names:
    parameter = mydict[name].numpy()
    param_dict = {}
    if i % 5 == 0:
        name = name.replace('weight', 'conv1.weight')
        # add a singleton axis so the 1D kernel matches the MindSpore weight layout
        parameter = np.expand_dims(parameter, axis=2)
    elif i % 5 == 1:
        name = name.replace('weight', 'batchnorm.gamma')
    elif i % 5 == 2:
        name = name.replace('bias', 'batchnorm.beta')
    elif i % 5 == 3:
        name = name.replace('running_mean', 'batchnorm.moving_mean')
    else:
        name = name.replace('running_var', 'batchnorm.moving_variance')
    # the last two parameters (index 540/541) are renamed back to plain weight/bias
    if i == 540:
        name = name.replace('0.conv1.weight', 'weight')
    if i == 541:
        name = name.replace('0.bias', 'bias')
    param_dict['name'] = name
    param_dict['data'] = Tensor(parameter)
    newparams_list.append(param_dict)
    if i % 5 == 4:
        # reorder so the checkpoint stores moving_mean, moving_variance, gamma, beta
        newparams_list[i-3], newparams_list[i-2], newparams_list[i-1], newparams_list[i] = \
            newparams_list[i-1], newparams_list[i], newparams_list[i-3], newparams_list[i-2]
    i += 1
save_checkpoint(newparams_list, './jasper_mindspore_10.ckpt')
print("end")
ctcdecode==1.0.2
easydict==1.9
inflect==5.4.0
librosa==0.8.0
mindspore==1.8.0
numpy==1.20.1
pandas==1.2.4
python_Levenshtein==0.12.2
PyYAML==6.0
requests==2.25.1
six==1.15.0
SoundFile==0.10.3.post1
sox==1.4.1
tqdm==4.59.0
Unidecode==1.3.4
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DATA_SET=$1
DATA_ROOT_DIR=$2
DATA_DIR="${DATA_ROOT_DIR}/${DATA_SET}"
if [ ! -d "$DATA_DIR" ]
then
mkdir --mode 755 $DATA_DIR
python utils/download_librispeech.py \
utils/inference_librispeech.csv \
$DATA_DIR \
-e ${DATA_ROOT_DIR}/
else
echo "Directory $DATA_DIR already exists."
fi
#!/usr/bin/env bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
SPEEDS=$1
[ -n "$SPEEDS" ] && SPEED_FLAG="--speed $SPEEDS"
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-clean-100 \
--dest_dir /datasets/LibriSpeech/train-clean-100-wav \
--output_json /datasets/LibriSpeech/librispeech-train-clean-100-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-clean-360 \
--dest_dir /datasets/LibriSpeech/train-clean-360-wav \
--output_json /datasets/LibriSpeech/librispeech-train-clean-360-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-other-500 \
--dest_dir /datasets/LibriSpeech/train-other-500-wav \
--output_json /datasets/LibriSpeech/librispeech-train-other-500-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/dev-clean \
--dest_dir /datasets/LibriSpeech/dev-clean-wav \
--output_json /datasets/LibriSpeech/librispeech-dev-clean-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/dev-other \
--dest_dir /datasets/LibriSpeech/dev-other-wav \
--output_json /datasets/LibriSpeech/librispeech-dev-other-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/test-clean \
--dest_dir /datasets/LibriSpeech/test-clean-wav \
--output_json /datasets/LibriSpeech/librispeech-test-clean-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/test-other \
--dest_dir /datasets/LibriSpeech/test-other-wav \
--output_json /datasets/LibriSpeech/librispeech-test-other-wav.json
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-stdout \
python ./train.py --is_distributed --device_target 'GPU' > train_8p.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
PATH_CHECKPOINT=$1
python ./eval.py --pretrain_ckpt $PATH_CHECKPOINT --device_target 'CPU' > eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DEVICE_ID=$1
PATH_CHECKPOINT=$2
export CUDA_VISIBLE_DEVICES=$DEVICE_ID
python ./eval.py --pretrain_ckpt $PATH_CHECKPOINT \
--device_target 'GPU' > eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
python ./train.py --device_target 'CPU' > train.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DEVICE_ID=$1
CUDA_VISIBLE_DEVICES=$DEVICE_ID python ./train.py --device_target 'GPU' > train.log 2>&1 &
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import random

import soundfile as sf
import librosa
import numpy as np
import sox


def audio_from_file(file_path, offset=0, duration=0, trim=False, target_sr=16000):
    audio = AudioSegment(file_path, target_sr=target_sr, int_values=False,
                         offset=offset, duration=duration, trim=trim)
    samples = audio.samples
    num_samples = samples.shape[0]
    return (np.expand_dims(samples, 0), np.expand_dims(num_samples, 0))


class AudioSegment:
    """Monaural audio segment abstraction.
    :param samples: Audio samples [num_samples x num_channels].
    :type samples: ndarray.float32
    :param sample_rate: Audio sample rate.
    :type sample_rate: int
    :raises TypeError: If the sample data type is not float or int.
    """

    def __init__(self, filename, target_sr=None, int_values=False, offset=0,
                 duration=0, trim=False, trim_db=60):
        """Create audio segment from samples.
        Samples are converted to float32 internally, with ints scaled to [-1, 1].
        Load a file supported by librosa and return it as an AudioSegment.
        :param filename: path of file to load
        :param target_sr: the desired sample rate
        :param int_values: if true, load samples as 32-bit integers
        :param offset: offset in seconds when loading audio
        :param duration: duration in seconds when loading audio
        :return: numpy array of samples
        """
        with sf.SoundFile(filename, 'r') as f:
            dtype = 'int32' if int_values else 'float32'
            sample_rate = f.samplerate
            if offset > 0:
                f.seek(int(offset * sample_rate))
            if duration > 0:
                samples = f.read(int(duration * sample_rate), dtype=dtype)
            else:
                samples = f.read(dtype=dtype)
        samples = samples.transpose()
        samples = self._convert_samples_to_float32(samples)
        if target_sr is not None and target_sr != sample_rate:
            samples = librosa.core.resample(samples, sample_rate, target_sr)
            sample_rate = target_sr
        if trim:
            samples, _ = librosa.effects.trim(samples, trim_db)
        self._samples = samples
        self._sample_rate = sample_rate
        if self._samples.ndim >= 2:
            # average the channels to obtain a mono signal
            self._samples = np.mean(self._samples, 1)

    def __eq__(self, other):
        """Return whether two objects are equal."""
        if type(other) is not type(self):
            return False
        if self._sample_rate != other._sample_rate:  # pylint: disable=W0212
            return False
        if self._samples.shape != other._samples.shape:  # pylint: disable=W0212
            return False
        if np.any(self.samples != other._samples):  # pylint: disable=W0212
            return False
        return True

    def __ne__(self, other):
        """Return whether two objects are unequal."""
        return not self.__eq__(other)

    def __str__(self):
        """Return human-readable representation of segment."""
        return ("%s: num_samples=%d, sample_rate=%d, duration=%.2fsec, "
                "rms=%.2fdB" % (type(self), self.num_samples, self.sample_rate,
                                self.duration, self.rms_db))

    @staticmethod
    def _convert_samples_to_float32(samples):
        """Convert sample type to float32.
        Audio sample type is usually integer or float-point.
        Integers will be scaled to [-1, 1] in float32.
        """
        float32_samples = samples.astype('float32')
        if samples.dtype in np.sctypes['int']:
            bits = np.iinfo(samples.dtype).bits
            float32_samples *= (1. / 2 ** (bits - 1))
        elif samples.dtype in np.sctypes['float']:
            pass
        else:
            raise TypeError("Unsupported sample type: %s." % samples.dtype)
        return float32_samples

    @property
    def samples(self):
        return self._samples.copy()

    @property
    def sample_rate(self):
        return self._sample_rate

    @property
    def num_samples(self):
        return self._samples.shape[0]

    @property
    def duration(self):
        return self._samples.shape[0] / float(self._sample_rate)

    @property
    def rms_db(self):
        mean_square = np.mean(self._samples ** 2)
        return 10 * np.log10(mean_square)

    def gain_db(self, gain):
        self._samples *= 10. ** (gain / 20.)

    def pad(self, pad_size, symmetric=False):
        """Add zero padding to the sample.
        The pad size is given in number of samples. If symmetric=True,
        `pad_size` will be added to both sides. If false, `pad_size` zeros
        will be added only to the end.
        """
        self._samples = np.pad(self._samples,
                               (pad_size if symmetric else 0, pad_size),
                               mode='constant')

    def subsegment(self, start_time=None, end_time=None):
        """Cut the AudioSegment between given boundaries.
        Note that this is an in-place transformation.
        :param start_time: Beginning of subsegment in seconds.
        :type start_time: float
        :param end_time: End of subsegment in seconds.
        :type end_time: float
        :raise ValueError: If start_time or end_time is incorrectly set, e.g. out
        of bounds in time.
        """
        start_time = 0.0 if start_time is None else start_time
        end_time = self.duration if end_time is None else end_time
        if start_time < 0.0:
            start_time = self.duration + start_time
        if end_time < 0.0:
            end_time = self.duration + end_time
        if start_time < 0.0:
            raise ValueError("The slice start position (%f s) is out of "
                             "bounds." % start_time)
        if end_time < 0.0:
            raise ValueError("The slice end position (%f s) is out of bounds." %
                             end_time)
        if start_time > end_time:
            raise ValueError("The slice start position (%f s) is later than "
                             "the end position (%f s)." % (start_time, end_time))
        if end_time > self.duration:
            raise ValueError("The slice end position (%f s) is out of bounds "
                             "(> %f s)" % (end_time, self.duration))
        start_sample = int(round(start_time * self._sample_rate))
        end_sample = int(round(end_time * self._sample_rate))
        self._samples = self._samples[start_sample:end_sample]


class Perturbation:
    def __init__(self, p=0.1, rng=None):
        self.p = p
        self._rng = random.Random() if rng is None else rng

    def maybe_apply(self, segment, sample_rate=None):
        if self._rng.random() < self.p:
            self(segment, sample_rate)  # pylint: disable=E1102


class SpeedPerturbation(Perturbation):
    def __init__(self, min_rate=0.85, max_rate=1.15, discrete=False, p=0.1, rng=None):
        super(SpeedPerturbation, self).__init__(p, rng)
        assert 0 < min_rate < max_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.discrete = discrete

    def __call__(self, data, sample_rate):
        if self.discrete:
            rate = np.random.choice([self.min_rate, None, self.max_rate])
        else:
            rate = self._rng.uniform(self.min_rate, self.max_rate)
        if rate is not None:
            data._samples = sox.Transformer().speed(factor=rate).build_array(  # pylint: disable=W0212
                input_array=data._samples, sample_rate_in=sample_rate)  # pylint: disable=W0212
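A quick usage sketch for `audio_from_file` (the wav path below is a placeholder): it returns the samples with a leading batch axis, plus the sample count, resampled to `target_sr`:

```python
from src.audio import audio_from_file

# hypothetical path to one of the converted LibriSpeech wav files
samples, num_samples = audio_from_file(
    '/data/LibriSpeech/dev-clean-wav/example.wav', target_sr=16000)
print(samples.shape, num_samples)  # (1, N) float32 in [-1, 1], and array([N])
```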
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Defined callback for jasper.
"""
import time
import math

import numpy as np
from mindspore.train.callback import Callback
from mindspore import Tensor


class TimeMonitor(Callback):
    """
    Time monitor for calculating cost of each epoch.

    Args:
        data_size (int): step size of an epoch.
    """

    def __init__(self, data_size):
        super(TimeMonitor, self).__init__()
        self.data_size = data_size

    def epoch_begin(self, run_context):
        self.epoch_time = time.time()

    def epoch_end(self, run_context):
        epoch_mseconds = (time.time() - self.epoch_time) * 1000
        per_step_mseconds = epoch_mseconds / self.data_size
        print("epoch time: {0}, per step time: {1}".format(
            epoch_mseconds, per_step_mseconds), flush=True)

    def step_begin(self, run_context):
        self.step_time = time.time()

    def step_end(self, run_context):
        step_mseconds = (time.time() - self.step_time) * 1000
        print(f"step time {step_mseconds}", flush=True)


class Monitor(Callback):
    """
    Monitor loss and time.

    Args:
        lr_init (numpy array): train lr

    Returns:
        None
    """

    def __init__(self, lr_init=None):
        super(Monitor, self).__init__()
        self.lr_init = lr_init
        self.lr_init_len = len(lr_init)

    def epoch_begin(self, run_context):
        self.losses = []
        self.step_now = 0
        self.step_nan = 0
        self.epoch_time = time.time()

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        epoch_seconds = time.time() - self.epoch_time
        per_step_seconds = epoch_seconds / cb_params.batch_num
        print("epoch time: {:5.3f}, per step time: {:5.3f}, avg loss: {:5.3f}".format(epoch_seconds,
                                                                                      per_step_seconds,
                                                                                      np.mean(self.losses)))

    def step_begin(self, run_context):
        self.step_time = time.time()

    def step_end(self, run_context):
        """Log loss, time, and learning rate for the finished step."""
        cb_params = run_context.original_args()
        step_seconds = time.time() - self.step_time
        step_loss = cb_params.net_outputs
        if isinstance(step_loss, (tuple, list)) and isinstance(step_loss[0], Tensor):
            step_loss = step_loss[0]
        if isinstance(step_loss, Tensor):
            step_loss = np.mean(step_loss.asnumpy())
        # skip NaN/Inf losses so the running average stays meaningful
        if not math.isnan(step_loss) and not math.isinf(step_loss):
            self.losses.append(step_loss)
        cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num
        print("epoch: [{:3d}/{:3d}], step:[{:5d}/{:5d}], loss:[{:5.3f}/{:5.3f}], time:[{:5.3f}], lr:[{:.9f}]".format(
            cb_params.cur_epoch_num - 1, cb_params.epoch_num, cur_step_in_epoch, cb_params.batch_num, step_loss,
            np.mean(self.losses), step_seconds, self.lr_init[cb_params.cur_step_num - 1].asnumpy()))
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import re

from unidecode import unidecode

from .number import normalize_numbers

# Regular expression matching whitespace:
_whitespace_re = re.compile(r'\s+')

# List of (regular expression, replacement) pairs for abbreviations:
_abbreviations = [(re.compile('\\b%s\\.' % x[0], re.IGNORECASE), x[1]) for x in [
    ('mrs', 'misess'),
    ('mr', 'mister'),
    ('dr', 'doctor'),
    ('st', 'saint'),
    ('co', 'company'),
    ('jr', 'junior'),
    ('maj', 'major'),
    ('gen', 'general'),
    ('drs', 'doctors'),
    ('rev', 'reverend'),
    ('lt', 'lieutenant'),
    ('hon', 'honorable'),
    ('sgt', 'sergeant'),
    ('capt', 'captain'),
    ('esq', 'esquire'),
    ('ltd', 'limited'),
    ('col', 'colonel'),
    ('ft', 'fort'),
]]


def expand_abbreviations(text):
    for regex, replacement in _abbreviations:
        text = re.sub(regex, replacement, text)
    return text


def expand_numbers(text):
    return normalize_numbers(text)


def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    return re.sub(_whitespace_re, ' ', text)


def convert_to_ascii(text):
    return unidecode(text)


def remove_punctuation(text, table):
    text = text.translate(table)
    text = re.sub(r'&', " and ", text)
    text = re.sub(r'\+', " plus ", text)
    return text


def basic_cleaners(text):
    '''Basic pipeline that lowercases and collapses whitespace without transliteration.'''
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def transliteration_cleaners(text):
    '''Pipeline for non-English text that transliterates to ASCII.'''
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def english_cleaners(text, table=None):
    '''Pipeline for English text, including number and abbreviation expansion.'''
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_numbers(text)
    text = expand_abbreviations(text)
    if table is not None:
        text = remove_punctuation(text, table)
    text = collapse_whitespace(text)
    return text
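An expected input/output pair for `english_cleaners` (output shown under the assumption that `normalize_numbers` spells out integers, as in the standard cleaners this module follows):

```python
from src.cleaners import english_cleaners

print(english_cleaners("Dr. Smith read 2 books."))
# expected: "doctor smith read two books."
```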
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""
network config setting, will be used in train.py and eval.py
"""
import inspect
import yaml
from easydict import EasyDict as ed
from src.model import JasperBlock, JasperDecoderForCTC, JasperEncoder
train_config = ed({
"TrainingConfig": {
"epochs": 440,
"loss_scale": 128.0
},
"DataConfig": {
"Data_dir": '/data/train_datasets',
"train_manifest": ['/data/train_datasets/librispeech-train-clean-100-wav.json',
'/data/train_datasets/librispeech-train-clean-360-wav.json',
'/data/train_datasets/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)],
"batch_size": 64,
"accumulation_step": 2,
"labels_path": "labels.json",
"SpectConfig": {
"sample_rate": 16000,
"window_size": 0.02,
"window_stride": 0.01,
"window": "hamming"
},
"AugmentationConfig": {
"speed_volume_perturb": False,
"spec_augment": False,
"noise_dir": '',
"noise_prob": 0.4,
"noise_min": 0.0,
"noise_max": 0.5,
}
},
"OptimConfig": {
"learning_rate": 0.01,
"learning_anneal": 1.1,
"weight_decay": 1e-5,
"momentum": 0.9,
"eps": 1e-8,
"betas": (0.9, 0.999),
"loss_scale": 1024,
"epsilon": 0.00001
},
"CheckpointConfig": {
"ckpt_file_name_prefix": 'Jasper',
"ckpt_path": './checkpoint',
"keep_checkpoint_max": 10
}
})
eval_config = ed({
"save_output": 'librispeech_val_output',
"verbose": True,
"DataConfig": {
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
"batch_size": 32,
"labels_path": "labels.json",
"SpectConfig": {
"sample_rate": 16000,
"window_size": 0.02,
"window_stride": 0.01,
"window": "hanning"
},
},
"LMConfig": {
"decoder_type": "greedy",
"lm_path": './3-gram.pruned.3e-7.arpa',
"top_paths": 1,
"alpha": 1.818182,
"beta": 0,
"cutoff_top_n": 40,
"cutoff_prob": 1.0,
"beam_width": 1024,
"lm_workers": 4
},
})
def default_args(klass):
sig = inspect.signature(klass.__init__)
return {k: v.default for k, v in sig.parameters.items() if k != 'self'}
def load(fpath):
if fpath.endswith('.toml'):
raise ValueError('.toml config format has been changed to .yaml')
cfg = yaml.safe_load(open(fpath, 'r'))
# Reload to deep copy shallow copies, which were made with yaml anchors
yaml.Dumper.ignore_aliases = lambda *args: True
cfg = yaml.dump(cfg)
cfg = yaml.safe_load(cfg)
return cfg
def validate_and_fill(klass, user_conf, ignore_unk=None, optional=None):
conf = default_args(klass)
if ignore_unk is None:
ignore_unk = []
if optional is None:
optional = []
for k, v in user_conf.items():
conf[k] = v
# Keep only mandatory or optional-nonempty
conf = {k: v for k, v in conf.items()
if k not in optional or v is not inspect.Parameter.empty}
# Validate
for k, v in conf.items():
assert v is not inspect.Parameter.empty, \
f'Value for {k} not specified for {klass}'
return conf
def encoder(conf):
"""Validate config for JasperEncoder and subsequent JasperBlocks"""
# Validate, but don't overwrite with defaults
for blk in conf['jasper']['encoder']['blocks']:
validate_and_fill(JasperBlock, blk, optional=['infilters'],
ignore_unk=['residual_dense'])
return validate_and_fill(JasperEncoder, conf['jasper']['encoder'])
def decoder(conf, n_classes):
deco_kw = {'n_classes': n_classes, **conf['jasper']['decoder']}
return validate_and_fill(JasperDecoderForCTC, deco_kw)
def add_ctc_blank(sym):
return sym + ['_']
cfgs = load('./src/jasper10x5dr_speca.yaml')
symbols = add_ctc_blank(cfgs['labels'])
encoder_kw = encoder(cfgs)
decoder_kw = decoder(cfgs, n_classes=len(symbols))