# Contents
[View English](./README.md)
<!-- TOC -->
- [Contents](#contents)
- [Jasper Description](#jasper-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description and Usage](#script-description-and-usage)
- [Code Directory Structure](#code-directory-structure)
- [Model Parameters](#model-parameters)
- [Training and Inference Process](#training-and-inference-process)
- [Export](#export)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Jasper Description](#contents)
Jasper is an end-to-end speech recognition model trained with CTC loss. The Jasper model uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. Training and evaluation are supported on CPU and GPU.
[Paper](https://arxiv.org/pdf/1904.03288v3.pdf): Jason Li, et al. Jasper: An End-to-End Convolutional Neural Acoustic Model.
## [Model Architecture](#contents)
Jasper is an end-to-end convolutional neural acoustic model. In the audio processing stage, each frame is transformed into mel-scale spectrogram features, which the acoustic model takes as input; for each frame it outputs a probability distribution over the vocabulary. The acoustic model has a modular block structure and can be parameterized accordingly: a Jasper BxR model has B blocks, each consisting of R repeating sub-blocks.
Each sub-block applies the following operations in sequence:
1D-convolution, batch normalization, ReLU activation, and dropout.
The input of each block is connected directly to the last sub-block of all following blocks via residual connections, which the paper calls dense residual. Blocks differ in kernel size and number of filters, which grow from the bottom to the top layers. Irrespective of the exact block configuration parameters B and R, every Jasper model has four additional convolutional blocks: one immediately after the input layer and three at the end of the B blocks.
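As an illustration of one sub-block, here is a minimal MindSpore sketch. It is not the repository's actual `model.py`; following the weight layout hinted at by `pt2mind.py`, the 1D convolution is emulated with a (1, k) 2D convolution, and the channel counts and `keep_prob` value are placeholders:

```python
import mindspore.nn as nn

class JasperSubBlock(nn.Cell):
    """One Jasper sub-block: 1D conv -> batch norm -> ReLU -> dropout."""
    def __init__(self, in_channels, out_channels, kernel_size, keep_prob=0.9):
        super(JasperSubBlock, self).__init__()
        # a (1, k) 2D conv over input of shape (batch, channels, 1, time)
        # behaves like a 1D conv over the time axis
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               (1, kernel_size), pad_mode='same')
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(keep_prob=keep_prob)  # MindSpore 1.8 API

    def construct(self, x):
        return self.dropout(self.relu(self.batchnorm(self.conv1(x))))
```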
## [Dataset](#contents)
You can run the scripts based on the dataset mentioned in the original paper or one widely used in the relevant domain/network architecture. The following sections describe how to run the scripts using the dataset below.
Dataset used: [LibriSpeech](<http://www.openslr.org/12>)
Training set:
train-clean-100: [6.3G] (100 hours of "clean" speech)
train-clean-360.tar.gz [23G] (360 hours of "clean" speech)
train-other-500.tar.gz [30G] (500 hours of noisier "other" speech)
Validation set:
dev-clean.tar.gz [337M] ("clean" speech)
dev-other.tar.gz [314M] ("other" speech)
Test set:
test-clean.tar.gz [346M] (test set, "clean" speech)
test-other.tar.gz [328M] (test set, "other" speech)
Data format: wav and txt files
## [Environment Requirements](#contents)
Hardware (GPU)
Prepare a hardware environment with a GPU processor.
Framework
[MindSpore](https://www.mindspore.cn/install/en)
For more information, please check the resources below:
[MindSpore tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
[MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
## [Script Description and Usage](#contents)
### [Code Directory Structure](#contents)
```path
.
└─audio
    └─jasper
        │  eval.py                          //inference script
        │  labels.json                      //character set used by the model
        │  pt2mind.py                       //converts a pth checkpoint to ckpt
        |  create_mindrecord.py             //converts the dataset to MindRecord
        │  README-CN.md                     //Chinese readme
        │  README.md                        //English readme
        │  requirements.txt                 //required libraries
        │  train.py                         //training script
        ├─scripts
        │      download_librispeech.sh      //downloads the dataset
        │      preprocess_librispeech.sh    //preprocesses the dataset
        │      run_distribute_train_gpu.sh  //8-GPU distributed training
        │      run_eval_cpu.sh              //CPU inference
        │      run_eval_gpu.sh              //GPU inference
        │      run_standalone_train_cpu.sh  //single-CPU training
        │      run_standalone_train_gpu.sh  //single-GPU training
        ├─src
        │      audio.py                     //audio processing
        │      callback.py                  //callbacks to monitor training
        │      cleaners.py                  //text cleaning
        │      config.py                    //jasper configuration
        │      dataset.py                   //data processing
        │      decoder.py                   //third-party decoder
        │      eval_callback.py             //evaluation callback
        │      greedydecoder.py             //greedy decoder adapted from MindSpore code
        │      jasper10x5dr_speca.yaml      //jasper network configuration
        │      lr_generator.py              //learning-rate generation
        │      model.py                     //training model
        │      model_test.py                //inference model
        │      number.py                    //number normalization
        │      text.py                      //text processing
        │      __init__.py
        └─utils
                convert_librispeech.py      //converts the dataset
                download_librispeech.py     //downloads the dataset
                download_utils.py           //download utilities
                inference_librispeech.csv   //links to inference datasets
                librispeech.csv             //links to the full dataset
                preprocessing_utils.py      //preprocessing utilities
                __init__.py
```
### [Model Parameters](#contents)
Parameters for training and inference can be set in the `config.py` file.
```text
config for training.
    epochs                  number of training epochs, default is 440
```
```text
config for dataloader.
    train_manifest          path of the manifest files used for training, default is 'data/libri_train_manifest.json'
    val_manifest            path of the manifest files used for testing, default is 'data/libri_val_manifest.json'
    batch_size              batch size, default is 64
    labels_path             path of the tokens json for model output, default is "./labels.json"
    sample_rate             sample rate of the data features, default is 16000
    window_size             window size for spectrogram generation (seconds), default is 0.02
    window_stride           window stride for spectrogram generation (seconds), default is 0.01
    window                  window type for spectrogram generation, default is 'hamming'
    speed_volume_perturb    use random tempo and gain perturbations, default is False, not used in the current model
    spec_augment            use simple spectral augmentation on mel spectrograms, default is False, not used in the current model
    noise_dir               directory from which noise is injected into the audio; the default '' disables noise injection, not used in the current model
    noise_prob              probability of noise being added per sample, default is 0.4, not used in the current model
    noise_min               minimum noise level to sample from (1.0 means all noise and none of the original signal), default is 0.0, not used in the current model
    noise_max               maximum noise level to sample from (maximum 1.0), default is 0.5, not used in the current model
```
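For orientation, `window_size` and `window_stride` are given in seconds, so at a 16 kHz sample rate they translate into the following frame sizes (plain arithmetic, not repository code):

```python
sample_rate = 16000
window_size, window_stride = 0.02, 0.01    # defaults above, in seconds
print(int(sample_rate * window_size))      # 320 samples per analysis window
print(int(sample_rate * window_stride))    # 160 samples between frame starts
```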
```text
config for optimizer.
    learning_rate           initial learning rate, default is 3e-4
    learning_anneal         annealing applied to the learning rate after each epoch, default is 1.1
    weight_decay            weight decay, default is 1e-5
    momentum                momentum, default is 0.9
    eps                     Adam eps, default is 1e-8
    betas                   Adam betas, default is (0.9, 0.999)
    loss_scale              loss scale, default is 1024
```
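The anneal factor divides the learning rate once per epoch; a minimal sketch of the implied schedule (the actual schedule is produced by `src/lr_generator.py` and may differ):

```python
initial_lr, learning_anneal = 3e-4, 1.1

def annealed_lr(epoch):
    # assumed schedule: divide by the anneal factor once per finished epoch
    return initial_lr / (learning_anneal ** epoch)

print([round(annealed_lr(e), 6) for e in range(5)])
# [0.0003, 0.000273, 0.000248, 0.000225, 0.000205]
```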
```text
config for checkpoint.
    ckpt_file_name_prefix   prefix of the ckpt file names, default is 'Jasper'
    ckpt_path               path where ckpt files are saved, default is 'checkpoints'
    keep_checkpoint_max     maximum number of ckpt files to keep (older checkpoints are deleted), default is 10
```
## [Training and Inference Process](#contents)
### Training
```text
usage: train.py [--use_pretrained USE_PRETRAINED]
                [--pre_trained_model_path PRE_TRAINED_MODEL_PATH]
                [--is_distributed IS_DISTRIBUTED]
                [--bidirectional BIDIRECTIONAL]
                [--device_target DEVICE_TARGET]
options:
    --pre_trained_model_path    pretrained model checkpoint path, default is ''
    --is_distributed            multi-device training, default is False
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --device_target             device on which the code runs: "GPU" | "CPU", default is "GPU"
```
### Inference
```text
usage: eval.py [--bidirectional BIDIRECTIONAL]
               [--pretrain_ckpt PRETRAIN_CKPT]
               [--device_target DEVICE_TARGET]
options:
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --pretrain_ckpt             checkpoint file path, default is ''
    --device_target             device on which the code runs: "GPU" | "CPU", default is "GPU"
```
Before training, the dataset should be downloaded and processed.
```bash
bash scripts/download_librispeech.sh
bash scripts/preprocess_librispeech.sh
python create_mindrecord.py  # convert the dataset to MindRecord format
```
After these steps finish, the data directory structure is as follows:
```path
.
|--LibriSpeech
│ |--train-clean-100-wav
│ │--train-clean-360-wav
│ │--train-other-500-wav
│ |--dev-clean-wav
│ |--dev-other-wav
│ |--test-clean-wav
│ |--test-other-wav
|--librispeech-train-clean-100-wav.json
|--librispeech-train-clean-360-wav.json
|--librispeech-train-other-500-wav.json
|--librispeech-dev-clean-wav.json
|--librispeech-dev-other-wav.json
|--librispeech-test-clean-wav.json
|--librispeech-test-other-wav.json
```
Set the dataset locations in `src/config.py`.
```shell
...
train config
"Data_dir": '/data/dataset',
"train_manifest": ['/data/dataset/librispeech-train-clean-100-wav.json',
'/data/dataset/librispeech-train-clean-360-wav.json',
'/data/dataset/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)]
eval config
"DataConfig":{
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
}
```
Before training, `librosa` and `Levenshtein` need to be installed.
After installing MindSpore via the official website and finishing dataset processing, you can start training as follows:
```shell
# standalone training on GPU
bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID]
# standalone training on CPU
bash ./scripts/run_standalone_train_cpu.sh
# distributed training on GPU
bash ./scripts/run_distribute_train_gpu.sh
```
Inference:
```shell
# evaluation on CPU
bash ./scripts/run_eval_cpu.sh [PATH_CHECKPOINT]
# evaluation on GPU
bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [PATH_CHECKPOINT]
```
## [Performance](#contents)
### [Training and Evaluation Performance](#contents)
#### Training Performance
| Parameters                 | Jasper                                                          |
| -------------------------- | --------------------------------------------------------------- |
| Resource                   | NV SMX2 V100-32G                                                 |
| Uploaded Date              | 2/7/2022 (month/day/year)                                        |
| MindSpore Version          | 1.8.0                                                            |
| Dataset                    | LibriSpeech                                                      |
| Training Parameters        | 8p, epoch=440, steps=1088 * epoch, batch_size=64, lr=3e-4        |
| Optimizer                  | Adam                                                             |
| Loss Function              | CTCLoss                                                          |
| Outputs                    | probability                                                      |
| Loss                       | 0.2-0.7                                                          |
| Speed                      | 8p 2.7s/step                                                     |
| Total training time        | 8p: around 194 h                                                 |
| Checkpoint size            | 991M (.ckpt file)                                                |
| Scripts                    | [Jasper script](https://gitee.com/mindspore/models/tree/master/research/audio/jasper) |
#### Inference Performance
| Parameters                 | Jasper                     |
| -------------------------- | -------------------------- |
| Resource                   | NV SMX2 V100-32G           |
| Uploaded Date              | 2/7/2022 (month/day/year)  |
| MindSpore Version          | 1.8.0                      |
| Dataset                    | LibriSpeech                |
| batch_size                 | 64                         |
| Outputs                    | probability                |
| Accuracy (dev-clean)       | 8p: WER: 5.754 CER: 2.151  |
| Accuracy (dev-other)       | 8p: WER: 19.213 CER: 9.393 |
| Model for inference        | 330M (.mindir file)        |
## [ModelZoo Homepage](#contents)
Please check the official [ModelZoo homepage](https://gitee.com/mindspore/models).
# Contents
- [Jasper Description](#jasper-description)
- [Model Architecture](#Model-Architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training and Eval Process](#training-and-eval-process)
- [Export](#Export)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Jasper Description](#contents)
Jasper is an end-to-end speech recognition model trained with CTC loss. The Jasper model uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. We support training and evaluation on CPU and GPU.
[Paper](https://arxiv.org/pdf/1904.03288v3.pdf): Jason Li, et al. Jasper: An End-to-End Convolutional Neural Acoustic Model.
## [Model Architecture](#contents)
Jasper is an end-to-end neural acoustic model that is based on convolutions. In the audio processing stage, each frame is transformed into mel-scale spectrogram features, which the acoustic model takes as input and outputs a probability distribution over the vocabulary for each frame. The acoustic model has a modular block structure and can be parametrized accordingly: a Jasper BxR model has B blocks, each consisting of R repeating sub-blocks.
Each sub-block applies the following operations in sequence: 1D-Convolution, Batch Normalization, ReLU activation, and Dropout.
Each block input is connected directly to the last subblock of all following blocks via a residual connection, which is referred to as dense residual in the paper. Every block differs in kernel size and number of filters, which are increasing in size from the bottom to the top layers. Irrespective of the exact block configuration parameters B and R, every Jasper model has four additional convolutional blocks: one immediately succeeding the input layer (Prologue) and three at the end of the B blocks (Epilogue).
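To make the dense-residual wiring concrete, here is a self-contained toy sketch (NumPy, with identity residual projections and the residual sum applied after the last sub-block for brevity; the real model projects each residual through a 1x1 convolution and adds it before the final activation and dropout):

```python
import numpy as np

def jasper_encoder_forward(x, blocks):
    """Dense residual: every earlier block input is summed into the
    output of the last sub-block of each later block."""
    residuals = []
    for sub_blocks in blocks:
        residuals.append(x)
        out = x
        for f in sub_blocks:
            out = f(out)
        x = out + sum(residuals)  # dense-residual sum
    return x

# toy usage: 3 blocks of 2 sub-blocks, each sub-block a fixed nonlinearity
relu = lambda t: np.maximum(t, 0.0)
blocks = [[relu, relu] for _ in range(3)]
print(jasper_encoder_forward(np.ones((2, 4)), blocks).shape)  # (2, 4)
```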
## [Dataset](#contents)
Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below.
Dataset used: [LibriSpeech](<http://www.openslr.org/12>)
Train Data:
train-clean-100: [6.3G] (training set of 100 hours "clean" speech)
train-clean-360.tar.gz [23G] (training set of 360 hours "clean" speech)
train-other-500.tar.gz [30G] (training set of 500 hours "other" speech)
Val Data:
dev-clean.tar.gz [337M] (development set, "clean" speech)
dev-other.tar.gz [314M] (development set, "other", more challenging, speech)
Test Data:
test-clean.tar.gz [346M] (test set, "clean" speech )
test-other.tar.gz [328M] (test set, "other" speech )
Data format:wav and txt files
## [Environment Requirements](#contents)
Hardware(GPU)
Prepare hardware environment with GPU processor.
Framework
[MindSpore](https://www.mindspore.cn/install/en)
For more information, please check the resources below:
[MindSpore tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
[MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
## [Script Description](#contents)
### [Script and Sample Code](#contents)
```path
.
└─audio
    └─jasper
        │  eval.py                          //inference file
        │  labels.json                      //label file
        │  pt2mind.py                       //pth transform to ckpt file
        |  create_mindrecord.py             //transform data to mindrecord
        │  README-CN.md                     //Chinese readme
        │  README.md                        //English readme
        │  requirements.txt                 //required library file
        │  train.py                         //train file
        ├─scripts
        │      download_librispeech.sh      //download data
        │      preprocess_librispeech.sh    //preprocess data
        │      run_distribute_train_gpu.sh  //8-GPU distributed training
        │      run_eval_cpu.sh              //CPU evaluate
        │      run_eval_gpu.sh              //GPU evaluate
        │      run_standalone_train_cpu.sh  //one CPU train
        │      run_standalone_train_gpu.sh  //one GPU train
        ├─src
        │      audio.py                     //preprocess data
        │      callback.py                  //callback
        │      cleaners.py                  //preprocess data
        │      config.py                    //jasper config
        │      dataset.py                   //preprocess data
        │      decoder.py                   //third-party decoder
        │      eval_callback.py             //evaluate callback
        │      greedydecoder.py             //refactored greedydecoder
        │      jasper10x5dr_speca.yaml      //jasper model's config
        │      lr_generator.py              //learning rate
        │      model.py                     //training model
        │      model_test.py                //inference model
        │      number.py                    //preprocess data
        │      text.py                      //preprocess data
        │      __init__.py
        └─utils
                convert_librispeech.py      //convert data
                download_librispeech.py     //download data
                download_utils.py           //download utils
                inference_librispeech.csv   //links to inference data
                librispeech.csv             //links to all data
                preprocessing_utils.py      //preprocessing utils
                __init__.py
```
### [Script Parameters](#contents)
#### Training
```text
usage: train.py [--use_pretrained USE_PRETRAINED]
                [--pre_trained_model_path PRE_TRAINED_MODEL_PATH]
                [--is_distributed IS_DISTRIBUTED]
                [--bidirectional BIDIRECTIONAL]
                [--device_target DEVICE_TARGET]
options:
    --pre_trained_model_path    pretrained checkpoint path, default is ''
    --is_distributed            distributed training, default is False
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --device_target             device where the code will be implemented: "GPU" | "CPU", default is "GPU"
```
#### Evaluation
```text
usage: eval.py [--bidirectional BIDIRECTIONAL]
               [--pretrain_ckpt PRETRAIN_CKPT]
               [--device_target DEVICE_TARGET]
options:
    --bidirectional             whether to use a bidirectional RNN, default is True.
                                Currently, only the bidirectional model is implemented
    --pretrain_ckpt             saved checkpoint path, default is ''
    --device_target             device where the code will be implemented: "GPU" | "CPU", default is "GPU"
```
#### Options and Parameters
Parameters for training and evaluation can be set in file `config.py`
```text
config for training.
    epochs                  number of training epochs, default is 440
```
```text
config for dataloader.
    train_manifest          train manifest path, default is 'data/libri_train_manifest.json'
    val_manifest            dev manifest path, default is 'data/libri_val_manifest.json'
    batch_size              batch size for training, default is 64
    labels_path             tokens json path for model output, default is "./labels.json"
    sample_rate             sample rate for the data/model features, default is 16000
    window_size             window size for spectrogram generation (seconds), default is 0.02
    window_stride           window stride for spectrogram generation (seconds), default is 0.01
    window                  window type for spectrogram generation, default is 'hamming'
    speed_volume_perturb    use random tempo and gain perturbations, default is False, not used in current model
    spec_augment            use simple spectral augmentation on mel spectrograms, default is False, not used in current model
    noise_dir               directory to inject noise from; the default '' disables noise injection, not used in current model
    noise_prob              probability of noise being added per sample, default is 0.4, not used in current model
    noise_min               minimum noise level to sample from (1.0 means all noise, none of the original signal), default is 0.0, not used in current model
    noise_max               maximum noise level to sample from (maximum 1.0), default is 0.5, not used in current model
```
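To see how the spectrogram parameters combine, here is a hypothetical feature-extraction call using `librosa` (listed in `requirements.txt`); the repository's real pipeline lives in `src/audio.py` and `src/dataset.py`:

```python
import numpy as np
import librosa

sample_rate = 16000
window_size, window_stride = 0.02, 0.01            # seconds, as above
n_fft = int(sample_rate * window_size)             # 320 samples per window
hop_length = int(sample_rate * window_stride)      # 160 samples per hop

samples = np.random.uniform(-1.0, 1.0, sample_rate)  # 1 s of dummy audio
mel = librosa.feature.melspectrogram(
    y=samples, sr=sample_rate, n_fft=n_fft,
    hop_length=hop_length, window='hamming', n_mels=64)
print(mel.shape)  # (64, 101): 64 mel bins, one frame every 10 ms
```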
```text
config for optimizer.
    learning_rate           initial learning rate, default is 3e-4
    learning_anneal         annealing applied to the learning rate after each epoch, default is 1.1
    weight_decay            weight decay, default is 1e-5
    momentum                momentum, default is 0.9
    eps                     Adam eps, default is 1e-8
    betas                   Adam betas, default is (0.9, 0.999)
    loss_scale              loss scale, default is 1024
```
```text
config for checkpoint.
    ckpt_file_name_prefix   prefix of the ckpt file names, default is 'Jasper'
    ckpt_path               path to save ckpt files, default is 'checkpoints'
    keep_checkpoint_max     max number of checkpoints to keep; older checkpoints are deleted, default is 10
```
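These three values map directly onto MindSpore's checkpoint callbacks; a hypothetical wiring (the repository's `train.py` performs the actual setup):

```python
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

ckpt_config = CheckpointConfig(keep_checkpoint_max=10)   # keep_checkpoint_max
ckpt_cb = ModelCheckpoint(prefix='Jasper',               # ckpt_file_name_prefix
                          directory='checkpoints',       # ckpt_path
                          config=ckpt_config)
# then: model.train(epochs, dataset, callbacks=[ckpt_cb, ...])
```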
## [Training and Eval process](#contents)
Before training, the dataset should be processed.
``` bash
bash scripts/download_librispeech.sh
bash scripts/preprocess_librispeech.sh
python create_mindrecord.py  # transform data to mindrecord
```
The dataset directory structure is as follows:
```path
.
|--LibriSpeech
│ |--train-clean-100-wav
│ │--train-clean-360-wav
│ │--train-other-500-wav
│ |--dev-clean-wav
│ |--dev-other-wav
│ |--test-clean-wav
│ |--test-other-wav
|--librispeech-train-clean-100-wav.json
|--librispeech-train-clean-360-wav.json
|--librispeech-train-other-500-wav.json
|--librispeech-dev-clean-wav.json
|--librispeech-dev-other-wav.json
|--librispeech-test-clean-wav.json
|--librispeech-test-other-wav.json
```
Each *.json manifest stores the absolute paths of the corresponding data. After obtaining the manifest files, you should modify the configurations in `src/config.py`: for the train config, `train_manifest` should list the training manifests, and for the eval config, `test_manifest` should point to the manifest of whichever subset you want to evaluate.
```shell
...
train config
"Data_dir": '/data/dataset',
"train_manifest": ['/data/dataset/librispeech-train-clean-100-wav.json',
'/data/dataset/librispeech-train-clean-360-wav.json',
'/data/dataset/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)]
eval config
"DataConfig":{
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
}
```
Before training, some requirements need to be installed, including `librosa` and `Levenshtein`.
After installing MindSpore via the official website and finishing dataset processing, you can start training as follows:
```shell
# standalone training gpu
bash ./scripts/run_standalone_train_gpu.sh [DEVICE_ID]
# standalone training cpu
bash ./scripts/run_standalone_train_cpu.sh
# distributed training gpu
bash ./scripts/run_distribute_train_gpu.sh
```
The following scripts are used to evaluate the model. Note that only the greedy decoder is supported for now:
```shell
# eval on cpu
bash ./scripts/run_eval_cpu.sh [PATH_CHECKPOINT]
# eval on gpu
bash ./scripts/run_eval_gpu.sh [DEVICE_ID] [PATH_CHECKPOINT]
```
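For reference, greedy CTC decoding is just a per-frame argmax followed by collapsing repeats and dropping the blank token; a minimal NumPy sketch (the repository's real implementation is `src/greedydecoder.py`):

```python
import numpy as np

def greedy_ctc_decode(scores, symbols, blank_index):
    """scores: (time, num_symbols) array of per-frame scores."""
    best = np.argmax(scores, axis=1)
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_index:  # collapse repeats, drop blank
            out.append(symbols[idx])
        prev = idx
    return ''.join(out)

symbols = list("AB ") + ['_']                   # '_' is the CTC blank here
scores = np.array([[.9, 0, 0, .1], [.9, 0, 0, .1],
                   [.1, .8, 0, .1], [0, 0, 0, 1.]])
print(greedy_ctc_decode(scores, symbols, blank_index=3))  # "AB"
```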
## [Model Description](#contents)
### [Performance](#contents)
#### Training Performance
| Parameters | Jasper |
| -------------------- | ------------------------------------------------------------ |
| Resource | NV SMX2 V100-32G |
| uploaded Date | 2/7/2022 (month/day/year) |
| MindSpore Version | 1.8.0 |
| Dataset | LibriSpeech |
| Training Parameters  | 8p, epoch=440, steps=1088 * epoch, batch_size = 64, lr=3e-4  |
| Optimizer | Adam |
| Loss Function | CTCLoss |
| outputs | probability |
| Loss | 0.2-0.7 |
| Speed | 8p 2.7s/step |
| Total time: training | 8p: around 194 h; |
| Checkpoint | 991M (.ckpt file) |
| Scripts | [Jasper script](https://gitee.com/mindspore/models/tree/master/research/audio/jasper) |
#### Inference Performance
| Parameters | Jasper |
| ------------------- | -------------------------- |
| Resource | NV SMX2 V100-32G |
| uploaded Date | 2/7/2022 (month/day/year) |
| MindSpore Version | 1.8.0 |
| Dataset | LibriSpeech |
| batch_size | 64 |
| outputs | probability |
| Accuracy(dev-clean) | 8p: WER: 5.754 CER: 2.151 |
| Accuracy(dev-other) | 8p: WER: 19.213 CER: 9.393 |
| Model for inference | 330M (.mindir file) |
## [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/models).
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import os
from multiprocessing import Pool

import mindspore.dataset.engine as de
from mindspore.mindrecord import FileWriter

from src.dataset import ASRDataset
from src.config import train_config, symbols


def _exec_task(task_id):
    """
    Execute task with specified task id
    """
    print("exec task {}...".format(task_id))
    # each task writes one MindRecord shard
    writer = FileWriter(mindrecord_file.format(task_id), 1)
    writer.set_page_size(1 << 25)
    jasper_json = {
        "batch_spect": {"type": "float32", "shape": [1, 64, -1]},
        "batch_script": {"type": "int32", "shape": [-1,]}
    }
    writer.add_schema(jasper_json, "jasper_json")
    output_columns = ["batch_spect", "batch_script"]
    dataset = ASRDataset(data_dir=train_config.DataConfig.Data_dir,
                         manifest_fpaths=train_config.DataConfig.train_manifest,
                         labels=symbols,
                         batch_size=1,
                         train_mode=True)
    ds = de.GeneratorDataset(dataset, output_columns,
                             num_shards=num_tasks, shard_id=task_id)
    dataset_size = ds.get_dataset_size()
    for c, item in enumerate(ds.create_dict_iterator(output_numpy=True)):
        row = {"batch_spect": item["batch_spect"],
               "batch_script": item["batch_script"]}
        writer.write_raw_data([row])
        print(f"{c}/{dataset_size}", flush=True)
    writer.commit()


if __name__ == "__main__":
    mindrecord_file = train_config.DataConfig.mindrecord_format
    mindrecord_dir = os.path.dirname(mindrecord_file)
    if not os.path.isdir(mindrecord_dir):
        os.makedirs(mindrecord_dir)
    num_tasks = 8
    print("Write mindrecord ...")
    task_list = list(range(num_tasks))
    # Windows has no fork, so run the tasks sequentially there
    if os.name == 'nt':
        for window_task_id in task_list:
            _exec_task(window_task_id)
    elif num_tasks > 1:
        with Pool(num_tasks) as p:
            p.map(_exec_task, task_list)
    else:
        _exec_task(0)
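To spot-check the shards written above, one can read a shard back with `MindDataset`; a small verification sketch (the file path follows `train_config.DataConfig.mindrecord_files` and is assumed to exist):

```python
import mindspore.dataset as ds

# read one shard back and confirm the two columns round-trip
data = ds.MindDataset(dataset_files="/data/jasper_tr0.md",
                      columns_list=["batch_spect", "batch_script"])
for item in data.create_dict_iterator(output_numpy=True, num_epochs=1):
    print(item["batch_spect"].shape, item["batch_script"].shape)
    break
```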
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Eval for Japer
"""
import argparse
import json
import pickle
import numpy as np
from src.config import eval_config, symbols, encoder_kw, decoder_kw
from src.model_test import Jasper, PredictWithSoftmax
from src.dataset import create_eval_dataset
from src.decoder import GreedyDecoder
from mindspore import context
from mindspore.train.serialization import load_checkpoint, load_param_into_net
parser = argparse.ArgumentParser(description='jasper evaluation')
parser.add_argument('--pretrain_ckpt', type=str,
default='./checkpoint/ckpt_0/jasper10.ckpt', help='Pretrained checkpoint path')
parser.add_argument('--device_target', type=str, default="GPU", choices=("GPU", "CPU"),
help='Device target, support GPU and CPU, Default: GPU')
args = parser.parse_args()
if __name__ == '__main__':
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target, save_graphs=False)
config = eval_config
with open(config.DataConfig.labels_path) as label_file:
labels = json.load(label_file)
model = PredictWithSoftmax(
Jasper(encoder_kw=encoder_kw, decoder_kw=decoder_kw))
ds_eval = create_eval_dataset(data_dir=config.DataConfig.Data_dir,
manifest_filepath=config.DataConfig.test_manifest,
labels=symbols, batch_size=config.DataConfig.batch_size, train_mode=False)
param_dict = load_checkpoint(args.pretrain_ckpt)
load_param_into_net(model, param_dict)
print('Successfully loading the pre-trained model')
if config.LMConfig.decoder_type == 'greedy':
decoder = GreedyDecoder(labels=symbols, blank_index=len(symbols)-1)
else:
raise NotImplementedError("Only greedy decoder is supported now")
target_decoder = GreedyDecoder(symbols, blank_index=len(symbols)-1)
model.set_train(False)
total_cer, total_wer, num_tokens, num_chars = 0, 0, 0, 0
output_data = []
for data in ds_eval.create_dict_iterator():
inputs, input_length, target_indices, targets = data['inputs'], data['input_length'], data['target_indices'], \
data['targets']
split_targets = []
start, count, last_id = 0, 0, 0
target_indices, targets = target_indices.asnumpy(), targets.asnumpy()
for i in range(np.shape(targets)[0]):
if target_indices[i, 0] == last_id:
count += 1
else:
split_targets.append(list(targets[start:count]))
last_id += 1
start = count
count += 1
split_targets.append(list(targets[start:]))
out, output_sizes = model(inputs, input_length)
decoded_output, _ = decoder.decode(out, output_sizes)
target_strings = target_decoder.convert_to_strings(split_targets)
if config.save_output is not None:
output_data.append(
(out.asnumpy(), output_sizes.asnumpy(), target_strings))
for doutput, toutput in zip(decoded_output, target_strings):
transcript, reference = doutput[0], toutput[0]
wer_inst = decoder.wer(transcript, reference)
cer_inst = decoder.cer(transcript, reference)
total_wer += wer_inst
total_cer += cer_inst
num_tokens += len(reference.split())
num_chars += len(reference.replace(' ', ''))
if config.verbose:
print("Ref:", reference.lower())
print("Hyp:", transcript.lower())
print("WER:", float(wer_inst) / len(reference.split()),
"CER:", float(cer_inst) / len(reference.replace(' ', '')), "\n")
wer = float(total_wer) / num_tokens
cer = float(total_cer) / num_chars
print('Test Summary \t'
'Average WER {wer:.3f}\t'
'Average CER {cer:.3f}\t'.format(wer=wer * 100, cer=cer * 100))
if config.save_output is not None:
with open(config.save_output + '.bin', 'wb') as output:
pickle.dump(output_data, output)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
export checkpoint file to mindir model
"""
import json
import argparse
import numpy as np
import mindspore as ms
from mindspore import context, Tensor
from mindspore.train.serialization import load_checkpoint, load_param_into_net, export
from src.config import train_config, encoder_kw, decoder_kw
from src.model import Jasper
parser = argparse.ArgumentParser(
description='Export DeepSpeech model to Mindir')
parser.add_argument('--pre_trained_model_path', type=str,
default='', help=' existed checkpoint path')
parser.add_argument('--device_target', type=str, default="GPU", choices=("GPU", "CPU"),
help='Device target, support GPU and CPU, Default: GPU')
args = parser.parse_args()
if __name__ == '__main__':
config = train_config
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target, save_graphs=False)
with open(config.DataConfig.labels_path) as label_file:
labels = json.load(label_file)
jasper_net = Jasper(encoder_kw=encoder_kw,
decoder_kw=decoder_kw).to_float(ms.float16)
param_dict = load_checkpoint(args.pre_trained_model_path)
load_param_into_net(jasper_net, param_dict)
print('Successfully loading the pre-trained model')
# 3500 is the max length in evaluation dataset(LibriSpeech). This is consistent with that in dataset.py
# The length is fixed to this value because Mindspore does not support dynamic shape currently
input_np = np.random.uniform(
0.0, 1.0, size=[1, 64, 3500]).astype(np.float32)
length = np.array([100], dtype=np.int32)
export(jasper_net, Tensor(input_np), Tensor(length),
file_name="jasper.mindir", file_format='MINDIR')
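Once exported, the MINDIR can be loaded back for inference; a minimal sketch (input shapes must match the fixed export shape above):

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor

# load the exported graph and wrap it in a GraphCell for execution
graph = ms.load("jasper.mindir")
net = nn.GraphCell(graph)
spect = Tensor(np.random.uniform(0.0, 1.0, size=[1, 64, 3500]).astype(np.float32))
length = Tensor(np.array([100], dtype=np.int32))
out = net(spect, length)
```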
[
"'",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W",
"X",
"Y",
"Z",
" ",
"_"
]
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import re

import numpy as np
import torch
from mindspore import Tensor
from mindspore.train.serialization import save_checkpoint

parser = argparse.ArgumentParser(description='pth translate to ckpt')
parser.add_argument('--pth', type=str,
                    default='/data/Jasper_epoch10_checkpoint.pt', help='path of pth')
args = parser.parse_args()


def convert_v1_state_dict(state_dict):
    rules = [
        ('^jasper_encoder.encoder.', 'encoder.layers.'),
        ('^jasper_decoder.decoder_layers.', 'decoder.layers.'),
    ]
    ret = {}
    for k, v in state_dict.items():
        if k.startswith('acoustic_model.'):
            continue
        if k.startswith('audio_preprocessor.'):
            continue
        for pattern, to in rules:
            k = re.sub(pattern, to, k)
        ret[k] = v
    return ret


checkpoint = torch.load(args.pth, map_location="cpu")
state_dic = convert_v1_state_dict(checkpoint['state_dict'])
mydict = state_dic
newparams_list = []
names = [item for item in mydict if 'num_batches_tracked' not in item]
i = 0
# Parameters repeat in groups of five: conv weight, BN gamma, BN beta,
# BN moving mean, BN moving variance. Rename each to its MindSpore equivalent.
for name in names:
    parameter = mydict[name].numpy()
    param_dict = {}
    if i % 5 == 0:
        name = name.replace('weight', 'conv1.weight')
        # add a singleton axis so the 1D kernel matches the MindSpore weight layout
        parameter = np.expand_dims(parameter, axis=2)
    elif i % 5 == 1:
        name = name.replace('weight', 'batchnorm.gamma')
    elif i % 5 == 2:
        name = name.replace('bias', 'batchnorm.beta')
    elif i % 5 == 3:
        name = name.replace('running_mean', 'batchnorm.moving_mean')
    else:
        name = name.replace('running_var', 'batchnorm.moving_variance')
    # the last two parameters (index 540/541) are renamed back to plain weight/bias
    if i == 540:
        name = name.replace('0.conv1.weight', 'weight')
    if i == 541:
        name = name.replace('0.bias', 'bias')
    param_dict['name'] = name
    param_dict['data'] = Tensor(parameter)
    newparams_list.append(param_dict)
    if i % 5 == 4:
        # reorder so the checkpoint stores moving_mean, moving_variance, gamma, beta
        newparams_list[i-3], newparams_list[i-2], newparams_list[i-1], newparams_list[i] = \
            newparams_list[i-1], newparams_list[i], newparams_list[i-3], newparams_list[i-2]
    i += 1
save_checkpoint(newparams_list, './jasper_mindspore_10.ckpt')
print("end")
ctcdecode==1.0.2
easydict==1.9
inflect==5.4.0
librosa==0.8.0
mindspore==1.8.0
numpy==1.20.1
pandas==1.2.4
python_Levenshtein==0.12.2
PyYAML==6.0
requests==2.25.1
six==1.15.0
SoundFile==0.10.3.post1
sox==1.4.1
tqdm==4.59.0
Unidecode==1.3.4
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DATA_SET=$1
DATA_ROOT_DIR=$2
DATA_DIR="${DATA_ROOT_DIR}/${DATA_SET}"
if [ ! -d "$DATA_DIR" ]
then
mkdir --mode 755 $DATA_DIR
python utils/download_librispeech.py \
utils/inference_librispeech.csv \
$DATA_DIR \
-e ${DATA_ROOT_DIR}/
else
echo "Directory $DATA_DIR already exists."
fi
#!/usr/bin/env bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
SPEEDS=$1
[ -n "$SPEEDS" ] && SPEED_FLAG="--speed $SPEEDS"
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-clean-100 \
--dest_dir /datasets/LibriSpeech/train-clean-100-wav \
--output_json /datasets/LibriSpeech/librispeech-train-clean-100-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-clean-360 \
--dest_dir /datasets/LibriSpeech/train-clean-360-wav \
--output_json /datasets/LibriSpeech/librispeech-train-clean-360-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/train-other-500 \
--dest_dir /datasets/LibriSpeech/train-other-500-wav \
--output_json /datasets/LibriSpeech/librispeech-train-other-500-wav.json \
$SPEED_FLAG
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/dev-clean \
--dest_dir /datasets/LibriSpeech/dev-clean-wav \
--output_json /datasets/LibriSpeech/librispeech-dev-clean-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/dev-other \
--dest_dir /datasets/LibriSpeech/dev-other-wav \
--output_json /datasets/LibriSpeech/librispeech-dev-other-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/test-clean \
--dest_dir /datasets/LibriSpeech/test-clean-wav \
--output_json /datasets/LibriSpeech/librispeech-test-clean-wav.json
python ./utils/convert_librispeech.py \
--input_dir /datasets/LibriSpeech/test-other \
--dest_dir /datasets/LibriSpeech/test-other-wav \
--output_json /datasets/LibriSpeech/librispeech-test-other-wav.json
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-stdout \
python ./train.py --is_distributed --device_target 'GPU' > train_8p.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
PATH_CHECKPOINT=$1
python ./eval.py --pretrain_ckpt $PATH_CHECKPOINT --device_target 'CPU' > eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DEVICE_ID=$1
PATH_CHECKPOINT=$2
export CUDA_VISIBLE_DEVICES=$DEVICE_ID
python ./eval.py --pretrain_ckpt $PATH_CHECKPOINT \
--device_target 'GPU' > eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
python ./train.py --device_target 'CPU' > train.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DEVICE_ID=$1
CUDA_VISIBLE_DEVICES=$DEVICE_ID python ./train.py --device_target 'GPU' > train.log 2>&1 &
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import random

import soundfile as sf
import librosa
import numpy as np
import sox


def audio_from_file(file_path, offset=0, duration=0, trim=False, target_sr=16000):
    audio = AudioSegment(file_path, target_sr=target_sr, int_values=False,
                         offset=offset, duration=duration, trim=trim)
    samples = audio.samples
    num_samples = samples.shape[0]
    return (np.expand_dims(samples, 0), np.expand_dims(num_samples, 0))


class AudioSegment:
    """Monaural audio segment abstraction.
    :param samples: Audio samples [num_samples x num_channels].
    :type samples: ndarray.float32
    :param sample_rate: Audio sample rate.
    :type sample_rate: int
    :raises TypeError: If the sample data type is not float or int.
    """

    def __init__(self, filename, target_sr=None, int_values=False, offset=0,
                 duration=0, trim=False, trim_db=60):
        """Create audio segment from samples.
        Samples are converted to float32 internally, with ints scaled to [-1, 1].
        Load a file supported by librosa and return it as an AudioSegment.
        :param filename: path of file to load
        :param target_sr: the desired sample rate
        :param int_values: if true, load samples as 32-bit integers
        :param offset: offset in seconds when loading audio
        :param duration: duration in seconds when loading audio
        :return: numpy array of samples
        """
        with sf.SoundFile(filename, 'r') as f:
            dtype = 'int32' if int_values else 'float32'
            sample_rate = f.samplerate
            if offset > 0:
                f.seek(int(offset * sample_rate))
            if duration > 0:
                samples = f.read(int(duration * sample_rate), dtype=dtype)
            else:
                samples = f.read(dtype=dtype)
        samples = samples.transpose()
        samples = self._convert_samples_to_float32(samples)
        if target_sr is not None and target_sr != sample_rate:
            samples = librosa.core.resample(samples, sample_rate, target_sr)
            sample_rate = target_sr
        if trim:
            samples, _ = librosa.effects.trim(samples, trim_db)
        self._samples = samples
        self._sample_rate = sample_rate
        if self._samples.ndim >= 2:
            # average the channels to obtain a mono signal
            self._samples = np.mean(self._samples, 1)

    def __eq__(self, other):
        """Return whether two objects are equal."""
        if type(other) is not type(self):
            return False
        if self._sample_rate != other._sample_rate:  # pylint: disable=W0212
            return False
        if self._samples.shape != other._samples.shape:  # pylint: disable=W0212
            return False
        if np.any(self.samples != other._samples):  # pylint: disable=W0212
            return False
        return True

    def __ne__(self, other):
        """Return whether two objects are unequal."""
        return not self.__eq__(other)

    def __str__(self):
        """Return human-readable representation of segment."""
        return ("%s: num_samples=%d, sample_rate=%d, duration=%.2fsec, "
                "rms=%.2fdB" % (type(self), self.num_samples, self.sample_rate,
                                self.duration, self.rms_db))

    @staticmethod
    def _convert_samples_to_float32(samples):
        """Convert sample type to float32.
        Audio sample type is usually integer or float-point.
        Integers will be scaled to [-1, 1] in float32.
        """
        float32_samples = samples.astype('float32')
        if samples.dtype in np.sctypes['int']:
            bits = np.iinfo(samples.dtype).bits
            float32_samples *= (1. / 2 ** (bits - 1))
        elif samples.dtype in np.sctypes['float']:
            pass
        else:
            raise TypeError("Unsupported sample type: %s." % samples.dtype)
        return float32_samples

    @property
    def samples(self):
        return self._samples.copy()

    @property
    def sample_rate(self):
        return self._sample_rate

    @property
    def num_samples(self):
        return self._samples.shape[0]

    @property
    def duration(self):
        return self._samples.shape[0] / float(self._sample_rate)

    @property
    def rms_db(self):
        mean_square = np.mean(self._samples ** 2)
        return 10 * np.log10(mean_square)

    def gain_db(self, gain):
        self._samples *= 10. ** (gain / 20.)

    def pad(self, pad_size, symmetric=False):
        """Add zero padding to the sample.
        The pad size is given in number of samples. If symmetric=True,
        `pad_size` will be added to both sides. If false, `pad_size` zeros
        will be added only to the end.
        """
        self._samples = np.pad(self._samples,
                               (pad_size if symmetric else 0, pad_size),
                               mode='constant')

    def subsegment(self, start_time=None, end_time=None):
        """Cut the AudioSegment between given boundaries.
        Note that this is an in-place transformation.
        :param start_time: Beginning of subsegment in seconds.
        :type start_time: float
        :param end_time: End of subsegment in seconds.
        :type end_time: float
        :raise ValueError: If start_time or end_time is incorrectly set, e.g. out
        of bounds in time.
        """
        start_time = 0.0 if start_time is None else start_time
        end_time = self.duration if end_time is None else end_time
        if start_time < 0.0:
            start_time = self.duration + start_time
        if end_time < 0.0:
            end_time = self.duration + end_time
        if start_time < 0.0:
            raise ValueError("The slice start position (%f s) is out of "
                             "bounds." % start_time)
        if end_time < 0.0:
            raise ValueError("The slice end position (%f s) is out of bounds." %
                             end_time)
        if start_time > end_time:
            raise ValueError("The slice start position (%f s) is later than "
                             "the end position (%f s)." % (start_time, end_time))
        if end_time > self.duration:
            raise ValueError("The slice end position (%f s) is out of bounds "
                             "(> %f s)" % (end_time, self.duration))
        start_sample = int(round(start_time * self._sample_rate))
        end_sample = int(round(end_time * self._sample_rate))
        self._samples = self._samples[start_sample:end_sample]


class Perturbation:
    def __init__(self, p=0.1, rng=None):
        self.p = p
        self._rng = random.Random() if rng is None else rng

    def maybe_apply(self, segment, sample_rate=None):
        if self._rng.random() < self.p:
            self(segment, sample_rate)  # pylint: disable=E1102


class SpeedPerturbation(Perturbation):
    def __init__(self, min_rate=0.85, max_rate=1.15, discrete=False, p=0.1, rng=None):
        super(SpeedPerturbation, self).__init__(p, rng)
        assert 0 < min_rate < max_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.discrete = discrete

    def __call__(self, data, sample_rate):
        if self.discrete:
            rate = np.random.choice([self.min_rate, None, self.max_rate])
        else:
            rate = self._rng.uniform(self.min_rate, self.max_rate)
        if rate is not None:
            data._samples = sox.Transformer().speed(factor=rate).build_array(  # pylint: disable=W0212
                input_array=data._samples, sample_rate_in=sample_rate)  # pylint: disable=W0212
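A quick usage sketch for `audio_from_file` (the wav path below is a placeholder): it returns the samples with a leading batch axis, plus the sample count, resampled to `target_sr`:

```python
from src.audio import audio_from_file

# hypothetical path to one of the converted LibriSpeech wav files
samples, num_samples = audio_from_file(
    '/data/LibriSpeech/dev-clean-wav/example.wav', target_sr=16000)
print(samples.shape, num_samples)  # (1, N) float32 in [-1, 1], and array([N])
```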
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Defined callback for jasper.
"""
import time
import math

import numpy as np
from mindspore.train.callback import Callback
from mindspore import Tensor


class TimeMonitor(Callback):
    """
    Time monitor for calculating cost of each epoch.

    Args:
        data_size (int): step size of an epoch.
    """

    def __init__(self, data_size):
        super(TimeMonitor, self).__init__()
        self.data_size = data_size

    def epoch_begin(self, run_context):
        self.epoch_time = time.time()

    def epoch_end(self, run_context):
        epoch_mseconds = (time.time() - self.epoch_time) * 1000
        per_step_mseconds = epoch_mseconds / self.data_size
        print("epoch time: {0}, per step time: {1}".format(
            epoch_mseconds, per_step_mseconds), flush=True)

    def step_begin(self, run_context):
        self.step_time = time.time()

    def step_end(self, run_context):
        step_mseconds = (time.time() - self.step_time) * 1000
        print(f"step time {step_mseconds}", flush=True)


class Monitor(Callback):
    """
    Monitor loss and time.

    Args:
        lr_init (numpy array): train lr

    Returns:
        None
    """

    def __init__(self, lr_init=None):
        super(Monitor, self).__init__()
        self.lr_init = lr_init
        self.lr_init_len = len(lr_init)

    def epoch_begin(self, run_context):
        self.losses = []
        self.step_now = 0
        self.step_nan = 0
        self.epoch_time = time.time()

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        epoch_seconds = time.time() - self.epoch_time
        per_step_seconds = epoch_seconds / cb_params.batch_num
        print("epoch time: {:5.3f}, per step time: {:5.3f}, avg loss: {:5.3f}".format(epoch_seconds,
                                                                                      per_step_seconds,
                                                                                      np.mean(self.losses)))

    def step_begin(self, run_context):
        self.step_time = time.time()

    def step_end(self, run_context):
        """Log loss, time, and learning rate for the finished step."""
        cb_params = run_context.original_args()
        step_seconds = time.time() - self.step_time
        step_loss = cb_params.net_outputs
        if isinstance(step_loss, (tuple, list)) and isinstance(step_loss[0], Tensor):
            step_loss = step_loss[0]
        if isinstance(step_loss, Tensor):
            step_loss = np.mean(step_loss.asnumpy())
        # skip NaN/Inf losses so the running average stays meaningful
        if not math.isnan(step_loss) and not math.isinf(step_loss):
            self.losses.append(step_loss)
        cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num
        print("epoch: [{:3d}/{:3d}], step:[{:5d}/{:5d}], loss:[{:5.3f}/{:5.3f}], time:[{:5.3f}], lr:[{:.9f}]".format(
            cb_params.cur_epoch_num - 1, cb_params.epoch_num, cur_step_in_epoch, cb_params.batch_num, step_loss,
            np.mean(self.losses), step_seconds, self.lr_init[cb_params.cur_step_num - 1].asnumpy()))
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import re

from unidecode import unidecode

from .number import normalize_numbers

# Regular expression matching whitespace:
_whitespace_re = re.compile(r'\s+')

# List of (regular expression, replacement) pairs for abbreviations:
_abbreviations = [(re.compile('\\b%s\\.' % x[0], re.IGNORECASE), x[1]) for x in [
    ('mrs', 'misess'),
    ('mr', 'mister'),
    ('dr', 'doctor'),
    ('st', 'saint'),
    ('co', 'company'),
    ('jr', 'junior'),
    ('maj', 'major'),
    ('gen', 'general'),
    ('drs', 'doctors'),
    ('rev', 'reverend'),
    ('lt', 'lieutenant'),
    ('hon', 'honorable'),
    ('sgt', 'sergeant'),
    ('capt', 'captain'),
    ('esq', 'esquire'),
    ('ltd', 'limited'),
    ('col', 'colonel'),
    ('ft', 'fort'),
]]


def expand_abbreviations(text):
    for regex, replacement in _abbreviations:
        text = re.sub(regex, replacement, text)
    return text


def expand_numbers(text):
    return normalize_numbers(text)


def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    return re.sub(_whitespace_re, ' ', text)


def convert_to_ascii(text):
    return unidecode(text)


def remove_punctuation(text, table):
    text = text.translate(table)
    text = re.sub(r'&', " and ", text)
    text = re.sub(r'\+', " plus ", text)
    return text


def basic_cleaners(text):
    '''Basic pipeline that lowercases and collapses whitespace without transliteration.'''
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def transliteration_cleaners(text):
    '''Pipeline for non-English text that transliterates to ASCII.'''
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def english_cleaners(text, table=None):
    '''Pipeline for English text, including number and abbreviation expansion.'''
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_numbers(text)
    text = expand_abbreviations(text)
    if table is not None:
        text = remove_punctuation(text, table)
    text = collapse_whitespace(text)
    return text
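An expected input/output pair for `english_cleaners` (output shown under the assumption that `normalize_numbers` spells out integers, as in the standard cleaners this module follows):

```python
from src.cleaners import english_cleaners

print(english_cleaners("Dr. Smith read 2 books."))
# expected: "doctor smith read two books."
```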
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""
network config setting, will be used in train.py and eval.py
"""
import inspect
import yaml
from easydict import EasyDict as ed
from src.model import JasperBlock, JasperDecoderForCTC, JasperEncoder
train_config = ed({
"TrainingConfig": {
"epochs": 440,
"loss_scale": 128.0
},
"DataConfig": {
"Data_dir": '/data/train_datasets',
"train_manifest": ['/data/train_datasets/librispeech-train-clean-100-wav.json',
'/data/train_datasets/librispeech-train-clean-360-wav.json',
'/data/train_datasets/librispeech-train-other-500-wav.json'],
"mindrecord_format": "/data/jasper_tr{}.md",
"mindrecord_files": [f"/data/jasper_tr{i}.md" for i in range(8)],
"batch_size": 64,
"accumulation_step": 2,
"labels_path": "labels.json",
"SpectConfig": {
"sample_rate": 16000,
"window_size": 0.02,
"window_stride": 0.01,
"window": "hamming"
},
"AugmentationConfig": {
"speed_volume_perturb": False,
"spec_augment": False,
"noise_dir": '',
"noise_prob": 0.4,
"noise_min": 0.0,
"noise_max": 0.5,
}
},
"OptimConfig": {
"learning_rate": 0.01,
"learning_anneal": 1.1,
"weight_decay": 1e-5,
"momentum": 0.9,
"eps": 1e-8,
"betas": (0.9, 0.999),
"loss_scale": 1024,
"epsilon": 0.00001
},
"CheckpointConfig": {
"ckpt_file_name_prefix": 'Jasper',
"ckpt_path": './checkpoint',
"keep_checkpoint_max": 10
}
})
eval_config = ed({
"save_output": 'librispeech_val_output',
"verbose": True,
"DataConfig": {
"Data_dir": '/data/inference_datasets',
"test_manifest": ['/data/inference_datasets/librispeech-dev-clean-wav.json'],
"batch_size": 32,
"labels_path": "labels.json",
"SpectConfig": {
"sample_rate": 16000,
"window_size": 0.02,
"window_stride": 0.01,
"window": "hanning"
},
},
"LMConfig": {
"decoder_type": "greedy",
"lm_path": './3-gram.pruned.3e-7.arpa',
"top_paths": 1,
"alpha": 1.818182,
"beta": 0,
"cutoff_top_n": 40,
"cutoff_prob": 1.0,
"beam_width": 1024,
"lm_workers": 4
},
})
def default_args(klass):
sig = inspect.signature(klass.__init__)
return {k: v.default for k, v in sig.parameters.items() if k != 'self'}
def load(fpath):
if fpath.endswith('.toml'):
raise ValueError('.toml config format has been changed to .yaml')
cfg = yaml.safe_load(open(fpath, 'r'))
# Reload to deep copy shallow copies, which were made with yaml anchors
yaml.Dumper.ignore_aliases = lambda *args: True
cfg = yaml.dump(cfg)
cfg = yaml.safe_load(cfg)
return cfg
def validate_and_fill(klass, user_conf, ignore_unk=None, optional=None):
conf = default_args(klass)
if ignore_unk is None:
ignore_unk = []
if optional is None:
optional = []
for k, v in user_conf.items():
conf[k] = v
# Keep only mandatory or optional-nonempty
conf = {k: v for k, v in conf.items()
if k not in optional or v is not inspect.Parameter.empty}
# Validate
for k, v in conf.items():
assert v is not inspect.Parameter.empty, \
f'Value for {k} not specified for {klass}'
return conf
def encoder(conf):
"""Validate config for JasperEncoder and subsequent JasperBlocks"""
# Validate, but don't overwrite with defaults
for blk in conf['jasper']['encoder']['blocks']:
validate_and_fill(JasperBlock, blk, optional=['infilters'],
ignore_unk=['residual_dense'])
return validate_and_fill(JasperEncoder, conf['jasper']['encoder'])
def decoder(conf, n_classes):
deco_kw = {'n_classes': n_classes, **conf['jasper']['decoder']}
return validate_and_fill(JasperDecoderForCTC, deco_kw)
def add_ctc_blank(sym):
return sym + ['_']
cfgs = load('./src/jasper10x5dr_speca.yaml')
symbols = add_ctc_blank(cfgs['labels'])
encoder_kw = encoder(cfgs)
decoder_kw = decoder(cfgs, n_classes=len(symbols))