Commit ece8f5e9 authored by huangxinjing

Fix single run error for gpt model

parent 2b24802f
@@ -30,4 +30,5 @@ python train.py \
--epoch_size=$EPOCH_SIZE \
--device_id=$DEVICE_ID \
--data_path=$DATA_DIR \
--model_parallel=1 \
--optimizer="adam" > training_log.txt 2>&1 &
@@ -241,7 +241,7 @@ Training 60B model using 8 NPU in one server requires that the server has at lea
```bash
# run distributed training example in one ascend machine
-bash run_distributed_train_moe_host_device.sh /path/dataset /path/hccl.json 8 fp32 2.6B 1 1 1 0 8 36 0
+bash run_distributed_train_moe_host_device.sh /path/dataset /path/hccl.json 8 fp32 2.6B 1 1 2 0 8 36 0
```
#### Training on homogeneous