Unverified Commit b428f797 authored by Shenghang Tsai, committed by GitHub

Print backtrace when distributed CI failed (#4211)


* Print backtrace when distributed CI failed

* fix path

* typo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
parent 2111c2d2
......@@ -31,3 +31,4 @@
**/core.9*
/.cache
/oneflow-src.zip
/distributed-tmp
......@@ -87,6 +87,10 @@ jobs:
            --build_docker_img \
            --oneflow_wheel_path=${wheelhouse_dir} \
            --oneflow_worker_bin=${bin_dir}/oneflow_worker
      - name: Print Backtrace (distributed test)
        if: always()
        run: |
          docker run --privileged --network host --shm-size=8g --rm -v $PWD:$PWD -w $PWD oneflow-test:$USER bash ci/test/print_stack_from_core.sh python3 distributed-tmp
      - name: Upload log (distributed test)
        if: always()
        uses: ./.github/actions/upload_oss
......
......@@ -28,3 +28,4 @@ compile_commands.json
.cache
/oneflow-src.zip
/oneflow_temp
/distributed-tmp
......@@ -161,6 +161,7 @@ export ONEFLOW_TEST_SSH_PORT={ssh_port}
export ONEFLOW_TEST_LOG_DIR={log_dir}
export ONEFLOW_TEST_NODE_LIST="{this_host},{remote_host}"
export ONEFLOW_WORKER_KEEP_LOG=1
export ONEFLOW_TEST_TMP_DIR="./distributed-tmp"
export NCCL_DEBUG=INFO
"""
if oneflow_worker_bin:
......
set -ex
if compgen -G "$2/core.*" > /dev/null; then
  gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" $1 $2/core.*
fi
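
The script above expands `$2/core.*` into a single `gdb` call. gdb is normally invoked with one executable and one core file, so when several workers crash, only the first core file is likely to be inspected. A minimal Python sketch of the same lookup that builds one command per core file follows. The function name `build_gdb_commands` is hypothetical and not part of this commit; only the gdb flags are taken from the script.

```python
import glob
import os
import tempfile


def build_gdb_commands(executable, core_dir):
    """Return one gdb argument list per core file found in core_dir.

    Hypothetical helper: mirrors the flags used in the shell script,
    but pairs the executable with a single core file per invocation.
    """
    core_files = sorted(glob.glob(os.path.join(core_dir, "core.*")))
    return [
        [
            "gdb", "--batch", "--quiet",
            "-ex", "thread apply all bt full",
            "-ex", "quit",
            executable, core,
        ]
        for core in core_files
    ]


if __name__ == "__main__":
    # Demonstration with empty placeholder files instead of real core dumps.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("core.101", "core.102"):
            open(os.path.join(tmp, name), "w").close()
        for cmd in build_gdb_commands("python3", tmp):
            print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=False)`. An empty list means no core files matched, which corresponds to the `compgen -G` guard in the script.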