  1. Oct 11, 2019
    • Dev cuda 9 arch 70 (#2318) · 77bcbedd
      Juncheng authored
      * kCudaAlignSize = 256
      
      * always compute_70
      
      * __CUDA_API_VERSION >= 10000
      
      * __CUDA_API_VERSION >= 10000
      
      * disable_all_reduce_sequence
  2. Oct 10, 2019
    • Memory Version 2.0 Step 2: MemSharedAndReused between jobs (#2267) · f4887a09
      cheng cheng authored
      * MemBlockProto and ChunkProto
      
      * create mem block and chunk after improver
      
      * interface merge mem block and chunk between sub plans
      
      * merge chunk between jobs for memory reuse
      
      * use memory zone unique id to replace memory case hash
      
      * merge interface op mem block between jobs for mem shared
      
      * gen GlobalCriticalSection by mem block id and chunk id
      
      * check mem block and chunk valid before runtime
      
      * Refactor: RegstMgr; allocate memory by mem block and chunk instead of regst
      
      * fix bug; and pass test
      
      * fix bug: init chunk_id_count in id_manager
      
      * reuse copyHd out mem between jobs
      
      * PushPlan and PullPlan for memblock and chunk
      
      * refine merge mem block / chunk in oneflow.cpp
      
      * at(i);
      
      * GetOpName2JobId2TaskProtos functional
      
      * using output ptr; pass test AlexNet and Resnet
  3. Sep 30, 2019
  4. Sep 27, 2019
  5. Sep 25, 2019
    • Dev pr boxing v2 (#2248) · 3e951b0b
      Juncheng authored
      * NcclDeviceCtx
      
      * include naive_actor
      
      * refine
      
      * use_boxing_v2
      
      * config.use_boxing_v2
      
      * SubTskGphBuilder
      
      * fix
      
      * hash<oneflow::MemoryCase>
      
      * Maybe<void>
      
      * ChainSubTskGphBuilder
      
      * SliceBoxingOp
      
      * return ok
      
      * SliceBoxingKernel
      
      * SliceBoxingActor
      
      * kSliceBoxing
      
      * nccl boxing op
      
      * nccl actor
      
      * REGISTER_OP
      
      * GetMsgFromCustomizedConf
      
      * NcclBoxingTaskNode
      
      * BldSubTskGphByBoxingV2
      
      * NcclBoxingSubTskGphBuilder
      
      * fix
      
      * fix
      
      * NcclKernel
      
      * ParallelContext
      
      * REGISTER_ACTOR
      
      * fix rank set
      
      * IsNcclTaskType
      
      * limit
      
      * 1024
      
      * multi thread reader
      
      * thread_num
      
      * IsKernelLaunchSynchronized
      
      * refine
      
      * NcclTupleReduce/BroadcastKernel use NcclDeviceCtx
      
      * MakeHostMemCase
      
      * NcclBldSubTskGph
      
      * remove useless code
      
      * use_boxing_v2
      
      * refine
      
      * refine
      
      * refine
      
      * refine
      
      * refine
  6. Sep 24, 2019
  7. Sep 23, 2019
  8. Sep 20, 2019
  9. Sep 19, 2019
    • Dev kernel launch synchronized (#2230) · 7ee0d54c
      Juncheng authored
      * IsKernelLaunchSynchronized
      
      * virtual
      
      * refine
      
      * refine
    • refactor ndarray for eq/ne/... · 2d813ffb
      lixinqi authored
    • fix reduction_coefficient (#2228) · ff0b37bf
      Juncheng authored
    • fix send msg (#2227) · 5e28336c
      Juncheng authored
    • Merge wnd python (#2226) · 01f4c2d1
      Li Xinqi authored
      * not ready yet
      
      * segment fix
      
      * fix segment_sum bugs
      
      * 1st wide_n_deep push
      
      * Fix tick in multi node parallel (#2042)
      
      * check in fixes
      
      * fix by adding boxing method
      
      * register tick op
      
      * move code and add more check
      
      * fix typo
      
      * fix bug when filtering op nodes before adding tick
      
      * fix wheel build not adding .so (#2052)
      
      * color plan dot VERSION-2 (#2045)
      
      * run successfully on single GPU
      
      * fix 121 for tick (#2069)
      
      * delete unnecessary multiply_grad class
      
      * speed up generate time for dot2svg (#2083)
      
      * Add axis conf to bias_add for any axis channel (#2087)
      
      * bias_add completion
      
      * follow comment
      
      * make conf axis required
      
      * Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091)
      
      This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
      
      * updated
      
      * fix segment_sum_grad
      
      * fix sbp
      
      * fix segment_sum impl for data parallel
      
      * fix
      
      * remove useless code in segment_kernel_util.h
      
      * add python interface
      
      * fix sigmoid conf
      
      * fix naming error
      
      * fix typo
      
      * temp mod loss sbp
      
      * add LazyAdam
      
      * Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
      
      * rm useless code
      
      * unsorted_segment_sum
      
      * refactor sigmoid_cross_entropy_loss_kernel to high performance
      
      * Improve sigmoid cross entropy loss grad (#2207)
      
      * remove for loop called cuda kernel
      
      * minor fix
      
      * ../oneflow/python/ops/data_ops.py (#2209)
      
      * fix lazy_adam
      
      * Merge wnd and python (#2214)
      
      * rm ActivationType from op/kernel (#2205)
      
      * refactor sigmoid_cross_entropy_loss
      
      * fix SigmoidGrad::InferBatchAxis
      
      * support part_name_prefix and part_name_suffix_length (#2208)
      
      * rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
      
      * oneflow.watch for debug
      
      * Dev decode batch size (#2206)
      
      * rm batch_size and piece_size
      
      * merge dev_python
      
      * Update reshape_like_op.cpp (#2213)
      
      * oneflow.parallel (#2211)
      
      * oneflow.parallel
      
      * refactor split_axis => parallel
      
      * rename parallel => distribute
      
      * fix typo: *Parallel => *Distribute
      
      * add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
      
      * merge dev_python
      
      * fix boxing: P->S(0)
      
      * check in docker build scripts (#2216)
      
      * Dev python widedeep docker (#2218)
      
      * check in docker build scripts
      
      * check in .dockerignore
      
      * rm oneflow.segment_sum
      
      * remove segment_sum
      
      * rm unused file
      
      * rm debug code
      
      * rm debug code
      
      * rm double empty lines
      
      * remove useless comments
    • Dev actor msg queue (#2225) · 6017512d
      Juncheng authored
      * async msg queue
      
      * EnqueueAsyncMsg
  10. Sep 18, 2019
  11. Sep 17, 2019
  12. Sep 16, 2019
    • add resnet50 in model (#2217) · 247170c1
      scxfjiang authored
    • docker build support (#2002) · bcd7621f
      Shenghang Tsai authored
      * update cmake files
      
      * check in files
      
      * Fix tick in multi node parallel (#2042)
      
      * check in fixes
      
      * fix by adding boxing method
      
      * register tick op
      
      * move code and add more check
      
      * fix typo
      
      * fix bug when filtering op nodes before adding tick
      
      * shrink ctx size
      
      * fix script
      
      * fix wheel build
      
      * fix wheel build not adding .so (#2052)
      
      * lower cmake version bar
      
      * rm more files
      
      * keep build dir
      
      * check in test bash script
      
      * fix
      
      * Dev docker sx (#2124)
      
      * add python2 docker env
      
      * rm old docker files
      
      * update repository
      
      * add ARG CUDA and USE_PYTHON_3_OR_2
      
      * reform files
      
      * update
      
      * rm log doesn't print when there is cache
      
      * use default arg in dockerfile
      
      * better py 2 or 3 condition
      
      * add default
      
      * use if
      
      * update alexnet
      
      * update for bert
      
      * 15->16
    • oneflow.parallel (#2211) · 0ca43f65
      Li Xinqi authored
      * oneflow.parallel
      
      * refactor split_axis => parallel
      
      * rename parallel => distribute
      
      * fix typo: *Parallel => *Distribute
      
      * add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
    • Update reshape_like_op.cpp (#2213) · 01f9b069
      Shenghang Tsai authored
    • Dev decode batch size (#2206) · 42967639
      Li Xinqi authored
      * rm batch_size and piece_size
      
      * merge dev_python
  13. Sep 15, 2019
  14. Sep 14, 2019