- Oct 11, 2019
-
Juncheng authored
* kCudaAlignSize = 256
* always compute_70
* __CUDA_API_VERSION >= 10000
* __CUDA_API_VERSION >= 10000
* disable_all_reduce_sequence
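`kCudaAlignSize = 256` suggests CUDA memory sizes are rounded up to a 256-byte boundary before allocation. A minimal sketch of that rounding, assuming a power-of-two alignment (the function name is illustrative, not OneFlow's API):

```python
def align_size(size: int, alignment: int = 256) -> int:
    """Round size up to the next multiple of alignment.

    Assumes alignment is a power of two (256 here, matching the
    kCudaAlignSize value in the commit message above).
    """
    return (size + alignment - 1) & ~(alignment - 1)
```

With this rule a 257-byte request occupies 512 bytes, while a 256-byte request stays at 256.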
-
- Oct 10, 2019
-
cheng cheng authored
* MemBlockProto and ChunkProto
* create mem block and chunk after improver
* interface merge mem block and chunk between sub plans
* merge chunk between jobs for memory reuse
* using memory zone unique id replace memory case hash
* merge interface op mem block between jobs for mem shared
* gen GlobalCriticalSection by mem block id and chunk id
* check mem block and chunk valid before runtime
* Refactor: RegstMgr; allocate memory by mem block and chunk instead of regst
* fix bug; and pass test
* fix bug: init chunk_id_count in id_manager
* reuse copyHd out mem between jobs
* PushPlan and PullPlan for memblock and chunk
* refine merge mem block / chunk in oneflow.cpp
* at(i);
* GetOpName2JobId2TaskProtos functional
* using output ptr; pass test AlexNet and ResNet
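The mem block / chunk refactor above replaces per-regst allocation with blocks placed at aligned offsets inside a shared chunk, so one large allocation can back many tensors and be reused across jobs. A toy model of the bookkeeping (class and method names are hypothetical, not OneFlow's actual RegstMgr or ChunkProto):

```python
class Chunk:
    """One large allocation; mem blocks reference it by (offset, size)."""

    def __init__(self, alignment: int = 256):
        self.alignment = alignment
        self.size = 0     # total bytes the chunk must hold
        self.blocks = {}  # block_id -> (offset, size)

    def add_block(self, block_id: str, size: int) -> int:
        """Place a mem block at the next aligned offset; return that offset."""
        a = self.alignment
        offset = (self.size + a - 1) & ~(a - 1)
        self.blocks[block_id] = (offset, size)
        self.size = offset + size
        return offset
```

The runtime would then perform a single device allocation of `chunk.size` bytes and hand each regst a pointer at `base + offset`, instead of allocating per regst.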
-
- Sep 30, 2019
-
QiaoJing authored
* add decode random
* fix decode random actor
* fix dev_python test scripts
* fix batch_size test scripts
* fix
-
Juncheng authored
* use bias add
* fix
* bias_add
* bias add half
* fix
* reinterpret_cast
* fix half
* HALF
* fix
* ADD_DEFAULT_KERNEL_CREATOR
* fix
* format
-
cheng cheng authored
-
- Sep 27, 2019
-
cheng cheng authored
-
- Sep 25, 2019
-
Juncheng authored
* NcclDeviceCtx
* include naive_actor
* refine
* use_boxing_v2
* config.use_boxing_v2
* SubTskGphBuilder
* fix
* hash<oneflow::MemoryCase>
* Maybe<void>
* ChainSubTskGphBuilder
* SliceBoxingOp
* return ok
* SliceBoxingKernel
* SliceBoxingActor
* kSliceBoxing
* nccl boxing op
* nccl actor
* REGISTER_OP
* GetMsgFromCustomizedConf
* NcclBoxingTaskNode
* BldSubTskGphByBoxingV2
* NcclBoxingSubTskGphBuilder
* fix
* fix
* NcclKernel
* ParallelContext
* REGISTER_ACTOR
* fix rank set
* IsNcclTaskType
* limit
* 1024
* multi thread reader
* thread_num
* IsKernelLaunchSynchronized
* refine
* NcclTupleReduce/BroadcastKernel use NcclDeviceCtx
* MakeHostMemCase
* NcclBldSubTskGph
* remove useless code
* use_boxing_v2
* refine
* refine
* refine
* refine
* refine
-
- Sep 24, 2019
-
QiaoJing authored
* add decode random
* fix decode random actor
-
- Sep 23, 2019
-
Juncheng authored
* multi thread
* ComputeThreadPoolSize
* python api
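`ComputeThreadPoolSize` is only named above, not described. A plausible shape for such a helper exposed through a Python API is to fall back to the host CPU count when no size is configured and cap the result (both the fallback policy and the 1024 cap, which appears elsewhere in this log, are assumptions here, not OneFlow's actual logic):

```python
import os

def compute_thread_pool_size(requested: int = 0, cap: int = 1024) -> int:
    """Return the requested size if positive, else the CPU count, capped."""
    n = requested if requested > 0 else (os.cpu_count() or 1)
    return min(n, cap)
```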
-
cheng cheng authored
* mem_shared_id -> mem_block_id; mem_shared_off_set -> mem_block_offset; enable_mem_sharing -> enable_reuse_mem
* memory version 2 step 1: replace original concept about mem sharing
-
Shiyuan Shang-Guan authored
* refine lazy adam
* update
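Lazy Adam differs from plain Adam by updating moments and weights only at indices that actually received a gradient in the current step, which suits sparse embedding updates. A minimal sketch under that reading (plain Python over dicts; illustrative, not the OneFlow kernel):

```python
def lazy_adam_step(params, grads, m, v, t,
                   lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One lazy Adam step: only indices present in `grads` are touched.

    params: {index: weight}, grads: {index: gradient},
    m/v: first/second-moment dicts, t: 1-based step count.
    """
    for i, g in grads.items():
        m[i] = b1 * m.get(i, 0.0) + (1 - b1) * g
        v[i] = b2 * v.get(i, 0.0) + (1 - b2) * g * g
        m_hat = m[i] / (1 - b1 ** t)  # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params
```

Rows that received no gradient keep stale moments rather than being decayed every step; that is the "lazy" trade-off versus dense Adam.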
-
- Sep 20, 2019
-
Juncheng authored
* op&kernel&actor
* job
* job_completer
* graph
* format
* fix pd
* fix
* ignore DelPlacementByOpName
* fix auto tick
* JobBuilder
* fix
* config util
* fix
* fix opgrade
* broadcast tick
* fix allreduce
* balance by model size
* GetSoleOutBlobSize
* async_actor_msg_deque
* group
* AddOrMutOpsOnlyOnce
* fix NcclTupleBroadcastGrad
* order
* set nccl order hint
* op_conf
* grad hint
* NcclTupleBroadcastReduceSequencePass
* add missed mutops
* order fix
* try kMdUpdtArea
* fix nccl_order_hint
* fix
* add ti
* tuple_identity_op
* remove useless
* group
* fix dead lock
* force ctrl in
* sc broadcast
* sort obn
* group nccl
* config group_size_mbyte
* non_distributed_optimizer_group_size_mbyte
* format
* stop check
* rm message sending optimization
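The `group_size_mbyte` settings above hint at packing many small tensors into fixed-size groups before collective communication, so each NCCL launch moves a reasonable amount of data. A greedy sketch of that packing (illustrative only; not the actual OneFlow pass):

```python
def group_by_size(items, group_size_bytes):
    """Pack consecutive (name, size) items into groups whose total size
    stays within group_size_bytes; an oversized item gets its own group."""
    groups, cur, cur_size = [], [], 0
    for name, size in items:
        if cur and cur_size + size > group_size_bytes:
            groups.append(cur)
            cur, cur_size = [], 0
        cur.append(name)
        cur_size += size
    if cur:
        groups.append(cur)
    return groups
```

Grouping consecutively (rather than bin-packing) preserves a deterministic op order, which matters when every rank must issue the same collectives in the same sequence.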
-
Li Xinqi authored
* remove unused task related dot function
* do not output dot rank info
-
lixinqi authored
-
lixinqi authored
-
lixinqi authored
-
- Sep 19, 2019
-
Juncheng authored
* IsKernelLaunchSynchronized
* virtual
* refine
* refine
-
lixinqi authored
-
Juncheng authored
-
Juncheng authored
-
Li Xinqi authored
* not ready yet
* segment fix
* fix segment_sum bugs
* 1st wide_n_deep push
* Fix tick in multi node parallel (#2042)
* check in fixes
* fix by adding boxing method
* register tick op
* move code and add more check
* fix typo
* fix bug when filtering op nodes before adding tick
* fix wheel build not adding .so (#2052)
* color plan dot VERSION-2 (#2045)
* run successfully on single GPU
* fix 121 for tick (#2069)
* delete unnecessary multiply_grad class
* speed up generate time for dot2svg (#2083)
* Add axis conf to bias_add for any axis channel (#2087)
* bias_add completion
* follow comment
* make conf axis required
* Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091) This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
* updated
* fix segment_sum_grad
* fix sbp
* fix segment_sum impl for data parallel
* fix
* remove useless code in segment_kernel_util.h
* add python interface
* fix sigmoid conf
* fix naming error
* fix typo
* temp mod loss sbp
* add LazyAdam
* Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
* rm useless code
* unsorted_segment_sum
* refactor sigmoid_cross_entropy_loss_kernel to high performance
* Improve sigmoid cross entropy loss grad (#2207)
* remove for loop called cuda kernel
* minor fix
* ../oneflow/python/ops/data_ops.py (#2209)
* fix lazy_adam
* Merge wnd and python (#2214)
* rm ActivationType from op/kernel (#2205)
* refactor sigmoid_cross_entropy_loss
* fix SigmoidGrad::InferBatchAxis
* support part_name_prefix and part_name_suffix_length (#2208)
* rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
* oneflow.watch for debug
* Dev decode batch size (#2206)
* rm batch_size and piece_size
* merge dev_python
* Update reshape_like_op.cpp (#2213)
* oneflow.parallel (#2211)
* oneflow.parallel
* refactor split_axis => parallel
* rename parallel => distribute
* fix typo: *Parallel => *Distribute
* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
* merge dev_python
* fix boxing: P->S(0)
* check in docker build scripts (#2216)
* Dev python widedeep docker (#2218)
* check in docker build scripts
* check in .dockerignore
* rm oneflow.segment_sum
* remove segment_sum
* rm unused file
* rm debug code
* rm debug code
* rm double empty lines
* remove useless comments
-
Juncheng authored
* async msg queue
* EnqueueAsyncMsg
-
- Sep 18, 2019
- Sep 17, 2019
-
cheng cheng authored
* remove parallel policy
* rm FC/rnn/embedding_look_up op/kernel
* add check data parallel for conv/layer_norm op
* bugfix: bias add + use math_add when batch size = 1
-
- Sep 16, 2019
-
scxfjiang authored
-
Shenghang Tsai authored
* update cmake files
* check in files
* Fix tick in multi node parallel (#2042)
* check in fixes
* fix by adding boxing method
* register tick op
* move code and add more check
* fix typo
* fix bug when filtering op nodes before adding tick
* shrink ctx size
* fix script
* fix wheel build
* fix wheel build not adding .so (#2052)
* lower cmake version bar
* rm more files
* keep build dir
* check in test bash script
* fix
* Dev docker sx (#2124)
* add python2 docker env
* rm old docker files
* update repository
* add ARG CUDA and USE_PYTHON_3_OR_2
* reform files
* update
* rm log doesn't print when there is cache
* use default arg in dockerfile
* better py 2 or 3 condition
* add default
* use if
* update alexnet
* update for bert
* 15->16
-
cheng cheng authored
-
Li Xinqi authored
* oneflow.parallel
* refactor split_axis => parallel
* rename parallel => distribute
* fix typo: *Parallel => *Distribute
* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
-
Shenghang Tsai authored
-
Li Xinqi authored
* rm batch_size and piece_size
* merge dev_python
-
- Sep 15, 2019
- Sep 14, 2019