merge with dev_python (#2249)
* Dev actor msg queue (#2225)
* async msg queue
* EnqueueAsyncMsg
* Merge wnd python (#2226)
* not ready yet
* segment fix
* fix segment_sum bugs
* 1st wide_n_deep push
* Fix tick in multi node parallel (#2042)
* check in fixes
* fix by adding boxing method
* register tick op
* move code and add more check
* fix typo
* fix bug when filtering op nodes before adding tick
* fix wheel build not adding .so (#2052)
* color plan dot VERSION-2 (#2045)
* run sucessfully on single GPU
* fix 121 for tick (#2069)
* delete unncessary multiply_grad class
* speed up generate time for dot2svg (#2083)
* Add axis conf to bias_add for any axis channel (#2087)
* bias_add completion
* follow comment
* make conf axis required
* Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091)
  This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
* updated
* fix segment_sum_grad
* fix sbp
* fix segment_sum impl for data parallel
* fix
* remove useless code in segment_kernel_util.h
* add python interface
* fix sigmoid conf
* fix naming error
* fix typo
* temp mod loss sbp
* add LazyAdam
* Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
* rm useless code
* unsorted_segment_sum
* refactor sigmoid_cross_entropy_loss_kernel to high performance
* Improve sigmoid cross entropy loss grad (#2207)
* remove for loop called cuda kernel
* minor fix
* ../oneflow/python/ops/data_ops.py (#2209)
* fix lazy_adam
* Merge wnd and python (#2214)
* rm ActivationType from op/kernel (#2205)
* refactor sigmoid_cross_entropy_loss
* fix SigmoidGrad::InferBatchAxis
* support part_name_prefix and part_name_suffix_length (#2208)
* rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
* oneflow.watch for debug
* Dev decode batch size (#2206)
* rm batch_size and piece_size
* merge dev_python
* Update reshape_like_op.cpp (#2213)
* oneflow.parallel (#2211)
* oneflow.parallel
* refactor split_axis => parallel
* rename parallel => distribute
* fix typo: *Parallel => *Distribute
* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
* merge dev_python
* fix boxing: P->S(0)
* check in docker build scripts (#2216)
* Dev python widedeep docker (#2218)
* check in docker build scripts
* check in .dockerignore
* rm oneflow.segment_sum
* remove segment_sum
* rm unused file
* rm debug code
* rm debug code
* rm double empty lines
* remove useless comments
* fix send msg (#2227)
* fix reduction_coefficient (#2228)
* refactor ndarray for eq/ne/...
* Dev kernel launch synchronized (#2230)
* IsKernelLaunchSynchronized
* virtual
* refine
* refine
* seperate LOGICAL_BINARY_FUNC from ARITHMETIC_BINARY_FUNC
* more static_assert
* remove unused task related dot function (#2236)
* remove unused task related dot function
* do not output dot rank info
* Dev non distributed optimizer js (#2234)
* op&kernel&actor
* job
* job_completer
* graph
* format
* fix pd
* fix
* ignore DelPlacementByOpName
* fix auto tick
* JobBuilder
* fix
* config util
* fix
* fix opgrade
* broadcast tick
* fix allreduce
* balance by model size
* GetSoleOutBlobSize
* async_actor_msg_deque
* group
* AddOrMutOpsOnlyOnce
* fix NcclTupleBroadcastGrad
* order
* set nccl order hint
* op_conf
* grad hint
* NcclTupleBroadcastReduceSequencePass
* add missed mutops
* order fix
* try kMdUpdtArea
* fix nccl_order_hint
* fix
* add ti
* tuple_identity_op
* remove useless
* group
* fix dead lock
* force ctrl in
* sc broadcast
* sort obn
* group nccl
* config group_size_mbyte
* non_distributed_optimizer_group_size_mbyte
* format
* stop check
* rm message sending optimization
* refine lazy adam (#2244)
* refine lazy adam
* update
* memory version 2 step 1: replace original concept about mem sharing (#2242)
* mem_shared_id -> mem_block_id; mem_shared_off_set -> mem_block_offset; enable_mem_sharing->enable_reuse_mem
* memory version 2 step 1: replace original concept about mem sharing
* record reader multi thread (#2246)
* multi thread
* ComputeThreadPoolSize
* python api
Showing changed files:
- oneflow/core/actor/actor.cpp 29 additions, 10 deletions
- oneflow/core/actor/actor.h 5 additions, 1 deletion
- oneflow/core/actor/copy_comm_net_actor.cpp 2 additions, 2 deletions
- oneflow/core/actor/input_wise_compute_actor.h 1 addition, 1 deletion
- oneflow/core/actor/nccl_actor.cpp 2 additions, 1 deletion
- oneflow/core/actor/normal_forward_compute_actor.cpp 1 addition, 1 deletion
- oneflow/core/actor/repeat_forward_compute_actor.cpp 1 addition, 1 deletion
- oneflow/core/graph/boxing_task_node.cpp 4 additions, 2 deletions
- oneflow/core/graph/logical_graph.cpp 3 additions, 1 deletion
- oneflow/core/graph/logical_node.cpp 32 additions, 0 deletions
- oneflow/core/graph/logical_node.h 26 additions, 0 deletions
- oneflow/core/graph/nccl_tuple_broadcast_compute_task_node.cpp 49 additions, 0 deletions
- oneflow/core/graph/nccl_tuple_broadcast_compute_task_node.h 26 additions, 0 deletions
- oneflow/core/graph/nccl_tuple_reduce_compute_task_node.cpp 38 additions, 0 deletions
- oneflow/core/graph/nccl_tuple_reduce_compute_task_node.h 26 additions, 0 deletions
- oneflow/core/graph/normal_forward_compute_task_node.cpp 5 additions, 5 deletions
- oneflow/core/graph/op_graph.cpp 2 additions, 1 deletion
- oneflow/core/graph/reduce_add_compute_task_node.cpp 8 additions, 8 deletions
- oneflow/core/graph/reduce_comp_task_node_if.h 6 additions, 6 deletions
- oneflow/core/graph/sharable_mem_block_graph.cpp 1 addition, 4 deletions