  1. Aug 06, 2021
  2. Aug 05, 2021
  3. Jul 30, 2021
  4. Feb 20, 2021
  5. Dec 28, 2020
  6. Nov 03, 2020
  7. Oct 30, 2020
  8. Oct 10, 2020
  9. Jul 23, 2020
• Dev apache2 license (#3266) · d0bdbd5d
  Shenghang Tsai authored
      
      * add license at root dir
      
      * check in empty files
      
      * rm space
      
      * check in script
      
      * update script
      
      * fix bug
      
      * add print
      
      * fix
      
      * add exit
      
      * add to of_format
      
      * add CI task
      
      * fix license
      
      * Revert "fix license"
      
      This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57.
      
      * only add once
      
      * quick fix
      
      * fix script
      
* don't fmt empty file
      
      * fix
      
      * quick fix
      
      * fix py
      
      * add license
      
      * fix exit
      
      * add license for hpp
      
      * add license
      
      * license new vm files
      
Co-authored-by: tsai <caishenghang@oneflow.org>
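
A minimal sketch of the kind of license-header check this commit describes: scan source files, report (or prepend) the Apache-2.0 header where it is missing, and exit non-zero so a CI task can fail the build. The header text, file set, and flags here are illustrative assumptions, not OneFlow's actual script.

```python
import sys
from pathlib import Path

# Illustrative header line; the real script inserts the full Apache-2.0 notice
# and covers .cpp/.h/.hpp/.py files.
HEADER = "Copyright 2020 The OneFlow Authors. All rights reserved."

def check_tree(root: str, fix: bool = False) -> int:
    missing = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            continue  # skip empty files (cf. "don't fmt empty file" above)
        if HEADER in text[:512]:
            continue  # header already present near the top
        missing += 1
        if fix:
            path.write_text(f'"""\n{HEADER}\n"""\n{text}', encoding="utf-8")
        else:
            print(f"missing license header: {path}")
    return missing

if __name__ == "__main__":
    # Exit non-zero when headers are missing so the CI task can fail.
    sys.exit(1 if check_tree(sys.argv[1], fix="--fix" in sys.argv) else 0)
```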
  10. Dec 09, 2019
  11. Dec 06, 2019
  12. Nov 29, 2019
  13. Sep 24, 2019
• merge with dev_python (#2249) · 3960d2cb
  Niu Chong authored
      * Dev actor msg queue (#2225)
      
      * async msg queue
      
      * EnqueueAsyncMsg
      
      * Merge wnd python (#2226)
      
      * not ready yet
      
      * segment fix
      
      * fix segment_sum bugs
      
      * 1st wide_n_deep push
      
      * Fix tick in multi node parallel (#2042)
      
      * check in fixes
      
      * fix by adding boxing method
      
      * register tick op
      
      * move code and add more check
      
      * fix typo
      
      * fix bug when filtering op nodes before adding tick
      
      * fix wheel build not adding .so (#2052)
      
      * color plan dot VERSION-2 (#2045)
      
* run successfully on single GPU
      
      * fix 121 for tick (#2069)
      
* delete unnecessary multiply_grad class
      
* speed up generation time for dot2svg (#2083)
      
      * Add axis conf to bias_add for any axis channel (#2087)
      
      * bias_add completion
      
      * follow comment
      
      * make conf axis required
      
      * Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091)
      
      This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
      
      * updated
      
      * fix segment_sum_grad
      
      * fix sbp
      
      * fix segment_sum impl for data parallel
      
      * fix
      
      * remove useless code in segment_kernel_util.h
      
      * add python interface
      
      * fix sigmoid conf
      
      * fix naming error
      
      * fix typo
      
      * temp mod loss sbp
      
      * add LazyAdam
      
      * Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
      
      * rm useless code
      
      * unsorted_segment_sum
      
      * refactor sigmoid_cross_entropy_loss_kernel to high performance
      
      * Improve sigmoid cross entropy loss grad (#2207)
      
      * remove for loop called cuda kernel
      
      * minor fix
      
      * ../oneflow/python/ops/data_ops.py (#2209)
      
      * fix lazy_adam
      
      * Merge wnd and python (#2214)
      
      * rm ActivationType from op/kernel (#2205)
      
      * refactor sigmoid_cross_entropy_loss
      
      * fix SigmoidGrad::InferBatchAxis
      
      * support part_name_prefix and part_name_suffix_length (#2208)
      
      * rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
      
      * oneflow.watch for debug
      
      * Dev decode batch size (#2206)
      
      * rm batch_size and piece_size
      
      * merge dev_python
      
      * Update reshape_like_op.cpp (#2213)
      
      * oneflow.parallel (#2211)
      
      * oneflow.parallel
      
      * refactor split_axis => parallel
      
      * rename parallel => distribute
      
      * fix typo: *Parallel => *Distribute
      
      * add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
      
      * merge dev_python
      
      * fix boxing: P->S(0)
      
      * check in docker build scripts (#2216)
      
      * Dev python widedeep docker (#2218)
      
      * check in docker build scripts
      
      * check in .dockerignore
      
      * rm oneflow.segment_sum
      
      * remove segment_sum
      
      * rm unused file
      
      * rm debug code
      
      * rm debug code
      
      * rm double empty lines
      
      * remove useless comments
      
      * fix send msg (#2227)
      
      * fix reduction_coefficient (#2228)
      
      * refactor ndarray for eq/ne/...
      
      * Dev kernel launch synchronized (#2230)
      
      * IsKernelLaunchSynchronized
      
      * virtual
      
      * refine
      
      * refine
      
* separate LOGICAL_BINARY_FUNC from ARITHMETIC_BINARY_FUNC
      
      * more static_assert
      
      * remove unused task related dot function (#2236)
      
      * remove unused task related dot function
      
      * do not output dot rank info
      
      * Dev non distributed optimizer js (#2234)
      
      * op&kernel&actor
      
      * job
      
      * job_completer
      
      * graph
      
      * format
      
      * fix pd
      
      * fix
      
      * ignore DelPlacementByOpName
      
      * fix auto tick
      
      * JobBuilder
      
      * fix
      
      * config util
      
      * fix
      
      * fix opgrade
      
      * broadcast tick
      
      * fix allreduce
      
      * balance by model size
      
      * GetSoleOutBlobSize
      
      * async_actor_msg_deque
      
      * group
      
      * AddOrMutOpsOnlyOnce
      
      * fix NcclTupleBroadcastGrad
      
      * order
      
      * set nccl order hint
      
      * op_conf
      
      * grad hint
      
      * NcclTupleBroadcastReduceSequencePass
      
      * add missed mutops
      
      * order fix
      
      * try kMdUpdtArea
      
      * fix nccl_order_hint
      
      * fix
      
      * add ti
      
      * tuple_identity_op
      
      * remove useless
      
      * group
      
      * fix dead lock
      
      * force ctrl in
      
      * sc broadcast
      
      * sort obn
      
      * group nccl
      
      * config group_size_mbyte
      
      * non_distributed_optimizer_group_size_mbyte
      
      * format
      
      * stop check
      
      * rm message sending optimization
      
      * refine lazy adam (#2244)
      
      * refine lazy adam
      
      * update
      
      * memory version 2 step 1: replace original concept about mem sharing (#2242)
      
* mem_shared_id -> mem_block_id; mem_shared_off_set -> mem_block_offset; enable_mem_sharing -> enable_reuse_mem
      
      * memory version 2 step 1: replace original concept about mem sharing
      
      * record reader multi thread (#2246)
      
      * multi thread
      
      * ComputeThreadPoolSize
      
      * python api
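
Several items in this merge revolve around segment sums. A minimal NumPy sketch of unsorted_segment_sum as the commits use the term: scatter-add rows of `data` into `num_segments` output rows, with segment ids in any order. This shows the op's semantics only, not OneFlow's kernel.

```python
import numpy as np

def unsorted_segment_sum(data, segment_ids, num_segments):
    """Scatter-add data[i] into out[segment_ids[i]]; ids need not be sorted."""
    data = np.asarray(data)
    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
    np.add.at(out, np.asarray(segment_ids), data)  # unbuffered scatter-add
    return out

# Rows 0 and 2 land in segment 0, row 1 in segment 1:
print(unsorted_segment_sum([[1., 2.], [3., 4.], [5., 6.]], [0, 1, 0], 2))
# [[6. 8.]
#  [3. 4.]]
```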
  14. Sep 19, 2019
  15. Sep 04, 2019
• rm job_conf.num_of_batches_in_snapshot · 9bcdf707
  lixinqi authored
• Dev model init op (#2117) · 26717534
  Juncheng authored
      * assign op
      
      
      AddGlobalStepOpConf
      
      
      fix
      
      
      ARITHMETIC_DATA_TYPE_SEQ
      
      
      identity_op_conf
      
      
      add ops
      
      
      GenNewSnapshotName
      
      
      SnapshotOp
      
      
      cleanup
      
      
      blob name
      
      
      LearningRateScheduleOp
      
      
      LearningRateScheduleKernel
      
      
      LearningRateScheduleKernel
      
      
      AddLearningRateScheduleOpConf
      
      
      learning rate
      
      
      cleanup
      
      
      fix
      
      
      fix
      
      * remove total_mbn_num
      
      * date time format
      
      * save
      
      * refine
      
      * refine
      
      * revert
      
      * refine snapshot
      
      * fix
      
      * refine
      
      * AutoGlobalStep
      
      * refine
      
      * GenLogicalBlobName
      
      * AutoLearningRate
      
      * remove JobDesc lr
      
      * fix snapshot path
      
      * Maybe<void>
      
      * learning_rate blob
      
      * remove next_model_vid
      
      
      fix
      
      
      fix 
      
      
      fix
      
      
      learning_rate
      
      * train_conf
      
      * fix for global step on multi nodes
      
      * SnapshotReader
      
      
      snapshot writer
      
      
      model init op
      
      
      fix
      
      
      refine
      
      
      init
      
      
      InitializeFromSnapshotConf
      
      
      model io job
      
      
      ModelLoadOp
      
      
      ModelLoadKernel
      
      
      MakeModelLoadJob
      
      
      ModelSaveOp
      
      
      fix
      
      
      InterUserJobInfo
      
      
      _MakeModelLoadJobFunc
      
      
      MutModelLoadOpConTickInputHelper
      
      
      fix
      
      
      refine
      
      
      init/load/save
      
      
      set_default_variable
      
      * remove SnapshotMgr
      
      * snapshot.h
      
      * delete model_init_job.cpp
      
      
      foreign_input_op_conf
      
      
      fix
      
      
      snapshot path
      
      
      set path
      
      
      op_conf
      
      
      fix
      
      
      fix CopyFromNdarray
      
      
      to bytes c
      
      
      use uint8
      
      
      char2uint8
      
      * model init
      
      * model io
      
      * fix
      
      * ModelSaveKernel
      
      * mutable_batch_axis()->Clear()
      
      * InferBatchAxis
      
      * fix
      
      * refine
      
      * job set
      
      * MakeModelIoJobs
      
      * fix
      
      * jobs
      
      * fix
      
      * model io job
      
      * GenOutputOpConf
      
      * refine snapshot
      
      * refine
      
      * fix
      
      * refine CheckPoint
      
      * remove session
      
      * refine
      
      * refine
      
      * refine
      
      * remove keyword.h/cpp
      
      * refine
      
      * global_step=>train_step
      
      * GetSbpSignatures
      
      * ModelInitOp
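
The LearningRateScheduleOp/Kernel and AutoGlobalStep items above compute the learning rate as a pure function of a global train step, emitted as a scalar blob each iteration. A hedged sketch of that idea; the exponential-decay form and parameter names here are illustrative assumptions, not the op's actual conf.

```python
def learning_rate_at(train_step: int,
                     base_lr: float = 0.1,
                     decay_rate: float = 0.96,
                     decay_steps: int = 1000,
                     staircase: bool = False) -> float:
    """lr = base_lr * decay_rate ** (train_step / decay_steps)."""
    exponent = train_step / decay_steps
    if staircase:
        exponent = train_step // decay_steps  # step-wise rather than smooth decay
    return base_lr * decay_rate ** exponent

# Kernel-side view: one scalar out, recomputed from train_step every iteration.
for step in (0, 1000, 5000):
    print(step, learning_rate_at(step))
```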
  16. Aug 11, 2019
• Bugfix actor case (#1995) · 9b479e89
  Li Xinqi authored
* unlock critical sections as much as possible
      
      * consumed and produced regst of actor 'case' are customized
      
      * refine code
  17. Jun 20, 2019
  18. Jun 19, 2019
  19. Jun 05, 2019
  20. May 31, 2019
  21. May 26, 2019
• Dev inplace (#1879) · 36749a85
  Li Xinqi authored
      * OpGraph::MakePredicatorIsAllLbiConsumersReachableToOpName
      
      * refactor TaskGraph::EnableInplaceMemSharing
      
      * TaskGraph::GetSafeInplaceOpBlobArgList
      
      * InplaceLbiGraph::ForEachSafeInplaceEdgesInSourceOpSubTree
      
      * fix a typo
      
      * TaskGraph::SetTaskRegstInplaceInfo
      
      * InplaceRegstGraph
      
      * refine
      
      * refine IsLbiOnTaskEdge
      
      * fix a bug in TaskGraph::ForEachDeviceNodes
      
      * ForEachGpuDeviceNodes
      
      * remove wrong CHECK
      
      * fix wrong use of std::unordered_set::erase
      
      * fix a bug in TaskGraph::GetInplaceOpBlobArgList
      
      * fix inplace bugs
      
      * fix error CHECK between inplace in dptr and inplace out dptr
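
A toy sketch of the reachability rule behind MakePredicatorIsAllLbiConsumersReachableToOpName: mutating a blob in place at op X is only safe if every other consumer of that blob is reachable to X in the op graph, so each read is ordered before the in-place write. The graph representation and names below are illustrative, not OneFlow's data structures.

```python
from functools import lru_cache

def make_is_reachable(edges):
    """edges: dict node -> list of successor nodes (a DAG)."""
    @lru_cache(maxsize=None)
    def is_reachable(src, dst):
        return src == dst or any(is_reachable(nxt, dst) for nxt in edges.get(src, ()))
    return is_reachable

def safe_to_mutate_inplace(mutator, consumers, is_reachable):
    # Every other consumer must be ordered before the in-place write.
    return all(is_reachable(c, mutator) for c in consumers if c != mutator)

# a -> b -> d and a -> c -> d: d may mutate a's output; b may not, since c
# is not guaranteed to read before b writes.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
r = make_is_reachable(edges)
print(safe_to_mutate_inplace("d", ["b", "c", "d"], r))  # True
print(safe_to_mutate_inplace("b", ["b", "c"], r))       # False
```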
  22. May 16, 2019
• Dev inplace obn graph (#1868) · 3563ba46
  Li Xinqi authored
      * InplaceObnGraph
      
      * more checks in InplaceObnGraph::InitNodes
      
      * framework of InplaceObnGraph::ComputeSafeInplaceObns
      
      * refine InplaceObnGraph::ComputeSafeInplaceObns
      
      * replace InplaceObnGraph with InplaceLbiGraph
      
      * fix three types of mut_ref conflicts
      
      * InplaceLbiGraph::FindFirstConstRefConflictMutRefEdge
      
      * fix bugs in InplaceLbiGraph::ComputeSafeInplaceObns
      
      * InplaceLbiGraph::DisconnectUnReachabeAndDataMutableEdge
      
      * InplaceLbiGraph::FixMutRefConflictsFromSourceOpNode
      
      * InplaceLbiGraph::FixMutRefConflictsFromSourceOpNode
      
      * Graph::FindFirstBackEdgeDstNode
      
      * more CHECK_ISNULL
      
* fix a bug in Graph::FindFirstBackEdgeDstNode()
      
      * fix bugs in Graph<NodeType, EdgeType>::ForEachConnectedComponent
      
      * rename GetIsMutableIbnConsumer => FindSoleMutableIbnConsumer
      
      * refine InplaceLbiGraph::IsConstRefConflictMutRefNode
      
      * there could be no mut_ref node found in InplaceLbiGraph::FindFirstInterOpRefConflictMutRefEdge
      
      * refine InplaceLbiGraph::FindFirstInterOpRefConflictMutRefEdge
      
      * remove unnecessary CHECK in InplaceLbiGraph::GetSafeInplaceObnEdges
      
      * fix a line of comment in InplaceLbiGraph::GetSafeInplaceObnEdges
      
      * shouldn't delete the edge to updt_node
      
      * refine InplaceLbiGraph::FixMutRefConflictsFromSourceOpNode
      
      * refine FindFirstIntraOpRefConflictMutRefEdge
      
      * fix a bug in InplaceLbiGraph::FindFirstIntraOpRefConflictMutRefEdge
      
      * CheckSubGraph
      
      * change some lambdas to functions
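
Graph::FindFirstBackEdgeDstNode above hints at how cycles are handled during the inplace analysis. A standard DFS three-color sketch that returns the destination node of the first back edge encountered (grey means "on the current DFS stack"); node and edge types here are illustrative.

```python
WHITE, GREY, BLACK = 0, 1, 2

def find_first_back_edge_dst(edges, start_nodes):
    """Return the dst of the first back edge found, or None if acyclic."""
    color = {}

    def dfs(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            if color.get(nxt, WHITE) == GREY:
                return nxt              # back edge: nxt is on the DFS stack
            if color.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found is not None:
                    return found
        color[node] = BLACK
        return None

    for node in start_nodes:
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found is not None:
                return found
    return None

edges = {"a": ["b"], "b": ["c"], "c": ["a"]}   # a -> b -> c -> a
print(find_first_back_edge_dst(edges, ["a"]))  # 'a'
```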
  23. May 06, 2019
• feat: support inplace in Actor (#1781) · 606d8446
  Niu Chong authored
      * style(actor.h): move customized virtual function declaration together
      
      * feat: update Actor to support inplace regst
      
      * feat: check inplace in/out regst dptr equal
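
The last item, "check inplace in/out regst dptr equal", is a cheap runtime invariant: when a consumed register is declared inplace with a produced one, both must alias the same memory. A Python stand-in for what would be a CHECK in the C++ actor; the Regst class here is a hypothetical stand-in.

```python
class Regst:
    def __init__(self, dptr):
        self.dptr = dptr  # stand-in for the raw device pointer

def check_inplace_pair(in_regst: Regst, out_regst: Regst) -> None:
    # Mirrors a CHECK_EQ on the two pointers: an inplace out regst must
    # reuse the in regst's buffer, not a copy of it.
    if in_regst.dptr != out_regst.dptr:
        raise AssertionError("inplace regst pair does not share memory")

buf = bytearray(16)
check_inplace_pair(Regst(id(buf)), Regst(id(buf)))  # ok: same buffer
```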
  24. Feb 21, 2019
• Merge master to develop (part 3) (#1657) · 07ab09a5
  Shiyuan Shang-Guan authored
      * gpu (#1310)
      
      * Fix snapshot (#1320)
      
      * fix bug of snapshot
      
      * refine distribute.sh
      
      * use more accurate function calls
      
      * rename function
      
      * update for model parallel
      
      * refine code
      
      * feat: enhance cmake download & options (#1281)
      
      * feat: enhance cmake download & options
      
      * feat(tools/): add share libs build scripts
      
      * fix: add cmake options
      
      * feat: add 3rd party download
      
* chore: update README
      
      * fix: fix protobuf & cmake repo
      
      * fix: fix options name
      
      * chore: merge 3rd_party.cmake & third_party.cmake
      
      * chore: revert pre cmake URL fix
      
      * chore: update ExternalProject check
      
      * fix: fix typo & missing download
      
      * fix: fix download url
      
      * chore: update readme
      
      * chore: fix typo
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: fix pre
      
      * print all third party libs
      
      * refine readme
      
      * DOWNLOAD_THIRD_PARTY -> PRECOMPILED_THIRD_PARTY
      
      * refine readme
      
      * minor typo fix
      
      * Fix bug in model parallel (#1345)
      
      * fix conv in model parallel
      
      * add TODO
      
      * Fix bug in gcc54 (#1352)
      
      * fix bug in gcc 5.4
      
      * update
      
      * refine ibverbs lib (#1391)
      
      * refine link ibverbs lib
      
      * modify minor
      
      * fix a little bug in accuracy print (#1403)
      
      * batch num for prediction (#1405)
      
      * batch num for prediction
      
      * !Train() => Predict()
      
* fix normalization epsilon check (#1433)

* Fix normalization epsilon check (#1441)

* fix normalization epsilon check

* remove check, fix epsilon value in op_conf
      
      * align with tensorflow (#1461)
      
      * Dev crop with random size (#1468)
      
      * random size crop proto
      
      * ImagePreprocessImpl::<kCropWithRandomSize>
      
      * clang format
      
      * MaxVal
      
      * Dev jinyi offline build (#1476)
      
      * chore: remove pre compiler funcs
      
* chore: add submodules
      
      * fix: fix project build URL from git_url -> submodule_dir_url
      
      * fix: fix submodule commit id
      
      * fix: fix .gitmodules
      
      * chore: mv third_party dir
      
      * chore: remove test-driver(glog#188) link in glog submodule
      
      * fix: update glog from: da816ea70645e463aa04f9564544939fa327d5a7 ==> to: 4f3e18bf26cdb794fc66cec348f57b5838a0c929
      
      * chore: update README.md
      
      * Dev prelu (#1474)
      
      * GPU impl of prelu
      
      * better naming
      
      * address review
      
      * renaming and use ? :
      
      * address reviews on op
      
      * change op conf
      
      * rename weight
      
      * allow 2d+
      
      * not elementwise op
      
      * naive impl
      
      * minor fix
      
      * rename
      
      * remove dupl
      
* refactor to remove duplicate
      
      * remove dup
      
      * remove dup
      
      * reverse condition
      
      * move code
      
      * remove useless arg
      
      * refactoring
      
      * add empty line
      
      * fix jpeg encoder quality (#1450)
      
      * fix(actor.cpp): never access regst after sending it to producer (#1531)
      
      * fix(boxing_actor): not handle ctrl regst in NormalProcessNaiveReadableRegstMsg() (#1520)
      
      * Dev center crop (#1542)
      
      * center crop
      
      * update
      
      * add scalar_mul (#1553)
      
      *  refactor(actor/*): update the {RegstNameType, {}} to std::make_pair (#1605)
      
      * fix(boxing_actor): not handle ctrl regst in NormalProcessNaiveReadableRegstMsg()
      
      * refactor(actor/*): update the {RegstNameType, {}} to std::make_pair
      
      * fix record_num in blob (#1619)
      
      * fix record_num in blob
      
      * add comment
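
Among the merged items, Dev prelu (#1474) implements PReLU on GPU: x where positive, a learned slope times x where negative, with the slope broadcast per channel (the "not elementwise op" item refers to that broadcast). A NumPy sketch of the math, assuming a channel-first layout; this is the op's definition, not the CUDA kernel.

```python
import numpy as np

def prelu(x, alpha):
    """PReLU: x if x > 0 else alpha * x, alpha broadcast per channel.

    x: (N, C, ...) array, channel-first layout (an assumption here);
    alpha: (C,) learned negative-part slopes.
    """
    alpha = alpha.reshape((1, -1) + (1,) * (x.ndim - 2))  # broadcast over N and spatial dims
    return np.where(x > 0, x, alpha * x)

x = np.array([[[-2.0, 3.0], [1.0, -4.0]]])    # shape (1, 2, 2): N=1, C=2
print(prelu(x, np.array([0.1, 0.2])))          # [[[-0.2  3. ] [ 1.  -0.8]]]
```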
  25. Feb 19, 2019
• Dev bert merge develop (#1650) · 59eb55c1
  Li Xinqi authored
      * Implement gelu op (#1478)
      
      * gelu op
      
      * call different funcs for float and double
      
      * Dev bert gather op (#1483)
      
      * embedding_dense_op
      
      * refine
      
      * gather op
      
      * revert
      
      * Fix gelu bug (#1484)
      
      * fix inherit bug
      
      * fix backward formula
      
      * fix bug
      
      * Dev variable op (#1485)
      
      * DefineTestBlobConf => DefineTestBlobOpConf (#1480)
      
      * variable op
      
      * Dev variable op disable memsharing (#1487)
      
      * disable mem sharing for VariableOp
      
      * variable disable tick diff
      
      * fix
      
      * refine
      
      * options transpose_a and transpose_b for Matmul
      
      * matmul operator conf
      
      * Dev bert const scalar op (#1488)
      
      * const scalar  op
      
      * refine
      
      * fix
      
      * data parallel only
      
      * const range op (#1489)
      
      * square and sqrt
      
      * broadcast_binary_op
      
      * feat: add mean op (#1490)
      
      * feat: add mean op
      
      * feat: add mean_kernel
      
      * feat: add implementation
      
      * feat: fix mean kernel
      
      * Dev bert slice op (#1491)
      
      * add op_conf
      
      * add slice op impl
      
      * add space kernel impl
      
      * fix
      
      * same semantic as python
      
      * optional start and end
      
      * fix
      
      * add has_dim0_in_shape in reshape op (#1486)
      
      * refine CHECK in broadcast_binary_op
      
      * feat: add kernel implement for broadcast_mul/div
      
      * Impl square && sqrt (#1495)
      
      * impl square && sqrt
      
      * fix typo
      
      * Dev bert slice op (#1496)
      
      * add op_conf
      
      * add slice op impl
      
      * add space kernel impl
      
      * fix
      
      * same semantic as python
      
      * optional start and end
      
      * fix
      
      * slice kernel cpu impl
      
      * modify coding style
      
      * BiasAddOpConf
      
      * refactor(broadcast_div_kernel): update kernel util api
      
      * Dev bert cosnt range use device piece size (#1498)
      
      * use device_piece_size
      
* const size => size
      
      * fix
      
      * no check in BroadcastBinaryOp::InitFromProto
      
      * override GetCustomizedConfs for broadcast_binary_op
      
      * fix: fix bugs in broadcast_div/mul kernel (#1502)
      
      * fix: fix bugs in broadcast_div/mul kernel
      
      * fix
      
      * fix: fix the infer bw_buf blobdesc bug in broadcast_binary op
      
      * Bias Add Op && Kernel (#1503)
      
      * pass compile
      
      * fix typo
      
      * Matmul kernel implementation (#1494)
      
      * pass compile
      
      * add comment
      
      * fix bug
      
      * Dev bert const scalar kernel (#1492)
      
      * const scalar kernel
      
      * fix
      
      * fix
      
      * init
      
      * empty const range kernel
      
      * sketch of gather kernel
      
      * gather kernel
      
      * refine
      
      * refine
      
      * const range kernel
      
      * refine
      
      * backward
      
      * const range size
      
      * gather kernel
      
      * assert index
      
      * add truncated_normal initializer (#1499)
      
      * add truncated_normal initializer
      
      * rename RngTruncatedNormal
      
      * fix: add const override for InferBwBufBlobDescs in BroadcastBinaryOp
      
* fix: update the supported data type from floating to arithmetic
      
      * enforce 2d on bias add
      
      * Dev bert slice op (#1500)
      
      * add op_conf
      
      * add slice op impl
      
      * add space kernel impl
      
      * fix
      
      * same semantic as python
      
      * optional start and end
      
      * fix
      
      * slice kernel cpu impl
      
      * modify coding style
      
      * slice gpu impl const buf infer
      
      * add slice gpu impl
      
      * simplify slice cpu impl
      
      * fix gpu impl bug
      
      * fix typo
      
      * add forward function from broadcast_add,broadcast_sub
      
      * feat: add gpu impl of cast kernel (#1504)
      
      * Dev nc cast (#1507)
      
      * feat: add gpu impl of cast kernel
      
      * register gpu cast op
      
      * Fix broadcast binary all dim size 1 (#1505)
      
      * remove check NumAxes
      
      * check scalar
      
      * IsScalarBlob
      
      * b_diff=>b (#1509)
      
* feat: add LayerNormOp/Kernel without kernel implementation (#1510)
      
      * fix: fix missing registing layer_normalization kernel
      
      * fix: fix missing registing layer_normalization op
      
      * fix: temply remove activation from layer_norm_kernel
      
      * ExecShapeUtil
      
      * broadcast_binary_xpu_util.h
      
      * add bw kernel of broadcast_add
      
      * Dev constant (#1513)
      
      * constant_op
      
      * init_op_conf
      
      * sequence=>range
      
      * Dev broadcast add (#1514)
      
      * ExecShapeUtil
      
      * broadcast_binary_xpu_util.h
      
      * add bw kernel of broadcast_add
      
      * WITH_CUDA_PARAM
      
      * left extended shape
      
      * xpu_ndarray_builder
      
      * add bw kernel of broadcast_sub
      
      * updt to 1d (#1512)
      
* fix small bug in xpu_reduce_ndarray
      
      * fix(broadcast_binary_op): fix the wrong data_type of bw_buf regst (#1515)
      
      * feat(mean): update mean_op/kernel for calc only last dim of blob (#1516)
      
      * fix(mean_kernel): fix typo
      
      * ndarray reduce
      
      * new reduce
      
      * fix shape of tmp_storage
      
      * reduce
      
      * more check for NdArrayReduce
      
      * ImplaceApplyUnary<UnaryFuncMinus>
      
      * ndarray_apply_broadcast_binary
      
* delete useless files
      
      * complete backward kernel of broadcast_mul
      
      * add backward kernel of broadcast_div
      
      * broadcast binary op check data type equal (#1508)
      
      * fix bug in broadcast_binary
      
      * debug op
      
      * EncodeBlob
      
      * const_out_blob_feature_load_file
      
      * DefineTestBlobOpConf.has_diff
      
      * indices has_diff = false (#1519)
      
      * adam model update (#1518)
      
      * adam model update
      
      * add comment
      
      * update
      
      * add correct_deviation flag
      
      * rename
      
      * remove GetCustomizedConf
      
      * fix bug in mean_op fw kernel
      
      * add sigmoid loss op
      
      * ndarray_apply_broadcast_unary
      
* remove multiplier of mean kernel
      
      * fix(boxing_actor): not handle ctrl regst in NormalProcessNaiveReadableRegstMsg()
      
      * fix raw (#1522)
      
      * rsqrt
      
      * XpuReducedNdarray supports expression template
      
      * faster_reduce
      
      * inlined cuda device function
      
      * profiling reduce_sum
      
      * refactor(kernel_util.cu): calc x_strides on cpu instead of on TransposeGpu() (#1525)
      
      * BroadcastBinaryOp
      
      * ExecShape => XpuShape
      
      * fix shape bug in mean bw kernel
      
      * refine XpuNdarrayAssign
      
      * use ndarray broadcast mul (#1529)
      
      * Dev softmax reduce ndarray (#1527)
      
      * softmax use ndarray reduce
      
      * fix shape
      
      * refine reduce
      
      * fix
      
      * remove xpu_ndarray_builder
      
      * fix(actor.cpp): never access regst after sending it to producer
      
      * ndarray_util.h => xpu_util.h
      
      * xpu_ndarray_util.h => ndarray_util.h
      
      * XpuNdArrayUtil => NdarrayUtil
      
      * SwitchReduce(SwitchCase(num_axes), ...) => Reduce(...)
      
      * refactor: rename NormalProcessNaiveReadableRegstMsg() to NormalProcessNaiveReadableDataRegstMsg() (#1532)
      
      * SwitchBroadcastApply(SwitchCase(num_axes), ...) => BroadcastApply(...)
      
      * softmax kernel use ndarray reduce  (#1530)
      
      * softmax use ndarray reduce
      
      * fix shape
      
      * refine reduce
      
      * fix
      
      * RowMax=>NdarrayReduce
      
      * SwitchReduce=>Reduce
      
      * move template parameter NDIMS from class NdarrayReduce to methods of class NdarrayReduce
      
      * rename file: ndarray/xpu_ndarray_reduce_test.cpp -> ndarray/ndarray_reduce_test.cpp
      
      * move NdarrayUtil::SwitchReduce(...) to NdarrayReduce::SwitchReduce(...)
      
      * Dev one hot encoder (#1533)
      
      * one_hot op
      
      * ohe
      
      * one hot kernel
      
      * refine
      
      * refine
      
      * remove old
      
      * refine
      
      * refine
      
      * refine
      
      * format
      
      * save m and v in adam_model_update (#1534)
      
      * Dev profile reduce (#1535)
      
      * ndarray_reduce_impl
      
      * NdarrayMatrixRowReduce
      
      * 1) MatrixColReduce; 2) WITH_CUDA_PARAM => RUN_CUDA_KERNEL
      
      * NdarrayScalarReduce
      
      * NdarrayDefaultReduce
      
      * refactor NdarrayReduce<DeviceType device_type, typename T> to NdarrayReduce<DeviceType device_type, typename T, const T(*binary_func)(const T, const T)>
      
      * 1) MaxVal<T>() => GetMaxVal<T>(); MaxValue<T>::value => MaxVal<T>::value
      
      * replace KernelUtil::RowMax with NdarrayUtil::ReduceMax
      
      * NdarrayNoReduce
      
      * eliminate redundant code by macros
      
      * Fix matmul gpu bugs (#1528)
      
      * call different api for batchedgemm
      
      * updt api
      
      * use naive loop
      
      * save work
      
      * save work
      
      * updt impl
      
      * remove useless code
      
      * replace naive loop with cublasgemmbatched
      
      * feat: add ScalarAddOp and ScalarMulOp (#1541)
      
      * Dev nc scalar (#1543)
      
      * feat: add ScalarAddOp and ScalarMulOp
      
      * feat: add ScalarAddKernel and ScalarMulKernel
      
      * fix: ScalarAddOp/ScalarMulOp not inheri from CWiseOp
      
      * fix: fix code style
      
      * fix: fix typo of include file in scalar_add_op/scalar_mul_op
      
* fix(scalar_mul_kernel): register ScalarMulKernel
      
      * fix: add MulbyScalarPara(), replace cublas_scal by this on ScalarMulKernel
      
      * fix(scalar_mul_kernel): fix typo
      
      * Dev nc testtrans (#1540)
      
      * feat: update trans kernel
      
      * InitGlobalCudaDeviceProp
      
* in_blob and out_blob are unnecessary for bw kernel of variable_op and constant_op
      
      * Transpose: the shape elem_cnt of x must not exceed 2^32
      
      * remove LabelType (#1545)
      
      * rm ndarray_reduce_core.*
      
      * Dev identity loss (#1547)
      
      * identity_loss
      
      * loss op
      
      * CalcLossInstanceNum
      
      * mem shared for mdupdt first in regst and md diff add regst (#1546)
      
      * remove useless code (#1548)
      
      * Dev sparse cross entropy (#1550)
      
      * op for sparse cross _entropy
      
      * modify op_conf for sparse cross entropy
      
      * saprse cross entropy kernel
      
      * op
      
      * SparseCrossEntropyKernelUtil
      
      * refine
      
      * refine shape check (#1552)
      
      * refactoring reduce sum (#1554)
      
      * refactoring reduce sum
      
      * also use shape and dptr when bw
      
      * add resize when keepdims
      
      * address reviews
      
      * move functions to Anonymous namespace
      
      * address reviews
      
      * remove auto
      
      * replace find
      
      * rename keepdims
      
      * only enable nccl on gpu
      
      * fix diff add regst size in MdUpdt task node as same as in regst (#1556)
      
      * mem shared for mdupdt first in regst and md diff add regst
      
      * fix diff add regst size in MdUpdt task node as same as in regst
      
      * minor fix
      
* special case when it is a loss op
      
      * Dev loss instance num (#1544)
      
      * loss instance number
      
      * set_has_loss_instance_num_field
      
      * loss
      
      * in_diff
      
      * LossOpFixInDiffHasLossInstanceNum
      
      * remove need_do_loss_instance_num
      
      * move to FixInDiffBlobDescs
      
      * remove
      
      * loss_instance_num use float
      
      * refine
      
      * Boxing ForwardLossInstance
      
      * fix
      
      * fix loss
      
      * fix
      
      * refine
      
      * fix
      
      * refine
      
      * refine
      
      * impl reduce mean
      
      * Dev all reduce ctrl edge (#1558)
      
      * mem shared for mdupdt first in regst and md diff add regst
      
      * feat: add ReduceInplaceIdentity LogicalNode/TaskNode/Op/Kernel
      
      * nccl reduce ctrl edge
      
      * MayConsumeModelDiff
      
      * fix diff add regst size in MdUpdt task node as same as in regst
      
      * eager_reduce_ratio
      
      * mem sharing for ReduceIdentity
      
      * ReduceInplaceIdentity => ReduceIdentity
      
      * reduce ctrl edge supports for arbitrary placement
      
      * refine ChainLogicalGraph::IsLogicalNodeMergeable
      
      * model name (#1561)
      
      * Dev gather refine (#1517)
      
      * gather op index support all int type and axis
      
      * out=in
      
      * reformat
      
      * negative axis
      
      * LookupKernel=>GatherKernel
      
      * reformat
      
      * refine
      
      * axis
      
      * refine & bugfix
      
      * remove ConstScalar and ConstRange (#1526)
      
      * Refine range initializer (#1523)
      
      * support axis
      
      * refine naming
      
      * fix before_dim_size
      
      * doc
      
      * refine
      
      * refine naming
      
      * refine naming
      
      * VariableLogicalNode
      
      * identity (#1563)
      
      * total_instance_num use naive mdupdt (#1564)
      
      * patch by hand from faster_rcnn
      
      * revert LogicalVariableOp
      
      * Dev clone boxing (#1566)
      
      * identity
      
      * reduce clone boxing
      
      * Dev clone boxing (#1568)
      
      * identity
      
      * reduce clone boxing
      
      * tuple identity
      
      * Dev tick (#1571)
      
      * feat: add Tick LogicalNode/TaskNode/Op/Kernel
      
      * feat: remove Tick LogicalNode/TaskNode
      
      * feat: add BldSubTskGphByTickToSource for TickOp
      
      * refine: refine due to comment
      
      * feat: add BldSubTskGphByRecordLoadToTick
      
      * pr tick op/kernel alone
      
      * feat: add TickOp and BldSubTskGphByTickToSource  (#1565)
      
      * feat: add Tick LogicalNode/TaskNode/Op/Kernel
      
      * feat: remove Tick LogicalNode/TaskNode
      
      * feat: add BldSubTskGphByTickToSource for TickOp
      
      * refine: refine due to comment
      
      * feat: add BldSubTskGphByRecordLoadToTick
      
      * refine: refine due to comment
      
      * refine: due to comment
      
      * refine: remove BldSubTskGphByRecordLoadToTick
      
      * fix tick op in dlnet (#1572)
      
      * Dev clip by global norm (#1521)
      
      * clip_by_global_norm
      
      * update
      
      * refine model_update op
      
      * remove useless code
      
      * fix name
      
      * rename clip_norm
      
      * remove useless code
      
      * force init memory and add CHECK()
      
      * remove useless code and add comment
      
      * fixbug
      
      * refine code
      
      * Dev bert profile (#1573)
      
      * 1) refactor reduce_group; 2) add new stream kReduceCtrl
      
      * 1) allreduce and model_update overlapping; 2) allreduce and fw overlapping
      
      * add mdupdt ctrl edges within reduce group (#1575)
      
      * Dev group all reduce by model bytes (#1577)
      
      * group all reduce by model byte size
      
* mv OpGraph into a separate file op_graph.h
      
      * gelu (#1578)
      
      * Dev bert layer norm (#1574)
      
      * layer norm
      
      * layer_norm
      
      * fix trainable
      
      * fix
      
      * fix trainable
      
      * refine
      
      * Dev bert cuda event sync (#1581)
      
      * cudaSetDevice in actor poller threads
      
      * ReduceConcatCompActor ; NaiveActor
      
      * set dev id (#1583)
      
      * Dev bert profiling (#1586)
      
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * fix a mem sharing bug (#1590)
      
      * Fix mem sharing bug (#1593)
      
      * fix a mem sharing bug
      
      * refine by review
      
      * remove previous if condition
      
      * refine
      
      * Dev profiling adam (#1592)
      
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      * Dev refine transpose (#1594)
      
      * profiling
      
      * all_reduce_* option for performance optimization
      
      * faster adam kernel
      
      * refine dropout and transpose
      
      * loss print duration (#1598)
      
      * pseudo chains of OpGraph
      
      * ConvertPseudoChainToChain
      
      * refine pseudo_chain
      
      * refine register coloring algorithm
      
      * rename op_graph log file name
      
      * remove unused code
      
      * Dev bigger chain (#1601)
      
      * pseudo chains of OpGraph
      
      * ConvertPseudoChainToChain
      
      * refine pseudo_chain
      
      * refine register coloring algorithm
      
      * rename op_graph log file name
      
      * remove unused code
      
      * chore: add -gencode in CMakeLists.txt (#1603)
      
      * EnableMemSharingInVariableOp
      
      * no mem_sharing for out_diff & model_diff in variable_op
      
      * Dev mem sharing for variable op (#1604)
      
      * pseudo chains of OpGraph
      
      * ConvertPseudoChainToChain
      
      * refine pseudo_chain
      
      * refine register coloring algorithm
      
      * rename op_graph log file name
      
      * remove unused code
      
      * EnableMemSharingInVariableOp
      
      * no mem_sharing for out_diff & model_diff in variable_op
      
      * refine code
      
      * Fix jxf reduce concat bug (#1606)
      
      * refine logic to infer reduce_concat_op's elem_cnt of out blob, still have bugs...
      
      * add RoundUp in reduce_concat
      
      * CHECK_LE -> CHECK_EQ
      
      * add CHECK
      
      * Dev random shuffle (#1607)
      
      * random shuffle
      
      * fix
      
      * refine
      
      * refine
      
      * single thread
      
      * refine
      
      * cmake add half (#1609)
      
      * Bugfix no tick diff (#1614)
      
      * group by has_diff
      
      * rm unnecessary identity
      
      * share model_diff and out_diff in variable op (#1616)
      
      * share model_diff and out_diff in variable op
      
      * bugfix: model_diff is a produced register
      
      * register_num of model_diff is 1
      
      * add VariableKernelConf
      
      * no mutable
      
      * bugfix
      
      * bugfix: set ctrl_regst's return_regst_num (#1617)
      
* Register coloring with strategy (#1613)
      
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
* put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * refine CHECK in AllReduce (#1618)
      
      * refine CHECK in AllReduce
      
      * move ReduceConcatOpCtx definition to .cpp file
      
      * fix fw_consumer nullptr (#1622)
      
      * faster improver (#1628)
      
      * multithreads register coloring (#1630)
      
      * multithreads register coloring
      
      * refine code
      
      * Dev bert accuracy with weight (#1632)
      
      * accuracy
      
      * accuracy_task_node add fw_buf
      
      * fw_buf=>data_tmp
      
      * Dev logical blob dim0 (#1625)
      
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
* put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * replace piece_size with logical_blob_dim0
      
      * BlobParallelConf
      
      * BlobParallelDesc
      
      * infer out blob model_split_axis
      
      * int64_t => int32_t
      
      * InferOutBlobParallelDesc
      
      * gather out blob model split (#1624)
      
      * InferBlobParallelDesc
      
      * let variable op support kModelParallel
      
      * rename lbi2blob_desc_ => lbi2no_parallel_blob_desc_
      
      * Global<OpGraph>
      
      * SplitLogicalInputBlobDesc
      
      * ConcatOutputBlobDescs
      
      * rename: BlobDataParallel => DataBlobParallel; BlobModelParallel => ModelBlobParallel; BlobGridParallel => GridBlobParallel
      
      * OpGraph::CheckBlobDescs(...)
      
      * exact division is unnecessary
      
      * fix bugs
      
      * rename InferOutBlob* => InferOutputBlob
      
      * exact division in variable_op is unnecessary
      
      * bug fix
      
      * fix bugs
      
      * fix bugs
      
      * IsInputBlobAllowedModelSplit
      
      * use Global<OpGraph> to InferModelSize
      
      * add OpGraph::GetDataBalancedSplitter and OpGraph::GetModelBalancedSplitter
      
      * fix IdentityOp::IsInputBlobAllowedModelSplit
      
      * no implementation for pure virtual function Operator::IsInputBlobAllowedModelSplit
      
      * refine BlobParallelDesc: replace CopyParallelConf with operator=
      
      * refine ParallelDesc: remove unused functions
      
      * more checks on ParallelDesc
      
      * Dev logical blob dim0 (#1635)
      
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
* put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * replace piece_size with logical_blob_dim0
      
      * BlobParallelConf
      
      * BlobParallelDesc
      
      * infer out blob model_split_axis
      
      * int64_t => int32_t
      
      * InferOutBlobParallelDesc
      
      * gather out blob model split (#1624)
      
      * InferBlobParallelDesc
      
      * let variable op support kModelParallel
      
      * rename lbi2blob_desc_ => lbi2no_parallel_blob_desc_
      
      * Global<OpGraph>
      
      * SplitLogicalInputBlobDesc
      
      * ConcatOutputBlobDescs
      
      * rename: BlobDataParallel => DataBlobParallel; BlobModelParallel => ModelBlobParallel; BlobGridParallel => GridBlobParallel
      
      * OpGraph::CheckBlobDescs(...)
      
      * exact division is unnecessary
      
      * fix bugs
      
      * rename InferOutBlob* => InferOutputBlob
      
      * exact division in variable_op is unnecessary
      
      * bug fix
      
      * fix bugs
      
      * fix bugs
      
      * IsInputBlobAllowedModelSplit
      
      * use Global<OpGraph> to InferModelSize
      
      * add OpGraph::GetDataBalancedSplitter and OpGraph::GetModelBalancedSplitter
      
      * fix IdentityOp::IsInputBlobAllowedModelSplit
      
      * no implementation for pure virtual function Operator::IsInputBlobAllowedModelSplit
      
      * refine BlobParallelDesc: replace CopyParallelConf with operator=
      
      * refine ParallelDesc: remove unused functions
      
      * more checks on ParallelDesc
      
      * remove unused function Operator::MaxModelSplitNum
      
      * bugfix: SoleOp() => op_vec().at(0)
      
      * Dev global op graph (#1636)
      
* Global<OpGraph> is only available during compilation
      
      * small record_piece_size for InferNoParallelBlobDesc
      
      * Dev op graph piece size (#1637)
      
      * fix a bug in OpGraph::InferNoParallelBlobDesc
      
      * fix a bug in OpGraph::InferNoParallelBlobDesc
      
      * DfsTopoForEachNodeSortByDistanceToSink (#1638)
      
      * Dev jxf bert top k (#1633)
      
      * top_k
      
      * dev top_k op
      
      * refine
      
      * fix bug
      
      * refactor top_k op, cooperate with gather op to get values now
      
      * customized TOPK_KERNEL_ENTRY in auto factory
      
      * batch gather op
      
      * refine
      
      * Backup: batch_gather op, pass compile
      
      * fix bugs, pass the test
      
      * fix no new line at the end of file
      
      * const
      
      * refine by review
      
      * fix bugs
      
      * rename: instance_dim -> instance_size
      
      * remove a blank line
      
      * refine coding style by Juncheng's suggestions, Bravo
      
      * refine top_k
      
      * more refine
      
      * compatible with new model parallel
      
      * refine
      
      * rename
      
      * cpu only in top_k
      
      * Dev model boxing (#1639)
      
      * mem_shared_hint_id
      
      * sharable memory block
      
      * rm useless code
      
      * remove useless code
      
      * bugfix: no redundant edges
      
      * rename: MemBlockGroup => MemBlock
      
* put constructor of SharableMemBlockNode into header file
      
      * bugfix
      
      * rename field: MemBlock.block_id => MemBlock.mem_block_id
      
      * replace piece_size with logical_blob_dim0
      
      * BlobParallelConf
      
      * BlobParallelDesc
      
      * infer out blob model_split_axis
      
      * int64_t => int32_t
      
      * InferOutBlobParallelDesc
      
      * gather out blob model split (#1624)
      
      * InferBlobParallelDesc
      
      * let variable op support kModelParallel
      
      * rename lbi2blob_desc_ => lbi2no_parallel_blob_desc_
      
      * Global<OpGraph>
      
      * SplitLogicalInputBlobDesc
      
      * ConcatOutputBlobDescs
      
      * rename: BlobDataParallel => DataBlobParallel; BlobModelParallel => ModelBlobParallel; BlobGridParallel => GridBlobParallel
      
      * OpGraph::CheckBlobDescs(...)
      
      * exact division is unnecessary
      
      * fix bugs
      
      * rename InferOutBlob* => InferOutputBlob
      
      * exact division in variable_op is unnecessary
      
      * bug fix
      
      * fix bugs
      
      * fix bugs
      
      * IsInputBlobAllowedModelSplit
      
      * use Global<OpGraph> to InferModelSize
      
      * add OpGraph::GetDataBalancedSplitter and OpGraph::GetModelBalancedSplitter
      
      * fix IdentityOp::IsInputBlobAllowedModelSplit
      
      * no implementation for pure virtual function Operator::IsInputBlobAllowedModelSplit
      
      * refine BlobParallelDesc: replace CopyParallelConf with operator=
      
      * refine ParallelDesc: remove unused functions
      
      * more checks on ParallelDesc
      
      * remove unused function Operator::MaxModelSplitNum
      
      * BlobParallelDesc::EquivalentTo
      
* LogicalNode::main_model_parallel_ is out of date
      
      * refine Operator: replace IsElemWiseOp with IsSoleInputBlobAllowedModelSplit
      
      * refine transpose conf
      
      * fix a bug in Operator::FixParallelDesc
      
      * InferInputBlobModelSplitAxis
      
      * BlobParallelType
      
      * more default behaviors for Operator::InferInputOutputBlobParallelType
      
      * op_parallel_signature
      
      * rename: BlobParallelType => LogicalBlobParallelDesc
      
      * OpGraph::InferLogicalBlobParallelDesc
      
      * refactor SplitLogicalInputBlobDesc by LogicalBlobParallelDesc
      
      * refine OpNode::ConcatBlobDesc By LogicalBlobParallelDesc
      
      * OpNode::lbi2model_split_axis_
      
      * OpGraph::GetBalancedSplitter
      
      * replace OpGraph::GetBlobParallelDesc4Lbi with OpGraph::GetLbpd4Lbi
      
      * rm BlobParallelDesc in OpGraph
      
      * VariableOp::InitOpParallelSignatures
      
      * rm BlobParallelDesc
      
      * rename Make*ParalelSignature functions
      
      * MakeOpParallelSignature_DS_MC_2_DS
      
      * MakeOpParallelSignature_DC_MS_2_MS
      
      * BiasAddOp::InitOpParallelSignatures
      
      * refine MakeOpParallelSignature_DC_MS_2_MS
      
      * MatmulOp::InitOpParallelSignatures
      
      * GatherOp::InitOpParallelSignatures
      
* bugfix: model_split_axis cannot equal -1 when parallel_policy is kModelParallel
      
      * refactor: bn2parallel_id2blob_desc => lbi2parallel_id2blob_desc
      
      * refine OpNode
      
      * LogicalBlobParallelConf
      
      * LogicalBlobParallelDesc::DualLbpd
      
      * 1) merge dev_bert;
      2) placement.proto not used in logical_blob_parallel_conf.proto
      
      * bugfix: 1) remove CHECK(has_model) in Operator::NaiveInitOpParallelSignatures; 2) lbpd->set_parallel_num(val)
      
      * fix bugs in GatherOp::InitOpParallelSignatures and BroadcastBinaryOp::InitOpParallelSignatures
      
      * refactor: InitOpParallelSignatures => GetOpParallelSignatures
      
      * refactor: const OpParallelSignature => std::unique_ptr<const OpParallelSignature>
      
      * rm LogicalBlobParallelConf
      
      * refactor: ModelSplitAxis4BnInOp => LbpdHint4BnInOp
      
      * fix bugs about LbpdHint
      
      * simplify the interface of InferInputOutputBlobLogicalBlobParallelDescIf
      
      * rename Class CloneParallel => BroadcastParallel
      
      * rename field: clone_parallel => broadcast_parallel
      
      * refactor LbpdHint by SbpParallel
      
      * InferIsModelBlob4OutputBlobsIf
      
      * remove field LogicalBlobParallelDesc::parallel_num
      
      * rename: LogicalBlobParallelDesc => SbpParallel
      
      * rename: LbpdHint =>SbpInferHint
      
      * simplify interface Operator::InferOutputBlobSbpInferHint
      
      * rename api: Operator::InferBlobSbpInferHintIf => Operator::InferOuputBlobsSbpInferHintIf
      
      * OpGraph::InferIsModelBlob
      
      * rename file: logical_blob_parallel_desc.* => sbp_parallel.*
      
      * rename filename: lbpd_hint* => sbp_infer_hint*
      
      * rename field: SbpInferHint::has_data_split => SbpInferHint::is_data_split
      
      * rename fields: SbpInferHint::is_data_split, is_model_split, is_data_partial_sum, is_model_broadcast
      
      * refactor SbpInferHint::split_axis
      
      * LambdaOpParallelSignature
      
      * replace function MakeVariableOpDataSplitOpParallelSignature with class VariableOpDataSplitOpParallelSignature
      
      * replace function MakeVariableOpModelSplitOpParallelSignature with class VariableOpModelSplitOpParallelSignature
      
      * BroadcastBinaryOpParallelSignature
      
      * Matmul_DMS_MS_2_P_OpParallelSignature
      
      * Gather_DC_MS_2_P_OpParallelSignature
      
      * class DataSplitOpParallelSignature
      
      * class ModelBroadcastOpParallelSignature
      
      * class DS_MC_2_DS_OpParallelSignature
      
      * add field OpParallelSignature::op_
      
      * refactor: ModelSplitAxis => OutputBlobModelSplitAxis
      
      * remove Operator::InferOuputBlobsSbpInferHintIf
      
      * implement MatmulOp::OutputBlobModelSplitAxis
      
      * implement GatherOp::OutputBlobModelSplitAxis
      
      * implement TransposeOp::OutputBlobModelSplitAxis and BiasAddOp::OutputBlobModelSplitAxis
      
      * add method OpGraph::IsDataBlob
      
      * refactor OpGraph::InferSbpParallel
      
      * refactor class SbpInferHint
      
      * rename local variable: SbpInferHint4BnInOp => SbpInferHint4Ibn
      
      * refactor MakeModelSplitOpParallelSignature
      
      * refactor Make_DC_MS_2_MS_OpParallelSignature
      
      * remove unused class LambdaOpParallelSignature; refactor class name '*Clone*' => '*Broadcast*'
      
      * bugfix: Operator::OutputBlobModelSplitAxis for sole-ibn op
      
      * fix bugs in SbpInferHint::has_split_axis(), SbpInferHint::split_axis and OpNode::IsModelBlob4Lbi
      
      * refactor class SbpInferHint: replace split_axis_ with sbp_parallel_
      
      * refactor by SbpInferHint::sbp_parallel
      
      * 1) rename OpNode data member; 2) rm unused proto
      
      * fix clone (#1641)
      
      * OpGraph::GetBlobDataType (#1643)
      
      * OpGraph::GetBlobDataType
      
      * refine OpGraph::GetBlobDataType
      
      * IdentityOp => TupleIdentityOp (#1644)
      
      * Dev sbp parallel cast (#1646)
      
      * add SbpParallelCastOp
      
      * only SplitParallel and BroadcastParallel can be user customized
      
      * rename: SbpParallelCastOp => ParallelCastOp
      
      * build boxing_conf by sbp_parallel
      
      * fix a bug in BroadcastBinaryOpParallelSignature
      
      * support broadcast_parallel for sole-ibn op
      
      * 1) build boxing_op_conf by sbp_parallel for tuple_identity_op;
      2) no op parallel desc fix for kModelParallel;
3) fix a bug in TaskGraph::EnableMemSharingInVariableOp
      4) add TupleIdentityOpParallelSignature
      
      * fix bug in IsModelParallel121 (#1648)
      
      * merge develop
      
      * merge develop (#1649)
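
The tail of this merge renames LogicalBlobParallelDesc to SbpParallel, the split/broadcast/partial-sum trichotomy that later became OneFlow's SBP. A toy NumPy sketch of what each annotation means for a logical tensor placed on n devices; the 2-device examples and function names are illustrative only.

```python
import numpy as np

def split(tensor, axis, n):
    """S(axis): each device holds one slice along `axis`."""
    return np.array_split(tensor, n, axis=axis)

def broadcast(tensor, n):
    """B: every device holds the full tensor."""
    return [tensor.copy() for _ in range(n)]

def partial_sum_reconstruct(parts):
    """P: each device holds an addend; the logical tensor is their sum."""
    return sum(parts)

t = np.arange(12).reshape(3, 4)
print([p.shape for p in split(t, 0, 2)])        # [(2, 4), (1, 4)]
print([p.shape for p in broadcast(t, 2)])       # [(3, 4), (3, 4)]
print(partial_sum_reconstruct([t, t])[0, 0])    # 0: sum of device addends
```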
  26. Oct 01, 2018
  27. Sep 30, 2018
• Refactor Actor (#1259) · e042befc
  Niu Chong authored
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      * feat: remove actual_writeable_regst_desc_id_ from Actor, add Naive/CustomizedProducedRegst
      
      * fix(normal_model_update_actor): bug: not send customized regst to consumer when SendIntialModel
      
      * fix(normal_forward_compute_actor): bug: not add kLoss/kAccuracy produced regst to NaiveProducedRegst
      
      * fix(actor): UNIMPLEMENTED() for AsyncSendCustomizedProducedRegstMsgToConsumer
      
      * fix(normal_forward_compute_actor): set const_buf_regst to nullptr when recv from consumers
      
      * fix(actor): total_reading_data_regst_cnt, not total_reading_ctrl_regst_cnt
      
      * refactor: update GetNaiveConsumedRegstDescName to GetNaiveOrCustomizedConsumedRegstDescName(same for Produced)
      
      * feat: combine data_regst and ctrl_regst in Actor
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: remove .swp files and unused LOG
      
      * feat: split Act and SendMsg (#1255)
      
      * feat: split Act and SendMsg
      
      * refine: rename HandleProduced/ConsumedDataRegst.. to HandleProduced/ConsumedNaiveDatRegst..
      
      * fix(input_wise_comp_actor): bug: not set piece id
      
      * fix(actor): potential bug: produced msg with no allowed actor still pop from queue
      
      * refactor: mv some protected member function to private
      
      * fix(actor): fix the condition about sending EORD msg
      
      * refactor(input_wise_actor): use RegstSlot in InputWiseActor
      
      * fix(copy_comm_net_actor): rename piece_id2regst_ctx to piece_id2regst_ctx_
      
      * refactor: rename Name2RegstDescId to Name2RegstDescIds
      
      * refactor(naive_actor): "override final" instead of only "final"
      
      * refine(actor): little refine
      
      * feat: update the return type of GetNaiveOrCustomizedNamesRegstDescName to enum class RegstNameType
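
A schematic sketch of the "split Act and SendMsg" refactor described above: when an actor is ready, it first runs its kernels (Act), then, as a separate step, pushes produced registers to consumers and returns consumed ones to their producer. The message and queue types below are illustrative stand-ins for the C++ actor runtime, not its API.

```python
from collections import deque

class ToyActor:
    def __init__(self, kernels):
        self.kernels = kernels
        self.outbox = deque()  # stand-in for async message sending

    def process_ready(self, consumed_regsts, produced_regsts, consumers, producer):
        self.act(consumed_regsts, produced_regsts)        # step 1: compute only
        self.send_msgs(consumed_regsts, produced_regsts,  # step 2: messaging only
                       consumers, producer)

    def act(self, consumed, produced):
        for kernel in self.kernels:
            kernel(consumed, produced)

    def send_msgs(self, consumed, produced, consumers, producer):
        for regst in produced:
            for consumer in consumers:
                self.outbox.append(("regst_to_consumer", consumer, regst))
        for regst in consumed:
            self.outbox.append(("regst_back_to_producer", producer, regst))
```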
  28. Sep 10, 2018
  29. Sep 07, 2018
• feat: update the data members to use RegstSlot in Actor (#1208) · 38a50de4
  Niu Chong authored
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
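
A minimal Python model of the RegstSlot this PR introduces: a deque of ready registers per regst_desc_id, a counter of how many descs currently have at least one register available, and the Try*/Has* interface named in the commits. The real class lives in OneFlow's C++ actor code; this sketch only mirrors the described behavior.

```python
from collections import deque

class RegstSlot:
    def __init__(self, regst_desc_ids):
        self.deqs = {rid: deque() for rid in regst_desc_ids}
        self.available_regst_desc_cnt = 0  # descs with a non-empty deque

    def has_regst_desc_id(self, rid):
        return rid in self.deqs

    def try_push_back(self, rid, regst):
        if rid not in self.deqs:
            return False
        if not self.deqs[rid]:
            self.available_regst_desc_cnt += 1  # desc becomes available
        self.deqs[rid].append(regst)
        return True

    def try_pop_front(self, rid):
        if rid not in self.deqs or not self.deqs[rid]:
            return None
        regst = self.deqs[rid].popleft()
        if not self.deqs[rid]:
            self.available_regst_desc_cnt -= 1  # desc exhausted
        return regst

    def is_ready(self):
        # Every regst desc has at least one register available.
        return self.available_regst_desc_cnt == len(self.deqs)
```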
  30. Aug 19, 2018
  31. Aug 04, 2018
  32. Aug 01, 2018
  33. Jul 16, 2018
• feat: Add InputWiseActor for ReduceGlobalAdd and ReduceGather (#1012) · 0ffc781c
  Niu Chong authored
      * feat: avoid net contention by adding ctrl edge in ReduceStruct
      
      * refine(task_graph.h/cpp): refine AddCtrlEdgeInReduceStruct()
      
      * fix(graph/task_graph.cpp): fix the bug of machine order
      
      * fix(graph/task_graph.cpp): do not add ctrl edge with reduce scatter
      
      * feat: add ReduceGlobalAddCompActor
      
      * fix: fix the bug of reduce_global_actor/kernel
      
      * chore: remove used vim .swp file
      
* fix(graph/task_graph.cpp): fix the bug of sorting copycomment when building reduce ctrl edge
      
      * fix(graph/task_graph.h/cpp): add CtrlEdge for ReduceGather
      
      * feat: revert add ctrl edge in reduce struct from this PR
      
      * refactor: rename ReduceGlobalAddCompActor to InputWiseCompActor for scalability
      
* fix(kernel/reduce_global_add_kernel.cpp): use Memcpy rather than Memset for first blob to be added
      
      * refactor(actor/input_wise_compute_actor.*): use HashMap and counter instead of HashSet for processed regst_desc
      
      * refactor: let ReduceGlobalAddCompActor inherit InputWiseCompActor
      
      * feature: add ReduceGatherCompActor that inherits InputWiseCompActor
      
      * fix(reduce_gather_kernel.cpp): add missing break
      
      * refactor: replace regst_desc_id2bn_in_op_ with regst_desc_id2in_bn_id_ in InputWiseCompActor
      
      * fix(reduce_global_add_kernel): remove useless class member parallel_id_
      
* refactor: make ReduceLocalAdd kernel support inputwise, rename ReduceGlobalAddActor to ReduceAddActor for scalability
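
The kernel fix above ("use Memcpy rather than Memset for first blob to be added") suggests the input-wise pattern: as each input register arrives, the first one initializes the output buffer by copy, and later ones accumulate into it, so no zero-initialization pass is needed. A NumPy sketch of that accumulation order; the class name is a hypothetical stand-in.

```python
import numpy as np

class InputWiseAdd:
    """Accumulate inputs one at a time, in arrival order."""

    def __init__(self):
        self.out = None
        self.seen = 0

    def consume(self, blob):
        if self.seen == 0:
            self.out = blob.copy()   # first blob: a Memcpy, no zeroing needed
        else:
            self.out += blob         # later blobs: elementwise add
        self.seen += 1

acc = InputWiseAdd()
for part in (np.ones(4), 2 * np.ones(4), 3 * np.ones(4)):  # any arrival order
    acc.consume(part)
print(acc.out)  # [6. 6. 6. 6.]
```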
  34. Jul 03, 2018
• Refine loader (#982) · 9674be86
  Jinhui Yuan authored
      * add parallel record decoder
      
      * null data loader
      
      * use real loader
      
      * use libjpeg-turbo
      
* add streams support, TODO: 1) disable internal buffer_; 2) use template
      
      * make persistent_in_stream support multiple files
      
      * make compiler support new loader
      
      * minor refine
      
      * make new loader work on mnist
      
      * update proto in benchmark and example
      
      * refactor stream buffer filler
      
      * refine persistent_in_stream
      
      * workable
      
      * add record_loader_op
      
      * finish record loader op
      
      * AddRecordLoaderOps
      
      * make compiler work
      
      * infer shape works
      
      * add record_load_kernel
      
      * let decode actor pass in_regst in normal way
      
      * add kOFRecordPtr type
      
      * remove record regst type
      
      * change ALL_DATA_TYPE_SEQ to ALL_POD_DATA_TYPE_SEQ
      
      * support OFRecordPtr blob
      
      * complete decode_ofrecord_kernel
      
      * allocate OFRecord in Blob<OFRecordPtr>
      
      * fix of record ptr blob
      
      * let actor manage the OFRecord blob
      
      * let regst mgr own ofrecord memory
      
      * workable
      
      * remove useless code
      
      * refine
      
      * NormalRegst -> DataRegst
      
      * OFRecord data type (#984)
      
      * OFRecord data type
      
      * placement new (#987)
      
      * placement new
      
      * fix
      
      * remove useless code
      
      * placement new OFRecord
      
      * remove useless code
      
      * Refactor stream (#985)
      
      * refactor stream scanner
      
      * let persistence_in_stream create the binary stream
      
      * refine persistence_in_stream
      
      * refine
      
      * POD_DATA_TYPE_SEQ
      
      * update placement proto in benchmark and example
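
A sketch of the "make persistent_in_stream support multiple files" idea: a reader that presents a list of files as one continuous byte stream, advancing to the next file when the current one is exhausted. The buffering details and class name are assumptions; the real stream is part of OneFlow's C++ persistence layer.

```python
class MultiFilePersistentInStream:
    """Read a list of files as one continuous stream of bytes."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.idx = 0
        self.f = open(self.paths[0], "rb") if self.paths else None

    def read(self, n):
        out = b""
        while self.f is not None and len(out) < n:
            chunk = self.f.read(n - len(out))
            if chunk:
                out += chunk
            else:                      # current file exhausted: advance
                self.f.close()
                self.idx += 1
                self.f = (open(self.paths[self.idx], "rb")
                          if self.idx < len(self.paths) else None)
        return out  # shorter than n only at the end of the last file
```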