diff --git a/research/cv/PyramidBox/README_CN.md b/research/cv/PyramidBox/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..5f71783086affcddd8cba78742efac1486b93bfb
--- /dev/null
+++ b/research/cv/PyramidBox/README_CN.md
@@ -0,0 +1,434 @@

# Contents

<!-- TOC -->

- [Contents](#contents)
- [PyramidBox Description](#pyramidbox-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
    - [WIDER Face](#wider-face)
    - [FDDB](#fddb)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Scripts and Sample Code](#scripts-and-sample-code)
    - [Script Parameters](#script-parameters)
        - [Training](#training)
        - [Evaluation](#evaluation)
        - [Configuration Parameters](#configuration-parameters)
    - [Training Process](#training-process)
        - [Standalone Training](#standalone-training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
    - [Performance](#performance)

<!-- /TOC -->

# PyramidBox Description

[PyramidBox](https://arxiv.org/pdf/1803.07737.pdf) is an SSD-based single-shot face detector that exploits contextual information to detect hard faces. PyramidBox makes predictions at different levels on feature maps of six scales. The work mainly consists of the following modules: LFPN, Pyramid Anchors, CPM, and Data-anchor-sampling.

[Paper](https://arxiv.org/pdf/1803.07737.pdf): Tang, Xu, et al. "PyramidBox: A context-assisted single shot face detector." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

# Model Architecture

**LFPN**: LFPN stands for Low-level Feature Pyramid Networks. For detection, LFPN combines high-level features, which carry more context, with low-level features, which carry more texture. High-level features are used to detect large faces, while low-level features are used to detect small faces. To merge high-level semantics into the high-resolution low-level features, the top-down fusion starts from a middle layer of the backbone, building the low-level FPN.

**Pyramid Anchors**: A semi-supervised scheme generates approximate, semantically meaningful labels related to face detection, and an anchor-based context-assisted method introduces supervised signals for learning the contextual features of small, blurred, and partially occluded faces. Starting from an annotated face box, the box is expanded by fixed ratios to obtain a head label (expanded by 1/2 of the face size on each side) and a body label (with a customizable expansion ratio).

**CPM**: CPM stands for Context-sensitive Predict Module, a structure designed to improve the representational power of the prediction network.

**Data-anchor-sampling**: A new sampling method, called Data-anchor-sampling, increases the diversity of training samples across scales. It reshapes the distribution of training samples, putting more emphasis on small faces.
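The head/body expansion used by Pyramid Anchors can be pictured with a minimal sketch (ours, not code from this repository; the function name and the `body_scale` ratio are hypothetical):

```python
import numpy as np

def expand_face_box(face_box, body_scale=2.0):
    # face_box: [xmin, ymin, xmax, ymax] of an annotated face.
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    # Head label: grow the face box by 1/2 of its size on each side.
    head = np.array([x1 - w / 2, y1 - h / 2, x2 + w / 2, y2 + h / 2])
    # Body label: the expansion ratio is user-defined.
    body = np.array([x1 - w * body_scale, y1 - h * body_scale,
                     x2 + w * body_scale, y2 + h * body_scale])
    return head, body
```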
# Dataset

Two datasets are used:

1. [WIDER Face](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)
2. [FDDB](http://vis-www.cs.umass.edu/fddb/index.html)

In detail:

## WIDER Face

- The WIDER Face dataset is used for training and validating the model. Download WIDER Face Training Images and extract the WIDER_train set; download WIDER Face Validation Images and extract the WIDER_val set.
- Download the WIDER Face [annotation files](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) and extract them into the folder wider_face_split.
- Create a directory WIDERFACE under the dataset folder and place the WIDER_train, WIDER_val, and wider_face_split folders inside it.
- Dataset size: 32,203 images with 393,703 annotated faces.
    - WIDER_train: 1.4G
    - WIDER_val: 355M
- Check that WIDER_train, WIDER_val, and wider_face_split are under the WIDERFACE directory.

## FDDB

- The FDDB dataset is used to evaluate the model. Download the archives [originalPics.tar.gz](http://vis-www.cs.umass.edu/fddb/originalPics.tar.gz) and [FDDB-folds.tgz](http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz); originalPics.tar.gz contains the unannotated images, and FDDB-folds.tgz contains the annotations.

- Dataset size: 2,845 images with 5,171 face annotations.

    - originalPics.tar.gz: 553M
    - FDDB-folds.tgz: 1M

- Create a folder FDDB under the dataset folder.

- Extract originalPics.tar.gz into FDDB; it contains the two folders 2002 and 2003:

    ```bash
    ├── 2002
    │   ├── 07
    │   ├── 08
    │   ├── 09
    │   ├── 10
    │   ├── 11
    │   └── 12
    ├── 2003
    │   ├── 01
    │   ├── 02
    │   ├── 03
    │   ├── 04
    │   ├── 05
    │   ├── 06
    │   ├── 07
    │   ├── 08
    │   └── 09
    ```

- Extract FDDB-folds.tgz into FDDB; it contains 20 txt files:

    ```bash
    FDDB-folds
    ├── FDDB-fold-01-ellipseList.txt
    ├── FDDB-fold-01.txt
    ├── FDDB-fold-02-ellipseList.txt
    ├── FDDB-fold-02.txt
    ├── FDDB-fold-03-ellipseList.txt
    ├── FDDB-fold-03.txt
    ├── FDDB-fold-04-ellipseList.txt
    ├── FDDB-fold-04.txt
    ├── FDDB-fold-05-ellipseList.txt
    ├── FDDB-fold-05.txt
    ├── FDDB-fold-06-ellipseList.txt
    ├── FDDB-fold-06.txt
    ├── FDDB-fold-07-ellipseList.txt
    ├── FDDB-fold-07.txt
    ├── FDDB-fold-08-ellipseList.txt
    ├── FDDB-fold-08.txt
    ├── FDDB-fold-09-ellipseList.txt
    ├── FDDB-fold-09.txt
    ├── FDDB-fold-10-ellipseList.txt
    └── FDDB-fold-10.txt
    ```

- Check that the three folders 2002, 2003, and FDDB-folds are under the FDDB folder, and that FDDB is under the dataset folder.

---------
In summary, there is one training set, WIDER_train, and two validation sets, WIDER_val and FDDB.

Edit `src/config.py` and set the `_C.HOME` field to the path of the dataset directory.
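For example, if the dataset root is `/home/user/dataset` (a hypothetical path), the field would read:

```python
# src/config.py (excerpt): point _C.HOME at the dataset root.
_C.HOME = '/home/user/dataset'
```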
The complete dataset directory structure is as follows:

```bash
dataset
├── FDDB
│   ├── 2002
│   ├── 2003
│   └── FDDB-folds
└── WIDERFACE
    ├── wider_face_split
    │   ├── readme.txt
    │   ├── wider_face_test_filelist.txt
    │   ├── wider_face_test.mat
    │   ├── wider_face_train_bbx_gt.txt
    │   ├── wider_face_train.mat
    │   ├── wider_face_val_bbx_gt.txt
    │   └── wider_face_val.mat
    ├── WIDER_train
    │   └── images
    └── WIDER_val
        └── images
```

# Environment Requirements

- Hardware (Ascend/GPU/CPU)
    - Prepare a hardware environment with GPU.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)

# Quick Start

After installing MindSpore via the official website, you can follow the steps below for training and evaluation.

Before training, complete the following preparation:

1. Check that the `_C.HOME` field of `src/config.py` is the path of the dataset folder.
2. Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt to generate face_train.txt and face_val.txt. Each line of the generated files stores an image path, the number of faces, and one `x y w h 1` group per face.

```bash
# Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt
# Run from the project root directory
python preprocess.py
# On success, face_train.txt and face_val.txt appear under the data folder
```

3. Generate the mindrecord file for face_val.txt; it is used during training to validate the model accuracy after each epoch and to find the best checkpoint.

```bash
bash scripts/generate_mindrecord.sh
# On success, val.mindrecord and val.mindrecord.db appear under the data folder
```

4. Download the pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt) file; this pretrained model was converted from PyTorch.

After finishing the steps above, start training the model.

1. Standalone training

```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```

2. Distributed training

```bash
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```

After training, evaluate the PyramidBox model.

3. Evaluate the model

```bash
# Evaluate on the FDDB dataset
bash scripts/run_eval_gpu.sh PYRAMIDBOX_CKPT
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```

# Script Description

## Scripts and Sample Code

```bash
PyramidBox
├── data                            // preprocessed dataset files and mindrecord files
├── eval.py                         // model evaluation script
├── preprocess.py                   // dataset annotation preprocessing script
├── generate_mindrecord.py          // mindrecord creation script
├── README_CN.md                    // PyramidBox description (Chinese)
├── scripts
│   ├── generate_mindrecord.sh      // shell script generating the mindrecord file for validation
│   ├── run_distribute_train_gpu.sh // GPU distributed training shell script
│   ├── run_eval_gpu.sh             // GPU evaluation shell script
│   └── run_standalone_train_gpu.sh // GPU standalone training shell script
├── src
│   ├── augmentations.py            // data augmentation script
│   ├── dataset.py                  // dataset script
│   ├── evaluate.py                 // model evaluation script
│   ├── loss.py                     // loss function
│   ├── config.py                   // configuration file
│   ├── bbox_utils.py               // box utility functions
│   ├── detection.py                // decoding of model predictions and confidences
│   ├── prior_box.py                // default-anchor generation script
│   └── pyramidbox.py               // PyramidBox model
└── train.py                        // model training script

```

## Script Parameters

### Training

```bash
usage: train.py [-h] [--basenet BASENET] [--batch_size BATCH_SIZE]
                [--num_workers NUM_WORKERS] [--device_target {GPU,Ascend}]
                [--lr LR] [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY]
                [--gamma GAMMA] [--distribute DISTRIBUTE]
                [--save_folder SAVE_FOLDER] [--epoches EPOCHES]
                [--val_mindrecord VAL_MINDRECORD]

Pyramidbox face Detector Training With MindSpore

optional arguments:
  -h, --help            show this help message and exit
  --basenet BASENET     Pretrained base model
  --batch_size BATCH_SIZE
                        Batch size for training
  --num_workers NUM_WORKERS
                        Number of workers used in dataloading
  --device_target {GPU,Ascend}
                        device for training
  --lr LR, --learning-rate LR
                        initial learning rate
  --momentum MOMENTUM   Momentum value for optim
  --weight_decay WEIGHT_DECAY
                        Weight decay for SGD
  --gamma GAMMA         Gamma update for SGD
  --distribute DISTRIBUTE
                        Use multi-GPU training
  --save_folder SAVE_FOLDER
                        Directory for saving checkpoint models
  --epoches EPOCHES     Epochs to train model
  --val_mindrecord VAL_MINDRECORD
                        Path of val mindrecord file
```

### Evaluation

```bash
usage: eval.py [-h] [--model MODEL] [--thresh THRESH]

PyramidBox Evaluation on FDDB

optional arguments:
  -h, --help       show this help message and exit
  --model MODEL    trained model
  --thresh THRESH  Final confidence threshold
```

### Configuration Parameters

```bash
config.py:
    LR_STEPS: learning-rate decay steps for standalone training
    DIS_LR_STEPS: learning-rate decay steps for distributed training
    FEATURE_MAPS: list of feature-map shapes used in training
    INPUT_SIZE: input size
    STEPS: strides used to generate the default anchors
    ANCHOR_SIZES: default anchor sizes
    NUM_CLASSES: number of classes
    OVERLAP_THRESH: overlap (IoU) threshold
    NEG_POS_RATIOS: ratio of negative to positive samples
    NMS_THRESH: NMS threshold
    TOP_K: top-k count
    KEEP_TOP_K: number of top-k detections kept
    CONF_THRESH: confidence threshold
    HOME: dataset root directory
    FACE.FILE_DIR: path of the data folder
    FACE.TRAIN_FILE: face_train.txt file
    FACE.VAL_FILE: face_val.txt file
    FACE.FDDB_DIR: FDDB folder
    FACE.WIDER_DIR: WIDER Face folder
```
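The thresholds above drive the detector's post-processing. The following is a generic sketch of how parameters in the style of CONF_THRESH and KEEP_TOP_K are typically applied in SSD-style post-processing (ours, not the code in `src/detection.py`; the default values are placeholders, and NMS with NMS_THRESH would run between the two steps):

```python
import numpy as np

def filter_detections(scores, boxes, conf_thresh=0.05, keep_top_k=750):
    # Drop candidates whose confidence is below CONF_THRESH.
    keep = scores > conf_thresh
    scores, boxes = scores[keep], boxes[keep]
    # (NMS, controlled by NMS_THRESH, would prune duplicates here.)
    # Keep at most KEEP_TOP_K highest-scoring detections.
    order = np.argsort(scores)[::-1][:keep_top_k]
    return scores[order], boxes[order]
```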
## Training Process

Before training, make sure the preparation has been completed, i.e.:

1. The `_C.HOME` field of `src/config.py` points to the dataset folder.
2. wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt have been preprocessed into face_train.txt and face_val.txt.
3. The mindrecord file for face_val.txt has been generated.
4. The pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt) file has been downloaded.

Training can start only after the preparation is done.

### Standalone Training

```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```

Training runs in the background. Checkpoints are saved in the `checkpoints` folder, and the training output can be inspected in `logs/training_gpu.log`, for example:

```bash
epoch: 2 step: 456, loss is 0.3661264
epoch: 2 step: 457, loss is 0.32284224
epoch: 2 step: 458, loss is 0.29254544
epoch: 2 step: 459, loss is 0.32631972
epoch: 2 step: 460, loss is 0.3065704
epoch: 2 step: 461, loss is 0.3995605
epoch: 2 step: 462, loss is 0.2614449
epoch: 2 step: 463, loss is 0.50305885
epoch: 2 step: 464, loss is 0.30908597
···
```

### Distributed Training

```bash
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [VGG16_CKPT] [VAL_MINDRECORD_FILE]
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```

Training runs in the background. Only the checkpoints of the first device are saved, in the `checkpoints/distribute_0/` folder. The training output can be inspected in `logs/distribute_training_gpu.log`, for example:

```bash
epoch: 1 total step: 2, step: 2, loss is 25.479286
epoch: 1 total step: 2, step: 2, loss is 30.297405
epoch: 1 total step: 2, step: 2, loss is 28.816475
epoch: 1 total step: 2, step: 2, loss is 25.439453
epoch: 1 total step: 2, step: 2, loss is 28.585438
epoch: 1 total step: 2, step: 2, loss is 31.117134
epoch: 1 total step: 2, step: 2, loss is 25.770748
epoch: 1 total step: 2, step: 2, loss is 27.557945
epoch: 1 total step: 3, step: 3, loss is 28.352016
epoch: 1 total step: 3, step: 3, loss is 31.99873
epoch: 1 total step: 3, step: 3, loss is 31.426039
epoch: 1 total step: 3, step: 3, loss is 24.02226
epoch: 1 total step: 3, step: 3, loss is 30.12824
epoch: 1 total step: 3, step: 3, loss is 29.977898
epoch: 1 total step: 3, step: 3, loss is 24.06476
epoch: 1 total step: 3, step: 3, loss is 28.573633
epoch: 1 total step: 4, step: 4, loss is 28.599226
epoch: 1 total step: 4, step: 4, loss is 34.262005
epoch: 1 total step: 4, step: 4, loss is 30.732353
epoch: 1 total step: 4, step: 4, loss is 28.62697
epoch: 1 total step: 4, step: 4, loss is 39.44549
epoch: 1 total step: 4, step: 4, loss is 27.754185
epoch: 1 total step: 4, step: 4, loss is 26.15754
...
```
## Evaluation Process

```bash
bash scripts/run_eval_gpu.sh [PYRAMIDBOX_CKPT]
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```

Note: checkpoints are named `pyramidbox_best_{epoch}.ckpt`, where epoch is the number of epochs trained when the checkpoint was saved. The larger the epoch, the smaller the loss on WIDER val and, in general, the higher the model accuracy. When searching for the best model, evaluate the checkpoint with the largest epoch first and proceed in decreasing order of epoch.

Evaluation runs in the background. The results can be inspected in `logs/eval_gpu.log`, for example:

```bash
==================== Results ====================
FDDB-fold-1 Val AP: 0.9614604685893
FDDB-fold-2 Val AP: 0.9615593696135745
FDDB-fold-3 Val AP: 0.9607889632039851
FDDB-fold-4 Val AP: 0.972454404596466
FDDB-fold-5 Val AP: 0.9734522365236052
FDDB-fold-6 Val AP: 0.952158002966933
FDDB-fold-7 Val AP: 0.9618735923917133
FDDB-fold-8 Val AP: 0.9501671313630741
FDDB-fold-9 Val AP: 0.9539008001056393
FDDB-fold-10 Val AP: 0.9664355605240443
FDDB Dataset Average AP: 0.9614250529878333
=================================================
```

# Model Description

## Performance

| Parameter            | PyramidBox                                               |
| -------------------- | -------------------------------------------------------- |
| Resource             | GPU (Tesla V100 SXM2), CPU 2.1GHz 24 cores, Memory 128G   |
| Uploaded             | 2022-09-17                                                |
| MindSpore version    | 1.8.1                                                     |
| Dataset              | WIDER Face, FDDB                                          |
| Training parameters  | epoch=100, batch_size=4, lr=5e-4                          |
| Optimizer            | SGD                                                       |
| Loss function        | SoftmaxCrossEntropyWithLogits, SmoothL1Loss               |
| Output               | coordinates, confidence                                   |
| Loss                 | 2-6                                                       |
| Speed                | 570 ms/step (1 GPU); 650 ms/step (8 GPUs)                 |
| Total time           | 50h58m (1 GPU); 7h12m (8 GPUs)                            |
| Finetuned checkpoint | 655M (.ckpt file)                                         |
diff --git a/research/cv/PyramidBox/eval.py b/research/cv/PyramidBox/eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..9d704eeb2188473ca8bd0b6853229eff34e1bcc2
--- /dev/null
+++ b/research/cv/PyramidBox/eval.py
@@ -0,0 +1,109 @@
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]

import os
import argparse
from PIL import Image
from mindspore import Tensor, context
from mindspore import load_checkpoint, load_param_into_net
import numpy as np

from src.config import cfg
from src.pyramidbox import build_net
from src.augmentations import to_chw_bgr
from src.prior_box import PriorBox
from src.detection import Detect
from src.evaluate import evaluation

parser = argparse.ArgumentParser(description='PyramidBox Evaluation on FDDB')
parser.add_argument('--model', type=str, default='checkpoints/pyramidbox.ckpt', help='trained model')
parser.add_argument('--thresh', default=0.1, type=float, help='Final confidence threshold')
args = parser.parse_args()

FDDB_IMG_DIR = cfg.FACE.FDDB_DIR
FDDB_FOLD_DIR = os.path.join(FDDB_IMG_DIR, 'FDDB-folds')
FDDB_OUT_DIR = 'FDDB-out'

if not os.path.exists(FDDB_OUT_DIR):
    os.mkdir(FDDB_OUT_DIR)

def detect_face(net_, img_, thresh):
    x = to_chw_bgr(img_).astype(np.float32)
    x -= cfg.img_mean
    x = x[[2, 1, 0], :, :]

    x = Tensor(x)[None, :, :, :]
    size = x.shape[2:]

    loc, conf, feature_maps = net_(x)

    prior_box = PriorBox(cfg, feature_maps, size, 'test')
    default_priors = prior_box.forward()

    detections = Detect(cfg).detect(loc, conf, default_priors)

    scale = np.array([img_.shape[1], img_.shape[0], img_.shape[1], img_.shape[0]])
    bboxes = []
    for i in range(detections.shape[1]):
        j = 0
        while detections[0, i, j, 0] >= thresh:
            box = []
            score = detections[0, i, j, 0]
            pt = (detections[0, i, j, 1:] * scale).astype(np.int32)

            j += 1
            box += [pt[0], pt[1], pt[2] - pt[0], pt[3] - pt[1], score]
            bboxes += [box]

    return bboxes

if __name__ == '__main__':
    context.set_context(mode=context.PYNATIVE_MODE)
    net = build_net('test', cfg.NUM_CLASSES)
    params = load_checkpoint(args.model)
    load_param_into_net(net, params)
    net.set_train(False)

    print("Start detecting FDDB images")
    for index in range(1, 11):
        if not os.path.exists(os.path.join(FDDB_OUT_DIR, str(index))):
            os.mkdir(os.path.join(FDDB_OUT_DIR, str(index)))
        print(f"Detecting folder {index}")
        file_path = os.path.join(cfg.FACE.FDDB_DIR, 'FDDB-folds', 'FDDB-fold-%02d.txt' % index)
        with open(file_path, 'r') as f:
            lines = f.readlines()
            for line in lines:
                line = line.strip('\n')
                image_path = os.path.join(cfg.FACE.FDDB_DIR, line) + '.jpg'
                img = Image.open(image_path)
                if img.mode == 'L':
                    img = img.convert('RGB')
                img = np.array(img)
                line = line.replace('/', '_')
                with open(os.path.join(FDDB_OUT_DIR, str(index), line + '.txt'), 'w') as w:
                    w.write(line)
                    w.write('\n')
                    boxes = detect_face(net, img, args.thresh)
                    if boxes is not None:
                        w.write(str(len(boxes)))
                        w.write('\n')
                        for box_ in boxes:
                            w.write(f'{int(box_[0])} {int(box_[1])} {int(box_[2])} {int(box_[3])} {box_[4]}\n')
    print("Detection Done!")
    print("Start evaluation!")

    evaluation(FDDB_OUT_DIR, FDDB_FOLD_DIR)
diff --git a/research/cv/PyramidBox/generate_mindrecord.py b/research/cv/PyramidBox/generate_mindrecord.py
new file mode 100644
index 0000000000000000000000000000000000000000..aa4097756292b4cefdd459e62ed146805b97e886
--- /dev/null
+++ b/research/cv/PyramidBox/generate_mindrecord.py
@@ -0,0 +1,66 @@
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import argparse +import os +from mindspore.mindrecord import FileWriter + +from src.dataset import WIDERDataset +from src.config import cfg + +parser = argparse.ArgumentParser(description='Generate Mindrecord File for training') +parser.add_argument('--prefix', type=str, default='./data', help="Directory to store mindrecord file") +parser.add_argument('--val_name', type=str, default='val.mindrecord', help='Name of val mindrecord file') + +args = parser.parse_args() + +def data_to_mindrecord(mindrecord_prefix, mindrecord_name, dataset): + if not os.path.exists(mindrecord_prefix): + os.mkdir(mindrecord_prefix) + mindrecord_path = os.path.join(mindrecord_prefix, mindrecord_name) + writer = FileWriter(mindrecord_path, 1, overwrite=True) + + data_json = { + 'img': {"type": "float32", "shape": [3, 640, 640]}, + 'face_loc': {"type": "float32", "shape": [34125, 4]}, + 'face_conf': {"type": "float32", "shape": [34125]}, + 'head_loc': {"type": "float32", "shape": [34125, 4]}, + 'head_conf': {"type": "float32", "shape": [34125]} + } + + writer.add_schema(data_json, 'data_json') + count = 0 + for d in dataset: + img, face_loc, face_conf, head_loc, head_conf = d + + row = { + "img": img, + "face_loc": face_loc, + "face_conf": face_conf, + "head_loc": head_loc, + "head_conf": head_conf + } + + writer.write_raw_data([row]) + count += 1 + writer.commit() + print("Total train data: ", count) + print("Create mindrecord done!") + + +if __name__ == '__main__': + print("Start generating val mindrecord file") + ds_val = WIDERDataset(cfg.FACE.VAL_FILE, mode='val') + data_to_mindrecord(args.prefix, args.val_name, ds_val) diff --git a/research/cv/PyramidBox/preprocess.py b/research/cv/PyramidBox/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..f3bdf8144653c872b0e1b2049e6aa0cd7bba2a8e --- /dev/null +++ b/research/cv/PyramidBox/preprocess.py @@ -0,0 +1,96 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch] + +import os +from src.config import cfg + +WIDER_ROOT = os.path.join(cfg.HOME, 'WIDERFACE') +train_list_file = os.path.join(WIDER_ROOT, 'wider_face_split', + 'wider_face_train_bbx_gt.txt') +val_list_file = os.path.join(WIDER_ROOT, 'wider_face_split', + 'wider_face_val_bbx_gt.txt') + +WIDER_TRAIN = os.path.join(WIDER_ROOT, 'WIDER_train', 'images') +WIDER_VAL = os.path.join(WIDER_ROOT, 'WIDER_val', 'images') + + +def parse_wider_file(root, file): + with open(file, 'r') as fr: + lines = fr.readlines() + face_count = [] + img_paths = [] + face_loc = [] + img_faces = [] + count = 0 + flag = False + for k, line in enumerate(lines): + line = line.strip().strip('\n') + if count > 0: + line = line.split(' ') + count -= 1 + loc = [int(line[0]), int(line[1]), int(line[2]), int(line[3])] + face_loc += [loc] + if flag: + face_count += [int(line)] + flag = False + count = int(line) + if 'jpg' in line: + img_paths += [os.path.join(root, line)] + flag = True + + total_face = 0 + for k in face_count: + face_ = [] + for x in range(total_face, total_face + k): + face_.append(face_loc[x]) + img_faces += [face_] + total_face += k + return img_paths, img_faces + + +def wider_data_file(): + if not os.path.exists(cfg.FACE.FILE_DIR): + os.mkdir(cfg.FACE.FILE_DIR) + img_paths, bbox = parse_wider_file(WIDER_TRAIN, train_list_file) + fw = open(cfg.FACE.TRAIN_FILE, 'w') + for index in range(len(img_paths)): + path = img_paths[index] + boxes = bbox[index] + fw.write(path) + fw.write(' {}'.format(len(boxes))) + for box in boxes: + data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1) + fw.write(data) + fw.write('\n') + fw.close() + + img_paths, bbox = parse_wider_file(WIDER_VAL, val_list_file) + fw = open(cfg.FACE.VAL_FILE, 'w') + for index in range(len(img_paths)): + path = img_paths[index] + boxes = bbox[index] + fw.write(path) + fw.write(' {}'.format(len(boxes))) + for box in boxes: + data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1) + fw.write(data) + fw.write('\n') + fw.close() + + +if __name__ == '__main__': + wider_data_file() diff --git a/research/cv/PyramidBox/requirements.txt b/research/cv/PyramidBox/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..a4436322b8ee0c5fabe557052d2d167d47c158ae --- /dev/null +++ b/research/cv/PyramidBox/requirements.txt @@ -0,0 +1,7 @@ +easydict==1.9 +mindspore-gpu==1.8.1 +numpy==1.21.5 +opencv-python==4.5.5.62 +Pillow==9.0.0 +scikit-image==0.18.3 +tqdm==4.64.1 \ No newline at end of file diff --git a/research/cv/PyramidBox/scripts/generate_mindrecord.sh b/research/cv/PyramidBox/scripts/generate_mindrecord.sh new file mode 100644 index 0000000000000000000000000000000000000000..988a264df4966f16cf320435cf67c25d4a951432 --- /dev/null +++ b/research/cv/PyramidBox/scripts/generate_mindrecord.sh @@ -0,0 +1,33 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash generate_mindrecord.sh"
echo "for example: bash generate_mindrecord.sh"
echo "=============================================================================================================="


PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs

if [ ! -d $LOG_DIR ]
then
    mkdir $LOG_DIR
fi

python $PROJECT_DIR/../generate_mindrecord.py > $LOG_DIR/generate_mindrecord.log 2>&1 &
echo "The data log is at /logs/generate_mindrecord.log"
diff --git a/research/cv/PyramidBox/scripts/run_distribute_train_gpu.sh b/research/cv/PyramidBox/scripts/run_distribute_train_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..5c42474ca1c60fdbd665b2d9504ad447e419ceaf
--- /dev/null
+++ b/research/cv/PyramidBox/scripts/run_distribute_train_gpu.sh
@@ -0,0 +1,53 @@
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="

DEVICE_NUM=$1
VGG16=$2
VAL_MINDRECORD=$3

if [ $# -lt 3 ];
then
    echo "---------------------ERROR----------------------"
    echo "You must specify the number of GPU devices, the vgg16 checkpoint, and the mindrecord file for validation"
    exit
fi

PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ !
-d $LOG_DIR ] +then + mkdir $LOG_DIR +fi + +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +mpirun -n $DEVICE_NUM --allow-run-as-root python $PROJECT_DIR/../train.py \ + --distribute True \ + --lr 5e-4 \ + --device_target GPU \ + --val_mindrecord $VAL_MINDRECORD \ + --epoches 100 \ + --basenet $VGG16 \ + --num_workers 1 \ + --batch_size 4 > $LOG_DIR/distribute_training_gpu.log 2>&1 & + +echo "The distributed train log is at /logs/distribute_training_gpu.log" diff --git a/research/cv/PyramidBox/scripts/run_eval_gpu.sh b/research/cv/PyramidBox/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..33a31bb9807d91028697c871cfc85091834d01e8 --- /dev/null +++ b/research/cv/PyramidBox/scripts/run_eval_gpu.sh @@ -0,0 +1,41 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash run_eval_gpu.sh PYRAMIDBOX_CKPT" +echo "for example: bash run_eval_gpu.sh pyramidbox.ckpt" +echo "==============================================================================================================" + +if [ $# -lt 1 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify pyramidbox checkpoint" + exit +fi + +CKPT=$1 + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs +if [ ! -d $LOG_DIR ] +then + mkdir $LOG_DIR +fi + +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +python $PROJECT_DIR/../eval.py --model $CKPT > $LOG_DIR/eval_gpu.log 2>&1 & diff --git a/research/cv/PyramidBox/scripts/run_standalone_train_gpu.sh b/research/cv/PyramidBox/scripts/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..1b1135a6f657792610238869034e68c59e4fa83c --- /dev/null +++ b/research/cv/PyramidBox/scripts/run_standalone_train_gpu.sh @@ -0,0 +1,53 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
# ============================================================================

echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="

DEVICE_ID=$1
VGG16=$2
VAL_MINDRECORD=$3

if [ $# -lt 3 ];
then
    echo "---------------------ERROR----------------------"
    echo "You must specify the GPU device, the vgg16 checkpoint, and the mindrecord file for validation"
    exit
fi

PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
    mkdir $LOG_DIR
fi

export DEVICE_ID=$DEVICE_ID
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

python $PROJECT_DIR/../train.py \
    --device_target GPU \
    --epoches 100 \
    --lr 5e-4 \
    --basenet $VGG16 \
    --num_workers 2 \
    --val_mindrecord $VAL_MINDRECORD \
    --batch_size 4 > $LOG_DIR/training_gpu.log 2>&1 &

echo "The standalone train log is at /logs/training_gpu.log"
diff --git a/research/cv/PyramidBox/src/augmentations.py b/research/cv/PyramidBox/src/augmentations.py
new file mode 100644
index 0000000000000000000000000000000000000000..327d7dca196e1df5888083967e6daf4d33875d11
--- /dev/null
+++ b/research/cv/PyramidBox/src/augmentations.py
@@ -0,0 +1,844 @@
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]

import math
import random
import six
import cv2
import numpy as np
from PIL import Image, ImageEnhance
from src.config import cfg


class Sampler():

    def __init__(self,
                 max_sample,
                 max_trial,
                 min_scale,
                 max_scale,
                 min_aspect_ratio,
                 max_aspect_ratio,
                 min_jaccard_overlap,
                 max_jaccard_overlap,
                 min_object_coverage,
                 max_object_coverage,
                 use_square=False):
        self.max_sample = max_sample
        self.max_trial = max_trial
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.min_aspect_ratio = min_aspect_ratio
        self.max_aspect_ratio = max_aspect_ratio
        self.min_jaccard_overlap = min_jaccard_overlap
        self.max_jaccard_overlap = max_jaccard_overlap
        self.min_object_coverage = min_object_coverage
        self.max_object_coverage = max_object_coverage
        self.use_square = use_square


def intersect(box_a, box_b):
    max_xy = np.minimum(box_a[:, 2:], box_b[2:])
    min_xy = np.maximum(box_a[:, :2], box_b[:2])
    inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
    return inter[:, 0] * inter[:, 1]


def jaccard_numpy(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes. The jaccard overlap
    is simply the intersection over union of two boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    Args:
        box_a: Multiple bounding boxes, Shape: [num_boxes,4]
        box_b: Single bounding box, Shape: [4]
    Return:
        jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]]
    """
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 2] - box_a[:, 0]) *
              (box_a[:, 3] - box_a[:, 1]))  # [A,B]
    area_b = ((box_b[2] - box_b[0]) *
              (box_b[3] - box_b[1]))  # [A,B]
    union = area_a + area_b - inter
    return inter / union  # [A,B]


class Bbox():

    def __init__(self, xmin, ymin, xmax, ymax):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax


def random_brightness(img):
    prob = np.random.uniform(0, 1)
    if prob < cfg.brightness_prob:
        delta = np.random.uniform(-cfg.brightness_delta,
                                  cfg.brightness_delta) + 1
        img = ImageEnhance.Brightness(img).enhance(delta)
    return img


def random_contrast(img):
    prob = np.random.uniform(0, 1)
    if prob < cfg.contrast_prob:
        delta = np.random.uniform(-cfg.contrast_delta,
                                  cfg.contrast_delta) + 1
        img = ImageEnhance.Contrast(img).enhance(delta)
    return img


def random_saturation(img):
    prob = np.random.uniform(0, 1)
    if prob < cfg.saturation_prob:
        delta = np.random.uniform(-cfg.saturation_delta,
                                  cfg.saturation_delta) + 1
        img = ImageEnhance.Color(img).enhance(delta)
    return img


def random_hue(img):
    prob = np.random.uniform(0, 1)
    if prob < cfg.hue_prob:
        delta = np.random.uniform(-cfg.hue_delta, cfg.hue_delta)
        img_hsv = np.array(img.convert('HSV'))
        img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
        img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
    return img


def distort_image(img):
    prob = np.random.uniform(0, 1)
    # Apply different distort order
    if prob > 0.5:
        img = random_brightness(img)
        img = random_contrast(img)
        img = random_saturation(img)
        img = random_hue(img)
    else:
        img = random_brightness(img)
        img = random_saturation(img)
        img = random_hue(img)
        img = random_contrast(img)
    return img


def meet_emit_constraint(src_bbox, sample_bbox):
    center_x = (src_bbox.xmax + src_bbox.xmin) / 2
    center_y = (src_bbox.ymax + src_bbox.ymin) / 2
    if sample_bbox.xmin <= center_x <= sample_bbox.xmax and \
            sample_bbox.ymin <= center_y <= sample_bbox.ymax:
        return True
    return False


def project_bbox(object_bbox, sample_bbox):
    if object_bbox.xmin >= sample_bbox.xmax or \
            object_bbox.xmax <= sample_bbox.xmin or \
            object_bbox.ymin >= sample_bbox.ymax or \
            object_bbox.ymax <= sample_bbox.ymin:
        return False

    proj_bbox = Bbox(0, 0, 0, 0)
    sample_width = sample_bbox.xmax - sample_bbox.xmin
    sample_height = sample_bbox.ymax - sample_bbox.ymin
    proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
    proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
    proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
    proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
    proj_bbox = clip_bbox(proj_bbox)
    if bbox_area(proj_bbox) > 0:
        return proj_bbox

    return False


def transform_labels(bbox_labels, sample_bbox):
    sample_labels = []
    for i in range(len(bbox_labels)):
        sample_label = []
        object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
                           bbox_labels[i][3], bbox_labels[i][4])
        if not meet_emit_constraint(object_bbox, sample_bbox):
            continue
        proj_bbox = project_bbox(object_bbox, sample_bbox)
        if proj_bbox:
            sample_label.append(bbox_labels[i][0])
            sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin)) + sample_label.append(float(proj_bbox.xmax)) + sample_label.append(float(proj_bbox.ymax)) + sample_label = sample_label + bbox_labels[i][5:] + sample_labels.append(sample_label) + return sample_labels + + +def expand_image(img, bbox_labels, img_width, img_height): + prob = np.random.uniform(0, 1) + if prob < cfg.expand_prob: + if cfg.expand_max_ratio - 1 >= 0.01: + expand_ratio = np.random.uniform(1, cfg.expand_max_ratio) + height = int(img_height * expand_ratio) + width = int(img_width * expand_ratio) + h_off = math.floor(np.random.uniform(0, height - img_height)) + w_off = math.floor(np.random.uniform(0, width - img_width)) + expand_bbox = Bbox(-w_off / img_width, -h_off / img_height, + (width - w_off) / img_width, + (height - h_off) / img_height) + expand_img = np.ones((height, width, 3)) + expand_img = np.uint8(expand_img * np.squeeze(cfg.img_mean)) + expand_img = Image.fromarray(expand_img) + expand_img.paste(img, (int(w_off), int(h_off))) + bbox_labels = transform_labels(bbox_labels, expand_bbox) + return expand_img, bbox_labels, width, height + return img, bbox_labels, img_width, img_height + + +def clip_bbox(src_bbox): + src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0) + src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0) + src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0) + src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0) + return src_bbox + + +def bbox_area(src_bbox): + if src_bbox.xmax < src_bbox.xmin or src_bbox.ymax < src_bbox.ymin: + return 0. + + width = src_bbox.xmax - src_bbox.xmin + height = src_bbox.ymax - src_bbox.ymin + return width * height + + +def intersect_bbox(bbox1, bbox2): + if bbox2.xmin > bbox1.xmax or bbox2.xmax < bbox1.xmin or \ + bbox2.ymin > bbox1.ymax or bbox2.ymax < bbox1.ymin: + intersection_box = Bbox(0.0, 0.0, 0.0, 0.0) + else: + intersection_box = Bbox( + max(bbox1.xmin, bbox2.xmin), + max(bbox1.ymin, bbox2.ymin), + min(bbox1.xmax, bbox2.xmax), min(bbox1.ymax, bbox2.ymax)) + return intersection_box + + +def bbox_coverage(bbox1, bbox2): + inter_box = intersect_bbox(bbox1, bbox2) + intersect_size = bbox_area(inter_box) + + if intersect_size > 0: + bbox1_size = bbox_area(bbox1) + return intersect_size / bbox1_size + + return 0. 
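# Illustrative sanity check for the box helpers above (ours, not part of the
# original file): for two unit squares offset by half a side,
#   a = Bbox(0.0, 0.0, 1.0, 1.0); b = Bbox(0.5, 0.5, 1.5, 1.5)
# intersect_bbox(a, b) is Bbox(0.5, 0.5, 1.0, 1.0) with bbox_area 0.25,
# so bbox_coverage(a, b) == 0.25 / bbox_area(a) == 0.25.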
+ + +def generate_batch_random_samples(batch_sampler, bbox_labels, image_width, + image_height, scale_array, resize_width, + resize_height): + sampled_bbox = [] + for sampler in batch_sampler: + found = 0 + for _ in range(sampler.max_trial): + if found >= sampler.max_sample: + break + sample_bbox = data_anchor_sampling( + sampler, bbox_labels, image_width, image_height, scale_array, + resize_width, resize_height) + if sample_bbox == 0: + break + if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels): + sampled_bbox.append(sample_bbox) + found = found + 1 + return sampled_bbox + + +def data_anchor_sampling(sampler, bbox_labels, image_width, image_height, + scale_array, resize_width, resize_height): + num_gt = len(bbox_labels) + # np.random.randint range: [low, high) + rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0 + + if num_gt != 0: + norm_xmin = bbox_labels[rand_idx][1] + norm_ymin = bbox_labels[rand_idx][2] + norm_xmax = bbox_labels[rand_idx][3] + norm_ymax = bbox_labels[rand_idx][4] + + xmin = norm_xmin * image_width + ymin = norm_ymin * image_height + wid = image_width * (norm_xmax - norm_xmin) + hei = image_height * (norm_ymax - norm_ymin) + range_size = 0 + + area = wid * hei + for scale_ind in range(0, len(scale_array) - 1): + if scale_array[scale_ind] ** 2 < area < scale_array[scale_ind + 1] ** 2: + range_size = scale_ind + 1 + break + + if area > scale_array[len(scale_array) - 2]**2: + range_size = len(scale_array) - 2 + scale_choose = 0.0 + if range_size == 0: + rand_idx_size = 0 + else: + # np.random.randint range: [low, high) + rng_rand_size = np.random.randint(0, range_size + 1) + rand_idx_size = rng_rand_size % (range_size + 1) + + if rand_idx_size == range_size: + min_resize_val = scale_array[rand_idx_size] / 2.0 + max_resize_val = min(2.0 * scale_array[rand_idx_size], + 2 * math.sqrt(wid * hei)) + scale_choose = random.uniform(min_resize_val, max_resize_val) + else: + min_resize_val = scale_array[rand_idx_size] / 2.0 + max_resize_val = 2.0 * scale_array[rand_idx_size] + scale_choose = random.uniform(min_resize_val, max_resize_val) + + sample_bbox_size = wid * resize_width / scale_choose + + w_off_orig = 0.0 + h_off_orig = 0.0 + if sample_bbox_size < max(image_height, image_width): + if wid <= sample_bbox_size: + w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size, + xmin) + else: + w_off_orig = np.random.uniform(xmin, + xmin + wid - sample_bbox_size) + + if hei <= sample_bbox_size: + h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size, + ymin) + else: + h_off_orig = np.random.uniform(ymin, + ymin + hei - sample_bbox_size) + + else: + w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0) + h_off_orig = np.random.uniform( + image_height - sample_bbox_size, 0.0) + + w_off_orig = math.floor(w_off_orig) + h_off_orig = math.floor(h_off_orig) + + # Figure out top left coordinates. 
+ w_off = 0.0 + h_off = 0.0 + w_off = float(w_off_orig / image_width) + h_off = float(h_off_orig / image_height) + + sampled_bbox = Bbox(w_off, h_off, + w_off + float(sample_bbox_size / image_width), + h_off + float(sample_bbox_size / image_height)) + + return sampled_bbox + + return 0 + + +def jaccard_overlap(sample_bbox, object_bbox): + if sample_bbox.xmin >= object_bbox.xmax or \ + sample_bbox.xmax <= object_bbox.xmin or \ + sample_bbox.ymin >= object_bbox.ymax or \ + sample_bbox.ymax <= object_bbox.ymin: + return 0 + intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin) + intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin) + intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax) + intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax) + intersect_size = (intersect_xmax - intersect_xmin) * ( + intersect_ymax - intersect_ymin) + sample_bbox_size = bbox_area(sample_bbox) + object_bbox_size = bbox_area(object_bbox) + overlap = intersect_size / ( + sample_bbox_size + object_bbox_size - intersect_size) + return overlap + + +def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels): + if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0: + has_jaccard_overlap = False + else: + has_jaccard_overlap = True + if sampler.min_object_coverage == 0 and sampler.max_object_coverage == 0: + has_object_coverage = False + else: + has_object_coverage = True + + if not has_jaccard_overlap and not has_object_coverage: + return True + found = False + for i in range(len(bbox_labels)): + object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2], + bbox_labels[i][3], bbox_labels[i][4]) + if has_jaccard_overlap: + overlap = jaccard_overlap(sample_bbox, object_bbox) + if sampler.min_jaccard_overlap != 0 and \ + overlap < sampler.min_jaccard_overlap: + continue + if sampler.max_jaccard_overlap != 0 and \ + overlap > sampler.max_jaccard_overlap: + continue + found = True + if has_object_coverage: + object_coverage = bbox_coverage(object_bbox, sample_bbox) + if sampler.min_object_coverage != 0 and \ + object_coverage < sampler.min_object_coverage: + continue + if sampler.max_object_coverage != 0 and \ + object_coverage > sampler.max_object_coverage: + continue + found = True + if found: + return True + return found + + +def crop_image_sampling(img, bbox_labels, sample_bbox, image_width, + image_height, resize_width, resize_height, + min_face_size): + # no clipping here + xmin = int(sample_bbox.xmin * image_width) + xmax = int(sample_bbox.xmax * image_width) + ymin = int(sample_bbox.ymin * image_height) + ymax = int(sample_bbox.ymax * image_height) + w_off = xmin + h_off = ymin + width = xmax - xmin + height = ymax - ymin + + cross_xmin = max(0.0, float(w_off)) + cross_ymin = max(0.0, float(h_off)) + cross_xmax = min(float(w_off + width - 1.0), float(image_width)) + cross_ymax = min(float(h_off + height - 1.0), float(image_height)) + cross_width = cross_xmax - cross_xmin + cross_height = cross_ymax - cross_ymin + + roi_xmin = 0 if w_off >= 0 else abs(w_off) + roi_ymin = 0 if h_off >= 0 else abs(h_off) + roi_width = cross_width + roi_height = cross_height + + roi_y1 = int(roi_ymin) + roi_y2 = int(roi_ymin + roi_height) + roi_x1 = int(roi_xmin) + roi_x2 = int(roi_xmin + roi_width) + + cross_y1 = int(cross_ymin) + cross_y2 = int(cross_ymin + cross_height) + cross_x1 = int(cross_xmin) + cross_x2 = int(cross_xmin + cross_width) + + sample_img = np.zeros((height, width, 3)) + # print(sample_img.shape) + sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \ + img[cross_y1: cross_y2, cross_x1: 
cross_x2] + sample_img = cv2.resize( + sample_img, (resize_width, resize_height), interpolation=cv2.INTER_AREA) + + resize_val = resize_width + sample_labels = transform_labels_sampling(bbox_labels, sample_bbox, + resize_val, min_face_size) + return sample_img, sample_labels + + +def transform_labels_sampling(bbox_labels, sample_bbox, resize_val, + min_face_size): + sample_labels = [] + for i in range(len(bbox_labels)): + sample_label = [] + object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2], + bbox_labels[i][3], bbox_labels[i][4]) + if not meet_emit_constraint(object_bbox, sample_bbox): + continue + proj_bbox = project_bbox(object_bbox, sample_bbox) + if proj_bbox: + real_width = float((proj_bbox.xmax - proj_bbox.xmin) * resize_val) + real_height = float((proj_bbox.ymax - proj_bbox.ymin) * resize_val) + if real_width * real_height < float(min_face_size * min_face_size): + continue + else: + sample_label.append(bbox_labels[i][0]) + sample_label.append(float(proj_bbox.xmin)) + sample_label.append(float(proj_bbox.ymin)) + sample_label.append(float(proj_bbox.xmax)) + sample_label.append(float(proj_bbox.ymax)) + sample_label = sample_label + bbox_labels[i][5:] + sample_labels.append(sample_label) + + return sample_labels + + +def generate_sample(sampler, image_width, image_height): + scale = np.random.uniform(sampler.min_scale, sampler.max_scale) + aspect_ratio = np.random.uniform(sampler.min_aspect_ratio, + sampler.max_aspect_ratio) + aspect_ratio = max(aspect_ratio, (scale**2.0)) + aspect_ratio = min(aspect_ratio, 1 / (scale**2.0)) + + bbox_width = scale * (aspect_ratio**0.5) + bbox_height = scale / (aspect_ratio**0.5) + + # guarantee a squared image patch after cropping + if sampler.use_square: + if image_height < image_width: + bbox_width = bbox_height * image_height / image_width + else: + bbox_height = bbox_width * image_width / image_height + + xmin_bound = 1 - bbox_width + ymin_bound = 1 - bbox_height + xmin = np.random.uniform(0, xmin_bound) + ymin = np.random.uniform(0, ymin_bound) + xmax = xmin + bbox_width + ymax = ymin + bbox_height + sampled_bbox = Bbox(xmin, ymin, xmax, ymax) + return sampled_bbox + + +def generate_batch_samples(batch_sampler, bbox_labels, image_width, + image_height): + sampled_bbox = [] + for sampler in batch_sampler: + found = 0 + for _ in range(sampler.max_trial): + if found >= sampler.max_sample: + break + sample_bbox = generate_sample(sampler, image_width, image_height) + if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels): + sampled_bbox.append(sample_bbox) + found = found + 1 + return sampled_bbox + + +def crop_image(img, bbox_labels, sample_bbox, image_width, image_height, + resize_width, resize_height, min_face_size): + sample_bbox = clip_bbox(sample_bbox) + xmin = int(sample_bbox.xmin * image_width) + xmax = int(sample_bbox.xmax * image_width) + ymin = int(sample_bbox.ymin * image_height) + ymax = int(sample_bbox.ymax * image_height) + + sample_img = img[ymin:ymax, xmin:xmax] + resize_val = resize_width + sample_labels = transform_labels_sampling(bbox_labels, sample_bbox, + resize_val, min_face_size) + return sample_img, sample_labels + + +def to_chw_bgr(image): + """ + Transpose image from HWC to CHW and from RBG to BGR. + Args: + image (np.array): an image with HWC and RBG layout. 
+ """ + # HWC to CHW + if len(image.shape) == 3: + image = np.swapaxes(image, 1, 2) + image = np.swapaxes(image, 1, 0) + # RBG to BGR + image = image[[2, 1, 0], :, :] + return image + + +def anchor_crop_image_sampling(img, + bbox_labels, + scale_array, + img_width, + img_height): + mean = np.array([104, 117, 123], dtype=np.float32) + maxSize = 12000 # max size + infDistance = 9999999 + bbox_labels = np.array(bbox_labels) + scale = np.array([img_width, img_height, img_width, img_height]) + + boxes = bbox_labels[:, 1:5] * scale + labels = bbox_labels[:, 0] + + boxArea = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1) + + rand_idx = np.random.randint(len(boxArea)) + rand_Side = boxArea[rand_idx] ** 0.5 + + distance = infDistance + anchor_idx = 5 + for i, anchor in enumerate(scale_array): + if abs(anchor - rand_Side) < distance: + distance = abs(anchor - rand_Side) + anchor_idx = i + + target_anchor = random.choice(scale_array[0:min(anchor_idx + 1, 5) + 1]) + ratio = float(target_anchor) / rand_Side + ratio = ratio * (2**random.uniform(-1, 1)) + + if int(img_height * ratio * img_width * ratio) > maxSize * maxSize: + ratio = (maxSize * maxSize / (img_height * img_width))**0.5 + + interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, + cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4] + interp_method = random.choice(interp_methods) + image = cv2.resize(img, None, None, fx=ratio, + fy=ratio, interpolation=interp_method) + + boxes[:, 0] *= ratio + boxes[:, 1] *= ratio + boxes[:, 2] *= ratio + boxes[:, 3] *= ratio + + height, width, _ = image.shape + + sample_boxes = [] + + xmin = boxes[rand_idx, 0] + ymin = boxes[rand_idx, 1] + bw = (boxes[rand_idx, 2] - boxes[rand_idx, 0] + 1) + bh = (boxes[rand_idx, 3] - boxes[rand_idx, 1] + 1) + + w = h = 640 + + for _ in range(50): + if w < max(height, width): + if bw <= w: + w_off = random.uniform(xmin + bw - w, xmin) + else: + w_off = random.uniform(xmin, xmin + bw - w) + + if bh <= h: + h_off = random.uniform(ymin + bh - h, ymin) + else: + h_off = random.uniform(ymin, ymin + bh - h) + else: + w_off = random.uniform(width - w, 0) + h_off = random.uniform(height - h, 0) + + w_off = math.floor(w_off) + h_off = math.floor(h_off) + + # convert to integer rect x1,y1,x2,y2 + rect = np.array( + [int(w_off), int(h_off), int(w_off + w), int(h_off + h)]) + + # keep overlap with gt box IF center in sampled patch + centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0 + # mask in all gt boxes that above and to the left of centers + m1 = (rect[0] <= boxes[:, 0]) * (rect[1] <= boxes[:, 1]) + # mask in all gt boxes that under and to the right of centers + m2 = (rect[2] >= boxes[:, 2]) * (rect[3] >= boxes[:, 3]) + # mask in that both m1 and m2 are true + mask = m1 * m2 + + overlap = jaccard_numpy(boxes, rect) + # have any valid boxes? 
try again if not + if not mask.any() and not overlap.max() > 0.7: + continue + else: + sample_boxes.append(rect) + + sampled_labels = [] + + if sample_boxes: + choice_idx = np.random.randint(len(sample_boxes)) + choice_box = sample_boxes[choice_idx] + + centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0 + m1 = (choice_box[0] < centers[:, 0]) * \ + (choice_box[1] < centers[:, 1]) + m2 = (choice_box[2] > centers[:, 0]) * \ + (choice_box[3] > centers[:, 1]) + mask = m1 * m2 + current_boxes = boxes[mask, :].copy() + current_labels = labels[mask] + current_boxes[:, :2] -= choice_box[:2] + current_boxes[:, 2:] -= choice_box[:2] + + if choice_box[0] < 0 or choice_box[1] < 0: + new_img_width = width if choice_box[ + 0] >= 0 else width - choice_box[0] + new_img_height = height if choice_box[ + 1] >= 0 else height - choice_box[1] + image_pad = np.zeros( + (new_img_height, new_img_width, 3), dtype=float) + image_pad[:, :, :] = mean + start_left = 0 if choice_box[0] >= 0 else -choice_box[0] + start_top = 0 if choice_box[1] >= 0 else -choice_box[1] + image_pad[start_top:, start_left:, :] = image + + choice_box_w = choice_box[2] - choice_box[0] + choice_box_h = choice_box[3] - choice_box[1] + + start_left = choice_box[0] if choice_box[0] >= 0 else 0 + start_top = choice_box[1] if choice_box[1] >= 0 else 0 + end_right = start_left + choice_box_w + end_bottom = start_top + choice_box_h + current_image = image_pad[ + start_top:end_bottom, start_left:end_right, :].copy() + image_height, image_width, _ = current_image.shape + if cfg.filter_min_face: + bbox_w = current_boxes[:, 2] - current_boxes[:, 0] + bbox_h = current_boxes[:, 3] - current_boxes[:, 1] + bbox_area_ = bbox_w * bbox_h + mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size) + current_boxes = current_boxes[mask] + current_labels = current_labels[mask] + for i in range(len(current_boxes)): + sample_label = [] + sample_label.append(current_labels[i]) + sample_label.append(current_boxes[i][0] / image_width) + sample_label.append(current_boxes[i][1] / image_height) + sample_label.append(current_boxes[i][2] / image_width) + sample_label.append(current_boxes[i][3] / image_height) + sampled_labels += [sample_label] + sampled_labels = np.array(sampled_labels) + else: + current_boxes /= np.array([image_width, + image_height, image_width, image_height]) + sampled_labels = np.hstack( + (current_labels[:, np.newaxis], current_boxes)) + + return current_image, sampled_labels + + current_image = image[choice_box[1]:choice_box[ + 3], choice_box[0]:choice_box[2], :].copy() + image_height, image_width, _ = current_image.shape + + if cfg.filter_min_face: + bbox_w = current_boxes[:, 2] - current_boxes[:, 0] + bbox_h = current_boxes[:, 3] - current_boxes[:, 1] + bbox_area_ = bbox_w * bbox_h + mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size) + current_boxes = current_boxes[mask] + current_labels = current_labels[mask] + for i in range(len(current_boxes)): + sample_label = [] + sample_label.append(current_labels[i]) + sample_label.append(current_boxes[i][0] / image_width) + sample_label.append(current_boxes[i][1] / image_height) + sample_label.append(current_boxes[i][2] / image_width) + sample_label.append(current_boxes[i][3] / image_height) + sampled_labels += [sample_label] + sampled_labels = np.array(sampled_labels) + else: + current_boxes /= np.array([image_width, + image_height, image_width, image_height]) + sampled_labels = np.hstack( + (current_labels[:, np.newaxis], current_boxes)) + + return current_image, sampled_labels + + image_height, 
image_width, _ = image.shape + if cfg.filter_min_face: + bbox_w = boxes[:, 2] - boxes[:, 0] + bbox_h = boxes[:, 3] - boxes[:, 1] + bbox_area_ = bbox_w * bbox_h + mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size) + boxes = boxes[mask] + labels = labels[mask] + for i in range(len(boxes)): + sample_label = [] + sample_label.append(labels[i]) + sample_label.append(boxes[i][0] / image_width) + sample_label.append(boxes[i][1] / image_height) + sample_label.append(boxes[i][2] / image_width) + sample_label.append(boxes[i][3] / image_height) + sampled_labels += [sample_label] + sampled_labels = np.array(sampled_labels) + else: + boxes /= np.array([image_width, image_height, + image_width, image_height]) + sampled_labels = np.hstack( + (labels[:, np.newaxis], boxes)) + + return image, sampled_labels + + +def preprocess(img, bbox_labels, mode): + img_width, img_height = img.size + sampled_labels = bbox_labels + if mode == 'train': + if cfg.apply_distort: + img = distort_image(img) + if cfg.apply_expand: + img, bbox_labels, img_width, img_height = expand_image( + img, bbox_labels, img_width, img_height) + + batch_sampler = [] + prob = np.random.uniform(0., 1.) + if prob > cfg.data_anchor_sampling_prob and cfg.anchor_sampling: + scale_array = np.array([16, 32, 64, 128, 256, 512]) + + img = np.array(img) + img, sampled_labels = anchor_crop_image_sampling( + img, bbox_labels, scale_array, img_width, img_height) + + img = img.astype('uint8') + img = Image.fromarray(img) + else: + batch_sampler.append(Sampler(1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, + 0.0, True)) + batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, + 0.0, True)) + batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, + 0.0, True)) + batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, + 0.0, True)) + batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, + 0.0, True)) + sampled_bbox = generate_batch_samples( + batch_sampler, bbox_labels, img_width, img_height) + + img = np.array(img) + if sampled_bbox: + idx = int(np.random.uniform(0, len(sampled_bbox))) + img, sampled_labels = crop_image( + img, bbox_labels, sampled_bbox[idx], img_width, img_height, + cfg.resize_width, cfg.resize_height, cfg.min_face_size) + + img = Image.fromarray(img) + + interp_mode = [ + Image.BILINEAR, Image.HAMMING, Image.NEAREST, Image.BICUBIC, + Image.LANCZOS + ] + interp_indx = np.random.randint(0, 5) + + img = img.resize((cfg.resize_width, cfg.resize_height), + resample=interp_mode[interp_indx]) + + img = np.array(img) + + if mode == 'train': + mirror = int(np.random.uniform(0, 2)) + if mirror == 1: + img = img[:, ::-1, :] + for i in six.moves.xrange(len(sampled_labels)): + tmp = sampled_labels[i][1] + sampled_labels[i][1] = 1 - sampled_labels[i][3] + sampled_labels[i][3] = 1 - tmp + + img = to_chw_bgr(img) + img = img.astype('float32') + img -= cfg.img_mean + img = img[[2, 1, 0], :, :] # to RGB + + return img, sampled_labels diff --git a/research/cv/PyramidBox/src/bbox_utils.py b/research/cv/PyramidBox/src/bbox_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..bb88f5a968c849b636f7aac6ed5b65b8851cb305 --- /dev/null +++ b/research/cv/PyramidBox/src/bbox_utils.py @@ -0,0 +1,309 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
+
+import numpy as np
+
+
+def point_form(boxes):
+    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
+    representation for comparison to point form ground truth data.
+    Args:
+        boxes: center-size default boxes from priorbox layers.
+    Return:
+        boxes: Converted xmin, ymin, xmax, ymax form of boxes.
+    """
+    return np.concatenate((boxes[:, :2] - boxes[:, 2:] / 2,
+                           boxes[:, :2] + boxes[:, 2:] / 2), 1)
+
+def center_size(boxes):
+    """ Convert prior_boxes to (cx, cy, w, h)
+    representation for comparison to center-size form ground truth data.
+    Args:
+        boxes: point_form boxes
+    Return:
+        boxes: Converted cx, cy, w, h form of boxes.
+    """
+    return np.concatenate([(boxes[:, 2:] + boxes[:, :2]) / 2,
+                           boxes[:, 2:] - boxes[:, :2]], 1)
+
+def intersect(box_a, box_b):
+    """ We resize both tensors to [A,B,2] without new malloc:
+    [A,2] -> [A,1,2] -> [A,B,2]
+    [B,2] -> [1,B,2] -> [A,B,2]
+    Then we compute the area of intersect between box_a and box_b.
+    Args:
+      box_a: bounding boxes, Shape: [A,4].
+      box_b: bounding boxes, Shape: [B,4].
+    Return:
+      intersection area, Shape: [A,B].
+    """
+    A = box_a.shape[0]
+    B = box_b.shape[0]
+
+    max_xy = np.minimum(np.broadcast_to(np.expand_dims(box_a[:, 2:], 1), (A, B, 2)),
+                        np.broadcast_to(np.expand_dims(box_b[:, 2:], 0), (A, B, 2)))
+    min_xy = np.maximum(np.broadcast_to(np.expand_dims(box_a[:, :2], 1), (A, B, 2)),
+                        np.broadcast_to(np.expand_dims(box_b[:, :2], 0), (A, B, 2)))
+    inter = np.clip((max_xy - min_xy), 0, np.inf)
+    return inter[:, :, 0] * inter[:, :, 1]
+
+def jaccard(box_a, box_b):
+    """Compute the jaccard overlap of two sets of boxes. The jaccard overlap
+    is simply the intersection over union of two boxes. Here we operate on
+    ground truth boxes and default boxes.
+    E.g.:
+        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
+    Args:
+        box_a: Ground truth bounding boxes, Shape: [num_objects,4]
+        box_b: Prior boxes from priorbox layers, Shape: [num_priors,4]
+    Return:
+        jaccard overlap: Shape: [box_a.shape[0], box_b.shape[0]]
+    """
+    inter = intersect(box_a, box_b)
+    area_a = ((box_a[:, 2] - box_a[:, 0]) *
+              (box_a[:, 3] - box_a[:, 1]))
+    area_a = np.expand_dims(area_a, 1)
+    area_a = np.broadcast_to(area_a, inter.shape)
+
+    area_b = ((box_b[:, 2] - box_b[:, 0]) *
+              (box_b[:, 3] - box_b[:, 1]))
+    area_b = np.expand_dims(area_b, 0)
+    area_b = np.broadcast_to(area_b, inter.shape)
+
+    union = area_a + area_b - inter
+
+    return inter / union
+
+def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
+    """Match each prior box with the ground truth box of the highest jaccard
+    overlap, encode the bounding boxes, then return the matched indices
+    corresponding to both confidence and location preds.
+    Args:
+        threshold: (float) The overlap threshold used when matching boxes.
+        truths: Ground truth boxes, Shape: [num_obj, 4].
+        priors: Prior boxes from priorbox layers, Shape: [n_priors,4].
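A quick sanity check of `jaccard` on two toy boxes, assuming the snippet runs from the project root so that `src.bbox_utils` is importable (as `train.py` does):

```python
import numpy as np
from src.bbox_utils import jaccard

# Two boxes in (x1, y1, x2, y2) form: the overlap is the 1x1 square (1,1)-(2,2).
box_a = np.array([[0.0, 0.0, 2.0, 2.0]])
box_b = np.array([[1.0, 1.0, 3.0, 3.0]])

# IoU = 1 / (4 + 4 - 1)
print(jaccard(box_a, box_b))  # [[0.14285714]]
```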
+        variances: Variances corresponding to each prior coord,
+            Shape: [num_priors, 4].
+        labels: All the class labels for the image, Shape: [num_obj].
+        loc_t: Tensor to be filled w/ encoded location targets.
+        conf_t: Tensor to be filled w/ matched indices for conf preds.
+        idx: (int) current batch index
+    Return:
+        The matched indices corresponding to 1)location and 2)confidence preds.
+    """
+    overlaps = jaccard(truths, point_form(priors))
+
+    # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
+    best_prior_overlap = np.max(overlaps, 1, keepdims=True)
+    best_prior_idx = np.argmax(overlaps, 1)
+
+    best_truth_overlap = np.max(overlaps, 0, keepdims=True)
+    best_truth_idx = np.argmax(overlaps, 0)
+
+    # np.argmax already returns 1-D arrays; only the keepdims max results need squeezing
+    best_truth_overlap = np.squeeze(best_truth_overlap, 0)
+    best_prior_overlap = np.squeeze(best_prior_overlap, 1)
+
+    # ensure every ground truth keeps its best prior
+    for i in best_prior_idx:
+        best_truth_overlap[i] = 2
+
+    for j in range(best_prior_idx.shape[0]):
+        best_truth_idx[best_prior_idx[j]] = j
+
+    _th1, _th2, _th3 = threshold
+
+    N = (np.sum(best_prior_overlap >= _th2) +
+         np.sum(best_prior_overlap >= _th3)) // 2
+
+    matches = truths[best_truth_idx]
+    conf = labels[best_truth_idx]
+    conf[best_truth_overlap < _th2] = 0
+
+    best_truth_overlap_clone = best_truth_overlap.copy()
+    idx_1 = np.greater(best_truth_overlap_clone, _th1)
+    idx_2 = np.less(best_truth_overlap_clone, _th2)
+    add_idx = np.equal(idx_1, idx_2)
+
+    best_truth_overlap_clone[np.logical_not(add_idx)] = 0
+    # best_truth_overlap_clone is 1-D, so a plain descending sort suffices
+    stage2_overlap = np.sort(best_truth_overlap_clone)[::-1]
+    stage2_idx = np.argsort(best_truth_overlap_clone)[::-1]
+
+    stage2_overlap = np.greater(stage2_overlap, _th1)
+
+    if N > 0:
+        N = np.sum(stage2_overlap[:N]) if np.sum(stage2_overlap[:N]) < N else N
+        conf[stage2_idx[:N]] += 1
+
+    loc = encode(matches, priors, variances)
+    loc_t[idx] = loc
+    conf_t[idx] = conf
+
+
+def match_ssd(threshold, truths, priors, variances, labels):
+    """Match each prior box with the ground truth box of the highest jaccard
+    overlap, encode the bounding boxes, then return the encoded location
+    targets and the matched confidence labels.
+    Args:
+        threshold: (float) The overlap threshold used when matching boxes.
+        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
+        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
+        variances: (tensor) Variances corresponding to each prior coord,
+            Shape: [num_priors, 4].
+        labels: (tensor) All the class labels for the image, Shape: [num_obj].
+    Return:
+        The encoded location targets and the matched confidence labels.
+    """
+    overlaps = jaccard(truths, point_form(priors))
+
+    # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
+    best_prior_overlap = np.max(overlaps, 1, keepdims=True)
+    best_prior_idx = np.argmax(overlaps, 1)
+
+    best_truth_overlap = np.max(overlaps, 0, keepdims=True)
+    best_truth_idx = np.argmax(overlaps, 0)
+
+    best_truth_overlap = np.squeeze(best_truth_overlap, 0)
+    best_prior_overlap = np.squeeze(best_prior_overlap, 1)
+
+    for i in best_prior_idx:
+        best_truth_overlap[i] = 2
+
+    for j in range(best_prior_idx.shape[0]):
+        best_truth_idx[best_prior_idx[j]] = j
+
+    matches = truths[best_truth_idx]
+    conf = labels[best_truth_idx]
+    conf[best_truth_overlap < threshold] = 0
+    loc = encode(matches, priors, variances)
+
+    return loc, conf
+
+def encode(matched, priors, variances):
+    """Encode the variances from the priorbox layers into the ground truth boxes
+    we have matched (based on jaccard overlap) with the prior boxes.
+    Args:
+        matched: Coords of ground truth for each prior in point-form
+            Shape: [num_priors, 4].
+        priors: Prior boxes in center-offset form
+            Shape: [num_priors,4].
+        variances: (list[float]) Variances of priorboxes
+    Return:
+        encoded boxes (tensor), Shape: [num_priors, 4]
+    """
+
+    # dist b/t match center and prior's center
+    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
+    # encode variance
+    g_cxcy /= (variances[0] * priors[:, 2:])
+    # match wh / prior wh
+    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
+    g_wh = np.log(g_wh) / variances[1]
+    # return target for smooth_l1_loss
+    return np.concatenate([g_cxcy, g_wh], 1)
+
+
+def decode(loc, priors, variances):
+    """Decode locations from predictions using priors to undo
+    the encoding we did for offset regression at train time.
+    Args:
+        loc: location predictions for loc layers,
+            Shape: [num_priors,4]
+        priors: Prior boxes in center-offset form.
+            Shape: [num_priors,4].
+        variances: (list[float]) Variances of priorboxes
+    Return:
+        decoded bounding box predictions
+    """
+    # batched priors come in as (1, num_priors, 4); drop the batch axis
+    if priors.ndim == 3:
+        priors = priors[0, :, :]
+    boxes = np.concatenate((priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
+                            priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), 1)
+    boxes[:, :2] -= boxes[:, 2:] / 2
+    boxes[:, 2:] += boxes[:, :2]
+    return boxes
+
+def log_sum_exp(x):
+    """Utility function for computing log_sum_exp in a numerically stable way.
+    This will be used to determine unaveraged confidence loss across
+    all examples in a batch.
+    Args:
+        x (Variable(tensor)): conf_preds from conf layers
+    """
+    x_max = x.max()
+    return np.log(np.sum(np.exp(x - x_max), 1, keepdims=True)) + x_max
+
+
+def nms(boxes, scores, overlap=0.5, top_k=200):
+    """Apply non-maximum suppression at test time to avoid detecting too many
+    overlapping bounding boxes for a given object.
+    Args:
+        boxes: The location preds for the img, Shape: [num_priors,4].
+        scores: The class pred scores for the img, Shape: [num_priors].
+        overlap: The overlap thresh for suppressing unnecessary boxes.
+        top_k: The maximum number of box preds to consider.
+    Return:
+        The indices of the kept boxes with respect to num_priors.
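`encode` and `decode` should be exact inverses for any fixed set of priors; a small round-trip check with the VARIANCE values this model uses ([0.1, 0.2]), on hypothetical priors and ground truth:

```python
import numpy as np
from src.bbox_utils import encode, decode

variances = [0.1, 0.2]
# Priors in center-size (cx, cy, w, h) form; ground truth in point form.
priors = np.array([[0.5, 0.5, 0.2, 0.2],
                   [0.3, 0.3, 0.1, 0.1]])
gts = np.array([[0.45, 0.40, 0.65, 0.70],
                [0.28, 0.26, 0.36, 0.34]])

offsets = encode(gts, priors, variances)
print(np.allclose(decode(offsets, priors, variances), gts))  # True
```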
+ """ + keep = np.zeros_like(scores).astype(np.int32) + if boxes.size == 0: + return keep, 0 + x1 = boxes[:, 0] + y1 = boxes[:, 1] + x2 = boxes[:, 2] + y2 = boxes[:, 3] + area = np.multiply(x2 - x1, y2 - y1) + idx = np.argsort(scores, axis=0) + + idx = idx[-top_k:] + + count = 0 + while idx.size > 0: + i = idx[-1] + keep[count] = i + count += 1 + if idx.shape[0] == 1: + break + idx = idx[:-1] + xx1 = x1[idx] + yy1 = y1[idx] + xx2 = x2[idx] + yy2 = y2[idx] + + xx1 = np.clip(xx1, x1[i], np.inf) + yy1 = np.clip(yy1, y1[i], np.inf) + xx2 = np.clip(xx2, -np.inf, x2[i]) + yy2 = np.clip(yy2, -np.inf, y2[i]) + + w = xx2 - xx1 + h = yy2 - yy1 + + w = np.clip(w, 0, np.inf) + h = np.clip(h, 0, np.inf) + inter = w * h + + rem_areas = area[idx] + union = (rem_areas - inter) + area[i] + IoU = inter / union + + idx = idx[np.less(IoU, overlap)] + + return keep, count diff --git a/research/cv/PyramidBox/src/config.py b/research/cv/PyramidBox/src/config.py new file mode 100644 index 0000000000000000000000000000000000000000..2c499f0c8379b03b1f546f74e3716d2c1fcc2540 --- /dev/null +++ b/research/cv/PyramidBox/src/config.py @@ -0,0 +1,82 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch] + +import os +from easydict import EasyDict +import numpy as np + + +_C = EasyDict() +cfg = _C +# data argument config +_C.expand_prob = 0.5 +_C.expand_max_ratio = 4 +_C.hue_prob = 0.5 +_C.hue_delta = 18 +_C.contrast_prob = 0.5 +_C.contrast_delta = 0.5 +_C.saturation_prob = 0.5 +_C.saturation_delta = 0.5 +_C.brightness_prob = 0.5 +_C.brightness_delta = 0.125 +_C.data_anchor_sampling_prob = 0.5 +_C.min_face_size = 6.0 +_C.apply_distort = True +_C.apply_expand = True +_C.img_mean = np.array([104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32') +_C.resize_width = 640 +_C.resize_height = 640 +_C.scale = 1 / 127.0 +_C.anchor_sampling = True +_C.filter_min_face = True + +# train config +_C.LR_STEPS = [80000, 100000, 120000] +_C.DIS_LR_STEPS = [30000, 35000, 40000] + +# anchor config +_C.FEATURE_MAPS = [[160, 160], [80, 80], [40, 40], [20, 20], [10, 10], [5, 5]] +_C.INPUT_SIZE = (640, 640) +_C.STEPS = [4, 8, 16, 32, 64, 128] +_C.ANCHOR_SIZES = [16, 32, 64, 128, 256, 512] +_C.CLIP = False +_C.VARIANCE = [0.1, 0.2] + +# loss config +_C.NUM_CLASSES = 2 +_C.OVERLAP_THRESH = 0.35 +_C.NEG_POS_RATIOS = 3 + + +# detection config +_C.NMS_THRESH = 0.3 +_C.TOP_K = 5000 +_C.KEEP_TOP_K = 750 +_C.CONF_THRESH = 0.05 + + +# dataset config +_C.HOME = '/data2/James/dataset/pyramidbox_dataset/' + +# face config +_C.FACE = EasyDict() +_C.FACE.FILE_DIR = os.path.dirname(os.path.realpath(__file__)) + '/../data' +_C.FACE.TRAIN_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_train.txt') +_C.FACE.VAL_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_val.txt') +_C.FACE.FDDB_DIR = os.path.join(_C.HOME, 'FDDB') +_C.FACE.WIDER_DIR = os.path.join(_C.HOME, 'WIDERFACE') 
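For the `nms` routine defined in `bbox_utils.py` above, a toy run showing how an overlapping lower-scored box is suppressed:

```python
import numpy as np
from src.bbox_utils import nms

boxes = np.array([[0., 0., 10., 10.],
                  [1., 1., 10., 10.],    # IoU with the first box is 0.81
                  [20., 20., 30., 30.]])
scores = np.array([0.9, 0.8, 0.7])

keep, count = nms(boxes, scores, overlap=0.5, top_k=200)
print(keep[:count])  # [0 2] -- the second box is suppressed by the first
```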
+_C.FACE.OVERLAP_THRESH = 0.35 diff --git a/research/cv/PyramidBox/src/dataset.py b/research/cv/PyramidBox/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..1ba7f67436e429a92d5be659dd9f8a94bac85a93 --- /dev/null +++ b/research/cv/PyramidBox/src/dataset.py @@ -0,0 +1,173 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import random +from PIL import Image +import numpy as np + +from mindspore import dataset as ds + +from src.augmentations import preprocess +from src.prior_box import PriorBox +from src.bbox_utils import match_ssd +from src.config import cfg + + +class WIDERDataset: + """docstring for WIDERDetection""" + + def __init__(self, list_file, mode='train'): + super(WIDERDataset, self).__init__() + self.mode = mode + self.fnames = [] + self.boxes = [] + self.labels = [] + prior_box = PriorBox(cfg) + self.default_priors = prior_box.forward() + self.num_priors = self.default_priors.shape[0] + self.match = match_ssd + self.threshold = cfg.FACE.OVERLAP_THRESH + self.variance = cfg.VARIANCE + + with open(list_file) as f: + lines = f.readlines() + + for line in lines: + line = line.strip().split() + num_faces = int(line[1]) + box = [] + label = [] + for i in range(num_faces): + x = float(line[2 + 5 * i]) + y = float(line[3 + 5 * i]) + w = float(line[4 + 5 * i]) + h = float(line[5 + 5 * i]) + c = int(line[6 + 5 * i]) + if w <= 0 or h <= 0: + continue + box.append([x, y, x + w, y + h]) + label.append(c) + if box: + self.fnames.append(line[0]) + self.boxes.append(box) + self.labels.append(label) + + self.num_samples = len(self.boxes) + + def __len__(self): + return self.num_samples + + def __getitem__(self, index): + img, face_loc, face_conf, head_loc, head_conf = self.pull_item(index) + return img, face_loc, face_conf, head_loc, head_conf + + def pull_item(self, index): + while True: + image_path = self.fnames[index] + img = Image.open(image_path) + if img.mode == 'L': + img = img.convert('RGB') + + im_width, im_height = img.size + boxes = self.annotransform(np.array(self.boxes[index]), im_width, im_height) + label = np.array(self.labels[index]) + bbox_labels = np.hstack((label[:, np.newaxis], boxes)).tolist() + img, sample_labels = preprocess(img, bbox_labels, self.mode) + sample_labels = np.array(sample_labels) + if sample_labels.size > 0: + face_target = np.hstack( + (sample_labels[:, 1:], sample_labels[:, 0][:, np.newaxis])) + + assert (face_target[:, 2] > face_target[:, 0]).any() + assert (face_target[:, 3] > face_target[:, 1]).any() + + face_box = face_target[:, :-1] + head_box = self.expand_bboxes(face_box) + head_target = np.hstack((head_box, face_target[ + :, -1][:, np.newaxis])) + break + else: + index = random.randrange(0, self.num_samples) + + face_truth = face_target[:, :-1] + face_label = face_target[:, -1] + + face_loc_t, face_conf_t = self.match(self.threshold, face_truth, self.default_priors, + self.variance, 
face_label) + head_truth = head_target[:, :-1] + head_label = head_target[:, -1] + head_loc_t, head_conf_t = self.match(self.threshold, head_truth, self.default_priors, + self.variance, head_label) + return img, face_loc_t, face_conf_t, head_loc_t, head_conf_t + + + def annotransform(self, boxes, im_width, im_height): + boxes[:, 0] /= im_width + boxes[:, 1] /= im_height + boxes[:, 2] /= im_width + boxes[:, 3] /= im_height + return boxes + + def expand_bboxes(self, + bboxes, + expand_left=2., + expand_up=2., + expand_right=2., + expand_down=2.): + expand_bboxes = [] + for bbox in bboxes: + xmin = bbox[0] + ymin = bbox[1] + xmax = bbox[2] + ymax = bbox[3] + w = xmax - xmin + h = ymax - ymin + ex_xmin = max(xmin - w / expand_left, 0.) + ex_ymin = max(ymin - h / expand_up, 0.) + ex_xmax = max(xmax + w / expand_right, 0.) + ex_ymax = max(ymax + h / expand_down, 0.) + expand_bboxes.append([ex_xmin, ex_ymin, ex_xmax, ex_ymax]) + expand_bboxes = np.array(expand_bboxes) + return expand_bboxes + +def create_val_dataset(mindrecord_file, batch_size, device_num=1, device_id=0, num_workers=8): + """ + Create user-defined mindspore dataset for training + """ + column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf'] + ds.config.set_num_parallel_workers(num_workers) + ds.config.set_enable_shared_mem(False) + ds.config.set_prefetch_size(batch_size * 2) + + train_dataset = ds.MindDataset(mindrecord_file, columns_list=column_names, shuffle=True, + shard_id=device_id, num_shards=device_num) + train_dataset = train_dataset.batch(batch_size=batch_size, drop_remainder=True) + + return train_dataset + +def create_train_dataset(cfg_, batch_size, device_num=1, device_id=0, num_workers=8): + """ + Create user-defined mindspore dataset for training + """ + column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf'] + ds.config.set_num_parallel_workers(num_workers) + ds.config.set_enable_shared_mem(False) + ds.config.set_prefetch_size(batch_size * 2) + train_dataset = ds.GeneratorDataset(WIDERDataset(cfg_.FACE.TRAIN_FILE, mode='train'), + column_names=column_names, shuffle=True, num_shards=device_num, + shard_id=device_id) + train_dataset = train_dataset.batch(batch_size=batch_size) + + return train_dataset diff --git a/research/cv/PyramidBox/src/detection.py b/research/cv/PyramidBox/src/detection.py new file mode 100644 index 0000000000000000000000000000000000000000..e9a9b705788459c3c21754ed6fb0a10c79553a1b --- /dev/null +++ b/research/cv/PyramidBox/src/detection.py @@ -0,0 +1,77 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import numpy as np +from mindspore import Tensor +from src.bbox_utils import decode, nms + +class Detect: + """At test time, Detect is the final layer of SSD. 
Decode location preds,
+    apply non-maximum suppression to location predictions based on conf
+    scores and threshold to a top_k number of output predictions for both
+    confidence score and locations.
+    """
+    def __init__(self, cfg):
+        self.num_classes = cfg.NUM_CLASSES
+        self.top_k = cfg.TOP_K
+        self.nms_thresh = cfg.NMS_THRESH
+        self.conf_thresh = cfg.CONF_THRESH
+        self.variance = cfg.VARIANCE
+
+    def detect(self, loc_data, conf_data, prior_data):
+        """
+        Args:
+            loc_data: (Tensor) Loc preds from loc layers
+                Shape: [batch, num_priors*4]
+            conf_data: (Tensor) Conf preds from conf layers
+                Shape: [batch*num_priors, num_classes]
+            prior_data: Prior boxes and variances from priorbox layers
+                Shape: [1,num_priors,4]
+        """
+        if isinstance(loc_data, Tensor):
+            loc_data = loc_data.asnumpy()
+        if isinstance(conf_data, Tensor):
+            conf_data = conf_data.asnumpy()
+
+        num = loc_data.shape[0]
+        num_priors = prior_data.shape[0]
+
+        conf_preds = np.transpose(conf_data.reshape((num, num_priors, self.num_classes)), (0, 2, 1))
+        batch_priors = prior_data.reshape((-1, num_priors, 4))
+        batch_priors = np.broadcast_to(batch_priors, (num, num_priors, 4))
+        decoded_boxes = decode(loc_data.reshape((-1, 4)), batch_priors, self.variance).reshape((num, num_priors, 4))
+
+        output = np.zeros((num, self.num_classes, self.top_k, 5))
+
+        for i in range(num):
+            boxes = decoded_boxes[i].copy()
+            conf_scores = conf_preds[i].copy()
+
+            for cl in range(1, self.num_classes):
+                c_mask = np.greater(conf_scores[cl], self.conf_thresh)
+                scores = conf_scores[cl][c_mask]
+
+                # skip this class when no score clears the confidence threshold
+                if scores.size == 0:
+                    continue
+
+                l_mask = np.expand_dims(c_mask, 1)
+                l_mask = np.broadcast_to(l_mask, boxes.shape)
+
+                boxes_ = boxes[l_mask].reshape((-1, 4))
+
+                ids, count = nms(boxes_, scores, self.nms_thresh, self.top_k)
+                output[i, cl, :count] = np.concatenate((np.expand_dims(scores[ids[:count]], 1),
+                                                        boxes_[ids[:count]]), 1)
+        return output
diff --git a/research/cv/PyramidBox/src/evaluate.py b/research/cv/PyramidBox/src/evaluate.py
new file mode 100644
index 0000000000000000000000000000000000000000..8b191eeb0efc44cd901456786ce61877d1caf231
--- /dev/null
+++ b/research/cv/PyramidBox/src/evaluate.py
@@ -0,0 +1,286 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
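Before moving on to the evaluation script, a usage sketch for `Detect` above, with hypothetical single-image predictions over the 34,125 priors a 640x640 input produces:

```python
import numpy as np
from src.config import cfg
from src.detection import Detect
from src.prior_box import PriorBox

priors = PriorBox(cfg).forward()            # (34125, 4), center-size form
num_priors = priors.shape[0]

# Hypothetical network outputs: small offsets, five confident face scores.
loc = (np.random.randn(1, num_priors, 4) * 0.1).astype(np.float32)
conf = np.full((1, num_priors, 2), 0.01, dtype=np.float32)
conf[0, :5, 1] = [0.9, 0.8, 0.7, 0.6, 0.5]

out = Detect(cfg).detect(loc, conf, priors)
print(out.shape)  # (1, 2, 5000, 5): one [score, x1, y1, x2, y2] row per slot
```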
+# ============================================================================ + +# This file was copied from project [RuisongZhou][FDDB_Evaluation] + +import os +import argparse +import tqdm +import numpy as np +import cv2 + + +def bbox_overlaps(boxes, query_boxes): + """ + Parameters + ---------- + boxes: (N, 4) ndarray of float + query_boxes: (K, 4) ndarray of float + Returns + ------- + overlaps: (N, K) ndarray of overlap between boxes and query_boxes + """ + N = boxes.shape[0] + K = query_boxes.shape[0] + overlaps = np.zeros((N, K), dtype=np.float32) + for k in range(K): + box_area = ( + (query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1) + ) + for n in range(N): + iw = ( + min(boxes[n, 2], query_boxes[k, 2]) - + max(boxes[n, 0], query_boxes[k, 0]) + 1 + ) + if iw > 0: + ih = ( + min(boxes[n, 3], query_boxes[k, 3]) - + max(boxes[n, 1], query_boxes[k, 1]) + 1 + ) + if ih > 0: + ua = float( + (boxes[n, 2] - boxes[n, 0] + 1) * + (boxes[n, 3] - boxes[n, 1] + 1) + + box_area - iw * ih + ) + overlaps[n, k] = iw * ih / ua + return overlaps + + +def get_gt_boxes(gt_dir): + gt_dict = {} + for i in range(1, 11): + filename = os.path.join(gt_dir, 'FDDB-fold-{}-ellipseList.txt'.format('%02d' % i)) + assert os.path.exists(filename) + gt_sub_dict = {} + annotationfile = open(filename) + while True: + filename = annotationfile.readline()[:-1].replace('/', '_') + if not filename: + break + line = annotationfile.readline() + if not line: + break + facenum = int(line) + face_loc = [] + for _ in range(facenum): + line = annotationfile.readline().strip().split() + major_axis_radius = float(line[0]) + minor_axis_radius = float(line[1]) + angle = float(line[2]) + center_x = float(line[3]) + center_y = float(line[4]) + _ = float(line[5]) + angle = angle / 3.1415926 * 180 + mask = np.zeros((1000, 1000), dtype=np.uint8) + cv2.ellipse(mask, ((int)(center_x), (int)(center_y)), + ((int)(major_axis_radius), (int)(minor_axis_radius)), angle, 0., 360., (255, 255, 255)) + contours, _ = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2:] + r = cv2.boundingRect(contours[0]) + x_min = r[0] + y_min = r[1] + x_max = r[0] + r[2] + y_max = r[1] + r[3] + face_loc.append([x_min, y_min, x_max, y_max]) + face_loc = np.array(face_loc) + + gt_sub_dict[filename] = face_loc + gt_dict[i] = gt_sub_dict + return gt_dict + +def read_pred_file(filepath): + with open(filepath, 'r') as f: + lines = f.readlines() + img_file = lines[0].rstrip('\n') + lines = lines[2:] + boxes = [] + for line in lines: + line = line.rstrip('\n').split(' ') + if line[0] == '': + continue + boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])]) + boxes = np.array(boxes) + return img_file.split('/')[-1], boxes + +def get_preds_box(pred_dir): + events = os.listdir(pred_dir) + boxes = dict() + pbar = tqdm.tqdm(events) + for event in pbar: + pbar.set_description('Reading Predictions Boxes') + event_dir = os.path.join(pred_dir, event) + event_images = os.listdir(event_dir) + current_event = dict() + for imgtxt in event_images: + imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt)) + current_event[imgname.rstrip('.jpg')] = _boxes + boxes[event] = current_event + return boxes + +def norm_score(pred): + """ norm score + pred {key: [[x1,y1,x2,y2,s]]} + """ + + max_score = 0 + min_score = 1 + + for _, k in pred.items(): + for _, v in k.items(): + if v.size == 0: + continue + _min = np.min(v[:, -1]) + _max = np.max(v[:, -1]) + max_score = max(_max, max_score) + 
min_score = min(_min, min_score)
+
+    diff = max_score - min_score
+    for _, k in pred.items():
+        for _, v in k.items():
+            if v.size == 0:
+                continue
+            v[:, -1] = (v[:, -1] - min_score) / diff
+
+def image_eval(pred, gt, ignore, iou_thresh):
+    """ single image evaluation
+    pred: Nx5 (x, y, w, h, score)
+    gt: Nx4
+    ignore: per-gt flags; boxes flagged 0 are excluded from recall and proposals
+    """
+
+    _pred = pred.copy()
+    _gt = gt.copy()
+    pred_recall = np.zeros(_pred.shape[0])
+    recall_list = np.zeros(_gt.shape[0])
+    proposal_list = np.ones(_pred.shape[0])
+
+    _pred[:, 2] = _pred[:, 2] + _pred[:, 0]
+    _pred[:, 3] = _pred[:, 3] + _pred[:, 1]
+
+    overlaps = bbox_overlaps(_pred[:, :4], _gt)
+
+    for h in range(_pred.shape[0]):
+        gt_overlap = overlaps[h]
+        max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax()
+        if max_overlap >= iou_thresh:
+            if ignore[max_idx] == 0:
+                recall_list[max_idx] = -1
+                proposal_list[h] = -1
+            elif recall_list[max_idx] == 0:
+                recall_list[max_idx] = 1
+
+        r_keep_index = np.where(recall_list == 1)[0]
+        pred_recall[h] = len(r_keep_index)
+    return pred_recall, proposal_list
+
+
+def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall):
+    pr_info = np.zeros((thresh_num, 2)).astype('float')
+    for t in range(thresh_num):
+
+        thresh = 1 - (t + 1) / thresh_num
+        r_index = np.where(pred_info[:, 4] >= thresh)[0]
+        if r_index.size == 0:
+            pr_info[t, 0] = 0
+            pr_info[t, 1] = 0
+        else:
+            r_index = r_index[-1]
+            p_index = np.where(proposal_list[:r_index + 1] == 1)[0]
+            pr_info[t, 0] = len(p_index)
+            pr_info[t, 1] = pred_recall[r_index]
+    return pr_info
+
+def dataset_pr_info(thresh_num, pr_curve, count_face):
+    _pr_curve = np.zeros((thresh_num, 2))
+
+    for i in range(thresh_num):
+        _pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0]
+        _pr_curve[i, 1] = pr_curve[i, 1] / count_face
+    return _pr_curve
+
+
+def voc_ap(rec, prec):
+    # correct AP calculation
+    # first append sentinel values at the end
+    mrec = np.concatenate(([0.], rec, [1.]))
+    mpre = np.concatenate(([0.], prec, [0.]))
+
+    # compute the precision envelope
+    for i in range(mpre.size - 1, 0, -1):
+        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
+
+    # to calculate area under PR curve, look for points
+    # where X axis (recall) changes value
+    i = np.where(mrec[1:] != mrec[:-1])[0]
+
+    # and sum (\Delta recall) * prec
+    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
+    return ap
+
+def evaluation(pred, gt_path, iou_thresh=0.5):
+    pred = get_preds_box(pred)
+    norm_score(pred)
+    gt_box_dict = get_gt_boxes(gt_path)
+    event = list(pred.keys())
+    event = [int(e) for e in event]
+    event.sort()
+    thresh_num = 1000
+    aps = []
+
+    pbar = tqdm.tqdm(range(len(event)))
+    for setting_id in pbar:
+        pbar.set_description('Predicting ... 
') + # different setting + count_face = 0 + pr_curve = np.zeros((thresh_num, 2)).astype('float') + gt = gt_box_dict[event[setting_id]] + pred_list = pred[str(event[setting_id])] + gt_list = list(gt.keys()) + for j in range(len(gt_list)): + gt_boxes = gt[gt_list[j]].astype('float') # from image name get gt boxes + pred_info = pred_list[gt_list[j]] + keep_index = np.array(range(1, len(gt_boxes) + 1)) + count_face += len(keep_index) + ignore = np.zeros(gt_boxes.shape[0]) + if gt_boxes.size == 0 or pred_info.size == 0: + continue + if keep_index.size != 0: + ignore[keep_index - 1] = 1 + pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh) + + _img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall) + + pr_curve += _img_pr_info + pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face) + + propose = pr_curve[:, 0] + recall = pr_curve[:, 1] + + ap = voc_ap(recall, propose) + aps.append(ap) + + print("==================== Results ====================") + for i in range(len(aps)): + print("FDDB-fold-{} Val AP: {}".format(event[i], aps[i])) + print("FDDB Dataset Average AP: {}".format(sum(aps)/len(aps))) + print("=================================================") + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument('--pred') + parser.add_argument('--gt') + args = parser.parse_args() + evaluation(args.pred, args.gt) diff --git a/research/cv/PyramidBox/src/loss.py b/research/cv/PyramidBox/src/loss.py new file mode 100644 index 0000000000000000000000000000000000000000..0f120c812295fc678fde3803d6c1eeb54ef8454e --- /dev/null +++ b/research/cv/PyramidBox/src/loss.py @@ -0,0 +1,82 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
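The evaluator above expects one text file per image, grouped into per-fold directories named `1` through `10`. A sketch of the layout `read_pred_file` parses and a direct call, with hypothetical paths (the exact prediction-dump script is not shown here, so the file naming is an assumption):

```python
# preds/1/2002_07_19_big_img_130.txt is assumed to read:
#
#   2002_07_19_big_img_130.jpg    <- image name, matching the FDDB fold
#                                    lists with '/' replaced by '_'
#   2                             <- number of detections, second line
#   x y w h score                 <- one detection per remaining line
#
from src.evaluate import evaluation

evaluation('preds', 'dataset/FDDB/FDDB-folds')  # prints per-fold and average AP
```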
+# ============================================================================ + +from mindspore import nn, Tensor, ops +from mindspore import dtype as mstype +from mindspore import numpy as mnp +from src.config import cfg + + +class MultiBoxLoss(nn.Cell): + """SSD Weighted Loss Function + """ + def __init__(self, use_head_loss=False): + super(MultiBoxLoss, self).__init__() + self.use_head_loss = use_head_loss + self.num_classes = cfg.NUM_CLASSES + self.negpos_ratio = cfg.NEG_POS_RATIOS + self.cast = ops.Cast() + self.sum = ops.ReduceSum() + self.loc_loss = nn.SmoothL1Loss() + self.cls_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True) + self.sort_descending = ops.Sort(descending=True) + self.stack = ops.Stack(axis=1) + self.unsqueeze = ops.ExpandDims() + self.gather = ops.GatherNd() + + def construct(self, predictions, targets): + """Multibox Loss""" + if self.use_head_loss: + _, _, loc_data, conf_data = predictions + else: + loc_data, conf_data, _, _ = predictions + + loc_t, conf_t = targets + loc_data = self.cast(loc_data, mstype.float32) + conf_data = self.cast(conf_data, mstype.float32) + loc_t = self.cast(loc_t, mstype.float32) + conf_t = self.cast(conf_t, mstype.int32) + + batch_size, box_num, _ = conf_data.shape + + mask = self.cast(conf_t > 0, mstype.float32) + pos_num = self.sum(mask, 1) + + loc_loss = self.sum(self.loc_loss(loc_data, loc_t), 2) + loc_loss = self.sum(mask * loc_loss) + + # Hard Negative Mining + con = self.cls_loss(conf_data.view(-1, self.num_classes), conf_t.view(-1)) + con = con.view(batch_size, -1) + + con_neg = con * (1 - mask) + value, _ = self.sort_descending(con_neg) + neg_num = self.cast(ops.minimum(self.negpos_ratio * pos_num, box_num), mstype.int32) + batch_iter = Tensor(mnp.arange(batch_size), dtype=mstype.int32) + neg_index = self.stack((batch_iter, neg_num)) + min_neg_score = self.unsqueeze(self.gather(value, neg_index), 1) + neg_mask = self.cast(con_neg > min_neg_score, mstype.float32) + all_mask = mask + neg_mask + all_mask = ops.stop_gradient(all_mask) + + cls_loss = self.sum(con * all_mask) + + N = self.sum(pos_num) + N = ops.maximum(self.cast(N, mstype.float32), 0.25) + + loc_loss /= N + cls_loss /= N + + return loc_loss, cls_loss diff --git a/research/cv/PyramidBox/src/prior_box.py b/research/cv/PyramidBox/src/prior_box.py new file mode 100644 index 0000000000000000000000000000000000000000..d0acb5ce56a142c3993209c82d8b27f5b069d1d3 --- /dev/null +++ b/research/cv/PyramidBox/src/prior_box.py @@ -0,0 +1,62 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch] + +from itertools import product +import numpy as np + + +class PriorBox: + """Compute priorbox coordinates in center-offset form for each source + feature map. 
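`MultiBoxLoss` above performs hard negative mining: per image it keeps only the `NEG_POS_RATIOS` x pos_num highest-loss negatives. A NumPy sketch of the same selection, with the index clamped to the number of anchors (the clamp is my addition, to stay in bounds on the toy data):

```python
import numpy as np

neg_pos_ratio = 3
# Hypothetical per-anchor classification losses for one image.
con = np.array([0.2, 2.1, 0.1, 1.5, 0.4, 0.9])
pos_mask = np.array([0., 0., 0., 0., 1., 0.])   # one positive anchor

con_neg = con * (1 - pos_mask)                  # positives contribute nothing
order = np.sort(con_neg)[::-1]                  # negative losses, descending
neg_num = int(min(neg_pos_ratio * pos_mask.sum(), con.size - 1))
min_neg_score = order[neg_num]                  # loss of the first discarded negative
neg_mask = (con_neg > min_neg_score).astype(np.float32)

print(neg_mask)  # [0. 1. 0. 1. 0. 1.] -- the three hardest negatives survive
```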
+ """ + def __init__(self, cfg, feature_maps=None, input_size=(640, 640), phase='train'): + self.imh = input_size[0] + self.imw = input_size[1] + + # number of priors for feature map location (either 4 or 6) + self.variance = cfg.VARIANCE or [0.1] + if phase == 'train': + self.feature_maps = cfg.FEATURE_MAPS + else: + self.feature_maps = feature_maps + self.min_sizes = cfg.ANCHOR_SIZES + self.steps = cfg.STEPS + self.clip = cfg.CLIP + for v in self.variance: + if v <= 0: + raise ValueError('Variances must be greater than 0') + + def forward(self): + mean = [] + for k in range(len(self.feature_maps)): + feath = self.feature_maps[k][0] + featw = self.feature_maps[k][1] + for i, j in product(range(feath), range(featw)): + f_kw = self.imw / self.steps[k] + f_kh = self.imh / self.steps[k] + + cx = (j + 0.5) / f_kw + cy = (i + 0.5) / f_kh + + s_kw = self.min_sizes[k] / self.imw + s_kh = self.min_sizes[k] / self.imh + + mean += [cx, cy, s_kw, s_kh] + output = np.array(mean).reshape(-1, 4) + if self.clip: + output = np.clip(output, 0, 1) + return output diff --git a/research/cv/PyramidBox/src/pyramidbox.py b/research/cv/PyramidBox/src/pyramidbox.py new file mode 100644 index 0000000000000000000000000000000000000000..03a2b432be5ee7ea26879c3a9318318f2deb4ca7 --- /dev/null +++ b/research/cv/PyramidBox/src/pyramidbox.py @@ -0,0 +1,398 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
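`PriorBox.forward` lays out one square prior per feature-map cell; with the default 640x640 configuration that is 160^2 + 80^2 + 40^2 + 20^2 + 10^2 + 5^2 = 34,125 priors:

```python
from src.config import cfg
from src.prior_box import PriorBox

priors = PriorBox(cfg).forward()
print(priors.shape)  # (34125, 4)
print(priors[0])     # [0.003125 0.003125 0.025 0.025]: first cell of the stride-4 map
```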
+# ============================================================================ + +# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch] + +from mindspore import nn, ops, Parameter, Tensor +from mindspore.common import initializer +from mindspore import dtype as mstype + +from src.loss import MultiBoxLoss + +class L2Norm(nn.Cell): + def __init__(self, n_channles, scale): + super(L2Norm, self).__init__() + self.n_channels = n_channles + self.gamma = scale or None + self.eps = 1e-10 + self.weight = Parameter(Tensor(shape=(self.n_channels), init=initializer.Constant(value=self.gamma), + dtype=mstype.float32)) + self.pow = ops.Pow() + self.sum = ops.ReduceSum() + self.div = ops.Div() + + def construct(self, x): + norm = self.pow(x, 2).sum(axis=1, keepdims=True) + norm = ops.sqrt(norm) + self.eps + x = self.div(x, norm) + out = self.weight[None, :][:, :, None][:, :, :, None].expand_as(x) * x + return out + +class ConvBn(nn.Cell): + """docstring for conv""" + + def __init__(self, + in_plane, + out_plane, + kernel_size, + stride, + padding): + super(ConvBn, self).__init__() + self.conv1 = nn.Conv2d(in_plane, out_plane, kernel_size, stride, pad_mode='pad', + padding=padding, has_bias=True, weight_init='xavier_uniform') + self.bn1 = nn.BatchNorm2d(out_plane) + + def construct(self, x): + x = self.conv1(x) + return self.bn1(x) + +class CPM(nn.Cell): + """docstring for CPM""" + + def __init__(self, in_plane): + super(CPM, self).__init__() + self.branch1 = ConvBn(in_plane, 1024, 1, 1, 0) + self.branch2a = ConvBn(in_plane, 256, 1, 1, 0) + self.branch2b = ConvBn(256, 256, 3, 1, 1) + self.branch2c = ConvBn(256, 1024, 1, 1, 0) + + self.relu = nn.ReLU() + + self.ssh_1 = nn.Conv2d(1024, 256, kernel_size=3, stride=1, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform') + self.ssh_dimred = nn.Conv2d(1024, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform') + self.ssh_2 = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform') + self.ssh_3a = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True, + weight_init='xavier_uniform') + self.ssh_3b = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform') + self.cat = ops.Concat(1) + + def construct(self, x): + out_residual = self.branch1(x) + x = self.relu(self.branch2a(x)) + x = self.relu(self.branch2b(x)) + x = self.branch2c(x) + + rescomb = self.relu(x + out_residual) + ssh1 = self.ssh_1(rescomb) + ssh_dimred = self.relu(self.ssh_dimred(rescomb)) + ssh_2 = self.ssh_2(ssh_dimred) + ssh_3a = self.relu(self.ssh_3a(ssh_dimred)) + ssh_3b = self.ssh_3b(ssh_3a) + + ssh_out = self.cat((ssh1, ssh_2, ssh_3b)) + ssh_out = self.relu(ssh_out) + return ssh_out + + +class PyramidBox(nn.Cell): + """docstring for PyramidBox""" + + def __init__(self, + phase, + base, + extras, + lfpn_cpm, + head, + num_classes): + super(PyramidBox, self).__init__() + + self.vgg = nn.CellList(base) + self.extras = nn.CellList(extras) + self.num_classes = num_classes + + self.L2Norm3_3 = L2Norm(256, 10) + self.L2Norm4_3 = L2Norm(512, 8) + self.L2Norm5_3 = L2Norm(512, 5) + + self.lfpn_topdown = nn.CellList(lfpn_cpm[0]) + self.lfpn_later = nn.CellList(lfpn_cpm[1]) + self.cpm = nn.CellList(lfpn_cpm[2]) + + self.loc_layers = nn.CellList(head[0]) + self.conf_layers = nn.CellList(head[1]) + + self.relu = nn.ReLU() + self.concat = ops.Concat(1) + + self.is_infer 
= False + + if phase == 'test': + self.softmax = nn.Softmax(axis=-1) + self.is_infer = True + + def _upsample_prod(self, x, y): + _, _, H, W = y.shape + resize_bilinear = nn.ResizeBilinear() + result = resize_bilinear(x, size=(H, W), align_corners=True) * y + return result + + def construct(self, x): + # apply vgg up to conv3_3 relu + for k in range(16): + x = self.vgg[k](x) + conv3_3 = x + # apply vgg up to conv4_3 + for k in range(16, 23): + x = self.vgg[k](x) + conv4_3 = x + + for k in range(23, 30): + x = self.vgg[k](x) + conv5_3 = x + + for k in range(30, len(self.vgg)): + x = self.vgg[k](x) + convfc_7 = x + # apply extra layers and cache source layer outputs + for k in range(2): + x = self.relu(self.extras[k](x)) + conv6_2 = x + + for k in range(2, 4): + x = self.relu(self.extras[k](x)) + conv7_2 = x + + x = self.relu(self.lfpn_topdown[0](convfc_7)) + lfpn2_on_conv5 = self.relu(self._upsample_prod( + x, self.lfpn_later[0](conv5_3))) + + x = self.relu(self.lfpn_topdown[1](lfpn2_on_conv5)) + lfpn1_on_conv4 = self.relu(self._upsample_prod( + x, self.lfpn_later[1](conv4_3))) + + x = self.relu(self.lfpn_topdown[2](lfpn1_on_conv4)) + lfpn0_on_conv3 = self.relu(self._upsample_prod( + x, self.lfpn_later[2](conv3_3))) + + + ssh_conv3_norm = self.cpm[0](self.L2Norm3_3(lfpn0_on_conv3)) + ssh_conv4_norm = self.cpm[1](self.L2Norm4_3(lfpn1_on_conv4)) + ssh_conv5_norm = self.cpm[2](self.L2Norm5_3(lfpn2_on_conv5)) + ssh_convfc7 = self.cpm[3](convfc_7) + ssh_conv6 = self.cpm[4](conv6_2) + ssh_conv7 = self.cpm[5](conv7_2) + + face_locs, face_confs = [], [] + head_locs, head_confs = [], [] + + N = ssh_conv3_norm.shape[0] + mbox_loc = self.loc_layers[0](ssh_conv3_norm) + face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc) + + + face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4) + if not self.is_infer: + head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4) + + mbox_conf = self.conf_layers[0](ssh_conv3_norm) + face_conf1 = mbox_conf[:, 3:4, :, :] + + _, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 0:3, :, :]) + + face_conf = self.concat((face_conf3_maxin, face_conf1)) + face_conf = ops.Transpose()(face_conf, (0, 2, 3, 1)).view(N, -1, 2) + + head_conf = None + if not self.is_infer: + _, head_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 4:7, :, :]) + head_conf1 = mbox_conf[:, 7:, :, :] + head_conf = self.concat((head_conf3_maxin, head_conf1)) + head_conf = ops.Transpose()(head_conf, (0, 2, 3, 1)).view(N, -1, 2) + + face_locs.append(face_loc) + face_confs.append(face_conf) + + if not self.is_infer: + head_locs.append(head_loc) + head_confs.append(head_conf) + + inputs = [ssh_conv4_norm, ssh_conv5_norm, + ssh_convfc7, ssh_conv6, ssh_conv7] + + feature_maps = [] + feat_size = ssh_conv3_norm.shape[2:] + feature_maps.append([feat_size[0], feat_size[1]]) + + for i, feat in enumerate(inputs): + feat_size = feat.shape[2:] + feature_maps.append([feat_size[0], feat_size[1]]) + mbox_loc = self.loc_layers[i + 1](feat) + face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc) + face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4) + if not self.is_infer: + head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4) + + mbox_conf = self.conf_layers[i + 1](feat) + face_conf1 = mbox_conf[:, 0:1, :, :] + _, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 1:4, :, :]) + face_conf = self.concat((face_conf1, face_conf3_maxin)) + face_conf = ops.Transpose()(face_conf, (0, 2, 3, 
1)).ravel().view(N, -1, 2) + + if not self.is_infer: + head_conf = ops.Transpose()(mbox_conf[:, 4:, :, :], (0, 2, 3, 1)).view(N, -1, 2) + + face_locs.append(face_loc) + face_confs.append(face_conf) + + if not self.is_infer: + head_locs.append(head_loc) + head_confs.append(head_conf) + + face_mbox_loc = self.concat(face_locs) + face_mbox_conf = self.concat(face_confs) + + head_mbox_loc, head_mbox_conf = None, None + if not self.is_infer: + head_mbox_loc = self.concat(head_locs) + head_mbox_conf = self.concat(head_confs) + + if not self.is_infer: + output = (face_mbox_loc, face_mbox_conf, head_mbox_loc, head_mbox_conf) + else: + output = (face_mbox_loc, self.softmax(face_mbox_conf), feature_maps) + return output + +vgg_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', + 512, 512, 512, 'M'] + +extras_cfg = [256, 'S', 512, 128, 'S', 256] + +lfpn_cpm_cfg = [256, 512, 512, 1024, 512, 256] + +multibox_cfg = [512, 512, 512, 512, 512, 512] + + +def vgg_(cfg, i, batch_norm=False): + layers = [] + in_channels = i + for v in cfg: + if v == 'M': + layers += [nn.MaxPool2d(kernel_size=2, stride=2)] + elif v == 'C': + layers += [nn.MaxPool2d(kernel_size=2, stride=2)] + else: + conv2d = nn.Conv2d(in_channels, v, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform') + if batch_norm: + layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()] + else: + layers += [conv2d, nn.ReLU()] + in_channels = v + + conv6 = nn.Conv2d(512, 1024, kernel_size=3, pad_mode='pad', padding=6, + dilation=6, has_bias=True, weight_init='xavier_uniform') + conv7 = nn.Conv2d(1024, 1024, kernel_size=1, has_bias=True, weight_init='xavier_uniform') + layers += [conv6, nn.ReLU(), conv7, nn.ReLU()] + return layers + + +def add_extras(cfg, i): + # Extra layers added to VGG for feature scaling + layers = [] + in_channels = i + flag = False + for k, v in enumerate(cfg): + if in_channels != 'S': + if v == 'S': + layers += [nn.Conv2d(in_channels, cfg[k + 1], kernel_size=(1, 3)[flag], stride=2, + pad_mode='pad', padding=1, has_bias=True, weight_init='xavier_uniform')] + else: + layers += [nn.Conv2d(in_channels, v, kernel_size=(1, 3)[flag], + has_bias=True, weight_init='xavier_uniform')] + flag = not flag + in_channels = v + return layers + + +def add_lfpn_cpm(cfg): + lfpn_topdown_layers = [] + lfpn_latlayer = [] + cpm_layers = [] + + for k, v in enumerate(cfg): + cpm_layers.append(CPM(v)) + + fpn_list = cfg[::-1][2:] + for k, v in enumerate(fpn_list[:-1]): + lfpn_latlayer.append(nn.Conv2d(fpn_list[k + 1], fpn_list[k + 1], kernel_size=1, + stride=1, padding=0, has_bias=True, weight_init='xavier_uniform')) + lfpn_topdown_layers.append(nn.Conv2d(v, fpn_list[k + 1], kernel_size=1, stride=1, + padding=0, has_bias=True, weight_init='xavier_uniform')) + + return (lfpn_topdown_layers, lfpn_latlayer, cpm_layers) + + +def multibox(vgg, extra_layers): + loc_layers = [] + conf_layers = [] + vgg_source = [21, 28, -2] + i = 0 + loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform')] + conf_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform')] + i += 1 + for _, _ in enumerate(vgg_source): + loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform')] + conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, 
weight_init='xavier_uniform')] + i += 1 + for _, _ in enumerate(extra_layers[1::2], 2): + loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', + padding=1, has_bias=True, weight_init='xavier_uniform')] + conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1, + has_bias=True, weight_init='xavier_uniform')] + i += 1 + return vgg, extra_layers, (loc_layers, conf_layers) + + +def build_net(phase, num_classes=2): + base_, extras_, head_ = multibox(vgg_(vgg_cfg, 3), add_extras((extras_cfg), 1024)) + lfpn_cpm = add_lfpn_cpm(lfpn_cpm_cfg) + return PyramidBox(phase, base_, extras_, lfpn_cpm, head_, num_classes) + +class NetWithLoss(nn.Cell): + def __init__(self, net): + super(NetWithLoss, self).__init__() + self.net = net + self.loss_fn_1 = MultiBoxLoss() + self.loss_fn_2 = MultiBoxLoss(use_head_loss=True) + + def construct(self, images, face_loc, face_conf, head_loc, head_conf): + out = self.net(images) + face_loss_l, face_loss_c = self.loss_fn_1(out, (face_loc, face_conf)) + head_loss_l, head_loss_c = self.loss_fn_2(out, (head_loc, head_conf)) + loss = face_loss_l + face_loss_c + head_loss_l + head_loss_c + return loss + +class EvalLoss(nn.Cell): + """ + Calculate loss value while training. + """ + def __init__(self, net): + super(EvalLoss, self).__init__() + self.net = net + self.loss_fn = MultiBoxLoss() + + def construct(self, images, face_loc, face_conf): + out = self.net(images) + face_loss_l, face_loss_c = self.loss_fn(out, (face_loc, face_conf)) + loss = face_loss_l + face_loss_c + return loss diff --git a/research/cv/PyramidBox/train.py b/research/cv/PyramidBox/train.py new file mode 100644 index 0000000000000000000000000000000000000000..fae99bbb9f8479a5debae1832eb42143be70a4a3 --- /dev/null +++ b/research/cv/PyramidBox/train.py @@ -0,0 +1,143 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
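A quick smoke test of the assembled network (a sketch; it assumes the MindSpore version this project targets and enough memory for a 640x640 forward pass). In 'test' phase the net returns face locations, softmaxed face scores, and the per-level feature-map sizes:

```python
import numpy as np
import mindspore as ms
from src.pyramidbox import build_net

net = build_net('test')
net.set_train(False)

x = ms.Tensor(np.random.randn(1, 3, 640, 640).astype(np.float32))
face_loc, face_conf, feature_maps = net(x)
print(face_loc.shape, face_conf.shape)  # (1, 34125, 4) (1, 34125, 2)
print(feature_maps)                     # [[160, 160], [80, 80], ..., [5, 5]]
```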
+# ============================================================================
+
+import argparse
+import ast
+import os
+import time
+from mindspore import context, nn
+from mindspore.common import set_seed
+from mindspore import save_checkpoint, load_checkpoint, load_param_into_net
+from mindspore.communication import management as D
+from mindspore.communication.management import get_group_size, get_rank
+from src.pyramidbox import build_net, NetWithLoss, EvalLoss
+from src.dataset import create_val_dataset, create_train_dataset
+from src.config import cfg
+
+MIN_LOSS = 10000
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PyramidBox face detector training with MindSpore')
+    parser.add_argument('--basenet', default='vgg16.ckpt', help='Pretrained base model')
+    parser.add_argument('--batch_size', default=4, type=int, help='Batch size for training')
+    parser.add_argument('--num_workers', default=8, type=int, help='Number of workers used in dataloading')
+    parser.add_argument('--device_target', dest='device_target', help='device for training',
+                        choices=['GPU', 'Ascend'], default='GPU', type=str)
+    parser.add_argument('--lr', '--learning-rate', default=0.001, type=float, help='initial learning rate')
+    parser.add_argument('--momentum', default=0.9, type=float, help='Momentum value for optim')
+    parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
+    parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
+    # type=bool would treat any non-empty string (even "False") as True,
+    # so parse the flag with ast.literal_eval instead
+    parser.add_argument('--distribute', default=False, type=ast.literal_eval,
+                        help='Use multi-GPU distributed training (True/False)')
+    parser.add_argument('--save_folder', default='checkpoints/', help='Directory for saving checkpoint models')
+    parser.add_argument('--epoches', default=100, type=int, help="Number of epochs to train the model")
+    parser.add_argument('--val_mindrecord', default='data/val.mindrecord', type=str, help="Path of val mindrecord file")
+    args_ = parser.parse_args()
+    return args_
+
+def train(args):
+    print("The argument is: ", args)
+    context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE)
+    device_id = 0
+    device_num = 1
+    ckpt_folder = os.path.join(args.save_folder, 'distribute_0')
+    if args.distribute:
+        D.init()
+        device_id = get_rank()
+        device_num = get_group_size()
+        if device_id == 0 and not os.path.exists(ckpt_folder):
+            os.mkdir(ckpt_folder)
+
+        context.reset_auto_parallel_context()
+        context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True,
+                                          device_num=device_num)
+
+    else:
+        context.set_context(device_id=int(os.getenv('DEVICE_ID', '0')))
+
+    # Create train dataset
+    ds_train = create_train_dataset(cfg, args.batch_size, device_num, device_id, args.num_workers)
+
+    # Create val dataset
+    ds_val = create_val_dataset(args.val_mindrecord, args.batch_size, 1, 0, args.num_workers)
+
+    steps_per_epoch = ds_train.get_dataset_size()
+    net = build_net("train", cfg.NUM_CLASSES)
+
+    # load pretrained vgg16
+    vgg_params = load_checkpoint(args.basenet)
+    load_param_into_net(net.vgg, vgg_params)
+
+    network = NetWithLoss(net)
+    network.set_train(True)
+
+    if args.distribute:
+        milestone = cfg.DIS_LR_STEPS + [args.epoches * steps_per_epoch]
+    else:
+        milestone = cfg.LR_STEPS + [args.epoches * steps_per_epoch]
+
+    learning_rates = [args.lr, args.lr * 0.1, args.lr * 0.01, args.lr * 0.001]
+    lr_scheduler = nn.piecewise_constant_lr(milestone, learning_rates)
+
+    optimizer = nn.SGD(params=network.trainable_params(), learning_rate=lr_scheduler, momentum=args.momentum,
+
weight_decay=args.weight_decay) + + # train net + train_net = nn.TrainOneStepCell(network, optimizer) + train_net.set_train(True) + eval_net = EvalLoss(net) + + print("Start training net") + whole_step = 0 + for epoch in range(1, args.epoches+1): + step = 0 + time_list = [] + for d in ds_train.create_tuple_iterator(): + start_time = time.time() + loss = train_net(*d) + step += 1 + whole_step += 1 + print(f'epoch: {epoch} total step: {whole_step}, step: {step}, loss is {loss}') + per_time = time.time() - start_time + time_list.append(per_time) + + net.set_train(False) + if args.distribute and device_id == 0: + print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)") + val(epoch, eval_net, train_net, ds_val, ckpt_folder) + + elif not args.distribute: + print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)") + val(epoch, eval_net, train_net, ds_val, args.save_folder) + net.set_train(True) + +def val(epoch, eval_net, model, ds_val, ckpt_dir): + face_loss_list = [] + global MIN_LOSS + for (images, face_loc, face_conf, _, _) in ds_val.create_tuple_iterator(): + face_loss = eval_net(images, face_loc, face_conf) + face_loss_list.append(face_loss) + + a_loss = sum(face_loss_list) / len(face_loss_list) + if a_loss < MIN_LOSS: + MIN_LOSS = a_loss + print("Saving best ckpt, epoch is ", epoch) + save_checkpoint(model, os.path.join(ckpt_dir, f'pyramidbox_best_{epoch}.ckpt')) + + +if __name__ == '__main__': + train_args = parse_args() + set_seed(66) + if not os.path.exists(train_args.save_folder): + os.mkdir(train_args.save_folder) + train(train_args)
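`train()` builds its schedule with `nn.piecewise_constant_lr`, which simply expands milestones into a per-step learning-rate list; a toy illustration of the drop-by-10x pattern used above:

```python
from mindspore import nn

milestone = [2, 4, 6]               # stand-ins for cfg.LR_STEPS + the final step
learning_rates = [1e-3, 1e-4, 1e-5]

print(nn.piecewise_constant_lr(milestone, learning_rates))
# [0.001, 0.001, 0.0001, 0.0001, 1e-05, 1e-05]
```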