Shuffle Transformer with Feature Alignment for Video Face Parsing

Rui Zhang; Yang Han; Zilong Huang; Pei Cheng; Guozhong Luo; Gang Yu; Bin Fu

This short technical report presents Team TCParser's solution for the Short-video Face Parsing Track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021. We introduce a strong backbone, a cross-window based Shuffle Transformer, for learning accurate face parsing representations. To obtain finer segmentation results, especially along edges, we introduce a Feature Alignment Aggregation (FAA) module, which effectively relieves the feature misalignment caused by multi-resolution feature aggregation. Benefiting from the stronger backbone and better feature aggregation, the proposed method achieves a score of 86.9519% in the Short-video Face Parsing Track of the 3rd Person in Context (PIC) Workshop and Challenge, ranking first.
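The abstract does not spell out how the cross-window connection is built, so the following is only an illustrative sketch of one common way a shuffle can mix information across windows, reduced to a 1D token sequence. The function names `regular_windows`, `shuffled_windows`, and `unshuffle` are hypothetical and are not taken from the report; the strided grouping is an assumption about what "shuffle" means here.

```python
def regular_windows(tokens, window_size):
    """Split tokens into consecutive, non-overlapping windows."""
    if len(tokens) % window_size != 0:
        raise ValueError("sequence length must be divisible by window_size")
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def shuffled_windows(tokens, window_size):
    """Assumed shuffle: each window gathers tokens at a fixed stride,
    so one window holds tokens drawn from several regular windows."""
    if len(tokens) % window_size != 0:
        raise ValueError("sequence length must be divisible by window_size")
    num_windows = len(tokens) // window_size
    return [tokens[j::num_windows] for j in range(num_windows)]


def unshuffle(windows):
    """Invert shuffled_windows and restore the original token order."""
    num_windows = len(windows)
    window_size = len(windows[0])
    tokens = [None] * (num_windows * window_size)
    for j, window in enumerate(windows):
        for k, token in enumerate(window):
            tokens[j + k * num_windows] = token
    return tokens


tokens = list(range(8))
print(regular_windows(tokens, 4))
print(shuffled_windows(tokens, 4))
```

In this sketch, attention restricted to a regular window only sees neighbouring tokens, while attention over a shuffled window sees tokens spread across the whole sequence; alternating the two is one way to let information cross window boundaries.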

updated: Wed Jun 16 2021 09:25:33 GMT+0000 (UTC)

published: Wed Jun 16 2021 09:25:33 GMT+0000 (UTC)

arXiv
