Multimodal Side-Tuning for Document Classification

Stefano Pio Zingaro; Giuseppe Lisanti; Maurizio Gabbrielli

In this paper, we propose to exploit the side-tuning framework for multimodal document classification. Side-tuning is a recently introduced methodology for network adaptation that addresses some of the problems of previous approaches. In particular, it makes it possible to overcome the model rigidity and catastrophic forgetting that affect transfer learning by fine-tuning. The proposed solution uses off-the-shelf deep learning architectures and leverages the side-tuning framework to combine a base model with a tandem of two side networks. We show that side-tuning can also be successfully employed when different data sources are considered, e.g. text and images in document classification. The experimental results show that this approach further pushes the limit of document classification accuracy with respect to the state of the art.
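The abstract does not spell out how the base model and the two side networks are merged, so the following is only a minimal sketch under stated assumptions: the three networks are taken to produce feature vectors of equal length, and they are fused by a weighted sum whose coefficients add up to one. The function name `side_tune_combine`, the weights, and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def side_tune_combine(
    base_out: Sequence[float],
    side_outs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Fuse a base output with side-network outputs by a weighted sum.

    Assumption (not stated in the abstract): the weights form a convex
    combination, with weights[0] applied to the base model and the rest
    applied to the side networks in order.
    """
    if len(weights) != len(side_outs) + 1:
        raise ValueError("need one weight for the base plus one per side network")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    outputs = [base_out, *side_outs]
    dim = len(base_out)
    if any(len(out) != dim for out in outputs):
        raise ValueError("all outputs must have the same length")
    # Element-wise weighted sum over the base and side outputs.
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


# Toy feature vectors standing in for real network activations (hypothetical).
base_features = [1.0, 0.0]        # base model output
image_side_features = [0.0, 1.0]  # side network fed with the page image
text_side_features = [1.0, 1.0]   # side network fed with the page text

fused = side_tune_combine(
    base_features,
    [image_side_features, text_side_features],
    weights=[0.5, 0.25, 0.25],
)
print(fused)
```

In an actual model, the fused vector would feed a classification layer, and only the side networks would typically be updated during training while the base model stays fixed; that training loop is outside the scope of this sketch.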

updated: Mon Jan 23 2023 14:28:15 GMT+0000 (UTC)

published: Mon Jan 16 2023 11:08:03 GMT+0000 (UTC)

arXiv
