Efficient Online Processing with Deep Neural Networks

Lukas Hedegaard

The capabilities and adoption of deep neural networks (DNNs) are growing at an exhilarating pace: vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Exciting as these uses are, the continually increasing model sizes and computational complexities have a dark side. The economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to the improvement of neural network efficiency. Specifically, a core contribution addresses efficiency during online inference. Here, the concept of Continual Inference Networks (CINs) is proposed and explored across four publications. CINs extend prior state-of-the-art methods developed for offline processing of spatio-temporal data and reuse their pre-trained weights, improving their online processing efficiency by an order of magnitude. These advances are attained through a bottom-up computational reorganization and judicious architectural modifications. The benefit to online inference is demonstrated by reformulating several widely used network architectures into CINs, including 3D CNNs, ST-GCNs, and Transformer Encoders. An orthogonal contribution tackles the concurrent adaptation and computational acceleration of a large source model into multiple lightweight derived models. Drawing on fusible adapter networks and structured pruning, Structured Pruning Adapters achieve superior predictive accuracy under aggressive pruning while using significantly fewer learned weights than fine-tuning with pruning.

updated: Fri Jun 23 2023 12:29:44 GMT+0000 (UTC)

published: Fri Jun 23 2023 12:29:44 GMT+0000 (UTC)

arXiv
