STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings

Nathan Beck; Suraj Kothawade; Pradeep Shenoy; Rishabh Iyer

Deep neural networks have consistently shown strong performance in real-world use cases such as autonomous vehicles and satellite imaging by effectively leveraging large corpora of labeled training data. However, learning unbiased models depends on building a dataset that is representative of a diverse range of realistic scenarios for a given task. This is challenging in many settings where data comes from high-volume streams, with each scenario occurring in randomly interleaved episodes at varying frequencies. We study realistic streaming settings where data instances arrive in, and are sampled from, an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data via a three-step procedure of slice identification, slice-aware budgeting, and data selection. We extensively evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks. We observe that STREAMLINE improves performance on infrequent yet critical slices of the data over current baselines by up to 5% in accuracy on our image classification tasks and by up to 8% in mAP on our object detection tasks.
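To make the ideas of slice-aware budgeting and submodular data selection more concrete, here is a minimal Python sketch. It is not the STREAMLINE algorithm: the abstract does not give the budgeting rule or the specific submodular information measures, so both functions are assumptions chosen for illustration. The hypothetical `inverse_frequency_budget` gives rarer slices a larger share of the labeling budget using a simple 1 / (count + 1) weighting, and the hypothetical `greedy_facility_location` stands in for the selection step with greedy maximization of the standard facility location function, a well-known submodular objective.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def inverse_frequency_budget(labeled_counts: Dict[str, int],
                             total_budget: int) -> Dict[str, int]:
    """Split a labeling budget so that rarer slices receive a larger share.

    Assumption for illustration only: each slice is weighted by
    1 / (labeled count + 1). The paper's actual rule may differ.
    """
    # Exact rational weights avoid floating-point rounding surprises.
    weights = {name: Fraction(1, count + 1)
               for name, count in labeled_counts.items()}
    total_weight = sum(weights.values())
    budget = {name: int(total_budget * w / total_weight)
              for name, w in weights.items()}
    # Flooring can leave a few units unassigned; give them to the rarest slices.
    remainder = total_budget - sum(budget.values())
    for name in sorted(labeled_counts, key=labeled_counts.get)[:remainder]:
        budget[name] += 1
    return budget


def greedy_facility_location(similarity: Sequence[Sequence[float]],
                             k: int) -> List[int]:
    """Greedily pick up to k indices maximizing sum_i max_{j in S} similarity[i][j].

    Assumes a square matrix of non-negative similarities. Facility location
    is a stand-in here, not the measure used in the paper.
    """
    n = len(similarity)
    best_cover = [0.0] * n  # best similarity of each point to the chosen set
    chosen: List[int] = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain: how much candidate j improves each point's coverage.
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0)
                       for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
        best_cover = [max(best_cover[i], similarity[i][best_j])
                      for i in range(n)]
    return chosen
```

In a streaming loop, one might first assign each incoming unlabeled batch to a slice, call `inverse_frequency_budget` on the current labeled counts per slice, and then call `greedy_facility_location` on that batch's similarity matrix with the budget for the identified slice. How slices are identified is left open here, since the abstract does not describe it.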

updated: Thu May 18 2023 02:01:45 GMT+0000 (UTC)

published: Thu May 18 2023 02:01:45 GMT+0000 (UTC)

arXiv
