A Geometric Approach to Online Streaming Feature Selection

Salimeh Yasaei Sekeh; Madan Ravi Ganesh; Shurjo Banerjee; Jason J. Corso; Alfred O. Hero

Online Streaming Feature Selection (OSFS) is a sequential learning problem in which individual features across all samples are made available to algorithms in a streaming fashion. In this work, firstly, we assert that OSFS's main assumption of having data from all the samples available at runtime is unrealistic, and we introduce a new setting, called OSFS with Streaming Samples (OSFS-SS), in which features and samples are streamed concurrently. Secondly, the primary OSFS method, SAOLA, utilizes an unbounded mutual information measure and requires multiple comparison steps between the stored and incoming feature sets to evaluate a feature's importance. We introduce Geometric Online Adaption (GOA), an algorithm that requires relatively fewer feature comparison steps and uses a bounded conditional geometric dependency measure. Our algorithm outperforms several OSFS baselines, including SAOLA, on a variety of datasets. We also extend SAOLA to work in the OSFS-SS setting and show that GOA continues to achieve the best results. Thirdly, the current paradigm of OSFS algorithm comparison is flawed. Algorithms are measured by comparing the number of features used and the accuracy obtained by the learner, two properties that are fundamentally at odds with one another. Without fixing a limit on either of these properties, the qualities of the features obtained by different algorithms are incomparable. We try to rectify this inconsistency by fixing the maximum number of features available to the learner and comparing algorithms in terms of their accuracy. Additionally, we characterize the behaviour of SAOLA and GOA on feature sets derived from popular deep convolutional featurizers.
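The fixed-budget comparison described in the abstract can be illustrated with a minimal sketch. This is not GOA or SAOLA: the names `abs_pearson` and `select_with_budget` are hypothetical, absolute Pearson correlation is used only as a simple bounded stand-in for the paper's conditional geometric dependency measure, and no redundancy comparison between stored and incoming features is performed. It only shows features arriving one at a time while the selector never holds more than a fixed number of them.

```python
import math


def abs_pearson(x, y):
    """Bounded relevance score in [0, 1].

    Assumption: a stand-in for illustration only, NOT the paper's
    conditional geometric dependency measure.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no information
    return abs(sxy) / math.sqrt(sxx * syy)


def select_with_budget(feature_stream, labels, budget):
    """Single pass over streamed features, keeping at most `budget` of them."""
    kept = {}  # feature id -> relevance score
    for feature_id, values in feature_stream:
        score = abs_pearson(values, labels)
        if len(kept) < budget:
            kept[feature_id] = score
            continue
        # Budget is full: evict the weakest stored feature if the new one is better.
        weakest = min(kept, key=kept.get)
        if score > kept[weakest]:
            del kept[weakest]
            kept[feature_id] = score
    return kept


labels = [0, 0, 1, 1, 0, 1, 1, 0]
stream = [
    ("f_noise", [1, 0, 1, 0, 1, 0, 1, 0]),
    ("f_const", [2, 2, 2, 2, 2, 2, 2, 2]),
    ("f_signal", [0, 0, 1, 1, 0, 1, 1, 0]),
    ("f_inverted", [1, 1, 0, 0, 1, 0, 0, 1]),
]
selected = select_with_budget(stream, labels, budget=2)
print(sorted(selected))
```

With the budget held constant, two selection methods could then be compared by training the same learner on each method's selected features and comparing accuracy alone, which is the comparison the abstract argues for.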

updated: Mon Mar 16 2020 04:49:21 GMT+0000 (UTC)

published: Wed Oct 02 2019 19:36:46 GMT+0000 (UTC)

arXiv
