Representing Prior Knowledge Using Randomly Weighted Feature Networks for Visual Relationship Detection

Jinyung Hong; Theodore P. Pavlic

The single-hidden-layer Randomly Weighted Feature Network (RWFN) introduced by Hong and Pavlic (2021) was developed as an alternative to neural tensor network approaches for relational learning tasks. Its relatively small footprint, combined with the use of two randomized input projections (an insect-brain-inspired input representation and random Fourier features), allows it to achieve rich expressiveness for relational learning at relatively low training cost. In particular, when Hong and Pavlic compared RWFNs to Logic Tensor Networks (LTNs) on Semantic Image Interpretation (SII) tasks, which extract structured semantic descriptions from images, they showed that the RWFN integration of the two hidden, randomized representations better captures relationships among inputs and trains faster, even though it uses far fewer learnable parameters. In this paper, we use RWFNs to perform Visual Relationship Detection (VRD) tasks, which are more challenging SII tasks. We use a zero-shot learning approach with RWFNs that exploits similarities with other seen relationships and background knowledge, expressed as logical constraints between subjects, relations, and objects, to predict triples that do not appear in the training set. Experiments on the Visual Relationship Dataset comparing RWFNs with LTNs, one of the leading Statistical Relational Learning frameworks, show that RWFNs outperform LTNs on the predicate-detection task while using fewer adaptable parameters (1:56 ratio). Furthermore, background knowledge represented by RWFNs can be used to alleviate the incompleteness of training sets, even though the space complexity of RWFNs is much smaller than that of LTNs (1:27 ratio).
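To make the idea of two fixed, randomized input projections more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the random Fourier features take the standard cosine form that approximates a Gaussian kernel, and it assumes the insect-brain-inspired representation can be stood in for by a sparse binary random projection followed by winner-take-all; the abstract does not specify either detail. All function names, dimensions, and hyperparameters (`sigma`, `fan_in`, `top_k`) are hypothetical.

```python
import math
import random


def make_rff(in_dim, n_features, sigma, rng):
    """Draw fixed random Fourier feature parameters (never trained)."""
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(in_dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return weights, phases


def rff(x, weights, phases):
    """Cosine features whose inner products approximate a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, phases)]


def make_sparse_projection(in_dim, n_units, fan_in, rng):
    """Assumed stand-in: each hidden unit sums a small random subset of inputs."""
    return [rng.sample(range(in_dim), fan_in) for _ in range(n_units)]


def sparse_wta(x, projection, top_k):
    """Sparse binary projection, then keep only the top_k activations."""
    acts = [sum(x[i] for i in idx) for idx in projection]
    order = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)
    keep = set(order[:top_k])
    return [a if j in keep else 0.0 for j, a in enumerate(acts)]


def hidden_features(x, rff_params, projection, top_k):
    """Concatenate both randomized representations.

    In this sketch only a linear readout on top of these features would be
    trained, which is why the number of learnable parameters stays small.
    """
    weights, phases = rff_params
    return rff(x, weights, phases) + sparse_wta(x, projection, top_k)


rng = random.Random(0)
x = [0.2, -0.5, 0.9, 0.1]  # hypothetical input feature vector
rff_params = make_rff(in_dim=4, n_features=8, sigma=1.0, rng=rng)
projection = make_sparse_projection(in_dim=4, n_units=6, fan_in=2, rng=rng)
h = hidden_features(x, rff_params, projection, top_k=2)
print(len(h))
```

Because both projections are drawn once and then frozen, the hidden representation costs nothing to train; under these assumptions, the small parameter count described in the abstract would come from fitting only the output weights.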

updated: Sat Nov 20 2021 21:56:45 GMT+0000 (UTC)

published: Sat Nov 20 2021 21:56:45 GMT+0000 (UTC)

arXiv
