Knowledge driven Description Synthesis for Floor Plan Interpretation

Shreya Goyal; Chiranjoy Chattopadhyay; Gaurav Bhatnagar

Image captioning is a well-known problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods in the literature generate captions or semi-structured descriptions from floor plan images. Because a caption alone cannot capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions follow a rigid structure and lack flexibility, which makes them difficult to use in real-time scenarios. To fill these gaps, this paper proposes two models for floor plan image-to-text generation: Description Synthesis from Image Cue (DSIC) and Transformer Based Description Generation (TBDG). Both models use modern deep neural networks for visual feature extraction and text generation, and they differ in how they take input from the floor plan image. DSIC uses only visual features extracted automatically by a deep neural network, whereas TBDG learns from textual captions extracted from the input floor plan images together with their paragraphs. Generating specific keywords and learning them alongside the paragraphs makes TBDG more robust on general floor plan images. Experiments on a large-scale, publicly available dataset, with comparisons against state-of-the-art techniques, demonstrate the superiority of the proposed approach.
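The abstract distinguishes the two models mainly by their input. As a rough, non-authoritative illustration of that difference (not the authors' architecture), the sketch below makes two assumptions. First, a DSIC-style path reduces a CNN feature map to a single visual vector by global average pooling. Second, a TBDG-style path contextualizes keyword embeddings with scaled dot-product self-attention, the core operation of a transformer. The function names, the pooling choice, and the omission of learned query/key/value projections are hypothetical simplifications.

```python
import math
from typing import List

Vector = List[float]


def global_average_pool(feature_map: List[List[Vector]]) -> Vector:
    """Assumed DSIC-style input: collapse an H x W x C feature map
    into one C-dimensional visual feature vector by averaging."""
    if not feature_map or not feature_map[0]:
        raise ValueError("feature map must be non-empty")
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    pooled = [0.0] * c
    for row in feature_map:
        for cell in row:
            for k in range(c):
                pooled[k] += cell[k]
    return [v / (h * w) for v in pooled]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Assumed TBDG-style input: scaled dot-product self-attention over
    keyword embeddings. For simplicity, queries, keys and values are the
    embeddings themselves (a real transformer uses learned projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(wt * v[j] for wt, v in zip(weights, tokens)) for j in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical 2 x 2 x 2 feature map standing in for CNN output.
    print(global_average_pool([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]))
    # Hypothetical embeddings for the keywords "bedroom" and "kitchen".
    print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

With the hypothetical "bedroom" and "kitchen" embeddings, each output vector is a weighted mix of both keyword embeddings, tilted toward the keyword itself. That mixing is what lets an attention-based model relate keywords to one another, whereas the pooled vector is a single summary of the image with no token structure.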

updated: Mon Mar 15 2021 11:57:18 GMT+0000 (UTC)

published: Mon Mar 15 2021 11:57:18 GMT+0000 (UTC)

arXiv