RL-CSDia: Representation Learning of Computer Science Diagrams

Shaowei Wang; LingLing Zhang; Xuan Luo; Yi Yang; Xin Hu; Jun Liu

Recent studies in computer vision mainly focus on natural images that depict real-world scenes, and they achieve outstanding performance on diverse tasks such as visual question answering. A diagram is a special form of visual expression that appears frequently in education and is of great significance in helping learners understand multimodal knowledge. Current research on diagrams has focused preliminarily on natural disciplines such as Biology and Geography, whose diagrams still resemble natural images. Another type of diagram, such as those from Computer Science, is composed of graphics containing complex topologies and relations, and research on this type of diagram is still lacking. The main challenges in understanding graphic diagrams are data scarcity and semantic confusion, which are mainly reflected in the diversity of expressions. In this paper, we construct a novel dataset of graphic diagrams named Computer Science Diagrams (CSDia). It contains more than 1,200 diagrams with exhaustive annotations of objects and relations. Because the varied expressions in diagrams introduce visual noise, we introduce the topology of diagrams to parse their topological structure. We then propose the Diagram Parsing Net (DPN), which represents a diagram through three branches (topology, visual features, and text), and apply the model to the diagram classification task to evaluate its ability to understand diagrams. The results show the effectiveness of the proposed DPN for diagram understanding.
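
The abstract does not say how DPN encodes or fuses its three branches. As a rough illustration only, the sketch below assumes a diagram is stored as a list of objects plus directed relations between them, uses a normalized out-degree histogram as a stand-in topology feature, concatenates it with precomputed visual and text vectors (late fusion), and scores classes with a linear model. Every name here (`Diagram`, `topology_features`, `fuse`, `classify`) and the concatenation-based fusion are hypothetical choices for illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Diagram:
    # Hypothetical container: object ids, directed relations (src, dst) that
    # refer to those ids, and precomputed visual and text feature vectors.
    objects: List[str]
    relations: List[Tuple[str, str]]
    visual: List[float]
    text: List[float]


def topology_features(d: Diagram, max_degree: int = 4) -> List[float]:
    """Normalized out-degree histogram, a simple stand-in for a topology branch."""
    out_deg = {o: 0 for o in d.objects}
    for src, _dst in d.relations:
        out_deg[src] += 1
    hist = [0.0] * (max_degree + 1)
    for deg in out_deg.values():
        hist[min(deg, max_degree)] += 1.0  # degrees above max_degree share the last bin
    n = max(len(d.objects), 1)
    return [h / n for h in hist]


def fuse(d: Diagram) -> List[float]:
    # Assumed late fusion: concatenate topology, visual, and text branches.
    return topology_features(d) + d.visual + d.text


def classify(features: List[float],
             weights: Dict[str, List[float]],
             bias: Dict[str, float]) -> str:
    # Linear scorer: return the class with the highest w . x + b.
    scores = {c: sum(w * x for w, x in zip(ws, features)) + bias[c]
              for c, ws in weights.items()}
    return max(scores, key=scores.get)


# Usage: two toy diagrams with placeholder one-dimensional visual/text vectors.
linked_list = Diagram(["a", "b", "c"], [("a", "b"), ("b", "c")], [0.0], [0.0])
tree = Diagram(["root", "l", "r"], [("root", "l"), ("root", "r")], [0.0], [0.0])

# Hand-set toy weights: out-degree 1 suggests a list, out-degree 2 a binary tree.
W = {"linked_list": [0, 1, 0, 0, 0, 0, 0], "binary_tree": [0, 0, 1, 0, 0, 0, 0]}
B = {"linked_list": 0.0, "binary_tree": 0.0}

print(classify(fuse(linked_list), W, B))  # linked_list
print(classify(fuse(tree), W, B))         # binary_tree
```

In a learned model the weights and the per-branch encoders would be trained on annotated diagrams such as those in CSDia; the hand-set weights above only demonstrate how a topology signal can separate structurally different diagrams even when the visual and text vectors are identical.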

updated: Wed Mar 10 2021 07:01:07 GMT+0000 (UTC)

published: Wed Mar 10 2021 07:01:07 GMT+0000 (UTC)

arXiv