Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
This work introduces an enhanced approach to scene graph generation that incorporates both a relationship hierarchy and commonsense knowledge. We first propose a hierarchical relation head that exploits an informative hierarchical structure: it jointly predicts the relation super-category between each object pair in an image and the detailed relation under each super-category. We then implement a robust commonsense validation pipeline that uses foundation models to critique the output of the scene graph prediction system, removing nonsensical predicates even when only a small language-only model is used. Extensive experiments on the Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be integrated as plug-and-play enhancements to existing scene graph generation algorithms. The results show significant improvements, along with an extensive set of reasonable predictions that go beyond the dataset annotations. Code is available at https://github.com/bowen-upenn/scene_graph_commonsense.
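The two modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the super-category grouping, the predicate lists, the product-of-probabilities scoring rule, and the `critic` callable (standing in for a foundation model) are all assumptions made for illustration.

```python
import math

# Hypothetical grouping of predicates into super-categories (illustrative only).
SUPER_CATEGORIES = {
    "geometric": ["above", "behind", "under"],
    "possessive": ["has", "wearing"],
    "semantic": ["eating", "riding", "holding"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_scores(super_logits, fine_logits):
    """Score each predicate as P(super-category) * P(predicate | super-category).

    super_logits: one logit per super-category, in SUPER_CATEGORIES order.
    fine_logits: dict mapping super-category name to logits over its predicates.
    This joint scoring rule is an assumption, not taken from the abstract.
    """
    p_super = softmax(super_logits)
    scores = {}
    for k, (name, predicates) in enumerate(SUPER_CATEGORIES.items()):
        p_fine = softmax(fine_logits[name])
        for predicate, p in zip(predicates, p_fine):
            scores[predicate] = p_super[k] * p
    return scores


def validate(triplets, critic):
    """Keep only the (subject, predicate, object) triplets the critic accepts.

    `critic` is a hypothetical callable; in practice it would wrap a query
    to a language model asking whether the triplet is plausible.
    """
    return [t for t in triplets if critic(t)]
```

Because each predicate's score is the product of two normalized distributions, the scores over all predicates sum to one, so predictions from different super-categories can be ranked against each other. The validation step is a plain filter, which is what would let it be attached to the output of an existing scene graph generator without retraining.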
updated: Tue Jul 16 2024 04:39:05 GMT+0000 (UTC)
published: Tue Nov 21 2023 06:03:20 GMT+0000 (UTC)