Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond

Tianyuan Huang; Zhecheng Wang; Hao Sheng; Andrew Y. Ng; Ram Rajagopal

Recent urbanization has coincided with the enrichment of geotagged data, such as street view imagery and points of interest (POI). Region embeddings enhanced by these richer data modalities have enabled researchers and city administrators to better understand the built environment, socioeconomics, and the dynamics of cities. While some efforts have been made to use multi-modal inputs simultaneously, existing methods can be improved by incorporating different measures of 'proximity' in the same embedding space, leveraging not only the data that characterize the regions (e.g., street view, local business patterns) but also the data that depict the relationships between regions (e.g., trips, road networks). To this end, we propose a novel approach that integrates multi-modal geotagged inputs as either node or edge features of a multi-graph, based on their relations with the neighborhood regions (e.g., tiles, census blocks, ZIP code regions). We then learn the neighborhood representation from the multi-graph using a contrastive-sampling scheme. Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationships between neighborhoods (directed edges). We show the effectiveness of the proposed method with quantitative downstream tasks as well as qualitative analysis of the embedding space: the embedding we trained outperforms those using only unimodal data as regional inputs.
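The abstract does not spell out the sampling procedure or the loss, so the following is only a minimal sketch of what contrastive sampling over directed mobility edges could look like, not the authors' implementation. It assumes a triplet-style objective: for an anchor neighborhood, a positive is drawn from its outgoing mobility edges in proportion to trip counts, and a negative is drawn from neighborhoods it has no outgoing edge to. The toy edge list, the function names, and the margin value are all hypothetical; in practice the embeddings would come from an encoder over street view and POI node features.

```python
import random

# Hypothetical toy data: directed mobility edges (origin, destination, trip count).
MOBILITY_EDGES = [
    ("A", "B", 120),
    ("A", "C", 30),
    ("B", "A", 80),
    ("C", "D", 50),
    ("D", "C", 10),
]


def build_out_neighbors(edges):
    """Map each origin to a dict of destination -> total trip count."""
    out = {}
    for origin, dest, trips in edges:
        out.setdefault(origin, {})
        out[origin][dest] = out[origin].get(dest, 0) + trips
    return out


def sample_triplet(anchor, out_neighbors, all_nodes, rng):
    """Return (anchor, positive, negative), or None if no candidates exist.

    Assumption (not from the paper): positives are destinations sampled in
    proportion to trip counts; negatives are sampled uniformly from nodes
    the anchor has no outgoing edge to.
    """
    neighbors = out_neighbors.get(anchor, {})
    if not neighbors:
        return None
    dests = list(neighbors)
    weights = [neighbors[d] for d in dests]
    positive = rng.choices(dests, weights=weights, k=1)[0]
    candidates = [n for n in all_nodes if n != anchor and n not in neighbors]
    if not candidates:
        return None
    negative = rng.choice(candidates)
    return anchor, positive, negative


def triplet_loss(embeddings, triplet, margin=1.0):
    """Hinge loss on squared Euclidean distances for one triplet."""
    a, p, n = (embeddings[name] for name in triplet)
    d_pos = sum((x - y) ** 2 for x, y in zip(a, p))
    d_neg = sum((x - y) ** 2 for x, y in zip(a, n))
    return max(0.0, d_pos - d_neg + margin)
```

In this sketch the loss is zero once the positive is closer to the anchor than the negative by at least the margin, which pulls neighborhoods linked by frequent trips together in the embedding space.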

updated: Thu May 06 2021 07:44:05 GMT+0000 (UTC)

published: Thu May 06 2021 07:44:05 GMT+0000 (UTC)

arXiv
