Adapting to Skew: Imputing Spatiotemporal Urban Data with 3D Partial Convolutions and Biased Masking

Bin Han; Bill Howe

We adapt image inpainting techniques to impute large, irregular missing regions in urban settings characterized by sparsity, variance in both space and time, and anomalous events. Missing regions in urban data can be caused by sensor or software failures, data quality issues, interference from weather events, incomplete data collection, or varying data use regulations; any missing data can render the entire dataset unusable for downstream applications. To ensure coverage and utility, we adapt computer vision techniques for image inpainting to operate on 3D histograms (2D space + 1D time) commonly used for data exchange in urban settings. Adapting these techniques to the spatiotemporal setting requires handling skew: urban data tend to follow population density patterns (small dense regions surrounded by large sparse areas); these patterns can dominate the learning process and fool the model into ignoring local or transient effects. To combat skew, we 1) train simultaneously in space and time, and 2) focus attention on dense regions by biasing the masks used for training to the skew in the data. We evaluate the core model and these two extensions using the NYC taxi data and the NYC bikeshare data, simulating different conditions for missing data. We show that the core model is effective qualitatively and quantitatively, and that biased masking during training reduces error in a variety of scenarios. We also articulate a tradeoff in varying the number of timesteps per training sample: too few timesteps and the model ignores transient events; too many timesteps and the model is slow to train with limited performance gain.
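The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`biased_mask`, `partial_conv3d`), the box-shaped hidden region, and the single-channel, unpadded convolution are all assumptions made for illustration; the abstract describes irregular masks and a full model. The first function samples the centre of a hidden region with probability proportional to the histogram counts, so masks land on dense cells more often. The second applies the standard partial-convolution rule: convolve only over observed cells, rescale by the fraction of observed cells in the window, and mark an output cell as valid if its window contained at least one observed cell.

```python
import numpy as np


def biased_mask(hist, half=(1, 2, 2), rng=None):
    """Return a mask over a (time, height, width) histogram.

    1 = observed, 0 = hidden. One box-shaped region is hidden; its centre
    is drawn with probability proportional to the counts in `hist`, so
    dense cells are masked more often (an assumed, simplified form of
    biased masking).
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = hist.astype(float).ravel()
    total = weights.sum()
    # Fall back to uniform sampling when the histogram is empty.
    if total > 0:
        probs = weights / total
    else:
        probs = np.full(weights.size, 1.0 / weights.size)
    flat_index = rng.choice(weights.size, p=probs)
    centre = np.unravel_index(flat_index, hist.shape)
    mask = np.ones(hist.shape, dtype=float)
    # Clip the box to the histogram bounds along each axis.
    box = tuple(
        slice(max(c - h, 0), min(c + h + 1, n))
        for c, h, n in zip(centre, half, hist.shape)
    )
    mask[box] = 0.0
    return mask


def partial_conv3d(x, mask, kernel, bias=0.0):
    """Single-channel 3D partial convolution without padding.

    Only observed cells (mask == 1) contribute. Each output is rescaled by
    kernel.size / (number of observed cells in the window). Returns the
    output and the updated mask.
    """
    kt, kh, kw = kernel.shape
    t_len, height, width = x.shape
    out_shape = (t_len - kt + 1, height - kh + 1, width - kw + 1)
    out = np.zeros(out_shape)
    new_mask = np.zeros(out_shape)
    for t, i, j in np.ndindex(*out_shape):
        win = (slice(t, t + kt), slice(i, i + kh), slice(j, j + kw))
        valid = mask[win].sum()
        if valid > 0:
            scale = kernel.size / valid
            out[t, i, j] = (kernel * x[win] * mask[win]).sum() * scale + bias
            new_mask[t, i, j] = 1.0
    return out, new_mask
```

In a training loop, one would presumably draw a fresh `biased_mask` per sample, multiply the input histogram by it, and pass both through stacked partial-convolution layers; the layer count, loss, and number of timesteps per sample are not specified in the abstract and are left open here.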

updated: Tue Jan 10 2023 22:44:22 GMT+0000 (UTC)

published: Tue Jan 10 2023 22:44:22 GMT+0000 (UTC)

arXiv
