Reconstruction of Incomplete Wildfire Data using Deep Generative Models

Title

Reconstruction of Incomplete Wildfire Data using Deep Generative Models

Authors

Tomislav Ivek, Domagoj Vlah

Abstract

We present our submission to the Extreme Value Analysis 2021 Data Challenge in which teams were asked to accurately predict distributions of wildfire frequency and size within spatio-temporal regions of missing data. For the purpose of this competition we developed a variant of the powerful variational autoencoder models dubbed the Conditional Missing data Importance-Weighted Autoencoder (CMIWAE). Our deep latent variable generative model requires little to no feature engineering and does not necessarily rely on the specifics of scoring in the Data Challenge. It is fully trained on incomplete data, with the single objective to maximize log-likelihood of the observed wildfire information. We mitigate the effects of the relatively low number of training samples by stochastic sampling from a variational latent variable distribution, as well as by ensembling a set of CMIWAE models trained and validated on different splits of the provided data. The presented approach is not domain-specific and is amenable to application in other missing data recovery tasks with tabular or image-like information conditioned on auxiliary information.
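The abstract names two ingredients: a likelihood evaluated only on observed entries, and an importance-weighted objective. The following is a minimal sketch of how these are commonly computed, not the authors' implementation. The function names are hypothetical. It also assumes that each log-weight has the form log p(x_obs | z_k, c) + log p(z_k | c) - log q(z_k | x_obs, c) for K latent samples z_k and auxiliary information c. The abstract does not spell out this form.

```python
import math


def masked_log_likelihood(log_p, observed):
    """Sum per-entry log-likelihoods over observed entries only.

    Entries flagged as missing are skipped, so they contribute nothing
    to the training objective (hypothetical helper for illustration).
    """
    return sum(lp for lp, obs in zip(log_p, observed) if obs)


def importance_weighted_bound(log_w):
    """Return log((1/K) * sum_k exp(log_w[k])) via a stable log-sum-exp.

    log_w holds the K log importance weights of a single data point.
    """
    m = max(log_w)
    total = sum(math.exp(lw - m) for lw in log_w)
    return m + math.log(total) - math.log(len(log_w))


# Missing entries (here NaN) are ignored by the mask.
ll = masked_log_likelihood([-1.0, float("nan"), -3.0], [True, False, True])

# With K = 1 the bound reduces to the single log-weight (the standard ELBO term).
single = importance_weighted_bound([-2.5])
```

By Jensen's inequality the K-sample value is never smaller than the plain average of the log-weights, which is why larger K gives a tighter lower bound on the log-likelihood of the observed data.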
