Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks

Paper title

Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks

Authors

Germán García-Jara, Pavlos Protopapas, Pablo A. Estévez

Abstract

Due to the latest advances in technology, telescopes with significant sky coverage will produce millions of astronomical alerts per night that must be classified both rapidly and automatically. Currently, classification relies on supervised machine learning algorithms whose performance is limited by the number of existing annotations of astronomical objects and by their highly imbalanced class distributions. In this work, we propose a data augmentation methodology based on Generative Adversarial Networks (GANs) to generate a variety of synthetic light curves from variable stars. Our novel contributions, consisting of a resampling technique and an evaluation metric, can assess the quality of generative models in unbalanced datasets and identify GAN-overfitting cases that the Fréchet Inception Distance does not reveal. We applied our proposed model to two datasets taken from the Catalina and Zwicky Transient Facility surveys. The classification accuracy of variable stars improves significantly when training with synthetic data and testing with real data, compared with using only real data.
