Paper Title
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models
Authors
Abstract
The ubiquitous use of machine learning algorithms brings new challenges to traditional database problems such as incremental view update. Much effort is being put into better understanding and debugging machine learning models, as well as into identifying and repairing errors in training datasets. Our focus is on how to assist these activities when they require retraining the machine learning model after removing problematic training samples during cleaning, or after selecting a different subset of the training data for interpretability. This paper presents an efficient provenance-based approach, PrIU, and its optimized version, PrIU-opt, for incrementally updating model parameters without sacrificing prediction accuracy. We prove the correctness and convergence of the incrementally updated model parameters, and validate them experimentally. Experimental results show that PrIU-opt achieves speed-ups of up to two orders of magnitude compared to simply retraining the model from scratch, while obtaining highly similar models.
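To make the "update instead of retrain" idea concrete, here is a minimal sketch for a toy case: in one-dimensional least-squares regression, the fitted slope and intercept depend only on a handful of sufficient statistics, so deleting a problematic training sample can be handled by subtracting its contribution in O(1) rather than refitting over all n points. This is not the PrIU algorithm itself (which uses provenance to incrementally update gradient-descent-trained regression models); the class name and structure below are illustrative assumptions only.

```python
# Minimal sketch of incremental model updating after sample deletion.
# NOT PrIU; it only illustrates the general principle on 1-D least squares.

class IncrementalOLS:
    """Keeps sufficient statistics so samples can be added or removed cheaply."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def remove(self, x, y):
        # Deleting a problematic sample: subtract its contribution
        # from the statistics instead of retraining from scratch.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def fit(self):
        # Closed-form slope and intercept from the current statistics.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept
```

For example, after adding clean points on the line y = 2x + 1 plus one outlier, calling `remove` on the outlier restores the clean fit exactly, without touching the remaining samples.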