Paper Title
How useful is Active Learning for Image-based Plant Phenotyping?
Paper Authors
Paper Abstract
Deep learning models have been successfully deployed for a diverse array of image-based plant phenotyping applications, including disease detection and classification. However, successful deployment of supervised deep learning models requires a large amount of labeled data, which is a significant challenge in plant science (and most biological) domains because of their inherent complexity. Specifically, data annotation is costly, laborious, and time-consuming, and it requires domain expertise for phenotyping tasks, especially for diseases. To overcome this challenge, active learning algorithms have been proposed that reduce the amount of labeling deep learning models need to achieve good predictive performance. Active learning methods adaptively select samples to annotate using an acquisition function, with the goal of achieving maximum (classification) performance under a fixed labeling budget. We report the performance of four active learning methods, (1) Deep Bayesian Active Learning (DBAL), (2) Entropy, (3) Least Confidence, and (4) Coreset, compared with conventional random sampling-based annotation on two image-based classification datasets. The first dataset consists of soybean [Glycine max (L.) Merr.] leaves belonging to eight different soybean stress classes and a healthy class, and the second consists of nine different weed species from the field. For a fixed labeling budget, we observed that the classification performance of deep learning models with active learning-based acquisition strategies is better than that of random sampling-based acquisition for both datasets. Integrating active learning strategies into data annotation can help mitigate labeling challenges in plant science applications, particularly where deep domain knowledge is required.
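To make the idea of an acquisition function concrete, the following is a minimal sketch of two of the uncertainty-based strategies named in the abstract, Entropy and Least Confidence, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `budget` parameter, and the example probabilities are all hypothetical, and the softmax outputs are assumed to come from some already-trained classifier.

```python
import numpy as np


def entropy_scores(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    eps = 1e-12  # small constant to avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)


def least_confidence_scores(probs):
    """One minus the top class probability; higher means more uncertain."""
    return 1.0 - probs.max(axis=1)


def select_batch(scores, budget):
    """Indices of the `budget` highest-scoring unlabeled samples."""
    return np.argsort(-scores, kind="stable")[:budget]


# Hypothetical softmax outputs for four unlabeled images over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # close to uniform, most uncertain
])

# Pick the two samples each strategy would send to an annotator.
entropy_pick = select_batch(entropy_scores(probs), budget=2)
least_conf_pick = select_batch(least_confidence_scores(probs), budget=2)
print(entropy_pick, least_conf_pick)
```

In this toy example both strategies choose the same two images, the near-uniform one and the fairly uncertain one. Random sampling, by contrast, would pick any two of the four with equal probability, regardless of how confident the model already is.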