Paper Title

Review of Probability Distributions for Modeling Count Data

Author

Townes, F. William

Abstract

Count data take on non-negative integer values and are challenging to properly analyze using standard linear-Gaussian methods such as linear regression and principal components analysis. Generalized linear models enable direct modeling of counts in a regression context using distributions such as the Poisson and negative binomial. When counts contain only relative information, multinomial or Dirichlet-multinomial models can be more appropriate. We review some of the fundamental connections between multinomial and count models from probability theory, providing detailed proofs. These relationships are useful for methods development in applications such as topic modeling of text data and genomics.
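One classic example of a connection between count and multinomial models is a standard textbook result, offered here as an illustration and not necessarily in the form the paper states it. If Y_1, ..., Y_K are independent with Y_k ~ Poisson(lambda_k), then, conditional on their total n, the vector (Y_1, ..., Y_K) is multinomial with n trials and probabilities lambda_k / sum_j lambda_j. The minimal sketch below checks this numerically using exact pmf arithmetic from Python's standard library. The function names are hypothetical and do not come from the paper.

```python
import math

def poisson_pmf(k, lam):
    # P(Y = k) for Y ~ Poisson(lam)
    return math.exp(-lam) * lam**k / math.factorial(k)

def multinomial_pmf(counts, probs):
    # P(counts) for Multinomial(n = sum(counts), probs)
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # each partial quotient is an exact integer
    p = float(coef)
    for c, pi in zip(counts, probs):
        p *= pi**c
    return p

def conditional_poisson_pmf(counts, rates):
    # P(Y = counts | sum(Y) = n), where Y_k ~ Poisson(rates[k]) independently
    n = sum(counts)
    joint = 1.0
    for c, lam in zip(counts, rates):
        joint *= poisson_pmf(c, lam)
    # The sum of independent Poissons is Poisson with the summed rate
    total = poisson_pmf(n, sum(rates))
    return joint / total

if __name__ == "__main__":
    rates = [1.0, 2.0, 3.0]
    probs = [r / sum(rates) for r in rates]
    counts = [1, 2, 3]
    print(conditional_poisson_pmf(counts, rates))
    print(multinomial_pmf(counts, probs))
```

Both printed values agree up to floating-point rounding (about 0.1389, i.e. 5/36). The absolute scale of the rates drops out after conditioning and only their proportions matter. This is why multinomial models suit counts that carry only relative information.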
