Paper Title

Depthwise Discrete Representation Learning

Paper Authors

Fostiropoulos, Iordanis

Paper Abstract


Recent advancements in learning Discrete Representations as opposed to continuous ones have led to state-of-the-art results in tasks that involve Language, Audio and Vision. Some latent factors, such as words, phonemes and shapes, are better represented by discrete latent variables than by continuous ones. Vector Quantized Variational Autoencoders (VQVAE) have produced remarkable results in multiple domains. VQVAE learns a prior distribution $z_e$ along with its mapping to a discrete number of $K$ vectors (Vector Quantization). We propose applying VQ along the feature axis. We hypothesize that by doing so, we are learning a mapping between the codebook vectors and the marginal distribution of the prior feature space. Our approach leads to a 33\% improvement over previous discrete models and has performance similar to state-of-the-art auto-regressive models (e.g. PixelSNAIL). We evaluate our approach on a static prior using an artificial toy dataset (blobs). We further evaluate our approach on benchmarks for CIFAR-10 and ImageNet.
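Since only the abstract is available here, the following is a minimal sketch of what "applying VQ along the feature axis" could mean, contrasted with the standard VQVAE lookup. All shapes, names (`vector_quantize`, `codebook_depthwise`), and the depthwise reshaping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Nearest-neighbour codebook lookup (standard VQ-VAE style).

    z_e:      flattened encoder output, shape (N, D)
    codebook: K embedding vectors, shape (K, D)
    Returns the quantized vectors and the selected codebook indices.
    """
    # Squared Euclidean distance between every encoder vector and every code
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    indices = dists.argmin(axis=1)                                   # (N,)
    return codebook[indices], indices

# Toy encoder output: batch 2, 4x4 spatial grid, D = 8 feature channels
B, H, W, D, K = 2, 4, 4, 8, 16
rng = np.random.default_rng(0)
z_e = rng.normal(size=(B, H, W, D))

# Standard VQVAE: quantize the D-dimensional vector at every spatial position.
codebook_spatial = rng.normal(size=(K, D))
z_q, idx = vector_quantize(z_e.reshape(-1, D), codebook_spatial)
z_q = z_q.reshape(B, H, W, D)

# Depthwise variant (one reading of "VQ along the feature axis"):
# treat each feature channel's H*W map as the vector to quantize, so the
# codebook lives in R^{H*W} and one code is chosen per feature channel.
codebook_depthwise = rng.normal(size=(K, H * W))          # hypothetical shape
z_e_depth = z_e.transpose(0, 3, 1, 2).reshape(-1, H * W)  # (B*D, H*W)
z_q_depth, idx_depth = vector_quantize(z_e_depth, codebook_depthwise)
z_q_depth = z_q_depth.reshape(B, D, H, W).transpose(0, 2, 3, 1)
```

Under this reading, the codebook entries index feature channels rather than spatial positions, which is consistent with the abstract's hypothesis that the codes capture the marginal distribution of the prior feature space.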
