GPU-based Data-parallel Rendering of Large, Unstructured, and Non-convexly Partitioned Data

Paper Title

GPU-based Data-parallel Rendering of Large, Unstructured, and Non-convexly Partitioned Data

Authors

Alper Sahistan, Serkan Demirci, Ingo Wald, Stefan Zellmann, João Barbosa, Nathan Morrical, Uğur Güdükbay

Abstract

Computational fluid dynamics simulations often produce large clusters of finite elements with non-trivial, non-convex boundaries and uneven distributions among compute nodes, posing challenges to compositing during interactive volume rendering. Correct, in-place visualization of such clusters becomes difficult because viewing rays straddle domain boundaries across multiple compute nodes. We propose a GPU-based, scalable, memory-efficient direct volume visualization framework suitable for in situ and post hoc usage. Our approach reduces the memory usage of the unstructured volume elements by leveraging an exclusive-or (XOR) based index reduction scheme and provides fast ray-marching-based traversal without requiring large external data structures built over the elements themselves. Moreover, we present a GPU-optimized deep compositing scheme that allows order-correct compositing of intermediate color values accumulated across different ranks and that works even for non-convex clusters. Our method scales well on large data-parallel systems and achieves interactive frame rates during visualization. We can interactively render both the Fun3D Small Mars Lander (14 GB / 798.4 million finite elements) and Huge Mars Lander (111.57 GB / 6.4 billion finite elements) data sets at 14 and 10 frames per second using 72 and 80 GPUs, respectively, on TACC's Frontera supercomputer.
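The abstract names two ideas that a small example may make more concrete: an XOR-based index reduction and order-correct deep compositing across ranks. The Python sketch below is only an illustration under stated assumptions and is not the paper's implementation. The `Fragment` layout, the function names, and the premise that ray segments from different ranks do not overlap in depth are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


# --- XOR index trick (illustrative; not the paper's actual memory layout) ---
def xor_of_vertices(v0: int, v1: int, v2: int, v3: int) -> int:
    """Fold the four vertex indices of a tetrahedron into one XOR value."""
    return v0 ^ v1 ^ v2 ^ v3


def recover_fourth_vertex(xor_sum: int, a: int, b: int, c: int) -> int:
    """Given the three vertices of a known face, recover the opposite vertex.

    Works because x ^ x == 0, so XOR-ing out a, b and c leaves the fourth index.
    """
    return xor_sum ^ a ^ b ^ c


# --- Order-correct deep compositing for one pixel (illustrative) ---
@dataclass
class Fragment:
    depth: float                     # entry depth of the ray segment
    rgb: Tuple[float, float, float]  # premultiplied color accumulated over the segment
    alpha: float                     # opacity accumulated over the segment


def composite_pixel(fragments_per_rank: List[List[Fragment]]) -> Tuple[float, float, float, float]:
    """Merge the fragments of all ranks, sort by depth, and blend front to back.

    Assumption: segments from different ranks are disjoint in depth, so sorting
    by entry depth yields the correct visibility order.
    """
    merged = sorted(
        (f for frags in fragments_per_rank for f in frags),
        key=lambda f: f.depth,
    )
    r = g = b = a = 0.0
    for f in merged:
        t = 1.0 - a  # transmittance remaining in front of this fragment
        r += t * f.rgb[0]
        g += t * f.rgb[1]
        b += t * f.rgb[2]
        a += t * f.alpha
    return (r, g, b, a)


# A non-convex partition: rank A owns two segments along the ray, and the
# segment owned by rank B lies between them.
rank_a = [Fragment(1.0, (0.5, 0.0, 0.0), 0.5), Fragment(3.0, (0.0, 0.0, 0.5), 0.5)]
rank_b = [Fragment(2.0, (0.0, 0.5, 0.0), 0.5)]
pixel = composite_pixel([rank_a, rank_b])
```

In this example a ray enters rank A's domain, leaves it, passes through rank B's domain, and re-enters rank A's domain. Blending each rank to a single color first and then combining the two results cannot reproduce the interleaved order, which is why the sketch keeps one fragment per segment and sorts them before blending.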
