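Supervising institution: National University of Defense Technology (国防科技大学)
Sponsor: College of Computer Science, National University of Defense Technology (国防科技大学计算机学院)
ISSN: 1007-130X
CN: 43-1258/TP
Founded: 1973
Journal: Computer Engineering & Science (计算机工程与科学)
Subject category: Information Science and Technology
Frequency: Monthly
Review cycle: 1-3 months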
Postal distribution code: 42-153
Address: Changsha, Hunan Province
Postal code: 410073
As general-purpose graphics processing units (GPGPUs) become widely used in high-performance computing, new concurrent execution models have been proposed. Under these models, existing memory scheduling policies fail to maximize memory throughput. We characterize the memory access behavior of applications under concurrent multi-program execution on a single GPU, analyze why performance loss is distributed unfairly among them, and propose a memory-access-behavior-aware memory scheduling policy that assigns priorities so as to exploit the strengths of each application type. Experiments show that the policy markedly reduces the imbalance in performance loss across application types: relative to the baseline architecture, memory system throughput and fairness improve by an average of 9.7% and 15.0%, respectively, across all benchmarks.
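The abstract reports throughput and fairness without defining either metric. As a rough illustration only, the sketch below uses two definitions common in multi-program memory scheduling studies: weighted speedup as a throughput measure, and the ratio of the smallest to the largest per-application slowdown as a fairness measure. These definitions, the function names, and the sample IPC values are all assumptions for illustration; they are not taken from the paper.

```python
# Hypothetical metric definitions; the paper's actual formulas are not given in the abstract.

def slowdowns(ipc_alone, ipc_shared):
    """Per-application slowdown: IPC when run alone divided by IPC when co-running."""
    return {name: ipc_alone[name] / ipc_shared[name] for name in ipc_alone}


def weighted_speedup(ipc_alone, ipc_shared):
    """Throughput proxy: sum of each application's shared IPC normalized to its alone IPC."""
    return sum(ipc_shared[name] / ipc_alone[name] for name in ipc_alone)


def fairness(ipc_alone, ipc_shared):
    """Fairness proxy in (0, 1]: 1.0 means every application is slowed down equally."""
    s = slowdowns(ipc_alone, ipc_shared)
    return min(s.values()) / max(s.values())


# Made-up IPC values for one memory-intensive and one compute-intensive kernel.
ipc_alone = {"mem_intensive": 2.0, "compute_intensive": 1.0}
ipc_shared = {"mem_intensive": 1.0, "compute_intensive": 0.8}

print(slowdowns(ipc_alone, ipc_shared))
print(weighted_speedup(ipc_alone, ipc_shared))
print(fairness(ipc_alone, ipc_shared))
```

Under these assumed definitions, a scheduler that reduces the gap between the most and least slowed-down applications raises the fairness value toward 1.0, which is the kind of imbalance the abstract describes.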