
Several Ways to Run the HPCG Benchmark

赖娅闺 2025-8-8 12:16:30
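0 Introduction to HPCG

HPCG (High Performance Conjugate Gradients) is a benchmark for evaluating high-performance computing systems. It measures how a supercomputer performs on sparse-matrix, memory-access-intensive workloads, which is closer to many scientific and engineering applications than the traditional HPL (LINPACK) benchmark.

HPL (LINPACK) emphasizes floating-point throughput (FLOPS). It reflects the peak compute capability of the processors well, but it is biased toward compute-intensive workloads.
HPCG focuses on:

  • Sparse matrix access
  • Memory bandwidth
  • Cache efficiency
  • Communication latency
As a result, an HPCG score is typically only about 0.3% to 4% of the HPL score on the same system, and it is closer to the performance of real HPC applications.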
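HPCG implements the preconditioned conjugate gradient (PCG) method as an iterative solver for a three-dimensional Poisson equation. Each iteration involves:

  • Sparse matrix-vector multiplication (SpMV)
  • Vector updates (AXPY)
  • Dot products
  • Global communication (MPI Allreduce)
  • A multigrid preconditioner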
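To make the kernels listed above concrete, here is a minimal pure-Python sketch of a preconditioned CG loop. It is an illustration only, not HPCG's code: it assumes a 1D Poisson matrix tridiag(-1, 2, -1) as a stand-in for HPCG's 3D problem, and a simple Jacobi (diagonal) preconditioner instead of HPCG's multigrid preconditioner. All function names are hypothetical.

```python
def spmv(n, x):
    """SpMV kernel: y = A x for the 1D Poisson matrix tridiag(-1, 2, -1).
    (Assumption: a small stand-in for HPCG's much larger 3D operator.)"""
    y = []
    for i in range(n):
        v = 2.0 * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y


def dot(a, b):
    """Dot product kernel. In a distributed run this is where the
    global reduction (MPI Allreduce) would happen."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Vector update kernel: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def pcg(n, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with a Jacobi preconditioner (M^-1 = 1 / diag(A) = 1/2)."""
    x = [0.0] * n
    r = list(b)                      # residual for the initial guess x = 0
    z = [ri / 2.0 for ri in r]       # apply the preconditioner
    p = list(z)
    rz = dot(r, z)
    norm_b = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = spmv(n, p)
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)        # x = x + alpha * p
        r = axpy(-alpha, Ap, r)      # r = r - alpha * A p
        if dot(r, r) ** 0.5 / norm_b < tol:
            return x, it
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)  # p = z + beta * p
        rz = rz_new
    return x, max_iter


# Build a right-hand side whose exact solution is all ones, then solve.
n = 16
b = spmv(n, [1.0] * n)
x, iterations = pcg(n, b)
print(iterations, max(abs(xi - 1.0) for xi in x))
```

Each loop iteration performs one SpMV, a few dot products and a few vector updates, which is why the benchmark stresses memory bandwidth and reduction latency rather than raw floating-point throughput.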
1 Running from the Standard Source Code


  • Install the dependencies, clone the code, and copy the build configuration file
  # dnf install -y gcc gcc-c++ make cmake openmpi openmpi-devel
  # git clone https://github.com/hpcg-benchmark/hpcg.git
  # cd hpcg/setup
  # cp Make.Linux_MPI Make.kunpeng

  • Edit the build configuration file
  # vim Make.kunpeng
  #HEADER
  #  -- High Performance Conjugate Gradient Benchmark (HPCG)
  #     HPCG - 3.1 - March 28, 2019
  #     Michael A. Heroux
  #     Scalable Algorithms Group, Computing Research Division
  #     Sandia National Laboratories, Albuquerque, NM
  #
  #     Piotr Luszczek
  #     Jack Dongarra
  #     University of Tennessee, Knoxville
  #     Innovative Computing Laboratory
  #
  #     (C) Copyright 2013-2019 All Rights Reserved
  #
  #
  #  -- Copyright notice and Licensing terms:
  #
  #  Redistribution  and  use in  source and binary forms, with or without
  #  modification, are  permitted provided  that the following  conditions
  #  are met:
  #
  #  1. Redistributions  of  source  code  must retain the above copyright
  #  notice, this list of conditions and the following disclaimer.
  #
  #  2. Redistributions in binary form must reproduce  the above copyright
  #  notice, this list of conditions,  and the following disclaimer in the
  #  documentation and/or other materials provided with the distribution.
  #
  #  3. All  advertising  materials  mentioning  features  or  use of this
  #  software must display the following acknowledgement:
  #  This  product  includes  software  developed  at Sandia National
  #  Laboratories, Albuquerque, NM and the  University  of
  #  Tennessee, Knoxville, Innovative Computing Laboratory.
  #
  #  4. The name of the  University,  the name of the  Laboratory,  or the
  #  names  of  its  contributors  may  not  be used to endorse or promote
  #  products  derived   from   this  software  without  specific  written
  #  permission.
  #
  #  -- Disclaimer:
  #
  #  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  #  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
  #  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
  #  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
  #  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
  #  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
  #  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  #  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
  #  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
  #  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  #  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  # ######################################################################
  #@HEADER
  # ----------------------------------------------------------------------
  # - shell --------------------------------------------------------------
  # ----------------------------------------------------------------------
  #
  SHELL        = /bin/sh
  #
  CD           = cd
  CP           = cp
  LN_S         = ln -s -f
  MKDIR        = mkdir -p
  RM           = /bin/rm -f
  TOUCH        = touch
  #
  # ----------------------------------------------------------------------
  # - HPCG Directory Structure / HPCG library ------------------------------
  # ----------------------------------------------------------------------
  #
  TOPdir       = .
  SRCdir       = $(TOPdir)/src
  INCdir       = $(TOPdir)/src
  BINdir       = $(TOPdir)/bin
  #
  # ----------------------------------------------------------------------
  # - Message Passing library (MPI) --------------------------------------
  # ----------------------------------------------------------------------
  # MPinc tells the  C  compiler where to find the Message Passing library
  # header files,  MPlib  is defined  to be the name of  the library to be
  # used. The variable MPdir is only used for defining MPinc and MPlib.
  #
  MPdir        = /usr/lib64/openmpi
  MPinc        = -I$(MPdir)/include
  MPlib        = -L$(MPdir)/lib -lmpi
  #
  #
  # ----------------------------------------------------------------------
  # - HPCG includes / libraries / specifics -------------------------------
  # ----------------------------------------------------------------------
  #
  HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc)
  HPCG_LIBS     =
  #
  # - Compile time options -----------------------------------------------
  #
  # -DHPCG_NO_MPI         Define to disable MPI
  # -DHPCG_NO_OPENMP      Define to disable OPENMP
  # -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous
  # -DHPCG_DEBUG          Define to enable debugging output
  # -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output
  #
  # By default HPCG will:
  #    *) Build with MPI enabled.
  #    *) Build with OpenMP enabled.
  #    *) Not generate debugging output.
  #
  HPCG_OPTS     = -DHPCG_NO_OPENMP
  #
  # ----------------------------------------------------------------------
  #
  HPCG_DEFS     = $(HPCG_OPTS) $(HPCG_INCLUDES)
  #
  # ----------------------------------------------------------------------
  # - Compilers / linkers - Optimization flags ---------------------------
  # ----------------------------------------------------------------------
  #
  CXX          = mpicxx
  #CXXFLAGS     = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
  # Keep $(HPCG_DEFS) in CXXFLAGS; otherwise HPCG_OPTS (-DHPCG_NO_OPENMP)
  # and the include paths never reach the compiler.
  CXXFLAGS     = $(HPCG_DEFS) -O3 -march=armv8-a
  #
  LINKER       = $(CXX)
  LINKFLAGS    = $(CXXFLAGS)
  #
  ARCHIVER     = ar
  ARFLAGS      = r
  RANLIB       = echo
  USE_CUDA = 0
  #
  # ----------------------------------------------------------------------
  #
Note that CUDA is disabled here (USE_CUDA = 0).

  • Build
  # cd ..
  # make arch=kunpeng
  # cd bin

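  • Prepare the run configuration file
The default configuration file, hpcg.dat:
  HPCG benchmark input file
  Sandia National Laboratories; University of Tennessee, Knoxville
  104 104 104
  60

Our configuration file, hpcg.dat:
  HPCG benchmark input file
  Sandia National Laboratories; University of Tennessee, Knoxville
  128 128 128
  300
The third line gives the local grid dimensions (nx ny nz) for each MPI process, and the last line is the run time in seconds. Official results require at least 1800 s, and runs shorter than about 300 s may give inaccurate results.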
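The four-line layout of hpcg.dat can also be generated by a small helper, sketched below. The function name is hypothetical. The validation rule (each local dimension at least 16 and a multiple of 8) is an assumption based on the commonly documented HPCG requirement for its three coarse multigrid levels; check it against the HPCG version you build.

```python
def make_hpcg_dat(nx, ny, nz, seconds):
    """Return the text of an hpcg.dat file.

    Line 3 holds the local (per-process) grid dimensions and
    line 4 the requested run time in seconds.
    """
    for name, n in (("nx", nx), ("ny", ny), ("nz", nz)):
        # Assumption: dimensions must be >= 16 and divisible by 8.
        if n < 16 or n % 8 != 0:
            raise ValueError(f"{name}={n} should be >= 16 and a multiple of 8")
    if seconds <= 0:
        raise ValueError("run time must be positive")
    lines = [
        "HPCG benchmark input file",
        "Sandia National Laboratories; University of Tennessee, Knoxville",
        f"{nx} {ny} {nz}",
        str(seconds),
    ]
    return "\n".join(lines) + "\n"


# The configuration used in this article: a 128^3 local grid for 300 s.
dat_text = make_hpcg_dat(128, 128, 128, 300)
print(dat_text, end="")
```

Writing `dat_text` to a file named hpcg.dat in the directory that contains xhpcg gives the same input as the hand-edited file above; use 1800 or more for the last argument when an official result is needed.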

  • Run:
  # mpirun --allow-run-as-root --mca pml ob1  -np 64 ./xhpcg
  # tail HPCG-Benchmark_3.1_2025-07-08_10-21-09.txt
  DDOT Timing Variations::Avg DDOT MPI_Allreduce time=2.58093
  Final Summary=
  Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137
  Final Summary::HPCG 2.4 rating for historical reasons is=16.1678
  Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
  Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
  Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
  Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
  Final Summary::Results are valid but execution time (sec) is=310.259
  Final Summary::Official results execution time (sec) must be at least=1800
  # cat hpcg20250708T100940.txt
  WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
  Call [0] Number of Iterations [11] Scaled Residual [1.12102e-13]
  WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
  Call [1] Number of Iterations [11] Scaled Residual [1.12102e-13]
  Call [0] Number of Iterations [2] Scaled Residual [2.79999e-17]
  Call [1] Number of Iterations [2] Scaled Residual [2.79999e-17]
  Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 7.31869e-10
  Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 5.92074e-11
  SpMV call [0] Residual [0]
  SpMV call [1] Residual [0]
  Call [0] Scaled Residual [0.00454823]
  Call [1] Scaled Residual [0.00454823]
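The HPCG result for this machine is 16.1137 GFLOP/s. The run is reported as valid, but at about 310 s it is shorter than the 1800 s required for an official result.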
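The summary file uses a simple `key=value` layout with `::` separating nested keys, so the rating and the run time can be extracted with a few lines of Python. The following is a minimal sketch; the function name is hypothetical, and the sample text is copied from the output shown above.

```python
def parse_hpcg_summary(text):
    """Parse 'key=value' lines of an HPCG summary file into a dict.

    The split is done on the last '=' so that keys containing
    punctuation stay intact; section headers get an empty value.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.strip().rpartition("=")
        if sep and key:
            result[key] = value
    return result


sample = """\
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137
Final Summary::HPCG 2.4 rating for historical reasons is=16.1678
Final Summary::Results are valid but execution time (sec) is=310.259
Final Summary::Official results execution time (sec) must be at least=1800
"""

summary = parse_hpcg_summary(sample)
gflops = float(summary["Final Summary::HPCG result is VALID with a GFLOP/s rating of"])
elapsed = float(summary["Final Summary::Results are valid but execution time (sec) is"])
required = float(summary["Final Summary::Official results execution time (sec) must be at least"])
is_official_length = elapsed >= required
print(gflops, elapsed, is_official_length)
```

For a real run, read the `HPCG-Benchmark_*.txt` file produced in the bin directory and pass its contents to `parse_hpcg_summary`.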
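The options passed to mpirun above are:

  • --allow-run-as-root: explicitly allows mpirun to run the program as root. By default Open MPI refuses to launch parallel applications as root because of the security risk, especially on shared clusters, so invoking mpirun as root without this option fails with an error; the option bypasses that safety check. Use it only for tests in non-production environments. On production systems or multi-user clusters, run compute jobs under a regular user account.
  • --mca pml ob1: an MCA (Modular Component Architecture) parameter that selects the implementation of the point-to-point messaging layer (PML). Open MPI is highly modular and lets users choose components for different functions (point-to-point communication, collective communication, process management, and so on). ob1 is one of the most commonly used PML components and carries messages over the available transports (shared memory, Ethernet, and others). Which PML is selected by default depends on the Open MPI build and the hardware detected at run time, so naming ob1 explicitly pins the communication path, which helps with consistent behavior and with compatibility or performance problems in the default choice.
  • -np 64: the number of MPI processes (ranks) to start. An MPI program achieves parallelism by distributing work across processes, each with a unique ID (rank) from 0 to np-1. Here xhpcg runs as 64 parallel processes, which may be spread over several compute nodes or all run on a single node, depending on the hostfile and the other resource options given to mpirun.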
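Because hpcg.dat specifies the local grid per process, the number of ranks determines the global problem size. The sketch below picks the most cube-like 3D process grid by brute force and derives the global dimensions. This is an illustration, not HPCG's own factoring routine, and the function names are hypothetical.

```python
def balanced_grid(nranks):
    """Return (npx, npy, npz) with npx * npy * npz == nranks and the
    smallest pairwise-product sum, i.e. the most cube-like shape.
    (Assumption: a brute-force stand-in for HPCG's internal routine.)"""
    best = None
    for a in range(1, nranks + 1):
        if nranks % a:
            continue
        rest = nranks // a
        for b in range(a, rest + 1):
            if rest % b:
                continue
            c = rest // b
            if c < b:
                continue
            surface = a * b + b * c + a * c
            if best is None or surface < best[0]:
                best = (surface, (a, b, c))
    return best[1]


def global_problem(nranks, nx, ny, nz):
    """Global grid dimensions and number of equations for a given
    rank count and local grid size."""
    npx, npy, npz = balanced_grid(nranks)
    dims = (npx * nx, npy * ny, npz * nz)
    return dims, dims[0] * dims[1] * dims[2]


# 64 ranks with a 128^3 local grid, as in the run above.
grid = balanced_grid(64)
dims, equations = global_problem(64, 128, 128, 128)
print(grid, dims, equations)
```

Under these assumptions, doubling the rank count doubles the global problem, so results from runs with different `-np` values describe different problem sizes.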
References


  • https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
  • https://developer.nvidia.com/nvidia-hpc-benchmarks-downloads?target_os=Linux&target_arch=x86_64
  • https://github.com/davidrohr/hpl-gpu
  • https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks
  • https://github.com/NVIDIA/nvidia-hpcg
  • https://www.amd.com/en/developer/zen-software-studio/applications/pre-built-applications/zen-hpl.html
  • https://www.netlib.org/benchmark/hpl/
2 Running with the Phoronix Test Suite

For installing the Phoronix Test Suite, see: https://www.cnblogs.com/testing-/p/18303322

  • Install hpcg
  # phoronix-test-suite install hpcg
  # cd /var/lib/phoronix-test-suite/test-profiles/pts/hpcg-1.3.0

  • Edit the configuration file
Contents of test-definition.xml:
  <?xml version="1.0"?>
  <PhoronixTestSuite>
    <TestInformation>
      <Title>High Performance Conjugate Gradient</Title>
      <AppVersion>3.1</AppVersion>
      <Description>HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC.</Description>
      <ResultScale>GFLOP/s</ResultScale>
      <Proportion>HIB</Proportion>
      <TimesToRun>1</TimesToRun>
    </TestInformation>
    <TestProfile>
      <Version>1.3.0</Version>
      <SupportedPlatforms>Linux</SupportedPlatforms>
      <SoftwareType>Benchmark</SoftwareType>
      <TestType>Processor</TestType>
      <License>Free</License>
      <Status>Verified</Status>
      <ExternalDependencies>build-utilities, fortran-compiler, openmpi-development</ExternalDependencies>
      <EnvironmentSize>2.4</EnvironmentSize>
      <ProjectURL>http://www.hpcg-benchmark.org/</ProjectURL>
      <RepositoryURL>https://github.com/hpcg-benchmark/hpcg</RepositoryURL>
      <InternalTags>SMP, MPI</InternalTags>
      <Maintainer>Michael Larabel</Maintainer>
    </TestProfile>
    <TestSettings>
      <Option>
        <DisplayName>X Y Z</DisplayName>
        <Identifier>xyz</Identifier>
        <Menu>
          <Entry>
            <Name>104 104 104</Name>
            <Value>--nx=104 --ny=104 --nz=104</Value>
          </Entry>
          <Entry>
            <Name>144 144 144</Name>
            <Value>--nx=144 --ny=144 --nz=144</Value>
          </Entry>
          <Entry>
            <Name>160 160 160</Name>
            <Value>--nx=160 --ny=160 --nz=160</Value>
          </Entry>
          <Entry>
            <Name>192 192 192</Name>
            <Value>--nx=192 --ny=192 --nz=192</Value>
          </Entry>
        </Menu>
      </Option>
      <Option>
        <DisplayName>RT</DisplayName>
        <Identifier>time</Identifier>
        <ArgumentPrefix>--rt=</ArgumentPrefix>
        <Menu>
          <Entry>
            <Name>300</Name>
            <Value>300</Value>
            <Message>Shorter run-time</Message>
          </Entry>
          <Entry>
            <Name>1800</Name>
            <Value>1800</Value>
            <Message>Official run-time</Message>
          </Entry>
        </Menu>
      </Option>
    </TestSettings>
  </PhoronixTestSuite>
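Each `<Option>` in the test profile maps a menu choice to command-line arguments for the benchmark. As a rough sketch, the options can be listed with Python's standard `xml.etree.ElementTree` module. The embedded sample is a trimmed excerpt of the profile with one entry per menu, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed excerpt of test-definition.xml (one entry per menu).
SAMPLE = """
<PhoronixTestSuite>
  <TestSettings>
    <Option>
      <DisplayName>X Y Z</DisplayName>
      <Identifier>xyz</Identifier>
      <Menu>
        <Entry>
          <Name>104 104 104</Name>
          <Value>--nx=104 --ny=104 --nz=104</Value>
        </Entry>
      </Menu>
    </Option>
    <Option>
      <DisplayName>RT</DisplayName>
      <Identifier>time</Identifier>
      <ArgumentPrefix>--rt=</ArgumentPrefix>
      <Menu>
        <Entry>
          <Name>1800</Name>
          <Value>1800</Value>
        </Entry>
      </Menu>
    </Option>
  </TestSettings>
</PhoronixTestSuite>
"""


def list_options(xml_text):
    """Map each option identifier to the argument strings it can produce
    (ArgumentPrefix, if present, followed by the entry's Value)."""
    root = ET.fromstring(xml_text.strip())
    options = {}
    for option in root.iter("Option"):
        ident = option.findtext("Identifier")
        prefix = option.findtext("ArgumentPrefix", default="")
        options[ident] = [
            prefix + entry.findtext("Value") for entry in option.iter("Entry")
        ]
    return options


options = list_options(SAMPLE)
print(options)
```

Adding a grid size not offered by the profile (for example 128 128 128) would mean adding another `<Entry>` with matching `<Name>` and `<Value>` elements to the first menu.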

  • Run the test
  # phoronix-test-suite benchmark hpcg
      Evaluating External Test Dependencies .......................................................................................................................
  Phoronix Test Suite v10.8.4
      Installed:     pts/hpcg-1.3.0
  High Performance Conjugate Gradient 3.1:
      pts/hpcg-1.3.0
      Processor Test Configuration
          1: 104 104 104
          2: 144 144 144
          3: 160 160 160
          4: 192 192 192
          5: Test All Options
          ** Multiple items can be selected, delimit by a comma. **
          X Y Z: 1
          1: 300  [Shorter run-time]
          2: 1800 [Official run-time]
          3: Test All Options
          ** Multiple items can be selected, delimit by a comma. **
          RT: 1
  System Information
    PROCESSOR:              ARMv8 @ 2.90GHz
      Core Count:           128
      Cache Size:           224 MB
      Scaling Driver:       cppc_cpufreq performance
    GRAPHICS:               Huawei Hi171x [iBMC Intelligent Management chip w/VGA support]
      Screen:               1024x768
    MOTHERBOARD:            WUZHOU BC83AMDAA01-7270Z
      BIOS Version:         11.62
      Chipset:              Huawei HiSilicon
      Network:              6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
    MEMORY:                 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET
    DISK:                   2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N
      File-System:          xfs
      Mount Options:        attr2 inode64 noquota relatime rw
      Disk Scheduler:       MQ-DEADLINE
      Disk Details:         Block Size: 4096
    OPERATING SYSTEM:       Kylin Linux Advanced Server V10
      Kernel:               4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314
      Display Server:       X Server 1.20.8
      Compiler:             GCC 7.3.0 + CUDA 12.8
      Security:             itlb_multihit: Not affected
                            + l1tf: Not affected
                            + mds: Not affected
                            + meltdown: Not affected
                            + mmio_stale_data: Not affected
                            + spec_store_bypass: Mitigation of SSB disabled via prctl
                            + spectre_v1: Mitigation of __user pointer sanitization
                            + spectre_v2: Not affected
                            + srbds: Not affected
                            + tsx_async_abort: Not affected
      Would you like to save these test results (Y/n): y
      Enter a name for the result file: hpcg_45_31
      Enter a unique name to describe this test run / configuration:
  If desired, enter a new description below to better describe this result set / system configuration under test.
  Press ENTER to proceed without changes.
  Current Description: ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
  New Description:
  High Performance Conjugate Gradient 3.1:
      pts/hpcg-1.3.0 [X Y Z: 104 104 104 - RT: 300]
      Test 1 of 1
      Estimated Trial Run Count:    1
      Estimated Time To Completion: 38 Minutes [03:16 CDT]
          Started Run 1 @ 02:39:00
      X Y Z: 104 104 104 - RT: 300:
          69.8633
      Average: 69.8633 GFLOP/s
      Do you want to view the text results of the testing (Y/n): Y
  hpcg_45_31
  ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
  ARMv8:
          Processor: ARMv8 @ 2.90GHz (128 Cores), Motherboard: WUZHOU BC83AMDAA01-7270Z (11.62 BIOS), Chipset: Huawei HiSilicon, Memory: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET, Disk: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N, Graphics: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support], Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
          OS: Kylin Linux Advanced Server V10, Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314, Display Server: X Server 1.20.8, Compiler: GCC 7.3.0 + CUDA 12.8, File-System: xfs, Screen Resolution: 1024x768
      High Performance Conjugate Gradient 3.1
      X Y Z: 104 104 104 - RT: 300
      GFLOP/s > Higher Is Better
      ARMv8 . 69.86 |==================================================================================================================================================
      Would you like to upload the results to OpenBenchmarking.org (y/n): y
      Would you like to attach the system logs (lspci, dmesg, lsusb, etc) to the test result (y/n): y
      Results Uploaded To: https://openbenchmarking.org/result/2507083-NE-HPCG4531360
The uploaded results can be viewed in a browser at the URL printed at the end of the run.

3 NVIDIA HPC Benchmarks
  # docker pull nvcr.io/nvidia/hpc-benchmarks:25.04
  # vi HPCG.dat
  HPCG benchmark input file
  Sandia National Laboratories; University of Tennessee, Knoxville
  128 128 128
  300
  # docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
         -v $(pwd):/host_data \
         nvcr.io/nvidia/hpc-benchmarks:25.04 \
         mpirun -np 1 \
         /workspace/hpcg.sh \
         --dat /host_data/HPCG.dat \
         --cpu-affinity 0 \
         --gpu-affinity 0
  ========================================================================== NVIDIA HPC Benchmarks ==========================================================================
  NVIDIA Release 25.04
  Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
  This container image and its contents are governed by the NVIDIA Deep Learning Container License.
  By pulling and using the container, you accept the terms and conditions of this license:
  https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
  WARNING: No InfiniBand devices detected.
           Multi-node communication performance may be reduced.
           Ensure /dev/infiniband is mounted to this container.
  HPCG-NVIDIA 25.4.0  -- NVIDIA accelerated HPCG benchmark -- NVIDIA
  Build v0.5.6
  Start of application (GPU-Only) ...
  Initial Residual = 2838.81
  Iteration = 1   Scaled Residual = 0.185703
  Iteration = 2   Scaled Residual = 0.101681
  ...
  Iteration = 50   Scaled Residual = 3.94531e-07
  GPU Rank Info:
   | cuSPARSE version 12.5
   | Reference CPU memory = 935.79 MB
   | GPU Name: 'NVIDIA GeForce RTX 4090'
   | GPU Memory Use: 2223 MB / 24082 MB
   | Process Grid: 1x1x1
   | Local Domain: 128x128x128
   | Number of CPU Threads: 1
   | Slice Size: 2048
  WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
  Call [0] Number of Iterations [11] Scaled Residual [1.19242e-14]
  WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
  Call [1] Number of Iterations [11] Scaled Residual [1.19242e-14]
  Call [0] Number of Iterations [1] Scaled Residual [2.94233e-16]
  Call [1] Number of Iterations [1] Scaled Residual [2.94233e-16]
  Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 8.42084e-10
  Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 4.21042e-10
  SpMV call [0] Residual [0]
  SpMV call [1] Residual [0]
  Initial Residual = 2838.81
  Iteration = 1   Scaled Residual = 0.220178
  Iteration = 2   Scaled Residual = 0.118926
  ...
  Iteration = 49   Scaled Residual = 4.98548e-07
  Iteration = 50   Scaled Residual = 3.08635e-07
  Call [0] Scaled Residual [3.08635e-07]
  Call [1] Scaled Residual [3.08635e-07]
  Call [2] Scaled Residual [3.08635e-07]
  ...
  Call [1501] Scaled Residual [3.08635e-07]
  Call [1502] Scaled Residual [3.08635e-07]
  HPCG-Benchmark
  version=3.1
  Release date=March 28, 2019
  Machine Summary=
  Machine Summary::Distributed Processes=1
  Machine Summary::Threads per processes=1
  Global Problem Dimensions=
  Global Problem Dimensions::Global nx=128
  Global Problem Dimensions::Global ny=128
  Global Problem Dimensions::Global nz=128
  Processor Dimensions=
  Processor Dimensions::npx=1
  Processor Dimensions::npy=1
  Processor Dimensions::npz=1
  Local Domain Dimensions=
  Local Domain Dimensions::nx=128
  Local Domain Dimensions::ny=128
  ########## Problem Summary  ##########=
  Setup Information=
  Setup Information::Setup Time=0.00910214
  Linear System Information=
  Linear System Information::Number of Equations=2097152
  Linear System Information::Number of Nonzero Terms=55742968
  Multigrid Information=
  Multigrid Information::Number of coarse grid levels=3
  Multigrid Information::Coarse Grids=
  Multigrid Information::Coarse Grids::Grid Level=1
  Multigrid Information::Coarse Grids::Number of Equations=262144
  Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000
  Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
  Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
  Multigrid Information::Coarse Grids::Grid Level=2
  Multigrid Information::Coarse Grids::Number of Equations=32768
  Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584
  Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
  Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
  Multigrid Information::Coarse Grids::Grid Level=3
  Multigrid Information::Coarse Grids::Number of Equations=4096
  Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336
  Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
  Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
  ########## Memory Use Summary  ##########=
  Memory Use Information=
  Memory Use Information::Total memory used for data (Gbytes)=1.49883
  Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
  Memory Use Information::Bytes per equation (Total memory / Number of Equations)=714.697
  Memory Use Information::Memory used for linear system and CG (Gbytes)=1.31912
  Memory Use Information::Coarse Grids=
  Memory Use Information::Coarse Grids::Grid Level=1
  Memory Use Information::Coarse Grids::Memory used=0.15755
  Memory Use Information::Coarse Grids::Grid Level=2
  Memory Use Information::Coarse Grids::Memory used=0.0196946
  Memory Use Information::Coarse Grids::Grid Level=3
  Memory Use Information::Coarse Grids::Memory used=0.00246271
  ########## V&V Testing Summary  ##########=
  Spectral Convergence Tests=
  Spectral Convergence Tests::Result=PASSED
  Spectral Convergence Tests::Unpreconditioned=
  Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11
  Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
  Spectral Convergence Tests::Preconditioned=
  Spectral Convergence Tests::Preconditioned::Maximum iteration count=1
  Spectral Convergence Tests::Preconditioned::Expected iteration count=2
  Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon=
  Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED
  Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=8.42084e-10
  Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.21042e-10
  ########## Iterations Summary  ##########=
  Iteration Count Information=
  Iteration Count Information::Result=PASSED
  Iteration Count Information::Reference CG iterations per set=50
  Iteration Count Information::Optimized CG iterations per set=50
  Iteration Count Information::Total number of reference iterations=75150
  Iteration Count Information::Total number of optimized iterations=75150
  ########## Reproducibility Summary  ##########=
  Reproducibility Information=
  Reproducibility Information::Result=PASSED
  Reproducibility Information::Scaled residual mean=3.08635e-07
  Reproducibility Information::Scaled residual variance=0
  ########## Performance Summary (times in sec) ##########=
  Benchmark Time Summary=
  Benchmark Time Summary::Optimization phase=0.017375
  Benchmark Time Summary::DDOT=6.03317
  Benchmark Time Summary::WAXPBY=6.80771
  Benchmark Time Summary::SpMV=58.5598
  Benchmark Time Summary::MG=227.166
  Benchmark Time Summary::Total=298.585
  Floating Point Operations Summary=
  Floating Point Operations Summary::Raw DDOT=9.5191e+11
  Floating Point Operations Summary::Raw WAXPBY=9.5191e+11
  Floating Point Operations Summary::Raw SpMV=8.54573e+12
  Floating Point Operations Summary::Raw MG=4.76988e+13
  Floating Point Operations Summary::Total=5.81484e+13
  Floating Point Operations Summary::Total with convergence overhead=5.81484e+13
  GB/s Summary=
  GB/s Summary::Raw Read B/W=1200
  GB/s Summary::Raw Write B/W=277.327
  GB/s Summary::Raw Total B/W=1477.32
  GB/s Summary::Total with convergence and optimization phase overhead=1457.89
  GFLOP/s Summary=
  GFLOP/s Summary::Raw DDOT=157.779
  GFLOP/s Summary::Raw WAXPBY=139.828
  GFLOP/s Summary::Raw SpMV=145.932
  GFLOP/s Summary::Raw MG=209.974
  GFLOP/s Summary::Raw Total=194.747
  GFLOP/s Summary::Total with convergence overhead=194.747
  GFLOP/s Summary::Total with convergence and optimization phase overhead=192.185
  User Optimization Overheads=
  User Optimization Overheads::Optimization phase time (sec)=0.017375
  User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=0.0396317
  DDOT Timing Variations=
  DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.220609
  DDOT Timing Variations::Max DDOT MPI_Allreduce time=0.220609
  DDOT Timing Variations::Avg DDOT MPI_Allreduce time=0.220609
  Final Summary=
  Final Summary::HPCG result is VALID with a GFLOP/s rating of=192.185
  Final Summary::HPCG 2.4 rating for historical reasons is=193.058
  Final Summary::Results are valid but execution time (sec) is=298.585
  Final Summary::Official results execution time (sec) must be at least=1800
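Several numbers in the report above can be cross-checked with simple arithmetic. The sketch below assumes a 27-point stencil on an n x n x n global grid, for which each dimension contributes 3n - 2 neighbor couplings, and coarse grids that halve n at every level. The function names are hypothetical, and the input values are copied from the report.

```python
def grid_levels(n, coarse_levels=3):
    """Equations and nonzero terms per multigrid level for an n^3 grid.

    Assumes a 27-point stencil: along one dimension a point couples to
    itself and up to two neighbors, giving 3n - 2 couplings in total,
    and the 3D count is the cube of that.
    """
    levels = []
    for level in range(coarse_levels + 1):
        m = n // (2 ** level)          # grid size halves at each level
        levels.append((level, m ** 3, (3 * m - 2) ** 3))
    return levels


def raw_gflops(total_flops, total_seconds):
    """Raw GFLOP/s as total floating-point operations over total time."""
    return total_flops / total_seconds / 1e9


levels = grid_levels(128)
for level, equations, nonzeros in levels:
    print(level, equations, nonzeros)

# Totals reported under "Floating Point Operations Summary" and
# "Benchmark Time Summary" above.
rate = raw_gflops(5.81484e13, 298.585)
print(round(rate, 2))
```

Level 0 matches the reported "Number of Equations" and "Number of Nonzero Terms", levels 1 to 3 match the three coarse grids, and the computed rate agrees with the reported "Raw Total" GFLOP/s to within rounding of the printed inputs.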