
Several Ways to Run the HPCG Benchmark

0 Introduction to HPCG

HPCG (High Performance Conjugate Gradients) is a benchmark for evaluating high-performance computing systems. It measures a supercomputer's real performance on sparse-matrix, memory-access-intensive workloads and is closer to many scientific and engineering computing scenarios than the traditional HPL (LINPACK) benchmark.

HPL (LINPACK) emphasizes floating-point throughput (FLOPS) and reflects a processor's peak compute capability, but it is biased toward compute-bound workloads.
HPCG instead stresses:

[*]sparse matrix access
[*]memory bandwidth
[*]cache efficiency
[*]communication latency
As a result, an HPCG score is typically only about 0.3%–4% of the corresponding HPL score, which makes it much closer to the performance of real HPC applications.
HPCG implements the iterative solution of a 3D Poisson-type problem with the preconditioned conjugate gradient (PCG) method. Each iteration (sketched below this list) combines:

[*]sparse matrix-vector multiplication (SpMV)
[*]vector updates (AXPY/WAXPBY)
[*]dot products
[*]global communication (MPI_Allreduce)
[*]a multigrid preconditioner
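As a rough sketch of how these kernels fit together (the actual HPCG source differs in details such as the convergence check and the exact multigrid cycle), one PCG iteration can be written as:

\[
\begin{aligned}
z_k &= M^{-1} r_k && \text{(multigrid preconditioner, MG)}\\
\beta_k &= \frac{r_k^{\top} z_k}{r_{k-1}^{\top} z_{k-1}} && \text{(dot products + MPI\_Allreduce)}\\
p_k &= z_k + \beta_k\, p_{k-1} && \text{(vector update, WAXPBY)}\\
\alpha_k &= \frac{r_k^{\top} z_k}{p_k^{\top} A p_k} && \text{(SpMV + dot product)}\\
x_{k+1} &= x_k + \alpha_k p_k, \qquad r_{k+1} = r_k - \alpha_k A p_k && \text{(vector updates, AXPY)}
\end{aligned}
\]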
1 Running from the standard source code


[*]Install the dependencies, clone the code, and copy a build configuration file (a note on the MPI environment follows these commands)
# dnf install -y gcc gcc-c++ make cmake openmpi openmpi-devel
# git clone https://github.com/hpcg-benchmark/hpcg.git
# cd hpcg/setup
# cp Make.Linux_MPI Make.kunpeng
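On dnf-based distributions the OpenMPI compiler wrappers are usually not on the default PATH; a minimal sketch, assuming the stock RPM layout under /usr/lib64/openmpi (verify the module name on your system):
# module load mpi/openmpi-aarch64      # or: export PATH=/usr/lib64/openmpi/bin:$PATH
# which mpicxx mpirun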

[*]Edit the build configuration file
# vim Make.kunpeng
#HEADER
#-- High Performance Conjugate Gradient Benchmark (HPCG)
#   HPCG - 3.1 - March 28, 2019

#   Michael A. Heroux
#   Scalable Algorithms Group, Computing Research Division
#   Sandia National Laboratories, Albuquerque, NM
#
#   Piotr Luszczek
#   Jack Dongarra
#   University of Tennessee, Knoxville
#   Innovative Computing Laboratory
#
#   (C) Copyright 2013-2019 All Rights Reserved
#
#
#-- Copyright notice and Licensing terms:
#
#Redistribution and use in source and binary forms, with or without
#modification, are permitted provided that the following conditions
#are met:
#
#1. Redistributions of source code must retain the above copyright
#notice, this list of conditions and the following disclaimer.
#
#2. Redistributions in binary form must reproduce the above copyright
#notice, this list of conditions, and the following disclaimer in the
#documentation and/or other materials provided with the distribution.
#
#3. All advertising materials mentioning features or use of this
#software must display the following acknowledgement:
#This product includes software developed at Sandia National
#Laboratories, Albuquerque, NM and the University of
#Tennessee, Knoxville, Innovative Computing Laboratory.
#
#4. The name of the University, the name of the Laboratory, or the
#names of its contributors may not be used to endorse or promote
#products derived from this software without specific written
#permission.
#
#-- Disclaimer:
#
#THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
#LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
#LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
#THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#@HEADER
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL      = /bin/sh
#
CD         = cd
CP         = cp
LN_S         = ln -s -f
MKDIR      = mkdir -p
RM         = /bin/rm -f
TOUCH      = touch
#
# ----------------------------------------------------------------------
# - HPCG Directory Structure / HPCG library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = .
SRCdir       = $(TOPdir)/src
INCdir       = $(TOPdir)/src
BINdir       = $(TOPdir)/bin
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir      = /usr/lib64/openmpi
MPinc      = -I$(MPdir)/include
MPlib      = -L$(MPdir)/lib -lmpi
#
#
# ----------------------------------------------------------------------
# - HPCG includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc)
HPCG_LIBS   =
#
# - Compile time options -----------------------------------------------
#
# -DHPCG_NO_MPI         Define to disable MPI
# -DHPCG_NO_OPENMP      Define to disable OPENMP
# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous
# -DHPCG_DEBUG          Define to enable debugging output
# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output
#
# By default HPCG will:
#    *) Build with MPI enabled.
#    *) Build with OpenMP enabled.
#    *) Not generate debugging output.
#
HPCG_OPTS   = -DHPCG_NO_OPENMP
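# Note: -DHPCG_NO_OPENMP builds an MPI-only binary (one thread per rank);
# for a hybrid MPI+OpenMP run, remove this define and add -fopenmp to CXXFLAGS.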
#
# ----------------------------------------------------------------------
#
HPCG_DEFS   = $(HPCG_OPTS) $(HPCG_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CXX          = mpicxx
#CXXFLAGS   = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CXXFLAGS   = $(HPCG_DEFS) -O3 -march=armv8-a
#
LINKER       = $(CXX)
LINKFLAGS    = $(CXXFLAGS)
#
ARCHIVER   = ar
ARFLAGS      = r
RANLIB       = echo
USE_CUDA = 0
#
# ----------------------------------------------------------------------

# Note: CUDA is disabled here (USE_CUDA = 0).

[*]Build (an optional sanity run follows these commands)
cd ..
make arch=kunpeng
cd bin
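Before committing to a long run, a short sanity check with a small problem can be useful. This is a hedged sketch that relies on the --nx/--ny/--nz/--rt command-line overrides also used by the Phoronix profile in section 2:
# mpirun --allow-run-as-root -np 4 ./xhpcg --nx=32 --ny=32 --nz=32 --rt=30
# ls HPCG-Benchmark_3.1_*.txt hpcg*.txt     # rating summary and detailed log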

[*]Prepare the run-time input file
The default input file, hpcg.dat:
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60
Our input file, hpcg.dat:
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
Note that the line "128 128 128" is the local grid size per MPI process, and the last line is the run time in seconds. At least 1800 s is recommended (the official minimum); results from runs shorter than about 300 s may be inaccurate.
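As a rough sizing aid (the ~0.7 KB-per-grid-point figure comes from the "Bytes per equation" value reported in the HPCG output later in this post, so treat it as an estimate only):
# echo "$((128*128*128*700/1024/1024)) MB per rank"    # roughly 1.4 GB of data per MPI process
The official rules also expect the problem to be far larger than cache (a sizeable fraction of main memory), so scale nx/ny/nz to your node.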

[*]Run:
# mpirun --allow-run-as-root --mca pml ob1 -np 64 ./xhpcg
# tail HPCG-Benchmark_3.1_2025-07-08_10-21-09.txt
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=2.58093
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137
Final Summary::HPCG 2.4 rating for historical reasons is=16.1678
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=310.259
Final Summary::Official results execution time (sec) must be at least=1800
# cat hpcg20250708T100940.txt
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call Number of Iterations Scaled Residual
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call Number of Iterations Scaled Residual
Call Number of Iterations Scaled Residual
Call Number of Iterations Scaled Residual
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 7.31869e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 5.92074e-11
SpMV call Residual
SpMV call Residual
Call Scaled Residual
Call Scaled Residual

The HPCG result for this machine is 16.1137 GFLOP/s. The important mpirun parameters are:
--allow-run-as-root: explicitly tells mpirun that the program may be run as the root user.
By default, many MPI implementations (Open MPI in particular) block root from launching parallel applications for safety reasons, especially on shared clusters. If you are running mpirun directly as root (or the program genuinely needs root privileges) while the MPI library forbids it, you get a permission error; this flag bypasses that check.
It is acceptable in test or non-production environments, but on production systems or shared multi-user clusters compute jobs should normally be launched from an ordinary user account.
--mca pml ob1: an MPI Component Architecture (MCA) option that selects the implementation of the point-to-point messaging layer (PML). ob1 is a commonly used PML component in Open MPI.
Open MPI has a highly modular architecture that lets users choose components for different functions (point-to-point communication, collectives, process management, and so on). ob1 is one of Open MPI's default and most widely used PML components and works over various network interfaces (InfiniBand, Ethernet, etc.).
Explicitly specifying ob1 is usually done to force a particular communication path or to work around compatibility or performance issues with the default PML. In most cases mpirun would pick ob1 anyway, but stating it explicitly keeps the behaviour consistent.
-np 64: the number of MPI processes (ranks) to launch.
An MPI program parallelizes work by distributing it across processes; each process has a unique rank from 0 to np-1.
64 means xhpcg runs as 64 parallel processes. They can be spread across several compute nodes or placed on a single node, depending on your hosts file and the other placement options passed to mpirun.
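For an official-quality result the run time in hpcg.dat must be raised to at least 1800 s, and pinning ranks to cores usually helps. A minimal sketch for the same 64-rank layout (--map-by/--bind-to are standard Open MPI options):
# sed -i 's/^300$/1800/' hpcg.dat
# mpirun --allow-run-as-root --mca pml ob1 --map-by core --bind-to core -np 64 ./xhpcg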
References


[*]https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
[*]https://developer.nvidia.com/nvidia-hpc-benchmarks-downloads?target_os=Linux&target_arch=x86_64
[*]https://github.com/davidrohr/hpl-gpu
[*]https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks
[*]https://github.com/NVIDIA/nvidia-hpcg
[*]https://www.amd.com/en/developer/zen-software-studio/applications/pre-built-applications/zen-hpl.html
[*]https://www.netlib.org/benchmark/hpl/
2 Running with the Phoronix Test Suite

For installing the Phoronix Test Suite, see: https://www.cnblogs.com/testing-/p/18303322

[*]Install hpcg
# phoronix-test-suite install hpcg
# cd /var/lib/phoronix-test-suite/test-profiles/pts/hpcg-1.3.0
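If PTS is run as an unprivileged user instead of root, the installed profile lives under the user's home directory; a hedged sketch assuming the standard PTS per-user layout:
# cd ~/.phoronix-test-suite/test-profiles/pts/hpcg-1.3.0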

[*]Edit the test profile
Contents of test-definition.xml:
<?xml version="1.0"?>

<PhoronixTestSuite>
<TestInformation>
    <Title>High Performance Conjugate Gradient</Title>
    <AppVersion>3.1</AppVersion>
    <Description>HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Labs focused for super-computer testing with modern real-world workloads compared to HPCC.</Description>
    <ResultScale>GFLOP/s</ResultScale>
    <Proportion>HIB</Proportion>
    <TimesToRun>1</TimesToRun>
</TestInformation>
<TestProfile>
    <Version>1.3.0</Version>
    <SupportedPlatforms>Linux</SupportedPlatforms>
    <SoftwareType>Benchmark</SoftwareType>
    <TestType>Processor</TestType>
    <License>Free</License>
    <Status>Verified</Status>
    <ExternalDependencies>build-utilities, fortran-compiler, openmpi-development</ExternalDependencies>
    <EnvironmentSize>2.4</EnvironmentSize>
    <ProjectURL>http://www.hpcg-benchmark.org/</ProjectURL>
    <RepositoryURL>https://github.com/hpcg-benchmark/hpcg</RepositoryURL>
    <InternalTags>SMP, MPI</InternalTags>
    <Maintainer>Michael Larabel</Maintainer>
</TestProfile>
<TestSettings>
    <Option>
      <DisplayName>X Y Z</DisplayName>
      <Identifier>xyz</Identifier>
      <Menu>
      <Entry>
          <Name>104 104 104</Name>
          <Value>--nx=104 --ny=104 --nz=104</Value>
      </Entry>
      <Entry>
          <Name>144 144 144</Name>
          <Value>--nx=144 --ny=144 --nz=144</Value>
      </Entry>
      <Entry>
          <Name>160 160 160</Name>
          <Value>--nx=160 --ny=160 --nz=160</Value>
      </Entry>
      <Entry>
          <Name>192 192 192</Name>
          <Value>--nx=192 --ny=192 --nz=192</Value>
      </Entry>
      </Menu>
    </Option>
    <Option>
      <DisplayName>RT</DisplayName>
      <Identifier>time</Identifier>
      <ArgumentPrefix>--rt=</ArgumentPrefix>
      <Menu>
      <Entry>
          <Name>300</Name>
          <Value>300</Value>
          <Message>Shorter run-time</Message>
      </Entry>
      <Entry>
          <Name>1800</Name>
          <Value>1800</Value>
          <Message>Official run-time</Message>
      </Entry>
      </Menu>
    </Option>
</TestSettings>
</PhoronixTestSuite>
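The interactive prompts shown in the next step can also be skipped by presetting the two options defined above. A hedged sketch using PTS batch mode with the xyz/time identifiers from this XML (verify the exact value strings PTS expects for multi-word menu entries):
# PRESET_OPTIONS="hpcg.xyz=--nx=104 --ny=104 --nz=104; hpcg.time=300" phoronix-test-suite batch-benchmark hpcg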

[*]Run the test
# phoronix-test-suite benchmark hpcg

    Evaluating External Test Dependencies .......................................................................................................................

Phoronix Test Suite v10.8.4

    Installed:   pts/hpcg-1.3.0


High Performance Conjugate Gradient 3.1:
    pts/hpcg-1.3.0
    Processor Test Configuration
      1: 104 104 104
      2: 144 144 144
      3: 160 160 160
      4: 192 192 192
      5: Test All Options
      ** Multiple items can be selected, delimit by a comma. **
      X Y Z: 1


      1: 300
      2: 1800
      3: Test All Options
      ** Multiple items can be selected, delimit by a comma. **
      RT: 1


System Information


PROCESSOR:            ARMv8 @ 2.90GHz
    Core Count:         128
    Cache Size:         224 MB
    Scaling Driver:       cppc_cpufreq performance

GRAPHICS:               Huawei Hi171x
    Screen:               1024x768

MOTHERBOARD:            WUZHOU BC83AMDAA01-7270Z
    BIOS Version:         11.62
    Chipset:            Huawei HiSilicon
    Network:            6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892

MEMORY:               16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET

DISK:                   2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N
    File-System:          xfs
    Mount Options:      attr2 inode64 noquota relatime rw
    Disk Scheduler:       MQ-DEADLINE
    Disk Details:         Block Size: 4096

OPERATING SYSTEM:       Kylin Linux Advanced Server V10
    Kernel:               4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314
    Display Server:       X Server 1.20.8
    Compiler:             GCC 7.3.0 + CUDA 12.8
    Security:             itlb_multihit: Not affected
                        + l1tf: Not affected
                        + mds: Not affected
                        + meltdown: Not affected
                        + mmio_stale_data: Not affected
                        + spec_store_bypass: Mitigation of SSB disabled via prctl
                        + spectre_v1: Mitigation of __user pointer sanitization
                        + spectre_v2: Not affected
                        + srbds: Not affected
                        + tsx_async_abort: Not affected

    Would you like to save these test results (Y/n): y
    Enter a name for the result file: hpcg_45_31
    Enter a unique name to describe this test run / configuration:

If desired, enter a new description below to better describe this result set / system configuration under test.
Press ENTER to proceed without changes.

Current Description: ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.

New Description:

High Performance Conjugate Gradient 3.1:
    pts/hpcg-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    1
    Estimated Time To Completion: 38 Minutes
      Started Run 1 @ 02:39:00

    X Y Z: 104 104 104 - RT: 300:
      69.8633

    Average: 69.8633 GFLOP/s

    Do you want to view the text results of the testing (Y/n): Y
hpcg_45_31
ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.


ARMv8:

      Processor: ARMv8 @ 2.90GHz (128 Cores), Motherboard: WUZHOU BC83AMDAA01-7270Z (11.62 BIOS), Chipset: Huawei HiSilicon, Memory: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET, Disk: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N, Graphics: Huawei Hi171x , Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892

      OS: Kylin Linux Advanced Server V10, Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314, Display Server: X Server 1.20.8, Compiler: GCC 7.3.0 + CUDA 12.8, File-System: xfs, Screen Resolution: 1024x768


    High Performance Conjugate Gradient 3.1
    X Y Z: 104 104 104 - RT: 300
    GFLOP/s > Higher Is Better
    ARMv8 . 69.86 |==================================================================================================================================================

    Would you like to upload the results to OpenBenchmarking.org (y/n): y
    Would you like to attach the system logs (lspci, dmesg, lsusb, etc) to the test result (y/n): y

    Results Uploaded To: https://openbenchmarking.org/result/2507083-NE-HPCG4531360

The uploaded results can be viewed in a browser at the URL above.

3 NVIDIA HPC Benchmarks

# docker pull nvcr.io/nvidia/hpc-benchmarks:25.04
# vi HPCG.dat
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
       -v $(pwd):/host_data \
       nvcr.io/nvidia/hpc-benchmarks:25.04 \
       mpirun -np 1 \
       /workspace/hpcg.sh \
       --dat /host_data/HPCG.dat \
       --cpu-affinity 0 \
       --gpu-affinity 0
==========================================
   NVIDIA HPC Benchmarks
==========================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.
All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: No InfiniBand devices detected.
         Multi-node communication performance may be reduced.
         Ensure /dev/infiniband is mounted to this container.
HPCG-NVIDIA 25.4.0  -- NVIDIA accelerated HPCG benchmark -- NVIDIA
Build v0.5.6
Start of application (GPU-Only) ...
Initial Residual = 2838.81
Iteration = 1   Scaled Residual = 0.185703
Iteration = 2   Scaled Residual = 0.101681
...
Iteration = 50   Scaled Residual = 3.94531e-07
GPU Rank Info: | cuSPARSE version 12.5 | Reference CPU memory = 935.79 MB | GPU Name: 'NVIDIA GeForce RTX 4090' | GPU Memory Use: 2223 MB / 24082 MB | Process Grid: 1x1x1 | Local Domain: 128x128x128 | Number of CPU Threads: 1 | Slice Size: 2048
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call Number of Iterations Scaled Residual
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call Number of Iterations Scaled Residual
Call Number of Iterations Scaled Residual
Call Number of Iterations Scaled Residual
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 8.42084e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 4.21042e-10
SpMV call Residual
SpMV call Residual
Initial Residual = 2838.81
Iteration = 1   Scaled Residual = 0.220178
Iteration = 2   Scaled Residual = 0.118926
...
Iteration = 49   Scaled Residual = 4.98548e-07
Iteration = 50   Scaled Residual = 3.08635e-07
Call Scaled Residual
Call Scaled Residual
Call Scaled Residual
...
Call Scaled Residual
Call Scaled Residual
HPCG-Benchmark
version=3.1
Release date=March 28, 2019
Machine Summary=
Machine Summary::Distributed Processes=1
Machine Summary::Threads per processes=1
Global Problem Dimensions=
Global Problem Dimensions::Global nx=128
Global Problem Dimensions::Global ny=128
Global Problem Dimensions::Global nz=128
Processor Dimensions=
Processor Dimensions::npx=1
Processor Dimensions::npy=1
Processor Dimensions::npz=1
Local Domain Dimensions=
Local Domain Dimensions::nx=128
Local Domain Dimensions::ny=128
########## Problem Summary ##########=
Setup Information=
Setup Information::Setup Time=0.00910214
Linear System Information=
Linear System Information::Number of Equations=2097152
Linear System Information::Number of Nonzero Terms=55742968
Multigrid Information=
Multigrid Information::Number of coarse grid levels=3
Multigrid Information::Coarse Grids=
Multigrid Information::Coarse Grids::Grid Level=1
Multigrid Information::Coarse Grids::Number of Equations=262144
Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=2
Multigrid Information::Coarse Grids::Number of Equations=32768
Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=3
Multigrid Information::Coarse Grids::Number of Equations=4096
Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
########## Memory Use Summary ##########=
Memory Use Information=
Memory Use Information::Total memory used for data (Gbytes)=1.49883
Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
Memory Use Information::Bytes per equation (Total memory / Number of Equations)=714.697
Memory Use Information::Memory used for linear system and CG (Gbytes)=1.31912
Memory Use Information::Coarse Grids=
Memory Use Information::Coarse Grids::Grid Level=1
Memory Use Information::Coarse Grids::Memory used=0.15755
Memory Use Information::Coarse Grids::Grid Level=2
Memory Use Information::Coarse Grids::Memory used=0.0196946
Memory Use Information::Coarse Grids::Grid Level=3
Memory Use Information::Coarse Grids::Memory used=0.00246271
########## V&V Testing Summary ##########=
Spectral Convergence Tests=
Spectral Convergence Tests::Result=PASSED
Spectral Convergence Tests::Unpreconditioned=
Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11
Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
Spectral Convergence Tests::Preconditioned=
Spectral Convergence Tests::Preconditioned::Maximum iteration count=1
Spectral Convergence Tests::Preconditioned::Expected iteration count=2
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon=
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=8.42084e-10
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.21042e-10
########## Iterations Summary ##########=
Iteration Count Information=
Iteration Count Information::Result=PASSED
Iteration Count Information::Reference CG iterations per set=50
Iteration Count Information::Optimized CG iterations per set=50
Iteration Count Information::Total number of reference iterations=75150
Iteration Count Information::Total number of optimized iterations=75150
########## Reproducibility Summary ##########=
Reproducibility Information=
Reproducibility Information::Result=PASSED
Reproducibility Information::Scaled residual mean=3.08635e-07
Reproducibility Information::Scaled residual variance=0
########## Performance Summary (times in sec) ##########=
Benchmark Time Summary=
Benchmark Time Summary::Optimization phase=0.017375
Benchmark Time Summary::DDOT=6.03317
Benchmark Time Summary::WAXPBY=6.80771
Benchmark Time Summary::SpMV=58.5598
Benchmark Time Summary::MG=227.166
Benchmark Time Summary::Total=298.585
Floating Point Operations Summary=
Floating Point Operations Summary::Raw DDOT=9.5191e+11
Floating Point Operations Summary::Raw WAXPBY=9.5191e+11
Floating Point Operations Summary::Raw SpMV=8.54573e+12
Floating Point Operations Summary::Raw MG=4.76988e+13
Floating Point Operations Summary::Total=5.81484e+13
Floating Point Operations Summary::Total with convergence overhead=5.81484e+13
GB/s Summary=
GB/s Summary::Raw Read B/W=1200
GB/s Summary::Raw Write B/W=277.327
GB/s Summary::Raw Total B/W=1477.32
GB/s Summary::Total with convergence and optimization phase overhead=1457.89
GFLOP/s Summary=
GFLOP/s Summary::Raw DDOT=157.779
GFLOP/s Summary::Raw WAXPBY=139.828
GFLOP/s Summary::Raw SpMV=145.932
GFLOP/s Summary::Raw MG=209.974
GFLOP/s Summary::Raw Total=194.747
GFLOP/s Summary::Total with convergence overhead=194.747
GFLOP/s Summary::Total with convergence and optimization phase overhead=192.185
User Optimization Overheads=
User Optimization Overheads::Optimization phase time (sec)=0.017375
User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=0.0396317
DDOT Timing Variations=
DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Max DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=0.220609
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=192.185
Final Summary::HPCG 2.4 rating for historical reasons is=193.058
Final Summary::Results are valid but execution time (sec) is=298.585
Final Summary::Official results execution time (sec) must be at least=1800
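To use more than one GPU, the same container can be launched with one MPI rank per GPU. A hedged sketch (the colon-separated per-rank affinity lists follow the usage printed by /workspace/hpcg.sh in the 25.04 container; verify them against your container version and topology):
# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
       -v $(pwd):/host_data \
       nvcr.io/nvidia/hpc-benchmarks:25.04 \
       mpirun -np 2 \
       /workspace/hpcg.sh \
       --dat /host_data/HPCG.dat \
       --cpu-affinity 0:1 \
       --gpu-affinity 0:1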