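Background
An Oracle instance runs on a RHEL cluster and is started through a systemd service, which makes cluster failover operations easier. During testing we found that standard HugePages were not being used.
The details are shown below:
- $ grep HugePages /proc/meminfo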
- AnonHugePages: 0 kB
- ShmemHugePages: 0 kB
- FileHugePages: 0 kB
- HugePages_Total: 2034
- HugePages_Free: 2034
- HugePages_Rsvd: 0
- HugePages_Surp: 0
Root cause analysis
As shown above, HugePages_Free is 2034 and HugePages_Total is also 2034, which means none of the standard HugePages are in use.
Basic information about the Linux server:
- $ more /etc/redhat-release
- Red Hat Enterprise Linux release 8.10 (Ootpa)
- $ free -m
- total used free shared buff/cache available
- Mem: 11697 4929 5986 17 780 6612
- Swap: 16383 0 16383
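The database parameters were checked as follows and fully meet the requirements (use_large_pages is TRUE, and memory_target is 0, so AMM is not in use):
- SQL> select banner from v$version;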
- BANNER
- --------------------------------------------------------------------------------
- Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
- SQL> col name for a20;
- SQL> col value for a32;
- SQL> select name, value from v$parameter
- 2 where name in ('memory_target','sga_target','use_large_pages');
- NAME VALUE
- -------------------- --------------------------------
- use_large_pages TRUE
- sga_target 4261412864
- memory_target 0
- SQL>
The kernel parameter vm.nr_hugepages is also set correctly, as shown below:
- $ grep vm.nr_hugepages /etc/sysctl.conf
- vm.nr_hugepages = 2034
- $ ./hugepages_settings.sh
- This script is provided by Doc ID 401749.1 from My Oracle Support
- (http://support.oracle.com) where it is intended to compute values for
- the recommended HugePages/HugeTLB configuration for the current shared
- memory segments on Oracle Linux. Before proceeding with the execution please note following:
- * For ASM instance, it needs to configure ASMM instead of AMM.
- * The 'pga_aggregate_target' is outside the SGA and
- you should accommodate this while calculating the overall size.
- * In case you changes the DB SGA size,
- as the new SGA will not fit in the previous HugePages configuration,
- it had better disable the whole HugePages,
- start the DB with new SGA size and run the script again.
- And make sure that:
- * Oracle Database instance(s) are up and running
- * Oracle Database 11g Automatic Memory Management (AMM) is not setup
- (See Doc ID 749851.1)
- * The shared memory segments can be listed by command:
- # ipcs -m
- Press Enter to proceed...
- Recommended setting: vm.nr_hugepages = 2034
The memlock settings in the resource limits file limits.conf are also correct, as shown below:
- # grep memlock /etc/security/limits.conf
- # - memlock - max locked-in-memory address space (KB)
- oracle soft memlock 10485760
- oracle hard memlock 10485760
- # su - oracle
- Last login: Fri Aug 8 13:54:36 CST 2025 on pts/0
- $ ulimit -l
- 10485760
- $ grep memlock /etc/security/limits.conf
- # - memlock - max locked-in-memory address space (KB)
- oracle soft memlock 10485760
- oracle hard memlock 10485760
It was puzzling that every setting was correct and yet Oracle would not use standard HugePages. After restarting the Oracle instance, a clue showed up in the alert log:
- **********************************************************************
- 2025-08-08T13:50:16.662256+08:00
- Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
- 2025-08-08T13:50:16.662285+08:00
- Domain name: system.slice/bpsdbsvr.service
- 2025-08-08T13:50:16.662302+08:00
- Per process system memlock (soft) limit = 64K
- 2025-08-08T13:50:16.662318+08:00
- Expected per process system memlock (soft) limit to lock
- instance MAX SHARED GLOBAL AREA (SGA) into memory: 4066M
- 2025-08-08T13:50:16.662356+08:00
- Available system pagesizes:
- 4K, 2048K
- 2025-08-08T13:50:16.662387+08:00
- Supported system pagesize(s):
- 2025-08-08T13:50:16.662404+08:00
- PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
- 2025-08-08T13:50:16.662421+08:00
- 4K Configured 11 1040395 NONE
- 2025-08-08T13:50:16.662450+08:00
- 2048K 2034 2033 0 NONE
- 2025-08-08T13:50:16.662466+08:00
- RECOMMENDATION:
- 2025-08-08T13:50:16.662483+08:00
- 1. Increase per process memlock (soft) limit to at least 4066MB
- to lock 100% of SHARED GLOBAL AREA (SGA) pages into physical memory
- 2025-08-08T13:50:16.662514+08:00
- **********************************************************************
Checking the limits of the Oracle process shows that its Max locked memory is 65536 bytes, that is, 64K.
- # oracle_pid=$(pgrep -f "_pmon_")
- # cat /proc/$oracle_pid/limits
- Limit Soft Limit Hard Limit Units
- Max cpu time unlimited unlimited seconds
- Max file size unlimited unlimited bytes
- Max data size unlimited unlimited bytes
- Max stack size 33554432 unlimited bytes
- Max core file size 0 unlimited bytes
- Max resident set unlimited unlimited bytes
- Max processes 46635 46635 processes
- Max open files 262144 262144 files
- Max locked memory 65536 65536 bytes
- Max address space unlimited unlimited bytes
- Max file locks unlimited unlimited locks
- Max pending signals 46635 46635 signals
- Max msgqueue size 819200 819200 bytes
- Max nice priority 0 0
- Max realtime priority 0 0
- Max realtime timeout unlimited unlimited us
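In other words, when the Oracle instance is started by the systemd service, the memlock limit is still 64K for some reason, which matches the alert log message "Per process system memlock (soft) limit = 64K".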
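The check above can be scripted. The following is a minimal sketch, assuming a bash shell; the function names `memlock_soft_limit` and `memlock_sufficient` are hypothetical helpers, not part of any Oracle or systemd tooling. It reads the soft "Max locked memory" value from a /proc/<pid>/limits-style file and compares it with the number of bytes the SGA needs.

```shell
#!/usr/bin/env bash

# Print the soft "Max locked memory" limit from a /proc/<pid>/limits-style file.
# Fields on that line: Max locked memory <soft> <hard> bytes
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Succeed (exit status 0) if the soft limit can hold the given number of bytes.
# $1: limits file, $2: required bytes
memlock_sufficient() {
    local limit
    limit=$(memlock_soft_limit "$1")
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$2" ]
}
```

For example, `memlock_sufficient "/proc/$oracle_pid/limits" $((4066 * 1024 * 1024))` would fail for the process shown above, because its soft limit is only 65536 bytes while the alert log asks for at least 4066M.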
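After starting the database instance manually instead, standard HugePages were used by Oracle, whereas starting the instance through the systemd service reproduces the problem above:
- $ grep HugePages /proc/meminfo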
- AnonHugePages: 0 kB
- ShmemHugePages: 0 kB
- FileHugePages: 0 kB
- HugePages_Total: 2034
- HugePages_Free: 4
- HugePages_Rsvd: 3
- HugePages_Surp: 0
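To compare the two /proc/meminfo outputs at a glance, the counters can be reduced to a single number. This is a minimal sketch, assuming a bash shell with awk available; the function name `hugepages_committed` is a hypothetical helper. It counts pages that are either already allocated (Total minus Free) or reserved but not yet faulted in (Rsvd).

```shell
#!/usr/bin/env bash

# Print the number of HugePages that are allocated or reserved.
# $1: path to a meminfo-style file (defaults to /proc/meminfo)
hugepages_committed() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free  = $2 }
         /^HugePages_Rsvd:/  { rsvd  = $2 }
         END { print total - free + rsvd }' "${1:-/proc/meminfo}"
}
```

For the first output in this article (Total 2034, Free 2034, Rsvd 0) the result is 0. For the output after the manual start (Total 2034, Free 4, Rsvd 3) it is 2033, which matches the EXPECTED_PAGES value for 2048K pages in the alert log.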
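After digging through documentation with a colleague, we found that services started via systemctl do not read the resource limits file (limits.conf) by default. The limits in limits.conf are session-level resource controls that the PAM module applies when a user logs in.
A systemd service, however, is started directly by the systemd process. It is not a login session, so by default the PAM module pam_limits.so is never invoked.
As a result, none of the resource limits set for the oracle user in /etc/security/limits.conf are automatically applied to processes started through a systemd service. Those processes get the limits of the service unit instead, which here was the 64K seen above.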
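The limit that systemd will apply to a unit can be inspected without starting the service. The sketch below assumes a bash shell and relies on `systemctl show <unit> -p LimitMEMLOCK`, which prints the property as a `LimitMEMLOCK=<value>` line; the function name `unit_memlock` is a hypothetical helper, and `oracle.service` is the obfuscated unit name used in this article.

```shell
#!/usr/bin/env bash

# Extract the LimitMEMLOCK value from `systemctl show` output read on stdin.
# Lines for other properties (for example LimitMEMLOCKSoft=...) are ignored.
unit_memlock() {
    sed -n 's/^LimitMEMLOCK=//p'
}

# Example usage on the affected host (not executed here):
#   systemctl show oracle.service -p LimitMEMLOCK | unit_memlock
# Compare the result with `ulimit -l` in a login shell of the oracle user,
# keeping in mind that ulimit -l reports KB while systemd reports bytes.
```

If the unit reports a small value while the login shell reports the value from limits.conf, the service is not picking up the PAM limits, which is the situation described above.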
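Solution
To make the memlock limit take effect when the Oracle instance is started by the systemd service, either configure the memlock limit directly in oracle.service or have the service read limits.conf through PAM.
Online references recommend configuring it directly in the systemd service file (this is the approach systemd recommends, and it is more reliable than depending on limits.conf), as shown below:
Original oracle.service configuration:
- [Unit]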
- Description=Oracle Database Service
- After=network.target
- [Service]
- Type=forking
- User=oracle
- Group=oinstall
- ExecStart=/home/oracle/xxxx/ora19c.sh start
- ExecStop=/home/oracle/xxxx/ora19c.sh shutdown
- StandardOutput=append:/var/log/rhcs_resource_logs/xxx/xxx.log
- RemainAfterExit=yes
- KillMode=none
- [Install]
- WantedBy=multi-user.target
Note: the oracle.service configuration has been lightly obfuscated, which does not affect understanding.
Modified oracle.service configuration:
- [Unit]
- Description=Oracle Database Service
- After=network.target
- [Service]
- Type=forking
- User=oracle
- Group=oinstall
- ExecStart=/home/oracle/xxxx/ora19c.sh start
- ExecStop=/home/oracle/xxxx/ora19c.sh shutdown
- StandardOutput=append:/var/log/rhcs_resource_logs/xxx/xxx.log
- RemainAfterExit=yes
- KillMode=none
- # oracle /etc/security/limits.conf
- LimitNPROC=16384
- LimitNOFILE=65536
- LimitSTACK=10485760
- LimitMEMLOCK=10737418240
- [Install]
- WantedBy=multi-user.target
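One detail in the modified unit is easy to get wrong: limits.conf expresses memlock in KB, while LimitMEMLOCK= without a suffix is interpreted in bytes. The sketch below, assuming a bash shell, shows the conversion behind the value used above; `memlock_kb_to_bytes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Convert a limits.conf memlock value (KB) to the byte count used by LimitMEMLOCK=.
memlock_kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

memlock_kb_to_bytes 10485760   # prints 10737418240, the value in the unit file

# After editing the unit file, systemd has to reload it before the new limits
# apply to the next start of the service (not executed here):
#   systemctl daemon-reload
#   systemctl restart oracle.service   # or restart through the cluster's own procedure
```

Writing the limit as a plain byte count keeps it directly comparable with the "Max locked memory" line in /proc/<pid>/limits, which is also reported in bytes.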
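With this configuration, an Oracle instance started through the systemd service uses standard HugePages normally, and the problem is solved. The root cause was simply that we did not know the Linux systemd service mechanism well enough.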