Convert a 3-Node 11.2.0.3 RAC cluster to use HUGEPAGES ( OEL 6.4 )

Why should you use HugePages ?

  • Larger Page Size and Fewer Pages: The default page size is 4K, whereas the HugeTLB size is 2048K. That means the system has to handle 512 times fewer pages.
  • Reduced Page Table Walking: Since a HugePage covers a greater contiguous virtual address range than a regular-sized page, the probability of a TLB hit per TLB entry is higher with HugePages than with regular pages. This reduces how often page tables are walked to translate a virtual address into a physical address.
  • Less Overhead for Memory Operations: On virtual memory systems (any modern OS) each memory operation is actually two abstract memory operations. With HugePages there are fewer pages to work on, so the potential bottleneck on page table access is avoided.
  • Less Memory Usage: From the Oracle Database perspective, with HugePages the Linux kernel uses less memory for the page tables that maintain the virtual-to-physical mappings of the SGA address range than it would with regular-sized pages. This leaves more memory available for process-private computations or PGA usage.
  • No Swapping: Swapping must be avoided on Linux altogether (Document 1295478.1). HugePages are not swappable (whereas regular pages are), so there is no page-replacement overhead; HugePages are universally regarded as pinned.
  • No ‘kswapd’ Operations: kswapd gets very busy if there is a very large area to be paged (i.e. 13 million page table entries for 50GB of memory) and can consume an enormous amount of CPU. When HugePages are used, kswapd is not involved in managing them. See also Document 361670.1.
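The "512 times fewer pages" figure above can be verified with a few lines of shell arithmetic. The 50 GB SGA below is just an example value, chosen to match the kswapd example:

```shell
#!/bin/sh
# Sketch: compare the number of OS pages needed to map an SGA
# with 4 kB regular pages vs 2048 kB HugePages.
# sga_kb is an assumed example (a 50 GB SGA), not from a live system.

pages_needed() {          # $1 = region size in kB, $2 = page size in kB
    echo $(( $1 / $2 ))
}

sga_kb=$(( 50 * 1024 * 1024 ))            # 50 GB expressed in kB
regular=$(pages_needed "$sga_kb" 4)       # 4 kB default pages
huge=$(pages_needed "$sga_kb" 2048)       # 2 MB HugePages

echo "regular pages: $regular"            # ~13 million entries
echo "huge pages:    $huge"
echo "ratio:         $(( regular / huge ))"   # 512
```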

What should you know about using HugePages ?

  • Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC.
  • Oracle strongly advises disabling the use of Transparent HugePages.
  • Oracle highly recommends the standard HugePages that were recommended for previous releases of Linux.
  • Automatic Memory Management (AMM) is not compatible with Linux HugePages; use Automatic Shared Memory Management (ASMM) and Automatic PGA Management instead.
  • If standard HugePages are not configured, RACCHECK fails with: “the Operating system hugepages count does not satisfy total SGA requirements”.
  • HugePages are allocated in a lazy fashion, so the “HugePages_Free” count drops as the pages get touched and are backed by physical memory. The idea is that this is more efficient: you don’t use memory you don’t touch.
  • If you set the instance initialization parameter PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1), all of the pages are allocated from HugePages up front.
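As a quick sanity check for the AMM point above: on Linux, AMM (MEMORY_TARGET) backs the SGA with files in /dev/shm, so AMM-style segments there usually mean an instance is still running with AMM. A minimal sketch, assuming the segment names start with "ora" (the example name below is made up):

```shell
#!/bin/sh
# Sketch: filter a /dev/shm listing for AMM-style Oracle segments.
# Reads a file listing on stdin, prints only the matching names.
amm_segments() {
    grep -i '^ora' || true    # no matches is a valid (good) result
}

# On a live node you would feed it the real listing:
#   ls /dev/shm | amm_segments
# Illustrative run with a made-up segment name:
printf 'ora_ORCL1_393218_0\nsomething_else\n' | amm_segments
```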

To convert a RAC database to use HugePages we need to perform the following steps:

  • Disable  Transparent HugePage
  • Convert database from AMM to ASMM
  • Configure standard HUGEPAGES

 

Disable  Transparent HugePages

Check the status of Transparent HugePages
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
--> [always] indicates that this system is using Transparent HugePages

Get the current Hugepagesize 
#  grep Huge /proc/meminfo
AnonHugePages:    485376 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
--> Hugepagesize: 2 MByte
    AnonHugePages: 485376 kB -> the kernel is using Transparent HugePages, as AnonHugePages > 0 kB
Because the kernel currently uses Transparent HugePages only for anonymous memory blocks like stack and heap, the value of
AnonHugePages in /proc/meminfo is the current amount of Transparent HugePages the kernel is using.

To disable Transparent HugePages, add transparent_hugepage=never to /etc/grub.conf (see Note 1557478.1 for the /sys/kernel/ method).
After changing /etc/grub.conf, reboot your system and verify that Transparent HugePages are disabled.
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
# grep AnonHugePages /proc/meminfo
AnonHugePages:         0 kB
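On a 3-node cluster this check has to pass on every node. A small helper that extracts the active (bracketed) mode makes the check scriptable; the hostnames rac1..rac3 in the commented loop are placeholders for your cluster:

```shell
#!/bin/sh
# Sketch: extract the active THP mode (the token in brackets) from the
# content of /sys/kernel/mm/transparent_hugepage/enabled, read on stdin.
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Cluster-wide check (assumes passwordless ssh as root; node names are placeholders):
# for n in rac1 rac2 rac3; do
#     echo "$n: $(ssh "$n" cat /sys/kernel/mm/transparent_hugepage/enabled | thp_mode)"
# done

echo 'always madvise [never]' | thp_mode    # prints: never
```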

Switching RAC database from Automatic Memory Management ( AMM ) to Automatic Shared Memory Management ( ASMM )

Please see Doc ID 749851.1 ( HugePages and Oracle Database 11g Automatic Memory Management (AMM) on Linux ) for how to achieve this.

 

Setup standard HugePages

Configure the memlock user limit
Set the memlock user limit in the /etc/security/limits.conf file.
Set the value (in KB) slightly smaller than the installed RAM, e.g.
for 4 GB RAM installed you may set: ( 4*1024*1024 = 4194304 ) -> 4200000
# Needed for hugepages setup ( 4 GByte )  
*   soft   memlock    4200000
*   hard   memlock    4200000
Log in as the oracle user and verify the setting ( csh )
$ limit | grep memorylocked
memorylocked 4200000 kbytes
or sh
$  ulimit -l
4200000
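Instead of computing the value by hand, it can be derived from MemTotal in /proc/meminfo. The 90% factor below is an illustrative rule of thumb, not a value taken from this note:

```shell
#!/bin/sh
# Sketch: derive a memlock value (in kB) from the installed RAM.
# The 90% factor is an assumption for illustration only.
memlock_kb() {            # $1 = MemTotal in kB
    echo $(( $1 * 90 / 100 ))
}

# On a live node:
#   mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   memlock_kb "$mem_kb"
memlock_kb 4194304        # example: 4 GB RAM in kB
```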

Start instance(s) and verify your current Huge pages usage
$  grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
--> Standard HugePages are not yet configured

Calculate nr_hugepages using the script from Document 401749.1
./hugepages_settings.sh
This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m
Press Enter to proceed...
Recommended setting: vm.nr_hugepages = 708
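The core of the calculation the script performs can be sketched as follows: sum the running shared memory segments (as reported by ipcs -m) and round each one up to a whole number of HugePages. The segment size in the example call is illustrative, roughly matching the 1410 MB SGA seen later in the alert.log; the real script adds its own handling on top of this:

```shell
#!/bin/sh
# Sketch of the Doc ID 401749.1 calculation: one HugePage count per
# shared memory segment, rounded up, then summed.
nr_hugepages() {          # args: HugePage size in kB, then segment sizes in bytes
    hp_kb=$1; shift
    total=0
    for seg_bytes in "$@"; do
        # round each segment up to a whole number of HugePages
        total=$(( total + (seg_bytes / 1024 + hp_kb - 1) / hp_kb ))
    done
    echo "$total"
}

# On a live node you would collect the segment sizes with something like:
#   ipcs -m | awk '$5 ~ /^[0-9]+$/ {print $5}'
nr_hugepages 2048 1478492160      # a single 1410 MB SGA segment -> 705 pages
```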

Set the kernel parameter and reboot the system
 Edit the file /etc/sysctl.conf and set the vm.nr_hugepages parameter there:
    vm.nr_hugepages = 708

Check the available HugePages after rebooting the system and restarting the RAC instances
As the oracle user, verify ( HugePages_Free < HugePages_Total --> HugePages are in use )
$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:     708
HugePages_Free:      399
HugePages_Rsvd:      396
HugePages_Surp:        0
Hugepagesize:       2048 kB
$ id
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),501(vboxsf),506(asmdba),54322(dba)
$ limit | grep memorylocked
memorylocked 4200000 kbytes
$  cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
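The HugePages_Free < HugePages_Total check above can be scripted, e.g. for a post-reboot health check on each node:

```shell
#!/bin/sh
# Sketch: decide whether the HugePages pool is actually in use,
# parsing /proc/meminfo-style input on stdin.
hugepages_in_use() {
    awk '/^HugePages_Total:/ {t=$2}
         /^HugePages_Free:/  {f=$2}
         END { if (t > 0 && f < t) print "IN USE"; else print "NOT IN USE" }'
}

# On a live node:  hugepages_in_use < /proc/meminfo
printf 'HugePages_Total:     708\nHugePages_Free:      399\n' | hugepages_in_use
```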

Verify Huge Page usage in the alert.log
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 1410 MB (100%) 
Large Pages used by this instance: 705 (1410 MB)
Large Pages unused system wide = 3 (6144 KB) (alloc incr 16 MB)
Large Pages configured system wide = 708 (1416 MB)
Large Page size = 2048 KB
***********************************************************

 

For troubleshooting, check the note HugePages on Oracle Linux 64-bit (Doc ID 361468.1):

Symptom / Possible Cause / Troubleshooting Action

  • Symptom: System is running out of memory or swapping
    Cause:   Not enough HugePages to cover the SGA(s); the area reserved for HugePages is wasted while the SGAs are allocated through regular pages.
    Action:  Review your HugePages configuration to make sure that all SGA(s) are covered.

  • Symptom: Databases fail to start
    Cause:   memlock limits are not set properly.
    Action:  Make sure the settings in limits.conf apply to the database owner account.

  • Symptom: One database fails to start while another is up
    Cause:   The SGA of the specific database could not find available HugePages and the remaining RAM is not enough.
    Action:  Make sure that RAM and HugePages are enough to cover all your database SGAs.

  • Symptom: Cluster Ready Services (CRS) fail to start
    Cause:   HugePages configured too large (maybe larger than the installed RAM).
    Action:  Make sure the total SGA is less than the installed RAM and re-calculate HugePages.

  • Symptom: HugePages_Total = HugePages_Free
    Cause:   HugePages are not used at all: no database instance is up, or the instances use AMM.
    Action:  Disable AMM and make sure that the database instances are up. See Doc ID 1373255.1.

  • Symptom: Database starts successfully but performance is slow
    Cause:   The SGA of the specific database could not find available HugePages, so the SGA is handled by regular pages, which leads to slow performance.
    Action:  Make sure that there are enough HugePages to cover all your database SGAs.
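Several of these symptoms boil down to a classification of the /proc/meminfo HugePages counters. A rough sketch (labels and logic are illustrative, not from the note):

```shell
#!/bin/sh
# Sketch: classify a HugePages state from the Total/Free/Rsvd counters.
# The labels mirror the troubleshooting table above.
hp_diagnose() {           # $1 = HugePages_Total, $2 = HugePages_Free, $3 = HugePages_Rsvd
    if [ "$1" -eq 0 ]; then
        echo "not configured"
    elif [ "$1" -eq "$2" ] && [ "$3" -eq 0 ]; then
        echo "configured but unused (instances down or AMM?)"
    else
        echo "in use"
    fi
}

# Example with the counters seen earlier on this cluster:
hp_diagnose 708 399 396   # prints: in use
```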

 

Memory considerations using Hugepages – Monitor tool:  top

Without TFA,CRS,RAC Rdbms - pure OS with Hugepages disabled 

Mem:   3602324k total,   686116k used,  2916208k free,    36936k buffers
Swap:  6373372k total,        0k used,  6373372k free,   271596k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 3099 root      20   0 1065m 114m 9.8m S  0.7  3.2   0:08.47 java                                                                   
 2694 root      20   0  149m  47m 7132 S  0.3  1.3   0:02.17 Xorg                                                                   
 2838 root      20   0  197m 1476  960 R  0.3  0.0   0:00.22 VBoxClient 
--> The OS alone uses only ~670 MByte of physical memory - ~2.9 GByte are free

Without TFA,CRS,RAC Rdbms - pure OS with Hugepages enabled
# Using hugepages
vm.nr_hugepages = 708

Mem:   3602324k total,  1996272k used,  1606052k free,    34716k buffers
Swap:  6373372k total,        0k used,  6373372k free,   245076k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 2613 root      20   0  144m  42m 7632 S  0.0  1.2   0:04.23 Xorg                                                                   
 2860 root      20   0  918m  21m  14m S  0.0  0.6   0:01.49 nautilus 
-> HugePages are allocated at boot time
-> Free memory drops from 2.9 GByte to 1.6 GByte ( 708 x 2 MByte = 1.4 GByte )
-> HugePages are not pageable
-> All the HugePage memory is taken from physical RAM
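The ~1.4 GByte drop is simply nr_hugepages times the HugePage size, which a one-line calculation confirms:

```shell
#!/bin/sh
# Sketch: size of the HugePages pool reserved at boot, in kB.
pool_kb() {               # $1 = vm.nr_hugepages, $2 = Hugepagesize in kB
    echo $(( $1 * $2 ))
}

pool_kb 708 2048          # prints: 1449984  (~1.4 GByte)
```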

Active components  : TFA, OS with Hugepages
Mem:   3602324k total,  2130804k used,  1471520k free,    38464k buffers
Swap:  6373372k total,        0k used,  6373372k free,   273624k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 4470 root      20   0 1067m 104m 9.8m S  0.0  3.0   0:09.72 java     <-- TFA           
 2613 root      20   0  144m  42m 7632 S  0.0  1.2   0:04.23 Xorg                                                                   
 2860 root      20   0  918m  21m  14m S  0.0  0.6   0:01.49 nautilus 
-> Even though TFA uses ~1 GByte of virtual memory, it only takes ~100 MByte of resident (RSS) memory

Active components:  TFA,CRS,ASM,OS with Hugepages
Mem:   3602324k total,  3136516k used,   465808k free,    43168k buffers
Swap:  6373372k total,        0k used,  6373372k free,   720812k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 5690 grid      RT   0  646m 115m  53m S  1.0  3.3   0:02.46 ocssd.bin                                                              
 4470 root      20   0 1067m 105m 9.8m S  0.0  3.0   0:10.80 java                                                                   
 5678 root      RT   0  636m  94m  55m S  0.0  2.7   0:00.18 cssdagent                                                              
 5658 root      RT   0  635m  93m  55m S  0.0  2.7   0:00.25 cssdmonitor                                                            
 5646 root      RT   0  628m  86m  55m S  1.7  2.5   0:02.74 osysmond.bin                                                      
 2613 root      20   0  144m  42m 7896 S  0.3  1.2   0:06.75 Xorg
--> After loading CRS, free memory drops by about 1 GByte because:
    --> Lots of CRS processes get started and use RSS memory
    --> ASM gets started
    --> Lots of Oracle shared libraries get mapped into memory

Active components: RAC RDBMS instance, TFA,CRS,ASM,OS with Hugepages
Mem:   3602324k total,  3519732k used,    82592k free,    47796k buffers
Swap:  6373372k total,    14692k used,  6358680k free,   740624k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 9369 root      RT   0  431m 144m  60m D  1.7  4.1   0:35.32 ologgerd                                                               
 5690 grid      RT   0  646m 116m  53m S  0.7  3.3   0:25.15 ocssd.bin                                                              
 4470 root      20   0 1067m 107m 9.8m S  0.0  3.0   0:18.15 java                                                                   
 5678 root      RT   0  636m  94m  55m S  0.3  2.7   0:01.96 cssdagent                                                              
 5658 root      RT   0  635m  93m  55m S  0.3  2.7   0:02.22 cssdmonitor                                                            
 5646 root      RT   0  629m  87m  55m S  2.3  2.5   0:36.55 osysmond.bin  
--> After the RDBMS restart, only ~400 MByte are allocated from free memory
--> Most of the Oracle libraries were already mapped during CRS startup 
--> The SGA comes from the HugePages pool that was reserved at boot time
--> The Oracle RDBMS processes still take some resident memory ( ~ 400 MByte per instance ) 

 

References

  • HugePages on Oracle Linux 64-bit (Doc ID 361468.1)          <– Read this first
  • ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels (Doc ID 1557478.1)
  • HugePages and Oracle Database 11g Automatic Memory Management (AMM) on Linux (Doc ID 749851.1)
  • HugePages on Linux: What It Is… and What It Is Not… (Doc ID 361323.1)
  • Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)
