RAC Investigation on Low-Memory Linux | ORA600
Back in the Oracle 9i days, I was one of those people who got on eBay to buy FireWire PCI cards and disks that could do non-exclusive login. Remember that? The first time a little test cluster could be cheap enough for the home enthusiast? I still have the parts in my closet.
Of course, we all know what happened after that – virtualization. It didn’t take long before my home-built test clusters were running on VMware. (Personally, I think that virtualization really started because of those NES and SNES emulators. Most great achievements start with a geek who wants to play more video games.) There are lots of people now who run RAC on virtual environments and it’s easy to find tutorials on the web for many different OS and VM combinations.
Low-Memory Linux
Something I haven’t seen many other people do is RAC with a very small memory configuration. Like 760M of memory per server. (!) Of course you’d only do this for a hobby setup – never on a system where you want any kind of support. But I’m kinda cheap… and running RAC on these small VMs means that I don’t have to go buy an expensive new home computer. My current Gateway laptop with Vista Home does the job quite nicely!
10.2 and 11.1 RAC will install and run on servers with 760M of memory. But things were a little unstable at first. Now I’m the curious type… I like to fiddle with things… so I investigated a little bit.
Basic Unix Investigation
There are two basic investigation scenarios:
| what happened in the past | My main tool is sar (System Activity Reporter). Or Java-based ksar on my desktop – it gets data via ssh and graphs it. |
| what is happening now | My starting point is vmstat and top. To dig a little deeper, I might then use other tools like ps, free, iostat or netstat. |
In this particular case, I noticed pretty quickly from the top utility that one process was consuming over 30% of the system’s memory! (Note: in top, you can press the ‘<’ and ‘>’ keys to move the sort column left and right. The initial sort column is %CPU. I moved it one column to the right, sorting by %MEM.)
top - 18:41:13 up 5:28, 3 users, load average: 0.04, 0.35, 0.60
Tasks: 180 total, 2 running, 178 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.7%us, 3.7%sy, 0.0%ni, 89.4%id, 6.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 767020k total, 754296k used, 12724k free, 7084k buffers
Swap: 1540088k total, 654696k used, 885392k free, 361800k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17435 oracle    RT   0  230m 229m  31m S  0.3 30.6   0:26.21 /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin
 7765 oracle    15   0  464m 111m 102m S  0.0 14.9   0:04.52 ora_smon_RAC1
 7743 oracle    -2   0  445m  90m  83m S  0.0 12.1   0:12.01 ora_lms0_RAC1
 7783 oracle    15   0  440m  70m  66m S  0.0  9.4   0:05.35 ora_mmon_RAC1
20321 oracle    15   0  438m  48m  45m S  0.0  6.5   0:00.88 oracleRAC1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 8676 oracle    16   0  437m  46m  45m S  0.0  6.2   0:02.69 oracleRAC1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 8801 oracle    15   0  440m  44m  41m S  0.0  6.0   0:04.57 ora_cjq0_RAC1
This process (ocssd) is Oracle’s Cluster Synchronization Services Daemon. It’s the process that sends and receives heartbeats from other nodes. Any delays sending or receiving those heartbeats can cause node evictions (a.k.a. server reboots) – so it’s a pretty important process! That’s why it runs with realtime (RT) scheduling priority, as you can see in the above output from top.
I was surprised that CSS uses so much physical memory – usually Linux is very good at memory management. In top, the VIRT column shows how much total memory each process is using, while the RES column shows how much actual physical memory Linux has allocated to it. It’s clear that Linux is pretty actively managing the physical memory for other processes.
A quick glance at vmstat shows that although we are actively swapping, it seems under control. This is about what I’d expect when we’re idle and all of the processes except CSS are sharing only 500M of memory:
collabn1:/home/oracle[RAC1]$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 604144  14024  34948 300632    0    0    32    86 1044 1797  1  8 84  7  0
 2  1 603912  11224  34968 303968   87    0   763    81 1083 1902  3 11 56 30  0
 0  0 604840  12660  34912 303904   25    0   253    38 1050 1975  2 11 69 18  0
 0  0 604840  12728  34924 303900    0    0    68   106 1043 1975  1  9 82  8  0
 0  0 604784  14536  34936 304140   19    0    89    71 1043 2042  1  5 81 13  0
 1  0 604752  14168  34944 304496    6    0   119    21 1040 2016  2  8 80 11  0
 0  1 604736  18732  35020 306212    0    0   384    73 1050 2489  1 10 70 19  0
 1  1 604736  11144  35232 311252  104    0  1209   128 1074 2024  2 12 38 47  0
 3  1 607900   8056  30788 307352   77 1500   504  1661 1055 2642  9 16 56 19  0
 0  0 607900   9836  30800 307360    0    0    71    70 1031 1798  1  6 85  8  0
 1  0 607884  10536  30812 307400    8    0    44    93 1031 1832  1  4 87  8  0
The so column tells us when memory is written out to disk (and removed from physical memory). The si column tells us when memory is read back in from disk (and put back in physical memory). On a side note, remember that on a healthy Unix system free memory is usually small, because the kernel puts spare memory to work as buffers and cache. Sometimes this is confusing at first.
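If you'd rather not eyeball the si and so columns, a small filter can pick out the samples with swap traffic. This is only a sketch: the function name swap_activity is made up for illustration, and it assumes the vmstat column layout shown above (si in field 7, so in field 8).

```shell
# Read vmstat output on stdin and print only the samples with swap traffic.
# Assumption: si is field 7 and so is field 8, as in the output above.
swap_activity() {
    awk '
        # Data rows start with a number; header rows do not.
        $1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
            printf "si=%s so=%s\n", $7, $8
        }
    '
}
```

Something like `vmstat 5 12 | swap_activity` would then print one line per interval that swapped. Keep in mind that the first row vmstat prints is an average since boot, not a live sample.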
Linux Process Memory Investigation
Nonetheless, I’m not happy that CSS is using 30% of my physical memory in this highly-constrained hobby environment. Why is Linux allowing this? The first clue comes simply from the output of the familiar unix ps utility:
collabn1:/home/oracle[RAC1]$ ps v -C ocssd.bin
  PID TTY      STAT   TIME  MAJFL   TRS    DRS    RSS %MEM COMMAND
17435 ?        SLl    0:08      7   588 235331 234840 30.6 /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin
On Linux, the “v” flag tells ps to give information relevant to virtual memory. TRS and DRS tell me how much physical (resident) memory is used for machine executable code (text) and data, respectively. But more importantly – the STAT column gives some informative BSD-style flags about the process. That capital-L indicates that CSS has some pages that are locked into physical memory. Bingo.
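The capital-L check is easy to automate. Here is a minimal sketch with a hypothetical helper name (stat_has_locked_pages) that takes a STAT string and reports whether the locked-pages flag is set. The match is case-sensitive on purpose, since a lowercase l means something different (multi-threaded).

```shell
# Succeed if a ps STAT string carries the "L" (pages locked in memory) flag.
# Lowercase "l" (multi-threaded) must not match, so the test is case-sensitive.
stat_has_locked_pages() {
    case "$1" in
        *L*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

On a procps-based system you could feed it with something like `stat_has_locked_pages "$(ps -o stat= -p 17435)" && echo locked`, where 17435 is the CSS process ID from the output above.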
If I have root access, then I can get a very detailed report on process memory usage with the pmap command. The output was a little long, so I’ve abbreviated it here:
[root@collabn1 ~]# pmap -x 17435
17435: /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin
Address   Kbytes     RSS    Anon  Locked Mode   Mapping
00110000     656       -       -       - r-x--  libhasgen11.so
001b4000       8       -       -       - rwx--  libhasgen11.so
... 37 more library blocks
02a0d000     100       -       -       - rwx--    [ anon ]
02a26000       4       -       -       - --x--    [ anon ]
02a27000   10240       -       -       - rwx--    [ anon ]
03427000       4       -       -       - --x--    [ anon ]
03428000   10240       -       -       - rwx--    [ anon ]
... 10 more anonymous blocks, half are 10240K
08048000     592       -       -       - r-x--  ocssd.bin
080dc000       4       -       -       - rwx--  ocssd.bin
... 30 more anonymous blocks, half are 10240K
bfe40000     148       -       -       - rwx--    [ stack ]
bfe65000       8       -       -       - rw---    [ anon ]
-------- ------- ------- ------- -------
total kB  235920       -       -       -
Interestingly, the Linux pmap utility does not indicate any locked memory! I don’t know whether that output column is non-functional or if it refers only to some particular kind of locking. But at any rate, I know something is locked. I couldn’t think of anything better, so the next place I looked was in the Linux /proc pseudo-filesystem.
collabn1:/home/oracle[RAC1]$ grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:    235920 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB
Now we’re talking. The process has a total of 235920 kB of memory – and it’s ALL locked. On a normal RAC system you’d want this. Generally, important realtime processes should be locked so that they are never delayed by paging or swapping. (Remember how that could cause node reboots?)
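To compare VmLck against VmSize without doing the arithmetic by hand, a short awk wrapper will do. The sketch below uses a made-up function name (locked_summary) and takes the path to a status file as its only argument. It assumes the VmSize and VmLck lines appear in the format shown above.

```shell
# Summarize how much of a process's virtual size is locked into memory.
# Argument: path to a /proc/<pid>/status file.
locked_summary() {
    awk '
        /^VmSize:/ { size = $2 }   # total virtual size, in kB
        /^VmLck:/  { lck  = $2 }   # locked portion, in kB
        END {
            if (size > 0)
                printf "%d of %d kB locked (%d%%)\n", lck, size, lck * 100 / size
        }
    ' "$1"
}
```

For the CSS process above you would call it as `locked_summary /proc/17435/status`. Kernel threads have no Vm lines in their status file, so the function prints nothing for them.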
But I personally doubt that all of the memory really NEEDS to be locked, and I think that Linux will actually do a decent job of not swapping the most important parts. And my highly constrained hobby environment will probably run much smoother if Linux has more flexibility when managing a measly 760M of memory.
Unlocking Linux Process Memory
But is it actually possible to unlock the process memory? As far as I know, Oracle provides no option to disable CSS memory locking. (For good reason.) There is the system call munlockall() – which unlocks all of the calling process’s memory. But the CSS process itself would have to call this function. And of course it will not. Or will it?
If you’ve got root, then there’s a hacker-back-door way of doing this. Remember, you’d be crazy to try this anywhere besides a dark closet at home. And if you type too slowly then CSS could reboot your machine. (The debugger stops the process while it is attached, so the heartbeats stop too.)
But watch this…
[root@collabn1 ~]# grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:    235920 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB
[root@collabn1 ~]# gdb -p 17435 <<EOF
> call munlockall()
> quit
> EOF
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
Attaching to process 17435
Reading symbols from /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin...done.
... 17 more Reading/Loading symbols.
[Thread debugging using libthread_db enabled]
[New Thread 0xb7f629f0 (LWP 17435)]
... 19 more New Threads.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
... 13 more Loaded/Reading symbols.
0x008e7402 in __kernel_vsyscall ()
(gdb) $1 = 0
(gdb) The program is running. Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]
Detaching from program: /u01/crs/oracle/product/11.1.0/crs/bin/ocssd.bin, process 17435
[root@collabn1 ~]# grep Vm /proc/17435/status
VmPeak:   235924 kB
VmSize:   235920 kB
VmLck:         0 kB
VmHWM:    234844 kB
VmRSS:    234840 kB
VmData:   202272 kB
VmStk:       156 kB
VmExe:       592 kB
VmLib:     31960 kB
VmPTE:       268 kB
Ha. This is something you won’t find on Metalink. After generating some activity on the system, the top utility shows me that Linux has significantly reduced the physical memory used by CSS.
These days I’ve actually scripted this for my home and classroom VM environments. I haven’t done a careful comparison or analysis, but it really has seemed to me that my low-memory Linux systems run noticeably smoother.
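For anyone who wants to script something similar, here is one possible shape. It is a sketch, not the script mentioned above: the function names are invented, it assumes the process is named ocssd.bin, and it only prints the gdb command lines so you can review them before running anything. The -batch and -ex options make gdb run one command and then exit, which also detaches it from the process. The (int) cast is there because newer gdb builds refuse to call a function whose return type they don't know.

```shell
# Hypothetical helpers for throwaway test VMs only. Run as root.

# Print the gdb command line that would unlock one process's memory.
# Rejects anything that is not a plain numeric PID.
munlockall_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: munlockall_cmd PID" >&2
            return 1
            ;;
    esac
    # -batch: exit (and detach) once the -ex command has run.
    printf 'gdb -p %s -batch -ex "call (int) munlockall()"\n' "$1"
}

# Print one gdb command per running CSS daemon.
# Assumption: the daemon's process name is exactly ocssd.bin.
plan_unlock_css() {
    for pid in $(pgrep -x ocssd.bin); do
        munlockall_cmd "$pid"
    done
}
```

Running `plan_unlock_css` shows what would be executed, and `plan_unlock_css | sh` actually does it. Batch mode keeps the attach time short, which matters because a long pause in CSS can get the node evicted.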