Debug RAC listener status change from ONLINE to INTERMEDIATE CHECK TIMED OUT,STABLE

Simulate a listener HANG scenario

To simulate a listener hang, attach a debugger to the local tnslsnr process (tnslsnr LISTENER):
[root@gract3 Desktop]#  ps -elf | grep tnslsnr
0 S grid      4463     1  0  80   0 - 45777 ep_pol Aug05 ?        00:00:01 /u01/app/121/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
0 S grid      4518     1  0  80   0 - 45906 ep_pol Aug05 ?        00:00:01 /u01/app/121/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
0 S grid      4525     1  0  80   0 - 45847 ep_pol Aug05 ?        00:00:01 /u01/app/121/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit

[root@gract3 Desktop]# gdb -p 4518
(gdb) where
#0  0x0000003ae10e8f43 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f809eadad91 in sntevepoll () from /u01/app/121/grid/lib/libclntsh.so.12.1
#2  0x00007f809eada308 in nteveque () from /u01/app/121/grid/lib/libclntsh.so.12.1
#3  0x00007f809ead6a9a in ntevque () from /u01/app/121/grid/lib/libclntsh.so.12.1
#4  0x00007f809ea78650 in nsevwait () from /u01/app/121/grid/lib/libclntsh.so.12.1
#5  0x00000000004066dc in nsglma ()
#6  0x0000000000405939 in main ()

--> Check listener and resource status
[oracle@gract3 ~]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 06-AUG-2014 08:34:58
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
--> lsnrctl hangs

[root@gract3 Desktop]#   crs  | egrep 'ora.LISTENER.lsnr|STATE' | grep gract3
ora.LISTENER.lsnr              ONLINE     INTERMEDIATE    gract3       CHECK TIMED OUT,STABLE
--> LISTENER status changed from ONLINE to INTERMEDIATE CHECK TIMED OUT,STABLE
    This is the expected behaviour, as clusterware uses lsnrctl status to verify the listener resource status.
    While the listener is stopped, that check gets no reply and times out.
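
A manual lsnrctl status against a hung listener blocks the terminal, as seen above. A minimal sketch of a bounded check follows. It assumes GNU coreutils timeout is installed; check_with_timeout is a hypothetical helper name and the 10 second limit is an arbitrary choice, not a clusterware setting.

```shell
# check_with_timeout is a hypothetical helper, not an Oracle tool.
# It runs a command under a time limit and reports whether it came back.
check_with_timeout() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with 124 when it had to stop the command
        echo "TIMED OUT"
        return 1
    fi
    # The command returned on its own; rc is its own exit status
    echo "ANSWERED rc=$rc"
    return 0
}

# Usage on a cluster node (the limit of 10 seconds is an assumption):
#   check_with_timeout 10 lsnrctl status
```

The helper only separates "no answer within the limit" from "some answer"; the meaning of the lsnrctl exit status itself is not interpreted here.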

Use strace to find out where lsnrctl status is blocked

[grid@gract3 ~]$ strace -f -o LISTENER.trc  lsnrctl status
LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 06-AUG-2014 08:38:17
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
Strace output:
18597 socket(PF_FILE, SOCK_STREAM, 0)   = 7
18597 access("/var/tmp/.oracle/sLISTENER", F_OK) = 0
18597 connect(7, {sa_family=AF_FILE, path="/var/tmp/.oracle/sLISTENER"}, 110) = 0
18597 fcntl(7, F_SETFD, FD_CLOEXEC)     = 0
18597 brk(0x1aad000)                    = 0x1aad000
18597 rt_sigaction(SIGPIPE, {SIG_IGN, ~[ILL ABRT BUS FPE SEGV USR2 TERM XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3ae180f500}, {SIG_DFL, [], 0}, 8) = 0
18597 write(7, "\0\332\0\0\1\0\0\0\1;\1,\0\201 \0\177\377s\10\0\0\1\0\0\224\0F\0\0\7\370"..., 218) = 218
18597 read(7, 0x1a87896, 8208)          = ? ERESTARTSYS (To be restarted)
--> The IPC socket sLISTENER is used for socket communication on the local node
     The server process tnslsnr can't read from the IPC socket sLISTENER because it is stopped by gdb
       ( normal processing is to read the message and send a reply to the lsnrctl process )
     The client process lsnrctl has written its request and now blocks in read(), waiting for a reply that never arrives
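
In a long trace file the interesting lines can be filtered out instead of read by eye. The sketch below assumes a trace written with strace -f -o as above; connected_sockets and blocked_calls are hypothetical helper names. The two patterns are the markers strace writes for an interrupted call (ERESTARTSYS) and for a call that had not returned yet (<unfinished ...>).

```shell
# Both helpers are hypothetical; each takes the strace output file as $1.

# Print every UNIX socket path that appears in a connect() call.
connected_sockets() {
    sed -n 's/.*connect(.*path="\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Print the trace lines of calls that were interrupted or never completed.
blocked_calls() {
    grep -E 'ERESTARTSYS|<unfinished \.\.\.>' "$1"
}

# Usage:
#   connected_sockets LISTENER.trc
#   blocked_calls LISTENER.trc
```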

Fix the problem

Find the processes that are using that socket file and not sending a reply
[root@gract3 Desktop]# lsof | grep sLISTENER
tnslsnr    4518      grid    9u     unix 0xffff8800085ef200       0t0             15687340 /var/tmp/.oracle/sLISTENER
tnslsnr    4518      grid   14u     unix 0xffff880003a56780       0t0             15690441 /var/tmp/.oracle/sLISTENER
tnslsnr    4525      grid    9u     unix 0xffff880028e11c80       0t0             15687917 /var/tmp/.oracle/sLISTENER_SCAN1
tnslsnr    4525      grid   14u     unix 0xffff880037ec3540       0t0             15690968 /var/tmp/.oracle/sLISTENER_SCAN1
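
Note that grep sLISTENER also matches sLISTENER_SCAN1. A small sketch that prints only the PIDs holding one exact socket path follows. socket_holders is a hypothetical helper name, and it assumes the socket path is the last column of the lsof output, as in the listing above.

```shell
# socket_holders is a hypothetical helper.
# It reads lsof output on stdin and prints the unique PIDs (column 2)
# whose last column is exactly the socket path given as $1.
socket_holders() {
    awk -v sock="$1" '$NF == sock { print $2 }' | sort -un
}

# Usage:
#   lsof | socket_holders /var/tmp/.oracle/sLISTENER
```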

[root@gract3 Desktop]# ps -elf | grep 4518
0 t grid      4518     1  0  80   0 - 45906 ptrace Aug05 ?        00:00:01 /u01/app/121/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
--> OK, this is what we expected: our listener process is using that IPC socket file
    Its process state is t with wait channel ptrace, i.e. it is stopped by a debugger
    Potential problems:
      - A debugger attached to the tnslsnr process
      - The tnslsnr process is not functioning any more ( blocked by resources, runaway/looping program )
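
The state column of ps can also be checked from a script. The sketch below assumes the procps version of ps, where state t means stopped by a debugger during tracing and T means stopped by a job control signal. proc_state and is_stopped are hypothetical helper names.

```shell
# Both helpers are hypothetical and assume procps ps.

# Print the one-character process state of the PID given as $1.
proc_state() {
    ps -o state= -p "$1" | tr -d ' '
}

# Succeed if the process is stopped, either by a debugger (t)
# or by a job control signal such as SIGSTOP (T).
is_stopped() {
    case "$(proc_state "$1")" in
        T|t) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage:
#   is_stopped 4518 && echo "tnslsnr is stopped"
```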

Solution: kill that process and check the listener status again
[root@gract3 Desktop]# kill -9 4518

Check listener status 
[grid@gract3 ~]$ lsnrctl status
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                06-AUG-2014 08:54:36
Uptime                    0 days 0 hr. 0 min. 35 sec

Check resource status 
Resource NAME                  TARGET     STATE           SERVER       STATE_DETAILS
-------------------------      ---------- ----------      ------------ ------------------                  
ora.LISTENER.lsnr              ONLINE     ONLINE          gract3       STABLE   
--> Note: Local listener is automatically restarted by clusterware
