Troubleshooting Clusterware startup problems with DTRACE

Case VII : MDNSD doesn't start because port 5353 is already in use

Status of the lower-stack Clusterware resources after startup:

*****  Local Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE       OFFLINE      -               STABLE
ora.crf                        1   ONLINE       ONLINE       hract21         STABLE
ora.crsd                       1   ONLINE       OFFLINE      -               STABLE
ora.cssd                       1   ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1   ONLINE       ONLINE       hract21         STABLE
ora.ctssd                      1   ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1   ONLINE       OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE       ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE       INTERMEDIATE hract21         STABLE
ora.gipcd                      1   ONLINE       OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE       ONLINE       hract21         STABLE
ora.mdnsd                      1   ONLINE       INTERMEDIATE hract21         STABLE
ora.storage                    1   ONLINE       OFFLINE      -               STABLE
--> ora.mdnsd remains in INTERMEDIATE state: MDNSD doesn't start
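In a longer listing, the failing resources are easier to spot with a small filter that compares TARGET with STATE. The sketch below is a hypothetical helper, not an Oracle tool. It assumes the column layout shown above, and its sample input is a few rows copied from that listing.

```python
# Hypothetical helper: flag resources whose STATE differs from TARGET.
# Assumes the column layout above: NAME INST TARGET STATE SERVER STATE_DETAILS.

STATUS = """\
ora.asm                        1   ONLINE       OFFLINE      -               STABLE
ora.crf                        1   ONLINE       ONLINE       hract21         STABLE
ora.cssdmonitor                1   ONLINE       ONLINE       hract21         STABLE
ora.evmd                       1   ONLINE       INTERMEDIATE hract21         STABLE
ora.mdnsd                      1   ONLINE       INTERMEDIATE hract21         STABLE
"""


def not_on_target(text):
    """Return (name, target, state) for every resource row where STATE != TARGET."""
    problems = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separator lines and anything that is not a resource row.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _inst, target, state = fields[:4]
        if state != target:
            problems.append((name, target, state))
    return problems


for name, target, state in not_on_target(STATUS):
    print(f"{name:<32} target={target:<8} state={state}")
```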
GREP command:
[grid@hract21 trace]$    grep "2015-02-17 12:4" * | egrep 'Address already in use'
mdnsd.trc:
2015-02-17 12:43:26.211079 :  CLSDMT:2281699072: PID for the Process [19764], connkey 9
2015-02-17 12:43:27.193282 :    MDNS:2353129024:  mdnsd interface eth0 (0x2 AF=2 f=0x1043 mcast=-1) 192.168.1.7 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.194932 :    MDNS:2353129024:  mdnsd interface eth1 (0x3 AF=2 f=0x1043 mcast=-1) 192.168.5.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.194986 :    MDNS:2353129024:  mdnsd interface eth2 (0x4 AF=2 f=0x1043 mcast=-1) 192.168.2.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198670 :    MDNS:2353129024:  mdnsd interface eth3 (0x5 AF=2 f=0x1043 mcast=-1) 192.168.3.121 mask 255.255.255.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198723 :    MDNS:2353129024:  mdnsd interface lo (0x1 AF=2 f=0x49 mcast=-1) 127.0.0.1 mask 255.0.0.0 FAILED. Error 98 (Address already in use)
2015-02-17 12:43:27.198726 :    MDNS:2353129024:  Error! No valid netowrk interfaces found to setup mDNS.
2015-02-17 12:43:27.198729 :    MDNS:2353129024:  Oracle mDNSResponder ver. mDNSResponder-1076 (Jun 30 2014 19:39:45) , init_rv=-65537
2015-02-17 12:43:27.198818 :    MDNS:2353129024:  stopping

CLUVFY :
The following cluvfy command does not detect the problem:
[grid@hract21 CLUVFY]$  ssh hract22 cluvfy stage -pre crsinst -n hract21,hract22 -networks eth1:192.168.5.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect
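Because cluvfy does not catch this problem, a manual pre-check can help. The following is a minimal Python sketch that assumes it runs on the node while Clusterware is down. It tries to bind a UDP socket to 0.0.0.0:5353, which is the address mdnsd.bin binds in the DTrace output. If the bind fails, it reports the errno name. A process already holding the port produces the same Error 98 (EADDRINUSE) seen in mdnsd.trc. The function name udp_port_free is made up for this example. The result can differ from mdnsd's own behavior if either side sets SO_REUSEADDR or SO_REUSEPORT.

```python
import errno
import socket


def udp_port_free(port, addr="0.0.0.0"):
    """Try to bind a UDP socket to (addr, port).

    Returns (True, None) if the bind works, otherwise (False, errno name),
    e.g. (False, "EADDRINUSE") when another process holds the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((addr, port))
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    free, err = udp_port_free(5353)
    print("UDP 5353 is free" if free else f"UDP 5353 is busy: {err}")
```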

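DTRACE SCRIPT (bind() tracing part of check_rac.d) :
/*
 * Generic DTrace script tracking failed bind() system calls.
 */
syscall::bind:entry
{
	self->fd = arg0;
	/* Copy the caller's sockaddr into DTrace scratch space. */
	this->s = (unsigned char *)copyin(arg1, sizeof(struct sockaddr));
	/* For AF_INET, bytes 2-3 hold sin_port in network (big-endian) order.
	   unsigned char avoids sign extension of bytes >= 0x80 (5353 = 0x14E9). */
	self->port = (this->s[2] << 8) | this->s[3];
	/* Bytes 4-7 hold the IPv4 address (sin_addr). */
	self->ip1 = this->s[4];
	self->ip2 = this->s[5];
	self->ip3 = this->s[6];
	self->ip4 = this->s[7];
}

syscall::bind:return
/arg0 < 0 && execname != "crsctl.bin"/
{
	printf("- Exec: %s - PID: %d  bind() failed with error : %d - fd : %d - IP: %d.%d.%d.%d - Port: %d ",
	    execname, pid, arg0, self->fd,
	    self->ip1, self->ip2, self->ip3, self->ip4, self->port);
}

syscall::bind:return
{
	/* Free the thread-local variables after every bind() return. */
	self->fd = 0;
	self->port = 0;
	self->ip1 = 0;
	self->ip2 = 0;
	self->ip3 = 0;
	self->ip4 = 0;
}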
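The port calculation in the entry clause depends on the layout of struct sockaddr_in. Bytes 0-1 hold sa_family in host byte order. Bytes 2-3 hold sin_port in network (big-endian) byte order. Bytes 4-7 hold the IPv4 address. The Python sketch below decodes such a buffer for 0.0.0.0:5353. It also shows why the bytes must be read as unsigned: 5353 is 0x14E9, and 0xE9 read as a signed char is -23 rather than 233. The buffer is built by hand for illustration and is not captured from mdnsd.bin.

```python
import socket
import struct


def decode_sockaddr_in(raw):
    """Decode the first 8 bytes of an IPv4 struct sockaddr_in.

    Layout: sa_family (2 bytes, host order), sin_port (2 bytes,
    network/big-endian order), sin_addr (4 bytes).
    """
    family = struct.unpack_from("=H", raw, 0)[0]   # host byte order
    port = struct.unpack_from("!H", raw, 2)[0]     # same as raw[2] * 256 + raw[3]
    ip = socket.inet_ntoa(raw[4:8])
    return family, ip, port


# Hand-built 16-byte sockaddr_in for 0.0.0.0:5353 (5353 == 0x14E9).
raw = struct.pack("=H", socket.AF_INET) + struct.pack("!H", 5353) + bytes(12)
print(decode_sockaddr_in(raw))          # (2, '0.0.0.0', 5353)

# If byte 3 (0xE9) is sign-extended, it reads as -23 instead of 233,
# and the computed port is wrong.
signed_low = struct.unpack_from("b", raw, 3)[0]
print(raw[2] * 256 + signed_low)        # 5097, not 5353
```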

DTRACE OUTPUT :
[root@hract21 DTRACE]# dtrace -s check_rac.d
dtrace: script 'check_rac.d' matched 19 probes
CPU     ID                    FUNCTION:NAME
0      1                           :BEGIN GRIDHOME: /u01/app/121/grid - GRIDHOME/bin: /u01/app/121/grid/bin  - Temp Loc: /var/tmp/.oracle -  PIDFILE: hract21.pid - Port for bind: 53
0     93                    sendto:return - Exec: ohasd.bin - PID: 17321  sendto() failed with error : -32 - fd : 173
0      9                      open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir:  /var/tmp/.oracle
0      9                      open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir:  /var/tmp/.oracle
0      9                      open:return - Exec: ohasd.bin - open() /var/tmp/.oracle/npohasd failed with error: -6 - scan_dir:  /var/tmp/.oracle
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 33 - IP: 0.0.0.0 - Port: 5353
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
0    103                      bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353
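The negative numbers in the DTrace output are -errno values. On Linux x86, 98 = EADDRINUSE, 32 = EPIPE and 6 = ENXIO. For a longer capture, a small parser can count the failed bind() calls per executable, error and address. This sketch assumes the exact printf format of the bind:return clause above. The two sample lines are copied from the output.

```python
import errno
import os
import re
from collections import Counter

# Matches the bind() lines printed by the bind:return clause of the D script.
BIND_RE = re.compile(
    r"Exec: (?P<exec>\S+) - PID: (?P<pid>\d+)\s+bind\(\) failed with error : "
    r"(?P<rc>-\d+) - fd : \d+ - IP: (?P<ip>[\d.]+) - Port: (?P<port>\d+)"
)


def summarize_bind_failures(lines):
    """Count failed bind() calls per (executable, errno number, 'ip:port')."""
    counts = Counter()
    for line in lines:
        m = BIND_RE.search(line)
        if m:
            err = -int(m["rc"])          # the D script prints -errno
            counts[(m["exec"], err, f"{m['ip']}:{m['port']}")] += 1
    return counts


sample = [
    "0    103   bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 33 - IP: 0.0.0.0 - Port: 5353",
    "0    103   bind:return - Exec: mdnsd.bin - PID: 18943  bind() failed with error : -98 - fd : 34 - IP: 0.0.0.0 - Port: 5353",
]
for (exe, err, addr), n in summarize_bind_failures(sample).items():
    name = errno.errorcode.get(err, "?")
    print(f"{exe}: {n} x {name} ({os.strerror(err)}) on {addr}")
```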

Investigate & Fix :
[root@hract21 network-scripts]#  netstat -taupen | egrep ":53 |:5353 |:42424"
udp        0      0 0.0.0.0:5353          0  36230279   18804/ohasd.bin
udp        0      0 230.0.1.0:42424       0  36230278   18804/ohasd.bin
udp        0      0 224.0.0.251:42424     0  35356639   12631/java
--> The clusterware port 5353 is used by a java program with PID 17263
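netstat -p finds the owning PID roughly by matching the socket inodes listed in /proc/net/udp against the /proc/<pid>/fd symlinks of every process. The Python sketch below does the same for a UDP port, which can help double-check the PID before killing anything. It is Linux-specific and must run as root to see other users' processes. The function names are hypothetical.

```python
import os

PROC_UDP_TABLES = ("/proc/net/udp", "/proc/net/udp6")


def inodes_in_table(text, port):
    """Parse the text of /proc/net/udp or udp6; return inodes bound to local `port`."""
    inodes = set()
    for line in text.splitlines()[1:]:              # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # "ADDR:PORT", both hex
        if local_port == port:
            inodes.add(fields[9])                   # 10th column is the socket inode
    return inodes


def pids_owning(inodes):
    """Return the PIDs whose /proc/<pid>/fd entries point at one of the inodes."""
    targets = {f"socket:[{inode}]" for inode in inodes}
    if not targets:
        return set()
    owners = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:                             # process gone or not permitted
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:                         # fd closed in the meantime
                continue
            if link in targets:
                owners.add(int(pid))
                break
    return owners


def find_udp_port_owners(port):
    """Return the PIDs holding a UDP socket bound to the given local port."""
    inodes = set()
    for path in PROC_UDP_TABLES:
        try:
            with open(path) as fh:
                inodes |= inodes_in_table(fh.read(), port)
        except FileNotFoundError:                   # e.g. IPv6 disabled or no /proc
            continue
    return pids_owning(inodes)


if __name__ == "__main__":
    for pid in sorted(find_udp_port_owners(5353)):
        try:
            with open(f"/proc/{pid}/comm") as fh:
                name = fh.read().strip()
        except OSError:
            name = "?"
        print(pid, name)
```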

FIX : kill that process and restart CW:
kill -9 17263
