Thursday, 2 May 2013

Linux: How to diagnose a stuck Oracle server process in Oracle 11g



The example below shows how to diagnose a stuck Oracle server process on Linux.
Here, PID 11264 is the Oracle server process that is getting stuck.

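First, attach to the stuck process with the Linux "strace" command, the equivalent of "tusc" on HP-UX. The -p option attaches to the running process, and -f also traces its child processes and threads: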


[box1@TESTDB]/u01/app/oracle/admin/TESTDB/diag/rdbms/camssdb/TESTDB/trace >strace -fp 11264
Process 11264 attached - interrupt to quit
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
getrusage(RUSAGE_SELF, {ru_utime={56325, 616204}, ru_stime={6, 55079}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={56325, 616204}, ru_stime={6, 55079}, ...}) = 0
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
read(13, "\0BC\7\320\0\n\0\0\0\1\0\0\0\0e\1hK\363\367\"\24\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
times({tms_utime=5636503, tms_stime=606, tms_cutime=0, tms_cstime=0}) = 3377342164
times({tms_utime=5636503, tms_stime=606, tms_cutime=0, tms_cstime=0}) = 3377342164
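On a busy process the trace scrolls quickly, so it can help to tally which system calls appear and which file descriptors the I/O calls use. A minimal sketch, assuming the trace was saved to a file (for example with "strace -f -p 11264 -o /tmp/strace.out", stopped with Ctrl-C) and using a hypothetical helper named summarize_trace. It is a rough text filter, not a full strace parser:

```shell
# summarize_trace: hypothetical helper, not part of strace.
# Reads strace output on stdin and prints, in alphabetical order:
#   "call NAME COUNT"      how often each system call appears
#   "io NAME fd N COUNT"   which file descriptor each I/O call used
summarize_trace() {
  awk '
    {
      line = $0
      # Drop an optional "[pid N] " or "N " prefix that strace may add with -f.
      sub(/^\[pid +[0-9]+\] +/, "", line)
      sub(/^[0-9]+ +/, "", line)
      # The system call name is everything before the first "(";
      # lines without one (e.g. "Process N attached") are skipped.
      p = index(line, "(")
      if (p == 0) next
      name = substr(line, 1, p - 1)
      calls[name]++
      # For these I/O calls the first argument is the file descriptor.
      if (name ~ /^(read|write|readv|writev|pread64|pwrite64|recvfrom|sendto)$/) {
        fd = substr(line, p + 1) + 0   # numeric prefix, e.g. "13, ..." -> 13
        io[name " fd " fd]++
      }
    }
    END {
      for (c in calls) printf "call %s %d\n", c, calls[c]
      for (k in io) printf "io %s %d\n", k, io[k]
    }' | LC_ALL=C sort
}
```

Run as "summarize_trace < /tmp/strace.out". On the trace above it would report many times() calls, two getrusage() calls and one read() on fd 13.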

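Apart from times() and getrusage() calls, the only I/O in the trace is a read() on file descriptor 13 (the first argument of read()). Second, use the Linux "lsof" command to see what descriptor 13 refers to: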


[box1@TESTDB]/u01/app/oracle/admin/TESTDB/diag/rdbms/camssdb/TESTDB/trace >/usr/sbin/lsof -p 11264 |grep 13
oracle  11264 oracle  cwd    DIR      253,9        4096   1062513 /u01/app/oracle/product/11.1.0.7/dbs
oracle  11264 oracle  DEL    REG       0,13              25100301 /3
oracle  11264 oracle  mem    REG      253,0      139504    229689 /lib64/ld-2.5.so
oracle  11264 oracle  mem    REG      253,0      615136    229429 /lib64/libm-2.5.so
oracle  11264 oracle  mem    REG      253,9     2513705   1579856 /u01/app/oracle/product/11.1.0.7/lib/libhasgen11.so
oracle  11264 oracle  mem    REG      253,9       13159   1579985 /u01/app/oracle/product/11.1.0.7/lib/libskgxn2.so
oracle  11264 oracle  mem    REG      253,9     1062133   1579956 /u01/app/oracle/product/11.1.0.7/lib/libocr11.so
oracle  11264 oracle    5r   DIR        0,3           0 738197513 /proc/11264/fd
oracle  11264 oracle    8r   DIR        0,3           0 738197513 /proc/11264/fd
oracle  11264 oracle   11u   REG    253,118  2097160192   7913475 /amssdb_petcamssdb/ora_data00/PAMSSDB/system_CAMSSDB_01.dbf
oracle  11264 oracle   13u  IPv4 1506971131                   TCP anacaj:ncube-lm->box2.qc.bell.ca:17551 (ESTABLISHED)   ---------------------> This is what we are looking for
oracle  11264 oracle   14u   REG    253,124 20971528192  13336587 /amssdb_petcamssdb/ora_data06/PAMSSDB/pool_data_CAMSSDB_03.dbf
oracle  11264 oracle   15u   REG    253,118 10485768192   7913479 /amssdb_petcamssdb/ora_data00/PAMSSDB/pool_ix_CAMSSDB_01.dbf
oracle  11264 oracle   24u   REG    253,124 10485768192  13336585 /amssdb_petcamssdb/ora_data06/PAMSSDB/abp_ix_l2_CAMSSDB_05.dbf
oracle  11264 oracle   29u   REG    253,118 15728648192   7913478 /amssdb_petcamssdb/ora_data00/PAMSSDB/pool_data_CAMSSDB_01.dbf
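The "grep 13" above matches any line containing the characters "13" anywhere (device numbers, sizes, inode numbers), which is why most of the listed lines have nothing to do with descriptor 13. lsof can select the descriptor itself with "/usr/sbin/lsof -a -p 11264 -d 13", where -a ANDs the -p and -d selections. If you already have the full "lsof -p" listing, a minimal sketch that filters on the FD column instead, using a hypothetical helper named fd_lines:

```shell
# fd_lines: hypothetical helper, not part of lsof.
# Reads "lsof -p PID" output on stdin and prints only the lines whose FD
# column (4th field) is the given descriptor, e.g. "13u", "13r" or "13w" for 13.
fd_lines() {
  awk -v want="$1" '{
    fd = $4
    sub(/[^0-9]+$/, "", fd)   # strip trailing access-mode/lock letters; "cwd", "mem" become ""
    if (fd != "" && fd == want) print
  }'
}
```

For example, "/usr/sbin/lsof -p 11264 | fd_lines 13" keeps only the TCP socket line.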


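Descriptor 13 is an established TCP connection whose remote end is box2.qc.bell.ca, port 17551. Last step: log in to box2 and look for the process using port 17551: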

/usr/sbin/lsof |grep 17551
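To script this step, the remote host and port can be taken from lsof's NAME column, which it prints as local->remote (STATE). A minimal sketch using a hypothetical helper named remote_endpoint; it assumes one connection per line and treats the text after the last ":" as the port:

```shell
# remote_endpoint: hypothetical helper, not part of lsof.
# Reads lsof output on stdin and, for each line whose NAME looks like
# "local->remote", prints the remote host and port separated by a space.
remote_endpoint() {
  awk '{
    for (i = 1; i <= NF; i++) {
      arrow = index($i, "->")
      if (arrow == 0) continue
      remote = substr($i, arrow + 2)           # e.g. box2.qc.bell.ca:17551
      n = split(remote, parts, ":")
      port = parts[n]                          # text after the last ":"
      host = substr(remote, 1, length(remote) - length(port) - 1)
      print host, port
      break                                    # one connection per line
    }
  }'
}
```

On the listing above, "/usr/sbin/lsof -a -p 11264 -d 13 | remote_endpoint" would print "box2.qc.bell.ca 17551". On box2, "/usr/sbin/lsof -i TCP:17551" lists only files using that TCP port, which is narrower than grepping the full lsof output; run it as root to see other users' processes. Adding -P stops lsof from translating port numbers into service names, which is why the local port appears as "ncube-lm" (the /etc/services name for port 1521).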

1 comment:

  1. On HP-UX, use "tusc" instead of strace, since strace works on Linux only.
