Thursday 30 May 2013

RMAN-06059: expected archived log not found

RMAN hot backup fails with the following error message:

RMAN>backup full database plus archivelog delete input;


RMAN-03002: failure of backup command at 05/30/2013 14:57:23
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /TNXAMAES/RMAN/arc/RMAN1_1_816783266.arc
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

This is because I've manually deleted this file from the host, but RMAN metadata has still information about this file. The correct thing to do is to follow up the delete with the below RMAN command:

RMAN> delete expired archivelog all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=266 device type=DISK
List of Archived Log Copies for database with db_unique_name RMAN
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
1       1    1       X 30-MAY-13
        Name: /TNXAMAES/RMAN/arc/RMAN1_1_816783266.arc

2       1    2       X 30-MAY-13
        Name: /TNXAMAES/RMAN/arc/RMAN1_2_816783266.arc

3       1    3       X 30-MAY-13
        Name: /TNXAMAES/RMAN/arc/RMAN1_3_816783266.arc


Do you really want to delete the above objects (enter YES or NO)? YES
deleted archived log
archived log file name=/TNXAMAES/RMAN/arc/RMAN1_1_816783266.arc RECID=1 STAMP=816788064
deleted archived log
archived log file name=/TNXAMAES/RMAN/arc/RMAN1_2_816783266.arc RECID=2 STAMP=816788065
deleted archived log
archived log file name=/TNXAMAES/RMAN/arc/RMAN1_3_816783266.arc RECID=3 STAMP=816788068
Deleted 3 EXPIRED objects

 You can double check that now RMAN is up to date:

RMAN> crosscheck archivelog all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=266 device type=DISK
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN1_4_816783266.arc RECID=4 STAMP=816788384
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN1_5_816783266.arc RECID=5 STAMP=816788385
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN1_6_816783266.arc RECID=6 STAMP=816788388
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN1_7_816783266.arc RECID=7 STAMP=816788388
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN/archivelog/2013_05_30/o1_mf_1_8_8th6dcmt_.arc RECID=8 STAMP=816791115
validation succeeded for archived log
archived log file name=/TNXAMAES/RMAN/fra/RMAN/archivelog/2013_05_30/o1_mf_1_9_8th88mbv_.arc RECID=9 STAMP=816793043
Crosschecked 6 objects

or

RMAN> list archivelog all;



Wednesday 22 May 2013

Example of "for" loop in ksh and echo for special characters

Let's assume that we have a set of scripts in the current directories, all with .sql suffix and we want to print 2 lines of the beginning of each such script.

We'll use the "for" loop in ksh and we'll also use the escape character "\" to print the special character: "


>for f in `ls *.sql`
> do
> echo "set appinfo on" >> 1
> echo "select sys_context('USERENV', 'MODULE')  as \"Running Script\" from dual;" >>1
> cat $f >> 1
> mv -f 1 $f
> done

How to print the name of the SQL script running, while running it?

Let's assume that for debugging purposes, we need to print the name of the SQL script that we are running, and this while running the script, meaning that we will need this info in the spool file.

What we need to do is to add 2 lines at the beginning of the script:


set appinfo on
select sys_context('USERENV', 'MODULE')  as "Running Script" from dual;


Example:


$>cat /tmp/1.sql

set appinfo on
select sys_context('USERENV', 'MODULE')  as "Running Script" from dual;


select * from dual;


$>sqlplus /

SQL*Plus: Release 11.1.0.7.0 - Production on Wed May 22 15:40:48 2013

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @/tmp/1

Running Script
--------------------------------------------------------------------------------
01@ /tmp/1.sql


D
-
X





Thursday 16 May 2013

How to trace with binds a query which runs in seconds?

The challenge here is that we have an application which connects to the database, runs a very fast query ( a mater of seconds) and then they disconnect. How are we going not only to trace these queries, but also to catch the bind variables used?

The answer is, we'll use a logon trigger and "alter session set event".
Please see an example of such trigger below:




SQL> get cre_logon_trigger.sql
  1  -- To be run as sys ***** --
  2  create or replace trigger apptest_logon after logon on database
  3  begin
  4  if user='APPTEST'
  5  then
  6  execute immediate 'alter session set tracefile_identifier = ''apptest''';
  7  ----execute immediate 'alter session set sql_trace=true';
  8  execute immediate 'alter session set events ''10046 trace name context forever, level 4''';
  9  end if;
 10* end;




A few remarks:

Line #6: we are identifying the queries by the DB user used, apptest.
Line #8: we use level 4 for 10046 event, which means to trace bind variables.
Line  #7: commented out, usually used to trace when simple trace files is required, no binds.

Tuesday 14 May 2013

Oracle SQL Query Tuning Examples, Part 2

This is a 2nd post in a series which started with:

 Oracle SQL Query Tuning Part 1


This time, we'll look into a query which is doing 2 nested loops, and we'll see that by adding parallel hint for the driving table of the first nested loop being executed, the query is being speed up considerably.

The unexpected here is that by their nature, Nested Loops are being execute serially, meaning that the driving tables is being accessed and then for each entry which satisfy the query, we go to the inner table and searching for matches. Now, by adding parallel hint for the driving table, even if it is a small table, we'll still gain in performance, because the nested loop is being run in parallel.

The example below is proving this assumption:



 SELECT /*+ parallel(test2,4) */
test3.ban,test3.ban_status,test3.account_type,test3.account_sub_type,
test3.bill_cycle,test1.link_type,test2.*
FROM test1 anl , test2 adr , test3 ba
WHERE test1.address_id = test2.address_id
AND NVL(test1.expiration_date,'31-dec-4700')>SYSDATE
AND test1.id = test3.id
/


593 rows selected.

Elapsed: 00:00:06.17

Execution Plan
----------------------------------------------------------
Plan hash value: 2057958791

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                  | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                       |  7223 |  3491K|  3096 |       |       |        |      |            |
|   1 |  PX COORDINATOR                        |                       |       |       |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                  | :TQ10000              |  7223 |  3491K|  3096 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    NESTED LOOPS                        |                       |  7223 |  3491K|  3096 |       |       |  Q1,00 | PCWP |            |
|   4 |     NESTED LOOPS                       |                       |  7223 |  3378K|  2962 |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR                 |                       |       |       |       |       |       |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL                | test2    | 31937 |    13M|     4 |       |       |  Q1,00 | PCWP |            |
|*  7 |      TABLE ACCESS BY GLOBAL INDEX ROWID| test1     |     1 |    21 |     1 | ROWID | ROWID |  Q1,00 | PCWP |            |
|*  8 |       INDEX RANGE SCAN                 | test1_7IX |     5 |       |     1 |       |       |  Q1,00 | PCWP |            |
|   9 |     PARTITION RANGE ITERATOR           |                       |     1 |    16 |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
|  10 |      TABLE ACCESS BY LOCAL INDEX ROWID | test3       |     1 |    16 |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
|* 11 |       INDEX UNIQUE SCAN                | test3_PK    |     1 |       |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------



Statistics
----------------------------------------------------------
         18  recursive calls
          0  db block gets
     225299  consistent gets
      43033  physical reads
          0  redo size
      38215  bytes sent via SQL*Net to client
        917  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        593  rows processed



 Without parallel:


SELECT
test3.ban,test3.ban_status,test3.account_type,test3.account_sub_type,
test3.bill_cycle,test1.link_type,test2.*
FROM test1 anl , test2 adr , test3 ba
WHERE test1.address_id = test2.address_id
AND NVL(test1.expiration_date,'31-dec-4700')>SYSDATE
AND test1.id = test3.id;


593 rows selected.

Elapsed: 00:02:17.19

Execution Plan
----------------------------------------------------------
Plan hash value: 2057958791

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                  | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                       |  7223 |  3491K|  9284 |       |       |        |      |            |
|   1 |  PX COORDINATOR                        |                       |       |       |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                  | :TQ10000              |  7223 |  3491K|  9284 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    NESTED LOOPS                        |                       |  7223 |  3491K|  9284 |       |       |  Q1,00 | PCWP |            |
|   4 |     NESTED LOOPS                       |                       |  7223 |  3378K|  8882 |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR                 |                       |       |       |       |       |       |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL                | test2    | 31937 |    13M|    10 |       |       |  Q1,00 | PCWP |            |
|*  7 |      TABLE ACCESS BY GLOBAL INDEX ROWID| test1     |     1 |    21 |     1 | ROWID | ROWID |  Q1,00 | PCWP |            |
|*  8 |       INDEX RANGE SCAN                 | test1_7IX |     5 |       |     1 |       |       |  Q1,00 | PCWP |            |
|   9 |     PARTITION RANGE ITERATOR           |                       |     1 |    16 |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
|  10 |      TABLE ACCESS BY LOCAL INDEX ROWID | test3       |     1 |    16 |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
|* 11 |       INDEX UNIQUE SCAN                | test3_PK    |     1 |       |     1 |   KEY |   KEY |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------

P


Statistics
----------------------------------------------------------
         27  recursive calls
          3  db block gets
     225016  consistent gets
      43006  physical reads
        728  redo size
      37747  bytes sent via SQL*Net to client
        917  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        593  rows processed



A nested loops join that uses an index to join two tables can be fully parallelized providing that the driving table is accessed by a table scan.

Friday 10 May 2013

Oracle SQL Query Tuning Examples, Part 1

This is the first post in a series of interesting SQL tuning real life cases. These are, hopefully, more interesting than the usual tuning steps: looking for full table scans, missing indexes, missing statistics, obvious wrong execution plan.

The first examples is querying table1, which is about 100 GB in size , and it has an index on sys_creation_date.



select * from TABLE1 cs
where actv_code = 'MCN'
and trunc(sys_creation_date)=to_date('20130331','YYYYMMDD')
/


Execution Plan
----------------------------------------------------------
Plan hash value: 1027680822

------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name             | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                  |   101K|    12M| 35815 |        |      |            |
|   1 |  PX COORDINATOR      |                  |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000         |   101K|    12M| 35815 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |                  |   101K|    12M| 35815 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| TABLE1 |   101K|    12M| 35815 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------

The optimizer goes with a full table scan, since we have a function on the sys_creation_date and we don't have a function based index defined on trunc(sys_creation_date).

The interesting thing here is that we are looking for a date equal to March 31st 2013, and this can be accomplished by moving the trunc function to the right side and now the optimizer will choose the index.


  SQL> l
  1  select * from TABLE1 cs
  2  where actv_code = 'MCN'
  3* and sys_creation_date >= to_date('21030331','YYYYMMDD') and sys_creation_date < to_date('20130401','YYYYMMDD')
SQL> /

Execution Plan
----------------------------------------------------------
Plan hash value: 3230845712

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |   132 |     0 |
|*  1 |  FILTER                      |                      |       |       |       |
|*  2 |   TABLE ACCESS BY INDEX ROWID| TABLE1     |     1 |   132 |     3 |
|*  3 |    INDEX RANGE SCAN          | TABLE1_1IX |    15 |       |     1 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NULL IS NOT NULL)
   2 - filter("ACTV_CODE"='MCN')
   3 - access("SYS_CREATION_DATE">=TO_DATE(' 2103-03-31 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "SYS_CREATION_DATE"<TO_DATE(' 2013-04-01
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cpu costing is off (consider enabling it)


 Note that we cannot use the between operator, since it is inclusive, and we don't want dates of April 1st.
 So we've managed to avoid an expensive full table scan and to take advantage of the existing index, by re-writing the query; the two queries are logically identical.



Tuesday 7 May 2013

Why oracle is reading "much more" data than needed?

Let's try a simple scenario; we are querying a table big_table, doing full table scan and we know that there are about 2 million blocks in the table. However, the query is reading much more data than these 2 million blocks. What is the reason? The simple answer is that oracle is reading data from Undo tablespace, to ensure read consistency. Is it possible that big_table was modified heavily and data was not committed yet.

One way to validate this assumption is to run the query below, where the SID for the FTS on big_table is 161:


SQL> select NAME from v$datafile where FILE# in ( select p1 from v$session_wait where sid in ( select sid from sid=161));

NAME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/testdb_ora/ora_data01/undo01.dbf


We can say from the output that we are reading data from the Undo tablespace.

"truncate table" and oracle redo logs


Is it "truncate" statement written to the oracle redo logs? Yes, but just as any other DDL command, just as "truncate table <table_name>.

Let's look at an example below, where I've used the logmnr oracle utility to search through the redo logs:

First we create a table t and then truncate it:


SQL> create table t as select * from obj;

Table created.


SQL> truncate table t;

Table truncated.

Now we're going to switch the logfile and then initialize the Logmnr:

SQL> alter system switch logfile;

System altered.

SQL> execute DBMS_LOGMNR.ADD_LOGFILE('/pete_pbmlol/redoB1/redo_pbmlol_B1.dbf',options => dbms_logmnr.new);

PL/SQL procedure successfully completed.

SQL> execute DBMS_LOGMNR.START_LOGMNR(options => -
                                dbms_logmnr.dict_from_online_catalog);
>
PL/SQL procedure successfully completed.

We create a table based on the logmnr results, for convenience:

SQL> create table mycontents nologging parallel(degree 8)
  2  tablespace tools
  3   as select * from v$logmnr_contents;

Table created.

The moment of truth, we query sql_redo with regards to our table:


SQL> select sql_redo from mycontents where TABLE_NAME='T';

SQL_REDO
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
create table t as select * from obj;
truncate table t;
Unsupported


 And there is no undo statement, as expected :-)


SQL> select sql_undo  from mycontents where TABLE_NAME='T';

SQL_UNDO
--------------------------------------------------------------------------------


Unsupported

Thursday 2 May 2013

Linux: How to diagnose oracle server process getting stuck, in oracle 11G



The example below is shows how to diagnose a stuck oracle server process, on Linux.
PID 11264 is an oracle server process, which is getting stuck.

First, we use the "strace" Linux command, which replaces "tusc" used in HP-UX systems:


[box1@TESTDB]/u01/app/oracle/admin/TESTDB/diag/rdbms/camssdb/TESTDB/trace >strace -fp 11264
Process 11264 attached - interrupt to quit
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
times({tms_utime=5630562, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377336221
getrusage(RUSAGE_SELF, {ru_utime={56325, 616204}, ru_stime={6, 55079}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={56325, 616204}, ru_stime={6, 55079}, ...}) = 0
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
times({tms_utime=5632561, tms_stime=605, tms_cutime=0, tms_cstime=0}) = 3377338220
read(13, "\0BC\7\320\0\n\0\0\0\1\0\0\0\0e\1hK\363\367\"\24\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
times({tms_utime=5636503, tms_stime=606, tms_cutime=0, tms_cstime=0}) = 3377342164
times({tms_utime=5636503, tms_stime=606, tms_cutime=0, tms_cstime=0}) = 3377342164

Second, we use lsof Linux command:


[box1@TESTDB]/u01/app/oracle/admin/TESTDB/diag/rdbms/camssdb/TESTDB/trace >/usr/sbin/lsof -p 11264 |grep 13
oracle  11264 oracle  cwd    DIR      253,9        4096   1062513 /u01/app/oracle/product/11.1.0.7/dbs
oracle  11264 oracle  DEL    REG       0,13              25100301 /3
oracle  11264 oracle  mem    REG      253,0      139504    229689 /lib64/ld-2.5.so
oracle  11264 oracle  mem    REG      253,0      615136    229429 /lib64/libm-2.5.so
oracle  11264 oracle  mem    REG      253,9     2513705   1579856 /u01/app/oracle/product/11.1.0.7/lib/libhasgen11.so
oracle  11264 oracle  mem    REG      253,9       13159   1579985 /u01/app/oracle/product/11.1.0.7/lib/libskgxn2.so
oracle  11264 oracle  mem    REG      253,9     1062133   1579956 /u01/app/oracle/product/11.1.0.7/lib/libocr11.so
oracle  11264 oracle    5r   DIR        0,3           0 738197513 /proc/11264/fd
oracle  11264 oracle    8r   DIR        0,3           0 738197513 /proc/11264/fd
oracle  11264 oracle   11u   REG    253,118  2097160192   7913475 /amssdb_petcamssdb/ora_data00/PAMSSDB/system_CAMSSDB_01.dbf
oracle  11264 oracle   13u  IPv4 1506971131                   TCP anacaj:ncube-lm->box2.qc.bell.ca:17551 (ESTABLISHED)   ---------------------> This is what we are looking for
oracle  11264 oracle   14u   REG    253,124 20971528192  13336587 /amssdb_petcamssdb/ora_data06/PAMSSDB/pool_data_CAMSSDB_03.dbf
oracle  11264 oracle   15u   REG    253,118 10485768192   7913479 /amssdb_petcamssdb/ora_data00/PAMSSDB/pool_ix_CAMSSDB_01.dbf
oracle  11264 oracle   24u   REG    253,124 10485768192  13336585 /amssdb_petcamssdb/ora_data06/PAMSSDB/abp_ix_l2_CAMSSDB_05.dbf
oracle  11264 oracle   29u   REG    253,118 15728648192   7913478 /amssdb_petcamssdb/ora_data00/PAMSSDB/pool_data_CAMSSDB_01.dbf


 Last step, login to box2 and look for port 17551:

/usr/sbin/lsof |grep 17551