Wednesday, April 16, 2014

All about Infiniband switches on Exadata :


1).The InfiniBand Network connects the database servers and Exadata Storage Servers using the InfiniBand switches on the rack. It is a private network between the database servers and Exadata Storage Servers.
2).A Exadata Rack contains at least 2 Infiniband switches. The third switch is called Spine Switch. Which connects both the leaf switches in half and full rack database machines.






3). Spine switch is for Connection of multiple racks to form a single and larger database machine environment.

4). Each Server (Storage and Database Server) Contains 2 Infiniband ports which are bonded together in ACTIVE/PASSIVE way (till X3 and ACTIVE/ACTIVE in X4).

5). The Active Passive connections are spread across both the switches using FAT-TREE switched Fabric Network Topology.

6). Infiniband switches run centOS.

7). MONITOR SWITCH PORTS:   To check failed switch and sensor hardware that exceeds preset thresholds, Run these commands every 1-2 minutes.
Use,
Login to the switch using root .
and run,
$ showunhealthy
OK - No unhealthy sensors
$ checkpower
PSU 0 present OK
PSU 1 present OK
All PSUs OK

7.1).In case of any issue reported on the above command, Use "env_test" command.
Login as root on IB switch and run,
# env_test
Environment test started:
Starting Environment Daemon test:
Environment daemon running
Environment Daemon test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.39 V
Measured 12V = 11.97 V
Measured 5V = 4.99 V
Measured VBAT = 3.09 V
Measured 2.5V = 2.49 V
Measured 1.8V = 1.78 V
Measured I4 1.2V = 1.22 V
Voltage test returned OK
Starting PSU test:
PSU 0 present OK
PSU 1 present OK
PSU test returned OK
Starting Temperature test:
Back temperature 29
Front temperature 30
SP temperature 48
Switch temperature 43, maxtemperature 45
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 12099
Fan 2 running at rpm 11881
Fan 3 running at rpm 12208
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Starting SSD test:
SSD test returned OK
Environment test PASSED

8).MONITOR IB SWITCH PORTS : Use ibqueryerrors.pl on any of the database node or switches. Storage servers need not be checked as its automatically checked by Exadata Cell software(Part of MS)
Login as root to database or IB Switch and run,
# ibqueryerrors.pl -s RcvSwRelayErrors,RcvRemotePhysErrors,XmtDiscards,XmtContraintErrors,RcvContraintErrors,ExcBufOverrunErrors,Vl15Dropped
You should run this every 1 or 2 min to check if the value is raising.
Check for SymbolErrors,RcvErrors,LinkIntegrityErrors

9).  To check infiniband Firmware versions,
On infiniband Switch, Login as root user and then,
# version | head -1 | cut -d" " -f5
10).Monitor Database Node IB Ports:
Login to database server as root and then run
ibstatus => check that every port shows up in the output(2 per node).

 Sample Output :

Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01ce:d28b
        base lid:        0x26
        sm lid:          0x3
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      IB

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01ce:d28c
        base lid:        0x27
        sm lid:          0x3
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      IB

perfquery => Check for SymbolErrors, RcvErrors, LinkIntegrityErrors
ifconfig, => Check the bondib0(ib0 and ib1) are up.
ping => Check for connectivity.
rds-ping => Check for connectivity.

11). Monitor the infiniband Fabric. Can be run from either Database node or one of the infiniband switches.
To Locate SM running,
Login as root on DB node or IB switch and run
# sminfo
sminfo: sm lid 3 sm guid 0x2128469156a0a0, activity count 55495849 priority 14 state 3 SMINFO_MASTER
and then
# ibswitches
Switch  : 0x002128469156a0a0 ports 36 "SUN DCS 36P QDR aeldb3sw-ibs0 10.146.28.50" enhanced port 0 lid 3 lmc 0
Switch  : 0x00212846914ba0a0 ports 36 "SUN DCS 36P QDR aeldb3sw-ibb0 10.146.28.52" enhanced port 0 lid 2 lmc 0
Switch  : 0x002128469157a0a0 ports 36 "SUN DCS 36P QDR aeldb3sw-iba0 10.146.28.51" enhanced port 0 lid 1 lmc 0

From the above command, 0x002128469156a0a0   is the Switch where SM is running(Compare the GUID) from the above command(sminfo)
Or login to one of IB switch and simply run,
# getmaster
Local SM not enabled
20140131 09:55:06 Master SubnetManager on sm lid 3 sm guid 0x2128469156a0a0 : SUN DCS 36P QDR aeldb3sw-ibs0 10.146.28.50

12) On a Full or Half rack node, Spine switch is present and thats where the SM should be running,
To identify spine switch,
run,
# ibnetdiscover -p | awk '/^SW + [0-9] +  + [0-9] + + 0x[0-9 \ a-e]+ + [0 - 9] + x .DR - [SW | CA] .*/ {if (spine [$4] == " ") spine[$4] == "yes" if ((spine [$8] == "CA")  spine[$4] == "no" } END { for (val in spine) if (spine [val] == "yes") print  val }'

13). Infiniband Cables are not as robust as Ethernet (RJ45) ones.  InfiniBand copper cables have strict

Specifications which define the minimum bend radius that they can tolerate.

Thursday, March 27, 2014

OPTIMIZER MODE in Oracle

                ALL_ROWS   : The optimizer will attempt to find an execution plan that completes the statement (typically meaning “returns all the rows”) in the shortest possible time. There are no special constraints built into this code. Used typically in OLAP Systems.

                FIRST_ROWS_N (N => 1, 10,100,1000) The optimizer first estimates the number of rows that will be returned by completely analyzing just the first join order. This tells it what fraction of the total data set the query is supposed to fetch, and it restarts the entire optimization process with the target of finding the execution plan that minimizes the resources required to return that fraction of the total data. This option was introduced in 9i. Used typically in OLTP systems.

                FIRST_ROWS:  Deprecated in 9i, but maintained for backward compatibility. The optimizer will attempt to find an execution plan to return the first row of a result set as rapidly as possible. There are several high-level constraints built into the code.

There are two other options for the optimizer_mode (even in 10g): rule and choose (default).

Up to Oracle9i, the default value is choose. This means that if object statistics are available for at least one of the objects referenced in the SQL statement that has to be optimized, the value all_rows is used; otherwise, rule is used.

As of Oracle Database 10g, the rule-based optimizer has been desupported , and therefore the new default value is all_rows. Consequently, the values choose and rule are no longer supported

Any tables that missed the most recent pass of statistics generation will then be subject to dynamic sampling. This is because the default value in 10g for the parameter optimizer_dynamic_sampling is 2 (which means use dynamic sampling on any table without statistics), rather than 1, as it was in 9i. 


Courtesy  : CBO Fundamentals by Johnathan Lewis and Troubleshooting Oracle Performance  by Christian Antognini

Friday, March 7, 2014

My SQL Scrapbook for Performance Tuning

1).Historic Details about Execution Plan , Elapsed Time , Plan Hash Value and Number of Execution.

col smple_time for a23
col SMPLE_TIME for 999999999999999
select sn.end_interval_time smple_time,
       sql.SQL_ID,
       sql.instance_number    ins,
       sql.PLAN_HASH_VALUE      planhash,
       sql.OPTIMIZER_COST       cost,
       sql.FETCHES_DELTA        fchs,
       sql.EXECUTIONS_DELTA     exes,
       sql.ROWS_PROCESSED_DELTA/1000    Krwps,
       sql.ELAPSED_TIME_DELTA/1000000   elps
from DBA_HIST_SQLSTAT sql, dba_hist_snapshot sn
where sql.sql_id='&sql_iD'
and sql.snap_id= sn.snap_id
and sql.instance_number=sn.instance_number
order by 1;

2).Historic Execution from AWR

select * from table(dbms_xplan.display_awr('&SQL_ID',PLAN_HASH_VALUE));

3). Bind Variables Used in a Query.

col END_INTERVAL_TIME for 999999999
col NAME for a999999
col name for 99999999
select
sn.END_INTERVAL_TIME,
sb.NAME,
sb.VALUE_STRING
from
DBA_HIST_SQLBIND sb,
DBA_HIST_SNAPSHOT sn
where
sb.sql_id='&SQL_ID' and
sb.WAS_CAPTURED='YES' and
sn.snap_id=sb.snap_id
order by
sb.snap_id,
sb.NAME;

4).Check the Histogram.

select column_name from dba_tab_col_statistics where table_name='&TABLE_NAME' and  and owner='&OWNER' order by COLUMN_NAME ;

5).Check the Index Positioning.

select index_name,column_name,column_position from dba_ind_columns where index_name='&INDEX_NAME' and index_owner='&INDEX_OWNER'

6).To Tune the Query.

From the Session level (as sys).

Alter session set sql_trace=TRUE; => This will not give Binds and Waits Events.
Alter session set statistics_level=ALL;  OR
Use Gather_Plan_Statistics Hint in the query you want to tune. Something like below one..
select /*GATHER_PLAN_STATISTICS */

And get the explain plan using.

SELECT * FROM table (DBMS_XPLAN.DISPLAY_CURSOR('&SQL_ID', NULL, 'ALLSTATS LAST')); 

7). Run SQL Tuning Advisor 

DECLARE
  l_sql_tune_task_id  VARCHAR2(100);
BEGIN
  l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
                          begin_snap  => &BEG_SNAP,
                          end_snap    => &END_SNAP,
                          sql_id      => '&SQL_ID',
                          scope       => DBMS_SQLTUNE.scope_comprehensive,
                          time_limit  => 10000,
                          task_name   => 'MYTUNINGTASK_&SQL_ID',
                          description => 'Tuning task for statement &SQL_ID in AWR.');
  DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/

EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => '&SQL_ID');

SET LONG 10000;
SET PAGESIZE 1000
SET LINESIZE 200
SELECT DBMS_SQLTUNE.report_tuning_task('&SQL_ID') AS recommendations FROM dual;
SET PAGESIZE 24

SET LONG 10000;
SET PAGESIZE 1000
SET LINESIZE 200
spool sqltune_report.lst
SELECT DBMS_SQLTUNE.report_tuning_task('&SQL_ID') AS recommendations FROM dual;
SET PAGESIZE 24
spool off

8).Find AWR Snapshot  Interval and Change its Interval 

select
       extract( day from snap_interval) *24*60+
       extract( hour from snap_interval) *60+
       extract( minute from snap_interval ) "Snapshot Interval",
       extract( day from retention) *24*60+
       extract( hour from retention) *60+
       extract( minute from retention ) "Retention Interval"
from dba_hist_wr_control;

Output is in Minutes.

execute dbms_workload_repository.modify_snapshot_settings (interval => 60,retention => 1576800);


9).Find the Optimizer Statistics for an Index.

set serveroutput on
declare
      l_numrows   number;
     l_numlblks  number;
     l_numdist   number;
     l_avglblk   number;
     l_avgdblk   number;
     l_clstfct   number;
     l_indlevel  number;
   begin
    dbms_stats.get_index_stats(ownname ='&SCHEMA_NAME', indname ='&INDEX_NAME',
                    numrows => l_numrows, numlblks => l_numlblks,
                    numdist => l_numdist, avglblk => l_avglblk,
                    avgdblk => l_avgdblk, clstfct => l_clstfct,
                    indlevel => l_indlevel);
     dbms_output.put_line('No. of rows: ' || l_numrows);
     dbms_output.put_line('No. of blks: ' || l_numlblks);
     dbms_output.put_line('No. of distinct values: ' || l_numdist);
     dbms_output.put_line('Avg leaf blocks for distinct keys: ' || l_avglblk);
     dbms_output.put_line('Avg data blocks pointed to in the table: ' || l_avgdblk);
     dbms_output.put_line('Clustering factor: ' || l_clstfct);
     dbms_output.put_line('Index height: ' || l_indlevel);
   end;
   /

10). To find Longops Sessions

set pagesize 2000
set linesize 150
col message for a70
col opname for a30
col username for a10
select sid||','||serial# sid_Serial, start_time, opname, sofar, totalwork, round(sofar/totalwork*100) Done, round(time_remaining/60) Time_left, username, message
from gv$session_longops where totalwork != 0 and (sid,serial#) in (select sid,serial# from gv$session)
--and totalwork!=sofar
order by start_time;

Monday, August 5, 2013

What is "Parse CPU to Parse Elapsed %" ?

Parsing is fully CPU Bound Activity.In ideal case Parse CPU should be equal to Parse Elapsed.If the "Parse Elapsed is more, then that indicates the Parsing activity is waiting on some event(eg Acquiring a Latch).This would give how much wall clock time is elapsed for parsing activity has taken  to complete the Parse request when it takes. 

Total Time for Parsing(Parse Elapsed) = CPU Time Taken to Parse + Wait time .

So

Parse CPU to Parse Elapsed ratio = [Parse CPU/ Parse Elapsed]*100

Low Value for this ratio is an indicator of latching problem. Investigate the Latch Sections in AWR and Statspack report for contention on library cache and shared pool latches.

Ideally Parse Elapsed must be equal to Parse CPU, i.e., only CPU time is used for parsing. In that case the ratio is 100%. If wait time is more then the ratio will be less.

Friday, September 21, 2012

Simple Mantra on Developing Database Software:


Here is the simple mantra from my favourite Oracle Man Tom Kyte.

1).You should do it in a single SQL statement if at all possible.

2).If you cannot do it in a single SQL statement, do it in PL/SQL.

3).If you cannot do it in PL/SQL, try a Java stored procedure.

4).If you cannot do it in Java, do it in a C external procedure.

5).If you cannot do it in a C external procedure, you might want to seriously think about why it is you need to do it.

Wednesday, August 29, 2012

Tom Kyte's View on Pro C


"You need to know C well before using Pro*C.Pro*c simply lets you embedded SQL in C code. If you do not know how to code in C, you will not get very far at all with Pro*C.I use pro*c only when I cannot accomplish the task efficiently in PLSQL or SQL.  In 9i or Above,with external tables, merge, pipelined functions -- I'm very very hard pressed to find a reason to use C.I've use pro*c to dump data to flat files and sometimes to load it."

Friday, August 3, 2012

How To Set Time Prompt in SQLPLUS

We need to modify the glogin.sql sql file which will be located under $ORACLE_HOME/sqlplus/admin.
Take the backup of the glogin.sql and add the below entry.

set lines 200
set pages 200
set termout off
col dbname new_value prompt_dbname
select substr(global_name,1,instr(global_name,'.')-1) dbname
from global_name;
set sqlprompt "&&prompt_dbname> "
set termout on
set time on

System Statistics : A Quick Look

1).Till 8i ANALYZE command was used to collect statistics.However ANALYZE has following Problems 
       1).Doesn't support External Tables.
       2).Not good for partitioned Table. As it gathers stats for only segment and index level.
2).From 9i Its recommended to use dbms_stats.With dbms_stats,You can export, import, lock, unlock, get, set, gather and delete at database/dictionary/table/schema/index/system level.
3).Till 8i the CBO's Cost Estimations were based on I/O Cost Model.

4).Single Block and Multi Block reads are equally costly. Though in reality single block reads are faster. This gave unfair advantage to the Full Table Scans.To fix this we used optimizer_index_caching and optimizer_index_cost_adj is used.
5).New Cost model was introduced in 9i. Known as "CPU Cost Model".Wherein it gets performance of CPU and I/O Subsystem.
6).9i has no System Stats were available by default.10g has.
7).There are 2 types of System Statistics
        1).No workload Stats(Artificial Application Load).
        2).Workload Stats(Real Application Load) 
8).The main difference is method used to measure I/O Performance.You can still use I/O Cost Model in 10g by using _optimizer_cost_model to IO
9).System Stats are stored in AUX_STATS$ table. 
 
   SQL>SELECT pname, pval1, pval2   FROM sys.aux_stats$  WHERE sname = 'SYSSTATS_INFO';
 
       For Pname – Status can be completed / badstat 
10).SYSSTATS_MAIN set contains system statistics themselves.
    
   SQL>SELECT pname, pval1   FROM sys.aux_stats$  WHERE sname = 'SYSSTATS_MAIN';
 
11).Single set of statistics per database.So if a RAC machine has different configuration.Need to decide whether you need enable system statistics.

12).You Gather system statistics using below command,
 
   SQL>dbms_stats.gather_system_stats(gathering_mode => 'noworkload');

13).When using “noworkload” make sure the system is idle. As database uses synthetic benchmark. In 10g even if you delete the system statistics. Database will gather the statistics during next start-up.
14).Gathering statistics usually takes less than one minute.

15).With workload Statistcs,you use 
         1).exec dbms_stats.gather_system_stats(gathering_mode => 'start')
         2).Do actual work.  
         3).dbms_stats.gather_system_stats(gathering_mode => 'stop')

16).To avoid manually taking the ending snapshot, it is also possible to set the parameter gathering_mode
   SQL> dbms_stats.gather_system_stats(gathering_mode => 'interval', interval => 30)

17).You can also manually the set the statistics. This is helpful if you have different workload for in different period and you want to average the values.Say for example, 
   SQL>dbms_stats.set_system_stats(pname => 'CPUSPEED', pvalue => 772);

18).The CPU cost is defined as number of single-block reads that can be performed per second.

19).Workload statistics are preferred to “noworkload” when both of them are available.

20).Query optimizer performs several sanity checks that could disable or partially replace workload statistics.For example if mreadtim is less than sreadtime CBO re-computes again.

Courtesy: Troubleshooting Oracle Performance by Christian Antognini.


How to Start 11g R2 RAC Cluster

There are 2 methods.
Method 1:
1).Start the HAS (High Availability Service) on each node.
$ crsctl start has
2).To Start the Cluster on both the nodes.
$ crsctl start cluster –all
If OHASD is already running, you can start the Clusterware stack on any node from local node
Method 2:
1).Run the below command on each node. Starts both HAS and CRS
$ crsctl start crs
Noticed that just starting has also starts up the whole stack.

How to Check Clusterware Version and Name

To check the software versions on a single node. Software version is the lastest version installed in on an cluster.
You would use this option,When you do the rolling upgrade.
$ crsctl query crs softwareversion [hostname]
Active version is lowest version anywhere in the cluster. This is the command you would normally need to use
$ crsctl query crs activeversion
The version of oracle Clusterware must be always greater than the oracle products installed in the clusters
Permanently operating oracle Clusterware (software version vs active version) is not supported.
To check the cluster name use,
  $ORA_CRS_HOME/bin/cemutlo -n

UNIX Useful Commands - FIND


1).Find files modified within specific time
Find files that have been modified less than 3 days ago.
find . -mtime -3
To be more specific and just check for the file change time, use:
find . -ctime -3
To be even more specific you can set a date range by using: Find all files changed, and thus modified, at least one day ago but within three days ago.
find . -ctime +1 -a -ctime -3
2).Find and Replace text in Multiple Files
Use,
                                find . -type f | xargs grep -l 'IBM' | xargs sed -i '' -e 's/IBM/ORACLE/g'
The first part of the command is find, which finds all files and excludes directories. That result is then piped to grep, which lists all files that contain IBM. The results of is then sent to sed, which replaces all occurances of IBM with ORACLE.
3).Find and Move files to a specific location
Use,
                              find . -newer *lmd.trc -exec cp '{}' /tmp/ \ ;
Above command find the file newer than the *lmd.trc and then copies all the files to the tmp directory.