Journey with Oracle and PeopleSoft

06/21/2011

Another Latch:Cache Buffer Chains Troubleshooting

Filed under: Latches — orapsdba @ 1:05 pm

I got an emergency call early in the morning from my IDM support team.It looks like their OID (Oracle internet Directory) production server is not responding well and the CPU utilization always 99-100%.What a way to start the day :-)
 
As usuall,I have gone through OS utility nmon(OS is AIX 6.1) and verified the CPU utilization and top processes.There are 4 Oracle procsses from OID application using most of the CPUs.
 
A quick look at the v$session_wait shows me all top processes’s sessions were waiting on a event “latch:cache buffer chains”
 
Since the customer has dignostic package license,I have a cushion of selecting from ASH.
 
SQL> SELECT * FROM (
          SELECT
            event
          , TRIM(TO_CHAR(p1, ‘XXXXXXXXXXXXXXXX’)) latch_addr
          , TRIM(ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1))||’%’ PCT
          , COUNT(*)
        FROM
            v$active_session_history
        WHERE
           event = ‘latch: cache buffers chains’
       AND session_state = ‘WAITING’
       GROUP BY
           event
         , p1
       ORDER BY
           COUNT(*) DESC
   )
   WHERE ROWNUM <= 10
/
 
EVENT                                                            LATCH_ADDR        PCT                                         COUNT(*)
—————————————————————- —————– —————————————– ———-
latch: cache buffers chains                                      70000003DF39DD8   10.3%                                              3
latch: cache buffers chains                                      70000003D868120   6.9%                                               2
latch: cache buffers chains                                      70000003D8CF450   6.9%                                               2
latch: cache buffers chains                                      70000003DED2DC8   6.9%                                               2
latch: cache buffers chains                                      70000003D90F498   6.9%                                               2
latch: cache buffers chains                                      70000003DEF9D90   6.9%                                               2
latch: cache buffers chains                                      70000003D876838   3.4%                                               1
latch: cache buffers chains                                      70000003D8DFAA0   3.4%                                               1
latch: cache buffers chains                                      70000003D986E18   3.4%                                               1
latch: cache buffers chains                                      70000003D90CA70   3.4%                                               1
 
To keep the blog small,I kept only the first three latch information.
 
SQL> @latchprofx sid,name,sqlid,object % 3DF39DD8 100000
 
       SID NAME                                SQLID                    OBJECT       Held       Gets  Held %     Held ms Avg hold ms
———- ———————————– ————- —————– ———- ———- ——- ———– ———–
       612 cache buffers chains                62k5ha59gm0ur           1800527         44         44     .04       2.891        .066
       626 cache buffers chains                ap466g3x7zyby           1800527         26         26     .03       1.708        .066
 
SQL> @latchprof sid,name,sqlid,object % 3D868120 100000
 
       SID NAME                                SQLID                    OBJECT       Held       Gets  Held %     Held ms Avg hold ms
———- ———————————– ————- —————– ———- ———- ——- ———– ———–
       619 cache buffers chains                dprdb6ng88vch           18000DD         30         30     .03       2.004        .067
       644 cache buffers chains                dz3j9dty68rgm           18000DD         25         25     .03       1.670        .067
       616 cache buffers chains                dnjk9235k62by           18000DD         23         23     .02       1.536        .067
 
SQL> @latchprofx sid,name,sqlid,object % 3D8CF450 100000
 

       SID NAME                                SQLID                    OBJECT       Held       Gets  Held %     Held ms Avg hold ms
———- ———————————– ————- —————– ———- ———- ——- ———– ———–
       644 cache buffers chains                f5wauqmvrv4af           18000DA         34         34     .03       2.475        .073
       616 cache buffers chains                17bysmbufufc2           18000DA         32         32     .03       2.330        .073
       619 cache buffers chains                dprdb6ng88vch           18000DA         30         30     .03       2.184        .073
 
– Find the hot block
 
SQL> @dba 1800527
    RFILE#     BLOCK# DUMP_CMD
———- ———- ———————————————————————————————————————
         6       1319 — alter system dump datafile 6 block 1319

STATE      BLOCK_CLASS        OBJECT_TYPE         DATA_OBJECT_ID object                                          TCH  MODE_HELD D T P S D FLG_LRUFLG                  DQ
———- —————— ——————- ————– —————————————- ———- ———- – - – - – ——————- ———-
xcur       data block         INDEX                        70409 ODS.ST_P1_ORCLEVENTTIME                           71          0 N N N N N 80000:4                      0

 
It’s evident that the sessions were trying to acquire the same block and thereby causes contention on CBC latch.
Further looking into the index reveals that there are 124 keys are packed into one leaf block.
 
Table_Name                       Index_Name                          Tbl_Blocks      Num_Rows     BLEVEL LEAF_BLOCKS            CF DISTINCT_KEYS
——————————– ——————————– ————- ————- ———- ———– ————- ————-
P1_CT_ORCLEVENTTIME              ST_P1_ORCLEVENTTIME                         20          4855          1          20            18          4855
P1_CT_ORCLEVENTTIME              VA_P1_ORCLEVENTTIME                         20          4855          1          20            18          4855
 
 
SQL> SELECT
  2  keys_per_leaf,
  3  count(*) “LeafBlocks”
  4  from
  5  (
  6  select sys_op_lbid(70409,’L',t.rowid) block_id,
  7  count(*)  keys_per_leaf
  8  from
  9  ODS.P1_CT_ORCLEVENTTIME t
 10  where (ENTRYID is not null or ATTRVALUE is not null or ATTRTYPE is not null)
 11  group by
 12  sys_op_lbid(70409,’L',t.rowid)
 13  )
 14  group by keys_per_leaf
 15  order by keys_per_leaf
 16  ;
 
KEYS_PER_LEAF LeafBlocks
————- ———-
          124          1    >>>  124 keys in 1 block
          249         19
 
It’s time to verify the SQLs and their plan.Most of the SQLs are using Nested Loops join.The estimated cardinality with the default statistics values was wrong which leads the optimizer to choose a sub-optimal plan.
 
Then I verified the statistics of the associated tables.I am shell shocked to see none of them are having statistics.The application’s admin team  forgot to run oidstats.sql which is the utility provided by oracle to collect statistics on ODS schema after each bulk load.Once the statistics are collected and the application was restarted,the CPU consumption was back to normal and the problematic SQLs are now using hash joins as expected.
 
Again,the right tool at the right time and it’s Tanel’s (My Oracle Guru) latchprofx makes my job easier.
 
I will see you all soon with another interesting blog…
 
Cheers!

1 Comment »

  1. Thanks for sharing. Just wonder what is the plan it uses after statistics are up-to-date on the hot index block?

    Comment by Bunditj — 07/01/2011 @ 1:06 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Gravatar
WordPress.com Logo

Please log in to WordPress.com to post a comment to your blog.

Twitter picture

You are commenting using your Twitter account. Log Out )

Facebook photo

You are commenting using your Facebook account. Log Out )

Connecting to %s

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.