Saturday, January 30, 2016

Convert Single Instance to RAC - Manual Method

This topic is for converting single instance database to RAC instance.  I am demonstrating this in oracle12c version. There are  several ways we can do this.. But RMAN restore is preferred method for least downtime in bigger Tera Byte size database.

If you want to convert from Oracle11g to Oracle12c, then i would recommend to upgrade the stand alone DB to Oracle12c and complete the testing before you migrate to RAC.   This will help narrow down the issues when we do one thing at  a time.

I followed the oracle document for this RAC migration. Here is the reference  document  for my post. I also referred the metalink Doc ID 747457.1.  Please review the oracle document before you start doing in your environment.

Prerequisite :

Install cluster and ASM on all nodes
Install the oracle software
Stand alone DB and RAC must be same database version.
ASM disks size should be  good enough to accommodate the source database.
Cluster should be up and running on RAC node.

Source DB info :

Host Name usbenhost
Instance name usben
DB name usben
Storage type file system

Target DB info :

Host Name ractest1, ractest2, ractest3
Instance name usben1, usben2,usben3
DB name usben
Storage type ASM

There are two major task on this migration, The first part is to migrate the database from stand alone server(usbenhost) to ractest1 database server.  I am  using RMAN restore to migrate the database.

There are multiple ways we can restore the database. As part of the  migration, i am also moving the database to ASM disk.
  1.  Duplicate the database using target database backup. Click here for details.
  2.  Duplicate the database using Active database duplication. Click  here  for details.
  3.  Regular RMAN restore approach. Click  here for details.
  4.  Setup the Standby Database. Click here for details.
Option 1 & Option 2 would cause longer downtime. Because, we have to keep the application  down during the restore. Option 3 & Option 4 would reduce the application downtime significantly.  Application will be down only during the RAC configuration if we choose option 3 & 4.

Here are the high level steps for option 1 & Option 2.
  1. Shutdown the application.
  2. Restore the database in RAC host and open the database
  3. Convert the DB to RAC DB
  4. Start the application.
Here are the high level steps for option 3
  1. Restore the database in RAC host
  2. Apply the daily archive log files in RAC host.
  3. On the cut over date. apply all the archive log files. 
  4. Shutdown the application,
  5. Open the stand alone database in RAC node
  6. Convert the DB to RAC DB
  7. Start the application.
Here are the high level steps for option 4
  1. Setup the standby database in RAC host.
  2. Shutdown the application,
  3. Switch over the database or activate the database in RAC host
  4. Convert the DB to RAC DB
  5. Start the application.
I followed Option 3 to migrate the database from usbenhost to ractest1 node.

Now let us start converting the DB to RAC database!

My standby alone database is already  migrated from stand alone server to RAC node . Here is the database info :
select name from v$datafile
union
select name from v$controlfile
union
select member from v$logfile
union
select name from v$tempfile




















Step 1
Make sure, MAXINSTANCES and MAXLOGFILES Parameters are good enough for RAC environment.  I verified my control file and looks good for me. If not, please recreate the control file with appropriate values.










Step 2
Create the redo log group and undo tablespace for each instance.  The redo and undo are handled on per instance basis.  We need minimum two redo log groups for each instance and  it should be created on shared storage. My case, I keep two redo log groups. I will let you choose the number of redo log groups according to your environment.

Each redo log group should be assigned to thread number in RAC database. It is very important to specify the thread number when you add or create redo log files. This will ensure that the redo log file is assigned to right instance.  

Each instance is assigned a thread number starting at 1. Hence i am creating redo log groups for thread#2 and thread#3.

Here i have three node RAC. Hence i am creating redo and undo for other two instance.


alter database add logfile
thread 2 group 6 ('+DATA') size 50m reuse;

alter database add logfile
thread 2 group 7 ('+DATA') size 50m reuse;

alter database add logfile
thread 3 group 8 ('+DATA') size 50m reuse;

alter database add logfile
thread 3 group 9 ('+DATA') size 50m reuse;

alter database enable public thread 2;

alter database enable public thread 3;

create undo tablespace UNDOTBS2 datafile 
'+DATA' size 50M;

create undo tablespace UNDOTBS3 datafile 
'+DATA' size 50M;






















Step 3

Create pfile from spfile
Modify the pfile with RAC related parameters










Add the below parameters in pfile.


*.cluster_database_instances=3
*.cluster_database=true
usben1.instance_number=1
usben2.instance_number=2
usben3.instance_number=3
usben1.thread=1
usben2.thread=2
usben3.thread=3
usben1.undo_tablespace='UNDOTBS1'
usben2.undo_tablespace='UNDOTBS2'
usben3.undo_tablespace='UNDOTBS3'


Here is the updated pfile in ractest1 host. I also modified the archive log destination to ASM disk.

usben.__data_transfer_cache_size=0
usben.__db_cache_size=536870912
usben.__java_pool_size=16777216
usben.__large_pool_size=150994944
usben.__oracle_base='/ora/app/oracle'#ORACLE_BASE set from environment
usben.__pga_aggregate_target=671088640
usben.__sga_target=989855744
usben.__shared_io_pool_size=50331648
usben.__shared_pool_size=218103808
usben.__streams_pool_size=0
*.compatible='12.1.0.2.0'
*.control_files='+DATA/USBEN/CONTROLFILE/current.294.904836161',
'+DATA/USBEN/CONTROLFILE/current.312.904836161'
*.db_block_size=8192
*.db_domain='localdomain'
*.db_name='usben'
*.db_create_file_dest='+DATA'
*.db_recovery_file_dest='+DATA'
*.db_recovery_file_dest_size=4560m
*.diagnostic_dest='/ora/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=usbenXDB)'
*.log_archive_dest_1='location=+DATA/USBEN/ARCH'
*.log_archive_dest_state_1='enable'
*.open_cursors=300
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.cluster_database_instances=3
*.cluster_database=true
usben1.instance_number=1
usben2.instance_number=2
usben3.instance_number=3
usben1.thread=1
usben2.thread=2
usben3.thread=3
usben1.undo_tablespace='UNDOTBS1'
usben2.undo_tablespace='UNDOTBS2'
usben3.undo_tablespace='UNDOTBS3'


Step 4 Drop the static listener if you created during the database duplication process.  Use NETCA to configure the listener in RAC environment. Copy the password file from ractest1 to other nodes.

Step 5 Start the RAC instance on all three node

Add the entry in /etc/oratab

ractest1 host :  usben1:/ora/app/oracle/product/12.1.0.1/db_1:N
ractest1 host :  usben2:/ora/app/oracle/product/12.1.0.1/db_1:N
ractest1 host :  usben3:/ora/app/oracle/product/12.1.0.1/db_1:N

Rename the pfile as below.

mv $ORACLE_HOME/dbs/initusben.ora $ORACLE_HOME/dbs/initusben1.ora

Before you start the instance, make sure you set the new environment variable. Since the instance name is changed from usben to usben1. You get the below error if you don't set the ORACLE_SID with new value USBEN1


> startup nomount pfile=$ORACLE_HOME/dbs/initusben1.ora
ORA-29760: instance_number parameter not specified


Create the spfile from pfile


sys@usben> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
sys@usben>
sys@usben> startup nomount pfile=$ORACLE_HOME/dbs/initusben1.ora
ORACLE instance started.

Total System Global Area  297795584 bytes
Fixed Size                  2923632 bytes
Variable Size             239076240 bytes
Database Buffers           50331648 bytes
Redo Buffers                5464064 bytes
sys@usben1>  create spfile='+DATA/USBEN/PARAMETERFILE/spfileusben.ora' from PFILE;

File created.

sys@usben1>

update the pfile on all three instance as  below.

[oracle@RACTEST1 dbs]$ cat initusben1.ora
SPFILE='+DATA/USBEN/PARAMETERFILE/spfileusben.ora'
[oracle@RACTEST1 dbs]$

[oracle@RACTEST2 dbs]$ cat initusben2.ora
SPFILE='+DATA/USBEN/PARAMETERFILE/spfileusben.ora'
[oracle@RACTEST2 dbs]$

[oracle@RACTEST1 dbs]$ cat initusben1.ora
SPFILE='+DATA/USBEN/PARAMETERFILE/spfileusben.ora'
[oracle@RACTEST1 dbs]$

shutdown the current stand alone instance in ractest1 node.

Now start  the instance on all three node with new spfile.



























Step 6
Execute catclust.sql on ractest1 node. This script creates the dictionary views needed for Oracle RAC databases. I run this script even before we configure the RAC instance.

[oracle@RACTEST1 dbs]$ cd $ORACLE_HOME/rdbms/admin
[oracle@RACTEST1 admin]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sat Feb 27 05:34:27 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> !pwd
/ora/app/oracle/product/12.1.0.1/db_1/rdbms/admin

SQL> @catclust.sql

Session altered.

Package created.

Package body created.

PL/SQL procedure successfully completed.

View created.

Synonym created.

Grant succeeded.

View created.

Grant succeeded.

View created.

Grant succeeded.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

Synonym created.

PL/SQL procedure successfully completed.

Session altered.

SQL>


Step 7  Register the RAC instance in the CRS.

srvctl add database -d usben -o $ORACLE_HOME  -p +DATA/USBEN/PARAMETERFILE/spfileusben.ora
srvctl add instance -d usben -i usben1 -n RACTEST1.localdomain
srvctl add instance -d usben -i usben2 -n RACTEST2.localdomain
srvctl add instance -d usben -i usben3 -n RACTEST3.localdomain


[oracle@RACTEST1 dbs]$ srvctl add database -d usben -o $ORACLE_HOME  -p +DATA/USBEN/PARAMETERFILE/spfileusben.ora
[oracle@RACTEST1 dbs]$ srvctl add instance -d usben -i usben1 -n RACTEST1.localdomain
[oracle@RACTEST1 dbs]$ srvctl add instance -d usben -i usben2 -n RACTEST2.localdomain
[oracle@RACTEST1 dbs]$ srvctl add instance -d usben -i usben3 -n RACTEST3.localdomain
[oracle@RACTEST1 dbs]$

Let us shutdown the DB  and start the database using srvctl command.
























[oracle@RACTEST1 dbs]$ srvctl start database -d usben
[oracle@RACTEST1 dbs]$ srvctl stop database -d usben
[oracle@RACTEST1 dbs]$ srvctl start database -d usben

[oracle@RACTEST1 dbs]$ srvctl status database -d usben
Instance usben1 is running on node ractest1
Instance usben2 is running on node ractest2
Instance usben3 is running on node ractest3
[oracle@RACTEST1 dbs]$

The conversion process is completed once we start the database by using srvctl. We  can run the following SQL statement to see the status of all the instances in the  Oracle RAC database.\

select * from v$active_instances;
column host_name format a50
select host_name,instance_number,instance_name,status,thread# from gv$instance;









Checking the RAC cluster global process :



[root@RACTEST1 ~]# crsctl stat resource -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               OFFLINE OFFLINE      ractest3                 STABLE
ora.TEST.dg
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
ora.VOTE.dg
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
ora.VOTE1.dg
               OFFLINE OFFLINE      ractest1                 STABLE
               ONLINE  OFFLINE      ractest2                 STABLE
               ONLINE  OFFLINE      ractest3                 STABLE
ora.VOTE2.dg
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
ora.asm
               OFFLINE ONLINE       ractest1                 Started,STABLE
               ONLINE  ONLINE       ractest2                 Started,STABLE
               ONLINE  ONLINE       ractest3                 Started,STABLE
ora.net1.network
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
ora.ons
               ONLINE  ONLINE       ractest1                 STABLE
               ONLINE  ONLINE       ractest2                 STABLE
               ONLINE  ONLINE       ractest3                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       ractest2                 STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       ractest3                 STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       ractest3                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       ractest3                 169.254.214.12,STABL
                                                             E
ora.cvu
      1        ONLINE  ONLINE       ractest2                 STABLE
ora.govinddb.db
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      3        OFFLINE OFFLINE                               STABLE
ora.mgmtdb
      1        OFFLINE OFFLINE                               STABLE
ora.oc4j
      1        ONLINE  ONLINE       ractest1                 STABLE
ora.ractest1.vip
      1        ONLINE  ONLINE       ractest1                 STABLE
ora.ractest2.vip
      1        ONLINE  ONLINE       ractest2                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       ractest2                 STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       ractest3                 STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       ractest3                 STABLE
ora.usben.db
      1        ONLINE  ONLINE       ractest1                 Open,STABLE
      2        ONLINE  ONLINE       ractest2                 Open,STABLE
      3        ONLINE  ONLINE       ractest3                 Open,STABLE
--------------------------------------------------------------------------------
[root@RACTEST1 ~]#



Step 8  Monitor the logs ,

Monitor the below log files and make sure no issues.

CRS_HOME/log/hostname/crsd/ – The log files for the CRS daemon
CRS_HOME/log/hostname/cssd/ – The log files for the CSS daemon
CRS_HOME/log/hostname/evmd/ – The log files for the EVM daemon
CRS_HOME/log/hostname/client/ – The log files for the Oracle Cluster Registry (OCR)
CRS_HOME/log/hostname/racg/ – The log files for the Oracle RAC high availability component
CRS_HOME/log/hostname/racg – The log files for the Oracle RAC high availability component
CRS_HOME/log/hostanme/alert.log – The alert.log for Clusterware issues.


Sunday, January 24, 2016

How to Identify the Master node in RAC?

This question came to my mind when i was doing the RAC course and thought i post this topic in my blog and it might be helpful for others.

The node which join first in the cluster is Master node. Each node has node id and it is assigned the order in which the node is joined in the cluster.   If master node is evicted or rebooted, cluster choose the next node(node which joined in the cluster right after the master/first node)  for master instance.

Master node is responsible for backing up the OCR file. Obliviously, who ever is backing up the OCR is master instance. This is one way we can find the master node.

[root@RAC1 crsd]# ocrconfig -showbackup auto

rac1     2015/10/07 09:45:15     /grid/app/11.2.0/grid/cdata/RAC-cluster/backup00.ocr

rac1     2015/10/07 05:45:14     /grid/app/11.2.0/grid/cdata/RAC-cluster/backup01.ocr

rac1     2015/10/07 01:45:14     /grid/app/11.2.0/grid/cdata/RAC-cluster/backup02.ocr

rac1     2015/10/07 01:45:14     /grid/app/11.2.0/grid/cdata/RAC-cluster/day.ocr

rac1     2015/10/05 16:25:04     /grid/app/11.2.0/grid/cdata/RAC-cluster/week.ocr
[root@RAC1 crsd]#

As per the above output, RAC1 is master instance.

 Second way to identify the master instance is, scanning the ocssd log file.

[oracle@RAC1 cssd]$ cat /grid/app/11.2.0/grid/log/rac1/cssd/ocssd.log|grep 'master node'|tail -1
2016-01-24 20:35:25.670: [    CSSD][2105509648]clssgmCMReconfig: reconfiguration successful, incarnation 339437441 with 2 nodes, local node number 1, master node number 1
[oracle@RAC1 cssd]$


Third way to identify the master instance is, scanning the crssd log file.

[oracle@RAC1 crsd]$ cat /grid/app/11.2.0/grid/log/rac1/crsd/crsd.log|grep 'OCR MASTER'|tail -1
2016-01-24 20:35:25.077: [  OCRMAS][2206197520]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 1
[oracle@RAC1 crsd]$

Hope this post is helpful..

Thursday, January 21, 2016

What is Cache Fusion and how it works in RAC

Oracle used disk pinging Prior to Oracle RAC Cache Fusion concept. There was no data block transfer between the buffer cache of one instance buffer cache of another instance.

Instance A read the block from disk and Instance B wants to read the same block  which is not in the instance B buffer cache, then Instance B needs to read the block from the disk which is causing additional disk read.

Instance A made changes on the particular data block and Instance B wanted to read the committed row, then, Instance A has to write the changes on the disk before Instance B read the same record.

This becomes performance bottleneck and Oracle introduced Cache Fusion in Oracle 8i to enhance the performance improvement.

What is Cache Fusion?

Cache Fusion Oracle RAC transfer the data block from buffer cache of one instance to the buffer cache of another instance using the cluster high speed interconnect. 

For instance,  Instance A read the block and it is in local buffer cache. Now Instance B wants to read the same block, then it can transfer the block(shared current image - SCUR ) from instance A buffer cache to Instance B buffer cache. It does not require additional disk read.

Instance A made changes on the particular block and it is not committed yet. Now instance B wants to read the same block, then, Instance A send the consistent read (CR) image copy to instance B. Cache Fusion avoids disk read and it greatly enhance the performance.

Let me demonstrate the cache fusion and how it works in RAC.  I have three node RAC and node names are ractest1, ractest2 & ractest3. The instance names are govinddb1, govinddb2 & govinddb3. The Database version is oracle 12c. I am going to use only two nodes for this cache fusion experiment. I will create small tiny table with 6 rows and two columns and play around the cache fusion with different scenarios.


Cache Fusion Demonstration  : I am going to use the below SQL multiple times in my demonstration.. hence i am storing the script on my server.
vbh.sql
select file#, block#,
    decode(class#,1,'data block',2,'sort block',3,'save undo block', 4,
    'segment header',5,'save undo header',6,'free list',7,'extent map',
     8,'1st level bmb',9,'2nd level bmb',10,'3rd level bmb', 11,'bitmap block',
     12,'bitmap index block',13,'file header block',14,'unused',
     15,'system undo header',16,'system undo block', 17,'undo header',
     18,'undo block') class_type, status, LOCK_ELEMENT_ADDR
     from v$bh
     where objd = &data_obj_id
order by 1,2,3;
sess.sql
column name format a30
select sn.name,st.value
from v$statname sn
join v$sesstat st
on sn.statistic#=st.statistic#
where
sn.name in('session logical reads',
'physical reads', 'gc cr blocks received',
'gc current blocks received')
and st.sid = SYS_CONTEXT('USERENV','SID')
order by sn.statistic#;

The below script has run on my first node ractest1(govinddb1 instance). I setup new schema with table to test the cache fusion. 

create user cfdemo identified by cfdemo;
grant dba to cfdemo;
grant select on V_$BH to cfdemo;
conn cfdemo/cfdemo
create table emp (empno number, ename char(10));
insert into emp values(1,'Scott');
insert into emp values(2,'Mark');
insert into emp values(3,'Ranny');
insert into emp values(4,'King');
insert into emp values(5,'Eddie');
insert into emp values(6,'Danny');
commit;

Let us check how many blocks have occupied and what are the block numbers used for this tiny table. 
cfdemo@govinddb1> select extent_id, block_id, blocks
  2  from dba_extents
  3  where segment_name = 'EMP'
  4  and owner='CFDEMO';

 EXTENT_ID   BLOCK_ID     BLOCKS
---------- ---------- ----------
         0        216          8

1 row selected.
cfdemo@govinddb1>

okay... This table used 8 blocks and starting from block# 216 to block#223. It is small tiny table and why would it use 8 blocks?  The data block may be one or two. But other blocks  might include  header block & bitmap block.  Let us look and see which block is used for data..

cfdemo@govinddb1> select EMPNO,
  2  dbms_rowid.rowid_relative_fno(rowid) file#,
  3  dbms_rowid.rowid_block_number(rowid) block#
  4  from emp;

     EMPNO      FILE#     BLOCK#
---------- ---------- ----------
         1          6        223
         2          6        223
         3          6        223
         4          6        223
         5          6        223
         6          6        223

6 rows selected.
cfdemo@govinddb1>

oh..   Out of 8 blocks, the actual data are stored on only one block(#223).

 Let us check if these blocks are on the database buffer cache.

cfdemo@govinddb1>  select data_object_id, object_id
  2  from dba_objects
  3  where object_name = 'EMP'
  4  and owner='CFDEMO';

DATA_OBJECT_ID  OBJECT_ID
-------------- ----------
         93446      93446
1 row selected.

cfdemo@govinddb1>  @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        216 1st level bmb      xcur       0000000086F8B010
         6        217 2nd level bmb      xcur       0000000086F7D208
         6        218 segment header     xcur       0000000086F8EF80
         6        219 data block         xcur       0000000086F966B8
         6        220 data block         xcur       0000000086FA46F0
         6        221 data block         xcur       0000000086F715E8
         6        222 data block         xcur       0000000086F78460
         6        223 data block         xcur       0000000086F8C4D8

8 rows selected.

cfdemo@govinddb1>

Look at here... all the blocks are in memory and shows status XCUR(Exclusive current).  why?  Because, we created the table and rows a moment ago. so, all the rows are in memory and these are most recent current copy.  Hence, it acquired Exclusive  current lock.

Another question is,  all 6 rows are stored on block#223. But why there are more blocks(219,220,221,222) shows as a data block? I guess, Oracle allocates more blocks and reserve for future insert. This is my best guess.
These may be empty blocks allocated for this table.

Let us go to instance 2 and check if these buffers are in memory.

sys@govinddb2> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

no rows selected
sys@govinddb2>

 no rows selected. It means, these blocks are not in instance 2 buffer cache.  It makes  sense. Since we never selected these rows in instance 2.

Let us select a row in instance 2 and see if we have the data in buffer cache.

cfdemo@govinddb2> select ENAME from EMP where EMPNO = &empno;

Enter value for empno: 1
old   1: select ENAME from EMP where EMPNO = &empno
new   1: select ENAME from EMP where EMPNO = 1

ENAME
----------
Scott

1 row selected.

Now i selected one row from CFDEMO.EMP table on instance 2.  Let us check the cache and see status.

cfdemo@govinddb2> @vbh
Enter value for data_obj_id: 93446
old   9:  where objd = &data_obj_id
new   9:  where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     cr         00
         6        218 segment header     scur       0000000081FE23B0
         6        219 data block         scur       0000000081FEAC68
         6        219 data block         cr         00
         6        220 data block         scur       0000000081FE4020
         6        220 data block         cr         00
         6        221 data block         scur       0000000081FD8518
         6        221 data block         cr         00
         6        222 data block         cr         00
         6        222 data block         scur       0000000081FDD090
         6        223 data block         cr         00
         6        223 data block         scur       0000000081FE01C8

12 rows selected.
cfdemo@govinddb2>

Look at here.. All the blocks(buffer)  shows CR(consistent read) and SCUR(Shared current).  I selected  only one row from the table. It supposed to bring one data block(#223) to instance 2 buffer cache.  But why is it bringing all the data blocks?  Because, when we select one row, Oracle does not know which row satisfy the condition.. Hence, It does the full table scan and brought all rows to buffer cache.  Also note that, there are two copies(SCUR & CR)  for each block.  Why?  Oracle Cache fusion transfer the buffer from instance 1 to instance 2 with CR mode. Once it is copied, it makes another copy of the block  with SCUR mode.  CR copy is used for consistent read for other readers. But SCUR copy is shared current image across all the instance.   Once the buffer becomes SCUR, instance 1 should also change the buffer  mode from XCUR to SCUR.  We can not have XCUR in one  instance and SCUR on other instance.

Let us go back to instance 1 and check the status of these buffer.

cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        216 1st level bmb      xcur       0000000086F8B010
         6        217 2nd level bmb      xcur       0000000086F7D208
         6        218 segment header     scur       0000000086F8EF80
         6        219 data block         scur       0000000086F966B8
         6        220 data block         scur       0000000086FA46F0
         6        221 data block         scur       0000000086F715E8
         6        222 data block         scur       0000000086F78460
         6        223 data block         scur       0000000086F8C4D8

8 rows selected.
cfdemo@govinddb1>

Now look at above output.  All the buffers on instance 1 changed from XCUR to SCUR. The buffers are now shared current. Oracle can go to either instance1 or instance 2 to get the data. The shared current buffers are identical across the instance.   

So far so good..... Let me bounce the database and clear all the buffers and let us play around few  scenarios.
[oracle@RACTEST1 ~]$ srvctl stop database -d govinddb -o immediate
[oracle@RACTEST1 ~]$ srvctl start database -d govinddb

Now the DB is bounced...  

Scenario 1(READ-READ) : Instance 1 read the block and instance 2 is also reading the same block. let us check the buffer status.   Read the block from govinddb1 and check the buffer status.

cfdemo@govinddb1> select ename from emp where empno=&empno;
Enter value for empno: 1
old   1: select ename from emp where empno=&empno
new   1: select ename from emp where empno=1

ENAME
----------
Scott

1 row selected.
cfdemo@govinddb1>@sess

NAME                                VALUE
------------------------------ ----------
session logical reads                1792
physical reads                         85
gc cr blocks received                   0
gc current blocks received              9

4 rows selected.
cfdemo@govinddb1>
cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7DBE0
         6        219 data block         scur       0000000080F8F098
         6        220 data block         scur       0000000080F8E490
         6        221 data block         scur       0000000080F76048
         6        222 data block         scur       0000000080F7F1C0
         6        223 data block         scur       0000000080F7A878

6 rows selected.
cfdemo@govinddb1>

When oracle reads the data from disk to buffer cache, it acquire shared current lock. Later it would change to exclusive current mode once started making the changes. 

Let us read the same block from  govinddb2 and check the status.

cfdemo@govinddb2>  select ename from emp where empno=&empno;
Enter value for empno: 1
old   1:  select ename from emp where empno=&empno
new   1:  select ename from emp where empno=1

ENAME
----------
Scott

1 row selected.

cfdemo@govinddb2>@sess

NAME                                VALUE
------------------------------ ----------
session logical reads                1572
physical reads                         41
gc cr blocks received                   2
gc current blocks received             39

4 rows selected.
cfdemo@govinddb2>@vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FD6560
         6        219 data block         scur       0000000080FE6898
         6        220 data block         scur       0000000080FEBCD0
         6        221 data block         scur       0000000080FBFD88
         6        222 data block         scur       0000000080FCE680
         6        223 data block         scur       0000000080FD0EF8

6 rows selected.
cfdemo@govinddb2>

Confusion:  On instance1, the physical read is 85. But on instance 2, the physical read is 41. The number of physical read is reduced in govinddb2. It means, the cache fusion played a role and transferred the block from instance 1 to instance 2. Also the block status is shared current. It means, no one has modified this block yet. Hence it is shared current image on the entire cluster. 

Scenario 2(WRITE-READ) : Instance 1 update  the block and instance 2 is also reading the same block. let us check the buffer status.    

 Let us update row 2 on EMP table from instance 1.

cfdemo@govinddb1> update emp set ename='A' where empno=2;

1 row updated.

cfdemo@govinddb1>@vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7DBE0
         6        219 data block         scur       0000000080F8F098
         6        220 data block         scur       0000000080F8E490
         6        221 data block         scur       0000000080F76048
         6        222 data block         scur       0000000080F7F1C0
         6        223 data block         cr         00
         6        223 data block         xcur       0000000080F7A878

7 rows selected.
Look at the above output.  In instance1, now we have two buffers for block#223. The shared current buffer became CR mode.  Now we have new buffer with exclusive current(XCUR).

Let us check the instance 2.

cfdemo@govinddb2> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FD6560
         6        219 data block         scur       0000000080FE6898
         6        220 data block         scur       0000000080FEBCD0
         6        221 data block         scur       0000000080FBFD88
         6        222 data block         scur       0000000080FCE680
         6        223 data block         cr         00

6 rows selected.
See... The buffer #223 changed from SCUR to CR. Why?  The reason is, we made changes the block #223 on instance 1 and it acquired exclusive lock. Hence all other buffer becomes CR. Remember as i said, there should be only one XCUR  for every block on entire cluster. 

Let us select the same row on instance 2 and see how the buffers status changed on both instance.
cfdemo@govinddb2> select ename from emp where empno=2;

ENAME
----------
ZZZZ

1 row selected.

cfdemo@govinddb2>@vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FD6560
         6        219 data block         scur       0000000080FE6898
         6        220 data block         scur       0000000080FEBCD0
         6        221 data block         scur       0000000080FBFD88
         6        222 data block         scur       0000000080FCE680
         6        223 data block         cr         00
         6        223 data block         cr         00

7 rows selected.
cfdemo@govinddb2>

Now it generated another CR copy for block #223. The reason is, Cache fusion generate the CR copy  in instance 1 and transferred to instance 2 via cluster interconnect.  Hence we see two CR copies on instance 2.   Let us swing  back and check the instance 1 and see any thing changed.

cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7ED60
         6        219 data block         scur       0000000080F93698
         6        220 data block         scur       0000000080F9B690
         6        221 data block         scur       0000000080F794C8
         6        222 data block         scur       0000000080F82F00
         6        223 data block         cr         00
         6        223 data block         xcur       0000000080F845F8
         6        223 data block         cr         00

8 rows selected.
cfdemo@govinddb1>

hmm... look at here on instance 1. Previously there was on CR and one XCUR copy. But now we see two CR copy and 1 XCUR copy. Why?  Because, Oracle generate CR copy from XCUR and sent over to intance2 via interconnect.  It is not deleting the CR copy on instance 1 after sent over to instance 2.  Now two CR + 1 XCUR on instance 1.

Interesting!  imagine how many buffers are creating for small tiny table  on the database.

Confusion:  Whenever we make the changes, oracle acquire XCUR lock and downgrade the other buffer copy lock to CR across the cluster. Instance 1 made changes and it generated XCUR and other copy becomes CR.  Instance 2 reads the same block, there is another CR copy created on both instance. Why?  Instance one generate CR copy and keep one copy in instance 1 and send the same copy to instance 2.

Scenario 3(WRITE-WRITE) : Instance 1 update  the block and instance 2 is also updating  the same block. let us check the buffer status.    


cfdemo@govinddb1> update emp set ename='B' where empno=3;

1 row updated.

cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7ED60
         6        219 data block         scur       0000000080F93698
         6        220 data block         scur       0000000080F9B690
         6        221 data block         scur       0000000080F794C8
         6        222 data block         scur       0000000080F82F00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         xcur       0000000080F845F8

9 rows selected.
cfdemo@govinddb1>

Look at here.. previously there was 2CR + 1 XCUR copy. Now we see one more CR copy added. Why?  The reason is, before it make the changes, Oracle take a CR copy  and make the changes on the buffer. Other readers can read the CR copy for consistent read. This is the reason, one more CR copy is added on instance 1.  Let us swing back to instance 2 and check the buffer status.

cfdemo@govinddb2> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FE1460
         6        219 data block         scur       0000000080FF2058
         6        220 data block         scur       0000000080FF7490
         6        221 data block         scur       0000000080FCBE08
         6        222 data block         scur       0000000080FDCA00
         6        223 data block         cr         00
         6        223 data block         cr         00

7 rows selected.
cfdemo@govinddb2>

There is no additional buffer copy on instance  2 which is correct.  Because, we have not made any changes on instance 2. 

Let us update the row on instance 2 for the same block.

cfdemo@govinddb2> update emp set ename='B' where empno=4;

1 row updated.

cfdemo@govinddb2>@vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FE1460
         6        219 data block         scur       0000000080FF2058
         6        220 data block         scur       0000000080FF7490
         6        221 data block         scur       0000000080FCBE08
         6        222 data block         scur       0000000080FDCA00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         xcur       0000000080FDAC78

9 rows selected.
cfdemo@govinddb2>

Now we have one more CR copy and one XCUR copy added on instance 2. Why?  Because, we modified the same block in instance 2. Cache fusion generate the CR copy and sent over to instance 2. After that, it takes one more copy as XCUR and acquired exclusive lock.  The new CR copy is for other readers. The XCUR copy is latest and greatest image copy for the block#223.

Let us swing back to instance 1 and check the buffer status.

cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7ED60
         6        219 data block         scur       0000000080F93698
         6        220 data block         scur       0000000080F9B690
         6        221 data block         scur       0000000080F794C8
         6        222 data block         scur       0000000080F82F00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         pi         0000000080F845F8

10 rows selected.
cfdemo@govinddb1>

Look at instance 1 now.  Previously it was 3 CR + 1 XCUR. Now we see 4 CR + 1 PI copy. Why?  Instance 2 modified the block. so, Cache Fusion creates new CR copy in instance 1 and sent over to instance 2 and covert XCUR copy to PI image(PI image will be used for instance recovery).  The PI image will be discarded once the data is written to disk.

Also see how the XCUR mode is changing back and forth on both the instance. Also block 223 has 4 CR copy  in instance 1 and 3 CR copy + XCUR copy in instance 2.  so.. one  block can be duplicated multiple CR copies and always one XCUR copy across the instance.  Every instance can go up to 6 CR copy for the same block and one XCUR copy.  Hence every block can have up to 7 copies for each instance.

 PI copy is not used for consistent read or current image. PI is copy of the block before the block the written to disk. It is specifically used for instance recovery. Oracle use the PI copy to reconstruct the block during the instance recovery.  It does not have to go through redo log when it has PI copy. PI copy will be discarded once the data is written to disk. 

Let us run the checkpoint on instance 1 and see if PI image is discarded or not...

cfdemo@govinddb1> alter system checkpoint;

System altered.

cfdemo@govinddb1> @vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080F7ED60
         6        219 data block         scur       0000000080F93698
         6        220 data block         scur       0000000080F9B690
         6        221 data block         scur       0000000080F794C8
         6        222 data block         scur       0000000080F82F00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00

10 rows selected.


cfdemo@govinddb1>
After the checkpoint,  PI copy is discarded.. But see there is one more CR copy. I guess, Oracle converted PI image to CR copy again. Just my guess. :-) 

Let me check the instance 2 and see any changes on buffers.

cfdemo@govinddb2> alter system checkpoint;

System altered.

cfdemo@govinddb2>@vbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     scur       0000000080FE1460
         6        219 data block         scur       0000000080FF2058
         6        220 data block         scur       0000000080FF7490
         6        221 data block         scur       0000000080FCBE08
         6        222 data block         scur       0000000080FDCA00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         cr         00
         6        223 data block         xcur       0000000080FDAC78

9 rows selected.
cfdemo@govinddb2>

No change on the instance 2.

Confusion:  Whenever we make the changes on the same block on both the instance, oracle crates PI copy for instance recovery. Also generate additional CR copies on both instance.

Let us flush the buffer cache on both instance and verify the buffer status.


cfdemo@govinddb1> alter system flush buffer_cache;



System altered.



cfdemo@govinddb1> @xbh

Enter value for data_obj_id: 93446

old   9: where objd = &data_obj_id

new   9: where objd = 93446


     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     free       00
         6        220 data block         free       00
         6        221 data block         free       00
         6        222 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00

9 rows selected.
cfdemo@govinddb2> alter system flush buffer_cache;

System altered.

cfdemo@govinddb2>@xbh
Enter value for data_obj_id: 93446
old   9: where objd = &data_obj_id
new   9: where objd = 93446

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELEMENT_ADD
---------- ---------- ------------------ ---------- ----------------
         6        218 segment header     free       00
         6        219 data block         free       00
         6        220 data block         free       00
         6        221 data block         free       00
         6        222 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00
         6        223 data block         free       00

9 rows selected.
cfdemo@govinddb2>

Now all the buffers are freed up and empty. Hope this post helps to explain how cache fusion know  where to get the block.

Question & Answers :

If i have 100MB database, I can have 100MB buffer cache to completely avoid physical reads?
NO.  As we see in the above post, the buffers are generating multiples copies for single block. It can go up to 7 CR copy for single block and XCUR copy. 

At what mode, buffers can be retrieved?
Two modes. consistent read(CR) mode and Current mode(shared & exclusive)

How many XCUR buffer copy in the cluster?
There should be only one on entire cluster. But we can have one shared current per instance.  Also we can go up to 6 CR copy per instance.

What is the difference between buffer lock and row lock?
These are two different things.  Oracle acquire row lock the moment the record is changed. The  row lock will be released the moment we commit or rollback the changes. The buffer lock will not be released  even if we commit or rollback the changes. Buffer lock will stay until we flush the buffer cache or aged out the buffers.

What are background process involved in cache fusion?

Global Cache service : It provides the buffer from one instance to another instance. But it does not know who has what type of buffer lock. 

Global Enqueue service : It holds the information on the locks on the buffers. The lock info is available in V$LOCK_ELEMENT & V$BH.LOCK_ELEMENT.

Global Resource  Directory : It list all the master instance of all the buffers, Below query use to find out the master instance of each buffer.

What is Master instance in Cache Fusion?
Every buffer has master instance. It is also called resource master.  When instance wants lock on the specific buffer, it has to reach master instance. Master instance will grant or revoke the privileges to acquire the lock. 

How to find  the  Master instance(resource master) for specific object?
There may be several ways we can find this info. However, here is one way we can find the resource master.

93446 is EMP table in CFDEMO schema in my database.

sys@govinddb1> select b.dbablk,r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj = &data_object_id
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp  2    3    4    5
  6  /
Enter value for data_object_id: 93446
old   3: where b.obj = &data_object_id
new   3: where b.obj = 93446

    DBABLK MASTER_NODE
---------- -----------
       222           1
       219           1
       221           1
       218           1
       220           1
sys@govinddb1>

Any helpful Video's  for Cache fusion?
Please refer this Video.
Also few more slies  slde1slide2slide3slide4slide5slide6