Friday, February 20, 2009
Character and Byte Semantics in Oracle
There are three different ways we can declare the CHAR/VARCHAR2 data types.
Let us take VARCHAR2(10) as an example here.
1. VARCHAR2(10)
2. VARCHAR2(10 BYTE)
3. VARCHAR2(10 CHAR)
What is the difference between VARCHAR2(10), VARCHAR2(10 BYTE), and VARCHAR2(10 CHAR)?
VARCHAR2(10 BYTE)
When we declare the data type as VARCHAR2(10 BYTE), Oracle stores only 10 bytes of data, regardless of how many characters this represents. This is fine when the database uses a single-byte character set, since the total number of characters equals the total number of bytes; in that case, Oracle stores 10 characters. But when the database handles a multilingual application, Oracle stores multi-byte characters. With a two-byte character set, Oracle can then store only 5 characters.
VARCHAR2(10 CHAR)
This allows the specified number of characters to be stored in the column, regardless of the number of bytes this equates to. Oracle can store 10 characters whether the character set is single-byte or multi-byte.
VARCHAR2(10)
Oracle stores 10 characters or 10 bytes depending on the value of the NLS_LENGTH_SEMANTICS parameter. If it is set to BYTE, we can store 10 bytes (in a two-byte character set, that is only 5 characters). The default value is BYTE. When this parameter is CHAR, we can store 10 characters whether the character set is single-byte or multi-byte. When we move the database to a multi-byte character set, we can change NLS_LENGTH_SEMANTICS to CHAR to resolve the storage issue. Existing columns are not affected when we change the value of this parameter.
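To make the difference concrete, here is a minimal sketch. It assumes a hypothetical database using the AL32UTF8 multibyte character set, where a character such as 'é' occupies two bytes; the table and column names are illustrative only.
-- Hypothetical table to contrast the two semantics.
CREATE TABLE semantics_demo (
col_byte VARCHAR2(10 BYTE),
col_char VARCHAR2(10 CHAR));
-- Ten two-byte characters = 20 bytes.
-- Expected to fail on col_byte with an error such as ORA-12899
-- (value too large for column) on 10g, but to succeed on col_char,
-- which allows 10 characters regardless of their byte size.
INSERT INTO semantics_demo (col_byte) VALUES ('éééééééééé');
INSERT INTO semantics_demo (col_char) VALUES ('éééééééééé');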
The default character semantics of the database or session can be altered using the NLS_LENGTH_SEMANTICS parameter like:
SQL> alter system set nls_length_semantics=char;
System altered.
SQL> alter system set nls_length_semantics=byte;
System altered.
SQL> alter session set nls_length_semantics=char;
Session altered.
SQL> alter session set nls_length_semantics=byte;
Session altered.
Note: NLS_LENGTH_SEMANTICS does not apply to tables in SYS and SYSTEM. The data dictionary always uses byte semantics.
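If we want to verify which semantics a given column uses, the CHAR_USED column of USER_TAB_COLUMNS reports B for byte and C for character semantics. A quick check against the hypothetical table above:
-- CHAR_USED is B for byte semantics, C for character semantics.
SELECT column_name, data_length, char_length, char_used
FROM user_tab_columns
WHERE table_name = 'SEMANTICS_DEMO';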
Thursday, December 18, 2008
Bulk Binds
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
SQL> CREATE TABLE emp(
2 empno NUMBER(10),
3 empname VARCHAR2(50));
Table created.
SQL>
SQL> ALTER TABLE emp ADD (
2 CONSTRAINT emppk PRIMARY KEY(empno));
Table altered.
SQL>
The total time taken to insert 50,000 records through the FOR loop below is .071 minutes, because 50,000 context switches happen in the FOR loop.
SQL> set serveroutput on
SQL> DECLARE
2 TYPE emp_tab IS TABLE OF EMP%ROWTYPE;
3 t_tab emp_tab := emp_tab();
4 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
5 BEGIN
6 FOR i IN 1 .. 50000 LOOP
7 t_tab.extend;
8 t_tab(t_tab.last).empno := i;
9 t_tab(t_tab.last).empname := 'name:' || To_Char(i);
10 END LOOP;
12 FOR i IN t_tab.first .. t_tab.last LOOP
13 INSERT INTO emp (empno, empname)
14 VALUES (t_tab(i).empno, t_tab(i).empname);
15 END LOOP;
16 COMMIT;
17 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
18 END;
19 /
Total time ... .071
PL/SQL procedure successfully completed.
SQL>
SQL> TRUNCATE TABLE emp;
Table truncated.
SQL>
SQL> DECLARE
2 TYPE emp_tab IS TABLE OF emp%ROWTYPE;
3 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
4 t_tab emp_tab := emp_tab();
5 BEGIN
6 FOR i IN 1 .. 50000 LOOP
7 t_tab.extend;
9 t_tab(t_tab.last).empno := i;
10 t_tab(t_tab.last).empname := 'name:' || To_Char(i);
11 END LOOP;
12
13 FORALL i IN t_tab.first .. t_tab.last
14 INSERT INTO emp VALUES t_tab(i);
15 COMMIT;
16 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
17 END;
18 /
Total time ... .015
PL/SQL procedure successfully completed.
SQL>
Bulk binds can also improve performance when loading collections from queries. The BULK COLLECT INTO clause collects the data into collections. In the example below, it took 0.021 minutes to fetch 50,000 records without bulk binds.
SQL> DECLARE
2 TYPE emp_tab IS TABLE OF emp%ROWTYPE;
3 t_tab emp_tab := emp_tab();
4 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
5 CURSOR c_data IS
6 SELECT *
7 FROM emp;
8 BEGIN
9 FOR cur_rec IN c_data LOOP
10 t_tab.extend;
11 t_tab(t_tab.last).empno := cur_rec.empno;
12 t_tab(t_tab.last).empname := cur_rec.empname;
13 END LOOP;
14 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
15 END;
16 /
Total time ... .021
PL/SQL procedure successfully completed.
In the case below, it took 0.004 minutes to fetch the data into the collection. Compared to the case above, it is faster, since it uses BULK COLLECT to reduce the context switches.
SQL>
SQL> DECLARE
2 TYPE emp_tab IS TABLE OF emp%ROWTYPE;
3 t_tab emp_tab := emp_tab();
4 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
5 BEGIN
6 SELECT empno,empname
7 BULK COLLECT INTO t_tab
8 FROM emp;
9 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
10 END;
11 /
Total time ... .004
PL/SQL procedure successfully completed.
SQL>
Oracle9i Release 2 allows us to use the ROW keyword to update a record. However, in this version we cannot reference individual fields of a record (such as t_tab(i).empno in the WHERE clause below) inside a FORALL statement. So in the case below we cannot use a bulk bind to improve the performance; this is a restriction of Oracle9i R2.
SQL> DECLARE
2 TYPE emp_tab IS TABLE OF emp%ROWTYPE;
3 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
4 t_tab emp_tab := emp_tab();
5 BEGIN
6 FOR i IN 1 .. 10000 LOOP
7 t_tab.extend;
9 t_tab(t_tab.last).empno := i;
10 t_tab(t_tab.last).empname := 'name' || To_Char(i);
11 END LOOP;
13 FOR i IN t_tab.first .. t_tab.last LOOP
14 UPDATE emp
15 SET ROW = t_tab(i)
16 WHERE empno = t_tab(i).empno;
17 END LOOP;
18 COMMIT;
19 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
20 END;
21 /
Total time ... .012
PL/SQL procedure successfully completed.
SQL>
In the case below, we can use a bulk bind, because the WHERE clause references a separate collection of keys instead of a field of the record. The time taken to update the 10,000 records is 0.007 minutes. Compared with the case above, the performance is well improved.
SQL> DECLARE
2 TYPE empno_tab IS TABLE OF emp.empno%TYPE;
3 TYPE emp_tab IS TABLE OF emp%ROWTYPE;
4 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
5 t_empno empno_tab := empno_tab();
6 t_tab emp_tab := emp_tab();
7 BEGIN
8 FOR i IN 1 .. 10000 LOOP
9 t_empno.extend;
10 t_tab.extend;
11 t_empno(t_empno.last) := i;
12 t_tab(t_tab.last).empno := i;
13 t_tab(t_tab.last).empname := 'name ' || To_Char(i);
14 END LOOP;
16 FORALL i IN t_tab.first .. t_tab.last
17 UPDATE emp
18 SET ROW = t_tab(i)
19 WHERE empno = t_empno(i);
21 COMMIT;
22 DBMS_OUTPUT.PUT_LINE('Total time ... ' || ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60, 3));
23 END;
24 /
Total time ... .007
PL/SQL procedure successfully completed.
SQL>
When to use bulk binds: There is no universal rule to dictate when to use bulk binds. If the PL/SQL code reads only a few hundred records, you will not see a significant performance improvement from bulk binds. When you read a huge number of records and have multiple inserts/updates, you can consider bulk binds. If you have the luxury of time, you can test your code both with and without bulk binds and then decide whether to use this feature.
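One refinement worth sketching here: when the source table is very large, BULK COLLECT can be combined with a LIMIT clause so the collection never grows beyond a fixed batch size. This is only an illustrative sketch against the same emp table; the batch size of 1,000 is an assumption to tune for your own workload.
DECLARE
TYPE emp_tab IS TABLE OF emp%ROWTYPE;
t_tab emp_tab;
CURSOR c_data IS SELECT * FROM emp;
BEGIN
OPEN c_data;
LOOP
-- Fetch at most 1,000 rows per round trip to cap memory usage.
FETCH c_data BULK COLLECT INTO t_tab LIMIT 1000;
EXIT WHEN t_tab.COUNT = 0;
FOR i IN 1 .. t_tab.COUNT LOOP
NULL; -- process each batched row here
END LOOP;
END LOOP;
CLOSE c_data;
END;
/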
Thursday, November 6, 2008
Convert Non-Partitioned to Partitioned Table
Before converting a regular table to a partitioned table, we need to decide two key factors:
1. The partition key
2. What type of partitioning we need to use
Once we have decided the above, there are two options to convert the regular table to a partitioned table.
Option 1
This option requires table downtime, and it might fit if the business allows application downtime. Here are the steps to follow.
1. Stop using the table; make sure no one is changing the data in the table.
2. Create the partitioned table with the same structure as the regular table.
3. Copy the data from the regular table to the partitioned table.
4. Create the constraints, keys, and indexes on the partitioned table.
5. Drop the original table.
6. Analyze the partitioned table.
Option 2
This is the ideal option to convert a regular table to a partitioned table. It does not require downtime, and everyone can use the table during the conversion. This option was introduced in Oracle9i and enhanced in 10g. We use online redefinition (DBMS_REDEFINITION) to convert the regular table to a partitioned table.
Some restrictions on using DBMS_REDEFINITION:
1. The table cannot belong to the SYS or SYSTEM schema.
2. The table to be redefined online should not use user-defined data types.
3. It should not be a clustered table.
4. It should not have materialized view logs or materialized views defined on it.
5. You cannot move a table from one schema to another using the online redefinition feature.
Here are the steps to convert to a partitioned table.
Step 1
Check to make sure that the table can use the online redefinition feature.
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('SCOTT','EMP');
PL/SQL procedure successfully completed.
Step 2
Create the interim partitioned table with the same structure as the original table.
SQL> Create table EMP_part
2 (EMPNO NUMBER(4),
3 ENAME VARCHAR2(10),
4 JOB VARCHAR2(9),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7,2),
8 COMM NUMBER(7,2),
9 DEPTNO NUMBER(2))
10 partition by range (SAL)
11 (Partition p1 values less than (1000),
12 Partition p2 values less than (2000),
13 Partition p3 values less than (3000),
14 Partition p4 values less than (4000),
15 Partition max values less than (maxvalue))
16 tablespace users;
Table created.
Step 3
Start the online redefinition process.
SQL> EXEC DBMS_REDEFINITION.START_REDEF_TABLE('SCOTT','EMP','EMP_PART');
PL/SQL procedure successfully completed.
Step 4
Here is where an Oracle 10g feature comes into play: we do not need to copy the dependent objects to the emp_part table manually. Dependent objects include grants, synonyms, triggers, and so on.
SQL> VARIABLE NUM_ERRORS NUMBER;
SQL> EXEC DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('SCOTT','EMP','EMP_PART',1,TRUE,TRUE,TRUE,FALSE,:NUM_ERRORS,FALSE);
PL/SQL procedure successfully completed.
SQL> PRINT NUM_ERRORS
NUM_ERRORS
----------
0
Step 5
Resync the table
SQL> EXEC DBMS_REDEFINITION.SYNC_INTERIM_TABLE('SCOTT','EMP','EMP_PART');
PL/SQL procedure successfully completed.
Step 6
Complete the online redefinition
SQL> EXEC DBMS_REDEFINITION.FINISH_REDEF_TABLE('SCOTT','EMP','EMP_PART');
PL/SQL procedure successfully completed.
Step 7
Make sure the emp table has all the partitions we created in the emp_part table.
SQL> Select partition_name, high_value from user_tab_partitions where table_name='EMP';
PARTITION_NAME HIGH_VALUE
------------------------------ ---------------------------------------------------------------------
MAX MAXVALUE
P1 1000
P2 2000
P3 3000
P4 4000
SQL> select partition_name, high_value from user_tab_partitions where table_name='EMP_PART';
no rows selected
Step 8
Make sure all the dependent objects were copied.
SQL> SELECT TRIGGER_NAME FROM USER_TRIGGERS
2 WHERE TABLE_NAME='EMP';
TRIGGER_NAME
------------------------------
EMPTRIG
SQL> select constraint_name from user_constraints
2 where table_name='EMP';
CONSTRAINT_NAME
------------------------------
PK_EMP
FK_DEPTNO
SQL>
Note: The only caveat I see here is that if we have any index on the original table, it becomes a global index on the partitioned table. If we need a local index, we have to drop and recreate the index.
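For illustration, here is a hypothetical drop-and-recreate of such an index as a local one; emp_ename_idx is an assumed index name, not one created in the steps above.
-- Recreate a global index as a LOCAL partitioned index after redefinition.
DROP INDEX emp_ename_idx;
CREATE INDEX emp_ename_idx ON emp (ename) LOCAL;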
Wednesday, November 5, 2008
When to Use Index Organized table?
In heap tables, Oracle stores the data in the data segment along with a ROWID. When we create an index, Oracle stores the index key value and the ROWID in the index segment, so the key value and ROWID are stored in both the index and data segments. Whenever we search for a record, Oracle first scans the index segment to find the ROWID, then matches the ROWID in the data segment to get the particular record. The drawbacks are that the ROWID and index key value are stored in two places, which is a space consumption issue, and that Oracle has to scan both the data and index segments to obtain the record.
To overcome this problem, Oracle introduced IOTs in version 8. In an IOT, the data is stored in the primary key index itself, with the rows placed in key-sequenced order, eliminating the duplication of data between index and table. When we search for a record, Oracle scans only the index segment.
An IOT does not use physical ROWIDs. It stores the data in a B-tree index, kept sorted on the primary key as we insert and update records.
Example to create IOT
SQL> CREATE TABLE iot_emp
2 (empno number PRIMARY KEY,
3 empname varchar2(10),
4 salary number)
5 ORGANIZATION INDEX
6 TABLESPACE RPTS_STAGE_DATA
7 PCTTHRESHOLD 20
8 INCLUDING empname
9 OVERFLOW TABLESPACE ods_stage_data
10 /
Table created.
SQL>
ORGANIZATION INDEX denotes that the table is an index-organized table.
PCTTHRESHOLD specifies the percentage of space reserved in the index block for an index-organized table row. Any portion of the row that exceeds the specified threshold is stored in the overflow segment. PCTTHRESHOLD must be a value from 1 to 50, and it must be large enough to hold the primary key.
OVERFLOW - data rows exceeding the specified threshold are placed in the overflow segment.
INCLUDING - specifies the last column to keep in the index block; the nonkey columns after it are stored in the overflow data segment.
When to use IOT?
Large tables where exact fetch by primary key is time-critical and the volume of DML operations is low. In most cases, for exact fetch via primary key, IOT organization is much faster than heap organization. This advantage varies between 15% and 400%, depending upon the row length, the number of columns, and the use of the PCTTHRESHOLD option.
A heavy volume of DML operations on an IOT usually fragments the table, requiring frequent reorganization. An index-organized table can be reorganized without invalidating its secondary indexes, and the operation can be performed online, reducing or even eliminating downtime.
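For example, a minimal sketch of such an online reorganization, using the iot_emp table created above:
-- Rebuild the IOT online; its secondary indexes remain valid during the move.
ALTER TABLE iot_emp MOVE ONLINE;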
Index-organized tables support parallel features for loading, index creation, and scans required for handling large volumes of data. Partitioned index-organized tables are also supported, so that each partition can be loaded concurrently. Data warehousing applications using star schemas can also gain performance and scalability by implementing fact tables as index-organized tables for efficient execution of star queries. All these features make index-organized tables suitable for handling large scale data.
Advantages of IOT
1. Scanning both the data and index segments is eliminated.
2. The storage requirement is reduced, as data is stored only in the primary key index segment.
Thursday, October 23, 2008
Timestamp data type in Oracle9i
1. TIMESTAMP
2. TIMESTAMP WITH TIME ZONE
3. TIMESTAMP WITH LOCAL TIME ZONE
4. INTERVAL DAY TO SECOND
5. INTERVAL YEAR TO MONTH
We can also change the session time zone, as shown below. We can use the V$TIMEZONE_NAMES view to see all the time zone information.
ALTER SESSION SET TIME_ZONE = 'PST'
The TIMESTAMP datatype is very much like the DATE datatype: it can store both date and time values. Unlike the DATE datatype, though, you can also specify the precision of the fractional seconds in the TIMESTAMP datatype. The default is 6 digits, but we can specify a value in the range 0 through 9.
SQL> create table calltrack(
2 callnum integer,
3 calltime TIMESTAMP(4));
Table created.
We can change NLS_TIMESTAMP_FORMAT as required:
SQL> alter session set nls_timestamp_format = 'MMDDYYHH24MISSFF';
Session altered.
Now insert one record and select the timestamp field.
SQL> insert into calltrack values(1,sysdate);
1 row created.
SQL> select calltime from calltrack;
CALLTIME
---------------------------------------------------------------------------
1024080953590000
SQL>
The TIMESTAMP WITH TIME ZONE datatype stores the time zone displacement, which needs additional bytes of storage. We can also change NLS_TIMESTAMP_TZ_FORMAT at the session level.
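For instance, a hypothetical session-level format change (any valid timestamp format mask could be used here):
ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT = 'DD-MON-RR HH24:MI:SSXFF TZH:TZM';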
SQL> create table calltime(
2 callnum integer,
3 calltime timestamp(4) with time zone);
Table created.
SQL> insert into calltime values(1,sysdate);
1 row created.
SQL> select calltime from calltime;
CALLTIME
-------------------------------------------------------
04-APR-09 11.10.40.0000 PM -04:00
SQL> select vsize(calltime) from calltime;
VSIZE(CALLTIME)
---------------
13
TIMESTAMP WITH LOCAL TIME ZONE also includes time zone information but, unlike the TIMESTAMP WITH TIME ZONE datatype, it does not store the time zone displacement in additional bytes of storage. Instead, it stores the time values normalized to the database time zone. When a user retrieves the data, the database displays it in the local time zone of the user's session.
Let us create a table with this data type.
SQL> create table calltrack(
2 callnum integer,
3 calltime timestamp(4) with local time zone);
Table created.
SQL> insert into calltrack values(1,sysdate);
1 row created.
SQL> select calltime from calltrack;
CALLTIME
---------------------------------------------------------------------------
24-OCT-08 10.58.34.0000 AM
SQL>
Now the date is displayed in Eastern US time. Let us change the time zone to Pacific time at the session level.
SQL> ALTER SESSION SET TIME_ZONE = 'PST';
Session altered.
SQL> select calltime from calltrack;
CALLTIME
---------------------------------------------------------------------------
24-OCT-08 07.58.34.0000 AM
SQL>
INTERVAL DAY TO SECOND can store a time interval value in terms of days, hours, minutes and seconds. We can specify the precision for the number of days in a range from 0 to 9, the default being 2. We can also specify the precision for the fractional seconds in a range from 0 to 9, the default being 6.
SQL> CREATE TABLE promotions (
2 promotion_id INTEGER,
3 name VARCHAR2(30),
4 duration INTERVAL DAY(3) TO SECOND (4))
5 /
Table created.
SQL>
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (1, '10% off Z Files', INTERVAL '3' DAY);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (2, '20% off Pop 3', INTERVAL '2' HOUR);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (3, '30% off Modern Science', INTERVAL '25' MINUTE);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (4, '20% off Tank War', INTERVAL '45' SECOND);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (5, '10% off Chemistry', INTERVAL '3 2:25' DAY TO MINUTE);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (6, '20% off Creative Yell',
3 INTERVAL '3 2:25:45' DAY TO SECOND);
1 row created.
SQL>
SQL> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (7, '15% off My Front Line',
3 INTERVAL '123 2:25:45.12' DAY(3) TO SECOND(2));
1 row created.
scott@orcl> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (7, '15% off My Front Line',
3 INTERVAL '1253 2:25:45.12' DAY(3) TO SECOND(2));
INTERVAL '1253 2:25:45.12' DAY(3) TO SECOND(2))
*
ERROR at line 3:
ORA-01873: the leading precision of the interval is too small
scott@orcl> INSERT INTO promotions (promotion_id, name, duration)
2 VALUES (7, '15% off My Front Line',
3 INTERVAL '153 2:25:415.12' DAY(3) TO SECOND(2))
4 select duration from promotions;
INTERVAL '153 2:25:415.12' DAY(3) TO SECOND(2))
*
ERROR at line 3:
ORA-01852: seconds must be between 0 and 59
scott@orcl>
SQL> select duration from promotions;
DURATION
-------------------------------------------------------------------------
+003 00:00:00.0000
+000 02:00:00.0000
+000 00:25:00.0000
+000 00:00:45.0000
+003 02:25:00.0000
+003 02:25:45.0000
+123 02:25:45.1200
7 rows selected.
SQL>
INTERVAL YEAR TO MONTH can store a time interval value in terms of years and months. We can specify the precision for the number of years in a range from 0 to 9, the default being 2.
SQL> CREATE TABLE coupons (
2 coupon_id INTEGER,
3 name VARCHAR2(30),
4 duration INTERVAL YEAR(3) TO MONTH);
Table created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (1, '$1 off Z Files', INTERVAL '1' YEAR);
1 row created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (2, '$2 off Pop 3', INTERVAL '11' MONTH);
1 row created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (3, '$3 off Modern Science', INTERVAL '14' MONTH);
1 row created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (4, '$2 off Tank War', INTERVAL '1-3' YEAR TO MONTH);
1 row created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (5, '$1 off Chemistry', INTERVAL '0-5' YEAR TO MONTH);
1 row created.
SQL>
SQL> INSERT INTO coupons (coupon_id, name, duration)
2 VALUES (6, '$2 off Creative Yell', INTERVAL '123' YEAR(3));
1 row created.
SQL>
SQL> SELECT duration FROM coupons;
DURATION
---------------------------------------------------------------------------
+001-00
+000-11
+001-02
+001-03
+000-05
+123-00
6 rows selected.
SQL>
Merge statement in Oracle
Prior to Oracle9i, "upsert" operations (update a row if it exists, insert it if it does not) were expressed either as a sequence of DML statements (INSERT/UPDATE) or as PL/SQL loops deciding, for each row, whether to insert or update the data.
Oracle9i overcomes this problem: we can write both the INSERT and the UPDATE in one statement called MERGE. It reduces multiple scans and improves performance.
Here is a sample MERGE statement in Oracle9i. The source_table was created with the same structure as ALL_OBJECTS, without any data.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHEN NOT MATCHED THEN
7 INSERT (object_id, status)
8 VALUES (b.object_id, b.status);
22278 rows merged.
SQL>
In Oracle 10g, the MERGE statement is enhanced. The new enhancements are:
1. Optional INSERT or UPDATE clause
2. Conditional operations for INSERT and UPDATE
3. An optional DELETE clause within MERGE
Optional INSERT or UPDATE clause
The MATCHED and NOT MATCHED clauses are now optional, making all of the following statements valid in 10g. As the 9i output below shows, they fail in that release.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN NOT MATCHED THEN
5 INSERT (object_id, status)
6 VALUES (b.object_id, b.status);
VALUES (b.object_id, b.status)
*
ERROR at line 6:
ORA-00905: missing keyword
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status;
UPDATE SET a.status = b.status
*
ERROR at line 5:
ORA-00905: missing keyword
SQL>
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN NOT MATCHED THEN
5 INSERT (object_id, status)
6 VALUES (b.object_id, b.status);
28603 rows merged.
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status;
28603 rows merged.
SQL>
Conditional operations for INSERT and UPDATE
Conditional inserts and updates are now possible by using a WHERE clause on these statements. Again, the statements fail in 9i and work in 10g.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 – Production
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID'
7 WHEN NOT MATCHED THEN
8 INSERT (object_id, status)
9 VALUES (b.object_id, b.status)
10 WHERE b.status != 'VALID';
WHERE b.status != 'VALID'
*
ERROR at line 6:
ORA-00905: missing keyword
SQL>
SQL>
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN NOT MATCHED THEN
5 INSERT (object_id, status)
6 VALUES (b.object_id, b.status)
7 WHERE b.status != 'VALID';
WHERE b.status != 'VALID'
*
ERROR at line 7:
ORA-00905: missing keyword
SQL>
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID';
WHERE b.status != 'VALID'
*
ERROR at line 6:
ORA-00905: missing keyword
SQL>
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID'
7 WHEN NOT MATCHED THEN
8 INSERT (object_id, status)
9 VALUES (b.object_id, b.status)
10 WHERE b.status != 'VALID';
33 rows merged.
SQL>
SQL>
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN NOT MATCHED THEN
5 INSERT (object_id, status)
6 VALUES (b.object_id, b.status)
7 WHERE b.status != 'VALID';
0 rows merged.
SQL>
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID';
33 rows merged.
SQL>
Delete statement on Merge
An optional DELETE WHERE clause can be used to clean up after a merge operation. Only those rows which match both the ON clause and the DELETE WHERE clause are deleted.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 – Production
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID'
7 DELETE WHERE (b.status = 'VALID');
WHERE b.status != 'VALID'
*
ERROR at line 6:
ORA-00905: missing keyword
SQL>
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> MERGE INTO source_table a
2 USING all_objects b
3 ON (a.object_id = b.object_id)
4 WHEN MATCHED THEN
5 UPDATE SET a.status = b.status
6 WHERE b.status != 'VALID'
7 DELETE WHERE (b.status = 'VALID');
33 rows merged.
SQL>
Restriction on MERGE: we cannot update a column that is referenced in the ON condition clause.
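A minimal hypothetical illustration of this restriction, reusing the tables from the examples above; the statement is expected to fail with ORA-38104 (columns referenced in the ON clause cannot be updated).
-- a.object_id appears in the ON clause, so it cannot be assigned in UPDATE SET.
MERGE INTO source_table a
USING all_objects b
ON (a.object_id = b.object_id)
WHEN MATCHED THEN
UPDATE SET a.object_id = b.object_id; -- ORA-38104 expected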
Oracle External Tables
Oracle uses SQL*Loader functionality, through the ORACLE_LOADER access driver, to move data from a flat file into the database; it uses a Data Pump access driver to move data out of the database into a file in an Oracle-proprietary format, and back into the database from files of that format.
This article was written using Oracle 10g on Windows.
Steps to create an External table
Step 1.
Create the directory on the server as well as in the database.
C:\>mkdir ext
C:\>cd ext
C:\ext>
SQL> connect sys as sysdba
Enter password: ********
Connected.
SQL> create directory extdir
2 as 'c:/ext';
Directory created.
SQL> grant read,write on directory extdir to scott;
Grant succeeded.
SQL>
Step 2
Place the text file in the c:/ext directory and create the external table as below.
Here is the text file content.
7369,SMITH,CLERK,20
7499,ALLEN,SALESMAN,30
7521,WARD,SALESMAN,30
7566,JONES,MANAGER,20
7654,MARTIN,SALESMAN,30
7698,BLAKE,MANAGER,30
7782,CLARK,MANAGER,10
7788,SCOTT,ANALYST,20
7839,KING,PRESIDENT,10
7844,TURNER,SALESMAN,30
7876,ADAMS,CLERK,20
7900,JAMES,CLERK,30
7902,FORD,ANALYST,20
7934,MILLER,CLERK,10
SQL> show user
USER is "SCOTT"
SQL> create table emptable
2 (empno number,
3 empname varchar2(50),
4 job varchar2(50),
5 deptno number)
6 organization external
7 ( default directory extdir
8 access parameters
9 ( records delimited by newline
10 fields terminated by ',')
11 location ('emp.txt'));
Table created.
SQL>
SQL> select empno,substr(empno,1,10),substr(job,1,10),deptno
2 from emptable;
EMPNO SUBSTR(EMP SUBSTR(JOB DEPTNO
---------- ---------- ---------- ----------
7369 7369 CLERK 20
7499 7499 SALESMAN 30
7521 7521 SALESMAN 30
7566 7566 MANAGER 20
7654 7654 SALESMAN 30
7698 7698 MANAGER 30
7782 7782 MANAGER 10
7788 7788 ANALYST 20
7839 7839 PRESIDENT 10
7844 7844 SALESMAN 30
7876 7876 CLERK 20
7900 7900 CLERK 30
7902 7902 ANALYST 20
7934 7934 CLERK 10
14 rows selected.
SQL>
We can also add a log file, bad file, and discard file for the external table.
SQL> alter table emptable
2 access parameters
3 ( records delimited by newline
4 badfile extdir:'emp.bad'
5 logfile extdir:'emp.log'
6 discardfile extdir:'emp.dsc'
7 fields terminated by ','
8 ) ;
Table altered.
SQL>
Loading data from an external table to a regular table
We can load the data from the external table into an Oracle table as below. We can do the same job through SQL*Loader. So the question is: when do we use SQL*Loader and when do we use an external table for loading data from a flat file into an Oracle table? The external table has good filtering capability while loading the data.
In the example below, we are loading data only for deptno 10 and 20. For this situation, it is worth trying an external table. If we want to load only a specific subset of data from the flat file, an external table is a good option.
SQL> create table empload_db as select * from
2 emptable where deptno in(10,20);
Table created.
SQL> select empno,substr(empno,1,10),substr(job,1,10),deptno
2 from empload_db;
EMPNO SUBSTR(EMP SUBSTR(JOB DEPTNO
---------- ---------- ---------- ----------
7369 7369 CLERK 20
7566 7566 MANAGER 20
7782 7782 MANAGER 10
7788 7788 ANALYST 20
7839 7839 PRESIDENT 10
7876 7876 CLERK 20
7902 7902 ANALYST 20
7934 7934 CLERK 10
8 rows selected.
SQL>
Loading data from a regular table to an external table
This feature was introduced in Oracle 10g. We should use the oracle_datapump driver to load data into an external table.
SQL> create table load_ext
2 organization external
3 ( type oracle_datapump
4 default directory extdir
5 location ('emp.dmp')
6 ) as select * from emptable
7 /
Table created.
SQL>
SQL> select empno,substr(empno,1,10),substr(job,1,10),deptno
2 from load_ext;
EMPNO SUBSTR(EMP SUBSTR(JOB DEPTNO
---------- ---------- ---------- ----------
7369 7369 CLERK 20
7499 7499 SALESMAN 30
7521 7521 SALESMAN 30
7566 7566 MANAGER 20
7654 7654 SALESMAN 30
7698 7698 MANAGER 30
7782 7782 MANAGER 10
7788 7788 ANALYST 20
7839 7839 PRESIDENT 10
7844 7844 SALESMAN 30
7876 7876 CLERK 20
7900 7900 CLERK 30
7902 7902 ANALYST 20
7934 7934 CLERK 10
14 rows selected.
SQL>
You can now move the file you just created, emp.dmp, to another system and create an external table to read the data:
SQL> create table import_load_ext
2 (empno number,
3 empname varchar2(50),
4 job varchar2(50),
5 deptno number)
6 organization external
7 ( type oracle_datapump
8 default directory extdir
9 location ('emp.dmp')
10 );
Table created.
SQL> set linesize 100
SQL> select empno,substr(empno,1,10),substr(job,1,10),deptno
2 from import_load_ext;
EMPNO SUBSTR(EMP SUBSTR(JOB DEPTNO
---------- ---------- ---------- ----------
7369 7369 CLERK 20
7499 7499 SALESMAN 30
7521 7521 SALESMAN 30
7566 7566 MANAGER 20
7654 7654 SALESMAN 30
7698 7698 MANAGER 30
7782 7782 MANAGER 10
7788 7788 ANALYST 20
7839 7839 PRESIDENT 10
7844 7844 SALESMAN 30
7876 7876 CLERK 20
7900 7900 CLERK 30
7902 7902 ANALYST 20
7934 7934 CLERK 10
14 rows selected.
Restrictions on external tables:
1. An external table does not allow INSERT, UPDATE, or DELETE statements. We get an "ORA-30657: operation not supported on external organized table" error when we apply any DML statement other than SELECT (a quick demonstration follows this list).
2. An external table does not describe how the data is stored in the external source.
3. An external table cannot load data into a LONG column.
4. When column names or table names are specified in the external table access parameters, certain values are considered reserved words by the access parameter parser. If a reserved word is used as an identifier, it must be enclosed in double quotation marks.
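A quick hypothetical demonstration of the first restriction, using the emptable created earlier; the statement is expected to fail with ORA-30657.
-- Any DML other than SELECT fails on an external table.
DELETE FROM emptable WHERE deptno = 10;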
If you want to read more on external tables, please read here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6611962171229