govind

Thursday, July 31, 2008

How do we reclaim the wasted space in a segment?

This article is tested in oracle 9.2.0.6.0. Before reclaiming the wasted space, First, we need to find out exactly how much space is wasted in that segment that can be reclaimed. To find out the wasted space in the segment, we can use SPACE_USAGE procedure in DBMS_SPACE pacakge. To use this procedure, the tablespace should be ASSM enabled.

I have used pipe symbol in the PLSQL procedure. But it is not visible in the blog. So i used ### symbol where ever, there is the pipe sign. So please replace the ### with pipe symbol in the code when you want to run this script in your test database.

DECLARE
v_unformatted_blocks number;
v_unformatted_bytes number;
v_fs1_blocks number;
v_fs1_bytes number;
v_fs2_blocks number;
v_fs2_bytes number;
v_fs3_blocks number;
v_fs3_bytes number;
v_fs4_blocks number;
v_fs4_bytes number;
v_full_blocks number;
v_full_bytes number;
BEGIN
dbms_space.space_usage ('INVENTADMIN',
'COLLECTION',
'TABLE',
v_unformatted_blocks,
v_unformatted_bytes,
v_fs1_blocks,
v_fs1_bytes,
v_fs2_blocks,
v_fs2_bytes,
v_fs3_blocks,
v_fs3_bytes,
v_fs4_blocks,
v_fs4_bytes,
v_full_blocks,
v_full_bytes);
dbms_output.put_line('Unformatted Blocks = '###v_unformatted_blocks);
dbms_output.put_line('FS1 Blocks = '###v_fs1_blocks);
dbms_output.put_line('FS2 Blocks = '###v_fs2_blocks);
dbms_output.put_line('FS3 Blocks = '###v_fs3_blocks);
dbms_output.put_line('FS4 Blocks = '###v_fs4_blocks);
dbms_output.put_line('Full Blocks = '###v_full_blocks);
end;
/

Here is the output for the above code.

Unformatted Blocks = 0
FS1 Blocks = 32
FS2 Blocks = 60
FS3 Blocks = 63
FS4 Blocks = 19671
Full Blocks = 5

PL/SQL procedure successfully completed.
SQL>
As per the above output
32 blocks has 0-25% free.
60 blocks has 25-50% free.
63 blocks has 50-75% free.
19671 blocks has 50-75% free.
5 blocks has 100% free

Now we have lot of blocks 50-75% free. so we can reclaim the space. We can reset the HWM to reorganize the space in the segment. Resetting the HWM is discussed in another topic in the same blog. Please click to read how to reset the HWM.

Saturday, July 26, 2008

Sending Emails through Oracle procedure

This procedure was tested in oracle9i. This procedure has ability to send the email TO/CC/BCC.

Before implementing this procedure, have the Email server host name handy. You can find the email server host name from your email software(lotus notes, outlook express etc). You can also check your admin or help desk to find out the email server host name.

There are only four step to implement this email procedure. Just one note, i have used pipe symbol in the plsql procedure. But it is not visible in the blog. So i used ### symbol where ever, there is the pipe sign. So please replace the ### with pipe symbol in the code.

Step 1

Create the table to store the email address.

CREATE TABLE EMAIL_ADDRESS(
EMAIL VARCHAR2(100),
EMAIL_GROUP VARCHAR(20))
/

We can store the emails on different group. In this example, we use DBA, ADMIN, BUSIENSS, MANAGERS. We can add these groups on TO, CC, BCC based on our requirements.

Here is the sample insert statement to store the email address.

insert into email_address values('scott@xyzcomp.com','DBA');
insert into email_address values('robert@xyzcomp.com','DBA');
insert into email_address values('mat@xyzcomp.com','ADMIN');
insert into email_address values('lissa@xyzcomp.com','ADMIN');
insert into email_address values('herman@xyzcomp.com','BUSINESS');
insert into email_address values('martin@xyzcomp.com','BUSINESS');
insert into email_address values('david@xyzcomp.com','MANAGER');
insert into email_address values('nany@xyzcomp.com','MANAGER');

Step2
create the array to store the email address

CREATE OR REPLACE type group_array as table of varchar2(255)
/

Step3

Here is the content of email package. Create this package..

-- The piple symbol is not displaying in the blog. So i placed ### symbol
-- whereever there is a pipe symbol. If we want to test the script in your test database,
-- please replace the ### symbol with pipe symbol.

create or replace PACKAGE EMAIL_PACKAGE IS

function address_email( p_string in varchar2,
p_recipients in group_array ) return varchar2;

procedure send( p_sender_email in varchar2,
p_from in varchar2,
p_to in group_array default group_array(),
p_cc in group_array default group_array(),
p_bcc in group_array default group_array(),
p_subject in varchar2,
p_body in long);

PROCEDURE LISTEMAILS(P_ARRAY_TO_GROUP IN group_array,
P_ARRAY_CC_GROUP IN group_array ,
P_ARRAY_BCC_GROUP IN group_array,
l_data_to OUT group_array,
l_data_cc OUT group_array,
l_data_bcc OUT group_array);

END EMAIL_PACKAGE;
/

create or replace package body EMAIL_PACKAGE as
g_crlf char(2) default chr(13)###chr(10);
g_mail_conn utl_smtp.connection;
g_mailhost varchar2(255) := 'EMAIL_HOST_NAME';

FUNCTION address_email( p_string in varchar2,
p_recipients in group_array ) return varchar2
is
l_recipients long;
begin
for i in 1 .. p_recipients.count
loop
utl_smtp.rcpt(g_mail_conn, p_recipients(i) );
if ( l_recipients is null )
then
l_recipients := p_string ### p_recipients(i) ;
else
l_recipients := l_recipients ### ', ' ### p_recipients(i);
end if;
end loop;
return l_recipients;
end;

procedure send( p_sender_email in varchar2,
p_from in varchar2,
p_to in group_array default group_array(),
p_cc in group_array default group_array(),
p_bcc in group_array default group_array(),
p_subject in varchar2,
p_body in long) IS
l_to_list long;
l_cc_list long;
l_bcc_list long;
l_date varchar2(255) default
to_char( SYSDATE, 'dd Mon yy hh24:mi:ss' );

procedure writeData( p_text in varchar2 )
as
begin
if ( p_text is not null )
then
utl_smtp.write_data( g_mail_conn, p_text ### g_crlf );
end if;
end;

BEGIN
g_mail_conn := utl_smtp.open_connection(g_mailhost, 25);
utl_smtp.helo(g_mail_conn, g_mailhost);
utl_smtp.mail(g_mail_conn, p_sender_email);
l_to_list := address_email( 'To: ', p_to );
l_cc_list := address_email( 'Cc: ', p_cc );
l_bcc_list := address_email( 'Bcc: ', p_bcc );
utl_smtp.open_data(g_mail_conn );

writeData( 'Date: ' ### l_date );
writeData( 'From: ' ### nvl( p_from, p_sender_email ) );
writeData( 'Subject: ' ### nvl( p_subject, '(no subject)' ) );

writeData( l_to_list );
writeData( l_cc_list );

utl_smtp.write_data( g_mail_conn, '' ### g_crlf );
utl_smtp.write_data(g_mail_conn, p_body );
utl_smtp.close_data(g_mail_conn );
utl_smtp.quit(g_mail_conn);
END send;

PROCEDURE LISTEMAILS(P_ARRAY_TO_GROUP IN group_array ,
P_ARRAY_CC_GROUP in group_array ,
P_ARRAY_BCC_GROUP in group_array ,
l_data_to out group_array ,
l_data_cc out group_array ,
l_data_bcc out group_array) as

CURSOR C1 IS SELECT * FROM EMAIL_ADDRESS;

BEGIN

l_data_to := group_array();
l_data_cc := group_array();
l_data_bcc := group_array();

FOR X IN 1..P_ARRAY_TO_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_TO_GROUP(X) THEN
l_data_to.extend ;
l_data_to(l_data_to.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;

FOR X IN 1..P_ARRAY_CC_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_CC_GROUP(X) THEN
l_data_cc.extend ;
l_data_cc(l_data_cc.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;

FOR X IN 1..P_ARRAY_BCC_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_BCC_GROUP(X) THEN
l_data_bcc.extend ;
l_data_bcc(l_data_bcc.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;

END LISTEMAILS;
END EMAIL_PACKAGE;
/

Step4
Here is the sample unnamed plsql code to send the email.

-- The piple symbol is not displaying in the blog. So i placed ### symbol
-- whereever there is a pipe symbol. If we want to test the script in your test database,
-- please replace the ### symbol with pipe symbol.

declare
to_group group_array DEFAULT group_array();
cc_group group_array DEFAULT group_array();
bcc_group group_array DEFAULT group_array();
r_to_group group_array DEFAULT group_array();
r_cc_group group_array DEFAULT group_array();
r_bcc_group group_array DEFAULT group_array();
v_message VARCHAR2(22767) := 'test';
v_crlf VARCHAR2(2) := CHR(13) ### CHR(10);
v_time NUMBER;
BEGIN

to_group.extend;
to_group(1) := 'DBA';

to_group.extend;
to_group(2) := 'BUSINESS';

cc_group.extend;
cc_group(1) := 'MANAGER';

bcc_group.extend;
bcc_group(1) := 'DBA PAGER';

Email_Package.listemails(to_group,cc_group,bcc_group,r_to_group,r_cc_group,r_bcc_group);
Email_Package.send
( p_sender_email => 'john.doe@xyz.com' ,
p_from => 'IT' ,
p_to => r_to_group ,
p_cc => r_cc_group ,
p_bcc => r_bcc_group ,
p_subject => 'Testing messaage' ,
p_body => v_message);
END;
/

Wednesday, July 23, 2008

NOT IN (Vs) NOT EXISTS

“NOT IN” and “NOT EXISTS” clauses are not the same functionally or performance wise and, therefore, should be used appropriately. NOT IN is different then the NOT EXISTS. But both the query return same amount of records, when the sub query we use in the NOT IN does not contain any null values. NOT IN does not have the ability to compare the NULL values.

The below example is tested in oracle9i release 9.2.0.6.0.

Here is the example for NOT IN versus NOT EXISTS.

We create a table employee with 10 employees. 9 employees out of 10, they are reporting to one manager called Jim. The employee Jim is head of the team and he is not reporting to any one.

Here is the table and content.

SQL> CREATE TABLE employee
2 (
3 EMPNO NUMBER(10) NOT NULL PRIMARY KEY,
4 EMPNAME VARCHAR2(20 CHAR),
5 MGR NUMBER(10) NULL
6 )
7 /

Table created.

SQL>
SQL> INSERT INTO EMPLOYEE VALUES (1, 'TAN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (2, 'TOMMY', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (3, 'SCOTT', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (4, 'JOE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (5, 'JIM', NULL);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (6, 'LEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (7, 'TEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (8, 'HACKEN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (9, 'JEESE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (10, 'TURNER', 5);

1 row created.

SQL> COMMIT
2 /

Commit complete.

SQL>

Now the goal is to list all the employees who are not managers...

Use NOT IN

SQL> select count(*) from employee where empno not in ( select mgr from employee );

COUNT(*)
----------
0

SQL>
So above query shows that, all the employees are managers.. It is not correct.

Use NOT EXISTS

SQL> select count(*) from employee T1
2 where not exists (select 1 from employee T2 where t2.mgr = t1.empno)
3 /

COUNT(*)
----------
9

SQL>

Now the above query returns 9 employees and it make sense.. So when we have NULL, we need to make sure, we use extra condition in the subquery where clause as below...

SQL> select count(*) from employee where empno not in
2 (select mgr from employee where mgr is not null)
3 /

COUNT(*)
----------
9
SQL>

Performance between NOT IN & NOT EXISTS

Based on my experience, I would recommend to try both (NOT IN & NOT EXISTS) and see which one gives better performance and take the final decision. In my experience, there are cases, performance is very good when i use NOT EXISTS.

Here is one real time example... In this example, import_yi_b table has 80580 records. sap_sohead table has 288382 records. sap_sohead.sonum is NOT NULL field and it is indexed. This two table has recent statistics.

When i use NOT EXISTS, it takes only one second. Here it correlates the inner query with the outer query. The optimizer no longer needs to scan the whole index because now it is only searching for subset of the data, as defined by the correlation.

For the below query, based on the explain plan, it use the index range scan on SAP_SOHEAD table.

12:59:00 SQL> select count(sonum) from import_yi_b x
12:59:00 2 where not exists(select 1 from sap_sohead h where h.sonum = x.sonum);

COUNT(SONUM)
------------
76080

12:59:01 SQL>

When i use NOT IN , query runs for ever. It took one hour to complete this query. It is not correlated, so the whole index is scanned to retrieve possible values. The optimizer doesn't know which values i am interested in.

For the below query, based on the explain plan, it use the index full scan on SAP_SOHEAD table. This is the difference i found on explain plan between NOT IN & NOT EXISTS clause.

13:55:12 SQL> select count(sonum) from import_yi_b x
13:55:12 2 where x.sonum not in
13:55:12 3 (select h.sonum from sap_sohead h);

COUNT(SONUM)
------------
76080

14:55:23 SQL>

So the same query, it took one minute for NOT EXISTS clause, took one hour for NOT IN clause. It is huge performance improvement when i replace the NOT EXISTS in the place of NOT IN.

Conclusion : NOT IN & NOT EXISTS are not substitute for each other. But NOT IN & NOT EXISTS produce the same result when subquery we use in the NOT IN does not contain NULLS. So when we use NOT IN, we need to ensure that, either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls.

Avoid defining maximum data type length in Oracle

One of my co-worker asked me this question... Why can't we define VARCHAR2(4000) for all char data type.. So that, we don't need to increase the length in future.. Since VARCHAR2 data type is variable length, it would not occupy the defined field width in database.

I got an answer when i googled this question and i would like to write the answer in this blog..

Here are the answers..

1. When users runs a query in query tool(say for example TOAD, SQL Navigator), it display the data based on the width of the column. It will be very hard to scroll to see the data from left to right and right to left.

2. When developer use the array fetch(Bulk binds, bulk collects) for better performance, oracle allocate the memory as we defined in the field length. Let us say, developers wanted to fetch 10 columns for 100 rows . So, 100 * 10 * 4000 => almost 4MB of RAM, oracle has to allocate to run this query. One connected session eats 4MB memory for this query. Imagine, if 5 people runs the same query at the same time, it will be 20MB RAM..

PLSQL Developers might use column%type to declare PLSQL variable. This might waste system memory badly if we define the max field length.

3. Developers build the data entry screen to load the data into the table. Let us say, Product_code is VARCHAR2(4000), Product_comment is VARCHAR2(4000). No one knows what sort of data can go into the database.

4. Oracle throw ORA-01450 maximum key length (string) exceeded error when combined length of all the columns specified in a CREATE INDEX statement exceeded the maximum index length. The maximum index length varies by operating system. If you are more interested in how the maximum index key length is computed, there is a document (Doc.Id 136158.1) on Metalink that explains this in details

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 – Production

SQL> create table test(text1 varchar2(4000),
2 text2 varchar2(4000),
3 text3 varchar2(4000),
4 text4 varchar2(4000));

Table created.

SQL> create index idx on test(text1,text2);
create index idx on test(text1,text2)
*
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

SQL>

Conclusion :- We should not allocate maximum data length. We should use proper size for Primary key & foreign keys. We can always expand whenever we need it. Max size will hurt the application performance, because, it will mistakenly allocate more memory as we discussed in the seocnd point.

If any one wanted to read more on this, please read this link.

http://tkyte.blogspot.com/2006/11/see-you-just-cannot-make-this-stuff-up.html

When does oracle use Full Table Scan?

Full Table Scans reads all the blocks under the High Water Mark and filter out those that do not meet the selection criteria. Each row is examined to determine whether it satisfies the statement’s WHERE clause.

During full table scan, the blocks are read sequentially. Because the blocks are adjacent, I/O calls larger than a single block can be used to speed up the process. The size of the read calls range from one block to the number of blocks indicated by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. Using multiblock reads means a full table scan can be performed very efficiently.

Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. This is because full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.

When the Optimizer Uses Full Table Scans

Lack of Index
If the query is unable to use any existing indexes, then it uses a full table scan. For example, if there is a function used on the indexed column in the query, the optimizer is unable to use the index and instead uses a full table scan as below

ScanSELECT last_name, first_name
FROM employees
WHERE UPPER(last_name) LIKE :b1

If you need to use the index for case-independent searches, then either do not permit mixed-case data in the search columns or create a function-based index, such as UPPER(last_name), on the search column.

Large Amount of Data
If the optimizer thinks that the query will access most of the blocks in the table, then it uses a full table scan, even though, the index is available. Oracle takes FTS if the data are spreaded in most of the blocks. Optimizer takes FTS if clustering factor is high.

Small Table
If a table contains less than DB_FILE_MULTIBLOCK_READ_COUNT blocks under the high water mark, which can be read in a single I/O call, then a full table scan might be cheaper than an index range scan, regardless of the fraction of tables being accessed or indexes present.

Full Table Scan Hints
Use the hint FULL(table alias) if you want to force the use of a full table scan. For more information on the FULL hint.

SELECT /*+ FULL(e) */ employee_id, last_name
FROM employees e
WHERE last_name LIKE :b1;

Anti Join algorithm

This article was written in oracle9i release 9.2.0.6.0. The below test result may varry from Oracle one release to another release.

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls.

Suppose you have the DEPT and EMP tables in the SCOTT schema.

SQL> select count(*) from emp;

COUNT(*)
----------
458752
SQL> select count(*) from dept;

COUNT(*)
----------
5
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'EMP
',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'DEP
T',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

Let us test the Anti Join

We want to list all the departments in department table where the department does not have any employees in employee table.

Here is one way to write the query.
SQL> select count(*) from(
2 SELECT D1.deptno
3 FROM dept D1
4 MINUS
5 SELECT D2.deptno
6 FROM dept D2, emp E2
7 WHERE D2.deptno = E2.deptno)
8 /

COUNT(*)
----------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=2162 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=2162 Card=5)
3 2 MINUS
4 3 SORT (UNIQUE NOSORT) (Cost=7 Card=5 Bytes=15)
5 4 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
6 3 SORT (UNIQUE) (Cost=2155 Card=461910 Bytes=2771460)
7 6 NESTED LOOPS (Cost=161 Card=461910 Bytes=2771460)
8 7 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE)
(Cost=161 Card=461910 Bytes=1385730)
9 7 INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

SQL> SELECT COUNT(D.deptno)
2 FROM dept D
3 WHERE D.deptno NOT IN
4 (
5 SELECT E.deptno
6 FROM emp E
7 WHERE E.deptno IS NOT NULL
8 ) ;

COUNT(D.DEPTNO)
---------------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=806 Card=1 Bytes=6)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (ANTI) (Cost=806 Card=2 Bytes=12)
3 2 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost=161 Card=277146 Bytes=831438)

We can write the above query by using NOT EXISTS clause. The below query will also produce the same result as above.

SELECT d.deptno, d.dname
FROM dept D
WHERE NOT EXISTS (SELECT 1
FROM emp E WHERE E.deptno = D.deptno)
/

Here is another good candidate for an ANTI-JOIN access path

Suppose you want a list of customers who have not placed an order within the last ten days. You might start with a query that looks like:
SELECT C.short_name, C.customer_id
FROM customers C
WHERE NOT EXISTS(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 10)
ORDER BY C.short_name;
/

SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_id NOT IN(
SELECT O.customer_id
FROM orders O
WHERE O.order_date > SYSDATE - 10
AND O.customer_id IS NOT NULL)
ORDER BY C.short_name
/

In Oracle 9i, an anti-join can be performed using the nested loops, merge, or hash join algorithms. As with a conventional join, Oracle will choose the join algorithm with the lowest cost. Oracle provides the NL_AJ, MERGE_AJ, and HASH_AJ hints in order for you to manipulate the anti-join process if you need to. As with the anti-join hints, the hint is applied to the subquery and not the main body of the query itself.

Semi Join algorithm

This article was written in oracle9i release 9.2.0.6.0. The below test result may varry from Oracle one release to another release.

A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned.

Suppose you have the DEPT and EMP tables in the SCOTT schema.

SQL> select count(*) from emp;

COUNT(*)
----------
458752
SQL> select count(*) from dept;

COUNT(*)
----------
4
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'EMP
',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'DEP
T',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

Let us test the semi join.

We want a list all departments in dept table if department has atleast one employee in emp table.

Here is conventional way to write the query.

SELECT D.deptno, D.dname FROM dept D, emp E WHERE E.deptno = D.deptno ORDER BY D.deptno;

The problem here is, if the department has 100 employees, then department will appear 100 times. We can eliminate the duplicate records by using the DISTINCT clause. But again, oracle would do more work to get the output.

Here is total time it took to complete the query

SQL> declare
2 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
3 v_cnt number;
4 begin
5 SELECT count(distinct D.deptno) into v_cnt
6 FROM dept D, emp E
7 WHERE E.deptno = D.deptno
8 ORDER BY D.deptno;
9 dbms_output.put_line(to_char(ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60,3)));
10 end;
11 /
.009
Here is the execution plan for the query

SQL> SELECT count(distinct D.deptno)
2 FROM dept D, emp E
3 WHERE E.deptno = D.deptno
4 ORDER BY D.deptno;

COUNT(DISTINCTD.DEPTNO)
-----------------------
3
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=160 Card=1 Bytes
=6)

1 0 SORT (GROUP BY)
2 1 NESTED LOOPS (Cost=160 Card=455570 Bytes=2733420)
3 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost
=160 Card=455570 Bytes=1366710)

4 2 INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)

We can use semi join between the DEPT and EMP tables instead of a conventional join:

The below query will list the departments that have at least one employee. Whether a department has one employee or 100, the department will appear only once in the query output. Moreover, Oracle will move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.

Here is total time it took to complete the query

SQL> declare
2 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
3 v_cnt number;
4 begin
5 SELECT count(D.deptno) into v_cnt
6 FROM dept D
7 WHERE EXISTS
8 (
9 SELECT 1
10 FROM emp E
11 WHERE E.deptno = D.deptno
12 )
13 ORDER BY D.deptno;
14 dbms_output.put_line(to_char(ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60,3)));
15 end;
16 /
.001

Here is the execution plan

SQL> SELECT count(D.deptno)
2 FROM dept D
3 WHERE EXISTS
4 (
5 SELECT 1
6 FROM emp E
7 WHERE E.deptno = D.deptno
8 )
9 ORDER BY D.deptno;

COUNT(D.DEPTNO)
---------------
3

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=641 Card=1 Bytes =6)

1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (SEMI) (Cost=641 Card=3 Bytes=18)
3 2 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost
=160 Card=341678 Bytes=1025034)

Here is the another good candidate for SEMI-JOIN algorithim.

Display list of gold-status customers who have placed an order within the last three days. You might start with a query that looks like:

SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_type = 'Gold'
AND EXISTS(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 3)
ORDER BY C.short_name
/

Note : If the query contains a DISTINCT/UNION clause, then Oracle cannot transform EXISTS or IN clauses into something that could use a semi-join access path. Semi join can be used in Nested loop, Hash Join & Sort Merge. If semi join method is not possible, then optimizer use the conventional join with lowest cost. Oracle provides hints(NL_SJ, HASH_SJ, and MERGE_SJ) to enforce semi join algorithm. The hint is applied to the subquery of the EXISTS or IN clause, not the main body of the query itself.