Data Engineering: 2014

Tuesday, December 23, 2014

Avoid distinct in Query if its not needed actaully

This query came to me as its taking more time return result and cost is also high.

Ex:

SELECT id_

FROM act_ru_task

WHERE id_ IN (SELECT tasks.id_

FROM (SELECT a.*, rownum rnum

FROM (SELECT DISTINCT res.*

FROM act_ru_task res

INNER JOIN act_ru_identitylink i

ON i.task_id_ = res.id_

WHERE res.priority_ = :b3

AND res.assignee_ IS NULL

AND ((res.due_date_ <= :b2 AND

res.due_date_ IS NOT NULL) OR

res.due_date_ IS NULL)

AND res.assignee_ IS NULL

AND i.type_ = 'candidate'

AND i.group_id_ = :b1

ORDER BY res.id_ ASC) a

WHERE rownum <= :b4) tasks

WHERE rnum >= :b5)

FOR UPDATE

After taking plan

I found there is no duplicate values in the table and distinct is doing full table scan which is not required here.

Before putiing distinct in the table please make sure its needed for your requirement.

Monday, November 24, 2014

Oracle Tuning a Query -table with 25 million Record

Am Having the transaction table which will grow daily and now row count is around 25 million.

This is my query taking more time and cost. Its due to order by in the computation field(TOTAL_WEIGHT)

SELECT PPS_ID,TOTAL_WEIGHT from

(SELECT PPS_ID,TOTAL_WEIGHT

FROM (SELECT pps_id,round(((( 60 * name_pct_match / 100) + prs_weight + year_weight + dt_weight)/

90) * 100) total_weight

FROM (SELECT pps_id,

round(func_compare_name( 'FRANCIS JOHN ROCH',upper(name_en),' ',60)) name_pct_match,

decode(prs_nationality_id, 99, 15, 0) prs_weight,

10 mother_weight,

100 total_attrib_weight,

case when to_number(to_char(birth_date, 'yyyy')) = 1986 then 5 else 0 end year_weight,

case when to_char(to_date('12-JAN-86','DD-MON-RRRR'), 'dd') = to_char(birth_date, 'dd')

and to_char(to_date('12-JAN-86','DD-MON-RRRR'), 'mm') = to_char(birth_date, 'mm') then 10

when to_date('12-JAN-86','DD-MON-RRRR') between birth_date-6 and birth_date+6 then 8

when to_date('12-JAN-86','DD-MON-RRRR') between birth_date-28 and birth_date+28 then 5

when to_date('12-JAN-86','DD-MON-RRRR') between birth_date-90 and birth_date+90 then 3

else 0

end dt_weight

FROM individual_profile

WHERE birth_date = '12-JAN-86'

AND IS_ACTIVE = 1

AND gender_id = 1

AND round(func_compare_name('FRANCIS JOHN ROCH',upper(name_en),' ',60)) > 20

))

WHERE TOTAL_WEIGHT >= 100

order by total_weight desc)

where rownum <= 10

Till oracle 11g there is no way to solve order by issue. In oracle 12c oracle introduced TOP LIMIT concept to reduce this burden. In my case i cant use this since my DB is oracle 11g. so i did following steps to make this query performing

Step -1: Initially i thought to made my fetch fast. So i partitioned the table by birth_date and subpartitioned by gender_id. So now my fettch became fast.

Step -2: partitioned the table bith_date year wise as range partition and gender wise as list partition.

Step-3: Now i made the fetch faster. my query is used by many users at a time. So i want to think about CPU utilization also. After taking explain plan i found bytes is high and cost somewhat reduced

Step-4: Now i tried different indexes to reduce the cost and bytes. but nothing helped as order by is used.

step-5 : Finally i decided to make my query to run in parallel.

step-6: Oracle 11g has option of parallel(auto) which tell the optimizer to run in parallel based on user

step -7: SELECT /*+ parallel(auto) */ PPS_ID,TOTAL_WEIGHT from

step-8: added parallel hint which brought down cost and bytes. now query is performing well

Note: This parallel wont take indexes used in the table.It will ran in parallel way.

Oracle Text Concept-- Like operator Replaced with Oracle Text which improves better performance

In this process i want to replace like operator in where clause which is going for full table scan in some package. Oracle suggested replace the like operator with oracle text concepts. There are two ways to implement this

· Catsearch

· Context

Catsearch:

Catsearch is the best way to implement because here the index will be auto updated and we will get more matches in record. In our case it failed because its giving more record than which we are getting in like operator. Because its searching by fragment by fragment. But in our case it should start from start of the letter.

Context:

This is done by contains clause. Here the index is not auto updated and we can make it auto-commit by giving sync commit while creating the index. Created datastore and wordlist to perform index more better. And set attributes on the column which need to be indexed and prefix_index which would improve the performance. And set wildcard terms to 5000 which would avoid conversion error.

Example:

begin

ctx_ddl.create_preference('spm_full_ename_ds', 'multi_column_datastore');

ctx_ddl.set_attribute('spm_full_ename_ds', 'columns', '''xxstart'',spm_full_ename');

ctx_ddl.set_attribute('spm_full_ename_ds', 'delimiter', 'NEWLINE');

ctx_ddl.create_preference('spm_full_ename_wl', 'BASIC_WORDLIST');

ctx_ddl.set_attribute('spm_full_ename_wl','PREFIX_INDEX','TRUE');

ctx_ddl.set_attribute('spm_full_ename_wl','PREFIX_MIN_LENGTH',1);

ctx_ddl.set_attribute('spm_full_ename_wl','PREFIX_max_LENGTH',15);

CTX_DDL.SET_ATTRIBUTE ('spm_full_ename_wl', 'WILDCARD_MAXTERMS', 50000);

end;

Now while creating index added nationality in the filter, because in some cases nationality is not considered as index and going for full table so made nationality in the filter by and now it as composite index.

Index:

Create index idx_cont_spm_ename on list_master(spm_full_ename) indextype is ctxsys.context filter by nat_code_curr_nat

parameters ('wordlist spm_full_ename_wl datastore spm_full_ename_ds SYNC(ON COMMIT)') local;

If our system is RAC node oracle suggested to alter the index to parallel to improve the performance

alter index idx_cont_spm_ename parallel 1;

Now after creating the index we will face response time is more for the query for that oracle suggested some hints which would return the records in faster manner and optimize the cost of the query.

/*+ domain_index_no_sort */ and /*+ first_rows(1) */

First_rows() hint:

we can also optimize for response time using the related FIRST_ROWS hint. Like FIRST_ROWS(n), when queries are optimized for response time, Oracle Text returns the first rows in the shortest time possible.

domain_index_no_sort:

Example:

SELECT /*+ first_rows(1) domain_index_no_sort */ id sub_sys_individual_id,

spm_full_ename nm_e,

spm_full_aname nm_a,

-- func_get_moidata_gender(spm_person_no) sex,

nat_code_curr_nat prs_nat,

spm_person_no person_no,

null person_tp,

'NA' prog_where_not_allowed,

spm_person_no udb_no

FROM list_master

WHERE (ls_code IN (2, 3))

and contains(spm_full_ename,'xxstart AYOUB GHOLAMREZA AHMADI%') >0

AND (((date_from <= trunc(SYSDATE) OR date_from IS NULL) AND

(date_to >= trunc(SYSDATE) OR date_to IS NULL) AND

(duration_flag = 2)) OR (duration_flag = 1));

ctx_ddl.create_preference and Set_attribute:

To create a datastore, lexer, filter, classifier, wordlist, or storage preference,we can use the CTX_DDL.CREATE_PREFERENCE procedure.

Multi column Datastore: Data is stored in a text table in more than one column. Columns are concatenated to create a virtual document, one for each row.

we can also set attributes with the CTX_DDL.SET_ATTRIBUTE procedure.

Newline attribute:

Specify TRUE for Oracle Text to add no newline character after each detail row.

Prefix_index attribute:

In our case we are using right-truncated predicate. For that oracle suggested to use the prefix_idex attribute by setting min and max values.

WILDCARD_MAXTERMS:

As wre are wild card character in our search % we are setting this wildcard maxteerms as 5000. In wordlist we are setting this.

When we execute a query using wildcard(%), the term you used in a query is used to generate another list of terms (say match_list) from token_list.

Match_list consists of all the terms which have matched the wildcarded term - whith 0 or more characters to the query term in place of wildcard.

SYNC(ON COMMIT):

As we know context search is not auto update we are updating the index on commit. So this sync( on commit) is used.

Local:

Since our table is partitioned by creating the index as local. As we are using first_rows and domain_index_no_sort hint we cam improve the query performance and the response time. If a table is range or hash partition only we can create this index as local.v