Randolf Geist


Indexing Null Values - Part 2

Wed, 2019-05-22 10:57
In the previous post I've demonstrated that Oracle has some problems making efficient use of B*Tree indexes if an IS NULL condition is followed by IN / OR predicates also covered by the same index - the predicates following the IS NULL are not used to navigate the index structure efficiently but are applied as filters to all index entries identified by the IS NULL.

In this part I'll show the results I got when repeating the same exercise using Bitmap indexes - after all, they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same, but very similar) and an index on the single expression that gets searched for via IS NULL - results are again from 18.3.0:

SQL> create table null_index as select * from dba_tables;

Table created.

SQL> insert /*+ append */ into null_index select a.* from null_index a, (select /*+ no_merge cardinality(100) */ rownum as dup from dual connect by level <= 100);

214500 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(null, 'NULL_INDEX', method_opt => 'for all columns size auto for columns size 254 owner')

PL/SQL procedure successfully completed.

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> set serveroutput off pagesize 5000 arraysize 500
SQL>
SQL> set autotrace traceonly
SQL>
SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1297049223

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 2342 (1)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 2342 (1)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | | | | |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
3 - access("PCT_FREE" IS NULL)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2192 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

So indeed the Bitmap index was successfully used to identify the PCT_FREE IS NULL rows, but the efficiency suffers from the same problem, and to the same degree, as the corresponding B*Tree index plan - too many rows have to be filtered on table level:

Plan hash value: 1297049223

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2342 (100)| 101 |00:00:00.01 | 2192 | 30 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 2342 (1)| 101 |00:00:00.01 | 2192 | 30 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | 13433 |00:00:00.01 | 3 | 30 |
|* 3 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | 1 | | | 3 |00:00:00.01 | 3 | 30 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
3 - access("PCT_FREE" IS NULL)

Some interesting points to mention: the 13,000+ rows are identified in the Bitmap index using just three index row entries / bitmap fragments - that's the special efficiency of Bitmap indexes, where a single index row entry can cover many, many table rows. It's also interesting to see that the costing is pretty different from the B*Tree index costing (2342 vs. 1028, in this case closer to the reality of roughly 2,200 consistent gets, but we'll see in a moment how this can change), and that no cardinality estimate is shown on Bitmap index level - the B*Tree index plan showed a spot-on estimate of 13,433 rows.
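As a side note, the Rowsource Statistics plans shown throughout this post can be obtained for example via the GATHER_PLAN_STATISTICS hint combined with DBMS_XPLAN.DISPLAY_CURSOR - a minimal sketch (assuming STATISTICS_LEVEL isn't already set to ALL on session level):

-- Sketch: collect runtime statistics for a single execution of the test query
select /*+ gather_plan_statistics */ * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

-- The ALLSTATS LAST format option reports Starts / A-Rows / A-Time / Buffers of the last execution
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));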

So, reproducing the B*Tree test case, let's add the OWNER column to the Bitmap index in an attempt to increase the efficiency. Note that I drop the previous index to prevent Oracle from making "proper" use of the Bitmap indexes, as we'll see in a moment:

SQL> drop index null_index_idx;

Index dropped.

SQL> create bitmap index null_index_idx2 on null_index (pct_free, owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1751956722

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 2343 (1)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 2343 (1)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX2 | | | | |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
3 - access("PCT_FREE" IS NULL)
filter("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
105 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 1751956722

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2343 (100)| 101 |00:00:00.01 | 105 | 30 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 2343 (1)| 101 |00:00:00.01 | 105 | 30 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | 101 |00:00:00.01 | 4 | 30 |
|* 3 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX2 | 1 | | | 1 |00:00:00.01 | 4 | 30 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
3 - access("PCT_FREE" IS NULL)
filter(("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')))

So now we end up with a "Bitmap index range scan" operation, which in reality looks pretty efficient - just 105 consistent gets, so assuming 101 consistent gets for accessing the 101 table rows it required just 4 consistent gets on index level. But then look at the cost estimate: 2343, which is even greater than the cost estimate of the previous plan. And the "Predicate Information" section looks pretty weird, too: an access predicate only for PCT_FREE IS NULL, a filter on index level repeating all the predicates including PCT_FREE IS NULL, and, most significantly, the predicates on OWNER repeated on table level.

Clearly what the optimizer assumes in terms of costing and required predicates doesn't correspond to what happens at runtime. The runtime behaviour looks pretty efficient, but at least according to the predicates shown on index level it still isn't the optimal strategy we would like to see: why the additional filter instead of just access? We can also see that echoed in the Rowsource Statistics: only a single Bitmap index fragment gets produced by the "Bitmap index range scan", but it requires 4 consistent gets on index level, so three of them get "filtered" after the access.

The costing seems to assume that only the PCT_FREE IS NULL rows are identified on index level, which clearly isn't the case at runtime...

Of course this is not the proper usage of Bitmap indexes - typically you don't create a multi-column Bitmap index, but instead make use of the real power of Bitmap indexes: the way Oracle can combine multiple of them for efficient access.

Before doing so, let's - just for the sake of completeness - repeat as a Bitmap index the column order that turned out to be most efficient in the B*Tree case:

SQL> drop index null_index_idx2;

Index dropped.

SQL> create bitmap index null_index_idx3 on null_index (owner, pct_free);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1022155563

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 83 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 83 (0)| 00:00:01 |
| 3 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 4 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX3 | | | | |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("PCT_FREE" IS NULL)
4 - access("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
207 consistent gets
30 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 1022155563

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 83 (100)| 101 |00:00:00.01 | 207 | 30 |
| 1 | INLIST ITERATOR | | 1 | | | 101 |00:00:00.01 | 207 | 30 |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 2 | 19 | 83 (0)| 101 |00:00:00.01 | 207 | 30 |
| 3 | BITMAP CONVERSION TO ROWIDS | | 2 | | | 303 |00:00:00.01 | 5 | 30 |
|* 4 | BITMAP INDEX RANGE SCAN | NULL_INDEX_IDX3 | 2 | | | 2 |00:00:00.01 | 5 | 30 |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("PCT_FREE" IS NULL)
4 - access(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))

While we now see the desired "INLIST ITERATOR" again, this plan looks weird for several reasons - in particular because we now have a much lower cost estimate (83), but in reality, at 207 consistent gets, it is less efficient than the previous one (cost estimate 2343, but only 105 consistent gets). Why is this so? The "Predicate Information" section shows why: only the predicate on OWNER is evaluated on index level (303 rows identified there), and therefore rows need to be filtered on table level. Again this looks like an implementation limitation and pretty unnecessary - after all, the PCT_FREE IS NULL should somehow be treated on index level instead.

So finally let's see how things turn out when using Bitmap indexes the way they are designed - by creating multiple ones and letting Oracle combine them:

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> create bitmap index null_index_idx4 on null_index (owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 704944303

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 8 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 8 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 3 | BITMAP AND | | | | | |
|* 4 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | | | | |
| 5 | BITMAP OR | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | | | | |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("PCT_FREE" IS NULL)
6 - access("OWNER"='AUDSYS')
7 - access("OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
108 consistent gets
60 physical reads
0 redo size
33646 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 704944303

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 101 |00:00:00.01 | 108 | 60 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 8 (0)| 101 |00:00:00.01 | 108 | 60 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | 101 |00:00:00.01 | 7 | 60 |
| 3 | BITMAP AND | | 1 | | | 1 |00:00:00.01 | 7 | 60 |
|* 4 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX | 1 | | | 3 |00:00:00.01 | 3 | 30 |
| 5 | BITMAP OR | | 1 | | | 1 |00:00:00.01 | 4 | 30 |
|* 6 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | 1 | | | 1 |00:00:00.01 | 2 | 30 |
|* 7 | BITMAP INDEX SINGLE VALUE | NULL_INDEX_IDX4 | 1 | | | 1 |00:00:00.01 | 2 | 0 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("PCT_FREE" IS NULL)
6 - access("OWNER"='AUDSYS')
7 - access("OWNER"='CBO_TEST')

So now we see access predicates only, and Oracle making efficient use of the indexes by combining multiple of them. Nevertheless I find the range of the costing amazing: this plan is assigned a cost of 8 but is actually less efficient at runtime (108 consistent gets) than the plan above that was assigned a cost of 2343 but required just 105 consistent gets at runtime. Clearly the costing of Bitmap indexes is still - even in version 18.3 - full of surprises.

Summary

Repeating the same exercise as before with Bitmap indexes shows several things:

- Oracle isn't necessarily good at costing and using multi-column Bitmap indexes properly
- The costing of Bitmap indexes is still questionable (the most important figure, the "Clustering Factor", is still meaningless for Bitmap indexes)
- For proper handling, use Bitmap indexes the way they are supposed to be used: create separate ones and let Oracle combine them

Indexing Null Values - Part 1

Wed, 2019-05-15 17:04
Indexing null values in Oracle is something that has been written about a lot in the past. Nowadays it should be common knowledge that Oracle B*Tree indexes don't contain entries for rows where all indexed expressions are null, but it's possible to include null values in a B*Tree index by combining the nullable expression with something guaranteed to be non-null, be it another column or simply a constant expression.
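A minimal sketch of that trick (the table and index names here are just for illustration - the actual test case follows below):

-- A B*Tree index on the nullable expression alone contains no entry for rows
-- that are entirely null, so it cannot drive an IS NULL search:
create index t_idx_plain on t (nullable_col);

-- Expect a full table scan here - the NULL rows simply aren't in this index:
select * from t where nullable_col is null;

-- Appending a constant, guaranteed non-null expression puts every row into
-- the index and makes it usable for IS NULL searches:
create index t_idx_null on t (nullable_col, ' ');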

Jonathan Lewis not too long ago published a note that showed an oddity when dealing with IS NULL predicates - one that in the end turned out not to be a real threat, but merely a quirk in how Oracle displays the access and filter predicates when accessing an index using IS NULL together with other predicates following it.

However, I've recently come across a rather similar case where this display oddity turns into a real threat. To get things started, let's have a look at the following (this is from 18.3.0, but other recent versions should show similar results):

SQL> create table null_index as select * from dba_tables;

Table created.

SQL> insert /*+ append */ into null_index select a.* from null_index a, (select /*+ no_merge cardinality(100) */ rownum as dup from dual connect by level <= 100);

214700 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(null, 'NULL_INDEX', method_opt => 'for all columns size auto for columns size 254 owner')

PL/SQL procedure successfully completed.

SQL> create index null_index_idx on null_index (pct_free, ' ');

Index created.

SQL> set serveroutput off pagesize 5000 arraysize 500


SQL> set autotrace traceonly
SQL>
SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3608178030

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 1028 (1)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 1028 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NULL_INDEX_IDX | 13433 | | 32 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
2 - access("PCT_FREE" IS NULL)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2178 consistent gets
35 physical reads
0 redo size
7199 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

So this is the known approach of indexing null values by simply adding a constant expression, and we can see from the execution plan that the index was indeed used to identify the rows having NULLs.

But we can also see from the execution plan, the number of consistent gets and the Rowsource Statistics that this access can surely be improved further:

Plan hash value: 3608178030

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1028 (100)| 101 |00:00:00.01 | 2178 | 35 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 1028 (1)| 101 |00:00:00.01 | 2178 | 35 |
|* 2 | INDEX RANGE SCAN | NULL_INDEX_IDX | 1 | 13433 | 32 (0)| 13433 |00:00:00.01 | 30 | 35 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
2 - access("PCT_FREE" IS NULL)

Because the additional predicate on OWNER can only be applied on table level, we first identify more than 13,000 rows on index level, visit all those table rows via random access and apply the filter to end up with the final 101 rows.

So obviously we should add OWNER to the index to avoid visiting that many table rows:

SQL> create index null_index_idx2 on null_index (pct_free, owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3808602675

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 40 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 40 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NULL_INDEX_IDX2 | 19 | | 38 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("PCT_FREE" IS NULL)
filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')



Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
137 consistent gets
61 physical reads
0 redo size
33646 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 3808602675

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 40 (100)| 101 |00:00:00.01 | 137 | 61 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 1 | 19 | 40 (0)| 101 |00:00:00.01 | 137 | 61 |
|* 2 | INDEX RANGE SCAN | NULL_INDEX_IDX2 | 1 | 19 | 38 (0)| 101 |00:00:00.01 | 36 | 61 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("PCT_FREE" IS NULL)
filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))

So at first sight this looks indeed like an improvement, and compared to the previous execution plan it is - see for example how the number of consistent gets has been reduced. However, there is something odd going on: the index cost part is even greater than in the previous example, and looking more closely at the "Predicate Information" section it becomes obvious that the additional predicate on OWNER isn't applied as an access predicate to the index, but only as a filter. This means that rather than directly identifying the relevant parts of the index by navigating the index structure efficiently using both predicates, only the PCT_FREE IS NULL expression gets used to identify the more than 13,000 corresponding index entries, with the filter on OWNER applied afterwards. While this is better than applying the filter on table level, it can still become a very costly operation, and the question here is: why doesn't Oracle use both expressions to access the index?

The answer to me looks like an implementation restriction - I don't see any technical reason why Oracle shouldn't be capable of doing so. Currently it looks like in this particular case an IN predicate (or the equivalent OR predicates) following an IS NULL gets applied on index level only as a filter, similar to predicates following range or unequal comparisons, or to predicates following skipped columns / expressions in a composite index. But for those cases there is a reason why Oracle does so - it can no longer use the sorted index entries for efficient access - whereas I don't see why this should apply to the IS NULL case. And Jonathan's note above shows that in principle, for other kinds of predicates, it works as expected (except for the display oddity discussed there).
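One way to probe this further - a sketch against the same table and index, assuming the behaviour described in Jonathan's note: with a single equality instead of the IN list, both expressions should show up as access predicates on NULL_INDEX_IDX2:

-- Sketch: single equality instead of IN / OR - compare the resulting
-- "Predicate Information" section to the plan above:
select * from null_index where pct_free is null and owner = 'AUDSYS';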

This example highlights another oddity: since the query contains an IN list, ideally we would like to see an INLIST ITERATOR as part of the execution plan, but instead there is only a plain INDEX RANGE SCAN operation applying the filter expression shown above.

By changing the order of the index expressions and making the expression used for the IS NULL predicate the trailing one, we can see the following:

SQL> create index null_index_idx3 on null_index (owner, pct_free);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2178707950

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 5852 | 6 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 19 | 5852 | 6 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | NULL_INDEX_IDX3 | 19 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST') AND "PCT_FREE" IS NULL)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
108 consistent gets
31 physical reads
0 redo size
33646 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed

Plan hash value: 2178707950

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 6 (100)| 101 |00:00:00.01 | 108 | 31 |
| 1 | INLIST ITERATOR | | 1 | | | 101 |00:00:00.01 | 108 | 31 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX | 2 | 19 | 6 (0)| 101 |00:00:00.01 | 108 | 31 |
|* 3 | INDEX RANGE SCAN | NULL_INDEX_IDX3 | 2 | 19 | 4 (0)| 101 |00:00:00.01 | 7 | 31 |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access((("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')) AND "PCT_FREE" IS NULL)

So this is the expected execution plan, including an INLIST ITERATOR and showing that all predicate expressions get used to access the index efficiently, reducing the number of consistent gets further. Of course, a potential downside here is that this index might not be appropriate if queries are looking for PCT_FREE IS NULL only.
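That downside can be probed quickly - a sketch, assuming NULL_INDEX_IDX3 were the only index available:

-- Sketch: with the IS NULL expression only in trailing position the optimizer
-- cannot range scan on it directly - expect at best an INDEX SKIP SCAN
-- (depending on the number of distinct leading OWNER values) or a full table scan:
select * from null_index where pct_free is null;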

Summary

It looks like IN / OR predicates following an IS NULL comparison are applied on index level only as filters, and therefore also prevent other efficient operations like inlist iterators. In principle the problem can be worked around by putting the IS NULL expression at the end of a composite index, but that could come at the price of an additional index on the IS NULL expression alone when there is a need to search just for that expression efficiently.

In part 2, out of curiosity, I'll have a look at what happens when applying the same exercise to Bitmap indexes, which include NULL values anyway...

Script used:

set echo on

drop table null_index purge;

create table null_index as select * from dba_tables;

insert /*+ append */ into null_index select a.* from null_index a, (select /*+ no_merge cardinality(100) */ rownum as dup from dual connect by level <= 100);

commit;

exec dbms_stats.gather_table_stats(null, 'NULL_INDEX', method_opt => 'for all columns size auto for columns size 254 owner')

create index null_index_idx on null_index (pct_free, ' ');

set serveroutput off pagesize 5000 arraysize 500

set autotrace traceonly

select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

create index null_index_idx2 on null_index (pct_free, owner);

select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

create index null_index_idx3 on null_index (owner, pct_free);

select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

Compression Restrictions - Update: Wide Table Compression In 12.1 And Later

Tue, 2019-05-07 17:03
I have a very old post here that used to be rather popular, listing various restrictions related to compression. One of the most obvious restrictions in older versions was that basic / OLTP (Advanced Row) heap table compression, which is based on symbol tables / de-duplication, was limited to tables with 254 columns or less - or, probably more generally, to rows consisting of a single row piece.

This limitation was lifted with Oracle 12c, which started to support this compression type for tables having more than 254 columns / multiple row pieces - and there is a new internal parameter called "_widetab_comp_enabled" that defaults to TRUE.
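For reference, the value of that hidden parameter can be checked when connected as SYS - a sketch using the internal parameter views:

-- Sketch (as SYS): check the hidden parameter controlling "wide" table compression
select p.ksppinm as name, v.ksppstvl as value
from x$ksppi p, x$ksppcv v
where p.indx = v.indx
and p.ksppinm = '_widetab_comp_enabled';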

So this is nothing really new - 12.1 was released a long time ago - but it looks like it wasn't mentioned in the official 12c "New Features Guide", only in some other documents as a side note, although it could be an important change for users who have such "wide" tables, aren't allowed to make use of HCC compression, and would like to save space and improve I/O performance.

Now the odd thing is that Oracle obviously found some problems with this change in 12.1.0.2 - apparently revolving around redo apply - and hence decided to disable this feature partially in later PSUs of 12.1.0.2. There are a number of related notes on MyOracleSupport, specifically:

All About Advanced Table Compression (Overview, Usage, Examples, Restrictions) (Doc ID 882712.1)

Advanced Compression Restriction on 255 Columns Lifted For Oracle 12c (Doc ID 1612095.1)

Bug 25498473 : OLTP COMPRESSION DOESN'T WORK IN 12.1.0.2.160719 AND HIGHER WITH > 255 COLUMNS

So the summary of the situation seems to be this:

- In 12.1.0.2 with current PSUs applied you only get compression of tables with multiple row pieces when using Advanced Compression and conventional DML. Direct Path Load / CTAS / ALTER TABLE MOVE reorganisation does not compress, hence basic compression does not work for such "wide" tables

- Starting with Oracle 12.2 this (artificial) limitation is lifted again, and the Direct Path Load / CTAS / ALTER TABLE MOVE reorganisation code paths support compression again - which seems to include basic compression, although the above mentioned documents only refer to OLTP (Advanced Row) compression

Note that the code obviously - according to the mentioned documents - checks the COMPATIBLE setting of the database, so running 12.2+ with COMPATIBLE set to 12.1 means 12.1 behaviour, which means no basic compression for "wide" tables.
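Rather than relying on the block counts alone, the actual compression state of the rows can also be cross-checked via the DBMS_COMPRESSION package - a sketch for the MANY_COLS table used in the tests below (the return values correspond to the COMP_* constants of DBMS_COMPRESSION, e.g. 1 = no compression, 2 = advanced / OLTP; check the package constants for your version):

-- Sketch: report per row whether it is actually stored compressed
select dbms_compression.get_compression_type(user, 'MANY_COLS', rowid) as comp_type, count(*)
from many_cols
group by dbms_compression.get_compression_type(user, 'MANY_COLS', rowid);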

So depending on the patch level of the database, a script populating a 1,000-column table with 10,000 rows of very repetitive content (you can find the script at the end of this post) produces the following output in 12.1.0.2:

Unpatched 12.1.0.2:

SQL> create table many_cols (
2 col001 varchar2(10),
3 col002 varchar2(10),
4 col003 varchar2(10),
.
.
.
1000 col999 varchar2(10),
1001 col1000 varchar2(10)
1002 );

Table created.

SQL>
SQL> declare
2 s_sql1 varchar2(32767);
3 s_sql2 varchar2(32767);
4 begin
5 for i in 1..1000 loop
6 s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ',';
7 s_sql2 := s_sql2 || '''BLABLA'',';
8 end loop;
9
10 s_sql1 := rtrim(s_sql1, ',');
11 s_sql2 := rtrim(s_sql2, ',');
12
13 s_sql1 := 'insert into many_cols (' || s_sql1 || ') values (' || s_sql2 || ')';
14
15 for i in 1..10000 loop
16 execute immediate s_sql1;
17 end loop;
18
19 commit;
20 end;
21 /

PL/SQL procedure successfully completed.

SQL> select count(*) from many_cols;

COUNT(*)
----------
10000

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
10003


SQL> alter table many_cols move compress basic;

Table altered.

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
55


12.1.0.2 with some recent PSU applied:

SQL> create table many_cols (
2 col001 varchar2(10),
3 col002 varchar2(10),
4 col003 varchar2(10),
.
.
.
1000 col999 varchar2(10),
1001 col1000 varchar2(10)
1002 );

Table created.

SQL>
SQL> declare
2 s_sql1 varchar2(32767);
3 s_sql2 varchar2(32767);
4 begin
5 for i in 1..1000 loop
6 s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ',';
7 s_sql2 := s_sql2 || '''BLABLA'',';
8 end loop;
9
10 s_sql1 := rtrim(s_sql1, ',');
11 s_sql2 := rtrim(s_sql2, ',');
12
13 s_sql1 := 'insert into many_cols (' || s_sql1 || ') values (' || s_sql2 || ')';
14
15 for i in 1..10000 loop
16 execute immediate s_sql1;
17 end loop;
18
19 commit;
20 end;
21 /

PL/SQL procedure successfully completed.

SQL> select count(*) from many_cols;

COUNT(*)
----------
10000

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
10097


SQL> alter table many_cols move compress basic;

Table altered.

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
10143


12.2 and later produce in principle the same output again as the unpatched 12.1.0.2 - here as an example 18.3 (18.4.1 shows the same behaviour):

SQL> create table many_cols (
2 col001 varchar2(10),
3 col002 varchar2(10),
4 col003 varchar2(10),
.
.
.
1000 col999 varchar2(10),
1001 col1000 varchar2(10)
1002 );

Table created.

SQL>
SQL> declare
2 s_sql1 varchar2(32767);
3 s_sql2 varchar2(32767);
4 begin
5 for i in 1..1000 loop
6 s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ',';
7 s_sql2 := s_sql2 || '''BLABLA'',';
8 end loop;
9
10 s_sql1 := rtrim(s_sql1, ',');
11 s_sql2 := rtrim(s_sql2, ',');
12
13 s_sql1 := 'insert into many_cols (' || s_sql1 || ') values (' || s_sql2 || ')';
14
15 for i in 1..10000 loop
16 execute immediate s_sql1;
17 end loop;
18
19 commit;
20 end;
21 /

PL/SQL procedure successfully completed.

SQL> select count(*) from many_cols;

COUNT(*)
----------
10000

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
10003


SQL> alter table many_cols move compress basic;

Table altered.

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
55


If you wanted to make use of "wide" table compression in 12.1.0.2 with a PSU applied, the script would have to be changed to enable OLTP / Advanced Row compression before the initial population, and then you get this output:

SQL> create table many_cols (
2 col001 varchar2(10),
3 col002 varchar2(10),
4 col003 varchar2(10),
.
.
.
1000 col999 varchar2(10),
1001 col1000 varchar2(10)
1002 )
1003 compress for oltp
1004 ;

Table created.

SQL>
SQL> declare
2 s_sql1 varchar2(32767);
3 s_sql2 varchar2(32767);
4 begin
5 for i in 1..1000 loop
6 s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ',';
7 s_sql2 := s_sql2 || '''BLABLA'',';
8 end loop;
9
10 s_sql1 := rtrim(s_sql1, ',');
11 s_sql2 := rtrim(s_sql2, ',');
12
13 s_sql1 := 'insert into many_cols (' || s_sql1 || ') values (' || s_sql2 || ')';
14
15 for i in 1..10000 loop
16 execute immediate s_sql1;
17 end loop;
18
19 commit;
20 end;
21 /

PL/SQL procedure successfully completed.

SQL> select count(*) from many_cols;

COUNT(*)
----------
10000

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
88


SQL> alter table many_cols move compress basic;

Table altered.

SQL> exec dbms_stats.gather_table_stats(null, 'many_cols')

PL/SQL procedure successfully completed.

SQL> select blocks from user_tab_statistics where table_name = 'MANY_COLS';

BLOCKS
----------
10143


which confirms that "wide" table compression works for conventional inserts, but it requires the additional "Advanced Compression" license. Another downside of this approach is a potentially huge impact on performance and on undo / redo generation, because each repeated block compression dumps the whole before-image of the block to undo / redo.
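The undo / redo overhead of that repeated block compression can be quantified by sampling the session statistics before and after the insert loop - a sketch:

-- Sketch: run before and after the insert loop and compare the deltas
select sn.name, ms.value
from v$mystat ms, v$statname sn
where ms.statistic# = sn.statistic#
and sn.name in ('redo size', 'undo change vector size');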

Script used:

set echo on

drop table many_cols purge;

create table many_cols (
col001 varchar2(10),
col002 varchar2(10),
col003 varchar2(10),
col004 varchar2(10),
col005 varchar2(10),
col006 varchar2(10),
col007 varchar2(10),
col008 varchar2(10),
col009 varchar2(10),
col010 varchar2(10),
col011 varchar2(10),
col012 varchar2(10),
col013 varchar2(10),
col014 varchar2(10),
col015 varchar2(10),
col016 varchar2(10),
col017 varchar2(10),
col018 varchar2(10),
col019 varchar2(10),
col020 varchar2(10),
col021 varchar2(10),
col022 varchar2(10),
col023 varchar2(10),
col024 varchar2(10),
col025 varchar2(10),
col026 varchar2(10),
col027 varchar2(10),
col028 varchar2(10),
col029 varchar2(10),
col030 varchar2(10),
col031 varchar2(10),
col032 varchar2(10),
col033 varchar2(10),
col034 varchar2(10),
col035 varchar2(10),
col036 varchar2(10),
col037 varchar2(10),
col038 varchar2(10),
col039 varchar2(10),
col040 varchar2(10),
col041 varchar2(10),
col042 varchar2(10),
col043 varchar2(10),
col044 varchar2(10),
col045 varchar2(10),
col046 varchar2(10),
col047 varchar2(10),
col048 varchar2(10),
col049 varchar2(10),
col050 varchar2(10),
col051 varchar2(10),
col052 varchar2(10),
col053 varchar2(10),
col054 varchar2(10),
col055 varchar2(10),
col056 varchar2(10),
col057 varchar2(10),
col058 varchar2(10),
col059 varchar2(10),
col060 varchar2(10),
col061 varchar2(10),
col062 varchar2(10),
col063 varchar2(10),
col064 varchar2(10),
col065 varchar2(10),
col066 varchar2(10),
col067 varchar2(10),
col068 varchar2(10),
col069 varchar2(10),
col070 varchar2(10),
col071 varchar2(10),
col072 varchar2(10),
col073 varchar2(10),
col074 varchar2(10),
col075 varchar2(10),
col076 varchar2(10),
col077 varchar2(10),
col078 varchar2(10),
col079 varchar2(10),
col080 varchar2(10),
col081 varchar2(10),
col082 varchar2(10),
col083 varchar2(10),
col084 varchar2(10),
col085 varchar2(10),
col086 varchar2(10),
col087 varchar2(10),
col088 varchar2(10),
col089 varchar2(10),
col090 varchar2(10),
col091 varchar2(10),
col092 varchar2(10),
col093 varchar2(10),
col094 varchar2(10),
col095 varchar2(10),
col096 varchar2(10),
col097 varchar2(10),
col098 varchar2(10),
col099 varchar2(10),
col100 varchar2(10),
col101 varchar2(10),
col102 varchar2(10),
col103 varchar2(10),
col104 varchar2(10),
col105 varchar2(10),
col106 varchar2(10),
col107 varchar2(10),
col108 varchar2(10),
col109 varchar2(10),
col110 varchar2(10),
col111 varchar2(10),
col112 varchar2(10),
col113 varchar2(10),
col114 varchar2(10),
col115 varchar2(10),
col116 varchar2(10),
col117 varchar2(10),
col118 varchar2(10),
col119 varchar2(10),
col120 varchar2(10),
col121 varchar2(10),
col122 varchar2(10),
col123 varchar2(10),
col124 varchar2(10),
col125 varchar2(10),
col126 varchar2(10),
col127 varchar2(10),
col128 varchar2(10),
col129 varchar2(10),
col130 varchar2(10),
col131 varchar2(10),
col132 varchar2(10),
col133 varchar2(10),
col134 varchar2(10),
col135 varchar2(10),
col136 varchar2(10),
col137 varchar2(10),
col138 varchar2(10),
col139 varchar2(10),
col140 varchar2(10),
col141 varchar2(10),
col142 varchar2(10),
col143 varchar2(10),
col144 varchar2(10),
col145 varchar2(10),
col146 varchar2(10),
col147 varchar2(10),
col148 varchar2(10),
col149 varchar2(10),
col150 varchar2(10),
col151 varchar2(10),
col152 varchar2(10),
col153 varchar2(10),
col154 varchar2(10),
col155 varchar2(10),
col156 varchar2(10),
col157 varchar2(10),
col158 varchar2(10),
col159 varchar2(10),
col160 varchar2(10),
col161 varchar2(10),
col162 varchar2(10),
col163 varchar2(10),
col164 varchar2(10),
col165 varchar2(10),
col166 varchar2(10),
col167 varchar2(10),
col168 varchar2(10),
col169 varchar2(10),
col170 varchar2(10),
col171 varchar2(10),
col172 varchar2(10),
col173 varchar2(10),
col174 varchar2(10),
col175 varchar2(10),
col176 varchar2(10),
col177 varchar2(10),
col178 varchar2(10),
col179 varchar2(10),
col180 varchar2(10),
col181 varchar2(10),
col182 varchar2(10),
col183 varchar2(10),
col184 varchar2(10),
col185 varchar2(10),
col186 varchar2(10),
col187 varchar2(10),
col188 varchar2(10),
col189 varchar2(10),
col190 varchar2(10),
col191 varchar2(10),
col192 varchar2(10),
col193 varchar2(10),
col194 varchar2(10),
col195 varchar2(10),
col196 varchar2(10),
col197 varchar2(10),
col198 varchar2(10),
col199 varchar2(10),
col200 varchar2(10),
col201 varchar2(10),
col202 varchar2(10),
col203 varchar2(10),
col204 varchar2(10),
col205 varchar2(10),
col206 varchar2(10),
col207 varchar2(10),
col208 varchar2(10),
col209 varchar2(10),
col210 varchar2(10),
col211 varchar2(10),
col212 varchar2(10),
col213 varchar2(10),
col214 varchar2(10),
col215 varchar2(10),
col216 varchar2(10),
col217 varchar2(10),
col218 varchar2(10),
col219 varchar2(10),
col220 varchar2(10),
col221 varchar2(10),
col222 varchar2(10),
col223 varchar2(10),
col224 varchar2(10),
col225 varchar2(10),
col226 varchar2(10),
col227 varchar2(10),
col228 varchar2(10),
col229 varchar2(10),
col230 varchar2(10),
col231 varchar2(10),
col232 varchar2(10),
col233 varchar2(10),
col234 varchar2(10),
col235 varchar2(10),
col236 varchar2(10),
col237 varchar2(10),
col238 varchar2(10),
col239 varchar2(10),
col240 varchar2(10),
col241 varchar2(10),
col242 varchar2(10),
col243 varchar2(10),
col244 varchar2(10),
col245 varchar2(10),
col246 varchar2(10),
col247 varchar2(10),
col248 varchar2(10),
col249 varchar2(10),
col250 varchar2(10),
col251 varchar2(10),
col252 varchar2(10),
col253 varchar2(10),
col254 varchar2(10),
col255 varchar2(10),
col256 varchar2(10),
col257 varchar2(10),
col258 varchar2(10),
col259 varchar2(10),
col260 varchar2(10),
col261 varchar2(10),
col262 varchar2(10),
col263 varchar2(10),
col264 varchar2(10),
col265 varchar2(10),
col266 varchar2(10),
col267 varchar2(10),
col268 varchar2(10),
col269 varchar2(10),
col270 varchar2(10),
col271 varchar2(10),
col272 varchar2(10),
col273 varchar2(10),
col274 varchar2(10),
col275 varchar2(10),
col276 varchar2(10),
col277 varchar2(10),
col278 varchar2(10),
col279 varchar2(10),
col280 varchar2(10),
col281 varchar2(10),
col282 varchar2(10),
col283 varchar2(10),
col284 varchar2(10),
col285 varchar2(10),
col286 varchar2(10),
col287 varchar2(10),
col288 varchar2(10),
col289 varchar2(10),
col290 varchar2(10),
col291 varchar2(10),
col292 varchar2(10),
col293 varchar2(10),
col294 varchar2(10),
col295 varchar2(10),
col296 varchar2(10),
col297 varchar2(10),
col298 varchar2(10),
col299 varchar2(10),
col300 varchar2(10),
col301 varchar2(10),
col302 varchar2(10),
col303 varchar2(10),
col304 varchar2(10),
col305 varchar2(10),
col306 varchar2(10),
col307 varchar2(10),
col308 varchar2(10),
col309 varchar2(10),
col310 varchar2(10),
col311 varchar2(10),
col312 varchar2(10),
col313 varchar2(10),
col314 varchar2(10),
col315 varchar2(10),
col316 varchar2(10),
col317 varchar2(10),
col318 varchar2(10),
col319 varchar2(10),
col320 varchar2(10),
col321 varchar2(10),
col322 varchar2(10),
col323 varchar2(10),
col324 varchar2(10),
col325 varchar2(10),
col326 varchar2(10),
col327 varchar2(10),
col328 varchar2(10),
col329 varchar2(10),
col330 varchar2(10),
col331 varchar2(10),
col332 varchar2(10),
col333 varchar2(10),
col334 varchar2(10),
col335 varchar2(10),
col336 varchar2(10),
col337 varchar2(10),
col338 varchar2(10),
col339 varchar2(10),
col340 varchar2(10),
col341 varchar2(10),
col342 varchar2(10),
col343 varchar2(10),
col344 varchar2(10),
col345 varchar2(10),
col346 varchar2(10),
col347 varchar2(10),
col348 varchar2(10),
col349 varchar2(10),
col350 varchar2(10),
col351 varchar2(10),
col352 varchar2(10),
col353 varchar2(10),
col354 varchar2(10),
col355 varchar2(10),
col356 varchar2(10),
col357 varchar2(10),
col358 varchar2(10),
col359 varchar2(10),
col360 varchar2(10),
col361 varchar2(10),
col362 varchar2(10),
col363 varchar2(10),
col364 varchar2(10),
col365 varchar2(10),
col366 varchar2(10),
col367 varchar2(10),
col368 varchar2(10),
col369 varchar2(10),
col370 varchar2(10),
col371 varchar2(10),
col372 varchar2(10),
col373 varchar2(10),
col374 varchar2(10),
col375 varchar2(10),
col376 varchar2(10),
col377 varchar2(10),
col378 varchar2(10),
col379 varchar2(10),
col380 varchar2(10),
col381 varchar2(10),
col382 varchar2(10),
col383 varchar2(10),
col384 varchar2(10),
col385 varchar2(10),
col386 varchar2(10),
col387 varchar2(10),
col388 varchar2(10),
col389 varchar2(10),
col390 varchar2(10),
col391 varchar2(10),
col392 varchar2(10),
col393 varchar2(10),
col394 varchar2(10),
col395 varchar2(10),
col396 varchar2(10),
col397 varchar2(10),
col398 varchar2(10),
col399 varchar2(10),
col400 varchar2(10),
col401 varchar2(10),
col402 varchar2(10),
col403 varchar2(10),
col404 varchar2(10),
col405 varchar2(10),
col406 varchar2(10),
col407 varchar2(10),
col408 varchar2(10),
col409 varchar2(10),
col410 varchar2(10),
col411 varchar2(10),
col412 varchar2(10),
col413 varchar2(10),
col414 varchar2(10),
col415 varchar2(10),
col416 varchar2(10),
col417 varchar2(10),
col418 varchar2(10),
col419 varchar2(10),
col420 varchar2(10),
col421 varchar2(10),
col422 varchar2(10),
col423 varchar2(10),
col424 varchar2(10),
col425 varchar2(10),
col426 varchar2(10),
col427 varchar2(10),
col428 varchar2(10),
col429 varchar2(10),
col430 varchar2(10),
col431 varchar2(10),
col432 varchar2(10),
col433 varchar2(10),
col434 varchar2(10),
col435 varchar2(10),
col436 varchar2(10),
col437 varchar2(10),
col438 varchar2(10),
col439 varchar2(10),
col440 varchar2(10),
col441 varchar2(10),
col442 varchar2(10),
col443 varchar2(10),
col444 varchar2(10),
col445 varchar2(10),
col446 varchar2(10),
col447 varchar2(10),
col448 varchar2(10),
col449 varchar2(10),
col450 varchar2(10),
col451 varchar2(10),
col452 varchar2(10),
col453 varchar2(10),
col454 varchar2(10),
col455 varchar2(10),
col456 varchar2(10),
col457 varchar2(10),
col458 varchar2(10),
col459 varchar2(10),
col460 varchar2(10),
col461 varchar2(10),
col462 varchar2(10),
col463 varchar2(10),
col464 varchar2(10),
col465 varchar2(10),
col466 varchar2(10),
col467 varchar2(10),
col468 varchar2(10),
col469 varchar2(10),
col470 varchar2(10),
col471 varchar2(10),
col472 varchar2(10),
col473 varchar2(10),
col474 varchar2(10),
col475 varchar2(10),
col476 varchar2(10),
col477 varchar2(10),
col478 varchar2(10),
col479 varchar2(10),
col480 varchar2(10),
col481 varchar2(10),
col482 varchar2(10),
col483 varchar2(10),
col484 varchar2(10),
col485 varchar2(10),
col486 varchar2(10),
col487 varchar2(10),
col488 varchar2(10),
col489 varchar2(10),
col490 varchar2(10),
col491 varchar2(10),
col492 varchar2(10),
col493 varchar2(10),
col494 varchar2(10),
col495 varchar2(10),
col496 varchar2(10),
col497 varchar2(10),
col498 varchar2(10),
col499 varchar2(10),
col500 varchar2(10),
col501 varchar2(10),
col502 varchar2(10),
col503 varchar2(10),
col504 varchar2(10),
col505 varchar2(10),
col506 varchar2(10),
col507 varchar2(10),
col508 varchar2(10),
col509 varchar2(10),
col510 varchar2(10),
col511 varchar2(10),
col512 varchar2(10),
col513 varchar2(10),
col514 varchar2(10),
col515 varchar2(10),
col516 varchar2(10),
col517 varchar2(10),
col518 varchar2(10),
col519 varchar2(10),
col520 varchar2(10),
col521 varchar2(10),
col522 varchar2(10),
col523 varchar2(10),
col524 varchar2(10),
col525 varchar2(10),
col526 varchar2(10),
col527 varchar2(10),
col528 varchar2(10),
col529 varchar2(10),
col530 varchar2(10),
col531 varchar2(10),
col532 varchar2(10),
col533 varchar2(10),
col534 varchar2(10),
col535 varchar2(10),
col536 varchar2(10),
col537 varchar2(10),
col538 varchar2(10),
col539 varchar2(10),
col540 varchar2(10),
col541 varchar2(10),
col542 varchar2(10),
col543 varchar2(10),
col544 varchar2(10),
col545 varchar2(10),
col546 varchar2(10),
col547 varchar2(10),
col548 varchar2(10),
col549 varchar2(10),
col550 varchar2(10),
col551 varchar2(10),
col552 varchar2(10),
col553 varchar2(10),
col554 varchar2(10),
col555 varchar2(10),
col556 varchar2(10),
col557 varchar2(10),
col558 varchar2(10),
col559 varchar2(10),
col560 varchar2(10),
col561 varchar2(10),
col562 varchar2(10),
col563 varchar2(10),
col564 varchar2(10),
col565 varchar2(10),
col566 varchar2(10),
col567 varchar2(10),
col568 varchar2(10),
col569 varchar2(10),
col570 varchar2(10),
col571 varchar2(10),
col572 varchar2(10),
col573 varchar2(10),
col574 varchar2(10),
col575 varchar2(10),
col576 varchar2(10),
col577 varchar2(10),
col578 varchar2(10),
col579 varchar2(10),
col580 varchar2(10),
col581 varchar2(10),
col582 varchar2(10),
col583 varchar2(10),
col584 varchar2(10),
col585 varchar2(10),
col586 varchar2(10),
col587 varchar2(10),
col588 varchar2(10),
col589 varchar2(10),
col590 varchar2(10),
col591 varchar2(10),
col592 varchar2(10),
col593 varchar2(10),
col594 varchar2(10),
col595 varchar2(10),
col596 varchar2(10),
col597 varchar2(10),
col598 varchar2(10),
col599 varchar2(10),
col600 varchar2(10),
col601 varchar2(10),
col602 varchar2(10),
col603 varchar2(10),
col604 varchar2(10),
col605 varchar2(10),
col606 varchar2(10),
col607 varchar2(10),
col608 varchar2(10),
col609 varchar2(10),
col610 varchar2(10),
col611 varchar2(10),
col612 varchar2(10),
col613 varchar2(10),
col614 varchar2(10),
col615 varchar2(10),
col616 varchar2(10),
col617 varchar2(10),
col618 varchar2(10),
col619 varchar2(10),
col620 varchar2(10),
col621 varchar2(10),
col622 varchar2(10),
col623 varchar2(10),
col624 varchar2(10),
col625 varchar2(10),
col626 varchar2(10),
col627 varchar2(10),
col628 varchar2(10),
col629 varchar2(10),
col630 varchar2(10),
col631 varchar2(10),
col632 varchar2(10),
col633 varchar2(10),
col634 varchar2(10),
col635 varchar2(10),
col636 varchar2(10),
col637 varchar2(10),
col638 varchar2(10),
col639 varchar2(10),
col640 varchar2(10),
col641 varchar2(10),
col642 varchar2(10),
col643 varchar2(10),
col644 varchar2(10),
col645 varchar2(10),
col646 varchar2(10),
col647 varchar2(10),
col648 varchar2(10),
col649 varchar2(10),
col650 varchar2(10),
col651 varchar2(10),
col652 varchar2(10),
col653 varchar2(10),
col654 varchar2(10),
col655 varchar2(10),
col656 varchar2(10),
col657 varchar2(10),
col658 varchar2(10),
col659 varchar2(10),
col660 varchar2(10),
col661 varchar2(10),
col662 varchar2(10),
col663 varchar2(10),
col664 varchar2(10),
col665 varchar2(10),
col666 varchar2(10),
col667 varchar2(10),
col668 varchar2(10),
col669 varchar2(10),
col670 varchar2(10),
col671 varchar2(10),
col672 varchar2(10),
col673 varchar2(10),
col674 varchar2(10),
col675 varchar2(10),
col676 varchar2(10),
col677 varchar2(10),
col678 varchar2(10),
col679 varchar2(10),
col680 varchar2(10),
col681 varchar2(10),
col682 varchar2(10),
col683 varchar2(10),
col684 varchar2(10),
col685 varchar2(10),
col686 varchar2(10),
col687 varchar2(10),
col688 varchar2(10),
col689 varchar2(10),
col690 varchar2(10),
col691 varchar2(10),
col692 varchar2(10),
col693 varchar2(10),
col694 varchar2(10),
col695 varchar2(10),
col696 varchar2(10),
col697 varchar2(10),
col698 varchar2(10),
col699 varchar2(10),
col700 varchar2(10),
col701 varchar2(10),
col702 varchar2(10),
col703 varchar2(10),
col704 varchar2(10),
col705 varchar2(10),
col706 varchar2(10),
col707 varchar2(10),
col708 varchar2(10),
col709 varchar2(10),
col710 varchar2(10),
col711 varchar2(10),
col712 varchar2(10),
col713 varchar2(10),
col714 varchar2(10),
col715 varchar2(10),
col716 varchar2(10),
col717 varchar2(10),
col718 varchar2(10),
col719 varchar2(10),
col720 varchar2(10),
col721 varchar2(10),
col722 varchar2(10),
col723 varchar2(10),
col724 varchar2(10),
col725 varchar2(10),
col726 varchar2(10),
col727 varchar2(10),
col728 varchar2(10),
col729 varchar2(10),
col730 varchar2(10),
col731 varchar2(10),
col732 varchar2(10),
col733 varchar2(10),
col734 varchar2(10),
col735 varchar2(10),
col736 varchar2(10),
col737 varchar2(10),
col738 varchar2(10),
col739 varchar2(10),
col740 varchar2(10),
col741 varchar2(10),
col742 varchar2(10),
col743 varchar2(10),
col744 varchar2(10),
col745 varchar2(10),
col746 varchar2(10),
col747 varchar2(10),
col748 varchar2(10),
col749 varchar2(10),
col750 varchar2(10),
col751 varchar2(10),
col752 varchar2(10),
col753 varchar2(10),
col754 varchar2(10),
col755 varchar2(10),
col756 varchar2(10),
col757 varchar2(10),
col758 varchar2(10),
col759 varchar2(10),
col760 varchar2(10),
col761 varchar2(10),
col762 varchar2(10),
col763 varchar2(10),
col764 varchar2(10),
col765 varchar2(10),
col766 varchar2(10),
col767 varchar2(10),
col768 varchar2(10),
col769 varchar2(10),
col770 varchar2(10),
col771 varchar2(10),
col772 varchar2(10),
col773 varchar2(10),
col774 varchar2(10),
col775 varchar2(10),
col776 varchar2(10),
col777 varchar2(10),
col778 varchar2(10),
col779 varchar2(10),
col780 varchar2(10),
col781 varchar2(10),
col782 varchar2(10),
col783 varchar2(10),
col784 varchar2(10),
col785 varchar2(10),
col786 varchar2(10),
col787 varchar2(10),
col788 varchar2(10),
col789 varchar2(10),
col790 varchar2(10),
col791 varchar2(10),
col792 varchar2(10),
col793 varchar2(10),
col794 varchar2(10),
col795 varchar2(10),
col796 varchar2(10),
col797 varchar2(10),
col798 varchar2(10),
col799 varchar2(10),
col800 varchar2(10),
col801 varchar2(10),
col802 varchar2(10),
col803 varchar2(10),
col804 varchar2(10),
col805 varchar2(10),
col806 varchar2(10),
col807 varchar2(10),
col808 varchar2(10),
col809 varchar2(10),
col810 varchar2(10),
col811 varchar2(10),
col812 varchar2(10),
col813 varchar2(10),
col814 varchar2(10),
col815 varchar2(10),
col816 varchar2(10),
col817 varchar2(10),
col818 varchar2(10),
col819 varchar2(10),
col820 varchar2(10),
col821 varchar2(10),
col822 varchar2(10),
col823 varchar2(10),
col824 varchar2(10),
col825 varchar2(10),
col826 varchar2(10),
col827 varchar2(10),
col828 varchar2(10),
col829 varchar2(10),
col830 varchar2(10),
col831 varchar2(10),
col832 varchar2(10),
col833 varchar2(10),
col834 varchar2(10),
col835 varchar2(10),
col836 varchar2(10),
col837 varchar2(10),
col838 varchar2(10),
col839 varchar2(10),
col840 varchar2(10),
col841 varchar2(10),
col842 varchar2(10),
col843 varchar2(10),
col844 varchar2(10),
col845 varchar2(10),
col846 varchar2(10),
col847 varchar2(10),
col848 varchar2(10),
col849 varchar2(10),
col850 varchar2(10),
col851 varchar2(10),
col852 varchar2(10),
col853 varchar2(10),
col854 varchar2(10),
col855 varchar2(10),
col856 varchar2(10),
col857 varchar2(10),
col858 varchar2(10),
col859 varchar2(10),
col860 varchar2(10),
col861 varchar2(10),
col862 varchar2(10),
col863 varchar2(10),
col864 varchar2(10),
col865 varchar2(10),
col866 varchar2(10),
col867 varchar2(10),
col868 varchar2(10),
col869 varchar2(10),
col870 varchar2(10),
col871 varchar2(10),
col872 varchar2(10),
col873 varchar2(10),
col874 varchar2(10),
col875 varchar2(10),
col876 varchar2(10),
col877 varchar2(10),
col878 varchar2(10),
col879 varchar2(10),
col880 varchar2(10),
col881 varchar2(10),
col882 varchar2(10),
col883 varchar2(10),
col884 varchar2(10),
col885 varchar2(10),
col886 varchar2(10),
col887 varchar2(10),
col888 varchar2(10),
col889 varchar2(10),
col890 varchar2(10),
col891 varchar2(10),
col892 varchar2(10),
col893 varchar2(10),
col894 varchar2(10),
col895 varchar2(10),
col896 varchar2(10),
col897 varchar2(10),
col898 varchar2(10),
col899 varchar2(10),
col900 varchar2(10),
col901 varchar2(10),
col902 varchar2(10),
col903 varchar2(10),
col904 varchar2(10),
col905 varchar2(10),
col906 varchar2(10),
col907 varchar2(10),
col908 varchar2(10),
col909 varchar2(10),
col910 varchar2(10),
col911 varchar2(10),
col912 varchar2(10),
col913 varchar2(10),
col914 varchar2(10),
col915 varchar2(10),
col916 varchar2(10),
col917 varchar2(10),
col918 varchar2(10),
col919 varchar2(10),
col920 varchar2(10),
col921 varchar2(10),
col922 varchar2(10),
col923 varchar2(10),
col924 varchar2(10),
col925 varchar2(10),
col926 varchar2(10),
col927 varchar2(10),
col928 varchar2(10),
col929 varchar2(10),
col930 varchar2(10),
col931 varchar2(10),
col932 varchar2(10),
col933 varchar2(10),
col934 varchar2(10),
col935 varchar2(10),
col936 varchar2(10),
col937 varchar2(10),
col938 varchar2(10),
col939 varchar2(10),
col940 varchar2(10),
col941 varchar2(10),
col942 varchar2(10),
col943 varchar2(10),
col944 varchar2(10),
col945 varchar2(10),
col946 varchar2(10),
col947 varchar2(10),
col948 varchar2(10),
col949 varchar2(10),
col950 varchar2(10),
col951 varchar2(10),
col952 varchar2(10),
col953 varchar2(10),
col954 varchar2(10),
col955 varchar2(10),
col956 varchar2(10),
col957 varchar2(10),
col958 varchar2(10),
col959 varchar2(10),
col960 varchar2(10),
col961 varchar2(10),
col962 varchar2(10),
col963 varchar2(10),
col964 varchar2(10),
col965 varchar2(10),
col966 varchar2(10),
col967 varchar2(10),
col968 varchar2(10),
col969 varchar2(10),
col970 varchar2(10),
col971 varchar2(10),
col972 varchar2(10),
col973 varchar2(10),
col974 varchar2(10),
col975 varchar2(10),
col976 varchar2(10),
col977 varchar2(10),
col978 varchar2(10),
col979 varchar2(10),
col980 varchar2(10),
col981 varchar2(10),
col982 varchar2(10),
col983 varchar2(10),
col984 varchar2(10),
col985 varchar2(10),
col986 varchar2(10),
col987 varchar2(10),
col988 varchar2(10),
col989 varchar2(10),
col990 varchar2(10),
col991 varchar2(10),
col992 varchar2(10),
col993 varchar2(10),
col994 varchar2(10),
col995 varchar2(10),
col996 varchar2(10),
col997 varchar2(10),
col998 varchar2(10),
col999 varchar2(10),
col1000 varchar2(10)
);

declare
  s_sql1 varchar2(32767);
  s_sql2 varchar2(32767);
begin
  -- Build the column list (col1, col2, ..., col1000) and a matching list of literals
  for i in 1..1000 loop
    s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9999') || ',';
    s_sql2 := s_sql2 || '''BLABLA'',';
  end loop;

  s_sql1 := rtrim(s_sql1, ',');
  s_sql2 := rtrim(s_sql2, ',');

  s_sql1 := 'insert into many_cols (' || s_sql1 || ') values (' || s_sql2 || ')';

  -- Populate the wide table with 10,000 fully filled rows
  for i in 1..10000 loop
    execute immediate s_sql1;
  end loop;

  commit;
end;
/

select count(*) from many_cols;

exec dbms_stats.gather_table_stats(null, 'many_cols')

select blocks from user_tab_statistics where table_name = 'MANY_COLS';

-- alter session set "_widetab_comp_enabled" = false;

alter table many_cols move compress basic;

exec dbms_stats.gather_table_stats(null, 'many_cols')

select blocks from user_tab_statistics where table_name = 'MANY_COLS';
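
As a quick sanity check the compression attributes recorded in the dictionary after the MOVE can be verified - a minimal query using the standard USER_TABLES columns:

select table_name, compression, compress_for
from user_tables
where table_name = 'MANY_COLS';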

I/O Benchmark Minor Update

Wed, 2019-05-01 16:44
I've recently published a new version 1.03 of the I/O benchmark scripts on my github repository (ideally pick the IO_BENCHMARK.ZIP containing all the scripts). The original post including some more instructions can be found here, and there is also a video on my YouTube channel explaining how to use the benchmark scripts.

The main change is a new version of the "Write IOPS" benchmark that should scale much better than the older version.

There are now actually two variants of the "max_write_iops_benchmark_slave.sql" script: the one used by default is based on a batch SQL update, whereas the "max_write_iops_benchmark_slave_forall.sql" script achieves the same via a PL/SQL FORALL update. In my tests the two performed quite similarly, but I've included both so you can test which one works better for you - just rename the scripts accordingly.
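
To illustrate the difference between the two approaches, here is a minimal sketch - not the actual slave script code, and the table / column names (io_bench_t, id, val) are made up for illustration:

-- Variant 1: a single batch SQL update touching many rows / blocks at once
update io_bench_t
set val = val + 1
where id <= 100000;

commit;

-- Variant 2: a PL/SQL FORALL update achieving the same via array binding
declare
  type t_id_tab is table of io_bench_t.id%type;
  a_ids t_id_tab;
begin
  select id bulk collect into a_ids from io_bench_t where id <= 100000;

  forall i in 1..a_ids.count
    update io_bench_t set val = val + 1 where id = a_ids(i);

  commit;
end;
/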


In order to max out "Write IOPS" I suggest you create objects that are smaller than the corresponding buffer cache so they can be cached entirely, and set FAST_START_MTTR_TARGET to 1 to maximize the write pressure on the DBWR process(es). The Online Redo Logs should be sized adequately to avoid bottlenecks in that area. The script is designed to minimize redo generation and maximize the number of modified blocks that then have to be written by DBWR.
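
For example, checking the size of the default buffer cache (which the benchmark objects should stay below) and enforcing aggressive incremental checkpointing could look like this - a sketch, to be adjusted to your configuration:

-- Size of the default buffer cache in MB
select component, current_size / 1024 / 1024 as size_mb
from v$sga_dynamic_components
where component = 'DEFAULT buffer cache';

-- Maximize the write pressure on DBWR via aggressive incremental checkpointing
alter system set fast_start_mttr_target = 1;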


You could still run the script in a mixed read/write IOPS mode if you create objects larger than the buffer cache - in which case there can be additional pressure on the DBWR if there are no free buffers left for reading in new blocks ("free buffer waits"). I've also already used the "Read IOPS" and "Write IOPS" benchmark scripts simultaneously to successfully maximize both read and write IOPS.


There is still the problem - at least in 12.2 (and I think it's still there in 18c, but I'm not entirely sure off the top of my head) - that the PDB level AWR reports don't properly cover the DBWR "Write I/O" related figures. So although the script specifically detects that it runs on 12.2+ on PDB level and creates AWR PDB reports accordingly, you won't get any useful "Write IOPS" results and would have to either run the benchmark on CDB level or create CDB level AWR snapshots accordingly.


The interactive scripts now also echo the command finally used to execute the actual benchmark script, which can be helpful if you don't want to go through the interactive script again and again for repeated executions. Of course you would still need to take care of dropping / creating / keeping a suitable schema in which to execute the benchmark, and maybe also modify the scripts so that they don't keep creating and dropping the objects if you want to have multiple runs with the same object size / concurrency settings.

I'm thinking about a version 2.0 of the I/O benchmark scripts that should be improved in various aspects: the "interactive" script should become much more user friendly, with improved defaults, abbreviations that can be entered, and more options like keeping the schema / objects. Also the measurement of the IOPS / throughput should be improved by monitoring the figures continuously, which should provide a much better picture of the performance over time (varying IOPS rates for example). The core I/O benchmark scripts seem to be working pretty well (now that the write IOPS part is improved), so I don't see much need for improvement there. Maybe an option to execute the benchmark in a kind of loop with increasing object sizes / concurrency levels might also be useful.

Bloom Filter Efficiency And Cardinality Estimates

Tue, 2019-04-23 18:45
I've recently come across an interesting observation I haven't seen documented yet, so I'm publishing a simple example here to demonstrate the issue.

In principle it looks like the efficiency of Bloom Filter operations depends on the optimizer's cardinality estimates. This means that in particular cardinality under-estimates can make a dramatic difference in how efficiently a corresponding Bloom Filter operation based on such an estimate works at runtime. Since Bloom Filters are crucial for efficient processing, in particular when using Exadata or the In-Memory column store, this can have significant impact on the performance of affected operations.

While other operations based on SQL workareas, like hash joins for example, can be affected by such cardinality mis-estimates, too, those seem to be capable of adapting at runtime - at least to a certain degree. However, I haven't seen such adaptive behaviour from Bloom Filter operations at runtime (not even when executing the same statement multiple times, with statistics feedback not kicking in).

To demonstrate the issue I'll create two simple tables that get joined and one of them gets a filter applied:

create table t1 parallel 4 nologging compress
as
with
generator1 as
(
  select /*+
           cardinality(1e3)
           materialize
         */
         rownum as id
       , rpad('x', 100) as filler
  from
         dual
  connect by
         level <= 1e3
),
generator2 as
(
  select /*+
           cardinality(1e4)
           materialize
         */
         rownum as id
       , rpad('x', 100) as filler
  from
         dual
  connect by
         level <= 1e4
)
select
       id
     , id as id2
     , rpad('x', 100) as filler
from (
       select /*+ leading(b a) */
              (a.id - 1) * 1e4 + b.id as id
       from
              generator1 a
            , generator2 b
     )
;

alter table t1 noparallel;

create table t2 parallel 4 nologging compress as select * from t1;

alter table t2 noparallel;

All I did here is create two tables with 10 million rows each, and I'll look at the runtime statistics of the following query:

select /*+ no_merge(x) */ * from (
  select /*+
           leading(t1)
           use_hash(t2)
           px_join_filter(t2)
           opt_estimate(table t1 rows=1)
           --opt_estimate(table t1 rows=250000)
           monitor
         */
         t1.id
       , t2.id2
  from
         t1
       , t2
  where
         mod(t1.id2, 40) = 0
         -- t1.id2 between 1 and 250000
         and t1.id = t2.id
) x
where rownum > 1;

Note: If you try to reproduce this, make sure you actually get a Bloom Filter operation - in an unpatched version 12.1.0.2 I had to add a PARALLEL(2) hint to get the Bloom Filter operation at all.

The query filters on T1 so that 250K rows will be returned and then joins to T2. The first interesting observation regarding the efficiency of the Bloom Filter is that the actual data pattern makes a significant difference: When using the commented-out filter "T1.ID2 BETWEEN 1 AND 250000" the resulting cardinality will be the same as with "MOD(T1.ID2, 40) = 0", but the former results in perfect filtering by the Bloom Filter regardless of the OPT_ESTIMATE hint used, whereas with the latter the efficiency differs dramatically.
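
To see that both predicates indeed identify the same number of rows (10M / 40 = 250K), a quick check like the following can be used - they just select very different value patterns, every 40th value vs. one dense, contiguous range:

select count(*) from t1 where mod(id2, 40) = 0;

select count(*) from t1 where id2 between 1 and 250000;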

This is what I get when using version 18.3 (12.1.0.2 showed very similar results) and forcing the under-estimate using the OPT_ESTIMATE ROWS=1 hint - the output is from my XPLAN_ASH script, edited for brevity:

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Execs | A-Rows | PGA |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 | 0 | |
| 1 | COUNT | | | | 1 | 0 | |
|* 2 | FILTER | | | | 1 | 0 | |
| 3 | VIEW | | 1 | 26 | 1 | 250K | |
|* 4 | HASH JOIN | | 1 | 24 | 1 | 250K | 12556K |
| 5 | JOIN FILTER CREATE| :BF0000 | 1 | 12 | 1 | 250K | |
|* 6 | TABLE ACCESS FULL| T1 | 1 | 12 | 1 | 250K | |
| 7 | JOIN FILTER USE | :BF0000 | 10M| 114M| 1 | 10000K | |
|* 8 | TABLE ACCESS FULL| T2 | 10M| 114M| 1 | 10000K | |
------------------------------------------------------------------------------------

The Bloom Filter didn't help much - only a few rows were actually filtered (otherwise my XPLAN_ASH script would have shown "10M" as actual cardinality instead of "10000K", which is something slightly less than 10M rounded up).

Repeating the same test, this time using the OPT_ESTIMATE ROWS=250000 hint:

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Execs | A-Rows| PGA |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 1 | 0 | |
| 1 | COUNT | | | | | 1 | 0 | |
|* 2 | FILTER | | | | | 1 | 0 | |
| 3 | VIEW | | 252K| 6402K| | 1 | 250K | |
|* 4 | HASH JOIN | | 252K| 5909K| 5864K| 1 | 250K | 12877K |
| 5 | JOIN FILTER CREATE| :BF0000 | 250K| 2929K| | 1 | 250K | |
|* 6 | TABLE ACCESS FULL| T1 | 250K| 2929K| | 1 | 250K | |
| 7 | JOIN FILTER USE | :BF0000 | 10M| 114M| | 1 | 815K | |
|* 8 | TABLE ACCESS FULL| T2 | 10M| 114M| | 1 | 815K | |
-------------------------------------------------------------------------------------------

So we end up with exactly the same execution plan but the efficiency of the Bloom Filter at runtime has changed dramatically due to the different cardinality estimate the Bloom Filter is based on.

I haven't spent much time yet with the corresponding undocumented parameters that might influence the Bloom Filter behaviour, but when I repeated the test with the following setting in the session (and an adequate PGA_AGGREGATE_TARGET setting in place - otherwise the hash join might start spilling to disk, since the Bloom Filter size is considered when calculating SQL workarea sizes):

alter session set "_bloom_filter_size" = 1000000;

I got the following result:

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Execs | A-Rows| PGA |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 | 0 | |
| 1 | COUNT | | | | 1 | 0 | |
|* 2 | FILTER | | | | 1 | 0 | |
| 3 | VIEW | | 1 | 26 | 1 | 250K | |
|* 4 | HASH JOIN | | 1 | 24 | 1 | 250K | 12568K |
| 5 | JOIN FILTER CREATE| :BF0000 | 1 | 12 | 1 | 250K | |
|* 6 | TABLE ACCESS FULL| T1 | 1 | 12 | 1 | 250K | |
| 7 | JOIN FILTER USE | :BF0000 | 10M| 114M| 1 | 815K | |
|* 8 | TABLE ACCESS FULL| T2 | 10M| 114M| 1 | 815K | |
-----------------------------------------------------------------------------------

which shows a slightly increased PGA usage compared to the first output but the same efficiency as when having the better cardinality estimate in place.

However, increasing the size further I couldn't convince Oracle to make the Bloom Filter even more efficient, not even with the better cardinality estimate in place.

Summary

Obviously the efficiency / internal sizing of the Bloom Filter vector at runtime depends on the cardinality estimates of the optimizer. Depending on the actual data pattern this can make a significant difference in terms of efficiency. Yet another reason why good cardinality estimates matter - and why they are sometimes so hard to achieve, in particular for join cardinalities.

Footnote

On MyOracleSupport I've found the following note regarding Bloom Filter efficiency:

Bug 8932139 - Bloom filtering efficiency is inversely proportional to DOP (Doc ID 8932139.8)

Another interesting detail: the bug is only fixed in version 19.1, but the fix is also included in the latest RU(R)s of 18c and 12.2 from January 2019 onwards.

Chinar Aliyev's Blog

Tue, 2019-04-23 17:04
Chinar Aliyev has recently started to pick up on several of my blog posts regarding Parallel Execution and the corresponding new features introduced in Oracle 12c.

It is good to see that Oracle has obviously improved some of these since then and added new ones as well.

Here are some links to the corresponding posts:

New automatic Parallel Outer Join Null Handling in 18c

Improvements regarding automatic parallel distribution skew handling in 18c

Chinar has also put some more thoughts on the HASH JOIN BUFFERED operation:

New thoughts about the HASH JOIN BUFFERED operation

There are also a number of posts on his blog regarding histograms, and in particular how to properly calculate the join cardinality in the presence of additional filters and resulting skew - a very interesting topic that is yet to be handled properly by the optimizer, even in the latest versions.

Speaking At DOAG 2018 Conference And IT Tage 2018

Wed, 2018-07-18 15:29
I will be speaking at the yearly DOAG conference in December as well as at the IT Tage in November. My talk will be "Oracle Optimizer System Statistics Update 2018" where I summarize the history and current state of affairs regarding System Statistics and I/O calibration in recent Oracle versions like 12c and 18c.

Speaking At DOAG 2018 Exa & Middleware Days In Frankfurt

Sun, 2018-04-08 07:45
I will be speaking at the DOAG 2018 Exa & Middleware Days in Frankfurt on June 18th and 19th. My talk will be "Exadata & InMemory Real World Performance" where I discuss the different performance improvements you can expect from the super fast scans delivered by those technologies depending on the actual work profile of the SQL and data used.

Hope to see you there!

No Asynchronous I/O When Using Shared Server (Also Known As MTS)

Wed, 2018-03-14 10:55
I've recently had a case at a client where it was questioned why a particular application was seemingly not making full use of the available I/O capabilities - in comparison to other databases / applications using similar storage.

Basically it ended up in a kind of finger pointing between the application vendor and the IT DBA / storage admins: one side saying that the infrastructure offered insufficient I/O capabilities (since the most important application tasks were dominated by I/O waits in the database), the other side saying that the application didn't make use of the I/O capabilities offered - compared to other databases / applications that showed a significantly higher IOPS rate and/or I/O throughput using the same kind of storage.

In the end it turned out that in some way both sides were right, because the application made use of a somewhat unusual configuration for batch processing: Due to very slow Dedicated Server connection establishment that slowed down some other, more interactive parts of the application, the database and client connection strings were configured to use Shared Server connections by default for all parts of the application. This successfully solved the connection establishment problem, but obviously introduced another, so far unrecognized problem.

Using my recently published I/O benchmark we performed some tests to measure the maximum IOPS and throughput of that specific database independently from the application, and the results were surprising: in particular the test variations that were supposed to perform asynchronous physical single block reads ("db file parallel read") and multi block I/O ("direct path read") didn't do so, but showed synchronous I/O only ("db file sequential read" / "db file scattered read").

After some investigation it became obvious that the reason for this behaviour was the use of the Shared Server architecture - simply switching to Dedicated Server sessions showed the expected behaviour, and also a significantly higher maximum IOPS rate and I/O throughput at the same level of concurrency.

It's very easy to reproduce this, using for example my read IOPS and throughput benchmark scripts and running the benchmark in asynchronous I/O mode using either the Shared or the Dedicated Server architecture.
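
Switching between the two architectures for such a test can be as simple as using different connect strings - a sketch, assuming a service name ORCL on host dbhost and the default listener port:

-- EZConnect: the server type suffix requests a Dedicated respectively Shared Server connection
sqlplus user/password@//dbhost:1521/ORCL:dedicated
sqlplus user/password@//dbhost:1521/ORCL:shared

-- Or via the SERVER clause in the TNS connect descriptor
ORCL_SHARED =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL)(SERVER = SHARED))
  )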

For example, this is what I get running this on my laptop using Dedicated Server and testing maximum read I/O throughput in asynchronous I/O mode (which should result in "direct path read" operations bypassing the buffer cache):


If I repeat exactly the same test (same settings, number of processes, size of objects etc.) using Shared Server architecture, this is what I get:


This is particularly interesting - no direct path reads, although the benchmark sessions in this case set "_serial_direct_read" = 'always'.

In principle the same can be seen when running the maximum read IOPS benchmark, here is the expected result when using Dedicated Servers in asynchronous I/O mode:


And again, this is what I get when running the same test using Shared Servers:


Again no sign of asynchronous I/O ("db file parallel read") - and the achieved IOPS rate is significantly lower, which is exactly what the client experienced: much more DB time waiting for I/O and less time spent on CPU than expected. Depending on the particular storage configuration and latency, the difference at the same number of processes can be even more significant - I've seen up to a factor of 20 difference in the achieved IOPS rate, and a factor of 3 to 4 is quite typical. Of course, as usual, how relevant this really is depends on the actual query, execution plan and data pattern, and where most of the time is spent.
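
A simple way to verify which I/O related wait events a benchmark session actually experiences is a query like the following - a sketch, filtering on the SID of the session of interest:

-- Asynchronous code paths show up as "db file parallel read" / "direct path read",
-- synchronous ones as "db file sequential read" / "db file scattered read"
select event
     , total_waits
     , round(time_waited_micro / 1e6, 1) as time_waited_secs
from   v$session_event
where  sid = &benchmark_sid
and    (event like 'db file%' or event like 'direct path read%')
order by time_waited_micro desc;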

Also, it is important to point out that all this might not be too relevant to most configurations, since Shared Servers aren't used that much nowadays, with application servers / connection pools dominating typical environments. Also, it's probably rather uncommon to use Shared Servers for batch processing tasks the way this client did.

Although Shared Servers were originally designed for typical OLTP applications with many open sessions that are idle most of the time and perform mostly very simple and straightforward operations (like looking up a few rows via an efficient index access path), it's still interesting to see this limitation, which can be quite significant depending on the I/O patterns used. So far I've not seen this documented anywhere, nor could I find any matching notes on MyOracleSupport describing the behaviour, and unfortunately no Service Request has been opened for the issue yet.

When using Parallel Execution, by the way - also very unlikely in combination with Shared Servers but perfectly possible (the Shared Server session then acts as Query Coordinator) - the Parallel Servers can make use of asynchronous I/O, so the limitation only seems to apply to the Shared Server session itself. I can think of some reasons why this limitation could be explained from an implementation point of view by the way Shared Servers work, but this is of course only speculation.

All currently relevant versions show the same behaviour in that regard, by the way. I've reproduced this on 11.2.0.4, 12.1.0.2 and 12.2.0.1 on Windows as well as on Linux.

Oracle Database Physical I/O IOPS And Throughput Benchmark

Wed, 2018-03-07 11:38
General Information

I've used a similar set of scripts quite a few times now to provide feedback to customers that wanted to get a more realistic picture of the I/O capabilities from a database perspective, rather than relying on artificial benchmarks performed outside - or even inside, think of I/O calibration - the database. Although there are already several Oracle benchmark toolkits available that run inside the database, like Swingbench and in particular SLOB, I was looking for a very simplistic and straightforward I/O benchmark that comes with minimum overhead and in principle allows maximizing I/O from the database perspective - using regular Oracle database code paths based on SQL execution plans, performing the I/O calls as part of that, involving the database buffer cache where applicable - and in particular supports focusing on specific I/O patterns (which might be unique to this benchmark toolkit). Therefore I've developed a set of simple scripts that support in total four different I/O tests:

- Single and multi block physical reads

- Single and multi block physical writes

In case of the physical read tests, support for synchronous and asynchronous reads is provided, which can be controlled via a corresponding input parameter to the script.

In terms of instrumentation / internal implementation this corresponds to:

- synchronous single block reads: "db file sequential read" / "cell single block physical read" on Exadata

- asynchronous single block reads: "db file parallel read" / "cell list of blocks physical read" on Exadata

- synchronous multi block reads: "db file scattered read" / "cell multiblock physical read" on Exadata

- asynchronous multi block reads: "direct path read" / "cell smart table/index scan" on Exadata

The physical write tests should mainly trigger "db file parallel write" in case of the single block writes and "direct path write" in case of the multi block writes. Of course when it comes to DML activity things get more complicated in terms of the actual database activity triggered, in particular the additional undo / redo / archiving / potential standby activity. Note that if you're using an auto-extensible Undo tablespace configuration, running the single block physical write tests might increase the size of your Undo tablespace - you have been warned.

So each I/O test generates a specific I/O pattern (except for the single block write test, which can also trigger synchronous single block reads - see the script for more details). The basic idea is to run just one of the tests to maximize the specific I/O pattern, but of course nothing stops you from running several of the tests at the same time (this would require a different schema for each test, because otherwise the object names dropped / created will collide / overlap), which will result in a mixture of I/O patterns. There is no synchronisation though in terms of starting / stopping / generating performance report snapshots etc. when running multiple of these tests at the same time, so you would probably have to take care of that yourself. So far I've not tested this, so it might not work out as just described.

In case of the physical read tests (except for the asynchronous "direct path read" variant, which bypasses the buffer cache by definition), using too small objects / a too large buffer cache can turn this into a CPU / logical I/O test instead. So in principle you could use those tests to generate mainly CPU load (and provided the physical I/O is quick enough the CPU usage will be significant anyway), but that is not the intended usage of the scripts.

The scripts allow control over the size of the objects created and also support placing them in specific buffer caches via the STORAGE clause (like the RECYCLE or KEEP cache), so it is up to you to create objects of a suitable size depending on your configuration and intended usage.
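
For illustration, placing an object in the KEEP cache via the STORAGE clause looks like this (a sketch with a made-up table - the KEEP cache needs to be configured via DB_KEEP_CACHE_SIZE for this to have an effect):

-- Hypothetical benchmark table pinned to the KEEP buffer pool
create table io_bench_read
(
  id     number
, filler varchar2(100)
)
storage (buffer_pool keep);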

Usage

Please note - the scripts are freely available and come with no warranty at all - so please use at your own risk.

In principle the scripts can be downloaded from my github repository - ideally pick the IO_BENCHMARK.ZIP, which contains all required scripts and should simply be extracted into some directory. Since the four different I/O tests are so similar, there is a subdirectory "common" under "io_benchmark" that holds all the common script parts, and the main scripts then just call these common scripts where applicable.

The benchmark scripts consist of four similar sets:

max_read_iops_benchmark*: Single block reads synchronous / asynchronous

max_read_throughput_benchmark*: Multi block reads synchronous / asynchronous

max_write_iops_benchmark*: Single block writes - optionally mixed with synchronous single block reads (depends on object vs. buffer cache size)

max_write_throughput_benchmark*: Multi block direct writes

Each set consists of three scripts - an interactive guided script prompting for inputs used as parameters for the actual benchmark harness that in turn will launch another "slave" script as many times as desired to run the concurrent benchmark code.

There are in principle two different ways how the scripts can be used:

1. For each set there is a script that is supposed to be used from a SYSDBA account and guides through the different options available (*interactive). It will drop and re-create a schema to be used for the benchmark and grant the minimum privileges required to create the objects and run the benchmark. At the end of this interactive script it will connect as the user just created and run the benchmark. You can also use this script to clean-up afterwards, which is dropping the user created and stopping the script at that point.

2. The interactive script just calls the main benchmark harness with the parameters specified, so if you already have everything in place (check the "interactive_create_user.sql" in the "common" script subdirectory for details what privileges are required) to run the benchmark you can simply connect as the intended user, call the actual benchmark script and specify the parameters as desired - it will use defaults for any parameter not explicitly specified - check the script header for more details. Please note that I haven't tested running the actual benchmark as SYS respectively SYSDBA and I wouldn't recommend doing so. Instead use a dedicated user / schema as created by the interactive script part.

The third script of each set is the "slave" script, called as many times concurrently as specified to perform the actual benchmark activity.

The scripts will generate objects, typically as part of the preparation steps before the actual concurrent benchmark activity starts, or in case of the multi block write test, the object creation is the actual benchmark activity.

After the benchmark has run for the specified amount of time (600 seconds / 10 minutes by default) the "slaves" will be shut down (if they haven't done so automatically) and the corresponding information about the IOPS / throughput rate achieved will be shown, based on (G)V$SYSMETRIC_HISTORY. Hence at least 120 seconds of runtime are required to have this final query show something meaningful (to ensure that at least one 60 seconds interval is fully covered).
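
The kind of query used for this looks roughly like the following - a sketch, not the exact script code, using the standard read-related metrics of V$SYSMETRIC_HISTORY:

-- Average read IOPS / throughput over the most recent 60-second intervals
select metric_name
     , round(avg(value)) as avg_value
from   v$sysmetric_history
where  metric_name in ('Physical Read Total IO Requests Per Sec'
                     , 'Physical Read Total Bytes Per Sec')
and    group_id = 2 -- 60-second interval metrics
group by metric_name;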

In addition, the script by default will generate performance report snapshots (either AWR or STATSPACK) and display the corresponding report at the end. The file name generated describes the test performed along with the most important parameters (parallel degree, I/O mode (sync / async), object size, duration, timestamp etc.). Note that on Unix/Linux the script makes use of the "xdg-open" utility to open the generated report, so the "xdg-utils" package needs to be installed for this to work as intended.

Note that in 12.2.0.1 the PDB level reports and metrics seem to miss the "physical single block writes" performed by the DB Writer, so effectively evaluating / running this benchmark on PDB level in 12.2.0.1 won't report anything meaningful - you would have to resort to reports on CDB level instead, which I haven't implemented (actually I had to put in some effort to use the PDB level AWR reports and metrics in 12.2, so hopefully Oracle will fix this in future versions).

Finally the benchmark script will clean up and drop the objects created for the benchmark.

In principle the benchmark scripts should cope with all kinds of configurations: Windows / Unix / Linux, Single Instance / RAC, Standard / Enterprise Edition, PDB / Non-PDB, Non-Exadata / Exadata, and they support versions from 11.2.0.4 on. They might run on lower versions, too, but that's not tested, and of course 18c (12.2.0.2) is not available on premises yet at the time of writing, so not tested either.

But since this is a 1.0 version it obviously wasn't tested in all possible combinations / configurations / parameter settings, so expect some glitches. Feedback and ideas on how to improve it are welcome.

Where applicable, the benchmark harness script also generates two tables, EVENT_HISTOGRAM_MICRO1 and EVENT_HISTOGRAM_MICRO2, which are snapshots of GV$EVENT_HISTOGRAM_MICRO (available from 12.1 on) for synchronous single / multi block reads. The "harness" scripts provide a suitable query in the script header to display the latency histogram information nicely.
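
A diff of the two snapshots could look like this - a sketch assuming the snapshot tables have the same shape as GV$EVENT_HISTOGRAM_MICRO; the actual query provided in the script headers may differ:

-- Waits per microsecond bucket accumulated during the benchmark run
select s2.event
     , s2.wait_time_micro
     , s2.wait_count - nvl(s1.wait_count, 0) as wait_count_delta
from   event_histogram_micro2 s2
       left join event_histogram_micro1 s1
       on  s1.event           = s2.event
       and s1.wait_time_micro = s2.wait_time_micro
where  s2.event in ('db file sequential read', 'db file scattered read')
order by s2.event, s2.wait_time_micro;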

Happy I/O benchmarking!

DOAG 2017 and IT Tage 2017 Presentation Material

Wed, 2018-01-24 02:01
First of all, Happy New Year to everyone, and I want to thank the numerous people that attended my sessions "Calamities With Cardinalities" at DOAG 2017 and IT Tage 2017 last November and December.

You can find the slide deck here at slideshare.net.

Stay tuned for more publications this year - at least I have some interesting stuff upcoming.

DOAG Red Stack Magazin Artikelreihe "Oracle Database Cloud Performance" (German)

Wed, 2017-06-28 11:03
In the current issue of the DOAG Red Stack Magazin - free for members of DOAG, SOUG and AOUG - the first part of my two-part article series "Oracle Database Cloud Performance" has been published.

The article series is based on the findings in this area already published here and develops them further.

The first part covers the various aspects of the maximum achievable performance (CPU, storage etc.); the second part will appear in the next issue and focuses on performance consistency, that is, how consistently the cloud databases behaved in terms of performance during the tests.

New workshop "Exadata For Developers"

Wed, 2017-05-31 15:00
Just a short note that I've developed a new two day course covering all the relevant features a database application developer should know about when dealing with the Oracle Exadata Database Machine platform.

It covers in detail Smart Scans, the Exadata Flash Cache, Hybrid Columnar Compression and all surrounding features like Storage Indexes, (serial) direct path reads and so on. Of course it also includes features that were added in 12c, like Attribute Clustering and Zone Maps.

All features are presented with live demo scripts, and there will be enough time to discuss your specific questions and analyse existing applications if desired.

For more information and details, check the corresponding pages:

German: Exadata für Anwendungsentwickler

English: Exadata For Developers

Oracle Database Cloud (DBaaS) Performance - Part 4 - Network

Mon, 2017-02-06 01:00
In this last part of the series I'll have a brief look at the network performance measured in the Oracle DBaaS environment, in particular the network interface that gets used as private interconnect in case of a RAC configuration. The network performance could also be relevant when evaluating how to transfer data to the cloud database.

I've used the freely available "iperf" tool to measure the network bandwidth and got the following results:

[root@test12102rac2 ~]# iperf3 -c 10.196.49.126
Connecting to host 10.196.49.126, port 5201
[  4] local 10.196.49.130 port 41647 connected to 10.196.49.126 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   651 MBytes  5.46 Gbits/sec   15    786 KBytes
[  4]   1.00-2.00   sec   823 MBytes  6.90 Gbits/sec   11   1.07 MBytes
[  4]   2.00-3.00   sec   789 MBytes  6.62 Gbits/sec    7   1014 KBytes
[  4]   3.00-4.00   sec   700 MBytes  5.87 Gbits/sec   39   1.04 MBytes
[  4]   4.00-5.00   sec   820 MBytes  6.88 Gbits/sec   21    909 KBytes
[  4]   5.00-6.00   sec   818 MBytes  6.86 Gbits/sec   17   1.17 MBytes
[  4]   6.00-7.00   sec   827 MBytes  6.94 Gbits/sec   21   1005 KBytes
[  4]   7.00-8.00   sec   792 MBytes  6.64 Gbits/sec    8    961 KBytes
[  4]   8.00-9.00   sec   767 MBytes  6.44 Gbits/sec    4   1.11 MBytes
[  4]   9.00-10.00  sec   823 MBytes  6.91 Gbits/sec    6   1.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  7.63 GBytes  6.55 Gbits/sec  149             sender
[  4]   0.00-10.00  sec  7.63 GBytes  6.55 Gbits/sec                  receiver

iperf Done.

So the network bandwidth seems to be somewhere between 6 and 7 Gbits/sec, which is not too bad.

For completeness, the UDP results look like the following:

[root@test12102rac2 ~]# iperf3 -c 10.196.49.126 -u -b 10000M
Connecting to host 10.196.49.126, port 5201
[  4] local 10.196.49.130 port 55482 connected to 10.196.49.126 port 5201
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-1.00   sec   494 MBytes  4.14 Gbits/sec  63199
[  4]   1.00-2.00   sec   500 MBytes  4.20 Gbits/sec  64057
[  4]   2.00-3.00   sec   462 MBytes  3.87 Gbits/sec  59102
[  4]   3.00-4.00   sec   496 MBytes  4.16 Gbits/sec  63491
[  4]   4.00-5.00   sec   482 MBytes  4.05 Gbits/sec  61760
[  4]   5.00-6.00   sec   425 MBytes  3.57 Gbits/sec  54411
[  4]   6.00-7.00   sec   489 MBytes  4.10 Gbits/sec  62574
[  4]   7.00-8.00   sec   411 MBytes  3.45 Gbits/sec  52599
[  4]   8.00-9.00   sec   442 MBytes  3.71 Gbits/sec  56541
[  4]   9.00-10.00  sec   481 MBytes  4.04 Gbits/sec  61614
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec  4.57 GBytes  3.93 Gbits/sec  0.028 ms  23434/599340 (3.9%)
[  4] Sent 599340 datagrams

iperf Done.

Finally, "ping" results look like the following:

9665 packets transmitted, 9665 received, 0% packet loss, time 9665700ms
rtt min/avg/max/mdev = 0.135/0.308/199.685/3.322 ms

So an average latency of 0.3 ms also doesn't look too bad.

[Update 6.2.2017]: Thanks to Frits Hoogland who pointed out the very high "max" value for the ping. Although I didn't spot the pattern he saw in a different network test setup ("cross cloud platform") - an initial slowness - it's still worth pointing out the high "max" value of almost 200 ms for a ping. Also the "mdev" value of 3.322 ms seems to suggest that there were some significant variations in ping times that are potentially hidden behind the average values provided. I'll repeat the ping test and see if I can reproduce these outliers, and if yes, find out more details.

Oracle Database Cloud (DBaaS) Performance - Part 3 - Storage - 12.2 Update

Mon, 2017-01-30 01:00
Recently I repeated the I/O related tests on a 12.2.0.1 instance out of curiosity and was surprised by the fact that I consistently got significantly better results than on 11.2.0.4 and 12.1.0.2.

Now you're probably aware that version 12.2 so far is "cloud-only", so I can't tell / test whether version 12.2 generically provides that increased performance or whether Oracle has optimized the underlying stack, so that previous versions could also benefit from better performance if they ran on the same platform. Repeated tests with versions 11.2.0.4 and 12.1.0.2 confirmed the performance figures reported in the previous installment of this series, so as of the time of writing it's only version 12.2 that provides the improved I/O performance.

Note that as of the time of writing only a single instance configuration was supported with version 12.2, so I wasn't able to run the tests in RAC configuration.

Here are the 12.2 I/O related test results:

IOPS

Again running the test on a 4 OCPU single instance configuration (results in 8 CPUs / 8 cores as outlined previously) with eight sessions:


So that is more than 65,000 IOPS on average at 8 KB block size - significantly more than the corresponding (and already impressive) 40,000 IOPS seen in the previous versions - and even at 16 KB block size still more than 50,000 IOPS, which is more than 800 MB / sec in single block requests!

I/O Throughput

Repeating the same Parallel Execution based test that first creates a very large table (8 KB block size, "direct path write") and then re-reads it using Parallel Query ("direct path read") I got the following results on the 4 OCPU single instance configuration:


Again the results are significantly improved over previous versions. The read performance improved from 640 MB / sec to almost 940 MB / sec. More importantly however, the write performance improved from 120 MB / sec to 200 MB / sec - a level that should allow even more write intensive workloads to perform well.

I/O Latency

The same test as previously was run, switching from asynchronous I/O ("db file parallel read") to synchronous I/O ("db file sequential read") allowing measurement of single block reads, running with 8 threads on the 4 OCPU / 8 CPU / cores configuration at 8 KB block size.




Again a significant improvement across all figures: 0.375 ms average wait time vs. 0.45 ms previously, almost 19,000 IOPS vs. 16,380 IOPS, and more than 90% of the waits within 512 microseconds vs. 75% before.

Write Performance

With the significantly improved write throughput figures the expectation was that the random read + write test would perform much better than before, and it did:



In previous versions this test spent almost 50% of its time on "free buffer waits" (due to the minimized buffer cache), so clearly the DBWR couldn't keep up with writing dirty blocks to disk. The picture has changed significantly here, with "free buffer waits" going down to just 4.5% of the overall database time, and almost 5,000 write requests per second being performed (almost 20,000 IOPS in total).

Note that version 12.2 obviously introduces an optimization that no longer counts repeated modifications of the same kind to the same block (in this case updating a column of a row to the same value as before) as a "db block change", therefore I had to adjust the test to update the column value to a different value each time. Running the original test case showed a totally different profile, due to the minimized "db block changes".
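
The adjustment boils down to something like the following sketch (made-up table / column names - the point is that the new value must actually differ from the old one on every execution):

-- Original approach: writes the same value back - in 12.2 this no longer
-- counts as a "db block change" and hence generates no write pressure
update io_bench_t set val = val;

-- Adjusted approach: forces an actual change on every execution
update io_bench_t set val = mod(val + 1, 1000);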

Summary

The 12.2 version in the Oracle Cloud shows a significantly improved I/O performance profile compared to previous versions. So far it is unclear whether that is a generic improvement of the new release, or whether the underlying stack used by the virtual machines has been optimized.

The improved write performance in particular is an important step forward.

Oracle Database Cloud (DBaaS) Performance - Part 2 - Storage

Mon, 2017-01-23 01:00
In this second part of this installment I'll focus on the performance figures related to I/O encountered when the corresponding tests were performed on the platform.

IOPS

When running with a minimum sized buffer cache, direct and asynchronous I/O enabled, the following average read-only IOPS figures were measured over a period of several days (this is the test described in part three of the "performance consistency" series).

First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) with either four or eight sessions:


Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) with either four or eight sessions:


So for the 8 KB block size the single instance test shows an average of almost 40,000 IOPS for read-only tests, and the two instance RAC configuration even scales to almost 60,000 IOPS on average. These are pretty impressive IOPS figures for a general purpose shared / virtualized environment and - at least for the read part - way above what other DBaaS cloud providers offer out of the box.

It's also worth mentioning that I got the same IOPS results independent of the amount of storage allocated - in contrast to Amazon RDS for example, where even the "Provisioned IOPS" storage class requires you to allocate at least 3 TB of storage in order to get the maximum of 30,000 IOPS. I've repeated the same test setup with the table size inflated to the maximum possible within my test account limitations (total storage allocated close to the 1 TB storage quota) and still got IOPS results comparable to this test, which only allocates approx. 1 GB for the 8 KB block size test (total storage allocated approx. 100 GB).

I/O Throughput

In order to measure the maximum I/O throughput corresponding Parallel Execution statements were used to create tables of sufficient size and read them via Parallel Query.

First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) at a degree of 16:


Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) at a degree of 16:


So the single instance configuration writes at 120 MB per second and reads at 640 MB per second, whereas the two instance RAC configuration writes at 80 MB per second and reads at 1,100 MB per second. Clearly the storage layer is read optimized, and writes are much slower than reads. The read performance is very good however, and again above what other cloud providers deliver (for example Amazon RDS "Provisioned IOPS" offers a maximum throughput of 320 MB per second, and again only if a sufficient amount of storage is allocated).

I/O Latency

In order to measure the I/O latency the test used above to determine the IOPS rate was modified to use random single block reads ("db file sequential read") instead of asynchronous reads ("db file parallel read"). The reason for this is that from a database perspective the latency for reads performed asynchronously cannot be measured properly, due to the way the database handles the I/O.

The results were the following, this time running on a 2 OCPU (4 CPUs / 4 cores) single instance configuration with 8 KB block size and 8 sessions:



So again rather impressive: 0.45 ms average wait time for a random single block read, and the wait event histogram also shows that the performance was very consistent, with almost 75 percent of the waits taking less than 512 microseconds at a rate of more than 16,000 IOPS.

Write Performance

The read-only test above was modified to be 100 percent updates (for more details see here). Since the instance was configured to operate with a minimum sized buffer cache this puts maximum pressure on the database writer to write dirty blocks as fast as possible in order to allow new blocks to be read into the buffer cache.

Running on a 4 OCPU single instance configuration (8 CPUs / 8 cores) with eight sessions the following profile was measured - using NOARCHIVELOG mode and Flashback disabled:


So clearly the database writer couldn't write the dirty blocks quickly enough - almost 50 percent of the database time the sessions had to wait for free buffers. This means that write intensive workloads might not work too well and could run into these limitations.

Summary

The storage layer is clearly optimized for reads and delivers - at least for the read-only tests - very good I/O performance. Write intensive workloads might not work too well if they consistently require more write performance than provided by the service.

Oracle Parallel Execution Deep Dive Session

Mon, 2017-01-16 01:00
Here is a recording of a session I did a while ago, covering how to understand the essentials of Oracle Parallel Execution and how to read the corresponding execution plans.


12c Adaptive Joins Plus Statistics Feedback For Joins Cardinality Estimate Bug

Mon, 2017-01-09 01:00
I've encountered a bug at several clients that upgraded to Oracle 12c - 12.1.0.2 - that requires a combination of several of the new adaptive features introduced with Oracle 12c to show up.

It needs an execution plan that makes use of adaptive join methods, plus the activation at runtime of the new "statistics feedback for joins" feature that was also introduced with Oracle 12c. Note that 11.2 already had the "cardinality feedback" feature, but it only applies to single table cardinality mis-estimates, not to join cardinality mis-estimates.

If the join method used at runtime then is a Nested Loop join - not necessarily the join method preferred initially, so a runtime switch from Hash to Nested Loop join also reproduces the problem - the "statistics feedback for joins" feature generates a bad OPT_ESTIMATE hint for the join cardinality that always seems to be one, like the following: OPT_ESTIMATE(... JOIN (...) ROWS=1.000000), no matter what join cardinality was actually observed.

This can lead to very inefficient execution plans being generated on subsequent executions of the same statement, based on the misleading join cardinality provided by the "statistics feedback".

The good news is that in 12.2.0.1 - and in 12.1 when making use of the new backport that enables the same optimizer default settings as in 12.2 (for more information see this MOS document: Recommendations for Adaptive Features in Oracle Database 12c Release 1 (2187449.1) and for example Christian Antognini's post) - the "statistics feedback for joins" feature is disabled by default (in fact in principle only the "adaptive joins" feature is left enabled, all other adaptive features are more or less disabled), so the problem doesn't occur there. One more good reason why installing this backport in 12.1 is a good idea.

However, when enabling this feature specifically or simply enabling all "adaptive statistics" features (OPTIMIZER_ADAPTIVE_STATISTICS = TRUE in 12.2 / 12.1 with backport) the problem also reproduces in 12.2.

If you don't make use of the recommended optimizer settings backport in 12.1 yet, then setting "_optimizer_use_feedback" to FALSE prevents the problem. However, this disables both the "cardinality feedback for single table" 11.2 feature and the "statistics feedback for joins" 12c feature.

In 12.2 there are two parameters related to "cardinality feedback": "_OPTIMIZER_USE_FEEDBACK", which controls the "cardinality feedback for single table" 11.2 feature, and "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN", which controls the new "statistics feedback for joins" feature. Hence, in 12.2, when enabling the "adaptive statistics" features, the problem can be avoided by specifically setting "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN" to FALSE, which still leaves the "cardinality feedback for single table" 11.2 feature enabled.
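
In terms of concrete settings this corresponds to the following - as always, underscore parameters should only be set after careful testing and ideally in agreement with Oracle Support:

-- 12.1 without the backport: disables both the single table (11.2) and
-- the join (12c) statistics feedback
alter session set "_optimizer_use_feedback" = false;

-- 12.2 (or 12.1 with the backport) with adaptive statistics enabled:
-- disable only the join statistics feedback, keeping the 11.2 behaviour
alter session set "_optimizer_use_feedback_for_join" = false;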

For more information regarding the various optimizer related settings and differences between 12.1 and 12.2 / 12.1 backport, see this very informative post by Christian Antognini.

Here is a simplified test case that allows reproducing the problem:

-----------------------------------------------------------------------
-- Adaptive joins combined with statistics feedback for joins
-- lead to join cardinality estimate of 1
-- caused by incorrect statistics feedback OPT_ESTIMATE hint generated
--
-- The problem seems to be related to the following combination:
--
-- - Adaptive join method selected
-- - Statistics feedback for joins kicks in
-- - The runtime join method used is Nested Loop join
--
-- Reproduced: 12.1.0.2
-- 12.2.0.1 (with adaptive statistics features re-enabled)
--
-----------------------------------------------------------------------

set echo on

alter system flush shared_pool;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select
       rownum as id
     , cast('9999' as varchar2(4)) as hist_ind
     , rpad('x', 200) as filler
from
       dual
connect by
       level <= 30000
order by
       dbms_random.value
;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

create unique index t1_idx on t1 (id);

create table t2
as
with
generator1 as
(
  select /*+
           cardinality(1e3)
           materialize
         */
         rownum as id
       , rpad('x', 100) as filler
  from
         dual
  connect by
         level <= 1e3
),
generator2 as
(
  select /*+
           cardinality(1e3)
           materialize
         */
         rownum as id
       , rpad('x', 100) as filler
  from
         dual
  connect by
         level <= 1e3
),
generator as
(
  select /*+ leading(b a) */
         (a.id - 1) * 1e3 + b.id as id
  from
         generator1 a
       , generator2 b
)
select
       case when id <= 10000 then 1 when id <= 20000 then -1 else id end as id
     , cast('N' as varchar2(1)) as some_ind
     , rpad('x', 200) as filler
from
       generator
;

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

create /*unique*/ index t2_idx on t2 (id);

set echo on define on

column sql_id new_value sql_id

alter session set statistics_level = all;

-- Enable statistics feedback for joins in 12.2.0.1
alter session set optimizer_adaptive_statistics = true;

-- Disabling statistics feedback for joins prevents the bug
-- alter session set "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN" = false;

-- Or disabling adaptive joins prevents the bug
-- alter session set "_OPTIMIZER_ADAPTIVE_PLANS" = false;

-- alter session set "_OPTIMIZER_NLJ_HJ_ADAPTIVE_JOIN" = false;

-- Not related to NL join optimized plan shapes
--alter session set "_nlj_batching_enabled" = 0;

--alter session set "_table_lookup_prefetch_size" = 0;

-- Expected number of rows, initial join NL, still "statistics feedback" kicks in (is it a bug that statistics feedback kicks in here?)
-- No join method switch
-- Bad OPT_ESTIMATE hint for join (ROWS=1) => bug
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 20000 and 22000
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- More than expected number of rows, initial join NL, "statistics feedback" kicks in
-- Switch from NL to HASH (at runtime and at re-optimization time)
-- Generated OPT_ESTIMATE hints are OK
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 1 and 2
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- Less than expected number of rows, initial join HASH, "statistics feedback" kicks in
-- Switch from HASH to NL (at runtime and at re-optimization time)
-- Bad OPT_ESTIMATE hint for join generated (ROWS=1) => bug
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 2 and 20500
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- Expected number of rows, initial join HASH, "statistics feedback" does not kick in
-- No join method switch
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 20000 and 30000
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

The critical part of the output looks like this:

SQL> -- Expected number of rows, initial join NL, still "statistics feedback" kicks in (is it a bug that statistics feedback kicks in here?)
SQL> -- No join method switch
SQL> -- Bad OPT_ESTIMATE hint for join (ROWS=1) => bug
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
2 T1 A0 INNER JOIN T2 A1
3 ON 1 = 1
4 AND A1.ID = A0.ID
5 WHERE 1 = 1
6 AND A0.HIST_IND = '9999'
7 AND A0.ID between 20000 and 22000
8 ;

COUNT(A1.SOME_IND)
------------------
2000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
2000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
8mqn521y28t58

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8mqn521y28t58, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 20000 and 22000

Plan hash value: 3258782287

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2036 (100)| | 1 |00:00:00.02 | 4044 | 38 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.02 | 4044 | 38 |
|- * 2 | HASH JOIN | | 1 | 2001 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 | 38 |
| 3 | NESTED LOOPS | | 1 | 2001 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 | 38 |
| 4 | NESTED LOOPS | | 1 | 2002 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 2044 | 38 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 2000 |00:00:00.01 | 2007 | 6 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 2002 | 14014 | 1974 (0)| 00:00:01 | 2000 |00:00:00.01 | 2007 | 6 |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 2002 | | 7 (0)| 00:00:01 | 2000 |00:00:00.01 | 7 | 6 |
| * 8 | INDEX UNIQUE SCAN | T1_IDX | 2000 | 1 | | 0 (0)| | 2000 |00:00:00.01 | 37 | 32 |
| * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 2000 | 1 | 10 | 1 (0)| 00:00:01 | 2000 |00:00:00.01 | 2000 | 0 |
|- * 10 | TABLE ACCESS FULL | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
7 - access("A1"."ID">=20000 AND "A1"."ID"<=22000)
8 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID">=20000 AND "A0"."ID"<=22000))
9 - filter("A0"."HIST_IND"='9999')
10 - filter(("A0"."ID">=20000 AND "A0"."ID"<=22000 AND "A0"."HIST_IND"='9999'))

Note
-----
- this is an adaptive plan (rows marked '-' are inactive)

SQL_ID 8mqn521y28t58, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 20000 and 22000

Plan hash value: 3258782287

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2036 (100)| | 1 |00:00:00.02 | 4044 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.02 | 4044 |
|- * 2 | HASH JOIN | | 1 | 1 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 |
| 3 | NESTED LOOPS | | 1 | 1 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 |
| 4 | NESTED LOOPS | | 1 | 2002 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.01 | 2044 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 2000 |00:00:00.01 | 2007 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 2002 | 14014 | 1974 (0)| 00:00:01 | 2000 |00:00:00.01 | 2007 |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 2002 | | 7 (0)| 00:00:01 | 2000 |00:00:00.01 | 7 |
| * 8 | INDEX UNIQUE SCAN | T1_IDX | 2000 | 1 | | 0 (0)| | 2000 |00:00:00.01 | 37 |
| * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 2000 | 1 | 10 | 1 (0)| 00:00:01 | 2000 |00:00:00.01 | 2000 |
|- * 10 | TABLE ACCESS FULL | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
7 - access("A1"."ID">=20000 AND "A1"."ID"<=22000)
8 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID">=20000 AND "A0"."ID"<=22000))
9 - filter("A0"."HIST_IND"='9999')
10 - filter(("A0"."ID">=20000 AND "A0"."ID"<=22000 AND "A0"."HIST_IND"='9999'))

Note
-----
- statistics feedback used for this statement
- this is an adaptive plan (rows marked '-' are inactive)


77 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=1.000000 )

SQL>
SQL> -- More than expected number of rows, initial join NL, "statistics feedback" kicks in
SQL> -- Switch from NL to HASH (at runtime and at re-optimization time)
SQL> -- Generated OPT_ESTIMATE hints are OK
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
2 T1 A0 INNER JOIN T2 A1
3 ON 1 = 1
4 AND A1.ID = A0.ID
5 WHERE 1 = 1
6 AND A0.HIST_IND = '9999'
7 AND A0.ID between 1 and 2
8 ;

COUNT(A1.SOME_IND)
------------------
10000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
10000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
92rttcj6ntzqs

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 92rttcj6ntzqs, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 1 and 2

Plan hash value: 777836357

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7 (100)| | 1 |00:00:00.04 | 1262 | 70 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.04 | 1262 | 70 | | | |
| * 2 | HASH JOIN | | 1 | 2 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.04 | 1262 | 70 | 2061K| 2061K| 1355K (0)|
|- 3 | NESTED LOOPS | | 1 | 2 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.03 | 1258 | 40 | | | |
|- 4 | NESTED LOOPS | | 1 | 3 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.02 | 1258 | 40 | | | |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 10000 |00:00:00.02 | 1258 | 40 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 3 | 21 | 6 (0)| 00:00:01 | 10000 |00:00:00.02 | 1258 | 40 | | | |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 3 | | 3 (0)| 00:00:01 | 10000 |00:00:00.01 | 23 | 40 | | | |
|- * 8 | INDEX UNIQUE SCAN | T1_IDX | 0 | 1 | | 0 (0)| | 0 |00:00:00.01 | 0 | 0 | | | |
|- * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
| * 10 | TABLE ACCESS BY INDEX ROWID BATCHED | T1 | 1 | 1 | 10 | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 4 | 30 | | | |
| * 11 | INDEX RANGE SCAN | T1_IDX | 1 | 1 | | 0 (0)| | 2 |00:00:00.01 | 2 | 30 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
7 - access("A1"."ID">=1 AND "A1"."ID"<=2)
8 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID"<=2 AND "A0"."ID">=1))
9 - filter("A0"."HIST_IND"='9999')
10 - filter("A0"."HIST_IND"='9999')
11 - access("A0"."ID">=1 AND "A0"."ID"<=2)

Note
-----
- this is an adaptive plan (rows marked '-' are inactive)

SQL_ID 92rttcj6ntzqs, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 1 and 2

Plan hash value: 3588347061

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7982 (100)| | 1 |00:00:00.13 | 29516 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 29516 | | | |
|* 2 | HASH JOIN | | 1 | 10000 | 166K| 7982 (1)| 00:00:01 | 10000 |00:00:00.13 | 29516 | 2061K| 2061K| 1356K (0)|
|* 3 | TABLE ACCESS FULL | T2 | 1 | 10000 | 70000 | 7978 (1)| 00:00:01 | 10000 |00:00:00.12 | 29512 | | | |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 2 | 20 | 4 (0)| 00:00:01 | 2 |00:00:00.01 | 4 | | | |
|* 5 | INDEX RANGE SCAN | T1_IDX | 1 | 2 | | 2 (0)| 00:00:01 | 2 |00:00:00.01 | 2 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
3 - filter(("A1"."ID"<=2 AND "A1"."ID">=1))
4 - filter("A0"."HIST_IND"='9999')
5 - access("A0"."ID">=1 AND "A0"."ID"<=2)

Note
-----
- statistics feedback used for this statement


71 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=10000.000000 )

SQL>
SQL> -- Less than expected number of rows, initial join HASH, "statistics feedback" kicks in
SQL> -- Switch from HASH to NL (at runtime and at re-optimization time)
SQL> -- Bad OPT_ESTIMATE hint for join generated (ROWS=1) => bug
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
2 T1 A0 INNER JOIN T2 A1
3 ON 1 = 1
4 AND A1.ID = A0.ID
5 WHERE 1 = 1
6 AND A0.HIST_IND = '9999'
7 AND A0.ID between 2 and 20500
8 ;

COUNT(A1.SOME_IND)
------------------
500

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
500

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
c55rjg5mdxpph

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c55rjg5mdxpph, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 2 and 20500

Plan hash value: 1011946885

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8227 (100)| | 1 |00:00:00.13 | 30028 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 30028 |
|- * 2 | HASH JOIN | | 1 | 20500 | 340K| 8227 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 3 | NESTED LOOPS | | 1 | 20500 | 340K| 8227 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 4 | NESTED LOOPS | | 1 | | | | | 500 |00:00:00.13 | 29528 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 500 |00:00:00.13 | 29512 |
| * 6 | TABLE ACCESS FULL | T2 | 1 | 20500 | 140K| 7978 (1)| 00:00:01 | 500 |00:00:00.13 | 29512 |
| * 7 | INDEX UNIQUE SCAN | T1_IDX | 500 | | | | | 500 |00:00:00.01 | 16 |
| * 8 | TABLE ACCESS BY INDEX ROWID| T1 | 500 | 1 | 10 | 248 (0)| 00:00:01 | 500 |00:00:00.01 | 500 |
|- * 9 | TABLE ACCESS FULL | T1 | 0 | 20501 | 200K| 248 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
6 - filter(("A1"."ID"<=20500 AND "A1"."ID">=2))
7 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID"<=20500 AND "A0"."ID">=2))
8 - filter("A0"."HIST_IND"='9999')
9 - filter(("A0"."ID"<=20500 AND "A0"."ID">=2 AND "A0"."HIST_IND"='9999'))

Note
-----
- this is an adaptive plan (rows marked '-' are inactive)

SQL_ID c55rjg5mdxpph, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 2 and 20500

Plan hash value: 1011946885

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7994 (100)| | 1 |00:00:00.13 | 30028 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 30028 |
| 2 | NESTED LOOPS | | 1 | 1 | 17 | 7994 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 3 | NESTED LOOPS | | 1 | 500 | 17 | 7994 (1)| 00:00:01 | 500 |00:00:00.13 | 29528 |
|* 4 | TABLE ACCESS FULL | T2 | 1 | 500 | 3500 | 7978 (1)| 00:00:01 | 500 |00:00:00.13 | 29512 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 500 | 1 | | 0 (0)| | 500 |00:00:00.01 | 16 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T1 | 500 | 1 | 10 | 1 (0)| 00:00:01 | 500 |00:00:00.01 | 500 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter(("A1"."ID"<=20500 AND "A1"."ID">=2))
5 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID"<=20500 AND "A0"."ID">=2))
6 - filter("A0"."HIST_IND"='9999')

Note
-----
- statistics feedback used for this statement


69 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=1.000000 )

SQL>
SQL> -- Expected number of rows, initial join HASH, "statistics feedback" does not kick in
SQL> -- No join method switch
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
2 T1 A0 INNER JOIN T2 A1
3 ON 1 = 1
4 AND A1.ID = A0.ID
5 WHERE 1 = 1
6 AND A0.HIST_IND = '9999'
7 AND A0.ID between 20000 and 30000
8 ;

COUNT(A1.SOME_IND)
------------------
10000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
10000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
4tj7bn17xcbad

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4tj7bn17xcbad, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN
T2 A1 ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND =
'9999' AND A0.ID between 20000 and 30000

Plan hash value: 4274056747

----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8227 (100)| | 1 |00:00:00.17 | 30434 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.17 | 30434 | | | |
| * 2 | HASH JOIN | | 1 | 10001 | 166K| 8227 (1)| 00:00:01 | 10000 |00:00:00.17 | 30434 | 1969K| 1969K| 1895K (0)|
|- 3 | NESTED LOOPS | | 1 | 10001 | 166K| 8227 (1)| 00:00:01 | 10000 |00:00:00.14 | 29512 | | | |
|- 4 | NESTED LOOPS | | 1 | | | | | 10000 |00:00:00.14 | 29512 | | | |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 10000 |00:00:00.13 | 29512 | | | |
| * 6 | TABLE ACCESS FULL | T2 | 1 | 10002 | 70014 | 7978 (1)| 00:00:01 | 10000 |00:00:00.13 | 29512 | | | |
|- * 7 | INDEX UNIQUE SCAN | T1_IDX | 0 | | | | | 0 |00:00:00.01 | 0 | | | |
|- * 8 | TABLE ACCESS BY INDEX ROWID| T1 | 0 | 1 | 10 | 248 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | | | |
| * 9 | TABLE ACCESS FULL | T1 | 1 | 10001 | 97K| 248 (0)| 00:00:01 | 10001 |00:00:00.01 | 922 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("A1"."ID"="A0"."ID")
6 - filter(("A1"."ID"<=30000 AND "A1"."ID">=20000))
7 - access("A1"."ID"="A0"."ID")
filter(("A0"."ID">=20000 AND "A0"."ID"<=30000))
8 - filter("A0"."HIST_IND"='9999')
9 - filter(("A0"."ID">=20000 AND "A0"."HIST_IND"='9999' AND "A0"."ID"<=30000))

Note
-----
- this is an adaptive plan (rows marked '-' are inactive)


37 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

no rows selected

SQL>
Note how in each case where the Nested Loop join gets used at runtime and "statistics feedback for joins" kicks in, the bad OPT_ESTIMATE hint (ROWS=1) gets generated - even when the runtime cardinality actually matched the optimizer's estimate.

I've also discussed this case with Nigel Bayliss at Oracle (the Optimizer Product Manager), and a corresponding bug was opened, so hopefully the problem gets addressed in the future.
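
As a quick way of spotting statements affected by this in a given environment, the re-optimization state exposed in V$SQL can be combined with the generated hints - a minimal sketch using documented 12c columns only; the LIKE pattern for the suspicious ROWS=1 hints is my assumption:

-- Cursors marked for re-optimization by statistics feedback, together
-- with join cardinality OPT_ESTIMATE hints of the suspicious ROWS=1 shape
select s.sql_id, s.child_number, s.is_reoptimizable, h.hint_text
from v$sql s, v$sql_reoptimization_hints h
where s.sql_id = h.sql_id
and s.is_reoptimizable = 'Y'
and h.hint_text like '%JOIN%ROWS=1.000000%';

Of course a join cardinality of one row can be perfectly valid, so any hits would still need to be verified against the actual row counts of the join.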

Oracle Database Cloud (DBaaS) Performance - Part 1 - CPU

Tue, 2017-01-03 03:00
After having looked at the performance consistency provided by the Oracle Database Cloud offering in the previous series, I'll focus here on the raw performance figures I've measured during my tests, starting with the CPU related performance findings.

One of the first surprises is related to the fact that Oracle uses a unit called "OCPU" to specify the CPU capacity provided.

According to Oracle's service description, one "OCPU" is supposed to represent one core with two Hyperthreading CPU threads, and hence should correspond, for example, to two VCPUs as used by Amazon - so a 4 OCPU configuration should map to 4 cores providing 8 CPU threads.

But when looking at the actual CPU configuration of such a DBaaS VM, I got the following results for a 4 OCPU configuration:

[lscpu output not reproduced here - it shows CPU(s): 8 and Thread(s) per core: 1]
So, that 4 OCPU configuration provides 8 CPUs, which is expected, but it provides those 8 CPUs with one thread per core, so that means 8 cores.
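
By the way, the same topology can be cross-checked from inside the database without O/S access - V$OSSTAT exposes the CPU counts as seen by the instance (the NUM_CPU_* statistics are documented, although not every platform reports all of them):

-- CPU topology as seen by the instance
select stat_name, value
from v$osstat
where stat_name in ('NUM_CPUS', 'NUM_CPU_CORES', 'NUM_CPU_SOCKETS');

On the 4 OCPU configuration described above this should accordingly report NUM_CPUS = 8 and NUM_CPU_CORES = 8.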

This is what I get when I configure a corresponding Amazon EC2 VM with 8 VCPUs (m4.2xlarge), which should be the same as the Amazon RDS "db.m4.2xlarge" configuration (but I can't access an RDS instance on O/S level, hence the EC2 fallback):

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2300.062
BogoMIPS:              4600.12
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7


So this is more in line with what is expected - 4 cores providing 8 CPU threads.

Does it make a difference in terms of actual performance? It does - when running my CPU tests, both the PL/SQL as well as the SQL engine based tests (see the previous "performance consistency" series for more details about the tests) show the following average duration per iteration per thread when running with 4 vs. 8 threads:

Oracle DBaaS: [chart not reproduced]

Amazon RDS: [chart not reproduced]

So, using 8 threads instead of 4 only increases the duration of a test iteration slightly in the Oracle DBaaS 4 OCPU environment, whereas the Amazon RDS 8 VCPU environment slows down significantly - even more than expected considering Hyperthreading effects. In particular the SQL Logical I/O test requires more than twice the time at 8 threads, but the PL/SQL based test slows down significantly, too. It's also interesting to see that, when running the SQL Logical I/O test at 4 threads, the Amazon RDS environment outperforms the Oracle DBaaS offering to an unexpected degree.
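
To give an idea of the kind of workload behind these figures, here is a minimal sketch of a PL/SQL based CPU burning test - not the exact harness used for the measurements above, just the general pattern of a loop that does pure CPU work and no I/O:

set timing on

declare
  n number := 0;
begin
  -- pure CPU work, no I/O - run this concurrently from 4 resp. 8
  -- sessions and compare the average elapsed time per session
  for i in 1..100000000 loop
    n := n + 1;
  end loop;
end;
/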

Whether the better Amazon RDS SQL Logical I/O performance at 4 threads comes from the larger cache sizes reported by "lscpu", I don't know. I also don't know why Oracle provides more cores than outlined in their own product description. Maybe this is meant to avoid exactly the effects seen with the Amazon RDS environment - perhaps Hyperthreading doesn't work that well in virtualized environments - but that is just my speculation. Whether Oracle will keep this different approach in the future I don't know either.

All I can say is that I consistently got that CPU / core ratio when configuring several services with different numbers of OCPUs, and that my performance tests showed the differences outlined above when comparing the Oracle DBaaS and Amazon RDS environments.

DOAG.tv Interviews (German)

Tue, 2016-12-27 03:00
In recent weeks two interviews have been published that the DOAG conducted with me at the annual DOAG Conference in Nuremberg.

The first one dates back to the DOAG Conference 2015 and covers my presentation there about the new Parallel Execution features of Oracle 12c:

DOAG.tv Interview 2015

The second one is from this year's DOAG Conference and covers my performance tests of the Oracle Database Cloud and the corresponding presentation:

DOAG.tv Interview 2016

Each of the interviews lasts only a few minutes, so they touch on the respective topics in just a few key points.
