Women in Technology

Hear us Roar



Article:
  Range-Keyed Queries
Subject:   With Oracle
Date:   2009-07-13 08:56:42
From:   ahmusch
There can be some volume problems with only adding AND ROWNUM = 1, specifically with a large data set and not-found condition.


Assume the following index-organized table:


SQL> DESC GEO_IP_ORG_LOC;
Name Null? Type
--------------------- -------- ----------------
IP_NUM_START NOT NULL NUMBER(10)
IP_NUM_END NOT NULL NUMBER(10)
ISP VARCHAR2(255)
ORG VARCHAR2(255)
COUNTRY_CODE CHAR(2)
REGION_CODE CHAR(2)
CITY VARCHAR2(255)


The primary key of this table is (ip_num_end, ip_num_start).


If one searches this data for a found IP address like so:


SQL> variable google_ip number;
SQL> exec :google_ip := ((74*256*256*256) + (125*256*256) + (45*256) + 100);


PL/SQL procedure successfully completed.


SQL> variable private_ip number;
SQL> exec :private_ip := ((10*256*256*256) + (0*256*256) + (0*256) + 1);


select * from ag2.geo_ip_org_loc a
where :google_ip between ip_num_start and ip_num_end
and rownum = 1


very few -- on the order of single digits -- consistent gets are required if the index is cached.


But if the data isn't in the data set, performance can go sideways. The same query with a different IP address:


select * from ag2.geo_ip_org_loc a
where :private_ip between ip_num_start and ip_num_end
and rownum = 1


burns through 95,000+ consistent gets.


The nature of the problem is that we can't tell Oracle that the ranges are discrete through constraints or other declarations, but we can emulate the "greatest less than" and "least greater than" through SQL.


Consider the SQL with an additional where clause:


select * from ag2.geo_ip_org_loc a
where :google_ip between ip_num_start and ip_num_end
and rownum = 1
and ip_num_end = (select /* no_unnest */
min(b.ip_num_end)
from ag2.geo_ip_org_loc b
where b.ip_num_end >= :google_ip
)


Again, single digit consistent gets with a found IP address. We also get similar performance in cases where the value we're searching for is in no range.


The subquery is the least greater than, and the no_unnest hint allows it to only be evaluated when the preceding BETWEEN clause is true, depending on the release of Oracle being executed Some versions of the Oracle optimizer are more aggressive than others about unnesting that subquery.


You can obtain exemplar data at http://www.maxmind.com/app/geolitecity