O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)
 See this if you're having trouble printing code examples


Mapping Geometric Data with Oracle Spatial

by Matt Bauer
11/10/2003

Mobile applications and technologies have created an increased need for storing, mapping, and analyzing spatial data. Traditional database techniques have stored this spatial data in X and Y columns to form various geometries. This technique allows only limited indexing and querying, making data access and manipulation difficult. Oracle Spatial attempts to make up for these limitations with its various data models, indexes, and queries. This article explains the core concepts of Oracle Spatial, particularly in its role as a way to store and analyze spatial data.

Data Model

Oracle Spatial can store and analyze four-dimensional data; however, for the sake of clarity, this article discusses only two-dimensional. The basic building block in Oracle Spatial is an element: a Point, a LineString, or a Polygon. Each element is described by an array of its vertices. For a two-dimensional element, the array is read as { X1, Y1, X2, Y2, X3, Y3, ... }. Examples for each element are below.

ElementNameOrdinates (Vertices Array)
pointPoint{ 1,2 }
linestringLineString{ 2,2, 5,1, 3,0 }
polygonPolygon{ 0,0, 1,0, 1,1, 0,1, 0,0 }

The Polygon element includes a repeat of the first point, which instructs Oracle Spatial that the element is a polygon. Without the repeat it would be impossible to distinguish a LineString that ends where it started from a Polygon. Ordered combinations of these elements are used to create Geometries. Multiple LineStrings are placed together to describe a roadway or two polygons are combined to describe a lake with an island. The lake would have ordinates (vertices array) of { 0,0, 10,0, 10,10, 0,10, 0,0 } and the island ordinates of { 4,4, 6,4, 6,6, 4,6, 4,4 }.

lake with island
Figure 1. Lake with island

It's important to note that the order of vertices is important. Oracle Spatial expects a clockwise order for polygon elements, with the last vertex being the first in order to close the polygon.

These ordinates are not enough to map the lake with an island though. What's missing is a coordinate system and tolerance. Looking at the lake ordinates it's impossible to tell if it's ten miles or ten kilometers wide. It's also not clear if the lake is on a plane or on some projection of the earth. What's needed is a coordinate or spatial reference system. Oracle Spatial defaults to a Cartesian coordinate system with dimensionless units. For large geodetic data sets (such as maps of the earth's surface), a geodetic coordinate system like NAD83 is needed. This coordinate system uses longitude and latitude for the ordinates and automatically handles issues with the curvature of the earth's surface. For the lake example, a Cartesian coordinate system is sufficient. Another piece of information needed to map is the accuracy of ordinates. Spatial data almost always has an associated tolerance or error associated with it. However, Oracle Spatial defaults to a tolerance of zero. With a coordinate system and tolerance, it's possible to query and map spatial data correctly.

Putting It in Tables

To create a table called mylake with an indexed spatial column, the following SQL statements are needed.

CREATE TABLE mylake (
    feature_id NUMBER PRIMARY KEY,
    name VARCHAR2(32),
    shape MDSYS.SDO_GEOMETRY);

INSERT INTO user_sdo_geom_metadata VALUES (
    'mylake',
    'shape',
    MDSYS.SDO_DIM_ARRAY(
MDSYS.SDO_DIM_ELEMENT('X', 0, 100, 0.05),
MDSYS.SDO_DIM_ELEMENT('Y', 0, 100, 0.05)),
    NULL);

CREATE INDEX mylake_idx ON mylake(shape) 
    INDEXTYPE IS MDSYS.SPATIAL_INDEX

The first statement creates the desired table with a column named shape to hold the spatial data. We'll discuss the MDSYS.SDO_GEOMETRY data type later. The second statement tells Oracle Spatial about the spatial data in the mylake table. The user_sdo_geom_metadata table describes the coordinate system and tolerance for each spatial column that a user owns. The table takes four values: the table name that has a spatial column, the column name for the spatial data, an array describing the minimum, maximum, and tolerance value for each dimension, and a number stating the coordinate system. mylake uses two-dimensional data that can range from 0 to 100 units with a tolerance of 0.05. The null value for the coordinate system tells Oracle Spatial to use the default Cartesian coordinate system.

The last statement creates the spatial index, which is required for spatial queries. Oracle uses two types of spatial indexing: R-tree and Quadtree. If the spatial data is geodetic (map data), R-Tree must be used to take full advantage of Oracle Spatial's functions. In the index statement above the spatial type wasn't specified, causing Oracle Spatial to default to an R-Tree index. Details about spatial indexing may be found in the reference section, but both types follow the same concept. Indexes are made by placing a grid over the spatial data and then noting which grid square or grid squares the shape lies in.

grid overlaid on various shapes
Figure 2. Grid overlaid on various shapes

(We'll discuss indexing in further detail below in the explanation of spatial querying.)

Having created an indexed table to store spatial data, all that's left is to populate it with data. Oracle Spatial uses the MDSYS.SDO_GEOMETRY type to store spatial data, which is defined as:

CREATE TYPE sdo_geometry AS OBJECT (
    SDO_GTYPE NUMBER,
    SDO_SRID NUMBER,
    SDO_POINT SDO_POINT_TYPE,
    SDO_ELEM_INFO MDSYS.SDO_ELEM_INFO_ARRAY,
    SDO_ORDINATES MDSYS.SDO_ORDINATE_ARRAY);

SDO_GTYPE is a number that defines the overall shape; that is, a line, a sphere, and so forth. It describes the end result of the ordered combination of elements. SDO_GTYPE is a four-digit integer. The first digit represents the number of dimensions. The second digit represents the linear representation, which is important for a three- or four-dimensional shape. In a two-dimensional shape the value is zero. The last two digits represent the shape and range from 00 to 07. Each value describes a particular geometry, noted below.

ValueGeometryDescription
00UNKNOWN_GEOMETRYSpatial ignores this value
01POINTA single point element
02LINE or CURVEContains one line string element that may be linear, curved or both
03POLYGONContains one polygon element with or without other polygon elements in it
04COLLECTIONA heterogeneous collection of elements
05MULTIPOINTContains one or more points
06MULTILINE or MULTICURVEContains one or more line string elements
07MULTIPOLYGONContains multiple polygon elements that maybe disjoint

Examples of SDO_GTYPE are a rectangle, 2003, and a collection of roadways, 2006.

The SDO_SRID number describes the coordinate system to use. This field is used to guarantee that all geometries within the table column use the same coordinate system. This number is the same as that used in the USER_SDO_GEOM_METADATA table. It also defaults to the Cartesian coordinate system if set to null.

Spatial also allows for the definition of a single point within a geometry. Think of this as a point within the lake. This point could be used for label placement, measurement determinations, and so on. An SDO_POINT is defined as:

CREATE TYPE sdo_point_type AS OBJECT (
    X NUMBER,
    Y NUMBER,
    Z NUMBER);

If a two-dimensional geometry is used, Z may be left null. Alternately, the entire SDO_POINT value may be set to null.

Skipping ahead, the SDO_ORDINATES_ARRAY is a list of all the vertices that define the geometry. It's defined below.

CREATE TYPE sdo_ordinate_array AS VARRAY (1048576) of NUMBER;

For a lake with an island, the SDO_ORDINATES_ARRAY value is (0,0, 10,0, 10,10, 0,10, 0,0, 4,4, 6,4, 6,6, 4,6, 4,4). Mentioned above, the values are read in pairs, with the first value being X and the second value being Y. If this were a three-dimensional geometry, the values would be read in triplets, with the last value being Z.

The SDO_ELEM_INFO_ARRAY describes the multiple elements within the SDO_ORDINATES_ARRAY and is defined below.

CREATE TYPE sdo_elem_info_array AS VARRAY (1048576) of NUMBER;

An SDO_ELEM_INFO_ARRAY is understood as three values at a time. Each set of three values describes an element of the geometry. The lake example is composed of two polygon elements and would have an SDO_ELEM_INFO_ARRAY containing six numbers: { 1,1003,1, 6, 2003, 1 }. The first triplet of 1, 1003, 1 states that the first vertex of the first element (the lake) starts at position 1 in the SDO_ORDINATES_ARRAY. The value of 1003 defines the element (the lake) as a single exterior polygon and the final 1 tells Oracle Spatial to connect the vertices with straight lines. A value of 2 would connect the vertices with arcs. The second triple of 6, 2003, 1 states the first vertex of the second element (the island) starts at position 6 in the SDO_ORDINATES_ARRAY. The value of 2003 defines the element (the island) as an interior polygon and it too has its vertices connected with straight lines. Possible values besides 1003 and 2003 are listed below. More information about SDO_ELEM_INFO_ARRAY may be found in the reference section at the end of this article.

ValueMeaning
1Point element
2LineString element
1003Exterior polygon element
2003Interior polygon element

The SQL needed to insert a lake with an island into the mylake table is listed below.

INSERT INTO mylake VALUES(
    10, -- feature_id
    'Lake Calhoun', -- name
    MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,1, 19,2003,1),
MDSYS.SDO_ORDINATE_ARRAY(0,0, 10,0, 10,10, 0,10, 0,0, 4,4, 6,4, 6,6, 4,6, 4,4)
    ));

The SQL needed to insert a couple of boats into the mylake table is listed below.

INSERT INTO mylake VALUES(
    12, -- feature_id
    'The Windswept', -- name
    MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,1),
MDSYS.SDO_ORDINATE_ARRAY(2,2, 3,2, 3,2, 2,3, 2,2)
    )
);

INSERT INTO mylake VALUES(
    12, -- feature_id
    'Blue Crest', -- name
    MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,1),
MDSYS.SDO_ORDINATE_ARRAY(7,7, 8,7, 8,7, 7,8, 7,7)
    )
);

Querying

Oracle Spatial uses a two-tier process for querying spatial data. The first part of the query selects candidate rows based only on their spatial index. The second step of the query looks at each candidate and determines if it meets the query exactly. This two-tier approach allows for quick querying by leveraging the spatial indexes.

The first part of the query uses an Oracle Spatial function called SDO_FILTER, which is defined below.

SDO_FILTER(geometry1 MDSYS.SDO_GEOMETRY, 
	geometry2 MDSYS.SDO_GEOMETRY, params VARCHAR2)

The first geometry argument, geometry1, is a column name of spatially indexed geometries. The second geometry argument, geometry2, may or may not come from a table and does not need to be indexed spatially. The final argument defines how the filtering works. If the final argument is querytype=WINDOW, filtering is done in memory and works very well when geometry2 doesn't come from a table. If the final argument is querytype=JOIN, geometry2 must come from a table and performance depends on the spatial index type of the two tables. SDO_FILTER returns a string of TRUE if a successful filtering occurs or FALSE otherwise. An example query using SDO_FILTER follows.

SELECT name boat_name
FROM mylake t
WHERE feature_id = 12
AND SDO_FILTER(t.shape, mdsys.sdo_geometry(2003,NULL,NULL,
    mdsys.sdo_elem_info_array(1,1003,1),
    mdsys.sdo_ordinate_array(2,2, 5,2, 5,5, 2,5, 2,2)),
    'querytype=WINDOW') = 'TRUE';

This query selects all boats that have a geometry with an indexed grid square within the polygon defined. This doesn't necessarily mean the returned boats are within the rectangle or if they are just touching the defined rectangle. To obtain an exact query, a second function called SDO_RELATE must be executed. SDO_RELATE looks at two geometries and determines if they interact in a specified way. It is important to note that SDO_RELATE only works on two-dimensional data. It is defined below.

SDO_RELATE(geometry1 MDSYS.SDO_GEOMETRY, geometry2 MDSYS.SDO_GEOMETRY,
	params VARCHAR2)

SDO_RELATE arguments are the same as those for SDO_FILTER with the exception of the last argument. The params argument has a masktype value in addition to the querytype value. The masktype value can take the values listed below.

— Oracle Spatial User Guide and Reference 9.2

To select all boats that are inside a defined rectangle the following query would work:

SELECT name boat_name
FROM mylake t
WHERE feature_id = 12
AND SDO_FILTER(t.shape, mdsys.sdo_geometry(2003,NULL,NULL,
    mdsys.sdo_elem_info_array(1,1003,1),
    mdsys.sdo_ordinate_array(2,2, 5,2, 5,5, 2,5, 2,2)),
    'querytype=WINDOW') = 'TRUE'
AND SDO_RELATE(t.shape, mdsys.sdo_geometry(2003,NULL,NULL,
    mdsys.sdo_elem_info_array(1,1003,1),
    mdsys.sdo_ordinate_array(2,2, 5,2, 5,5, 2,5, 2,2)),
    'masktype=INSIDE querytype=WINDOW') = 'TRUE'

It is also possible to combine masktypes to select sites that are inside or touching the defined polygon with the query below.

SELECT feature_id id
FROM mylake t
WHERE feature_id = 12
AND SDO_FILTER(t.shape, mdsys.sdo_geometry(2003,NULL,NULL,
    mdsys.sdo_elem_info_array(1,1003,1),
    mdsys.sdo_ordinate_array(2,2, 5,2, 5,5, 2,5, 2,2)),
    'querytype=WINDOW') = 'TRUE'
AND SDO_RELATE(t.shape, mdsys.sdo_geometry(2003,NULL,NULL,
    mdsys.sdo_elem_info_array(1,1003,1),
    mdsys.sdo_ordinate_array(2,2, 5,2, 5,5, 2,5, 2,2)),
    'masktype=INSIDE+TOUCH querytype=WINDOW') = 'TRUE'

Oracle Spatial provides other powerful querying functions beside SDO_FILTER and SDO_RELATE. They include:

QueryDescription
SDO_NNNearest neighbor
SDO_SDO_WITHIN_DISTANCEAll geometries with a certain distance
FunctionsDescription
SDO_GEOM.SDO_MBRThe minimum bounding rectangle for a geometry
SDO_GEOM.SDO_DISTANCEThe distance between two geometries
SDO_GEOM.SDO_INTERSECTIONProvides the intersection point of two geometries

There are many more query operators and functions in Oracle Spatial. A complete list and explanation may be found in reference section below.

Conclusion

Oracle Spatial makes it possible to combine the relational power of a database with spatial data. The ability to use indexes, various queries, and functions means complex spatial calculations may be pushed back onto large database servers. As mobile applications and technologies increase, so will the demands to store and analyze spatial data in a transactional setting.

Reference

Matt Bauer is the founder of MetroRacer, a Minneapolis based firm developing software for professional and amateur athletes.


Return to the O'Reilly Network.

Copyright © 2009 O'Reilly Media, Inc.