# 4.3 Vector Data Models

In contrast to the raster data model, the vector data model does not quantize space into discrete grid cells. Instead, vector data models use points and their associated X, Y coordinate pairs to represent the vertices of spatial features, much as if they were being drawn on a map by hand (Aronoff, 1989). [1] The attributes of these features are then stored in a separate database management system. The spatial information and the attribute information are linked via a simple identification number given to each feature in a map.
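The link between geometry and attributes can be pictured with a minimal Python sketch. The feature IDs, coordinates, and attribute fields below are hypothetical, and a real GIS would keep the attribute table in a database rather than a dictionary.

```python
# Geometry store: feature ID -> list of (x, y) vertices (hypothetical values).
geometries = {
    101: [(2.0, 3.0)],                          # a well (point)
    102: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)],  # a road (line)
}

# Attribute store: the same feature IDs key a separate table of attributes.
attributes = {
    101: {"type": "well", "depth_m": 35},
    102: {"type": "road", "name": "Main St"},
}

def describe(feature_id):
    """Join geometry and attributes for one feature through its ID."""
    return {"id": feature_id,
            "vertices": geometries[feature_id],
            **attributes[feature_id]}

print(describe(102))
```

Because the two stores share nothing but the ID, either side can be edited or replaced without touching the other, as long as the IDs stay consistent.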

Three fundamental vector types exist in geographic information systems (GIS): points, lines, and polygons. Points are zero-dimensional objects that contain only a single coordinate pair. Points are typically used to model singular, discrete features such as buildings, wells, power poles, and sample locations. Points have only the property of location. Other types of point features include the node and the vertex. Specifically, a point is a stand-alone feature, while a node is a topological junction representing a common X, Y coordinate pair between intersecting lines and polygons. A vertex is any bend along a line or polygon feature that is not the intersection of lines or polygons.

Points can be spatially linked to form more complex features. Lines are one-dimensional features composed of multiple, explicitly connected points. Lines represent linear features such as roads, streams, faults, and boundaries. In addition, lines have the property of length. Lines that directly connect two nodes are sometimes referred to as chains, edges, segments, or arcs.

Polygons are two-dimensional features created by multiple lines that loop back to create a “closed” feature. In the case of polygons, the first coordinate pair (point) on the first line segment is the same as the last coordinate pair on the last line segment. Polygons represent features such as city boundaries, geologic formations, lakes, soil associations, vegetation communities, and more. In addition, polygons have the properties of area and perimeter. Therefore, polygons are also called areas.
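The three vector types and their properties (location, length, area, and perimeter) can be illustrated with a short Python sketch. The coordinates are hypothetical and assumed to be planar, and the area calculation uses the standard shoelace formula, which is one common choice rather than anything prescribed here.

```python
import math

def length(coords):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def area(ring):
    """Shoelace formula; the ring must repeat its first vertex at the end."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0

point = (2.0, 3.0)                                   # location only
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]         # has length
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0),
           (0.0, 3.0), (0.0, 0.0)]                   # first pair == last pair

print(length(line))      # 5 + 6 = 11.0
print(area(polygon))     # 4 x 3 rectangle = 12.0
print(length(polygon))   # perimeter = 14.0
```

Note that the polygon is simply a line whose last coordinate pair repeats the first, which is why the same `length` function yields its perimeter.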

# Vector Data Model Structures

Vector data models can be structured in many different ways. We will examine two of the more common data structures here. The simplest vector data structure is called the spaghetti data model (Dangermond, 1982). [2] In the spaghetti model, each point, line, and polygon feature is represented as a string of X, Y coordinate pairs (or as a single X, Y coordinate pair in the case of a point feature) with no inherent structure. One could envision each line in this model as a single strand of spaghetti, with complex shapes formed by adding more strands. In this model, any polygons that lie adjacent to each other must each be made up of their own lines, or strands of spaghetti. In other words, each polygon must be uniquely defined by its own set of X, Y coordinate pairs, even if the adjacent polygons share the same boundary information. This creates redundancy within the data model and therefore reduces efficiency.

Despite the location designations associated with each line, or strand of spaghetti, spatial relationships are not explicitly encoded within the spaghetti model; instead, they are implied by their location. This results in a lack of topological information, which is problematic if the user attempts to make measurements or analyses. Therefore, the computational requirements are steep if any advanced analytical techniques are employed on vector files. Nevertheless, the simple structure of the spaghetti data model allows for efficient reproduction of maps and graphics as this topological information is unnecessary for plotting and printing.
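A minimal Python sketch of the spaghetti structure makes both points concrete: the shared boundary is stored twice, and adjacency has to be inferred by comparing coordinates. The polygon names and coordinates are hypothetical.

```python
# Two adjacent unit squares stored spaghetti-style: each polygon carries its
# own full coordinate string (hypothetical coordinates).
spaghetti = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

def segments(ring):
    """Undirected segments of a ring, so direction does not affect matching."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}

# Adjacency is not stored; it has to be inferred by comparing coordinates.
shared = segments(spaghetti["west"]) & segments(spaghetti["east"])
print(shared)   # the boundary between (1, 0) and (1, 1), held by both polygons
```

The comparison only works here because the duplicated coordinates match exactly. With real digitized data, small differences between the two copies would defeat this kind of exact matching.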

In contrast to the spaghetti data model, the topological data model is characterized by including topological information within the dataset, as the name implies. Topology is a set of rules that model the relationships between neighboring points, lines, and polygons and determines how they share geometry. For example, consider two adjacent polygons. In the spaghetti model, the shared boundary of two neighboring polygons is defined as two separate, identical lines. The inclusion of topology into the data model allows for a single line to represent this shared boundary with an explicit reference to denote which side of the line belongs to which polygon. Topology is also concerned with preserving spatial properties when the forms are bent, stretched, or placed under similar geometric transformations, which allows for more efficient projection and reprojection of map files.

Three basic topological precepts that are necessary to understand the topological data model are outlined here. First, connectivity describes the arc-node topology for the feature dataset. As discussed previously, nodes are more than simple points. In the topological data model, nodes are the intersection points where two or more arcs meet. In the case of arc-node topology, arcs have both a from-node (i.e., starting node) indicating where the arc begins and a to-node (i.e., ending node) indicating where the arc ends. In addition, between each node pair is a line segment, sometimes called a link, which has its own identification number and references both its from-node and to-node. For example, in the figure “Arc-Node Topology,” arcs 1, 2, and 3 intersect because they share node 11. Therefore, the computer can determine that it is possible to move along arc 1 and turn onto arc 3, while it is impossible to move directly from arc 1 to arc 5, as they do not share a common node.
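The connectivity test described above can be sketched in Python as a lookup in an arc-node table. Only the fact that arcs 1, 2, and 3 share node 11 comes from the text; all other node IDs are placeholders chosen for illustration, since the figure itself is not reproduced here.

```python
# Arc-node table: arc ID -> (from_node, to_node).
# Only "arcs 1, 2, and 3 share node 11" comes from the text; every other
# node ID is a made-up placeholder.
arcs = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    5: (14, 15),
}

def connected(arc_a, arc_b):
    """Two arcs connect when they have at least one node in common."""
    return bool(set(arcs[arc_a]) & set(arcs[arc_b]))

print(connected(1, 3))  # True: both reference node 11
print(connected(1, 5))  # False: no common node
```

No coordinate comparison is involved. Connectivity is answered purely from the stored node references.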

The second fundamental topological precept is area definition. Area definition states that arcs that connect to surround an area define a polygon; this is also called polygon-arc topology. In polygon-arc topology, arcs are used to construct polygons, and each arc is stored only once. This reduces the amount of data stored and ensures that adjacent polygon boundaries do not overlap. For example, in the figure “Polygon-Arc Topology,” the polygon-arc topology shows that polygon F comprises arcs 8, 9, and 10.
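As a rough Python sketch of polygon-arc topology, polygons can be stored as lists of arc IDs while the arc coordinates live in a single table. That polygon F is built from arcs 8, 9, and 10 follows the text. The coordinates, the extra arc 20, and polygon G are hypothetical.

```python
# Each arc's coordinates are stored exactly once (hypothetical coordinates).
arc_coords = {
    8:  [(0, 0), (4, 0)],
    9:  [(4, 0), (4, 3)],
    10: [(4, 3), (0, 0)],
    20: [(4, 0), (8, 3), (4, 3)],   # hypothetical extra arc
}

# Polygon-arc table: polygons reference arcs by ID instead of copying them.
# "F is built from arcs 8, 9, 10" follows the text; polygon G is hypothetical.
polygon_arcs = {
    "F": [8, 9, 10],
    "G": [9, 20],
}

def is_closed(polygon_id):
    """Simple check: every arc endpoint is touched an even number of times."""
    counts = {}
    for arc_id in polygon_arcs[polygon_id]:
        coords = arc_coords[arc_id]
        for node in (coords[0], coords[-1]):
            counts[node] = counts.get(node, 0) + 1
    return all(n % 2 == 0 for n in counts.values())

shared_arcs = set(polygon_arcs["F"]) & set(polygon_arcs["G"])
print(is_closed("F"))   # True
print(shared_arcs)      # {9}: the common boundary, stored only once
```

Because F and G both point at arc 9 rather than holding their own copies, their common boundary cannot drift apart or overlap.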

Contiguity, the third topological precept, is based on the concept that polygons that share a boundary are deemed adjacent. Specifically, polygon topology requires that all arcs in a polygon have a direction (a from-node and a to-node), which allows the “left” and “right” sides of each arc to be defined. Polygons that lie on opposite sides of the same arc are adjacent, or contiguous. This left and right polygon information is stored explicitly within the attribute information of the topological data model. The “universe polygon” is an essential component of polygon topology that represents the external area located outside of the study area. The figure “Polygon Topology” shows that arc 6 is bound to the left by polygon B and to the right by polygon C. Polygon A, the universe polygon, is to the left of arcs 1, 2, and 3.
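Contiguity can be sketched in Python as a table of left and right polygons per arc. The entries for arc 6 (separating polygons B and C, with C on the right) and for the universe polygon A on the left of arcs 1, 2, and 3 follow the text. The right-hand polygons assigned to arcs 1 through 3 are hypothetical placeholders.

```python
# Arc table with left/right polygon attributes.
# Arc 6 separating B and C, and universe polygon A on the left of arcs 1-3,
# follow the text. The right-hand polygons of arcs 1-3 are hypothetical.
arc_sides = {
    1: {"left": "A", "right": "B"},
    2: {"left": "A", "right": "C"},
    3: {"left": "A", "right": "D"},
    6: {"left": "B", "right": "C"},
}

def neighbors(polygon_id, universe="A"):
    """Polygons sharing an arc with polygon_id, excluding the universe polygon."""
    found = set()
    for sides in arc_sides.values():
        pair = {sides["left"], sides["right"]}
        if polygon_id in pair:
            found |= pair - {polygon_id, universe}
    return found

print(neighbors("B"))   # {'C'}
```

Adjacency is read directly from the attribute table, so no geometric comparison of boundaries is needed.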

Topology allows the computer to rapidly determine and analyze the spatial relationships of all its included features. In addition, topological information is essential because it allows for efficient error detection within a vector dataset. In the case of polygon features, open or unclosed polygons, which occur when an arc does not completely loop back upon itself, and unlabeled polygons, which occur when an area does not contain any attribute information, violate polygon-arc topology rules. Another topological error found with polygon features is the sliver. Slivers occur when the shared boundary of two polygons does not meet precisely.
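Two of these polygon checks, unclosed and unlabeled polygons, are easy to sketch in Python. The polygon IDs, coordinates, and labels are hypothetical, and the closure test here is a simplified one that only compares the first and last coordinate pairs of each ring.

```python
# Hypothetical polygon rings and attribute labels.
rings = {
    "P1": [(0, 0), (4, 0), (4, 3), (0, 0)],     # closed
    "P2": [(5, 0), (9, 0), (9, 3), (5, 0.1)],   # does not return to its start
    "P3": [(0, 5), (4, 5), (4, 8), (0, 5)],     # closed but has no label
}
labels = {"P1": "lake", "P2": "forest"}

# An unclosed polygon's last coordinate pair differs from its first.
unclosed = [pid for pid, ring in rings.items() if ring[0] != ring[-1]]

# An unlabeled polygon has no entry in the attribute table.
unlabeled = [pid for pid in rings if pid not in labels]

print(unclosed)    # ['P2']
print(unlabeled)   # ['P3']
```

Detecting slivers is harder, because it requires comparing the two nearly coincident boundaries against a distance tolerance rather than testing for equality.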

In the case of line features, topological errors occur when two lines do not meet perfectly at a node. This error is called an “undershoot” when the lines do not extend far enough to meet each other and an “overshoot” when the line extends beyond the feature it should connect to. The result of overshoots and undershoots is a “dangling node” at the end of the line. Dangling nodes are not always an error, however, as they occur in the case of dead-end streets on a road map.
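Dangling nodes can be found from an arc-node table alone, as the following Python sketch suggests. The arc and node IDs are hypothetical. The idea is simply that a dangling node is touched by exactly one arc end.

```python
from collections import Counter

# Hypothetical arc-node table: arc ID -> (from_node, to_node).
arcs = {
    1: (1, 2),
    2: (2, 3),
    3: (2, 4),
    4: (3, 4),
    5: (4, 5),   # ends at node 5, which no other arc touches
}

# Count how many arc ends meet at each node.
degree = Counter(node for ends in arcs.values() for node in ends)

# A dangling node is touched by exactly one arc end.
dangling = sorted(node for node, n in degree.items() if n == 1)
print(dangling)   # [1, 5]
```

The check flags candidates only. Whether a flagged node is an overshoot, an undershoot, or a legitimate dead end still has to be decided from context, for example by measuring its distance to the nearest other line.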

Many types of spatial analysis require the degree of organization offered by topologically explicit data models. For example, network analysis (e.g., finding the best route from one location to another) and measurement (e.g., finding the length of a river segment) rely heavily on from-nodes and to-nodes and use this information, along with attribute information, to calculate distances and to find the shortest or quickest routes. Topology also allows for sophisticated neighborhood analysis such as determining adjacency, clustering, or nearest neighbors.
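As an illustration of how network analysis can build on from-node and to-node references, the Python sketch below runs Dijkstra's algorithm over a small arc table. The node names and arc lengths are hypothetical, the network is assumed to be traversable in both directions, and Dijkstra's algorithm is one common choice for shortest-route problems, not necessarily what any particular GIS package uses.

```python
import heapq

# Hypothetical arc table: arc ID -> (from_node, to_node, length).
arcs = {
    1: ("a", "b", 4.0),
    2: ("b", "c", 3.0),
    3: ("a", "c", 9.0),
    4: ("c", "d", 2.0),
}

# Build an undirected adjacency list from the from-/to-node references.
graph = {}
for start, end, dist in arcs.values():
    graph.setdefault(start, []).append((end, dist))
    graph.setdefault(end, []).append((start, dist))

def shortest_distance(source, target):
    """Dijkstra's algorithm over the arc-node network."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # target not reachable

print(shortest_distance("a", "d"))   # 9.0 via a-b-c-d
```

Replacing the length attribute with travel time would turn the same search into a quickest-route calculation.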

With the basic concepts of topology outlined, the topological data model becomes easier to understand. In this model, the node acts as more than just a simple point along a line or polygon. Instead, the node represents the point of intersection for two or more arcs. Arcs may or may not be looped into polygons. Regardless, all nodes, arcs, and polygons are individually numbered. This numbering allows for quick and easy reference within the data model.

# Advantages and Disadvantages of the Vector Model

In comparison with the raster data model, vector data models tend to be better representations of reality due to the accuracy and precision of points, lines, and polygons over the regularly spaced grid cells of the raster model. As a result, vector data tend to be more aesthetically pleasing than raster data.

Vector data also provide an increased ability to alter the scale of observation and analysis. Because each coordinate pair associated with a point, line, or polygon represents an infinitesimally exact location (albeit limited by the number of significant digits and data acquisition methodologies), zooming deep into a vector graphic does not degrade the view in the way that it does for a raster graphic.

Vector data tend to have a more compact data structure, so file sizes are typically much smaller than those of their raster counterparts. Although the capacity of modern computers has reduced the importance of maintaining small file sizes, vector data often require a fraction of the storage space needed for raster data.

The final advantage of vector data is that topology is inherent in the vector model. This topological information simplifies spatial analysis (e.g., error detection, network analysis, proximity analysis, and spatial transformation).

On the other hand, the vector data model has two primary disadvantages. First, the data structure tends to be more complex than the simple raster data model. As the location of each vertex must be stored explicitly in the model, there are no shortcuts for storing data like there are for raster models (e.g., the run-length and quad-tree encoding methodologies).

Second, the implementation of spatial analysis can be complicated by minor differences in accuracy and precision between the input datasets. Similarly, the algorithms for manipulating and analyzing vector data are complex and can lead to intensive processing requirements, particularly when dealing with large datasets.