
3.4 Data Driven Research

Researching Data

Now that we have a basic understanding of data and information, where can we find such data and information? Though an Internet search will undoubtedly produce myriad sources and types of data, the hunt for relevant and valuable data is often challenging and iterative. Therefore, before hopping online and downloading the first thing that appears from a web search, it is helpful to frame our search for data with the following questions and considerations:

What exactly is the purpose of the data?

Since the world is drowning in vast amounts of data, articulating why we need or do not need a given data set will streamline the search for valuable and relevant data. To this end, the more specific we can be about the purpose of the needed data, the more efficient our search for data will be. For example, if we are interested in understanding and studying economic growth, it is helpful to determine both temporal and geographic scales. In other words, which periods (e.g., 1850–1900) and intervals (e.g., quarterly, annually) are we interested in, and at what level of analysis (e.g., national, regional, state)? Data availability, or the lack of relevant data, will frequently force us to change the purpose or scope of our original question. A clear purpose will yield a more efficient search for data and enable us to quickly accept or discard the various data sets we may encounter.

What data already exists and is available?

Before searching for new data, it is always a good idea to inventory the data we already have. Such data may come from previous projects or analyses, or from colleagues and classmates, but the critical point here is that we can save a lot of time and effort by using data we already possess. Furthermore, we better understand what we need by identifying what we have. For instance, though we may already have census data (i.e., attribute data), we may need updated geographic data that contains the boundaries of U.S. states or counties.

What are the costs associated with data acquisition?

Data acquisition costs go beyond financial costs. Just as important as the financial costs of data are the costs in time. Time is money. The time and energy you spend finding, collecting, cleaning, and formatting data are time and energy taken away from data analysis. Therefore, depending on deadlines, time constraints, and deliverables, it is critical to learn to manage your time when looking for data.

What format does the data need to be in?

Though many programs can read many data formats, some data types can only be read by certain programs, and some programs require specific data formats. Understanding which data formats you can and cannot use will aid your search for data. For instance, the shapefile is one of the most common forms of geographic information system (GIS) data. Not all GIS programs can read or use shapefiles, however, so it may be necessary to convert to or from a shapefile or another format. Therefore, the more data formats we are familiar with, the better off we will be in our search for data because we will understand what we can use and what format conversions need to be made if necessary.
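
As a small illustration of what a format conversion involves, the sketch below turns a table of points stored as CSV text into GeoJSON, another widely used geographic data format, using only Python's standard library. The column names (name, lon, lat) and the sample rows are hypothetical. Converting to or from a shapefile would in practice usually rely on a dedicated GIS program or library rather than hand-written code like this.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert CSV rows with longitude/latitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # GeoJSON stores coordinates as [longitude, latitude], in that order.
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),  # remaining columns become attribute data
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data, for illustration only.
sample_csv = "name,lon,lat\nSite A,-77.86,40.79\nSite B,-75.16,39.95\n"
collection = csv_points_to_geojson(sample_csv)
print(json.dumps(collection, indent=2))
```

Note that the conversion changes only how the data are packaged: the locations and attributes are the same before and after.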

All these questions are of equal importance, and being able to answer them will assist in a more efficient and effective search for data. Several other considerations lie behind the search for data, particularly GIS data, but those listed here provide an initial pathway to a successful search.

As information technology evolves and more data are collected and distributed, the various forms of data that can be used with GIS increase. GIS uses and integrates two types of data: geographic and attribute data. Sometimes the source of both geographic and attribute data is the same. For instance, the United States Census Bureau distributes geographic boundary files (e.g., census tract level, county level, state level) and the associated attribute data (e.g., population, race/ethnicity, income). What is more, such data are available at no charge. U.S. Census data are exceptional in that they are both free and comprehensive.
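
To make the integration of geographic and attribute data concrete, the sketch below joins a small attribute table to a set of boundary records through a shared identifier. All identifiers, boundaries, and values are made up for illustration, and the field name geo_id is an assumption rather than the name used by any particular data provider.

```python
# Hypothetical geographic records: an identifier plus a simplified boundary.
boundaries = [
    {"geo_id": "A01", "boundary": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"geo_id": "A02", "boundary": [(2, 0), (4, 0), (4, 2), (2, 2)]},
    {"geo_id": "A03", "boundary": [(0, 2), (2, 2), (2, 4), (0, 4)]},
]

# Hypothetical attribute table keyed by the same identifier (values are made up).
attributes = {
    "A01": {"population": 1200, "median_income": 41000},
    "A02": {"population": 850, "median_income": 52000},
}


def join_attributes(geo_records, attribute_table, key="geo_id"):
    """Attach attribute values to each geographic record that has a matching key."""
    joined, unmatched = [], []
    for record in geo_records:
        attrs = attribute_table.get(record[key])
        if attrs is None:
            unmatched.append(record[key])  # boundary with no attribute data
        else:
            joined.append({**record, **attrs})
    return joined, unmatched


joined, unmatched = join_attributes(boundaries, attributes)
print(len(joined), "records joined; unmatched:", unmatched)
```

Reporting the unmatched identifiers is a deliberate choice here: a boundary without attributes often signals that the two data sets differ in vintage or coverage.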

Every search for data will vary according to the purpose. However, government data tend to have good coverage and provide a point of reference from which other data can be added, compared, and evaluated. Whether you need satellite imagery data from the National Aeronautics and Space Administration (NASA) or land use data from the United States Geological Survey (USGS), such government sources tend to be dependable, reputable, and consistent. Another critical element of most government data is that they are freely accessible to the public. In other words, the data can be acquired and used at no charge. Data that are free to use are called public data.

In contrast to public data, there are numerous private or proprietary data sources. The main difference between public and proprietary data is that the former tends to be free, whereas the latter must be acquired at a cost. Furthermore, there are often restrictions on distributing and disseminating proprietary data sets (i.e., sharing the purchased data is not allowed). Nevertheless, proprietary data may be the only option depending on the subject. Another reason for using proprietary data is that the data may be formatted and cleaned according to your needs. When working with deadlines, the trade-off between financial cost and time saved must be seriously considered and evaluated.

The search for data, particularly the data you need, is often the most time-consuming aspect of any GIS-related project. Therefore, it is essential to define and clarify your data requirements and needs, from the temporal and geographic scales of data to the formats required, as clearly as possible and as early as possible. Such definition and clarity will pay dividends in your search for the correct data, better analyses, and well-informed decisions.

Geographic Data and Questions

The ultimate objective of all geospatial data and technologies is to produce knowledge. Most of us are interested in data only to the extent that it can be used to help us understand the world around us and to make better decisions. Decision-making processes vary considerably from one organization to another. In general, however, the first steps in making a decision are to articulate the questions that need to be answered and to gather and organize the data needed to answer the questions (Nyerges & Golledge, 1997).

Geographic data and information technologies can be highly effective in helping to answer certain kinds of questions. However, the expensive, long-term investments required to build and sustain GIS infrastructures can be justified only if the questions that confront an organization can be stated in terms GIS is equipped to answer. As a specialist in the field, you may be expected to advise clients and colleagues on the strengths and weaknesses of GIS as a decision support tool. Examples of questions that are amenable to GIS analyses follow, along with questions that GIS is not so well suited to help answer.

The most straightforward geographic questions pertain to individual entities. Such questions include:

Geographic questions about space and time

  • Where is the entity located?
  • What is its extent?
  • When were the entity’s location, extent, or attributes measured?
  • Has the entity’s location, extent, or attributes changed over time?

Simple questions like these can be answered effectively with a good, printed map. GIS becomes increasingly attractive as the number of people asking the questions grows, especially if they lack access to the required paper maps.
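
For a sense of how software might represent answers to the location and extent questions, the sketch below summarizes an entity's extent as a bounding box and its location as the center of that box. The parcel outline is hypothetical and uses arbitrary planar units. A bounding box is only one simple way to describe extent, chosen here for brevity.

```python
def bounding_box(vertices):
    """Return the extent of a shape as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def box_center(box):
    """Return the center of a bounding box as a simple summary of location."""
    min_x, min_y, max_x, max_y = box
    return (min_x + max_x) / 2, (min_y + max_y) / 2


# Hypothetical parcel outline in arbitrary planar units.
parcel = [(1.0, 1.0), (5.0, 1.0), (5.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
extent = bounding_box(parcel)
center = box_center(extent)
print("extent:", extent)
print("center:", center)
```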

Geographic questions related to location attributes

  • What are the attributes of the entity located there?
  • Do its attributes match one or more criteria?

More complex questions arise when we consider relationships among two or more entities. For instance, we can ask:

Geographic patterns and relationships

  • Do the entities contain one another?
  • Do they overlap?
  • Are they connected?
  • Are they situated within a certain distance of one another?
  • What is the best route from one entity to the others?
  • Where are entities with similar attributes located?
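
Three of the relationships listed above (containment, overlap, and proximity) can be sketched with a few lines of plain Python. This is a minimal illustration under simplifying assumptions: coordinates are planar, containment is tested for a point inside a polygon using the common ray-casting approach, and overlap is tested only on bounding boxes, which approximates rather than settles whether two irregular shapes overlap. The sample shapes are hypothetical.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test: does the polygon contain the point?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point flips inside/outside.
            if x < x_cross:
                inside = not inside
    return inside


def boxes_overlap(a, b):
    """Do two (min_x, min_y, max_x, max_y) extents overlap or touch?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def within_distance(p, q, limit):
    """Are two points within a given straight-line distance of one another?"""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= limit


# Hypothetical shapes in arbitrary planar units.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))              # point inside the square
print(point_in_polygon((5, 2), square))              # point outside the square
print(boxes_overlap((0, 0, 4, 4), (3, 3, 6, 6)))     # extents share an area
print(within_distance((0, 0), (3, 4), 5))            # distance is exactly 5
```

GIS software performs these tests with far more care for edge cases, coordinate systems, and performance, but the underlying questions are the same.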

Location-based attribute relationships

  • Do the entities share attributes that match one or more criteria?
  • Are the attributes of one entity influenced by changes in another entity?

Location-based temporal relationships

  • Have the entities’ locations, extents, or attributes changed over time?

Geographic data and information technologies are very well suited to answering moderately complex questions like these. GIS is most valuable to large organizations that often need to answer such questions. Harder still, however, are explanatory questions, such as why entities are located where they are, why they have the attributes they do, and why they have changed as they have. In addition, organizations are often concerned with predictive questions, such as what will happen at this location if thus-and-so occurs. Commercial GIS software packages cannot be expected to provide clear-cut answers to explanatory and predictive questions right out of the box. Typically, analysts must turn to specialized statistical packages and simulation routines. Information produced by these analytical tools may then be re-introduced into the GIS database if necessary. Research and development efforts to couple analytical software more tightly with GIS software are underway within the GIScience community. It is important to remember that decision support tools like GIS are no substitute for human experience, insight, and judgment.

At the outset of the chapter, I suggested that producing information by analyzing data is like making energy by burning coal. In both cases, technology is used to realize the potential value of raw materials. Also, in both cases, the production process yields some undesirable by-products. Similarly, in the process of answering specific geographic questions, GIS tends to raise others, such as:

  • Given the intrinsic imperfections of the data, how dependable are the results of the GIS analysis?
  • Does the information produced through GIS analysis systematically benefit some constituent groups at the expense of others?
  • Should the data used to make the decision be made public?
  • Does the use of GIS affect the organization’s decision-making processes in ways that benefit its management, employees, and customers?

As in many endeavors, the answer to a geographic question usually includes more questions.
