UP | HOME

Chapter XXX: Databases and SQL

$Id: kurt-2010.org 13030 2010-01-14 13:33:15Z schwehr $

Table of Contents

Introduction

This chapter is meant to give you an introductory knowledge of databases that use the Structured Query Language (SQL). For more theory and complicated commands, you will need to consult references that focus on databases.

Why use a database

Outlined only

  • Trying to work with column based data can get tiring
  • SQL provides a useful language for querying data
  • Allows uniform access to things like R, Python, QGIS, ArcGIS, Matlab, etc

Which database software to use

We will work with two open source databases in this chapter: SQLite and PostgreSQL. These two systems cover the range of power of databases.

SQLite is the simplest of databases. It uses a single file on your computer to store the database. This makes it easy to delete the file and start again or send the file to a coworker to share data. SQLite, by virtual of being very simple, is very fast for basic tasks. However, it is missing a very large number of features required in the SQL standard. For example it only has a couple data types and uses strings to represent many of the types. It does not follow the strict typing of SQL, meaning that you can store an integer in a string field and it will not complain. There are a number of very like web sites that use SQLite behind the scenes.

PostgreSQL is one of the most feature complete SQL databases around. It focuses on trying to do a very good job of following the SQL standard. It supports scripts that run inside the database (called store procedures), has a spatial add module called PostGIS, and has extensive documentation. PostgreSQL runs a server program (called a daemon) that is always in the background. Programs can talk to the database, which can be on the same computer or can connect over a network connection. However, for getting started learning databases, we will focus on SQLite to avoid the complexities of setting up a PostgreSQL.

Once you have the basics of SQL, we will start to use Python to talk to the database. First we will put SQL right into the Python code. This is the simplest way to talk to a database and allows you to see exactly what is happening. After that, we will use SQLAlchemy, which hides the SQL commands and allows you to interact with the database without having to think about the exact SQL commands that are being run for you.

Acquiring the data

database@iodp.tamu.edu

FIX/TODO: Explain where I got the data, what it means, why I chose it, and where how to get a copy.

Getting started with SQLite

Sample dataset - world ocean drilling

There is no better way to start learning SQL than to create a small database and to make use of it. For this section, we will use a data set of the world locations of bore holes drilled by the Deep Sea Drilling Program (DSDP), Ocean Drilling Program (ODP), and Integrated Ocean Drilling Program (IODP). We have fairly complete version of the table, but we will start with just 6 rows of the table to start.

LEGSITEHLONGITUDELATITUDEDate ArrivedWater Depth (m)TOTAL PEN (m)
100625A-87.160028.831701/12/85 1010889234.9
101626A-79.546825.600002/01/85 0415846.812.8
103637A-12.863342.088304/28/85 18005311285.6
104642A2.928367.225006/28/85 03151281.610.8

Creating a table in the database

We need to create a table in our database to store this data. SQL has a number of standard data types for each column of a database table. SQLite plays pretty loosely with data types, but for most purposes it just works. The basic SQL types are as follows:

TypeDescription
BOOLEANTRUE or FALSE
INTEGERA whole number - same as an
REALA floating point number
FIX: doc the rest

We will start off by creating a database. With SQLite, this is a lot easier than you might imagine. All we have to do is start SQLite and give it the name of the database file. Make sure not to use the "sqlite" command (without the 3), as you will be using a much older version of SQLite.

sqlite3 holes.db3

SQLite will respond with its initialization and leave you at a prompt. It is waiting for you to get started!

SQLite version 3.6.13
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> 

Now we need to create the table for our holes. Just paste this text directly into your sqlite

CREATE TABLE IF NOT EXISTS holes (
       -- Two dashes are a comment to the end of a line
       id INTEGER PRIMARY KEY, -- every table should have a primary key
       leg INTEGER,
       site INTEGER,
       hole VARCHAR(1),  -- a string of length 1
       arrived TIMESTAMP,
       water_depth REAL, -- in meters. It is helpful to comment the units!
       total_pen REAL -- in meters
);

FIX: Write a walk through of what just happened.

You now have a table that you can add data to. But first we should ask SQLite what it thinks it has in the database. SQLite has built-in commands that are not a part of SQL. The most important one is .help. That is a period followed by the word help. The .tables command will list the tables that we have. There will only be one: holes. Then the .schema command will list the definitions of all the tables. A schema is what SQL calls the definition of the contents of a table. SQLite preserves the comments, thereby keeping things like the units about which SQL knows nothing.

sqlite> .tables
holes
sqlite> .schema
CREATE TABLE holes (
       -- Two dashes are a comment to the end of a line
       id INTEGER PRIMARY KEY, -- every table should have a primary key
       leg INTEGER,
       site INTEGER,
       hole VARCHAR(1),  -- a string of length 1
       lon REAL, -- Longitude.  X before Y.
       lat REAL, -- Latitude
       arrived TIMESTAMP,
       water_depth REAL, -- in meters. It is helpful to comment the units!
       total_pen REAL -- in meters
);

Adding data to a table

Now that you have a table defined, you would like to add data to it. Let's start with just one line:

LegSiteHLongitudeLatitudeDate ArrivedWater Depth (m)Total Pen (m)
100625A-87.160028.831701/12/85 1010889234.9

The INSERT command puts data in the table.

INSERT INTO holes (leg, site, hole, lon, lat, arrived, water_depth, total_pen)
       VALUES (100,  625, 'A', -87.1600, 28.8317, '1985-01-12 10:10', 889, 234.9);

Getting the data back

FIX: write stuff to explain this!

sqlite> SELECT * FROM holes;
1|100|625|A|-87.16|28.8317|1985-01-12 10:10|889.0|234.9

The output from the select is ugly and hard to read. Lets try again, but first ask SQLite to do a better job formatting the output.

sqlite> .header ON
sqlite> .separator "\t"
sqlite> SELECT * FROM holes;
id      leg     site    hole    lon     lat     arrived water_depth     total_pen
1       100     625     A       -87.16  28.8317 1985-01-12 10:10        889.0   234.9

Tab characters are better, but not perfect.

Repeating the above from within Python

A SQLite driver comes with python starting with version 2.5 of python. If you are using Python 2.4 or older, you need to update your computer.

python
import sqlite3
cx = sqlite3.connect('holes2.db3')
cx.execute('''CREATE TABLE IF NOT EXISTS holes (
       -- Two dashes are a comment to the end of a line
        id INTEGER PRIMARY KEY, -- every table should have a primary key
        leg INTEGER,
        site INTEGER,
        hole VARCHAR(1),  -- a string of length 1
        lon REAL, -- Longitude
        lat REAL, -- Latitude
        arrived TIMESTAMP,
        water_depth REAL, -- in meters. It is helpful to comment the units!
        total_pen REAL -- in meters
 );
 ''')
cx.execute('''INSERT INTO holes (leg, site, hole, lon, lat, arrived, water_depth, total_pen)
       VALUES (100,  625, 'A', -87.1600, 28.8317, '1985-01-12 10:10', 889, 234.9);''')

Now we need to fetch the data back from the database. The SELECT call will return a "cursor".

FIX: explain cursors

cu = cx.execute('SELECT * FROM holes;')
print cu.fetchone()

The print should return the first line from the database.

(1,
 100,
 625,
 u'A',
 -87.159999999999997,
 28.831700000000001,
 u'1985-01-12 10:10',
 889.0,
 234.90000000000001)

Conventions - CAPITALIZATION

By now you have noticed that I capitalize some things and not others. SQL does not care, but there is a convention with most database programmers. Keywords in SQL are written in CAPITAL LETTERS, while column names are all lower case.

FIX: flush out

Switching to PostgreSQL

createuser -U postgres $USER createdb drilling

backup and restore

pgAdmin3 interface

Other databases and tools

Old school

CSV and gdbm/bdb

XML

It's not a traditional database, but XML can work like a database.

SpatialLite

MySQL

Oracle and OracleSpatial

Fancy Non-traditional databases

couch, column focused databases

Knowing when you need to get a pro

Additional Reading and References

FIX: write this!

Web sites

Cheat sheets

Books

Videos

Google Tech Talks? PostGIS status video

Author: Kurt Schwehr

Date: $Date: $

HTML generated by org-mode 7.3 in emacs 23