Class 8: More emacs and script files (DRAFT)

Introduction

Getting help    stackoverflow

Videos on Emacs

Sadly, http://showmedo.com does not have any good videos on the basics of using emacs. You might find the Hack Emacs videos on YouTube by rpdillon useful for getting more comfortable with emacs.

Creating a log file    logging

For the rest of the semester, you need to keep a log file for this class.

mkdir ~/Dropbox/logs

Working with the ocean drilling projects site database    odp

mkdir -p class/8
cd class/8

Today, we will use a program very similar to wget called curl (curl.1 man page) to fetch data.

sudo apt-get install curl

curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/examples/holes.csv.bz2
# /esci895-researchtools/examples/holes.csv.bz2
#  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100 38953  100 38953    0     0   224k      0 --:--:-- --:--:-- --:--:--  358k

Uncompress and take a first look at this file:

bunzip2 holes.csv.bz2

wc -l holes.csv 
# 3047 holes.csv

head holes.csv 

The beginning of the holes.csv file looks like this:

Expedition,Site,Hole,Program,Longitude,Latitude,Water Depth (m),Core Recovered (m)
1,1,,DSDP,-92.1833,25.8583,2827,50
1,2,,DSDP,-92.0587,23.0455,3572,13
1,3,,DSDP,-92.0433,23.03,3747,47
1,4,,DSDP,-73.792,24.478,5319,15
1,4,A,DSDP,-73.792,24.478,5319,5.8
1,5,,DSDP,-73.641,24.7265,5354,6.4
1,5,A,DSDP,-73.641,24.7265,5354,1.8
1,6,,DSDP,-67.6477,30.8398,5124,28
1,6,A,DSDP,-67.6477,30.8398,5124

Now we are going to use a program called cut to extract the "Program" column of the file. You can see above in the comma-separated value (CSV) data that there is at least a "DSDP" entry, which stands for the Deep Sea Drilling Project that ran from 1968 to 1983. cut can work in a couple of different ways, but here we are going to ask it to work in "field mode" with "-f" and tell it that commas (",") are the delimiter (or separator) between fields. We set the delimiter with "-d" followed by the comma character. We then give "-f" the number of the field we want. Looking at the first line of the file, you can see that "Program" appears in the fourth position.

cut -d, -f4 holes.csv | head
# Program
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP

Because of the pipe through head, the above command only shows the first 10 lines on the screen. That is not very helpful. We would like to see how many unique entry types there are. The uniq command removes duplicate lines that are adjacent to each other in the text that it receives, which is enough here because the file is grouped by program.

cut -d, -f4 holes.csv | uniq
# Program
# DSDP
# ODP
# IODP

Next, let's see how many lines there are for each program. We can use egrep to select the lines that contain a program name and pass its output to the word count program we used before. wc has an option to only print the number of lines, so we will add "-l" to the command line.

The data gets passed from one program to another by a pipe. What goes in one side, comes out the other. A pipe is created by the vertical bar character: "|".

egrep DSDP holes.csv | wc -l  # the letter "l" as in Lima, not the number 1
# 1116

egrep ODP holes.csv | wc -l
# 1930

egrep IODP holes.csv | wc -l
# 153

We have a slight problem here in that the counts are not adding up. The string ODP is found in both the ODP and IODP entries. Here I am using the "binary calculator" (bc.1 man page) to do a little math. I suspect you can just do this by hand, but the example shows another pipe.

# The 3 results from the word counts above
echo  "1116 + 1930 + 153" | bc
# 3199

# That adds up to more than the number of lines in the file
wc -l holes.csv
# 3047 holes.csv

We can use the "," that precedes the ODP to help avoid the IODP.

egrep 'ODP' holes.csv  | wc -l
# 1930

egrep ',ODP' holes.csv  | wc -l
# 1777

There are lots of other ways that we could have solved this, but this way is pretty simple compared to some of the others.
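As one of those other ways, here is a hedged sketch that uses awk to compare the fourth field exactly, so "ODP" can never match inside "IODP". The sample rows are invented stand-ins for holes.csv (apart from the first DSDP row), and the variable names are assumptions.

```shell
# Tiny stand-in for holes.csv; the ODP and IODP rows are invented for illustration
sample=$(mktemp)
cat > "$sample" <<'EOF'
Expedition,Site,Hole,Program,Longitude,Latitude,Water Depth (m),Core Recovered (m)
1,1,,DSDP,-92.1833,25.8583,2827,50
999,1,A,ODP,-10.0,20.0,1000,5
999,2,A,IODP,-11.0,21.0,2000,7
999,3,A,IODP,-12.0,22.0,3000,9
EOF

# Substring match: counts the ODP row and both IODP rows
substring_count=$(grep -c 'ODP' "$sample")

# Exact match on field 4 only: counts just the ODP row
exact_count=$(awk -F, '$4 == "ODP"' "$sample" | wc -l | tr -d ' ')

echo "substring: $substring_count  exact: $exact_count"
rm -f "$sample"
```

With the corrected ODP count from the real file, the totals are consistent: 1116 + 1777 + 153 = 3046 data lines, plus one header line, gives the 3047 lines reported by wc.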

Writing results to a file and making a quick plot with Gnuplot    gnuplot redirection

It is always important to get a graphical view of spatial data. Later in this chapter, we will start using Google Earth, and in a future chapter, we will load our data into a Geographic Information System (GIS). For now, we will draw the locations with Gnuplot. This graphing program is not as flexible as matplotlib, which we will cover in the Python programming chapters, but it can definitely get the job done.

Gnuplot works most easily with files that have space delimited rather than comma delimited text data values. We need to pull out the longitude and latitude values from the holes.csv file. We can start back with the cut command that we used before. This time we will give it two different fields in the csv to print with "-f5-6". This means we are asking for fields 5 through 6. We could also have said "-f5,6", which would be fields 5 and 6.

cut -d, -f5-6 holes.csv | head
Longitude,Latitude
-92.1833,25.8583
-92.0587,23.0455
-92.0433,23.03
-73.792,24.478
-73.792,24.478
-73.641,24.7265
-73.641,24.7265
-67.6477,30.8398
-67.6477,30.8398
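The difference between a field range and a field list is easier to see when the two selections are not the same. A small sketch on a made-up line of six fields (the input string and variable names are just for illustration):

```shell
# A made-up line with six comma-separated fields
line="a,b,c,d,e,f"

# A range: fields 2 through 4
range_fields=$(echo "$line" | cut -d, -f2-4)

# A list: fields 2 and 4 only
list_fields=$(echo "$line" | cut -d, -f2,4)

echo "$range_fields"   # b,c,d
echo "$list_fields"    # b,d
```

For fields 5 and 6 of holes.csv the two forms select the same columns, which is why either one works there.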

Gnuplot will get confused by the "Longitude,Latitude" strings on the first line. We can get rid of this line with the egrep command. Normally, egrep returns the lines that match, but we can ask it to return all lines that do not match by giving it the invert option "-v". We then give it the string "Longitude" to match, and it returns all lines that do not contain it.

egrep -v Longitude holes.csv | cut -d, -f5-6 | head
-92.1833,25.8583
-92.0587,23.0455
-92.0433,23.03
-73.792,24.478
-73.792,24.478
-73.641,24.7265
-73.641,24.7265
-67.6477,30.8398
-67.6477,30.8398
-68.2967,30.134

The output above is pretty close to being usable, but we have a "," character between each longitude and latitude. We can use the tr (translate) command to exchange the "," for a " " (space). Make sure to place the tr after the cut command, or cut will not be able to tell the comma-separated fields apart.

egrep -v Longitude holes.csv | cut -d, -f5-6 | tr "," " " | head
-92.1833 25.8583
-92.0587 23.0455
-92.0433 23.03
-73.792 24.478
-73.792 24.478
-73.641 24.7265
-73.641 24.7265
-67.6477 30.8398
-67.6477 30.8398
-68.2967 30.134

This is the format that we need for Gnuplot, but we need the longitude and latitude lines saved to a file. The ">" (greater-than character) "redirects" the output from the last program in the chain of pipes to the file whose name follows the ">". Be warned that ">" will overwrite a previous file with the same name if one exists. First, try a simpler example to see ">" in action. Here, I also use the cat (concatenate and print files) command to dump the contents of the "listing" file to the terminal. cat is much simpler than less, but if a file is very long or you are not sure how long the file is, you are better off using less.

Note: ">>" appends to a file if it already exists or creates a new file when needed, whereas ">" will clobber a file if one already exists.

ls -l > listing

# Your output may be different depending on the files you have in your
# current directory
cat listing
total 124
-rw-r--r-- 1 researchtools researchtools 125861 2011-09-22 04:46 holes.csv
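To make the difference between ">" and ">>" concrete, here is a small sketch that writes to a scratch file in a temporary directory so nothing in your class directory is touched. The file name demo.txt and the variable names are made up for this example.

```shell
# Work in a scratch directory so no real files are overwritten
workdir=$(mktemp -d)

echo "first"  >  "$workdir/demo.txt"   # ">" creates the file
echo "second" >> "$workdir/demo.txt"   # ">>" appends a second line
after_append=$(cat "$workdir/demo.txt")

echo "third"  >  "$workdir/demo.txt"   # ">" replaces the old contents
after_overwrite=$(cat "$workdir/demo.txt")

echo "After >>:"
echo "$after_append"
echo "After >:"
echo "$after_overwrite"

rm -rf "$workdir"
```

After the append the file holds two lines ("first" and "second"); after the final ">" only "third" remains.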

Now that you know how to redirect the output to a file, send the results of the chain of pipes consisting of egrep, cut, and tr to the file "xy.dat".

egrep -v Longitude holes.csv | cut -d, -f5-6 | tr "," " " > xy.dat

head xy.dat
-92.1833 25.8583
-92.0587 23.0455
-92.0433 23.03
-73.792 24.478
-73.792 24.478
-73.641 24.7265
-73.641 24.7265
-67.6477 30.8398
-67.6477 30.8398
-68.2967 30.134

It is time to give gnuplot a quick try. This does not give you much of a sense of what gnuplot can do, but we can at least look at the locations of the cores.

Note for Cygwin users: You must be running a shell through X11 to be able to plot with Gnuplot. If you are on Linux or Mac, this should just work with a graph popping up on your screen.

gnuplot
plot 'xy.dat'
# There should be a plot of the data on your screen.
quit

That looks really wrong! Check it out with the GMT minmax command from the homework:

GMT minmax xy.dat

The output confirms that something is very wrong:

xy.dat: N = 3046        <-179.5558/179.738>     <-77.4413/5736.4>

A latitude higher than 90 North is definitely wrong. Let's constrain the plot to the globe and see what we get.

gnuplot
set yrange [-90:90]
plot 'xy.dat'
quit

Before this database is usable, we will clearly need to fix some problems. The lesson:

Real data has real warts.
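One way to start hunting for the warts is to list the rows whose latitude falls outside -90 to 90. This is only a sketch on a three-line stand-in for xy.dat: the first two points come from the file, and the third pairs an invented longitude with the 5736.4 maximum that minmax reported. The variable names are assumptions.

```shell
# Three-line stand-in for xy.dat; the last longitude is invented
points=$(mktemp)
cat > "$points" <<'EOF'
-92.1833 25.8583
-73.792 24.478
-10.0 5736.4
EOF

# Report the line number and contents of any row with an impossible latitude
bad_rows=$(awk '$2 < -90 || $2 > 90 {print NR ": " $0}' "$points")

# Count the rows that pass the same check
good_count=$(awk '$2 >= -90 && $2 <= 90' "$points" | wc -l | tr -d ' ')

echo "Suspect rows:"
echo "$bad_rows"
echo "Rows with a valid latitude: $good_count"
rm -f "$points"
```

Knowing the line numbers of the suspect rows makes it easier to go back to holes.csv and see what went wrong in the original record.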

This will be the last time that we use gnuplot. We will do the rest of our plotting using matplotlib in Python!

You can see examples of the wide range of plots that can be made with Gnuplot here:

http://www.gnuplot.info/screenshots/

Creating a Google Earth KML    googleearth kml

Now we are going to create our first KML file. We are going to cheat a bit and not try to understand the file format, but this will at least show you how easy it can be.

First, get the header and footer text for the KML line format:

curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/google-earth-line-start.kml
curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/google-earth-line-end.kml

These two pieces give you the front and back of the KML, and all we need to do is provide the coordinates for the line.

Get the Boston construction coordinates file used during the homework:

curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/examples/2007-boston-construction.csv.bz2

bunzip2 2007-boston-construction.csv.bz2

Take a look at the file:

head 2007-boston-construction.csv 
-70.5014566667,42.1006833333,1179617934
-70.5016466667,42.101755,1179617991
-70.501845,42.1028766667,1179618051
-70.5020833333,42.1039,1179618111
-70.5022083333,42.1049116667,1179618176
-70.5022883333,42.1059316667,1179618233
-70.502515,42.1069266667,1179618296
-70.5027566667,42.10796,1179618356
-70.5028616667,42.1090066667,1179618416
-70.5029816667,42.1102133333,1179618486

We can reuse the cut command to get just the X and Y coordinates:

cut -d, -f1,2 2007-boston-construction.csv | head
-70.5014566667,42.1006833333
-70.5016466667,42.101755
-70.501845,42.1028766667
-70.5020833333,42.1039
-70.5022083333,42.1049116667
-70.5022883333,42.1059316667
-70.502515,42.1069266667
-70.5027566667,42.10796
-70.5028616667,42.1090066667
-70.5029816667,42.1102133333

We are lucky! KML expects coordinates to come as x,y,z or x,y (longitude,latitude with an optional altitude), which is the order our file already uses:

<Placemark>
  <LineString>
    <coordinates>
      -125.810021667,48.4840316667
      -125.810295,48.483705
    </coordinates>
  </LineString>
</Placemark>

Let's create the x,y pairs in a file:

cut -d, -f1,2 2007-boston-construction.csv > 2007-boston-construction.xy

We can now put the header, points and tail together to create a KML file. Google Earth has trouble with lines with too many points in them, so we will use head to only output some of the points.

cat        google-earth-line-start.kml >  2007-boston-construction.kml
head -1000 2007-boston-construction.xy >> 2007-boston-construction.kml
cat        google-earth-line-end.kml   >> 2007-boston-construction.kml
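The same assembly can be written with a single redirect by grouping the commands in braces. The sketch below is self-contained so it can run without the downloads: the header and footer are a hypothetical minimal KML wrapper (the real course files may contain more, such as styling), the three points are copied from the head of the Boston file, and all file and variable names are made up.

```shell
workdir=$(mktemp -d)

# Hypothetical minimal header and footer; the real course files may differ
cat > "$workdir/start.kml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<LineString>
<coordinates>
EOF
cat > "$workdir/end.kml" <<'EOF'
</coordinates>
</LineString>
</Placemark>
</Document>
</kml>
EOF

# Three points copied from the head of 2007-boston-construction.csv
cat > "$workdir/track.csv" <<'EOF'
-70.5014566667,42.1006833333,1179617934
-70.5016466667,42.101755,1179617991
-70.501845,42.1028766667,1179618051
EOF

# Group the commands so one ">" captures all of their output in order
{
  cat "$workdir/start.kml"
  cut -d, -f1,2 "$workdir/track.csv" | head -n 1000
  cat "$workdir/end.kml"
} > "$workdir/track.kml"

kml_lines=$(wc -l < "$workdir/track.kml" | tr -d ' ')
first_point=$(sed -n '7p' "$workdir/track.kml")
echo "$kml_lines lines, first point: $first_point"
rm -rf "$workdir"
```

Grouping avoids the risk of typing ">" instead of ">>" on the second or third command and silently losing the header.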

Creating a script

There are some key tricks to understanding variables in bash. First, you must have no spaces before or after the equal sign. Bash is very picky about this. The second is knowing where your variable is available. Without export, the variable is not available to other programs that are called from the command line. For us, right now, export is not important, but later on, for things like the PATH variable that controls where to look for programs, export is essential.

To demonstrate variables, we will use the echo command which will just print out to the screen whatever we pass to it. Give it a try. The "$" character starts the use of a variable.

# Set a variable
testing=123

# Print the variable
echo $testing
# 123

# Start a new bash shell inside the original one
bash

# See that "testing" is not set.  If there is no variable, bash gives
# an empty string
echo $testing

# quit back to the main bash shell
exit

# Set testing to have a value that will be inherited
export testing="hello world"

bash

# Now see that the exported variable went through
echo $testing
# hello world
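The same experiment can be run without typing into a nested shell by using "bash -c", which starts a child bash, runs one command, and returns. This is a sketch; the square brackets are only there to make an empty value visible, and the two child_sees variable names are made up.

```shell
# Start clean in case "testing" is already set in this shell
unset testing

# No spaces around "="; this variable is NOT exported
testing=123
child_sees_unexported=$(bash -c 'echo "[$testing]"')

# Exported variables are copied into the environment of child programs
export testing="hello world"
child_sees_exported=$(bash -c 'echo "[$testing]"')

echo "$child_sees_unexported"   # []
echo "$child_sees_exported"     # [hello world]
```

The single quotes around the "bash -c" command matter: they stop the outer shell from expanding $testing itself, so the value you see really is what the child shell received.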

How can we use a variable to help out? What if we want to download one image every hour from one day on the USCGC Healy? Here is the 2010 set of images for the Healy:

http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/

Open emacs, create a file called "healy.sh", and start typing:

for hour in 01 02 03 04 05 06 07 
do
  echo $hour
done

Try running that from the terminal.

source healy.sh

You should see:

01
02
03
04
05
06
07

Now we can try to construct a curl command in the echo.

for hour in 01 02 03 04 05 06 07 
do
  echo curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-${hour}01.jpeg
done

Try it and you should see:

curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0101.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0201.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0301.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0401.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0501.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0601.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0701.jpeg
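Typing out every hour by hand gets tedious for a full day. As a sketch, the hours can be generated with seq and zero-padded with printf, and the fixed parts of the URL can live in variables. The names base_url, day, and urls are my own, and the loop only builds the URLs; whether an image exists for every hour on the server is an assumption you would need to check.

```shell
# The fixed parts of the URL, taken from the example above
base_url="http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010"
day="20101009"

# Build one URL per hour, 00 through 23, without downloading anything
urls=$(
  for h in $(seq 0 23); do
    hour=$(printf '%02d' "$h")   # zero-pad: 7 becomes 07
    echo "${base_url}/${day}-${hour}01.jpeg"
  done
)

echo "$urls"
```

Once the printed URLs look right, replacing the echo inside the loop with "curl -O" would fetch the files, in the same way the echo was used as a dry run above.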

Author: Kurt Schwehr

Date: <2011-09-22 Thu>