Class 8: More emacs and script files (DRAFT)
Introduction
Getting help stackoverflow
Videos on Emacs
Sadly, http://showmedo.com does not have any good videos on the basics of using emacs. You might find the Hack Emacs videos on YouTube by rpdillon useful for getting more comfortable with emacs.
Creating a log file logging
For the rest of the semester, you need to keep a log file for this class.
mkdir ~/Dropbox/logs
Working with the ocean drilling projects site database odp
mkdir -p class/8
cd class/8
Today, we will use a program very similar to wget called curl (curl.1 man page) to fetch data.
sudo apt-get install curl
curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/examples/holes.csv.bz2
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100 38953  100 38953    0     0   224k      0 --:--:-- --:--:-- --:--:--  358k
Uncompress and take a first look at this file:
bunzip2 holes.csv.bz2
wc -l holes.csv
# 3047 holes.csv
head holes.csv
The beginning of the holes.csv file looks like this:
Expedition,Site,Hole,Program,Longitude,Latitude,Water Depth (m),Core Recovered (m)
1,1,,DSDP,-92.1833,25.8583,2827,50
1,2,,DSDP,-92.0587,23.0455,3572,13
1,3,,DSDP,-92.0433,23.03,3747,47
1,4,,DSDP,-73.792,24.478,5319,15
1,4,A,DSDP,-73.792,24.478,5319,5.8
1,5,,DSDP,-73.641,24.7265,5354,6.4
1,5,A,DSDP,-73.641,24.7265,5354,1.8
1,6,,DSDP,-67.6477,30.8398,5124,28
1,6,A,DSDP,-67.6477,30.8398,5124
Now we are going to use a program called cut to try to extract the "Program" column of the file. You can see above in the comma separated value (CSV) formatted data that there is at least a "DSDP", which is the Deep Sea Drilling Project that ran from 1968 to 1983. cut can work a couple of different ways, but here we are going to ask it to work in "field mode" and tell it that commas (",") are the delimiter (or separator) between fields. We do that with a "-d" and the comma character. We then specify the number of the field we want with "-f". Looking at the first line of the file, you can see that "Program" appears in the fourth position.
cut -d, -f4 holes.csv | head
# Program
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
# DSDP
When you run the above command, you will only see the first 10 lines on the screen. That is not very helpful. We would like to see how many unique entry types there are. The uniq command removes adjacent duplicate lines from the text that it receives, which is enough here because the entries for each program sit next to each other in the file.
cut -d, -f4 holes.csv | uniq
# Program
# DSDP
# ODP
# IODP
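Because uniq only compares neighboring lines, it can miss duplicates in a file that is not grouped. A minimal sketch of the usual workaround is to sort first, and uniq's "-c" option then also reports how many times each value occurs. The file "sample.csv" and its rows are made up for illustration and are not part of the class data.

```shell
# Made-up sample data: note that the programs are NOT grouped together.
cat > sample.csv <<'EOF'
Expedition,Site,Hole,Program
1,1,,DSDP
101,626,A,ODP
1,2,,DSDP
301,1301,A,IODP
101,627,A,ODP
EOF

# uniq alone only removes adjacent repeats, so DSDP and ODP each show up twice.
cut -d, -f4 sample.csv | uniq

# Sorting first puts identical lines next to each other.
# The -c option prefixes each unique line with its count.
counts=$(cut -d, -f4 sample.csv | sort | uniq -c)
echo "$counts"
```

The header word "Program" is counted as well, since nothing has removed the first line.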
Next, let's see how many lines there are for each program. We can pass the output of egrep to the word count program (wc) that we used before. wc has an option to only print the number of lines, so we will add "-l" to the command line.
The data gets passed from one program to another by a pipe. What goes in one side, comes out the other. A pipe is created by the vertical bar character: "|".
egrep DSDP holes.csv | wc -l   # the letter "l" as in Lima, not the number 1
# 1116
egrep ODP holes.csv | wc -l
# 1930
egrep IODP holes.csv | wc -l
# 153
We have a slight problem here in that the counts are not adding up. The string ODP is found in both the ODP and IODP entries. Here I am using the "binary calculator" (bc.1 man page) to do a little math. I suspect you can just do this by hand, but the example shows another pipe.
# The 3 results from the word counts above
echo "1116 + 1930 + 153" | bc
# 3199
# That adds up to more than the number of lines in the file
wc -l holes.csv
# 3047 holes.csv
We can use the "," that precedes the ODP to help avoid the IODP.
egrep 'ODP' holes.csv | wc -l
# 1930
egrep ',ODP' holes.csv | wc -l
# 1777
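As a quick sanity check, the ",ODP" count makes the totals consistent with the size of the file. The sketch below uses bash's built-in arithmetic, $(( )), instead of bc. The numbers are the counts reported above, and the variable names are only for illustration.

```shell
# Counts from above: DSDP, ODP (matched as ",ODP"), and IODP
total=$((1116 + 1777 + 153))
echo "$total"
# 3046

# holes.csv has 3047 lines, one of which is the header
data_lines=$((3047 - 1))
echo "$data_lines"
# 3046
```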
There are lots of other ways that we could have solved this, but this way is pretty simple compared to some of the others.
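One of those other ways, sketched here under the assumption that the program name is always the fourth field: cut out that column first and then ask grep to match whole lines only. The file "programs-demo.csv" and its rows are made up for illustration.

```shell
# Made-up rows in the same layout as holes.csv (illustration only).
cat > programs-demo.csv <<'EOF'
Expedition,Site,Hole,Program,Longitude,Latitude
1,1,,DSDP,-92.1833,25.8583
101,626,A,ODP,-79.5,25.6
301,1301,A,IODP,-127.76,47.75
101,627,A,ODP,-78.3,27.6
EOF

# Pull out only the Program column, then count lines that are exactly "ODP".
# -x matches whole lines only; -c prints the number of matching lines.
odp_count=$(cut -d, -f4 programs-demo.csv | grep -c -x ODP)
echo "ODP: $odp_count"

# Plain substring matching also counts the IODP row.
loose_count=$(grep -c ODP programs-demo.csv)
echo "Lines containing ODP: $loose_count"
```

This avoids relying on the comma in front of the name, at the cost of one more program in the pipe.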
Writing results to a file and making a quick plot with Gnuplot gnuplot redirection
It is always important to get a graphical view of spatial data. Later in this chapter, we will start using Google Earth and in a future chapter, we will load our data into a Geographical Information System (GIS). For now, we will draw the locations with Gnuplot. This graphing program is not as flexible as matplotlib, which we will cover in the programming in Python chapters, but it can definitely get the job done.
Gnuplot works most easily with files that have space delimited rather than comma delimited text data values. We need to pull out the longitude and latitude values from the holes.csv file. We can start back with the cut command that we used before. This time we will give it two different fields in the csv to print with "-f5-6". This means we are asking for fields 5 through 6. We could also have said "-f5,6", which would be fields 5 and 6.
cut -d, -f5-6 holes.csv | head
Longitude,Latitude
-92.1833,25.8583
-92.0587,23.0455
-92.0433,23.03
-73.792,24.478
-73.792,24.478
-73.641,24.7265
-73.641,24.7265
-67.6477,30.8398
-67.6477,30.8398
Gnuplot will get confused by the "Longitude,Latitude" strings on the first line. We can get rid of this line with the egrep command. Normally, egrep returns the lines that match, but we can ask it to return all lines that do not match by giving it the inverse option of "-v". We then give it the string "Longitude" to match and it returns all lines that do not match.
egrep -v Longitude holes.csv | cut -d, -f5-6 | head
-92.1833,25.8583
-92.0587,23.0455
-92.0433,23.03
-73.792,24.478
-73.792,24.478
-73.641,24.7265
-73.641,24.7265
-67.6477,30.8398
-67.6477,30.8398
-68.2967,30.134
The output above is pretty close to being usable, but we have a "," character between each longitude and latitude. We can use the tr (translate) command to exchange the "," for a " " (space). Make sure to place the tr after the cut command or cut will not be able to tell the comma separated fields apart.
egrep -v Longitude holes.csv | cut -d, -f5-6 | tr "," " " | head
-92.1833 25.8583
-92.0587 23.0455
-92.0433 23.03
-73.792 24.478
-73.792 24.478
-73.641 24.7265
-73.641 24.7265
-67.6477 30.8398
-67.6477 30.8398
-68.2967 30.134
This is the format that we need for Gnuplot, but we need the longitude and latitude lines saved to a file. The ">" (greater-than character) "redirects" the output from the last program in the chain of pipes to the file that is named after the ">". Be warned that ">" will overwrite a previous file with the same name if one existed. First, try a simpler example to see ">" in action. Here, I also use the cat (concatenate and print files) command to dump the contents of the "listing" file to the terminal. cat is much simpler than less, but if a file is very long or you are not sure how long the file is, you are better off using less.
Note: ">>" appends to a file if it already exists or creates a new file when needed, whereas ">" will clobber a file if one already exists.
ls -la > listing
# Your output may be different depending on the files you have in your
# current directory
cat listing
ls -l
total 124
-rw-r--r-- 1 researchtools researchtools 125861 2011-09-22 04:46 holes.csv
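To see the difference between ">" and ">>" for yourself, here is a small sketch. The file name "redirect-demo.txt" is a throwaway name used only for this example.

```shell
# "redirect-demo.txt" is a throwaway file name used only for this sketch.
echo "first"  >  redirect-demo.txt   # creates the file (or overwrites it)
echo "second" >> redirect-demo.txt   # appends a second line
cat redirect-demo.txt

echo "third"  >  redirect-demo.txt   # ">" throws away the two earlier lines
cat redirect-demo.txt
```

The first cat shows two lines, while the second shows only "third".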
Now that you know how to redirect the output to a file, send the results of the chain of pipes consisting of egrep, cut, and tr to the file "xy.dat".
egrep -v Longitude holes.csv | cut -d, -f5-6 | tr "," " " > xy.dat
head xy.dat
-92.1833 25.8583
-92.0587 23.0455
-92.0433 23.03
-73.792 24.478
-73.792 24.478
-73.641 24.7265
-73.641 24.7265
-67.6477 30.8398
-67.6477 30.8398
-68.2967 30.134
It is time to give gnuplot a quick try. This does not give you much of a sense of what gnuplot can do, but we can at least look at the locations of the cores.
Note for Cygwin users: You must be running a shell through X11 to be able to plot with Gnuplot. If you are on Linux or Mac, this should just work with a graph popping up on your screen.
gnuplot
plot 'xy.dat'
# There should be a plot of the data on your screen.
quit
That looks really wrong! Check it out with the GMT minmax command from the homework:
GMT minmax xy.dat
This looks very wrong!!
xy.dat: N = 3046 <-179.5558/179.738> <-77.4413/5736.4>
A latitude higher than 90 North is definitely wrong. Let's constrain the plot to the globe and see what we get.
gnuplot
set yrange [-90:90]
plot 'xy.dat'
quit
To get this database to work, we will clearly need to do some fixing of problems. Lesson:
Real data has real warts.
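One possible way to track down the offending rows, sketched here with awk (a tool not otherwise used in this class), is to print every line whose latitude falls outside of -90 to 90. The file "xy-demo.dat" and its values are made up for illustration, with the last row deliberately bad.

```shell
# A few made-up longitude/latitude pairs; the last one is deliberately bad.
cat > xy-demo.dat <<'EOF'
-92.1833 25.8583
-73.792 24.478
143.5 5736.4
EOF

# Print any line whose second column (latitude) falls outside -90..90.
bad=$(awk '$2 < -90 || $2 > 90' xy-demo.dat)
echo "$bad"

# Keep only the plausible positions in a new file.
awk '$1 >= -180 && $1 <= 180 && $2 >= -90 && $2 <= 90' xy-demo.dat > xy-clean-demo.dat
wc -l xy-clean-demo.dat
```

Dropping bad rows hides the problem rather than fixing it, so it is worth looking at what the first command prints before deciding what to do.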
This will be the last time that we use gnuplot. We will do the rest of our plotting using matplotlib in python!
You can see examples of the wide range of plots that can be made with Gnuplot here:
Creating a Google Earth KML googleearth kml
Now we are going to create our first KML file. We are going to cheat a bit and not try to understand the file format, but this will at least show you how easy it can be.
First, get the header and footer text for the KML line format:
curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/google-earth-line-start.kml
curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/google-earth-line-end.kml
These two pieces give you the front and back of the KML and all we need to do is provide the coordinates for the line.
Get the coordinates file from the Boston Construction file used during the homework:
curl -O http://vislab-ccom.unh.edu/~schwehr/Classes/2011/esci895-researchtools/examples/2007-boston-construction.csv.bz2
bunzip2 2007-boston-construction.csv.bz2
Take a look at the file:
head 2007-boston-construction.csv
-70.5014566667,42.1006833333,1179617934
-70.5016466667,42.101755,1179617991
-70.501845,42.1028766667,1179618051
-70.5020833333,42.1039,1179618111
-70.5022083333,42.1049116667,1179618176
-70.5022883333,42.1059316667,1179618233
-70.502515,42.1069266667,1179618296
-70.5027566667,42.10796,1179618356
-70.5028616667,42.1090066667,1179618416
-70.5029816667,42.1102133333,1179618486
We can reuse the cut command to get just the X and Y coordinates:
cut -d, -f1,2 2007-boston-construction.csv | head
-70.5014566667,42.1006833333
-70.5016466667,42.101755
-70.501845,42.1028766667
-70.5020833333,42.1039
-70.5022083333,42.1049116667
-70.5022883333,42.1059316667
-70.502515,42.1069266667
-70.5027566667,42.10796
-70.5028616667,42.1090066667
-70.5029816667,42.1102133333
We are lucky! KML expects coordinates to come as x,y,z or x,y (that is, longitude,latitude with an optional altitude):
<Placemark>
  <LineString>
    <coordinates>
      -125.810021667,48.4840316667
      -125.810295,48.483705
    </coordinates>
  </LineString>
</Placemark>
Let's create the x,y pairs in a file:
cut -d, -f1,2 2007-boston-construction.csv > 2007-boston-construction.xy
We can now put the header, points and tail together to create a KML file. Google Earth has trouble with lines with too many points in them, so we will use head to only output some of the points.
cat google-earth-line-start.kml > 2007-boston-construction.kml
head -1000 2007-boston-construction.xy >> 2007-boston-construction.kml
cat google-earth-line-end.kml >> 2007-boston-construction.kml
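The same assembly can also be written as a single pipeline, since cat accepts several file names and treats "-" as "read from the pipe here". In this sketch the files "demo-start.kml", "demo-end.kml", and "demo.xy" are small stand-ins with assumed contents. They are not the real header and footer files downloaded above.

```shell
# Stand-in header, points, and footer files (hypothetical contents).
cat > demo-start.kml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<LineString>
<coordinates>
EOF
cat > demo-end.kml <<'EOF'
</coordinates>
</LineString>
</Placemark>
</Document>
</kml>
EOF
cat > demo.xy <<'EOF'
-70.5014566667,42.1006833333
-70.5016466667,42.101755
-70.501845,42.1028766667
EOF

# head limits the number of points; cat places them ("-") between
# the header and the footer and writes everything to one file.
head -n 1000 demo.xy | cat demo-start.kml - demo-end.kml > demo.kml
cat demo.kml
```

With only one ">" there is no risk of mixing up ">" and ">>" halfway through.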
Creating a script
There are some key tricks to understanding variables in bash. First, you must have no spaces before or after the equal sign. Bash is very picky about this. The other part is where your variable is available. Without the export keyword, the variable is not available to other programs that are called from the command line. For us, right now, the export is not important, but later on for things like the PATH variable, which controls where to look for programs, export is essential.
To demonstrate variables, we will use the echo command which will just print out to the screen whatever we pass to it. Give it a try. The "$" character starts the use of a variable.
# Set a variable
testing=123
# Print the variable
echo $testing
# 123
# Start a new bash shell inside the original one
bash
# See that "testing" is not set. If there is no variable, bash gives
# an empty string
echo $testing
# quit back to the main bash shell
exit
# Set testing to have a value that will be inherited
export testing="hello world"
bash
# Now see that the exported variable went through
echo $testing
# hello world
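The same experiment can be run without typing into a nested shell by hand. This sketch uses "bash -c", which starts a child bash, runs one command, and exits. The variable name "demo_var" is made up for the example, and the square brackets just make an empty value visible.

```shell
unset demo_var                       # start from a clean slate
demo_var="local only"                # no export: child processes cannot see it
child_sees=$(bash -c 'echo "[$demo_var]"')
echo "without export: $child_sees"

export demo_var="shared"             # export: child processes inherit it
child_sees_exported=$(bash -c 'echo "[$demo_var]"')
echo "with export: $child_sees_exported"
```

The single quotes matter here: they stop the outer shell from filling in $demo_var itself, so the child shell is the one that looks the variable up.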
How can we use a variable to help out? What if we want to download one image every hour from one day on the USCGC Healy? Here is the 2010 set of images for the Healy:
http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/
Open emacs, open a file called "healy.sh", and start typing:
for hour in 01 02 03 04 05 06 07
do
echo $hour
done
Try running that from the terminal.
source healy.sh
You should see:
01
02
03
04
05
06
07
Now we can try to construct a curl command in the echo.
for hour in 01 02 03 04 05 06 07
do
echo curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-${hour}01.jpeg
done
Try it and you should see:
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0101.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0201.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0301.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0401.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0501.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0601.jpeg
curl -O http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010/20101009-0701.jpeg
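Typing out every hour by hand gets tedious for a whole day. A possible extension, still as a dry run that only prints the commands, is to let seq count from 0 to 23 and let printf add the leading zero. The variable "base" and the file "healy-commands.txt" are names made up for this sketch, and it is an assumption that an image exists at one minute past every hour.

```shell
# Dry run: build the commands for all 24 hours and only print them.
base=http://mgds.ldeo.columbia.edu/healy/reports/aloftcon/2010
for h in $(seq 0 23)
do
    hour=$(printf "%02d" "$h")   # pad to two digits: 0 -> 00, 7 -> 07
    echo curl -O ${base}/20101009-${hour}01.jpeg
done > healy-commands.txt

wc -l healy-commands.txt
head -n 2 healy-commands.txt
```

Note that the ">" after "done" captures the output of the entire loop, not just the last echo.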
Date: <2011-09-22 Thu>