Class 19: BAGs part 2, XML Metadata
Table of Contents
Introduction
See also
- 5 part tutorial covering IPython on showmedo: http://showmedo.com/videotutorials/series?name=CnluURUTV
Setup
mkdir -p ~/class/19 cd ~/class/19
A slight intermission
Start ipython. We had a brown out last week at the end of class and some people lost their histories.
%quickref # Get an overview from ipython %logstart? # Ask for help with logging logstart log-class-19.py # Check out the log file !head log-class-19.py
Your head results will look something like this:
!head log-class-19.py
#log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE ***
#log# DO NOT CHANGE THIS LINE OR THE TWO BELOW
#log# opts = Struct({'__allownew': True, 'logfile': 'log-class-19.py'})
#log# args = []
#log# It is safe to make manual edits below here.
#log#-----------------------------------------------------------------------
_ip.magic("logstate ")
help %logstart
help(logstart)
#?%logstart
And now for some fun!
import antigravity
Getting back to where we were before with BAGs
Make sure you are in ipython.  Note that there is a bang ( ! ) in 
front of the first two commands to ipython.  The ! is a shell 
escape to use the bash shell to call wget and bunzip2
!wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2 !bunzip2 *.bz2 ls -l
Last time, we went through a very painful process to use h5dump to view the metadata. Thanks to Marcus Cole, we have a better way to view the xml from the command line. We can quick try it from ipython.
!h5dump -b FILE -o H11703_Office_5m.xml -d BAG_root/metadata H11703_Office_5m.bag # You will see a few summary lines (like 13 lines) !file H11703_Office_5m.xml # H11703_Office_5m.xml: XML document text !less # <?xml version="1.0"?> # <smXML:MD_Metadata xmlns:smXML="http://metadata.dgiwg.org/smXML" xmlns:xlink="http://www.w3.org/19 # and so on
import h5py # To read HDF5 files (e.g. our BAG bathy) import numpy # for NAN aka "not a number" from matplotlib import pyplot bag = h5py.File('H11703_Office_5m.bag') bag.keys() # ['BAG_root'] bag['BAG_root'].keys() # ['elevation', 'metadata', 'tracking_list', 'uncertainty'] elevation = bag['/BAG_root/elevation'].value # Convert large values used for unknown values to # NANs a.k.a "Not a Number" # This is how we did it last time, but it is slow and takes 4 lines: # # for x in range(elevation.shape[0]): # for y in range(elevation.shape[1]): # if elevation[x,y] > 0: # elevation[x,y] = numpy.NAN # The less obvious, but faster solution: # elevation[elevation > 0] = numpy.NAN # Try this to see that it builds a "mask" matrix: elevation > 0 pyplot.interactive(True) # Make plotting easier to work with pyplot.figure(1) pyplot.subplot(221) pyplot.imshow(elevation) # pyplot.show() # Not needed because we have set interactive to True
Here is wehre I learned how to do the faster replacement of NAN in a matrix / numpy ndarray:
http://stackoverflow.com/questions/7994133/fast-in-place-replacement-of-some-values-in-a-numpy-array
We are now back to where we were at the end of last class.
Time to try to make a histogram of the elevations. First, make a 1D list of just the valid data. There is likely a much faster way to do this in numpy!
e_data = elevation.reshape(elevation.size) e_data_clean = [ ] for i in range(len(e_data)): if not numpy.isnan(e_data[i]): e_data_clean.append(e_data[i]) # len(e_data_clean) # 228449 e_data.size - len(e_data_clean) # 2645287 cells remain with NANs # pyplot.figure(2) pyplot.subplot(222) pyplot.hist(e_data_clean, bins=100)
Looking at the uncertainty
uncertainty = bag['/BAG_root/uncertainty'].value uncertainty.shape # (1434, 2004) uncertainty.min() # 0.21800001 uncertainty.max() # 1000000.0 uncertainty[uncertainty > 1] = numpy.NAN pyplot.subplot(223) pyplot.imshow(uncertainty)
And make a histogram of the uncertainty
u_data = uncertainty.reshape(uncertainty.size) u_data_clean = [ ] for i in range(len(u_data)): if not numpy.isnan(u_data[i]): u_data_clean.append(u_data[i]) pyplot.subplot(224) pyplot.hist(u_data_clean, bins=100)
Normally, you would have properly labeled all of the subplots!
Examining the metadata xml
We are going to use the "Element Tree" interface to XML as provided by the lxml library:
http://lxml.de/tutorial.html#the-elementtree-class
xpath is a way to search the tree of XML for parts that we want.
from lxml import etree metadata_txt = ''.join(bag['/BAG_root/metadata']) out = file('H11703_Office_5m-try2.xml','w') out.write(metadata_txt) out.close() !diff H11703_Office_5m.xml H11703_Office_5m-try2.xml # YES! The xml files are the same. Woohoo! out3 = open('H11703_Office_5m-try3.xml','w') out3.write( metadata_txt.replace('><','>\n<') ) out3.close() # !emacsclient --no-wait H11703_Office_5m-try3.xml edit? edit -x H11703_Office_5m-try3.xml # Use this to tell emacs you are done editing the file: C-x # # Use the element tree interface to xml root = etree.fromstring(metadata_txt).getroottree() title = root.xpath('//*/title') title # it's a list with one "Element" title = title[0] title # Yuck, not very nice title? title. # Press <TAB> after the period (full stop) to see what an element offers. title.tag # 'title' title.text # Yes! This gives us the title # Put it all together title = root.xpath('//*/title')[0].text abstract = root.xpath('//*/abstract')[0].text xmin = float(root.xpath('//*/westBoundLongitude')[0].text) xmax = float(root.xpath('//*/eastBoundLongitude')[0].text) ymin = float(root.xpath('//*/southBoundLatitude')[0].text) ymax = float(root.xpath('//*/northBoundLatitude')[0].text)
Can we plot the bounding box? kml
We need something more. matplotlib has basemap, but sadly, this libary was not updated for a long time and did not make it into Ubuntu 11.04. It has recently been updated and hopefully this will be fixed in the near future.
We need to instead give it a go in Google Earth with KML and then in QGIS using the global shoreline db.
Let's create a KML. We will try to do this as a template using the python .format template language in python 2.6 or newer. This will let us put variables into strings fairly easily.
'{myvalue}'.format() # Error! '{myvalue}'.format(myvalue='hello world') '{myvalue}'.format(myvalue=123) '{xmin}'.format(xmin=xmin) '{xmin} and {ymax}'.format(xmin=xmin, ymax=ymax) '{xmin} and {wahoo}'.format(xmin=xman, wahoo=ymax) bbox = {'xmin': xmin, 'ymin': ymin, 'xmax': xmax, 'ymax': ymax} bbox '{xmin} and {ymax}'.format(bbox) # ERROR! # Use the crazy expansion syntax of ** to use bbox as arguments # to the format method of a string '{xmin} and {ymax}'.format( **bbox ) # Better yet, there is a function that returns a dictionary of all locals? len( locals() ) locals().keys()[:10] locals() '{xmin},{ymin} {xmax},{ymax}'.format( **locals() ) # '-134.49,57.34 -134.32,57.4'
Actually building KML
See lecture 19!
History
#log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE *** #log# DO NOT CHANGE THIS LINE OR THE TWO BELOW #log# opts = Struct({'__allownew': True, 'logfile': 'log-class-19.py'}) #log# args = [] #log# It is safe to make manual edits below here. #log#----------------------------------------------------------------------- _ip.magic("quickref ") _ip.magic("quickref ") _ip.magic("logstate ") #?%logstart _ip.magic("logstart log-class-19.py") _ip.system("ls -F ") _ip.system("head log-class-19.py") import antigravity _ip.system("wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2") _ip.system("bunzip2 H11703_Office_5m.bag.bz2") _ip.system("ls -F -l") _ip.system("ls -F -l") import numpy import h5py from matplotlib import pyplot bag = h5py.File('H11703_Office_5m.bag') bag.keys() bag['BAG_root'].keys() elevation = bag['BAG_root/elevation'].value type(elevation) elevation.size elevation.shape elevation elevation[elevation > 0] = numpy.NAN elevation[elevation > 999999] = numpy.NAN #?pyplot.interactive pyplot.interactive(True) pyplot.figure(1) #pyplot.imshow(elevation) pyplot.subplot(221) pyplot.imshow(elevation) #?e_data = elevation.reshape #?elevation.reshape e_data = elevation.reshape(elevation.size) e_data e_data_clean = [ ] for i in range( len(e_data) ): if not numpy.isnan (e_data[i]): for i in range( len(e_data) ): if not numpy.isnan(e_data[i]): e_data_clean.append = e_data[i] for i in range( len(e_data) ): if not numpy.isnan(e_data[i]): e_data_clean.append( e_data[i] ) len ( e_data ) len ( e_data_clean ) len(e_data) - len(e_data_clean) ( len(e_data) - len(e_data_clean) ) / float( len (e_data) ) pyplot.subplot(222) pyplot.hist(e_data_clean, bins=100) uncertainty = bag['/BAG_root/uncertainty'].value type(uncertainty) _ip.magic("whos ") _ip.magic("history ") uncertainty.shape uncertainty.min uncertainty.min() uncertainty.max() uncertainty[uncertainty > 1000] = numpy.NAN pyplot.subplot(223) pyplot.imshow(uncertainty) uncertainty.min() uncertainty.max() u_data = uncertainty.reshape(uncertainty.size) u_data_clean = [ ] for i in range(len(u_data)): if not numpy.isnan(u_data[i]): u_data_clean.append(u_data[i]) u_data_clean pyplot.subplot(224) pyplot.hist(u_data_clean) pyplot.hist(u_data_clean, bins=100) pyplot.cla() pyplot.hist(u_data_clean, bins=100) from lxml import etree metadata_txt = ''.join(bag['/BAG_root/metadata']) metadata_txt[:50] out = open('H11703_Office_5m.xml','w') type(out) out.write(metadata_txt) out.close() _ip.system("ls -F -l") _ip.magic("edit -x H11703_Office_5m.xml") out2 = open('H11703_Office_5m-try3.xml','w') metadata_txt.replace('><','>\n<') print ( metadata_txt.replace('><','>\n<') ) out2.write(metadata_txt.replace('><','>\n<')) out2.close() _ip.system("ls -F -l") _ip.magic("edit -x H11703_Office_5m-try3.xml") root = etree.fromstring(metadata_txt).getroottree() title = root.xpath('//*/title') title title = title[0] title title.tag title.text title = root.xpath('//*/title')[0].text title abstract = root.xpath('//*/abstract')[0].text abstract xmin = float( root.xpath('//*/westBoundLongtude')[0].text ) xmin = float( root.xpath('//*/westBoundLongitude')[0].text ) xmin xmax = float(root.xpath('//*/eastBoundLongitude')[0].text) ymin = float(root.xpath('//*/southBoundLatitude')[0].text) ymax = float(root.xpath('//*/northBoundLatitude')[0].text) _ip.magic("whos ") 'hello %d world' % (1234, ) ' {myvalue} '.format() ' {myvalue} '.format(myvalue= 'hello world') ' {myvalue} '.format(myvalue= 123.45) '{xmin}'.format(xmin=xmin) '{xmin} and {ymax}'.format(xmin=xmin, ymax=ymax) '{xmin} and {wahoo}'.format(xmin=xmin, wahoo=ymax) str(ymax) locals()
Date: <2011-11-03 Thu>
HTML generated by org-mode 7.4 in emacs 23