UP | HOME

Class 19: BAGs part 2, XML Metadata

Table of Contents

Introduction

See also

Setup

mkdir -p ~/class/19
cd ~/class/19

A slight intermission

Start ipython. We had a brown out last week at the end of class and some people lost their histories.

%quickref # Get an overview from ipython
%logstart? # Ask for help with logging

logstart log-class-19.py

# Check out the log file
!head log-class-19.py

Your head results will look something like this:

!head log-class-19.py
#log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE ***
#log# DO NOT CHANGE THIS LINE OR THE TWO BELOW
#log# opts = Struct({'__allownew': True, 'logfile': 'log-class-19.py'})
#log# args = []
#log# It is safe to make manual edits below here.
#log#-----------------------------------------------------------------------
_ip.magic("logstate ")
help %logstart
help(logstart)
#?%logstart

And now for some fun!

import antigravity

Getting back to where we were before with BAGs

Make sure you are in ipython. Note that there is a bang ( ! ) in front of the first two commands to ipython. The ! is a shell escape to use the bash shell to call wget and bunzip2

!wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2
!bunzip2 *.bz2
ls -l

Last time, we went through a very painful process to use h5dump to view the metadata. Thanks to Marcus Cole, we have a better way to view the xml from the command line. We can quick try it from ipython.

!h5dump -b FILE -o H11703_Office_5m.xml -d BAG_root/metadata H11703_Office_5m.bag
# You will see a few summary lines (like 13 lines)

!file H11703_Office_5m.xml
# H11703_Office_5m.xml: XML  document text

!less
# <?xml version="1.0"?>
# <smXML:MD_Metadata xmlns:smXML="http://metadata.dgiwg.org/smXML" xmlns:xlink="http://www.w3.org/19
# and so on
import h5py  # To read HDF5 files (e.g. our BAG bathy)
import numpy # for NAN aka "not a number"
from matplotlib import pyplot

bag = h5py.File('H11703_Office_5m.bag')
bag.keys()
# ['BAG_root']

bag['BAG_root'].keys()
# ['elevation', 'metadata', 'tracking_list', 'uncertainty']

elevation = bag['/BAG_root/elevation'].value

# Convert large values used for unknown values to
# NANs a.k.a "Not a Number"

# This is how we did it last time, but it is slow and takes 4 lines:
#
# for x in range(elevation.shape[0]):
#     for y in range(elevation.shape[1]):
#         if elevation[x,y] > 0:
#             elevation[x,y] = numpy.NAN

# The less obvious, but faster solution:
# 
elevation[elevation > 0] = numpy.NAN

# Try this to see that it builds a "mask" matrix:
elevation > 0

pyplot.interactive(True) # Make plotting easier to work with

pyplot.figure(1)
pyplot.subplot(221)
pyplot.imshow(elevation)

# pyplot.show() # Not needed because we have set interactive to True

Here is wehre I learned how to do the faster replacement of NAN in a matrix / numpy ndarray:

http://stackoverflow.com/questions/7994133/fast-in-place-replacement-of-some-values-in-a-numpy-array

We are now back to where we were at the end of last class.

Time to try to make a histogram of the elevations. First, make a 1D list of just the valid data. There is likely a much faster way to do this in numpy!

e_data = elevation.reshape(elevation.size)
e_data_clean = [ ]
for i in range(len(e_data)):
    if not numpy.isnan(e_data[i]):
        e_data_clean.append(e_data[i])

# len(e_data_clean)
# 228449

e_data.size - len(e_data_clean)
# 2645287 cells remain with NANs

# pyplot.figure(2)
pyplot.subplot(222)
pyplot.hist(e_data_clean, bins=100)

Looking at the uncertainty

uncertainty = bag['/BAG_root/uncertainty'].value

uncertainty.shape
# (1434, 2004)

uncertainty.min()
# 0.21800001

uncertainty.max()
# 1000000.0

uncertainty[uncertainty > 1] = numpy.NAN

pyplot.subplot(223)
pyplot.imshow(uncertainty)

And make a histogram of the uncertainty

u_data = uncertainty.reshape(uncertainty.size)
u_data_clean = [ ]
for i in range(len(u_data)):
    if not numpy.isnan(u_data[i]):
        u_data_clean.append(u_data[i])

pyplot.subplot(224)
pyplot.hist(u_data_clean, bins=100)

Normally, you would have properly labeled all of the subplots!

Examining the metadata    xml

We are going to use the "Element Tree" interface to XML as provided by the lxml library:

http://lxml.de/tutorial.html#the-elementtree-class

xpath is a way to search the tree of XML for parts that we want.

from lxml import etree

metadata_txt = ''.join(bag['/BAG_root/metadata'])

out = file('H11703_Office_5m-try2.xml','w')
out.write(metadata_txt)
out.close()
!diff H11703_Office_5m.xml H11703_Office_5m-try2.xml
# YES!  The xml files are the same.  Woohoo!

out3 = open('H11703_Office_5m-try3.xml','w')
out3.write( metadata_txt.replace('><','>\n<') )
out3.close()

# !emacsclient --no-wait H11703_Office_5m-try3.xml
edit?
edit -x H11703_Office_5m-try3.xml
# Use this to tell emacs you are done editing the file: C-x # 

# Use the element tree interface to xml
root = etree.fromstring(metadata_txt).getroottree() 

title = root.xpath('//*/title')
title
# it's a list with one "Element"
title = title[0]
title # Yuck, not very nice
title?
title. # Press <TAB> after the period (full stop) to see what an element offers.

title.tag
# 'title'
title.text
# Yes!  This gives us the title

# Put it all together
title = root.xpath('//*/title')[0].text

abstract = root.xpath('//*/abstract')[0].text

xmin = float(root.xpath('//*/westBoundLongitude')[0].text)
xmax = float(root.xpath('//*/eastBoundLongitude')[0].text)

ymin = float(root.xpath('//*/southBoundLatitude')[0].text)
ymax = float(root.xpath('//*/northBoundLatitude')[0].text)

Can we plot the bounding box?    kml

We need something more. matplotlib has basemap, but sadly, this libary was not updated for a long time and did not make it into Ubuntu 11.04. It has recently been updated and hopefully this will be fixed in the near future.

We need to instead give it a go in Google Earth with KML and then in QGIS using the global shoreline db.

Let's create a KML. We will try to do this as a template using the python .format template language in python 2.6 or newer. This will let us put variables into strings fairly easily.

'{myvalue}'.format()
# Error!

'{myvalue}'.format(myvalue='hello world')
'{myvalue}'.format(myvalue=123)

'{xmin}'.format(xmin=xmin)
'{xmin} and {ymax}'.format(xmin=xmin, ymax=ymax)

'{xmin} and {wahoo}'.format(xmin=xman, wahoo=ymax)

bbox = {'xmin': xmin, 'ymin': ymin, 'xmax': xmax, 'ymax': ymax}
bbox

'{xmin} and {ymax}'.format(bbox)
# ERROR!

# Use the crazy expansion syntax of ** to use bbox as arguments
# to the format method of a string
'{xmin} and {ymax}'.format( **bbox )

# Better yet, there is a function that returns a dictionary of all
locals?
len( locals() )
locals().keys()[:10]
locals()

'{xmin},{ymin} {xmax},{ymax}'.format( **locals() )
# '-134.49,57.34 -134.32,57.4'

Actually building KML

See lecture 19!

History

#log# Automatic Logger file. *** THIS MUST BE THE FIRST LINE ***
#log# DO NOT CHANGE THIS LINE OR THE TWO BELOW
#log# opts = Struct({'__allownew': True, 'logfile': 'log-class-19.py'})
#log# args = []
#log# It is safe to make manual edits below here.
#log#-----------------------------------------------------------------------
_ip.magic("quickref ")
_ip.magic("quickref ")
_ip.magic("logstate ")
#?%logstart
_ip.magic("logstart log-class-19.py")

_ip.system("ls -F ")
_ip.system("head log-class-19.py")
import antigravity
_ip.system("wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2")
_ip.system("bunzip2 H11703_Office_5m.bag.bz2")
_ip.system("ls -F -l")
_ip.system("ls -F -l")
import numpy
import h5py
from matplotlib import pyplot
bag = h5py.File('H11703_Office_5m.bag')
bag.keys()
bag['BAG_root'].keys()
elevation = bag['BAG_root/elevation'].value
type(elevation)
elevation.size
elevation.shape
elevation
elevation[elevation > 0] = numpy.NAN
elevation[elevation > 999999] = numpy.NAN
#?pyplot.interactive
pyplot.interactive(True)
pyplot.figure(1)
#pyplot.imshow(elevation)
pyplot.subplot(221)
pyplot.imshow(elevation)
#?e_data = elevation.reshape
#?elevation.reshape
e_data = elevation.reshape(elevation.size)
e_data
e_data_clean = [ ]
for i in range( len(e_data) ):
    if not numpy.isnan (e_data[i]): 
for i in range( len(e_data) ):
    if not numpy.isnan(e_data[i]):
        e_data_clean.append = e_data[i]

for i in range( len(e_data) ):
    if not numpy.isnan(e_data[i]):
        e_data_clean.append( e_data[i] )

len ( e_data )
len ( e_data_clean )
len(e_data) - len(e_data_clean)
( len(e_data) - len(e_data_clean) ) / float( len (e_data) )
pyplot.subplot(222)
pyplot.hist(e_data_clean, bins=100)
uncertainty = bag['/BAG_root/uncertainty'].value
type(uncertainty)
_ip.magic("whos ")
_ip.magic("history ")
uncertainty.shape
uncertainty.min 
uncertainty.min()
uncertainty.max()
uncertainty[uncertainty > 1000] = numpy.NAN
pyplot.subplot(223)
pyplot.imshow(uncertainty)
uncertainty.min()
uncertainty.max()
u_data = uncertainty.reshape(uncertainty.size)
u_data_clean = [ ]
for i in range(len(u_data)):
    if not numpy.isnan(u_data[i]):
        u_data_clean.append(u_data[i])

u_data_clean
pyplot.subplot(224)
pyplot.hist(u_data_clean)
pyplot.hist(u_data_clean, bins=100)
pyplot.cla()
pyplot.hist(u_data_clean, bins=100)
from lxml import etree
metadata_txt = ''.join(bag['/BAG_root/metadata'])
metadata_txt[:50]
out = open('H11703_Office_5m.xml','w')
type(out)
out.write(metadata_txt)
out.close()
_ip.system("ls -F -l")
_ip.magic("edit -x H11703_Office_5m.xml")
out2 = open('H11703_Office_5m-try3.xml','w')
metadata_txt.replace('><','>\n<')
print ( metadata_txt.replace('><','>\n<') )
out2.write(metadata_txt.replace('><','>\n<'))
out2.close()
_ip.system("ls -F -l")
_ip.magic("edit -x H11703_Office_5m-try3.xml")
root = etree.fromstring(metadata_txt).getroottree()
title = root.xpath('//*/title')
title
title = title[0]
title
title.tag
title.text
title = root.xpath('//*/title')[0].text
title
abstract = root.xpath('//*/abstract')[0].text
abstract
xmin = float( root.xpath('//*/westBoundLongtude')[0].text )
xmin = float( root.xpath('//*/westBoundLongitude')[0].text )
xmin
xmax = float(root.xpath('//*/eastBoundLongitude')[0].text)
ymin = float(root.xpath('//*/southBoundLatitude')[0].text)
ymax = float(root.xpath('//*/northBoundLatitude')[0].text)
_ip.magic("whos ")
'hello %d world' % (1234, )
'   {myvalue}   '.format()
'   {myvalue}   '.format(myvalue= 'hello world')
'   {myvalue}   '.format(myvalue= 123.45)
'{xmin}'.format(xmin=xmin)
'{xmin} and {ymax}'.format(xmin=xmin, ymax=ymax)
'{xmin} and {wahoo}'.format(xmin=xmin, wahoo=ymax)
str(ymax)
locals()

Author: Kurt Schwehr

Date: <2011-11-03 Thu>

HTML generated by org-mode 7.4 in emacs 23