Class 18: BAGs, HDF5 and XML

Table of Contents


Bathymetric Attributed Grid (BAG)

Bags were started here at CCOM.

I am not sure how to go from the above to this:

cd ~/class/18
mkdir bags
cd bags
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12263/BAG/H12263_MB_1m_MLLW_1of4.bag.gz
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12263/BAG/H12263_MB_8m_MLLW_combined.bag.gz

ls -l

file *.bag.gz
# H12263_MB_1m_MLLW_1of4.bag.gz:     gzip compressed data, was "H12263_MB_1m_MLLW_1of4.bag", from Unix, last modified: Wed May 25 20:31:52 2011
# H12263_MB_8m_MLLW_combined.bag.gz: gzip compressed data, was "H12263_MB_8m_MLLW_combined.bag", from Unix, last modified: Wed May 25 20:50:56 2011

md5sum *.bag.gz
# da017d513457ec242c5e7df29ff13a6e  H12263_MB_1m_MLLW_1of4.bag.gz
# 477fd3b148b2047ec2ff4c9b9daa740b  H12263_MB_8m_MLLW_combined.bag.gz
sha256sum *.bag.gz
# 9ff5775098fd7ee168969268a59e8f141742a99c09fd7429e16f47eb1662e29f  H12263_MB_1m_MLLW_1of4.bag.gz
# aaa4ef417049d6144744bc26a1e975b99f2ac41df88f6c057989ca3fb1329e3e  H12263_MB_8m_MLLW_combined.bag.gz

gunzip *.gz

ls -l
# total 3363696
# -rw-r--r-- 1 researchtools researchtools 3391351952 2011-05-25 20:31 H12263_MB_1m_MLLW_1of4.bag
# -rw-r--r-- 1 researchtools researchtools   53067176 2011-05-25 20:50 H12263_MB_8m_MLLW_combined.bag
md5sum *.bag
33593312d06a614f22d4e8dcf3e756e5  H12263_MB_1m_MLLW_1of4.bag
9229e785dd69bd708bf63eba136d31d7  H12263_MB_8m_MLLW_combined.bag

file *
# H12263_MB_1m_MLLW_1of4.bag:     Hierarchical Data Format (version 5) data
# H12263_MB_8m_MLLW_combined.bag: Hierarchical Data Format (version 5) data

Err… what is a Hierarchical Data Format??? (HDF) It is a container for data of all different sorts of format.

And this one will not load in qgis

Another try at a bag


wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H10001-H12000/H11703/BAG/H11703_5m_Combined_MLLW_5of5.bag.gz
file *.gz
# H11703_5m_Combined_MLLW_5of5.bag.gz: gzip compressed data, was "H11703_5m_Combined_MLLW_5of5.ba", from Unix, last modified: Fri Jul 16 08:47:06 2010

Nope! But I have a snapshot of NGDC's bag archive from May 2010.

wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Combined_5m.bag.bz2
wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2

gdalinfo --version
# GDAL 1.6.3, released 2009/11/19

gdalinfo H11703_Office_5m.bag 
Driver: HDF5/Hierarchical Data Format Release 5
Files: H11703_Office_5m.bag
Size is 512, 512
Coordinate System is `'
  BAG_root:Bag Version=1.0.0
  SUBDATASET_1_DESC=[1434x2004] //BAG_root/elevation (32-bit floating-point)
  SUBDATASET_2_DESC=[1434x2004] //BAG_root/uncertainty (32-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

The version of gdal that comes with Ubuntu 11.04 is just too old to read bags.

gdalinfo --formats | egrep -i 'bag|hdf'
#  HDF4 (ro): Hierarchical Data Format Release 4
#  HDF4Image (rw+): HDF4 Dataset
#  HDF5 (ro): Hierarchical Data Format Release 5
#  HDF5Image (ro): HDF5 Dataset

Using gdal from fink on Mac OSX:

gdalinfo --version
# GDAL 1.8.1, released 2011/07/09
snipe:BAG schwehr$ gdalinfo --formats | egrep -i 'hdf|bag'
#  BAG (ro): Bathymetry Attributed Grid
#  HDF5 (ro): Hierarchical Data Format Release 5
#  HDF5Image (ro): HDF5 Dataset

Bags from hdf5 tools

sudo apt-get install hdf5-tools
h5ls H11703_Office_5m.bag
h5ls --recursive H11703_Office_5m.bag
h5ls -d H11703_Office_5m.bag/BAG_root/metadata | head
h5ls --simple -d H11703_Office_5m.bag/BAG_root/metadata  | head
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | head

h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | head

h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | head

h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | tr -d '\n' | less

h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | tr -d '\n' > H11703_Office_5m.xml

emacsclient --no-wait H11703_Office_5m.xml

In emacs, we need to make this more readable:

M-x replace-string > < <RET> > C-q C-j <

Got that? C-q says to take the next character litterally. Then C-j is a new line (aka move the move down a line command)

Now we want to change the indenting of the XML metadata file.

  • Go to the beginning of the file: C-<
  • Start marking a region: C-space
  • Go to the end of the file: C->
  • Ask emacs to indent all: M-x indent-region
    • or press C-M-\

Not the most fun way to get the metadata out of a bag!

A simpler solution

I asked on the Open Nav Surf mailing list and Marcus Cole responded with this solution:

wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12279/BAG/H12279_VB_4m_MLLW_1of1.bag.gz
h5dump -b FILE -o H12279_VB_4m_MLLW_1of1.xml -d BAG_root/metadata H12279_VB_4m_MLLW_1of1.bag

This writes the data in a raw form that passes xmllint

xmllint H12279_VB_4m_MLLW_1of1.xml

This issue is discussed on stack overflow:


Bags from python

See also: https://github.com/schwehr/bag-py/

sudo apt-get install python-h5py # Only need to do this once!
import h5py
bag = h5py.File('H11703_Office_5m.bag')

root = bag['/BAG_root']

metadata = bag['/BAG_root/metadata']

metadata_txt = ''.join(metadata)

elevation = bag['/BAG_root/elevation'].value

from matplotlib import pyplot
pyplot.show()  # YUCK!
pyplot.imshow(elevation) # Better but not great

Can we make the plot more useful?

import numpy # for NAN aka "not a number"
for x in range(elevation.shape[0]):
    for y in range(elevation.shape[1]):
        if elevation[x,y] > 0:
            elevation[x,y] = numpy.NAN
# wait a while... this isn't fast



In [36]: history 1 35
1 : import h5py
2 : h5py.File('H11703_Office_5m.bag')
3 : bag = h5py.File('H11703_Office_5m.bag')
4 : _ip.magic("whos")
5 : bag.values()
6 : bag.items()
7 : root = bag['/BAG_root']
8 : root.items()
9 : metadata = bag['/BAG_root/metadata']
10: type(metadata)
11: metadata[0]
12: metadata.value
13: metadata_txt = ''.join(metadata.value)
14: metadata_txt[:100]
15: elevation = bag['/BAG_root/elevation'].value
16: type(elevation)
17: elevation.shape
18: elevation.min()
19: elevation.max()
20: from matplotlib import pyplot
21: pyplot.plot(elevation)
23: pyplot.show()
24: pyplot.imshow(elevation)
25: elevation.max()
26: #for x in range(elevation.
27: elevation.shape
29: import numpy
30: #?numpy.NAN
31: numpy.NAN
for x in range(elevation.shape[0]):
    for y in range(elevation.shape[1]):
        if elevation[x,y] > 0:
            elevation[x,y] = numpy.NAN
33: pyplot.imshow(elevation)
34: #?pyplot.imshow

Alternative python code with gdal

If you have gdal version 1.7.0 or newer, you can use this method, contribed by Jack Riley, to get the metadata:

from osgeo import gdal
bag = gdal.OpenShared(r"C:\DATA\NGDC\H11555_2m_1.bag") # NOTE: This is a windows file path
bagmetadata = bag.GetMetadata("xml:BAG")[0]

Author: Kurt Schwehr

Date: <2011-11-01 Tue>

HTML generated by org-mode 7.4 in emacs 23