Class 18: BAGs, HDF5 and XML
Introduction
Bathymetric Attributed Grid (BAG)
Bags were started here at CCOM.
I am not sure how to go from the above to this:
- http://surveys.ngdc.noaa.gov/mgg/
- http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12263/BAG/
cd ~/class/18
mkdir bags
cd bags
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12263/BAG/H12263_MB_1m_MLLW_1of4.bag.gz
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12263/BAG/H12263_MB_8m_MLLW_combined.bag.gz
ls -l
file *.bag.gz
# H12263_MB_1m_MLLW_1of4.bag.gz: gzip compressed data, was "H12263_MB_1m_MLLW_1of4.bag", from Unix, last modified: Wed May 25 20:31:52 2011
# H12263_MB_8m_MLLW_combined.bag.gz: gzip compressed data, was "H12263_MB_8m_MLLW_combined.bag", from Unix, last modified: Wed May 25 20:50:56 2011
md5sum *.bag.gz
# da017d513457ec242c5e7df29ff13a6e H12263_MB_1m_MLLW_1of4.bag.gz
# 477fd3b148b2047ec2ff4c9b9daa740b H12263_MB_8m_MLLW_combined.bag.gz
sha256sum *.bag.gz
# 9ff5775098fd7ee168969268a59e8f141742a99c09fd7429e16f47eb1662e29f H12263_MB_1m_MLLW_1of4.bag.gz
# aaa4ef417049d6144744bc26a1e975b99f2ac41df88f6c057989ca3fb1329e3e H12263_MB_8m_MLLW_combined.bag.gz
gunzip *.gz
ls -l
# total 3363696
# -rw-r--r-- 1 researchtools researchtools 3391351952 2011-05-25 20:31 H12263_MB_1m_MLLW_1of4.bag
# -rw-r--r-- 1 researchtools researchtools 53067176 2011-05-25 20:50 H12263_MB_8m_MLLW_combined.bag
md5sum *.bag
# 33593312d06a614f22d4e8dcf3e756e5 H12263_MB_1m_MLLW_1of4.bag
# 9229e785dd69bd708bf63eba136d31d7 H12263_MB_8m_MLLW_combined.bag
file *
# H12263_MB_1m_MLLW_1of4.bag: Hierarchical Data Format (version 5) data
# H12263_MB_8m_MLLW_combined.bag: Hierarchical Data Format (version 5) data
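The md5sum and sha256sum steps above can also be done from Python, which is handy if you want to compare a download against a published checksum inside a script. This is only a sketch: the function names file_digest and matches are made up for illustration, and the file is read in chunks on the assumption that a multi-gigabyte BAG should not be loaded into memory at once.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, matches('H12263_MB_8m_MLLW_combined.bag.gz', '477fd3b148b2047ec2ff4c9b9daa740b', 'md5') should agree with the md5sum output listed above.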
Err… what is Hierarchical Data Format (HDF)? It is a container format that can hold many different kinds of data in a single file.
And this one will not load in QGIS.
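One way to see that a BAG really is an HDF5 container is to look at its first bytes, which is roughly what the file command does. The sketch below checks for the 8-byte HDF5 signature at the start of a file. The name looks_like_hdf5 is hypothetical, and the check assumes the signature sits at offset 0; HDF5 also allows a user block in front of it, which this does not handle.

```python
# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling looks_like_hdf5 on an uncompressed .bag should return True, while the .bag.gz download should return False because it starts with the gzip header instead.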
Another try at a bag
http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H10001-H12000/H11703/BAG/
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H10001-H12000/H11703/BAG/H11703_5m_Combined_MLLW_5of5.bag.gz
file *.gz
# H11703_5m_Combined_MLLW_5of5.bag.gz: gzip compressed data, was "H11703_5m_Combined_MLLW_5of5.ba", from Unix, last modified: Fri Jul 16 08:47:06 2010
Nope! But I have a snapshot of NGDC's bag archive from May 2010.
wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Combined_5m.bag.bz2
wget http://vislab-ccom.unh.edu/~schwehr/rt/examples/old-bags/H11703_Office_5m.bag.bz2
bunzip2 *.bag.bz2  # decompress before handing the files to gdalinfo
gdalinfo --version
# GDAL 1.6.3, released 2009/11/19
gdalinfo H11703_Office_5m.bag
Driver: HDF5/Hierarchical Data Format Release 5
Files: H11703_Office_5m.bag
Size is 512, 512
Coordinate System is `'
Metadata:
  BAG_root:Bag Version=1.0.0
Subdatasets:
  SUBDATASET_1_NAME=HDF5:"H11703_Office_5m.bag"://BAG_root/elevation
  SUBDATASET_1_DESC=[1434x2004] //BAG_root/elevation (32-bit floating-point)
  SUBDATASET_2_NAME=HDF5:"H11703_Office_5m.bag"://BAG_root/uncertainty
  SUBDATASET_2_DESC=[1434x2004] //BAG_root/uncertainty (32-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)
The version of gdal that comes with Ubuntu 11.04 is just too old to read bags.
gdalinfo --formats | egrep -i 'bag|hdf'
# HDF4 (ro): Hierarchical Data Format Release 4
# HDF4Image (rw+): HDF4 Dataset
# HDF5 (ro): Hierarchical Data Format Release 5
# HDF5Image (ro): HDF5 Dataset
Using gdal from fink on Mac OSX:
gdalinfo --version
# GDAL 1.8.1, released 2011/07/09
gdalinfo --formats | egrep -i 'hdf|bag'
# BAG (ro): Bathymetry Attributed Grid
# HDF5 (ro): Hierarchical Data Format Release 5
# HDF5Image (ro): HDF5 Dataset
Bags from hdf5 tools
sudo apt-get install hdf5-tools
h5ls H11703_Office_5m.bag
h5ls --recursive H11703_Office_5m.bag
h5ls -d H11703_Office_5m.bag/BAG_root/metadata | head
h5ls --simple -d H11703_Office_5m.bag/BAG_root/metadata | head
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | head
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | head
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | head
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | tr -d '\n' | less
h5ls --string --simple -d H11703_Office_5m.bag/BAG_root/metadata | egrep '"' | cut -f2 -d\" | tr -d '\n' > H11703_Office_5m.xml
emacsclient --no-wait H11703_Office_5m.xml
In emacs, we need to make this more readable:
M-x replace-string > < <RET> > C-q C-j <
Got that? C-q says to take the next character literally. Then C-j is a newline (aka the "move down a line" command).
Now we want to change the indenting of the XML metadata file.
- Go to the beginning of the file: M-<
- Start marking a region: C-space
- Go to the end of the file: M->
- Ask emacs to indent all: M-x indent-region
- or press C-M-\
Not the most fun way to get the metadata out of a bag!
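If you would rather not do the line breaking and indenting by hand, the same cleanup can be sketched in Python with the standard library's minidom. This is an alternative to the emacs steps, not what was done in class, and pretty_xml is a name invented for this example.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="  "):
    """Re-indent an XML string so that each element starts on its own line."""
    document = minidom.parseString(xml_text)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml can emit whitespace-only lines; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```

Reading H11703_Office_5m.xml into a string and passing it to pretty_xml should give roughly the layout that replace-string plus indent-region produce.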
A simpler solution
I asked on the Open Nav Surf mailing list and Marcus Cole responded with this solution:
wget http://surveys.ngdc.noaa.gov/mgg/NOS/coast/H12001-H14000/H12279/BAG/H12279_VB_4m_MLLW_1of1.bag.gz
gunzip H12279_VB_4m_MLLW_1of1.bag.gz  # h5dump needs the uncompressed file
h5dump -b FILE -o H12279_VB_4m_MLLW_1of1.xml -d BAG_root/metadata H12279_VB_4m_MLLW_1of1.bag
This writes the metadata out in raw form, and the result passes xmllint:
xmllint H12279_VB_4m_MLLW_1of1.xml
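Run without a schema, xmllint is checking that the file is well-formed XML. A rough Python equivalent, using only the standard library, might look like the sketch below. The function name is_well_formed is an assumption for this example, and it does no schema validation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The message includes the line and column of the first problem.
        return False, str(error)
    return True, None
```

Passing in the contents of H12279_VB_4m_MLLW_1of1.xml should return (True, None) if the h5dump extraction worked.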
This issue is discussed on stack overflow:
http://stackoverflow.com/questions/7966747/hdf5dump-of-h5t-string
Bags from python
See also: https://github.com/schwehr/bag-py/
sudo apt-get install python-h5py  # Only need to do this once!
ipython
import h5py
bag = h5py.File('H11703_Office_5m.bag')
bag.values()
bag.items()
root = bag['/BAG_root']
root.values()
root.items()
root.keys()
metadata = bag['/BAG_root/metadata']
type(metadata)
metadata[0]
metadata.value  # h5py 3.0 removed .value; use metadata[()] there
metadata_txt = ''.join(metadata)
metadata_txt[:80]
elevation = bag['/BAG_root/elevation'].value
elevation.shape
elevation.min()
elevation.max()
from matplotlib import pyplot
pyplot.plot(elevation)
pyplot.show()  # YUCK!
pyplot.imshow(elevation)  # Better but not great
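The ''.join(metadata) step works because the metadata dataset comes back as a long sequence of one-character strings. Under Python 3 those pieces are typically bytes rather than str, so a plain ''.join would fail. Here is a small sketch that handles both cases. The helper name metadata_to_text and the UTF-8 default are assumptions, not part of h5py.

```python
def metadata_to_text(chunks, encoding="utf-8"):
    """Join a sequence of str or bytes fragments into one text string."""
    chunks = list(chunks)
    if chunks and isinstance(chunks[0], bytes):
        # Join first, then decode, so a multi-byte character that was
        # split across one-byte fragments is put back together.
        return b"".join(chunks).decode(encoding)
    return "".join(chunks)
```

With that in place, metadata_txt = metadata_to_text(metadata) should give the same XML string as the session above.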
Can we make the plot more useful?
import numpy  # for nan aka "not a number"
for x in range(elevation.shape[0]):
    for y in range(elevation.shape[1]):
        if elevation[x,y] > 0:
            elevation[x,y] = numpy.nan
# wait a while... this isn't fast
pyplot.figure(2)
pyplot.imshow(elevation)
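The nested loop visits every cell from Python, which is why it is slow. NumPy can do the same replacement in one step with a boolean mask. This is a sketch of that idea rather than what was typed in class: mask_above is a hypothetical helper, and it returns a copy instead of changing the array in place.

```python
import numpy


def mask_above(grid, threshold=0.0):
    """Return a float copy of grid with cells above threshold set to NaN."""
    masked = numpy.array(grid, dtype=float)  # copy, so the input is untouched
    # The comparison builds a boolean array; assigning through it
    # replaces every matching cell at once.
    masked[masked > threshold] = numpy.nan
    return masked
```

Then pyplot.imshow(mask_above(elevation)) should show the same picture as the loop version, without the wait.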
History
In [36]: history 1 35
1 : import h5py
2 : h5py.File('H11703_Office_5m.bag')
3 : bag = h5py.File('H11703_Office_5m.bag')
4 : _ip.magic("whos")
5 : bag.values()
6 : bag.items()
7 : root = bag['/BAG_root']
8 : root.items()
9 : metadata = bag['/BAG_root/metadata']
10: type(metadata)
11: metadata[0]
12: metadata.value
13: metadata_txt = ''.join(metadata.value)
14: metadata_txt[:100]
15: elevation = bag['/BAG_root/elevation'].value
16: type(elevation)
17: elevation.shape
18: elevation.min()
19: elevation.max()
20: from matplotlib import pyplot
21: pyplot.plot(elevation)
22:
23: pyplot.show()
24: pyplot.imshow(elevation)
25: elevation.max()
26: #for x in range(elevation.
27: elevation.shape
28:
29: import numpy
30: #?numpy.NAN
31: numpy.NAN
32: for x in range(elevation.shape[0]):
        for y in range(elevation.shape[1]):
            if elevation[x,y] > 0:
                elevation[x,y] = numpy.NAN
33: pyplot.imshow(elevation)
34: #?pyplot.imshow
Alternative python code with gdal
If you have gdal version 1.7.0 or newer, you can use this method, contributed by Jack Riley, to get the metadata:
from osgeo import gdal
bag = gdal.OpenShared(r"C:\DATA\NGDC\H11555_2m_1.bag")  # NOTE: This is a Windows file path
bagmetadata = bag.GetMetadata("xml:BAG")[0]
Date: <2011-11-01 Tue>