
Introduction to HDF5

HDF & HDF-EOS Workshop XII


October 15, 2008

10/15/08 HDF & HDF-EOS Workshop XII 1


Topics Covered

- Introduce HDF5

- Describe HDF5 Data and Programming Models

- Walk Through Example Code



For More Information …

All workshop slides will be available from:

http://hdfeos.org/workshops/ws12/workshop_twelve.php



What is HDF5?

HDF = Hierarchical Data Format

• Data model, library and file format for managing data

• Tools for accessing data in the HDF5 format



Brief History of HDF
1987: At NCSA (University of Illinois), a task force formed to create an
architecture-independent format and library:
AEHOO (All Encompassing Hierarchical Object Oriented format),
which became HDF.

Early 1990s: NASA adopted HDF for the Earth Observing System project.

1996: DOE’s ASC (Advanced Simulation and Computing) Project began
collaborating with the HDF group (NCSA) to create “Big HDF”.
(The increase in computing power of DOE systems at LLNL, LANL and
Sandia National Labs required bigger, more complex data files.)
“Big HDF” became HDF5.

1998: HDF5 was released with support from National Labs, NASA, NCSA.

2006: The HDF Group spun off from University of Illinois as a non-profit
corporation.


Why HDF5?

In one sentence ...



Answering big questions …

Matter and the universe

Life and nature

[Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]

Weather and climate


… involves big data …



… varied data …

LCI Tutorial

Thanks to Mark Miller, LLNL


… and complex relationships …
[Figure labels: SNP Score, Contig Summaries, Discrepancies, Contig Qualities,
Coverage Depth, Trace, Reads, Aligned bases, Read quality, Contig, Percent match]


… on big computers …

… and small computers …


How do we…

• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?



Solution: HDF5!

• Can store all kinds of data in a variety of ways

• Runs on most systems

• Lots of tools to access data

• Emphasis on standards (HDF-EOS, CGNS)

• Library and format emphasis on I/O efficiency and storage


Structure of HDF5 Library

Applications

Object API (C, F90, C++, Java)

Library internals
Virtual file I/O

File or other “storage”



HDF Tools

- HDFView and Java Products

- Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)


HDF5 Applications & Domains

Examples: thermonuclear simulations, product modeling, data mining tools,
visualization tools, climate models, simulation, visualization, remote sensing…

Communities: HDF-EOS, CGNS, ASC

HDF5 Data Model & API

Virtual File Layer (I/O Drivers): Stdio, Split Files, MPI I/O, Custom

Storage: HDF5 format file; split metadata and raw data files;
file on parallel file system; user-defined device (?)


Lots of Layers in HDF5!

“Ogres are like onions.”

Shrek → HDF5 Monster??

Just like Shrek, once you get to know HDF5 you will really like it!!


The HDF5 Format



An HDF5 file is a container…

…into which you can put your data objects.

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6


HDF5 Structures for Organizing Objects

[Figure: file tree under “/” (root) with “foo”; objects shown include a 3-D array,
a table, a palette, two raster images, and a 2-D array]

Table:

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6


HDF5 Data Model

Primary Objects
• Groups
• Datasets

Additional ways to organize and annotate data


• Attributes
• Storage and access properties

Everything else is built from these parts.



HDF5 Dataset

Metadata:
   Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
   Datatype: Integer
   Storage info: Chunked, Compressed
   Attributes: Time = 32.4, Pressure = 987, Temp = 56

Data: [Figure: 3-D array of data elements]


Dataspaces
Two roles:
• Dataspace contains spatial info about a dataset stored in a file
   • Rank and dimensions
   • Permanent part of dataset definition
   Example: Rank = 2, Dimensions = 4x6

• Partial I/O: Dataspace describes application’s data buffer and
  data elements participating in I/O
   Example: Rank = 1, Dimension = 10


Write – from memory to disk

memory disk



Partial I/O
Move just part of a dataset between memory and disk.

(a) Slab from a 2D array to the corner of a smaller 2D array

(b) Regular series of blocks from a 2D array to a contiguous sequence
    at a certain offset in a 1D array

The number of elements selected in memory and on disk must be the same.


Datatypes (array elements)

• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound


Datatypes
• HDF5 atomic types include:
   integer & float
   user-definable (e.g., 13-bit integer)
   variable length types (e.g., strings)
   references to objects/dataset regions
   enumeration - names mapped to integers

• HDF5 compound types
   Comparable to C structs (“records”)
   Members can be atomic or compound types



HDF5 dataset: array of records

Dimensionality: 5 x 3

Datatype: Record with fields int8, int4, int16, and a 2x3x2 array of float32


Properties

• Properties are characteristics of HDF5 objects that can be modified
• Default properties handle most needs
• By changing properties, you can take advantage of the more powerful
  features in HDF5


Special Storage Properties
Property      Benefit
chunked       Better subsetting access time; extensible
compressed    Improves storage efficiency, transmission speed
extensible    Arrays can be extended in any direction
split file    Metadata in one file, raw data in another
              (Dataset “Fred”: File A holds metadata for Fred,
               File B holds data for Fred)


Attributes (optional)
• Attribute – data of the form “name = value”, attached to an object
• Operations similar to dataset operations, but …
   Not extensible
   No compression or partial I/O
• Can be overwritten, deleted, added during the “life” of a dataset



HDF5 Dataset (again)

Metadata:
   Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
   Datatype: Integer
   Storage info: Chunked, Compressed
   Attributes: Time = 32.4, Pressure = 987, Temp = 56

Data: [Figure: 3-D array of data elements]


Groups

• A mechanism for organizing collections
• Every file starts with a root group “/”
• Similar to UNIX directories
• Can have attributes

[Figure: root group “/” with members A, B, C and objects k, l, m]


Path to HDF5 Object in a File

[Figure: root group “/” contains “foo” and “x”; “foo” contains “temp” and “bar”;
“bar” contains “temp”]

/ (root)
/x
/foo
/foo/temp
/foo/bar/temp


Shared Objects

[Figure: root group “/” with groups A, B, C linking to objects P and R]

/A/P
/B/R
/C/P



Questions So Far?



Useful Tools For New Users

h5dump:
Tool to “dump” or display contents of HDF5 files

h5cc, h5c++, h5fc:
Scripts to compile applications

HDFView:
Java browser to view HDF4 and HDF5 files



H5dump Command-line Utility To View HDF5 File

h5dump [--header] [-a <names>] [-d <names>] [-g <names>]
       [-l <names>] [-t <names>] [-p] <file>

--header     Display header only; no data is displayed.
-a <names>   Display the specified attribute(s).
-d <names>   Display the specified dataset(s).
-g <names>   Display the specified group(s) and all the members.
-l <names>   Display the value(s) of the specified soft link(s).
-t <names>   Display the specified named datatype(s).
-p           Display properties.

<names> is one or more appropriate object names.


Example of h5dump Output

HDF5 "dset.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE { H5T_STD_I32BE }
      DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
      DATA {
         1, 2, 3, 4, 5, 6,
         7, 8, 9, 10, 11, 12,
         13, 14, 15, 16, 17, 18,
         19, 20, 21, 22, 23, 24
      }
   }
}
}

[Figure: root group “/” containing dataset ‘dset’]


HDF5 Compile Scripts

• h5cc – HDF5 C compiler command
• h5fc – HDF5 F90 compiler command
• h5c++ – HDF5 C++ compiler command

To compile:
% h5cc h5prog.c
% h5fc h5prog.f90



Compile option: -show

-show: displays the compiler commands and options without executing them

% h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API
-DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
-D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O
-fomit-frame-pointer -finline-functions -c Sample_c.c

gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions
-L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o
-L/home/packages/hdf5_1.6.6/Linux_2.6/lib
/home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a
/home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a
-lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib


Browsing HDF5 Files with HDFView



HDFView

Structure of File

Contents of Dataset


HDFView File Menu



Simple HDF5 File in HDFView
Right-click and select “Open” with mouse

Right-click and select “Show Properties” with mouse


Simple HDF5 File in HDFView



HDF-EOS5 File in HDFView



Right-click and select “Open As” with mouse


What you can’t see with slides:

- Picture displayed instantly
- File size is 906,229,176


Introduction to
HDF5 Programming Model
and APIs



Operations Supported by the API
• Create objects (groups, datasets, attributes, complex data types, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content, properties


General Programming Paradigm

• Properties of object are optionally defined
   Creation properties
   Access property lists
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed


Order of Operations
• An order is imposed on operations by argument
dependencies

For Example:
A file must be opened before a dataset
-because-
the dataset open call requires a file handle
as an argument.

• Objects can be closed in any order.



The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings.
• C routines begin with prefix H5?
   ? is a character corresponding to the type of object
   the function acts on

Example Functions:

H5D : Dataset interface     e.g., H5Dread
H5F : File interface        e.g., H5Fopen
H5S : dataSpace interface   e.g., H5Sclose


HDF5 Defined Types
For portability, the HDF5 library has its own defined
types:

hid_t:    object identifiers (native integer)
hsize_t:  size used for dimensions (unsigned long or unsigned long long)
hssize_t: for specifying coordinates and sometimes for dimensions
          (signed long or signed long long)
herr_t:   function return value
hvl_t:    variable length datatype

For C, include hdf5.h in your HDF5 application.



The HDF5 API

• For flexibility, the API is extensive
   300+ functions
   [Figure: Victorinox Swiss Army CyberTool 34]

• This can be daunting… but there is hope
   A few functions can do a lot
   Start simple
   Build up knowledge as more features are needed


Basic Functions

H5Fcreate (H5Fopen)    create (open) File
H5Screate_simple       create dataSpace
H5Dcreate (H5Dopen)    create (open) Dataset
H5Dread, H5Dwrite      access Dataset
H5Dclose               close Dataset
H5Sclose               close dataSpace
H5Fclose               close File


Other Common Functions

DataSpaces:      H5Sselect_hyperslab (Partial I/O)
                 H5Sselect_elements (Partial I/O)

Groups:          H5Gcreate, H5Gopen, H5Gclose

Attributes:      H5Acreate, H5Aopen_name,
                 H5Aclose, H5Aread, H5Awrite

Property lists:  H5Pcreate, H5Pclose
                 H5Pset_chunk, H5Pset_deflate


High Level APIs

• Included along with the HDF5 library
• Simplify steps for creating, writing, and reading objects
• Do not entirely ‘wrap’ HDF5 library


Example HDF5 Code



Steps to Create a File
1. Decide on special properties the file should have
   • Creation properties, like size of user block
   • Access properties, such as metadata cache size
   • Or use default properties (H5P_DEFAULT)
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed


Code: Create a File

hid_t  file_id;
herr_t status;

/* Create "file.h5"; the new file contains only the root group "/" */
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

status = H5Fclose (file_id);

Note: Return codes not checked for errors in code samples.



Dataset Components

Metadata:
   Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
   Datatype: Integer
   Storage info: Chunked, Compressed
   Attributes: Time = 32.4, Pressure = 987, Temp = 56

Data: [Figure: 3-D array of data elements]


Steps to Create a Dataset
1. Define dataset characteristics
   • Dataspace - 4x6
   • Datatype – integer
   • Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
   • Obtain location ID:
      - Group ID puts it in a Group
      - File ID puts it in Root Group
3. Create dataset in file
4. Close everything

[Figure: “/” (root) containing dataset A]


HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of Datatype Identifiers per HDF5 session.
For example:

C Type    HDF5 File Type                    HDF5 Memory Type
int       H5T_STD_I32BE, H5T_STD_I32LE      H5T_NATIVE_INT
float     H5T_IEEE_F32BE, H5T_IEEE_F32LE    H5T_NATIVE_FLOAT
double    H5T_IEEE_F64BE, H5T_IEEE_F64LE    H5T_NATIVE_DOUBLE

* Value of datatype is NOT fixed


Pre-defined File Datatype Identifiers

Examples:

H5T_IEEE_F64LE   Eight-byte, little-endian, IEEE floating-point
H5T_STD_I32LE    Four-byte, little-endian, signed two's complement integer

Name structure: H5T_<Architecture*>_<Programming Type>

NOTE: What you see in the file. Name is the same everywhere and
explicitly defines a datatype.

*STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”


Pre-defined Native Datatypes
Examples of predefined native types in C:

H5T_NATIVE_INT    (int)
H5T_NATIVE_FLOAT  (float)
H5T_NATIVE_UINT   (unsigned int)
H5T_NATIVE_LONG   (long)
H5T_NATIVE_CHAR   (char)

NOTE: Memory types. Different for each machine. Used for reading/writing.


Dataset Creation Property List

Dataset creation property list: information on how to organize data in storage.

Storage layouts shown:
   Contiguous (H5P_DEFAULT)
   Chunked
   Chunked & compressed



Code: Create a Dataset
hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: arguments are rank, current dims, and
   maximum dims (NULL: same as current dims) */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

/* Create a dataset: arguments are location, pathname, datatype,
   dataspace, property list (default).
   This is the five-argument (HDF5 1.6-style) form of H5Dcreate. */
dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE,
                        dataspace_id, H5P_DEFAULT);

/* Terminate access to dataset, dataspace, file */
status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);


Example Code - H5Dwrite

status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL,
                   H5S_ALL, H5P_DEFAULT, dset_data);

dataset_id:      Dataset Identifier from H5Dcreate or H5Dopen
H5T_NATIVE_INT:  Memory Datatype


Example Code – H5Dwrite

status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, dset_data);

First H5S_ALL:   Memory Dataspace
Second H5S_ALL:  File Dataspace
H5P_DEFAULT:     Data Transfer Property List (MPI I/O, Transformations, …)

H5S_ALL selects entire dataspace


Partial I/O
Memory Dataspace File Dataspace (disk)

H5S_ALL H5S_ALL

Get a Dataspace:
H5Screate_simple
H5Dget_space

Modify Dataspace:
H5Sselect_hyperslab
H5Sselect_elements



Example Code – H5Dread

status = H5Dread (dataset_id, H5T_NATIVE_INT,
                  H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);



High Level APIs: HDF5 Lite (H5LT)

#include "H5LT.h"

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

status = H5LTmake_dataset (file_id, "A", 2, dims,
                           H5T_STD_I32BE, data);
status = H5Fclose (file_id);


High Level APIs

• HDF5 Lite
• HDF5 Image
• HDF5 Table
• HDF5 Dimension Scales
• HDF5 Packet Table



Example: Create a Group

[Figure: file.h5 with root group “/” containing A (4x6 array of integers) and B]


Steps to Create a Group
1. Decide where to put it – “root group”
• Obtain location ID

2. Decide name – “B”

3. Create group in file

4. (Eventually) close the group.



Code: Create a Group

hid_t  file_id, group_id;
herr_t status;
...
/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR,
                   H5P_DEFAULT);

/* Create group "/B" in file. The third argument is a size hint for the
   number of bytes to store names of objects; 0 = default. */
group_id = H5Gcreate (file_id, "B", 0);

/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);


Thank you!

This work was supported by the Cooperative Agreement with the
National Aeronautics and Space Administration (NASA) under NASA
grant NNX06AC83A and NNX08A077A. Any opinions, findings,
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of NASA.

