HDF-EOS Vendors / Software Developers Workshop

Summary of papers, panels, questions and responses on format standard

September 8-10, 1997

NASA - Goddard Space Flight Center (GSFC)


Please note the summaries below were reconstructed from notes taken during the workshop. Please inform Doug Ilg (Hughes STX) of any inaccuracies or needed corrections.


URL http://ulabhp.gsfc.nasa.gov/~Workshop/Workshop.html

Reporter: Dr. Ravi Sharma, Principal Systems Engineer,

Phone 301-441-4296, email: rsharma@stx.com

Technical Editor: Doug Ilg,

Hughes STX Corporation


DAY 1: Monday - Sept. 8, 1997


Welcome - Candace Carlisle, ESDIS Project, GSFC, NASA


I. General Introduction - Rick Obenschain, ESDIS Project Manager

An overview of the purpose of the Workshop was provided. The purpose included: providing HDF-EOS related information to users and vendors, an overview of current and planned tools, sessions with experts on HDF-EOS (a new profile of HDF) and science panel members, demonstrations and presentations by developers and vendors, and feedback for improving the usefulness of EOS data. Salient features of the EOS mission spacecraft and planned launches, as well as the detailed agenda for the Workshop, were discussed. The presentation also listed the criteria used to select HDF and described the roles of the DAACs, NCSA, and others in the development of HDF-EOS. It set the stage for the Workshop by outlining the opportunities that the enormous expected data volume creates for developers and vendors. NASA is committed to supporting HDF-EOS in order to facilitate the use of EOS data and information.


II. Introduction to ESDIS - Dr. Timothy Gubbels, Hughes Info. Tech. Systems

An overview was presented of EOS, ECS & EOSDIS including the AM-1 and PM-1 satellites, sensors, spectral bands and related mission parameters as well as system data handling capabilities and potential uses of the data.


III. Overview of HDF - Dr. Michael Folk, NCSA

HDF grew out of the need for users to share scientific data in a heterogeneous computing environment. Users include Johns Hopkins Applied Physics Lab, MathWorks, Image Annotation, NIST, Stanford, European corporations (Lufthansa, etc.), DAACs, and GCRP, among others.

HDF files consist of a collection of data objects and an index. There are 7 types of data objects, including 8-bit, 24-bit, and general raster images, scientific data sets (SDS, multidimensional arrays), Vdata (tables), and Vgroups (grouping structures). Data can be stored as linked blocks, in external files, or as "chunks."
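The idea of "data objects plus an index" can be illustrated with a toy container in which each object is identified by a tag (its type) and a reference number, and the index records where the object's bytes live. HDF does identify objects by tag/reference pairs, but the class, the tag values, and the layout below are hypothetical simplifications, not the real on-disk format or the HDF API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical tag numbers for illustration only (not real HDF tag values).
TAG_SDS = 100
TAG_VDATA = 200


@dataclass(frozen=True)
class Descriptor:
    """Location of one object's bytes inside the toy file."""
    offset: int  # byte position where the object starts
    length: int  # number of bytes in the object


class ToyHdfFile:
    """Objects stored back to back in one byte buffer, located through an
    index keyed by (tag, ref). Not the real HDF on-disk format."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._index: Dict[Tuple[int, int], Descriptor] = {}

    def add(self, tag: int, ref: int, payload: bytes) -> None:
        """Append an object and record where it lives in the index."""
        key = (tag, ref)
        if key in self._index:
            raise KeyError(f"object {key} already exists")
        self._index[key] = Descriptor(len(self._data), len(payload))
        self._data.extend(payload)

    def read(self, tag: int, ref: int) -> bytes:
        """Look the object up in the index and return its bytes."""
        d = self._index[(tag, ref)]
        return bytes(self._data[d.offset:d.offset + d.length])

    def refs(self, tag: int) -> List[int]:
        """All reference numbers stored under one tag (object type)."""
        return sorted(ref for (t, ref) in self._index if t == tag)
```

In this toy, a reader consults the index instead of scanning the buffer, and new objects are appended without moving existing ones.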

The HDF library uses a layered architecture. It consists of a general purpose API, on top of which several object specific APIs are built. User-developed application programs are, in turn, built on top of the object APIs.

The library is free, and the distribution includes source code for the libraries and utilities. The work is partially supported by NASA. At present, C and Fortran interfaces are supported on the following platforms: Sun, SGI, Cray, IBM, DEC, PowerPC, and Macintosh.

Projects undertaken are:


IV. Overview of HDF-EOS - Doug Ilg, Hughes STX

An overview describing HDF-EOS as a profile of HDF was presented. It addresses EOSDIS needs and meets requirements for tight coupling of data and geolocation, as well as subsetting services.

The Point, Grid, and Swath prefixes (PT, GD, and SW) used in the C and Fortran interfaces, and the general programming model for HDF-EOS, were described.

Point data was described as an indexed table structure, designed for in-situ measurements such as weather station or buoy data. The Point structure and interface, its subsetting features (by time and geolocation), and tips for writing Point datasets were presented.
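Subsetting point data by time and geolocation reduces to filtering records against a box and a time window. A minimal sketch, assuming each record carries latitude, longitude, and time fields; the record layout and function name are hypothetical and are not the HDF-EOS Point interface. The simple box test does not handle regions that cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PointRecord:
    """One in-situ observation, e.g. a station or buoy reading (hypothetical layout)."""
    station: str
    lat: float    # degrees north
    lon: float    # degrees east
    time: float   # seconds since an arbitrary epoch (assumed units)
    value: float


def subset_points(records: Iterable[PointRecord],
                  lat_range: Tuple[float, float],
                  lon_range: Tuple[float, float],
                  time_range: Tuple[float, float]) -> List[PointRecord]:
    """Keep records inside a lat/lon box and a time window (bounds inclusive)."""
    (lat0, lat1), (lon0, lon1), (t0, t1) = lat_range, lon_range, time_range
    return [r for r in records
            if lat0 <= r.lat <= lat1
            and lon0 <= r.lon <= lon1
            and t0 <= r.time <= t1]
```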

The Swath model can be used for scanning, staring, profiling, or push-broom sensors. It organizes data in terms of cross-track scans or of vertical cells along the orbital track. The order of function calls is significant: dimensions must be defined before they are used. Run-length encoding, adaptive Huffman, and gzip compression are available. Unless a dimension map is defined, a one-to-one mapping between geolocation and data dimensions is assumed during subsetting by time or geolocation.
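A dimension map relates a geolocation dimension to a (typically finer) data dimension, for instance when latitude/longitude are stored only at every second pixel. The sketch below assumes a map characterized by an offset and an increment, with data_index = offset + increment × geo_index; the defaults reproduce the one-to-one mapping mentioned above. The function names are hypothetical, not HDF-EOS calls.

```python
def geo_to_data_index(geo_index: int, offset: int = 0, increment: int = 1) -> int:
    """Data-dimension index matching a geolocation index.

    Assumed relationship: data_index = offset + increment * geo_index.
    The defaults (offset=0, increment=1) give the one-to-one mapping.
    """
    return offset + increment * geo_index


def data_to_geo_index(data_index: int, offset: int = 0, increment: int = 1) -> float:
    """Inverse of the assumed mapping. A fractional result means the data
    point lies between geolocation points, so its latitude/longitude would
    have to be interpolated."""
    if increment == 0:
        raise ValueError("increment must be nonzero")
    return (data_index - offset) / increment
```

For example, with geolocation stored at every second data pixel starting at pixel 1 (offset=1, increment=2), geolocation index 3 corresponds to data index 7.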

The Grid is useful for highly processed data that has been spatially and/or temporally resampled. It consists of a set of x,y data fields with projection information, etc. Tiling is a recently added feature. Again, the order of function calls is important.
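Tiling divides a grid field into fixed-size rectangles so that a subset request only needs the tiles it overlaps. A minimal sketch, assuming equal tile sizes and zero-based pixel coordinates; the function is hypothetical and does not describe how the HDF-EOS Grid interface actually stores tiles.

```python
from typing import List, Tuple


def tiles_for_region(x0: int, y0: int, x1: int, y1: int,
                     tile_w: int, tile_h: int) -> List[Tuple[int, int]]:
    """Return (tile_col, tile_row) for every tile touched by the inclusive
    pixel rectangle [x0, x1] x [y0, y1], ordered row by row."""
    if tile_w <= 0 or tile_h <= 0:
        raise ValueError("tile dimensions must be positive")
    if x1 < x0 or y1 < y0:
        return []
    cols = range(x0 // tile_w, x1 // tile_w + 1)
    rows = range(y0 // tile_h, y1 // tile_h + 1)
    return [(c, r) for r in rows for c in cols]
```

A 100 x 70 pixel request that straddles tile boundaries touches four 100 x 100 tiles, while one that fits inside a single tile touches only that tile.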

IDs returned by subsetting routines may be re-used to refine an area of interest.
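Reusing a subset handle to narrow an area of interest amounts to intersecting the current region with a new one. The sketch below models a hypothetical region handle as a latitude/longitude box plus a time window; it illustrates the idea and is not the HDF-EOS subsetting API. It does not handle boxes that cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical subset handle: a lat/lon box plus a time window."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    t_min: float
    t_max: float

    def refine(self, other: "Region") -> Optional["Region"]:
        """Narrow this region by intersecting it with another one.
        Returns None when the two regions do not overlap."""
        r = Region(max(self.lat_min, other.lat_min), min(self.lat_max, other.lat_max),
                   max(self.lon_min, other.lon_min), min(self.lon_max, other.lon_max),
                   max(self.t_min, other.t_min), min(self.t_max, other.t_max))
        if r.lat_min > r.lat_max or r.lon_min > r.lon_max or r.t_min > r.t_max:
            return None
        return r
```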

URL http://edhs1.gsfc.nasa.gov/

Ref: EDHS server documents 170-TP-005-002 and also 006-001, 007-002, 008-002, 009-002.


V. Introduction to EOS Metadata - Dr. Karl Cox, Hughes Information Technology Systems

EOSDIS metadata include granule-level attributes. The Data Server Subsystem contains metadata tables. Granules are stored in HDF-EOS files, and the metadata resides both in the granule and in text files with a ".met" extension. Within the HDF-EOS file, ECS metadata is stored as a user-defined global attribute (coremetadata.n, where as many as 10 blocks of up to 64K each can be chained together). User-defined or archive metadata extensions are allowed, and HDF library calls can be used to retrieve the metadata blocks. The ODL (Object Description Language) source code is taken from the JPL Planetary Data System group and is in the public domain; see URL http://pds.jpl.nasa.gov/stdref/chap12. Information on Version 1 science data software is in the ECS documents 1600-TP-013-001 and 420-TP-015-002.
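The chaining of metadata blocks can be sketched as splitting one long ODL string into numbered pieces and joining them back in order. This assumes ASCII ODL text, a limit of exactly 65,536 bytes for "64K", and attribute names of the form coremetadata.0, coremetadata.1, and so on, following the coremetadata.n notation above. These are illustrative assumptions; real files are written and read through HDF library calls, which this sketch does not use.

```python
from typing import Dict

BLOCK_SIZE = 65536   # "up to 64K" per block (exact limit assumed)
MAX_BLOCKS = 10      # "as many as 10 blocks"
BASE_NAME = "coremetadata"


def split_metadata(odl_text: str) -> Dict[str, str]:
    """Split ODL text into chained blocks named coremetadata.0, .1, ...
    ODL is assumed to be ASCII, so byte slicing never cuts a character."""
    raw = odl_text.encode("ascii")
    blocks = [raw[i:i + BLOCK_SIZE] for i in range(0, len(raw), BLOCK_SIZE)] or [b""]
    if len(blocks) > MAX_BLOCKS:
        raise ValueError(f"metadata needs {len(blocks)} blocks; limit is {MAX_BLOCKS}")
    return {f"{BASE_NAME}.{n}": b.decode("ascii") for n, b in enumerate(blocks)}


def join_metadata(attributes: Dict[str, str]) -> str:
    """Reassemble blocks in numeric order, stopping at the first missing index."""
    parts = []
    for n in range(MAX_BLOCKS):
        name = f"{BASE_NAME}.{n}"
        if name not in attributes:
            break
        parts.append(attributes[name])
    return "".join(parts)
```

Joining by numeric index rather than by sorting attribute names keeps the order correct regardless of how the attributes happen to be listed.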


VI. Available Tools and Overview of Needs - Ramachandran Suresh, Hughes STX

The tools were described in four major categories: HDF utilities, NCSA tools, other public domain tools, and commercial tools. The features of each tool were presented and relevant URLs were cited. However, it was concluded that few of the required tools yet support HDF-EOS and that much remains to be done.

Comment from Mike Folk, NCSA: Two more should be added to the list of NCSA utilities: (i) floating-point to HDF and (ii) ASCII to HDF.


VII. Vendor Presentations and Demonstrations


JHV is an HDF viewer/browser application. It allows arithmetic and boolean operations and supports applets, movies, filtering, etc. The current implementation supports spreadsheet and image overlays, and boundary lines (political boundaries and coastlines) are also drawn. The application is currently being upgraded from JDK 1.0 to JDK 1.1. Since Java is a network-aware language, it is easy to move from stand-alone to distributed applications written in Java. A major application accessing a remote file through Java (servlet, Jigsaw server) was demonstrated a month before the Workshop.


The objective is to give users the data they need as quickly as feasible and to reduce analysis time by reducing the amount of data. The prototype is web-based and dataset-independent, and uses both the HDF-EOS and HDF libraries. The user interface is written in C. Subsetting executes in batch mode when the processing load allows. Input parameters include a geographic bounding box, parameters/channels, subsampling stride, geolocation, and time. Files as large as 100 MB are routinely processed for SSM/I, TOMS, snow cover maps, etc. The prototype can subset Grid and Swath data, deal with multiple Grids and Swaths, include or exclude non-geolocated data, and provide output as an HDF-EOS file. Time is represented in 64-bit floating-point TAI93 format.
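The subsetting inputs listed above (bounding box, subsampling stride, time) can be combined as in the minimal sketch below. It assumes the swath's latitude and longitude are 2-D lists indexed [scan][pixel] and that each scan has one TAI93 time stamp, taken here to mean seconds since 1993-01-01 00:00:00 UTC. Function names are hypothetical; this is not the prototype's code.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple

TAI93_EPOCH = datetime(1993, 1, 1, tzinfo=timezone.utc)


def tai93_to_utc_approx(tai93_seconds: float) -> datetime:
    """Approximate UTC for a TAI93 time. datetime arithmetic ignores leap
    seconds, so the result is ahead of true UTC by the number of leap
    seconds inserted since 1993."""
    return TAI93_EPOCH + timedelta(seconds=tai93_seconds)


def subset_swath(lat: Sequence[Sequence[float]], lon: Sequence[Sequence[float]],
                 times: Sequence[float],
                 bbox: Tuple[float, float, float, float],
                 t_range: Tuple[float, float],
                 stride: int = 1) -> List[Tuple[int, int]]:
    """Return (scan, pixel) indices inside bbox = (lat_min, lat_max, lon_min,
    lon_max) whose scan time lies in t_range, keeping every stride-th scan
    and pixel."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    lat_min, lat_max, lon_min, lon_max = bbox
    t0, t1 = t_range
    hits = []
    for i in range(0, len(lat), stride):
        if not (t0 <= times[i] <= t1):
            continue  # whole scan is outside the time window
        for j in range(0, len(lat[i]), stride):
            if lat_min <= lat[i][j] <= lat_max and lon_min <= lon[i][j] <= lon_max:
                hits.append((i, j))
    return hits
```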


Plans include:


What is required:




DAY 2: Tuesday - Sept. 9, 1997


VIII. Presentations by DAACs


GSFC: George Serafino

An overview of the Goddard DAAC and its functionality was briefly given. The Goddard DAAC holds many products in HDF and, after the launch of the AM-1 satellite, will receive data from several instruments in HDF-EOS. About 25% of its users are international. This DAAC has been actively supporting the HDF format.


LaRC: Linda Hunt

The Langley DAAC archives data from 20 Earth science projects with 205 datasets, 40 of which are in HDF/HDF-EOS format; HDF/HDF-EOS data make up 30% of the available data granules. Relevant statistics: 6 CD-ROMs produced, 100 solar radiation budget data users, 31 countries that have accessed the data, 3,366 total users, 2 TB of data holdings, 1,996 total orders, and 4 web-related projects. An X Windows GUI that can subset Grids is provided, and an IDL-based GUI is supported. What is desired is a platform-independent tool for extracting small subsets from many datasets into a single file.


EDC: John Boyd

Users of the US Geological Survey’s EROS Data Center (EDC) data holdings (AVHRR, Landsat, etc.) include the Nature Conservancy, IGBP, users from the minerals, energy, and forestry sectors, COTS software developers, VARs, etc. Other application areas include land use and land cover, resource management and ecosystems, disaster location, etc. The EDC provides data at EDC levels L0R (reformatted, unprocessed), L1R (radiometrically corrected), and L1G (geometric distortion removed) as GeoTIFF and Fast Format (EOSAT). EDC’s ASTER holdings will consist of at-sensor, decorrelated and stretched, scene classification, surface radiation, and surface reflectance data products. MODIS holdings are planned to include EOS L3 and L4 data products such as surface reflectance, NDVI, BRDF, surface temperature, land cover, thermal anomalies, fire, LAI, etc. There are no V0 data sets. EDC has 120 TB of raw serial bitstream data, including Landsat downstream format, and 5 TB of NOAA L1B AVHRR data.

Very few EDC users are familiar with HDF. An open question is how users and Data Center personnel can ingest and display HDF data using a COTS tool. A pre-launch demonstration planned for December at Valley Forge will include a simulated 1 km topographic Grid product.


JPL (PO DAAC): Carol Hsu

The Physical Oceanography DAAC at JPL has TOPEX/POSEIDON data (95% ocean coverage every two days), a mission with CNES participation. The products include sea surface height, SST, ocean surface wind, etc. The SSM/I data include atmospheric moisture, hydrology, and tide model related products. Data user distribution: R&D 75%, with the remainder in education, oil companies, and others. Data holdings: 5 TB; orders: 5,000 for a total of 8.6 TB of data; 10,000 educational CD-ROMs shipped, totaling 15 TB.

Twenty data sets are in HDF format, including AVHRR, CZCS, NSCAT, SSM/I, etc. SeaWinds data will be in HDF-EOS. PO DAAC uses Fortran, C, and IDL. Browsing tools include EOSView, LinkWinds, and Collage. Custom HDF subsetting utilities are available online. Hughes STX developed an I/O library using HDF. Portability issues remain to be resolved among workstations such as Sun and Solaris, and problems with the Fortran-C interface, HP/DEC crashes, etc. were also mentioned. It was suggested that vendors develop HDF-EOS products on a variety of platforms, with the following functionality: line, image, plot, browse, subsetting, data field selection, vector linear algebra, and tools for CD-ROM products.


ASF: Chris Wyatt

The Alaska SAR Facility concentrates on polar research and Earth sciences. Its 150 TB of holdings include ERS-1, ERS-2, JERS (Japan), and RADARSAT (Canada) data, GPS and ice motion data in ASCII or binary form, SAR data in CEOS format, etc. What is needed is a "sniffer" tool that can recognize data formats and indicate or invoke the proper conversion software.
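A minimal sniffer can compare a file's leading bytes against known signatures ("magic numbers"). The sketch below recognizes HDF 4.x files (and therefore HDF-EOS files, which are HDF files), netCDF classic files, GIF, and TIFF, and falls back to a crude ASCII-text check. Formats without a fixed signature, such as CEOS SAR products and raw binary, would need additional heuristics, and dispatching to conversion software is left out. The function names are hypothetical.

```python
from typing import Optional

# Leading-byte signatures of a few formats.
SIGNATURES = [
    (b"\x0e\x03\x13\x01", "HDF 4.x (includes HDF-EOS)"),
    (b"CDF\x01", "netCDF classic"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def sniff_bytes(head: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file; None if unknown."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # Crude fallback: printable ASCII plus tab, newline, carriage return.
    if head and all(b in b"\t\n\r" or 32 <= b < 127 for b in head):
        return "ASCII text"
    return None


def sniff(path: str) -> Optional[str]:
    """Read the start of a file and identify its format."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(512))
```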


IX. EOS Precursor Data Sets - Mike Fitzgerald, Ames Research Center

The MODIS Airborne Simulator (MAS) is a 50-channel aircraft-based sensor designed to simulate MODIS data. Using MAS, precursor data sets have been created. These proto-data sets can be used for cross-calibration/validation and include reflective and emitted radiation modeling from 151 aircraft flights at over 65,000 ft. The data cover a 15-nautical-mile swath with 716 pixels per line and 16-bit values. Parameters include ER-2 navigation, blackbody calibration, geolocation (lat/long), solar zenith angle, etc. Calibrated HDF data with error checks are produced. Exabyte 8500 tapes holding 5 million scan lines are available, covering 222 flight hours and 22 TB of data.

A team at the University of Wisconsin developed and presented an IDL-based online search and ingest tool, which is publicly available along with the L1B user guide.


X. Software Developer/ Vendor Presentations, Demonstrations


JPL - LinkWinds: Lee Elson (PI) and Mark Allen (Co-I)

MUSE is a rapid prototyping tool that uses object-oriented (OO) concepts for data displays, controls, and standard rules. MUSE scripting minimizes network bandwidth: in low-bandwidth mode only commands are transmitted, which makes it useful for remote tutorials. A "replay journal" supports 2D/3D context-sensitive animation, help, direct manipulation of display objects, slider links, data decimation, subregion selection, etc. HDF 4.1r1 and CDF are supported, as are mission-related formats such as UARS, SAGE, and SAGE Native / RGB. The tool has ingest capabilities, a file finder, and WWW support. Version 2.2 has been downloaded by 1,900 users. At present the client tool is C-based and there is no distributed processing support.

WebWinds, a web-extensible Java tool, is a modular OO system that strives for platform independence by using the JavaBeans API, RMI, the JDBC (Java Database Connectivity) API, JIT compilers, CORBA connectivity, Javadoc, and the 2D and 3D graphics APIs. It has a stand-alone web browser applet GUI, a flexible server configuration (proxy server or dedicated Java server), and a central software server with both client and server in-house. The applet displays images, graphs, scatter plots, etc. The tool was demonstrated with various ocean data sets.


ECOlogic - HDF-EOS DataBlade: Renu Chaudhry

URL http://www.ecologic.net

The first phase of DataBlade development will ingest HDF-EOS Grid data. Metadata will be in the file and Grid info in data objects in the database. The image will also be a binary object in the database. The second phase will cater to the users' subsetting requirements. The third and final phase will include Swath and Point objects. The prototype uses Informix Object Relational Database Management System (ORDBMS).

DataBlade technology combines OO techniques with an ORDBMS to give users the capability of handling complex data objects (inheritance and function overloading), flexible and secure transactions, etc. The SQL interface uses R-tree and B-tree indices to help retrieve objects from the database. The DataBlade API is like a handle into which various object blades (text, image, spatial, Web, etc.) can be snapped. The handle connects to the extensible object-relational engine, the scalable data manager, and the Informix Database Server. In the HDF-EOS context, the blades are new object types (Swath, Point, Grid, and standard HDF objects), along with routines, tables, API client code, etc. Routines provide database access through SQL. Data objects are super-type Table objects, and tables inherit Swath, Grid, and Point information in the database. SQL accesses information across granules in the database, so the client sends a query and gets results back from the database.

Typical SQL "where" clauses are used, e.g., to retrieve complex data objects. In this formalism, data and functions can both move within the database at the same time. Java eases integration with other types of objects. The HDF-EOS DataBlade is an Image DataBlade for storage, capture, comparison, browsing, etc. In this context, SQL allows the use of "select", "from", "where", and "contains" operating on complex objects, for example for cloud selection or precipitation rates.

The HDF-EOS DataBlade thus allows easy manipulation of Grid, Swath, and Point data via SQL. Because data and functions both reside in the database, performance is better. SQL is familiar to many professionals. The prototype is reliable, secure, portable, and extensible. The DataBlade concept easily incorporates HDF-EOS data objects, and it is a new technology combining SQL and objects.


Fortner Software - Ted Meyer

Scientists need powerful tools built on data standards; the Web, Mosaic, HTML, and GIF are some examples. Recurring paradigms are the desktop, point-and-click, data standards, etc. NCSA has a partnership with Fortner (examples: Windows 95/NT and Mac OS versions of the HDF libraries). Fortner maintains a web site with HDF information: URL http://www.HDFinfo.com.

The name NOESYS is of Greek origin ("of or pertaining to knowledge or intellect"). NOESYS can view, create, and edit data sets such as Vdatas, attributes, Vgroups, and images.

It can handle multidimensional data with up to 7 dimensions, creating images, surfaces, 3D isosurfaces, line scans, etc. A Fortran 90-based interpreter provides data set manipulation capabilities. NOESYS has plug-ins such as data translators, data imagers, and data manipulators, and a Translator SDK (software development kit) is planned. Version 1.1 supports only HDF; version 1.2 will support Translator plug-ins, and version 2.0 will add HDF-EOS.


LLNL (Lawrence Livermore National Laboratory) - Dean Williams

PCMDI is a system designed for the Atmospheric Model Intercomparison Project (AMIP). PCMDI contains 9 tools, including a storage database, diagnosis, visualization, and scientists' tools (e.g., a cloud analysis tool, a cloud DBMS, etc.). Supported APIs include netCDF (2.3 and 3.3), HDF, DRS, GrADS, GRIB, and VPOP. Other tools include DDI (Data Dimensions Interface) and VCS (Visual & Computational System). Python is at the center of the architecture and connects all modules: DDI, VCS, Numerics, CGI, Cdunif, GUI, and wrappers for NCAR, LANL/ACL, and PCMDI functions. Metadata are changed through DDI and then saved to a file through the GUI. VCS has numerous functions, including animation. The tools run on UNIX and are available on the Web free of charge.


Hughes STX - Data and Information Access Link (DIAL) - Ramachandran Suresh

NASA’s Data and Information Access Link (DIAL) was briefly described and demonstrated. DIAL is a CGI augmentation of a WWW server that supports metadata based file search and browsing/visualization on HDF files. It was jointly developed by Hughes STX and NCSA with NASA funding and is currently being updated to include HDF-EOS capabilities.

DIAL’s target users include low-to-medium volume data producers, Earth science researchers, and educators. Although DIAL is designed for serving Earth science data, it could easily be modified to serve other types of data. DIAL is available for testing at http://hops.stx.com:8080/dialhome.html, where the binaries for Sun Solaris and SGI IRIX can also be downloaded.


Pegasus Imaging - Andrew Hudson

Raster compression developments at Pegasus, JPEG compression Libraries, etc.

Lossless, wavelet, and enhanced compression methods are being developed. The lossless ELS compression (JPEG 2000 Standards Committee) is probably near optimal. The algorithms appear in Dr. Dobb's Journal.

I. Raster data: superior results, with a 40% improvement on grayscale data over GLZ and Huffman coding and a 30% improvement over GIF and LZ.

II. The wavelet toolkit is a newcomer for lossy compression; unlike JPEG, it does not use blocks. Blocky wavelet smooths high frequencies. Higher compression ratios (from 1 MB down to a few kB) give acceptable quality, but not for high-contrast images. JPEG Cubis images are used for multimedia video transmissions and over T1 lines. The algorithm degrades sequentially. A beta version of the wavelet tool will be available in a couple of weeks. A length encoder is used as part of the compression. IBM is patenting an arithmetic extension to JPEG. The ELS coder with an internal DCT (Discrete Cosine Transform) is 5-15% better than standard JPEG. It is licensed for digital photography, imagery, flash, or disk arrays.
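The general idea behind wavelet lossy compression can be sketched with the simplest wavelet, the Haar transform: split the signal into coarse averages and detail coefficients, zero the details below a threshold (this is where information is lost, and where the data become more compressible), then reconstruct. Pegasus's actual wavelet and coder were not described; Haar is a stand-in chosen for brevity, and smoother wavelets are normally preferred for images.

```python
from typing import List, Sequence, Tuple


def haar_forward(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the (unnormalized) Haar transform: pairwise averages
    and half-differences. Requires an even-length input."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    evens, odds = signal[0::2], signal[1::2]
    avg = [(a + b) / 2 for a, b in zip(evens, odds)]
    det = [(a - b) / 2 for a, b in zip(evens, odds)]
    return avg, det


def haar_inverse(avg: Sequence[float], det: Sequence[float]) -> List[float]:
    """Exact inverse: each pair is (average + detail, average - detail)."""
    out: List[float] = []
    for m, d in zip(avg, det):
        out.extend([m + d, m - d])
    return out


def lossy_roundtrip(signal: Sequence[float], threshold: float) -> List[float]:
    """Zero small detail coefficients (the lossy step), then reconstruct."""
    avg, det = haar_forward(signal)
    det = [d if abs(d) > threshold else 0.0 for d in det]
    return haar_inverse(avg, det)
```

With no thresholding the round trip is exact, which is the lossless case; raising the threshold discards more high-frequency detail.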

III. JPEG has been implemented as a dynamically linked library. This also provides an option for Huffman encoding, which is not available in Photoshop or other available tools. Transmission acceleration can be achieved with versions incorporating cross-block smoothing. So, when free tools are available, why license? Because high-performance applications require superior compression. However, ELS will be free for HDF (for space savings).


Flashback Imaging Inc. - Hao Le

A browsing tool for on-the-fly visualization of gigabytes of data in a couple of minutes was demonstrated. Examples included Earth science and atmospheric data such as the ozone hole and weather patterns. It was stated that the tool used only a few MB of RAM, while the data occupied more than 10 GB of disk space compressed (20 GB uncompressed). The mouse was used to perform most of the operations, freeing screen space and processing power. This was an application-centric demonstration; volumetric and time series visualizations using data from various sources such as METEOSAT, GMS, and TOMS were shown. A VGA video board was used in an ordinary desktop machine (Pentium @ 200 MHz).


RSI (Research Systems Inc.) - Dr. David M Uhlir

IDL On the Network (ION) is a tool that uses Java and WWW technology to put IDL on the Internet. The ION server handles client communication requests. Using the Java applets requires no programming skills. The applet connects to the ION server, processes IDL commands, and displays information graphically. IDL handles graphics in Java applets, manages communications between the server and the client, and defines drawing areas for graphics component classes (plot, contour, and surface). Java classes provide atomic access to ION. The full power of the web is in the Java client. Embedded IDL logic, connection management, and command processing are key features. Key features of IDL 5.0 are object-oriented graphics, OpenGL support, accelerated 2D and 3D graphics, powerful analysis without coding, and easy database connectivity. URL http://www.rsinc.com


The Panelists:

Candace Carlisle, ESDIS

Karl Cox, Hughes Information Technology Systems

Mike Folk, NCSA

Raj Gejjagaraguppe, Space Applications Corp.

Doug Ilg, Hughes STX

Larry Klein, Space Applications Corp.

Ray Milburn, Space Applications Corp.

Ramachandran Suresh, Hughes STX



DAY 3: Wednesday - Sept. 10, 1997



Simpson Weather Associates, Inc. - Steve Greco

(To Be Summarized)


IBM, Data Explorer - Carl Spongberg

Data Explorer (DX) is an application development environment with 250-300 modules. Module categories include Canvas and DX Link, and module groups include HDF, CDF, netCDF, and HDF-EOS (not yet included). Informix DataBlade and ArcInfo have development agreements for DX. DX supports regular and irregular data models, structured and unstructured data, and missing and invalid data. Images are handled in RGB, TIFF, and GIF formats.

Modules include import, GUI development, analysis, computation and subsetting, display, VRML2, and export. The platform supports a large data model, contrast, comparison, animation, etc. At the high end are SGI Origin and IBM SP2, the latter with 524 nodes and 8 symmetric multiprocessors (SMPs) in each node. The middle ground includes UNIX workstations from DEC, DG, Sun, HP, SGI, and IBM. The low end includes Windows 95/NT, also with SMP support. Supported output options are OpenGL, 3D, and accelerated 3D. DX has been used by the San Diego Supercomputer Center, NCAR, and NASA Langley Research Center.

URL http://www.almaden.ibm.com/DX.


(Feedback from attendees and discussion of next steps)

Workshop Summary

The Workshop provided an opportunity for interaction among the ESDIS Project, the science community, DAAC personnel, and software developers/vendors. It also facilitated technical interchange between EOSDIS programmatic and technical personnel/users and HDF/HDF-EOS tool developers. It presented the current state of available tools and emphasized the need for additional tools.

Resources for Vendors/Software Developers:

Workshop web site http://ulabhp.gsfc.nasa.gov/~workshop/workshop.html

Resources for HDF-EOS are:

Next step

How to continue interchange

Action Items

Future updates can be found at