The Unified Code for Units of Measure

Gunther Schadow, Clement J. McDonald

Regenstrief Institute, Inc., Indianapolis, IN

Announcement

We are migrating the maintenance of UCUM to this Trac site; apologies for any transient difficulties.

What is it?

The Unified Code for Units of Measure is a code system intended to include all units of measure currently used in international science, engineering, and business. Its purpose is to facilitate unambiguous electronic communication of quantities together with their units. The focus is on electronic communication, as opposed to communication between humans. Typical applications of The Unified Code for Units of Measure are electronic data interchange (EDI) protocols, but nothing prevents it from being used in other types of machine communication.

How does it relate?

The Unified Code for Units of Measure is inspired by and heavily based on ISO 2955-1983, ANSI X3.50-1986, and HL7's extensions called ISO+. The respective ISO and ANSI standards are both entitled Representation of ... units in systems with limited character sets, where ISO 2955 refers to SI and other units provided by ISO 1000-1981, while ANSI X3.50 extends ISO 2955 to include U.S. customary units. Because these standards carry the restriction to limited character sets in their names, they seem to be of less value today, when graphical user interfaces and laser printers are in widespread use. This is why the European standard ENV 12435 declares ISO 2955 obsolete in its clause 7.3.

ENV 12435 is dedicated exclusively to the communication of measurements between humans in display and print, and does not provide codes that can be used in communication between systems. It does not even provide a specification that would allow communication of units from one system to the screen or printer of another system. The issue with displaying units in the common style defined by the 9th Conférence Générale des Poids et Mesures (CGPM) in 1948 is not just the character set. Although The Unicode Standard, together with its predecessor ISO/IEC 10646, is the richest character set ever, it is still not enough to specify the presentation of units, because there are important typographical details such as superscripts, subscripts, roman and italics (see note 1).

Why is it needed?

The real value of the restriction on the character set and typographical details, however, is not to cope with legacy systems and less powerful technology, but to facilitate unambiguous communication and interpretation of the meaning of units from one computer system to another. In this respect, ISO 2955 and ANSI X3.50 are not obsolete, because there is no other standard that would fill in for inter-systems communication of units. However, ISO 2955 and ANSI X3.50 currently have severe defects:

  1. ISO 2955 and ANSI X3.50 contain numerous name conflicts, both direct conflicts (e.g., "a" being used for both year and are) and conflicts that are generated through combination of unit symbols with prefixes (e.g., "cd" means candela and centi-day, and "PEV" means peta-volt and pico-electronvolt).
  2. Neither ISO 2955 nor ANSI X3.50 covers all units that are currently used in practice. There are many more units in use than what is allowed by the Système International d'Unités (SI) and accompanying standards. For example, the older CGS units dyne and erg are still used in the science of physiology. Although ANSI X3.50 extends ISO 2955 with some U.S. customary units, it is still not complete in this respect. For example, it does not define the degree Fahrenheit.
  3. ANSI X3.50 is semantically ambiguous with respect to customary units, even if we do not consider the history and international aspects of customary units. Three systems of mass units are used in the U.S.: avoirdupois, used generally; apothecaries', used by pharmacists; and troy, used in trade with gold and other precious metals. ANSI X3.50 has no way to select any one of those specifically, which is a problem in medicine, where both apothecaries' and avoirdupois weights are frequently used.
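
The prefix-related conflicts named in the first item can be reproduced with a small sketch. The tables below are toy assumptions chosen only to mirror the "cd" and "PEV" examples above; they are not the actual ISO 2955 or ANSI X3.50 tables, and the function name `parses` is hypothetical.

```python
# Toy prefix and unit tables (assumptions for illustration only,
# not the real ISO 2955 / ANSI X3.50 tables).
PREFIXES = {"": "", "c": "centi", "P": "pico", "PE": "peta"}
UNITS = {"cd": "candela", "d": "day", "V": "volt", "EV": "electronvolt"}


def parses(code):
    """Return every prefix+unit reading of `code`, sorted by name."""
    readings = []
    for prefix, prefix_name in PREFIXES.items():
        rest = code[len(prefix):]
        # A reading exists when the code starts with the prefix and
        # the remainder is itself a known unit symbol.
        if code.startswith(prefix) and rest in UNITS:
            readings.append(prefix_name + UNITS[rest])
    return sorted(readings)


for code in ("cd", "PEV", "V"):
    names = parses(code)
    status = "AMBIGUOUS" if len(names) > 1 else "ok"
    print(code, names, status)
```

A code with more than one reading cannot be interpreted reliably by a receiving system, which is the kind of conflict a unified code has to rule out.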

ISO 2955 and all standards that look only to the resolutions and recommendations of the CGPM and the Comité International des Poids et Mesures (CIPM), as published by the Bureau International des Poids et Mesures (BIPM) and various ISO standards (ISO 1000 and ISO 31), fail to recognize that the needs in practice are often different from the ideal propositions of the CGPM. Although not allowed by the CGPM and related ISO standards, many other units are used in international sciences, healthcare, engineering, and business, some of them meaningful and some of questionable meaning. A coding system that is to be useful in practice must cover the requirements and habits of the practice, even some of the bad habits.

None of the current standards attempt to specify a semantics of units that can be deployed in information systems with moderate requirements. Metrological standards such as those published by the BIPM are dedicated to maximal scientific correctness of reproducible definitions of units. These definitions make sense only to human specialists and can hardly be deployed to their full extent by any information system that is not dedicated to metrology. On the other hand, ISO 2955 and ANSI X3.50 provide no semantics at all for the codes they define.

The Unified Code for Units of Measure intends to provide a single coding system for units that is complete, free of all ambiguities, and that assigns to each defined unit a concise semantics. In communication it is not only important that all communicating parties have the same repertoire of signs, but also that all attach the same meaning to the signals they exchange. The common meaning must be computationally verifiable. The Unified Code for Units of Measure assumes a semantics for units based on dimensional analysis (see note 2).

In short, each unit is defined relative to a system of base units by a numeric factor and a vector of exponents by which the base units contribute to the unit to be defined. Although we can reflect all the meaning of units covered by dimensional analysis with this vector notation, the following tables do not show these vectors. One reason is that the vectors depend on the base system chosen and even on the ordering of the base units. The other reason is that these vectors are hard for human readers to understand, while they can easily be derived computationally. Therefore we define new unit symbols using algebraic terms of other units. Those algebraic terms are also valid codes of The Unified Code for Units of Measure.
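
The factor-plus-exponent-vector idea can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the three-element base system (metre, second, gram) and its ordering, the class name `Unit`, and the helper `scaled` are hypothetical and are not taken from the specification or its reference implementation.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed base system and ordering for this sketch: (metre, second, gram).
BASE = ("m", "s", "g")


@dataclass(frozen=True)
class Unit:
    factor: Fraction  # magnitude relative to the base units
    dim: tuple        # exponent of each base unit, in BASE order

    def __mul__(self, other):
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other):
        return Unit(self.factor / other.factor,
                    tuple(a - b for a, b in zip(self.dim, other.dim)))

    def __pow__(self, n):
        return Unit(self.factor ** n, tuple(a * n for a in self.dim))


def scaled(k, unit):
    """Return `unit` with its magnitude multiplied by k (e.g. a prefix)."""
    return Unit(Fraction(k) * unit.factor, unit.dim)


m = Unit(Fraction(1), (1, 0, 0))
s = Unit(Fraction(1), (0, 1, 0))
g = Unit(Fraction(1), (0, 0, 1))
cm = scaled(Fraction(1, 100), m)
kg = scaled(1000, g)

# New units defined as algebraic terms of other units.
newton = kg * m / s ** 2
dyne = g * cm / s ** 2

# Equal exponent vectors mean the units are commensurable; the ratio
# of the factors is then the conversion factor between them.
print(newton.dim == dyne.dim)
print(newton.factor / dyne.factor)
```

Because the exponent vectors of the two force units agree, a system can verify that they measure the same kind of quantity and convert between them using only the factors.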

What is available?

The Unified Code for Units of Measure is very stable in content and has already been adopted by some standards organizations such as DICOM and HL7, and it has been referenced as best practice by the Open Geospatial Consortium in their Web Map Service (WMS) and Geography Markup Language (GML) implementation specifications. We are still looking for the best way to establish this specification as a widely used industry standard. The official status and the affiliation may change during that process. However, we try to keep as much as possible of the specification freely available and redistributable to assure the maximum use and benefit. We would also like to keep this specification maintainable and flexible to updates. Although the initial version contains more than 250 terminal unit symbols (more than three times as many symbols as in ANSI X3.50), there are areas that are not covered completely yet.

The specification is maintained electronically so that the printed version is guaranteed to contain consistent and tested data that is free from severe name conflicts or random errors. The full specification is now available as an HTML document (whereas it used to be only a PDF file). The new XML format of the specification enabled us to make XML releases of the formal part of the specification, have better sorting and indexing capabilities, etc.

IMPORTANT SPECIAL RELEASE NOTE

Version 1.6, November 2005

Ever since we changed the internal maintenance of The Unified Code for Units of Measure to XML (which happened after release 1.4 in May of 2002), the definition of some units that used exponential notation for the magnitude was incorrect. A systematic text-conversion error had caused the minus sign and the first digit of all exponents to be deleted. This mostly affected natural constants, such as parsec, proton mass, electron charge, Boltzmann's constant, and all units that are defined based on these. Fortunately these units are rather rare in everyday use in trade and medicine. Nevertheless, we must urge everyone to regenerate their tables, dictionaries, and knowledge bases, and to check their data immediately against this corrected release of The Unified Code for Units of Measure.

The units directly affected by this error were: unified atomic mass unit (u), parsec (pc), Planck's constant ([h]), Boltzmann's constant ([k]), electric permittivity ([eps_0]), elementary charge ([e]), electron mass ([m_e]), proton mass ([m_p]), Newton's constant of gravitation ([G]), Maxwell (Mx), Gauss (G), phot (ph), Curie (Ci), Roentgen (R), and U.S. and international mil ([mil_i], [mil_us]).

Because these units were used in some (though not many) definitions of other units, the indirectly affected units are: electron volt (eV), and circular mil ([cml_i]).

More

There is an open-source reference implementation, instantly usable as a Java applet, that is configured at runtime over the Internet with the latest release of The Unified Code for Units of Measure.

Comments

All comments are welcome and are usually answered within a few business days. We invite commenters to submit a tracked ticket (http://aurora.regenstrief.org/ucum/newticket). From now on we will also post all email exchanges with responses in order to maintain accountability for any changes and community input. See the CommentsArchive for these.


1) Interestingly the authors of ENV 12435 forgot to include superscripts in the minimum requirements as given by subclause 7.1.4 for which they do not specify an alternative.

2) A more extensive introduction to this semantics of units can be found in: Schadow G, McDonald CJ et al: Units of Measure in Clinical Information Systems. JAMIA. 6(2); Mar/Apr 1999; p. 151-162. Available from: URL: http://www.jamia.org/cgi/reprint/6/2/151