
Version 24, changed by skinner 01/11/2007.

Dojo Data

What is dojo.data?

The dojo.data module provides datastore objects which a JavaScript program can use to access a variety of different data sources. A data source could be a simple data file, a web service provided by a site like del.icio.us or Yahoo, or a database like a relational database or an XML database.

The goal of dojo.data is to have a standard set of data-access APIs, and a large set of datastore implementations that conform to those APIs.

Simple example

Here's a simple example from the dojo.data unit tests. This test page creates a dojo.data.CsvStore object, uses the CsvStore to read the contents of a simple .csv format file, and displays the results in the Dojo FilteringTable widget.

Here's the movies.csv file that the CsvStore reads from, and the test page that displays the movies in a FilteringTable:
movies.csv ==> CsvStore ==> movies.html
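To make the CSV-to-items step concrete, here is a minimal JavaScript sketch of a read-only store that turns the rows of a .csv file into items. It is not dojo.data.CsvStore: the class name `MiniCsvStore` and its methods `getItems`, `getAttributes`, and `get` are hypothetical, the file contents are passed in as a string rather than loaded over the network, and the parser assumes a header row and no quoted commas. The sample row reuses the Alien item described under Terminology.

```javascript
// Hypothetical sketch of a read-only, CSV-backed datastore (NOT dojo.data.CsvStore).
// Assumptions: the first line is a header row, and fields never contain quoted commas.
class MiniCsvStore {
  constructor(csvText) {
    const lines = csvText.trim().split(/\r?\n/);
    // The header row supplies the attribute names.
    this.attributes = lines[0].split(",").map((s) => s.trim());
    // Each remaining row becomes one item, kept internally as an array of strings.
    this.items = lines.slice(1).map((line) => line.split(",").map((s) => s.trim()));
  }

  // Return every item the store knows about.
  getItems() {
    return this.items.slice();
  }

  // Every row shares the attributes named in the header row.
  getAttributes(item) {
    return this.attributes.slice();
  }

  // Look up one attribute value of an item (undefined if the attribute is unknown).
  get(item, attribute) {
    const column = this.attributes.indexOf(attribute);
    return column === -1 ? undefined : item[column];
  }
}

const store = new MiniCsvStore("Title,Year,Producer\nAlien,1979,Ridley Scott");
for (const item of store.getItems()) {
  console.log(store.get(item, "Title"), store.get(item, "Year")); // Alien 1979
}
```

A display widget such as FilteringTable would call something like `get` once per cell. Every value comes back as a string, because this sketch does no type conversion.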

What sorts of datastores are available?

As of January 2007, we have five simple datastores, which are included in Dojo as example datastore implementations. The five datastores are:

  • dojo.data.CsvStore -- a read-only store that reads tabular data from .csv format files
  • dojo.data.OpmlStore -- a read-only store that reads hierarchical data from .opml format files
  • dojo.data.YahooStore -- a read-only store that fetches search results from the Yahoo search engine web service
  • dojo.data.DeliciousStore -- a read-only store that fetches bookmarks from the del.icio.us web service
  • dojo.data.RdfStore -- a read-write store that uses SPARQL to talk to RDF data servers, including, for example, the Rhizome RDF application server


We hope to have more datastore implementations bundled with Dojo in future releases. We also hope that people who use Dojo will be able to easily create new custom datastore implementations to talk to their own custom data sources.

The dojo.data APIs

All of the dojo datastore implementations conform to a standard set of data-access APIs. Even though dojo.data.CsvStore and dojo.data.YahooStore read data from very different sources, CsvStore and YahooStore offer the exact same set of methods for accessing that data.

The basic operations in the dojo.data APIs are fairly simple -- create an item, delete an item, set the value of an attribute, get an attribute value, etc. We've split those operations into groups: first the simple read-only access methods, then the simple write-access methods, and then additional methods for less common features like attribution, versioning, update notification, schema inspection, etc.

As of Dojo release 0.4.1, we have only defined a few basic data-access APIs:
  • dojo.data.core.Read -- 10 basic read-only datastore access methods
  • dojo.data.core.Write -- 9 basic methods for creating data items and updating attribute values
  • dojo.data.core.Identity -- 2 basic methods for datastores that can uniquely identify items
In future Dojo releases we'll be adding more API definitions, probably including:
  • dojo.data.core.Notification -- update notifications
  • dojo.data.core.Schema or Model
  • dojo.data.core.Attribution -- create/modify timestamps, author, etc.
  • dojo.data.core.Versioning -- for access to old versions of items
  • dojo.data.core.Derivation -- derived attributes and calculated values
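One way to picture the API groups is as sets of methods that a store either provides or doesn't. The JavaScript sketch below uses a small, hypothetical subset of method names for Read, Write, and Identity (the real dojo.data.core files define more methods, and their exact names and signatures may differ), plus an in-memory `MemoryStore` class, also hypothetical, that implements all three groups. A read-only store like CsvStore would pass only the Read check.

```javascript
// Hypothetical sketch: each API group is modeled as a list of method names.
// These names are illustrative assumptions, not the full dojo.data.core method lists.
const READ_API = ["get", "getAttributes", "isItem"];
const WRITE_API = ["newItem", "deleteItem", "set"];
const IDENTITY_API = ["getIdentity", "findByIdentity"];

// A store "conforms" to a group if it provides every method in that group.
function implementsApi(store, methodNames) {
  return methodNames.every((name) => typeof store[name] === "function");
}

// A tiny in-memory store that implements all three groups.
class MemoryStore {
  constructor() {
    this.nextId = 1;
    this.itemsById = new Map();
  }
  // Read
  isItem(thing) {
    return thing !== null && typeof thing === "object" && this.itemsById.get(thing.id) === thing;
  }
  get(item, attribute) { return item.values[attribute]; }
  getAttributes(item) { return Object.keys(item.values); }
  // Write
  newItem(values) {
    const item = { id: this.nextId++, values: { ...values } };
    this.itemsById.set(item.id, item);
    return item;
  }
  deleteItem(item) { return this.itemsById.delete(item.id); }
  set(item, attribute, value) { item.values[attribute] = value; }
  // Identity
  getIdentity(item) { return item.id; }
  findByIdentity(id) { return this.itemsById.get(id) || null; }
}

const store = new MemoryStore();
const kermit = store.newItem({ name: "Kermit", color: "green" });
console.log(implementsApi(store, READ_API), implementsApi(store, WRITE_API), implementsApi(store, IDENTITY_API)); // true true true
console.log(store.get(store.findByIdentity(store.getIdentity(kermit)), "color")); // green
```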

Status as of December 2006

The initial dojo.data APIs and example datastore implementations will ship in the dojo 0.4.1 release in December 2006. The APIs aren't 100% stable yet, and we don't yet have good documentation, but they should be a foundation for the dojo.data work to come. The entire dojo.data module is marked as experimental, and the APIs are subject to change without notice.

Terminology

  • data source -- the place that the raw data comes from. In the movies.csv example above, the movies.csv file is the data source. The data source could be a file, a database server, a web service, or something else completely.
  • datastore -- a JavaScript object that reads data from a data source and makes that data available as data items via the dojo.data APIs
  • dojo.data APIs -- the standard set of methods that a datastore implements. The dojo.data module includes a set of APIs (Read, Write, etc.), and a datastore may implement one or more of the APIs.
  • internal data representation -- the private data structures that a datastore uses to cache data in local memory (XML DOM nodes, anonymous JSON objects, arrays of arrays, etc.)
  • item -- a data item that has attributes with attribute values. In the movies.csv example above, Alien is an item with three attributes, Title:'Alien', Year:1979, and Producer:'Ridley Scott'.
  • attribute -- one of the fields or properties of an item. In the movies.csv example, Producer is an attribute.
  • value -- the contents of an attribute for a given item. In the movies.csv example, 'Ridley Scott' is a string value.
  • reference -- a value in an item that points to another item
  • identity -- some sort of identifier that can be used to uniquely identify an item within the context of a single datastore.
  • query -- some sort of specification or request that asks a datastore for some subset of the items it knows about. A query will often be a string, but in some datastores it might be a number or a date or a complex object structure.
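The following JavaScript sketch maps these terms onto plain objects. The items, the `Prequel` reference attribute, the string identities, and the attribute/value-pair query format are all illustrative assumptions, not part of any dojo.data datastore; real datastores define their own query formats.

```javascript
// Hypothetical sketch: the terms above in plain JavaScript (not the dojo.data API).
const alien = { Title: "Alien", Year: 1979 };                   // an item with two attributes
const aliens = { Title: "Aliens", Year: 1986, Prequel: alien }; // Prequel holds a reference value

const store = {
  // identity: a key that is unique within this one store
  byIdentity: new Map([["alien", alien], ["aliens", aliens]]),
  findByIdentity(id) { return this.byIdentity.get(id) || null; },
  // item + attribute -> value
  get(item, attribute) { return item[attribute]; },
  // query: here, an object of attribute/value pairs that matching items must all have
  find(query) {
    return [...this.byIdentity.values()].filter((item) =>
      Object.entries(query).every(([attr, value]) => item[attr] === value));
  },
};

const found = store.find({ Year: 1986 });       // the "aliens" item
const prequel = store.get(found[0], "Prequel"); // follows a reference to another item
console.log(store.get(prequel, "Title"));       // Alien
```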

Source code organization

The dojo.data API definition files, like Read.js and Write.js, are located in the /src/data/core directory. The core directory also contains any generally re-usable dojo.data code, including classes that are designed to serve as abstract superclasses for datastore implementations, or files that are designed to be used as mixins for adding functionality to datastore implementations.

The datastore implementations that ship with dojo are located in the /src/data directory itself.

API documentation

The API definition files include in-line comments documenting each method in the API. Those comments should eventually be available in the online Dojo API Reference tool, but that tool doesn't yet handle them correctly, so for now the method signatures and comments don't show up there. Once that's fixed, the API Reference documentation will be here:
http://dojotoolkit.org/api/#dojo.data

You can also read the in-line comments in the API definition files themselves, such as Read.js and Write.js in the /src/data/core directory, to see the documentation for individual API methods.

Examples

The dojo.data unit tests include an example page that lets you read data from any of several different data sources and display it in a few different widgets.

There are also a number of dojo.data unit-test pages, which may serve as simple examples of how to use the APIs.


Dependency diagram

One of the goals for the dojo.data module was to decouple widget code from data provider code. Because we now have standard dojo.data APIs, a widget author can write a widget binding that gets data using the dojo.data APIs and displays that data in the widget.

The widget code itself can be independent of the dojo.data APIs and know nothing about how data access is done. The widget binding code depends on both the dojo.data APIs and the widget itself, but the dojo.data APIs are independent of both the widget and the widget binding. The widget and the binding are both independent of any particular datastore implementation, which allows a single widget binding to be used with a variety of datastore implementations.

Once one person has written one binding to display dojo.data items in a FilteringTable, then data items from any datastore can be displayed in a FilteringTable. If Dojo someday has 15 datastore implementations, and has 20 data display widgets, then dojo authors will only need to write 20 bindings in order to connect all the widgets to all the datastores. Without a standard dojo.data API, we would need a different bit of intermediary code for each possible connection between a widget and a datastore -- with 15 datastores and 20 widgets, that would require 300 different pieces of intermediary data translation code.
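Here is a hypothetical JavaScript sketch of that layering. `renderTable` stands in for a widget and knows nothing about data access; `bindTable` is a binding that relies only on two assumed store methods, `items()` and `get(item, attribute)`; and the two stores keep their data in different internal formats. None of these names are part of Dojo.

```javascript
// Hypothetical sketch of widget / binding / datastore layering (names are not Dojo APIs).

// The "widget": knows nothing about datastores, it just renders rows of strings.
function renderTable(headers, rows) {
  return [headers.join(" | "), ...rows.map((row) => row.join(" | "))].join("\n");
}

// The "binding": depends on the shared store interface and on the widget,
// but not on any particular store implementation.
function bindTable(store, attributes, widgetRender) {
  const rows = store.items().map((item) =>
    attributes.map((attr) => String(store.get(item, attr))));
  return widgetRender(attributes, rows);
}

// Two stores with different internal representations but the same interface.
const arrayStore = {
  columns: ["Title", "Year"],
  data: [["Alien", 1979]],
  items() { return this.data; },
  get(item, attribute) { return item[this.columns.indexOf(attribute)]; },
};
const objectStore = {
  data: [{ Title: "Alien", Year: 1979 }],
  items() { return this.data; },
  get(item, attribute) { return item[attribute]; },
};

console.log(bindTable(arrayStore, ["Title", "Year"], renderTable));
console.log(bindTable(objectStore, ["Title", "Year"], renderTable));
```

Because `bindTable` only talks to the shared interface, adding a third store requires no new binding code, which is why each widget needs just one binding rather than one adapter per widget/store pair.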


                                 +-----------+
      Widgets         Bindings   |           |
                                 | dojo.data |     Datastores     Data Sources
                                 |           |
                                 |   APIs    |
+------------------+             |           |
|      Trees       |             |           |
|     (TreeV3)     |-- binding --|           |
+------------------+             |           |   +------------+   +-----------+
                                 |           |---|  CsvStore  |---| .csv file |
+------------------+             |           |   +------------+   +-----------+
|  Tables & Grids  |             |           |
| (FilteringTable) |-- binding --|           |   +------------+   +-------------------+
+------------------+             |           |---| YahooStore |---| Yahoo web service |
                                 |           |   +------------+   +-------------------+
+------------------+             |           |
| Charts & Graphs  |             |           |   +------------+   +------------+
|     (Chart)      |-- binding --|           |---| OpmlStore  |---| .opml file |
+------------------+             |           |   +------------+   +------------+
                                 |           |
+------------------+             |           |   +------------+   +------------+
|  Other widgets   |             |           |---|  RdfStore  |---| RDF server |
|    (ComboBox)    |-- binding --|           |   +------------+   +------------+
|   (SlideShow)    |-- binding --|           |
|      (etc.)      |-- binding --|           |   +------------+   +--------------------+
+------------------+             |           |---|   Other    |---| other file formats |
                                 +-----------+   |   stores   |   |  and web services  |
                                                 +------------+   +--------------------+

Goals

We are designing the data-access APIs with a wide variety of use cases in mind, and we hope that the APIs will work well with many different kinds of data: RDF, XML, SQL, CSV, OPML, etc. Here are some of the features we've been designing the APIs to support:

  • sync vs. async -- Some data sources can be read synchronously and will always have data available immediately (for example, if the data is coming from an HTML table on the web page itself). Some datastores may provide data asynchronously (for example, as the result of a search submitted to Yahoo). Some datastores may provide results incrementally -- the first 100 items in one batch, then a pause, then another 100 items, etc.
  • read vs. write -- Some datastores may be read-only, and some may be read-write.
  • hierarchical vs. tabular -- Some sources have hierarchical data and some sources have tabular data.
  • strongly typed vs. loosely typed -- Some datastores may provide "strongly typed" data and some sources may have free-form data.
  • literals vs. references -- Some datastores may only support literal values (for example, a CsvStore), while other datastores support both literals and item references.
  • single-valued vs. multi-valued attributes -- Some datastores may store only single values for an attribute {name:"robert"}, and some datastores may allow multiple values for each attribute {name:["robert","bob"]}.
  • simple identifiers vs. complex identifiers -- Many datastores will use simple serial numbers or key strings as unique identifiers, but relational databases might require compound keys, other datastores may require XPath expressions, and some may not support the notion of identifiers at all.
  • derived attributes -- Some items may have derived attributes as well as stored attribute values (price = base-price + tax).
  • lazy reference building -- Simple data sources will have only literal attribute values, but some data sources will have items with reference values that point to other data items. When the data is read from some serialized format (read from a file or retrieved from a server), all the references will be in the form of foreign keys or unique ids. At some point the datastore needs to notice the id-reference and replace it with a pointer to the referenced object. The datastore could do that reference-building step in bulk for all the items immediately after the data is loaded, but the API is designed so as not to force the datastore to be implemented that way.
  • lazy loading: reference faulting -- For a datastore that connects to a large database, it won't be possible to load the whole data set into memory at once. For example, imagine a genealogy data set, with data records for hundreds of thousands of people, all interconnected. You might want to load only a few dozen records at first, and then incrementally load more records as the user navigates through the graph. When the user clicks on something in the UI to see who Mark's mother is, the UI code calls get(mark, mother) on the datastore API. The datastore might be able to return a result based on what's already loaded in memory, or it might need to send a request to the server to get more info.
  • lazy loading: partial objects -- For some large data sets, you may want to start by loading only part of each data item. For example, in an e-mail client, you might have a list view that shows just the subject line for a bunch of messages; when you select one of the messages, the body of the message appears in the detail pane below. You don't want to load all the message bodies in bulk; you want to get them one at a time, as needed.
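Lazy reference building and reference faulting can be sketched together in JavaScript. In this hypothetical `LazyRefStore`, raw records hold references as string ids, and `get` replaces an id with a pointer to the referenced item the first time that attribute is read, calling a loader function only for records that aren't in memory yet. The loader is synchronous here to keep the sketch short; a datastore backed by a real server would most likely load asynchronously. The genealogy records and all names are made up for illustration.

```javascript
// Hypothetical sketch of lazy reference building plus reference faulting.
// Assumption: an unresolved reference is always a string id.
class LazyRefStore {
  constructor(referenceAttributes, loadRecord) {
    this.referenceAttributes = new Set(referenceAttributes); // attributes that hold ids
    this.loadRecord = loadRecord; // id -> raw record (stands in for a server request)
    this.cache = new Map();       // id -> item already in memory
    this.loadCount = 0;           // how many records have been "fetched"
  }

  fetchById(id) {
    if (!this.cache.has(id)) {
      this.loadCount++;
      // Copy the raw record so resolving references doesn't mutate the source data.
      this.cache.set(id, { ...this.loadRecord(id) });
    }
    return this.cache.get(id);
  }

  get(item, attribute) {
    const value = item[attribute];
    if (this.referenceAttributes.has(attribute) && typeof value === "string") {
      const target = this.fetchById(value); // fault in the referenced record if needed
      item[attribute] = target;             // replace the id with a pointer, once
      return target;
    }
    return value; // literal value, already-resolved reference, or null
  }
}

// Made-up genealogy records keyed by id.
const people = {
  mark: { name: "Mark", mother: "ann" },
  ann: { name: "Ann", mother: "ruth" },
  ruth: { name: "Ruth", mother: null },
};
const store = new LazyRefStore(["mother"], (id) => people[id]);
const mark = store.fetchById("mark");                    // loads 1 record
const mother = store.get(mark, "mother");                // faults in "ann"
console.log(store.get(mother, "name"), store.loadCount); // Ann 2
store.get(mark, "mother");                               // already resolved, no new load
console.log(store.loadCount);                            // 2
```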

Widget bindings

We hope that the dojo.data APIs will support:

  • Dojo widget bindings -- Dojo includes a number of data display widgets (Trees, Charts, Data Tables, etc.), and we want to be able to "bind" any of those widgets to any datastore instance that conforms to the data-access APIs.
  • third party widget bindings -- We also want the authors of proprietary third-party widgets to be able to bind those widgets to any datastore implementation.
  • read/write bindings, with update notifications -- The bound widget updates when the data model updates, and the data model is updated when the user makes a change via the widget.
  • easy creation of bindings -- support for creating bindings programmatically in JavaScript code or declaratively in HTML
  • different granularities for bindings -- simple attribute bindings that bind one attribute of one data item to one property of one UI control (for example, an input field), as well as full data-set bindings that bind an entire set of data items to a widget like a chart or a grid
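A simple attribute binding with update notifications might look like the following JavaScript sketch. The `subscribe` hook on the hypothetical `NotifyingStore` is an assumption, since dojo.data does not yet define a Notification API, and a plain object with a `value` property stands in for an input field.

```javascript
// Hypothetical sketch of a two-way attribute binding.
// The subscribe/notify mechanism is an assumption, not a dojo.data API.
class NotifyingStore {
  constructor() { this.listeners = []; }
  get(item, attribute) { return item[attribute]; }
  set(item, attribute, value) {
    const oldValue = item[attribute];
    if (oldValue === value) return; // no change, no notification
    item[attribute] = value;
    for (const listener of this.listeners) listener(item, attribute, oldValue, value);
  }
  subscribe(listener) { this.listeners.push(listener); }
}

// Bind one attribute of one item to the `value` property of a field.
function bindAttribute(store, item, attribute, field) {
  field.value = store.get(item, attribute); // initial display
  store.subscribe((changedItem, changedAttr, oldValue, newValue) => {
    // model -> widget: only react to changes of the bound attribute of the bound item
    if (changedItem === item && changedAttr === attribute) field.value = newValue;
  });
  // widget -> model: a user edit writes back through the store
  field.onUserEdit = (newValue) => store.set(item, attribute, newValue);
}

const store = new NotifyingStore();
const kermit = { name: "Kermit", color: "green" };
const colorField = { value: null, onUserEdit: null };
bindAttribute(store, kermit, "color", colorField);

store.set(kermit, "color", "blue"); // a model change shows up in the field
console.log(colorField.value);      // blue
colorField.onUserEdit("yellow");    // a user edit updates the model
console.log(kermit.color);          // yellow
```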

Data sources

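We hope to eventually have datastore implementations that read data from a wide variety of data sources. While designing the APIs, we've had many kinds of data sources in mind, including data files (such as .csv and .opml files), web services (such as del.icio.us and Yahoo), and databases (such as relational databases and XML databases).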

Datastore data representations

We hope that the dojo.data APIs do not impose unnecessary limitations on the ways that datastore implementations can represent data in memory. In particular, we've designed the APIs with a few different data representations in mind:

  • simple anonymous JSON objects: {name:"sales", headcount:38}
  • JavaScript objects that are instances of data classes: Dept.js, Employee.js...
  • XML DOM nodes
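To illustrate how the APIs leave the in-memory representation up to each datastore, this JavaScript sketch shows two hypothetical stores, one holding anonymous JSON objects and one holding instances of a `Dept` data class, that both expose the same `get(item, attribute)` accessor. XML DOM nodes are left out only because plain JavaScript outside a browser has no DOM; a DOM-backed store would read attributes or child nodes inside its own `get`.

```javascript
// Hypothetical sketch: different internal representations behind one accessor.

// 1. Anonymous JSON objects.
const jsonStore = {
  items: [{ name: "sales", headcount: 38 }],
  get(item, attribute) { return item[attribute]; },
};

// 2. Instances of a data class (a stand-in for something like Dept.js).
class Dept {
  constructor(name, headcount) {
    this._fields = { name: name, headcount: headcount };
  }
  field(key) { return this._fields[key]; }
}
const classStore = {
  items: [new Dept("sales", 38)],
  get(item, attribute) { return item.field(attribute); },
};

// The calling code is identical for both representations.
for (const store of [jsonStore, classStore]) {
  const dept = store.items[0];
  console.log(store.get(dept, "name"), store.get(dept, "headcount")); // sales 38
}
```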

Design notes

Question: Why aren't the dojo.data APIs more object-oriented, with accessor methods available on the individual data items?

For example, using the existing dojo.data APIs, a line of code might look like this:

var value = datastore.get(kermitItem, "color");

Why weren't the APIs designed so that the line of code instead looked like this:

var value = kermitItem.get("color");

Answer: Performance -- both memory use and execution speed.

If we had used an object-oriented API for the item accessor methods, every datastore implementation would have needed a separate JavaScript object for every data item. By putting the accessor methods on the datastore object rather than on the data items, we made it possible for different datastore implementations to use different data structures.

For example, our OpmlStore reads data from an XML text file using XMLHttpRequest via a call to dojo.io.bind. The XML text file is automatically parsed into XML DOM nodes, and the OpmlStore uses those XML DOM nodes themselves as its native data item representation, which avoids the unnecessary step of copying all the data from the DOM nodes into JavaScript objects.

For more background on different approaches we considered, see the dojo.data design page on the dojo development wiki.

History

The dojo.data module was first envisioned back in 2005 or earlier, before Dojo release 0.1 had shipped.

  • The first experimental dojo.data code was written in January 2006, but it was geared only toward semi-structured data, and it had a single data model implementation rather than defining an API which could be implemented differently by different datastore authors.
  • In July 2006 IBM contributed a significant body of experimental data model code, including both code geared toward data from XML data sources and code geared toward data from structured data sources like relational databases.
  • Over the course of the next few months, August to October, Dojo held a series of dojo.data design meetings to try to design a unified set of APIs that would work well for a wide variety of use cases. In November 2006, for the Dojo 0.4.1 release, we checked in the initial dojo.data Read and Write APIs, as well as a handful of example datastore implementations, and a handful of test pages. The old design work is archived on the dojo wiki, on the dojo.data page.

Future plans

In 2007 we hope to:
  • extend the dojo.data APIs to handle features like attribution, versioning, and update notification
  • write more datastore implementations
  • write bindings for a number of dojo data display widgets
  • write more unit tests
  • improve the documentation

How can people volunteer to help?

If you'd like to help with the dojo.data module, here are some areas where we could especially use some help:
  • Dojo Book entry: add to this page, or improve what's already here
  • API method documentation: improve the documentation in the API definition files
  • unit tests: write more unit tests to help ensure that different datastore implementations are truly interoperable
  • datastore implementations: write new datastore classes that get data from different data sources (different web services, file formats, databases, application servers, server frameworks, etc.)
