Wikidata:Data access

This page is a starting point for you if you are an institution, company or organisation that wants to use data from Wikidata.

Basic important things to know

Wikidata offers a wide range of general data about our universe as well as links to other databases. The data is published under the CC0 "Public domain dedication" license. It can be edited by anyone and is maintained by Wikidata's editor community.

How can I get data out of Wikidata?

There are several ways to access and edit the data from Wikidata. You can access data per item, or the entirety of the data as dumps.

Per-item access to data

Data can be accessed either via dereferenceable URIs following linked data standards, or through the MediaWiki API.

Linked Data interface

Each item or property has a persistent URI that you obtain by appending its ID (such as Q42 or P12) to the Wikidata concept namespace:

http://www.wikidata.org/entity/

For example, the concept URI of Douglas Adams is http://www.wikidata.org/entity/Q42. Note that this URI refers to the real-world person, not Wikidata's description of Douglas Adams. However, it is possible to use the concept URI to access data about Douglas Adams by simply using it as a URL. When you request this URL, it triggers an HTTP redirect that forwards the client to the data URL for Wikidata's data about Douglas Adams: http://www.wikidata.org/wiki/Special:EntityData/Q42. The namespace for Wikidata's data about entities is

http://www.wikidata.org/wiki/Special:EntityData/

Appending an entity's ID to this prefix creates the "abstract" (format-neutral) form of the entity's data URL. When you request a Special:EntityData URL, the special page uses content negotiation to determine the format of Wikidata's output. If you open the URL in a normal web browser, you will see an HTML page of Wikidata's data about the entity, because web browsers prefer HTML over other formats. Linked data clients instead receive the data in a format such as JSON or RDF, depending on the HTTP Accept: header of their request.
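The relationship between the concept URI, the format-neutral data URL and content negotiation can be sketched in a few lines of Python. This is a minimal sketch, not an official client: the helper names (`concept_uri`, `entity_data_url`, `negotiated_request`) and the short-name-to-MIME-type table are hypothetical, and which media types the service actually honours should be checked against the live site.

```python
import urllib.request

# The two prefixes described above.
CONCEPT_NAMESPACE = "http://www.wikidata.org/entity/"
DATA_NAMESPACE = "http://www.wikidata.org/wiki/Special:EntityData/"

# Hypothetical mapping from short names to standard MIME types; verify which
# ones the live service honours before relying on them.
ACCEPT_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "ntriples": "application/n-triples",
}


def concept_uri(entity_id: str) -> str:
    """URI naming the real-world thing itself, e.g. the person Douglas Adams."""
    return CONCEPT_NAMESPACE + entity_id


def entity_data_url(entity_id: str) -> str:
    """Format-neutral URL of Wikidata's data about that thing."""
    return DATA_NAMESPACE + entity_id


def negotiated_request(entity_id: str, fmt: str) -> urllib.request.Request:
    """Build (but do not send) a GET request whose Accept: header picks the format."""
    return urllib.request.Request(
        entity_data_url(entity_id), headers={"Accept": ACCEPT_TYPES[fmt]}
    )


req = negotiated_request("Q42", "turtle")
print(concept_uri("Q42"))        # http://www.wikidata.org/entity/Q42
print(req.full_url)              # http://www.wikidata.org/wiki/Special:EntityData/Q42
print(req.get_header("Accept"))  # text/turtle
```

Passing `req` to `urllib.request.urlopen` would actually send it; urllib follows HTTP redirects by default, so a request for the concept URI built the same way should also end up at the data.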

For cases in which content negotiation is inconvenient (e.g. to view non-HTML content in a web browser), you can also request data about an entity in a specific format by adding an extension suffix to the data URL, such as .json, .rdf, .ttl or .nt. For example, http://www.wikidata.org/wiki/Special:EntityData/Q42.json leads to a JSON export of item Q42. Specific revisions can be obtained by appending a revision query parameter, like so: http://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=112.
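As a sketch, the format-specific URLs above can be built by string concatenation plus a URL-encoded query string. The helper `format_data_url` and the `SUFFIXES` set are hypothetical; the set only lists the suffixes named on this page, and the service may accept others.

```python
from typing import Optional
from urllib.parse import urlencode

DATA_NAMESPACE = "http://www.wikidata.org/wiki/Special:EntityData/"

# Suffixes named on this page; the service may support others.
SUFFIXES = {"json", "rdf", "ttl", "nt"}


def format_data_url(entity_id: str, suffix: str, revision: Optional[int] = None) -> str:
    """Data URL that fixes the output format through a file-name suffix.

    If `revision` is given, the URL requests that revision of the entity
    rather than the current one.
    """
    if suffix not in SUFFIXES:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    url = f"{DATA_NAMESPACE}{entity_id}.{suffix}"
    if revision is not None:
        url += "?" + urlencode({"revision": revision})
    return url


print(format_data_url("Q42", "json"))
print(format_data_url("Q42", "json", revision=112))
```

The two calls reproduce the two example URLs given in the paragraph above.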

MediaWiki API

See the documentation of the API.

Caution: Some API modules, in particular those accessed via action=query, will return raw page content. For entity pages, that raw page content is not guaranteed to use any documented format or follow any standard structure. Raw page content should be treated as an opaque blob. For access to the canonical JSON form of entity pages, use the wbgetentities and wbsearchentities modules.
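A minimal sketch of fetching labels through wbgetentities rather than raw page content. It assumes the standard MediaWiki endpoint https://www.wikidata.org/w/api.php (not named on this page) and a JSON response in which `entities` is keyed by entity ID and each entity carries per-language `labels`; the helper names are hypothetical.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode

# Assumed endpoint: the usual api.php location on wikidata.org.
API = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(ids: Iterable[str], language: str = "en") -> str:
    """URL asking wbgetentities for the labels of several entities at once."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # multiple IDs are separated by "|"
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def labels_from_response(body: str, language: str = "en") -> Dict[str, str]:
    """Map entity ID -> label from a wbgetentities JSON response body."""
    entities = json.loads(body).get("entities", {})
    labels = {}
    for entity_id, entity in entities.items():
        label = entity.get("labels", {}).get(language)
        if label is not None:  # missing entities or labels are skipped
            labels[entity_id] = label["value"]
    return labels


# Usage (needs network access):
#   import urllib.request
#   with urllib.request.urlopen(wbgetentities_url(["Q42", "P12"])) as resp:
#       print(labels_from_response(resp.read().decode("utf-8")))
```

Parsing the canonical JSON returned by this module, instead of page text, keeps the client independent of how entity pages happen to be stored.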

Wikidata Query (WDQ)

See the homepage of this API.

WDQ currently underpins many tools used to explore and interact with Wikidata.

Going forward, however, a properly integrated query service is being developed. You can track its progress in this Phabricator project.

SPARQL endpoints

The following SPARQL endpoints currently serve Wikidata data. All of them are experimental.

  • Wikidata Query Service Beta: BlazeGraph-based endpoint, with examples and a translator from WDQ to SPARQL. See the user manual and local community pages.
  • University of Chile: Virtuoso-based endpoint (3rd-party)
  • Metaphacts: BlazeGraph-based endpoint (3rd-party)
  • OpenLink: Virtuoso-based endpoint; LOD Cloud Cache, which tracks 5-star Linked Open Datasets in general (3rd-party)
  • LDF: Demonstration of Linked-Data-Fragments (client-based query answering) for Wikidata (3rd-party)

The endpoints set up to support RDF/SPARQL work at Wikimedia are also used to gather query logs, so you are particularly welcome to use them to try important queries. Issues should be discussed on the wikidata-l mailing list.

Note that the underlying RDF model used in each store to represent Wikidata contents may differ, since, as of early 2015, this model is still under discussion. Moreover, some endpoints may use only part of the data (e.g., only simplified statements rather than full ones) or be based on dumps that are not the most recent.
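A sketch of querying a SPARQL endpoint with the standard SPARQL 1.1 Protocol (GET with a `query` parameter) and reading the standard SPARQL JSON results format. The endpoint URL https://query.wikidata.org/sparql and the `wd:`/`wdt:` vocabulary are assumptions taken from the Wikidata Query Service; as noted above, other endpoints may model the data differently. The example query asks for items that are an instance of (P31) human (Q5); the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode

# Assumed endpoint URL; the beta service listed above may have used another path.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumes the wd:/wdt: prefixes of the Wikidata Query Service RDF model.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5
"""


def sparql_request(query: str) -> urllib.request.Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    return urllib.request.Request(
        ENDPOINT + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )


def column(results_json: str, var: str) -> List[str]:
    """Values bound to one variable in a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[var]["value"] for row in bindings if var in row]


# Usage (needs network access):
#   with urllib.request.urlopen(sparql_request(QUERY)) as resp:
#       print(column(resp.read().decode("utf-8"), "item"))
```

Because both the protocol and the results format are W3C standards, the same two helpers should work against any of the endpoints above once `ENDPOINT` and the query vocabulary are adjusted.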

Bots

You can also access the API by using a bot. See Wikidata:Bots for more on bots.
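One common ingredient of a well-behaved bot is sketched below: it identifies itself with a descriptive User-Agent and sends the MediaWiki `maxlag` parameter, so the servers can turn it away while they are under load. The endpoint URL, the bot name and the contact details in `USER_AGENT` are hypothetical placeholders; Wikidata:Bots remains the authority on what is expected.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"  # assumed endpoint
# Hypothetical name and contact details; use your own so operators can reach you.
USER_AGENT = "ExampleDataBot/0.1 (https://example.org/bot; bot@example.org)"


def polite_request(params: dict, maxlag: int = 5) -> urllib.request.Request:
    """API GET request that identifies the client and honours server lag."""
    query = dict(params, format="json", maxlag=maxlag)
    return urllib.request.Request(
        API + "?" + urlencode(query), headers={"User-Agent": USER_AGENT}
    )


def is_lagged(body: str) -> bool:
    """True if the API refused the request because replication lag is too high."""
    return json.loads(body).get("error", {}).get("code") == "maxlag"
```

When `is_lagged` returns True, the bot should pause before retrying rather than immediately sending the same request again.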

Access to dumps

See the database dumps documentation.
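Full dumps are large, so they are usually streamed rather than loaded at once. The sketch below assumes the layout of the JSON dumps: one top-level array with "[" and "]" on their own lines and one entity per line, each followed by a comma except the last. That layout and the helper names are assumptions; check the dumps documentation for the format of the file you download.

```python
import gzip
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a JSON dump laid out one entity per line.

    Skips the "[" and "]" framing lines and strips the trailing comma
    that separates entities.
    """
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue
        yield json.loads(line.rstrip(","))


def count_items(path: str) -> int:
    """Stream a gzip-compressed dump without holding it all in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for e in iter_entities(f) if e.get("type") == "item")
```

Working line by line keeps memory use roughly constant regardless of dump size.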

Incremental updates

The recent changes API can be used to see which entities have changed; those entities can then be fetched via per-item access. This way you can update your copy incrementally, as long as you are not 30 or more days behind.
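A sketch of the polling half of this loop, using `list=recentchanges` from the MediaWiki API. It assumes the endpoint https://www.wikidata.org/w/api.php, that items live in namespace 0 and properties in namespace 120 (titled "Property:P31"), and MediaWiki's continuation style, where the `continue` object of one response is merged into the next request. The helper names are hypothetical; the returned IDs would then be fetched through the per-item access described earlier.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"  # assumed endpoint


def recent_changes_url(since: str, cont: Optional[Dict[str, str]] = None) -> str:
    """Changes to items (namespace 0) and properties (namespace 120) since `since`."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": "0|120",
        "rcprop": "title|timestamp",
        "rcstart": since,   # ISO 8601 timestamp, e.g. "2015-03-01T00:00:00Z"
        "rcdir": "newer",   # list forward in time, starting at rcstart
        "rclimit": "500",
        "format": "json",
    }
    if cont:
        params.update(cont)  # continuation values from the previous response
    return API + "?" + urlencode(params)


def changed_entities(body: str) -> List[str]:
    """Distinct entity IDs in one response page, in the order first seen."""
    ids: List[str] = []
    for change in json.loads(body)["query"]["recentchanges"]:
        # Items are titled "Q42"; property pages are titled "Property:P31".
        entity_id = change["title"].split(":")[-1]
        if entity_id not in ids:
            ids.append(entity_id)
    return ids


def next_continue(body: str) -> Optional[Dict[str, str]]:
    """Parameters for fetching the next page, or None on the last page."""
    return json.loads(body).get("continue")
```

Deduplicating the IDs means an entity edited many times since the last sync is refetched only once.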

Best practices to follow

Wikidata offers you its data free of charge under CC0, with no requirement to attribute it. We would, however, greatly appreciate it if you mention Wikidata as the origin of your data. This helps ensure that the project stays around for a long time and keeps providing you with up-to-date, high-quality data. We will also promote the best projects that use Wikidata's data. Some examples of attributing Wikidata: "Powered by Wikidata", "Powered by Wikidata Tags", "Powered by Wikidata data", "Powered by the magic of Wikidata", "Using Wikidata data", "With data from Wikidata", "Data from Wikidata", "Source: Wikidata", "Including data from Wikidata", ...

You may also use the Wikidata logo, but you should not do so in any way that implies endorsement by Wikidata or the Wikimedia Foundation.

Please offer your users a way to report issues in the data, and find a way to feed these reports back to Wikidata's editor community. We are currently working on streamlining this process; until then, please announce on the Project chat where you collect issues.

Examples and showcases

A number of great tools are being built on top of Wikidata. The external tools page collects them.

See also