
Concepts Overview

You'll find below the main concepts used in the remainder of this documentation.

Dataset

A Dataset is a logical data entity. It contains a set of Records. It can be seen as a table in a relational database.

A Dataset also contains a set of metadata that describes it further (for instance, the publication date, the ownership, tags, themes, ...).

Thus, a Dataset is fully defined by the list of Fields of the Records it contains and by its metadata.

Record

A Record is simply a row of values associated with their Fields. It is similar to a row in an Excel spreadsheet.

Domain

A Domain contains users and Datasets and defines a set of services for managing and accessing these objects (for instance, the search API and the exploration console).

A Domain can be public or private. In the latter case, only a select group of users can access it. A specific user can be granted access to one or several Domains.

Connection and Authentication

Access to Domain APIs can be either public or protected depending on the security properties of the Domain. When this access is protected, two solutions can be used to authenticate a user on the Domain:

  • HTTP Basic Authentication using the user login and password.
  • An API key, passing the key as a simple HTTP parameter apikey. To generate an API key, just open your preferences page when connected.
    http://<DOMAIN>/api/datasets/1.0/search/?apikey=<APIKEY>

Both HTTP and HTTPS may be used. When the call is authenticated, it is recommended to use HTTPS to protect both the authentication information and the returned data. Pay attention to potential web browser restrictions as well (for instance, a JSONP call over HTTP made from inside an HTTPS-protected page).
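
As an illustration of the two authentication options above, here is a minimal Python sketch using only the standard library. The domain example.org, the key and the credentials are placeholders, and the helper names are hypothetical; only the apikey parameter and the search path come from this documentation.

```python
import base64
from urllib.parse import urlencode


def api_key_url(domain, api_key, path="/api/datasets/1.0/search/"):
    """Build an HTTPS URL that passes the API key as the `apikey` parameter."""
    return f"https://{domain}{path}?{urlencode({'apikey': api_key})}"


def basic_auth_header(login, password):
    """Build the Authorization header used by HTTP Basic Authentication."""
    raw = f"{login}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values, not real credentials.
url = api_key_url("example.org", "my-secret-key")
headers = basic_auth_header("alice", "s3cret")
```

Both helpers only build strings; sending the request is left to whichever HTTP client you use.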

Datasets APIs

/api/datasets/1.0/search/
Description

This API exposes the Dataset catalog and allows for search based access (full-text search and faceted navigation).

HTTP Methods

This API supports GET and POST methods. GET queries are recommended however.

Parameters
q

The full-text query. This parameter can be left empty, in which case no full-text filtering on the result set occurs. See "Query language (q= parameter)" in the Appendices for detailed documentation.

lang

The language that will be used to process the full-text query, activating linguistic processing features. The value of this parameter shall be an ISO 639-1 code.

facet

Activates faceting on the specified field (see Appendices for the available list of facets on Datasets). This parameter can be used multiple times to simultaneously activate several facets. By default, faceting is disabled.

facet=modified

refine.<FACET>

Facet based filtering. This parameter limits the result set to the results matching a facet value. It can be used several times for the same facet or for different facets.

refine.modified=2012/02
refine.modified=2012/02&refine.publisher=Paris

exclude.<FACET>

Facet based filtering. This parameter excludes the results matching a facet's value from the result set. It can be used several times for the same facet or for different facets.

exclude.modified=2012/02
exclude.modified=2012/02&exclude.publisher=Paris
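
Because facet, refine.<FACET> and exclude.<FACET> can each be repeated, a query string is easier to assemble from a list of pairs than from a dictionary. The following Python sketch shows one way to do it; the function name is hypothetical and the slash is left unescaped only to match the examples above.

```python
from urllib.parse import urlencode


def dataset_search_query(q="", facets=(), refine=(), exclude=()):
    """Assemble a query string for /api/datasets/1.0/search/.

    `refine` and `exclude` are sequences of (facet, value) pairs, so the
    same facet can appear several times.
    """
    params = [("q", q)] if q else []
    params += [("facet", name) for name in facets]
    params += [(f"refine.{facet}", value) for facet, value in refine]
    params += [(f"exclude.{facet}", value) for facet, value in exclude]
    # safe="/" keeps hierarchical facet values such as 2012/02 readable.
    return urlencode(params, safe="/")


query = dataset_search_query(
    facets=["modified", "publisher"],
    refine=[("modified", "2012/02")],
    exclude=[("publisher", "Paris")],
)
```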

sort

Sorts results according to the specified field (see Appendices for the list of available sort fields on Datasets). By default, the sort is descending (from the highest value to the smallest value). A minus sign ('-') may be used to perform an ascending sort.

sort=issued
sort=-issued

rows

Number of results to return in a single call. Use -1 to return all the results. By default, 10 results are returned.

start

Index of the first result to return (starting at 0). To be used in conjunction with "rows" to implement paging.
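
Paging with rows and start can be sketched as follows in Python. The helper name and the zero-based page number are assumptions made for this example.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Return the rows/start parameters for a zero-based page number."""
    if page < 0 or rows < 1:
        raise ValueError("page must be >= 0 and rows must be >= 1")
    return {"rows": rows, "start": page * rows}


# Third page (index 2) with 25 results per page: results 50 to 74.
third_page = urlencode(page_params(2, rows=25))
```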

pretty_print

If set to true (default is false), pretty prints JSON and JSONP outputs.

format

Format of the response output. One of JSON (default), JSONP and CSV.

callback

JSONP callback.

format=jsonp&callback=myFunction

Lookup

/api/dataset/1.0/<DATASETID>/
Description

This API makes it easy to fetch meta information about a given Dataset.

HTTP Methods

This API supports GET and POST methods. GET queries are recommended however.

Parameters
datasetid

Mandatory (in the URL): identifier of the Dataset.

http://<DOMAIN_ID>/api/dataset/1.0/arbresremarquablesparis2011/?...

pretty_print

If set to true (default is false), pretty prints JSON and JSONP outputs.

format

Format of the response output. One of JSON (default) and JSONP.

callback

JSONP callback.

format=jsonp&callback=myFunction

Records APIs

/api/records/1.0/search/
Description

This API makes it possible to perform complex queries on the Records of a Dataset, such as full-text search or geo search. It also provides faceted search features on Dataset Records.

HTTP Methods

This API supports GET and POST methods. GET queries are recommended however.

Parameters
dataset

Mandatory: identifier of the Dataset (datasetid). Several Datasets of the same Domain can be queried simultaneously by repeating this parameter as many times as required.

q

The full-text query. This parameter can be left empty, in which case no full-text filtering on the result set occurs. See "Query language (q= parameter)" in the Appendices for detailed documentation.

lang

The language that will be used to process the full-text query, activating linguistic processing features. The value of this parameter shall be an ISO 639-1 code.

geofilter.distance

Limits the result set to a geographical area defined by a circle (coordinates of the center of the circle expressed in WGS84 and distance expressed in meters): latitude,longitude,distance

geofilter.distance=48.8520930694,2.34738897685,1000

geofilter.polygon

Limits the result set to a geographical area defined by a polygon (coordinates of the points expressed in WGS84): (lat1,lon1),(lat2,lon2),(lat3,lon3)

geofilter.polygon=(48.883086,2.379072),(48.879022,2.379930),(48.883651,2.386968)
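
The two geofilter values are plain comma-separated strings, so they can be produced from coordinates with a few lines of Python. This is a sketch with hypothetical helper names; the three-point minimum for a polygon is an assumption based on the format shown above.

```python
def geofilter_distance(lat, lon, meters):
    """Format a circle filter as latitude,longitude,distance."""
    return f"{lat},{lon},{meters}"


def geofilter_polygon(points):
    """Format a polygon filter as (lat1,lon1),(lat2,lon2),..."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    return ",".join(f"({lat},{lon})" for lat, lon in points)


circle = geofilter_distance(48.8520930694, 2.34738897685, 1000)
polygon = geofilter_polygon(
    [(48.883086, 2.379072), (48.879022, 2.379930), (48.883651, 2.386968)]
)
```

Note that Python drops trailing zeros when printing floats, so 2.379930 is emitted as 2.37993; the coordinate is numerically the same.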

facet

Activate faceting on the specified field. This parameter can be used multiple times to activate simultaneously several facets. By default, faceting is disabled.

facet=city

refine.<FACET>

Facet based filtering. This parameter limits the result set to the results matching a facet's value. It can be used several times for the same facet or for different facets.

refine.city=Paris
refine.city=Paris&refine.year=2013

exclude.<FACET>

Facet based filtering. This parameter excludes from the result set the results matching a facet's value. It can be used several times for the same facet or for different facets.

exclude.city=Paris
exclude.city=Paris&exclude.year=2013

sort

Sort results given the specified field. By default, the sort is descending (from the highest value to the smallest value). To perform an ascending sort, just use '-' as a prefix.

Sorting is only available on numeric fields (integer, double, date and datetime) and only on single Dataset queries.

sort=price
sort=-width

rows

Number of results to return in a single call. By default, 10 results are returned. If you need a large number of results (10 000 or more), you can use the Download service.

start

Index of the first result to return (starting at 0). To be used in conjunction with "rows" to implement paging.

pretty_print

If set to true (default is false), pretty prints JSON, JSONP, GEOJSON and GEOJSONP outputs.

format

Format of the response output. One of JSON (default), JSONP, geoJSON and geoJSONP.

format=json
format=geojsonp

callback

JSONP and GEOJSONP callback.

format=jsonp&callback=myFunction

Download

/api/records/1.0/download/
Description

This API provides Records download streaming features. It accepts the same filtering parameters as the Search API but returns a flow of Records as they are generated by the server.

HTTP Methods

This API supports GET and POST methods. GET queries are recommended however.

Parameters
q

The full-text query. This parameter can be left empty, in which case no full-text filtering on the result set occurs. See "Query language (q= parameter)" in the Appendices for detailed documentation.

lang

The language that will be used to process the full-text query, activating linguistic processing features. The value of this parameter shall be an ISO 639-1 code.

geofilter.distance

Limits the result set to a geographical area defined by a circle (coordinates of the center of the circle expressed in WGS84 and distance expressed in meters): latitude,longitude,distance

geofilter.distance=48.8520930694,2.34738897685,1000

geofilter.polygon

Limits the result set to a geographical area defined by a polygon (coordinates of the points expressed in WGS84): (lat1,lon1),(lat2,lon2),(lat3,lon3)

geofilter.polygon=(48.883086,2.379072),(48.879022,2.379930),(48.883651,2.386968)

refine.<FACET>

Facet based filtering. This parameter limits the result set to the results matching a facet's value. It can be used several times for the same facet or for different facets.

refine.city=Paris
refine.city=Paris&refine.year=2013

exclude.<FACET>

Facet based filtering. This parameter excludes from the result set the results matching a facet's value. It can be used several times for the same facet or for different facets.

exclude.city=Paris
exclude.city=Paris&exclude.year=2013

format

Format of the response output. One of CSV, JSON (default), JSONP, geoJSON and geoJSONP.

callback

JSONP and GEOJSONP callback.

format=jsonp&callback=myFunction

Analyze

/api/records/1.0/analyze/
Description

This API analyzes the data contained in a Dataset, restricted to the rows matching user-defined search criteria, in the same way as the search API.

This API takes series as parameters (see below) and returns the result of these statistical series.

HTTP Methods

This API supports the GET method. The POST method is supported as well, but its use is discouraged for standardization reasons.

The analyze API takes a X parameter, which describes what the data will be aggregated on, and one or more Y parameters, representing what will be graphed.

Supported output formats are JSON (the default), JSONP (which requires an additional JSONP callback parameter when specified) and CSV.

Parameters
dataset

Mandatory: identifier of the source Dataset (datasetid) on which the query will take place.

x

Mandatory: the name of the field on which the data aggregation will be based. It allows for analyzing a subset of data according to the different values of the field.

Example: to get the average height of trees by species

x=tree_species&y.series1.func=AVG&y.series1.expr=height

The behavior changes according to the field type:

  • Date or DateTime: the slices are made on the dates contained in the field. It is possible to refine the desired aggregation with the precision and periodic parameters.

    x=event_date

  • Other types: the slices are made on the field values

    x=tree_species

    [{"x": "plane tree", "series1": 10.7 }, {"x": "oak tree", "series1": 12.3}]

y.<SERIES>.<FUNC>

Mandatory: the aggregation function applied to the series, passed as y.<SERIES>.func=<FUNC>.

<SERIES> is an arbitrary name. It is used to define the related expression (<EXPR>) and to name the output set.

Available functions:

  • Functions that do not require an expression (<EXPR>): COUNT

    This function is based only on the x parameter. If provided, the <EXPR> expression will be ignored.

    x=city&y.countseries.func=COUNT
  • Functions that require an expression (<EXPR>): AVG, SUM, MIN, MAX, STDDEV, SUMSQUARES

    These functions return the result of their execution on the expression provided in y.<SERIES>.expr for each value of x.

    For example: x=tree_species&y.series1.func=AVG&y.series1.expr=height

    Output: [{"x": "plane tree", "series1": 10.7}, {"x": "oak tree", "series1": 12.3}]
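
The pairing between y.<SERIES>.func and y.<SERIES>.expr described above can be sketched in Python as follows. The function name and the dataset identifier "trees" are hypothetical; the parameter names and the list of functions come from this section.

```python
from urllib.parse import urlencode

# Functions listed above as requiring an expression.
NEEDS_EXPR = {"AVG", "SUM", "MIN", "MAX", "STDDEV", "SUMSQUARES"}


def analyze_query(dataset, x, series):
    """Build a query string for /api/records/1.0/analyze/.

    `series` maps a series name to a (func, expr) pair; expr may be None
    for COUNT, which ignores it.
    """
    params = [("dataset", dataset), ("x", x)]
    for name, (func, expr) in series.items():
        func = func.upper()
        if func in NEEDS_EXPR and not expr:
            raise ValueError(f"{func} requires an expression for series {name!r}")
        params.append((f"y.{name}.func", func))
        if expr and func in NEEDS_EXPR:
            params.append((f"y.{name}.expr", expr))
    return urlencode(params)


query = analyze_query(
    "trees",  # hypothetical dataset identifier
    "tree_species",
    {"series1": ("AVG", "height"), "total": ("COUNT", None)},
)
```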

y.<SERIES>.<EXPR>

Mandatory for the functions AVG, SUM, MIN, MAX, STDDEV, SUMSQUARES. The <SERIES> parameter must have the same name as the one used for the function.

The parameter may contain the name of a numeric field in the Dataset (INT or DOUBLE), or a mathematical expression.

For example: x=tree_species&y.series1.func=AVG&y.series1.expr=0.079578 * height * circumference * circumference

Outputs: [{"x": "plane tree", "series1": 0.716}, {"x": "oak tree", "series1": 1.114}]

In this example, circumference and height are names of fields contained in the Dataset.
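
To make the semantics concrete, the following Python sketch reproduces locally what an AVG series over an expression computes: the expression is evaluated for each Record, and the values are averaged per value of x. The sample rows are invented for illustration and are not the data behind the output above.

```python
from collections import defaultdict
from statistics import mean

# Invented sample Records; field names follow the example above.
rows = [
    {"tree_species": "plane tree", "height": 10.0, "circumference": 1.0},
    {"tree_species": "plane tree", "height": 12.0, "circumference": 1.0},
    {"tree_species": "oak tree", "height": 8.0, "circumference": 2.0},
]


def volume(row):
    """The expression from the example: 0.079578 * height * circumference^2."""
    return 0.079578 * row["height"] * row["circumference"] ** 2


def average_by(rows, x, expr, series="series1"):
    """Group rows by field `x` and average `expr(row)` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(expr(row))
    return [{"x": key, series: mean(values)} for key, values in groups.items()]


result = average_by(rows, "tree_species", volume)
```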

What's more, the API provides common mathematical functions that can be used in the expressions.

These functions are: time, sin, cos, tan, asin, acos, atan, toRadians, toDegrees, exp, log, log10, sqrt, cbrt, IEEEremainder, ceil, floor, rint, atan2, pow, round, random, abs, max, min, ulp, signum, sinh, cosh, tanh, hypot

x=tree_species&y.series1.func=MIN&y.series1.expr=sin(height) * 2

y.<SERIES>.cumulative

This parameter accepts values true and false (which is the default). If the parameter is set to true, the results of a series are combined with the previous values.
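
The documentation does not spell out how values are combined. Assuming that "combined" means a running total in the order the points are returned, the effect can be emulated in Python like this; both that interpretation and the sample points are assumptions.

```python
from itertools import accumulate

# Invented sample output of a non-cumulative series.
points = [
    {"x": "2011", "series1": 3},
    {"x": "2012", "series1": 5},
    {"x": "2013", "series1": 2},
]


def make_cumulative(points, series="series1"):
    """Replace each value with the running total of the values so far."""
    totals = accumulate(point[series] for point in points)
    return [{**point, series: total} for point, total in zip(points, totals)]


cumulative = make_cumulative(points)
```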

maxpoints

Limits the maximum number of results returned by the query. By default there is no limit.

periodic

Used only in cases in which X is of type Date or DateTime.

It defines the level at which aggregation is done. Possible values are year, month, week, weekday, day, hour, minute

For example: x=event_date&periodic=weekday&y.series1.func=COUNT

Outputs: [{"x": {"weekday":0},"series1": 12}, {"x": {"weekday":1},"series1": 30}]

Weekday returns a value between 0 and 6, in which 0 corresponds to Monday and 6 to Sunday. In this example, there are 12 events on Monday and 30 events on Tuesday in the Dataset.
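
Python's date.weekday() uses the same numbering (0 for Monday, 6 for Sunday), which makes it easy to reproduce a periodic=weekday count locally. The event dates below are invented for the example.

```python
from collections import Counter
from datetime import date

# Invented event dates: two Mondays and one Tuesday.
events = [date(2013, 2, 11), date(2013, 2, 12), date(2013, 2, 18)]


def count_by_weekday(dates, series="series1"):
    """Count dates per weekday, shaped like the analyze API output."""
    counts = Counter(d.weekday() for d in dates)
    return [{"x": {"weekday": day}, series: counts[day]} for day in sorted(counts)]


weekday_counts = count_by_weekday(events)
```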

precision

Used only in cases in which X is of type Date or DateTime.

It defines the precision with which the aggregation will take place. Possible values are year, month, week, day, hour, minute.

If weekday is provided as periodic parameter, the precision is ignored.

The parameter may not be narrower than the precision defined at the creation of the Dataset.

If this parameter is not provided, the default precision is day.

For example: x=event_date&periodic=year&precision=month&y.series1.func=COUNT

Outputs: [{"x": {"year": 2002, "month":1},"series1": 3}, {"x": {"year": 2002, "month":1},"series1": 5}]

sort

Sorts values according to the specified series, or to the x parameter. By default, the values are sorted in descending order, according to the x parameter. A minus sign ('-') can be prepended to the argument to sort in ascending order.

x=city&y.series1.func=SUM&y.series1.expr=population&sort=-x

x=city&y.series1.func=SUM&y.series1.expr=population&sort=-series1

q

Full-text query. By default, with no query, all results are returned. See "Query language (q= parameter)" in the Appendices for detailed documentation.

lang

The language used to interpret the q parameter, enabling linguistic features on the query. The language is an ISO 639-1 code (two letters, such as "en" for English).

geofilter.distance

Limits the results to a maximal given distance (in meters) from a given WGS84 point: latitude,longitude,distance

geofilter.distance=48.8520930694,2.34738897685,1000

geofilter.polygon

Limits the results to those included in a geographic area, specified as a polygon of WGS84 points: (lat1,lon1),(lat2,lon2),(lat3,lon3)

geofilter.polygon=(48.883086,2.379072),(48.879022,2.379930),(48.883651,2.386968)

refine.<FACET>

Limits the results to those included in the specified path for this facet. It can be used multiple times, for a single or multiple facets.

refine.modified=2012/11

exclude.<FACET>

Excludes results matching the specified path for this facet. It can be used multiple times, for a single or multiple facets.

exclude.modified=2011

format

Response output format: JSON (default), CSV, JSONP

callback

JSONP callback

GeoCluster

/api/records/1.0/geocluster/
Description

This API allows for geographic clustering over geographic points in a Dataset.

HTTP methods

This API supports the GET method. The POST method is supported as well, but its use is discouraged for standardization reasons.

This API takes the cluster precision, a polygon representing the current view (on a map) as parameters and returns a list of clusters with the number of points contained in each cluster and a polygon containing all the points.

The output format is JSON.

Parameters
dataset

Mandatory: Dataset identifier (datasetid) on which the search will take place.

clusterprecision

Mandatory: the desired precision level, depending on the current map zoom level (if used through Leaflet, the Leaflet zoom level can be used).

shapeprecision

Allows for refining the returned polygon shape. The sum of clusterprecision and shapeprecision may not exceed 29.

clustermode

Defines the desired clustering mode. Supported values are polygon (the default), heatmap and world.

Polygon returns a geoshape encapsulating the outline of each cluster.

Heatmap allows aggregating values in a more precise fashion and doesn't return the associated polygon.
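
The constraints on clusterprecision, shapeprecision and clustermode can be checked on the client before the call is made. The Python sketch below is one possible way to do so; the function name and the dataset identifier are hypothetical, while the limit of 29 and the three modes come from this section.

```python
MAX_TOTAL_PRECISION = 29  # clusterprecision + shapeprecision may not exceed this
CLUSTER_MODES = {"polygon", "heatmap", "world"}


def geocluster_params(dataset, clusterprecision, shapeprecision=None,
                      clustermode="polygon"):
    """Validate and return parameters for /api/records/1.0/geocluster/."""
    if clustermode not in CLUSTER_MODES:
        raise ValueError(f"unsupported clustermode: {clustermode!r}")
    if (shapeprecision is not None
            and clusterprecision + shapeprecision > MAX_TOTAL_PRECISION):
        raise ValueError("clusterprecision + shapeprecision may not exceed 29")
    params = {
        "dataset": dataset,
        "clusterprecision": clusterprecision,
        "clustermode": clustermode,
    }
    if shapeprecision is not None:
        params["shapeprecision"] = shapeprecision
    return params


params = geocluster_params("trees", 6, shapeprecision=5)  # hypothetical dataset
```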

Series

This API accepts series as defined in the analyze API.

If defined, the aggregation will take place on the returned clusters.

clusterprecision=6&y.series1.expr=height&y.series1.func=SUM

q

Full-text query. Empty by default, in which case all results are returned. See "Query language (q= parameter)" in the Appendices for detailed documentation.

lang

The language used to interpret the q parameter, enabling linguistic features on the query. The language is an ISO 639-1 code (two letters, such as "en" for English).

geofilter.distance

Limits the results to a maximal given distance (in meters) from a given WGS84 point: latitude,longitude,distance

geofilter.distance=48.8520930694,2.34738897685,1000

geofilter.polygon

Limits the results to those included in a geographic area, specified as a polygon of WGS84 points: (lat1,lon1),(lat2,lon2),(lat3,lon3)

geofilter.polygon=(48.883086,2.379072),(48.879022,2.379930),(48.883651,2.386968)

refine.<FACET>

Limits the results to those included in the specified path for this facet. It can be used multiple times, for a single or multiple facets.

refine.modified=2012/11

exclude.<FACET>

Excludes results matching the specified path for this facet. It can be used multiple times, for a single or multiple facets.

exclude.modified=2011

callback

JSONP callback

Appendices

How to identify a Dataset?

Are you looking for specific data to build your application, but don't yet know which Dataset contains it?

You can simply use the data exploration console by clicking on the "Explore" link in the top page menu. Once you have identified the Dataset you need, just go to this Dataset's "Information" tab where you'll find the Dataset id.

How to use facets?

A facet can be considered as a valued tag associated with a Record. For instance, let's say a Dataset has a facet "City". A Record in this Dataset could have the value "Paris" for the "City" facet.

Facets are especially useful to implement guided navigation in large result sets. In the exploration console, they are displayed on the left.

Identifying facets

By default, in Dataset and Record APIs, faceting is disabled. Faceting can be enabled by using the "facet" API parameter, specifying the name of the facet to retrieve.

In the Dataset APIs, facets are the same for all Datasets and are defined later in these Appendices.

In the Records API, facets are defined at field level. A field facet can be available depending on the data producer choices. Fields (retrieved for instance from the Dataset Lookup API) for which faceting is available can be easily identified:

...
"fields": [
    ...
    {
        "label": "City",
        "type": "text",
        "name": "city",
        "annotations": [
            {
                "name": "facet"
            }
        ]
    },
    ...

When faceting is enabled, facets are returned in the response after the result set.

Every facet has a display value ("name" attribute) and a refine property ("path" attribute) which can be used in refine and exclude parameters.

Facets are hierarchical: for instance, a year facet will contain month facets and a month facet will contain day facets.

Example of a facet tree:

"facet_groups": [
        {
            "name": "modified",
            "facets": [
                {
                    "name": "2012",
                    "path": "2012",
                    "facets": [
                        {
                            "name": "09",
                            "path": "2012/09",
                            "facets": [
                                {
                                    "name": "11",
                                    "path": "2012/09/11"
                                }
                            ...
                        

Every facet contains two additional pieces of information:

  • The "count" attribute contains the number of Records that have the same facet value.
  • The "state" attribute defines whether the facet is currently used in a "refine" or in an "exclude". Possible values are displayed (neither refine nor exclude), refined (refine) and excluded (exclude).

"facet_groups": [
    {
        "name": "modified",
        "count": 45,
        "facets": [
            {
                "name": "2012",
                "path": "2012",
                "count": 24,
                "state": "displayed"
            },
        ...
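
Since every facet carries a "path" usable in refine and exclude, a client can walk the facet tree and derive the matching parameters. The Python sketch below does this on a hand-written response fragment shaped like the examples above; the counts in it are invented.

```python
def iter_facets(facets):
    """Yield every facet of a tree, parents before their children."""
    for facet in facets:
        yield facet
        yield from iter_facets(facet.get("facets", []))


# Hand-written fragment following the structure shown above.
response = {
    "facet_groups": [
        {
            "name": "modified",
            "facets": [
                {
                    "name": "2012", "path": "2012",
                    "count": 24, "state": "displayed",
                    "facets": [
                        {"name": "09", "path": "2012/09",
                         "count": 7, "state": "displayed"},
                    ],
                },
            ],
        },
    ],
}

# One (parameter, value) pair per facet, ready to append to a query.
refine_params = [
    (f"refine.{group['name']}", facet["path"])
    for group in response["facet_groups"]
    for facet in iter_facets(group["facets"])
]
```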
                            

Refining

It is possible to limit the result set by refining on a given facet value.
/api/datasets/1.0/search/?refine.modified=2011

In the returned result set, only the Datasets modified in 2011 will be returned.

As the refinement occurs on the "year" and as the "modified" facet is hierarchical, the sub-level is returned; results are dispatched in the "month" sub value:

facet_groups: [{
    name: "modified",
    count: 20,
    facets: [{
        name: "2011",
        path: "2011",
        count: 20,
        state: "refined",
        facets: [{
            name: "04",
            path: "2011/04",
            count: 7,
            state: "displayed"
                            

Excluding

Using the same principle as above, it is possible to exclude from the result set the Records matching a given value of a given facet.
/api/datasets/1.0/search/?exclude.modified=2011

Only results that have not been modified in 2011 will be returned.

Facets in Datasets API

modified Last modification date
publisher Producer
issued First publication date
accrualperiodicity Publication frequency
language Language
license Licence
granularity Data granularity
dataquality Data quality
theme Theme
keyword Keywords
created Creation date
creator Creator
contributor Contributors

Sorting in Datasets API

modified Last modification date
issued First publication date
created Creation date

Query language (q= parameter)

Description

This query language may be used with APIs that accept the q parameter.

It allows for refining the base query and can be used in conjunction with facets.

Full-text search

Restricts results to those that contain the given words.

If the words are given surrounded by double quotes, the search returns the exact matches only.

q=film returns results that contain film, films, filmography ...

q="film" only returns the ones containing exactly film

Boolean query

This query language supports the following boolean operators: "AND", "OR", "NOT"

Spaces are considered to be "AND" operators.

Parentheses allow for grouping operations.

Examples:

q=film OR trees

q=(film OR trees) AND paris

Queries on fields

The query language supports queries on specific Dataset fields.

The available fields for the Dataset API can be found here.

The available fields for the Records API depend on the related Dataset schema. The field list can be retrieved with the Dataset Lookup API.

For example: q=film_title:lord

Several operators can be used between the field name and the query value:

  • ":", "=", "==": return results whose field exactly matches the given value (granted the fields are of text or numeric type)
  • ">", "<", ">=", "<=": Return results whose field values are larger, smaller, larger or equal, smaller or equal to the given value (granted the field is of date or numeric type).
  • [start_date TO end_date]: Queries Records whose date is between start_date and end_date.

Dates can be specified in different formats: simple (YYYY[[/mm]/dd]) or ISO 8601 (YYYY-mm-DDTHH:MM:SS)

Examples:

q=film_date >= 2002

q=film_date >= 2013/02/11

q=film_date: [1950 TO 2000]

q=film_box_office > 10000 AND film_date < 1965
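
The examples above are shown unencoded for readability. When a query is sent over HTTP, spaces and operators such as > and < need to be percent-encoded; the Python sketch below relies on urllib.parse for this. The domain and the dataset identifier "films" are placeholders.

```python
from urllib.parse import parse_qs, urlencode


def records_search_url(domain, dataset, q):
    """Build a Records search URL with a properly encoded q parameter."""
    query = urlencode({"dataset": dataset, "q": q})
    return f"https://{domain}/api/records/1.0/search/?{query}"


expression = "film_box_office > 10000 AND film_date < 1965"
url = records_search_url("example.org", "films", expression)

# Decoding the query string gives back the original expression.
decoded = parse_qs(url.split("?", 1)[1])["q"][0]
```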

Query language functions

Use of the following functions is permitted in the query language.

These functions must be prefixed with a pound sign (#).

Available functions:

  • now

    This function returns the current date.

    This function may be called as a query value for a field.

    When called without an argument, it will evaluate to the current time.

    q=birthdate >= #now() returns all Records containing a birth date greater than or equal to the current date.

    The now function can also accept parameters:

    years, months, weeks, days, hours, minutes, seconds, microseconds: These parameters add time to the current date

    For example: #now(years=-1, hours=-1) returns the current date minus a year and an hour

    year, month, day, hour, minute, second, microsecond: can also be used to specify an absolute date

    For example: #now(year=2001) returns the current time, day and month for year 2001

    weekday: Specifies a day of the week.

    This parameter accepts either an integer between 0 and 6 (where 0 is Monday and 6 is Sunday) or the first two letters of the day (in English) followed by the cardinal of the first week on which to start the query.

    #now(weeks=-2, weekday=1) returns the Tuesday before last.

    #now(weekday=MO(2)) returns Monday after next.
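
One practical point when calling these functions over HTTP: in a URL, an unencoded # starts the fragment, which clients do not send to the server. The pound sign therefore has to be percent-encoded as %23. The Python sketch below shows the difference; the domain is a placeholder.

```python
from urllib.parse import urlencode, urlsplit

expression = "birthdate >= #now(years=-1)"

# Unencoded: everything after '#' is parsed as the URL fragment.
raw = urlsplit("https://example.org/api/records/1.0/search/?q=" + expression)

# Encoded: '#' becomes %23 and stays inside the q parameter.
encoded_query = urlencode({"q": expression})
```

Here raw.query stops right before the pound sign, while encoded_query carries the full expression.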

Examples