4.1 Working with GBIF data outside of Pladias

GBIF data are available in the Pladias database, but not everyone needs access to the database. For analyses in particular, it must be possible to process an individual GBIF export and pair it with Pladias taxa using the stored mapping.

A recipe for the OpenRefine tool is available for this purpose:

  1. start OpenRefine; Docker users can simply run (without data persistence) docker run -d -it -p 3333:3333 -e REFINE_MEMORY=8G abesesr/openrefine, others please see https://openrefine.org/docs/manual/installing
  2. get data from GBIF in the Darwin Core Archive format (the “simple” format cannot be used, as the taxa mapping is done on original names, not on names interpreted by the GBIF Backbone Taxonomy), extract the zip file and take the file occurrence.txt
  3. visit http://127.0.0.1:3333/ and upload the file
  4. check that the data were loaded correctly (“tabs (TSV)” must be selected as the column separator and “Attempt to parse cell text into numbers” must be checked), then confirm with “Create project”.
  5. switch to the Undo/Redo tab, press Apply and paste the JSON code below. OpenRefine will then produce two new columns, “pladiasName” and “pladiasID”, at the very beginning of the table.
[
  {
    "op": "core/column-addition-by-fetching-urls",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "baseColumnName": "taxonKey",
    "urlExpression": "grel:\"https://api.pladias.cz/prest/rpc/gbif2pladias?gbif_id=\" + value",
    "onError": "set-to-blank",
    "newColumnName": "pladiasRAW",
    "columnInsertIndex": 192,
    "delay": 50,
    "cacheResponses": true,
    "httpHeadersJson": [
      {
        "name": "authorization",
        "value": ""
      },
      {
        "name": "if-modified-since",
        "value": ""
      },
      {
        "name": "accept-language",
        "value": ""
      },
      {
        "name": "accept-encoding",
        "value": ""
      },
      {
        "name": "user-agent",
        "value": "OpenRefine 3.9.0 [TRUNK]"
      },
      {
        "name": "accept",
        "value": "*/*"
      },
      {
        "name": "accept-charset",
        "value": ""
      }
    ],
    "description": "Create column pladiasRAW at index 192 by fetching URLs based on column taxonKey using expression grel:\"https://api.pladias.cz/prest/rpc/gbif2pladias?gbif_id=\" + value"
  },
  {
    "op": "core/column-move",
    "columnName": "pladiasRAW",
    "index": 0,
    "description": "Move column pladiasRAW to position 0"
  },
  {
    "op": "core/column-addition",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "baseColumnName": "pladiasRAW",
    "expression": "grel:value.parseJson()[0][\"pladias_id\"]",
    "onError": "set-to-blank",
    "newColumnName": "pladiasID",
    "columnInsertIndex": 1,
    "description": "Create column pladiasID at index 1 based on column pladiasRAW using expression grel:value.parseJson()[0][\"pladias_id\"]"
  },
  {
    "op": "core/column-addition",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "baseColumnName": "pladiasRAW",
    "expression": "grel:value.parseJson()[0][\"pladias_name\"]",
    "onError": "set-to-blank",
    "newColumnName": "pladiasName",
    "columnInsertIndex": 1,
    "description": "Create column pladiasName at index 1 based on column pladiasRAW using expression grel:value.parseJson()[0][\"pladias_name\"]"
  },
  {
    "op": "core/column-removal",
    "columnName": "pladiasRAW",
    "description": "Remove column pladiasRAW"
  }
]
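
For readers who prefer a script to OpenRefine, the recipe above can be approximated in Python. This is only a minimal sketch, not part of the Pladias tooling: the function names (fetch_raw, parse_mapping, add_pladias_columns) are hypothetical, and the response shape (a JSON array whose first element carries pladias_id and pladias_name) is assumed from the GREL expressions in the recipe. The 50 ms delay and the per-key cache mirror the "delay" and "cacheResponses" settings.

```python
import json
import time
from urllib.request import urlopen

# Endpoint copied from the "urlExpression" of the OpenRefine recipe.
API_URL = "https://api.pladias.cz/prest/rpc/gbif2pladias?gbif_id="


def fetch_raw(taxon_key):
    """Fetch the raw response text for one GBIF taxonKey (network call)."""
    with urlopen(API_URL + str(taxon_key), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_mapping(raw):
    """Return (pladias_id, pladias_name) as strings from the first array element.

    Assumed response shape: [{"pladias_id": ..., "pladias_name": ...}, ...].
    Anything unparsable yields blanks, like "onError": "set-to-blank".
    """
    try:
        first = json.loads(raw)[0]
        values = (first["pladias_id"], first["pladias_name"])
    except (ValueError, TypeError, IndexError, KeyError):
        return "", ""
    return tuple("" if v is None else str(v) for v in values)


def add_pladias_columns(lines, fetch=fetch_raw, delay=0.05):
    """Yield tab-separated lines with pladiasName and pladiasID prepended.

    `lines` is any iterable of text lines whose first line is the header
    and which contains a taxonKey column (as in occurrence.txt).
    """
    rows = iter(lines)
    header = next(rows).rstrip("\r\n").split("\t")
    key_index = header.index("taxonKey")
    yield "\t".join(["pladiasName", "pladiasID"] + header)

    cache = {}  # one request per distinct taxonKey
    for line in rows:
        cells = line.rstrip("\r\n").split("\t")
        key = cells[key_index] if key_index < len(cells) else ""
        if key not in cache:
            if key:
                try:
                    cache[key] = parse_mapping(fetch(key))
                except OSError:  # network failure: leave both cells blank
                    cache[key] = ("", "")
                time.sleep(delay)
            else:
                cache[key] = ("", "")
        pladias_id, pladias_name = cache[key]
        yield "\t".join([pladias_name, pladias_id] + cells)


# Usage (performs real HTTP requests, so it is left commented out):
# with open("occurrence.txt", encoding="utf-8", newline="") as src, \
#         open("occurrence_pladias.txt", "w", encoding="utf-8", newline="") as dst:
#     for out_line in add_pladias_columns(src):
#         dst.write(out_line + "\n")
```

The fetch function is passed in as a parameter so that the lookup can be replaced by a stub, for example to try the column handling on a few rows without contacting the API.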