Adding processors to a dataset

Processors are tools that modify, improve, or enrich the data of a dataset. On the OpenDataSoft platform, processors fall into 4 categories:

  • Processors for geographical mapping
  • Processors for date handling
  • Processors for text transformations
  • Processors for generic operations

To add a processor to a dataset:

  1. In the Processing tab, click on the Add a processor button.
  2. Choose the processor to add to the dataset.
  3. Using the documentation of the chosen processor, fill in the parameters required to configure it.

Note

Once the parameters are configured, you may need to click outside the processor box so that the processor and the changes it triggers are applied to the dataset.

Note

Whatever the processor, always refer to the fields to process by their technical identifiers, never by their labels.

Geographical processors

Geographical processors are divided into 4 categories, according to what you are trying to achieve:

  • Geocoders: to convert a human-readable address into a geo point. There are 10 geocoders.
  • GeoJoin processor: to retrieve geo shapes from normalized codes for country-specific administrative divisions. The GeoJoin processor supports several countries, each of which offers several indexing codes, such as a postcode or a state or region identifier.
  • Retrieve Administrative Divisions processor: to retrieve the name, code and geo shape of the country-specific administrative divisions enclosing a geo point.
  • Converters & Functions: to simplify, convert or normalize geographical data, or to run computations based on it. There are 7 processors in this category.

Geocoders

Name | Description | Availability
Geocode with BAN | Geocode addresses in France by using the Base d’Adresses Nationale (BAN) service | Default
Geocode with Google | Geocode full text addresses by using the Google geocoding API | On demand
Geocode with ArcGIS | Geocode full text addresses by using the ArcGIS geocoding API | Default
Geocode with PDOK | Geocode addresses in the Netherlands by using the PDOK service | On demand
Country code to geo coordinates | Produce a geo coordinate from a country ISO code | Default
INSEE code to geo coordinates | Produce a geo coordinate from a French INSEE code | Default
IP address to geo coordinates | Geocode an IP address | Default
Zip code to geo coordinates | Produce a geo coordinate from a French postal code | Default
what3words | Produce a 3 word address from geographical coordinates | On demand
Geo coordinates from a 3 word address | Convert a 3 word address into geographical coordinates | On demand

The GeoJoin processor

Name | Description | Availability
Geojoin | Retrieve administrative divisions geo shapes for a specified country and referential | Default

The Retrieve administrative divisions processor

Name | Description | Availability
Retrieve administrative divisions | Retrieve administrative divisions information for a geo point | Default

Converters & Functions

Name | Description | Availability
Convert degrees | Convert a degrees, minutes, seconds geo coordinate to WGS84 coordinates | Default
Normalize projection reference | Replace a geo point with its WGS84 representation | Default
WKT and WKB to GeoJSON | Convert a vector geometry object represented in WKT or WKB into a GeoJSON object | On demand
Simplify geo shape | Simplify a geo shape to reduce processing time and dataset size | Default
Geomasking | Protect privacy by approximating a geographical location within a specific radius | Default
Geo distance | Compute the distance between 2 coordinates | Default
Create geo point | Create a geo point field from a latitude field and a longitude field | Default
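
To make the arithmetic behind the Convert degrees and Geo distance processors more concrete, here is a minimal Python sketch. It is an illustration only, not the platform's implementation: the function names, the hemisphere letter argument, the haversine formula and the 6371 km mean Earth radius are all assumptions chosen for the example, and the platform may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are expressed as negative values.
    return -value if hemisphere.upper() in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


print(dms_to_decimal(48, 30, 0, "N"))  # 48.5
print(dms_to_decimal(2, 15, 0, "W"))   # -2.25
```

For example, haversine_km(48.8566, 2.3522, 51.5074, -0.1278) gives the approximate distance in kilometres between two points located in Paris and London.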

Date processors

Name | Description | Availability
Normalize date | Normalize a date format not automatically understood by the platform | Default
Set timezone | Define a timezone for a datetime field | Default
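
As a rough idea of what these two processors do to a value, the following Python sketch parses a date written in a custom format and attaches a UTC offset to a datetime that has none. The function names and parameters are hypothetical, and using a fixed offset instead of a named timezone is a simplification made for the example; the processors' actual parameters are described in their own documentation.

```python
from datetime import datetime, timedelta, timezone


def normalize_date(raw, input_format):
    """Parse a date written in input_format and return it as ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, input_format).date().isoformat()


def set_timezone(naive, utc_offset_hours):
    """Attach a fixed UTC offset to a datetime that carries no timezone information."""
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


print(normalize_date("31/12/2023", "%d/%m/%Y"))                 # 2023-12-31
print(set_timezone(datetime(2023, 1, 1, 12, 0), 1).isoformat())  # 2023-01-01T12:00:00+01:00
```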

Text processors

Name | Description | Availability
Concatenate text | Concatenate 2 fields | Default
Decode HTML entities | Decode HTML entities from a text, to transform them into valid HTML | Default
Extract HTML | Extract HTML from an HTML tag to only keep textual content | Default
Extract text | Extract part of a field value using a regular expression | Default
Extract URLs | Extract URLs from HTML or text contents | Default
Normalize unicode values | Normalize unicode content using the Normalization Form Canonical Composition (NFC) | Default
Normalize URL | Normalize a field value to obtain a valid URL | Default
Replace text | Replace a textual field value with a chosen text | Default
Replace via Regexp | Replace or remove part of a field value using a regular expression | Default
Split text | Split a field value and extract part of it in a new field | Default
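
The regular expression and unicode processors in this table can be approximated with Python's standard library, which may help when testing a pattern before entering it in a processor. This is a sketch under assumptions: the function names and the named group `result` are hypothetical, and the exact regular expression syntax expected by each processor is described in that processor's documentation.

```python
import re
import unicodedata


def normalize_nfc(text):
    """Apply Normalization Form Canonical Composition (NFC) to a string."""
    return unicodedata.normalize("NFC", text)


def extract_text(value, pattern, group="result"):
    """Return the part of value captured by the named group, or None if nothing matches."""
    match = re.search(pattern, value)
    return match.group(group) if match else None


def replace_regexp(value, pattern, replacement=""):
    """Replace every match of pattern; an empty replacement removes the match."""
    return re.sub(pattern, replacement, value)


# "e" followed by a combining acute accent becomes a single precomposed character.
print(len(normalize_nfc("e\u0301")))                                  # 1
print(extract_text("Order #1234 shipped", r"#(?P<result>\d+)"))       # 1234
print(replace_regexp("2021-03-15", r"-", ""))                         # 20210315
```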

Generic processors

Name | Description | Availability
Add a field | Add a new empty field to a dataset | Default
Copy a field | Copy a value from one field to another | Default
Delete record by ID | Remove an existing record, based on its unique ID, from a dataset | Default
Expand from JSON array | Transpose rows containing a JSON array into several rows | Default
Expression | Write complex expression patterns using field values | Default
Extract bit range | Extract an arbitrary bit range from hexadecimal content | On demand
Extract from JSON | Extract values from a field containing a JSON object | Default
File | Retrieve images from URLs | Default
Join dataset | Join 2 datasets together to retrieve a specified field in a dataset | Default
JSON array to multivalued | Extract multiple values from a JSON array and concatenate them into a multivalued field | Default
Skip records | Skip records from a dataset | Default
Transform boolean columns to multivalues fields | Transform true values from boolean fields into a multivalued field | Default
Transpose fields | Transform labels into field values | Default
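
The difference between the JSON and multivalued processors above is easier to see on a sample record. The Python sketch below mimics three of them on plain dictionaries. It is illustrative only: the function names, the field names and the ";" separator are assumptions made for the example, not the platform's API.

```python
import json


def expand_from_json_array(record, field):
    """Return one copy of the record per element of the JSON array stored in field."""
    return [{**record, field: item} for item in json.loads(record[field])]


def json_array_to_multivalued(record, field, separator=";"):
    """Join the elements of the JSON array stored in field into one separated string."""
    values = json.loads(record[field])
    return {**record, field: separator.join(str(v) for v in values)}


def booleans_to_multivalued(record, boolean_fields, output_field, separator=";"):
    """Collect the names of the boolean fields that are true into one separated string."""
    true_names = [name for name in boolean_fields if record.get(name) is True]
    return {**record, output_field: separator.join(true_names)}


row = {"id": 1, "tags": '["a", "b"]'}
print(expand_from_json_array(row, "tags"))
# [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}]
print(json_array_to_multivalued(row, "tags"))
# {'id': 1, 'tags': 'a;b'}
```

Expanding produces several records that each hold one value, whereas the multivalued variants keep a single record whose field holds all the values.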