Adding processors to a dataset

Processors are tools that modify, improve, or enrich the data of a dataset. In the Opendatasoft platform, processors are classified into 4 categories:

  • Processors for geographical mapping

  • Processors for date handling

  • Processors for text transformations

  • Processors for generic operations

To add a processor to a dataset:

  1. In the Processing tab, click on the Add a processor button.

  2. Choose the processor to add to the dataset.

  3. Following the documentation of the chosen processor, fill in the parameters required to configure it.

  4. (optional) Click on the edit icon to rename the processor. This step is especially useful when many processors are applied to a dataset, including several of the same type (for example, renaming each Expression processor applied to a dataset makes it easier to tell which one contains which expression).

Note

You may need to click outside the processor box once the parameters are configured to make sure the processor and the changes it triggers are taken into account and applied to the dataset.

Note

No matter the processor, always use the technical identifiers of the fields to process, never the labels.
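
To illustrate the difference between a label and a technical identifier, here is a minimal Python sketch. The field list, its `name` and `label` keys, and the helper `technical_identifier` are all hypothetical and only stand in for a dataset schema; they are not the platform's API.

```python
# Hypothetical field metadata, invented for illustration: each field has a
# technical identifier ("name") and a human-readable label ("label").
FIELDS = [
    {"name": "date_of_birth", "label": "Date of birth"},
    {"name": "postal_code", "label": "Postal code"},
]


def technical_identifier(label_or_name: str) -> str:
    """Return the technical identifier of a field, given its label or identifier."""
    for field in FIELDS:
        if label_or_name in (field["name"], field["label"]):
            return field["name"]
    raise KeyError(f"Unknown field: {label_or_name!r}")


# The value to enter in a processor parameter is the identifier, not the label.
print(technical_identifier("Date of birth"))  # date_of_birth
```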

Geographical processors

Geographical processors are divided into 4 categories, according to what they are meant to achieve:

  • Geocoders: to convert a human-readable address into a geo point.

  • GeoJoin processor: to retrieve geoshapes from normalized codes for country-specific administrative divisions. The GeoJoin processor supports several countries, each of which features several indexing codes like postcode, state or region identifier, etc.

  • Retrieve Administrative Divisions processor: to retrieve the name, code, and geoshape of country-specific administrative divisions enclosing a geopoint.

  • Converters & Functions: to simplify, convert or normalize geographical data, or run computations based on them.

Geocoders

Name | Description | Availability
Geocode with ArcGIS | Geocode full-text addresses by using the ArcGIS geocoding API | Default
Geocode with BAN (France) | Geocode addresses in France by using the Base d'Adresses Nationale (BAN) service | Default
Geocode with PDOK | Geocode addresses in the Netherlands by using the PDOK service | On demand
Geocode with the Census Bureau (USA) | Geocode addresses in the USA by using the Census Bureau | On demand
Get coordinates from a 3 word address | Convert a 3 word address into geographical coordinates | On demand
IP address to geo coordinates | Geocode an IP address | Default
Nominatim geocoder | Geocode full-text addresses using OpenStreetMap data | On demand
what3words | Produce a 3 word address from geographical coordinates | On demand

The GeoJoin processor

Name | Description | Availability
Geojoin | Retrieve administrative divisions geo shapes for a specified country and referential | Default

The Retrieve administrative divisions processor

Name | Description | Availability
Retrieve administrative divisions | Retrieve administrative divisions information with a geo point | Default

Converters & Functions

Name | Description | Availability
Compute geo distance | Compute the distance between 2 coordinates | Default
Correct geo shape | Fix invalid geo shapes | On demand
Convert degrees | Convert a degrees, minutes, seconds geo coordinate to WGS84 coordinates | Default
Create geo point | Create a geopoint field from a latitude field and a longitude field | Default
Decode a Google polyline | Transform an encoded Google polyline into a GeoJSON LineString | On demand
GeoHash to GeoJSON | Convert GeoHash values to GeoJSON | On demand
Geomasking | Provide privacy protection by approximating a geographical location within a specific radius | Default
Normalize projection reference | Replace a geopoint with its WGS84 representation | Default
Polygon filtering | Remove points that are not in a polygon | On demand
Simplify geo shape | Simplify a geo shape to reduce processing time and dataset size | Default
WKT and WKB to GeoJSON | Convert a vector geometry object represented in WKT or WKB into a GeoJSON object | On demand
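
To make two of the conversions listed above more concrete, here is a minimal Python sketch of the kind of computation behind Convert degrees and Compute geo distance. It is an illustration only: the function names are hypothetical, and the haversine formula with a spherical Earth of radius 6371 km is an assumption, not necessarily what the platform uses.

```python
import math


def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes, seconds to decimal degrees.

    Southern and western hemispheres yield negative values.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance in kilometres between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# 48°51'24"N, 2°21'8"E is roughly central Paris (illustrative values).
paris = (dms_to_decimal(48, 51, 24, "N"), dms_to_decimal(2, 21, 8, "E"))
london = (51.5074, -0.1278)
print(paris)
print(round(haversine_km(*paris, *london)), "km")
```

The haversine formula ignores the flattening of the Earth, so results can differ slightly from an ellipsoidal computation.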

Date processors

Name | Description | Availability
Normalize date | Normalize a date format not automatically understood by the platform | Default
Set timezone | Define a timezone for a datetime field | Default
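
As a rough illustration of what the two date processors do conceptually, the following Python sketch parses a date written in a non-standard format and then attaches a timezone to it. The helper names, the sample value, and the use of a fixed UTC offset instead of a named timezone are assumptions made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def normalize_date(raw: str, source_format: str) -> datetime:
    """Parse a date string written in a known, non-standard format."""
    return datetime.strptime(raw, source_format)


def set_timezone(value: datetime, utc_offset_hours: int) -> datetime:
    """Attach a fixed UTC offset to a naive datetime without shifting the clock time."""
    return value.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


# "18h30" is not a standard time notation, so the format is given explicitly.
naive = normalize_date("25/12/2023 18h30", "%d/%m/%Y %Hh%M")
aware = set_timezone(naive, 1)
print(naive.isoformat())  # 2023-12-25T18:30:00
print(aware.isoformat())  # 2023-12-25T18:30:00+01:00
```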

Text processors

Name | Description | Availability
Concatenate text | Concatenate 2 fields | Default
Decode HTML entities | Decode HTML entities from a text, to transform them into valid HTML | Default
Extract HTML | Extract HTML from an HTML tag to only keep textual content | Default
Extract text | Extract part of a field value using a regular expression | Default
Extract URLs | Extract URLs from HTML or text contents | Default
Normalize Unicode values | Normalize Unicode content using the Normalization Form Canonical Composition (NFC) | Default
Normalize URL | Normalize a field value to obtain a valid URL | Default
Replace text | Replace a textual field value with a chosen text | Default
Replace via regular expression | Replace or remove part of a field value using a regular expression | Default
Split text | Split a field value and extract part of it in a new field | Default
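
The text operations listed above map onto familiar string manipulations. The Python sketch below shows standard-library equivalents for a few of them. The sample values are invented, and the code illustrates the operations themselves, not the platform's implementation or its parameter names.

```python
import html
import re
import unicodedata

# Decode HTML entities: "&eacute;" and "&amp;" become plain characters.
decoded = html.unescape("Caf&eacute; &amp; bar")

# Normalize Unicode values (NFC): "e" followed by a combining acute accent
# is composed into the single code point U+00E9.
composed = unicodedata.normalize("NFC", "Cafe\u0301")

# Extract text: capture a 5-digit postcode with a regular expression.
match = re.search(r"\b(\d{5})\b", "12 rue Oberkampf, 75011 Paris")
postcode = match.group(1) if match else None

# Replace via regular expression: collapse runs of whitespace into one space.
collapsed = re.sub(r"\s+", " ", "too    many   spaces")

# Split text: split on a separator and keep the second part.
second = "red;green;blue".split(";")[1]

print(decoded, composed, postcode, collapsed, second, sep=" | ")
```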

Generic processors

Name | Description | Availability
Add a field | Add a new empty field in a dataset | Default
Copy a field | Copy a field value from a field to another | Default
Deduplicate multivalued fields | Remove duplicated values in a multivalued field | Default
Delete record | Delete a record based on field values | Default
Expand JSON array | Transpose rows containing a JSON array into several rows | Default
Expand multivalued field | Transform the values contained in a multivalued field into several records | Default
Expression | Write complex expression patterns using field values | Default
Extract bit range | Extract an arbitrary bit range from a hexadecimal or binary content | On demand
Extract from JSON | Extract values from a field containing a JSON object | Default
File | Retrieve images from URLs | Default
Join datasets | Join 2 datasets together to retrieve a specified field in a dataset | Default
JSON array to multivalued | Extract multiple values from a JSON array and concatenate them into a multivalued field | Default
Meta expression | Apply an expression on multiple fields | On demand
Skip records | Skip records from a dataset | Default
Transform boolean columns to multivalued field | Transform true values from boolean fields into a multivalued field | Default
Transpose columns to rows | Transform labels into field values | Default
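
To give an idea of what two of these generic operations amount to, here is a minimal Python sketch of expanding a JSON array into several records and of extracting a bit range from a hexadecimal value. The function names and record layout are hypothetical, and counting bits from 0 at the least significant bit is an assumption; the platform's own convention may differ.

```python
import json


def expand_json_array(record: dict, field: str) -> list[dict]:
    """Return one copy of the record per element of the JSON array stored in `field`."""
    items = json.loads(record[field])
    return [{**record, field: item} for item in items]


def extract_bit_range(hex_value: str, start: int, length: int) -> int:
    """Extract `length` bits starting at bit `start`.

    Assumption: bit 0 is the least significant bit.
    """
    number = int(hex_value, 16)
    return (number >> start) & ((1 << length) - 1)


rows = expand_json_array({"id": 1, "tags": '["a", "b"]'}, "tags")
print(rows)  # [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}]

# 0xA5 is 1010 0101 in binary: the high nibble is 0xA, the low nibble is 0x5.
print(extract_bit_range("A5", 4, 4))  # 10
print(extract_bit_range("A5", 0, 4))  # 5
```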