Adding processors to a dataset¶
Processors are tools that modify, improve, or enrich the data of a dataset. On the Opendatasoft platform, processors fall into 4 categories:
Processors for geographical mapping
Processors for date handling
Processors for text transformations
Processors for generic operations
To add a processor to a dataset:
In the Processing tab, click on the Add a processor button.
Choose the processor to add to the dataset.
Using the documentation of the chosen processor, fill in the parameters required to configure it.
(optional) Click on the edit icon to rename the processor. This step is especially useful when many processors are applied to a dataset, including several of the same type (for example, renaming each Expression processor applied to a dataset makes it easier to tell which one contains which expression).
Note
You may need to click outside the processor box once the parameters are configured to make sure the processor and the changes it triggers are taken into account and applied to the dataset.
Note
No matter the processor, always use the technical identifiers of the fields to process, never the labels.
Geographical processors¶
Geographical processors are divided into 4 categories, according to what you are trying to achieve:
Geocoders: to convert a human-readable address into a geo point.
GeoJoin processor: to retrieve geoshapes from normalized codes for country-specific administrative divisions. The GeoJoin processor supports several countries, each of which features several indexing codes like postcode, state or region identifier, etc.
Retrieve Administrative Divisions processor: to retrieve the name, code, and geoshape of country-specific administrative divisions enclosing a geopoint.
Converters & Functions: to simplify, convert or normalize geographical data, or run computations based on them.
Geocoders¶
Name | Description | Availability |
---|---|---|
Geocode with ArcGIS | Geocode full-text addresses by using the ArcGIS geocoding API | Default |
Geocode with BAN (France) | Geocode addresses in France by using the Base d'Adresses Nationale (BAN) service | Default |
Geocode with PDOK | Geocode addresses in the Netherlands by using the PDOK service | On demand |
Geocode with the Census Bureau (USA) | Geocode addresses in the USA by using the Census Bureau | On demand |
Get coordinates from a 3 word address | Convert a 3 word address into geographical coordinates | On demand |
IP address to geo coordinates | Geocode an IP address | Default |
Nominatim geocoder | Geocode full-text addresses using OpenStreetMap data | On demand |
what3words | Produce a 3 word address from geographical coordinates | On demand |
The GeoJoin processor¶
Name | Description | Availability |
---|---|---|
Geojoin | Retrieve administrative divisions geo shapes for a specified country and referential | Default |
The Retrieve administrative divisions processor¶
Name | Description | Availability |
---|---|---|
Retrieve administrative divisions | Retrieve administrative divisions information with a geo point | Default |
Converters & Functions¶
Name | Description | Availability |
---|---|---|
Compute geo distance | Compute the distance between 2 coordinates | Default |
Correct geo shape | Fix invalid geo shapes | On demand |
Convert degrees | Convert a degrees, minutes, seconds geo coordinate to WGS84 coordinates | Default |
Create geo point | Create a geopoint field from a latitude field and a longitude field | Default |
Decode a Google polyline | Transform an encoded Google polyline into a GeoJSON LineString | On demand |
GeoHash to GeoJSON | Convert GeoHash values to GeoJSON | On demand |
Geomasking | Protect privacy by approximating a geographical location within a specific radius | Default |
Normalize projection reference | Replace a geopoint with its WGS84 representation | Default |
Polygon filtering | Remove points that are not in a polygon | On demand |
Simplify geo shape | Simplify a geo shape to reduce processing time and dataset size | Default |
WKT and WKB to GeoJSON | Convert a vector geometry object represented in WKT or WKB into a GeoJSON object | On demand |
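To make two of the conversions above more concrete, here is a minimal Python sketch of what "Convert degrees" and "Compute geo distance" do conceptually. It is not the platform's implementation: the function names, the hemisphere-letter convention, the haversine formula, and the mean Earth radius are all assumptions chosen for illustration.

```python
import math

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Southern and western coordinates are negative in decimal notation.
    return -value if hemisphere.upper() in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    lat = dms_to_decimal(48, 51, 24, "N")
    lon = dms_to_decimal(2, 21, 8, "E")
    print(round(lat, 4), round(lon, 4))
    # Approximate distance from that point to central London.
    print(round(haversine_km(lat, lon, 51.5074, -0.1278), 1), "km")
```

The haversine formula treats the Earth as a sphere, so results are approximate; the processor itself may use a different method.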
Date processors¶
Name | Description | Availability |
---|---|---|
Normalize date | Normalize a date format not automatically understood by the platform | Default |
Set timezone | Define a timezone for a datetime field | Default |
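As an illustration of what the two date processors achieve, the following Python sketch parses a date written in a custom format and attaches a timezone to a naive datetime. The function names and the use of a fixed UTC offset are assumptions of this example; the platform's own parameters and behavior are described in each processor's documentation.

```python
from datetime import datetime, timedelta, timezone


def normalize_date(raw, fmt):
    """Parse a date string written in the given format and return it as ISO 8601."""
    return datetime.strptime(raw, fmt).date().isoformat()


def set_timezone(naive, utc_offset_hours):
    """Attach a fixed UTC offset to a datetime that has no timezone information."""
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


if __name__ == "__main__":
    # A day-first date that would be ambiguous without an explicit format.
    print(normalize_date("31/12/2023", "%d/%m/%Y"))

    local = set_timezone(datetime(2024, 1, 15, 9, 30), 1)
    print(local.isoformat())
    # The same instant expressed in UTC.
    print(local.astimezone(timezone.utc).isoformat())
```

A fixed offset keeps the sketch dependency-free; a named timezone would also account for daylight saving time.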
Text processors¶
Name | Description | Availability |
---|---|---|
Concatenate text | Concatenate 2 fields | Default |
Decode HTML entities | Decode HTML entities from a text, to transform them into valid HTML | Default |
Extract HTML | Extract HTML from an HTML tag to only keep textual content | Default |
Extract text | Extract part of a field value using a regular expression | Default |
Extract URLs | Extract URLs from HTML or text contents | Default |
Normalize Unicode values | Normalize Unicode content using the Normalization Form Canonical Composition (NFC) | Default |
Normalize URL | Normalize a field value to obtain a valid URL | Default |
Replace text | Replace a textual field value with a chosen text | Default |
Replace via regular expression | Replace or remove part of a field value using a regular expression | Default |
Split text | Split a field value and extract part of it in a new field | Default |
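The sketch below shows, in plain Python, the kind of transformation performed by several of the text processors listed above (Extract text, Replace via regular expression, Normalize Unicode values, Split text). The function names, signatures, and zero-based index are hypothetical and only serve the illustration; they are not the processors' actual parameters.

```python
import re
import unicodedata


def extract_text(value, pattern):
    """Return the first capture group of the first match (or the whole match), else None."""
    match = re.search(pattern, value)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


def replace_regex(value, pattern, replacement=""):
    """Replace every match of pattern; an empty replacement removes the match."""
    return re.sub(pattern, replacement, value)


def normalize_nfc(value):
    """Normalize text to Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", value)


def split_text(value, separator, index):
    """Split value on separator and return the part at the zero-based index, else None."""
    parts = value.split(separator)
    return parts[index] if 0 <= index < len(parts) else None


if __name__ == "__main__":
    print(extract_text("Order #1234 shipped", r"#(\d+)"))
    print(replace_regex("a (note) b", r"\s*\(.*?\)"))
    print(len("e\u0301"), len(normalize_nfc("e\u0301")))
    print(split_text("a;b;c", ";", 1))
```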
Generic processors¶
Name | Description | Availability |
---|---|---|
Add a field | Add a new empty field in a dataset | Default |
Copy a field | Copy a field value from a field to another | Default |
Deduplicate multivalued fields | Remove duplicated values in a multivalued field | Default |
Delete record | Delete a record based on field values | Default |
Expand JSON array | Transpose rows containing a JSON array into several rows | Default |
Expand multivalued field | Transform the values contained in a multivalued field into several records | Default |
Expression | Write complex expression patterns using field values | Default |
Extract bit range | Extract an arbitrary bit range from a hexadecimal or binary content | On demand |
Extract from JSON | Extract values from a field containing a JSON object | Default |
File | Retrieve images from URLs | Default |
Join datasets | Join 2 datasets together to retrieve a specified field in a dataset | Default |
JSON array to multivalued | Extract multiple values from a JSON array and concatenate them into a multivalued field | Default |
Meta expression | Apply an expression on multiple fields | On demand |
Skip records | Skip records from a dataset | Default |
Transform boolean columns to multivalued field | Transform true values from boolean fields into a multivalued field | Default |
Transpose columns to rows | Transform labels into field values | Default |
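Some of the generic processors reshape records rather than individual values. As a rough illustration, the Python sketch below mimics "Deduplicate multivalued fields", "Expand multivalued field", and "Transpose columns to rows" on records modeled as dictionaries. The record model, the `;` separator, and all function and parameter names are assumptions made for this example, not the platform's API.

```python
def deduplicate_multivalued(value, separator=";"):
    """Remove repeated values from a separator-delimited field, keeping first occurrences."""
    return separator.join(dict.fromkeys(value.split(separator)))


def expand_multivalued(records, field, separator=";"):
    """Produce one record per value found in the multivalued field."""
    expanded = []
    for record in records:
        for item in record[field].split(separator):
            expanded.append({**record, field: item})
    return expanded


def transpose_columns_to_rows(records, columns, key_field, value_field):
    """Turn each listed column into its own record holding the column name and its value."""
    rows = []
    for record in records:
        base = {k: v for k, v in record.items() if k not in columns}
        for column in columns:
            rows.append({**base, key_field: column, value_field: record[column]})
    return rows


if __name__ == "__main__":
    print(deduplicate_multivalued("a;b;a;c;b"))
    print(expand_multivalued([{"id": 1, "tags": "a;b"}], "tags"))
    print(transpose_columns_to_rows(
        [{"city": "Paris", "2022": 10, "2023": 12}],
        ["2022", "2023"], "year", "count"))
```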