Note: This document is for an older version of the Gro API Client. Please see the latest documentation.

API Reference

Creating a client

GroClient.__init__(api_host='api.gro-intelligence.com', access_token=None)[source]

Construct a GroClient instance.

Parameters
  • api_host (string, optional) – The API server hostname.

  • access_token (string, optional) – Your Gro API authentication token. If not specified, the $GROAPI_TOKEN environment variable is used. See Authentication.

Raises

RuntimeError – Raised when neither the access_token parameter nor the $GROAPI_TOKEN environment variable is set.

Examples

>>> client = GroClient()  # token stored in $GROAPI_TOKEN
>>> client = GroClient(access_token="your_token_here")
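The token fallback described above can be sketched in a few lines. This is only an illustration of the documented behavior, not the client's actual implementation; the helper name resolve_access_token and its environ parameter are hypothetical.

```python
import os


def resolve_access_token(access_token=None, environ=None):
    """Return the explicit token, else $GROAPI_TOKEN, else raise RuntimeError.

    Hypothetical helper: mirrors the documented fallback order only.
    """
    env = os.environ if environ is None else environ
    token = access_token or env.get("GROAPI_TOKEN")
    if not token:
        raise RuntimeError(
            "Pass access_token or set the GROAPI_TOKEN environment variable."
        )
    return token
```

Passing a plain dict as environ makes the fallback easy to exercise without touching the real environment.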

Basic Exploration

GroClient.lookup(entity_type, entity_ids)[source]

Retrieve details about a given id or list of ids of type entity_type.

https://developers.gro-intelligence.com/gro-ontology.html

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'frequencies', 'sources', 'units' }) –

  • entity_ids (int or list of ints) –

Returns

A dict with entity details is returned if an integer is given for entity_ids. A dict of dicts with entity details, keyed by id, is returned if a list of integers is given for entity_ids.

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated corn plant <i>Zea mays</i>,'
                " which is one of the world's most popular grains." }

Example:

{   '274': {
        'id': 274,
        'contains': [779, 780, ...],
        'belongsTo': [4138, 8830, ...],
        'name': 'Corn',
        'definition': 'The seeds of the widely cultivated corn plant'
                      " <i>Zea mays</i>, which is one of the world's most popular"
                      ' grains.'
    },
    '270': {
        'id': 270,
        'contains': [1737, 7401, ...],
        'belongsTo': [8830, 9053, ...],
        'name': 'Soybeans',
        'definition': 'The seeds and harvested crops of plants belonging to the'
                      ' species <i>Glycine max</i> that are used in the production'
                      ' of oil and both human and livestock consumption.'
    }
}

Return type

dict or dict of dicts
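Because the return shape depends on whether entity_ids was an integer or a list, calling code may want a single uniform shape. The sketch below is a hypothetical helper (lookup_as_mapping is not part of the client) that takes the original entity_ids argument and the value lookup() returned. It assumes the list form is keyed by id strings, as in the example above.

```python
def lookup_as_mapping(entity_ids, result):
    """Return {int id: details} whether lookup() was given one id or a list."""
    if isinstance(entity_ids, int):
        # Single id: lookup() returned the details dict itself.
        return {entity_ids: result}
    # List of ids: lookup() returned a dict of dicts keyed by id
    # (string keys in the example above, hence the int() conversion).
    return {int(key): details for key, details in result.items()}
```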

GroClient.search(entity_type, search_terms)[source]

Search for the given search term. Better matches appear first.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –

  • search_terms (string) –

Returns

Example:

[{'id': 5604}, {'id': 10204}, {'id': 10210}, ...]

Return type

list of dicts

GroClient.search_and_lookup(entity_type, search_terms, num_results=10)[source]

Search for the given search terms and look up their details.

For each result, yield a dict of the entity and its properties.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –

  • search_terms (string) –

  • num_results (int) – Maximum number of results to return. Defaults to 10.

Yields

dict – Result from search() passed to lookup() to get additional details.

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated...' }

See output of lookup(). Note that as with search(), the first result is the best match for the given search term(s).
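The relationship between search(), lookup() and this method can be sketched as a small generator. This is an illustration based only on the description above, not the client's source; search_and_lookup_sketch is a hypothetical name, and client is assumed to be any object exposing search() and lookup() with the signatures documented here.

```python
from itertools import islice


def search_and_lookup_sketch(client, entity_type, search_terms, num_results=10):
    """Yield lookup() details for the first num_results search() hits.

    search() returns best matches first, so the yielded order is best-first too.
    """
    hits = client.search(entity_type, search_terms)
    for hit in islice(hits, num_results):
        yield client.lookup(entity_type, hit['id'])
```

Since the function is a generator, no lookup() call is made until a result is actually consumed.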

GroClient.search_for_entity(entity_type, keywords)[source]

Returns the first result of entity_type that matches the given keywords.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –

  • keywords (string) –

Returns

The id of the first search result

Return type

integer

GroClient.get_ancestor(entity_type, entity_id, distance=None, include_details=True)[source]

Given an item, metric, or region, returns all its ancestors, i.e. entities that “contain” the given entity.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions' }) –

  • entity_id (integer) –

  • distance (integer, optional) – Return only the entities that contain entity_id within this maximum distance. If not provided, get all ancestors.

  • include_details (boolean, optional) – True by default. Will perform a lookup() on each ancestor to find name, definition, etc. If this option is set to False, only ids of ancestor entities will be returned, which makes execution significantly faster.

Returns

Example:

[{
    'id': 134,
    'name': 'Cattle hides, wet-salted',
    'definition': 'Hides and skins of domesticated cattle-animals ...',
} , {
    'id': 382,
    'name': 'Calf skins, wet-salted',
    'definition': 'Wet-salted hides and skins of calves-animals of ...'
}, ...]

See output of lookup()

Return type

list of dicts

GroClient.get_descendant(entity_type, entity_id, distance=None, include_details=True)[source]

Given an item, metric, or region, returns all its descendants, i.e. entities that are “contained” in the given entity.

Similar to get_descendant_regions(), but also works on items and metrics. This method has a distance parameter (which returns all nested child entities) instead of a descendant_level parameter (which only returns child entities at a given depth/level).

Parameters
  • entity_type ({ 'metrics', 'items', 'regions' }) –

  • entity_id (integer) –

  • distance (integer, optional) – Return only the entities contained in entity_id within this maximum distance. If not provided, get all descendants.

  • include_details (boolean, optional) – True by default. Will perform a lookup() on each descendant to find name, definition, etc. If this option is set to False, only ids of descendant entities will be returned, which makes execution significantly faster.

Returns

Example:

[{
    'id': 134,
    'name': 'Cattle hides, wet-salted',
    'definition': 'Hides and skins of domesticated cattle-animals ...',
} , {
    'id': 382,
    'name': 'Calf skins, wet-salted',
    'definition': 'Wet-salted hides and skins of calves-animals of ...'
}, ...]

See output of lookup()

Return type

list of dicts
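To make the distance parameter concrete, here is a minimal local sketch of a distance-limited traversal. It assumes that distance means "at most this many containment hops away" and works on a hypothetical in-memory mapping of id to contained ids; the real method resolves the hierarchy through the API.

```python
from collections import deque


def descendants_within(contains, entity_id, distance=None):
    """Return ids reachable from entity_id via 'contains' links.

    contains: hypothetical dict mapping an id to the list of ids it contains.
    distance: maximum number of hops to follow; None means no limit.
    """
    seen = set()
    order = []
    queue = deque([(entity_id, 0)])
    while queue:
        current, depth = queue.popleft()
        if distance is not None and depth >= distance:
            continue  # do not expand beyond the requested distance
        for child in contains.get(current, []):
            if child not in seen and child != entity_id:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order
```

With contains = {1: [2, 3], 2: [4], 4: [5]}, a distance of 1 returns only the direct children of 1, while omitting distance returns every nested descendant.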

GroClient.get_data_series(**selection)[source]

Get available data series for the given selections.

https://developers.gro-intelligence.com/data-series-definition.html

Parameters
  • metric_id (integer, optional) –

  • item_id (integer, optional) –

  • region_id (integer, optional) –

  • partner_region_id (integer, optional) –

  • source_id (integer, optional) –

  • frequency_id (integer, optional) –

Returns

Example:

[{ 'metric_id': 2020032, 'metric_name': 'Seed Use',
   'item_id': 274, 'item_name': 'Corn',
   'region_id': 1215, 'region_name': 'United States',
   'source_id': 24, 'source_name': 'USDA FEEDGRAINS',
   'frequency_id': 7,
   'start_date': '1975-03-01T00:00:00.000Z',
   'end_date': '2018-05-31T00:00:00.000Z'
 }, { ... }, ... ]

Return type

list of dicts

GroClient.find_data_series(result_filter=None, **kwargs)[source]

Find data series matching a combination of entities specified by name and yield them ranked by coverage.

Example:

client.find_data_series(item="Corn",
                        metric="Futures Open Interest",
                        region="United States of America")

will yield a sequence of dictionaries of the form:

{ 'metric_id': 15610005, 'metric_name': 'Futures Open Interest',
  'item_id': 274, 'item_name': 'Corn',
  'region_id': 1215, 'region_name': 'United States',
  'frequency_id': 15, 'source_id': 81,
  'start_date': '1972-03-01T00:00:00.000Z', ...},
{ ... },  ...

See https://developers.gro-intelligence.com/data-series-definition.html

result_filter can be used to filter entity searches. For example:

client.find_data_series(item="vegetation",
                        metric="vegetation indices",
                        region="Central",
                        result_filter=lambda r: ('region_id' not in r or
                                                 r['region_id'] == 10393))

will only consider that particular region, and not the many other regions with the same name.

This method uses search(), get_data_series(), get_available_timefrequency() and rank_series_by_source().

Parameters
  • metric (string, optional) –

  • item (string, optional) –

  • region (string, optional) –

  • partner_region (string, optional) –

  • start_date (string, optional) – YYYY-MM-DD

  • end_date (string, optional) – YYYY-MM-DD

  • result_filter (function, optional) – A function that takes a data series selection dict and returns a boolean.

Yields

dict – A sequence of data series matching the input selections
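The result_filter lambda shown earlier can be generalized into a small factory. This is a sketch only; only_region is a hypothetical helper name, and it follows the same rule as the example: accept a candidate if it has no region_id or if its region_id matches.

```python
def only_region(region_id):
    """Build a result_filter that pins the search to a single region id."""
    def result_filter(candidate):
        # Candidates without a region_id (e.g. item or metric matches) pass through.
        return 'region_id' not in candidate or candidate['region_id'] == region_id
    return result_filter
```

The returned function could then be passed as result_filter=only_region(10393) in a find_data_series() call like the one above.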

Data Retrieval

GroClient.get_data_points(**selections)[source]

Get all the data points for a given selection.

https://developers.gro-intelligence.com/data-point-definition.html

Example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': 1215,
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}]

Note: you can pass the output of get_data_series() into get_data_points() to check what series exist for some selections and then retrieve the data points for those series. See quick_start.py for an example of this.

get_data_points() also allows passing a list of ids for metric_id, item_id, and/or region_id to get multiple series in a single request. This can be faster if requesting many series.

For example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': [1215,1216],
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}, { 'start_date': '2017-01-01T00:00:00.000Z',
     'end_date': '2017-12-31T00:00:00.000Z',
     'value': 340614.19507563586, 'unit_id': 15,
     'reporting_date': None,
     'metric_id': 860032, 'item_id': 274, 'region_id': 1216,
     'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
     'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1216,
        'frequency_id': 9,
        'source_id': 2
     }
}]

Parameters
  • metric_id (integer or list of integers) – How something is measured. e.g. “Export Value” or “Area Harvested”

  • item_id (integer or list of integers) – What is being measured. e.g. “Corn” or “Rainfall”

  • region_id (integer or list of integers) – Where something is being measured e.g. “United States Corn Belt” or “China”

  • partner_region_id (integer or list of integers, optional) – partner_region refers to an interaction between two regions, like trade or transportation. For example, for an Export metric, the “region” would be the exporter and the “partner_region” would be the importer. For most series, this can be excluded or set to 0 (“World”) by default.

  • source_id (integer) –

  • frequency_id (integer) –

  • unit_id (integer, optional) –

  • start_date (string, optional) – All points with end dates equal to or after this date

  • end_date (string, optional) – All points with start dates equal to or before this date

  • show_revisions (boolean, optional) – False by default, meaning only the latest value for each period. If true, will return all values for a given period, differentiated by the reporting_date field.

  • show_available_date (boolean, optional) – False by default. If true, will return the available date of each data point.

  • insert_null (boolean, optional) – False by default. If True, will include a data point with a None value for each period that does not have data.

  • at_time (string, optional) – Estimate what data would have been available via Gro at a given time in the past. See at-time-query-examples.ipynb for more details.

  • include_historical (boolean, optional) – True by default, will include historical regions that are part of your selections

  • available_since (string, optional) – Fetch points since last data retrieval where available date is equal to or after this date

Returns

Return type

list of dicts
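The start_date and end_date rules above amount to an overlap test between each point's period and the requested range. The following sketch restates that rule locally; in_requested_range is a hypothetical helper, and the filtering itself is assumed to happen on the server.

```python
def in_requested_range(point, start_date=None, end_date=None):
    """True if the point's period overlaps the requested date range.

    point: a dict with ISO 'start_date' and 'end_date' strings, as returned above.
    start_date / end_date: optional 'YYYY-MM-DD' strings.
    """
    # ISO dates compare correctly as strings once truncated to YYYY-MM-DD.
    point_start = point['start_date'][:10]
    point_end = point['end_date'][:10]
    if start_date is not None and point_end < start_date:
        return False  # period ended before the requested start
    if end_date is not None and point_start > end_date:
        return False  # period began after the requested end
    return True
```

Under this rule an annual 2017 point is included by start_date='2017-06-01', because its end date falls on or after that date.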

Geographic

GroClient.get_geojson(region_id, zoom_level=7)[source]

Given a region ID, return shape information in geojson.

Parameters
  • region_id (integer) –

  • zoom_level (integer, optional, allowed values 1-8) – Valid if include_geojson equals True. If a zoom level is specified and it is less than 6, a simplified shapefile is returned. Otherwise, the detailed shapefile is used by default.

Returns

Example:

{ 'type': 'GeometryCollection',
'geometries': [{'type': 'MultiPolygon',
                'coordinates': [[[[-38.394, -4.225], ...]]]}, ...]}

Return type

a geojson object or None

GroClient.get_descendant_regions(region_id, descendant_level=None, include_historical=True, include_details=True)[source]

Look up details of all regions of the given level contained by a region.

Given any region by id, get all the descendant regions that are of the specified level.

Parameters
  • region_id (integer) –

  • descendant_level (integer, optional) – The region level of interest. See REGION_LEVELS constant. If not provided, get all descendants.

  • include_historical (boolean, optional) – True by default. If False is specified, regions that only exist in historical data (e.g. the Soviet Union) will be excluded.

  • include_details (boolean, optional) – True by default. Will perform a lookup() on each descendant region to find name, latitude, longitude, etc. If this option is set to False, only ids of descendant regions will be returned, which makes execution significantly faster.

Returns

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
} , {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type

list of dicts
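The effect of descendant_level can be pictured as a filter on the 'level' field of each region. The sketch below applies that filter to a list of region dicts shaped like the example output; regions_at_level is a hypothetical helper, and the real method is assumed to do the equivalent work through the API.

```python
def regions_at_level(regions, descendant_level=None):
    """Keep regions whose 'level' equals descendant_level; keep all when it is None."""
    if descendant_level is None:
        return list(regions)
    return [region for region in regions if region.get('level') == descendant_level]
```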

GroClient.get_provinces(country_name)[source]

Given the name of a country, find its provinces.

Parameters

country_name (string) –

Returns

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
} , {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type

list of dicts

Advanced Exploration

GroClient.lookup_belongs(entity_type, entity_id)[source]

Look up details of entities containing the given entity.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions' }) –

  • entity_id (int) –

Yields

dict – Result of lookup() on each entity the given entity belongs to.

For example, for the region ‘United States’, one yielded result will be ‘North America’, in the same format as the output of lookup():

{ 'id': 15,
  'contains': [ 1008, 1009, 1012, 1215, ... ],
  'name': 'North America',
  'level': 2 }

GroClient.rank_series_by_source(selections_list)[source]

Given a list of series selections, for each unique combination excluding source, expand to all available sources and return them in ranked order. The order corresponds to how well that source covers the selection (metrics, items, regions, and time range and frequency).

Parameters

selections_list (list of dicts) – See the output of get_data_series().

Yields

dict – The input selections_list, expanded out to each possible source, ordered by coverage.
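The first step described above, reducing the input to each unique combination excluding source, can be sketched locally. Which keys make up a "combination" is an assumption here (the five id fields listed in keys), and unique_selections_without_source is a hypothetical helper; the expansion to sources and the ranking itself are done by the API.

```python
def unique_selections_without_source(selections_list):
    """Return each distinct selection once, ignoring source, in first-seen order.

    Assumes every id value is a single integer (hashable), not a list.
    """
    # Assumed identifying keys; source_id and name fields are deliberately left out.
    keys = ('metric_id', 'item_id', 'region_id', 'partner_region_id', 'frequency_id')
    seen = set()
    unique = []
    for selection in selections_list:
        signature = tuple(selection.get(key) for key in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append({key: selection[key] for key in keys if key in selection})
    return unique
```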

GroClient.get_available_timefrequency(**selection)[source]

Given a selection, return a list of frequencies and time ranges. The results are ordered by coverage-optimized ranking.

Parameters
  • metric_id (integer, optional) –

  • item_id (integer, optional) –

  • region_id (integer, optional) –

  • partner_region_id (integer, optional) –

Returns

Example:

[{
   'start_date': '2000-02-18T00:00:00.000Z',
   'frequency_id': 3,
   'end_date': '2020-03-12T00:00:00.000Z',
   'name': '8-day'
 }, {
   'start_date': '2019-09-02T00:00:00.000Z',
   'frequency_id': 1,
   'end_date': '2020-03-09T00:00:00.000Z',
   'name': 'daily'}, ... ]

Return type

list of dicts

GroClient.get_top(entity_type, num_results=5, **selection)[source]

Find the data series with the highest cumulative value for the given time range.

Examples:

# To get FAO's top 5 corn-producing countries of all time:
client.get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2)

# To get FAO's top 5 corn-producing countries of 2014:
client.get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2,
               start_date='2014-01-01', end_date='2014-12-31')

# To get the United States' top 15 exports in the decade of 2010-2019:
client.get_top('items', num_results=15, metric_id=20032, region_id=1215, frequency_id=9,
               source_id=2, start_date='2010-01-01', end_date='2019-12-31')

Parameters
  • entity_type ({ 'items', 'regions' }) – The entity type to rank, all other selections being the same. Only items and regions are rankable at this time.

  • num_results (integer, optional) – How many data series to rank. Top 5 by default.

  • metric_id (integer) –

  • item_id (integer) – Required if requesting top regions. Disallowed if requesting top items.

  • region_id (integer) – Required if requesting top items. Disallowed if requesting top regions.

  • partner_region_id (integer, optional) –

  • frequency_id (integer) –

  • source_id (integer) –

  • start_date (string, optional) – If not provided, the cumulative value used for ranking will include data points as far back as the source provides.

  • end_date (string, optional) –

Returns

Example:

[
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 400, 'unitId': 14},
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 395, 'unitId': 14},
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 12, 'unitId': 14},
]

Along with the series attributes, value and unit are also given for the total cumulative value the series are ranked by. You may then use the results to call get_data_points() to get the individual time series points.

Return type

list of dicts
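Ranking by cumulative value can be illustrated on data points that have already been fetched. This is a local sketch of the idea, not how get_top() is implemented (the ranking is assumed to happen on the server); top_regions is a hypothetical helper operating on dicts shaped like get_data_points() output.

```python
from collections import defaultdict


def top_regions(points, num_results=5):
    """Rank region ids by the sum of their data point values, largest first.

    Returns a list of (region_id, cumulative_value) pairs.
    """
    totals = defaultdict(float)
    for point in points:
        if point['value'] is not None:  # skip periods with no data
            totals[point['region_id']] += point['value']
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:num_results]
```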

Pandas Utils

GroClient.get_df(show_revisions=False, show_available_date=False, index_by_series=False, include_names=False, compress_format=False, async_mode=False)[source]

Call get_data_points() for each saved data series and return as a combined dataframe.

Note you must have first called either add_data_series() or add_single_data_series() to save data series into the GroClient’s data_series_list. You can inspect the client’s saved list using get_data_series_list().

Parameters
  • show_revisions (boolean, optional) – False by default, meaning only the latest value for each period. If true, will return all values for a given period, differentiated by the reporting_date field.

  • show_available_date (boolean, optional) – False by default. If true, will return the available date of each data point.

  • index_by_series (boolean, optional) – If set, the dataframe is indexed by series. See https://developers.gro-intelligence.com/data-series-definition.html

  • include_names (boolean, optional) – If set, the dataframe will have additional columns with names of entities. Note that this will increase the size of the dataframe by about 5x.

  • compress_format (boolean, optional) – If set, each series will be compressed to a single column in the dataframe, with the end_date column set as the dataframe index. All the entity names for each series will be placed in column headers. compress_format cannot be used simultaneously with show_revisions or show_available_date.

  • async_mode (boolean, optional) – If set, get_data_points() requests are made asynchronously. Note that when running with async_mode in a Jupyter (IPython) notebook, you will need to use the nest_asyncio module.

Returns

The results to get_data_points() for all the saved series, appended together into a single dataframe. See https://developers.gro-intelligence.com/data-point-definition.html

Return type

pandas.DataFrame
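The save-then-fetch workflow described above can be wrapped in a short helper. This is a sketch under the assumption that client is a GroClient (or any object with the add_data_series() and get_df() methods documented in this section); load_saved_series is a hypothetical name.

```python
def load_saved_series(client, selections, **get_df_options):
    """Save the best match for each selection, then fetch everything as one dataframe.

    selections: iterable of keyword dicts, e.g. {'item': 'Corn', 'metric': '...'}.
    get_df_options: forwarded unchanged to client.get_df().
    """
    for selection in selections:
        # add_data_series() returns None when nothing matches; that case is ignored here.
        client.add_data_series(**selection)
    return client.get_df(**get_df_options)
```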

GroClient.add_data_series(**kwargs)[source]

Adds the top result of find_data_series() to the saved data series list.

For use with get_df().

Parameters
  • metric (string, optional) –

  • item (string, optional) –

  • region (string, optional) –

  • partner_region (string, optional) –

  • start_date (string, optional) – YYYY-MM-DD

  • end_date (string, optional) – YYYY-MM-DD

  • result_filter (function, optional) – A function that takes a data series selection dict and returns a boolean.

Returns

The data_series that was added or None if none were found.

Return type

data_series object, as returned by get_data_series().

GroClient.add_single_data_series(data_series)[source]

Save a data series object to the GroClient’s data_series_list.

For use with get_df().

Parameters

data_series (dict) – A single data_series object, as returned by get_data_series() or find_data_series(). See https://developers.gro-intelligence.com/data-series-definition.html

Returns

Return type

None

GroClient.get_data_series_list()[source]

Inspect the current list of saved data series contained in the GroClient.

For use with get_df(). Add new data series to the list using add_data_series() and add_single_data_series().

Returns

A list of data_series objects, as returned by get_data_series().

Return type

list of dicts

Crop Modeling

CropModel.compute_weights(crop_name, metric_name, regions)[source]

Compute a vector of ‘weights’ that can be used for crop-weighted average across regions, as in compute_crop_weighted_series().

For each region, the weight is the mean value over time of the given metric for the given crop, normalized so the sum across all regions is 1.0.

For example: say we have a region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(), for example. Now say model.compute_weights('soybeans', 'land cover area', region_list) returns [0.6, 0.4]; that means Province1 has 60% and Province2 has 40% of the total area planted across the two regions, when averaged across all time.

Parameters
  • crop_name (string) –

  • metric_name (string) –

  • regions (list of dicts) – Each entry is a region with id and name

Returns

weights corresponding to the regions.

Return type

list of floats
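The normalization described above is simple enough to show on plain lists. This sketch assumes the per-region values have already been retrieved; normalized_mean_weights is a hypothetical helper, not the CropModel method, and it does not guard against empty series or an all-zero total.

```python
def normalized_mean_weights(series_by_region):
    """Mean of each region's values over time, scaled so the weights sum to 1.0.

    series_by_region: one list of numeric values per region, in region order.
    """
    means = [sum(values) / len(values) for values in series_by_region]
    total = sum(means)
    return [mean / total for mean in means]
```

For two regions with mean planted areas of 60 and 40, the weights come out as 0.6 and 0.4, matching the Province1 and Province2 example.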

CropModel.compute_crop_weighted_series(weighting_crop_name, weighting_metric_name, item_name, metric_name, regions, weighting_func=<function CropModel.<lambda>>)[source]

Compute the ‘crop-weighted average’ of the series for the given item and metric, across regions. The weight of a region is the fraction of the value of the weighting series represented by that region as explained in compute_weights().

For example: say we have a region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(), for example. Now model.compute_crop_weighted_series('soybeans', 'land cover area', 'vegetation ndvi', 'vegetation indices index', region_list) will return a dataframe where the NDVI of each province is multiplied by the fraction of the total soybean area accounted for by that province. Thus taking the sum across provinces will give a crop-weighted average of NDVI.

Parameters
  • weighting_crop_name (string) –

  • weighting_metric_name (string) –

  • item_name (string) –

  • metric_name (string) –

  • regions (list of dicts) – Each entry is a region with id and name

  • weighting_func (optional function) – A function of (weight, value) to apply. Default: weight*value

Returns

A dataframe containing the data series for the given item_name and metric_name, for each region in regions, with values adjusted by the crop weight for that region.

Return type

pandas.DataFrame
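The weighting step and the final sum across regions can be sketched with plain lists instead of a dataframe. This is an illustration of the arithmetic only; crop_weighted_average is a hypothetical helper, and it assumes every region has a value for the same periods, in the same order.

```python
def crop_weighted_average(values_by_region, weights, weighting_func=lambda w, v: w * v):
    """Apply weighting_func(weight, value) per region, then sum across regions.

    values_by_region: one list of values per region, aligned by period.
    weights: one weight per region, e.g. from a compute_weights()-style step.
    Returns one weighted total per period.
    """
    periods = len(values_by_region[0])
    return [
        sum(weighting_func(w, region[i]) for w, region in zip(weights, values_by_region))
        for i in range(periods)
    ]
```

The default weighting_func mirrors the documented default of weight*value; a different function of (weight, value) can be supplied the same way.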

CropModel.compute_gdd(tmin_series, tmax_series, base_temperature, start_date, end_date, min_temporal_coverage, upper_temperature_cap)[source]

Compute Growing Degree Days value from specific data series.

This function performs the low-level computation used in growing_degree_days().

Parameters
  • tmin_series (dict) – A data series object for min temperature e.g. {metric_id: 1, item_id: 2, region_id: 3, source_id: 4, frequency_id: 5}

  • tmax_series (dict) – A data series object for max temperature e.g. {metric_id: 1, item_id: 2, region_id: 3, source_id: 4, frequency_id: 5}

  • base_temperature (number) –

  • start_date (string) – YYYY-MM-DD date

  • end_date (string) – YYYY-MM-DD date

  • min_temporal_coverage (float, optional) –

  • upper_temperature_cap (number, optional) –

Returns

The sum of the GDD over all days in the interval

Return type

number

CropModel.growing_degree_days(region_name, base_temperature, start_date, end_date, min_temporal_coverage=1.0, upper_temperature_cap=inf)[source]

Get Growing Degree Days (GDD) for a region.

Growing degree days (GDD) are a weather-based indicator that allows for assessing crop phenology and crop development, based on heat accumulation. GDD for one day is defined as max(T_mean - T_base, 0), where T_mean is the average temperature of that day if available. Typically T_mean is approximated as (T_max + T_min)/2. If upper_temperature_cap is specified, T_mean is capped to not exceed that value.

The GDD over a longer time interval is the sum of the GDD over all days in the interval. Days where the data is missing contribute 0 GDDs, i.e. are treated as if T_mean = T_base. Use the temporal coverage threshold to avoid computing GDD with too little data.

The threshold and the base temperature should be carefully selected based on a fundamental understanding of the crops and region of interest.

The region can be any of the Gro regions, from a point location to a district, province, etc. This will use the best available data series for T_max and T_min for the given region and time period, using find_data_series(). In the simplest case, if the given region is a weather station location which has data for the time period, then that will be used. If it’s a district or other region, the underlying data could be from one or more weather stations and/or satellites. To bypass the search for available series, use compute_gdd() directly.

Parameters
  • region_name (string) –

  • base_temperature (number) –

  • start_date (string) – YYYY-MM-DD date

  • end_date (string) – YYYY-MM-DD date

  • min_temporal_coverage (float, optional) –

  • upper_temperature_cap (number, optional) –

Returns

The sum of the GDD over all days in the interval

Return type

number
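The GDD definition given above can be written out directly. The sketch below works on paired lists of daily minimum and maximum temperatures and is not the CropModel implementation; growing_degree_days_sketch is a hypothetical name. It returns the temporal coverage alongside the total rather than enforcing min_temporal_coverage, because how the library reacts to low coverage is not stated here.

```python
import math


def growing_degree_days_sketch(tmin, tmax, base_temperature,
                               upper_temperature_cap=math.inf):
    """Return (total GDD, temporal coverage) for paired daily temperatures.

    tmin / tmax: equal-length lists of daily readings; None marks a missing day.
    """
    total = 0.0
    days_with_data = 0
    for t_min, t_max in zip(tmin, tmax):
        if t_min is None or t_max is None:
            continue  # missing day contributes 0 GDD, as if T_mean == T_base
        # T_mean approximated as (T_max + T_min) / 2, then capped.
        t_mean = min((t_max + t_min) / 2, upper_temperature_cap)
        total += max(t_mean - base_temperature, 0)
        days_with_data += 1
    coverage = days_with_data / len(tmin) if tmin else 0.0
    return total, coverage
```

A caller could compare the returned coverage against its own threshold before trusting the total.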