Note: This document is for an older version of the Gro API Client. Please see the latest documentation.

API Reference

Basic Exploration

GroClient.lookup(entity_type, entity_id)

Retrieve details about a given id of type entity_type.

https://developers.gro-intelligence.com/gro-ontology.html

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'frequencies', 'sources', 'units' }) –

  • entity_id (int) –

Returns

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated corn plant <i>Zea mays</i>,'
                " which is one of the world's most popular grains." }

Return type

dict

GroClient.search(entity_type, search_terms)

Search for the given search term. Better matches appear first.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –

  • search_terms (string) –

Returns

Example:

[{'id': 5604}, {'id': 10204}, {'id': 10210}, ...]

Return type

list of dicts

GroClient.search_and_lookup(entity_type, search_terms, num_results=10)

Search for the given search terms and look up their details.

For each result, yield a dict of the entity and its properties.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –

  • search_terms (string) –

  • num_results (int) – Maximum number of results to return. Defaults to 10.

Yields

dict – Result from search() passed to lookup() to get additional details.

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated...' }

See output of lookup(). Note that as with search(), the first result is the best match for the given search term(s).
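
The relationship between search(), lookup() and search_and_lookup() described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: StubClient and search_and_lookup_sketch are hypothetical names, and the stub returns canned values (the second item's name is made up) so that the code runs without network access.

```python
from itertools import islice


class StubClient:
    """Hypothetical offline stand-in for GroClient, with canned responses."""

    _details = {
        274: {'id': 274, 'name': 'Corn', 'contains': [779, 780]},
        5604: {'id': 5604, 'name': 'Example item', 'contains': []},  # made-up entry
    }

    def search(self, entity_type, search_terms):
        # Best match first, as search() is documented to behave.
        return [{'id': 274}, {'id': 5604}]

    def lookup(self, entity_type, entity_id):
        return self._details[entity_id]


def search_and_lookup_sketch(client, entity_type, search_terms, num_results=10):
    """Yield lookup() details for at most num_results search() hits, in order."""
    for hit in islice(client.search(entity_type, search_terms), num_results):
        yield client.lookup(entity_type, hit['id'])


results = list(search_and_lookup_sketch(StubClient(), 'items', 'corn', num_results=1))
print(results[0]['name'])
```

Because the results are yielded lazily, a caller that only needs the best match can stop after the first one without triggering further lookups.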

GroClient.search_for_entity(entity_type, keywords)

Returns the first result of entity_type that matches the given keywords.

Parameters
  • entity_type ({ 'metric', 'item', 'region', 'source' }) –

  • keywords (string) –

Returns

The id of the first search result

Return type

integer

GroClient.get_data_series(**selection)

Get available data series for the given selections.

https://developers.gro-intelligence.com/data-series-definition.html

Parameters
  • metric_id (integer, optional) –

  • item_id (integer, optional) –

  • region_id (integer, optional) –

  • partner_region_id (integer, optional) –

  • source_id (integer, optional) –

  • frequency_id (integer, optional) –

Returns

Example:

[{ 'metric_id': 2020032, 'metric_name': 'Seed Use',
   'item_id': 274, 'item_name': 'Corn',
   'region_id': 1215, 'region_name': 'United States',
   'source_id': 24, 'source_name': 'USDA FEEDGRAINS',
   'frequency_id': 7,
   'start_date': '1975-03-01T00:00:00.000Z',
   'end_date': '2018-05-31T00:00:00.000Z'
 }, { ... }, ... ]

Return type

list of dicts

GroClient.find_data_series(**kwargs)

Find the best possible data series matching a combination of entities specified by name.

Example:

next(client.find_data_series(item="Corn",
                             metric="Futures Open Interest",
                             region="United States of America"))

will yield:

{ 'metric_id': 15610005, 'metric_name': 'Futures Open Interest',
  'item_id': 274, 'item_name': 'Corn',
  'region_id': 1215, 'region_name': 'United States',
  'partner_region_id': 0, 'partner_region_name': 'World',
  'frequency_id': 15, 'source_id': 81,
  'start_date': '1972-03-01T00:00:00.000Z', 'end_date': '2022-12-31T00:00:00.000Z' }

See https://developers.gro-intelligence.com/data-series-definition.html

This method uses search() to find entities by name, get_data_series() to find the available data series for all possible combinations of those entities, and rank_series_by_source() to rank the results.

Parameters
  • metric (string, optional) –

  • item (string, optional) –

  • region (string, optional) –

  • partner_region (string, optional) –

  • start_date (string, optional) – YYYY-MM-DD

  • end_date (string, optional) – YYYY-MM-DD

Yields

dict – A sequence of data series matching the input selections, in quality rank order.
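
The "all possible combinations of the entities" step can be pictured as a Cartesian product over the candidate ids found for each name. The sketch below is a hedged illustration, not the library's code: candidate_selections is a hypothetical helper, and the candidate ids are simply passed in rather than obtained from search().

```python
from itertools import product


def candidate_selections(candidates):
    """Yield one selection dict per combination of candidate ids.

    candidates maps a selection key (e.g. 'item_id') to a list of ids,
    best match first.
    """
    keys = list(candidates)
    for combo in product(*(candidates[key] for key in keys)):
        yield dict(zip(keys, combo))


selections = list(candidate_selections({
    'metric_id': [15610005],
    'item_id': [274, 779],
    'region_id': [1215],
}))
for selection in selections:
    print(selection)
```

Each resulting dict has the shape that get_data_series(**selection) accepts, so a caller could try the combinations in order and keep those for which series exist.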

Data Retrieval

GroClient.get_data_points(**selections)

Get all the data points for a given selection.

https://developers.gro-intelligence.com/data-point-definition.html

Example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': 1215,
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}]

Note: you can pass the output of get_data_series() into get_data_points() to check what series exist for some selections and then retrieve the data points for those series. See quick_start.py for an example of this.

get_data_points() also allows passing a list of ids for metric_id, item_id, and/or region_id to get multiple series in a single request. This can be faster if requesting many series.

For example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': [1215,1216],
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}, { 'start_date': '2017-01-01T00:00:00.000Z',
     'end_date': '2017-12-31T00:00:00.000Z',
     'value': 340614.19507563586, 'unit_id': 15,
     'reporting_date': None,
     'metric_id': 860032, 'item_id': 274, 'region_id': 1216,
     'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
     'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1216,
        'frequency_id': 9,
        'source_id': 2
     }
}]

Parameters
  • metric_id (integer or list of integers) – How something is measured. e.g. “Export Value” or “Area Harvested”

  • item_id (integer or list of integers) – What is being measured. e.g. “Corn” or “Rainfall”

  • region_id (integer or list of integers) – Where something is being measured e.g. “United States Corn Belt” or “China”

  • partner_region_id (integer or list of integers, optional) – partner_region refers to an interaction between two regions, like trade or transportation. For example, for an Export metric, the “region” would be the exporter and the “partner_region” would be the importer. For most series, this can be excluded or set to 0 (“World”) by default.

  • source_id (integer) –

  • frequency_id (integer) –

  • unit_id (integer, optional) –

  • start_date (string, optional) – All points with start dates equal to or after this date

  • end_date (string, optional) – All points with end dates equal to or before this date

  • show_revisions (boolean, optional) – False by default, meaning only the latest value for each period. If true, will return all values for a given period, differentiated by the reporting_date field.

  • insert_null (boolean, optional) – False by default. If True, will include a data point with a None value for each period that does not have data.

  • at_time (string, optional) – Estimate what data would have been available via Gro at a given time in the past. See at-time-query-examples.ipynb for more details.

Returns

Return type

list of dicts
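
When several regions are requested at once, the points for all series come back in one flat list. A minimal sketch of splitting that list back out by region is shown below. It assumes only the documented point fields (region_id, value); group_points_by_region is a hypothetical helper, and the third sample point is a made-up placeholder of the kind insert_null=True would produce.

```python
from collections import defaultdict


def group_points_by_region(points):
    """Group data point dicts by region_id, skipping points with no value."""
    grouped = defaultdict(list)
    for point in points:
        if point['value'] is not None:  # skip placeholders for periods without data
            grouped[point['region_id']].append(point)
    return dict(grouped)


points = [
    {'region_id': 1215, 'start_date': '2017-01-01T00:00:00.000Z',
     'value': 408913833.8019222},
    {'region_id': 1216, 'start_date': '2017-01-01T00:00:00.000Z',
     'value': 340614.19507563586},
    {'region_id': 1216, 'start_date': '2018-01-01T00:00:00.000Z',
     'value': None},  # made-up placeholder point
]
grouped = group_points_by_region(points)
print(sorted(grouped))
```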

Geographic

GroClient.get_geojson(region_id)

Given a region ID, return its GeoJSON shape information.

Parameters

region_id (integer) –

Returns

Example:

{ 'type': 'GeometryCollection',
  'geometries': [{'type': 'MultiPolygon',
                  'coordinates': [[[[-38.394, -4.225], ...]]]}, ...]}

Return type

a geojson object or None

GroClient.get_descendant_regions(region_id, descendant_level=None, include_historical=True)

Look up details of all regions of the given level contained by a region.

Given any region by id, get all the descendant regions that are of the specified level.

Parameters
  • region_id (integer) –

  • descendant_level (integer, optional) – The region level of interest. See REGION_LEVELS constant. If not provided, get all descendants.

  • include_historical (boolean, optional) – True by default. If False is specified, regions that only exist in historical data (e.g. the Soviet Union) will be excluded.

Returns

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
}, {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type

list of dicts

GroClient.get_provinces(country_name)

Given the name of a country, find its provinces.

Parameters

country_name (string) –

Returns

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
}, {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type

list of dicts

Advanced Exploration

GroClient.lookup_belongs(entity_type, entity_id)

Look up details of entities containing the given entity.

Parameters
  • entity_type ({ 'metrics', 'items', 'regions' }) –

  • entity_id (int) –

Yields

dict – Result of lookup() on each entity the given entity belongs to.

For example, for the region ‘United States’, one yielded result will be for ‘North America’. Its format matches the output of lookup():

{ 'id': 15,
  'contains': [ 1008, 1009, 1012, 1215, ... ],
  'name': 'North America',
  'level': 2 }

GroClient.rank_series_by_source(series_list)

Given a list of series selections, for each unique combination excluding source, expand to all available sources and return them in ranked order. The order corresponds to how well each source covers the selection (metrics, items, regions, time range, and frequency).

Parameters

series_list (list of dicts) – See the output of get_data_series().

Yields

dict – The input series_list, expanded out to each possible source, ordered by coverage.
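
The first step described above, finding each unique combination excluding source, can be sketched as follows. This is only an illustration of that de-duplication step under stated assumptions, not the library's ranking logic: unique_selections_excluding_source is a hypothetical helper, it keeps only keys ending in _id, and the second sample series uses a made-up source_id.

```python
def unique_selections_excluding_source(series_list):
    """Yield each distinct selection once, with source_id removed."""
    seen = set()
    for series in series_list:
        # Keep only the id fields that define the selection, minus the source.
        selection = {key: value for key, value in series.items()
                     if key.endswith('_id') and key != 'source_id'}
        fingerprint = tuple(sorted(selection.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield selection


series_list = [
    {'metric_id': 2020032, 'item_id': 274, 'region_id': 1215,
     'frequency_id': 7, 'source_id': 24},
    {'metric_id': 2020032, 'item_id': 274, 'region_id': 1215,
     'frequency_id': 7, 'source_id': 2},  # same selection, made-up other source
    {'metric_id': 2020032, 'item_id': 274, 'region_id': 1216,
     'frequency_id': 7, 'source_id': 24},
]
unique = list(unique_selections_excluding_source(series_list))
print(len(unique))
```

Each unique selection could then be expanded to its available sources, for instance by passing it to get_data_series().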

Pandas Utils

GroClient.get_df(show_revisions=False)

Call get_data_points() for each saved data series and return as a combined dataframe.

Note you must have first called either add_data_series() or add_single_data_series() to save data series into the GroClient’s data_series_list. You can inspect the client’s saved list using get_data_series_list().

Returns

The results of get_data_points() for all the saved series, appended together into a single dataframe. See https://developers.gro-intelligence.com/data-point-definition.html

Return type

pandas.DataFrame

GroClient.add_data_series(**kwargs)

Adds the top result of find_data_series() to the saved data series list.

For use with get_df().

Parameters
  • metric (string, optional) –

  • item (string, optional) –

  • region (string, optional) –

  • partner_region (string, optional) –

  • start_date (string, optional) – YYYY-MM-DD

  • end_date (string, optional) – YYYY-MM-DD

Returns

Return type

None

GroClient.add_single_data_series(data_series)

Save a data series object to the GroClient’s data_series_list.

For use with get_df().

Parameters

data_series (dict) – A single data_series object, as returned by get_data_series() or find_data_series(). See https://developers.gro-intelligence.com/data-series-definition.html

Returns

Return type

None

GroClient.get_data_series_list()

Inspect the current list of saved data series contained in the GroClient.

For use with get_df(). Add new data series to the list using add_data_series() and add_single_data_series().

Returns

A list of data_series objects, as returned by get_data_series().

Return type

list of dicts

Crop Modeling

CropModel.compute_weights(crop_name, metric_name, regions)

Compute a vector of ‘weights’ that can be used for crop-weighted average across regions, as in compute_crop_weighted_series().

For each region, the weight is the mean value over time of the given metric for the given crop, normalized so that the weights sum to 1.0 across all regions.

For example, say we have region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(). If model.compute_weights('soybeans', 'land cover area', region_list) returns [0.6, 0.4], then Province1 has 60% and Province2 has 40% of the total area planted across the two regions, averaged over all time.

Parameters
  • crop_name (string) –

  • metric_name (string) –

  • regions (list of dicts) – Each entry is a region with id and name

Returns

Weights corresponding to the regions.

Return type

list of floats
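
The weight definition above (mean over time per region, normalized to sum to 1.0) can be written out in a few lines. The sketch below is an assumption-laden illustration rather than the library's implementation: weights_from_series is a hypothetical helper that takes the per-region values directly instead of fetching them, and the numbers are made up to reproduce the [0.6, 0.4] example.

```python
from statistics import mean


def weights_from_series(values_by_region):
    """Return one weight per region: its mean value divided by the sum of means.

    values_by_region holds one list of values over time per region.
    """
    means = [mean(values) for values in values_by_region]
    total = sum(means)
    return [region_mean / total for region_mean in means]


# Made-up planted areas over two periods for Province1 and Province2.
weights = weights_from_series([[50.0, 70.0], [30.0, 50.0]])
print(weights)
```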

CropModel.compute_crop_weighted_series(weighting_crop_name, weighting_metric_name, item_name, metric_name, regions)

Compute the ‘crop-weighted average’ of the series for the given item and metric, across regions. The weight of a region is the fraction of the value of the weighting series represented by that region as explained in compute_weights().

For example, say we have region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(). Then model.compute_crop_weighted_series('soybeans', 'land cover area', 'vegetation ndvi', 'vegetation indices index', region_list) returns a dataframe in which the NDVI of each province is multiplied by the fraction of the total soybean area accounted for by that province. Taking the sum across provinces therefore gives a crop-weighted average of NDVI.

Parameters
  • weighting_crop_name (string) –

  • weighting_metric_name (string) –

  • item_name (string) –

  • metric_name (string) –

  • regions (list of dicts) – Each entry is a region with id and name

Returns

A dataframe containing the data series for the given item_name and metric_name for each region in regions, with values adjusted by the crop weight for that region.

Return type

pandas.DataFrame
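
The arithmetic behind the crop-weighted average can be sketched without pandas. This is a simplified illustration, assuming the per-region series are already aligned on the same dates; crop_weighted_average is a hypothetical helper and the NDVI values and weights are made up.

```python
def crop_weighted_average(series_by_region, weights):
    """Return the weighted sum across regions for each date.

    series_by_region holds one list of values per region, aligned by date;
    weights holds one weight per region (summing to 1.0).
    """
    weighted = [[weight * value for value in series]
                for series, weight in zip(series_by_region, weights)]
    # zip(*weighted) regroups the values date by date across regions.
    return [sum(values_on_date) for values_on_date in zip(*weighted)]


ndvi_by_region = [[0.5, 0.7], [0.25, 0.5]]  # made-up NDVI for two provinces
weights = [0.6, 0.4]                        # e.g. as returned by compute_weights()
average = crop_weighted_average(ndvi_by_region, weights)
print(average)
```

The intermediate weighted list corresponds to the per-region adjusted values described above; the final sum across regions is the crop-weighted average.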

CropModel.compute_gdd(tmin_series, tmax_series, base_temperature, start_date, end_date, min_temporal_coverage, upper_temperature_cap)

Compute Growing Degree Days value from specific data series.

This function performs the low-level computation used in growing_degree_days().

Parameters
  • tmin_series (dict) – A data series object for min temperature e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}

  • tmax_series (dict) – A data series object for max temperature e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}

  • base_temperature (number) –

  • start_date (string) – YYYY-MM-DD date

  • end_date (string) – YYYY-MM-DD date

  • min_temporal_coverage (float, optional) –

  • upper_temperature_cap (number, optional) –

Returns

The sum of the GDD over all days in the interval

Return type

number

CropModel.growing_degree_days(region_name, base_temperature, start_date, end_date, min_temporal_coverage=1.0, upper_temperature_cap=inf)

Get Growing Degree Days (GDD) for a region.

Growing degree days (GDD) are a weather-based indicator that allows for assessing crop phenology and crop development, based on heat accumulation. GDD for one day is defined as max(T_mean - T_base, 0), where T_mean is the average temperature of that day if available. Typically T_mean is approximated as (T_max + T_min)/2. If upper_temperature_cap is specified, T_mean is capped to not exceed that value.

The GDD over a longer time interval is the sum of the GDD over all days in the interval. Days where the data is missing contribute 0 GDDs, i.e. are treated as if T_mean = T_base. Use the temporal coverage threshold to avoid computing GDD with too little data.

The threshold and the base temperature should be carefully selected based on a fundamental understanding of the crops and region of interest.

The region can be any Gro region, from a point location to a district, province, etc. This will use the best available data series for T_max and T_min for the given region and time period, found using find_data_series(). In the simplest case, if the given region is a weather station location which has data for the time period, then that will be used. If it’s a district or other region, the underlying data could be from one or more weather stations and/or satellite. To bypass the search for available series, use compute_gdd() directly.

Parameters
  • region_name (string) –

  • base_temperature (number) –

  • start_date (string) – YYYY-MM-DD date

  • end_date (string) – YYYY-MM-DD date

  • min_temporal_coverage (float, optional) –

  • upper_temperature_cap (number, optional) –

Returns

The sum of the GDD over all days in the interval

Return type

number
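
The per-day formula and the treatment of missing days described for growing_degree_days() can be sketched in plain Python. This is a hedged illustration and not the library's code: gdd_sketch is a hypothetical helper that takes aligned daily lists directly, uses None for a missing reading, and returns the temporal coverage alongside the total so the caller can apply a threshold. Defining coverage as the fraction of days with both readings is an assumption.

```python
import math


def gdd_sketch(tmin_by_day, tmax_by_day, base_temperature,
               upper_temperature_cap=math.inf):
    """Return (total GDD, temporal coverage) for aligned daily readings."""
    total = 0.0
    days_with_data = 0
    for tmin, tmax in zip(tmin_by_day, tmax_by_day):
        if tmin is None or tmax is None:
            continue  # a missing day contributes 0 GDD
        days_with_data += 1
        # T_mean approximated as (T_max + T_min) / 2, then capped.
        t_mean = min((tmax + tmin) / 2, upper_temperature_cap)
        total += max(t_mean - base_temperature, 0)
    # Assumption: coverage is the fraction of days with both readings.
    coverage = days_with_data / len(tmin_by_day) if tmin_by_day else 0.0
    return total, coverage


total, coverage = gdd_sketch([10, 12, None, 20], [20, 24, 30, 40],
                             base_temperature=10, upper_temperature_cap=25)
print(total, coverage)
```

With these made-up readings the three days with data contribute 5, 8 and 15 degree days (the last one after capping T_mean at 25), and the coverage is 0.75, which a caller could compare against min_temporal_coverage before trusting the total.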