API Reference

Basic Exploration

GroClient.lookup(entity_type, entity_ids)[source]

Retrieve details about a given id or list of ids of type entity_type.

https://developers.gro-intelligence.com/gro-ontology.html

Parameters:
  • entity_type ({ 'metrics', 'items', 'regions', 'frequencies', 'sources', 'units' }) –
  • entity_ids (int or list of ints) –
Returns:

A dict with entity details is returned if an integer is given for entity_ids. A dict of dicts with entity details, keyed by id, is returned if a list of integers is given for entity_ids.

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated corn plant <i>Zea mays</i>,'
                " which is one of the world's most popular grains." }

Example:

{   '274': {
        'id': 274,
        'contains': [779, 780, ...],
        'belongsTo': [4138, 8830, ...],
        'name': 'Corn',
        'definition': 'The seeds of the widely cultivated corn plant'
                      " <i>Zea mays</i>, which is one of the world's most popular"
                      ' grains.'
    },
    '270': {
        'id': 270,
        'contains': [1737, 7401, ...],
        'belongsTo': [8830, 9053, ...],
        'name': 'Soybeans',
        'definition': 'The seeds and harvested crops of plants belonging to the'
                      ' species <i>Glycine max</i> that are used in the production'
                      ' of oil and both human and livestock consumption.'
    }
}

Return type:

dict or dict of dicts
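
Because the return shape depends on whether entity_ids is an integer or a list, calling code may want to handle both cases uniformly. The following is a minimal sketch of one way to do that. normalize_lookup_result is a hypothetical helper, not part of GroClient, and it assumes that a single-entity result always carries a top-level 'id' key, as in the examples above.

```python
def normalize_lookup_result(result):
    """Return a list of entity dicts for either lookup() return shape.

    Hypothetical helper, not part of GroClient. Assumes a single-entity
    result has a top-level 'id' key, as in the documented examples.
    """
    if 'id' in result:
        # Single entity: lookup() was called with one integer id.
        return [result]
    # Dict of dicts keyed by id: lookup() was called with a list of ids.
    return list(result.values())


# Abbreviated versions of the documented example outputs.
single = {'id': 274, 'name': 'Corn'}
many = {'274': {'id': 274, 'name': 'Corn'},
        '270': {'id': 270, 'name': 'Soybeans'}}

names_single = [entity['name'] for entity in normalize_lookup_result(single)]
names_many = sorted(entity['name'] for entity in normalize_lookup_result(many))
```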

GroClient.search(entity_type, search_terms)[source]

Search for the given search term. Better matches appear first.

Parameters:
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
  • search_terms (string) –
Returns:

Example:

[{'id': 5604}, {'id': 10204}, {'id': 10210}, ...]

Return type:

list of dicts

GroClient.search_and_lookup(entity_type, search_terms, num_results=10)[source]

Search for the given search terms and look up their details.

For each result, yield a dict of the entity and its properties.

Parameters:
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
  • search_terms (string) –
  • num_results (int) – Maximum number of results to return. Defaults to 10.
Yields:

dict – Result from search() passed to lookup() to get additional details.

Example:

{ 'id': 274,
  'contains': [779, 780, ...],
  'name': 'Corn',
  'definition': 'The seeds of the widely cultivated...' }

See output of lookup(). Note that as with search(), the first result is the best match for the given search term(s).
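
The way search() and lookup() combine here can be sketched with a generator. This is an illustration under stated assumptions, not the library's implementation: StubClient is a stand-in with canned data so the example runs without network access, and search_and_lookup_sketch is a hypothetical name.

```python
from itertools import islice


class StubClient:
    """Stand-in for GroClient with canned data, for illustration only."""

    _entities = {
        274: {'id': 274, 'name': 'Corn'},
        270: {'id': 270, 'name': 'Soybeans'},
    }

    def search(self, entity_type, search_terms):
        # Pretend the service ranked these ids, best match first.
        return [{'id': 274}, {'id': 270}]

    def lookup(self, entity_type, entity_id):
        return self._entities[entity_id]


def search_and_lookup_sketch(client, entity_type, search_terms, num_results=10):
    """Yield details for at most num_results search hits, best match first."""
    hits = client.search(entity_type, search_terms)
    for hit in islice(hits, num_results):
        yield client.lookup(entity_type, hit['id'])


results = list(search_and_lookup_sketch(StubClient(), 'items', 'corn',
                                        num_results=1))
```

Because the function is a generator, details are only looked up for the results the caller actually consumes.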

GroClient.search_for_entity(entity_type, keywords)[source]

Returns the first result of entity_type that matches the given keywords.

Parameters:
  • entity_type ({ 'metrics', 'items', 'regions', 'sources' }) –
  • keywords (string) –
Returns:

The id of the first search result

Return type:

integer

GroClient.get_data_series(**selection)[source]

Get available data series for the given selections.

https://developers.gro-intelligence.com/data-series-definition.html

Parameters:
  • metric_id (integer, optional) –
  • item_id (integer, optional) –
  • region_id (integer, optional) –
  • partner_region_id (integer, optional) –
  • source_id (integer, optional) –
  • frequency_id (integer, optional) –
Returns:

Example:

[{ 'metric_id': 2020032, 'metric_name': 'Seed Use',
   'item_id': 274, 'item_name': 'Corn',
   'region_id': 1215, 'region_name': 'United States',
   'source_id': 24, 'source_name': 'USDA FEEDGRAINS',
   'frequency_id': 7,
   'start_date': '1975-03-01T00:00:00.000Z',
   'end_date': '2018-05-31T00:00:00.000Z'
 }, { ... }, ... ]

Return type:

list of dicts

GroClient.find_data_series(result_filter=None, **kwargs)[source]

Find data series matching a combination of entities specified by name and yield them ranked by coverage.

Example:

client.find_data_series(item="Corn",
                        metric="Futures Open Interest",
                        region="United States of America")

will yield a sequence of dictionaries of the form:

{ 'metric_id': 15610005, 'metric_name': 'Futures Open Interest',
  'item_id': 274, 'item_name': 'Corn',
  'region_id': 1215, 'region_name': 'United States',
  'frequency_id': 15, 'source_id': 81,
  'start_date': '1972-03-01T00:00:00.000Z', ...},
{ ... },  ...

See https://developers.gro-intelligence.com/data-series-definition.html

result_filter can be used to filter entity searches. For example:

client.find_data_series(item="vegetation",
                        metric="vegetation indices",
                        region="Central",
                        result_filter=lambda r: ('region_id' not in r or
                                                 r['region_id'] == 10393))

will only consider that particular region, and not the many other regions with the same name.

This method uses search(), get_data_series(), get_available_timefrequency() and rank_series_by_source().

Parameters:
  • metric (string, optional) –
  • item (string, optional) –
  • region (string, optional) –
  • partner_region (string, optional) –
  • start_date (string, optional) – YYYY-MM-DD
  • end_date (string, optional) – YYYY-MM-DD
  • result_filter (function, optional) – function taking data series selection dict returning boolean
Yields:

dict – A sequence of data series matching the input selections
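
The result_filter pattern shown above can be factored into a small reusable predicate. The sketch below is one possible way to write it. only_region is a hypothetical helper, and the candidate dicts are made up to show which ones pass.

```python
def only_region(region_id):
    """Build a result_filter that rejects every region except region_id.

    Candidates without a 'region_id' key (for example, item or metric
    matches) pass through unchanged.
    """
    def result_filter(candidate):
        return ('region_id' not in candidate
                or candidate['region_id'] == region_id)
    return result_filter


# Made-up candidates; 321 and 99999 are placeholder ids, not real Gro ids.
candidates = [
    {'item_id': 321},
    {'region_id': 10393},
    {'region_id': 99999},
]
keep = only_region(10393)
kept = [candidate for candidate in candidates if keep(candidate)]
```

The returned function has the shape described for result_filter: it takes a data series selection dict and returns a boolean.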

Data Retrieval

GroClient.get_data_points(**selections)[source]

Get all the data points for a given selection.

https://developers.gro-intelligence.com/data-point-definition.html

Example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': 1215,
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}]

Note: you can pass the output of get_data_series() into get_data_points() to check what series exist for some selections and then retrieve the data points for those series. See quick_start.py for an example of this.

get_data_points() also allows passing a list of ids for metric_id, item_id, and/or region_id to get multiple series in a single request. This can be faster if requesting many series.

For example:

client.get_data_points(**{'metric_id': 860032,
                          'item_id': 274,
                          'region_id': [1215, 1216],
                          'frequency_id': 9,
                          'source_id': 2,
                          'start_date': '2017-01-01',
                          'end_date': '2017-12-31',
                          'unit_id': 15})

Returns:

[{  'start_date': '2017-01-01T00:00:00.000Z',
    'end_date': '2017-12-31T00:00:00.000Z',
    'value': 408913833.8019222, 'unit_id': 15,
    'reporting_date': None,
    'metric_id': 860032, 'item_id': 274, 'region_id': 1215,
    'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
    'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1215,
        'frequency_id': 9,
        'source_id': 2
    }
}, { 'start_date': '2017-01-01T00:00:00.000Z',
     'end_date': '2017-12-31T00:00:00.000Z',
     'value': 340614.19507563586, 'unit_id': 15,
     'reporting_date': None,
     'metric_id': 860032, 'item_id': 274, 'region_id': 1216,
     'partner_region_id': 0, 'frequency_id': 9, 'source_id': 2,
     'belongs_to': {
        'metric_id': 860032,
        'item_id': 274,
        'region_id': 1216,
        'frequency_id': 9,
        'source_id': 2
     }
}]

Parameters:
  • metric_id (integer or list of integers) – How something is measured, e.g. “Export Value” or “Area Harvested”
  • item_id (integer or list of integers) – What is being measured, e.g. “Corn” or “Rainfall”
  • region_id (integer or list of integers) – Where something is being measured, e.g. “United States Corn Belt” or “China”
  • partner_region_id (integer or list of integers, optional) – partner_region refers to an interaction between two regions, like trade or transportation. For example, for an Export metric, the “region” would be the exporter and the “partner_region” would be the importer. For most series, this can be excluded or set to 0 (“World”) by default.
  • source_id (integer) –
  • frequency_id (integer) –
  • unit_id (integer, optional) –
  • start_date (string, optional) – All points with end dates equal to or after this date
  • end_date (string, optional) – All points with start dates equal to or before this date
  • show_revisions (boolean, optional) – False by default, meaning only the latest value for each period is returned. If True, all values for a given period are returned, differentiated by the reporting_date field.
  • insert_null (boolean, optional) – False by default. If True, will include a data point with a None value for each period that does not have data.
  • at_time (string, optional) – Estimate what data would have been available via Gro at a given time in the past. See at-time-query-examples.ipynb for more details.
  • include_historical (boolean, optional) – True by default, meaning historical regions that are part of your selections are included.
Returns:

Return type:

list of dicts
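
The start_date and end_date rules above amount to an interval-overlap test: a point is kept when its period intersects the requested range, even if it is not fully inside it. The sketch below restates that rule in plain Python. overlaps is a hypothetical helper, and it assumes dates arrive as ISO strings like those in the example output.

```python
def overlaps(point, start_date=None, end_date=None):
    """Return True if a data point's period intersects the query range.

    Mirrors the documented rule: keep points whose end date is on or after
    start_date and whose start date is on or before end_date. Dates are
    compared as YYYY-MM-DD strings, which sort chronologically.
    """
    point_start = point['start_date'][:10]
    point_end = point['end_date'][:10]
    if start_date is not None and point_end < start_date:
        return False
    if end_date is not None and point_start > end_date:
        return False
    return True


# An annual point, shaped like the documented example output.
point_2017 = {'start_date': '2017-01-01T00:00:00.000Z',
              'end_date': '2017-12-31T00:00:00.000Z'}

mid_year = overlaps(point_2017, start_date='2017-06-01', end_date='2017-06-30')
next_year = overlaps(point_2017, start_date='2018-01-01')
```

Under this reading, a query for June 2017 would still match an annual 2017 point, because the two periods overlap.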

Geographic

GroClient.get_geojson(region_id, zoom_level=7)[source]

Given a region ID, return shape information in geojson.

Parameters:
  • region_id (integer) –
  • zoom_level (integer, optional) – Allowed values are 1 to 8. Valid if include_geojson equals True. If a zoom level below 6 is specified, a simplified shapefile is returned. Otherwise, the detailed shapefile is used.
Returns:

Example:

{ 'type': 'GeometryCollection',
'geometries': [{'type': 'MultiPolygon',
                'coordinates': [[[[-38.394, -4.225], ...]]]}, ...]}

Return type:

a geojson object or None
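
A common next step is to reduce the returned shape to a bounding box. The sketch below shows one way to do that for a GeometryCollection shaped like the example above. bounding_box and iter_positions are hypothetical helpers, the sample coordinates are made up, and positions are assumed to be [longitude, latitude] pairs as in standard GeoJSON.

```python
def iter_positions(coords):
    """Yield [x, y] positions from arbitrarily nested coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for child in coords:
            yield from iter_positions(child)


def bounding_box(geojson):
    """Return (min_x, min_y, max_x, max_y), or None if there is no shape."""
    if not geojson:
        # get_geojson() may return None.
        return None
    positions = [pos
                 for geometry in geojson.get('geometries', [])
                 for pos in iter_positions(geometry['coordinates'])]
    if not positions:
        return None
    xs = [pos[0] for pos in positions]
    ys = [pos[1] for pos in positions]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up rectangle, nested the way a MultiPolygon nests its rings.
sample = {
    'type': 'GeometryCollection',
    'geometries': [{
        'type': 'MultiPolygon',
        'coordinates': [[[[0.0, 0.0], [2.0, 0.0], [2.0, 1.0],
                          [0.0, 1.0], [0.0, 0.0]]]],
    }],
}
box = bounding_box(sample)
```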

GroClient.get_descendant_regions(region_id, descendant_level=None, include_historical=True, include_details=True)[source]

Look up details of all regions of the given level contained by a region.

Given any region by id, get all the descendant regions that are of the specified level.

Parameters:
  • region_id (integer) –
  • descendant_level (integer, optional) – The region level of interest. See REGION_LEVELS constant. If not provided, get all descendants.
  • include_historical (boolean, optional) – True by default. If False is specified, regions that only exist in historical data (e.g. the Soviet Union) will be excluded.
  • include_details (boolean, optional) – True by default. Will perform a lookup() on each descendant region to find name, latitude, longitude, etc. If this option is set to False, only ids of descendant regions will be returned, which makes execution significantly faster.
Returns:

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
}, {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type:

list of dicts

GroClient.get_provinces(country_name)[source]

Given the name of a country, find its provinces.

Parameters:
  • country_name (string) –
Returns:

Example:

[{
    'id': 13100,
    'contains': [139839, 139857, ...],
    'name': 'Wisconsin',
    'level': 4
}, {
    'id': 13101,
    'contains': [139891, 139890, ...],
    'name': 'Wyoming',
    'level': 4
}, ...]

See output of lookup()

Return type:

list of dicts

Advanced Exploration

GroClient.lookup_belongs(entity_type, entity_id)[source]

Look up details of entities containing the given entity.

Parameters:
  • entity_type ({ 'metrics', 'items', 'regions' }) –
  • entity_id (int) –
Yields:

dict – Result of lookup() on each entity the given entity belongs to.

For example, for the region ‘United States’, one yielded result will be for ‘North America’, in a format that matches the output of lookup():

{ 'id': 15,
  'contains': [ 1008, 1009, 1012, 1215, ... ],
  'name': 'North America',
  'level': 2 }

GroClient.rank_series_by_source(selections_list)[source]

Given a list of series selections, for each unique combination excluding source, expand to all available sources and return them in ranked order. The order corresponds to how well each source covers the selection (metrics, items, regions, time range, and frequency).

Parameters:
  • selections_list (list of dicts) – See the output of get_data_series().
Yields:

dict – The input selections_list, expanded out to each possible source, ordered by coverage.

GroClient.get_available_timefrequency(**selection)[source]

Given a selection, return a list of frequencies and time ranges. The results are ordered by coverage-optimized ranking.

Parameters:
  • metric_id (integer, optional) –
  • item_id (integer, optional) –
  • region_id (integer, optional) –
  • partner_region_id (integer, optional) –
Returns:

Example:

[{
   'start_date': '2000-02-18T00:00:00.000Z',
   'frequency_id': 3,
   'end_date': '2020-03-12T00:00:00.000Z',
   'name': '8-day'
 }, {
   'start_date': '2019-09-02T00:00:00.000Z',
   'frequency_id': 1,
   'end_date': '2020-03-09T00:00:00.000Z',
   'name': 'daily'}, ... ]

Return type:

list of dicts
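
The returned list can be post-processed with ordinary Python. As an illustration, the sketch below picks the entry with the longest date span, which is a simpler criterion than the coverage-optimized ranking the method itself applies. coverage_days is a hypothetical helper, and the timestamp format is assumed from the example output above.

```python
from datetime import datetime

# Timestamp layout assumed from the documented example output.
ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def coverage_days(entry):
    """Return the number of days between an entry's start and end dates."""
    start = datetime.strptime(entry['start_date'], ISO_FORMAT)
    end = datetime.strptime(entry['end_date'], ISO_FORMAT)
    return (end - start).days


# The two entries from the documented example.
available = [
    {'start_date': '2000-02-18T00:00:00.000Z', 'frequency_id': 3,
     'end_date': '2020-03-12T00:00:00.000Z', 'name': '8-day'},
    {'start_date': '2019-09-02T00:00:00.000Z', 'frequency_id': 1,
     'end_date': '2020-03-09T00:00:00.000Z', 'name': 'daily'},
]
longest = max(available, key=coverage_days)
```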

GroClient.get_top(entity_type, num_results=5, **selection)[source]

Find the data series with the highest cumulative value for the given time range.

Examples:

# To get FAO's top 5 corn-producing countries of all time:
client.get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2)

# To get FAO's top 5 corn-producing countries of 2014:
client.get_top('regions', metric_id=860032, item_id=274, frequency_id=9, source_id=2,
               start_date='2014-01-01', end_date='2014-12-31')

# To get the United States' top 15 exports in the decade of 2010-2019:
client.get_top('items', num_results=15, metric_id=20032, region_id=1215, frequency_id=9,
               source_id=2, start_date='2010-01-01', end_date='2019-12-31')

Parameters:
  • entity_type ({ 'items', 'regions' }) – The entity type to rank, all other selections being the same. Only items and regions are rankable at this time.
  • num_results (integer, optional) – How many data series to rank. Top 5 by default.
  • metric_id (integer) –
  • item_id (integer) – Required if requesting top regions. Disallowed if requesting top items.
  • region_id (integer) – Required if requesting top items. Disallowed if requesting top regions.
  • partner_region_id (integer, optional) –
  • frequency_id (integer) –
  • source_id (integer) –
  • start_date (string, optional) – If not provided, the cumulative value used for ranking will include data points as far back as the source provides.
  • end_date (string, optional) –
Returns:

Example:

[
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 400, 'unitId': 14},
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 395, 'unitId': 14},
    {'metricId': 860032, 'itemId': 274, 'regionId': 1215, 'frequencyId': 9,
     'sourceId': 2, 'value': 12, 'unitId': 14},
]

Along with the series attributes, value and unit are also given for the total cumulative value the series are ranked by. You may then use the results to call get_data_points() to get the individual time series points.

Return type:

list of dicts
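
Note that the example results use camelCase keys (metricId), while the get_data_points() parameters are documented in snake_case (metric_id). Whether the client accepts the camelCase form directly is not stated here, so the sketch below converts the keys explicitly. to_selection is a hypothetical helper, not part of GroClient.

```python
import re


def to_selection(top_result):
    """Convert one get_top() result into snake_case selection keys.

    Hypothetical helper: drops 'value', which describes the ranking total
    rather than the series, and renames keys such as 'metricId' to
    'metric_id'.
    """
    selection = {}
    for key, val in top_result.items():
        if key == 'value':
            continue
        # Insert an underscore before each capital letter, then lowercase.
        snake_key = re.sub(r'(?<!^)(?=[A-Z])', '_', key).lower()
        selection[snake_key] = val
    return selection


# First entry of the documented example output.
top_result = {'metricId': 860032, 'itemId': 274, 'regionId': 1215,
              'frequencyId': 9, 'sourceId': 2, 'value': 400, 'unitId': 14}
selection = to_selection(top_result)
```

The resulting dict could then be unpacked into a call such as client.get_data_points(**selection).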

Pandas Utils

GroClient.get_df(show_revisions=False, index_by_series=False)[source]

Call get_data_points() for each saved data series and return as a combined dataframe.

Note you must have first called either add_data_series() or add_single_data_series() to save data series into the GroClient’s data_series_list. You can inspect the client’s saved list using get_data_series_list().

Returns:

The results of get_data_points() for all the saved series, appended together into a single dataframe (see https://developers.gro-intelligence.com/data-point-definition.html). If index_by_series is set, the dataframe is indexed by series (see https://developers.gro-intelligence.com/data-series-definition.html).

Return type:

pandas.DataFrame

GroClient.add_data_series(**kwargs)[source]

Adds the top result of find_data_series() to the saved data series list.

For use with get_df().

Parameters:
  • metric (string, optional) –
  • item (string, optional) –
  • region (string, optional) –
  • partner_region (string, optional) –
  • start_date (string, optional) – YYYY-MM-DD
  • end_date (string, optional) – YYYY-MM-DD
  • result_filter (function, optional) – function taking data series selection dict returning boolean
Returns:

The data_series that was added or None if none were found.

Return type:

data_series object, as returned by get_data_series().

GroClient.add_single_data_series(data_series)[source]

Save a data series object to the GroClient’s data_series_list.

For use with get_df().

Parameters:
  • data_series (dict) – A single data_series object, as returned by get_data_series() or find_data_series(). See https://developers.gro-intelligence.com/data-series-definition.html
Returns:

Return type:

None

GroClient.get_data_series_list()[source]

Inspect the current list of saved data series contained in the GroClient.

For use with get_df(). Add new data series to the list using add_data_series() and add_single_data_series().

Returns:

A list of data_series objects, as returned by get_data_series().

Return type:

list of dicts

Crop Modeling

CropModel.compute_weights(crop_name, metric_name, regions)[source]

Compute a vector of ‘weights’ that can be used for a crop-weighted average across regions, as in compute_crop_weighted_series().

For each region, the weight is the mean value over time of the given metric for the given crop, normalized so the sum across all regions is 1.0.

For example, say we have region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or get_descendant_regions(), for example. If model.compute_weights('soybeans', 'land cover area', region_list) returns [0.6, 0.4], then Province1 has 60% and Province2 has 40% of the total area planted across the two regions, when averaged across all time.

Parameters:
  • crop_name (string) –
  • metric_name (string) –
  • regions (list of dicts) – Each entry is a region with id and name
Returns:

Weights corresponding to the regions, in the same order.

Return type:

list of floats
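
The normalization described above can be written out in a few lines. This is a sketch of the arithmetic only, not the CropModel implementation: weights_from_series is a hypothetical function that takes already-fetched values, and the planted-area numbers are made up to reproduce the 60/40 split from the example.

```python
def weights_from_series(values_by_region):
    """Normalize each region's mean value so that the weights sum to 1.0.

    values_by_region is a list with one list of numbers per region, in the
    same order as the regions list.
    """
    means = [sum(values) / len(values) for values in values_by_region]
    total = sum(means)
    return [mean / total for mean in means]


# Made-up planted-area values for Province1 and Province2 over two years.
weights = weights_from_series([[50.0, 70.0], [30.0, 50.0]])
```

Here the regional means are 60 and 40, so the weights come out close to 0.6 and 0.4.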

CropModel.compute_crop_weighted_series(weighting_crop_name, weighting_metric_name, item_name, metric_name, regions, weighting_func=<function <lambda>>)[source]

Compute the ‘crop-weighted average’ of the series for the given item and metric, across regions. The weight of a region is the fraction of the value of the weighting series represented by that region as explained in compute_weights().

For example, say we have region_list = [{'id': 1, 'name': 'Province1'}, {'id': 2, 'name': 'Province2'}]. This could be a list returned by search_and_lookup() or client.get_descendant_regions(), for example. Now model.compute_crop_weighted_series('soybeans', 'land cover area', 'vegetation ndvi', 'vegetation indices index', region_list) will return a dataframe where the NDVI of each province is multiplied by the fraction of total soybean area accounted for by that province. Taking the sum across provinces thus gives a crop-weighted average of NDVI.

Parameters:
  • weighting_crop_name (string) –
  • weighting_metric_name (string) –
  • item_name (string) –
  • metric_name (string) –
  • regions (list of dicts) – Each entry is a region with id and name
  • weighting_func (optional function) – A function of (weight, value) to apply. Default: weight*value
Returns:

A dataframe containing the data series for the given item_name and metric_name, for each region in regions, with values adjusted by the crop weight for that region.

Return type:

pandas.DataFrame
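
The weighting step can be illustrated without pandas. The sketch below uses plain lists in place of the dataframe the method returns, and it goes one step further by summing across regions to produce the weighted average. crop_weighted_average is a hypothetical function, and the NDVI values are made up.

```python
def crop_weighted_average(series_by_region, weights,
                          weighting_func=lambda weight, value: weight * value):
    """Combine per-region series into one crop-weighted series.

    series_by_region holds one equally long list of values per region, and
    weights holds one weight per region in the same order.
    """
    weighted = [[weighting_func(weight, value) for value in series]
                for series, weight in zip(series_by_region, weights)]
    # Summing across regions at each time step gives the weighted average.
    return [sum(step) for step in zip(*weighted)]


# Made-up NDVI values for two provinces at two time steps.
ndvi = [[0.5, 0.7], [0.3, 0.5]]
average = crop_weighted_average(ndvi, [0.6, 0.4])
```

With weights of 0.6 and 0.4, the first time step works out to roughly 0.6 * 0.5 + 0.4 * 0.3 = 0.42.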

CropModel.compute_gdd(tmin_series, tmax_series, base_temperature, start_date, end_date, min_temporal_coverage, upper_temperature_cap)[source]

Compute Growing Degree Days value from specific data series.

This function performs the low-level computation used in growing_degree_days().

Parameters:
  • tmin_series (dict) – A data series object for min temperature, e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}
  • tmax_series (dict) – A data series object for max temperature, e.g. {'metric_id': 1, 'item_id': 2, 'region_id': 3, 'source_id': 4, 'frequency_id': 5}
  • base_temperature (number) –
  • start_date (string) – YYYY-MM-DD date
  • end_date (string) – YYYY-MM-DD date
  • min_temporal_coverage (float, optional) –
  • upper_temperature_cap (number, optional) –
Returns:

The sum of the GDD over all days in the interval

Return type:

number

CropModel.growing_degree_days(region_name, base_temperature, start_date, end_date, min_temporal_coverage=1.0, upper_temperature_cap=inf)[source]

Get Growing Degree Days (GDD) for a region.

Growing degree days (GDD) are a weather-based indicator that allows for assessing crop phenology and crop development, based on heat accumulation. GDD for one day is defined as max(T_mean - T_base, 0), where T_mean is the average temperature of that day if available. Typically T_mean is approximated as (T_max + T_min)/2. If upper_temperature_cap is specified, T_mean is capped to not exceed that value.

The GDD over a longer time interval is the sum of the GDD over all days in the interval. Days where the data is missing contribute 0 GDDs, i.e. are treated as if T_mean = T_base. Use the temporal coverage threshold to avoid computing GDD with too little data.

The threshold and the base temperature should be carefully selected based on a fundamental understanding of the crops and region of interest.

The region can be any Gro region, from a point location to a district, province, etc. This will use the best available data series for T_max and T_min for the given region and time period, found with find_data_series(). In the simplest case, if the given region is a weather station location that has data for the time period, then that will be used. If it’s a district or other region, the underlying data could be from one or more weather stations and/or satellite. To bypass the search for available series, use compute_gdd() directly.

Parameters:
  • region_name (string) –
  • base_temperature (number) –
  • start_date (string) – YYYY-MM-DD date
  • end_date (string) – YYYY-MM-DD date
  • min_temporal_coverage (float, optional) –
  • upper_temperature_cap (number, optional) –
Returns:

The sum of the GDD over all days in the interval

Return type:

number
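
The GDD definition above can be sketched directly from daily minimum and maximum temperatures. This is an illustration of the formula, not the CropModel implementation. gdd_sketch is a hypothetical function, missing days are represented by None, and returning None when coverage is too low is an assumption, since the behavior in that case is not documented here.

```python
import math


def gdd_sketch(tmin_by_day, tmax_by_day, base_temperature,
               min_temporal_coverage=1.0, upper_temperature_cap=math.inf):
    """Sum daily growing degree days over paired min/max temperatures.

    A missing day is represented by None and contributes 0. Returns None
    when the share of days with data is below min_temporal_coverage; that
    return value is an assumption made for this sketch.
    """
    days = list(zip(tmin_by_day, tmax_by_day))
    present = [(tmin, tmax) for tmin, tmax in days
               if tmin is not None and tmax is not None]
    if not days or len(present) / len(days) < min_temporal_coverage:
        return None
    total = 0.0
    for tmin, tmax in present:
        # Approximate T_mean as the midpoint, then apply the upper cap.
        t_mean = min((tmax + tmin) / 2, upper_temperature_cap)
        total += max(t_mean - base_temperature, 0)
    return total


# Made-up daily temperatures in degrees Celsius; the third day has no data.
tmin = [10.0, 4.0, None, 20.0]
tmax = [20.0, 8.0, None, 40.0]
total_gdd = gdd_sketch(tmin, tmax, base_temperature=10.0,
                       min_temporal_coverage=0.75, upper_temperature_cap=25.0)
```

In this example the four days contribute 5, 0, 0 and 15 degree days: the second day is below the base temperature, the third is missing, and the fourth has its mean of 30 capped at 25.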