Stata Geocoding Tutorial

This is a tutorial for using the OpenCage geocoding API in Stata.

Background

Before we dive into the tutorial

  1. Sign up for an OpenCage geocoding API key.
  2. Play with the demo page, so that you see the actual response the API returns.
  3. Browse the API reference, so you understand the optional parameters, best practices, possible response codes, and the rate limiting on free trial accounts.

Install (or update) opencagegeo

opencagegeo is a Stata module written by Lars Zeigermann to access the OpenCage Geocoding API. You can find the newest version here.

* Install the Stata module and the two user-written Stata libraries it requires from SSC:
. ssc install opencagegeo
. ssc install libjson
. ssc install insheetjson

* If you already have opencagegeo installed, make sure you have the newest version:
. adoupdate opencagegeo, update

Geocoding a single address or pair of coordinates

To geocode a single address or pair of coordinates, you can use opencagegeoi, the immediate version of opencagegeo.
* First you need to save your API key to a global macro 'mykey'
. global mykey YOUR-API-KEY
. opencagegeoi YOUR-ADDRESS-HERE
. opencagegeoi YOUR-LATITUDE,YOUR-LONGITUDE

Batch geocode addresses (forward geocoding)

* If you have a dataset of addresses stored in a single string variable 'address'
. opencagegeo, key(YOUR-API-KEY) fulladdress(address)

* If your addresses are stored in separate variables, e.g. house number in 'num', street name in 'str', city in 'city', and country in 'ctry':
. opencagegeo, key(YOUR-API-KEY) number(num) street(str) city(city) country(ctry)
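
Before running a full batch, it can help to try the single-variable form on a tiny dataset. The following is a minimal sketch, not a definitive recipe: the three addresses are illustrative, and it assumes your API key has already been stored in the global macro 'mykey' as shown in the previous section. It makes no claim about which variables the module adds, so it simply inspects the result afterwards.

```stata
* Sketch only: build a three-row test dataset (illustrative addresses)
clear
input str60 address
"10 Downing Street, London, United Kingdom"
"1600 Pennsylvania Avenue NW, Washington, DC, USA"
"Platz der Republik 1, 11011 Berlin, Germany"
end

* Assumes the key was stored earlier with: global mykey YOUR-API-KEY
opencagegeo, key($mykey) fulladdress(address)

* Inspect whatever variables the module added to the dataset
describe
list in 1/3
```

A test run like this uses only three requests, so it is a cheap way to confirm that the key and the installation work before you spend your daily quota on the real data.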

Batch geocode coordinates (reverse geocoding)

* To geocode coordinates stored in a single variable 'coords' in the following format: latitude,longitude
. opencagegeo, key(YOUR-API-KEY) coordinates(coords)

* If your coordinates are stored in two separate variables 'lat' and 'lng'
. opencagegeo, key(YOUR-API-KEY) latitude(lat) longitude(lng)
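
If you prefer the single-variable form but your coordinates are held as numbers, one way to build the 'latitude,longitude' string is sketched below. It assumes numeric variables 'lat' and 'lng' and a key stored in the global macro 'mykey'. The six-decimal format is an arbitrary choice for illustration, and strtrim() requires Stata 14 or later.

```stata
* Sketch only: assumes numeric variables 'lat' and 'lng' are in memory
* and that the API key is stored in the global macro 'mykey'.

* Build a "latitude,longitude" string with six decimal places.
* strtrim() strips the leading blanks that the %12.6f format pads with.
* Rows with a missing coordinate are left as an empty string.
generate str30 coords = strtrim(string(lat, "%12.6f")) + "," + ///
    strtrim(string(lng, "%12.6f")) if !missing(lat, lng)

* Check the result before sending any requests
list lat lng coords in 1/5

opencagegeo, key($mykey) coordinates(coords)
```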

Learn more

. help opencagegeo

Troubleshooting common problems

  • If your dataset is of any significant size (more than 20,000 locations to geocode), please read our guide to geocoding large datasets, where we explain various strategies and points to consider.

  • Unfortunately, Stata does not cope well with API responses that contain place names with apostrophes, for example the Earl's Court area of London. The apostrophes in our JSON response cause Stata's JSON parsing engine to die, which in turn causes the program to die.

    Adding to the confusion, the opencagegeo module then unhelpfully falls back to a default error message, which says

    Invalid key, rate limit exceeded or no internet connection

    which is simply incorrect. You can test your API key by clicking on the "Sample request using this key" link in your account dashboard.

    So, how can you solve this and go forward with your geocoding? The only solution we have found is to determine which of your queries is causing the response that leads to the problem and exclude that query from your dataset. A tedious process, we know, and we sympathise.

    The other option is to use another programming language like R, Python, Matlab, etc. Happily, we have tutorials for all of those languages, but we also appreciate that it is not easy to jump to another language.

    Sorry. We welcome all suggestions as to how to prevent this bug. If anyone from StataCorp is reading this, please get in touch and we can supply examples. Stata is the only language where this seems to happen.

  • In older versions of this software there is an optional parameter paidkey which needs to be set if you are an OpenCage customer, so that the software can deal with the slight difference in format between free trial and paid responses. This is not needed in the newest version.
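
For the apostrophe problem described above, a rough first step is to flag queries that themselves contain an apostrophe and geocode the remaining rows separately. This is a sketch under stated assumptions rather than a fix. It assumes a string variable 'address' and a key stored in the global macro 'mykey'; the variable name 'has_apos' and the file name 'geocoded_no_apostrophes' are hypothetical. It also cannot catch every case, because a response may contain an apostrophe even when the query does not.

```stata
* Sketch only: assumes a string variable 'address' and the API key
* stored in the global macro 'mykey'.

* Flag queries that contain a straight apostrophe (hypothetical name 'has_apos')
generate byte has_apos = strpos(address, "'") > 0
list address if has_apos

* Geocode only the unflagged rows, working on a temporary copy
* so that the original dataset is restored afterwards
preserve
drop if has_apos
opencagegeo, key($mykey) fulladdress(address)
save geocoded_no_apostrophes, replace
restore
```

The flagged rows can then be reviewed by hand or geocoded with one of the other languages mentioned above.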
