Create a map of Budapest districts colored by income using folium in Python

Mor Kapronczay
HCLTech-Starschema Blog
4 min read · Jun 13, 2019


A vector map of Budapest. https://www.shutterstock.com/image-vector/black-white-vector-city-map-budapest-1035519106

Ever wondered how to draw a map of less common geographical areas and color them based on some data? This pair of tutorials shows how to build one from scratch! First, you need to construct the borders of your polygons; Part 1 is about this task. After that, you need to create a map and color those polygons according to a value you are interested in. That is shown in Part 2.

Part 1 of this tutorial is available here.

There are many tutorials on the internet for drawing maps in Python, including more sophisticated maps like heatmaps (where heat is basically the density of points in an area) or choropleth maps (where polygons are colored according to some arbitrary value). However, most of these tutorials work with the states of the United States. There are some great ones using plot.ly, and a superb one using folium. For the US, these packages have some convenience methods, but for the rest of the world, those methods are of little use. Other tutorials rely on a JSON file that just happens to be available, like this one with Altair. Obviously, none of these is a general solution to the problem of creating a map of areas of your choice. I aim to give a general solution in these articles.

In this second part you will:

  1. learn how to create a GeoJSON file, the basic way to plot polygons on a map
  2. understand geodataframes, by which you can add additional data to polygons
  3. create a beautiful choropleth map using folium.

The code is also available on GitHub:

What is a GeoJSON?

Python function creating geojson from a list of coordinates.

A GeoJSON is actually a JSON file with a predetermined structure. The function generates a GeoJSON file from a list of points by JSON-dumping a dictionary formatted like a GeoJSON file. Each point in the input list is itself a dictionary with only 3 keys: lat, lon and name. Lat and lon stand for the coordinates, while name identifies the shape that the point belongs to. The order of these dictionaries in the list is also critical; you can read about this in the first part of this tutorial.
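As a minimal sketch of such a function (not the original gist; the name `points_to_geojson` and the sample points are my own assumptions for illustration), the idea could look like this. It groups points by name, keeps their input order, closes each ring, and writes coordinates in the longitude, latitude order that GeoJSON expects.

```python
import json


def points_to_geojson(points):
    """Return a GeoJSON FeatureCollection string built from ordered points.

    `points` is a list of dicts with the keys "lat", "lon" and "name".
    Points sharing a name form one polygon, in the order they appear.
    """
    rings = {}  # shape name -> list of [lon, lat] pairs
    for point in points:
        rings.setdefault(point["name"], []).append([point["lon"], point["lat"]])

    features = []
    for name, ring in rings.items():
        # A GeoJSON polygon ring must end on the point it started from.
        if ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })

    return json.dumps({"type": "FeatureCollection", "features": features})


# Hypothetical sample input: one triangular shape called "A".
sample_points = [
    {"lat": 47.50, "lon": 19.00, "name": "A"},
    {"lat": 47.50, "lon": 19.10, "name": "A"},
    {"lat": 47.55, "lon": 19.05, "name": "A"},
]
geojson_str = points_to_geojson(sample_points)
```

Note that GeoJSON stores each coordinate as longitude first, then latitude, which is the opposite of what folium expects for map positions and markers.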

What is a GeoDataFrame?

GeoDataFrames are a subclass of classic `pandas` DataFrames with a special column. This special column, `geometry`, contains all the information that makes the DataFrame geo-aware. A GeoDataFrame object can easily be created from a GeoJSON file, additional data can easily be attached to its polygons, and it can just as easily be converted back to a GeoJSON file for visualization purposes.

GeoDataFrames are easy to work with!

In the snippet, centroids are added to each polygon. This lets us place a nice marker at the centroid's location showing information about the polygon, in this case the district. Just as easily, other data can be added using a plain pandas merge. Here, income tax per capita data is added to each polygon!
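Here is a small sketch of both steps with `geopandas`. The two square "districts", the column names and the income values are made up for illustration and are not the real Budapest data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Two toy squares standing in for district polygons (x = lon, y = lat).
districts = gpd.GeoDataFrame(
    {"name": ["I.", "II."]},
    geometry=[
        Polygon([(19.00, 47.48), (19.04, 47.48), (19.04, 47.52), (19.00, 47.52)]),
        Polygon([(19.04, 47.48), (19.08, 47.48), (19.08, 47.52), (19.04, 47.52)]),
    ],
    crs="EPSG:4326",
)

# Centroid of each polygon. geopandas warns that centroids computed in a
# geographic CRS are approximate, which is acceptable for placing markers.
centroids = districts.geometry.centroid
districts["centroid_lon"] = centroids.x
districts["centroid_lat"] = centroids.y

# Made-up income figures, joined to the polygons with a plain pandas merge.
income = pd.DataFrame(
    {"name": ["I.", "II."], "income_tax_per_capita": [100.0, 200.0]}
)
districts = districts.merge(income, on="name", how="left")

# Convert back to a GeoJSON string for plotting.
districts_geojson = districts.to_json()
```

Every non-geometry column ends up in the `properties` of the corresponding GeoJSON feature, so the merged values travel with the polygons.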

Creating the map

The full code creating the beautiful map!

The creation of the map consists of 4 important steps:

  1. Creating the base map
  2. Creating the choropleth
  3. Adding markers as a FeatureGroup
  4. Enabling LayerControl

Firstly, map creation is just as easy as it seems. You should provide a starting position, a starting zoom, and a tiles argument, which is responsible for the basic design of your map. Be aware that folium expects coordinates in latitude, longitude order!

Secondly, creating the choropleth is also easy, but it requires a neatly structured GeoJSON file. The parameters of the choropleth method speak for themselves, but you can refer to the documentation if something is not clear. This creates a layer where polygons are colored according to income tax per capita: the higher the value, the greener the polygon, and the lower the value, the yellower.

Thirdly, a FeatureGroup object is created. This object consists of Marker objects showing each district's name and its income tax per capita value, giving the reader exact amounts in addition to the color coding of the choropleth. Lastly, LayerControl is added to the map to make it possible to show or hide layers, such as the choropleth or the FeatureGroup layer.

Finally, the map can be saved and the resulting .html file opened in your browser. You can find it on my GitHub, in the outputs folder, but here is a snippet of it:

Takeaways

The main takeaway here is that in creating a beautiful choropleth map, just like in any data science project, data preparation takes around 90% of the effort. We got to know OpenStreetMap data structures and the Overpass API, and solved an interesting problem in Part 1. In Part 2, using GeoDataFrames made it easier to add information to our GeoJSON files and maps. It is also clear to me that if you are creating maps in Python, folium is the way to go. As you can see, creating a map can be a one-liner, and creating choropleth maps or showing markers with HTML popups is really easy with folium.


Mor Kapronczay
HCLTech-Starschema Blog

Machine Learning Team Lead, Chatbot Developer@K&H (KBC Bank Hungary)