Geospatial Data Visualization
Geospatial data visualization represents location data in a visual context. Words alone are rarely enough to present geographic information; graphics are needed to reveal the trends, patterns, and correlations that support conclusions. In an era of massive datasets, visual context has become fundamental for surfacing insights that might otherwise go unnoticed in tabular or text form, and for making meaningful data-driven decisions. A successful visualisation effort begins with knowing why it is being made. Creating an effective visualisation used to require considerable effort and specialised tools, but modern tools and technologies have made it much simpler. Today, understanding and creating geospatial visualisations is no longer limited to system analysts and data scientists; it has spread across business verticals such as marketing, advertising, finance, e-commerce, government, and telecommunications, wherever decisions are driven by data.
Why is Geospatial Data Visualization so important?
Geographic data (geo data) science is a subset of data science that deals with location-based data, i.e. descriptions of objects and their relationships in space. Today, geographic data is key, and many businesses, applications, and services revolve around the location element. Visualisation is therefore important for deriving insights quickly from location-based datasets.
Communication skills are crucial in any line of work, and geospatial data visualisation is an effective way to present geographic information so that conclusions can be drawn. In addition, the human brain captures and retains graphics and images much more easily than hundreds of rows of spreadsheet data. A visual summary therefore makes it easier to identify patterns and trends and to gain insights. To summarise, geospatial data visualisation provides:
- the ability to visualise location-related information easily and to improve the insights that inform decisions
- an improved ability to convey information in a form the audience can understand
- more accessible and understandable information without the need for verbal explanation
- an increased ability to act on observations and therefore to succeed with greater agility
How to create Geospatial Data Visualization using Open Source libraries?
A wide variety of libraries support the creation of geospatial visualisations. A few are listed below (the list is not exhaustive):
- folium: Builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library.
- d3-geo: A JavaScript library for rendering geographic information in the form of maps.
- mplleaflet: A Python library that converts a matplotlib plot into a webpage containing a pannable, zoomable Leaflet map. It can also embed the Leaflet map in an IPython notebook.
- geoplot: A high-level Python geospatial plotting library. It is an extension to cartopy and matplotlib that makes mapping easy, like seaborn for geospatial data.
- GeoJS: Intended to bridge the gap between GIS, geospatial visualisation, scientific visualisation, and infovis. GeoJS is more than a GIS library, as users can create scientific plots such as vector and contour plots and can embed infovis plots using D3.js.
This article shows how to use the folium library and its functions to create powerful geospatial visualisations from open source data. The examples apply to use cases such as crime mapping, real estate analysis, and demographic insights, where they help reveal patterns, outliers, correlations, and trends that might go unnoticed in tables and spreadsheets containing massive amounts of data.
Pre-requisites
This article assumes basic knowledge of Python, Jupyter Notebook, and the pandas library.
Folium Library
Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library to help visualise geospatial data. The data is manipulated in Python and then visualised on a Leaflet map via folium. The maps are interactive, can be built from latitude and longitude values, and can be displayed directly within the notebook environment, which many of us prefer to work in.
Installation
Folium can be installed by either of the two methods below:
$ pip install folium
or
$ conda install -c conda-forge folium
Dataset
Source: https://www.kaggle.com/harlfoxem/housesalesprediction
The dataset is available for download from the link above. It contains property sale prices for King County, Washington, USA. King County is the most populous county in Washington and the 12th-most populous in the United States. The dataset includes 21,613 property sale observations between May 2014 and May 2015, with 23 variables that describe property characteristics. The data is densely clustered around the Seattle-Bellevue-Renton-Kent-Federal Way-Tacoma area, an urban conglomeration.
Exploratory Data Analysis
# Import necessary libraries
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
# !pip install folium  # uncomment to install the folium library for mapping
import folium
from folium.plugins import MarkerCluster
# Import property attributes dataset
df = pd.read_csv("innercity.csv")
# Check a few rows of the loaded dataset to ensure the data is loaded correctly
df.head()
# Check the shape of dataframe
df.shape
(21613, 23)
Property data contains 21,613 observations with 23 fields/attributes.
# Check the name of fields in data
df.columns
Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',
       'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
       'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
       'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
       'total_area'],
      dtype='object')
Folium needs latitude and longitude attributes in the data to place features on the map, and this dataset has both fields (lat and long).
# Check the data type and null values present in fields.
df.info()
# Count of unique values in each field
df.nunique()
# Check for missing values
df.isnull().sum()
# Check rows with missing values
df[df.isnull().any(axis=1)]
# Viewing the data statistics
df.describe()
The describe function shows that several fields contain 0 values. Whether these zeros are meaningful or require cleansing would need further data exploration. A few outliers are present in the data (for example, 75% of the values in the room_bed field fall within 3 bedrooms, yet the maximum value is 33), and these need to be handled with an appropriate strategy such as imputation. Skewness is present in the data as well.
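One way to put numbers on the outlier observation above is an interquartile-range (IQR) check. The sketch below is illustrative only: the short room_bed series is made-up sample data standing in for the real column, and the 1.5 x IQR fence is a common convention rather than something the dataset prescribes.

```python
import pandas as pd

# Made-up values standing in for df["room_bed"]; 33 mimics the suspicious maximum.
room_bed = pd.Series([2, 3, 3, 3, 4, 4, 3, 2, 5, 33], name="room_bed")

q1 = room_bed.quantile(0.25)
q3 = room_bed.quantile(0.75)
iqr = q3 - q1

# Conventional fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = room_bed[(room_bed < lower) | (room_bed > upper)]

# Count zeros separately: a zero may be valid (e.g. no basement) or a data error.
zero_count = int((room_bed == 0).sum())

print(outliers.tolist(), zero_count)
```

Flagging is kept separate from fixing on purpose: whether a flagged value is dropped, capped, or imputed depends on what further exploration reveals.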
The above analysis is not strictly required for the visualisation, but taking a close look at the data you plan to work with is the sign of a good data scientist :)
Geospatial Data Visualization with Folium Library
Folium is a powerful Python library that helps create several kinds of map visualisation, including basic maps, cluster maps, terrain maps, and heatmaps. Folium maps are interactive and easy to use, which makes the library well suited to Jupyter notebooks and dashboards. A few of its capabilities are demonstrated below with code snippets.
Map with simple marker
# Plot the property locations with OpenStreetMap as the basemap
# Create Map: basemap is OpenStreetMap (folium's default tiles)
property_map = folium.Map(
    location=[df['lat'].mean(),
              df['long'].mean()],
    zoom_start=11,
    control_scale=True,
)
# Add one circle marker per property (rendering every row can be slow)
for i in range(len(df)):
    folium.CircleMarker(
        location=[df.lat.iloc[i], df.long.iloc[i]],
        radius=3,
        popup=str(df.cid.iloc[i]),  # cast the id to text for the popup
        color='blue',
        opacity=0.2
    ).add_to(property_map)
property_map
The parameters of the folium.Map class used above are discussed below:
- folium.Map(): The property_map object is created by calling folium.Map(), which holds information such as the map center, zoom, scale, and tiles.
- location: Defines the center of the map as a [latitude (Y value), longitude (X value)] pair. You can either specify the center coordinates directly if you know where to center the map (location = [47.6062, -122.3321]) or take the mean of the latitude and longitude values in the data (location = [df['lat'].mean(), df['long'].mean()]).
- zoom_start: Sets the initial zoom level at which the map is first rendered.
- control_scale: When True, adds a scale bar to the map.
- tiles: Not set here, because the default 'OpenStreetMap' tiles are used as the background map. Other tile servers can be specified as the background map within folium.
folium.Map accepts further parameters, which are described in the folium documentation on GitHub.
The folium.CircleMarker class draws a circle of fixed size with the specified radius. Its parameters used above are:
- location: Latitude and longitude pair (Y and X coordinates, i.e. northing and easting) that defines the feature's geographic location
- radius: Radius of the circle marker, in pixels
- popup: Text or visualisation displayed when the feature is clicked
- color: Colour of the marker outline
- opacity: Opacity of the marker outline
The map is now created with OpenStreetMap as the background and the feature layer overlaid on it. The +/- control on the map lets you zoom in and out, and you can also pan across the map. The scale at the bottom changes with the zoom level and shows values in both kilometres and miles. Great, the first task is accomplished! Congratulate yourself on making your first map without any hassle :)
Cluster Map
Numerous data points are clumped together, so the map looks cluttered. Markers in close proximity overlap and get in the way of clarity, so the map loses its value as a visualisation and does not encourage users to explore further. This is where clustering-based visualisation comes in. Marker clustering aggregates nearby markers into a single cluster labelled with the number of points it contains. The clusters are created based on the viewer's current map view, which makes the map easier to understand as a whole. Clustering provides a better user experience than a densely marked map, so let's apply it with folium to make our map look better.
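Before turning to folium, the aggregation idea itself can be illustrated in a few lines of plain Python. This is a simplified grid-binning stand-in, not the algorithm the Leaflet marker clustering plugin uses: the points are made up, and the cell size in degrees plays the role of the zoom level.

```python
from collections import Counter

# Made-up (lat, long) points; the first three are close together.
points = [
    (47.611, -122.331),
    (47.612, -122.333),
    (47.609, -122.335),
    (47.720, -122.050),
]


def grid_clusters(points, cell_size):
    """Count points per square grid cell of `cell_size` degrees."""
    counts = Counter()
    for lat, lon in points:
        # Floor division assigns each point to a cell, including negative values.
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts


coarse = grid_clusters(points, cell_size=0.1)    # zoomed out: larger cells
fine = grid_clusters(points, cell_size=0.001)    # zoomed in: smaller cells

print(sorted(coarse.values()), len(fine))
```

With large cells the three neighbouring points collapse into one cluster with a count, while small cells keep every point separate, which mirrors how clusters split apart as the viewer zooms in.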
# Create Map with clustering: basemap is OpenStreetMap
property_map = folium.Map(
    location=[df['lat'].mean(),
              df['long'].mean()],
    zoom_start=11)
# Create a marker for each point in the dataframe; each popup shows the zipcode
mc = MarkerCluster()
for row in df.itertuples():
    mc.add_child(folium.Marker(location=[row.lat, row.long], popup=str(row.zipcode)))
property_map.add_child(mc)
property_map
Heatmap
So far, we have focused on representing features on the map. What if we could add context to this representation, for example by depicting location with respect to property price? This can be achieved with the HeatMap class from folium.plugins, which overlays a heat map on the map object.
The parameters of the HeatMap class used in the map are discussed below:
- data (list of points of the form [lat, lng] or [lat, lng, weight]): The points for which the heat map is generated.
- name: The name of the layer, as it will appear in LayerControls.
- min_opacity (default 1. according to the folium docstring, although the function signature in folium releases sets 0.5): The minimum opacity the heat will start at
- max_zoom (default 18): Zoom level where the points reach maximum intensity (as intensity scales with zoom), equals maxZoom of the map by default
- radius (int, default 25): Radius of each “point” of the heatmap
- overlay (bool, default True): Adds the layer as an optional overlay (True) or the base layer (False)
- control (bool, default True): Whether the Layer will be included in LayerControls
- show (bool, default True): Whether the layer will be shown on opening (only for overlays).
# Create Heatmap
from folium.plugins import HeatMap
# 'Stamen Terrain' is a named tile set in older folium releases; if your version
# no longer recognises it, omit tiles/attr to use the default OpenStreetMap tiles
property_map = folium.Map(
    location=[df['lat'].mean(),
              df['long'].mean()],
    tiles='Stamen Terrain',
    attr='Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL',
    zoom_start=11)
# Give every sale a weight of 1 so that grouping by location counts the sales
df['count'] = 1
property_heatmap = HeatMap(
    data=df[['lat', 'long', 'count']].groupby(['lat', 'long']).sum().reset_index().values.tolist(),
    name='Heatmap',
    radius=10,
    min_opacity=0.1,
    max_zoom=16
).add_to(property_map)
folium.LayerControl().add_to(property_map)
property_map
Because every sale carries a weight of 1, the heatmap shows where property sales are concentrated rather than where prices are high: the concentration is greatest around Seattle, with a few pockets in and around the Bellevue metropolitan area.
Summary
Folium is intuitive and flexible for geospatial data visualisation. Features such as map display with markers, marker clustering, and heatmap creation are built into folium and its plugins, and they help analyse patterns, trends, outliers, and more that might be hard to see in tabular data. Folium brings location-centric data visualisation that is more meaningful and impactful for users.
Python Notebook
The accompanying Jupyter Notebook used in this article is available here.