engineering

January 07, 2021   |   19min read

Visualizing Dynamically Updated Large Vector Spatial Datasets

Use cases

Let’s say that you have an API that gathers spatial data from multiple devices and stores it in a database. Data is collected and visualized on the product owner’s device.

Basic usecase

Do I need an API to collect spatial data? Not really. In the open-source world, QGIS is the most popular software for spatial data collection. ArcGIS package is the most advanced when it comes to commercial software and has been dominating the market for years. You can connect them directly to the database.

If these tools meet your needs, you can skip the API and APP part of this article.

Instead of QGIS you can place any GIS software that meets your needs. Also, you can manage procedures or permissions on the database.

GIS usecase

However, if you need specific calculations, which are not present or cannot be effectively calculated in one of these tools or don’t want to pay for the pricey ArcGIS license, you can implement your own set of tools.

Another case to write your own app is when the spatial data is collected in the background or you want to streamline the data collection and provide a custom user experience embedded in the existing product. If you don’t have prior experience with large spatial vector datasets, the most intuitive solution could be storing the vector data on the server and render it on the client’s side.

The problem with client side rendering

With client side rendering of huge spatial datasets, the following problems make user experience completely broken:

  • Vector data weights MUCH more than compressed images
  • Rendering vector data on the clients side drains the battery or even freezes the browser card.

If you want to see these problems, I’ve provided them in this article’s example repository. You can find the repository here.

The article is full of repository and code references. Therefore, I strongly recommend cloning it. To do this, type in the shell:

git clone https://gitlab.polidea.com/tomasz.szerszen/visualizing-vector-spatial-datasets
cd visualizing-vector-spatial-datasets

To start the project, just type in shell in this article’s repository root directory (you need Docker with docker-compose for this to work):

docker-compose up

After the app starts, navigate to http://localhost:8000/invalid/counties and make yourself a cup of coffee ;)

It took me around two minutes on MBPro 2019 to see the response with the US Counties and each time I zoom in or out; it’s very buggy.

You can imagine how it would work on a mobile device. And this is only one layer. Most of the time, there is a need to display multiple layers. If you have only one dataset, that dataset contains only points, and you can filter it by map extent. Such a solution could work on a larger scale. However, in the case of many different layers containing polygons, without the possibility to filter data by the extent, it’s necessary to render spatial data on the server-side and send it to the client as rendered images.

Solution: server side rendering

Now navigate to http://localhost:8000/. You can see how quickly data was displayed. It’s because it was rendered on the server side instead of the client’s side. We didn’t have to send raw vector data to the client and then render it; we’ve sent a small, already rendered image.

GeoServer usecase

In this example, we use Django, PostgreSQL with PostGIS extension, and GeoServer. From now on, the provided graphs will use such a technological stack. However, technological stack does not really matter; it’s the idea of server side rendering that matters here, but why should one invent a wheel again?

Other languages which are popular in the GIS world are Java and C++, and there are awesome tools out there to implement everything the way you want in your favourite language.

If you don’t need a custom API, here’s a graph for popular GIS software. These software programs have great interfaces and tools which allow easy configuration.

GeoServer GIS usecase

Example implementation

If you are still with me, I assume you are familiar with programming, especially Python, Django, Docker, and the basics of GIS. The example in this article is based on Django integrated with GeoServer. Dataset used is US County Data.

Everything sits on the top of Docker. If you haven’t started the app before, remember that it’s required to have Docker and docker-compose. When you have it, to start the app, just type in the shell in the repository root directory:

docker-compose up

As you can see on the graphs at the begging of this article, we have the following services defined in our docker-compose.yml:

I assume that you’re familiar with the concepts of Django and Django Rest Framework. If not, there are dozens of sources on the web, which I recommend getting familiar with first. Here, I will explain mostly GIS specific stuff. You can work from the djangoapi container’s shell or setup GeoDjango locally.

Create table and populate it with spatial data

First, we need to download our example data. It’s in the article’s repository (djangoapi/counties/data/tl_2019_us_county) or you can download it from US County data. After downloading US County data in the Shapefile format, we need to create a table in the geodb database and then populate it with downloaded data.

python manage.py ogrinspect counties/data/tl_2019_us_county/tl_2019_us_county.shp County --mapping

Where County is the model name, which we would like to create. This command will give us the following result:

# This is an auto-generated Django model module created by ogrinspect.
from django.contrib.gis.db import models


class County(models.Model):
    statefp = models.CharField(max_length=2)
    countyfp = models.CharField(max_length=3)
    countyns = models.CharField(max_length=8)
    geoid = models.CharField(max_length=5)
    name = models.CharField(max_length=100)
    namelsad = models.CharField(max_length=100)
    lsad = models.CharField(max_length=2)
    classfp = models.CharField(max_length=2)
    mtfcc = models.CharField(max_length=5)
    csafp = models.CharField(max_length=3)
    cbsafp = models.CharField(max_length=5)
    metdivfp = models.CharField(max_length=5)
    funcstat = models.CharField(max_length=1)
    aland = models.BigIntegerField()
    awater = models.BigIntegerField()
    intptlat = models.CharField(max_length=11)
    intptlon = models.CharField(max_length=12)
    geom = models.PolygonField(srid=4269)


# Auto-generated `LayerMapping` dictionary for County model
county_mapping = {
    'statefp': 'STATEFP',
    'countyfp': 'COUNTYFP',
    'countyns': 'COUNTYNS',
    'geoid': 'GEOID',
    'name': 'NAME',
    'namelsad': 'NAMELSAD',
    'lsad': 'LSAD',
    'classfp': 'CLASSFP',
    'mtfcc': 'MTFCC',
    'csafp': 'CSAFP',
    'cbsafp': 'CBSAFP',
    'metdivfp': 'METDIVFP',
    'funcstat': 'FUNCSTAT',
    'aland': 'ALAND',
    'awater': 'AWATER',
    'intptlat': 'INTPTLAT',
    'intptlon': 'INTPTLON',
    'geom': 'POLYGON',
}

Line geom = models.PolygonField(srid=4269) in generated code tells us that the downloaded dataset is in 4269 projection. If you want to visualize multiple datasets on one map, it’s necessary to have all of them in one projection.

The most popular projection on the web is 4326 projection. Let’s store all of our data in 4326 projection. That’s why this line needs to change to geom = models.PolygonField(srid=4326).

In the generated code, all of our columns are non-nullable. If you open a downloaded shapefile with GIS software like QGIS, you can see that some of these fields are actually nulls in the dataset. For the sake of this example, we will just allow all of our columns to be nullable, except for the geom column. Here’s the final code:

from django.contrib.gis.db import models


class County(models.Model):
    statefp = models.CharField(max_length=2, null=True)
    countyfp = models.CharField(max_length=3, null=True)
    countyns = models.CharField(max_length=8, null=True)
    geoid = models.CharField(max_length=5, null=True)
    name = models.CharField(max_length=100, null=True)
    namelsad = models.CharField(max_length=100, null=True)
    lsad = models.CharField(max_length=2, null=True)
    classfp = models.CharField(max_length=2, null=True)
    mtfcc = models.CharField(max_length=5, null=True)
    csafp = models.CharField(max_length=3, null=True)
    cbsafp = models.CharField(max_length=5, null=True)
    metdivfp = models.CharField(max_length=5, null=True)
    funcstat = models.CharField(max_length=1, null=True)
    aland = models.BigIntegerField(null=True)
    awater = models.BigIntegerField(null=True)
    intptlat = models.CharField(max_length=11, null=True)
    intptlon = models.CharField(max_length=12, null=True)
    geom = models.MultiPolygonField(srid=4326)

which should be placed in your models.py in our case it’s in the counties app. You can find this code in counties/models.py. Remember to makemigrations and migrate your database and that your app is enabled in project’s settings.py.

We have the table in the database. But now, we have to load the actual data into it. To do this, we will create a script load_counties.py in the counties app. Let’s create counties/scripts folder, place them counties/scripts/__init__.py so django_extensions will see this folder as a scripts folder and finally create counties/scripts/load_counties.py.

If you don’t remember the ogrinspect command:

python manage.py ogrinspect counties/data/tl_2019_us_county/tl_2019_us_county.shp County --mapping

It gives us not only a model, but also LayerMapping, which we will use to map Shapefile columns into our database table.

Let’s place it in the load_counties.py script.

# Auto-generated `LayerMapping` dictionary for County model
county_mapping = {
    'statefp': 'STATEFP',
    'countyfp': 'COUNTYFP',
    'countyns': 'COUNTYNS',
    'geoid': 'GEOID',
    'name': 'NAME',
    'namelsad': 'NAMELSAD',
    'lsad': 'LSAD',
    'classfp': 'CLASSFP',
    'mtfcc': 'MTFCC',
    'csafp': 'CSAFP',
    'cbsafp': 'CBSAFP',
    'metdivfp': 'METDIVFP',
    'funcstat': 'FUNCSTAT',
    'aland': 'ALAND',
    'awater': 'AWATER',
    'intptlat': 'INTPTLAT',
    'intptlon': 'INTPTLON',
    'geom': 'POLYGON',
}

We will use the generated county_mapping in the LayerMapping utility. It should be imported from django.contrib.gis.utils import LayerMapping.

And run as followed:

lm = LayerMapping(model=County, data=shp_path, mapping=county_mapping, source_srs=4269, transform=True)
lm.save()

Since we changed the generated 4269 projection to 4326 projection, it’s important to tell LayerMapping to transform our dataset to the desired projection. Therefore, these two parameters are important:

source_srs=4269, transform=True

Data parameter should be a string (unfortunately python Path is not supported yet) containing a path to the Shapefile. Considering all of these we should end up with the following code:

import os

from django.conf import settings
from django.contrib.gis.utils import LayerMapping

from ..models import County


SHP_PATH = os.path.join(settings.BASE_DIR, 'counties/data/tl_2019_us_county/tl_2019_us_county.shp')

county_mapping = {
    'statefp': 'STATEFP',
    'countyfp': 'COUNTYFP',
    'countyns': 'COUNTYNS',
    'geoid': 'GEOID',
    'name': 'NAME',
    'namelsad': 'NAMELSAD',
    'lsad': 'LSAD',
    'classfp': 'CLASSFP',
    'mtfcc': 'MTFCC',
    'csafp': 'CSAFP',
    'cbsafp': 'CBSAFP',
    'metdivfp': 'METDIVFP',
    'funcstat': 'FUNCSTAT',
    'aland': 'ALAND',
    'awater': 'AWATER',
    'intptlat': 'INTPTLAT',
    'intptlon': 'INTPTLON',
    'geom': 'MULTIPOLYGON',
}


def run(
    shp_path=SHP_PATH,
):
    lm = LayerMapping(model=County, data=shp_path, mapping=county_mapping, source_srs=4269, transform=True)
    lm.save()

You can run it by typing in the shell:

python manage.py runscript load_counties

This command should populate your table with downloaded US Counties data.

Django Rest Framework GIS API

It’s mostly standard Django Rest Framework stuff. On top of the Django Rest Framework , we should install and enable Django Rest Framework GIS. Let’s focus on the serialization part. Standard GIS serializer for County model would look like this:

class CountyGeoSerializer(GeoFeatureModelSerializer):

    class Meta:
        model = County
        fields = '__all__'
        geo_field = 'geom'

And the corresponding viewset:

class CountyViewSet(viewsets.ModelViewSet):
    queryset = County.objects.all()
    serializer_class = CountySerializer

    def get_serializer_class(self):
        return CountyGeoSerializer

But if we return our spatial data in lists, the size of the response will be huge, and querying db + serialization will take a lot of time. You can check it out by visiting http://localhost:8000/invalid/counties. Personally, I would recommend not to return geometries in lists—GeoServer is responsible for it. To achieve this, let’s write another standard Django Rest Framework serializer:

class CountySerializer(serializers.ModelSerializer):

    class Meta:
        model = County
        exclude = ('geom', )

and return it in the viewset when action is of type list:

class CountyViewSet(viewsets.ModelViewSet):
    queryset = County.objects.all()
    serializer_class = CountySerializer

    def get_serializer_class(self):
        if self.action == 'list':
            return CountySerializer
        return CountyGeoSerializer

We shouldn’t ask the API for such heavy vector data, and integrate with GeoServer instead.

Geoserver integration

Geoserver is an awesome project and a comprehensive tool. You can use it with Django in uncountable ways. This article’s example is a straightforward to just keep you rolling with your project and give you the idea of Server side rendering in the GIS world.

If you want you can access GeoServer admin panel, the default username is admin and password is geoserver.

GeoServer login

After you login you should see the following panel:

GeoServer main

Please follow the security instructions provided there when you do your production deployment. You can see that we have one workspace, one store, and one layer. They were created with the djangoapi/scripts/publish_geoserver_model.py script. Below, I will explain this script to you and hopefully help you write your own.

During writing your own code, you can remove the workspace in GeoServer to see the results of your API requests in the GeoServer panel. Don’t worry; after you rebuild the project with docker-compose up --build, everything will work again, so you can’t break anything by playing with this example.

Django should have access to GeoServer. In this example, it’s done by depends_on directive in docker-compose.yml. When it comes to GeoServer, it should have access to geodb database; it’s done by depends_on, the same way Django has access to geoserver service.

In djangoapi/djangoapi/geoserver.py there is a class Geoserver, which is responsible for handling GeoServer requests in this article’s project.

Workspace creation

The code we will write below is based on GeoServer REST API documentation. The main unit of organization in GeoServer are workspaces. If you add anything to GeoServer, it has to be in some kind of workspace. Therefore, we need a piece of code that handles workspace creation:

class Geoserver:
    def __init__(
        self,
        url: str = os.getenv('GEOSERVER_URL', 'http://localhost:8000/geoserver'),
        username: str = os.getenv('GEOSERVER_USERNAME', 'admin'),
        password: str = os.getenv('GEOSERVER_PASSWORD', 'geoserver'),
    ):
        self.url = url
        self.auth = (username, password)

    def create_workspace(self, workspace_name: str) -> requests.Response:
        data = {
            'workspace': {
                'name': workspace_name,
                'enabled': True,
            }
        }
        response = requests.post(
            f'{self.url}/rest/workspaces',
            auth=self.auth,
            headers={'Content-type': 'application/json'},
            data=json.dumps(data),
        )
        return response

You can see that GeoServer requests are authenticated using basic username:password authentication.

GeoServer security is another subject, and it’s not essential to this article. The default username and password in GeoServer are admin:geoserver, and we will use it in our example. If you want to change the GeoServer authentication configuration, you have to set up a custom GeoServer data_dir. The easiest way to do it is to run this project locally. Configure your GeoServer in the desired way and then store your configuration in a volume.

In our case, we will configure our GeoServer workspace each time a Django app is deployed. That way, we have to code all of our configuration in Python, but feel free to do it differently depending on your use case.

Because workspace creation will fail if it already exists, let’s write a code to delete workspace:

    def delete_workspace(self, workspace_name: str, recurse: bool = False) -> requests.Response:
        url = f'{self.url}/rest/workspaces/{workspace_name}?quietOnNotFound=true&recurse={str(recurse).lower()}'
        response = requests.delete(
            url,
            auth=self.auth,
            headers={'Content-type': 'application/json'},
        )
        return response

We need ?recurse=true/false parameter, so we will be able to delete workspace, which have things such as datastores inside.

Connecting Geoserver to our PostGIS database

In Geoserver terminology, our database is called “datastore.” Datastores are a part of a workspace and could be anything that provides spatial data. In our case, it’s our geodb PostGIS database. According to the Geoserver’s rest documentation, we should send a text/xml POST request with name and connectionParams.

The following code should do the job:

    def create_postgis_datastore(
        self,
        workspace_name: str,
        datastore_name: str,
        postgres_db: str = os.getenv('POSTGRES_DB', 'geodb'),
        postgres_user: str = os.getenv('POSTGRES_USER', 'geodb'),
        pgpassword: str = os.getenv('PGPASSWORD', 'YOUR_POSTGRES_PASSWORD'),
        postgres_host: str = os.getenv('POSTGRES_HOST', 'geodb'),
        postgres_port: str = os.getenv('POSTGRES_PORT', '5432'),
    ):
        data = f'''
            <dataStore>
                <name>{datastore_name}</name>
                <connectionParameters>
                    <host>{postgres_host}</host>
                    <port>{postgres_port}</port>
                    <database>{postgres_db}</database>
                    <user>{postgres_user}</user>
                    <passwd>{pgpassword}</passwd>
                    <dbtype>postgis</dbtype>
                </connectionParameters>
            </dataStore>
        '''
        response = requests.post(
            f'{self.url}/rest/workspaces/{workspace_name}/datastores',
            auth=self.auth,
            headers={'Content-type': 'text/xml'},
            data=data,
        )
        return response

You can see in our URL f'{self.url}/rest/workspaces/{workspace_name}/datastores' that our datastore has to be a port of a previously created workspace. After this request, we should have access to the PostGIS database from Geoserver. In case you need to secure your installation and move it to production, I would recommend doing different database users for PostGIS with read-only access to only specific tables with geometries. You can also consider using a different database or schema for storing GIS data and reference it only by foreign keys. Your GeoServer password, your database connectionParameters are all stored in geoserver_data_dir volume. Remember to make sure access to these directories is also secured.

Publishing US County layer

So far, we have: 1. Downloaded US County Shapefile 2. Created County model for storing US County data 3. Wrote code responsible for workspace creation in GeoServer 4. Wrote code responsible for connecting GeoServer to our PostGIS database

The last thing to do is to publish the layer in the created workspace with our PostGIS datastore. To do it, we need to send the following request:

    def create_postgis_layer(self, workspace_name: str, datastore_name: str, table_name: str):
        data = f'''
            <featureType>
                <name>{table_name}</name>
            </featureType>
        '''
        response = requests.post(
            f'{self.url}/rest/workspaces/{workspace_name}/datastores/{datastore_name}/featuretypes',
            auth=self.auth,
            headers={'Content-type': 'text/xml'},
            data=data,
        )
        return response

We can now write a method that will publish our County model; even when there’s no workspace or datastore, it will handle its creation. In our case, the workspace is dropped each time the method below is run. It’s okay for this example and could be good in your project as well. It forces you to store all GeoServer configuration in code, making it easy to deploy and scale your project.

Here’s the code:

    def publish_models(
        self,
        models: list,
        workspace_name: str = os.getenv('GEOSERVER_WORKSPACE', 'geodb'),
        datastore_name: str = os.getenv('GEOSERVER_DATASTORE', 'geodb'),
        postgres_db: str = os.getenv('POSTGRES_DB', 'geodb'),
        postgres_user: str = os.getenv('POSTGRES_USER', 'geodb'),
        pgpassword: str = os.getenv('PGPASSWORD', 'YOUR_POSTGRES_PASSWORD'),
        postgres_host: str = os.getenv('POSTGRES_HOST', 'geodb'),
        postgres_port: str = os.getenv('POSTGRES_PORT', '5432'),
        silent=False,
    ):
        self.delete_workspace(workspace_name=workspace_name, recurse=True)
        self.create_workspace(workspace_name=workspace_name)
        response = self.create_postgis_datastore(
            workspace_name=workspace_name,
            datastore_name=datastore_name,
            postgres_db=postgres_db,
            postgres_user=postgres_user,
            pgpassword=pgpassword,
            postgres_host=postgres_host,
            postgres_port=postgres_port,
        )
        assert response.status_code == 201
        if not silent:
            print(f'GeoServer workspace {workspace_name} is connected to PostGIS through {datastore_name} datastore.')
        for model in models:
            table_name = model.objects.model._meta.db_table
            self.create_postgis_layer(
                workspace_name=workspace_name,
                datastore_name=datastore_name,
                table_name=table_name,
            )
            if not silent:
                print(f'Published table {table_name} in {workspace_name} workspace through {datastore_name} datastore.')

Remember to execute publish_models method:

from djangoapi.geoserver import Geoserver
from counties.models import County

def run():
    geoserver = Geoserver()
    geoserver.publish_models(models=[County])

After publish_models execution, the frontend should have access to the US County layer served by GeoServer.

This is a huge topic, and you probably want to style your layer using GeoServer SLD styles. And publish multiple layers, not one using layer groups.

Also remember to secure your installation. But this article is already longer than I expected it to be, so might write about these things in the future.

Consuming published layer on Frontend

You can use GeoServer’s rendered spatial data by connecting its WMS service to QGIS or by consuming it in a web browser using one of the GIS frontend libraries, like leaflet or OpenLayers.

GeoServer recommends to use OpenLayers in its examples. OpenLayers is the most advanced library, however, a bit harder to use than Leaflet.

We can consume our published layer like this:

const map = new ol.Map({
    target: 'map',
    layers: [
        new ol.layer.Tile({source: new ol.source.OSM()}),
        new ol.layer.Image({
            source: new ol.source.ImageWMS({
                ratio: 1,
                url: '{{ SITE_URL }}/geoserver/{{ POSTGRES_DB }}/wms',
                params: {
                    'FORMAT': 'image/png',
                    'VERSION': '1.1.1',
                    "STYLES": '',
                    "LAYERS": '{{ POSTGRES_DB }}:{{ DB_TABLE }}',
                    "exceptions": 'application/vnd.ogc.se_inimage',
                }
            })
        }),
    ],
    view: new ol.View({
        projection: new ol.proj.Projection({
            code: 'EPSG:4326',
            units: 'degrees',
            axisOrientation: 'neu',
            global: true
        })
    })

});
map.getView().fit([-131.04380106255752, 15.203957312557513, -60.731301062557534, 67.93833231255752], map.getSize());

In line target: 'map', we defined an HTML container in which map should be rendered. The map container has to have some size for our map to be displayed.

<style>
  #map {
    height: 100vh;
    width: 100vw;
  }
</style>

<div id="map"></div>

Variables in the following lines, are taken from Django context:

  • url: '{{ SITE_URL }}/geoserver/{{ POSTGRES_DB }}/wms'
  • "LAYERS": '{{ POSTGRES_DB }}:{{ DB_TABLE }}'

Here’s the code to render the page which consumes our GeoServer’s US Counties data:

class CountiesView(generic.TemplateView):
    template_name = 'counties.html'

    def get_context_data(self, **kwargs):
        return {
            'SITE_URL': settings.SITE_URL,
            'POSTGRES_DB': settings.DATABASES['default']['NAME'],
            'DB_TABLE': County.objects.model._meta.db_table,
        }

In this example, when you have docker-compose services up and running, by visiting http://localhost:8000/, you can see the same code working.

Project results

Summary

This article covers the general idea of publishing large spatial datasets, example provided in article’s repository does not cover large subjects like security, SLD Styling or Layer groups.

Once your app grows, you will probably need GeoWebCache to make your app handle big user loads. Also, we focused on vector data handled by datastores, and sometimes you need to handle raster data with coverage stores.

I hope that this article gave you the general idea of GeoServer and it’s integration with Django and what’s most important—why do we have to do it and render data on the server side. I hope this will help you get rolling with your project! :)

Please contact me if you have any notices about this article: tomasz.szerszen@polidea.com.

Have a wonderful day! :)

Tomasz Szerszeń

Software Engineer

Did you enjoy the read?

If you have any questions, don’t hesitate to ask!