By Alejandro | June 15, 2020 | 3 min read

Spatial Cache using geohash

Currently, map applications are widely used, and it is very common to show points of interest to the user. There are many geospatial data providers to get those points, either through downloadable information
(OSM) or API services.

The first solution requires more than 1TB of storage for the entire planet, which makes it difficult (expensive) to host on a server. Also, the information is scattered and imprecise. For this reasons, the option of requesting points from an API is widely used, which has its own disadvantages. One of them is the limit of requests – monthly and per second - if you do not have a payment plan. In addition, using an external API to query several points of interest can take some time to respond.
These drawbacks lead to the idea of saving the results of the queries to only request them once. In this way, a query that coincides with one previously made will be answered from the database faster.

How to save the information?

This question is given when you try to identify if a query has already been made. A first approach to this problem is to save all the points, but then come up the question of how to know if an area of the map is entirely covered. That led us to not only save the points, but also the query. Let's say queries are like: "Get the points of interest close to a center (latitude, longitude) and within a radius". If you try to match by the coordinates of the center, the probability that two queries coincide is almost zero. If we save the circumferences, then we can have “gaps”, in addition to the fact that there can be many overlapping circumferences, an undesired effect, since it can considerably increase the size of the database.

Spatial cache
Spatial cache

As we mentioned before, if we save that information without a logical structure, then we will not be able to identify the sections of the world we have and those that we need to obtain. For this, the world was subdivided in the form of a grid or quadrants, where each quadrant is subdivided in the same way. In this way a tree-like structure is produced (Figure 1: Gaps among queries) - (Figure 2: Overlapping circumferences) where each node represents a region that does not have an intersection with another of the same level. The fact that nodes of the same level are disjoint allows us to "fill them" completely without overlaps when we make a query, and in this way we know if we have that information or not. Building this tree can be somewhat tedious and difficult to represent in a relational model. To simplify this process we use an equivalence known as geohash.

What is a geohash?

A geohash is a way to map these quadrants in the form of text to avoid building the structure itself. In this way, each quadrant has a unique string that represents it and vice versa. The image shows how to reach a specific region by interpreting these strings.

Spatial cache

If each level is divided into 4 quadrants, then the strings can only be formed by 4 possible values (0-3 in this case), which would cause the geohash of a small region to lengthy. The idea is to be able to "go down" faster, that is, to take bigger "steps". To do this, a base 32 representation system is used instead of 4. In this way, each level will be divided into 32 quadrants, so that the tree is wider but shorter. This structure allows to represent the region of a block with about 7 characters.

Spatial cache

How do we process the queries made to our system?

When a query is made to our system (in the form of a circumference), the closest geohash (longest string) to the query is identified. If this quadrant, or one that contains it, is in our database (and has not expired), the points of interest associated with it are automatically returned to the user. Otherwise, the quadrants that intersect the query are identified and the points are searched in the API.