---
title: PostGIS and the Hard Rock Cafe collection
date: 2024-05-27T18:00:00+02:00
categories:
- postgresql
---

In 2007, I went to the west coast of the United States, where I visited a Hard
Rock Cafe store in Hollywood, CA, and bought the first T-Shirt of my collection.
Now I'm an open-source DBA and my number-one database is
[PostgreSQL](https://www.postgresql.org/). So I've decided to use the
[PostGIS](https://postgis.net/) extension to plan my next vacations or
conferences around buying more T-Shirts.

# The collection

## Inventory

Let's take inventory of my collected T-Shirts by looking at my closet.

![](/hrc/t-shirts.jpg)

Each title represents a location, generally a city, but it can also be a
well-known place like a stadium. This is enough to find the related shop in a
database. The titles can be written into a file:

```
Dublin
Los Angeles
Prague
London
```

([collection.csv](/hrc/collection.csv))

## Coordinates

The next step is to add coordinates. This information can be found by querying
the [Nominatim](https://nominatim.org/) API based on the
[OpenStreetMap](https://www.openstreetmap.org) community-driven project.

```python
#!/usr/bin/env python3
import csv
import time

import requests


if __name__ == "__main__":
    # Nominatim's usage policy requires a descriptive User-Agent
    headers = {"User-Agent": "Hard Rock Cafe Blog Post From Julien Riou"}
    session = requests.Session()
    session.headers.update(headers)

    with open("collection_with_coordinates.csv", "w") as dest:
        writer = csv.writer(dest)
        with open("collection.csv", "r") as source:
            for row in csv.reader(source):
                name = row[0]
                # let requests URL-encode the query parameters
                r = session.get(
                    "https://nominatim.openstreetmap.org/search",
                    params={"q": name, "limit": 1, "format": "json"},
                )
                time.sleep(1)  # respect the one-request-per-second rate limit
                r.raise_for_status()
                data = r.json()
                if len(data) == 1:
                    data = data[0]
                    writer.writerow([name, data["lat"], data["lon"]])
                else:
                    print(f"Location not found for {name}, skipping")
```

([collection.py](/hrc/collection.py))

The Python script iterates over the `collection.csv` file, queries the
Nominatim API to find the most relevant OpenStreetMap node, then writes the
name and coordinates to the `collection_with_coordinates.csv` file in
comma-separated values (CSV) format.

```
Dublin,53.3493795,-6.2605593
Los Angeles,34.0536909,-118.242766
Prague,50.0596288,14.446459273258009
London,51.4893335,-0.14405508452768728
```

([collection_with_coordinates.csv](/hrc/collection_with_coordinates.csv))

# The shops (or "nodes")

Now we need a complete list of Hard Rock Cafe locations with their coordinates
to match against the collection.

## OpenStreetMap

My first idea was to use OpenStreetMap, which should provide the needed
dataset. I tried the Nominatim API, but queries are [limited to 40
results](https://nominatim.org/release-docs/latest/api/Search/#parameters). I
could have [downloaded the entire
dataset](https://wiki.openstreetmap.org/wiki/Downloading_data) and imported it
into a local [PostgreSQL instance](https://osm2pgsql.org/), but that would
have been space- and time-consuming. So I used the [Overpass
API](https://wiki.openstreetmap.org/wiki/Overpass_API) with [this
query](https://overpass-turbo.eu/s/1LwQ) (thanks
[Andreas](https://andreas.scherbaum.la/)). In the end, the quality of the data
was not satisfying. The amenity could be restaurant, bar, pub, cafe, nightclub
or shop. The name was sometimes written with an accent ("é") and sometimes
without ("e"). Sometimes the brand was not reported at all. Even with all
those filters, there was a
[node](https://www.openstreetmap.org/node/6260098710) that was not even a Hard
Rock Cafe. The more the query grew, the more I wanted to use another method.
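
For the record, the accent mismatch could have been smoothed out by
normalizing the names before comparing them. A quick Python sketch
(hypothetical, I didn't need it in the end since I switched methods):

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase a name and strip diacritics so 'Café' and 'Cafe' compare equal."""
    # NFKD decomposes 'é' into 'e' plus a combining accent mark
    decomposed = unicodedata.normalize("NFKD", name)
    # drop the combining marks, keep the base characters
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


print(normalize_name("Hard Rock Café"))  # hard rock cafe
print(normalize_name("Hard Rock Cafe"))  # hard rock cafe
```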

## Website

I decided to parse the official website instead. Should I use a well-known
library like [Selenium](https://selenium-python.readthedocs.io/) or
[ferret](https://www.montferret.dev/)? Given the personal time I had for this
project, I chose the quick and dirty path. Let me present the ugly but
functional one-liner that parses the official Hard Rock Cafe website:

```
curl -sL https://cafe.hardrock.com/locations.aspx | \
  grep 'var currentMapPoint=' | \
  sed "s/.*{'title':/{'title':/g;s/,'description.*/}/g;s/'/\"/g" | \
  sed 's/{"title"://g;s/"lat":/"/g;s/,"lng":/","/g;s/}/"/g' \
  > nodes.csv
```

Very ugly, not future-proof, but it did the job.
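
A slightly less fragile alternative would be a small Python parser. The
`currentMapPoint` structure below is inferred from the `sed` expressions
above, and the sample page content is made up for illustration (the real page
may change at any time):

```python
import csv
import re
import sys

# Matches the JavaScript objects the one-liner targets, assumed to look like:
# var currentMapPoint={'title':'...','lat':52.36211,'lng':4.88298,'description':'...'};
PATTERN = re.compile(
    r"currentMapPoint=\{'title':'(?P<title>[^']+)'"
    r",'lat':(?P<lat>-?[\d.]+),'lng':(?P<lng>-?[\d.]+)"
)


def extract_nodes(page: str):
    """Yield (name, lat, lng) tuples for every map point found in the page."""
    for match in PATTERN.finditer(page):
        yield match.group("title"), match.group("lat"), match.group("lng")


# Hypothetical sample of the page content, not fetched from the real site
sample = """
var currentMapPoint={'title':'Hard Rock Cafe Amsterdam','lat':52.36211,'lng':4.88298,'description':'...'};
var currentMapPoint={'title':'Hard Rock Cafe Asuncion','lat':-25.2896910,'lng':-57.5737599,'description':'...'};
"""

csv.writer(sys.stdout).writerows(extract_nodes(sample))
```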

```
"Hard Rock Cafe Amsterdam","52.36211","4.88298"
"Hard Rock Cafe Andorra","42.507707","1.531977"
"Hard Rock Cafe Angkor","13.35314","103.85676"
"Hard Rock Cafe Asuncion","-25.2896910","-57.5737599"
```

([nodes.csv](/hrc/nodes.csv))

# Data exploration

The tool of choice to import and analyze this data is PostgreSQL with its
PostGIS extension. I've used [Docker](https://www.docker.com/) to get a
disposable local instance for quick analysis.

```
docker run -d --name postgres -v "$(pwd):/mnt:ro" \
  -e POSTGRES_USER=hrc -e POSTGRES_PASSWORD=insecurepassword \
  -e POSTGRES_DB=hrc \
  postgis/postgis:16-3.4
docker exec -ti postgres psql -U hrc -W
```

The instance is now started and we are connected.

## Import

The [COPY](https://www.postgresql.org/docs/current/sql-copy.html) command in
PostgreSQL imports CSV lines easily. We'll use its psql alias (`\copy`) to
send the data directly through the client.

```
create table collection (
    name text primary key,
    lat numeric,
    lon numeric
);
\copy collection (name, lat, lon) from '/mnt/collection_with_coordinates.csv' csv;

create table nodes (
    name text primary key,
    lat numeric,
    lon numeric
);
\copy nodes (name, lat, lon) from '/mnt/nodes.csv' delimiter ',' csv;
```

## Correlation

The SQL query takes all the rows from the `collection` table and tries to
find a row in the `nodes` table within 50 km based on coordinates.

```
select c.name as tshirt, n.name as restaurant,
       round((ST_Distance(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), true)/1000)::numeric, 2)
         as distance_km
  from collection c
  left join nodes n
    on ST_DWithin(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), 50000, true)
 order by c.name, distance_km;
```

The PostGIS functions used are:

* [ST_Point](https://postgis.net/docs/ST_Point.html) to create a point in space from coordinates
* [ST_Distance](https://postgis.net/docs/ST_Distance.html) to compute the distance between two points
* [ST_DWithin](https://postgis.net/docs/ST_DWithin.html) to keep only rows with a distance less than or equal to the provided value
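
To build some intuition for what `ST_Distance` returns when the last argument
is `true`, here is a rough great-circle (haversine) sketch in Python. PostGIS
actually measures on a spheroid in that mode, so its result differs slightly
from this spherical approximation:

```python
import math


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers, assuming a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 \
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Distance between the Dublin and London points of the collection,
# roughly 463 km (ST_Distance on the spheroid is slightly different)
print(round(haversine_km(-6.2605593, 53.3493795,
                         -0.14405508452768728, 51.4893335), 2))
```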

Result:

```
      tshirt      |                 restaurant                 | distance_km
------------------+--------------------------------------------+-------------
 Amsterdam        | Hard Rock Cafe Amsterdam                   |        1.38
 Angkor           | Hard Rock Cafe Phnom Penh                  |        0.75
 Antwerp          | Hard Rock Cafe Brussels                    |       41.84
 Barcelona        | Hard Rock Cafe Barcelona                   |        0.64
 Berlin           | Hard Rock Cafe Berlin                      |        4.32
 Boston           | (null)                                     |      (null)
 Brussels         | Hard Rock Cafe Brussels                    |        0.10
 Detroit          | (null)                                     |      (null)
 Dublin           | Hard Rock Cafe Dublin                      |        0.40
 Hamburg          | Hard Rock Cafe Hamburg                     |        2.36
 Ho Chi Minh City | (null)                                     |      (null)
 Hollywood        | Hard Rock Cafe Hollywood on Hollywood Blvd |        1.07
 Lisbon           | Hard Rock Cafe Lisbon                      |        1.07
 London           | Hard Rock Cafe London                      |        1.66
 London           | Hard Rock Cafe London Piccadilly Circus    |        2.40
 Los Angeles      | Hard Rock Cafe Hollywood on Hollywood Blvd |       10.46
 Miami            | Hard Rock Cafe Miami                       |        0.97
 Miami            | Hard Rock Cafe Hollywood FL                |       30.83
 Miami Gardens    | Hard Rock Cafe Hollywood FL                |       12.62
 Miami Gardens    | Hard Rock Cafe Miami                       |       19.26
 New York         | Hard Rock Cafe New York Times Square       |        5.18
 New York         | Hard Rock Cafe Yankee Stadium              |       14.53
 Orlando          | Hard Rock Cafe Orlando                     |       11.50
 Oslo             | (null)                                     |      (null)
 Paris            | Hard Rock Cafe Paris                       |        2.14
 Prague           | Hard Rock Cafe Prague                      |        3.59
 San Francisco    | Hard Rock Cafe San Francisco               |        3.36
 Singapore        | Hard Rock Cafe Singapore                   |        5.74
 Singapore        | Hard Rock Cafe Changi Airport Singapore    |       18.88
 Singapore        | Hard Rock Cafe Puteri Harbour              |       19.33
 Yankee Stadium   | Hard Rock Cafe Yankee Stadium              |        0.14
 Yankee Stadium   | Hard Rock Cafe New York Times Square       |        9.53
```

We can identify several patterns here:

* exact matches
* closed restaurants (Boston, Detroit, Ho Chi Minh City, Oslo)
* multiple restaurants (Miami, Miami Gardens, Singapore, New York, Yankee Stadium)
* multiple T-Shirts (Miami, Los Angeles)
* wrong matches (Angkor, Antwerp)
* missed opportunities (Hollywood FL, Piccadilly Circus)

I've created a [script](/hrc/update.sql) to update the names in the
`collection` table to match the names in the `nodes` table, so the two can be
joined by name instead of by location.

# Next locations

The last step of the exploration is to find Hard Rock Cafe locations within a
reasonable distance from home (1,000 km). As I don't want to disclose my exact
position, we'll search for "Belgium". The country is quite small, so that
should not be an issue.

```
$ curl -A "Hard Rock Cafe Blog Post From Julien Riou" \
  -sL "https://nominatim.openstreetmap.org/search?limit=1&format=json&q=Belgium" | \
  jq -r '.[0].name,.[0].lon,.[0].lat'
België / Belgique / Belgien
4.6667145
50.6402809
```

The query to search for shops that I haven't visited yet looks like this:

```
select n.name,
       round((ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000)::numeric, 2)
         as distance_km
  from nodes n
  left join collection c
    on n.name = c.name
 where c.name is null
   and ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000 < 1000
 order by distance_km;
```

The final result:

```
                  name                   | distance_km
-----------------------------------------+-------------
 Hard Rock Cafe Cologne                  |      164.87
 Hard Rock Cafe London Piccadilly Circus |      349.99
 Hard Rock Cafe Manchester               |      569.37
 Hard Rock Cafe Munich                   |      573.42
 Hard Rock Cafe Innsbruck                |      618.90
 Hard Rock Cafe Newcastle                |      640.64
 Hard Rock Cafe Milan                    |      666.41
 Hard Rock Cafe Copenhagen               |      769.51
 Hard Rock Cafe Edinburgh                |      789.31
 Hard Rock Cafe Venice                   |      812.96
 Hard Rock Cafe Wroclaw                  |      870.89
 Hard Rock Cafe Vienna                   |      890.23
 Hard Rock Cafe Florence                 |      911.37
 Hard Rock Cafe Gothenburg               |      918.32
 Hard Rock Cafe Andorra                  |      935.20
```

# Conclusion

According to this study, my next vacations or conferences should take place in
Germany or the UK. A perfect opportunity to go to
[PGConf.DE](https://2024.pgconf.de/) and [PGDay UK](https://pgday.uk/)!