12 KiB
title | date | categories | |
---|---|---|---|
PostGIS and the Hard Rock Cafe collection | 2024-05-27T18:00:00+02:00 |
|
In 2007, I went to the west coast of the United States where I visited a Hard Rock Cafe store in Hollywood CA and bought the first T-Shirt of my collection. Now, I'm an open-source DBA and my number one database is PostgreSQL. So I've decided to use the PostGIS extension to plan my next vacations or conferences to buy more T-Shirts.
The collection
Inventory
Let's do the inventory of my collected T-Shirts by looking at my closet.
The titles represent a location, generally a city but it could be a known place like a stadium. This will be enough to find the related shop in a database. Titles can be written into a file:
Dublin
Los Angeles
Prague
London
Coordinates
The next step is to add coordinates. This information can be found by querying the Nominatim API based on the OpenStreetMap community-driven project.
#!/usr/bin/env python3
import requests
import csv
import time
if __name__ == "__main__":
headers = {"User-Agent": "Hard Rock Cafe Blog Post From Julien Riou"}
session = requests.Session()
session.headers.update(headers)
with open("collection_with_coordinates.csv", "w") as dest:
writer = csv.writer(dest)
with open("collection.csv", "r") as source:
for row in csv.reader(source):
name = row[0]
r = session.get(f"https://nominatim.openstreetmap.org/search?q={name}&limit=1&format=json")
time.sleep(1)
r.raise_for_status()
data = r.json()
if len(data) == 1:
data = data[0]
writer.writerow([name, data["lon"], data["lat"]])
else:
print(f"Location not found for {title}, skipping")
The Python script iterates over the collection.csv
file to query the
Nominatim API to find the most relevent OpenStreetMap node then writes
coordinates in the collection_with_coordinates.csv
file using the
coma-separated values (CSV) format.
Dublin,53.3493795,-6.2605593
Los Angeles,34.0536909,-118.242766
Prague,50.0596288,14.446459273258009
London,51.4893335,-0.14405508452768728
(collection_with_coordinates.csv)
The shops (or "nodes")
Now we need a complete list of Hard Rock Cafe locations with their coordinates to match the collection.
OpenStreetMap
My first idea was to use OpenStreetMap that should provide the needed dataset. I tried to use the Nominatim API but the queries are limited to 40 results. I could download the entire dataset, import it in a PostgreSQL instance locally but it would have been space and time consuming. So, I used the Overpass API with this query (thanks Andreas). In the end, the quality of the data was not satisfying. The amenity was restaurant, bar, pub, cafe, nightclub or shop. The name had an accent ("é") or not ("e"). Sometimes, the brand was not reported. Even with all those filters, there was a node that was not even a Hard Rock Cafe. The more the query grew, the more I wanted to use another method.
Website
I decided to parse the official website. By using a well-known library like Selenium or ferret? Given the personal time I had for this project, I've chosen the quick and dirty path. Let me present you the ugly but functional one-liner to parse the official Hard Rock Cafe website:
curl -sL https://cafe.hardrock.com/locations.aspx | \
grep 'var currentMapPoint=' | \
sed "s/.*{'title':/{'title':/g;s/,'description.*/}/g;s/'/\"/g" | \
sed 's/{"title"://g;s/"lat":/"/g;s/,"lng":/","/g;s/}/"/g' \
> nodes.csv
Very ugly, not future-proof, but it did the job.
"Hard Rock Cafe Amsterdam","52.36211","4.88298"
"Hard Rock Cafe Andorra","42.507707","1.531977"
"Hard Rock Cafe Angkor","13.35314","103.85676"
"Hard Rock Cafe Asuncion","-25.2896910","-57.5737599"
Data exploration
The tool of choice to import and analyze this data is PostgreSQL and its PostGIS extension. I've used Docker to have a disposable local instance to perform quick analysis.
docker run -d --name postgres -v "$(pwd):/mnt:ro" \
-e POSTGRES_USER=hrc -e POSTGRES_PASSWORD=insecurepassword \
-e POSTGRES_DB=hrc \
postgis/postgis:16-3.4
docker exec -ti postgres psql -U hrc -W
The instance is now started and we are connected.
Import
The COPY command on
PostgreSQL can import CSV lines easily. We'll use the psql alias (\copy
) to
send data directly through the client.
create table collection (
name text primary key,
lat numeric,
lon numeric
);
\copy collection (name, lat, lon) from '/mnt/collection_with_coordinates.csv' csv;
create table nodes (
name text primary key,
lat numeric,
lon numeric
);
\copy nodes (name, lat, lon) from '/mnt/nodes.csv' delimiter ',' csv;
Correlation
The SQL query takes all the rows from the collection
table and try to
find a row in the nodes
table within 50 km based on coordinates.
select c.name as tshirt, n.name as restaurant,
round((ST_Distance(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), true)/1000)::numeric, 2)
as distance_km
from collection c
left join nodes n
on ST_DWithin(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), 50000, true)
order by c.name, distance_km;
The PostGIS functions used are:
- ST_Point to create a point in space with coordinates
- ST_Distance to compute the distance between two points
- ST_DWithin to filter only rows with a distance less or equal than the provided value
Result:
tshirt | restaurant | distance_km
------------------+--------------------------------------------+-------------
Amsterdam | Hard Rock Cafe Amsterdam | 1.38
Angkor | Hard Rock Cafe Phnom Penh | 0.75
Antwerp | Hard Rock Cafe Brussels | 41.84
Barcelona | Hard Rock Cafe Barcelona | 0.64
Berlin | Hard Rock Cafe Berlin | 4.32
Boston | (null) | (null)
Brussels | Hard Rock Cafe Brussels | 0.10
Detroit | (null) | (null)
Dublin | Hard Rock Cafe Dublin | 0.40
Hamburg | Hard Rock Cafe Hamburg | 2.36
Ho Chi Minh City | (null) | (null)
Hollywood | Hard Rock Cafe Hollywood on Hollywood Blvd | 1.07
Lisbon | Hard Rock Cafe Lisbon | 1.07
London | Hard Rock Cafe London | 1.66
London | Hard Rock Cafe London Piccadilly Circus | 2.40
Los Angeles | Hard Rock Cafe Hollywood on Hollywood Blvd | 10.46
Miami | Hard Rock Cafe Miami | 0.97
Miami | Hard Rock Cafe Hollywood FL | 30.83
Miami Gardens | Hard Rock Cafe Hollywood FL | 12.62
Miami Gardens | Hard Rock Cafe Miami | 19.26
New York | Hard Rock Cafe New York Times Square | 5.18
New York | Hard Rock Cafe Yankee Stadium | 14.53
Orlando | Hard Rock Cafe Orlando | 11.50
Oslo | (null) | (null)
Paris | Hard Rock Cafe Paris | 2.14
Prague | Hard Rock Cafe Prague | 3.59
San Francisco | Hard Rock Cafe San Francisco | 3.36
Singapore | Hard Rock Cafe Singapore | 5.74
Singapore | Hard Rock Cafe Changi Airport Singapore | 18.88
Singapore | Hard Rock Cafe Puteri Harbour | 19.33
Yankee Stadium | Hard Rock Cafe Yankee Stadium | 0.14
Yankee Stadium | Hard Rock Cafe New York Times Square | 9.53
We can identify multiple patterns here:
- exact match
- closed restaurants (Boston, Detroit, Ho Chi Minh City, Oslo)
- multiple restaurants (Miami, Miami Gardens, Singapore, New York, Yankee Stadium)
- multiple T-Shirts (Miami, Los Angeles)
- wrong match (Angkor, Antwerp)
- missed opportunities (Hollywood FL, Piccadilly Circus)
I've created a script to update the names in the
collection
table to match the names in the nodes
tables to join them by names
instead of the location.
Next locations
The last step of the exploration is to find Hard Rock Cafe locations within a reasonable distance from home (1000 Km). As I don't want to disclose the exact position, we'll search for "Belgium". The country is quite small so that should not be an issue.
$ curl -A "Hard Rock Cafe Blog Post From Julien Riou" \
-sL "https://nominatim.openstreetmap.org/search?limit=1&format=json&q=Belgium" | \
jq -r '.[0].name,.[0].lon,.[0].lat'
België / Belgique / Belgien
4.6667145
50.6402809
The query to search for shops that I don't have already visited looks like this:
select n.name,
round((ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000)::numeric, 2)
as distance_km
from nodes n
left join collection c
on n.name = c.name
where c.name is null
and ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000 < 1000
order by distance_km;
The final result:
name | distance_km
-----------------------------------------+-------------
Hard Rock Cafe Cologne | 164.87
Hard Rock Cafe London Piccadilly Circus | 349.99
Hard Rock Cafe Manchester | 569.37
Hard Rock Cafe Munich | 573.42
Hard Rock Cafe Innsbruck | 618.90
Hard Rock Cafe Newcastle | 640.64
Hard Rock Cafe Milan | 666.41
Hard Rock Cafe Copenhagen | 769.51
Hard Rock Cafe Edinburgh | 789.31
Hard Rock Cafe Venice | 812.96
Hard Rock Cafe Wroclaw | 870.89
Hard Rock Cafe Vienna | 890.23
Hard Rock Cafe Florence | 911.37
Hard Rock Cafe Gothenburg | 918.32
Hard Rock Cafe Andorra | 935.20
Conclusion
According to this study, my next vacations or conferences should take place in Germany or UK. A perfect opportunity to go to PGConf.DE and PGDay UK!