--- title: PostGIS and the Hard Rock Cafe collection date: 2024-05-27T18:00:00+02:00 categories: - postgresql --- In 2007, I went to the west coast of the United States where I visited a Hard Rock Cafe store in Hollywood CA and bought the first T-Shirt of my collection. Now, I'm an open-source DBA and my number one database is [PostgreSQL](https://www.postgresql.org/). So I've decided to use the [PostGIS](https://postgis.net/) extension to plan my next vacations or conferences to buy more T-Shirts. # The collection ## Inventory Let's do the inventory of my collected T-Shirts by looking at my closet. ![](/hrc/t-shirts.jpg) The titles represent a location, generally a city but it could be a known place like a stadium. This will be enough to find the related shop in a database. Titles can be written into a file: ``` Dublin Los Angeles Prague London ``` ([collection.csv](/hrc/collection.csv)) ## Coordinates The next step is to add coordinates. This information can be found by querying the [Nominatim](https://nominatim.org/) API based on the [OpenStreetMap](https://www.openstreetmap.org) community-driven project. ```python #!/usr/bin/env python3 import requests import csv import time if __name__ == "__main__": headers = {"User-Agent": "Hard Rock Cafe Blog Post From Julien Riou"} session = requests.Session() session.headers.update(headers) with open("collection_with_coordinates.csv", "w") as dest: writer = csv.writer(dest) with open("collection.csv", "r") as source: for row in csv.reader(source): name = row[0] r = session.get(f"https://nominatim.openstreetmap.org/search?q={name}&limit=1&format=json") time.sleep(1) r.raise_for_status() data = r.json() if len(data) == 1: data = data[0] writer.writerow([name, data["lon"], data["lat"]]) else: print(f"Location not found for {title}, skipping") ``` ([collection.py](/hrc/collection.py)) The Python script iterates over the `collection.csv` file to query the Nominatim API to find the most relevent OpenStreetMap node then writes coordinates in the `collection_with_coordinates.csv` file using the coma-separated values (CSV) format. ``` Dublin,53.3493795,-6.2605593 Los Angeles,34.0536909,-118.242766 Prague,50.0596288,14.446459273258009 London,51.4893335,-0.14405508452768728 ``` ([collection_with_coordinates.csv](/hrc/collection_with_coordinates.csv)) # The shops (or "nodes") Now we need a complete list of Hard Rock Cafe locations with their coordinates to match the collection. ## OpenStreetMap My first idea was to use OpenStreetMap that should provide the needed dataset. I tried to use the Nominatim API but the queries are [limited to 40 results](https://nominatim.org/release-docs/latest/api/Search/#parameters). I could [download the entire dataset](https://wiki.openstreetmap.org/wiki/Downloading_data), import it in a [PostgreSQL instance](https://osm2pgsql.org/) locally but it would have been space and time consuming. So, I used the [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) with [this query](https://overpass-turbo.eu/s/1LwQ) (thanks [Andreas](https://andreas.scherbaum.la/)). In the end, the quality of the data was not satisfying. The amenity was restaurant, bar, pub, cafe, nightclub or shop. The name had an accent ("é") or not ("e"). Sometimes, the brand was not reported. Even with all those filters, there was a [node](https://www.openstreetmap.org/node/6260098710) that was not even a Hard Rock Cafe. The more the query grew, the more I wanted to use another method. ## Website I decided to parse the official website. By using a well-known library like [Selenium](https://selenium-python.readthedocs.io/) or [ferret](https://www.montferret.dev/)? Given the personal time I had for this project, I've chosen the quick and dirty path. Let me present you the ugly but functional one-liner to parse the official Hard Rock Cafe website: ``` curl -sL https://cafe.hardrock.com/locations.aspx | \ grep 'var currentMapPoint=' | \ sed "s/.*{'title':/{'title':/g;s/,'description.*/}/g;s/'/\"/g" | \ sed 's/{"title"://g;s/"lat":/"/g;s/,"lng":/","/g;s/}/"/g' \ > nodes.csv ``` Very ugly, not future-proof, but it did the job. ``` "Hard Rock Cafe Amsterdam","52.36211","4.88298" "Hard Rock Cafe Andorra","42.507707","1.531977" "Hard Rock Cafe Angkor","13.35314","103.85676" "Hard Rock Cafe Asuncion","-25.2896910","-57.5737599" ``` ([nodes.csv](/hrc/nodes.csv)) # Data exploration The tool of choice to import and analyze this data is PostgreSQL and its PostGIS extension. I've used [Docker](https://www.docker.com/) to have a disposable local instance to perform quick analysis. ``` docker run -d --name postgres -v "$(pwd):/mnt:ro" \ -e POSTGRES_USER=hrc -e POSTGRES_PASSWORD=insecurepassword \ -e POSTGRES_DB=hrc \ postgis/postgis:16-3.4 docker exec -ti postgres psql -U hrc -W ``` The instance is now started and we are connected. ## Import The [COPY](https://www.postgresql.org/docs/current/sql-copy.html) command on PostgreSQL can import CSV lines easily. We'll use the psql alias (`\copy`) to send data directly through the client. ``` create table collection ( name text primary key, lat numeric, lon numeric ); \copy collection (name, lat, lon) from '/mnt/collection_with_coordinates.csv' csv; create table nodes ( name text primary key, lat numeric, lon numeric ); \copy nodes (name, lat, lon) from '/mnt/nodes.csv' delimiter ',' csv; ``` ## Correlation The SQL query takes all the rows from the `collection` table and try to find a row in the `nodes` table within 50 km based on coordinates. ``` select c.name as tshirt, n.name as restaurant, round((ST_Distance(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), true)/1000)::numeric, 2) as distance_km from collection c left join nodes n on ST_DWithin(ST_Point(c.lon, c.lat), ST_Point(n.lon, n.lat), 50000, true) order by c.name, distance_km; ``` The PostGIS functions used are: * [ST_Point](https://postgis.net/docs/ST_Point.html) to create a point in space with coordinates * [ST_Distance](https://postgis.net/docs/ST_Distance.html) to compute the distance between two points * [ST_DWithin](https://postgis.net/docs/ST_DWithin.html) to filter only rows with a distance less or equal than the provided value Result: ``` tshirt | restaurant | distance_km ------------------+--------------------------------------------+------------- Amsterdam | Hard Rock Cafe Amsterdam | 1.38 Angkor | Hard Rock Cafe Phnom Penh | 0.75 Antwerp | Hard Rock Cafe Brussels | 41.84 Barcelona | Hard Rock Cafe Barcelona | 0.64 Berlin | Hard Rock Cafe Berlin | 4.32 Boston | (null) | (null) Brussels | Hard Rock Cafe Brussels | 0.10 Detroit | (null) | (null) Dublin | Hard Rock Cafe Dublin | 0.40 Hamburg | Hard Rock Cafe Hamburg | 2.36 Ho Chi Minh City | (null) | (null) Hollywood | Hard Rock Cafe Hollywood on Hollywood Blvd | 1.07 Lisbon | Hard Rock Cafe Lisbon | 1.07 London | Hard Rock Cafe London | 1.66 London | Hard Rock Cafe London Piccadilly Circus | 2.40 Los Angeles | Hard Rock Cafe Hollywood on Hollywood Blvd | 10.46 Miami | Hard Rock Cafe Miami | 0.97 Miami | Hard Rock Cafe Hollywood FL | 30.83 Miami Gardens | Hard Rock Cafe Hollywood FL | 12.62 Miami Gardens | Hard Rock Cafe Miami | 19.26 New York | Hard Rock Cafe New York Times Square | 5.18 New York | Hard Rock Cafe Yankee Stadium | 14.53 Orlando | Hard Rock Cafe Orlando | 11.50 Oslo | (null) | (null) Paris | Hard Rock Cafe Paris | 2.14 Prague | Hard Rock Cafe Prague | 3.59 San Francisco | Hard Rock Cafe San Francisco | 3.36 Singapore | Hard Rock Cafe Singapore | 5.74 Singapore | Hard Rock Cafe Changi Airport Singapore | 18.88 Singapore | Hard Rock Cafe Puteri Harbour | 19.33 Yankee Stadium | Hard Rock Cafe Yankee Stadium | 0.14 Yankee Stadium | Hard Rock Cafe New York Times Square | 9.53 ``` We can identify multiple patterns here: * exact match * closed restaurants (Boston, Detroit, Ho Chi Minh City, Oslo) * multiple restaurants (Miami, Miami Gardens, Singapore, New York, Yankee Stadium) * multiple T-Shirts (Miami, Los Angeles) * wrong match (Angkor, Antwerp) * missed opportunities (Hollywood FL, Piccadilly Circus) I've created a [script](/hrc/update.sql) to update the names in the `collection` table to match the names in the `nodes` tables to join them by names instead of the location. # Next locations The last step of the exploration is to find Hard Rock Cafe locations within a reasonable distance from home (1000 Km). As I don't want to disclose the exact position, we'll search for "Belgium". The country is quite small so that should not be an issue. ``` $ curl -A "Hard Rock Cafe Blog Post From Julien Riou" \ -sL "https://nominatim.openstreetmap.org/search?limit=1&format=json&q=Belgium" | \ jq -r '.[0].name,.[0].lon,.[0].lat' België / Belgique / Belgien 4.6667145 50.6402809 ``` The query to search for shops that I don't have already visited looks like this: ``` select n.name, round((ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000)::numeric, 2) as distance_km from nodes n left join collection c on n.name = c.name where c.name is null and ST_Distance(ST_Point(n.lon, n.lat), ST_Point(4.6667145, 50.6402809), true)/1000 < 1000 order by distance_km; ``` The final result: ``` name | distance_km -----------------------------------------+------------- Hard Rock Cafe Cologne | 164.87 Hard Rock Cafe London Piccadilly Circus | 349.99 Hard Rock Cafe Manchester | 569.37 Hard Rock Cafe Munich | 573.42 Hard Rock Cafe Innsbruck | 618.90 Hard Rock Cafe Newcastle | 640.64 Hard Rock Cafe Milan | 666.41 Hard Rock Cafe Copenhagen | 769.51 Hard Rock Cafe Edinburgh | 789.31 Hard Rock Cafe Venice | 812.96 Hard Rock Cafe Wroclaw | 870.89 Hard Rock Cafe Vienna | 890.23 Hard Rock Cafe Florence | 911.37 Hard Rock Cafe Gothenburg | 918.32 Hard Rock Cafe Andorra | 935.20 ``` # Conclusion According to this study, my next vacations or conferences should take place in Germany or UK. A perfect opportunity to go to [PGConf.DE](https://2024.pgconf.de/) and [PGDay UK](https://pgday.uk/)!