Send notifications when products become available
This repository has been archived on 2024-12-18. You can view files and clone it, but cannot push or open issues or pull requests.
Julien Riou e67ab63ca8
Prepare for new parsers
- Rename "Parser" to "URLParser"
- Make "Parse" function generic
- Rename "crawlShop" function to "handleProducts"
- Reduce "handleProducts" footprint a little bit

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-23 09:00:10 +01:00
.gitignore Release 0.2.0 2021-02-27 08:10:43 +01:00
.pre-commit-config.yaml Release 0.2.0 2021-02-27 08:10:43 +01:00
config.go Configure headless browser address 2021-03-01 13:11:58 +01:00
Dockerfile Add Dockerfile 2021-03-09 10:59:26 +01:00
go.mod Release 0.2.0 2021-02-27 08:10:43 +01:00
go.sum Release 0.2.0 2021-02-27 08:10:43 +01:00
LICENSE Initial commit 2020-12-27 18:23:36 +01:00
main.go Prepare for new parsers 2021-03-23 09:00:10 +01:00
Makefile Release 0.2.0 2021-02-27 08:10:43 +01:00
models.go Release 0.2.0 2021-02-27 08:10:43 +01:00
notifier.go Release 0.2.0 2021-02-27 08:10:43 +01:00
parser_url.go Prepare for new parsers 2021-03-23 09:00:10 +01:00
parser_url_test.go Prepare for new parsers 2021-03-23 09:00:10 +01:00
pid.go Release 0.2.0 2021-02-27 08:10:43 +01:00
README.md Add Dockerfile 2021-03-09 10:59:26 +01:00
twitter.go Update the Twitter reply 2021-03-12 12:01:07 +01:00
twitter_test.go Remove useless comment in Twitter tests 2021-03-02 09:01:18 +01:00
utils.go Remove useless compileRegex function 2021-02-27 15:14:21 +01:00
VERSION Release 0.2.3 2021-03-23 07:58:44 +01:00

RestockBot

The year 2020 has been hard for hardware supply. Graphics cards are out of stock everywhere. Nobody can grab the new generation (AMD RX 6000 series, NVIDIA GeForce RTX 3000 series). Even older generations are hard to find. RestockBot is a bot that crawls retailers' websites and sends a notification when a product becomes available.

Requirements

Headless browser

Use Docker:

docker run --name chromium --rm -d -p 9222:9222 montferret/chromium

Or get inspired by the source code to run it on your own.

Twitter (optional)

Follow this procedure to generate all the required settings:

  • consumer_key
  • consumer_secret
  • access_token
  • access_token_secret

Compilation

With pre-built binaries

Download the latest release.

Verify that the checksum of the downloaded file matches the published one.

With make

Clone the repository:

git clone https://github.com/jouir/restockbot.git

Build the restockbot binary:

make build
ls -l bin/restockbot

Build with the architecture in the binary name:

make release

Optionally, remove the built binaries with:

make clean

With Docker

docker image build -t restockbot:$(cat VERSION) .

Configuration

The default configuration file is restockbot.json in the current directory. A different file name can be passed with the -config argument.

Options:

  • urls: list of retailers' web pages to crawl
  • twitter (optional):
    • consumer_key: API key of your Twitter application
    • consumer_secret: API secret of your Twitter application
    • access_token: authentication token generated for your Twitter account
    • access_token_secret: authentication token secret generated for your Twitter account
    • hashtags: list of key/value pairs used to append hashtags to each tweet. The key is the pattern to match in the product name; the value is the string to append to the tweet. For example, with {"twitter": {"hashtags": [{"rtx 3090": "#nvidia #rtx3090"}]}}, a product name containing rtx 3090 gets #nvidia #rtx3090 appended at the end of the tweet.
  • include_regex (optional): include products with a name matching this regexp
  • exclude_regex (optional): exclude products with a name matching this regexp
  • browser_address (optional): address of the headless browser (e.g. http://127.0.0.1:9222)
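
To make the options above concrete, here is a minimal Go sketch of how such a file could be decoded and how the filters and hashtags could be applied. The Config and TwitterConfig types, the function names, the product name and the case-insensitive substring matching for hashtags are all assumptions made for illustration; the real code in config.go and twitter.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// TwitterConfig mirrors the "twitter" section (hypothetical type).
type TwitterConfig struct {
	ConsumerKey       string              `json:"consumer_key"`
	ConsumerSecret    string              `json:"consumer_secret"`
	AccessToken       string              `json:"access_token"`
	AccessTokenSecret string              `json:"access_token_secret"`
	Hashtags          []map[string]string `json:"hashtags"`
}

// Config mirrors restockbot.json (hypothetical type).
type Config struct {
	URLs           []string       `json:"urls"`
	Twitter        *TwitterConfig `json:"twitter,omitempty"`
	IncludeRegex   string         `json:"include_regex"`
	ExcludeRegex   string         `json:"exclude_regex"`
	BrowserAddress string         `json:"browser_address"`
}

// parseConfig decodes a JSON document into a Config.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

// keep reports whether a product name passes the include/exclude filters.
// An empty pattern disables the corresponding filter.
func keep(name, include, exclude string) (bool, error) {
	if include != "" {
		ok, err := regexp.MatchString(include, name)
		if err != nil || !ok {
			return false, err
		}
	}
	if exclude != "" {
		ok, err := regexp.MatchString(exclude, name)
		if err != nil || ok {
			return false, err
		}
	}
	return true, nil
}

// hashtagsFor returns the hashtags of the first list entry whose pattern is
// found in the name. Each entry is expected to hold a single pattern.
// Case-insensitive substring matching is an assumption.
func hashtagsFor(name string, hashtags []map[string]string) string {
	lower := strings.ToLower(name)
	for _, entry := range hashtags {
		for pattern, tags := range entry {
			if strings.Contains(lower, strings.ToLower(pattern)) {
				return tags
			}
		}
	}
	return ""
}

func main() {
	raw := []byte(`{
		"urls": ["https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/"],
		"twitter": {"hashtags": [{"rtx 3090": "#nvidia #rtx3090"}]},
		"include_regex": "(?i)rtx",
		"browser_address": "http://127.0.0.1:9222"
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	name := "Asus GeForce RTX 3090 TUF" // made-up product name
	ok, err := keep(name, cfg.IncludeRegex, cfg.ExcludeRegex)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, hashtagsFor(name, cfg.Twitter.Hashtags))
}
```

Keeping twitter as a pointer lets the program tell a missing section apart from an empty one, which fits an optional block.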

Usage

With binary

restockbot -help

With Docker

docker run -it --name restockbot --rm --link chromium:chromium -v $(pwd):/root/ restockbot:$(cat VERSION) restockbot -help

How to contribute

Lint the code with pre-commit:

docker run -it -v $(pwd):/mnt/ --rm golang:latest bash
go install golang.org/x/lint/golint@latest
apt-get update && apt-get upgrade -y && apt-get install -y git python3-pip
pip3 install pre-commit
cd /mnt
pre-commit run --all-files

How to parse a shop

Create the Ferret query

RestockBot uses Ferret and its FQL (Ferret Query Language) to parse websites. The full documentation is available here. Once installed, this library can be used as a CLI command or embedded in the application. To create the query, we can use the CLI for fast iterations, then we'll integrate the query in RestockBot later.

vim shop.fql
ferret --cdp http://127.0.0.1:9222 -time shop.fql

The query must return a list of products in JSON format with the following elements:

  • name: string
  • url: string
  • price: float
  • price_currency: string
  • available: boolean

Example:

[
  {
    "available": false,
    "name": "Zotac GeForce RTX 3070 AMP Holo",
    "price": 799.99,
    "price_currency": "EUR",
    "url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20007322.html"
  },
  {
    "available": false,
    "name": "Asus GeForce RTX 3070 DUAL 8G",
    "price": 739.99,
    "price_currency": "EUR",
    "url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20005540.html"
  },
  {
    "available": false,
    "name": "Palit GeForce RTX 3070 GamingPro OC",
    "price": 819.99,
    "price_currency": "EUR",
    "url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20005819.html"
  }
]

RestockBot will convert this JSON to a list of Product.
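
As a rough illustration of that conversion, the sketch below decodes the query output with encoding/json. The Product struct shown here is an assumption derived from the field list above; the actual type in models.go may carry extra fields or tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product is a hypothetical shape for one element of the query output.
type Product struct {
	Name          string  `json:"name"`
	URL           string  `json:"url"`
	Price         float64 `json:"price"`
	PriceCurrency string  `json:"price_currency"`
	Available     bool    `json:"available"`
}

// decodeProducts converts the JSON array returned by a query into products.
func decodeProducts(data []byte) ([]Product, error) {
	var products []Product
	if err := json.Unmarshal(data, &products); err != nil {
		return nil, err
	}
	return products, nil
}

func main() {
	raw := []byte(`[
	  {
	    "available": false,
	    "name": "Asus GeForce RTX 3070 DUAL 8G",
	    "price": 739.99,
	    "price_currency": "EUR",
	    "url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20005540.html"
	  }
	]`)
	products, err := decodeProducts(raw)
	if err != nil {
		panic(err)
	}
	for _, p := range products {
		fmt.Printf("%s: %.2f %s (available: %t)\n", p.Name, p.Price, p.PriceCurrency, p.Available)
	}
}
```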

Embed the query

Shops are configured as a list of URLs:

{
    "urls": [
        "https://www.topachat.com/pages/produits_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_f_est_58-11447,11445,11446,11559,11558.html",
        "https://www.ldlc.com/informatique/pieces-informatique/carte-graphique-interne/c4684/+fv121-19183,19184,19185,19339,19340.html",
        "https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/"
    ]
}

The Parse function (parser_url.go) will be called for each URL. In this example, the following shop names will be deduced from the host names: topachat.com, ldlc.com and materiel.net.
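
One plausible way to deduce a shop name from a URL is to take the host name and drop a leading "www.". The sketch below does exactly that with the standard library; the shopName function is hypothetical and the real extraction logic may be different.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shopName returns the host of rawURL without a leading "www.".
// This is an assumed rule, not necessarily the one RestockBot uses.
func shopName(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := u.Hostname()
	if host == "" {
		return "", fmt.Errorf("no host in URL %q", rawURL)
	}
	return strings.TrimPrefix(strings.ToLower(host), "www."), nil
}

func main() {
	urls := []string{
		"https://www.topachat.com/pages/produits_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_f_est_58-11447,11445,11446,11559,11558.html",
		"https://www.ldlc.com/informatique/pieces-informatique/carte-graphique-interne/c4684/+fv121-19183,19184,19185,19339,19340.html",
		"https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/",
	}
	for _, u := range urls {
		name, err := shopName(u)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```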

Each shop should implement a function that creates a Ferret query from a URL:

  • func createQueryForLDLC(url string) string
  • func createQueryForMaterielNet(url string) string
  • func createQueryForTopachat(url string) string
  • ...

The new function should then be added to the switch statement of the createQuery function (parser_url.go), so that it is selected for the matching shop name.
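
The dispatch described above could look like the following sketch. Only the createQueryFor... signatures come from this README; the createQuery signature, the error message, the placeholderQuery helper and the CSS selectors are hypothetical. The embedded FQL is a deliberately incomplete placeholder that only extracts a name, whereas a real query must return name, url, price, price_currency and available.

```go
package main

import (
	"fmt"
)

// placeholderQuery builds a minimal FQL query for illustration only.
// The selector argument is a made-up CSS selector.
func placeholderQuery(url, selector string) string {
	return fmt.Sprintf(`LET doc = DOCUMENT(%q, {driver: "cdp"})
FOR el IN ELEMENTS(doc, %q)
    RETURN {name: INNER_TEXT(el)}`, url, selector)
}

func createQueryForLDLC(url string) string {
	return placeholderQuery(url, ".hypothetical-ldlc-item")
}

func createQueryForMaterielNet(url string) string {
	return placeholderQuery(url, ".hypothetical-materiel-item")
}

func createQueryForTopachat(url string) string {
	return placeholderQuery(url, ".hypothetical-topachat-item")
}

// createQuery selects the query builder for a shop name (assumed signature).
func createQuery(shopName, url string) (string, error) {
	switch shopName {
	case "ldlc.com":
		return createQueryForLDLC(url), nil
	case "materiel.net":
		return createQueryForMaterielNet(url), nil
	case "topachat.com":
		return createQueryForTopachat(url), nil
	default:
		return "", fmt.Errorf("shop %q is not supported", shopName)
	}
}

func main() {
	q, err := createQuery("materiel.net", "https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

Returning an error for unknown shops keeps a misconfigured URL from silently producing no products.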

Products will then be parsed.

Disclaimer

Crawl websites with caution. Please check with retailers that the bot complies with the terms of use of their websites. The authors of the bot are not responsible for how it is used.