Release 0.2.0
- new language: go
- new shops: cybertek.fr, mediamarkt.ch
- deprecated shops: alternate.be, minershop.eu
- improved database transaction management
- better web parsing library (ferret, requires headless chrome browser)
- include or exclude products by applying regex on their names
- check for PID file to avoid running the bot twice
- hashtags are now configurable

Signed-off-by: Julien Riou <julien@riou.xyz>
parent 31bcbc51dd
commit 3a4aba93e5
26 changed files with 1376 additions and 1076 deletions
212
README.md
|
@ -1,88 +1,168 @@
|
|||
Year 2020 has been quite hard for hardware supply. Graphics Cards are out of stock everywhere. Nobody can grab the
|
||||
new generation (AMD RX 6000 series, NVIDIA GeForce RTX 3000 series). Even older generations are hard to find.
|
||||
**GraphicRestock** is a bot that crawls retailers' websites and notifies you when a product is available.
|
||||
# RestockBot
|
||||
|
||||
# Setup
|
||||
Year 2020 has been quite hard for hardware supply. Graphics cards are out of stock everywhere. Nobody can grab the new generation (AMD RX 6000 series, NVIDIA GeForce RTX 3000 series). Even older generations are hard to find. `RestockBot` is a bot that crawls retailers' websites and notifies you when a product is available.
|
||||
|
||||
Based on Debian 10:
|
||||
## Requirements
|
||||
|
||||
### Headless browser
|
||||
|
||||
Use Docker:
|
||||
|
||||
```
|
||||
apt install python3-selenium python3-sqlalchemy python3-tweepy python3-bs4 firefox-esr
|
||||
curl -L -s https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-linux64.tar.gz | tar xvpzf - -C /usr/local/bin/
|
||||
chown root:root /usr/local/bin/geckodriver
|
||||
chmod +x /usr/local/bin/geckodriver
|
||||
docker run --name chromium --rm -d -p 9222:9222 montferret/chromium
|
||||
```
|
||||
|
||||
# Configure
|
||||
Or get inspired by the [source code](https://github.com/MontFerret/chromium) to run it on your own.
|
||||
|
||||
An example configuration file can be found [here](config.json.example).
|
||||
### Twitter (optional)
|
||||
|
||||
Follow [this procedure](https://github.com/jouir/twitter-login) to generate all the required settings:
|
||||
* `consumer_key`
|
||||
* `consumer_secret`
|
||||
* `access_token`
|
||||
* `access_token_secret`
|
||||
|
||||
## Installation
|
||||
|
||||
Download the latest [release](https://github.com/jouir/restockbot/releases).
|
||||
|
||||
Ensure checksums are identical.
|
||||
|
||||
Then execute the binary:
|
||||
|
||||
```
|
||||
./restockbot -version
|
||||
./restockbot -help
|
||||
```
|
||||
|
||||
## Compilation
|
||||
|
||||
Clone the repository:
|
||||
```
|
||||
git clone https://github.com/jouir/restockbot.git
|
||||
```
|
||||
|
||||
Build the `restockbot` binary:
|
||||
```
|
||||
make build
|
||||
ls -l bin/restockbot
|
||||
```
|
||||
|
||||
Build with the architecture in the binary name:
|
||||
|
||||
```
|
||||
make release
|
||||
```
|
||||
|
||||
Optionally, remove the produced binaries with:
|
||||
|
||||
```
|
||||
make clean
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
The default configuration file is `restockbot.json` in the current directory. A different file name can be passed with the `-config` argument.
|
||||
|
||||
Options:
|
||||
* **twitter.consumer_key**: key of your Twitter application
|
||||
* **twitter.consumer_secret**: secret of your Twitter application
|
||||
* **twitter.access_token**: authentication token generated by [twitter_auth.py](twitter_auth.py)
|
||||
* **twitter.access_token_secret**: authentication token secret generated by [twitter_auth.py](twitter_auth.py)
|
||||
* **urls**: list of retailers web pages (they need to respect crawlers' format)
|
||||
* **executable_path** (optional): path to selenium driver (firefox/gecko browser)
|
||||
|
||||
* `urls`: list of retailers web pages
|
||||
* `twitter` (optional):
|
||||
* `consumer_key`: API key of your Twitter application
|
||||
* `consumer_secret`: API secret of your Twitter application
|
||||
* `access_token`: authentication token generated for your Twitter account
|
||||
* `access_token_secret`: authentication token secret generated for your Twitter account
|
||||
* `hashtags`: map of key/values used to append hashtags to each tweet. Key is the pattern to match in the product name, value is the string to append to the tweet. For example, `{"twitter": {"hashtags": {"rtx 3090": "#nvidia #rtx3090"}}}` will detect `rtx 3090` to append `#nvidia #rtx3090` at the end of the tweet.
|
||||
* `include_regex` (optional): include products with a name matching this regexp
|
||||
* `exclude_regex` (optional): exclude products with a name matching this regexp
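
As an illustration of how these options fit together, here is a minimal Go sketch that decodes such a file and applies the filters and hashtags to a product name. It is not taken from the `RestockBot` source: the `Config` type, the `parseConfig`, `keep` and `hashtagsFor` helpers, and the case-insensitive substring matching used for hashtags are all assumptions made for the example. Only the JSON keys come from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Config mirrors the documented JSON keys. The Go names are hypothetical.
type Config struct {
	URLs    []string `json:"urls"`
	Twitter struct {
		ConsumerKey       string            `json:"consumer_key"`
		ConsumerSecret    string            `json:"consumer_secret"`
		AccessToken       string            `json:"access_token"`
		AccessTokenSecret string            `json:"access_token_secret"`
		Hashtags          map[string]string `json:"hashtags"`
	} `json:"twitter"`
	IncludeRegex string `json:"include_regex"`
	ExcludeRegex string `json:"exclude_regex"`
}

// parseConfig decodes the JSON document into a Config.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

// keep reports whether a product name passes both filters.
// An empty pattern disables the corresponding filter.
func keep(name, include, exclude string) (bool, error) {
	if include != "" {
		ok, err := regexp.MatchString(include, name)
		if err != nil || !ok {
			return false, err
		}
	}
	if exclude != "" {
		ok, err := regexp.MatchString(exclude, name)
		if err != nil || ok {
			return false, err
		}
	}
	return true, nil
}

// hashtagsFor returns the hashtags of the first pattern (in sorted key order)
// found in the product name. Matching is a case-insensitive substring test,
// which is an assumption of this sketch.
func hashtagsFor(name string, hashtags map[string]string) string {
	patterns := make([]string, 0, len(hashtags))
	for p := range hashtags {
		patterns = append(patterns, p)
	}
	sort.Strings(patterns)
	lower := strings.ToLower(name)
	for _, p := range patterns {
		if strings.Contains(lower, strings.ToLower(p)) {
			return hashtags[p]
		}
	}
	return ""
}

func main() {
	raw := []byte(`{
  "urls": ["https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/"],
  "twitter": {"hashtags": {"rtx 3090": "#nvidia #rtx3090"}},
  "include_regex": "3090"
}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	name := "Asus GeForce RTX 3090 TUF"
	ok, err := keep(name, cfg.IncludeRegex, cfg.ExcludeRegex)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, hashtagsFor(name, cfg.Twitter.Hashtags))
}
```

Sorting the hashtag patterns keeps the result deterministic, because Go does not guarantee map iteration order.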
|
||||
|
||||
# Twitter authentication
|
||||
## How to contribute
|
||||
|
||||
Create a configuration file with **twitter.consumer_key** and **twitter.consumer_secret** parameters.
|
||||
|
||||
Then authenticate:
|
||||
Lint the code with pre-commit:
|
||||
|
||||
```
|
||||
python3 twitter_auth.py
|
||||
```
|
||||
|
||||
You will have to open the URL and authenticate:
|
||||
|
||||
```
|
||||
Please go to https://api.twitter.com/oauth/authorize?oauth_token=****
|
||||
```
|
||||
Click on **Authorize app**. A verifier code will be shown. Go back to your console and enter the code.
|
||||
|
||||
```
|
||||
Verifier:*******
|
||||
```
|
||||
|
||||
Tokens will be created:
|
||||
|
||||
```
|
||||
access_token = *****
|
||||
access_token_secret = ****
|
||||
```
|
||||
|
||||
Finally, write them to the configuration file in the **twitter.access_token** and **twitter.access_token_secret** parameters.
|
||||
|
||||
|
||||
# Usage
|
||||
|
||||
```
|
||||
python3 main.py --help
|
||||
```
|
||||
|
||||
# How to contribute
|
||||
|
||||
First things first, check the issues to ensure the feature or bug you are facing has not already been reported.
|
||||
|
||||
Pull requests are highly appreciated.
|
||||
|
||||
Please lint your code:
|
||||
|
||||
```
|
||||
docker run -it -v $(pwd):/mnt/ --rm debian:10 bash
|
||||
apt-get update && apt-get upgrade -y && apt-get install -y python3-pip git
|
||||
docker run -it -v $(pwd):/mnt/ --rm golang:latest bash
|
||||
go get -u golang.org/x/lint/golint
|
||||
apt-get update && apt-get upgrade -y && apt-get install -y git python3-pip
|
||||
pip3 install pre-commit
|
||||
cd /mnt
|
||||
pre-commit run --all-files
|
||||
```
|
||||
|
||||
Happy coding!
|
||||
## How to parse a shop
|
||||
|
||||
### Create the Ferret query
|
||||
|
||||
# Disclaimer
|
||||
`RestockBot` uses [Ferret](https://github.com/MontFerret/ferret) and its FQL (Ferret Query Language) to parse websites. The full documentation is available [here](https://www.montferret.dev/docs/introduction/). Once installed, this library can be used as a CLI command or embedded in the application. To create the query, we can use the CLI for fast iterations, then we'll integrate the query in `RestockBot` later.
|
||||
|
||||
Crawling a website should be used with caution. Please check with retailers if the bot respects the terms of use for
|
||||
their websites. The authors of the bot are not responsible for how it is used.
|
||||
```
|
||||
vim shop.fql
|
||||
ferret --cdp http://127.0.0.1:9222 -time shop.fql
|
||||
```
|
||||
|
||||
The query must return a list of products in JSON format with the following elements:
|
||||
* `name`: string
|
||||
* `url`: string
|
||||
* `price`: float
|
||||
* `price_currency`: string
|
||||
* `available`: boolean
|
||||
|
||||
Example:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"available": false,
|
||||
"name": "Zotac GeForce RTX 3070 AMP Holo",
|
||||
"price": 799.99,
|
||||
"price_currency": "EUR",
|
||||
"url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20007322.html"
|
||||
},
|
||||
{
|
||||
"available": false,
|
||||
"name": "Asus GeForce RTX 3070 DUAL 8G",
|
||||
"price": 739.99,
|
||||
"price_currency": "EUR",
|
||||
"url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20005540.html"
|
||||
},
|
||||
{
|
||||
"available": false,
|
||||
"name": "Palit GeForce RTX 3070 GamingPro OC",
|
||||
"price": 819.99,
|
||||
"price_currency": "EUR",
|
||||
"url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20005819.html"
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
`RestockBot` will convert this JSON to a list of `Product`.
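
A minimal sketch of that conversion, assuming a `Product` struct whose JSON tags match the elements listed above. The Go field names and the `parseProducts` helper are hypothetical; the actual type in the repository may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product holds one item returned by a Ferret query.
// The JSON tags follow the documented elements; the Go names are assumptions.
type Product struct {
	Name          string  `json:"name"`
	URL           string  `json:"url"`
	Price         float64 `json:"price"`
	PriceCurrency string  `json:"price_currency"`
	Available     bool    `json:"available"`
}

// parseProducts decodes the JSON array produced by the query.
func parseProducts(data []byte) ([]Product, error) {
	var products []Product
	if err := json.Unmarshal(data, &products); err != nil {
		return nil, err
	}
	return products, nil
}

func main() {
	raw := []byte(`[
  {
    "available": false,
    "name": "Zotac GeForce RTX 3070 AMP Holo",
    "price": 799.99,
    "price_currency": "EUR",
    "url": "https://www.topachat.com/pages/detail2_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_ref_est_in20007322.html"
  }
]`)
	products, err := parseProducts(raw)
	if err != nil {
		panic(err)
	}
	for _, p := range products {
		fmt.Printf("%s: %.2f %s (available: %t)\n", p.Name, p.Price, p.PriceCurrency, p.Available)
	}
}
```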
|
||||
|
||||
### Embed the query
|
||||
|
||||
Shops are configured as a list of URLs:
|
||||
|
||||
```json
|
||||
{
|
||||
"urls": [
|
||||
"https://www.topachat.com/pages/produits_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_f_est_58-11447,11445,11446,11559,11558.html",
|
||||
"https://www.ldlc.com/informatique/pieces-informatique/carte-graphique-interne/c4684/+fv121-19183,19184,19185,19339,19340.html",
|
||||
"https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The `Parse` function ([parser.go](parser.go)) will be called. In this example, the following **shop names** will be deduced: `topachat.com`, `ldlc.com` and `materiel.net`.
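
One way the shop name could be deduced from a URL is sketched below. The `shopName` helper is hypothetical and simply strips a leading `www.` from the host; the real logic in [parser.go](parser.go) may work differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shopName extracts the host from a URL and drops a leading "www.".
// This is an illustrative guess at how names like "ldlc.com" are obtained.
func shopName(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", fmt.Errorf("no host found in %q", rawURL)
	}
	return strings.TrimPrefix(host, "www."), nil
}

func main() {
	urls := []string{
		"https://www.topachat.com/pages/produits_cat_est_micro_puis_rubrique_est_wgfx_pcie_puis_f_est_58-11447,11445,11446,11559,11558.html",
		"https://www.ldlc.com/informatique/pieces-informatique/carte-graphique-interne/c4684/+fv121-19183,19184,19185,19339,19340.html",
		"https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/",
	}
	for _, u := range urls {
		name, err := shopName(u)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```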
|
||||
|
||||
Each shop should implement a function to create a Ferret query based on a URL:
|
||||
* `func createQueryForLDLC(url string) string`
|
||||
* `func createQueryForMaterielNet(url string) string`
|
||||
* `func createQueryForTopachat(url string) string`
|
||||
* ...
|
||||
|
||||
This function should be added to the switch of the `createQuery` function ([parser.go](parser.go)).
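
The dispatch could look roughly like the sketch below. The signature of `createQuery`, its error handling, and the placeholder strings returned by the per-shop functions are assumptions for illustration; a real implementation would return the FQL text developed with the Ferret CLI.

```go
package main

import "fmt"

// Placeholder per-shop functions. In practice each one would return
// the FQL query written for that shop.
func createQueryForLDLC(url string) string        { return "LDLC query for " + url }
func createQueryForMaterielNet(url string) string { return "Materiel.net query for " + url }
func createQueryForTopachat(url string) string    { return "Topachat query for " + url }

// createQuery selects the query builder matching the shop name.
// Adding a shop means adding one case here (hypothetical signature).
func createQuery(shopName string, url string) (string, error) {
	switch shopName {
	case "ldlc.com":
		return createQueryForLDLC(url), nil
	case "materiel.net":
		return createQueryForMaterielNet(url), nil
	case "topachat.com":
		return createQueryForTopachat(url), nil
	default:
		return "", fmt.Errorf("shop %q is not supported", shopName)
	}
}

func main() {
	query, err := createQuery("materiel.net", "https://www.materiel.net/carte-graphique/l426/+fv121-19183,19184,19185,19339,19340/")
	if err != nil {
		panic(err)
	}
	fmt.Println(query)
}
```

Returning an error for unknown shops keeps unsupported URLs from failing silently.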
|
||||
|
||||
Products will then be parsed.
|
||||
|
||||
## Disclaimer
|
||||
|
||||
Crawling a website should be done with caution. Please check with retailers that the bot respects the terms of use of their websites. The authors of the bot are not responsible for how it is used.
|