When the worker threshold was reached, the current parser was skipped instead
of being processed later. Add a for loop to retry instead.
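A minimal sketch of the retry idea, not the actual code of this commit: the
names runAll, maxWorkers and work are hypothetical, and the 10 ms polling
interval is an assumption. When all worker slots are taken, the inner for loop
waits and tries again instead of dropping the job.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runAll (hypothetical name) starts one goroutine per job, never more than
// maxWorkers at once. When the threshold is reached, it waits and retries
// instead of skipping the job. It returns the number of jobs processed.
func runAll(jobs []string, maxWorkers int, work func(string)) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		running int
		done    int
	)
	for _, job := range jobs {
		// Retry loop: poll until a worker slot is free.
		for {
			mu.Lock()
			if running < maxWorkers {
				running++
				mu.Unlock()
				break
			}
			mu.Unlock()
			time.Sleep(10 * time.Millisecond) // assumed polling interval
		}
		wg.Add(1)
		go func(j string) {
			defer wg.Done()
			work(j)
			mu.Lock()
			running--
			done++
			mu.Unlock()
		}(job)
	}
	wg.Wait()
	return done
}

func main() {
	urls := []string{"a", "b", "c", "d", "e"}
	n := runAll(urls, 2, func(string) { time.Sleep(5 * time.Millisecond) })
	fmt.Println("processed:", n)
}
```

A buffered channel used as a semaphore would give the same guarantee without
polling; the explicit loop is shown only because it mirrors the wording above.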
Signed-off-by: Julien Riou <julien@riou.xyz>
Products that have not been updated for a while are not supposed to stay in the
database or be exposed via the API. This commit automatically updates all
detected products with the current date and adds a "retention" flag (number of
days) to remove old products. This flag is disabled by default.
Signed-off-by: Julien Riou <julien@riou.xyz>
A shop map was created to group URLs by shop and process them in order. Now
that we have Amazon and each URL can be parsed independently, there is no need
to group them anymore. Moreover, shops were passed as an argument to the
handleProducts function, although the shop name can be deduced by the parser
itself. The parser has a reference to the database, so it now selects or
creates the shop before parsing products.
Signed-off-by: Julien Riou <julien@riou.xyz>
This commit introduces Amazon support with calls to the Product Advertising
API (PA API). For now, I was only able to use the "www.amazon.fr" marketplace.
I will add more marketplaces once my Amazon Associate accounts are validated.
Signed-off-by: Julien Riou <julien@riou.xyz>
Add `-api` mode to start the HTTP API with the following routes:
- /health
- /shops
- /shops/:id
- /products
- /products/:id
Signed-off-by: Julien Riou <julien@riou.xyz>
- Rename "Parser" to "URLParser"
- Make "Parse" function generic
- Rename "crawlShop" function to "handleProducts"
- Reduce "handleProducts" footprint a little bit
Signed-off-by: Julien Riou <julien@riou.xyz>
Add `browser_address` configuration setting to define where the headless
browser is located instead of relying on the default value.
Signed-off-by: Julien Riou <julien@riou.xyz>
- new language: Go
- new shops: cybertek.fr, mediamarkt.ch
- deprecated shops: alternate.be, minershop.eu
- improved database transaction management
- better web parsing library (ferret, requires a headless Chrome browser)
- include or exclude products by applying regex on their names
- check for PID file to avoid running the bot twice
- hashtags are now configurable
Signed-off-by: Julien Riou <julien@riou.xyz>