Archived
Commit graph

13 commits

Author SHA1 Message Date
244c9f68e7
refactor: move filters out of parser
Filters are now separate structures to include a product or not based
on their own set of properties. For now, include and exclude filters
are supported. They take a regex as an argument and include a product
if the regex matches (or doesn't match) the product name. This commit
will allow us to create new filters on products, for example on a price range.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-05-19 17:43:31 +02:00
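
The include and exclude filters described in commit 244c9f68e7 could look roughly like the sketch below. This is not the repository's actual code: the `Product`, `Filter`, `IncludeFilter`, `ExcludeFilter` and `KeepProduct` names are hypothetical, and the only behaviour taken from the commit message is that a regex is applied to the product name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Product is a hypothetical, minimal product representation.
type Product struct {
	Name string
}

// Filter decides whether a product should be kept (assumed interface).
type Filter interface {
	Include(p Product) bool
}

// IncludeFilter keeps products whose name matches the regex.
type IncludeFilter struct{ re *regexp.Regexp }

// ExcludeFilter keeps products whose name does NOT match the regex.
type ExcludeFilter struct{ re *regexp.Regexp }

// NewIncludeFilter compiles the pattern once, at construction time.
func NewIncludeFilter(pattern string) (*IncludeFilter, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &IncludeFilter{re: re}, nil
}

// NewExcludeFilter compiles the pattern once, at construction time.
func NewExcludeFilter(pattern string) (*ExcludeFilter, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &ExcludeFilter{re: re}, nil
}

func (f *IncludeFilter) Include(p Product) bool { return f.re.MatchString(p.Name) }
func (f *ExcludeFilter) Include(p Product) bool { return !f.re.MatchString(p.Name) }

// KeepProduct returns true only if every filter accepts the product.
func KeepProduct(p Product, filters []Filter) bool {
	for _, f := range filters {
		if !f.Include(p) {
			return false
		}
	}
	return true
}

func main() {
	inc, err := NewIncludeFilter(`(?i)rtx 3080`)
	if err != nil {
		panic(err)
	}
	exc, err := NewExcludeFilter(`(?i)water ?block`)
	if err != nil {
		panic(err)
	}
	filters := []Filter{inc, exc}
	for _, name := range []string{"MSI RTX 3080 Gaming X", "EK RTX 3080 Waterblock"} {
		fmt.Println(name, KeepProduct(Product{Name: name}, filters))
	}
}
```

With an interface like this, a price-range filter would be one more type implementing `Include`, with no change needed on the parser side.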
ab5abcd171
Select or create shop before parsing
A shop map was created to group URLs by shops and process them in order. Now
that we have Amazon and each URL can be parsed independently, there is no need
to group them anymore. Moreover, shops were passed as an argument to the
handleProducts function. Shop name can be deduced by the parser itself. The
parser has a reference to the database. The parser now selects or creates the shop
before parsing products.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-04-01 17:50:50 +02:00
5ac5f78ae2
Add Amazon support (#3)
This commit introduces the Amazon support with calls to the Product Advertising
API (PA API). For now, I was only able to use the "www.amazon.fr" marketplace.
I will add more marketplaces once my Amazon Associate accounts are
validated.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-04-01 12:59:16 +02:00
e67ab63ca8
Prepare for new parsers
- Rename "Parser" to "URLParser"
- Make "Parse" function generic
- Rename "crawlShop" function to "handleProducts"
- Reduce "handleProducts" footprint a little bit

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-23 09:00:10 +01:00
2681e4a427
Add Versus Gamers support (#12)
Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-19 17:03:11 +01:00
42b8f50068
Add STEG support (#10)
Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-16 15:29:28 +01:00
384ad0beef
Add Newegg support (#14)
Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-02 12:37:12 +01:00
45025def65
Configure headless browser address
Add a `browser_address` configuration setting to define where the headless
browser is located instead of relying on the default value.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-01 13:11:58 +01:00
6f002f007d
Bugfix include and exclude regexes
Signed-off-by: Julien Riou <julien@riou.xyz>
2021-03-01 09:06:00 +01:00
2afd36584b
Add store support for Micro Center (#2)
Local stores are set with a "storeid" parameter in the query string of the URL
and by a "storeSelected" cookie to avoid garbage in the query strings for
further requests. As the product URL is a unique key in the database, Micro Center
is able to handle a URL with the store ID for every product. We can add this
storeid to the list of URLs to parse and the job is done. Every single Micro Center
local store is parsable.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-02-28 15:11:31 +01:00
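
As an illustration of commit 2afd36584b, the sketch below appends a "storeid" query parameter to a URL before it is added to the list of URLs to parse. The `WithStoreID` helper, the example host and the store ID value are all hypothetical; only the "storeid" parameter name comes from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// WithStoreID returns rawURL with the "storeid" query parameter set,
// preserving any existing query parameters.
func WithStoreID(rawURL, storeID string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("storeid", storeID)
	// Encode sorts parameters by key, which keeps the URL stable
	// when it is used as a unique key.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Hypothetical search URL and store ID, for illustration only.
	out, err := WithStoreID("https://shop.example/search?N=1", "131")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // https://shop.example/search?N=1&storeid=131
}
```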
42f79c03d4
Add Micro Center support (#2)
As a first step, only the "shippable items" are parsed. The next enhancement would
be to configure local stores.

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-02-28 10:33:28 +01:00
40b0ba999a
Remove useless compileRegex function
Signed-off-by: Julien Riou <julien@riou.xyz>
2021-02-27 15:14:21 +01:00
3a4aba93e5
Release 0.2.0
- new language: go
- new shops: cybertek.fr, mediamarkt.ch
- deprecated shops: alternate.be, minershop.eu
- improved database transaction management
- better web parsing library (ferret, requires headless chrome browser)
- include or exclude products by applying regex on their names
- check for PID file to avoid running the bot twice
- hashtags are now configurable

Signed-off-by: Julien Riou <julien@riou.xyz>
2021-02-27 08:10:43 +01:00
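
The PID file check listed in the 0.2.0 release notes can be approximated as follows. This is a minimal sketch under assumptions, not the bot's actual implementation: the `AcquirePIDFile` and `ReleasePIDFile` names and the file location are hypothetical, and a stale file left by a crashed process is not handled.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// AcquirePIDFile creates the file exclusively and writes the current PID.
// It fails if the file already exists, which suggests another instance
// may be running.
func AcquirePIDFile(path string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
	if err != nil {
		if os.IsExist(err) {
			return fmt.Errorf("pid file %s already exists: another instance may be running", path)
		}
		return err
	}
	defer f.Close()
	_, err = f.WriteString(strconv.Itoa(os.Getpid()))
	return err
}

// ReleasePIDFile removes the PID file when the process exits normally.
func ReleasePIDFile(path string) error {
	return os.Remove(path)
}

func main() {
	// Hypothetical location; a real deployment would make this configurable.
	path := filepath.Join(os.TempDir(), "example-bot.pid")
	if err := AcquirePIDFile(path); err != nil {
		fmt.Println("not starting:", err)
		os.Exit(1)
	}
	defer ReleasePIDFile(path)
	fmt.Println("running with PID", os.Getpid())
}
```

Opening the file with `O_EXCL` makes the existence check and the creation a single atomic step, so two processes started at the same time cannot both succeed.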