
No duplicate-product validation? #6

Closed

ajoeee opened this issue Oct 17, 2017 · 2 comments

Comments

ajoeee commented Oct 17, 2017

No description provided.

ramsayleung (Owner) commented

The crawler's deduplication is handled by scrapy-redis, and I did not override its dedup rule. scrapy-redis decides whether a request is a duplicate based on the Request fingerprint, so the same product (i.e. the same url) can still correspond to different Request objects that are not filtered out. That is why duplicate products can appear.
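For context, here is a simplified sketch of fingerprint-based request deduplication in the style of scrapy-redis's dupefilter. The key name and hashing details are illustrative, not the library's exact implementation: the filter hashes each Request and records seen fingerprints in a Redis set, so two Requests that reach the same product through different URLs or parameters produce different fingerprints and both pass the filter.

```python
# Simplified sketch of scrapy-redis style request deduplication.
# Key name "dupefilter:fingerprints" and the hash fields are assumptions.
import hashlib

import redis

r = redis.StrictRedis(host="localhost", port=6379)  # assumed local Redis


def request_seen(method: str, url: str, body: bytes = b"") -> bool:
    """Return True if an equivalent request was already scheduled."""
    fp = hashlib.sha1()
    fp.update(method.encode())
    fp.update(url.encode())
    fp.update(body)
    fingerprint = fp.hexdigest()
    # SADD returns 0 when the fingerprint is already in the set,
    # i.e. the request was seen before.
    return r.sadd("dupefilter:fingerprints", fingerprint) == 0
```

Note that the fingerprint is computed from the Request, not from the product itself, which is why item-level duplicates can slip through.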

ramsayleung (Owner) commented

Actually, at the very beginning there was duplicate-product validation, but I implemented it with a MongoDB unique index: inserting a duplicate product raised an exception, which was then swallowed. That approach is quite inelegant, so I have since rewritten the product dedup logic: the product's sku-id is stored in Redis, and before inserting into MongoDB I check whether Redis already contains that sku-id. If it does, the product is not inserted; otherwise it is.
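A minimal sketch of that dedup flow, written as a Scrapy item pipeline. The Redis key `seen:sku_ids`, the item field `sku_id`, and the Mongo database/collection names are assumptions for illustration, not the repository's actual code:

```python
# Sketch: check Redis for the product's sku-id before inserting into MongoDB.
import pymongo
import redis
from scrapy.exceptions import DropItem


class RedisDedupMongoPipeline:
    def open_spider(self, spider):
        self.redis = redis.StrictRedis(host="localhost", port=6379)
        self.mongo = pymongo.MongoClient("mongodb://localhost:27017")
        self.collection = self.mongo["jd"]["products"]  # assumed names

    def process_item(self, item, spider):
        sku_id = item["sku_id"]  # assumed item field
        # SADD returns 1 only when the sku-id was not in the set yet,
        # so a return of 0 means this product was already stored.
        if self.redis.sadd("seen:sku_ids", sku_id) == 0:
            raise DropItem(f"duplicate product sku-id: {sku_id}")
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.mongo.close()
```

Compared with the unique-index approach, this keeps the dedup check out of the database's error path: a duplicate is dropped before the insert instead of being caught as an exception afterwards.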
