Deduplication in the crawler is handled by scrapy-redis, and I haven't overridden its dedup rules. scrapy-redis decides whether a request is a duplicate based on the Request's fingerprint, so the same product (i.e. the same URL) can be reached by different Request objects that are not filtered out. As a result, duplicate products can appear.
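A minimal sketch (not from this repo) of why fingerprint-based dedup can let the same product through: the URLs and sku id below are hypothetical, but the fingerprint functions are standard Scrapy utilities.

```python
from scrapy import Request
from scrapy.utils.request import request_fingerprint

# Hypothetical URLs for the same product (sku 100001), differing only in a
# tracking query parameter that the fingerprint does not normalize away.
r1 = Request("https://item.jd.com/100001.html")
r2 = Request("https://item.jd.com/100001.html?from=search")

print(request_fingerprint(r1))
print(request_fingerprint(r2))
# The two fingerprints differ, so scrapy-redis treats them as distinct
# requests and the same product page can be scraped and stored twice.
```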
Actually, at the very beginning there was already a check for duplicate products, but it relied on a MongoDB unique index: inserting a duplicate product raised an exception, which was then swallowed. That approach to dedup is quite inelegant. I've since rewritten the product dedup logic: each product's sku-id is stored in Redis, and before inserting into MongoDB I check whether Redis already contains that sku-id; if it does, the item is skipped, otherwise it is inserted.
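A minimal sketch of that flow as a Scrapy item pipeline, assuming local Redis/MongoDB instances and hypothetical key, database, and field names (the project's actual pipeline may differ):

```python
import pymongo
import redis


class SkuDedupPipeline:
    """Keep every seen sku-id in a Redis set; insert into MongoDB only once."""

    def open_spider(self, spider):
        self.redis = redis.StrictRedis(host="localhost", port=6379, db=0)
        self.mongo = pymongo.MongoClient("mongodb://localhost:27017")
        self.collection = self.mongo["jd"]["products"]  # assumed db/collection names

    def process_item(self, item, spider):
        sku_id = item["sku_id"]  # assumed item field
        # SADD returns 1 only when the member is new, so it serves as both the
        # "already seen?" check and the "remember it" step in one round trip.
        if self.redis.sadd("jd:product:sku_ids", sku_id):
            self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.mongo.close()
```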