
duplicates of site.data in db.json leads to large size #5260

Closed
5 tasks done
EmptyDreams opened this issue Aug 1, 2023 · 9 comments · Fixed by #5325
Labels
bug Something isn't working

Comments

@EmptyDreams

EmptyDreams commented Aug 1, 2023

Check List

Please check the following before submitting a new issue.

Expected behavior

The db.json file should be a reasonable size. (At the very least, shouldn't it be under 50 MB?)

Actual behavior

In practice, even after I deleted most of the posts from my blog and kept only one (a 5 KB Markdown file), the generated db.json was still over 70 MB.

I'm not sure why this happens. I didn't have this problem originally; it seemed to appear out of nowhere, and I can't remember what I was modifying at the time.

From what I can see, db.json contains many duplicate _id and data fields. For example, the file source/_data/avatar/cpen.webp is recorded more than 100 times in db.json, which takes up a lot of space.
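To check whether another db.json shows the same pattern, a small diagnostic can tally how often each _id value occurs anywhere in the file. This is a sketch, not part of Hexo: it assumes only that db.json parses as JSON and that records are objects with a string _id field. The helper names countIds and duplicatedIds are hypothetical.

```typescript
// Minimal JSON value type; db.json is assumed to parse into this shape.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walk any JSON value and tally every string _id found on an object.
function countIds(value: Json, counts: Record<string, number> = Object.create(null)): Record<string, number> {
  if (Array.isArray(value)) {
    for (const item of value) countIds(item, counts);
  } else if (value !== null && typeof value === "object") {
    const id = value["_id"];
    if (typeof id === "string") {
      counts[id] = (counts[id] || 0) + 1;
    }
    // Recurse into every field, so copies nested inside other records are counted too.
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child !== undefined) countIds(child, counts);
    }
  }
  return counts;
}

// Return [_id, count] pairs seen more than once, most frequent first.
function duplicatedIds(value: Json): Array<[string, number]> {
  const counts = countIds(value);
  return Object.keys(counts)
    .map((id): [string, number] => [id, counts[id] || 0])
    .filter(([, n]) => n > 1)
    .sort((a, b) => b[1] - a[1]);
}
```

In Node.js, the input could come from JSON.parse(fs.readFileSync("db.json", "utf8")); an _id from source/_data showing up once per post or page would point to the duplication described here.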

This problem made hexo se and hexo g stall for quite a long time after the INFO Validating config step, so I had to run hexo cl or manually delete db.json before each command, which defeats the purpose of having db.json at all.

Is the problem still there under "Safe mode"?

The problem persists even when I add the --safe option.

Environment & Settings

Node.js & npm version (node -v && npm -v)

node version: 18.12.1

npm version: 9.8.1

Your site _config.yml (Optional)

Hexo and Plugin version (npm ls --depth 0)


Your package.json

{
  "name": "hexo-site",
  "version": "0.0.0",
  "private": true,
  "scripts": {
    "build": "hexo generate",
    "clean": "hexo clean",
    "deploy": "hexo deploy",
    "server": "hexo server"
  },
  "hexo": {
    "version": "6.3.0"
  },
  "dependencies": {
    "@neilsustc/markdown-it-katex": "^1.0.0",
    "cheerio": "^1.0.0-rc.12",
    "hexo": "^6.3.0",
    "hexo-abbrlink": "^2.2.1",
    "hexo-asset-image": "^1.0.0",
    "hexo-butterfly-envelope": "^1.0.15",
    "hexo-deployer-git": "^3.0.0",
    "hexo-filter-nofollow": "^2.0.2",
    "hexo-generator-archive": "^2.0.0",
    "hexo-generator-baidu-sitemap": "^0.1.9",
    "hexo-generator-category": "^2.0.0",
    "hexo-generator-feed": "^3.0.0",
    "hexo-generator-index": "^3.0.0",
    "hexo-generator-sitemap": "^3.0.1",
    "hexo-generator-tag": "^2.0.0",
    "hexo-graphviz": "^1.0.2",
    "hexo-log": "^3.0.0",
    "hexo-renderer-ejs": "^2.0.0",
    "hexo-renderer-markdown-it": "^7.1.0",
    "hexo-renderer-pug": "^3.0.0",
    "hexo-renderer-stylus": "^3.0.0",
    "hexo-server": "^3.0.0",
    "hexo-swpp": "^2.8.10",
    "hexo-wordcount": "^6.0.1",
    "node-fetch": "^2.6.9",
    "prismjs": "^1.29.0"
  },
  "devDependencies": {
    "gulp": "^4.0.2",
    "gulp-clean": "^0.4.0",
    "gulp-cssnano": "^2.1.3",
    "gulp-html-minifier-terser": "^7.1.0",
    "gulp-htmlclean": "^2.7.22",
    "gulp-terser": "^2.1.0"
  }
}

@uiolee
Member

uiolee commented Aug 30, 2023

This could be a problem with one of your plugins.

@EmptyDreams
Author

This could be a problem with one of your plugins

But the problem persists after I add the --safe option, which I don't think would happen if a plugin were the cause.

@uiolee
Member

uiolee commented Sep 12, 2023

Can you provide a reproducible example?

@EmptyDreams
Author

Can you provide a reproducible example?

blog.zip

Sorry for the late reply.

I've uploaded a zip file that reproduces the problem. The source folder in it contains two folders that easily push db.json past 100 MB, and every time I create an empty post (with hexo new post, leaving the Markdown file empty), db.json grows by a noticeable amount. My guess is that hexo incorrectly duplicates data in db.json, so its size keeps growing without bound.

@EmptyDreams
Author

My friend told me that placing binary files (including images) in the _data folder causes db.json to bloat. I tested this, at least for image files, and the problem disappeared when I moved all the images out of _data.
But I don't think hexo should behave this way; it should be considered a bug in hexo.
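One plausible reason binary files are especially costly: if a non-YAML, non-JSON file in _data ends up stored as a decoded text string (an assumption about Hexo's data processor, not something confirmed in this thread), invalid UTF-8 byte sequences become U+FFFD replacement characters (3 bytes each when written back as UTF-8), and most control bytes become 6-character \u00XX escapes in JSON. The sketch below uses the standard TextDecoder, TextEncoder and JSON.stringify to measure that inflation for a byte array; storedJsonBytes and inflationRatio are hypothetical names.

```typescript
// Hypothetical helper: how many bytes would these file contents occupy if they
// were decoded as UTF-8 text and then written into a JSON file?
function storedJsonBytes(bytes: Uint8Array): number {
  // Non-fatal decoding: invalid sequences become U+FFFD instead of throwing.
  const text = new TextDecoder("utf-8").decode(bytes);
  // JSON.stringify wraps the string in quotes and escapes control characters such as \u0000.
  const json = JSON.stringify(text);
  // Measure the UTF-8 size of the JSON text, as it would be stored on disk.
  return new TextEncoder().encode(json).length;
}

// A ratio above 1 means the stored form is larger than the original file.
function inflationRatio(bytes: Uint8Array): number {
  return bytes.length === 0 ? 1 : storedJsonBytes(bytes) / bytes.length;
}
```

For example, the three bytes 0xFF 0x00 0x01 would take 17 bytes in this form, and that cost would be paid again for every extra copy of the data kept in db.json.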

@uiolee
Copy link
Member

uiolee commented Oct 18, 2023

Reproduce:

  1. Put image files (or files of other types) in source/_data/.
  2. Run hexo g.

    Alternatively, run hexo g, then modify a post and rerun hexo g.

[screenshot: db.json]

The contents of _data are copied into site.data of each post (and page).

This may be the related code:

[screenshot: db.json]
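The observation that _data is copied into site.data of each post can be modeled in a few lines. This is a toy model, not Hexo's code: the record shapes DataEntry and PostRecord and the helpers dbJsonLength, withSiteData and emptyPosts are hypothetical, and it assumes each rendered post or page is saved with an ordinary site.data field holding every _data entry. Under that assumption, every extra post adds roughly one full copy of site.data to the serialized database, which matches the growth reported earlier for each new empty post.

```typescript
// Toy record shapes; real Hexo models have many more fields.
interface DataEntry { _id: string; data: string }
interface PostRecord { _id: string; content: string; site?: { data: DataEntry[] } }

// Length (in UTF-16 code units) of a JSON-serialized toy database.
function dbJsonLength(posts: PostRecord[], data: DataEntry[]): number {
  return JSON.stringify({ models: { Data: data, Post: posts } }).length;
}

// Hypothetical render step that attaches the whole site.data to each record
// as an ordinary (enumerable) property, so it is serialized with the record.
function withSiteData(posts: PostRecord[], data: DataEntry[]): PostRecord[] {
  return posts.map((post) => ({ ...post, site: { data } }));
}

// Build n empty posts, like ones created with "hexo new post" and left blank.
function emptyPosts(n: number): PostRecord[] {
  const posts: PostRecord[] = [];
  for (let i = 0; i < n; i++) posts.push({ _id: "post" + i, content: "" });
  return posts;
}
```

Comparing dbJsonLength(withSiteData(emptyPosts(n), data), data) with dbJsonLength(emptyPosts(n), data) shows the difference growing linearly in n, by slightly more than JSON.stringify(data).length per post.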

uiolee added the enhancement (New feature or request) label and removed the need-investigation label on Oct 18, 2023
@uiolee
Member

uiolee commented Oct 18, 2023

I think you shouldn't put images or other binary files in Data Folders, but in Asset Folders.

Of course, we should probably process site.data in a more elegant way.
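One possible direction for handling site.data more elegantly (a sketch only, not necessarily what #5325 does) is to keep site.data reachable from each post during rendering while hiding it from serialization. JSON.stringify only serializes own enumerable properties, so, assuming records are written out with JSON.stringify, attaching site as a non-enumerable property via Object.defineProperty keeps post.site.data usable without writing a copy into db.json. The record shape and the name attachSiteDataTransient are hypothetical.

```typescript
interface DataEntry { _id: string; data: string }
interface PostRecord { _id: string; content: string; site?: { data: DataEntry[] } }

// Expose site.data on the record for rendering, but keep it out of JSON output.
function attachSiteDataTransient(post: PostRecord, data: DataEntry[]): PostRecord {
  Object.defineProperty(post, "site", {
    value: { data },
    enumerable: false,  // JSON.stringify skips non-enumerable own properties
    writable: true,
    configurable: true, // allows the property to be deleted after rendering
  });
  return post;
}
```

The trade-off is that site is no longer a stored field: anything that expects to read it back from db.json on a later run would not find it, so it would have to be re-attached each time posts are rendered.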

uiolee changed the title from "Large number of duplicates in db.json leads to abnormally large size!" to "duplicates of site.data in db.json leads to large size" on Oct 18, 2023
@stevenjoezhang
Member

This behavior was introduced in #1969.

@uiolee
Member

uiolee commented Oct 23, 2023

Fixed by #5325

uiolee closed this as completed on Oct 23, 2023
uiolee added the bug (Something isn't working) label and removed the enhancement (New feature or request) label on Oct 23, 2023

4 participants