Org-node only seems to index/find a fraction of org-roam entities #8

Closed
emacsomancer opened this issue May 7, 2024 · 16 comments

Comments

@emacsomancer
Contributor

I started using quickroam the other day and decided to try out org-node. I set it up as follows:

(use-package org-node
  :vc (:fetcher github :repo "meedstrom/org-node")
  :hook (org-mode . org-node-cache-mode)
  :config
  (setq org-node-creation-fn #'org-node-new-by-roam-capture)
  (setq org-node-slug-fn #'org-node-slugify-like-roam)
  (setq org-node-creation-hook nil)
  (setq org-node-extra-id-dirs (list org-roam-directory)))

When I call org-node-find or org-node-insert, many org-roam entries are missing (entries that org-roam-node-find, quickroam-find, etc. find successfully). I've tried org-node-reset a few times, but it makes no difference.

(I think I have around 14,000 org-roam nodes.)

@meedstrom
Owner

Thanks for reporting! What about if you do M-x org-roam-update-org-id-locations? Does the current value of org-node-extra-id-dirs contain your actual org-roam-directory? (Thinking it's affected by which one gets loaded first...)

@emacsomancer
Contributor Author

Running org-roam-update-org-id-locations takes a bit, but seems to make no difference.

I actually set org-roam-directory before loading either the org-roam or org-node packages, so org-node should know the appropriate location, and indeed org-node-extra-id-dirs contains my actual org-roam directory.

My saying "fraction" was inaccurate. Org-node seems to know about some 10,000 of roughly 14,000 org-roam entities. I have some vague notion that the missing entries are ones that include :ROAM_ALIASES: headers, but I'm probably primed to think about that (see meedstrom/quickroam#2 ). But many of my frequently used org-roam nodes are missing, and those are likely to be ones with defined aliases. (Of course, aliases may well be a red herring.)

@meedstrom
Owner

The plot thickens. A puzzle for me! I'll probably get back to you tomorrow or so.

@meedstrom
Owner

meedstrom commented May 8, 2024

I've made some small bug fixes; I don't know whether they'll have helped.

If it still doesn't work, can you tell me more about your setup? What's your OS? Do you use any right-to-left text? Access any files over TRAMP? Symlinks?

@emacsomancer
Contributor Author

emacsomancer commented May 8, 2024

Thanks! It seems to mainly work now.

For some reason there's still a slight discrepancy between org-node and org-roam (15963 vs 16042 nodes, respectively) on my laptop (but the other way on my phone, 16252 vs 16039, respectively), but org-node still at least seems to find most things.

I'm not sure what nodes are missing (or how to figure that out).

(I'm running on various Linuxen on laptop/desktop (Guix, Arch) and also on Android via Termux.

I surely do have a little bit of right-to-left text, but probably not in node names (but in body text).

Haven't tried over TRAMP.

I use a symlinked Org directory on my phone, but that seems to work fine with org-node as far as I can tell.)

@meedstrom
Owner

Interesting numbers! Maybe you'll notice one day that you can't find a node, and then you can report it here :)

Symlinks were my guess because the org-node-extra-id-dirs are searched with the function directory-files-recursively without the FOLLOW-SYMLINKS argument. I guess it's fine when org-roam-directory itself is symlinked. It's just that links inside that point to directories outside the org-roam-directory won't be found.

I'm wondering how to write a safety wrapper to use FOLLOW-SYMLINKS... the docstring warns about infinite recursion.
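
One way such a wrapper could look, as a rough sketch and not what org-node actually does: pass t for FOLLOW-SYMLINKS, and use the PREDICATE argument to refuse any directory whose truename has already been visited. The function name my/org-files-following-symlinks is made up for illustration, and this assumes Emacs 27 or later, where directory-files-recursively accepts both arguments.

```elisp
;; Hypothetical helper, not part of org-node.
(defun my/org-files-following-symlinks (dir)
  "Return .org files under DIR, following directory symlinks.
Each real directory is entered at most once, so a symlink that
points back up the tree cannot cause infinite recursion."
  (let ((seen (make-hash-table :test #'equal)))
    ;; Mark the root itself as visited.
    (puthash (file-truename (directory-file-name dir)) t seen)
    (directory-files-recursively
     dir "\\.org\\'" nil
     ;; PREDICATE: descend only into directories not seen before.
     (lambda (subdir)
       (let ((true-name (file-truename subdir)))
         (unless (gethash true-name seen)
           (puthash true-name t seen))))
     ;; FOLLOW-SYMLINKS
     t)))
```

A side effect of this design is that a directory reachable by two different paths is listed only under the first path encountered.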

Another thing that can affect discovery is a non-nil value of org-node-perf-assume-coding-system, if some files have a byte-order mark (BOM) and others don't.
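
To see whether a set of files mixes the two cases, one could check each file's first three bytes for the UTF-8 BOM (EF BB BF). A minimal sketch; my/file-has-utf8-bom-p is a made-up name, and it only looks for the UTF-8 form of the BOM:

```elisp
;; Hypothetical helper, not part of org-node.
(defun my/file-has-utf8-bom-p (file)
  "Return non-nil if FILE starts with the UTF-8 byte-order mark."
  (with-temp-buffer
    ;; Work on raw bytes so no decoding can hide or strip the BOM.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil 0 3)
    (equal (buffer-string) (unibyte-string #xEF #xBB #xBF))))
```

Mapping this over the file list and comparing the counts of t and nil results would show whether the files are consistent.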

Btw just checking, I assume you can find the nodes that have RTL text?

@emacsomancer
Contributor Author

I'm thinking that the discrepancy is perhaps a bit greater than it seems since org-node generally seems to find some things that regular org-roam doesn't index. But, again, I'm not sure what's missing. I checked a node that contains right-to-left text and org-node indexes that one without problem.

@meedstrom
Owner

meedstrom commented May 8, 2024

Thanks! It's also true that org-node looks up /all/ your org-id-locations, so it should exceed what org-roam indexes, at a minimum.

@meedstrom
Owner

To count the files known to org-id:

(length (org-id-hash-to-alist org-id-locations))

To count the files known to org-node:

(cl-loop for node in (hash-table-values org-nodes)
 count (not (org-node-get-is-subtree node)))

To count the files known to org-roam:

(length (org-roam-list-files))

@meedstrom
Owner

In fact, we can find out which files they're missing:

;; What org-roam knows that org-node doesn't
(seq-difference
 (mapcar #'file-truename (org-roam-list-files))
 (cl-loop for node in (hash-table-values org-nodes)
  unless (org-node-get-is-subtree node)
  collect (file-truename (org-node-get-file-path node))))

;; What org-node knows that org-roam doesn't
(seq-difference
 (cl-loop for node in (hash-table-values org-nodes)
  unless (org-node-get-is-subtree node)
  collect (file-truename (org-node-get-file-path node)))
 (mapcar #'file-truename (org-roam-list-files)))

@emacsomancer
Contributor Author

Thanks for the seq-difference functions!

It is curious, the results.

So, running the "(diff org-roam org-node)" one revealed about 30 files that org-roam "knew about" (in some sense) that org-node didn't. They all had one or more of the following characteristics:

  • entries without a :PROPERTIES: header or missing the :ID: property
  • entries where there is either a newline or one or more spaces before :PROPERTIES:
  • entries that accidentally contain two :PROPERTIES: lines
  • entries with accidental garbage characters in :PROPERTIES:

Fixing these things, org-node was able to index them (though it seemed to require a restart of emacs to do so).

However, there still seem to be about 70 nodes that org-roam knows about that org-node doesn't. (report: org-node: 15985 / org-roam: 16055).

Further, the "(diff org-node org-roam)" function returned nil, even though I'm sure that there must be nodes that org-node knows about that org-roam doesn't. Maybe it's because they're nodes inside of files rather than files with top-level :PROPERTIES: ?

(Strangely, as far as I can tell, on my janky Termux emacs setup on Android, everything works perfectly, with org-node finding more nodes than org-roam, while on two different Linux machines, I have the above issue. I'm intrigued about why this would be the case still.)

@meedstrom
Owner

meedstrom commented May 8, 2024

Hm. Actually, I made a minor oversight. The cl-loop expression was only catching files with file-level nodes -- as you said, with a top-level :PROPERTIES:.

Fixed. Maybe now org-node finds more files?

;; What org-roam knows that org-node doesn't
(seq-difference
 (mapcar #'file-truename (org-roam-list-files))
 (seq-uniq (cl-loop
            for node in (hash-table-values org-nodes)
            collect (file-truename (org-node-get-file-path node)))))

;; What org-node knows that org-roam doesn't
(seq-difference
 (seq-uniq (cl-loop
            for node in (hash-table-values org-nodes)
            collect (file-truename (org-node-get-file-path node))))
 (mapcar #'file-truename (org-roam-list-files)))

@meedstrom
Owner

But yea, the stuff about malformed property drawers is a big problem. I've had a few of those too :) I've been thinking of doing some sort of autoformatter that could check all Org files. I recently found out about the builtin org-lint, which you can apparently run on save, but I haven't learned to use it yet.
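
For reference, org-lint can also be called from Lisp on a single file: when called non-interactively it returns its reports as a list instead of opening a report buffer. A small sketch; my/lint-org-file is a made-up name, and whether org-lint flags every one of the drawer problems listed earlier in this thread is not something I've checked:

```elisp
(require 'org)
(require 'org-lint)

;; Hypothetical helper, not part of org-node.
(defun my/lint-org-file (file)
  "Return the list of org-lint reports for FILE, or nil if it is clean."
  (with-temp-buffer
    (insert-file-contents file)
    ;; org-lint refuses to run outside an Org buffer.
    (org-mode)
    (org-lint)))
```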

@emacsomancer
Contributor Author

The new seq-difference function for "what org-roam knows about that org-node doesn't" produces the same output as the previous one, so it's not revealing any further nodes. The "what org-node knows about that org-roam doesn't" function now spits out links to various agenda/calendar org files, which is probably reasonable.

Some sort of linter/autoformatter could indeed be useful at some point. (I'm still curious where the discrepancy lies for my Linux boxes, since the same Org files are shared with the Android device, where org-node does seem to find everything.)

@meedstrom
Owner

Hmm. Termux, with Emacs 28? 29? in console mode, I guess?

@meedstrom
Owner

@emacsomancer I added a command you might enjoy, M-x org-node-lint-all-files, which runs the built-in org-lint on all files.


2 participants