Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emacs process sentinel-related issues #3

Closed
alphapapa opened this issue Sep 28, 2021 · 14 comments
Closed

Emacs process sentinel-related issues #3

alphapapa opened this issue Sep 28, 2021 · 14 comments
Assignees
Labels
bug Something isn't working discussion help wanted Extra attention is needed

Comments

@alphapapa
Copy link
Owner

alphapapa commented Sep 28, 2021

See also:

@alphapapa
Copy link
Owner Author

alphapapa commented Sep 16, 2022

@niklaseklund If you are encountering this problem or one related to it, please provide more information to improve the chances of solving it.

FYI, to anyone reading this, the issues mentioned here don't seem to be problematic in practice. Ement.el, plz's primary user so far, is now on ELPA and used by at least a few hundred people without any apparent problems.

@alphapapa alphapapa added the help wanted Extra attention is needed label Feb 8, 2023
@alphapapa
Copy link
Owner Author

Small update: In looking at CI results lately, it seems that the current Emacs snapshot version may have improved on this problem slightly in that sometimes all of the tests pass (however, note that the linting pass fails due to changes in the cl-labels form's indentation, and that makes the whole CI job appear to fail even when the test phase passes). But it does not appear to be completely solved, because sometimes some of the tests still appear to randomly fail. So maybe some kind of progress has been made in the mysterious process.c...

@alphapapa
Copy link
Owner Author

Have just had a possible insight into this problem: In the following source code:

(defun hyperdrive-entry-at (version entry)
  "Return ENTRY at its hyperdrive's VERSION, or nil if not found."
  ;; Use `hyperdrive-copy-tree', because `copy-tree' doesn't work on
  ;; records/structs, and `copy-hyperdrive-entry' doesn't copy deeply,
  ;; and we need to be able to modify the `etc' alist of the copied
  ;; entry separately.
  (let ((entry (hyperdrive-copy-tree entry t)))
    (setf (alist-get 'with-version-p (hyperdrive-entry-etc entry)) (and version t))
    (when version
      (setf (hyperdrive-entry-version entry) version))
    (condition-case err
        (hyperdrive-fill entry :then 'sync)
      (plz-http-error
       (pcase (plz-response-status (plz-error-response (caddr err)))
         (404 nil)
         (_ (signal (car err) (cdr err))))))))

(cl-defun hyperdrive-fill
    (entry &key queue then
           (else (lambda (plz-error)
                   ;; FIXME: Use a message instead of a warning for
                   ;; now, because the 404 errors for filenames with
                   ;; spaces are annoying as warnings.
                   (hyperdrive-message (format "hyperdrive-fill: error: %S" plz-error) )
                   ;; (display-warning 'hyperdrive
                   ;;                  (format "hyperdrive-fill: error: %S" plz-error))
                   )))
  "Fill ENTRY's metadata and call THEN.
If THEN is `sync', return the filled entry and ignore ELSE.
Otherwise, make request asynchronously and call THEN with the
filled entry; or if request fails, call ELSE (which is passed to
`hyperdrive-api', which see.  If QUEUE, make the fill request in
the given `plz-queue'"
  (declare (indent defun))
  (pcase then
    ('sync (hyperdrive--fill entry
                             (plz-response-headers
                              (with-local-quit
                                (hyperdrive-api 'head (hyperdrive-entry-url entry)
                                  :as 'response
                                  :then 'sync
                                  :noquery t)))))
    (_ (hyperdrive-api 'head (hyperdrive-entry-url entry)
         :queue queue
         :as 'response
         :then (lambda (response)
                 (funcall then (hyperdrive--fill entry (plz-response-headers response))))
         :else else
         :noquery t))))

When stepping through hyperdrive-entry-at with Edebug, the code works correctly: the condition-case catches the error signaled from the plz--sentinel sentinel function. But after re-evaluating the definition of hyperdrive-entry-at, the error signaled from the sentinel is not caught by the condition-case, and the signal propagates all the way up, resulting in this error:

error in process sentinel: plz: HTTP error: "plz--sentinel: HTTP error", #s(plz-error nil #s(plz-response 1.1 404 ((content-type . "text/plain;charset=UTF-8") (date . "Tue, 18 Apr 2023 21:31:07 GMT") (connection . "keep-alive") (keep-alive . "timeout=5")) "
Process plz-request-curl stderr finished
") nil)

(Note the Process plz-request-curl stderr finished junk, for which a solution is being attempted in this branch: https://github.com/alphapapa/plz.el/tree/wip/no-mix-stderr

@alphapapa
Copy link
Owner Author

alphapapa commented May 9, 2023

Today, thanks to @josephmturner's help, I finally put together this puzzle piece from the Info page (elisp)Sentinels:

If an error happens during execution of a sentinel, it is caught automatically, so that it doesn’t stop the execution of whatever programs was running when the sentinel was started. However, if ‘debug-on-error’ is non-‘nil’, errors are not caught.

I wrote a post to emacs-devel about the issue, asking for guidance and whether Emacs should be patched to change this behavior: https://lists.gnu.org/archive/html/emacs-devel/2023-05/msg00251.html In that message I also noted how request.el handles this issue, and a potential workaround.

@piknik
Copy link

piknik commented May 24, 2023

There is absolutely some sort of issue with process handling in Emacs, at least when in batch mode. I've been developing some tools with the heavy use of make-process but always had problems, especially with the use of many processes launched at once. I observed that sometimes when a process finishes and that process has output right at the end of execution, that output does not go through the process filter and into the process buffer before the process sentinel is called. In other words, sometimes the sentinel with the "finished\n" event is called before the final process filter call. The filter is essential for receiving process output and inserting the output into the process buffer. This happens even when no filter or sentinel is set on a process.

In my case, I found that the following code I hacked together helps:

(defun advice--make-sentinel-waiter (sentinel)
  (lambda (process event)
    (with-local-quit
      (accept-process-output process 0.1))
    (when sentinel (funcall sentinel process event))))

(defun advice--make-process (oldfun &rest args)
  (let ((sentinel (plist-get args :sentinel)))
    (let ((args (copy-sequence args)))
      (setf args (plist-put args :sentinel
			    (advice--make-sentinel-waiter sentinel)))
      (let (process actually-live)
	(while (not actually-live)
	  (when process
	    (message "(%s) Failed to start process, retrying" (process-name process)))
	  (setf process (apply oldfun args)
		actually-live (process-live-p process)))
	process))))

(defun advice--set-process-sentinel (oldfun process sentinel)
  (funcall oldfun process (advice--make-sentinel-waiter sentinel)))

(advice-add 'make-process :around #'advice--make-process)
(advice-add 'set-process-sentinel :around #'advice--set-process-sentinel)

What this does is modify make-process and the sentinel mechanisms of it. It ensures that any sentinel that is set on the process will allow an extra 'cycle' of accept-process-output first before being called. This allows the filter to be called on the tail output of the process and have the output handled as intended. Also, the make-process advice will make sure that a process is actually live after a call to make-process, though the message is never called so I don't think this is a problem, just one of the debugging methods I have employed to figure out a workaround.

This one misbehavior I have found with the solution completely solved my issues in my projects. If this helps get you at least a bit further that would be great.

@alphapapa
Copy link
Owner Author

@piknik Wow, thanks for sharing that. Did you report an Emacs bug with that info and code? Also, what versions of Emacs did you see that behavior on? I don't think I've seen that exact problem in my projects, so I wonder if it's been fixed since then. AFAIK Emacs 27 and/or 28 had some minor fixes in process.c.

Although, I'm not completely sure if I haven't seen that problem. My current intuition is that, when I update plz to no longer signal errors from the sentinel, it will solve the problem that manifests as plz sometimes returning nil (or passing nil to the callback) instead of an intended value; but if I'm wrong about that, it might be the same problem you've described.

@piknik
Copy link

piknik commented May 24, 2023

@alphapapa I haven't run into any issue specifically when errors are invoked or caught in a sentinel. In fact, my projects have a nested sentinel call stack, where sentinels can make other processes and handle errors from them. I believe that the C code hands off execution to elisp in the accept-process-output call such that a call to accept-process-output is like a call to another sentinel or filter, but that's my intuition from my observation.

I'm using GNU Emacs 29.0.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.37, cairo version 1.17.8) of 2023-05-23, git version, so this is definitely a persistent issue, perhaps decades old. I know that for me, I sometimes had output that was nil where output was expected, but the workaround seems to have fixed that.

I am not familiar with the Emacs bug reporting process and have never submitted Emacs bugs. Would I be right to follow the EmacsWiki entry on Emacs Bug Tracker?

@alphapapa
Copy link
Owner Author

@alphapapa I haven't run into any issue specifically when errors are invoked or caught in a sentinel. In fact, my projects have a nested sentinel call stack, where sentinels can make other processes and handle errors from them.

Hm, well, IIUC that's maybe not a good idea. See my recent post to emacs-devel and Eli Zaretskii's reply: https://lists.gnu.org/archive/html/emacs-devel/2023-05/msg00251.html He says that one should definitely not signal errors from inside a sentinel, which I would presume would apply to anything called from a sentinel, i.e. the whole call stack.

I believe that the C code hands off execution to elisp in the accept-process-output call such that a call to accept-process-output is like a call to another sentinel or filter, but that's my intuition from my observation.

This is basically described in the Elisp manual section 39.10 on process sentinels. But even that has a limit on how much detail it goes into.

I'm using GNU Emacs 29.0.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.37, cairo version 1.17.8) of 2023-05-23, git version, so this is definitely a persistent issue, perhaps decades old. I know that for me, I sometimes had output that was nil where output was expected, but the workaround seems to have fixed that.

Hm, that does sound similar to the problems I've experienced.

I am not familiar with the Emacs bug reporting process and have never submitted Emacs bugs. Would I be right to follow the EmacsWiki entry on Emacs Bug Tracker?

Maybe, but the EmacsWiki is not an official resource and may be outdated or incorrect. Basically, the thing to do is M-x report-emacs-bug RET. Write as clear and complete a report as you can, then email it as instructed there, and wait for a reply from the maintainers. If you do, you should probably also mention the report I filed, https://debbugs.gnu.org/cgi/bugreport.cgi?bug=50166, and also provide a link to this GitHub issue where we've discussed it. And then please link your bug report here as well. :)

@piknik
Copy link

piknik commented May 25, 2023

Hm, well, IIUC that's maybe not a good idea. See my recent post to emacs-devel and Eli Zaretskii's reply: https://lists.gnu.org/archive/html/emacs-devel/2023-05/msg00251.html He says that one should definitely not signal errors from inside a sentinel, which I would presume would apply to anything called from a sentinel, i.e. the whole call stack.

Interesting. With my use case of Emacs primarily in batch mode, I have a custom debugger that requires debug-on-error to be non-nil for the custom debugger to be invoked during execution of normal-top-level. I suppose inadvertently I've made my code dependent on this behavior and would cause me problems if the sentinels were ran in a GUI, so thanks for bringing this up to me to correct my assumptions. This would mean that when we use accept-process-output we must check for an error variable of sorts set by a sentinel after this call and signal the error when appropriate.

This behavior still doesn't explain why the sentinel sometimes runs before the final filter call. At some point I'll look into plz and run the tests, do some poking around.

Maybe, but the EmacsWiki is not an official resource and may be outdated or incorrect. Basically, the thing to do is M-x report-emacs-bug RET. Write as clear and complete a report as you can, then email it as instructed there, and wait for a reply from the maintainers. If you do, you should probably also mention the report I filed, https://debbugs.gnu.org/cgi/bugreport.cgi?bug=50166, and also provide a link to this GitHub issue where we've discussed it. And then please link your bug report here as well. :)

Thanks for the info

@kaibagley

This comment was marked as off-topic.

@alphapapa

This comment was marked as off-topic.

@alphapapa
Copy link
Owner Author

@piknik FWIW, I've made some changes in a branch that seem to solve the problem of sometimes getting a nil return value from a plz request. https://github.com/alphapapa/plz.el/tree/wip/no-signal-from-sentinel I didn't copy your workarounds exactly, but I did add a simple check to ensure that the sentinel had run and returned a result, and to call the sentinel again if not, and that seems to have made a big difference: https://github.com/alphapapa/plz.el/compare/wip/no-signal-from-sentinel#diff-86628899659777b5eeb393a8c5c3471f4bdbe10ed16a653a12705cee1225d07dR487-R491 Even the tests all pass now, reliably (as long as the httpbin.org server isn't overloaded; I sometimes use a local Docker container for that part).

I intend for that branch to be a future version of plz; after it's merged I'll let it mellow on master for a while.

@alphapapa alphapapa added this to the 0.7 milestone Jun 26, 2023
@alphapapa
Copy link
Owner Author

This seems to be fixed in v0.7 and on master now. I'll leave this issue open, though, because of the longstanding Emacs process-handling issues, and in case they crop up again...

@alphapapa alphapapa removed this from the 0.7 milestone Jun 26, 2023
@alphapapa
Copy link
Owner Author

AFAIK this is solved now, so closing.

@alphapapa alphapapa self-assigned this Sep 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working discussion help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants