chore: move events sock path to conf #10732

bzp2010 · 2023-12-29T04:12:23Z

Description

In my last PR #10550, I place the unix socket files in logs directory. It may be inappropriate because if we use a container environment with log-rotate turned on, we may mount the logs folder to an external volume, and I think we may run into communication and permissions issues.

So this PR moves the sock files to the conf folder, which is probably a better choice because that's where we're generating our nginx.conf right now, and the same way we used to create conf_server sock file there previously.

In addition, this PR adds some code for cleaning up the sock file to ensure that when an Nginx process crashes or is killed, the sock file can be cleaned up before the next startup, preventing such errors (this is because the file already exists):

✗ bin/apisix start
/usr/local/openresty//luajit/bin/luajit ./apisix/cli/apisix.lua start
nginx.pid exists but there's no corresponding process with pid  1860232 , the file will be overwritten
nginx: [emerg] bind() to unix:/home/bzp/code/apisix/conf/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/home/bzp/code/apisix/conf/worker_events.sock failed (98: Address already in use)

This often occurs when using containers, and we have to be able to handle it correctly.

This feature has not yet been released, so fixing it now is timely and involves no compatibility issues. In the meantime, no test cases themselves need to be changed.

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

bzp2010 · 2023-12-30T13:22:11Z

The errors in the test is the result of the behavior in test::nginx that cleans up the conf directory, which looks like this:

start test =>
start nginx and listen to UNIX socket (create file in conf dir) =>
case 1 =>
reload (test::nginx cleanup conf dir and regenerate nginx.conf file) =>
case 2 (will try to connect a not exist sock file that cleaned by test::nginx) =>
connect failed and cannot receive events =>
test failed

In fact, this was already present when we were still using the conf server (config_listen.sock will be in conf), it's just that the tests back then didn't strongly rely on this behavior and it couldn't be detected. I do see it on the 3.2.2 branch.

There are several ways to fix it:

Provide a configuration or environment variable to override the default sock file path for testing (which can also be set by the user). For testing, we place the sock file under /tmp.
Stop setting USE_HUP for cases that fail in the test, so Nginx will restart between tests [1], the unix socket will indeed be recreated, and the test will work.
a. Cases t/node/healthcheck-multiple-worker.t and t/node/healthcheck-ipv6.t involved
b. These tests are simple and do not rely on the reload behavior. Therefore, no impact is expected.
Create a new folder such as tmp alongside the conf folder location to store these temporary files.

This PR itself mentions issues that we should be wary of, including but not limited to files being exposed by container mounts and not being cleaned up on abnormal exits resulting in failure to start next time.
So we need to choose a way to solve this problem, either one of the above or your better suggestion.

Waiting for discussion. cc @monkeyDluffy6017 @shreemaan-abhishek

Ref:
[1] According to Nginx's behavior, when a reload is performed, the newly started worker will not perform a listen operation again, but will simply reuse the file descriptor already created by the old worker. This means that if test::nginx cleans up that sock file between tests, you can still see the unix socket listening via netstat -lx, but the file doesn't exist and you can't connect. Therefore, it can be said that the cleanup behavior of test::nginx and the Nginx reload feature together broke the test.

shreemaan-abhishek · 2024-01-01T02:51:46Z

This PR itself mentions issues that we should be wary of,

But your changes don't introduce these problems, do they? What I am trying to say is, this problem can arise anytime even without your PR being merged.

I think, introducing a configuration variable is fair.

monkeyDluffy6017 · 2024-01-03T06:30:00Z

IMHO, i prefer option 3

bzp2010 · 2024-01-08T06:51:06Z

In fact, I've implemented the third point mentioned (create a tmp dir alongside conf and logs) and it still doesn't work.

The test::nginx makes some pretty dirty assumptions, it completely disregards any directories (conf, logs, html, etc) that may exist under the servroot other than the ones it confirms exist, and therefore its cleanup doesn't work properly. https://github.com/openresty/test-nginx/blob/40aa6364b0a428779cbf96cd10f49b20166483ee/lib/Test/Nginx/Util.pm#L867-L891

And the test::nginx will only do special things to paths like *_cache and *_temp to remove them. https://github.com/openresty/test-nginx/blob/40aa6364b0a428779cbf96cd10f49b20166483ee/lib/Test/Nginx/Util.pm#L870-L871

Instead it tries again to delete and recreate the servroot folder before the start of each test case, and there are no hooks available there for early cleanup files. So servroot/tmp will keep there and block remove servroot.

I have tested the cleanup added via add_cleanup_handler and it will run after an error occurs, so it won't work.

Therefore the use of special cases in testing is indispensable (in my perception). For example, use an environment variable to overwrite the directory used for testing. Is there a better way to do this?

ping @shreemaan-abhishek @monkeyDluffy6017 🤔

monkeyDluffy6017 · 2024-01-09T06:06:42Z

LGTM

shreemaan-abhishek · 2024-01-19T06:06:52Z

@bzp2010 did you attempt using other solution? IMHO, at this stage since solution 3 is not available you can choose any solution and give us a fix.

github-actions · 2024-03-24T10:04:18Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions · 2024-04-22T10:04:48Z

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

darkSheep404 · 2024-08-28T07:50:26Z

Description

In my last PR #10550, I place the unix socket files in logs directory. It may be inappropriate because if we use a container environment with log-rotate turned on, we may mount the logs folder to an external volume, and I think we may run into communication and permissions issues.↳

So this PR moves the sock files to the conf folder, which is probably a better choice because that's where we're generating our nginx.conf right now, and the same way we used to create conf_server sock file there previously.↳

In addition, this PR adds some code for cleaning up the sock file to ensure that when an Nginx process crashes or is killed, the sock file can be cleaned up before the next startup, preventing such errors (this is because the file already exists):↳
✗ bin/apisix start
/usr/local/openresty//luajit/bin/luajit ./apisix/cli/apisix.lua start
nginx.pid exists but there's no corresponding process with pid  1860232 , the file will be overwritten
nginx: [emerg] bind() to unix:/home/bzp/code/apisix/conf/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/home/bzp/code/apisix/conf/worker_events.sock failed (98: Address already in use)
This often occurs when using containers, and we have to be able to handle it correctly.

This feature has not yet been released, so fixing it now is timely and involves no compatibility issues. In the meantime, no test cases themselves need to be changed.

Checklist

I have explained the need for this PR and the problem it solves

I have explained the changes or the new features added to this PR

I have added tests corresponding to this change

I have updated the documentation to reflect this change

I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

we may mount the logs folder to an external volume ==>
we mount the logs folder to an external volume and we occur this issue worker_events.sock failed (98: Address already in use) after updgrate from 3.2 to 3.9
It seems like moving to the conf folder would be a better solution, since mount the logs folder is indeed a common behavior

bzp2010 added 4 commits December 29, 2023 11:58

chore: move events socket path to conf

9c5bca3

feat: ensure socket files be cleaned when nginx conf init

4a360bf

test: fix path

09e520d

fix

8bbd135

bzp2010 self-assigned this Dec 29, 2023

bzp2010 requested review from monkeyDluffy6017 and shreemaan-abhishek December 29, 2023 04:15

bzp2010 marked this pull request as draft December 29, 2023 05:50

cleanup

e557ec2

bzp2010 added 6 commits January 8, 2024 10:51

move to tmp dir

2be9653

add apisix internal shared dict

bf90e61

add tmp dir init

2fc12d1

cleanup

01b9c83

fix lint

bb402c1

move test socks to tmp dir

340ffb0

This was referenced Jan 11, 2024

fix: apisix startup failed due to unix socket not being removed api7/apisix-build-tools#363

Merged

fix: apisix-debian-dev image failed to run apache/apisix-docker#537

Merged

github-actions bot added the stale label Mar 24, 2024

github-actions bot closed this Apr 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: move events sock path to conf #10732

chore: move events sock path to conf #10732

bzp2010 commented Dec 29, 2023 •

edited

Loading

bzp2010 commented Dec 30, 2023 •

edited

Loading

shreemaan-abhishek commented Jan 1, 2024

monkeyDluffy6017 commented Jan 3, 2024

bzp2010 commented Jan 8, 2024 •

edited

Loading

monkeyDluffy6017 commented Jan 9, 2024

shreemaan-abhishek commented Jan 19, 2024

github-actions bot commented Mar 24, 2024

github-actions bot commented Apr 22, 2024

darkSheep404 commented Aug 28, 2024

Description

Checklist

chore: move events sock path to conf #10732

chore: move events sock path to conf #10732

Conversation

bzp2010 commented Dec 29, 2023 • edited Loading

Description

Checklist

bzp2010 commented Dec 30, 2023 • edited Loading

shreemaan-abhishek commented Jan 1, 2024

monkeyDluffy6017 commented Jan 3, 2024

bzp2010 commented Jan 8, 2024 • edited Loading

monkeyDluffy6017 commented Jan 9, 2024

shreemaan-abhishek commented Jan 19, 2024

github-actions bot commented Mar 24, 2024

github-actions bot commented Apr 22, 2024

darkSheep404 commented Aug 28, 2024

Description

Checklist

bzp2010 commented Dec 29, 2023 •

edited

Loading

bzp2010 commented Dec 30, 2023 •

edited

Loading

bzp2010 commented Jan 8, 2024 •

edited

Loading