UDP ports, option SRTO_REUSEADDR and srt_close() #1892

Closed
lelegard opened this issue Mar 25, 2021 · 12 comments

@lelegard
Contributor

Hi,

I have a question about UDP ports, option SRTO_REUSEADDR and srt_close(). I do not know if this is the expected behaviour; I am not an SRT specialist.

The scenario I ran into is the following (detailed traces with SRT options below):

  • Create SRT socket, bind it to UDP port 5555.
  • UDP port 5555 is in use (seen using netstat -anu | grep :5555)
  • Exchange packets, then srt_close(), do not exit application.
  • UDP port 5555 is still in use, even waiting up to 10 seconds after srt_close() <== Is this normal?
  • In the same application, create another SRT socket, set SRTO_REUSEADDR, bind it to UDP port 5555, fails with "Another socket is already listening on the same port" <== Is this normal?
  • Terminate application.
  • UDP port 5555 is immediately unused.

This has been seen on Ubuntu 20.10 with libsrt 1.4.1 and also on Windows 10 with libsrt 1.4.3-rc.0.

I am surprised that the UDP port is still in use after srt_close(). Also surprised by srt_bind() failing despite option SRTO_REUSEADDR (unless of course srt_close() does not close the underlying UDP socket).

I have tried with various delays between srt_close() and the next srt_create_socket() without success.

Debug traces using TSDuck:

* Debug: srt: calling srt_create_socket()
* Debug: srt: calling srt_setsockflag(SRTO_SENDER, 01, 1)
* Debug: srt: calling srt_setsockflag(SRTO_TRANSTYPE, 00 00 00 00, 4)
* Debug: srt: calling srt_setsockflag(SRTO_MESSAGEAPI, 01, 1)
* Debug: srt: calling srt_setsockflag(SRTO_REUSEADDR, 01, 1)
* Debug: srt: calling srt_bind(0.0.0.0:5555)
* Debug: srt: calling srt_listen()
* Debug: srt: calling srt_accept()
* Debug: srt: connected to 127.0.0.1:49012
....
* Debug: srt: calling srt_close()
* Debug: srt: calling srt_create_socket()
* Debug: srt: calling srt_setsockflag(SRTO_SENDER, 01, 1)
* Debug: srt: calling srt_setsockflag(SRTO_TRANSTYPE, 00 00 00 00, 4)
* Debug: srt: calling srt_setsockflag(SRTO_MESSAGEAPI, 01, 1)
* Debug: srt: calling srt_setsockflag(SRTO_REUSEADDR, 01, 1)
* Debug: srt: calling srt_bind(0.0.0.0:5555)
* Debug: srt: calling srt_listen()
* Error: srt: error during srt_listen(), msg: Operation not supported: Another socket is already listening on the same port
* Debug: srt: calling srt_close()

To check the UDP port status:

$ netstat -anu | grep :5555
udp        0      0 0.0.0.0:5555            0.0.0.0:*

Sender command (which produces the traces):

tsp -d -I file -i tsfile.ts -P regulate -O srt 5555 --transtype live --messageapi --multiple --restart-delay 10000

Receiver command (interrupt it using Ctrl-C to reproduce the issue):

tsp -v -I srt localhost:5555 --transtype live --messageapi -O drop

Thanks for your help.

@lelegard lelegard added the Type: Question Questions or things that require clarification label Mar 25, 2021
@ethouris
Collaborator

See #1822.

As an explanation: The real occupant of the UDP socket is the object called "multiplexer", which can potentially be shared between SRT sockets. It is always shared between a listener socket and the sockets accepted off it, and it can also be shared with another socket when you call srt_bind with an existing socket's binding address (the existing multiplexer is found and reused).

Closing a socket is complicated because the socket you want to close might still be in use by other threads: some pending activities and data transmission may be underway, including reading the data in another thread. That's why the original design also contained a garbage collector, which cleans up sockets after making sure that all other activities around them are finished. That is also where the socket is disconnected from the multiplexer (and where the multiplexer is potentially deleted, if this socket was its last share).
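
As a rough illustration of the shared multiplexer and deferred cleanup described above, here is a toy model in C++. It is not libsrt code: `ToyStack`, `Multiplexer` and every method name are made up for this sketch. The only assumption it encodes is that a port stays occupied while any socket, open or closed but not yet garbage-collected, still holds a share of the multiplexer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Toy model only: these types are hypothetical and are not libsrt classes.
struct Multiplexer {
    int port;  // the UDP port this multiplexer occupies
};

class ToyStack {
public:
    // Bind a new socket to a port, reusing the multiplexer if one is still
    // alive for that port. Returns the id of the new socket.
    int bind(int port) {
        std::shared_ptr<Multiplexer> mux = muxes_[port].lock();
        if (!mux) {
            mux = std::make_shared<Multiplexer>(Multiplexer{port});
            muxes_[port] = mux;
        }
        int id = next_id_++;
        open_[id] = mux;
        return id;
    }

    // Closing only moves the socket's share to the "awaiting cleanup" list.
    void close(int id) {
        auto it = open_.find(id);
        assert(it != open_.end());
        pending_.push_back(it->second);
        open_.erase(it);
    }

    // The garbage collector drops the shares held by closed sockets.
    void garbage_collect() { pending_.clear(); }

    // The port is occupied while any share of its multiplexer is alive.
    bool port_in_use(int port) const {
        auto it = muxes_.find(port);
        return it != muxes_.end() && !it->second.expired();
    }

private:
    int next_id_ = 1;
    std::map<int, std::weak_ptr<Multiplexer>> muxes_;
    std::map<int, std::shared_ptr<Multiplexer>> open_;
    std::vector<std::shared_ptr<Multiplexer>> pending_;
};
```

In this model, with a listener and one accepted socket both bound to port 5555, `port_in_use(5555)` stays true until both sockets have been closed and `garbage_collect()` has run.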

There is already an initiative started to fix it, but the fix is a bit complicated, so this will need time.

@lelegard
Contributor Author

Hi @ethouris

Thanks for the explanation. It is clear.

In practice, I have no problem with the underlying UDP socket still being active. On the contrary, it probably has a lot of advantages.

My problem is that calling another srt_create_socket() and rebinding on the same port, in the same application, does not work. It seems contradictory with this:

The real occupant of the UDP socket is the object called "multiplexer", which can potentially be shared between SRT sockets

Sharing the same UDP socket between multiple SRT sockets (in sequence, not at the same time) is precisely what I would like to do but it fails with "Another socket is already listening on the same port".

@DevSysEngineer

DevSysEngineer commented Mar 25, 2021

We have this "issue" also in our SRT solution. Based on our experience, we created a while loop in which we keep trying to create the socket. It can take some time until a previous socket is deleted in the kernel.
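
A bounded version of such a retry loop could look roughly like the sketch below. `retry_until_success` is a hypothetical helper, not part of libsrt; the `attempt` callback stands in for whatever "create, bind and listen" sequence the application uses, and the attempt count and delay are arbitrary choices for the caller.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of libsrt: call `attempt` until it returns
// true or `max_attempts` is reached, sleeping `delay` between failed tries.
// Returns the number of attempts used on success, or 0 if every try failed.
inline int retry_until_success(const std::function<bool()>& attempt,
                               int max_attempts,
                               std::chrono::milliseconds delay) {
    for (int n = 1; n <= max_attempts; ++n) {
        if (attempt()) return n;
        if (n < max_attempts) std::this_thread::sleep_for(delay);
    }
    return 0;
}
```

Capping the number of attempts keeps a permanent failure, such as a socket that was never actually closed, from turning into an endless loop.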

@lelegard
Contributor Author

So, what would you recommend in order to create an SRT socket on the same port as a previous SRT socket in the same application?

Using standard UDP sockets, we know that it may take time for a socket to be deleted but specifying REUSEADDR immediately in the new socket works. What about SRT sockets?
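
For comparison, the plain UDP behaviour mentioned above can be checked with a small experiment using POSIX sockets. This is only a sketch, assuming a POSIX system with a loopback interface; the function names and the `RebindResult` enum are made up for the example, and the port is chosen by the kernel rather than fixed to 5555.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Outcome of the experiment below.
enum class RebindResult { Ok, Unavailable, RebindFailed };

// Open a UDP socket on 127.0.0.1:`port` with SO_REUSEADDR set.
// Returns the file descriptor, or -1 on failure.
inline int open_udp_reuseaddr(std::uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Bind to a kernel-chosen port, close, then immediately bind to it again.
inline RebindResult rebind_after_close() {
    int first = open_udp_reuseaddr(0);  // port 0: let the kernel choose
    if (first < 0) return RebindResult::Unavailable;
    sockaddr_in bound{};
    socklen_t len = sizeof(bound);
    if (::getsockname(first, reinterpret_cast<sockaddr*>(&bound), &len) < 0) {
        ::close(first);
        return RebindResult::Unavailable;
    }
    ::close(first);
    int second = open_udp_reuseaddr(ntohs(bound.sin_port));
    if (second < 0) return RebindResult::RebindFailed;
    ::close(second);
    return RebindResult::Ok;
}
```

`RebindResult::Unavailable` only means the environment did not allow the first socket to be created or bound, so it says nothing about rebinding.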

@ethouris
Collaborator

@lelegard "Rebinding" a socket to the same port should work without problems, as long as you do it in the same application. If you have multiple SRT applications running on one machine, they will be competing with each other for the same resources, and also with any non-SRT application that tries to occupy the UDP port with its own UDP socket.

"Certainty" that a UDP socket associated with a multiplexer will be closed at some point requires some prerequisites and, for now, also some patience. For example, if you have submitted packets to a socket and they haven't been sent over the connection yet (they are still enqueued), the resources will not be freed until the buffers are empty. If we are able to provide the fix to do "early multiplexer disconnection", it will disconnect the multiplexer at the earliest possible time, but this doesn't mean that the UDP resources will be freed right after calling srt_close, at least not in every case. That is possible provided that you first make sure that all buffers are empty and no data is being sent on either side and, of course, that the socket being closed is the only user of the multiplexer. For example, if you want to free a port occupied by a listener, you have to close not only the listener but also every socket accepted off it, and any other sockets that were bound to the same port.

@ethouris
Collaborator

Ah, as for the REUSEADDR option (or REUSEPORT): be careful. Their portability is reportedly often questionable, and you had better make sure that the UDP socket used by SRT is really not in use at the moment. This option may make the system deliver a packet sent to that socket to the (randomly) first application that requested reading.

@lelegard
Contributor Author

@lelegard "Rebinding" a socket to the same port should work without problems, as long as you do it in the same application.

@ethouris, this is precisely what I am trying to do, but it does not work ("Operation not supported: Another socket is already listening on the same port"). Have a look at the debug traces in my original post to see the scenario. Did I miss some option?

@ethouris
Collaborator

That's why I'm asking how exactly you are doing it. This error occurs exclusively in one case: when you have another listener socket bound to this port.

The early UDT codebase had a problem that prevented this as well, but it was fixed in SRT. When you close the previous listening socket, a new listening socket should be able to be bound to the port used by the previous listener. That's not a problem with "binding" because binding multiple sockets to the same address is allowed, just only one of these sockets is allowed to be a listener.

@lelegard
Contributor Author

That's not a problem with "binding" because binding multiple sockets to the same address is allowed, just only one of these sockets is allowed to be a listener.

Yes, exactly. If you have a look at the debug traces, you will see that the SRT socket that was listening on the port is closed (srt_close()) and then another SRT socket is created (srt_create_socket()). So, there is just one socket at a time acting as a listener. Moreover, I added delays between srt_close() and srt_create_socket(), up to 10 seconds, without success. So, this is not a matter of some thread not having finished some background cleanup.

@ethouris
Collaborator

Then I have no idea what happened here - in my tests this works without problems. This part of the code is executed on the listener socket when it is being closed (direct call from srt_close -> CUDTUnited::close(CUDTSocket*)):

   if (s->m_Status == SRTS_LISTENING)
   {
      if (s->m_pUDT->m_bBroken)
         return 0;

      s->m_tsClosureTimeStamp = steady_clock::now();
      s->m_pUDT->m_bBroken    = true;

      // Change towards original UDT: 
      // Leave all the closing activities for garbageCollect to happen,
      // however remove the listener from the RcvQueue IMMEDIATELY.
      // Even though garbageCollect would eventually remove the listener
      // as well, there would be some time interval between now and the
      // moment when it's done, and during this time the application will
      // be unable to bind to this port that the about-to-delete listener
      // is currently occupying (due to blocked slot in the RcvQueue).

      HLOGC(smlog.Debug, log << s->m_pUDT->CONID() << " CLOSING (removing listener immediately)");
      s->m_pUDT->notListening();

      // broadcast all "accept" waiting
      CSync::lock_broadcast(s->m_AcceptCond, s->m_AcceptLock);
   }

The call to s->m_pUDT->notListening() resolves into:

void CRcvQueue::removeListener(const CUDT *u)
{
    ScopedLock lslock(m_LSLock);

    if (u == m_pListener)
        m_pListener = NULL;
}

Your call to srt_listen, meanwhile, resolves into CUDTUnited::listen and then CUDT::setListenState, which does the following and is what returns this error:

    // if there is already another socket listening on the same port
    if (m_pRcvQueue->setListener(this) < 0)
        throw CUDTException(MJ_NOTSUP, MN_BUSY, 0);

And this setListener does:

int CRcvQueue::setListener(CUDT *u)
{
    ScopedLock lslock(m_LSLock);

    if (NULL != m_pListener)
        return -1;

    m_pListener = u;
    return 0;
}

In other words, this error is reported because m_pListener is not NULL. But if you really called srt_close on the previous listener, it must have done m_pListener = NULL in the call to removeListener, there's no other possibility.
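
The reasoning above can be replayed with a small stand-alone model of the single listener slot. `Sock` and `RcvQueueModel` are hypothetical stand-ins written for this sketch (locking is omitted); they only mirror the shape of the quoted setListener and removeListener.

```cpp
// Toy stand-ins: `Sock` and `RcvQueueModel` are hypothetical, not libsrt types.
struct Sock {
    int id;
};

class RcvQueueModel {
public:
    // Same shape as the quoted setListener: fails if the slot is taken.
    int setListener(const Sock* s) {
        if (listener_ != nullptr) return -1;
        listener_ = s;
        return 0;
    }

    // Same shape as the quoted removeListener: only the current listener
    // can clear the slot; any other socket leaves it untouched.
    void removeListener(const Sock* s) {
        if (s == listener_) listener_ = nullptr;
    }

    bool hasListener() const { return listener_ != nullptr; }

private:
    const Sock* listener_ = nullptr;
};
```

In this model, closing any socket other than the current listener (for instance an accepted data socket) leaves the slot occupied, so a later `setListener` on a new socket still returns -1.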

The m_pRcvQueue pointer points to a queue object that is part of the multiplexer; these queue pointers are copied into the socket as a shortcut for easy access. It should at least have the same value in the new socket that you are trying to make listen as in the old listener socket that you were trying to close.

The procedure of creating the multiplexer or reusing the old one is in CUDTUnited::updateMux. All these activities should be happening in the main thread, so you should be able to track them all in the debugger. Just remember that when debugging it's worth setting the SRTO_CONNTIMEO and SRTO_PEERIDLETIMEO options to some high values to avoid an unexpected interruption.

@lelegard
Contributor Author

Hi @ethouris

Thanks for your time. I analyzed the code of the SRT plugin in TSDuck (I never really dived into SRT previously) and I found that there was a bug there. After srt_accept(), the returned data socket incorrectly overwrote the listener socket value, which was then lost. As a consequence, when the SRT sockets were closed, the SRT listener socket was in fact not closed, hence the "another socket is already listening on the same port" error.
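
The kind of mistake described here can be sketched with a fake socket API. `FakeSrt` and both `restart_*` functions are hypothetical and written only for this illustration; they are neither libsrt calls nor the actual TSDuck code. The model assumes a single listener slot and integer socket handles.

```cpp
#include <set>

// Hypothetical stand-in for an SRT-like API; none of these are libsrt calls.
class FakeSrt {
public:
    // Create the single listener; fails if one is already open.
    int listen_on_port() {
        if (listener_ != -1) return -1;  // "another socket is already listening"
        listener_ = next_++;
        return listener_;
    }

    // Accept a connection on the listener; returns a new data socket.
    int accept(int listener) {
        if (listener != listener_) return -1;
        int s = next_++;
        data_.insert(s);
        return s;
    }

    // Close either kind of socket.
    void close(int s) {
        if (s == listener_) listener_ = -1;
        data_.erase(s);
    }

private:
    int next_ = 1;
    int listener_ = -1;
    std::set<int> data_;
};

// Buggy pattern: one variable holds both roles, so the listener is lost.
inline bool restart_with_one_handle(FakeSrt& srt) {
    int sock = srt.listen_on_port();
    sock = srt.accept(sock);            // overwrites the listener handle
    srt.close(sock);                    // closes only the data socket
    return srt.listen_on_port() != -1;  // fails: old listener still open
}

// Fixed pattern: keep both handles and close both.
inline bool restart_with_two_handles(FakeSrt& srt) {
    int listener = srt.listen_on_port();
    int data = srt.accept(listener);
    srt.close(data);
    srt.close(listener);
    return srt.listen_on_port() != -1;
}
```

In this model the one-handle version cannot listen again however long it waits, which matches the symptom reported in the original post.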

This is unfortunately the second time in a short period that a problem was reported here while it was in fact an issue in the TSDuck SRT plugin. I apologise for this and will be more cautious with TSDuck code contributions in the future.

I close this one.

@ethouris
Collaborator

That's exactly what I thought has happened :) Tx.

@mbakholdina mbakholdina added this to the v1.4.3 milestone Mar 26, 2021