Skip to content

Commit

Permalink
Fix broken log streaming from #35677 (#36127)
Browse files Browse the repository at this point in the history
* Fixes #28452: "TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator"
It introduces logging of Docker Swarm services which was previously not
working.

* tty=True/False to be chosen by the user as it was the case before this fix (#28452)

* Follow=true for logs will always result in tasks not ending. This is  standard and provided upstream by the Docker API.
Therefore in DockerSwarmOperator follow is always false.

* service_logs called multiple times as we continuously poll the Docker API for logs. As we indicated in the previous commmit,
the docker client malfunctions when we try to get the logs with follow=True. Therefore we make multiple calls to the API (every 2 seconds), to fetch the new logs.

* service_logs called multiple times. In this test the tasks increase (6 instead of 5) as we check if the service has terminated (+1). As this assertion makes less sense in a situation where we do multiple calls to the Docker API (polling), we might think of removing it or replacing it with something more suitable.

* Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence.

* Formatting ruff

* Reverting as Github actions don't run this test as a swarm node:

docker.errors.APIError: 503 Server Error for http+docker://localhost/v1.43/services/create: Service Unavailable ("This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.")

Revert "Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence."

This reverts commit 048ba1e.

* Logging "since" timestamp to avoid memory issues
Fix for #28452

* Formatting - Fix for #28452

* Fixes #28452: "TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator"
It introduces logging of Docker Swarm services which was previously not
working.

* tty=True/False to be chosen by the user as it was the case before this fix (#28452)

* Follow=true for logs will always result in tasks not ending. This is  standard and provided upstream by the Docker API.
Therefore in DockerSwarmOperator follow is always false.

* service_logs called multiple times as we continuously poll the Docker API for logs. As we indicated in the previous commmit,
the docker client malfunctions when we try to get the logs with follow=True. Therefore we make multiple calls to the API (every 2 seconds), to fetch the new logs.

* service_logs called multiple times. In this test the tasks increase (6 instead of 5) as we check if the service has terminated (+1). As this assertion makes less sense in a situation where we do multiple calls to the Docker API (polling), we might think of removing it or replacing it with something more suitable.

* Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence.

* Formatting ruff

* Reverting as Github actions don't run this test as a swarm node:

docker.errors.APIError: 503 Server Error for http+docker://localhost/v1.43/services/create: Service Unavailable ("This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.")

Revert "Final commit of this PR marking the test case that validates logging in the Docker Swarm Operator. We log two times a different message and we assert that the two lines are given back in the logs in the expected sequence."

This reverts commit 048ba1e.

* Logging "since" timestamp to avoid memory issues
Fix for #28452

* Formatting - Fix for #28452

* Fix bugs (#28452): Correctly assign last_line_logged, last_timestamp
  • Loading branch information
stavdav143 authored Dec 9, 2023
1 parent 17e9172 commit f9c952b
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions airflow/providers/docker/operators/docker_swarm.py
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,12 @@ def stream_new_logs(last_line_logged, since=0):
match = re.match(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{6,}Z) (.*)", line)
timestamp, message = match.groups()
self.log.info(message)

if len(logs) == 0:
return last_line_logged, since

last_line_logged = line

# Floor nanoseconds to microseconds
last_timestamp = re.sub(r"(\.\d{6})\d+Z", r"\1Z", timestamp)
last_timestamp = datetime.strptime(last_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
Expand Down

0 comments on commit f9c952b

Please sign in to comment.