# Airflow 2.2.0 compatibility #117
This is the error I am seeing for this issue:

```
Broken DAG: [/dags/airflow-db-cleanup.py] Traceback (most recent call last):
  File "xxx/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "xxx/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type InstrumentedAttribute is not JSON serializable
```
I see this error too...
FWIW, this is the full stacktrace:
Spent a while troubleshooting this as well. The issue is that the database objects being passed to the cleanup task via params aren't JSON serializable. This might've been introduced by apache/airflow#17100.
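A minimal sketch that reproduces the failure (assuming an Airflow 2.x environment; the attribute used here is just an example):

```python
import json

from airflow.models import TaskInstance

# At class level, TaskInstance.start_date is a SQLAlchemy
# InstrumentedAttribute, which the stdlib JSON encoder cannot handle.
# This raises the same TypeError shown in the traceback above.
json.dumps({"age_check_column": TaskInstance.start_date})
```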
If anyone is working actively on this, or even vaguely thinking about it, please reach out. Fixing this is basically my week this week, and I'd love to have a natter and catch up with anyone working on the same problem.
One approach we were trying was to give those classes and attributes as string literals and then, at the end, to objectify them, e.g. via Python's `locate` and `getattr`. But doing this dynamically for all of those database_objects would require some additional information.
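A rough sketch of that idea, assuming the class and column are stored as plain strings; `pydoc.locate` resolves the dotted path and `getattr` the attribute (the string values are illustrative):

```python
from pydoc import locate

# Hypothetical string literals, as they might be kept in the params dict.
model_path = "airflow.models.TaskInstance"
column_name = "start_date"

model = locate(model_path)                      # -> the TaskInstance class
age_check_column = getattr(model, column_name)  # -> InstrumentedAttribute
```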
I'm facing similar issues after migrating from 2.1.2 to 2.2.1.

##### Change 1
Turn the `DATABASE_OBJECTS` list into a `DATABASE_OBJECTS_DICTS` dict of string literals, so the values are JSON serializable.

##### Change 2
Include `database_objects_dicts` inside the `cleanup_function`, as it raises an error when `database_objects` are passed as arguments. You SHOULD CHANGE the params accordingly.

##### Change 3
And change the places that reference `DATABASE_OBJECTS` to use the new dicts.

It does not raise errors anymore, but it does not guarantee performance or other stability. I'm just detouring it to make it work.
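A sketch of what that detour could look like, following the `DATABASE_OBJECTS_DICTS` naming; the entry shown and the resolution logic are assumptions, not the author's exact code:

```python
# Strings only, so the dict survives JSON serialization in params.
DATABASE_OBJECTS_DICTS = {
    "DagRun": {
        "airflow_db_model": "DagRun",
        "age_check_column": "execution_date",
        "keep_last": True,
        "keep_last_filters": None,
        "keep_last_group_by": "dag_id",
    },
}


def cleanup_function(**context):
    # Resolve the string literals back into real objects here, inside
    # the task, instead of passing model objects through params.
    import airflow.models

    for name, entry in DATABASE_OBJECTS_DICTS.items():
        model = getattr(airflow.models, entry["airflow_db_model"])
        age_check_column = getattr(model, entry["age_check_column"])
        # ... build and run the delete query with model / age_check_column ...
```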
Great! Thanks @ud803! Also, don't forget to change:

to
I tested @ealebed's and @ud803's fix for db-cleanup in Airflow 2.2.1, and the JSON serialization errors are now gone.
> …the cleanup task via params aren't JSON serializable
For anyone looking at this on v2.2.4 or greater, you also need to change the `age_check_column` for `TaskInstance` and `TaskReschedule` (they no longer carry an `execution_date` column, so use `start_date` instead):

##### Change 1:
```python
'TaskInstance': {
    "airflow_db_model": TaskInstance,
    "age_check_column": TaskInstance.start_date,  # changed
    "keep_last": False,
    "keep_last_filters": None,
    "keep_last_group_by": None
}
```
##### Change 2:
```python
try:
    from airflow.models import TaskReschedule
    DATABASE_OBJECTS_DICTS['TaskReschedule'] = {
        "airflow_db_model": TaskReschedule,
        "age_check_column": TaskReschedule.start_date,  # changed
        "keep_last": False,
        "keep_last_filters": None,
        "keep_last_group_by": None
    }
except ImportError:
    # TaskReschedule may be unavailable on older Airflow versions
    pass
```
@austin-phil Using `start_date` deletes different instances, as that is not when the instance actually executed. Following the logic to comply with 2.2.x and newer, I adjusted the following:
The issue is that the execution_date is referenced differently and you must join to dag_run. Alternatively, you can add an option called "load_only" and pass that through to the query in the cleanup function. Example:
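A minimal sketch of that join, not the author's original snippet; the function name and the `load_only` column list are illustrative:

```python
from sqlalchemy.orm import load_only

from airflow.models import DagRun, TaskInstance
from airflow.utils.session import provide_session


@provide_session
def old_task_instances(max_date, session=None):
    # In 2.2+, execution_date lives on DagRun, so the cleanup query has
    # to join task_instance to dag_run in order to filter on it.
    return (
        session.query(TaskInstance)
        .join(TaskInstance.dag_run)
        .options(load_only("dag_id", "task_id", "run_id"))
        .filter(DagRun.execution_date <= max_date)
        .all()
    )
```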
References:
@austin-phil, @tylerwmarrs: Starting with v2.2.0, there is a foreign key constraint in TaskInstance and TaskReschedule referencing the DagRun, with the "on_delete" option set to "CASCADE". Deleting a row in the DagRun table should therefore delete the corresponding rows in the TaskInstance and TaskReschedule tables automatically, at least if you're using a database backend that supports this, such as Postgres. So simply removing TaskInstance and TaskReschedule from the DATABASE_OBJECTS list should yield the desired behavior.

References:
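In code, the suggestion boils down to dropping those two entries from the `DATABASE_OBJECTS` list in airflow-db-cleanup.py; a sketch (the filter shown is illustrative):

```python
# With the ON DELETE CASCADE foreign keys introduced in 2.2.0 (on a
# backend that enforces them, e.g. Postgres), deleting a DagRun row
# already removes its TaskInstance and TaskReschedule rows.
DATABASE_OBJECTS = [
    entry for entry in DATABASE_OBJECTS
    if entry["airflow_db_model"].__name__ not in ("TaskInstance", "TaskReschedule")
]
```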
It looks like @PhilippDB is right regarding the constraints in PostgreSQL. The same situation also applies to cleanup_BaseXCom and cleanup_RenderedTaskInstanceFields.

References:

I did a quick Google search, and it looks like the main databases (MySQL, Oracle, Microsoft SQL Server, PostgreSQL) all support the 'on_delete=CASCADE' option.
Using these DAGs in Airflow 2.2.0 results in a scheduler crash once a new dagrun has been created for airflow_logs_cleanup, most likely because of a DAG parsing error.

Is an update planned to support Airflow 2.2.0? Sadly, I removed them from the project without copying the Airflow DAG error first, so I can't be more precise right now.