[data-lifecyle] add data-lifecycle API #2157

Merged
merged 14 commits into from
Nov 20, 2019

Conversation

@ryancragun ryancragun commented Nov 7, 2019

Expose data lifecycle configuration, status, and run hooks in the gateway.

We need to expose the data lifecycle interfaces in the gateway so that we can configure them in the UI. During the design, we noticed that the existing lifecycle page required 4 API calls to get the configuration and 4 to set it. Rather than build another ad-hoc API to provide the purge interfaces to the UI, we determined that it would be better to build a top-level data-lifecycle resource and to move the complexity into the gateway instead of the UI.

The current shape of the API is as follows:

/data-lifecycle
  /status # GET: gets aggregate configuration along with other status metadata
  /config # PUT: updates aggregate configuration
  /run # POST: run aggregate lifecycle operations
  /infra
    /status # GET: gets configuration and status for infra lifecycle operation
    /config # PUT: update the infra lifecycle config
    /run # POST: run all the nodes {missing,delete} lifecycle events and purges infra
  /event-feed 
    /status # GET: get the configuration and status for event-feed data lifecycle operations
    /config # PUT: update the event-feed lifecycle config
    /run # POST: purges event feed
  /compliance
    /status # GET: get the configuration and status for compliance data lifecycle operations
    /config # PUT: update the compliance lifecycle config 
    /run # POST: purge compliance reports and scans

There's also a shim for the applications service:

  /services # Not currently implemented as the applications service is still in beta
    /status
    /config
    /run
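
As a rough illustration of the layout above, the following Go sketch maps a service and an operation to the HTTP method and path listed. The `route` helper and the `methods` table are hypothetical names made up for this example and are not part of the gateway code; the `services` shim is treated as unavailable because it is not yet implemented.

```go
package main

import "fmt"

// methods maps each operation to the HTTP method shown in the listing above.
var methods = map[string]string{
	"status": "GET",
	"config": "PUT",
	"run":    "POST",
}

// route returns the method and path for an operation. An empty service
// selects the aggregate endpoint. This helper is illustrative only.
func route(service, op string) (method, path string, err error) {
	m, ok := methods[op]
	if !ok {
		return "", "", fmt.Errorf("unknown operation %q", op)
	}
	switch service {
	case "":
		return m, "/data-lifecycle/" + op, nil
	case "infra", "event-feed", "compliance":
		return m, "/data-lifecycle/" + service + "/" + op, nil
	default:
		// Includes "services", which is only a shim at this point.
		return "", "", fmt.Errorf("unknown or unimplemented service %q", service)
	}
}
```

For example, `route("event-feed", "run")` yields `POST` and `/data-lifecycle/event-feed/run`.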

Example aggregate status response

{
  "infra": {
    "jobs": [
      {
        "name": "delete_nodes",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d",
        "purge_policies": null,
        "last_elapsed": "0.041154s",
        "next_due_at": "2019-11-07T22:32:40Z",
        "last_enqueued_at": "2019-11-07T22:22:40.017947Z",
        "last_started_at": null,
        "last_ended_at": null
      },
      {
        "name": "missing_nodes",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d",
        "purge_policies": null,
        "last_elapsed": "0.043205s",
        "next_due_at": "2019-11-07T22:32:40Z",
        "last_enqueued_at": "2019-11-07T22:22:40.026315Z",
        "last_started_at": null,
        "last_ended_at": null
      },
      {
        "name": "missing_nodes_for_deletion",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "31d",
        "purge_policies": null,
        "last_elapsed": "0.048030s",
        "next_due_at": "2019-11-07T22:32:40Z",
        "last_enqueued_at": "2019-11-07T22:22:40.000590Z",
        "last_started_at": null,
        "last_ended_at": null
      },
      {
        "name": "periodic_purge_timeseries",
        "disabled": false,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=2",
        "threshold": "",
        "purge_policies": {
          "elasticsearch": [
            {
              "name": "actions",
              "index": "actions",
              "older_than_days": 29,
              "custom_purge_field": "",
              "disabled": false
            },
            {
              "name": "converge-history",
              "index": "converge-history",
              "older_than_days": 0,
              "custom_purge_field": "",
              "disabled": false
            }
          ],
          "postgres": []
        },
        "last_elapsed": "0.019777s",
        "next_due_at": "2019-11-08T18:02:40Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T19:37:16.463036Z",
        "last_ended_at": "2019-11-07T19:37:16.482813Z"
      }
    ]
  },
  "compliance": {
    "jobs": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=2",
        "threshold": "",
        "purge_policies": {
          "elasticsearch": [
            {
              "name": "compliance-reports",
              "index": "comp-5-r",
              "older_than_days": 100,
              "custom_purge_field": "",
              "disabled": false
            },
            {
              "name": "compliance-scans",
              "index": "comp-5-s",
              "older_than_days": 100,
              "custom_purge_field": "",
              "disabled": false
            }
          ],
          "postgres": []
        },
        "last_elapsed": "0.015443s",
        "next_due_at": "2019-11-08T18:03:23Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T18:03:23.000254Z",
        "last_ended_at": "2019-11-07T18:03:23.015697Z"
      }
    ]
  },
  "event_feed": {
    "jobs": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
        "threshold": "",
        "purge_policies": {
          "elasticsearch": [
            {
              "name": "feed",
              "index": "eventfeed-2-feeds",
              "older_than_days": 60,
              "custom_purge_field": "pub_timestamp",
              "disabled": true
            }
          ],
          "postgres": []
        },
        "last_elapsed": "0.817205s",
        "next_due_at": "2019-11-08T18:02:43Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T18:02:43.000265Z",
        "last_ended_at": "2019-11-07T18:02:43.817470Z"
      }
    ]
  },
  "services": null
}
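
A client consuming this response could decode it along the following lines. This is a minimal sketch: the Go type names are invented for the example and the field set is inferred only from the sample above, not from the generated API types. Timestamps that can be `null` are modeled as pointers, and the `postgres` policies are left out because the sample only shows them empty.

```go
package main

import (
	"encoding/json"
	"time"
)

// EsPolicy mirrors one entry under purge_policies.elasticsearch in the sample.
type EsPolicy struct {
	Name             string `json:"name"`
	Index            string `json:"index"`
	OlderThanDays    int    `json:"older_than_days"`
	CustomPurgeField string `json:"custom_purge_field"`
	Disabled         bool   `json:"disabled"`
}

// PurgePolicies is null for the node jobs in the sample, hence the pointer below.
type PurgePolicies struct {
	Elasticsearch []EsPolicy `json:"elasticsearch"`
}

// JobStatus mirrors one entry in a service's "jobs" array.
type JobStatus struct {
	Name           string         `json:"name"`
	Disabled       bool           `json:"disabled"`
	Recurrence     string         `json:"recurrence"`
	Threshold      string         `json:"threshold"`
	PurgePolicies  *PurgePolicies `json:"purge_policies"`
	LastElapsed    string         `json:"last_elapsed"`
	NextDueAt      *time.Time     `json:"next_due_at"`
	LastEnqueuedAt *time.Time     `json:"last_enqueued_at"`
	LastStartedAt  *time.Time     `json:"last_started_at"`
	LastEndedAt    *time.Time     `json:"last_ended_at"`
}

type ServiceStatus struct {
	Jobs []JobStatus `json:"jobs"`
}

// Status is the aggregate response; "services" is null in the sample.
type Status struct {
	Infra      *ServiceStatus `json:"infra"`
	Compliance *ServiceStatus `json:"compliance"`
	EventFeed  *ServiceStatus `json:"event_feed"`
	Services   *ServiceStatus `json:"services"`
}

// parseStatus decodes an aggregate status response body.
func parseStatus(raw []byte) (*Status, error) {
	var s Status
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// elapsed converts a last_elapsed string such as "0.041154s" to a duration.
func elapsed(j JobStatus) (time.Duration, error) {
	return time.ParseDuration(j.LastElapsed)
}

// neverStarted reports whether the job has a null last_started_at.
func neverStarted(j JobStatus) bool {
	return j.LastStartedAt == nil
}
```

Note that `last_enqueued_at` appears in the sample both as a real timestamp and as the zero time `0001-01-01T00:00:00Z`, so callers may want to check `IsZero()` in addition to `nil`.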

Example aggregate configuration payload

{ "infra": {
    "job_settings": [
      { "name":"delete_nodes",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d"
      },
      { "name":"missing_nodes",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d"
      },
      { "name":"missing_nodes_for_deletion",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "31d"
      },
      { "name":"periodic_purge_timeseries",
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "actions",
              "older_than_days": 29,
              "disabled": false
            },
            {
              "policy_name": "converge-history",
              "older_than_days": 29,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "compliance": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "compliance-reports",
              "older_than_days": 100,
              "disabled": false
            },
            {
              "policy_name": "compliance-scans",
              "older_than_days": 100,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "event_feed": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "feed",
              "older_than_days": 60,
              "disabled": true
            }
          ]
        }
      }
    ]
  }
}
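
The payload uses `job_settings` and `policy_name` where the status response uses `jobs` and `name`, so a client cannot simply echo the status back. The sketch below builds the `event_feed` portion of the payload in Go. The type and function names are assumptions made for illustration, and whether the gateway accepts a payload that omits the other services is not stated here.

```go
package main

import "encoding/json"

// EsPolicyUpdate mirrors one Elasticsearch policy entry in the config payload.
type EsPolicyUpdate struct {
	PolicyName    string `json:"policy_name"`
	OlderThanDays int    `json:"older_than_days"`
	Disabled      bool   `json:"disabled"`
}

type PurgePolicyUpdate struct {
	Elasticsearch []EsPolicyUpdate `json:"elasticsearch,omitempty"`
}

// JobSettings mirrors one entry in a service's "job_settings" array.
// Optional fields are omitted when empty, as in the sample payload.
type JobSettings struct {
	Name          string             `json:"name"`
	Disabled      bool               `json:"disabled,omitempty"`
	Recurrence    string             `json:"recurrence,omitempty"`
	Threshold     string             `json:"threshold,omitempty"`
	PurgePolicies *PurgePolicyUpdate `json:"purge_policies,omitempty"`
}

type ServiceConfig struct {
	JobSettings []JobSettings `json:"job_settings"`
}

type Config struct {
	Infra      *ServiceConfig `json:"infra,omitempty"`
	Compliance *ServiceConfig `json:"compliance,omitempty"`
	EventFeed  *ServiceConfig `json:"event_feed,omitempty"`
}

// eventFeedPurgeConfig builds a payload for the event feed purge job,
// reusing the job name, policy name, and recurrence from the sample above.
func eventFeedPurgeConfig(olderThanDays int, disabled bool) ([]byte, error) {
	cfg := Config{EventFeed: &ServiceConfig{JobSettings: []JobSettings{{
		Name:       "periodic_purge",
		Disabled:   disabled,
		Recurrence: "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
		PurgePolicies: &PurgePolicyUpdate{Elasticsearch: []EsPolicyUpdate{{
			PolicyName:    "feed",
			OlderThanDays: olderThanDays,
			Disabled:      disabled,
		}}},
	}}}}
	return json.Marshal(cfg)
}
```

Calling `eventFeedPurgeConfig(60, true)` produces JSON equivalent to the `event_feed` section of the example payload.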

Example supervisor output during an aggregate run

[350][default:/src:0]#  curl -kL -H "api-token: $(get_admin_token)" -X POST  https://localhost:2000/data-lifecycle/run; sl
authz-service.default(O): time="2019-11-07T22:43:07Z" level=info msg="Projects Authorized Query" action="dataLifecycle:run:create" projects="[]" resource="dataLifecycle:run" result="[*]" subject="[token:api-token-41]"
ingest-service.default(O): time="2019-11-07T22:43:07Z" level=info msg="Marked nodes missing" nodes_updated=0 status=missing
ingest-service.default(O): time="2019-11-07T22:43:07Z" level=info msg="Node(s) marked for deletion" exists=false nodes_updated=0
ingest-service.default(O): time="2019-11-07T22:43:07Z" level=info msg="Nodes deleted" nodes_deleted=0
ingest-service.default(O): time="2019-11-07T22:43:07Z" level=info msg=Purging
compliance-service.default(O): time="2019-11-07T22:43:07Z" level=info msg=Purging
event-feed-service.default(O): time="2019-11-07T22:43:07Z" level=info msg=Purging

⛓️ Related Resources

#1208
#2108

👍 Definition of Done

An admin user can:

  • see data lifecycle config for all services that implement it
  • configure data lifecycle recurrence, policies, and thresholds
  • run the data lifecycle operations
  • create IAM policies for data lifecycle operations

👟 How to Build and Test the Change

  • Rebuild ingest-service and automate-gateway
  • start_all_services

Test the change via curl:

  • curl -kL -H "api-token: $(get_admin_token)" https://localhost/api/v0/data-lifecycle/status
  • curl -kL -H "api-token: $(get_admin_token)" -X PUT --data "@ds.json" https://localhost/api/v0/data-lifecycle/config where ds.json is a JSON payload file.
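
The same requests could be issued from Go roughly as follows. This is a sketch rather than a supported client: `newLifecycleRequest` and `insecureClient` are hypothetical helpers, the base URL and token are supplied by the caller, and skipping TLS verification (the equivalent of curl's `-k`) is only reasonable against a local development instance.

```go
package main

import (
	"bytes"
	"crypto/tls"
	"net/http"
)

// newLifecycleRequest builds a request for a data-lifecycle endpoint, for
// example path "/status" with method GET or "/config" with method PUT.
func newLifecycleRequest(method, baseURL, path, token string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(method, baseURL+"/api/v0/data-lifecycle"+path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Same header the curl examples send.
	req.Header.Set("api-token", token)
	return req, nil
}

// insecureClient skips certificate verification, like curl -k.
// Development use only.
func insecureClient() *http.Client {
	return &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
}
```

A caller would pass the request to `insecureClient().Do(req)` and close the response body when done.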

go_test ./components/automate-gateway/...

✅ Checklist

  • Tests added/updated?
  • Docs added/updated?

@ryancragun ryancragun force-pushed the ryan/data-lifecycle-gateway branch from b03ec1a to 465161e Compare November 12, 2019 18:15
@ryancragun ryancragun force-pushed the ryan/data-lifecycle-gateway branch from aac606f to 18416e0 Compare November 14, 2019 00:04
@ryancragun ryancragun self-assigned this Nov 14, 2019
@ryancragun ryancragun marked this pull request as ready for review November 14, 2019 00:08
@ryancragun ryancragun requested a review from a team as a code owner November 14, 2019 00:08
Signed-off-by: Ryan Cragun <[email protected]>
@susanev susanev added the documentation Anything related to the Automate docs. label Nov 14, 2019
@ryancragun ryancragun requested a review from a team as a code owner November 14, 2019 22:33

susanev commented Nov 14, 2019

@ryancragun @mjingle I'm wondering if the content you wrote belongs here https://automate.chef.io/docs/node-lifecycle/ instead of a new page in the config section?

@ryancragun
Contributor Author

@ryancragun @mjingle I'm wondering if the content you wrote belongs here https://automate.chef.io/docs/node-lifecycle/ instead of a new page in the config section?

@susanev I agree that it's not ideal that both of these pages exist, but I sorta think they should for the moment. Right now the Node Lifecycle page relates to the Node Lifecycle UI and the Data Lifecycle page relates to the API for data lifecycle, which includes much more than nodes. Since we'll live in a world with both the old UI and the new API, it makes sense to me. Eventually we could consolidate them or delete most of the Data Lifecycle CLI docs when the new UI lands.

All that said, I'm happy to put the docs wherever.

Signed-off-by: Ryan Cragun <[email protected]>
}

message GetStatusRequest { }
message GetStatusResponse {
Contributor

It seems all of the messages below are the same. Would it make sense to just have 1 message type and use a map[MessageType] in the Getters/Setters?

Contributor

a related thing (not sure how possible it is given the authz stuff): we probably could have just 1 of each of Set/Get/Status RPCs.
https://github.com/googleapis/googleapis/blob/master/google/api/http.proto#L73 shows off some of that

Contributor Author

Way back in the day we did that but if you ever want to change the result you're super boxed in and have to change it for everything. I don't particularly like having independent messages everywhere because in most cases YAGNI, but alas, if we do it's certainly nice.


func jobSettingsToPurgeConfigure(setting *api.JobSettings) *data_lifecycle.ConfigureRequest {
	return &data_lifecycle.ConfigureRequest{
		Enabled: !setting.Disabled,
Contributor

was this deliberate? would it be safer to keep using Enabled? (for example, forgetting to pass in disabled or typoing it could accidentally enable it)

Contributor Author

Yeah, not using enabled is by design because of Go's default values. It's much easier to handle disabled being defaulted to false than enabled being defaulted to false. Rather than implicitly disabling things, you have to explicitly disable them.
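
To make the zero-value behavior concrete, here is a small Go sketch. The `jobSettings` struct is a hypothetical stand-in for the API's `JobSettings` message, decoded with `encoding/json` purely for illustration; only the `Enabled: !setting.Disabled` mapping comes from the diff above.

```go
package main

import "encoding/json"

// jobSettings is a stand-in for the API message; only the fields needed
// for this illustration are included.
type jobSettings struct {
	Name     string `json:"name"`
	Disabled bool   `json:"disabled"`
}

// enabledFor decodes a job setting and applies the same
// Enabled = !Disabled mapping used in jobSettingsToPurgeConfigure.
func enabledFor(raw []byte) (bool, error) {
	var s jobSettings
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	// A missing "disabled" key leaves Disabled at its zero value (false),
	// so the job stays enabled unless it is explicitly disabled.
	return !s.Disabled, nil
}
```

With this mapping, `{"name":"periodic_purge"}` yields an enabled job, and so does a payload with a misspelled key such as `"disabeld"`, which is the trade-off raised in the review comment.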

Signed-off-by: Ryan Cragun <[email protected]>
Signed-off-by: Ryan Cragun <[email protected]>

return errors.Errorf("last end '%v' not after start time '%v'", lastEnd, startTime)
}

func() {
Contributor

don't think you need func here

Contributor Author

It's so I don't have to use GOTO's.
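
For readers unfamiliar with the pattern, an immediately invoked function literal lets a block exit early with `return` while the code after it still runs. The sketch below is a generic illustration under that assumption and is not the code from this diff; `runSteps` and `errStep` are invented names.

```go
package main

import "errors"

// errStep is a sample error used to demonstrate the early exit.
var errStep = errors.New("step failed")

// runSteps runs each step in order and stops at the first failure.
// The function literal scopes the early exit: return leaves only the
// literal, so the code after it always runs, much like jumping to a label.
func runSteps(steps []func() error) (cleanedUp bool, err error) {
	func() {
		for _, step := range steps {
			if err = step(); err != nil {
				return // exits the literal, not runSteps
			}
		}
	}()

	// Runs whether or not a step failed.
	cleanedUp = true
	return cleanedUp, err
}
```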

@ryancragun ryancragun merged commit ee26375 into master Nov 20, 2019
@chef-expeditor chef-expeditor bot deleted the ryan/data-lifecycle-gateway branch November 20, 2019 16:55
kagarmoe pushed a commit that referenced this pull request Nov 23, 2019
As a requirement for building a web UI to control Data Lifecycle jobs, we needed
to expose the data lifecycle functionality in the gateway. While contemplating
implementation options, we noticed that the existing Node Lifecycle page required
two gateway API calls to get the configuration and another two to set it for the
existing two Node Lifecycle jobs. Rather than build several additional ad hoc
gateway APIs for the purge jobs, we determined that it would be better to
build a new top-level `data-lifecycle` endpoint where we could expose a single
unified interface for data lifecycle jobs and move all complexity to the backend.
                                                     
The new data lifecycle API is:
                                                     
```          
GET /data-lifecycle/status
PUT /data-lifecycle/config       
POST /data-lifecycle/run 
                                                                                                           
GET /data-lifecycle/infra/status
PUT /data-lifecycle/infra/config
POST /data-lifecycle/infra/run
                                                     
GET /data-lifecycle/compliance/status
PUT /data-lifecycle/compliance/config      
POST /data-lifecycle/compliance/run 
                                                     
GET /data-lifecycle/event-feed/status
PUT /data-lifecycle/event-feed/config
POST /data-lifecycle/event-feed/run
```                     
                                                     
Example aggregate status response:  
                                                     
```json                                            
{                                                                                                          
  "infra": {                                                                                               
    "jobs": [
      {
        "name": "delete_nodes",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d",
        "purge_policies": null,
        "last_elapsed": "0.041154s",   
        "next_due_at": "2019-11-07T22:32:40Z",
        "last_enqueued_at": "2019-11-07T22:22:40.017947Z",
        "last_started_at": null,
        "last_ended_at": null 
      },                                                                                                   
      {                  
        "name": "missing_nodes",
        "disabled": false,     
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d",
        "purge_policies": null,
        "last_elapsed": "0.043205s",        
        "next_due_at": "2019-11-07T22:32:40Z",                                                             
        "last_enqueued_at": "2019-11-07T22:22:40.026315Z",
        "last_started_at": null,
        "last_ended_at": null              
      },                                                                                                   
      {                    
        "name": "missing_nodes_for_deletion",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "31d",         
        "purge_policies": null,
        "last_elapsed": "0.048030s",
        "next_due_at": "2019-11-07T22:32:40Z",
        "last_enqueued_at": "2019-11-07T22:22:40.000590Z",
        "last_started_at": null,    
        "last_ended_at": null  
      },     
      {    
        "name": "periodic_purge_timeseries",
        "disabled": false,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=2",
        "threshold": "",
        "purge_policies": {
          "elasticsearch": [
            {
              "name": "actions", 
              "index": "actions",
              "older_than_days": 29,                                                                       
              "custom_purge_field": "",
              "disabled": false
            },
            {                                     
              "name": "converge-history",
              "index": "converge-history",
              "older_than_days": 0,
              "custom_purge_field": "",
              "disabled": false                 
            }                        
          ],                   
          "postgres": []
        }, 
        "last_elapsed": "0.019777s",
        "next_due_at": "2019-11-08T18:02:40Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T19:37:16.463036Z",
        "last_ended_at": "2019-11-07T19:37:16.482813Z"
      }              
    ]  
  },                             
  "compliance": {
    "jobs": [                                                                                              
      {                    
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=2",
        "threshold": "",            
        "purge_policies": {   
          "elasticsearch": [
            {
              "name": "compliance-reports",
              "index": "comp-5-r",
              "older_than_days": 100,
              "custom_purge_field": "",
              "disabled": false
            },
            {
              "name": "compliance-scans",
              "index": "comp-5-s",            
              "older_than_days": 100,         
              "custom_purge_field": "",                                                                    
              "disabled": false
            }
          ],
          "postgres": []
        },
        "last_elapsed": "0.015443s",
        "next_due_at": "2019-11-08T18:03:23Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T18:03:23.000254Z",
        "last_ended_at": "2019-11-07T18:03:23.015697Z"
      }
    ]
  },
  "event_feed": {
    "jobs": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
        "threshold": "",
        "purge_policies": {
          "elasticsearch": [
            {
              "name": "feed",
              "index": "eventfeed-2-feeds",
              "older_than_days": 60,
              "custom_purge_field": "pub_timestamp",
              "disabled": true
            }
          ],
          "postgres": []
        },
        "last_elapsed": "0.817205s",
        "next_due_at": "2019-11-08T18:02:43Z",
        "last_enqueued_at": "0001-01-01T00:00:00Z",
        "last_started_at": "2019-11-07T18:02:43.000265Z",
        "last_ended_at": "2019-11-07T18:02:43.817470Z"
      }
    ]
  },
  "services": null
}
```

Example aggregate configuration payload:

```json
{ "infra": {
    "job_settings": [
      { "name":"delete_nodes",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d"
      },
      { "name":"missing_nodes",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "2d"
      },
      { "name":"missing_nodes_for_deletion",
        "recurrence": "FREQ=SECONDLY;DTSTART=20191106T180240Z;INTERVAL=600",
        "threshold": "31d"
      },
      { "name":"periodic_purge_timeseries",
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "actions",
              "older_than_days": 29,
              "disabled": false
            },
            {
              "policy_name": "converge-history",
              "older_than_days": 29,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "compliance": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "compliance-reports",
              "older_than_days": 100,
              "disabled": false
            },
            {
              "policy_name": "compliance-scans",
              "older_than_days": 100,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "event_feed": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": true,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
        "purge_policies": {
          "elasticsearch": [
            {
              "policy_name": "feed",
              "older_than_days": 60,
              "disabled": true
            }
          ]
        }
      }
    ]
  }
}
```

Relates to:
* #1208
* #2108
Labels
deployment-team documentation Anything related to the Automate docs.