flaky test: compliance scanner #540

vjeffrey · 2019-06-07T02:43:02Z

User Story


1) Failure:
--
  | 30_jobs_docker_spec.rb#test_0005_works [/var/lib/buildkite-agent/builds/single-use-privileged-i-0634b28ea655af127-1/chef-oss/.golang/chef-automate-master-verify/src/github.com/chef/automate/components/compliance-service/api/tests/30_jobs_docker_spec.rb:669]:
  | expected | 0s
  | actual | 1m 30s
  | @@ -1 +1 @@
  | -1
  | +2

https://buildkite.com/chef-oss/chef-automate-master-verify/builds/1796#db69b150-b637-418c-837e-4e70f13322c5

we've seen this multiple times.
i believe it's related to https://github.com/chef/automate/pull/377/files and #399 ....sorry :/

The text was updated successfully, but these errors were encountered:

alexpop · 2019-06-10T15:59:38Z

Let me have a look at this bad boy!

alexpop · 2019-06-17T13:23:16Z

Looking into this issue, I noticed that for this nodes list query:

MANAGER_GRPC nodes, :list, Nodes::Query.new(filters: [
      Common::Filter.new(key: "state", values: ["TERMINATED"]),
      Common::Filter.new(key: "environment", values: ["trou"]),
      Common::Filter.new(key: "group", values: ["mak", "do"])
    ])

the node manager is doing the following SQL query to get the list of nodes:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED') AND t.key LIKE 'group' AND t.value LIKE 'mak%' OR t.value LIKE 'do%' OR t.key LIKE 'environment' AND t.value LIKE 'trou%')
GROUP BY n.id
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

 connection_error |                  id                  |    last_contact     |               last_job               | manager  |          name           | platform | platform_version | report_id | source_state |   status    |                                         target_config                                          |
                       last_scan                                                            | last_run | projects_data |                                                     tags                                                      |               manager_ids                | projects | total_count
------------------+--------------------------------------+---------------------+--------------------------------------+----------+-------------------------+----------+------------------+-----------+--------------+-------------+------------------------------------------------------------------------------------------------+-------------------------------------
--------------------------------------------------------------------------------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------+------------------------------------------+----------+-------------
                  | 509f845c-f833-4635-9e21-5dd30e1141c4 | 2019-06-17 10:31:07 | 48cdefa5-08ce-4d27-943d-819e38a074e1 | automate | My Existing Docker Node | debian   | 9.7              |           | TERMINATED   | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pg"}         | {"ID":"f6046a69-b7a0-4b88-9b89-8e0bc
9f7cbc6","Status":"SKIPPED","PenultimateStatus":"SKIPPED","EndTime":"2019-06-17T10:31:06Z"} | {}       | null          | [{"key":"group","value": "doers"},{"key":"group","value": "makers"},{"key":"environment","value": "trouble"}] | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           2
 unknown error    | 6a5d83e3-0315-479e-873c-2690a05648e7 | 0001-01-01 00:00:00 | 79ce54b9-3480-4942-aa14-8905d03c0796 | automate | My Missing Docker Node  |          |                  |           |              | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pggggggggg"} | {}
                                                                                            | {}       | []            | [{"key":"environment","value": "trouble"}]                                                                    | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           2
(2 rows)

Because of the lack of parentheses and the ORed t.key LIKE 'environment' in the WHERE clause where tags are matched, the n.source_state IN ('TERMINATED') condition is being ignored.
If the right parentheses are used along with AND for the t.key LIKE 'environment' condition:

WHERE n.source_state IN ('TERMINATED') AND (t.key LIKE 'group' AND (t.value LIKE 'mak%' OR t.value LIKE 'do%') AND t.key LIKE 'environment' AND t.value LIKE 'trou%')

0 nodes are returned. This is because of the multiple tag based conditions that are applied to each row after the LEFT join.

alexpop · 2019-06-17T13:31:16Z

I managed to come up with this query that uses an EXIST + SELECT for each tag used for filtering:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE
  n.source_state IN ('TERMINATED') AND
  EXISTS (SELECT NULL FROM nodes_tags nt LEFT JOIN tags t ON nt.tag_id = t.id WHERE nt.node_id = n.id AND t.key LIKE 'environment' AND t.value LIKE 'trou%') AND
  EXISTS (SELECT NULL FROM nodes_tags nt LEFT JOIN tags t ON nt.tag_id = t.id WHERE nt.node_id = n.id AND t.key LIKE 'group' AND (t.value LIKE 'mak%' OR t.value LIKE 'do%'))
GROUP BY n.id
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

 connection_error |                  id                  |    last_contact     |               last_job               | manager  |          name           | platform | platform_version | report_id | source_state |   status    |                                     target_config                                      |
               last_scan                                                            | last_run | projects_data |                                                     tags                                                      |               manager_ids                | projects | total_count
------------------+--------------------------------------+---------------------+--------------------------------------+----------+-------------------------+----------+------------------+-----------+--------------+-------------+----------------------------------------------------------------------------------------+---------------------------------------------
------------------------------------------------------------------------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------+------------------------------------------+----------+-------------
                  | 509f845c-f833-4635-9e21-5dd30e1141c4 | 2019-06-17 10:31:07 | 48cdefa5-08ce-4d27-943d-819e38a074e1 | automate | My Existing Docker Node | debian   | 9.7              |           | TERMINATED   | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pg"} | {"ID":"f6046a69-b7a0-4b88-9b89-8e0bc9f7cbc6"
,"Status":"SKIPPED","PenultimateStatus":"SKIPPED","EndTime":"2019-06-17T10:31:06Z"} | {}       | null          | [{"key":"group","value": "doers"},{"key":"group","value": "makers"},{"key":"environment","value": "trouble"}] | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           1
(1 row)

which correctly matches only the TERMINATED node that has tags environment:trouble, group:doers and group:makers

I'm concerned about the performance of this query due to the nested SELECT calls within WHERE EXISTS.
To test out the performance, I inserted 30k nodes with 35k tags between them.
I EXPLAIN ANALYZE the query and responded in between 4 to 5 seconds on my local machine running postgres in docker.
Without the EXISTS conditions, the response comes back in 1.5 seconds.

vjeffrey · 2019-06-17T13:36:16Z

ooooh nice find @alexpop !!

alexpop · 2019-06-18T21:54:23Z

I paired with Rick today and looked for an alternative solution to the EXISTS with inner SELECTS, which was returning in 3 minutes for tag only filtering for 30k nodes.

We came up with the following SELECT with HAVING which we saw up to 100 times faster than the inner SELECTS version:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED'))
GROUP BY n.id
HAVING (array_agg(t.key || ' ' || t.value) @> ARRAY ['group doers'] OR array_agg(t.key || ' ' || t.value) @> ARRAY ['group makers']) AND  array_agg(t.key || ' ' || t.value) @> ARRAY ['environment trouble']
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

phiggins · 2019-06-19T00:50:31Z

Nice! We could also be missing some indexes for all these different types of queries.

alexpop · 2019-06-21T18:48:16Z

The array_agg solution works but can't do wildcard tags filtering.

I switched to string_agg instead and wildcard(%) filtering is possible with LIKE. Check it out:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED'))
GROUP BY n.id
HAVING (string_agg(',' || t.key || ':' || t.value, '') LIKE '%,environment:trou%') AND (string_agg(',' || t.key || ':' || t.value, '') LIKE '%,group:mak%' OR string_agg(',' || t.key || ':' || t.value, '') LIKE '%,group:do%')
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

alexpop · 2019-06-22T08:14:30Z

For NOT to work when nodes have no tags, I had to concat the value of string_agg before LIKING it:
For example:

HAVING NOT concat('',string_agg(',' || t.key || ':' || t.value, '')) LIKE '%,department:%'

alexpop · 2019-06-22T09:22:59Z

Looks like this was passing through and generating an invalid postgres query with my refactor:

    # Get the nodes with a department tag without passing the values array:
    nodes_list = MANAGER_GRPC manager, :search_nodes, Manager::NodeQuery.new(
      node_manager_id: "e69dc612-7e67-43f2-9b19-256afd385820",
      query: Manager::Query.new(
        filter_map: [
          Common::Filter.new(key: "department"),
        ]
      )
    )

I updated the code to search for any tag value in this case.

alexpop · 2019-06-23T09:53:18Z

I added a number of extra tests for tags filtering.

Did another performance test with 10k nodes with 11k tags between them.
The SELECT with WHERE filtering using EXISTS + SELECT averaged at 5 seconds response time without the n.source_state IN ('TERMINATED') condition that filters at node level.

Same filter condition but with HAVING + string_agg as the PR is using comes back in around 0.2 seconds on Postgres in Docker for Mac.

alexpop · 2019-06-24T11:30:58Z

Node manager integration tests are failing on #662 because we issue a GetNodes here:

automate/components/nodemanager-service/pgdb/nodes_integration_test.go

Line 220 in 1e73466

    
           fetchedNodes, count, err := suite.Database.GetNodes("name", nodes.Query_ASC, 1, 100, []*common.Filter{filter1, filter2})

with filters:

Key:    "tacos",    Values: []string{"yes"}
Key:    "nachos",   Values: []string{"yes"}

These used to be OR'ed before and returned two nodes. They are now AND'ed and we don't have nodes that share those two tags.

Before I update the test I want to clarify that AND is what we want here.

vjeffrey · 2019-06-24T20:41:01Z

thinking through this..
the tag keys are different, which means this is the same thing as doing something like an environment filter and a platform filter - which we AND together.

So, for consistency, AND for tags of different keys, OR for tags of same keys - so ya, AND is what we want here. might be worth noting that logic in the docs: https://github.com/chef/automate/blob/master/components/automate-chef-io/content/docs/nodes.md

alexpop · 2019-06-25T13:05:47Z

Thank you for confirming Victoria.
I updated the tests added two tags filtering examples to nodes.md for the OR and AND cases.
PR is green now.

vjeffrey added bug 🐛 Something isn't working testing This issue or pull request applies to testing work for Automate labels Jun 7, 2019

alexpop self-assigned this Jun 10, 2019

alexpop mentioned this issue Jun 23, 2019

Fix node manager tags based filtering #662

Merged

alexpop closed this as completed in #662 Jun 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flaky test: compliance scanner #540

flaky test: compliance scanner #540

vjeffrey commented Jun 7, 2019

alexpop commented Jun 10, 2019

alexpop commented Jun 17, 2019 •

edited

Loading

alexpop commented Jun 17, 2019 •

edited

Loading

vjeffrey commented Jun 17, 2019

alexpop commented Jun 18, 2019

phiggins commented Jun 19, 2019

alexpop commented Jun 21, 2019

alexpop commented Jun 22, 2019

alexpop commented Jun 22, 2019

alexpop commented Jun 23, 2019

alexpop commented Jun 24, 2019

vjeffrey commented Jun 24, 2019

alexpop commented Jun 25, 2019

flaky test: compliance scanner #540

flaky test: compliance scanner #540

Comments

vjeffrey commented Jun 7, 2019

User Story

alexpop commented Jun 10, 2019

alexpop commented Jun 17, 2019 • edited Loading

alexpop commented Jun 17, 2019 • edited Loading

vjeffrey commented Jun 17, 2019

alexpop commented Jun 18, 2019

phiggins commented Jun 19, 2019

alexpop commented Jun 21, 2019

alexpop commented Jun 22, 2019

alexpop commented Jun 22, 2019

alexpop commented Jun 23, 2019

alexpop commented Jun 24, 2019

vjeffrey commented Jun 24, 2019

alexpop commented Jun 25, 2019

alexpop commented Jun 17, 2019 •

edited

Loading

alexpop commented Jun 17, 2019 •

edited

Loading