Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky test: compliance scanner #540

Closed
vjeffrey opened this issue Jun 7, 2019 · 13 comments · Fixed by #662
Closed

flaky test: compliance scanner #540

vjeffrey opened this issue Jun 7, 2019 · 13 comments · Fixed by #662
Assignees
Labels
bug 🐛 Something isn't working testing This issue or pull request applies to testing work for Automate

Comments

@vjeffrey
Copy link

vjeffrey commented Jun 7, 2019

User Story


1) Failure:
--
  | 30_jobs_docker_spec.rb#test_0005_works [/var/lib/buildkite-agent/builds/single-use-privileged-i-0634b28ea655af127-1/chef-oss/.golang/chef-automate-master-verify/src/github.com/chef/automate/components/compliance-service/api/tests/30_jobs_docker_spec.rb:669]:
  | expected | 0s
  | actual | 1m 30s
  | @@ -1 +1 @@
  | -1
  | +2

https://buildkite.com/chef-oss/chef-automate-master-verify/builds/1796#db69b150-b637-418c-837e-4e70f13322c5

we've seen this multiple times.
i believe it's related to https://github.com/chef/automate/pull/377/files and #399 ....sorry :/

@vjeffrey vjeffrey added bug 🐛 Something isn't working testing This issue or pull request applies to testing work for Automate labels Jun 7, 2019
@alexpop alexpop self-assigned this Jun 10, 2019
@alexpop
Copy link
Contributor

alexpop commented Jun 10, 2019

Let me have a look at this bad boy!

@alexpop
Copy link
Contributor

alexpop commented Jun 17, 2019

Looking into this issue, I noticed that for this nodes list query:

MANAGER_GRPC nodes, :list, Nodes::Query.new(filters: [
      Common::Filter.new(key: "state", values: ["TERMINATED"]),
      Common::Filter.new(key: "environment", values: ["trou"]),
      Common::Filter.new(key: "group", values: ["mak", "do"])
    ])

the node manager is doing the following SQL query to get the list of nodes:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED') AND t.key LIKE 'group' AND t.value LIKE 'mak%' OR t.value LIKE 'do%' OR t.key LIKE 'environment' AND t.value LIKE 'trou%')
GROUP BY n.id
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

 connection_error |                  id                  |    last_contact     |               last_job               | manager  |          name           | platform | platform_version | report_id | source_state |   status    |                                         target_config                                          |
                       last_scan                                                            | last_run | projects_data |                                                     tags                                                      |               manager_ids                | projects | total_count
------------------+--------------------------------------+---------------------+--------------------------------------+----------+-------------------------+----------+------------------+-----------+--------------+-------------+------------------------------------------------------------------------------------------------+-------------------------------------
--------------------------------------------------------------------------------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------+------------------------------------------+----------+-------------
                  | 509f845c-f833-4635-9e21-5dd30e1141c4 | 2019-06-17 10:31:07 | 48cdefa5-08ce-4d27-943d-819e38a074e1 | automate | My Existing Docker Node | debian   | 9.7              |           | TERMINATED   | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pg"}         | {"ID":"f6046a69-b7a0-4b88-9b89-8e0bc
9f7cbc6","Status":"SKIPPED","PenultimateStatus":"SKIPPED","EndTime":"2019-06-17T10:31:06Z"} | {}       | null          | [{"key":"group","value": "doers"},{"key":"group","value": "makers"},{"key":"environment","value": "trouble"}] | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           2
 unknown error    | 6a5d83e3-0315-479e-873c-2690a05648e7 | 0001-01-01 00:00:00 | 79ce54b9-3480-4942-aa14-8905d03c0796 | automate | My Missing Docker Node  |          |                  |           |              | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pggggggggg"} | {}
                                                                                            | {}       | []            | [{"key":"environment","value": "trouble"}]                                                                    | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           2
(2 rows)

Because of the lack of parentheses and the ORed t.key LIKE 'environment' in the WHERE clause where tags are matched, the n.source_state IN ('TERMINATED') condition is being ignored.
If the right parentheses are used along with AND for the t.key LIKE 'environment' condition:

WHERE n.source_state IN ('TERMINATED') AND (t.key LIKE 'group' AND (t.value LIKE 'mak%' OR t.value LIKE 'do%') AND t.key LIKE 'environment' AND t.value LIKE 'trou%')

0 nodes are returned. This is because of the multiple tag based conditions that are applied to each row after the LEFT join.

@alexpop
Copy link
Contributor

alexpop commented Jun 17, 2019

I managed to come up with this query that uses an EXIST + SELECT for each tag used for filtering:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE
  n.source_state IN ('TERMINATED') AND
  EXISTS (SELECT NULL FROM nodes_tags nt LEFT JOIN tags t ON nt.tag_id = t.id WHERE nt.node_id = n.id AND t.key LIKE 'environment' AND t.value LIKE 'trou%') AND
  EXISTS (SELECT NULL FROM nodes_tags nt LEFT JOIN tags t ON nt.tag_id = t.id WHERE nt.node_id = n.id AND t.key LIKE 'group' AND (t.value LIKE 'mak%' OR t.value LIKE 'do%'))
GROUP BY n.id
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

 connection_error |                  id                  |    last_contact     |               last_job               | manager  |          name           | platform | platform_version | report_id | source_state |   status    |                                     target_config                                      |
               last_scan                                                            | last_run | projects_data |                                                     tags                                                      |               manager_ids                | projects | total_count
------------------+--------------------------------------+---------------------+--------------------------------------+----------+-------------------------+----------+------------------+-----------+--------------+-------------+----------------------------------------------------------------------------------------+---------------------------------------------
------------------------------------------------------------------------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------+------------------------------------------+----------+-------------
                  | 509f845c-f833-4635-9e21-5dd30e1141c4 | 2019-06-17 10:31:07 | 48cdefa5-08ce-4d27-943d-819e38a074e1 | automate | My Existing Docker Node | debian   | 9.7              |           | TERMINATED   | unreachable | {"secrets":["098ecb0d-f440-4f27-acd2-386269c863e1"],"backend":"docker","host":"cc_pg"} | {"ID":"f6046a69-b7a0-4b88-9b89-8e0bc9f7cbc6"
,"Status":"SKIPPED","PenultimateStatus":"SKIPPED","EndTime":"2019-06-17T10:31:06Z"} | {}       | null          | [{"key":"group","value": "doers"},{"key":"group","value": "makers"},{"key":"environment","value": "trouble"}] | ["e69dc612-7e67-43f2-9b19-256afd385820"] | []       |           1
(1 row)

which correctly matches only the TERMINATED node that has tags environment:trouble, group:doers and group:makers

I'm concerned about the performance of this query due to the nested SELECT calls within WHERE EXISTS.
To test out the performance, I inserted 30k nodes with 35k tags between them.
I EXPLAIN ANALYZE the query and responded in between 4 to 5 seconds on my local machine running postgres in docker.
Without the EXISTS conditions, the response comes back in 1.5 seconds.

@vjeffrey
Copy link
Author

ooooh nice find @alexpop !!

@alexpop
Copy link
Contributor

alexpop commented Jun 18, 2019

I paired with Rick today and looked for an alternative solution to the EXISTS with inner SELECTS, which was returning in 3 minutes for tag only filtering for 30k nodes.

We came up with the following SELECT with HAVING which we saw up to 100 times faster than the inner SELECTS version:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED'))
GROUP BY n.id
HAVING (array_agg(t.key || ' ' || t.value) @> ARRAY ['group doers'] OR array_agg(t.key || ' ' || t.value) @> ARRAY ['group makers']) AND  array_agg(t.key || ' ' || t.value) @> ARRAY ['environment trouble']
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

@phiggins
Copy link
Contributor

Nice! We could also be missing some indexes for all these different types of queries.

@alexpop
Copy link
Contributor

alexpop commented Jun 21, 2019

The array_agg solution works but can't do wildcard tags filtering.

I switched to string_agg instead and wildcard(%) filtering is possible with LIKE. Check it out:

SELECT
  n.connection_error,
  n.id,
  n.last_contact,
  n.last_job,
  n.manager,
  n.name,
  n.platform,
  n.platform_version,
  n.report_id,
  n.source_state,
  n.status,
  n.target_config,
  n.last_scan,
  n.last_run,
  n.projects_data,
  COALESCE(('[' || string_agg('{"key":"' || t.key || '"' || ',"value": "' || t.value || '"}', ',') || ']'), '[]') :: JSON AS tags,
  COALESCE(array_to_json(array_remove(array_agg(DISTINCT m.manager_id), NULL)), '[]') AS manager_ids,
  COALESCE(array_to_json(array_remove(array_agg(p.project_id), NULL)), '[]') AS projects,
  COUNT(*) OVER () AS total_count
FROM nodes n
  LEFT JOIN nodes_tags nt ON n.id = nt.node_id
  LEFT JOIN tags t ON t.id = nt.tag_id
  LEFT JOIN node_managers_nodes m on n.id = m.node_id
  LEFT JOIN nodes_projects np on np.node_id = n.id
  LEFT JOIN projects p on np.project_id = p.id
WHERE (n.source_state IN ('TERMINATED'))
GROUP BY n.id
HAVING (string_agg(',' || t.key || ':' || t.value, '') LIKE '%,environment:trou%') AND (string_agg(',' || t.key || ':' || t.value, '') LIKE '%,group:mak%' OR string_agg(',' || t.key || ':' || t.value, '') LIKE '%,group:do%')
ORDER BY LOWER(n.name) asc
LIMIT 100
OFFSET 0;

@alexpop
Copy link
Contributor

alexpop commented Jun 22, 2019

For NOT to work when nodes have no tags, I had to concat the value of string_agg before LIKING it:
For example:

HAVING NOT concat('',string_agg(',' || t.key || ':' || t.value, '')) LIKE '%,department:%'

@alexpop
Copy link
Contributor

alexpop commented Jun 22, 2019

Looks like this was passing through and generating an invalid postgres query with my refactor:

    # Get the nodes with a department tag without passing the values array:
    nodes_list = MANAGER_GRPC manager, :search_nodes, Manager::NodeQuery.new(
      node_manager_id: "e69dc612-7e67-43f2-9b19-256afd385820",
      query: Manager::Query.new(
        filter_map: [
          Common::Filter.new(key: "department"),
        ]
      )
    )

I updated the code to search for any tag value in this case.

@alexpop
Copy link
Contributor

alexpop commented Jun 23, 2019

I added a number of extra tests for tags filtering.

Did another performance test with 10k nodes with 11k tags between them.
The SELECT with WHERE filtering using EXISTS + SELECT averaged at 5 seconds response time without the n.source_state IN ('TERMINATED') condition that filters at node level.

Same filter condition but with HAVING + string_agg as the PR is using comes back in around 0.2 seconds on Postgres in Docker for Mac.

@alexpop
Copy link
Contributor

alexpop commented Jun 24, 2019

Node manager integration tests are failing on #662 because we issue a GetNodes here:

fetchedNodes, count, err := suite.Database.GetNodes("name", nodes.Query_ASC, 1, 100, []*common.Filter{filter1, filter2})

with filters:

Key:    "tacos",    Values: []string{"yes"}
Key:    "nachos",   Values: []string{"yes"}

These used to be OR'ed before and returned two nodes. They are now AND'ed and we don't have nodes that share those two tags.

Before I update the test I want to clarify that AND is what we want here.

@vjeffrey
Copy link
Author

thinking through this..
the tag keys are different, which means this is the same thing as doing something like an environment filter and a platform filter - which we AND together.

So, for consistency, AND for tags of different keys, OR for tags of same keys - so ya, AND is what we want here. might be worth noting that logic in the docs: https://github.com/chef/automate/blob/master/components/automate-chef-io/content/docs/nodes.md

@alexpop
Copy link
Contributor

alexpop commented Jun 25, 2019

Thank you for confirming Victoria.
I updated the tests added two tags filtering examples to nodes.md for the OR and AND cases.
PR is green now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working testing This issue or pull request applies to testing work for Automate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants