Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(CI): add paths to ignore #8791

Merged
merged 2 commits into from
May 2, 2023
Merged

chore(CI): add paths to ignore #8791

merged 2 commits into from
May 2, 2023

Conversation

MichelDiz
Copy link
Contributor

Problem

Previously, our GitHub Actions CI workflows were processing unnecessary files, which was consuming extra compute resources and increasing our CI costs. The included files are such as documentation files, image files, and configuration files. This slows down the CI process and increases resource usage / costs.

Solution

To reduce CI costs and improve efficiency, I have added a set of paths to be ignored in our CI workflows using paths-ignore.

By ignoring these files and directories, our CI workflows will only process the necessary files, which should reduce the overall CI execution time and cost.

@MichelDiz MichelDiz added the dgraph Issue or PR created by an internal Dgraph contributor. label Apr 6, 2023
@ghost ghost added the area/integrations Related to integrations with other projects. label Apr 6, 2023
@mangalaman93
Copy link
Member

@MichelDiz how much does it save for us on CI?

@MichelDiz
Copy link
Contributor Author

@mangalaman93 Kind of hard to say. But I was talking with @skrdgraph about this. It is a waster of resources doing CI with all those tests at the same time all the time someone adds a markdown, gif, png or unrelated code. Makes no sense run CI in such simple cases right? We also run the test in CRON. No need to run when someone changes a md file.

@mangalaman93
Copy link
Member

Very few of the PRs in this repo are non code related. only 2 out of last 35 PRs merged are actually non-code related. If cost is a problem, we can figure something out there. I don't think this is really needed.

@MichelDiz
Copy link
Contributor Author

@mangalaman93 this is needed, not based on the past only. But in the future. We will have RFC(documentation). The compose path is code, but not related with any test we have. Dgraphtest can have it's own CI. And any image and other stuff we don't want to run it.

Also, this is just the beginning. We gonna fine tune it. I have plan to make the CI as "fail fast" instead of waiting the whole test to run. Just hold-on.

Copy link
Contributor

@joshua-goldstein joshua-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Echoing @mangalaman93 concerns, why is this necessary right now? We are not making frequent changes to these files, and when we do so we can always keep the PR as a draft and then only open when ready to merge. This means that CI runs at most two times when such a change is proposed and merged. That is not a huge overhead.

My comments on the PR are the more serious concerns I have.

- '.vscode/*.json'
- 'compose/**'
- 'contrib/**'
- 'dgraphtest/**'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be removed - if we make a change to our test suite, we want to see if the tests pass or fail

- '.vscode/CODEOWNERS'
- '.vscode/*.json'
- 'compose/**'
- 'contrib/**'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our canonical Dockerfile is located in contrib, this should be removed

@MichelDiz
Copy link
Contributor Author

Echoing @mangalaman93 concerns, why is this necessary right now? We are not making frequent changes to these files, and when we do so we can always keep the PR as a draft and then only open when ready to merge. This means that CI runs at most two times when such a change is proposed and merged. That is not a huge overhead.

My comments on the PR are the more serious concerns I have.

why is this necessary right now?

Costs is always necessary thing to do. We are not philanthropy who receives infinite money from the government.

We are not making frequent changes to these files,

Which means? It's not that you're not seeing it now that it's not going to happen.

and when we do so we can always keep the PR as a draft and then only open when ready to merge.

The point is. Not only us open PRs. We can guide the team with "good practice". But we don't say anything to community contributors. This PR is a way to mitigate that. Plus, not every new engineer is going to know everything they need to know beforehand. That's why we automate things. Cuz we never know what the others will do.

That is not a huge overhead.

That can be true now. But with time every system tends to increase its complexity. Imagine running CI to handle just a PNG? Imagine in the future having a complex CI sequence just to change a Markdown? The CI have a cost. It's like the cost of an appliance doing unnecessary work. In the month you do not see the costs of this. But if you add every year or a decade. You can buy a private jet.

Another problem we solve with this is the following. Imagine the situation, someone introduces a bug and accidentally goes through the CI. Only then in "my" turn I'm just changing a text in a README. The CI blocks it and I'll have all the work to find out what's going on and call someone. And the person fixing the readme might not have the right skill. But usually we see a PR with a failing CI and we don't help to solve it. "Let the dev take care of it alone".

So, if we have a Markdown/PNG/JPG/others change. We should fore SURE skip the whole CI. If the CI will do any good for that PR at all.

@joshua-goldstein
Copy link
Contributor

Echoing @mangalaman93 concerns, why is this necessary right now? We are not making frequent changes to these files, and when we do so we can always keep the PR as a draft and then only open when ready to merge. This means that CI runs at most two times when such a change is proposed and merged. That is not a huge overhead.
My comments on the PR are the more serious concerns I have.

why is this necessary right now?

Costs is always necessary thing to do. We are not philanthropy who receives infinite money from the government.

We are not making frequent changes to these files,

Which means? It's not that you're not seeing it now that it's not going to happen.

and when we do so we can always keep the PR as a draft and then only open when ready to merge.

The point is. Not only us open PRs. We can guide the team with "good practice". But we don't say anything to community contributors. This PR is a way to mitigate that. Plus, not every new engineer is going to know everything they need to know beforehand. That's why we automate things. Cuz we never know what the others will do.

That is not a huge overhead.

That can be true now. But with time every system tends to increase its complexity. Imagine running CI to handle just a PNG? Imagine in the future having a complex CI sequence just to change a Markdown? The CI have a cost. It's like the cost of an appliance doing unnecessary work. In the month you do not see the costs of this. But if you add every year or a decade. You can buy a private jet.

Another problem we solve with this is the following. Imagine the situation, someone introduces a bug and accidentally goes through the CI. Only then in "my" turn I'm just changing a text in a README. The CI blocks it and I'll have all the work to find out what's going on and call someone. And the person fixing the readme might not have the right skill. But usually we see a PR with a failing CI and we don't help to solve it. "Let the dev take care of it alone".

So, if we have a Markdown/PNG/JPG/others change. We should fore SURE skip the whole CI. If the CI will do any good for that PR at all.

All reasonable points -- I think we should have a discussion about file reorganization in the repo before making these changes. There are already some strange things like the static directory that should probably either be removed or moved to a special directory. That way we don't have a huge list of exceptions and can keep this no-run list as small as possible.

@joshua-goldstein joshua-goldstein self-requested a review April 18, 2023 21:17
@mangalaman93
Copy link
Member

Thanks @joshua-goldstein for taking a look at the PR. As I said, my worry still is that this list is pretty long and repeated. A smaller list and hopefully written in one place would be a lot better. I am not sure if that is possible or something but I would much prefer that.

Copy link
Contributor

@skrdgraph skrdgraph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. This is a good first pass IMO.

@skrdgraph
Copy link
Contributor

@MichelDiz there may be a way to not do DRY here. Can you check on that before merging?

@mangalaman93
Copy link
Member

Again, I'm not happy with this PR. There is so much duplicate things here. Please think about it and do better. @joshua-goldstein @skrdgraph

@MichelDiz
Copy link
Contributor Author

@aman-bansal let me do it

   - '.github/CODEOWNERS'

This will ignore any changes to the text file that defines code owners.

   - '.vscode/**'

This will ignore any changes made to the Vscode settings. We don't do any tests with it.

   - 'compose/**'

We don't do any testing in compose. It's a tool that creates yml for docker and the like. Again, there's no testing with this or something strictly connected to that in the CI.

   - 'contrib/systemd/**'

There are no tests here. It's just a repository of sample configurations.

   - 'licenses/**'

Only license here. We don't do CI on licenses.

   - 'paper/**'

It makes no sense to do CI in text and PDF.

   - 'present/**'

This is an old folder that contained Pitch. It should be deleted. But who knows if anyone wants to use it yet.

   - 'RFC/**'

RFC is just text. There are no tests. It doesn't make sense to run CI when we insert an RFC.

   - 'static/**'

These are artifacts that should be deleted. Because it was used in the Readme. Today we no longer use it. But as this PR is not intended to delete anything. So let someone else decide what to do with this path. Until there we are ignoring it.

   - 'wiki/**'

That's a dead end, as the documentation has been moved to another repository.

   - '**/**.dockerignore'
   - '**/**.gitignore'

This is ignoring dockerignore and gitignore files. There are no tests for this.

   - '**/**.md'

This is just text. It makes no sense to run CI for that.

   - '**/**.png'

This is just image. It makes no sense to run CI for that.

   - '**/**.jpg'

This is just image. It makes no sense to run CI for that.

   - '**/**.gif'

This is just image. It makes no sense to run CI for that.

   - '**/**.ini'

This is just a config file, which maybe should be deleted. It makes no sense to run CI for that.

Let me know if some doubt remains.

@MichelDiz
Copy link
Contributor Author

This setting does NOT mean that if it find a markdown for example. Which will block the CI. If you edit anything else. It will naturally trigger the CI. This logic applied in this PR ignores only if you are editing something only like texts and the others quoted. There is no risk of not triggering the CI. Otherwise Github would not have created this feature. Or would make it very clear if it would work other way.

@mangalaman93
Copy link
Member

Seems like there is no better way doing this. Merging this PR, thanks for doing this @MichelDiz

@mangalaman93 mangalaman93 merged commit b3e0c72 into main May 2, 2023
@mangalaman93 mangalaman93 deleted the micheldiz/paths-ignore branch May 2, 2023 18:38
jbhamra1 pushed a commit that referenced this pull request May 9, 2023
This PR limits CI runs on files changes that do not affect
product code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/integrations Related to integrations with other projects. dgraph Issue or PR created by an internal Dgraph contributor.
Development

Successfully merging this pull request may close these issues.

4 participants