Precious Files Design Document
==============================
Table of Contents
* Objective
* Background
* File categorization exceptions
* Proposal
* Precious file specification
* Breakdown of suggested behaviors by command
* Backward compatibility notes
* Slightly Incompatible syntax
* Interaction with sparse-checkout parsing
* Behavior of traditional flags
* Interaction with older Git clients
* Commands with modified meaning
* Implementation hints
* Data structures
* Code areas
* Minimum
* Out of scope
* Previous discussions
* Alternatives considered
Objective
---------
Support "Precious" Files in git, a set of files which are considered
ignored (e.g. do not show up in "git status" output) but are not expendable
(thus won't be removed to make room for a file when switching or merging
branches).
Background
----------
In git we have different types of files, with various subdivisions:
* tracked
* present (i.e. part of sparse checkout)
* not present (i.e. not part of sparse checkout)
* not tracked
* ignored (also treated as expendable)
* untracked (more precisely, not-tracked-and-not-ignored, but often
referred to as simply "untracked" despite the fact that such a term
is easily mistaken as a synonym to "not tracked". However, we haven't
been fully consistent, and some places like `git ls-files --others`
may use "untracked" to refer to the larger not-tracked category).
Not considered expendable.
Over the years, the fact that ignored files are unconditionally treated as
expendable (so that other operations like git checkout might wipe them out
to make room for files on the other branch) has occasionally caused
problems. Many have expressed a desire for subdividing the ignored class,
so that we have both ignored-and-expendable (possibly referred to as
"trashable", covering the only type of ignored file we have today) and
introducing ignored-and-not-expendable (often referred to as "precious").
File categorization exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Our division above into nice categories is actually a bit of a lie.
Once upon a time untracked files were considered expendable[1]. Even after
that changed, we still had lots of edge cases where untracked files were
deleted when they shouldn't be, and ignored files weren't deleted when they
should be[2]. While that has been (mostly) fixed, despite the general
intent to preserve untracked files, we have special cases that are
documented as not preserving them[4,5]. There are also a few codepaths
that have comments about locations that might (or definitely do)
erroneously delete untracked paths[6]. And at least one code path that is
known to erroneously delete untracked paths which has not been commented:
`git checkout <tree> <pathspec>`. And there may be more.
[1] https://lore.kernel.org/git/CABPp-BFyR19ch71W10oJDFuRX1OHzQ3si971pMn6dPtHKxJDXQ@mail.gmail.com/
[2] https://lore.kernel.org/git/[email protected]/
[3] https://lore.kernel.org/git/de416f887d7ce24f20ad3ad4cc838394d6523635.1632760428.git.gitgitgadget@gmail.com/
[4] https://lore.kernel.org/git/[email protected]/
[5] https://lore.kernel.org/git/de416f887d7ce24f20ad3ad4cc838394d6523635.1632760428.git.gitgitgadget@gmail.com/
[6] https://lore.kernel.org/git/6b42a80bf3d46e16980d0724e8b07101225239d0.1632760428.git.gitgitgadget@gmail.com/
This history and these exceptions matter to this proposal because:
* it highlights how much work can be involved in trying to treat a class
of files as not expendable
* the existing corner cases where untracked files are erroneously
treated as expendable will probably also double as corner cases where
precious files are treated as expendable
* the past fixes for treating untracked files as precious will likely
highlight the needed types of code changes to treat ignored files as
precious
Proposal
--------
We propose adding another class of files: ignored-but-not-expendable,
referred to by the shorthand of "precious". The proposal is simple at a
high level, but there are many details to consider:
* How to specify precious files (extended .gitignore syntax? attributes?)
* Which commands should be modified, and how?
* How to handle flags that are essentially a partial implementation of
a precious capability (e.g. [--[no-]overwrite-ignore])
* How will older Git clients behave on a repo with precious files?
The subsequent sections will try to address these questions in more detail.
One thing to highlight here is that the class formerly called
`ignored` now has two subtypes: (1) the type we already have,
ignored-and-expendable (sometimes referred to below as "trashable")
and (2) the new type, ignored-and-not-expendable (referred to as
"precious").
Precious file specification
~~~~~~~~~~~~~~~~~~~~~~~~~~~
As per [P2]:
"""
Even though I referred to the precious _attribute_ in some of these
discussions, between the attribute mechanism and the ignore
mechanism, I am actually leaning toward suggesting to extend the
exclude/ignore mechanism to introduce the "precious" class. That
way, we can avoid possible snafu arising from marking a path in
.gitignore as ignored, and in .gitattrbutes as precious, and have to
figure out how these two settings are to work together.
"""
we specify precious files via an extension to .gitignore. In particular,
lines starting with a '$' character specify that the file is precious.
For example:
$.config
would say the file `.config` is precious.
Now that there are three types of files specified by .gitignore files --
untracked, trashable (ignored-and-expendable), and precious
(ignored-and-not-expendable) -- the meaning of `!` at the beginning of a line
needs careful clarification. It could be seen as "not ignored" or as "not
trashable", given the subdivision of ignored files that has occurred. We
specifically take it to mean "not ignored", i.e. "untracked".
This leaves us with a simple set of rules to provide to users about lines
in their '.gitignore' file:
* No special prefix character => ignored-and-expendable ("trashable")
* A '$' prefix character => ignored-and-not-expendable ("precious")
* A '!' prefix character => not ignored, i.e. untracked
It's worth noting that the traditional use of '!' as a negation
character needs updating, given the introduction of a ternary state
("not trashable" could mean either untracked or precious, which is
ambiguous). Refrain from referring to '!' as a negation character to
avoid confusion. To assist users in making this mindset shift, flag
any line beginning with '!$' as an error. As always,
backslash-escaping remains an option, allowing users to specify
entries like '!\$foo' to mark a file named '$foo' as untracked.
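The prefix rules above can be sketched as a small classifier. This is only an illustration under stated assumptions, not the dir.c parser: the function name `classify_pattern` is hypothetical, comments and blank lines are not handled, and the sketch strips an escaping backslash itself, whereas the real matcher may keep it in the pattern.

```python
from typing import Tuple


def classify_pattern(line: str) -> Tuple[str, str]:
    """Return (category, pattern) for one .gitignore line (hypothetical helper)."""
    if line.startswith("!$"):
        # The proposal flags this combination as an error.
        raise ValueError(f"'!$' prefix is not allowed: {line!r}")
    if line.startswith("$"):
        return "precious", line[1:]
    if line.startswith("!"):
        pattern = line[1:]
        # A backslash keeps a following '$' literal, e.g. '!\$foo'.
        if pattern.startswith("\\$"):
            pattern = pattern[1:]
        return "untracked", pattern
    # A leading backslash escapes a literal '$' or '!' in the file name.
    if line.startswith("\\$") or line.startswith("\\!"):
        return "trashable", line[1:]
    return "trashable", line
```

For instance, `classify_pattern("$.config")` yields the precious pattern `.config`, while `classify_pattern("\\$.config")` yields a trashable pattern for a file literally named `$.config`.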
Breakdown of suggested behaviors by command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See also "Out of Scope" section below, particularly for:
* apply, am [without -3]
* checkout/restore
* checkout-index
* additional information on merge backends
Documentation:
* audit for references to "ignore" and "ignored", to see which ones now
need to say "ignored-and-expendable" (or "trashable") instead, and which
can remain "ignored".
* audit for "exclude" and "excluded" (the older terminology for ignored
files) and update them as well.
* add references to "precious" (and perhaps "trashable") as needed (don't
forget the glossary)
* rm: update the documentation:
"Ignored files are deemed expendable and won't stop" ->
"Ignored files, unless specifically marked precious, are by default
deemed expendable and won't stop"
* ensure all codepaths touched by 0e29222e0c2 ("Documentation: call out
commands that nuke untracked files/directories", 2021-09-27) also call
out that they'll nuke precious files in addition to untracked ones.
* change the documentation for '!' in gitignore to stop using the term
'negates'; it's potentially misleading now (negating a ternary value
yields an ambiguous value). Instead, the prefix is used to mark
untracked (or "not ignored") files.
* note that the --[no-]overwrite-ignore option is deprecated, and, since
it predated the introduction of precious files, is also a misnomer. The
correct name of the option would actually be --[no-]overwrite-trashable
but it is too late to rename.
* consider documenting that merge's --no-overwrite-ignore option is
virtually worthless (only works with the fast-forwarding backend).
* consider auditing the code for 'untracked' and fixing those to be
'not tracked' in cases where both 'untracked' and 'ignored' files
are meant
checkout/switch:
* will need to not overwrite precious files when they are in the way of
switching branches, unless --force/-f is specified.
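As a rough sketch of the intended decision when a not-tracked file is in the way of a branch switch: the helper name `may_overwrite` and the category strings are hypothetical, and the real logic would live in the unpack-trees machinery rather than in a function like this.

```python
def may_overwrite(category: str, force: bool = False) -> bool:
    """Whether checkout/switch may replace a not-tracked file that is in the way."""
    if category == "trashable":
        # ignored-and-expendable: removed to make room, as ignored files are today
        return True
    if category in ("precious", "untracked"):
        # not expendable: kept unless --force/-f is given
        return force
    raise ValueError(f"unknown category: {category!r}")
```

With this model a precious file blocks the switch exactly like an untracked file does today, and only the trashable subset of ignored files is silently replaced.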
checkout/restore:
* when passed a <tree> as a source, do not overwrite precious files
(NOR untracked files!), unless --force/-f is specified. [Could be
considered a stretch goal...]
merge:
* do not overwrite precious files when they are in the way of merging
branches. (Must be handled in each and every merge strategy;
user-defined merge strategies may get this wrong.)
read-tree:
* -u: do not overwrite precious files when they are in the way, unless...
* --reset and -u: overwrite precious files as well as untracked files.
Add to the warning under --reset about overwritten untracked files to
note that precious files are also overwritten.
am -3, cherry-pick, rebase, revert: same as above for checkout/switch and
merge.
add:
* same as today, just make sure when we split the ignored array (ignored &
ignored_nr) into multiple categories that it continues working
rm:
* make sure submodules are not removed if precious files are present.
Currently, rm will remove submodules if only ignored files are present.
check-ignore:
* since this command exists for debugging gitignore rules, there needs to
be some kind of mechanism for differentiating between trashable and
precious files. It is okay if this comes with a new command-line flag,
but there should be some tests showing how it behaves both with and
without that flag when precious files are present
clean:
* clarify the meaning of -x and -X options: -X now means only remove
trashable files. -x means remove both untracked and trashable files.
(See also [P17])
* add a --all option for removing all not-tracked files: untracked,
trashable, and precious.
* Other than --all, it is not worth adding flags for cleaning subsets of
not-tracked files that include precious files (thus, no flag for just
precious, or trashable and precious, or untracked and precious)
* Patterns with a leading '$' can be passed to --exclude, if wanted.
ls-files:
* --ignored/-i: shows every kind of ignored files (thus behaving the same
as today, since there is no way to distinguish between the types of
ignored in the output)
* add new `--ignored=precious` and `--ignored=trashable` flags for
differentiating.
* --exclude,--exclude-from can now take patterns with a leading '$' and
the file will be considered precious rather than trashable.
status:
* --ignored (without additional parameters) continues behaving as-is: it
prints both trashable and precious files in its "Ignored" category with
no distinguishing.
* --ignored --short will continue showing trashable files with '!!', and
show precious files using '$$'.
* --ignored --porcelain={v1,v2} will continue showing precious files
with the '!' character, since scripts may not be prepared to parse a
leading '$'. We can't break those scripts, even if it'd avoid the
off chance that those scripts act on the information about "ignored"
files and end up nuking precious files.
* --ignored --porcelain=v3 will need to be introduced to show precious
files with a leading '$'.
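The status bullets above amount to a small lookup from output format to the leading character shown for a precious file. The sketch below only restates that mapping; the function name and the format labels are hypothetical, and `porcelain=v3` is the format this document proposes, not one that exists today.

```python
def precious_leading_char(output_format: str) -> str:
    """Leading character shown for a precious file by `git status --ignored`."""
    if output_format == "short":
        return "$"  # precious shown as '$$'; trashable stays '!!'
    if output_format in ("porcelain=v1", "porcelain=v2"):
        return "!"  # unchanged, so existing scripts keep parsing
    if output_format == "porcelain=v3":
        return "$"  # proposed new format version
    raise ValueError(f"unknown output format: {output_format!r}")
```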
sparse-checkout:
* the --rules-file option should be tested with a pattern with a leading
'$' to make sure it prints an expected error.
* it might be worth noting somewhere that sparse-checkout treats
ignored files as precious; when sparsifying, it attempts to remove
directories that do not match the sparse specification, but will
leave them present if any of the tracked files are modified, or if
there are any not-tracked files present. That includes ignored
files. That means no additional work is needed for precious
support; I just mention it for completeness.
Backward compatibility notes
----------------------------
There are multiple issues that impinge on backward compatibility (either in
terms of special care we need to take, or in terms of messaging we may need
to send out about changes):
* Slightly Incompatible syntax
* Interaction with sparse-checkout parsing
* Behavior of traditional flags
* Interaction with older Git clients
* Commands with modified meaning
We'll discuss each in its own subsection below.
Slightly Incompatible syntax
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This new syntax obviously breaks backward compatibility in that an ignored
path named `$.config` would now have to be specified as `\$.config`. This
is similar to how introducing `!` as a prefix in .gitignore files was a
backward compatibility break. We expect and hope that the fallout will be
minor. See also [P10].
Interaction with sparse-checkout parsing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The $GIT_DIR/info/sparse-checkout file also makes use of gitignore syntax
and the gitignore parsing to read the file. It differs in that the files
specified are considered the files to be included (i.e. present in the
working copy) rather than which files should be excluded, but otherwise
has until now used identical syntax and parsing.
However, for sparse-checkout there is no third type of file, so the '$'
prefix makes no sense for it. As such, it should be an error for any
lines to begin with '$' in a sparse-checkout file.
(This also means that if anyone really did have a path beginning with '$'
in sparse-checkout files previously, then they now need to backslash escape
them, the same as with .gitignore files.)
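A minimal sketch of the check described above, assuming the sparse-checkout file has already been split into lines. The helper name `sparse_checkout_errors` is hypothetical; a real implementation would report the error from the shared pattern parser instead.

```python
from typing import Iterable, List, Tuple


def sparse_checkout_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line that starts with '$'."""
    errors = []
    for number, line in enumerate(lines, start=1):
        # '$' has no meaning for sparse-checkout; '\$' remains a literal '$'.
        if line.startswith("$"):
            errors.append((number, line))
    return errors
```

A path that really begins with '$' passes the check once it is written as `\$name`, matching the escaping rule for .gitignore files.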
While we could theoretically avoid this small backward compatibility break
for sparse-checkout parsing by just treating a leading '$' the way it
traditionally has been done, I am worried about practically maintaining that
solution:
* the gitignore parsing is peppered with references like 'exclude' that
are specific to the gitignore case
* because of the above, it is _heavily_ confusing to attempt to read and
understand the gitignore handling while considering the sparse-checkout
case. I've been tripped up by it *many* times.
* I think trying to reuse the existing parsing engine and have it handle
both old and new syntax is a recipe for failure. It'd be much cleaner
to have errors thrown if the processing turns up any "precious" files,
or perhaps if any line starts with '$'.
* I think making a copy of the existing parsing, and then letting them
diverge, means the two will eventually diverge even further, and we
would need to make a copy of all the documentation about gitignore rules
for sparse-checkout, all for the non-default non-cone case we are
already recommending users away from.
Behavior of traditional flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two flags to consider here: the --porcelain flag to git-status,
and the --no-overwrite-ignore option to the checkout and merge commands. For
the --porcelain flag to git-status, see the "Breakdown of suggested
behaviors by command" and look for git-status there. The rest of this
section will focus on --[no-]overwrite-ignore.
People have wanted precious files for long enough that they implemented an
interim kludge of sorts -- a command line option that can be passed to
various subcommands that treats all ignored files as precious:
--no-overwrite-ignore.
In particular, this flag can be passed to both git-checkout, and git-merge.
However, in merge's case, the support depended on the flag being passed to the
backend and the backend supporting it. The builtin/merge.c code only ever
bothered to pass this flag down to the fast-forwarding merge handling code,
so it never worked with any backends that actually create a merge commit.
We do need to keep these flags working, at least as much as they did
previously. However, we don't want to consider them desired features,
which would lead us to making related equivalents for precious files like
--overwrite-precious. Instead we will:
* Keep --[no-]overwrite-ignore working, as much as it already was.
* Recommend users mark precious files in their gitignore files instead of
using these flags
Interaction with older Git clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Older Git clients will not understand precious files. This means that:
* precious files will be considered untracked and not ignored.
* most commands will preserve these files, since untracked-and-not-ignored
files are not considered expendable.
* git status will continue listing these files
* git add will add these files without requiring -f.
This seems like a reasonable tradeoff that only has minor annoyances. The
alternative of having the precious files treated as ignored has the very
risky trade-off of deleting files which the users marked as important for
us to keep.
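To illustrate why an older client ends up treating a precious file as untracked, here is a rough model in which '$' has no special meaning. It uses Python's `fnmatch` as a stand-in for Git's pattern matching, which is an assumption: real gitignore matching also handles directories, anchoring, and '!' lines. The helper name `old_client_is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def old_client_is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Approximate how a client without precious support reads the patterns."""
    # '$' is an ordinary character here, so '$.config' only matches a file
    # literally named '$.config'.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = ["$.config", "*.o"]
config_ignored = old_client_is_ignored(".config", patterns)  # False
object_ignored = old_client_is_ignored("main.o", patterns)   # True
```

Because `.config` matches no pattern in this model, the older client sees it as untracked and therefore does not consider it expendable.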
Commands with modified meaning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In clean, we adjust the meaning of both -x and -X:
-X: remove only trashable files
-x: remove untracked and trashable files (but preserve precious ones)
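The adjusted meanings, together with the proposed --all option, can be summarized as the set of categories each invocation removes. This is a sketch only: the function `clean_targets` and its parameter names are hypothetical, and giving --all precedence over -x/-X is an assumption the proposal does not spell out.

```python
from typing import Set


def clean_targets(x: bool = False, X: bool = False, all_: bool = False) -> Set[str]:
    """Categories of not-tracked files that `git clean` would remove."""
    if x and X:
        # git clean already rejects combining -x and -X
        raise ValueError("-x and -X cannot be used together")
    if all_:
        return {"untracked", "trashable", "precious"}
    if X:
        return {"trashable"}
    if x:
        return {"untracked", "trashable"}
    return {"untracked"}
```

Note that precious files appear only in the --all result; no other combination of flags removes them.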
Implementation hints
--------------------
Data structures
~~~~~~~~~~~~~~~
* We will want to add a `precious` and `precious_nr` in dir_struct,
similar to the current entries/nr or ignored/ignored_nr.
* We may want to rename `ignored` and `ignored_nr` in dir_struct to
`trashable` and `trashable_nr`.
Code areas
~~~~~~~~~~
* "preserve_ignored", a flag in the code for handling the
--[no-]overwrite-ignore flag, is a very helpful marker about what needs
to be tweaked and how to tweak it to preserve more files. In particular,
note that --no-overwrite-ignore works by telling the machinery in dir.c
to not do the setup_standard_excludes() stuff, so that all ignored files
just look like untracked files. We'll need something slightly smarter,
which makes precious files look like untracked while trashable files
still appear in ignored. Shouldn't be too bad.
* we might need to add another entry to the unpack_trees_reset_type
enum. Or perhaps we still keep both UNPACK_RESET_PROTECT_UNTRACKED
and UNPACK_RESET_OVERWRITE_UNTRACKED but rename them with
s/UNTRACKED/NOT_EXPENDABLE/ so it is clear they handle both untracked and
precious files. Not sure which is needed yet.
* dir_struct->flags _might_ need new entries.
* ensure all relevant codepaths touched by 94b7f1563ac ("Comment important
codepaths regarding nuking untracked files/dirs", 2021-09-27) are either
fixed or also mention precious files
* am/rebase/checkout[without -f]: see 480d3d6bf90 ("Change unpack_trees'
'reset' flag into an enum", 2021-09-27)
* Merge backends:
* (see also "Out of scope" section)
* merge-ort can be fixed by fixing the checkout code.
* merge-resolve and merge-octopus can probably be fixed by fixing
git-reset.
* stash:
* there is an existing --include-untracked option. There was no reason
to add a --include-ignored, because ignored files were trashable. Do
we need to add a --include-precious, though?
* this is a sad pile of shell-reimplemented-in-C. It's just awful.
See b34ab4a43ba ("stash: remove unnecessary process forking",
2020-12-01) and ba359fd5070 ("stash: fix stash application in
sparse-checkouts", 2020-12-01) and 94b7f1563ac ("Comment important
codepaths regarding nuking untracked files/dirs", 2021-09-27).
Fixing stash to not nuke precious files (and to not nuke untracked
files either) might mean expunging the stupid
shell-reimplemented-in-C design, or at least moving things more in
that direction.
* rebase (merge backend), revert, cherry-pick, am -3: should automatically
be handled by getting merge-ort to work, which should work by making
checkout/switch work.
* bisect: should work by making checkout work
Minimum
~~~~~~~
I think for a minimum implementation, we need to ensure that the following
are handled:
* parsing:
* parsing of lines starting with '$' in .gitignore
* erroring on lines starting with '!$' in .gitignore
* erroring on lines starting with '$' in $GIT_DIR/info/sparse-checkout
* commands with support:
* switch/checkout
* merge when using the ort backend
* read-tree -u [without --reset] (due to internal use)
* ls-files
Out of scope
------------
The following tasks are currently out of scope for this proposal:
apply, am [without -3]: apply won't overwrite any file in the working
directory even when a new file is in the patch. It should overwrite
trashable files. We could log that bug via testcase, but make sure
there's a companion testcase that ensures overwriting untracked or
precious files continues to make apply throw an error. However, since
apply/am don't misbehave for precious files, we can defer this to later.
checkout-index: similar to apply; won't overwrite any existing files, but
trashable files should be overwritten
reset --hard:
* `git reset --hard` is a little funny and we have thought about changing
it[4]. However, that can be left for later and will not be tackled as
part of the work of introducing "precious" files as a concept.
merge backends:
* trying to make --no-overwrite-ignore work with more merge backends
* when multiple merge strategies are specified, builtin/merge.c will
stash and restore state between the attempt of different strategies.
Since the reset_hard() function invokes `read-tree --reset -u`, there
might be a way to cause it to trash untracked files or to trash
precious files, depending on what the merge strategies did. It seems
unlikely (maybe the strategy handles D/F conflicts or rename
conflicts by renaming files in the way, and happens to rename a
precious file to a path that is considered either untracked or
precious -- merge-recursive certainly did something like this
once upon a time and still might); we can probably ignore it for now.
* merge-recursive is a lost cause; it'd be a _huge_ amount of effort to
fix, but we intend to deprecate and delete it soon anyway (making all
requests for recursive just trigger ort instead).
* user-defined merge strategies are up to their authors to get right.
Odds are they won't, but odds are they already incorrectly nuke
untracked files too because who'd pay attention to a special case
like files being in the way of a merge? Anyway, "not our problem". :-)
Previous discussions
--------------------
A far from exhaustive sampling of various past conversations on the topic:
[P1] https://lore.kernel.org/git/[email protected]/
[P2] https://lore.kernel.org/git/[email protected]/
[P3] https://lore.kernel.org/git/[email protected]/
[P4] https://lore.kernel.org/git/[email protected]/
[P5] https://lore.kernel.org/git/[email protected]/
[P6] https://lore.kernel.org/git/[email protected]/
[P7] https://lore.kernel.org/git/[email protected]/
[P8] https://lore.kernel.org/git/[email protected]/
[P9] https://lore.kernel.org/git/[email protected]/
[P10] https://lore.kernel.org/git/[email protected]/
[P11] https://lore.kernel.org/git/[email protected]/
[P12] https://lore.kernel.org/git/[email protected]/
[P13] https://lore.kernel.org/git/[email protected]/
[P14] https://lore.kernel.org/git/[email protected]/
[P15] https://lore.kernel.org/git/ZSkpOc%2FdcGcrFQNU@ugly/
[P16] https://lore.kernel.org/git/[email protected]/
[P17] https://lore.kernel.org/git/[email protected]/
Alternatives considered
-----------------------
There have been multiple alternatives considered, along a few different
axes:
* .gitattributes instead of .gitignore
* leaving sparse-checkout alone
* Trashable [P9,P11]
* Alternative gitignore syntax
The choice of .gitattributes vs .gitignore was already addressed in the
"Precious file specification" section.
The choice to modify or leave alone the parsing of
$GIT_DIR/info/sparse-checkout was already addressed in the "Interaction
with sparse-checkout parsing" section.
One alternative raised in the past was treating ignored files as not
expendable by default, and then introducing a new category of
ignored-but-expendable. This new category has been dubbed "trashable" in
the past. That may have been a reasonable solution if Git did not have a
large userbase already, but moving in this direction would cause severe
problems for existing builds everywhere[P9] and would require users to
doubly configure most files (since it is expected that
ignored-but-expendable is a much larger class of files than
ignored-but-precious). See also [P11].
There have been multiple alternative suggestions for extending gitignore
syntax to handle precious files and optionally future extensions as well.
For example: [P10, P12, P13, P14, P15, P16]. However:
* There have been on and off requests for precious files for about 14
years
* We are not aware of other types of extensions needed; there might
not be any
* The alternatives all seem much more complex to explain to users than
the simple proposal here.
In particular, we like the simplicity of the mapping we can give users,
taken from the penultimate paragraph of the "Precious file
specification" section (the one regarding no-prefix vs. '!' vs '$').