Commit 40e3f56

FAQ.md: new file with frequently answered questions
There are several questions I have had to answer several times; collect them into a single document that I can just link to. Signed-off-by: Elijah Newren <[email protected]>
1 parent 19b1cb6 commit 40e3f56

2 files changed

+313
-0
lines changed

Documentation/FAQ.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
1+
# Frequently Answered Questions
2+
3+
## Table of Contents
4+
5+
* [Why did `git-filter-repo` rewrite commit hashes?](#why-did-git-filter-repo-rewrite-commit-hashes)
6+
* [Why did `git-filter-repo` rewrite more commit hashes than I expected?](#why-did-git-filter-repo-rewrite-more-commit-hashes-than-i-expected)
7+
* [Why did `git-filter-repo` rewrite other branches too?](#why-did-git-filter-repo-rewrite-other-branches-too)
8+
* [Help! Can I recover or undo the filtering?](#help-can-i-recover-or-undo-the-filtering)
9+
* [Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?](#can-you-change-git-filter-repo-to-allow-future-folks-to-recover-from---forced-rewrites)
10+
* [Can I use `git-filter-repo` to fix a repository with corruption?](#Can-I-use-git-filter-repo-to-fix-a-repository-with-corruption)
11+
* [What kinds of problems does `git-filter-repo` not try to solve?](#What-kinds-of-problems-does-git-filter-repo-not-try-to-solve)
12+
* [Filtering history but magically keeping the same commit IDs](#Filtering-history-but-magically-keeping-the-same-commit-IDs)
13+
* [Bidirectional development between a filtered and unfiltered repository](#Bidirectional-development-between-a-filtered-and-unfiltered-repository)
14+
* [Removing specific commits, or filtering based on the difference (a.k.a. patch or change) between commits](#Removing-specific-commits-or-filtering-based-on-the-difference-aka-patch-or-change-between-commits)
15+
* [Filtering two different clones of the same repository and getting the same new commit IDs](#Filtering-two-different-clones-of-the-same-repository-and-getting-the-same-new-commit-IDs)
16+
17+
## Why did `git-filter-repo` rewrite commit hashes?
18+
19+
This is fundamental to how Git operates. In more detail...
20+
21+
Each commit in Git is a hash of its contents. Those contents include
22+
the commit message, the author (name, email, and time authored), the
23+
committer (name, email and time committed), the toplevel tree hash,
24+
and the parent(s) of the commit. This means that if any of the commit
25+
fields change, including the tree hash or the hash of the parent(s) of
26+
the commit, then the hash for the commit will change.
27+
28+
(The same is true for files ("blobs") and trees stored in git as well;
29+
each is a hash of its contents, so literally if anything changes, the
30+
commit hash will change.)
31+
32+
If you attempt to write a commit (or tree or blob) object with an
33+
incorrect hash, Git will reject it as corrupt.
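
As a rough illustration of the point above, the following Python sketch recomputes object IDs the way Git does for SHA-1 repositories: the ID is the SHA-1 of a `<type> <size>\0` header followed by the object's contents. The helper names (`git_object_id`, `commit_body`) and the identity string are made up for this example, and the commit serialization is simplified (no signatures or other extra headers).

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to an object of this type and body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Serialize commit fields in the order Git stores them (simplified)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


# The empty tree is a convenient, well-known tree object to point a commit at.
EMPTY_TREE = git_object_id("tree", b"")

# Hypothetical identity: name, email, timestamp, and timezone.
ident = "A U Thor <author@example.com> 1700000000 +0000"

original = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], ident, ident, "Initial commit\n")
)
reworded = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], ident, ident, "Initial commit!\n")
)

# One extra character in the message yields an unrelated commit ID.
print(original)
print(reworded)
```

Changing any other field (the tree, a parent, a name, or a timestamp) has the same effect, because every field is part of the bytes being hashed.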
34+
35+
## Why did `git-filter-repo` rewrite more commit hashes than I expected?
36+
37+
There are two aspects to this, or two possible underlying questions users
38+
might be asking here:
39+
* Why did commits newer than the ones I expected have their hash change?
40+
* Why did commits older than the ones I expected have their hash change?
41+
42+
For the first question, see [why filter-repo rewrites commit
43+
hashes](#why-did-git-filter-repo-rewrite-commit-hashes), and note that
44+
if you modify some old commit, perhaps to remove a file, then obviously
45+
that commit's hash must change. Further, since that commit will have a
46+
new hash, any other commit with that commit as a parent will need to
47+
have a new hash. That will need to chain all the way to the most recent
48+
commits in history. This is fundamental to Git and there is nothing you
49+
can do to change this.
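
The chaining described above can be sketched with a toy linear history in Python. This is only a model under simplifying assumptions: the commits carry no author or committer lines, the tree hashes are placeholder strings, and `commit_id` and `build_history` are hypothetical helpers rather than anything `git-filter-repo` provides.

```python
import hashlib


def commit_id(tree, parent, message):
    """Hash a simplified commit; a real commit also records author and committer."""
    body = f"tree {tree}\n"
    if parent is not None:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    data = body.encode()
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()


def build_history(trees):
    """Return the commit IDs of a linear history, oldest first."""
    ids, parent = [], None
    for n, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {n}")
        ids.append(parent)
    return ids


# Placeholder tree hashes; only the fact that one of them differs matters.
before = build_history(["aaaa", "bbbb", "cccc", "dddd"])
after = build_history(["aaaa", "XXXX", "cccc", "dddd"])  # only commit 1 was filtered

changed = [old != new for old, new in zip(before, after)]
print(changed)  # [False, True, True, True]
```

Commit 0 keeps its ID because nothing it hashes over changed, while every commit from the modified one up to the tip gets a new ID because each one records its parent's ID.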
50+
51+
For the second question, there are two causes: (1) the filter you
52+
specified applies to the older commits too, or (2) git-fast-export and
53+
git-fast-import (both of which git-filter-repo uses) canonicalize
54+
history in various ways. The second cause means that even if you have
55+
no filter, these tools sometimes change commit hashes. This can happen
56+
in any of these cases:
57+
58+
* If you have signed commits, the signatures will be stripped
59+
* If you have commits with extended headers, the extended headers will
60+
be stripped (signed commits are actually a special case of this)
61+
* If you have commits in an encoding other than UTF-8, they will by
62+
default be re-encoded into UTF-8
63+
* If you have a commit without an author, one will be added that
64+
matches the committer.
65+
* If you have trees that are not canonical (e.g. incorrect sorting
66+
order), they will be canonicalized
67+
68+
If this affects you and you really only want to rewrite newer commits in
69+
history, you can use the `--refs` argument to git-filter-repo to specify
70+
a range of history that you want rewritten.
71+
72+
(For those attempting to be clever and use `--refs` for the first
73+
question: Note that if you attempt to only rewrite a few old commits,
74+
then all you'll succeed in is adding new commits that won't be part of
75+
any branch and will be subject to garbage collection. The branches will
76+
still hold on to the unrewritten versions of the commits. Thus, you
77+
have to rewrite all the way to the branch tip for the rewrite to be
78+
meaningful. Said another way, the `--refs` trick is only useful for
79+
restricting the rewrite to newer commits, never for restricting the
80+
rewrite to older commits.)
81+
82+
## Why did `git-filter-repo` rewrite other branches too?
83+
84+
git-filter-repo's name is git-filter-**_repo_**. Obviously it is going
85+
to rewrite all branches by default.
86+
87+
`git-filter-repo` can restrict its rewriting to a subset of history,
88+
such as a single branch, using the `--refs` option. However, using that
89+
comes with the risk that one branch now has a different version of some
90+
commits than other branches do; usually, when you rewrite history, you
91+
want all branches that depend on what you are rewriting to be updated.
92+
93+
## Help! Can I recover or undo the filtering?
94+
95+
Sure, _if_ you followed the instructions. The instructions told you to
96+
make a fresh clone before running git-filter-repo. If you did that (and
97+
didn't force push your rewritten history back over the original), you
98+
can just throw away your clone with the flubbed rewrite, and make a new
99+
clone.
100+
101+
If you didn't make a fresh clone, and you didn't run with `--force`, you
102+
would have seen the following warning:
103+
```
104+
Aborting: Refusing to destructively overwrite repo history since
105+
this does not look like a fresh clone.
106+
[...]
107+
Please operate on a fresh clone instead. If you want to proceed
108+
anyway, use --force.
109+
```
110+
If you then added `--force`, well, you were warned.
111+
112+
If you didn't make a fresh clone, and you started with `--force`, and you
113+
didn't think to read the description of the `--force` option:
114+
```
115+
Ignore fresh clone checks and rewrite history (an irreversible
116+
operation, especially since it by default ends with an
117+
immediate pruning of reflogs and old objects).
118+
```
119+
and you didn't read even the beginning of the manual
120+
```
121+
git-filter-repo destructively rewrites history
122+
```
123+
and you think it's okay to run a command with `--force` in it on
124+
something you don't have a backup of, then now is the time to reassess
125+
your life choices. `--force` should be a pretty clear warning sign.
126+
(If someone on the internet suggested `--force`, you can complain at
127+
_them_, but either way you should learn to carefully vet commands
128+
suggested by others on the internet. Sadly, even sites like Stack
129+
Overflow where someone really ought to be able to correct bad guidance
130+
still unfortunately have a fair amount of this bad advice.)
131+
132+
See also the next question.
133+
134+
## Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?
135+
136+
This will never be supported.
137+
138+
* Providing an alternate method to restore would require storing both
139+
the original history and the new history, meaning that those who are
140+
trying to shrink their repository size instead see it grow and have to
141+
figure out extra steps to expunge the old history to see the actual
142+
size savings. Experience with other tools showed that this was
143+
frustrating and difficult to figure out for many users.
144+
145+
* Providing an alternate method to restore would mean that users who are
146+
trying to purge sensitive data from their repository still find the
147+
sensitive data after the rewrite because it hasn't actually been
148+
purged. In order to actually purge it, they have to take extra steps.
149+
Same as with the last bullet point, experience has shown that extra
150+
steps to purge the extra information are difficult and error-prone.
151+
This extra difficulty is particularly problematic when you're trying
152+
to expunge sensitive data.
153+
154+
* Providing an alternate method to restore would also mean trying to
155+
figure out what should be backed up and how. The obvious choices used
156+
by previous tools only actually provided partial backups (reflogs
157+
would be ignored for example, as would uncommitted changes whether
158+
staged or not). The more you try to carefully back up everything, the
159+
more difficult the restoration from backup will be. The only backup
160+
mechanism I've found that seems reasonable is making a separate
161+
clone. That's expensive to do automatically for the user (especially
162+
if the filtering is done via multiple invocations of the tool). Plus,
163+
it's not clear where the clone should be stored, especially to avoid
164+
the previous problems for size-reduction and sensitive-data-removal
165+
folks.
166+
167+
* Providing an alternate method to restore would also mean providing
168+
documentation on how to restore. Past methods by other tools in the
169+
history rewriting space suggested that it was rather difficult for
170+
users to figure out. Difficult enough, in fact, that users simply
171+
didn't ever use them. They instead made a separate clone before
172+
rewriting history and if they didn't like the rewrite, then they just
173+
blew it away and made a new clone to work with. Since that was
174+
observed to be the easy restoration method, I simply enforced it with
175+
this tool, requiring users who look like they might not be operating
176+
on a fresh clone to use the --force flag.
177+
178+
But more than all that, if there were an alternate method to restore,
179+
why would you have needed to specify the --force flag? Doesn't its
180+
existence (and the wording of its documentation) make it pretty clear on
181+
its own that there isn't going to be a way to restore?
182+
183+
## Can I use `git-filter-repo` to fix a repository with corruption?
184+
185+
Some kinds of corruption can be fixed, in conjunction with `git
186+
replace`. If `git fsck` reports warnings/errors for certain objects,
187+
you can often [replace them and rewrite
188+
history](examples-from-user-filed-issues.md#Handling-repository-corruption).
189+
190+
## What kinds of problems does `git-filter-repo` not try to solve?
191+
192+
This question is often asked in the form of "How do I..." or even
193+
written as a statement such as "I found a bug with `git-filter-repo`;
194+
the behavior I got was different than I expected..." But if you're
195+
trying to do one of the things below, then `git-filter-repo` is behaving
196+
as designed and either there is no solution to your problem, or you need
197+
to use a different tool to solve your problem. The following subsections
198+
address some of these common requests:
199+
200+
### Filtering history but magically keeping the same commit IDs
201+
202+
This is impossible. If you modify commits, or the files contained in
203+
them, then you change their commit IDs; this is [fundamental to
204+
Git](#why-did-git-filter-repo-rewrite-commit-hashes).
205+
206+
However, _if_ you don't need to modify commits, but just don't want to
207+
download everything, then look into one of the following:
208+
* [partial clones](https://git-scm.com/docs/partial-clone)
209+
* the ugly hack known as [shallow clones](https://git-scm.com/docs/shallow)
210+
* a massive hack like [cheap fake
211+
clones](https://github.com/newren/sequester-old-big-blobs) that at
212+
least let you put your evil overlord laugh to use
213+
214+
### Bidirectional development between a filtered and unfiltered repository
215+
216+
Some folks want to extract a subset of a repository, do development work
217+
on it, then bring those changes back to the original repository, and
218+
send further changes in both directions. Such a tool can be written
219+
using fast-export and fast-import, but would need to make very different
220+
design decisions than `git-filter-repo` did. Such a tool would be
221+
capable of supporting this kind of development, but would lose the ability
222+
["to write arbitrary filters using a scripting
223+
language"](https://josh-project.github.io/josh/#concept) and other
224+
features that `git-filter-repo` has.
225+
226+
Such a tool exists; it's called [Josh](https://github.com/josh-project/josh).
227+
Use it if this is your use case.
228+
229+
### Removing specific commits, or filtering based on the difference (a.k.a. patch or change) between commits
230+
231+
You are probably looking for `git rebase`. `git rebase` operates on the
232+
difference between commits ("diff"), allowing you to e.g. drop or modify
233+
the diff, but then runs the risk of conflicts as it attempts to apply
234+
future diffs. If you tweak one diff in the middle, since it just applies
235+
more diffs for the remaining patches, you'll still see your changes at
236+
the end.
237+
238+
filter-repo, by contrast, uses fast-export and fast-import. Those tools
239+
treat every commit not as a diff but as a "use the same versions of most
240+
files from the parent commit, but make these five files have these exact
241+
contents". Since you don't have either the diff or ready access to the
242+
version of files from the parent commit, that makes it hard to "undo"
243+
part of the changes to some file. Further, if you attempt to drop an
244+
entire commit or tweak the contents of those new files in that commit,
245+
those changes will be reverted by the next commit in the stream that
246+
mentions that file because handling the next commit does not involve
247+
applying a diff but a "make this file have these exact contents". So,
248+
filter-repo works well for things like removing a file entirely, but if
249+
you want to make any tweaks to any files you have to make the exact same
250+
tweak over and over for every single commit that touches that file.
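
To make the snapshot model above concrete, here is a small Python simulation. It is a sketch, not how `git-filter-repo` is implemented: the `replay` helper, the `only_commit` parameter, and the file contents are all invented for illustration. Each commit is modeled as "set these files to these exact contents", loosely mirroring a fast-export stream.

```python
def replay(stream, tweak=None, only_commit=None):
    """Replay snapshot-style commits and return the final file contents.

    Each commit maps path -> complete new contents for the files it touches.
    `tweak` edits file contents; `only_commit` limits it to one commit index.
    """
    files = {}
    for index, commit in enumerate(stream):
        for path, contents in commit.items():
            if tweak and (only_commit is None or index == only_commit):
                contents = tweak(contents)
            files[path] = contents
    return files


# Two hypothetical commits that both touch the same file.
stream = [
    {"config.txt": "password=hunter2\n"},
    {"config.txt": "password=hunter2\nverbose=1\n"},
]


def redact(text):
    return text.replace("hunter2", "REDACTED")


# Tweaking only the first commit: the second commit's full contents win.
once = replay(stream, redact, only_commit=0)
# Tweaking every commit that touches the file: the change survives.
always = replay(stream, redact)

print(repr(once["config.txt"]))    # 'password=hunter2\nverbose=1\n'
print(repr(always["config.txt"]))  # 'password=REDACTED\nverbose=1\n'
```

A diff-based tool would have carried the first tweak forward on its own; in the snapshot model the tweak has to be repeated for each commit that mentions the file.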
251+
252+
In short, `git rebase` is the tool you want for removing specific
253+
commits or otherwise operating on the diff between commits.
254+
255+
### Filtering two different clones of the same repository and getting the same new commit IDs
256+
257+
Sometimes two co-workers have a clone of the same repository and they
258+
run the same `git-filter-repo` command, and they expect to get the same
259+
new commit IDs. Often they do get the same new commit IDs, but
260+
sometimes they don't.
261+
262+
When people get the same commit IDs, it is only by luck, not by design.
263+
There are three reasons this is unsupported and will never be reliable:
264+
265+
* Different Git versions used could cause differences in filtering
266+
267+
Since `git fast-export` and `git fast-import` do various
268+
canonicalizations of history, and these could change over time,
269+
having different versions of Git installed can result in differences
270+
in filtering.
271+
272+
* Different git-filter-repo versions used could cause differences in
273+
filtering
274+
275+
Over time, `git-filter-repo` may include new filterings by default,
276+
or fix existing filterings, or make any other number of changes. As
277+
such, having different versions of `git-filter-repo` installed can
278+
result in differences in filtering.
279+
280+
* Different amounts of the repository cloned or differences in
281+
local-only commits can cause differences in filtering
282+
283+
If the clones weren't made at the same time, one clone may have more
284+
commits than the other. Also, both may have made local commits the
285+
other doesn't have. These additional commits could cause history to
286+
be traversed in a different order, and filtering rules are allowed
287+
to have order-dependent rules for how they filter. Further,
288+
filtering rules are allowed to depend upon what history exists in
289+
your clone. As one example, filter-repo's default to update commit
290+
messages which refer to other commits by abbreviated hash may be
291+
unable to find these other commits in your clone but find them in
292+
your coworker's clone. Relatedly, filter-repo's update of
293+
abbreviated hashes in commit messages only works for commits that
294+
have already been filtered, and thus depends on the order in which
295+
fast-export traverses the history.
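
The order dependence in the last point can be shown with a toy Python model. Everything here is assumed for illustration: `rewrite_messages` is a hypothetical helper, the IDs are fake, and new IDs are produced by adding a `new-` prefix rather than by rehashing. The only property it tries to capture is that a reference can be updated only if the referenced commit was already processed.

```python
def rewrite_messages(commits, order):
    """Rewrite references to old IDs using only the IDs processed so far.

    `commits` maps old ID -> message; `order` is the traversal order.
    """
    old_to_new, result = {}, {}
    for old_id in order:
        message = commits[old_id]
        for seen_old, seen_new in old_to_new.items():
            message = message.replace(seen_old, seen_new)
        result[old_id] = message
        old_to_new[old_id] = "new-" + old_id
    return result


# Two commits on unrelated branches; the second mentions the first by ID.
commits = {
    "a1b2c3": "Add parser",
    "d4e5f6": "Fix bug introduced in a1b2c3",
}

referenced_first = rewrite_messages(commits, ["a1b2c3", "d4e5f6"])
referenced_last = rewrite_messages(commits, ["d4e5f6", "a1b2c3"])

print(referenced_first["d4e5f6"])  # Fix bug introduced in new-a1b2c3
print(referenced_last["d4e5f6"])   # Fix bug introduced in a1b2c3
```

Because the rewritten message is part of what gets hashed, the two traversal orders would produce different new commit IDs for the same original commit.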
296+
297+
`git-filter-repo` is designed as a _one_-shot history rewriting tool.
298+
Once you have filtered one clone of the repository, you should not be
299+
using it to filter other clones. All other clones of the repository
300+
should either be discarded and recloned, or [have all their history
301+
rebased on top of the rewritten
302+
history](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues).
303+
304+
<!--
305+
## How do I see what was removed?
306+
307+
Run `git rev-list --objects --all` in both a separate fresh clone from
308+
before the rewrite and in the repo where the rewrite was done. Then
309+
find the objects that exist in the old but not the new.
310+
311+
-->

README.md

Lines changed: 2 additions & 0 deletions
@@ -84,6 +84,8 @@ If you prefer learning from examples:
8484
section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
8585
* I have collected a set of [example filterings based on user-filed issues](Documentation/examples-from-user-filed-issues.md)
8686

87+
In either case, you may also find the [Frequently Answered Questions](Documentation/FAQ.md) useful.
88+
8789
# Why filter-repo instead of other alternatives?
8890

8991
This was covered in more detail in a [Git Rev News article on
