May 22, 2021

[Translation] Two years of squash merge

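In day-to-day repository management, we often struggle with version control when many branches receive frequent commits. In this article, the author shares how to build a sustainable, easy-to-maintain contribution workflow for a repository, starting from the way code is merged.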

Original article by Simone Carletti: https://blog.dnsimple.com/2019/01/two-years-of-squash-merge/

At DNSimple we use pull-requests every day as a standard workflow to propose, review, and submit changes to almost any git repository. For most core repositories, such as the DNSimple web application, or our Chef-based infrastructure, our policy is to not commit to master, but make changes into a separate branch, open a pull request, obtain a review from one or two people (depending on the change), and then merge the branch into master before deploying.

(Translator's note: this is the git workflow that the vast majority of teams use, ours included.)

A little more than two years ago, we decided to change the development team's workflow to always use git merge --squash. In this post I will highlight the reasons for this decision, how it worked for us, and what the benefits are.

git merge: fast-forward, recursive, and squash

Before we get into the details of why we adopted the --squash merge, let's have a quick look at the most common merge strategies in git.

Note: this is definitely not a comprehensive explanation of the git merge command. For more in-depth explanations, take a look at the documentation for git merge.

First of all, the purpose of git merge is to incorporate the changes from another branch into the current one. For simplicity, we'll assume we want to merge a branch containing our changes called bugfix into the branch master.

fast-forward

If master has not diverged from the branch, when it's time to merge git will simply move the reference of master forward to the last commit of the bugfix branch.

        C - D - E           bugfix
      /
A - B                       master

After git merge:

A - B - C - D - E           master/bugfix

Here's the output of the merge:

merge-examples git:(master) git merge --ff bugfix
Updating 9db2ac7..3452cab
Fast-forward
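The fast-forward case can be reproduced in a throwaway repository. The following is a minimal sketch, assuming a POSIX shell and a reasonably recent git; the `commit` helper, the file names, and the placeholder identity are made up for illustration and do not come from the original article.

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches an existing project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b bugfix
commit C; commit D; commit E

git checkout -q master
git merge --ff bugfix        # master has not diverged, so git only moves the reference
git log --oneline --graph    # a straight line: no merge commit was created
```

After the merge, master and bugfix point at the same commit E.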

No fast-forward

The default behavior of Git is to use fast-forwarding whenever possible. However, it's possible to change this behavior in the git configuration or by passing the --no-ff (no fast-forward) option to git merge. As a result, even if git detects that master did not diverge, it will create a merge commit.

        C - D - E           bugfix
      /
A - B                       master

After git merge --no-ff:

        C - D - E           bugfix
      /           \
A - B ------------ F        master

Here's the output of the merge:

merge-examples git:(master) git merge --no-ff bugfix
Already up to date!
Merge made by the 'recursive' strategy.
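To see the extra merge commit for yourself, here is a small sketch under the same assumptions as any scratch experiment: a POSIX shell, a temporary repository, and a hypothetical `commit` helper that is not part of the original article. The exact wording git prints may differ between versions.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b bugfix
commit C; commit D; commit E

git checkout -q master
# master has not diverged, but --no-ff still records a merge commit (F in the diagram).
git merge --no-ff -m "Merge branch 'bugfix'" bugfix

# Prints the merge commit followed by its two parents (B and E).
git rev-list --parents -n 1 HEAD

# To make this the default for plain "git merge" in this repository:
#   git config merge.ff false
```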

Recursive strategy

So far, we assumed that master never diverged from the bugfix branch. However, this is quite unlikely, even in a small size team with multiple developers working on several different changes at the same time. Take the following example:

        C - D - E           bugfix
      /
A - B - F - G               master

The commits F and G caused master to diverge from bugfix. Therefore, git can't simply fast-forward the reference to E or it will lose those 2 commits.

In this case, git will (generally) adopt a recursive merge strategy. The result is a merge commit that joins the two histories together:

        C - D - E           bugfix
      /           \
A - B - F - G ----- H       master

Here's the output of the merge:

merge-examples git:(master) git merge --no-ff bugfix
Already up to date!
Merge made by the 'recursive' strategy.
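The diverged case (A - B - F - G on master, C - D - E on bugfix) can be sketched as follows. This is only an illustration in a temporary repository with a hypothetical `commit` helper; also note that recent Git releases may name the 'ort' strategy in their output instead of 'recursive'.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b bugfix
commit C; commit D; commit E

git checkout -q master
commit F; commit G             # master diverges from bugfix

# A fast-forward is no longer possible, so git creates the merge commit H.
git merge --no-edit bugfix
git log --oneline --graph      # shows both lines of history joined by H
```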

Squash merge

Squash merge is a different merge approach. The commits of the merged branch are squashed into one and applied to the target branch. Here's an example:

        C - D - E           bugfix
      /
A - B - F - G               master

After git merge --squash && git commit:

        C - D - E           bugfix
      /
A - B - F - G - CDE         master

where CDE is a single commit combining all the changes of C + D + E. Squashing retains the changes but discards all the individual commits of the bugfix branch.

Note that git merge --squash prepares the merge but does not actually make a commit. You will need to execute git commit to create the merge commit. git has already prepared the commit message to contain the messages of all the squashed commits.
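The two-step nature of a squash merge is easy to check in a scratch repository. The sketch below assumes a POSIX shell; the `commit` helper, the file names, and the final commit message are invented for illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b bugfix
commit C; commit D; commit E
git checkout -q master
commit F; commit G

before=$(git rev-parse HEAD)
git merge --squash bugfix             # stages the combined changes of C + D + E
after_merge=$(git rev-parse HEAD)     # unchanged: no commit has been created yet

cat .git/SQUASH_MSG                   # the message git prepared from the squashed commits

# The explicit commit creates CDE; here we write a fresh message instead.
git commit -q -m "Fix the bug (squashed C, D and E)"
git log --oneline --graph             # A, B, F, G, CDE in a single line
```

The resulting commit has a single parent (G), which is why the individual commits of bugfix do not appear in the history of master.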

What problem are we trying to solve?

The main reason we decided to give --squash merge a try was to improve repository commit history quality.

Commits are essentially immutable. Technically there are ways to rewrite the history, but there are several reasons you generally don't want to do it. For the sake of simplicity, let's say the farther back a commit is in the repository history, the more complicated it is to rewrite it.

It's important to write good commits because they are the pillar of your git history. It's hard to perfectly define what makes a commit a good commit, but in my experience, a good commit satisfies at least the following requirements:

Combines all the code changes related to a single logical change (it could be a feature, a bugfix, or an individual change part of a bigger change)

Provides an explanatory commit message that helps people understand the intent of the change

If you pick this commit independently from the history, it makes sense on its own

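(Translator's note: this is hard to achieve, because it requires every commit to be complete. In practice, a finished feature is usually built from many small commits, and it is difficult for each of them to be logically self-contained.)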

Requirement one should be your default coding habit. A commit should represent an atomic change, and you should avoid combining multiple changes that are not related to each other. Although this seems obvious, I've seen commits that change the compilation script as well as introduce a new feature in the app. Let's use another more practical example: you are fixing a bug, so we want the changes to the software to be committed along with the regression tests, not in different commits that are not related to each other.

Requirement two is a very well-known problem. There are hundreds of articles trying to define a good commit message and trying to teach the programmer the art of writing a good commit message. The official git contributing page has some guidelines:


Short (50 chars or less) summary of changes

More detailed explanatory text, if necessary.  Wrap it to
about 72 characters or so.  In some contexts, the first
line is treated as the subject of an email and the rest of
the text as the body.  The blank line separating the
summary from the body is critical (unless you omit the body
entirely); tools like rebase can get confused if you run
the two together.

Further paragraphs come after blank lines.

  - Bullet points are okay, too

  - Typically a hyphen or asterisk is used for the bullet,
    preceded by a single space, with blank lines in
    between, but conventions vary here

(Translator's note: presumably git keeps the format this strict so that programs can process these messages in bulk.)

These guidelines are extracted from an article written by Tim Pope back in 2008. It's probably the oldest article I can remember on this matter.
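The measurable parts of these guidelines (subject length, blank second line, body width) are simple enough to check mechanically. The following is a rough sketch, not an existing git feature: `check_commit_message` is a hypothetical helper that reads a message on standard input, and the sample messages are made up.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): reads a commit message on stdin and
# reports violations of the 50/72 formatting guidelines.
check_commit_message() {
  awk '
    NR == 1 && length($0) > 50 { print "subject is longer than 50 characters"; bad = 1 }
    NR == 2 && length($0) > 0  { print "second line should be blank"; bad = 1 }
    NR > 2  && length($0) > 72 { print "line " NR " is longer than 72 characters"; bad = 1 }
    END { exit bad + 0 }
  '
}

# A message can pass every formatting check and still explain nothing.
printf 'Fix test\n' | check_commit_message && echo "format ok"

# A missing blank line between subject and body is reported.
printf 'Add bar\nbody without a blank line\n' | check_commit_message || echo "format problems found"
```

A check like this only covers the objective metrics; it cannot tell whether the message explains the intent of the change.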

So we've seen that writing good commit messages seems to be a hard rule to follow. We've also seen that there are some objective metrics you can follow, and some tools enforce or encourage these metrics:

image.png

However, writing a good commit message is hard because it's not just a matter of following objective metrics. You can write a perfectly formatted but completely useless commit message:

image.png

OK, an even more useless one:

(Translator's note: ha, this one is all too common.)

image.png

Raise your hand if you ever created a commit with a message Fix test, Fix CI, Change foo, Add bar.

(Translator's note: hand raised.)

What is wrong with this commit message, you may be asking. This brings us to requirement three. A good commit (and a good commit message) is one that if you select that commit at any point in time from the hundreds of commits in the repository, it will make sense on its own (or will provide enough information to reconstruct the reason for the change).

Let's do an experiment. Can you tell me what this change is about?

image.png

Indeed it fixes some specs with the goal to make them pass. But imagine if someone stumbles upon the changes on line 286 at some point 3 years from now. Neither the commit message nor the code explain why the specs were broken, when they were broken, what broke them, and why the change at line 286 was required. In isolation, this commit is quite meaningless.

Another common example of not very helpful messages is the first one in this history:

(Translator's note: the author seems to be enjoying publicly shaming colleagues' commits.)

image.png

Imagine you are navigating through the list of hundreds of commits trying to investigate when and why something broke. I think you would agree that the effort required to determine whether the first commit could be a candidate to examine is higher than the third commit. From the message point of view, it requires you to (at least) open the commit to examine the changes.

Furthermore, the first commit may also break requirement one, because it includes several changes in the same commit. You may argue that updating multiple dependencies is a single logical change, but if that's the case, you are probably underestimating the impact of changing even a single dependency in a large project.

Advantages of git squash merge

Now that we know the problem, let's see how squash merge can help us.

As I explained before, using squash merge will bundle up all the changes in a single commit, also giving us the chance to write a fresh, complete commit message that properly describes the intent of the changes.

Using commit messages is a great way to limit the presence of isolated changesets in your codebase. It drastically improves the quality of the code that is living on the primary repository branch, ensuring only independent, self-contained changesets are present.

Here's an example of what the history of the DNSimple app looks like:

image.png

In case you are wondering if we are losing the individual changes, the answer is no. Each squash merge references back to a PR where all the changes are tracked:

image.png

Occasionally, non-squash merge occurs. It happens. We're all human beings. But you can immediately see the difference when this happens:

image.png

Questions & Concerns

The use of squash merge is certainly not the only possible way to keep your version control history clean and readable. There are a number of best practices that each developer can adopt, individually or as a team. However, we found this feature to provide the best balance between simplicity, freedom, and results.

If you've been reading all the way to this point, you certainly have questions or comments. Here are the most common ones I've heard:

Aren't you discouraging individual commit quality?

No. Each committer is still encouraged to write good commits: combine meaningful changes together, along with explanatory commit messages. However, there is no pressure to get every commit right, because a typo, a missing file, or a broken spec will not end up cluttering the final primary branch.

You could use git rebase!

Yes, we can certainly use rebase to amend a commit message, or recombine commits. While this may work for local commits (and I frequently do it), rewriting the git history is discouraged once you've shared it (e.g. after you pushed it to the remote shared repository).

In fact, to prevent issues with teammates and continuous integration tools, we explicitly forbid rebasing your commits after you've pushed them. We don't allow git push --force either. The only use case for --force or git rebase is in the rare case of severe issues that may compromise security or stability of the repository. But that's an exceptional use case.

You can use short-living branches to avoid repetitive merge of master

No, this doesn't work. It does if you have very few developers, each working on individual branches. But when multiple developers are working on multiple feature branches together, that doesn't scale. We encourage backporting master often into your branch to limit the risk of conflicts, and stay on top of the latest changes. For example, we continuously update dependencies. We also merge and ship on average 10 times a day.
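As an illustration of why backporting master into a branch is harmless with this workflow, the sketch below merges master into a feature branch and then squash-merges the branch. It assumes a POSIX shell and a temporary repository; the `commit` helper, the branch name `feature`, and the commit message are hypothetical.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b feature
commit C
git checkout -q master
commit F                           # master moves on, e.g. a dependency update
git checkout -q feature
git merge -q --no-edit master      # backport master: creates a merge commit on feature
commit D

git checkout -q master
git merge --squash feature
git commit -q -m "Add feature (squashed)"

echo "merge commits on feature: $(git rev-list --count --merges feature)"
echo "merge commits on master:  $(git rev-list --count --merges master)"
```

The merge commit created while backporting stays on the feature branch; master only receives the single squashed commit.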

Conclusion

By using squash merge, we have been able to drastically improve the quality of our change history, turning our commit log into a very powerful tool to navigate:

We reduced (and in certain cases even eliminated) the number of fixme, fix previous commit, fix specs commits to the repository. Mistakes happen, and the developer has full freedom to experiment, and even commit incomplete changesets in a branch with the full confidence that, once merged, only the final result will be shown.

(Translator's note: fixme, fix previous commit, and fix specs here are examples of throwaway commit messages.)

The development of feature branches is now much easier. We can cherry-pick, backport, and even periodically merge back master into a development branch, without worrying about all the various recursive merges showing up in the logs and cluttering the history.

(Translator's note: backport here means merging changes from master, or from the branch a branch was derived from, into the derived branch.)

image.png

We increased the confidence of non-developers or non-technical team members to contribute. In particular, we can leverage the use of web UI to edit files in place. The final merge is still subject to our peer review process, and it is the responsibility of the leading member to merge the final changeset with an appropriate message. A great example is our current copy editing process, as shown in the following screenshot:

(Translator's note: for example, editing the README in the web UI, which produces an auto-generated and unhelpful commit message such as "Update README".)

image.png

We simplified the rollback or revert process by packaging all the changes in a unique set at the very end of the git history.

We facilitated all the tasks that require navigating historical commits (e.g. via git blame), such as debugging, reviews, and cleanup of technical debt. The main reason is that all the changes related to a particular feature are now included in the same changeset. If you git blame a particular line of code and go to the commit diff, you'll get exactly all the associated changes including: methods that were created, spec files that were touched, views that were updated, etc. No more cases where you research why a method signature was changed, and the only changes you see in the commit are the method signature edit and (hopefully) the corresponding spec.
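The last two points can be tried out in a scratch repository. The following is a minimal sketch, assuming a POSIX shell; the `commit` helper, the file names, and the commit message are invented for illustration. It shows that the files of the whole change trace back to one commit and that a single git revert undoes all of it.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # placeholder identity (assumption)
git config user.email "example@example.com"

# Hypothetical helper: one new file and one commit per letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b bugfix
commit C; commit D; commit E
git checkout -q master
commit F; commit G
git merge --squash bugfix
git commit -q -m "Fix the bug (squashed C, D and E)"

# Every file of the change points at the same single commit on master.
touched=$(git log --format=%s -- C.txt D.txt E.txt | wc -l | tr -d ' ')
echo "commits on master touching C, D, E: $touched"

# One revert undoes the whole change.
git revert --no-edit HEAD
ls
```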

All these benefits result in more maintainable code, less time spent chasing team members to get insights about why that line was changed, and more productivity with less stress.

Permalink: https://check321.net/post/repo_two_years_of_squash_merge.html
