05月22, 2021

[译] Two years of squash merge


原文来自 Simone Carletti : https://blog.dnsimple.com/2019/01/two-years-of-squash-merge/

At DNSimple we use pull-requests every day as a standard workflow to propose, review, and submit changes to almost any git repository. For most core repositories, such as the DNSimple web application, or our Chef-based infrastructure, our policy is to not commit to master, but make changes into a separate branch, open a pull request, obtain a review from one or two people (depending on the change), and then merge the branch into master before deploying.

在DNSimple的日常开发中,我们几乎在所有仓库中使用PR流程进行代码的提交及审核工作。对于大多数像"DNSimple网页应用"这样的核心仓库,我们不会直接在master上提交而是将变更提交在其他分支上并创建一个PR,再根据变更的内容由一到两名人员审核后再合入master。 (这套git流程也是绝大多数团队正在使用的,包括我们)

A little more than two years ago, we decided to change the development team's workflow to always use git --squash merge. In this post I will highlight the reasons for this decision, how it worked for us, and what the benefits are.

大概在两年多以前,我们决定用 git --squash merge来替代现有的合并流程。在这篇文章中我会阐述这么做的原因,其中的工作原理以及我们是如何从这套流程中受益的。

git merge: fast-forward, recursive, and squash

Before we get into the details of why we adopted the --squash merge, let's have a quick look at the most common merge strategies in git.

Note: this is definitely not a comprehensive explanation of the git merge command. For more in-depth explanations, take a look at the documentation for git merge.

First of all, the purpose of git merge is to incorporate the changes from another branch into the current one. For simplicity, we'll assume we want to merge a branch containing our changes called bugfix into the branch master.

在深入为何采用git merge --squash的细节之前,让我们看看常用的几种git合并策略。




  • 快速推进策略

If master has not diverged from the branch, when it's time to merge git will simply move the reference of master forward to the last commit of the bugfix branch.


        C - D - E           bugfix
A - B                       master

After git merge: 合并之后

A - B - C - D - E           master/bugfix

Here's the output of the merge: 控制台输出:

merge-examples git:(master) git merge --ff bugfix
Updating 9db2ac7..3452cab

No fast-forward

  • 非快速推进策略

The default behavior of Git is to use fast-forwarding whenever possible. However, it's possible to change this behavior in the git configuration or passing the --no-ff (no fast-forward) option to git merge. As a result, even if git detects that master did not diverge, it will create a merge commit.

Git会在可能时默认使用fast-fowarding作为合并推进策略,但是你也可以通过设置或者 --no-ff (no fast-forward) 选项进行非快速推进合并,那么即使master在没有变更时也会为合并操作创建一个新的提交

        C - D - E           bugfix
A - B                       master

After git merge --no-ff:

        C - D - E           bugfix
      /           \
A - B ------------ F        master

Here's the output of the merge:

merge-examples git:(master) git merge --no-ff bugfix
Already up to date!
Merge made by the 'recursive' strategy.

Recursive strategy

  • 递归合并策略

So far, we assumed that master never diverged from the bugfix branch. However, this is quite unlikely, even in a small size team with multiple developers working on several different changes at the same time. Take the following example:


        C - D - E           bugfix
A - B - F - G               master

The commits F and G caused master to diverge from bugfix. Therefore, git can't simply fast-forward the reference to E or it will lose those 2 commits.

master 在B提交基础上创建了bugfix分支,之后推送了F和G提交,为了不丢失这两个提交,此时master没法使用ff策略了。

In this case, git will (generally) adopt a recursive merge strategy. The result is a merge commit that joins the two histories together:


        C - D - E           bugfix
      /           \
A - B - F - G ----- H       master

Here's the output of the merge:

merge-examples git:(master) git merge --no-ff bugfix
Already up to date!
Merge made by the 'recursive' strategy.

Squash merge

Squash merge is a different merge approach. The commits of the merged branch are squashed into one and applied to the target branch. Here's an example:


        C - D - E           bugfix
A - B - F - G               master

After git merge --squash && git commit:

        C - D - E           bugfix
A - B - F - G - CDE         master

where CDE is a single commit combining all the changes of C + D + E. Squashing retains the changes but discards all the individual commits of the bugfix branch.

其中CDE提交会组合C + D + E三个单独提交的所有变更,但是丢弃bugfix分支的所有单次提交记录。

Note that git merge --squash prepares the merge but does not actually make a commit. You will need to execute git commit to create the merge commit. git has already prepared the commit message to contain the messages of all the squashed commits.

注意git merge --squash是一个合并准备操作,并不会创建一个真正的提交。你需要手动在merge --squash后创建提交,git会将所有被squash的提交注释作为合并提交的注释。

What problem are we trying to solve?

Squash Merge 主要用于解决什么问题?

The main reason we decided to give --squash merge a try was to improve repository commit history quality.

之所以选择 squash merge 是因为这会为代码仓库的提交历史质量带来帮助。

Commits are essentially immutable. Technically there are ways to rewrite the history, but there are several reasons you generally don't want to do it. For the sake of simplicity, let's say the farther the commit is in the repository history, the more complicated it is to rewrite it.


It's important to write good commits because they are the pillar of your git history. It's hard to perfectly define what makes a commit a good commit, but in my experience, a good commit satisfies at least the following requirements:


Combines all the code changes related to a single logical change (it could be a feature, a bugfix, or an individual change part of a bigger change)


Provides an explanatory commit message that helps people understand the intent of the change


If you pick this commit independently from the history, it makes sense on its own


Requirement one should be your default coding habit. A commit should represent an atomic change, and you should avoid combining multiple changes that are not related each other. Although this seems obvious, I've seen commits that change the compilation script as well as introduce a new feature in the app. Let's use another more practical example: you are fixing a bug, so we want the changes to the software to be committed along with the regression tests, not in different commits that are not related to each other.


Requirement two is a very well-known problem. There are hundreds of articles trying to define a good commit messsage and trying to teach the programmer the art of writing a good commit message. The official git contributing page has some guidelines:


Short (50 chars or less) summary of changes

More detailed explanatory text, if necessary.  Wrap it to
about 72 characters or so.  In some contexts, the first
line is treated as the subject of an email and the rest of
the text as the body.  The blank line separating the
summary from the body is critical (unless you omit the body
entirely); tools like rebase can get confused if you run
the two together.

Further paragraphs come after blank lines.

  - Bullet points are okay, too

  - Typically a hyphen or asterisk is used for the bullet,
    preceded by a single space, with blank lines in
    between, but conventions vary here



 - 项目符号也可用来起分隔作用 (比如markdown格式里的'-')

 - 通常把连接符('-')或者星号('*')用于分段,项目符号前有单空格,相互用空行分隔,不过其约定也不尽相同。 (Git搞这么严格的格式也是方便用程序去批量处理这些PR comment吧)

These guidelines are extracted from an article written by Tim Pope back in 2008. It's probably the oldest article I can remember on this matter.

这些指引是从Tim Pope早在2008年的一篇文章中摘取的,可能也是我记忆中关于这个问题讨论最久远的一篇文章。

So we've seen that writing good commit messages seems to be a hard rule to follow. We've also seen that there are some objective metrics you can follow, and some tools enforce or encourage these metrics:



However, writing a good commit message is hard because it's not just a matter of following objective metrics. You can write a perfectly formatted but completely useless commit message:



OK, an even more useless one:

好吧,还有更扯的…… (哈哈哈哈哈,这个也太常见了)


Raise your hand if you ever created a commit with a message Fix test, Fix CI, Change foo, Add bar.

如果你曾经也写过“修复问题”、“优化xxx”、“新增功能”这样的注释,请举手 (举手)

What is wrong with this commit message, you may be asking. This brings us to requirement three. A good commit (and a good commit message) is one that if you select that commit at any point in time from the hundreds of commits in the repository, it will make sense on its own (or will provide enough information to reconstruct the reason of the change).


Let's do an experiment. Can you tell me what this change is about?



Indeed it fixes some specs with the goal to make them pass. But imagine if someone stumbles upon the changes on line 286 at some point 3 years from now. Neither the commit message nor the code explain why the specs were broken, when they were broken, what broke them, and why the change at line 286 was required. In isolation, this commit is quite meaningless.


Another common example of not very helpful messages is the first one in this history:

另一个无用的注释信息的例子,注意第一个提交: (作者公开处刑同事上瘾了)


Imagine you are navigating through the list of hundreds of commits trying to investigate when and why something broke. I think you would agree that the effort required to determine whether the first commit could be a candidate to examine is higher than the third commit. From the message point of view, it requires you to (at least) open the commit to examine the changes.


Furthermore, the first commit may also break requirement one, because it includes several changes in the same commit. You may argue that updating multiple dependencies is a single logical change, but if that's the case, you are probably underestimating the impact of changing even a single dependency in a large project.


Advantages of git squash merge


Now that we know the problem, let's see how squash merge can help us.


As I explained before, using squash merge will bundle up all the changes in a single commit, also giving us the chance to write a fresh, complete commit message that properly describes the intent of the changes.


Using commit messages is a great way to limit the presence of isolated changesets in your codebase. It drastically improves the quality of the code that is living on the primary repository branch, ensuring only independent, self-contained changesets are present.


Here's an example of what the history of the DNSimple app looks like:

DNSSimple app的仓库提交记录是这样的:


In case you are wondering if we are losing the individual changes, the answer is no. Each squash merge references back to a PR where the whole changes are tracked:



Occasionally, non-squash merge occurs. It happens. We're all human beings. But you can immediately see the difference when this happens:



Questions & Concerns

The use of squash merge is certainly not the only possible way to keep your version control history clean and readable. There are a number of best practices that each developer can adopt, individually or as a team. However, we found this feature to provide the best balance between simplicity, freedom, and results.


If you've been reading all the way to this point, you certainly have questions or comments. Here's the most common one I heard:


Aren't you discouraging individual commit quality?

No. Each committer is still encouraged to write good commits: combine together meaningful changes, along with explanatory commit messages. However, there is no peer pressure that a typo, a missing file, broken spec would end up cluttering the final primary branch.


不,对于提交者来说,还是要尽可能写好单次提交注释:将所有有意义的变更组合起来并赋予一个解释性注释。 但是对于像“错别字修改”“文件丢失”“格式错误”等问题的提交会污染整个代码主干的整洁。

You could use git rebase!

Yes, we can certainly use rebase to amend a commit message, or recombine commits. While this may work for local commits (and I frequently do it), rewriting the git history is discouraged once you've shared it (e.g. after you pushed it to the remote shared repository).

你可以用git rebase


In fact, to prevent issues with teammates and continuous intgration tools, we explicitly forbid rebasing your commits after you've pushed them. We don't allow git push --force either. The only use case for --force or git rebase is in the rare case of severe issues that may compromise security or stability of the repository. But that's an exceptional use case.


You can use short-living branches to avoid repetitive merge of master

No, this doesn't work. It does if you have very few developers, each working on individual branches. But when multiple developers are working on multiple-feature branches together, that doesn't scale. We encourage backporting master often into your branch to limit the risk of conflicts, and stay on top of the latest changes. For example, we continuously update dependencies. We also merge and ship on average 10 times a day.





By using squash merge, we have been able to drastically improve the quality of our change history, turning our commit log into a very powerful tool to navigate:


We reduced (and in certain case even eliminated) the number of fixme, fix previous commit, fix specs commits to the repository. Mistakes happen, and the developer has full freedom to experiment, and even commit incomplete changesets in a branch with the full confidence that, once merged, only the final result will be shown.

我们减少了(在特定场景甚至去除了)fixme,fix previous,fix specs的提交数量 (git提交注释中的一些标签,用于标识分类一些特定操作,类似TODO)。得利于squash合并,bug产生后开发人员可以灵活的进行实验,甚至无后顾之忧的去提交若干不完整的代码变更,因为最终只有合并的最终结果会被显示。

The development of feature branches is now much easier. We can cherry-pick, backport, and even periodically merge back master into a development branch, without worrying about all the various recursive merges showing up in the logs and cluttering the history.

用于开发的功能性分支现在更加易于管理了,可以cherry-pick,backport (从master或被派生的分支往已派生分支合并修改即backport)甚至周期性backport,再也不用担心因为多分支并行开发合并导致的递归合并日志对提交记录产生污染了。


We increased the confidence of non-developers or non-technical team members to contribute. In particular, we can leverage the use of web UI to edit files in place. The final merge is still subject to our peer review process, and it is the responsibility of the leading member to merge the final changeset with an appropriate message. A great example is our current copy editing process, as shown in the following screenshot:

这也降低了对非开发人员或非技术从业者参与仓库贡献的管理成本。尤其是对于可以利用页面UI即时编辑文件的提交(比如在页面上修改下README,git自动生成一句“UPDATE README”这样的无用注释)最终提交任然需要经过review过程,仓库管理人员可以润色最终的变更提交说明。我们目前copy edition流程就是一个很好的例子:


We simplified the rollback or revert process by packaging all the changes in a unique set at the very end of the git history.


We facilitated all the tasks that require navigating historical commits (e.g. via git blame), such as debugging, reviews, and cleanup of technical debt. The main reason is that all the changes related to a particular feature are not included in the same changeset. If you git blame a particular line of code and go to the commit diff, you'll get exactly all the associated changes including: methods that were created, spec files that were touched, views that were updated, etc. No more cases where you research why a method signature was changed, and the only changes you see in the commit are the method signature edit and (hopefully) the corresponding spec.


All these benefits result in a more maintainable code, less time spent chasing team members to get insights about why that line was changed, and more productivity with less stress.



-- EOF --