Free Online Courses for Software Developers - MrBool

Differences between Git and SVN

In this article we will see the differences between the version control systems Subversion and Git.

This article compares the two most common version control systems in the Java ecosystem: Subversion (SVN), representing the centralized model, and Git, representing the distributed model (DVCS - Distributed Version Control System). It is assumed that the reader already has some familiarity with version control systems, with the concepts of revisions, branches and tags, and with command-line tools. The end of the article gives a brief introduction to Git's integration with other tools, such as Java IDEs and the Maven build process.

As software projects grow more complex, so do the demand for fast response to change and the need for systems integration. In this scenario, distributed version control systems are no longer just a technological trend; they have become essential components of the software development toolset, especially given the growing demand in the corporate world. Among the companies that have announced the use of Git in their projects are Google, which adapted Git for use on the Android project, and Microsoft, which has supported Git since the source code of the ASP.NET MVC framework was made available on CodePlex.

The trend towards distributed version control arose from the need for a code management model that shifts the focus from simple source code backup to code sharing and collaboration between diverse development teams, ranging from different departments of the same company to international projects. Besides collaboration, distributed version control tools bring new approaches to versioning problems, aiming to increase productivity in the software development process.

History

Subversion is a centralized version control system that consists of a client (svn) and a central server, accessed over TCP/IP, usually via the SSH or HTTP/WebDAV protocols. Initially developed by CollabNet in late 2000 and an Apache project since 2010, SVN has been adopted by many Java community projects thanks to its integration with development tools and to features that solve problems of its predecessor, CVS, namely transactional support and better performance in the communication between client and server.

Git was initially developed to meet the need for a version control system to manage the source code of the Linux operating system kernel. By its very nature, Linux development required a distributed versioning model, in which commits are scattered across several parallel lines of development (branches) and must go through an approval process. This need led the leader of the Linux project, Linus Torvalds, to develop Git. After the release of the first version in 2005, Git was handed over to its current maintainer, Junio Hamano, a software engineer at Google.

Shortly after the release of Git, other large open source projects started to adopt it as their version control system, including Perl, PostgreSQL and Android, which boosted its popularity in the software development community. Currently, the main projects of the Java community - JBoss, Spring, Apache and Eclipse - are in the process of migrating from SVN to Git. The Eclipse project even developed a pure Java implementation of Git named JGit.

At the same time, the social coding site GitHub emerged. This site allows code sharing based on Git and mixes the social coding concept of SourceForge with elements of social networks. One of its main features is the simplicity of creating derivative works, or forks, thanks to the synchronization of code between multiple remote repositories that a distributed version control system provides. When a fork is created, the user's repository becomes a copy of another GitHub repository. It is then possible to synchronize changes from the original repository into the user's repository and to generate new versions of any project available on GitHub. Contributions can also be offered back to the original project through "pull requests", another feature of distributed version control that will be seen later in this article.

Concepts

The distributed version control model goes beyond simply eliminating the single point of failure found in the centralized model: it also focuses on the collaborative aspect of the software development process. A distributed system has the flexibility to support the most diverse workflows. Decentralization allows any developer to contribute to a project without write permission on a central repository, while developers can still work with an elected central repository in a way similar to the centralized model. The most common workflows of the distributed model are listed below:

  • Shared repository model: In this flow the changes are consolidated in a central repository. This repository may be mediated by an approval system, such as Gerrit, which is used on the Android project and requires every commit to go through a code review process;
  • Pull request model: In this flow the changes are obtained on demand. Developers work only in their personal repositories and signal to interested parties that changes are ready, so that they can fetch the changes directly from those repositories. This is the model used by GitHub.

Even when a common repository is employed, the use of a distributed version control system implies a paradigm shift in relation to the centralized model (see Figure 1). In the distributed model, each developer has a local repository created as a full copy (clone) of the source repository (represented in Figure 2 as a repository in the cloud).

Figure 1. Centralized model.

Figure 2. Distributed model.

When developers work on their local copy, code changes are recorded without network latency, even offline. Synchronization between the local repository and a remote repository can be done in batches, rather than transferring each commit individually over the network as in the centralized model. In a distributed version control system, sending commits to another repository is called a push and receiving commits is called a pull.

A distributed version control system can automatically detect files that were moved and/or renamed, which allows code to be merged between repositories even when files have different names and/or locations. This operation is known as a merge-through-rename. In Git this is possible because it versions the content (snapshots) of each file, as opposed to versioning the file itself and its deltas, as SVN does. More details about how SVN and Git represent history are presented below.
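The idea of tracking content rather than file names can be illustrated with a minimal Python sketch. It covers only the simplest case, in which a moved file keeps exactly the same bytes; Git also applies a similarity heuristic for files that changed slightly, which is not modelled here. A plain SHA-1 of the content stands in for Git's object id, and the function names and sample paths are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Fingerprint of a file's content, independent of its path."""
    return hashlib.sha1(data).hexdigest()


def detect_renames(old_snapshot: dict, new_snapshot: dict) -> dict:
    """Map old path -> new path for files that moved with unchanged content.

    Each snapshot is a hypothetical {path: content} dictionary.
    """
    # Index only the paths that are new in the second snapshot.
    added = {content_key(data): path
             for path, data in new_snapshot.items()
             if path not in old_snapshot}
    renames = {}
    for path, data in old_snapshot.items():
        key = content_key(data)
        # A path that vanished while its content reappeared elsewhere.
        if path not in new_snapshot and key in added:
            renames[path] = added[key]
    return renames


before = {"src/Util.java": b"class Util {}\n", "README": b"docs\n"}
after = {"src/core/Util.java": b"class Util {}\n", "README": b"docs\n"}
renames = detect_renames(before, after)
print(renames)
```

Because the match is made on content, no explicit "move" instruction has to be recorded at commit time, which is the main contrast with a path-based history.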

Comparison

In SVN, a common workflow consists of transferring (checking out) a folder or a branch from the server's directory tree to a local working copy. Development is thus based only on the files found in that directory structure, which constitutes the developer's working copy. In Git, the developer's repository is a full copy of the remote repository, and checking out a single subdirectory does not have the same meaning: the checkout happens after the clone, so all files are present on the client's file system.

The developer's working tree is generated by checking out a particular branch. After the clone, Git checks out an initial branch called "master", similar to "trunk" in Subversion. Branches and merges play a fundamental role in the distributed model. Every developer works in a local branch which, in turn, is associated with a remote branch, or tracking branch. When the repositories are synchronized through the pull operation and both the local and the remote branch contain new commits, a merge commit is generated between them. If only the remote branch has new commits, the local branch pointer is simply moved forward to the remote commit (fast-forward). In Git, merges are represented by points of convergence in the version history, illustrated in the first column of Figure 3.
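The choice between a fast-forward and a merge commit can be sketched in Python as a question about ancestry. This is a simplified model, not Git's implementation: the history is a hypothetical dictionary mapping each commit name to its parents, and the function names are invented for the example.

```python
def ancestors(commit, parents):
    """Return the commit plus every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def pull_outcome(local, remote, parents):
    """Classify what integrating `remote` into `local` requires."""
    if remote in ancestors(local, parents):
        return "up-to-date"        # nothing new on the remote side
    if local in ancestors(remote, parents):
        return "fast-forward"      # only the remote side has new commits
    return "merge commit"          # both sides have new commits


# Hypothetical history: A <- B <- C on the remote, A <- B <- D locally.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(pull_outcome("B", "C", parents))  # local branch has nothing new
print(pull_outcome("D", "C", parents))  # the two branches have diverged
```

In the first call the local branch only needs its pointer moved to C; in the second, a new commit with two parents (C and D) would be required.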

Figure 3. Graphical representation of the version history.

Unlike SVN, which numbers revisions sequentially, Git revisions are distributed across multiple repositories and are identified only by a hash. Commits by different developers must produce different hashes, since a collision would lead the system to treat the commits as identical. Git creates a 40-character hexadecimal hash (SHA-1), but in general only the first few characters of this hash are needed to identify a commit (shown in the third column of Figure 3).

While SVN branches and tags are represented as directories, in Git they are represented by "pointers". A commit holds a pointer to its parent commit (or parents, in the case of a merge) and a set of metadata such as author, date and comment. A branch is a pointer to a commit; when a new commit is made on a branch, the branch is moved to point to it. A tag, in turn, is a static pointer to a given commit. Git also allows the creation of annotated tags, i.e. tags with associated metadata, with the same semantics as that of a commit.
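The following Python sketch shows how such an identifier is derived and why a short prefix is usually enough. It hashes file contents (blobs) because their layout is the simplest; commit objects are hashed by the same scheme, with a body that lists the tree, parents, author and message. The helper names are hypothetical.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over a "<kind> <size>" header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def resolve_prefix(known_ids, prefix: str) -> str:
    """Expand an abbreviated id; fail if it is unknown or ambiguous."""
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


ids = [git_object_id("blob", text) for text in (b"hello\n", b"hello world\n")]
full_id = resolve_prefix(ids, ids[0][:7])
print(len(full_id), full_id)
```

A prefix only stops being usable when two objects in the same repository share it, which is why the number of characters needed grows with the size of the repository.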
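The pointer model can be sketched with a few Python data classes. This is an illustrative in-memory model only; the class and method names are hypothetical and do not correspond to Git's internal API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    message: str
    author: str
    parents: tuple = ()  # one parent normally, two or more for a merge


@dataclass
class Repository:
    branches: dict = field(default_factory=dict)  # name -> Commit, movable
    tags: dict = field(default_factory=dict)      # name -> Commit, static

    def commit(self, branch: str, message: str, author: str) -> Commit:
        tip = self.branches.get(branch)
        new = Commit(message, author, (tip,) if tip is not None else ())
        self.branches[branch] = new  # the branch pointer advances
        return new

    def tag(self, name: str, target: Commit) -> None:
        if name in self.tags:
            raise ValueError(f"tag {name} already exists")
        self.tags[name] = target     # a tag never moves afterwards


repo = Repository()
first = repo.commit("master", "Initial import", "alice")
repo.tag("v1.0", first)
second = repo.commit("master", "Fix build", "bob")
print(repo.branches["master"].message, "|", repo.tags["v1.0"].message)
```

After the second commit the branch points to the new commit while the tag still points to the first one, which is the behavioural difference between the two kinds of pointer.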

In SVN, all commits are permanently recorded in the version history and it is possible to recover or reverse any commit made in the past. In Git, a commit can "disappear" from the version control system if every branch or tag that references it is removed. This happens because objects (commits) without references (in this case, branches or tags) become eligible for collection by the "garbage collector" (git gc). Despite this, all pointer changes are recorded (reflog), so it is possible to move a branch back to an earlier commit in the revision history, provided the commit has not yet been removed by the collector. Figure 4 presents a "master" branch and the association of its commit with the file tree.
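Which commits become candidates for collection is a reachability question, sketched below in Python. The model is deliberately simplified: it considers only branches and tags as references and ignores the reflog, which in Git keeps recently abandoned commits alive for some time. The history and function names are hypothetical.

```python
def reachable(refs, parents):
    """Commits reachable from any reference by following parent links."""
    seen, stack = set(), list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def collectable(refs, parents):
    """Commits that no branch or tag can reach any more."""
    return set(parents) - reachable(refs, parents)


# Hypothetical history: A <- B <- C on master, plus D (child of B) on topic.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
refs = {"master": "C", "topic": "D"}
print(collectable(refs, parents))   # every commit is still referenced

del refs["topic"]                   # remove the only reference to D
print(collectable(refs, parents))   # D is now a candidate for collection
```

Commit B stays safe after the branch is deleted because it is still an ancestor of master; only D loses its last path from a reference.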

Figure 4. Simplified representation of a Git snapshot.

Another difference between SVN and Git lies in the configuration of a shared repository. In SVN this involves creating a repository in a location separate from the client's working copy. In Git, a developer's local repository can be shared for reading; to accept writes, however, a new clone must be created without a working tree by using the "bare" option (see Note 1):

git clone --bare / ../.git

Note 1. Bare Repository

A bare repository contains only the Git system files, without a working tree, which makes it suitable for receiving pushes. It is identified by the suffix ".git".

A Git repository, like an SVN one, can be accessed via HTTP. Git uses Smart HTTP (see Note 2) for this purpose, with user permissions handled by the Apache web server, for example through LDAP. For read-only browsing, the git instaweb command serves the repository for viewing using the lighttpd (Linux) or webrick (Mac) web server.

Note 2. Smart HTTP

Smart HTTP is an RPC-style protocol for interactive access to Git. Available since version 1.6.6, it allows the clone, push and pull operations against a remote repository via HTTP/HTTPS, through a CGI script, as an alternative to WebDAV.

As with the "svn://" protocol, a Git repository can be made available for TCP/IP access through the "git://" protocol. For access via SSH, Git comes with the git shell, which restricts access to predefined commands on the server. There are third-party tools such as Gitorious, Gitolite or Gerrit [1] that enable finer-grained permission control.

There are pros and cons to consider when migrating to the distributed style of version control that Git provides. Among the disadvantages of Git is a steeper learning curve, due to the paradigm shift itself and to the flexibility brought by the distributed model. Revisions are no longer sequential, and the workflow must be adapted to each project. Another factor is the increased use of disk space on the client side, because the remote repository is cloned in full; this is not optimal for binary files, which do not benefit from the source-code-oriented compression included in Git.

On the other hand, one of Git's advantages is its workflow flexibility, which suits diverse development teams. The ability to work offline promotes the team's agility by eliminating network latency, and any developer can contribute to the project indirectly through pull requests or simply by creating local branches. Another positive aspect is how natural working with branches and merges is in Git, since every interaction in the distributed model involves a merge between a local and a remote branch. Even merges between different branches resemble tasks that developers perform every day, and they often happen transparently.

Finally, Git has commands for virtually any operation on the version history. This ranges from returning to the version of ten minutes ago (git checkout "@{10 minutes ago}") to searching commit messages with regular expressions (git show :/^JIRA). The full command reference is found in the manual that comes with the Git installation, divided into two categories: low-level commands (plumbing) and high-level commands (porcelain, such as those cited in this article).

Despite the differences, many Git commands have SVN counterparts, such as git status, git blame, git add and git commit. However, git revert and git checkout behave differently from their SVN namesakes. For more information, see the correlation of commands between Git and SVN in Table 1.

Table 1. Correlation of Git and SVN commands.

In addition to the command-line tools, Git includes the graphical tools gitk and git-gui, to browse the version history and to interact with the repository, respectively. For developers familiar with TortoiseSVN, an equivalent tool is available for Git, TortoiseGit, which can help reduce the learning curve. TortoiseGit includes an integrated version of the Git manual and the Pro Git book [3].

SVN integration

Git interoperates with SVN through the git svn command suite. The following command clones a remote SVN repository:

git svn clone http://my-project.googlecode.com/svn

The clone generated by the git svn command can also be used to migrate from a remote SVN repository to Git. Since the version history stored in SVN is preserved, a simple push to a "bare" Git repository is all that is needed to complete the migration.

In an enterprise development environment, the migration procedure needs to consider aspects such as migrating the user base from SVN to Git [2], which are beyond the scope of this article. It is important to note, however, that there are fundamental differences between Git and SVN. Git is optimized to manage source code, not binaries, and because of its distributed nature - the repository is fully replicated among developers - it may be necessary to split the SVN repository into several Git repositories.

Git also integrates with the Maven build tool through the maven-scm-plugin. To set up this integration, just provide the remote repository URL with the scm:git: prefix, as shown in Listing 1.

Listing 1. Entry in pom.xml with the SCM repository configuration for read (fetch) and write (push).

<scm>
  <developerConnection>scm:git:[fetch=]ssh://git@dev1/~/sample.git [push=]ssh://git@dev2/~/sample.git</developerConnection>
</scm>

Finally, it is worth noting that the NetBeans and IntelliJ IDEs have native support for Git, and that Eclipse has provided the EGit plugin since version 3.6. The STS (Spring Tool Suite) Eclipse distribution includes this plugin by default (Figure 5).

Figure 5. The EGit Git plugin for Eclipse.

Conclusion

Like any paradigm shift, moving from a system like SVN to Git goes from an initial denial stage ("Why migrate a system that is working well?") to an acceptance stage ("How was it possible to work without a DVCS?") once the software development process improves.

Distributed version control systems set out to solve version control problems with an innovative approach that focuses on better ways to interact and share source code. This article sought to emphasize the conceptual differences between Git and SVN in order to soften the initial impact of adopting a DVCS. Adoption is more likely to succeed when the concepts that underlie the distributed model are assimilated and combined with experience in applying it to real-world problems.



Freelance software developer with knowledge of Java, Android, HTML, CSS and JavaScript, as well as agile development.
