
VCSs and Software Repositories
The Whole Picture
Tooling
Several tools support the storage and management of software
artifacts created during the lifespan of a software endeavor

• Version control systems
• Bug tracking systems
• Mailing lists (example: https://lkml.org/)
• IRC / Real-Time Chat Systems
• Wikis
• …
[Figure: Ad-hoc Classification; diagram label: Hosting Services]
Hosting Services: “Integration from End to End”
• Project and issue tracking
• Agile method support (Scrum)
• Backlog prioritization and sprint planning
• Flexible workflow
• Developer tools integrations
• Out-of-the-box agile reporting
• Plug-and-play add-ons
• Disaster recovery
Integrated project management

Source Code Repositories
What They Are
• They act as centralized locations for software developers and offer a set of tools
• They are also web-based hosting services for software development projects


Several Alternatives
• SourceForge, 1999
• GNU Savannah, 2001
• Launchpad, 2004
• Google Code, 2005 (shut down in early 2016)
• CodePlex, 2006
• GitHub, 2008
• Bitbucket, 2008
• GitLab, 2011
Comparison of source code hosting services: https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities
Repos’ Popularity
Period: January-May, 2011
Measure: Number of commits
Results:
GitHub: 1,153,059
SourceForge: 624,989
Google Code: 287,901
CodePlex: 49,839
Source: http://readwrite.com/2011/06/02/github-has-passed-sourceforge
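The raw counts are easier to compare as shares of the total. A minimal Python sketch using only the figures listed above (the `shares` helper is ours, not part of the cited source):

```python
# Commit counts for January-May 2011, as listed on the slide above.
COMMITS_2011 = {
    "GitHub": 1_153_059,
    "SourceForge": 624_989,
    "Google Code": 287_901,
    "CodePlex": 49_839,
}


def shares(counts):
    """Return each host's fraction of the total number of commits."""
    total = sum(counts.values())
    return {host: n / total for host, n in counts.items()}


if __name__ == "__main__":
    for host, share in shares(COMMITS_2011).items():
        print(f"{host:<12} {share:6.1%}")
```

With these figures GitHub accounts for about 54.5% of the commits counted, more than SourceForge, Google Code, and CodePlex combined.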
Comments from the Community

GitHub has emerged as the prime code hosting platform
• The network graph lets you see forks, and what was merged into what and when
• You can 'watch' projects: your account page is like a Facebook wall of new check-ins
• A very good diff viewer lets you comment on single-line changes
• GitHub offers a GUI client, GitHub for Windows (GitHub Desktop)
• Its emphasis is on source code and collaboration among developers
Recommended Readings

A good comparison of a number of open-source software hosting facilities is available at
http://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities

How GitHub Conquered Google, Microsoft, and Everyone Else
https://www.wired.com/2015/03/github-conquered-google-microsoft-everyone-else/

GitLab vs GitHub: Key differences & similarities
https://usersnap.com/blog/gitlab-github/
Version Control Systems
Google Trends
What They Are
A VCS is a software system that tracks and provides control over changes to a file or set of files (source code), so that you can recall specific versions later
• They are the most used repositories within software development processes, since source code is the most valuable asset of a software endeavor
• Changes to source code artifacts are tracked and managed in terms of revisions, versions and
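To make "recall specific versions later" concrete, here is a deliberately naive sketch, assuming a hypothetical `ToyRepository` class that stores a full copy of every file at each commit. Real VCSs store history far more compactly, so treat this only as a model of the idea:

```python
class ToyRepository:
    """Hypothetical, minimal version store: one full snapshot per commit.

    Real VCSs use deltas or compressed objects; this only illustrates
    recording versions and recalling them later.
    """

    def __init__(self):
        self._history = []  # list of (message, snapshot) pairs, oldest first

    def commit(self, files, message):
        """Record a copy of `files` (path -> content) and return its revision number."""
        self._history.append((message, dict(files)))  # copy, so later edits don't leak in
        return len(self._history)  # revisions are numbered from 1

    def checkout(self, revision):
        """Return the files exactly as they were at `revision`."""
        if not 1 <= revision <= len(self._history):
            raise ValueError(f"unknown revision: {revision}")
        return dict(self._history[revision - 1][1])

    def log(self):
        """List (revision, message) pairs, oldest first."""
        return [(i + 1, msg) for i, (msg, _) in enumerate(self._history)]


if __name__ == "__main__":
    repo = ToyRepository()
    files = {"main.py": "print('v1')"}
    r1 = repo.commit(files, "First version")
    files["main.py"] = "print('v2')"
    repo.commit(files, "Second version")
    print(repo.checkout(r1))  # {'main.py': "print('v1')"}
```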
Repository Flavors

Centralized
• CVS - 1990
• SVN - 2000
Distributed
• BitKeeper - 1997
• Git - 2005
• Bazaar - 2005
• Mercurial - 2005
Repos and VCS

VCS: Repos
Git: GitHub, Bitbucket, CodePlex, SourceForge
Mercurial: Bitbucket, CodePlex, Google Code, SourceForge
Bazaar: Launchpad, GNU Savannah, SourceForge
Subversion: GitHub, CodePlex, GNU Savannah, Google Code, SourceForge
CVS: Upgrade!
Usage

https://www.openhub.net/repositories/compare
Advantages of DVCSs
Decentralization
Technical Advantages of DVCSs

• DVCSs allow individual developers to act as servers or clients, as in peer-to-peer models
• Developers can work on source code without being connected to a central or remote repository (offline mode)
• DVCSs track changes as changesets (groups of related changes, possibly spanning several files) instead of per-file versions, where each individual file has its own separate revision
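The difference between per-file revisions and changesets can be sketched with a hypothetical toy model (the file names and helper functions are invented for illustration; neither function reflects how CVS or any DVCS stores data internally):

```python
# Hypothetical toy model: three logical edits, each touching one or more files.
EDITS = [
    ["parser.c", "parser.h"],  # one fix that spans two files
    ["parser.c"],
    ["lexer.c", "parser.h"],
]


def per_file_revisions(edits):
    """Per-file view: every file keeps its own independent revision counter."""
    revisions = {}
    for files in edits:
        for path in files:
            revisions[path] = revisions.get(path, 0) + 1
    return revisions


def changesets(edits):
    """Changeset view: each logical edit is one unit listing all the files it touched."""
    return [{"id": i + 1, "files": sorted(files)} for i, files in enumerate(edits)]


if __name__ == "__main__":
    print(per_file_revisions(EDITS))  # {'parser.c': 2, 'parser.h': 2, 'lexer.c': 1}
    print(changesets(EDITS)[0])       # {'id': 1, 'files': ['parser.c', 'parser.h']}
```

In the per-file view nothing records that revision 1 of `parser.c` and revision 1 of `parser.h` were one logical fix; the changeset view keeps that grouping.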
Technical Advantages of DVCSs …

• This difference leads to easier branching and merging, and to smaller repositories, since less redundant information is stored
• Operations such as diff, log, branch and merge perform faster than in centralized VCSs
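Merging two branches needs their common starting point (the merge base) as the reference for a three-way merge. Because a DVCS keeps the full commit graph locally, finding it is a plain graph query. A toy sketch; the graph and function names are hypothetical, and Git's own `git merge-base` is far more optimized, although its documentation defines the result the same way (a best common ancestor is one that is not an ancestor of any other common ancestor):

```python
# Hypothetical history: each commit maps to the list of its parent commits.
# One branch goes A <- B <- C <- E; another branch goes B <- D.
HISTORY = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"]}


def ancestors(graph, commit):
    """Return `commit` plus every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def merge_bases(graph, a, b):
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


if __name__ == "__main__":
    print(merge_bases(HISTORY, "E", "D"))  # {'B'}
```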
Structural Advantages
• DVCSs allow teams to easily create, implement, or switch between different workflow models, for example:
• The integration-manager model
• The benevolent dictator model
• They allow teams to define intermediate roles responsible for testing, merging, reviewing and integrating changes from new
Research Advantages

DVCSs record new kinds of data, which raise new research questions about how DVCSs affect the processes, products, and people around software projects
• Repositories are smaller than centralized ones, yet contain more information about contributions and workflows
• Data extraction and repository cloning are faster than before
• True authorship information is recorded
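In Git each commit stores both an author (who wrote the change) and a committer (who recorded it), so a patch applied by a maintainer still credits its original author. `%an` and `%cn` are Git's format placeholders for author name and committer name; the sample log, the names, and the `authorship` helper below are made up for illustration:

```python
from collections import Counter

# Made-up sample of what `git log --format='%an|%cn'` prints:
# one line per commit, author name then committer name.
SAMPLE_LOG = """Alice|Alice
Bob|Alice
Carol|Carol
Bob|Alice"""


def authorship(log_text):
    """Count commits per author, and how many were committed by someone else."""
    authors = Counter()
    committed_by_other = 0
    for line in log_text.splitlines():
        author, committer = line.split("|", 1)  # assumes author names contain no '|'
        authors[author] += 1
        if author != committer:
            committed_by_other += 1
    return authors, committed_by_other


if __name__ == "__main__":
    counts, by_other = authorship(SAMPLE_LOG)
    print(counts.most_common())  # [('Bob', 2), ('Alice', 1), ('Carol', 1)]
    print(by_other)              # 2
```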
Research Advantages

There is an MSR (Mining Software Repositories) challenge each year!
2018 edition: https://conf.researchr.org/home/msr-2018
Remarkable users of DVCSs

Git: Git, Linux Kernel, Perl, Gnome, Qt, Ruby on Rails, Android, Wine, Fedora, Debian, Grails, VLC
Bazaar: Ubuntu, GNU Emacs, GNU GRUB, GNU Mailman, Wget, Inkscape, MySQL, Gnash, GNOME bindings for Java, Squid, Stellarium
Mercurial: Mozilla, OpenJDK, OpenSolaris, OpenOffice.org, Symbian OS, Netbeans, GNU Octave, LinuxTV/Video4Linux, Audacious, SAGE
Triggering Events

Available at: http://www.youtube.com/watch?v=4XpnKHJAok8


Available at: http://www.youtube.com/watch?v=iR0rBYI1gy4
Migrations: from CVCSs to DVCSs
Project Year Migration
Python 2008 SVN -> Mercurial
Perl 2008 Perforce -> Git
MySQL 2008 Bitkeeper -> Bazaar
NetBeans 2009 CVS -> Mercurial
Mozilla 2007 CVS -> Mercurial
Gnome 2009 SVN -> Git
KDE 2009 SVN -> Git
PostgreSQL 2008 CVS -> Git
Eclipse 2009 CVS -> Git
GitHub Features for Non-Tech Users
What is GitHub?
GitHub
GitHub is a code hosting platform for version control and collaboration. It lets
you and others work together on projects from anywhere.

Collaborative Software Engineering


It is the study of collaboration among individuals – from users to developers.
• Concepts and techniques
• Global software engineering
• Ecosystems
• Software product lines

The Analyzed Projects
Bootstrap
The most popular HTML, CSS, and JavaScript framework for developing responsive, mobile-first projects on the web
On GitHub: https://github.com/twbs/bootstrap
Web site: http://getbootstrap.com/

UNFlea+ (a web application created by SE students)
A platform to sell, buy, exchange, donate, and auction items
On GitHub: https://github.com/virttuall/UNFlea-
Web site: http://unfleaplus-unfleaplus.rhcloud.com/
Graphs: the Pulse

It provides an overview of the activity level of a project

Bootstrap info
Graphs: Contributors (all)
It shows a graph of all contributions, followed by smaller graphs showing the contributions of individual developers

UNFlea+ info

It shows the number of commits that have been made over time to the master branch
Graphs: Contributors
It is possible to select a specific time period for these graphs
UNFlea+ info
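The Contributors graph can also be reproduced from raw data. GitHub's REST API exposes weekly per-author statistics at `GET /repos/{owner}/{repo}/stats/contributors` (the first request may return HTTP 202 while GitHub computes them). The sketch assumes that response shape but uses invented numbers; the `summarize` helper and its `since`/`until` filter, which mimics selecting a time period, are hypothetical:

```python
# Invented data in the shape returned by GET /repos/{owner}/{repo}/stats/contributors:
# per author, weekly buckets with w = week start (Unix seconds),
# a = additions, d = deletions, c = commits.
SAMPLE_STATS = [
    {"author": {"login": "dev-a"}, "total": 12,
     "weeks": [{"w": 1609632000, "a": 300, "d": 40, "c": 10},
               {"w": 1610236800, "a": 20, "d": 5, "c": 2}]},
    {"author": {"login": "dev-b"}, "total": 3,
     "weeks": [{"w": 1609632000, "a": 900, "d": 100, "c": 3}]},
]


def summarize(stats, since=None, until=None):
    """Return (login, commits, added, deleted) per contributor, most commits first.

    `since` and `until` are inclusive Unix timestamps for the selected period.
    """
    rows = []
    for entry in stats:
        weeks = [wk for wk in entry["weeks"]
                 if (since is None or wk["w"] >= since)
                 and (until is None or wk["w"] <= until)]
        commits = sum(wk["c"] for wk in weeks)
        if commits:  # skip contributors with no commits in the selected period
            rows.append((entry["author"]["login"], commits,
                         sum(wk["a"] for wk in weeks),
                         sum(wk["d"] for wk in weeks)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for row in summarize(SAMPLE_STATS):
        print(row)
```

In the sample, `dev-b` has a quarter of `dev-a`'s commits but almost three times as many added lines, which is why commit counts alone are a shaky work measure.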
Contributions as a Work Measure

Some developers may create many small commits, while others may prefer fewer, medium-sized commits
Under these individual preferences, it is perfectly possible that all of them have been doing similar work

Research questions:
• What is the size (and what are the features) of a typical commit?
• How often do developers commit?
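One way to start answering both questions from commit data, assuming hypothetical per-commit records of (author, timestamp, lines added, lines deleted), which could be extracted with a tool such as `git log --numstat`. The records and helper names are invented:

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import median

# Hypothetical per-commit records: (author, Unix timestamp, lines added, lines deleted).
COMMITS = [
    ("dev-a", 1609459200, 5, 1),     # 2021-01-01
    ("dev-a", 1609462800, 3, 0),     # same day, one hour later
    ("dev-a", 1610064000, 10, 2),    # 2021-01-08
    ("dev-b", 1609459200, 400, 20),  # one large commit
]


def typical_commit_size(commits):
    """Median lines changed per commit (the median resists a few huge commits)."""
    return median(added + deleted for _, _, added, deleted in commits)


def commits_per_active_week(commits):
    """Average commits per ISO week in which each author committed at least once."""
    counts = Counter()
    weeks = defaultdict(set)
    for author, timestamp, _, _ in commits:
        year, week, _ = datetime.fromtimestamp(timestamp, tz=timezone.utc).isocalendar()
        counts[author] += 1
        weeks[author].add((year, week))
    return {author: counts[author] / len(weeks[author]) for author in counts}


if __name__ == "__main__":
    print(typical_commit_size(COMMITS))
    print(commits_per_active_week(COMMITS))
```

The median here is 9 lines, while the mean would be about 110 because of the single 420-line commit; this is why the sketch uses the median.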

Graphs: Commits

It shows the number of commits per week over the life of the project

UNFlea+ info
Analysis of the Number of Commits

• Is the number of commits a good estimator of the health of a project?
• Is there a correlation between the number of commits and the team size?
• How are commits distributed over time? What can we infer from this distribution?
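For the correlation question, a Pearson coefficient between weekly commits and weekly active developers is a common first check. A minimal sketch with invented numbers (the `pearson` helper is written out by hand rather than taken from a library), which also prints a crude text histogram of commits per week to look at their distribution over time:

```python
from math import sqrt

# Hypothetical weekly figures for a single project (invented numbers).
WEEKLY_COMMITS = [4, 9, 15, 7, 2]
ACTIVE_DEVELOPERS = [1, 2, 3, 2, 1]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (spread_x * spread_y)


def text_histogram(weekly_counts):
    """One line per week, one '#' per commit."""
    return [f"week {i + 1:2d}: {'#' * n}" for i, n in enumerate(weekly_counts)]


if __name__ == "__main__":
    print(f"r = {pearson(WEEKLY_COMMITS, ACTIVE_DEVELOPERS):.2f}")
    print("\n".join(text_histogram(WEEKLY_COMMITS)))
```

Five data points are far too few to conclude anything, and even a strong correlation would not show that larger teams cause more commits.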

Graphs: the Code frequency

It shows you the number of lines added to and removed from the project over time

Useful for detecting big changes in the source code
Bootstrap info
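The code frequency data is also available through GitHub's REST API at `GET /repos/{owner}/{repo}/stats/code_frequency`, as weekly `[week, additions, deletions]` entries with deletions reported as negative numbers. A sketch for flagging big changes with invented data; the rule of thumb (churn above ten times the median week) is an arbitrary assumption, not a GitHub feature:

```python
from statistics import median

# Invented data in the shape of GET /repos/{owner}/{repo}/stats/code_frequency:
# [week start (Unix seconds), additions, deletions], deletions negative.
SAMPLE_CODE_FREQUENCY = [
    [1609632000, 120, -30],
    [1610236800, 80, -20],
    [1610841600, 5000, -4200],  # e.g. a large refactoring or an imported library
    [1611446400, 60, -10],
]


def big_change_weeks(weekly, factor=10):
    """Weeks whose churn (additions + deletions) exceeds `factor` times the median churn.

    The threshold is an arbitrary heuristic chosen for illustration.
    """
    churn = [(week, added + abs(deleted)) for week, added, deleted in weekly]
    baseline = median(c for _, c in churn)
    return [week for week, c in churn if c > factor * baseline]


if __name__ == "__main__":
    print(big_change_weeks(SAMPLE_CODE_FREQUENCY))
```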
Graphs: The Punch Card

UNFlea+ info

It shows what time of day and which day most commits get done
Graphs: The Punch Card
Bootstrap info
Analysis of the Punch Card

It is a great way to get insight into the times when your team is most productive
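The punch card's underlying data is exposed by GitHub's REST API at `GET /repos/{owner}/{repo}/stats/punch_card` as `[day, hour, commits]` entries, with day 0 meaning Sunday. A small sketch over invented data that finds the busiest hour and totals commits per weekday (`busiest_slot` and `commits_per_day` are hypothetical helpers):

```python
from collections import Counter

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# Invented data in the shape of GET /repos/{owner}/{repo}/stats/punch_card:
# [day (0 = Sunday), hour (0-23), number of commits].
SAMPLE_PUNCH_CARD = [
    [1, 10, 5],
    [1, 14, 12],
    [3, 22, 7],
    [6, 9, 1],
]


def busiest_slot(punch_card):
    """Return (day name, hour, commits) for the hour slot with the most commits."""
    day, hour, commits = max(punch_card, key=lambda entry: entry[2])
    return DAY_NAMES[day], hour, commits


def commits_per_day(punch_card):
    """Total commits per weekday."""
    totals = Counter()
    for day, _, commits in punch_card:
        totals[DAY_NAMES[day]] += commits
    return totals


if __name__ == "__main__":
    print(busiest_slot(SAMPLE_PUNCH_CARD))
    print(commits_per_day(SAMPLE_PUNCH_CARD))
```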
Graphs: the Network
It shows the number of branches and commits on those branches
throughout a project’s history (Branching)
It also shows any forks that contributors have created

UNFlea+ info
Analysis of the Network

• It shows how the team uses branching and merging
• Is someone responsible for merging?
• How often do they merge?
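The merging questions can also be explored outside the Network graph: in Git a merge commit is simply a commit with two or more parents. A sketch that counts merges per committer from the output of `git log --format='%h|%cn|%p'` (abbreviated hash, committer name, abbreviated parent hashes); the sample log is made up and its hashes are shortened further for readability:

```python
from collections import Counter

# Made-up sample of `git log --format='%h|%cn|%p'`, newest commit first.
SAMPLE_LOG = """g7|Alice|f6 e5
f6|Carol|d4
e5|Bob|d4
d4|Alice|b2 c3
c3|Carol|a1
b2|Bob|a1
a1|Alice|"""


def merge_stats(log_text):
    """Return (total commits, merges per committer); a merge has 2+ parents."""
    total = 0
    mergers = Counter()
    for line in log_text.splitlines():
        _commit, committer, parents = line.split("|", 2)
        total += 1
        if len(parents.split()) > 1:
            mergers[committer] += 1
    return total, mergers


if __name__ == "__main__":
    total, mergers = merge_stats(SAMPLE_LOG)
    print(total, mergers.most_common())
```

Here Alice made both merges out of seven commits, which would suggest she acts as the integrator; on a real project you would look at far more history before drawing that conclusion.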

The Members List
The members list shows just the people who have forked the repository, or forked one of its forks


Analysis of the Member List

• It could be a measure of how interesting a project is among developers
• It could be a source of good ideas for new features, and even for new projects related to the original one

Graphs: the Traffic
It shows you …
• the number of views and unique visitors over time,
• the sites that people are linking from,
• and the most popular content in your repository
Analysis of the Traffic Graph

• It can be a great way to get a sense of the popularity of open source projects
• It also gives an assessment of the utility of the graphs offered by GitHub
How to Contribute to a Project

There are two ways to contribute:
A. by creating a fork and a pull request (when you do not have write permission)
B. by adding, editing, renaming, or deleting a file directly on GitHub (when you are the owner or a collaborator)
Contributing via a Fork

• Make a copy of the project on GitHub under your user account (forking)
• Make any changes you want to your fork (copy)
• Request that your changes be incorporated into the original project by using a pull request
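The same steps can also be scripted against GitHub's REST API, which offers `POST /repos/{owner}/{repo}/forks` for forking and `POST /repos/{owner}/{repo}/pulls` for opening a pull request. The sketch below only builds the requests rather than sending them; the helper name, login, and branch are hypothetical, and the step in between (pushing your commits to the fork) is not shown:

```python
def fork_and_pull_request_plan(owner, repo, my_login, branch, title, base):
    """Describe the two REST calls behind the fork-and-pull-request flow.

    Hypothetical helper: returns (method, path, JSON body) tuples and sends nothing.
    Between the two calls you would push your changes to `branch` in your fork.
    """
    return [
        # 1. Fork: copy owner/repo into the authenticated user's account.
        ("POST", f"/repos/{owner}/{repo}/forks", {}),
        # 2. Pull request against the ORIGINAL repository; `head` names the
        #    fork's branch as "<your-login>:<branch>", `base` the target branch.
        ("POST", f"/repos/{owner}/{repo}/pulls",
         {"title": title, "head": f"{my_login}:{branch}", "base": base}),
    ]


if __name__ == "__main__":
    for call in fork_and_pull_request_plan(
            "twbs", "bootstrap", "my-user", "fix-typo", "Fix typo in docs", base="main"):
        print(call)
```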
The Hello World!
Please do it: do the Hello World project presented at GitHub Guides
• https://guides.github.com/activities/hello-world/
References

• Peter Bell & Brent Beer, Introducing GitHub: A Non-Technical Guide. O’Reilly, 2015
• Good Resources for Learning Git and GitHub: https://help.github.com/articles/git-and-github-learning-resources/
