Extracting new metrics from Version
Control System for the comparison of
software developers
Marcello Moura¹, Hugo Nascimento² and Thierson Rosa²
¹Centro de Recursos Computacionais, ²Instituto de Informática
Universidade Federal de Goiás (UFG)
Caixa Postal 131 – 74.001-970 – Goiânia – GO – Brazil
[email protected], {hadn,thierson}@inf.ufg.br
Goiânia, September 21, 2014
Moura, Nascimento e Rosa
Extracting new metrics from VCS ...
1 / 48
Summary
1. Introduction
2. Extracting fine-grain operations from VCS
3. Metrics for the developers
4. Comparison of the developers
5. The case study
6. Conclusion
Introduction
Version Control Systems (VCSs), like Subversion and Git, store
revisions of the files of a software development project,
registering its historical evolution.
VCSs have been used for:
Helping to understand the software development
process – Lopez-Fernandez et al. [2004], Huang and Liu
[2005], Girba et al. [2005], Voinea and Telea [2006]
and Voinea et al. [2007].
Helping to know more about the developers – Gilbert
and Karahalios [2007], Jermakovics et al. [2011], Mockus
and Herbsleb [2002], Minto and Murphy [2007], Schuler
and Zimmermann [2008], Zhang et al. [2008a,b]
and Di Bella et al. [2013].
Our work focuses on understanding the developers through the
analysis of their work.
1. We identify and count finer-grain operations at line and file
   levels that can be extracted from a VCS, such as additions,
   deletions and modifications.
   This yields much more detailed and richer information about
   the work performed by the developers.
2. We calculate a new set of formally defined metrics.
3. Developers are characterized by comparing each one of
   them against the others.
   Two comparison approaches for this purpose are described.
Note: The VCS data cannot be taken as a full and precise
description of the software development process.
It is incomplete and may lead to distinct interpretations
(e.g. Negara et al. [2012]).
Information extracted from a VCS has to be revalidated by
the project managers and complemented with their own
knowledge.
Extracting fine-grain operations from VCS
Basic notation:
P – a software project in a VCS.
D – the set of developers that worked on P.
A – the set of all files created during the development of P.
A^r ⊆ A – the set of files that were removed (did not reach the final
version of P).
We mine the VCS for three types of operations: additions,
deletions and modifications of files and lines of code.
[Figure: Project History]
Metrics for the developers
Aspects defined for consideration:
1. Effort – represents the total number of operations of a given type
   performed by a developer.
2. Code-survival – indicates the number of operations of a given
   type performed by a developer and not changed later by
   anyone.
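A. Metrics for evaluating developers individually

\[
\mathrm{Effo\_Add}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_1^{a,i}.\mathit{devel} = d \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\mathrm{Effo\_Mod}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \sum_{j=1}^{|hl_i^a|}
\begin{cases}
1 & \text{if } o_j^{a,i}.\mathit{devel} = d \text{ and } o_j^{a,i}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]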
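As a rough illustration of the two effort metrics, the sketch below counts them over a toy in-memory history. It assumes, as the formulas suggest, that each file a has a list H_a of line histories and that each line history is an ordered list of operations carrying a developer and a type. The `Op` record, the `histories` dictionary and the sample file names are hypothetical and exist only for this example; they are not taken from the authors' tooling.

```python
from collections import namedtuple

# Hypothetical record for one operation on a line: who did it and what kind it was.
Op = namedtuple("Op", ["devel", "type"])

# Hypothetical toy project: file name -> list of line histories,
# each line history being the ordered list of operations on one line.
histories = {
    "app.rb": [
        [Op("d1", "ADD")],
        [Op("d1", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
    ],
    "util.rb": [
        [Op("d2", "ADD"), Op("d2", "MOD"), Op("d1", "DEL")],
    ],
}


def effo_add(histories, d):
    """Count line histories whose first operation was performed by d."""
    return sum(
        1
        for line_histories in histories.values()
        for hl in line_histories
        if hl and hl[0].devel == d
    )


def effo_mod(histories, d):
    """Count every MOD operation performed by d, over all lines of all files."""
    return sum(
        1
        for line_histories in histories.values()
        for hl in line_histories
        for op in hl
        if op.devel == d and op.type == "MOD"
    )


print(effo_add(histories, "d1"), effo_mod(histories, "d1"))
print(effo_add(histories, "d2"), effo_mod(histories, "d2"))
```

In this toy data d1 opened two line histories and made one modification, while d2 opened one and made two.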
\[
\mathrm{Surv\_Add}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_1^{a,i}.\mathit{devel} = d \text{ and } \forall\, o_s^{a,i} \text{ with } s > 1, \\
  & (o_s^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_s^{a,i}.\mathit{devel} = d); \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\mathrm{Surv\_Mod}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_{end}^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_{end}^{a,i}.\mathit{devel} = d \\
  & \text{and } \exists\, w,\ 1 \le w < |hl_i^a|, \text{ such that } o_w^{a,i}.\mathit{devel} \ne d; \\
0 & \text{otherwise.}
\end{cases}
\]
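The two code-survival metrics, Surv_Add and Surv_Mod, can be sketched in the same style. This is a minimal reading of the definitions, assuming that line histories are ordered lists of operations and that the removed files A^r are given as a set of names. The `Op` record, the `surviving` helper and the sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one operation on a line.
Op = namedtuple("Op", ["devel", "type"])

# Hypothetical toy project: file name -> list of line histories.
histories = {
    "app.rb": [
        [Op("d1", "ADD")],                                    # never touched again
        [Op("d1", "ADD"), Op("d1", "MOD")],                   # only d1 touched it
        [Op("d1", "ADD"), Op("d2", "MOD")],                   # d2 changed d1's line
        [Op("d2", "ADD"), Op("d1", "MOD"), Op("d2", "MOD")],  # d2 had the last word
    ],
    "old.rb": [[Op("d1", "ADD")]],
}
removed_files = {"old.rb"}  # plays the role of A^r


def surviving(histories, removed_files):
    """Yield the line histories of files that reached the final version."""
    for name, line_histories in histories.items():
        if name not in removed_files:
            yield from line_histories


def surv_add(histories, removed_files, d):
    """Lines added by d where every later operation is a MOD by d."""
    return sum(
        1
        for hl in surviving(histories, removed_files)
        if hl
        and hl[0].devel == d
        and all(op.type == "MOD" and op.devel == d for op in hl[1:])
    )


def surv_mod(histories, removed_files, d):
    """Lines whose last operation is a MOD by d and that someone else touched before."""
    return sum(
        1
        for hl in surviving(histories, removed_files)
        if hl
        and hl[-1].type == "MOD"
        and hl[-1].devel == d
        and any(op.devel != d for op in hl[:-1])
    )


print(surv_add(histories, removed_files, "d1"), surv_mod(histories, removed_files, "d2"))
```

Note that the line in `old.rb` does not count toward d1's survival, because the file belongs to the removed set.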
\[
\mathrm{Surv\_Add\_Div\_Effo\_Add}(d) = \frac{\mathrm{Surv\_Add}(d)}{\mathrm{Effo\_Add}(d)}
\]
B. Uncovering and measuring relationships between developers
Relationships are measured for the operation pair ADD_MOD and also for ADD_DEL, MOD_MOD and MOD_DEL.
\[
\mathrm{Line\_Add\_Mod}(x, y) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } |hl_i^a| > 1 \\
  & \text{and } o_1^{a,i}.\mathit{devel} = x \text{ and } o_1^{a,i}.\mathit{type} = \mathrm{ADD} \\
  & \text{and } o_2^{a,i}.\mathit{devel} = y \text{ and } o_2^{a,i}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathrm{Line\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{Line\_Add\_Mod}(d, y)
\]

\[
\mathrm{Line\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{Line\_Add\_Mod}(x, d)
\]
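A small sketch of the pairwise line metric and its two aggregates follows. It assumes the same representation of line histories as ordered lists of operations. The `Op` record, the function names and the sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one operation on a line.
Op = namedtuple("Op", ["devel", "type"])

# Hypothetical toy project with a single file and five line histories.
histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD"), Op("d3", "MOD"), Op("d2", "MOD")],
        [Op("d1", "ADD"), Op("d1", "MOD")],
        [Op("d2", "ADD"), Op("d1", "MOD")],
        [Op("d3", "ADD")],
    ],
}
developers = {"d1", "d2", "d3"}


def line_add_mod(histories, x, y):
    """Lines added by x whose second operation is a modification by y."""
    return sum(
        1
        for line_histories in histories.values()
        for hl in line_histories
        if len(hl) > 1
        and hl[0].devel == x and hl[0].type == "ADD"
        and hl[1].devel == y and hl[1].type == "MOD"
    )


def line_add_sum_mod(histories, developers, d):
    """Lines added by d and first modified by any other developer."""
    return sum(line_add_mod(histories, d, y) for y in developers if y != d)


def line_sum_add_mod(histories, developers, d):
    """Lines added by any other developer and first modified by d."""
    return sum(line_add_mod(histories, x, d) for x in developers if x != d)


print(line_add_sum_mod(histories, developers, "d1"))
print(line_sum_add_mod(histories, developers, "d1"))
```

Only the first two operations of each line history matter here, so the later change by d2 on the second line is not attributed to the pair (d1, d2).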
C. Extending the metrics for the file level
A project revision is a triple (r, d, L), where:
r is the label of the revision,
d is an identifier of the developer who made the revision,
with d ∈ D, and
L is a list of pairs (a, t), where a is a file and t ∈ {A, M, D}
describes the operation (addition, modification or deletion).
A project revision sequence is a sequence
S = ⟨(r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m)⟩ of project revisions
that represents the history of changes made to the files of P
without going into detail about the changes made to their
individual lines.
\[
\mathrm{File\_Add\_Mod}(x, y) = \sum_{a \in A}
\begin{cases}
1 & \text{if there are triples } (r_i, d_i, L_i) \text{ and } (r_j, d_j, L_j) \text{ in } S, \text{ with } i < j, \\
  & \text{such that } d_i = x,\ d_j = y,\ (a, A) \in L_i \text{ and } (a, M) \in L_j, \\
  & \text{and for which there is no triple } (r_k, d_k, L_k) \text{ with } i < k < j \\
  & \text{such that } (a, t) \in L_k \text{ for any operation of type } t; \\
0 & \text{otherwise.}
\end{cases}
\]
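The file-level definition can be read as follows: a file counts once if x added it and the very next revision touching it is a modification by y. The sketch below implements that reading over a revision sequence stored as a list of (label, developer, changes) triples. The sample revisions and file names are hypothetical.

```python
# Hypothetical revision sequence S: (label, developer, [(file, op), ...]),
# with op in {"A", "M", "D"} for addition, modification and deletion.
revisions = [
    ("r1", "d1", [("a.rb", "A"), ("b.rb", "A")]),
    ("r2", "d2", [("a.rb", "M")]),
    ("r3", "d3", [("b.rb", "M")]),
    ("r4", "d2", [("b.rb", "M"), ("c.rb", "A")]),
    ("r5", "d2", [("c.rb", "M")]),
]


def file_add_mod(revisions, x, y):
    """Count files added by x whose next change is a modification by y."""
    touches = {}  # file -> ordered list of (developer, op) over the whole history
    for _label, devel, changes in revisions:
        for path, op in changes:
            touches.setdefault(path, []).append((devel, op))
    return sum(
        1
        for seq in touches.values()
        # Neighbours in seq have no other revision touching the file in between.
        if any(
            first == (x, "A") and second == (y, "M")
            for first, second in zip(seq, seq[1:])
        )
    )


print(file_add_mod(revisions, "d1", "d2"))
print(file_add_mod(revisions, "d1", "d3"))
```

Each file contributes at most 1, matching the indicator inside the sum, even if it is added and modified several times.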
\[
\mathrm{File\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{File\_Add\_Mod}(d, y)
\]

\[
\mathrm{File\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{File\_Add\_Mod}(x, d)
\]
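D. Metrics regarding commits

\[
\mathrm{Commits}(x, y) = \sum_{i=1}^{|S|-1}
\begin{cases}
1 & \text{if triples } (r_i, d_i, L_i) \text{ and } (r_{i+1}, d_{i+1}, L_{i+1}) \\
  & \text{are such that } d_i = x \text{ and } d_{i+1} = y; \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\Sigma\mathrm{Commits}(d) = \sum_{i=1}^{|S|}
\begin{cases}
1 & \text{if triple } (r_i, d_i, L_i) \text{ is such that } d_i = d; \\
0 & \text{otherwise.}
\end{cases}
\]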
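Both commit metrics reduce to simple passes over the revision sequence, as the sketch below suggests. The function names and the sample revisions are hypothetical; the changed-file lists are left empty because these metrics ignore them.

```python
# Hypothetical revision sequence S: (label, developer, changes).
revisions = [
    ("r1", "d1", []),
    ("r2", "d2", []),
    ("r3", "d3", []),
    ("r4", "d2", []),
    ("r5", "d2", []),
]


def commits_between(revisions, x, y):
    """Number of times a commit by x is immediately followed by a commit by y."""
    return sum(
        1
        for (_, d_i, _), (_, d_next, _) in zip(revisions, revisions[1:])
        if d_i == x and d_next == y
    )


def total_commits(revisions, d):
    """Number of commits made by d."""
    return sum(1 for _, devel, _ in revisions if devel == d)


print(commits_between(revisions, "d3", "d2"))
print(total_commits(revisions, "d2"))
```

Pairing the list with itself shifted by one position visits exactly the |S| − 1 consecutive pairs in the first sum.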
\[
\mathrm{Metric\_Rel}(d) = \frac{\mathrm{Metric}(d)}{\sum_{x \in D} \mathrm{Metric}(x)}
\]
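To make the relative form concrete, the sketch below applies it to the commit counts per developer reported in the case-study table of this presentation. The function name `metric_rel` and the choice to return 0.0 when the project-wide total is zero are assumptions made for the example.

```python
# Commit counts per developer, as reported in the case-study table.
commits = {
    "d1": 474, "d2": 159, "d3": 2, "d4": 170, "d5": 30, "d6": 99,
    "d7": 61, "d8": 183, "d9": 20, "d10": 24, "d11": 72,
}


def metric_rel(values, d):
    """Share of developer d in the project-wide total of one metric."""
    total = sum(values.values())
    if total == 0:
        # The ratio is undefined when nobody scored; returning 0.0 is a choice made here.
        return 0.0
    return values[d] / total


shares = {d: metric_rel(commits, d) for d in commits}
print(f"d1: {shares['d1']:.3f}")  # 474 / 1294, about 0.366
```

By construction the shares of all developers add up to 1, which makes metrics of very different magnitudes easier to compare.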
Comparison of the developers
A. Performance-based hierarchy
All metrics should have the same orientation (for example, a higher value always indicating better performance).
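The slide states only the orientation requirement and does not spell out how the hierarchy is computed. One plausible reading, given that the interview form later speaks of dominance classes, is to peel off successive groups of developers who are not dominated by anyone still remaining. The sketch below does this on hypothetical scores where higher is better. It is an assumption, not the authors' procedure, and it is not claimed to reproduce the classes reported in the case study.

```python
def dominates(u, v):
    """True if u is at least as good as v on every metric and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def dominance_layers(scores):
    """Split developers into successive layers of mutually non-dominated members."""
    remaining = dict(scores)
    layers = []
    while remaining:
        # A developer enters the current layer if no remaining developer dominates them.
        front = sorted(
            name
            for name, vec in remaining.items()
            if not any(dominates(other, vec) for other in remaining.values())
        )
        layers.append(front)
        for name in front:
            del remaining[name]
    return layers


# Hypothetical developers and metric vectors, all oriented so that higher is better.
scores = {
    "p": (10, 0.90),
    "q": (8, 0.95),
    "r": (5, 0.50),
    "s": (1, 0.10),
}

print(dominance_layers(scores))
```

If one metric were oriented the other way (lower is better), negating it before the call would restore a common orientation.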
B. Similarity Comparison
The case study
Evaluating the metrics and the comparison approaches through a
qualitative assessment on a real software-development project.
The software Weby
A content management system built by UFG.
Hosts more than 400 internal web sites¹.
Period considered: 1 year and 7 months.
Eleven (11) developers contributed to the evolution of the
source code.
One developer was also the project manager.
1,294 code revisions in UFG's VCS (Subversion).
¹ Available at https://github.com/cercomp/weby.
D.      Commits  Files Add.  Files Mod.  Files Del.  Lines Add.  Lines Mod.  Lines Del.
d1          474         482       1,807          64     110,204       7,026      54,710
d2          159          47         453           4       4,340       1,531       1,587
d3            2           0           6           0          26          31         165
d4          170         314         585          12      44,013       1,577       1,224
d5           30          43          78           1       1,736         142         205
d6           99         333         367          17      51,673       1,548       3,220
d7           61          12         379          15       1,116         923       1,214
d8          183         848         783          29      85,686       4,688       5,289
d9           20           1          34           0         102         398          15
d10          24           8          74           5         542         196         476
d11          72           7         199           4       1,190         489         308
Total     1,294       2,095       4,765         151     300,628      18,549      68,413
The evaluation was conducted through two assessments
involving four steps each:
1. Calculation of the values of a set of metrics for all
   developers.
2. Computation of the hierarchy of classes and the MDS
   visualization.
3. Interview with the project manager, aiming to verify whether the
   classes and the visualization produced by the comparison
   approaches match his/her perception of the developers.
4. Analysis and interpretation of the results obtained from the
   interview.
Interview Form
Interviewee name:
Project name:
Position:
Education:
Place and date:
1. Explain the existing data and the metrics. (Explain what the developed system does.)
2. Present the classification by dominance class. (Explain the meaning of each class.)
3. Questions about the dominance classes.
   a) "Does this separation make sense to you?"
   b) "If you were to choose one or more developers for a future project, would this classification help? Why? Which
   developers would you choose?"
   c) "Would you classify the developers in this same way? Why? If not, what would your classification be?"
   d) "Is there any developer who you think was classified incorrectly?"
4. Present the MDS visualization. (Explain what the distance between two developers means.)
5. Questions about the MDS visualization.
   e) "Are the developers who are close to each other in fact similar in their technical output? Do they produce similar
   results?"
   f) "How would you label (name, based on some similarity characteristic) the 'groups' of people who are
   visibly close?"
   g) "Is there any discrepancy or similarity between the results of the dominance classes presented earlier and the
   current MDS visualization?"
6. Questions about the full set of metrics.
   h) "Do you agree that the higher the value obtained in each of these 4 metrics, the better the developer's
   performance? Why?"
   i) "Which other metrics (from the complete spreadsheet) do you find interesting/useful for evaluating the developers?
   Why?"
A. Metrics and comparisons computed in the first assessment
D.      Surv_Add  Surv_Mod  Surv_Add_Div_Effo_Add  Surv_Mod_Div_Effo_Dist_Mod
d1       102,817       539                  0.932                       0.253
d2         3,188       294                 *0.734                      *0.609
d3             0         0                  0.000                       0.000
d4        41,929       410                  0.952                       0.455
d5         1,185        21                 *0.682                      *0.437
d6        50,630       479                  0.979                      *0.807
d7           483       163                 *0.432                      *0.612
d8        83,409     1,302                  0.973                       0.632
d9            55       211                 *0.539                      *0.875
d10          225        43                 *0.415                      *0.605
d11        1,053       315                 *0.884                      *0.734
Equivalence Classes   Developers
1                     d1, d6, d8
2                     d4
3                     d2, d11
4                     d5, d7, d9
5                     d10
6                     d3
Equivalence Classes   Developers [first]   Developers [second]
1                     d1, d6, d8           d1, d6, d4, d8
2                     d4                   d2, d11
3                     d2, d11              d5, d7, d9
4                     d5, d7, d9           d10
5                     d10                  d3
6                     d3
Conclusion
We presented new formal definitions and metrics that allow
the extraction of basic but important information from
projects hosted in VCSs.
We considered measures of effort and code-survival.
Two approaches were suggested for comparing the
developers.
A case study with a real software project was carried out.
The results showed the usefulness of the metrics and of
the comparison approaches.
The new metrics may help to unveil interesting facts.
But there are limitations in the use of VCS data. The logs
are in general incomplete and can lead to ambiguous
interpretation.
We tried to compensate for this weakness by involving the
project manager.
Future Work
Future investigations include:
formulating new metrics;
using other techniques to compare the developers;
improving the diff analysis for detecting other types of
operation;
exploring more sources of data.
Questions?
References
Enrico Di Bella, Alberto Sillitti, and Giancarlo Succi. A
multivariate classification of open source developers.
Information Sciences, 221(0):72–83, February 2013. ISSN
0020-0255. doi: 10.1016/j.ins.2012.09.031.
Eric Gilbert and Karrie Karahalios. Codesaw: A social
visualization of distributed software development. In
Proceedings of the 11th IFIP TC 13 International Conference
on Human-computer Interaction - Volume Part II,
INTERACT’07, pages 303–316, Berlin, Heidelberg, 2007.
Springer-Verlag. ISBN 3-540-74799-0, 978-3-540-74799-4.
Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane
Ducasse. How Developers Drive Software Evolution. In
Proceedings of the Eighth International Workshop on
Principles of Software Evolution, IWPSE’05, pages 113–122,
Washington, DC, USA, 2005. IEEE Computer Society. ISBN
0-7695-2349-8. doi: 10.1109/IWPSE.2005.21.
Shih-Kun Huang and Kang-min Liu. Mining version histories to
verify the learning process of legitimate peripheral
participants. SIGSOFT Software Engineering Notes, 30(4):
1–5, May 2005. ISSN 0163-5948. doi:
10.1145/1082983.1083158.
Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi.
Mining and visualizing developer networks from version
control systems. In Proceedings of the 4th International
Workshop on Cooperative and Human Aspects of Software
Engineering, CHASE ’11, pages 24–31, New York, NY, USA,
2011. ACM. ISBN 978-1-4503-0576-1. doi:
10.1145/1984642.1984647.
Luis Lopez-Fernandez, Gregorio Robles, and Jesus M.
Gonzalez-Barahona. Applying Social Network Analysis to the
Information in CVS Repositories. In First International
Workshop on Mining Software Repositories, pages 101–105,
2004.
Shawn Minto and Gail C. Murphy. Recommending emergent
teams. In Proceedings of the Fourth International Workshop
on Mining Software Repositories, MSR ’07, page 5,
Washington, DC, USA, 2007. IEEE Computer Society. ISBN
0-7695-2950-X. doi: 10.1109/MSR.2007.27.
Audris Mockus and James D. Herbsleb. Expertise browser: A
quantitative approach to identifying expertise. In Proceedings
of the 24th International Conference on Software Engineering,
ICSE ’02, pages 503–512, New York, NY, USA, 2002. ACM.
ISBN 1-58113-472-X. doi: 10.1145/581339.581401.
Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E.
Johnson, and Danny Dig. Is It Dangerous to Use Version
Control Histories to Study Source Code Evolution? In James
Noble, editor, ECOOP 2012 - Object-Oriented Programming,
volume 7313 of Lecture Notes in Computer Science, pages
79–103. Springer Berlin Heidelberg, 2012. ISBN
978-3-642-31056-0. doi: 10.1007/978-3-642-31057-7_5.
David Schuler and Thomas Zimmermann. Mining usage
expertise from version archives. In Proceedings of the 2008
International Working Conference on Mining Software
Repositories, MSR ’08, pages 121–124, New York, NY, USA,
2008. ACM. ISBN 978-1-60558-024-1. doi:
10.1145/1370750.1370779.
L Voinea, J Lukkien, and A Telea. Visual Assessment of
Software Evolution. Science of Computer Programming, 65
(3):222–248, April 2007. ISSN 01676423.
Lucian Voinea and Alexandru Telea. An Open Framework for
CVS repository Querying, Analysis and Visualization. In
Proceedings of the 2006 international workshop on Mining
software repositories - MSR’06, pages 33–39, New York, NY,
USA, May 20-28 2006. ACM Press. ISBN 1595933972. doi:
10.1145/1137983.1137993.
Shen Zhang, Yongji Wang, and Junchao Xiao. Mining Individual
Performance Indicators in Collaborative Development Using
Software Repositories. In Software Engineering Conference,
2008. APSEC ’08. 15th Asia-Pacific, pages 247–254,
December 2008a. doi: 10.1109/APSEC.2008.12.
Shen Zhang, Yongji Wang, Ye Yang, and Junchao Xiao.
Capability assessment of individual software development
processes using software repositories and DEA. In
Proceedings of the Software Process, 2008 International
Conference on Making Globally Distributed Software
Development a Success Story, ICSP’08, pages 147–159,
Berlin, Heidelberg, 2008b. Springer-Verlag. ISBN
3-540-79587-1, 978-3-540-79587-2.