Congresso Brasileiro de Software: Teoria e Prática
28 de setembro a 03 de outubro de 2014 – Maceió/AL
Anais
SESSÃO DE FERRAMENTAS 2014
XXI Sessão de Ferramentas
Volume 02
ISSN 2178-6097
COORDENADORES DO COMITÊ DE PROGRAMA
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Camargo - Universidade Federal de São Carlos (UFSCar)
COORDENAÇÃO DO CBSOFT 2014
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)
REALIZAÇÃO
Universidade Federal de Alagoas (UFAL)
Instituto de Computação (IC/UFAL)
PROMOÇÃO
Sociedade Brasileira de Computação (SBC)
PATROCÍNIO
CAPES, CNPq, INES, Google
APOIO
Instituto Federal de Alagoas, Aloo Telecom, Springer, Secretaria de Estado do Turismo AL, Maceió Convention & Visitors Bureau, Centro Universitário CESMAC e Mix Cópia
PROCEEDINGS
Volume 02
ISSN 2178-6097
TOOLS 2014
XXI Tools Session
PROGRAM CHAIRS
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Camargo - Universidade Federal de São Carlos (UFSCar)
CBSOFT 2014 GENERAL CHAIRS
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)
ORGANIZATION
Universidade Federal de Alagoas (UFAL)
Instituto de Computação (IC/UFAL)
PROMOTION
Sociedade Brasileira de Computação (SBC)
SPONSORS
CAPES, CNPq, INES, Google
SUPPORT
Instituto Federal de Alagoas, Aloo Telecom, Springer, Secretaria de Estado do Turismo AL, Maceió Convention & Visitors Bureau, Centro Universitário CESMAC and Mix Cópia
Autorizo a reprodução parcial ou total desta obra, para fins acadêmicos, desde que citada a fonte
Apresentação
A Sessão de Ferramentas é um evento bastante tradicional da comunidade de
software brasileira, sendo esta sua 21ª. edição. Até 2009, a Sessão de Ferramentas era
realizada como evento satélite de dois simpósios: o SBES (Simpósio Brasileiro de
Engenharia de Software) e o SBCARS (Simpósio Brasileiro de Componentes,
Arquiteturas e Reutilização de Software). Em 2010, o CBSoft foi criado, passando a
englobar SBES, SBCARS e SBLP (Simpósio Brasileiro de Linguagens de Programação)
em um único congresso. Em 2011, o SBMF (Simpósio Brasileiro de Métodos Formais)
foi também incorporado ao CBSoft. Dessa forma, o escopo da sessão de ferramentas
foi ampliado, aceitando contribuições dessas quatro comunidades. Nesta 21ª.
edição, a sessão de ferramentas foi realizada novamente no âmbito do CBSoft, nos
dias 01 e 02 de outubro na cidade de Maceió, Alagoas, Brasil.
O comitê de programa foi composto por 49 membros de diferentes
universidades brasileiras e estrangeiras, cobrindo diferentes áreas de pesquisa da
engenharia de software. Foram recebidas 31 submissões de artigos de diferentes
programas de pós-graduação do Brasil. Cada artigo foi revisto por três membros do
comitê de programa, gerando um total de 93 excelentes revisões, o que contribuiu
imensamente no processo de seleção dos artigos. Durante o processo de revisão, uma
etapa de consenso e uma de rebuttal foram realizadas, melhorando o consenso entre
revisores e dando aos autores a oportunidade de responderem questões levantadas
nas revisões. Um diferencial desta edição foi a exigência de um vídeo sobre a
ferramenta, eliminando a tradicional necessidade de deixar a ferramenta disponível
para download.
Como resultado do processo de revisão, 16 ferramentas foram selecionadas
para serem publicadas nos anais e apresentadas na conferência (taxa de
aceitação = 51%). Os artigos selecionados abordam as seguintes áreas da engenharia
de software: arquitetura de software; modularidade e refatoração; mineração de
repositórios; testes de software e métodos formais; linhas de produtos de software, e
processos de software e negócio.
O sucesso da Sessão de Ferramentas do CBSoft 2014 somente foi possível por
causa da dedicação e entusiasmo de muitas pessoas. Primeiramente, gostaríamos
de agradecer aos autores que submeteram seus trabalhos. Gostaríamos de
agradecer também aos membros do comitê de programa e revisores externos,
pelo excelente trabalho de revisão e participação ativa nas discussões. Também
agradecemos à organização geral do CBSoft, representada por Leandro Dias da Silva
(UFAL), Baldoino Fonseca (UFAL) e Márcio Ribeiro (UFAL), que foi essencial para o
ótimo andamento deste evento.
Esperamos que você aprecie o programa técnico da 21ª. Sessão de Ferramentas do 5º
CBSoft 2014.
Maceió, Outubro de 2014
Prof. Dr. Uirá Kulesza
Prof. Dr. Valter Vieira de Camargo
Coordenadores da 21ª. Sessão de Ferramentas do CBSoft 2014
Foreword
The Tools Session is one of the most traditional events of the Brazilian software community,
and this is its 21st edition. Until 2009, the Tools Session had been held as a satellite
event of two well known Brazilian Symposiums: SBES (Brazilian Symposium on
Software Engineering) and SBCARS (Brazilian Symposium on Software Components,
Architectures and Reuse). In 2010, a new conference series named CBSoft was
initiated, putting together SBES, SBCARS and also SBLP (Brazilian Symposium on
Programming Languages). In 2011, SBMF (Brazilian Symposium on Formal Methods)
was also incorporated. As a consequence, the scope of the Tools Session was
broadened by accepting contributions from these four communities. In this 21st
edition, the Tools Session was held again under the CBSoft conference, on October 1st
and 2nd, in Maceió, Alagoas, Brazil.
The program committee involved 49 members from Brazilian and
international universities, covering the main research areas of software engineering. We
have received 31 submissions from different graduate programs in Brazil. Each paper
was reviewed by 3 (three) members of the program committee, resulting in 93
excellent reviews that contributed to the selection process of the tools. Consensus
and rebuttal phases were also conducted during the reviewing process, leading to a
better consensus among reviewers and giving the authors the opportunity to
elucidate unclear points. A remarkable characteristic of this edition was the
requirement to make a video describing the tool available together with the
submission, instead of the traditional necessity of making the tool available for
download.
As a result of the reviewing process, 16 tools were selected to be
included in these proceedings and presented at the conference (acceptance rate =
51%). The selected tools encompass the following software engineering areas:
software architecture; modularity and refactoring; mining software repositories;
software testing and formal methods; software product lines; and business and software
processes.
The success of CBSoft Tools Session 2014 was only possible because of the
dedication and enthusiasm of many people. First of all, we would like to thank
the authors for submitting their papers. We would also like to thank the Program
Committee members and external reviewers for the excellent reviewing work and the
active participation in the discussions. We also thank the CBSoft general organization,
represented by Leandro Dias da Silva (UFAL), Baldoino Fonseca (UFAL), and Márcio
Ribeiro (UFAL), which was fundamental to the organization and success of this event.
We hope you enjoy the technical program of the 21st CBSoft Tools Session 2014.
Prof. Dr. Uirá Kulesza
Prof. Dr. Valter Vieira de Camargo
Chairs of the CBSoft 2014 Tools Session
Biografia dos Coordenadores / Chairs Short Biographies
Uirá Kulesza
Uirá Kulesza is an Associate Professor at the Department of Informatics and
Applied Mathematics (DIMAp), Federal University of Rio Grande do Norte (UFRN),
Brazil. He obtained his PhD in Computer Science at PUC-Rio – Brazil (2007), in
cooperation with University of Waterloo and Lancaster University. His main
research interests include software architecture, modularity, and software
product lines. He has co-authored over 150 refereed papers in journals,
conferences, and books. He is currently a visiting researcher at the Software
Engineering Research Group (SERG) at Delft University of Technology (TU Delft). He
worked as a post-doc researcher member of the AMPLE project (2007-2009) –
Aspect-Oriented Model-Driven Product Line Engineering (www.ample-project.net)
at the New University of Lisbon, Portugal. He is currently a CNPq research fellow
level 2.
Valter Vieira de Camargo
Valter Vieira de Camargo is an Associate Professor at the Computing Department
of the Federal University of São Carlos, Brazil (DC/UFSCar).
Currently, he is the head of the AdvanSE (Advanced Research on Software
Engineering) group in this department. He obtained his PhD in Computer Science at
ICMC/USP in 2006 and his Master's degree at DC/UFSCar in 2001. During
2013, he worked as an invited researcher in the ENOFES Project, at the Computing
Department of the University of Twente, Netherlands. His main research interests
are Software Modernization, Software Modularity and Software Reuse (frameworks
and product lines). He has co-authored over 110 refereed papers in journals,
conferences and books.
Comitês Técnicos / Program Committee
Comitê do programa / Program Committee
Adenilso Simao - Universidade de São Paulo (USP)
Alexandre Mota - Universidade Federal de Pernambuco (UFPE)
Anamaria Martins Moreira - Universidade Federal do Rio de Janeiro (UFRJ)
André Santos - Universidade Federal de Pernambuco (UFPE)
Arilo Dias Neto - Universidade Federal do Amazonas (UFAM)
Cecilia Rubira - Universidade Estadual de Campinas (UNICAMP)
Cláudio Sant'Anna - Universidade Federal da Bahia (UFBA)
Daniel Lucrédio - Universidade Federal de São Carlos (UFSCar)
David Déharbe - Universidade Federal do Rio Grande do Norte (UFRN)
Delano Beder - Universidade Federal de São Carlos (UFSCar)
Eduardo Almeida - Universidade Federal da Bahia (UFBA)
Eduardo Figueiredo - Universidade Federal de Minas Gerais (UFMG)
Elder José Cirilo - Universidade Federal de São João del-Rei (UFSJ)
Elisa Huzita - Universidade Estadual de Maringá (UEM)
Fabiano Ferrari - Universidade Federal de São Carlos (UFSCar)
Fernando Castor - Universidade Federal de Pernambuco (UFPE)
Fernando Trinta - Universidade Federal do Ceará (UFC)
Frank Siqueira - Universidade Federal de Santa Catarina (UFSC)
Franklin Ramalho - Universidade Federal de Campina Grande (UFCG)
Glauco Carneiro - Universidade Salvador (UNIFACS)
Gledson Elias - Universidade Federal da Paraíba (UFPB)
Ingrid Nunes - Universidade Federal do Rio Grande do Sul (UFRGS)
Leila Silva - Universidade Federal de Sergipe (UFS)
Lile Hattori - Microsoft Research
Luis Ferreira Pires - University of Twente, Netherlands
Marcel Oliveira - Universidade Federal do Rio Grande do Norte (UFRN)
Marcelo d'Amorim - Universidade Federal de Pernambuco (UFPE)
Marcelo Augusto Santos Turine - Universidade Federal de Mato Grosso do Sul (UFMS)
Marco Tulio Valente - Universidade Federal de Minas Gerais (UFMG)
Maria Istela Cagnin - Universidade Federal de Mato Grosso do Sul (UFMS)
Márcio Cornélio - Universidade Federal de Pernambuco (UFPE)
Nabor Mendonca - Universidade de Fortaleza (UNIFOR)
Otavio Lemos - Universidade Federal de São Paulo (UNIFESP)
Patricia Machado - Universidade Federal de Campina Grande (UFCG)
Paulo Maciel - Universidade Federal de Pernambuco (UFPE)
Paulo Pires - Universidade Federal do Rio de Janeiro (UFRJ)
Pedro Santos Neto - Universidade Federal do Piauí (UFPI)
Raphael Camargo - Universidade Federal do ABC (UFABC)
Ricardo Lima - Universidade Federal de Pernambuco (UFPE)
Rita Suzana Pitangueira Maciel - Universidade Federal da Bahia (UFBA)
Roberta Coelho - Universidade Federal do Rio Grande do Norte (UFRN)
Rohit Gheyi - Universidade Federal de Campina Grande (UFCG)
Rosana Braga - Universidade de São Paulo (USP)
Rosângela Penteado - Universidade Federal de São Carlos (UFSCar)
Sandra Fabbri - Universidade Federal de São Carlos (UFSCar)
Sérgio Soares - Universidade Federal de Pernambuco (UFPE)
Tayana Conte - Universidade Federal do Amazonas (UFAM)
Tiago Massoni - Universidade Federal de Campina Grande (UFCG)
Vander Alves - Universidade de Brasília (UnB)
Avaliadores Externos / Additional Reviewers
Alex Alberto - Universidade de São Paulo (USP)
Davi Viana - Universidade Federal do Amazonas (UFAM)
Heitor Costa - Universidade Federal de Lavras (UFLA)
Jacilane Rabelo - Universidade Federal do Amazonas (UFAM)
Ricardo Terra - Universidade Federal de Lavras (UFLA)
Comitê organizador / Organizing Committee
COORDENAÇÃO GERAL
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)
COMITÊ LOCAL
Adilson Santos - Centro Universitário Cesmac (CESMAC)
Elvys Soares - Instituto Federal de Alagoas (IFAL)
Francisco Dalton Barbosa Dias - Universidade Federal de Alagoas (UFAL)
COORDENADORES DO COMITÊ DE PROGRAMA DA SESSÃO DE FERRAMENTAS
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Vieira de Camargo - Universidade Federal de São Carlos (UFSCar)
Índice / Table of Contents
ArchViz: a Tool to Support Architecture Recovery Research
Vanius Zapalowski, Ingrid Nunes e/and Daltro Nunes

Uma Ferramenta para Verificação de Conformidade Visando Diferentes Percepções de Arquiteturas de Software
Izabela Melo, Dalton Serey e/and Marco Túlio Valente

JExtract: An Eclipse Plug-in for Recommending Automated Extract Method Refactorings
Danilo Silva, Ricardo Terra e/and Marco Túlio Valente

ModularityCheck: A Tool for Assessing Modularity using Co-Change Clusters
Luciana Silva, Daniel Félix, Marco Túlio Valente e/and Marcelo de Almeida Maia

Nuggets Miner: Assisting Developers by Harnessing the StackOverflow Crowd Knowledge and the GitHub Traceability
Eduardo Campos, Lucas Batista Leite de Souza e/and Marcelo de Almeida Maia

NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems
Henrique Rocha, Guilherme Oliveira, Humberto Marques e/and Marco Túlio Valente

FunTester: A fully automatic functional testing tool
Thiago Pinto e/and Arndt von Staa

JMLOK2: A tool for detecting and categorizing nonconformances
Alysson Milanez, Dennis de Sousa, Tiago Massoni e/and Rohit Gheyi

A Rapid Approach for Building a Semantically Well Founded Circus Model Checker
Alexandre Mota e/and Adalberto Farias

SPLConfig: Product Configuration in Software Product Line
Lucas Machado, Juliana Pereira, Lucas Garcia e/and Eduardo Figueiredo

SPLICE: Software Product Lines Integrated Construction Environment
Bruno Cabral, Tassio Vale e/and Eduardo Almeida

FlexMonitorWS: uma solução para monitoração de serviços Web com foco em atributos de QoS
Rômulo Franco, Cecilia Rubira e/and Amanda Nascimento

A Code Smell Detection Tool for Compositional-based Software Product Lines
Ramon Abílio, Gustavo Vale, Johnatan Oliveira, Eduardo Figueiredo e/and Heitor Costa

AccTrace: Considerando Acessibilidade no Processo de Desenvolvimento de Software
Rodrigo Branco, Maria Istela Cagnin e/and Debora Paiva

Spider-RM: Uma Ferramenta para Auxílio ao Gerenciamento de Riscos em Projetos de Software
Heresson Mendes, Bleno Silva, Diego Abreu, Diogo Ferreira, Manoel Victor Leite, Marcos Leal e/and Sandro Oliveira

A Tool to Generate Natural Language Text from Business Process Models
Raphael Rodrigues, Leonardo Azevedo, Kate Revoredo e/and Henrik Leopold
ArchViz: a Tool to Support Architecture Recovery Research
Vanius Zapalowski, Ingrid Nunes, and Daltro José Nunes
Prosoft Research Group – Instituto de Informática
Universidade Federal do Rio Grande do Sul, Brazil
{vzapalowski,ingridnunes,daltro}@inf.ufrgs.br
Abstract. In order to produce documented software architectures, many software architecture recovery methods have been proposed. Developing such
methods involves non-trivial data analysis, which calls for different data
visualisations that help compare predicted and target software architectures.
Moreover, comparing methods is also difficult, because they use divergent measurements to evaluate their performance. With the goal of improving and supporting architecture recovery research, we developed the ArchViz tool, which
is presented in this paper. Our tool provides metrics and visualisations of software architectures, supporting the analysis of the output of architecture recovery
methods, and possibly the standardisation of their evaluation and comparison.
Video link: http://youtu.be/Gjo5cOzk4kM.
1. Introduction
An explicitly documented software architecture plays a key role in software development,
as it keeps track of many design decisions and helps maintain consistency in the developed
software. It provides useful knowledge to deal with software evolution according to
planned architectural principles captured by a high-level model, usually represented
graphically. Despite the importance of having a documented architecture, many
systems lack proper architectural documentation.
Software architecture recovery (SAR) methods aid software architects in the task
of inspecting the source code to understand an implemented system when there is no
architectural documentation available or it is outdated. SAR methods have been proposed to reduce the human effort needed to perform this task. Such methods use different
inputs (e.g., dependencies, semantics, and patterns) and a variety of metrics (e.g., precision, recall, and distance) to produce recovered architectures. Moreover, SAR studies
focus on the measurement of certain properties, lacking a visual representation of their
recovered and target architectures [Ducasse and Pollet 2009]. Consequently, the process
of evaluating and analysing the results of a SAR method is a complex and time-consuming
activity [Garcia et al. 2013], given the combination of possible sources of information,
evaluation metrics, and results analysis.
Different tools have been proposed in the literature to improve software architectures, e.g. [Lindvall and Muthig 2008], and most of them focus on checking architecture
conformance or compliance. ArchViz, the tool introduced in this paper, has instead the goal
of supporting SAR research; this purpose is the key difference from other existing tools
in the context of software architecture.
In previous work [Zapalowski et al. 2014], we faced the problems discussed above in a
study to evaluate the relevance of code-based characteristics to identify modules of recovered architectures. To address such problems, we implemented a web-based tool, named
ArchViz, able to partially automate the analysis of recovered architectures, as no other
similar tool was available. Therefore, our tool emerged from our own (real) need for supporting our research, and its effectiveness is indicated by the research results we were able
to derive from our data analysis with the support of ArchViz. Our tool provides evaluation metrics of recovered software architectures using well-known information retrieval
measures. In addition, our tool generates three visualisations (tree-map, module dependencies graph and element dependencies graph) of recovered (or predicted) and target
architectures in order to help understand the results of the recovery process.
This paper is organised as follows. Section 2 describes ArchViz, presenting its
main features. Next, Section 3 discusses existing tools related to software architecture
visualisation and their support to SAR. Finally, we present the final remarks in Section 4.
2. The ArchViz Tool
This section presents the contributions that ArchViz provides to support SAR research.
Our intended users are researchers, who can compare a target architecture (oracle) with a recovered architecture and verify metrics that indicate the classification effectiveness. First, we describe in Section 2.1 ArchViz's architecture and how to use it,
presenting its user interface. Then, we present the two main features of ArchViz:
(i) the measurement of well-known information retrieval metrics, which are adopted in the
context of general-purpose classification problems and are detailed in Section 2.2; and
(ii) the plotting of three different graphical models of recovered and target architectures, presented in Section 2.3.
In order to illustrate the functionalities of ArchViz, we present the evaluation and analysis of one of the five subject systems used in our previous
work [Zapalowski et al. 2014], named OLIS. Thus, the metrics and visualisations presented in the remainder of this paper are extracted from OLIS, which is an agent-based
product line that provides personal services for users.
2.1. Architecture and User Interface
ArchViz is a web-based application implemented in Ruby using the Ruby on Rails
(RoR) framework (available at http://rubyonrails.org/). Consequently, the architecture of our tool follows the Model–View–
Controller architectural pattern adopted by RoR. In our implementation, the main tasks of
the architectural modules are: the Model represents and stores imported architectures;
the Controller calculates the implemented evaluation metrics; and the View plots the architectural visualisations using the D3 JavaScript library (available at http://d3js.org).
To start using ArchViz, users should import a software project using the Import
Project option available in the menu, which allows users to provide input data. Projects'
details must be specified in two Comma-Separated Values (CSV) files: (i) the first with
information about the module to which each architectural element belongs in the recovered
and in the target architectures; and (ii) the second with all the dependencies between
architectural elements. The adequate format of such files is detailed in the functionality
of importing projects. After importing a project, our tool summarises and presents the
data related to it, as illustrated in Figure 1.
Figure 1. ArchViz User Interface.
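To make the expected inputs concrete, the sketch below shows what the two CSV files could look like for a toy project. The column names and layout here are our own illustrative assumption, not ArchViz's documented format (which the import screen details):

# modules.csv (hypothetical layout): one row per architectural element
element,recovered_module,target_module
EventService,Business,Business
UserDAO,Data,Business
EventView,UI,UI

# dependencies.csv (hypothetical layout): one row per dependency
source,target
EventView,EventService
EventService,UserDAO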
2.2. Architecture Recovery Metrics
We selected general-purpose metrics for evaluating the multi-class predictions of machine learning algorithms, defined by Sokolova and Lapalme [Sokolova and Lapalme 2009], as the
metrics to evaluate the quality of the recovered architecture, because they are also applicable to our context. Using these metrics, we are able to standardise the analysis of
results of SAR methods, given that this set of metrics needs only the recovered and target
architectures to be calculated. The metric definitions are given in Table 1, following this
notation: K is the set of proposed architectural modules; i is a module such that i ∈ K;
|K| represents the cardinality of K; tp_i are the true positives of i; tn_i are the true negatives of i; fp_i are the false positives of i; fn_i are the false negatives of i; and |i| is the
number of elements in the module i. The definition of an element is specific to
each recovery method: it can be a class, a component, a procedure, and so on.
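As a concrete reading of the definitions in Table 1, the following minimal Java sketch computes the five metrics from per-module counts. The types and method names are ours, purely for illustration; ArchViz itself implements these metrics in Ruby.

import java.util.List;

// Illustrative types (not part of ArchViz) holding per-module counts.
class Module {
    int tp, tn, fp, fn, size; // size = |i|, the number of elements in the module
    Module(int tp, int tn, int fp, int fn, int size) {
        this.tp = tp; this.tn = tn; this.fp = fp; this.fn = fn; this.size = size;
    }
}

class RecoveryMetrics {
    // Overall precision: total tp over the total number of classified elements
    static double precision(List<Module> k) {
        double tp = 0, total = 0;
        for (Module m : k) { tp += m.tp; total += m.size; }
        return tp / total;
    }

    // Per-module averages over the |K| proposed modules
    static double avgPrecision(List<Module> k) {
        double s = 0;
        for (Module m : k) s += (double) m.tp / (m.tp + m.fp);
        return s / k.size();
    }

    static double avgRecall(List<Module> k) {
        double s = 0;
        for (Module m : k) s += (double) m.tp / (m.tp + m.fn);
        return s / k.size();
    }

    static double avgAccuracy(List<Module> k) {
        double s = 0;
        for (Module m : k) s += (double) (m.tp + m.tn) / (m.tp + m.tn + m.fp + m.fn);
        return s / k.size();
    }

    static double avgFMeasure(List<Module> k) {
        double p = avgPrecision(k), r = avgRecall(k);
        return 2 * p * r / (p + r);
    }
}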
2.3. Architecture Visualizations
Because of the complexity of large-scale software, it is difficult to represent a system in a single, simple model. Software architecture visualisation helps stakeholders involved in
software development to understand the concepts adopted in their applications using a
high-level representation. Most SAR approaches focus on presenting metrics to
evaluate their results, and they do not provide architectural visualisations that enable a
finer-grained analysis of results. Such visualisations are particularly helpful to researchers, because humans can derive findings from visual models and data abstractions better than machines [Keim et al. 2008]. Therefore, we proposed and implemented three visualisations
that aim to improve the analysis and comparison of recovered and target architectures.
We next present the three visualisations that our tool provides: (i) Tree-map,
which provides a visual analysis of the recovered architecture using a hierarchical representation (Section 2.3.1); (ii) Module Dependencies Graph, which details dependencies
among modules and their respective sizes (Section 2.3.2); and (iii) Element Dependencies
Graph, which presents a fine-grained visualisation of architectural elements showing only
the inter-module dependencies (Section 2.3.3). Note that, in our tool, all visualisations
are shown together with the metrics described previously. Moreover, the last two visualisation types show two graphs corresponding to the recovered and target architectures,
allowing a side-by-side comparison.

Table 1. Metrics implemented in ArchViz.

Precision: measures the correctness of the overall recovered architecture, independently of the sizes of the architectural modules; it considers only the tp of each module.
Formula: $\sum_{i=1}^{|K|} tp_i \,\big/\, \sum_{i=1}^{|K|} |i|$

Average Precision: to evaluate the per-module precision, the average precision measures the agreement between the recovered and the target architecture for each module; it considers only the cases where the recovered classification of the architectural elements agrees with the target architecture.
Formula: $\frac{1}{|K|} \sum_{i=1}^{|K|} \frac{tp_i}{tp_i + fp_i}$

Average Recall: by calculating the average recall, we obtain an average of the per-class effectiveness of a SAR method in identifying architectural modules; it considers the tp and fn of each module.
Formula: $\frac{1}{|K|} \sum_{i=1}^{|K|} \frac{tp_i}{tp_i + fn_i}$

Average Accuracy: measures the correctness of each module and its distinctness from the other modules, evaluating the correctly classified elements (tp and tn) of each recovered module; it is useful to measure the per-module effectiveness of the recovery method.
Formula: $\frac{1}{|K|} \sum_{i=1}^{|K|} \frac{tp_i + tn_i}{tp_i + tn_i + fp_i + fn_i}$

Average F-measure: combines the average precision and average recall into one metric that indicates both the overall correctness and the module prediction quality.
Formula: $\frac{2 \cdot avg\_prec \cdot avg\_rec}{avg\_prec + avg\_rec}$
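As a small worked example with invented numbers (not taken from OLIS): suppose $|K| = 2$, with $tp_1 = 8$, $fp_1 = 2$, $|1| = 10$ and $tp_2 = 9$, $fp_2 = 1$, $|2| = 10$. Then precision $= (8 + 9)/(10 + 10) = 0.85$ and average precision $= (8/10 + 9/10)/2 = 0.85$; the two coincide here only because both recovered modules have the same size.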
2.3.1. Tree-map
The tree-map visualisation is a two-dimensional hierarchy graph created by Shneiderman [Shneiderman 1992] to analyse hard disk usage. Similarly to software architectures,
the disk folder hierarchy represents categories, and files are leaf elements contained in a
folder. A common problem in data visualisation is representing the relevance of more
than two attributes in a single chart. In hard disk usage, for example, files have a
parent folder and a size, which means that we would need separate graphs for file usage
and for folder usage to visualise both attributes using a Cartesian coordinate system.
Shneiderman proposed a tree-organisation structure, where each element is represented
as a rectangle and attributes can be encoded by the colours, sizes, or hierarchy positions of the
rectangles. Then, hard disk usage can be represented in a single graph, where folders
are outer rectangles with their file elements inside. Additionally, rectangle sizes can represent
the amount of disk usage.
Figure 2. Tree-map Visualisation of the OLIS Recovered Architecture.
As in hard disk usage, a software architecture typically has a hierarchical structure:
architectural elements belong to modules. So, we mapped software architectures to the
tree-map representation in order to understand the predicted results of the recovered and
target architectures. We represent both architecture versions in a single graph to visualise
predicted results. Figure 2 is an example of the tree-map visualisation, where the outer
rectangles are the target architectural modules, the inner rectangles are the architectural
elements with their names, and the colours of architectural elements are assigned according
to the recovered module to which they belong. The target module names, shown in the
upper right-hand side of Figure 2, possibly come from a manually recovered architecture. When
the recovered architecture completely matches the target architecture, each
outer rectangle is coloured with only one colour, and each outer rectangle has a colour
different from the others. Figure 2 illustrates a scenario in which the recovered architecture
differs from the target architecture. As can be seen in Figure 2, each outer rectangle's predominant
colour defines its recovered architectural module, i.e., in Figure 2, the lower left rectangle
corresponds to the Data module and the upper left rectangle corresponds to the UI module.
Figure 2 thus indicates the recovered modules and the assignment distribution of
architectural elements to the target modules. Furthermore, this representation confirms
the information provided by the module accuracy: (i) if a single colour is concentrated in
a single module, accuracy is high; and (ii) otherwise, it is low. Additionally, the tree-map
visualisation combines the recovered and target architectures, allowing a visual comparison
of the architectural measures extracted from a SAR method.
2.3.2. Module Dependencies Graph
The module dependencies graph is a coarse-grained view that aims to provide an overall
view of the system. It is similar to the most common notation used to represent architectures, where architectural modules are represented as nodes and communication
among them as edges. This representation improves architecture understanding, because
it exposes the main system concepts and presents them in a concise way, showing both
the architectural modules and how they communicate with each other. Figure 3 shows an
example of this notation, presenting the architecture of OLIS, which uses a layered architectural pattern.
Figure 3. Example of a Typical Architecture Model.
Although this traditional model presents, in a high-level view, the main architectural modules and their communication, it lacks important details needed to compare the
recovered and target architectures. Furthermore, it omits architectural information
that could be represented in an architectural visualisation, such as the intensity of the
dependency between two modules, which is helpful for understanding a recovered architecture. Analysing the representation of the OLIS architecture in Figure 3, it is impossible
to identify the intensity of dependencies among modules. In this usual representation, the
module sizes indicate just the existence of architectural modules and do not correspond to the sizes of the modules in the system.
To enrich the information provided by this traditional module dependencies graph,
we implemented the module visualisation with modifications. The same architecture presented in Figure 3 is represented in ArchViz as shown in Figure 4(a). In ArchViz, the
modules are defined by their size and colour. Their colours characterise each module's role,
and their sizes are proportional to the number of architectural elements that they contain.
Additionally, we add labels to the module nodes with the architectural role and the number
of elements they contain. The edges represent the communication among modules, specifying the dependency hierarchy, i.e., an edge in red means that the red module uses the
module it is linked to. Moreover, the edge thickness is proportional to the dependency
level between the modules, e.g., the dependency between the Agent and Model modules is
stronger than that between the Agent and Business modules, in the presented OLIS architecture.
2.3.3. Element Dependencies Graph
The element dependencies graph is the finest-grained architectural visualisation that
ArchViz provides. It presents the dependencies among architectural elements, classifying them into architectural modules. This visualisation represents elements as nodes, and
the node colours characterise the module to which they belong. The edges are coloured by
the inter-module dependency, similarly to the module dependencies graph. This representation disregards the intra-module dependencies to reduce the number of shown edges;
otherwise, the graph would provide too much information. As an example, we present the
OLIS target architecture using the element dependencies graph in Figure 4(b).
(a) Module Dependencies.
(b) Element Dependencies.
Figure 4. OLIS Graphs.
The graph shows: (i) the five modules, (ii) the inter-module dependencies, and (iii) the 211 elements of
the system.
3. Related Work
Besides previous work we already discussed, which defines adopted metrics and inspired
our visualisations, there are important studies that are closely related to ArchViz. We
discuss these studies in this section.
A metric that is commonly used to evaluate recovered architectures is the MoJo
distance metric [Tzerpos and Holt 1999]. It is a domain-specific metric that measures the
number of steps needed to obtain the target architecture, given a recovered architecture.
It was not implemented in ArchViz due to recently reported problems in its application
to some specific SAR methods [Garcia et al. 2013].
Bunch [Mancoridis et al. 1999] is one of the first tools to support the whole software decomposition process, from the manual investigation to the visualisation of the recovered
architecture. It generates subsystems and creates a fine-grained representation of the
software based on the architectural element dependencies, similar to that presented in
Figure 3. Differently from ArchViz, Bunch's objective is to obtain a recovered architecture. Thus, it does not comprise evaluation metrics or comparisons of the recovered
architecture against the target one.
An approach that compares architectural models was proposed by Beck and
Diehl [Beck and Diehl 2010]. Their approach evaluates the similarity of architectural element dependencies using a dependency matrix representation. This method points out
similarities and divergences of the architectures at the architectural element level. Therefore, it analyses only similarities between elements; module information, such as
module communication, is not taken into account.
4. Conclusion
Software architecture recovery (SAR) methods have been proposed to decrease the effort
needed to maintain up-to-date architectural documentation of software systems. These
methods apply different evaluation metrics to analyse recovered architectures and use
them as a basis to derive their findings. In addition, SAR methods often lack a visual representation of their recovered and target architectures, which is essential to analyse results.
We built ArchViz to address these issues and support SAR research, by providing
the measurement of evaluation metrics and architecture visualisation representations. The
implemented metrics provide statistical evidence of the level of agreement between recovered and target architectures. Moreover, the provided visualisations and side-by-side
comparisons of recovered and target architectures contribute useful knowledge
to understand the results of a method, which helps in the process of refining and improving it. Thus, ArchViz is a tool that reduces the effort needed to analyse recovered
architectures, providing a useful set of metrics together with the automatic generation of
architectural models to support SAR research.
It is important to highlight that one of the subject systems investigated in our
research on SAR using ArchViz comes from industry. However, unlike OLIS, this system was not
presented in this paper due to a confidentiality agreement. Although we
have used the tool in a real-world scenario, we have not used it with (very) large-scale systems,
and this is part of our future work.
References
[Beck and Diehl 2010] Beck, F. and Diehl, S. (2010). Visual comparison of software architectures. In International Symposium on Software Visualization, pages 183–192.
[Ducasse and Pollet 2009] Ducasse, S. and Pollet, D. (2009). Software architecture reconstruction: A process-oriented taxonomy. Trans. Softw. Eng., pages 573–591.
[Garcia et al. 2013] Garcia, J., Ivkovic, I., and Medvidovic, N. (2013). A comparative analysis of software architecture recovery techniques. In International Conference on Automated Software Engineering, pages 486–496.
[Keim et al. 2008] Keim, D., Mansmann, F., Schneidewind, J., Thomas, J., and Ziegler, H.
(2008). Visual analytics: Scope and challenges. In Visual Data Mining, pages 76–90.
[Lindvall and Muthig 2008] Lindvall, M. and Muthig, D. (2008). Bridging the software
architecture gap. Computer, 41(6):98–101.
[Mancoridis et al. 1999] Mancoridis, S., Mitchell, B. S., Chen, Y., and Gansner, E. R.
(1999). Bunch: A clustering tool for the recovery and maintenance of software system
structures. In International Conference on Software Maintenance, pages 50–59.
[Shneiderman 1992] Shneiderman, B. (1992). Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics, 11(1):92–99.
[Sokolova and Lapalme 2009] Sokolova, M. and Lapalme, G. (2009). A systematic analysis
of performance measures for classification tasks. Inform. Process. Manag., pages 427–
437.
[Tzerpos and Holt 1999] Tzerpos, V. and Holt, R. C. (1999). Mojo: A distance metric for
software clusterings. In Working Conference on Reverse Engineering, pages 187–193.
[Zapalowski et al. 2014] Zapalowski, V., Nunes, I., and Nunes, D. (2014). Revealing the
relationship between architectural elements and source code characteristics. In International Conference on Program Comprehension, pages 14–25.
Uma Ferramenta para Verificação de Conformidade Visando
Diferentes Percepções de Arquiteturas de Software
Izabela Melo¹, Dalton Serey¹, Marco Tulio Valente²
¹Laboratório de Práticas de Software – Departamento de Sistemas e Computação –
Universidade Federal de Campina Grande (UFCG) – Campina Grande – PB – Brasil
²Departamento de Ciência da Computação –
Universidade Federal de Minas Gerais (UFMG) – Belo Horizonte – MG – Brasil
[email protected], [email protected], [email protected]
Abstract. Current architecture conformance checking tools do not take into account the different levels of abstraction used for defining architectural rules. For
example, architects often use a descriptive language, whereas developers prefer
automatic and testable technologies. In this paper we present ARTT, an architecture conformance checking tool which allows the automatic transformation
between different architectural representations. ARTT extracts rules defined in
a document, written in a declarative language, and generates design tests using
the DesignWizard API. The tests generated by ARTT agreed with 89.12% of the
tests written by a specialist.
Resumo. As atuais ferramentas de verificação de conformidade arquitetural
não levam em consideração os diferentes níveis de abstração para definir regras
arquiteturais. Por exemplo, arquitetos de software normalmente usam linguagem descritiva, enquanto os desenvolvedores preferem tecnologias automáticas
e testáveis. Este trabalho apresenta ARTT, uma ferramenta de verificação arquitetural que considera a existência de diferentes níveis de abstração arquitetural,
permitindo a transformação automática entre eles. ARTT extrai regras definidas
em um documento, descritas em uma linguagem declarativa, e as transforma em
testes de design, utilizando DesignWizard. Os testes gerados automaticamente
por ARTT concordam com 89,12% dos testes escritos por um especialista.
URL do vídeo: www.youtube.com/watch?v=PWleNX0mqDQ
1. Introdução
Arquitetura de software envolve um conjunto de decisões e regras arquiteturais que estabelecem relações entre os componentes de uma aplicação [Jansen and Bosch 2005],
sendo um dos artefatos mais importantes no ciclo de vida de um sistema
[Knodel and Popescu 2007]. Ela interfere nos objetivos de negócios, objetivos funcionais e na qualidade do sistema. A arquitetura, uma vez criada, é raramente atualizada. Aliado a esse fato, restrições técnicas de desenvolvimento e requisitos conflitantes de qualidade que podem surgir durante a implementação são fatores que geram
violações arquiteturais. Com a evolução do sistema, o número de violações tende a
crescer, e elas não são removidas [Brunet et al. 2012]. Para evitar o surgimento de problemas causados pelo acúmulo de violações arquiteturais e garantir a adequação da arquitetura, diversas técnicas de verificação de conformidade arquitetural já foram propostas [Passos et al. 2010] [Knodel and Popescu 2007]. Tais técnicas verificam se a
implementação do sistema está de acordo com a arquitetura planejada pelos desenvolvedores e arquitetos [Clements et al. 2003].
Garantir a conformidade arquitetural de um sistema é importante para permitir
reuso, compreensão do sistema, consistência da documentação com a implementação,
controle da evolução do sistema e permitir a discussão entre os membros da equipe sobre a estrutura do sistema [Knodel and Popescu 2007]. Porém, as atuais técnicas para
verificação arquitetural não levam em consideração os diferentes níveis de abstração da
arquitetura. Enquanto a equipe de arquitetura tende a preferir linguagens que expressam
propriedades de forma declarativa e/ou descritiva, equipes de desenvolvedores tendem a
preferir linguagens de natureza comportamental e/ou executáveis para expressar restrições
e/ou regras arquiteturais. Com isso, nem sempre a comunicação entre os diferentes níveis
de abstração é consistente. Além disso, transformar uma abstração em outra pode ser uma
atividade dispendiosa e sujeita a erros.
A solução proposta neste artigo teve seu inı́cio em uma cooperação com a equipe
responsável pelas atividades de verificação arquitetural de software da Dataprev. No contexto dessa empresa, equipes distintas executam as atividades de verificação arquitetural
e de desenvolvimento de software. Um dos arquitetos dessa empresa afirmou: "De início,
pensamos em utilizar UML para escrever a arquitetura do software. Porém, na prática,
UML não é utilizado. É pesado, estrito e sofre mudanças constantes. Nós optamos por
escrever as restrições arquiteturais em português e utilizar DesignWizard para realizar
a checagem de conformidade. Mas escrever, manualmente, os testes de design em Java,
apesar de bem aceito na equipe de desenvolvimento, tomaria tempo da equipe de arquitetura e/ou da equipe de desenvolvimento. A equipe de arquitetura precisa de uma
linguagem que se aproxime do português ou do inglês, que seja mais descritiva, e de uma
ferramenta que transforme nossas regras em testes de design".
Com isso, este artigo apresenta ARTT (Architectural Representation Transformation Tool), uma ferramenta de transformação automática entre os níveis de abstração das
equipes de desenvolvimento e arquitetura. O objetivo central é permitir uma comunicação
mais rápida e consistente entre essas equipes. De um lado, a equipe de arquitetura define
as regras arquiteturais em uma linguagem declarativa, inspirada em DCL (Dependency
Constraint Language) [Terra and Valente 2009], uma linguagem de domínio específico
criada para representar arquiteturas de software. Do outro lado, a equipe de desenvolvimento terá as definições das regras arquiteturais escritas em testes de design, os quais
poderão, então, ser incorporados ao conjunto de testes funcionais do sistema. Os testes de
design são escritos em Java (linguagem mais próxima da equipe de desenvolvimento) com
o auxílio da API DesignWizard [Brunet et al. 2009], que dá suporte à análise estática de
programas Java. A transformação automática proposta economiza tempo no processo de
verificação arquitetural e garante que cada equipe utilize seu próprio nível de abstração.
Como estudo de caso da ferramenta, foi utilizado um sistema real (e-Pol - Sistema
de Gestão das Informações de Polícia Judiciária). Nosso estudo mostrou que os testes
gerados automaticamente por ARTT concordam com 89,12% dos testes gerados manualmente por um especialista. A discordância entre os dois conjuntos de testes foi causada
por defeitos na API DesignWizard. Estes já foram comunicados e discutidos com a equipe
de evolução da API.
O restante deste artigo está organizado da seguinte forma. A seção 2 apresenta
a ferramenta ARTT. A seção 3 apresenta um estudo de caso. A seção 4 apresenta os
trabalhos relacionados e a seção 5 apresenta as conclusões.
2. A Ferramenta ARTT
ARTT é uma ferramenta de verificação que leva em consideração dois níveis de abstração
arquitetural: o da equipe de arquitetura, utilizando uma linguagem simples e declarativa
para descrever regras arquiteturais; e o da equipe de desenvolvimento, utilizando uma
linguagem de programação para escrever regras arquiteturais no formato de testes automáticos.
A Figura 1 apresenta a ideia geral de ARTT. Documentos arquiteturais, escritos
em arquivos com extensão .pdf, são recebidos como parâmetro de entrada. Nesses documentos, as
regras arquiteturais são precedidas de "#archtest". Com isso, é possível detalhar, em
linguagem natural, informações sobre a arquitetura que sejam relevantes para o leitor
do documento. Em seguida, as regras são extraídas e armazenadas em um arquivo com
extensão .arch. Caso o usuário da ferramenta não queira escrever um arquivo em .pdf,
ele pode também passar como parâmetro de entrada o arquivo .arch. Essa especificação
é, então, transformada em testes de design escritos em Java, utilizando DesignWizard
e JUnit. Os testes, neste momento, estão prontos para serem executados, de forma a
capturar violações arquiteturais.
Figura 1. Funcionamento da ferramenta.
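Um esboço mínimo da etapa de extração, em Java (código nosso, apenas ilustrativo; a ARTT real opera diretamente sobre o .pdf): coletar as linhas marcadas com "#archtest" e gravá-las como especificação .arch.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Esboço ilustrativo: linhas precedidas de "#archtest" no texto do documento
// tornam-se linhas da especificação .arch.
class ExtratorDeRegras {
    static List<String> extrair(List<String> linhasDoDocumento) {
        List<String> regras = new ArrayList<>();
        for (String linha : linhasDoDocumento) {
            String l = linha.trim();
            if (l.startsWith("#archtest"))
                regras.add(l.substring("#archtest".length()).trim());
        }
        return regras;
    }

    public static void main(String[] args) throws IOException {
        // Supõe o texto já extraído do .pdf para um arquivo de texto simples
        List<String> linhas = Files.readAllLines(Path.of(args[0]));
        Files.write(Path.of("regras.arch"), extrair(linhas));
    }
}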
Acredita-se que utilizar uma linguagem declarativa para definir regras arquiteturais seja mais simples e rápido para o arquiteto (usuário da ferramenta em questão). Essa
linguagem foi inspirada em DCL e possui a sintaxe apresentada na Figura 2. A diferença
entre a linguagem utilizada nessa ferramenta e DCL é pequena. As modificações adicionadas por ARTT tiveram como objetivo apenas aproximá-la da língua inglesa. Por
exemplo, enquanto em DCL define-se um módulo com "module NOME: DIRETORIO",
na linguagem de ARTT utiliza-se "module NOME is DIRETORIO" ou "#archtest module
NOME is DIRETORIO" (se a definição for realizada num arquivo .pdf).
Por outro lado, testes de design são adequados para representar a arquitetura para
a equipe de desenvolvimento, pois são escritos em uma linguagem mais próxima dessa
equipe, são verificáveis durante a execução dos testes funcionais e podem garantir integridade e consistência arquitetural. Porém, escrever e especificar testes de design normalmente é uma tarefa dispendiosa para arquitetos de software. Portanto, foi criado um
tradutor que transforma automaticamente uma abstração na outra para que a comunicação
Figura 2. Sintaxe da versão de DCL utilizada por ARTT.
Tabela 1. Possíveis entradas da ferramenta.
Documento Arquitetural em .pdf
O módulo A é formado pelas classes Exemplo1 e Exemplo2.
#archtest module A is Exemplo1, Exemplo2
O módulo B é formado pelas classes com endereço br.gov.classesB.*
#archtest module B is br.gov.classesB
Uma das regras arquiteturais definidas pelos arquitetos determina que A não pode acessar B
#archtest rule A cannot-access B
Documento Arquitetural em .arch
module A is Exemplo1, Exemplo2
module B is br.gov.classesB
rule A cannot-access B
seja mais rápida e consistente. As transformações são realizadas seguindo as regras apresentadas na Tabela 2. Como exemplo, a regra definida pelo arquiteto na Tabela 1 foi
transformada, por ARTT, no teste de design apresentado no Algoritmo 1.
3. Estudo de Caso
O objeto do estudo de caso é o projeto e-Pol - Sistema de Gestão das Informações de
Polícia Judiciária, desenvolvido em parceria pela Polícia Federal do Brasil e a Universidade Federal de Campina Grande. Os arquitetos do projeto e-Pol definiram três módulos
e seis regras arquiteturais básicas. As regras foram escritas em um .pdf na linguagem
proposta pela ferramenta. ARTT foi executado com sucesso, transformando as regras definidas em testes de design. Com a execução dos testes de design, foram detectadas 1489
violações arquiteturais distribuídas em 4 regras.
Em seguida, um survey foi aplicado aos arquitetos do projeto e-Pol com o objetivo de capturar a sua percepção com relação à ferramenta proposta. Todos os arquitetos envolvidos no experimento consideraram a linguagem expressiva, simples e fácil
de ser utilizada. Além disso, eles consideraram que demandará um tempo para que os
membros da equipe se acostumem com a linguagem e a abordagem de testes contínuos
e automáticos. Porém, esse tempo seria maior se os testes de design tivessem que ser
escritos manualmente. O plano de verificação arquitetural definido por eles não mudará,
visto que apenas será inserida uma ferramenta de transformação. Um dos arquitetos citou
que "Apesar de existir um custo, é importante implementar essa abordagem para que
aumente a expectativa de vida útil do sistema".

Tabela 2. Exemplos de transformações.

Regra: module M is S exceptClasses C
Pseudocódigo:
    for class in allClassesDesignWizard:
        if S contains class and C not contains class:
            M.add(class)

Regra: rule A cannot-access B exceptClasses C
Pseudocódigo:
    for a in A:
        for element in a.getCalleeClasses():
            assert B not contains element or C contains element

Regra: rule only A can-implement B
Pseudocódigo:
    for b in B:
        for element in b.getEntitiesThatImplements():
            assert A contains element

Regra: rule A cannot-extends B
Pseudocódigo:
    for a in A:
        for b in B:
            assert not a.extendsClass(b)
Com o objetivo de verificar se os testes gerados automaticamente pela ferramenta
realmente capturam violações arquiteturais, foram inseridas algumas violações mutantes
no código. Como mostrado na Figura 3, as violações inseridas foram realmente capturadas pelos testes (regras 2, 4, 5 e 6).
Por fim, para verificar se os testes gerados automaticamente estavam corretos, foi
solicitado que um especialista em testes de design (usando DesignWizard) escrevesse os
testes para as seis regras manualmente. Os resultados das execuções dos testes foram
comparados (Figura 4): 162 (10,87%) das violações encontradas pelos testes gerados
automaticamente não foram encontradas pelos testes do especialista. Enquanto isso, duas
das violações encontradas pelos testes do especialista (0,13%) não foram encontradas
pelos testes gerados automaticamente.
Nas regras que apontaram inconsistência no número de violações entre os dois
conjuntos de testes, foi observado que a única diferença entre as implementações era que
o especialista utilizou o método getCallerClasses() do DesignWizard, enquanto os testes
Algoritmo 1. Teste de design gerado automaticamente pela ferramenta.
// A cannot-access B
public void testRule1() {
    System.out.println("Regra: A cannot-access B");
    // Conjunto de classes excluídas da regra (vazio nesta regra)
    Set<ClassNode> allClassesExcept = new HashSet<ClassNode>();
    for (ClassNode caller : A) {
        // Classes usadas (callees) pela classe corrente do módulo A
        Set<ClassNode> calleeA = new HashSet<ClassNode>();
        calleeA.addAll(caller.getCalleeClasses());
        if (!calleeA.isEmpty()) {
            for (ClassNode callee : calleeA) {
                try {
                    // A dependência é permitida se a classe usada está nas
                    // exceções ou não pertence ao módulo B
                    Assert.assertTrue(allClassesExcept.contains(callee)
                            || (!B.contains(callee)));
                } catch (AssertionFailedError e) {
                    System.out.println(caller.getName() + " dependsOn "
                            + callee.getName());
                }
            }
        }
    }
}
Figura 3. Comparação da execução dos testes gerados automaticamente em
código com e sem mutantes.
gerados automaticamente utilizam o método getCalleeClasses(). Após estudo e inspeção
do código do DesignWizard, concluiu-se, então, que a diferença obtida entre os resultados
dos dois conjuntos de testes refere-se à API utilizada. Este fato foi comunicado à equipe
de evolução da API e já está em processo de correção.
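Para ilustrar a diferença observada, segue um esboço nosso (assumindo apenas os métodos getCalleeClasses() e getCallerClasses() citados no texto, e um método reportarViolacao hipotético): a regra "A cannot-access B" pode ser verificada em duas direções que, com uma API correta, deveriam reportar o mesmo conjunto de violações.

// Esboço ilustrativo das duas direções de verificação de "A cannot-access B".
// Direção usada pelos testes gerados: percorre as classes usadas por A.
void verificarPelosCallees(Set<ClassNode> A, Set<ClassNode> B) {
    for (ClassNode a : A)
        for (ClassNode callee : a.getCalleeClasses())
            if (B.contains(callee))
                reportarViolacao(a, callee);
}

// Direção usada pelo especialista: percorre as classes que usam B.
void verificarPelosCallers(Set<ClassNode> A, Set<ClassNode> B) {
    for (ClassNode b : B)
        for (ClassNode caller : b.getCallerClasses())
            if (A.contains(caller))
                reportarViolacao(caller, b);
}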
Assim, acredita-se que seja confiável utilizar ARTT para transformar a
representação arquitetural descritiva em testes de design. Há economia de tempo, evitando que os arquitetos precisem escrever manualmente os testes de design. Por outro
lado, os desenvolvedores podem verificar continuamente se há violações arquiteturais
sendo inseridas, pois os testes podem ser incorporados ao conjunto de testes do projeto
e, além disso, os testes de design são escritos numa linguagem próxima da equipe de
desenvolvimento.
Sabemos que não é possível generalizar os resultados deste estudo de caso. Além
disso, os defeitos encontrados na API DesignWizard afetam nossos resultados. Somente
após a correção da API será possível realizar um estudo mais aprofundado.
Figura 4. Comparação da execução dos testes gerados automaticamente com
testes escritos pelo especialista.
4. Trabalhos Relacionados
Na área de evolução de software, há várias pesquisas com o objetivo de encontrar formas
simples e práticas de verificação arquitetural. Em sua maioria, a automatização de alguma
etapa no processo de verificação é uma questão de grande importância, visto que realizar uma verificação arquitetural de forma manual, muitas vezes, se torna uma atividade
complexa, principalmente se o sistema é de larga escala [Postma 2003].
Existem várias abordagens para realizar a verificação arquitetural, mas nem sempre são usadas. Muitas vezes isso ocorre porque a linguagem utilizada na verificação é
diferente da linguagem usada na aplicação [Brunet et al. 2009]. Nesse contexto, Brunet
et al. desenvolveram uma API, chamada DesignWizard, que permite escrever testes de
design para implementações em Java, usando JUnit. Com DesignWizard, os desenvolvedores podem ter um melhor entendimento do sistema, já que utiliza a mesma linguagem
de desenvolvimento. Além disso, a documentação da arquitetura passa a ser executável,
facilitando a tarefa de conformidade arquitetural. Como podem ser agregados ao conjunto
de testes funcionais, testes de design são úteis para garantir que as decisões arquiteturais
sejam seguidas (sem demandar uma análise manual). Porém, os arquitetos de software
necessitam de mais tempo para entender a API e escrever as regras arquiteturais antes de
serem repassadas para os desenvolvedores.
Além disso, nenhuma dessas técnicas de verificação arquitetural leva em
consideração os diferentes níveis de abstração entre as equipes, nem tampouco uma
transformação automática entre eles. Enquanto para a equipe de arquitetura um nível
declarativo é mais recomendável, para a equipe de desenvolvimento um nível executável
e testável tende a ser mais adequado.
Há dois trabalhos que tratam de transformações entre diferentes níveis de
abstração. Pires et al. propuseram uma técnica para transformar automaticamente diagramas de classe em UML para testes de design [Pires et al. 2008]. Rabelo et al. propuseram uma técnica para transformar automaticamente diagramas de sequência em UML
para testes de design [Rabelo and Pinto 2012]. Apesar de ambos serem tradutores entre
níveis de abstração diferentes, nenhum deles se refere à verificação arquitetural.
5. Conclusões
DesignWizard é capaz de agilizar o processo de verificação de conformidade arquitetural
[Brunet et al. 2011]. Portanto, acredita-se que a ferramenta ARTT pode introduzir economia de tempo no processo de verificação arquitetural, pois permite transformar automaticamente entre os dois níveis de abstração (da equipe de arquitetura e da equipe de desenvolvimento). Os desenvolvedores terão sua representação arquitetural escrita numa linguagem
próxima à de desenvolvimento e, além disso, poderão incorporar os testes de design ao
conjunto de testes funcionais do projeto. A comunicação entre as duas equipes, portanto,
pode ser mais rápida e consistente. Pelos estudos realizados, a ferramenta transforma a
linguagem definida neste trabalho em testes de design de forma satisfatória, possuindo
uma concordância de 89,12% com os testes escritos manualmente por um especialista.
Como trabalho futuro, esta ferramenta pode ser estendida para outras linguagens.
Para tanto, é preciso implementar a API DesignWizard para as linguagens de saída desejadas e adaptar ARTT. Ainda como trabalho futuro, pretende-se realizar uma pesquisa
qualitativa para avaliar como os arquitetos de software realizam atividades de verificação
arquitetural. Espera-se entender melhor por que a indústria não utiliza ferramentas automáticas para tal atividade, apesar de existirem diversas abordagens na academia.
References
Brunet, J., Bittencourt, R., Guerrero, D., and Figueredo, J. (2012). On the Evolutionary Nature of Architectural Violations. Proceedings of the 19th Working Conference on Reverse Engineering (WCRE 2012).
Brunet, J., Guerrero, D., and Figueredo, J. (2011). Structural Conformance Checking with Design Tests:
An Evaluation of Usability and Scalability. ICSM.
Brunet, J., Guerrero, D., and Figueredo, J. (2009). Design Tests: An Approach to Programmatically Check your Code Against Design Rules. Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results.
Clements, P., Garlan, D., Little, R., Nord, R., and Stafford, J. (2003). Documenting software architectures:
views and beyond. Proceedings of the 25th International Conference on Software Engineering, pages
740–741.
Jansen, A. and Bosch, J. (2005). Software architecture as a set of architectural design decisions. Proceedings
of the 5th Working Conference on Software Architecture, pages 109–120.
Knodel, J. and Popescu, D. (2007). A comparison of static architecture compliance checking approaches.
In IEEE/IFIP Working Conference on Software Architecture, pages 44–53.
Passos, L., Terra, R., Diniz, R., Valente, M. T., and Mendonca, N. C. (2010). Static architecture conformance checking – an illustrative overview. IEEE Software, 27(5):82–89.
Pires, W., Brunet, J., Ramalho, F., and Guerrero, D. (2008). UML-based design test generation. 23rd ACM Symposium on Applied Computing (SAC 2008), pages 735–740.
Postma, A. (2003). A method for module architecture verification and its application on a large component-based system. Information & Software Technology, 45(4), pages 171–194.
Rabelo, J. and Pinto, S. E. (2012). Verificação de conformidade entre diagramas de sequência UML e
código Java. Dissertação de Mestrado. Campina Grande, Brasil.
Terra, R. and Valente, M. T. (2009). A dependency constraint language to manage object-oriented software
architectures. Software: Practice and Experience, 32(12):1073–1094.
JExtract: An Eclipse Plug-in for Recommending Automated
Extract Method Refactorings
Danilo Silva¹, Ricardo Terra², Marco Túlio Valente¹
¹ Federal University of Minas Gerais, Brazil
² Federal University of Lavras, Brazil
{danilofs,mtov}@dcc.ufmg.br, [email protected]
Abstract. Although Extract Method is a key refactoring for improving program
comprehension, refactoring tools for such purpose are often underused. To address this shortcoming, we present JExtract, a recommendation system based
on structural similarity that identifies Extract Method refactoring opportunities that are directly automated by IDE-based refactoring tools. Our evaluation
suggests that JExtract is more effective (w.r.t. recall and precision) to identify
contiguous misplaced code in methods than JDeodorant, a state-of-the-art tool.
Tool demonstration video. http://youtu.be/6htJOzXwRNA
1. Introduction
Refactoring has increased in importance as a technique for improving the design of
existing code [2], e.g., to increase cohesion, decrease coupling, foster maintainability,
etc. Particularly, Extract Method is a key refactoring for improving program comprehension. Besides promoting reuse and reducing code duplication, it contributes to readability
and comprehensibility, by encouraging the extraction of self-documenting methods [2].
Nevertheless, recent empirical research indicates that, while Extract Method is one of the most common refactorings, automated tools supporting this refactoring are most of the time underused [5, 4]. For example, Negara et al. found that Extract Method is the
third most frequent refactoring, but the number of developers who apply the refactoring
manually is higher than the number of those who do it automatically [5]. Moreover,
current tools focus only on automating refactoring application, but developers expend
considerable effort on the manual identification of refactoring opportunities.
To address this shortcoming, this paper presents JExtract, a tool that implements a novel approach for recommending automated Extract Method refactorings. The tool was designed as a plug-in for the Eclipse IDE that automatically identifies, ranks, and applies the refactoring when requested. Thus, JExtract may aid developers in finding refactoring opportunities and contribute to a wider adoption of refactoring practices. The
underlying technique is inspired by the separation of concerns design guideline. More
specifically, we assume that the structural dependencies established by Extract Method
candidates should be very different from the ones established by the remaining statements
in the original method.
The remainder of this paper is structured as follows. Section 2 describes the
JExtract tool, including its design and implementation. Section 3 discusses related tools
and Section 4 presents final remarks.
2. The JExtract tool
JExtract is a tool that analyzes the source code of methods and recommends Extract
Method refactoring opportunities, as illustrated in Figure 1. First, the tool generates all
Extract Method possibilities for each method. Second, these possibilities are ranked according to a scoring function based on the similarity between sets of dependencies established in the code.
[Figure 1 is a pipeline diagram: the source code of each method feeds the Generation of Candidates step, the candidates pass through the Scoring Function, and the Ranking and Filtering step emits the Extract Method recommendations.]

Figure 1. The JExtract tool
This section is organized as follows. Subsection 2.1 provides an overview of our approach for identifying Extract Method refactoring opportunities. Subsection 2.2 describes the design and implementation of the tool. Finally, Subsection 2.3 presents the results of our evaluation on open-source systems. A detailed description of the recommendation technique behind JExtract can be found in a recent full technical paper [9].
2.1. Proposed Approach
The approach is divided into three phases: Generation of Candidates, Scoring, and Ranking.
2.1.1. Generation of candidates
This phase is responsible for identifying all possible Extract Method refactoring opportunities. First, we split the methods into blocks, which consist of sequential statements that follow a linear control flow. As an example, Figure 2 presents method mouseReleased of class SelectionClassifierBox, extracted from ArgoUML. Each statement is labeled using the SX.Y pattern, where X and Y denote the block and the statement, respectively. For example, S2.3 is the third statement of the second block, which declares a variable cw.
public void mouseReleased(MouseEvent me) {
S1.1  for (Button btn : buttons) {
S2.1    int cx = btn.fig.getX() + btn.fig.getWidth() - btn.icon.getIconWidth();
S2.2    int cy = btn.fig.getY();
S2.3    int cw = btn.icon.getIconWidth();
S2.4    int ch = btn.icon.getIconHeight();
S2.5    Rectangle rect = new Rectangle(cx, cy, cw, ch);
S2.6    if (rect.contains(me.getX(), me.getY())) {
S3.1      Object metaType = btn.metaType;
S3.2      FigClassifierBox fcb = (FigClassifierBox) getContent();
S3.3      FigCompartment fc = fcb.getCompartment(metaType);
S3.4      fc.setEditOnRedraw(true);
S3.5      fc.createModelElement();
S3.6      me.consume();
S3.7      return;
        }
      }
S1.2  super.mouseReleased(me);
}

Figure 2. An Extract Method candidate in a method of ArgoUML (S3.2 to S3.5)
Second, we generate all Extract Method candidates based on Algorithm 1 (extracted from [9]).
Algorithm 1 Candidates generation algorithm [9]
Input: A method M
Output: List with Extract Method candidates
 1: Candidates ← ∅
 2: for all block B ∈ M do
 3:   n ← statements(B)
 4:   for i ← 1, n do
 5:     for j ← i, n do
 6:       C ← subset(B, i, j)
 7:       if isValid(C) then
 8:         Candidates ← Candidates + C
 9:       end if
10:     end for
11:   end for
12: end for
Fundamentally, the three nested loops in Algorithm 1 (lines 2, 4, and 5) enforce that the list of selected statements satisfies the following preconditions:
• Only contiguous statements inside a block are selected. In Figure 2, for example, it is not possible to select a candidate with S3.2 and S3.4 without including S3.3.
• The selected statements are part of a single block of statements. In Figure 2, for
example, it is not possible to generate a candidate with both S2.6 and S3.1 since
they belong to distinct blocks.
• When a statement is selected, the respective children statements are also included.
In Figure 2, for example, when statement S2.6 is selected, its children statements
S3.1 to S3.7 are also included.
Last but not least, we do not ensure that every iteration of the loop yields an Extract Method candidate because: (i) a candidate recommendation must respect a size threshold defined by the parameter Minimum Extracted Statements, whose value is preset to 3 (changeable), meaning that an Extract Method candidate must contain at least three statements; and (ii) a candidate recommendation must respect the preconditions defined by the Extract Method refactoring engine.
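As a rough illustration of this phase, the sketch below enumerates candidates the way Algorithm 1 does. Block, Statement, and the isValid check are stand-ins for JExtract's internal representations, so only the loop structure should be read as the paper's technique:

import java.util.ArrayList;
import java.util.List;

// Sketch of Algorithm 1: enumerate every contiguous run of statements inside
// each block of a method as an Extract Method candidate.
class CandidateGenerator {

    static final int MIN_EXTRACTED_STATEMENTS = 3; // preset, changeable

    List<List<Statement>> generate(List<Block> blocksOfMethod) {
        List<List<Statement>> candidates = new ArrayList<>();
        for (Block b : blocksOfMethod) {                    // line 2 of Algorithm 1
            List<Statement> stmts = b.statements();
            int n = stmts.size();
            for (int i = 0; i < n; i++) {                   // line 4
                for (int j = i; j < n; j++) {               // line 5
                    List<Statement> c = stmts.subList(i, j + 1); // subset(B, i, j)
                    if (isValid(c)) {
                        candidates.add(new ArrayList<>(c));
                    }
                }
            }
        }
        return candidates;
    }

    private boolean isValid(List<Statement> c) {
        // The size threshold; the refactoring-engine preconditions checked by
        // the real tool are omitted in this sketch.
        return c.size() >= MIN_EXTRACTED_STATEMENTS;
    }

    interface Block { List<Statement> statements(); }
    interface Statement { }
}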
2.1.2. Scoring
This phase is responsible for scoring the possible Extract Method refactoring opportunities generated in the previous phase, using a technique inspired by a Move Method recommendation heuristic [7]. Assume m′ is the selection of statements of an Extract Method candidate and m″ the remaining statements in the original method m. The proposed heuristic aims to minimize the structural similarity between m′ and m″.
Structural Dependencies: The set of dependencies established by a selection of statements S with variables, types, and packages is denoted by Dep_var(S), Dep_type(S), and Dep_pack(S), respectively. These sets are constructed as described next.
• Variables: If a statement s from a selection of statements S declares, assigns, or reads a variable v, then v is added to Dep_var(S). In a similar way, reads from and writes to formal parameters and fields are considered.
• Types: If a statement s from a selection of statements S uses a type (class or interface) T, then T is added to Dep_type(S).
• Packages: For each type T included in Dep_type(S), as described in the previous item, the package where T is implemented and all its parent packages are also included in Dep_pack(S).
For instance, assume m′ is the Extract Method candidate highlighted in Figure 2 (statements S3.2 to S3.5) and m″ the remaining statements in the original method mouseReleased. On one hand, Dep_var(m′) = {metaType, fc, fcb}. On the other hand, Dep_var(m″) = {metaType, btn, cy, cx, cw, ch, buttons, me, rect}. In this case, the intersection between these two sets contains only metaType. Moreover, the computation of fc and fcb is isolated from the remaining code. Therefore, one can claim that m′ is cohesive and decoupled from m″, i.e., a good separation of concerns is achieved.
Scoring Function: To compute the dissimilarity between m′ and m″, we rely on the distance between the dependency sets Dep′ and Dep″ using the Kulczynski similarity coefficient [10, 7]:

dist(Dep′, Dep″) = 1 − 1/2 × [ a/(a + b) + a/(a + c) ]

where a = |Dep′ ∩ Dep″|, b = |Dep′ \ Dep″|, and c = |Dep″ \ Dep′|.
Thus, let m′ be the selection of statements of an Extract Method candidate for method m. Let also m″ be the remaining statements in m. The score of m′ is defined as:

score(m′) = 1/3 × dist(Dep_var(m′), Dep_var(m″)) +
            1/3 × dist(Dep_type(m′), Dep_type(m″)) +
            1/3 × dist(Dep_pack(m′), Dep_pack(m″))

The scoring function is centered on the observation that a good Extract Method candidate should encapsulate the use of variables, types, and packages. In other words, we should maximize the distance between the dependency sets Dep′ and Dep″.
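To make the computation concrete, the following sketch implements the Kulczynski-based distance over plain Java sets and combines the three components with equal weights; the guard for empty sets is our own assumption, since the paper does not discuss degenerate cases:

import java.util.HashSet;
import java.util.Set;

// Sketch of the scoring step: Kulczynski-based distance between dependency
// sets, combined with equal 1/3 weights over variables, types, and packages.
class Scorer {

    static <T> double dist(Set<T> dep1, Set<T> dep2) {
        Set<T> inter = new HashSet<>(dep1);
        inter.retainAll(dep2);
        double a = inter.size();        // |Dep' ∩ Dep''|
        double b = dep1.size() - a;     // |Dep' \ Dep''|
        double c = dep2.size() - a;     // |Dep'' \ Dep'|
        if (a + b == 0 || a + c == 0) {
            return 1.0;                 // assumption: treat empty sets as maximally distant
        }
        return 1.0 - 0.5 * (a / (a + b) + a / (a + c));
    }

    static double score(Set<String> var1, Set<String> var2,
                        Set<String> type1, Set<String> type2,
                        Set<String> pack1, Set<String> pack2) {
        return (dist(var1, var2) + dist(type1, type2) + dist(pack1, pack2)) / 3.0;
    }
}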
2.1.3. Ranking
This phase is responsible for ranking and filtering the Extract Method candidates based on the score computed in the previous phase. Basically, we sort the candidates and filter them according to the following parameters: (i) Maximum Recommendations per Method, whose value is preset to 3 (changeable), meaning that the tool triggers up to three recommendations per method; and (ii) Minimum Score Value, which can be configured when the user wants to set a minimum dissimilarity threshold.
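A minimal sketch of this phase, assuming a Candidate abstraction that exposes the score computed previously:

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the ranking phase: keep candidates above the minimum score,
// sort them by score (higher means more dissimilar, hence better), and
// return at most maxPerMethod recommendations.
class Ranker {
    interface Candidate { double score(); }

    static List<Candidate> rank(List<Candidate> candidates,
                                int maxPerMethod, double minScore) {
        return candidates.stream()
                .filter(c -> c.score() >= minScore)
                .sorted(Comparator.comparingDouble(Candidate::score).reversed())
                .limit(maxPerMethod)
                .collect(Collectors.toList());
    }
}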
2.2. Internal Architecture and Interface
We implemented JExtract as a plug-in on top of the Eclipse platform. Therefore, we
rely mainly on native Eclipse APIs, such as Java Development Tools (JDT) and Language
Toolkit (LTK). The current JExtract implementation follows an architecture with five
main modules:
1. Code Analyzer: This module provides the following services to other modules:
(a) it builds the structure of block and statements (refer to Subsection 2.1.1);
(b) it extracts the structural dependencies (refer to Subsection 2.1.2); and (c) it
checks if an Extract Method candidate satisfies the underlying Eclipse Extract
Method refactoring preconditions. In fact, this module concentrates most of the communication between JExtract and Eclipse APIs (e.g., org.eclipse.jdt.core and org.eclipse.ltk.core.refactoring).
2. Candidate Generator: This module generates all Extract Method candidates
based on Algorithm 1 and hence depends on service (a) of module Code Analyzer.
3. Scorer: This module calculates the dissimilarity of the Extract Method candidates
generated by module Candidate Generator (refer to Subsection 2.1.2) and hence
depends on service (b) of module Code Analyzer.
4. Ranker: This module ranks and filters the Extract Method candidates generated
by module Candidate Generator and scored by module Scorer. It depends on service (c) of module Code Analyzer to filter candidates not satisfying preconditions.
5. UI: This module consists of the front-end of the tool, which relies on the
Eclipse UI API (org.eclipse.ui) to implement two menu extensions, six
actions, and one main view. Moreover, it depends on module UI from LTK
(org.eclipse.ltk.ui.refactoring) to delegate the refactoring application to the underlying Eclipse Extract Method refactoring tool.
This architecture makes our tool extensible. For example, the Scorer module may be replaced by one that employs a new heuristic based on semantic and structural
information. As another example, the Candidate Generator module may be extended to
support the identification of non-contiguous code fragments.
Figure 3 presents JExtract’s UI, displaying method mouseReleased previously
presented in Figure 2. When a developer triggers JExtract to identify Extract Method
refactoring opportunities for this method, it opens the Extract Method Recommendations
view to report the potential recommendations. In this case, the best candidate consists of
the extraction of statements S3.2 to S3.5 whose dissimilarity score is 0.7148.
2.3. Evaluation
We conducted two different but complementary empirical studies.
Study #1: In our previous paper [9], we evaluated the recommendations provided by our
tool on three systems to assess precision and recall. We extended this study to consider
minor modifications to the ranking method and to compare the results with JDeodorant, a
state-of-the-art tool that identifies Extract Method opportunities [11]. For each system S,
we apply random Inline Method refactoring operations to obtain a modified version S′.
We assume that good Extract Method opportunities are the ones that revert the modifications (i.e., restoring S from S′).

Figure 3. JExtract UI
Table 1. Study #1 – Recall and precision results

                          JExtract                                               JDeodorant
                Top-1              Top-2               Top-3
System       #  Recall      Prec.  Recall       Prec.  Recall       Prec.  Recall     Prec.
JHotDraw 5.2 56 19 (34%)    34%    26 (46%)     24%    32 (57%)     20%    2 (4%)     5%
JUnit 3.8    25 13 (52%)    52%    16 (64%)     33%    18 (72%)     25%    0 (0%)     0%
MyWebMarket  14 12 (86%)    86%    14 (100%)    50%    14 (100%)    33%    2 (14%)    33%
Total        95 44 (46%)    46%    56 (59%)     30%    64 (67%)     23%    4 (4%)     6%
Table 1 reports recall and precision values achieved using JExtract with three
different configurations (Top-k Recommendations per Method). While a high parameter value
favors recall (e.g., Top-3), a low one favors precision (e.g., Top-1). Table 1 also presents results achieved using JDeodorant with its default settings. As the main finding, JExtract
outperforms JDeodorant regardless of the configuration used.
Study #2: We replicate the previous study on ten other popular open-source Java systems to assess how the precision and recall rates would vary. Nevertheless, we do not compare our results with JDeodorant since we were not able to reliably provide the source code of all required libraries, as demanded by JDeodorant.
Table 2 reports the recall and precision values achieved using the same settings from the previous study. On one hand, the overall recall value ranges from 25% (Top-1) to 49.2% (Top-3). On the other hand, the overall precision value decreases from 25% (Top-1) to 16.7% (Top-3). We argue these values are acceptable for two reasons: (i) we only consider a recommendation correct when it matches exactly the one in the oracle; thus, a slight difference of including (or excluding) a statement is enough for it to be considered a miss; and (ii) the modified methods may have preexisting Extract Method opportunities, besides the ones we introduced, that will be considered wrong by our oracle.
Table 2. Study #2 – Recall and precision results

                                            JExtract
                            Top-1                  Top-2                  Top-3
System                #     Recall        Prec.   Recall        Prec.   Recall        Prec.
Ant 1.8.2             964     235 (24.4%) 24.4%     363 (37.7%) 19.1%     460 (47.7%) 16.3%
ArgoUML 0.34          439      98 (22.3%) 22.3%     160 (36.4%) 18.3%     186 (42.4%) 14.4%
Checkstyle 5.6        533     227 (42.6%) 42.6%     338 (63.4%) 31.9%     389 (73.0%) 24.7%
FindBugs 1.3.9        714     179 (25.1%) 25.1%     278 (38.9%) 19.7%     350 (49.0%) 16.7%
FreeMind 0.9.0        348      85 (24.4%) 24.4%     134 (38.5%) 19.4%     181 (52.0%) 17.8%
JFreeChart 1.0.13   1,090     204 (18.7%) 18.7%     396 (36.3%) 18.2%     536 (49.2%) 16.5%
JUnit 4.10             35      11 (31.4%) 32.4%      17 (48.6%) 26.6%      22 (62.9%) 23.7%
Quartz 1.8.3          239      99 (41.4%) 41.4%     125 (52.3%) 26.5%     142 (59.4%) 20.4%
SQuirreL SQL 3.1.2     39      15 (38.5%) 38.5%      18 (46.2%) 23.7%      20 (51.3%) 18.2%
Tomcat 7.0.2        1,076     214 (19.9%) 19.9%     325 (30.2%) 15.2%     409 (38.0%) 12.8%
Total               5,477   1,367 (25.0%) 25.0%   2,154 (39.3%) 19.8%   2,695 (49.2%) 16.7%
3. Related Tools
Recent empirical research shows that automated refactoring tools, especially those supporting Extract Method refactorings, are most of the time underused [5, 4]. In view of such circumstances, recent studies on the identification of refactoring opportunities seek to address this shortcoming. In this paper, we implemented our approach in a way that it can be straightforwardly incorporated into the current development process, through a tool that identifies, ranks, and automates Extract Method refactoring opportunities [9].
JMove is the refactoring recommendation system our approach is inspired by [7,
6]. The tool identifies Move Method refactoring opportunities based on the similarity
between dependency sets [7]. More specifically, it computes the similarity of the set of
dependencies established by a given method m with (i) the methods of its own class C1
and (ii) the methods in other classes of the system (C2, C3, ..., Cn). Whereas JMove recommends moving a method m to a more similar class Ci, our current approach recommends extracting a fragment from a given method m into a new method m′ when there is a high dissimilarity between m′ and the remaining statements in m.
JDeodorant is the state-of-the-art system to identify and apply common refactoring operations in Java systems, including Extract Method [11]. In contrast to our approach, which relies on the similarity between dependency sets, JDeodorant relies on the concept of program slicing to select related statements that can be extracted into a new method. Our approach, on the other hand, is not based on specific code patterns (such as a computation slice). It is also more conservative in preserving program behavior (although it is currently restricted to contiguous fragments of code), and it relies on a scoring function to rank and filter recommendations.
There are other techniques to identify refactoring opportunities based, for example, on search-based algorithms [8], Relational Topic Model (RTM) [1], metrics-based
rules [3], etc., that can be adapted to identify Extract Method refactoring opportunities.
4. Final Remarks
JExtract implements a novel approach for recommending automated Extract Method
refactorings. The tool was designed as a plug-in for the Eclipse IDE that automatically
identifies, ranks, and applies the refactoring. Thus, the tool may contribute to increasing the popularity of IDE-based refactoring tools, which most recent empirical studies on refactoring consider underused. Moreover, our evaluation indicates that JExtract is more effective (w.r.t. recall and precision) at identifying contiguous
misplaced code in methods than JDeodorant, a state-of-the-art tool.
As ongoing work, we are extending JExtract to perform statement reordering to uncover better Extract Method opportunities, as long as the modification preserves
the behavior of the original code. Moreover, we intend to evaluate our tool with human
experts to mitigate the threat that the synthesized datasets did not capture the full spectrum of Extract Method instances faced by developers. Last, we also intend to support
other kinds of refactoring (e.g., Move Method).
The JExtract tool—including its source code—is publicly available at
http://aserg.labsoft.dcc.ufmg.br/jextract.
Acknowledgments: Our research is supported by CAPES, FAPEMIG, and CNPq.
References
[1] G. Bavota, R. Oliveto, M. Gethers, D. Poshyvanyk, and A. D. Lucia. Methodbook: Recommending move
method refactorings via relational topic models. IEEE Transactions on Software Engineering, pages
1–26, 2014.
[2] M. Fowler. Refactoring: Improving the design of existing code. Addison-Wesley, 1999.
[3] R. Marinescu. Detection strategies: Metrics-based rules for detecting design flaws. In 20th International
Conference on Software Maintenance (ICSM), pages 350–359, 2004.
[4] E. R. Murphy-Hill, C. Parnin, and A. P. Black. How we refactor, and how we know it. IEEE Transactions
on Software Engineering, 38(1):5–18, 2012.
[5] S. Negara, N. Chen, M. Vakilian, R. E. Johnson, and D. Dig. A comparative study of manual and automated
refactorings. In 27th European Conference on Object-Oriented Programming (ECOOP), pages 552–
576, 2013.
[6] V. Sales, R. Terra, L. F. Miranda, and M. T. Valente. JMove: Seus métodos em classes apropriadas. In IV
Brazilian Conference on Software: Theory and Practice (CBSoft), Tools Session, pages 1–6, 2013.
[7] V. Sales, R. Terra, L. F. Miranda, and M. T. Valente. Recommending move method refactorings using
dependency sets. In 20th Working Conference on Reverse Engineering (WCRE), pages 232–241,
2013.
[8] O. Seng, J. Stammel, and D. Burkhart. Search-based determination of refactorings for improving the class
structure of object-oriented systems. In 8th Conference on Genetic and Evolutionary Computation
(GECCO), pages 1909–1916, 2006.
[9] D. Silva, R. Terra, and M. T. Valente. Recommending automated Extract Method refactorings. In 22nd
International Conference on Program Comprehension (ICPC), pages 146–156, 2014.
[10] R. Terra, J. Brunet, L. F. Miranda, M. T. Valente, D. Serey, D. Castilho, and R. S. Bigonha. Measuring the
structural similarity between source code entities. In 25th Conference on Software Engineering and
Knowledge Engineering (SEKE), pages 753–758, 2013.
[11] N. Tsantalis and A. Chatzigeorgiou. Identification of extract method refactoring opportunities for the decomposition of methods. Journal of Systems and Software, 84(10):1757–1782, 2011.
ModularityCheck: A Tool for Assessing Modularity using
Co-Change Clusters
Luciana Lourdes Silva¹,², Daniel Félix¹, Marco Túlio Valente¹, Marcelo de A. Maia³
¹ Department of Computer Science – Federal University of Minas Gerais (UFMG)
² Federal Institute of Minas Gerais – IFMG
³ Faculty of Computing – Federal University of Uberlândia
{luciana.lourdes, dfelix, mtov}@dcc.ufmg.br, [email protected]
Abstract. It is widely accepted that traditional modular structures suffer from
the dominant decomposition problem. Therefore, to improve current modularity
views, it is important to investigate the impact of design decisions concerning
modularity in other dimensions, as the evolutionary view. In this paper, we propose the ModularityCheck tool to assess package modularity using co-change
clusters, which are sets of classes that usually changed together in the past. Our
tool extracts information from version control platforms and issue reports, retrieves co-change clusters, generates metrics related to co-change clusters, and
provides visualizations for assessing modularity. We also provide a case study
to evaluate the tool.
Video demo: http://youtu.be/7eBYa2dfIS8
1. Introduction
There is a growing interest in tools to enhance software quality [Kersten and Murphy 2006, Zimmermann et al. 2005]. Specifically, several tools have
been developed for supporting software modularity improvement [Rebêlo et al. 2014,
Vacchi et al. 2014, Bryton and Brito e Abreu 2008, Schwanke 1991]. Most of such tools
help architects to understand the current package decomposition. Basically, they extract
information from the source code by using structural dependencies and the source code
text [Robillard and Murphy 2007, Robillard and Murphy 2002].
Modularity is a key concept when designing complex software systems [Baldwin and Clark 2003]. The central idea is that modules should hide important
design decisions or decisions that are likely to change [Parnas 1972]. Typically, the standard approach to assess modularity is based on coupling and cohesion, calculated using the structural dependencies established between the modules of a system (coupling)
and between the internal elements of each module (cohesion). Usually, highly cohesive and low-coupled modules are desirable because they ease software comprehension,
maintenance, and reuse. However, typical cohesion and coupling metrics measure a single dimension of the software implementation (the static-structural dimension). On the
other hand, it is widely accepted that traditional modular structures and metrics suffer
from the dominant decomposition problem and tend to hinder different facets that developers may be interested in [Kersten and Murphy 2006, Robillard and Murphy 2002,
Robillard and Murphy 2007]. For example, there are various effects of coupling that are
not captured by structural coupling. Therefore, to improve current modularity views, it
is important to investigate the impact of design decisions concerning modularity in other dimensions of a software system, such as the evolutionary dimension.

Figure 1. ModularityCheck's overview.
To address this question, we present in this paper the ModularityCheck tool to
support package modularity assessment and understanding using co-change clusters. The
proposed tool has the following features:
• The tool extracts commits automatically from the version history of the target
system and discards noisy commits by checking them against their issue reports.
• The tool retrieves sets of classes that usually changed together in the past, which we term co-change clusters.
• The tool relies on distribution maps [Ducasse et al. 2006] to reason about the projection of the extracted co-change clusters onto the traditional decomposition of a system into packages. It also calculates a set of metrics defined for distribution maps to support the characterization of the extracted co-change clusters.
2. ModularityCheck in a Nutshell
ModularityCheck supports the following stages to assess the quality of a system package modularity: pre-processing, post-processing, co-change clusters retrieval, and cluster
visualization. Figure 1 shows the process to retrieve co-change clusters. A detailed presentation of this process is available in a full technical paper [Silva et al. 2014].
In the first stage, the tool applies several preprocessing tasks, which are responsible for selecting commits from the version history to create the co-change graph. In such graphs, the vertices are classes and the edges link classes changed together in the same commits. In the second stage, a post-processing task prunes edges with small weights from the co-change graphs. After that, the co-change graph is automatically processed to produce a new modular facet: co-change clusters, which abstract out common changes made
to a system, as stored in version control platforms. Therefore, co-change clusters represent sets of classes that changed together in the past. Finally, the tool uses distribution
maps [Ducasse et al. 2006]—a well-known visualization technique—to reason about the
projection of the extracted clusters onto the traditional decomposition of a system into packages. ModularityCheck also provides a set of metrics defined for distribution maps to
reason about the extracted co-change clusters. Particularly, it is possible to reason about
recurrent distribution patterns of co-change clusters listed by the tool, including patterns
denoting well-modularized and crosscutting clusters.
2.1. Architecture
ModularityCheck supports package modularity assessment of software systems implemented in the Java language. The tool relies on the following inputs: (i) the issue reports
saved in XML files; (ii) the URL of the version control platform (SVN or Git); (iii) the maximum number of packages, used to remove highly scattered commits; and (iv) the minimum number of classes in a co-change cluster. We discard small clusters because they may eventually generate a decomposition of the system with hundreds of clusters.

Figure 2. ModularityCheck's architecture.

Figure 2 shows the tool's architecture, which includes the following modules:
Co-Change Graph Extraction: As illustrated in Figure 2, the tool receives the URL
associated with the version control platform of the target system and the issue reports.
When extracting co-change graphs, it is fundamental to preprocess the considered commits to filter out commits that may pollute the graph with noise. First, the tool removes commits not associated with maintenance issues, because such commits can denote partial implementations of programming tasks. Second, the tool removes commits that do not change classes, because the co-changes considered by ModularityCheck are defined for classes. Third, commits associated with multiple maintenance issues are removed; such commits could generate edges connecting classes modified to implement semantically unrelated maintenance tasks, which were included in the same commit just for convenience, for example. Finally, the last pruning task removes highly scattered commits, according to the Maximum Scattering threshold, an input parameter; such commits are usually associated with refactoring activities, dead code removal, or changes to comment styles. The default value considered by the tool is ten packages.
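The sketch below illustrates one way the co-change graph construction and pruning could be implemented. The string-based edge encoding is our own simplification, not ModularityCheck's actual data structure:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of co-change graph construction: vertices are classes and an edge
// weight counts the commits in which two classes changed together. The
// commit filtering described above is assumed to have already happened.
class CoChangeGraph {
    // Edge key "A|B" with A before B lexicographically; value is the weight.
    final Map<String, Integer> edges = new HashMap<>();

    void addCommit(Set<String> changedClasses) {
        List<String> classes = new ArrayList<>(changedClasses);
        Collections.sort(classes);
        for (int i = 0; i < classes.size(); i++) {
            for (int j = i + 1; j < classes.size(); j++) {
                edges.merge(classes.get(i) + "|" + classes.get(j), 1, Integer::sum);
            }
        }
    }

    // Post-processing: prune edges with fewer than minWeight co-changes
    // (the tool uses two as the cutoff).
    void prune(int minWeight) {
        edges.values().removeIf(w -> w < minWeight);
    }
}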
Co-Change Cluster Retrieval: After extracting the co-change graph, a post-processing task is applied to prune edges with small weights. In this phase, edges with weights of less than two co-changes are removed. Then, in a further step, a data mining algorithm named Chameleon [Karypis et al. 1999] is applied to retrieve subgraphs with high density. The number of clusters is defined by executing Chameleon multiple times. After each execution, small clusters are discarded according to the Minimum Cluster Size threshold informed by the user. The default value considered by the tool is four classes, i.e., after the clustering execution, clusters with less than four classes are removed.
Metric Set Extraction: The tool calculates the number of vertices, the number of edges, and the co-change graph's density before and after the post-processing filter. After retrieving the co-change clusters, the tool presents the final number of clusters and several standard descriptive statistics. These metrics describe the size and density of the extracted co-change clusters, as well as each cluster's average edge weight. Moreover, the tool presents metrics defined for distribution maps, like focus and spread. ModularityCheck also allows investigating the distribution of the co-change clusters over the package structure by using distribution maps [Ducasse et al. 2006]. In our distribution maps [Santos et al. 2013, Santos et al. 2014], entities (classes) are represented as small
squares and the package structure groups such squares into large rectangles. In the package structure, we only consider classes that are members of co-change clusters, in order to improve the map visualization. Finally, all classes in a co-change cluster have the same color.

Figure 3. Filters and metric results.
A distribution map metric, named focus, ranges between 0 and 1, where the value 1 means that the cluster q dominates the packages it touches. There is also a second metric, called spread, that measures the number of packages touched by q.
After measuring focus and spread, the tool classifies recurrent distribution patterns of co-change clusters as follows: well-encapsulated, partially encapsulated, well-confined in packages, or crosscutting clusters. Well-encapsulated clusters are those that dominate the packages they touch. Clusters classified as partially encapsulated have focus ≈ 1.0 but touch classes in other packages (spread > 1). Clusters defined as well-confined have focus < 1.0 and spread = 1. Finally, clusters with crosscutting behavior have focus ≈ 0 and spread ≥ 3.
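A minimal sketch of these classification rules; the tolerance used for focus ≈ 1.0 and focus ≈ 0 is our own assumption, since the exact cutoff is not stated:

// Sketch of the cluster pattern classification described above.
class ClusterPatternClassifier {
    enum Pattern { WELL_ENCAPSULATED, PARTIALLY_ENCAPSULATED, WELL_CONFINED, CROSSCUTTING, OTHER }

    static final double EPS = 0.05; // assumed tolerance for the "≈" comparisons

    static Pattern classify(double focus, int spread) {
        if (focus == 1.0 && spread == 1)       return Pattern.WELL_ENCAPSULATED;
        if (focus >= 1.0 - EPS && spread > 1)  return Pattern.PARTIALLY_ENCAPSULATED;
        if (focus < 1.0 && spread == 1)        return Pattern.WELL_CONFINED;
        if (focus <= EPS && spread >= 3)       return Pattern.CROSSCUTTING;
        return Pattern.OTHER;
    }
}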
3. Use Case Scenario: Geronimo Web Application Server
In order to present ModularityCheck, we provide a usage scenario involving information from the Geronimo Web Application Server system, extracted from almost 10 years of history (08/20/2003 - 06/04/2013). Figure 3 shows the results concerning co-change clustering. A detailed discussion of such results is presented in a full technical paper [Silva et al. 2014].
3.1. Co-Change Extraction
First, our tool extracted 9,829 commits. We maintained the value for Maximum Scattering as 10, i.e., the tool discarded commits changing classes located in more than ten
packages. After the pre-processing tasks, only 1,406 commits were considered useful. However, we observed that about 44.4% of these commits change a single software artifact and therefore would not be used anyway in terms of co-change.
3.2. Co-Change Clustering
In the next step, small clusters are discarded according to the Minimum Cluster Size filter. The tool removed clusters with less than 4 classes, resulting in 21 clusters. The ratio between the final number of clusters and the number of packages in the system is 0.05%. This fact is an indication that the maintenance activity in the system is concentrated in a few classes.
Figure 3a shows standard descriptive statistics regarding the size, density, and average edge weight of the extracted co-change clusters. ModularityCheck presents the size of the extracted co-change clusters in terms of number of classes. The
extracted clusters have 7.48 ± 3.78 classes in Geronimo. Moreover, the biggest cluster
has a considerable number of classes: 20 classes. The tool also presents the density of
the extracted co-change clusters. The clusters have a density of 0.79 ± 0.23. We can also
analyze the average weight of the edges in the extracted co-change clusters. For a given
co-change cluster, we define this average as the sum of the weights of all edges divided
by the number of edges in the cluster. We can observe that the average edge weight is
not high, being slightly greater than two in Geronimo.
3.3. Modularity Analysis
ModularityCheck also provides a visualization, which relies on co-change clusters to assess the quality of a system's package decomposition. Basically, this visualization reveals the distribution of the co-change clusters over the package structure by using distribution maps. The tool also shows standard descriptive statistics regarding the focus and the spread of the co-change clusters. As presented in Figure 3a, the co-change clusters in Geronimo have high focus, with an average of 0.95. Regarding spread, the average is 3.19. Figure 3b shows the focus, spread, and
type of patterns for each cluster.
3.3.1. Geronimo Results
Figure 4 shows the distribution map for Geronimo. To improve the visualization, besides
background colors, we use a number in each class (small squares) to indicate their respective clusters. If we stop the mouse over a class, a tooltip is displayed with its respective
name. The large boxes are the packages and the text below is the package name.
Considering the clusters that are well-encapsulated (high focus) in Geronimo, we
found two relevant distribution patterns:
• Clusters well-encapsulated (focus = 1.0) in a single package (spread = 1). Four clusters have this behavior. As an example, we have Cluster 2, which dominates the co-change classes in the package main.webapp.WEBINF.view.realmwizard (line 1 in the map, column 9). Other examples are Cluster 5 (package mail, line 1 in the map, column 10) and Cluster 11 (package security.remoting.jmx, line 1, column 3).
• Clusters partially encapsulated (focus ≈ 1.0), but touching classes in other packages (spread > 1). As an example, we have Cluster 8 (focus = 0.97, spread = 2), which dominates the co-change classes in the package tomcat.model (line 1 and column 1 in the map), but also touches the class TomcatServerGBean from the package tomcat (line 2, column 8).

Figure 4. Distribution maps for Geronimo [Silva et al. 2014].
3.4. Practical Usage
ModularityCheck can support software architects in assessing modularity under an evolutionary view. It helps to detect co-change behavior patterns, as follows:
• When the package structure is adherent to the cluster structure, localized co-changes are likely to occur, as in Geronimo's clusters.
• When there is no clear adherence to the cluster structure, our tool detects two cluster patterns that may suggest modularity flaws. The first pattern denotes clusters with crosscutting behavior, not detected in Geronimo but detected in other systems presented in [Silva et al. 2014]. The second indicates partially encapsulated clusters, which suggest a possible ripple effect – when changes in a module can propagate to dependent modules – during maintenance activities.
4. Related Tools
Zimmermann et al. proposed ROSE, a tool that uses association rule mining on version
histories to recommend further changes [Zimmermann et al. 2005]. Their tool differs
from ours because they rely on association rules and we use co-change clusters that are
semantically related to a maintenance task. Furthermore, our goal is not to recommend
future changes but to assess modularity using distribution maps to compare and contrast
co-change clusters with the system’s packages.
ConcernMapper [Robillard and Weigand-Warr 2005] is an Eclipse Plug-in to organize and view concerns using a hierarchical structure similar to the package structure.
However, the concern model is created manually by developers and the relations between
concerns are typically syntactical and structural. On the other hand, in our tool, the elements and their relationships are obtained by mining the version history. Particularly,
relationships express co-changes and concerns are retrieved automatically by clustering
co-change graphs.
Wong et al. presented CLIO, a tool that detects and locates modularity violations [Wong et al. 2011]. CLIO compares how components should co-change according
to the modular structure and how components usually co-change retrieving information
from version history. A modularity violation is detected when two components usually change together but they belong to different modules, which are supposed to evolve
independently. CLIO identifies modularity violations by comparing the results of structural coupling with the results of change coupling. They compare association rules and
structural information to detect modularity violations. On the other hand, we retrieve cochange clusters and use distribution maps to reason about the projection of the extracted
clusters in the traditional decomposition of a software system in packages.
Palomba et al. proposed HIST, a tool that uses association rule mining on version
histories to detect the following code smells: Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy [Palomba et al. 2013]. HIST relies on changes at method-level granularity. For each smell, they defined a heuristic that relies on association rule discovery or that analyzes co-changed classes/methods to detect the smell. In contrast, our goal is not to detect code smells but to assess package decomposition using co-change clusters.
5. Conclusion
In this paper, we proposed a tool to assess modularity using evolutionary information. The tool extracts commits automatically from version histories and filters out noisy information by parsing issue reports. After that, the tool retrieves co-change clusters, computes a set of metrics concerning these clusters, and provides a visualization based on distribution maps. The central goal of ModularityCheck is to detect classes of the target system that usually change together, to help in assessing the package decomposition. Moreover, the co-change clusters can also be used as an alternative view during maintenance tasks to improve developers' comprehension of their tasks. The ModularityCheck tool is publicly available at: aserg.labsoft.dcc.ufmg.br/modularitycheck
Acknowledgement
This work was supported by CNPq, CAPES, and FAPEMIG.
References
Baldwin, C. Y. and Clark, K. B. (2003). Design Rules: The Power of Modularity. MIT
Press.
Bryton, S. and Brito e Abreu, F. (2008). Modularity-oriented refactoring. In 12th European Conf. on Soft. Maintenance and Reengineering (CSMR), pages 294–297.
Ducasse, S., Gîrba, T., and Kuhn, A. (2006). Distribution map. In 22nd IEEE International Conference on Software Maintenance (ICSM), pages 203–212.
Karypis, G., Han, E.-H. S., and Kumar, V. (1999). Chameleon: hierarchical clustering
using dynamic modeling. Computer, 32(8):68–75.
Kersten, M. and Murphy, G. C. (2006). Using task context to improve programmer productivity. In 14th International Symposium on Foundations of Software Engineering
(FSE), pages 1–11.
Palomba, F., Bavota, G., Penta, M. D., Oliveto, R., de Lucia, A., and Poshyvanyk, D.
(2013). Detecting bad smells in source code using change history information. In
28th IEEE/ACM International Conference on Automated Software Engineering (ASE),
pages 11–15.
Parnas, D. L. (1972). On the criteria to be used in decomposing systems into modules.
Communications of the ACM, 15(12):1053–1058.
Rebêlo, H., Leavens, G. T., Bagherzadeh, M., Rajan, H., Lima, R., Zimmerman, D. M., Cornélio, M., and Thüm, T. (2014). Modularizing crosscutting contracts with AspectJML. In 13th International Conference on Modularity (MODULARITY), pages 21–24.
Robillard, M. P. and Murphy, G. C. (2002). Concern graphs: finding and describing
concerns using structural program dependencies. In 24th International Conference on
Software Engineering (ICSE), pages 406–416.
Robillard, M. P. and Murphy, G. C. (2007). Representing concerns in source code. ACM
Transactions on Software Engineering and Methodology, 16(1):1–38.
Robillard, M. P. and Weigand-Warr, F. (2005). Concernmapper: simple view-based separation of scattered concerns. In OOPSLA workshop on Eclipse technology eXchange,
eclipse ’05, pages 65–69.
Santos, G., Santos, K., Valente, M. T., Serey, D., and Anquetil, N. (2013). Topicviewer:
Evaluating remodularizations using semantic clustering. In IV Congresso Brasileiro
de Software: Teoria e Prática (Sessão de Ferramentas), pages 1–6.
Santos, G., Valente, M. T., and Anquetil, N. (2014). Remodularization analysis using
semantic clustering. In IEEE Conference on Software Maintenance, Reengineering
and Reverse Engineering (CSMR-WCRE), pages 224–233.
Schwanke, R. (1991). An intelligent tool for re-engineering software modularity. In 13th
International Conference on Software Engineering (ICSE), pages 83–92.
Silva, L., Valente, M. T., and Maia, M. (2014). Assessing modularity using co-change
clusters. In 13th International Conference on Modularity, pages 49–60.
Vacchi, E., Olivares, D. M., Shaqiri, A., and Cazzola, W. (2014). Neverlang 2: A framework for modular language implementation. In 13th International Conference on Modularity (MODULARITY), pages 29–32.
Wong, S., Cai, Y., Kim, M., and Dalton, M. (2011). Detecting software modularity violations. In 33rd Int. Conference on Software Engineering (ICSE), pages 411–420.
Zimmermann, T., Weissgerber, P., Diehl, S., and Zeller, A. (2005). Mining version
histories to guide software changes. IEEE Transactions on Software Engineering,
31(6):429–445.
Nuggets Miner: Assisting Developers by Harnessing the
StackOverflow Crowd Knowledge and the GitHub
Traceability
Eduardo C. Campos¹, Lucas B. L. de Souza¹, Marcelo de A. Maia¹
¹ Department of Computer Science – Federal University of Uberlândia (UFU), Uberlândia, MG, 38400-902, Brazil
{eduardocunha11,lucas.facom.ufu}@gmail.com, [email protected]
Abstract. StackOverflow.com (SOF) is a Question and Answer service oriented to support collaboration among developers. The information available on this type of service is also known as "crowd knowledge" and is currently one important trend in supporting activities related to software development. GitHub.com (GitHub) is a successful social site for developers that makes unique information about users and their activities visible within and across open source software projects. The traceability of GitHub's issue tracker can be harnessed in the Integrated Development Environment (IDE) to assist software maintenance. We give form to our approach by implementing Nuggets Miner, an Eclipse plugin that recommends a ranked and interactive list of results to the user. Video Demo URL: https://www.youtube.com/watch?v=AjsbgUJl-nY
1. Introduction
Modern-day software development is inseparable from the use of the Application Programming Interfaces (APIs) [Duala-Ekoko and Robillard 2012]. Several studies have shown that developers face problems when dealing with unfamiliar APIs
[Duala-Ekoko and Robillard 2012, Holmes et al. 2006, Thung et al. 2013]. It is seldom
the case that the documentation and examples provided with a large framework or library
are sufficient for a developer to use their API effectively. Frequently, developers become
lost when trying to use an API, unsure of how to make a progress on a programming
task [Holmes et al. 2006]. A common behavior of developers is to post questions on social media services and receive answers from other programmers that belong to different
projects [Treude et al. 2011].
To help developers find their way, a widely-known alternative is StackOverflow (SOF), a Question and Answer (Q&A) website that uses social media to facilitate knowledge exchange between programmers, mitigating the pitfalls involved in using code from the Internet. Mamykina et al. conducted a statistical study of the entire SOF corpus to find out what is behind its immediate success. Their findings showed that a majority of the questions receive one or more answers (above 90%, very quickly, with a median answer time of 11 minutes) [Mamykina et al. 2011]. The set of information available on these social media services is called "crowd knowledge" and often becomes a substitute for the official software documentation [Treude et al. 2011].
Despite its usefulness, the knowledge provided by Q&A services cannot be directly leveraged from within an Integrated Development Environment (IDE), in the sense
that developers must switch to the Web browser to access those services. Moreover, when dealing with maintenance tasks, software developers often also need to know what changes were made in the project's past. Thus, developers are forced to explore the historical information of the project (e.g., issues and respective commits) [Robillard and Dagenais 2010]. In order to address this problem, we examined a successful social site called GitHub¹. This site makes unique information about users and their activities visible within and across open source software projects [Dabbish et al. 2012]. Furthermore, GitHub's issue tracker has excellent traceability, and this feature can be harnessed in the IDE (e.g., given a closed issue, we can explore the respective commit). Although GitHub's site has an integrated issue tracker, it is not possible to automatically search for issues related to a particular maintenance task. Thus, during a maintenance task, the developer is constantly reviewing the issue tracker in search of some issue related to his or her task. In summary, developers spend most of their time in the IDE writing and understanding code [LaToza et al. 2006], and they should be focused on the current task without any major interruption or disturbance [Raskin 2000]. Nevertheless, developers are forced to leave the IDE, thus interrupting the programming flow and lowering their focus on the current task, and possibly getting distracted with other activities on the Web.
To deal with those problems, recommendation systems can be a reasonable alternative. According to Robillard et al., a recommendation system for software engineering (RSSE) can assist developers during maintenance and development tasks by providing useful information (e.g., the right code for a task, a good example of API usage) [Robillard et al. 2010]. This information can be gathered from the "crowd knowledge" provided by Q&A services or from closed issues in GitHub related to the current maintenance task.
Considering StackOverflow, we can rely on regular dumps of the entire dataset to obtain the desired information. In the case of GitHub, the current project that the developer is working on must host its issues in GitHub's issue tracker instead of other issue trackers (e.g., Bugzilla²). Nuggets Miner extracts only issues in the CLOSED state (i.e., issues that were previously solved by other developers), displays the ranked issues directly in the IDE, and allows developers to select an issue and explore the historical changes made to the respective commit files.
Our work has the following contribution. We present Nuggets Miner, a recommendation system in the form of a plugin for the Eclipse IDE³ to assist software developers in development and maintenance tasks. Our recommendation strategy has been partially assessed in [Souza et al. 2014] (i.e., only recommendations from SOF were evaluated). There are several differences between this paper and [Souza et al. 2014], but these two are the most important: 1) that paper was not tool-oriented; indeed, no tool was presented; 2) that paper was only about SOF posts, whereas Nuggets Miner also indexes the project's issues.
The rest of this paper is organized as follows. In Section 2 we illustrate Nuggets Miner usage with a use case scenario. In Section 3 we present Nuggets Miner's components and its architecture. In Section 4 we discuss related work. In Section 5 we draw our conclusions.
1 https://github.com/
2 http://www.bugzilla.org/
3 http://eclipse.org
2. A Use Case Scenario
We show how Nuggets Miner can help developers to solve programming problems by
leveraging SOF and GitHub traceability from within the Eclipse IDE.
Bob is required to build a panel with three tabs using the Java Swing API. However, he is a novice with this library. Bob opens up the Eclipse IDE, with the Nuggets Miner plugin installed, and writes the following query in Nuggets Miner's Navigator: "tab pane java swing". Figure 1 shows the search results returned by the search engine for this query: Q&A pairs in the left panel and issues in the right panel. Concerning the StackOverflow panel, the search engine returns to Bob the top 15 Q&A pairs from SOF in a ranked list considering two main aspects: the textual similarity of the pairs with respect to the query and the quality of the pairs (whose content was previously evaluated by the SOF community). Among the recommended Q&A pairs, Bob finds a pair whose title is "JTabbedPane: show task progress in a tab". Figure 2 shows the content of a selected Q&A pair. He reads the Q&A pair and finds an accepted answer that creates an object of the JTabbedPane type and invokes the method public void addTab(String title, Icon icon, Component component) on this object. Bob can also import the code snippet given in the answer into the program's editor via drag & drop and execute the Java program (in this case without any modification). Thus, Bob can start modifying the code in the editor to achieve the desired outcome. Concerning the GitHub panel, in the list of returned issues, Bob can choose the issue that he thinks is most related to his activity. Figure 3 shows the content of a selected issue. He can visualize the conversations between Bob's colleagues about the selected issue (through the Conversation tab), which is supposed to be related to his current task. In Figure 3, it is also possible to visualize the list of commits with their respective links (through the Commits tab). When Bob clicks on a commit's link, another page opens showing the code modified by the commit. Figure 4 shows a snapshot of the commit selected by Bob.
Figure 1. Nuggets Miner’s Navigator: Search Results.
3. Nuggets Miner
In this section, we present Nuggets Miner’s architecture (Subsection 3.1), the mechanism
for data collection and classification (Subsection 3.2) and the query engine (Subsection
3.3).
Figure 2. Nuggets Miner’s Document’s Content for the Q&A pair selected by Bob.
Figure 3. Nuggets Miner’s Issue’s Content for the issue selected by Bob.
Figure 4. Snapshot of commit selected by Bob.
3.1. The Architecture
Figure 5 depicts Nuggets Miner’s architecture. The left part of this figure represents the
server side, while the right side represents the client side (i.e., the graphical user interface
and features of the plugin).
On the server side, there is a component for collection and classification of data,
which is responsible for collecting and classifying Q&A pairs from SOF. Through this
component, we can also collect issues in the CLOSED state, along with the respective commits, for a given project of interest hosted on GitHub. Therefore, the Apache Solr⁴ index is constructed with Q&A pairs from SOF and with issues and respective commits of the developer's project hosted on GitHub.
Figure 5. Nuggets Miner’s architecture.
The client side is responsible for querying the Apache Solr index, parsing the JSON response (converting the JSON into an object-oriented representation for further manipulation), applying the methodology for ranking the Q&A pairs, applying the methodology for ranking the related issues, and presenting these search results to the user.

The ranking criteria for Q&A pairs are based on two main aspects: the textual similarity of the pairs with respect to the query and the quality of the pairs (assessed by SOF community members), while the ranking for GitHub issues takes into account only the textual similarity of the issues with respect to the query.
3.2. Mechanism for Data Collection and Classification
In this subsection, we explain the mechanism for data collection (Subsection 3.2.1) and
the mechanism for data classification for Q&A pairs (Subsection 3.2.2).
3.2.1. Data Collection
We downloaded a release of SOF public data dump 5 provided by Stack Exchange 6 ,
which comprises several XML files that represent the database of each website. Since
performing these operations by manipulating data directly from XML files is resource
intensive, we imported everything in a relational database in order to classify the SOF
Q&A pairs. The “posts” table of this database stored all questions posted by questioners
4 http://lucene.apache.org/solr/
5 http://blog.stackexchange.com/category/cc-wiki-dump/
6 http://stackexchange.com/
in the website until the date the dump was performed. This table also stores all answers
that were given to each question, if any.
To retrieve the issues in the CLOSED state from a GitHub project, we developed
another program that connects to the GitHub server (supplying the username and password) and
downloads these issues from a given repository (e.g., in our study we considered a Swing look-and-feel project called Insubstantial 7 that is hosted on GitHub 8).
Our program used an object-oriented GitHub API 9. Then, for each retrieved issue, the
program stores the issue data (e.g., "issue title", "issue body", "issue id", "commit address of the issue in GitHub", "code modified by the commit") in an XML file in the
format required by the Apache Solr search engine. For instance, the issue whose "id" is #124
belongs to the repository "Insubstantial/insubstantial". The "issue title" of this issue is:
"Modify base delay of TabWindowPreview". The commit address of this issue in GitHub
is: "https://github.com/Insubstantial/insubstantial/pull/124/commits". This page will be
displayed inside the browser of the Nuggets Miner plugin.
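As an illustration, the retrieval step could look like the hedged Java sketch below, using the kohsuke github-api library referenced above; the credentials are placeholders and error handling is omitted.

import org.kohsuke.github.*;
import java.util.List;

public class IssueCollector {
    public static void main(String[] args) throws Exception {
        // Connect to GitHub with a username and password, as described above.
        GitHub github = GitHub.connectUsingPassword("user", "password");
        GHRepository repo = github.getRepository("Insubstantial/insubstantial");
        // Download all issues in the CLOSED state from the repository.
        List<GHIssue> closed = repo.getIssues(GHIssueState.CLOSED);
        for (GHIssue issue : closed) {
            // Each issue's data (id, title, body, ...) would then be written
            // to an XML file in the format required by Apache Solr.
            System.out.println(issue.getNumber() + ": " + issue.getTitle());
        }
    }
}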
3.2.2. Data Classification for Q&A pairs
On SOF, users ask many different kinds of questions. According to Nasehi et al.
[Nasehi et al. 2012], "questions from SOF can also be classified in a second dimension
that is about the main concerns of the questioners and what they wanted to solve". In this
dimension, one of the categories is How-to-do-it, in which the questioner provides a
scenario and asks how to implement it. This category is very close to the scenario in
which a developer has a programming task at hand and needs to solve it. For this reason, in our approach, we only consider Q&A pairs that are classified as How-to-do-it. In
order to automate the selection of this kind of Q&A pair, we developed a classification
strategy. The information about the categories of Q&A pairs proposed in this study, the
classifier's attributes, and the steps to build the dataset for training/testing the classifier are
described in more detail in [Souza et al. 2014].
We used this classifier to automatically classify Q&A pairs of a pre-selected set
of APIs (Swing for Java, Boost for C++, and LINQ for C#) into one of three categories:
How-to-do-it, Conceptual, and Seeking-something. The Apache Solr index was populated
only with Q&A pairs of the How-to-do-it category from this pre-selected set of APIs.
3.3. The Query Engine
Nuggets Miner's Eclipse plugin makes the Q&A "crowd knowledge" and the closed issues
of a working GitHub project available in the IDE. Users can interact with this "crowd
knowledge" in ways that the SOF website normally does not allow, such as importing code
snippets into the program's editor via simple drag & drop. The main goal of the query
engine is to communicate with Apache Solr by creating a query from an input string.
A Q&A pair must have some matching information in its "title", "question body"
or "answer body" to be returned by the search engine. Likewise, a GitHub
issue must have some matching information in its "issue title", "issue body" or "code modified by
7: http://shemnon.com/speling/2011/04/insubstantial-62-release.html
8: https://github.com/Insubstantial/insubstantial
9: http://github-api.kohsuke.org/
the commit" to be returned by the search engine. As stated above, the query engine
simultaneously queries the Apache Solr index for both Q&A pairs and issues similar to the
entered query. The query engine tokenizes the string inserted by the developer and
builds the query, according to Apache Solr syntax, in such a way that every token must
be present in the document fields.
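A minimal sketch of such a query builder is shown below; the field identifiers and the exact query shape are assumptions for illustration, not necessarily the plugin's actual code.

import java.util.Arrays;
import java.util.stream.Collectors;

public class QueryBuilder {
    // Fields assumed to be indexed for Q&A pairs (identifiers are illustrative).
    private static final String[] FIELDS = {"title", "question_body", "answer_body"};

    // Build a Solr query in which every token must appear in at least
    // one of the indexed fields (tokens joined with AND).
    public static String build(String input) {
        String[] tokens = input.trim().toLowerCase().split("\\s+");
        return Arrays.stream(tokens)
            .map(t -> "(" + Arrays.stream(FIELDS)
                .map(f -> f + ":" + t)
                .collect(Collectors.joining(" OR ")) + ")")
            .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        // e.g. (title:resize OR question_body:resize OR answer_body:resize)
        //      AND (title:jpanel OR ...)
        System.out.println(build("resize JPanel"));
    }
}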
4. Related Work
Ponzanelli et al. [Ponzanelli et al. 2013] presented an approach to assist programmers
who want to leverage the "crowd knowledge" of Q&A services. They implemented SEAHAWK, a recommendation system in the form of a plugin for the Eclipse IDE, to harness
the "crowd knowledge" of SOF from within the IDE. In our work, we introduced a more
efficient ranking mechanism than SEAHAWK's and provided the GitHub access point. We
used the SEAHAWK software 10 to help us develop our plugin.
Cordeiro et al. [Cordeiro et al. 2012] presented an Eclipse plugin to help developers in problem-solving tasks. Based on an exception's stack trace gathered from the IDE's
console, they suggest related documents from SOF.
Hipikat [Čubranić et al. 2004] is a recommendation system developed to support newcomers in a project by recommending items from problem reports, newsgroup messages,
and articles. Our approach recommends the project's issues in the CLOSED state related to
the maintenance task at hand.
Takuya and Masuhara presented SELENE [Takuya and Masuhara 2011], a source code
recommendation tool based on an associative search engine. It spontaneously searches
for and displays example programs while the developer is editing program text. Our work
also relies on search engines, but we suggest Q&A pairs taken from SOF to enrich the
information provided by code snippets.
5. Conclusions
We presented a novel approach to leverage the SOF “crowd knowledge” and the GitHub
traceability. We have detailed the implementation of our approach, Nuggets Miner. This
recommendation system allows users interact with SOF knowledge by importing code
snippets. It also allows users navigate through related issues previously solved by others
developers. Thus, users can explore the respective commit for the recommended issue and
see the modifications. As future work, we intend to improve the evaluation of Nuggets
Miner with human subjects to assess the performance gains compared to the use of an
external browser.
6. Acknowledgments
This work was partially supported by FAPEMIG grant CEXAPQ-2086-11 and CNPQ
grant 475519/2012-4.
References
Cordeiro, J., Antunes, B., and Gomes, P. (2012). Context-based recommendation to support problem solving in software development. In Proceedings of the 3rd Int. Workshop on
RSSE, pages 85–89.
10: http://seahawk.inf.usi.ch/download.html
Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). Social Coding in Github:
Transparency and collaboration in an open software repository. CSCW ’12, pages
1277–1286. ACM.
Duala-Ekoko, E. and Robillard, M. P. (2012). Asking and answering questions about
unfamiliar apis: An exploratory study. In Proc. of ICSE’2012, pages 266–276. IEEE
Press.
Holmes, R., Walker, R. J., and Murphy, G. C. (2006). Approximate structural context
matching: An approach to recommend relevant examples. IEEE Trans. Softw. Eng.,
32(12):952–970.
LaToza, T. D., Venolia, G., and DeLine, R. (2006). Maintaining mental models: A study
of developer work habits. In Proc. of ICSE’2006, pages 492–501. ACM.
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., and Hartmann, B. (2011). Design
lessons from the fastest q&a site in the west. In Proc. of the SIGCHI Conference on
Human Factors in Computing Systems, pages 2857–2866. ACM.
Nasehi, S., Sillito, J., Maurer, F., and Burns, C. (2012). What makes a good code example?
A study of programming Q&A in Stack Overflow. In Proc. of ICSM’2012, pages 25–
34.
Ponzanelli, L., Bacchelli, A., and Lanza, M. (2013). Leveraging crowd knowledge for
software comprehension and development. In Cleve, A., Ricca, F., and Cerioli, M.,
editors, Proc. of CSMR’2013, pages 57–66. IEEE Computer Society.
Raskin, J. (2000). The Humane Interface: New Directions for Designing Interactive
Systems. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA.
Robillard, M. P. and Dagenais, B. (2010). Recommending change clusters to support
software investigation: An empirical study. J. Softw. Maint. Evol., 22(3):143–164.
Robillard, M. P., Walker, R. J., and Zimmermann, T. (2010). Recommendation systems
for software engineering. IEEE Software, 27(4):80–86.
Souza, L., Campos, E., and Maia, M. (2014). Ranking crowd knowledge to assist software
development. In Proc. of ICPC’2014, pages 1–11.
Takuya, W. and Masuhara, H. (2011). A spontaneous code recommendation tool based on
associative search. In Proceedings of the 3rd International Workshop on Search-Driven
Development, pages 17–20. ACM.
Thung, F., Wang, S., Lo, D., and Lawall, J. L. (2013). Automatic recommendation of api
methods from feature requests. In ASE, pages 290–300. IEEE.
Treude, C., Barzilay, O., and Storey, M.-A. (2011). How do programmers ask and answer
questions on the web? (nier track). In Proc. of ICSE’2011, pages 804–807. ACM.
Čubranić, D., Murphy, G. C., Singer, J., and Booth, K. S. (2004). Learning from project
history: A case study for software development. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, CSCW '04, pages 82–91. ACM.
NextBug: A Tool for Recommending Similar Bugs in
Open-Source Systems
Henrique S. C. Rocha1 , Guilherme A. de Oliveira2 ,
Humberto T. Marques-Neto2 , Marco Túlio O. Valente1
1 Department of Computer Science – Federal University of Minas Gerais (UFMG)
Belo Horizonte – MG – 31.270-901 – Brazil
2 Department of Computer Science – Pontifical Catholic University of Minas Gerais (PUC Minas)
Belo Horizonte – MG – 30.535-901 – Brazil
[email protected], [email protected]
[email protected], [email protected]
Abstract. Due to the characteristics of the maintenance process of open-source
systems, grouping similar bugs to improve developers' productivity is a challenging task. In this paper, we propose and evaluate a tool, called NextBug, for
recommending similar bugs in open-source systems. NextBug is implemented
as a Bugzilla plug-in and was designed to help maintainers select the next bug
to fix. We also report an experience of using NextBug with 109,145
bugs previously reported for Mozilla products.
Video URL: <http://youtu.be/Tt69zVobnF8>
1. Introduction
Considering the great importance, the costs, and the increasing complexity of software
maintenance activities, most organizations usually maintain their systems by performing tasks periodically, i.e., maintenance requests are grouped and implemented as part
of large software projects [Tan and Mookerjee 2005; Aziz et al. 2009; Junio et al. 2011;
Marques-Neto et al. 2013]. On the other hand, open-source projects typically adopt continuous maintenance policies where the maintenance requests are addressed by maintainers with different skills and commitment levels, as soon as possible, after being
registered in an issue tracking platform, such as Bugzilla and Jira [Mockus et al. 2002;
Tan and Mookerjee 2005; Liu et al. 2012].
However, this process is usually uncoordinated, which results in a high number of
issues, many of which are invalid or duplicated [Liu et al. 2012]. In 2005, a certified
maintainer from the Mozilla Software foundation made the following comment on this
situation: “everyday, almost 300 bugs appear that need triaging. This is far too much for
only the Mozilla programmers to handle” [Anvik et al. 2006]. The dataset formed by bugs
reported for the Mozilla projects indicates that, in 2011, the number of reported issues per
year increased approximately 75% when compared to 2005. In this context, tools to
assist in the issue processing would be very helpful and could contribute to increase the
productivity of open-source systems development.
In this paper, we claim that a very simple form of periodic maintenance policy can
be promoted in open-source systems by recommending similar maintenance requests to
maintainers whenever they manifest interest in handling a given request. Suppose that a
developer has manifested interest in a bug with a textual description $d_i$. In this case, we
rely on text mining techniques to retrieve open bugs with descriptions $d_j$ similar to $d_i$,
and we recommend such bugs to the maintainers.
More specifically, we present NextBug, a tool to recommend similar bugs to maintainers based on the textual description of each bug stored in Bugzilla, an issue tracking
system widely used by open-source projects. The proposed tool is compatible with the
software development process followed by open-source systems for the following reasons: (a) it is based on recommendations and, therefore, maintainers are not required to
accept extra bugs to fix; (b) it is a fully automatic and unsupervised approach which does
not depend on human intervention; and (c) it relies on information readily available in
Bugzilla. Assuming the recommendations effectively denote similar bugs and supposing that maintainers accept the recommendations pointed out by NextBug, the
tool can contribute to gains of scale similar to the ones achieved with periodic
policies [Banker and Slaughter 1997]. We also report a field study in which we populated
NextBug with a dataset of bugs reported for Mozilla systems.
The remainder of this paper is organized as follows. Section 2 discusses tools for
finding duplicated issue reports in bug tracking systems and also tools that assign bugs
to developers. The architecture and the central features of NextBug are described in
Section 3. An example of usage is presented in Section 4. Finally, conclusions are offered
in Section 5.
2. Related Tools
Most open-source systems adopt an Issue Tracking System (ITS) to support their maintenance processes. Normally, in such systems both users and testers can report modification
requests [Liu et al. 2012]. This practice usually results in a continuous maintenance process where maintainers address the change requests as soon as possible. The ITS provides
a central knowledge repository which also serves as a communication channel for geographically distributed developers and users [Anvik et al. 2006; Ihara et al. 2009].
Recent studies have focused on finding duplicated issue reports in bug tracking
systems. Duplicated reports can hamper the bug triaging process and may drain maintenance resources [Cavalcanti et al. 2013]. Typically, studies for finding duplicated issues
rely on traditional information retrieval techniques such as natural language processing,
vector space model, and cosine similarity [Alipour et al. 2013].
Approaches to infer the most suitable developer to correct a software issue are also
reported in the literature. Most of them can be viewed as recommendation systems that
suggest developers who can handle a reported bug. For instance, [Anvik and Murphy 2011] proposed an approach based on supervised machine learning that requires training to create
a classifier. This classifier assigns the data (bug reports) to the closest developer.
However, to the best of our knowledge, we are not aware of any tool designed to
recommend similar bugs to maintainers of open-source systems.
Figure 1. NextBug Screenshot (similar bugs are shown in the lower right corner)
3. NextBug in a Nutshell
In this section, we present NextBug1 main features (Section 3.1). We also present the
tool’s architecture and main components (Section 3.2).
3.1. Main Features
Currently, there are several ITSs that are used in software maintenance such as Bugzilla,
Jira, Mantis, and RedMine. NextBug was implemented as a Bugzilla plug-in mainly
because this ITS is used by the Mozilla project, which was used to validate our tool.
When a developer is analyzing or browsing an issue, NextBug can recommend
similar bugs in the usual Bugzilla web interface. As described in Section 3.2, NextBug
uses a textual similarity algorithm to assess the similarity among bug reports.
Figure 1 shows a usage example of our tool. This figure shows a real bug from the
Mozilla project, which refers to a FirefoxOS application issue related to a mobile device
camera (Bug 937928). As we can observe, Bugzilla shows detailed information about
this bug, such as a summary description, creation date, product, component, operating
system, and hardware information. NextBug extends this original interface by showing
a list of bugs similar to the browsed one. This list is shown in the lower right corner.
Another important feature is that NextBug is only executed if its Ajax link is clicked;
thus, it will not cause additional overhead or hinder performance for developers who do
not wish to use similar bug recommendations.
In Figure 1, NextBug suggested three bugs similar to the one browsed
in the screenshot. As we can note, NextBug not only detects similar bugs but also
calculates an index to express this similarity. Our final goal is to guide the developer's
workflow by suggesting bugs similar to the one he/she is currently browsing. If a developer chooses to handle one of the recommended bugs, we claim he/she can minimize the
context change inherent to the task of handling different bugs and, consequently, improve
his/her productivity.
1: NextBug is open-source and available under the Mozilla Public License (MPL) at <http://aserg.labsoft.dcc.ufmg.br/nextbug/>.
Figure 2. NextBug Architecture
3.2. Architecture and Algorithms
Figure 2 shows NextBug’s architecture, including the system main components and the
interaction among them. As described in Section 3.1, NextBug is a plug-in for Bugzilla.
Therefore, it is implemented in Perl, the same language used in the implementation of
Bugzilla. Basically, NextBug instruments the Bugzilla interface used for browsing and
for selecting bugs reported for a system. NextBug registers an Ajax event in this interface
that calls NextBug passing the browsed issue as an input parameter.
NextBug's architecture has two central components: the Information Retrieval (IR)
Process and the Recommender. The IR Process component obtains all open issues currently
available in the Bugzilla system along with the browsed issue. Then it relies on
IR techniques for natural language processing, such as tokenization, stemming, and
stop-word removal [Runeson et al. 2007]. We implemented all these techniques in Perl.
After this processing, the issues are transformed into vectors using the Vector Space
Model (VSM) [Baeza-Yates and Ribeiro-Neto 1999; Runeson et al. 2007]. VSM is a classical
information retrieval model to process documents and to quantify their similarities. The
usage of VSM is accomplished by decomposing the data (available bug reports and
queries) into t-dimensional vectors, assigning weights to each indexed term. The weights
$w_i$ are positive real numbers that represent the i-th index term in the vector. To calculate
$w_i$ we used the following equation, which is called the tf-idf weight formula:

$$w_i = (1 + \log_2 f_i) \times \log_2 \frac{N}{n_i}$$
where $f_i$ is the frequency of the i-th term in the document, $N$ is the total number of
documents, and $n_i$ is the number of documents in which the i-th term occurs.
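For illustration, a minimal Java sketch of this weighting, assuming the counts fi, N, and ni are already known, could be:

public class TfIdf {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // wi = (1 + log2 fi) * log2(N / ni); zero when the term is absent.
    static double weight(int fi, int N, int ni) {
        if (fi == 0) return 0.0;
        return (1 + log2(fi)) * log2((double) N / ni);
    }

    public static void main(String[] args) {
        // e.g. a term occurring 3 times in a bug report, present in 10
        // of 1,000 documents: (1 + log2 3) * log2 100 ~ 17.2
        System.out.println(weight(3, 1000, 10));
    }
}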
The Recommender component receives the processed issues and verifies which ones are
similar to the browsed issue. The similarity is computed using the cosine similarity measure [Baeza-Yates and Ribeiro-Neto 1999; Runeson et al. 2007]. More specifically, the
similarity between the vectors of a document $d_j$ and a query $q$ is described by the following equation, which is called the cosine similarity because it measures the cosine of the
angle between the two vectors:

$$Sim(d_j, q) = \cos(\Theta) = \frac{\vec{d_j} \cdot \vec{q}}{\|\vec{d_j}\| \times \|\vec{q}\|} = \frac{\sum_{i=1}^{t} w_{i,d} \times w_{i,q}}{\sqrt{\sum_{i=1}^{t} (w_{i,d})^2} \times \sqrt{\sum_{i=1}^{t} (w_{i,q})^2}}$$
Since all the weights are greater than or equal to zero, we have $0 \leq Sim(d_j, q) \leq 1$,
where zero indicates that there is no relation between the two vectors, and one indicates
the highest possible similarity, i.e., both vectors are actually the same.
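A minimal Java sketch of this computation, assuming both vectors share the same term indexing with tf-idf weights, could be:

public class CosineSimilarity {
    static double sim(double[] d, double[] q) {
        double dot = 0, normD = 0, normQ = 0;
        for (int i = 0; i < d.length; i++) {
            dot   += d[i] * q[i];
            normD += d[i] * d[i];
            normQ += q[i] * q[i];
        }
        if (normD == 0 || normQ == 0) return 0; // empty vector: no relation
        return dot / (Math.sqrt(normD) * Math.sqrt(normQ));
    }

    public static void main(String[] args) {
        double[] doc   = {0.5, 1.2, 0.0};
        double[] query = {0.5, 1.2, 0.0};
        System.out.println(sim(doc, query)); // 1.0: identical vectors
    }
}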
The issues are then ordered according to their similarity before being returned to
Bugzilla. Since NextBug is triggered by an Ajax event, the recommendations are shown in
the same Bugzilla interface used by developers for browsing and selecting bugs to fix.
4. Evaluation
We used a dataset with bugs from the Mozilla project to evaluate the proposed tool.
Mozilla is composed of 69 products from different domains which are implemented in
different programming languages. Mozilla project includes some popular systems such
as Firefox, Thunderbird, SeaMonkey, and Bugzilla. We considered only issues that were
actually fixed from January 2009 to October 2012. More specifically, we ignored issue
types such as “duplicated”, “incomplete”, and “invalid”.
Mozilla issues are also classified according to their severity on the following scale:
blocker, critical, major, normal, minor, and trivial. Table 1 shows the number and the
percentage of each of these severity categories in the considered dataset. This scale also
includes enhancement as a particular severity category; however, it was not considered
in our study, i.e., we do not provide recommendations for similar enhancements.
Table 1. Issues per Severity

Severity        Issues              Days to Resolve
                Number       %      Min    Max     Avg      Dev     Med
blocker           2,720    2.08       0     814    15.44    52.25     1
critical          7,513    5.76       0    1258    37.87    99.52     6
enhancement       3,600    2.76       0    1285   126.14   195.25    40
major             7,508    5.75       0    1275    41.59   109.83     5
minor             3,660    2.80       0    1355    77.05   161.72    11
normal          103,385   79.23       0    1373    46.27   108.84     8
trivial           2,109    1.62       0    1288    80.84   164.74    11
Total           130,495     100       –       –        –        –     –
Final Dataset   109,145   83.64       –       –        –        –     –
Table 1 also shows the number of days required to fix the issues in each category.
We can observe that blocker bugs are quickly corrected by developers, showing the lowest values for the maximum, average, standard deviation, and median measures among the
considered categories. The presented lifetimes also indicate that issues with critical and
major severity are close to each other. Finally, enhancements are very different from the
others, showing the highest values for average, standard deviation, and median.
Issues marked as blocker, critical, or major were not considered in our evaluation
because developers have to fix them as quickly as possible. In other words, they would
probably not consider fixing other issues together, since their ultimate priority is to fix
the main blocker issue. Thus, our dataset is formed by fixed issues classified as
normal, minor, and trivial. These issues account for 109,154 bugs (83.64%) of our initial
population of bugs available for the NextBug evaluation.
We used three metrics in our evaluation: Feedback, Precision, and Likelihood.
These metrics were inspired by the evaluation of the ROSE recommendation
system [Zimmermann et al. 2004]. Feedback is the ratio of queries for which NextBug
makes at least k recommendations. Precision indicates the percentage of recommendations that were actually relevant among the top-k suggestions of NextBug. Finally, Likelihood indicates whether at least one relevant recommendation is included in NextBug's
top-k suggestions.
In this evaluation, we defined a relevant recommendation as one that shares the
same developer with the main issue. More specifically, we consider that a recently created issue q is connected to a second open issue when they are handled by the same
developer. The assumption in this case is that our approach fosters productivity gains
whenever it recommends issues that were later fixed anyway by the same developer.
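For illustration, the sketch below computes the three metrics in Java under the simplifying assumption that each query records how many recommendations it received and how many of its top-k suggestions were relevant; it is not NextBug's actual evaluation code.

import java.util.List;

public class Metrics {
    // For each query: recommendations produced and relevant hits in the top-k.
    record Query(int produced, int relevantInTopK) {}

    // Feedback: ratio of queries with at least k recommendations.
    static double feedback(List<Query> qs, int k) {
        return qs.stream().filter(q -> q.produced() >= k).count() / (double) qs.size();
    }

    // Precision: fraction of relevant suggestions among the top-k,
    // averaged over the queries that received recommendations.
    static double precision(List<Query> qs, int k) {
        return qs.stream().filter(q -> q.produced() > 0)
                 .mapToDouble(q -> q.relevantInTopK() / (double) Math.min(k, q.produced()))
                 .average().orElse(0);
    }

    // Likelihood: ratio of queries with at least one relevant top-k suggestion.
    static double likelihood(List<Query> qs) {
        return qs.stream().filter(q -> q.relevantInTopK() > 0).count() / (double) qs.size();
    }

    public static void main(String[] args) {
        List<Query> qs = List.of(new Query(3, 1), new Query(0, 0), new Query(5, 2));
        System.out.println(feedback(qs, 1));  // 2/3
        System.out.println(precision(qs, 3)); // (1/3 + 2/3) / 2 = 0.5
        System.out.println(likelihood(qs));   // 2/3
    }
}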
Figure 3 shows the average feedback (left chart), precision (central chart) and
likelihood (right chart) up to k = 5.
Figure 3. Average Evaluation Results for k = 1 to k = 5.
We summarize our results as follows:
• We achieved a feedback of 0.63 for k = 1. Therefore, on average, NextBug made
at least one suggestion for 63% of the bugs, i.e., for every five bugs NextBug was
able to provide at least one similar recommendation for three of them. Moreover,
NextBug showed on average 3.2 recommendations per query.
• We achieved a precision of 0.31 or more for all values of k. In other words, the
NextBug recommendations were on average 31% relevant (i.e., further handled by
the same developer), no matter how many suggestions were given.
• We achieved a likelihood of 0.54 for k = 3. More specifically, in about 54% of
the cases, there is a top-3 recommendation that was later handled by the same
developer responsible for the original bug.
We also conducted a survey with Mozilla developers using our tool. We gave
recommendations suggested by NextBug to 176 Mozilla maintainers and asked them a
few questions. Our summarized results are: (i) 77% found our recommendations relevant;
(ii) 85% confirmed that a tool to recommend similar bugs would be useful to the Mozilla
community and would allow them to do more work in less time.
4.1. Example of Recommendation
Table 2 presents an example of a bug (the browsed or main issue) opened for the component
DOM:Device Interfaces of the Core Mozilla product and the first three recommendations (top-3) suggested by our tool for this bug. As we can observe in the summary descriptions, both the query and the recommendations require maintenance in the Device
Storage API, used by Web apps to access local file systems. Moreover, all four issues
were handled by the same developer (Dev ID 302291).
Table 2. Example of Recommendation

          Similarity  Bug ID  Summary                                             Creation Date  Fix Date
Browsed       –       788588  Device Storage - Default location for device        2012-09-05     2012-09-06
                              storage on windows should be NS WIN PERSONAL DIR
Top-1        56%      754350  Device Storage - Clean up error strings             2012-05-11     2012-10-17
Top-2        47%      788268  Device Storage - Convert tests to use public types  2012-09-04     2012-09-06
Top-3        42%      786922  Device Storage - use a properties file instead of   2012-08-29     2012-09-06
                              the mime service
We can also observe that the three recommended issues were created before the
original query. In fact, the developer fixed the bugs associated with the second and
third recommendations on the same date on which he fixed the original query, i.e., on
2012-09-06. However, he only resolved the other recommended bug (ID 754350) 41 days
later, i.e., on 2012-10-17. Therefore, our approach would have helped this maintainer to
quickly discover the related issues. This task would probably have demanded more effort without a
recommendation automatically provided by a tool like NextBug.
5. Conclusion
This paper presented NextBug, a tool for recommending similar bugs. NextBug is implemented as a plug-in for Bugzilla, a widely used Issue Tracking Systems (ITS), specially
used by open-source systems. The proposed tool relies on information retrieval techniques to extract semantic information from issue reports in order to identify the similarity
of open bugs with the one that is being handled by a developer.
We evaluate the NextBug with a dataset of 109,154 Mozilla bugs, achieving feedback results of 63%, precision results around 31% and likelihood results greater than 54%.
Those results are very reasonable compared to other recommendation tools.
We also conducted a survey with 176 Mozilla developers using recommendations
provided by NextBug. Of these developers, 77% thought our recommendations
were relevant and 85% confirmed that a tool like NextBug would be useful to the Mozilla
community.
6. Acknowledgements
This work was supported by CNPq, CAPES, and FAPEMIG.
References
[Alipour et al. 2013] Alipour, A., Hindle, A., and Stroulia, E. (2013). A contextual approach towards more accurate duplicate bug report detection. In 10th Working Conference on Mining Software Repositories (MSR), pages 183–192.
[Anvik et al. 2006] Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this
bug? In 28th International Conference on Software engineering (ICSE), pages 361–
370.
[Anvik and Murphy 2011] Anvik, J. and Murphy, G. C. (2011). Reducing the effort of
bug report triage: recommenders for development-oriented decisions. ACM Transactions on Software Engineering Methodology (TOSEM), 20(3):10:1–10:35.
[Aziz et al. 2009] Aziz, J., Ahmed, F., and Laghari, M. (2009). Empirical analysis of
team and application size on software maintenance and support activities. In 1st International Conference on Information Management and Engineering (ICIME), pages
47–51.
[Baeza-Yates and Ribeiro-Neto 1999] Baeza-Yates, R. A. and Ribeiro-Neto, B. (1999).
Modern information retrieval. Addison-Wesley, 2nd edition.
[Banker and Slaughter 1997] Banker, R. D. and Slaughter, S. A. (1997). A field study of
scale economies in software maintenance. Management Science, 43:1709–1725.
[Cavalcanti et al. 2013] Cavalcanti, Y. C., Mota Silveira Neto, P. A., Lucrédio, D., Vale,
T., Almeida, E. S., and Lemos Meira, S. R. (2013). The bug report duplication problem:
an exploratory study. Software Quality Journal, 21(1):39–66.
[Ihara et al. 2009] Ihara, A., Ohira, M., and Matsumoto, K. (2009). An analysis method
for improving a bug modification process in open source software development. In
7th International Workshop Principles of Software Evolution and Software Evolution
(IWPSE-Evol), pages 135–144.
[Junio et al. 2011] Junio, G., Malta, M., de Almeida Mossri, H., Marques-Neto, H., and
Valente, M. (2011). On the benefits of planning and grouping software maintenance
requests. In 15th European Conference on Software Maintenance and Reengineering
(CSMR), pages 55–64.
[Liu et al. 2012] Liu, K., Tan, H. B. K., and Chandramohan, M. (2012). Has this bug
been reported? In 20th ACM SIGSOFT International Symposium on the Foundations
of Software Engineering (FSE), pages 28:1–28:4.
[Marques-Neto et al. 2013] Marques-Neto, H., Aparecido, G. J., and Valente, M. T.
(2013). A quantitative approach for evaluating software maintenance services. In
28th ACM Symposium on Applied Computing (SAC), pages 1068–1073.
[Mockus et al. 2002] Mockus, A., Fielding, R. T., and Herbsleb, J. D. (2002). Two case
studies of open source software development: Apache and Mozilla. ACM Transactions
on Software Engineering and Methodology, 11(3):309–346.
[Runeson et al. 2007] Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of duplicate defect reports using natural language processing. In 29th International
Conference on Software Engineering (ICSE), pages 499–510.
[Tan and Mookerjee 2005] Tan, Y. and Mookerjee, V. (2005). Comparing uniform and
flexible policies for software maintenance and replacement. IEEE Transactions on
Software Engineering, 31(3):238–255.
[Zimmermann et al. 2004] Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A.
(2004). Mining version histories to guide software changes. In 26th International
Conference on Software Engineering (ICSE), pages 563–572.
FunTester: A fully automatic functional testing tool
Thiago Delgado Pinto1,2, Arndt von Staa2 *
1 Informatics Department – Federal Center of Technological Education (CEFET/RJ)
28.635-000 – Nova Friburgo – RJ – Brazil
2 Informatics Department – Pontifical Catholic University of Rio de Janeiro (PUC-Rio)
22.453-900 – Rio de Janeiro – RJ – Brazil
{tpinto,arndt}@inf.puc-rio.br
Abstract. This paper presents a free, multi-language, model-based testing tool
that uses use cases and their business rules to generate relevant functional
tests with test data and oracles. These business rules can describe constraints
about data located in external sources, such as relational databases, and the
tool can use such data to generate tests. The tool also executes the tests and
analyzes their results.
Video available at http://funtester.org/video
1. Introduction
Over the last years, researchers have been using Model-Based Testing (MBT) to address
the problem of the automatic test generation. Tools like [1], [2], and [3] use interesting
approaches that allow software engineers to derive the system under test (SUT) model
from use case scenarios written in structured or natural languages, without needing a
formal modeling expertise. Such tools were also successful in presenting a set of
directives to guide the test generation process in an automated manner. However, these
directives did not tackle the test data generation or the automatic oracle generation, both
very important to create effective tests.
In previous work [4], we described a successful approach to solve these
problems. In this paper we present FunTester, 1 an open-source tool that implements our
approach and tries to fill these gaps, using business rules and techniques like
equivalence partitioning and boundary value analysis to generate test data and
oracles. The tool is intended for a wide range of applications, such as
websites and form-based desktop and mobile applications.
2. Overview
The tool provides a GUI to help the user document functional requirements
using use cases. When describing a use case, the user can detail its basic and alternative
flows and define business rules about the widgets involved in it. These business
rules try to capture the accepted values, value ranges, or formats, and describe the
expected behavior when a user enters an incorrect value. In this way, the tool can
generate valid and invalid test values for use in different tests, and create oracles that
verify whether the system under test (SUT) behaves as expected (e.g., showing the
right error message).
* Financially supported by CNPq/Brazil grant 303089/2011-3.
1: FunTester is available at http://funtester.org, with an open-source license.
After describing a use case and its business rules, the tool can generate abstract
test cases and test scripts, run these scripts, and evaluate their results. Each abstract test
case is an instance of a testing scenario with test data and oracles, in a structure not
tied to programming languages or testing frameworks. FunTester can transform these
abstract test cases into test scripts using plug-ins. Each plug-in executes these scripts
and transforms the execution results (e.g., an XML file containing the test results) into a
format that FunTester can read. After running the plug-in, the tool presents
the results, relating failing tests to their respective scenarios and use case steps, so
that the tester can diagnose the failures in order to identify their causes. Figure 1 shows
an overview of this process.
Figure 1 Process
2.1. Main Features
FunTester's main features are: (i) generating abstract test cases with data and
corresponding oracles that explore the software's business rules and try to expose defects
in the SUT; (ii) transforming abstract test cases into test scripts through plug-ins; (iii)
executing the test scripts through plug-ins and testing frameworks and collecting their
results; and (iv) analyzing test results and requirements, helping a user understand the
reasons for failures.
Other interesting features are: a) configurable vocabulary: a vocabulary is, in this
context, a kind of translation of a profile, which is a set of reserved words (e.g., "click",
"type", "move") used to compose the steps (sentences) of a flow. It can use one or more
synonyms to better express the intent of a system action or a user (actor) action. In this
way, the documentation can be written in, say, French, but the generated tests will stay
in English; b) referencing external databases in business rules: this allows defining the data
source for the values of editable widgets through database queries, so the tests can use these
data to generate valid or invalid values. This is especially useful for testers because they
can prepare a testing database with values similar to those used in a production
environment to simulate real use of the system; and c) generating tests with
meaningful names: each test method name aims to help the tester understand what
the method verifies, thus making failure diagnosis easier when compared to names
generated by record-and-playback tools (such as t1, t2, t3, and so forth). For
instance, a test named price_with_random_value_below_lower_limit will fill out
all the editable fields with valid values except for the price, which will be filled out
with a random value below its lower limit.
A comparison with other tools' approaches can be found in [4].
2.2. Test Cases
Myers et al. [5] affirm that test cases that explore boundary conditions have
a higher payoff than test cases that do not. Most of FunTester's test cases explore
boundary conditions. Chen et al. [6] indicate that failures are likely to manifest
themselves on or near the boundary between subdomains, and that test cases based on
knowledge about the program input domain – exploring these boundary conditions –
can help reveal failures. Our tool uses the business rules to create equivalence classes
and generate test cases with valid and invalid random data, according to these
classes. Figure 2(a) shows an example of a valid value/length range, according to
defined lower and upper bounds. Examples of generated valid values/lengths are: (i) the
lower bound; (ii) just above the lower bound; (iii) zero, where applicable; (iv) the
median, where applicable; (v) just below the upper bound; (vi) the upper bound; and (vii) a
random value/length between the lower and upper bounds.
Figure 2 Valid and invalid ranges
Figure 2(b) shows an example of invalid value/length ranges. Examples of
generated invalid values/lengths are: (i) just below the lower bound; (ii) a random
value/length below the lower bound; (iii) just above the upper bound; and (iv) a random
value/length above the upper bound. Additionally, the tool also generates values with an invalid
format, according to the respective business rules.
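The sketch below illustrates this boundary-based generation in Java for an integer range [lower, upper]; it demonstrates the technique, not FunTester's actual implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BoundaryValues {
    static List<Integer> validValues(int lower, int upper, Random rnd) {
        List<Integer> v = new ArrayList<>();
        v.add(lower);                                  // (i) lower bound
        v.add(lower + 1);                              // (ii) just above it
        if (lower <= 0 && 0 <= upper) v.add(0);        // (iii) zero, where applicable
        v.add(lower + (upper - lower) / 2);            // (iv) the median
        v.add(upper - 1);                              // (v) just below the upper bound
        v.add(upper);                                  // (vi) upper bound
        v.add(lower + rnd.nextInt(upper - lower + 1)); // (vii) random in range
        return v;
    }

    static List<Integer> invalidValues(int lower, int upper, Random rnd) {
        List<Integer> v = new ArrayList<>();
        v.add(lower - 1);                    // (i) just below the lower bound
        v.add(lower - 1 - rnd.nextInt(100)); // (ii) random below the lower bound
        v.add(upper + 1);                    // (iii) just above the upper bound
        v.add(upper + 1 + rnd.nextInt(100)); // (iv) random above the upper bound
        return v;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        System.out.println(validValues(1, 10, rnd));
        System.out.println(invalidValues(1, 10, rnd));
    }
}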
Each abstract test case verifies a use case scenario. When a scenario is executed,
depending on the data entered by a user, there can be variations in the
expected behavior. For instance, given a form that has an e-mail field (widget), if
the value of this e-mail has a format considered invalid according to its business rule, the
system could be expected to ask the user to correct this value. In such a case, the
scenario behaves differently from how it would behave if a valid value were entered.
Thus, the tool generates different tests for the same scenario, each one in a test
method. 2
3. Architecture
The solution is distributed as a set of four basic Maven 3 projects: (i) core:
contains the main project classes and artifacts; (ii) common: useful classes used by the
other projects of the solution; (iii) app: the application's user interface; and (iv) plugin-common:
base classes for Java plug-ins.
Although the solution is implemented in Java, its plug-ins do not need to be, nor do
they need to use the plugin-common project. A plug-in can be any executable file
(including executable Java Archive files) that follows some rules, such as receiving
2: More information on this at https://www.assembla.com/spaces/funtester-project/wiki/Generated_tests
3: http://maven.apache.org/
specific parameters and producing a test execution report file. The tool detects and
executes the plug-in through a plug-in descriptor – a simple JSON file containing some
information about the plug-in and how to run it.
A plug-in is responsible for (i) transforming abstract test cases into test scripts; (ii)
running the tests; and (iii) analyzing and transforming the framework-specific execution
results into a common, framework-independent format (e.g., transforming an XML file
produced by JUnit 4 into the JSON file format expected by FunTester). Figure 3
illustrates the process performed by a plug-in.
Figure 3 Plug-in execution process
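As an illustration, the three responsibilities above could be captured by an interface such as the sketch below; the names and types are assumptions, not FunTester's actual API, since a real plug-in is an external executable detected via its JSON descriptor.

import java.io.File;
import java.util.List;

public class PluginSketch {
    // Placeholder for FunTester's abstract test representation.
    interface AbstractTestCase {}

    interface TestPlugin {
        // (i) transform abstract test cases into framework-specific scripts
        List<File> generateScripts(List<AbstractTestCase> abstractTests);
        // (ii) run the generated test scripts
        void run(List<File> scripts);
        // (iii) convert framework-specific results (e.g., JUnit XML) into
        // the framework-independent format expected by the tool
        File convertResults(File frameworkReport);
    }

    public static void main(String[] args) {
        // Intentionally empty: the sketch only fixes the three-step contract.
    }
}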
Most of the files handled by FunTester are JSON 5 files, enabling a user to
visualize or, if needed, edit them in simple text editors (e.g., for solving version
conflicts), and to control their evolution using any version control system.
4. End-to-end example
To illustrate the tool's usage, let us document a small FunTester use case called
"Create a Software", shown in Figure 4. Observe that FunTester is being used to
document itself.
Figure 5 shows how to document the target use case ("Create a Software"), its
Basic Flow, and one of the Steps of this Basic Flow (an Oracle Step). In this use case,
the system shows the Software dialog, the actor types the software name, selects one of
the available vocabularies, and clicks "OK"; the system then checks the business rules and,
if they are met, closes the dialog.
Figure 4 "Create a Software" use case
4: http://junit.org
5: http://json.org
Figure 5 Creating the use case, flow, and steps
Figure 6 Describing elements and business rules
Figure 6 shows how a test analyst can document the business rules involved in the
use case through the Elements tab. He or she can also describe the files that should be
included in the test cases through the Include Files tab.
The Elements tab presents the user interface elements (widgets) involved in the
flow's steps. The test analyst can provide information about these elements, such as
their internal names (the widget names), their types (e.g., textbox, combobox, button, etc.),
their accepted value types and business rules, and whether they are editable.
The currently available business rule types (as of version 0.7) are: (a) Minimum
and maximum values: the minimum or maximum values for numbers, dates, times,
and date-time values; (b) Minimum and maximum lengths: the minimum or the
maximum lengths; (c) Required: whether the element must be filled out; (d) Regular
Expression: a regular expression for the value; (e) Equal to: an accepted value; (f) One
of: a list of accepted values; and (g) Not one of: a list of non-accepted values.
Each of these business rule types can also be configured to come from database
queries or from other elements (widgets). Each database query accepts parameters that
can come from other business rule configurations, which makes the business rules very
flexible. After describing the business rules and included files, we are ready to generate
the tests. However, the tests will not know how to trigger the use case "Create a Software",
because it is triggered through our main screen (see Figure 4). In this case, we describe an
"Access System" use case with an alternative flow that calls our target use case. Now our
tests can execute the system and reach our target use case. Figure 7 shows the Generate
and Run dialog used to configure the test generation, which involves the abstract test
generation, the plug-in selection, the generated test code, and the parameters to run the
tests and get their execution results.
Figure 7 Test generation configuration and execution
Figure 8 shows the screen that presents the execution results and an example of
source code generated by the FEST Plug-in for FunTester (Java with the FEST 6 and
TestNG 7 frameworks).
6: http://docs.codehaus.org/display/FEST/Home
7: http://testng.org
Figure 8 Execution results
In case of failing tests, a tester can view details about the failures (e.g., the
execution trace, the related use case step) that can give him/her relevant information
about the problem.
5. Final remarks
This paper presented FunTester, a fully automatic model-based functional testing
tool that generates and executes test suites from use case specifications. The tool reifies
our approach [4] and can be used for a wide range of applications, such as websites and
form-based desktop and mobile applications.
We are currently developing a plug-in for Selenium 8 and another for Robotium 9,
in order to allow testing web, iOS, and Android (native and web) applications with
JUnit or TestNG. More information on the tool and on plug-in development can be found
on the FunTester Wiki page: https://www.assembla.com/spaces/funtester-project/wiki.
Our future plans include (i) improving the flexibility of the test case steps to allow
other kinds of interaction between an actor and the system; (ii) allowing a system
analyst to use steps in the business rules (just as he/she does for the flows) to define
the expected system behavior, aiming at generating other kinds of test oracles; (iii)
reducing the number of generated scenarios by using a history-based, incremental
use case combination; and (iv) creating plug-ins for other programming languages and
testing frameworks.
References
[1] Felype Ferreira, Laís Neves, Michelle Silva, and Paulo Borba, "TaRGeT: a Model
Based Product Line Testing Tool," in 1st Brazilian Conference on Software: Theory
and Practice, Salvador, Bahia, 2010, pp. 67-72.
[2] Neil W. Kassel, "An approach to automate test case generation from structured use
cases," Clemson University, Clemson, SC, USA, Doctor Dissertation 2006.
8: http://docs.seleniumhq.org/
9: https://code.google.com/p/robotium/
[3] Mingyue Jiang and Zuohua Ding, "Automation of test case generation from textual
use cases," Hangzhou, China, 2011.
[4] Thiago Delgado Pinto and Arndt von Staa, "Functional validation driven by
automated tests," in XXVII Brazilian Symposium on Software Engineering (SBES
2013), Brasília, 2013, pp. 56-63.
[5] Glenford J. Myers, Corey Sandler, and Tom Badgett, The Art of Software Testing,
3rd ed. Wiley, 2011.
[6] Tsong Yueh Chen, Fei-Ching Kuo, Robert G. Merkel, and T. H. Tse, "Adaptive
Random Testing: the Art of Test Case Diversity," Journal of Systems and Software,
vol. 83, no. 1, pp. 60-66, January 2010.
JMLOK2: A tool for detecting and categorizing
nonconformances
Alysson Milanez1 , Dênnis Sousa1 , Tiago Massoni1 , Rohit Gheyi1
1 Department of Computing Systems – UFCG
[email protected],[email protected],
{massoni,rohit}@dsc.ufcg.edu.br
Abstract. In contract-based programs, the detection and characterization of nonconformances is hard. Assigning categories to nonconformances can be useful
for maintenance. In this work, we present JMLOK2, which detects and categorizes nonconformances, suggesting their likely causes. We evaluated the
tool by comparing its categorization results with manually-provided results, with
respect to 84 nonconformances discovered in Java Modeling Language (JML)
projects summing up to 29 KLOC and 9 K lines of contracts.
JMLOK2 is demonstrated online: http://youtu.be/9Y4izhjCfI8.
1. Introduction
In contract-based programs [Guttag et al. 1993] (as with the Java Modeling Language
(JML) [Leavens et al. 1999]), early detection of nonconformances is highly desirable, in
order to provide a more reliable account of correctness and robustness [Meyer 1997].
However, nonconformance detection can be hard to achieve. Formal conformance
verification is quite costly and not scalable, making it unfeasible for large-scale development.
Therefore, developers tend to apply automated, although incomplete, approaches.
For JML, there are basically two ways to automatically check conformance: statically, with ESC/Java2 [Cok and Kiniry 2004]; and dynamically, with several tools
(JMLUnit [Cheon and Leavens 2002b], JMLUnitNG [Zimmerman and Nagmoti 2011],
JET [Cheon 2007], Jartege [Oriat 2005], and Korat [Boyapati et al. 2002]). Nevertheless,
those approaches present limitations, mostly by falling short in providing (1) effective
test data generation; (2) comprehensive unit tests that fully exercise sequences of calls to unveil subtle nonconformances (as seen in Section 2); and (3) categorization of detected
nonconformances.
In this paper, we describe JMLOK2, a tool for detecting and categorizing nonconformances in contract-based programs. The tool applies randomly-generated tests
(RGT) for detecting nonconformances, and a heuristics-based approach for categorizing
them (Section 3). JMLOK2 was evaluated in two scenarios: first, JMLOK2
was applied to open-source JML projects, in order to assess the applicability of the approach in detecting and categorizing nonconformances; then, JMLOK2 was compared
with JET [Cheon 2007], to the best of our knowledge the only other tool that does not require
test data provision (Section 4).
2. Motivating Example
In JML, contracts are written as qualified comments (Listing 1). The following example is
adapted from the TransactedMemory experimental unit (details in Section 4) – visibility
modifiers are omitted for simplicity.
Listing 1. GenCounter and MapMemory classes

class GenCounter {
  //@ invariant 0 <= cntGen && cntGen <= MapMemory.MAX;
  int cntGen;
  GenCounter() { cntGen = 1; }

  //@ ensures (b == true) ==> (cntGen == \old(cntGen + 1));
  void updateCount(boolean b) { if (b) { cntGen++; } }

  //@ ensures cntGen == 0;
  void resetCount() { cntGen = 0; }
}

class MapMemory {
  final static int MAX = 3, MSIZE = 10;
  GenCounter g; boolean[] map; int pos;
  MapMemory() { g = new GenCounter(); map = new boolean[MSIZE]; pos = 0; }

  //@ requires pos < MSIZE - 1;
  void updateMap(boolean m) { map[pos++] = m; g.updateCount(m); }

  //@ ensures pos == 0;
  void resetMap() { map = new boolean[MSIZE]; g.resetCount(); pos = 0; }
}
GenCounter represents a piece of information about some named tag, while
MapMemory represents a Java implementation of memory for smart cards. These classes
declare a constructor and two methods: one for updating values and one for resetting
values. JML contracts are declared with the keywords requires and ensures, specifying pre- and postconditions, respectively, for a method. The class invariant clause must
hold after constructor execution, and before and after every method call; the invariant in
GenCounter enforces that field cntGen must be in the range [0, MapMemory.MAX]. The
\old clause used in the postcondition refers to the pre-state value of cntGen.
Despite its simplicity, this program is not in conformance with its contracts. The
nonconformance in GenCounter can be detected only with at least a sequence of three
calls to MapMemory.updateMap with parameter m = true. In Listing 2, a test case
reveals this problem. This example is illustrative: nonconformances between contract
and implementation may be subtle to detect, even in small programs – more complex
programs tend to represent greater challenges for detection. Regardless of where the bug
is located (contract or code, or both), the failure may only arise within a sequence of
calls to two or more methods, called in a particular order. This nonconformance can be
removed by adding a precondition to GenCounter.updateCount, testing whether
the value of cntGen is less than MapMemory.MAX.
Listing 2. A test case revealing the nonconformance from the GenCounter class

MapMemory m = new MapMemory();
m.updateMap(true); m.updateMap(true);
m.updateMap(true);
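A minimal sketch of the fix suggested above strengthens updateCount's contract with a precondition:

//@ requires cntGen < MapMemory.MAX;
//@ ensures (b == true) ==> (cntGen == \old(cntGen + 1));
void updateCount(boolean b) { if (b) { cntGen++; } }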
3. JMLOK2
In this work, we propose and implement an RGT-based (randomly-generated tests) approach to detect nonconformances, and a categorization model for those nonconformances. Our approach automatically generates and executes tests, comparing the test
results with oracles (generated from the contracts). The generated tests are composed of
sequences of calls to the methods and constructors under test, while the test oracles are assertions
generated from the JML contracts by specialized compilers, such as
jmlc [Cheon and Leavens 2002a] and OpenJML [Cok 2011]. After test execution, two filters are applied: first, meaningless test cases are discarded [Cheon 2007] – tests violating
a precondition in the first call to a method under test. The remaining failures consist of
relevant contract violations, which are candidate nonconformances. The second filter distinguishes faults from the returned failures – those faults make up the nonconformances
subject to the categorization process.
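For illustration, the two filters could be expressed as in the Java sketch below; the Failure representation is an assumption, not JMLOK2's actual data model.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Filters {
    // Each failure records the violated clause kind, the index of the
    // triggering call, and the program location of the violated contract.
    record Failure(String clause, int callIndex, String location) {}

    // Filter 1: discard meaningless tests, i.e., precondition violations
    // in the first call to a method under test.
    static List<Failure> relevant(List<Failure> failures) {
        return failures.stream()
            .filter(f -> !(f.clause().equals("precondition") && f.callIndex() == 0))
            .collect(Collectors.toList());
    }

    // Filter 2: group the remaining failures into distinct faults by the
    // location of the violated contract.
    static Map<String, List<Failure>> faults(List<Failure> failures) {
        return failures.stream().collect(Collectors.groupingBy(Failure::location));
    }

    public static void main(String[] args) {
        List<Failure> fs = List.of(
            new Failure("precondition", 0, "MapMemory.updateMap"), // discarded
            new Failure("invariant", 2, "GenCounter"),
            new Failure("invariant", 3, "GenCounter"));            // same fault
        System.out.println(faults(relevant(fs)).keySet());         // [GenCounter]
    }
}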
Regarding nonconformance categorization, we propose a three-level model
composed of a category, a type, and a likely cause. The category corresponds to the
artifact in which the nonconformance probably occurs – source code or contract. The
type is given automatically by the assertion checker and corresponds to the violated part of
JML, considering only the visible behavior of the systems. The suggested likely cause
is given by specific heuristics derived from our experience in investigating likely causes
of nonconformances. This model is implemented in a heuristics-based approach, which
suggests a specific category and likely cause for a given nonconformance. Each heuristic
is based on a set of possible scenarios related to the type of detected nonconformance.
Based on the contract-based program, the nonconformance type, and the corresponding
set of heuristics, a likely cause is suggested. For instance, for an invariant error in
class C when calling method m, we suggest a likely cause with the following heuristics:
(1) first, check for uninitialized fields in C; in this case, suggest category Code error;
(2) otherwise, check for the absence of a precondition (default = true), or the presence
of at least one field modified in m's body; in either case, suggest category Contract error
and likely cause Weak precondition; (3) otherwise, suggest category Contract error and
likely cause Strong invariant. In the example from Section 2 (an invariant nonconformance), since method GenCounter.updateCount has the default precondition, the
suggested likely cause is Weak precondition.
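A compact Java sketch of this heuristic follows; the boolean predicates are assumed to be supplied by an analysis of the contract-based program, and the names are illustrative.

public class InvariantHeuristic {
    enum Category { CODE_ERROR, CONTRACT_ERROR }
    record Suggestion(Category category, String likelyCause) {}

    static Suggestion categorize(boolean hasUninitializedField,
                                 boolean hasDefaultPrecondition,
                                 boolean methodModifiesField) {
        if (hasUninitializedField)                          // heuristic (1)
            return new Suggestion(Category.CODE_ERROR, "Uninitialized field");
        if (hasDefaultPrecondition || methodModifiesField)  // heuristic (2)
            return new Suggestion(Category.CONTRACT_ERROR, "Weak precondition");
        return new Suggestion(Category.CONTRACT_ERROR, "Strong invariant"); // (3)
    }

    public static void main(String[] args) {
        // GenCounter.updateCount has the default precondition, so the
        // suggested likely cause is Weak precondition (as in Section 2).
        System.out.println(categorize(false, true, true));
    }
}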
JMLOK2 is the implementation of this approach in the context of Java/JML
programs (JMLOK2 is an improvement over JMLOK [Varjão et al. 2011]). JMLOK2
avoids false positives by grouping failures into faults, whereas the JMLOK tool presented
the overall failures (possibly repeated) revealed by the tests. Moreover, JMLOK2 is
an extension that categorizes nonconformances. JMLOK2 is available online at
http://massoni.computacao.ufcg.edu.br/home/jmlok, for the Windows, Mac, and
Linux platforms, under the GNU General Public License (GPL) v3.
In the detection module, test generation is performed automatically by Randoop [C. Pacheco and Ball 2007]. Randoop is a feedback-directed test generation tool,
producing JUnit test cases with sequences of calls to methods and constructors. This
feature made Randoop a satisfactory infrastructure for our approach. In addition, the
test oracle generation is performed by the jmlc compiler; although OpenJML is currently
recommended by the JML research community, it still presents limitations that led us to
report false positives 1. Although jmlc is no longer under active development, it is mature enough
to support the most basic JML features, limited to Java 1.4. Afterwards, the two filters are applied.
1: We have contacted the OpenJML team for support, but a solution was not feasible for our time constraints, so this integration is left for future work.
Next, the set of distinct nonconformances is returned to the categorization module. In the categorization module, the contract-based program and a set of heuristics are
used to suggest a likely cause for each nonconformance. After the categorization,
the list of categorized nonconformances is sent to the Controller module, which sends the
list to the UI.
4. Evaluation
4.1. Detection and Categorization
The first study assesses JMLOK2 with respect to nonconformance detection and categorization, from the point of view of the developer, in the context of JML programs. This
study addresses the following research questions:
Q1. Is JMLOK2 able to detect nonconformances in JML programs?
Q2. How many answers from the tool coincide with our previous manual analysis?
The experimental units consist of sample programs from the JML web site 2 and
collected open-source JML projects. The samples include eleven example programs for
training purposes 3. Regarding the open-source JML programs, Bomber [Rebêlo et al. 2009]
is a mobile game; HealthCard [Rodrigues 2009] is an application for the management
of medical appointments on smart cards. JAccounting and JSpider are two case
studies from the ajml compiler project [Rebêlo et al. 2009], implementing, respectively,
an accounting system and a Web Spider Engine. In addition, Mondex [Tonin 2007]
is a direct translation to JML of an existing Z specification 4. Finally,
TransactedMemory [Poll et al. 2002] is a specific feature of the Javacard API. These
units total over 29 KLOC and 9 K lines of JML contracts (which we refer to as KLJML
henceforth) and are characterized in Table 1.
Table 1. Programs characterization.

                    LOC     LJML
Samples             3,400   5,200
Bomber              6,400     255
HealthCard          1,700   2,400
JAccounting         6,500     331
JSpider             8,800     386
Mondex              1,000     361
TransactedMemory    1,800     335
Total              29,600   9,268
The study was performed on a PC with an Intel Core i7 2.20 GHz CPU, 8 GB of
RAM, Windows 8, and Java 7 update 51. Since Randoop [C. Pacheco and Ball 2007]
requires a time limit to generate tests – the time after which the generation process stops
– we used 10 s as the basis 5. To collect data about test coverage we used EclEmma 2.3.0
(an Eclipse plugin) 6, and we manually collected the JML coverage – aided by EclEmma, which
counted the number of assertions, generated from the contracts, covered by the tests.
Table 2 presents the results of JMLOK2 for the sample and open-source JML
projects, including information about the detected nonconformances. For the sample programs,
18 nonconformances were detected – 15 were categorized as postcondition errors, two as invariant errors, and one as an evaluation error. For the open-source JML projects, 66 nonconformances were detected: Bomber (4), HealthCard (30), JAccounting (23), Mondex (2), and TransactedMemory (7); JSpider presented no nonconformances. Regarding type, most of the 84 nonconformances were postcondition errors (38), followed by invariant errors (35). Concerning likely causes, most of the 84 nonconformances were categorized as Weak precondition (38), followed by Code error (23).

2. http://www.eecs.ucf.edu/~leavens/JML/examples.shtml
3. dbc, digraph, dirobserver, jmlkluwer, jmltutorial, list, misc, reader, sets, stacks, table, and an adaptation of the subpackage stacks – bounded.
4. http://vsr.sourceforge.net/mondex.htm
5. We performed some experiments increasing the time limit from 10 s to 120 s, but the results were unchanged, so 10 s was chosen as our reference time.
6. http://www.eclemma.org
Table 2. For each experimental unit we present the number of generated test cases, the test coverage, and all detected nonconformances, grouped by nonconformance type and by the likely cause manually assigned.

Unit               # Generated Tests   Java Coverage   JML Coverage
Samples            7,581               93.44%          96.33%
Bomber             946                 11.62%          11.62%
HealthCard         710                 87.51%          87.51%
JAccounting        1,000               36.14%          62.63%
JSpider            477                 32.93%          32.93%
Mondex             3,743               53.42%          22.58%
TransactedMemory   963                 70.30%          55.93%

Nonconformance type    Samples   Bomber   HealthCard   JAccounting   JSpider   Mondex   TransactedMemory   Total
Postcondition error    15        1        12           9             0         0        1                  38
Invariant error        2         2        11           12            0         2        6                  35
Constraint error       0         0        6            0             0         0        0                  6
Evaluation error       1         0        0            2             0         0        0                  3
Precondition error     0         1        1            0             0         0        0                  2
Total                  18        4        30           23            0         2        7                  84

Likely cause           Samples   Bomber   HealthCard   JAccounting   JSpider   Mondex   TransactedMemory   Total
Weak precondition      6         0        15           11            0         0        6                  38
Code error             0         3        5            12            0         2        1                  23
Strong postcondition   8         0        8            0             0         0        0                  16
Undefined              4         1        0            0             0         0        0                  5
Strong precondition    0         0        1            0             0         0        0                  1
Strong constraint      0         0        1            0             0         0        0                  1
Strong invariant       0         0        0            0             0         0        0                  0
Weak postcondition     0         0        0            0             0         0        0                  0
Total                  18        4        30           23            0         2        7                  84
We compared the results of JMLOK2 with the manual categorization presented in our tech report [Milanez 2014]. The ratio of coincidences is measured by the matches metric (Equation 1):

matches(x) = Total of Agreements(x) / Total of Categorized Nonconformances(x)    (1)

where x is an experimental unit and Total of Agreements(x) is the number of coincidences between the automatic and the manual categorization. We obtained matches = 1 for bounded, stacks, misc, JAccounting, Mondex, and TransactedMemory; and 0 (dbc), 0.2 (list), 0.5 (Bomber), and 0.63 (HealthCard). For HealthCard, for instance, 19 of its 30 categorized nonconformances received the same likely cause in both analyses, yielding matches ≈ 0.63.
Discussion Q1. The JMLOK2 tool was able to detect 84 nonconformances. The generated call sequences turn out to be a benefit of the approach, as several nonconformances were only detected by running a particular sequence of constructor and method calls. For instance, a postcondition error in AbstractTransactedMemory (a class from TransactedMemory) is only revealed after 32 specific method calls. In addition, test coverage results also vary: while Bomber showed a very low value (due to its need for user interaction), Samples and HealthCard presented the highest coverage rates.
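For illustration, a feedback-directed test is essentially a straight-line sequence of constructor and method calls wrapped in a JUnit test; under jmlc instrumentation, a contract violation anywhere along the sequence surfaces as a runtime assertion error. A schematic example, reusing the hypothetical Wallet class shown earlier (the shape of the test is ours, not Randoop's verbatim output):

import junit.framework.TestCase;

public class RegressionTest0 extends TestCase {
    public void test01() {
        Wallet w = new Wallet();
        w.deposit(5);   // jmlc's runtime checks evaluate the pre/postconditions here
        w.deposit(10);  // a violation at any call in the sequence fails the test
    }
}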
Discussion Q2. The mean value of the matches metric – used to compare the results of the manual and automatic categorizations – was 0.73. Nevertheless, there were two cases in which the metric was very low, both in Samples: 0.00 for dbc and 0.20 for list. For dbc, this result occurred because only a semantic analysis of the contract-based program can give a precise result. For list, the low matches value is due to the manual categorization assigning Undefined as the likely cause, whereas the automatic categorization assigned Weak precondition; this difference occurred because the manual analysis could not determine whether the problem arises from the code or from the specification. On the other hand, six experimental units (bounded, stacks, misc, JAccounting, Mondex, and TransactedMemory) obtained the highest possible matches value, and the remaining two (Bomber and HealthCard) obtained 0.5 and 0.63, respectively. These results show that, although we use a heuristics-based approach, our automatic categorization compares well with the baseline (manual categorization).
4.2. Comparison between JMLOK2 and JET
The goal of this study is to compare two RGT approaches, JMLOK2 and JET, evaluating their effectiveness from the point of view of the developer in the context of JML programs. This study addresses the following question:
Q3. Does the JMLOK2 approach perform better than the JET tool?
This comparison considered only a subset of the experimental units from the first study, due to JET requirements [Cheon 2007]. The units were Samples, JAccounting, Mondex, and TransactedMemory, totaling over 6 KLOC and 5 KLJML; from JAccounting, only the Account class was considered. This study was performed on the same machine setup as the first study (Section 4.1). We used the JMLOK2 and JET tools with their default configurations. Table 3 presents the results of the experimental evaluation considering JMLOK2 and JET. The total number of nonconformances detected by JET was 9, against 30 detected by JMLOK2. Regarding test coverage, JET presented higher coverage only for JAccounting.
Table 3. Comparison between JMLOK2 and JET.

                   Samples             JAccounting        Mondex        TransactedMemory
JET
  # Tests          8,306               1,787              700           1,958
  Java Coverage    62.86%              100%               7.50%         21.53%
  JML Coverage     63.70%              100%               11.60%        52.59%
  # NCs            4                   4                  0             1
  Types            2 invariant,        4 postcondition    –             1 invariant
                   2 postcondition
JMLOK2
  # Tests          7,581               1,000              3,743         963
  Java Coverage    93.44%              96.60%             53.42%        70.30%
  JML Coverage     96.33%              95.83%             22.58%        55.93%
  # NCs            18                  3                  2             7
  Types            15 postcondition,   1 postcondition,   2 invariant   6 invariant,
                   2 invariant,        2 evaluation                     1 postcondition
                   1 evaluation
Discussion Q3. JET was able to reveal nonconformances not detected by JMLOK2, especially for the JAccounting experimental unit. However, we observed an important drawback: the tool is inconsistent regarding the nonconformances discovered. For instance, in the JAccounting unit, different executions found different nonconformances – JET often detects zero nonconformances and then, in the next execution, shows four – whereas for the same unit JMLOK2 always finds three nonconformances. The genetic algorithm in JET's backend may explain this variation between repeated executions; this behavior was not observed in JMLOK2, despite its RGT approach. Considering test coverage, JMLOK2 in general performed better than JET; the only case where JET was better was JAccounting. This result can be related to JET's requirements – no public fields can be assigned, and object sharing is not allowed – since JET's tests miss the parts of the programs that do not fulfill those requirements, which does not occur with JMLOK2. Considering the number of nonconformances detected, the only case where JET performed better than JMLOK2 was, again, JAccounting: four against three.
5. Conclusions
In this work, we presented an approach for detecting and categorizing nonconformances in contract-based programs. In our experimental studies, JMLOK2 detected 84 nonconformances in over 29 KLOC and over 9 KLJML. We reported those nonconformances and their classification to the programs' authors, and the answers were positive. Furthermore, we classified the nonconformances and established likely causes, mostly split between Weak preconditions and Code errors. Comparing the coincidences – matches – between the automatic categorization (by means of the JMLOK2 tool) and our manual categorization (the baseline), we obtained a mean matches of 0.73. When comparing JMLOK2 with JET on the same experimental units (a subset of the first study, totaling approximately 6 KLOC and 5 KLJML), the former detected 30 nonconformances with a Java instruction coverage of 78.44% and a JML instruction coverage of 67.67%, while JET detected 9 nonconformances, covering 47.97% of Java instructions and 56.97% of JML instructions. These numbers suggest that JMLOK2 performs better than JET for these experimental units, considering both the number of nonconformances detected and test coverage (block instruction coverage). As future work, we intend to improve the test generation of JMLOK2, integrate it with OpenJML, and extend our model to handle nonconformances within method bodies.
Acknowledgment
This work was supported by CAPES, CNPq – PIBITI 04/2013 and the National Institute
of Science and Technology for Software Engineering (INES7 ), funded by CNPq, grant
573964/2008-4.
References
[Boyapati et al. 2002] Boyapati, C., Khurshid, S., and Marinov, D. (2002). Korat: Automated Testing Based on Java Predicates. In ISSTA. ACM.
[C. Pacheco and Ball 2007] Pacheco, C., Lahiri, S. K., Ernst, M. D., and Ball, T. (2007). Feedback-directed random test generation. In ICSE.
[Cheon 2007] Cheon, Y. (2007). Automated Random Testing to Detect Specification-Code
Inconsistencies. In SETP.
7. www.ines.org.br
[Cheon and Leavens 2002a] Cheon, Y. and Leavens, G. (2002a). A Runtime Assertion
Checker for the Java Modeling Language (JML). In SERP. CSREA Press.
[Cheon and Leavens 2002b] Cheon, Y. and Leavens, G. (2002b). A Simple and Practical
Approach to Unit Testing: The JML and JUnit Way. In ECOOP. Springer-Verlag.
[Cok 2011] Cok, D. (2011). OpenJML: JML for Java 7 by Extending OpenJDK. In NFM.
Springer-Verlag.
[Cok and Kiniry 2004] Cok, D. and Kiniry, J. (2004). ESC/Java2: Uniting ESC/Java and
JML – Progress and issues in building and using ESC/Java2. In CASSIS. Springer-Verlag.
[Guttag et al. 1993] Guttag, J., Horning, J., Garland, S. J., Jones, K., Modet, A., and Wing, J. (1993). Larch: Languages and Tools for Formal Specification. Springer-Verlag.
[Leavens et al. 1999] Leavens, G., Baker, A., and Ruby, C. (1999). JML: A Notation for
Detailed Design.
[Meyer 1997] Meyer, B. (1997). Object-Oriented Software Construction. Prentice Hall.
[Milanez 2014] Milanez, A. (2014). Case study on categorizing nonconformances. Technical report, Software Practices Laboratory, Federal University of Campina Grande.
[Oriat 2005] Oriat, C. (2005). Jartege: A Tool for Random Generation of Unit Tests for Java
Classes. In QoSA/SOQUA. Springer Berlin Heidelberg.
[Poll et al. 2002] Poll, E., Hartel, P., and de Jong, E. (2002). A Java Reference Model of Transacted Memory for Smart Cards. In CARDIS. USENIX Association.
[Rebêlo et al. 2009] Rebêlo, H., Lima, R., Cornélio, M., Leavens, G., Mota, A., and
Oliveira, C. (2009). Optimizing JML Features Compilation in ajmlc Using Aspect-Oriented Refactorings. In SBLP.
[Rodrigues 2009] Rodrigues, R. (2009). JML-Based Formal Development of a Java Card
Application for Managing Medical Appointments. Master’s thesis, Universidade da
Madeira.
[Tonin 2007] Tonin, I. (2007). Verifying the Mondex Case Study: the KeY approach. Technical report, Fakultät für Informatik, Universität Karlsruhe.
[Varjão et al. 2011] Varjão, C., Massoni, T., Gheyi, R., and Soares, G. (2011). JMLOK:
Uma Ferramenta para Verificar Conformidade em Programas Java/JML. In CBSoft
(Tools session).
[Zimmerman and Nagmoti 2011] Zimmerman, D. and Nagmoti, R. (2011). JMLUnit: the
Next Generation. In FoVeOOS ’10. Springer-Verlag.
A Rapid Approach for Building a Semantically Well Founded
Circus Model Checker
Alexandre Mota1 , Adalberto Farias2
1
Centro de Informática – UFPE
Caixa Postal 7851 – 50.740-560 – Recife – PE – Brazil
2
Departamento de Sistemas e Computação – UFCG
Rua Aprígio Veloso, 882, Bloco CN – 58.429-900 – Campina Grande – PB – Brazil
[email protected], [email protected]
Abstract. Model checkers are tools focused on checking the satisfaction relation
M |= f , where M is a transition system (graph representation) of a specification
written in a language L and f is a property. Such a graph may come from the
semantics of L. We present a model checker resulting from a rapid prototyping
strategy for the language Circus. We capture its operational semantics with
the Microsoft FORMULA framework and use it to analyse classical properties
of specifications. As FORMULA supports SMT-solving, we can handle infinite
data communications and predicates. Furthermore, we create a semantically
correct Circus model checker because working with FORMULA is equivalent
to reasoning with First-Order Logic (Clark completion). We illustrate the use of
the model checker with an extract of an industrial case study.
Link: https://sites.google.com/site/adalbertocajueiro/research/circusmc
1. Problem and Motivation
Model checking [Clarke et al. 1994] is an automatic technique to verify the satisfiability
of the relation M |= f , where M is a model (a Labelled Transition System or Kripke
structure) of some formal language L and f is a temporal logic formula.
A model checker is a tool containing search procedures and specific representations for M and f. Model checkers use very specialized algorithms and data structures to achieve the best space and time complexities when checking M |= f. It is not common to find model checkers for rich state-space languages (those using elaborate data structures) that clearly follow a formal semantics. Two essential issues are intrinsic to model checker development: how to guarantee that M conforms to the semantics (usually the Structured Operational Semantics, SOS) of the language L, and how to guarantee the correctness of the check M |= f (or Mf ⊑ M). For instance, FDR [Roscoe et al. 1994] and PAT [Liu et al. 2010] had several delivered versions due to bug fixes, and the analysis of CSP-Z via FDR in [Mota and Sampaio 2001] is not assured to be correct. The possibility of building a model checker from a formal semantics document plays an important role in this scenario.
A very recent technology developed by Microsoft Research, known as FORMULA [Jackson et al. 2011] (Formal Modelling Using Logic Programming and Analysis), seems appropriate for creating semantically correct model checkers. It is based on the Constraint Programming paradigm [Rossi et al. 2006] and on Satisfiability Modulo Theories (SMT) solving provided by Z3 [De Moura and Bjørner 2008]. Besides providing a high abstraction level for describing structures, FORMULA allows one to deal with some infiniteness aspects of data types and to define search procedures over structures.
We used FORMULA as a framework for describing semantics and for analysing
Circus models. The language Circus [Woodcock and Cavalcanti 2002] is a formal notation that combines Z [Woodcock and Davies 1996], CSP [Roscoe 2010], and constructs
of the refinement calculi [Morgan 1990] and Dijkstra’s language of guarded commands,
which is a simple imperative language with nondeterminism.
Our model checker contains a transcription of the Circus SOS rules into FORMULA and an encoding of the classical properties (deadlock, livelock, and nondeterminism) as (FORMULA) queries. The encoding of SOS rules allows FORMULA to create an LTS (as logical facts) for an arbitrary Circus specification according to the formal semantics, while the queries check desirable properties by inspecting certain logical facts. These tasks involve interaction with the Z3 SMT solver and can therefore handle a fair class of infinite state-space problems. Another interesting feature of FORMULA is that values can be dynamically instantiated to satisfy a property. Naturally, this instantiation represents a drawback if it does not validate the property (the continuous search for other values can lead to non-termination). Fortunately, FORMULA works with a least fixed-point search and, using some good practices, one can overcome this limitation.
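As a schematic illustration of the encoding just described (our sketch, using neither the concrete Circus rules nor FORMULA's actual syntax), an SOS firing rule such as the one for prefixing, which says that c → P performs the event c and evolves to P, reads directly as a fact-producing logical rule over transitions:

trans(prefix(c, P), c, P).

Chaining such Horn-clause rules is what lets the engine compute the LTS as a least fixed point of transition facts, with the SMT solver discharging the side conditions (guards and predicates) that may range over infinite data.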
2. Design and Implementation
Figure 1 shows the scenario for creating semantically correct model checkers using FORMULA. The language of FORMULA includes algebraic data types (ADTs) and strongly
typed constraint logic programming (CLP). This allows one to create concise specifications [Jackson et al. 2011], analysable by SMT-solving. The necessary elements to implement model checkers in FORMULA are a BNF grammar, an SOS, and a set of properties
stated in some (temporal) logic. The SOS (associated to the constructors defined by the
BNF) are described as abstractions (how to build a model for an instance of Circus and
how to check properties over it) in FORMULA. The Circus model checker is a FORMULA abstraction. Regarding correctness of our model checker, we follow the idea of
Clark completion [Dao-Tran et al. 2010] of a definite clause program, which makes the
assumption that the axioms in a program completely axiomatise all possible reasons for
atomic formulas to be true. This approach is also used by other works in the literature.
Figure 1. A model checker product line
We spent 2 months learning FORMULA, 8 months creating the proposed strategy for any SOS, and 72 hours building the model checker. This fast development is a result of FORMULA's high level of abstraction. Currently, our model checker does not have optimal performance, but it is semantically well founded. Other approaches, such as [Freitas 2005], spent an entire PhD building the first model checker for Circus, and there the manipulation of infiniteness aspects is not fully automatic.
Figure 2 illustrates the use of our model checker within the Graphical User Interface of Visual Studio. In fact, FORMULA relies on Visual Studio; that is, it uses its libraries and components to implement a specific engine able to generate terms and validate constraints over them (automatically invoking Z3). Thus, in our model checker the user has to encode a Circus specification using suitable (and already defined) FORMULA constructors and state the property to be checked. FORMULA then generates the LTS and checks the property. Naturally, the translation from Circus to FORMULA requires knowledge about how Circus terms are mapped into FORMULA (it is almost a one-to-one mapping). However, tools like Stratego/XT can be used for this purpose, and a GUI (model checker front-end) emerges very easily.

The answer returned by FORMULA is SAT or UNSAT. In the first case, FORMULA is able to instantiate a value (even when its type is infinite) that validates the property. Moreover, as FORMULA creates the LTS (and all the elements involved in its creation), deeper analyses using the internal structure of the LTS are possible.
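For instance, the classical deadlock check can be read as a first-order query over the generated transition facts (again a schematic rendering in our notation, not FORMULA's concrete syntax):

deadlock(s) ⇔ reachable(s) ∧ ¬∃ e, s' . trans(s, e, s')

A SAT answer then comes with a witness state s, matching FORMULA's ability to instantiate values that validate a property.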
Figure 2. FORMULA running over Visual Studio
3. Practical Use and Case Study
The Circus model checker requires the prior installation of FORMULA1, which is free and requires Microsoft Visual Studio2. This makes the current version of the Circus model checker platform-dependent, as the underlying framework is from Microsoft.
A FORMULA script can be included as part of a FORMULA project in Visual Studio. The user creates a new FORMULA project and replaces the default content with that of the script; analysis can then be done using features of Visual Studio. Figure 3 shows the analysis result of a Circus specification. It is possible to view information about the FORMULA code itself (B), the internal structure (domains and models, region C), the execution time of internal tasks (D), and the executed queries and the base of facts containing the LTS and all elements used to create and analyse it (A).
1. http://research.microsoft.com/en-us/um/redmond/projects/formula/
2. http://www.microsoft.com/visualstudio
Figure 3. Running FORMULA on Visual Studio
To evaluate our Circus model checker, we consider the Emergency Response System (ERS) introduced in [Andrews et al. 2013b, Andrews et al. 2013a]. Figure 4 shows its outline view. The ERS model is a set of SysML diagrams, and the behaviour in Circus is obtained from Activity Diagrams with specialized stereotypes. Due to space restrictions, we extract the code that corresponds to the activation, detection, and recovery of faults. Listing 1 shows the Circus specification. We add the controller processes ERUs0, ERUs1, or ERUs2: they add details of the behaviour of the Call Centre, controlling the number of ERUs currently allocated. Version 0 has a flaw in the implementation of the schema AllocateState: the schema should add 1 to the previous allocated value. This simple mistake causes a deadlock in process ERSystem0 (because channel service_rescue is never offered by ERUs0) that is successfully detected by the model checker. Process ERUs1 fixes this problem, and ERSystem1 is deadlock-free. The FORMULA code and instructions to run our case study are available at https://sites.google.com/site/adalbertocajueiro/research/circusmc.
Figure 4. Outline of the ERS
4. Overall Architecture
The heart of our model checker is the embedding of the structured operational semantics of Circus in FORMULA and the embedding (obtained by derivation rules) of the predicates describing the classical properties. We have made a modular embedding of these elements, so that they are mapped to distinct parts/sections of the FORMULA script. This is illustrated in Figure 5, which is composed of several sections with dependencies between them (represented by arrows).
process ERUs0 =^ begin
  state Control == [ allocated, total_erus : N ]
  InitControl == [ Control' | allocated' = 0 ∧ total_erus' = 5 ]
  AllocateState == [ ΔControl | allocated' = allocated ]
  Allocate =^ allocate_idle_eru → AllocateState ; Choose
  ServiceState == [ ΔControl | allocated' = allocated − 1 ]
  Service =^ service_rescue → ServiceState ; Choose
  Choose =^
    if [ Control | allocated = 0 ] → Allocate
    [] [ Control | allocated = total_erus ] → Service
    [] [ Control | allocated > 0 ∧ allocated < total_erus ] → Allocate □ Service
    fi
  • InitControl ; Choose
end

process InitiateRescueFault1Activation =^ begin
  CallCentreStart =^ start_rescue → FindIdleEru
  FindIdleEru =^ find_idle_erus → (IdleEru □ (wait → FindIdleEru))
  IdleEru =^ allocate_idle_eru → send_rescue_info_to_eru → IR1
  IR1 =^ (process_message → FAReceiveMessage) □ (fault_1_activation → IR2)
  FAReceiveMessage =^ receive_message → ServiceRescue
  ServiceRescue =^ service_rescue → CallCentreStart
  IR2 =^ IR2Out □ (error_1_detection → FAStartRecovery)
  IR2Out =^ drop_message → target_not_attended → CallCentreStart
  FAStartRecovery =^ start_recovery_1 → end_recovery_1 → ServiceRescue
  • CallCentreStart
end

process Recovery1 =^ begin
  Recovery1Start =^ start_recovery_1 → log_fault_1 → resend_rescue_info_to_eru →
    process_message → receive_message → end_recovery_1 → Recovery1Start
  • Recovery1Start
end

process ERSystem_i (i ∈ {0, 1}) =^ InitiateRescueFault1Activation [| ERUsSignals |] ERUs_i
process ERSystem2 =^ ERSystem1 [| RecoverySignals |] Recovery1

chanset ERUsSignals == {| allocate_idle_eru, service_rescue |}
chanset RecoverySignals == {| start_recovery_1, end_recovery_1 |}

Listing 1: Emergency Response System processes
Figure 5. Overall Embedding
The Auxiliary Definitions section contains the representation of types and operations over them. The Circus syntax is conveniently mapped (transcribed) to a specific Syntax Domain section, which defines abstractions for the Circus constructs. Then each Circus SOS rule is mapped to a logical rule in the Semantics Domain section of the FORMULA script, which contains abstractions useful to create the LTS according to the firing rules of the operational semantics of Circus [Woodcock et al. 2005]. Afterwards, the properties (described in first-order logic) are translated into FORMULA queries (kept in the Properties Domain section) and define the way the search engine analyses the LTS to validate properties.
4.1. Comparisons
Palikareva et al. [Palikareva et al. 2012] propose a prototype called SymFDR, which implements a bounded model checker for CSP based on SAT solvers. The authors present a comparison showing that SymFDR can deal with problems beyond FDR (such as combinatorially complex problems). They report that FDR outperforms SymFDR when a counter-example does not exist. In our work we extend the class of problems analysable by SymFDR with the aid of SMT solving, which results in a more expressive approach to create the LTS because we do not depend on FDR; this makes our approach able to handle infinite-state systems, while SymFDR can only deal with systems that FDR can.

Leuschel [Leuschel 2001] proposes an implementation of CSP in SICStus Prolog for interpretation and animation purposes. Part of the design of our model checker in FORMULA follows a similar declarative and logic-based representation. However, as we handle infinite-state systems, we indeed implement a piece of future work of [Leuschel 2001].
The advances of SMT solving bring a new level of verification. Bjørner et al. [Bjørner et al. 2012] extend SMT-LIB to describe rules and declare recursive predicates, which can be used in symbolic model checking. Alberti et al. [Alberti et al. 2012] propose an SMT-based specification language to improve the verification of safety properties. We use Circus as the language and a model checker with a new perspective for reasoning about infinite systems, where SMT solving allows automatic verification and reasoning of concurrent systems described in Circus.

Another similar approach was proposed in [Verdejo and Marti-Oliet 2002] and uses Maude for executing and verifying CCS (the Calculus of Communicating Systems). According to that work, only behavioural aspects can be handled, whereas we deal with data aspects even if they come from an infinite domain and are involved in communications and in predicates. Moreover, that work also considers temporal logic, whereas we do not (temporal logic is not part of the Circus culture, although FORMULA can handle it). We point out that Maude can be more powerful than FORMULA, but it can be harder to guarantee convergence when applying its rewriting rules. Our work is free of convergence problems because the FORMULA engine focuses on finding the least fixed point using SMT solving.
5. Final Remarks
This work proposed a model checker for Circus that can handle infinite data (involved in communications and in predicates). The relation between first-order logic and FORMULA assures the semantic correctness of the model checker. The development strategy used here represents a remarkable result: it uses principles of model-driven development, where a conceptual and abstract model (the SOS) is the starting point for the implementation.

The use of FORMULA as the underlying framework has been crucial for reducing time and complexity in the development life-cycle. The actual implementation of the Circus semantics took about 3 days; compared with usual approaches for implementing model checkers, this is remarkably fast. Besides that, correctness is another immediate benefit of our development strategy. Our model checker is freely available at https://sites.google.com/site/adalbertocajueiro/research/circusmc3.

Our work is used in the context of the COMPASS project4, which adopts CML, a formal language that builds on the maturity of Circus and combines VDM, CSP, and the refinement calculus of Morgan [Morgan 1990]. Our model checker has been adapted to CML, also covering time aspects, with a single language and tool support.

As future work we intend to propose a DSL [Fowler 2010] for describing SOS rules (following [Corradini et al. 2000], for example), to use Stratego/XT [Visser 2004] or QVT [Dan 2010] to automate the generation of FORMULA abstractions, to give a UTP semantics [Hoare and He 1998] to FORMULA, and to develop a refinement calculus.

3. Visual Studio 2010 is available under MSDN Licensing. FORMULA is distributed under a Microsoft Research License for non-commercial use only.
4. The EU Framework 7 Integrated Project "Comprehensive Modelling for Advanced Systems of Systems" (COMPASS, Grant Agreement 287829).
References
Alberti, F., Bruttomesso, R., Ghilardi, S., Ranise, S., and Sharygina, N. (2012). Reachability Modulo Theory Library (Extended Abstract). In SMT Workshop.
Andrews, Z., Didier, A., Payne, R., Ingram, C., Holt, J., Perry, S., Oliveira, M., Woodcock, J., Mota, A., and Romanovsky, A. (2013a). Report on timed fault tree analysis
— fault modelling. Technical Report D24.2, COMPASS.
Andrews, Z., Payne, R., Romanovsky, A., Didier, A., and Mota, A. (2013b). Model-based
development of fault tolerant systems of systems. In Systems Conference (SysCon),
2013 IEEE International, pages 356–363.
Bjørner, N., McMillan, K., and Rybalchenko, A. (2012). Program Verification as Satisfiability Modulo Theories. In SMT Workshop.
Clarke, E., Grumberg, O., and Long, D. (1994). Model Checking and Abstraction. ACM
Trans. on Programming Languages and Systems, 16(5):1512–1542.
Corradini, A., Heckel, R., and Montanari, U. (2000). Graphical operational semantics. In
ICALP Satellite Workshops, pages 411–418.
Dan, L. (2010). QVT Based Model Transformation from Sequence Diagram to CSP. In
Engineering of Complex Computer Systems (ICECCS), 2010 15th IEEE International
Conference on, pages 349 –354.
Dao-Tran, M., Eiter, T., Fink, M., and Krennwallner, T. (2010). First-order encodings for
modular nonmonotonic datalog programs. In de Moor, O., Gottlob, G., Furche, T., and
Sellers, A. J., editors, Datalog, volume 6702 of Lecture Notes in Computer Science,
pages 59–77. Springer.
De Moura, L. and Bjørner, N. (2008). Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pages 337–340, Berlin, Heidelberg. Springer-Verlag.
Fowler, M. (2010). Domain Specific Languages. Addison-Wesley Professional, 1st edition.
Freitas, L. (2005). Model Checking Circus. PhD thesis, University of York.
Hoare, T. and He, J. (1998). Unifying theories of programming, volume 14. Prentice Hall
Englewood Cliffs.
Jackson, E. K., Levendovszky, T., and Balasubramanian, D. (2011). Reasoning about
metamodeling with formal specifications and automatic proofs. In Model Driven Engineering Languages and Systems, pages 653–667. Springer.
Leuschel, M. (2001). Design and Implementation of the High-Level Specification Language CSP(LP). In PADL, volume 1990 of LNCS, pages 14–28. Springer.
Liu, Y., Sun, J., and Dong, J. (2010). Developing Model Checkers Using PAT. In Bouajjani, A. and Chin, W.-N., editors, Automated Technology for Verification and Analysis,
volume 6252 of Lecture Notes in Computer Science, pages 371–377. Springer Berlin
Heidelberg.
Morgan, C. (1990). Programming from Specifications. Prentice-Hall, Inc., Upper Saddle
River, NJ, USA.
Mota, A. and Sampaio, A. (2001). Model-checking CSP-Z: strategy, tool support and
industrial application. Science of Computer Programming, 40(1):59–96.
Palikareva, H., Ouaknine, J., and Roscoe, A. W. (2012). SAT-solving in CSP Trace Refinement. Sci. Comput. Program., 77(10-11):1178–1197.
Roscoe, A. (2010). Understanding Concurrent Systems. Springer.
Roscoe, A. W. et al. (1994). Model-checking CSP. In A Classical Mind: Essays in Honour of C. A. R. Hoare, pages 353–378.
Rossi, F., van Beek, P., and Walsh, T., editors (2006). Handbook of Constraint Programming. Elsevier.
Verdejo, A. and Marti-Oliet, N. (2002). Executing and Verifying CCS in Maude. Technical report, Dpto. Sistemas Informáticos y Programación, Universidad Complutense de Madrid.
Visser, E. (2004). Program transformation with stratego/xt. In Domain-Specific Program
Generation, pages 216–238. Springer.
Woodcock, J. and Cavalcanti, A. (2002). The semantics of Circus. In Proceedings of the 2nd International Conference of B and Z Users on Formal Specification and Development in Z and B, ZB '02, pages 184–203, London, UK. Springer-Verlag.
Woodcock, J., Cavalcanti, A., and Freitas, L. (2005). Operational semantics for model checking Circus. In Fitzgerald, J., Hayes, I., and Tarlecki, A., editors, FM 2005: Formal Methods, volume 3582 of Lecture Notes in Computer Science, pages 237–252.
Springer Berlin Heidelberg.
Woodcock, J. and Davies, J. (1996). Using Z: Specification, Refinement, and Proof. Prentice Hall International Series in Computer Science.
SPLConfig: Product Configuration in Software Product Line
Lucas Machado, Juliana Pereira, Lucas Garcia, Eduardo Figueiredo
Department of Computer Science, Federal University of Minas Gerais (UFMG), Brazil
{lucasmdo, juliana.pereira, lucas.sg, figueiredo}@dcc.ufmg.br
Abstract. Software product line (SPL) is a set of software systems that share a
common set of features satisfying the specific needs of a particular market
segment. A feature represents an increment in functionality relevant to some
stakeholders. SPLs commonly use a feature model to capture and document
common and varying features. The key challenge of using feature models is to
derive a product configuration that satisfies all business and customer
requirements. To address this challenge, this paper presents a tool, called
SPLConfig, to support businesses during product configuration in SPL. Based on feature models, SPLConfig automatically finds an optimal product configuration that maximizes customer satisfaction.
Demo Video. https://www.youtube.com/watch?v=QLHtIY8oHT8
1. Introduction
The growing need for developing larger and more complex software systems demands
better support for reusable software artifacts [Pohl et al., 2005]. In order to address these
demands, software product line (SPL) has been increasingly adopted in software
industry [Clements and Northrop, 2001; Apel et al., 2013]. SPL is a set of software
systems that share a common set of features satisfying the specific needs of a particular
market segment [Pohl et al., 2005]. It is built around a set of common software
components with points of variability that allow product configuration [Clements and
Northrop, 2001]. Large companies, such as Hewlett-Packard, Nokia, Motorola, and
Dell, have adopted SPL practices1.
The potential benefits of SPLs are achieved through a software architecture
designed to increase reuse of features in several SPL products. An important concept of
an SPL is the feature model. Feature models are used to represent the common features
found on all products of the product line (known as mandatory features) and variable
features that allow distinguishing between products in a product line (generally
represented by optional or alternative features) [Czarnecki and Eisenecker, 2000; Kang
et al., 1990]. Variable features define points of variation and their role is to permit the
instantiation of different products by enabling or disabling specific SPL functionality. In
practice, developing an SPL involves modeling features to represent different
viewpoints, sub-systems, or concerns of the software system [Batory, 2005].
A fundamental challenge in SPL is the process of enabling and disabling features
in a feature model for a new software product configuration [Pohl et al., 2005]. As the
number of features increases in a feature model, so does the number of product options in an SPL [Benavides et al., 2005]. For instance, an SPL where all features are optional can instantiate 2^n different products, where n is the number of features. Moreover, once a feature is selected, it must be verified to conform to the myriad constraints in the feature model, turning this process into a complex, time-consuming, and error-prone task. Industrial-sized feature models with hundreds or thousands of features make this process impractical. Guidance and automatic support are needed to increase business efficiency when dealing with the many possible combinations in an SPL.

1. http://splc.net/fame.html
This paper presents a tool, called SPLConfig2, to support automatic product
configuration in SPL. The main goal of the SPLConfig tool is to derive an optimized
feature set that satisfies the customer requirements. The primary contribution of this
tool is to assist businesses during product configuration, answering the following
question: What is the set of features that balances cost and customer satisfaction, based
on available budget? By resolving this problem, industries can more effectively achieve
greater customer satisfaction.
The rest of this paper is organized as follows. Section 2 describes the problem.
Section 3 presents the SPLConfig architecture. Section 4 discusses the design and
implementation of this tool. Examples are used to validate the tool's main functionalities
in Section 5. Section 6 presents some related work. Finally, Section 7 concludes and
points out directions for future work.
2. Problem Description
Feature models represent the common and variable features in SPL using a feature tree
[Kang et al., 1990]. In feature models, nodes represent features and edges show
relationships between parent and child features [Batory, 2005]. These relationships
constrain how the features can be combined. As an example, the mobile phone industry
uses features to specify and build software for configurable phones. The software
product deployed into a phone is determined by the phone characteristics. Figure 1
depicts a simplified feature model of an SPL, called MobileMedia [Figueiredo et al.,
2008], inspired by the mobile phone industry. It was developed for a family of 4 brands
of devices, namely Nokia, Motorola, Siemens, and RIM.
Figure 1. Example of a Feature Model for a Mobile Phone Product Line
2. SPLConfig is available at https://sourceforge.net/p/splconfig/
A key need in SPL is determining how to configure an optimized feature set that satisfies the customer requirements. In Figure 1, features refer to functional requirements of the MobileMedia SPL. However, features may also be associated with non-functional requirements; Kang et al. [1990] have pointed out the need to take non-functional requirements into account since 1990. For instance, considering the MobileMedia SPL, it is possible to identify non-functional requirements related to each feature, such as the cost and the benefit of each feature to the customer. This means that every product differs not only in its functional features, but also in its non-functional requirements. Therefore, we propose to extend feature models with non-functional requirements by adapting the notation proposed in Benavides et al. [2005] to our problem.
Figure 2 illustrates the non-functional requirements in the feature model of Figure 1. In Figure 2, all features (mandatory and optional) have the cost attribute, and only optional features have the benefit attribute. For example, the optional feature Favourites has cost and benefit attributes, while the mandatory feature MediaSelection has only the cost attribute. Note that the benefit attribute is classified into six qualitative levels: none, very low, low, medium, high, and very high. Our goal is to find an optimal solution by means of an objective function that maximizes customer satisfaction (benefit/cost) without exceeding the available budget (shown in the lower right corner of Figure 2). As a motivating example, given a mobile phone product line that includes a variety of varying features, what is the product that best meets the customer requirements within a given budget? The challenge is that, with hundreds or thousands of features, it is hard to analyze all the different product configurations to find an optimal one.
Figure 2. A Feature Model Decorated with Non-Functional Requirements
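Under this notation, the configuration problem can be read as a constrained selection (a sketch of the objective in our notation; the exact formulation used by SPLConfig is the one detailed in Pereira (2014)). For features i with cost c_i, benefit b_i, and selection variables x_i ∈ {0, 1}:

maximize Σ_i b_i · x_i   subject to   Σ_i c_i · x_i ≤ Budget,   x_i = 1 for every mandatory feature i,   x ⊨ FM,

where x ⊨ FM states that the selection satisfies the feature model's composition and cross-tree constraints.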
Therefore, the main goal of the SPLConfig tool is to provide an automatic product configuration method based on search-based software engineering (SBSE) techniques [Harman & Jones, 2001]. Figure 3 presents an abstract overview of our method: from a feature model, we use search-based algorithms to derive an optimized product configuration that maximizes customer satisfaction, subject to business requirements (cost), customer requirements (benefit and budget), composition constraints, and cross-tree constraints. These algorithms are described in detail in Pereira (2014).
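As a complement, the sketch below shows how a search-based configurator can score candidate selections. This is our own illustration under assumed names (Feature, fitness), not SPLConfig's actual implementation, and it omits the feature model's composition and cross-tree constraint check, which would be enforced in the same place:

import java.util.Arrays;
import java.util.List;

public class ConfigFitness {

    static final class Feature {
        final String name;
        final double cost;
        final double benefit;
        final boolean mandatory;

        Feature(String name, double cost, double benefit, boolean mandatory) {
            this.name = name;
            this.cost = cost;
            this.benefit = benefit;
            this.mandatory = mandatory;
        }
    }

    // Scores a candidate selection: total benefit if the selection is valid
    // and within budget, or negative infinity so the search discards it.
    static double fitness(List<Feature> features, boolean[] selected, double budget) {
        double cost = 0.0, benefit = 0.0;
        for (int i = 0; i < features.size(); i++) {
            Feature f = features.get(i);
            if (f.mandatory && !selected[i]) {
                return Double.NEGATIVE_INFINITY; // mandatory feature missing
            }
            if (selected[i]) {
                cost += f.cost;
                benefit += f.benefit;
            }
        }
        return cost <= budget ? benefit : Double.NEGATIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Illustrative costs/benefits: MediaSelection and Favourites appear
        // in Figure 2; SMSTransfer is a made-up optional feature.
        List<Feature> fm = Arrays.asList(
                new Feature("MediaSelection", 30, 0, true),
                new Feature("Favourites", 20, 8, false),
                new Feature("SMSTransfer", 25, 5, false));
        boolean[] candidate = {true, true, false};
        System.out.println(fitness(fm, candidate, 60.0)); // prints 8.0
    }
}

A metaheuristic (or an exhaustive search for small models) then simply seeks the selection with the highest fitness.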
Figure 3. Overview of the Method for Automatic Product Configuration
3. SPLConfig Architecture
SPLConfig is an Eclipse plugin implemented in Java. It requires one additional plugin,
named FeatureIDE, in order to support product configuration. Figure 4 presents the
architecture of SPLConfig and its relationships with FeatureIDE and the Eclipse platform.
The decision to extend FeatureIDE is particularly attractive for two reasons. First, FeatureIDE is an open source, extensible, modular, and easy-to-understand tool. Second, by extending FeatureIDE we could reuse the key functionalities of typical feature modeling tools, such as creating and editing a feature model, automatically analyzing the feature model, the basic product configuration infrastructure, code generation, and feature model import/export. FeatureIDE is a widely used SPL development tool that supports all phases of feature-oriented software development: domain analysis, application engineering, and code generation.
Figure 4. SPLConfig's Architecture
Automatic product configuration allows the customization of different products
according to the business and customer requirements, composition, and cross-tree
constraints of the feature model (SPLConfig in Figure 4). Two important activities are
performed in the SPL lifecycle, domain engineering and application engineering,
described below:
Domain Engineering. This process is represented by a feature model and is supported
by FeatureIDE (feature model editor in Figure 4). Common and variable requirements of
the product line are elicited and documented. The developers create reusable artifacts of
a product line in such a way that these artifacts are sufficiently adaptable (variable) to
efficiently derive individual applications from them.
Application Engineering. In meetings with the customer, the developers identify and describe the customer's requirements and their prioritization (SPLConfig in Figure 4). SPLConfig prioritizes features and creates reports to aid the product builder when defining a product configuration for a specific customer. Our product configuration algorithm uses an optimization scheme and provides valuable decision support for product configuration within the SPL. The product configuration result is visualized in FeatureIDE (product configuration module in Figure 4). It is important to observe that the tool can be easily extended, by means of new algorithms, to support additional non-functional features.
As shown in Figure 4, feature modeling in the early stages enables deciding which features should be supported by an SPL and which should not. The result is a feature model represented in FeatureIDE. In a second stage, during product configuration, it is necessary to choose the features that appropriately fulfill the business and customer requirements. This stage allows a single product configuration to be created through the search-based algorithms in SPLConfig, and this configuration is presented by FeatureIDE in the product configuration module. Note that the current implementation of SPLConfig randomly picks one of the optimal solutions; however, we are working on a new version that presents a sorted list of the top best solutions, so the customer will have several product options to choose from. In a third stage, SPLConfig allows manual configuration of features if necessary. In a fourth stage, it also allows compiling and building the product.
4. Design and Implementation Decisions
This section discusses some design and implementation decisions we made in the
development of SPLConfig. Figure 5 shows a screenshot of SPLConfig view in the
Eclipse IDE. Figure 5 shows the package explorer view typical of Eclipse IDE (a). It
also illustrates the FeatureIDE model editor (b) and outline view (c). Figure 5 (d) show
details of the SPLConfig view.
Figure 5. SPLConfig View in the Eclipse IDE
Figure 5 (d) presents the main view of SPLConfig integrated with the Eclipse
IDE. At the top of the view, we have two fields named Budget and Customer that should
be filled by the customer with the available budget and with the customer identification,
respectively. Each feature in the feature model is presented in a row of this view.
Columns give additional information, such as the feature name, the level of importance
of the feature to the customer (Benefit), and the cost of development of each feature
(Cost). The tool is supposed to be used by developers who are expected to translate
qualitative information from customers into quantitative values. Moreover, this view includes typical buttons, such as Refresh and Execute: Refresh updates the data presented in the view, while Execute selects the product configuration that best satisfies the customer requirements.
In addition to this view, we extend Eclipse with the preference page presented in Figure 6, where industries can set specific preferences, such as the cost of development of each feature that makes up the SPL. Therefore, for the same SPL, several products can be generated according to the needs and constraints specific to each customer, while keeping the cost of each feature fixed. Note that the FeatureIDE view can be used to show the product configuration that best satisfies the customer requirements, but it does not prevent other features from being included in or excluded from the final product (i.e., manual tuning).
Figure 6. SPLConfig Preference Page
5. Preliminary Evaluation
We reviewed a large number of research works in the field of SPL and selected a set of ten feature models as benchmark instances to evaluate SPLConfig: MobileMedia [Figueiredo et al., 2008], Email System [Thüm et al., 2014], Smart Home [Alferez et al., 2009], Devolution [Thüm et al., 2014], Gasparc [Aranega et al., 2012], Web Portal [Mendonça et al., 2008], FraSCAti [Seinturier et al., 2012], Model Transformation [Czarnecki & Helsen, 2003], Battle of Tanks [Thüm & Benduhn, 2011], and e-Shop [Lau, 2006]. We found that the time spent configuring a feature model with 213 features (e-Shop) is about 6 milliseconds. So far, SPLConfig has scaled well for all evaluated feature models (up to three hundred features). Evaluation details are available in Pereira (2014).
6. Related Tools
Automated reasoning is a challenging field in SPL engineering. Mendonça et al. (2009) and Thüm et al. (2014) have proposed visualization techniques for representing staged feature configuration in SPLs. However, industrial-sized feature models with hundreds or thousands of features make a manual feature selection process hard. Moreover, these approaches focus on the functional features of a product and their dependencies, neglecting non-functional requirements. To the best of our knowledge, tools to deal with non-functional features are still lacking.
7. Conclusions and Future Work
This paper presented SPLConfig, a tool developed for product configuration in SPL through the use of search-based algorithms. We described the problem handled by the tool (Section 2) and summarized the method behind the tool and its main functionalities (Sections 3 and 4). Our results so far are, in general, satisfactory. Nevertheless, our goals for future work include improving the SPLConfig tool in order to (i) include other non-functional requirements [Kang et al. 1998] and (ii) consider the difficulty of developing and maintaining the product, in order to minimize the development effort of integrating features to compose a product. Further work should also address case studies in industry, through direct contact with businesses and customers, to ensure that realistic instances of requirements are being used.
Acknowledgment
This work was partially supported by CNPq (grant Universal 485907/2013-5) and
FAPEMIG (grants APQ-02532-12 and PPM-00382-14).
References
Apel, S., Batory, D., Kästner, C., and Saake, G. (2013). Feature-Oriented Software
Product Lines: Concepts and Implementation. Springer-Verlag.
Alferez, M., Santos, J., Moreira, A., Garcia, A., Kulesza, U., Araujo, J., and Amaral, V. (2009). Multi-view composition language for software product line requirements. In International Conference on Software Language Engineering (SLE), p. 136–154.
Aranega, V., Etien, A., and Mosser, S. (2012). Using feature model to build model
transformation chains. In 15th international conference on Model Driven Engineering
Languages and Systems (MODELS), pages 562–578.
Batory, D. S. (2005). Feature models, grammars and propositional formulas. In 9th
International Software Product Lines Conference (SPLC), pages 7–20.
Benavides, D., Martin-Arroyo, P. T., and Cortes, A. R. (2005). Automated reasoning on
feature models. In 17th Conference on Advanced Information Systems Engineering
(CAiSE), pages 491–503.
Clements, P. and Northrop, L. (2001). Software product lines: Practices and patterns.
Addison-Wesley.
Czarnecki, K. and Eisenecker, U. W. (2000). Generative programming: Methods, tools,
and applications. Addison-Wesley.
Czarnecki, K. and Helsen, S. (2003). Classification of model transformation approaches. Available at: http://www.ptidej.net/course/ift6251/fall05/presentations/050914/Czarnecki_Helsen.pdf/.
Figueiredo, E. et al. (2008). Evolving software product lines with aspects: An empirical
study. In 30th International Conference on Software Engineering (ICSE), p. 261-270.
Harman, M. and Jones, B. F. (2001). Search based software engineering. Journal
Information and Software Technology, 43(0):833–839.
Kang, K., Cohen, S., Hess, J., Novak, W., and Peterson, S. (1990). Feature-oriented
domain analysis (FODA) feasibility study. Technical Report. CMU/SEI-90-TR-21.
ESD-90-TR-222.
Kang, K., Kim, S., Lee, J., Kim, K., Shin, E., and Huh, M. (1998). FORM: A feature-oriented reuse method with domain-specific reference architectures. Annals of Software Engineering, 5(1):143–168.
Lau, S. Q. (2006). Domain analysis of e-commerce systems using feature-based model
templates. Master’s thesis. University of Waterloo, Canada.
Mendonça, M., Bartolomei, T., and Cowan, D. (2008). Decision-making coordination in collaborative product configuration. In ACM Symposium on Applied Computing (SAC), p. 108–113.
Mendonça, M., Branco, M., and Cowan, D. (2009). S.p.l.o.t.: Software product lines
online tools. In 24th Conference on Object-Oriented Programming Systems,
Languages, and Applications (OOPSLA), pages 761–762.
Pereira, J. A. (2014). Search-Based Product Configuration in Software Product Lines. Federal University of Minas Gerais, Belo Horizonte, April 2014.
Pohl, K., Böckle, G., and Van der Linden, F. J. (2005). Software product line
engineering: Foundations, principles and techniques. Springer-Verlag.
Seinturier, L., Merle, P., Rouvoy, R., Romero, D., Schiavoni, V., and Stefani, J. (2012). A component-based middleware platform for reconfigurable service-oriented architectures. Software Practice and Experience, 42(5):559–583.
Thüm, T. and Benduhn, F. (2011). Spl2go: An online repository for open-source
software product lines. Available at: http://spl2go.cs.ovgu.de/projects. [Online;
accessed 10-December-2013].
Thüm, T., Kästner, C., Benduhn, F., Meinicke, J., Saake, G., and Leich, T. (2014).
FeatureIDE: An extensible framework for feature-oriented software development.
Journal Science of Computer Programming, 79(0):70–85.
SPLICE:
Software Product Line Integrated Construction Environment
Bruno Cabral1,3 , Tassio Vale2,3 , Eduardo Santana de Almeida1,3
1
Computer Science Department
Federal University of Bahia (UFBA) – Salvador, BA – Brazil
2
Center of Exact Sciences and Technology
Federal University of Recôncavo da Bahia (UFRB) – Cruz das Almas, BA – Brazil
3
RiSE Labs – Reuse in Software Engineering – Salvador, BA – Brazil
[email protected], [email protected], [email protected]
Abstract. A software product line (SPL) is, basically, a set of products developed from reusable assets. During the development of an SPL, a wide range of artifacts needs to be created and maintained to preserve the consistency of the family model, and it is important to manage the SPL variability and the traceability among those artifacts. In this paper, we propose the Software Product Line Integrated Construction Environment (SPLICE), an open source web-based lifecycle management tool for managing software product line activities in an automated way. This initiative intends to support most of the SPL process activities, such as scoping, requirements, architecture, testing, version control, evolution, management, and agile practices.
Overview video: http://youtu.be/GQBxQSejFYU
1. Introduction
Software Product Line (SPL) is considered one of the most effective methodologies for developing software products and has proven to be a successful approach in business environments. The successful introduction of a software product line provides a significant opportunity for a company to improve its competitive position [Pohl et al. 2005]. However, managing an SPL is not so simple, since it demands planning and reuse, adequate management and development techniques, and also the ability to deal with organizational issues and architectural complexity [Cavalcanti et al. 2012].

During the development of an SPL, a wide range of artifacts needs to be created and maintained to preserve the consistency of the family model, and it is important to manage the SPL variability and the traceability among those artifacts. However, this is a hard task, due to the heterogeneity of the assets developed during the SPL lifecycle. Maintaining the traceability of artifacts manually is error-prone, time-consuming, and complex. Therefore, using a project management system to support those activities is essential.
A large number of CASE tools exist for assisting software engineering activities, and there are also specific tools for SPL engineering [Lisboa 2008]. However, they are complex and formal, and they enforce a specific project management process without suitable customization. Another issue is that such tools focus on a specific activity of the process, forcing the software engineer to use different tools to support the process. By using different tools, it is hard to automate trace links (links from one artifact to another) and thus to provide traceability among the artifacts. Moreover, software engineers also have to handle the installation, maintenance, and user management of a number of tools, or rely on an external person for those tasks not directly related to product development.
Aiming to address such issues, we present the Software Product Line Integrated Construction Environment (SPLICE), a web-based software product line lifecycle management tool providing traceability and variability management and supporting most of the SPL process activities, such as scoping, requirements, architecture, testing, version control, evolution, management, and agile practices. The tool assists the engineers involved in the process with asset creation and maintenance, while providing traceability and variability management, as well as offering detailed reports and enabling engineers to easily navigate between assets using the traceability links. It also provides a basic infrastructure for development and a centralized point for user management.

The remainder of this paper is organized as follows: Section 2 addresses what has been proposed in the literature and the tools available on the market; Section 3 presents the tool, including its requirements, the proposed metamodel, and its general architecture; Section 4 shows a case study conducted inside a research laboratory to validate the tool; Section 5 provides the concluding remarks.
2. Related Work
Schwaber [Schwaber 2006] defines an ALM tool as a tool that provides: ”The coordination of development lifecycle activities, including requirements, modeling, development, build and testing, through: 1) enforcement of processes that span these activities;
2) management of relationships between development artifacts used or produced by these
activities; and 3) reporting on progress of the development effort as a whole.”
During our search [Cabral 2014] for similar commercial or academic tools, we initially identified 221 possible tools related to ALM or general project management; after filtering, the list was reduced to twenty-three tools. The selection criteria comprised support for the following assets: requirements elicitation and/or use cases; planning using agile methodologies; issue report management; testing; and feature modeling. In addition, the tool should provide traceability among the managed assets, flexibility for changing and modeling the tool metamodel for specific needs, and features oriented specifically to SPL development.
According to our analysis, the desired characteristics were sparsely supported across different tools. Only IBM Jazz Collaborative Lifecycle Management, Polarion ALM, codeBeamer ALM, and Endeavour Agile ALM fulfilled multiple characteristics, though with some issues. Problems in the other solutions included missing characteristics, metamodel inflexibility, and absence of support for SPL development. A detailed comparison table is available on the web1.
1 http://brunocabral.com.br/CBSoft/Comparison.png
3. SPLICE
SPLICE is an open source (GNU General Public License2), Python3 web-based tool built to support and integrate SPL activities such as requirements management, architecture, coding, testing, tracking, and release management, providing process automation and traceability. Our tool provides the infrastructure for version control and issue tracking, enabling stakeholders and software engineers to create the architecture and artifacts in a controllable and traceable way. The tool requirements are described as follows:
• FR1 - Traceability of lifecycle artifacts. It should identify and maintain relationships between managed lifecycle artifacts.
• FR2 - Reporting of lifecycle artifacts. It must use lifecycle artifacts and traceability information to generate the needed reports from the lifecycle product information.
• FR3 - Metamodel Implementation. SPLICE should implement the entities and relationships described in a defined metamodel. The metamodel created comprises the relationships among the SPL assets, allowing traceability and facilitating the evolution and maintenance of the SPL.
• FR4 - Issue Tracking. Issue tracking plays a central role in software development, and the tool must support it. It is used by developers to support collaborative bug fixing and the implementation of new features. In addition, it is also used by other stakeholders such as managers, QA experts, and end-users for tasks such as project management, communication and discussion, code reviews, and story tracking.
• FR5 - Agile Planning. In the software industry, there is a strong shift from traditional phase-based development towards agile methods and practices [Bjarnason et al. 2011]. The tool must support them.
• FR6 - Configuration management. For evolution management, the tool must support change management across all the managed artifacts. It must also support creating and controlling the mainstream version management systems such as SVN4 and GIT5.
• FR7 - Unified User management. As an integrated environment intended to cover all lifecycle activities, the tool can use a number of external tools, taking advantage of the vibrant community and quality present in some open-source/freely available tools. For convenience, it must provide unified user management across all the tools.
• FR8 - Collaborative documentation. A wiki is a collaborative authoring system for collective intelligence that is quickly gaining popularity in content publication, and it is a good candidate for documentation creation. Wikis' collaborative features and markup language make them very appropriate for documenting software development tasks.
• FR9 - Artifacts search. All artifacts managed by the tool must support keyword search and present the results in an adequate way.
For brevity, the non-functional requirements are not described here. They include easy access, metamodel flexibility, extensibility, usability, accountability, transparency, and security.
2 https://www.gnu.org/copyleft/gpl.html
3 http://python.org/
4 http://subversion.apache.org/
5 http://git-scm.com/
3.1. Metamodel
To address the previously defined functional and non-functional requirements, we decided to use a model-driven approach to represent all the information, activities, and connections among artifacts. With a well-defined metamodel, we can provide automation and interactive tool support to aid the corresponding activities. Several metamodels have been proposed in the literature [Buhne et al. 2005, Cavalcanti et al. 2012, Schubanz et al. 2013]. However, we argue that all of them assume a traditional phase-based methodology and do not fit more lightweight, flexible, and informal methodologies such as agile methods.
We propose a lightweight metamodel, adapted from [Cavalcanti et al. 2012], representing the interactions among SPL assets, developed in order to provide a way of managing traceability and variability. The proposed metamodel represents the reusable assets involved in an SPL project, and a simplified description of the models is presented next. The complete metamodel, with all relations, is available on the web6.
• Scoping Module comprises the Feature and the Product models. Many artifacts relate directly to the Feature Model, including Use Case, Glossary, User Story, and Scope Backlog. A Product is composed of one or more Features.
• Requirements Module involves the requirements engineering traceability and interaction issues, considering the variability and commonality in the SPL products. The main object of this SPL phase is the Use Case. The Use Case model is composed of a description, a precondition, a title, and a number of MainSteps and AlternativeSteps. The concept of User Story is used in this metamodel to represent what a user does or needs to do as part of his or her job function. It is composed of a name and the associated template: As a, I want, and So that.
• Testing Module: a Test Case is composed of a name, a description, the expected result, and a set of Test Steps. One Test Case can have many Test Executions, each representing one execution of it. The rationale for the Test Execution is to enable test automation machinery. The metamodel also represents acceptance testing with the Acceptance Test and Acceptance Test Execution models.
• Agile Planning Module contains Sprint Planning models, which are composed of a number of Tickets, a deadline, an objective, and a start date. At the end of the sprint, a retrospective takes place, represented in the model by Sprint Retrospective, which contains a set of Strong Points and Should Be Improved models expressing which points in the sprint were adequate and what needs improvement.
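To make these relationships concrete, the sketch below renders a few of the entities above as plain Python classes (SPLICE itself is written in Python). This is a minimal illustration, with hypothetical class and field names distilled from the description, and not the tool's actual model code.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Feature:
    name: str

@dataclass
class Product:
    name: str
    features: List[Feature] = field(default_factory=list)  # one or more Features

@dataclass
class UseCase:
    title: str
    description: str
    precondition: str
    main_steps: List[str] = field(default_factory=list)
    alternative_steps: List[str] = field(default_factory=list)

@dataclass
class UserStory:
    name: str
    as_a: str     # "As a ..."
    i_want: str   # "I want ..."
    so_that: str  # "So that ..."

@dataclass
class TestExecution:
    passed: bool  # one execution of a Test Case

@dataclass
class TestCase:
    name: str
    description: str
    expected_result: str
    test_steps: List[str] = field(default_factory=list)
    executions: List[TestExecution] = field(default_factory=list)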
3.2. Architecture
The SPLICE architecture is composed of Trac, a core module (Tonho), a database module, an Authentication Control module (Maculellê), and a set of versioning and revision control tools. Trac7 is an open source, web-based project management and bug tracking system, and it is used as the foundation of SPLICE. On top of it, two separate modules were built to provide the missing functionality: Tonho is the main module, where the metamodel and the functionality not provided by Trac are implemented, and the Authentication Control is the module that provides the single sign-on property among the tools, supplying unified access control with a single login.
6 http://brunocabral.com.br/CBSoft/Metamodel.png
7 http://trac.org
Figure 1. SPLICE architecture overview
Figure 1 illustrates a simplified organization of the architecture: the Authentication Control validates each request and grants access to the tools if the user has the right credentials; Trac is responsible for issue tracking, managing the versioning and revision control tools, collaborative documentation, and plugin extensibility; the main module (Tonho) is where the metamodel is implemented, with sub-modules for each metamodel module: Scoping (SC), Requirements (RQ), Tests (TE), Agile Planning (AP), and Other (OT); versioning and revision control is part of software configuration management and is composed of the set of external version control systems (VCS) such as SVN, GIT, and Mercurial, which track and provide control over changes to source code, documents, and other collections of information; finally, the Database stores all the assets in an organized way and handles the persistence of data among the tools.
The bridge between Tonho and Trac is made using plugins, a shared database, and templates. No modification whatsoever was made to the Trac core. This solution allows Trac to be easily upgraded in the future, taking advantage of new features and security fixes. All modules of the architecture share the same database and the same template design.
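To illustrate how such a bridge can be wired without modifying the Trac core, the sketch below shows a minimal Trac plugin component in the spirit of the description above; the class name, URL path, table, and template are hypothetical and not taken from the SPLICE source.

from trac.core import Component, implements
from trac.web.api import IRequestHandler

class TonhoBridge(Component):
    """Hypothetical plugin routing /tonho requests to the Tonho module."""

    implements(IRequestHandler)

    # IRequestHandler methods
    def match_request(self, req):
        # Claim every URL under /tonho, leaving the Trac core untouched.
        return req.path_info.startswith('/tonho')

    def process_request(self, req):
        # Read from the database shared with the other modules...
        with self.env.db_query as db:
            rows = db("SELECT name FROM feature")  # hypothetical table
        data = {'features': [name for (name,) in rows]}
        # ...and render a template that follows the shared design.
        return 'tonho_features.html', data, None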
3.3. Main Functionalities
Aiming to address the requirements described previously, the main functionalities of SPLICE include:
• Metamodel Implementation. All the screens are completely auto-generated from the model descriptions, allowing software engineers to easily adapt the process to specific project needs (see the sketch after this list). For every model, a complete CRUD (Create, Read, Update and Delete) system is created, and idiosyncrasies can be easily customized. SPLICE also provides advanced features such as filtering and classification.
• Issue Tracking. Based on the Trac core, SPLICE has a full-featured issue tracker. We extended it to implement SPL-specific features and to provide traceability to other assets.
• Traceability. SPLICE provides total traceability for all assets in the metamodel and is able to report direct and indirect relations between them. In reports, assets have hyperlinks, enabling navigation between them.
• Custom SPL Widgets. SPLICE has a set of custom widgets to represent specific SPL models, such as Feature Map, Feature Restriction, Product Map, and Agile Planning Poker.
• Change history and Timeline. SPLICE has a rich set of features to visualize how the project is going, where the changes are happening, and who made them. For every issue or asset, a complete change history is recorded.
• Unified Control Panel. The tool aggregates the configuration of all external tools in a unified interface. With the same credentials, the user is able to access all SPLICE features, including external tools such as the VCS.
• Agile Planning. SPLICE supports a set of agile practices such as effort estimation, where team members use effort and degree of difficulty to estimate their own work. Features can be dragged with the mouse, and their position is updated accordingly.
• Automatic report generation. SPLICE can create reports, including PDFs. A generated report includes a cover, a summary, and the set of chosen artifacts related to the product. This format is suitable for requirements validation by stakeholders. The tool is also able to collect all reports for a given Product and create a compressed file containing the set of generated reports.
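The screen auto-generation mentioned in the first item above can be pictured as simple model introspection. The following sketch is illustrative only (assumed names, not SPLICE code): it derives a generic form description from any dataclass-style model, so changing the metamodel immediately changes every screen that renders it.

import dataclasses

def form_fields(model_cls):
    # Build one form widget per model field; a CRUD screen can be
    # rendered directly from this description.
    widgets = []
    for f in dataclasses.fields(model_cls):
        widget = 'textarea' if f.name == 'description' else 'input'
        widgets.append({'name': f.name, 'type': str(f.type), 'widget': widget})
    return widgets

# e.g. form_fields(UseCase) yields one widget per UseCase attribute.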
4. Case Study: RescueMe SPL
The SPLICE tool was evaluated during an SPL development project. The study comprised
the migration from a manual Software Engineering process to the SPLICE tool and the
proposed metamodel. Cabral (2013)8 provides a more detailed description of the case
study.
Figure 2. Features list in SPLICE
Between June and November 2013, we developed the RescueMe SPL, following an agile SPL process. RescueMe is a product line developed in Objective-C for
8 Bachelor's thesis available at http://brunocabral.com.br/CBSoft/thesis.pdf
iOS devices, designed to help its users in dangerous situations. It was developed using an iterative and incremental process, carried out by four developers, with a face-to-face meeting at the end of each sprint. These meetings were responsible for evaluating results and planning the next sprint. Before using SPLICE, the group manually maintained the SPL process based on a set of external tools, such as the SourceForge9 service for issue tracking and VCS. All the requirements artifacts were maintained using text documents and questionnaires. SPLICE was introduced to manage the SPL process, and all artifacts were migrated to it. After the migration, the development continued using only SPLICE to manage the application lifecycle. Figure 2 shows a list of features in SPLICE.
After the migration to SPLICE, we selected a survey as the data collection instrument. To evaluate the applicability of the tool, the survey design was based on the guidelines of [Kitchenham and Pfleeger 2008] and is composed of a set of personal questions and closed-ended and open-ended questions related to the research questions. We used a cross-sectional design, in which participants were asked about their past experiences at a particular fixed point in time. The survey was administered as a self-completed printed questionnaire, available on the web10, and was aimed at PhD students who are experts on the subject. Two experts answered the questionnaire; both have more than 5 years of experience in Software Engineering and more than 4 years in SPL development.
Analyzing the answers, no one reported any major difficulty during tool usage. One developer reported a minor problem with the fact that the initial screen is the collaborative document rather than the assets screen, which is hidden behind a menu item. No major usability problem was found, and all participants were able to use and evaluate the tool without supervision. This may indicate that the tool fulfilled the usability requirement.
The experts explicitly stated that the tool was useful, aided asset traceability, provided all the traceability links they wanted, and offered a valuable set of features. They also stated that they would spontaneously use the tool in future SPL projects. The experts also mentioned some points of improvement during the survey. One suggested improvement was the ability to configure the process and the metamodel. This is a non-functional requirement of the tool, and the SPLICE architecture is capable of it; however, it currently requires editing some files manually, so a visual editor should be added to the tool to address this problem. Other requests include better change impact analysis and integration with variability in source code to perform product derivation, which are on the roadmap for the next version.
5. Conclusion
This paper presented SPLICE, an open source, Python web-based tool built to support and integrate SPL activities such as requirements management, architecture, coding, testing, tracking, and release management, providing process automation and traceability across the process. Moreover, we also presented a lightweight metamodel for SPL development using agile methodologies, which was implemented as the default process in SPLICE. SPLICE fills a gap in open-source ALM tools focused on SPL. It supports the main SPL lifecycle artifacts, providing traceability between those artifacts and, consequently, integrated and consistent management. Compared to the other tools, SPLICE also has as a
9 http://www.sourceforge.net
10 http://brunocabral.com.br/CBSoft/questionnaire.pdf
differential the fact that it is web-based, allowing users on different devices to use it and collaborate. As future work, we intend to implement the enhancements and fix the issues reported by the experts. Two important upcoming features are visual metamodel editing, so users can visually adapt the metamodel to their specific needs, and source-code variation support, to achieve a complete derivation process. The proposed tool is publicly available at http://brunocabral.com.br/splice.
Acknowledgments
This work was partially supported by the National Institute of Science and Technology
for Software Engineering (INES11 ), funded by CNPq and FACEPE, grants 573964/2008-4
and APQ-1037-1.03/08, CNPq grants 305968/2010-6, 559997/2010-8, and 474766/2010-1, and FAPESB.
References
Bayer, J. and Widen, T. (2002). Introducing traceability to product lines. In Revised
Papers from the 4th International Workshop on Software Product-Family Engineering,
PFE ’01, pages 409–416, London, UK, UK. Springer-Verlag.
Baysal, O., Holmes, R., and Godfrey, M. W. (2013). Situational awareness: Personalizing issue tracking systems. In Proceedings of the 2013 International Conference on
Software Engineering, ICSE ’13, pages 1185–1188, Piscataway, NJ, USA. IEEE Press.
Bjarnason, E., Wnuk, K., and Regnell, B. (2011). A case study on benefits and side-effects of agile practices in large-scale requirements engineering. page 5. ACM.
Buhne, S., Lauenroth, K., and Pohl, K. (2005). Modelling requirements variability across
product lines. In Proceedings of the 13th IEEE Conference on Requirements Engineering, RE ’05, pages 41–52, Washington, DC, USA. IEEE.
Cabral, B. S. (2014). SPLICE: A flexible SPL lifecycle management tool.
Cavalcanti, Y. C., do Carmo Machado, I., da Mota Silveira Neto, P. A., and Lobato, L. L.
(2012). Software Product Line - Advanced Topic, chapter Handling Variability and
Traceability over SPL Disciplines, pages 3–22. InTech.
Kitchenham, B. and Pfleeger, S. (2008). Personal opinion surveys. In Shull, F., Singer, J.,
and Sjøberg, D., editors, Guide to Advanced Empirical Software Engineering, pages
63–92. Springer London.
Lisboa, L. B. (2008). Toolday - a tool for domain analysis.
Pohl, K., Böckle, G., and van der Linden, F. (2005). Software Product Line Engineering:
Foundations, Principles, and Techniques.
Schubanz, M., Pleuss, A., Pradhan, L., Botterweck, G., and Thurimella, A. K. (2013). Model-driven planning and monitoring of long-term software product line evolution. In Proceedings of the Seventh International Workshop on Variability Modelling of Software-intensive Systems, VaMoS '13, pages 18:1–18:5, New York, NY, USA. ACM.
Schwaber, C. (2006). The Changing Face Of Application Life-Cycle Management by
Carey Schwaber - Forrester Research.
11 INES - http://www.ines.org.br
FlexMonitorWS: a solution for monitoring Web services with a focus on QoS attributes
Rômulo J. Franco1, Cecília M. Rubira1, Amanda S. Nascimento2
1 Instituto de Computação – Universidade Estadual de Campinas
[email protected], [email protected]
2 Departamento de Computação – Universidade Federal de Ouro Preto
[email protected]
Abstract. FlexMonitorWS is a tool for monitoring Web services that aims to monitor QoS attribute values and to enable understanding of the degradation factors of those QoS attributes in the context of monitoring processes. The solution adopts methods based on Software Product Lines to explore the software variability present in current monitoring systems, generating a family of monitors responsible for monitoring different QoS attributes and different IT resources as targets, to which different operating modes can be applied. Two case studies were performed to assess the feasibility of the tool, obtaining satisfactory results in delivering QoS attribute values and in understanding their degradation.
Tool presentation at: http://youtu.be/3z8f4_Zz9HM
1. Introduction
Service-Oriented Architecture (SOA) and Software Product Lines (SPL) support software reuse. SOA is a software component model that interrelates different functional units, called services, through well-defined interfaces that are independent of platforms and implementation languages. An SPL, in turn, can be defined as a set of software systems that share common characteristics and have distinct characteristics, aiming to satisfy the needs of a market niche [8]. The main purpose of product line engineering is to offer customized products at a reasonable cost [11].
Due to the inherent properties of SOA, such as dynamicity, heterogeneity, distribution, service autonomy, and the uncertainty of the execution environment, quality of service (QoS) attributes may fluctuate over time (e.g., availability and performance) [1]. Consequently, it is necessary to monitor services over time in order to guarantee that they sustain the expected or defined quality level, that is, the level defined by the service provider.
However, although there is a growing demand for monitoring QoS attributes, designing and implementing tools for this activity is a non-trivial task. Monitoring can be performed, for example, from different locations (e.g., on the client and/or server side), considering different QoS attributes and different monitoring frequencies. There is a lack of solutions that simultaneously contemplate the different forms of QoS attribute monitoring and thereby support the requirements of different users and applications.
In this work, we present FlexMonitorWS, an SPL-based solution that supports a family of tools for monitoring QoS attributes. First, based on a systematic literature review, existing solutions for QoS monitoring were studied in order to identify common and variable functionality among them [5, 6, 2, 1, 7]. Such functionality is initially mapped to a feature model and later implemented through a component-based product line architecture (PLA). The PLA aims to facilitate the instantiation of specific tools (i.e., products) according to the features of interest.
2. SPL Fundamentals
The original SPL vision is formed by a domain conceived from business needs, formulated as a family of products with an associated production process [9]. This domain supports the rapid construction and evolution of customized products in response to changes in customer requirements. Software variability is a fundamental concept in SPL and refers to the capability of a software system or artifact to be modified, customized, or configured for use in a specific context [10].
Variabilities can initially be identified through features. A feature is a system property that is relevant to some stakeholder and is used to capture common or variable parts among systems of the same family [12]. Software variabilities can be represented through a feature model, in which a classification separates common features from variable ones [8]. This classification can be further detailed by separating features into optional, alternative, and mandatory [3].
3. FlexMonitorWS
FlexMonitorWS is a monitoring tool based on SPL concepts, exploring the software variability found in Web service monitoring systems.
From a systematic literature review of service monitoring solutions, we sought to answer the following questions: 1) Where will monitoring take place (target)? 2) What should be obtained through monitoring, or what should be monitored (QoS attributes)? 3) How can it be performed (operation mode)? 4) At what frequency should monitoring occur? 5) What are the means of obtaining the results or alerts generated by monitoring (notification)?
Answering these questions reveals the variability existing in this kind of monitoring system. The questions were mapped as features and are represented in the feature model shown in Figure 1.
The feature model shown in Figure 1 was used to design and implement the SPL architecture, which allows the instantiation of specific products according to the requirements of the different users interested in monitoring. That is, combining the possibilities identified as features in the model of Figure 1 yields a monitor tailored to a specific need. Note that the model in Figure 1 contains the mandatory features: Target, AQoS (QoS Attribute), Operation Mode, Execution Frequency, and Notification. The rest of the model hierarchy contains the remaining optional features.
Figure 1. Feature model of monitoring systems
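To illustrate how a selection over the model of Figure 1 drives the instantiation of one monitor of the family, the sketch below shows a hypothetical configuration; the names and structure are assumptions for illustration, not FlexMonitorWS code.

# Hypothetical resolution of the mandatory features for one product.
availability_monitor = {
    'target': 'http://example.org/ws/endpoint',   # placeholder endpoint
    'qos_attribute': 'availability',
    'operation_mode': 'invocation',               # vs. 'interception', 'inspection'
    'frequency_seconds': 60,
    'notification': ['log_file'],                 # optional alternative: 'email'
}

def instantiate_monitor(selection):
    # Sketch of product instantiation: every mandatory feature of the
    # model must be resolved before a monitor can be generated.
    mandatory = {'target', 'qos_attribute', 'operation_mode',
                 'frequency_seconds', 'notification'}
    missing = mandatory - selection.keys()
    if missing:
        raise ValueError('unresolved mandatory features: %s' % missing)
    return selection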
Figure 2. Communication diagram representing the FlexMonitorWS SPL architecture
Figure 2 presents a communication diagram that represents the interaction among the FlexMonitorWS objects. As illustrated in this figure, the central FlexMonitorWS object configures the monitoring Execution Frequency through a Timer, which, once triggered, raises an interrupt in the virtual machine and starts the monitoring process. The next step takes place in the Operation Mode, which obtains samples and information about QoS attributes through the modes associated with this point (e.g., invocation, interception, or inspection). The following step occurs in the AQoS object, where the values of the QoS attributes associated with this point are computed. The Notification object finishes the process, with Notification Types holding the possible ways of notifying the interested parties (e.g., sending an email or writing a log file) about the results obtained during monitoring.
Note in Figure 2 that the diagram elements of type Control and Entity offer a view of the solution's extension points and the possible design alternatives associated with them, since these points are tied to the optional features of the model in Figure 1.
The tool is configured through a parameter file named target.properties. This file is divided into blocks, each corresponding to one target with its relevant settings. Multiple blocks can therefore be inserted into the file to monitor an unlimited number of targets.
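A minimal sketch of what one block of target.properties might look like follows; the key names are hypothetical, since the text above only states that a block carries the target's address, port, and related parameters.

# Hypothetical block for one monitored target (illustrative key names)
target1.address=http://example.org/ws/endpoint
target1.port=8080
target1.qos=availability
target1.mode=invocation
target1.frequency=60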
The tool's three possible operation modes are: interception, through the TCPDUMP1 command, whose output is parsed and interpreted to detect the security policies implemented in the Web service; server inspection, performed with the Hyperic Sigar API, which obtains various kinds of information about hardware resources (still in inspection mode, exceptions raised inside the server application's log file are also captured); and invocation, for which we use the SAAJ2 API: the target.properties parameter file supplies the address, port, and other parameters of the Web service, and the request XML is created at runtime. We also consider the use of the PING command over the ICMP protocol an invocation mode, applied to evaluate QoS attributes related to elements of the communication network.
1 TCPDUMP, http://www.tcpdump.org/
2 Overview of SAAJ - http://docs.oracle.com/javaee/5/tutorial/doc/bnbhg.html
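As an illustration of the PING-based invocation mode, the snippet below (a sketch under assumed parameters, not FlexMonitorWS's actual Java implementation) probes a host over ICMP and derives simple availability and latency figures.

import re
import subprocess

def ping_probe(host, count=4):
    # Probe `host` with the system ping command and summarize QoS data.
    result = subprocess.run(
        ['ping', '-c', str(count), host],  # '-c' on Linux/macOS; '-n' on Windows
        capture_output=True, text=True)
    times = [float(t) for t in re.findall(r'time=([\d.]+)', result.stdout)]
    return {
        'available': len(times) > 0,
        'availability': len(times) / count,
        'avg_latency_ms': sum(times) / len(times) if times else None,
    }

# Example: ping_probe('8.8.8.8') -> {'available': True, 'availability': 1.0, ...}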
4. Case Studies
We present two case studies that were carried out and analyzed to evaluate our solution qualitatively.
4.1. Case Study 1 - Controlled Scenario
This case study evaluates the feasibility of the solution by demonstrating its capability to identify the degradation of a QoS attribute in a service composition, in a setting with several providers and several resources (e.g., services, servers, network) subject to failures. When a failure occurs, it degrades a given QoS attribute. Since there are many resources, it becomes difficult to monitor all of them effectively. In this case, a family of monitors is generated to cover the scenario exemplified here. At the end of the monitoring, the data can be cross-referenced to determine who is responsible for the degradation, both at the provider level and at the resource level.
Figure 3 identifies each monitor in the scenario by means of numbered circles. Note in Figure 3 that each monitor was generated by the SPL to serve a specific purpose and was strategically placed to collect QoS attribute values from the targets. The generated monitors cover 1) Availability; 2) Performance; 3) Network (including gateways, routers, and Google's public IP); 4) Server; 5) Reliability; and 6) Server Application. In short, a family of SPL-based products was generated (e.g., monitorarDisponibilidade.jar, monitorarDesempenho.jar, etc.).
Figure 3. Controlled scenario composed of providers, consumers, and the monitors, represented by numbered circles over the targets
Running the six monitors to cover the scenario defined in Figure 3 took 14 hours of monitoring. After several tests with different intervals, we managed to fit four interventions into the controlled scenario within those 14 hours. These interventions included, for example, shutting down the Master Provider's network interface in the first hour; the remaining interventions consisted of shutting down services and resources. The interventions were executed at strategically defined time intervals in order to generate anomalies in the scenario and to capture information about the quality of the data produced by the monitoring.
From the values delivered for each QoS attribute, it was possible to cross-reference data from different monitors and identify where the degradation of a QoS attribute actually originated. The chart in Figure 4 presents one of the cross-references obtained.
On the one hand, in the case presented in Figure 4, even if the Master Provider were unaware of the unavailability caused by an intervention in the first hour, Provider A's monitoring of the Master Provider and of Google's public IP would guarantee the accuracy of the diagnosis offered by the monitoring solution. On the other hand, even if the Master Provider were aware of the attribute degradation and wished to disclaim responsibility for it, Provider A would point to the Master Provider as the cause of the degradation. The results were obtained from the network monitor aimed at Google's public IP (Monitor 3 in Figure 3) and from the availability monitor of the Master Provider's service (Monitor 2 in Figure 3).
Figure 4. Chart representing the result of the intervention applied to the controlled scenario
4.2. Case Study 2 - Fault Injection Scenario
In this second case study, FlexMonitorWS was applied to inject faults and identify vulnerabilities in Web services. The main goal of this study was to determine the feasibility of FlexMonitorWS as a fault injection solution to support the monitoring of the robustness QoS attribute.
We executed automatically generated scripts for attacks of the types Malformed XML, XML Bomb injection, and duplicated requests. We analyzed the results, and FlexMonitorWS revealed vulnerabilities in public Web services when inserting a Malformed XML script that repeats tags after an existing tag (e.g., <param1>value</param1><fault><fault1></fault></fault1>).
The first vulnerability identified concerns the insertion of a string after a valid value, according to the report shown in Figure 5. In this specific case, the credit card number supplied is valid, but appending special characters to the end of the value makes it invalid, creating a malicious request corresponding to an injected fault. Note in Figure 5 that the highlighted values show the diagnosis offered by the solution.
Figure 5. Report of the script executions and the vulnerabilities identified in the public Web service
5. Related Work
Table 1 compares FlexMonitorWS with studies identified during a review of existing solutions in the literature, such as [4, 2, 5, 1, 6, 7]. The first item in Table 1 is independence from the server application platform: many of the studied proposals, as shown in Table 1, run on the server side bound to a server application (e.g., Tomcat) and only monitor the services hosted inside it.
The level of flexibility mentioned in Table 1 refers to the ability to add and remove QoS attributes; the highest level of flexibility would allow adding and removing any attribute. However, not every attribute can be obtained with a single operation mode. Table 1 shows that FlexMonitorWS is innovative in using an SPL to support a family of monitoring tools. It also supports flexibility in operation modes, and it operates on a set of QoS attributes.
Table 1. Comparison of related work with FlexMonitorWS (related approaches: [4], [2], [5], [1], [6], [7])
• Independence from the server application platform: four of the six related approaches, plus FlexMonitorWS.
• Some level of flexibility with QoS attributes: four of the six related approaches, plus FlexMonitorWS.
• Number of QoS attributes verified and validated: [4] 2; [2] 2; [5] 6; [1] 3; [6] 2; [7] NA; FlexMonitorWS 7.
• Some level of flexibility with operation modes: one related approach, plus FlexMonitorWS.
• Monitors more than one target simultaneously*: one related approach, plus FlexMonitorWS.
• Uses an SPL approach: FlexMonitorWS only.
Legend: NA - no QoS attribute evaluated in the approach; *) monitors services and other resources simultaneously.
6. Conclusions
This paper presented FlexMonitorWS, an SPL-based monitoring solution that supports a family of monitoring tools. Case studies were carried out in which the proposed solution was used to monitor QoS attributes of Web services. The results obtained indicate the feasibility, applicability, and flexibility of FlexMonitorWS. The case studies also suggest the feasibility of FlexMonitorWS for monitoring the robustness of Web services by enabling fault injection and the identification of the associated vulnerabilities. As future work, FlexMonitorWS will be extended to: 1) monitor Web services with the REST protocol; 2) include the TCP/ICMP communication protocols in the feature model; and 3) support self-adaptive monitoring, in which the monitoring system understands and acts on the environment to make decisions about the impact of the monitoring itself.
References
[1] F. Souza, D. Lopes, K. Gama, N. S. Rosa, and R. Lima (2011). Dynamic event-based
monitoring in a soa environment. In OTM Conferences.
[2] B. Wetzstein, P. Leitner, F. Rosenberg, I. Brandic, S. Dustdar, F. Leymann. (2009).
Monitoring and Analyzing Influential Factors of Business Process Performance.
EDOC, 141-150.
[3] H. Gomaa. (2004) Designing Software Product Lines with UML: From Use Cases to Pattern-Based Software Architectures. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA.
[4] C. Müller, M. Oriol, M. Rodríguez, X. Franch, J. Marco, M. Resinas, A. Ruiz-Cortés. (2012) SALMonADA: A platform for Monitoring and Explaining Violations
of WS-Agreement-compliant Documents, In Proceedings of the 4th International
Workshop on Principles of Engineering Service-Oriented Systems, PESOS 2012,
pp. 43-49, IEEE, June.
[5] N. Artaiam and T. Senivongse. (2008) Enhancing Service side QoS monitoring for
web services. In Proceedings of the 2008 Ninth ACIS International Conference on
Software Engineering. Washington, DC, USA, 2008. IEEE Computer Society.
[6] Q. Wang, J. Shao, F. Deng, Y. Liu, M. Li, J. Han, H. Mei, (2009) An Online Monitoring Approach for Web Service Requirements. IEEE Transactions on Services
Computing, vol. 2, no. 4, pp. 338-351, 2009
[7] L. Baresi, S. Guinea, M. Pistore, M. Trainotti. (2009) Dynamo + Astro: An Integrated
Approach for BPEL Monitoring. ICWS 2009: 230-237
[8] Clements, P. and Northrop, L. (2001) Software product lines: practices and patterns.
Addison Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001.
[9] G. H. Campbell. Renewing the product line vision. In Proceedings of the 2008 12th International Software Product Line Conference, Washington, DC, USA, 2008. IEEE Computer Society.
[10] J. V. Gurp, J. Bosch, and M. Svahnberg. (2001) On the notion of variability in
software product lines. In Proceedings of the Working IEEE/IFIP Conference on
Software Architecture, WICSA ’01, Washington, DC, USA, 2001. IEEE Computer
Society.
[11] Pohl, K.; Böckle, G.; Van Der Linden, F. Software product line engineering: Foundations, principles, and techniques. Berlin/Heidelberg: Springer, 2005.
[12] S. H. Chang and S. D. Kim. A variability modeling method for adaptable services
in service-oriented computing. In SPLC ’07: Proceedings of the 11th International
Software Product Line Conference, pages 261-268, Washington, DC, USA, 2007.
IEEE Computer Society.
A Code Smell Detection Tool for Compositional-based
Software Product Lines
Ramon Abilio1, Gustavo Vale2, Johnatan Oliveira2, Eduardo Figueiredo2, Heitor
Costa3
1
2
IT Department - Federal University of Lavras (UFLA), Lavras, MG, Brazil
Department of Computer Science - Federal University of Minas Gerais (UFMG)
3
Department of Computer Science - Federal University of Lavras (UFLA)
[email protected],
{gustavovale,johnatan,figueiredo}@dcc.ufmg.br, [email protected]
Abstract. Software systems have different properties that can be measured. Developers may perform manual inspection or use measure-based detection strategies for evaluating software quality. Detection strategies may be implemented in a computational tool, which performs detection faster. We developed an Eclipse plug-in called VSD (Variability Smell Detection) to measure and detect code smells in AHEAD-based Software Product Lines.
https://www.youtube.com/watch?v=M8VybWpcNI8
1. Introduction
Despite the extensive use of software measures, isolated measure values are not meaningful because they are too fine-grained. However, measures can be combined into measure-based detection strategies, for example, to detect code smells. Detection strategies are based on the combination of measures and thresholds using logical operators (AND and OR) [Marinescu, 2004; Lanza; Marinescu, 2006]. Threshold values can be represented with the labels Low, Avg (average), and High, because the real values may differ depending on the context [Marinescu, 2004; Lanza; Marinescu, 2006]. A measure-based detection strategy may be implemented in a computational tool and used to detect code smells faster than manual inspections, which are time-consuming.
Measure-based detection strategies have been used to localize code smells in Object-Oriented (OO) [Lanza; Marinescu, 2006] and Aspect-Oriented (AO) [Figueiredo et al., 2012] software, but they have not been applied to detect code smells in Feature-Oriented (FO) software. Feature-Oriented Programming (FOP) is particularly useful in applications where a large variety of similar objects is needed [Prehofer, 1997; Batory et al., 2003], such as in the development of Software Product Lines (SPL). SPL is an approach to the design and implementation of software systems that share common properties and differ from each other in some features, to meet the needs of a market or of specific clients [Pohl et al., 2005].
A feature may be defined as a prominent or distinctive user-visible aspect, quality, or characteristic of software [Kang et al., 1990] and can be implemented with different approaches, such as the Compositional and Annotative ones. In the Compositional approach, features are implemented in separate artifacts, using FOP [Batory et al., 2003] or Aspect-Oriented Programming (AOP) [Kiczales et al., 1997]. AHEAD is a compositional approach based on gradual refinements, in which programs are defined as constants and features are added using refinement functions [Batory et al., 2003]. Classes implement the basic functions of a system (constants), while the extensions, variations, and adaptations of these functions constitute the features (refinements). Features are implemented in modules syntactically independent of the classes and can insert or modify methods and attributes [Batory et al., 2003].
The definition of code smells and their detection strategies for OO and AO address mechanisms of those techniques, such as classes, methods, aspects, and pointcuts [Fowler et al., 1999; Lanza; Marinescu, 2006; Macia et al., 2010]. In SPLs, there are smells that indicate potentially inadequate feature modeling or implementation. To emphasize the focus on variability, Apel et al. (2013) called them variability smells. A variability smell is a perceivable property of an SPL that is an indicator of an undesired code property. It may be related to all kinds of artifacts in an SPL, including feature models, domain artifacts, feature selections, and products.
Focusing on variability smells related to the implementation of features, Abilio (2014) adapted three traditional code smells - God Method, God Class, and Shotgun Surgery - to address specific characteristics of compositional-based SPLs. The adaptation of the code smells was based on the literature [Fowler et al., 1999; Lanza; Marinescu, 2006; Macia et al., 2010] and on the analysis of a set of AHEAD-based SPLs. This adaptation was necessary because the traditional code smells do not address mechanisms of FOP, such as constants and refinements. In addition, detection strategies for those code smells were defined [Abilio, 2014].
Abilio (2014) described the code smells and detection strategies using the structure presented by Lanza and Marinescu (2006). Measures proposed to address specific mechanisms of FOP [Abilio, 2014], as well as measures indicated as useful to detect the traditional code smells [Lanza; Marinescu, 2006; Padilha et al., 2013; Padilha et al., 2014], were used for filtering. Low, Avg, and High values were calculated from a set of SPLs (not products). To measure the source code of an AHEAD-based SPL and detect the proposed code smells, we developed the Variability Smell Detection1 (VSD) tool as an Eclipse plug-in.
Software measures are a key means for assessing software modularity and detecting design flaws [Blonski et al., 2013; Lanza; Marinescu, 2006; Marinescu, 2004]. The software measurement community has traditionally explored quantifiable module properties, such as class coupling, cohesion, and interface size, in order to identify code smells in software systems [Lanza; Marinescu, 2006; Marinescu, 2004]. For instance, Marinescu (2004) relies on traditional measures to detect code smells in object-oriented systems. Following a similar trend, Blonski et al. (2013) propose a tool, called ConcernMeBS, to detect code smells based on concern measures. However, as far as we know, there is no tool to measure the source code and detect code smells in AHEAD-based SPLs.
Therefore, the main goal of this work is to present VSD by showing a high-level view of its architecture, the implemented measures, and the detection strategies, and by presenting its main functions (Section 2). Section 3 presents an example of VSD use and discusses the results of a preliminary evaluation. Finally, Section 4 concludes this paper and suggests future work.
1 VSD source code is available in <https://code.google.com/p/vsdtool/>
2. Variability Smell Detection Tool (VSD)
This section presents VSD, a tool to detect code smells in AHEAD-based SPL. Section
2.1 presents the VSD architecture highlighting the main components and their
interaction. The implemented measures and an example of the detection strategies are
summarized in Section 2.2, and the main functions are detailed.
To illustrate VSD use, we measured and detected code smells in the TankWar SPL [Schulze et al., 2012]. This SPL is a game developed by students at the University of Magdeburg (Germany). It is medium-sized (~5,000 LOC), has 37 features (31 concrete features), and runs on PCs and mobile phones [Schulze et al., 2012]. This SPL was chosen due to its size (lines of code and number of features) and because it was used in other studies [Schulze; Apel; Kastner, 2010; Apel; Beyer, 2011; Schulze et al., 2012].
2.1. VSD Architecture
Figure 1 presents a high-level view of the VSD architecture. We used Eclipse IDE 4.3 (Kepler) and FeatureIDE to develop VSD. FeatureIDE is an Eclipse-based IDE that supports feature-oriented software development for building SPLs [Thüm et al., 2014]. It integrates different SPL implementation techniques, such as FOP, AOP, and preprocessors, and provides resources to deal with AHEAD projects.
VSD has 3.5 KLoC, and its classes are distributed in the three packages illustrated in Figure 1: i) Detection Strategies, which contains the classes that implement the detection strategies; ii) Measurement, which contains the measures and the classes that perform the measurement; and iii) Plugin, which contains the classes responsible for interacting with Eclipse.
Figure 1 – High Level View of VSD Architecture
2.2. Implemented Measures and Detection Strategies
To measure AHEAD-based SPL source code, VSD implements traditional, OO, and FO measures (Table 1), such as McCabe's Cyclomatic Complexity (Cyclo) [McCabe, 1976], Weighted Methods Per Class (WMC) [Chidamber; Kemerer, 1994], and Number of Constants (NOCt) [Abilio, 2014], respectively. As AHEAD uses the Jak programming language (a Java superset), the traditional and OO measures assess properties of OO source code, and the FO measures assess the properties added by the mechanisms that Jak provides to implement features. We organized the measures in three groups: i) Method group: measures related to individual properties of methods; ii) Component group: measures for properties of components (classes, interfaces, constants, refinements); and iii) SPL group: measures assessing properties of the source code of an entire SPL.
VSD implements three detection strategies, one for each code smell. When the user selects the option to detect a code smell, VSD performs a measurement, executes the respective detection strategy on the obtained values, and shows the methods/components found with the selected code smell symptom. For example, the detection strategy for God Method is based on two main characteristics [Abilio, 2014]: i) methods that may concentrate responsibilities, represented by method overrides, i.e., complete overrides and refinements; and ii) long and complex methods. To address these characteristics, four measures were selected: NOOr, NMR, MLoC, and Cyclo. The detection strategy is [Abilio, 2014]:
((NOOr + NMR) > HIGH) OR ((MLoC > AVG) AND ((Cyclo/MLoC) > HIGH))
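Since the strategy is a plain boolean rule over measure values, its evaluation can be pictured in a few lines. The sketch below (written in Python for brevity, with hypothetical names; VSD itself is an Eclipse/Java plug-in) evaluates the God Method rule for one method, given its measures and the configured thresholds.

def god_method(m, high_overrides, avg_mloc, high_density):
    # `m` maps measure acronyms (NOOr, NMR, MLoC, Cyclo) to values;
    # the three thresholds correspond to HIGH, AVG, and HIGH above.
    concentrates = (m['NOOr'] + m['NMR']) > high_overrides
    long_and_complex = (m['MLoC'] > avg_mloc and
                        (m['Cyclo'] / m['MLoC']) > high_density)
    return concentrates or long_and_complex

# With the thresholds instantiated in Section 2.3 (2.39, 10.09, 0.24), the
# toolKontroller() refinement is flagged: 11 > 10.09 and 6/11 = 0.55 > 0.24.
god_method({'NOOr': 0, 'NMR': 0, 'MLoC': 11, 'Cyclo': 6}, 2.39, 10.09, 0.24)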
Table 1 - VSD Measures
Method group: Method's Lines of Code (MLoC); Number of Method Refinements (NMR); McCabe's Cyclomatic Complexity (Cyclo); Number of Operation Overrides (NOOr); Number of Parameters (NP).
Component group: Lines of Code (LoC); Coupling between Objects (CBO); Number of Methods (NOM); Weighted Methods Per Class (WMC); Number of Attributes (NOA); Number of Constant Refinements (NCR).
SPL group: Number of Components (NOC); Number of Features (NOF); Number of Constants (NOCt); Total Lines of Code (TLoC); Number of Refinements (NOR); Total Number of Methods (TNOM); Number of Refined Constants (NRC); Total Number of Attributes (TNOA); Total Number of Method Refinements (TNMR); Total Cyclomatic Complexity (TCyclo); Number of Refined Methods (NRM); Total Coupling between Objects (TCBO); Number of Overridden Operations (NOrO); Total Number of Operation Overrides (TNOOr).
2.3. Functions
The potential users of VSD are software engineers who want to measure an AHEAD-based SPL and detect undesired behavior in the implementation of the SPL features. They can access VSD via a pop-up menu shown after right-clicking an AHEAD project. The available options are Detect Shotgun Surgery, Detect God Class, Detect God Method, and Measure.
After selecting an option, VSD shows the results in the respective view: VSD Shotgun Surgery, VSD God Class, VSD God Method, and VSD Measures. Figure 2 depicts the VSD Measures view, which presents the measurement results in a tree; thus, users can see the value per measure - e.g., the total of Method's Lines of Code is 3,910. Users can also expand each measure to check the value for a method/component - e.g., the malen() method, from the ExplodierenEffekt.jak component and the explodieren feature, has a Cyclomatic Complexity of 5. If the user clicks the Save as CSV button (top-right), VSD saves the results in three Comma-Separated Values (CSV) files - one file per group (Table 1) - in the vsd-output folder.
The VSD God Method, VSD God Class, and VSD Shotgun Surgery views are similar. Figure 3 depicts the VSD God Method view. Each view presents the respective strategy with its threshold values centered at the top, the Save as CSV button at the top-right corner, and one table. The table contains the id, feature, component, and method (VSD God Method only) names, an indication of whether the component/method is a refinement (Y = yes, N = no), and the measures. For instance, the strategy ((NOOr+NMR) > 2.39) OR ((MLoC > 10.09) AND ((Cyclo/MLoC) > 0.24)) is centered at the top of Figure 3. Line #2 presents the Beschleunigung feature, Tank.jak component, and toolKontroller() method. This method is a refinement (Y) and has MLoC: 11, Cyclo: 6, NP: 0, NMR: 0, and NOOr: 0. If the user double-clicks a row, VSD opens the respective component in a code editor. In addition, if the user selects Save as CSV, the result is saved in a CSV file whose name is the code smell name (e.g., godmethod.csv) in the vsd-output folder.
Figure 2 - VSD Measure View
Figure 3 - VSD God Method View
The threshold values may vary depending on the selected project(s) (i.e., SPLs). VSD has a preferences page, presented in Figure 4 and accessed via Window -> Preferences -> VSD Preferences. On the VSD preferences page, users can define values for each measure in each strategy. The strategies are presented with the acronym and name of each measure. When the user saves or applies changes, VSD updates the values presented in the views, and the new values are used when the user performs a new detection.
3. An Example of VSD Use
We configured VSD with default threshold values to detect the proposed code smells, based on empirical data from eight SPLs [Abilio, 2014]. Using those values, VSD found 64 methods with God Method symptoms in the TankWar SPL. Table 2, Table 3, and Table 4 show samples of the methods and components detected by VSD with code smell symptoms in the TankWar SPL.
Figure 4 - VSD Preference Page
Table 2 - Sample of Methods with SPL God Method
Feature        | Component | Method             | Refinement | MLoC | Cyclo | NMR | NOOr
TankWar        | Tank.jak  | toolBehandeln(int) | N          | 1    | 1     | 8   | 1
Tools          | Tank.jak  | toolBehandeln(int) | N          | 1    | 1     | 8   | 0
Beschleunigung | Tank.jak  | toolBehandeln(int) | Y          | 11   | 3     | 0   | 0
Bombe          | Tank.jak  | toolBehandeln(int) | Y          | 18   | 8     | 0   | 0
einfrieren     | Tank.jak  | toolBehandeln(int) | Y          | 13   | 5     | 0   | 0
Feuerkraft     | Tank.jak  | toolBehandeln(int) | Y          | 11   | 3     | 0   | 0
Table 3 - Sample of Components with SPL God Class
Feature    | Component | Interface | Refinement | LoC | CBO | WMC | NCR
Handy      | Maler.jak | N         | N          | 300 | 12  | 63  | 16
PC         | Maler.jak | N         | N          | 304 | 18  | 58  | 16
fuer_Handy | Maler.jak | N         | Y          | 320 | 8   | 104 | 0
fuer_PC    | Maler.jak | N         | Y          | 322 | 7   | 104 | 0
Table 4 - Sample of Components with SPL Shotgun Surgery
Feature | Component | Interface | Refinement | NOM | NOA | CBO | NCR
Handy   | Maler.jak | N         | N          | 32  | 10  | 12  | 16
PC      | Maler.jak | N         | N          | 31  | 14  | 18  | 16
The sample of methods (Table 2) covers one method and its refinements. The toolBehandeln(int) method from the Tank.jak component was implemented as an empty method in the TankWar feature, which is the 'root' of the feature model. This method was re-implemented (overridden) in the Tools feature and refined in the Beschleunigung, Bombe, einfrieren, and Feuerkraft features. One possible problem is: if someone adds code to the first method (TankWar feature), it will be overridden and will not be used in the refinement chain, because the second method (Tools feature) completely overrides it. That is, the first method was not refined. The other four refinements were detected as smells because their density of branches (Cyclo/MLoC) is higher than the threshold; for instance, the Bombe refinement has 8/18 = 0.44, well above 0.24. In fact, these methods seem simple, but the software engineer needs to pay attention to them.
The sample of components (Table 3) detected with SPL God Class shows that the Maler.jak components are very similar between the Handy and PC features and between the fuer_Handy and fuer_PC features. This occurs because Handy and PC are alternative features, and the selection of one of them implies the selection of only one "fuer" feature. We identified three possible problems with Maler.jak. The first is duplicated code, which is itself a code smell. The second is that there are two constants and a large number of refinements adding behaviors, and the developer does not know which constant they apply to until the product is built. Finally, the third problem is that the refinements are very complex and coupled to other components. Observing the code of Maler.jak (PC feature), we noted that this component is responsible for the screen, the menus, the behavior of menus and keys, and the help items, for example. That is, this component concentrates many responsibilities.
Table 4 shows two constants of the TankWar SPL detected with SPL Shotgun Surgery: Maler.jak from the Handy and PC features. These components were also flagged with SPL God Class, and they were detected with SPL Shotgun Surgery because they are coupled to many components and share many methods and attributes with many refinements. Through a manual inspection of the code of Maler.jak (PC feature) and its refinements, we noticed that the refinements access the attributes of the constant directly, i.e., they do not use setter and getter methods. Hence, a change in the attributes may be propagated to the refinements. For example, the helpItemErstellen() method instantiates the protected attribute menu in PC, and this method was refined six times to add items to the menu, which is accessed directly in the refinements. That is, changes in the refinements may be required if Maler.jak is refactored with respect to menu.
4. Conclusion
Several tools have been developed to measure properties of software, and measure-based strategies have been proposed to detect code smells in OO and AO software. We developed the Variability Smell Detection (VSD) tool to measure FO software and detect specific code smells in AHEAD-based SPLs. We used VSD with eight SPLs of different sizes, e.g., AHEAD-java with 16,719 lines of code, 963 components, and 838 refinements. In an empirical study, Abilio (2014) verified that the results of VSD detection agree with the results of a manual inspection performed by specialists. Therefore, VSD can save time in the analysis of methods and components, gives software engineers a notion of the feature implementation, and allows the identification of code smells. The first version of VSD only measures AHEAD-based SPLs, but it can be extended to measure SPLs developed with other techniques, such as AspectJ and FeatureHouse; we would only need to parse the code into the VSD structure and adapt the measures, if necessary. It could also implement strategies to detect variability smells related to feature models, for example. Therefore, our future goal is to improve VSD and perform further empirical studies to evaluate it.
Acknowledgments
This work was partially supported by Capes, CNPq (grant Universal 485907/2013-5)
and FAPEMIG (grants APQ-02532-12 and PPM-00382-14).
References
Abilio, R. (2014) “Detecting Code Smells in Software Product Lines”. Master’s thesis,
Federal University of Lavras (UFLA), 141p.
Apel, S.; Batory, D.; Kastner, C.; Saake, G. (2013) “Feature-Oriented Software Product
Lines: Concepts and Implementation”. Springer, 315p.
Apel, S.; Beyer, D. (2011) “Feature Cohesion in Software Product Lines: An
Exploratory Study”. In: International Conf. on Software Engineering, pp. 421-430.
Batory, D.; Sarvela, J.; Rauschmayer, A. (2003) “Scaling Step-Wise Refinement”. In:
25th International Conference on Software Engineering, pp. 187-197.
Blonski, H.; Padilha, J.; Barbosa, M.; Santana, D.; Figueiredo, E. (2013)
“ConcernMeBS: Metrics-based Detection of Code Smells”. In: Brazilian Conference
on Software (CBSoft), Tools Session. Brasilia, Brazil, 2013.
Chidamber, S.; Kemerer, C. (1994) “A Metrics Suite for Object Oriented Design”. IEEE
Transactions on Software Engineering, v. 20, n. 6, pp. 476-493.
Figueiredo, E.; Sant'Anna, C.; Garcia, A.; Lucena, C. et al. (2012) “Applying and
Evaluating Concern-Sensitive Design Heuristics”. Journal of Systems and Software,
v.85, n.2, pp. 227-243.
Fowler, M.; Beck, K.; Brant, J.; Opdyke, W.; Roberts, D. (1999) “Refactoring:
Improving the Design of Existing Code”. Addison Wesley, 464p.
Kang, K. C.; Cohen, S. G.; Hess, J. A.; Novak, W. E.; Peterson, A. S. (1990) “Feature-Oriented Domain Analysis (FODA) Feasibility Study”, Technical Report, SEI.
Kiczales, G.; Lamping, J.; Mendhekar, A.; Maeda, C.; Lopes, C.; Loingtier, J.; Irwin, J. (1997) “Aspect-Oriented Programming”. In ECOOP'97, pp. 220-242.
Lanza, M.; Marinescu, R. (2006) “Object-Oriented Metrics in Practice: Using Software
Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented
Systems”. Springer, 205p.
Macia, I.; Garcia, A.; Staa, A. von. (2010) “Defining and Applying Detection Strategies
for Aspect-Oriented Code Smells”. In: 24th Brazilian Symposium on Software
Engineering, pp. 60-69.
Marinescu, R. (2004) “Detection Strategies: Metrics-Based Rules for Detecting Design
Flaws”. In: International Conference on Software Maintenance, pp. 350-359.
McCabe, T. J. (1976) “A Complexity Measure”. IEEE Transactions on Software
Engineering, v. 2, n. 4, pp. 308-320.
Padilha, J.; Figueiredo, E.; Sant'Anna, C.; Garcia, A. (2013) “Detecting God Methods
with Concern Metrics: An Exploratory Study”. In: 7th Latin-American Workshop on
Aspect-Oriented Software Development, pp. 24-29.
Padilha, J.; Pereira, J.; Figueiredo, E.; Almeida, J.; Garcia, A.; Sant’Anna, C. (2014) “On
the Effectiveness of Concern Metrics to Detect Code Smells: An Empirical Study”.
In: International Conference on Advanced Information Systems Engineering.
Pohl, K.; Bockle, G.; Linden, F. J. van der. (2005) “Software Product Line Engineering:
Foundations, Principles, and Techniques”. Springer, 490p.
Prehofer, C. (1997) “Feature-Oriented Programming: A Fresh Look at Objects”. In:
European Conference of Object-Oriented Programming, pp.419-443.
Schulze, S.; Apel, S.; Kastner, C. (2010) “Code Clones in Feature-Oriented Software
Product Lines”. In: 9th International Conference on Generative Programming and
Component Engineering, pp. 103-112.
Schulze, S.; Thüm, T.; Kuhlemann, M.; Saake, G. (2012) “Variant-Preserving
Refactoring in Feature-Oriented Software Product Lines”. In: 6th Workshop on
Variability Modeling of Software-Intensive Systems, pp. 73-81.
Thüm, T.; Kästner, C.; Benduhn, F.; Meinicke, J.; Saake, G.; Leich, T. (2014)
“FeatureIDE: An Extensible Framework for Feature-Oriented Software
Development”. Science of Computer Programming, v.79, pp. 70-85.
AccTrace: Considering Accessibility in the Software
Development Process
Rodrigo Gonçalves de Branco, Maria Istela Cagnin, Debora Maria Barroso Paiva
1
Faculdade de Computação
Universidade Federal de Mato Grosso do Sul (UFMS)
Campo Grande, MS – Brazil
{rodrigo.g.branco,istela,dmbpaiva}@gmail.com
Abstract. Software development processes that do not consider accessibility in
their scope may deliver an inaccessible product as a result. In addition, developers may not have the skills to interpret and implement accessibility requirements. This paper presents the AccTrace tool, a CASE tool built as an Eclipse
plugin, which delivers to the developer, through the traceability of accessibility requirements and comments in the source code, useful information for the implementation of these requirements.
link to video: http://youtu.be/MBMAxcBB408
1. Introduction
Providing accessible software remains a challenge today, and there is considerable
research in the area [Lazar et al. 2004, Brajnik 2006, Parmanto and Zeng 2005]. Among the
difficulties inherent to the problem, two stand out: identifying accessibility requirements
and then propagating and tracing them up to the product construction phase. While
there are proposals to integrate usability and accessibility into Software Engineering
processes, many developers do not know how to implement such accessible products
[Kavcic 2005, Alves 2011].
The use of CASE tools in Software Engineering processes is very common and,
in general, increases developer productivity, since such tools automate some tasks,
reducing the effort and time needed to build the solution. In the accessibility area,
several specialized tools can be found, such as frameworks, simulators, and validators
[Fuertes et al. 2011, Bigham et al. 2010, Votis et al. 2009, Masuwa-Morgan 2008].
However, developers frequently point out several problems in these tools and are
generally dissatisfied with the support provided by the companies that develop and sell
them [Trewin et al. 2010].
Most existing tools in this context are used when the product is already in the
coding phase. It would therefore be desirable for accessibility requirements, as soon
as they are identified, to be traceable, in order to verify whether they are being coded
correctly. There are several studies on requirements traceability in general
[Ali et al. 2011, Gotel and Finkelstein 1994, Mader and Egyed 2012], but few studies
address accessibility requirements during the software development process
[Dias et al. 2010].
This paper presents AccTrace, a CASE tool developed as an Eclipse plugin that
makes it possible to follow the evolution of accessibility requirements up to the coding
phase, providing the developer with information relevant to building an accessible
product. The tool was built following MTA [Maia 2010], a software development
process based on ISO/IEC 12207 [ISO/IEC 1998] that includes accessibility tasks.
With this new approach provided by AccTrace, the relationships between the
accessibility requirements and the UML (Unified Modeling Language
[Booch et al. 1996]) models, enriched with information important to the developer, are
transformed into comments in the source code, retrieved in real time, and presented to
the developer. The tool is experimental and academic in nature, is distributed under the
Eclipse Public License V1.0, and can be downloaded from https:
//github.com/rodrigogbranco/acctrace.
This paper is organized as follows: Section 2 discusses the tool's
characteristics, main functionalities, and potential users. Section 3
discusses the main concepts of the tool's architecture, its software components, and its
interfaces. Section 4 describes related tools and work. Finally, Section 5 discusses
conclusions and future work.
2. Tool Characteristics
The main functionality of the AccTrace tool is to support the traceability of accessibility
requirements, from requirements engineering up to the coding phase.
Built as a CASE tool that operates as a plugin of the Eclipse IDE, it works together with
other plugins to achieve its goal.
2.1. Theoretical Background
AccTrace used the MTA software development process [Maia 2010] to define its
workflow, mainly subprocesses 4 (Software Requirements Analysis), 5 (Software
Design), and 6 (Software Construction). Figure 1 presents the high-level workflow and
traceability scheme, along with the inclusion of the accessibility requirements.
The AccTrace tool, like MTA, assumes that the project includes a role named
Accessibility Specialist. The person in this role is responsible for identifying the
accessibility requirements and relating them to the models and to their implementation
techniques.
2.2. Functionalities
Figure 1. Detail of the MTA subprocesses [Maia 2010] used to provide
traceability of accessibility requirements according to the approach
adopted in this work. The artifact whose name is in bold is the artifact
generated by AccTrace

In general, the AccTrace tool allows (a) the relationships among accessibility
requirements, UML models, and accessibility implementation techniques to be
managed; (b) the traceability matrix of these relationships to be created; (c) the source
code to receive custom comments indicating such relationships; and (d) the developer
to retrieve information about these relationships in real time.
2.3. Potential Users
The Accessibility Specialist registers the information concerning the relationships
between the accessibility requirements and the UML models, including in these
relationships information about accessibility implementation techniques, directly in the
AccTrace tool. Then, the users who benefit from artifacts such as the Requirements
Document and UML models (Project Managers, Requirements Engineers,
Analysts, Developers, among others) can make use of this information, whether through
the Views directly in the tool, through the Traceability Matrix or, in the case of
developers, through the specific view for retrieving information in the source code.
3. Tool Architecture
AccTrace works together with the following tools: (a) Eclipse, as the IDE;
(b) Requirement Designer, a requirements management plugin; (c) UML Designer, a
UML modeling plugin; and (d) UML to Java Generator, a code generation plugin.
The three plugins described above are distributed by the company Obeo and were chosen to work together with AccTrace because they are interoperable.
They are available at http://marketplace.obeonetwork.com/. In addition,
an essential technology used here is an Accessibility Ontology from the Aegis Project
[Aegis 2013], which follows the WCAG 2.0 reference document, is made available
in OWL format, and maps the accessibility guidelines and implementation techniques.
Figure 2 shows the behavior and relationships of the tools, technologies, and
actors involved in the AccTrace workflow.
Figure 2. Association of the tools and actors in the context of this work
There is no formal interface between the tools and, for this reason, no language is
used to describe the architecture. The RDF artifacts generated by the
Requirement Designer and UML Designer plugins are given as input to AccTrace. The tools should be seen as black boxes that receive the inputs
and produce the outputs. AccTrace also uses an RDF file as its
form of persistence, which is used as input to the UML to Java Generator plugin,
adapted to meet the purposes of this work. The class diagram of
AccTrace is shown in Figure 3.
Figure 3. Class diagram of the tool (AccTrace)
The main class is AccTraceModel, which stores the references to the requirement
repositories (Repository), as well as objects referencing the associations among
requirements, models, and accessibility implementation techniques (Reference). Each such
object references a requirement (Requirement), a UML diagram element (EObject), and one or more
accessibility implementation techniques, represented here by a selection from the available ontology. The references to the ontology are persisted through their IRIs (Internationalized Resource Identifiers, a generalization of URIs, Uniform Resource Identifiers)
as Strings. Moreover, given that a project may contain numerous requirements
and considering that only the accessibility requirements are relevant
for association with accessibility implementation techniques, the model also provides
requirement filters (RequirementFilter), so as not to clutter
the requirements view in the tool. A simplified sketch of these classes is given below.
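As a reading aid, the following sketch renders the classes of Figure 3 in plain Java. The field types are simplified stand-ins (e.g., Object instead of the EMF EObject type), so this is an illustration of the described structure, not the tool's actual source code.

import java.util.ArrayList;
import java.util.List;

// Simplified rendering of the Figure 3 class diagram (names follow the text).
class AccTraceModel {
    List<Repository> repositories = new ArrayList<>();   // requirement repositories
    List<Reference> references = new ArrayList<>();       // requirement/model/technique links
    List<RequirementFilter> filters = new ArrayList<>();  // hide non-accessibility requirements
}

class Reference {
    Requirement requirement;                              // one accessibility requirement
    Object umlElement;                                    // UML element (EObject in the real model)
    List<String> techniqueIris = new ArrayList<>();       // ontology IRIs persisted as Strings
}

class Repository { /* handle to a requirements repository */ }
class Requirement { /* requirement loaded from the RDF artifact */ }
class RequirementFilter { /* filter criterion over the requirements */ }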
The tool has three main views, as shown in Figure 4. In the editor (AccTrace Editor, 2), it is possible to change the requirement repositories and to create the
associations among UML models, requirements, and implementation techniques. In the
requirements view (Requirement Associations, 1), it is possible to see which requirements
associated with the UML model were selected in the editor. In the view of already linked
techniques (Accessibility Specifications View, 3), it is possible to see the implementation
techniques already associated, according to the UML model selected in the editor and the
accessibility requirement selected in the requirements view; the associated implementation
techniques can also be removed there. All three views are important for the correct
operation of the tool.
Figure 4. View of the AccTrace tool in the main Eclipse window
Once the UML model and the requirement are selected, the accessibility
implementation technique can be associated by right-clicking on the UML model,
as illustrated in Figure 5. The accessibility implementation techniques are
mapped in the ontology provided by the Aegis Project [Aegis 2013].
This ontology is the repository of the accessibility implementation techniques and maps
the domain. These techniques are linked to the requirements and UML models, and the
information stored in the repository is retrieved in the specific view.
Since the artifacts (requirements, UML models, and the ontology) are described
in RDF, the links are made through the RDF:ID element. In practice, any
UML model described in RDF can be linked and traced through
the traceability matrix and the views in Eclipse. Source code
comments, however, are only affected by the UML models that are accepted as input
by the code generation plugin, such as class diagrams.
Figure 5. Procedure for associating an accessibility implementation technique

Once the relationships among the requirements, UML models, and accessibility
implementation techniques are defined, the traceability matrix can be generated
automatically by the AccTrace tool in ODS format (OpenDocument
Spreadsheet, equivalent to a Microsoft Excel spreadsheet). Part of this matrix can be seen in
Figure 6, which shows the relationships between Requirements and UML Models. The matrix is
generated using the wizard dedicated to this purpose, by selecting
File, New, and then Other... in the menu bar and choosing the Traceability Matrix file
wizard option.
Figure 6. Part of the traceability matrix generated by the AccTrace tool
For code generation, the UML to Java Generator plugin was customized to
receive as input, besides the UML models, the AccTrace RDF file. As
the source code is generated, the plugin checks whether an accessibility relationship exists
for the UML model used as its basis. If so, a custom comment is
generated, matching the following Java regular expression:
String regex =
"//!ACCTRACE!(/)?([^/\\\\0#]+(/)?)+#([^\\*\\*/])+";
Figure 7 shows a comment based on the regular expression above, already
translated in the View dedicated to this purpose, where it is possible to see information such as
the requirement, the UML model, and which techniques are referenced in the comment.
A small sketch of how such comments can be matched follows.
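As an illustration, the minimal sketch below uses java.util.regex to locate a comment matching this expression in a line of generated code. The sample comment payload (models/requirements.rdf#REQ-01) is hypothetical, since the exact path and identifier layout of the real comments is not specified here.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find an AccTrace marker comment using the regular expression above.
public class AccTraceCommentScanner {
    private static final Pattern ACCTRACE_COMMENT = Pattern.compile(
            "//!ACCTRACE!(/)?([^/\\\\0#]+(/)?)+#([^\\*\\*/])+");

    public static void main(String[] args) {
        // Hypothetical generated line; the real payload is the RDF:ID-based reference.
        String line = "//!ACCTRACE!models/requirements.rdf#REQ-01";
        Matcher m = ACCTRACE_COMMENT.matcher(line);
        if (m.find()) {
            System.out.println("AccTrace marker found: " + m.group());
        }
    }
}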
A proof of concept of AccTrace can be found in [Branco 2013].
Figure 7. Details of the selected comment
4. Related Tools and Work
There are already initiatives, mainly corporate ones, that allow the elicited requirements
to be linked to development process artifacts, for example, the construction of
traceability reports using IBM Rational Software Architect, IBM
Rational RequisitePro, and BIRT for WebSphere [Hovater 2008].
The Enterprise Architect software from Sparx Systems supports requirements
diagrams, which are extensions of the traditional UML diagrams, enabling
model traceability. However, no work was found in the literature that
specifically addresses the traceability of accessibility requirements within the software
development process. Moreover, it was not found how these alternatives allow
requirement traceability information to be recovered starting from the solution's
source code.
5. Conclusions
This study showed that it is possible to specify, before the coding phase and linked
to the models and accessibility requirements, the implementation techniques that
programmers should view. The use of the predefined ontology of the Aegis
project [Aegis 2013] helped achieve this goal, extending the aforementioned
implementation techniques to approaches, guidelines, success criteria, and so on. The
final product is expected to have better accessibility, since information about its
implementation is available throughout the software development process.
As a limitation, only accessibility requirements can be used, because the
domain, in the form of an ontology, is mapped this way. Note that the AccTrace
tool cannot recover, from arbitrary source code, the information about the
traceability of requirements, UML models, and implementation techniques, since at
the code level the information needed for data recovery is the custom AccTrace
comment, which is not present in such projects. The tool also
requires the involvement of an Accessibility Specialist so that the accessibility
information is registered at the right time, a professional who may
not be available in the product development team.
Some activities can be identified for future work: (a) using AccTrace in a real
project as a case study; (b) improving the tool's usability, improving the messages
presented and taking advantage of the relationships in the Aegis project ontology;
(c) extending the scope of this work to include software and system testing and
integration tasks (Subtasks 7, 8, 9, and 10 of MTA); (d) extending the requirements
traceability matrix to include the test cases described in the previous item;
(e) implementing queries and visualizations of the traceable artifacts, retrievable
through the links made by the artifacts' RDF:ID elements; and (f) replacing the
HTML entity strings retrieved from the ontology when they are presented in the views,
thus solving the problem of the unknown characters shown in Figure 7.
References
Aegis (2013). Aegis ontology. http://www.aegis-project.eu/index.php?option=com_
content&view=article&id=107&Itemid=65. Accessed May 2014.
Ali, N., Gueheneuc, Y., and Antoniol, G. (2011). Trust-based requirements traceability. In 19th IEEE ICPC,
pages 111–120, Kingston, Ontario, Canada.
Alves, D. D. (2011). Acessibilidade no desenvolvimento de software livre. Master's thesis,
UFMS. 135 pages.
Bigham, J. P., Brudvik, J. T., and Zhang, B. (2010). Accessibility by demonstration: enabling end users
to guide developers to web accessibility solutions. In Proceedings of the 12th International ACM
SIGACCESS, ASSETS ’10, pages 35–42, New York, NY, USA. ACM.
Booch, G., Rumbaugh, J., and Jacobson, I. (1996). The Unified Modeling Language: selections from OOPSLA’96.
Tutorial 37.
Brajnik, G. (2006). Web accessibility testing: When the method is the culprit. In ICCHP, pages 156–163.
Branco, R. G. d. (2013). Acessibilidade nas fases de engenharia de requisitos, projeto e codificação de
software: Uma ferramenta de apoio. Master's thesis, UFMS. 95 pages.
Dias, A. L., de Mattos Fortes, R. P., Masiero, P. C., and Goularte, R. (2010). Uma revisão sistemática sobre a
inserção de acessibilidade nas fases de desenvolvimento da engenharia de software em sistemas web. In
Proceedings of the IX Symposium on Human Factors in Computing Systems, IHC ’10, pages 39–48,
Porto Alegre, Brazil. Sociedade Brasileira de Computação.
Fuertes, J. L., Gutiérrez, E., and Martínez, L. (2011). Developing Hera-FFX for WCAG 2.0. In Proceedings of
the International Cross-Disciplinary Conference on Web Accessibility, W4A ’11, pages 3:1–3:9, New
York, NY, USA. ACM.
Gotel, O. C. Z. and Finkelstein, A. C. W. (1994). An analysis of the requirements traceability problem.
In Proceedings of the First International Conference on Requirements Engineering, pages 94–101,
Colorado Springs, Colorado, USA.
Hovater, S. (2008). UML-requirements traceability using IBM Rational RequisitePro, IBM Rational Software Architect, and BIRT, part 1: Reporting requirements. http://www.ibm.com/developerworks/
rational/tutorials/dw-r-umltracebirt/dw-r-umltracebirt-pdf.pdf. Accessed May 2015.
ISO/IEC (1998). ISO/IEC 12207 - Standard for Information Technology - Software Lifecycle Processes.
ISO/IEC, 1, ch. de la Voie-Creuse - CP 56 - CH-1211 Geneva 20 - Switzerland.
Kavcic, A. (2005). Software accessibility: Recommendations and guidelines. In The International Conference on Computer as a Tool, 2005. EUROCON 2005, volume 2, pages 1024–1027, Belgrade, Serbia.
Lazar, J., Dudley-Sponaugle, A., and Greenidge, K.-D. (2004). Improving web accessibility: a study of
webmaster perceptions. Computers in Human Behavior, 20(2):269–288.
Mader, P. and Egyed, A. (2012). Assessing the effect of requirements traceability for software maintenance.
In 28th IEEE ICSM, pages 171–180, Trento, Italy.
Maia, L. S. (2010). Um processo para o desenvolvimento de aplicações web acessíveis. Master's thesis,
UFMS. 94 pages.
Masuwa-Morgan, K. (2008). Introducing AccessOnto: Ontology for accessibility requirements specification.
In First International Workshop on ONTORACT, pages 33–38.
Parmanto, B. and Zeng, X. (2005). Metric for web accessibility evaluation. JASIST, 56(13):1394–1404.
Trewin, S., Cragun, B., Swart, C., Brezin, J., and Richards, J. (2010). Accessibility challenges and tool features: an IBM web developer perspective. In Proceedings of the 2010 International Cross Disciplinary
Conference on Web Accessibility (W4A), W4A ’10, pages 1–10, New York, NY, USA. ACM.
Votis, K., Oikonomou, T., Korn, P., Tzovaras, D., and Likothanassis, S. (2009). A visual impaired simulator to
achieve embedded accessibility designs. In IEEE International Conference on ICIS, volume 3, pages
368–372.
Spider-RM: A Tool to Support Risk Management
in Software Projects
Heresson João Pampolha de Siqueira Mendes1, Bleno Wilson Franklin Vale da
Silva², Diego Oliveira Abreu², Diogo Adriel Lima Ferreira², Manoel Victor
Rodrigues Leite², Marcos Senna Benaion Leal², Sandro Ronaldo Bezerra Oliveira1,2
1
Programa de Pós-Graduação em Ciência da Computação (PPGCC) – Universidade
Federal do Pará (UFPA) - Rua Augusto Corrêa, 01 – Guamá – Belém - PA - Brazil
²Faculdade de Computação, Instituto de Ciências Exatas e Naturais,
Universidade Federal do Pará (UFPA)
{heresson, blenofvale, diegooabreu, cetlho, victor.ufpaa,
marcosbenaion}@gmail.com, [email protected]
Abstract. This paper presents a software tool named Spider-RM, a desktop
solution that supports risk management according to software quality models.
Its main purpose is to systematize risk management best practices,
reducing the execution time of tasks and enhancing learning by those involved.
Tool video: http://youtu.be/ZRYlWfG-NHM.
1. Introduction
To become competitive, software development organizations must
deliver products that meet customer needs in order to ensure
trust and satisfaction [SOFTEX 2012]. Consequently, quality is an indispensable
attribute to be considered throughout the software production process
[KOSCIANSKI and SOARES 2007].
Moreover, software development has a non-repeatable aspect,
which makes this activity unpredictable [KOSCIANSKI and SOARES 2007]. Risk
management aims to handle the uncertainties of a project proactively, preventing
them from becoming problems that harm the execution of the project as planned.
Some of the main quality models for software projects
present best practices for the effective management of risks; among these models
are MR-MPS-SW [SOFTEX 2012], the PMBOK [PMI 2013], ISO/IEC
12207 [ABNT 2009], and CMMI-DEV [SEI 2010].
Risk management is usually recommended at the organization's higher
maturity levels and, given the limited practical experience at these levels in Brazil
[SOFTEX 2014], a software tool that systematizes and
speeds up the tasks of this process becomes important, reducing costs and facilitating
learning.
The paper is organized as follows: Section 2 presents the tool's architecture;
Section 3 presents the main functionalities; Section 4 presents
some implementation aspects; Section 5 reports a validation study; Section 6 presents
related work; and, finally, Section 7 presents the conclusions and future work.
2. Architecture
The tool was built to be used as a desktop application and initially supports
a single user, the Project Manager or Project Leader. This approach was chosen to
meet the needs of the company where the Case Study was carried out.
However, its architecture was designed to facilitate continuous evolution,
and it can later be adapted to a web, multi-user environment.
The architecture of Spider-RM (Risk Management) is based on a
combination of the three-layer architecture and the MVC model. In this way, the
events that occur are handled by controllers, which mediate between the
user interface and the entities modeled in the database. The main
gain of this approach is the ease of maintaining and adding new features that
may arise, such as changing the user interfaces or the native
database.
To keep the code as readable as possible, standardize the development
team's understanding, and reduce the cost of future maintenance, the Facade and DAO
design patterns [Gamma et al. 2000] were also adopted, isolating the business layer
from the presentation and persistence layers. A minimal sketch of this layering is
shown below.
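The sketch below illustrates the Facade/DAO arrangement described above under hypothetical names; it is not taken from the Spider-RM source code, only an example of the pattern.

import java.util.List;

class Risk {                                  // domain entity handled by the business layer
    private long projectId;
    void setProjectId(long id) { this.projectId = id; }
}

interface RiskDao {                           // persistence layer behind the DAO pattern
    void save(Risk risk);
    List<Risk> findByProject(long projectId);
}

class RiskFacade {                            // single entry point used by the UI controllers
    private final RiskDao dao;

    RiskFacade(RiskDao dao) { this.dao = dao; }

    // The business rule stays behind the facade, isolating the UI from persistence.
    void registerRisk(long projectId, Risk risk) {
        risk.setProjectId(projectId);
        dao.save(risk);
    }
}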
The architecture also integrates the Spider-RM tool with
Redmine [Redmine 2014], addressing the need to give the other members of the
project team visibility into risk management.
3. Main Functionalities
This section presents the main functionalities of the Spider-RM tool,
which set it apart from other available tools.
3.1. Creation of an Organizational Policy
The tool allows an organizational policy to be stored in text form, or an
existing organizational document to be attached, for future reference. The
Organizational Policy defines the guidelines on the scope of application of the Risk
Management process in the organization with respect to its structure. Centralizing this
information speeds up planning and monitoring, reducing effort during project
execution.
3.2. Management of Multiple Simultaneous Projects
The tool was developed to easily manage multiple
projects, displaying information about each one in detail or side by side. In
aggregate form, it also supports the evaluation of completed projects and the
analysis of new risk categories identified during a new project. These
evaluations are important inputs for recording an organizational historical base
and for guiding future projects.
3.3. Creation of a Risk Management Plan
Similarly to the organizational policy, a project's risk plan can be entered as text
or attached as an existing document. In addition, there is
a dedicated functionality for entering milestones and control points, so that
the tool creates tasks tied to these important dates.
3.4. Management of the Risk Breakdown Structure
A widely used resource in risk management is the Risk Breakdown Structure
(RBS) [PMI 2013], which accumulates category information, supporting later
analysis and mitigation. The tool provides an RBS template and allows it to be edited
according to the organization's needs.
Each project has its own RBS, independent of the organizational one, and may
contain new categories or omit categories from the organizational RBS. After a
project is concluded, the institutionalization of newly identified categories is
allowed.
3.5. Risk Identification and Analysis
The tool supports the identification of risks and their detailed analysis through:
subconditions, which indicate occurrence; the recording of relationships between
identified risks; and the computation of a risk's severity level, sketched below.
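The paper does not spell out the severity formula; the sketch below assumes the common convention of qualitative risk analysis (as in, e.g., PMBOK probability-impact matrices), where severity is the product of probability and impact on ordinal scales.

// Assumed convention: severity = probability x impact on 1..5 ordinal scales.
public class SeverityCalculator {
    static int severity(int probability, int impact) {
        return probability * impact;          // e.g., 4 x 5 = 20 (high priority)
    }

    public static void main(String[] args) {
        System.out.println(severity(4, 5));   // prints 20
    }
}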
3.6. Recording and Monitoring of Mitigation and Contingency Plans
For each identified risk, mitigation and/or contingency plans can be defined,
guiding the monitoring throughout the project life cycle. The mitigation plan is
defined as a task at a given project milestone or control point,
while the contingency plan appears as a pending task after all of a risk's
subconditions have occurred.
3.7. Risk Monitoring
Besides planning, Spider-RM provides full support for risk
monitoring, offering functions for: tracking changes in a project's
risks; adding new risks that arise during project execution; freely choosing
which risks to monitor and prioritize; and keeping a history of risk occurrences.
3.8. Task Management
Much of the monitoring works with the help of tasks. Mitigation plans
are defined as tasks at milestones or control points and, if not
carried out, are flagged as pending tasks. In this way, it is possible to keep
track of which risks need to be mitigated, merely monitored, or even
handled through contingency.
4. Implementation
The Spider-RM tool was implemented in the Java programming language, under the
GPL (General Public License), specifically targeting the Risk Management
process and adhering to the best practices recommended by the MR-MPS-SW,
CMMI-DEV, ISO/IEC 12207, and PMBOK standards. It is a desktop
application, and its development relied on free software tools such
as: the Ubuntu 14.04 operating system, NetBeans IDE 7.4, and the MySQL
5.6.14 database, used both for data persistence and for communication between the
local server and the tool.
5. Case Study
To evaluate the systematization of the risk management process, the tool was
used in the environment of a software development company in Recife that had its
projects certified at CMMI Level 3; these projects deal with the evolution of
products with a focus on decision processes.
The pilot project team consisted of three Managers and a technical team
of forty-eight members. The managers have more than
seven years of experience in this role and hold PMBOK and Scrum certifications;
two of them took part in CMMI implementations at other organizations in Recife.
Initially, the organization's standard risk breakdown structure was created.
The tool already suggests one, as shown in Figure 1, but this
suggestion was adapted to the user's needs. Next, a new project was created,
entering its basic information, attaching a document with the project's risk
plan, and mapping the organizational risk breakdown structure to this project's risk
categories.
Figure 1. Organizational Risk Breakdown Structure in the Spider-RM Tool
The second step was to add the milestones and control points already planned for the
pilot project, in order to begin identifying the risks. From the identification of risks and
the analysis of probability and impact, shown in Figure 2, the tool automatically
determines the priority, although the priority order can be adjusted by the user
according to their needs. A sample of the prioritized risks was selected for
monitoring during project execution at the milestones and control points, generating
tasks for those dates, which are flagged as pending if not
carried out.
Mitigation and contingency plans were created for the monitored risks.
In addition, cause-effect relationships with other risks were identified, along with
subconditions to be evaluated to detect occurrence during monitoring. In
a second iteration, new risks were identified and entered into the
tool to be analyzed and to support a new risk prioritization.
During monitoring, mitigation plans were executed, followed by a new
analysis to identify reductions in the probability or impact of each mitigated risk. The
risks whose subconditions were all met were flagged as having occurred,
triggering their contingency plans as new pending tasks. Figure 3
shows the monitoring of one of the risks that occurred.
Figure 2. Registration of New Risks in the Spider-RM Tool
Figure 3. Monitoring of Risk Subconditions in the Spider-RM Tool.
After the project was completed, its evaluation was recorded in the tool,
including the strengths, weaknesses, and improvement opportunities identified, in
order to guide future projects whose risks will be managed. Figure 4 presents
the evaluation screen for completed projects.
After the project's conclusion, the team of specialists who used the tool
reported several positive results, such as: visibility of the risk monitoring
over the conditions and tasks for mitigation and contingency; clear planning
of the risks at schedule milestones and control points; a knowledge base,
available to all Managers, of the risks maintained per project; the definition of an RBS for
projects and for the organization; and support for organizational improvement based on MPS.BR, CMMI, and
ISO/IEC 12207.
The three experiment participants also requested adjustments to non-functional
requirements, such as usability, portability, and maintainability, which will be addressed
in the next versions of Spider-RM. At the case study company, other Managers
also used Spider-RM; as these professionals were still in training, this
allowed us to provide mentoring on risk management based on the use of the tool,
and we observed a clear understanding of this process area.
Figure 4. Evaluation of Completed Projects in the Spider-RM Tool
6. Related Work
Among the risk management tools found, the TRIMS, CRAMM, and RiskFree
tools were used as the basis for comparison.
The TRIMS tool was developed by the BMP Center of Excellence as
part of a product suite of the PMWS (Program Manager’s WorkStation) and is
owned by the Department of the Navy of the United States Government [TRIMS
2014].
The CRAMM (CCTA Risk Analysis and Management Method) tool, in turn, was
developed by the British CCTA (Central Communication and Telecommunication
Agency) of the United Kingdom government [Yazar 2002].
Finally, the RiskFree risk management tool was developed at
PUC-RS. This software is based on PMBOK best practices and adheres to the
CMMI model [Knob 2006].
The main differentiators of the Spider-RM tool with respect to the related work
are: its grounding in software process quality models; the ability
to define and customize the RBS; risk prioritization; the identification of
conditions and tasks for risk monitoring; and the sharing of risks
across different projects. For further details, Table 1 compares the functionalities of
the TRIMS, CRAMM, RiskFree, and Spider-RM tools.
The functionalities were grouped into similar categories, and each tool received
one of the following marks: "S" if it has the functionality; "P" if it
provides the functionality partially, i.e., with restrictions; "N" if the functionality does
not exist; and "D" if the analyzed reference for the tool does not disclose
this information.
Table 1. Comparison of the Main Functionalities of the Spider-RM Tool and the Related Work. The table rates TRIMS, CRAMM, RiskFree, and Spider-RM with the S/P/N/D marks defined above, across functionalities grouped into five categories: Organizational, Risk Management Planning, Risk Analysis, Risk Monitoring, and Task Management. The functionalities compared are: easy access to the Organizational Policy; evaluation of completed projects; evaluation of added risk categories; management of multiple simultaneous projects; insertion of the Risk Management Plan for easy access; definition and customization of the Risk Breakdown Structure (risk categories); definition of milestones and control points; insertion and control of mitigation plans; insertion and control of contingency plans; identification of subconditions to monitor risk occurrence; identification of relationships between risks; flexibility in risk prioritization; computation of a risk's severity level; customization of a risk's severity level; tracking of changes in risks during the project; identification of occurred risks; free choice of the risks to be monitored; history of risk occurrences; history of overall occurrences in the project; history of changes to risks; presentation of pending tasks; history of completed tasks; and presentation of tasks to be performed at a project control point or milestone.
7. Final Remarks
The focus of this work was to study risk management in quality models
so that a software tool systematizing the best practices of these models
could be developed. The tool was used in an experimental
project, which identified possible adjustments.
The Spider-RM tool aims to reduce costs and to speed up the implementation of the
risk management process in software development organizations.
In this way, the organization as a whole benefits, gaining better control of the
tasks related to risks. Managers with little risk-related experience
can also deploy this process in their projects more easily, in
alignment with the main software process quality models.
As future work, we intend to: (1) promote the use of the tool
in other real software projects, covering different software development
scenarios, especially in organizations seeking
certification in quality models; (2) evolve the tool to support a web,
multi-user environment; and (3) integrate it with tools that support the implementation of
other software processes, such as project management, requirements management, etc.
8. Acknowledgments
This work is financially supported by CAPES, through an institutional master's
scholarship granted to PPGCC-UFPA, and by SPIDER, through undergraduate
research scholarships. This project is part of the SPIDER-UFPA Project
(www.spider.ufpa.br) [OLIVEIRA et al. 2011].
References
ABNT - Associação Brasileira de Normas Técnicas (2009) “NBR ISO/IEC 12207:2009
- Engenharia de Sistemas de Software - Processos de Ciclo de Vida de Software”.
Gamma, E. et al. (2000) "Padrões de Projetos - Soluções Reutilizáveis de Software
Orientado a Objetos". Bookman.
Knob, F. et al. (2006) "RiskFree – Uma Ferramenta de Gerenciamento de Riscos
Baseada no PMBOK e Aderente ao CMMI". Anais do V Simpósio Brasileiro de
Qualidade de Software - SBQS, Vila Velha, ES.
Koscianski, A., Soares, M. S. (2007) "Qualidade de Software". São Paulo, Novatec,
2nd ed.
Oliveira, S. R. B. et al. (2011) "SPIDER – Uma Proposta de Solução Sistêmica de um
SUITE de Ferramentas de Software Livre de Apoio à Implementação do Modelo
MPS.BR". Revista do Programa Brasileiro da Qualidade e Produtividade em
Software, SEPIN-MCT, 2nd Edition, Brasília-DF.
PMI - Project Management Institute (2013) "A Guide to the Project Management Body
of Knowledge". Campus Boulevard, Newton Square, 5th Edition.
Redmine (2014) "Ferramenta Web Flexível para Gerenciamento de Projetos".
Available at http://www.redmine.org/. Accessed August 1, 2014.
SEI - Software Engineering Institute (2010) "Capability Maturity Model Integration
(CMMI) for Development", Version 1.3, Carnegie Mellon, USA.
SOFTEX - Associação para Promoção da Excelência do Software Brasileiro (2012)
"Melhoria do Processo de Software Brasileiro (MPS.BR) - Guia Geral 2012". Brazil.
SOFTEX - Associação para Promoção da Excelência do Software Brasileiro (2014)
"Avaliações MPS-SW (Software) Publicadas (prazo de validade: 3 anos)".
Available at http://www.softex.br/wp-content/uploads/2013/07/2AvaliacoesMPSSW-Publicadas_29.JAN_.2014_5331.pdf. Accessed February 2, 2014.
TRIMS (2014) “Risk management tool”. Available at:
http://www.bmpcoe.org/pmws/trims.html. Accessed May 18, 2014.
Yazar, Z. (2002) "A qualitative risk analysis and management tool – CRAMM". SANS
Institute - GSEC. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.201.9538&rep=rep1&type=pdf. Last accessed May 18, 2014.
A Tool to Generate Natural Language Text from Business
Process Models
Raphael de Almeida Rodrigues1 , Leonardo Guerreiro Azevedo1,2 ,
Kate Revoredo1 , Henrik Leopold3
1
Graduate Program in Informatics (PPGI)
Federal University of the State of Rio de Janeiro (UNIRIO)
Av. Pasteur, 456 – Urca – Rio de Janeiro – RJ – Brazil – 22290-240
2
IBM Research - Brazil
Av. Pasteur 146 – Botafogo – Rio de Janeiro – RJ – Brazil – 22290–240
3
WU Vienna, Welthandelsplatz 1, 1020 Vienna, Austria
{raphael.rodrigues, azevedo, katerevoredo}@uniriotec.br,
[email protected], [email protected]
Abstract. Today, many organizations extensively model their business processes
using standardized notations such as the Business Process Model and Notation
(BPMN). However, such notations are often very specific and are not necessarily
intuitive for domain experts. This paper addresses this problem by providing a
flexible technique for automatically transforming BPMN process models into
natural language text. In this way, people with no or limited knowledge of
process modeling are enabled to read and understand the information captured
by a process model. The presented version supports the transformation of English
as well as Portuguese models. However, the technique is flexible and extensions
for other languages can be implemented with reasonable effort.
* Tool’s presentation video: http://youtu.be/RQ3gisGmZiA
1. Introduction
Business process models provide an abstract graphical view of organizational procedures
by reducing the complex reality to a number of activities. By doing so, they help to foster
an understanding of the underlying organizational procedures, serve as process documentation, and represent an important asset for organizational redesign [Larman 2005].
In order to depict business processes, many companies use specific notations, such
as BPMN [Ko et al. 2009], which was developed and standardized by the Object Management Group [OMG 2011]. While these notations are useful in many different scenarios, it still represents a challenge for many employees to fully understand the semantics of
a process model. If the reader is not familiar with the wide range of concepts (e.g., gateways,
events, or actors), parts of the process may remain unclear. For example, domain experts
usually do not have the necessary skills to read the process models designed by business analysts [Dumas et al. 2013]. Training employees in understanding process models
is costly and can hardly be considered an option for the entire workforce of a company.
In this paper, we follow up on prior work [Leopold et al. 2012a,
Leopold et al. 2014] and present a tool that implements a language-independent (e.g.,
Portuguese, English) framework which is capable of generating natural language texts
from BPMN process models [Rodrigues 2013]1 . In order to demonstrate the capabilities
of our tool, we present a proof of concept implementation based on English process
models. It shows the texts generated by our tool fully describe the input process models
and, thus, it is possible to understand the business process models without being familiar
with the employed process modeling notation. Our tool has the potential to increase the
benefits that can be derived from process modeling as discussions based on text tend to be
more productive than discussions based on models [Castro et al. 2011]. Furthermore, it
may significantly increase the audience of process models as an understanding of process
models is no longer bound to the knowledge of a specific notation.
The remainder of this work is structured as follows. Section 2 presents the pipeline
concept of natural language generation systems. Section 3 presents the framework. Section 4 presents an example of natural language generation in English. Finally, Section 5
presents the conclusion and future work.
2. The Pipeline Approach for Text Generation
As pointed out by Reiter and Dale [Reiter and Dale 1997], many natural language generation systems follow a pipeline approach consisting of three main steps:
• Text Planning: The information to be communicated in the text is determined.
Furthermore, the order in which this information will be conveyed is specified.
• Sentence Planning: Specific words are chosen to express the information
determined in the preceding phase. If applicable, messages are aggregated and
pronouns are introduced in order to obtain variety.
• Sentence Realization: The messages are transformed into grammatically
correct sentences.
Figure 1. Tool’s execution process [Leopold et al. 2012a].
Figure 1 illustrates how we adapted the pipeline architecture for generating text
from process models. In total, it consists of six components:
• Linguistic Information Extraction: In this component, we use the
linguistic label analysis presented in [Leopold et al. 2012b] to decompose the differing formats of process model element labels. In this way, for instance, we are
able to decompose an activity label such as Inform customer about problem into
the action inform, the business object customer, and the addition about problem
(see the sketch after this list).
1 Available for download at: http://bsi.uniriotec.br/tcc/20130430Rodrigues.pdf
• Annotated RPST Generation: This component derives a tree representation (Refined Process Structure Tree - RPST [Vanhatalo et al. 2009]) from the
process model in order to provide a basis for a step-by-step process description.
• Text Structuring: After deriving the RPST, we annotate each element with
the linguistic information obtained in the previous phase.
• DSynT-Message Generation:
The message generation component
maps the annotated RPST elements to a list of intermediate messages.
More specifically, each sentence is stored as a Deep-Syntactic Tree
(DSynT) [Mel’čuk and Polguere 1987]. DSynT facilitates the manageable yet
comprehensive storage of the constituents of a sentence.
• Message Refinement: This component takes care of message aggregation,
referring expression generation (e.g., replacing the role analyst with he), and discourse marker insertion (e.g., afterwards or subsequently). The need for these
measures arises if the considered process contains long sequences of tasks.
• Surface Realization: This component transforms the intermediate messages into grammatically correct sentences. This is accomplished by systematically mapping the generated DSynT to the corresponding grammatical entities.
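The label decomposition mentioned in the first component can be illustrated with a toy example. The sketch below is a naive split for labels of the form <action> <business object> <addition...>; the actual tool relies on the linguistic label analysis of [Leopold et al. 2012b], not on this simplification.

// Toy decomposition of "Inform customer about problem" into its three parts.
class LabelDecomposer {
    static String[] decompose(String label) {
        String[] tokens = label.split(" ", 3);
        String action = tokens[0];                              // "Inform"
        String businessObject = tokens[1];                      // "customer"
        String addition = tokens.length > 2 ? tokens[2] : "";   // "about problem"
        return new String[] { action, businessObject, addition };
    }

    public static void main(String[] args) {
        String[] parts = decompose("Inform customer about problem");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}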
To the best of our knowledge, we are the first to propose a tool for generating natural language text from business process models that can be adapted to a
wide range of languages. In prior work, the base generation technique has been introduced [Leopold et al. 2012a, Leopold et al. 2014], but it does not support languages other than English. There are, however, tools that address the opposite direction, i.e.,
there are works on generating models (process models [Friedrich et al. 2011], ontologies
[Leão et al. 2013], and UML diagrams [Bajwa and Choudhary 2006]) from natural language text. While all these techniques address different challenges, they mainly differ
from our technique by using real natural language text as input. The main challenge for
generating text from process models is to adequately analyze the existing natural language fragments from the process model elements, and to organize the information from
the process model in a sequential fashion.
3. The NLG Tool
According to Pree, our tool can be classified as an application framework. “Application
frameworks consist of ready-to-use and semi-finished building blocks. The overall architecture is predefined as well. Producing specific applications usually means to adjust
building blocks to specific needs by overriding some methods in subclasses” [Pree 1994].
The tool architecture (Figure 2) is composed of several ready-to-use building
blocks (known as frozen spots [Pree 1994]) and defines interfaces that must be implemented to support specific languages. Each interface represents a hot spot [Pree 1994],
because it is flexible enough to satisfy specific needs (in our case, generating text in a specific
language). The architecture’s frozen spots are represented by classes, while the hot spots
are represented by interfaces (elements stereotyped as interface).
The GeneralLanguageCommon package (Figure 2) is the generic (language-independent) module. It includes the interface definitions, which must be implemented for a specific language in order to generate natural language text, i.e., it is
the Natural Language Generation core. It contains the necessary infrastructure to
work with the NLG pipeline process. It includes the data structures, and it knows
exactly which object must be called, and when, to deal with a specific phase of
the pipeline. For example, regarding the Localization strategy (represented by the
classes of the Localization package), the module knows when to call the
LocalizationManager object to translate a specific message, retrieved from the
LocalizationMessages enumeration (keys) during text information extraction.
E.g., for the key PROCESS_BEGIN_WHEN, the returned text would be “O processo
começa quando” for Portuguese. Analogously, it knows when to trigger each interface
method implemented for a given language at runtime. A minimal sketch of this lookup
follows.
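The sketch below illustrates this localization lookup; the real LocalizationManager and LocalizationMessages types in the tool may have different signatures, so this is only a reading aid.

import java.util.Map;

// Simplified localization lookup keyed by message constants.
enum LocalizationMessages { PROCESS_BEGIN_WHEN }

class LocalizationManager {
    private final Map<LocalizationMessages, String> dictionary;

    LocalizationManager(Map<LocalizationMessages, String> dictionary) {
        this.dictionary = dictionary;
    }

    String translate(LocalizationMessages key) {
        return dictionary.get(key);
    }
}

// Usage for Portuguese:
//   new LocalizationManager(Map.of(
//       LocalizationMessages.PROCESS_BEGIN_WHEN, "O processo começa quando"))
//     .translate(LocalizationMessages.PROCESS_BEGIN_WHEN);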
Figure 2. Tool Architecture - UML Package diagram.
Figure 3 presents a package diagram of the hot spot implementations for
Portuguese and English. They are named Realizers, since they realize the implementations of the GeneralLanguageCommon package interfaces. Each language has
its own specific implementation (e.g., the PortugueseLabelHelper class implements
ILabelHelper). Thus, the PortugueseRealizer and EnglishRealizer classes
implement all of the architecture's hot spots and use the frozen spots to accomplish the necessary tasks.
Figure 3. Implementation of the hot spots defined by the architecture.
With the definition of the NLG core module, the developer does not need to know
in detail how the NLG process works. The module ensures that natural language text
will be produced for the given language, as long as the interfaces and their methods are implemented according to the specification. The components of this model are described as
follows.
• LabelAnalysis: extracts linguistic information from process model labels.
Interfaces defined in this package (hot spots) must be implemented for each supported language, i.e., all the linguistic classification algorithms must be implemented for each language. For example, algorithms to identify that assess is the
verb in the label application assessment.
• Fragments: represents sentences in natural language patterns. The classes defined in this package are frozen spots.
• DSynt: maps the information from the process into DSynT trees. Classes defined
in this package are frozen spots.
• Localization: defines the logic needed to access specific language dictionaries. It also provides the common functionality for fetching the translation for a
given word. For example, the LocalizationManager class is used to fetch
messages from the dictionary, which will be used in the final text representation.
• ISurfaceRealizer: defines the contract that all the language-specific realizers must implement to produce messages in a natural language format. In a
nutshell, the implementation of this interface for a specific language must be able
to use a given DSynT tree to read the textual information from the nodes, and to
assemble a grammatically correct sentence.
• LanguageConfig in the MultiLanguageProject package: is a Factory2
that creates objects from the classes that implement the language-specific logic interfaces for a given language (e.g., Portuguese or English). A simplified illustration of implementing these hot spots for a new language is given after this list.
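The sketch below illustrates the hot-spot idea with simplified stand-ins for ISurfaceRealizer and a DSynT tree; the actual interfaces in the tool have richer signatures, so this is only meant to show the shape of a language extension.

// Simplified stand-in for the surface realization hot spot.
interface ISurfaceRealizer {
    String realize(DSyntTree sentence);       // DSynT tree -> grammatical sentence
}

class SpanishRealizer implements ISurfaceRealizer {
    @Override
    public String realize(DSyntTree sentence) {
        // A real implementation would walk the tree and apply Spanish
        // morphology and word-order rules; this stub only shows the contract.
        return sentence.rootLexeme() + " ...";
    }
}

class DSyntTree {                             // minimal stand-in for the DSynT structure
    String rootLexeme() { return "procesar"; }
}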
4. Example of Generation of Natural Language Text
As an example of the use of our tool, consider a scenario where a business expert wants
to validate the process model written in BPMN by a Business Process Analyst. When
the business expert looks at the model (depicted in Figure 4) he realizes that he never
2 A factory is a program component whose main responsibility is the creation of other objects.
used the notation before and cannot understand what the analyst tried to express with the
process model. Instead of trying to learn the BPMN notation, the business expert runs
the tool using the given process model as input3. The tool then outputs a natural language text
(depicted in Figure 5).
With the natural language representation of the model, the business expert reads
the text and realizes that the analyst did not model a guest age verification activity before
the processing of an alcoholic beverage order. He decides to engage in a discussion with the
system analyst. During the discussion, both realize that this information is indeed missing,
and the expert asks the analyst to add it to the model. The business expert was able
to identify a flaw in the model only by reading its textual representation in a natural
language format.
Figure 4. Business Process Model sample using the BPMN notation.
5. Conclusion
Process models are frequently used in various organizations for understanding, documenting, and visualizing performed tasks. Through Natural Language Generation (NLG) techniques, we enable non-technical users to understand process models without
knowing the process modeling notation that was used to design them.
This paper presented a tool that builds on NLG techniques to generate natural
language text from BPMN process models. The tool's architecture is flexible enough to support
other languages (e.g., Spanish or German). Currently, the tool covers several elements
3 Samples of business process models in Portuguese are presented in Appendix B of the document available at http://www.uniriotec.br/~azevedo/bpm2nlg/Rebuttal_Apendix.pdf.
Figure 5. Tool’s output: Natural language text generated from the BPMN process
model.
of the BPMN notation4. The running example demonstrates that the texts are generated correctly. These
texts can be used to align business experts' knowledge with that of system analysts. The tool was
implemented in the Java language and is composed of:
• 207 Java classes
• 4 Java projects
• 62 Java Libs, including WordNet (English corpus) and RealPro (English Sentence
realization)
• Portuguese corpus: 22,821 nouns; 32 adverbs and conjunctions; 230,637 verbs;
39 prepositions; 50 pronouns; 63,344 adjectives; and 8 articles, mostly gathered
from the Floresta corpus [Afonso et al. 2002].
As future work, we suggest using the tool in a real business scenario. By presenting
the generated text to non-technical users, we can learn about the usefulness of our tool. In
particular, questionnaires could be applied to evaluate whether the texts generated from the models were sufficient to understand the process. We further suggest adding new languages
to the tool. For a new language, it is necessary to implement the operations defined in the
specific interfaces (package LabelAnalysis). Suitable candidates are, among
others, German and Spanish.
References
Afonso, S., Bick, E., Haber, R., and Santos, D. (2002). Floresta sintá(c)tica: a treebank
for Portuguese. In LREC.
Bajwa, I. S. and Choudhary, M. A. (2006). Natural language processing based automated
system for uml diagrams generation. In The 18th Saudi National Computer Conf. on
computer science (NCC18). Riyadh, Saudi Arabia: The Saudi Computer Society (SCS).
4 The BPMN elements supported by the tool are presented in Appendix C of the document available at
http://www.uniriotec.br/~azevedo/bpm2nlg/Rebuttal_Apendix.pdf
Castro, L., Baião, F., and Guizzardi, G. (2011). A semantic oriented method for conceptual data modeling in ontouml based on linguistic concepts. In Conceptual Modeling–
ER 2011, pages 486–494. Springer.
Dumas, M., La Rosa, M., Mendling, J., and Reijers, H. A. (2013). Fundamentals of
business process management. Springer.
Friedrich, F., Mendling, J., and Puhlmann, F. (2011). Process model generation from
natural language text. In Advanced Information Systems Engineering, pages 482–496.
Springer.
Ko, R. K., Lee, S. S., and Lee, E. W. (2009). Business process management (bpm)
standards: a survey. Business Process Management Journal, 15(5):744–791.
Larman, C. (2005). Applying UML and patterns: An Introduction to object-oriented
analysis and design and iterative development. Prentice-Hall.
Leão, F., Revoredo, K., and Baião, F. (2013). Learning well-founded ontologies through
word sense disambiguation. In Intelligent Systems (BRACIS), 2013 Brazilian Conference on, pages 195–200. IEEE.
Leopold, H., Mendling, J., and Polyvyanyy, A. (2012a). Generating natural language
texts from business process models. In Advanced Information Systems Engineering,
pages 64–79. Springer.
Leopold, H., Mendling, J., and Polyvyanyy, A. (2014). Supporting process model validation through natural language generation. IEEE Transactions on Software Engineering,
In press.
Leopold, H., Smirnov, S., and Mendling, J. (2012b). On the refactoring of activity labels
in business process models. Information Systems, 37(5):443–459.
Mel’čuk, I. A. and Polguere, A. (1987). A formal lexicon in the meaning-text theory (or
how to do lexica with words). Computational linguistics, 13(3-4):261–275.
OMG (2011). Business process model and notation (bpmn) version 2.0. http://www.
bpmn.org/.
Pree, W. (1994). Meta patterns – a means for capturing the essentials of reusable object-oriented design. In Object-oriented programming, pages 150–162. Springer.
Reiter, E. and Dale, R. (1997). Building applied natural language generation systems.
Natural Language Engineering 1.
Rodrigues, R. (2013). Um Framework Genérico para Geração de Texto em Linguagem
Natural a partir de Modelos de Processo de Negócio. Bachelor thesis in Information
Systems – Federal University of the State of Rio de Janeiro (UNIRIO).
Vanhatalo, J., Völzer, H., and Koehler, J. (2009). The refined process structure tree. Data
& Knowledge Engineering, 68(9):793–818.