RiPLE-TE: A Software Product Lines Testing Process


Pós-Graduação em Ciência da Computação

“RiPLE-TE: A Software Product Lines Testing Process”

By Ivan do Carmo Machado

M.Sc. Dissertation

Universidade Federal de Pernambuco
[email protected]
www.cin.ufpe.br/~posgraduacao

RECIFE, AUGUST/2010
Universidade Federal de Pernambuco
Centro de Informática
Pós-graduação em Ciência da Computação

Ivan do Carmo Machado

“RiPLE-TE: A Software Product Lines Testing Process”

An M.Sc. dissertation presented to the Graduate Program in Computer Science of the Centro de Informática of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of M.Sc. in Computer Science.

Advisor: Silvio Romero de Lemos Meira
Co-Advisor: Eduardo Santana de Almeida

RECIFE, AUGUST/2010
Machado, Ivan do Carmo
RiPLE-TE: a software product lines testing process / Ivan do Carmo Machado. - Recife: O Autor, 2010.
xiii, 163 folhas: il., fig., tab.
Dissertação (mestrado) - Universidade Federal de Pernambuco. CIn. Ciência da Computação, 2010.
Inclui bibliografia, anexo e apêndice.
1. Ciência da Computação. 2. Engenharia de software. 3. Reuso de software. I. Título.
CDD (22. ed.) 004    MEI2010 - 0123
To my beloved family.
Acknowledgements
First and foremost, I would like to thank my greatest teacher of all: God, for providing me this
opportunity and granting me the capability to proceed successfully. I could never have done this
without the faith I have in you.
I would like to gratefully acknowledge the supervision of Dr. Eduardo Almeida during this
work. He provided me with many helpful suggestions and encouragement during the course
of this work as well as the challenging research that lies behind it. I also wish to express my
appreciation to Dr. Silvio Meira, for accepting me as an M.Sc. student. Special gratitude goes to
the rest of the teaching staff of CIn/UFPE, for their valuable classes.
My sincere thanks are extended to Dr. Manoel Mendonça, from DMCC/UFBA, and his
students, for their help during the execution of the experimental study. I would like to thank the
Brazilian National Research Council (CNPq). Without their grant, this M.Sc. would not have
been possible.
I cordially thank my friends and colleagues that I have met during my journey in Recife. I
thank my housemates Bruno, Jonatas, Iuri and Leandro for their good friendship and for being
the surrogate family during the years I stayed there. I want to express my deep thanks to my
colleague Paulo Anselmo for taking intense academic interest in this study as well as providing
valuable suggestions that improved the quality of this study. My cordial thanks to Danuza,
Ednaldo, Flavio, Heberth, Hernan, Marcela, Ricardo, Thiago, Vanilson and Vinicius, for the
insightful discussions, offering valuable advice and support. Postgraduates of the RiSE Research
Group are thanked for numerous stimulating discussions. Saturdays will not be the same any
longer!
I would like to express my heartiest thanks to Edna Telma for her understanding, endless
patience and encouragement when it was most required, and for never letting me feel that I was
away from my family. My cordial thanks to her family members for their kindness and affection,
specially to Francisco Joaquim, for supporting me during these years I lived in Recife.
I also want to thank Benedita (Dio) for opening her wonderful home to me and making
me feel so comfortable there. She provided me with a perfect place to finish my work while I was in
Salvador.
I cannot finish without thanking my family. I am forever indebted to my parents, Serafim
and Joselice, my brother Ivo and his wife, Flavia, for their material and spiritual support in all
aspects of my life. I love them and appreciate the efforts they have put into giving me the life I
have now. Thank you so much!
We are built to conquer environment, solve problems, achieve goals, and we
find no real satisfaction or happiness in life without obstacles to conquer
and goals to achieve.
—MAXWELL MALTZ (1899-1975)
Abstract
Software Product Lines (SPL) can be considered an efficient approach to intra-organizational reuse of software. SPL delivers significant economic benefits to organizations, such as reduced cost and improved quality and time-to-market. It is based upon the systematic reuse of artifacts, through exploiting commonalities and managing variabilities among products that are established under a common architecture.

In SPL, special attention to the quality of the produced artifacts is required. In terms of quality assurance, whereas in single-system development a program is said to be validated if we have confidence that it will operate correctly, in SPL it is required to have confidence that any derived instance will operate correctly. This reinforces the aforementioned need for attention. Likewise, it also increases the effort required to deal with quality assurance in SPL projects. However, providing software with quality is fundamental and perhaps the major practice that organizations should adopt in order to experience the real SPL benefits. Thus, to provide quality products in a SPL, one should count on well-defined support processes to establish and coordinate the related activities. Hence testing, still the best-known and most widely applied technique for quality assurance, requires special attention, due to its characteristic of being a resource-consuming activity. Testing a SPL is complex and costly and often becomes a bottleneck in product line projects.

Thus, this dissertation describes a process for supporting testing activities in SPL projects. We established this process to provide a way for organizations to save effort when performing testing activities in a SPL environment. The proposed process is part of the RiPLE project, an effort to build a framework for SPL development that encompasses the whole set of disciplines comprising its life cycle. We based our research on a systematic mapping study, conducted in order to provide background on the research topic as well as to identify room for improvement.

In addition, this dissertation also presents an initial evaluation of our proposed process, in which experimental studies were conducted aiming to gather evidence about the proposal's effectiveness, as well as to understand, by applying it in practice, how testing in software product lines can be improved towards achieving software product line goals.

Keywords: Software Testing, Software Product Lines, Software Reuse, Software Process, Experimental Study.
Table of Contents

List of Figures
List of Tables
Acronyms

1 Introduction
  1.1 Motivation
  1.2 Scope
    1.2.1 Context
  1.3 Out of Scope
  1.4 Statement of Contributions
  1.5 Organization of this Thesis

2 Foundations on Software Product Lines and Testing
  2.1 Software Product Lines (SPL)
    2.1.1 SPL Essential Activities
      Core asset development
      Product development
      Management
  2.2 Software Testing
    2.2.1 Testing Processes
      Activities of a Test Engineer
      Testing Artifacts
    2.2.2 Testing Process Models
      Testing Levels
      Testing Coverage
  2.3 Testing in Software Product Lines
    2.3.1 Testing in core asset development
    2.3.2 Testing in product development
  2.4 Chapter Summary

3 A Mapping Study on Software Product Line Testing
  3.1 Introduction
  3.2 Related Work
  3.3 Literature Review Method
  3.4 Research Directives
    3.4.1 Protocol Definition
    3.4.2 Question Structure
    3.4.3 Research Questions
  3.5 Data Collection
    3.5.1 Search Strategy
    3.5.2 Data Sources
    3.5.3 Studies Selection
      Reliability of Inclusion Decisions
    3.5.4 Quality Evaluation
    3.5.5 Data Extraction
  3.6 Outcomes
    3.6.1 Classification Scheme
    3.6.2 Results
      Testing Strategy
      Static and Dynamic Analysis
      Testing Levels
      Regression Testing
      Non-functional Testing
      Commonality and Variability Testing
      Variant Binding Time
      Effort Reduction
      Test Measurement
    3.6.3 Analysis of the Results and Mapping of Studies
      Main findings of the study
  3.7 Threats to Validity
  3.8 Chapter Summary

4 RiPLE-TE: a Process for Product Line Testing
  4.1 Overview of the Process
  4.2 RiPLE-TE Roles and Responsibilities
  4.3 RiPLE-TE Work Products
  4.4 RiPLE-TE in Core Asset Development
    4.4.1 Master Planning
      Process Workflow
    4.4.2 Technical Reviews
      Usage Scenario
    4.4.3 Unit Testing
      Process Workflow
      Unit Test Planning
      Unit Test Assets Design
      Unit Test Execution
      Unit Test Reporting
    4.4.4 Integration Testing
      Process Workflow
      Integration Test Planning
      Integration Test Assets Design
      Integration Test Execution
      Integration Test Reporting
  4.5 RiPLE-TE in Product Development
    4.5.1 Integration Testing
    4.5.2 System Testing
      System Test Planning
      System Test Assets Design
      System Test Execution
      System Test Reporting
    4.5.3 Acceptance Testing
      Acceptance Test Planning
      Acceptance Test Assets Design
      Acceptance Test Execution
      Acceptance Test Reporting
  4.6 Managing Test Assets Variability within RiPLE-TE
    4.6.1 Meta-model for managing variability
    4.6.2 Development and Derivation of Test Cases
  4.7 Chapter Summary

5 Experimental Evaluation
  5.1 Definition of the Experimental Study
  5.2 First Experiment
    5.2.1 The Planning
      Context Selection
      Pilot Project
      Hypothesis formulation
      Variables
      Selection of Subjects
      Design Types
      Instrumentation
      Training
      Validity Evaluation
    5.2.2 The Experimental Study Project
    5.2.3 The Operation
      Preparation
      Execution
      Data Validation
    5.2.4 The Analysis and Interpretation
      Test case effectiveness
      Quality of defects found
      Test coverage
      Approach effectiveness and difficulties in using the process
  5.3 Second Experiment
    5.3.1 The Planning
      Validity Evaluation
    5.3.2 The Operation
    5.3.3 The Analysis and Interpretation
      Test case effectiveness
      Quality of defects found
      Test coverage
      Approach effectiveness and difficulties in using the process
  5.4 Lessons Learned
  5.5 Chapter Summary

6 Concluding Remarks
  6.1 Research Contributions
    6.1.1 Systematic Mapping Study
    6.1.2 RiPLE-TE Process
    6.1.3 Experimental Study
  6.2 Related Work
  6.3 Open Issues and Future Work

Bibliography

Appendices

A Mapping Study
  A.1 List of Journals
  A.2 List of Conferences
  A.3 Quality Score

B Experimental Study Instruments
  B.1 Background Questionnaire
  B.2 Consent form
  B.3 Error Reporting Form
  B.4 Feedback Questionnaire A
  B.5 Feedback Questionnaire B
  B.6 Feedback Questionnaire C
List of Figures

1.1 RiSE Labs influences.
1.2 RiSE Labs projects.
2.1 Essential Product Line Activities
2.2 Core Asset Development
2.3 Product Development
2.4 Testing in the V-model.
3.1 The Systematic Mapping Process (adapted from Petersen et al. (2008)).
3.2 Stages of the selection process.
3.3 Primary studies filtering categorized by source.
3.4 Distribution of primary studies by their publication years.
3.5 Amount of Studies vs. sources.
3.6 Distribution of papers according to classification scheme.
3.7 Distribution of papers according to intervention.
3.8 Visualization of a Systematic Map in the Form of a Bubble Plot.
4.1 RiPLE-TE main flow.
4.2 Interaction among testing levels and SPL phases, in the context of the RiPLE-TE.
4.3 RiPLE-TE Core Asset testing workflow
4.4 Levels of Test Planning
4.5 Technical Reviews
4.6 Testing workflow
4.7 Unit Test execution steps
4.8 State transition diagram of test case result
4.9 RiPLE-TE Product testing workflow
4.10 RiPLE Metamodel for assets variability management
4.11 The metamodel core UML profile.
4.12 The metamodel for tests.
4.13 Dependency between test objective and test cases considering variability
4.14 Example activity diagram including variability
4.15 Example activity diagram showing possible scenarios
5.1 RiSE Chair Products
5.2 RiSE Chair Feature Model used in the experiment.
5.3 BoxPlot of defects found by groups, including outliers.
5.4 BoxPlots of Scores from defects found.
5.5 Distribution of RiPLE-TE effectiveness.
5.6 Boxplot with the distribution of subjects in (A) first and (B) second experiments.
List of Tables

3.1 List of Research Strings
3.2 Quality Criteria
3.3 Research Type Facet
3.4 Research Questions (RQ) and primary studies.
5.1 Experiment Training and Execution Agenda
5.2 Subjects divided into groups
5.3 Subjects' Profile from Group 1 - Ad-hoc fashion
5.4 Subjects' Profile from Group 2 - RiPLE-TE
5.5 Final grouping of subjects
5.6 Subjects Expertise, calculated through the SXP formula.
5.7 Distribution of Subjects considering the expertise coefficient
5.8 Amount of Designed Test Cases
5.9 Amount of Defects Found
5.10 Test Case Effectiveness
5.11 Defects found by groups
5.12 Difficulty and Severity of defects found
5.13 Amount of defects found in terms of Difficulty and Severity
5.14 Results from the t-test applied to Test Score - Experts
5.15 Results from the t-test applied to Test Score - Non-Experts
5.16 Results from the t-test applied to Test Score - Both groups
5.17 Test Coverage
5.18 Subjects' Profile in the Experimental Study
5.19 Subjects Expertise, calculated through the SXP formula - 2nd exp.
5.20 Distribution of Subjects considering the expertise coefficient - 2nd exp.
5.21 Amount of Designed Test Cases - 2nd exp.
5.22 Amount of Defects Found - 2nd exp.
5.23 Test Case Effectiveness - 2nd exp.
5.24 Amount of defects found in terms of Difficulty and Severity - 2nd exp.
5.25 Scores - 2nd exp.
5.26 Test Coverage - 2nd exp.
A.1 List of Journals
A.2 List of Conferences
A.3 Primary Studies Quality Score
B.1 Consent Form
B.2 Error Reporting Form
Acronyms

CBD: Component-based development
C.E.S.A.R.: Recife Center For Advanced Studies and Systems
EPF: Eclipse Process Framework
GQM: Goal Question Metric
OMG: Object Management Group
RiSE: Reuse in Software Engineering Labs (http://labs.rise.com.br)
RiPLE: RiSE Product Line Engineering Framework
SCM: Software Configuration Management
SPEM: Software Process Engineering Metamodel
SPL: Software Product Lines
1 Introduction
In the software engineering field, researchers and practitioners have been searching for methods,
techniques and tools that would allow for improvements in costs, time-to-market and quality
(Almeida et al., 2007), particularly due to the challenges of increasing complexity and size
of software systems. Software reuse is a key aspect for software organizations interested in
these gains (Mili et al., 2001), since a set of reusable assets is used to solve recurring problems
instead of performing the same activities over and over again (Almeida et al., 2007). However,
when the reuse initiative is based on an approach focusing on small-scale, ad-hoc reuse (i.e.
typically restricted to the code level), these benefits may not be perceived (Linden et al., 2007).
On the other hand, approaches that enable the systematic assembly and configuration of software
parts reused across various products can allow organizations to experience the real
benefits of software reuse. In this sense, software product line (SPL) engineering, an innovative
and growing concept in software engineering, emerged as a systematic, prescribed
way to achieve reuse (Klaus Pohl and van der Linden, 2005). SPL is a planned, systematic, and
proactive reuse strategy that exploits the similarities within a set of products. SPL can
enable organizations to achieve significant reductions in terms of development and maintenance
costs and time to market as well (Clements and Northrop, 2001; Klaus Pohl and van der Linden,
2005), and remarkable quantitative improvements in productivity, product quality and customer
satisfaction (Northrop, 2002), thus addressing the aforementioned problems.
In general, the characteristic that distinguishes SPL from previous efforts in software development is predictive rather than opportunistic reuse. Rather than putting general software
components into a library in the hope that opportunities for reuse will arise, SPL calls for
software artifacts to be created only when reuse is predicted in one or more products of a well-defined
product line (Krueger, 2006).
With the growing acceptance of SPL by industry (Kang et al., 2007; Mansell, 2006; Northrop,
2002; Pohl and Metzger, 2006; Weiss, 2008), effective and efficient quality assurance techniques
and mechanisms are required. Therefore, testing, the most important and widely applied quality
assurance mechanism (Crnkovic, 2002), deserves attention. The SPL approach has a deep effect
on the overall development process, which in turn calls for changes in the testing processes
that are part of it. Testing in software product lines is hence the focus of this
dissertation, which reports on our investigation of the state of the art in this field and
proposes a systematic testing process intended to maximize the benefits of the systematic reuse
achieved by the SPL approach.
This chapter contextualizes the focus of this dissertation. It starts by presenting the motivation
in Section 1.1 and a clearer definition of the problem, together with an overview of the
proposed solution, in Section 1.2. Section 1.3 describes some related aspects that are
not directly addressed by this work, and Section 1.4 states the contributions. Finally, Section 1.5
sketches the structure of the remainder of this dissertation.
1.1 Motivation
The software industry is increasingly faced with a growing demand for customized products
to meet a wide variety of customer-specific needs. This has increased the product
diversity that companies must deal with. Over the years, companies maintaining such
diverse implementations are likely to re-develop assets that have already been developed,
wasting workforce and budget as a consequence.
SPL is in line with this pressing need of software companies, which must increasingly
meet their customers' requirements while handling a growing diversity of products. The SPL
practice offers a systematic reuse approach focused on improving time-to-market,
cost, quality, portfolio scale and scope, and other business drivers, such as
customer satisfaction (Northrop and Clements, 2007). An important principle behind SPL is
that, among the different needs, i.e., diverse project implementations, there may be large
amounts of common parts, which means a potential for high levels of reuse of development
effort across projects.
Organizations have achieved these improvements when working with the SPL approach, as
can be seen in Weiss (2008). Companies of all types and sizes have discovered that a product
line strategy, when skillfully implemented, can produce many benefits and ultimately give
organizations a competitive edge (Northrop and Clements, 2007).
While SPL promises to return many economic benefits, at the same time
there are many aspects that need to be addressed to make this expectation come true.
There are considerable barriers to widespread product line practice. Even though the
techniques and mechanisms for implementing SPL are considerably mature, additional problems are
faced when adapting these practices to a specific organizational context (Mansell, 2006).
Developing software for a product line is extremely complex due to multiple intertwined
products, features, and production deadlines. Moreover, the two phases of SPL development,
core asset development and product development, require related but distinct treatments. Such
treatment does not stop at development but should also extend to testing (Kang et al., 2007).
There is a lack of overall reasoning about testing in software product lines, both regarding standard
methodologies that can generalize testing practices to varying scenarios and organizations
(Lamancha et al., 2009) and regarding techniques that address the problems directly arising from
scale, reuse, and variability (Tevanlinna et al., 2004).
Testing is an essential activity in software engineering. It is an important process that is
performed to support quality assurance (Bertolino, 2007; Harrold, 2000). Product quality is
surely a business driver for SPL practices. However, testing is still a very labor-intensive task,
and effective testing requires a large number of tests to be handled. It is an expensive activity,
which may consume up to 50 percent (or even more in some projects) of software development
costs (Kolb and Muthig, 2003). The same holds true in the SPL context (Kauppinen and Taina,
2003; McGregor, 2001b; Tevanlinna et al., 2004).
It is important to point out that testing has become more critical and complex for product
lines, since quality issues in an artifact can have a high impact on the numerous products in the
product line that depend on it (Kolb and Muthig, 2003). This indicates that the effort required
for testing in SPL is higher than in traditional software development: several product instances
will use the same artifact, but we cannot guarantee that an artifact tested in one instance will
necessarily behave accordingly in all instances. It is rather necessary to assess the quality of
all derived products that will use a specific asset, instead of assessing the quality of only one
product. In the context of a SPL project, which encompasses several artifacts, this reinforces
the search for strategies that attempt to reduce such effort.
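To get a feel for the scale involved, consider an illustrative product line with $n$ independent optional features; the figures here are hypothetical, not drawn from this study. Since each optional feature can be included or excluded independently, the number of distinct derivable products is

$$2^{n}, \qquad \text{e.g., } n = 10 \implies 2^{10} = 1024 \text{ products.}$$

Testing every derived product as if it were a single system therefore grows exponentially with the number of features, whereas testing a shared asset once, and reusing the resulting test artifacts across products, amortizes part of that effort.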
In this context, a testing process contributes to a large extent to the success or failure of a
product line effort, and the potential benefits of product lines can be lost if the testing process
does not take into account the issues specific to product lines. Hence, there is a growing
interest in the SPL community in developing effective and efficient methods, techniques, and
processes to handle testing in the product line context (Kang et al., 2007; Tevanlinna et al.,
2004). These should consider both phases of SPL development as well as their interaction
with the other disciplines that comprise the SPL development life cycle, thus covering a gap in
the existing approaches, which usually do not handle testing across the whole life cycle but rather
address specific points. Moreover, the systematic mapping study conducted to establish
a basis for this work, further detailed in Chapter 3, pointed out some pressing topics
regarding processes for SPL testing that deserve special attention, such as testing quality attributes
while considering variations in quality levels among products, managing traceability
between development and test artifacts, and managing variability. These are topics
intended to compose an effective process focused on reducing the effort required by testing
activities, as discussed above.
Therefore, the focus of this dissertation is to provide a systematic process for testing in
product lines, in order to maximize the benefits of systematic reuse and thus reduce the overall
effort in testing activities in the SPL context. Both software product lines and software testing,
as well as the relationship between these practices, are further discussed in Chapter 2.
1.2 Scope
Motivated by the challenges presented in the previous section, the goal of the work described in
this dissertation can be stated as follows:
This work defines a testing process for software product lines which focuses on effort reduction through the systematic reuse of test artifacts, by exploiting commonalities
and managing variability among products. The process guides the SPL testing activities
by providing tasks, inputs, outputs, roles, and guidelines for test asset development and
management.
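As a rough illustration of what the systematic reuse of a test artifact across products can look like, consider the minimal sketch below. It is not drawn from RiPLE-TE; the class, the feature, and the prices are hypothetical. The common test logic is written once against behavior shared by all products, while each derived product binds only its variant-specific expectation:

```java
// Hypothetical sketch: one test asset reused across two derived products.
// The common part exercises behavior shared by every product; the variable
// part is bound when a concrete product is derived.
import java.util.function.Consumer;

class ShoppingCart {                        // simplified core asset under test
    private double total;
    private final boolean discountFeature;  // illustrative variation point

    ShoppingCart(boolean discountFeature) { this.discountFeature = discountFeature; }

    void add(double price) {                // 10% off items above 100 if the feature is bound
        total += (discountFeature && price > 100.0) ? price * 0.9 : price;
    }

    double total() { return total; }
}

class CartTestAsset {
    // Common test logic, written once for the whole product line.
    static void run(boolean discountFeature, Consumer<ShoppingCart> variantCheck) {
        ShoppingCart cart = new ShoppingCart(discountFeature);
        cart.add(50.0);
        assert cart.total() == 50.0 : "shared behavior must hold in every product";
        cart.add(200.0);
        variantCheck.accept(cart);          // variant-specific expectation per product
    }

    public static void main(String[] args) {
        run(false, c -> { assert c.total() == 250.0; }); // product without the discount feature
        run(true,  c -> { assert c.total() == 230.0; }); // product with the discount feature
        System.out.println("both product variants passed");
    }
}
```

Run with assertions enabled (java -ea CartTestAsset), the same asset validates two derived products; the variability is confined to the per-product check, so the common test logic is never duplicated.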
In order to achieve the goal stated above, RiPLE-TE, the RiSE Product Line Engineering
Testing Process, is proposed. The process was built on the basis of a mapping study on
software product lines testing, which will be presented in Chapter 3. Unlike other
approaches in the literature, which focus only on specific aspects of testing, e.g.,
integration testing or system testing (Kang et al., 2007; Reuys et al., 2006), this process
considers both SPL phases, core asset development and product development, across which the testing levels
are distributed. Moreover, the relationship with the other life cycle disciplines is represented.
The next subsection presents the context in which this work is situated.
1.2.1 Context
This dissertation is part of the RiSE Labs (http://www.rise.com.br/research/) (Almeida et al., 2004), formerly called the RiSE Project,
whose goal is to develop a robust framework for software reuse in order to enable the adoption
of a reuse program. To achieve this goal, RiSE Labs draws on a series of areas, such as software measurement,
architecture, quality, and environments and tools. These influence areas are depicted in Figure 1.1.
Figure 1.1 RiSE Labs influences.
Based on these areas, RiSE Labs is divided into several projects related to software reuse, as shown in Figure 1.2:

Figure 1.2 RiSE Labs projects.
• RiSE Framework: It involves reuse processes (Almeida et al., 2005), component certification (Alvaro et al., 2006) and reuse adoption and adaptation processes (Garcia et al.,
2008).
• RiSE Tools: Research focused on software reuse tools, such as the Admire Environment
(Mascena et al., 2006), the Basic Asset Retrieval Tool (B.A.R.T) (Santos et al., 2006),
which was enhanced with folksonomy mechanisms (Vanderlei et al., 2007), semantic layer
(Durão, 2008), facets (Mendes, 2008) and data mining (Martins et al., 2008), the Legacy
InFormation retrieval Tool (LIFT) (Brito et al., 2008), the Reuse Repository System
(CORE) (Melo et al., 2008), and the Tool for Domain Analysis (ToolDAy) (Lisboa, 2008;
Lisboa et al., 2007).
• RiPLE: Stands for RiSE Product Line Engineering Process and aims at developing a
methodology for Software Product Lines, composed of scoping (Moraes, 2010), requirements engineering (Neiva, 2009), design, implementation, test (Neto, 2010), and evolution
management (Oliveira, 2009).
• SOPLE: Development of a methodology for Software Product Lines based on services,
following ideas from the RiPLE (Medeiros et al., 2009).
• BTT: Research focused on tools for detection of duplicated change requests (Cavalcanti,
2009; Cavalcanti et al., 2009).
• MATRIX: Investigates the area of measurement in reuse and its impact on quality and
productivity.
• Exploratory Research: Investigates new research directions in software engineering and
its impact on reuse.
• CX-Ray: Focused on understanding, through empirical data, C.E.S.A.R. (http://www.cesar.org.br/), its processes,
and its practices in software development, including reuse.
1.3 Out of Scope
This work is part of a broader context, as mentioned in the previous Section, in which a complete
framework for software product lines development has been designed. Thus, some SPL topics
are not described in this work, and some specific issues regarding testing are not considered in
the scope of this dissertation. The following issues are not discussed in this work:
• Regression testing - In software development, many products go through several releases
and can be ported to many platforms. Every line of code written or modified offers
an opportunity for defects to creep in. Regression testing is the way to catch a large class
of these bugs quickly and efficiently: it focuses on ensuring that everything that used to
work still works after evolution or bug-fixing activities. This continues to hold in the case of
SPL, strictly due to the nature of core asset evolution. Even so, regression testing is
not the focus here, since another member of our research group has worked on defining
an approach for regression testing in the SPL context (Neto, 2010). In the future, this work
will be combined with that approach in order to define a complete testing process
for SPL.
• Correctness of test cases - In line with the previous point, we believe that a meticulous
analysis of existing techniques for the correctness analysis of documents, test cases included,
should be performed. It has been shown that early semantic and syntactic
correction of documents can save much effort, because finding and
correcting errors earlier in the development cycle is cheaper than postponing such a task.
This is even more relevant in the context of SPL testing, since reusable artifacts play the
essential role in such a process; avoiding errors early can thus save considerable
effort over time. However, we have not found related research reporting the use of tools
and/or methodologies on this topic, i.e., we have no evidence on the effectiveness
of this technique. In addition, we believe that the additional effort required to
analyze this strategy would not be worthwhile for this work at this moment.
1.4 Statement of Contributions
As a result of the work presented in this dissertation, a list of contributions may be enumerated:
• The realization of a systematic mapping study on software product lines testing,
which provides the research community with the state of the art in the field, comprising
current and relevant information extracted through a formalized evaluation, analysis, and
synthesis process.
• The design of the RiPLE-TE process, developed within the context of the RiSE Framework for Software Product Line Engineering (RiPLE), encompassing the activities of
the test discipline. RiPLE-TE provides a systematic way to handle testing in SPL
projects, including variability management concerns, a key point in software product
line engineering.
• The definition, planning, operation, and analysis of an experimental study, aimed at
evaluating the proposed process. It was conducted once and then replicated in a slightly
different environment in order to gather extensive feedback on the adoption of the process
by observing its outcomes. It was set up and documented in a way that enables
further replications.
In addition, the following paper, reporting on the mapping study we conducted, was accepted
by the Elsevier Information and Software Technology Journal (IST):
• Silveira Neto, P. A. M., Machado, I. C., McGregor, J. D., Almeida, E. S. and Meira, S. R.
L. Software Product Lines Testing: A Systematic Mapping Study. Elsevier Information
and Software Technology Journal, 2011. To appear.
The first two authors contributed equally to this work; the order of authorship does not reflect the importance of their participation.
The following paper was also accepted for publication:
• Silveira Neto, P. A. M., Machado, I. C., Cavalcanti, Y. C., Almeida, E. S., Garcia,
V. C. and Meira, S. R. L. A Regression Testing Approach for Software Product Lines
Architectures. Brazilian Symposium on Software Components, Architectures and Reuse,
in conjunction with Brazilian Conference on Software: Theory and Practice. Salvador-BA,
Brazil, 2010. To appear.
1.5 Organization of this Thesis
The remainder of this dissertation is organized as follows:
• Chapter 2 reviews the essential topics used throughout this work: Software Product
Lines and Software Testing. An initial discussion on the relation between Testing and SPL
is also presented.
• Chapter 3 presents a mapping study on software product line testing, performed with
the goal of mapping out the existing approaches and understanding the state of the art and
practice in this field, serving as a basis for our research work.
• Chapter 4 describes the proposed process for testing in software product lines, presenting
the associated roles, the activities and disciplines involved, and the key concepts of the process.
• Chapter 5 describes the experiments performed in order to evaluate the proposed process.
It details the purpose of the experiments and discusses the context in which they took place,
besides reporting on their planning, operation, analysis, interpretation, and packaging.
• Chapter 6 provides a set of conclusions based on this work, discussing our contributions and the
limitations of the process, presents related work, and outlines directions for future work.
• Appendix A presents the information sources from which primary studies were extracted
to serve as the basis for the mapping study analysis reported in Chapter 3. It also includes
the scores from that analysis, based on the established quality criteria.
• Appendix B details the material used in the experimental studies, reported in Chapter 5.
2 Foundations on Software Product Lines and Testing
Software is everywhere. It comes in a large range of devices, not only in its commonplace
hosts such as desktops and laptops, but also in home electronics, airplanes, automobiles, security systems, medical
and business environments, and many other situations and devices. Human society depends
heavily on computers, and consequently on software, in almost every domain, such as
transportation systems, commerce, governance, and utilities. Software increasingly
becomes the key asset for modern, competitive products. No matter how simple or complex, no
matter how large or small, there is hardly any modern product without software (Linden et al.,
2007).
In the face of these pressing characteristics of software systems, which are becoming increasingly
intensive and complex, with an ever-increasing need for product diversity combined with the
market pressure to develop software in shorter time and at lower cost, Software Product Lines
(SPL) is emerging as a practical and important paradigm in software development (Northrop,
2002). Based on the systematic and planned reuse of previous development efforts among a set
of similar products, the SPL approach enables organizations not only to reduce development
and maintenance costs but also to achieve impressive gains in productivity and time-to-market.
It is also based on the observation that most new software systems are similar to already
existing ones: they are either different versions or the next releases of the same baseline product
(Trendowicz and Punter, 2003). However, SPL quality demands are of high importance and
continuously rise, especially because product lines often support large-scale systems,
which require high-confidence products. Thus, testing, still the most effective way to assure
quality, requires special attention, since the literature reports that it is more critical and
complex for product lines than for traditional single software systems (Kolb and Muthig, 2003).
This chapter describes background information relevant to this research. It is organized as
follows: Section 2.1 introduces concepts regarding Software Product Lines; Software Testing is
introduced in Section 2.2; Section 2.3 sketches the relationship between the previously addressed
concepts; and a chapter summary is given in Section 2.4.
2.1 Software Product Lines (SPL)
The software industry has continuously been challenged to improve its engineering practice,
aiming at delivering products in a faster and more reliable way, since the way goods are
produced has changed significantly over time. Formerly, goods were handcrafted for
individual customers; gradually, the number of people who could afford to buy various kinds
of products increased (Pohl et al., 2005). Furthermore, it is important to point out that no two
customers are identical. Each has unique requirements that can only be completely satisfied by a
custom solution. However, software organizations must target large homogeneous markets in
order to keep development costs within reason, essentially treating their customers as if they were
identical.
This explains the competitiveness the software development industry
has experienced. It has increasingly become a concern for companies of all sizes and in all
markets to develop strategies to meet the needs stated above.
A growing number of software development organizations are adopting strategies that
emphasize proactive reuse, interchangeable components, and multi-product planning cycles to
construct high-quality products faster and cheaper (McGregor et al., 2002), thereby meeting
customer needs. SPL engineering seamlessly addresses these strategies.
SPL engineering is emerging as a viable and important development paradigm allowing
companies to realize improvements in time to market, cost, productivity, quality, and other
business drivers (Denger and Kolb, 2006; Pohl et al., 2005). Software product lines can also
enable rapid market entry and flexible response, and provide a capability for mass customization
(Northrop, 2002). By definition, a SPL is a set of software-intensive systems that share a
common, managed set of features satisfying the specific needs of a particular market segment or
mission and that are developed from a common set of core assets in a prescribed way (Clements
and Northrop, 2001).
In SPL, products are developed from a common set of assets, in contrast to being developed
separately, from scratch, or in an arbitrary fashion (Clements and Northrop, 2001). In addition,
in contrast with what is customary, “design reuse” in SPL does not simply mean taking an
existing design, copying it, and modifying it for some particular need, but rather developing
a model (or set of models) that can be reused in a disciplined fashion. SPL techniques
explicitly consolidate and capitalize on commonality throughout the product line. They formally
manage and control the variations among the products in the product line. New solutions are
developed by assembling partial solutions and/or by configuring generic ones. Only the unique
features of each solution, represented by product variations, must be specified, because the
common ones can be safely assumed. This eliminates duplication of effort in the engineering
processes.
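The sketch below makes the "assemble and configure" idea concrete. It is an illustration only, with hypothetical feature names, not an artifact from this dissertation: commonality is assumed in every product, and only each product's variations need to be specified.

```java
// Hypothetical sketch of product derivation in a product line:
// the common core is assumed; only product-specific variations are declared.
import java.util.EnumSet;
import java.util.Set;

enum Feature { CATALOG, PAYMENT, CREDIT_CARD, WISH_LIST }  // illustrative features

class ProductLine {
    // Commonality: features mandatory in every derived product.
    static final Set<Feature> COMMON = EnumSet.of(Feature.CATALOG, Feature.PAYMENT);

    // Derivation: a product is the common core plus its selected variations.
    static Set<Feature> derive(Feature... variations) {
        Set<Feature> product = EnumSet.copyOf(COMMON);
        for (Feature f : variations) product.add(f);
        return product;
    }

    public static void main(String[] args) {
        Set<Feature> basicShop   = derive();                                      // core only
        Set<Feature> premiumShop = derive(Feature.CREDIT_CARD, Feature.WISH_LIST);
        System.out.println("basic:   " + basicShop);
        System.out.println("premium: " + premiumShop);
    }
}
```

Deriving a new product then amounts to listing its variations; the common core never has to be restated.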
SPL has gained increasing attention from industry in recent years (Linden et al., 2007). It
has proven successful in a variety of settings, including large and small organizations in business,
industry, and governmental sectors, and across a variety of domains (Käkölä and Dueñas, 2006),
ranging from mobile phones to satellite ground station systems (Weiss, 2008).
As real world motivation, organizations use product line practices to (Northrop and Clements,
2007): achieve large scale productivity gains, improve time to market, maintain market presence,
sustain unprecedented growth, achieve greater market agility, compensate for an inability to
hire, enable mass customization, get control of diverse product configurations, improve product
quality, increase customer satisfaction, increase predictability of cost, schedule, and quality.
Ideas common to most successful product line efforts encompass, besides exploiting
commonality among products to strategically reuse software artifacts, the adoption of
architecture-centric development and a two-tiered organizational structure (McGregor et al.,
2002).
SPL development begins by identifying the domain and deciding on the set of products comprising the product line, and then proceeds by identifying which requirements are common to all
products (commonality) and which differentiate them (variability). The commonality and variability
concepts are key prerequisites for software product lines (Klaus Pohl and van der Linden, 2005).
Architecture is also a key concept of software product line engineering. It determines how
products are derived efficiently from the core assets. To allow the derivation of several different
products, a product line architecture has to deal with variation. The architecture’s support for
variation determines the scope of the product line (Rommes and America, 2006), so that it can
be configured to match a diverse range of requirements.
Moreover, SPL requires an organization to involve two types of development activities, one
responsible for developing reusable assets (core asset development) and the other responsible for
the development of products that use those assets (product development). These, taken together
with management, are the essential activities underlying SPL.
2.1.1 SPL Essential Activities
As mentioned, software product line engineering involves three essential activities, as shown
in Figure 2.1 (Clements and Northrop, 2001): core asset development, product development,
and management. Each of these activities is essential, as is the blending of all three. The
rotating arrows indicate not only that core assets are used to develop products but also that
revisions of existing core assets or even new core assets might, and most often do, evolve out of
product development. There is a strong feedback loop between the core assets and the products.
Moreover, strong management at multiple levels is needed throughout. Management oversees
core asset and product development. Management orchestrates all activities and processes
needed to make the three essential activities work together. Each activity is next detailed:
Figure 2.1 Essential Product Line Activities
Core asset development
Also known as domain engineering, it forms the basis for the software product line. The
goal of this activity is to define the commonality and the variability of the product line, and thus to establish the reusable artifacts and a production capability for products (Pohl
et al., 2005). It also involves the evolution of the assets in response to product feedback, new
market needs, and so on (Clements and Northrop, 2001). Figure 2.2 illustrates the core asset
development activity along with its outputs and influential contextual factors.
The activity of core asset development is iterative. The rotating arrows in Figure 2.2 suggest
that there is no one-way causal relationship from inputs to outputs, they rather affect each other.
For example, expanding the product line scope (one of the outputs) may admit whole new classes
of systems to examine as possible sources of legacy assets (part of the context). Correspondingly,
a production constraint may lead to restrictions on the product line architecture (an output).
This restriction, in turn, will determine which preexisting assets (another contextual factor) are
candidates for reuse or mining (Northrop, 2002).
Core assets include, but are not limited to, the architecture and its documentation, specifications, software components, tools such as component or application generators, performance
models, schedules, budgets, test plans, test cases, work plans, and process descriptions (Clements
and Northrop, 2001). Although it may be possible to create core assets that can be used across
the products without any adaptations, in many cases some adaptations are required to make
core assets usable in the broader context of a product line. Variation mechanisms used in
core assets help to control the required adaptations and to support the differences among the
software products (Bachmann and Clements, 2005). These adaptations should be planned before
development and made easy for the product development team to use without putting at risk
existing properties of the core assets.
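As an illustration of such a variation mechanism, consider the following minimal Java sketch. It is a hypothetical example (the names PaymentMethod, CheckoutService and the two variants are ours, not taken from any particular product line): the core asset exposes a variation point as an interface, and each product binds a concrete variant without modifying the shared code.

// CheckoutService.java - hypothetical sketch of a variation mechanism
interface PaymentMethod {                            // variation point
    boolean authorize(double amount);
}

class CreditCardPayment implements PaymentMethod {   // variant bound by one product
    public boolean authorize(double amount) { return amount <= 5000.0; }
}

class InvoicePayment implements PaymentMethod {      // variant bound by another product
    public boolean authorize(double amount) { return true; }
}

public class CheckoutService {                       // common core asset
    private final PaymentMethod payment;
    public CheckoutService(PaymentMethod payment) { this.payment = payment; }
    public boolean checkout(double amount) { return payment.authorize(amount); }
}

In this style, the adaptation required by each product is reduced to selecting a variant at the variation point, leaving the existing properties of the common asset untouched.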
Figure 2.2 Core Asset Development
Product development
Also known as application engineering, this activity creates individual products by reusing the
core assets, gives feedback to core asset development, and evolves the products. Figure 2.3
illustrates the product development activity along with its outputs and influential contextual
factors.
Product development uses the core assets, in accordance with the production plan, to produce products that meet their respective requirements, as defined in the product line scope, reusing the artifacts derived from core asset development, as previously shown in Figure 2.2.
As in Figure 2.2, the rotating arrows in Figure 2.3 indicate iteration and involved relationships. For example, the existence and availability of a particular product may well affect
the requirements for a subsequent product. As another example, building a product that has
previously unrecognized commonality with another product already in the product line will
create pressure to update the core assets and provide a basis for exploiting that commonality for
future products (Northrop, 2002). Moreover, this activity has an obligation to give feedback on
any problems or deficiencies encountered with the core assets, in order to keep the core asset
base in accordance with the products.
Figure 2.3 Product Development
Management
It includes technical and organizational management, where technical management is responsible
for requirement control and the coordination between core asset and product development. Not
only technical aspects, here represented by the development of core assets and products, are
considered when developing SPL, but managerial and organizational activities as well.
The common set of assets and the plan for how they are used to build products do not just
materialize without planning, and they certainly do not come free. They require organizational
foresight, investment, planning and direction. They require strategic thinking that looks beyond
a single product. The disciplined use of the assets to build products does not just happen either.
Management must direct, track, and enforce the use of the assets. Software product lines are as
much about business practices as they are about technical practices (Clements and Northrop,
2001).
Although organizations are different, in terms of the nature of their products, market or mission,
business goals, organizational structure, culture and policies, software process disciplines, and
so on, these essential activities apply in every situation, since they represent the highest level of
generality which involves the most important aspects regarding SPL development. In general,
organizations accomplish this division of responsibility in a variety of ways. Some organizations
have teams dedicated to each role and others use the same people for both (McGregor et al.,
2002). It depends on the strategy, budget, staff availability, among other aspects. In fact, it is
valid to mention that there is no “first” activity, i.e. in some contexts, existing products are mined
for core assets, whereas in others, core assets may be developed or procured for future use.
2.2 Software Testing
Software is, more and more, part of our lives, both economically and socially, and is increasingly used to perform critical tasks. This has put pressure on software professionals to focus on quality issues, since it is necessary to prevent people, i.e. customers, from experiencing software that does not work as expected. This goal can be reached by building reliable software.
Although many factors affect the engineering of reliable software, including, of course,
careful design and sound process management, testing is the primary method that the industry
uses to evaluate software under development (Ammann and Offutt, 2008).
Testing is an important process that is performed to support quality assurance (Harrold,
2000). According to the Software Engineering Institute (SEI)1, in its report on software testing (McGregor, 2001b), testing is one approach to validating and verifying the artifacts produced in software development. A more detailed definition states that testing is designed to make sure computer code does what it was designed to do, and that it does not do anything unintended.
1 SEI - Software Engineering Institute at Carnegie Mellon University - www.sei.cmu.edu.
Software should be predictable and consistent, offering no surprises to users (Myers and Sandler,
2004). Moreover, testing helps us to measure the quality of software in terms of the number of
defects found, the tests performed, and the system covered by the tests. It can be performed for
both the functional and non-functional software requirements and characteristics.
Testing activities support quality assurance by gathering information about the nature of the
software being studied. These activities consist of designing test cases, executing the software
with those test cases, and examining the results produced by those executions. They, together,
can reduce the risk of failure in the real environment.
Since the focus is to deliver high-quality products to customers, software researchers realized
that it was important to integrate testing activities within the context of a software development
process (Burnstein, 2002), including testing into the whole software life cycle so that confidence
in the software can be acquired.
In the following, the role of process in software testing is addressed, and then process models and techniques are described.
2.2.1 Testing Processes
In a high-level definition, a process is a set of actions, observations, and decisions taken to
achieve some desired outcome, in which some activities can happen in parallel, and some
are sequential. The activities are related by the desired outcomes but do not necessarily have
techniques or required skills in common (Black, 2003). If we instantiate such a definition for
the context of software engineering, we can consider a process as the set of methods, practices,
standards, documents, activities, policies, and procedures that software engineers use to develop
and maintain a software system and its associated artifacts, such as project and test plans, design
documents, code, and manuals (Burnstein, 2002).
Thus, if we think about a test process, we can elaborate on this definition by considering
three main aspects: economic, technical and managerial.
Economic aspects are related to the reality that resources and time are available to the testing
group on a limited basis, as described by Burnstein (2002). In many cases, complete testing is not feasible due to these economic constraints. Testing coverage criteria are used to decide which test inputs to use so that testing continues to be an effective approach. Testing coverage is further addressed in Section 2.2.2.
Technical aspects of testing include the techniques, methods, measurements, and tools
used to guarantee that the software under test is as defect-free and reliable as possible for the
conditions and constraints under which it must operate.
Regarding the managerial aspect, since we advocate that testing is a process, it must be managed. In short, this means that an organizational policy for testing must be defined and documented, including its procedures and steps. The testing activity must be planned, and the process should have associated quantifiable goals that can be measured and monitored.
A test process should specify and separate the testing characteristics, which represent the why we test question; the testing stages, delineating when we test; and also the testing techniques, responsible for answering the question how we test. This makes it possible to organize the activity of revealing defects, confirming conformance to specified requirements, and evaluating quality against criteria such as business risk and business confidence. A well-planned, well-designed, and well-executed test process can result in projects delivered on time and on budget, meeting the required quality, and delivering the required scope more often and more predictably. The expected business benefits and return on investment from the project can be achieved by applying a test process.
This clearly denotes the importance of adopting such a process in an organization, which in short allows a team to better execute its testing projects, working in a guided environment with timely, accurate, credible information (Black, 2003).
A test process depends upon the organizational needs. We cannot consider a standard process to be applicable to every organization. We can only state that a process should accomplish the three aspects mentioned above and, furthermore, that it should be under constant optimization, being able to absorb improvements from both internal and external stakeholders. Besides, personnel should clearly understand their roles and responsibilities, and they should be trained in the process, considering the elements we have just described. Next, we bring some information regarding the role of the test engineer, who is responsible for dealing with testing in an organization, and then regarding the artifacts used and/or generated by test processes.
Activities of a Test Engineer
According to Ammann and Offutt (2008), a test engineer is a professional who is in charge
of one or more technical test activities. The tasks and responsibilities include designing test
inputs, producing test case values, running test scripts, analyzing results, and reporting results to
developers and managers.
A sample flow performed by a test engineer consists of first designing tests by creating test
requirements. These requirements are then transformed into actual values and scripts that are
ready for execution. These executable tests are run against the software, and the results are
evaluated to determine if the tests reveal a non-compliance in the software.
These activities may be carried out by one person or by several, and the process is monitored
by a test manager. A test manager is in charge of one or more test engineers. Test managers
set test policies and processes, interact with other managers on the project, and otherwise help
the engineers do their work (Ammann and Offutt, 2008). Moreover, with every project the test
project manager learns new concepts that will improve the test processes, which should undergo
continuous process improvement.
In many organizations, the developer assumes the role of test engineer, being responsible for both coding and testing tasks. In others, in which testing activities are separated from the development ones, the role of test engineer is more visible. For the purpose of this work, we
use the term tester to refer to every person who works with testing activities, which in a broader
view can be included in the group of test engineers. Further in Section 4.2 we expand on the
roles and responsibilities of test engineers considering testing in a SPL.
Testing Artifacts
According to the IEEE Standard for Software Test Documentation (IEEE, 1998), the following are considered the basic artifacts to be applied in the testing activity, which is basically composed of sets of test cycles: test plan, test design specification, test case specification, test procedure
specification and test report. These are considered the most relevant artifacts and they all are
applicable to any context, although other artifacts can be produced as well.
A good practice, especially when striving to reduce effort, indicates that tests must be repeatable and reusable; thus, testing artifacts are to be produced with these aspects in mind. The
expertise gathered in various projects should be documented so that the knowledge can be reused
in new projects.
2.2.2 Testing Process Models
Traditionally, efforts to improve quality have centered around the end of the product development cycle, emphasizing the detection and correction of defects (Naik, 2008). This is the common practice of the Waterfall software development process, in which testing concerns are postponed to the end of the development cycle, after the implementation has started, or even after it has ended.
However, by waiting until this late in the process, testing ends up being compressed: not enough time and budget remain, since problems with previous stages have been solved by taking time and money from the testing phase, and testers do not have enough time
to plan for testing. Instead of planning and designing tests, the developers have time only to run
tests, usually without being guided by a process, but rather in an ad hoc fashion (Ammann and
Offutt, 2008).
An approach for testing that enhances quality should start at the beginning of a project, long
before any program code has been written, encompassing all phases of a product development
process. Testing should occur at different places and times during the development life cycle
- from the requirements analysis to the final delivery of the product to the customer. The
requirements documentation is verified first; then, in the later stages of the project, testing can
concentrate on ensuring the quality of the application code. Expensive reworking is minimized
by eliminating requirements-related defects early in the project’s life, prior to detailed design or
coding work.
The idea of integrating testing activities into the earlier software life cycle phases is illustrated
by the V-model (Figure 2.4). The V-model gives equal weight to development and testing rather
than treating testing as an afterthought (Goldsmith and Graham, 2002).
Figure 2.4 Testing in the V-model.
There are other models for the testing process that can be combined with the ones aforementioned. It is the case of Spiral Testing, which works with the idea of software increments, and prototypes evolving into applications (Mathur, 2009). In this model, testing activities evolve over time and with the prototype, increasing their sophistication as the project evolves. It can thus be applied to any incremental development process.
Testing Levels
As mentioned above, in a quality-driven development process, testing activities should be
performed along the whole life cycle, considering all phases, in order to organize the testing
process as well as to find major problems earlier in the development, thus avoiding waste of resources. Early identification of defects is by far the best means of reducing their ultimate cost.
The testing activities mentioned are then expressed as testing levels. A different level of testing
accompanies each distinct software development activity, consequently the information for each
test level is typically derived from the associated development activity.
The idea behind splitting testing into levels is to build code and test it in pieces, gradually putting the pieces together into larger and larger portions, in order to avoid surprises when the entire product is linked together (Patton, 2005).
The common testing levels are described next:
• Unit Testing: It is designed to assess the units produced by the implementation phase and is the “lowest” level of testing. The goal of unit testing is to ensure that each individual software unit functions according to its specification (Burnstein, 2002). Most companies make unit testing the responsibility of the programmer. It is straightforward to package unit tests together with the corresponding code through the use of tools such as JUnit for Java classes (Ammann and Offutt, 2008). Each unit test must run independently of all the others, and unit tests must be able to run in any order. In addition, one test must not depend on some side effect caused by a previous test, e.g. a member variable being left in a certain state; a minimal sketch illustrating this independence appears after this list.
• Module Testing: It is designed to assess individual modules in isolation, including how
the component units interact with each other and their associated data structures. As
with unit testing, most software development organizations make module testing the
responsibility of the programmer (Ammann and Offutt, 2008). It is possible to describe a
process merging module and unit testing objectives in a single level.
• Integration Testing: As the units and/or modules are tested and the low-level bugs are
fixed, they are then integrated and integration testing occurs. It is designed to assess
whether the interfaces between units (or modules) in a given subsystem have consistent
assumptions and communicate correctly. Integration testing must assume that modules
work correctly (Ammann and Offutt, 2008).
• System Testing: Its purpose is to compare the system to its original objectives. It assumes that the pieces work individually, and asks if the system works as a whole (Myers and Sandler, 2004). This level of testing usually looks for design and specification problems, identified in the analysis phase. It is a very expensive place to find lower-level faults and is usually not done by the programmers, but by a separate testing team (Ammann and Offutt, 2008).
• Acceptance Testing: It is designed to determine whether the completed software in fact meets customers’ requirements, gathered during the requirements analysis phase. Acceptance
testing probes whether the software does what the users want. It must involve users or
other individuals who have strong domain knowledge (Ammann and Offutt, 2008).
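The sketch below illustrates the independence requirement stated in the unit testing item above. It is a minimal, hypothetical JUnit 4 example (the class StackTest and its test names are ours, invented for illustration): each test rebuilds its own fixture, so no test depends on a side effect of another, and the tests can run in any order.

import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

public class StackTest {
    private java.util.Deque<Integer> stack;

    @Before
    public void setUp() {
        // A fresh fixture is created before every test, so no state leaks between tests.
        stack = new java.util.ArrayDeque<Integer>();
    }

    @Test
    public void pushThenPeekReturnsPushedValue() {
        stack.push(42);
        assertEquals(Integer.valueOf(42), stack.peek());
    }

    @Test
    public void newStackIsEmpty() {
        // Passes regardless of execution order, because it relies only on its own fixture.
        assertTrue(stack.isEmpty());
    }
}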
Testing Coverage
In an ideal world, we would want to test every possible permutation of a program. In most
cases, however, this simply is not possible (Ammann and Offutt, 2008; Craig and Jaskiel, 2002;
Myers and Sandler, 2004; Patton, 2005). The features and attributes of a simple application
may result in millions of permutations that could potentially be developed into test cases. Even
a seemingly simple program can have hundreds or thousands of possible input and output
combinations. Obviously, creating test cases for all of these possibilities is impractical. Even if
a large number of test cases are created, they generally still represent only a tiny fraction of the
possible combinations. Several other combinations may still exist. As said by Craig and Jaskiel
(2002), in most cases, what is tested in a system is much more important than how much it is
tested.
Since we cannot test with all inputs, it is necessary to determine testing coverage criteria. Coverage criteria are used to decide the aforementioned what, i.e. which test inputs to use, providing suitable stopping rules for testing. Formal coverage criteria give test engineers ways to decide what test inputs to use during testing, making it more likely that the tester will find problems in the program and providing greater assurance that the software is of high quality and reliability (Ammann and Offutt, 2008).
Coverage criteria can be defined in terms of test requirements. These can be described
with respect to a variety of software artifacts, including the source code, design components,
specification modeling elements, or even descriptions of the input space. Testers have to
understand the testing coverage required by a project, and it should be detailed in the test plan
document. Furthermore, coverage criteria act as a basis for test case design. As an example, in
some projects there may be contractual agreements that list all functional requirements to be
tested, or code-coverage requirements. In other cases, testing coverage is defined by testers,
based on available resources, schedules, tools, task at hand, and risks of not testing an item.
Several coverage criteria are available in the literature and applicable in industry. According to Ammann and Offutt (2008), the following are the major test coverage criteria in use today: graph coverage, logic coverage, and input space partitioning.
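To make the last of these concrete, the following hypothetical JUnit 4 sketch applies input space partitioning to a simple discount function (the discountRate method and its rule are ours, invented for illustration). The input domain is divided into three blocks - invalid amounts, amounts below the discount threshold, and amounts at or above it - and one representative input is taken from each block, which is how a coverage criterion turns the intractable set of all inputs into a finite, justified set of test cases.

import org.junit.Test;
import static org.junit.Assert.*;

public class PricingTest {
    // Hypothetical unit under test: 10% discount for orders of 100.0 or more.
    static double discountRate(double amount) {
        if (amount < 0) throw new IllegalArgumentException("negative amount");
        return amount >= 100.0 ? 0.10 : 0.0;
    }

    // Block 1: invalid input (amount < 0).
    @Test(expected = IllegalArgumentException.class)
    public void negativeAmountIsRejected() { discountRate(-1.0); }

    // Block 2: valid input below the threshold, [0, 100).
    @Test
    public void smallOrderGetsNoDiscount() { assertEquals(0.0, discountRate(50.0), 1e-9); }

    // Block 3: valid input at or above the threshold, [100, +infinity).
    @Test
    public void largeOrderGetsDiscount() { assertEquals(0.10, discountRate(250.0), 1e-9); }
}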
2.3 Testing in Software Product Lines
In the SPL approach, as in general in single-system development, i.e. traditional software
development, testing is essential (Kauppinen, 2003) and its most important goal is to uncover the
evidence of defects (Pohl and Metzger, 2006; Reuys et al., 2006). A successful testing approach can cut down the rejection rate and increase the quality of products developed from product lines, thus saving significant development effort, increasing product quality and customer satisfaction, and lowering maintenance costs (Juristo and Moreno, 2006). Hence, it is important to devote
adequate effort to it.
As defined by McGregor (2001b), testing in SPL aims to examine core assets, individual
products and the interaction among them. Thus, testing in SPL encompasses activities from the
validation of the initial requirements to activities performed by customers to complete the acceptance of a product. It is important to subject every stage of software product line development to quality control as soon as development commences. Quality is herein characterized by testing, which is still the most effective way of quality assurance (Kolb and Muthig, 2003).
The development of software product lines is quite demanding and challenging. Whereas a single program can be considered validated when we have confidence that it will operate correctly, for a SPL to be considered validated we should have confidence that any instance of that product line will operate correctly. When testing a SPL, we may experience problems in dividing testing tasks in a product line testing process, which should consider both core asset and product development, as well as an increase in the importance of the test management required to handle the extra complexity of the product line approach.
In the following, we describe testing issues from the viewpoints of both SPL phases, explicitly linking them to each other. After that, we present the best-known and most widely applied techniques, and then outline a series of challenges for research and practice.
2.3.1 Testing in core asset development
The major source of complexity within product lines comes from the number of variants. Variation points are often a source of faults. Testing all variants of core assets a priori is usually impossible for all but the simplest cases.
Test artifacts are created in both processes, but in the core asset phase the artifacts are created so that they can be reused efficiently in product testing. However, this requires a clear and unambiguous documentation of variability in the test artifacts produced in this phase (Pohl and Sikora, 2005). Another point to consider is that these assets may save effort, since repetition in terms of test execution is avoided, i.e. different products that use the same asset (or set of assets) do not need to retest everything, but can rather use the results previously achieved.
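One possible mechanism for such reusable test artifacts - a sketch under the assumption that variability is bound through a variation-point interface, reusing the hypothetical names from the sketch in Section 2.1.1 - is to keep the invariant test logic in an abstract core-asset test (JUnit 4) and let each product supply only its binding:

import org.junit.Test;
import static org.junit.Assert.*;

// Core-asset test artifact: the common behavior is specified once; the
// variability (which variant the product binds) is documented as an abstract method.
public abstract class AbstractCheckoutTest {
    protected abstract PaymentMethod boundVariant();  // the variation point to be bound

    @Test
    public void checkoutDelegatesToBoundVariant() {
        CheckoutService service = new CheckoutService(boundVariant());
        // Invariant across all products: a small, authorized amount completes checkout.
        assertTrue(service.checkout(10.0));
    }
}

// Product-specific reuse (in its own file): only the binding is supplied;
// the documented test logic is inherited unchanged.
public class ProductACheckoutTest extends AbstractCheckoutTest {
    protected PaymentMethod boundVariant() { return new CreditCardPayment(); }
}

In this way, products that reuse the same asset rerun only the binding-specific part, while the documented invariant is reused as-is.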
Indeed, considering the V-model development phases, complete integration and system testing in the core asset phase is usually not feasible, because the implementation of variations is
not yet specified. Trying to test the different combinations of components leads to exponential
growth of tested configurations (McGregor, 2001b; Muccini and van der Hoek, 2003). In
addition, it is hard to decide how much we can depend on the core asset testing during product testing (Tevanlinna et al., 2004). Hence, in this phase, it is common practice to thoroughly test the configurations of the common assets that are expected to be formed. This effort minimizes the testing of the common assets needed in the product development phase.
2.3.2 Testing in product development
It is not viable to rely on testing solely in the core asset phase. However, it is also not feasible for tests to be performed only in the product development phase, because of the expense this implies at product line scale, due to the redundant testing of the common assets.
As test assets were previously created and documented, thus validating the reusable assets, product testing should be concerned with the integration of the common assets with product-specific software, and with product-specific testing objectives. The former refers to integrating isolated parts, i.e. core assets which have already been tested but, in the product development phase, need to be tested more completely, according to product-specific requirements. The latter refers to new features that may be added to the product, which neither belong to the core asset base nor were tested in other products in the line, but rather were developed from scratch in order to meet a product-specific requirement. In this case, tests should be performed at every testing level before the feature is integrated into the product. After that, the new feature can be integrated into the core asset base, being subject to management intervention.
Besides these usual cases, there are others in which product testing requires special attention. A typical example refers to mission-critical systems. In this domain, although products are derived from a common platform, one must ensure that everything works according to specifications and that products do not present “negative surprises” to those who will use the systems. At first glance, this can mitigate the benefits of reuse, but it more clearly guarantees the quality of the products. What is done in this context is that critical and risky requirements are tested as a whole, considering the real (or the closest possible equivalent) environment, whereas other requirements can be taken from the core asset base.
2.4 Chapter Summary
In this chapter, we provided information on the topics of software product lines and software testing
and then discussed how these concepts can interact. These underlying concepts are fundamental
to understand the purpose of the work described in this dissertation.
As described, substantial economies can be achieved when the systems in a software product
line are developed from a common set of assets in a prescribed way, in contrast to being
developed separately, from scratch, or in an arbitrary fashion (Clements and Northrop, 2001).
It is exactly these production economies that make software product lines attractive (Clements
and Northrop, 2001). The major distinction between SPL engineering and traditional software engineering, which is focused on developing single systems, is the presence of variation in some or all of the software assets.
Handling variation is neither trivial nor easy when considering software testing, since this discipline is connected to the whole development life cycle, which makes it increasingly difficult to determine the most suitable practices and/or strategies to be adopted. Besides, the nature of a SPL implies an explosion of products that can be generated from a common set of generic components, which must be specialized in order to meet product-specific requirements. This raises problems regarding how to efficiently and effectively test all products while taking advantage of reuse, considering all artifacts produced during the testing process.
These problems require techniques that consider the unique characteristics of SPL, since traditional techniques cannot solve such questions. Hence, it is necessary to figure out how the research community and practitioners have dealt with such aspects. In this context, the next Chapter presents a systematic mapping study, a literature overview of concepts and practices proposed to deal with testing in SPL. It was conducted in order to investigate the current state-of-the-art and practice of the SPL testing field, serving as the basis for the proposal of this study.
3 A Mapping Study on Software Product Line Testing
In software development, testing is an important mechanism both to identify defects and to assure that completed products work as specified. This is a common practice in single-system development, and it continues to hold in SPL. Even though research in the SPL Testing field has increasingly gained attention in recent years, there is still much room for improvement. Hence, it is necessary to assess the current state of research and practice, in order to provide practitioners with evidence established through scientific research and thereby foster its further development.
This Chapter presents a systematic mapping study, conducted with a set of nine research
questions, in which 120 studies, dated from 1993 to 2009, were evaluated. This study focuses
on Testing in SPL and has the following goals: investigate state-of-the-art testing practices,
synthesize available evidence, and identify gaps between required techniques and existing
approaches, available in the literature.
Although several aspects regarding testing have been covered by single-system development
approaches, many cannot be directly applied in the SPL context due to specific issues. In
addition, particular aspects regarding SPL are not covered by the existing SPL approaches, and
when the aspects are covered, the literature just gives brief overviews. This scenario indicates
that additional investigation, empirical and practical, should be performed.
The results can help to understand the needs in SPL Testing, by identifying points that still
require additional investigation, since important aspects regarding particular points of software
product lines have not been addressed yet.
3.1 Introduction
The increasing adoption of Software Product Lines practices in industry has yielded decreased implementation costs, reduced time to market and improved quality of derived products (Denger and Kolb, 2006; Northrop and Clements, 2007). In this approach, as in single-system development, testing is essential (Kauppinen, 2003) to uncover defects (Pohl and Metzger, 2006; Reuys et al., 2006). A systematic testing approach can save significant development effort, increase product quality and customer satisfaction, and lower maintenance costs (Juristo and Moreno, 2006).
As defined in (McGregor, 2001b), testing in SPL aims to examine core assets, shared by
many products derived from a product line, their individual parts and the interaction among
them. Thus, testing in this context encompasses activities from the validation of the initial
requirements to activities performed by customers to complete the acceptance of a product, and
confirms that testing is still the most effective method of quality assurance, as observed in (Kolb
and Muthig, 2003).
However, despite the obvious benefits aforementioned, the state of software testing practice
is not as advanced in general as software development techniques (Juristo and Moreno, 2006)
and the same holds true in the SPL context (Kauppinen and Taina, 2003; Tevanlinna et al.,
2004). From an industry point of view, with the growing SPL adoption by companies (Weiss,
2008), more efficient and effective testing methods and techniques for SPL are needed, since the
currently available techniques, strategies and methods make testing a very challenging process
(Kolb and Muthig, 2003). Moreover, the SPL Testing field has attracted the attention of many researchers in recent years, which has resulted in a large number of publications regarding general and specific issues. However, while the literature has provided many approaches, strategies and techniques, it offers surprisingly little in the way of widely-known empirical assessment of their effectiveness.
This Chapter presents a systematic mapping study (Petersen et al., 2008), performed in
order to map out the SPL Testing field, through synthesizing evidence to suggest important
implications for practice, as well as identifying research trends, open issues, and areas for
improvement. A mapping study (Petersen et al., 2008) is an evidence-based approach, applied in
order to provide an overview of a research area, and identify the quantity and type of research
and results available within it. The results are gained from a defined approach to locate, assess
and aggregate the outcomes from relevant studies, thus providing a balanced and objective
summary of the relevant evidence. Hence, the goal of this investigation is to identify, evaluate,
and synthesize state-of-the-art testing practices in order to present what has been achieved so
far in this discipline. We are also interested in identifying practices adopted in single systems
development that may be suitable for SPL.
The study also highlights the gaps and identifies trends for research and development.
Moreover, it is based on analysis of interesting issues, guided by a set of research questions.
This systematic mapping process was conducted from July to December in 2009.
The remainder of this Chapter is organized as follows: Section 3.2 presents the related work.
In Section 3.3 the method used in this study is described. Section 3.4 presents the planning
phase and the research questions addressed by this study. Section 3.5 describes its execution,
presenting the search strategy used and the resultant selected studies. Section 3.6 presents the
classification scheme adopted in this study and reports the findings. In Section 3.7 the threats to
validity are described. Section 3.8 draws some conclusions and provides recommendations for
further research on this topic.
3.2 Related Work
As mentioned before, the literature on SPL Testing provides a large number of studies, regarding
both general and specific issues, as will be discussed later on in this study. Amongst them, we
have identified some studies developed in order to gather and evaluate the available evidence in
the area. They are thus considered as having similar ideas to our mapping study and are next
described.
A survey on SPL Testing was performed by Tevanlinna et al. (2004). They studied approaches
to product line testing methodology and processes that have been developed for or that can
be applied to SPL, laying emphasis on regression testing. The study also evaluated the state-of-the-art in SPL testing, up to the date of the paper, 2004, and highlighted problems to be
A thesis on SPL Testing, published in 2007 by Edwin (2007), investigated testing in SPL and possible improvements in testing steps, tool selection, and their application in SPL testing. It was conducted using the systematic review approach.
A systematic review was performed by Lamancha et al. (2009) and published in 2009. Its
main goal was to identify experience reports and initiatives carried out in Software Engineering
related to testing in software product lines. In order to accomplish that, the authors classified
the primary studies into seven categories: unit testing, integration testing, functional testing, SPL architecture, embedded systems, testing process, and testing effort in SPL. After
that a summary of each area was presented.
These studies can be considered good sources of information on this subject. In order to
develop our work, we considered every mentioned study, since they bring relevant information.
However, we have noticed that important aspects were not covered by them to an extent that would make it possible to map out the current status of research and practice in the area. Thus, we categorized a set of important research areas under SPL testing, focusing on aspects addressed by the studies mentioned before, as well as areas they did not address but that are directly related to SPL practices, in order to perform critical analysis and appraisal. In order to accomplish our
goals in this work, we followed the guidelines for mapping studies development presented in
Budgen et al. (2008). We also included threats mitigation strategies in order to have the most
reliable results.
We believe our study presents current and relevant information on research topics that can complement those previously published. By current, we mean that the number of published studies has increased rapidly, as shown in Figure 3.4, which justifies the need for more up-to-date empirical research in this area to contribute to the community's investigations.
3.3 Literature Review Method
The method used in this research is a Systematic Mapping Study (henceforth abbreviated as ‘MS’) (Budgen et al., 2008; Petersen et al., 2008). A MS provides a systematic and objective
procedure for identifying the nature and extent of the empirical study data that is available to
answer a particular research question (Budgen et al., 2008).
While a Systematic Review is a means of identifying, evaluating and interpreting all available research relevant to a particular question (Kitchenham and Charters, 2007), a MS intends to ‘map out’ the research undertaken rather than to answer a detailed research question
(Budgen et al., 2008; Petersen et al., 2008). A well-organized set of good practices and procedures for undertaking MS in the software engineering context is defined in (Budgen et al., 2008;
Petersen et al., 2008), which establishes the base for this mapping study. It is worthwhile to
highlight that the importance and use of MS in the software engineering area is increasing (Afzal
et al., 2008; Bailey et al., 2007; Budgen et al., 2008; Condori-Fernández et al., 2009; Juristo
et al., 2006; Kitchenham, 2010; Petersen et al., 2008; Pretorius and Budgen, 2008), showing
the relevance and potential of the method. Nevertheless, in the same way as for systematic reviews
(Bezerra et al., 2009; Chen et al., 2009; Lisboa et al., 2010; Moraes et al., 2009; Souza Filho
et al., 2008), we need more MS related to software product lines, in order to evolve the field
with more evidence (Kitchenham et al., 2004).
A MS comprises the analysis of primary studies that investigate aspects related to predefined
research questions, aiming at integrating and synthesizing evidence to support or refute particular
research hypotheses. The main reasons to perform a MS can be stated as follows, as defined by
Budgen et al. (2008):
• To make an unbiased assessment of as many studies as possible, identifying existing gaps
in current research and contributing to the research community with the reliable synthesis
of the data;
• To provide a systematic procedure for identifying the nature and extent of the empirical
study data that is available to answer research questions;
• To map out the research that has been undertaken;
• To help to plan new research, avoiding unnecessary duplication of effort and error;
• To identify gaps and clusters in a set of primary studies, in order to identify topics and
areas to perform more complete systematic reviews.
The experimental software engineering community is working towards the definition of
standard processes for conducting mapping studies. This effort can be checked out in the initial study from Petersen et al. (2008), which describes how to conduct systematic mapping studies in software engineering. The paper provides a well-defined process which serves as a starting point
for our work. We merged ideas from (Petersen et al., 2008) with good practices defined in the
guidelines published by (Kitchenham and Charters, 2007). This way, we could apply a process
for the mapping study including good practices of conducting systematic reviews, making better use of both techniques.
This blending process enabled us to include topics not covered by Petersen et al. (2008) in
their study, such as:
• Protocol. This artifact was adopted from systematic review guidelines. Our initial activity
in this study was to develop a protocol, i.e. a plan defining the basic mapping study
procedures. Searching in the literature, we noticed that some studies created a protocol
(e.g. (Afzal et al., 2009)), while others do not (e.g. (Condori-Fernández et al., 2009; Petersen et al., 2008)). Even though this is not a mandatory artifact, as mentioned by Petersen et al. (2008), authors who created a protocol in their studies encourage the use of this artifact as being important to evaluate and calibrate the mapping study process.
• Collection Form. This artifact was also adopted from systematic review guidelines and
its main purpose is to help the researchers collect all the information needed to
address the review questions, study quality criteria and classification scheme.
• Quality Criteria. The purpose of quality criteria is to evaluate the studies, as a means
of weighting their relevance against others. Quality criteria are commonly used when
performing systematic literature reviews. The quality criteria were evaluated independently
by two researchers, hopefully reducing the likelihood of erroneous results.
Some elements proposed by Petersen et al. (2008) were also changed and/or rearranged in this study, such as:
• Phasing mapping study. As can be seen in Figure 3.1, the process was explicitly split
into three main phases: 1 - Research Directives, 2 - Data Collection and 3 - Results. It is
in line with systematic reviews practices (Kitchenham and Charters, 2007), which defines
planning, conducting and reporting phases. Phases are named differently from what is
defined for systematic reviews, but the general idea and objective for each phase was
followed. In the first, the protocol and the research questions are established. This is the
most important phase, since the research goal is satisfied with answers to these questions.
The second phase comprises the execution of the MS, in which the search for primary
studies is performed. This considers a set of inclusion and exclusion criteria, used in order to select studies that may contain relevant results according to the goals of the research. In the third phase, the classification scheme is developed. The results of a meticulous analysis performed on every selected primary study are reported, in the form of a mapping study. All phases are detailed in the next sections.
Figure 3.1 The Systematic Mapping Process (adapted from Petersen et al. (2008)).
3.4 Research Directives
This section presents the first phase of the mapping study process, in which the protocol and
research questions are defined.
3.4.1 Protocol Definition
The protocol forms the research plan for an empirical study, and is an important resource for
anyone who is planning to undertake a study or considering performing any form of replication
study.
In this study, the purpose of the protocol is to guide the research objectives and clearly define
how it should be performed, through defining research questions and planning how the sources
and studies selected will be used to answer those questions. Moreover, the classification scheme to be adopted in this study was defined beforehand and documented in the protocol.
Incremental reviews to the protocol were performed in accordance with the MS method.
The protocol was revisited in order to update it based on new information collected as the study
progressed.
To avoid duplication, we detail the content of the protocol in Section 3.5, as we describe
how the study was conducted.
3.4.2 Question Structure
The research questions were framed by three criteria:
• Population. Published scientific literature reporting software testing and SPL testing.
• Intervention. Empirical studies involving SPL Testing practices, techniques, methods and processes.
• Outcomes. Type and quantity of evidence relating to various SPL testing approaches, in order to identify practices, activities and research issues concerning this area.
3.4.3 Research Questions
As previously stated, the objective of this study is to understand, characterize and summarize
evidence, identifying activities, practical and research issues regarding research directions in
SPL Testing. We focused on identifying how the existing approaches deal with testing in SPL.
In order to define the research questions, our efforts were based on topics addressed by previous
research on SPL testing (Edwin, 2007; Kolb and Muthig, 2003; Tevanlinna et al., 2004). In
addition, the research questions definition task was aided by discussions with expert researchers
and practitioners, in order to encompass relevant and still open issues.
Nine research questions were derived from the objective of the study. Answering these
questions led to a detailed investigation of practices arising from the identified approaches, which
support both industrial and academic activities. The research questions, and the rationale for
their inclusion, are detailed below.
• Q1. Which testing strategies are adopted by the SPL Testing approaches? This
question is intended to identify the testing strategies adopted by a software product line
approach (Tevanlinna et al., 2004). By strategy, we mean understanding when assets are
tested, considering the differentiation between the two SPL development processes: core
asset and product development.
• Q2. What are the existing static and dynamic analysis techniques applied to the
SPL context? This question is intended to identify the analysis type (static and dynamic
testing (McGregor, 2001b)) applied along the software development life cycle.
• Q3. Which testing levels commonly applicable in single-systems development are
also used in the SPL approaches? Ammann and Offutt (2008) and Jaring et al. (2008)
advocate different levels of testing (unit, integration, system and acceptance tests) where
each level is associated with a development phase, emphasizing development and testing
equally.
• Q4. How do the product line approaches handle regression testing along software
product line life cycle? Regression testing is done when changes are made to already
tested artifacts (Kauppinen, 2003; Rothermel and Harrold, 1996). Regression tests often
are automated since test cases related to the core assets may be repeated every time a new
product is derived (Northrop and Clements, 2007). Thus, this question investigates the
regression techniques applied to SPL.
• Q5. How do the SPL approaches deal with tests of non-functional requirements?
This question seeks clarification on how tests of non-functional requirements should be
handled.
• Q6. How do the testing approaches in an SPL organization handle commonality
and variability? An undiscovered defect in the common core assets of a SPL will affect
all applications and thus will have a severe effect on the overall quality of the SPL (Pohl
and Metzger, 2006). In this sense, answering this question requires an investigation into
how the testing approaches handle commonality issues through the software life cycle, as
well as gathering information on how variability affects testability.
• Q7. How do variant binding times affect SPL testability? According to Jaring et al. (2008), variant binding time determines whether a test can be performed at a given development or deployment phase. Thus, the identification and analysis of the suitable moment to
bind a variant determines the appropriate testing technique to handle the specific variant.
• Q8. How do the SPL approaches deal with test effort reduction? The objective is to
analyze within selected approaches the most suitable ways to achieve effort reduction, as
well as to understand how they can be accomplished within the testing levels.
• Q9. Do the approaches define any measures to evaluate the testing activities? This
question requires an investigation into the data collected by the various SPL approaches
with respect to testing activities.
3.5 Data Collection
In order to answer the research questions, data was collected from the research literature. These
activities involved developing a search strategy, identifying data sources, selecting studies to
analyze, and data analysis and synthesis.
3.5.1 Search Strategy
The search strategy was developed by reviewing the data needed to answer each of the research
questions.
The initial set of keywords was refined after a preliminary search returned too many results of little relevance. We used several combinations of search terms until achieving a suitable set of keywords. These are: Verification, Validation; Product Line, Product Family; Static Analysis, Dynamic Analysis; Variability, Commonality, Binding; Test Level; Test Effort, Test Measure; Non-functional Testing; Regression Testing, Test Automation, Testing Framework, Performance, Security, Evaluation, Validation, as well as their similar nouns and syntactic variations (e.g. plural forms). All terms were combined with the terms "Product Line" and "Product Family" using the Boolean "AND" operator. They were all joined with each other using the "OR" operator so that
it could improve the completeness of the results. The complete list of search strings is available
in Table 3.1 and also in a website developed to show detailed information on this MS1.

Table 3.1 List of Research Strings
1. verification AND validation AND ("product line" OR "product family" OR "SPL")
2. "static analysis" AND ("product line" OR "product family" OR "SPL")
3. "dynamic testing" AND ("product line" OR "product family" OR "SPL")
4. "dynamic analysis" AND ("product line" OR "product family" OR "SPL")
5. test AND level AND ("product line" OR "product family" OR "SPL")
6. variability OR commonality AND testing
7. variability AND commonality AND testing AND ("product line" OR "product family" OR "SPL")
8. binding AND test AND ("product line" OR "product family" OR "SPL")
9. test AND "effort reduction" AND ("product line" OR "product family" OR "SPL")
10. "test effort" AND ("product line" OR "product family" OR "SPL")
11. "test effort reduction" AND ("product line" OR "product family" OR "SPL")
12. "test automation" AND ("product line" OR "product family" OR "SPL")
13. "regression test" AND ("product line" OR "product family" OR "SPL")
14. "non-functional test" AND ("product line" OR "product family" OR "SPL")
15. measure AND test AND ("product line" OR "product family" OR "SPL")
16. "testing framework" AND ("product line" OR "product family" OR "SPL")
17. performance OR security AND ("product line" OR "product family" OR "SPL")
18. evaluation OR validation AND ("product line" OR "product family" OR "SPL")

3.5.2 Data Sources
The search included important journals and conferences regarding the research topic such as
Software Engineering, SPL, Software Verification, Validation and Testing and Software Quality.
The search was also performed using the ‘snowballing’ process, following up the references in papers, and it was extended to include grey literature sources, seeking relevant white papers, industrial (and technical) reports, theses, work in progress, and books.
We restricted the search to studies published up to December 2009. This date represents
when we stopped the search and began the analysis. We did not establish a lower year limit, since our intention was to have a broader coverage of this research field. This was decided because many important issues that emerged ten or more years ago are still considered open, as pointed out in (Bertolino, 2007; Juristo et al., 2004).
1 http://www.cin.ufpe.br/∼sople/testing/ms/
The initial step was to perform a search using the terms described in Section 3.5.1, in the digital libraries' web search engines. We considered publications retrieved from ScienceDirect, SCOPUS,
IEEE Xplore, ACM Digital Library and Springer Link tools.
The second step was to search within top international, peer-reviewed journals published by
Elsevier, IEEE, ACM and Springer, since they are considered the world leading publishers for
high quality publications (Brereton et al., 2007).
Next, conference proceedings were also searched. In cases where the conference keeps the proceedings available on a website, we accessed the website. When proceedings were not available through the conference website, the search was done through the DBLP Computer Science Bibliography2.
When searching conference proceedings and journals, many of the results had already been found in the search through the digital libraries. In such cases, we discarded the later results, considering only the first occurrences, which had already been included in our results list.
The lists of Conferences and Journals used in the search for primary studies are available in
Appendices A.2 and A.1.
After performing the search for publications in conferences, journals, using digital libraries
and proceedings, we noticed that known publications, commonly referenced by other studies in
this field, such as important technical reports and thesis, had not been included in our results
list. We thus decided to include these grey literature entries. Grey literature is used to describe
materials not published commercially or indexed by major databases.
3.5.3 Studies Selection
The set of search strings was thus applied within the search engines, specifically those mentioned in the previous section. The studies selection involved a screening process composed of three filters, in order to select the most suitable results, since the likelihood of retrieving inadequate studies might be high. Figure 3.2 briefly describes what was considered in each filter. Moreover, the Figure depicts the number of studies remaining after applying each filter.
The inclusion criteria were used to select all studies during the search step. After that, the exclusion criteria were applied first to the studies' titles and then to the abstracts and conclusions. Regarding the inclusion criteria, the studies were included if they involved:
• SPL approaches which address testing concerns. Approaches that include information on methods and techniques, how they are handled, and how variabilities and commonalities influence software testability.
2 http://www.informatik.uni-trier.de/~ley/db/
Figure 3.2 Stages of the selection process.
• SPL testing approaches which address static and dynamic analysis. Approaches that
explicitly describe how static and dynamic testing applies to different testing phases.
• SPL testing approaches which address software testing effort concerns. Approaches
that describe the existence of automated tools as well as other strategies used in order to
reduce test effort, and metrics applied in this context.
Studies were excluded if they involved:
• SPL approaches with insufficient information on testing. Studies that do not have
detailed information on how they handle SPL testing concepts and activities.
• Duplicated studies. When the same study was published in different papers, the most
recent was included.
• Studies already included from another source.
Figure 3.3 depicts a bar chart with the results categorized by source and filter, as described in Section 3.5.2. Figure 3.4 shows the distribution of the primary studies by publication year. This figure suggests that the SPL testing area is attracting growing interest: the increasing number of publications indicates that many solutions have become available recently (2009 is disregarded, since many studies from that year might not yet have been indexed by the search engines when the search was performed, and were thus not considered).
Figure 3.3 Primary studies filtering categorized by source.
An important point to highlight is that, between 2004 and 2008, an international workshop devoted specifically to SPL testing, the SPLiT workshop3, demonstrated the interest of the research community in expanding this field. Figure 3.5 shows the number of publications by source. In fact, it can be seen that the peaks in Figure 3.4 match the years when this workshop occurred. All the studies are listed in Appendix A.3.
Reliability of Inclusion Decisions
The reliability of decisions to include a study was ensured by having multiple researchers evaluate each study. The study was conducted by two research assistants (the first two authors), who were responsible for performing the searches and summarizing the results of the mapping study, with other members of the team acting as reviewers. A high level of agreement was required before a study was included. When the researchers did not agree after discussion, an expert in the area was contacted to discuss and give appropriate guidance.
3 cf. http://www.biglever.com/split2008/
Figure 3.4 Distribution of primary studies by their publication years.
3.5.4 Quality Evaluation
In addition to the general inclusion/exclusion criteria, the quality evaluation mechanism usually applied in systematic reviews (Dybå and Dingsøyr, 2008; Kitchenham et al., 2007) was applied in this study in order to assess the trustworthiness of the primary studies. This assessment is necessary to limit bias in conducting this empirical study, to gain insight into potential comparisons, and to guide interpretation of findings.
The quality criteria we used served as a means of weighting the importance of individual
studies, enhancing our understanding, and developing more confidence in the analysis.
As mapping study guidelines (Petersen et al., 2008) do not establish a formal evaluation in
the sense of quality criteria, we chose to assess each of the primary studies by principles of good
practice for conducting empirical research in software engineering (Kitchenham and Charters, 2007), tailoring the idea of assessing studies by a set of criteria to our specific context.
Figure 3.5 Amount of Studies vs. sources.
Thus, the quality criteria for this evaluation are presented in Table 3.2. The criteria grouped as A cover a set of quality issues that need to be considered when appraising the studies identified in the review, according to (Kitchenham et al., 2002). Groups B and C assess quality with respect to SPL testing concerns. The former focuses on identifying how well the studies address testing issues along the SPL development life cycle. The latter evaluates how well our research questions were addressed by individual studies. This way, a higher quality score corresponds to studies covering a larger number of questions.
This grouping is justified by the difficulty of establishing a reliable relationship between the final quality score and the real quality of each study. Some primary studies (e.g. one which addresses some issue in a very detailed way) are referenced in several other primary studies, but if we applied the complete set of quality criteria items, their final score would be lower than that of others which do not have the same relevance. In this way, we intended to have a more valid and reliable quality assessment instrument.
Each of the 45 studies was assessed independently by the researchers according to the 16
criteria shown in Table 3.2. Taken together, these criteria provided a measure of the extent to
which we could be confident that a particular study could give a valuable contribution to the
mapping study.
Table 3.2 Quality Criteria (grouped as A, B and C)
1. Are there any roles described?
2. Are there any guidelines described?
3. Are there inputs and outputs described?
4. Does it detail the test artifacts?
5. Does it detail the validation phase?
6. Does it detail the verification phase?
7. Does it deal with Testing in the Requirements phase?
8. Does it deal with Testing in the Architectural phase?
9. Does it deal with Testing in the Implementation phase?
10. Does it deal with Testing in the Deployment phase?
11. Does it deal with binding time?
12. Does it deal with variability testing?
13. Does it deal with commonality testing?
14. Does it deal with effort reduction?
15. Does it deal with non-functional tests?
16. Does it deal with any test measure?
Each of the studies was graded on a trichotomous (yes, partly or no) scale and
tagged 1, 0.5 and 0, respectively. We did not use the grade as a threshold for the inclusion decision, but rather to identify the primary studies that would form a valid foundation for our study. We note that, overall, the quality of the studies was good. Every grade can be checked in Appendix A.3, where the most relevant studies are highlighted.
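To illustrate the grading arithmetic, the sketch below computes a study's quality score from its sixteen per-criterion grades; the example grades are made up, not taken from any actual primary study.

```python
# Illustrative sketch of the trichotomous grading used in the quality
# evaluation: each of the 16 criteria in Table 3.2 is graded yes (1),
# partly (0.5) or no (0), and a study's score is the sum of its grades.
GRADE = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(grades):
    """Sum the numeric values of the 16 per-criterion grades of one study."""
    assert len(grades) == 16, "Table 3.2 defines exactly 16 criteria"
    return sum(GRADE[g] for g in grades)

# Hypothetical grades for one primary study, criteria 1..16 in Table 3.2 order.
example = ["yes", "partly", "yes", "no", "yes", "yes", "no", "no",
           "yes", "no", "partly", "yes", "yes", "partly", "no", "no"]
print(quality_score(example))  # 8.5 for this made-up study
```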
3.5.5 Data Extraction
The data extraction forms were designed to collect all the information needed to address the research questions and the quality criteria. The following information was extracted from each study: title and authors; source (conference/journal); publication year; the answers to the research questions addressed by the study; a summary, i.e. a brief overview of its strengths and weak points; the quality criteria score according to Table 3.2; reviewer name; and the date of the review.
At the beginning of the study, we decided that when several studies were reported in the same paper, each relevant study would be treated separately. However, this situation did not occur.
3.6 Outcomes
In this section, we describe the classification scheme and the results of data extraction. Once the classification scheme is in place, the relevant studies are sorted into it; this constitutes the actual data extraction process. The result of this process is the mapping of studies, presented at the end of this section together with concluding remarks.
3.6.1 Classification Scheme
We decided to categorize studies in facets, as described by Petersen et al. (2008), since we considered this a structured way of performing such a task. Our classification scheme combined two facets. One facet structured the topic in terms of the research questions we defined. The other considered the type of research.
For the second facet, our study used the classification of research approaches described by Wieringa et al. (2006). According to Petersen et al. (2008), who also used this approach, the research facet, which reflects the research approach used in the papers, is general and independent of a specific focus area. The classes that form the research facet are described in Table 3.3. The classification was performed after applying the filtering process, i.e. only the final set of studies was classified and considered. The results of the classification are presented at the end of this section (Figure 3.8).
3.6.2 Results
In this sub-section, each topic presents the findings for one sub-research question, highlighting evidence gathered from the data extraction process. These results populate the classification scheme, which evolved during data extraction.
Testing Strategy
By analyzing the primary studies, we found a wide variety of testing strategies. Reuys et al. (2006) and Tevanlinna et al. (2004) present similar sets of strategies for SPL testing that are applicable to any development effort, since the descriptions of the strategies are generic. We herein use the titles of the topics they outlined, after some adjustments, as a structure for aggregating other studies which use a similar approach, as follows:
• Testing product by product: This approach ignores the possibility of reuse benefits. It offers the best guarantee of product quality but is extremely costly. In (Jin-hua et al., 2008), a similar approach is presented, called the pure application strategy, in which testing is performed only for a concrete product during product development. No test is performed in core asset development. Moreover, in this strategy, tests for each derived application are developed independently from each other, which results in an extremely high test effort, as pointed out by (Reuys et al., 2006). This testing strategy is similar to testing in single-product engineering, because without reuse the same test effort is required for each new application.
• Incremental testing of product lines: The first product is tested individually and the
following products are tested using regression testing techniques (Graves et al., 2001;
Rothermel and Harrold, 1996). Regression testing focuses on ensuring that everything
used to work still works, i.e. the product features previously tested are re-tested through a
regression technique.
• Opportunistic reuse of test assets: This strategy is applied to reuse application test assets. Assets are developed for one application; then, the applications derived from the product line use the assets developed for that first application. This form of reuse is not performed systematically, which means that there is no method that supports the activity of selecting the test assets (Reuys et al., 2006).
• Design test assets for reuse: Test assets are created as early as possible in domain engineering. Domain testing aims at testing common parts and preparing for testing variable parts (Jin-hua et al., 2008). In application engineering, these test assets are reused, extended and refined to test specific applications (Jin-hua et al., 2008; Reuys et al., 2006). General approaches to achieve core asset reuse are: repository, core asset certification, and partial integration (Hui Zeng and Rine, 2004). Kishi and Noda (2006) state that a verification model can be shared among applications that have similarities. The SPL principle of design for reuse is fully addressed by this strategy, which can enable the overall goals of reducing cost, shortening time-to-market, and increasing quality (Reuys et al., 2006).
• Division of responsibilities: This strategy relates to selecting the testing levels to be applied in domain and application engineering, depending upon the objective of each phase, i.e. whether one is developing for or with reuse (Tevanlinna et al., 2004). This division can be clearly seen when the assets are unit tested in domain engineering and, when instantiated in application engineering, integration, system and acceptance testing are performed.
Table 3.3 Research Type Facet
Validation Research: Techniques investigated are novel and have not yet been implemented in practice. Techniques used are, for example, experiments, i.e., work done in the lab.
Evaluation Research: Techniques are implemented in practice and an evaluation of the technique is conducted. That means it is shown how the technique is implemented in practice (solution implementation) and what the consequences of the implementation are in terms of benefits and drawbacks (implementation evaluation). This also includes identifying problems in industry.
Solution Proposal: A solution for a problem is proposed; the solution can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.
Philosophical Papers: These papers sketch a new way of looking at existing things by structuring the field in the form of a taxonomy or conceptual framework.
Opinion Papers: These papers express the personal opinion of somebody on whether a certain technique is good or bad, or how things should be done. They do not rely on related work and research methodologies.
Experience Papers: Experience papers explain what and how something has been done in practice. It has to be the personal experience of the author.
As SPL testing is a reuse-based test derivation for testing products within a product line, as pointed out by (Hui Zeng and Rine, 2004), the Testing product by product and Opportunistic reuse of test assets strategies cannot be considered “affordable” in the SPL context: the first does not consider the reuse benefits, which results in testing costs resembling those of single-system development; in the second, no method is applied, hence the activity may not be repeatable and may not avoid the redundant re-execution of test cases, which can increase costs.
These strategies can be considered a feasible grouping of what studies on SPL testing
approaches have been addressing, which can show us a more generic view on the topic.
Static and Dynamic Analysis
An effective quality strategy for a software product line requires both static and dynamic analysis
techniques. Techniques for static analysis are often dismissed as more expensive, but in a
software product line, the cost of static analysis can be amortized over multiple products.
A number of studies advocate the use of inspections and walkthroughs (Jaring et al., 2008; McGregor, 2001b; Tevanlinna et al., 2004) and formal verification techniques as static analysis techniques/methods for SPL, to be conducted prior to dynamic analysis (which requires the presence of executable code). McGregor (2001b) presents an approach for Guided Inspection, aimed at applying the discipline of testing to the review of non-software assets. In (Kishi and Noda, 2006), a model checker is defined that focuses on design verification instead of code verification. This strategy is effective because many defects are injected during the design phase (Kishi and Noda, 2006).
Regarding dynamic analysis, some studies (Jaring et al., 2008; Kolb and Muthig, 2006) recommend the V-model phases, commonly used for single systems, to structure a series of dynamic analyses. The V-model gives equal weight to development and testing rather than treating testing as an afterthought (Goldsmith and Graham, 2002). However, despite the well-defined test process presented by the V-model, its use in the SPL context requires some adaptation, as applied in (Jaring et al., 2008).
The relative amount of dynamic and static analysis depends on both technical and managerial strategies. Technically, factors such as test-first development or model-based development determine the focus: model-based development emphasizes static analysis of models, while test-first development emphasizes dynamic analysis. Managerial strategies such as reduced time to market, lower cost and improved product quality determine the depth to which the analysis should be carried out.
Testing Levels
Some of the analyzed studies (e.g. (Jaring et al., 2008; Kolb and Muthig, 2006)) divide SPL
testing according to the two primary software product line activities: core asset and product
development.
Core asset development: Some testing activities are related to the development of test assets
and test execution to be performed to evaluate the quality of the assets, which will be further
instantiated in the application engineering phase. The two basic activities include developing
test artifacts that can be reused efficiently during application engineering and applying tests to
the other assets created during domain engineering (Kamsties et al., 2003; Pohl et al., 2005).
Regarding types of testing, the following are performed in domain engineering:
• Unit Testing: Verification of the smallest unit of software implementation. This unit
can be basically a class, or even a module, a function, or a software component. The
granularity level depends on the strategy adopted. The purpose of unit testing is to
determine whether this basic element performs as required through verification of the
code produced during the coding phase.
• Integration Testing: This testing is applied as the modules are integrated with each other, or within the reference architecture in domain-level V&V, when the architecture calls for specific domain components to be integrated in multiple systems. This type of testing is also performed during application engineering (McGregor, 2002). Li et al. (2007) present an approach for generating integration tests from unit tests.
Product development: Activities here are related to the selection and instantiation of assets
to build specific product test assets, design additional product specific tests, and execute tests.
The following types of testing can be performed in application engineering:
• System Testing: System testing ensures that the final product matches the required
features (Nebut et al., 2006). According to (Geppert et al., 2004), system testing evaluates
the features and functions of an entire product and validates that the system works the way
the user expects. A form of system testing can be carried out on the software architecture
using a static analysis approach.
• Acceptance Testing: Acceptance testing is conducted by the customer but often the
developing organization will create and execute a preliminary set of acceptance tests. In a
software product line organization, commonality among the tests needed for the various
products is leveraged to reduce costs.
A similar division is given by McGregor (2002), in which the author defines two separate test processes used in a product line organization: Core Asset Testing and Product Testing.
Some authors (Olimpiew and Gomaa, 2005a; Reuys et al., 2006; Wübbeke, 2008) also
include system testing in core asset development. The rationale for including such a level is
to produce abstract test assets to be further reused and adapted when deriving products in the
product development phase.
Regression Testing
Even though regression testing techniques have been researched for many years, as stated in
(Engström et al., 2008; Graves et al., 2001; Rothermel and Harrold, 1996), no study gives
evidence on regression testing practices applied to SPL. Some information is presented by a few
studies (Kolb and Muthig, 2003; Muccini and van der Hoek, 2003), where just a brief overview
on the importance of regression testing is given, but they do not take into account the issues
specific to SPLs.
McGregor (2001b) reports that when a core asset is modified due to evolution or correction, it is tested using a blend of regression testing and development testing. According to him, the modified portion of the asset should be exercised using:
• Existing functional tests, if the specification of the asset has not changed;
• New functional tests, created and executed if the specification has changed; and
• Structural tests created to cover the new code written during the modification.
He also highlights the importance of regression test selection techniques and the automation
of the regression execution.
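A minimal sketch of this retest rule, assuming each modified asset records whether its specification and/or code changed (the flags, names and test identifiers are hypothetical):

```python
# Hypothetical sketch of McGregor's retest rule for a modified core asset:
# reuse existing functional tests when the specification is unchanged, create
# new functional tests when it changed, and always add structural tests that
# cover the newly written code.
def select_regression_tests(asset, existing_functional_tests):
    selected = []
    if asset["spec_changed"]:
        selected.append("design new functional tests for the revised specification")
    else:
        selected.extend(existing_functional_tests)
    if asset["code_changed"]:
        selected.append("structural tests covering the modified code")
    return selected

asset = {"name": "PaymentComponent", "spec_changed": False, "code_changed": True}
print(select_regression_tests(asset, ["ft_checkout", "ft_refund"]))
# ['ft_checkout', 'ft_refund', 'structural tests covering the modified code']
```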
Kauppinen and Taina (2003) advocate that the testing process should be iterative: based on test execution results, new test cases should be generated, and test scripts may be updated during a modification. These test cases are repeated during regression testing each time a modification is made.
Kolb (2003) highlights that the major problems in a SPL context are the large number of
variations and their combinations, redundant work, the interplay between generic components
and product-specific components, and regression testing.
Jin-hua et al. (2008) emphasize the importance of regression testing when a component or a related component cluster is changed, stating that regression testing is crucial at the application architecture level, where it aims to evaluate the application architecture against its specification. Some researchers have also developed approaches to evaluate architecture-based software by using regression testing (Harrold, 1998; Muccini et al., 2005, 2006).
Non-functional Testing
Non-functional issues have a great impact on architecture design, where the predictability of the non-functional characteristics of any application derived from the SPL is crucial for any resource-constrained product. These characteristics are well-known quality attributes, such as response time, performance, availability and scalability, that might differ between instances of a product line. According to (Ganesan et al., 2005), testing non-functional quality attributes is as important as functional testing.
By analyzing the studies, it was noticed that some of them propose the creation or execution
of non-functional tests. Reis (2006) presents a technique to support the development of reusable
performance test scenarios to be reused later in application engineering. Feng et al. (2007) highlight the importance of non-functional concerns (performance, reliability, dependability, etc.). Ganesan et al. (2005) describe work intended to develop an environment for testing the response time and load of a product line; however, due to the constrained experimental environment, no visible performance degradation was observed.
In single-system development, different non-functional testing techniques are applicable to different types of testing; the same might hold for SPL, but no experience reports were found to support this statement.
Commonality and Variability Testing
Commonality, as an inherent concept in SPL theory, is naturally addressed by many studies; as stated by Pohl et al. (2005), the major task of domain testing is the development of common test artifacts to be further reused in application testing.
The increasing size and complexity of applications can result in a higher number of variation
points and variants, which makes testing all combinations of the functionality almost impossible
in practice. Managing variability and testability is a trade-off. The large amount of variability in
a product line increases the number of possible testing combinations. Testing techniques that consider variability issues, and thereby reduce effort, are thus required.
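The growth is easy to quantify: for independent variation points, the number of complete configurations is the product of the variant counts. A small sketch with an invented feature model:

```python
# Illustrative arithmetic behind the testing-combinations explosion: with
# independent variation points, exhaustive coverage needs the product of all
# variant counts, which grows multiplicatively with each new variation point.
from math import prod
from itertools import product

# Hypothetical variation points of a product line and their variants.
variation_points = {
    "os":       ["linux", "windows", "rtos"],
    "display":  ["mono", "color"],
    "network":  ["none", "wifi", "bluetooth", "ethernet"],
    "language": ["en", "pt"],
}

total = prod(len(v) for v in variation_points.values())
print(total)   # 3 * 2 * 4 * 2 = 48 complete configurations to test exhaustively

# Enumerating them shows why sampling strategies (e.g. covering arrays) matter.
configs = list(product(*variation_points.values()))
assert len(configs) == total
```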
Cohen et al. (2006) introduce cumulative variability coverage, which accumulates coverage information through a series of development activities, to be further exploited in targeted testing activities for product line instances.
Another solution, proposed by Kolb and Muthig (2006), is the imposition of constraints in the architecture. Instead of having components with a large amount of variability, it is better for testability to separate commonalities and variabilities and to encapsulate variabilities as sub-components. Independence of features and components, as well as the reduction of side effects, reduces the retesting of components and products when modifications are performed, and hence the effort required for adequate testing.
Tevanlinna et al. (2004) highlight the importance of asset traceability from requirements
to implementation. There are some ways to achieve this traceability between test assets and
implementation, as reported by McGregor et al. (2004), in which the design of each product line
test asset matches the variation implementation mechanism for a component.
The selected approaches handle variability in a range of different manners, usually making variability explicit as early as possible in UML use cases (Hartmann et al., 2004; Kang et al., 2007; Rumbaugh et al., 2004) that will later be used to design test cases, as described in the requirement-based approaches (Bertolino and Gnesi, 2003a; Nebut et al., 2003). Moreover, model-based approaches introduce variability into test models, created through use cases and their scenarios (Reuys et al., 2005, 2006), or specify variability in feature models and activity diagrams (Olimpiew and Gomaa, 2005a, 2009). They are usually concerned with reusing test cases in a systematic manner through variability handling, as (Al-Dallal and Sorenson, 2008; Wübbeke, 2008) report.
Variant Binding Time
According to (McGregor et al., 2004), different variants require different binding times (compile time, link time, execution time and post-execution time), which in turn require different mechanisms (e.g. inheritance, parameterization, overloading and conditional compilation), each suitable for different variability implementation schemes. The different mechanisms result in different types of defects, test strategies, and test processes.
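Two of the cited mechanisms are easy to illustrate. Since Python resolves everything at execution time, the sketch below shows only the inheritance and parameterization mechanisms (with invented variant names), not the earlier binding times such as compile or link time:

```python
# Hypothetical sketch of two of the variant-binding mechanisms cited above.
# 1) Inheritance: a variant overrides behaviour of a common base component.
class Encoder:
    def encode(self, data):
        return data

class CompressedEncoder(Encoder):        # variant bound by subclassing
    def encode(self, data):
        return f"compressed({data})"

# 2) Parameterization: the variant is chosen by a constructor argument,
# i.e. the variation point is bound when the object is created.
class Logger:
    def __init__(self, verbose=False):   # 'verbose' is the variation point
        self.verbose = verbose
    def log(self, msg):
        return msg if not self.verbose else f"[detail] {msg}"

print(CompressedEncoder().encode("abc"))    # compressed(abc)
print(Logger(verbose=True).log("started"))  # [detail] started
```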
This issue is also addressed by Jaring et al. (2008) in their Variability and Testability Interaction Model, which models the interaction between variability binding and testability in the context of the V-model. The decision regarding the best moment to test a variant is clearly important: the earliest point at which a decision is bound is the point at which the binding should be tested.
Among our findings, the approach presented in (Reuys et al., 2006) deals with testing variant binding time as a way of ensuring that the application comprises the correct set of features, as the customer expects. After performing the traditional test phases in application engineering, the approach suggests performing tests to verify that the application contains the required set of functionalities, and nothing else.
Effort Reduction
Some authors consider testing the bottleneck in SPL, since testing product lines is becoming more costly than testing single systems (Kolb, 2003; Kolb and Muthig, 2006). Although applications in a SPL share common components, they must be tested individually at the system testing level. This high cost makes testing an attractive target for improvements (Northrop and Clements, 2007). Test effort reduction strategies can have a significant impact on productivity and profitability (McGregor, 2001a). We found some strategies regarding this issue, described as follows:
• Reuse of test assets: Test assets - mainly test cases, test scenarios and test results (McGregor, 2001a) - are created to be reusable, which consequently reduces effort. According to (Kauppinen and Taina, 2003) and (Hui Zeng and Rine, 2004), one approach to achieving the reuse of core assets is an asset repository. It usually requires an initial testing effort for its construction, but throughout the process these assets do not need to be rebuilt; rather, they can be used as-is. Another strategy considers creating test assets as extensively as possible in domain engineering, anticipating the variabilities by creating document templates and abstract test cases. Test cases and other concrete assets are used as-is, and the abstract ones are extended or refined to test the product-specific aspects in application engineering. In (Juan Jenny Li and Weiss, 2007), a method is proposed for monitoring the interfaces of every component during test execution, observing commonality issues in order to avoid repetitive execution.
As mentioned before in Section 3.6.2, the systematic reuse of test assets, especially test cases, is the focus of many studies, each offering novel and/or extended approaches. The reason for dealing with asset reuse in a systematic manner is that it can enable effort reduction, since redundant work may be avoided when deriving many products from the product line (a selection sketch follows this list). In this context, the search for an effective approach has been noticeable in recent years, as can be seen in (McGregor, 2001a, 2002; Nebut et al., 2006; Olimpiew and Gomaa, 2009; Reuys et al., 2006). Hence, it is feasible to infer that there is no general solution for dealing with systematic reuse in SPL testing yet.
• Test automation tools: Automatic testing tools to support testing activities (Condron, 2004) are a way to achieve effort reduction. Methods have been proposed to automatically generate test cases from single-system models in the expectation of reducing testing effort (Hartmann et al., 2004; Li et al., 2007; Nebut et al., 2003), such as mapping the models of an SPL to functional test cases in order to automatically generate and select functional test cases for a derived application (Olimpiew and Gomaa, 2005b). Automatic test execution is an activity that should be carefully managed to avoid false failures, since unanticipated or unreported changes can occur in the component under test; such changes should be reflected in the corresponding automated tests (Condron, 2004).
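As a concrete illustration of such systematic selection, the hypothetical sketch below tags each reusable test case with the features it exercises and selects, for a derived product, exactly the test cases whose features the product includes:

```python
# Hypothetical sketch: systematic reuse of test assets via feature tags.
# Each domain-level test case declares the features it exercises; deriving a
# product selects only the test cases whose features the product actually has,
# avoiding redundant re-execution across products.
test_repository = {
    "tc_login":      {"authentication"},
    "tc_sso_login":  {"authentication", "single-sign-on"},
    "tc_export_pdf": {"export", "pdf"},
    "tc_export_csv": {"export", "csv"},
}

def select_tests(product_features):
    """Return the reusable test cases applicable to one derived product."""
    return sorted(tc for tc, feats in test_repository.items()
                  if feats <= product_features)   # all tagged features present

print(select_tests({"authentication", "export", "csv"}))
# ['tc_export_csv', 'tc_login']
```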
Test Measurement
Test measurement is an important activity applied in order to calibrate and adjust approaches.
Adequacy of testing can be measured based on the concept of a coverage criterion. Metrics
related to test coverage are applied to extract information, and are useful for the whole project.
We investigated how test coverage has been applied by existing approaches regarding SPL issues.
According to (Tevanlinna et al., 2004), there is only one way to completely guarantee that a program is fault-free: to execute it on all possible inputs, which is usually impossible or at least impractical. It is even more difficult if the variations and all their constraints are considered. Test coverage criteria are a way to measure how completely a test suite exercises the capabilities of a piece of software. These measures can be used to define the space of inputs to a program; it is then possible to systematically sample this space and test only a portion of the feasible system behavior (Cohen et al., 2006). The use of covering arrays as a test coverage strategy is addressed in (Cohen et al., 2006). Kauppinen and Tevanlinna (Kauppinen et al., 2004) define coverage criteria for estimating the adequacy of testing in a SPL context. They propose two coverage criteria for framework-based product lines, hook and template coverage: variation points open for customization in a framework are implemented as hook classes and stable parts as template classes. These criteria measure the coverage of frameworks, or of other collections of classes in an application, by counting the structures or hook method references from them instead of single methods or classes.
3.6.3 Analysis of the Results and Mapping of Studies
Figure 3.6 Distribution of papers according to classification scheme.
The analysis of the results enables us to present the number of studies that match each category addressed in this study. This makes it possible to identify what has been emphasized in past research, and thus to identify gaps and possibilities for future research (Petersen et al., 2008).
Figure 3.7 Distribution of papers according to intervention.
Figure 3.8 Visualization of a Systematic Map in the Form of a Bubble Plot.
Initially, let us analyze the distribution of studies from our analysis point of view. Figures 3.6 and 3.7 present, respectively, the frequencies of publications according to the classes of the research facet and according to the research questions addressed by them (represented by Q1 to Q9). Table 3.4 details Figure 3.7, showing which papers answer each research question. It is worth mentioning that, in both categories, a study could match more than one topic; hence, the totals in Figures 3.6 and 3.7 exceed the final set of primary studies selected for detailed analysis.
Merging these two categories gives a quick overview of the evidence gathered from the analysis of the SPL testing field. We used a bubble plot to represent the interconnected frequencies, as shown in Figure 3.8. This is basically an x-y scatterplot with bubbles at category intersections, where the size of a bubble is proportional to the number of articles in the pair of categories corresponding to the bubble coordinates (Petersen et al., 2008).
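Such a bubble plot is straightforward to reproduce; the sketch below uses matplotlib with illustrative counts (the real frequencies are those shown in Figure 3.8):

```python
# Minimal sketch of the systematic-map bubble plot: an x-y scatter where the
# bubble area grows with the number of articles in each category pair.
# The counts below are made up, not the actual data of Figure 3.8.
import matplotlib.pyplot as plt

questions = ["Q1", "Q2", "Q3", "Q4"]               # research-question facet (x)
types = ["Validation", "Evaluation", "Solution"]   # research-type facet (y)
counts = {("Q1", "Solution"): 12, ("Q1", "Validation"): 7,
          ("Q3", "Solution"): 9, ("Q2", "Evaluation"): 2,
          ("Q4", "Validation"): 3}

xs = [questions.index(q) for q, t in counts]
ys = [types.index(t) for q, t in counts]
sizes = [150 * n for n in counts.values()]         # bubble area ~ article count

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=sizes, alpha=0.5)
for (q, t), n in counts.items():                   # print the count in each bubble
    ax.annotate(str(n), (questions.index(q), types.index(t)),
                ha="center", va="center")
ax.set_xticks(range(len(questions)))
ax.set_xticklabels(questions)
ax.set_yticks(range(len(types)))
ax.set_yticklabels(types)
ax.set_xlim(-0.5, len(questions) - 0.5)
ax.set_ylim(-0.5, len(types) - 0.5)
plt.show()
```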
The classification scheme applied in this study enabled us to infer that researchers are mostly in the business of proposing new techniques and investigating their properties, rather than evaluating and/or experiencing them in practice, as seen in Figure 3.8. Solution Proposal and Validation Research are, together, the categories with the most entries among those considered in this study.
Topics such as Q1 (testing strategies), Q3 (testing levels), Q6 (commonality and variability analysis) and Q8 (effort reduction) concentrate the papers devoted to proposing solutions for the problems they cover; they have been the overall focus of researchers. On the other hand, we have identified topics for which new solutions are required: Q2 (static and dynamic analysis interconnection in SPL testing), Q4 (regression testing), Q5 (non-functional testing), Q7 (variant binding time) and Q9 (measures).
Although some topics present a considerable number of entries in this analysis, such as Q1, Q3, Q6 and Q8, as aforementioned, they still lack field research, since the techniques investigated and proposed are mostly novel and have usually not yet been implemented in practice. We realize that, currently, Evaluation Research is weak in SPL testing papers. Regarding the maturity of the field in terms of evaluation research and solution papers, other studies report results in line with our findings, e.g. (Šmite et al., 2010). Hence, we realize that this is not a problem solely of SPL testing; rather, it involves, to a certain extent, other software engineering practices as well.
Table 3.4: Research Questions (RQ) and primary studies.
Q1: (Al-Dallal and Sorenson, 2008; Bertolino and Gnesi, 2003a,b; Edwin, 2007; Hui Zeng and Rine, 2004; Jaring et al., 2008; Jin-hua et al., 2008; Kang et al., 2007; Kauppinen et al., 2004; Kishi and Noda, 2006; Kolb, 2003; Kolb and Muthig, 2003, 2006; McGregor, 2001b, 2002; Olimpiew and Gomaa, 2005a, 2009; Reis, 2006; Reis et al., 2007; Reuys et al., 2005, 2006; Wübbeke, 2008)
Q2: (Al-Dallal and Sorenson, 2008; Denger and Kolb, 2006; Edwin, 2007; Kishi and Noda, 2006; McGregor, 2001b)
Q3: (Al-Dallal and Sorenson, 2008; Edwin, 2007; Geppert et al., 2004; Hui Zeng and Rine, 2004; Jaring et al., 2008; Jin-hua et al., 2008; Juan Jenny Li and Weiss, 2007; Kamsties et al., 2003; Kauppinen, 2003; Kolb and Muthig, 2003, 2006; Li et al., 2007; McGregor, 2001b, 2002; Muccini and van der Hoek, 2003; Nebut et al., 2006; Olimpiew and Gomaa, 2005a; Pohl and Sikora, 2005; Reis et al., 2007; Reuys et al., 2006; Wübbeke, 2008)
Q4: (Harrold, 1998; Jin-hua et al., 2008; Kauppinen and Taina, 2003; Kolb and Muthig, 2003; McGregor, 2001b; Muccini and van der Hoek, 2003)
Q5: (Feng et al., 2007; McGregor, 2001b, 2002; Nebut et al., 2003; Reis, 2006)
Q6: (Al-Dallal and Sorenson, 2008; Beatriz Pérez Lamancha, 2009; Bertolino and Gnesi, 2003a,b; Cohen et al., 2006; Condron, 2004; Edwin, 2007; Feng et al., 2007; Geppert et al., 2004; Hui Zeng and Rine, 2004; Jaring et al., 2008; Juan Jenny Li and Weiss, 2007; Kamsties et al., 2003; Kang et al., 2007; Kishi and Noda, 2006; Kolb and Muthig, 2006; Li et al., 2007; McGregor et al., 2004; Nebut et al., 2006; Olimpiew and Gomaa, 2009; Pohl and Metzger, 2006; Pohl and Sikora, 2005; Reis, 2006; Reis et al., 2007; Reuys et al., 2005, 2006; Wübbeke, 2008)
Q7: (Cohen et al., 2006; Jaring et al., 2008; Jin-hua et al., 2008; McGregor et al., 2004; Pohl and Metzger, 2006)
Q8: (Al-Dallal and Sorenson, 2008; Bertolino and Gnesi, 2003a; Condron, 2004; Edwin, 2007; Feng et al., 2007; Ganesan et al., 2005; Geppert et al., 2004; Hui Zeng and Rine, 2004; Jaring et al., 2008; Juan Jenny Li and Weiss, 2007; Kang et al., 2007; Kauppinen, 2003; Kauppinen and Taina, 2003; Kishi and Noda, 2006; Kolb and Muthig, 2006; Li et al., 2007; McGregor, 2001b; Nebut et al., 2003, 2006; Olimpiew and Gomaa, 2009; Pohl and Metzger, 2006; Reis et al., 2007; Reuys et al., 2005, 2006)
Q9: (Al-Dallal and Sorenson, 2008; Ganesan et al., 2005; Jin-hua et al., 2008; Kauppinen, 2003; Olimpiew and Gomaa, 2009; Reuys et al., 2006)
We also note that researchers rarely publish Experience Reports on their personal experience using particular approaches. Practitioners in the field should report results on the real-world adoption of the techniques proposed and reported in the literature.
Moreover, authors should Express Opinions about the desirable direction of SPL Testing research, stating their expert viewpoints.
In fact, the volume of literature devoted to testing software product lines attests to the
importance assigned to it by the product line community. In the following subsection we detail
what we considered most relevant in our analysis.
Main findings of the study
We identified a number of test strategies that have been applied to software product lines. Many of these strategies address different aspects of the testing process and can be applied simultaneously. However, we have no evidence about the effectiveness of combining strategies, nor about the contexts in which combinations would be suitable; the analyzed studies do not cover this potential. There is only a brief indication that the decision about which kind of strategy to adopt depends on a set of factors such as the software development process model, the languages used, company and team size, delivery time, budget, etc. Moreover, it is a decision made in the planning stage of the product line organization, since the strategy affects activities that begin during requirements definition. But these remain hypotheses that need to be supported or refuted through formal experiments and/or case studies.
A complete testing process should define both static and dynamic analyses. We found that
even though some studies emphasize the importance of static analysis, few detail how this is
performed in a SPL context (Kishi and Noda, 2006; McGregor, 2001b; Tevanlinna et al., 2004),
despite its relevance in single-system development. Static analysis is particularly important in a
product line process since many of the most useful assets are non-code assets and particularly
the quality of the software architecture is critical to success.
Specific testing activities are divided across the two types of activities: domain engineering and application engineering. From the set of studies, around four (Edwin, 2007; Jaring et al., 2008; Jin-hua et al., 2008; Kauppinen, 2003) adopt (or advocate the use of) the V-model as an approach to represent testing throughout the software development life cycle. As a strategy widely adopted in single-system development, tailoring the V-model to SPL could result in improved quality. However, there is no consensus on the correct set of testing levels for each SPL phase.
We did not find evidence regarding the impact on the SPL of not performing a specific testing level in domain or application engineering. Is there any consequence if, for example, unit/integration/system testing is not performed in domain engineering? We need investigations to verify this aspect. Moreover, what adaptations are needed for the V-model to be effective in the SPL context? This is a point where experimentation is welcome, in order to understand the behavior of testing levels in SPL.
A number of the studies addressed, or assumed, that testing activities are automated (e.g.
(Condron, 2004; Li et al., 2007)). In a software product line automation is more feasible because
the resources required to automate are amortized over the larger number of products. The
resources are also more narrowly focused due to the overlap of the products. Some of the studies
illustrated that the use of domain specific languages, and the tooling for those languages, is
more feasible in a software product line context. Nevertheless, we need to understand if the
techniques are indeed effective when applying them in an industrial context. We lack studies
reporting results of this nature.
According to (Kolb, 2003), one of the major problems in testing product lines is the large number of variations. That study reinforces the importance of handling variability testing throughout the whole software life cycle.
In particular, the effect of variant binding time concerns was considered in this study. A well-defined approach was found in (Jaring et al., 2008), with information provided by case studies conducted at a major electronics manufacturer. However, there are still many open issues regarding variation and testing: What is the impact of designing variations into test assets with respect to effort reduction? What is the most suitable strategy for handling variability within test assets: use cases and test cases, or perhaps sequence or class diagrams? How should traceability be handled, and what is the impact of not handling it, with respect to test assets? We also did not find information about the impact of different binding times for testing in SPL, e.g. compile time, scoping time, etc. We lack evidence in this direction as well.
Regression testing does not belong to any one point in the software development life cycle, and as a result there is a lack of clarity about how regression testing should be handled. Despite this, it is clear that regression testing is important in the SPL context. Regression testing techniques include approaches for selecting the smallest test suite that will still find the most likely defects, and techniques that make automation of test execution efficient.
Of the studies analyzed, only a few addressed testing non-functional requirements (Feng et al., 2007; McGregor, 2001b, 2002; Nebut et al., 2003; Reis, 2006). They point out that, during architecture design, static analysis can be used to give an early indication of problems with non-functional requirements. One important point to consider when testing quality attributes is the presence of trade-offs among them, for example between modularity and testability. This leads to natural pairings of quality attributes and their associated tests. When a variation point represents a variation in a quality attribute, the static analysis should be sufficiently complete to investigate the different outcomes. Investigations are needed to make explicit which techniques currently applied in single-system development can be adopted for SPL, since the studies do not address this issue.
Our mapping study has highlighted a number of areas in which additional investigation would be useful, especially regarding evaluation and validation research. In general, SPL testing lacks evidence in many respects. Regression test selection techniques, test automation and architecture-based regression testing are points for future research, as are techniques that address the relationships between variability and testing and techniques to handle traceability between test and development artifacts.
As mentioned earlier, the analysis of related literature provided by the mapping study served as the basis for defining the research directions to be followed in the context of this dissertation. Hence, among the perceived research needs, a suitable starting point is the definition of a process for testing in SPL, since few details are provided by existing studies. The process should be designed in such a way that its role in SPL development can be assessed.
3.7 Threats to Validity
There are some threats to the validity of our study. They are described and detailed as follows:
• Research Questions: The set of questions we defined might not have covered the whole SPL testing area, which implies that one may not find answers to every question of interest. As we considered this a real threat, we held several discussion meetings with project members and experts in the area in order to calibrate the questions. This way, even if we did not select the optimal set of questions, we attempted to address in depth the most frequently raised open issues in the field.
• Publication Bias: We cannot guarantee that all relevant primary studies were selected. It
is possible that some relevant studies were not chosen throughout the searching process.
We mitigated this threat to the extent possible by following references in the primary
studies.
• Quality Evaluation: The quality attributes, as well as the weights used to quantify each of them, might not properly represent the attributes' importance. In order to mitigate this threat, the quality attributes were grouped into subsets to facilitate their further classification.
• Unfamiliarity with other fields: The terms used in the search strings can have many synonyms; thus, it is possible that we overlooked some work.
3.8 Chapter Summary
This Chapter presented a mapping study conducted with the goal of investigating the state-of-the-art in SPL testing, by systematically mapping the literature in order to determine which issues have been studied, and by what means, and to provide a guide to aid researchers in planning future research. The research was conducted as a Mapping Study, a useful technique for identifying the areas where there is sufficient information for a SR to be effective, as well as those areas where more research is needed (Budgen et al., 2008).
Searching the literature, we found that some important aspects are not reported, and when they are, often only a brief overview is given. Industrial experiences, in particular, are rare in the literature. The existing case studies report on small projects, with results obtained from company-specific applications, which makes their reproduction in other contexts impracticable due to the lack of details. This scenario shows the need to evaluate SPL testing approaches not just in academia but in industry. This study identified the growing interest in a well-defined SPL testing process, including tool support. Our findings in this sense are in line with a previous study conducted by Lamancha et al. (2009), which reports on a systematic review on SPL testing, as mentioned in Section 3.2.
The number of approaches that handle specific points of a testing process makes analysis and comparison a hard task. Nevertheless, through this study we were able to identify which activities are handled by the existing approaches, as well as to understand how researchers are developing work in SPL testing. Several research points were identified throughout this research, and these can be considered important input for planning further research.
This mapping also pointed out topics that need additional investigation, such as quality attribute testing considering variations in quality levels among products, how to maintain traceability between development and test assets, the management of variability throughout the whole development life cycle, and reasoning about systematic processes to support testing in SPL.
This study was indeed fundamental to establish the basis for our research. From the set of improvement opportunities identified by analyzing the results of this mapping study, we decided to move in the direction of defining a systematic process for handling testing activities in SPL projects, elaborating on related approaches. We believe this will serve as a starting point towards understanding how testing can be (effectively) performed, since related research does not provide enough information to make the application of testing practices feasible in real projects. Hence, the next Chapter describes the process built upon the information obtained by analyzing the results of this study.
4 RiPLE-TE: a Process for Product Line Testing
This chapter introduces the RiPLE-TE, a process for testing in Software Product Lines, by describing its activities, roles and workflow, defining a number of key terms, and thereby explaining how it works.
The RiPLE-TE provides a structured process for testing in SPL projects. Indeed, the challenge in establishing a practical software engineering process is to develop clear and definitive steps that are well explained and easily understood by practitioners. This continues to hold in SPL engineering, where processes are intended to work not only with small but also with complex software systems.
4.1 Overview of the Process
The RiPLE-TE is a process for testing in SPL projects, part of the RiPLE - RiSE Product Line Engineering Process - a framework for SPL development comprising the following disciplines: Scoping, Requirements, Design, Implementation, Evolution Management, and Testing, thus encompassing the whole SPL development life cycle. The RiPLE project has been developed in the context of the RiSE Labs1.
In the RiPLE-TE, the tests are managed in parallel to the production assets to keep them
synchronized. Moreover, the assets are designed to handle the range of variability defined in
prior disciplines, such as Scoping, Requirements and Design.
Testing in the context of a product line includes testing the core assets' software, the product-specific software, and their interactions, conducted within the context of other disciplines (McGregor, 2001b). Hence, the RiPLE-TE was developed to support testing in both SPL phases: Core Asset and Product Development. Figure 4.1 depicts the main flow of the RiPLE-TE.
1 http://labs.rise.com.br
Figure 4.1 RiPLE-TE main flow.
Based on the results gained through the systematic mapping study (presented in Chapter 3), we decided to adopt two different processes for testing in SPL, considering the peculiarities of each phase. In Core Asset Development, when assets have to be developed with special attention to forthcoming reuse, we advocate that the unit and integration testing levels should be performed, whereas system and acceptance testing should be postponed to Product Development, where the previously developed assets will be reused.
The purpose of such a division is linked to the role of testing in each phase, and it is in line with other approaches, such as (McGregor, 2002). In the Core Asset phase, assets are produced from scratch, while in Product Development they are instantiated into products.
Therefore, at first, testing should be concerned with evaluating assets from their initial development effort. In terms of testing levels, the initial effort should be devoted to unit testing. Since the SPL approach relies on a component-based development (CBD) strategy, components act here as 'units'. Thus, in order to ensure that a component may be further reused, it should be tested under planned conditions. Although low coupling and high cohesion are considered the cornerstones of the modular software development enabled by CBD, in practice it is quite common to work with tightly coupled units. In these cases, after performing unit tests on a component and ensuring that it fulfills its specification, integration tests are then performed.
We advocate the use of both testing levels, since they are responsible for detecting different types of faults. Whereas unit testing independently tests methods, classes, and the interaction among the pieces which comprise a component, integration testing is responsible for testing the interaction among component interfaces and the integration between modules.
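To make the distinction concrete, the sketch below unit-tests a single component in isolation and then tests the interaction of two components through their interfaces, using Python's unittest; both components are invented for illustration.

```python
# Illustrative sketch: unit testing a component in isolation versus integration
# testing the interaction between two components. Both components are made up.
import unittest

class Sensor:                      # a core asset 'unit'
    def read(self):
        return 42

class Alarm:                       # a second unit that depends on a Sensor
    def __init__(self, sensor, threshold):
        self.sensor, self.threshold = sensor, threshold
    def triggered(self):
        return self.sensor.read() > self.threshold

class UnitTests(unittest.TestCase):
    def test_sensor_alone(self):
        # Unit level: the component is exercised independently.
        self.assertEqual(Sensor().read(), 42)

class IntegrationTests(unittest.TestCase):
    def test_sensor_alarm_interaction(self):
        # Integration level: two components interact through their interfaces.
        self.assertTrue(Alarm(Sensor(), threshold=10).triggered())
        self.assertFalse(Alarm(Sensor(), threshold=100).triggered())

if __name__ == "__main__":
    unittest.main()
```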
After unit and integration tests have been executed, the role of testing in the core asset development phase is over, since the set of conditions enabling further reuse of components has been reached. In short, core asset tests try to minimize product testing. This can be achieved by preserving variability in core assets to facilitate reuse.
Next, in Product Development, the phase in which products are instantiated, integration testing is performed once again. But why perform integration tests again? Given that our purpose is to avoid repetition and thus reduce the overall testing effort, in CAD it is only necessary to perform the integration between tightly coupled units, without integrating the whole set of components. This is even advisable given the nature of the core asset base, which contains several components, covering a diverse range of variations, that do not necessarily integrate with each other but should be ready for future integration in PD. This is when integration testing in PD takes place. At the moment of product derivation, a set of components is selected from the core asset base. We have already ensured that the components work individually as specified, as well as the integration among the tightly coupled ones. Then, it is time to test the integration of the larger set of components which will comprise the product, in order to ensure the workability of the interconnected modules as well.
Next comes system testing. This level focuses on evaluating the product as a whole against the previously defined requirements.
Acceptance tests are carried out after system testing, in order to receive customer feedback on the product just instantiated. Acceptance tests must be planned carefully with input from the customers and users, and acceptance test cases are based on requirements.
Figure 4.2 presents the RiPLE-TE expanded view of the interaction between testing levels and the SPL phases: (i) core asset development, represented by the activities horizontally aligned at the top; and (ii) product development, represented by the activities horizontally aligned at the bottom. This gives readers an overview of where the proposed process fits, showing a broader view of the process that is detailed throughout this Chapter.
Figure 4.2 Interaction among testing levels and SPL phases, in the context of the RiPLE-TE.
It is worth mentioning that the RiPLE-TE allows unit tests to be performed during Product Development. This is possible when a new requirement or feature is to be included in a specific product but does not yet belong to the Core Asset base. As soon as this new artifact is produced and new code is developed, unit tests are performed.
Whenever this occurs, however, we strongly recommend using the RiPLE-EM activity (Oliveira, 2009) related to the propagation of assets, in which every new artifact that initially pertains to only one product should be included in the core asset base, since reuse is encouraged. This eases asset management: it is easier to maintain and evolve assets in a single base than to have different assets spread across different products.
In addition, with such propagation we remain in line with the purpose of unit testing (McGregor, 2001b; Clements and Northrop, 2001), which should be performed in Core Asset development, while the core assets are being developed, since their quality must be assured before they are instantiated into derived products.
The RiPLE-TE was modeled using the Eclipse Process Framework (EPF) Composer2, a process management tool platform and conceptual framework, which provides an easy-to-learn user experience and simple-to-use features for authoring, tailoring, and deploying development process frameworks (Haumer, 2007). The EPF Composer is an Eclipse-based open source tool, available for free under the Eclipse Public License at the EPF homepage. The OMG SPEM (Software Process Engineering Meta-Model) (OMG, 2008), which defines a formal language for describing development processes, was used to model this process within the EPF Composer.
2 EPF Composer homepage - http://www.eclipse.org/epf/
Throughout this Chapter we detail the RiPLE-TE elements, namely their roles and responsibilities,
work products and activities, as well as the interaction among these elements, considering both
SPL phases. Moreover, we explicitly present how variability and reuse concerns are encouraged
and handled.
4.2 RiPLE-TE Roles and Responsibilities
Test efforts are complex, and require a test team capable of comprehending the scope and depth
of the required test effort and of developing a strategy for the test program. Diverse backgrounds
and experience are thus required for these professionals, known as test engineers. The test
engineering roles used in RiPLE-TE are: Test Manager, Test Architect, Test Designer and
Tester. Their responsibilities can be inherited from the traditional software development roles
(Ammann and Offutt, 2008; McGregor, 2001b).
A test engineer involved in a SPL project can 'wear many hats', i.e. he/she can assume more
than one role, just as one role can be commissioned to more than one engineer. The roles to
be set up will depend upon the task at hand. The team should have enough personnel to cover at
least this set of roles. Some organizations may extend this set to include additional specific
roles, for instance when individual testers become experts in a specific area and the need to
create a new role emerges. The process model we created is flexible enough to include such
roles without impact on it.
Besides the aforementioned test engineering roles, other stakeholders, usually orthogonal to
the remaining RiPLE disciplines, can be applicable to RiPLE-TE. The responsibilities associated
with each of them are listed below:
• SPL Manager: Responsible for the overall business enterprise and therefore concerned
with the entire set of past, current, and future products that comprise the product
line. He is thus in the best position to control and monitor testing.
• Core Asset Manager: Responsible for the set of core assets and associated infrastructure
that enables the streamlined development of the products in a product line, as well as for
maintaining versions and variants of all domain assets for the complete range of products.
• Product Manager: Responsible for the planning and evolution management of the
complete range of products, i.e. portfolio management. This involves planning of present
and future products in the product line, their features and their business value (Linden
et al., 2007).
• CCB (Change Control Board): The CCB is the group (or individual) responsible for the
analysis of both change and propagation requests (Oliveira, 2009).
• Build and Release Engineer: This is the role responsible for building and releasing
products and core assets. It is also known as Software Configuration Engineer (Oliveira,
2009).
4.3 RiPLE-TE Work Products
The work products that support the RiPLE-TE process are listed below.
Master Test Plan. This is the general plan, the first artifact to be developed, which guides
the remaining test activities throughout the whole testing process. The Master Test
Plan describes the scope, approach, resources, and schedule of the intended testing activities. It
identifies test items, the features to be tested, the testing tasks, who will do each task, and any
risks requiring contingency planning. The Master Test Plan embodies the test strategy used
for the SPL testing activities. Besides, in the context of the RiPLE-TE, this plan also
encompasses fields specific to the SPL environment, such as the variation points and variants to
be tested in a cycle.
Test Plan. Besides the Master Test Plan, which covers the whole testing discipline, every
testing level has a specific Test Plan associated with it, designed to provide more detailed
documentation. Although this might seem to cause unnecessary overhead, the Master Test Plan
already includes the whole set of relevant information regarding the project and the activities
associated with test concerns. Therefore, at a specific test level, the Test Plan only includes
information specific to that level, leaving out the information already included in the more
general artifact. Moreover, this plan enables the reuse of information throughout the life cycle:
since everything regarding testing a component, for instance, is documented, another
component that reuses some (or all) of that information will also reuse the plan, or part of it.
This way, the only need is to “populate” an abstract document with information that strictly
refers to a test level. When delivering a component, it is indeed more interesting to include in
its documentation only information related to it. With this strategy we can accomplish such a
goal: the component carries only the test information that refers to it.
Test Case. These are created based on the scenarios extracted from the RiPLE-RE use cases
(Neiva, 2009). Before creating a Test Case, it is necessary to identify all of the scenarios for the
given use case. By scenario we mean an instance of the use case, which describes one specific
path through the flow of events. RiPLE-TE provides templates for test case and test procedure
specifications. The Test Case specification consists of the following components: test case
specification identifier, test items, input specification, output specification, special environment
needs, special procedural requirements, and test case dependencies. Each Test Case has a test
procedure associated with it. A test procedure specification is a brief description of all necessary steps
to run/repeat/rerun a particular test. It consists of the following components: test procedure
specification identifier, purpose, specific requirements, and procedure steps. The execution
results are recorded and provide a useful history to be further applied in similar contexts.
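As an illustration only, the fields above can be captured in a simple data structure. The sketch below is hypothetical: the class and field names are ours, not part of RiPLE-TE, which prescribes the fields but not their representation.

import java.util.List;

// Hypothetical sketch of the test case specification fields listed above.
public class TestCaseSpecification {
    private String identifier;              // test case specification identifier
    private List<String> testItems;         // items under test
    private String inputSpecification;      // inputs and preconditions
    private String outputSpecification;     // expected outputs
    private String environmentNeeds;        // special environment needs
    private String proceduralRequirements;  // special procedural requirements
    private List<String> dependencies;      // test case dependencies
    // getters and setters omitted for brevity
}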
Test Software. This is the implementation of test scripts that represent Test Cases, so that they
can be automatically executed on the software under test and produce a report. The scripts are
pieces of code containing the commands needed to carry out a test request, so that testers do
not have to rerun test cases manually over and over. Indeed, test automation is an
effort-saving strategy, since it has a positive impact on tester productivity (Burnstein, 2002).
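As a minimal sketch of such test software, assuming JUnit 4 (a common choice for Java components) and entirely hypothetical test names, a small driver can execute a test class automatically and print a report:

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;

// Hypothetical test script: runs a test class automatically and reports the
// outcome, sparing testers from rerunning test cases by hand.
public class TestScriptRunner {

    // Trivial example test class, included so the sketch is self-contained.
    public static class PaymentComponentTest {
        @Test
        public void totalIsComputed() {
            Assert.assertEquals(2, 1 + 1);
        }
    }

    public static void main(String[] args) {
        Result result = JUnitCore.runClasses(PaymentComponentTest.class);
        System.out.println("Tests run: " + result.getRunCount()
                + ", failures: " + result.getFailureCount());
        for (Failure failure : result.getFailures()) {
            System.out.println(failure.toString());
        }
    }
}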
Test Reports. RiPLE-TE includes partial and final reports on the test status, including
information such as the number of test cases executed over a specified period of time and the
associated change requests, and reporting any errors found during test execution.
Test Suites. A test suite is simply a table of contents for the individual test cases. Organizing the suite
of test cases by priority, functional area, actor, business object, or release can help identify parts
of the system that need additional test cases.
4.4 RiPLE-TE in Core Asset Development
The goal of RiPLE-TE in CAD is to provide confidence that the assets that will be further reused,
when instantiating products in PD, possess an adequate level of quality and meet the asset
project goals. Thus, the focus of testing in CAD is to assess the reliability of software assets
before delivering them to the PD phase. The workflow of RiPLE-TE in CAD is presented in
Figure 4.3; each of its activities is further detailed throughout this section.
4.4.1 Master Planning
Test planning can and should occur at several levels or stages. In the RiPLE-TE, the first
test planning refers to the development of the Master Test Plan. This artifact is responsible for
orchestrating testing at all levels. In addition, each test level has its own test planning, which should
be conducted based upon the general, i.e. master, test plan.
Figure 4.3 RiPLE-TE Core Asset testing workflow
In a nutshell, the master test plan, the output of this activity, is a general document that
provides relevant information for conducting testing in a guided way. The whole test team is
guided by this document.
Figure 4.4 illustrates the master test plan, linking it to the other plans developed at all test levels.
The next section details the process workflow of the Master Planning activity.
Process Workflow
Lewis (2008) provides a framework that includes most of the essential planning considerations.
We tailored this framework to our context, turning the 22 items it considers into an
8-step workflow for the Master Test Planning activity. We grouped the items based on their
similarities, and to simplify the “checklist” of items to consider. Hence, the steps are:
1. Assemble Testing Team - The team should be organized concurrently with the development
team. The plan comprises the individuals (or groups), and their roles and attributions for the project.
Figure 4.4 Levels of Test Planning
2. Conduct Business Risk Analysis - Identify high-risk application components that must be
tested more completely than others, and identify error-prone components to be tested
more rigorously. (The RiPLE team has been working on an approach for Risk Management
(RM) in SPL projects; for the time being this step is performed at this point, but in the near
future it will become merely an input document from the RiPLE-RM.)
3. Define the Test Objectives - Establish what is to be accomplished as a result of the testing.
The test criteria to be used are also defined at this moment. This encompasses what is
going to be accomplished, the test expectations, the critical success factors of the test,
constraints, the scope of the tests to be performed, and the expected end products of the test.
4. Describe Testing Approach - Includes the testing techniques that will be used, test entry
and exit criteria, procedures to coordinate testing activities with development, the approach
for defect reporting and tracking, test progress tracking, status reporting, test resources
and skills, risks, and a definition of the test basis.
5. Define Testing Environment - Includes the definition of hardware, software, network, and
so on, detailing which tools are going to be used and who is responsible for them. It also
defines the help desk support required, to be contacted in case of non-conformances
regarding the testing environment experienced along the project.
6. Develop Testing Specifications - The test team responsible for developing test specifications
is set up. This group is responsible for developing the specification format standards, and
then for writing the test specifications to be followed in the subsequent phases.
7. Schedule the Test - The schedule is based on resource availability, the development
schedule, and the deadlines. The number of cycles for every test level is also estimated. The
schedule should balance resources and workload demands, define major checkpoints, and
include contingency plans.
8. Review and Approve the Plan - This is accomplished in a review meeting with the major
players, who review the plan in detail to ensure it is complete and workable, checking for
incorrect, incomplete, missing or inappropriate information, and approval to proceed is
obtained. Every artifact produced in the aforementioned steps is subject to review and
approval before being put into practice.
4.4.2 Technical Reviews
The RiPLE-TE process, at its current stage, does not have evidence regarding the best
review technique to be performed in SPL projects, but we reinforce the need for reviews, since
they allow project members to detect and remove errors and/or defects early in the software
development life cycle, even before any code is available for testing, when repairs are naturally
less costly. Figure 4.5 depicts the artifacts on which we advocate technical reviews should be
performed in a SPL project. The main assets that guide the testing process should be reviewed,
such as: Feature Model, Product Map, Requirements (and Use Cases), Feature Dependency
Model, and the Architectural Document (Design).
This is the first activity in the RiPLE-TE, and also the first moment in which communication
with other RiPLE disciplines occurs. Since technical reviews deal directly with artifacts
previously built in other disciplines, the input artifacts come from the Scoping, Requirements
and Design RiPLE disciplines. Moreover, there is a direct integration with the RiPLE-EM
(Evolution Management) process. Looking again at Figure 4.5, we can see where RiPLE-EM
is invoked by a RiPLE-TE task, through the Change Management (CAD) icon: whenever a
defect is found in a document, a Change Request must be opened and/or associated with it, and
this request is managed by the RiPLE-EM process.
These reviews are usually performed in group meetings. The meeting is composed of
different stakeholders, depending upon the type of artifact under review. The frequency with
which reviews are undertaken is a project decision, and it should be documented in the Master
Test Plan, so that all involved stakeholders are aware of when and how reviews will be
conducted.
Figure 4.5 Technical Reviews
In a product line life cycle, feature modeling is one of the first activities. The remaining
artifacts, such as requirements, use cases, and then test cases, are based on it. If a feature
model review uncovers a defect at the moment of its development, the team avoids the
error spreading throughout the whole set of remaining artifacts, which could cause damage and
waste of effort and budget. A technical review reveals the location of a bug directly, while testing
requires a debugging step to locate the origin of a bug.
Usage Scenario
As a usage scenario, we use the example of a requirements technical review. The test team
reviews the requirements document to ensure that the requirements match user needs and
are free from ambiguities and inconsistencies. In the SPL life cycle, requirements will originate
Use Cases, which will consequently be input to Test Case generation.
We have basically three steps, as follows:
1. Plan for the Requirements Review - This step serves as a guide to the requirements review
activity. The plan is prepared in a meeting in which the review goals are established, and
the roles (basically: review leader [who coordinates the review], author [who built the
artifact under review], reviewers, and recorder [responsible for the review report]), the
team size and the right participants, i.e. stakeholders, are defined. The requirements, use
cases and traceability matrix to be reviewed are indicated, with clearly stated objectives in
view. The plan must be agreed upon by all members of the team. During this planning, a
checklist is designed to guide and optimize the process.
2. Conduct Requirements Review - In this step a meeting with the involved stakeholders is
held, chaired by the coordinator. In this meeting, the requirements' author presents the
material, without influencing the reviewers into making the same logical errors he did.
The reviewers can discuss the documentation in order to make suggestions where the
material seems flawed or has potential for extension. If, during the presentation, things
are not clear enough, team members are allowed to ask questions in order to clarify
aspects of the presented material. Reviewers must observe specific elements of the
requirements, including aspects of traceability, variability, binding time, etc., according to
the checklist provided. It is critical that the group remains focused on the task at hand.
The coordinator can help in this process by restraining unnecessary discussions and
leading the group in the right direction. Moreover, the coordinator should resolve
disagreements whenever the team cannot reach a consensus.
3. Report the Findings - After conducting the review, all attendees must decide whether to
accept or reject the modifications to the artifact, as well as the overall artifact. These
decisions form the basis of the walkthrough report to be produced. A detailed report is
generated, including information regarding what was reviewed, who reviewed it, and what
the findings of the review are. It lists items of concern regarding the requirements and use
cases, i.e. problem areas within the product and corrections to be made.
These steps are also suitable for the other artifacts to be reviewed in the first RiPLE-TE
activity.
Figure 4.6 Testing workflow
4.4.3 Unit Testing
Unit testing concentrates on testing the code from the inside, exercising the code logic. This
activity is directly linked with coding and happens at the same time. Hence, the intention is to
direct the testing search to those portions of the code that are most likely to contain faults. As
each unit is constructed, it is tested to ensure that it (1) does everything that its specification
claims and (2) does not do anything it should not do. Conducting these unit tests is particularly
productive because the visibility into the code under test is at its maximum. The degree of
visibility is related directly to the testability (the ease with which the code under test can be
tested) (Clements and Northrop, 2001).
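As a minimal sketch of points (1) and (2), assuming JUnit 4 and a hypothetical StockUnit class that is not part of RiPLE-TE:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical unit whose specification says: deposits increase the quantity,
// and negative amounts must be rejected.
class StockUnit {
    private int quantity;
    void deposit(int amount) {
        if (amount < 0) throw new IllegalArgumentException("negative amount");
        quantity += amount;
    }
    int quantity() { return quantity; }
}

public class StockUnitTest {

    @Test
    public void doesWhatTheSpecificationClaims() {
        StockUnit unit = new StockUnit();
        unit.deposit(5);
        assertEquals(5, unit.quantity()); // (1) specified behavior holds
    }

    @Test(expected = IllegalArgumentException.class)
    public void doesNotDoWhatItShouldNot() {
        new StockUnit().deposit(-1);      // (2) forbidden behavior is rejected
    }
}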
In unit testing the main goal is to ensure each portion is working properly. A product
comprises several units, which, at this point, should be tested individually. This activity enables
the team to find, and then correct, errors at a fine-grained level, which can reduce error
propagation. Once the tests for a class are up and running, the next step is to hook it up with
other methods and with other services, in order to examine the interactions among methods
and classes inside a component, i.e. inside a unit.
Following the SPL framework defined in (Clements and Northrop, 2001), unit testing is
performed in Core Asset Development, as the core assets are being developed and their quality
must be assured before they are instantiated into derived products.
Unit testing will also be performed in Product Development when a new feature that does
not yet belong to the Core Asset base is to be implemented. In this case, unit tests are performed
in the Product Development phase, but the flow of information, tasks and activities is the same
as detailed in this section. As our main intention is to provide a Core Asset base comprising the
largest possible amount of assets to be further reused by diverse products, we advocate that such
'new' features should be integrated into the Core Asset base and then instantiated in a product,
even in cases where specific features will work in only one product. The intention here is to
have a common repository that eases asset management, as previously mentioned in this Chapter.
The whole unit testing process workflow is presented here in detail. Along with the
workflow, the activities, artifacts and roles included in RiPLE unit testing are presented
as well, in order to exploit the reuse potential given by commonality and variability
concerns and to avoid wasting time. It is worthwhile to mention that we do not introduce
new terminologies for testing, but rather tailor existing ones (IEEE, 1998), (McGregor, 2001b)
to our context.
Process Workflow
The unit testing process is composed of four activities: Planning, Design, Execution and
Reporting. Figure 4.6 shows the process workflow. All RiPLE-TE test levels, both in CAD
and PD, use this same workflow. Although the activities are presented as sequentially initiated,
this is an iterative and incremental process, with feedback connections enabling refinements.
Next, each activity is discussed in detail.
Unit Test Planning
Planning is the initial activity of the RiPLE-TE unit testing process. In this phase, the
Test Manager and Test Architect define what will be covered by the unit tests and to what
extent. Moreover, the planning covers resource estimates, coverage criteria, hardware
and schedule, besides information about which features will or will not be tested and which
variation points and/or variants will be considered in that test cycle. The risks inherent in these
strategic decisions should also be considered.
A test plan is devised for every component. Despite the high cost of implementing and
maintaining this artifact, the main reason for this decision is to have an appropriate strategy to
ensure traceability of the artifacts built, thus considering a long life cycle for the component.
Test planning should involve at least the following steps:
1. Analyze input artifacts - The first step consists of analyzing the input artifacts in order to
define, taking the Master Plan as a guide, what will be tested along the cycle. If any other
cycle has previously been executed, its reports should also be considered.
2. Define and prepare the test environment - This step focuses on tailoring what was previously
defined in the Master Test Plan, regarding testing tools and other tools to support test
activities, applying it to the context of unit testing. If any tool was not previously described
in the Master Plan, it is highly recommended to have the plan updated.
3. Select features to be tested in each cycle - The features and variation points to be tested in
every cycle must be identified; the ones which will not be tested should also be pointed
out. Moreover, critical parts, as defined in the previously conducted risk analysis, should
also be considered. This identification is based on the requirements and use case analysis
and prioritization performed during RiPLE-RE. The SPL risk document should be considered
during this step.
4. Define the schedule - The schedule to perform the unit tests should be described to make
stakeholders aware of timing. Timing previously allotted in the Master Plan should be
respected.
5. Define testing techniques - Identify white-box and black-box techniques and/or strategies
to be applied in unit testing.
6. Define strategy to test the integration of classes - Identify a strategy suitable for testing
the integration of classes and methods. A strategy suitable for the RiPLE-TE context is
the concept of object clusters. A cluster consists of related classes that may work
together to support a required functionality for the complete system (Burnstein, 2002). To
integrate methods and classes using the cluster approach, the tester could first integrate
clusters of classes that work together to support simple functions. These are then combined
to form higher-level, or more complex, clusters that perform multiple related functions,
until the system as a whole is assembled (see the sketch after this list).
7. Define the coverage criteria - At this point, it is necessary to establish which coverage
criteria will be applied in the project and the associated rates.
8. Define the test input domain - Determine the test input domain, in order to address the
coverage criteria defined previously.
9. Define the acceptance (Pass/Fail rate) criteria - Specify the stop criteria to be used during
the component unit test cycle (e.g. 98% passed).
10. Summarize the information in a test plan - All the information from the previous steps is
then compiled into a test plan. The use of a planning template is encouraged.
11. Review and approve the test plan - The test plan developer or manager schedules a review
meeting with the major players, reviews the plan in detail to ensure it is complete and
workable, checking for incorrect, incomplete, missing or inappropriate information, and
obtains approval to proceed.
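To make the object cluster notion from step 6 concrete, the sketch below, with hypothetical classes and assuming JUnit 4, integrates a small cluster of related classes that jointly support one simple function:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical cluster: Cart and PriceCalculator work together to support the
// "checkout total" function, so they are tested jointly rather than in isolation.
public class CheckoutClusterTest {

    static class PriceCalculator {
        int priceOf(String item) { return "book".equals(item) ? 30 : 10; }
    }

    static class Cart {
        private final PriceCalculator calculator;
        private int total;
        Cart(PriceCalculator calculator) { this.calculator = calculator; }
        void add(String item) { total += calculator.priceOf(item); }
        int total() { return total; }
    }

    @Test
    public void clusterSupportsCheckoutTotal() {
        Cart cart = new Cart(new PriceCalculator());
        cart.add("book");
        cart.add("pen");
        assertEquals(40, cart.total()); // the two classes integrated as one cluster
    }
}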
As input to this activity, the SPL Architecture Plan must be visited, as well as the Project
Plan and the Component Source Code. The first assembles the architectural decisions on the
variabilities and commonalities of the components to be tested, which can directly impact the
strategies adopted in this activity. The second is useful to define the schedule for unit testing
and to establish the relationship among activities, coverage criteria and available resources.
Finally, unit tests take place at the source code level, hence access to the code is also helpful
for the planning definitions. The Unit Test Plan is the output of this activity.
Unit Test Assets Design
The second activity covers the design of test assets, mainly Test Cases, Test Suites, Test
Procedures and Test Scripts. The Test Architect, Test Designer and Tester work jointly in this
activity. When creating test cases, the inputs, steps and expected results should be considered.
The relationships among test cases are identified in order to group related tests into test suites.
It is worthwhile to devote effort to producing reusable assets at this point, so that subsequent
cycles can benefit from them. All assets should be recorded in a test repository for use in future
releases.
The following steps should be taken when designing test assets:
1. Create tests to evaluate the component's methods - Based on the previous activity, a test
case is generated for each method within the component, in order to evaluate whether it
works according to the specification. It is strongly recommended to build scripts in order
to execute the unit tests automatically. Architecture information should be considered
during this step; e.g. Component Class Diagrams are important, since they help the tester
verify whether the specification is in conformance with the component code.
2. Create tests to evaluate the integration of classes - After creating test cases for each method,
the integration among the component's classes can be performed using an integration
strategy, and test cases are created in order to evaluate this integration. These test cases
evaluate the component's methods based on their specification (e.g. class diagrams) and
on the risk analysis performed during the planning step.
3. Group test cases in test suites - Group the tests created in the previous step according to a
common objective, e.g. group a set of tests or scripts in order to evaluate a specific feature.
Unit Test Execution
After producing the required test assets, the next activity is the execution of the test cases,
an activity commissioned to Testers. This activity has the basic purpose of finding errors in
the code.
Aiming at reducing the test execution effort, test automation is a useful strategy. It is
important that the development and maintenance costs of the test scripts be lower than the cost
of manual test execution. Sometimes a change might impact the test script as a whole and force
a new script to be developed from scratch. Whenever test automation scripts have been deemed
necessary in the scenario at hand, testers are requested to use them.
Figure 4.7 Unit Test execution steps
In order to give detailed information on the fine-grained tasks to be executed inside a
component to ensure it accomplishes what is required, we designed a sample illustration,
Figure 4.7, which presents the test execution steps to be performed inside a component.
As an example, consider a component, named α, which contains the classes 1, 2 and
3. Each of these classes contains n methods, as can be seen in the top right of the figure. For
illustrative purposes, we use the nonincremental testing approach, but the incremental
approach could also be adopted. The first step (1) is to test every method individually. Next
(2), the integration of the methods contained in each class is tested. In both cases, if a method
or class depends on one not yet coded, we encourage the use of a test harness. Later (3), all the
classes which compose the component must be integrated and tested. The acceptance criteria,
previously defined in the Test Planning phase, must be considered.
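As a sketch of the test harness idea, with all names hypothetical and assuming JUnit 4, a stub can stand in for a dependency that is not coded yet:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical scenario: a class of component α depends on a TaxService that
// is not implemented yet, so a harness stub replaces it during unit testing.
public class ClassOneTest {

    interface TaxService {                  // real implementation still pending
        double rateFor(String region);
    }

    static class TaxServiceStub implements TaxService {
        public double rateFor(String region) { return 0.10; } // canned answer
    }

    static class ClassOne {
        private final TaxService taxes;
        ClassOne(TaxService taxes) { this.taxes = taxes; }
        double gross(double net, String region) {
            return net * (1 + taxes.rateFor(region));
        }
    }

    @Test
    public void worksAgainstTheStubbedDependency() {
        ClassOne unit = new ClassOne(new TaxServiceStub());
        assertEquals(110.0, unit.gross(100.0, "BR"), 1e-6);
    }
}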
An important artifact generated along this activity is the Change Request (CR), which
associates defects found during test execution with the developed code.
Each error found by a tester should be analyzed and associated with a CR. The steps
needed to reproduce the defect are recorded. The CR is then assigned to the development team
responsible for the functionality under test, and must be further analyzed in order to fix the
error. This artifact serves as a bridge between the development and test teams. CR management
is handled by the RiPLE-EM process. CRs are an important means of measurement, both for
measuring the productivity and effectiveness of the test team and for measuring the quality of
the development team's work.
Furthermore, the acceptance criteria established during the planning phase indicate
the moment to finish the test execution. After finishing the execution of the specified set of tests,
the test results (in which the state of a test case result changes within the range pass, fail,
block or invalid, as can be seen in Figure 4.8) should be reported in the form of a Partial Report.
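A minimal sketch of this result range follows; the enum values mirror the states named above, while the field and method names are ours, and the actual transition rules remain those of Figure 4.8:

// Sketch of the test case result states named above. A real implementation
// would validate each change against the transitions of Figure 4.8.
public class TestCaseResult {

    enum State { PASS, FAIL, BLOCK, INVALID }

    private State state;

    void record(State newState) {
        this.state = newState; // transition checking intentionally omitted
    }

    State state() { return state; }
}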
In short, the steps to be followed in such a task are:
1. Perform tests for each method - The first step is to test each method, individually.
2. Perform tests of the integration of methods contained in a class - Secondly, test the
integration of the methods contained in the class.
3. Perform tests of the integration of classes contained in a component - Later, all the classes
which compose the component must be integrated and tested.
4. Associate a CR to errors found - As soon as an error is found, the tester must associate the
error with a Change Request (CR).
Figure 4.8 State transition diagram of test case result
Unit Test Reporting
During test execution, defects are found and reported using change requests, which will be
further used when assembling the test report. This artifact has several objectives. First, the
recorded information is important to calibrate the unit test process as a whole. After testing a
component, it provides information about the coverage criteria, common defects and their
respective fixes, tested features, tested variants of each variation point, and process bottlenecks
and gaps. Last, it provides the information needed to perform another unit test cycle, or perhaps
a regression test cycle, whenever the component is not completely tested the first time in terms
of acceptance criteria. The Test Manager is responsible for conducting the reporting.
The reporting artifact is usually based on a template, which should be available in a repository
so that every test team member has access to it. RiPLE-TE provides such a template document
as well.
The steps to be followed during Test Reporting are:
1. Assemble information on the cycle execution - Information provided by the test execution
cycle is included in this report, to further guide additional cycles regarding workforce,
personnel allocation, the rate of defects found, and so on.
2. Record the issues found - Record the main issues found during the cycle execution.
3. Check for test completion and provide feedback - Whenever the test objective defined
during the planning phase, in terms of coverage and acceptance criteria, is not achieved,
feedback should be given in order to calibrate the execution of the next test cycle.
4.4.4 Integration Testing
Unit testing is an essential quality control activity, as shown in the previous section, but we
must also deal with testing in situations where different units of work are combined into a
workflow. Once we have the tests for a unit up and running, the next step is to hook it up with
other units and with other services. Examining the interaction between components, possibly
when they run in their target environment, is the essence of integration testing.
As an example, let A and B be two components, let SA and SB be their respective unit test
suites, let TAB be the set of test cases from SA that cause A to call B, and let TBSA be the set
of test cases from SB that exercise paths in which B is called by A. The integration test suite is
then the union of TAB and TBSA.
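In code terms, using hypothetical names rather than any RiPLE-TE API, the integration suite is simply the union of the two selected subsets of test identifiers:

import java.util.HashSet;
import java.util.Set;

// Sketch: the A-B integration suite as the union of TAB (tests from SA that
// make A call B) and TBSA (tests from SB exercising paths where B is called).
public class IntegrationSuiteBuilder {

    static Set<String> integrationSuite(Set<String> tab, Set<String> tbsa) {
        Set<String> suite = new HashSet<String>(tab); // start from TAB
        suite.addAll(tbsa);                           // union with TBSA
        return suite;
    }
}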
Hence, after the units are verified individually, they need to be integrated to compose
modules, sub-systems and the SPL reference architecture, which is further instantiated to
assemble specific product architectures. The testing in which units are joined is known as
integration testing.
Integration testing in CAD aims to test the interaction among the SPL common components,
as well as against the reference architecture. As inputs to this testing level, we should consider
the following assets: unit-tested components, the feature dependency diagram and the
architectural views (behavioral and structural).
Process Workflow
In the context of the RiPLE-TE, integration testing comprises four activities, as in the unit
testing process: Planning, Design, Execution and Reporting. Each is detailed below.
Integration Test Planning
Planning is the first activity to be performed. The Test Manager and Test Architect define what
will be covered by the integration tests and to what extent. The steps are described below.
1. Analyze input artifacts - The first step consists of analyzing every input artifact in order to
define, using the Master Plan as a guide, what will be tested along the cycle. If any other
cycle has previously been executed, its reports should also be considered.
2. Define and prepare the test environment - This step focuses on tailoring what was previously
defined in the Master Test Plan, regarding testing tools and other tools to support test
activities, applying it to the context of integration testing. If, for example, any tool is not
described in the Master Plan, the plan must be updated.
3. Define White-box and Black-box Techniques - Define white-box and black-box techniques
to be used in the project.
4. Define strategies to test the integration of components and modules - Identify a strategy
to test the integration of components. The concept of object clusters can also be applied in
RiPLE-TE integration testing. In this context, a prioritization strategy should be established
in order to analyze the relevance of the components which implement the most common
or complex variation points.
5. Define the coverage criteria - Identify the features and variation points to be tested (and
those not to be tested), as well as critical paths and relationships. The coverage criteria
define how much of the application functionality spanning the application architecture
will be covered by the test suites. The criteria are useful because the test architect, by
looking at the structural view and the dependency diagram, can capture critical
components and interactions, and, by looking at the behavioral view, can capture the
software functionalities he is interested in testing.
6. Define the test input domain - Determine the test input domain, in order to address the
coverage criteria defined previously.
7. Define the Pass/Fail rate criteria - Specify the stop criteria to be used during the integration
test cycle.
8. Summarize the information in a test plan - All the information from the previous steps is
then compiled into a test plan. A planning template should be used.
9. Review and approve the test plan - The test plan developers should schedule a review
meeting with the major players to review the plan in detail, to ensure it is complete and
workable, and to obtain approval to proceed. The review evaluates issues such as incorrect,
incomplete, missing and inappropriate information. The plan will only be ready for use
after approval is granted.
Integration Test Assets Design
After planning, assets design takes place. The Test Architect, Test Designer and Tester are commissioned to this activity, which comprises the following steps.
1. Create test scenarios - Based upon the interactions among components, as documented
through the available architecture scenarios, the integration test scenarios that have to be
executed to test a specific feature should be created. These include the scenario outcomes
(the expected results). Indeed, in some situations, not all test scenarios may be documented
in the same detail as a template would require.
2. Create tests to evaluate interactions among components - Based upon the test scenarios,
test cases are generated for critical paths and interactions among components, in order to
evaluate whether they work according to the specification. Architecture information should
be considered during this step. For example, the architectural views (component class
diagrams, module class diagrams, sequence diagrams, and so on) are important, since they
help testers verify the specification against the code. Whereas the behavioral view provides
functional information regarding the reference architecture, the structural view provides
structural information.
3. Create tests to evaluate interactions among modules - After creating test cases for the
critical module interactions, the integration among modules is performed using an
integration strategy, and test cases are created in order to evaluate this integration. Test
cases should focus on evaluating interactions among modules based on their specification
(e.g. architectural views).
4. Create test suites - Group the tests created in the previous step according to a common
objective, e.g. group a set of tests or scripts in order to evaluate a specific feature.
5. Verify test coverage - Verify if the test cases created are covering the desired paths.
Integration Test Execution
Next, Testers are responsible for executing the tests previously created. The steps included
in this activity are:
1. Perform tests for critical component interactions - The first step is to test the critical
component interactions, individually.
2. Perform tests of the integration of modules - Secondly, test the integration of the
modules contained in the reference architecture.
3. Associate a CR to errors found - Whenever an error is found, the tester must associate the
error with a CR.
This activity can be conducted both manually and automatically, following what is described
in the test plan, which reflects the major organizational needs.
Integration Test Reporting
Finally, the reporting activity takes place. This activity presents the results after test execution.
The Test Manager is responsible for assembling the information generated throughout the
component integration testing activity in the form of a report. Recorded errors help the
development team improve the quality of the next iterations of the components. Besides, project
management uses the reports as a way to extract metrics from the project and enhance
estimations regarding staff allocation, budget, schedule and so on.
The results provide information about the coverage criteria, common defects and their
respective fixes, tested features, tested variants of each variation point, and process bottlenecks
and gaps. Thereby, they help to explicitly determine the quality of both the tests and the
produced component code. Every test cycle results in a test report. In each new cycle, the
previous report is analyzed to check what was fixed and to define new directions.
The steps of the integration testing reporting are similar to those of unit testing. They are:
1. Assemble information on the cycle execution - Information provided by the test execution
cycle is included in this report, to further guide additional cycles regarding workforce,
personnel allocation, the rate of defects found, and so on.
2. Record the issues found - Record the main issues found during the cycle execution.
3. Check for test completion and provide feedback - Whenever the test objective defined
during the planning phase, in terms of coverage and acceptance criteria, is not achieved,
feedback should be given in order to calibrate the execution of the next test cycle.
The integration test process is finished when the coverage criteria and acceptance rates are
achieved.
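As a small sketch of this stop condition, with illustrative thresholds (e.g. the 98% pass rate used earlier as an example):

// Sketch: a cycle finishes only when both the coverage criterion and the
// acceptance (pass rate) criterion defined in the plan are met.
public class CompletionCheck {

    static boolean cycleFinished(int executed, int passed,
                                 double coverageAchieved,
                                 double coverageRequired,
                                 double passRateRequired) {
        double passRate = executed == 0 ? 0.0 : (double) passed / executed;
        return coverageAchieved >= coverageRequired
                && passRate >= passRateRequired;
    }

    public static void main(String[] args) {
        // e.g. 49 of 50 tests passed (98%), 0.85 coverage against 0.80 required
        System.out.println(cycleFinished(50, 49, 0.85, 0.80, 0.98)); // true
    }
}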
4.5 RiPLE-TE in Product Development
The goal of the RiPLE-TE in PD is to integrate the components created and individually tested
in CAD, to compose the instantiated products. The test levels that comprise this SPL phase are
integration, system and acceptance testing, each with its specific goals, as detailed below.
Figure 4.9 depicts the main workflow of the RiPLE-TE in PD.
Figure 4.9 RiPLE-TE Product testing workflow
4.5.1 Integration Testing
The inputs for integration testing in CAD can also be used in PD, with the addition of the
product map and/or the decision model, since these assets hold information on the features
(specific functionalities) of each product that will be realized in the product line. Thus, instead
of performing integration with only tightly coupled components, as occurs in CAD, in PD the
goal is to test the integration of all the components that will compose a product, according to
what is defined in the product map.
As the products are instantiated, the reference architecture is also instantiated and adapted in
order to meet product-specific needs. The term adaptation refers to the binding of optional
and alternative variants, the modification of component dependencies, and the addition of new
components, which result in multiple product architectures within the same line. In this scenario,
reuse is strongly recommended in order to save effort. Since we have previously defined
abstract test cases to be applied in integration testing, we can avoid re-developing test assets
from scratch, and rather reuse the assets recorded in the test library. Moreover, as new products
are instantiated, it is not necessary to retest everything, but only the portions for which
results are non-existent or do not guarantee confidence.
The outputs of this phase are the accomplished integration testing of the product architecture,
and the associated reports and test plans.
4.5.2 System Testing
Once the evaluation of the integration of the components that will compose a product is finished,
and the coverage and acceptance criteria are achieved, system tests should be performed. In
the RiPLE-TE context, the purpose of system testing is to ensure that the customer's documented
requirements are met, in preparation for acceptance testing, the following test level. Test cases
and scenarios are designed to accomplish this purpose.
At this test level, the system must be exercised from the point of view of its end user,
sweeping the functionality for faults in relation to the original goals. The tests are performed
under conditions similar to those a user will face in day-to-day handling of the system, in terms
of environment, interfaces and data sets. Depending on the organization's policy, real conditions
of environment, interfaces and data sets can be used.
Four activities are encompassed by this level: planning, design, execution and reporting.
Each is described below.
System Test Planning
As at the other levels, planning is the first activity to be performed in system testing. The
Test Manager and Test Architect work jointly towards defining a plan to be followed during the
system test level. The following steps are considered to create such a plan:
1. Analyze input artifacts - The first step consists of analyzing the input artifacts in order
to define, taking the Master Plan as a guide, what will be tested along the cycle. The
sequence diagrams and activity diagrams, which define the interactions through which
objects communicate, are considered at this point. In addition, if any other cycle has
previously been executed, its reports should also be considered.
2. Define the schedule - The schedule to perform the system tests should be described to make
stakeholders aware of timing. Timing previously allotted in the Master Plan should be
respected.
3. Define the coverage criteria - Based on the structure of the diagrams, coverage criteria
can be defined. The coverage criteria allow a tester to decide when a sufficient set of test
case scenarios has been represented.
4. Define the test input domain - Determine the test input domain, in order to address the
coverage criteria defined previously.
5. Define and prepare the test environment - This step focuses on tailoring what was previously
defined in the Master Test Plan, regarding testing tools and other tools to support test
activities, applying it to the context of system testing. If any tool was not previously
described in the Master Plan, it is highly recommended to have the plan updated.
6. Define the acceptance (Pass/Fail rate) criteria - Specify the stop criteria to be used during
the system test cycle (e.g. 98% passed).
7. Summarize the information in a test plan - All the information from the previous steps is
then compiled into a test plan. The use of a planning template is encouraged.
8. Review and approve the test plan - The test plan developer or manager schedules a review
meeting with the major players, reviews the plan in detail to ensure it is complete and
workable, checking for incorrect, incomplete, missing or inappropriate information, and
obtains approval to proceed.
System Test Assets Design
1. Create test scenarios - Test scenarios are extracted from the sequence diagrams and
activity diagrams in which variability is represented. This input captures how variability
is represented in the architectural documents, including how higher-level functionality, or
a scenario, is spread over multiple objects, and how such a scenario is implemented
through sequences of lower-level method invocations. Therefore, the scenarios correspond
to simple paths in an activity diagram, which can be analyzed to detect feature interactions.
2. Create test cases - Based upon the scenarios extracted from the diagrams, test cases
are created that consider the variability inserted in such artifacts. Regarding the activity
diagram, each branch where the variability has been bound must be covered by at least
one test case.
3. Create test suites - Group the tests created in the previous step according to a common
objective, e.g. group a set of tests or scripts in order to evaluate a specific feature.
4. Verify test coverage - Verify whether the created test cases cover the desired extent.
System Test Execution
Following the workflow, test execution takes place at the system test level. The following steps
are considered:
1. Execute test cases - Tests previously designed are then executed.
2. Associate CRs to errors found - As at the other levels, whenever an error is found, a CR
must be associated with it.
System Test Reporting
System test reporting follows the same steps as those presented for the Integration Testing
CAD activity.
4.5.3 Acceptance Testing
A famous cliché states that 'the customer is the ultimate judge'. Accordingly, acceptance
testing is the last level to be performed in the context of the RiPLE-TE process. It verifies
in loco whether the final product meets the customer's requirements. The customer is the
main party responsible for performing the acceptance tests. It is most probably the last
opportunity customers have to make sure that the product is what they asked for, before the
product is put into real production.
Acceptance testing is sometimes performed with realistic client data to demonstrate that
the software works satisfactorily. The internal logic of the program is not emphasized here;
only external behavior is tested. As a consequence, mostly functional testing is performed at
this level.
As at the previously described levels, we have sketched four activities for this level: planning,
design, execution and reporting.
Acceptance Test Planning
Acceptance test planning is composed of the following steps:
1. Define customer responsibilities - The initial point to be considered in acceptance
test planning is to define the activities the customer is responsible for, which other
stakeholders will be involved in this activity, and with which attributions.
2. Define acceptance criteria - Before beginning the acceptance test, the team should know
what criteria will be used to decide whether the system is acceptable. The acceptance
criteria should define how the decision regarding the product acceptance level will be
made. For example, the customer may accept a system with certain minor types of errors
remaining, whereas other levels of errors will render the system unacceptable. Part of
the acceptance criteria may be to revalidate some of the other system tests.
3. Define and prepare the test environment - This step focuses on putting into practice the
hardware/software specifications set for the product under test, in the real environment.
4. Summarize the information in a test plan - All the information from the previous steps is
then compiled into a test plan. A planning template should be used.
5. Review and approve the test plan - The test plan developers should schedule a review
meeting with the major players to review the plan in detail, to ensure it is complete and
workable, and to obtain approval to proceed. The review evaluates issues such as incorrect,
incomplete, missing and inappropriate information. The plan will only be ready for use
after approval is granted.
Acceptance Test Assets Design
Next, the assets are designed. The following steps should be performed:
1. Analyze the test repository - Before creating test scenarios or test cases, the test
repository should be visited in order to check the results of acceptance tests previously
performed on any component that composes the product under test. Even if the results
are not suitable for the new scenario, at least the test cases can be instantiated.
2. Create test scenarios - Real product use scenarios are considered in order to define the test
scenarios. It is strongly recommended to create scenarios for critical paths, since these are
the focus of acceptance testing.
3. Create test cases - Test cases should be built that represent the real test scenarios. Based
upon the non-functional requirements (NFR), test cases that consider product quality
attributes should be created as well. Hence, acceptance tests should strongly concentrate
the effort on evaluating these aspects, before delivering the product for use.
Acceptance Test Execution
After designing the test assets, it is execution time. These are the steps to be performed:
1. Execute critical-path tests - The first step when executing the tests refers to critical-path
test execution. Since it will not be possible to test every possible configuration, even
after reusing results from previous acceptance tests (from other products), criticality
must be kept in mind.
2. Execute remaining test cases - After performing the tests considering priority, the
remaining test cases should be executed.
3. Associate CRs to errors found - As at the other levels, whenever an error is found, a CR
must be associated with it.
Acceptance Test Reporting
The last activity consists of reporting the results. Basically the same steps of the Integration
Testing CAD activity are performed at this point. After the acceptance test is finished, it is
possible to state with a high level of confidence that the product will be relatively error-free and
stable.
4.6 Managing Test Assets Variability within RiPLE-TE
In this work, variability is represented by introducing stereotypes into the UML models: use case,
class, sequence and activity diagrams. In addition, UML notes are used to represent information
regarding variation points and variants.
A metamodel was created that represents how variability in test assets can be managed in
a SPL project. This model is the basis for designing, tracking and evolving the test assets
in the RiPLE-TE.
4.6.1 Meta-model for managing variability
Figure 4.10 presents the RiPLE metamodel for asset variability management, in which the
scoping, requirements and testing processes are sequentially arranged from top to bottom,
one in each layer depicted.
Figure 4.10 RiPLE Metamodel for assets variability management
Figure 4.11 The metamodel core UML profile.
The core entity of the metamodel is called Asset, as shown in Figure 4.11. The stereotype
«asset» is used to determine which entities are subject to management. This entity is used as
a UML profile for the other entities of the metamodel that should behave as an Asset. That
is, whenever an entity of the SPL should have the properties of an Asset, it is extended using
this profile. The usage of such a profile has three main reasons: to enable changes in
the metamodel without the need to modify the entire metamodel; to turn the metamodel
entities into first-class entities; and to keep the metamodel views clean and understandable.
An Asset entity has properties which make some of the metamodel requirements possible.
For example, it is related to a History entity, which is responsible for keeping track of the
changes made to an Asset object. Thus, a History object records the Asset object which
was modified, what kind of modification was performed, and who performed it. Recording such
modifications enables, for example, calculating the probability that an Asset object will be
modified; such a probability could directly impact the SPL architecture design (von Knethen
and Paech, 2002).
The Asset entity is also related to a set of metrics (Metric entity). During SPL development,
it is important to have information quantifying it. For example, we could measure the
time for scoping analysis and requirements analysis, making it possible to estimate the
value of each feature or requirement. We could also set a metric for the number of LOC (Lines Of
Code) of each feature in the SPL, or for how many requirements and use cases a feature has.
The granularity of the metrics can vary from very high level to very low level, thanks to the
strong tracking capability of the metamodel.
Furthermore, the Asset entity is also related to a Version entity, which enables Asset objects
to be versioned. This characteristic is important due to the variable nature of a SPL project.
The presence of the Version entity means that for each modification to an Asset object, a new
version of it should be created. Thus, integrating a versioning mechanism into the metamodel
enables easy maintenance of different versions of the same product, for example. If the
metamodel is extended to support different SPLs, this becomes even more critical.
Last but not least, the metamodel also integrates a mechanism for issue reporting. Such a
mechanism enables any Asset object to be associated with an Issue object. It also means that an
issue can be reported for different versions of the same Asset object. The direct impact of such
a mechanism is that it provides easy maintenance and evolution of the different versions of an
Asset object. However, for the sake of understandability, the issue mechanism shown in the
Asset UML profile is a simplification of what an issue mechanism should be. The metamodel
can therefore be extended to support a complete issue tracking system, as could also be done for
the versioning, metrics, and history mechanisms.
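A compact sketch of how these relationships might look in code follows; the class and field names are ours, not prescribed by the metamodel:

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Sketch of the Asset entity and the mechanisms discussed above: history,
// versioning, metrics and issue reporting.
public class Asset {

    static class HistoryEntry {            // who changed what, and when
        Date when; String author; String kindOfChange;
    }
    static class Version { int number; }   // each modification yields a new version
    static class Metric  { String name; double value; }
    static class Issue   { String description; Version reportedAgainst; }

    private final List<HistoryEntry> history = new ArrayList<HistoryEntry>();
    private final List<Version> versions   = new ArrayList<Version>();
    private final List<Metric> metrics     = new ArrayList<Metric>();
    private final List<Issue> issues       = new ArrayList<Issue>(); // per version
}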
In a nutshell, this metamodel shows how assets interact with each other, and their dependencies. This linkage represents an extensive form of traceability which enables effective
change management. As an SPL scales and evolves, maintaining traceability between
artifacts becomes extremely important, and is thereby considered a prerequisite for consistent
and effective change integration, especially due to the variable nature of SPL artifacts.
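To make these relationships more concrete, the sketch below renders the core entities in Java. It is only an illustration of the structure described above – the metamodel itself is specified in UML, and all class and attribute names here are our own assumptions:

import java.util.ArrayList;
import java.util.List;

// Illustrative rendering of the metamodel core: any SPL artifact that behaves
// as an Asset carries its own change history, versions, metrics and issues.
class HistoryEntry {
    final String modification, author;   // what was changed and by whom
    HistoryEntry(String modification, String author) {
        this.modification = modification;
        this.author = author;
    }
}

class Version {
    final int number;
    Version(int number) { this.number = number; }
}

class Metric {
    final String name;
    final double value;                  // e.g. LOC of a feature
    Metric(String name, double value) { this.name = name; this.value = value; }
}

class Issue {
    final String summary;
    final Version reportedAgainst;       // an issue targets a specific version
    Issue(String summary, Version v) { this.summary = summary; this.reportedAgainst = v; }
}

class Asset {
    final String name;
    final List<HistoryEntry> history = new ArrayList<>();
    final List<Version> versions = new ArrayList<>();
    final List<Metric> metrics = new ArrayList<>();
    final List<Issue> issues = new ArrayList<>();

    Asset(String name) {
        this.name = name;
        versions.add(new Version(1));    // initial version
    }

    // Every modification is recorded in the history and yields a new version.
    void modify(String modification, String author) {
        history.add(new HistoryEntry(modification, author));
        versions.add(new Version(versions.size() + 1));
    }

    public static void main(String[] args) {
        Asset useCase = new Asset("UC01 - Create Event");          // hypothetical asset
        useCase.modify("added alternative flow", "analyst");
        useCase.issues.add(new Issue("step 3 ambiguous", useCase.versions.get(1)));
        useCase.metrics.add(new Metric("steps", 7));
        System.out.println(useCase.name + " has " + useCase.versions.size() + " versions");
    }
}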
Regarding testing aspects, in this model we present the Test Case asset and its relationships,
as detailed in Figure 4.12. In RiPLE-TE, Test Cases are derived from Use Cases, as can be
seen in the metamodel. This model expands on the RiPLE-RE (Neiva, 2009) abstract use case
definition, in which variability is represented in the use cases. A use case is herein composed of
the entity AbstractFlow, which comprises the subentity Flow, which actually represents the use
case steps. Every step can be associated with a subflow, which can represent a variation point.
4.6.2
Development and Derivation of Test Cases
Figure 4.13 depicts the dependency between test objective and test case when variability is
considered (Wübbeke, 2008). In (A) the component is variable as a whole, and in (B) only part
of the component is variable. They will turn, respectively, into (A’) and (B’), the former as a new
test case, which is variable as a whole, and the latter, a test case in which only the corresponding
part of the test case is variable.
Figure 4.12 The metamodel for tests.
Hence, the challenge is how to optimally build test cases that take variability aspects into
consideration, so that reuse of parts of test cases emerges easily.
Figure 4.13 Dependency between test objective and test cases considering variability
As an example of this possible flow, we represent in Figure 4.14 a use case in the form of
an activity diagram, which depicts its set of steps. In this diagram, the white diamond below
step A depicts a variation point in which there is one mandatory feature, depicted by step B,
and an optional feature, depicted by E. The black diamond below step E also represents a
variation point, which has two variants, denoted as F and G. These are alternative variants, so
that only one can be instantiated in a product. Figure 4.15 depicts the three possible scenarios:
(1) [A-B-C-D]; (2) [A-B-C-D-E-F]; and (3) [A-B-C-D-E-G].
This is the typical case presented in Figure 4.13, in which only part of a use case varies. In
this case, it is not necessary to create different use cases to represent the variability; rather,
part of the use case can be reused. Returning to the metamodel, this representation is enabled since, according to
the model specification, every step in a use case can make reference to a subflow. Thus, in order
to test these three possible scenarios, we could create the test case for scenario (1), and then
Figure 4.14 Example activity diagram including variability
Figure 4.15 Example activity diagram showing possible scenarios
reuse the flow of this scenario, as well as its results, for the remaining scenarios (2) and (3).
This strategy is based on control-flow coverage criteria, which rely on a graph representation. We
applied such a representation, but evolved it to consider the peculiar aspects of variability.
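As an illustration of this reuse strategy, the JUnit sketch below implements the mandatory sub-flow [A-B-C-D] once and lets each scenario add only its variable steps. The step methods are placeholders of our own, since the actual steps come from the use case flows:

import org.junit.Test;
import static org.junit.Assert.assertTrue;

// Sketch: the mandatory sub-flow [A-B-C-D] is implemented once and reused by
// the variant scenarios, mirroring the paths of the activity diagram.
public class UseCaseScenariosTest {

    // Common sub-flow shared by all three scenarios.
    private void runMandatoryFlow() {
        stepA(); stepB(); stepC(); stepD();
    }

    @Test
    public void scenario1MandatoryOnly() {        // path [A-B-C-D]
        runMandatoryFlow();
    }

    @Test
    public void scenario2OptionalWithVariantF() { // path [A-B-C-D-E-F]
        runMandatoryFlow();
        stepE();
        stepF();
    }

    @Test
    public void scenario3OptionalWithVariantG() { // path [A-B-C-D-E-G]
        runMandatoryFlow();
        stepE();
        stepG();
    }

    // Placeholder steps: in a real suite each method would exercise one use
    // case step of the component under test and assert its outcome.
    private void stepA() { assertTrue(true); }
    private void stepB() { assertTrue(true); }
    private void stepC() { assertTrue(true); }
    private void stepD() { assertTrue(true); }
    private void stepE() { assertTrue(true); }
    private void stepF() { assertTrue(true); }
    private void stepG() { assertTrue(true); }
}

Any change to the mandatory flow is then made in a single place and automatically propagates to all three scenarios.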
The same idea can be applied to sequence diagrams, an artifact built in the design phase.
We can represent variation points and variants by using a stereotype tag, so that scenarios
can be easily identified. The next step, hence, is to create the test assets according to this
representation. The reuse strategy is also feasible there.
Considering that a use case can generate several test cases, the strategy of representing each
step in a use case and allowing it to be linked to a variation point enables the building of test cases
which also take the variation points into account. This way, several test cases can be instantiated from a
use case in which the variation points are represented. Variability is preserved in the core asset test
artifacts to facilitate reuse. This strategy also ensures that every change in the use case and/or the
variation points and variants is propagated to the test cases and their steps (variable or not),
thanks to the traceability represented in the metamodel.
In summary, test cases are derived from usage scenarios, which can emerge from both use cases
(depicted as activity diagrams) and design models (sequence diagrams); hence, we can apply this
strategy at every level that comprises RiPLE-TE.
4.7
Chapter Summary
This Chapter presented the proposed process for testing in Software Product Lines, named
RiPLE-TE. Throughout this chapter we presented the process roles and their attributions, work
products, activities, and associated steps. This proposal was developed in order to fill a gap
in the literature, which does not offer detailed approaches for conducting testing in SPL
projects; only partial approaches are available. As SPL is mainly concerned with
variability management, we presented how test assets can be built in an SPL environment, in
which variability takes place in many artifacts. UML is used to represent variability in the
process metamodel. Traceability and evolution management issues are also handled.
Adaptations can be made on the basis of the guidelines we proposed along this Chapter, in
order to meet organization-specific goals. As an example, we do not include exploratory
testing when executing test cases. However, a kind of exploratory testing could be included
in the context of the RiPLE-TE process. It is an optional step for the execution activities,
but a strongly relevant one: as testers gain knowledge of the domain and the products under
test, they become capable of asking the right questions of the software. A drawback, however,
is that with exploratory tests, how testers encountered defects is commonly not recorded,
which goes against the SPL principles whereby assets should be tracked and recorded in a
repository for future reuse.
We have not performed any comparative analysis against existing approaches, since the level of
detail we provide in our process is not provided by any other; hence, such an analysis would
have little relevance, given the scope in which it would be conducted.
The next Chapter presents an initial experimental study based on this proposal, performed
as a way to understand the behaviour of a process for testing in SPL.
5
Experimental Evaluation
Software development has been characterized, from its origins, by the need for empirical facts
tested against reality, which provide evidence of the advantages or disadvantages of the different
methods, techniques or tools used in the development of software systems (Juristo et al.,
2002). Wohlin et al. (2000) state in their well-accepted textbook on experimentation in software
engineering that the only real evaluation of a process or process improvement proposal is to
have people using it, since the process is just a description until it is used by people. In fact, the
path from subjectivity to objectivity is paved by testing or empirical comparison with reality
(Juristo et al., 2004).
Besides, while nearly all computer scientists have strong intuition about the link between
unit testing and its effectiveness in overall software development, we are not aware of any
systematic studies examining this test level in SPL projects in terms of effectiveness and effort
savings. Thus, we conducted two experimental studies in order to evaluate our proposal.
The same project was used in both experiments, but in different environments. The replication
was motivated by the desire to achieve more significance and confidence in the results.
This Chapter describes and motivates the design of the experiments and discusses the threats
to validity. The experimental study was conceived and structured based on the concepts of
experimental software engineering and the evaluation of software engineering methods and tools
provided by Wohlin et al. (2000).
Following the experiment process defined in (Wohlin et al., 2000), we first defined the
purpose of the experiment and discussed the context in which both experiments took place, in
Section 5.1. Then Section 5.2 details the first experiment, comprising: Section 5.2.1,
which reports on the experiment planning and design; Section 5.2.2, detailing the project used
in this experimental study; Section 5.2.3, which outlines the preparation and execution of
the experiment; and Section 5.2.4, which presents the analysis of the collected data. Then,
Section 5.3 discusses the second experiment performed. Section 5.4 describes lessons learned.
Finally, Section 5.5 gives overall conclusions on the experiment.
5.1
Definition of the Experimental Study
In the definition phase, the foundation of the experiment is determined and its goals are
stated. We applied the Goal-Question-Metric (GQM) approach (Basili et al., 1994) in order to
collect and analyze meaningful metrics to evaluate the proposed process.
Goal. The objective of these empirical studies is to analyze the RiPLE-TE Process, Unit Testing
discipline, for the purpose of evaluation and characterization, with respect to its effectiveness,
from the point of view of the potential users, in the context of an SPL testing project at the
University.
Questions. In the following we describe the questions defined in order to achieve the goal:
Q1. Does the quality of defects found improve when subjects apply the process?
Q2. Does the rate of defects found increase when subjects apply the process?
Q3. Do the subjects consider the RiPLE-TE Unit Testing useful for detecting defects?
Metrics. The literature does not provide metrics directly associated with SPL testing, thus we
decided to apply some known practices from traditional software testing, with adaptations
to reflect SPL specificities when necessary.
M1. Test Case Effectiveness (TCE): A metric to measure test case effectiveness was proposed
by Chernak (2001). The idea is that the more defects test cases find, the more effective they are.
The author defines this metric as the ratio of defects found by test cases to the total number of
defects reported during a test cycle. We tailored this measure to our context, so that the Test
Case Effectiveness (TCE) metric is defined as the ratio of the number of defects reported
($D_{tot}$) to the total number of test cases ($N_{tc}$). The TCE metric serves specifically to
validate the effectiveness of functional test cases. This metric refers to Q1. It is defined as:
\[ TCE = \frac{D_{tot}}{N_{tc}} \times 100 \qquad (5.1) \]
M2. Quality of Defects Found (QDF): It refers to the number of valid defects found,
normalized by Difficulty (DD) and Severity (SV) values. Every defect, in a set of known defects,
is valued with a coefficient (k and r) according to its DD (values 1, 2 or 3) and SV (values 2, 4 or 6),
respectively, from lower to higher values. The quality of defects is then taken as the total
amount of defects considering each value category and their coefficients. The higher the score
of the quality of defects found, the more effective the testing was. This metric refers to Q1. It is
defined as:
\[ QDF = k \cdot f(DD) + r \cdot f(SV) \qquad (5.2) \]
M3. Test Coverage (TC): This metric gives the fraction of all features (or requirements/use
cases) covered by a selected number of test cases or by a complete test suite. The TC metric is a
measure of the number of test cases that need to be selected or designed to achieve good coverage
(Kshirasagar Naik, 2008). Cov is generated by the Eclemma code coverage tool1 which,
associated with the JUnit framework2, yields the overall code coverage achieved by the test cases,
in terms of basic blocks, lines, bytecode instructions, methods and types. This metric refers to Q2.
\[ TC = Cov \qquad (5.3) \]
M4. Effectiveness of using the process (EUP): This metric evaluates the effectiveness of using the process according to the subjects' feedback. It is calculated as
the ratio of subjects who experienced problems while applying the process, i.e. those who had
difficulty understanding, following and using it ($S_{prob}$), to the total number of subjects ($Tot_S$). This
metric refers to Q3.
\[ EUP = \frac{S_{prob}}{Tot_S} \times 100 \qquad (5.4) \]
1 Eclemma - Java Code Coverage for Eclipse - http://www.eclemma.org/
2 JUnit Framework - http://www.junit.org/
5.2
First Experiment
The first experiment was performed from November to December 2009, in the context of the
Experimental Methods in Software Engineering graduate course (MATC66)3 and the Software
Validation undergraduate course (MATB15)4. The former is intended for first-year Ph.D. students
from DMCC/UFBA, Brazil5. The latter is intended for last-year undergraduate students from the
Computer Science Department at UFBA.
5.2.1
The Planning
After the definition of the experiment, the planning takes place. Whereas the definition phase
determines the foundation of the experiment, i.e. why it is conducted, the planning phase
prepares how it is conducted (Wohlin et al., 2000).
As in other software engineering disciplines, activities should be planned in advance, in the
form of a plan, which must be followed by the remaining activities, in order to keep control
over the experiment. Hence, the planning phase of this experimental study includes the steps
described in the next subsections, and follows the model proposed in Wohlin et al. (2000), including
some important aspects described in Jedlitschka et al. (2008).
Context Selection
As mentioned earlier, Ph.D. candidates and undergraduate students will be involved in this
experiment. MATB15 classes will host the 'experimental lab'. This course was designed to train
students in concepts, principles, techniques and tools of software testing and validation, so
that they can understand the theoretical background of testing as a means of gaining knowledge,
and the limits of this approach as a means of controlling quality. MATC66 classes will host
the experiment preparation, thus including a pilot project. This course is intended to teach
Ph.D. students the concepts and practice of empirical software engineering, such as performing and
reporting controlled experiments.
The experiment will thus be conducted in an academic environment, in which the selection
of subjects, training, and the execution of the experiment will be held. Regarding the
characterization of the experiment, it will run off-line (not industrial software development),
based on a simulated problem: an SPL project in the conference management domain. The project
3 https://disciplinas.dcc.ufba.br/MATC66
4 https://disciplinas.dcc.ufba.br/MATB15
5 http://dmcc.dcc.ufba.br/
is further detailed in Section 5.2.2. In addition, the experiment will be conducted as a single
object study, i.e. a study conducted on a single subject and a single object.
Pilot Project
Before performing the study, a pilot project will be conducted with the same structure defined
in this planning. The pilot project will be performed by the MATC66 Ph.D. students, who will
be trained on how to use the proposed process. For the project, the subjects will use the same
material described in this planning, and will be observed by the responsible researcher. This
way, we can identify possible inconsistencies, avoiding future misunderstandings.
Hypothesis formulation
In an experiment it is necessary to formally and clearly state what we intend to evaluate. In this
experimental study, we chose to focus on four hypotheses. We hence state them formally and
also define what measures we need to evaluate them.
• Null Hypothesis ($H_0$). The Null Hypothesis determines that there is no benefit in using
the RiPLE-TE Unit Testing approach (denoted below as RIP) to support testing activities in
product lines, compared to ad-hoc testing (denoted below as ADHOC), in
terms of effectiveness. The Null Hypotheses and the values defined were:
\[ H0_1 : \mu_{TCE}^{RIP} \leq \mu_{TCE}^{ADHOC} \]
\[ H0_2 : \mu_{QDF}^{RIP} \leq \mu_{QDF}^{ADHOC} \]
\[ H0_3 : \mu_{TC}^{RIP} \leq \mu_{TC}^{ADHOC} \]
\[ H0_4 : \mu_{EUP}^{RIP} \geq 30\% \]
• Alternative Hypothesis ($H_1$). The Alternative Hypothesis determines that the use of the
process produces benefits that justify its use. The following Alternative Hypotheses were
defined:
\[ H1_1 : \mu_{TCE}^{RIP} > \mu_{TCE}^{ADHOC} \]
\[ H1_2 : \mu_{QDF}^{RIP} > \mu_{QDF}^{ADHOC} \]
\[ H1_3 : \mu_{TC}^{RIP} > \mu_{TC}^{ADHOC} \]
\[ H1_4 : \mu_{EUP}^{RIP} < 30\% \]
Variables
The independent variables are the experience of the subjects, collected through the background
questionnaire, and the proposed process. The dependent variables are the quality and amount
of defects found by the subjects and the applicability of the proposed process.
Selection of Subjects
The non-probability sampling technique chosen for subject selection in this experiment was
convenience sampling (Wohlin et al., 2000), representing a non-random subset of the
universe of Software Engineering students.
MATC66 graduate students will act as test managers and test architects, while MATB15
undergraduate students will act as test analysts and testers, following the roles defined in the
RiPLE-TE process. The tasks of designing assets and executing tests will be the responsibility
of the MATB15 undergraduate students. These will basically perform the most laborious and time-consuming activity in this experiment; hence, for ease of understanding, from now on
we will refer to this group simply as subjects.
They will be divided into two groups: one will be responsible for performing
testing activities in an ad-hoc fashion, whereas the other will perform the tests applying the
RiPLE-TE approach. The division of groups will be based on the subjects' expertise, according to data
gathered from the background form, as explained in detail in the next section.
The motivation for the participants to join the experimental study is based upon the
assumption that they will have an opportunity to engage in a testing project in which they can put
into practice the knowledge gained throughout the course.
Design Types
The problem has been stated and we have chosen our variables. Moreover, we have defined
the measurement scales for the variables. Hence, we are now able to design the experiment.
An experiment consists of a series of tests of the treatments. The design
type used in this experiment is one factor with two treatments, in which we want to
compare the two treatments against each other (Wohlin et al., 2000). In this sense, the factor is
the RiPLE-TE unit testing process, and the treatments are described next:
1. Testing with the RiPLE-TE unit testing process. In this treatment, subjects will be
trained in the process, and they will have access to the process documentation (the master
test plan, the unit test plan, feature model, requirements specification, test scenarios,
process guidelines and usage examples) before and during the test execution session.
They are supposed to follow the guidelines so that we can draw conclusions regarding the
previously mentioned metrics.
2. Ad-hoc testing. In this treatment, subjects will not receive any training in a
specific process; rather, they will apply their own expertise to finding defects in the
SPL project code.
Instrumentation
The background and experience of the individuals is assessed through a survey handed out at the
first lecture, here named the background questionnaire, in which they have to provide information
about their experience with software development, participation in projects, experience with
software product lines and software testing, and with the tools the experiment requires (see Appendix B.1).
This data provided the input for the characterization of the subjects, serving as a means of
balancing the composition of the groups.
Subjects will be given access to the artifacts necessary to perform the experiment, such as: the
master test plan, the unit test plan template, the project code (components to be tested) and specification
(requirements and use cases), and the error reporting form, the document in which the errors
found must be carefully reported. In addition, the subjects will have access to the RiPLE-TE
documentation, the tutorials used in the training, and a guideline on how to report the
errors.
After being informed about the goals and general information on the experiment, all subjects
will sign a consent form (see Appendix B.2) as a means of agreeing to join the study. Their
signature means that they understand the information given about the study and in the consent
form. They will be informed that they can withdraw from the experiment at any time,
without penalty.
An important means of collecting information, perhaps the most important, is the error
collection form, where subjects are supposed to report every defect found. Appendix B.3 depicts
a copy of the error reporting form. It includes details about every error found by a subject. We
chose to use such a form instead of a bug tracking system, such as Trac6, Bugzilla7 or
Mantis8, due to timing reasons. Although the fields were extracted from such systems, we believe
that a spreadsheet is suitable for this experimental context. The use of spreadsheets to
6 Trac - http://trac.edgewall.org/
7 Bugzilla - http://www.bugzilla.org/
8 Mantis Bug Tracker - http://www.mantisbt.org/
extract data is also faster, given the ease of implementing parsers to extract data and
generate useful information. Certainly, in a real industrial project, a bug tracking system is indeed
required.
In addition, three types of feedback questionnaires were designed: one intended for
the group which conducted the experiment in an ad-hoc fashion (see Appendix B.4), and another
for the group that applied the RiPLE-TE unit testing process (see Appendix B.5). The feedback
questionnaires were designed to provide useful information on the use of the approach, by
gathering information about the participants' satisfaction with the RiPLE-TE Unit Testing
process and with the other elements that comprise the experimental study.
The third one, exhibited in Appendix B.6, was designed to gather feedback from the
subjects who performed the experiment without using the RiPLE-TE, regarding their opinion
about the possibility of finding more defects had they used the process. Unlike
the previous ones, which are mandatory for all subjects from both groups, this one
is intended for only a sample of subjects, who must answer it after the last training session, when
the RiPLE-TE is explained.
Subjects must be monitored while filling out the questionnaires, since these will provide useful
data for future analysis and improvements, regarding both the process and the experiment design.
Training
Table 5.1 shows the experiment training and execution schedule, based on the elements the
experiment requires. It aids in identifying which training sessions will be performed and when
they will occur throughout the experimental study. The table shows columns for groups 1 and 2,
referring to the aforementioned division. Both groups will work 5 days each, attending
training sessions and the test execution session.
Regarding training sessions, the subjects will first become aware of the experiment purpose
and associated tasks, and will receive an introduction to the SPL topic. This will last 4 hours. The tools
to be used in the experiment are JUnit and the Eclemma code coverage tool. In order to balance
the knowledge, training on these tools, including practical exercises, will be performed,
spanning two 4-hour sessions.
As previously mentioned, one group will perform the tests without applying the RiPLE-TE,
i.e. in an ad-hoc fashion. Looking at the Table, it is possible to notice that Group 1,
responsible for this task, will execute the tests prior to the training in the RiPLE-TE. After
that, the training on the RiPLE-TE will take place. Both groups will attend this training,
which will last 3 hours. Then, Group 2 will be able to perform the tests correctly applying the
RiPLE-TE. Each group will have 4 hours to perform testing tasks.
Feedback 1 represents the moment at which subjects will report their feedback on the
experiment by filling in a questionnaire: Group 1, which performed the tests without applying
the process, will use the feedback questionnaire presented in Appendix B.4, and Group 2, which
performed the tests using the RiPLE-TE, will use the questionnaire presented in Appendix B.5.
Feedback 2 represents the moment at which an additional feedback questionnaire will be
applied to a sample of subjects from Group 1, right after they attend the training on the RiPLE-TE
process. As this group performed the experiment in an ad-hoc fashion, at this moment
they will give feedback on what they think could have been improved, comparing this
experience with the opportunities for improving their reported results by using a process.
This was the main intention of having the first group attend the training on the RiPLE-TE.
In summary, all activities mentioned in the Table will be executed sequentially, following
the schedule. All of them, except test execution, will be performed jointly by both groups.
Table 5.1 Experiment Training and Execution Agenda

Day                    | Activities                                                                      | Length
Day 1 - Nov 26th, 2009 | Both groups: Explanation on the Experiment (0:30); Introduction to SPL (2:30); Characterization (1:00) | 4 h
Day 2 - Dec 1st, 2009  | Both groups: Consent Term (0:30); Training on JUnit/Eclemma (3:30)             | 4 h
Day 3 - Dec 3rd, 2009  | Both groups: Training on JUnit/Eclemma (1:00); Practical Exercises (3:00)      | 4 h
Day 4 - Dec 5th, 2009  | Group 1 only: Ad-hoc Testing (4:00); Feedback 1 (0:30)                         | 4:30 h
Day 5 - Dec 9th, 2009  | Both groups: Training on RiPLE-TE (3:00)                                       | 3 h
Day 6 - Dec 10th, 2009 | Group 1: Feedback 2 (0:30); Group 2: Test with RiPLE-TE (4:00) and Feedback 1 (0:30) | 4:30 h
Validity Evaluation
A fundamental question concerning the results of an experiment is how valid they are
(Wohlin et al., 2000). Thus, it is necessary to anticipate the threats possibly involved in the
context of an experiment.
Wohlin et al. (2000) adopt a four-type categorization of the threats to the validity of
experimental results: Internal validity, External validity, Construct validity and
Conclusion validity. Each is detailed below, including the threats that apply to the context of this
experiment:
Internal validity
Maturation. This is the effect of subjects reacting differently as time passes. As the
experiment (practical test session) will be conducted during a continuous 4-hour period, it is
possible that subjects are affected negatively (feeling bored or tired) or
positively (learning) during the course of the experiment. Subjects will be free to stop for some
moments, but they will not be allowed to share information with other subjects.
Testing. If the test is repeated, the subjects may respond differently at different times
during the course of the experiment, since they acquire knowledge about how the test is
conducted. If there is a need for familiarization with the tests, it is important that the results
are not fed back to the subject, in order not to support unintended learning. In
addition, subjects will perform the experiment without external interference; only minor doubts,
e.g. about command syntax, will be resolved by the experimental study staff (the Ph.D. students, as
mentioned earlier).
External validity
Generalization of subjects. This is the effect of having a subject population that is not representative of the population we want to generalize to, i.e. the wrong people participating in the
experiment. Although this is an experiment involving an SPL project, training sessions
on the topic will be held, involving subjects in practical sessions in which they can become
familiar with the tools that will be used as well as with the purpose of product lines. In
addition, the experiment will put into practice the knowledge on software testing the subjects
have acquired along a semester. However, even if subjects succeed in using the proposed approach,
we cannot necessarily generalize its use to SPL testing practitioners.
Generalization of scope. This is the effect of not having an experimental setting or material
representative of, for example, industrial practice. The experiment will be conducted within a
time frame defined by the schedule of the undergraduate course, which may affect the overall results.
The scope is tied to the course schedule in order to make its completion feasible. Thus, although
the project in question involves a large domain, only a sample scenario was selected for this
experiment; otherwise, it would not be possible to finish everything within the timetable.
Hence, there is no clear indication of whether the process would succeed if applied to a larger
project rather than a toy context.
Construct validity
Mono-operation bias. Since the experiment includes a single independent variable, it
may under-represent the construct and thus not give the full picture of the theory.
Experimenter expectancies. Experimenters can bias the results of a study, both consciously and unconsciously, based on what they expect from the experiment. We tried to
reduce this threat by involving Ph.D. students from another institution, interested in
experimental practice, as mentioned earlier in this Chapter, to work together throughout the whole
study, which resulted in a set of meetings, during classes, to discuss, plan, and review the study.
Hypothesis guessing. When people take part in an experiment they might try to figure out
what its purpose and intended result are, and they are likely to base their behavior
on their guesses about the hypotheses. To minimize this risk, the formal definition and planning
of the experiment were carefully designed in advance, and we searched the literature for valid measures
to aid in hypothesis definition, although not all metrics are reported in the literature.
Conclusion validity
Reliability of measures. The validity of an experiment is highly dependent on the reliability
of the measures. Objective measures are more reliable than subjective ones, since they do
not depend on human judgement; with objective metrics, the replication of
a phenomenon will always have the same outcome. The expertise of the Ph.D. students in
experimental software engineering will thus be helpful in defining objective rather than subjective
metrics, improving the reliability of our measures and, consequently, of our results.
Heterogeneity of subjects. When the group is very heterogeneous, there is a risk that the
variation due to individual differences is larger than that due to the treatment, which can represent
a threat to conclusion validity. Since the experiment will be conducted with undergraduate
students in their last terms, and training on the required tools and techniques will be performed,
we can reduce this risk.
5.2.2
The Experimental Study Project
The project chosen for the experiment consists of an SPL project in the conference management
domain. The goal of the project is to develop the RiSE Chair product line, intended for paper
submission to conferences, journals, and related events, and for their management, including
control over the review life cycle.
The RiSE Chair platform was conceived based on widely used conference management
systems, such as EasyChair9, JEMS10 and CyberChair11. It comprises a core asset base
integrating many features to make it suitable for various conference models, thus enabling
products to be derived from this common base.
The code was developed on the J2EE platform, using the Spring and Hibernate frameworks. An instance of MySQL was used as the Database Management System.
In its initial planning, three products were derived from the core asset base:
• RChair Plus. This product is a complete system for paper submission and review,
targeted at journals and conferences. Besides, RChair Plus has an event management module
and aims to offer all the functionality required by users of similar systems.
• RChair Journal. This product is intended for paper submission to journals and their management. It aims to simplify the submission and review procedures and to offer resources
to manage journals.
• Smart RChair. This product is intended for paper submission to conferences and their
management. It aims to simplify the submission and review procedures and to offer
resources to manage conferences.
Figure 5.1 illustrates those products’ logos.
A total of 41 features was identified in the scoping analysis, as can be seen in Figure
5.2, which depicts the feature model. These features originated 8 core components. From this set of
features, we selected EventManagement to apply our process. This feature comprises the core
component EventManagement, which depends upon the components RiseSplCore, where the basic
entities are implemented, and AccessControl, which gives users access to the system.
9 http://www.easychair.org/
10 https://submissoes.sbc.org.br/
11 http://www.borbala.com/cyberchair/
Figure 5.1 RiSE Chair Products
The EventManagement component implements a variation point referring to event creation,
which can be configured in two ways: (1) from scratch, in which all information regarding an event
must be provided; and (2) based on a previous event, in which information reuse is encouraged.
This variability is represented in the feature model (Figure 5.2).
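To illustrate how such a variation point can be unit tested with shared test logic, consider the JUnit sketch below. The EventManagement API shown is hypothetical (the real component interface is not reproduced in this document); only the structure of the tests, with one test per variant reusing common assertions, is the point:

import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

// Minimal stubs standing in for the real component (hypothetical API):
class Event {
    private final String name;
    Event(String name) { this.name = name; }
    String getName() { return name; }
}

class EventManagement {
    // Variant (1): create an event from scratch.
    Event createEvent(String name) { return new Event(name); }
    // Variant (2): create an event based on a previous one, reusing its data.
    Event createEventFrom(Event previous, String name) { return new Event(name); }
}

public class EventCreationTest {

    private EventManagement em;

    @Before
    public void setUp() { em = new EventManagement(); }

    // Shared assertions reused by the tests of both variants.
    private void assertValidEvent(Event e, String expectedName) {
        assertNotNull(e);
        assertEquals(expectedName, e.getName());
    }

    @Test
    public void createEventFromScratch() {          // variant (1)
        assertValidEvent(em.createEvent("SBES 2010"), "SBES 2010");
    }

    @Test
    public void createEventBasedOnPreviousEvent() { // variant (2)
        Event previous = em.createEvent("SBES 2009");
        assertValidEvent(em.createEventFrom(previous, "SBES 2010"), "SBES 2010");
    }
}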
5.2.3
The Operation
After designing and planning the experiment, it must be carried out in order to collect the data
to be analyzed. This is the operation phase, which basically consists of three steps:
preparation, where subjects are chosen and the experiment material is prepared; execution, where
the subjects perform their tasks according to the different treatments and data is collected; and data
validation, where the collected data is validated (Wohlin et al., 2000). Each is detailed
below.
Preparation
The subjects were 32 undergraduate students from the MATB15 course of the Computer Science
Department at the Federal University of Bahia (UFBA), Brazil, representing all students regularly
enrolled in the course. All students had previously taken courses on object-oriented
programming, the Java programming language, and, most recently, on software verification and
validation, thus including testing concerns. Hence, we believe this set is representative enough to be
involved in this experiment.
The subjects had to agree to the research objectives, complying with the experiment
by signing the consent form, as previously mentioned in Section 5.2.1. The personal
performance of the subjects in the experiment will be kept confidential. Moreover, they were
informed that we would like to investigate the outcome of the process application; however,
they were not aware of which aspects we intended to study, including the stated hypotheses.
Figure 5.2 RiSE Chair Feature Model used in the experiment.
Execution
The experiment was conducted from November to December 2009, according to the definition
and planning documented, and data was collected. The activities involved in the experiment
were followed by the subjects. Table 5.1 sketches the agenda of the experiment training and
execution sessions.
The characterization, when subjects filled in the background form, enabled us to divide
the subjects into two groups: one to perform the experiment in an ad-hoc fashion (group 1), and
the other to apply the RiPLE-TE process (group 2). A balancing strategy based on the subjects'
expertise was applied in order to have two groups as balanced as possible.
Table 5.2 presents the initial division of subjects into their respective groups. The division
is not equal: since the rounds would be held on different days, some subjects had to be transferred
from group 1 to group 2 and vice-versa, so that everyone could participate. We believe this
fact does not negatively influence the results.
Table 5.3 and Table 5.4 present a condensed view of the subjects' profiles, for group
1 (which performed the experiment in an ad-hoc fashion) and group 2 (which performed
the experiment using the RiPLE-TE process), respectively. In these tables, their experience with
programming, Java, testing, SPL and JUnit is presented, in terms of months dealing with each
aspect. Experience with Eclemma is not presented, since no subject had
experience with this tool. We believe this lack of prior expertise does not represent a threat to
the validity of the experiment, since the commands required were addressed
in the training sessions and are considered elementary for subjects in the last terms
of a Computer Science undergraduate course. Moreover, subjects had at their disposal a tutorial
with a step-by-step guide on how to correctly use the tool in the experimental study context.
Their participation in industrial development and test projects is presented as
well. This information is a clear indicator of how experienced the subjects are in the topics
required for this study. The tables also show the subjects' English reading proficiency, in a
trichotomous categorization: advanced, intermediate and basic. This information may impact
the results, since all the artifacts that comprise the set of instruments are written in English. The
tables finally show their knowledge of unit testing frameworks other than
JUnit – [A] NUnit12, [B] RSpec13, [C] Test::Unit14, [D] Selenium, and
12 NUnit is a unit-testing framework for all .Net languages - http://www.nunit.org/
13 RSpec provides a behaviour driven development framework for Ruby - http://rspec.info/
14 Test::Unit is a Ruby unit testing framework - http://test-unit.rubyforge.org/
[E] other. The form included a field in which they could write down tools with which they had some
expertise. Most probably, subjects who have already worked with any unit testing framework
can easily understand how JUnit works.
Table 5.2 Subjects divided into groups

Group | Subjects ID                                                  | # of subj.
1     | 1, 2, 3, 5, 6, 8, 10, 11, 14, 18, 21, 22, 23, 24, 25, 26, 27 | 17
2     | 4, 7, 9, 12, 13, 15, 16, 17, 19, 20, 28, 29, 30, 31, 32      | 15
Table 5.3: Subjects' Profile from Group 1 - Ad-hoc fashion

ID | English Reading | Ind. Dev. Proj. | Ind. Test. Proj. | Programming* | Java* | Testing* | SPL* | JUnit* | Testing Tools
 1 | advanced     | No  | Yes | 48 |  0 |  2 | 0 | 2 | A
 2 | intermediate | No  | Yes | 24 |  0 |  5 | 0 | 1 | A
 3 | advanced     | Yes | Yes | 18 |  0 | 12 | 0 | 0 | C
 5 | advanced     | No  | Yes | 50 | 30 |  4 | 0 | 1 | C
 6 | intermediate | No  | No  | 12 |  6 |  6 | 0 | 2 | B,C
 8 | advanced     | No  | Yes | 53 | 30 | 11 | 0 | 2 | C,E
10 | intermediate | No  | No  | 60 | 36 |  6 | 0 | 6 | E
11 | advanced     | No  | No  | 48 | 30 |  6 | 0 | 6 | -
14 | intermediate | No  | Yes | 42 | 24 | 12 | 0 | 6 | C
18 | basic        | No  | No  | 24 | 24 |  0 | 0 | 0 | -
21 | advanced     | No  | No  | 54 |  6 |  3 | 0 | 1 | C
22 | advanced     | No  | Yes | 48 |  6 |  4 | 0 | 4 | A
23 | intermediate | No  | Yes | 48 |  6 |  4 | 0 | 0 | C
24 | advanced     | No  | Yes | 48 | 30 |  4 | 0 | 4 | -
25 | intermediate | No  | Yes | 30 | 30 |  4 | 0 | 4 | A
26 | intermediate | No  | Yes | 30 | 12 |  3 | 0 | 3 | -
27 | intermediate | No  | Yes | 36 | 18 |  4 | 0 | 0 | C

(*) the experience is expressed in months.
Table 5.4: Subjects' Profile from Group 2 - RiPLE-TE

ID | English Reading | Ind. Dev. Proj. | Ind. Test. Proj. | Programming* | Java* | Testing* | SPL* | JUnit* | Testing Tools
 4 | advanced     | No  | Yes | 30 |  0 |  2 | 0 |  0 | E
 7 | intermediate | Yes | Yes | 50 | 36 |  6 | 0 |  3 | C
 9 | intermediate | Yes | Yes | 40 | 28 | 10 | 0 |  2 | D
12 | basic        | Yes | Yes | 32 | 24 |  6 | 0 |  3 | -
13 | intermediate | No  | Yes | 24 | 24 |  6 | 0 |  2 | B,C
15 | advanced     | Yes | Yes | 30 | 24 | 18 | 0 | 18 | D,E
16 | advanced     | No  | Yes | 54 | 30 | 12 | 0 |  8 | C,E
17 | intermediate | No  | Yes | 24 |  3 |  2 | 0 |  0 | -
19 | advanced     | No  | Yes | 60 |  6 |  3 | 0 |  0 | E
20 | intermediate | No  | No  | 36 |  4 |  4 | 0 |  0 | -
28 | intermediate | No  | Yes | 48 | 24 |  3 | 0 |  2 | A,E
29 | advanced     | No  | No  | 32 | 24 |  4 | 0 |  2 | E
30 | basic        | No  | No  | 36 | 24 |  4 | 0 |  0 | C
31 | advanced     | No  | No  | 40 | 24 |  3 | 0 |  3 | B
32 | intermediate | No  | Yes | 36 | 12 |  4 | 0 |  0 | E

(*) the experience is expressed in months.
The first training consisted of an explanation of the SPL topic, in which subjects from both
groups were introduced to this software development approach. Moreover, the experiment's
objective and their roles in this study were explained.
Then, the tools to be used in the experiment were the subject of training: the JUnit
framework15 and the Eclemma code coverage tool16. The former was used to create test
cases and automate unit test suites; the latter, to gather the code coverage achieved by their test
cases, as requested.
The subjects were told not to implement new features, but rather to analyze the components
they were given and implement test cases to evaluate them. Thus, their tasks involved
analyzing the code and specifications, creating test assets, executing test cases and suites, and reporting
15 http://www.junit.org/
16 http://www.eclemma.org/
the findings in the proper form (Appendix B.3). They had to perform unit tests only on the
EventManagement component, although they could access the code of the other components
and their specification, in order to know the flow of information as well as the classes, methods and
attributes. In addition, subjects were encouraged to reuse the test cases they created in order to save
effort, given the variability present in the portion of code at their disposal. The
source code they produced, containing the unit test cases, was recorded in a repository for
further analysis.
At the end of the experiment, each of the 32 participants completed a feedback questionnaire,
either type A (Appendix B.4) or type B (Appendix B.5). Moreover, type C (Appendix B.6) was
answered by 6 randomly chosen subjects from group 1.
Data Validation
Data was collected from 32 subjects. However, data from 2 subjects (IDs 22 and 29) was
removed, since they either did not participate in all activities of the experiment or did not
complete the forms as requested at the beginning of the experiment. Table 5.5 shows the final
grouping of subjects. Although we were counting on all subjects, we believe that the absence of
two of them does not invalidate our work, in terms of statistical analysis and interpretation of
the results.
Table 5.5 Final grouping of subjects

Group | Subjects ID                                              | # of subj.
1     | 1, 2, 3, 5, 6, 8, 10, 11, 14, 18, 21, 23, 24, 25, 26, 27 | 16
2     | 4, 7, 9, 12, 13, 15, 16, 17, 19, 20, 28, 30, 31, 32      | 14

5.2.4
The Analysis and Interpretation
The third phase of the experiment is the analysis and interpretation of the gathered data. We
want to be able to draw valid conclusions; therefore, we analyzed every artifact produced by
the subjects, including the error reporting form, the feedback questionnaires and the source code
of the test cases they developed using JUnit. The analysis was performed using descriptive statistics
and is described along this section.
We analyzed the results of both groups 1 and 2, considering all group members,
but we also analyzed the results considering the expertise of the subjects. We decided to create
two subgroups for each group, named the expert and non-expert subgroups.
As a means of deciding how to categorize each subject into one of them, every element of the
profile received a weighting factor, and all were arranged together in a formula, $S_{XP}$, as follows:
\[ S_{XP} = 2\theta + \frac{3\alpha + 5\beta}{a} + \frac{3\gamma + 2\varepsilon}{b} + 5\delta \]
The elements that compose this formula come from the subjects' profiles and are described next:
α = experience in programming
β = experience in Java
γ = experience in testing
δ = experience in SPL
ε = experience in JUnit
σ = experience in industrial development projects
ω = experience in industrial testing projects
a = 2 if the subject has experience in industrial development projects (σ), 3 otherwise
b = 2 if the subject has experience in industrial testing projects (ω), 3 otherwise
θ = English reading expertise (scale: 1 - basic, 2 - intermediate, 3 - advanced)
Table 5.6 Subjects' expertise, calculated through the $S_{XP}$ formula.

Group 1:
ID:    1     2     3   5      6     8      10   11   14   18   21     23  24     25     26  27
Score: 77.3  45.7  41  156.7  44.3  168.8  194  159  143  102  101.7  95  155.7  130.7  84  107

Group 2:
ID:    4   7    9      12   13     15   16     17    19   20  28     30   31   32
Score: 49  177  145.3  122  107.3  137  175.3  49.5  110  72  140.3  124  127  92
Table 5.7 Distribution of subjects considering the expertise coefficient

            | GROUP 1                        | GROUP 2
Experts     | 5, 8, 10, 11, 14, 24, 25       | 7, 9, 12, 15, 16, 28, 30, 31
Non-Experts | 1, 2, 3, 6, 18, 21, 23, 26, 27 | 4, 13, 17, 19, 20, 32
Therefore, all subjects received a score according to their expertise. Table 5.6 shows the
scores for each subject.
The resulting scores, composing the data set, were arranged in an interval. The median,
which represents the middle value of the data set, was chosen as the threshold: the 50% of the
samples below this value represent the non-experts, and the 50% above it represent the experts.
The final distribution of subjects is presented in Table 5.7.
The formula uses arbitrary weights, since no evidence was found in the literature that
could aid in defining a mathematical formula suited to such a grouping strategy.
Nevertheless, we established a scoring in which we considered the importance of each element
composing the profile.
Test case effectiveness
Table 5.8 Amount of Designed Test Cases

                | Group 1          | Group 2
Experts         | 93 (mean: 13.3)  | 78 (mean: 9.75)
Non-Experts     | 73 (mean: 8.1)   | 36 (mean: 6)
The whole group | 166 (mean: 10.4) | 114 (mean: 8.1)
Table 5.9 Amount of Defects Found

                | Group 1         | Group 2
                | Valid | Invalid | Valid | Invalid
Experts         |  55   |    1    |  35   |    8
Non-Experts     |  44   |    5    |  19   |    1
The whole group |  99   |    6    |  54   |    9
Regarding the use and applicability of the TCE metric, in order to determine whether a test
suite was effective, we compared the TCE of both groups. The TCE was calculated as the ratio
of the total number of valid defects found to the total number of designed test cases; for instance,
for the whole groups, TCE = 99/166 × 100 ≈ 59.6% for Group 1 and TCE = 54/114 × 100 ≈ 47.3%
for Group 2.
Table 5.10 Test Case Effectiveness

                | Group 1 | Group 2
Experts         | 59.1%   | 44.8%
Non-Experts     | 60.1%   | 52.7%
The whole group | 59.6%   | 47.3%
As a result, the Null Hypothesis $H0_1$ was not rejected, since the observed values satisfied
$\mu_{TCE}^{ADHOC} > \mu_{TCE}^{RIP}$ in all situations, both when considering the subgroups
individually and when considering the whole groups.
Quality of defects found
In terms of valid defects found, Figure 5.3 shows that subjects using the process found fewer
defects than subjects using the ad-hoc approach. Table 5.11 gives detailed information regarding
the defects found by the subjects.
Figure 5.3 BoxPlot of defects found by groups, including outliers.
All defects found were tabulated so that similarities could be extracted. We then identified
12 groups of defects and classified them according to their associated Difficulty and Severity, as can
be seen in Table 5.12. The classification was based on a discussion with the
Ph.D. students, considering their industry expertise.
Besides, when false positives were reported by subjects, these were considered invalid
errors and were not included in the score analysis.
Table 5.11 Defects found by groups

Defects Found | Group 1 | Group 2
Min.          |  1.000  |  0.000
1st Quartile  |  5.000  |  0.250
Median        |  5.000  |  4.000
Mean          |  6.188  |  3.857
3rd Quartile  |  6.500  |  6.000
Max.          | 14.000  | 12.000
Std. Dev.     |  3.087  |  3.378
The score was then extracted from a function of the defects found, defined on the basis of
(IEEE, 1988). Every defect is classified according to its Difficulty – f(DD) – and Severity – f(SV).
Table 5.12 presents the classification. The weighting factors (k for Difficulty and r for Severity)
are displayed at the bottom of the Table. The values range from low, representing trivial
defects, to high, representing critical defects.
The formulas for calculating the score ($S_{DS}$) are presented below. Sub corresponds to the set
of subjects.
\[ f(DD) = \sum_{E_i \in D_L} E_i \cdot k_l + \sum_{E_i \in D_M} E_i \cdot k_m + \sum_{E_i \in D_H} E_i \cdot k_h \]
\[ f(SV) = \sum_{E_i \in S_L} E_i \cdot r_l + \sum_{E_i \in S_M} E_i \cdot r_m + \sum_{E_i \in S_H} E_i \cdot r_h \]
\[ S_{DS} = \sum_{i \in Sub} S_i , \qquad S_i = \frac{f(DD) + f(SV)}{N} \]
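A minimal sketch of this scoring in Java is shown below, assuming the coefficients of Table 5.12 (k = 1, 2, 3 and r = 2, 4, 6, from low to high) and taking N as the normalizing divisor of the formula above (its exact definition is our reading, not spelled out in the text); the example defects are classified as in Table 5.12:

import java.util.Arrays;
import java.util.List;

// Sketch of the scoring: each valid defect contributes its Difficulty
// coefficient (k) plus its Severity coefficient (r); a subject's score is
// the normalized sum over the defects he or she found.
public class DefectScore {

    enum Level { LOW, MEDIUM, HIGH }

    static int k(Level difficulty) {                 // k = 1, 2 or 3 (Table 5.12)
        switch (difficulty) {
            case LOW: return 1;
            case MEDIUM: return 2;
            default: return 3;
        }
    }

    static int r(Level severity) {                   // r = 2, 4 or 6 (Table 5.12)
        switch (severity) {
            case LOW: return 2;
            case MEDIUM: return 4;
            default: return 6;
        }
    }

    static class Defect {
        final Level difficulty, severity;
        Defect(Level d, Level s) { difficulty = d; severity = s; }
    }

    // Score of one subject: sum of k + r over the found defects, divided by N.
    static double subjectScore(List<Defect> found, int n) {
        double sum = 0;
        for (Defect d : found) sum += k(d.difficulty) + r(d.severity);
        return sum / n;
    }

    public static void main(String[] args) {
        // Hypothetical example: a subject found E1 (low difficulty, high
        // severity) and E9 (high difficulty, high severity), per Table 5.12.
        List<Defect> found = Arrays.asList(
                new Defect(Level.LOW, Level.HIGH),    // contributes 1 + 6 = 7
                new Defect(Level.HIGH, Level.HIGH));  // contributes 3 + 6 = 9
        System.out.println(subjectScore(found, 2));   // (7 + 9) / 2 = 8.0
    }
}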
These formulas were created for the purpose of this experiment. Since we used the same
formula for both treatments and their associated subgroups, and the same calculation will be
performed in the second experiment, for this specific context the values do not
negatively influence the results. We have no evidence about the effectiveness of this formula in
other contexts; it might not be applicable to other experimental contexts unless a series of
applications proves otherwise. The coefficients would probably have to be calibrated before being
applied in another experiment.
When we consider the quality of the valid defects, in terms of Difficulty and Severity levels,
we have indications that subjects from both groups achieved almost the same results. This can be
seen in Figure 5.4, with 6 boxplots (line 1 shows the group of experts from
Table 5.12 Difficulty and Severity of defects found

Difficulty:
  Low (l):    E1, E2, E3, E5, E6, E11, E12    – Coefficient (k) = 1
  Medium (m): E4, E7, E8                      – Coefficient (k) = 2
  High (h):   E9, E10                         – Coefficient (k) = 3

Severity:
  Low (l):    E4, E7, E10                     – Coefficient (r) = 2
  Medium (m): E11                             – Coefficient (r) = 4
  High (h):   E1, E2, E3, E5, E6, E8, E9, E12 – Coefficient (r) = 6
Table 5.13 Amount of defects found in terms of Difficulty and Severity

                     | Difficulty          | Severity
                     | Low | Medium | High | Low | Medium | High
Group 1, Experts     |  33 |    8   |  14  |  12 |    6   |  37
Group 1, Non-Experts |  33 |    6   |   5  |   9 |    2   |  33
Group 2, Experts     |  24 |    7   |   4  |   4 |    2   |  29
Group 2, Non-Experts |  16 |    0   |   3  |   3 |    1   |  15
the two treatments; line 2 shows the group of non-experts; and line 3 shows the general results,
including both groups) generated according to the score of valid defects found, as a function of
the Difficulty and Severity coefficients, with values including outliers in column (A) and
without outliers in column (B).
By analyzing the boxplots in which the experts are explored, we can see a clear
pattern that subjects who did not use the process obtained better results. Thus, it may be possible to
confirm the results through hypothesis testing. The t-test (unpaired, two-tailed) with 95% confidence
is shown in Table 5.14 (group of experts), Table 5.15 (non-experts) and Table 5.16 (both groups).
Table 5.14 Results from the t-test applied to Test Score - Experts

Degrees of freedom (df) | p-value | t-value
13                      | 0.1390  | 1.5762
In the analysis, considering all scenarios, the t-test did not reject the Null Hypothesis $H0_2$.
Thus, we can conclude that there was no gain in using the process instead of an ad-hoc approach.
Figure 5.4 BoxPlots of Scores from defects found.
Table 5.15 Results from the t-test applied to Test Score - Non-Experts

Degrees of freedom (df) | p-value | t-value
13                      | 0.278   | 1.1321
Table 5.16 Results from the t-test applied to Test Score - Both groups

Degrees of freedom (df) | p-value | t-value
28                      | 0.1079  | 1.6609
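For reference, the same kind of unpaired two-tailed t-test can be reproduced with a standard statistics library. The sketch below uses Apache Commons Math; the score arrays are placeholders rather than the actual experimental data, and the pooled-variance (homoscedastic) variant is chosen because it matches the degrees of freedom (n1 + n2 - 2) reported in Tables 5.14-5.16:

import org.apache.commons.math3.stat.inference.TTest;

// Sketch: unpaired two-tailed t-test at 95% confidence, as applied to the
// defect scores. The arrays below are placeholders; the real input would be
// the per-subject scores of each treatment group.
public class ScoreComparison {
    public static void main(String[] args) {
        double[] adhocScores = {8.0, 6.5, 7.2, 5.9, 9.1, 6.8, 7.7};       // placeholder data
        double[] ripleScores = {6.1, 5.4, 7.0, 4.8, 6.6, 5.2, 6.9, 5.8};  // placeholder data

        TTest tTest = new TTest();
        // Pooled-variance (homoscedastic) version: df = n1 + n2 - 2.
        double t = tTest.homoscedasticT(adhocScores, ripleScores);
        double p = tTest.homoscedasticTTest(adhocScores, ripleScores);

        System.out.printf("t = %.4f, p = %.4f%n", t, p);
        // Reject H0 at the 95% confidence level only if p < 0.05.
        System.out.println(p < 0.05 ? "reject H0" : "fail to reject H0");
    }
}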
Test coverage
Table 5.17 presents the average test coverage of each group, considering (1) the subgroups and (2)
the whole groups. The values exhibited in this table demonstrate that the subjects who
did not apply the process (group 1) covered a larger amount of code, both in the two subgroups
and in the overall value. Thus, the coverage is consistent with the Null Hypothesis
$H0_3 : \mu_{TC}^{RIP} \leq \mu_{TC}^{ADHOC}$, which was not rejected.
Table 5.17 Test Coverage

                | Group 1 | Group 2
Experts         |   73%   |   71%
Non-Experts     |   61%   |   53%
The whole group |   66%   |   63%
Approach effectiveness and difficulties in using the process
Data from the subjects who applied the process, composing Group 2, form the input for this
analysis. Figure 5.5 presents the distribution of the process effectiveness according to the subjects'
opinions. The factors listed next were considered, and subjects had to attribute a YES/NO
value to each:
1. Subject needed additional information other than the available artifacts.
2. RiPLE-TE guidelines were properly followed.
3. RiPLE-TE was efficient in finding defects.
4. RiPLE-TE was effective in finding defects.
5. RiPLE-TE was helpful in finding more defects.
Figure 5.5 Distribution of RiPLE-TE effectiveness.
In addition, a text field was available for subjects to freely comment on their choices.
The opinions were gathered from the feedback questionnaire, applied right after the subjects had
performed the experiment. The majority of subjects approved the use of the process, with ratios
higher than 60%, except for the factor concerning how effective the process was. This factor deserves
special attention: an expert subject (ID 7) judged the process ineffective in terms of
finding defects since, according to his comments, it influences neither the design of testing assets
nor the finding of defects. In general, the comments regarding difficulties
referred to a lack of expertise in either the Java language or the Eclipse IDE, which directly
impacted the subjects' activities.
Finally, Figure 5.5 shows that in only one factor the Null Hypothesis, $H0_4 : \mu_{EUP}^{RIP} \geq 30\%$,
was rejected, while in the remaining ones it was confirmed.
5.3
Second Experiment
The second experiment replicated the first one in terms of Goal, Questions and Metrics. Besides,
the same project (including source code, specifications and tool support) was used.
However, unlike the first experiment, the design type used in the second was one factor
with one treatment (Wohlin et al., 2000). Our intention was to investigate the results of the
replication of the RiPLE-TE unit testing process, using as baselines the results from the
first experiment regarding the application of the process, i.e. the results of Group
2. This helps in finding out how much confidence can be placed in the results of the
experiment.
Along this section we describe the significant differences between the experiments and the
results achieved this time.
5.3.1
The Planning
The experiment was conducted in a Software Engineering post-graduate course at Salvador
University (UNIFACS), Brazil. Along the course, students have classes covering the whole
development life cycle, such as: Analysis & Design of Object-Oriented Software (a 28-hour
course), Software Requirements (30 hours), Software Reuse and Component-Based Development
(28 hours), Software Product Lines (15 hours), Software Quality (30 hours), Software Testing and
Inspection (30 hours), among other disciplines. Thus, prior to joining the experiment, students had
already attended the classes that form the prerequisites for performing this experimental study.
As in the first experiment, all subjects were informed about the objectives of the experiment
and about their roles and activities. The instruments used for evaluation, such as the characterization
form and the feedback questionnaires, were also applied to this sample of subjects.
The schedule of the experiment was different from the prior one. We could not count on 5 days
to conduct the experiment, due to time constraints. The format of the course let us use
two 4-hour classes as experiment sessions. As our intention was to apply only the RiPLE-TE, we
rearranged the calendar in order to have two sessions: one for characterization (consent form
and questionnaire filling), explanation of the experiment, and training on the JUnit
framework and Eclemma; and the other for performing the experiment. This was feasible because
the subjects had background on SPL and some theoretical grounding in testing: before attending
the experiment, they had had five 4-hour intensive classes on the topic, whose content
satisfies the experimental study requirements.
The measures evaluated were the same, differing only in the sense that this experiment was
intended to compare the results of applying the process against the first round. Thus, the comparison
baselines were the resulting values from the first round; the only exception is the value of $\mu_{EUP}$,
which remained the same. Therefore, the hypotheses were changed, as listed below:
• Null Hypothesis ($H_0'$).
\[ H0'_1 : \mu_{TCE}^{RIP} \leq 47\% \]
\[ H0'_2 : \mu_{QDF}^{RIP} \leq 33\% \]
\[ H0'_3 : \mu_{TC}^{RIP} \leq 63\% \]
\[ H0'_4 : \mu_{EUP}^{RIP} \geq 30\% \]
• Alternative Hypothesis ($H_1'$).
\[ H1'_1 : \mu_{TCE}^{RIP} > 47\% \]
\[ H1'_2 : \mu_{QDF}^{RIP} > 33\% \]
\[ H1'_3 : \mu_{TC}^{RIP} > 63\% \]
\[ H1'_4 : \mu_{EUP}^{RIP} < 30\% \]
The sampling technique for subject selection was convenience sampling, and the study
was conducted as a single object study, i.e. a study conducted on a single subject and a single
object.
Validity Evaluation
Possible threats were also anticipated in this experiment, to reduce the risk of making the results
invalid. Each is detailed below, according to the categories listed in (Wohlin et al., 2000)
and applied in the first experiment.
Internal validity
Maturation. As in the first experiment, the practical test session will be conducted in a
continuous 4-hour period, in which subjects may be affected negatively (feeling bored or tired)
or positively (learning) during the course of the experiment. The same countermeasure was
applied: subjects were allowed to stop for short breaks, under the constraint of not sharing
information about the experiment with their colleagues.
Testing. Subjects may respond differently at different times during the experiment, since
they acquire knowledge of how the test is conducted. If familiarization with the tests is needed,
it is important that the results of the test are not fed back to the subjects, in order not to support
unintended learning.
External validity
Generalization of subjects. This experiment included subjects from a post-graduate course,
which comprises disciplines that fulfill the requirements to attend it. Unlike the first experiment,
in which the available time was sufficient, this experiment will be held in a time-constrained
environment. Hence, we count on the knowledge acquired by the subjects in prior disciplines,
as well as their expertise from both industry and academic projects.
Generalization of scope. The experiment will be conducted in a time frame defined according
to the schedule of the post-graduate course, which may affect the overall results. The scope is
tied to the course schedule in order to make its completion feasible. Thus, although the project
in question involves a large domain, only a sample scenario was selected for this experiment,
the same one applied in the first experiment.
Construct validity
Experimenter expectancies. The experimenters can bias the results of a study, both consciously
and unconsciously, based on what they expect from the experiment. This experiment will serve
as a basis for future replications, in the same way the first experiment provided the baselines
for this one. Hence, it is indeed necessary to report the results as actually gathered, without
distortions.
Hypothesis guessing. In order to minimize this risk, the formal definition and planning of
the experiment were carefully designed in advance, and we searched the literature for valid
measures to aid in hypothesis definition, although not all metrics are reported there.
Conclusion validity
Reliability of measures. The results obtained in the first experiment will serve as baselines
for this second round. This fact, associated with the expertise of Ph.D. students in experimental
software engineering, helps in defining objective rather than subjective metrics, thus improving
the reliability of our measures and, consequently, of our results.
Heterogeneity of subjects. As experienced in the first experiment, this scenario of heterogeneity
will take place again. In a post-graduate course, the students' profiles are even more divergent
than in the prior scenario, since students have different ages, expertise and objectives. The
variation due to individual differences may then be larger than the variation due to the treatment,
which represents a threat to conclusion validity. Hence, the analysis will consider two groups,
split according to average expertise, in order to reduce this risk.
5.3.2 The Operation
The experimental study was conducted on January 15th and 16th, 2010, at Salvador University,
during the Software Testing course, part of the Software Engineering Post-Graduate Program.
We had at our disposal a lab containing PCs with the structure we needed, in terms of support
tools and IDE, to perform and report the tests. A total of 13 students were involved in this
experiment. Table 5.18 presents a condensed view of the subjects' profile.
Table 5.18: Subjects' Profile in the 2nd Experimental Study

Subject ID | English Reading | Particip. in Industrial Dev. Project | Particip. in Industrial Test. Project | Programming* | Testing* | SPL* | JUnit* | Testing Tools
1  | basic        | Yes | No  | 7  | 0 | 1 | 0 | -
2  | advanced     | Yes | No  | 3  | 0 | 0 | 0 | -
3  | intermediate | Yes | No  | 5  | 0 | 0 | 0 | -
4  | intermediate | Yes | No  | 5  | 0 | 3 | 0 | -
5  | intermediate | Yes | No  | 19 | 0 | 0 | 0 | -
6  | basic        | No  | No  | 13 | 0 | 0 | 0 | -
7  | basic        | Yes | Yes | 9  | 6 | 0 | 6 | -
8  | basic        | Yes | Yes | 10 | 6 | 0 | 4 | -
9  | basic        | No  | Yes | 2  | 1 | 0 | 2 | -
10 | intermediate | Yes | No  | 4  | 0 | 2 | 0 | D
11 | basic        | Yes | No  | 2  | 0 | 1 | 0 | -
12 | intermediate | No  | No  | 13 | 0 | 1 | 0 | D
13 | intermediate | Yes | No  | 22 | 0 | 0 | 0 | -

(*) the experience is expressed in years.
The SXP formula was also used to split the subjects according to their expertise. Table 5.19
shows the score for each subject, and Table 5.20 presents the distribution of subjects into the
Experts and Non-Experts groups (a sketch of the split rule follows the tables).
Table 5.19: Subjects' Expertise, calculated through the SXP formula - 2nd exp.

Subject ID | 1   | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9  | 10  | 11  | 12  | 13
Score      | 188 | 60 | 94 | 274 | 346 | 158 | 344 | 338 | 68 | 196 | 116 | 220 | 400
Table 5.20: Distribution of Subjects considering the expertise coefficient - 2nd exp.

EXPERTS     | 4, 5, 7, 8, 12, 13
NON-EXPERTS | 1, 2, 3, 6, 9, 10, 11
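To make the split rule concrete, the sketch below is our own reconstruction, not the experiment's
tooling, and all names in it are ours: each subject whose SXP score exceeds the group average
(about 215.5 for the scores in Table 5.19) is assigned to the Experts group, which reproduces
exactly the distribution shown in Table 5.20.

    import java.util.ArrayList;
    import java.util.List;

    public class ExpertiseSplit {
        public static void main(String[] args) {
            // SXP scores of subjects 1..13, taken from Table 5.19.
            int[] scores = {188, 60, 94, 274, 346, 158, 344, 338, 68, 196, 116, 220, 400};
            int total = 0;
            for (int s : scores) total += s;
            double mean = (double) total / scores.length;   // ~215.5

            List<Integer> experts = new ArrayList<Integer>();
            List<Integer> nonExperts = new ArrayList<Integer>();
            for (int id = 1; id <= scores.length; id++) {
                if (scores[id - 1] > mean) experts.add(id); else nonExperts.add(id);
            }
            System.out.println("Experts: " + experts);         // [4, 5, 7, 8, 12, 13]
            System.out.println("Non-Experts: " + nonExperts);  // [1, 2, 3, 6, 9, 10, 11]
        }
    }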
5.3.3 The Analysis and Interpretation
This section describes the results based on the analysis of the feedback questionnaires as well as
the error reporting forms and the source code the subjects delivered at the end of the experiment.
By looking at the boxplot in Figure 5.6, in which the distribution of subjects in the first (A)
and in the second (B) experiments is presented, considering all 13 subjects, we can clearly
notice that subjects in the second experiment have greater experience than the subjects who ran
the previous one. As the number of subjects in this round was not large enough, we considered
all of them in the analysis.
Figure 5.6 Boxplot with the distribution of subjects in (A) first and (B) second experiments.
Table 5.21: Amount of Designed Test Cases - 2nd exp.

Experts         | 58 (mean: 9.7)
Non-Experts     | 56 (mean: 8)
The whole group | 114 (mean: 8.8)
Table 5.22: Amount of Defects Found - 2nd exp.

        | Experts      | Non-Experts    | The whole group
Valid   | 30 (mean: 5) | 29 (mean: 4.1) | 59
Invalid | 0            | 6 (mean: 0.9)  | 6
Test case effectiveness

In this first analysis, we noticed that the results for the non-expert and expert subgroups
were very similar. Although experts did not report any invalid defect, we indeed expected
a better performance from them. The overall result refuted the Null Hypothesis H0'1, since
µTCE-RiP (51.7%) > 47%, as can be seen in Table 5.23; these figures are consistent with the ratio
of valid defects to designed test cases in Tables 5.21 and 5.22 (e.g., 59/114 ≈ 51.7% for the
whole group).
Quality of defects found

We once again applied the formula that calculates the score (SDS) as a function of severity and
difficulty levels, based on the number of defects found during the experiment, as shown in
Table 5.24. The score obtained with the formula is presented in Table 5.25. As the same
formula was applied, we could make the comparison with confidence. The Null Hypothesis
H0'2 was refuted, since µQFD-RiP (35.1%) > 33%.
Test coverage

Even though subjects had very little time to understand the business rules as well as the process,
they covered a large amount of code. The results (Table 5.26) show better numbers for the
second experiment, which refuted the Null Hypothesis H0'3, since µTC-RiP (81%) > 63%.
Table 5.23: Test Case Effectiveness - 2nd exp.

Experts | Non-Experts | The whole group
51.7%   | 51.8%       | 51.7%
Table 5.24: Amount of defects found in terms of Difficulty and Severity - 2nd exp.

            |       Difficulty      |        Severity
            | Low | Medium | High   | Low | Medium | High
Experts     | 25  | 1      | 4      | 3   | 8      | 19
Non-Experts | 25  | 3      | 1      | 1   | 3      | 25
Table 5.25: Scores - 2nd exp.

Experts | Non-Experts | The whole group
46.2%   | 32.1%       | 35.1%
Approach effectiveness and difficulties in using the process
Unlike in the previous experiment, as the subjects from the second round have, on average,
more experience than the first group, they were more concerned about the availability of
detailed specification documents, in order to avoid wasting time. Among the non-experts, 2 out
of 7 subjects (29%) mentioned the lack of detailed specification documents and/or comments
in the source code as the main problem they found. In the experts group, 3 out of 6 (50%) also
complained about the lack of documentation. According to them, the activity would have
yielded better results if further documentation had been provided.
On the non-experts' side, 5 out of 7 subjects (71%) reported that they followed the RiPLE-TE
process. On the experts' side, 4 out of 6 subjects (67%) stated they followed the RiPLE-TE
during the experiment.
All non-expert subjects - 100% - reported positively regarding the efficiency of the approach.
On the other hand, only 67% of the experts confirmed the process efficiency.
Regarding effectiveness, 6 out of 7 non-experts (86%) reported positively, while 67% of the
experts reported positively about the effectiveness of the process.
Hence, the Null Hypothesis H0'4: µEUP-RiP ≥ 30% was confirmed in all factors analyzed,
with the observed values exceeding the baseline.
Table 5.26: Test Coverage - 2nd exp.

Experts | Non-Experts | Average
79.8%   | 84%         | 81%
5.4 Lessons Learned
After concluding the experimental studies, we have gathered useful information that can serve
as a guide to future replications of experiments following the structure presented along this
chapter, and even to other general experiments in SPL testing practice.
Some important aspects should be considered, especially the ones seen as limitations in
these two initial experiments. The general impressions gathered from both experiments are
listed in the following:
Training. Subjects reported a lack of expertise in the tools used in the experiment, especially
the JUnit framework. It is very important to either have a sample of subjects with a certain
knowledge of the tools or to conduct more training sessions before the experiment. JUnit is
indeed a complex framework for beginners acting as subjects in an experiment which directly
involves unit testing on the Java platform. We experienced this issue in both experiments. In
general, the subjects who reported a lower level of experience with the framework in the
characterization activity - see the corresponding table for the first experiment and Table 5.18
for the second - were the same ones who reported this issue.
Questionnaires. After concluding the experiment, we noticed that useful information was
not collected, such as the subjects' impressions of using the process or the points missed by
the approach. On the other hand, we collected information that we did not analyze, such as the
subjects' satisfaction with the training sessions.
Project. We would select a project with more specification available in advance. Subjects,
mainly from the second experiment, which included more experts, and more specifically subjects
with large industrial experience, complained about the lack of documentation to help them
create the test assets; they asked for more test scenarios. Moreover, as we are dealing with an
SPL project, we really need a project containing many variation points and variants, in order
to analyze the impact of the process on this topic. In the project used in this experimental
study, although the portion of code we chose contained variabilities, only a few subjects, in the
experts group, in both experiments, reused the assets. Most subjects created everything from
scratch rather than reusing.
Measurement. We did not report on the reused artifacts, since we did not establish a metric to
evaluate that; there was no evidence in the literature that would help us in defining such a
metric, and it is really necessary for the next experiments. Furthermore, metrics such as DRE -
defect removal efficiency (Craig and Jaskiel, 2002), a widely applied measure that is only
meaningful when computed over a whole project (see the formulation below) - should be
included when all test levels are considered. Moreover, metrics to evaluate the ease of use
should also be collected.
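For reference, the standard formulation of DRE, given here as a sketch rather than a value we
measured, is

    DRE = E / (E + D) × 100%

where E is the number of defects found and removed before release and D is the number of
defects found after release. This is why the metric is only meaningful when a whole project,
covering all test levels, is considered.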
5.5 Chapter Summary
This chapter presented the experimental studies conducted in order to evaluate the RiPLE-TE
unit testing process. It included the definition, planning, operation, analysis and interpretation
of the two experiments performed, which followed the guidelines for conducting experimental
studies in software engineering defined by Wohlin et al. (2000). The first experiment was
conducted at the Federal University of Bahia, involving undergraduate students, and the second
was conducted at Salvador University, involving post-graduate students. The SPL project used
in this experimental study belongs to the conference management domain.
The experiments analyzed the effectiveness of using the proposed unit testing process for SPL,
in order to gather empirical evidence on the examination of this test level in SPL projects.
In the first experiment, the design applied was one factor with two treatments, in which the
results of applying the process were compared to the results obtained with no specific process,
i.e., ad hoc. As a result, all null hypotheses were confirmed.
The second experiment followed the same structure as the first, in terms of phases, goals
and questions, with slight differences in the measures. The motivation of this second experiment
was to evaluate the use of the process, with results from the first experiment serving as baseline
values. This time, only one null hypothesis was not refuted.
The results cannot be fully conclusive regarding the feasibility of applying the whole SPL
testing process in an industrial project, since we applied only the unit test level. The remaining
levels should also be analyzed. This did not occur in this setting, since a larger project would
be required to make the empirical evaluation effective. Moreover, other metrics that may serve
as improvement points should be collected beforehand. Hence, we believe more experiments
should be performed, taking into account the lessons learned so far.
The next chapter presents the conclusions of this dissertation, some related work, and directions
for future work, based upon this process and beyond it.
6 Concluding Remarks
Based on the systematic and planned reuse of previous development efforts among a set of
similar products, the SPL approach enables organizations not only to reduce development and
maintenance costs but also to achieve impressive productivity and time-to-market gains (Kolb
and Muthig, 2003). Software development costs and the time to deploy a software-intensive
system decrease significantly when the SPL approach is applied (Barros and Marqués, 2006).
However, testing, still the most effective way to assure quality, is more complex for
product lines than for traditional single software systems, since there are more product instances
that need to be tested. With the growing acceptance of product lines, therefore, effective and
efficient techniques and methods for testing are required (Kolb and Muthig, 2003).
In addition, the literature usually does not present details that give us confidence in the
applicability of the proposed approaches to any context. We are mainly concerned with some
aspects we noticed: (1) the proposals usually represent only part of an approach, presenting
neither which activities should be performed when testing a product line project nor how the
steps have to be performed; and (2) when the approaches present results, there is usually an
implicit assumption that there is no variability inside the components and assets they deal with.
In short, we have no evidence of a real supporting process for testing that is feasibly applicable
to any context.
Based on the aforementioned gaps, we have proposed an approach for dealing with testing
in SPL, in which both SPL phases, Core Asset and Product Development, are addressed. We
have emphasized the modeling and management of variability in test assets, as the most
important element to be handled in the context of SPL. Besides, the focus was to systematically
reuse artifacts in order to reduce effort.
Throughout this study we have presented our proposal of an SPL testing process, developed
by tailoring the best practices found in existing approaches, identified through a systematic
mapping study, combining new strategies and, especially, considering effort reduction aspects,
since effort can be a bottleneck for effectively introducing testing concepts into practice.
Experimental studies were performed in order to validate and calibrate part of the process.
6.1 Research Contributions
This work has three main contributions: (i) a systematic mapping study, in which a formal
process for analyzing the state-of-the-art in the SPL testing field was performed; (ii) RiPLE-TE,
our process for testing SPL projects, based on the RiPLE framework for SPL engineering;
and (iii) two experimental studies, which enabled us to define a model for conducting
experimental studies to evaluate SPL testing approaches. Each is detailed in the following:
6.1.1 Systematic Mapping Study
We applied the systematic mapping study, an empirical method, to collect evidence in the
literature that enabled us to sketch and comprehend the state-of-the-art of the SPL testing
research and practice field. The motivation was to identify the evidence available on the topic
and the gaps in the existing approaches, hence identifying room for improvement. We followed
guidelines (Petersen et al., 2008) that enabled us to systematically assess the primary studies
we collected. We analyzed 45 studies considering 9 research questions. A classification scheme
based on facets was applied so that the collected evidence could be categorized. At the end of
the study, we had mapped out the research areas, including the most relevant publications.
6.1.2 RiPLE-TE Process
Based on the RiPLE framework for SPL engineering, we built the RiPLE-TE, a process for
testing SPL projects that encompasses both SPL phases, core asset and product development,
by providing guidelines, activities, steps, roles and attributions. An initial effort towards defining
an approach for handling variability in test assets is also presented. In summary, the process is
intended to reduce the effort of testing in SPL projects, by performing activities in a systematic
and manageable way. We modeled our process in the EPF tool (http://www.eclipse.org/epf/),
which can easily enable future process improvements as well as integration with other processes.
6.1.3 Experimental Study
Although the results of the experiments were not conclusive enough to evaluate the whole
approach and prove its effectiveness, we have defined what can be called a guideline for
performing experimental studies on SPL testing approaches. It is indeed an initial step towards
conducting a formal experiment, but the lack of information in the literature made us define
aspects from scratch, even without baselines against which to compare the gained results.
Therefore, from now on, what we are supposed to do in the next experiments is simply to
calibrate the experiment, by including some issues that were not considered in the two rounds
of the experimental evaluation of this approach.
6.2 Related Work
This section points out three main works that we consider to have similar ideas, as follows:
(Reuys et al., 2006) - The ScenTED (Scenario-based TEst case Derivation) approach supports
the systematic reuse of assets for the purpose of system and integration testing of SPL
applications. The approach derives test cases from requirements, using UML activity diagrams
to represent all possible scenarios for a use case. However, it does not explicitly prescribe how
to describe the domain architecture, nor how to map the domain architecture to architecture
scenarios for integration test development. On the other hand, modeling all use case scenarios
creates a new problem, which is the explosion of the number of variabilities to consider for
product test cases. This problem is not addressed in their work.
(Bertolino and Gnesi, 2003a) - This approach provides a method to derive the scenarios to be
tested, called PLUTO (Product Lines Use Case Test Optimization). It deals with textual use
cases as the central development artifact. The authors extend the use cases with tags that
describe the variability, and test specifications containing variability are derived from these use
cases. Moreover, the method is based on the Category Partition method, expanded with the
capability to handle PL variabilities and to instantiate test cases for a specific customer product.
However, the derived scenarios are just the cases to be tested, and the actual test scenarios still
have to be developed from them. Besides, as the approach is based on structured, natural-language
requirements, the test derivation has to be done manually, which substantially increases the effort.
(Nebut et al., 2002) - This study provides a method for the automatic generation of detailed
test cases associated with specific products, from sets of incomplete and generic scenarios
associated with a product line, i.e., behavioral test patterns. The authors propose a two-step
process, from test requirements to product-specific test cases. Test requirements are defined
independently of the target final product, in terms of behavioral test patterns. From use cases,
they structure scenarios to produce reusable test patterns, common to an entire product line and
represented using the UML. When the final design is available and a product is chosen as the
test target, test cases for that particular product are synthesized.
Although the aforementioned approaches are representative in the field of SPL testing, they
have limitations that prevent one from fully applying them in an industrial context, since they
are mainly concerned with generating test cases, usually from use case scenarios. They do not
provide the elements of a software process, which contain the information that makes practical
adoption feasible. In addition, little is reported regarding the studies' validation. Reuys et al.
(2006) is the only one of these studies that reported how the validation was performed. Even
so, the way the validation was reported does not enable its replication, since no guidelines on
how the experimental study was performed were given. This does not invalidate the proposed
process, but it might make its implementation infeasible in contexts other than the one presented
in the study. Hence, more evidence is required.
In our proposal, we go further. We adapt the test case generation methods described in these
approaches, since we consider them well-defined practices, but we also give details that they do
not provide, in terms of how to perform testing in an SPL project.
6.3 Open Issues and Future Work
Besides the out-of-scope topics mentioned earlier in Section 1.3, some problems we judge
important were not covered by this approach.
So far, we have no evidence about how we can exploit the SPL model to generate tests that
are effective at revealing faults. Data collected through the experiment showed that subjects
advocate the use of the approach as an effective fault-revealing strategy, but we need measurable
evidence beyond subjects' feedback; we need practical evidence. Moreover, we did not
analyze the impact of variant binding times on testing concerns. Variant binding most probably
impacts the testing effort, but we had no inputs to evaluate this aspect.
Our future research agenda includes investigating the topics not covered by this study, such as
the ones aforementioned, as well as others detailed below.
Meta Model for Variability Management. Our approach for variability management
within test assets uses a meta model to trace variability among artifacts, but it needs to
evolve, since it is currently only an initial model. We have already defined the concepts
and relations of the assets, but we need to formalize them by describing the model in a
standardized language, such as the Object Constraint Language (OCL), by the OMG
(http://www.omg.org/spec/OCL/2.2), as illustrated in the sketch below.
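As an illustration of the kind of rule such a formalization would capture, the sketch below is
hypothetical: the classes and the constraint are ours, not the dissertation's metamodel, and the
intended OCL analogue appears as a comment.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative metamodel fragment: a test asset traces variation points
    // and binds variants.
    class Variant { }

    class VariationPoint {
        final Set<Variant> variants = new HashSet<Variant>();
    }

    class TestAsset {
        final Set<VariationPoint> tracedPoints = new HashSet<VariationPoint>();
        final Set<Variant> boundVariants = new HashSet<Variant>();

        // OCL analogue: self.boundVariants->forAll(v |
        //     self.tracedPoints->exists(vp | vp.variants->includes(v)))
        boolean isConsistent() {
            for (Variant v : boundVariants) {
                boolean found = false;
                for (VariationPoint vp : tracedPoints) {
                    if (vp.variants.contains(v)) { found = true; break; }
                }
                if (!found) return false;
            }
            return true;
        }
    }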
Test Automation. We believe that automated support for the variability management process
is an important issue to be investigated. It will make automated product configuration analysis
possible, thus providing organizations with mechanisms to evaluate PL adoption and evolution.
There is a Master's student in our research group investigating how the RiPLE-TE process can
be extended to support automatic variability management and traceability.
Test Case Selection. Based on the RiPLE-TE process, thus including the metamodel, the
next step is to provide an effective way to select test cases considering variability concerns.
Another Master's student in our research group is working towards extending the RiPLE-TE to
consider test case selection, investigating suitable techniques for selecting test cases using both
automatic tools and manual approaches.
Experimental Software Engineering. A next step is to perform an experimental study in a
broader context, before applying the process in an industrial project. This will give us more
evidence and, consequently, more confidence in the process effectiveness.
RiPLE is currently being applied in an industrial SPL project in the medical information
management domain. It is a large project, including about 800 features. Based on this project,
we will apply the RiPLE-TE process to part of the project along the 2010/2 semester and
collect data to improve the proposal. We will then have the opportunity to gather data on a
set of aspects, such as: understanding sizes - the impact of the number of variation points and
variants in a project; real SPLs may have hundreds of VPs and several hundreds of variants,
hence it is necessary to understand, with evidence, how a testing approach fits different
scenarios; quantifying the extent and complexity of constraints - the relationship of constraints
among features and its impact on the testing activities, since, as constraints grow in complexity
and number, the difficulty of modeling and generating test suites increases; and the effectiveness
and feasibility of testing methods - we will try to apply different testing methods and analyze
their suitability to the proposed approach.
We additionally propose to investigate how much extra testing effort a variant causes. As
the SPL grows, we must have testing mechanisms whose effort does not increase exponentially
as new variants are inserted.
Fault Model for SPL. The project mentioned above will also serve as an environment to
gather issues from all SPL disciplines involved throughout its development, so that we can
build an issue library to be analyzed and turned into a fault model that can aid future Testing
and Inspection agendas, in the form of an evidence-based error prediction approach.
Bibliography
Afzal, W., Torkar, R., and Feldt, R. (2008). A systematic mapping study on non-functional
search-based software testing. In SEKE, pages 488–493. Knowledge Systems Institute
Graduate School.
Afzal, W., Torkar, R., and Feldt, R. (2009). A systematic review of search-based testing for
non-functional system properties. Information and Software Technology, 51(6), 957–976.
Al-Dallal, J. and Sorenson, P. (2008). Testing software assets of framework-based product
families during application engineering stage. Journal of Software, 3(5), 11–25.
Almeida, E. S., Alvaro, A., Lucrédio, D., Garcia, V. C., and Meira, S. R. L. (2004). Rise project:
Towards a robust framework for software reuse. In D. Zhang, É. Grégoire, and D. DeGroot,
editors, IRI, pages 48–53. IEEE Systems, Man, and Cybernetics Society.
Almeida, E. S., Alvaro, A., Lucrédio, D., Garcia, V. C., and Meira, S. R. L. (2005). A survey
on software reuse processes. In D. Zhang, T. M. Khoshgoftaar, and M.-L. Shyu, editors, IRI,
pages 66–71. IEEE Systems, Man, and Cybernetics Society.
Almeida, E. S., Alvaro, A., Garcia, V. C., Mascena, J. C. C. P., Burégio, V. A., Nascimento,
L. M., Lucrédio, D., and Meira, S. R. L. (2007). C.R.U.I.S.E: Component Reuse in Software
Engineering. C.E.S.A.R e-book, Recife, 1st edition.
Alvaro, A., de Almeida, E. S., and de Lemos Meira, S. R. (2006). A software component quality
model: A preliminary evaluation. In EUROMICRO-SEAA, pages 28–37. IEEE.
Ammann, P. and Offutt, J. (2008). Introduction to Software Testing. Cambridge University
Press, 1st edition.
Bachmann, F. and Clements, P. C. (2005). Variability in software product lines. Technical Report
CMU/SEI-2005-TR-012 ESC-TR-2005-012, CMU/SEI - Software Engineering Institute,
Pittsburgh, PA.
Bailey, J., Budgen, D., Turner, M., Kitchenham, B., Brereton, P., and Linkman, S. G. (2007).
Evidence relating to object-oriented software design: A survey. In ESEM, pages 482–484.
IEEE Computer Society.
Barros, J. L. and Marqués, J. M. (2006). Support to development-with-reuse in very small
software developing companies. In M. Morisio, editor, ICSR, volume 4039 of Lecture Notes
in Computer Science, pages 419–422. Springer.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994). Goal question metric paradigm. In
Encyclopedia of Software Engineering, volume 2. John Wiley & Sons, Inc.
Pérez Lamancha, B., Polo Usaola, M., and Piattini Velthius, M. (2009). Towards an automated
testing framework to manage variability using the UML Testing Profile. In ICSE Workshop on
Automation of Software Test (AST09).
Bertolino, A. (2007). Software testing research: Achievements, challenges, dreams. FOSE,
pages 85–103.
Bertolino, A. and Gnesi, S. (2003a). Pluto: A test methodology for product families. In Software
Product-Family Engineering, 5th International Workshop, PFE, Siena, Italy, pages 181–197.
Bertolino, A. and Gnesi, S. (2003b). Use case-based testing of product lines. ACM SIGSOFT
Software Engineering Notes, 28(5), 355–358.
Bezerra, Y. M., Pereira, T. A. B., and da Silveira, G. E. (2009). A systematic review of software
product lines applied to mobile middleware. In ITNG ’09: Proceedings of the 2009 Sixth
International Conference on Information Technology: New Generations, pages 1024–1029,
Washington, DC, USA. IEEE Computer Society.
Black, A. (2003). Critical Testing Process: Plan, Prepare, Perform, Perfect. Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, USA.
Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M., and Khalil, M. (2007). Lessons from
applying the systematic literature review process within the software engineering domain.
Journal of Systems and Software, 80(4), 571–583.
Brito, K. S., Garcia, V. C., Almeida, E. S., and Meira, S. R. L. (2008). Lift - a legacy information
retrieval tool. J.UCS - Journal of Universal Computer Science, 14(8), 1256–1284.
Budgen, D., Turner, M., Brereton, P., and Kitchenham, B. (2008). Using Mapping Studies in
Software Engineering. In Proceedings of PPIG 2008, pages 195–204. Lancaster University.
Burnstein, I. (2002). Practical Software Testing. Springer-Verlag New York, Inc., Secaucus, NJ,
USA.
Cavalcanti, Y. C. (2009). A Bug Report Analysis and Search Tool. Master’s thesis, UFPE Federal University of Pernambuco.
Cavalcanti, Y. C., da Cunha, C. E. A., de Almeida, E. S., and de Lemos Meira, S. R. (2009).
Bast: A tool for bug report analysis and search. In SBES - 23rd Brazilian Symposium on
Software Engineering, Tools Session, Fortaleza, CE, Brazil.
Chen, L., Babar, M. A., and Ali, N. (2009). Variability management in software product lines:
A systematic review. In SPLC 2009: 13th Software Product Line Conference, San Francisco,
CA, USA.
Chernak, Y. (2001). Validating and improving test-case effectiveness. IEEE Software, 18(1),
81–86.
Clements, P. and Northrop, L. (2001). Software Product Lines: Practices and Patterns. AddisonWesley, Boston, MA, USA.
Cohen, M. B., Dwyer, M. B., and Shi, J. (2006). Coverage and adequacy in software product
line testing. In R. M. Hierons and H. Muccini, editors, ROSATEA, pages 53–63. ACM.
Condori-Fernández, N., Daneva, M., Sikkel, K., Wieringa, R., Tubío, Ó. D., and Pastor, O.
(2009). A systematic mapping study on empirical evaluation of software requirements
specifications techniques. In ESEM, pages 502–505.
Condron, C. (2004). A domain approach to test automation of product lines. SPLiT - Workshop
on Software Product Line Testing, pages 27–35.
Craig, R. D. and Jaskiel, S. P. (2002). Systematic Software Testing. Artech House, Inc., Norwood,
MA, USA.
Crnkovic, I. (2002). Building Reliable Component-Based Software Systems. Artech House, Inc.,
Norwood, MA, USA.
Denger, C. and Kolb, R. (2006). Testing and inspecting reusable product line components:
first empirical results. In ISESE: Proceedings of the International Symposium on Empirical
Software Engineering, pages 184–193, New York, NY, USA.
Durão, F. A. (2008). Semantic Layer Applied to a Source Code Search Engine. Master’s thesis,
UFPE - Federal University of Pernambuco.
Dybå, T. and Dingsøyr, T. (2008). Empirical studies of agile software development: A systematic
review. Information and Software Technology, 50(9-10), 833–859.
Dybå, T. and Dingsøyr, T. (2008). Strength of evidence in systematic reviews in software
engineering. In ESEM ’08: Proceedings of the Second ACM-IEEE international symposium
on Empirical software engineering and measurement, pages 178–187, New York, NY, USA.
ACM.
Edwin, O. O. (2007). Testing in Software Product Lines. Master’s thesis, School of Engineering
at Blekinge Institute of Technology.
Engström, E., Skoglund, M., and Runeson, P. (2008). Empirical evaluations of regression
test selection techniques: a systematic review. In ESEM ’08: Proceedings of the Second
ACM-IEEE international symposium on Empirical software engineering and measurement,
pages 22–31, New York, NY, USA. ACM.
Feng, Y., Liu, X., and Kerridge, J. (2007). A product line based aspect-oriented generative
unit testing approach to building quality components. In COMPSAC - Proceedings of the
31st Annual International Computer Software and Applications Conference, pages 403–408,
Washington, DC, USA.
Ganesan, D., Maurer, U., Ochs, M., Snoek, B., and Verlage, M. (2005). Towards testing response
time of instances of a web-based product line. SPLiT - Workshop on Software Product Line
Testing.
Garcia, V. C., Lisboa, L. B., de Lemos Meira, S. R., de Almeida, E. S., Lucrédio, D., and
de Mattos Fortes, R. P. (2008). Towards an assessment method for software reuse capability
(short paper). In H. Zhu, editor, QSIC - International Conference on Quality Software, pages
294–299. IEEE Computer Society.
Geppert, B., Li, J. J., Rößler, F., and Weiss, D. M. (2004). Towards generating acceptance tests
for product lines. In ICSR - Proceedings of 8th International Conference on Software Reuse,
Lecture Notes in Computer Science, pages 35–48.
Goldsmith, R. F. and Graham, D. (2002). The forgotten phase. Software Development Magazine,
pages 45 – 47.
Graves, T. L., Harrold, M. J., Kim, J.-M., Porter, A., and Rothermel, G. (2001). An empirical
study of regression test selection techniques. ACM Transaction on Software Engineering
Methodology, 10(2), 184–208.
Harrold, M. J. (1998). Architecture-based regression testing of evolving systems. In International
Worshop on Role of Architecture in Testing and Analysis (ROSATEA 1998), pages 73–77,
Marsala, Sicily, Italy.
Harrold, M. J. (2000). Testing: a roadmap. In ICSE ’00: Proceedings of the Conference on The
Future of Software Engineering, pages 61–72, New York, NY, USA. ACM.
Hartmann, J., Vieira, M., and Ruder, A. (2004). A UML-based approach for validating product
lines. SPLiT - Workshop on Software Product Line Testing, pages 58–65.
Haumer, P. (2007). Eclipse process framework composer (part 1: Key concepts). Technical
report, IBM Rational Software.
Zeng, H., Zhang, W., and Rine, D. (2004). Analysis of testing effort by using core assets in
software product line testing. SPLiT - Workshop on Software Product Line Testing, pages 1–6.
IEEE (1988). IEEE guide for the use of IEEE standard dictionary of measures to produce
reliable software - 982.2-1998. IEEE Computer Society.
IEEE (1998). IEEE Standard for Software Test Documentation - 829-1998. IEEE Computer
Society.
Jaring, M., Krikhaar, R. L., and Bosch, J. (2008). Modeling variability and testability interaction in software product line engineering. In ICCBSS - 7th International Conference on
Composition-Based Software Systems, pages 120–129.
Jedlitschka, A., Ciolkowski, M., and Pfahl, D. (2008). Reporting experiments in software
engineering. In Guide to Advanced Empirical Software Engineering, chapter 8, pages 201–
228. Springer, Secaucus, NJ, USA.
Jin-hua, L., Qiong, L., and Jing, L. (2008). The w-model for testing software product lines.
International Symposium on Computer Science and Computational Technology, 1, 690–693.
Li, J. J., Geppert, B., Rößler, F., and Weiss, D. (2007). Reuse execution traces to reduce
testing of product lines. SPLiT - Workshop on Software Product Line Testing.
Juristo, N. and Moreno, A. M. (2006). Guest editors’ introduction: Software testing practices in
industry. IEEE Software, 23(4), 19–21.
Juristo, N., Moreno, A. M., and Vegas, S. (2002). A survey on testing technique empirical
studies: How limited is our knowledge. In ISESE ’02: Proceedings of the 2002 International
Symposium on Empirical Software Engineering, page 161, Washington, DC, USA. IEEE
Computer Society.
Juristo, N., Moreno, A. M., and Vegas, S. (2004). Reviewing 25 years of testing technique
experiments. Empirical Software Engineering, 9(1-2), 7–44.
Juristo, N., Moreno, A. M., Vegas, S., and Solari, M. (2006). In search of what we experimentally
know about unit testing. IEEE Software, 23(6), 72–80.
Käkölä, T. and Dueñas, J. C., editors (2006). Software Product Lines - Research Issues in
Engineering and Management. Springer.
Kamsties, E., Pohl, K., Reis, S., and Reuys, A. (2003). Testing variabilities in use case models.
In Software Product-Family Engineering, 5th International Workshop, PFE, Siena, Italy,
pages 6–18.
Kang, S., Lee, J., Kim, M., and Lee, W. (2007). Towards a formal framework for product
line test development. In CIT ’07: Proceedings of the 7th IEEE International Conference
on Computer and Information Technology, pages 921–926, Washington, DC, USA. IEEE
Computer Society.
Kauppinen, R. (2003). Testing framework-based software product lines. Master’s thesis,
University of Helsinki Department of Computer Science.
Kauppinen, R. and Taina, J. (2003). Rita environment for testing framework-based software
product lines. In P. Kilpelinen and N. Pivinen, editors, SPLST, pages 58–69. University of
Kuopio, Department of Computer Science.
Kauppinen, R., Taina, J., and Tevanlinna, A. (2004). Hook and template coverage criteria for
testing framework-based software product families. SPLiT - Workshop on Software Product
Line Testing, pages 7–12.
Kishi, T. and Noda, N. (2006). Formal verification and software product lines. Communications
of the ACM, 49(12), 73–77.
Kitchenham, B. (2010). What’s up with software metrics? - a preliminary mapping study.
Journal of Systems and Software, 83(1), 37–51.
Kitchenham, B. and Charters, S. (2007). Guidelines for performing Systematic Literature
Reviews in Software Engineering. Technical Report EBSE 2007-001, Keele University and
Durham University Joint Report.
Kitchenham, B. A., Pfleeger, S. L., Pickard, L. M., Jones, P. W., Hoaglin, D. C., Emam, K. E., and
Rosenberg, J. (2002). Preliminary guidelines for empirical research in software engineering.
IEEE Transactions on Software Engineering, 28(8), 721–734.
Kitchenham, B. A., Dyba, T., and Jorgensen, M. (2004). Evidence-based software engineering.
In ICSE: Proceedings of the 26th International Conference on Software Engineering, pages
273–281, Washington, DC, USA.
Kitchenham, B. A., Mendes, E., and Travassos, G. H. (2007). Cross versus within-company
cost estimation studies: A systematic review. IEEE Transactions on Software Engineering,
33(5), 316–329.
Pohl, K., Böckle, G., and van der Linden, F., editors (2005). Software Product Line Engineering -
Foundations, Principles, and Techniques. Springer.
Kolb, R. (2003). A risk-driven approach for efficiently testing software product lines. GPCE 5th Generative Programming and Component Engineering.
Kolb, R. and Muthig, D. (2003). Challenges in testing software product lines. CONQUEST 7th Conference on Quality Engineering in Software Technology, pages 81–95.
Kolb, R. and Muthig, D. (2006). Making testing product lines more efficient by improving the
testability of product line architectures. In ROSATEA: Proceedings of the ISSTA workshop on
Role of software architecture for testing and analysis, pages 22–27, New York, NY, USA.
Krueger, C. W. (2006). Introduction to the emerging practice software product line development.
Methods Tools, 14(3), 3–15.
Naik, K. and Tripathy, P. (2008). Software Testing and Quality Assurance: Theory and Practice.
John Wiley & Sons, Hoboken, New Jersey.
Lamancha, B. P., Usaola, M. P., and Velthius, M. P. (2009). Software product line testing - a
systematic review. In ICSOFT International Conference on Software and Data Technologies,
pages 23–30. INSTICC Press.
Lewis, W. E. (2008). Software Testing and Continuous Quality Improvement, Third Edition.
Auerbach Publications, Boston, MA, USA.
Li, J. J., Weiss, D. M., and Slye, J. H. (2007). Automatic integration test generation from unit
tests of exvantage product family. In 11th International Conference on Software Product
Lines (Workshops), pages 73–80. Kindai Kagaku Sha Co. Ltd., Tokyo, Japan.
Linden, F. J. v. d., Schmid, K., and Rommes, E. (2007). Software Product Lines in Action: The
Best Industrial Practice in Product Line Engineering. Springer.
Lisboa, L. B. (2008). ToolDAy - A Tool for Domain Analysis. Master’s thesis, UFPE - Federal
University of Pernambuco.
Lisboa, L. B., Garcia, V. C., Almeida, E. S., and Meira, S. R. L. (2007). ToolDAy: a process-centered
domain analysis tool. In 21st Brazilian Symposium on Software Engineering, Tools Session,
João Pessoa, PB, Brazil.
Lisboa, L. B., Garcia, V. C., Lucrédio, D., de Almeida, E. S., de Lemos Meira, S. R., and
de Mattos Fortes, R. P. (2010). A systematic review of domain analysis tools. Information &
Software Technology, 52(1), 1–13.
Mansell, J. X. (2006). Experiences and expectations regarding the introduction of systematic
reuse in small- and medium-sized companies. In Käkölä and Dueñas (2006), pages 91–124.
Martins, A. C., Garcia, V. C., Almeida, E. S., and Meira, S. R. L. (2008). Enhancing components
search in a reuse environment using discovered knowledge techniques. In 2nd Brazilian Symposium
on Software Components, Architectures, and Reuse (SBCARS), Porto Alegre, Brazil.
Mascena, J. C. C. P., de Lemos Meira, S. R., de Almeida, E. S., and Garcia, V. C. (2006).
Towards an effective integrated reuse environment. In S. Jarzabek, D. C. Schmidt, and T. L.
Veldhuizen, editors, GPCE, pages 95–100. ACM.
Mathur, A. P. (2009). Foundations of Software Testing - Fundamental Algorithms and Techniques.
Dorling Kindersley, India, 2nd edition.
McGregor, J., Sodhani, P., and Madhavapeddi, S. (2004). Testing variability in a software
product line. SPLiT - Workshop on Software Product Line Testing, pages 45–50.
McGregor, J. D. (2001a). Structuring test assets in a product line effort. In Proceedings of
the 2nd International Workshop on Software Product Lines: Economics, Architectures, and
Implications, pages 89–92.
McGregor, J. D. (2001b). Testing a software product line. Technical report, CMU/SEI - Software
Engineering Institute.
McGregor, J. D. (2002). Building reusable test assets for a product line. In ICSR - Proceedings
of 7th International Conference on Software Reuse, pages 345–346.
McGregor, J. D., Northrop, L. M., Jarrad, S., and Pohl, K. (2002). Guest editors’ introduction:
Initiating software product lines. IEEE Software, 19(4), 24–27.
Medeiros, F. M., de Almeida, E. S., and de Lemos Meira, S. R. (2009). Towards an approach for
service-oriented product line architectures. In 3rd Workshop on Service-Oriented Architectures
and Software Product Lines (SOAPL) - Enhancing Variation, in conjuction with the 13th
International Software Product Line Conference (SPLC), San Francisco, CA, USA.
Melo, C. A., Burégio, V. A. A., Almeida, E. S., and Meira, S. R. L. (2008). A reuse repository
system: The core system. In 10th International Conference on Software Reuse (ICSR), Tools
Demonstration, Beijing, China.
Mendes, R. C. (2008). Search and Retrieval of Reusable Source Code using Faceted Classification Approach. Master’s thesis, UFPE - Federal University of Pernambuco.
Mili, H., Mili, A., Yacoub, S., and Addy, E. (2001). Reuse-based software engineering:
techniques, organization, and controls. Wiley-Interscience, New York, NY, USA.
Moraes, M. B. S. (2010). A scoping approach for software product lines.
Moraes, M. B. S., Almeida, E. S., and de Lemos Meira, S. R. (2009). A systematic review on
software product lines scoping. In ESELAW 2009: VI Experimental Software Engineering
Latin American Workshop, So Carlos-SP, Brazil.
Muccini, H. and van der Hoek, A. (2003). Towards testing product line architectures. Electronic
Notes in Theoretical Computer Science, 82(6).
Muccini, H., Dias, M. S., and Richardson, D. J. (2005). Towards software architecture-based regression testing. In WADS ’05: Proceedings of the 2005 workshop on Architecting dependable
systems, pages 1–7, New York, NY, USA. ACM.
Muccini, H., Dias, M. S., and Richardson, D. J. (2006). Software architecture-based regression
testing. Journal of Systems and Software, 79(10), 1379–1396.
Myers, G. J. and Sandler, C. (2004). The Art of Software Testing. John Wiley & Sons.
Nebut, C., Pickin, S., Le Traon, Y., and Jézéquel, J.-M. (2002). Reusable test requirements for
UML-modeled product lines. In Proceedings of the Workshop on Requirements Engineering
for Product Lines (REPL'02), pages 51–56.
Nebut, C., Fleurey, F., Traon, Y. L., and Jézéquel, J.-M. (2003). A requirement-based approach
to test product families. In Software Product-Family Engineering, 5th International Workshop,
PFE, Siena, Italy, Nov 4-6, pages 198–210.
Nebut, C., Traon, Y. L., and Jézéquel, J.-M. (2006). System testing of product lines: From
requirements to test cases. In Käkölä and Dueñas (2006), pages 447–477.
Needham, D. and Jones, S. (2006). A software fault tree metric. In ICSM - International
Conference on Software Maintenance, pages 401–410.
Neiva, D. F. S. (2009). RiPLE-RE: A Requirements Engineering Process for Software Product
Lines. Master’s thesis, UFPE - Federal University of Pernambuco.
Neto, P. A. M. S. (2010). A regression testing approach for software product lines architectures.
Northrop, L. M. (2002). Sei’s software product line tenets. IEEE Software, 19(4), 32–40.
Northrop, L. M. and Clements, P. C. (2007). A framework for software product line practice,
version 5.0. Technical report, CMU/SEI - Software Engineering Institute.
Olimpiew, E. and Gomaa, H. (2005a). Reusable system tests for applications derived from
software product lines. SPLiT - Workshop on Software Product Line Testing.
Olimpiew, E. M. and Gomaa, H. (2005b). Model-based testing for applications derived from
software product lines. In A-MOST: Proceedings of the 1st International Workshop on
Advances in model-based testing, pages 1–7, New York, NY, USA.
Olimpiew, E. M. and Gomaa, H. (2009). Reusable model-based testing. In ICSR ’09: Proceedings of the 11th International Conference on Software Reuse, pages 76–85, Berlin, Heidelberg.
Springer-Verlag.
Oliveira, T. H. B. (2009). RiPLE-EM: A Process to Manage Evolution in Software Product
Lines. Master’s thesis, UFPE - Federal University of Pernambuco.
OMG (2008). Software process engineering meta-model, version 2.0.
Patton, R. (2005). Software Testing (2nd Edition). Sams, Indianapolis, IN, USA.
Petersen, K., Feldt, R., Mujtaba, S., and Mattsson, M. (2008). Systematic mapping studies in
software engineering. In EASE ’08: Proceedings of the 12th International Conference on
Evaluation and Assessment in Software Engineering.
Pohl, K. and Metzger, A. (2006). Software product line testing. Communications of the ACM,
49(12), 78–81.
Pohl, K. and Sikora, E. (2005). Documenting variability in test artefacts. In Pohl et al. (2005),
pages 149–158.
Pohl, K., Böckle, G., and van der Linden, F. J. (2005). Software Product Line Engineering:
Foundations, Principles and Techniques. Springer.
Pretorius, R. and Budgen, D. (2008). A mapping study on empirical evidence related to the
models and forms used in the uml. In H. D. Rombach, S. G. Elbaum, and J. Münch, editors,
ESEM, pages 342–344. ACM.
Reis, S., Metzger, A., and Pohl, K. (2006). A reuse technique for performance testing of software
product lines. SPLiT - Workshop on Software Product Line Testing.
Reis, S., Metzger, A., and Pohl, K. (2007). Integration testing in software product line engineering: A model-based technique. In FASE - Fundamental Approaches to Software Engineering,
pages 321–335.
Reuys, A., Kamsties, E., Pohl, K., and Reis, S. (2005). Model-based system testing of software
product families. In CAiSE - International Conference on Advanced Information Systems
Engineering, pages 519–534.
Reuys, A., Reis, S., Kamsties, E., and Pohl, K. (2006). The scented method for testing software
product lines. In Käkölä and Dueñas (2006), pages 479–520.
Rommes, E. and America, P. (2006). A scenario-based method for software product line
architecting. In Käkölä and Dueñas (2006), pages 3–52.
Rothermel, G. and Harrold, M. (1996). Analyzing regression test selection techniques. IEEE
Transactions on Software Engineering, 22(8), 529–551.
Rumbaugh, J., Jacobson, I., and Booch, G. (2004). Unified Modeling Language Reference
Manual, The (2nd Edition). Pearson Higher Education.
Santos, E. C. R., Durão, F. A., Martins, A. C., Mendes, R. C., de Albuquerque Melo, C., Garcia,
V. C., and de Almeida, E. S. (2006). Towards an effective context-aware proactive asset search
and retrieval tool. 6th Workshop on Component-Based Development.
Souza Filho, E. D., Oliveira Cavalcanti, R., Neiva, D. F., Oliveira, T. H., Lisboa, L. B., Almeida,
E. S., and Lemos Meira, S. R. (2008). Evaluating domain design approaches using systematic
review. In ECSA ’08: Proceedings of the 2nd European conference on Software Architecture,
pages 50–65, Berlin, Heidelberg. Springer-Verlag.
Tevanlinna, A., Taina, J., and Kauppinen, R. (2004). Product family testing: a survey. ACM
SIGSOFT Software Engineering Notes, 29(2), 12.
Trendowicz, A. and Punter, T. (2003). Quality modeling for software product lines. In In: 7th
ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering
(QAOOSE03).
Vanderlei, T. A., Durão, F. A., Martins, A. C., Garcia, V. C., de Almeida, E. S., and
de Lemos Meira, S. R. (2007). A cooperative classification mechanism for search and
retrieval software components. In Y. Cho, R. L. Wainwright, H. Haddad, S. Y. Shin, and Y. W.
Koo, editors, SAC, pages 866–871. ACM.
von Knethen, A. and Paech, B. (2002). A survey on tracing approaches in practice and research.
Technical report, Fraunhofer IESE.
Šmite, D., Wohlin, C., Gorschek, T., and Feldt, R. (2010). Empirical evidence in global software
engineering: a systematic review. Empirical Software Engineering, 15(1), 91–118.
Weiss, D. M. (2008). The product line hall of fame. In SPLC ’08: Proceedings of the 2008 12th
International Software Product Line Conference, page 395, Washington, DC, USA. IEEE
Computer Society.
Wieringa, R., Maiden, N. A. M., Mead, N. R., and Rolland, C. (2006). Requirements engineering
paper classification and evaluation criteria: a proposal and a discussion. Requirements
Engineering, 11(1), 102–107.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A. (2000).
Experimentation in software engineering: an introduction. Kluwer Academic Publishers,
Norwell, MA, USA.
Wübbeke, A. (2008). Towards an efficient reuse of test cases for software product lines. In
S. Thiel and K. Pohl, editors, SPLC (2), pages 361–368. Lero International Science Centre,
University of Limerick, Ireland.
Appendices
A Mapping Study
This appendix lists the journals (Section A.1) and the conferences (Section A.2) used for locating
primary studies in the Mapping Study explained in Chapter 3. It also presents the Quality Score
table (Section A.3), which reports the score of each study included in the Mapping Study,
according to the criteria described in Section 3.5.4.
A.1 List of Journals
Table A.1 List of Journals
Journals
ACM Transactions on Software Engineering and Methodology (TOSEM)
Communications of the ACM (CACM)
ELSEVIER Information and Software Technology (IST)
ELSEVIER Journal of Systems and Software (JSS)
IEEE Software
IEEE Computer
IEEE Transactions on Software Engineering
Journal of Software Maintenance Research and Practice
Software Practice and Experience Journal
Software Quality Journal
Software Testing, Verification and Reliability
A.2 List of Conferences
Table A.2: List of Conferences

Acronym   | Conference Name
AOSD      | International Conference on Aspect-Oriented Software Development
APSEC     | Asia Pacific Software Engineering Conference
ASE       | International Conference on Automated Software Engineering
CAiSE     | International Conference on Advanced Information Systems Engineering
CBSE      | International Symposium on Component-based Software Engineering
COMPSAC   | International Computer Software and Applications Conference
CSMR      | European Conference on Software Maintenance and Reengineering
ECBS      | International Conference and Workshop on the Engineering of Computer Based Systems
ECOWS     | European Conference on Web Services
ECSA      | European Conference on Software Architecture
ESEC      | European Software Engineering Conference
ESEM      | Empirical Software Engineering and Measurement
WICSA     | Working IEEE/IFIP Conference on Software Architecture
FASE      | Fundamental Approaches to Software Engineering
GPCE      | International Conference on Generative Programming and Component Engineering
ICCBSS    | International Conference on Composition-Based Software Systems
ICSE      | International Conference on Software Engineering
ICSM      | International Conference on Software Maintenance
ICSR      | International Conference on Software Reuse
ICST      | International Conference on Software Testing, Verification and Validation
ICWS      | International Conference on Web Services
IRI       | International Conference on Information Reuse and Integration
ISSRE     | International Symposium on Software Reliability Engineering
MODELS    | International Conference on Model Driven Engineering Languages and Systems
PROFES    | International Conference on Product Focused Software Development and Process Improvement
QoSA      | International Conference on the Quality of Software Architectures
QSIC      | International Conference on Quality Software
ROSATEA   | International Workshop on The Role of Software Architecture in Testing and Analysis
SAC       | Annual ACM Symposium on Applied Computing
SEAA      | Euromicro Conference on Software Engineering and Advanced Applications
SEKE      | International Conference on Software Engineering and Knowledge Engineering
SERVICES  | Congress on Services
SPLC      | Software Product Line Conference
SPLiT     | Software Product Line Testing Workshop
TAIC PART | Testing - Academic & Industrial Conference
TEST      | International Workshop on Testing Emerging Software Technology
A.3 Quality Score
Table A.3: Primary Studies Quality Score

Id | REF | Study Title | Year | A | B | C
1 | Condron (2004) | A Domain Approach to Test Automation of Product Lines | 2004 | 2 | 0 | 2
2 | Feng et al. (2007) | A product line based aspect-oriented generative unit testing approach to building quality components | 2007 | 1.5 | 0 | 2.5
3 | Nebut et al. (2003) | A Requirement-Based Approach to Test Product Families | 2003 | 2.5 | 1 | 1.5
4 | Reis (2006) | A Reuse Technique for Performance Testing of Software Product Lines | 2006 | 1.5 | 2 | 3
5 | Kolb (2003) | A Risk-Driven Approach for Efficiently Testing Software Product Lines | 2003 | 2 | 1 | 2.5
6 | Needham and Jones (2006) | A Software Fault Tree Metric | 2006 | 0 | 0 | 1
7 | Hartmann et al. (2004) | A UML-Based approach for Validating Product Lines | 2004 | 1 | 2 | 0.5
8 | Hui Zeng and Rine (2004) | Analysis of Testing Effort by Using Core Assets in Software Product Line Testing | 2004 | 1 | 1.5 | 2.5
9 | Harrold (1998) | Architecture-Based Regression Testing of Evolving Systems | 1998 | 0 | 0.5 | 2
10 | Li et al. (2007) | Automatic Integration Test Generation from Unit Tests of eXVantage Product Family | 2007 | 1 | 1 | 2
11 | McGregor (2002) | Building reusable test assets for a product line | 2002 | 2 | 2 | 0.5
12 | Kolb and Muthig (2003) | Challenges in testing software product lines | 2003 | 0 | 3 | 1.5
13 | Cohen et al. (2006) | Coverage and adequacy in software product line testing | 2006 | 1 | 1.5 | 2
14 | Pohl and Sikora (2005) | Documenting Variability in Test Artefacts | 2005 | 1 | 0 | 1
15 | Kishi and Noda (2006) | Formal verification and software product lines | 2006 | 2 | 1.5 | 2
16 | Kauppinen et al. (2004) | Hook and Template Coverage Criteria for Testing Framework-based Software Product Families | 2004 | 0.5 | 0.5 | 3
17 | Reis et al. (2007) | Integration Testing in Software Product Line Engineering: A Model-Based Technique | 2007 | 1 | 0 | 3
18 | Kolb and Muthig (2006) | Making testing product lines more efficient by improving the testability of product line architectures | 2006 | 1 | 1.5 | 1.5
19 | Reuys et al. (2005) | Model-Based System Testing of Software Product Families | 2005 | 2 | 1 | 3.5
20 | Olimpiew and Gomaa (2005b) | Model-based Testing For Applications Derived from Software Product Lines | 2005 | 0 | 1 | 1
21 | Jaring et al. (2008) | Modeling Variability and Testability Interaction in Software Product Line Engineering | 2008 | 2.5 | 6 | 3.5
22 | Bertolino and Gnesi (2003a) | PLUTO: A Test Methodology for Product Families | 2003 | 0.5 | 1 | 3
23 | Olimpiew and Gomaa (2009) | Reusable Model-Based Testing | 2009 | 3 | 0.5 | 3.5
24 | Olimpiew and Gomaa (2005a) | Reusable System Tests for Applications Derived from Software Product Lines | 2005 | 2.5 | 1 | 1
25 | Juan Jenny Li and Weiss (2007) | Reuse Execution Traces to Reduce Testing of Product Lines | 2007 | 0 | 0.5 | 2
26 | Kauppinen and Taina (2003) | RITA environment for testing framework-based software product lines | 2003 | 0 | 0 | 0.5
27 | Pohl and Metzger (2006) | Software Product Line Testing: Exploring principles and potential solutions | 2006 | 0.5 | 0 | 2.5
28 | McGregor (2001a) | Structuring Test Assets in a Product Line Effort | 2001 | 1.5 | 1 | 0.5
29 | Nebut et al. (2006) | System Testing of Product Lines: From Requirements to Test Cases | 2006 | 0 | 2 | 2
30 | McGregor (2001b) | Testing a Software Product Line | 2001 | 4 | 1.5 | 2
31 | Denger and Kolb (2006) | Testing and inspecting reusable product line components: first empirical results | 2006 | 0 | 1 | 0.5
32 | Kauppinen (2003) | Testing Framework-Based Software Product Lines | 2003 | 0.5 | 0.5 | 2
33 | Edwin (2007) | Testing in Software Product Line | 2007 | 2 | 2.5 | 2
34 | Al-Dallal and Sorenson (2008) | Testing Software Assets of Framework-Based Product Families during Application Engineering Stage | 2008 | 3 | 1 | 4
35 | Kamsties et al. (2003) | Testing variabilities in use case models | 2003 | 0.5 | 1.5 | 1.5
36 | McGregor et al. (2004) | Testing Variability in a Software Product Line | 2004 | 0 | 1 | 2.5
37 | Reuys et al. (2006) | The ScenTED Method for Testing Software Product Lines | 2006 | 3 | 1 | 4.5
38 | Jin-hua et al. (2008) | The W-Model for Testing Software Product Lines | 2008 | 1 | 3 | 1.5
39 | Kang et al. (2007) | Towards a Formal Framework for Product Line Test Development | 2007 | 2 | 2 | 1
40 | Beatriz Pérez Lamancha (2009) | Towards an automated testing framework to manage variability using the UML Testing Profile | 2009 | 0 | 0 | 1
41 | Wübbeke (2008) | Towards an Efficient Reuse of Test Cases for Software Product Lines | 2008 | 0 | 0 | 2
42 | Geppert et al. (2004) | Towards Generating Acceptance Tests for Product Lines | 2004 | 0.5 | 1.5 | 2
43 | Muccini and van der Hoek (2003) | Towards Testing Product Line Architectures | 2003 | 0 | 2.5 | 1
44 | Ganesan et al. (2005) | Towards Testing Response Time of Instances of a web-based Product Line | 2005 | 1 | 1.5 | 1
45 | Bertolino and Gnesi (2003b) | Use Case-based Testing of Product Lines | 2003 | 1 | 1 | 2.5

* The shaded lines represent the most relevant studies according to the grades.
B Experimental Study Instruments
This appendix presents the instruments given to the subjects involved in the experimental study, previously presented in Chapter 5. The set of instruments included in this appendix comprises the following forms: B.1 details the background questionnaire, intended to collect data about the subjects' background; B.2 presents the consent form subjects must read and sign before joining the experimental study, confirming their permission to participate in the research; and B.3 shows the error reporting form on which subjects record the defects they find. The feedback questionnaire, answered by the subjects after performing the experiment, comes in two flavors: B.4 for subjects who did not follow the RiPLE-TE Unit Test Process, and B.5 for those who followed the process guidelines. An additional questionnaire, B.6, is answered by selected subjects who did not use RiPLE-TE, after they have performed the experiment and attended the training session on the process.
B.1 Background Questionnaire
A. GENERAL INFORMATION
1. Age:
2. Sex: [ ] Male [ ] Female
3. Current Undergraduate Semester:
4. Grade Point Average (GPA):
B. TECHNICAL KNOWLEDGE - PROFESSIONAL EXPERIENCE
1. English reading:
[ ] Bad
[ ] Medium
[ ] Good
2. Previous experience with software development:
[ ] I’ve never developed software
[ ] I’ve already developed software, alone
[ ] I’ve already developed software, in groups, during classes
[ ] I’ve already developed software, in companies
3. What is your experience with programming (in months/years)?
4. What are the programming languages you have used / are using now?
5. What is your experience with software testing (in months/years)?
6. Previous experience with software testing:
[ ] None
[ ] I’ve already studied testing, either in class or in books
[ ] I’ve already developed projects, in academic context, applying testing concepts
154
B.1. BACKGROUND QUESTIONNAIRE
[ ] I’ve already been involved in one industrial testing project
[ ] I’ve already been involved in several industrial testing projects
7. Have you ever used any automated software testing tool? Which one(s)?
8. What is your experience with JUnit framework (in months/years)?
9. Previous experience with JUnit framework:
[ ] None
[ ] I’ve already tried it, either in class or in books
[ ] I’ve already used it in academic project(s)
[ ] I’ve already used it in one industrial testing project
[ ] I’ve already used it in several industrial testing projects
10. What is your experience with the EclEmma coverage tool (in months/years)?
11. Previous experience with the EclEmma coverage tool:
[ ] None
[ ] I’ve already tried it, either in class or in books
[ ] I’ve already used it in academic project(s)
[ ] I’ve already used it in one industrial testing project
[ ] I’ve already used it in several industrial testing projects
12. What is your experience with Software Product Lines (in months/years)?
13. Previous experience with Software Product Lines:
[ ] None
[ ] I’ve already tried it, either in class or in books
[ ] I’ve already used it in academic project(s)
[ ] I’ve already used it in one industrial testing project
[ ] I’ve already used it in several industrial testing projects
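For context on the tools the questionnaire asks about, the sketch below illustrates the kind of artifact involved: a plain JUnit test case whose statement coverage EclEmma reports inside Eclipse. This is a minimal sketch only, assuming JUnit 4; the class and method names are hypothetical and do not come from the experimental material.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Hypothetical class under test (illustration only, not experimental material).
    class DiscountCalculator {
        // Premium customers get a 10% discount; everyone else gets none.
        double discountFor(double price, boolean premium) {
            return premium ? price * 0.10 : 0.0;
        }
    }

    public class DiscountCalculatorTest {
        // Running these tests under EclEmma highlights which statements of
        // DiscountCalculator were executed, yielding a coverage percentage.
        @Test
        public void regularCustomerGetsNoDiscount() {
            assertEquals(0.0, new DiscountCalculator().discountFor(100.0, false), 0.001);
        }

        @Test
        public void premiumCustomerGetsTenPercentOff() {
            assertEquals(10.0, new DiscountCalculator().discountFor(100.0, true), 0.001);
        }
    }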
B.2 Consent Form
Table B.1: Consent Form
CONSENT FORM
Subject Name:
The information contained in this form is intended to establish a written agreement whereby the student authorizes his or her participation in the RiPLE-TE experiment, with full knowledge of the nature of the procedures to which he or she will be submitted as a participant, of his or her own free will and without any duress. Participation is voluntary, and the subject is free to withdraw from the experiment at any time and to stop participating in the study without prejudice to any service he or she is receiving or will receive.
I. STUDY TITLE:
On the behavior of the RiPLE Unit Test Process for Software Product Lines Development.
II. STUDY GOAL:
Evaluate the use of the proposed Unit Test Process for the development of product lines.
III. RESPONSIBLE INSTITUTIONS:
Federal University of Pernambuco (UFPE) and Federal University of Bahia (UFBA).
IV. RESPONSIBLE RESEARCHERS:
Eduardo Almeida, Dr. (UFBA) - Manoel Mendonça, Dr. (UFBA) - Ivan Machado, MSc. Candidate (UFPE).
V. CONSENT:
By signing this consent form, I certify that I have read the information above and, being sufficiently informed of all statements, fully agree to participate in the experiment. Thus, I authorize the execution of the research discussed above.
Salvador, BA, Brazil, __/__/____
Signature
B.3 Error Reporting Form
Table B.2: Error Reporting Form

Subject ID #:
Total Coverage (in %):
Date: __/__/____    Start Time: __:__    End Time: __:__

Id # | Feature | Class | Method | Error Description | Time | Severity | Error Type
(Error ID) | (Feature where error was found) | (Class where error was found) | (Method where error was found) | (Brief error description) | (Time when error was found) | (High / Medium / Low) | (Interface / Logic / Error handling / Persistence / Concurrency / Other)
1 |  |  |  |  | __:__ |  |
2 |  |  |  |  | __:__ |  |
3 |  |  |  |  | __:__ |  |
4 |  |  |  |  | __:__ |  |
5 |  |  |  |  | __:__ |  |
6 |  |  |  |  | __:__ |  |
7 |  |  |  |  | __:__ |  |
8 |  |  |  |  | __:__ |  |
9 |  |  |  |  | __:__ |  |
10 |  |  |  |  | __:__ |  |
B.4 Feedback Questionnaire A
A. GENERAL INFORMATION
1. Subject Name:
2. ID:
B. REGARDING THE EXPERIMENT
1. How effective was the training, in your opinion? Was it helpful in making you understand the procedures of the RiPLE-TE Unit Testing? Is there anything missing, or anything that you think could be done better?
[ ] Training was effective; it helped me to understand the unit testing task.
[ ] Training was effective; it helped me to understand the unit testing task, but the training time was too short.
[ ] It would have been more effective if it had more practical examples.
[ ] The activity was fairly intuitive, but good experience is needed to apply it according to the rules and within the estimated time.
[ ] An example should have been shown, following step by step all the possible details that might arise during the testing activity.
2. Did you have doubts about any notion of the presented RiPLE-TE Process? If yes, how did you handle it?
[ ] Yes. I asked the instructors for explanations.
[ ] Yes. I just revised training material.
[ ] No.
Comments:
3. Besides the knowledge acquired in training, did you need other information to perform the experiment?
[ ] Yes
[ ] No.
If yes, which additional information?
4. Was the Equivalence Class Partitioning method applied all the time?
[ ] Yes
[ ] No.
If not, why?
5. Was the Equivalence Class Partitioning method effective in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
6. Was the Equivalence Class Partitioning method efficient in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
7. What were the major difficulties you faced while performing the experiment?
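For reference, the Equivalence Class Partitioning method mentioned in questions 4 to 6 divides the input domain of a unit under test into classes whose values the unit is expected to handle identically, so that a single representative value per class suffices. The following is a minimal sketch of the idea, assuming JUnit 4; the names are hypothetical and do not come from the experiment's code base.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Hypothetical unit under test: classifies an age given in whole years.
    class AgeClassifier {
        // Valid partitions: 0-17 -> "minor", 18-64 -> "adult", 65+ -> "senior".
        // Negative ages form an invalid partition.
        String classify(int age) {
            if (age < 0) throw new IllegalArgumentException("negative age");
            if (age < 18) return "minor";
            if (age < 65) return "adult";
            return "senior";
        }
    }

    public class AgeClassifierTest {
        private final AgeClassifier classifier = new AgeClassifier();

        // One representative value exercises each valid equivalence class.
        @Test
        public void oneRepresentativePerValidPartition() {
            assertEquals("minor", classifier.classify(10));
            assertEquals("adult", classifier.classify(30));
            assertEquals("senior", classifier.classify(70));
        }

        // A single negative value covers the invalid partition.
        @Test(expected = IllegalArgumentException.class)
        public void negativeAgeIsRejected() {
            classifier.classify(-1);
        }
    }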
B.5 Feedback Questionnaire B
A. GENERAL INFORMATION
1. Subject Name:
2. ID:
B. REGARDING THE EXPERIMENT
1. How effective was the training, in your opinion? Was it helpful in making you understand the procedures of the RiPLE-TE Unit Testing? Is there anything missing, or anything that you think could be done better?
[ ] Training was effective; it helped me to understand the unit testing task.
[ ] Training was effective; it helped me to understand the unit testing task, but the training time was too short.
[ ] It would have been more effective if it had more practical examples.
[ ] The activity was fairly intuitive, but good experience is needed to apply it according to the rules and within the estimated time.
[ ] An example should have been shown, following step by step all the possible details that might arise during the testing activity.
2. Did you have doubts about any notion of the presented RiPLE-TE Process? If yes, how did you handle it?
[ ] Yes. I asked the instructors for explanations.
[ ] Yes. I just revised training material.
[ ] No.
Comments:
3. Besides the knowledge acquired in RiPLE-TE training, did you need other information to perform the unit tests in the experiment?
[ ] Yes
[ ] No.
If yes, which additional information?
4. Was the Equivalence Class Partitioning method applied all the time?
[ ] Yes
[ ] No.
If not, why?
5. Was the Equivalence Class Partitioning method effective in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
6. Was the Equivalence Class Partitioning method efficient in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
C. REGARDING THE RIPLE-TE UNIT TESTING PROCESS
1. Besides the knowledge acquired in training, did you need other information to perform the experiment?
[ ] Yes
[ ] No.
If yes, which additional information?
2. Was the RiPLE-TE process applied all the time?
[ ] Yes
[ ] No.
If not, why?
3. Was the RiPLE-TE process efficient in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
4. Was the RiPLE-TE process effective in helping you find the seeded defects in the original code?
[ ] Yes
[ ] No.
Comments:
5. Do you think that the RiPLE-TE process, presented in detail during training, contributed to the task of finding errors?
[ ] Yes
[ ] No.
Comment on your answer:
6. What were the major difficulties you faced while performing the experiment?
B.6 Feedback Questionnaire C
A. GENERAL INFORMATION
1. Subject Name:
2. ID:
B. REGARDING THE RIPLE-TE UNIT TESTING APPROACH
1. Do you think that the RiPLE-TE unit testing approach, just presented in the training session, would have helped you find more defects than the ad-hoc approach you actually used?
[ ] Yes
[ ] No.
Comment on your answer.