You are on page 1of 15

ESE Notes

Software Engineering: The application of a systematic, disciplined, quantifiable approach to the


development, operation and maintenance of software, and the study of these approaches, that is, the
application of engineering to software.

Empirical: It is typically used to define any statement about the world that is related to observation or
experience.

Empirical Software Engineering (ESE): It is an area of research that emphasizes the use of
empirical methods in the field of software engineering. It involves methods for evaluating, assessing,
predicting, monitoring, and controlling the existing artifacts of software development. ESE applies
quantitative methods to the software engineering phenomenon to understand software development
better. ESE has been gaining importance over the past few decades because of the availability of vast
data sets from open-source repositories that contain information about software requirements, bugs,
and changes.

Empirical Study:
It is an attempt to compare the theories and observations using real-life data for analysis. Empirical
studies usually utilize data analysis methods and statistical techniques for exploring relationships.
Benefits of empirical study:

• Allows to explore relationships


• Allows to prove theoretical concepts
• Allows to evaluate accuracy of the models
• Allows to choose among tools and techniques
• Allows to establish quality benchmarks across software organizations
• Allows to assess and improve techniques and methods

Empirical studies are important in the area of software engineering as they allow software professionals
to evaluate and assess the new concepts, technologies, tools, and techniques in scientific and proved
manner. They also allow improving, managing, and controlling the existing processes and techniques by
using evidence obtained from the empirical analysis. Empirical studies are of many types, including
surveys, systematic reviews, experiments, and case studies.

Steps in Empirical Study:


• Research Questions
• Hypothesis Formation
• Data Collection
• Data Analysis
• Model Development and Validation
• Conclusion of Results

Types of Empirical Study:


1. Quantitative Research
a. Most widely used
b. Used to prove or disprove a hypothesis (a concept that has to be tested for further
investigation)
c. Generate results that are generalizable and unbiased and thus can be applied to a larger
population in research
d. Uses statistical methods to validate a hypothesis and to explore causal relationships
e. Generates numerical data for analysis
2. Qualitative Research
a. Widely used in case studies
b. Researchers study human behaviour, preferences, and nature
c. Uses focused data for research
d. Understanding a new process or technique in software engineering is an example of
qualitative research
e. Involves methods such as observations, interviews, and group discussions
f. Can be used to analyse and interpret the meaning of results produced by quantitative
research
g. Generates non-numerical data for analysis
h. Data generated is rich as compared to quantitative data
Experiment:
1. It tests the established hypothesis by finding the effect of variables of interest (independent
variables) on the outcome variable (dependent variable) using statistical analysis.
2. Experiments are small scale and must be controlled.
3. Carried out in a controlled environment and often referred to as controlled experiments
4. The key factors involved are independent variables, dependent variables, hypothesis, and
statistical techniques

Case Study:
1. Represents real-world scenarios and involves studying a particular phenomenon.
2. Allows software industries to evaluate a tool, method, or process.
3. Increases the understanding of the phenomenon under study.
4. Qualitative data is usually collected in a case study. The sources include interviews, group
discussions, or observations. The data may be directly or indirectly collected from participants
5. Case studies are appropriate where a phenomenon is to be studied for a longer period of time so
that the effects of the phenomenon can be observed.
6. The disadvantages of case studies include difficulty in generalization as they represent a typical
situation.

Survey Research:
1. Identifies features or information from a large scale of a population.
2. Surveys are usually conducted using questionnaires and interviews.
3. The questionnaire/survey can be used to detect trends and may provide valuable information and
feedback on a particular process, technique, or tool.
4. Surveys are classified into three types—descriptive, explorative, and explanatory.
a. Exploratory Survey – It focuses on the discovery of new ideas and insights and are usually
conducted at the beginning of a research study to gather initial information.
b. Descriptive Survey - It is more detailed and describes a concept or topic.
c. Explanatory Survey - It tries to explain how things work in connections like cause and effect,
meaning a researcher wants to explain how things interact or work with each other.

Systematic Review:
1. It is methodically undertaken with a specific search strategy and well-defined methodology to
answer a set of questions.
2. The aim of a systematic review is to analyse, assess, and interpret existing results of research to
answer research questions.
3. The purpose of a systematic review is to summarize the existing research and provide future
guidelines for research by identifying gaps in the existing literature. A systematic review involves:
a. Defining research questions.
b. Forming and documenting a search strategy.
c. Determining inclusion and exclusion criteria.
d. Establishing quality assessment criteria.
4. The systematic reviews are performed in three phases: planning the review, conducting the review,
and reporting the results of the review.

Postmortem Analysis:
1. Carried out after an activity or a project has been completed.
2. Main aim is to detect how the activities or processes can be improved in the future.
3. Captures knowledge from the past, after the activity has been completed.
4. Can be classified into two types: general postmortem analysis and focused postmortem analysis.
a. General Postmortem Analysis - It collects all available information from a completed activity.
b. Focused Postmortem Analysis – It collects information about specific activity such as effort
estimation.
5. In postmortem analysis, large software systems are analysed to gain knowledge about the good
and bad practices of the past.
6. The techniques such as interviews and group discussions can be used for collecting data in
postmortem analysis.

Empirical Study Process:


Difference between Empirical and Experimental Research: In experimental software engineering, the
dependent variable is closely observed in a controlled environment. The data is artificial or synthetic
but is more controlled. Empirical studies are used to define anything related to observation and
experience and are valuable as these studies consider real-world data.
1. Study Definition: It involves the definition of the goals and objectives of the empirical study.
The aim of the study is explained in this step. Parts of study definition-
a. Scope: What are the dimensions of the study?
b. Motivation: Why is it being studied?
c. Object: What entity is being studied?
d. Purpose: What is the aim of the study?
e. Perspective: From whose view is the study being conducted (e.g, project manager,
customer)?
f. Domain: What is the area of the study?
2. Experiment Design: The design of the experiment covers stating the research questions, formation
of the hypothesis, selection of variables, data-collection strategies, and selection of data analysis
methods. The context of the study is defined in this phase. The aim of the design phase should be
to select methods and techniques that promote replicability and reduce experiment bias.
3. Research Conduct and Analysis: The experiment analysis phase involves understanding the data
by collecting descriptive statistics. The unrelated attributes are removed, and the best attributes
(variables) are selected out of a set of attributes (e.g., software metrics) using attribute reduction
techniques. After removing irrelevant attributes, hypothesis testing is performed using statistical
tests and, on the basis of the result obtained, a decision regarding the acceptance or rebuttal of
the hypothesis is made. Finally, for analyzing the casual relationships between the independent
variables and the dependent variable, the model is developed and validated.
4. Results Interpretation: The results computed in the empirical study’s analysis phase are assessed
and discussed. The reason behind the acceptance or rejection of the hypothesis is examined. This
process provides insight to the researchers about the actual reasons of the decision made for
hypothesis. The significance and practical relevance of the results are defined in this phase.
The limitations of the study are also reported in the form of threats to validity.
5. Reporting: The results of the study can be disseminated in the form of a conference article, a
journal paper, or a technical report. The results are to be reported from the reader’s perspective.
Thus, the background, motivation, analysis, design, results, and the discussion of the results must
be clearly documented. The experiment settings, data-collection methods, and design processes
must be reported in significant level of detail.

Characteristics of a Good Empirical Study:


1. Clear 5. Precise
2. Valid 6. Unbiased
3. Control 7. Replicable
4. Repeatable 8. Descriptive

Ethics of Empirical Research:


Researchers, academicians, and sponsors should be aware of research ethics while conducting and
funding empirical research in software engineering. Four ethical principles are:
1. Informed consent: The subjects or participants must be provided all the relevant information
related to the experiment or study. The participants must willingly agree to participate in the
research process.
2. Scientific value: This principle states that the research results must be correct and valid. This issue
is critical if the researchers are not familiar with the technology or methodology they are using as
it will produce results of no scientific value.
3. Confidentiality: It refers to anonymity of data, participants, and organizations.
4. Beneficence: The research must provide maximum benefits to the participants and protect the
interests of the participants.

Basic Elements of Empirical Research:


1. Purpose: It defines the reason of the research, the relevance of the topic, specific aims in the form
of research questions, and objectives of the research.
2. Process: It lays down the way in which the research will be conducted. It defines the sequence of
steps taken to conduct research. It provides details about the techniques, methodologies, and
procedures to be used in the research. The data-collection steps, variables involved, techniques
applied, and limitations of the study are defined in this step. The process should be followed
systematically to produce successful research.
3. Participants: They are the subjects involved in the research. The participants may be interviewed or
closely observed to obtain the research results. While dealing with participants, ethical issues in ESE
must be considered so that the participants are not harmed in any way.
4. Product: It is the outcome produced by the research. The final outcome provides the answer to
research questions in the empirical research. The new technique developed or methodology
produced can also be considered as a product of the research. The journal paper, conference article,
technical report, thesis, and book chapters are products of the research.

Descriptive, Correlational, and Cause–Effect Research:


1. Descriptive research provides description of concepts.
2. Correlational research provides relation between two variables.
3. Cause–effect research is similar to experiment research in that the effect of one variable on another
is found.

Classification and Prediction:


1. Classification predicts categorical outcome variables (ordinal or nominal). The training data is used
for model development, and the model can be used for predicting unknown categories of outcome
variables. The classification techniques take training data (comprising of the independent and the
dependent variables) as input and generate rules or mathematical formulas that are used by
validation data to verify the model predicted. The generated rules or mathematical formulas are
used by future data to predict categories of the outcome variables.
2. Prediction is similar to classification except that the outcome variable is continuous.
Quantitative and Qualitative Data:

Independent, Dependent, and Confounding Variables:


1. Independent Variables (or predictor variables) – They are input variables that are manipulated or controlled
by the researcher to measure the response of the dependent variable.
2. Dependent Variable (or response variable) - It is the output produced by analyzing the effect of the
independent variables. The dependent variables are presumed to be influenced by the independent variables.
The independent variables are the causes and the dependent variable is the effect. Usually, there is only one
dependent variable in the research.
3. Confounding Variables (extraneous or unknown variables) – They may affect the outcome (dependent)
variable. Randomization can nullify the effect of confounding variables. In randomization, many replications
of the experiments are executed and the results are averaged over multiple runs, which may cancel the effect
of extraneous variables in the long course.

Proprietary, Open Source, and University Software:


1. Open-source software – It is usually a freely available software, developed by many developers from
different places in a collaborative manner. For example, Google Chrome, Android operating system,
and Linux operating system.
2. Proprietary software – It is a licensed software owned by a company. For example, Microsoft Office,
Adobe Acrobat, and IBM SPSS are proprietary software. In practice, obtaining data from proprietary
software for research validation is difficult as the software companies are usually not willing to share
the information about their software systems.
3. University Software - The software developed by the student programmers is generally small and
developed by limited number of developers. If the decision is made for collecting and using this type
of data in research then the guidelines similar to given above must be followed to promote unbiased
and replicated results.
Within-Company and Cross-Company Analysis:
1. Within-Company Analysis - The empirical study collects the data from the old versions/ releases of
the same software, predicts models, and applies the predicted models to the future versions of the
same project.
2. Cross-Company Analysis - When old data isn’t available, the data obtained from similar earlier
projects developed by different companies are used for prediction in new projects. It is the process
of validating the predicted model using data collected from different projects from which the model
has been derived.

Parametric and Nonparametric Tests:


1. Parametric Tests – They are used for data samples having normal distribution (bell-shaped curve). If
the assumptions of parametric tests are met, they are more powerful as they use more information
while computation.
2. Nonparametric Tests - They are used when the distribution of data samples is highly skewed.

Systematic Review Vs. Literature Survey:


Characteristics of Systematic Reviews:
1. Selection of High-Quality Research: SRs focus on selecting relevant, important, and essential high-
quality research papers and studies that are summarized in one review paper.
2. Systematic Search Strategy: SRs involve forming a systematic search strategy to identify primary
studies from digital libraries. The search strategy is documented for transparency and
reproducibility.
3. Valid Review Protocol and Research Questions: An SR includes a valid review protocol and well-
constructed research questions that address the issues to be answered in the review.
4. Comprehensive Study Summaries: SRs clearly summarize the characteristics of each selected study,
including aims, techniques, and methods used in the studies.
5. Quality Assessment Criteria: SRs have justified quality assessment criteria for inclusion and
exclusion of studies, allowing for the determination of each study's effectiveness.
6. Effective Presentation Tools: SRs use various presentation tools to report the findings and results
of selected studies included in the review.
7. Identification of Gaps and Future Directions: SRs identify gaps in current findings and highlight
future research directions based on the synthesized evidence.

Importance of Systematic Reviews:


1. They gather important empirical evidence on the technique or method being focused in the SR. On
the basis of the empirical evidence, the strengths and weaknesses of the technique may be
summarized.
2. They identify the gaps in the current research.
3. They report the commonalities and the differences in the primary studies.
4. They provide future guidelines and framework to researchers and practitioners to perform new
research.
Stages of Systematic Review:
1. Planning the Review
a. Identify the need for the review by recognizing gaps in existing research.
b. Formulate research questions that address the issues to be answered in the review.
c. Develop, document, and analyse the review protocol, which includes search strategy design,
study selection criteria, quality assessment criteria, data extraction process, and data synthesis
process.
2. Conducting the Review
a. Execute the search strategy to identify primary studies from digital libraries.
b. Select relevant studies based on inclusion and exclusion criteria.
c. Assess the quality of selected studies using predefined criteria.
d. Extract data from high-quality primary studies.
e. Synthesize data to analyse and interpret findings.
3. Reporting the Review Results
a. Evaluate the review protocol to ensure adherence to planned procedures.
b. Present and verify the results obtained from the review process.

Formation of Research Questions (RQ):


Formation of research questions is a crucial step in systematic review (SR) process. Research questions
should be well-formed and constructed after a thorough analysis. The structure of an SR depends on the
content of the research questions formed, and key decisions are based on the questions such as which
studies to focus on, where to search them, and how to assess the quality of these studies. Here are the
steps involved in forming research questions:
1. Identify relevant issues: Identify the areas that need to be explored or answered during the
proposed SR.
2. Assess existing reviews: Determine if the areas have already been explored in the existing reviews
(if any).
3. Determine target audience: Determine if the questions are important to researchers and software
practitioners.
4. Identify tools and techniques: Determine which tools and techniques will be evaluated in the study.
5. Determine outcomes: Determine the outcomes of the study, such as identifying similarities in
trends or deviations from existing trends.
6. Determine environment: Determine if the study will be conducted in an academic or industry
environment.
7. Construct research questions: Construct research questions based on the identified issues and
considerations.

Review Protocol:
It involves several key decisions based on the research questions, such as which studies to focus on,
where to search them, and how to assess the quality of these studies. The review protocol is established
by frequently holding meetings and group discussions in a group comprising of preferably senior
members having experience in the area. The process is iterative and is defined and refined in various
iterations. The steps involved in developing the review protocol are:
1. Formation of search terms: Identify search terms by breaking down research questions into
individual units, using search terms in the titles, keywords, and abstracts of relevant studies, and
identifying alternative terms and synonyms for the main search terms.
2. Selection of digital libraries: Choose digital libraries that must be searched to identify primary
studies.
3. Refinement of search terms: Form sophisticated search terms by incorporating alternative terms
and synonyms using Boolean expression "OR" and combining main search terms using "AND".
4. Initial search: Conduct an initial search using the identified search terms and digital libraries.
5. Identification of primary studies: Review the search results to identify primary studies that will
address the research questions.
6. Quality assessment criteria: Establish quality assessment criteria for inclusion and exclusion of
studies in the SR.
7. Data extraction: Design data extraction forms to collect the required information to answer the
research questions.
8. Data synthesis: Devise methods for data synthesis from the selected high-quality primary studies.
Why identification of the need for an SR is considered the most important step in
planning the review?
1. This identification involves recognizing existing literature, forming essential research questions,
assessing practical relevance, evaluating the quality of previous reviews, and justifying the proposed
review.
2. By understanding the current state of knowledge and addressing research gaps, researchers can
ensure that the SR is relevant, impactful, and contributes significantly to the field.
3. Additionally, this step involves reviewing existing SRs in the same domain to identify areas that
require further exploration and ensuring that the new SR adds value to existing knowledge.
4. The process includes determining the number of primary studies available, assessing existing SR
strengths and weaknesses, evaluating practical relevance, guiding practitioners and researchers,
and establishing criteria for evaluating the quality of the proposed SR.
Reporting the SR:
1. The report typically includes sections like Title, Author Details, Abstract, Introduction, Method,
Results, Threats to Validity, Conclusions, References and Appendix.
2. The Title should be concise and informative, while the Abstract provides an overview of the study's
relevance, importance, and main findings.
3. The Introduction justifies the need for the SR and presents an overview of existing SRs in the field.
4. The Method section details the research questions, search strategy, study selection criteria, and
quality assessment criteria.
5. Results highlight major findings, and Conclusions discuss implications and future research
directions.
6. Additionally, discussing the strengths and limitations of the SR is essential to provide context for the
findings and their impact on the overall results.

Experimental Design:
1. Experimental design is a very important activity, which involves laying down the background of the
experiment in detail. This includes understanding the problem, identifying goals, developing various
RQ, and identifying the environment.
2. In this phase, an extensive survey is conducted to have a complete overview of all the work done in
literature till date. Besides this, the research is formally stated, including a null hypothesis and an
alternative hypothesis.
3. The next step in design phase is to determine and define the variables. The variables are of two
types: dependent and independent variables. In this step, the variables are identified and defined.
The measurement scale should also be defined. This imposes restrictions on the type of data
analysis method to be used.
4. The environment in which the experiment will be conducted is also determined, for example,
whether the experiment will use data obtained from industry, open source, or university.
5. Finally, the data analysis methods to be used for performing the analysis are selected.
Research Questions (RQ):
1. RQs are questions that state the problem, identify main concepts, and relations to be explored in
research. They can be formed based on different research types like causal relationships, exploratory
research, explanatory research, and descriptive research.
2. The first step in the experimental design is to formulate the RQs. This step states the problem in the
form of questions, and identifies the main concepts and relations to be explored. The first question
that comes to our mind when doing research is “What is the need to conduct research (about a
particular topic)?”
3. If the questions are not yet answered, the researcher intends to answer those questions and carry
forward the research.
4. A research problem can be defined as a condition that can be studied or investigated through the
collection and analysis of data having theoretical or practical significance. Research problem is
defined as a part of research for which the researcher is continuously thinking about and wants to
find a solution for it.
5. Hence, the RQs must fill the gap between existing literature and current work and must give some
new perspective to the problem.
6. Characteristics of RQ are –
• Clear
• Unambiguous
• Empirical Focus
• Important
• Manageable
• Practical Use
• Related to Literature
• Ethically Neutral
Literature Review:
A literature review is an essential component of research that involves analyzing a body of literature
related to a research topic to gain a clear understanding of the topic, existing work, key issues, and
areas needing further exploration. It provides an overview of existing research in the field and helps in
identifying gaps and controversies. The main aim is to contribute to a better understanding of the
concerned field by collecting ideas, views, information, and evidence on the topic under investigation.
Key aspects of a literature review:

• Identifying key theories, concepts, and ideas.


• Highlighting knowledge gaps.
• Addressing major issues and controversies.
• Examining methodologies used and their quality.
• Understanding academic terminologies and information sources.
• Identifying key questions addressed to date.
• Exploring areas where different authors have varying views.
• Determining potential areas for further research.
Purpose of a literature review:

• To analyse and evaluate existing literature in relation to the area being explored.
• To familiarize the researcher with current research before starting their own study in the same
area.

Steps in literature review:


1. Develop search strategy: This step involves identification of digital portals, research journals, and
formation of search string. This involves survey of scholarly journal articles, conference articles,
proceeding articles, books, technical reports, and Internet resources in various research-related
digital portals (IEEE, Springer, Google Scholar, Wiley, etc.).
2. Conduct the search: This step involves searching the identified sources by using the formed search
string. The abstracts and/or full texts of the research papers should be obtained for reading and
analysis.
3. Analyze the literature: Once the research papers relevant to the research topic have been obtained,
the abstract should be read, followed by the introduction and conclusion sections.
4. Use the results: The results obtained from the literature review must then be summarized for later
comparison with the results obtained from the current work.
Guidelines for writing literature review:
• Identify the topics that are similar in multiple papers to compare and contrast different authors’
view.
• Group authors who draw similar conclusions.
• Group authors who are in disagreement with each other on certain topics.
• Compare and contrast the methodologies proposed by different authors.
• Show how the study is related to the previous studies in terms of the similarities and the
differences.
• Highlight exemplary studies and gaps in the research

Research Variable:
A research variable is a characteristic or attribute that can be measured and varies or can be
manipulated in research. In an experiment, there are two types of
variables: dependent and independent variables.

Selection of variables:

• It is based on the domain knowledge of the researcher. The variables are identified according to the
things being measured.
• Researcher must carefully analyze the research problem and identify the variables affecting the
dependent variable. The literature review can also help in identification of variables.
• The selection is based on the researcher’s experience, information obtained from existing published
research, and judgment or advice obtained from the experts in the related areas. The research
problem must be thoroughly and carefully analyzed to identify the variables.

You might also like