
ANALYSIS OF SEARCH METRICS
Jyotisman Das
17CS10017
Mining Software Repositories with
iSPARQL and a Software Evolution
Ontology, 2007
Christoph Kiefer, Abraham Bernstein, Jonas Tappolet
Department of Informatics
University of Zurich, Switzerland {kiefer,bernstein}@ifi.unizh.ch,
jtappolet@access.unizh.ch
Motivation

• Choosing a proper data analysis/exchange format.
• Existing formats each require format-specific processing.
• Exhaustive, repetitive import/export of programs to
the designated format.
• Need for a tool that aids software repository
mining projects universally.
Introduction and Architecture
• EvoOnt – OWL based software evolution data exchange
format.
• Provides the means to store all elements necessary for software evolution.
• OWL is a quasi-standard – no fresh code or complicated command-line tools are
required.
• OWL's Resource Description Framework (RDF) foundation – additional assertions can be
derived.
• iSPARQL – SPARQL based semantic web query engine.
• Queries for similar software entities in OWL software repositories.
• Exploits the semantic annotation of EvoOnt.
• Ontology models – encapsulate different aspects of object-oriented software source code (OOSSC)
• iSPARQL – “Virtual tuples”
Experiments
• Code Evolution measurement
• Refactoring experiment
• Metrics experiment (Detection of God Classes)
• Ontological reasoning experiments
Software Metrics
• Detection of “God Classes”
• Metrics used:
• NOM (Number of methods)
• NOA (Number of attributes)
• Bug Reports Count (For validation)

[Figures: results of the God Class query; results of the bug reports query]
Takeaway

• Mines semantically annotated software repositories
• Fosters refactoring
• Quantifies the size and complexity of software design
• Allows additional assertions to be derived

• Loss of information due to the use of a FAMIX-based model
• Unsatisfactory performance
Summary
• EvoOnt, together with the iSPARQL framework
• Four sets of experiments to showcase the power of EvoOnt
• Tests
• Software Metrics evaluation
• God Class Detection
• Source – “org.eclipse.compare” plug-in
• Metrics used – NOM and NOA
• Query – Classes having NOM and NOA greater than 15
• Results – Class “TextMergeViewer” was found to be a God Class
• Validation – The bug report count indicated the presence of a God Class (via
correlation)
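The God Class query above can be sketched outside of iSPARQL as a plain threshold filter. This is only a minimal illustration, assuming the NOM and NOA values have already been extracted per class; the `ClassMetrics` type, the `find_god_classes` function, and the sample class names and numbers are hypothetical and are not taken from the paper or from "org.eclipse.compare".

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    nom: int  # NOM: number of methods
    noa: int  # NOA: number of attributes


def find_god_classes(classes, threshold=15):
    """Return names of classes whose NOM and NOA are both above the threshold."""
    return [c.name for c in classes if c.nom > threshold and c.noa > threshold]


# Made-up metric values, for illustration only.
sample = [
    ClassMetrics("BigViewer", nom=40, noa=22),    # both above 15
    ClassMetrics("WideRecord", nom=6, noa=30),    # many attributes, few methods
    ClassMetrics("SmallHelper", nom=4, noa=2),    # small class
]

print(find_god_classes(sample))  # prints ['BigViewer']
```

Requiring both metrics to exceed the threshold mirrors the query as stated on the slide; a class that is large on only one metric is not flagged.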
Codebook: Discovering and Exploiting
Relationships in Software Repositories,
2010
Andrew Begel, Microsoft Research, Redmond, WA, USA, andrew.begel@microsoft.com
Khoo Yit Phang, University of Maryland, College Park, MD, USA, khooyp@cs.umd.edu
Thomas Zimmermann, Microsoft Research, Redmond, WA, USA, tzimmer@microsoft.com
Motivation
• Survey at Microsoft
• Communication and collaboration – Key factors for
successful development of a product
• Finding and keeping track of other engineers – A major
problem
• Development of connections – Role of shared work
• Requirement of a tool that discovers connections
among the work-related repositories of engineers
Introduction and Approach
• The Codebook framework for mining repositories
• Getting to know and staying in touch with people
• Software repositories – Major source of interactions
• Single data structure and a single algorithm
• Graph of typed nodes – key data structure
• Crawlers
• Hoozizat – Web-based search portal
• Deep Intellisense – Complete history of events
Search Metrics
• Search takes a set of keywords and returns a ranked list of
Codebook
• Initially - TF-IDF (Poor results)
• Semantically meaningful link structure
• Anchor text (Missing)
• Path Regular expressions (25!)
• Degree of Anchor Edges
• Score – Generated using SQL Server’s full-text search algorithm
and Anchor Metadata
• Ranking based on the above score
• Data Source – TFS source code, work item repositories and
Microsoft Active Directory
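For reference, the TF-IDF baseline mentioned above (the approach that gave poor results) can be sketched as follows. This is a generic textbook formulation, not Codebook's implementation and not SQL Server's full-text scoring; the function name `tf_idf_rank`, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_rank(query_terms, documents):
    """Rank document ids by the summed TF-IDF weight of the query terms.

    documents: dict mapping a document id to its text.
    """
    n_docs = len(documents)
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            term = term.lower()
            if tokens and df[term]:
                # Term frequency normalized by length, times inverse document frequency.
                score += (tf[term] / len(tokens)) * math.log(n_docs / df[term])
        scores[doc_id] = score

    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical repository items, for illustration only.
docs = {
    "work_item_1": "merge viewer crashes on large files",
    "changeset_7": "fix merge viewer crash",
    "person_a": "developer on the compare team",
}
print(tf_idf_rank(["merge", "crash"], docs))
```

A purely textual score like this ignores how items are linked to each other, which is consistent with the slide's move toward link structure and anchor metadata.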
Example, Tests and Results
• Hoozizat – Finding people with Codebook
Takeaway
• Software Engineers no longer have to dig through
repositories or pester their colleagues to discover, track
and maintain connections to other people and their
associated work artifacts
• Dependent updates which are necessary for the proper
working of any framework/software can be done easily
• Easy maintenance of connections
Summary
• The problem of inter-team coordination
• Codebook – A framework for connecting engineers and their
work artifacts together
• Two front-end applications – Hoozizat and Deep Intellisense
• Query - Finding the correct person using Codebook
• Search Metric – Ranking based on a score generated using
SQL Server and Metadata
• Data Source - TFS source code, work item repositories and
Microsoft Active Directory
• Validation – Manual verification after search (group of 5-6
people)
NL-Based Query Refinement and
Contextualized Code Search Results: A User
Study, 2014
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, Greg Mallet
Department of Computer Science
Montclair State University
Montclair, NJ, USA
{hillem, roldanvegam1, failsj, malletg1}@mail.montclair.edu
Summary
• The problem of software maintenance
• Challenges faced by the developer
• Formulation
• Relevance
• Reformulation, if not relevant
• CONQUER – A novel source code search interface
• Search Metric – Scoring function based on SWUM
• Location
• Semantic Role
• Head distance
• Usage
• 28 Search Tasks
• Source – Rhino, iReport, jBidWatcher, javaHMO and Jajuk
• Results and Validation – 4 factors (Enjoyment, effectiveness, ease of
use, likelihood of future use)
AN OVERVIEW OF EVALUATION
METHODS IN TREC AD HOC
INFORMATION RETRIEVAL AND
TREC QUESTION ANSWERING, 2007
Simone Teufel
Computer Laboratory
University of Cambridge, United Kingdom
Simone.Teufel@cl.cam.ac.uk
Summary
• The necessity of evaluation of systems in natural language?
• Ad hoc IR (Information retrieval)
• QA (Question Answering)
• Evaluation Criteria
• Ad Hoc IR – “Possible” queries
• QA – “Correct” answer
• Metrics
• IR – Recall, precision and accuracy, F-measure, 11-point average
precision, Mean Average Precision (MAP)
• QA – Mean reciprocal rank (MRR), weighted confidence, and average
accuracy
• Source, Results and Validation – TREC experiments
• QA has a better performance than IR (best case)
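Several of the metrics listed above can be written down compactly. The sketch below uses the standard textbook definitions of precision, recall, F-measure (balanced F1), average precision, MAP, and MRR; the function names and the toy rankings are assumptions for illustration and do not reproduce any TREC data or tooling.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    relevant_set = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant_set:
            hits += 1
            total += hits / rank
    return total / len(relevant_set) if relevant_set else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def mean_reciprocal_rank(runs):
    """runs: list of (ranked answers, correct answers) pairs, one per question."""
    total = 0.0
    for ranked, correct in runs:
        for rank, answer in enumerate(ranked, start=1):
            if answer in correct:
                total += 1.0 / rank  # only the first correct answer counts
                break
    return total / len(runs)


# Toy data, for illustration only.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_recall_f1(ranked, relevant))
print(average_precision(ranked, relevant))
print(mean_reciprocal_rank([(["a", "b"], {"b"}), (["c"], {"c"})]))
```

Precision, recall, and F-measure treat the result as a set, while average precision and MRR depend on rank order, which is why the latter are used for ranked IR output and for QA systems that return an ordered list of candidate answers.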
On Search Engine Evaluation Metrics, 2012
Inaugural dissertation for the degree of Doctor of Philosophy (Dr. phil.),
submitted to the Faculty of Philosophy (Philosophische Fakultät) of
Heinrich-Heine-Universität Düsseldorf
Submitted by Pavel Sirotkin, from Düsseldorf
Supervisor: Prof. Wolfgang G. Stock
Düsseldorf, April 2012
Summary
• Introduction to Metrics
• Explicit and Implicit Metrics
Thank You
