Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval
- IEEE Trans. Software Eng., 2007
Cited by 115 (48 self)
This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution to the problem is formulated as a combination of the opinions of different experts. The experts in this work are two existing techniques for feature location: a scenario-based probabilistic ranking of events and an information retrieval-based technique that uses latent semantic indexing. The combination of these two experts is empirically evaluated through several case studies, which use the source code of the Mozilla Web browser and the Eclipse integrated development environment. The results show that the combination of experts significantly improves the effectiveness of feature location when compared to each of the experts used independently.
Index Terms: program understanding, feature identification, concept location, dynamic and static analyses, information retrieval, Latent Semantic Indexing, scenario-based probabilistic ranking, open source software.
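As a rough illustration of combining two "experts", the sketch below merges two per-method score tables into one ranking. The abstract does not state how the opinions are combined, so the min-max normalization, the weighted sum, and the names `normalize`, `combine_experts`, and `weight` are all assumptions made for this example, not the paper's formulation.

```python
def normalize(scores):
    """Min-max scale a {method: score} dict to [0, 1] (assumed scheme)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def combine_experts(dynamic_scores, ir_scores, weight=0.5):
    """Rank methods by a weighted sum of two normalized expert scores.

    `weight` is a hypothetical knob: the share given to the dynamic
    (scenario-based) expert; the rest goes to the IR expert.
    """
    dyn = normalize(dynamic_scores)
    ir = normalize(ir_scores)
    methods = set(dyn) | set(ir)
    combined = {
        m: weight * dyn.get(m, 0.0) + (1.0 - weight) * ir.get(m, 0.0)
        for m in methods
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)
```

A method that only one expert scores is treated as scoring zero with the other, which is one of several reasonable choices.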
Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code
- In Proc. of ICPC'07, 2007
Cited by 70 (18 self)
The paper addresses the problem of concept location in source code by presenting an approach which combines Formal Concept Analysis (FCA) and Latent Semantic Indexing (LSI). In the proposed approach, LSI is used to map the concepts expressed in queries written by the programmer to relevant parts of the source code, presented as a ranked list of search results. Given the ranked list of source code elements, our approach selects the most relevant attributes from these documents and organizes the results in a concept lattice, generated via FCA. The approach is evaluated in a case study on concept location in the source code of Eclipse, an industrial-size integrated development environment. The results of the case study show that the proposed approach is effective in organizing different concepts and their relationships present in the subset of the search results. The proposed concept location method outperforms the simple ranking of the search results, reducing the programmers' effort.
Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace
- In Automated Software Engineering (ASE 2007), 2007
Cited by 60 (26 self)
The paper presents a semi-automated technique for feature location in source code. The technique combines information from two different sources: an execution trace on the one hand, and the comments and identifiers from the source code on the other. Users execute a single partial scenario, which exercises the desired feature, and all executed methods are identified based on the collected trace. The source code is indexed using Latent Semantic Indexing, an Information Retrieval method, which allows users to write queries relevant to the desired feature and rank all the executed methods based on their textual similarity to the query. Two case studies on open source software (jEdit and Eclipse) indicate that the new technique has high accuracy, comparable with previously published approaches, and that it is easy to use because it considerably simplifies the dynamic analysis.
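The filter-then-rank idea in this abstract can be sketched in a few lines. The example below is a simplification under stated assumptions: it swaps Latent Semantic Indexing for plain term-frequency cosine similarity, and the names `tokenize`, `cosine`, `rank_executed_methods`, and the sample method names in the usage note are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_executed_methods(method_text, executed, query):
    """Keep only methods seen in the trace, then rank them by
    textual similarity to the query (plain cosine, not LSI)."""
    q = Counter(tokenize(query))
    scored = [
        (name, cosine(q, Counter(tokenize(method_text[name]))))
        for name in executed
        if name in method_text
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Given a map from method names to their comments and identifiers, a set of method names collected from one scenario trace, and a query such as "save file", the function returns only traced methods, best match first.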
On the Use of Relevance Feedback in IR-Based Concept Location
- In Proceedings of IEEE International Conference on Software Maintenance
Cited by 35 (7 self)
Concept location is a critical activity during software evolution as it produces the location where a change is to start in response to a modification request, such as a bug report or a new feature request. Lexical-based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The efficiency of such techniques is strongly dependent on the ability of the developer to write good queries. We propose an approach to augment information retrieval (IR) based concept location via an explicit relevance feedback (RF) mechanism. RF is a two-part process in which the developer judges existing results returned by a search and the IR system uses this information to perform a new search, returning more relevant information to the user. A set of case studies performed on open source software systems reveals the impact of RF on IR-based concept location.
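The two-part feedback loop described here can be illustrated with Rocchio's query reformulation, a standard relevance feedback formula. The abstract does not say which formula the approach uses, so Rocchio, the function name `rocchio`, and the default weights `alpha`, `beta`, and `gamma` are assumptions chosen for this sketch.

```python
from collections import Counter


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector from developer judgments.

    query:      {term: weight} for the original query
    relevant:   list of {term: weight} dicts judged relevant
    irrelevant: list of {term: weight} dicts judged irrelevant
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move toward the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Move away from the centroid of the irrelevant documents.
    for doc in irrelevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(irrelevant)
    # Terms with non-positive weight are dropped from the new query.
    return {t: w for t, w in new_query.items() if w > 0}
```

The reweighted query is then submitted for a new search, and the cycle can repeat until the developer finds the target location.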
On integrating orthogonal information retrieval methods to improve traceability recovery
- The College of Williams and ..., 2011
Cited by 25 (9 self)
Different Information Retrieval (IR) methods have been proposed to recover traceability links among software artifacts. Until now there is no single method that sensibly outperforms the others; however, it has been empirically shown that some methods recover different, yet complementary traceability links. In this paper, we exploit this empirical finding and propose an integrated approach to combine orthogonal IR techniques, which have been statistically shown to produce dissimilar results. Our approach combines the following IR-based
Can better identifier splitting techniques help feature location
- In 2011 IEEE 19th International Conference on Program Comprehension (ICPC), 2011
Cited by 22 (9 self)
The paper presents an exploratory study of two feature location techniques utilizing three strategies for splitting identifiers: CamelCase, Samurai and manual splitting of identifiers. The main research question that we ask in this study is if we had a perfect technique for splitting identifiers, would it still help improve accuracy of feature location techniques applied in different scenarios and settings? In order to answer this research question we investigate two feature location techniques, one based on Information Retrieval and the other one based on the combination of Information Retrieval and dynamic analysis, for locating bugs and features using various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques using Information Retrieval can benefit from better preprocessing algorithms in some cases, and that their improvement in effectiveness while using manual splitting over state-of-the-art approaches is statistically significant in those cases. However, the results for the feature location technique using the combination of Information Retrieval and dynamic analysis do not show any improvement while using manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline potential benefits of putting additional research efforts into defining more sophisticated source code preprocessing techniques as they can still be useful in situations where execution information cannot be easily collected.
Keywords: feature location; information retrieval; dynamic analysis; identifier splitting algorithms
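Of the three splitting strategies named in this abstract, CamelCase splitting is simple enough to sketch. The version below is one common regex-based formulation and not necessarily the exact rule set used in the study; the function name `split_camel_case` is hypothetical. Samurai and manual splitting are not shown.

```python
import re

# Order matters: an acronym followed by a capitalized word ("XMLParser")
# is tried first, then ordinary words, then trailing acronyms, then digits.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_camel_case(identifier):
    """Split an identifier on underscores and case changes,
    returning lowercase terms suitable for indexing."""
    terms = []
    for chunk in re.split(r"[_\W]+", identifier):
        terms.extend(_TOKEN.findall(chunk))
    return [t.lower() for t in terms]
```

For example, `split_camel_case("getXMLParser")` yields the terms `get`, `xml`, and `parser`. Identifiers with no case or underscore boundaries, such as `filesize`, are left whole, which is the kind of case where more advanced splitters differ.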
Using Information Retrieval to Support Software Maintenance Tasks
- 2008
Cited by 7 (0 self)
This paper presents an approach based on Information Retrieval (IR) techniques for extracting and representing the unstructured information in large software systems such that it can be automatically combined with analysis of program dependencies and execution traces to define new techniques for feature location, impact analysis, and software measurement tasks. We expect that these new techniques will contribute directly to the improvement of design of incremental changes and thus increased software quality and reduction of software maintenance costs. The presented results are based on the author's doctoral dissertation [23].
Software is Data Too
Cited by 6 (1 self)
Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. We argue in this position paper that data mining, statistical analysis, machine learning, information retrieval, data integration, etc., are necessary solutions to deal with software data. New research is needed to adapt existing algorithms and tools for software engineering data and processes, and new ones will have to be created. In order for this type of research to succeed, it should be supported with new approaches to empirical work, where data and results are shared globally among researchers and practitioners. Software engineering researchers can get inspired by other fields, such as bioinformatics, where results of mining and analyzing biological data are often stored in databases shared across the world.
Traceclipse: an eclipse plug-in for traceability link recovery and management
- In 6th Int'l TEFSE Workshop
Cited by 5 (1 self)
Traceability link recovery is an active research area in software engineering with a number of open research questions and challenges, due to the substantial costs and challenges associated with software maintenance. We propose Traceclipse, an Eclipse plug-in that integrates some similar characteristics of traceability link recovery techniques in one easy-to-use suite. The tool enables software developers to specify, view, and manipulate traceability links within Eclipse and it provides an API through which recovery techniques may be added, specified, and run within an integrated development environment. The paper also presents initial case studies aimed at evaluating the proposed plug-in.
Process improvement for traceability: A study of human fallibility
- In 20th IEEE International Requirements Engineering Conference (RE), 2012
Cited by 5 (2 self)
Human analysts working with results from automated traceability tools often make incorrect decisions that lead to lower quality final trace matrices. As the human must vet the results of trace tools for mission- and safety-critical systems, the hope of developing expedient and accurate tracing procedures lies in understanding how analysts work with trace matrices. This paper describes a study to understand when and why humans make correct and incorrect decisions during tracing tasks through logs of analyst actions. In addition to the traditional measures of recall and precision to describe the accuracy of the results, we introduce and study new measures that focus on analyst work quality: potential recall, sensitivity, and effort distribution. We use these measures to visualize analyst progress towards the final trace matrix, identifying factors that may influence their performance and determining how actual tracing strategies, derived from analyst logs, affect results.
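The traditional measures mentioned in this abstract, precision and recall, have standard definitions over sets of trace links and can be computed as below. The function name `precision_recall` and the link representation as (requirement, artifact) pairs are assumptions for this sketch. The newer measures (potential recall, sensitivity, effort distribution) are not defined in the abstract and are deliberately left out.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a candidate set of trace links.

    retrieved: links the analyst or tool reported
    relevant:  links in the reference (answer-set) trace matrix
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of reported links that are correct.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of correct links that were reported.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Returning 0.0 for an empty set is a convention chosen here to avoid division by zero; other conventions exist.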