This paper investigates the combined use of semantic and structural information about programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain-specific issues (in both the problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program, along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered using this similarity measure. Simple structural information (i.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files with respect to each other. The measures are formally defined for general application. A set of experiments is presented that demonstrates how these measures can assist in understanding a nontrivial software system, namely a version of NCSA Mosaic.
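The pipeline summarized above (a latent semantic indexing similarity measure, clustering on that measure, and a cohesion check) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not the paper's formal definitions: the toy "components" are short strings, the rank k = 2 is arbitrary, the grouping is a plain similarity-threshold connected-components pass that stands in for whatever clustering algorithm the paper uses, and mean_pairwise_similarity is a hypothetical stand-in for a cohesion measure. It assumes NumPy is available.

```python
import re
import numpy as np

def term_document_matrix(docs):
    """Build a raw term-frequency matrix (rows: terms, columns: documents)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return A

def lsi_document_vectors(A, k):
    """Project documents into a k-dimensional space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Row j is document j expressed in the top-k singular directions.
    return (Vt[:k, :] * s[:k, None]).T

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def cluster_by_threshold(sim, threshold):
    """Illustrative grouping: connected components of the graph that links
    two components whenever their similarity is at least `threshold`."""
    n = sim.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def mean_pairwise_similarity(sim, members):
    """Hypothetical cohesion stand-in: average similarity over all pairs in
    `members` (e.g., the components that live in one file)."""
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    if not pairs:
        return 1.0  # assumption: a single component is trivially cohesive
    return float(np.mean([sim[i, j] for i, j in pairs]))

# Toy "software components": identifiers and comments reduced to words.
docs = [
    "open socket connect host port",
    "read socket buffer host",
    "draw widget window color",
    "resize window widget layout",
]
A = term_document_matrix(docs)
sim = cosine_similarity_matrix(lsi_document_vectors(A, k=2))
labels = cluster_by_threshold(sim, threshold=0.5)
print(labels)
print(mean_pairwise_similarity(sim, [0, 1]))
```

In this toy input the networking-related and GUI-related strings share no vocabulary, so the two retained dimensions separate them cleanly. On real source code the rank, the term weighting, and the threshold would all need tuning.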
1463 | Indexing by Latent Semantic Analysis – Deerwester, Dumais, et al. - 1990
957 | Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton
850 | Principal Component Analysis – Jolliffe - 1986
516 | Features of similarity – Tversky - 1977
444 | Solutions to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge – Landauer, Dumais - 1997
377 | Using linear algebra for intelligent information retrieval – Berry, Dumais, et al. - 1995
228 | On the shortest spanning subtree of a graph and the travelling salesman problem – Kruskal
166 | Empirical studies of programming knowledge – Soloway, Ehrlich - 1984
139 | An information retrieval approach for automatically constructing software libraries – Maarek, Berry, et al. - 1991
111 | Program understanding and the concept assignment problem – Biggerstaff, Mitbander, et al. - 1994
108 | An Intelligent Tool for Reengineering Software Modularity – Schwanke
99 | A Reverse Engineering Approach to Subsystem Structure Identification – Müller, Orgun, et al. - 1993
86 | System Structure Analysis: Clustering with Data Bindings – Hutchens, Basili - 1985
77 | Latent semantic indexing (LSI) and TREC-2 – Dumais - 1994
76 | Using Automatic Clustering to Produce High-Level System Organizations of Source Code – Mancoridis, Mitchell, et al. - 1998
71 | A survey of information retrieval and filtering methods – Faloutsos, Oard - 1995
68 | Information Distribution Aspects of Design Methodology – Parnas - 1971
64 | Large scale singular value computations – Berry - 1992
59 | Using clustering algorithms in legacy systems remodularization – Wiggerts - 1997
53 | The Programmer's Apprentice: A research overview – Rich, Waters - 1988
46 | Software reuse through information retrieval – Frakes, Nejmeh - 1987
37 | Extracting concepts from file names: a new file clustering criterion – Anquetil, Lethbridge - 1998
33 | A unified framework for expressing software subsystems classification techniques – Lakhotia - 1997
30 | Full text indexing based on lexical relations, an application: Software libraries – Maarek, Smadja - 1989
28 | Specification-based browsing of software component libraries – Fischer - 1998
24 | Experiments with clustering as a software remodularization method – Anquetil, Fourrier, et al. - 1998
23 | Source code informal information analysis using connectionist models – Merlo, McAdam, et al. - 1993
22 | Automatically identifying reusable OO legacy code – Etzkorn, Davis - 1997
18 | Assessing Software Libraries by Browsing Similar Classes, Functions and Relationships – Michail, Notkin - 1999
17 | Linear Algebra and Its Applications, 2nd ed – Strang - 1980
15 | Using Latent Semantic Analysis to Identify Similarities in Source Code to Support Program Understanding – Maletic, Marcus - 2000
14 | Experiments in identifying reusable abstract data types in program code – Canfora, Cimitile, et al. - 1993
14 | Comparison of abstract data type and abstract state encapsulation detection techniques for architectural understanding – Girard, Koschke, et al. - 1997
14 | A metric-based approach to detect abstract data types and abstract state encapsulation – Girard, Koschke, et al. - 1999
14 | A Toolset for Program Understanding – Livadas, Alden - 1993
14 | Automatic Software Clustering via Latent Semantic Analysis – Maletic, Valluri - 1999
14 | Recovering reusable components from legacy systems by program segmentation – Ning, Engberts, et al. - 1993
10 | A comparison of graphs of concept for reverse engineering – Anquetil - 2000
9 | An Approach to Program Understanding by Natural Language Understanding – Etzkorn, Bowen, et al. - 1999
6 | A Comparison of Abstract Data Type and Objects Recovery Techniques – Girard, Koschke - 1999
5 | Mosaic Source Code v2.7b5, NCSA, ftp site, Date Accessed: 4/12/2000, ftp://ftp.ncsa.uiuc.edu/Mosaic/Unix/source – Mosaic - 1996
5 | Plans in Program Design and Understanding – Rist - 1992
4 | The LEDA Manual Version R-3.7 – LEDA - 1998
4 | A Tool to Support Knowledge Based Software Maintenance: The Software Service Bay – Maletic, Reynolds
2 | A Knowledge-Based Approach to the Analysis of Loops – El, H, et al. - 1996
2 | Knowledge-Based Program Analysis – Harandi, Ning - 1990