Results 1 - 10 of 27
Recovering traceability links between code and documentation
- IEEE Trans. Softw. Eng., 2002
Cited by 140 (15 self)
Abstract—Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.
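The vector space side of this idea can be illustrated with a small sketch. It is not the authors' implementation: the identifier splitting rule, the smoothed TF-IDF weighting, and the names `split_identifier`, `tokenize`, and `rank_documents` are all assumptions made for illustration, and no stemming or stop-word removal is applied.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def tokenize(text):
    """Extract identifier-like tokens from code or prose and split them."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        words.extend(split_identifier(token))
    return words


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(code, documents):
    """Rank free-text documents by similarity to the words in `code`.

    `documents` maps a document name to its text. The idf weighting
    log(1 + N/df) is an assumed smoothing choice, not taken from the paper.
    """
    doc_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(doc_counts)
    df = Counter(term for counts in doc_counts.values() for term in counts)
    idf = {term: math.log(1 + n_docs / d) for term, d in df.items()}

    def weigh(counts):
        # Terms unseen in the document collection carry no weight.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    query = weigh(Counter(tokenize(code)))
    scores = {name: cosine(query, weigh(c)) for name, c in doc_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_documents` with a class declaration and a few manual pages returns the pages ordered by score; the top-ranked pages are candidate traceability links that a human would still need to confirm.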
Convergence-Zone Episodic Memory: Analysis and Simulations
- Neural Networks, 1997
Cited by 18 (0 self)
Human episodic memory provides a seemingly unlimited storage for everyday experiences, and a retrieval system that allows us to access the experiences with partial activation of their components. The system is believed to consist of a fast, temporary storage in the hippocampus, and a slow, long-term storage within the neocortex. This paper presents a neural network model of the hippocampal episodic memory inspired by Damasio's idea of Convergence Zones. The model consists of a layer of perceptual feature maps and a binding layer. A perceptual feature pattern is coarse coded in the binding layer, and stored on the weights between layers. A partial activation of the stored features activates the binding pattern, which in turn reactivates the entire stored pattern. For many configurations of the model, a theoretical lower bound for the memory capacity can be derived, and it can be an order of magnitude or higher than the number of all units in the model, and several orders of magnitude higher than the number of binding-layer units. Computational simulations further indicate that the average capacity is an order of magnitude larger than the theoretical lower bound, and making the connectivity between layers sparser causes an even further increase in capacity. Simulations also show that if more descriptive binding patterns are used, the errors tend to be more plausible (patterns are confused with other similar patterns), with a slight cost in capacity. The convergence-zone episodic memory therefore accounts for the immediate storage and associative retrieval capability and large capacity of the hippocampal memory, and shows why the memory encoding areas can be much smaller than the perceptual maps, consist of rather coarse computational units, and be only sparsely connected t...
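The store-and-retrieve cycle described in this abstract can be sketched roughly as follows. This is a toy reading of the abstract, not the paper's model: binary connections, one active unit per feature map, a random binding pattern per stored item, and the retrieval rule (intersect the binding units linked to every cued feature, then pick a winner in each remaining map) are all assumptions, as is the class name `ConvergenceZoneMemory`.

```python
import random


class ConvergenceZoneMemory:
    """Toy two-layer memory: feature maps plus a binding layer."""

    def __init__(self, n_maps, map_size, n_binding, binding_size, seed=0):
        self.n_maps = n_maps
        self.map_size = map_size
        self.n_binding = n_binding
        self.binding_size = binding_size
        self.rng = random.Random(seed)
        # connected[m][u] holds the binding units linked to unit u of map m.
        self.connected = [
            [set() for _ in range(map_size)] for _ in range(n_maps)
        ]

    def store(self, pattern, binding=None):
        """Store `pattern` (one active unit index per feature map).

        If `binding` is not given, a random subset of binding units is
        drawn to represent the pattern (the coarse code).
        """
        if binding is None:
            binding = self.rng.sample(range(self.n_binding), self.binding_size)
        for m, unit in enumerate(pattern):
            self.connected[m][unit].update(binding)

    def retrieve(self, cue):
        """Complete a partial cue given as {map index: active unit}."""
        # Binding units linked to every cued feature become active.
        active = None
        for m, unit in cue.items():
            links = self.connected[m][unit]
            active = set(links) if active is None else active & links
        active = active or set()
        # In each uncued map, the unit with the most links to the active
        # binding units wins.
        result = []
        for m in range(self.n_maps):
            if m in cue:
                result.append(cue[m])
                continue
            scores = [
                len(self.connected[m][u] & active) for u in range(self.map_size)
            ]
            result.append(scores.index(max(scores)))
        return result
```

Retrieval errors appear in this sketch when binding patterns of different stored items overlap heavily, which is one way to see why capacity depends on the size and sparsity of the binding layer.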
Representation through Legislative Redistricting: A Stochastic Model
- American Journal of Political Science, 1989
Cited by 12 (4 self)
This paper builds a stochastic model of the processes that give rise to observed patterns of representation and bias in congressional and state legislative elections. The analysis demonstrates that partisan swing and incumbency voting, concepts from the congressional elections literature, have determinate effects on representation and bias, concepts from the redistricting literature. The model shows precisely how incumbency and increased variability of partisan swing reduce the responsiveness of the electoral system and how partisan swing affects whether the system is biased toward one party or the other. Incumbency, and other causes of unresponsive representation, also reduce the effect of partisan swing on current levels of partisan bias. By relaxing the restrictive portions of the widely applied "uniform partisan swing" assumption, the theoretical analysis leads directly to an empirical model that makes it possible to estimate responsiveness and bias more reliably from a single year of electoral data. Applying this to data from seven elections in each of six states, the paper demonstrates that redistricting has effects in predicted directions in the short run: partisan gerrymandering biases the system in favor of the party in control and, by freeing up seats held by opposition party incumbents, increases the system's responsiveness. Bipartisan-controlled redistricting appears to reduce bias somewhat and to reduce responsiveness dramatically. Nonpartisan redistricting processes substantially increase responsiveness but do not have as clear an effect on bias. However, after only two elections, prima facie evidence for redistricting effects evaporates in most states. Finally, across every state and type of redistricting process, responsiveness declined significantly over the course of the decade. This is clear evidence that the phenomenon of "vanishing marginals," recognized first in the U.S. Congress literature, also applies to these different types of state legislative assemblies. It also strongly suggests that redistricting could not account for this pattern.
An approach to classify software maintenance requests
- In Proc., International Conference on Software Maintenance (ICSM), 2002
Cited by 11 (1 self)
When a software system critical to an organization exhibits a problem during operation, the problem must be fixed quickly to avoid serious economic losses. The problem is therefore reported to the organization in charge of maintenance, and it should be dispatched correctly and quickly to the right maintenance team. We propose to automatically classify incoming maintenance requests (also called tickets), routing them to specialized maintenance teams. The final goal is to develop a router, working around the clock, that, without human intervention, dispatches incoming tickets with the lowest misclassification error, measured with respect to a given routing policy. 6000 maintenance tickets from a large, multi-site software system, spanning about two years of in-field operation, were used to compare and assess the accuracy of different classification approaches (i.e., Vector Space model, Bayesian model, support vectors, classification trees, and k-nearest neighbor classification). The application and the tickets were divided into eight areas and pre-classified by human experts. Preliminary results were encouraging: up to 84% of the incoming tickets were correctly classified.
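One of the approaches listed above, a Bayesian classifier, could be sketched as a multinomial naive Bayes router. The abstract does not say which Bayesian variant was used, so the add-one smoothing, whitespace tokenization, and the names `NaiveBayesRouter` and `misclassification_rate` are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRouter:
    """Route ticket text to the most probable maintenance team."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # team -> word frequencies
        self.ticket_counts = Counter()           # team -> tickets seen
        self.vocabulary = set()

    def train(self, text, team):
        """Add one pre-classified ticket to the training data."""
        words = text.lower().split()
        self.word_counts[team].update(words)
        self.ticket_counts[team] += 1
        self.vocabulary.update(words)

    def route(self, text):
        """Return the team with the highest log posterior for `text`."""
        words = text.lower().split()
        total = sum(self.ticket_counts.values())
        best_team, best_score = None, -math.inf
        for team, n_tickets in self.ticket_counts.items():
            counts = self.word_counts[team]
            # Add-one (Laplace) smoothing over the shared vocabulary.
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_tickets / total)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_team, best_score = team, score
        return best_team


def misclassification_rate(router, labelled_tickets):
    """Fraction of (text, team) pairs that the router sends elsewhere."""
    wrong = sum(1 for text, team in labelled_tickets if router.route(text) != team)
    return wrong / len(labelled_tickets)
```

In use, the router would be trained on tickets already labelled by experts and then evaluated on held-out tickets with `misclassification_rate`, which mirrors the error measure named in the abstract.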
Population Flow on Fitness Landscapes
- 1994
Cited by 6 (2 self)
Contents
1 Introduction
1.1 The goal of this thesis
1.2 The outline of the thesis
1.3 Acknowledgements
2 Fitness Landscapes
2.1 The concept of fitness
2.1.1 Fitness in biology
2.1.2 Fitness in problem solving
2.1.3 The fitness function
2.2 Fitness landscapes
2.2.1 Bit strings and Hamming distance
2.2.2 The
The motivation for hedging revisited
- Journal of Futures Markets
Cited by 4 (2 self)
This article develops an alternative view on the motivation to hedge. A conceptual model shows how hedging facilitates contract relationships between firms and can solve conflicts between firms. In this model, the contract preferences, level of power, and conflicts in contractual relationships of firms are driving the usage of futures contracts. The model shows how using futures markets can provide a jointly preferred contracting arrangement, enhancing relationships between firms. The robust nature of the conceptual model is empirically examined through a computer-guided study of various firms.
Visualizing Summary Statistics and Uncertainty
Cited by 4 (1 self)
The graphical depiction of uncertainty information is emerging as a problem of great importance. Scientific data sets are not considered complete without indications of error, accuracy, or levels of confidence. The visual portrayal of this information is a challenging task. This work takes inspiration from graphical data analysis to create visual representations that show not only the data value, but also important characteristics of the data including uncertainty. The canonical box plot is reexamined and a new hybrid summary plot is presented that incorporates a collection of descriptive statistics to highlight salient features of the data. Additionally, we present an extension of the summary plot to two dimensional distributions. Finally, a use-case of these new plots is presented, demonstrating their ability to present high-level overviews as well as detailed insight into the salient features of the underlying data distribution.
Categories and Subject Descriptors (according to ACM CCS): I.3.6 [Computer Graphics]: Methodology and Techniques
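As a rough illustration of the kind of numbers such a plot might draw on, the sketch below computes the five values of a canonical box plot together with a few moment-based statistics. The abstract does not list which descriptive statistics the summary plot uses, so the selection here (mean, standard deviation, skewness, excess kurtosis) and the function name `summarize` are assumptions.

```python
import statistics


def summarize(values):
    """Return box-plot values plus moment-based statistics for `values`.

    Needs at least two data points. Quartiles use the "inclusive" method
    of statistics.quantiles; other quartile conventions give slightly
    different results.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    n = len(data)
    if sd > 0:
        skewness = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
        kurtosis = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3.0
    else:
        # All values identical: the higher moments are undefined, report 0.
        skewness = kurtosis = 0.0
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "mean": mean,
        "std": sd,
        "skewness": skewness,
        "excess_kurtosis": kurtosis,
    }
```

A plain box plot shows only the first five entries; the remaining ones describe the shape of the distribution, which is the sort of extra information a hybrid plot could encode visually.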
Calculating Conditional Core Damage Probabilities for Nuclear Power Plant Operations
- Reliability Engineering and System Safety, 1998
Cited by 3 (3 self)
A part of managing nuclear power plant operations is the control of plant risk over time as components are taken out of service or plant upsets are caused by initiating events. Unfortunately, measuring risk over time proves to be challenging, even with modern PRAs and PRA tools. In general, the process of measuring the operational risk would satisfy three desires: (1) the measurement would provide the risk magnitude for a particular event or over a period of time; (2) the risk results could be summed for a period of time to obtain a cumulative risk profile; and (3) the measurement process would be tractable while still using the current modeling techniques and tools. This paper demonstrates the calculation of the conditional core damage probability (CCDP) for the two cases of component outages and initiating events. In addition, two potential complications were identified that must be addressed when performing a CCDP calculation. The first complication, determining the appropriate nonrecovery probabilities to be applied to an inoperable component or initiating event, addresses the possibility of the plant operators preventing damage to the plant through their actions. The second complication, adjusting common-cause probabilities specific to the plant configuration, accounts for the fact that the PRA common-cause probabilities built into the model are applicable only during nominal conditions. The examples presented in the paper illustrate the potential underestimation in CCDP when modifications to common-cause probabilities are ignored. These errors ranged from a factor of two to over a factor of six underestimation in CCDP.
Keywords: risk analysis, core damage frequency, core damage probability, risk monitor, common-cause
Measuring an IP network in situ
- 2005
Cited by 3 (0 self)
The Internet, and IP networking in general, have become vital to the scientific community and the global economy. This growth has increased the importance of measuring and monitoring the Internet to ensure that it runs smoothly and to aid the design of future protocols and networks. To simplify network growth, IP networking is designed to be decentralized. This means that each router and each network needs and has only limited information about the Internet. One disadvantage of this design is that measurement systems are required in order to determine the behavior of the Internet as a whole. This thesis explores ways to measure five different aspects of the Internet. The first aspect considered is the Internet’s topology, the inter-connectivity of the Internet. This is one of the basic questions about the Internet: what hosts are on the Internet and how are they connected? The second aspect is routing: what are the routing decisions made by routers for a particular destination? The third aspect is locating the source of a denial-of-service (DoS) attack. DoS
Feature subset selection bias for classification learning
- In Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 3 (0 self)
Feature selection is often applied to high-dimensional data prior to classification learning. Using the same training dataset in both selection and learning can result in so-called feature subset selection bias. This bias putatively can exacerbate data overfitting and negatively affect classification performance. However, in current practice separate datasets are seldom employed for selection and learning, because dividing the training data into two datasets for feature selection and classifier learning respectively reduces the amount of data that can be used in either task. This work attempts to address this dilemma. We formalize selection bias for classification learning, analyze its statistical properties, and study factors that affect selection bias, as well as how the bias impacts classification learning via various experiments. This research endeavors to illustrate and explain why the bias may not harm classification as much as it is expected to harm regression.

