R. Selby and A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743--1757, 1988.
....and their Application Classification trees have been used frequently in previous software engineering research. For example, classification trees have been used to classify modules as fault prone or not fault prone [9] and to predict components for which development effort is likely to be high [14, 20]. In our context, we use classification trees to predict whether a certain testing scenario facilitates the use of a prioritization technique by measuring program, test suite, and change characteristics that hold for that particular scenario. Classification trees can help with this for several ....
R. Selby and A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743--1757, 1988.
....model as a set of testable conditions on potential measures in order to determine feasible measures. We then provide evaluation functions for choosing among the feasible measures. Finally we apply this selection process to Selby and Porter's empirically guided software development environment [16] to show improvement on the predictive behavior of this system by selecting measures based on our criteria. 1.1 Model Overview In our model of axiomatic program complexity qualitative comparisons are made through the use of complexity rankings, which define pairwise relationships on programs. ....
....Step 2 corresponds to the selection of Metrics in GQM. Steps 3 and 4 provide the important feedback needed to improve the process. 4.1 Classification trees The following is an example of measure selection [19] for Classification Tree Analysis (CTA) based upon an earlier study by Selby and Porter [16] where complexity measures are used to construct classification trees of modules to identify critical (in terms of cost and faults) components in a software development environment. Our study consisted of the following: Environment: Sixteen software systems, ranging from 3000 to 112,000 lines of ....
[Article contains additional citation context not shown here]
R.W. Selby and A. A. Porter, "Learning from examples: Generation and evaluation of decision trees for software resource analysis," IEEE Trans. Soft. Eng., Vol. 14, No. 12, pp 1743-1757, Dec. 1988.
....class. For instance, Logistic Regression has been used in software measurement to predict whether a software module is faulty based on the values of measures collected on the module. Machine learning based techniques have been borrowed from artificial intelligence, e.g. classification trees [35]. Other machine learning techniques, e.g. Optimized Set Reduction [36], have been defined in the context of software measurement. These techniques divide a set of objects into subsets in a stepwise fashion based on the values of measures for classification purposes. The goal is to obtain subsets ....
R. W. Selby and A. A. Porter, "Learning from examples: generation and evaluation of decision trees for software resource analysis," IEEE Trans. Software Eng. 14 (1988) 1743-1757.
....application domains or processes. Rather than finding a general solution to the problem, these works aim at investigating how to find specific solutions based on the available domain knowledge. To this end, many methods have been explored, based on machine learning principles such as decision trees [SP88, PS90] or neural networks [KLP94], probabilistic approaches such as Bayesian Belief Networks [FN99], statistical techniques such as discriminant analysis [MK92] and regression [MR00], or mixed techniques such as optimized set reduction [BBT92]. Some of the proposed methods give only a discrete ....
....in the experiments described in [PCM96, FI98, KPR00] and thus it is a well recognized benchmark for scientific experiments. The dataset contains 136 modules, of which 109 do not contain faults and 27 contain at least one fault. The term modules is used to refer to subprograms as already in [FI98, SP88, PS90]. The total number of documented faults is 39. For each module in the dataset we considered 33 5 different software metrics collected with commercial and prototype tools. Appendix A lists these metrics for the sake of completeness. In the carried data analysis, we built models that ....
[Article contains additional citation context not shown here]
Richard W. Selby and Adam A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743-1757, December 1988. Special Issue on Artificial Intelligence in Software Applications.
....of DT and ANN based estimation systems is that they are adaptable and nonparametric. The result reported in [3] indicates that the improved predictive performance can be obtained through the use of Bayesian analysis. Additional research on ML based software effort estimation can be found in [2,14,15,16]. Software defect prediction Bayesian belief networks are used in [4] to predict software defects. Though the system reported is only a prototype, it shows the potential BBN has in incorporating multiple perspectives on defect prediction into a single, unified model. Variables in the prototype ....
R. Selby and A. Porter, "Learning from examples: generation and evaluation of decision trees for software resource analysis," IEEE Trans. SE, Vol. 14, 1988, pp.1743-1757.
....[3] An advantage of using Clementine is that it handles issues such as tree pruning automatically; pruning is important to prevent the rule tree over-adapting to the training set and being unable to effectively generalise to new problems. Examples of using RI to develop prediction systems include [5, 19]. Case based Reasoning: Lastly we also used our in-house developed CBR shell, ANGEL [21]. CBR is considered to be fundamentally different from rule induction and regression approaches in that it utilises specific knowledge of previously solved cases to solve future ones, while the former use ....
Selby, R.W. and A.A. Porter, 'Learning from examples: generation and evaluation of decision trees for software resource analysis', IEEE Trans. on Softw. Eng., 14(12), pp. 1743-1757, 1988.
....to contain a high density of defects. This work was supported in part by NASA grant NSG 5123. Also with The MITRE Corp. McLean, VA. Process improvement in terms of the prediction of defects in the delivered product is one area that has received a significant amount of attention recently [SP88, MK92]. Recent studies have focused on the identification of problem areas during the design phase, noting that the software architecture is a major factor in the number of errors and rework effort found in later phases [HK81, ROM87, CA88, AES90]. If such potential problem areas can be detected during ....
R. Selby and A. Porter, "Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis", IEEE Trans. Software Eng., 14 (12), December, 1988.
....characteristics that have a visible and significant impact on effectiveness. Such partition algorithms are based on the multivariate analysis of historical data describing the contexts in which the experience package was applied and the effectiveness of its use. For example, Classification Trees [SP88] or Optimized Set Reduction [BBH93, BBT92] are possible techniques. The measured effectiveness is the dependent variable and the explanatory variables of the domain prediction model are derived from the collection of project characteristics (Table 2) possibly influencing the reuse of the ....
R. Selby and A. Porter, "Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis", IEEE Trans. Software Eng., 14 (12), December, 1988.
.... Logistic regression, which has been included in empirical comparisons between models identifying high-risk components [5], [6]. Logical classification models, which have been extensively used in software engineering issues, such as the identification of high risk modules [5], [6], [23], [24], [30], or the detection of reusable software components [9]. Layered neural networks, which have been already applied in software engineering applications to build reliability growth models [15], [16], predict the gross change [18], or reusability metrics [4]. Holographic networks, a ....
R. W. Selby, and A. A. Porter, "Learning from examples: generation and evaluation of decision trees for software resource analysis", IEEE Transactions on Software Engineering, vol.14, no.12, December 1988, pp.1743-1757.
.... has been used for modeling to identify high risk components [3, 4]. Principal component analysis has often been used to improve the accuracy of discriminant models [15, 19] or regression models [3, 4, 14]. Logical classification models have been used extensively to identify high risk modules [3, 4, 20, 21, 27] and reusable software components [8]. Layered neural networks have already been applied to building reliability growth models [11, 12], to predicting the gross change [16], and the degree of reuse [2]. Holographic networks, a nonconnectionist type of neural network, have been proposed for ....
Selby, R. W., and Porter, A. A., Learning from examples: generation and evaluation of decision trees for software resource analysis, IEEE Trans. Software Eng., 14 (12), 1743-1757, December (1988).
....outliers which are also more difficult to detect in an n-dimensional space. In addition, interaction terms may be significant and will make the model even more difficult to interpret. This is in part to construct models that are easier to interpret and use that techniques such as classification trees [SP88] and Optimized Set Reduction [BBT93; J95] have been developed. Therefore, it is important to note that regression, although the most often used approach to multivariate software engineering model construction, is not the only alternative. Machine learning techniques, such as the ones mentioned ....
R. Selby and A. Porter: "Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis", IEEE Trans. Software Eng., 14 (12), December, 1988.
....due to outliers which are more difficult to detect in this context. In addition, interaction terms may be significant and will make the model even more difficult to interpret. This is in part to construct models that are easier to interpret and use that techniques such as classification trees [SP88] and Optimized Set Reduction [BBT93; J95] have been developed. Therefore, it is important to note that regression, although the most often used approach to multivariate software engineering model construction, is not the only alternative. Machine learning techniques, such as the ones mentioned ....
R. Selby and A. Porter: "Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis", IEEE Trans. Software Eng., 14 (12), December, 1988.
....technique based on maximum likelihood estimation, to analyze the relationships between measures and the fault proneness of classes. Logistic regression has already been used in several instances to predict error prone components [2], [4]. Other classification techniques such as classification trees [17], Optimized Set Reduction [5] or neural networks [15] could have been used. However, our goal here is not to compare multivariate analysis techniques but, based on a suitable and standard technique, to validate empirically a set of product measures. We first used univariate logistic regression, ....
R. Selby and A. Porter. "Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis.", IEEE TSE, 14(12): 1743-1757. 1988.
....distinct from the duration of the project. Such predictive models are usually developed using linear regression analysis on available historical data for sufficiently similar projects, although there has been increasing use of other techniques, most notably fuzzy logic models [5], regression trees [6], neural networks [7] and case based reasoning [8]. A useful summary of these techniques and their application to software metric modeling can be found in [9]. Predictions can then be made for new system development projects using the resulting model as the independent variables are estimated or ....
R.W. Selby and A.A. Porter. Learning from examples: generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14:1743-1757, 1988.
....of these target classes. Classification trees are automatically generated using data from previous releases and projects. Generation of classification trees is based on a recursive algorithm that selects metrics that best differentiate between modules within a target class and those outside of it [SP88]. A developer wishing to focus resources on high payoff areas might use several classification trees to support his or her analysis process. Figure 2 provides an overview of the methodology for generating and using classification trees. The metric based classification tree approach has several ....
....being limited to small scale applications. We have developed a methodology for integrating automatic generation of metric-based classification trees into large scale software development and evolution. This methodology is based on lessons learned in two validation studies using data from NASA [SP88] and Hughes [SP89]. A detailed discussion of this methodology appears in the following section. 3 Empirically Based Classification Methodology An overview of the classification methodology appears in Figure 2. The three central activities in the methodology are: i) data management and ....
[Article contains additional citation context not shown here]
Richard W. Selby and Adam A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Trans. Software Engr., 14(12):1743--1757, December 1988.
....software components according to their likelihood of having user specified properties such as high fault proneness or high development cost. We conducted a feasibility study using 16 NASA projects (3000 to 112,000 lines) to validate an initial version of a classification tree generation algorithm [SP88]. On the average, the classification trees correctly identified 79.3 percent of the software components according to whether or not they were fault prone or effort prone. In [PS90] we outlined a methodology for using classification trees on large scale systems and described examples of how they ....
....subsequence (LWS k) analysis, LWS 3, LWS 5, and LWS 8, where k is the upper bound on the number of partitions. For description of the complete tree generation algorithm, example trees, discussion of our validation results, and explanation of the classification methodology and its application, see [SP88] and [PS90]. Section 2 gives an overview of classification trees. Section 3 describes different methods for partitioning metric data values. Section 4 summarizes the NASA systems that are analyzed in this study. Section 5 outlines the comparative study undertaken to evaluate the partition ....
[Article contains additional citation context not shown here]
R. W. Selby and A. A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Trans. Software Engr., 14(12):1743--1757, December 1988.
No context found.
R. Selby and A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743--1757, 1988.
No context found.
Richard W. Selby and Adam A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743--1757, December 1988.
No context found.
R.W. Selby and A.A. Porter. Learning from examples: generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14:1743-1757, 1988.
No context found.
Richard W. Selby and Adam A. Porter. Learning from examples: Generation and evaluation of decision trees for software resource analysis. IEEE Transactions on Software Engineering, 14(12):1743--1757, December 1988.