Int. J. Data Analysis Techniques and Strategies, Vol. 2, No. 3, 2010 243

Applications of data mining in software engineering

Quinn Taylor*

Department of Computer Science,

Brigham Young University,

Provo, UT 84602, USA

E-mail: quinntaylor@byu.edu

*Corresponding author

Christophe Giraud-Carrier

Department of Computer Science,

Brigham Young University,

Provo, UT 84602, USA

E-mail: cgc@cs.byu.edu

Abstract: Software engineering processes are complex, and the related activities often produce a large number and variety of artefacts, making them well-suited to data mining. Recent years have seen an increase in the use of data mining techniques on such artefacts with the goal of analysing and improving software processes for a given organisation or project. After a brief survey of current uses, we offer insight into how data mining can make a significant contribution to the success of current software engineering efforts.

Keywords: data mining; software engineering; applications.

Reference to this paper should be made as follows: Taylor, Q. and Giraud-Carrier, C. (2010) ‘Applications of data mining in software engineering’, Int. J. Data Analysis Techniques and Strategies, Vol. 2, No. 3, pp.243–257.

Biographical notes: Quinn Taylor is a student in the MS degree programme in Computer Science at Brigham Young University and a Researcher in the SEQuOIA Lab where he focuses on understanding and visualising software structure and development processes, including through the use of data mining techniques. His research interests include software architectures, software evolution, code maintenance and decay, software reverse engineering and refactoring.

Christophe Giraud-Carrier is an Associate Professor and the Director of the Data Mining Laboratory in the Department of Computer Science at Brigham Young University. His research interests include metalearning, social network analysis, medical informatics and applications of data mining. He received his BS, MS and PhD in Computer Science at BYU in 1991, 1993 and 1994, respectively.

Copyright © 2010 Inderscience Enterprises Ltd.

1 Introduction

Software systems are inherently complex and difficult to conceptualise. This complexity, compounded by intricate dependencies and disparate programming paradigms, slows development and maintenance activities, leads to faults and defects and ultimately increases the cost of software. Most software development organisations develop some sort of processes to manage software development activities. However, as in most other areas of business, software processes are often based only on hunches or anecdotal experience, rather than on empirical data.

Consequently, many organisations are ‘flying blind’ without fully understanding the impact of their process on the quality of the software that they produce. This is generally not due to apathy about quality, but rather to the difficulty inherent in discovery and measurement. Software quality is not simply a function of lines of code, bug count, number of developers, man-hours, money or previous experience – although it involves all those things – and it is never the same for any two organisations.

Software metrics have long been a standard tool for assessing the quality of software systems and the processes that produce them. However, there are pitfalls associated with the use of metrics. Managers often rely on metrics that they can easily obtain and understand, which may be worse than using no metrics at all. Metrics can seem interesting, yet be uninformative, irrelevant, invalid or not actionable. Truly valuable metrics may be unavailable or difficult to obtain. Metrics can be difficult to conceptualise and changes in metrics can appear unrelated to changes in process.

Alternatively, software engineering activities generate a vast amount of data that, if harnessed properly through data mining techniques, can help provide insight into many parts of software development processes. Although many processes are domain- and organisation-specific, there are many common tasks which can benefit from such insight, and many common types of data which can be mined. Our purpose here is to bring software engineering to the attention of our community as an attractive testbed for data mining applications and to show how data mining can significantly contribute to software engineering research.

The paper is organised as follows. In Section 2, we briefly discuss related work, pointing to surveys and venues dedicated to recent applications of data mining to software engineering. Section 3 describes the sources of software data available for mining and Section 4 provides a brief, but broad, survey of current practices in this domain. Section 5 discusses issues specific to mining software engineering data and prerequisites for success. Finally, Section 6 concludes the paper.

2 Related work

Although the application of data mining to software engineering artefacts is relatively new, there are specific venues in which related papers are published and authors that have created resources similar to this survey.

Perhaps the earliest survey of the use of data mining in software engineering is the 1999 Data and Analysis Center for Software (DACS) state-of-the-art report (Mendonca and Sunderhaft, 1999). It consists of a thorough survey of data mining techniques, with emphasis on applications to software engineering, including a list of 55 data mining products with detailed descriptions of each product and summary information along a number of technical as well as process-dependent features.

Since then, and over the years, Xie (2010) has been compiling and maintaining an (almost exhaustive) online bibliography on mining software engineering data. He also presented tutorials on that subject at the International Conference on Knowledge Discovery in Databases in 2006 and at the International Conference on Software Engineering in 2007, 2008 and 2009 (e.g., see Xie et al., 2007). Many of the publications we cite here are also included in Xie’s bibliography and tutorials.

The Mining Software Repositories (MSR) Workshop, co-located with the International Conference on Software Engineering, was originally established in 2004. Papers published in MSR focus on many of the same issues we have discussed in this survey and the goal of the workshops is to increase understanding of software development practices through data mining. Beyond tools and applications, topics include assessment of mining quality, models and meta-models, exchange formats, replicability and reusability, data integration and visualisation techniques.

Finally, Kagdi et al. (2007) have recently published a comprehensive survey of approaches for MSR in the context of software evolution. Although their survey is narrower in scope than the overview given here, it has greater depth of analysis, presents a detailed taxonomy of software evolution data mining methodologies and identifies a number of related research issues that require further investigation.

3 Software engineering data

The first step in the knowledge discovery process is to gain understanding about the data that is available and the business goals that drive the process. This is essential for software engineering data mining endeavours, because unavailability of data for mining is a factor that limits the questions which can be effectively answered.

In this section, we describe software engineering data that are available for data mining and analysis. Current software development processes involve several types of resources from which software-related artefacts can be obtained. Software ‘artefacts’ are a product of software development processes. Artefacts are generally lossy and thus cannot provide a full history or context, but they can help piece together understanding and provide further insight. There are many data sources in software engineering. In this paper, we focus only on four major groups and describe how they may be used for mining software engineering data.

First, the vast majority of collaborative software development organisations utilise revision control software (e.g., CVS, Subversion or Git) to manage the ongoing development of digital assets that may be worked on by a team of people. Such systems maintain a historical record of each revision and allow users to access and revert to previous versions. By extension, this provides a way to analyse historical artefacts produced during software development, such as the number of lines written, the authors who wrote particular lines or any number of common software metrics.
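As an illustration of the kind of historical data a revision system exposes, the following minimal sketch tallies commits and added lines per author, and changes per file. It assumes a log shaped like the output of `git log --numstat --pretty=format:"@%an"` (an author line prefixed with "@", followed by tab-separated added/deleted/path lines); the sample text, the "@" prefix and the function name `summarise_history` are illustrative assumptions, not part of any tool described in this paper.

```python
from collections import defaultdict

# Hypothetical sample, shaped like the output of
#   git log --numstat --pretty=format:"@%an"
# Author lines start with "@"; other lines are "added<TAB>deleted<TAB>path".
SAMPLE_LOG = """@Alice
3\t1\tsrc/parser.c
10\t0\tsrc/lexer.c

@Bob
2\t2\tsrc/parser.c

@Alice
-\t-\tdocs/logo.png
5\t4\tsrc/parser.c
"""


def summarise_history(log_text):
    """Return (commits per author, lines added per author, changes per file)."""
    commits = defaultdict(int)
    lines_added = defaultdict(int)
    file_changes = defaultdict(int)
    author = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator between commits
        if line.startswith("@"):
            author = line[1:]
            commits[author] += 1
            continue
        added, _deleted, path = line.split("\t", 2)
        file_changes[path] += 1
        if added != "-":  # binary files report "-" instead of a line count
            lines_added[author] += int(added)
    return dict(commits), dict(lines_added), dict(file_changes)


commits, lines_added, file_changes = summarise_history(SAMPLE_LOG)
```

Files that change unusually often (here `src/parser.c`) are a natural starting point for the more elaborate analyses surveyed later.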

Second, most large organisations (and many smaller ones) also use a system for tracking software defects. Bug tracking software (such as Bugzilla, JIRA or FogBugz) associates bugs with meta-information (status, assignee, comments, dates, milestones, etc.) that can be mined to discover patterns in software development processes, including the time-to-fix, defect-prone components and problematic authors. Some bug trackers are able to correlate defects with source code in a revision system.
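A minimal sketch of two such measurements, time-to-fix and defect counts per component, is given below. The record layout (component, date opened, date closed) and the sample values are hypothetical; a real tracker would need an export step that is not shown here.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical bug records: (component, opened, closed); closed is None while open.
BUGS = [
    ("parser", date(2010, 1, 4), date(2010, 1, 9)),
    ("parser", date(2010, 1, 10), date(2010, 1, 11)),
    ("ui", date(2010, 2, 1), date(2010, 2, 15)),
    ("parser", date(2010, 2, 3), None),
]


def time_to_fix_by_component(bugs):
    """Median number of days between opening and closing, per component."""
    days = defaultdict(list)
    for component, opened, closed in bugs:
        if closed is not None:  # still-open bugs have no time-to-fix yet
            days[component].append((closed - opened).days)
    return {component: median(values) for component, values in days.items()}


def bug_counts(bugs):
    """Number of reported bugs per component, open or closed."""
    counts = defaultdict(int)
    for component, _opened, _closed in bugs:
        counts[component] += 1
    return dict(counts)


fix_times = time_to_fix_by_component(BUGS)
counts = bug_counts(BUGS)
```

The median is used rather than the mean because a few long-lived bugs can otherwise dominate the figure; this is a design choice of the sketch, not a recommendation drawn from the surveyed work.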

Third, virtually all software development teams use some form of electronic communication (e-mail, instant messaging, etc.) as part of collaborative development (communication in small teams may be primarily or exclusively verbal, but such cases are inconsequential from a data mining perspective). Text mining techniques can be applied to archives of such communication to gain insight into development processes, bugs and design decisions.

Fourth, software documentation and knowledge bases can be mined to provide further insight into software development processes. This approach is useful to organisations that use the same processes across multiple projects and want to examine a process in terms of overall effectiveness or fitness for a given project. Although knowledge bases may contain source code, this approach focuses primarily on retrieval of information from natural languages.

4 Mining software engineering data: a brief survey

In this section, we give a technique-oriented overview of how traditional data mining techniques have been applied in the context of software engineering, followed by a more task-oriented view in which we show how software tasks in three broad groups can benefit from data mining.

4.1 Data mining techniques in software engineering

In this section, we discuss several data mining techniques and provide examples of ways they have been applied to software engineering data. Many of these techniques may be applied to software process improvement. We attempt to emphasise innovative and promising approaches and how they can benefit software organisations.

4.1.1 Association rules and frequent patterns

Zimmermann et al. (2005) have developed the Reengineering of Software Evolution (ROSE) tool to help guide programmers in performing maintenance tasks. The goals of ROSE are to:

1 suggest and predict likely changes
2 prevent errors due to incomplete changes
3 detect coupling undetectable by program analysis.

Similar to Amazon’s system for recommending related items, they aim to provide guidance akin to “programmers who changed these functions also changed...”. They use association rules to distinguish between change types in CVS and try to predict the most likely classification of a change-in-progress.
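The general idea of recommending co-changes from association rules can be sketched as follows. This is not the ROSE implementation: the transactions, thresholds and function names are hypothetical, and only pairwise rules of the form "changed A, so also consider B" are mined, each with its support (number of joint changes) and confidence (joint changes divided by changes to A).

```python
from collections import Counter
from itertools import permutations

# Hypothetical change transactions: entities modified together in one commit.
TRANSACTIONS = [
    {"parse()", "lex()", "ast.h"},
    {"parse()", "lex()"},
    {"parse()", "ast.h"},
    {"render()"},
    {"parse()", "lex()", "render()"},
]


def co_change_rules(transactions, min_support=2, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'changed a => also change b'."""
    item_count = Counter()
    pair_count = Counter()
    for changed in transactions:
        item_count.update(changed)
        # Ordered pairs, so that a => b and b => a are scored separately.
        pair_count.update(permutations(sorted(changed), 2))
    rules = []
    for (a, b), support in pair_count.items():
        confidence = support / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken by name for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


def suggest(rules, changed):
    """Entities that the mined rules recommend when `changed` is modified."""
    return [b for a, b, _support, _confidence in rules if a == changed]


rules = co_change_rules(TRANSACTIONS)
```

With this sample, `suggest(rules, "lex()")` recommends `parse()`, while the reverse rule has lower confidence because `parse()` also changes without `lex()`.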

Livshits and Zimmermann (2005) collaborated to create DynaMine, an automated tool that analyses code check-ins to discover application-specific coding patterns and identify violations which are likely to be errors. Their approach is based on the classic Apriori algorithm, combined with pattern categorisation and dynamic analysis. Their tool has been able to detect previously unseen patterns and several pattern violations in studies of the Eclipse and jEdit projects.
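For readers unfamiliar with Apriori, the sketch below shows its level-wise join-and-prune structure on a toy data set, followed by a simple check for partial occurrences of a frequent pattern. It assumes each check-in has already been reduced to a set of items (here, hypothetical method-call names); DynaMine's pattern categorisation and dynamic analysis are not modelled, and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical data: each set lists the method calls added in one check-in.
CHECK_INS = [
    {"lock", "unlock", "log"},
    {"lock", "unlock"},
    {"open", "close", "log"},
    {"open", "close"},
    {"lock", "unlock", "open", "close"},
    {"lock", "log"},
]


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all itemsets meeting min_support."""
    def support(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    level = support({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = support(candidates)
        frequent.update(level)
        k += 1
    return frequent


def partial_matches(transactions, pattern):
    """Check-ins that contain some, but not all, items of a frequent pattern."""
    return [t for t in transactions if t & pattern and not pattern <= t]


frequent = apriori(CHECK_INS, min_support=3)
suspects = partial_matches(CHECK_INS, frozenset({"lock", "unlock"}))
```

In this sample, `{"lock", "unlock"}` is frequent, and the one check-in that adds `lock` without `unlock` is flagged as a candidate violation to be inspected, not as a confirmed error.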
