In computer science, software engineering is concerned with the design and analysis of projects for computers and other electronic devices. In a software development process, the work is divided into distinct stages, each with specific tasks, with the aim of improving planning and management. The most commonly used methodologies include waterfall, prototyping, iterative and incremental development, spiral development, rapid application development (RAD), extreme programming and various kinds of agile methodology. While a software development life cycle (SDLC) 'model' is a more general term for a category of methodologies, a software development 'process' usually refers to the particular process chosen by a particular organization. A variety of such frameworks have evolved over the years, each with its own recognized strengths and weaknesses. No single software development methodology is suitable for every project. Each of the available frameworks is best suited to particular kinds of project, depending on various technical, organizational, project and team considerations. These contrasting development paradigms, and the complicated dependencies they create, increase the complexity of software systems. This slows down development and maintenance, causes faults and defects, and eventually increases the cost of the software.
Organizations frequently fail to understand how their process affects the quality of the software they produce. This is mostly because of the difficulty inherent in discovery and measurement. Although software metrics have long been the de facto standard for assessing software quality and development processes, they have many drawbacks. Over-reliance on metrics that are easy to obtain and understand, the use of metrics that seem interesting yet remain insignificant and uninformative, and the difficulty of obtaining meaningful metrics are just a few of them.
Data mining is defined as the process of discovering previously unknown and potentially useful information from data collections. Consequently, the use of data mining in software engineering with the aim of improving software has attracted the interest of researchers around the world.
Data Mining Techniques
Frequent Pattern Mining and Association Rules
Association rules are used to uncover interesting relations between variables in large datasets. These relations are expressed in the form A→B, where A and B are variables in the given dataset. In software engineering, they can be used to find patterns that lead to severe defects. More generally, the discovery of such patterns supports decision-making tasks such as cross-marketing, catalog design and loss-leader analysis. Algorithms such as Apriori and FP-growth are commonly used for this purpose in the software engineering context.
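As a minimal sketch of how a rule A→B is scored, the following Python computes support and confidence over a handful of made-up change sets, where each set lists the files touched by one commit. The file names, the data, the thresholds and the helper names are all assumptions for illustration. This brute-force enumeration of pairs is not Apriori or FP-growth; those algorithms prune the search so that it scales to real repositories.

```python
from itertools import combinations

# Hypothetical transactions: files changed together in each commit.
commits = [
    {"parser.c", "lexer.c"},
    {"parser.c", "lexer.c", "ast.c"},
    {"parser.c", "ast.c"},
    {"lexer.c"},
    {"parser.c", "lexer.c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule A -> B."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Enumerate single-item rules A -> B that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

for lhs, rhs, s, c in pair_rules(commits):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, a rule such as ast.c -> parser.c would suggest that a change to one file tends to be accompanied by a change to the other, which is the kind of co-change pattern a defect study might then examine.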
Classification
One of the most widely used data mining techniques, classification finds applications in statistics, pattern recognition and machine learning. Classification is the process of building a model with which future data items can be assigned to a set of predefined classes. Clustering, by contrast, does not rely on predefined classes or models. Classification algorithms such as decision trees, Bayesian classifiers and k-nearest neighbor can perform this task effectively. For numerical data, regression models such as linear regression, non-linear regression and logistic regression can be used.
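To make the idea of assigning new items to predefined classes concrete, here is a small k-nearest-neighbor sketch in plain Python. The two features (lines of code and a complexity score), the labels and the training values are invented for illustration, and the features are left unscaled for brevity; a real study would normalize them and use a tested library.

```python
from collections import Counter
import math

# Hypothetical training data: (lines of code, complexity score) -> label.
training = [
    ((120, 4), "clean"),
    ((150, 5), "clean"),
    ((200, 6), "clean"),
    ((900, 30), "defect-prone"),
    ((1100, 35), "defect-prone"),
    ((1000, 28), "defect-prone"),
]

def knn_predict(sample, labelled, k=3):
    """Label sample by majority vote among its k nearest labelled points."""
    by_distance = sorted(labelled, key=lambda pair: math.dist(sample, pair[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((950, 31), training))
print(knn_predict((130, 3), training))
```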
Text Mining
Text mining refers to the process of deriving information from textual data. Around 80% of all software engineering data is in text format. Text mining can therefore be used effectively to trace requirements and to identify and predict software failures and code duplication. It can be combined with natural language processing to avoid duplicate bug reports. Text mining mainly involves transforming raw textual data into a structured representation. It is essential that the raw data go through a preprocessing stage before text mining techniques are applied.
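The preprocessing step and the duplicate-report use case can be sketched as follows. The stopword list, the sample reports and the function names are assumptions for illustration, and the similarity measure here is plain term-frequency cosine similarity without stemming, which is only one of several possible choices.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "when", "on", "in", "to", "of", "and"}

def preprocess(text):
    """Lowercase, tokenize on letters and digits, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical bug report summaries.
report_1 = preprocess("Application crashes when saving a file")
report_2 = preprocess("Crash on saving the file in the editor")
report_3 = preprocess("Login page shows wrong password error")

print(cosine(report_1, report_2))  # shares "saving" and "file"
print(cosine(report_1, report_3))  # no shared terms
```

Note that "crashes" and "crash" are treated as different terms here; adding a stemming step to the preprocessing would raise the similarity of the first two reports.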
Data Summarization
From another perspective, text mining has been used in software engineering to validate the data from mailing lists, CVS logs and change log files of open-source software. One such effort produced a set of tools, named SoftChange, that performs data validation on these text sources. The tools retrieve, summarize and validate these kinds of data for open-source projects. Part of the analysis can identify the most active developers of an open-source project. The statistics and knowledge gathered by SoftChange analysis have not yet been fully exploited, however: further predictive techniques could be applied to fragments of code that may change in the future, as could associative analysis between the importance of changes and the individuals who made them (for example, was every change committed by the most active developer as important as the rest, in scale and in practice?).
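A summary of the kind described, identifying the most active developers from change-log data, might look like the sketch below. The record layout, author names and numbers are invented, and this is not how SoftChange itself is implemented.

```python
from collections import Counter

# Hypothetical change-log records: (author, file, lines changed).
change_log = [
    ("alice", "core/io.c", 120),
    ("bob", "core/io.c", 15),
    ("alice", "ui/menu.c", 40),
    ("carol", "docs/README", 5),
    ("alice", "core/net.c", 60),
    ("bob", "ui/menu.c", 30),
]

def summarize(records):
    """Return per-author commit counts and total lines changed."""
    commits = Counter()
    lines = Counter()
    for author, _path, changed in records:
        commits[author] += 1
        lines[author] += changed
    return commits, lines

commits, lines = summarize(change_log)
for author, n in commits.most_common():
    print(f"{author}: {n} commits, {lines[author]} lines changed")
```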
Clustering
Unlike classification, clustering does not depend on predefined classes. It is therefore referred to as an unsupervised learning process. It partitions a given data set into groups, or clusters, such that inter-cluster distance is maximized and intra-cluster distance is minimized. The entities to be clustered must be identified and their attributes selected before the clustering algorithm is applied. Clustering algorithms such as graph-theoretical algorithms, construction algorithms, optimization algorithms and hierarchical algorithms can be used for software engineering data. Clustering high-dimensional data is generally difficult. To handle this situation, highly specialized algorithms such as CLIQUE can be used.
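The partitioning idea can be illustrated with a plain k-means loop, an optimization-style algorithm that is used here only as an example and is not one the text singles out. The module metrics and starting centroids are assumptions, and the starting centroids are fixed so that the run is deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical module metrics: (number of methods, number of dependencies).
modules = [(2, 1), (3, 2), (2, 2), (20, 15), (22, 14), (21, 16)]
clusters, centroids = kmeans(modules, centroids=[modules[0], modules[3]])

for members, center in zip(clusters, centroids):
    print(center, members)
```

No class labels are supplied anywhere in this sketch; the two groups emerge purely from the distances between the points, which is what distinguishes clustering from classification.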
Software Engineering Data to Be Mined
This section discusses the different kinds of software engineering data that can be mined and analyzed. The nature of the data generated in software engineering determines which data mining techniques can usefully be applied to gain valuable knowledge. The following kinds of data are produced in software engineering.
Documentation
Although software documentation data is highly significant, its complexity is also relatively high. It includes application, system administration and source code documentation, which consists mainly of free text in natural language. Among these text data, the software description, start-up and usage configuration, user guide, file management issues, logging, and license and compatibility issues can be considered of great value for data mining. An analytical reference covering every possible kind of software documentation data is available in the literature. Software documentation may also contain multimedia data in the form of figures and audio or video instructions. Multimedia mining techniques can be used to mine such data efficiently.
Software Configuration Management Data
Data produced by software configuration management systems includes software code, documents, design models, status accounting, defect tracking and revision control data. Software development organizations use revision control software to manage the ongoing development of digital assets that may be worked on by a team of people. Such systems maintain a historical record of every revision and allow users to access and revert to earlier versions. We can therefore analyze the historical data produced during software development. This includes details such as a number of common software metrics, the number of lines that have been written and the authors who wrote specific lines.
Source Code
Source code can be used by data mining applications to support software maintenance, program comprehension and the analysis of software components. The source code must first be parsed; once parsed, it becomes structured text. Key applications of data mining techniques to source code include, among others, predicting future changes by mining change history, predicting change propagation, predicting faults from cached history, and predicting defect densities in source code files.
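The parsing step can be sketched with Python's standard ast module, which turns source text into a tree that later mining steps can query. Using Python as the language being analyzed is an assumption made for this example, and the two "metrics" computed here (statement count and branch count per function) are deliberately simple stand-ins for whatever features a real study would extract.

```python
import ast

# A small piece of source code to analyze (hypothetical).
SOURCE = '''
def load(path):
    with open(path) as handle:
        return handle.read()

def parse(text):
    if not text:
        return []
    return [line.split(",") for line in text.splitlines() if line]
'''

def function_metrics(source):
    """Parse source and return {function name: (statement count, branch count)}."""
    tree = ast.parse(source)
    metrics = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Count statements inside the function, excluding the def itself.
            statements = sum(isinstance(n, ast.stmt) for n in ast.walk(node)) - 1
            branches = sum(isinstance(n, (ast.If, ast.For, ast.While))
                           for n in ast.walk(node))
            metrics[node.name] = (statements, branches)
    return metrics

print(function_metrics(SOURCE))
```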
Compiled Code and Execution Traces
Compiled code, in the form of object code, is one of the alternative data sources for applying static analysis in software engineering. Compiled code has also been used as a data source by data mining techniques to help detect malicious software. Moreover, web mining principles have been widely applied to object-oriented executables to support program comprehension for the purposes of reverse engineering. When software modules and their components are tested, a chain of events occurs and is recorded in an execution trace. Execution pattern mining has also been applied to execution traces, under the framework of dynamic analysis, to help extract a software system's functionalities.
Mailing Lists
Mailing lists are commonly provided by large software systems as a means of connecting users and developers in a collaborative environment. Mailing lists consist largely of free text. It is relatively easy to extract message and author graphs from the data, but content analysis can be considerably harder, since interpreting a message, including replies, requires taking into account earlier discussions on the list. The main data mining applications for mailing lists are text analysis, text clustering of the topics discussed, and linguistic analysis of messages to highlight developers' personalities and profiles. Text mining techniques can be applied to archives of such communication to gain insight into development processes, bugs and design decisions.
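Extracting an author graph, the comparatively easy step mentioned above, can be sketched as follows. The message tuples are invented; a real archive would first be read with a mail parser to recover the message id, sender and reply-to headers.

```python
from collections import Counter

# Hypothetical messages: (message id, author, id of the message replied to, or None).
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "bob", "m4"),
]

def reply_graph(msgs):
    """Return weighted edges (replier, original author) -> number of replies."""
    author_of = {mid: author for mid, author, _ in msgs}
    edges = Counter()
    for _mid, author, parent in msgs:
        if parent in author_of:
            edges[(author, author_of[parent])] += 1
    return edges

for (replier, original), n in reply_graph(messages).items():
    print(f"{replier} replied to {original} {n} time(s)")
```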
Bug Finding and Issue Tracking
An issue-tracking database mainly consists of three kinds of information: 1) structured information (database tuples) containing the description of an issue; 2) the reporter's details; 3) the date or time. Most software development organizations use a system for tracking software defects. Bug tracking software associates bugs with metadata (status, assignee, comments, dates, milestones and so on) that can be mined to find patterns in software development processes, including time to fix, defect-prone components and problematic authors. Some bug trackers can link defects to source code in a revision control system. Techniques involving machine learning have been used to predict the correct assignment of developers to bugs, to clean the database of multiple reports of the same error, and even to predict the software modules that are affected at the same time by reported bugs.
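Two of the patterns mentioned, time to fix and defect-prone components, can be derived from bug metadata with a few lines of Python. The records below are made up, and the field layout is an assumption rather than the schema of any particular bug tracker.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical bug records: (component, date opened, date closed).
bugs = [
    ("parser", date(2024, 1, 3), date(2024, 1, 10)),
    ("parser", date(2024, 1, 5), date(2024, 1, 8)),
    ("ui", date(2024, 2, 1), date(2024, 2, 2)),
    ("parser", date(2024, 2, 10), date(2024, 2, 15)),
]

def fix_time_by_component(records):
    """Return {component: (number of bugs, mean days to fix)}."""
    days = defaultdict(list)
    for component, opened, closed in records:
        days[component].append((closed - opened).days)
    return {c: (len(d), mean(d)) for c, d in days.items()}

for component, (count, avg_days) in fix_time_by_component(bugs).items():
    print(f"{component}: {count} bugs, mean time to fix {avg_days} days")
```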
Role of Data Mining in Improving the Development Process
In the requirements elicitation stage, the requirements document gives a full description of all the software and hardware requirements for the project. Because this document is highly detailed and descriptive, it takes considerable time to summarize the requirements so that they are available at the appropriate time. Managing time and resource availability is necessary for all the subsequent stages to function. Data mining techniques such as classification could be applied to this data to classify and prioritize the requirements so that every resource required at each phase is available on schedule. Text mining can be used to summarize the large volume of data. This reduces the person-hours spent on summarizing and prioritizing the requirements, thereby saving time, cost and human resources.
In the design stage, while laying out the architecture and planning the database structure, it becomes critical to know which data will be required where and when. Data mining techniques such as clustering can group similar data together so that extracting the data is easier. Data gathering becomes a chore, particularly when the data must be pre-processed over and over again. By applying clustering to data components, the information can be separated according to its similarity or dissimilarity. Labeling data arriving from any source would also be much simpler with clustering.
During implementation, independent pieces of code or modules are written first and then integrated with one another. This integration stage can prove more challenging than actually coding the modules. The functionality of every module must be understood so that the modules can be integrated efficiently. Data mining techniques such as classification and text mining allow the developer to anticipate the bugs that may occur during integration. Here the input would be the source code of the independent modules and the output would be whether or not bugs would occur after integration. Frequent pattern mining also helps in correcting the errors found during classification. Clustering can help group together software components that are similar.
The reliability of a software system is inversely proportional to the number of failures and bugs encountered in the software. Using the data mining techniques mentioned above, these bugs and failures can be identified and corrected efficiently. This saves valuable time, money and the additional resources that might have been required to identify and resolve them, while increasing the reliability and maintainability of the software.
While testing the software, unit testing is normally performed at the implementation level. The other testing methods, however, are very likely to be performed by a tester who is not from the development team. Finding bugs in code is tedious, and the number of possible test cases is very large. Classification techniques can be applied to the input/output variables of the system to produce a network with sets for functional testing. This reduces the time taken by the testing stage so that the product can be released as early as possible. It also prevents development time from overrunning, which in turn saves cost and resources. The tester also needs to watch for behavioral or state changes in the execution of the program. Clustering and classification techniques can be used to watch for such changes in system behavior for a specific set of test data.
Conclusion
Because software engineering produces a huge amount of data, it is vital to use that data properly so that problems in the software development cycle can be resolved efficiently. This paper has discussed the need for and importance of using data mining techniques to support software engineering, particularly to handle problems such as the occurrence of bugs, rising software maintenance costs and unclear requirements, which can affect software productivity and quality. It has listed the software engineering data that can be mined, the most critical stages of the development process, and the data mining techniques that can be applied at those stages. The contribution of the paper lies mainly in specifying the data mining technique best suited to each stage of the development process. It has also described the benefits of using such powerful data mining techniques, particularly in terms of time, cost, resources, reliability and maintainability. Finally, some of these observations are presented in tabular form, comparing the outcome of software engineering with and without the inclusion of data mining.