Prediction for software cost estimation
Analysis of critical cost drivers in past project data and prediction of software cost estimates using the Weka data mining tool. Methodology and stages of determining the main and standard cost factors used in calculating the scope of the project.
Category | Economics and economic theory |
Type | article |
Language | English |
Date added | 15.07.2021 |
File size | 32.3 K |
Posted on http://www.allbest.ru/
National University of Mongolia
Information Technology Center of Custom, Taxation and Finance
Prediction for software cost estimation
Uyanga Sambuu
Oyunbileg Pagjii
Munkhtsetseg Namsraidorj
Naranmandal Chimedmaa
Ulaanbaatar, Mongolia
Abstract
This paper aims to identify the critical cost drivers in past project data and to predict software cost estimates with the help of the data mining tool Weka. Cost drivers are multiplicative factors that determine the effort required to complete a software project. We used Weka to identify the essential and standard cost drivers used to generate the estimate of a project.
Keywords. Software, cost estimation, project estimation, non-algorithmic techniques, algorithmic methods, COCOMO, Weka.
Introduction
Cost estimation is the process of approximating the probable cost of a product, program, or project from the available information. Accurate cost estimation is essential for every kind of project: the actual cost sometimes exceeds the original estimate by 150-200%, so it is essential to estimate the project correctly. This research aims to review existing software cost estimation techniques and to understand how they can be applied in software development, considering their specific nature. We give an overview of cost estimation models and discuss their advantages and disadvantages. Finally, guidelines for selecting appropriate cost estimation models are given, and a combination method is recommended.
This research spans two fields: software engineering and data mining. Data mining methods help to classify past project data and generate valuable information. After 20 years of research, many software cost estimation methods are available, including algorithmic methods, estimating by analogy, the expert judgment method, the price-to-win method, the top-down method, and the bottom-up method.
These techniques are classified mainly into two types: algorithmic and non-algorithmic techniques [1]. In this section, we list these techniques with their advantages and disadvantages to determine which one is more suitable.
1.1. Non-algorithmic techniques
Non-algorithmic techniques base their estimation process on analogy and deduction. They require knowledge of a previously completed project that is similar to the current software project. Estimation is done by analyzing previous software projects or data sets. Some of the non-algorithmic techniques are detailed below:
Estimation based on analogy
The basic idea behind estimation by analogy is that a new software project is compared with similar historical projects to find the most similar one, which is then used to estimate the cost of the current project. The values and data from previously completed projects are used to calculate the cost of the current project. This technique can be used at either the system level or the component level [ibid.].
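As a minimal sketch of estimation by analogy, the following Python example picks the historical project closest to a new one and reuses its recorded effort. The project records, the two features (`kloc`, `experience`), and the use of Euclidean distance are illustrative assumptions, not data or choices taken from this paper.

```python
import math

# Hypothetical historical projects (illustrative values only).
HISTORY = [
    {"name": "P1", "kloc": 10.0, "experience": 2.0, "effort": 48.0},
    {"name": "P2", "kloc": 38.0, "experience": 4.0, "effort": 210.0},
    {"name": "P3", "kloc": 90.0, "experience": 3.0, "effort": 450.0},
]


def distance(a, b, features=("kloc", "experience")):
    """Euclidean distance between two projects over the chosen features."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def estimate_by_analogy(new_project, history=HISTORY):
    """Return the name and effort of the most similar historical project."""
    nearest = min(history, key=lambda p: distance(new_project, p))
    return nearest["name"], nearest["effort"]


name, effort = estimate_by_analogy({"kloc": 35.0, "experience": 3.5})
print(name, effort)  # P2 210.0
```

In practice the features would usually be normalized first so that a large-valued attribute such as size does not dominate the distance.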
Expert judgement method
Estimation based on expert judgement captures experts' knowledge, and the cost estimate draws on the projects in which the expert has been involved. The method is suitable when there are limitations on gathering and finding data. It is a widely used estimation strategy for software projects [1].
Experts can predict the impact of new technologies, architectures, and languages. However, it is difficult and tedious to document the factors used by the experts.
Top-down estimation
In this technique, the total cost is derived from the global properties of the system using either an algorithmic or a non-algorithmic technique. This cost is then split among the various components of the system. Top-down estimation is most useful in the early stages of software development, when detailed information is not yet available. It requires significantly less detail about the project and is faster and easier to implement. Unlike other techniques, top-down estimation takes into account system-level activities such as integration and management, which are usually overlooked in other techniques. However, it does not consider difficult low-level problems that can increase the cost of the system.
Bottom-up estimation
Bottom-up estimation is the opposite of the top-down estimation method. In this method, the cost of each software component is estimated, and the results are then combined to obtain the overall cost of the software. The goal is to derive a system estimate from the accumulated estimates of the individual components.
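The two directions can be contrasted in a short Python sketch. The component names, effort values, and percentage shares below are hypothetical and serve only to illustrate the direction of each calculation.

```python
def bottom_up(component_efforts):
    """Sum per-component estimates into a system-level estimate."""
    return sum(component_efforts.values())


def top_down(total_effort, shares):
    """Split a system-level estimate across components by fractional share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: total_effort * share for name, share in shares.items()}


# Hypothetical component estimates in person-months.
print(bottom_up({"ui": 12.0, "database": 8.0, "reporting": 5.0}))

# Hypothetical split of a 100 person-month system estimate; note that
# integration appears explicitly, as the top-down view encourages.
print(top_down(100.0, {"ui": 0.5, "database": 0.3, "integration": 0.2}))
```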
Price to win estimation
Here the focus is on the customer's budget rather than on the functionality of the software. The overall software cost is agreed on the basis of an outline proposal, and that cost constrains the development of the software.
1.2. Algorithmic techniques
Algorithmic methods use mathematical equations to perform the estimation. These equations are derived from research and use inputs such as Source Lines of Code (SLOC), function points, and cost drivers such as risk assessments, languages, and design methodology. COCOMO (Constructive Cost Model), Putnam's model, function point based models, and SEER-SEM are some of the algorithmic models.
Advantages are:
1. It can generate repeatable estimations.
2. It is easy to modify input data, refine and customize formulas.
3. It is efficient and able to support a family of estimations or a sensitivity analysis.
4. It is objectively calibrated to previous experience.
Disadvantages are:
1. It cannot deal with exceptional conditions, such as exceptional personnel, exceptional teamwork, or an exceptional match between skill levels and tasks.
2. Poor sizing inputs and inaccurate cost driver ratings will result in inaccurate estimation.
3. Some experience and factors cannot be easily quantified.
COCOMO Models
The Constructive Cost Model (COCOMO) is a well documented and widely accepted algorithmic model for effort estimation, developed by Barry W. Boehm in 1981 [2].
The basic COCOMO model has a simple form:
$$\text{Man-Months} = K_1 \cdot (\text{Thousands of Delivered Source Instructions})^{K_2},$$
where $K_1$ and $K_2$ are two parameters dependent on the application and development environment.
Estimates from the basic COCOMO model can be made more accurate by considering other factors concerning the required characteristics of the software to be developed, the qualification and experience of the development team, and the software development environment. Some of these factors are:
1. Complexity of the software.
2. Required reliability.
3. Size of database.
4. Required efficiency (memory and execution time).
5. Analyst and programmer capability.
6. Experience of team in the application area.
7. Experience of team with the programming language and computer.
8. Use of tools and software engineering practices.
Many of these factors affect the person-months required by an order of magnitude or more. COCOMO assumes that the system and software requirements have already been defined and that these requirements are stable. Boehm proposed three COCOMO models, as follows:
Basic COCOMO -- This is the first of the COCOMO set of models. The formula used by this model is:
$$\text{Effort} = a \cdot (\text{KLOC})^{b},$$
where KLOC denotes the code size in thousands of lines of code, and $a$ and $b$ are constants.
The value of these constants depends on the type of project, whether organic, semi-detached or embedded [3; 4].
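The Basic COCOMO formula can be sketched in a few lines of Python. The $(a, b)$ pairs below are the values commonly quoted for Boehm's 1981 Basic COCOMO; they are not given in this paper, so they should be treated as an assumption and checked against reference [2] before use.

```python
# (a, b) per development mode, as commonly quoted for Basic COCOMO.
# Assumption: these constants are not stated in the paper itself.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, mode):
    """Effort in person-months: a * KLOC ** b for the given mode."""
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


# Hypothetical 32 KLOC project evaluated under each mode.
for mode in BASIC_COCOMO:
    print(mode, round(basic_cocomo_effort(32.0, mode), 1))
```

For the same size, the estimate grows from organic to embedded mode because both the coefficient and the exponent increase.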
The model defines mathematical equations for the development time, the effort, and the maintenance effort. COCOMO is used to make estimates at three levels of detail: the three ways of estimating software project effort and cost, in increasing order of accuracy, are the simple, intermediate, and complex models [5].
Intermediate COCOMO -- In this model, we first obtain a nominal effort estimate; the values of the constants $a$ and $b$ differ from those of Basic COCOMO. The formula used in this model is:
$$\text{Effort} = a \cdot (\text{KLOC})^{b} \cdot \text{EAF}.$$
Here EAF denotes the effort adjustment factor.
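A minimal Python sketch of the Intermediate COCOMO calculation follows, treating EAF as the product of the individual cost driver multipliers. The $(a, b)$ constants are the commonly quoted Intermediate COCOMO values and are an assumption here (check them against reference [2]). In the usage example, 1.26 is the Very High RELY multiplier mentioned in the cost driver discussion of this paper, while the TOOL value of 0.90 is purely hypothetical.

```python
from math import prod

# Commonly quoted Intermediate COCOMO constants (assumption, not from the paper).
INTERMEDIATE_COCOMO = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def effort_adjustment_factor(multipliers):
    """EAF is the product of all cost driver effort multipliers."""
    return prod(multipliers.values())


def intermediate_cocomo_effort(kloc, mode, multipliers):
    """Effort in person-months: a * KLOC ** b * EAF."""
    a, b = INTERMEDIATE_COCOMO[mode]
    return a * kloc ** b * effort_adjustment_factor(multipliers)


# Drivers left out of the dictionary are treated as nominal (multiplier 1.0).
drivers = {"RELY": 1.26, "TOOL": 0.90}  # TOOL value is hypothetical
print(round(effort_adjustment_factor(drivers), 3))
print(round(intermediate_cocomo_effort(32.0, "embedded", drivers), 1))
```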
Detailed COCOMO -- This model works on each sub-system separately, which is an advantage for large systems made up of non-homogeneous sub-systems.
COCOMO Cost Drivers
COCOMO has 15 cost drivers; the estimator assesses the project, the development environment, and the team in order to set each cost driver. The cost drivers are multiplicative factors that determine the effort required to complete a software project. For example, if the project will develop software that controls an airplane's flight, we would set the Required Software Reliability (RELY) cost driver to Very High. That rating corresponds to an effort multiplier of 1.26, meaning that the project will require 26% more effort than a typical software project. In the COCOMO model, the cost drivers fall into four groups, which are listed below; some of the cost drivers are then introduced briefly. The four groups of cost drivers are:
I. Personnel Factors:
1. Analyst Capability.
2. Programmer Capability.
3. Applications Experience.
4. Platform Experience.
5. Personnel Continuity, and
6. Use of Software Tools.
II. Product Factors:
1. Required Software Reliability.
2. Data Base Size.
3. Required Reusability, and
4. Documentation match to life-cycle needs, etc.
III. Platform Factors:
1. Execution Time Constraint, and
2. Platform Volatility.
IV. Project Factors:
1. Required Development Schedule, and
2. Multisite Development, etc.
Introduction of some cost drivers
1. Required Software Reliability (RELY) -- This is the measure of the extent to which the software must perform its intended function over a period of time. If the effect of a software failure is only a slight inconvenience, then RELY is low. If a failure would risk human life, then RELY is very high.
2. Database Size (DATA) -- This measure attempts to capture the effect that large data requirements have on product development. The rating is determined by calculating D/P, the ratio of the database size (D, in bytes) to the program size (P, in SLOC). The size of the database is important to consider because of the effort required to generate the test data that will be used to exercise the program.
3. Product Complexity (CPLX) -- Complexity is divided into five areas: control operations, computational operations, device-dependent operations, data management operations, and user interface management operations. Select the area or combination of areas that characterize the product or a sub-system of the product. The complexity rating is the subjectively weighted average of these areas.
4. Required Reusability (RUSE) -- This cost driver accounts for the additional effort needed to construct components intended for reuse on current or future projects. This effort is consumed in creating a more generic software design, more detailed documentation, and more extensive testing to ensure that components are ready for use in other applications.
5. Execution Time Constraint (TIME) -- This is a measure of the execution time constraint imposed upon a software system. The rating is expressed in terms of the percentage of available execution time expected to be used by the system or subsystem consuming the execution time resource. The rating ranges from nominal, where less than 50% of the execution time resource is used, to extra high, where 95% of the execution time resource is consumed.
6. Analyst Capability (ACAP) -- Analysts work on requirements, high-level design, and detailed design. The significant attributes that should be considered in this rating are analysis and design ability, efficiency and thoroughness, and the ability to communicate and cooperate. The rating should not consider the level of experience of the analyst; that is rated with AEXP. Analysts that fall in the 15th percentile are rated very low, and those that fall in the 95th percentile are rated very high.
7. Programmer Capability (PCAP) -- Current trends continue to emphasize the importance of competent analysts. However, the increasing role of complex COTS packages, and the significant productivity leverage associated with programmers' ability to deal with these COTS packages, indicate a trend toward higher importance of programmer capability. Evaluation should be based on the capability of the programmers as a team rather than as individuals. Significant factors that should be considered in the rating are ability, efficiency and thoroughness, and the ability to communicate and cooperate. The experience of the programmer should not be considered here; it is rated with AEXP. A very low-rated programmer team is in the 15th percentile, and a very high-rated programmer team is in the 95th percentile.
8. Applications Experience (AEXP) -- This rating depends on the level of application experience of the project team developing the software system or subsystem. The ratings are defined in terms of the project team's equivalent level of experience with this application type. A very low rating is for application experience of less than two months. A very high rating is for experience of 6 years or more.
9. Platform Experience (PEXP) -- The Post-Architecture model broadens the productivity influence of PEXP, recognizing the importance of understanding the use of more powerful platforms, including more graphical user interface, database, networking, and distributed middleware capabilities.
10. Use of Software Tools (TOOL) -- Software tools have improved significantly since the 1970s projects used to calibrate COCOMO. The tool rating ranges from simple edit and code (very low) to integrated lifecycle management tools (very high).
Putnam model
Another popular software cost model is the Putnam model. The form of this model is:
Technical constant: $C = \text{size} \cdot B^{-1/3} \cdot T^{-4/3}$;
Total person-months: $B = \frac{1}{T^{4}} \cdot \left(\frac{\text{size}}{C}\right)^{3}$;
where $T$ is the required development time in years, size is estimated in LOC, and $C$ is a parameter dependent on the development environment, determined on the basis of historical data from past projects.
Rating: C = 2,000 (poor), C = 8,000 (good), C = 12,000 (excellent).
The Putnam model is very sensitive to the development time: decreasing the development time can significantly increase the person-months needed for development.
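The sensitivity to development time follows from the $T^{4}$ term in the denominator of the equation for $B$: halving $T$ multiplies the effort by 16. The Python sketch below illustrates this under stated assumptions: the function names and the 100,000 LOC project are hypothetical, and `technical_constant` assumes that the relation for $C$ is simply the equation for $B$ solved for $C$.

```python
def putnam_effort(size_loc, c, t_years):
    """B = (1 / T**4) * (size / C)**3, in the units used by the paper's equation."""
    return (size_loc / c) ** 3 / t_years ** 4


def technical_constant(size_loc, b, t_years):
    """C = size * B**(-1/3) * T**(-4/3): the effort equation solved for C."""
    return size_loc * b ** (-1 / 3) * t_years ** (-4 / 3)


# Hypothetical 100,000 LOC project in a "good" environment (C = 8,000).
for t in (1.0, 1.5, 2.0):
    print(t, round(putnam_effort(100_000, 8_000, t), 1))
```

Calibrating $C$ from a finished project amounts to calling `technical_constant` with that project's recorded size, effort, and duration.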
One significant problem with the Putnam model is that it depends on knowing or accurately estimating the size (in lines of code) of the software to be developed. There is often significant uncertainty in the software size, which may result in inaccurate cost estimation.
Function Point Analysis Based Methods
Both algorithmic models above require the estimator to supply the SLOC count in order to obtain person-months and duration estimates. Function Point Analysis is another method of quantifying the size and complexity of a software system, in terms of the functions that the system delivers to the user. The function point measurement method was developed by Allan Albrecht at IBM and published in 1979. He argued that function points offer several significant advantages over SLOC counts as a size measure. The collection of function point data has two primary motivations. One is the desire of managers to monitor levels of productivity. The other is its use in the estimation of software development cost.
The advantages of function point analysis based model are:
1. Function points can be estimated from requirements specifications or design specifications, thus making it possible to estimate development cost in the early phases of development.
2. Function points are independent of the language, tools, or methodologies used for implementation.
3. Non-technical users have a better understanding of what function points are measuring since function points are based on the system user's external view of the system.
2. Introduction to data mining and the Weka tool
No software cost estimation model produces fully accurate estimates: estimates are often off by more than 50% from the actual cost, and sometimes by as much as 150-200%. Therefore, new methods or models are needed that can bring estimates closer to the actual costs, and their accuracy is being investigated. With the enormous amount of data stored in files, databases, and other repositories, it is increasingly important, if not necessary, to develop powerful means for analyzing and interpreting such data and for extracting interesting knowledge that could help in decision-making.
Data mining is the process of exploring and analyzing large amounts of data so that meaningful patterns and rules can be discovered. Its objective is to work efficiently with large data sets. Data mining is a component of a wider process called knowledge discovery from databases. It is the process of analyzing data from different perspectives and summarizing the results as useful information [6], and it refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from databases.
Data mining is not solely the domain of big companies and expensive software. Weka is a piece of software that does almost all the same things as these expensive packages. K-means clustering is a data mining and machine learning algorithm used to cluster observations into groups of related observations without prior knowledge of those relationships. The k-means algorithm is one of the most straightforward clustering techniques, and it is commonly used in medical imaging, biometrics, and related fields. In this paper, we applied the Apriori algorithm for association rule mining and the k-means algorithm for clustering.
The k-means algorithm
The k-means algorithm is an iterative algorithm that takes its name from its method of operation: it clusters observations into k groups, where k is provided as an input parameter.
It assigns each observation to a cluster based on the observation's proximity to the cluster's mean. The cluster's mean is then recomputed, and the process begins again. Here is how the algorithm works:
1. The algorithm arbitrarily selects k points as the initial cluster centers ("means").
2. Each point in the dataset is assigned to the closest cluster, based upon the Euclidean distance between the point and the cluster center.
3. Each cluster center is recomputed as the average of the points in that cluster.
4. Steps 2 and 3 repeat until the clusters converge. Convergence may be defined differently depending upon the implementation, but it usually means that no observations change clusters when steps 2 and 3 are repeated or that the changes do not make a material difference in the clusters' definition.
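The four steps above can be sketched in plain Python. This is a minimal illustration, not Weka's implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, and the sample points are assumptions made for the example, and convergence is defined here as "no point changes cluster".

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster tuples of numbers into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: arbitrary initial centers
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the closest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # step 4: no point changed cluster
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, labels


# Two well-separated groups of hypothetical 2-D observations.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
print(labels)
```

Because the initial centers are chosen at random, the cluster numbering can differ between seeds even when the grouping itself is the same.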
3. Implementation
Software cost estimation is one of the crucial activities of software development; it involves predicting the effort, size, and cost required to develop a software system or a software project. The use of software cost estimation techniques makes it possible to predict the amount of effort and cost that will be incurred in a given software project [7].
We combined two different fields, data mining and software engineering, and tried to generate an accurate project cost with the help of the typical cost factors and past project data whose cost or effort is known.
We used the Weka tool for data mining and the COCOMO model for software estimation. We used a PROMISE data set for the analysis. The PROMISE Software Engineering Repository data sets are made publicly available to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. The data files are in .arff and .csv format, so the data set can be loaded directly into Weka and the various algorithms applied. The results from Weka were then applied in the COCOMO model. In this model, we estimate a new project by comparing it with past project data, on the assumption that the features of the new project are very similar to those of past projects. With the help of Weka and COCOMO, we obtained some useful results. In this research, we took data on 60 past projects whose effort is already known, together with the common cost drivers and scale factors that mainly affect project estimation. Table 1 shows the classification after applying the k-means clustering algorithm. With the help of clustering, we grouped similar sets of cost drivers. These cost drivers are useful for predicting the estimate of new projects.
Table 1. Classification of cost drivers after applying the k-means clustering algorithm
| Cost drivers | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| RELY | Normal, High | Normal | Low, Normal | High | Normal |
| DATA | Low | Normal | Normal, High | Normal | Very_High |
| CPLX | High | High | Normal, Very_High | High | High |
| TIME | Normal | Normal | Normal | High | Very_High |
| STOR | Normal | Normal | Normal | High | Very_High |
| VIRT | Low | Normal | Low | Low | Low |
| TURN | Low | Normal | Low, Normal | High | High |
| AEXP | Normal -- Very_High | Normal, High | Normal -- Very_High | High | High |
| PCAP | Normal -- Very_High | Normal, High | Very_High | Normal | Normal |
| ACAP | Normal, High | Normal, High | High | Normal | Very_High |
| VEXP | Normal | Normal | Low, Normal | Normal | Low |
| LEXP | High | Normal, High | Normal, High | Normal | High |
| MODP | Normal, High | Normal, High | Low, High | Low | Very_High |
| TOOL | Normal | Normal | Low, Normal | Very_High | Very_High |
| SCED | Low, Normal | Normal | Low -- High | Normal | Low |
| LOC | 5.5-302 | 8-90 | 70-423 | 21-219 | 12.8-48.5 |
Here Cluster 1 corresponds to organic mode, Clusters 2 and 3 to semi-detached mode, and Clusters 4 and 5 to embedded mode.
Evaluation Method
We used data mining techniques for analysis. The data set was divided into two groups: training and testing with the following percentage ratios:
Training set -- 60%.
Testing set -- 40%.
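A 60/40 split of this kind can be sketched as follows. The paper does not say how the projects were assigned to each group, so the random shuffle, the `seed` value, and the function name `split_train_test` are assumptions made for illustration.

```python
import random


def split_train_test(records, train_fraction=0.6, seed=42):
    """Shuffle the records and split them into training and testing lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 60 project indices stand in for the 60 past projects of the data set.
train, test = split_train_test(range(60))
print(len(train), len(test))  # 36 24
```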
Table 2. Results of estimation
| No. | RELY | DATA | CPLX | TIME | STOR | VIRT | TURN | ACAP | AEXP | PCAP | VEXP | LEXP | MODP | TOOL | SCED | LOC | ACT_EFFORT | Cluster |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Low | Nominal | Nominal | Nominal | Nominal | Low | Low | High | High | Very_High | Nominal | High | Low | Low | High | 423 | 2300 | cluster2 |
| 1 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 16.3 | 82 | cluster4 |
| 2 | Nominal | Low | High | Nominal | Extra_High | Low | Low | High | Very_High | Very_High | Nominal | High | Nominal | Nominal | Nominal | 150 | 324 | cluster0 |
| 3 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | High | High | Nominal | High | Nominal | Nominal | Nominal | 31.5 | 60 | cluster1 |
| 4 | High | Nominal | High | High | High | Low | High | Nominal | High | Nominal | Nominal | Nominal | Low | Very_High | Nominal | 219 | 2120 | cluster3 |
| 5 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 25.9 | 117.6 | cluster0 |
| 6 | High | Nominal | High | High | High | Low | High | Nominal | High | Nominal | Nominal | Nominal | Low | Very_High | Nominal | 50 | 370 | cluster3 |
| 7 | Very_High | Nominal | Extra_High | High | High | Low | Low | Nominal | High | Nominal | Nominal | Nominal | Low | High | Nominal | 21 | 107 | cluster3 |
| 8 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 19.7 | 60 | cluster0 |
| 9 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | Very_High | Very_High | Nominal | High | Nominal | Nominal | Nominal | 100 | 360 | cluster0 |
| 10 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 14 | 60 | cluster0 |
| 11 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | Very_High | High | Nominal | High | Nominal | Nominal | Nominal | 15 | 48 | cluster0 |
| 12 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 9.7 | 25.2 | cluster0 |
| 13 | Nominal | Low | High | Nominal | Extra_High | Low | Low | High | High | Nominal | Nominal | High | Nominal | Nominal | Nominal | 32.5 | 60 | cluster0 |
| 14 | Nominal | Nominal | High | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | High | Nominal | High | High | High | Nominal | 90 | 450 | cluster1 |
| 15 | High | Nominal | High | Nominal | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Nominal | Nominal | Nominal | Nominal | 38 | 210 | cluster1 |
| 16 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | Very_High | Very_High | Nominal | High | Nominal | Nominal | Nominal | 20 | 72 | cluster0 |
| 17 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | Very_High | High | Nominal | High | Nominal | Nominal | Nominal | 20 | 48 | cluster0 |
| 18 | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Nominal | Nominal | Nominal | Nominal | 10 | 48 | cluster1 |
| 19 | Nominal | High | Very_High | Nominal | Nominal | Low | Nominal | High | Nominal | Very_High | Low | Nominal | High | Nominal | Low | 70 | 278 | cluster2 |
| 20 | Nominal | Low | High | Nominal | Nominal | Low | Low | High | Nominal | Nominal | Nominal | Very_Low | Nominal | Nominal | Nominal | 100 | 360 | cluster0 |
| 21 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 32.6 | 170 | cluster4 |
| 22 | Nominal | Nominal | High | Nominal | High | Nominal | Nominal | High | High | Nominal | Nominal | High | High | Nominal | High | 47.5 | 252 | cluster1 |
| 23 | Nominal | Nominal | Nominal | Nominal | Nominal | Low | Nominal | High | Very_High | Very_High | Low | High | High | Nominal | Nominal | 190 | 420 | cluster2 |
| 24 | High | Nominal | Very_High | High | High | Low | High | High | Nominal | Nominal | High | High | Low | Very_High | High | 101 | 750 | cluster3 |
| 25 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 48.5 | 239 | cluster4 |
| 26 | Nominal | Nominal | High | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | Nominal | 8 | 42 | cluster1 |
| 27 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 15.4 | 70 | cluster4 |
| 28 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 115.8 | 480 | cluster0 |
| 29 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 66.6 | 352.8 | cluster0 |
| 30 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 35.5 | 192 | cluster4 |
| 31 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 66.6 | 300 | cluster0 |
| 32 | Nominal | Very_High | High | Very_High | Very_High | Low | High | Very_High | High | Nominal | Low | High | Very_High | Very_High | Low | 12.8 | 62 | cluster4 |
| 33 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | Nominal | Nominal | High | High | Nominal | Low | 5.5 | 18 | cluster0 |
| 34 | High | High | Nominal | Nominal | Nominal | Low | Low | Nominal | High | High | Nominal | High | Nominal | Nominal | Nominal | 79 | 400 | cluster0 |
| 35 | High | Low | High | Nominal | Nominal | Low | Low | Nominal | Nominal | High | Nominal | Nominal | High | Very_Low | Nominal | 302 | 2400 | cluster0 |
System Analysis & Mathematical Modeling. 2021. Vol. 3, No. 2
The accuracy of cost estimation is evaluated by comparing the actual effort with the estimated effort in order to compute the MRE (Magnitude of Relative Error), which is defined as follows:
$$\mathrm{MRE} = \frac{|\text{Actual effort} - \text{Estimated effort}|}{\text{Actual effort}}$$
As described above, we trained the models on the data set, then evaluated and compared them using the MRE criterion.
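The MRE criterion can be computed in a few lines of Python. The sketch assumes the conventional definition, the absolute difference between actual and estimated effort divided by the actual effort. The actual effort of 60 matches project 3 in Table 2, while the estimate of 75 is a hypothetical value chosen for illustration.

```python
def mre(actual, estimated):
    """Magnitude of Relative Error: |actual - estimated| / actual."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimated) / actual


# Actual effort from Table 2 (project 3); the estimate is hypothetical.
print(mre(60.0, 75.0))  # 0.25
```

A lower MRE means a more accurate estimate; a value of 0.25 corresponds to an estimate that is off by 25% of the actual effort.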
After k-means clustering, we estimated the cost of every project in the test data. In the experiments, 40% of the projects in Cluster 1, 88.8% in Cluster 2, 100% in Cluster 3, 100% in Cluster 4, and 66% in Cluster 5 were predicted successfully. The average rate of successful prediction was 79.1%.
After applying the k-means clustering and Apriori algorithms, we identified the clusters that contain similar cost drivers. With the help of clustering, we grouped instances with similar behavior into clusters.
Increase these to decrease effort:
| Cost driver | Meaning |
|---|---|
| ACAP | analyst capability |
| PCAP | programmer capability |
| AEXP | application experience |
| MODP | modern programming practices |
| TOOL | use of software tools, etc. |
| LEXP | language experience |
Decrease these to decrease the cost of the project:
| Cost driver | Meaning |
|---|---|
| STOR | main memory constraint |
| DATA | database size |
| TIME | time constraint for CPU |
| VIRT | machine volatility |
| RELY | required software reliability, etc. |
Conclusion
The estimation of software effort is an essential and crucial activity in the software development life cycle. In recent years, many researchers and software companies have paid significant attention to the estimation of software effort. In industry, effort is used for planning, budgeting, and calculating development time. Therefore, a realistic effort estimation is required [8].
Generally, data mining analyzes data from different perspectives and summarizes it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of several analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Our results suggest that building data mining and machine learning techniques into existing software estimation techniques such as COCOMO can effectively improve the performance of a proven method. We used the Weka tool for data mining because it provides a range of machine learning algorithms that can easily classify the data. Our main aim was to show that data mining is also beneficial for the field of software engineering. Not all data mining techniques performed better than the traditional method of local calibration. However, a couple of techniques used in combination did provide more accurate software cost models than the traditional technique. Although the best combination of data mining techniques was not consistent across the different stratifications of the data, this shows that there are different populations of software projects and that rigorous data collection should be continued to improve the development of accurate cost estimation models. Based on this research, we can say that cost drivers play an essential role in estimation with analogy-based models. We identified some common cost drivers that can be used for all projects. Future work should investigate more data mining algorithms that can help improve software cost estimation and are easy to use. The main reason for choosing the COCOMO model for this research is that it is a well-documented, widely accepted software cost estimation model that is readily available to the public.
References
1. Shivangi Shekhar, Umesh Kumar. Review of Various Software Cost Estimation Techniques. International Journal of Computer Applications, 2016, vol. 141, no. 11, pp. 31-34.
2. Boehm B.W. Software Engineering Economics. Englewood Cliffs, Prentice-Hall, 1981. 767 p.
3. Boehm B., Abts C., Clark B., Devnani-Chulani S. COCOMO II Model Definition Manual. Los Angeles, University of Southern California, 1997. 68 p.
4. Benediktsson O., Dalcher D., Reed K., Woodman M. Cocomo-Based Effort Estimation for Iterative and Incremental Software Development. Software Quality Journal, 2003, vol. 11, no. 4, pp. 265-281.
5. Aljahdali S., Sheta A.F. Software Effort Estimation by Tuning COCOMO Model Parameters Using Differential Evolution. The 8th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA 2010, Hammamet, Tunisia, May 16-19, 2010.
6. Deshmukh S.A., Ahmad S.W. Implementation of using classification Data Mining Techniques for Software Cost Estimation. International Journal on Recent and Innovation Trends in Computing and Communication, 2016, vol. 4, iss. 5, pp. 362-363.
7. Chirra S.M.R., Reza H. A Survey on Software Cost Estimation Techniques. Journal of Software Engineering and Applications, 2019, vol. 12, no. 6, pp. 226-248. DOI: 10.4236/jsea.2019.126014.
8. Sachan R.K., Nigam A., Singh A., Singh S. Optimizing Basic COCOMO Model using Simplified Genetic Algorithm. Procedia Computer Science, 2016, vol. 89, pp. 492-498.
Information about the authors:
Uyanga Sambuu: Doctor of Technical Sciences, Professor, Department of Information and Computer Sciences, Institute of Engineering and Applied Sciences, National University of Mongolia
Oyunbileg Pagjii: Doctor of Economics, Associate Professor, Department of Finance, Institute of Business, National University of Mongolia
Munkhtsetseg Namsraidorj: Doctor of Technical Sciences, Associate Professor, Department of Information and Computer Sciences, Institute of Engineering and Applied Sciences, National University of Mongolia
Naranmandal Chimedmaa: Master of Technical Sciences, Engineer, Information Technology Center of Custom, Taxation and Finance, Mongolia