Detection of antipatterns in service-based systems

The need to monitor the state of service-based software systems under continuous development and enhancement. Pattern and antipattern detection. Metric calculation algorithms. Response time.

Category: Programming, computers and cybernetics
Type: thesis
Language: English
Date added: 31.10.2016

Russian Federation Government

Federal State Autonomous Educational Institution of Higher Education

National Research University

"Higher School of Economics"

School of Software Engineering

Master thesis paper

Detection of Anti-patterns in Service-based Systems

09.04.04 - Software Engineering

Scientific Advisor Dmitry V. Alexandrov.

Author Alexander Yugov.

Moscow, 2016

Contents

Introduction

Chapter 1. Antipattern Detection Fundamentals

1.1 Related Work

1.1.1 Pattern and Antipattern Detection

1.1.2 Knowledge Extraction

1.2 Antipatterns classification

1.2.1 Attribute Specific Antipatterns

1.2.2 Metric Specific Antipatterns

1.3 Examples of Antipatterns

1.4 Introduction of Detection Strategy

1.5 Detection Strategy: Combining Rules

Chapter 2. Antipattern Detection Application

2.1 Metadata of metric model. Rule Cards Specification

2.2 Metric Calculation Algorithms

2.2.1 Incoming and Outcoming Call Rates

2.2.2 Response Time

2.2.3 Cohesion with Other Services

2.2.4 Number of Service Connections

2.3 Implementation

2.3.1 General Workflow Structure

2.3.2 Rule Cards XML structure

2.3.3 Log Structure

Chapter 3. Experiments

3.1 Bottleneck Service

3.2 Chatty Service

Conclusion

References

Introduction

The service-based style of building software systems is widespread in industrial development because it allows flexible, scalable, and high-quality distributed systems to be implemented at a competitive price. The result of development is a set of autonomous, reusable, and independent units of a platform - services - that can be consumed over any network, including the Internet [12].

Quality attributes can concern the system (e.g., availability, modifiability), business-related aspects (e.g., time to market), or the architecture (e.g., correctness, consistency) [3]. Keeping quality attributes at a high level is a critical issue because service-based systems lack central control and authority, have limited end-to-end visibility of services, are subject to unpredictable usage scenarios, and support dynamic system composition [3].

The constant evolution of a service-based system (SBS) can easily deteriorate the overall architecture of the system, and thus bad design choices, known as SOA antipatterns [28], may appear. These are patterns to be avoided. If we study them and are able to recognize them, we should be able to avoid them. If we are not aware of anti-patterns, we risk repeating mistakes that others have already made several times. By formally capturing repeated mistakes, one can recognize the symptoms and then work to avoid getting into a problem situation. Anti-patterns describe how one can go from a problem to a bad solution: they look like a good solution, but when applied they backfire. Knowing bad practices is perhaps as valuable as knowing good practices. With this knowledge, we can refactor the solution if we are heading towards an anti-pattern. As with patterns, anti-pattern catalogues are also available.

Traditional approaches to software delivery are based on the life cycle phases of the system, with the development process involving various teams inside one company or even different companies [13]. Moreover, in the classical approach the focus is on one vendor supplying the entire system or subsystem. The service-oriented architecture approach introduces a model divided into layers. It allows different design approaches to coexist, with different parties delivering service layers as separate elements. However, layers or even single services may be developed, operated, and evolved by independent teams or organizations. Thus, even if the services are full-fledged and meet all quality requirements, there is no guarantee that, once the system is in operation, some services will not deviate from the specification [45]. Experience with joint projects divided into separate services shows that errors tend to appear in potentially dangerous areas. In this work, we call these areas anti-patterns.

Anti-patterns in service-based software systems are “bad” solutions to recurring design problems. In contrast to design patterns, which are well-proven solutions, anti-patterns are solutions that engineers should avoid. Anti-patterns can also be introduced as a consequence of various changes, such as new user requirements or changes in the operating environment. Given the clear negative impact of SOA antipatterns, there is an urgent need for techniques and tools to detect them.

This paper focuses on exploring rules that recognize symptoms of anti-patterns in data mined from software logs, and on developing a prototype of a software system that implements these rules.

The remainder of the paper is organized as follows. Chapter 1 presents the fundamentals of the antipattern detection domain. It starts with a survey of related work, then classifies antipatterns by type of detection strategy, continues with examples of antipatterns, and finishes with a description of the detection strategy. Chapter 2 illustrates the proposed approach and the research prototype that was developed. It describes the metadata scheme used by the main detection algorithms, the algorithms themselves, and the application. Chapter 3 discusses the experiments and their results, which demonstrate how the proposed approach works.

Chapter 1. Antipattern Detection Fundamentals

This chapter starts with a survey of related work and then introduces the fundamental concepts of antipattern detection: the classification of antipatterns, examples of antipatterns, and the description of a detection strategy.

1.1 Related Work

This work addresses antipattern detection and knowledge extraction from execution traces, so related work is surveyed on both topics: Section 1.1.1 deals with the detection of patterns and antipatterns in both the object-oriented (OO) and SOA paradigms, while Section 1.1.2 addresses knowledge extraction.

1.1.1 Pattern and Antipattern Detection

Design (or architectural) quality is vital for building well-designed, maintainable, and evolvable service-based systems (SBSs). Patterns and antipatterns have been recognized as one of the most effective ways to express architectural concerns and solutions, and therefore target greater quality in various systems [29].

Further, a number of efforts have been made to formalize the properties and concepts of bad practices at a higher level of abstraction. Several approaches have been introduced to specify and detect code vulnerabilities and antipatterns [28], [20]. They range from manual approaches, based on inspection techniques, to metric-based heuristics, using rules and thresholds on various metrics or Bayesian Belief Networks. Some approaches [1] work at the level of application design and can be applied early in the software lifecycle.

Several methodologies and tools exist for antipattern detection, specifically in OO systems [18], [22]. However, the detection of antipatterns in SOA systems, unlike in their OO counterparts, is still in its infancy. Among the previously mentioned works, only a few are applicable to SOA. The DÉCOR approach [27], initially introduced for OO systems, was later adapted to service-based systems [32]. This is a rule-based approach for the specification and detection of flaws in source code and design.

The authors apply a domain-specific language (DSL) to specify suspicious places in the code and then automatically generate algorithms for smell detection. Notably, these algorithms are directly executable. As a result of this research, the SOFA framework (Service Oriented Framework for Antipatterns) was developed [30]. The SOFA tool implements the SODA approach, which supports the detection of SOA antipatterns in systems consisting of services. The tool is based on metric calculation. Antipattern specifications written in the DSL are translated into detection algorithms automatically.

However, SODA suffers from severe limitations. SODA requires access to the source code; consequently, it cannot analyze proprietary or closed-source systems. In addition, since SODA specifically targets systems developed according to the SCA standard, its precision drops as the target system grows. Given these limitations, the authors proposed a new approach to the detection of antipatterns in SOA named SOMAD (Service Oriented Mining for Antipattern Detection) [29]. SOMAD implements the SODA approach and extends it by extracting source data from execution traces (logs from any SOA technology). At present, SOMAD is the work most closely related to ours.

In [38], SPARSE, an approach to declarative antipattern specification, is presented. In SPARSE, antipatterns are specified as an OWL (Web Ontology Language) ontology extended with a SWRL (Semantic Web Rule Language) rule base, whereas their occurrences are identified through automated reasoning.

Several relevant works focus on the detection of quite specific antipatterns, for example those related to the performance characteristics and resource consumption of a system and/or to particular technologies. For example, Wong et al. [46] apply a genetic algorithm to detect software faults and abnormal behavior in the resource usage of a system (e.g., memory consumption, processor consumption, number of threads). The approach is driven by utility functions, which correspond to predicates that identify suspicious behavior by means of resource consumption metrics.

In another work in this field, Parsons et al. [33] proposed the detection of performance-specific antipatterns. The authors use a rule-driven approach that combines static and dynamic analyses and is applied to component-based enterprise systems (in particular, Java EE applications).

1.1.2 Knowledge Extraction

A large number of studies address the extraction of knowledge from execution traces. They were motivated by the identification of:

· business processes [19];

· crosscutting concerns (aspects) [40];

· patterns of interest among service users [2], [11];

· features either in OO systems [37] or in SOA systems [47].

Further related work focuses on the detection of service composition patterns [42], i.e., sets of services that are repeatedly invoked together in various systems and that are similar in structure and functionality. Composition patterns capture good practices in the design and development of SBSs.

Few projects have investigated pattern identification through mining of execution traces. Ka-Yee Ng et al. [31] proposed MoDeC, an approach for the detection of creational and behavioral design patterns that applies dynamic analysis and constraint programming. They instrument bytecode and reverse-engineer scenario diagrams from an OO system. After that, the authors apply constraint programming to identify patterns as runtime collaborations.

Sartipi and Hu [17] tackle the identification of design patterns in logs by applying scenario execution, pattern mining, and concept analysis. The approach is driven by a set of feature-specific scenarios used to detect patterns, as opposed to commonly adopted general-purpose pattern detection.

Although they differ in scope and goals, the above studies on antipatterns in OO systems form a sound basis of technical knowledge and expertise for creating methods for the detection of antipatterns in SOA. However, despite many commonalities, methods of OO (anti)pattern detection cannot be applied directly to SOA. Indeed, SOA treats services as first-class entities and therefore operates at a coarser level of granularity than classes in the OO paradigm. Furthermore, the highly dynamic nature of an SBS raises issues that are not important in OO systems.

A separate direction in the analysis of event logs is process mining. Process mining is a relatively new discipline that offers comprehensive tool sets for fact-based analysis and for supporting process improvements [44]. The main difference from data mining is that existing data mining instruments are too data-centric to provide comprehensive insight into processes from beginning to end. Process mining techniques, in contrast, combine process models and event data, which makes possible deviation detection, decision-making support, conformance checking, delay prediction, and recommendations for process redesign.

Anti-patterns can vary significantly from the point of view of detection strategy. The following section considers the different classes of antipatterns.


1.2 Antipatterns classification

1.2.1 Attribute Specific Antipatterns

The first class is characterized by the use of specific attributes of service behavior. The only approach proposed so far for detecting such antipatterns is that of Canadian researchers [29]. In this approach, the detection strategy for antipatterns of this class consists in checking the messages by which services communicate with each other for the presence of specific attributes. For example, to detect the Ignoring Caching anti-pattern [39], it is necessary to scan the response header for the “Cache-Control” attribute. If this attribute is omitted or set to “no-cache” or “no-store”, this should be considered an occurrence of the anti-pattern.
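The header check described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `ignores_caching` and the representation of a response header as a plain dictionary are hypothetical and are not taken from [29].

```python
def ignores_caching(headers):
    """Return True if the response headers show the Ignoring Caching symptom.

    `headers` is assumed to be a dict mapping header names to string values.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    cache_control = normalised.get("cache-control")
    if cache_control is None:
        return True  # the attribute is omitted altogether

    # A Cache-Control value is a comma-separated list of directives,
    # some of which carry an argument after "=" (e.g. max-age=3600).
    directives = {
        part.strip().lower().split("=")[0]
        for part in cache_control.split(",")
    }
    return bool(directives & {"no-cache", "no-store"})


if __name__ == "__main__":
    print(ignores_caching({"Content-Type": "text/xml"}))
    print(ignores_caching({"Cache-Control": "max-age=3600"}))
```

Such a check only inspects what the message declares; it says nothing about whether the recipient actually honours the attribute.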

The main disadvantage of this approach is that even if an attribute is present in the transferred message, the recipient may ignore it. Whether an attribute is actually used can be checked with a process mining technique called conformance checking [43]. Conformance checking techniques were first described in preliminary form in [36].

Conformance checking relates events in the event log to activities in the process model and compares the two [44]. The goal is to find commonalities and discrepancies between the modeled behavior and the observed behavior. That is, a model of the desired behavior has to be prepared, and when a message carries the “Cache-Control” attribute, the actual behavior of the recipient service has to conform to the modeled one.

The challenge is to predefine a model of service behavior. This is impossible when there is no way to look into the infrastructure of the software system under consideration, and vendors generally do not allow such exploration. For this reason, this type of anti-pattern is not considered in the current work.

1.2.2 Metric Specific Antipatterns

Detection of antipatterns of this class is based on the calculation of specific numerical values. The strategy is far from new, and it is intrinsic to any metrics-based approach. Back in 1990, Card already emphasized that metrics should be employed to evaluate software design from a quality point of view [6].

A typical detection strategy consists of a collection of rules over metrics. The predefined values of these metrics are called symptoms. Calculating the metrics shows whether a symptom is present or not.

The challenge is to predefine the metric values that indicate the presence or absence of a symptom in the object being analyzed, i.e., to decide what the correct threshold values are and how they can be determined. In most cases, setting the threshold values is a highly empirical process, guided by similar past experience and by hints from the authors of the metrics [2]. The metrics themselves can be cohesion of services, coupling of services, number of methods, response time, availability time, etc. The metrics are considered in more detail later.

1.2.3 Relational specific antipatterns

This class reflects a behavioral property of the system under consideration. Antipatterns of this type are characterized by the emergence of specific sequential interaction patterns. In other words, such antipatterns typically refer to a set of activities that often appear together in a transaction log.

Depending on distance between observed occurrences, the methods of detection and, more frequently, correction can vary significantly.
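To make the notion of activities that often appear together more concrete, the sketch below counts how many transactions contain each pair of activities and keeps the pairs that reach a minimum support. It is a deliberate simplification and not a method from the cited works: it ignores the order of and the distance between occurrences, and the names `frequent_pairs` and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(activity_a, activity_b): count} for pairs that co-occur
    in at least `min_support` transactions.

    `transactions` is assumed to be an iterable of lists of activity names,
    one list per transaction found in the log.
    """
    counts = Counter()
    for activities in transactions:
        # Each pair is counted at most once per transaction; sorting makes
        # (a, b) and (b, a) fall into the same bucket.
        for pair in combinations(sorted(set(activities)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    log = [
        ["login", "getUser", "getOrders"],
        ["getUser", "getOrders"],
        ["login", "logout"],
    ]
    print(frequent_pairs(log, min_support=2))
```

A sequence-aware technique would be needed to capture the sequential aspect mentioned above; pair counting only shows the general idea of mining co-occurrence from a log.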

1.3 Examples of Antipatterns

The problem of antipattern specification is not new. Several repositories of antipatterns already exist; they are collected in books [4] and web resources [39].

One of the most recent works on the detection of antipatterns in service-oriented architectures (SOA) was proposed by Moha et al. in 2012 [28]. The authors proposed an approach to the specification and detection of an extensive set of SOA anti-patterns, operating with such concepts as granularity, cohesion, and duplication. In addition to the approach, the authors identified three antipatterns, namely bottleneck service, service chain, and data service. A bottleneck is a service that is used by many other components of the system and, as a result, is characterized by high incoming and outgoing coupling, which affects the response time of the service. A service chain occurs when a business task is accomplished by a long chain of successive calls. A data service is a service that performs simple operations of information search or data access, which may affect the connectivity of the component. In 2012, Rotem-Gal-Oz [35] identified the “knot” antipattern: a small set of connected services that are nevertheless closely dependent on each other. This anti-pattern may thus reduce ease of use and worsen response time.

Another example of an anti-pattern is the “sand pile” defined by Král et al. [21]. It appears when many small services use shared data that is accessed through a single service, which itself represents the “data service” anti-pattern.

Scherbakov et al. proposed the “duplicate service” antipattern [9], which concerns services that contain similar functions, causing problems in maintenance.

In 2003, Dudney et al. [10] identified a set of anti-patterns for J2EE applications. Among others, the “multi service”, “tiny service”, and “chatty service” anti-patterns stand out. A multi service is a service that provides a variety of business operations that have little in common (for example, they belong to different subsystems), which can affect service availability and response time. A tiny service is a small service with few methods, which are always used together. This can make reuse impossible. Finally, the “chatty service” anti-pattern describes services that constantly call each other, passing small amounts of information.

Note that the collection and specification of anti-patterns is beyond the scope of this work. For the analysis, existing anti-pattern specifications are taken and translated into formal models. Let us consider some of the anti-patterns that are used later in the detection experiments.

Chatty Service (fig. 1.1) corresponds to a group of services that exchange many small pieces of data of primitive types with each other. A high number of service invocations also characterizes the Chatty Service. More generally, chatty services chat a lot with each other.

Figure 1.1. “Chatty Service” diagram.

Bottleneck Service (fig. 1.2) is a service that is heavily used by other clients or services. It has high coupling, both outgoing and incoming. Its response time can be high because too many external clients and services may call it and have to wait to get access to it. In addition, due to the traffic, its availability may also be low.

Figure 1.2. “Bottleneck Service” diagram.

Tiny Service (fig. 1.3) is a small service with few methods, which only implements part of an abstraction. Such service often requires several services to be coupled to use together, resulting in higher complexity of development and reduced usability. In the extreme case, a Tiny Service will be limited to only one method, resulting in many services that implement an overall set of requirements.

Figure 1.3. “Tiny Service” diagram.

Table 1 contains a list of performance antipatterns. Each row represents a specific antipattern that is characterized by corresponding specification of proposed metrics.

Table 1. Antipattern specification.

Metric                   Chatty Service   Bottleneck Service   Tiny Service
Incoming call number     -                Very high            -
Outgoing call number     -                Very high            -
Number of mutual calls   Very high        -                    -
Number of methods        -                -                    Very low
Coupling with other      Very high        -                    Very high
Response time            -                High                 -

As can be seen in the table, a chatty service is characterized by very high coupling with another service and a very high number of mutual calls. It should be highlighted that the number of mutual calls is the only characteristic in the table that is calculated for a pair of services; all the others are calculated for each individual service. A bottleneck service has a very high number of incoming and outgoing calls. Additionally, it has a very high number of connections, that is, the number of arcs leading both from and to the service under consideration.

A tiny service has a very low number of methods, and these methods are always called in conjunction with another service, which shows up as very high coupling with that service.
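As a minimal sketch of how the call-related characteristics in Table 1 could be derived from a log, the code below counts incoming calls, outgoing calls, mutual calls per pair of services, and connections per service. The input format (a list of caller and callee pairs) and the function name `call_metrics` are assumptions made for illustration. In particular, the number of mutual calls is assumed here to be the total number of calls exchanged by a pair in both directions, which the text above does not define precisely.

```python
from collections import Counter


def call_metrics(calls):
    """Compute call-based characteristics from logged invocations.

    `calls` is assumed to be an iterable of (caller, callee) pairs,
    one pair per logged invocation.
    """
    calls = list(calls)
    incoming = Counter()   # calls received by each service
    outgoing = Counter()   # calls issued by each service
    mutual = Counter()     # calls exchanged by each unordered pair
    for caller, callee in calls:
        outgoing[caller] += 1
        incoming[callee] += 1
        mutual[frozenset((caller, callee))] += 1

    # Connections: distinct arcs leading from or to a service.
    connections = Counter()
    for caller, callee in set(calls):
        connections[caller] += 1
        if callee != caller:
            connections[callee] += 1
    return incoming, outgoing, mutual, connections


if __name__ == "__main__":
    log = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "B")]
    incoming, outgoing, mutual, connections = call_metrics(log)
    print(dict(incoming), dict(outgoing), dict(connections))
```

The pairwise counter is kept separate from the per-service counters because, as noted above, the number of mutual calls is the only characteristic calculated for a pair of services.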

1.4 Introduction of Detection Strategy

Design flaws have a strong negative impact on quality attributes such as maintainability and flexibility [15]. Therefore, detecting and identifying these design problems is essential for evaluating and improving the quality of software.

As DeMarco noted [8], controlling the quality of development requires proper quantitative methods. Already in 1990, Card emphasized that metrics should be used to assess software design in terms of quality [6]. But what should be measured? In the context of design rules, principles, and heuristics, this question should be rephrased as follows: is it possible to express the principles of “good design” in a measurable way?

The main goal of this approach is to give engineers a mechanism that lets them work with metrics at a more abstract level, conceptually much closer to the real conditions in which numerical characteristics are applied. The mechanism defined for this purpose is called a detection strategy. The detection strategy was proposed in [25] to help developers and maintainers localize and identify design problems in a system; it was a novel mechanism for formulating metric-based rules that capture deviations from good design principles and heuristics.

A detection strategy [25] is a quantitative expression of a rule by which the specific pieces of software (architectural elements) that conform to this rule can be found in the source code.

For this reason, the detection strategy is a general approach to analyzing a source code model using metrics. It should be noted that, in the context of the above definition, “quantitative expression of a rule” means that the rule should be properly expressible using metrics. The use of metrics in detection strategies is grounded in two mechanisms: filtering and composition. These two mechanisms are considered in more detail below.

The key problem in data filtering is reducing the initial collection of data so that only the values of particular interest remain. This is commonly referred to as data reduction [16]. The aim is to detect those elements of the system that have special properties. The limits (boundaries) of the resulting subset are determined by the type of filter. When measuring software, we usually look for extreme (abnormal) values or for values that lie within a certain range. Therefore, the following types of filters are distinguished [25]:

· A marginal filter is a data filter in which one limit (border) of the result set is specified explicitly, while the other coincides with the corresponding limit of the original data set.

· An interval filter is a data filter in which both the lower and the upper limits of the resulting subset are specified explicitly in the definition of the filter.

Marginal filters fall into two kinds: depending on how the border is specified, a marginal filter may be semantical or statistical.

· Semantical. For these filters, two parameters must be specified: a threshold, which gives the limit value explicitly, and a direction, which determines whether the threshold is the upper or the lower limit of the filtered data set. This category of filters is called semantical because the choice of parameters is based on the semantics of the specific metric within the model chosen for interpreting it.

· Statistical. Unlike semantical filters, statistical ones do not require an explicit threshold, as it is derived directly from the original data set using statistical methods (e.g., a scatter plot). However, the direction still has to be specified. Statistical filters are based on the assumption that all measured entities of the system are designed in the same style and, therefore, the measurement results are comparable.

In this paper, a set of specific data filters from the two categories above is used. Based on their practical use and on the interpretation of the selected models, these filters may be grouped as follows:

· Absolute semantic filters: HigherThan and LowerThan. These filtering mechanisms are parameterized by a numerical value representing the border. We use these data filters only to express “clear-cut” design rules or heuristics, such as “a class should not be associated with more than 6 other classes”. It should be noted that the threshold is specified as a parameter of the filter, while the two possible directions are covered by the two particular filters.

· Relative semantic filters: TopValues and BottomValues. These filters delimit the filtered data set by a parameter that specifies the number of entities to be selected, rather than the maximum (or minimum) value allowed in the result set. Thus, the values in the result set are relative to the original data set. The parameter may be absolute (for example, “select the 20 entities with the highest values”) or a percentile (for example, “select the 10% of the measured entities with the lowest values”). This type of filter is useful when it is necessary to consider the highest (or lowest) values of a given data set rather than to indicate exact thresholds.

· Statistical filters: scatter plots. A scatter plot is a statistical method that can be used to detect outliers in a data set [14]. Data filters based on such statistical techniques, which are of course not limited to scatter plots, are useful in quantifying rules. Again, the direction of the outlying values must be specified based on the semantics of the design rule.

· Interval filters. An interval filter requires two thresholds. However, in the context of detection strategies, where a composition mechanism exists in addition to the filtering mechanism, an interval filter is defined as the composition of two absolute semantic filters of opposite directions.
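The semantic filters listed above can be illustrated with a short sketch. The function names mirror the filter names used in the text, but the signatures, the dictionary representation of metric values, and the rounding rule for percentages are assumptions made for this example rather than part of [25]. Statistical filters are left out.

```python
import math


def higher_than(values, threshold):
    """Absolute semantic filter: entities whose metric exceeds the threshold."""
    return {name for name, value in values.items() if value > threshold}


def lower_than(values, threshold):
    """Absolute semantic filter: entities whose metric is below the threshold."""
    return {name for name, value in values.items() if value < threshold}


def _ranked(values, count, percent, highest):
    # Exactly one of `count` (absolute) or `percent` (percentile) is expected.
    if (count is None) == (percent is None):
        raise ValueError("give either count or percent")
    if percent is not None:
        # Assumption: a fractional number of entities is rounded up.
        count = math.ceil(len(values) * percent / 100)
    ordered = sorted(values, key=values.get, reverse=highest)
    return set(ordered[:count])


def top_values(values, count=None, percent=None):
    """Relative semantic filter: entities with the highest metric values."""
    return _ranked(values, count, percent, highest=True)


def bottom_values(values, count=None, percent=None):
    """Relative semantic filter: entities with the lowest metric values."""
    return _ranked(values, count, percent, highest=False)


def interval(values, low, high):
    """Interval filter as a composition of two absolute filters."""
    return higher_than(values, low) & lower_than(values, high)


if __name__ == "__main__":
    coupling = {"A": 10, "B": 3, "C": 7, "D": 1}
    print(sorted(higher_than(coupling, 6)))
    print(sorted(bottom_values(coupling, percent=50)))
```

Each filter returns a set of entity names rather than the metric values themselves, so that the results of several filters can later be combined by the composition mechanism.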

Unlike a simple metric and its interpretation model, a detection strategy should be able to draw conclusions on the basis of several rules. Consequently, in addition to the filtering mechanism, which supports the interpretation of the results of a particular metric, we need a second mechanism for combining the results computed for several metrics - a composition mechanism. The composition mechanism is a rule that combines the results of calculating several metric values. Three composition operators are found in the literature: “and”, “or”, and “butnot” [25].

These operators can be discussed from two different perspectives:

· From a logical point of view. The three operators reflect the rules for combining multiple symptoms, where the operands are descriptions of design characteristics (symptoms). They make a detection strategy easier to read and understand, because a strategy composed with these operators stays close to the original informal wording of the rule. From this point of view, for example, the “and” operator means that the investigated entity exhibits both symptoms that the operator combines.

· From the set point of view. This view helps to understand how the final result of a detection strategy is built. The initial set of results computed for each metric is passed through the filtering mechanism. What remains is a restricted set of system elements (with the metric values calculated for them) that are of interest for further investigation. The resulting filtered sets are then merged using the composition operators. Thus, in terms of set operations, the “and” operator corresponds to intersection (∩), the “or” operator to union (∪), and the “butnot” operator to set difference (\).
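Viewed as set operations, the three composition operators reduce to a few lines of code. The sketch below assumes that each filter has already produced a set of service names; the function names are chosen for illustration (with a trailing underscore where the plain name is a Python keyword), and the example sets are made up.

```python
def and_(left, right):
    """Entities that exhibit both symptoms (intersection)."""
    return left & right


def or_(left, right):
    """Entities that exhibit at least one of the symptoms (union)."""
    return left | right


def butnot(left, right):
    """Entities that exhibit the first symptom but not the second (difference)."""
    return left - right


if __name__ == "__main__":
    # Hypothetical filter results for two symptoms.
    many_incoming_calls = {"Orders", "Billing"}
    slow_response = {"Orders", "Search"}
    print(sorted(and_(many_incoming_calls, slow_response)))
    print(sorted(butnot(many_incoming_calls, slow_response)))
```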

1.5 Detection Strategy: Combining Rules

This section illustrates the construction of a detection strategy using the “God Service” anti-pattern as an example. The starting point is one or more informal rules that describe the problem situation. In this example, we proceed from three heuristics found in Riel's book [34] for the “God Object” anti-pattern, originally formulated for object-oriented systems:

· The top-level services should share equally the responsibility.

· Services should not contain large amounts of semantically separate functions.

· Services should not have access to fields or properties of other services.

The initial step in creating a detection strategy is to translate the set of informal rules into symptoms that can be evaluated by a particular metric. In the case of the God Object anti-pattern, the first rule refers to an equal sharing of responsibilities among services, and therefore it refers to service complexity. The second rule concerns the intensity of communication between this service and all other services; thus, it refers to the low cohesion of services. The third heuristic describes a special kind of coupling, i.e., direct access to data items manipulated by other services. In this case, the symptom is access to “foreign” data.

The second step is to find appropriate metrics that evaluate each of the discovered properties more precisely. For the God Service anti-pattern, these properties are the complexity of the service, the cohesion of the service, and access to data of other services. Therefore, the following set of metrics was chosen:

· Weighted Method Count (WMC) is the sum of the static complexity of all methods in a class [7]. We considered the McCabe's approach as a complexity measure [26].

· Tight Class Cohesion (TCC) is the relative number of directly connected methods [5].

· Access to Foreign Data (ATFD) represents the number of external classes from which a given class accesses attributes, directly or via accessor-methods [24].

The next step is to select an appropriate filtering scheme that should be applied to all metrics. This step is mainly done basing on the rules described earlier. Therefore, as the first symptom is a “high service complexity” the TopValues relative semantical filter was chosen for the WMC metric. For the “low cohesion” symptom it was also chosen a relative semantical filter, but now the BottomValues one. For the third symptom, an absolute filter was selected as we need to catch any try to access a “foreign” data; thus, we the HigherThan filter will be used.

A vital issue in creating a detection strategy is choosing proper parameters (i.e., threshold values) for all data filters. Several approaches exist, but here we simply take a value of 25% for both the TopValues filter on the WMC metric and the BottomValues filter on the TCC metric. The filter boundary for the ATFD metric is simple to decide: no direct access to the data of other services should be allowed; therefore, the threshold value is 1.

The final step is to join all the symptoms by applying the composition operators described before. From the unstructured heuristics presented in [34], it was inferred that all three symptoms must hold together for a service to be considered a behavioral God Object.
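The strategy assembled in these steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names `top_values`, `bottom_values` and `higher_than`, the metric values, and the service names are all hypothetical. The text does not state whether the HigherThan boundary is inclusive, so the sketch assumes that a value equal to the threshold already counts.

```python
from typing import Dict, Set

Metric = Dict[str, float]  # service name -> metric value


def top_values(metric: Metric, fraction: float) -> Set[str]:
    """Relative filter: the given fraction of services with the highest values."""
    count = max(1, round(len(metric) * fraction))
    return set(sorted(metric, key=metric.get, reverse=True)[:count])


def bottom_values(metric: Metric, fraction: float) -> Set[str]:
    """Relative filter: the given fraction of services with the lowest values."""
    count = max(1, round(len(metric) * fraction))
    return set(sorted(metric, key=metric.get)[:count])


def higher_than(metric: Metric, threshold: float) -> Set[str]:
    """Absolute filter. Assumption: the boundary value itself is included."""
    return {name for name, value in metric.items() if value >= threshold}


# Hypothetical metric values for eight services.
wmc = {"S1": 120, "S2": 30, "S3": 95, "S4": 12,
       "S5": 40, "S6": 25, "S7": 18, "S8": 22}
tcc = {"S1": 0.10, "S2": 0.80, "S3": 0.60, "S4": 0.90,
       "S5": 0.15, "S6": 0.70, "S7": 0.50, "S8": 0.65}
atfd = {"S1": 4, "S2": 0, "S3": 2, "S4": 0,
        "S5": 1, "S6": 0, "S7": 0, "S8": 0}

# All three symptoms must hold, so the filtered sets are intersected.
suspects = top_values(wmc, 0.25) & bottom_values(tcc, 0.25) & higher_than(atfd, 1)
print(suspects)
```

With eight services, the 25% relative filters each keep two services, and only a service that passes all three filters remains in `suspects`.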

The intention of this work is to use detection strategies in rule definitions in order to facilitate the detection of antipatterns in service-based software systems, i.e., to select those areas of the system (subsystems) that participate in a particular antipattern. From this point of view, it should be emphasized that the detection strategy approach, and the method as a whole, is not limited to finding problems; it can serve completely different objectives as well. For instance, it could be applied in reverse engineering [6], design pattern detection [24], identification of components in legacy systems [41], etc.

Chapter 2. Antipattern Detection Application

2.1 Metadata of metric model. Rule Cards Specification

Let us consider detection rules for metric-based detection of antipatterns. Each rule contains a number of metrics with predefined numerical thresholds; these are called symptoms. Each metric has its own calculation algorithm, whose execution yields the value of the metric. This value is then compared with the one written in the rule card. If the calculated value matches the predefined one, the symptom is present; otherwise, the symptom is absent. The metamodel is shown in fig. 2.1.

Figure 2.1. Rule card scheme.

Each rule consists of a pair: a metric and a value. The metric determines the method of calculation, and the calculated value is then compared with the value stated in the rule.

2.2 Metric Calculation Algorithms

Calculations intended to detect antipatterns are based on several basic metrics:

1) incoming call rate;

2) outcoming call rate;

3) response time;

4) number of service connections;

5) cohesion with other services;

6) etc.

Each metric has its own model and its own calculation algorithm. The values of these metrics have a decisive influence on the detection of services participating in antipatterns.

In the calculation of metrics, objective data mining measures of pattern interestingness, such as confidence and support, are used. These are based on the structure of discovered patterns and the statistics underlying them.

An objective measure for association rules of the form $X \Rightarrow Y$ is rule support, representing the percentage of transactions from a log database that the given rule satisfies. It is taken to be the probability $P(X \cup Y)$, where $X \cup Y$ indicates that a transaction contains both X and Y, that is, the union of item sets X and Y.

Another objective measure for association rules from data mining is confidence, which assesses the degree of certainty of the detected association. In classical data mining it is taken to be the conditional probability $P(Y \mid X)$, that is, the probability that a transaction containing X also contains Y. More formally, confidence and support are defined as

$$\mathrm{support}(X \Rightarrow Y) = P(X \cup Y),$$

$$\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X).$$

In general, each measure of interestingness is associated with a threshold, which may be controlled. In the calculation of metrics, the final value of each metric is the confidence (calculated in a more complex way than in classical data mining) divided by the support (calculated in the same manner as in classical data mining).
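The classical definitions of support and confidence can be illustrated with a short Python sketch. It covers only the textbook measures; the modified confidence used in this work is not specified in detail here and is therefore not reproduced. The function names and the sample traces are hypothetical, and each trace is treated as one transaction containing the set of services that occur in it.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], x: Transaction, y: Transaction) -> float:
    """Share of transactions that contain every item of X and of Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions: List[Transaction], x: Transaction, y: Transaction) -> float:
    """Among transactions containing X, the share that also contain Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y <= t) / len(with_x)


# Hypothetical traces: the set of services seen in each trace.
traces = [frozenset(t) for t in (
    {"Srv1", "Srv2"},
    {"Srv1", "Srv2", "Srv4"},
    {"Srv1", "Srv3"},
    {"Srv3"},
)]
x, y = frozenset({"Srv1"}), frozenset({"Srv2"})
print(support(traces, x, y), confidence(traces, x, y))
```

Here two of the four traces contain both services, and two of the three traces containing Srv1 also contain Srv2.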

Each metric is described in more detail below.

2.2.1 Incoming and Outcoming Call Rates

The model for calculating the IncomingCallRate metric is the call matrix, which represents the calls services make to each other. To build this matrix and some other models, we need to identify the order of calls. This information is not stored in logs; therefore, the first task is to mine service calls from the log.

The procedure of mining calls consists of several main steps. The first is grouping log events by trace. This is necessary because the occurrence of events in a particular order within one trace is evidence of a particular service call. To mine all the service calls properly, events must also be sorted chronologically within every trace. Once ordering on both levels (trace and timestamp) is finished, we can go through the log and reconstruct service calls.

The values in the mined matrix represent the aggregate number of calls among services and are used for both IncomingCallRate and OutcomingCallRate.
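The mining procedure can be sketched as follows. This is an illustration under a stated assumption rather than the algorithm of the prototype: the text does not define exactly how a call is recognized, so the sketch assumes that two consecutive events of one trace that belong to different services indicate a call from the earlier service to the later one. The names `mine_calls`, `incoming` and `outgoing` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, str, str]  # (TraceID, ServiceID, FunctionID, Timestamp)


def mine_calls(events: Iterable[Event]) -> Counter:
    """Return a sparse call matrix: (caller, callee) -> number of calls."""
    by_trace: Dict[int, List[Tuple[datetime, str]]] = defaultdict(list)
    for trace_id, service, _function, stamp in events:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_trace[trace_id].append((moment, service))

    calls: Counter = Counter()
    for trace_events in by_trace.values():
        trace_events.sort(key=lambda item: item[0])  # chronological order
        for (_, caller), (_, callee) in zip(trace_events, trace_events[1:]):
            if caller != callee:  # assumption: a change of service is a call
                calls[(caller, callee)] += 1
    return calls


def incoming(calls: Counter, service: str) -> int:
    """Total number of calls received by the service."""
    return sum(n for (_, callee), n in calls.items() if callee == service)


def outgoing(calls: Counter, service: str) -> int:
    """Total number of calls made by the service."""
    return sum(n for (caller, _), n in calls.items() if caller == service)


# Hypothetical, deliberately unsorted log rows.
log = [
    (1, "Srv1", "A", "2015-06-15 00:33:24"),
    (1, "Srv2", "C", "2015-06-15 00:25:20"),
    (2, "Srv2", "C", "2015-06-15 00:29:51"),
    (2, "Srv1", "A", "2015-06-15 00:29:48"),
    (2, "Srv4", "F", "2015-06-15 00:32:25"),
]
calls = mine_calls(log)
print(dict(calls))
```

A sparse counter keyed by (caller, callee) pairs is used instead of a dense matrix because most pairs of services never call each other.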

2.2.2 Response Time

The response time metric characterizes the overall throughput of a particular service. This parameter is crucial for systems under high load. The calculation of this metric relies on the assumptions made for the IncomingCallRate and OutcomingCallRate metrics, with some modifications. As the aim here is to measure a time characteristic, the object of study is the timestamp parameter of the log.

Given the algorithm defined for incoming and outcoming call rates, we extend it with a time calculation. Instead of just the number of calls, we calculate the total length of service responses. In this way, the total time during which the service was busy is calculated. The result of these calculations is a matrix of the total time every service spent working. The following step is to normalize the values, i.e., to express them as relative rather than absolute numbers. This relative number shows the percentage of time during which the service was processing calls. The metric can be used to detect both highly loaded services and rarely used services.
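A possible reading of this calculation is sketched below. Two assumptions are made that the text does not confirm: a service is considered busy from its own event until the next event of the same trace (the last event of a trace has no measurable duration), and the normalization divides by the total busy time of all services. The function name `busy_share` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

FORMAT = "%Y-%m-%d %H:%M:%S"


def busy_share(events: Iterable[Tuple[int, str, str]]) -> Dict[str, float]:
    """Return service -> share of the total measured busy time."""
    by_trace: Dict[int, List[Tuple[datetime, str]]] = defaultdict(list)
    for trace_id, service, stamp in events:
        by_trace[trace_id].append((datetime.strptime(stamp, FORMAT), service))

    busy: Dict[str, float] = defaultdict(float)
    for trace_events in by_trace.values():
        trace_events.sort(key=lambda item: item[0])
        # Assumption: a service works until the next event of the trace occurs.
        for (start, service), (end, _next) in zip(trace_events, trace_events[1:]):
            busy[service] += (end - start).total_seconds()

    total = sum(busy.values())
    return {s: seconds / total for s, seconds in busy.items()} if total else {}


# Hypothetical rows: (TraceID, ServiceID, Timestamp).
log = [
    (1, "Srv2", "2016-06-03 22:00:00"),
    (1, "Srv1", "2016-06-03 22:00:10"),
    (1, "Srv2", "2016-06-03 22:00:40"),
    (2, "Srv3", "2016-06-03 22:00:00"),
    (2, "Srv1", "2016-06-03 22:00:20"),
    (2, "Srv3", "2016-06-03 22:00:40"),
]
share = busy_share(log)
print(share)
```

In this sample Srv1 accumulates 50 of the 80 measured seconds, so it would stand out as the most loaded service.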

2.2.3 Cohesion with Other Services

This metric is calculated using classical data mining rules.

The conditional probability $P(Y \mid X)$ is taken, that is, the probability that a transaction containing X also contains Y. Additionally, a special ordering rule is added: $X \Rightarrow Y$ and $Y \Rightarrow X$ are different relations, i.e., we observe not only co-occurrence within one trace but also the order of occurrences.

A high confidence value for this metric is interpreted as high cohesion between services and, therefore, high behavioral dependency.
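The effect of the ordering rule can be shown with a small Python sketch. The function name `ordered_confidence` and the sample traces are hypothetical, and the exact formula of the prototype may differ; here a trace counts as a hit only if Y occurs after the first occurrence of X.

```python
from typing import List


def ordered_confidence(traces: List[List[str]], x: str, y: str) -> float:
    """Share of traces containing x in which y occurs after the first x."""
    with_x = [t for t in traces if x in t]
    if not with_x:
        return 0.0
    hits = sum(1 for t in with_x if y in t[t.index(x) + 1:])
    return hits / len(with_x)


# Hypothetical traces: services in chronological order of their events.
traces = [
    ["Srv2", "Srv1", "Srv2"],
    ["Srv4", "Srv1", "Srv2"],
    ["Srv3", "Srv1", "Srv3"],
    ["Srv4", "Srv1", "Srv2"],
]
print(ordered_confidence(traces, "Srv1", "Srv2"))
print(ordered_confidence(traces, "Srv2", "Srv1"))
```

The two calls give different values for the same pair of services, which is exactly the asymmetry that the ordering rule introduces.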

2.2.4 Number of Service Connections

All the previous metrics are dynamic characteristics of the system under consideration, while the number of service connections is a static property of the system. To mine this property, the incidence matrix of service calls is sufficient. If one service has called another service once, further occurrences of the same connection are not counted. The resulting incidence matrix allows us to count all existing connections in the system.
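A minimal sketch of this counting is given below. The name `connection_counts` and the sample calls are hypothetical, and the sketch assumes that a connection is counted once per pair of services regardless of its direction, which the text does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def connection_counts(calls: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Return service -> number of distinct services it is connected to."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in set(calls):  # repeated calls collapse to one entry
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)  # assumption: direction is ignored
    return {service: len(others) for service, others in neighbours.items()}


# Hypothetical mined calls (caller, callee); duplicates are intentional.
calls = [
    ("Srv4", "Srv1"),
    ("Srv1", "Srv2"),
    ("Srv2", "Srv1"),
    ("Srv4", "Srv1"),
    ("Srv3", "Srv1"),
]
counts = connection_counts(calls)
print(counts)
```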

The basic model for calculating each metric is the graph model (fig. 2.2), which is extended with specific attributes in each particular metric calculation algorithm. In this work, it is assumed that each object, once it appears in the system, initiates a sequence of operations to be performed on it. This sequence of operations is called a workflow. It is worth noting that not every service-oriented system is based on this principle, but we consider only such systems.

Figure 2.2. Base graph model for calculation of metrics.

In this model, services of a software system are represented by graph nodes. Arcs of the graph represent the call relation, i.e., a directed arc from Srv1 to Srv2 shows that Srv1 calls one of the functions of Srv2 during its operation. Depending on which metric is to be calculated, the arcs of the graph are labeled with specific values. For example, in fig. 2.2 arcs are labeled with the number of calls in a particular direction and, in parentheses, a weighted value of the transmitted data.

2.3 Implementation

To automate the process of antipattern detection, a research prototype of an information system implementing the described approach has been developed. It is described below in this section.

2.3.1 General Workflow Structure

The workflow of the software system consists of several steps. At the point of entry, the program takes a log, which is read from a relational database implemented in SQL Server, and a rule card describing the rules for detecting a particular antipattern.

The general workflow structure is presented in fig. 2.3. It starts with reading the input data, which are:

1) a log from a software system implemented according to SOA principles;

2) a rule card describing all the rules and metrics needed for detection of each particular antipattern.

Figure 2.3. Workflow model of the program.

Once the XML with the antipattern description is read, the system starts calculating metrics. Each metric is calculated by its specific algorithm; therefore, a metric calculation process is launched for each rule. First, the model used for analysis of the particular metric is built. All the models were defined previously. Then the metric is calculated using the resulting model. As a result of this process, services suspected of participating in the antipattern are selected.

The next step is to integrate the results obtained in the metric calculation threads. The integration is performed as the intersection of the result sets from these threads. Finally, we obtain the set of suspicious services that are part of the antipattern. Commonly there are several such services, but it is always possible that just one service represents the antipattern or that no such services are discovered at all.
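The integration step can be sketched as follows. The text speaks of threads of calculation, but it does not say whether the prototype uses operating system threads; the thread pool below merely mirrors that wording. The function `detect` and the three rule checks are hypothetical stand-ins that return fixed sets instead of calculating real metrics.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Set

Log = Sequence[tuple]
RuleCheck = Callable[[Log], Set[str]]


def detect(rule_checks: List[RuleCheck], log: Log) -> Set[str]:
    """Run every rule check on the log and intersect the suspected services."""
    if not rule_checks:
        return set()
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda check: check(log), rule_checks))
    return set.intersection(*result_sets)


# Hypothetical rule checks with fixed results.
def high_response(log: Log) -> Set[str]:
    return {"Srv1", "Srv2"}


def low_availability(log: Log) -> Set[str]:
    return {"Srv1"}


def high_coupling(log: Log) -> Set[str]:
    return {"Srv1", "Srv2", "Srv4"}


suspicious = detect([high_response, low_availability, high_coupling], log=[])
print(suspicious)
```

An empty result is a valid outcome: it means that no service exhibits all symptoms of the antipattern at once.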

2.3.2 Rule Cards XML structure

The rule cards are stored in XML format. The structure of the XML follows the rule card structure described above. The XML schema is presented graphically in fig. 2.4 and, in more detail, as an XSD in fig. 2.5. The XML must use the dedicated namespace “RuleCardNS”. The root element is “RuleCard”. It has a name element called “AntipatternName”, which also serves as the identifier.

Each rule is defined by a type attribute, a metric value, and its own name. The type attribute specifies which metric (from the set of available metrics) should be calculated. The metric value is the specific value of the calculated metric that indicates whether the service under analysis exhibits a particular symptom. Finally, the rule name identifies the rule.

Figure 2.4. Structure of antipattern XML.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           targetNamespace="RuleCardNS">
  <xsd:element name="RuleCard">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="AntipatternName" type="xsd:string" />
        <xsd:element name="Rules">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element maxOccurs="unbounded" minOccurs="1" name="Rule">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="RuleName" type="xsd:string" />
                    <xsd:element name="MetricValue" type="xsd:string" />
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xs:schema>

Figure 2.5. Detailed structure of antipattern XML.
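Reading a rule card of this structure can be illustrated with the Python standard library. The sample document is hypothetical and uses only the elements that appear in the XSD listing (the listing contains no type attribute, so none is shown); the prototype itself is implemented in C#, so this is only a sketch of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule card instance in the "RuleCardNS" namespace.
SAMPLE = """
<RuleCard xmlns="RuleCardNS">
  <AntipatternName>BottleneckService</AntipatternName>
  <Rules>
    <Rule>
      <RuleName>HighResponse</RuleName>
      <MetricValue>HIGH</MetricValue>
    </Rule>
    <Rule>
      <RuleName>LowAvailability</RuleName>
      <MetricValue>LOW</MetricValue>
    </Rule>
  </Rules>
</RuleCard>
"""

NS = {"rc": "RuleCardNS"}  # prefix used only inside the search paths

root = ET.fromstring(SAMPLE.strip())
antipattern = root.findtext("rc:AntipatternName", namespaces=NS)
rules = [
    (rule.findtext("rc:RuleName", namespaces=NS),
     rule.findtext("rc:MetricValue", namespaces=NS))
    for rule in root.findall("rc:Rules/rc:Rule", NS)
]
print(antipattern, rules)
```

Because the schema sets elementFormDefault to "qualified", every element must be looked up with the namespace, which is why the search paths carry the `rc:` prefix.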

2.3.3 Log Structure

The main weakness of the previously reviewed works was the need to modify the source code of a particular system in order to evaluate concrete metrics. In this work we use event logs to create a process model of the system and calculate metrics based on this model. To apply this approach, it is assumed that the information system records event data. These logs also contain unstructured and irrelevant data, e.g., information on hardware components, errors and recovery information, and internal temporary variables of the system. Therefore, extraction of data from log files is a non-trivial task and a necessary pre-processing step for analysis. Data related to business processes and their executions are extracted from these log files. Such data are called process trace data. For example, typical process trace data include the process instance id, activity name, activity originator, timestamps, and data about the elements involved. The extracted data are converted into the required format. For a log to be analyzable, it must have a specified structure. The minimal requirements for the log are as follows:

1) TraceID: the identifier of a particular trace;

2) ServiceID: the identifier of a particular service;

3) FunctionID: the identifier of a particular function in the service;

4) Timestamp: the time of occurrence of a particular event.

The log sample is presented in Table 2.

Table 2. Source Log Sample

TraceId   Service   Function   TimeStamp
1         Srv2      C          2015-06-15 00:25:20
1         Srv1      A          2015-06-15 00:33:24
2         Srv4      F          2015-06-15 00:32:25
3         Srv3      E          2015-06-15 00:24:13
1         Srv2      C          2015-06-15 00:31:52
3         Srv1      B          2015-06-15 00:34:05
4         Srv4      G          2015-06-15 00:25:12
3         Srv3      E          2015-06-15 00:26:28
4         Srv1      A          2015-06-15 00:28:21
4         Srv2      C          2015-06-15 00:30:32
2         Srv1      A          2015-06-15 00:29:48
2         Srv2      C          2015-06-15 00:29:51

Each field included in the log has its own purpose. TraceID is needed to distinguish events among execution sequences, i.e., for the majority of metrics it is necessary to connect events within one trace. Moreover, events inside a trace must be considered in chronological order; that is why the timestamp is included in the log format.

ServiceID and FunctionID describe the source of each particular event. In addition, the function and service dimensions are the main structural units in the analysis and in the creation of models.
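The four required fields map naturally onto a small record type. The sketch below is illustrative only: the class `LogEvent` and the helper functions are hypothetical names, and the rows are the first five rows of the log sample from Table 2. It shows why both TraceID and Timestamp are needed to bring an unsorted log into a usable order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class LogEvent:
    trace_id: int
    service_id: str
    function_id: str
    timestamp: datetime


def parse_row(row: Tuple[int, str, str, str]) -> LogEvent:
    """Convert one raw log row into a typed event."""
    trace_id, service_id, function_id, stamp = row
    return LogEvent(trace_id, service_id, function_id,
                    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))


def order_log(events: List[LogEvent]) -> List[LogEvent]:
    """Order events by trace first and chronologically within each trace."""
    return sorted(events, key=lambda e: (e.trace_id, e.timestamp))


rows = [
    (1, "Srv2", "C", "2015-06-15 00:25:20"),
    (1, "Srv1", "A", "2015-06-15 00:33:24"),
    (2, "Srv4", "F", "2015-06-15 00:32:25"),
    (3, "Srv3", "E", "2015-06-15 00:24:13"),
    (1, "Srv2", "C", "2015-06-15 00:31:52"),
]
ordered = order_log([parse_row(row) for row in rows])
for event in ordered:
    print(event.trace_id, event.service_id, event.function_id, event.timestamp)
```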

2.3.4 Application Architecture

The architecture consists of three layers (fig. 2.6). The first layer is the data layer, which stores log data. SQL Server (Express edition) was chosen as the database for storing logs. The log structure was described in the previous section.

Figure 2.6. Application architecture.

The client application is developed using the ASP.NET MVC platform. ASP.NET is a part of the .NET Framework provided by Microsoft. This platform divides the logic into three layers: the data model layer, the business logic layer, and the user interface layer. The interface part is developed with Razor markup and JavaScript, and the logic is implemented in the C# programming language.

2.3.5 Graphical Representation

The results of the analysis are depicted as a general graph (fig. 2.7). Nodes in this graph are services, and edges are direct references among services.

Each node represents one service observed in the system whose log has been analyzed. For example, in fig. 2.7 the nodes for services Srv2, Srv3, and Srv4 represent properly developed services, i.e., they do not participate in antipatterns. Suspicious services are marked with a “!” sign, which means that the service is a part (or the whole) of an antipattern. In our example this is service number 1 (Srv1).

Edges represent calls made from one service to another. In the example from fig. 2.7, Srv4 calls Srv1; therefore, one edge directed from Srv4 to Srv1 is depicted. Srv1 and Srv2 call functions of each other; therefore, the edge between them is bidirectional.

Figure 2.7. Graphical representation of results.

As an additional feature, the user can rescale the displayed picture. Moreover, the whole graph can be laid out automatically in a shape such as a line, a tree, or a circle.

Chapter 3. Experiments

To demonstrate the proposed approach, we performed experiments that consisted in specifying the antipatterns presented in the previous sections and detecting them automatically. Specifically, these experiments aim to show the extensibility of the language for specifying new antipatterns, the accuracy and efficiency of the detection algorithms, and the overall correctness of the underlying framework. The experiments were organized in two steps.

1. Analysis of artificial execution traces prepared in advance.

2. Analysis of execution traces of a real software system.

In the first case, the logs were prepared so as to demonstrate one particular antipattern. In the second case, the developed approach was applied to real logs, and the results were validated by members of the team developing that system.

All experiment data can be accessed using specially prepared demo accounts in the system. The system is hosted on Azure: http://belleteyn.azurewebsites.net/.

3.1 Bottleneck Service

Bottleneck Service is a service that is highly used by other services or clients. It has high incoming and outgoing coupling. Its response time can be high because it may be used by too many external clients, so clients may need to wait to get access to the service. Moreover, its availability may also be low due to the traffic.

RULE CARD: BottleneckService {
    RULE: BottleneckService { INTER LowPerformance HighCoupling };
    RULE: LowPerformance { INTER LowAvailability HighResponse };
    RULE: HighResponse { RT HIGH };
    RULE: LowAvailability { A LOW };
    RULE: HighCoupling { CPL VERY HIGH };
};

Here RT is a “response time” metric, A is an “availability” metric and CPL is a “coupling” metric.
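How such a rule card resolves to a set of services can be sketched in Python. The dictionary encoding of the card, the function `evaluate`, and the per-service levels are hypothetical; in particular, the sketch assumes that every metric has already been mapped to an ordinal level such as HIGH or LOW, a step that is not shown here.

```python
from typing import Dict, Set, Tuple

# The rule card above, encoded as rule name -> (operator or metric, operands).
RULE_CARD: Dict[str, Tuple[str, ...]] = {
    "BottleneckService": ("INTER", "LowPerformance", "HighCoupling"),
    "LowPerformance": ("INTER", "LowAvailability", "HighResponse"),
    "HighResponse": ("RT", "HIGH"),
    "LowAvailability": ("A", "LOW"),
    "HighCoupling": ("CPL", "VERY HIGH"),
}

# Hypothetical ordinal levels of each metric per service.
LEVELS: Dict[str, Dict[str, str]] = {
    "Srv1": {"RT": "HIGH", "A": "LOW", "CPL": "VERY HIGH"},
    "Srv2": {"RT": "HIGH", "A": "MEDIUM", "CPL": "VERY HIGH"},
    "Srv3": {"RT": "LOW", "A": "HIGH", "CPL": "LOW"},
}


def evaluate(name: str, card: Dict[str, Tuple[str, ...]],
             levels: Dict[str, Dict[str, str]]) -> Set[str]:
    """Return the services that satisfy the named rule."""
    head, *rest = card[name]
    if head == "INTER":  # intersection of the referenced rules
        return set.intersection(*[evaluate(r, card, levels) for r in rest])
    metric, expected = head, rest[0]  # leaf rule: metric and required level
    return {s for s, values in levels.items() if values.get(metric) == expected}


result = evaluate("BottleneckService", RULE_CARD, LEVELS)
print(result)
```

Srv2 shares two of the three symptoms with Srv1 but is filtered out by the LowAvailability rule, which shows how INTER narrows the result.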

Let us consider an event log representing the “Bottleneck Service” antipattern. The log is presented in table 3.

Table 3. Sample event log for the “Bottleneck Service” antipattern.

TraceId   ServiceName   FunctionName   OccurenceTime
1         Srv2          C              2016-06-03 22:00:09.013
1         Srv1          A              2016-06-03 22:01:09.013
2         Srv4          F              2016-06-03 22:01:09.013
3         Srv3          E              2016-06-03 22:02:09.013
1         Srv2          C              2016-06-03 22:04:09.013
4         Srv4          G              2016-06-03 22:05:09.013
3         Srv1          B              2016-06-03 22:05:09.013
3         Srv3          E              2016-06-03 22:06:09.013
4         Srv1          A              2016-06-03 22:07:09.013
4         Srv2          C              2016-06-03 22:08:09.013
2         Srv1          A              2016-06-03 22:09:09.013
2         Srv2          C              2016-06-03 22:10:09.013

In this example the Srv1 service is an obvious center of interaction, as every other service calls it. Moreover, each service waits for a response from Srv1. The result of the antipattern detection algorithm is presented in fig. 3.1.
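The central role of Srv1 can be checked directly on the rows of the sample log. The sketch below is not the detection algorithm of the prototype; it assumes, as an illustration, that two consecutive events of one trace that belong to different services indicate a call from the earlier service to the later one, and it simply counts such calls.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Rows of the sample event log: TraceId, ServiceName, FunctionName, OccurenceTime.
LOG = """\
1 Srv2 C 2016-06-03 22:00:09.013
1 Srv1 A 2016-06-03 22:01:09.013
2 Srv4 F 2016-06-03 22:01:09.013
3 Srv3 E 2016-06-03 22:02:09.013
1 Srv2 C 2016-06-03 22:04:09.013
4 Srv4 G 2016-06-03 22:05:09.013
3 Srv1 B 2016-06-03 22:05:09.013
3 Srv3 E 2016-06-03 22:06:09.013
4 Srv1 A 2016-06-03 22:07:09.013
4 Srv2 C 2016-06-03 22:08:09.013
2 Srv1 A 2016-06-03 22:09:09.013
2 Srv2 C 2016-06-03 22:10:09.013
"""

by_trace = defaultdict(list)
for line in LOG.splitlines():
    trace_id, service, _function, date, time = line.split()
    moment = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    by_trace[trace_id].append((moment, service))

incoming, outgoing = Counter(), Counter()
callers = defaultdict(set)
for events in by_trace.values():
    events.sort(key=lambda item: item[0])  # chronological order within a trace
    for (_, caller), (_, callee) in zip(events, events[1:]):
        if caller != callee:  # assumption: a change of service is a call
            outgoing[caller] += 1
            incoming[callee] += 1
            callers[callee].add(caller)

for service in sorted(set(incoming) | set(outgoing)):
    print(service, incoming[service], outgoing[service])
```

Under this assumption Srv1 receives a call in every trace and is called by all three other services, which agrees with the description of the example.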

Figure 3.1. Result of detecting the “Bottleneck Service” antipattern.

...
