Methods for evaluating the quality of knowledge extraction systems for weakly structured subject areas




GOVERNMENT OF THE RUSSIAN FEDERATION

Federal State Autonomous Educational Institution

of Higher Education

National Research University

Higher School of Economics

Graduation qualification thesis

Methods for evaluating the quality of knowledge extraction systems for weakly structured subject areas

Anastasia Khorosheva

Moscow 2020

Table of contents

General statement

Thesis statement

Theory

Chapter Conclusion

Methods

Data

Experiment

Results

Analysis

Discussion

Conclusion

References

General statement

This work's objective is to analyze metrics for Information Extraction, especially for ill-defined domains. According to Dan Jurafsky, Information Extraction refers to the "process of extracting limited kinds of semantic content from text" [1].

Traditionally, Information Extraction relies on metrics such as Precision and Recall, and on those that can be derived from them (like the F-measure). These metrics, however, are sometimes not tailored to specific tasks or domains, and sometimes are not as precise as they could be; hence there is room to improve them. The research and methodology behind this work aim to do so.
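For reference, the two core metrics and the F-measure can be computed directly from raw outcome counts. The sketch below uses the standard definitions; the function and variable names are illustrative, not taken from any system discussed in this work.

```python
def precision_recall_f1(tp, fp, fn):
    """Core IE metrics from counts of true positives (tp),
    false positives (fp) and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

For example, a system with 8 correct, 2 spurious and 2 missed extractions scores 0.8 on all three measures.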

Information Extraction is thought of as the task of finding useful information in a collection of texts from some domain of interest. Information about some domain structured in a certain way is called knowledge. Knowledge must be represented in the machine-understandable (formal) way.

Knowledge domains are usually divided into well-defined (well-structured) and ill-defined ones. This work mainly focuses on the latter ones.

Ill-defined domains involve concepts and relationships that are hard to identify formally: for instance, the concepts of "quality of life" or "disease prevention" in the medical domain. Each of them may be represented as a set of elements expressed in different ways (one can think of them as a set of entities and relations).

Moreover, the ill-defined domains are challenging to deal with as they comprise various sources of knowledge from multiple disciplines. However, before constructing a base, the knowledge must be extracted from the source.

Knowledge extraction is the process of eliciting valuable information (knowledge) from structured (databases, RDF, XML) and unstructured (text, audio, images, etc.) sources into a machine-interpretable format. Knowledge-extraction systems are built with the intention of generating a schema based on the source data or of reusing existing formal knowledge. At present, the best knowledge extraction systems employ both linguistic methods and machine learning, with architecture built around such tasks as NER + Coreference + Fact Extraction [see Appendix 1]. In any knowledge extraction system it is important to evaluate the performance.

Thesis statement

The research is focused on metrics for successful knowledge extraction in ill-defined domains (such as the medical one). This task is challenging because of the above-mentioned peculiarities of ill-defined domains. With the research question being "are there any scalable parameters that can be useful for tidying up Precision and Recall?", we expect to find a certain parameter (or parameters) that would affect the main metrics in an important way. The aim of this work is to show, by finding such a parameter, that True Positives, False Positives, True Negatives and False Negatives are scalable as constituents of Precision and Recall.

Theory

This section gives a general overview of the advances in evaluating Information Extraction, its metrics, and the particular application of ontologies to this task.

Among the first efforts is the one taken by Wendy Lehnert and Claire Cardie [2], who suggested various components from which to construct Precision and Recall; this approach was implemented at the Third Message Understanding Conference (MUC-3). However, this approach was resource-consuming, as it required a substantial amount of human evaluation.

The next interesting paper to mention is "A second look at Egghe's universal IR surface and a simple derivation of a complete set of universal IR evaluation points" by M. Shatkun (2010) [3]. The work builds on three of L. Egghe's papers (2004, 2007, 2008) [4-6] in an attempt to find a complete set of universal evaluation points: Precision, Recall, Fallout and Miss. (Though Shatkun studies the task of Information Retrieval while this work focuses on Information Extraction, certain evaluation metrics overlap and can be used in both tasks.)

Each component in the quadruple (P,R,F,M) may be of zero- or non-zero value with restrictions on which values may be inserted into each component; then valid cases are counted. Each relevance metric from the quadruple and its complement contribute to Egghe's original equation:
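The equation itself is not reproduced in the source. Restored from Egghe's papers, with the Swets counts a (relevant retrieved), b (non-relevant retrieved), c (relevant missed) and d (non-relevant rejected), it reads:

```latex
% Egghe's universal IR surface equation (Egghe 2004), restored from the
% definitions P = a/(a+b), R = a/(a+c), F = b/(b+d), M = c/(c+d):
\[
\frac{1-P}{P}\cdot\frac{R}{1-R}\cdot\frac{1-F}{F}\cdot\frac{M}{1-M} = 1
\]
```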

Shatkun in his paper further represents the metrics P, R, F, M and their complements in terms of the Swets [7] variables a, b, c, d by the table:
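The table itself is missing from the source; the standard correspondence between the four relevance metrics, their complements, and the Swets contingency counts a, b, c, d is:

```latex
\[
P = \frac{a}{a+b}, \qquad R = \frac{a}{a+c}, \qquad
F = \frac{b}{b+d}, \qquad M = \frac{c}{c+d}
\]
\[
1-P = \frac{b}{a+b}, \quad 1-R = \frac{c}{a+c}, \quad
1-F = \frac{d}{b+d}, \quad 1-M = \frac{d}{c+d}
\]
```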

Then he derives Egghe's original equation using Swets variables:
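The derivation, omitted in the source, amounts to substituting the Swets expressions and cancelling:

```latex
\[
\frac{1-P}{P}\cdot\frac{R}{1-R}\cdot\frac{1-F}{F}\cdot\frac{M}{1-M}
= \frac{b}{a}\cdot\frac{a}{c}\cdot\frac{d}{b}\cdot\frac{c}{d} = 1
\]
```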

Given that a, b, c, d are natural numbers and P, R, F, M are rational numbers strictly between 0 and 1, Egghe's original equation represents the most important (P, R, F, M) metric quadruples. Since each quadruple element can take one of four possible values (0, 1, 0/0, or a proper fraction), as Shatkun states, this yields 256 possible quadruples. However, not all of them are valid for actual information extraction evaluation.

Shatkun finds exactly 15 valid, additional quadruples:
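The list itself is not reproduced here, but the idea of enumerating value patterns can be illustrated by brute force over small Swets counts. This is not Shatkun's actual derivation, only a sketch of how quadruple patterns arise; all names are illustrative.

```python
from fractions import Fraction

def classify(num, den):
    """Map a metric value to one of the four kinds: 0/0, 0, 1, or a fraction."""
    if den == 0:
        return "0/0"
    value = Fraction(num, den)
    if value == 0:
        return "0"
    if value == 1:
        return "1"
    return "frac"          # a proper fraction strictly between 0 and 1

def quadruple(a, b, c, d):
    """(P, R, F, M) value pattern for Swets counts a, b, c, d."""
    return (classify(a, a + b),    # Precision
            classify(a, a + c),    # Recall
            classify(b, b + d),    # Fallout
            classify(c, c + d))    # Miss

# enumerate all patterns reachable with small contingency counts
patterns = {quadruple(a, b, c, d)
            for a in range(3) for b in range(3)
            for c in range(3) for d in range(3)}
```

Even with counts up to 2, the enumeration already produces the all-fraction "surface" pattern, the degenerate all-0/0 pattern, and mixed boundary cases such as (1, 1, 0, 0).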

To sum up, this paper concentrates on finding the complete range of valid values for the quadruples (P, R, F, M) as it applies to any universal IR evaluation system.

According to Shatkun, points 1-6 are valid continuous extensions of Egghe's "universal IR surface". The remaining cases 7-15, containing the form 0/0, are "off the surface" and yet to be proven valid (which is beyond the scope of this paper).

The next major work is "Оценка систем извлечения информации из текстов на естественном языке: кто виноват, что делать" ("Evaluating systems for information extraction from natural-language texts: who is to blame, what is to be done") by V. F. Khoroshevsky [8]. The paper focuses on metrics for assessing the quality of information extraction systems applied to texts; the main requirements for such metrics are listed, and a new system is then proposed that allows a more accurate comparison between metrics. Finally, having used the suggested metrics to assess the quality of OntosMiner systems, the author presents the results.

The main requirement for all the quality-assessment metrics is that the score should be maximal for “good” systems, minimal for “bad” ones, and their change should be monotonous. Moreover, metrics are expected to be clear and intuitive, effectively computed and non-ambiguous, also there should be a correlation with human-assessed results.

However, the metrics do not always correlate with experts' opinion and allow various interpretations of the results. In addition, some metrics are limited: for example, they evaluate the quality of "atomic entity" extraction (objects such as Person, Organization, Location, etc.) but do not consider the recall and precision of extracting artifacts related to those entities (attributes such as JobTitle, Time, etc.). According to Khoroshevsky, there are almost no good metrics to assess the quality of inter-object relation extraction, and the metrics in use charge the system multiple times for the same error. The metrics also barely consider the importance of object components when the objects have inner structure (e.g. when extracting an object of type Person, the name and patronymic might be processed correctly while the surname, especially a compound one, may be only half-extracted). In this case the object will enter the final formula with coefficient ½ regardless of the importance of its components. Thus, there is a need to develop more adequate metrics for assessing the quality of information extraction systems.

According to the author, the main requirements for the metrics system are:

- Monotonicity of the metrics and of the system in general

- All metrics should be well-balanced

- Both individual metrics and the system as a whole must be clear to a human expert

- The evaluation result should not be ambiguous

- There should be a possibility of integral quality assessment

- All metrics should be effectively computable

- All metrics should conform to current practices of developing information extraction systems, with further ability to be generalized

The author suggests retaining current precision-recall-F1 measures as the basis for the new system. Metrics parameters should rely on explicitly specified object annotations, for example:

<AnnName is_a AnnType; StartOffSet = Number; EndOffSet = Number; Attr_1 = Value_1; ...; Attr_n = Value_n>

(where attributes are expressed as elementary data types: strings, integers, etc.)

The object is identified correctly if its type, all the attributes and OffSets are correct according to the human expert. Besides correctly identified objects, there can be incorrect and partially correct identifications.
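The correct/partially correct/incorrect distinction can be sketched as a simple comparison against the gold annotation. The dict layout and the decision rule below are illustrative assumptions, not Khoroshevsky's exact procedure.

```python
def classify_annotation(extracted, gold):
    """Classify an extracted annotation against the gold one as
    'correct', 'partially correct' or 'incorrect'.

    Annotations are dicts with 'type', 'start', 'end' and an 'attrs'
    dict, mirroring the annotation template above (names illustrative).
    """
    same_type = extracted["type"] == gold["type"]
    same_span = (extracted["start"] == gold["start"]
                 and extracted["end"] == gold["end"])
    same_attrs = extracted["attrs"] == gold["attrs"]
    if same_type and same_span and same_attrs:
        return "correct"
    if same_type and (same_span or same_attrs):
        return "partially correct"
    return "incorrect"
```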

The situation is more challenging with relation extraction: as the author states, there are (as of 2012) almost no agreed-upon metrics for evaluating the task, due to its complexity.

Khoroshevsky then suggests new metrics for recall, precision and the F1-measure, both for extracted entities and for relationships:

for entities:

for relationships:

For integral measures (considering extracted entities and relationships at the same time):

The nature of the α and β parameters is explained in the paper in more detail; in short, one can consider them as weights.

The metrics are used in the OntosMiner text processor and were tested on a corpus of Russian newspaper articles. Typical extracted entities were of types Person, JobTitle/Title, Organization and Location, with relationships like BeEmployeeOf and ConnectedWith.

The new suggested metrics proved to be more sensitive to errors (for example, in defining Offsets) and more accurate in object extraction. Thus, for example, if one of the Person attributes (Gender, FirstName, PatrName, FamName) is extracted incorrectly, precision and recall will be higher than in cases when none of the attributes is extracted correctly. The same applies to relationship extraction. Concluding, the author emphasises the advantages of the new metrics, as they perform better, but at the same time states that the metrics are yet to be improved.

It is important to mention that the task of Knowledge Extraction differs slightly from Information Extraction. In the former one needs to extract not only entities but also the relations between them, and to store the information in a structured form (also known as a knowledge base). Hence it might be a good idea to consider the ontological structural approach, which can contribute to evaluation metrics by way of organizing the extracted information.

The next paper, "A Survey on Ontology Metrics" by J. García, F. J. García-Peñalvo and R. Therón (2010) [9], gives us a thorough meta-study of various ontology-evaluating tools and metrics.

There is a question if it is possible to adapt certain ontology-evaluating metrics or metric components to the task of Knowledge Extraction. As an explanation of such an approach, one can think of certain structural similarities between knowledge extraction and an ontology design: ontology focuses on relations between types(classes) and class structures, while in case of knowledge extraction, the focus is on relations between entities.

In their work, García, García-Peñalvo and Therón collected the results of several research groups [10-18] and compared their approaches to evaluating ontologies. Each approach is explained in more detail below.

Vrandecic and Sure [10] propose some metrics to normalize ontologies (by naming anonymous classes, individuals, classifying hierarchically and unifying the names, propagating the individuals to the deepest possible classes, and finally normalizing the object properties) .

Alani, Brewster and Shadbolt [11], [12] describe a way to rank ontologies (using a Java Servlet to process user-defined keywords as inputs and the Swoogle engine to retrieve the URIs of relevant ontologies).

A particularly interesting approach was proposed by Orme et al. [13], [14] as a set of coupling and cohesion metrics for ontology-based systems in OWL, introduced below.

Coupling metrics:

- Number of distinct external classes in the ontology (NEC)

- Number of references to external classes (REC)

- Number of referenced inclusions (RI)

Cohesion metrics:

- Number of Root Classes (NoR): the total number of root classes explicitly defined in the ontology

- Number of Leaf Classes (NoL): the number of leaf classes explicitly defined in the ontology

- Average Depth of Inheritance Tree of all Leaf Nodes (ADIT-LN): the sum of the depths of all paths divided by the total number of paths
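The cohesion metrics are straightforward to compute on a class hierarchy. The sketch below assumes single inheritance and counts a root's depth as 1 (the depth convention is not fixed by the survey, so this is an assumption).

```python
def cohesion_metrics(parent):
    """Orme et al.'s cohesion metrics for a class hierarchy.

    `parent` maps each class to its parent class (None for roots);
    single inheritance is assumed for this sketch.
    """
    classes = set(parent)
    roots = [c for c in classes if parent[c] is None]
    parents = {p for p in parent.values() if p is not None}
    leaves = [c for c in classes if c not in parents]

    def depth(c):
        # number of classes on the path from c up to its root (root depth = 1)
        d = 1
        while parent[c] is not None:
            c = parent[c]
            d += 1
        return d

    nor = len(roots)                                 # Number of Root Classes
    nol = len(leaves)                                # Number of Leaf Classes
    adit_ln = sum(depth(l) for l in leaves) / nol    # ADIT-LN
    return nor, nol, adit_ln
```

For a toy hierarchy Thing > Animal > Dog, Thing > Plant, this yields one root, two leaves and an average leaf depth of 2.5.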

On the other hand, Yinglong [15] focuses on the ontological semantics rather than structure. The metrics proposed here are:

- Number of Ontology Partitions (NOP): the number of semantic partitions of a knowledge base

- Number of Minimally Inconsistent Subsets (NMIS): the number of all minimally inconsistent subsets in a knowledge base

- Average Value of Axiom Inconsistencies (AVAI): the ratio of the sum of the inconsistency impact values of all axioms and assertions to the cardinality of the knowledge base

Nicola Guarino and Chris Welty [16], [17] explore the use of certain defined metaproperties of entities (such as rigidity, unity, identity and dependency) in order to provide logical and semantic meaning and to detect non-logical relationships.

Finally, Yang et al. [18] focus on the evolution of ontologies. Their metrics, mainly based on the quantity, ratio and correlativity of concepts and relationships, sum up to "ontology complexity". These metrics are divided into two groups, Primitive Metrics and Complexity Metrics. The Primitive Metrics:

- TNOC (Total Number of Concepts or Classes)

- TNOR (Total Number of Relations)

- TNOP (Total Number of Paths), where a path is defined as a trace that can be taken from a specific concept to the most general concept in the ontology

Complexity Metrics:

- the average number of relations per concept, TNOR / TNOC

- the average number of paths per concept, TNOP / TNOC
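These counts can be sketched for a small concept graph with multiple inheritance. The representation below (each concept mapped to its list of more-general concepts) is an illustrative assumption; in particular, the root's trivial path is not counted in TNOP, which is one possible reading of the definition.

```python
def complexity_metrics(parents):
    """Yang et al.'s primitive and complexity metrics (sketch).

    `parents` maps each concept to the list of its more-general concepts
    (empty for the top concept); multiple inheritance is allowed.
    """
    tnoc = len(parents)                              # total concepts
    tnor = sum(len(ps) for ps in parents.values())   # total is-a relations

    def paths_to_top(c):
        # a path is a trace from a concept up to the most general concept
        if not parents[c]:
            return 1
        return sum(paths_to_top(p) for p in parents[c])

    tnop = sum(paths_to_top(c) for c in parents if parents[c])
    return tnoc, tnor, tnop, tnor / tnoc, tnop / tnoc
```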

At the end of this meta-study, García, García-Peñalvo and Therón summarize all the metrics proposed by the research groups:

In conclusion, this paper shows three main trends in evaluation metrics for ontologies (structure-focused, cohesion-focused and coupling-focused), with the structural focus being the dominant trend in ontology evaluation.

A work from 2015 by Joe Raad and Christophe Cruz [19] continues to delve into the subject of ontology evaluation by studying different methods and discussing their advantages.

The authors distinguish several criteria that make a “good” ontology:

- Accuracy: whether the ontology axioms comply with the domain knowledge

- Completeness: whether the domain coverage is sufficient

- Conciseness: whether the ontology is free of irrelevant or redundant elements

- Adaptability: how well the ontology anticipates its future uses

- Clarity: whether the meaning of the defined terms is communicated effectively

- Computational efficiency: the ability of the available tools to work with the ontology (in particular, the speed that reasoners need to fulfil the required tasks)

- Consistency: whether the ontology is free of contradictions

According to the authors, existing approaches to evaluation can be grouped into four categories: gold standard (compares the learned ontology with a previously created “gold” one), corpus-based (evaluates domain coverage for ontology), task-based (measures how far an ontology helps in improving the results of a certain task), and criteria-based approaches (taxonomic depth and class match measures).

Raad and Cruz present a table describing the criteria/approach correlation (a darker colour means better criterion coverage):

The most interesting is the task-based category, as it shows whether an ontology helps to improve the results of a certain task and, if so, to what extent. Such parameters as "Completeness", "Adaptability", "Conciseness" and "Computational complexity" might be of further use in evaluating knowledge extraction.

Chapter Conclusion

To sum up, various ways to evaluate the general performance of Information Extraction (in particular, Knowledge Extraction) systems have been explored over the last two decades. Even though the methods diverge and evolve through the years, there are certain noticeable patterns:

- The two metrics Precision and Recall remain at the basis of the evaluation system; we can regard them as the core ones

- However, sometimes the two main measures alone are not enough, and it is better either to add further metrics (such as Miss, Fallout, etc.) or to modify the core ones (e.g. by adding parameters, as Khoroshevsky did)

- The ontological perspective can contribute to IE system performance evaluation, as there is a structural similarity between the concept of knowledge and an ontology

- It is natural for metrics to develop as the field of IE advances

Let us see how this can be implemented in the current work.

Methods

Data

As an example of an ill-defined domain, it was decided to study the concept of "Quality of life" as reflected in Real World Data. In the medical domain, Real World Data (RWD) [20] is data captured in a non-interventional, observational manner, in a natural, uncontrolled setting. It is traditionally contrasted with data from Randomized Controlled Trials (though the two should be thought of as complementing each other). RWD can be collected from various sources such as non-interventional studies, patient registries, healthcare datasets, patient chart reviews, health records from wearable devices, surveys, etc.

Our example of RWD is a dataset collected from a thematic website (https://health.mail.ru/). It consists of numerous entries (in Russian) describing patients' troubles related to food consumption (as we research the concept of "Quality of Life", with eating being a part of it). The example mining was based on a set of keywords that, in our opinion, are typical of texts related to problems with food consumption. The keywords can be divided into three clusters according to their POS tag: nouns, verbs and adjectives. For each cluster, the cosine similarity between the central word and the others was calculated as well.
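The cosine-similarity step can be sketched on generic sparse word vectors; the actual thesis computation used its own keyword vectors, so the representation and the example keys below are illustrative assumptions.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse word vectors,
    represented as {dimension: weight} dicts."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

Identical vectors score 1.0; vectors with no shared dimensions score 0.0.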

The full set of keywords can be found in Appendix 2. In the next step, all the mined data was shuffled (more than 10 times) and a sub-corpus of 50 random entries was chosen. This sub-corpus was then given to the annotators.

Experiment

Annotators were divided into two groups (those with a medical background and those with a background in NLP). Each annotator annotated the whole corpus, adopting strategies from FactRuEval 2016 [23]. During annotation they were asked to detect entities and the relations between them.

According to Merriam-Webster Dictionary, the term “entity” is explained as:

1 a: BEING, EXISTENCE

especially : independent, separate, or self-contained existence

b: the existence of a thing as contrasted with its attributes

2: something that has separate and distinct existence and objective or conceptual reality

3: an organization (such as a business or governmental unit) that has an identity separate from those of its members

In particular, in Natural Language Processing and Information Extraction, entities are understood as proper nouns, common nouns and noun phrases -- anything that denotes an object (material or abstract, animate or not), process or state [19].

Relations between entities could be of different nature:

- Is a (entity A is a subtype of entity B)

- Has (entity A has entity B)

- Causation (entity A causes, or is said to cause, entity B)

- Correlation (entity A happens alongside entity B; it is unknown or not stated whether there is causation)

To increase the validity of the research, a strict system of entities and relations was not imposed on the annotators; they were left to create their own (though they received initial guidelines and examples). In each entry, the annotators were asked first to identify the important entities and then to detect all the reasonable relations. To solve the problem effectively, one can use a modular approach: creating smaller pieces of knowledge that are further combined into bigger ones.
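The entity/relation annotation scheme can be expressed as a small data structure. The classes below are an illustrative schema, not the annotators' actual format; the example spans come from the kind of entries discussed later in this work.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    text: str   # the annotated span
    cls: str    # annotator-defined class, e.g. "symptom"

@dataclass(frozen=True)
class Relation:
    source: Entity
    target: Entity
    cls: str    # "is_a", "has", "causation" or "correlation"

# a minimal piece of knowledge built from an entry
tooth = Entity("болит зуб", "symptom")
throat = Entity("заболело горло", "symptom")
fact = Relation(tooth, throat, "correlation")
```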

The next step for me as the researcher was to collect the results and closely analyse the annotation in each case. We were particularly interested in in-group and inter-group annotation disagreements, as they might shed light on the data structure and possible further improvements. We were looking for parameters that could add tidiness to performance metrics in knowledge extraction.

Results

The annotated data received from the two groups was closely analysed. Among the most interesting phenomena were the numbers of distinct entity types and distinct relation types detected by the annotators. It was also interesting to study how entities were defined within each annotator group (was it just a noun or a nominal phrase, and if the latter, how big; were verbal phrases included in entities or treated as "relations" between them; etc.). Among the auxiliary statistics one can mention the average and median number of entities and relations per entry detected by each group (see details in Appendix 3).

Thus, the NLP group tends to distinguish fewer classes, both for entities and for relations (they detected 11 unique entity classes and 11 unique relation classes). In contrast, the medical expert group defined 36 unique entity classes and 21 unique relation classes. Their annotation proves to be more detailed: for example, they distinguish between such entity classes as "illness", "symptom", "complaint" and "anamnesis", distinctions that did not occur in the annotations from the NLP group. The annotation analysis reveals several patterns:

For entities,

For entities,

- The average number of entities detected per entry reflects the collective opinion within a group. It equals 5.7 among annotators from the NLP group and 9.2 among those from the medical group (which supports the idea that the medical annotation is more thorough).

- The overall average number of entities detected per entry is 7.5.

- The median number of detected entities was also calculated, to check whether the data may be distributed normally (in that case the mean and median should not differ much). The median values are 4 for the NLP group, 7.5 for the medical experts and 6.2 as the inter-group value. However, any assumptions about the data distribution should be made carefully, since the corpus contains only 50 samples.

- The variance characterises the extent of inter-annotator agreement. It is quite high for the whole corpus, amounting to 19.5. This, again, supports the idea that NLP specialists and medical experts perceive and structure information somewhat differently.

For relations,

- The average number of relations detected per entry over the two groups is 3.1 (2.5 within the NLP group and 3.9 for the medical group).

- The median values for detected relations are 3, 2 and 4 respectively. Relation detection turned out to be less diverse than entity detection, and the smaller difference between the numbers of unique classes (11 against 21) for each group supports this point.

- The variance for relation detection is rather low (1.9 as an inter-group value), indicating a larger extent of annotation agreement.
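The summary statistics above are directly computable with the standard library. The per-entry counts below are made-up placeholders, not the thesis data; they only show the computation.

```python
from statistics import mean, median, pvariance

# hypothetical per-entry entity counts (NOT the actual annotation data)
nlp_counts = [5, 4, 6, 4, 9]
med_counts = [9, 7, 12, 8, 10]

stats = {
    "nlp_mean": mean(nlp_counts),
    "nlp_median": median(nlp_counts),
    "variance": pvariance(nlp_counts + med_counts),  # inter-group spread
}
```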

Taking everything into account, there is evidence that the medical group created a more thorough and diverse system of entities and relations, which may potentially lead to a more sophisticated procedure of knowledge extraction from the source data and requires a more detailed knowledge base.

Analysis

Several ideas can be inferred from the annotation analysis:

A procedure of successful knowledge extraction consists of several steps:

1) detecting entities and their classes

2) detecting relations between entities

3) extracting the pairs and further organising as needed

Thus knowledge can be regarded as a combination of extracted entities and their relations.

Since the group of medical experts presented a more detailed annotation than the NLP group, this raises the question of an adequate procedure for evaluating the extracted results. Traditionally, when evaluating, one compares the result of a knowledge extraction system to a gold standard. This divides the whole set of outcomes into four groups: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).

Such a division may be insufficient for knowledge extraction from ill-defined domains. Is there a way to use better metrics? The situation may be approached from two sides.

In the first case the system may allow semantically close types of entities or relations. For example, in the sentence "У меня уже полгода временами болит зуб, но сегодня заболело еще и горло", the piece "болит зуб" can be labelled as an entity of class "symptom", "complaint" or "anamnesis". In this case the extracted class c should be counted as correct if it belongs to the set of possible classes C, i.e. if c ∈ C.

In the case of a naïve approach, the probability of a label does not matter as long as the label is correct. If a more sophisticated solution is required, an additional step may be included to calculate label probabilities; in that case the probabilities of all classes should sum to 1.
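Both variants of this check can be sketched in a few lines; the class names in the usage example are taken from the "болит зуб" discussion above, while the function names are illustrative.

```python
def is_correct(extracted_cls, admissible):
    """Naive check: an extracted class counts as correct if it belongs
    to the set C of semantically admissible classes for the span."""
    return extracted_cls in admissible

def normalize(probs):
    """Optional refinement: rescale label probabilities to sum to 1."""
    total = sum(probs.values())
    return {cls: p / total for cls, p in probs.items()}
```

For "болит зуб", C = {"symptom", "complaint", "anamnesis"}, so any of the three labels is accepted.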

The second scenario focuses on integral parameters for entities and relations that would allow a neater evaluation of knowledge extraction.

The first suggested parameter is "knowledge completeness". Since extracted knowledge is seen as a combination of entities and their relations (one can think of knowledge as a graph), it is important to evaluate which share of classes (for entities and relations) was extracted correctly.

K_c = (TP_e + TP_r) / ((TP_e + FP_e) + (TP_r + FP_r)), where TP_e and FP_e are the numbers of true- and false-positive examples for extracted entities, and TP_r and FP_r are the same for extracted relations. The K_c value ranges from 0 to 1.
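The completeness formula is not reproduced in the source; assuming it is the share of true positives among all extracted entity and relation examples (consistent with the description above), a minimal sketch:

```python
def knowledge_completeness(tp_e, fp_e, tp_r, fp_r):
    """Share of correctly extracted entity (tp_e/fp_e) and relation
    (tp_r/fp_r) examples; ranges from 0 to 1. Reconstruction, not the
    author's exact formula."""
    return (tp_e + tp_r) / (tp_e + fp_e + tp_r + fp_r)
```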

An additional metric to mention is "knowledge similarity", calculated as the graph edit distance between the gold standard and the result of the knowledge extraction system in question. In this case one needs a fully annotated gold corpus with entities (as graph vertices) and the connections between them (as edges). The graph edit distance is the set of graph edit operations (vertex and edge deletions, insertions, substitutions) required to transform the resulting graph into the gold one. In other words, this metric shows how similar the extracted system of entities and relations (knowledge) is to the one from the gold standard.

The graph edit distance (GED) is calculated as

GED(g1, g2) = min over (e1, ..., ek) ∈ P(g1, g2) of Σ_i c(e_i),

where P(g1, g2) is the set of edit paths converting the first graph into the second, and c(e) ≥ 0 is the cost of each edit operation e.
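For the tiny knowledge graphs in question, the distance can be computed by brute force. The sketch below assumes unit costs and unlabelled vertices (a simplification of the general cost function); it is exponential in graph size and meant only as an illustration.

```python
from itertools import permutations

def ged(nodes1, edges1, nodes2, edges2):
    """Brute-force graph edit distance with unit edit costs.

    nodes1/nodes2 are lists of vertex ids; edges1/edges2 are sets of
    frozenset({u, v}). Not a production algorithm.
    """
    if len(nodes1) > len(nodes2):          # make graph 1 the smaller one
        nodes1, edges1, nodes2, edges2 = nodes2, edges2, nodes1, edges1
    best = float("inf")
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))             # injective vertex map
        node_cost = len(nodes2) - len(nodes1)          # vertex insertions
        mapped_edges = {frozenset(mapping[v] for v in e) for e in edges1}
        edge_cost = len(mapped_edges ^ edges2)         # edge ins + del
        best = min(best, node_cost + edge_cost)
    return best
```

For a two-vertex path against a three-vertex path the distance is 2 (one vertex insertion plus one edge insertion).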

The second suggested parameter is a "credibility ratio" for relations. Sometimes entities can be connected, but the connection is doubtful in terms of common sense. For example, there were entries where patients said "[грейпфрут] вызывает [рак груди]", "если съесть [яблоко с червяком], заведутся [глисты]", or "можно заразиться [бешенством] от [белки] в [парке]". The connections between the entities exist formally, but that does not mean they are plausible. Such relations may be noisy or misleading because, once extracted, they may be stored in a knowledge base and occasionally reused. In order to compute the credibility parameter, it is suggested to take the extracted relation examples labelled as FN and FP and to additionally compute the pointwise mutual information (PMI) between the two entities that share a relation.

PMI is defined as pmi(x; y) = log( p(x, y) / (p(x) p(y)) ). The higher the PMI score, the more likely the words are to occur together; thus it is possible to penalize rare entity pairs.

It is easier to use normalized PMI in evaluation:

npmi(x; y) = pmi(x; y) / (−log p(x, y)),

where the denominator −log p(x, y) is Claude Shannon's "self-information", the level of "surprise" of a particular outcome.

Normalized PMI ranges over [−1, +1]: −1 for a word pair that almost never occurs together, +1 for strong co-occurrence, and values around 0 for independence.
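The two quantities follow directly from the standard definitions; the sketch below takes the joint and marginal probabilities as given (estimating them from corpus counts is a separate step).

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information for a word pair with joint
    probability p_xy and marginal probabilities p_x, p_y."""
    return math.log(p_xy / (p_x * p_y))

def npmi(p_xy, p_x, p_y):
    """PMI normalized by the self-information -log p(x, y); lies in [-1, 1]."""
    return pmi(p_xy, p_x, p_y) / -math.log(p_xy)
```

Independent words (p_xy = p_x · p_y) score 0; a pair that always co-occurs (p_xy = p_x = p_y) scores 1.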

To summarize, the credibility coefficient is calculated for a pair of entities if either or both of them are labelled as FP or FN.

This parameter allows doubtful relations to be down-weighted, since the PMI for two words denoting such entities would be low. Then the complete Precision and Recall formulae would look as:
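The amended formulae are not reproduced in the source. One plausible reconstruction (an assumption, not the author's exact form) weights each false positive and false negative by the credibility coefficient of the corresponding entity pair, so that implausible relations contribute less to the penalty:

```latex
% cred_i in [0, 1] is the credibility coefficient of the i-th entity pair
% (a hypothetical reconstruction of the weighting scheme):
\[
P' = \frac{TP}{TP + \sum_{i \in FP} cred_i}, \qquad
R' = \frac{TP}{TP + \sum_{j \in FN} cred_j}
\]
```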

Another suggestion for evaluating knowledge extraction is to use the "uncertainty coefficient". This metric is closely related to the entropy of a random variable. As the random variable, let us consider the probability of a certain class for a detected entity. Entropy is the degree of randomness of a variable; it shows which outcome is more likely to happen (if this is possible to detect). The closer the probabilities of the outcomes are to each other, the higher the entropy. In the given case, one can consider entropy as the "degree of randomness in assigning a certain class to an extracted entity".

The entropy is calculated as H(X) = −Σ_x p(x) log p(x).

Since knowledge is a set of entities and certain relations between them, we need to add a second random variable Y, denoting the probability of an extracted relation being of a certain class. The next step is to calculate the conditional entropy (how much uncertainty about Y remains once X is known): H(Y|X) = −Σ_{x,y} p(x, y) log p(y|x).

Then the uncertainty coefficient is the mutual information of the two variables over the entropy of the first one: U(X|Y) = I(X; Y) / H(X), where I(X; Y) = H(X) − H(X|Y).

When measuring the validity of a statistical classification, the uncertainty coefficient can be used in addition to the traditional Precision and Recall measures, or as a substitute for them. In our case it would measure the degree of association between entity class and relation class. This could detect strange, statistically rare or doubtful relations.
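The chain of definitions above (entropy, conditional entropy, uncertainty coefficient) can be computed from observed (entity class, relation class) pairs. The sketch below estimates probabilities by simple counting; the function names and examples are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(X) of a label sequence, in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(pairs):
    """H(Y|X) from a sequence of (x, y) pairs: H(X, Y) - H(X)."""
    xs = [x for x, _ in pairs]
    return entropy(pairs) - entropy(xs)

def uncertainty_coefficient(pairs):
    """U(X|Y) = I(X; Y) / H(X): the share of X's entropy explained by Y."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mutual_info = entropy(xs) + entropy(ys) - entropy(pairs)
    return mutual_info / entropy(xs)
```

Fully dependent class pairs yield a coefficient of 1, independent ones yield 0.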

To summarize the above analysis, this work suggests new approaches for evaluating knowledge extraction from source text. The research is conducted in order to improve the traditional IE performance metrics and to adapt them to the structural and semantic peculiarities of data from ill-defined domains. The new metrics can serve as complements to the traditional ones, or can be used instead of them, depending on the specific task.

Discussion

The research topic proved to be both challenging and engaging. Knowledge extraction in medical texts is a steady trend, and it evolves with the advances of Machine Learning algorithms. New systems and architectures are being designed and implemented every year, and consequently metrics are required to be up-to-date, as well. There is an opportunity for a potential follow-up and new projects in terms of perfecting the metrics, adapting them to the needs of the domain and possibly designing new ones.

Speaking about alternative research questions, the problem of semantic overlap among labels also appears interesting. Even though the current work only sheds a little light on it, it is important to mention already existing solutions. For instance, among the latest achievements in the field, there is a paper by Daoyuan Chen et al. [20], who approached the problem of shifted label distribution using Machine Learning. They created a module of cooperative multi-agents that calculated a continuous confidence for each label score; the confidences were then used to correct the training losses of the extraction system.

Among the challenges encountered, one can note a certain scarcity of novel research. There are many papers describing Information Extraction or Knowledge Extraction systems, their architecture, design, and implementation. However, there are significantly fewer papers analyzing metrics, the majority of them published between 2002 and 2015.

Conclusion

The research focuses on metrics for successful knowledge extraction in the ill-defined medical domain. We analyzed the notion of “Quality of life” through the knowledge extracted from the source text data.

To answer the stated research question, “are there any ways to improve the evaluation done by Precision and Recall?”, this research analyzed metrics for knowledge extraction and found alternative ways to evaluate the performance of a knowledge extraction system in ill-defined domains, and to do so in a more definite way.

References

Appendix 1 - Review of Knowledge extraction systems

Appendix 2 - Analysis of keywords for corpus mining

Appendix 3 - Annotation statistics

Appendix 4 - A table of entities and relations detected by annotators

Appendix 5 - Examples given to annotators

Appendix 6 - New formulae calculations

All complementary analysis results + code are stored here: https://github.com/nstsj/Master_thesis_additional

1. Jurafsky, Dan, and James H. Martin. "Speech and language processing (draft)." Chapter 18: Information Extraction (Draft of October 02, 2019).

2. Lehnert, Wendy, et al. "Evaluating an information extraction system." Integrated Computer-Aided Engineering 1.6 (1994): 453-472.

3. Schatkun, M. "A second look at Egghe's universal IR surface and a simple derivation of a complete set of universal IR evaluation points." Information Processing & Management 46.1 (2010): 110-114.

4. Egghe, L. (2004). A universal method of information retrieval evaluation: The “missing” link M and the universal IR surface. Information Processing and Management, 40(1), 21-30.

5. Egghe, L. (2007). Existence theorem of the quadruple (P, R, F, M): Precision, recall, fallout and miss. Information Processing and Management, 43(1), 265-272.

6. Egghe, L. (2008). The measures Precision, Recall, Fallout and Miss as a function of the number of retrieved documents and their mutual interrelations. Information Processing and Management, 44(2), 856-876.

7. Swets, J. A. (1969). Effectiveness of information retrieval methods. American Documentation, 20(1), 72-89.

8. Хорошевский, В. Ф. "Оценка систем извлечения информации из текстов на естественном языке: кто виноват, что делать." Труды Десятой национальной конференции по искусственному интеллекту с международным участием (КИИ-2006).-М.: Физматлит. Vol. 2. 2006.

9. García J., García-Peñalvo F. J., Therón R. "A survey on ontology metrics." World Summit on Knowledge Society. Springer, Berlin, Heidelberg, 2010: 22-27.

10. Denny Vrandecic and York Sure, "How to Design Better Ontology Metrics", The Semantic Web: Research and Applications, pp. 311-325, Springer-Verlag, 2007

11. Harith Alani, Christopher Brewster and Nigel Shadbolt, "Ranking Ontologies with AKTiveRank", Proceedings of the International Semantic Web Conference, ISWC, 2006 5th International Semantic Web Conference (ISWC), November 2006, Georgia, USA

12. Harith Alani and Christopher Brewster, "Metrics for Ranking Ontologies", 4th Int. EON Workshop, 15th Int. World Wide Web Conference, 2006

13. Anthony Orme, Haining Yao, and Letha Etzkorn, "Coupling Metrics for Ontology-Based Systems", IEEE Software, pp 102-108, 2006

14. Haining Yao, Anthony Orme and Letha Etzkorn, “Cohesion Metrics for Ontology Design and Application”, Journal of Computer Science 1(1): 107-113, 2005, Science Publications

15. Yinglong Ma, Beihong Jin, Yulin Feng, "Semantic oriented ontology cohesion metrics for ontology-based systems", The Journal of Systems and Software, Elsevier, 2009

16. Nicola Guarino and Chris Welty, "An Overview of OntoClean", The Handbook on Ontologies, Pp. 151-172, Berlin:Springer-Verlag, 2004

17. Nicola Guarino and Chris Welty, "Evaluating Ontological Decisions with OntoClean", Communications of the ACM, pp 61-65, ACM Press, 2002

18. Zhe YANG, Dalu Zhang and Chuan YE, "Evaluation Metrics for Ontology Complexity and Evolution Analysis", IEEE International Conference on e-Business Engineering (ICEBE'06), 2006

19. Raad, Joe, and Christophe Cruz. "A survey on ontology evaluation methods." 2015.

21. Chen, Daoyuan, et al. "Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents." arXiv preprint arXiv:2004.09930 (2020).

22. Mahajan, Rajiv. "Real world data: additional source for making clinical decisions." International Journal of Applied and Basic Medical Research 5.2 (2015).

23. Starostin, A. S., et al. "FactRuEval 2016: evaluation of named entity recognition and fact extraction systems for Russian." (2016).
