A Transfer Learning Method via Augmentations in Text Classification Tasks

Existing methods for augmenting training data in classification tasks, their comparative characteristics, and the specifics of their application. The procedure for running augmentation experiments with different approaches, and a comparison of those approaches with the EDA method.

Category: Programming, computers and cybernetics
Type: diploma thesis
Language: Russian
Date added: 20.08.2020


Bibliography

1. Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.

2. Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling. Do Not Have Enough Data? Deep Learning to the Rescue! arXiv preprint arXiv:1911.03118, 2019.

3. Dan Roth, Chad M. Cumby, Xin Li, Paul Morie, Ramya Nagarajan, Vasin Punyakanok, Nick Rizzolo, Kevin Small, and Wen-tau Yih. Question-Answering via Enhanced Understanding of Questions. TREC, 2002.

4. George A. Miller. 1995. WordNet: A lexical database for English. Commun. ACM, 38(11):39-41.

5. J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pretraining of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

6. Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, June 2011, pages 142-150. Association for Computational Linguistics.

7. Mansi Gupta, Nitish Kulkarni, Raghuveer Chanda, Anirudha Rayasam, and Zachary C. Lipton. 2019. Amazonqa: A review-based question answering task. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 4996-5002.

8. Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848, 2019.

9. Rajpurkar, P.; Zhang, J.; Lopyrev, K.; and Liang, P. 2016. SQuAD: 100,000+ questions for machine comprehension of text. EMNLP.

10. Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of EMNLP, 2015.

11. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997. doi:10.1162/neco.1997.9.8.1735

12. Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. Learning Structured Output Representation using Deep Conditional Generative Models. Advances in Neural Information Processing Systems. 2015.

13. Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer Sentinel Mixture Models.

14. T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin. Advances in Pre-Training Distributed Word Representations, 2018.

15. Wu, X.; Lv, S.; Zang, L.; Han, J.; and Hu, S. Conditional BERT contextual augmentation. In International Conference on Computational Science, 84-95. Springer, 2019.

16. Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of Empirical Methods on Natural Language Processing.

Appendix

Appendix 1

F1 score and accuracy for samples of the Trec-50 dataset

Sample size  Metric  LSTM           CNN            MLP            BERT
5000         F1      0.757±0.0069   0.827±0.006    0.774±0.009    0.9256
5000         Acc     0.738±0.007    0.812±0.006    0.79±0.007     0.92
4000         F1      0.74±0.011     0.817±0.006    0.764±0.008    0.9317
4000         Acc     0.71±0.015     0.804±0.003    0.78±0.0047    0.92
3000         F1      0.787±0.002    0.841±0.003    0.778±0.005    0.9152
3000         Acc     0.771±0.0048   0.830±0.0041   0.789±0.003    0.91

Appendix 2

F1 score and accuracy for samples of the Trec-6 dataset

Sample size  Metric  LSTM           CNN            MLP            BERT
5000         F1      0.869±0.0022   0.910±0.003    0.887±0.0095   0.9797
5000         Acc     0.870±0.005    0.909±0.0028   0.8886±0.008   0.98
4000         F1      0.872±0.0082   0.909±0.00248  0.86±0.005     0.9758
4000         Acc     0.870±0.012    0.912±0.0084   0.862±0.004    0.98
3000         F1      0.843±0.003    0.884±0.0051   0.86±0.00613   0.9737
3000         Acc     0.836±0.0028   0.884±0.0033   0.862±0.005    0.97

Appendix 3

F1 score and accuracy for samples of the SST-2 dataset

Sample size  Metric  LSTM           CNN            MLP            BERT
5000         F1      0.817±0.002    0.841±0.0011   0.79±0.003     0.9077
5000         Acc     0.817±0.0027   0.841±0.0011   0.79±0.002     0.91
4000         F1      0.812±0.002    0.83±0.002     0.783±0.0033   0.9039
4000         Acc     0.812±0.002    0.831±0.002    0.7836±0.003   0.90
3000         F1      0.807±0.004    0.828±0.0011   0.773±0.0035   0.8962
3000         Acc     0.81±0.0046    0.828±0.0009   0.773±0.0034   0.90

Appendix 4

F1 score for classification on the Trec-6 data with different numbers of training examples and different models.

Classifier / Number of examples  5000    4000    3000
Random forest                    0.844   0.826   0.832
AdaBoost                         0.596   0.554   0.61
Multilayer perceptron            0.87    0.85    0.832
Naive Bayes                      0.821   0.8     0.782
Decision tree                    0.8     0.802   0.77
Support vector machine           0.858   0.852   0.84
Logistic regression              0.874   0.856   0.84

Appendix 5

Examples of questions and categories from the Trec-50 dataset

Category   Question
manner     How does Zatanna perform her magic in DC comics?
cremat     What was the non-fiction best-seller of 1952, 1953 and 1954?
animal     What two animals are specifically mentioned as being in Noah 's Ark?
exp        What does `` B.Y.O.B. '' mean?
ind        What count did Alexandre Dumas write about?
gr         What cable network bills itself as `` the family entertainer ''?
title      What was Queen Victoria 's title regarding India?
def        What is dry ice?
date       When is Dick Clark 's birthday?
reason     What makes sperm?
event      Name the French historical period during the reign of Napoleon III.
state      What state produces the best lobster to eat?
desc       What is the design of the ship Titanic?
count      How many double-word-score spaces are there on a Scrabble Crossword Game board?
other      What flag flies over Wake Island?
letter     What letter does Gorbachev 's middle name start with?
religion   What religion has the most members?
food       What kind of wine is Spumante?
country    What nationality is Gorbachev?
color      What color of Monopoly properties are landed on most often?
termeq     Name the Islamic counterpart to the Red Cross.
city       What city is graced by the Arch of Titus?
body       The corpus callosum is in what part of the body?
dismed     What is a fear of gravity?
mount      What mountain range marks the border of France and Spain?
money      What is the average cost for four years of medical school?
product    What is the brand name of daminozide?
period     How old is the sun?
substance  What molecules include fluorine, sodium and magnesium?
sport      What new games are available for Nintendo 64?
plant      What are some of Australia 's native flora?
techmeth   What are other ways of getting stretch marks besides pregnancy, weight loss, and weight lifting?
volsize    How large is the Arctic refuge to preserve unique wildlife and wilderness value on Alaska 's north coast?
instru     What kind of guitar did Jimi Hendrix play?
abb        What 's the abbreviation for limited partnership?
speed      How fast is light?
word       What English word contains the most letters?
lang       Name a Sioux language.
perc       What percentage of children between the ages of two and eleven watch ` The Simpsons '?
code       What seven digits follow the area code in the number for long distance information?
dist       How long is the world 's largest ship, in meters?
temp       How hot does the inside of an active volcano get?
symbol     What is the trademark of a Washington Redskin 's fan?
ord        What chapter of the Bible has the most verses?
veh        On which flight did Fawaz Younis commit air piracy and hostage taking?
weight     Approximately how much does a teaspoon of matter weigh in a black hole?
currency   What money was used by them?

Appendix 6

Distribution of examples per category in the Trec-50 dataset. Sample of 5000 examples.

Appendix 7

Distribution of examples per category in the Trec-6 dataset. Sample of 5000 examples.
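
A per-category distribution of the kind captioned in Appendices 6 and 7 can be tabulated from the label column of a sample alone. The following is a minimal sketch, not the thesis code: the function name `category_distribution` and the toy label list are assumptions made for illustration, and the labels are not the thesis data.

```python
from collections import Counter


def category_distribution(labels):
    """Return (category, count, share) tuples, most frequent category first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() orders categories by descending count.
    return [(cat, n, n / total) for cat, n in counts.most_common()]


# Hypothetical toy labels standing in for the category column of a sample.
toy_labels = ["def", "ind", "def", "count", "ind", "def"]

for cat, n, share in category_distribution(toy_labels):
    print(f"{cat:<10}{n:>5}{share:>8.1%}")
```

Applied to the category labels of a 5000-example sample, the same function would give the counts needed to plot either distribution.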
