ON USAGE OF MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING TASKS AS ILLUSTRATED BY EDUCATIONAL CONTENT MINING
- Authors: Melnikov A., Botov D., Klenin J.
- Affiliation: Chelyabinsk State University, Institute of Information Technology, Chelyabinsk, Russia
- Issue: Vol 7, No 1(23) (2017)
- Pages: 34-47
- Section: METHODS AND TECHNOLOGIES OF DECISION MAKING
- URL: https://journals.ssau.ru/ontology/article/view/5946
- ID: 5946
Abstract
In this paper, we review the most popular approaches to a variety of natural language processing (NLP) tasks, primarily those that involve machine learning, from classic methods to state-of-the-art technologies. Most modern approaches fall into three rough categories: those based on the distributional hypothesis, those that extract information from graph-like structures (such as ontologies), and those that look for lexico-syntactic patterns in text documents. We focus mainly on the first of the three. An important step in the preparation stage of NLP, before the analysis can even begin, is representing words and documents as numeric vectors. A variety of approaches exist, from the simplistic Bag-of-Words model to sophisticated machine learning methods such as word embeddings. Today, in information retrieval tasks, the best quality for both English and Russian is achieved by approaches based on word embedding algorithms trained on carefully selected text corpora, combined with deep syntactic and semantic analysis using various deep neural networks. A wide variety of machine learning algorithms is applied to NLP tasks such as part-of-speech tagging, text summarization, named entity recognition, document classification, topic and relation extraction, and natural language question answering. We also review the possibilities of applying these approaches and methods to educational content analysis, and propose a novel approach to using NLP and machine learning capabilities to analyze and synthesize educational content in the form of a decision support system.
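To make the "words and documents as numeric vectors" step concrete, here is a minimal sketch of the Bag-of-Words representation mentioned above, paired with cosine similarity as a simple document-comparison measure. It is an illustration only, not the authors' method: the naive whitespace tokenizer, the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`, `cosine_similarity`) and the three sample documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # Real pipelines would also strip punctuation and lemmatize
    # (especially important for a highly inflected language like Russian).
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token in the corpus; the position of a
    # term in this list is its dimension in the document vectors.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(doc, vocab):
    # Count how often each vocabulary term occurs; word order is discarded.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def cosine_similarity(a, b):
    # Cosine of the angle between two count vectors (0.0 if either is empty).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus of three short "documents".
docs = [
    "machine learning for text classification",
    "text classification with machine learning",
    "ontology of course topics",
]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]

print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

The first two documents share four of their five words in a different order, so their vectors score about 0.8, while the third shares no words with the first and scores 0.0. This also shows the model's main limitation: two documents that use synonyms rather than identical words would still score 0.0. Word embeddings, which map each word to a dense learned vector, are one way to address that.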
About the authors
A.V. Melnikov
Chelyabinsk State University, Institute of Information Technology, Chelyabinsk, Russia
Author for correspondence.
Email: mav@csu.ru
D.S. Botov
Chelyabinsk State University, Institute of Information Technology, Chelyabinsk, Russia
Email: dmbotov@gmail.com
J.D. Klenin
Chelyabinsk State University, Institute of Information Technology, Chelyabinsk, Russia
Email: jklen@yandex.ru