Application of Ensemble learning for computing semantic textual similarity

Semantic Textual Similarity (STS) evaluation assesses the degree to which
two text fragments are similar in meaning. In this paper, we describe three
models submitted to the STS task at SemEval 2017. Given two English text
fragments, each of the proposed methods outputs an assessment of their
semantic similarity.
We propose an approach for computing monolingual semantic textual similarity
based on an ensemble of three distinct methods: recursive neural network (RNN)
text auto-encoders, a supervised model of vectorized sentences built from
word embeddings weighted by reduced part-of-speech (PoS) tags, and an
unsupervised method based on word coverage (TakeLab). Additionally, we enrich
the model with features that help the ensemble discriminate between its
member methods according to their efficiency.
We use a Multi-Layer Perceptron as the ensemble classifier, operating on the
estimates produced by trained Gradient Boosting Regressors.
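The stacking step can be sketched with scikit-learn, assuming one Gradient Boosting Regressor per member feature set and an MLP trained on their estimates. Several details are assumptions rather than the paper's setup: `MLPRegressor` is used because STS scores are continuous, out-of-fold estimates from `cross_val_predict` are used to avoid training the MLP on leaked predictions, `extra` stands in for the auxiliary features mentioned above, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor


def fit_stacked_ensemble(feature_sets, extra, y, seed=0):
    """Fit one GBR per member feature set, then an MLP on their estimates.

    feature_sets: list of (n_samples, d_i) arrays, one per member method
    extra: (n_samples, k) array of auxiliary features (hypothetical)
    y: (n_samples,) gold similarity scores on the 0-5 STS scale
    """
    base_models, oof = [], []
    for X in feature_sets:
        gbr = GradientBoostingRegressor(random_state=seed)
        # Out-of-fold estimates: each row is predicted by a model that did not see it
        oof.append(cross_val_predict(gbr, X, y, cv=5))
        base_models.append(gbr.fit(X, y))  # refit on all data for inference
    meta_X = np.column_stack(oof + [extra])
    mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed)
    mlp.fit(meta_X, y)
    return base_models, mlp


def predict_stacked(base_models, mlp, feature_sets, extra):
    """Combine the GBR estimates with the auxiliary features through the MLP."""
    estimates = [m.predict(X) for m, X in zip(base_models, feature_sets)]
    meta_X = np.column_stack(estimates + [extra])
    return np.clip(mlp.predict(meta_X), 0.0, 5.0)  # keep scores on the STS scale


# Synthetic example: three noisy "members" of the gold score
rng = np.random.default_rng(0)
n = 200
y = rng.uniform(0.0, 5.0, n)
members = [
    np.column_stack([y + rng.normal(0, 1.0, n), rng.normal(size=n)]),
    (y + rng.normal(0, 1.5, n)).reshape(-1, 1),
    (y + rng.normal(0, 2.0, n)).reshape(-1, 1),
]
extra = rng.integers(5, 30, size=(n, 1)) / 30.0  # e.g. a scaled sentence length
bases, mlp = fit_stacked_ensemble(members, extra, y)
pred = predict_stacked(bases, mlp, members, extra)
```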
Our results show that such an ensemble achieves higher accuracy because each
member algorithm tends to specialize in a particular type of sentence. A simple
model based on PoS-weighted Word2Vec word embeddings appears to improve the
performance of the more complex RNN-based auto-encoders within the ensemble.
In the monolingual English-English STS subtask, our ensemble-based model
achieved a mean Pearson correlation of 0.785 with human annotations.
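For reference, the Pearson correlation used to compare system scores with human annotations can be computed as below. The gold and system scores in the example are made-up values for illustration only, not data from the task.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical gold (human) scores and system outputs on the 0-5 scale
gold = [0.0, 1.2, 2.5, 3.8, 5.0]
system = [0.4, 1.0, 2.9, 3.5, 4.6]
r = pearson(system, gold)
```

Because Pearson correlation is invariant to linear rescaling of either list, it rewards a system for ranking and spacing pairs like the annotators do, not for matching the absolute scores exactly.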
