Application of Ensemble learning for computing semantic textual similarity

Semantic Textual Similarity (STS) evaluation assesses the degree to which two pieces of text are similar, based on an evaluation of their meaning. In this paper, we describe three models submitted to the SemEval 2017 STS task. Given two English text fragments, each of the proposed methods outputs an assessment of their semantic similarity.

We propose an approach for computing monolingual semantic textual similarity based on an ensemble of three distinct methods. Our model combines recursive neural network (RNN) text auto-encoders with a supervised model of vectorized sentences that uses reduced part-of-speech (PoS) weighted word embeddings, and with an unsupervised method based on word coverage (TakeLab). Additionally, we enrich the model with features that allow disambiguation of the ensemble methods based on their efficiency. We use a Multi-Layer Perceptron as the ensemble classifier, based on the estimates of trained Gradient Boosting Regressors. Our results show that such an ensemble leads to higher accuracy because each member algorithm tends to specialize in a particular type of sentence. A simple model based on PoS-weighted Word2Vec word embeddings seems to improve the performance of the more complex RNN-based auto-encoders in the ensemble. In the monolingual English-English STS subtask, our ensemble-based model achieved a mean Pearson correlation of .785 with human annotators.
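
As a rough illustration of the PoS-weighted embedding idea mentioned in the abstract, the sketch below averages word vectors with a weight chosen per part-of-speech tag and compares two sentences by cosine similarity. The toy embeddings, the PoS weights, and the default weight of 0.5 are hypothetical values chosen for this example. The abstract does not give the actual vectors, weights, or similarity function used in the paper.

```python
import math

# Hypothetical toy embeddings and PoS weights, for illustration only;
# the paper's actual vectors and weights are not given in the abstract.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "sprints": [0.2, 0.8, 0.1],
    "the": [0.3, 0.3, 0.3],
}
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "DET": 0.1}


def sentence_vector(tagged_tokens):
    """Weighted mean of word vectors; the weight depends on each token's PoS tag."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, pos in tagged_tokens:
        vec = EMBEDDINGS.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        w = POS_WEIGHTS.get(pos, 0.5)  # assumed default weight for other tags
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


s1 = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]
s2 = [("the", "DET"), ("puppy", "NOUN"), ("sprints", "VERB")]
similarity = cosine(sentence_vector(s1), sentence_vector(s2))
```

Down-weighting determiners keeps frequent function words from dominating the sentence vector, which is one plausible motivation for weighting by part of speech.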


Evaluation of interaction dynamics of concurrent processes : Journal of Electrical Engineering

The purpose of this paper is to present wavelet tools that enable the detection of temporal interactions of concurrent processes. In particular, the interaction coherence of time-varying signals is determined using a complex continuous wavelet transform. The paper uses an electrocardiogram (ECG) and seismocardiogram (SCG) data set to demonstrate several continuous wavelet analysis techniques based on the Morlet wavelet transform. A MATLAB Graphical User Interface (GUI), developed in the reported research to assist in quick and simple data analysis, is presented. These software tools can reveal the interaction dynamics of time-varying signals: their correlation in phase and amplitude, as well as their non-linear interconnections. The user-friendly MATLAB GUI makes the developed software easy to use: the user loads the two processes under investigation, chooses the required processing parameters, and then performs the analysis. The software is a useful tool for researchers who need to investigate the interaction dynamics of concurrent processes.
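
To make the idea of phase interaction more concrete, here is a minimal sketch of a cross-wavelet phase estimate using a complex Morlet wavelet. It is written in Python rather than MATLAB and does not reproduce the software described in the paper. The signals are synthetic sinusoids rather than ECG/SCG data, the centre frequency of 6 is a common convention assumed here, and the sketch computes only the cross-wavelet phase at a single scale and time, not a full smoothed coherence map.

```python
import cmath
import math

OMEGA0 = 6.0  # common choice for the Morlet centre frequency (assumption)


def morlet(t):
    """Complex Morlet mother wavelet (without the small correction term)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def cwt_at(signal, dt, scale, index):
    """Continuous wavelet coefficient of `signal` at one scale and one time index."""
    acc = 0j
    for n, x in enumerate(signal):
        u = (n - index) * dt / scale
        if abs(u) > 5.0:
            continue  # Gaussian envelope is negligible beyond 5 standard deviations
        acc += x * morlet(u).conjugate()
    return acc * dt / math.sqrt(scale)


def cross_wavelet_phase(x, y, dt, scale, index):
    """Phase of W_x * conj(W_y); positive when x leads y at this scale."""
    wx = cwt_at(x, dt, scale, index)
    wy = cwt_at(y, dt, scale, index)
    return cmath.phase(wx * wy.conjugate())


# Synthetic signals (not ECG/SCG data): two 1 Hz sinusoids, y lagging x by 0.5 rad.
dt = 0.01
freq = 1.0
lag = 0.5
x = [math.sin(2 * math.pi * freq * n * dt) for n in range(2000)]
y = [math.sin(2 * math.pi * freq * n * dt - lag) for n in range(2000)]

scale = OMEGA0 / (2 * math.pi * freq)  # scale whose centre frequency matches 1 Hz
phase = cross_wavelet_phase(x, y, dt, scale, index=1000)
```

For these two sinusoids the estimated phase should come out close to the 0.5 rad lag that was built into `y`. Repeating the calculation over many scales and time indices would give a phase map of the kind used in wavelet interaction analysis.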

MRI imaging texture features in prostate lesions classification | SpringerLink

Prostate cancer (PCa) is the most commonly diagnosed cancer and cause of cancer-related death among men. This paper describes a novel, deep learning based PCa CAD system that uses statistical central moments and Haralick features extracted from MR images, integrated with anamnestic data. The developed system was trained on a dataset of 330 lesions and evaluated on the challenge dataset using the area under the curve (AUC) of the estimated receiver operating characteristic (ROC). Two configurations of our method, based on statistical and Haralick features, scored AUC values of 0.63 and 0.73. We draw conclusions from the challenge participation and discuss further improvements that could be made to the model to improve prostate lesion classification.
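
Haralick features are computed from a grey-level co-occurrence matrix (GLCM). The sketch below shows the general technique on a tiny hypothetical patch: it builds a symmetric, normalised GLCM for a horizontal one-pixel offset and derives three of the standard descriptors. The patch, the offset, and the choice of three descriptors are assumptions for illustration, since the abstract does not say which features or parameters the system uses.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric grey-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions to make it symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Three of Haralick's texture descriptors computed from a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Hypothetical 4x4 patch quantised to 4 grey levels (not real MR data).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = haralick_subset(glcm(patch, levels=4))
```

In practice such descriptors are usually averaged over several offsets and directions so that the result depends less on the orientation of the lesion in the image.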

Natural Language Processing with Deep Learning

Natural language processing (NLP) deals with the key artificial intelligence technology of understanding complex human language communication. This lecture series provides a thorough introduction to the cutting-edge research in deep learning applied to NLP, an approach that has recently obtained very high performance across many different NLP tasks including question answering and machine translation. It emphasizes how to implement, train, debug, visualize, and design neural network models, covering the main technologies of word vectors, feed-forward models, recurrent neural networks, recursive neural networks, convolutional neural networks, and recent models involving a memory component.

Feature transformations with ensembles of trees — scikit-learn 0.18.1 documentation

Transform your features into a higher dimensional, sparse space. Then train a linear model on these features.
First fit an ensemble of trees (totally random trees, a random forest, or gradient boosted trees) on the training set. Then each leaf of each tree in the ensemble is assigned a fixed arbitrary feature index in a new feature space. These leaf indices are then encoded in a one-hot fashion.
Each sample goes through the decisions of each tree of the ensemble and ends up in one leaf per tree. The sample is encoded by setting feature values for these leaves to 1 and the other feature values to 0.
The resulting transformer has then learned a supervised, sparse, high-dimensional categorical embedding of the data.
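
The encoding step described above can be sketched in plain Python. The two "trees" below are hand-written stand-ins, not a fitted ensemble, and their thresholds and leaf numbering are hypothetical. The sketch shows only how the leaf reached in each tree becomes a one-hot block in the new feature space.

```python
# Two hand-written stand-in "trees"; each maps a sample to a leaf id.
# A real pipeline would use a fitted ensemble instead (assumption for illustration).
def tree_a(x):
    if x[0] < 0.5:
        return 0
    return 1 if x[1] < 0.5 else 2


def tree_b(x):
    return 0 if x[1] < 0.3 else 1


TREES = [(tree_a, 3), (tree_b, 2)]  # (tree, number of leaves)


def encode(sample):
    """One-hot encode the leaf reached in each tree, concatenated across trees."""
    encoded = []
    for tree, n_leaves in TREES:
        block = [0] * n_leaves
        block[tree(sample)] = 1  # exactly one active leaf per tree
        encoded.extend(block)
    return encoded


samples = [[0.2, 0.9], [0.7, 0.1], [0.9, 0.8]]
encoded = [encode(s) for s in samples]
```

Each encoded row has exactly one active feature per tree, so the new space has as many dimensions as the ensemble has leaves in total. With scikit-learn itself, the leaf indices would typically come from a fitted ensemble's `apply` method and the encoding from `OneHotEncoder`, after which a linear model is trained on the encoded features.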

Introduction to Statistical Learning

This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.