Semantic Textual Similarity (STS) evaluation assesses the degree to which two text fragments are semantically similar. In this paper, we describe three models submitted to the STS task at SemEval 2017. Given two English text fragments, each of the proposed methods outputs an assessment of their semantic similarity. We propose an approach for computing monolingual semantic textual similarity based on an ensemble of three distinct methods. Our model combines recursive neural network (RNN) text auto-encoders with a supervised model of vectorized sentences that uses reduced part-of-speech (PoS) weighted word embeddings, as well as an unsupervised method based on word coverage (TakeLab). Additionally, we enrich our model with features that allow disambiguation of the ensemble methods based on their efficiency. We use a Multi-Layer Perceptron as the ensemble classifier, based on the estimates of trained Gradient Boosting Regressors. Our results show that such an ensemble leads to higher accuracy because each member algorithm tends to specialize in a particular type of sentence. A simple model based on PoS-weighted Word2Vec word embeddings seems to improve the performance of the more complex RNN-based auto-encoders in the ensemble. In the monolingual English-English STS subtask, our ensemble-based model achieved a mean Pearson correlation of .785 with human annotators.
The purpose of this paper is to present wavelet tools that enable the detection of temporal interactions of concurrent processes. In particular, the interaction coherence of time-varying signals is determined using a complex continuous wavelet transform. The paper uses an electrocardiogram (ECG) and seismocardiogram (SCG) data set to demonstrate multiple continuous wavelet analysis techniques based on the Morlet wavelet transform. A MATLAB Graphical User Interface (GUI), developed in the reported research to assist in quick and simple data analysis, is presented. These software tools can discover the interaction dynamics of time-varying signals; hence they can reveal their correlation in phase and amplitude, as well as their non-linear interconnections. The user-friendly MATLAB GUI makes the developed software easy to use: the user loads the two processes under investigation, chooses the required processing parameters, and then performs the analysis. The software is a useful tool for researchers who need to investigate the interaction dynamics of concurrent processes.
Prostate cancer (PCa) is the most commonly diagnosed cancer and cause of cancer-related death among men. This paper describes a novel, deep-learning-based PCa CAD system that uses statistical central moments and Haralick features extracted from MR images, integrated with anamnestic data. The system was trained on a dataset of 330 lesions and evaluated on the challenge dataset using the area under the curve (AUC) of the estimated receiver operating characteristic (ROC). Two configurations of our method, based on statistical and Haralick features, achieved AUC values of 0.63 and 0.73. We draw conclusions from the challenge participation and discuss further improvements that could be made to the model to improve prostate classification.
Encyclopedia of Machine Learning. Editors: Claude Sammut, Geoffrey I. Webb. ISBN: 978-0-387-30768-8 (Print), 978-0-387-30164-8 (Online)
Natural language processing (NLP) deals with the key artificial intelligence technology of understanding complex human language communication. This lecture series provides a thorough introduction to the cutting-edge research in deep learning applied to NLP, an approach that has recently obtained very high performance across many different NLP tasks including question answering and machine translation. It emphasizes how to implement, train, debug, visualize, and design neural network models, covering the main technologies of word vectors, feed-forward models, recurrent neural networks, recursive neural networks, convolutional neural networks, and recent models involving a memory component.
Bruno Olshausen, UC Berkeley. Foundations of Machine Learning
Transform your features into a higher dimensional, sparse space. Then train a linear model on these features. First fit an ensemble of trees (totally random trees, a random forest, or gradient boosted trees) on the training set. Then each leaf of each tree in the ensemble is assigned a fixed arbitrary feature index in a new feature space. These leaf indices are then encoded in a one-hot fashion. Each sample goes through the decisions of each tree of the ensemble and ends up in one leaf per tree. The sample is encoded by setting feature values for these leaves to 1 and the other feature values to 0. The resulting transformer has then learned a supervised, sparse, high-dimensional categorical embedding of the data.
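The leaf-encoding step described above can be illustrated with a minimal, standard-library-only sketch. It assumes the simplest of the three ensemble options, totally random trees, which pick split features and thresholds at random and therefore need no labels. The helper names (`make_random_tree`, `leaf_index`, `encode`), the fixed tree depth, and the feature range are all hypothetical choices for illustration, not part of any library. The final step of training a linear model on the encoded vectors is omitted.

```python
import random

def make_random_tree(n_features, depth, low, high, rng):
    """Totally random complete binary tree in heap layout (hypothetical helper).

    Internal node i holds a random (feature, threshold) pair and has
    children 2*i + 1 and 2*i + 2. No labels are used.
    """
    n_internal = 2 ** depth - 1
    return [(rng.randrange(n_features), rng.uniform(low, high))
            for _ in range(n_internal)]

def leaf_index(tree, depth, sample):
    """Route a sample through the tree and return the leaf it lands in (0-based)."""
    node = 0
    for _ in range(depth):
        feature, threshold = tree[node]
        node = 2 * node + 1 if sample[feature] <= threshold else 2 * node + 2
    return node - (2 ** depth - 1)  # shift past the internal nodes

def encode(trees, depth, sample):
    """One-hot encode the leaf reached in each tree, concatenated over trees."""
    n_leaves = 2 ** depth
    encoded = [0] * (len(trees) * n_leaves)
    for t, tree in enumerate(trees):
        encoded[t * n_leaves + leaf_index(tree, depth, sample)] = 1
    return encoded

# Assumed toy setup: 4 trees of depth 3 over 2 features in [0, 1].
rng = random.Random(0)
DEPTH, N_TREES = 3, 4
trees = [make_random_tree(2, DEPTH, 0.0, 1.0, rng) for _ in range(N_TREES)]

sample = [0.2, 0.7]
vector = encode(trees, DEPTH, sample)
print(len(vector), sum(vector))  # prints "32 4": 4 trees x 8 leaves, one active leaf per tree
```

Each sample becomes a sparse binary vector with exactly one active entry per tree, which is the input a linear model would then be trained on. In practice a library implementation, such as scikit-learn's `RandomTreesEmbedding`, would likely be preferable to this hand-rolled sketch.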
This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.
Built in spare time by @karpathy to accelerate research.
Basically a good way to keep up with recent research in ML