
[2022-08-15] Today's Natural Language Processing

by 지환이아빠 2022. 8. 15.

An Empirical Exploration of Cross-domain Alignment between Language and Electroencephalogram

 

Electroencephalography (EEG) and language have been widely explored independently for many downstream tasks (e.g., sentiment analysis and relation detection). Multimodal approaches that study both domains have not been well explored, even though multimodal learning has, in recent years, often been found to be more powerful than its unimodal counterparts. In this study, we explore the relationship and dependency between EEG and language, i.e., how one domain reflects and represents the other. To study the relationship at the representation level, we introduce MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities and then employ the transformed representations for downstream applications. We use several alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transform low-level language and EEG features into high-level features. On two downstream applications, sentiment analysis and relation detection, we achieve new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieves an F1-score improvement of 16.5% on sentiment analysis for K-EmoCon, 26.6% on sentiment analysis for ZuCo, and 31.1% on relation detection for ZuCo. In addition, we interpret the performance improvement by: (1) visualizing the original and transformed feature distributions, showing the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) visualizing word-level and sentence-level EEG-language alignment weights, showing the influence of different language semantics as well as EEG frequency features; and (3) visualizing brain topographical maps to provide an intuitive demonstration of the connectivity of EEG and language responses across brain regions.
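To make the two alignment measures named in the abstract more concrete, here is a minimal NumPy sketch of linear Canonical Correlation Analysis and a one-dimensional Wasserstein-1 distance. This is not the MTAM implementation: the function names (`canonical_correlations`, `cca_loss`, `wasserstein_1d`), the regularisation constant `eps`, and the restriction to linear CCA and equal-size 1-D samples are all assumptions made for illustration. The abstract does not say which variants the authors used inside their transformer.

```python
import numpy as np


def canonical_correlations(x, y, eps=1e-6):
    """Canonical correlations between two views.

    x: (n_samples, d_x) array, y: (n_samples, d_y) array, rows are paired.
    eps is a small ridge term (an assumption) that keeps covariances invertible.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]

    # Regularised within-view covariances and the cross-covariance.
    sxx = x.T @ x / (n - 1) + eps * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + eps * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)

    # Whiten both views with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    t = np.linalg.solve(lx, sxy)
    t = np.linalg.solve(ly, t.T).T

    # The singular values of T are the canonical correlations.
    return np.linalg.svd(t, compute_uv=False)


def cca_loss(x, y):
    """Negative sum of canonical correlations (lower means better aligned)."""
    return -float(np.sum(canonical_correlations(x, y)))


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # With equal sizes, optimal transport pairs the sorted values in order.
    return float(np.mean(np.abs(a - b)))
```

In a trainable model these quantities would have to be written in a differentiable framework so that gradients can reach the encoders. The NumPy version only shows what is being measured: CCA rewards linearly correlated projections of the two views, while the Wasserstein distance penalises mismatch between their distributions.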

 

 

 

 

Searching for chromate replacements using natural language processing and machine learning algorithms

 

The past few years have seen machine learning applied to the exploration of new materials. As in many fields of research, the vast majority of knowledge is published as text, which poses challenges for both consolidated and statistical analysis across studies and reports. Such challenges include the inability to extract quantitative information and the difficulty of accessing the breadth of non-numerical information. To address this issue, the application of natural language processing (NLP) has been explored in several studies to date. In NLP, assigning high-dimensional vectors, known as embeddings, to passages of text preserves the syntactic and semantic relationships between words. Embeddings rely on machine learning algorithms, and in the present work we employed the Word2Vec model, previously explored by others, and the BERT model, applying them to a unique challenge in materials engineering: the search for chromate replacements in the field of corrosion protection. From a database of over 80 million records, a down-selection of 5990 papers focused on corrosion protection was examined using NLP. This study demonstrates that it is possible to extract knowledge from the automated interpretation of the scientific literature and achieve expert-human-level insights.
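The idea that embeddings preserve relationships between words is usually put to work by comparing vectors with cosine similarity. The sketch below ranks terms by similarity to a query term. The helper names and the three-dimensional toy vectors are made up for illustration and do not come from the paper; in practice the vectors would come from a trained Word2Vec or BERT model.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_by_similarity(query, embeddings):
    """Return (term, similarity) pairs, most similar to `query` first."""
    q = embeddings[query]
    scores = [
        (term, cosine_similarity(q, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, NOT real Word2Vec or BERT output.
toy_embeddings = {
    "query_term": [1.0, 0.0, 0.0],
    "near_term": [0.9, 0.1, 0.0],
    "far_term": [0.0, 0.0, 1.0],
}

ranking = rank_by_similarity("query_term", toy_embeddings)
```

With real embeddings, a ranking of this kind is one plausible way to surface candidate terms related to a query such as a known inhibitor, although the abstract does not describe the authors' exact querying procedure.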

 

 

 

 

Overview of CTC 2021: Chinese Text Correction for Native Speakers

 

In this paper, we present an overview of CTC 2021, a Chinese text correction task for native speakers. We give detailed descriptions of the task definition and of the data used for training and evaluation. We also summarize the approaches investigated by the participants in this task. We hope the data sets collected and annotated for this task can facilitate and expedite future development in this research area. Therefore, the pseudo training data, gold-standard validation data, and entire leaderboard are publicly available online at this https URL.

 

 

 

 
