
Today's Natural Language Processing (572)

[2023-07-04] Today's NLP: Biomedical Language Models are Robust to Sub-optimal Tokenization. As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise. This is often achieved by concatenating meaningful biomedical morphemes to create new semantic units. Nevertheless, most modern biomedical language mode.. 2023. 7. 4.
[2023-07-03] Today's NLP: Tokenization and the Noiseless Channel. Subword tokenization is a key part of many NLP pipelines. However, little is known about why some tokenizer and hyperparameter combinations lead to better downstream model performance than others. We propose that good tokenizers lead to efficient channel usage, where the channel is the means by which some input is conveyed to the model and efficiency.. 2023. 7. 3.
[2023-07-02] Today's NLP: LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT. We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust .. 2023. 7. 2.
[2023-07-02] Today's NLP: Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications. Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0.. 2023. 7. 2.