[2023-02-07] Today's Natural Language Processing

Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs

Knowledge graphs (KGs) have become effective knowledge resources in diverse applications, and knowledge graph embedding (KGE) methods have attracted increasing attention in recent years. However, conventional KGE methods still struggle to handle entities or relations that were not seen during training and first appear at test time. Much effort has been made in various fields of KGs to address this problem. In this paper, we use a set of general terminologies to unify these methods and refer to them as Knowledge Extrapolation. We comprehensively summarize these methods, classified by our proposed taxonomy, and describe their correlations. Next, we introduce the benchmarks and compare these methods along aspects that the taxonomy does not reflect. Finally, we suggest some potential directions for future research.
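To make the unseen-entity problem concrete, here is a minimal sketch. It is not a method taken from the survey. It assumes a TransE-style model in which head + relation ≈ tail, and it estimates an embedding for an entity missing from the embedding table by averaging over its known neighbors. The entity names, vectors, and the `extrapolate_tail` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Toy embeddings standing in for what a conventional KGE model learned
# for elements seen during training (values are made up).
entity_emb: Dict[str, Vector] = {
    "Paris": [1.0, 0.0],
    "Lyon": [0.0, 1.0],
}
relation_emb: Dict[str, Vector] = {
    "located_in": [1.0, 1.0],
}


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def extrapolate_tail(support: List[Tuple[str, str]]) -> Vector:
    """Estimate an embedding for an unseen tail entity.

    `support` holds (head, relation) pairs that point at the unseen entity.
    Under the assumed TransE-style rule, each pair yields one estimate
    head + relation, and the estimates are averaged.
    """
    estimates = [add(entity_emb[h], relation_emb[r]) for h, r in support]
    return mean(estimates)


# "France" never appeared in training, so a plain table lookup would fail.
assert "France" not in entity_emb

france = extrapolate_tail([("Paris", "located_in"), ("Lyon", "located_in")])
print(france)  # [1.5, 1.5]
```

Neighbor aggregation is only one of several ways to produce embeddings for unseen elements. It is used here because it fits in a few lines.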


Efficient Domain Adaptation for Speech Foundation Models

Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted great interest in the research community. Benefiting from diverse data sources such as different modalities, languages, and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we present a pioneering study towards building an efficient solution for FM-based speech recognition systems. We adopt the recently developed self-supervised BEST-RQ for pretraining, and propose joint finetuning with both source and unsupervised target domain data using JUST Hydra. The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data. On a large-scale YouTube and Voice Search task, our method is shown to be both data and model parameter efficient. It achieves the same quality with only 21.6M supervised in-domain data and 130.8M finetuned parameters, compared to the 731.1M model trained from scratch on an additional 300M supervised in-domain data.
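The efficiency claim can be checked with simple arithmetic. The sketch below uses only the figures quoted in the abstract. The variable names are illustrative, and it assumes the two data figures share the same unit, which the abstract does not state.

```python
# Figures quoted in the abstract, in millions.
finetuned_params_m = 130.8   # parameters finetuned by the proposed method
scratch_params_m = 731.1     # parameters of the model trained from scratch
adapted_data_m = 21.6        # supervised in-domain data used by the method
baseline_extra_data_m = 300  # additional supervised in-domain data for the baseline

# Share of the from-scratch model's parameters that are finetuned.
param_fraction = finetuned_params_m / scratch_params_m

# Supervised data used, relative to the baseline's additional data.
data_fraction = adapted_data_m / baseline_extra_data_m

print(f"finetuned parameters: {param_fraction:.1%} of the from-scratch model")
print(f"supervised data: {data_fraction:.1%} of the baseline's additional data")
```

By this reading, the method finetunes roughly 18% of the parameter count of the from-scratch model and uses about 7% as much supervised in-domain data as the baseline's additional set.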


A Case Study for Compliance as Code with Graphs and Language Models: Public release of the Regulatory Knowledge Graph

The paper presents a study on using language models to automate the construction of an executable Knowledge Graph (KG) for compliance. It focuses on Abu Dhabi Global Market regulations and taxonomy, and involves manually tagging a portion of the regulations and training BERT-based models, which are then applied to the rest of the corpus. Coreference resolution and syntax analysis were used to parse the relationships between the tagged entities and to form a KG stored in a Neo4j database. The paper states that the use of machine learning models released by regulators to automate the interpretation of rules is a vital step towards compliance automation, demonstrates the concept by querying with Cypher, and states that the produced sub-graphs combined with Graph Neural Networks (GNN) will achieve expandability in judgment automation systems. The graph is open sourced on GitHub to provide structured data for future advancements in the field.
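As a rough illustration of what querying such a graph looks like, the sketch below stores a few triples in memory and matches a pattern against them. It does not use Neo4j or the released graph. The entity names, relation names, and the `match` helper are hypothetical, and the Cypher in the comment assumes a schema invented for this example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical triples; they are NOT taken from the released graph.
triples: List[Triple] = [
    ("Authorised Person", "MUST_SUBMIT", "Annual Report"),
    ("Authorised Person", "MUST_MAINTAIN", "Capital Resources"),
    ("Fund Manager", "MUST_SUBMIT", "Annual Report"),
]


def match(
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the given fields; None is a wildcard."""

    def ok(pattern: Optional[str], value: str) -> bool:
        return pattern is None or pattern == value

    return [
        t for t in triples
        if ok(subject, t[0]) and ok(relation, t[1]) and ok(obj, t[2])
    ]


# Roughly analogous Cypher, under the invented schema:
#   MATCH (e:Entity)-[:MUST_SUBMIT]->(d:Document {name: "Annual Report"})
#   RETURN e.name
submitters = [s for s, _, _ in match(relation="MUST_SUBMIT", obj="Annual Report")]
print(submitters)  # ['Authorised Person', 'Fund Manager']
```

A graph database adds indexing, multi-hop path patterns, and persistence on top of this basic idea of matching a pattern against stored relationships.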

