[2023-05-27] Today's NLP

UNITE: A Unified Benchmark for Text-to-SQL Evaluation

A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark (Yu et al., 2018), we introduce ~120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g., constrained beam search) can improve performance for both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges towards compositional generalization and robustness issues, which these SOTA models cannot address well.

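A practical text-to-SQL system has to generalize across many kinds of natural language questions, database schemas it has never seen, and new SQL query structures. To evaluate text-to-SQL systems comprehensively, we introduce UNITE, a UNIfied benchmark for Text-to-SQL Evaluation. It is built from publicly available text-to-SQL datasets and contains natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared with the widely used Spider benchmark (Yu et al., 2018), we add about 120K examples and triple the number of SQL patterns, for example comparative and boolean questions. We carry out a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on the new benchmark and show that: 1) Codex works surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g., constrained beam search) can improve performance in both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves Seq2Seq models. More importantly, our benchmark poses key challenges in compositional generalization and robustness that these SOTA models cannot yet handle well.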
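The abstract counts "SQL patterns" without saying how a pattern is defined. As a rough illustration only, the sketch below assumes a pattern is a query with its literal values masked, so that queries differing only in constants fall into the same bucket. The function name `sql_pattern`, the `<val>` placeholder, and the sample queries are all hypothetical and are not taken from the UNITE paper.

```python
import re
from collections import Counter


def sql_pattern(query: str) -> str:
    """Hypothetical normalizer: mask literal values so that queries
    differing only in constants map to the same pattern string."""
    q = query.strip().rstrip(";")
    # Mask quoted literals. Assumption: double quotes hold values, not
    # identifiers, which is not true for every SQL dialect.
    q = re.sub(r"'[^']*'|\"[^\"]*\"", "<val>", q)
    # Mask standalone numeric literals (digits inside names like t1 are kept).
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "<val>", q)
    # Collapse whitespace and ignore keyword casing.
    q = re.sub(r"\s+", " ", q)
    return q.lower()


# Made-up example queries, for illustration only.
queries = [
    "SELECT name FROM singer WHERE age > 30",
    "select name from singer where age > 45;",
    "SELECT name FROM singer WHERE country = 'France'",
]

patterns = Counter(sql_pattern(q) for q in queries)
for pattern, count in patterns.items():
    print(count, pattern)
```

Under this assumed definition the first two queries share one pattern and the third forms its own. A real benchmark would more likely normalize over a parsed query tree than over raw text, so this is best read as a way to build intuition for what "more patterns" means.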

Give Me More Details: Improving Fact-Checking with Latent Retrieval

Evidence plays a crucial role in automated fact-checking. When verifying real-world claims, existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine. Such methods ignore the challenges of collecting evidence and may not provide sufficient information to verify real-world claims. Aiming at building a better fact-checking system, we propose to incorporate full text from source documents as evidence and introduce two enriched datasets. The first one is a multilingual dataset, while the second one is monolingual (English). We further develop a latent variable model to jointly extract evidence sentences from documents and perform claim verification. Experiments indicate that including source documents can provide sufficient contextual clues even when gold evidence sentences are not annotated. The proposed system is able to achieve significant improvements upon best-reported models under different settings.

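Evidence plays an important role in automated fact-checking. When verifying real-world claims, existing fact-checking systems either assume that the evidence sentences are provided or rely on the search snippets returned by a search engine. These approaches overlook how hard it is to collect evidence and may not supply enough information to verify real-world claims. To build a better fact-checking system, we propose using the full text of source documents as evidence and introduce two enriched datasets. The first is a multilingual dataset, and the second is monolingual (English). We also develop a latent variable model that jointly extracts evidence sentences from documents and performs claim verification. Experiments show that including source documents can provide enough contextual clues even when gold evidence sentences are not annotated. The proposed system achieves significant improvements over the best-reported models in a range of settings.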
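The abstract does not spell out the form of the latent variable model. One generic way to read "jointly extract evidence and verify" is to treat the choice of evidence sentence as a latent variable z and marginalize over it: p(y | claim, doc) = sum over z of p(z | claim, doc) * p(y | claim, z). The toy sketch below only illustrates that marginalization. The function name `verify`, the label names, and every probability are made up, and the authors' actual model may differ.

```python
def verify(retrieval_probs, verdict_probs):
    """Marginalize the verdict over a latent evidence choice z:
    p(y | claim, doc) = sum_z p(z | claim, doc) * p(y | claim, z).
    Generic formulation, assumed for illustration."""
    labels = verdict_probs[0].keys()
    return {
        y: sum(pz * pv[y] for pz, pv in zip(retrieval_probs, verdict_probs))
        for y in labels
    }


# Made-up numbers: three candidate sentences from one source document.
retrieval_probs = [0.7, 0.2, 0.1]  # p(z | claim, doc), sums to 1
verdict_probs = [                  # p(y | claim, sentence z)
    {"supported": 0.9, "refuted": 0.1},
    {"supported": 0.4, "refuted": 0.6},
    {"supported": 0.5, "refuted": 0.5},
]

marginal = verify(retrieval_probs, verdict_probs)
prediction = max(marginal, key=marginal.get)
print(marginal, prediction)
```

In this reading no sentence needs a gold evidence label: the retrieval weights can be learned from the verdict signal alone, which fits the abstract's remark that full documents help even when gold evidence sentences are not annotated.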

What about em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this "reality check", we study how three commercial MT systems translate 3rd-person pronouns. Concretely, we compare the translations of gendered vs. gender-neutral pronouns from English to five other languages (Danish, Farsi, French, German, Italian), and vice versa, from Danish to English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Similarly, gender neutrality is often not preserved. By surveying the opinions of affected native speakers from diverse languages, we provide recommendations to address the issue in future MT research.

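As the use of 3rd-person pronouns changes to take in new forms such as neopronouns, more research on identity-inclusive NLP is needed. Exclusion is especially harmful in machine translation (MT), one of the most popular NLP applications. Incorrect pronoun translations can discriminate against marginalized groups, for example non-binary individuals (Dev et al., 2021). In this "reality check", we study how three commercial MT systems translate 3rd-person pronouns. Specifically, we compare translations of gendered and gender-neutral pronouns from English into five other languages (Danish, Farsi, French, German, Italian) and, in the opposite direction, from Danish into English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Likewise, gender neutrality is often not preserved. Drawing on a survey of affected native speakers of various languages, we offer recommendations for addressing the problem in future MT research.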
