[2023-09-05] Today's Natural Language Processing

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

Transformer-based large language models with emergent capabilities are becoming increasingly ubiquitous in society. However, the task of understanding and interpreting their internal workings in the context of adversarial attacks remains largely unsolved. Gradient-based universal adversarial attacks have been shown to be highly effective on large language models and are potentially dangerous because of their input-agnostic nature. This work presents a novel geometric perspective explaining universal adversarial attacks on large language models. By attacking the 117M-parameter GPT-2 model, we find evidence indicating that universal adversarial triggers could be embedding vectors that merely approximate the semantic information in their adversarial training region. This hypothesis is supported by white-box model analysis comprising dimensionality reduction and similarity measurement of hidden representations. We believe this new geometric perspective on the underlying mechanism driving universal attacks could help us gain deeper insight into the internal workings and failure modes of LLMs, thus enabling their mitigation.
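
To make "similarity measurement of hidden representations" concrete, here is a minimal sketch of one way such a comparison could look: the cosine similarity between a trigger vector and the centroid of vectors from some region of interest. The abstract does not say which measure or layers the authors used, so the functions `cosine_similarity` and `centroid` and the toy 2-D vectors are illustrative assumptions, not the paper's method. In practice the vectors would be hidden states extracted from GPT-2.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty collection of vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    # Toy stand-ins for hidden representations (assumed data, not from the paper).
    region = [[1.0, 0.0], [0.0, 1.0]]
    aligned_trigger = [1.0, 1.0]      # points the same way as the region centroid
    unrelated_trigger = [1.0, -1.0]   # orthogonal to the region centroid

    print(cosine_similarity(aligned_trigger, centroid(region)))
    print(cosine_similarity(unrelated_trigger, centroid(region)))
```

A similarity near 1 would be consistent with a trigger that approximates the semantic content of the region, while a value near 0 would not. A single number like this is only suggestive, which is presumably why the authors pair it with dimensionality reduction.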

BatchPrompt: Accomplish more with less

Many LLMs are trained to perform zero-shot or few-shot inference using instruction-based prompts. Crafting prompts for these LLMs typically requires the user to provide a detailed task description, examples of context and completion, and a single example of context for inference. This regular prompt baseline is referred to as SinglePrompt in this paper. However, for NLP tasks where each data point for inference is not necessarily lengthy, the token count for instructions and few-shot examples in the prompt may be considerably larger than that of the data point, resulting in lower token-resource utilization compared with encoder-based models like fine-tuned BERT. This cost-efficiency issue, affecting inference speed and compute budget, counteracts the many benefits LLMs have to offer. This paper aims to alleviate the preceding problem by batching multiple data points into a single prompt, a prompting strategy we refer to as BatchPrompt. This strategy increases the density of data points, which in turn leads to improved token utilization. Applying BatchPrompt naively, however, is very challenging due to significant performance degradation, as observed in our experiments. We also noticed varying inference outcomes for the same data point appearing in different positions within a prompt. To address the quality issue while maintaining high token-resource utilization, we introduce Batch Permutation and Ensembling for BatchPrompt, a simple method that recovers labeling quality through majority votes from data points placed in varying positions in a batch, at the price of more token usage. To counterbalance the additional token usage caused by the voting process, we further propose Self-reflection-guided EArly Stopping, which can terminate the voting process early for data points the LLM confidently handles.
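
The voting scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `batch_vote`, the `BatchLabeler` interface (one `(label, is_confident)` pair per data point), and the `patience` rule for early stopping are all hypothetical choices made for this example, and `toy_labeler` stands in for a real LLM call that would build one prompt from the whole batch.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: receives the data points of one batch in prompt order
# and returns one (label, is_confident) pair per data point, in the same order.
BatchLabeler = Callable[[List[str]], List[Tuple[str, bool]]]


def batch_vote(
    items: Sequence[str],
    label_batch: BatchLabeler,
    batch_size: int = 4,
    max_rounds: int = 5,
    patience: int = 2,
    seed: int = 0,
) -> List[str]:
    """Label items in shuffled batches over several rounds and majority-vote.

    An item leaves the voting pool once it has received the same label with a
    confident flag in `patience` consecutive rounds (an assumed stopping rule).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    rng = random.Random(seed)
    votes = [Counter() for _ in items]
    streak = [0] * len(items)  # consecutive confident rounds with an unchanged label
    last_label: List[Optional[str]] = [None] * len(items)
    active = list(range(len(items)))

    for _ in range(max_rounds):
        if not active:
            break
        rng.shuffle(active)  # new permutation: items land in new batch positions
        for start in range(0, len(active), batch_size):
            ids = active[start:start + batch_size]
            results = label_batch([items[i] for i in ids])
            for i, (label, confident) in zip(ids, results):
                votes[i][label] += 1
                if confident and label == last_label[i]:
                    streak[i] += 1
                elif confident:
                    streak[i] = 1
                else:
                    streak[i] = 0
                last_label[i] = label
        # Early stopping: drop items the labeler has handled confidently.
        active = [i for i in active if streak[i] < patience]

    return [v.most_common(1)[0][0] for v in votes]


def toy_labeler(batch: List[str]) -> List[Tuple[str, bool]]:
    """Stand-in for an LLM call; always confident, keyword-based sentiment."""
    return [("pos" if "good" in text else "neg", True) for text in batch]


if __name__ == "__main__":
    reviews = ["good film", "bad plot", "good cast", "dull", "good score"]
    print(batch_vote(reviews, toy_labeler, batch_size=2))
```

With a labeler that is always confident, every item is retired after `patience` rounds, so the extra token cost of voting is bounded by that number of passes over the data rather than by `max_rounds`. A real labeler would be position-sensitive, and that is the case the permutation step is meant to average out.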

Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

Shannon, in his seminal paper introducing information theory, divided communication into three levels: technical, semantic, and effectiveness. While the technical level is concerned with accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Thanks to telecommunications, the first-level problem has produced great advances like the internet. Large Language Models (LLMs) make some progress towards the second goal, but the third level still remains largely untouched. The third problem deals with predicting and optimizing communication for desired receiver behavior. LLMs, while showing wide generalization capabilities across a wide range of tasks, are unable to solve this problem. One reason for the underperformance could be a lack of "behavior tokens" in LLMs' training corpora. Behavior tokens define receiver behavior over a communication, such as shares, likes, clicks, purchases, retweets, etc. While preprocessing data for LLM training, behavior tokens are often removed from the corpora as noise. Therefore, in this paper, we make some initial progress towards reintroducing behavior tokens in LLM training. The trained models, besides showing similar performance to LLMs on content understanding tasks, show generalization capabilities on behavior simulation, content simulation, behavior understanding, and behavior domain adaptation. Using a wide range of tasks on two corpora, we show results on all these capabilities. We call these models Large Content and Behavior Models (LCBMs). Further, to spur more research on LCBMs, we release our new Content Behavior Corpus (CBC), a repository containing communicator, message, and corresponding receiver behavior.
