LLMs are in trouble

· algieg's blog


An Anthropic paper reveals that a small number of samples can poison large language models (LLMs) of any size, contradicting prior beliefs that a significant proportion of training data was needed for compromise. This implies a greater vulnerability for LLMs against malicious attacks, with potential implications for data integrity and model behavior.

LLM Poisoning Overview #

How LLMs Work and Data Collection #

Denial of Service (DoS) Attack Example #

Training Data and Attack Success #

Implications and Potential Malicious Applications #

Limitations and Future Research #

last updated: