Abstract: Large Language Models (LLMs) are vulnerable to adversarial prompt attacks, which can lead to “jailbreaking” and the failure of safety alignment, resulting in the generation of harmful ...