
RT @johnjnay: LLM Alignment Limitations

-There exist prompts that can trigger an LLM into exhibiting practically any bad behavior, with the probability increasing with prompt length
-RLHF can make things worse
-Mimicking personas that demonstrate bad behavior is an efficient way to evoke it

t.co/bqB5uVfNxQ t.co/e7MMSRwXrt

Mastodon

This is 海行's personal instance.
Nice to meet you.

Homepage
http://soysoftware.sakura.ne.jp/