Alignment faking in large language models
