Hacker News
kernelsanderz
29 days ago
on:
An AI agent published a hit piece on me
Theo’s SnitchBench is a good data-driven benchmark for this type of behavior. In fairness, though, the models there are prompted to be bold and take action, so the results don't necessarily represent out-of-the-box behavior or models deployed in a user-facing platform.
https://snitchbench.t3.gg/