MANIFOLD
Will Anthropic’s next Sonnet model exceed 65% on Terminal-Bench?
Dec 31
3% chance

I will be looking to https://www.tbench.ai/ for evals, using the Terminus 2 scaffolding.

Only counts if the number in the model’s name increments, so a new Claude Sonnet 4.5 checkpoint does not count.

If a new Sonnet model is not released by 2027, this will resolve N/A.

@JaundicedBaboon I don't think anybody is going to test Sonnet 4.6 on Terminal-Bench. Anthropic reported Sonnet 4.6's Terminal-Bench 2.0 score as 59.1%, but nobody has submitted its results to the leaderboard yet. I don't know whether you think Anthropic's results are good enough or whether you want to keep waiting for somebody to submit results to the leaderboard.
