Discussion about this post

Harold Toups

Quite interesting. As you mention, the benchmark outlines only what the models know ‘about’ teaching, not whether they can actually do the teaching. I would love to see a study of that, perhaps limiting it to the Top 3 candidates from this survey.
