2 Comments
Harold Toups

Quite interesting. As you mention, the benchmark outlines only what the models know ‘about’ teaching, not whether they can actually do the teaching. I would love to see a study of that, perhaps limited to the top three candidates from this survey.

Javier Santana

Thank you for reading and for dropping a comment! Yes, indeed, we need to be aware of the limitations of this benchmark. However, you should have a look at the questions (the dataset is linked in the article): they're very praxis-oriented, so it's actually a very solid base. At the end of the day, consider that this is exactly what we also ask of teachers (the questions come from an official teacher certification training in Chile, which is quite strict!). Plus, having that leaderboard really is a good compromise, in my opinion.
