Quite interesting. As you mention, the benchmark outlines only what the models know ‘about’ teaching, not whether they can actually do the teaching. I would love to see a study of that, perhaps limiting it to the Top 3 candidates from this survey.
Thank you for reading and for leaving a comment! Yes, we do need to be aware of this benchmark's limitations. However, do have a look at the questions (the dataset is linked in the article): they're very practice-oriented, so it's actually a solid base. At the end of the day, this is exactly what we also ask of teachers (the questions come from an official teacher certification training in Chile, which is quite strict!). Plus, having that leaderboard really is a good compromise, in my opinion.