My first instinct was creativity. I had models generate poems, short stories, and metaphors: the kind of rich, open-ended output that feels like it should reveal deep differences in cognitive ability. I used an LLM-as-judge to score the outputs, but the results were pretty bad. I managed to fix the LLM-as-judge with some engineering, and the scoring system turned out to be useful later for other things, so here it is: