Polish language not ‘superior’ for AI prompting, researchers say

Adobe Stock

Claims that Polish is the “best language for prompting” AI models are incorrect, according to the researchers behind OneRuler, a multilingual benchmark that evaluates how well AI models process very long texts.

The study, presented in October at the Conference on Language Modeling (COLM), compared AI performance across 26 languages.

While models performed slightly better on average with Polish, the differences between Polish and English were not statistically significant, and the study did not conclude that any language is superior, says Marzena Karpińska, co-author of the research.

She told the Polish Press Agency: “It is not true. We have not researched this at all. We have created a tool for diagnosing language models, checking how well they extract information from very long texts.”

The benchmark tasked models with finding specific sentences embedded in books, a process analogous to using CTRL+F in text editors or browsers.

“We expected to achieve 100 percent accuracy in many languages. They did not. We noticed that the models started to go wrong, especially when we reminded them in the prompt that the searched text might not contain the answer. And then the model should write that there is no answer,” Karpińska said.
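The retrieval task described above can be illustrated with a minimal sketch. This is not the OneRuler code: the function names, the prompt wording and the "none" sentinel are all assumptions made for illustration, and no model is actually called. The sketch only shows how a sentence can be hidden in a long text, how the prompt can remind the model that the answer may be missing, and how a response might be scored.

```python
import random

NONE_ANSWER = "none"  # hypothetical sentinel meaning "the text has no answer"


def build_haystack(filler, needle=None, seed=0):
    """Join filler sentences into one text, inserting the needle
    at a random position when one is given."""
    rng = random.Random(seed)
    sentences = list(filler)
    if needle is not None:
        sentences.insert(rng.randrange(len(sentences) + 1), needle)
    return " ".join(sentences)


def build_prompt(text, question, allow_none):
    """Build the prompt; optionally remind the model that the
    answer may be absent (assumed wording, not the benchmark's)."""
    prompt = f"{text}\n\nQuestion: {question}\n"
    if allow_none:
        prompt += f"If the text does not contain the answer, reply '{NONE_ANSWER}'.\n"
    return prompt


def score(response, expected):
    """Return 1 for a correct response, else 0.
    expected=None means the model should have replied 'none'."""
    answer = response.strip().lower()
    if expected is None:
        return int(answer == NONE_ANSWER)
    return int(expected.lower() in answer)
```

In this sketch, a model that invents an answer when the needle is absent, or replies "none" when the needle is present, scores 0, which is the kind of error Karpińska describes.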

Another task required compiling lists of the most common words in the texts. Performance on this task dropped significantly, likely because it required models to use the full context rather than simply locate a single piece of information.
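For comparison, a conventional program solves the word-frequency task by reading every word exactly once, which shows why the whole context matters here. The sketch below is an illustration only, assuming a simple definition of a "word" as a run of letters or digits; the function name is hypothetical and the benchmark's own tokenisation may differ.

```python
import re
from collections import Counter


def most_common_words(text, k=10):
    """Return the k most frequent words in the text, most frequent first.
    Every word must be counted, so no part of the text can be skipped."""
    words = re.findall(r"\w+", text.lower())
    return [word for word, _ in Counter(words).most_common(k)]
```

Unlike the search task, there is no single location that holds the answer: skipping even one chapter can change the ranking.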

The choice of books for each language may also have influenced the results. A different work was used for each language: for example, the third volume of Nights and Days for Polish, Don Quixote for Spanish, Little Women for English, and The Magic Mountain for German.

Karpińska said this could explain why Polish performed slightly better in the benchmark.

“There are so many different factors in this study that we certainly cannot conclude on its basis that Polish is the best language for ‘prompting’,” she said.

She also warned that the results highlight the limitations of current language models.

“People upload heaps of documents into ChatGPT and ask questions about the content. And we must remember that language models still have very limited text processing capabilities. Sometimes they are incredibly good, and a moment later – they make huge mistakes. You have to ask again and verify with a different model. And above all, you need to be careful what documents you upload into the models, especially when it comes to sensitive content and privacy,” Karpińska said.

PAP - Science in Poland, Ludwika Tomala

tr. RL

 
