Polish window for ChatGPT: Scientists invite users to collaborate on creating a Polish chatbot
Researchers from Wrocław are working on the Polish equivalent of ChatGPT. In order to develop it, they need as much data as possible about the conversations of Polish users with artificial intelligence. They request that users who engage in conversations with ChatGPT use the Polish window they have prepared.
ChatGPT was made available in November 2022 by the American company OpenAI. It is an AI-based content generator - a bot that can communicate in natural language. The tool (which also speaks Polish) can answer questions, translate documents into various languages, proofread and edit texts, summarize and analyse scientific papers, suggest solutions to various problems, write essays and scripts, correct errors in program code, and search databases. Its applications are still being discovered.
'We estimate that up to 70% of people in Poland have not had contact with this chatbot yet. For many people, an insurmountable difficulty is the fact that ChatGPT does not have a Polish interface. In addition, in order to use the chatbot, you need to log in with a Google account or provide a phone number. This is a barrier that many people are unable to overcome. We are addressing these problems', Dr. Jan Kocoń from the CLARIN-PL project of the Wrocław University of Technology explains in an interview with PAP.
The team from Wrocław has prepared a Polish dialog box for ChatGPT. The idea is very simple: users talk to ChatGPT through a Polish website and Polish researchers also have access to these conversations. Thanks to this, Polish users have easier access to the American application, and researchers gain a database of chat queries and information about what is missing in ChatGPT's answers.
The CLARIN-PL team's website is prepared in Polish. The first few questions can be asked immediately, without logging in. Users who log in (registration is free) get higher limits than with free access to ChatGPT. In this way, the researchers want to encourage Polish users to let them take a peek at their conversations with artificial intelligence.
'We are working on the Polish equivalent of ChatGPT. For this solution to have a chance to exist, we need to collect as much information as possible about how Polish users use such chatbots. ChatGPT was created abroad and its creators did not necessarily focus on problems that would be important for Polish users', says Dr. Kocoń.
In his opinion, ChatGPT's command of Polish is much worse than its command of, for example, English. It makes language mistakes and does not always understand queries formulated in Polish. This can be seen, for example, when the chatbot is asked to write a poem or song, the researcher points out: in English the result is quite good, but in Polish the text usually doesn't even rhyme.
'We have no information on how the OpenAI model was created, but our main suspicion is that it has "seen" relatively little Polish compared to other languages. The model most likely uses cross-lingual knowledge transfer based on a translation database', the scientist says.
The researcher explains that creating this kind of artificial intelligence consists of two main stages. First, you need a large database, in this case of texts, from which the model learns the language. Then you need a database of queries and answers, from which the AI learns to generate the desired content.
The problem is not only that ChatGPT saw few texts in Polish at the stage of creating the language model, but also that it saw few Polish instructions and queries at the training stage.
That is why scientists from Wrocław want to develop a model that will have the Polish language at its core from the very beginning. 'We can't compete with OpenAI in a language like English, but when it comes to Slavic languages - we have a lot to offer. We have a very large database of corpus texts (used for linguistic research) - in Polish. Based on them, we are able to create a large language model. And then we want to tune it on the instructions we get from users', explains the scientist.
'The most important thing for us is that users report various types of irregularities resulting from the use of the chatbot through our window', explains Jan Kocoń.
If the chatbot doesn't respond as expected, you can press the sad face below the dialog window. This alone is enough for researchers as a signal to look at the bot's response and check what is wrong.
After each answer rating, a window opens in which users can enter their comments - for example, draw attention to language errors, to the fact that ChatGPT made up some information, or to an offensive reply. They can even enter the answer they would have found satisfactory. This way, users will not only help Polish researchers, but also train their critical thinking and exercise limited trust in artificial intelligence.
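For readers curious how such feedback might be stored, the following is only an illustrative sketch in Python, not the CLARIN-PL team's actual design. The class name `FeedbackRecord`, its fields, and both helper functions are hypothetical; they simply mirror the elements described above (a negative rating, a free-text comment, and an optional answer proposed by the user).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """One rated chatbot reply (hypothetical structure, for illustration only)."""
    prompt: str                              # what the user asked
    response: str                            # what the chatbot answered
    sad_face: bool = False                   # True if the user pressed the sad face
    comment: str = ""                        # e.g. "language error", "made-up fact"
    suggested_answer: Optional[str] = None   # the answer the user would have accepted


def needs_review(record: FeedbackRecord) -> bool:
    """A negative rating alone is treated as a signal for researchers to look."""
    return record.sad_face


def to_instruction_pair(record: FeedbackRecord) -> Optional[dict]:
    """Turn a user's corrected answer into a query/answer pair, if one was given."""
    if record.suggested_answer:
        return {"instruction": record.prompt, "output": record.suggested_answer}
    return None


record = FeedbackRecord(
    prompt="Napisz rymowany wiersz o Wrocławiu.",
    response="(reply that does not rhyme)",
    sad_face=True,
    comment="The text does not rhyme.",
)
print(needs_review(record))         # this record is flagged for review
print(to_instruction_pair(record))  # no suggested answer was provided
```

In this sketch, a record with a user-supplied answer would yield a query/answer pair of the kind the researchers say they need for tuning, while a bare sad face only flags the reply for manual inspection.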
Comments and reactions of Internet users will be reviewed by a Polish team of researchers. The inquiries and conclusions from these conversations will be used in the work on the Polish bot. 'For a good chatbot to be developed, it needs a lot of instructions. This is what OpenAI did - they hired a lot of people who talked to the bot and corrected its responses', the researcher says. As a result, the chatbot learned which content was desirable, and which content should not be generated.
The idea is to teach the artificial intelligence that there is a certain class of questions that the chatbot cannot answer directly (this includes content that may facilitate committing a crime, violate privacy or offend religious feelings). Someone had to manually prepare standard answers for such a class of queries, and the model was tuned to these instructions.
Polish researchers do not have the funds of the American company. They are not able to predict all possible uses of the chatbot and check whether it works well. Instead, they will use the interactions of Polish users with ChatGPT to train their own model.
The researcher reports that his university - in collaboration with the Wrocław Centre for Networking and Supercomputing - is in the process of purchasing computing equipment that will be useful in developing research on Polish artificial intelligence. The budget is large, approx. PLN 80 million, but the equipment will be delivered to scientists only next year. However, researchers have already prepared an additional budget for access to computing power. 'We don't want to idly wait for the equipment, we are already preparing the data on which we will train our model', he says.
The scientist also appeals to researchers and specialists in various fields for help in the work on Polish artificial intelligence. 'If we want to have Polish high-margin technologies, we must develop and research them. And without data, we will not move forward', he emphasises.
The members of the research team are: Bartosz Walkowiak, Dawid Banach, Tomasz Walkowiak, Magdalena Drewniak, Jan Wieczorek, Paweł Kazienko, Tomasz Naskręt, Jan Kocoń, Maciej Piasecki. (PAP)
PAP - Science in Poland, Ludwika Tomala