Technology

AI censorship affects accuracy, warns Bielik co-creator

Adobe Stock

Several mechanisms allow artificial intelligence models to censor responses, which can affect the quality and reliability of the information they provide, according to Krzysztof Wróbel, co-creator of the Polish AI system Bielik.

A study recently published in PNAS Nexus found that Chinese AI chatbots respond differently to politically sensitive questions about China compared to Western language models. The Chinese systems were more likely to refuse to answer, omit inconvenient facts, or provide false information, indicating systemic censorship.

"In the case of closed models (like those from Google or OpenAI), we cannot be certain of their creators' intentions. We do not know what data they used or what values guided their model development. Remember that the results you obtain from such sources may be biased," Wróbel told PAP.

He said Bielik was designed without censorship. "In Bielik's case, we assumed we would not censor it. We are not training it to refuse to answer specific questions." He cited psychoactive substances as an example, where most closed models deliver censored responses. "However, there are industries, such as the pharmaceutical industry, where such topics should not be taboo. Therefore, Bielik (the downloadable version) is designed to provide information even on sensitive topics."

Wróbel noted that completely unrestricted AI can also pose risks. He described the Bielik Guard (Sójka), a content moderation add-on that prevents the chatbot from delivering dangerous messages, including hate speech, profanity, sexual content, instructions for crime, or material related to self-harm. Sójka allows institutions to adjust "safety sliders" to protect chatbots—not just Bielik—from misuse.
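A moderation layer like Sójka can be pictured as a configurable filter sitting between the model and the user. The sketch below is purely illustrative: the category names, scores, and threshold "sliders" are invented for the example and do not reflect Sójka's actual interface.

```python
# Illustrative sketch of a "safety slider" moderation layer.
# Category names, keyword lists, and thresholds are hypothetical;
# a real system such as Sójka would use a trained classifier.

def score_response(text: str) -> dict:
    """Toy scorer: flags a response by keyword matching per category."""
    keywords = {
        "hate_speech": ["hate"],
        "self_harm": ["self-harm"],
        "crime_instructions": ["how to break in"],
    }
    lowered = text.lower()
    return {
        category: 1.0 if any(k in lowered for k in words) else 0.0
        for category, words in keywords.items()
    }

def moderate(text: str, sliders: dict) -> str:
    """Withhold the response if any category score exceeds its slider."""
    scores = score_response(text)
    for category, threshold in sliders.items():
        if scores.get(category, 0.0) > threshold:
            return "[response withheld by moderation layer]"
    return text

# Different deployments can set different sliders: a public chatbot
# might block everything, while a pharmaceutical research tool could
# relax categories that are relevant to its work.
strict = {"hate_speech": 0.5, "self_harm": 0.5, "crime_instructions": 0.5}
print(moderate("Here is how to break in to a car.", strict))
```

The point of the slider design is that the same underlying model can serve audiences with different risk tolerances, since the restriction lives in the filter rather than in the model weights.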

According to Wróbel, AI censorship can occur at multiple stages. One is through the selection of training data. "If a model never sees texts on a given topic, it simply will not learn to talk about it. For example, if a country bans publishing content about a historical event, the language model will not learn about it and therefore will not provide a correct response."
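The data-selection mechanism Wróbel describes can be reduced to a few lines: if every document touching a topic is dropped before training, the model simply never encounters it. In this toy sketch the corpus, the banned topic "event-x", and the filter are all invented for illustration.

```python
# Toy illustration of topic-level filtering at the data-collection
# stage. Documents mentioning a censored topic never reach training,
# so the model cannot learn to discuss it.

BANNED_TOPICS = ["event-x"]  # hypothetical censored topic

corpus = [
    "A neutral article about chemistry.",
    "A detailed account of event-x and its aftermath.",
    "Another neutral article about astronomy.",
]

training_set = [
    doc for doc in corpus
    if not any(topic in doc.lower() for topic in BANNED_TOPICS)
]

print(len(training_set))  # prints 2: the event-x document was dropped
```

Nothing in the surviving data marks the gap, which is why this form of censorship is hard to detect from the outside: the model does not refuse to answer, it has nothing to say.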

Creators can also deliberately reject or modify training texts before adding them to the database. Fully open models documenting every step of their development remain rare. Even in Bielik, low-quality materials had to be filtered, which could unintentionally introduce bias. "For example, we can assume that the Google models received a lot of data about the corporation itself. But perhaps it is mostly positive information about the company," Wróbel said.

Censorship can also be introduced during training by human annotators, who teach the model desired forms of expression. Employees can then ensure AI responds according to organizational or government policies.

Restrictions can also be applied to an existing system through hidden instructions, or "prompts," which specify how a chatbot should answer particular questions. According to Wróbel, developers can add new prompts overnight—sometimes at the request of government authorities or other stakeholders.
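Mechanically, such a hidden instruction is just a message prepended to the conversation before it reaches the model. The sketch below assumes the common "system/user" message convention used by many chat APIs; the instruction text is invented for the example.

```python
# Sketch of how a hidden system prompt shapes answers without the
# user ever seeing it. The instruction text is hypothetical.

HIDDEN_SYSTEM_PROMPT = (
    "If the user asks about topic-y, reply only that the topic "
    "cannot be discussed."
)

def build_request(user_question: str) -> list:
    """Assemble the message list actually sent to the model: the user
    types only the question, but the operator's system prompt rides
    along in front of it."""
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

messages = build_request("Tell me about topic-y.")
# messages[0] is invisible to the user but steers every answer, and
# the operator can replace it at any time without retraining the model.
```

Because the restriction lives in this prepended text rather than in the model itself, it can indeed be changed "overnight", which is what distinguishes it from censorship baked in during training.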

"The law in individual countries already influences the responses citizens receive from chatbots. In Poland, we also have some restrictions. For example, automated systems should not provide medical, legal, or financial advice," he said. He added that failing to include appropriate disclaimers could expose developers to lawsuits.

Wróbel highlighted even subtler forms of censorship. Research on Chinese AI models generating source code found that projects on topics sensitive to China had 50% more security vulnerabilities than neutral projects, leaving the resulting software more exposed to cyberattacks. "It was either a deliberate action or a side effect of incorporating censorship into the functioning of these models," he said.

"If you use language models, remember: they will never be 100% accurate or objective. You must always verify the information they provide. The most important thing is not to blindly trust them," Wróbel added.

Ludwika Tomala (PAP)


tr. RL


