Polish scientists develop language model for generating long texts
Polish researchers have developed LongLLaMA, a large language model based on OpenLLaMA, an open-source reproduction of Meta's LLaMA model. It is freely available for download.
Large, open-source language models allow researchers to do advanced work. They can be used for all the tasks that chatbots already help people with. This includes, for example, generating text, editing text, chatting with users, creating summaries and translating.
The model could potentially handle text 64 times longer than ChatGPT can, its creators say.
LongLLaMA was developed by Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek and Piotr Miłoś - researchers affiliated with IDEAS NCBR, the University of Warsaw and the Polish Academy of Sciences - together with Yuhuai Wu, a co-founder of Elon Musk's startup xAI, and Henryk Michalewski, affiliated with the University of Warsaw and Google DeepMind.
'LongLLaMA is a large "Polish" language model, available to everyone on the Internet. It can handle 8,000 characters at a time, which is approximately 30-50 pages of text, and in the case of some tasks much more, up to 256,000 characters, although this is only a technical result,' says the team leader, Dr. Piotr Miłoś.
When Meta, the owner of Facebook, released its LLaMA model, and the open-source reproduction OpenLLaMA followed, scientists from around the world, including those working under the supervision of Dr. Miłoś, started modifying these models.
'Our LongLLaMA is capable of processing much larger contexts than was previously possible, which means it can "eat" much more text in one bite,' says Dr. Miłoś.
He explains that LongLLaMA can process very long input data. As a result, it generates more consistent and accurate answers than other models.
LongLLaMA can handle any amount of context without truncating or padding it, as demonstrated by passkey tests.
The researchers checked whether, after receiving a very long prompt (a complex command), LongLLaMA would be able to remember a passkey given at the beginning. OpenLLaMA could only handle a prompt of 2,000 tokens, and in longer contexts its effectiveness dropped to zero. Meanwhile, LongLLaMA maintained 94.5 percent accuracy after receiving a prompt of 100,000 tokens and 73 percent accuracy after receiving 256,000 tokens.
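A passkey test of this kind can be sketched roughly as follows. The filler sentence, the prompt wording and the function names are illustrative assumptions, not the researchers' actual benchmark. The call to the model is left out; only the construction of the prompt and the scoring of answers are shown.

```python
import random

# Assumption: any repetitive, irrelevant text can serve as padding.
FILLER = "The grass is green. The sky is blue. The sun is yellow. "

def build_passkey_prompt(passkey, n_filler):
    """Put the passkey at the start, then pad with repeated filler text."""
    header = f"The pass key is {passkey}. Remember it. "
    question = "What is the pass key?"
    return header + FILLER * n_filler + question

def accuracy(answers, passkey):
    """Fraction of answers that contain the correct passkey."""
    if not answers:
        return 0.0
    hits = sum(str(passkey) in answer for answer in answers)
    return hits / len(answers)

# Fixed seed so the sketch is reproducible.
passkey = random.Random(0).randint(10000, 99999)
prompt = build_passkey_prompt(passkey, n_filler=1000)
print(len(prompt.split()), "words in the prompt")

# Hypothetical answers standing in for real model output.
answers = [f"The pass key is {passkey}.", "I do not know."]
print("accuracy:", accuracy(answers, passkey))
```

Increasing n_filler lengthens the prompt, so the same check can be repeated at different context lengths to see where a model starts to forget the passkey.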
The model can currently generate coherent texts of 8,000 characters and potentially up to 256,000 characters, which would significantly outperform models including ChatGPT, the creators say. It consumes relatively little energy - a single processor is enough to use LongLLaMA - and works very quickly.
'How can you visualise the difference? If, for simplicity, we assume that 1 character is 1 word, then 2,000 words make up an article of approximately 7 pages. 256,000 words is approximately the length of the novel Harry Potter and the Order of the Phoenix (257,000 words) or Ulysses (265,000 words),' the scientists say.
'ChatGPT is a commercial product. It has been optimised for convenient use. Models such as LongLLaMA generate rather raw information on which something can be built, e.g. a text analysed or code generated,' says Dr. Miłoś.
Open source software can be modified by IT specialists around the world, which distinguishes it from the ChatGPT software, which has not been made publicly available, although it is known to be based on the Transformer architecture.
The authors of the Polish model explain that the Transformer is a type of neural network architecture that analyses text to capture complex connections between words across many layers, learning patterns from huge amounts of data.
This technology has revolutionized natural language processing, enabling chatbots to generate text, translate, chat with users and perform many other tasks at a level previously unavailable to artificial intelligence.
Dr. Miłoś explains that when we ask a Transformer-based chatbot a question, it converts the text into tokens. These are pieces of information, usually between one character and one word in length. In the sentence 'In 2023, out of the blue, chatbots changed our lives.', a chatbot may see, for example, nine words, the number 2023, two commas and a period. By dividing text into tokens, artificial intelligence can process information effectively.
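The splitting described above can be illustrated with a toy tokenizer. This is only a sketch under simplifying assumptions: real chatbots use learned subword vocabularies rather than a regular expression, and the function name toy_tokenize is hypothetical.

```python
import re

def toy_tokenize(text):
    """Split text into words, numbers and punctuation marks.

    Assumption: a regular expression stands in for the learned
    subword vocabulary that a real model would use.
    """
    # \d+ matches a run of digits, [^\W\d_]+ a run of letters,
    # and [^\w\s] any single punctuation mark.
    return re.findall(r"\d+|[^\W\d_]+|[^\w\s]", text)

sentence = "In 2023, out of the blue, chatbots changed our lives."
tokens = toy_tokenize(sentence)
print(tokens)
print(len(tokens), "tokens")
```

With this scheme the example sentence yields 13 tokens: nine words, the number 2023, two commas and a period.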
However, the number of tokens that a chatbot can accept is limited. In the case of ChatGPT 3.5, the limit is 4,096 tokens, for OpenLLaMA it is about 2,000, and for Google Bard about 1,000.
Therefore, when you ask the chatbot a long question or provide a lot of information, you may need to cut or omit some parts to stay within the limit. Most existing chatbots cannot analyse an entire book, long conversation or article.
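A minimal sketch of what cutting a prompt to fit such a limit can look like. It assumes, for simplicity, that one whitespace-separated word counts as one unit of the limit. The helper name truncate_to_limit and the choice to keep the end of the prompt are illustrative assumptions, not a description of how any particular chatbot works.

```python
def truncate_to_limit(prompt, limit):
    """Keep at most `limit` words from the end of the prompt.

    Returns the shortened prompt and the number of words dropped.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    words = prompt.split()
    if len(words) <= limit:
        return prompt, 0
    dropped = len(words) - limit
    # Keep the most recent part; the beginning is lost.
    return " ".join(words[-limit:]), dropped

# A made-up prompt of 5,000 words, checked against a limit of 4,096.
long_prompt = " ".join(f"word{i}" for i in range(5000))
kept, dropped = truncate_to_limit(long_prompt, 4096)
print(len(kept.split()), "kept,", dropped, "dropped")
```

Whatever is dropped never reaches the model, which is why a passkey placed at the very start of an over-long prompt cannot be recalled by a model with a short limit.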
'The full potential of large language models is often limited by the amount of context a given model can accept,' says Dr. Miłoś. 'That is why we introduced Focused Transformer (FoT), a technique using a training process inspired by contrastive learning. This innovative approach makes it possible to fine-tune existing LLMs so that they are able to accept a larger context.'
According to the IDEAS NCBR and PAS researcher, LongLLaMA is a great achievement because it shows that large language models can overcome limitations related to prompt length and generate useful long texts.
A publication devoted to LongLLaMA - 'Focused Transformer: Contrastive Training for Context Scaling' - was accepted for the NeurIPS 2023 conference in New Orleans. https://arxiv.org/abs/2307.03170
PAP - Science in Poland