
A dictionary containing a network of more than 100 thousand Polish terms linked by relationships has been made available free of charge by researchers from Wrocław University of Technology. Słowosieć will be useful both to IT professionals working on computer text processing and to ordinary language users.
Researchers from the G4.19 Language Technology Group at Wrocław University of Technology have undertaken a difficult task: they intend to describe as many as 200 thousand terms in the Polish language by showing their relationships with other words. Each entry in the dictionary will have its place in a network of meanings.
According to the project leader, Dr. Maciej Piasecki, the scientists are only halfway through the job, but the dictionary is already one of the largest Polish language dictionaries and one of the world's two largest wordnets. Anyone can download or browse it for free.
How are dictionary entries built? The word "car", for example, is assigned synonyms such as "automobile". Different types of cars, such as "bus", "taxi" and "convertible", are listed separately ("car" is their hypernym), as are broader terms that include the concept of a car, such as "vehicle" or "means of transportation" ("car" is their hyponym). "Car" is also linked to the parts of a car: "chassis", "fuel tank" or "airbag" (a part-whole relationship, in which "car" is the holonym). Dr. Maciej Piasecki told PAP in an interview that the dictionary distinguishes more than 20 different types of relationships between words.
Each relationship is represented by a link, which makes Słowosieć (Polish for “Wordnet”) an interactive dictionary that can be navigated efficiently not only by people, but also by computer programs.
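To illustrate how a program might navigate such a network, here is a minimal sketch in Python. It is not the Słowosieć data format or API: the class `ToyWordnet`, its methods and the relation names are all hypothetical, and only five relation types are modeled. The example words follow the "car" entry described above.

```python
from collections import defaultdict

# Illustrative subset of relation types with their inverses. The real
# dictionary distinguishes more than 20 types; these names are assumptions.
INVERSE = {
    "synonym": "synonym",
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


class ToyWordnet:
    """A tiny in-memory network of words joined by typed links."""

    def __init__(self):
        # word -> relation name -> set of linked words
        self._links = defaultdict(lambda: defaultdict(set))

    def link(self, word, relation, other):
        """Record that `other` is a `relation` of `word`, plus the inverse."""
        self._links[word][relation].add(other)
        self._links[other][INVERSE[relation]].add(word)

    def related(self, word, relation):
        """Return the words linked to `word` by `relation`, sorted."""
        return sorted(self._links.get(word, {}).get(relation, ()))

    def broader_terms(self, word):
        """Follow hypernym links upward, nearest terms first."""
        found, queue = [], [word]
        while queue:
            current = queue.pop(0)
            for parent in self.related(current, "hypernym"):
                if parent not in found:
                    found.append(parent)
                    queue.append(parent)
        return found


net = ToyWordnet()
net.link("car", "synonym", "automobile")
net.link("car", "hypernym", "vehicle")   # "vehicle" is a hypernym of "car"
net.link("bus", "hypernym", "car")
net.link("taxi", "hypernym", "car")
net.link("car", "meronym", "chassis")    # "chassis" is a part of "car"

print(net.related("car", "hyponym"))
print(net.related("chassis", "holonym"))
print(net.broader_terms("taxi"))
```

Storing the inverse of every link means that a single call to `link` lets a program move in both directions, from "car" down to "taxi" or from "chassis" back up to "car".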
For now, the dictionary contains 106 thousand entries with 160 thousand meanings. There are 440 thousand links between them, and 50 thousand terms have been translated into English.
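These figures give a rough sense of how dense the network is. The short calculation below is only a back-of-the-envelope sketch. It assumes that the links connect meanings and that the translated "terms" are counted as entries, neither of which the figures make explicit.

```python
# Figures quoted in the article.
entries = 106_000
meanings = 160_000
links = 440_000
translated = 50_000

meanings_per_entry = meanings / entries
links_per_meaning = links / meanings      # assumes links connect meanings
translated_share = translated / entries   # assumes "terms" means entries

print(f"{meanings_per_entry:.2f} meanings per entry")
print(f"{links_per_meaning:.2f} links per meaning")
print(f"{translated_share:.0%} of entries translated")
```

Under those assumptions an entry has about 1.5 meanings on average, each meaning has 2.75 links, and a little under half of the entries have an English translation.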
According to the researcher, this kind of dictionary can be useful mainly to three groups of people. It can be used by ordinary users, for example people learning Polish or English. Another group is researchers, who can use Słowosieć in the study of language. The last, very important group that can benefit from Słowosieć is software developers. Słowosieć can be used for automatic translation and for text and speech analysis. The dictionary could help developers create more effective and intelligent search engines and better manage the information in document databases. It will also help in the development of the so-called Semantic Web. "The word description language we use is not precise, but it is enough to help in the analysis of texts" - noted the expert.
The Słowosieć dictionary is modeled on the American Princeton Wordnet, which is the first and largest dictionary of this type (containing about 150 thousand terms). Initially, Wordnet, created in the 1980s, was intended only for experiments on how children learn the meanings of words. In time, it became clear that there were many more applications. Dr. Piasecki noted that many countries creating their own wordnets choose simply to translate the U.S. wordnet. Polish researchers, however, decided to develop a dictionary from scratch, so that it would better reflect the reality of the Polish language. Słowosieć is developed semi-automatically - programs developed at Wrocław University of Technology learn the meanings of words from a large database of texts and suggest meaning descriptions for approval by linguists.
"Our thesaurus seems to be better designed than the Princeton Wordnet in terms of structure. In the American dictionary, much depends on the associations of the people who create it. Meanwhile, Słowosieć is built on the analysis of a large database of texts, which contains a total of nearly 2 billion words. These are various texts that give a good picture of the language. Creating entries in the dictionary is based on working with real text, and the meanings we describe are the result of how words are actually used. As a result, the meanings can be more up to date than in other dictionaries. Our dictionary is the only wordnet based on linguistic principles" - said Dr. Piasecki.
The Słowosieć website has already had hundreds of thousands of visitors, and Słowosieć has been downloaded by about 300 users, including dozens of companies. According to the scientist, the solution is not yet used by any large international companies. "They often develop solutions that work for many languages, because it is cheaper. However, such solutions are not necessarily optimal for the Polish language" - noted the researcher.
"I hope that in three years we can reach 200 thousand entries. That was the size of the largest Polish language dictionary in history, hence our goal. But we want to look for the natural boundaries of the language" - concluded Dr. Piasecki.
The dictionary is being developed by an interdisciplinary team composed of computer scientists, computational linguists and linguists. The work is financed by several projects, including NEKST and SYNAT. To download the Słowosieć source files, users only need to register free of charge by filling out a form.
PAP - Science and Scholarship in Poland, Ludwika Tomala
lt/ mrt/
tr. RL