Language Laboratory (Језичка лабораторија)



The goal of the Language Laboratory is to create and improve a grammatical dictionary of the Serbian language in electronic form and to publish it under a free license. The dictionary can then serve as a basis for developing other projects (spell checkers, text analyzers, automatic translators, grammar checkers, and similar advanced tools).

The Dictionary

The dictionary is the list of words processed in the project. For each word you can see all the forms that have been entered and, for each form, the number of statements and their weight. The number of statements is simply the number of users who entered the same form for the same assertion. The weight is the combined weight of those statements: some users can be given more weight than others (for example, so that one entry counts as much as two entries by someone else), but this was used only in the early days of the dictionary, when there were very few users.
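The relationship between statements, their count, and their weight can be illustrated with a small sketch. It is not the project's actual code: the record layout (lemma, grammatical slot, entered form, user), the user names, and the `user_weights` table are all hypothetical and chosen only to show how the two numbers differ.

```python
from collections import defaultdict

# Hypothetical input: one tuple per user statement,
# (lemma, grammatical slot, entered form, user).
statements = [
    ("kuća", "genitive singular", "kuće", "ana"),
    ("kuća", "genitive singular", "kuće", "marko"),
    ("kuća", "genitive singular", "kuče", "petar"),  # a typing error
]

# Hypothetical per-user weights; users not listed count as 1.
user_weights = {"ana": 2}


def tally(statements, user_weights):
    """Return {(lemma, slot, form): (number_of_statements, total_weight)}."""
    counts = defaultdict(int)
    weights = defaultdict(int)
    for lemma, slot, form, user in statements:
        key = (lemma, slot, form)
        counts[key] += 1                            # one statement per user entry
        weights[key] += user_weights.get(user, 1)   # heavier users add more
    return {key: (counts[key], weights[key]) for key in counts}


result = tally(statements, user_weights)
for key, (count, weight) in result.items():
    print(key, "statements:", count, "weight:", weight)
```

In this made-up data the form "kuće" has two statements but a weight of 3, because one of the two users is assumed to count double; when every user has weight 1, the two numbers coincide.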

We have found that users make errors in fewer than 1% of their entries; to be safe, we round this up to 1%. Assuming that different users make errors independently, an assertion backed by one statement is accurate with a probability of 99%; by two identical statements, 99.99%; by three identical statements, 99.9999% (one error per million), which we consider satisfactory for practical use. This applies, of course, only to random errors such as typos, and not to systematic errors that may arise from differences in the grammatical intuitions of different users.
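The arithmetic behind these figures can be reproduced with a short sketch. It assumes, as the estimate above implicitly does, that users err independently, so that n identical statements are all wrong with probability at most 0.01 to the power n. The function name `agreement_accuracy` is only illustrative.

```python
def agreement_accuracy(n_statements, error_rate=0.01):
    """Estimated probability that an assertion is correct when
    n_statements users independently entered the same form.

    Assumes independent errors: the assertion is wrong only if
    every one of the identical statements is wrong.
    """
    if n_statements < 1:
        raise ValueError("at least one statement is required")
    return 1 - error_rate ** n_statements


for n in (1, 2, 3):
    print(n, "statement(s):", f"{agreement_accuracy(n):.4%}")
```

The estimate is an upper bound on the error for random mistakes only; it says nothing about cases where several users share the same systematic misjudgment.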

The dictionary can also be downloaded for further processing in XML or MULTEXT-East format; both formats have the same content. The XML and MULTEXT-East dictionaries contain only statements with a weight of 3 or more, while the raw XML dictionary contains all statements.
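The difference between the raw and the filtered downloads amounts to a weight threshold. The sketch below shows that selection on in-memory records only; it does not parse the real files, and the `(form, weight)` record layout and sample values are hypothetical.

```python
MIN_WEIGHT = 3  # threshold stated in the text above

# Hypothetical raw records: (form, total weight of its statements).
raw_records = [("kuće", 3), ("kuče", 1), ("kući", 5)]


def filter_by_weight(records, min_weight=MIN_WEIGHT):
    """Keep only records whose weight reaches the threshold."""
    return [record for record in records if record[1] >= min_weight]


filtered_records = filter_by_weight(raw_records)
print(filtered_records)
```

Someone working from the raw download could apply a different threshold in the same way, trading coverage against the expected error rate.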