What’s New: Basis Technology Adds Turkish Support to the Rosette Linguistics Platform
Turkish is spoken natively by over 83 million people, so we’re pleased to release a version of Rosette with support for Turkish. Rosette Base Linguistics now provides Turkish lemmas (the dictionary forms of words) for search engines and other natural language processing applications.
Lemmas are a key linguistic ingredient that improves search results for almost all languages (even English). Turkish benefits from lemmatization in search because it is a highly agglutinative language: affixes (prefixes and suffixes) are liberally added to words to make new words (e.g., to create nouns from verbs), to indicate the grammatical function of a word, and to intensify adjectives or adverbs.
Thus, without lemmatization, a search for “köy” (“village”, nominative case) would fail to match these other forms of the word:
köyün||the village’s/of the village (genitive case)
köye||to the village (dative case)
köyü||the village (accusative case)
köyden||from the village (ablative case)
köyde||in the village (locative case)
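To make the effect on search concrete, here is a minimal Python sketch. It is a toy illustration only: the hand-written lookup table, the function names, and the three sample documents are all assumptions made for this example, and none of it reflects Rosette's API or how Rosette computes lemmas. The Turkish forms are the standard case forms of “köy”.

```python
# Toy illustration only: a hand-written lemma table, not Rosette's API or algorithm.
# Keys are inflected forms of "köy" (village); every value is the lemma.
LEMMAS = {
    "köy": "köy",     # nominative
    "köyün": "köy",   # genitive: of the village
    "köye": "köy",    # dative: to the village
    "köyü": "köy",    # accusative
    "köyden": "köy",  # ablative: from the village
    "köyde": "köy",   # locative: in the village
}


def lemmatize(token):
    """Return the lemma if the token is in the toy table, else the token itself."""
    return LEMMAS.get(token, token)


def build_index(docs):
    """Map each lemma to the set of document ids that contain a form of it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in text.split():
            index.setdefault(lemmatize(token), set()).add(doc_id)
    return index


def search(index, query):
    """Lemmatize the query, then look it up in the lemma index."""
    return sorted(index.get(lemmatize(query), set()))


def exact_search(docs, query):
    """Baseline without lemmatization: match only identical tokens."""
    return [i for i, text in enumerate(docs) if query in text.split()]


# Hypothetical sample documents.
DOCS = [
    "köyden geldim",      # "I came from the village"
    "köye gidiyorum",     # "I am going to the village"
    "şehirde yaşıyorum",  # "I live in the city"
]
INDEX = build_index(DOCS)

print(exact_search(DOCS, "köy"))  # [] : no document contains the bare form
print(search(INDEX, "köy"))       # [0, 1] : both inflected forms match
```

Because both the documents and the query are reduced to the same lemma, a query in any case form (for example “köyde”) finds the same documents. A real lemmatizer has to analyze the morphology of arbitrary words rather than consult a fixed table.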