Translating Hungarian language dialects using natural language processing models

Several dialects spoken within the territory of Hungary differ from standard Hungarian and carry distinct accents. These differences affect speech, vocabulary, and written text, and often lead to misunderstandings. The dialects have developed over several centuries and have preserved their regional characteristics. An accent primarily concerns stress and the pronunciation of sounds, but dialectal features have also left written traces that have survived for hundreds of years. This raises an intriguing language processing question: can natural language processing models be used to build a translation program capable of translating text or speech between different dialects? While the similarities between these dialects and standard Hungarian are known, this research aims to discover further connections among the dialects themselves. Data collection is itself a challenge, because these are only dialects and are not sharply separated from one another. In addition, to the best of our knowledge, there is no comprehensive compilation of the vocabulary or distinctive features of each Hungarian dialect. Even dialect texts found in books vary, since they can span several decades or even centuries. Territorial changes have also influenced the dialects.