Flagship Results

Explore All Results

Journal Papers

📄 Dobrovoljc, K. (in press). Treebanking Spoken Slovenian: New Data, Models, and Lessons Learned. Contributions to Contemporary History.

📄 Dobrovoljc, K. (2024). Using the SST Treebank in Research on Spoken Slovenian: Advantages and Limitations. [In Slovene] Jezik in slovstvo, 69(4), 187-209.

📄 Dobrovoljc, K. (in press). Syntactic Characteristics of Spoken Slovene: a Survey. [In Slovene] Slovenski jezik / Slovene Linguistic Studies.

Book Chapters

📄 Dobrovoljc, K. (2024). Spoken Slovenian Treebank: Current Situation and Perspectives. [In Slovene] In M. Krajnc Ivič (Ed.), Stanje in perspektive uporabe govornih virov v raziskavah govora (pp. 41-62). Maribor: Univerza v Mariboru, Univerzitetna založba.

Conference Presentations

📄 Krsnik, L., & Dobrovoljc, K. (2025). STARK: A toolkit for dependency (sub)tree extraction and analysis. To be presented at SyntaxFest 2025. Ljubljana, Slovenia.

📄 Terčon, L., & Dobrovoljc, K. (2025). ComparaTree: A multi-level comparative treebank analysis tool. To be presented at SyntaxFest 2025. Ljubljana, Slovenia.

📄 Hüll, N., & Dobrovoljc, K. (2025). Word order variation in spoken and written corpora: A cross-linguistic study of SVO and alternative orders. To be presented at SyntaxFest 2025. Ljubljana, Slovenia.

📄 Estève, L., & Dobrovoljc, K. (2025). DELTA: A new pipeline for measuring diversity across various linguistic levels. Presented at UniDive 3rd General Meeting: Universality, diversity and idiosyncrasy in language technology. Budapest.

📄 Dobrovoljc, K., & Čibej, J. (2025). Spoken Slovenian Treebank: New annotated data, parsing models and linguistic insights. Presented at UniDive 3rd General Meeting: Universality, diversity and idiosyncrasy in language technology. Budapest.

📄 Dobrovoljc, K. (2024). Can’t see the forest for the trees: Tools and services for investigating Slovene dependency treebanks. In CLARIN Annual Conference Proceedings (pp. 107-111).

📄 Dobrovoljc, K. (2024). Extending the spoken Slovenian treebank. In Š. Arhar Holdt & T. Erjavec (Eds.), Language technologies and digital humanities: Proceedings of the conference (pp. 113-143). Ljubljana: Inštitut za novejšo zgodovino.

📄 Ljubešić, N., Terčon, L., & Dobrovoljc, K. (2024). CLASSLA-Stanza: The next step for linguistic processing of South Slavic languages. In Š. Arhar Holdt & T. Erjavec (Eds.), Language technologies and digital humanities: Proceedings of the conference (pp. 251-274). Ljubljana: Inštitut za novejšo zgodovino.

📄 Verdonik, D., Ljubešić, N., Rupnik, P., Dobrovoljc, K., & Čibej, J. (2024). Selection and curation of materials for the training corpus of spoken Slovenian ROG. [In Slovene] In Š. Arhar Holdt & T. Erjavec (Eds.), Language technologies and digital humanities: Proceedings of the conference (pp. 469-484). Ljubljana: Inštitut za novejšo zgodovino.

📄 Terčon, L. (2024). Using six measures of syntactic complexity to compare language in a spoken and a written corpus. [In Slovene] In Proceedings of the Conference on Language Technologies and Digital Humanities, Ljubljana, Slovenia.

📄 Dobrovoljc, K., Krsnik, L., & Robnik-Šikonja, M. (2023). STARK: A tool for dependency tree extraction and analysis. Presented at UniDive 1st General Meeting: Universality, diversity and idiosyncrasy in language technology. Paris: Paris-Saclay University.

📄 Dobrovoljc, K. (2023). Spoken Slovenian Treebank: Current situation and perspectives. [In Slovene] In M. Krajnc Ivič (Ed.), Infrastruktura za raziskave govora v humanistiki in jezikovnih tehnologijah: Zbornik povzetkov (pp. 41-44). Maribor: Univerza v Mariboru, Univerzitetna založba.

Datasets / Corpora

📊 A new version of the Spoken Slovenian Treebank (SST), published as part of: Zeman, D., et al. (2024). Universal Dependencies 2.15. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

📊 A new version of the Slovenian SSJ Treebank, published as part of: Zeman, D., et al. (2024). Universal Dependencies 2.15. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

📊 Verdonik, D., et al. (2024). Training corpus of spoken Slovenian ROG 1.0. Slovenian language resource repository CLARIN.SI.

📊 Arhar Holdt, Š., Krek, S., Dobrovoljc, K., Erjavec, T., Gantar, P., Čibej, J., Pori, E., Terčon, L., Munda, T., Žitnik, S., Robida, N., Blagus, N., Može, S., Ledinek, N., Holz, N., Zupan, K., Kuzman, T., Kavčič, T., Škrjanec, I., Marko, D., Jezeršek, L., & Zajc, A. (2024). Training corpus SUK 1.1. Slovenian language resource repository CLARIN.SI, ISSN 2820-4042.

Annotation Guidelines

📄 Dobrovoljc, K., & Terčon, L. (2023). Universal Dependencies: Guidelines for annotating texts in Slovenian 1.7. [In Slovene] Ljubljana: Center za jezikovne vire in tehnologije.

Tools and Models

🖥️ Krsnik, L., Dobrovoljc, K., & Robnik-Šikonja, M. (2025). Dependency tree extraction tool STARK 3.1. Slovenian language resource repository CLARIN.SI.

🖥️ Štravs, M., Dobrovoljc, K., & Bezgovšek, L. (2025). Service for querying dependency treebanks Drevesnik 1.2. Slovenian language resource repository CLARIN.SI.

🖥️ Brank, J. (2023). Q-CAT Corpus Annotation Tool 1.5. Slovenian language resource repository CLARIN.SI.

🤖 Krsnik, L., Dobrovoljc, K., & Terčon, L. (2024). The Trankit model for linguistic processing of written and spoken Slovenian 1.2. Slovenian language resource repository CLARIN.SI.

🤖 Krsnik, L., Dobrovoljc, K., & Terčon, L. (2024). The Trankit model for linguistic processing of standard written Slovenian 1.1. Slovenian language resource repository CLARIN.SI.

Invited Talks

🏛️ Dobrovoljc, K. (2025). A Treebank-Driven Exploration of Spoken Language Grammar. Guest lecture at the Linguistic forum, University of Gothenburg, Sweden.

🏛️ Dobrovoljc, K. (2024). Treebanking speech: Challenges, insights and applications. Invited talk at Beyond Words: Theoretical, Experimental, and Computational Approaches to Language, Contexts, and Modalities, University of Gothenburg, Sweden.

🏛️ Dobrovoljc, K. (2023). Advantages and challenges of cross-lingually harmonized approaches to spoken data annotation. Keynote talk at the Second International Conference on Speech and Language Technologies for Low Resource Languages (SPELLL – 2023), Perundurai, India.

Events

📅 SyntaxFest 2025: TLT, UDW, IWPT, QUASY and DepLing, 26-29 August 2025, Ljubljana, Slovenia.

📅 SpLAn-UD: UniDive Workshop on Spoken Language Annotation for Universal Dependencies, 29-30 May 2025, Bologna, Italy.

Other Dissemination

💬 Dobnik, S., Samardžić, T., Ljubešić, N., Žgank, A., Zuljan Kumar, D., Dobrovoljc, K., & Tivadar, H. (2024). Frontiers in Speech Communication Research. Panel discussion at Language technologies and digital humanities conference 2024, Ljubljana, Slovenia.

🎙️ Žitnik, S., Rozman, T., Čibej, J., & Dobrovoljc, K. (2023). Srečanje jezika in tehnologije. Ljubljana: Alumni UL. (Alumniteka podcast series).

📺 Dobrovoljc, K. (2024). Ah, ta splet!: Digitalna slovenščina. In Ah, ta leta (TV broadcast). TV SLO, 1. program.