Tools - SPOT

STARK

STARK is a command-line tool designed for bottom-up statistical analysis of dependency-parsed corpora. It complements traditional treebank exploration methods that rely on predefined queries by enabling the systematic extraction and quantification of all relevant dependency trees and subtrees based on user-defined parameters, ranging from concrete phrases to more abstract syntactic patterns. Given a treebank in CoNLL-U format, STARK provides frequency counts and other useful statistics for each extracted structure. Within the SPOT project, STARK is used to uncover syntactic patterns characteristic of speech by comparing the spoken SST treebank with its written counterpart, SSJ. To support this effort, the tool was substantially improved and enhanced with a more user-friendly web interface.
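
To make the bottom-up idea concrete, the following is a minimal Python sketch of counting the simplest possible subtrees (one head and one dependent) in CoNLL-U data. It is not STARK's implementation and does not use its interface: the function names `read_sentences` and `count_two_node_trees` and the two sample sentences are hypothetical, chosen only for illustration. It assumes the standard ten-column CoNLL-U layout, with UPOS in column 4, HEAD in column 7, and DEPREL in column 8.

```python
from collections import Counter

# Tiny hand-made CoNLL-U sample (hypothetical data, for illustration only).
SAMPLE = """\
# text = The dog barks.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

# text = A cat sleeps.
1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of CoNLL-U token rows (lists of columns)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def count_two_node_trees(text):
    """Count (head UPOS, relation, dependent UPOS) triples over all sentences."""
    counts = Counter()
    for sentence in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            head_upos = upos_by_id.get(cols[6])
            if head_upos is None:
                # Root token (head 0) or unannotated head: no head-dependent pair.
                continue
            counts[(head_upos, cols[7], cols[3])] += 1
    return counts


if __name__ == "__main__":
    for tree, freq in count_two_node_trees(SAMPLE).most_common():
        print(freq, tree)
```

Running the same count over two treebanks and comparing the resulting frequency tables is the kind of comparison described above, here reduced to two-node trees for brevity.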

Drevesnik

Drevesnik is a web-based interface for querying Slovenian corpora annotated with dependency syntax. It enables linguists and other researchers to explore a wide range of grammatical phenomena in Slovenian. Users can enter custom queries, select the corpora of interest, and view results as interactive dependency trees (graphs), which can also be downloaded for further analysis. As part of the SPOT project, Drevesnik is used for the qualitative analysis of syntactic patterns in both written and spoken Slovenian treebanks. The interface has also been visually redesigned to offer a more modern and user-friendly experience.

Q-CAT

Q-CAT is a desktop application for customizable manual linguistic annotation of corpora, offering advanced corpus query capabilities based on these annotations. The tool has been used in numerous annotation campaigns for Slovenian, including the annotation of dependency relations, semantic roles, named entities, and multi-word expressions. Within SPOT, Q-CAT is employed for the manual dependency annotation of new SST transcripts, for which integration of audio recordings has also been enabled.

Označevalnik

Označevalnik CJVT is a web interface for automatic grammatical annotation of Slovenian texts, based on the CLASSLA-Stanza tool for Slovenian language processing. It assigns a range of morphological, syntactic, and semantic features to surface words (such as base forms, parts of speech, and syntactic functions), thus enabling faster retrieval of relevant linguistic phenomena for linguistic research or information extraction. As part of the SPOT and Mezzanine projects, it was upgraded with new models specifically designed for processing spoken Slovenian. The interface also serves as a public-facing demonstration of automatic linguistic annotation tools.
