This chapter presents a novel Selectivity-Based Keyword Extraction (SBKE) method, which extracts keywords from a source text represented as a network. The selectivity of a node is calculated from a weighted network as the average weight distributed over the links of that node, and it is used to rank and extract keyword candidates. Selectivity slightly outperforms extraction based on the standard centrality measures. Therefore, selectivity and its modification, generalized selectivity, are included in the SBKE method as node centrality measures. Selectivity-based extraction does not require linguistic knowledge, because it is derived purely from statistical and structural information of the network; it can therefore be easily ported to new languages and used in a multilingual scenario. Still, there are drawbacks: the method can extract only words that appear in the text.

Automatic keyword extraction from text is an active research problem in the fields of natural language processing and information retrieval. Although numerous methods for keyword extraction have been developed, their effectiveness depends on many factors, such as the approach on which they are built, the domain to which they are adapted, and the type of language or task for which they were designed, so there is always room for extension and improvement. This paper explains and reconstructs two existing methods, RAKE and MAUI, which are standard representatives of the unsupervised and supervised groups of methods, respectively. An experiment examined whether the methods can successfully extract keywords from texts written in Italian, a language on which they had not previously been compared. For the experiment, Italian texts were collected and manually annotated. MAUI proved more promising than RAKE, which confirms what was established earlier in an experiment on keyword extraction from texts written in English.
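To make the SBKE description above concrete, node selectivity (the average weight on a node's links) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the network is assumed to be an undirected co-occurrence network in which two words are linked when they are adjacent in the text, the link weight is assumed to be the number of such adjacencies, and the function names are hypothetical. Generalized selectivity and the later candidate-expansion steps of SBKE are not covered.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Build a weighted, undirected network from a token list.

    Assumption: two words are linked if they are adjacent in the text,
    and the link weight counts how often that happens.
    """
    weights = defaultdict(int)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops
            edge = tuple(sorted((left, right)))
            weights[edge] += 1
    return dict(weights)


def node_selectivity(weights):
    """Selectivity = node strength (sum of link weights) / node degree (number of links)."""
    strength = defaultdict(int)
    degree = defaultdict(int)
    for (u, v), w in weights.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1
    return {node: strength[node] / degree[node] for node in strength}


def rank_candidates(tokens, top_n=5):
    """Rank words by selectivity, highest first (ties broken alphabetically)."""
    selectivity = node_selectivity(build_cooccurrence_network(tokens))
    ranked = sorted(selectivity.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    text = "keyword extraction keyword extraction method keyword network"
    for word, score in rank_candidates(text.split()):
        print(word, round(score, 3))
```

In this toy input, "extraction" has two links with a total weight of 4, so its selectivity is 2.0 and it ranks first; "keyword" has a higher total weight (5) but spreads it over three links. Because candidates are nodes of the network, only words that occur in the text can ever be returned, which matches the drawback noted in the abstract.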
In this paper, we study keyword extraction from parallel abstracts of scientific publications in Serbian and English. The keywords are extracted by a selectivity-based keyword extraction (SBKE) method, which is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, since the experimental environment and data are controlled. Measured by F1 score, the keyword extraction results are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. When we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can be easily ported to a new language, domain, and type of text in terms of its structure.
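The two evaluation settings mentioned above (scoring against the whole annotated keyword set versus only the annotated keywords that occur in the abstract) can be illustrated with a short sketch. It assumes single-word keywords compared by exact match, which is a simplification; the function names and the toy data are hypothetical and are not taken from the corpus.

```python
def f1_score(extracted, gold):
    """F1 = harmonic mean of precision and recall over exact keyword matches."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(extracted, gold, text_tokens):
    """Score against all gold keywords and against only those present in the text."""
    present_gold = set(gold) & set(text_tokens)
    return {
        "all_keywords": f1_score(extracted, gold),
        "present_only": f1_score(extracted, present_gold),
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    text_tokens = ["complex", "network", "keyword", "extraction", "selectivity"]
    gold = ["keyword", "network", "multilingual", "corpus"]
    extracted = ["keyword", "network", "selectivity"]
    print(evaluate(extracted, gold, text_tokens))
```

With the toy data, two of the four gold keywords never occur in the text, so an extractive method cannot recover them: recall rises from 2/4 to 2/2 once they are disregarded, and F1 rises with it. This is the same effect that separates the two pairs of scores reported in the abstract.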