Number of the records: 1

Performance evaluation of Machine Learning approaches for identifying parts of scientific affiliations

Title statement	Performance evaluation of Machine Learning approaches for identifying parts of scientific affiliations [manuscript] / Jan Macháň
Additional Variant Titles	Performance evaluation of Machine Learning approaches for identifying parts of scientific affiliations
Personal name	Macháň, Jan, (dissertant)
Translated title	Performance evaluation of Machine Learning approaches for identifying parts of scientific affiliations
Issue date	2023
Phys.des.	46 : graphs, diagrams, tables
Note	Thesis advisor Karel Berka
	Opponent Martin Trnečka
Other responsib.	Berka, Karel, 1982- (thesis advisor)
	Trnečka, Martin (opponent)
Other responsib.	Univerzita Palackého. Katedra biochemie (degree grantor)
Keywords	affiliations * geo-localization * embeddings * pre-trained word embeddings * machine learning models * classification * statistical evaluation * model selection * data analysis
Form, Genre	master's theses
UDC	(043)378.2
Country	Czech Republic
Language	English
Document kind	PUBLICATION ACTIVITY
Title	Mgr.
Degree program	Follow-up master's
Degree program	Bioinformatics
Degree discipline	Bioinformatics


Thesis	Downloaded	Size	Date made available
00279172-769376153.pdf	9	2.7 MB	15.05.2023

Review	Review type
00279172-ved-530712776.pdf	Advisor's review
00279172-opon-696483076.pdf	Opponent's review

Defense record	Assignment date	Submission date	Defense date	Grade awarded	Grading type
00279172-prubeh-251335408.pdf	14.10.2021	15.05.2023	14.06.2023	A	Graded by mark

Abstract

This diploma thesis investigates the suitability of several freely available pre-trained word embeddings (PWE) models for the task of geo-localizing parts of scientific affiliations. The analysis employs statistical methods, such as PCA and ANOVA, to identify the most suitable PWE model. The PWE models are combined with ML classifiers: Neural Networks, Random Forests, Support Vector Classifier, and K-Nearest Neighbors. The combination of a Neural Network with uncsd-BERT embeddings emerges as the best-performing one, and its classification performance and learning process are evaluated further. The thesis serves as a case study illustrating how to select the best model for a specific classification task, and it provides insights into the current performance of the selected embeddings models on this task. The primary goal is to develop an ML model for geo-localizing affiliations without relying on commercial tools.
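The abstract names ANOVA as one of the tools used to pick the most suitable PWE model. As a rough illustration only, the sketch below computes a one-way ANOVA F statistic over groups of scores, one group per embeddings model (for instance per-fold classification accuracies). The helper name `one_way_anova_f` and the dict-of-lists input layout are assumptions for illustration, not code from the thesis; obtaining a p-value would additionally require the F distribution, e.g. via `scipy.stats.f_oneway`.

```python
from __future__ import annotations

import math
from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> float:
    """Return the one-way ANOVA F statistic for groups of scores.

    Hypothetical helper: each key is an embeddings model name and each
    value holds that model's scores (e.g. per-fold accuracies).
    """
    samples = list(groups.values())
    k = len(samples)                      # number of groups (models)
    n = sum(len(s) for s in samples)      # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = mean(x for s in samples for x in s)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    ms_between = ss_between / (k - 1)     # df_between = k - 1
    ms_within = ss_within / (n - k)       # df_within = n - k
    if ms_within == 0:
        # All groups are internally constant: F is undefined or infinite
        return math.inf if ms_between > 0 else math.nan
    return ms_between / ms_within


# Toy numbers only, not results from the thesis
print(one_way_anova_f({"model_a": [1.0, 2.0, 3.0], "model_b": [4.0, 5.0, 6.0]}))
```

A large F means the differences between model means are large relative to the fold-to-fold variation within each model, which is the sense in which ANOVA can support choosing one embeddings model over others.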

