Učni načrt predmeta / Course syllabus

Predmet:
Napredne jezikovne tehnologije
Course:
Advanced Language Technologies
Študijski program in stopnja / Study programme and level:
Informacijske in komunikacijske tehnologije, 3. stopnja / Information and Communication Technologies, 3rd cycle
Študijska smer / Study field:
Tehnologije znanja / Knowledge Technologies
Letnik / Academic year:
1
Semester / Semester:
1
Vrsta predmeta / Course type
Izbirni / Elective
Univerzitetna koda predmeta / University course code:
IKT3-724
Predavanja / Lectures
Seminar / Seminar
Vaje / Tutorial
Klinične vaje / Clinical work
Druge oblike študija / Other forms of study
Samost. delo / Individ. work
ECTS
15 15 15 105 5

*Navedena porazdelitev ur velja, če je vpisanih vsaj 15 študentov. Drugače se obseg izvedbe kontaktnih ur sorazmerno zmanjša in prenese v samostojno delo. / This distribution of hours applies if at least 15 students are enrolled. Otherwise, the contact hours are reduced proportionally and transferred to individual work.

Nosilec predmeta / Course leader:
doc. dr. Senja Pollak
Sodelavci / Lecturers:
Jeziki / Languages:
Predavanja / Lectures:
Slovenščina, angleščina / Slovenian, English
Vaje / Tutorial:
Pogoji za vključitev v delo oz. za opravljanje študijskih obveznosti:
Prerequisites:

Zaključen študij druge stopnje s področja informacijskih ali komunikacijskih tehnologij ali zaključen študij druge stopnje na drugih področjih z znanjem osnov s področja predmeta. Potrebna so tudi osnovna znanja matematike, računalništva in informatike.

Completed second-cycle studies in information or communication technologies, or completed second-cycle studies in other fields with knowledge of the fundamentals of this course's field. Basic knowledge of mathematics, computer science and informatics is also required.

Vsebina:
Content (Syllabus outline):

Uvod:
Razvoj jezikoslovja in računalniškega jezikoslovja, kompleksnost jezika, ravni analize jezika, pregled aplikacij in metod.

Analiza jezika z metodami za procesiranje naravnega jezika:
Relevantne metode in primeri uporabe za avtomatizirano označevanje na morfološki, sintaktični in semantični ravni.

Vrste korpusov in standardi za zapis: eno- in večjezični korpusi in standardi zapisov.

Metode procesiranja naravnega jezika: Metode za luščenje informacij (luščenje imenskih entitet, terminologije), klasifikacija dokumentov, analiza sentimenta, semantična analiza, diahrona analiza.

Interdisciplinarnost: digitalna humanistika in računalniško družboslovje.

Introduction:
Development of linguistics and computational linguistics, complexity of language, levels of linguistic analysis, overview of applications and methods.

Text analysis with natural language processing methods:
Relevant methods and use cases for automatic morphological, syntactic and semantic annotation.

Corpora types and encoding standards: monolingual and multilingual corpora and text encoding standards.

Natural language processing methods: information extraction (named entity recognition, terminology extraction), document classification, sentiment analysis, semantic analysis, diachronic analysis.

Interdisciplinary applications: digital humanities and computational social science.

Temeljna literatura in viri / Readings:

Izbrana poglavja iz naslednjih knjig: / Selected chapters from the following books:
D. Jurafsky and J. H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 2024. https://web.stanford.edu/~jurafsky/slp3/
R. Mitkov (ed.). The Oxford Handbook of Computational Linguistics. Oxford University Press, 2003. ISBN 978-0-19-823882-9.
C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. ISBN 0-262-13360-1.
N. Ide and J. Pustejovsky (eds.). Handbook of Linguistic Annotation. Springer, 2017. ISBN 978-94-024-0881-2.

Cilji in kompetence:
Objectives and competences:

Jezikovne tehnologije zajemajo metode in aplikacije obdelave naravnega jezika na računalniku.

Slušatelji pridobijo teoretično razumevanje in praktične izkušnje s področij jezikovnih tehnologij in računalniškega jezikoslovja, kar je predpogoj za učinkovito delo na računalniški obdelavi jezikovnih podatkov.

Cilji predmeta so (a) predstaviti osnove jezikovnih tehnologij, (b) predstaviti zapis in označevanje jezikovnih virov in (c) izbrane metode in tehnike jezikovnih tehnologij. Poudarek predmeta je na obravnavi slovenskega jezika in čezjezikovnih metodah.

Študenti bodo obvladali osnove jezikovnih tehnologij in bodo usposobljeni za praktično uporabo izbranih metod in orodij.

Language technologies comprise methods and applications of computer processing of natural language.

Students will gain a theoretical understanding of, and practical experience with, language technologies and computational linguistics, which is a prerequisite for effective work on the computer processing of language data.

The course objectives are to (a) introduce the basics of language technologies, (b) present the encoding and annotation of language resources, and (c) present selected methods and techniques used in language technologies. The course focuses on the processing of the Slovene language and on cross-lingual methods.

The students will master the basics of language technologies and will be capable of using selected methods and tools in practice.

Predvideni študijski rezultati:
Intended learning outcomes:

Obvladana uporaba izbranih metod in tehnik jezikovnih tehnologij, usposobljenost za praktično uporabo izbranih metod in orodij.

Mastery of selected methods and techniques of language technologies; ability to apply selected methods and tools in practice.

Metode poučevanja in učenja:
Learning and teaching methods:

Predavanja, seminar, konzultacije, samostojno delo.

Lectures, seminar, consultations, individual work.

Načini ocenjevanja / Assessment: Delež v % / Weight in %
Pisni ali ustni izpit / Written or oral exam: 50 %
Seminarska naloga / Seminar work: 25 %
Ustni zagovor seminarske naloge / Oral defense of the seminar work: 25 %
Reference nosilca / Lecturer's references:
1. KOLOSKI, Boshko, STEPIŠNIK PERDIH, Timen, ROBNIK ŠIKONJA, Marko, POLLAK, Senja, ŠKRLJ, Blaž. Knowledge graph informed fake news classification via heterogeneous representation ensembles. Neurocomputing. [Print ed.]. 2022, vol. 496, July, pp. 208-226. ISSN 0925-2312. DOI: 10.1016/j.neucom.2022.01.096.
2. MARTINC, Matej, POLLAK, Senja, ROBNIK ŠIKONJA, Marko. Supervised and unsupervised neural approaches to text readability. Computational linguistics. 2021, vol. 47, no. 1, pp. 141-179. ISSN 0891-2017. DOI: 10.1162/coli_a_00398.
3. ŠKRLJ, Blaž, MARTINC, Matej, KRALJ, Jan, LAVRAČ, Nada, POLLAK, Senja. tax2vec: constructing interpretable features from taxonomies for short text classification. Computer speech & language. 2021, vol. 65, pp. 101104-1-101104-21. ISSN 0885-2308. DOI: 10.1016/j.csl.2020.101104.
4. MARTINC, Matej, HAIDER, Fasih, POLLAK, Senja, LUZ, Saturnino. Temporal integration of text transcripts and acoustic features for Alzheimer's diagnosis based on spontaneous speech. Frontiers in aging neuroscience. 2021, vol. 13, pp. 652647-1-652647-15.
5. HONG HANH, Tran Thi, MARTINC, Matej, REPAR, Andraž, LJUBEŠIĆ, Nikola, DOUCET, Antoine, POLLAK, Senja. Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?. Machine learning. 2024, vol. 113, March, pp. 4285-4314. ISSN 1573-0565. DOI: 10.1007/s10994-023-06506-7.