Supervising institutions
CNRS-MNHN-Université de Paris
ALLASSONNIÈRE-TANG Marc
Affiliated unit
UMR 7206 - Diversité et évolution culturelles (DivEC)
Specialty
Linguistics, Natural language processing, Digital humanities, Phylogenetics
PROFESSIONAL EXPERIENCE
- 2021 – present: CNRS Researcher. EA (Ecological Anthropology, UMR 7206) lab at the Muséum National d’Histoire Naturelle (MNHN) in Paris, France.
- 2019 – 2021: Postdoctoral Researcher. DDL (Dynamics of Language) lab of University Lumière Lyon 2, France. (80% research, 20% teaching/supervision).
- 2016 – 2019: Ph.D. Student. Department of Linguistics and Philology, Uppsala University, Sweden. (80% research, 20% teaching, full-time funded working contract).
- 2013 – 2015: French Instructor. Chinese Institute of European Languages, Taiwan. (Group courses 5-20 members, 30 hours/week, level A1-C2).
- 2013 – 2015: Research Assistant. Syntax/Phonology lab, National ChengChi University, Taiwan. (Experiment design, database maintenance, and data analysis).
- 2011 – 2012: Product Manager. North Africa division, Asustek Computer, Taiwan. (Product planning, sales and marketing for notebooks and tablets).
EDUCATION
- 2016 – 2019: Ph.D. in linguistics, Uppsala University, Sweden / National Institute of Oriental Languages and Civilizations, France.
Thesis: A typology of classifiers and gender: From description to computation.
- 2013 – 2015: M.A. in linguistics, National ChengChi University, Taiwan.
Thesis: A GIS typological analysis of the convergence and divergence among numeral classifier, genders and plural markers in the world’s languages.
- 2006 – 2011: B.A. in Diplomacy/Arabic Language and Literature (double major), National ChengChi University, Taiwan.
SKILLS
- Languages: French (Native), Chinese (Native), English (TOEIC 990/990), Arabic (CEFR B1), Swedish (CEFR B1).
- Data visualization and analysis: The R tidyverse and its extensions, classification and regression (Generalized Linear Mixed Models, Random Forests, Neural Networks, Support Vector Machines), clustering methods, QGIS.
- Natural Language Processing: Stylometry, topic modelling, word embeddings (GloVe, word2vec, fastText), text processing (Word segmentation, POS tagging, dependency parsing), web data harvesting (Docker, Selenium).
- Computational methods: Bayesian phylogenetic inference (BayesTraits, MrBayes, BEAST), Bayesian agent-based modelling with network analysis.
- Linguistics: CLAN (Computerized Language Analysis), ELAN (EUDICO Linguistic Annotator), Praat, Toolbox, VTL (VocalTractLab).
- Computer: Programming languages R and Python; operating systems Linux, Mac, and Windows; LaTeX.
TEACHING
Linguistics
- Fieldwork practice session with native speakers: Linguistic summer school teaching module, HT19 (International School in Linguistic Fieldwork, Paris). Instructor for daily two-hour sessions on planning and conducting fieldwork over one week.
- Current Research in Linguistics: Undergraduate course VT19 (Department of Linguistics and Philology, Uppsala University). Instructor in charge of the full course. The students learn qualitative and quantitative methods to develop and test linguistic hypotheses.
- Cognitive Linguistics: Undergraduate course VT19 (Department of Linguistics and Philology, Uppsala University). Instructor in charge of the full course. This course provides basic theoretical and methodological knowledge in the area of cognitive linguistics.
Quantitative methods
- Visualisation and Statistics: Postgraduate course HT18 (Faculty of Languages, Uppsala University). Instructor for weekly two-hour lab sessions on R programming for data visualization and statistical analysis.
- An Introduction to Random Forests in R: Postgraduate teaching module, VT18 (Department of Linguistics and Philology, Uppsala University). Instructor for 90-minute sessions introducing the random forest classifier.
Language courses
- French: High school/University group courses 2013HT – 2015VT (Chinese Institute of European Languages). Instructor for group courses (A1-C2 levels). The teaching covered conversation, writing, and grammar courses, as well as preparation for the DELF diplomas.
Projects
EVOGRAM
The role of linguistic and non-linguistic factors in the evolution of nominal classification systems (Grant: ANR, 166 936 euros) - Principal Investigator - ANR-20-CE27-0021
This project (2021-2023) is hosted at the DDL (Dynamics of Language) lab in Lyon and aims at building a database on nominal classification systems to identify the factors affecting their evolution.
MACDIT
Multi-agent models and social media data: Collective dynamics and individual trajectories in linguistic populations (Grant: Labex ASLAN, 229 916 euros) - Principal Investigator (with J-P Magué)
The goal of this project (2021-2024) is to study the interaction between individual and collective levels of language variation and change in Twitter and Wikipedia data using Bayesian agent-based models.
RELI
Recherche En Linguistique Illustrée [Research In Linguistics Illustrated] (Grant: Labex ASLAN, 6 000 euros + extension 2 000 euros) - Principal Investigator (with R Anselme)
This project (2020-2021) supports science outreach by popularizing the research of the ASLAN laboratories in Lyon through short comics and comic strips.
FIELDLING
Funded international school in linguistic fieldwork - Organising committee
FieldLing has been organised on a yearly basis since 2010 (involved units: LLACAN, SEDYL, LACITO, DDL). It is at present the only regular (and free) intensive training program in France preparing students to study theories, methods, and the use of technological tools for language description through fieldwork.
Publications
2022
- January 2022: Inferring case paradigms in Koalib with computational classifiers. Abstract: The object case inflection in Koalib (Niger-Congo) represents complex patterns that involve phoneme position,... Corpus Linguistics and Linguistic Theory. ISSN: 1613-7027, 1613-7035
2021
- December 2021: Expansion by migration and diffusion by contact is a source to the global diversity of linguistic nominal categorization systems. Abstract: Languages of diverse structures and different families tend to share common patterns if they are spoken in geographic... Humanities and Social Sciences Communications. Vol. 8, no. 1, p. 331. ISSN: 2662-9992
- August 2021: "Classifiers in Southeast Asian languages" in The Languages and Linguistics of Mainland Southeast Asia. De Gruyter, p. 733-772. ISBN: 978-3-11-055814-2
- 2021: Testing Semantic Dominance in Mian Gender: Three Machine Learning Models. Oceanic Linguistics. Vol. 60, no. 2, p. 302-334. ISSN: 1527-9421
- 2021: Investigating the branching of Chinese classifier phrases: Evidence from speech perception and production. Journal of Chinese Linguistics. Vol. 49, no. 1, p. 71-105.
- January 2021: An empirical study on the contribution of formal and semantic features to the grammatical gender of nouns. Linguistics Vanguard. Vol. 7, no. 1, p. 20200048. ISSN: 2199-174X
- 2021: Syllable Complexity and Morphological Synthesis: A Well-Motivated Positive Complexity Correlation Across Subdomains. Frontiers in Psychology. Vol. 12, p. 583. ISSN: 1664-1078
- 2021: "Keyword Spotting: A quick-and-dirty method for extracting typological features of language from grammatical descriptions" in Proceedings of SLTC-2020.
2020
- September 2020: Numeral base, numeral classifier, and noun: Word order harmonization. Language and Linguistics. Vol. 21, no. 4, p. 511-556. ISSN: 1606-822X, 2309-5067
- March 2020: Functions of gender and numeral classifiers in Nepali. Poznan Studies in Contemporary Linguistics. Vol. 56, no. 1, p. 113-168. ISSN: 0137-2459, 1897-7499