CV
Education
Ph.D. in Linguistics (2017-2022)
University of Cologne, Germany
SFB1252 "Prominence in Language"
Thesis: Referring Expression Generation in Context: Combining Linguistic and Computational Approaches
Supervisors: Prof. Dr. Nikolaus P. Himmelmann, Prof. Dr. Kees van Deemter
Research Master's Programme in Linguistics (2012-2014)
Utrecht University, the Netherlands
Thesis: The Vector-based Semantics of Distal PP Modification
Supervisors: Prof. Dr. Yoad Winter, Dr. Choonkyu Lee
Exchange Student in Cognitive Science Department / Intern in Mercator Research Group (2013-2014)
Ruhr University Bochum, Germany
M.A. in General Linguistics (2009-2011)
Allameh Tabatabai University, Iran
Thesis: The Morpho-semantic Analysis of Bahuvrihi Compounds in Persian
Supervisor: Prof. Dr. Koorosh Safavi
B.A. in English Language and Literature (2005-2009)
University of Isfahan, Iran
Diploma in Mathematics and Physics Discipline (2001-2005)
Edalat High School, Iran
Work Experience
Data Scientist (2024 - Present)
Trivago N.V.
Düsseldorf, Germany
Core Responsibilities:
- Build rule-based, ML, and LLM models for different tasks such as matching, deduplication, and data enrichment
- Develop and optimise BigQuery SQL and Python pipelines at scale
- Perform EDA, anomaly detection, data transformation, and visualisation
- Collaborate with backend engineers to productionize scalable models
- Work closely with stakeholders to deliver data-driven solutions
Research Associate (2017 - 2024)
University of Cologne
Germany
Collaborative Research Center "Prominence in Language" (SFB1252)
Project: "INF – Infrastructure: Data, Design and Sustainability"
PIs: Prof. Dr. Nikolaus P. Himmelmann, Prof. Dr. Nils Reiter
Core Responsibilities:
- Ensuring sustainable data management of ~20 research projects in compliance with the German Research Foundation (DFG) regulations, including tasks such as long-term data archiving, handling metadata, and fostering open science practices
- Assisting projects with different data workflow tasks including data preprocessing, automatic annotation, data wrangling, query matching, string manipulation, and data visualisation
- Consulting individual projects on experimental design, corpus design, survey design, and data annotation practices
Research Associate (2015 - 2017)
University of Cologne
Germany
DFG Project: The relation between grammar and usage: null subjects and subject position in Spanish and Persian
PI: Prof. Dr. Aria Adli
Core Responsibilities:
- Multi-layer annotation of Persian spontaneous speech
- Data preparation and transformation
- Developing hierarchical data representation strategies in alignment with ISO frameworks, MAF and SynAF, and TEI guidelines
Skills & Expertise
Programming & Scripting
Python: pandas, NumPy, scikit-learn, TensorFlow
R: Statistical analysis and data manipulation
SQL: Database querying and management
Cloud & Big Data Ecosystem (GCP)
BigQuery: Data warehousing and analytics
BigQuery ML: Machine learning in the cloud
Cloud Storage: Scalable object storage
Looker: Business intelligence and data visualization
Machine Learning
Classical Methods: Linear/logistic regression, tree-based and ensemble models
Deep Learning: CNNs, RNNs/LSTMs
Transformer-based: Large Language Models (LLMs)
Survey & Crowdsourcing Platforms
Amazon Mechanical Turk: Human intelligence tasks
Prolific: Academic research participant recruitment
Qualtrics: Survey design and data collection
Google Forms: Simple survey creation
Natural Language Processing
Toolkits: spaCy, NLTK, UDPipe, Quanteda, Hugging Face Transformers
Annotation Tools: INCePTION, WebAnno, MMAX2, ELAN
Corpus Analysis: Sketch Engine (CQL)
Markup & Document Preparation
Typesetting: LaTeX, Markdown
Data Formats: XML, XPath
Text Processing: Regular Expressions (Regex)
Version Control & Collaboration
Git: Source code management and collaboration
Teaching
Publications
Talks
Morpho-semantic Analysis of Bahuvrihi Compounds in Persian
Oral presentation at International Conference on Word formats and lexical combinations, Rome, Italy
Does Typicality Conform to the Principle of Compositionality? A Conceptual-Compositional Perspective
Poster presentation at 17th International Workshop on Roots of Pragsemantics, Szklarska Poreba, Poland
A stand-off XML-TEI representation of reference annotation
Poster presentation at DGfS 2018: 40th Annual Conference of the German Linguistic Society, Stuttgart, Germany
Homogeneous annotation of dependency relations using universal dependencies (UD): The case of P-drop in Persian
Poster presentation at DGfS 2018: 40th Annual Conference of the German Linguistic Society, Paris, France
MultiSub: A multiple parallel subtitle corpus
Poster presentation at DGfS 2019: 41st Annual Conference of the German Linguistic Society, Bremen, Germany
Choice of Referring Expressions in Discourse: Computational Interpretation of Recency
Oral presentation at 1st Workshop on Computational Approaches to Discourse, Virtual
Choosing a Feature Set for Generating Referring Expressions in Context
Oral presentation at the 28th International Conference on Computational Linguistics (COLING 2020), Virtual
What can Neural Referential Form Selectors Learn?
Oral presentation at INLG 2021, Aberdeen, Scotland [online]
Referring Expression Generation in Context: The Choice of Corpus
Invited talk at Workshop 'NLP Meets Linguistics', Utrecht University, the Netherlands
Non-neural Models Matter: a Re-evaluation of Neural Referring Expression Generation Systems
Oral presentation at ACL 2022, Dublin, Ireland
Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation
Oral presentation at LREC 2022, Marseille, France
Referring Expression Generation in Context: Combining Linguistic and Computational Approaches
Invited talk at University of Cologne, Germany
Variation in the Choice of Referring Expressions in Context
Invited talk at Utrecht NLP Group, Utrecht University, the Netherlands
Models of Reference Production: How do They Withstand the Test of Time?
Oral presentation at INLG 2023, Prague, Czech Republic
Human Evaluation: Why the Choice of Approach is Key
Invited talk at Computational Linguistics in the Hothouse, Utrecht, the Netherlands
Trivago AI Meetup: Learnings from NLP Conferences
Invited talk at Trivago AI Meetup, Düsseldorf, Germany
Beyond One-Size-Fits-All: Layered Human Evaluation for Reliable NLG Assessment
Invited talk at DCU NLP Seminar Series, Dublin, Ireland [online]