Profile
Hi, I’m Haonan Chen. I am pursuing a Master's degree in Digital Linguistics at the University of Zurich, with a focus on computational linguistics and speech processing. I am currently writing my Master's thesis under the supervision of Prof. Eleanor Chodroff, comparing the performance of humans and Whisper in transcribing whispered Mandarin. My academic interests lie in speech and text technology, and I am particularly passionate about AI products and multilingual learning. In my personal life, I enjoy staying active through body combat, yoga, and hip-hop, and I love cooking and watching movies!
Learning route

Education

University of Zurich | MA in Digital Linguistics

September 2022 – present, Zurich, Switzerland

  • Major and elective courses: Machine Learning in Computational Linguistics, Text Generation with Language Model, Automatic Speech Processing, Language Technology and Web Applications, Fundamentals of Speech Sciences and Signal Processing, Language Data Processing, Detecting Semantic Shift, etc.

Beijing Foreign Studies University | BA in Polish Language and Culture | GPA: 3.93/4.00

September 2018 – June 2022, Beijing, China

  • Major and elective courses: Polish-Chinese Interpreting, Introduction to Language Typology, Computer Science and Python Programming, Applicative Technologies of Access Database, etc.

Internships

ByteDance Technology Co., Ltd. | AI Lab | Polish Linguistics Intern

February 2022 – June 2022, Beijing, China

  • Linguistic and Phonological Research: Conducted research on Polish phonetic features and inflectional rules, and wrote professional, comprehensive text normalization documentation to support business and algorithm development with linguistic insights.
  • Data Management and Quality Control: Assisted in managing the Polish language corpus, and performed quality control and acceptance of recordings, texts, and annotation data.
  • Model Performance Evaluation and Feedback: Collaborated closely with the algorithm team to provide test data based on business needs, evaluate model performance, identify errors, and suggest optimization strategies.
  • International Business Support: Worked with data and algorithm teams to support multiple international business operations, including ByteDance's short video platform.

iFLYTEK Co., Ltd. | Polish Language Resource Intern

November 2020 – April 2021, Remote

  • Audio Quality Inspection and Corpus Creation: Systematically checked audio data for clarity and compliance with standards. Used Audacity to make recordings at specified sampling rates and bit depths, with varying volumes and speech rates.
  • Text Data Annotation and Quality Control: Checked the grammatical accuracy of the Polish language corpus. Annotated corpus data for PoS and non-Polish entities. Completed quality checks on over 10,000 corpus entries and created more than 3,000 new entries.
  • Linguistic Support for MTPE Project: Evaluated the quality of machine translations, summarized error patterns, and documented optimization suggestions.
  • Professional Training and Team Collaboration: Received training in corpus management and participated in online Q&A sessions to address semantic and grammatical issues related to Polish.

Projects

WordWizard - Multilingual Crossword Website | Team Leader

October 2023 – January 2024, Web Application and Language Technology

  • Developed a customizable multilingual crossword game supporting English, Spanish, French, German, Swedish, and Dutch, tailored to users' language proficiency levels (CEFR A1-C1). Implemented interactive elements such as hints, a timer, and answer verification to create an engaging and enjoyable gaming experience.
  • Data Collection and Preprocessing: Collected and pre-processed multilingual datasets and dictionaries from CEFRLex to meet project requirements.
  • Project Schedule and Execution: Managed the entire project lifecycle from concept to implementation, including requirement analysis, product positioning, page design, and development. Created and monitored the project schedule, ensuring smooth progress and final delivery.
  • Webpage and User Experience Design: Designed and structured the webpage, focusing on an intuitive and user-friendly visual style.
  • Core Algorithm Development: Developed the algorithm for generating multilingual crossword puzzles.
  • Team Collaboration and Management: Assigned tasks based on project scope, set milestone goals, and ensured team alignment and cooperation. Conducted regular progress meetings to track project status and address any issues promptly.
  • Methods: Python, HTML, CSS, JavaScript, SQL

A Differential Study of the Performance of Whisper ASR and Human on Transcription Task of Whispered Mandarin Speech

2023 Fall, Automatic Speech Processing

  • Topic: This research compares the performance of the Whisper automatic speech recognition (ASR) system and human transcribers in transcribing whispered Mandarin speech. Results show that although Whisper performs consistently across text types, it is outperformed by human transcription.
  • Methods: Python, Paired samples t-test

Rhythmic Variability in Swiss German Dialects

2023 Fall, Computational Processing of Speech Rhythm for Language and Speaker Classification

  • Topic: This research explores the use of rhythmic metrics, specifically nPVI and VarcoV, in classifying Swiss German dialects from Basel, Bern, and Zurich. The results show that nPVI and VarcoV are not suitable metrics for dialect differentiation.
  • Methods: Computational modeling in R, Linear Discriminant Analysis (LDA), Exploratory Data Analysis (EDA)

Rhythmic Variability Between Young and Old Age Groups

2023 Fall, Computational Processing of Speech Rhythm for Language and Speaker Classification

  • Topic: This study examines how well a set of durational and intensity rhythmic metrics classifies speakers into young and old age groups. The results show that deltaV, V% and stdevP stand out as the top predictors.
  • Methods: Computational modeling in R, Random forest, Decision tree

Certifications
Skills