Yi-Te (Ethan) Hsu

Research Engineer

About Me

I am Yi-Te (Ethan) Hsu, a research engineer at ASAPP, where I lead projects on multimodal LLM fine-tuning and ASAPP’s ASR system.

Previously, I worked at Otter.ai, improving video/audio transcription and summarization systems. At Apple Inc., I optimized neural machine translation models. I also conducted speech processing research with Dr. Yu Tsao at Academia Sinica and collaborated with Prof. Frank Rudzicz at the University of Toronto and Vector Institute on detecting pathological voices and identifying Alzheimer’s disease.

I am excited about using machine learning to solve real-world problems!

Interests
  • Speech Processing, ASR
  • Multimodal LLMs
  • Model Efficiency and Optimization
Education
  • M.S. in Computer Science

    Johns Hopkins University

  • B.S. in Electrical Engineering

    National Taiwan University

Recent Publications
(2024). DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. INTERSPEECH 2024.
(2024). Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2024 (IEEE International Conference on Acoustics, Speech and Signal Processing).
(2021). SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
(2020). Efficient Inference for Neural Machine Translation. SustaiNLP: Simple and Efficient Natural Language Processing, EMNLP 2020.

Experience

  1. Research Engineer

    Language Technology, ASAPP
  2. Machine Learning Engineer

    Speech & NLP Platform, Otter.ai
    Applied speech and NLP techniques to improve the automatic video/audio transcription and summarization system.
  3. Machine Learning Intern

    Apple Inc.
    • Implemented state-of-the-art model efficiency techniques for Transformer models.
    • Achieved a 2.1x speedup and a 25% parameter reduction while maintaining translation quality by applying knowledge distillation, a simpler recurrent architecture, and pruning to Transformer models.
  4. Visiting Researcher

    AI Research Group, Vector Institute; University of Toronto
    • Developed early pathological voice detection models using speech processing and deep learning techniques.
    • Built a system to address the channel mismatch problem between recording devices, increasing the target-domain PR-AUC from 0.84 to 0.94 using domain adversarial training, an unsupervised domain adaptation method.
    • Proposed a transfer learning method to detect dementia in a low-resource language.
  5. Research Assistant

    Bio-Acoustic Signal Processing Lab, Academia Sinica
    • Proposed a novel neural network structure that achieves a 4x compression rate and 1.2x acceleration without performance degradation by quantizing floating-point weights.
    • Integrated and optimized deep learning-based models (LSTM, FCN…) for various signal processing tasks, including speech enhancement and disease detection.