Shuo Zhang

I am a machine learning researcher and engineer at Bose Research, where I work on research and development of NLP and audio applications. Previously I was a Collaborator and Researcher at the Music Technology Group (MTG), Universitat Pompeu Fabra, Barcelona, Spain. I received my Ph.D. from Georgetown University [GUCL Group, CorpLing Lab] with a focus on Computational Linguistics. My research interests include machine learning and deep learning for audio (speech, music, environmental audio understanding), natural language processing (NLP), speech prosody, and computational musicology.

CV email music LinkedIn

Recent Publications

Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic. Latent CLAP Loss for Better Foley Sound Synthesis. [preprint] (2024)

Mohammad Rasool Izadi, Yujia Yan, Shuo Zhang, Robert Stevenson. Towards Optimal Voice Disentanglement With Weak Supervision. Proc. ICASSP 2024. [IEEE Xplore]

Bryce Irvin, Sile Yin, Shuo Zhang, Marko Stamenovic. A Fullband Neural Network For Audio Packet Loss Concealment. Proc. ICASSP 2024.

J. Williams, T. Azim, A. -M. Piskopani, A. Chamberlain and S. Zhang. "Socio-Technical Trust For Multi-Modal Hearing Assistive Technology," 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSPW59220.2023.10193586. [IEEE Xplore]

N Shashaank, Berker Banar, Mohammad Rasool Izadi, Jeremy Kemmerer, Shuo Zhang, Chuan-Che (Jeff) Huang. HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones. Proceedings of ICASSP 2023, Rhodes Island, Greece. [arXiv] [IEEE Xplore]

Alnajjar, K, Hämäläinen, M, Zhang, S. Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos. Proceedings of the Third Workshop on Figurative Language Processing (FigLang) at EMNLP 2022. [U of Helsinki] [Zenodo] [ACL Anthology]

Zhang, S. Data mining Mandarin tone contour shapes. Proceedings of the 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (at ACL 2019). Association for Computational Linguistics: Florence, Italy, August 2019. [preprint] [ACL Anthology]

Caro, R, Zhang, S, Serra, X. Quantitative analysis of the relationship between linguistic tones and melody in jingju using music scores. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology (DLfM) at the International Society for Music Information Retrieval Conference (ISMIR) 2017, Shanghai, China, October 2017. Published by ACM-ICPS. [MTG] [UPF e-repositori] [ACM Digital Library]

Zhang, S., Caro, R, Serra, X. Understanding the expressive functions of jingju metrical patterns through lyrics text mining. Proceedings of the 18th International Society for Music Information Retrieval (ISMIR) conference, Suzhou, China, October 2017. [MTG] [UPF e-repositori]

Zhang, S., Zeldes, A. GitDOX: A Linked Version Controlled Online XML Editor for Manuscript Transcription. Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (FLAIRS 30, published by AAAI Press), May 22-24, 2017, Marco Island, FL. [pdf @ AAAI Press]

Zhang, S. Mining linguistic tone patterns with symbolic representation. Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (SIGMORPHON at ACL 2016), Association for Computational Linguistics, Berlin, Germany, August 2016. [ACL Anthology]

Zeldes, A, Zhang, S. When Schemas Change Rules Help: A Configurable Approach to Coreference beyond OntoNotes. In: Proceedings of the NAACL 2016 Workshop on Coreference Resolution Beyond OntoNotes (CORBON). Association for Computational Linguistics, San Diego, CA, June 2016. [ACL Anthology]

Zhang, S, Caro, R, Serra, X. Predicting pairwise pitch contour relations based on linguistic tone information in Beijing opera singing. Proceedings of the 16th International Society for Music Information Retrieval (ISMIR) conference, Malaga, Spain, October 26th-30th, 2015. [MTG]

Zhang, S, Caro, R, Serra, X. Study of the similarity between linguistic tones and melodic pitch contours in Beijing Opera singing. Proceedings of the 15th International Society for Music Information Retrieval (ISMIR) Conference, pp. 345-348. Taiwan, October 27-31, 2014. [pdf final version] [MTG]

Published (3)

2022 - Hearing augmentation and wearable system with localized feedback (Bose)
2021 - Spatialized virtual personal assistant (Bose)
2020 - Systems and methods for augmented reality content harvesting and information extraction (Bose)

Filed, pending (3)

Recent invited talks, teaching, conference tutorials

Tufts University, University of Washington, Spotify, NLP4MusA Workshop @ ISMIR 2020, Artificial Intelligence Festival by AI Accelerator Institute, AI Accelerator Summit Boston 2019, Global AI Conference Boston 2019, RE:WORK Deep Learning Summit Boston, NLP in MIR tutorial at ISMIR 16 NYC, Peking University, Tsinghua University, Fudan University, Communication University of China, Ningbo University, Chinese Academy of Social Sciences, Shanghai Conservatory, China Conservatory, Central Conservatory, Tencent, Douban, Dolby Labs, etc.

Invited peer reviews and professional services (NLP, MIR, audio signal processing)

Program committee member / reviewer

Journals

EURASIP Journal on Audio, Speech, and Music Processing
Glossa

Conference proceedings

EUSIPCO, ICASSP, ISMIR, DCASE, ACM-Multimedia, IEEE-MMSP, WASPAA, NLP4MusA, ACL, NAACL, COLM, EACL, EMNLP, ECNLP (Workshop), NLP4DH, etc.

Services

Co-Chair of Industrial Liaisons, DCASE Workshop (2021-2023)

Open source projects contributions

click on a project name to learn more

CompMusic Project website

Computational models for the discovery of world music. Funded by the European Research Council, PI: Xavier Serra. Music Technology Group (MTG), Universitat Pompeu Fabra (UPF). Resulted in publications listed here. Contribution: 2013-17.

ANNIS website

ANNIS, originally "ANNotation of Information Structure" at Humboldt-Universität zu Berlin, Germany, is a web application for search and visualization of linguistic corpora, funded by the German Research Foundation (DFG), etc. Contribution: 2013-17.

XRENNER website

XRENNER is the eXternally configurable REference and Non Named Entity Recognizer, developed by Amir Zeldes with my contributions (see the Zeldes and Zhang 2016 paper at the CORBON workshop at NAACL 2016). See the website for a live demo of coreference and entity resolution (including both non-named and named entities). Contribution: 2015-16.

Zeldes, A, Zhang, S. 2016. When Schemas Change Rules Help: A Configurable Approach to Coreference beyond OntoNotes. Proceedings of CORBON Workshop at NAACL 2016.

GitDOX website

Web application for collaborative annotation projects with version control using GitHub as the backend, funded by the US National Endowment for the Humanities (NEH), etc., and part of Coptic Scriptorium. GitDOX is on the list of awesome-nlp tools starred by 10.1K developers on GitHub. Contribution: 2016.

Zhang, S., Zeldes, A. 2017. GitDOX: A Linked Version Controlled Online XML Editor for Manuscript Transcription. Proceedings of FLAIRS30.

Open source data sets

Annotated jingju arias dataset website

Annotated syllable-level segmentation, phonetic, melodic, and linguistic tone information for a set of arias in Beijing opera, under the CompMusic Project.

Zhang, S, Caro, R, Serra, X. 2015. Predicting pairwise pitch contour relations based on linguistic tone information in Beijing opera singing. Proc. of ISMIR 15.

Zhang, S, Caro, R, Serra, X. 2014. Study of the similarity between linguistic tones and melodic pitch contours in Beijing Opera singing. Proc. ISMIR 14.

Jingju lyrics dataset website

Comprehensive collection of aria lyrics acquired by crawling the online jingju lyrics database xikao, under the CompMusic Project.

Zhang, S., Caro, R, Serra, X. 2017. Understanding the expressive functions of jingju metrical patterns through lyrics text mining. Proc. ISMIR 17.

ANNIS dataset website

ANNotation of Information Structure.

Thomas Krause; Weißenfels Benjamin; Tom R.; IrinaGlushanok; Martin Klotz; Shuo Zhang; Luke Gessler; Amir Zeldes; fab-bar; Stephan Druskat; adrianeboyd; egon w. stemle; Thomas N; Lari Lampen; Florian Petran (2019, October 18). korpling/ANNIS beta.3 (Version beta.3). Zenodo. http://doi.org/10.5281/zenodo.3507129

Multimodal Metaphor Corpus website

Multimodal Metaphor Corpus (collaboration with University of Helsinki).

Alnajjar, K, Hämäläinen, M, Zhang, S. Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos. Proceedings of the Third Workshop on Figurative Language Processing (FigLang) at EMNLP 2022. Zenodo.