About Me

Ho-Hsiang Wu is currently a Sr. Research Scientist at Bosch Research. He completed his PhD in the Music Technology program at New York University, advised by Dr. Juan Pablo Bello. His research interests focus on Multimodal Foundation Models, Machine Learning & Signal Processing, and Embodied AI.

He was a research intern at Adobe Research and Descript, and he also worked as an applied scientist intern at Amazon Alexa. Before starting his PhD journey, he worked in industry for several years building machine learning products. He holds an MS degree in Electrical Engineering from the University of California, Los Angeles, and a BS degree from National Taiwan University.

Publications

Ronchini, F., Wu, H. H., Lin, W. C., Antonacci, F. (2025). Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification. ICASSP GenDA Workshop, 2025.

Lin, W. C., Ghazi, I., Belsarkar, A., Bondi, L., Das, S., & Wu, H. H. (2024). CLAP4SED: Training-Free Multimodal Few-Shot Retrieval for Real-Time Sound Event Detection on Embedded Devices. DCASE, 2024.

Ghaffarzadegan, S., Bondi, L., Lin, W. C., Kumar, A., Wu, H. H., Horst, H. G., & Das, S. (2024). Sound of Traffic: A Dataset for Acoustic Traffic Identification and Counting. INTERSPEECH, 2024.

Tatiya, G., Francis, J., Wu, H. H., Bisk, Y., & Sinapov, J. (2024). MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception. ICRA, 2024.

Kim, G., Wu, H. H., Bondi, L., & Liu, B. (2024). Multi-Modal Continual Pre-Training for Audio Encoders. ICASSP, 2024.

Lin, W. C., Ghaffarzadegan, S., Bondi, L., Kumar, A., Das, S., & Wu, H. H. (2024). CLAP4EMO: ChatGPT-Assisted Speech Emotion Retrieval with Natural Language Supervision. ICASSP, 2024.

Vosoughi, A., Bondi, L., Wu, H. H., & Xu, C. (2024). Learning Audio Concepts from Counterfactual Natural Language. ICASSP, 2024.

Ghaffarzadegan, S., Bondi, L., Wu, H. H., Munir, S., Shields, K. J., Das, S., & Aracri, J. (2023). Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma. INTERSPEECH, 2023.

Wu, H. H., Nieto, O., Bello, J. P., & Salamon, J. (2023). Audio-Text Models Do Not Yet Leverage Natural Language. ICASSP, 2023.

Wu, H. H., Fuentes, M., Seetharaman, P., & Bello, J. P. (2022). How to Listen? Rethinking Visual Sound Localization. INTERSPEECH, 2022.

Srivastava, S., Wu, H. H., Rulff, J., Fuentes, M., Cartwright, M., Silva, C., Arora, A. & Bello, J. P. (2022). A Study on Robustness to Perturbations for Representations of Environmental Sound. EUSIPCO, 2022.

Wu, H. H., Seetharaman, P., Kumar, K., & Bello, J. P. (2022). Wav2CLIP: Learning Robust Audio Representations From CLIP. ICASSP, 2022.

Wu, H. H., Fuentes, M., Bello, J. P. (2021). Exploring Modality-Agnostic Representations for Music Classification. SMC, 2021.

Wu, H. H., Kao, C., Tang, Q., Sun, M., McFee, B., Bello, J. P., & Wang, C. (2021). Multi-task Self-supervised Pre-training for Music Classification. ICASSP, 2021.

Cartwright, M., Cramer, J., Mendez, A. E. M., Wang, Y., Wu, H. H., Lostanlen, V., & Nov, O. (2020). SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context. DCASE, 2020.

Cartwright, M., Mendez, A., Cramer, J., Lostanlen, V., Dove, G., Wu, H. H., Salamon, J., Nov, O., & Bello, J. P. (2019). SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network. DCASE, 2019.

Husain, H., Wu, H. H., Gazit, T., Allamanis, M., & Brockschmidt, M. (2019). CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv preprint arXiv:1909.09436.

Cramer, J.*, Wu, H. H.*, Salamon, J., & Bello, J. P. (2019). Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings. ICASSP, 2019.

Wu, H. H., & Bello, J. P. (2010). Audio-based music visualization for music structure analysis. In Proceedings of Sound and Music Computing Conference (SMC) (pp. 1-6).

Chen, C. W., Lee, K., & Wu, H. H. (2009). Towards a class-based representation of perceptual tempo for music retrieval. In 2009 International Conference on Machine Learning and Applications (pp. 602-607). IEEE.

Chen, C. W., Cremer, M., Lee, K., DiMaria, P., & Wu, H. H. (2009). Improving perceived tempo estimation by statistical modeling of higher-level musical descriptors. In Audio Engineering Society Convention 126. Audio Engineering Society.

Tutorials

Husain, H., Wu, H. H. (2018). Feature Extraction and Summarization with Sequence to Sequence Learning. KDD 2018 Hands-On Tutorials.