Prior Work Experience
Applied Machine Learning Intern at Bose CE Applied Research Group
- Developed and deployed NLP and computer vision ML systems on Google Pixel phones and Bose wearables
Research Assistant at SPIRE LAB, IISc Bangalore (Advised by: Prof. Prasanta Kumar Ghosh)
- Built a speech classifier to detect Amyotrophic Lateral Sclerosis (ALS) and Parkinson's disease (PD) using voice as a biomarker
- Developed an unsupervised system for robust bird sound detection using an enhanced Multiple Window Savitzky-Golay (MWSG) spectrogram
Software Developer at Robert Bosch Bangalore
- Handled customer and production diagnosis in telematics projects and designed new features for the customer Daimler
- Worked on text-to-speech (TTS) output for car multimedia features such as navigation, SMS readout, and hands-free control, and tested the output using the TTFIS tool
|
Research
I am interested in understanding signal-level properties of audio, speech, and images. My research experience so far spans natural language processing and applying machine learning algorithms to speech and images.
|
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
project page
We evaluate various discrete audio representations that can be used as an alternative to mel-spectrograms for speech and speaker recognition.
|
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
project page
We investigate end-to-end ASR architectures for long-form audio transcription that can run inference in a single pass.
|
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach
Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
project page
We propose a contextual beam search approach that leverages large language models to improve speaker diarization performance.
|
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023
project page
We propose a novel attention mechanism for the Conformer architecture that scales linearly with the input sequence length.
|
Open Automatic Speech Recognition Leaderboard
Sanchit, Hugging Face Team, Nvidia NeMo Team, SpeechBrain Team, Vaibhav Srivastav, Somshubra Majumdar, Nithin Koluguri, Adel
project page
We present the Open Automatic Speech Recognition Leaderboard, a platform for benchmarking and comparing ASR models.
|
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
International Speech Communication Association (Interspeech), 2023
project page
We introduce AmberNet, a compact end-to-end neural network for spoken language identification. AmberNet consists of 1D depth-wise separable convolutions and Squeeze-and-Excitation layers with global context, followed by statistics pooling and linear layers.
|
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
Tae Jin Park, He Huang, Coleman Hooper, Nithin Rao Koluguri, Kunal Dhawan, Ante Jukić, Jagadeesh Balam, Boris Ginsburg
International Speech Communication Association (Interspeech), 2023
project page
We introduce a property-aware data simulator designed to generate synthetic multi-speaker speech recordings.
|
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg
International Speech Communication Association (Interspeech), 2023
project page
We propose a novel multi-scale decoder for speech recognition, which is an ensembled NeMo diarization system.
|
Multi-scale Speaker Diarization with Dynamic Scale Weighting
Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
International Speech Communication Association (Interspeech), 2022
project page
Advanced multi-scale diarization system based on a multi-scale diarization decoder.
|
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
project page
A novel neural network architecture for extracting speaker representations. It employs 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with global context, followed by a channel-attention-based statistics pooling layer, to map variable-length utterances to a fixed-length embedding (t-vector).
|
SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification
Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
project page
We propose SpeakerNet, a new neural architecture for speaker recognition and speaker verification tasks. It uses a Conv1D encoder and an x-vector-based statistics pooling decoder.
|
Meta-learning for robust child-adult classification from speech
Nithin Rao Koluguri, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth Narayanan
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
project page
We demonstrate improvements over state-of-the-art speaker embeddings (x-vectors) for speaker classification using prototypical networks
|
Comparison Of Speech Tasks And Recording Devices For Voice Based Automatic Classification Of Healthy Subjects And Patients With Amyotrophic Lateral Sclerosis
Suhas B.N, Deep Patel, Nithin Rao Koluguri, Prasanta Ghosh*
International Speech Communication Association (Interspeech), 2019
project page
We evaluated the role of different speech tasks and recording devices in detecting ALS from speech.
|
Spectrogram Enhancement Using Multiple Window Savitzky-Golay (MWSG) Filter for Robust Bird Sound Detection
Nithin Rao Koluguri, Nisha G Meenakshi*, Prasanta Ghosh*
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017
project page
We propose a novel unsupervised method to denoise a spectrogram using the Multiple Window Savitzky-Golay algorithm, and use it to enhance bird sound cues so that bird sounds can be recognized in noisy environments.
|
Conferences:
- 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2023 International Speech Communication Association (Interspeech)
|