srt-to-praat: Convert .srt subtitle files to Praat .TextGrid files

I wrote a Python script that converts .srt subtitle files to Praat .TextGrid files. It has extra features to accommodate .srt files generated by whisper.cpp and whisperX. Unlike SrtToTextgrid, it automatically adds silent intervals to the .srt file and converts it to .TextGrid format in one go.

Please visit my GitHub repository to download the script and the example files.

Requirements

This script requires pydub and inflect. You can install them by entering the following commands in your Terminal/Command Prompt:

pip install pydub
pip install inflect

Usage

Go to Terminal/Command Prompt and run the following command:

python3 srt-to-praat.py srt_input media_input tg_output csv_output -d -c

Arguments

srt_input: Path to the input .srt file.

media_input: Path to the input audio or video file. This is needed to determine the total duration of the output .TextGrid file. A wide range of media formats are supported:

Video: MXF, MKV, OGM, AVI, DivX, WMV, QuickTime, RealVideo, MPEG-1, MPEG-2, MPEG-4, DVD-Video (VOB), XviD, MSMPEG4, ASP, H.264 (MPEG-4 AVC).

Audio: OGG, MP3, WAV, RealAudio, AC3, DTS, AAC, M4A, AU, AIFF, Opus.
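To illustrate why the total duration matters, here is a minimal sketch of how gaps between subtitles could be filled with empty (silent) intervals so that the tier covers the whole recording. This is not the script's actual code: the function name fill_silences and the (start, end, text) tuple layout are assumptions made for illustration, and the duration is passed in as a plain number rather than read from a media file.

```python
def fill_silences(subtitles, total_duration):
    """Return contiguous (start, end, text) intervals covering 0..total_duration.

    `subtitles` is a list of (start, end, text) tuples in seconds, assumed to be
    sorted by start time and non-overlapping. Gaps get an empty-text interval.
    """
    intervals = []
    cursor = 0.0  # end time of the last interval emitted so far
    for start, end, text in subtitles:
        if start > cursor:
            # Silence between the previous subtitle (or time 0) and this one.
            intervals.append((cursor, start, ""))
        intervals.append((start, end, text))
        cursor = end
    if cursor < total_duration:
        # Trailing silence up to the end of the media file.
        intervals.append((cursor, total_duration, ""))
    return intervals


subs = [(0.5, 1.2, "hello"), (2.0, 3.0, "world")]
for interval in fill_silences(subs, 4.0):
    print(interval)
```

Without the media duration, the trailing silence after the last subtitle could not be added, and the TextGrid would end earlier than the audio.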

tg_output: Path to the output .TextGrid file.

csv_output: Path to the output .csv file. The script generates a CSV file which logs all instances of consecutive uppercase letters and numbers in the subtitles. It is important to edit them out if you intend to use forced alignment tools like Montreal Forced Aligner (MFA) as they do not process acronyms and numbers properly.
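As a rough idea of what such a log could look like, the sketch below scans subtitle text for runs of two or more uppercase letters and for digit sequences, then writes the hits as CSV. The regular expression, the column names, and the helper names find_flagged_tokens and write_log are assumptions for illustration. The script's real detection rules and CSV layout may differ.

```python
import csv
import io
import re

# Two or more consecutive uppercase letters, or any run of digits.
PATTERN = re.compile(r"[A-Z]{2,}|\d+")


def find_flagged_tokens(subtitles):
    """Return (subtitle_index, token, full_text) rows for every match."""
    rows = []
    for index, text in subtitles:
        for match in PATTERN.finditer(text):
            rows.append((index, match.group(), text))
    return rows


def write_log(rows, handle):
    """Write the rows to an open text handle as CSV with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["subtitle", "token", "text"])  # hypothetical column names
    writer.writerows(rows)


subs = [(1, "The MFA docs list 25 options."), (2, "Nothing to flag here.")]
buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_log(find_flagged_tokens(subs), buffer)
print(buffer.getvalue())
```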

Options

-d, --diarize enables speaker diarization. It requires each subtitle in your .srt file to start with the name of the speaker in the format "[SPEAKER_NAME]:" and gives each speaker a separate tier in the TextGrid file.
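The speaker prefix can be separated from the spoken text with a regular expression, roughly as sketched below. The helper names split_speaker and group_by_speaker, the "UNKNOWN" fallback tier, and the tuple layout are assumptions for illustration, not the script's own implementation.

```python
import re
from collections import defaultdict

# Matches "[SPEAKER_NAME]: rest of the subtitle" at the start of the text.
SPEAKER_PREFIX = re.compile(r"^\[([^\]]+)\]:\s*(.*)$", re.DOTALL)


def split_speaker(text):
    """Return (speaker, spoken_text); speaker is None if no prefix is found."""
    stripped = text.strip()
    match = SPEAKER_PREFIX.match(stripped)
    if match is None:
        return None, stripped
    return match.group(1), match.group(2)


def group_by_speaker(subtitles):
    """Group (start, end, text) subtitles into one interval list per speaker."""
    tiers = defaultdict(list)
    for start, end, text in subtitles:
        speaker, spoken = split_speaker(text)
        # Subtitles without a prefix go to a hypothetical fallback tier.
        tiers[speaker or "UNKNOWN"].append((start, end, spoken))
    return dict(tiers)


subs = [
    (0.0, 1.0, "[SPEAKER_00]: Hello there."),
    (1.5, 2.5, "[SPEAKER_01]: Hi!"),
    (3.0, 4.0, "[SPEAKER_00]: How are you?"),
]
for name, intervals in group_by_speaker(subs).items():
    print(name, intervals)
```

Each key of the resulting dictionary would correspond to one tier in the TextGrid.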

-c, --convert-numbers adds spaces between consecutive uppercase letters (e.g., SRT → S R T) and converts numbers to English words (e.g., 25 → twenty-five). I recommend enabling this option if you intend to use forced alignment tools like MFA afterwards.
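Both transformations can be sketched in a few lines. The version below is only an approximation of what the option does: the function names space_acronyms and numbers_to_words and the regular expressions are assumptions, and only whole digit runs are handled (no decimals, ordinals, or currency). The number conversion relies on inflect's number_to_words, so inflect must be installed to call it.

```python
import re


def space_acronyms(text):
    """Insert a space between any two adjacent uppercase letters."""
    # The pattern matches the empty position between two uppercase letters.
    return re.sub(r"(?<=[A-Z])(?=[A-Z])", " ", text)


def numbers_to_words(text):
    """Replace each run of digits with its English spelling via inflect."""
    import inflect  # third-party; imported here so space_acronyms works without it

    engine = inflect.engine()
    return re.sub(r"\d+", lambda m: engine.number_to_words(int(m.group())), text)


print(space_acronyms("I converted the SRT file for MFA."))
```

Calling numbers_to_words on the output of space_acronyms applies both steps, which mirrors the two effects described for the -c option.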

Example

example.srt is a transcript generated by whisperX, and example.wav is the corresponding audio file. To convert it to a .TextGrid, you may enter

python3 srt-to-praat.py example.srt example.wav example.TextGrid example.csv -d -c

If you do not want speaker diarization and number conversion, you may enter

python3 srt-to-praat.py example.srt example.wav example.TextGrid example.csv