audio

The audio commands provide helpers for several audio-related tasks:

  • the convert command repairs audio files and makes them available in several formats.

  • the join command joins several audio files into one.

  • the transcribe command transcribes an audio file to .vtt captions.

convert

The audio convert command should be run on all .webm recordings produced by the browser. The syntax is:

liqvid audio convert audio.webm

This does two things: fixes the durations in audio.webm (so that audio seeking will work), and creates audio.mp4 (Safari does not support webm). It is equivalent to running the following shell commands:

# fix webm file produced by browser
ffmpeg -i audio.webm -strict -2 audio-fixed.webm
mv audio-fixed.webm audio.webm

# make available in mp4
ffmpeg -i audio.webm audio.mp4
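
Since the command takes one recording at a time, a small loop can help when a project has several. The following is a minimal sketch, not part of Liqvid: convert_all is a hypothetical helper, and the LIQVID variable is an assumed override that lets the loop be dry-run (for example with LIQVID=echo).

```shell
#!/usr/bin/env bash
# Sketch: run `liqvid audio convert` on every .webm file in a directory.
# convert_all and the LIQVID override are illustrative names, not Liqvid features.
convert_all() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/*.webm; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    ${LIQVID:-liqvid} audio convert "$f"
  done
}
```

After sourcing the snippet, `convert_all ./recordings` would convert each recording in that (assumed) directory in turn.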

join

The audio join command concatenates several audio files into a single audio file. The syntax is:

liqvid audio join [filenames..]

The last filename specified is the output filename; all other filenames are input files. Alternatively, the output filename can be specified with the --output (or -o) flag, in which case all filenames are input files. That is, the following are equivalent:

liqvid audio join -o audio.webm audio-*.webm
liqvid audio join audio-*.webm audio.webm

Internally, this uses ffmpeg's concat demuxer (https://trac.ffmpeg.org/wiki/Concatenate#demuxer); read that page if you need more fine-grained control over the process.
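
For readers who want that finer control, here is a rough sketch of driving the concat demuxer by hand. It is not Liqvid's actual implementation: make_concat_list and join_audio are hypothetical helpers, the inputs are assumed to share a codec (so that -c copy works), and file names are assumed not to contain single quotes.

```shell
#!/usr/bin/env bash
# Sketch of joining audio with ffmpeg's concat demuxer.
# Helper names are illustrative; this is not how Liqvid is necessarily implemented.

# Write a concat list file: one "file '<path>'" line per input.
# Assumes paths contain no single quotes.
make_concat_list() {
  local list="$1"; shift
  : > "$list"
  local f
  for f in "$@"; do
    printf "file '%s'\n" "$f" >> "$list"
  done
}

# Usage: join_audio OUTPUT INPUT...
join_audio() {
  local output="$1"; shift
  # Keep the list in the current directory: the demuxer resolves
  # relative paths against the list file's location.
  local list="./concat-list-$$.txt"
  make_concat_list "$list" "$@"
  # -safe 0 permits absolute paths; -c copy avoids re-encoding.
  ffmpeg -f concat -safe 0 -i "$list" -c copy "$output"
  local status=$?
  rm -f "$list"
  return "$status"
}
```

With these definitions, `join_audio audio.webm audio-*.webm` would roughly correspond to the join command above. Note that this helper takes the output filename first.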

transcribe

The audio transcribe command transcribes an audio file to a transcript file (transcript.json) and a captions file (captions.vtt). The "rich transcript" transcript.json contains per-word timings. The syntax is:

liqvid audio transcribe \
--api-key $API_KEY \
--api-url $API_URL \
-i audio.webm -t transcript.json -c captions.vtt

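This uses ibm-watson and requires an IBM Cloud account. (Other transcription platforms may be provided in the future.) The API key and URL can be set in the configuration file instead of being passed on the command line: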

// liqvid.config.ts

// these shouldn't be in source control
import {apiKey, apiUrl} from "./keys";

module.exports = {
  "audio": {
    "transcribe": {
      apiKey,
      apiUrl
    }
  }
};

--api-key

IBM API key. You should probably specify this in the config file instead of on the command line.

--api-url

IBM Watson endpoint URL. This will be something like "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/a80f1e72-6a00-11ec-90d6-0242ac120003". You should probably specify this in the config file instead of on the command line.

--captions, -c

Output filename for WebVTT captions. Defaults to "./captions.vtt".

--input, -i

Audio input filename.

--transcript, -t

Output filename for rich transcript. Defaults to "./transcript.json".
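
The captions output is a WebVTT file, so a quick sanity check after transcription is possible with standard tools. The sketch below relies only on two facts about the WebVTT format: the file begins with the string WEBVTT, and each cue has a timing line containing "-->". check_vtt is a hypothetical helper, not part of Liqvid, and it does not handle a leading byte-order mark.

```shell
#!/usr/bin/env bash
# Sketch: verify that a file looks like WebVTT and print its cue count.
# check_vtt is an illustrative helper, not a Liqvid command.
check_vtt() {
  local file="$1"
  local first=""
  # Read the first line; tolerate a missing trailing newline.
  IFS= read -r first < "$file" || [ -n "$first" ] || return 1
  case "$first" in
    WEBVTT*) ;;
    *) echo "not a WebVTT file: $file" >&2; return 1 ;;
  esac
  # Count cue timing lines, which contain "-->".
  local cues
  cues=$(grep -c -e '-->' "$file" || true)
  echo "$cues"
}
```

For example, `check_vtt captions.vtt` prints the number of cues, or fails if the header is missing.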