Since you say offline processing is OK, it may be worth checking whether audiogrep can be inverted to remove matching words/phrases instead of extracting them, or simply made to output the timestamps at which they occur (you could then use something else to do the muting or cutting).
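For the "use something else to do the muting" part, here is a rough sketch, assuming you have already obtained a list of (start, end) times in seconds from some speech-recognition step and that ffmpeg is installed. It does not call audiogrep at all; the function names and the example timestamps are made up for illustration. It only builds an ffmpeg `volume` filter that silences the given intervals.

```python
import subprocess

def merge_intervals(intervals, pad=0.0):
    """Pad each (start, end) interval by `pad` seconds and merge overlaps."""
    padded = sorted((max(0.0, s - pad), e + pad) for s, e in intervals)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def mute_filter(intervals, pad=0.0):
    """Build an ffmpeg audio filter string that mutes the given intervals."""
    merged = merge_intervals(intervals, pad)
    if not merged:
        return "anull"  # ffmpeg's pass-through audio filter
    # '+' acts as a logical OR between the between() conditions.
    conds = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in merged)
    return f"volume=enable='{conds}':volume=0"

def ffmpeg_command(src, dst, intervals, pad=0.0):
    """Return the ffmpeg argument list; nothing is executed here."""
    return ["ffmpeg", "-i", src, "-af", mute_filter(intervals, pad), dst]

if __name__ == "__main__":
    # Hypothetical timestamps (seconds) where unwanted phrases were detected.
    hits = [(12.4, 13.1), (12.9, 14.0), (95.0, 97.5)]
    cmd = ffmpeg_command("episode.mp3", "episode_muted.mp3", hits, pad=0.2)
    print(" ".join(cmd))
    # Uncomment to actually run ffmpeg:
    # subprocess.run(cmd, check=True)
```

A small `pad` helps because word-level timestamps from speech recognition tend to clip the start or end of a word. If you would rather cut the segments out than mute them, the same merged intervals could feed a different filter instead.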
On the commercial-detection front, there is this blog post on audio ad-blocking, but it doesn't offer a solution for sponsored messages (other than pointing out that speech recognition is likely the first step).