Unfortunately, the recommended FOSS options like eSpeak are just not there yet: they lack coherency, clarity/fidelity, voice quality, or some mix of those. Downloading the offline voices and recognition bundles and then disabling network access for Speech Services by Google is the way to go. You do not need GSF/GMS at all for it to work.
However, on the STT side, I am having a weird problem on one of my devices: the list of downloadable speech recognition models has disappeared from the app, including the already-downloaded ones, and it has not come back even after reinstalling the app entirely. Maybe a Google outage? No clue how it happened, though, since network access was denied at the time, so there was no obvious trigger for it to stop working. I haven't been able to pin down what happened or reproduce it on another device, so fair warning if you decide to use the offline STT functionality.