Posts

Showing posts from June, 2024

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 2

Okay, following on from the success of Part 1 (okay, it was only about 8 hours ago, but y'know), I ventured into the hardware side of things: getting the software to interact with the hardware. Time to get the screwdrivers out. As mentioned, I thought I was going to use Node-RED. I burnt even more time trying to get the Node-RED GPIO nodes to work. It turns out that there are "issues" with the RPi 5, Python and GPIO access. It took me far too long, going around in circles, to accept this. I'll have a chat with DCJ when he's back from holiday in July. So, what did I do? I went back to the layer underneath. Yep, I did it, people: I dropped into Python. Actually, I noticed that the Node-RED node was just dropping down to Python anyway, so I was only removing the layer that was giving me issues. Here's the Node-RED error I was getting: It's odd, as I can run that command without a problem, and I followed all the instructions for the node about gr
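For anyone wanting to try the same "drop into Python" route, here is a minimal sketch of handling the phone's hook (cradle) switch, not the exact code from this project. It assumes the gpiozero library (shipped with Raspberry Pi OS; as far as I know its lgpio backend works on the Pi 5) and a HYPOTHETICAL wiring: cradle switch between GPIO17 and GND, closed while the handset rests on the cradle. `HookMonitor` is an illustrative helper, not part of any library.

```python
"""Hook-switch handling sketch for a rotary phone on a Raspberry Pi 5."""
import signal
from typing import Callable


class HookMonitor:
    """Turns raw switch readings into 'lifted' / 'replaced' events.

    Only genuine state changes fire a callback, so repeated readings of the
    same state are ignored.
    """

    def __init__(self, on_lift: Callable[[], None], on_replace: Callable[[], None]) -> None:
        self.on_lift = on_lift
        self.on_replace = on_replace
        self.off_hook = False  # assumption: the handset starts on the cradle

    def update(self, off_hook: bool) -> None:
        if off_hook == self.off_hook:
            return  # no change, nothing to do
        self.off_hook = off_hook
        if off_hook:
            self.on_lift()
        else:
            self.on_replace()


def main() -> None:
    # Imported here so the logic above can be tested without Pi hardware.
    from gpiozero import Button

    monitor = HookMonitor(
        on_lift=lambda: print("Handset lifted: start recording the question"),
        on_replace=lambda: print("Handset replaced: stop"),
    )
    # HYPOTHETICAL wiring: switch between GPIO17 and GND, "pressed" while
    # the handset sits on the cradle; bounce_time filters contact chatter.
    cradle = Button(17, pull_up=True, bounce_time=0.05)
    cradle.when_pressed = lambda: monitor.update(off_hook=False)
    cradle.when_released = lambda: monitor.update(off_hook=True)
    signal.pause()  # block and wait for GPIO events until Ctrl+C


if __name__ == "__main__":
    main()
```

Keeping the state logic separate from the GPIO wiring means it can be exercised on a laptop before the screwdrivers come out.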

Shhhh..... Whisper-WebGPU

Original article: https://www.marktechpost.com/2024/06/08/whisper-webgpu-real-time-in-browser-speech-recognition-with-openai-whisper/
What it is referring to: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu
What does it mean? You can now do Speech To Text conversion DIRECTLY in the web browser. Wow. That IS impressive. NO data leaves your web browser. Nice. Go try it at the Hugging Face link above. This opens up a whole world of possibilities...

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 1

Did I say "rotary phone"? Sure did. "What is one of those?" (top left in the photo above). Well, back in the day we had these odd things that we made phone calls from; yep, just phone calls. People used them to call other people, other people used phones to call them, and they had a funky dial to select the numbers and a handset you picked up and put to the side of your head. It was great. Anyway, I had a funky idea: re-purpose one of these devices, hijack the microphone and the speaker in the handset, and let a person speak a question that they want answered. That audio feed goes into a Raspberry Pi 5, which converts the speech to text (using state-of-the-art OpenAI Whisper; yes, OFFLINE!), passes the text into an LLM (powered by the Ollama engine, also running OFFLINE), converts the response back to speech and then triggers the phone to RING! The person picks up the phone and the answer to their question is spoken back to them. Funky, huh? As an implementation pattern it does dem
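The flow described above (speech in, Whisper STT, Ollama LLM, ring, TTS out) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes the `openai-whisper` package, an Ollama server on its default local port, and a model named `llama3` (a HYPOTHETICAL choice). The ringing and text-to-speech stages are HYPOTHETICAL callables you supply, since the excerpt doesn't say how they are done.

```python
"""Offline 'ask the phone' pipeline sketch: STT -> LLM -> ring -> TTS."""
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint.

    The model name "llama3" is an assumption; use whatever you have pulled.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_ollama_request(prompt, model), timeout=300) as resp:
        return json.loads(resp.read())["response"]


def transcribe_with_whisper(wav_path: str) -> str:
    """Speech-to-text with the open-source openai-whisper package (runs offline)."""
    import whisper  # pip install openai-whisper

    # "base" is an assumed size; smaller models suit the Pi's CPU better.
    # In a long-running service, load the model once rather than per call.
    model = whisper.load_model("base")
    return model.transcribe(wav_path)["text"].strip()


def answer_call(
    wav_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    ring_until_answered: Callable[[], None],
    speak: Callable[[str], None],
) -> str:
    """Run one recorded question through the pipeline; each stage is injected."""
    question = transcribe(wav_path)  # speech -> text
    answer = ask_llm(question)  # text -> LLM answer
    ring_until_answered()  # HYPOTHETICAL: ring the bell, wait for pick-up
    speak(answer)  # HYPOTHETICAL TTS out through the handset speaker
    return answer
```

On the Pi you would call something like `answer_call("question.wav", transcribe_with_whisper, ask_ollama, ring, speak)` with your own `ring` and `speak` functions. Injecting the stages keeps each one swappable and lets the flow be tested without a microphone, a bell or a running model.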