Posts

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 3

As shown HERE (Part 1) & HERE (Part 2), I've been fiddling around with an old rotary phone, adding some switches and a servo and hooking it up to a Raspberry Pi 5 in order to do some Speech to Text processing, some local/offline LLM processing and some Text to Speech output. Well, I've been faffing around with this on & off for a short while and I've now finally got the hardware & the software doing what they need to do. It is version 1.0, as in, it is written in Python; however, I do have it all set up so that I can drop down to C and see if that actually makes things faster/more efficient etc. So, what was the plan? I wanted to take an old rotary phone and leave the externals as standard as possible, but make it so that a person can pick up the handset, dial a number, "ask a question", put the handset back down and then have the phone ring when an answer is ready; the person picks the handset back up and the answer is spoken to the pe...
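The pick up / dial / hang up / ring / pick up again flow described above is essentially a small state machine. Below is a minimal sketch of that flow, assuming the hardware layer reports simple events; the state names, event strings and `PhoneController` class are hypothetical illustrations, not the actual code from the project.

```python
from enum import Enum, auto


class PhoneState(Enum):
    """Hypothetical states for the handset flow described in the post."""
    IDLE = auto()        # handset down, nothing pending
    OFF_HOOK = auto()    # handset lifted, waiting for a number to be dialled
    LISTENING = auto()   # number dialled; recording the spoken question
    PROCESSING = auto()  # handset replaced; STT -> LLM -> TTS running
    RINGING = auto()     # answer ready; bell/servo ringing the phone
    SPEAKING = auto()    # handset lifted again; answer being played back


# Allowed transitions: (current state, event) -> next state.
# Event names are assumptions about what the GPIO/audio layer would report.
TRANSITIONS = {
    (PhoneState.IDLE, "picked_up"): PhoneState.OFF_HOOK,
    (PhoneState.OFF_HOOK, "dialled"): PhoneState.LISTENING,
    (PhoneState.OFF_HOOK, "hung_up"): PhoneState.IDLE,
    (PhoneState.LISTENING, "hung_up"): PhoneState.PROCESSING,
    (PhoneState.PROCESSING, "answer_ready"): PhoneState.RINGING,
    (PhoneState.RINGING, "picked_up"): PhoneState.SPEAKING,
    (PhoneState.SPEAKING, "playback_done"): PhoneState.IDLE,
    (PhoneState.SPEAKING, "hung_up"): PhoneState.IDLE,
}


class PhoneController:
    def __init__(self) -> None:
        self.state = PhoneState.IDLE

    def handle(self, event: str) -> PhoneState:
        """Apply an event; events that are not valid in the current state are ignored."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


if __name__ == "__main__":
    phone = PhoneController()
    for event in ["picked_up", "dialled", "hung_up", "answer_ready", "picked_up"]:
        print(event, "->", phone.handle(event).name)
```

Ignoring out-of-place events (e.g. lifting the handset while the answer is still being generated) keeps the logic simple; a real build might instead want to cancel or replay.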

RPi 5 & AI Kit (teaser)

In a good & bad way, I've been somewhat distracted & busy with work/work; when that happens, all my toy-playing experiments go on hold. Whilst I'm far from having time to play again, I'm just dropping this here as a mental reminder for myself & as a teaser that this is sitting on my desk as a "NEXT" item to investigate. You can purchase it HERE. Updates to follow.

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 2

Okay, following on from the success of Part 1 (okay, it was only about 8 hours ago, but y'know), I ventured into the hardware side of things, looking at getting the software to interact with the hardware. Time to get the screwdrivers out. As mentioned, I thought I was going to use Node-Red. I burnt even more time trying to get the Node-Red GPIO nodes to work. Turns out that there are "issues" with the RPi 5, Python and GPIO access. It took me far too long, going around in circles, to accept this. I'll have a chat with DCJ when he's back from holiday in July. So, what did I do? I went back to the layer underneath. Yep, I did it, people, I dropped into using Python. Actually, I noticed that the Node-Red node was just dropping down to Python anyway, so I was just removing the layer that was giving me issues. Here's the Node-Red error I was getting: It's odd, as I can run that command without a problem and I followed all the instructions for the node...
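The excerpt doesn't show the Python GPIO code itself. As a hypothetical sketch of one piece such a script typically needs, here is rotary dial decoding: a pulse dial sends n pulses for digit n and ten pulses for 0, so grouping pulse timestamps by the gap between them recovers the dialled number. The timestamps would be captured from a GPIO edge callback (for example a gpiozero `Button` callback); the pin wiring, the 0.3 s gap threshold and the function names are assumptions, not taken from the post.

```python
from typing import Iterable, List, Optional

# Assumed timing: pulses arrive roughly 100 ms apart (about 10 pulses per second);
# a gap longer than this threshold is treated as the end of a digit. Tune on real hardware.
DIGIT_GAP_S = 0.3


def _count_to_digit(count: int) -> str:
    """Map a pulse count to a digit: n pulses -> n, 10 pulses -> 0."""
    if not 1 <= count <= 10:
        raise ValueError(f"unexpected pulse count: {count}")
    return "0" if count == 10 else str(count)


def pulses_to_digits(pulse_times: Iterable[float], gap: float = DIGIT_GAP_S) -> str:
    """Group pulse timestamps (in seconds) into digits and return the dialled number."""
    digits: List[str] = []
    count = 0
    last: Optional[float] = None
    for t in sorted(pulse_times):
        # A long pause means the previous digit is complete.
        if last is not None and t - last > gap:
            digits.append(_count_to_digit(count))
            count = 0
        count += 1
        last = t
    if count:
        digits.append(_count_to_digit(count))
    return "".join(digits)
```

Keeping the decoding as a pure function over timestamps means it can be tested on a desk without the phone attached, with the GPIO callback only responsible for appending `time.monotonic()` values to a list.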

Shhhh..... Whisper-WebGPU

Original article: https://www.marktechpost.com/2024/06/08/whisper-webgpu-real-time-in-browser-speech-recognition-with-openai-whisper/ What it is referring to: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu What does it mean? You can now do Speech To Text conversion DIRECTLY in the web browser. Wow. That IS impressive. NO data leaves your web browser. Nice. Go try it: This opens up a whole world of possibilities..........

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 1

Did I say "rotary phone"? Sure did. "What is one of those?" (top left in photo above). Well, back in the day we had these odd things that we made phone calls from; yep, just phone calls. People used them to call other people, other people used phones to call them, they had a funky dial to select the numbers and a handset you picked up and put to the side of your head. It was great. Anyway, I had a funky idea to re-purpose one of these devices: hijack the microphone and the speaker of the handset, allow a person to speak a question that they want answered, pass that feed into a Raspberry Pi 5, convert the Speech to Text (using state-of-the-art OpenAI Whisper - yes, OFFLINE!), then pass that into an LLM (powered by the Ollama engine running OFFLINE), then convert the response back to Speech and trigger the phone to basically make it RING! The person picks up the phone and the answer to their question is spoken back to them. Funky huh? As an implementation pattern it does dem...
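The middle step of that pipeline (transcribed question in, LLM answer out) can be sketched against Ollama's local REST API, which accepts a POST to `/api/generate` and, with `"stream": false`, returns a single JSON object whose `response` field holds the generated text. This is a minimal sketch, not the project's actual code: it assumes Ollama is on its default port 11434 and that a model called `llama3` has already been pulled. The question text itself could come from the `openai-whisper` package, e.g. `whisper.load_model("base").transcribe("question.wav")["text"]`, with the model size and file name again being assumptions.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port and a model named
# "llama3" has been pulled; change both to match your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def build_generate_request(prompt: str, model: str = MODEL,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the transcribed question."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, timeout: float = 120.0) -> str:
    """Send the question to the local LLM and return the generated answer text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

Using only the standard library keeps the Pi's dependencies small; the generous timeout is there because generation on a Raspberry Pi 5 can be slow, which is exactly why the "hang up and wait for it to ring" pattern suits it.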

Weaviate Verba RAG with Node-Red & Ollama Engine

Been absent a while, have had many things to be focused on; however, this recent little nugget needed to be documented & shared, mainly because I did this on my personal laptop & I need to recreate it somewhere else, and this mechanism just makes it easier. Also, this might help someone else out too. Right, so what am I talking about? About a year ago I was doing some new stuff with LLMs and RAG (Retrieval-Augmented Generation: ingesting your own documents as the data to use rather than the LLM's training data) and it was okay-ish; it did the job. Zoom forward a year and, obviously, things have moved on quite a bit. The RAG tools & code have improved significantly. It still takes time to ingest, though; I haven't found a way to speed that part up. Well, I'm focused on offline/air-gapped/on-premise solutions; it could probably be faster using a Cloud SaaS offering, but that is of no interest to me, so I'll accept the time it takes. What are the steps involved? Get a bunch of documents, upload them to be...
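To make the RAG idea concrete, here is a deliberately tiny sketch of its three moving parts: chunk the documents (ingest), find the chunks most similar to the question (retrieve), and put them into the prompt (generate). It is not how Verba or Weaviate work internally; they use embedding models and a vector database, whereas this swaps in plain bag-of-words cosine similarity so it runs with the standard library alone. All function names and the chunk size are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 50) -> List[str]:
    """Split a document into chunks of at most `size` words (the 'ingest' step)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = tokenize(question)
    scored: List[Tuple[float, str]] = [(cosine(q, tokenize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Ground the LLM's answer in the retrieved chunks rather than its training data."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

The slow part noted above (ingest) corresponds to `chunk` plus computing a vector per chunk; with real embedding models that per-chunk work is what dominates on offline hardware.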

What Siri should be = Inflection Pi

I predicted a while back, maybe a year ago, that the whole ChatGPT LLM (Large Language Model) "thing" would hit its peak around Aug '23/Sept '23 and then decline towards Dec '23/Jan '24, when people would start looking at non-paid/non-monetised usage of LLMs. I've also been a keen advocate of moving the usage/runtimes OFF of "other people's servers", i.e. what you call "Cloud", because they are incentivised to implement vendor lock-in in subtle ways, such as getting you to use a service that only they offer, or storing your core data in a datastore that you cannot export/lift&shift elsewhere without it costing more than it is worth; therefore, stealth lock-in. I cannot really complain; businesses are in the business of business, therefore they are driven by financial transactions and you, as the customer ( still makes me chuckle that the "IT people" call customers "end users", just like drug dealers r...