Posts

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 2

Okay, following on from the success of Part 1 (okay, it was only about 8 hours ago, but y'know), I ventured into the hardware side of things, looking at getting the software to interact with the hardware. Time to get the screwdrivers out. As mentioned, I thought I was going to use Node-Red. I burnt even more time trying to get the Node-Red GPIO nodes to work. Turns out that there are "issues" with the RPi 5, Python and GPIO access. It took me far too long going around in circles to accept this. I'll have a chat with DCJ when he's back from holiday in July. So, what did I do? I went back to the layer underneath. Yep, I did it, people: I dropped into using Python. Actually, I noticed that the Node-Red node was just dropping down to Python anyway, so I was just removing the layer that was giving me issues. Here's the Node-Red error I was getting: It's odd, as I can run that command without a problem and I followed all the instructions for the node...

Shhhh..... Whisper-WebGPU

Original article: https://www.marktechpost.com/2024/06/08/whisper-webgpu-real-time-in-browser-speech-recognition-with-openai-whisper/ What it is referring to: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu What does it mean: You can now do Speech To Text conversion DIRECTLY in the web-browser. Wow. That IS impressive. NO data leaves your web-browser. Nice. Go try it: This opens up a whole world of possibilities..........

Rotary phone, RPi5, STT & Ollama for an offline quirky assistant with TTS output - part 1

Did I say, "rotary phone?" Sure did. "What is one of those?" (top left in photo above) Well, back in the day we had these odd things that we made phone calls from - yep, just phone calls. People used them to call other people, other people used phones to call them, it had a funky dial to select the numbers and a headset you picked up and put to the side of your head. It was great. Anyway, I had a funky idea to re-purpose one of these devices: hijack the microphone and the speaker of the headset, allow a person to speak a question that they want answered, pass that feed into a Raspberry Pi 5, convert the Speech to Text (using state of the art OpenAI Whisper - yes, OFFLINE!), then pass that into an LLM (powered by Ollama Engine running OFFLINE), then convert the response back to Speech, trigger the phone to basically make it RING! - person picks up the phone and the answer to their question is spoken back to them. Funky huh? As an implementation pattern it does dem...

Weaviate Verba RAG with Node-Red & Ollama Engine

Been absent a while, have had many things to be focused on; however, this recent little nugget needed to be documented & shared, mainly because I did this on my personal laptop & I need to recreate it somewhere else and this mechanism just makes it easier - also, this might help someone else out too. Right, so what am I talking about? About a year ago I was doing some new stuff with LLMs and RAG (ingesting your own documents as the data to use rather than the LLM training data) and it was okay-ish, it did the job. Zoom forward a year and obviously things have moved on, quite a bit. The RAG tools & code have improved significantly; it still takes time to ingest though - haven't found a way to speed that part up. Well, I'm focused on offline/airgapped/on-premise solutions; it could probably be faster if using a Cloud SaaS offering, but that is of no interest to me, so I'll accept the time it takes. What are the steps involved? Get a bunch of documents, upload them to be...

What Siri should be = Inflection Pi

I predicted a while back, maybe a year ago, that the whole ChatGPT LLM (Large Language Model) "thing" would hit its peak around Aug '23/Sept '23 and then decline towards Dec '23/Jan '24, where people will start looking at non-pay / non-monetised usage of LLMs. I've also been a keen advocate of moving the usage / runtimes OFF of "other people's servers", i.e. what you call "Cloud", because they are incentivised to implement vendor lock-in in subtle ways such as getting you to use a service that only they offer, or store your core data in a datastore that you cannot export / lift&shift elsewhere without it costing more than it is worth, therefore stealth lock-in. I cannot really complain, businesses are in the business of business, therefore, they are driven by financial transactions and you, as the customer (still makes me chuckle that the "IT people" call customers "end users", just like drug dealers r...

Snow, a little rant about society/AI and using a GPS, BDS, GLONASS, GALILEO and QZSS positioning device

Well, the sun is out, the frost is melting, it actually snowed yesterday, it didn't snow anywhere else in the UK (that I'm aware of), but it did snow a LOT in California (and still is). Here's a photo of my garden from yesterday morning, I was very surprised as it meant I couldn't really continue with the landscape waterfall remodelling that I'd started the previous day: With such loveliness that Spring is coming and Winter is just making an attempt to show it is still a force to be reckoned with, I was pondering on how I spend my last couple of days of "freedom", i.e. the as-before-mentioned-time-off. I'll document below the tech. toys that I decided to set up & get working ready for next month's "time off" where I can pull a lot of this technology together to actually start to "make something". Oh and then I noticed this headline in my fav. IT news website: IBM announces European layoffs and for some uncontrollable reason,...