In other news.... the Genie is out

Google has literally let the genie out of the bottle:

https://sites.google.com/view/genie-2024

"We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. 

Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future."

of course there is a PAPER on it HERE.... https://arxiv.org/abs/2402.15391


That's lovely, but "what can it do for me?". I'm glad you asked.

Well... from what I can tell, it can help to make a quick PoC (or PoS, depending on how well the Engine works) 2D platform game for you, without you having to do all the hard stuff.... y'know, like writing code, making pixel art, working out how to move sprites, collision detection, creating puzzles, defining original gameplay ideas, devising your own scoring mechanism....

oh, I started to get all negative there.  I'll come back to that later, for now, let's keep an open/empty/blank mind about this "progress".

That still sounds like, "computer do the thinking for me".... but y'know.... that is kinda what computers were invented for, I suppose.  Also, it does kind of read like this is not what it is actually going to be used for, i.e. this "pattern" can be applied to 2D platform games, but the learning & application techniques can & most probably will be applied in a totally different area.

I did read earlier that there is a growing trend for the general public to stop using Google to search for the answers to things.  Over the past 10-20 years, if you wanted to know the answer to something, you would open up a web browser on a desktop PC, a laptop, a tablet or a phone and you would type or say (via STT) the question.  Originally you would get a list of websites that might have content that could help you to find that answer; more recently, you may actually get the answer itself, with links to the source websites..... and we'd be happy with that.

Google would be happy with that too, as they have charged advertising companies to put ads in the web browser or they've charged companies to appear nearer the top of the search results, I mean, who nowadays goes to page 2?  I'm saying that like I know.  I've been using DuckDuckGo for YEARS now, so there is no page 2 or ooooooooo to click through, you just keep pressing [show more results] and the list grows longer down the screen.

People now, apparently, are going to start just using xxxGPT tools to ask those questions and they probably won't fact check the answers either.  Nope, can't see that this is going to cause any issues at all.  nope.  all will be fine.  I can see the Gen. Z-ers sneering at me from behind their phone screens, whilst I see the reflection in their pupils of the latest TikTok fad wiping away the last elements of their personality or sense of original thought....and I ponder, "what did we create?" (double meaning there, in case you missed it)





Old-man rant time:

Sigh. okay, well, like all things xxxGPT or LLM focused, these models rely on EXISTING things for them to use as pattern matching starters, ie. they take "previous" things, like images, videos, screenshots, games, etc... and then use those as baselines to then come up with variants based on them... and that's the problem.  I know, I know... at some point the mythical AGI will pop into existence (and you think we'll still be alive at that point? lol) and the code / tooling will "create" something fresh & new, however, that is NOT how it works today, you are taking people's previous work / achievements / creativity / art & then just making a newer version of it.  

Okay, I hear the argument that that is pretty much what humans do, we learn by looking at an existing "thing", we then aim to replicate it, we then look to improve it and then we evolve it and then pat ourselves on the back for what we have achieved - "so what is the difference here?".  Well, there is no "create", there is only a vast number of parameters being put into the equation, i.e. a human can take the experience of 1000 things over a 50-year lifespan and can then apply that experience; however a "model" can take 1,000,000 things over a 5-month lifespan and come up with options that we would not have reached in 100 lifetimes (or more), so we "think" that it is smarter than it is.  It is not, it just has more variant parameters.  Once you realise that this is not "intelligence" or "smart", you can start to understand that whilst this is novel & has a level of amusement, it does actually have a darker side for humanity.

What happens if you continue with something like Genie?  You keep it rolling forward another 10 years.  It will still have a baseline of the 1980s / 90s / 00s (good games stopped there) but it will then stagnate - there is not any more "new" material for it to learn / work from.  Yes, it can come up with a few new variant branches, but they will still be similar.  You will end up with a very bland and boring outcome.

"Calm down Tony, it's only a 2D game generator!", well, yes & no.  I'm no doom-monger and I like the "idea" of this technology, however, "humans".  That is the problem, right there.  Humans have become lazy in society.  They have.  Oh sorry, we now have technology to help us with our busy lives (well, there is an argument that the technology has made the lives busy for no value, but that's another rant for another time), however, we are offloading THINKING.  We are offloading CREATIVITY.  We are offloading HUMAN.  As a society, we're sleep-walking into an abyss of darkness where the doors will close behind us, we'll look back and about 1% will then realise, "oh f**k", but by then it'll be too late.

My take is - this is great and all that, for what it is... but it will never replace that 4-square centimetres of squishy stuff inside your skull - that evolved and grew for a reason, that reason was not for it to be shut down and run at a minimum zombie state.  If you don't use it, you will lose it.

Enjoy playing your computer generated 2D platform game, mouth open wide, dribbling, eyes scanning the next "gap" to jump over, thumbs pressing the controller every 4 secs (muscle stimulation to go with brain stimulation to keep you addicted) OR pick up a pencil (or pen), take a piece of paper and draw a design.  Draw out a series of screens for your character to navigate their way around, work out some puzzles for the game player to solve, be as nice or as nasty as you like, revel in that for a bit, sketch out some pixel characters, you could even use coloured pencils, eat some food, drink some sugary soda drink.  Then, well, although I do dislike Python, I'm going to recommend getting your hands on a copy of THIS BOOK - then, code and build your game, yourself.





Click HERE to buy your copy.


OKAY, OKAY, OKAY, so this is actually the step where GENIE can be used, ie. take your sketches / images / photos and then turn those into part of the game level / design, so yes, I can see that it could be a useful way of going from "conceptual artwork" to "something real" without y'know, having to do that "hard bit" in the middle.


Regardless of whether you use Genie or code something yourself, it is actually satisfying to do, the sense of achievement, the sense of fulfilment, the sense that YOU did a thing (not a tool).  Okay, it might end up being a bit cr@p, but hey, you had a go - you MADE A THING.  You didn't just lazily type a few words into a text box and something else spewed out some generic thing that will keep you interested for about 3 minutes until you go and scan (un)social media for your next dopamine hit.  I'll go get back in my garage and start prepping for when the world turns to cr@p IRL and "no", I won't be sharing any of my stuff to keep you alive, how selfish of me.  I'll be playing a RL game, called "how will you all live with no electricity for >90 days".....


Right, I guess it's time for me to go waste 10 minutes of my life having a quick game of Manic Miner:


..and yes, I filmed & uploaded that to YouTube in March 2007.... 17 years ago..... yep, before some Gen Z-ers were even out of nappies....

Admittedly, the game itself came out of the brain of 1 person, way back in 1983 and I first played it at xmas 1984 on the Amstrad CPC 464 (as per the video above) and I loved the simple but frustrating challenges that I needed to overcome.... 


Whether you choose to use Genie or whether you wish to learn to make it yourself - go and CREATE, go and DO, you are living in a fortunate and blessed time, that will probably never exist again, so go and enjoy life for a bit - make something, read something, go do some gardening, help make a pond, help make a waterfall, help grow some plants / herbs / potatoes.... go and exist, with or without technology.


UPDATE:

okay, so after doing something very different & random - basically taking apart a patio door lock mechanism & figuring out how to replace the 20-year-old central mechanism, I started to think a bit more about the "Death of Google"....or the usage of alternatives.....

Google has been in place since the late 1990s, so they've had a good stranglehold on their domain for long enough.  They are Billion / Trillion-aires?  Big Tech, indeed.  They are in a position to continue to dominate, just due to the financial clout they have, so it's unlikely they will be going anywhere or get replaced by an upstart.... "why do I say that, Tony?"

Well, that's what I was thinking as I was deconstructing the door mechanism....

Google is built on advertising, the scraping and indexing of websites is a bonus, a draw, a feed to get you, the user, to use their services.  It's actually the other way around from how it looks (same model for Meta / facebook / etc..etc...), the business is the business of selling things to people, pushing the adverts under the noses of people who just might click on the links and spend money.  That is BIG business.  It is not genius.  It's actually very simple.  However, all the wrappers around that are what make it complex - just the sheer size of the data centres required would bankrupt most companies attempting to replicate such a service, managing, maintaining, etc...etc.. staying relevant in today's society is a full time job.

So, advertising.  That is the ££££ driver.

"What has this got to do with Large Language Models?"....well, if you recall my previous rant about society going down the road of becoming reliant upon LLMs for handling Q&A, where does that leave all that potential advertising revenue?....well, the same place as it was before.  ATM xxxGPT and other open-source models are not ingrained with advertising wrappers, you ask a question, you get a response, end of.  However, we're still in the honeymoon period.  This simplicity will end.  Then the revenue generators will want to get in on the act.  Big Tech joins the party once more.

I was also pondering where we might be headed, as Apple have made noises that they'll most likely be bundling "AI" (think offline Siri LLM on the iPhone) onto their next models this year & Dell are looking at forcing you to buy the latest new laptops that will have "AI" chips onboard & Micro$oft will spread "AI" throughout Windows 11/12 like some form of nervous system virus, so everything will have an LLM in it someplace... but they'll all be "generic", ie. capable of answering pub quiz style questions and general knowledge...

...and that is where I predict the next wave of money makers are going to come surfing over the waves.... "knowledge specific LLMs", KiSS-LLMs, if you will.

The purpose of these models will be to be very knowledgeable & specific about a focused subject, they will have knowledge of generic grammar, but they will only be trained on a specific topic.  For instance, instead of attempting to be a Doctor, having to know about everything a human Doctor might know, there will be a model specifically for Ears, Eyes, Nose, etc... or an LLM that has in-depth specific knowledge about Car Mechanics, but more granular than that, specific knowledge about gearboxes, fuel injection, ECUs, etc...

You can then have a master-LLM that can offload your question to these specific specialist KiSS-LLMs, yes, a bit like a multi-agent LLM (I've even written a few of these myself, last year), BUT, you're missing the point, I mean SPECIALIST, super-knowledgeable, very narrow focused, doesn't care about anything else, is obsessive just about their specialist subject matter.
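
For what it's worth, the master-LLM hand-off can be sketched in a few lines of Python.  This is a minimal sketch only, assuming plain keyword matching stands in for whatever the master model would really do to pick a specialist.  The `Specialist` class, the `route` function, the topic names and the keyword sets are all hypothetical, and the "models" are stubs that just echo the question back.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Specialist:
    """One narrow KiSS-LLM: a name, trigger keywords and an answer function."""
    name: str
    keywords: Set[str]
    answer: Callable[[str], str]


def make_stub(name: str) -> Callable[[str], str]:
    # Assumption: a real specialist would call a model here; this only echoes.
    def answer(question: str) -> str:
        return f"[{name}] would answer: {question}"
    return answer


# Hypothetical specialists, loosely based on the car mechanics example.
SPECIALISTS: List[Specialist] = [
    Specialist("gearbox", {"gearbox", "clutch", "gear"}, make_stub("gearbox")),
    Specialist("fuel-injection", {"fuel", "injector", "injection"}, make_stub("fuel-injection")),
    Specialist("ecu", {"ecu", "firmware", "sensor"}, make_stub("ecu")),
]


def route(question: str, specialists: List[Specialist] = SPECIALISTS) -> str:
    """Send the question to the specialist sharing the most keywords with it."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    best = max(specialists, key=lambda s: len(s.keywords & words))
    if not (best.keywords & words):
        # Nothing matched, so fall back to the "generic" pub quiz model.
        return f"[generic] no specialist matched: {question}"
    return best.answer(question)


if __name__ == "__main__":
    print(route("Why does my gearbox grind in second gear?"))
    print(route("Who won the pub quiz?"))
```

The only design choice worth noting is that the router knows nothing about the subject matter itself, it only decides who to ask.  All the obsessive knowledge stays inside the specialist.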

You can still wrap advertising around this - in fact, the algorithm writes itself here, if you are wanting to know specific information about a certain element, then it is implied that associated elements will also be of interest to you, therefore showing adverts or prompts/questions/phrases to ask "other" KiSS-LLMs questions can be proposed to you.  kerching £$£$£$£$£$.  sigh, it's all sadly about the money, be nice if it wasn't.
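
To show how little "algorithm" is needed for that, here is a rough Python sketch of the "associated elements" idea.  It assumes a hand-written lookup table of related topics; the `RELATED` table, the topic names and both function names are made up for illustration, and a real system would presumably learn these associations rather than hard-code them.

```python
from typing import Dict, List

# Hypothetical table: which other specialist topics sit "next to" each topic.
RELATED: Dict[str, List[str]] = {
    "gearbox": ["clutch", "driveshaft"],
    "fuel-injection": ["ecu", "fuel-pump"],
    "ecu": ["fuel-injection", "sensors"],
}


def suggest_related(topic: str, limit: int = 2) -> List[str]:
    """Return up to `limit` neighbouring topics, or nothing for unknown topics."""
    return RELATED.get(topic, [])[:limit]


def follow_up_prompts(topic: str) -> List[str]:
    """Turn the neighbouring topics into prompts (or adverts) to show the user."""
    return [f"Ask the {name} KiSS-LLM about this too?" for name in suggest_related(topic)]


if __name__ == "__main__":
    for prompt in follow_up_prompts("gearbox"):
        print(prompt)
```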

anyway, turns out Screwfix didn't have the part I needed for the door - even though their website said they did have it in store (I guess technology isn't that great!), so I'll have to wait until tomorrow to put the door back together.

This thinking did remind me of the following though - instead of downloading the knowledge, we'll just have access to a specific KiSS-LLM model that can assist us in a specific task.

We all know the infamous Matrix quote, "I know Kung Fu", as Neo has the specific knowledge about just Kung Fu downloaded (made accessible) to him:

"I know Kung Fu"



