3D Printed robot head using OpenCV

Okay, so this post needs a PAGE, not just a single post as it's going to be rather long (I suspect).

As this is a personal project, I'll be posting the information about it all here in this section and will link to it from the main POST page.  It'll have photos, videos & C++ code (wow! yes, this is a Python-free zone).



Background

Many moons ago, I purchased a 3D printer.  I printed an InMoov robot head.

It was printed in PLA, not the white stretchy/flexible stuff (ABS), so it's rather brittle and is more a piece of artwork than anything else.  I have had a love/hate relationship with 3D printers.  I have two.  The 2nd one was rather expensive and was meant to help me by having a self-levelling bed...hmmm....yeah, about that...anyway, I've not used either of them for about a year, but I will get back to them at some point.

Anyway, I did get as far as mounting 4 servos in the head: 1 for left/right movement, 1 for the jaw line to emulate speaking, and 2 mini-servos to move the actual eye-balls.  The eye-balls are actually just hacked-apart, re-purposed Microsoft web-cams!  They are just the right size and have decent image quality.  I did actually mount, align & fit them into place and there they have sat ever since.  Why?  Well, I once attempted to use an old laptop of mine to connect ROS (Robot Operating System v1.0) to them in order to do what I wanted to do...and well, the performance absolutely sucked.  It was also overly complex to set up and code for what I wanted to achieve.  I knew what I wanted to achieve, but the timing was wrong - that was my excuse.

My goal was to be able to "see" via the web-cams, detect a person, determine where that person was and then move the robot head to keep it aligned with them (but stop at a certain degree, otherwise we'd end up with the scene from The Exorcist).  I also wanted to use a little bit of smartness to detect "who" the person was, i.e. myself or my wife, and then hook it up to a chat-bot style interface to greet the person differently depending on who it was....I was going to cheat to start off with and just use the Linux espeak command, as that'd do the job very quickly without having to use any cloud services etc...
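For reference, espeak really is as simple as it sounds - a one-liner from the shell (the greeting text here is obviously just an example):

$ espeak "Hello Tony, good morning"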

You get the vision.




Devices

Obvious choice.  Raspberry Pi.  Hmmm...actually, an Arduino UNO was the first choice.  It's dumb, it's simple, it's coded to do one thing...it would be great for just controlling the servos or providing a reaction to an event.  Unfortunately, because I get distracted by shiny things (a lot), I had already used my Arduino connected to a Gravity: HuskyLens - an easy-to-use AI vision sensor (which is awesome btw!).  Yes, I should perhaps look at some method of using that device, as it delivers pretty much the output concept I was looking to achieve, but I wanted to process the streams of 2 web-cams (at the same time) and perform object detection/recognition to detect a face and then trigger some event.  As I'll explain shortly, I thought I'd test this out on a couple of different laptops / Linux variants before porting over to a device.


This did however lead me to finding out about a nifty little board called a Rock Pi X - it looks like a Pi, smells like a Pi, is the same size as a Pi.....except it is based on an x86 CPU... What does that mean?  Well, a Pi is ARM-based, and if this is x86-based, I can just copy an executable built on my laptop and it'll run on the device.  Technically, I wouldn't actually need to install anything else (although, as I found out, yes I did).  I swiftly set about installing Ubuntu 20.04 onto it and was up and running pretty quickly.
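If you want to check what architecture you're actually dealing with, a quick way is:

$ uname -m

On the Rock Pi X that reports x86_64, whereas a Pi will report something ARM-flavoured like armv7l or aarch64.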

I performed the OpenCV installation as described below - oh, please don't waste your time with ANY other information on the good old internet, especially when it comes to bundling Python and OpenCV together - f*ck me, there is a sh*t load of b*ll*x out there, either duplicated across websites or YouTube videos, that spouts a complete and utter load of sh*te about setting up OpenCV....ignore it all, it'll just waste your time.  As I have less time now than I did previously, I have to consider how I'm going to spend it and, trust me, I want a refund.  Funny thing is.... I should have just RTFM :-D  lesson learnt.

I then "found" that I had a couple of Raspberry Pi 3 devices knocking around, one from 2015 and one from 2017.  I also found that I had a nice 7" touchscreen and a case.  About time I introduced the two to each other.  I install Raspbian (Debian) variant onto the Raspberry Pi 3+ and repeated the same exercise.

Interestingly, the Raspberry Pi gave me the same grief as the Rock Pi X and the Deepin Linux laptop with regard to libgtk2.0-dev: when it came to actually rendering a window to display the web-cam frame, an exception would get thrown and it would not continue.  Very annoying, until I found what I believe was the missing library, and then all worked just fine.



Software

I knew what I wanted to achieve.  I wanted to read the video streams from the web-cams, interpret/analyse the content and do some stuff based on that.  I foolishly started writing some code in C that was very low-level....and then I was prompted by Facebook, of all things, with an image of myself in 2012 in Riyadh, Saudi Arabia...where I was doing the exact same thing via an IBM Lenovo laptop.

The software I was using was OpenCV.  Of course, with over 20 years of experience and know-how going into the codebase, why shouldn't I just use that as the basis?  Okay, I could live with the bloat-ware - all that extra stuff that I didn't really need but might choose to use in the future... So, fine, I thought I'd give it a go.

As you would expect, I have a "few" laptops available for my personal usage.  I have a MacBook Pro for work/work and I try my best to keep that reserved for work/work.  I do have numerous VMware images that I've created to try things out in Linux, knowing that if I balls it up, an easy "fix" is to just start up a new VMware image.  On this occasion though, I thought that seeing as I have 2 "spare" web-cams (not the robot head ones, another 2), why don't I just plug them into a laptop and get it going?  And thus began the "fun".... :-D

Installation of OpenCV.  Okay, this is going to be for "v4.5.1-pre" (the latest at the time of writing).

$ sudo apt update && sudo apt install -y cmake g++ wget unzip

$ sudo apt-get install build-essential pkg-config

$ sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev

The Raspberry Pi didn't seem to like libjasper-dev or libpng12-dev, so I just left them off the install there.

$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev

$ sudo apt-get install libxvidcore-dev libx264-dev

$ sudo apt-get install libgtk2.0-dev libgtk-3-dev

$ sudo apt-get install libatlas-base-dev gfortran

$ sudo apt-get install libcanberra-gtk*

I'm not 100% sure, but I believe that installing the above library fixed the misleading error that kept telling me to re-install libgtk2.0-dev and pkg-config (I concluded that this error was a red-herring).

$ sudo apt-get install python3-dev

$ sudo apt-get install python3-pip

$ sudo pip3 install numpy scipy

Again, for some reason the Raspberry Pi had issues with this last command - I Ctrl+C'd it and it doesn't seem to have caused any issues for me so far.

Okay, so that was all the "prep" work to get everything lined up before you start on the actual OpenCV installation.

Alas, there is no end-result binary to use (I'm ignoring Windows, pretty much like I have done for the past 10 years), so you're going to have to download and build the code for the machine you're running it on.  Now, this isn't as scary as it used to be back in 2001.  If you've done all the steps above and they all went okay, then all you're going to lose now is a bit of time - it'll take anything from 50 minutes to 2 hours to complete the following.

$ cd ~/Downloads

$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.5.1.zip

$ unzip opencv.zip

This expands out into an opencv-4.5.1 sub-folder. Change the folder name to opencv, as shown below.
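In other words (assuming you're still in ~/Downloads):

$ mv opencv-4.5.1 opencv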

$ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.5.1.zip

$ unzip opencv_contrib.zip

This expands out into an opencv_contrib-4.5.1 sub-folder. Change the folder name to opencv_contrib, as shown below.
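Again, assuming you're still in ~/Downloads:

$ mv opencv_contrib-4.5.1 opencv_contrib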

$ cd opencv (this is where you need to create the build folder)

$ mkdir -p build && cd build

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_C_EXAMPLES=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D ENABLE_NEON=ON \
    -D OPENCV_EXTRA_EXE_LINKER_FLAGS=-latomic \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/Downloads/opencv_contrib/modules \
    -D BUILD_EXAMPLES=OFF ..

$ make -j4  (this will build the code using all 4 CPU cores - the Pi will get hot!)

Now, this is the bit that will take some time and your device WILL get hot - if you leave the System Monitor / nmon running, you'll see that for yourself.  Make sure you don't just have your device resting on a wooden-topped desk - it will burn it and possibly set fire to something!  As mentioned above, this can take 1-2 hours depending on the device.  One thing I did before this was to disable any form of screen power-off, as it's really annoying: if you just want to check the progress, you'll be at the mercy of the interrupt cycles before you can log in and the screen refreshes to show you the status.  Minor niggle, but, as I say, I did this about 5 times.  I did have a couple of scenarios where the build got to about 72% (and another at 98%) and then just locked up, no recovery.  I had to unplug the device and start again.  It turns out that was more than likely due to the micro SD card being used - I changed them and all was good.

Once you've gone and made dinner, eaten dinner, done the washing up and watched an episode of Z-Nation, it'll be time for you to execute the following command.  It is quick, but often forgotten about!

$ sudo make install

$ sudo ldconfig   (this is done so that the system can find the new OpenCV libraries we've just installed)

And we're all done!  We now have an OpenCV installation on the device all ready and waiting to be used.
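If you want a quick sanity check that the install (and the pkg-config file we told cmake to generate) is usable from C++, a throwaway version-printing program does the trick - name the file whatever you like:

#include <opencv2/core.hpp>
#include <iostream>

int main()
{
    // CV_VERSION is a compile-time string baked into the OpenCV headers
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;
    return 0;
}

$ g++ versioncheck.cpp -o versioncheck $(pkg-config --cflags --libs opencv4)
$ ./versioncheck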

As I later needed to write some Python that used items from the contrib libraries, I've added the steps above, but there is one last command you need to run in order for that code to work:

$ python -m pip install opencv-contrib-python

This downloads and installs the contrib files for python.

One little change I had to make for the Raspberry Pi 3+ and that 7" touchscreen was to get it to flip the right way up!  For some odd reason it was upside down - a quick DuckDuckGo search and a solution was found.
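I believe the fix (the usual one for the official 7" touchscreen, at least) was to add the following line to /boot/config.txt and reboot:

lcd_rotate=2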



Non-scientific benchmarking test

The Rock Pi X surprised me.  I assumed it would be more capable than it turned out to be.

The Raspberry Pi 3B+ genuinely surprised me - it just 'felt' quicker, and it was.

I'll update this with the Raspberry Pi 4 once it arrives.



As I stated, I initially performed this test on the Rock Pi X, which, after some further research, turns out to have something like a 4-year-old CPU inside it.  Now, I didn't think that would have much of an impact; I was wrong.  It's a shame, but hey-ho.  Note the CPU usage - it is outputting the 2 web-cam streams here to rather large windows.


I then performed the same thing with the Raspberry Pi 3+.


I noticed straight away that the 'detection time' values were significantly smaller for the Raspberry Pi 3+.  So I then set about changing my code to display smaller windows - maybe the vast size of the windows was causing the lag?.... I reduced them to 100x100 pixels and placed them side-by-side (you can see the resizeWindow calls in the sample code at the bottom of this page).

The Rock Pi X, even with the reduced window sizing, still had a very high CPU usage, and those 'detection times'...hmmmm.....

Reducing the windows did have a little effect, but the Rock Pi X still averaged 95% CPU usage across all 4 cores, with an average 'detection time' of 750ms.

The Raspberry Pi 3+ had an average CPU usage of 84% across all 4 cores (not a major difference), but the 'detection time' averaged 300ms.  Now, that may or may not have been due to the size of the monitor (I doubt it) and I'll test it again plugged into the same monitor (as I type that, I am arguing with my brain about how illogical that statement is and how it should never have been written).


All I can conclude, based on the data before me, is that the Rock Pi X is roughly two and a half times slower to grab the web-cam image, run it against the facial detection model and render the output to the window.

Video evidence

As you can see this is the Raspberry Pi 3+ performing the task that is required of it:

As you can see this is the Rock Pi X performing the task that is required of it:



Conclusion

Well, it's a little inconclusive at the moment, as I want to repeat this exercise on a Raspberry Pi 4B and see if that just blows away all the results from above.  If it does (and I kinda think it might), then that's the device I'll go with moving forward.  I appreciate that in real-life usage I won't actually be outputting the web-cam frames to a window on a display, so that piece of code might be less abusive on the CPU of the device, but I'll test that again in the future for all 3 devices.

I also have a servo controller board for a Raspberry Pi on its way to me from ThePiHut, so when that arrives, I'll extend the simple C++ code to invoke the servos and complete the actual task that I set out to do.  There's a sketch of the sort of mapping I have in mind below.
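A minimal sketch of that mapping - note this is just me thinking out loud: the function name, frame width and angle limits are all made up at this point (requires C++17 for std::clamp):

#include <algorithm>
#include <iostream>

// Map the x position of a detected face centre to a pan-servo angle.
// The angle range is deliberately clamped (60-120 degrees here) to avoid
// the Exorcist scenario mentioned earlier.
double faceXToServoAngle(int faceCentreX, int frameWidth = 640,
                         double minAngle = 60.0, double maxAngle = 120.0)
{
    // normalise the face position to 0.0 .. 1.0 across the frame
    double norm = static_cast<double>(faceCentreX) / frameWidth;
    // linearly interpolate into the allowed servo range and clamp
    return std::clamp(minAngle + norm * (maxAngle - minAngle), minAngle, maxAngle);
}

int main()
{
    // e.g. a face centred at x=480 in a 640px-wide frame -> 105 degrees
    std::cout << faceXToServoAngle(480) << std::endl;
    return 0;
}

The face centre is already being calculated in detectAndDraw below, so it's just a case of feeding it in and sending the resulting angle to the servo board once it arrives.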

I'm hoping that the Raspberry Pi 4B will have a lot more grunt, so that it can do the web-cam visual checking as well as the "extra" pieces that I want to do, such as hooking it up to the WiFi and to Cloud-based chat-bot services, so that I can do a lot more useful things.  I'll also fit the speaker into the head (the casing is printed) - that's simple to attach to the Raspberry Pi.

The next step is also to figure out how to manipulate the GPIOs from C++ code; luckily, there's an old way of doing it that might get me going.
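As a taster, here's a minimal sketch using wiringPi - which is my guess at the "old way" in question, so treat it as an assumption rather than gospel - that just toggles GPIO 18 on and off:

#include <wiringPi.h>

int main()
{
    wiringPiSetupGpio();        // use the Broadcom GPIO pin numbering
    pinMode(18, OUTPUT);        // set GPIO 18 as an output

    for (int i = 0; i < 5; i++)
    {
        digitalWrite(18, HIGH); // pin high
        delay(500);             // wait 500ms
        digitalWrite(18, LOW);  // pin low
        delay(500);
    }
    return 0;
}

Built with something like: g++ blink.cpp -o blink -lwiringPi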


Just for kicks, I think I'll have this robot head as my interface for Project 'O'......



CLICK HERE TO SEE THE RASPBERRY PI 4 UPDATE - IT SURPRISED ME



Sample C++ code used

#include "opencv2/objdetect.hpp"
#include "opencv2/highgui.hpp"

#include "opencv2/imgproc.hpp"
#include "opencv2/videoio.hpp"

#include <iostream>
using namespace std;
using namespace cv;

void detectAndDraw( Mat& img, CascadeClassifier& cascade );
//we just want to detect a face, no need for anything more at the moment
string cascadeName = "/home/tony/Downloads/opencv-master/data/haarcascades/haarcascade_frontalface_alt.xml";

int main( int argc, const char** argv )
{
    VideoCapture capture0, capture2;
    Mat frame0, image0, frame2, image2;
    CascadeClassifier cascade;

    if (!cascade.load(samples::findFile(cascadeName)))
    {
        cerr << "ERROR: Could not load classifier cascade" << endl;
        return -1;
    }
    int camera0 = 0; //used for left eye
    int camera2 = 2; //used for right eye
    if(!capture0.open(camera0))
    {
        cout << "Capture from camera #" <<  camera0 << " didn't work" << endl;
        return 1;
    }
    if(!capture2.open(camera2))
    {
        cout << "Capture from camera #" <<  camera2 << " didn't work" << endl;
        return 1;
    }
    if( capture0.isOpened() && capture2.isOpened() )
    {
        cout << "Video capturing has been started ..." << endl;
        for(;;)
        {
            // grab the latest frame from each "eye"
            capture0 >> frame0;
            capture2 >> frame2;
            if( frame0.empty() )
                break;
            if( frame2.empty() )
                break;

            Mat frame1 = frame0.clone();
            detectAndDraw( frame1, cascade );
            namedWindow("left", 0);
            resizeWindow("left", 100,100);
            moveWindow("left", 10,10);
            imshow("left", frame1);

            Mat frame3 = frame2.clone();
            detectAndDraw( frame3, cascade );
            namedWindow("right", 0);
            resizeWindow("right", 100,100);
            moveWindow("right", 400, 10);
            imshow("right", frame3);

            // ESC or q/Q quits
            char c = (char)waitKey(10);
            if( c == 27 || c == 'q' || c == 'Q' )
                break;
        }
    }
    return 0;
}

void detectAndDraw( Mat& img, CascadeClassifier& cascade )
{
    double t = 0;
    double scale = 1;
    vector<Rect> faces;
    const static Scalar colors[] =
    {
        Scalar(255,0,0),
        Scalar(255,128,0),
        Scalar(255,255,0),
        Scalar(0,255,0),
        Scalar(0,128,255),
        Scalar(0,255,255),
        Scalar(0,0,255),
        Scalar(255,0,255)
    };
    Mat gray, smallImg;
    cvtColor( img, gray, COLOR_BGR2GRAY );
    double fx = 1 / scale; // scale is 1, so this resize is effectively a pass-through (kept for easy tweaking)
    resize( gray, smallImg, Size(), fx, fx, INTER_LINEAR_EXACT );
    equalizeHist( smallImg, smallImg );
    t = (double)getTickCount();
    cascade.detectMultiScale( smallImg, faces,
        1.1, 2, 0
        //|CASCADE_FIND_BIGGEST_OBJECT
        //|CASCADE_DO_ROUGH_SEARCH
        |CASCADE_SCALE_IMAGE,
        Size(30, 30) );
    t = (double)getTickCount() - t;
    printf( "detection time = %g ms\n", t*1000/getTickFrequency());

    for ( size_t i = 0; i < faces.size(); i++ )
    {
        Rect r = faces[i];
        Point center;
        Scalar color = colors[i%8];
        int radius;
        double aspect_ratio = (double)r.width/r.height;
        // roughly square detections get a circle, anything else a rectangle
        if( 0.75 < aspect_ratio && aspect_ratio < 1.3 )
        {
            center.x = cvRound((r.x + r.width*0.5)*scale);
            center.y = cvRound((r.y + r.height*0.5)*scale);
            radius = cvRound((r.width + r.height)*0.25*scale);
            circle( img, center, radius, color, 3, 8, 0 );
        }
        else
            rectangle( img, Point(cvRound(r.x*scale), cvRound(r.y*scale)),
                       Point(cvRound((r.x + r.width-1)*scale), cvRound((r.y + r.height-1)*scale)),
                       color, 3, 8, 0);
    }
}
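
For completeness, I build the above with something along these lines (the source file name is mine, call it whatever you like):

$ g++ robothead.cpp -o robothead $(pkg-config --cflags --libs opencv4)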
