  • Type: Study
  • Course: Interactive Media
  • Semester: 7th
  • Supervisor: Prof. Erich Schöls

Herbert is an attempt to make a man-machine interface as human and empathetic as possible. At the same time, Herbert points to the idea that there will be machines more intelligent than mankind rather soon.

Herbert is controlled by speech input, for example:

  • "Who are you?"
  • "What is your name?"
  • "Do you want to marry me?"
  • "What is your favourite dish?"
  • "Do we know each other?"

At the same time it has some advanced functions, like:

  • Herbert asks you how you feel today and reacts according to your answer
  • At the beginning of the dialogue it identifies your gender and approximate age
  • You say "I want to be you" and Herbert activates a mirror mode, where all your facial expressions are mirrored on his face
  • If you ask for a secret, you have to tell him one of yours, before he tells you one of an earlier visitor


To make Herbert as human-like as possible, there was no point in animating all the facial expressions I wanted to use by hand: it takes a lot of time and would probably still look clearly artificial. Because the human brain is very sensitive to facial expressions, a convincing animation is all the harder to create. So recording was the way to go:

I stumbled upon FaceShift, a markerless motion capture software for facial expressions, which basically transfers all of my facial expressions to a 3D representation of a head. It works pretty well, and the only hardware I needed was a cheap Microsoft Kinect. As a result, Herbert is capable of showing a wide range of human expressions:

Technological singularity

Herbert is oversized, whereas the visitor makes himself smaller by sitting down: a hint at how superior machines will be in the near future.

Many experts say basically the same thing: it is just a matter of time until we create an artificial intelligence that becomes infinitely more intelligent than all of humanity. And it won't take 200 years to reach this point; it could be a matter of 30-60 years (if you are interested in this topic, I highly recommend reading this article about superintelligent AIs, and read part 2 too!).

Anyway, I wanted to pick up this idea in this project, since Herbert could theoretically be an accessible interface for a superintelligent AI in the future. This is accomplished by lifting the head onto a completely different level from the visitor, showing it as an oversized representation floating in an infinite space above the viewer. The deep voice, with added sound effects like echo and delay, supports this concept by sounding "godlike".


Look development inside Unreal Engine 4.

Because I used the rigged face exported from FaceShift, the room for visual creativity was somewhat limited. I aimed for a look where the head floats in an abstract space without visible boundaries. I also wanted a clearly artificial look, so, for example, no human skin colors. The lighting should be somewhat mystical and creepy (think of a flashlight pointing at your face from below).

The eye color also has a functional meaning: it changes from green to blue as soon as the program recognizes a human.

Technical setup

Technical setup for a presentation. Hardware used: iMac, router, Kinect, microphone, webcam, projector, speakers.

From a technical standpoint, this project was the most complex one I did during my studies. It involved several technologies, all linked via Open Sound Control, Spacebrew or the like.
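For readers unfamiliar with Open Sound Control, here is a minimal sketch of what a single OSC message looks like on the wire, written in Python rather than Processing. It follows the OSC 1.0 layout (null-terminated strings padded to a multiple of four bytes, big-endian 32-bit floats). The address `/herbert/smile` and the helper names are assumptions for illustration, not the addresses the project actually used.

```python
import struct


def osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    padding = (-len(data)) % 4
    return data + b"\x00" * padding


def osc_float_message(address: str, value: float) -> bytes:
    """Build an OSC message carrying one float32 argument."""
    # Layout: address pattern, type tag string ",f", then the argument.
    return osc_string(address) + osc_string(",f") + struct.pack(">f", value)


# Hypothetical address; the real project addresses are not documented here.
message = osc_float_message("/herbert/smile", 0.5)
```

Such a byte string would typically be sent as one UDP datagram to the receiving application.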

The main application and logic run in Processing:

  • collect all the recognition data from FaceShift and Google Speech
  • read a storyboard.json file, where all the clips and sentences Herbert is capable of speaking are stored
  • save all the data entered by visitors (e.g. names)
  • send commands to OS X for the speech synthesis
  • communicate with Unreal Engine
  • run all the custom functions
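Two of these steps, looking up a recognized phrase in the storyboard and handing the answer to the system speech synthesis, can be sketched as follows. This is an illustration in Python, not the original Processing code. The JSON layout (`entries`, `triggers`, `clip`, `sentence`), the example answers and the voice name are assumptions, since the real storyboard.json is not shown here. The `say` command with its `-v` (voice) and `-r` (rate) options is the standard command-line speech tool on OS X.

```python
import json

# Hypothetical storyboard layout, for illustration only.
STORYBOARD_JSON = """
{
  "entries": [
    {"triggers": ["Who are you?", "What is your name?"],
     "clip": "intro", "sentence": "I am Herbert."},
    {"triggers": ["Do we know each other?"],
     "clip": "think", "sentence": "I do not think so."}
  ]
}
"""


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def find_entry(storyboard: dict, spoken: str):
    """Return the first entry with a trigger matching the spoken phrase."""
    wanted = normalize(spoken)
    for entry in storyboard["entries"]:
        if wanted in (normalize(trigger) for trigger in entry["triggers"]):
            return entry
    return None


def say_command(sentence: str, voice: str = "Alex", rate: int = 150) -> list:
    """Build the argument list for the OS X `say` command."""
    return ["say", "-v", voice, "-r", str(rate), sentence]


storyboard = json.loads(STORYBOARD_JSON)
entry = find_entry(storyboard, "who are you")
command = say_command(entry["sentence"])
```

On a Mac, `command` could be passed to `subprocess.run` to speak the sentence, while `entry["clip"]` names the animation clip to trigger alongside it.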

This part of the blueprint sets all values for the morph targets of the face (basically values for “smile”, “wink”, etc.).

In Unreal Engine things are quite different: the only code I wrote was for setting up a TCP socket to communicate with Processing (which I had already created for the Visual Donation project). The excellent Blueprint system lets designers create prototypes and applications without writing any code (and Unreal Engine is based on C++, which is quite difficult!).

Things I accomplished with Blueprints alone in Unreal Engine included:

  • using the TCP socket class to communicate with Processing
  • playing the animation clips
  • using Processing tracking data to face the visitor all the time
  • mirroring the visitor's facial expressions
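To give an idea of what travels over such a TCP connection in mirror mode, here is a minimal sketch of one possible message format: one JSON object per line, holding a value between 0 and 1 for each morph target. It is written in Python for brevity. The format, the function names and the morph target names are assumptions, not the protocol actually used between Processing and Unreal Engine.

```python
import json
import socket


def encode_frame(morph_targets: dict) -> bytes:
    """Serialize one frame of morph target values as a JSON line."""
    # Clamp every value to the 0..1 range expected for a morph target.
    clamped = {
        name: max(0.0, min(1.0, float(value)))
        for name, value in morph_targets.items()
    }
    return json.dumps(clamped, sort_keys=True).encode("utf-8") + b"\n"


def decode_frames(buffer: bytes):
    """Split received bytes into complete frames and an unfinished rest."""
    *complete, rest = buffer.split(b"\n")
    return [json.loads(line) for line in complete if line], rest


def send_frame(host: str, port: int, morph_targets: dict) -> None:
    """Open a TCP connection and send a single frame (sender side)."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(encode_frame(morph_targets))


# Hypothetical values; a partially received second frame stays in `rest`.
frame = encode_frame({"smile": 1.4, "wink": 0.25})
frames, rest = decode_frames(frame + b'{"smi')
```

Because TCP is a byte stream without message boundaries, the receiver keeps `rest` and prepends it to the next chunk it reads, so a frame split across two reads is still decoded correctly.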

Thank you :)