Utrecht University Virtual Human Controller (UUVHC)

Getting Started


The Virtual Human Controller is a collection of modules for creating virtual characters that can talk, perform facial expressions, and gesture. It provides a quick pipeline for building interactive conversational characters from scratch.

Why is it useful?

If you are a researcher or game developer who would like to include virtual characters with realistic appearance and social behaviors, UUVHC is for you. For example, you might want to develop a job skills training game where a virtual character takes the role of an interviewer. Or you might want to train police officers by having them interact with non-player characters that take the role of citizens or victims. You could also imagine a game that helps people with social anxiety safely experience social situations. It is up to your imagination! Role-playing characters are needed in almost every application area, including business, health, education, security, and the military.

What does it actually do?

The asset is built on top of the Unity 3D game engine. The animation pipeline consists of three steps:

  • Importing an animatable 3D character: We export rigged 3D models from the Daz3D Editor, together with the blendshapes for speech and facial animation, and import them into Unity. This requires no designer effort in terms of animation.
  • Individual animation controllers: The asset includes animation controllers for speech, gaze, facial expressions and gestures. For speech and gaze, we currently use third-party low-cost or free assets from the Unity Asset Store. For text-to-speech, we use the CereVoice API. Facial expressions and gestures are based on Unity’s animation blending and blendshape features. We also provide a database of conversational gesture animations captured with the Vicon motion capture system available at our university.
  • Multi-modal animation generation: The asset generates synchronized multi-modal animations using the Behavior Markup Language (BML), for which we developed a BML Realizer for Unity. Why is BML useful? Its human-readable notation allows you to semantically combine any facial expression, gesture, speech and gaze animations.
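As an illustrative sketch (not taken from the asset itself), a BML block that makes a character wave while greeting and smiling could look like the following. The element names follow the BML 1.0 specification; lexeme names such as WAVE and SMILE and the target name user are placeholders that must match whatever the realizer has been configured to support:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- speak, and mark a sync point inside the sentence -->
  <speech id="speech1">
    <text>Hello! Nice to meet <sync id="tm1"/> you.</text>
  </speech>
  <!-- time the gesture stroke to the sync point in the speech -->
  <gesture id="gesture1" lexeme="WAVE" stroke="speech1:tm1"/>
  <!-- smile for the duration of the utterance -->
  <faceLexeme id="face1" lexeme="SMILE" amount="0.7"
              start="speech1:start" end="speech1:end"/>
  <!-- look at the user while speaking -->
  <gaze id="gaze1" target="user" start="speech1:start"/>
</bml>
```

The sync attributes are what make BML useful here: the gesture stroke is tied to a word inside the utterance rather than to an absolute time.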

Where is it used currently?

The Virtual Human Controller is an asset developed partly in the RAGE (Realising an Applied Gaming Ecosystem) Horizon 2020 project and partly with Utrecht University Game Research Seed Money. We successfully integrated our asset with the Communicate! dialogue manager from Utrecht University and with the Emotion Appraisal module from INESC-ID in Portugal. In addition to inter-asset integration, our asset is currently being used by the game developers at BipMedia in Paris. It will be part of an interview skills training game for Randstad.

It was recently used in a case study where the virtual character takes the role of a virtual receptionist. The set-up includes a microphone to capture people’s speech and a Kinect camera to capture their behaviors. Beyond the functionality of the Virtual Human Controller, we added Google speech recognition and chat functionality using AIML Pandorabots. Furthermore, we developed a novel autonomous gaze control module based on Kinect to drive the “look at” behavior of the virtual character in group-based interactions.

We showed our results as a live interactive demo at two recent public events: one in May for the visit of EU ambassadors to Utrecht, and the other in June during the INTETAIN 2016 conference. Take a look at the video below:


The software has a Utrecht University license. Third-party assets have their own licenses and need to be downloaded or purchased from their respective websites. It is currently available to RAGE game researchers and developers and for internal projects at Utrecht University.


For questions and feedback, please contact Dr. Zerrin Yumak at z.yumak@uu.nl. See our research page at Utrecht University here.

User Manual

Below we provide step-by-step instructions for the Virtual Human Controller.

Import a virtual human

The first step is to prepare a 3D model for import into Unity. We currently use the Daz3D Editor. It provides rigged models and allows exporting blendshapes for facial animation without any designer effort. Desired accessories such as clothes and hair can be downloaded from the Daz3D store.

The model is exported as an .fbx file from the Daz3D Editor. Please make sure that you add export rules to include the visemes and facial expressions.

Download the Unity project and drag and drop the .fbx file into the "Models" folder under Assets. Add the model to the scene. You will see the list of blendshapes (facial expressions and visemes) attached to the body mesh.

Notice that the eyes and hair of the model have rendering problems. You can fix these by adjusting the shader settings. For more information, please check the Unity Manual; alternatively, you can find a manual here. For the background and lighting settings, we worked with a designer from the Utrecht School of the Arts (HKU).

Behavior Mark-up Language Realizer for Unity

In order to realize multi-modal animations combining speech, facial expressions, gestures and gaze, we developed a Behavior Markup Language (BML) Realizer for Unity. BML is an XML description language for controlling the verbal and nonverbal behaviors of an embodied conversational agent. It describes behaviors and synchronization constraints between these behaviors. Once the character is ready, you can add the BMLRealizer script and start setting up the individual Speech, Face, Gesture and Gaze Realizers and their parameters as described below.

Speech Animation

For speech animation, we currently use an asset from the Unity Asset Store by Rogo Digital. The free version of the asset, Rogo Digital Lite, works well; for extra functionality, you can also check Rogo Digital Pro. Text-to-speech is based on the CereProc SDK. The free academic version comes with one free voice, Heather (Scottish English, female). We currently use Isabella (American English, female). You can try out the different voices here.

Once your model is in the scene, you can add the Rogo LipSync component using the Add Component button in Unity. Rogo LipSync includes 9 phonemes (plus 1 rest frame), and you need to map them to the blendshapes exported from Daz3D. You also need to create an Audio Source and link it from the LipSync component.

Please see below the mapping of the Rogo phonemes to the Daz3D blendshapes. For the rest frame, we chose an arbitrary blendshape and set its value to 0.

Rogo Digital Phoneme    Daz3D Blendshape
AI                      head.eCTRLvAA
E                       head.eCTRLvEE
U                       head.eCTRLvUW
O                       head.eCTRLvOW
FV                      head.eCTRLvF
L                       head.eCTRLvL
MBP                     head.eCTRLvM
WQ                      head.eCTRLvW

Since Rogo Digital LipSync works offline with sound files, we linked it to the CereVoice TTS callback functionality. A TTS callback example can be found in the examples/basictts folder once you download the CereVoice SDK. We wrote a set of Unity scripts to extract phonemes from CereVoice and pass them to Rogo Digital. The voice and license files from CereVoice and the necessary DLLs need to be added to the Unity project to make it work. CereProc provides an API for both Mac OS X and Windows; currently, we implemented the link only for the Windows version. The last step is setting the parameter of the Speech Realizer to Rogo Digital so that it can find it.
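Once the Speech Realizer is set up, a plain utterance can be triggered with a minimal BML block such as the sketch below (the id values are arbitrary). The realizer sends the text to CereVoice, and the returned phoneme timings drive the Rogo LipSync blendshapes:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <speech id="speech1">
    <text>Welcome to Utrecht University.</text>
  </speech>
</bml>
```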

Facial Animation

Facial animation is based on blendshapes in Unity. These are exported from Daz3D and are listed under the Genesis3Female.Shape skinned mesh renderer. In the Face Realizer settings, you need to select Genesis3Female.Shape as the Morph Container. Then add the number of face lexemes that you want to include and link them to the blendshapes as shown below. For example, Surprised is blendshape number 5 in the blendshapes list, which corresponds to head.eCTRLSuprised. If you want to create facial expressions other than the ones provided by Daz3D, you can use the face primitives as a basis and create different combinations in 3ds Max or Maya, or work with a 3D artist.
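A configured face lexeme can then be addressed from BML. The sketch below assumes a lexeme named SURPRISED has been linked to the head.eCTRLSuprised blendshape in the Face Realizer; the amount attribute scales the blendshape weight:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- show the expression at 80% intensity for two seconds -->
  <faceLexeme id="face1" lexeme="SURPRISED" amount="0.8"
              start="0" end="2"/>
</bml>
```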

Conversational Gestures

Once you add the 3D character model to the project, you can set its rig to Humanoid. It is then possible to use animations from Mixamo. For more information on animation, please check the Unity Manual.

The next step is to create an animation controller. In our example, we have a base animation layer with a standing idle motion,

and a second animation layer for the waving animation with a right-hand mask. The mask ensures that only the desired body parts are used, since the original waving animation affects the whole body. Define a transition from the No motion state to the Wave state, triggered by the Wave boolean parameter defined in the Parameters tab.

The final step is to link the Gesture Realizer to the animation controller of the model and to add the list of supported gestures.

It is possible to add gesture parameters such as attackPeak according to the BML specification, but these are not yet fully supported in our BML Realizer.
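As a sketch, the Wave state defined above could then be triggered from BML with a gesture behavior; the lexeme name is a placeholder that must appear in the Gesture Realizer's supported gestures list:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- WAVE must be listed among the supported gestures -->
  <gesture id="gesture1" lexeme="WAVE" start="0"/>
  <!-- sync-point parameters (e.g. attackPeak) are defined by the BML
       specification but not yet fully supported by this realizer -->
</bml>
```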

Gaze Animation

For the gaze animation, add the Head Look Controller script from the Asset Store and set its parameters.

Then link the Gaze Realizer to the Head Look Controller script and create gaze targets in your scene.
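A gaze behavior can then reference one of those targets by name, as in this sketch (user is a placeholder for a gaze target object created in the scene):

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- "user" must be one of the gaze targets created in the scene -->
  <gaze id="gaze1" target="user" start="0"/>
</bml>
```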

BML Example

Take a look at the video below

and its corresponding BML script.