Cascadeur, the character animation software, has reached 130,000 users and launches on Product Hunt today. We took this as a great opportunity to talk to its founder about the challenges the tool currently faces and, most importantly, about its integration with neural networks.
Alexander Semenov, Editor-in-Chief of App2Top: Hi, Zhenya! I've been meaning to talk to the Cascadeur team about graphical neural networks since December. By the way, how do you feel about them in general?
Evgeny Dyabin, founder and chief producer of Cascadeur: We treat them with great interest and are even retraining them for our own tasks. Speaking globally, I don't share the panic about a sudden technological revolution wiping out whole professions. The advent of the camera did not reduce the number of artists, and the arrival of auto-adjusting cameras in every phone did not reduce the number of professional photographers and videographers, although it changed how they work. Besides, these new neural networks still have many problems and limitations, and trying to put them into practice quickly cools the first impression. Let's see how fast the progress will be.
Excellent. So the desire was there, but the occasion wasn't. And today you launch on Product Hunt, which is a great occasion to start with. Why does anyone need Product Hunt?
Evgeny: Product Hunt is a whole community where makers, startups and enthusiasts gather and communicate. Every day they evaluate and discuss new products, and their verdict means a lot to specialized media.
By the way, I'll take this opportunity to appeal to readers: if you want to support our project, you can do so today on the Product Hunt website and even leave feedback or a question – we will definitely answer everything.
Cascadeur is coming to Product Hunt today
Cascadeur's user base is 130,000 people. You said yourself that game and film companies were lining up for you. What will Product Hunt give you that you haven't already achieved on your own?
Evgeny: Yes, both major studios and indie developers know about us – our recognition is high. But there is a serious gap between knowing about a piece of software and deciding to learn it. If we receive the "Product of the Day" award on Product Hunt, it will serve as social proof from the expert community. I think this will let us position ourselves differently with major media and convince more people to try Cascadeur.
In December, Cascadeur had its official worldwide release after a year in beta. What challenges did you face as a service after that?
Evgeny: The release brought a large influx of users, which significantly increased the load on support. There were a lot of questions, but the number of specialists stayed the same, and they are not easy to scale – they have to know the program and its technical details well.
So we are building an FAQ to speed up first-line support, and we prioritize support for Pro users – when they purchase a license, they immediately receive a link to a private channel. In general, we are optimizing the process.
We also ran into something new: some companies need to buy 40 licenses at once, for example, so we had to build that functionality urgently. It also turned out that 20% of our users are students or work in education, so we need to add educational licenses.
You're probably collecting feedback. Based on it, what do users usually find missing in Cascadeur today?
Evgeny: Users are missing thousands of little things, and it's impossible to do everything, so we have to pick what is most in demand or most important in our view. For example, improved compatibility with Blender or the ability to import audio. There are also very frequent but complex requests such as facial animation or blendshapes, which sooner or later will have to be done, but for now we don't have the resources.
As for our main topic, physics, many users expect ragdoll, collisions and interaction with the environment. For now, our Autophysics corrects dynamic balance in movements, fixes trajectories and rotation in jumps, and adds secondary motion and overshoot – this greatly improves animation and speeds up work on it. But there is indeed no way yet to push off walls, climb ledges or interact with other characters. We are actively working on all of this.
Some users also complain that working with a character's fingers is inconvenient because every phalanx has to be rotated individually. In the next update we will add finger auto-positioning: a smart rig that makes it easy to control the fingers through a few controllers.
Let me share a bit of my own pain. When I saw what the Stable Diffusion + ControlNet + Blender combination was capable of, I immediately thought of Cascadeur. Unlike Blender, it is focused on posing, and I'm sure its learning curve is gentler than Blender's (and it is definitely more convenient than ControlNet's basic functionality). Hence the question: should we expect SD support in Cascadeur?
Evgeny: The idea suggests itself! We thought of it right away too, and not just us. But only we have a genuinely convenient tool for controlling a character's pose that suits beginners and amateurs. All we need to add is neural rendering based on Stable Diffusion, which will turn a 3D model into a finished 2D image in any style. In fact, we are already working on it.
We have always wanted to reach a wider audience, but Cascadeur's problem is that it doesn't deliver a final product; it is only one link in the production chain. Neural rendering can erase that line for a wide audience, especially in a mobile version of Cascadeur. You download a mobile app, set a natural pose using Autoposing, choose the camera angle, upload a picture of a character, and get that character in the desired pose from the desired angle. This gives far more creative control than is possible in principle with a text description. So far we are only talking about still images, but someday it will come to video, and there the advantages of Cascadeur will show in full force.
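The "choose the camera angle" step can be illustrated with a toy pinhole projection: before an image generator can be conditioned on a pose, the 3D skeleton has to be flattened into 2D keypoints for the chosen viewpoint. This is only a minimal sketch with made-up joint positions and camera parameters, not Cascadeur's actual pipeline:

```python
import math

def project_point(p, cam_yaw_deg, cam_dist=3.0, focal=500.0, img_size=(512, 512)):
    """Project a 3D point (x, y, z) to 2D pixel coordinates with a simple
    pinhole camera orbiting the character around the vertical (y) axis."""
    yaw = math.radians(cam_yaw_deg)
    # Rotate the world so the camera looks down the +z axis.
    x = p[0] * math.cos(yaw) - p[2] * math.sin(yaw)
    z = p[0] * math.sin(yaw) + p[2] * math.cos(yaw) + cam_dist
    y = p[1]
    # Perspective divide, then shift into image space (v grows downward).
    u = img_size[0] / 2 + focal * x / z
    v = img_size[1] / 2 - focal * y / z
    return (u, v)

# A tiny stick figure with hypothetical joint positions relative to the hips.
skeleton = {
    "hips": (0.0, 0.0, 0.0),
    "head": (0.0, 0.7, 0.0),
    "l_hand": (0.4, 0.2, 0.1),
}
# The same pose rendered from a 30-degree camera orbit.
keypoints = {name: project_point(p, cam_yaw_deg=30) for name, p in skeleton.items()}
```

The resulting 2D keypoints are the kind of conditioning signal a pose-guided generator such as ControlNet's OpenPose mode consumes; changing `cam_yaw_deg` re-renders the same pose from another angle without touching the animation itself.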
The first neural networks capable of generating video are already appearing. In two or three years we will most likely see the first full-fledged short films. But will they be built on the text-to-animation principle, or on text+animation-to-animation, where an animated mockup also acts as a prompt?
Evgeny: So far I am skeptical that neural networks will soon be able to generate high-quality 3D animation without resorting to physical simulation. We see text-to-animation as a first draft that we can clean up, correcting poses and physics, to get high-quality animation the user can then edit with our tools. I can't imagine how you could get the result you need from text alone, without additional control, except in the most trivial cases.
If you add video neural rendering to this, which will become possible sooner or later, the tool turns out to be quite magical. You describe the idea in words, get a reasonably realistic version, edit it, and at the output you have a finished video with the right character in any style. But for now this is all a concept, and implementation is still far off.
Cascadeur itself was built on neural networks, so you've been in this field for a long time. But have you used them only for training within your own product? Or have you experimented with them in other areas?
Evgeny: First of all, I want to note that so far there is more physics in Cascadeur than neural networks, and this is the main difference from most of the motion-generation solutions we know of. We started using neural networks a few years ago and achieved the greatest success with the Autoposing tool, which helps you create a natural pose with a minimum of actions.
If we talk about Nekki as a whole, neural networks have been used in other projects for different tasks. One example is the bots in Shadow Fight 4: Arena. They are trained on player battles and can control different characters using the special moves and tactics characteristic of those characters.
Many animations in Cascadeur are made like this: a video clip is loaded into the program as a template, and the animator reproduces key frames from it. How realistic is it, within your product, to implement a model that can produce a draft animation from video alone? Are you working in this direction?
Evgeny: Yes, we are working on it – essentially a mockup from video. The first alpha version of this feature will appear in the very next version of Cascadeur. For now it's hard to achieve good quality, but as a reference animation draft, with at least key poses and timings, it can save a lot of time. We will keep developing and optimizing this feature. It runs slowly on the client, so we need to move it to the server – then we will be less limited by performance.
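The "key poses and timings" part of such a draft can be sketched very simply: given a per-frame pose stream (e.g. from a pose estimator run on the video), keep only the frames where the pose has changed enough since the last kept one. This is a hypothetical illustration of the idea, not Cascadeur's actual algorithm:

```python
def extract_key_poses(frames, threshold=0.5):
    """Given per-frame poses (lists of joint angles, hypothetical units),
    keep frame 0 and every frame whose pose differs from the last kept
    key pose by more than `threshold` (max per-joint difference).
    Returns the indices of key frames, so timing is preserved."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        last = frames[keys[-1]]
        if max(abs(a - b) for a, b in zip(frames[i], last)) > threshold:
            keys.append(i)
    return keys

# Example: one joint holding almost still, then swinging twice.
frames = [[0.0], [0.1], [0.2], [1.0], [1.1], [2.0]]
print(extract_key_poses(frames))  # [0, 3, 5]
```

The near-duplicate frames are dropped, so the animator gets a sparse set of keyframes at the right times, which is exactly what makes a rough draft editable.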
Back to business: they say investors are now rushing like crazy to invest in AI. Have you seen this yourself? And what are such investors expecting, given that they are essentially competing with Microsoft, Google and Adobe in this field?
Evgeny: I don't think there is competition with the giants here. Large companies invest in the technology itself, while small companies and startups try to apply that technology in different areas. We, too, do not develop our neural rendering or video mockup from scratch; we use available libraries and models, configuring and retraining them for our tasks.
It seems to me that investment in such projects will grow strongly now. We are funded by Nekki, but we are also open to offers.
By the way, did you expect neural networks to get the attention they received after MJ, SD and ChatGPT became popular?
Evgeny: A few years ago we realized that the future lay with neural networks and their integration into various tools, so we began recruiting data scientists and doing research and development on our own tools in order to be ready for the revolution. But the success of Midjourney and ChatGPT still surprised us and gave us new hopes and ideas like neural rendering, text-to-animation and others.
Are you ready to make a forecast? What should we expect from AI technologies, including your own, at least by the end of the year?
Evgeny: It seems to me that the world and the market will not change as quickly as many fear, but investment will change first, since it reflects faith in a particular future. In the coming year I would expect intelligent search to emerge and the approach to tutorials to change – people will be able to get personal help and instructions for their specific tasks. Generative networks will help prototype and try different ideas faster, but in most cases they won't be able to produce the final result.
As for us, we see AI precisely as an assistant that speeds up the work of an animator who knows what they want. Motion recognition from a reference, Autoposing, Autophysics, neural rendering – all of this is above all about shortening the time between idea and result, while at every stage the animator retains full control and can make any changes. I think by the end of the year we will be able to show off new AI features.
And two technical questions. Looking at the limitations of the free version, I noticed that users can export models only with "300 frames and 120 joints". For those unfamiliar with animation, how much is that?
Evgeny: And just like that, we're back from the fabulous future to the pressing need to fund development! The idea behind the restrictions is to let amateurs and indie developers use Cascadeur for free without depriving them of important functionality. 300 frames at 30 frames per second is 10 seconds. That is more than enough for game animations, but not enough for long cutscenes. Likewise, 120 joints is enough for almost any character, but not enough for several characters or for very elaborate ones.
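The arithmetic behind those limits is easy to check. A quick sketch, using an assumed humanoid rig size of 60 joints as a round figure (rig sizes vary widely in practice):

```python
def max_clip_seconds(frame_limit=300, fps=30):
    """Duration of the longest exportable clip under the free-tier frame cap."""
    return frame_limit / fps

def characters_that_fit(joint_limit=120, joints_per_character=60):
    """How many rigs of a given size fit under the joint cap.
    60 joints per humanoid is a rough, hypothetical figure."""
    return joint_limit // joints_per_character

print(max_clip_seconds())     # 10.0 (seconds)
print(characters_that_fit())  # 2
```

So the free tier covers a 10-second clip and, under that rig-size assumption, roughly two ordinary characters, which matches the "almost any character, but not several" framing above.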
When can we expect Godot support?
Evgeny: As soon as we add support for the glTF format, which comes right after support for the USD format. In general, we are working on it.