There's already AI singing synthesis in the form of NNSVS. To develop a new voicebank, you'll need to record yourself singing songs, label the phonemes, then train the model. Reusing UTAU recordings isn't ideal here because they're sung in a flat, monotone style, and the model would learn that delivery: it captures your pronunciation and tone of voice as well as your singing style.
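If you haven't dealt with phoneme labels before, here's a minimal sketch of sanity-checking one, assuming the HTS-style mono label format ("start end phoneme" per line, times in 100 ns units) that NNSVS-style pipelines commonly use. The file names are hypothetical; check your recipe's docs for the exact format it expects.

```python
# Parse an HTS-style mono label file and check it against the recording.
import wave

LAB_PATH = "song01.lab"   # hypothetical label file: "start end phoneme" per line
WAV_PATH = "song01.wav"   # the matching recording

# HTS labels count time in units of 100 nanoseconds; convert to seconds.
segments = []
with open(LAB_PATH, encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        start, end, phoneme = line.split()
        segments.append((int(start) / 1e7, int(end) / 1e7, phoneme))

# Compare the last label's end time against the actual audio length to
# catch labels that have drifted out of sync with the recording.
with wave.open(WAV_PATH) as w:
    audio_sec = w.getnframes() / w.getframerate()

label_sec = segments[-1][1]
print(f"{len(segments)} phonemes, labels end at {label_sec:.2f}s, "
      f"audio is {audio_sec:.2f}s")
if abs(label_sec - audio_sec) > 0.1:
    print("warning: label timing does not match audio length")
```

You'll be doing this for every recording, so catching misaligned labels early saves a lot of wasted training runs.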
If you only want the tone of voice, you can use Diff-SVC, which functions more as a voice-changing effect than as a synthesizer. To use a Diff-SVC model, you input a reference a cappella track, which can be in any language. Of course, that means it's also going to copy the pronunciation and accent of the reference track. UTAU recordings are still not ideal as training data because they cover a very limited pitch range. You'll want to sing songs again, or render out various songs with the UTAU bank if you don't mind the resampler sound. No phoneme labelling is required.
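Since pitch range is the main worry with this kind of training data, here's a rough sketch of how you might measure it before committing to a dataset. librosa is assumed to be installed; the file names and note limits are hypothetical, so adjust them to your own recordings.

```python
# Estimate the overall pitch range covered by candidate training audio.
import librosa
import numpy as np

FILES = ["take01.wav", "take02.wav"]  # hypothetical candidate recordings

f0_frames = []
for path in FILES:
    y, sr = librosa.load(path, sr=None, mono=True)
    # pyin gives a per-frame F0 estimate; NaN marks unvoiced frames.
    f0, voiced, _ = librosa.pyin(y,
                                 fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"),
                                 sr=sr)
    f0_frames.append(f0[~np.isnan(f0)])

f0_all = np.concatenate(f0_frames)
lo, hi = np.percentile(f0_all, [5, 95])  # ignore outlier frames
span = 12 * np.log2(hi / lo)             # range in semitones
print(f"pitch range roughly {librosa.hz_to_note(lo)} to "
      f"{librosa.hz_to_note(hi)} ({span:.1f} semitones)")
```

If the span comes out at only a few semitones, which is typical of flat UTAU samples, expect the trained model to sound strained whenever the reference track goes outside that band.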