How to make your own Utau voicebank?

HowlingWolf15

Ritsu's Renegades
Defender of Defoko
First, you need a microphone. Don't use your laptop mic, as the recordings will come out sounding very bad. If you have a mic from a game or something, then that can be of use. Download OREMO or Audacity to record the samples. I personally recommend OREMO, as it already comes with a reclist and saves the samples to the folder you set. After you record the samples, it's time to OTO. OTOing is a bit more complicated, but there are many good tutorials for it, such as this one.

Now, this part is optional, but most people do it: creating a character. Make your character however you want, but for some things to avoid, look at this guide. Also, it's not required to give your UTAU a design; you can just do what Yamaha did for the VYs.

Finally, now that you've finished your UTAU, it's time to show them off. Upload a cover to SoundCloud or YouTube, and make sure you listen to the criticism people give you so you can improve.

Well, that's all I can think of right now. Hope you have a good time here :smile:

EDIT: Also, start with a CV voicebank since you're a beginner, and as you get better, try recording a VCV one. Next time you need help, post in the Help and Advice forum.
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
First things first: you'll need your tools to record with.
- Oremo, Audacity, or a legit DAW
- Oremo is the go-to program for recording banks: it contains a tuner (so you'll know the note you're on) and a reclist (you'll need to be familiar with hiragana if you use the one it comes with, and note that it has extras that are unnecessary), automatically names and saves each file as a standard 44100 Hz, 16-bit WAV, and can be used to generate a basic oto.
- Audacity (compatible with Linux, Mac, and Windows) is a free DAW-like program for basic recording and mixing. It's handy if Oremo doesn't play nice with your machine or you can't get the hang of more complex DAWs.
- A legit DAW can be anything like FL Studio, Reaper, or Mixcraft. It's not necessary, but it offers more options than Audacity for making your mixed covers sound nice. If you want to create your own music, an actual DAW lets you do that without a ton of hoops to jump through.
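If you record in Audacity or a DAW instead of Oremo, it can be worth double-checking that your files really are in the 44100 Hz, 16-bit WAV format mentioned above. Here is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the example path are just placeholders I made up, not part of any UTAU tool.

```python
import wave


def check_wav(path, rate=44100, sample_width_bytes=2):
    """Return a list of format problems for one WAV file (empty list = OK).

    sample_width_bytes=2 means 16-bit samples (2 bytes per sample).
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected {rate} Hz")
        if w.getsampwidth() != sample_width_bytes:
            bits = w.getsampwidth() * 8
            problems.append(f"bit depth is {bits}-bit, expected {sample_width_bytes * 8}-bit")
    return problems


# Example usage (hypothetical file name):
# for problem in check_wav("my_bank/ka.wav"):
#     print(problem)
```

Note that the `wave` module only reads uncompressed PCM WAV files, so a file saved in another encoding may fail to open at all, which is also a sign it needs re-exporting.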

- A mic
- If you're just starting out and have no equipment whatsoever, then the built-in mic on your computer or smartphone (you'll need an app that can record WAV) will have to suffice, but it's not recommended. The mic made for a laptop is very substandard, as your average sound card is... well, it's crap, to put it politely. And your phone is, of course, a phone: it'll make your bank sound like it came from a tiny mic. If you're able, try to invest at least $75 in something cheap but okay at that price point (like the CAD U37; I can't vouch for it personally because I've never used it, but it seems to be recommended a lot). I don't recommend spending over $100 on a USB mic. Once you get past the $100 price point, you might as well go XLR. For first-time recording, you don't have to go broke yet. Only think about spending serious money if you plan on recording for an extended period (say, a year or more passes and you're still making banks or improving upon the first one).

After recording, you'll need to configure (oto) the bank, or you can have someone do it for you if they're feeling generous (people usually are with CV banks). Try not to get too frustrated at this step, as CV banks are always the biggest hurdle. You can do this through UTAU's built-in voice configuration tool or with SetParam.

Art isn't necessary unless you want to have an avatar. SoundCloud and YouTube are popular places to debut banks, so if you have a channel or profile, you can post your first demo there.
 

歌手音ピコ

Bruuuuuuuuuuuuuuh Uta-U-Utauu
Thread starter
Thank you :smile:

What is the difference between a CV voicebank and a VCV one?
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko

CV voicebanks are single-sound recordings. This applies to the oto: the recording itself can have multiple sounds in a single file, but each oto entry will only factor in a single sound. You can record a bank as such...
ka
ki
ku
ke
ko
kya
kyu
kyo

If you use a bank and it can read か (or ka) without any alteration necessary, then it's CV compatible. CV is the most basic recording method, but it's usually difficult for new users to oto because there are no guides for where to place the Overlap for a smooth transition.
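To give an idea of what "otoing" actually produces, here is a rough sketch that reads one line of an oto.ini file. It assumes the commonly documented layout `file.wav=alias,offset,consonant,cutoff,preutterance,overlap` (values in milliseconds); check that against an oto.ini from your own bank before relying on it. The class and function names are made up for this example, and the numbers in the usage line are invented, not recommended settings.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str        # the recorded WAV file
    alias: str           # the name the note lyric is matched against, e.g. "ka"
    offset: float        # where the usable sound starts in the file
    consonant: float     # length of the fixed (unstretched) consonant region
    cutoff: float        # where the usable sound ends
    preutterance: float  # how far before the note the sound begins
    overlap: float       # how much this sound crossfades with the previous one


def parse_oto_line(line):
    """Parse one 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    filename, params = line.strip().split("=", 1)
    alias, *numbers = params.split(",")
    # Empty numeric fields are treated as 0 in this sketch.
    offset, consonant, cutoff, preutterance, overlap = (float(n or 0) for n in numbers)
    return OtoEntry(filename, alias, offset, consonant, cutoff, preutterance, overlap)


# Example usage with invented values:
# entry = parse_oto_line("ka.wav=ka,50,120,-300,80,20")
# print(entry.alias, entry.overlap)
```

In a CV bank every recorded sound gets one entry like this, and the Overlap value is the part you have to place by ear.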

_______

VCV banks are recorded exclusively in strings, which can be anywhere from 2 to 8 mora (sounds) each. You would record strings as such (this is an example from my own reclist):

ka_ka_ki_ka_ku_ka
ka_ke_ka_ko_ka
ki_ki_ku_ki_ke_ki
ki_ko_ki
ku_ku_ke_ku_ko_ku
n_ku
ke_ke_ko_ke_n_ke
ko_ko_n_ko
n_ka
n_ki

Each new line is a new string, and each string is recorded without pauses. This method can be either easy or difficult depending on 1) the amount of time, personal space, and quiet you're allowed, 2) the reclist you use, and 3) whether you suffer from glitching audio or other issues (VCV is very strict about staying in rhythm, staying on pitch, and not having the audio cut off prematurely, whereas CV is more forgiving).

VCV is easier to oto because, when done right, it allows uniformity and you can see where the transition is from the vowel to the next consonant.
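To make those vowel-to-consonant transitions concrete, here is a small sketch that splits one recorded string into the transitions it contains. It assumes the usual romaji VCV naming, where each sound is labeled with the previous vowel plus the new syllable (like "a ki"), and "-" marks the start of the string; that naming and the function name `vcv_aliases` are assumptions for illustration, not something any particular reclist requires.

```python
def vcv_aliases(string):
    """Split a reclist string like 'ka_ka_ki' into VCV-style transition labels."""
    aliases = []
    previous = "-"  # "-" marks the start of the string (no preceding vowel)
    for mora in string.split("_"):
        aliases.append(f"{previous} {mora}")
        # The last letter of a romaji mora is its vowel ("n" for the syllabic n).
        previous = mora[-1]
    return aliases


print(vcv_aliases("ka_ka_ki_ka_ku_ka"))
# ['- ka', 'a ka', 'a ki', 'i ka', 'a ku', 'u ka']
```

So one six-mora string gives you six sounds to oto, and every one except the first starts from a known vowel.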

As for comparison...



The bank doing Rin's part is VCV. Len's part is CV.
 

歌手音ピコ

Bruuuuuuuuuuuuuuh Uta-U-Utauu
Thread starter

Thank you, I understand it now. :smile:
 
