If anybody has the time, could somebody read over my UTAU video essay?

cubialpha

Ruko's Ruffians
Defender of Defoko
i'm on mobile rn but i wrote some notes for you, sorry if it's sloppy or hard to follow!

----

overall sound quality - i would change this to "varying sound quality"

it'd be cool to say something about how utau may be tricky but has many benefits, such as being able to create voicebanks in any language, precise control over consonants & pronunciation, and being able to change the rendering engine (resamplers)

headphones or earphones to check sound quality - i would just say "headphones or earphones", no need to state what they're used for.

i would add a section about recording environment as well (soundproofing etc.), because it's really important.


- i wouldn't say cv is necessarily the easiest to oto, it's just the shortest list to oto. mention that it's choppy because there are no recorded transitions between syllables.

- i also wouldn't say vcv is complex to oto (the process is actually easier than cv), it's just a huge amount to oto. i would mention that the syllable transition happens in the vowel: vowels are meant to overlap and crossfade, creating the smooth transition.

also some screenshots here of how the sounds overlap and transition in the vowel would help

maybe break it down like what i did in the cvvc section below

- cvvc: i would say that this differs from vcv in that the syllable transition happens during the consonant.

for example the recording "soso" would break down into 3 usable pieces
"-so", "os" and "so".
and "koko" gives you these pieces
"-ko", "ok", and "ko"
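to make the breakdown concrete, here's a rough python sketch (not an actual utau feature or plugin) that lists the cvvc pieces in one recording. it assumes the recording is plain romaji built from consonant+vowel syllables and uses the "-CV" / "VC" / "CV" alias style from the examples; `split_syllables` and `cvvc_pieces` are made-up names for illustration.

```python
# rough sketch (not a real utau tool): list the cvvc pieces in one recording.
# assumes the recording is plain romaji made of consonant+vowel syllables
# and uses the "-CV" / "VC" / "CV" alias style from the notes.

VOWELS = set("aiueo")


def split_syllables(recording: str) -> list[str]:
    """split romaji like "soso" into cv syllables: ["so", "so"]."""
    syllables, current = [], ""
    for ch in recording:
        current += ch
        if ch in VOWELS:  # a vowel closes the current syllable
            syllables.append(current)
            current = ""
    if current:
        raise ValueError(f"trailing consonant in {recording!r}")
    return syllables


def cvvc_pieces(recording: str) -> list[str]:
    """return the usable cvvc aliases in one recording."""
    syllables = split_syllables(recording)
    pieces = ["-" + syllables[0]]  # word-initial cv, e.g. "-so"
    for prev, cur in zip(syllables, syllables[1:]):
        pieces.append(prev[-1] + cur[:-1])  # vc transition, e.g. "os"
        pieces.append(cur)                  # the following cv, e.g. "so"
    return pieces


print(cvvc_pieces("soso"))  # ['-so', 'os', 'so']
print(cvvc_pieces("koko"))  # ['-ko', 'ok', 'ko']
```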

(maybe show the waveform of "soso" & "koko" and then the corresponding oto of each part)

then show some visual of how those parts from these recordings can be separated and combined. showing them color-coded could really help (make parts from the "soso" recording red and "koko" blue or something)

[-so + ok] + [ko]
[-ko + os] + [so]

then word examples when you take the whole list into consideration

[-so + ok] + [ku]
[-ko + os] + [sa + am] + [me]
[-mi + id] + [do + or] + [ri]
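the same idea runs the other way for words. this rough python sketch (hypothetical helper names, simple romaji cv syllables assumed) builds the set of pieces a recording list provides, spells a word as a cvvc alias sequence, and notes which recording each piece comes from, like the red/blue color-coding idea:

```python
# rough sketch: collect the pieces a recording list provides, then spell a word
# as a cvvc alias sequence and note which recording each piece comes from
# (the "soso = red, koko = blue" idea). assumes plain romaji cv syllables;
# the function names are hypothetical.

VOWELS = set("aiueo")


def split_syllables(text: str) -> list[str]:
    syllables, current = [], ""
    for ch in text:
        current += ch
        if ch in VOWELS:  # a vowel closes the current syllable
            syllables.append(current)
            current = ""
    if current:
        raise ValueError(f"trailing consonant in {text!r}")
    return syllables


def cvvc_sequence(text: str) -> list[str]:
    """e.g. "soko" -> ["-so", "ok", "ko"]."""
    syl = split_syllables(text)
    seq = ["-" + syl[0]]
    for prev, cur in zip(syl, syl[1:]):
        seq += [prev[-1] + cur[:-1], cur]  # vc transition, then the next cv
    return seq


def build_inventory(recordings: list[str]) -> dict[str, str]:
    """map each alias to the first recording that provides it."""
    inventory: dict[str, str] = {}
    for rec in recordings:
        for alias in cvvc_sequence(rec):
            inventory.setdefault(alias, rec)
    return inventory


def assemble(word: str, inventory: dict[str, str]) -> list[tuple[str, str]]:
    """return (alias, source recording) pairs, or raise if a piece is missing."""
    pairs = []
    for alias in cvvc_sequence(word):
        if alias not in inventory:
            raise KeyError(f"no recording provides {alias!r}")
        pairs.append((alias, inventory[alias]))
    return pairs


inventory = build_inventory(["soso", "koko"])
print(assemble("soko", inventory))
# [('-so', 'soso'), ('ok', 'koko'), ('ko', 'koko')]
```

with only "soso" and "koko", a word like "kosame" raises a KeyError here, since it also needs "sa", "am" and "me". that's why the word examples only work once the whole list is recorded.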

(it might be easier to follow to talk about cv first, then cvvc, then vccv, then vcv last)


vccv: just say that it's basically cvvc, but it accommodates languages that use consonant clusters and consonant endings.

it adds ending consonants (VC-) and consonant clusters for parsing (-CC and CC-)


oto section
cv - i would mention something about plosive (stop) vs. continuous consonant otos. consonants like t, b, p, etc. should have a pause before them, and the overlap should be placed before the consonant sound (you can include some screenshots of cvvc otos and direct viewers to those so they can see the pause as a visual example)

as opposed to y, m, s, etc., where you do want the overlap somewhere inside the consonant area.

(show some visual of this here)

* i would include somewhere that, as a good rule of thumb, setting the overlap to exactly half of the preutterance value is generally fine for most things
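here's that rule of thumb written out as an oto.ini line in a quick python sketch. it assumes the usual field order `file.wav=alias,offset,consonant,cutoff,preutterance,overlap` with values in ms; `oto_line` is a hypothetical helper and the offset/consonant/cutoff numbers in the example are placeholders, not real measurements.

```python
# rough sketch of the "overlap = half the preutterance" rule as an oto.ini line.
# assumes the usual field order:
#   file.wav=alias,offset,consonant,cutoff,preutterance,overlap   (all in ms)
# oto_line is hypothetical; the example offset/consonant/cutoff are placeholders.
from typing import Optional


def oto_line(wav: str, alias: str, offset: int, consonant: int, cutoff: int,
             preutterance: int, overlap: Optional[int] = None) -> str:
    if overlap is None:
        overlap = preutterance // 2  # rule of thumb: half the preutterance
    fields = [offset, consonant, cutoff, preutterance, overlap]
    return f"{wav}={alias}," + ",".join(str(v) for v in fields)


print(oto_line("soso.wav", "os", 100, 300, -400, 250))
# soso.wav=os,100,300,-400,250,125
```

passing `overlap=` explicitly overrides the rule for the cases where half doesn't sound right.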

vcv: i'd mention that the best approach is to find a good base oto, then mainly move only the offset value in setparam while keeping the other values fixed (i would show a video of this in setparam here). and say that, generally, 300 preutterance and 150 overlap are good values for vcv.

if you want to get really precise, it's a good idea to slide the overlap a bit back into the previous syllable's vowel area if it's far away, and also to refine the white vowel area. (mention somewhere that you can listen to each oto section in isolation in setparam, and that it's a good idea to listen to the white vowel area this way and pick the optimal spot for looping/stretching)
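a rough python sketch of why the base-oto approach works: every entry copies the same tuned values and only the offset changes, so the whole window slides along the recording. it assumes the common oto.ini convention that a negative cutoff is measured from the offset; the aliases, the consonant/cutoff values, and the times are made-up placeholders.

```python
# rough sketch of the vcv "base oto" workflow: tune one entry, then copy it
# for every alias and only change the offset. assumes the usual oto.ini
# convention that a negative cutoff is measured from the offset, so the
# whole window slides with it. aliases and times are made-up placeholders.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Oto:
    alias: str
    offset: int        # ms from the start of the wav
    consonant: int     # end of the fixed (unstretched) area, ms after offset
    cutoff: int        # negative = end of the usable area, ms after offset
    preutterance: int  # ms after offset
    overlap: int       # ms after offset

    def marks(self) -> dict:
        """absolute positions (ms) of each marker in the wav."""
        return {
            "overlap": self.offset + self.overlap,
            "preutterance": self.offset + self.preutterance,
            "consonant_end": self.offset + self.consonant,
            # a positive cutoff counts from the end of the file, unknown here
            "cutoff": self.offset - self.cutoff if self.cutoff < 0 else None,
        }


# 300 / 150 are the values suggested in the notes; consonant/cutoff are placeholders
BASE = Oto(alias="", offset=0, consonant=400, cutoff=-600,
           preutterance=300, overlap=150)


def from_base(alias: str, offset: int, base: Oto = BASE) -> Oto:
    """copy the base oto, changing only the alias and the offset."""
    return replace(base, alias=alias, offset=offset)


entries = [from_base(a, t) for a, t in [("- ka", 900), ("a ka", 1420), ("a ki", 1950)]]
print(entries[1].marks())
# {'overlap': 1570, 'preutterance': 1720, 'consonant_end': 1820, 'cutoff': 2020}
```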

cvvc: on the VC oto, the preutterance should go at the end of the first vowel, and the overlap can be set to exactly half the preutterance value (i would say p 250 o 125 or p 200 o 100 are generally good values for VC oto).

the cutoff should go before the second consonant starts (for b, t, p, ch, etc.) or before the second vowel (s, m, n, f, h, etc.). in the white area, try to isolate the best stretchable part of s, m, f, etc. consonants, or the cleanest pause area of t, p, b, k, etc. consonants (as this is where the transition is occurring).
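the VC placement can be sketched as a small python function that turns times you've marked by ear into oto values. the argument names, the list of stop consonants, and the example times are assumptions for illustration; it only encodes the rules from these notes (preutterance on the vowel end, overlap at half, cutoff before the consonant sound for stops or before the next vowel otherwise).

```python
# rough sketch: work out a cvvc "VC" oto from times you've marked by ear (ms).
# follows the notes: preutterance at the end of the first vowel, overlap at
# half the preutterance, cutoff before the consonant sound for stops or before
# the next vowel for continuous sounds. vc_oto and its arguments are hypothetical.

STOPS = {"t", "p", "b", "k", "d", "g", "ch"}  # assumption: consonants with a pause


def vc_oto(consonant: str, vowel_end: int, stretch_start: int,
           consonant_start: int, next_vowel_start: int,
           preutterance: int = 250) -> dict:
    offset = vowel_end - preutterance  # puts the preutterance on the vowel end
    # stops: cut before the consonant sound; s, m, n, f, h...: before the next vowel
    cut_at = consonant_start if consonant in STOPS else next_vowel_start
    return {
        "offset": offset,
        "consonant": stretch_start - offset,  # fixed area ends where the white area starts
        "cutoff": -(cut_at - offset),         # negative cutoff = measured from offset
        "preutterance": preutterance,
        "overlap": preutterance // 2,         # p 250 -> o 125, p 200 -> o 100
    }


# made-up marks for the "ok" piece of "koko": first o ends at 600, clean pause
# from 620, k burst at 720, second o at 780
print(vc_oto("k", 600, 620, 720, 780))
# {'offset': 350, 'consonant': 270, 'cutoff': -370, 'preutterance': 250, 'overlap': 125}
```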

ust section:
you can mention that utau doesn't have a word dictionary like vocaloid or other programs do, so phonemes must be combined manually to form words

i'd mention something about how one of the little benefits of using utau over other programs is the extremely precise control over consonants you have with ctrl+left click (mainly for cvvc/vccv)

also a rundown of the interface, keyboard shortcuts (ctrl+g, ctrl+e), p2p3, stp0, where plugins are, etc.

you should mention that presamp is good for japanese cv or cvvc usts, especially cvvc japanese, because it lets you enter cv phonemes while it handles the cvvc transitions in the background (it does this for vcv as well)

i'd show more screenshots of different types of usts just for visual examples (cv, cvvc, english, other languages etc)

mention the lyric parser plugin for english usts (it turns words into vccv phonemes)

don't forget to mention how one big benefit of utau is being able to change the rendering engine on the fly (resamplers)

plugins: definitely mention iroiro and setparam and utaformix
 

kukuism21

Momo's Minion
Thread starter
thank you for your time! I'll be sure to include all of those!
 

cubialpha

hope it helps! i made a couple of edits to the post just now, but i think that about covers my thoughts. lots of visual/audio examples and visual breakdowns would be really helpful
 

Big_B

Ritsu's Renegades
Defender of Defoko
Cubi: I'm on mobile so don't expect too much.

*Cubi writes a better text than I could ever have written on my PC*

Me: e....e
 

cubialpha

i corrected some spelling and added one or two more things. let me know if you need critique on later drafts or need help making example screenshots. i've always wanted there to be a definitive video series for utau (but i always worry they'll have incorrect information or won't be up to my standards, etc.), so i'd love to help you make it as good as possible!
 

kukuism21

sounds great, thank you! honestly, I need all the help I can get, ha. I seriously appreciate your help!
 
