If auto-tune can change a pitch without distorting it, why can't UTAU do the same?

Azumi

Momo's Minion
i was using melodyne the other day and i changed the pitch of one of the notes i sang. i moved it down like 3 whole steps and it just sounded like i sang it that way when i recorded it, and it didn't sound robotic or anything (although i know that some people intentionally use auto-tune to add a robotic effect but that's not what i was doing)
so why can't utau do that? essentially isn't utau doing the same thing but just with pre-recorded samples? but it comes out so metallic..
also i am new here and didnt bother making an intro thread. i probably put this in the wrong category. sigh sorry.
edit: i meant to put this in UTAU for the prefix but it wont let me edit it on my phone OTL
 

Dangosan

Jellie Bellie Pete Rat Gummie Candie
Defender of Defoko
UTAU is developed independently by a hobbyist, while Melodyne is developed by people with experience.
Furthermore, the resampler corrects the pitch of the sample before moving it to a different note. The metallicness/engine noise may vary with different resamplers.
The noise can be reduced by allowing a larger chunk of the sample to be stretched, or by using looping instead of stretching.
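To make the stretch-versus-loop difference concrete, here is a minimal Python sketch. It is not how any actual resampler is implemented. It assumes the sample has already been turned into a list of per-frame values (plain numbers stand in for analysis frames), and the function names and the `start`/`end` region arguments are made up for illustration.

```python
def _split_region(frames, start, end, target_len):
    """Split frames into head, sustain region, tail; return how long the middle must be."""
    if not 0 <= start < end <= len(frames):
        raise ValueError("sustain region must lie inside the sample")
    head, body, tail = list(frames[:start]), list(frames[start:end]), list(frames[end:])
    needed = target_len - len(head) - len(tail)
    if needed < len(body):
        raise ValueError("target_len is shorter than the original sample")
    return head, body, tail, needed


def extend_by_looping(frames, start, end, target_len):
    """Lengthen the note by repeating the sustain region end to end."""
    head, body, tail, needed = _split_region(frames, start, end, target_len)
    middle = [body[i % len(body)] for i in range(needed)]
    return head + middle + tail


def extend_by_stretching(frames, start, end, target_len):
    """Lengthen the note by linearly interpolating across the sustain region."""
    head, body, tail, needed = _split_region(frames, start, end, target_len)
    # Step through the region so the first and last output frames hit its ends.
    step = (len(body) - 1) / (needed - 1) if needed > 1 else 0.0
    middle = []
    for i in range(needed):
        pos = i * step
        lo = min(int(pos), len(body) - 1)
        hi = min(lo + 1, len(body) - 1)
        frac = pos - lo
        middle.append(body[lo] * (1 - frac) + body[hi] * frac)
    return head + middle + tail
```

With a very short region, stretching spreads a handful of frames over the whole note, while looping repeats them over and over. Either way, a larger region gives the engine more real material to work with.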
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
Plenty of factors...
- Some resamplers will make a voice more metallic. Switch between various resamplers and earlier versions to find one that renders the best sounding result.

- Mic quality. There's a saying: garbage in, garbage out. If your mic is low quality, that increases the chances of a metallic sounding output. Samples that are too short also affect the quality of output.

- oto quality. A bad oto can make a library sound as bad as Defoko's original settings. As in choppy, metallic, like a speak-n-spell.
 

Azumi

Momo's Minion
Thread starter
that does make sense. however i thought oto'ing was basically sort of like cropping/trimming a sample and adjusting where it overlaps and wouldn't affect the actual tone of the voice? sure it might sound super choppy without being oto'd but if a voice is going to sound metallic wouldn't it sound metallic even without being oto'd?
 

m170

Ritsu's Renegades
Defender of Defoko
like dangosan said, i think it's because UTAU wasn't developed by professionals like Melodyne is. And Melodyne isn't free if i remember correctly so that could explain some.
It'd be nice if there was a program that could change samples' pitches without making them metallic. Maybe it would sound a lil Vocaloid-y. I wonder if there are free voice synthesizers that actually do that. In my opinion NiaoNiao, pretty much the Chinese version of UTAU, sounds less metallic but it sounds way more choppy..
 

Kiyoteru

UtaForum power user
Supporter
Defender of Defoko
If you only set a short segment of the sample to be stretched/looped then of course you'll end up with an unnatural tone. That's why longer samples are preferred for realism.
 

수연 <Suyeon>

Your friendly neighborhood koreaboo trash
Supporter
Defender of Defoko
The closest thing to software that doesn't distort the voice too much - as far as resamplers go - would probably be moresampler (tho your mileage may vary - that's why you test every resampler to find the best sounding one). That resampler more or less recreates the samples to sound like the original recordings. That's the intention at least. As for any additional metallic ring that may still exist... It's all down to the noise captured in the recording process and whether or not a particular bank has multipitch. The less range a library has to work with (1 pitch vs 6 pitches that span the singer's range for ex.), the more metallic the sound gets as you move further away from the original pitch. The breathier a voice is (or if the mic is LQ and has lots of self-noise; or noise is captured from the ambiance in the recording environment), the more noise it generates.
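The multipitch point is easier to see with numbers. The Python sketch below is only an illustration: the note names, the example banks, and the helper functions are all made up, and it says nothing about how UTAU itself picks a pitch. It finds the recorded pitch closest to the target note and reports how far the engine has to shift, in semitones and as a frequency ratio of 2^(semitones/12).

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'A4' or 'C#4' to a MIDI number (sharps only)."""
    letter, rest = name[0], name[1:]
    sharp = rest.startswith("#")
    octave = int(rest[1:] if sharp else rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + (1 if sharp else 0)


def pick_source_pitch(bank_pitches, target):
    """Return (closest recorded pitch, semitone shift, frequency ratio) for a target note."""
    target_midi = note_to_midi(target)
    best = min(bank_pitches, key=lambda p: abs(note_to_midi(p) - target_midi))
    shift = target_midi - note_to_midi(best)
    return best, shift, 2 ** (shift / 12)


# Hypothetical banks: one recorded pitch vs. three spread over the range.
print(pick_source_pitch(["A3"], "E4"))
print(pick_source_pitch(["A3", "D4", "G4"], "E4"))
```

In this made-up case the single-pitch bank has to shift 7 semitones to reach E4, while the three-pitch bank only has to shift 2, which is the "less distance from the original pitch" idea in practice.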

Melodyne is a $99-699 piece of software that is specifically made for pitch correction. It's not made to replace singing - more to correct a sour note - which is why it doesn't alter the voice in a way that sounds unnatural (unless you want to sound like T-Pain or Kanye). UTAU (and Vocaloid) are synthesizers that are intended as demo material (for ex. you're getting a demo of sorts of the vocalist Gackt before he actually sings in your studio in the flesh). UTAU in particular is made by a hobbyist for hobbyists and most hobbyists use less than HQ microphones which means that they'll have a more metallic twang. Vocaloid's synth is better at avoiding a metallic twang, but the downside is that the un-tuned vocal can sound "dead." Cevio is like UTAU in that it has the potential to sound like the actual provider with more expression, but its transitions aren't quite as smooth (can sound like a CV library that needs more tweaking) and there's more engine noise. The most realistic engine on the market is WowTune/Revivos(?). I believe that is the answer you would be looking for - a synth that doesn't distort or sound metallic away from the base pitch and really is a singer in a box. Too bad us mere mortals would never have access to it, lol.
 

na4a4a

Outwardly Opinionated and Harshly Critical
Supporter
Defender of Defoko
resamplers have to do pitch shifting, formant correction, and time stretching.
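A rough Python sketch of why all three jobs are needed. This is a deliberately naive pitch shifter that just plays the sample back faster or slower with linear interpolation. It is not what any UTAU resampler actually does, and the function name is made up.

```python
def naive_pitch_shift(samples, semitones):
    """Shift pitch by resampling only: the length and the formants change along with it."""
    if not samples:
        return []
    ratio = 2 ** (semitones / 12)  # frequency ratio for the requested shift
    out_len = max(1, int(len(samples) / ratio))
    out = []
    for i in range(out_len):
        pos = i * ratio  # read position in the original sample
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


one_second = [0.0] * 44100
print(len(naive_pitch_shift(one_second, 12)))   # an octave up leaves half the samples
print(len(naive_pitch_shift(one_second, -12)))  # an octave down leaves twice as many
```

Shifting this way changes the note length, so something has to stretch it back to the length the score asks for. It also moves the formants along with the pitch (the chipmunk effect), which is what formant correction is there to undo.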

Things like Melodyne and Autotune have had lots of money and time put into them by large teams, but the use case is also totally different: they work with real singing in the first place rather than a jumble of samples.
Even the default resampler can do pitch shifting pretty well; it's when you start stretching and doing vibrato that everything sort of falls apart a little bit. If I'm not mistaken it's based on tandem-straight, so it's not like it was 100% made by one guy.

Each resampler uses different methods. Fresamp separates samples into frames/segments. EFB-GT/GW uses WORLD magic.
Resamplers like bkh01 and moresampler more or less (re?)synthesize the voice samples. BKH01 is a bit more dirty about it and also redoes everything on the fly each time, while Moresampler saves analytic data of each sample and uses that instead of the actual wav file.
VS4u is literally just vocalshifter, so it's kinda like pre-autotune.

The default resampler actually sounds worse than the previous version in some cases. In all honesty it's much more fair to look into other resamplers for comparison.
Fresamp with the F2L2 flags will sound good, and fresamp14 has a version that can use NVIDIA cards to run really fast. EFB-GW is good for some males. BKH01 is good if you really want to mess with a voice and do something unnatural. Moresampler tries to be as close to the original source as possible but is still under heavy development, so it can have the occasional glitch. It also has a lot of custom features that transform a voice in ways that are extremely useful.

Fresamp11/14 (with flags, as its default settings aren't desirable), EFB-GT/GW, and Moresampler are three resamplers that almost never sound metallic, and if they do it's usually from sample quality issues. Resampler, Tips, tn_fnds, M4, w4u are very often metallic sounding but work for a few voices.

A lot of the time the failure of the resampler can be due to the quality of your samples: the more noise, the smaller the useful range. Also you want to use the large, CONSISTENT part of your samples, so the area you want to stretch/loop should be as uniform as possible. UTAU can sound fantastic if you have the right combination of configuration and such. No matter how long you set the vowel segment to be stretched, it will not compensate for noise in your samples; you will need to just live with that.
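One way to picture "the large, consistent part" is as the stretch of the sample where loudness changes the least. The Python sketch below is just a toy under that assumption. The function names, the frame size, and the `min_rms` silence cutoff are all made up, and it is no substitute for checking the waveform by eye when setting up the oto.

```python
import math


def frame_rms(samples, frame_len):
    """Loudness (RMS) of each non-overlapping frame of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        frames.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return frames


def most_consistent_region(samples, frame_len, window_frames, min_rms=0.1):
    """Return (start, end) sample indices of the steadiest window that is not silence."""
    rms = frame_rms(samples, frame_len)
    best_start, best_var = None, float("inf")
    for i in range(len(rms) - window_frames + 1):
        window = rms[i:i + window_frames]
        mean = sum(window) / window_frames
        if mean < min_rms:
            continue  # skip silence, which is uniform but useless to stretch
        var = sum((v - mean) ** 2 for v in window) / window_frames
        if var < best_var:
            best_start, best_var = i, var
    if best_start is None:
        raise ValueError("no window is loud enough to use")
    return best_start * frame_len, (best_start + window_frames) * frame_len
```

Loudness is only one kind of consistency. A vowel that drifts in pitch or tone can still be steady in volume, so treat the result as a starting point rather than an answer.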
 
