First, you cut the silence off the beginning of the syllable.
Next, the part in pink is the consonant and the start of the vowel. UTAU doesn't stretch or warp this part.
After that, the white area is the rest of the vowel. This is the part that gets stretched, warped, crossfaded, and blended into other vowels.
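
If it helps to see the three areas as data, here is a rough Python sketch of the idea. It is not how UTAU actually does it internally; the function name `split_regions` and the parameters `offset_ms` and `fixed_ms` are made up for illustration, and it assumes the recording is just a list of samples with a known sample rate.

```python
def split_regions(samples, sample_rate, offset_ms, fixed_ms):
    """Split a recording into (trimmed, fixed, stretchable) parts.

    Hypothetical helper for illustration only, not UTAU's own code.
    offset_ms: silence cut off the beginning of the syllable.
    fixed_ms:  length of the pink area (consonant + start of vowel).
    """
    def ms_to_index(ms):
        # Convert a time in milliseconds to a sample index.
        return int(round(ms * sample_rate / 1000))

    # Clamp both boundaries so they never run past the end of the recording.
    start = min(ms_to_index(offset_ms), len(samples))
    fixed_end = min(start + ms_to_index(fixed_ms), len(samples))

    trimmed = samples[:start]            # leading silence, thrown away
    fixed = samples[start:fixed_end]     # pink area, left untouched
    stretchable = samples[fixed_end:]    # white area, stretched and blended
    return trimmed, fixed, stretchable


# Example: a fake 1-second recording at 1000 samples per second.
recording = list(range(1000))
trimmed, fixed, stretchable = split_regions(recording, 1000, offset_ms=100, fixed_ms=200)
print(len(trimmed), len(fixed), len(stretchable))
```

Only the last part would ever be handed to whatever does the stretching; the middle part is played back as recorded.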
Lastly...