The best approach for real-time rendering would be a multithreaded process that splits the rendering across several threads: the first one renders the first half of the song, and the others render the rest. @piachuk
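To make the idea concrete, here is a rough sketch of splitting a song into chunks and rendering them on a thread pool. It is only an illustration: `render_chunk`, `split_chunks` and `render_song` are made-up names, and the "rendering" is a placeholder that tags each note instead of calling a real resampler. Threads would mostly pay off if the real render step spends its time waiting on an external resampler process.

```python
from concurrent.futures import ThreadPoolExecutor


def render_chunk(notes):
    # Hypothetical stand-in for the real resampler call:
    # produces one placeholder "sample" per note.
    return [f"rendered:{n}" for n in notes]


def split_chunks(notes, n_chunks):
    # Split the note list into at most n_chunks contiguous pieces.
    size = max(1, -(-len(notes) // n_chunks))  # ceiling division
    return [notes[i:i + size] for i in range(0, len(notes), size)]


def render_song(notes, n_chunks=4):
    chunks = split_chunks(notes, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # map() returns results in the order the chunks were submitted,
        # so the start of the song always comes back first.
        rendered = list(pool.map(render_chunk, chunks))
    # Flatten the per-chunk results back into one sequence.
    return [sample for chunk in rendered for sample in chunk]
```

Because the results come back in chunk order, the pieces can be joined without any extra bookkeeping, whichever thread finishes first.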
Most of the slowness comes from either making things single-threaded when they should be multi-threaded (loading voicebanks for example) or loading UI elements all at once instead of only when the user scrolls to them.
That said, the UTSU engine is separate from the main UI and could definitely use more optimizing. I'd be interested to see a C++ take on it, especially if it could efficiently handle temp files and caching! Real-time rendering (for every resampler except moresampler :[ ) is also a possibility as long as you start playing the first notes of a song while the later ones are still being added by the wavtool.
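The "start playing while the rest is still rendering" part is basically a producer/consumer setup. A minimal sketch, assuming a hypothetical `render_note` function and a `play` callback (neither is UTSU's actual API): one thread renders notes in order and pushes them onto a bounded queue, while the caller plays each one as soon as it arrives.

```python
import queue
import threading


def render_note(note):
    # Hypothetical stand-in for resampler + wavtool output for one note.
    return f"audio:{note}"


def producer(notes, q):
    # Render notes in song order and hand each one off immediately.
    for note in notes:
        q.put(render_note(note))
    q.put(None)  # sentinel: nothing more to play


def play_streaming(notes, play):
    # The bounded queue stops the renderer from running arbitrarily
    # far ahead of playback.
    q = queue.Queue(maxsize=8)
    worker = threading.Thread(target=producer, args=(notes, q), daemon=True)
    worker.start()
    while True:
        item = q.get()  # blocks until the next note is ready
        if item is None:
            break
        play(item)
    worker.join()
```

Playback can begin as soon as the first note is rendered, and it only stalls if rendering falls behind the playhead.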
One way to help real-time rendering, for instance when moving notes, would be to keep a cache of prerendered "la" samples for the voicebank, one per pitch, that can be played back without any processing. That would also help anyone who has difficulty identifying the note.
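The cache idea could look something like the sketch below. All names here (`PreviewCache`, `prerender`, the `render` callback and its arguments) are hypothetical; the point is just that each (voicebank, pitch) pair gets rendered once and is served from memory afterwards.

```python
class PreviewCache:
    """Keeps one prerendered "la" sample per (voicebank, pitch)."""

    def __init__(self, render):
        # render(voicebank, lyric, pitch) is an assumed callback that
        # does the slow work; it is not a real UTSU function.
        self._render = render
        self._cache = {}

    def get(self, voicebank, pitch):
        key = (voicebank, pitch)
        if key not in self._cache:
            # Slow path: render once, then reuse on every later request.
            self._cache[key] = self._render(voicebank, "la", pitch)
        return self._cache[key]

    def prerender(self, voicebank, pitches):
        # Warm the cache up front, e.g. right after a voicebank loads.
        for pitch in pitches:
            self.get(voicebank, pitch)
```

Dragging a note to a new pitch then only needs a dictionary lookup, so the preview sound can play immediately.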