Here is the audio file for Aggie's voice that I used with ZonosTTS to generate the lines. It's only 18 seconds of audio, so even on a lower-end machine it shouldn't take long to get Zonos rolling with it.
>>60274
Re-rolling every line until it's decent, I guess. Maybe I've got a decent grip on the sliders at this point.
>>60281
There are a few tools involved, but anons could do this themselves.
1. Find decent source for voice cloning.
(No background noise. No reverb. Emotional range can help.)
2. Gen audio in small sections. Keep re-rolling until it's decent.
3. Move to the next line until done, then edit the lines together in Tenacity.
4. Grab sound effects off YouTube to use.
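For the "gen audio in small sections" step, here is a rough sketch of chopping a script into short chunks before pasting them into the TTS one at a time. It's plain Python and doesn't touch Zonos at all. The function name and the 120-character limit are my own guesses, not anything from the Zonos UI, so tune the limit to whatever length gens cleanly for you.

```python
import re

def split_into_sections(script: str, max_chars: int = 120) -> list[str]:
    """Split a script into short sections, breaking only at sentence ends.

    max_chars is a hypothetical limit, not a Zonos setting.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    sections: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current section.
            sections.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections

if __name__ == "__main__":
    for i, line in enumerate(split_into_sections("Hello there. How are you? Fine!", 20), 1):
        print(i, line)
```

A single sentence longer than the limit is kept whole rather than cut mid-sentence, since a cut there would probably sound worse than a long gen.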
Become familiar with the speaking rate, emotion sliders, and pitch std. Don't overdo them. You usually only need about 2 emotions active at a time, and don't crank them very high (uncheck Emotion under unconditional). The Core Zonos UI says what pitch std you should use for what.
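If you want to sanity-check slider values before typing them in, the "about 2 emotions, not cranked" rule of thumb can be sketched like this. The slider names and the 0.6 cap are placeholders I made up for illustration. This isn't Zonos code and doesn't call it.

```python
def limit_emotions(
    sliders: dict[str, float],
    max_active: int = 2,   # rule of thumb from the post, not a Zonos limit
    cap: float = 0.6,      # hypothetical ceiling, pick your own
) -> dict[str, float]:
    """Keep only the strongest few emotion sliders and cap their values."""
    # Rank slider names from strongest to weakest.
    ranked = sorted(sliders, key=lambda name: sliders[name], reverse=True)
    # Only sliders that are actually above zero count as active.
    keep = {name for name in ranked[:max_active] if sliders[name] > 0}
    return {
        name: (min(value, cap) if name in keep else 0.0)
        for name, value in sliders.items()
    }

if __name__ == "__main__":
    print(limit_emotions({"happiness": 0.9, "sadness": 0.1, "anger": 0.4, "neutral": 0.0}))
```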
Tools used:
Boris FX CrumplePop Pro (to remove echo)
magnet:?xt=urn:btih:BD2F45110F041447CEB94AFB61DC6D88BC771F44
Core Zonos (voice cloning)
https://github.com/loscrossos/core_zonos
Audio Strip (separate vocals from background noise)
https://audiostrip.com/isolate
Tenacity (audio editor. get one with ffmpeg as it supports more formats)
https://codeberg.org/tenacityteam/tenacity/releases
JDownloader (search YouTube for sound effects & download as AAC with this)
https://jdownloader.org/download/index
There are many other downloaders for YouTube, but this one is constantly updated.
Zonos can also place an audio clip before what it is generating to try to make something with a similar tone/emotion, but I haven't had much success with that yet so I haven't used it.
cunny is more often pronounced correctly when spelled kuh'nee
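The respelling trick can be automated so you don't have to hand-edit every line before genning. A minimal sketch, assuming you keep your own table of word-to-spelling swaps. The "cache" entry below is a made-up example, so fill the table with whatever respellings actually work for you.

```python
import re

def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Replace whole words with phonetic spellings before sending text to TTS."""
    for word, spoken in respellings.items():
        # \b keeps the match to whole words; IGNORECASE also catches capitalized forms.
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        # A function replacement avoids backslash/group escapes in `spoken`.
        text = pattern.sub(lambda _match: spoken, text)
    return text

if __name__ == "__main__":
    table = {"cache": "kash"}  # hypothetical entry
    print(apply_respellings("Cache the file, then clear the cache.", table))
```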
Personally, I'm more into video generation lately with vocal gen secondary. Video takes ages, but I don't think you need a beefy machine for vocal gen so I'm sure other people can do it.