Thuum.org

A community for the dragon language of The Elder Scrolls V: Skyrim

Project: Dovahzul Text-to-Speech

Frinmulaar
February 6, 2015

I have been gobbling up any and all information about Dovahzul for a week now, and it seems to me that this language is perfectly suited for the other thing I know how to do. Which is TTS.

Text-to-speech (TTS) systems are software programs that convert written language to audible speech. They're used as screen readers for the visually impaired, in very small screenless mobile devices, and for announcing the names of stations in some trains. Since I learned how to write Java code, I've been fascinated by audio systems, particularly human-mimicking ones. And it just so happens that Dovahzul is regularly pronounced as spelled, and doesn't require extensive knowledge of word stresses.

So, what I'm thinking of is a simple text window. The user types Dovahzul phrases, presses a button, and hears the input text spoken in the breathy bass voice of a dragon. This could perhaps be used to learn pronunciation at some point.

I think this might just be possible given what I know about programming and recording. Should anything decent come out of it, I will share the program. What do you think?


paarthurnax
Administrator
February 6, 2015

This would be great! If there's any way to make a web-friendly TTS program (versus one you have to install), that would be even more amazing.


linkjr87
February 8, 2015

This would be awesome. I'm new to this, but trying to learn as much as I can, studying some every day. This would be a great tool for beginners like me to learn the proper pronunciation of words in Dovahzul.


Frinmulaar
February 9, 2015

Thanks for the kind words! I have now written the "text normalization unit", which turns raw input text into an exact description of what to pronounce and when. Input is accepted in both the "bahlok wah diivon fin lein" format and the "b4lok w4 d3von fin l2n" format. I'm also studying how to make applications into applets for easy use as part of a web page.
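A minimal sketch of how the two input formats could be brought into one form, in Java. This is not the project's actual code: the class name DigitShorthand and the method expand are made up for illustration, and the table only contains the three digit-to-vowel pairs that can be read off the example phrase above (2 = ei, 3 = ii, 4 = ah). Any other digits are an open question here and are passed through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: expands digit shorthand into spelled-out vowels.
class DigitShorthand {
    // Only the pairs visible in "b4lok w4 d3von fin l2n" are listed.
    static final Map<Character, String> DIGITS = new HashMap<>();
    static {
        DIGITS.put('2', "ei");
        DIGITS.put('3', "ii");
        DIGITS.put('4', "ah");
    }

    // Lowercases the input and replaces each known digit with its vowel.
    static String expand(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toLowerCase().toCharArray()) {
            String vowel = DIGITS.get(c);
            if (vowel != null) {
                out.append(vowel);
            } else {
                out.append(c); // letters, spaces and unknown digits stay as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("b4lok w4 d3von fin l2n"));
    }
}
```

With this table, the shorthand phrase expands to the spelled-out one, so later stages would only have to deal with a single format.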

What remains untouched is the detailed nature of the voice and the sound system. I'll post again when I've made more progress.


linkjr87
February 9, 2015

Pruzah! I've discovered the Memrise learning programs, and started them last night. This TTS combined with that would be a great help to anyone new to learning Dovahzul.


Maakrindah
April 24, 2015

That sounds awesome!! Do you think the TTS could be made into a mobile app? Either way I'm impressed and delighted.


PapercutPegasus
May 4, 2015

Still eagerly waiting


Frinmulaar
May 5, 2015

Okay, here's a sample of what I'm struggling with. Krosis for the wait.

The plan is to record three units per vowel rune: accented, monotone and falling. That sample is entirely monotone, so its emphasis sounds essentially random, but the other styles should help somewhat once they come along.

For those interested, the sound is made with a mixed phone-diphone approach in a CV+C pattern. "Suleykaar" is su+ley+kaa+r.
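A rough Java sketch of that CV+C split, for anyone who wants to play with it. This is an illustration under assumptions, not the project's code: the class name CvcSegmenter, the method names, and the short vowel list are all hypothetical, and the list only covers the vowels seen in this thread plus the five plain ones. Consonant runes written with two letters are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical CV+C segmenter: consonant+vowel units, plus lone consonants.
class CvcSegmenter {
    // Assumed vowel inventory, two-letter vowels first so "aa" beats "a".
    static final String[] VOWELS = {"aa", "ah", "ei", "ey", "ii", "a", "e", "i", "o", "u"};

    // Returns the vowel starting at index i, or null if there is none.
    static String vowelAt(String s, int i) {
        for (String v : VOWELS) {
            if (s.startsWith(v, i)) {
                return v;
            }
        }
        return null;
    }

    static List<String> segment(String word) {
        List<String> units = new ArrayList<>();
        String w = word.toLowerCase();
        int i = 0;
        while (i < w.length()) {
            String v = vowelAt(w, i);
            if (v != null) {              // vowel with no consonant before it
                units.add(v);
                i += v.length();
                continue;
            }
            String next = vowelAt(w, i + 1);
            if (next != null) {           // consonant + vowel: one CV unit
                units.add(w.charAt(i) + next);
                i += 1 + next.length();
            } else {                      // consonant with no vowel after it
                units.add(String.valueOf(w.charAt(i)));
                i++;
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(segment("suleykaar")); // [su, ley, kaa, r]
    }
}
```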

I can still change the units pretty freely at this point. Comments on pitch, timbre, rate of speech? The perfect combination is yet to be found, but we'll get there.


hiith
May 5, 2015

Neat! I hope that eventually we could use this with Speech-to-Text to have Draconic conversations with computers and robots. It's a very nice vision I have, but it's not like I'm going to do it.

It speaks too quickly, I believe. It's also very difficult to understand (but I've always had a problem with understanding speech, so I might be exaggerating). I'm hoping that the additional styles will make it seem clearer. It may also be a bit too deep of a tone.

Keep up the nice work!


Frinmulaar
May 7, 2015

Thank you for the very well worded critique, hiith! It'll be a pleasure to solve these issues.

I'm not quite satisfied with the intelligibility either. CV+C can only go so far, so I think I'll go full diphone with vowel cores. That would make "suleykaar" s(u)+u+l(e)+ey+k(a)+aa+(a)r. More recording, but consistently better results.
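A small Java sketch of that unit list, to show the pattern. Treat it as a guess at the scheme rather than the real thing: the class name DiphoneSketch, the vowel list, and the rule for a consonant after a vowel (it takes the first letter of the preceding vowel as context) are assumptions that merely reproduce the one example above. A consonant that sits between two vowels is written in its onset form only, as in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diphone-with-vowel-cores splitter.
class DiphoneSketch {
    // Assumed vowel inventory, two-letter vowels first so "aa" beats "a".
    static final String[] VOWELS = {"aa", "ah", "ei", "ey", "ii", "a", "e", "i", "o", "u"};

    // Returns the vowel starting at index i, or null if there is none.
    static String vowelAt(String s, int i) {
        for (String v : VOWELS) {
            if (s.startsWith(v, i)) {
                return v;
            }
        }
        return null;
    }

    static List<String> toUnits(String word) {
        List<String> units = new ArrayList<>();
        String w = word.toLowerCase();
        String prevVowel = null; // vowel directly before the current position
        int i = 0;
        while (i < w.length()) {
            String v = vowelAt(w, i);
            if (v != null) {                 // vowel core
                units.add(v);
                prevVowel = v;
                i += v.length();
                continue;
            }
            char c = w.charAt(i);
            String next = vowelAt(w, i + 1);
            if (next != null) {              // consonant into vowel, e.g. s(u)
                units.add(c + "(" + next.charAt(0) + ")");
            } else if (prevVowel != null) {  // vowel into consonant, e.g. (a)r
                units.add("(" + prevVowel.charAt(0) + ")" + c);
            } else {                         // consonant with no vowel neighbour
                units.add(String.valueOf(c));
            }
            prevVowel = null;
            i++;
        }
        return units;
    }

    public static void main(String[] args) {
        // Prints [s(u), u, l(e), ey, k(a), aa, (a)r]
        System.out.println(toUnits("suleykaar"));
    }
}
```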

Too quick, you say? If that's still a problem with the new units, I'll look into it. Paarthurnax speaks about 150 syllables per minute and the current system about 110, and I wouldn't like to slow it down more unless absolutely necessary.

As for the pitch, you're right. Turns out it's too deep by a major sixth.
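For scale: a major sixth is nine semitones, so in equal temperament the voice would have to come up by a frequency factor of 2^(9/12), about 1.68. A tiny Java sketch of that arithmetic follows; the class and method names are made up for illustration, and equal temperament is an assumption (a just major sixth would be 5/3 instead).

```java
// Hypothetical helper for converting a musical interval to a frequency ratio.
class PitchShift {
    // Equal-tempered ratio for a shift of the given number of semitones.
    static double ratioForSemitones(int semitones) {
        return Math.pow(2.0, semitones / 12.0);
    }

    public static void main(String[] args) {
        int majorSixth = 9; // semitones in a major sixth
        System.out.println(ratioForSemitones(majorSixth));
    }
}
```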


Dezonikso
May 7, 2015

I am sooooo excited for this to finally be finished! I feel like the syllables per minute measurement you gave for Paarthurnax should be taken as an average for more of the speaking dragons in the game. Perhaps Alduin, Odahviing, Durnehviir, and Sahrotaar's syllables per minute? I'd also be interested to know if they speak faster in English or Dovahzul (unless that's what you measured when you mentioned it before?).


HugoC4
May 7, 2015

Hey Frinmulaar, I honestly think this is such a cool idea. And seeing as that sound sample already sounds pretty great, I can't wait to see/hear the finished thing.
I was wondering. Are you just gonna keep it with one voice?
I don't know how difficult the recording process for a voice is, but if it's doable by listening to existing examples I could maybe record one too.
Just wondering. ^^
Is this thing going to be open source btw? I'm studying to become a programmer and I'd love to see something like this. c:


Frinmulaar
May 7, 2015

Hi, Hugo. You flatter me! The question of multiple voices is interesting, and the answer is by no means set in stone. Right now I think it's better to have one generic voice, rather than having "Dragon 1", "Dragon 2", and so on. Dragons don't even have genders, so there's little variation to be captured.

Co-producing a voicebank would be quite the feat indeed. Thing is, I have previously recorded exactly one diphone bank, and that one was far less complex. Before we even consider such an attempt, I'd like to find out exactly what works and what doesn't, so we don't end up wasting time and effort.

Also, code licensing is a big unknown to me atm. If and when the project becomes more than a bug-filled skeleton, it might be worthwhile to release it as a "modify and distribute freely as long as you credit me and include this text" type thing.


HugoC4
May 8, 2015

^^ You're welcome.
Of course, that would be the best idea for now. Although, maybe you could have a dragon voice and a Greybeard, or human, voice. Maybe that'd help with people who are just starting out on Dovahzul, like me. XD

It would be quite the experience for the both of us I think. :P Ah, okay. 

Ah, it kinda is to me too, but I presume there must be some sort of license for that. :P

