I need both speech output (as .MP3) and speech marks output (as .JSON) from Amazon Polly. Currently, following the documentation, I make two calls from the CLI, one to generate each output for the same text.
I believe that means I'm billed twice for the same text: once for the audio and a second time for the speech marks. Is this correct?
It also takes two calls and more time, and it seems to duplicate effort, since the server is presumably doing the same computation twice.
Is there a way to generate both the speech audio (MP3) and the speech marks (JSON) in a single call, and/or a way to pay once rather than twice for the same text?
Related question: I also need multiple speech variants of the same text to allow for end-user preferences (e.g. different voices, different speed/prosody). Is there a way to batch-generate multiple sets of speech output from a single call to reduce the cost in this situation, rather than paying for twice the amount of text for each small variant?
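To make the multiplication concrete, here is a rough sketch of how the requests add up for me today. It only builds the request specs and sends nothing to Polly. The dict keys mirror the boto3 `synthesize_speech` parameter names as I understand them; the `outfile` key, the function name, and the example voices and rates are my own placeholders.

```python
from itertools import product
from xml.sax.saxutils import escape


def build_variants(text, voices, rates):
    """Build one request spec per (voice, rate, output format) combination."""
    requests = []
    for voice, rate in product(voices, rates):
        # Wrap the text in SSML so the speaking rate can vary per variant.
        ssml = f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'
        for output_format in ("mp3", "json"):
            spec = {
                "VoiceId": voice,
                "TextType": "ssml",
                "Text": ssml,
                "OutputFormat": output_format,
                # Hypothetical key: where I would save the result locally.
                "outfile": f"{voice}_{rate}.{output_format}",
            }
            if output_format == "json":
                # Speech marks requests need the mark types listed explicitly.
                spec["SpeechMarkTypes"] = ["sentence", "word"]
            requests.append(spec)
    return requests


if __name__ == "__main__":
    specs = build_variants("Mary had a little lamb.", ["Joanna", "Matthew"], ["slow", "medium"])
    print(len(specs), "separate requests for one piece of text")
```

Two voices times two rates times two output formats already means eight separate requests that each carry the full text, which is the cost I'm hoping to avoid.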
I'd prefer to do this using the CLI, but I'm also fine using tasks, the JavaScript API, the Python API, etc.
Here are the docs with examples of the calls to generate audio and SpeechMarks:
https://docs.amazonaws.cn/en_us/polly/latest/dg/using-speechmarks.html
https://docs.amazonaws.cn/en_us/polly/latest/dg/get-started-cli-exercise.html
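For concreteness, this is roughly the pair of calls I mean, written as a small Python sketch that builds the two `aws polly synthesize-speech` command lines for the same text. The voice, the output file names, and the helper name are placeholders I picked for illustration; the flags are the ones I understand from the docs above.

```python
import json
import shlex


def build_polly_commands(text, voice="Joanna", basename="output"):
    """Return the two argv lists (audio, speech marks) for one piece of text."""
    common = ["aws", "polly", "synthesize-speech", "--voice-id", voice, "--text", text]
    # Call 1: the audio itself.
    audio_cmd = common + ["--output-format", "mp3", f"{basename}.mp3"]
    # Call 2: the speech marks for the exact same text.
    marks_cmd = common + [
        "--output-format", "json",
        "--speech-mark-types", json.dumps(["sentence", "word"]),
        f"{basename}.json",
    ]
    return audio_cmd, marks_cmd


if __name__ == "__main__":
    # Print the commands rather than running them, so no AWS account is needed.
    for cmd in build_polly_commands("Mary had a little lamb."):
        print(shlex.join(cmd))
```

Everything except the output format, the speech mark types, and the output file is identical between the two commands, which is why it feels like duplicated work.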
Thanks for your help