Amazon Chime SDK live transcription automatic language identification inaccuracies/instabilities


We currently use the Amazon Chime SDK for our web-based meeting platform and have enabled multilingual meetings using the Amazon Chime SDK live transcription automatic language identification feature. However, we are experiencing many inaccuracies and instabilities with this feature. For example, if we configure a meeting to support English, Spanish, and Japanese transcriptions and have attendees join who each speak one of those languages, the language identification quickly ends up all over the place, and it gets worse as time goes on. As individuals reload and rejoin the meeting, other attendees who have already been identified as speaking one language are apparently re-identified as speaking another, which produces incorrect transcripts because the language they are speaking no longer matches the language the Transcribe service is using to interpret their speech. Leaving and re-joining the meeting in order to switch the language you are speaking also seems to fall victim to, and perhaps even exacerbate, the issue.

The following is an example of the options we are using with the StartMeetingTranscriptionCommand:

import { StartMeetingTranscriptionCommand } from '@aws-sdk/client-chime-sdk-meetings';

// awsMeetingId is the MeetingId returned by CreateMeeting for this meeting.
const command = new StartMeetingTranscriptionCommand({
  MeetingId: awsMeetingId,
  TranscriptionConfiguration: {
    EngineTranscribeSettings: {
      // Let Transcribe identify the spoken language from the configured options.
      IdentifyLanguage: true,
      LanguageOptions: 'en-US,es-US,ja-JP',
    },
  },
});
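
For context, this is roughly how we send the command. This is a minimal sketch assuming the @aws-sdk/client-chime-sdk-meetings v3 client; the region shown is only an example and error handling is simplified:

import { ChimeSDKMeetingsClient } from '@aws-sdk/client-chime-sdk-meetings';

// Example region only; in practice we use the region the meeting was created in.
const chimeMeetingsClient = new ChimeSDKMeetingsClient({ region: 'us-east-1' });

// Starts Transcribe-powered live transcription for every attendee in the meeting.
await chimeMeetingsClient.send(command);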

The following behaviors have been observed:

  • A single user can join a meeting by themselves, speak English, and have their language correctly detected and transcribed in English. The same user can then reload and rejoin alone speaking Spanish, then again speaking Japanese, then again speaking English, and each time the spoken language is correctly detected and transcribed. Note: under the covers a different meeting is created each time in this case, because the user is alone and there are no other attendees to keep the meeting open.
  • User 1 can join a meeting muted, contribute nothing to the meeting's transcriptions, and simply keep the meeting open for user 2. User 2 can then join, speak English, and be correctly detected and transcribed in English. User 2 can then reload and rejoin speaking Spanish, then Japanese, then English again, and each time the spoken language is correctly detected and transcribed. User 1 observes all of this correctly in the transcriptions.
  • User 1 can join a new meeting, speak English, and be identified and transcribed in English correctly. User 2 can join the same meeting, speak Spanish, and be identified and transcribed in Spanish correctly. User 3 can then join, speak English, and be identified as speaking Spanish. If user 1 or user 2 refreshes and continues speaking their original language, they are almost always identified as speaking the incorrect language. Other users who have been in the meeting and did not refresh are also re-identified as speaking different languages.

We have run many variations of these tests. In every case, the following is what we expect to occur:

  • Any user can join a meeting and speak any of the configured transcription languages and expect to be detected and transcribed in that language.
  • Any user can reload and re-join a meeting (i.e. as a brand-new attendee), begin speaking a different language, and expect to be detected and transcribed in that language.
  • No users should ever be identified as speaking one language and then switched over to speaking a different language, regardless of what other attendees are doing (e.g. reloading/re-joining).

Any help or insights would be greatly appreciated; at this point we are at a bit of a loss. We understand that at least 2 and at most 5 languages should be configured when using the language identification feature, for performance reasons. However, we have yet to get accurate results once individuals speaking different languages begin and continue contributing to the meeting.

Finally, is there any way to set the language manually on a per-attendee basis, e.g. by offering attendees an option to simply select the language they will be speaking? I have not found this anywhere in the docs, but if it exists it would seem to solve our current conundrum. The only alternative we are aware of is a meeting-wide fixed language, as sketched below.
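
For illustration, this is what we mean by a meeting-wide fixed language, if we understand the API correctly: a minimal sketch that swaps IdentifyLanguage for a single LanguageCode (the value shown is just an example). Because it applies to every attendee, it does not help when attendees speak different languages:

import { StartMeetingTranscriptionCommand } from '@aws-sdk/client-chime-sdk-meetings';

// Fixed-language configuration: every attendee in the meeting is transcribed
// as en-US, regardless of the language they actually speak.
const fixedLanguageCommand = new StartMeetingTranscriptionCommand({
  MeetingId: awsMeetingId,
  TranscriptionConfiguration: {
    EngineTranscribeSettings: {
      LanguageCode: 'en-US',
    },
  },
});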

Thank you for your time and help in advance.

Tim
asked 24 days ago · 123 views
No Answers
