In this article we propose two algorithms for interpreting the prosodic features of discourse.
The first algorithm is based on wide phonetic categories; the second is based on melodic
cross-correlation functions and short-time energy series of the audio signal. Both algorithms,
together with methodological recommendations for their use, are presented as part of the problem
of language identification of audio signals using a prosodic approach. An experimental evaluation
of both algorithms is presented. Neural networks are used as the decision rule. The wide phonetic categories were initially pause,
pitch, and noise. We expanded this set to pause, pitch, noise, five levels of
pitch, segments of decreasing energy, main maximum, and adverse maximum. The total number of
categories was 14. These algorithms can be applied to language identification or speaker
identification. The speech signal does not need to be reconstructed after it has been
processed by a low-bit-rate codec, provided that the frames of the speech codec contain such
parameters as pitch, a tone-noise parameter, and energy. The speech database consists of 10
languages, with 10 speakers per language. The total speech duration per speaker is 100 minutes.
This duration was chosen to account for the statistical regularities of the languages. The
algorithms were evaluated with a multilayer perceptron.
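
As an illustration only, the three initial wide phonetic categories (pause, pitch, noise) could be derived directly from codec frame parameters roughly as sketched below. The `Frame` fields, the `pause_energy` threshold, and the order of the decisions are assumptions made for this sketch; they are not taken from the article, and the expanded set of categories is not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One codec frame (hypothetical layout, assumed for this sketch)."""
    pitch_hz: float  # pitch estimate; 0.0 when the codec reports no pitch
    voiced: bool     # tone-noise parameter: True = tone, False = noise
    energy: float    # frame energy in linear units


def label_frame(frame: Frame, pause_energy: float = 1e-4) -> str:
    """Map one frame to a wide phonetic category.

    The threshold value is an assumption, not a figure from the article.
    """
    if frame.energy < pause_energy:
        return "pause"   # too quiet to count as speech
    if frame.voiced and frame.pitch_hz > 0.0:
        return "pitch"   # tonal frame with a usable pitch estimate
    return "noise"       # audible but not tonal


def label_frames(frames: List[Frame], pause_energy: float = 1e-4) -> List[str]:
    """Label a sequence of frames, preserving their order."""
    return [label_frame(f, pause_energy) for f in frames]


frames = [
    Frame(pitch_hz=0.0, voiced=False, energy=0.00001),
    Frame(pitch_hz=120.0, voiced=True, energy=0.2),
    Frame(pitch_hz=0.0, voiced=False, energy=0.05),
]
labels = label_frames(frames)
print(labels)
```

Because the labels depend only on pitch, the tone-noise parameter, and energy, a sequence of this kind can in principle be built from codec frames without reconstructing the waveform, and then passed to a classifier such as a multilayer perceptron.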