Java – speaker recognition using MARF

I am using MARF (Modular Audio Recognition Framework) to recognize speakers' voices.

But I do not get the correct result: the trained voice is different from the test voice, yet MARF still reports that the audio samples match.

I have also gone through this question:

https://stackoverflow.com/questions/4837511/speaker-recognition

Result:

Config: [SL: WAVE,PR: NORMALIZATION (100),FE: FFT (301),CL: EUCLIDEAN_DISTANCE (503),ID: -1]
         Speaker's ID: 26
   Speaker identified: G

Am I doing something wrong, or are there other speaker recognition methods available?

I am now trying vtext, which is easy to use (http://basic-signalprocessing.com/voiceRecognition.php). According to that page, vtext also uses MATLAB to produce its output.

I got the correct frequency and time domain plots, but I could not compare the two sound samples. I received this error:

Exception: com.mathworks.toolbox.javabuilder.MWException: Error using ==> eq
Matrix dimensions must agree.
{??? Error using ==> eq
Matrix dimensions must agree.

Error in ==> recognizePartial10k at 10


}

Does anyone have any idea about this?

Solution

First of all, in my experience, the FFT algorithm will not give you the best results: try LPC in MARF.

Second: MARF assumes what people call a "closed set" of speakers, which means that it will always return a result, even if the speaker is unknown to the system. You have to decide how plausible the answer is by applying a threshold to the distance.
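To make the threshold idea concrete, here is a minimal sketch in plain Java. It is not MARF's API: the class `OpenSetMatcher`, the method `identify` and the threshold value are all hypothetical, and the feature vectors stand in for whatever your feature extraction step produces. The closest speaker is only accepted when its distance stays under the threshold; otherwise the sample is treated as unknown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of MARF): rejects matches that are too far away. */
public class OpenSetMatcher {

    /** Plain Euclidean distance between two feature vectors of equal length. */
    public static double euclideanDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the name of the closest enrolled speaker, or null when even the
     * closest one is farther away than maxDistance (treated as "unknown").
     */
    public static String identify(Map<String, double[]> enrolled,
                                  double[] sample, double maxDistance) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : enrolled.entrySet()) {
            double d = euclideanDistance(entry.getValue(), sample);
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }

    public static void main(String[] args) {
        // Made-up feature vectors, for illustration only.
        Map<String, double[]> enrolled = new LinkedHashMap<>();
        enrolled.put("G", new double[] {1.0, 2.0, 3.0});
        enrolled.put("H", new double[] {4.0, 0.0, 1.0});

        // The threshold is an assumption; it has to be tuned on real recordings.
        double maxDistance = 0.5;
        System.out.println(identify(enrolled, new double[] {1.1, 2.0, 3.0}, maxDistance));
        System.out.println(identify(enrolled, new double[] {9.0, 9.0, 9.0}, maxDistance));
    }
}
```

A workable threshold depends on the features and the distance measure, so it is usually found by comparing distances for known matches and known non-matches in your own data.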

Also make sure that the sliding window (Hamming window) size is set according to the sampling rate of the file: for example, a window of 512 samples at a sampling rate of 22050 Hz yields a window of about 23 milliseconds. This kind of window returned the best results on a data set of 500 speakers.

Since 22050 Hz means 22050 samples per second, the number of samples needed for a window of about 25 ms can be found for any sampling rate: sampling rate / 1000 * 25

Note that the FFT algorithm used in MARF requires a window size that is a power of 2 (256 / 512 / 1024 / ...)

This is not necessary for the LPC algorithm (although a power of 2 may be slightly more efficient for the processor, since powers of 2 are what it knows best :-)
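The window arithmetic above can be sketched as follows. The class and method names are hypothetical and nothing here calls MARF; it only applies the formula sampling rate / 1000 * 25 and, for the FFT case, picks the closest power of 2.

```java
/** Hypothetical helper: window sizes in samples for a target duration. */
public class WindowSize {

    /** Number of samples covering the given duration: sampleRate / 1000 * millis. */
    public static int samplesFor(double sampleRateHz, double millis) {
        return (int) Math.round(sampleRateHz / 1000.0 * millis);
    }

    /** Power of two closest to n (ties go to the larger one), for FFT-based extraction. */
    public static int nearestPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 29)) {
            throw new IllegalArgumentException("n must be between 1 and 2^29");
        }
        int lower = Integer.highestOneBit(n); // largest power of two <= n
        int upper = lower << 1;               // next power of two above it
        return (n - lower) < (upper - n) ? lower : upper;
    }

    /** Duration in milliseconds actually covered by a window of the given size. */
    public static double durationMillis(int windowSamples, double sampleRateHz) {
        return windowSamples * 1000.0 / sampleRateHz;
    }

    public static void main(String[] args) {
        double[] rates = {8000, 11025, 22050, 44100};
        for (double rate : rates) {
            int exact = samplesFor(rate, 25);
            int fft = nearestPowerOfTwo(exact);
            System.out.printf("%.0f Hz: %d samples for 25 ms, FFT window %d (%.1f ms)%n",
                    rate, exact, fft, durationMillis(fft, rate));
        }
    }
}
```

With LPC the exact sample count can be used directly; with FFT the rounded power of 2 shifts the real window length a little away from 25 ms.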

Oh, and don't forget: if you use stereo files, the window has to be twice as large... But I recommend using mono files: multi-channel files add no value for voice processing, and processing them takes longer and is less accurate.
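If you only have stereo recordings, one common way to get a mono signal is to average the two channels. The sketch below assumes 16-bit samples that are already decoded and interleaved as left, right, left, right, ...; the class name `StereoToMono` is hypothetical.

```java
/** Hypothetical helper: averages interleaved 16-bit stereo samples into mono. */
public class StereoToMono {

    /** Input layout is assumed to be L0, R0, L1, R1, ... */
    public static short[] downmix(short[] interleavedStereo) {
        if (interleavedStereo.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of samples");
        }
        short[] mono = new short[interleavedStereo.length / 2];
        for (int i = 0; i < mono.length; i++) {
            // Widen to int so the sum cannot overflow the 16-bit range.
            int left = interleavedStereo[2 * i];
            int right = interleavedStereo[2 * i + 1];
            mono[i] = (short) ((left + right) / 2);
        }
        return mono;
    }

    public static void main(String[] args) {
        short[] stereo = {100, 200, -50, 50, 32767, 32767};
        short[] mono = downmix(stereo);
        System.out.println(java.util.Arrays.toString(mono));
    }
}
```

Under this layout a window of 512 values covers only 256 points in time, which is why the window size doubles for stereo input.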

A word about sampling rate: the sampling rate you choose should be twice the highest frequency you are interested in. People generally consider the highest frequency of speech to be 4000 Hz, so a sampling rate of 8000 Hz is often chosen. Note that this is not entirely true: "s" and "sh" sounds reach higher frequencies. Indeed, you don't need those frequencies to understand what the speaker is saying, but a wider spectrum may be useful when extracting a speaker's vocal characteristics. My preference is 22050 Hz. Some voice encoding packages do not let you go below 11000 Hz.

A word about bit depth: 8 bits vs. 16 bits. While the sampling rate is about precision in time, the bit depth is about precision in amplitude. 8 bits give you 256 values and 16 bits give you 65536 values.

Needless to say why you should use 16 bits for voice biometrics :-)

As a reference, audio CDs use 44100 Hz / 16 bits.
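To see what a given file actually contains, the standard Java Sound API can report its sampling rate, bit depth and channel count. This is only a sketch: the class `FormatCheck` and its helper methods are hypothetical, and the file path is expected as a command-line argument.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical helper: inspects the format of an audio file. */
public class FormatCheck {

    /** Number of distinct amplitude values for a bit depth: 2^bits. */
    public static int amplitudeLevels(int bits) {
        return 1 << bits;
    }

    /** Highest frequency representable at this sampling rate (half the rate). */
    public static double highestFrequencyHz(AudioFormat format) {
        return format.getSampleRate() / 2.0;
    }

    /** True for the mono, 16-bit input suggested above. */
    public static boolean isMono16Bit(AudioFormat format) {
        return format.getChannels() == 1 && format.getSampleSizeInBits() == 16;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java FormatCheck <file.wav>");
            return;
        }
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = in.getFormat();
            System.out.println("Sampling rate: " + format.getSampleRate() + " Hz");
            System.out.println("Highest frequency: " + highestFrequencyHz(format) + " Hz");
            System.out.println("Bit depth: " + format.getSampleSizeInBits()
                    + " bits (" + amplitudeLevels(format.getSampleSizeInBits()) + " values)");
            System.out.println("Channels: " + format.getChannels());
            System.out.println("Mono / 16 bits: " + isMono16Bit(format));
        }
    }
}
```

Running it on both the training and the test recordings is a quick way to check that they share the same format before comparing them.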

About vtext: as I said before, the Fourier transform (FFT) is not what I found to work best on large data sets. It lacks accuracy.

There seems to be a problem when you delegate the calculation to MATLAB. Without the code, IMHO, it is almost impossible to give you more information.

Don't hesitate to ask for clarification on anything I said. I may be taking things for granted without realizing that they are not so clear :-)

FWIW, I have just written a speaker recognition tool in Java called Recognito. I believe it is not better than MARF in recognition ability, but it is certainly easier for users taking their first steps. Its licensing model does not require you to ask for permission to use it. The software is open source and supports calls from multiple concurrent threads.

If you want to give Recognito a shot: https://github.com/amaurycrickx/recognito
