will there be support for android in near future? #10

Open
cogmeta opened this issue Oct 2, 2019 · 18 comments

Comments

@cogmeta

cogmeta commented Oct 2, 2019

No description provided.

@yodakohl
Collaborator

yodakohl commented Oct 2, 2019

Very likely.
After a quick look, it should be sufficient to add the Android NDK to the Travis environment and build a number of selected targets (armeabi-v7a, arm64-v8a, ...) using the Android toolchain.

Unfortunately I'm not very familiar with the Android ecosystem, so there might be some hidden pitfalls. I'm going to give it a try and see if I run into any problems.

@cogmeta
Author

cogmeta commented Oct 2, 2019

Cool. We can help. If I understand correctly, there is no dependency other than kissfft. Right?

@yodakohl
Collaborator

yodakohl commented Oct 3, 2019

Sorry for the delay.
The dependencies are TensorFlow Lite and pffft (kissfft was swapped out in favor of pffft).

I managed to compile everything locally. The part I missed was writing a JNI interface for Java/Kotlin. I have already implemented parts of the interface and will push the final version of the lib in a day or two.

Some parts still need to be done after that: recording and buffering the audio (record.py and ringbuffer.py), as well as a small GUI example.

@cogmeta
Author

cogmeta commented Oct 4, 2019

Awesome! Looking forward to testing it.

@yodakohl
Collaborator

yodakohl commented Oct 5, 2019

Just a quick update. The changes are live in the 0.3.5 branch, but they are mostly untested and building the library is still a bit of a hassle. I set up a temporary repository at https://github.com/yodakohl/AudioRecognition_AndroidTest . The repository features a pre-built version of the x86 library for Android and a Kotlin interface to the lib. This should be sufficient to get an app running.

I will expand the temp repository to a minimal Keyword Spotting App.

@cogmeta
Author

cogmeta commented Oct 5, 2019

@yodakohl Great! We will check it and report back.

@cogmeta
Author

cogmeta commented Oct 8, 2019

Thank you. We had some trouble getting it compiled, but we resolved it. We now have an issue with detection. First of all, we are a little confused as to why the signalToMel function returns 800 mel values when fed a byteArray of size 3200 (which is 1600 samples). It should return 400 mel values, right?

We used test.cpp to compare against the reference mel values. The 400 values match as expected, so the puzzling question is why signalToMel returns 800 values instead of 400 for a byteArray of size 3200.

And oh, runDetection always returns zero, even on a valid "marvin" wav file (with the marvin model file).

@yodakohl
Collaborator

yodakohl commented Oct 8, 2019

The signalToMel function expects 3200 samples which are 6400 bytes.
At 16000 Samples per second this represents 200ms of Audio.
The Detector shifts its internal 1000ms feature buffer by this 200ms and runs 5 detections per second.
So 800 Mel Features per 200ms Audio is expected. I have some additional documentation here.
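
The numbers in this comment can be checked with a little arithmetic. The sketch below is only an illustration, not library code: the helper names are made up, and the ratio of four samples per mel value is inferred from the figures quoted in this thread (1600 samples giving 400 values, 3200 samples giving 800), not from the library's documentation.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the thread: 16 kHz audio, 16-bit (2-byte) PCM samples.
constexpr std::size_t kSampleRateHz = 16000;
constexpr std::size_t kBytesPerSample = 2;

// Assumption inferred from the thread's figures, not from library docs:
// one mel value is produced for every four input samples.
constexpr std::size_t kSamplesPerMelValue = 4;

// Number of 16-bit samples contained in a raw PCM byte buffer.
constexpr std::size_t bytes_to_samples(std::size_t bytes) {
    return bytes / kBytesPerSample;
}

// Duration of a sample run in milliseconds at 16 kHz.
constexpr std::size_t samples_to_ms(std::size_t samples) {
    return samples * 1000 / kSampleRateHz;
}

// Expected number of mel values for a given sample count (hypothetical helper).
constexpr std::size_t expected_mel_values(std::size_t samples) {
    return samples / kSamplesPerMelValue;
}
```

With these helpers, 6400 bytes map to 3200 samples, 200 ms of audio, and 800 mel values, while 3200 bytes map to 1600 samples and 400 mel values.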

Do you mind sharing your wave reading code in a gist so I can integrate it into the Android example and test it myself?

@cogmeta
Author

cogmeta commented Oct 8, 2019

When fed with 6400 bytes it crashes.
[Image: Screenshot_2019-10-08-19-33-54-640_com miui bugreport]

@cogmeta
Author

cogmeta commented Oct 8, 2019

I agree that when 6400 bytes are fed, you expect to get 800 mels.
Similarly, when you feed 3200 bytes you should get 400 mels, but your function returns 800 (in the Android lib).

@cogmeta
Author

cogmeta commented Oct 8, 2019

val data = application.assets.open("marvin.raw").readBytes()
val dataSize = 3200
Log.e("DetectionStart", "---------------")
for ((sections, _) in (0..data.size / dataSize).withIndex()) {
    val audioBuffer = data.copyOfRange(sections, sections + dataSize)
    val mel = featureExtractor.signalToMel(audioBuffer, 1.0F)
    val result = audioRecognizer.runDetection(mmel)
    if (result != 0)
        Log.e("DetectionResult", result.toString())
}
Log.e("DetectionEnd", "---------------")

Is this right?

@yodakohl
Collaborator

yodakohl commented Oct 8, 2019

Ah ok, seems like I have a bug in the jni binding (lib.cpp)

`jbyteArray Java_com_nyumaya_audiorecognition_NyumayaLibrary_signalToMel(JNIEnv *env, jobject obj, jlong impl, jbyteArray pcm, jfloat gain)
{
    FeatureExtractor *f = (FeatureExtractor*) impl;
    int inlen = env->GetArrayLength(pcm);
    jbyte *pcm_arr = env->GetByteArrayElements(pcm, 0);

    //FIXME: Properly get required size
    uint8_t result[1024];
    int outlen = f->signal_to_mel((int16_t *)pcm_arr,inlen,result,gain);

    env->ReleaseByteArrayElements(pcm, pcm_arr, 0);

    return ToJavaByteArray(env,result,outlen);
}`

I'm getting the inlen from the byte array and then cast it to int16_t. I will probably have to divide inlen by two before feeding it to signal_to_mel.
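
A minimal sketch of the fix described here, written without JNI so it can be compiled on its own. Everything in it is hypothetical: `stub_signal_to_mel` is a stand-in for the real `signal_to_mel`, and its four-samples-per-output-byte ratio is only inferred from the figures in this thread. The point is the length conversion: the byte count is divided by `sizeof(int16_t)` before it is used as a sample count, and the output buffer is sized from the input instead of being fixed at 1024 bytes. Passing the byte count as a sample count makes the callee read past the input and, for 6400 bytes, may also overrun a 1024-byte result buffer, which would fit the crash reported above, though that is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the library's signal_to_mel: writes one output
// byte per four input samples (ratio assumed from the thread's numbers).
std::size_t stub_signal_to_mel(const int16_t* pcm, std::size_t n_samples, uint8_t* out) {
    const std::size_t n_out = n_samples / 4;
    for (std::size_t i = 0; i < n_out; ++i) {
        out[i] = static_cast<uint8_t>(pcm[i * 4] & 0xFF);
    }
    return n_out;
}

// Takes raw PCM bytes (as a Java byte[] would arrive) and returns mel bytes.
std::vector<uint8_t> signal_to_mel_from_bytes(const std::vector<int8_t>& pcm_bytes) {
    // 16-bit PCM: the byte count must be even.
    assert(pcm_bytes.size() % sizeof(int16_t) == 0);

    // The fix: convert the byte length to a sample count before using it.
    const std::size_t n_samples = pcm_bytes.size() / sizeof(int16_t);
    if (n_samples == 0) {
        return {};
    }

    // Copy into properly aligned int16_t storage instead of casting the pointer.
    std::vector<int16_t> samples(n_samples);
    std::memcpy(samples.data(), pcm_bytes.data(), n_samples * sizeof(int16_t));

    // Size the output from the input rather than using a fixed 1024-byte array.
    std::vector<uint8_t> result(n_samples / 4);
    const std::size_t outlen = stub_signal_to_mel(samples.data(), n_samples, result.data());
    result.resize(outlen);
    return result;
}
```

Under these assumptions, a 3200-byte input yields 400 output values and a 6400-byte input yields 800, matching the numbers both participants expect.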

@cogmeta
Author

cogmeta commented Oct 8, 2019

glad you found it :). will wait for the fix.

@cogmeta
Author

cogmeta commented Oct 8, 2019

I fixed it in lib.cpp and rebuilt the library. Now it returns the correct number of mels (400). But runDetection still returns 0, even on a valid marvin wav file.

@yodakohl
Collaborator

yodakohl commented Oct 8, 2019

Yeah, same here. I'm currently trying to find the issue

@yodakohl
Collaborator

yodakohl commented Oct 8, 2019

Got a detection result. The slicing code is wrong:

for ((sections, _) in (0..(data.size / dataSize)-1).withIndex()) {
            val sectionEnd =  sections*dataSize + dataSize
            val sectionStart = sections*dataSize
            val audioBuffer = data.copyOfRange(sectionStart, sectionEnd)
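
The difference from the earlier loop is that the chunk index is multiplied by the chunk size, so consecutive buffers do not overlap, and the trailing partial chunk is skipped. The same index arithmetic, sketched in C++ with a hypothetical helper name (`chunk_ranges` is not part of the library or the example app):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns [start, end) byte ranges for consecutive, non-overlapping chunks.
// Any trailing bytes that do not fill a whole chunk are ignored.
std::vector<std::pair<std::size_t, std::size_t>> chunk_ranges(std::size_t total_bytes,
                                                              std::size_t chunk_bytes) {
    assert(chunk_bytes > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t n_chunks = total_bytes / chunk_bytes;  // whole chunks only
    for (std::size_t i = 0; i < n_chunks; ++i) {
        const std::size_t start = i * chunk_bytes;  // not just i, as in the first attempt
        ranges.emplace_back(start, start + chunk_bytes);
    }
    return ranges;
}
```

For example, a 10000-byte file with 3200-byte chunks gives three ranges starting at 0, 3200, and 6400; the last 400 bytes are left over. The first attempt used the chunk index itself as the start offset, so each buffer began only one byte after the previous one.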

@cogmeta
Author

cogmeta commented Oct 8, 2019

Yep. got it working. Awesome! Thanks

@cogmeta
Author

cogmeta commented Oct 10, 2019

Can you please add the licence file?
