Raspberry Pi and offline speech recognition

Yes, I must admit the test phase took longer than initially planned. But good things take time, right? When you develop speech recognition software or a pattern detection system, things can go horribly wrong, and at some point the learning curve gets steep.

But anyway, what the hell am I talking about? In a nutshell: SoPaRe, the SOund PAttern REcognition project. With SoPaRe and a Raspberry Pi (technically it works on any Linux system with a multi-core environment), anybody can voice control stuff like lights, robotic arms, or general purpose input and output … offline and in real time.


Even without a wake-up word. The local dependencies are minimal. SoPaRe is developed in Python. The code is on GitHub. Cool? Crazy? Spectacular? Absolutely! I made a video to show you what's possible. In the video I control a robotic arm. With my voice. Running SoPaRe on a Raspberry Pi. In real time. Offline. And it is as easy as eating cake. You are pumped? So am I. Here it is:

I may prepare some more tutorials about fine tuning, increasing precision, and the difference between single and multiple word detection. Let me know what you are using SoPaRe for, what's missing, or any issues you run into.

Happy voice controlling. Have fun 🙂

16 thoughts on “Raspberry Pi and offline speech recognition”

  1. Hello Martin!

    Is it also possible for SoPaRe to learn non-speech patterns? I’m thinking of monitoring our coffee maker at the office, which, depending on the brew, has its own distinctive melody. 😉

    Kind regards,
    Sebastian

  2. Hi Sebastian,

    yes, that should be possible. Currently many settings in the config are optimized for the human voice, such as LOW_FREQ and HIGH_FREQ. But you can configure SoPaRe to filter specific frequencies and, for example, pay more attention to the dominant frequency (f0) or to the wave model instead of the FFT result. It’s also possible to train very roughly (MIN_PROGRESSIVE_STEP and MAX_PROGRESSIVE_STEP), which could help if the sound is not very repeatable. I think the hard part is to “separate” the coffee-machine-specific sound … but it sounds like fun and you should give it a try 🙂
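    The options named above all live in sopare/config.py. As a hedged illustration only (the values below are guesses for a machine-sound experiment, not tuned settings, and the exact option names and defaults should be checked against the config shipped with your SOPARE version), a coffee-maker setup might start with something like:

    ```python
    # sopare/config.py excerpt (illustrative values, not tuned):

    # Widen the frequency band beyond the typical voice range so
    # machine harmonics are not filtered out before analysis.
    LOW_FREQ = 100
    HIGH_FREQ = 4000

    # Coarser progressive steps mean rougher, more tolerant matching,
    # which helps when the sound is not exactly repeatable.
    MIN_PROGRESSIVE_STEP = 25
    MAX_PROGRESSIVE_STEP = 25

    # Lower the threshold if the machine is quieter than speech.
    THRESHOLD = 300
    ```

    After changing these values, the existing dictionary should be retrained, as entries learned with the old settings may no longer match.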

  3. Hi Martin,
    thank you for your nice idea and project, great work. Voice recognition independent of Google & Co. is great for an autonomous project. I’d like to try it; could you please add a little manual for dummies? That would be nice! I’m not so sure how to handle all the files …
    Kind regards,
    Markus

    • Hi Markus,

      great that you like the project 🙂 The current plan is to provide a short “how to start from scratch” manual within the next two weeks. So please stay tuned!

  4. Thank you very much for sharing this project.
    It is very interesting and works without any effort on Orange PI LITE.

    Question: can the patterns be taken from files already generated in .wav format?
    This would avoid having to use the microphone to capture each of the sounds.

    Greetings from Mallorca.

    • You are welcome and great to hear that it works for you on an Orange PI LITE!

      In regards to your question: training from “wav” files is currently not supported. But there is an option to store recorded input:
      ./sopare.py -w samples/test.raw
      and the recorded file can be used for training or testing:
      ./sopare.py -r samples/test.raw -t test -v

      As these options are already available, it should be possible to create a converter that transforms “wav” to “raw” files to achieve what you are asking for. I’ve added this to my to-do list, but I remember there was an issue converting “wav” files, so it’s not guaranteed that this becomes a feature 😉
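      Since a wav file is essentially the same PCM data with a header in front, a minimal converter can be sketched with Python's standard wave module. This is a sketch under the assumption (based on inspecting a SOPARE recording in Audacity) that the raw format is headerless signed 16-bit little-endian mono PCM; the function name and paths are illustrative:

      ```python
      import wave

      def wav_to_raw(wav_path, raw_path):
          # Strip the wav container so only the bare PCM frames remain.
          # The source wav must already be 16-bit mono PCM; resample or
          # convert it first (e.g. with Audacity or sox) if it is not.
          with wave.open(wav_path, "rb") as wav:
              if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
                  raise ValueError("expected 16-bit mono PCM input")
              frames = wav.readframes(wav.getnframes())
          with open(raw_path, "wb") as raw:
              raw.write(frames)
      ```

      The resulting file could then be fed to ./sopare.py -r for testing, assuming the sample rate matches what SOPARE records with.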

      Saludos

      • Thank you for answering so quickly!

        With your suggestion I have found this great tutorial, which explains how to do the conversion:
        https://www.hellomico.com/getting-started/convert-audio-to-raw/

        After that, I tried:
        ./sopare.py -r samples/test.raw -t test -v
        ./sopare.py -c
        But there are no changes in the dict.json file, so it does not recognize the new “test” pattern.
        Is there any way to attach or send the test.raw so that you can take a look and validate this method of conversion?

        Thanks in advance!

      • I have also tried:

        ./sopare.py -w samples/test.raw
        ./sopare.py -r samples/test.raw -t test -v
        ./sopare.py -c
        Same result, no changes.

        But if you open the test.raw file generated by SOPARE in Audacity, the format is Signed 16-bit PCM, little-endian, 1 channel (mono), and the audio is correct, the same as the file generated by the wav-to-raw conversion.

        • Hmm, it’s possible that the volume is below the THRESHOLD and therefore “test” doesn’t show up in the dict.json file, as it was never learned, but this is just a wild guess. I tested again and, as expected, everything works in my environment 🙂

          I would like to suggest moving this thread and the issue we are discussing to GitHub, as there you can easily attach files and track the issue, and I can close it when it’s done: https://github.com/bishoph/sopare/issues/new

          BTW: You can check what’s already learned with the command:

          ./sopare.py -o

          The output is a list of pairs, each consisting of an ID and a UUID, which corresponds to a JSON file in the dict/ directory (even if the suffix suggests otherwise) …

          In case you want to see only the unique learned IDs, this command chain is quite handy:

          ./sopare.py -s '*' | sed 's/[^a-z].*//' | sed 's/\///g' | grep -v '^$' | sort | uniq

          Thanks y un saludo 🙂

  5. Hi Martin:

    I am very interested in getting SoPaRe working on a Raspberry Pi 3 with the Raspbian Jessie OS so that I can use it to create a voice controller for the trolling motor on my fishing boat. It sounds like just the thing I need, and I appreciate your efforts and that you shared it.

    I did a fresh Raspbian Jessie install on a 16 GB SD card, updated the OS, cloned SoPaRe, and installed all the dependencies listed on your web page describing what to install. But when I run:

    $ sopare/sopare.py -l -v

    from my home directory, after about 5 seconds I get:

    sopare 1.3.0

    I try speaking into the microphone and get no response. I confirmed that the microphone is working with other software. I tried moving the threshold value down from 400 to 200 and then to 10, but still no response. After about 10-15 seconds with no response, I get:

    Segmentation fault

    So, I’m at a loss as to what I did wrong. Do you have any insight into what could be wrong? Is there any way to get more detailed diagnostics that might point me toward what the problem might be? Is there any information that I could generate and send you to diagnose the issue?

    Thanks!

    Jeff

    • Hi Jeff,

      a segmentation fault means that something went wrong with memory access. As SOPARE is pure Python, the issue must be located somewhere else. My guess is that PyAudio is the culprit, as others have hit a similar problem without SOPARE:
      https://www.raspberrypi.org/forums/viewtopic.php?f=32&t=77696

      Unfortunately, I have no clue about the root cause, and a quick search does not show any steps to solve this.

      Please share your sopare/config.py and your sound card model/name, and I’ll try to reproduce the issue. You may want to file a new GitHub issue, as this forum is not very handy in terms of file uploads and bug tracking: https://github.com/bishoph/sopare/issues

      Let’s see what we can do!

  6. Great work! This sounds like the software I have been looking for for quite a while now.

    Just to be sure about my plan/wish:
    Would it be possible to set up a Raspberry Pi with this software, attach a microphone to it, plug the Raspberry Pi into a PC via USB, and (for example) say “e, space, k, k, p, l, enter, 1, 6, s” so that the Raspberry Pi acts like a keyboard?

    Could I combine this with a footpedal to activate/deactivate the mic?

    Could I use my own commands for different keystrokes? For example, “extract” triggers the keystroke “e”, and “delete” triggers the shortcut “Ctrl+X”.

    Could it talk to a specific software via the API?

    Thanks, and please excuse my noobness. This is completely new to me. Due to a condition with my wrists, I am looking for alternatives for operating software.

  7. Hey Thorsten,

    you want a remote voice-controlled keyboard, right? Some of your use cases are technically feasible, like turning the mic on/off via a pedal (you could leverage the GPIO interface), but to be honest I have no clue about the Raspberry-to-PC bridge.

    If you want to give it a try and develop this yourself, I would start with a proof of concept of the most important parts and see how far you get. SOPARE is a tool that was designed to learn certain sounds and to make predictions for sound input. With the simple plugin interface you can assign any further logic to the given predictions, including shortcuts and the like. In theory this could work, even if it sounds like something SOPARE was not designed for in the first place 😉
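    To make the plugin idea concrete, here is a hedged sketch of a word-to-keystroke mapping. It assumes the plugin entry point is a run() function receiving the list of recognized identifiers (as in SOPARE's bundled example plugin; verify the exact signature against your SOPARE version), and the keystroke injection itself is only a stub, since on a real Pi-as-keyboard you would need something like python-uinput or a USB gadget driver:

    ```python
    # Hypothetical SOPARE plugin: map learned words to keystrokes.

    # Which learned identifier triggers which key (names are examples).
    KEYMAP = {
        "extract": "e",
        "delete": "ctrl+x",
    }

    def press(keys):
        # Stub: replace with real key injection (e.g. python-uinput).
        print("pressing: %s" % keys)

    def run(readable_results, data, rawbuf):
        # readable_results is assumed to hold the recognized identifiers.
        for word in readable_results:
            if word in KEYMAP:
                press(KEYMAP[word])
    ```

    Calling run(["extract", "unknown", "delete"], None, None) would then trigger the mapped keys for "extract" and "delete" and silently skip the unmapped word.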
