Raspberry Pi and offline speech recognition

Yes, I must admit the test phase went on longer than initially planned. But good things take time, right? When you develop speech recognition software or a pattern detection system, things can go horribly wrong, and the learning curve steepens dramatically at some point.

But anyway, what the hell am I talking about? In a nutshell: SoPaRe, the SOund PAttern REcognition project. With Sopare and a Raspberry Pi (technically it works on any Linux system with a multi-core environment), everybody can voice control stuff like lights, robotic arms, general purpose input and output … offline and in real time.

[Image: voice controlling a robotic arm with a microphone, Sopare, and a Raspberry Pi]

Even without a wake-up word. The local dependencies are minimal. Sopare is developed in Python. The code is on GitHub. Cool? Crazy? Spectacular? Absolutely! I made a video to show you what's possible. In the video I control a robotic arm. With my voice. Running Sopare on a Raspberry Pi. In real time. Offline. And it is as easy as eating cake. Are you pumped? So am I. Here it is:

I may prepare some more tutorials about fine tuning, increasing precision, and the difference between single and multiple word detection. Let me know what you are using Sopare for, what's missing, or about any issues you run into.

Happy voice controlling. Have fun 🙂

12 thoughts on “Raspberry Pi and offline speech recognition”

  1. Hello Martin!

    Is it also possible for Sopare to learn non-speech patterns? I’m thinking of monitoring our coffee maker at the office, which, depending on the brew, has its own distinctive melody. 😉

    Kind regards,
    Sebastian

  2. Hi Sebastian,

    yes, that should be possible. Currently many settings in the config are optimized for the human voice, like LOW_FREQ and HIGH_FREQ. But you can configure Sopare to filter specific frequencies and, for example, pay more attention to the dominant frequency (f0) or to the wave model instead of the FFT result. It's also possible to train really roughly (MIN_PROGRESSIVE_STEP and MAX_PROGRESSIVE_STEP), which could help if the sound is not exactly repeatable. I think the hard part is to "separate" the coffee machine specific sound … but it sounds like fun and you should give it a try 🙂
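
    For illustration, such an adjustment could look like the sketch below. The parameter names are the ones from Sopare's config mentioned above, but the values are made-up starting points for a coffee machine experiment, not something I have tested:

    # Sketch with illustrative values only - tune for your sound source.
    LOW_FREQ = 50                # widen the analyzed band beyond the
    HIGH_FREQ = 8000             # typical human voice range
    MIN_PROGRESSIVE_STEP = 25    # rougher training steps can help when
    MAX_PROGRESSIVE_STEP = 25    # the sound never repeats exactly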

  3. Hi Martin,
    thank you for your nice idea and project, great work. Voice recognition independent of Google & Co. is great for an autonomous project. I’d like to try it; could you please add a little manual for dummies? That would be nice! I’m not so sure how to handle all the files …
    Kind regards,
    Markus

    • Hi Markus,

      great that you like the project 🙂 The current plan is to provide a short manual "how to start from scratch" within the next two weeks. So please stay tuned!

  4. Thank you very much for sharing this project.
    It is very interesting and works without any effort on Orange PI LITE.

    Question: Can it take the patterns from files already generated in .wav format?
    This would avoid having to use the microphone to capture each of the sounds.

    Greetings from Mallorca.

    • You are welcome and great to hear that it works for you on an Orange PI LITE!

      In regards to your question: training from "wav" files is currently not supported. But there is an option to store recorded input:
      ./sopare.py -w samples/test.raw
      and the recorded file can be used for training or testing:
      ./sopare.py -r samples/test.raw -t test -v

      As these options are already available, it should be possible to create a converter that transforms "wav" to "raw" files to achieve what you are asking for. I’ve added this to my to-do list, but I vaguely remember that there was an issue converting "wav" files, so it’s not guaranteed that this becomes a feature 😉
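
      If someone wants to experiment in the meantime, a minimal converter could look like the sketch below. It is untested; it assumes the wav file already matches what Sopare records, i.e. signed 16 bit little-endian mono PCM at the sample rate Sopare is configured for, and it simply strips the wav container:

      import sys
      import wave

      # Sketch: keep only the raw PCM frames from a wav file.
      # Assumes signed 16 bit little-endian mono input at the sample
      # rate Sopare expects - no resampling or conversion is done.
      def wav_to_raw(wav_path, raw_path):
          w = wave.open(wav_path, 'rb')
          try:
              if w.getnchannels() != 1 or w.getsampwidth() != 2:
                  raise ValueError('expected 16 bit mono PCM input')
              frames = w.readframes(w.getnframes())
          finally:
              w.close()
          with open(raw_path, 'wb') as out:
              out.write(frames)

      if __name__ == '__main__':
          wav_to_raw(sys.argv[1], sys.argv[2])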

      Saludos

      • Thank you for answering so quickly!

        Following your suggestion, I found this great tutorial that explains how to do the conversion:
        https://www.hellomico.com/getting-started/convert-audio-to-raw/

        After that, I tried using:
        ./sopare.py -r samples/test.raw -t test -v
        ./sopare.py -c
        But there are no changes in the dict.json file, so it does not recognize the new "test" pattern.
        Is there any way to attach or send the test.raw so that you can take a look and validate this method of conversion?

        Thanks in advance!

      • I have also tried:

        ./sopare.py -w samples/test.raw
        ./sopare.py -r samples/test.raw -t test -v
        ./sopare.py -c
        The same, no changes.

        But if you open the test.raw file generated by Sopare in Audacity, the format is signed 16-bit PCM, little-endian, 1 channel (mono), and the audio is correct, the same as the file generated by the conversion from wav to raw.

        • Hmm, it's possible that the volume is below the THRESHOLD and therefore "test" doesn't show up in the dict.json file, as it was never learned, but this is just a wild guess. I tested again and, as expected, everything works in my environment 🙂
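
          If you want to rule that out, you could print the peak amplitude of the recording and compare it with your THRESHOLD setting. A quick sketch, assuming a raw file in the signed 16 bit little-endian mono format discussed above:

          import struct
          import sys

          # Quick check: print the peak absolute amplitude of a raw
          # 16 bit little-endian mono PCM file, to compare against
          # the THRESHOLD value in the Sopare config.
          with open(sys.argv[1], 'rb') as f:
              data = f.read()
          samples = struct.unpack('<%dh' % (len(data) // 2), data)
          print(max(abs(s) for s in samples))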

          I would like to suggest moving this thread and the issue we are discussing to GitHub, as you can easily attach files and track the issue, and I can close it when it’s done: https://github.com/bishoph/sopare/issues/new

          BTW: You can check what’s already learned with the command:

          ./sopare.py -o

          The output is a string pair consisting of the ID and a UUID, which corresponds to a JSON file in the dict/ directory (even if the file suffix suggests something different) …

          In case you want to see only unique learned IDs, this command chain is quite handy:

          ./sopare.py -s '*' | sed 's/[^a-z].*//' | sed 's/\\//g' | grep -v '^$' | sort | uniq

          Thanks y un saludo 🙂
