SOPARE architecture and plugins

After installing, configuring and training SOPARE, you naturally want to do something with the recognized results. I run some installations where SOPARE turns lights on and off, controls a magic mirror and a robotic arm. With my voice. On a Raspberry Pi. Offline and in real time. How does that work? Glad you are asking. This post gives you an overview of the SOPARE architecture and provides some insights into how to write your own custom plugins for further processing.

Architecture

SOPARE is written in Python, runs very well with Python 2.7 and was tested successfully on several Raspberry Pis running all kinds of operating systems. In fact, SOPARE should run on all kinds of *UNIX systems as long as the underlying hardware comes with a multi-core CPU. Here you can find more information about how to get started. Now let’s do a small walkthrough. When you start SOPARE with the command

./sopare.py

we can look at this simplified list of what’s happening:

  1. Check for arguments from the command line and init parameters
  2. Load the config file
  3. Evaluate the init parameters
  4. Initialize the audio
  5. Read input from microphone
  6. Check if the sound level is above the THRESHOLD
    • Prepare
    • Filter
    • Analyse
    • Check for matching patterns
      • Call plugin(s)
  7. Stop everything after the configured timeout is reached
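The gist of steps 5 and 6 can be sketched in a few lines of Python. This is an illustrative stub, not SOPARE’s actual code; the THRESHOLD value and the function names are made up:

```python
import struct

THRESHOLD = 400  # illustrative value; the real one lives in SOPARE's config file

def loudness(chunk):
    # Peak amplitude of 16-bit little-endian mono samples.
    samples = struct.unpack('<%dh' % (len(chunk) // 2), chunk)
    return max(abs(s) for s in samples)

def process(chunk):
    # In SOPARE this is where the chunk is filtered, transformed (FFT)
    # and turned into a characteristic that is compared against the
    # trained patterns; here it is only a stub.
    return 'characteristic'

def listen(stream):
    # Simplified main loop: read chunks, gate on THRESHOLD, analyse.
    results = []
    for chunk in stream:
        if loudness(chunk) > THRESHOLD:
            results.append(process(chunk))
    return results
```

The real implementation splits this work across several threads and does far more in the analysis step.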

Below is an architecture overview:

SOPARE architecture overview

Now we can dig a bit deeper and take a more detailed view. The first thread (thread 1) is listening the whole time and records chunks of data. These small chunks of data are compared in terms of sound volume. Whenever the volume of one CHUNK is above the THRESHOLD, the chunks are transformed, filtered and a characteristic is created. This happens in “thread 2”. The chain of characteristics is compared against trained results and the plugins are called in “thread 3”. Each thread runs on a different CPU core. This can be observed while SOPARE is running by starting the command

top -d 1

and then pressing “1” to show all CPUs:

Output from command top -d 1

Press “q” to exit the top output.

As we can see in the picture above, the SOPARE Python processes run with the PIDs 656, 655 and 649. I want to show two ways to kill the SOPARE processes:

First via the “kill” command and a list of all PIDs:

kill -3 649 655 656

Second via “pkill” where the processes are killed based on a name:

pkill -3 -f ./sopare.py

I usually send a SIGQUIT as it produces a dump that shows what the program is doing, but you can send any appropriate signal.

Ok. Now that we have touched on the architecture and processes, let’s talk about plugins. Plugins are stored in the

plugins

directory. Below you see the plugin structure from my magic mirror pi:

SOPARE Plugin Structure

To create a plugin I recommend copying the print plugin into a new directory, replacing the name “custom_plugin_directory” with any meaningful name for your plugin:

cd plugins
cp -r print/ custom_plugin_directory

You now have a directory with all necessary files for your custom plugin in the directory “custom_plugin_directory” or whatever name you have chosen. Let’s modify the plugin and see what’s inside:

cd custom_plugin_directory

The file “__init__.py” is the one that is called from SOPARE when a sound is recognized. Open it:

nano __init__.py

You see some lines of code and the interesting part is this one:

def run(readable_results, data, rawbuf):
    print readable_results

The function “run” is called and the value “readable_results” is handed over as an array. This means that each recognized sound is inside this array with its corresponding ID. Let’s assume that you trained a sound with the ID “test”. When the same sound is recognized, the ID “test” shows up like this:

[u'test']

If more words are recognized, you get all of them in the array in the same order as they were recognized:

[u'test', u'another_id', u'whatever']

You can now work with the results and write your own conditions. For example, here is code that simply checks for the presence of two words:

def run(readable_results, data, rawbuf):
    if ('some_word' in readable_results and 'another_word' in readable_results):
        print ('Tschakka! Got my two words...now do some awesome stuff')

Here is an example where the results must be in a specific order:

def run(readable_results, data, rawbuf):
    if (len(readable_results) == 2 and 'word_1' in readable_results[0] and 'word_2' in readable_results[1]):
        print ('Tschakka! Got my two words in the right order...now do some awesome stuff')
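If a plugin grows beyond a couple of conditions, a small dispatch table keeps it readable. Here is a sketch with hypothetical IDs and actions; the function returns the performed actions so the behavior is easy to check:

```python
# Map each trained ID to an action. IDs and actions are hypothetical.
ACTIONS = {
    u'light_on': lambda: 'lights on',
    u'light_off': lambda: 'lights off',
}

def run(readable_results, data, rawbuf):
    # Look up every recognized ID and execute its action, if one exists.
    performed = []
    for result in readable_results:
        action = ACTIONS.get(result)
        if action is not None:
            performed.append(action())
    return performed
```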

With this knowledge you are now hopefully able to read and understand the robotic arm example and write your own plugins. There is one thing I want to mention: all plugins are called sequentially. This means you should not execute any complex or long-running code without threading!
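To follow that advice, a plugin can hand the long-running work to a background thread and return immediately. A minimal sketch; the action itself is just a placeholder:

```python
import threading

done = threading.Event()

def long_running_action(results):
    # Placeholder for slow work, e.g. driving a robot arm (hypothetical).
    done.set()

def run(readable_results, data, rawbuf):
    # Offload the work so SOPARE's sequential plugin chain is not blocked.
    worker = threading.Thread(target=long_running_action,
                              args=(readable_results,))
    worker.daemon = True  # don't keep SOPARE alive on shutdown
    worker.start()
```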

Ok, I’ll stop for now. I’ll make a video tutorial whenever I have the time. In the meantime, don’t hold back: ask questions, make suggestions and give feedback.

Happy coding 🙂

70 thoughts on “SOPARE architecture and plugins”

  1. Hello, me again, who would like to detect specific noises from a closing door. Can we dig a little deeper here into the architecture of sopare? 🙂

  2. Hi
    I am attempting to run the code in Python 2.7 in the Windows environment. I am getting some errors in the pickling.py routine. In general, should sopare run on a win 7 64 bit environment? Is there anything special I need to change in the configuration?

    Thanks
    BJ

    • Never tested SOPARE on Windows as SOPARE is developed on and for the Raspberry Pi. If you resolve the dependencies it might work but I can’t guarantee it.

  3. Hello,
    I am trying to understand the logical architecture of sopare. However, I can’t figure out how the training takes place. Can you help me?

    • Sure, I’ll try to help but I don’t have a clue what your point is … please explain where you are stuck and how to help 🙂

      • I need to understand how the training is done: the algorithm and the dependencies on other developed modules.
        Basically, the log info gives me an idea about the used modules, however I still need to understand the algorithm

        • The architecture overview should give you quite a good overview about the modules and the flow. And a brief description how SOPARE works was described here and here 🙂

          • good thank you
            I have another question, the features characterising each word are: length , frequency and length is that correct ?

          • Frequencies, shape, length are the most prominent characteristics, but as you can configure quite some stuff you are in control.

  4. Hey,

    I am currently working on speech recognition for the control of a mobile robot. All commands are limited to two consecutive words. After successful training of the vocabulary, I start the endless mode via “./sopare.py -l”. Here, however, my problem arises. Sopare recognizes a few words, but hangs up after those few words (I can’t say after which number of words…sometimes after 3…sometimes after 24) and stops responding to my audio input. Have you ever experienced this problem? And could you help me with that?
    I would be very grateful

    Greetings, Julian

    • Hey. Do you have a custom plugin? Do you experience 100% CPU usage after some time? How many trained words are in the dictionary? One potential reason for the described behavior could be that one process or task takes very long and the work adds up to a point where the system starts lagging and therefore SOPARE becomes unresponsive.

      • Hey, I think this problem could be solved by allocating one processor core per process to guarantee full parallel computing.
        Haven’t tried that yet but I’m working my way through cuda computing and I might post about it soon if that works and of course if you agree

      • Hey. I haven’t checked the part with the CPU usage so far as I hadn’t got time last week, but I will do it tomorrow. And yes, I do have a custom plugin which is responsible for the movement of the robot. In my dictionary are 7 words, each trained around 5 times. Maybe I should try to start my own code as a thread as you recommended. I will come back here after I’ve tried it. Thank you so far for your advice!

      • Hey. I just want to inform you that the problem was caused by a mistake in my custom plugin. So there is no problem at all with your speech recognition engine :). But thank you very much for your quick response and your help.

  5. Great work and it is very fun using this tool. I am intermediate with Python and I know my commands and operations. It is rare to see educational material or tutorials about back-end architecture. Where can I go to learn more about how Python REALLY works? Such as how scripts interact and communicate etc. I have looked around but not had much luck; I’m not sure if I am using the wrong words.

    • I already answered this question for you. If you don’t like the answer you can try asking a different, more specific question. BTW: You asked the same question 3 times so far…!?

  6. I had an error when implementing plugins.
    When I tried to write to the __init__.py file the following error appeared:
    Error writing __init__.py: permission denied
    Please help me fix this problem.

  7. Hi,
    I have the same problem as Julian. When I run ./sopare.py -l, after 3 or 4 words that I say, SOPARE stops responding and doesn’t react to voice. I’m not using any plugin and I just followed the step-by-step instructions. I have 2 words in my dict, each one trained about 4 or 5 times. I’m trying to control a robot and add my plugin, but for now it’s just SOPARE itself and the problem occurs. Would you please guide me how to solve it?
    Plus, I want to use the speech recognition for long periods. Is that possible or are there any limitations or something that needs to be done?
    Thanks

  8. Hi, do you think it’s possible to recognize phonemes rather than words? Then the API wouldn’t need as many resources. I would like to use Sopare for my own home assistant. There I’ve got some servers which will process the nlp afterwards.

    Thanks in advance.
    BR Daniel

  9. Hey, is there a way you can make a sequence? So you say, for example, “Bathroom”, the system processes the command, then asks you for your next command like “lights”, then some processing and after that another command from you. That would be really helpful for me. I hope it is clear what I mean.

    Thanks for the great documentation.
    VG Christopher

  10. ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition ‘defaults.bluealsa.device’
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition ‘defaults.bluealsa.device’
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
    connect(2) call to /tmp/jack-1000/default/jack_0 failed (err=No such file or directory)
    attempt to connect to server failed

    • Yeah, an unrelated snippet of ALSA output without further context. What the heck!

      Ok, let’s comment on this and assume it has something to do with SOPARE: if you run SOPARE, the keyword “error” appears in the output in case of an error.

      Note to myself: On my TODO list is now an issue to mark ALSA output as ALSA output and separate it completely from SOPARE output, as 95% of all problems, questions and stuff is around ALSA output.

  11. Hey bishoph, how are you? I’m new to Python and I have some doubts. Using Sopare, can I get the characteristic chain of the word that was spoken and recognized as a trained word? My goal is to find some patterns and differences through this chain of features and to be able to estimate the distance from the sound source to the microphone. I thought about maybe using the THRESHOLD of words, doing a calibration and determining an estimated distance value for each THRESHOLD. Thank you.

    • You can train sounds with SOPARE and SOPARE may get you predictions and results based on your training. Distance estimation is beyond SOPARE’s scope. You should look at a different project or approach.

      • Yes I wanted to work with the help of SOPARE just to make the recognition of the sound, regarding the distance estimation I will implement something still, so I would like to know if I can access the chain of characteristics that SOPARE generates to make the comparison with the sound trained. Do you know if I can save this file? Because then I see if I can work on these data.

        • Hey bishoph, how are you? I was reading some topics in git and found the –wave function of SOPARE, this function creates wave files only of the words trained and saved in the dictionary or the words that are recognized while the program is being executed are also saved? Could you help me with this question?

          • Hey. SOPARE creates “.wav” files for the recognized words and their parts as SOPARE processes them. You find the files in the folder “tokens/”.

  12. Hello,

    Let’s suppose I train SOPARE to recognise the words “events”, “today” and “tomorrow”, but I am not happy with the training data for the word “events” as there was a lot of background noise when I trained it.
    Is there any way to only delete training data for the word “events” without deleting the training data for all the other words?

    Thank you

    • Yes. You can delete trained data for one id and keep the rest of the training data.

      Delete trained data for the id events:

      ./sopare.py -d "events"
      

      After that re-create the dict:

       ./sopare.py -c
      

      Deleting single training data is a bit more complicated as you have to delete the raw data files and re-create the dict again. If somebody needs to know how, I’ll explain it.

  13. Hello Martin,
    I have a problem with characteristic.py. Could you support me, please?
    I see you calculate the Discrete Fourier Transform (fft) for feature extraction but I don’t know what algorithm it is. Is it Mel frequency cepstral coefficients (MFCC) or Perceptual linear prediction (PLP) or something else?
    Thank you.

    • Hi. I had no idea what PLP and MFCC are, so I had to search for them. Reading through it, SOPARE uses some of the procedures of both PLP and MFCC like power spectrum calculation, filtering, equalizing, smoothing and, if I understand it correctly, also some kind of liftering. But as I haven’t looked much into other speech processing software/concepts while writing SOPARE, these techniques just seem to be necessary to get the work done. If you search for technical implementations of well-known and named procedures, SOPARE is the wrong place to search. Hope that helps.

  14. Hi Martin,

    First; This is a real cool project (and helpful documentation)! Just stumbled upon it as I was looking into how to use an rpi3 to recognize different types of noises.

    The (training) use-case I have is that I have mp3 recordings of the various noises I want to monitor. Using audacity I can extract a number of occurrences of each noise and export them into e.g. raw format (that from what I understand can be used as input to sopare). Though I do not see any entry in dict.json (or any other file created).

    What I do is this:

    $ python sopare.py -c ~/noise1_1.raw -t hund1 -v
    sopare 1.5.0
    recreating dictionary from raw input files…
    $ python sopare.py -o
    sopare 1.5.0
    current entries in dictionary:

    Is there a specific encoding expected for the .raw files (tested so far with VOX_ADPCM and signed 32-bit PCM)?

    Best regards

    • Hi. The “raw” format is indeed a bit misleading as it has nothing to do with raw sound encoding format(s). The raw in terms of SOPARE stands for raw data. To create this format, the sound must run through the SOPARE training method…sorry. There is already a feature request on GitHub for using mp3 files for training…

      • Hi Martin,

        Thanks for the clarification! I made a work-around by making a virtual microphone and then playing the sound files through it. That made scripted tests of different sopare config settings a bit easier (still learning about that though) 🙂

        For anyone interested until native support for sound file exists, this is my bash script:

        #!/bin/bash

        CFILE="/home/my_user/.config/pulse/client.conf"
        MICFILE="/tmp/.virtmic"

        print_usage()
        {
            cat >> usage.txt <<USAGE
        This script will either:
        1) Create a virtual microphone for PulseAudio to use and set it as the default device.
        2) If configured to 1), revert back

        Usage:

        sound_redirect.sh
        -h Print this help

        USAGE
            cat usage.txt
            rm usage.txt
        }

        set_up_virtual()
        {
            # Create virtual microphone that reads audio data from a named pipe
            pactl load-module module-pipe-source source_name=virtmic \
                file=$MICFILE format=s16le rate=16000 channels=1 > /dev/null

            # Set the virtual microphone as the default device
            pactl set-default-source virtmic

            # Create a file that will set the default source device to virtmic for all
            # PulseAudio client applications.
            echo "default-source = virtmic" > $CFILE

            # Now an audio file can be written to $MICFILE
            # A write will block until the named pipe is read.
        }

        revert_back()
        {
            # Uninstall the virtual microphone, remove it as default source
            pactl unload-module module-pipe-source
            rm $CFILE
        }

        parse_input()
        {
            while getopts ":h" arg; do
                case $arg in
                    h)
                        print_usage
                        exit 0
                        ;;
                esac
            done
        }

        sound_redirect_main()
        {
            if [ ! -f $CFILE ]; then
                printf "Setting up system for SOPARE training... "
                set_up_virtual
                printf "Done\n\n"
                printf "To write audio file to virtual microphone, do e.g.:\n"
                printf "ffmpeg -re -i input. -f s16le -ar 16000 -ac 1 pipe:1 > $MICFILE\n"
            else
                printf "Reverting system back from SOPARE training... "
                revert_back
                printf "Done\n"
            fi
            exit 0
        }

        sound_redirect_main "$@"

  15. Hello

    I have a question:
    Does SOPARE run on a raspberry pi zero? Or is there a problem because of the multithreading?

    Kind regards

    • Yes, it runs on the Zero, but you should know what you are doing and it could lag. I can’t really recommend it. Use it for fun and giggles and it should do the job, even if – depending on lots of stuff – you could potentially lose the real-time aspect, and this would be kind of normal and no bug 😉

  16. Hi bishoph
    First of all, thanks a lot for your work! I was very happy to find sopare, as there aren’t many useful programs for offline speech recognition beside yours.
    I was wondering if it’s possible to stop the recording after one word is detected. Until now I’ve used the endless mode via „./sopare.py -l“ but is there another option?
    Thanks a lot & best regards

    • Hey. Glad that you like my work!
      If you want SOPARE to stop after each detection you can simply start SOPARE without the loop option “-l”. You can also stop SOPARE programmatically in your custom plugin…
      Have fun!

  17. Hi Bishoph,
    It has been amazing to find SOPARE, there aren’t many useful programs for offline speech recognition. I successfully installed it on Rspi 3b+. It is able to recognize my sounds well. I am using Raspbian Buster with Desktop (GUI). I am not good at programming, but am learning Python over next few days.
    I came across your Python script, the generic start-up script, and saw your YouTube video on robot arm control. Being that SOPARE is a clean and responsive program, it is impressive that it can control things with so little code.
    You mentioned on a different reply, the following:
    “Now you may want to ask: How can I control stuff with sopare? Where does my code goes? Good questions. Sopare comes with a simple plugin interface. Just take a look into the directory „./plugins“. You see the standard plugin „print/__init__.py“. This plugin only contains two (2) lines of code that are relevant in terms of coding:
    def run(readable_results, data, rawbuf):
    print readable_results
    The interesting part for the basic usage is the list value „readable_results“. In my case the value contains something like
    [u’main‘]
    If you simply copy the print plugin you can add your own code and control stuff.

    To me and possibly other newbies in programming, it would be helpful if you can elaborate more on this concept. I am trying to control my Rspi GUI and programs installed by speech recognition like basic things to open and respond to voice. For example – have it to open chromium browser, or file manager, or terminal, or libreoffice writer, raspi-config, e-mail etc. What would be a code for each of these in Python? I have Python 3.7 on my machine.
    Would my voice command be like 2-3 words – open terminal, open internet browser, open libreoffice writer? and what would be the execute scripts?

    • I control stuff like lights with Alexa and with SOPARE. As every environment is different and you may have other requirements/goals all I can say is that there are plenty of combinations and possibilities…

  18. Hello Martin
    Fantastic project! I am quite impressed by how easy the training works. Although I am struggling a bit with setting up a script that uses the plugin. Could you provide a simple (fully working) example script which maybe prints something on recognition of a specific word?
    Thank you very much. Keep up your fantastic work.

  19. hi bishoph,
    Can you explain to me the working principle of SOPARE?
    How does it actually work and match the wave patterns?

    thank you

  20. Hi Bishoph,

    Great tool. I am having issues using this with Python version 3.9.2. I keep getting the error:
    ModuleNotFoundError: No module named ‘ConfigParser’.
    I am relatively new to Python, but I found where someone said that ‘ConfigParser’ is not supported in Python 3 and that I could change it to ‘configparser’. I did this and I kept getting multiple errors about other missing modules.

    Can you please help with guiding me through a solution, I am hoping to use this tool with my PhD research. It seems like a good fit for what I need. Thank you in advance! 🙂
