Sopare precision and accuracy

Whoa, it’s about time to talk about accuracy and precision in the context of SOPARE. SOPARE is a Python project that listens to microphone input and makes predictions from trained sounds like spoken words. Offline and in real time.

Before we go into the details, let’s take a quick excursion into how SOPARE processes sound. The microphone listens permanently and records every sound in small chunks. As soon as the volume of a sound reaches a specified threshold, SOPARE combines several small chunks into a bigger chunk. At this point, SOPARE has an array of data in raw mic input format. The input receives some filtering (HANNING) and the time domain data is transformed into the frequency domain. SOPARE then removes unused frequencies as specified in the configuration (LOW_FREQ and HIGH_FREQ).
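
To make the chain concrete, here is a minimal sketch of these steps in Python, assuming NumPy and a fixed sample rate. The helper function, the threshold value and the sample rate are my assumptions; only the config names (LOW_FREQ, HIGH_FREQ) come from SOPARE:

import numpy as np

SAMPLE_RATE = 48000            # assumption: mic sample rate in Hz
THRESHOLD = 400                # assumption: volume level that triggers a big chunk
LOW_FREQ, HIGH_FREQ = 20, 600  # frequency band kept for further processing

def process_chunk(chunk):
    """Filter a raw chunk (NumPy array) and return only the in-band frequency data."""
    if np.abs(chunk).max() < THRESHOLD:
        return None                               # too quiet, ignore
    windowed = chunk * np.hanning(len(chunk))     # HANNING filtering
    spectrum = np.abs(np.fft.rfft(windowed))      # time domain -> frequency domain
    freqs = np.fft.rfftfreq(len(chunk), 1.0 / SAMPLE_RATE)
    keep = (freqs >= LOW_FREQ) & (freqs <= HIGH_FREQ)
    return spectrum[keep]                         # unused frequencies removed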

At this stage SOPARE is able to compress the data (MIN_PROGRESSIVE_STEP, MAX_PROGRESSIVE_STEP). Compression is a big factor for precision. A progressive step means that a number of frequency values are combined into one value: a progressive step of 100 takes 100 values and creates one (1) combined value. This is a very rough representation and a good way to create lots of false positives. The opposite would be a step of one (1), which would use each frequency for the characteristic and the prediction and represents the maximum accuracy, but maybe also the worst true positive recognition.

This is how the process looks: from the full blown time domain data (40000 values), to the specified number of frequencies (600), down to a compressed set of data (24 values) at the end, which is quite distinctive and is used for the predictions.
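
As a rough sketch, progressive-step compression could look like this; the function and the use of the mean are my assumptions about how the values get combined:

import numpy as np

def compress(freq_data, step):
    """Combine every `step` frequency values into one."""
    return [float(np.mean(freq_data[i:i + step]))
            for i in range(0, len(freq_data), step)]

print(len(compress(np.arange(600), 25)))  # 600 values, step 25 -> 24 values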

You need to test around and find good values for your setup and environment. Once you have optimal values, train your sound patterns.

Please note that the values in the section „Stream prep and silence configuration options“ must be used for training, and whenever you change them you need to do a new training round. This means removing the trained files via

mv dict/*.raw /backup

or

rm dict/*.raw

and train again!

Now let’s talk about options to enhance precision and accuracy. First of all, you should note that a single identifier is always susceptible to false positives. Checking for two or more patterns/words increases the precision big time.

The second option is to make use of the config options to increase the accuracy. Let’s start with the one that identifies a word or pattern:

MARGINAL_VALUE

The marginal value can range between 0 and 1. Zero (0) means that everything will be identified as the beginning of a word; 1 means that the trained sample and the current sound must match 100%. Good values lie between 0.7 and 0.9. Test how high you can increase the value while still getting real results. For testing purposes, keep this value quite low.

MIN_CROSS_SIMILARITY

is the option that is used for the comparison. Again, 0 means everything is a match and 1 means that the trained pattern and the current sound must match 100%. For one-word scenarios this value can be quite high; two or more words normally require lower values, as the transitions between two patterns most likely do not sound like the single trained words. Good values in my setups are between 0.6 and 0.9: 0.9 for single words, lower values for multiple word recognition.
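
Conceptually, the two options act as gates, which can be sketched like this; the similarity function and the flow are hypothetical, only the 0-to-1 semantics come from the description above:

MARGINAL_VALUE = 0.8        # keep low while testing, raise for production
MIN_CROSS_SIMILARITY = 0.9  # 0.9 for single words, lower for sentences

def predict(candidate, trained_patterns, similarity):
    scores = [(similarity(candidate, p), p) for p in trained_patterns]
    best_score, best_pattern = max(scores, key=lambda s: s[0])
    if best_score < MARGINAL_VALUE:
        return None         # not even the beginning of a known word
    if best_score < MIN_CROSS_SIMILARITY:
        return None         # word-like, but no trained pattern matches
    return best_pattern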

The following values have a huge impact but I can’t hand out best case values. Instead, they require some manual testing and adjustment:

MIN_LEFT_DISTANCE
MIN_RIGHT_DISTANCE

These values are somewhat special. For each word/pattern, SOPARE calculates the distance between the trained word and the current sound. A low distance means that the characteristics are similar; a high distance means that there is a difference. Left and right mean that the frequency range is halved and the lower and higher bandwidths are compared respectively. Even if a prediction for the whole word is very close, these distances can be essential to filter out false positives. The debug option reveals the most important values:

sorted_best_match: [[MIN_CROSS_SIMILARITY, MIN_LEFT_DISTANCE, MIN_RIGHT_DISTANCE, START_POS, LENGTH, u'PREDICTION'], [MIN_CROSS_SIMILARITY, MIN_LEFT_DISTANCE, MIN_RIGHT_DISTANCE, START_POS, LENGTH, u'PREDICTION']]

Again, this requires some fiddling around to find the optimal values that give true positives and avoid the false ones… start with high values and reduce them until you are satisfied. In my smart home light control setup the values are around 0.3, and my false positive rate is near zero although SOPARE is running 24/7 and my house is quite noisy (kids, wife, …).
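
Given the debug format above, a hypothetical post-filter over one sorted_best_match entry could look like this; the field order follows the debug line, the threshold values are examples rather than recommendations:

MIN_CROSS_SIMILARITY = 0.6
MIN_LEFT_DISTANCE = 0.3
MIN_RIGHT_DISTANCE = 0.3

def accept(entry):
    similarity, left_distance, right_distance, start_pos, length, prediction = entry
    return (similarity >= MIN_CROSS_SIMILARITY
            and left_distance <= MIN_LEFT_DISTANCE
            and right_distance <= MIN_RIGHT_DISTANCE)

print(accept([0.72, 0.21, 0.28, 0, 24, u'light']))  # -> True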

The last config options to consider form the calculation basis for the value „MIN_CROSS_SIMILARITY“. The sum of the following three values should be 1:

SIMILARITY_NORM
SIMILARITY_HEIGHT
SIMILARITY_DOMINANT_FREQUENCY

„SIMILARITY_NORM“ is the weight of the FFT similarity comparison.

„SIMILARITY_HEIGHT“ compares against the time domain shape. Good if you want to take a certain volume into account.

„SIMILARITY_DOMINANT_FREQUENCY“ is the similarity factor for the dominant frequency (f0).

I recommend playing around with these values and learning their impact. Depending on the environment, the sound and the desired outcome, there are plenty of possible combinations. Here are some examples:
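
The following combinations are purely illustrative sketches of my own, not tested recommendations; the only hard rule from above is that the three weights must sum to 1:

examples = {
    'FFT shape only':    (1.0, 0.0, 0.0),  # SIMILARITY_NORM, _HEIGHT, _DOMINANT_FREQUENCY
    'shape plus volume': (0.6, 0.4, 0.0),
    'balanced':          (0.5, 0.3, 0.2),
    'pitch sensitive':   (0.4, 0.2, 0.4),
}
for name, weights in examples.items():
    assert abs(sum(weights) - 1.0) < 1e-9, name  # weights must sum to 1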

Puuhhh, this post got longer than expected. My next task is to create some visuals for a better understanding. In the meantime, please give me some feedback.

I had to split the content into several parts.

Part 1:

Part 2:

I’ll add more parts when they are ready … please stay tuned 😉

Sopare basic usage. Voice controlling a magic mirror

In the last post I gave a quick sopare intro and we controlled a robotic arm via voice. Today I want to focus on simple one word commands and how to add custom features to sopare. And because I need something to control, I’ll use a smart mirror web interface, which is one of the next projects I’m working on. It’s not yet a mirror, but the frame, the screen, a USB mic and some more parts are already assembled, and I think this is a perfect example of how to use sopare with one word commands.

The magic mirror prototype that will be controlled via voice. Offline and in real time.

So, let’s start with the requirements. Obviously, you need a Raspberry Pi 2/3. And a microphone. All my sopare systems are using USB microphones. Here is a list of the mics I’m using for different objectives:

  • Blue Microphones Snowball USB Mic (light control)
  • Samson Meteor Mic USB Studio (robotic arm control)
  • Foxnovo Portable USB 2.0 Mic (magic mirror control)

And of course, you need sopare in the latest version. Before we start the training, it’s a good time to check and adjust the mic input. As I work most of the time with a headless Pi, „alsamixer“ is my preferred tool. Just make sure that the input is not too high and not too low. I get good results with mic input levels around the 2/3 mark, where the mic input level is not yet in the red sector (see the video for a visual reference).

Raspberry Pi and offline speech recognition

Yes, I must admit the test phase took longer than initially thought. But good things take time, right? When you develop speech recognition software or a pattern detection system, stuff can go horribly wrong, and the learning curve steepens dramatically at some point.

But anyway, what the hell am I talking about? In a nutshell: about SoPaRe, the SOund PAttern REcognition project. With Sopare and a Raspberry Pi (technically it works on any Linux system with a multi core environment) everybody can voice control stuff. Like lights, robotic arms, general purpose input and output… offline and in real time.

Sound pattern recognition with Sopare: voice controlling a robotic arm with a Raspberry Pi and a microphone

Even without a wake-up word. The local dependencies are minimal. Sopare is developed in Python. The code is on GitHub. Cool? Crazy? Spectacular? Absolutely! I made a video to show you what’s possible. In the video I control a robotic arm. With my voice. Running Sopare on a Raspberry Pi. In real time. Offline. And it is as easy as eating cake. You are pumped? So am I. Here it is:

I may prepare some more tutorials about fine tuning, increasing precision and the difference between single and multiple word detection. Let me know what you are using Sopare for, what’s missing, or about potential issues.

Happy voice controlling. Have fun 🙂

Smart home and voice control – SoPaRe beta testing

More than a year ago I wrote about voice controlled stuff. Enterprise NCC-1701-D like. As you all know, with the rise of cloud APIs this can be accomplished with some work. The downside is that all your talks will be processed in the cloud, which means that you may lose your privacy. As I like the Raspberry Pi, my goal was to have something running locally on a Pi. To be more specific: on a Raspberry Pi 2. After trying Jasper and some other projects, my personal conclusion was that the small device is not powerful enough for the heavy lifting. That said, I just want instant results. And I want to talk from anywhere in the room; I don’t care where the microphone is located. So I started the project SoPaRe to figure out what is possible. My goals were (and still are):

  • Real time audio processing
  • Must run on small credit card sized, ARM powered computers like the Raspberry Pi, Banana Pi and the like
  • Pattern/voice recognition for only a few words
  • Must work offline without any dependencies on cloud APIs
  • Able to talk freely from anywhere in a room

I must admit that I did not expect much trouble, as it’s only data processing. Well, I changed my view and learned a lot. My current result is a first usable system that is able to learn sounds (in my case words) and recognize them even when I’m not talking directly into the microphone but from 2 meters away and from different angles. But let’s start with some basics. The following image shows the printed result of me saying three words: „computer light off“.

computer_licht_aus

In memory we are talking about 80000 values that are generated in roughly 3 seconds. As one of my primary goals was real time processing, this number is huge. As the Raspberry Pi 2 has 4 cores, one of my first decisions was to leverage real threads and process the data on different cores to get a good throughput. Another broad idea was to crunch the data and work with just a small characteristic of the sound. This diagram shows the current project architecture:

sopare_architecture
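
The multi-core part of this architecture can be sketched with Python’s multiprocessing module (real processes that can run on separate cores); the worker stage below is a placeholder of mine, not SoPaRe’s actual code:

import multiprocessing as mp

def filter_worker(in_q, out_q):
    # Placeholder stage: windowing and FFT would happen here.
    for chunk in iter(in_q.get, None):
        out_q.put(chunk)

if __name__ == '__main__':
    raw_q, fft_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=filter_worker, args=(raw_q, fft_q))
    worker.start()
    raw_q.put([0.0] * 512)   # hand one chunk to the worker on another core
    print(len(fft_q.get()))  # -> 512
    raw_q.put(None)          # poison pill ends the worker loop
    worker.join()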

First of all we have to „tokenize“ a sound into small parts that can be compared. Like single words from a sentence:

computer_licht_aus_word_tokenizer

In the current version even a single word is split into smaller parts like „com-pu-ter“, and for all of these parts a characteristic is generated. These characteristics can be stored for further comparison. I tried quite some stuff, but I get decent results with a combination of a condensed fast Fourier transformation and rough meta information like length and peaks.
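
A characteristic in that spirit could be sketched like this; the exact fields and the condensation step are my guesses at the approach, not SoPaRe’s real data structure:

import numpy as np

def characteristic(samples, step=100):
    """Condense a chunk of samples (NumPy array) into a small, comparable fingerprint."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    condensed = [float(spectrum[i:i + step].mean())
                 for i in range(0, len(spectrum), step)]
    return {
        'fft': condensed,                                      # condensed FFT
        'length': len(samples),                                # rough meta info: length
        'peaks': int((spectrum > 2 * spectrum.mean()).sum()),  # rough meta info: peaks
    }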

The current version is able to not only match learned words in a sentence but also to do this in a real environment. This means standing in a room with the microphone located somewhere in a corner, or speaking from quite a distance. On the other hand, I still get false positives, as the approach is rough. But I’m quite happy with the current state, and that’s why I’m talking about it now. The project (SoPaRe) incl. the source code is located on GitHub. I’m happy to receive your feedback or comments, and of course, if you are using SoPaRe, please tell me about it!

My next step is to kick off the beta testing and enhance here and there. I will write again when we have more results after the test phase 🙂

Raspberry Pi powered pinball machine – Gameplay impressions

Having a playable but only half-done pinball machine still means that there will be somebody who wants to play. The good thing is that I get a lot of feedback. The bad thing is that development speed slows down quite a bit. Here are some impressions of my beta testers playing my DIY Raspberry Pi powered pinball machine:

impressions

The highlight of the current state is the display. The display shows the points for the current ball, the total points, the played ball and the high score since the game started. On top of that, the display shows some comic speech bubbles whenever an event is triggered. The display is in fact an old 17″ monitor attached with some plates from the do-it-yourself store. According to my intelligence, boys want to break the high score and girls want to break boys, so there is always competition and fun. The best impression so far: two girls dressed up as princesses playing with the pinball machine made from a princess bed. Gorgeous.

Technically, some timeouts were adjusted and I introduced real multiprocessing for sound effects and background music. One „slingshot“ is equipped with a switch to get points when the ball hits the rubber. So far so good. Whenever I have some spare time, the second slingshot will receive a switch as well, and I want to add targets to the play field. Targets are great for incentives and should impact the game play as it becomes more interesting. Furthermore, I’ll play around with some barriers, as the out lanes are hit too often at the moment.

Whenever the Raspberry Pi is switched on, the operating system boots up and the program gets started at boot time. No X Window system is used, which means I’m using a direct framebuffer, as you can see in the source code. Starting a Python script at boot time is relatively easy, as the following line can be used in a start/stop script:

start-stop-daemon --start --background --pidfile $PIDFILE --make-pidfile --exec $DAEMON --startas $DAEMON

Raspberry Pi powered pinball machine – Weak flipper finger

Today I spent some time investigating the issue with the weak flipper finger. First, I debugged the software. Every 1000 loops, the average time that the software needs per loop is calculated. As the numbers are pretty stable around 0.005, which is the defined sleep timeout, we can say that the software runs inside our defined parameters and is quite reliable. Next, the value of the HIGH flipper finger timeout was increased by 0.01 to make sure that the magnetic field does not collapse too early. Without any effect. Finally, I inspected the flipper finger mechanics carefully and noticed that the right flipper finger requires a grain more power when the EOS switch is reached, in comparison to the left flipper finger. In addition, the EOS switch was triggered in the middle of the movement and not in the end phase. Long story short: the solution was to bend the EOS switch manually a bit to make sure that the switch is triggered at the end of the movement, and the ball now gets kicked hard. Small details can have great impact when building a DIY pinball machine 🙂
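
The timing check can be sketched as a minimal loop that prints the average loop time every 1000 iterations, which should stay close to the defined sleep timeout of 0.005 seconds; the names are illustrative, not the actual pinball code:

import time

SLEEP = 0.005
count, start = 0, time.time()
while True:
    # ... poll flipper switches and drive the coils here ...
    time.sleep(SLEEP)
    count += 1
    if count == 1000:
        print('avg loop time: %.6f s' % ((time.time() - start) / count))
        count, start = 0, time.time()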

A short video that shows the current state:

Raspberry Pi powered pinball machine – The bumper

Yeah, bumper time. To be precise: pop bumper time. Here are the parts we have to assemble.

bumper_parts

It is worth mentioning that the coil in the picture is an AC coil without a diode; I soldered the diode on myself to make it work in my DC voltage environment. And this is how it should work: a pop bumper kicks the ball when the ball hits the bumper’s plastic skirt. The plastic skirt triggers the switch, which gives the impulse to fire up the coil. The coil pulls down the rod and ring assembly to accelerate the ball. Easy as eating cake. I started with mounting the coil:

pop_bumper_coil_mounting

I used the bumper base to mark all the holes and mounted the coil first with only one screw to make sure that all movable parts have enough space. Mounting a pop bumper requires lots of drilling, BTW. In the next step we place the skirt and fix the rod and ring assembly.

bumper_skirt_and_rod_ring_assembly

Screwing the bumper body and fixing all parts comes next. It is a bit tricky to stick the lamp pins through the tiny holes but who cares.

bumper_body_and_light

To make it look like a real bumper the last visible step is to add the cap.

pop_bumper_assembly_result

The really tricky part is to make the bumper work in such a way that the skirt does not get caught but still activates the switch. In my case I replaced the original switch bracket with a piece of wood strip. I had to try several wood strips of different thickness to make it work. Trial and error is the motto. Because I’m using a Raspberry Pi for the switch input, testing was easy. I made sure that the bumper switch worked correctly before I wired the coils and relays. I also developed the logic and dry tested some „hiccups“, adding some special cool downs to avoid burning the coil. I will play around and maybe adjust some timers though. And finally, here is a current picture of the play field:

playfield_with_bumper
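
The cool-down idea mentioned above can be sketched like this, assuming RPi.GPIO with hypothetical pin numbers and timings; the point is that a switch „hiccup“ arriving too soon after the last pulse must not fire the coil again:

import time
import RPi.GPIO as GPIO

BUMPER_COIL = 17      # hypothetical BCM pin driving the coil relay
PULSE = 0.04          # how long the coil fires, in seconds
COOL_DOWN = 0.2       # minimum pause between two pulses, in seconds

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUMPER_COIL, GPIO.OUT)
last_fire = 0.0

def fire_bumper():
    global last_fire
    if time.time() - last_fire < COOL_DOWN:
        return                          # hiccup: too soon, protect the coil
    GPIO.output(BUMPER_COIL, GPIO.HIGH)
    time.sleep(PULSE)
    GPIO.output(BUMPER_COIL, GPIO.LOW)
    last_fire = time.time()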

My pinball source code, incl. all bumper specifics, is available on GitHub as always, if you are interested. What I noticed while playing the current state is that I sometimes get a short but noticeable delay, which results in weak flipper finger strength. As I’m using a Raspberry Pi 2 with multiple cores, I introduced multiprocessor support for the sound module. It did not help as much as I thought, so this issue is definitely something I need to address, as the fun part depends a lot on precision. One solution to address these delays could be to introduce tmpfs and put all sounds into memory to avoid I/O wait conditions. So there is enough stuff on my to-do list. Get excited and have fun.

Raspberry Pi powered pinball machine – The spinner

Building your own pinball machine is not only huge fun. For me it’s learning and experimenting. One can become extremely creative. A good example was today’s „building a spinner mounting support“. The spinner is an original part of a real pinball machine; I got it second hand from a pinball shop. My mission today was to mount it above the lane, add a micro switch and develop the logic around light, sound and scoring. With just a couple of brackets, a metal rail, screws, nuts, washers and a standard micro switch, I’m really proud of the result:

spinner_with_micro_switch

And best of all: it works. Which is quite cool, as the whole project has become really complex, as everyone can see by just looking at the current state of my wiring:

raspi_and_relay

It looks really confusing, and this is only the tip of the iceberg. But I have a fool-proof system that works for me. First of all, I’m using a text file with all GPIO pin configurations. Next: all GPIO outgoing cables are tied to a well defined luster terminal input, which is also documented. The text file also describes which relay must be used. And most of the time I also label the wires. Most of the time means I tend to not label the wires until I lose track 🙂
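
Such a pin configuration file could look like the sketch below; the format is my invention for illustration, the post only says that such a file exists:

# Example line format:  name;BCM pin;luster terminal;relay
# left_flipper;12;terminal 3;relay 2

def load_pins(path):
    pins = {}
    with open(path) as f:
        for line in f:
            if line.strip() and not line.startswith('#'):
                name, pin, terminal, relay = line.strip().split(';')
                pins[name] = {'pin': int(pin), 'terminal': terminal, 'relay': relay}
    return pins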

Raspberry Pi powered pinball machine – lights and sound

Adding light and sound effects makes a huge difference when building a pinball machine. Even though I currently have only 3 triggers and two lights, it’s quite fun to play around with the current state. I added a small effect class to the project to play sounds one after another and to define simple light effects. This means blinking all around, and because of a small random routine there is always some blink-blink action. Yeah, a major improvement in terms of atmosphere.
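
Such an effect class could be sketched like this, assuming pygame for sound and RPi.GPIO for the lamps; the post does not name the libraries, so both choices and all pins and timings are hypothetical:

import random
import time
import pygame
import RPi.GPIO as GPIO

LIGHTS = [5, 6]                  # hypothetical BCM pins of the two lamps
GPIO.setmode(GPIO.BCM)
for pin in LIGHTS:
    GPIO.setup(pin, GPIO.OUT)
pygame.mixer.init()

class Effects(object):
    def play_sounds(self, files):
        """Play the given sound files one after another."""
        for name in files:
            channel = pygame.mixer.Sound(name).play()
            while channel is not None and channel.get_busy():
                time.sleep(0.01)

    def blink(self, times=10):
        """Random blink-blink action on the available lights."""
        for _ in range(times):
            pin = random.choice(LIGHTS)
            GPIO.output(pin, GPIO.HIGH)
            time.sleep(0.1)
            GPIO.output(pin, GPIO.LOW)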

The right flipper finger is not as strong as the left one, and I’ve no idea why. Same resistors, resistance and pieces all around, but noticeably weaker. Maybe it’s something mechanical; I will go ahead and ignore it for now. The next step is to define the places for more elements and the visual interface.

The source code is available if you are interested. I really enjoy Python, as it makes stuff so much easier. I should add a video, as just text or static images can’t express how cool this is. Stay tuned ’til the next milestone and have fun.