Visualization fun with GStreamer

I have a Mini-ITX box connected to my TV. It worked very well on my old CRT TV, but now I have a Full HD TV. It went from 720×576 to 1920×1080: five times the pixel count (exactly!), or over 100MB/s (bytes, not bits of course) of raw video. It’s not all that much, but it’s way more than what the Mini-ITX can handle. With the magic of graphics hardware, however, it can show lower resolutions scaled up to 1920×1080 without breaking a sweat.
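The arithmetic behind those figures, as a quick sketch. The frame rate (25 fps) and the 2 bytes per pixel (a packed YUV 4:2:2 format) are assumptions, not numbers taken from the setup described above; they happen to land just over the 100 MB/s mark.

```python
# Pixel counts for the old CRT mode and the Full HD mode.
old_pixels = 720 * 576        # 414,720
new_pixels = 1920 * 1080      # 2,073,600

ratio = new_pixels / old_pixels

# Assumptions (not stated in the text): 25 frames per second and
# 2 bytes per pixel, as in a packed YUV 4:2:2 format such as YUY2.
FPS = 25
BYTES_PER_PIXEL = 2

raw_bytes_per_second = new_pixels * BYTES_PER_PIXEL * FPS

print(f"pixel ratio: {ratio}")                               # pixel ratio: 5.0
print(f"raw video: {raw_bytes_per_second / 1e6:.2f} MB/s")   # raw video: 103.68 MB/s
```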

I had a lot of issues trying to get some music visualization running on it. There’s no way the poor thing can generate 1920×1080 pixels worth of visualization, let alone push it out to the TV. libvisual, the closest thing to a visualization standard there is, didn’t appear to have a simple command-line app that you could point at a music file and that would hardware-scale the visualization to fullscreen.

There was projectM though, which has a clever system of capturing audio from PulseAudio and visualizing it with OpenGL scaling. That way you can use any music playing app you want, at any resolution you care to render and display. Unfortunately, the Openchrome drivers for the VIA hardware and the Qt OpenGL component really hated each other.

But hey, we have GStreamer!

gst-launch-0.10 pulsesrc device=alsa_output.hw_0.monitor ! queue ! audioconvert ! libvisual_infinite ! video/x-raw-rgb,width=640,height=360,framerate=25/1 ! ffmpegcolorspace ! queue ! xvimagesink

Grab audio from the PulseAudio monitor device, run it through libvisual to get a visualization at the specified resolution, and show it through XVideo, which does the scaling in hardware. All my requirements summed up in about two lines of GStreamer goodness!
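If you want to try other render sizes, the same pipeline can be generated from a few parameters. This is only a sketch in Python: the function name `visualization_command` and its parameters are made up for illustration, and it just builds the argument list rather than running anything.

```python
import shlex


def visualization_command(width, height, fps=25,
                          device="alsa_output.hw_0.monitor",
                          visualizer="libvisual_infinite"):
    """Return the gst-launch-0.10 argument list for the pipeline above.

    All parameter names here are hypothetical; the defaults are the
    values used in the one-liner.
    """
    caps = f"video/x-raw-rgb,width={width},height={height},framerate={fps}/1"
    stages = [
        f"pulsesrc device={device}",  # audio from the PulseAudio monitor
        "queue",
        "audioconvert",
        visualizer,                   # libvisual plugin that draws the frames
        caps,                         # render size and frame rate
        "ffmpegcolorspace",
        "queue",
        "xvimagesink",                # XVideo output
    ]
    return ["gst-launch-0.10"] + shlex.split(" ! ".join(stages))


# A third of Full HD in each direction: 640x360, one ninth of the pixels.
width, height = 1920 // 3, 1080 // 3
command = visualization_command(width, height)
print(" ".join(command))
```

Passing the list to `subprocess.run(command)` should start the pipeline on a machine that has GStreamer 0.10 and the libvisual plugin installed.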

Webcam fun with GStreamer

I have yet to find a proper Linux tool for recording video from a webcam while showing it on screen at the same time. The usual hack is to use mencoder to encode and mplayer to play the encoded file, but the latency is typically a full second or more:

{ tail --follow=name -n +0 --retry "lulz.avi" | mplayer -cache 320 -vo x11 -; killall -INT mencoder; } & mencoder tv:// -tv width=640:height=480:fps=15 -ovc lavc -o lulz.avi

GStreamer does to video/audio what Bash does to text and NetPBM does to images, and it’s just as brilliant (possibly more). So let’s use it instead:

gst-launch-0.10 v4l2src ! tee name=videoout ! queue ! videorate ! video/x-raw-yuv,framerate=15/1 ! queue ! theoraenc quality=60 ! queue ! muxout. pulsesrc ! audio/x-raw-int,rate=22000,channels=1,width=16 ! queue ! audioconvert ! vorbisenc ! queue ! muxout. oggmux name=muxout ! filesink location=lulz.ogg videoout. ! queue ! ffmpegcolorspace ! ximagesink

Voilà. While long and seemingly convoluted, it’s not really worse than the mplayer line, and it works a lot better.

While a gst pipeline looks scary to begin with, it’s really self-explanatory once you start reading it. Still, I’ll do a little dance about it:

#Get a v4l2 video source, split it with a tee named "videoout",
#and put one branch through the Theora encoder (the other branch
#is picked up at the end)
v4l2src ! tee name=videoout ! queue ! videorate ! video/x-raw-yuv,framerate=15/1 \
        ! queue ! theoraenc quality=60 ! queue ! muxout.

#Get audio from a PulseAudio stream, run it through the Vorbis encoder
pulsesrc ! audio/x-raw-int,rate=22000,channels=1,width=16 \
         ! queue ! audioconvert ! vorbisenc ! queue ! muxout.

#Mux the audio and video together, and put it in "lulz.ogg"
oggmux name=muxout ! filesink location=lulz.ogg

#Put the other end of the video split out on the screen
videoout. ! queue ! ffmpegcolorspace ! ximagesink
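
To see how the four pieces join back into a single command, here is a small Python sketch. The constant and function names are invented for illustration, and it only assembles the argument list. The frame rate cap is written as framerate=15/1, the same caps field the visualization pipeline uses.

```python
# Each constant mirrors one commented block of the pipeline.
VIDEO_ENCODE = ("v4l2src ! tee name=videoout ! queue ! videorate "
                "! video/x-raw-yuv,framerate=15/1 "
                "! queue ! theoraenc quality=60 ! queue ! muxout.")

AUDIO_ENCODE = ("pulsesrc ! audio/x-raw-int,rate=22000,channels=1,width=16 "
                "! queue ! audioconvert ! vorbisenc ! queue ! muxout.")

PREVIEW = "videoout. ! queue ! ffmpegcolorspace ! ximagesink"


def recording_command(path="lulz.ogg"):
    """Build the gst-launch-0.10 argument list (hypothetical helper)."""
    # The muxer is the element named "muxout" that both encoders link to.
    mux_to_file = ["oggmux", "name=muxout", "!", "filesink", f"location={path}"]
    return (["gst-launch-0.10"]
            + VIDEO_ENCODE.split()
            + AUDIO_ENCODE.split()
            + mux_to_file
            + PREVIEW.split())


print(" ".join(recording_command()))
```

The branches are simply concatenated: gst-launch ties them together through the element names (`videoout` and `muxout`), so no `!` is needed between them.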

Easy to see why this is one of my new favourite toys.

Cutecodes

I suggest a simple number-to-string scheme for easily recognising and comparing numbers.

There seem to be a number of cases where you want to check that two numbers are the same. This could be comparing a number on a printed record to a number on screen, comparing document IDs over the phone, seeing if two people share a phone number, or a bunch of other scenarios. This is highly error-prone. Given that you can raed wrods wrhee the lertets are mxied up wouthit porbelms, it’s no wonder that 85142 and 85412 are easily confused.

Humans are a lot better at concepts, and therefore words. Given the lines “snowman, blue kiwi” and “snowman, red camel”, anyone will easily see that they’re not the same. Even though “snowman, red camel” has more than three times as many characters as “85412”, I think most people would find the former easier for both long-term and short-term memorization.

These happen to be actual examples from a simple number-to-string conversion scheme I devised. It’s based on a set of ten adjectives and a hundred mostly cute and happy nouns. I call the resulting strings “cutecodes”. You can test them below, by typing in some digits and hoping that my JavaScript skills haven’t rotted.


The digits are grouped in threes: the first digit of each group picks an adjective and the last two pick a noun. Here are some thoughts that went into the system:

  • There should be a sizable number of words. Here, an adjective and a noun will uniquely identify three digits. With some more work, you might have a hundred adjectives and a thousand nouns, for five digits.
  • The words should be pleasant and inoffensive, no matter which order they’re put in. People might object to having “burning demon” as their order number. “Cutecodes” came from the resulting high concentration of cute nouns. I tried not to make it sappy though, since it should be usable in a serious corporate setting.
  • Words should not be excessively culture-specific. It’s hard to make it global, but I avoided words like “gopher” which are primarily American. People will have a harder time remembering words if the concepts are difficult to relate to. Since this is a proof of concept there are still some, like “lemur”.
  • With four digits you get two nouns rather than two adjectives and a noun. This is because “small, green pencil” and “green, small pencil” are the same concept but would map to different numbers (with this rule, you get “ginger, pencil” instead).
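
A minimal sketch of the grouping rules in Python, under some loud assumptions. The real word lists are not given here, so everything except the five words taken from the examples (blue, red, camel, kiwi, snowman) is a placeholder. Putting the short groups first is inferred from the “snowman, red camel” example; how longer numbers such as seven digits are split is a guess.

```python
# Placeholder word lists. Only the entries overridden below come from the
# examples in the text; every "adjN" / "nounNN" name is a hypothetical stand-in.
ADJECTIVES = [f"adj{i}" for i in range(10)]
ADJECTIVES[1] = "blue"
ADJECTIVES[4] = "red"

NOUNS = [f"noun{i:02d}" for i in range(100)]
NOUNS[12] = "camel"
NOUNS[42] = "kiwi"
NOUNS[85] = "snowman"


def chunk_sizes(length):
    """Split `length` digits into groups of 2 and 3, short groups first."""
    if length < 2:
        raise ValueError("the text does not say how a single digit is encoded")
    remainder = length % 3
    if remainder == 0:
        leading = []
    elif remainder == 2:
        leading = [2]        # e.g. 5 digits -> 2 + 3
    else:
        leading = [2, 2]     # e.g. 4 digits -> 2 + 2, never 1 + 3
    return leading + [3] * ((length - sum(leading)) // 3)


def cutecode(digits):
    """Turn a string of digits into a comma-separated cutecode."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of digits")
    words, pos = [], 0
    for size in chunk_sizes(len(digits)):
        group = digits[pos:pos + size]
        pos += size
        if size == 3:
            # First digit picks the adjective, last two pick the noun.
            words.append(f"{ADJECTIVES[int(group[0])]} {NOUNS[int(group[1:])]}")
        else:
            # A two-digit group is a bare noun.
            words.append(NOUNS[int(group)])
    return ", ".join(words)


print(cutecode("85412"))   # snowman, red camel
print(cutecode("85142"))   # snowman, blue kiwi
```

The digits are taken as a string rather than an integer so that leading zeros survive.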

It could be convenient to be able to convert cutecodes to numbers by hand. One way would be to use “A” and “B” as “0”, “C” and “D” as “1”, etc., and to pick the words so that “BArn” is “00”, “CAlf” is “10” and so forth. So far, the nouns are just listed in alphabetical order, so you know that “earthworm, red carrot” is a lot less than “wizard, small mushroom”.
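
The letter idea can be sketched in a few lines of Python. This is a scheme the text only proposes, so the function names are hypothetical, and continuing the pairs up to “S” and “T” for 9 (leaving U to Z unused) is an assumption about what “etc” covers.

```python
def letter_digit(letter):
    """Map A/B -> 0, C/D -> 1, ..., S/T -> 9 (case-insensitive)."""
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < 20:
        raise ValueError(f"{letter!r} is outside A-T and has no digit")
    return index // 2


def noun_number(noun):
    """Read a two-digit number off the first two letters of a noun."""
    return 10 * letter_digit(noun[0]) + letter_digit(noun[1])


print(noun_number("barn"))   # 0, i.e. "00"
print(noun_number("calf"))   # 10
```

Note that this does not match the alphabetical list the earlier examples come from: there “camel” is 12, while its first two letters would read as 10 under this scheme.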

Instead of even trying that, I chose single-syllable adjectives and two-syllable nouns, all with mostly the same rhythm: “black camel, green lemur, sweet raincoat, young almond, white puzzle, small lemming, dry bubble”. This sounds nice, and a code takes about as many syllables to say as the digits it stands for.

Finally, the words should be chosen so that they can be translated unambiguously between a few major languages. I didn’t bother with this either.

I imagine that this could be shown wherever strings of more than 3-4 digits are displayed, to increase recognition by humans.