Webcam and Linux – GStreamer tutorial


Now let’s get busy building a pipeline!
$ gst-launch v4l2src device=/dev/video0 ! \
'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
xvimagesink

GStreamer has a simple pipeline workflow. First you define a source, then you tell it what to do with the data, and at the end you specify a sink, which is where the data should go. A sink can be almost anything: a file, your screen, or the network, which turns your computer into a streaming server. In the pipeline above you tell gst-launch to use video4linux2 as the stream source. Then you link it to a capability, a filter defined by a MIME type and a few optional properties, telling v4l2src that you want video at a certain resolution and frame rate. Finally you link everything to xvimagesink, which displays the stream on your screen.
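By the way, if the window stays black you can rule out the webcam by swapping in the built-in test source; the shortest possible pipeline is just a source and a sink:
$ gst-launch videotestsrc ! xvimagesink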
Now that you have a live feed on your screen, you can try putting it in a file.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
queue ! videorate ! 'video/x-raw-yuv,framerate=30/1' ! theoraenc ! \
queue ! oggmux ! filesink location=me_funny_dancing.ogg

What is different this time? First there is queue, which provides a buffer for the next element in the pipeline, videorate. videorate takes each frame of the input and feeds it to the next element at the requested framerate, duplicating or dropping frames as needed. The stream is then linked to theoraenc, which encodes the raw video into a Theora stream. That stream is linked to oggmux, which muxes the Theora stream into an Ogg container. Finally the properly contained, Theora-encoded video is linked to a filesink, which writes all the data to a file. Try playing it!
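Every element's properties can be listed with gst-inspect:
$ gst-inspect theoraenc
theoraenc, for instance, has a quality property (0 to 63 on my install), so you could try theoraenc quality=63 in the pipeline above; as you will see later, though, cranking it up only helps so much.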
$ mplayer me_funny_dancing.ogg

See yourself doing a funny dance. But oh, noes! The quality is kind of bad and you couldn't see yourself while you were recording. Let's take it one step at a time. First, the recording feedback.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
tee name=t_vid ! queue ! xvimagesink sync=false t_vid. ! queue ! videorate ! \
'video/x-raw-yuv,framerate=30/1' ! theoraenc ! queue ! oggmux ! \
filesink location=me_funny_dancing.ogg

Yes, this pipeline is getting more and more complicated. With tee you can split the data stream into multiple pads and handle each one of them separately. First you create a tee element and name it t_vid, then you link the first branch to xvimagesink so it displays the live stream. Then, by appending t_vid., you tell GStreamer that you want to pick up the other branch of the split, and you link the rest of it to the Theora encoder, the muxer and the filesink like in the previous example. You end up with your pretty video on the screen while it is also being recorded into a file.
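If tee is new to you, here is a minimal sketch of it in isolation, again using the test source so the webcam stays out of the picture; the same pattern shows up in two windows, one per branch:
$ gst-launch videotestsrc ! tee name=t ! queue ! xvimagesink t. ! queue ! xvimagesink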
As we all know there is no home-made porn without sound, so we somehow need to add sound to the video. But how? Simple: by complicating our already complicated pipeline.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
tee name=t_vid ! queue ! xvimagesink sync=false t_vid. ! queue ! videorate ! \
'video/x-raw-yuv,framerate=30/1' ! theoraenc ! queue ! mux. \
alsasrc device=hw:1,0 ! audio/x-raw-int,rate=48000,channels=2,depth=16 ! \
queue ! audioconvert ! queue ! vorbisenc ! queue ! mux. \
oggmux name=mux ! filesink location=me_funny_dancing.ogg

Are your eyes bleeding? You're not finished yet, so get ready for more. What in the name of a pipeline is this? You already know all about splitting the pipeline and handling the two branches, so let's go straight to the changes. In the third line you can see that where the video used to be linked to oggmux, it is now linked to mux., which is the name of the muxer you will create later. Next you need to specify an additional source for the sound. In my case it is an ALSA source, and the device I used is hw:1,0, which corresponds to the built-in microphone on the webcam. Your device will probably be something else. Just like with the video, we need to tell alsasrc what kind of data we want: the MIME type is audio/x-raw-int, the sampling rate is 48 kHz, with two channels at 16-bit depth. This is sometimes enforced by the audio source and you might have to play with the settings a little bit. You then link it to the audioconvert element, which takes your audio and converts it into something the encoder expects. vorbisenc is an audio encoder, as you probably guessed, and it is linked to mux. as well. At last you create the muxer you have been referring to all along and link it to the filesink.
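Not sure which hw:X,Y is your microphone? arecord -l lists the capture devices ALSA knows about, and you can test the audio branch on its own before wiring it into the big pipeline. A minimal sketch, assuming your microphone is also at hw:1,0 (mic_test.ogg is just a scratch file):
$ arecord -l
$ gst-launch alsasrc device=hw:1,0 ! audio/x-raw-int,rate=48000,channels=2,depth=16 ! \
audioconvert ! vorbisenc ! oggmux ! filesink location=mic_test.ogg
Let it run for a few seconds, stop it with Ctrl+C and play the file back; if it is silent, try another device.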
There, you’re all set! No wait, don’t run off just yet. What about the quality? Satisfied? No? Then stick around! So, on-the-fly encoding to Theora works, but the quality of your recording is not that good. The problem with Theora is that even setting the quality to maximum doesn’t help much. You will need something else, something more raw! Try this:
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
tee name=t_vid ! queue ! videoflip method=horizontal-flip ! \
xvimagesink sync=false t_vid. ! queue ! \
videorate ! 'video/x-raw-yuv,framerate=30/1' ! queue ! mux. \
alsasrc device=hw:1,0 ! audio/x-raw-int,rate=48000,channels=2,depth=16 ! queue ! \
audioconvert ! queue ! mux. avimux name=mux ! \
filesink location=me_dancing_funny.avi

Don’t be surprised to see a mirrored image of yourself on the screen; that is intentional, since some people prefer a mirrored preview. It certainly makes nose picking easier, right? If you don’t like it, simply remove the videoflip element. With this pipeline your video stays uncompressed and is written into an AVI container with the help of avimux. Please note that ten seconds of video will require around 150 megabytes of disk space, and one minute is close to 1 gigabyte if you record at 640×480@30fps! Yes, it will be huge.
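If you like to see where those numbers come from, assume the camera delivers 4:2:2 YUY2 frames at two bytes per pixel and let the shell do the math:
$ echo $((640 * 480 * 2 * 30))
That prints 18432000, about 18.4 MB of raw video every second, which indeed lands you in the 1 GB per minute neighbourhood.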
The problem with this raw video is that it is not editable in Kdenlive. If you try to open it, Kdenlive will simply crash and die in a puff of smoke. My main problem was how to record video with decent quality: I tried dozens of combinations, but nothing really worked, and the raw video was not editable.
And now my solution. H.264 quality would be good enough for me, but H.264 encoding takes a lot of time even on a Core 2 Quad CPU and can’t be done on the fly. You need to record raw video first and then convert it to H.264. Here’s how:
$ gst-launch filesrc location=me_funny_dancing.avi ! \
decodebin name=decode decode. ! queue ! x264enc ! mp4mux name=mux ! \
filesink location=me_funny_dancing.mp4 decode. ! \
queue ! audioconvert ! faac ! mux.

This time you need to specify a different source: the file which contains your recording. You link it to decodebin, which creates pipelines you can link to x264enc and then to mp4mux. decodebin works similarly to tee in that you get more than one branch out of it, so you can work with audio and video separately. You then mux audio and video back together with mp4mux and write the result to a file. A careful eye will notice that a faac element was added before the final mux. That is because an MP4 container can’t hold raw audio, so we encode it to AAC (MPEG-2/4 audio) first.
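If you want a say in the output size, x264enc exposes a bitrate property (in kilobits per second on my install; gst-inspect x264enc will tell you for sure), so a hypothetical higher-bitrate variant of the same conversion would look like this:
$ gst-launch filesrc location=me_funny_dancing.avi ! \
decodebin name=decode decode. ! queue ! x264enc bitrate=4096 ! mp4mux name=mux ! \
filesink location=me_funny_dancing.mp4 decode. ! \
queue ! audioconvert ! faac ! mux.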
Conversion will take some time, but it is well worth it. Video that you get will be of a very high quality and it can be edited with Kdenlive and later rendered into your final product.
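Give the result a quick spin before you fire up Kdenlive:
$ mplayer me_funny_dancing.mp4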
This concludes the first part of the GStreamer tutorial. Next time I will talk more about streaming live feeds to all the viewers out there on the internet.
Comments, suggestions, corrections and flames are well appreciated.