
Capturing video and audio on Mac OS X with RubyMotion

May 06, 2015 - Elliott Draper

In this article we’re going to look at how you can use RubyMotion and the AVFoundation framework to build a Mac OS X app that lets you capture video and audio from attached input sources (like the built-in iSight), and how you can then output that combined footage to a movie file. You can see the code for the app here.

App setup and capture overview

The first thing we need to do is set up a new Mac app:

motion create --template=osx AVCapture

Then we need to edit the Rakefile to add the AVFoundation framework:

Motion::Project::App.setup do |app|
  # Use `rake config' to see complete project settings.
  app.name = 'AVCapture'

  # Link the AVFoundation framework so its capture classes are available
  app.frameworks += ['AVFoundation']
end

Now we’re ready to start capturing from input sources. Whether it’s audio or video, the setup and API calls are roughly the same, and the process follows these steps:

  • create and configure a capture session
  • find the device(s) you want to capture from
  • create a capture input from each device
  • check that the capture inputs can be added, and if so, add them to the capture session
  • create a capture output representing how you want to save or process the captured data
  • check that the output can be added to the session, and if so, add it
  • when you want to begin capturing, start the session, and once it has started, begin any output-specific actions (such as recording to file)
  • stop the session when you’re done capturing

That may seem like a lot of steps, but each one is straightforward, and that’s the great thing about the API: it breaks a complicated task down into a series of simple steps, and along the way it allows plenty of configuration, customisation, and choice of inputs and outputs.

So what are we going to build? To keep things simple to begin with, we’ll build an app that shows a window with a single button. When the button is pressed, we start recording from the first video and audio devices found; when it’s pressed again, we stop recording. The captured audio and video are written to a movie file. This means we don’t have to focus much on the UI of our test app, and can concentrate instead on the code that makes all of the above steps happen.

Setting up our capture session and device inputs

So first things first, we want to set up our capture session and device inputs, and we’ll do that in applicationDidFinishLaunching.

  def applicationDidFinishLaunching(notification)
    buildMenu
    buildWindow

    @session = AVCaptureSession.alloc.init
    @session.sessionPreset = AVCaptureSessionPresetHigh

Creating a session is familiar enough, with a standard .alloc.init, and then we set a preset, which determines the quality of the capture. There are quite a few presets to choose from, and the docs here describe them better than I could summarise. In this case we’ll use AVCaptureSessionPresetHigh, which represents the highest quality the recording devices allow. You can go into much more detailed configuration of session quality, but we won’t get into that right now.

Next up, we need to locate our devices. You can easily list the available devices on the system with AVCaptureDevice.devices; the key is to inspect them and find the ones that support the type of capture you want. In this case, we want a video input device and an audio input device. If you want to see the available devices by their friendly names, you can log the following or run it in the console:

    AVCaptureDevice.devices.map { |device| device.localizedName }

We’ll be taking the first of each one we find, which commonly on a Mac will be the video from the built-in iSight, and the audio from the built-in mic:

    devices = AVCaptureDevice.devices
    video_device = devices.select { |device| device.hasMediaType(AVMediaTypeVideo) }.first
    audio_device = devices.select { |device| device.hasMediaType(AVMediaTypeAudio) }.first

hasMediaType is a useful way of determining the media capabilities of each device, so we can find the devices we need for both audio and video. In this case they are separate devices, but if you had a USB-connected webcam that captured both audio and video, you’d see that it supports both media types, and so could be used for either or both.
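To make the selection logic concrete without real hardware, here’s a plain-Ruby sketch that runs outside RubyMotion. FakeDevice is a hypothetical stand-in for AVCaptureDevice, and the :video/:audio symbols stand in for AVMediaTypeVideo/AVMediaTypeAudio. It uses find, which returns the same result as select { ... }.first but stops at the first match, and shows that a combined device (like the USB webcam above) answers true for both media types.

```ruby
# Hypothetical stand-in for AVCaptureDevice so the selection logic runs in
# plain Ruby; :video and :audio stand in for AVMediaTypeVideo/AVMediaTypeAudio.
FakeDevice = Struct.new(:name, :media_types) do
  def hasMediaType(type)
    media_types.include?(type)
  end
end

# Same result as devices.select { ... }.first, but stops at the first match
# and returns nil when nothing matches.
def first_device_for(devices, media_type)
  devices.find { |device| device.hasMediaType(media_type) }
end

devices = [
  FakeDevice.new("Built-in Microphone", [:audio]),
  FakeDevice.new("Built-in iSight", [:video]),
  FakeDevice.new("USB Webcam", [:video, :audio])
]

puts first_device_for(devices, :video).name  # prints Built-in iSight
puts first_device_for(devices, :audio).name  # prints Built-in Microphone

# A device reporting both media types could serve as either input
combined = devices.select { |d| d.hasMediaType(:video) && d.hasMediaType(:audio) }
p combined.map(&:name)                       # prints ["USB Webcam"]
```

Because find returns nil when no device matches, in the real app it’s worth checking for nil before passing a device on to AVCaptureDeviceInput.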

Next up, we need to create a capture input from these devices, before we can add them to our capture session:

    video_input = AVCaptureDeviceInput.deviceInputWithDevice(video_device, error: nil)
    audio_input = AVCaptureDeviceInput.deviceInputWithDevice(audio_device, error: nil)

And then lastly, we check to ensure we can add the inputs, before adding them to our session:

    inputs = [video_input, audio_input]
    inputs.each { |input| @session.addInput(input) } if inputs.all? { |input| @session.canAddInput(input) }
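The canAddInput check here is all-or-nothing: either both inputs are added or neither is. The plain-Ruby sketch below models that guard outside RubyMotion. FakeSession is a hypothetical stand-in for AVCaptureSession that only mimics canAddInput/addInput, and its rule of rejecting nil and duplicate inputs is an assumption made for illustration, not a description of AVFoundation’s actual checks.

```ruby
# Hypothetical stand-in for AVCaptureSession modelling only the
# canAddInput/addInput pair. Rejecting nil/duplicates is illustrative.
class FakeSession
  attr_reader :inputs

  def initialize
    @inputs = []
  end

  def canAddInput(input)
    !input.nil? && !@inputs.include?(input)
  end

  def addInput(input)
    @inputs << input
  end
end

# Adds every input only if all of them can be added; returns true on success.
def add_inputs(session, inputs)
  return false unless inputs.all? { |input| session.canAddInput(input) }
  inputs.each { |input| session.addInput(input) }
  true
end

session = FakeSession.new
p add_inputs(session, [:video_input, :audio_input])  # prints true
p session.inputs                                     # prints [:video_input, :audio_input]

# A missing device (nil input) blocks the whole batch, so nothing is added
other = FakeSession.new
p add_inputs(other, [:video_input, nil])             # prints false
p other.inputs                                       # prints []
```

Checking everything before adding anything avoids ending up with a session that records video but has no audio input.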

Now that our session and inputs are set up, let’s configure the output!

Configuring our capture outputs

We just have to decide on the type of output we want, and then create an instance of it. In this case, we’ll use the movie file output (AVCaptureMovieFileOutput), which writes the captured data directly to a file. Again, we check that we can add the output before adding it to the session:

    @output = AVCaptureMovieFileOutput.alloc.init
    @session.addOutput(@output) if @session.canAddOutput(@output)

There are, of course, other types of output for different tasks, and they are listed in full in the docs here. In short, alongside the movie file output, there are outputs for directly processing the frame data of captured video and audio (for modifying it on the fly), as well as an output for capturing still images from a video device.

If you run the app now, you’ll see a blank window, and not much else happening (yet).

Blank window after session setup


Push to start, push to stop

So now everything is set up, but we need a way to start and stop the actual capture. Let’s add a button to our basic window to do just that. At the end of applicationDidFinishLaunching, add:

    @button = NSButton.alloc.initWithFrame(CGRectZero)
    @button.title = "Start"
    set_button_frame
    @mainWindow.contentView.addSubview(@button)

Then, below applicationDidFinishLaunching, add the methods we need: one to handle window resizing (similar to the article on positioning controls here), and one to set the button’s frame, which we use both initially and whenever the window is resized:

  def windowDidResize(notification)
    set_button_frame
  end

  def set_button_frame
    size = @mainWindow.frame.size
    button_size = [150, 30]
    # Centre the button: half the window size minus half the button size
    @button.frame = [
      [(size.width / 2.0) - (button_size[0] / 2.0), (size.height / 2.0) - (button_size[1] / 2.0)],
      button_size
    ]
  end
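The frame calculation in set_button_frame is just centring arithmetic: half the container’s width and height, minus half the button’s. Pulled out into a plain-Ruby function (centered_frame is a hypothetical helper, not part of the app), it can be checked without running the app.

```ruby
# Hypothetical helper mirroring set_button_frame's maths: returns
# [[x, y], [width, height]] for a box centred inside a container.
def centered_frame(container_width, container_height, box_size)
  box_width, box_height = box_size
  x = (container_width / 2.0) - (box_width / 2.0)
  y = (container_height / 2.0) - (box_height / 2.0)
  [[x, y], box_size]
end

# A 480x360 window with the article's 150x30 button
p centered_frame(480, 360, [150, 30])  # prints [[165.0, 165.0], [150, 30]]
```

Because the origin depends only on the current size, calling the same method from windowDidResize keeps the button centred as the window changes. Since @mainWindow.frame includes the title bar, using @mainWindow.contentView.frame.size instead would centre the button exactly within the content area.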

We also need to make sure the app delegate is the delegate for the window, in order for our windowDidResize event handler to be called, so add this at the bottom of buildWindow:

    @mainWindow.delegate = self

Then, we need to set the target/action for the button, and point it to a method to start the session capture and recording to file, so back in applicationDidFinishLaunching, add:

    @button.target = self
    @button.action = 'toggle_capture:'

And the implementation for the button handler goes as follows:

  def toggle_capture(sender)
    @is_running ||= false
    if @is_running
      @session.stopRunning
      @button.title = "Start"
    else
      @session.startRunning
      @button.title = "Stop"
    end
    @is_running = !@is_running
  end

Here we can see that we’re tracking whether we’re running or not, initially defaulting to false, and then based on that we’re either starting or stopping the session, and updating the button text accordingly. Finally, we’re flipping the @is_running tracking var for next time.

And if we fire up the app now, we’ll see we can click the button, have it toggle between Start and Stop, and you should notice that the light on your built-in iSight goes on and off as you start and stop the capture. But there is no output file, no actual saved recording. That’s because we need to start the output itself, and specify a filename, and the best way to do that is to respond to when the session has started by subscribing to a notification, and firing up our output in there.

Start button with no session running

Stop button with session running


audio + video = mp4

So first of all, at the end of applicationDidFinishLaunching, let’s add our notification handler code:

    NSNotificationCenter.defaultCenter.addObserver(self,
      selector: 'didStartRunning',
      name: AVCaptureSessionDidStartRunningNotification,
      object: nil)

We’re adding an observer for the AVCaptureSessionDidStartRunningNotification notification, and we’re asking that didStartRunning is called when that event is fired. The code for our didStartRunning method looks like this:

  def didStartRunning
    url = NSURL.alloc.initWithString("file:///Users/#{NSUserName()}/Desktop/temp#{Time.now.to_i}.mp4")
    @output.startRecordingToOutputFileURL(url, recordingDelegate: self)
  end

Here we’re constructing a URL to represent a file to save, using NSUserName() to grab the current user to use within the path. Then we call startRecordingToOutputFileURL with the URL and the app delegate as the recording delegate, which will begin to save the output from our capture session to file.
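Hard-coding /Users/<name> assumes the home folder lives under /Users, which is the default but not guaranteed, and a hand-built file:// string breaks if the path contains characters that need escaping. A plain-Ruby sketch of the same timestamped naming, built from a home directory instead, is below; desktop_movie_path and desktop_movie_url are hypothetical helpers, and /Users/alice is an example path. In RubyMotion, NSHomeDirectory() gives the home directory, and passing the path to NSURL.fileURLWithPath avoids building the URL string by hand.

```ruby
# Hypothetical helper: builds the same "temp<unix time>.mp4" Desktop path
# as didStartRunning, but from a home directory instead of /Users/<name>.
def desktop_movie_path(home, time = Time.now)
  File.join(home, "Desktop", "temp#{time.to_i}.mp4")
end

# file:// form, like the string passed to NSURL in the article
# (assumes the path contains no characters needing percent-escaping)
def desktop_movie_url(home, time = Time.now)
  "file://#{desktop_movie_path(home, time)}"
end

fixed_time = Time.at(1430906400)
puts desktop_movie_path("/Users/alice", fixed_time)
# prints /Users/alice/Desktop/temp1430906400.mp4
```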

Now when we fire up the app, each start and stop creates a new file containing the video from the iSight and the audio from the mic. However, you might notice that if you start and stop quite quickly, no file is created. If you start, wait a few seconds, and then stop, you’ll get a file, but it’ll be very short. This is because it takes a second or two to start writing the output to file, so it would be useful for the app to show when we’re actually recording.


Making it more responsive

First, let’s change our toggle_capture method so that a button click moves the app into a working state, and the UI isn’t updated further until the requested action completes. Clicking Start will change the button text to Starting…, and clicking Stop will change it to Stopping…; in both cases the button is disabled until the action has finished.

  def toggle_capture(sender)
    return if @is_working
    @is_running ||= false
    if @is_running
      @output.stopRecording
      @session.stopRunning
      @is_working = true
      @button.title = "Stopping..."
    else
      @session.startRunning
      @is_working = true
      @button.title = "Starting..."
    end
    @button.enabled = false
  end

You’ll see we have an additional variable, @is_working, used to ensure that while we’re in the working state, we don’t action any other button presses, and we also disable the button to be sure. This is only half the story though - now we need some callbacks from the output itself to know when it starts and stops recording so that we can update the UI, the @is_working var, and re-enable the button. You may have noticed when we start recording in didStartRunning, we make this call:

    @output.startRecordingToOutputFileURL(url, recordingDelegate: self)

This sets the app delegate as the delegate for recording events, and since toggle_capture now explicitly calls stopRecording too, we can implement a couple of delegate methods to respond to recording starting and stopping. Add the following:

  def captureOutput(output, didStartRecordingToOutputFileAtURL: url, fromConnections: connections)
    @button.title = "Stop"
    @button.enabled = true
    @is_working = false
    @is_running = true
  end

  def captureOutput(output, didFinishRecordingToOutputFileAtURL: url, fromConnections: connections, error: err)
    @button.title = "Start"
    @button.enabled = true
    @is_working = false
    @is_running = false
  end

We’re re-enabling the button and marking @is_working as false in both cases, for starting and stopping, but when we’ve started we’ll update the button text to Stop, and mark @is_running as true, and when we’ve finished we’ll update the button to show Start and set @is_running to false. Fairly simple, and we now have a more complete feedback loop with more robust handling of the way that the recording starts and stops, which provides better feedback to the user.

Working button state while session and recording is started


Tidy up

The last thing we’ll do is a quick bit of tidy-up to ensure that our recording and session stay in sync, even if there is an error. Right now, if the user presses Stop, both the session and the recording are stopped. But if the recording is stopped independently by the system for some reason (running out of disk space, for example), our session will keep running even though the recording has stopped: the button will update, but the webcam will stay on. To fix that, we’ll move this line from toggle_capture:

    @session.stopRunning

to the top of our didFinishRecordingToOutputFileAtURL method:

  def captureOutput(output, didFinishRecordingToOutputFileAtURL: url, fromConnections: connections, error: err)
    @session.stopRunning
    @button.title = "Start"
    # ...the rest of the method is unchanged

That’s it! Now when the start button is hit, we start the session, and when the session is started, we start the recording. When the stop button is pressed, we stop the recording, and if that happens, or if the recording is stopped for any other reason, then the session is stopped also.
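The finished flow is a small state machine (Start → Starting... → Stop → Stopping... → Start) with the session and the recording kept in step. The plain-Ruby model below replays it without AVFoundation. CaptureController and its session_started, recording_started and recording_finished methods are hypothetical stand-ins for the app delegate’s didStartRunning and the two captureOutput delegate callbacks, and log records the calls the real app would make on @session and @output.

```ruby
# Hypothetical model of the app delegate's state handling. Method names
# stand in for the real callbacks; `log` records the AVFoundation calls.
class CaptureController
  attr_reader :title, :enabled, :log

  def initialize
    @title = "Start"
    @enabled = true
    @is_running = false
    @is_working = false
    @log = []
  end

  # Button handler (toggle_capture in the article)
  def toggle_capture
    return if @is_working
    if @is_running
      @log << :stop_recording   # @output.stopRecording
      @title = "Stopping..."
    else
      @log << :start_session    # @session.startRunning
      @title = "Starting..."
    end
    @is_working = true
    @enabled = false
  end

  # AVCaptureSessionDidStartRunningNotification -> didStartRunning
  def session_started
    @log << :start_recording    # @output.startRecordingToOutputFileURL(...)
  end

  # captureOutput(..., didStartRecordingToOutputFileAtURL: ...)
  def recording_started
    @title = "Stop"
    @enabled = true
    @is_working = false
    @is_running = true
  end

  # captureOutput(..., didFinishRecordingToOutputFileAtURL: ...); this also
  # fires when the system ends the recording, so the session stops either way
  def recording_finished
    @log << :stop_session       # @session.stopRunning
    @title = "Start"
    @enabled = true
    @is_working = false
    @is_running = false
  end
end

c = CaptureController.new
c.toggle_capture       # user presses Start
c.toggle_capture       # ignored: still in the working state
c.session_started
c.recording_started
puts c.title           # prints Stop
c.recording_finished   # e.g. disk full, with no button press
p c.log                # prints [:start_session, :start_recording, :stop_session]
puts c.title           # prints Start
```

Keeping the final state changes in the delegate callbacks, rather than in the button handler, is what lets the model recover when the recording ends on its own.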


Next steps

So we’ve seen that it’s not too tricky to use the robust AVCaptureSession API to capture video and audio, and get an output file of the total captured footage. But it’d make the app more useful if it was possible to preview what the video looked like, and the audio levels so we can be sure everything is working while recording. Tomorrow I’ll have a post looking at how we can improve this app to do just that! Make sure to check back then, or follow us on Twitter so you know when that post is up. I’ll also have an overdue update on my RubyMotion Mac OS X book later in the week too, so stay posted for that!

UPDATE: Here is the next post, covering the video preview and audio level display!

Available now in early access: Building Mac OS X apps with RubyMotion!

Unlock the power of Ruby in your Mac OS X apps to build everything from utility and productivity apps, to developer tools and helpers, to fully fledged desktop user interfaces. You'll integrate with web APIs, with core system functions, learn powerful ways to build user interfaces, and more. You'll learn how to best structure your apps and to take advantage of the Ruby syntax to make your development more efficient than building the app in Objective-C.

Learn more or purchase now.
