Professional videographers often shoot using a video camera and a separate audio recorder at the same time, to enable the capture of high-quality sound. It is also common for videographers to film a scene or event using multiple cameras at once to capture different angles.
PluralEyes, originally developed by Singular Software and subsequently acquired by Red Giant, is a popular application that automatically synchronizes footage from multiple cameras and audio recorders by matching up the audio content of media files, thereby saving many hours of painstaking manual synchronization. In 2011, PluralEyes 2.x took the form of a plug-in to several video-editing applications (known as non-linear editors, or NLEs). Its user interface consisted of dialog boxes for choosing parameters for the synchronization process and for specifying the input and output locations of files.
In 2011-2012, I collaborated with Singular Software to design PluralEyes 3, which would use the core technology of PluralEyes 2 but be far more capable and user-friendly. It was a fun project, as we had the freedom to design a completely new user experience while also having a huge base of user feedback from earlier releases.
A typical usage of PluralEyes 3 is shown in a 2.5-minute demo and independent review.
Bruce Sharpe, CEO of Singular Software, had taken every opportunity to get to know PluralEyes users, and had developed a remarkable understanding of how the product was being used. PluralEyes 2.x was a useful product with a loyal following, but he believed it had the potential to be far better. He envisioned PluralEyes 3 as a standalone application, which, compared to a plug-in, would allow richer data visualization and a smoother user experience.
At the beginning of the PluralEyes 3 project he set out some goals for the PluralEyes 3 user experience:
- The UI should be self-explanatory. For the common case in which the software can successfully synchronize files with one click, the user should not need to watch an introductory video or go to the Help menu. Although the software had some very advanced features that would require complex interactions, the simple tasks should be simple.
- Video editors use many different applications. The role of PluralEyes in this overall workflow should be clear.
- The UI should be visually attractive, clean, clutter-free, and demo really well.
From the beginning of the project, we also had a hunch that it might be fun for the user to watch the sync as it progressed, inspired by the primitive but hypnotic experience of watching Windows Disk Defragmenter do its work.
The small, distributed team at Singular Software followed an iterative development model. As a consultant for the project, I wrote vision statements and requirements based on the team’s excellent research, and served as its sole interaction designer. A freelance graphic designer joined us at the later stages to drive the visual design side of the project. Although I led the interaction design effort, many excellent creative ideas in the product came from other members of the experienced, user-focused team.
User and Task Analysis
I started the project with very little knowledge of videography. To kick-start my understanding of the domain and of our target users, I sat down with Bruce Sharpe for a series of brain-dump sessions and drafted a requirements document that we would extend and refine over the course of the project.
Our requirements document included a one-page list of functional requirements, a list of claims that would be printed on the back of the box if the software were packaged, and analysis of users and tasks. Personas helped us organize our understanding of users:
Bill has been a professional videographer for 10 years, and runs an independent videography company with his wife Alexa. Weddings are the mainstay of his business during the summer, and he also does corporate videos, stage productions, and the occasional music video. [Photo, not shown here for licensing reasons, of a smiling man in a suit]
Bill and Alexa use HDV pro cameras, and have two to four cameras on most projects. At events where they need to capture a soundboard feed, they find it very handy to hook up the soundboard to a portable audio recorder. That way, they can leave the audio recorder in the sound room and not have to deal with a long cable to connect the soundboard to a camera.
Bill and Alexa have a highly systematized set of conventions for storing and backing up files, and Bill feels that his business absolutely depends on successful file management. When Bill downloads files from a camera or audio recorder, he puts them in a specific folder for that project on an external drive. All post-production work that he does on the project also goes into that folder. Bill and Alexa often share post-production tasks, so the external drive is frequently unplugged from his machine and then plugged into Alexa’s for her to work on a project he started, and vice-versa.
The software that Bill uses the most includes Final Cut Pro 7, After Effects, Macintosh Mail, and a variety of iPad and iPhone apps.
Bill’s goals are:
- Above all, let me synchronize footage in fewer person-hours, i.e. save time.
- Manually synchronizing footage is really tedious. Save me from having to do it.
- Synchronize footage in fewer clock-hours.
- Let me have flexibility in where I put devices, so I can capture good sound without cable management issues.
- Any new technology needs to blend into my current media management strategy.
Context scenarios captured the range of goals, real-world concepts, and contexts of use that we could foresee for the product:
Bill is a professional wedding videographer, working on a wedding from two days ago. There were two videographers working at this wedding, with three cameras. The wedding day video files are in the following folders:
- Bridal prep
- Groom prep
- Photo shoot
Throughout the noisy three-hour reception, Bill and his partner used three cameras. One camera was mostly stationary. They each moved around with a camera with an attached shotgun mike, filming dances and speeches and going around to tables to interview guests. A separate audio recorder was hooked up to the sound board.
For the reception, they ended up with 50 video clips from Bill and 50 from his partner, one three-hour recording from the stationary camera (split into 20 files of 2 GB each), and one three-hour audio clip from the sound board.
The Reception folder also contains 3 clips that Bill shot of people arriving at the reception hall, and 1 clip of the string quartet playing on the front lawn just before the reception.
Bill starts PE3, loads in the 125 files from the Reception folder, and takes a minute to make sure that he’s added the clips from the right wedding. He asks PE3 to start the sync, and PE3 indicates that it will take three hours. Bill goes out for lunch, returns, and does some other work on his computer while waiting for the sync to finish.
PE3 reports that 9 of the video clips did not sync with the audio from the sound board. Four of them are from the pre-reception, when there was no feed from the sound board at all. The others are mostly short clips, such as relatives laughing during speeches and shots of the first dance from various angles. Bill plays through all 5 of these clips in a couple of minutes. Starting with a clip he more or less remembers, he moves it to where he thinks it should go, and PE3 moves it a little bit more. Bill plays through the clip and is satisfied that it syncs reasonably well with the audio.
Bill then proceeds to move the other 4 clips successfully. Bill does a spot check of a few speeches and his shoot of the first dance to see if they have synced well with audio from the sound board, saves the results of the sync as an AAF file, and starts getting together files from the wedding ceremony for PE3 to sync.
Our main context scenarios, whose names correspond to the names of personas, were:
- Bill syncs audio and video recordings for a wedding reception, with syncing issues
- Jeff syncs audio and video recordings for a corporate conference
- Jeff syncs audio and video for a dance recital, with drift issues
- Alan syncs takes for a music video
- Tony syncs audio and video for an indie narrative film, with orphaned clips
For all types of users, the following table describes what should be the most common task flow. The pathway for performing it should be highly prominent and efficient.

| User | System |
| --- | --- |
| The user identifies the video and audio clips that they want to sync, and arranges them into tracks corresponding to the recording devices they are from. If the clips are for a music video, the user indicates which clips are for which takes, rather than for which recording device. See details in: “User decides what clips to sync together, and associates them with recording devices or takes” | The system organizes clips from different devices or takes into tracks, drawing on file names, date stamps, and other metadata. The system might prompt the user for additional information while doing this. |
| The user indicates that they want to initiate the sync process, e.g. by clicking the “Sync” button. | The system starts the sync. It provides rich, engaging feedback throughout the sync process. Ideally, the system should visually “tell the story” of the smarts behind PE3, to make its value more obvious in demos. Good feedback will minimize the user’s perception of how long the sync is taking. |
| The user QAs the sync. See details in “User QAs the sync after syncing” | The system provides information to help the user assess whether the sync was successful, and to troubleshoot the sync if needed. |
| The user asks to save the results of the sync in one or more formats. See details in “User saves the results of a sync” | The system saves the sync results in the specified formats. |
After completing a successful sync, some possible next steps are:
- User opens the synced files in an NLE
- User initiates a sync for a different set of files, clearing the data associated with the current sync first.
Turning Research into Design
To make an interface intuitive, a useful design principle is to create a match between the system and the real world. As described in a popular list of heuristics, “The system should speak the users’ language, with words, phrases and concepts familiar to the user, rather than system-oriented terms.” The following example illustrates one of the ways we applied this principle, and how it made a difference to the UI.
The synchronization algorithm in PluralEyes works in terms of “tracks”. It needs to know how many “tracks” to synchronize with each other, and what clips are in each track. For example, if you used Camera A and Camera B to film a concert from different angles, the synchronization algorithm needs to be told which clips came from Camera A and which clips came from Camera B, otherwise it might try to synchronize clips from Camera A with other clips from Camera A. One of the design challenges we identified early on was:
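The matching itself relies on comparing the audio content of clips. PluralEyes's actual algorithm is proprietary, but the general idea of audio-based alignment can be sketched as a brute-force cross-correlation; the sample data below is hypothetical, standing in for decoded audio waveforms:

```python
import random

def find_offset(reference, clip):
    """Estimate where `clip` begins inside `reference` by brute-force
    cross-correlation: try every lag and keep the highest-scoring one."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(reference) - len(clip) + 1):
        score = sum(r * c for r, c in zip(reference[lag:], clip))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical data: a noisy "soundboard" track and a clip cut out of it.
random.seed(1)
reference = [random.gauss(0, 1) for _ in range(2000)]
clip = reference[640:840]            # the clip starts 640 samples in
print(find_offset(reference, clip))  # → 640
```

A production implementation would work on downsampled audio features and use FFT-based correlation rather than this O(n²) loop, but the principle of finding the best-matching lag is the same.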
How do we get users to put files from only one device into each track?
We were concerned that users would be confused about how to use tracks and might put files from multiple devices into the same track.
Looking at users’ real-world language helped us to reframe and solve this problem. It was clear that the natural vocabulary of videographers was “cameras” and “audio recorders”. Users would have to think if given a UI that talked about devices and tracks. But they could relate right away to a UI that asked of them, “Please give me a set of files that came from one of your cameras.”
This UI also helped to explain the role of the application in the overall workflow, making it clear that the system needed raw media files and not, for example, a file exported from an NLE.
We knew from research that most projects would include at least one camera and at least one audio recorder, so we made new projects have a folder for one of each by default. As the user put clips into folders (which the system automatically labeled “Camera n”, “Audio Recorder n”, and so on), the system automatically treated the contents of each folder as a single track.
(This solution worked for most users, but not quite all. A few users didn’t use PluralEyes to synchronize different cameras and audio recorders; they made music videos, synchronizing takes of the music video visuals with a prerecorded song. In this case, the synchronization algorithm needs each take to be on a separate track. For this small minority of users, we designed a separate project type for synchronizing takes, with functionality optimized for that particular task.)
Users found it simple to use our UI to indicate which clips came from which cameras and which audio recorders. In version 3.3, PluralEyes went a step further and automatically sorted clips based on metadata, a level of sophistication that was out of scope for the initial project. Even with the additional level of automation in PluralEyes 3.3, using the user’s natural concepts was still important, as it made it easy for users to immediately recognize what the automated process had done.
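The core of that automatic organization, grouping clips into one track per device, can be sketched in a few lines. The filenames and the “DEVICE_” naming convention here are hypothetical stand-ins; the actual product identifies devices from the clips’ recorded metadata rather than their names:

```python
from collections import defaultdict

def group_into_tracks(filenames):
    """Group clip filenames into tracks, one per source device.
    Assumes a hypothetical 'DEVICE_clipNNN.ext' naming convention;
    a real implementation would read device IDs from media metadata."""
    tracks = defaultdict(list)
    for name in filenames:
        device = name.split("_", 1)[0]
        tracks[device].append(name)
    return dict(tracks)

clips = ["CamA_001.mov", "CamA_002.mov", "CamB_001.mov", "Zoom_001.wav"]
print(group_into_tracks(clips))
# → {'CamA': ['CamA_001.mov', 'CamA_002.mov'],
#    'CamB': ['CamB_001.mov'], 'Zoom': ['Zoom_001.wav']}
```

Whatever the grouping key, the payoff described above is the same: each resulting group maps directly onto one track for the synchronization algorithm.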
I started creating screen mockups and storyboards early in the project, as soon as I had wrapped my head around the major scenarios. After I had brought the evolving design through enough internal review sessions to be confident that it was feasible to implement and going in a reasonable direction (which involved about a dozen iterations), we set up sessions for user feedback and usability testing with real professional videographers, to better understand what we needed to improve and what was already working.
OK. Before we look at the mockups, I’d like to ask you just a few quick questions.
First, what kind of video and still photography work do you do?
How many cameras and recording devices do you use in a given shoot? If it varies, tell me about the possible configurations and when you would use them.
What software applications do you use the most, including photo processing and uploading software, Adobe Bridge, etc.?
OK, great. We’re done with the questions, and we can start looking at things.
For the reception, there was a camera set up at the back of the room, and a videographer walked around with a video camera to get close-ups and different angles, and to interview guests. A portable audio recorder was used to capture the soundboard feed.
We’ll assume that you downloaded the files from the cameras and audio recorders, and put them into folders like this.
(For videographers only): As you are a professional videographer, we’d like to know if this folder structure makes sense to you. Or would you organize it any other way?
Wedding Reception Scenario
Now we’re going to look at what this new application will look like. When you launch PluralEyes, this is what you’ll see.
Now imagine that you want to synchronize the footage from the wedding reception. At any time during this task, you can ask to see the Finder window containing the files from the reception.
Just let me know when you want to see it. So coming back to the software…
… What would you do or click on to proceed? Please remember to think aloud as much as possible. Also please let me know when you think that you have finished the task.
Great. Your feedback is really useful for us. Let’s go over your impressions from the past few minutes.
- Do you remember seeing anything whose purpose wasn’t totally obvious?
- Did anything happen that you didn’t expect?
- Was it obvious to you which files to put into the system?
The main screen provides a clear entry point for users to begin their task by dragging and dropping files. A minimalist toolbar conveys the purpose of the application with the labeled commands “Synchronize” and “Export Timeline”. PluralEyes 3.0 required the user to drag and drop files from one camera or audio recorder at a time:
In PluralEyes 3.3, the system became smart enough to automatically organize files by detecting what camera or audio recorder they were from:
As the user adds files to a project, we provide rich feedback so that they can quickly notice if they have forgotten to add files from one of their cameras, or have mistakenly given the system files from the wrong wedding. We immediately show thumbnails, audio waveforms, and summaries of how many clips from how many cameras and audio recorders have been added. The user can choose to watch and listen to clips:
As a fun-to-watch way to provide feedback on the progress of the synchronization, we have the UI show clips moving into place in real time:
After the synchronization is completed, the user can play multiple tracks together to check that they are in sync, optionally touch-up the sync by manually moving any stray files into place, and export the synchronized timeline for editing in an NLE.
Release and Feedback
In July 2012 I heard the happy announcement that Red Giant Software had acquired the Singular Software business. Red Giant released PluralEyes 3.0 a few weeks later, to wide acclaim, including strong reviews from Studio Daily, Creative Planet Network, and customers.
Just about to use PluralEyes to sync up a video for @VentureCentreNZ – love watching PluralEyes at work – it’s like magic
— Mark Shingleton (@mark0s) May 19, 2014