Source: http://asciiwwdc.com/2014/sessions/513

 

 

Session 513 WWDC 2014

Discover how to use AV Foundation and Video Toolbox to access hardware accelerated encoding and decoding services. Gain best practices for when it is appropriate to use a high-level or low-level API for encoding or decoding. Learn about multi-pass export for improved H.264 encoding and see how to use it in your app.


[ Applause ]

Hi everyone.

Thanks for coming today.

My name is David Eldred, and I'm going to be talking about Video Encoders and Decoders today.

All right. Today we're going to talk about direct access to hardware encoders and decoders.

This will help users: it will improve the user experience in a number of ways. On the Mac, this will extend battery life, and users of iOS devices get improved battery life as well, every time they're doing video processing.

First, we're going to break this down into a few case studies. We're going to look at some common usage scenarios. In the first, you have compressed video coming in and you want to display it in a layer in your application. In the second, you want direct access to those decoded CVPixelBuffers. In the third, you have a sequence of frames and you'd like to compress those directly into a movie file. And in the fourth, you want the compressed sample buffers themselves, so you can send them over the network or do whatever you like with them. These are supported by APIs that we're introducing in iOS 8 and Yosemite.

So let's take a quick look at our media interface stack. I've focused on the video side in my view of this, because we're talking about video.

So at the top we have AVKit. This provides a very high-level interface for dealing with media.

Below that, we have AVFoundation. This gives you an Objective-C interface for a wide range of media tasks.

And below that, we have Video Toolbox. This has been present on OS X for a while, but now it's finally populated with headers on iOS. Video Toolbox gives you direct access to the encoders and decoders [applause].

And below that we have Core Media and Core Video. These define the types and utilities used in the interfaces in the rest of the stack.

Today we're going to focus on AVFoundation and the Video Toolbox. With these, you can decode frames for display in your application or compress frames directly into a file, and you can decompress to CVPixelBuffers or compress directly to CMSampleBuffers.

So a quick note on using these frameworks. Some people think they have to use the low-level interfaces to get hardware acceleration, but that's really not true. AVKit, AVFoundation and Video Toolbox will all use the hardware codecs when it's appropriate, when it's available on the system, and when you request it.

All right.

Before we get into the case studies, I'm going to do a quick look at the cast of characters that you'll encounter in these interfaces.

First off, there's CVPixelBuffer. This wraps a buffer of uncompressed raster pixel data, and the CVPixelBuffer wrapping gives you the information you need to access that data. It's got the dimensions, the width and the height, and the pixel format information you need to correctly interpret the pixel data.

Next, we've got the CVPixelBufferPool. This allows you to efficiently recycle CVPixelBuffer back ends. Pixel buffer allocations can be expensive, so the PixelBufferPool allows you to recycle them. A CVPixelBuffer is a reference-counted object; when its retain count drops to zero, its backing memory goes back into the pool for reuse next time you allocate a PixelBuffer from that pool.

The next thing is pixelBufferAttributes. You'll see these in our interfaces. These are CFDictionaries; there's no specific type for pixelBufferAttributes dictionaries. They describe your requirements when you're requesting CVPixelBuffers or a PixelBufferPool. This can include several things: the dimensions that you're requesting, the width and height; the pixel format, or a set of pixel formats, that you'd like to receive; and compatibility requirements, such as OpenGL, OpenGL ES or Core Animation compatibility.

All right.

Next, we've got CMTime. These are the timestamps and durations that you'll see in our interfaces. This is a rational representation of a time value: a 64-bit time value, which is the numerator, and a 32-bit time scale, which is the denominator. Rational times can be passed all the way through the pipeline and you won't have to do any sort of rounding on them.

All right.

Next, CMVideoFormatDescription. This describes a format of video data. This contains the dimensions, and there can be a set of extensions that go along with the CMVideoFormatDescription. For H.264, the extensions can include the parameter sets, and they can include color space information. We'll talk about that more a little bit later.

All right, next is CMBlockBuffer. This is the basic wrapper for arbitrary blocks of data in Core Media. When you see compressed video data, it will be wrapped in a CMBlockBuffer.

All right, now we have CMSampleBuffer. You'll see CMSampleBuffer show up a lot in our interfaces. These wrap samples of data, and they depend on several of the types that we've talked about here. They contain a CMTime; this is the presentation time for the sample. They contain a CMVideoFormatDescription; this describes the data inside of the CMSampleBuffer. For compressed video, a CMBlockBuffer holds the compressed video data. For uncompressed frames, the data may be in a CVPixelBuffer, or it may be in a CMBlockBuffer.

All right.

Next, we've got CMClock. This wraps a source of time: time is always moving and it's always increasing on a CMClock. The most common one that you'll see used is the HostTimeClock, which is based on mach absolute time. CMClocks are hard to control; you can't really control them. Time is always moving, always in the same direction, and always at a constant rate.

So CMTimebase provides a more controlled view onto a CMClock. For example, on top of the host time clock we could set the time to time zero on our time base. You can set the time of your time base, and you can set the rate at which it advances relative to the clock. CMTimebases can be created based on CMClocks, or they can be created based on other CMTimebases.

All right.

Let's hop into our first use case. You're receiving compressed H.264 video and you want to display it in a layer inside of your application. For this, we have AVSampleBufferDisplayLayer. It was introduced in Mavericks, and it's new in iOS 8.

So let's take a look inside AVSampleBufferDisplayLayer. It takes compressed video frames as input, and these need to be in CMSampleBuffers. Inside, it has a video decoder, and it will decompress the frames and display them at the appropriate time. But I mentioned we were getting our data off of the network. Commonly, H.264 data arrives off the network in the form of an elementary stream, while AVSampleBufferDisplayLayer wants CMSampleBuffers as its input. So we'll need to repackage the elementary stream data into CMSampleBuffers.

So let's talk about this. The H.264 spec defines a couple of ways of packaging H.264 data. The first one we'll talk about is Elementary Stream packaging. This is commonly used in streaming scenarios; you'll see it used by a lot of things with streams in their name. Next is MPEG-4 packaging. This is used in movie files and MP4 files, and CMSampleBuffers carry their H.264 data in MPEG-4 packaging.

So let's look closer at an H.264 stream. It consists of a sequence of data packaged in NAL Units, and these are Network Abstraction Layer units. These can contain a few different things. First off, they can contain sample data, the actual compressed frames; a single frame may be split across several NAL Units. The other thing that NAL Units can contain is parameter sets. These apply to all subsequent frames until a new parameter set arrives.

So let's look at Elementary Stream packaging. Here, the parameter sets are carried in NAL Units right inside the stream. This is great if you're doing sequential playback: the parameter sets apply to all subsequent frames until new parameter sets arrive. In MPEG-4 packaging, the parameter sets are instead carried in the CMVideoFormatDescription, and each CMSampleBuffer references this CMVideoFormatDescription, giving it direct access to the parameter sets. This is great for random access in a file: you can jump to any point and begin decoding at an I frame. So how do you build the CMVideoFormatDescription if you have an Elementary Stream coming in? You'll take the parameter sets from the stream, and you're going to have to package those in a CMVideoFormatDescription. We provide a handy utility for this: CMVideoFormatDescriptionCreateFromH264ParameterSets.

[ Applause ]

The other difference between Elementary Stream and MPEG-4 packaging is in the NAL Unit headers. In an Elementary Stream, each NAL Unit begins with a start code; in MPEG-4 packaging, we have a length code instead. So you'll remove the start code and replace it with a length code: that's the length of the NAL Unit. It's not that hard. You'll collect the NAL Unit or NAL Units for a frame from your Elementary Stream and replace the start code with a length code. And you'll wrap those NAL Units in a CMBlockBuffer. You'll also need a timestamp for use in your CMSampleBuffer. So you have a CMBlockBuffer, a CMTime, and a CMVideoFormatDescription; that's everything you need to create a CMSampleBuffer using CMSampleBufferCreate.

I said I'd talk about AVSampleBufferDisplayLayer and time. Each sample buffer carries a presentation time stamp. Well, how does the layer know when to display these frames? By default, it will be driven off of the host time clock. Well, that can be a little bit hard to manage; the host time clock isn't really under your control. Instead, you can create a CMTimebase with your own control, set that time base on our AVSampleBufferDisplayLayer, and the layer will display each frame at the appropriate time.

All right. There are a couple of cases that can describe how your data arrives. First off, there's the periodic source: frames arrive at the same rate at which they're displayed in the AVSampleBufferDisplayLayer. You'll see this in a streaming or video conferencing scenario. The next case is the unconstrained source: all of the data may be available at one time. You'll see this if you've downloaded a whole file, or if you're reading the CMSampleBuffers from a file.

All right, let's talk about the first case. This is really simple. Frames are arriving at the same rate at which they're being displayed, so you can just enqueue the sample buffers as they arrive. You use the enqueueSampleBuffer call.

All right. The unconstrained source is a little bit more complicated. You don't want to enqueue thousands of frames at once; no one will be happy with that. Instead, the AVSampleBufferDisplayLayer can manage the flow of data, and you can ask it when it has enough. You use requestMediaDataWhenReadyOnQueue. The layer will invoke your block whenever its internal queues are low and it needs more data. Inside the block, you enqueue sample buffers and loop while you're asking whether it has enough data. You use the isReadyForMoreMediaData call: while it returns true, you keep on feeding SampleBuffers in; when it returns false, that means it has enough and you can stop. So it's a pretty simple loop to write.

All right. So that's what we wanted to talk about with AVSampleBufferDisplayLayer. We talked about how to repackage an H.264 Elementary Stream into the CMSampleBuffers needed by your AVSampleBufferDisplayLayer, how to control timing by setting a CMTimebase on your AVSampleBufferDisplayLayer, and how to manage the flow of data with the AVSampleBufferDisplayLayer.

All right.

So let's dive into our second case. You have compressed video coming in, but you don't want to just display it in your application; you want to decompress it and get the decompressed pixel buffers. For this, the video decoder is one of the pieces we need, and we'll access it through the VTDecompressionSession.

VTDecompressionSession wants CMSampleBuffers as its input, and it emits decompressed frames as CVPixelBuffers in the output callback that you implement. To create a VTDecompressionSession, you'll need a few things. You need to describe the format of the source buffers that you'll be decompressing. This is a CMVideoFormatDescription; if you already have a CMSampleBuffer you want to decompress you can pull it right off the CMSampleBuffer. You can describe your requirements for your output pixel buffers; you use a pixelBufferAttributes dictionary for this. And you'll need to implement a VTDecompressionOutputCallback.

All right. Let's talk about specifying your requirements for the output PixelBuffers. You do this with a pixelBufferAttributes dictionary. Say you're going to use the output pixel buffers in an OpenGL ES render pipeline; you'll require that they be OpenGL ES compatible. So you use the OpenGLESCompatibilityKey and set it to true. Now, it can be tempting to make these dictionaries very specific, but there are some pitfalls here. Say you also set the BGRA pixel format key to true, while the decoder is running most efficiently outputting YUV CVPixelBuffers. Can it still comply with those requested attributes? And the answer is yes, but instead of handing its output directly to your callback, a conversion stage gets added to comply with your PixelBufferAttributes. If you'd asked only for OpenGL ES compatibility, the decoder's native output would have complied with the requested output requirements: it is OpenGL ES compatible, but it's certainly not BGRA. So an extra buffer copy gets added to convert that YUV data to BGRA data. And extra buffer copies are bad: they lead to decreased performance and decreased battery life. So the moral of the story here is: don't over-specify.

All right, so let's talk about your Output Callback. This is where you'll receive your decompressed frames as CVPixelBuffers, along with the presentation timestamp for each PixelBuffer. If a frame was dropped or there's an error, you'll receive that information in the Output Callback; the callback is invoked once per frame, even if there's an error, even if it's dropped. To feed frames to your VTDecompressionSession, you call VTDecompressionSessionDecodeFrame.

You need to provide these frames in decode order. By default, the VTDecompressionSession will operate synchronously: your output callback will be called before VTDecompressionSessionDecodeFrame returns. You can also request asynchronous decompression by setting the EnableAsynchronousDecompression flag. All right, let's talk about Asynchronous Decompression then. In this case, VTDecompressionSessionDecodeFrame will generally return immediately as it hands the frame off to the decoder. But decoders often have limited pipelines for decoding frames, so the call may block until space opens up in the decoder's pipeline. We call this decoder back pressure. So asynchronous decompression is usually quick, but the call can still block in some cases, so be aware of that, and don't perform UI tasks on that thread. Finally, you can call VTDecompressionSessionWaitForAsynchronousFrames to wait for any outstanding asynchronous frames to be emitted from the decompression session.

So sometimes, we'll be decoding a sequence of video frames and the format will change mid-stream. Say we created our VTDecompressionSession out of the first parameter sets that we encountered, with our first SPS and PPS. We can keep feeding frames in until we encounter a new SPS and PPS in the stream, which gives us a new CMVideoFormatDescription. Your session may or may not be able to handle the change between these format descriptions. You can ask it, using VTDecompressionSessionCanAcceptFormatDescription, whether it can switch to FormatDescription two. If the answer is yes, it can handle the newly accepted FormatDescription, so you can keep passing frames into the Decompression Session and everything will work fine. If the answer is no, you'll need to create a new VTDecompressionSession, and be sure to pass the new frames into that one. Tear down the old VTDecompressionSession when you're no longer using it.

All right.

So that's what we wanted to talk about with the VTDecompressionSession: creating it, using a pixelBufferAttributes dictionary for specifying your output requirements, feeding in frames, and handling changes in CMVideoFormatDescription.

So with that, let's hop into case three. You have a sequence of frames and you want to compress those directly into a movie file. Well, you may be familiar with this already: we have AVAssetWriter. It takes uncompressed frames as input, compresses them, and can write them optimally into a movie file. For more details, see the Working with Media in AVFoundation session.

All right.

Let's just hop straight into case four. You have a sequence of frames to compress, but you don't want to write into a movie file. You want direct access to those compressed SampleBuffers, rather than going through the AVAssetWriter; for example, you may be sending that compressed data out over the network. For this we have the VTCompressionSession. To create one, you'll need a few things, and this is really simple. You specify the dimensions and format for your compressed output, you can provide pixelBufferAttributes describing the source frames you'll send to the VTCompressionSession, and you'll need to implement a VTCompressionOutputCallback.

So you've created a VTCompressionSession. Now you want to configure it. You do this with VTSessionSetProperty. I'm going to highlight a few properties you can set with VTSessionSetProperty calls, but this is not an exhaustive list. The first one I'm going to mention is AllowFrameReordering. By default, the H.264 encoder will allow frames to be reordered, meaning frames may be emitted in a different order than the order in which they were provided. Set this to false if you can't allow frame reordering. Next one: AverageBitRate. This is how you set a target bit rate for your H.264 encoder. All right, and then there's the RealTime property. This tells the session whether compression is happening in real time, like a live capture, or is more of a background activity like a transcode operation. The last one I want to mention here is the ProfileLevelKey. This lets you specify H.264 profile and level, or specify just a profile and allow us to choose the correct level. And this is definitely not an exhaustive list; take a look in VTCompressionProperties.h and see what we have for you.

To feed frames to your VTCompressionSession, you call VTCompressionSessionEncodeFrame. Along with each frame, you'll provide the presentation timestamp. You need to feed the frames in in presentation order, so the presentation timestamps must be increasing: no duplicate timestamps, and no timestamps that go backwards. Encoders may have a window of frames that they'll operate on, so your output may be delayed until a number of frames have been pushed into the encoder. If you need the output now, you can call VTCompressionSessionCompleteFrames. All pending frames will be emitted.

All right. Let's talk about your Output Callback. This is where you'll receive your output CMSampleBuffers; these contain the compressed frames. If a frame was dropped or there's an error, you'll receive that information here. And a final thing: frames will be emitted in decode order. With frame reordering, this may not be the same as presentation order; they'll be emitted in decode order.

All right. Well, so you've compressed a bunch of frames. You have compressed frames in CMSampleBuffers, which means that they're using MPEG-4 packaging. To send these out over the network, you may need to convert them over to Elementary Stream packaging, and for that you're going to have to do a little bit of work. So we talked about the parameter sets before. In a CMSampleBuffer, the parameter sets for H.264 will be in the CMVideoFormatDescription; for an Elementary Stream, you'll need to pull those out and package them as NAL Units to send out over the network. Well, we provide a handy utility for that too: CMVideoFormatDescriptionGetH264ParameterSetAtIndex. And, in the opposite of what we did with AVSampleBufferDisplayLayer, you'll need to convert those length headers on the NAL Units into start codes.

All right.

So that's what we wanted to talk about with the VTCompressionSession. We talked about creating the VTCompressionSession, how to configure it using VTSessionSetProperty calls, how to feed frames to the compression session, and how to convert the output into H.264 Elementary Stream packaging.

All right. With that, I'm going to hand things over to Eric so he can talk about Multi-Pass.

Good morning everyone. My name is Eric Turnquist, and I'm here to talk to you about Multi-Pass Encoding.

When we encode video, we're always trading off quality versus bit rate. Nobody likes seeing bad quality video, but higher quality generally means more bits in the output media file. So let's say we're preparing some content. If you're like me, you go for high quality first. So great, we have high quality. Now in this case, what happens with the bit rate? It's going to be high; the encoder needs the bits to have a high bit rate. That's a problem if we're streaming this content or storing it on a server. So say we lower the bit rate instead: the bandwidth problem goes away, but the quality isn't going to stay this high. Unfortunately, that's also going to go down as well, and the output may no longer look like the source. We don't want this either. What we really want is both: high quality and low bit rate. And that's what Multi-Pass Encoding brings to AVFoundation and Video Toolbox.

to AVFoundation and Video Toolbox.

Yeah, so first off, what is Multi-Pass Encoding? Well, let's do a review of what Single-Pass Encoding is first. So this is what David covered in his portion of the talk: source frames going into the encoder and being emitted as compressed samples. In this case, we're going to a movie file. The compressed samples are written out, and we're left with our output movie file. Simple enough.

Let's see how Multi-Pass differs. Again, source frames go into the encoder and are emitted as compressed samples. Now we're going to change things up a little bit. So we're going to have our frame database; this will store the compressed samples between passes for Multi-Pass. And we're going to have our encoder database. This will store frame analysis between passes, which the encoder can look at on the next pass to get better quality. At the end of a pass, the encoder can say: I'm finished, or I want more passes. In this case, let's assume that we're finished. The compressed samples are sitting in the frame database, so we need one more step: we copy the samples from the frame database to the output movie file and that's it. We have a Multi-Pass encoded video track in a movie file.

Cool. Let's go over some encoder features. First, Multi-Pass uses the same hardware encoders as Single-Pass, so you're not losing any hardware acceleration there. Second point is that Multi-Pass has knowledge of the future. Now it's not some crazy time-traveling video encoder. Bonus points to whoever filed that enhancement request. It is able to see your entire content, so instead of having to make assumptions about what might come next, it knows what comes next, so it can make much better decisions there. Third, it can change decisions that it's made. So in Single-Pass, as soon as a frame is emitted, that's it; the encoder can no longer change its mind about what it's emitted. In Multi-Pass, the encoder can revise frames on later passes to achieve optimal quality. It's kind of like having a very awesome custom encoder for your content.

So that's how Multi-Pass works, and some of its features. Let's talk about new APIs. So first off, let's talk about AVFoundation. In AVFoundation, we have a new AVAssetExportSession property, new APIs on AVAssetWriterInput, and we have random-access reading on AVAssetReaderOutput. Let's start with AVAssetExportSession. Here you're reading samples from a source movie file, transcoding them, and writing them to a movie file. So in this case, what does AVAssetExportSession provide? Well, it does all of this for you. It's the easiest way to transcode media on iOS and OS X. So let's see what we've added here. Everything about Multi-Pass is taken care of for you automatically: it manages the passes and sends the samples between passes, and if Multi-Pass isn't supported for your configuration, it'll use Single-Pass. So you just set canPerformMultiplePassesOverSourceMediaData and you're automatically opted into Multi-Pass, and that's it. So for a large majority of you, this is all you need.

Next, let's talk about AVAssetWriter. With AVAssetWriter, you're coming from uncompressed samples; you want to compress them and write them to a movie file. You might be coming from an OpenGL or OpenGL ES context. In this case, what does AVAssetWriter provide? It compresses the samples and writes them to the output movie file. You might use this when you're reading samples from a source file and modifying the buffers in some way; in that case, you'd have an AVAssetReaderOutput and an AVAssetWriterInput, and you're responsible for sending samples from one to the other. Let's go over the new AVAssetWriterInput APIs. To opt in to Multi-Pass, set performsMultiPassEncodingIfSupported and you're automatically opted in. Then, at the end of each pass, you need to mark the current pass as finished.

So what does this do? Well, this triggers the encoder analysis. The encoder decides whether it wants another pass, and if so, what time ranges it wants: the whole sequence again, or subsets of the sequence. So how does the writer input tell you about what time ranges it wants for the next pass? Well, that's through AVAssetWriterInputPassDescription. Note that the time ranges are half-open: a range from time three to time seven includes the samples from time three up to seven, but not the sample at time seven. The ranges can cover the whole sequence or subsets of the sequence, and you can find out what the encoder has requested by calling sourceTimeRanges. To be notified about new pass descriptions, you provide a block, and the writer input will invoke it when the encoder is ready to reply with what decisions it's made. You then ask the currentPassDescription to give you that answer: it tells you about the next pass, about what content the encoder wants for the next pass.

Let's see how that all works in a sample. So here's our sample. We have our block callback that you provide. Inside that callback you call currentPassDescription. If it's non-nil, the encoder has requested content for the next pass: you reconfigure your source to deliver those time ranges, and you continue supplying data for the next pass with requestMediaDataWhenReadyOnQueue. If the pass description is nil, that means the encoder has finished its passes. Then you're done. You can mark your input as finished.

All right, let's say you're going from a source media file. That was in our second example. So we have new APIs for AVAssetReaderOutput. You opt in to random access by saying supportsRandomAccess equals yes. Then, when a pass requests specific time ranges, you call resetForReadingTimeRanges with an NSArray of time ranges to deliver those time ranges. Once you know there will be no more passes, you call markConfigurationAsFinal. This lets the reader advance to its completed state so it can start tearing itself down.

Let's see how these work in combination together. First, we check if the AVAssetWriterInput supports Multi-Pass. If it does, we need to support random access on the source, so we set supportsRandomAccess on the AVAssetReaderOutput feeding the AVAssetWriterInput. Then, for each pass, we reconfigure the reader output with the pass description's time ranges. Let's go over that in the sample. When the encoder requests another pass, we now want to deliver those ranges from our AVAssetReaderOutput, so we call resetForReadingTimeRanges with the pass description's sourceTimeRanges.

Great. So that's the new API in AVFoundation for Multi-Pass. Let's talk next about Video Toolbox. In Video Toolbox, the encoder analysis database is what we like to call our VTMultiPassStorage, and the compressed sample database is what we call the VTFrameSilo. Let's walk through the Multi-Pass flow with the objects that we actually use. We have our VTCompressionSession, our VTFrameSilo, and our VTMultiPassStorage. We feed frames in: compressed samples go into the FrameSilo, and the encoder's analysis goes into the MultiPassStorage. We're done with this pass. If the encoder wants to see samples again, we send in those samples that it requests. When the passes are finished, we tear down the MultiPassStorage and the compression session and we're left with our FrameSilo. Then we copy the samples from the FrameSilo to the output movie file. Great, we have our output movie file.

So first off, let's go over what the VTMultiPassStorage is. This stores the encoder analysis. This is a pretty simple API: you create the file with VTMultiPassStorageCreate, and then you close the file with VTMultiPassStorageClose once you're finished. So that's all the API that you need to use; the contents are private to the encoder, and you don't have to worry about it.

Next, let's talk about additions to VTCompressionSession. First, you need to tell the compression session and the encoder about your VTMultiPassStorage. So you can do that by setting a property; the encoder will then perform Multi-Pass encoding and use this VTMultiPassStorage for its frame analysis. Next, we've added a couple of functions for Multi-Pass. You call VTCompressionSessionBeginPass to start a pass, and once you've sent all of the frames for that pass, you call VTCompressionSessionEndPass. EndPass will tell you if another pass can be performed, and you can then ask the session what time ranges of samples it wants for the next pass with VTCompressionSessionGetTimeRangesForNextPass, which gives you a count and a C array of time ranges.

Now let's talk about the VTFrameSilo. So this is the compressed sample store. From your compression output callback, you call VTFrameSiloAddSampleBuffer to add samples to this VTFrameSilo. It handles replacing samples between passes, and you don't need to worry about it. It's a convenient database for you to use. Then you can prepare the VTFrameSilo for the next pass with VTFrameSiloSetTimeRangesForNextPass; this optimizes the storage for the next pass. Finally, you write the samples to the output movie file. You can retrieve samples for a given time range with VTFrameSiloCallBlockForEachSampleBuffer: it invokes the block that you provide for each sample, and you can add them to your output movie file.

Right, that's the new Video Toolbox API. So I want to close with a couple of considerations: when to use Multi-Pass, based on your content and your priorities in your app. If you're in a real-time scenario, like capture or video conferencing, there's no time for multiple passes, so use Single-Pass in these cases. If you can't afford the extra time or power during encoding, use Single-Pass: Multi-Pass reads and encodes the source multiple times, the compressed samples take storage between passes, and so will the encoder analysis. If you just need a quick export or transcode operation, use Single-Pass. On the other hand, if the encoding time matters less than the output media file, Multi-Pass is a great option. If you want the best quality for your content in as few bits as possible, use Multi-Pass: the encoder can see your entire content, and so it can allocate bits only where it needs to. It's very smart in this sense. So if your users are willing to trade time for better quality, Multi-Pass is a good option. To know for sure, you need to experiment; consider how long your users are willing to wait, and if they're willing to wait longer for better quality.

Next, let's talk about content. If you have low complexity content, like a talking head or a static image sequence, Multi-Pass probably won't do much better than Single-Pass. These are both pretty easy to encode. Next, let's talk about high complexity content: water, fire, explosions. This content is hard no matter how many passes you make, and Multi-Pass probably won't do much better than Single-Pass here either. So where is Multi-Pass a better decision? Mixed complexity content, like a movie you've edited in Final Cut Pro or an iMovie Trailer: low complexity scenes mixed with high complexity transitions. Here, Multi-Pass can move bits to where they're needed and really give you the best quality per bit. Again, to know for sure, you need to experiment; try encoding your content both ways and see if Multi-Pass will give you a good benefit in these cases.

So let's go over what we've talked about today. The AVFoundation APIs give you high-level access to the hardware codecs, and for most of you, these are the APIs you will be using. The Video Toolbox APIs provide you direct media access; if you need direct access to the encoders and decoders, this is a good way to use Video Toolbox. And for Multi-Pass, consider your content, your use cases and your users before you enable it.

So for more information, here's our Evangelism email. We have documentation and a programming guide, and we can answer your questions on the developer forums. There are also related sessions; a lot of these talks have already happened, so check out the videos for any you're interested in.

Thanks everyone and have a good rest of your day.

[ Applause ]


Direct Access to Video Encoding and Decoding