Picture-in-Picture (PiP) is a floating video player which sticks to the foreground of your screen – regardless of which application you are using. This user-experience allows viewers to eyeball your content while multitasking.

This article takes you through the platforms that support PiP, the mechanisms to toggle it and its implementation. Readers can also find demonstrations and an overview of the market evolution. The second half contains bonus info and a conclusion.

Supported Platforms

Native Picture-in-Picture, also known as "Pop Out Video" or "Baby Player", is available as a feature on certain platforms. This feature can be toggled through a native user-interaction or an API.

The platforms below have support for native Picture-in-Picture:

Device | Platform | API | Date
Desktop | Maxthon 2.5.6.350+ | N/A | 2009-08-26
Mobile | iOS (iPad only) App 9+ | AVKit | 2015-09-16
Desktop | Opera 37+ | W3C | 2016-05-04
Desktop | MacOS Safari 9+ | WebKit | 2016-09-30
Tablet | iOS (iPad only) Safari 9+ | WebKit | 2016-09-30
Mobile | Android App 8.0+ | Activity | 2017-08-21
Smart TV | Android App 8.0+ | Activity | 2017-08-21
Mobile | Android Chrome 8.0+ | N/A | 2017-08-21
Streaming | FireTV 6.0+ | Activity | 2017-10-01
Desktop | Chrome 71+ | W3C | 2018-12-04
Desktop | Vivaldi 2.2+ | W3C | 2018-12-13
Streaming | Apple TV 13.2+ | AVKit | 2019-10-13
Desktop | Windows Firefox 71+ | N/A | 2019-12-03
Desktop | Apple/Linux Firefox 72+ | N/A | 2020-01-07
Desktop | Edge 79+ (Chromium) | W3C | 2020-01-15
Desktop & iPad | Safari 13.1+ | W3C | 2020-03-24
Table 1: Platforms with support for Picture-in-Picture.

There seems to be no native support for Pop Out video on the following platforms:

Device | Platform
Mobile | iPhone
Smart TV | WebOS
Streaming | Roku
Streaming | Chromecast
Console | Xbox
Console | PS4

The following platform(s) have a related API:

  • Samsung Tizen 2.3+, a Smart TV platform, has offered a TVWindow API since 2015-07-06. This API allows you to embed a 'real video source' (e.g. the input from your HDMI cable), but it does not allow you to create a floating video player containing your content.
  • Reach out to us if you know of other APIs!

Mechanisms

This article defines two mechanisms to toggle Picture-in-Picture. The first mechanism, a user-interaction, is controlled by the viewer. The second mechanism, an API, is controlled by the application developer.

👆 User-Interaction

Many platforms offer viewers a native user-interaction to enable Picture-in-Picture. Check one of the "[UI]" demos to view an example of this mechanism. We define two types of user-interactions.

  1. Direct user-interactions. For example, 1) viewers right-click the video and pick the PiP item from the context menu (e.g. Desktop Chrome); 2) viewers click a PiP button which the browser injects into the video player user-interface (e.g. Firefox); 3) viewers open a menu next to the address bar (e.g. Chromium Edge's Global Media Controls); and so on. There is an obvious, direct request from the viewer.
  2. Indirect user-interactions. For example, when the viewer is on their smartphone watching a video in fullscreen, and they click the home-button. Suddenly, the video continues in PiP. There is no obvious, direct request from the viewer.

In general, all browsers offer a user-interaction to trigger PiP. This is good news if your OTT service only lives on browsers. If your viewer really wants Picture-in-Picture, they can use the native user-interaction. Instead of spending time creating and maintaining code, you could let the marketers or product managers write and publish FAQs.

A typical Picture-in-Picture flow.

Native applications usually require you to implement an API to get it up-and-running.

💻 API

Developers can programmatically enable Picture-in-Picture through an API, as hinted at by Table 1. Check one of the "[API]" demos to view an example of this mechanism. There are two factions in the Picture-in-Picture API war: the "W3C Editor's Draft Interface" versus "everyone doing their own thing", creating a total of 4 different APIs.

W3C Picture-in-Picture API

The 'official' Web API is the W3C Picture-in-Picture API as described by the Editor's Draft at https://w3c.github.io/picture-in-picture/.

The W3C API is implemented by most Chromium-based Desktop Browsers (Chrome, Chromium Edge, Opera and Vivaldi). Safari 13.1 for iOS and MacOS also started supporting this API.

The W3C API can be reduced to three bullet points:

  1. Methods to request and exit PiP.
  2. An attribute to disable Picture-in-Picture.
  3. An event to monitor a presentation mode change.

Besides this, the API also allows:

  1. Videos to enter PiP automatically;
  2. Interaction with remote playback;
  3. Interaction with the Media Session API to customise controls.

This CodePen snippet provides a sample implementation of the W3C Picture-in-Picture API. This example should work on all browsers which implement the W3C API.
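The three building blocks listed above can be sketched in TypeScript. This is a minimal sketch rather than a reference implementation: `PipCapableVideo`, `PipCapableDocument`, `nextPipAction`, `togglePictureInPicture` and `watchPictureInPicture` are hypothetical names, and the interfaces are structural stand-ins so that a real `<video>` element and `document` fit them. The members they use (`requestPictureInPicture()`, `document.exitPictureInPicture()`, `document.pictureInPictureElement`, `document.pictureInPictureEnabled`, the `disablePictureInPicture` attribute and the `enterpictureinpicture` / `leavepictureinpicture` events) are the ones defined by the W3C draft.

```typescript
// Structural stand-ins for HTMLVideoElement and Document (hypothetical names).
// Real DOM objects satisfy these shapes; plain objects can mock them.
interface PipCapableVideo {
  readonly disablePictureInPicture: boolean;
  requestPictureInPicture(): Promise<unknown>;
  addEventListener(type: string, listener: () => void): void;
}

interface PipCapableDocument {
  readonly pictureInPictureEnabled: boolean;
  readonly pictureInPictureElement: unknown;
  exitPictureInPicture(): Promise<void>;
}

type PipToggleAction = "enter" | "exit" | "unavailable";

// Decide what a PiP toggle button should do for `video` right now.
function nextPipAction(video: PipCapableVideo, doc: PipCapableDocument): PipToggleAction {
  if (doc.pictureInPictureElement === video) {
    return "exit"; // this video is already floating
  }
  // 2. The attribute (page-level opt-out) and the browser/user-level switch.
  if (!doc.pictureInPictureEnabled || video.disablePictureInPicture) {
    return "unavailable";
  }
  return "enter";
}

// 1. Methods to request and exit PiP. Browsers reject requestPictureInPicture()
//    unless it runs in response to a user gesture, e.g. inside a click handler.
async function togglePictureInPicture(
  video: PipCapableVideo,
  doc: PipCapableDocument,
): Promise<PipToggleAction> {
  const action = nextPipAction(video, doc);
  if (action === "exit") {
    await doc.exitPictureInPicture();
  } else if (action === "enter") {
    await video.requestPictureInPicture();
  }
  return action;
}

// 3. Events to monitor presentation mode changes.
function watchPictureInPicture(video: PipCapableVideo, log: (message: string) => void): void {
  video.addEventListener("enterpictureinpicture", () => log("entered Picture-in-Picture"));
  video.addEventListener("leavepictureinpicture", () => log("left Picture-in-Picture"));
}
```

In a page, you would call `togglePictureInPicture(video, document)` from a button's click handler, because browsers reject `requestPictureInPicture()` when it is not triggered by a user gesture.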

Other APIs are offered by Android and Apple

The W3C API is the most widely adopted one, and the WebKit API is probably the second most popular. The following APIs are ranked by how likely you are to come into contact with them:

  1. WebKit JS API: This is the other known browser API, and it's developed by Apple. The WebKit API exposes a webkitPresentationMode property to read the current presentation mode, and a webkitSetPresentationMode() method to change it. The mode can either be 1) inline, 2) fullscreen or 3) picture-in-picture. A simple example is available on their website.
  2. iOS / MacOS AVKit API: This API can be used to enter PiP in iOS (iPad only) applications and MacOS applications. The syntax is different from the WebKit API and the W3C API.
  3. Android Activity API: This API can be used to enter PiP in Android applications. The syntax is different from the W3C API.
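Since Safari 13.1 added the W3C API while older Safari versions only have the WebKit one, a web player usually starts by probing which browser API is available. A minimal feature-detection sketch follows, assuming you prefer the W3C API whenever it is present; `detectPipApi`, `ProbeVideo` and `ProbeDocument` are hypothetical names. AVKit and the Android Activity API live in native code and cannot be detected this way.

```typescript
type PipApi = "w3c" | "webkit" | "none";

// Only the members we probe (hypothetical stand-ins). Real HTMLVideoElement
// and Document objects fit these shapes; the webkit* members exist in Safari.
interface ProbeVideo {
  requestPictureInPicture?: () => Promise<unknown>;
  webkitSupportsPresentationMode?: (mode: string) => boolean;
  webkitSetPresentationMode?: (mode: string) => void;
}

interface ProbeDocument {
  pictureInPictureEnabled?: boolean;
}

function detectPipApi(video: ProbeVideo, doc: ProbeDocument): PipApi {
  // W3C API: Chromium-based desktop browsers and Safari 13.1+.
  if (typeof video.requestPictureInPicture === "function" && doc.pictureInPictureEnabled === true) {
    return "w3c";
  }
  // WebKit API: Safari before 13.1.
  if (
    typeof video.webkitSupportsPresentationMode === "function" &&
    typeof video.webkitSetPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture")
  ) {
    return "webkit";
  }
  // No API (e.g. Firefox, most mobile browsers): only the native user-interaction.
  return "none";
}
```

In a page, this would be called as `detectPipApi(videoElement, document)` before deciding which toggle to wire to your PiP button.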

Use-cases

Picture-in-Picture creates more engagement opportunities for video content. The following use-cases tie in nicely with a pop-out player.

i. "One Eyeball Content". Your app offers videos where you are fine with your viewers only glancing at them. Your content doesn't require their full attention, but they should still be able to pause (and play) it when multi-tasking.
Examples: music videos, documentaries, ...

ii. "Connected Content". Your app offers videos which are connected to a specific activity. Your content is beneficial (or essential) to view when doing a related activity.
Examples: training/educational videos, screen-sharing presentations, ...

iii. "Force-feed Content". Your app offers videos which must reach the viewer.
Examples: advertisements, ...

A question for the readers: why did you decide to read this article? Share your use-case through LinkedIn or Twitter.

👩‍💻 Implementation

Somehow this article lit a fire under you. You need Picture-in-Picture, and you need it yesterday. The good news: you only have to get familiar with a maximum of 4 APIs.

1. W3C

❌ You could decide to ignore W3C-specification-compliant browsers, and rely on the viewer figuring out how to use the native UI. This leaves you 3 APIs to get familiar with.
✅ To programmatically implement Picture-in-Picture on these browsers, get started with Google's article on Watching video using Picture-in-Picture.

2. WebKit

❌ You could decide to use Safari's native <video>-element which has an out-of-the-box PiP icon. This leaves you 2 APIs to get familiar with.
❌ You could decide to skip programmatic Picture-in-Picture on Safari before version 13.1. As of Safari 13.1, developers can use the W3C Picture-in-Picture API.
✅ To programmatically implement Picture-in-Picture before Safari 13.1, get started with Apple's article on Adding Picture in Picture to Your Safari Media Controls.
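For Safari before 13.1, a toggle on top of the WebKit API could look like the sketch below. It assumes the `webkitSupportsPresentationMode()` and `webkitSetPresentationMode()` methods and the `webkitPresentationMode` property; `WebKitVideo` and `toggleWebKitPictureInPicture` are hypothetical names, and Apple's article remains the reference.

```typescript
// The three presentation modes of Apple's WebKit API.
type WebKitPresentationMode = "inline" | "fullscreen" | "picture-in-picture";

// Hypothetical structural stand-in for a Safari <video> element.
interface WebKitVideo {
  readonly webkitPresentationMode: WebKitPresentationMode;
  webkitSupportsPresentationMode(mode: WebKitPresentationMode): boolean;
  webkitSetPresentationMode(mode: WebKitPresentationMode): void;
}

// Switch between inline playback and PiP. Returns the requested mode,
// or null when this video cannot enter PiP (the mode is left untouched).
function toggleWebKitPictureInPicture(video: WebKitVideo): WebKitPresentationMode | null {
  if (!video.webkitSupportsPresentationMode("picture-in-picture")) {
    return null;
  }
  const next: WebKitPresentationMode =
    video.webkitPresentationMode === "picture-in-picture" ? "inline" : "picture-in-picture";
  video.webkitSetPresentationMode(next);
  return next;
}
```

The mode change is not guaranteed to be instantaneous, so the return value is only the requested mode; to react to the actual change, listen for the `webkitpresentationmodechanged` event on the `<video>`.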

3. AVKit

❌ You could decide to use the standard AV Player which has an out-of-the-box PiP button. This leaves you 1 API to get familiar with.
✅ To programmatically implement Picture-in-Picture, get started with Apple's article on Adopting Picture in Picture in a Custom Player.

4. Activity

✅ To programmatically implement Picture-in-Picture on Android (and FireTV), get started with Android's article on Picture-in-Picture support in Activities.

📺 Demos

The videos below illustrate Picture-in-Picture on different platforms. Items with "[UI]" are experiences where PiP is enabled with a native user-interaction. Items with "[API]" are experiences where PiP is enabled with an API call.

📈 Market evolution

Maxthon, a Chinese web browser, seemingly introduced "Tear-Off Video" around 2009. They built this feature on top of their Flash player, and it's only toggleable through a user-interaction.

Apple popularised Picture-in-Picture in 2015 by adding it to Safari. In the 5 years since then, Picture-in-Picture has been implemented on every platform, except iPhones, some streaming sticks & consoles, and non-Android Smart TVs.

2020

We'll keep this section updated. Reach out to us if anything is missing.

Tell me more...

Are you sure?

Consider continuing your reading journey if the following topics interest you:

  1. Variations;
  2. The PiP conundrum;
  3. Custom controls;
  4. Disabling Picture-in-Picture;
  5. No API;
  6. Context Menus are mean;
  7. Making money with PiP;
  8. Intrusion;
  9. Conclusion.

Variations

People associate different definitions with Picture-in-Picture. There's native PiP, which we have been talking about, but there are two other types.

In-App Picture-in-Picture

In-App Picture-in-Picture is a floating video player contained within a single-page application, website or native application.

This floating video player will not stick to your foreground when you navigate to another web page or application. This variation is often paired with a visibility API (e.g. the IntersectionObserver API on the web) to automatically enter PiP once the regular video player container scrolls out of view.

The implementation is up to the application developer. For example, for websites, developers often leverage CSS to configure an absolute position and high Z-index for the video player container.
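A minimal In-App Picture-in-Picture sketch, assuming `IntersectionObserver` as the visibility API. `player--floating` is a hypothetical CSS class carrying the positioning and z-index, and `updateFloatingPlayer` / `enableInAppPictureInPicture` are hypothetical helpers. The observer constructor is injected as a parameter so the logic can also run outside a browser; in a page you would pass the global `IntersectionObserver`.

```typescript
// Minimal shapes: a real Element has classList.toggle(token, force), and a
// real IntersectionObserverEntry has isIntersecting.
interface ClassTarget {
  classList: { toggle(token: string, force?: boolean): boolean };
}

interface VisibilityEntry {
  readonly isIntersecting: boolean;
}

interface ObserverLike {
  observe(target: unknown): void;
  disconnect(): void;
}

type ObserverConstructor = new (
  callback: (entries: VisibilityEntry[]) => void,
  options?: { threshold?: number },
) => ObserverLike;

// Hypothetical class; your stylesheet gives it the floating layout,
// e.g. a fixed (or absolute) position in a corner and a high z-index.
const FLOATING_CLASS = "player--floating";

// Float the player while its original container is out of view.
function updateFloatingPlayer(entries: readonly VisibilityEntry[], player: ClassTarget): void {
  for (const entry of entries) {
    player.classList.toggle(FLOATING_CLASS, !entry.isIntersecting);
  }
}

// Observe the container (which keeps its place in the layout), not the player.
function enableInAppPictureInPicture(
  container: unknown,
  player: ClassTarget,
  Observer: ObserverConstructor,
): ObserverLike {
  const observer = new Observer((entries) => updateFloatingPlayer(entries, player), { threshold: 0 });
  observer.observe(container);
  return observer; // call observer.disconnect() to stop floating behaviour
}
```

For example, `enableInAppPictureInPicture(placeholderDiv, playerDiv, IntersectionObserver)` with two hypothetical elements. Observing the container rather than the player matters: a floating player is visible by definition, so observing it would immediately dock it back.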

Compared to native Picture-in-Picture, this variation offers the advantage of full control. Developers can fully customise the floating video player, whereas the native one has a default set of nonadjustable controls. This could be useful when integrating advertisements in your Picture-in-Picture experience. (Or when you want to have a simple scrub bar in your floating video player, something which most native PiPs do not offer.)

Multi-screen, Multi-angle or Screen-in-screen

Multi-screen is when one video (player) is contained within another video player, or when multiple video players are placed side-by-side. This is Picture-in-Picture according to Wikipedia.

This experience is often associated with the more traditional television experience, where the feed of one camera is overlaying another feed, or when two channels/movies are playing next to each other.

Screen-in-screen (Yes, it's literally one picture in another picture).

This implementation is very useful for content with a social/live component. For example, in (e-)sports, you want to display the commentators and the sports game at the same time.

🤔 The PiP Conundrum

The primary advantage of Picture-in-Picture is straightforward. The feature offers viewers the possibility to interact with videos outside of their apps or websites. Logically speaking, if viewers have more 'locations' to consume content, it increases content consumption, thus benefiting content providers.

On the other hand, nonchalant viewership might not be in your best interest as it could reduce content consumption. Viewers might miss important scenes, lose track of what's happening, get less invested, and forsake the content.
We don't have any data or papers to back this claim, but it would definitely be an interesting A/B test.

And that, ladies and gentlemen, is the PiP conundrum.

Custom controls

In general, for browsers and Apple applications, it is not possible to configure custom controls for the floating video players which PiP spawns.

There is some hope though. The W3C spec writes that the Media Session API can be used to customise the available controls. On Desktop Chrome, developers can already use this Media Session API to map some controls.
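A sketch of such a mapping with the Media Session API's `setActionHandler()`. The action names are standard, but which of them a browser turns into buttons in its floating window differs per browser, so treat this selection as an assumption; `MediaSessionLike`, `PlayerCommands` and `mapPipControls` are hypothetical names.

```typescript
// Standard Media Session actions; which ones appear as buttons in a
// floating PiP window depends on the browser.
type PipControlAction = "play" | "pause" | "previoustrack" | "nexttrack";

// Structural stand-in for navigator.mediaSession.
interface MediaSessionLike {
  setActionHandler(action: PipControlAction, handler: (() => void) | null): void;
}

// Hypothetical commands exposed by your own player or playlist.
interface PlayerCommands {
  play(): void;
  pause(): void;
  previous(): void;
  next(): void;
}

// Map the PiP window's controls to the player. Returns a function that
// unregisters the handlers again (by passing null, as the API allows).
function mapPipControls(session: MediaSessionLike, player: PlayerCommands): () => void {
  const mapping: Array<[PipControlAction, () => void]> = [
    ["play", () => player.play()],
    ["pause", () => player.pause()],
    ["previoustrack", () => player.previous()],
    ["nexttrack", () => player.next()],
  ];
  for (const [action, handler] of mapping) {
    session.setActionHandler(action, handler);
  }
  return () => {
    for (const [action] of mapping) {
      session.setActionHandler(action, null);
    }
  };
}
```

In a page, something like `if ("mediaSession" in navigator) mapPipControls(navigator.mediaSession, myPlayer);`, where `myPlayer` is your own (hypothetical) player object.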

Instead of native Picture-in-Picture, you could opt for In-App Picture-in-Picture if your use-case permits it. This variation gives you full control over the look-and-feel, but it will no longer work cross-application.

You can customise the Picture-in-Picture UX (and controls) for Android applications, because you're styling an activity instead. (Check out the Android App [API] example!)

Disabling Picture-in-Picture

There are two approaches to disable Picture-in-Picture.

The first is programmatic by leveraging a Picture-in-Picture API. The W3C API gives you a disablePictureInPicture attribute which you can configure for a <video>-element. We've set up a CodePen snippet to demonstrate the implementation.

This API is unfortunately unavailable on most browsers – even the ones which implement the W3C API.
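Because support is patchy, it seems sensible to feature-detect before relying on the attribute. A minimal sketch with a hypothetical `tryDisablePictureInPicture` helper and `MaybeDisablableVideo` interface; the declarative equivalent is adding the `disablepictureinpicture` attribute to the `<video>` markup.

```typescript
// Structural stand-in: in browsers with support, HTMLVideoElement has a
// boolean disablePictureInPicture property reflecting the HTML attribute.
interface MaybeDisablableVideo {
  disablePictureInPicture?: boolean;
}

// Try to disable PiP for one <video>. Returns false when the browser does
// not know the property, so the caller can fall back to other measures
// (setting it anyway would only create a useless expando property).
function tryDisablePictureInPicture(video: MaybeDisablableVideo): boolean {
  if (!("disablePictureInPicture" in video)) {
    return false;
  }
  video.disablePictureInPicture = true;
  return true;
}
```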

You could also make it more difficult for viewers to trigger the native user-interaction.

  • You can create a custom context menu to respond to a right-click, for example by intercepting the contextmenu event (e.g. through oncontextmenu). This approach replaces the native right-click menu, which might have allowed the viewer to trigger Picture-in-Picture.
  • You can overlay invisible elements on top of your <video>-element. Viewers would now right-click an element (e.g. a <div>) which doesn't have a "Picture-in-Picture" item in its right-click menu.
  • You can configure pointer-events: none; for the <video>-element. The right-click event will no longer spawn a right-click menu.
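The first workaround can be sketched as a hypothetical `suppressNativeContextMenu` helper, which cancels the `contextmenu` event on a `<video>` (or on an overlay) and optionally opens your own menu through a callback. It relies only on the standard `addEventListener` / `preventDefault()` calls, and it removes the native "Picture-in-Picture" item along with the rest of the menu.

```typescript
// Minimal event shape: a real MouseEvent has preventDefault().
interface CancelableEvent {
  preventDefault(): void;
}

type ContextMenuListener = (event: CancelableEvent) => void;

// Hypothetical structural stand-in for a <video> element or an overlay <div>.
interface ContextMenuTarget {
  addEventListener(type: "contextmenu", listener: ContextMenuListener): void;
  removeEventListener(type: "contextmenu", listener: ContextMenuListener): void;
}

// Replace the native right-click menu on `target`. Side effect: the native
// "Picture-in-Picture" item disappears too. Returns a function that restores it.
function suppressNativeContextMenu(target: ContextMenuTarget, showCustomMenu?: () => void): () => void {
  const onContextMenu: ContextMenuListener = (event) => {
    event.preventDefault(); // stop the browser from opening its own menu
    showCustomMenu?.(); // hypothetical hook for your own menu
  };
  target.addEventListener("contextmenu", onContextMenu);
  return () => target.removeEventListener("contextmenu", onContextMenu);
}
```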

The second approach is by letting the viewer manually disable Picture-in-Picture in their browser settings.

Disabling Picture-in-Picture in Firefox.

This manual approach differs from browser to browser.

No API

There are two obvious camps with regard to Picture-in-Picture:

  1. Those who don't offer an API (e.g. Desktop Firefox and Mobile Browsers).
  2. Those who offer an API (e.g. Google on Desktop and Android applications, Apple on Desktop and in applications).

The trade-off is as follows:

  1. When there is no API, all end-users can immediately enjoy the feature and developers don't need to opt-in.
  2. When there is an API, developers can customise the user-experience and user-interface.

We identified three main scenarios where there is no Picture-in-Picture API:

  1. You're dealing with a mobile web browser;
  2. You're dealing with Firefox;
  3. You're dealing with the UC Browser.

The people at Mozilla wrote an interesting article on why they prefer no API.

Context Menus are mean

As we mentioned, a direct user-interaction is a common mechanism to enable Picture-in-Picture. This direct user-interaction often boils down to right-clicking the video player, and selecting a 'Picture-in-Picture' menu-item.

Chrome's direct user-interaction to toggle Picture-in-Picture.

This user-interaction is sometimes prevented by video players. How? The approaches described in the above "Disabling Picture-in-Picture" section are often implemented without taking Picture-in-Picture into account. Be careful when you create custom context menus, or when you overlay elements on top of your <video>-element.

YouTube's context menu doesn't allow viewers to trigger PiP.

Be sure to inform your team on the trade-offs of customizing the video player UI. The native right-click menu can be really useful, and you might be disappointing PiP-lovers by disabling this feature.

🤑 Making money with PiP

One of the allures of Picture-in-Picture is the idea that you can stream more advertisements to your viewers. While force-feeding advertisements is a good way to earn an extra buck, you have to remember that there's a lack of support for custom controls. This constraint means that you'll be able to generate ad impressions, but no ad clicks.

Two additional remarks about Picture-in-Picture are relevant here:

  • You can only push a <video>-element to Picture-in-Picture. Advertisement videos often live within <iframe>-elements, which can be rendered in fullscreen but cannot be pushed to Picture-in-Picture.
  • You cannot easily switch between <video>-elements; only one element can be active.

If you are currently debating client-side versus server-side ad-insertion (SSAI), you might want to favour SSAI. SSAI ads are stitched inside your stream and could guarantee smooth transitions between regular content and advertisement content.

You might be able to generate extra ad revenue with Picture-in-Picture, but it's bound to give you a headache. Also: viewers rarely like ads, and they definitely do not like PiP ads.

AVOD is not the only business model. You could sell Picture-in-Picture as one of your premium features. YouTube packages this as one of the features of YouTube Premium, which costs $11.99/month.

If you commercialize Picture-in-Picture, you need to figure out how to disable it for your free users. This is somewhat ironic, as it means you need to disable right-click menus.

Intrusion

Some people hate PiP. Imagine this: you are new to a website, and suddenly video players pop out right and left. Now imagine that these pop-out players are playing advertisements. This user-experience is a red flag to a lot of people, and they will want to leave your service ASAP.

Unwarranted intrusion is an argument why some platforms offer no support for a Picture-in-Picture API. Developers might abuse the API, and drag down the browsing experience. They argue that if a viewer really wants to enable PiP, they can use a native user-interaction.

Some platforms recognize this problem, and try to tackle it in different ways.

  • On Android Chrome, you need to be in fullscreen before Picture-in-Picture can be toggled. Chrome probably makes the assumption that because you are in fullscreen, the content must be of interest to you, because you did a user-interaction to put it in fullscreen. Hence, it should be OK to play the content in picture-in-picture when you click the home button if you are playing in fullscreen.
  • Some platforms only allow a Picture-in-Picture API when you are using a native application. Because you have downloaded and installed the app, you probably trust them, so if their PiP annoys you: complain to them.

One potential solution could be to evolve to a permission-based system similar to push notifications: applications would request the 'Picture-in-Picture permission', and end-users would explicitly grant it.

🏴 Conclusion

Implementing Picture-in-Picture requires a focused mind. Some platforms have a full-fledged API, and other platforms only allow user-interactions to toggle PiP. On top of that, the behavior is different for the same browser on Desktop versus Mobile. You basically need a cheat-sheet to keep track of everything.

Content providers should debate whether PiP is a worthy investment. Will it impact ad revenue? Will it ultimately decrease content consumption? Do people care enough to justify the R&D costs? Should you only leverage the APIs on non-browsers (because all browsers offer a native user-interaction to enable PiP)?

Not all PiPs are created equal. Personally, I prefer Chrome's and Opera's Desktop approach.

  • Chrome offers a user-interaction and an API. It also allows you to disable Picture-in-Picture through the API, something which the other Chromium-based browsers do not seem to offer. The trade-off is that your viewers need to know that they have to right-click the video to toggle PiP. (This user-interaction is less obvious than the one provided by Opera or Firefox.)
  • Opera offers a user-interaction and an API. On top of that, their floating video player offers a scrub bar! The trade-off is that Opera's user-interaction is always there: its button overlays the video player, but at least it looks nice.

The biggest bummer is the lack of support for PiP on iPhones. That being said, it is promising that iPhones recognise the W3C Picture-in-Picture API as of Safari 13.1.

Picture-in-Picture isn't a finished story. When we started writing this article in December 2019, there was no stable Edge release with Picture-in-Picture, and you had to use the WebKit API to do PiP on Apple browsers. The latter situation did a 180 in 2020.
Will we see more adoptions from Smart TVs, consoles and streaming sticks? We don't have a crystal ball, but we're confident that Picture-in-Picture will continue to bring stories in the months and years to come.

☝ What do you think? Join the discussion on LinkedIn or Twitter.