Picture-in-Picture (PiP) is a floating video player which sticks to the foreground of your screen, regardless of which application you are using. This user-experience allows viewers to eyeball your content while multitasking.
This article takes you through the PiP-enabled platforms, PiP's mechanisms and its implementation. Readers can also find demonstrations and an overview of the market evolution. The second half contains bonus info and a conclusion.
What's your stance on Picture-in-Picture? (https://t.co/45gO3yIpWF)
— Thijs Lowette (@thijsl_) April 8, 2020
Supported Platforms
Native Picture-in-Picture, also known as "Pop Out Video" or "Baby Player", is available as a feature on certain platforms. This feature can be toggled through a native user-interaction or an API.
The platforms below have support for native Picture-in-Picture:
Device | Platform | API | Date |
---|---|---|---|
Desktop | Maxthon 2.5.6.350+ | N/A | 2009-08-26 |
Mobile | iOS (iPad only) App 9+ | AVKit | 2015-09-16 |
Desktop | Opera 37+ | W3C | 2016-05-04 |
Desktop | MacOS Safari 9+ | WebKit | 2016-09-30 |
Tablet | iOS (iPad only) Safari 9+ | WebKit | 2016-09-30 |
Mobile | Android App 8.0+ | Activity | 2017-08-21 |
Smart TV | Android App 8.0+ | Activity | 2017-08-21 |
Mobile | Chrome on Android 8.0+ | N/A | 2017-08-21 |
Streaming | FireTV 6.0+ | Activity | 2017-10-01 |
Desktop | Chrome 71+ | W3C | 2018-12-04 |
Desktop | Vivaldi 2.2+ | W3C | 2018-12-13 |
Streaming | Apple TV 13.2+ | AVKit | 2019-10-13 |
Desktop | Windows Firefox 71+ | N/A | 2019-12-03 |
Desktop | MacOS/Linux Firefox 72+ | N/A | 2020-01-07 |
Desktop | (Chromium) Edge 79+ | W3C | 2020-01-15 |
Desktop & iPad | Safari 13.1+ | W3C | 2020-03-24 |
There seems to be no native support for Pop Out video on the following platforms:
Device | Platform |
---|---|
Mobile | iPhone |
Smart TV | WebOS |
Streaming | Roku |
Streaming | Chromecast |
Console | Xbox |
Console | PS4 |
The following platform(s) have a related API:
- Samsung Tizen 2.3+, a Smart TV platform, has offered a TVWindow API since 2015-07-06. This API allows you to embed a 'real video source' (e.g. the input from your HDMI cable). It does not allow you to create a floating video player containing your own content.
- Reach out to us if you know of other APIs!
Mechanisms
This article defines two mechanisms to toggle Picture-in-Picture. The first mechanism, a user-interaction, is controlled by the viewer. The second mechanism, an API, is controlled by the application developer.
User-Interaction
Many platforms offer viewers a native user-interaction to enable Picture-in-Picture. Check one of the "[UI]" demos to view an example of this mechanism. We define two types of user-interactions.
- Direct user-interactions. For example, 1) viewers right-click the video and pick an item from the context menu (e.g. Desktop Chrome); 2) viewers click a PiP-button which the browser injects into the video player user-interface (e.g. Firefox); 3) viewers open a menu next to the address bar (e.g. Chromium Edge's Global Media Controls); and so on. There is an obvious, direct request from the viewer.
- Indirect user-interactions. For example, when the viewer is on their smartphone watching a video in fullscreen, and they click the home-button. Suddenly, the video continues in PiP. There is no obvious, direct request from the viewer.
In general, all browsers offer a user-interaction to trigger PiP. This is good news if your OTT service only lives on browsers. If your viewer really wants Picture-in-Picture, they can use the native user-interaction. Instead of spending time creating and maintaining code, you could let the marketeers or product managers write and publish FAQs.
Native applications usually require you to implement an API to get it up-and-running.
API
Developers can programmatically enable Picture-in-Picture through an API, as hinted by the API column of the first table. Check one of the "[API]" demos to view an example of this mechanism. There are two factions in the Picture-in-Picture API war: the "W3C Editor's Draft Interface" versus "everyone doing their own thing", which results in a total of 4 different APIs.
W3C Picture-in-Picture API
The 'official' Web API is the W3C Picture-in-Picture API as described by the Editor's Draft at https://w3c.github.io/picture-in-picture/.
The W3C API is implemented by most Chromium-based Desktop Browsers (Chrome, Chromium Edge, Opera and Vivaldi). Safari 13.1 for iOS and MacOS has also started supporting this API.
The W3C API can be reduced to three bullet points:
- Methods to request and exit PiP.
- An attribute to disable Picture-in-Picture.
- An event to monitor a presentation mode change.
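As a rough illustration of the first two bullet points, the sketch below toggles PiP with the W3C methods `requestPictureInPicture()` and `exitPictureInPicture()`, after checking `pictureInPictureEnabled`, `pictureInPictureElement` and `disablePictureInPicture`. It is a minimal TypeScript sketch, not a production implementation: the `PipVideo` and `PipDocument` interfaces and the `togglePictureInPicture` helper are hypothetical names which only describe the subset of `HTMLVideoElement` and `Document` being used, so the code can also run against test doubles.

```typescript
// Structural subsets of HTMLVideoElement and Document covering the W3C
// Picture-in-Picture members used below (the interface names are hypothetical).
interface PipVideo {
  disablePictureInPicture?: boolean;
  requestPictureInPicture(): Promise<unknown>;
}

interface PipDocument {
  pictureInPictureEnabled?: boolean;
  pictureInPictureElement: object | null;
  exitPictureInPicture(): Promise<void>;
}

type PipToggleResult = "entered" | "exited" | "unavailable";

// Enters PiP for `video`, or exits PiP when `video` is already the floating one.
async function togglePictureInPicture(
  video: PipVideo,
  doc: PipDocument,
): Promise<PipToggleResult> {
  // PiP is unavailable for the whole document, or disabled on this element.
  if (!doc.pictureInPictureEnabled || video.disablePictureInPicture) {
    return "unavailable";
  }
  // This video is currently floating: leave PiP.
  if (doc.pictureInPictureElement === video) {
    await doc.exitPictureInPicture();
    return "exited";
  }
  // Request PiP. Browsers expect this to happen during a user gesture (e.g. a click).
  await video.requestPictureInPicture();
  return "entered";
}
```

In a browser, you could call `togglePictureInPicture(video, document)` from a button's click handler. Calling it from a user gesture matters, because the W3C API rejects `requestPictureInPicture()` without one.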
Besides this, the API also allows:
- Videos to enter PiP automatically;
- Interaction with remote playback;
- Interaction with the Media Session API to customise controls.
This CodePen snippet provides a sample implementation of the W3C Picture-in-Picture API. This example should work on all browsers which implement the W3C API.
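For the event bullet point, the W3C API fires `enterpictureinpicture` and `leavepictureinpicture` events on the `<video>`-element. The TypeScript sketch below, assuming only those two event names, forwards them to a single callback. `PipEventSource` and `watchPictureInPicture` are hypothetical names; a real `<video>`-element can be passed in directly.

```typescript
// Structural subset of an EventTarget such as an HTMLVideoElement (hypothetical name).
interface PipEventSource {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Reports PiP state changes for one <video>-element through `onChange`.
// Returns a cleanup function that removes both listeners again.
function watchPictureInPicture(
  video: PipEventSource,
  onChange: (inPictureInPicture: boolean) => void,
): () => void {
  const onEnter = () => onChange(true);
  const onLeave = () => onChange(false);
  video.addEventListener("enterpictureinpicture", onEnter);
  video.addEventListener("leavepictureinpicture", onLeave);
  return () => {
    video.removeEventListener("enterpictureinpicture", onEnter);
    video.removeEventListener("leavepictureinpicture", onLeave);
  };
}
```

Returning a cleanup function keeps the listeners easy to remove when your player component is torn down.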
Other APIs are offered by Android and Apple
The W3C API is the most widely adopted one, and the WebKit API is probably the second most popular one. The following APIs are ranked by how likely you are to come into contact with them:
- WebKit JS API: This API is the other known browser API, and it's developed by Apple. The WebKit API exposes a read-only `webkitPresentationMode` property, which developers can read, and a `webkitSetPresentationMode()` method to change it. The presentation mode can either be 1) `inline`, 2) `fullscreen` or 3) `picture-in-picture`. A simple example is available on their website.
- iOS / MacOS AVKit API: This API can be used to enter PiP in iOS (iPad only) applications and MacOS applications. The syntax is different from the WebKit API and the W3C API.
- Android Activity API: This API can be used to enter PiP in Android applications. The syntax is different from the W3C API.
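Because Safari versions before 13.1 only know the WebKit API, a page that wants programmatic PiP on both sides typically feature-detects. The sketch below is one way to do that in TypeScript, assuming the W3C members described earlier and Apple's `webkitSupportsPresentationMode()` and `webkitSetPresentationMode()` methods. The `PipApi` labels, the `MaybePipVideo` interface and both helper functions are hypothetical. The sketch prefers the W3C API and only falls back to WebKit when the W3C API is unavailable.

```typescript
// Which Picture-in-Picture API a <video>-element exposes (labels are this sketch's own).
type PipApi = "w3c" | "webkit" | "none";

// Structural subset of HTMLVideoElement covering both browser APIs.
// The webkit* members only exist in Safari (hypothetical interface name).
interface MaybePipVideo {
  requestPictureInPicture?: () => Promise<unknown>;
  webkitSupportsPresentationMode?: (mode: string) => boolean;
  webkitSetPresentationMode?: (mode: string) => void;
}

function detectPipApi(
  video: MaybePipVideo,
  doc: { pictureInPictureEnabled?: boolean },
): PipApi {
  // Prefer the W3C API (Chromium-based browsers, Safari 13.1+).
  if (doc.pictureInPictureEnabled === true && typeof video.requestPictureInPicture === "function") {
    return "w3c";
  }
  // Fall back to the WebKit presentation-mode API (older Safari).
  if (
    typeof video.webkitSupportsPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture")
  ) {
    return "webkit";
  }
  return "none";
}

// Enters PiP through whichever API is available and reports which one was used.
async function enterPictureInPicture(
  video: MaybePipVideo,
  doc: { pictureInPictureEnabled?: boolean },
): Promise<PipApi> {
  const api = detectPipApi(video, doc);
  if (api === "w3c" && video.requestPictureInPicture) {
    await video.requestPictureInPicture();
  } else if (api === "webkit" && video.webkitSetPresentationMode) {
    video.webkitSetPresentationMode("picture-in-picture");
  }
  return api;
}
```

When the result is `"none"`, the page can simply leave PiP to the native user-interaction, if the platform offers one.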
Use-cases
Picture-in-Picture creates more engagement opportunities for video content. The following use-cases tie in nicely with a pop-out player.
i. "One Eyeball Content". Your app offers videos where you are fine with your viewers only glancing at them. Your content doesn't require their full attention, but they should still be able to pause (and play) it when multitasking.
Examples: music videos, documentaries, ...
ii. "Connected Content". Your app offers videos which are connected to a specific activity. Your content is beneficial (or essential) to view when doing a related activity.
Examples: training/educational videos, screen-sharing presentations, ...
iii. "Force-feed Content". Your app offers videos which must reach the viewer.
Examples: advertisements, ...
A question for the readers: why did you decide to read this article? Share your use-case through LinkedIn or Twitter.
Implementation
Somehow this article lit a fire under you. You need Picture-in-Picture, and you need it yesterday. The good news: you only have to get familiar with a maximum of 4 APIs.
1. W3C
- You could decide to ignore W3C-specification-compliant browsers, and rely on the viewer figuring out how to use the native UI. This leaves you 3 APIs to get familiar with.
- To programmatically implement Picture-in-Picture on these browsers, get started with Google's article on Watching video using Picture-in-Picture.
2. WebKit
- You could decide to rely on the native controls of Safari's `<video>`-element, which include an out-of-the-box PiP icon. This leaves you 2 APIs to get familiar with.
- You could decide not to implement programmatic Picture-in-Picture on Safari versions before 13.1. As of Safari 13.1, developers can use the W3C Picture-in-Picture API.
- To programmatically implement Picture-in-Picture before Safari 13.1, get started with Apple's article on Adding Picture in Picture to Your Safari Media Controls.
3. AVKit
- You could decide to use AVKit's standard player (`AVPlayerViewController`), which has an out-of-the-box PiP button. This leaves you 1 API to get familiar with.
- To programmatically implement Picture-in-Picture, get started with Apple's article on Adopting Picture in Picture in a Custom Player.
4. Activity
- To programmatically implement Picture-in-Picture on Android (and FireTV), get started with Android's article on Picture-in-Picture support in Activities.
Demos
The videos below illustrate Picture-in-Picture on different platforms. Items with "[UI]" are experiences where PiP is enabled with a native user-interaction. Items with "[API]" are experiences where PiP is enabled with an API call.
Market evolution
Maxthon, a Chinese web browser, seemingly introduced "Tear-Off Video" around 2009. They built this feature on top of their Flash player, and it's only toggleable through a user-interaction.
Apple popularised Picture-in-Picture in 2015 by adding it to Safari. In the 5 years since then, Picture-in-Picture has been implemented on every platform except iPhones, some streaming sticks and consoles, and non-Android Smart TVs.
2020
- December 2019 / January 2020: Mozilla launched Picture-in-Picture for Firefox.
- January 2020: Microsoft launched Chromium Edge, which supports Picture-in-Picture.
- February 2020: Opera improved Picture-in-Picture in R2020. It now has a video timer, a back-to-tab button and a next-track button.
- March 2020: Vivaldi keeps improving its "Popout Video".
- March 2020: Chromium Edge adds Picture-in-Picture to the Global Media Controls.
- March 2020: Safari 13.1 implements the Picture-in-Picture W3C API instead of the WebKit one. Apple still restricts Picture-in-Picture to Safari for MacOS and iPad. There's one remarkable change in the iPhone developer console, though: the `document.pictureInPictureEnabled` property now returns `false` instead of `undefined`. This change means that iPhones recognise the W3C API as of Safari 13.1.
We'll keep this section updated. Reach out to us if anything is missing.
Tell me more...
Consider continuing your reading journey if the following topics interest you:
- Variations;
- The PiP conundrum;
- Custom controls;
- Disabling Picture-in-Picture;
- No API;
- Context Menus are mean;
- Making money with PiP;
- Intrusion;
- Conclusion.
Variations
People associate different definitions with Picture-in-Picture. There's native PiP, which we have been talking about, but there are two other types.
In-App Picture-in-Picture
In-App Picture-in-Picture is a floating video player contained within a single-page application, website or native application.
This floating video player will not stick to your foreground when you navigate to another web page or application. This variation is often linked with a visibility API to automatically enter PiP when the regular video player container is no longer visible when scrolling.
The implementation is up to the application developer. For example, for websites, developers often leverage CSS to give the video player container a fixed position and a high z-index.
This variation offers the advantage of 'full control', unlike native Picture-in-Picture. Developers can fully customise the floating video player, whereas the native one has a default set of nonadjustable controls. This could be useful when integrating advertisements in your Picture-in-Picture experience. (Or when you want a simple scrub bar in your floating video player, something which most native PiPs do not offer.)
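The visibility-driven behaviour described above can be sketched in a few lines of TypeScript. This is a sketch under assumptions rather than a reference implementation: the visibility API is assumed to be the Intersection Observer API, and `VisibilityEntry`, `ClassListLike`, `updateFloatingPlayer` and the `player--floating` CSS class are hypothetical. The CSS class would carry the positioning and z-index mentioned above.

```typescript
// Structural subsets (hypothetical names) of what an IntersectionObserver
// callback receives and of an element's classList.
interface VisibilityEntry {
  isIntersecting: boolean;
}
interface ClassListLike {
  toggle(token: string, force?: boolean): boolean;
}

// Hypothetical CSS class; a stylesheet could give it something like
// `position: fixed; right: 16px; bottom: 16px; width: 320px; z-index: 1000;`.
const FLOATING_CLASS = "player--floating";

// Floats the player while its placeholder is scrolled out of view and docks it
// again once the placeholder is visible. Observe a placeholder with a fixed size
// rather than the player itself, because a fixed-position player leaves the layout.
function updateFloatingPlayer(
  entries: VisibilityEntry[],
  player: { classList: ClassListLike },
): void {
  for (const entry of entries) {
    player.classList.toggle(FLOATING_CLASS, !entry.isIntersecting);
  }
}
```

In a browser, the wiring could look like `new IntersectionObserver((entries) => updateFloatingPlayer(entries, playerElement)).observe(placeholderElement)`, where `playerElement` and `placeholderElement` are hypothetical elements from your page.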
Multi-screen, Multi-angle or Screen-in-screen
Multi-screen is when one video (player) is contained within another video player, or when multiple video players are placed side-by-side. This is Picture-in-Picture according to Wikipedia.
This experience is often associated with the more traditional television experience, where the feed of one camera is overlaying another feed, or when two channels/movies are playing next to each other.
This implementation is very useful for content with a social/live component. For example, in (e-)sports, you want to display the commentators and the sports game at the same time.
The PiP Conundrum
The primary advantage of Picture-in-Picture is straightforward. The feature offers viewers the possibility to interact with videos outside of their apps or websites. Logically speaking, if viewers have more 'locations' to consume content, it increases content consumption, thus benefiting content providers.
On the other hand, nonchalant viewership might not be in your best interest as it could reduce content consumption. Viewers might miss important scenes, lose track of what's happening, get less invested, and forsake the content.
We don't have any data or papers to back this claim, but it would definitely be an interesting A/B test.
And that, ladies and gentlemen, is the PiP conundrum.
Custom controls
In general, for browsers and Apple applications, it is not possible to configure custom controls for the floating video players which PiP spawns.
There is some hope though. The W3C spec states that the Media Session API can be used to customise the available controls. On Desktop Chrome, developers can already use this Media Session API to map some controls.
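A minimal TypeScript sketch of mapping controls through the Media Session API, assuming the standard `navigator.mediaSession.setActionHandler(action, handler)` method. `MediaSessionLike`, `PipAction` and `registerPipControls` are hypothetical names. Which of the registered actions a browser actually shows as buttons in its floating window is up to that browser.

```typescript
// Subset of media session action names relevant to a floating player.
type PipAction = "play" | "pause" | "previoustrack" | "nexttrack";

// Structural subset of navigator.mediaSession (hypothetical interface name).
interface MediaSessionLike {
  setActionHandler(action: PipAction, handler: (() => void) | null): void;
}

// Registers the given handlers and returns a function that unregisters them.
function registerPipControls(
  session: MediaSessionLike,
  handlers: Partial<Record<PipAction, () => void>>,
): () => void {
  const registered: PipAction[] = [];
  for (const action of Object.keys(handlers) as PipAction[]) {
    try {
      session.setActionHandler(action, handlers[action] ?? null);
      registered.push(action);
    } catch {
      // Browsers may throw for actions they do not support; skip those.
    }
  }
  return () => {
    for (const action of registered) session.setActionHandler(action, null);
  };
}
```

For example, `registerPipControls(navigator.mediaSession, { nexttrack: playNextVideo })`, where `playNextVideo` is a hypothetical function from your player.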
Instead of native Picture-in-Picture, you could opt for In-App Picture-in-Picture if your use-case permits it. This variation gives you full control over the look-and-feel, but it will no longer work cross-application.
You can customise the Picture-in-Picture UX (and controls) for Android applications, because you're styling an activity instead. (Check out the Android App [API] example!)
Disabling Picture-in-Picture
There are two approaches to disable Picture-in-Picture.
The first is programmatic, by leveraging a Picture-in-Picture API. The W3C API gives you a `disablePictureInPicture` attribute which you can configure for a `<video>`-element. We've set up a CodePen snippet to demonstrate the implementation.
This API is unfortunately unavailable on most browsers, even the ones which implement the W3C API.
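Given that limited support, it's worth feature-detecting the attribute before relying on it. The TypeScript sketch below, using the hypothetical names `DisablableVideo` and `tryDisablePictureInPicture`, sets the property (equivalent to the `disablepictureinpicture` HTML attribute) only when the browser knows it, and reports whether that worked.

```typescript
// Structural subset of HTMLVideoElement (hypothetical interface name).
interface DisablableVideo {
  disablePictureInPicture?: boolean;
}

// Returns false when the browser does not know the property, so the caller
// can fall back to making the native user-interaction harder to reach.
function tryDisablePictureInPicture(video: DisablableVideo): boolean {
  // On real elements the property lives on the prototype; `in` checks the prototype chain too.
  if (!("disablePictureInPicture" in video)) {
    return false;
  }
  video.disablePictureInPicture = true;
  return true;
}
```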
You could also make it more difficult for viewers to trigger the native user-interaction.
- You can create a custom context menu to respond to a right-click, for example by intercepting the `contextmenu` event (through an `oncontextmenu` handler). This approach replaces the native right-click menu, which might allow the viewer to trigger Picture-in-Picture.
- You can overlay invisible elements on top of your `<video>`-element. Viewers would now right-click an element (e.g. a `<div>`) which doesn't have a "Picture-in-Picture" item in its right-click menu.
- You can configure `pointer-events: none;` for the `<video>`-element. Right-clicking the video will then no longer spawn the video's own right-click menu.
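The first and third workarounds above map onto a few lines of TypeScript. This is a sketch under assumptions: `VideoLike`, `ContextMenuEventLike`, the two helper functions and the optional `showCustomMenu` callback are hypothetical, and a real `<video>`-element can be passed in place of `VideoLike`.

```typescript
// Structural subsets (hypothetical names) of a contextmenu event and a <video>-element.
interface ContextMenuEventLike {
  preventDefault(): void;
}

interface VideoLike {
  addEventListener(type: "contextmenu", listener: (event: ContextMenuEventLike) => void): void;
  style: { pointerEvents: string };
}

// First workaround: replace the native right-click menu, which may contain a PiP item.
function blockNativeContextMenu(
  video: VideoLike,
  showCustomMenu?: (event: ContextMenuEventLike) => void,
): void {
  video.addEventListener("contextmenu", (event) => {
    event.preventDefault(); // keeps the browser from opening its own context menu
    if (showCustomMenu) showCustomMenu(event); // optional hook for your own menu
  });
}

// Third workaround: let pointer events pass through the <video>-element.
function ignorePointerEvents(video: { style: { pointerEvents: string } }): void {
  // Clicks and right-clicks now reach whatever lies underneath the video instead.
  video.style.pointerEvents = "none";
}
```

Keep in mind that `pointer-events: none` also blocks clicks on the native video controls, so it only suits players that draw their own control bar.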
The second approach is by letting the viewer manually disable Picture-in-Picture in their browser settings.
This manual approach differs from browser to browser.
No API
There are two obvious camps with regards to Picture-in-Picture:
- Those who don't offer an API (e.g. Desktop Firefox and Mobile Browsers).
- Those who offer an API (e.g. Google on Desktop and Android applications, Apple on Desktop and in applications).
The trade-off is as follows:
- When there is no API, all end-users can immediately enjoy the feature and developers don't need to opt-in.
- When there is an API, developers can customise the user-experience and user-interface.
We identified three main scenarios where there is no Picture-in-Picture API:
- You're dealing with a mobile web browser;
- You're dealing with Firefox;
- You're dealing with the UC Browser.
The people at Mozilla wrote an interesting article on why they prefer no API.
Context Menus are mean
As we mentioned, a direct user-interaction is a common mechanism to enable Picture-in-Picture. This direct user-interaction often boils down to right-clicking the video player, and selecting a 'Picture-in-Picture' menu-item.
This user-interaction is sometimes prevented by video players. How? The approaches described in the above "Disabling Picture-in-Picture" section are often implemented without taking Picture-in-Picture into account. Be careful when you create custom context menus, or when you overlay elements on top of your `<video>`-element.
Be sure to inform your team on the trade-offs of customizing the video player UI. The native right-click menu can be really useful, and you might be disappointing PiP-lovers by disabling this feature.
π€ Making money with PiP
One of the allures of Picture-in-Picture is the idea that you can stream more advertisements to your viewers. While force-feeding advertisements is a good way to earn an extra buck, you have to remember that there's a lack of support for custom controls. This constraint means that you'll be able to generate ad impressions, but no ad clicks.
Additionally, two remarks about Picture-in-Picture are relevant here:
- You can only push a `<video>`-element to Picture-in-Picture. Advertisement videos often live within `<iframe>`-elements, which are rendered in fullscreen.
- You cannot easily switch between `<video>`-elements; only one element can be active.
If you are currently debating client-side versus server-side ad-insertion (SSAI), you might want to favour SSAI. SSAI ads are stitched inside your stream and could guarantee smooth transitions between regular content and advertisement content.
You might be able to generate extra ad revenue with Picture-in-Picture, but it's bound to give you a headache. Also: viewers never tend to like ads, and they definitely do not like PiP ads.
AVOD is not the only business model. You could sell Picture-in-Picture as one of your premium features. YouTube packages this as one of the features of YouTube Premium, which costs $11.99/month.
If you commercialize Picture-in-Picture, you need to figure out how to disable it for your free users. This is somewhat ironic, as it means you need to disable right-click menus.
Intrusion
Some people hate PiP. Imagine this: you are new to a website, and suddenly video players pop out right and left. Now imagine that these pop-out players are playing advertisements. This user-experience is a red flag to a lot of people, and they will want to leave your service ASAP.
Unwarranted intrusion is an argument why some platforms offer no support for a Picture-in-Picture API. Developers might abuse the API, and drag down the browsing experience. They argue that if a viewer really wants to enable PiP, they can use a native user-interaction.
Some platforms recognize this problem, and try to tackle it with different approaches.
- On Android Chrome, you need to be in fullscreen before Picture-in-Picture can be toggled. Chrome probably makes the assumption that because you are in fullscreen, the content must be of interest to you, because you did a user-interaction to put it in fullscreen. Hence, it should be OK to play the content in picture-in-picture when you click the home button if you are playing in fullscreen.
- Some platforms only allow a Picture-in-Picture API when you are using a native application. Because you have downloaded and installed the app, you probably trust them, so if their PiP annoys you: complain to them.
One potential solution could be to evolve to a permission-based system similar to push notifications: applications would request a 'Picture-in-Picture permission', and end-users would explicitly grant it.
Conclusion
Implementing Picture-in-Picture requires a focused mind. Some platforms have a full-fledged API, and other platforms only allow user-interactions to toggle PiP. On top of that, the behavior is different for the same browser on Desktop versus Mobile. You basically need a cheat-sheet to keep track of everything.
Content providers should debate whether PiP is a worthy investment. Will it impact ad revenue? Will it ultimately decrease content consumption? Do people care enough to justify the R&D costs? Should you only leverage the APIs on non-browsers (because all browsers offer a native user-interaction to enable PiP)?
Not all PiPs are created equal. Personally, I prefer Chrome's and Opera's Desktop approach.
- Chrome offers a user-interaction and an API. It also allows you to disable Picture-in-Picture through the API, something which the other Chromium-based browsers do not seem to offer. The trade-off is that your viewers need to know that they have to right-click the video to toggle PiP. (This user-interaction is less obvious than the one provided by Opera or Firefox.)
- Opera offers a user-interaction and an API. On top of that, their floating video player offers a scrub bar! The trade-off is that Opera's user-interaction is always present: it overlays the video player, but at least it looks nice.
The biggest bummer is the lack of support for PiP on iPhones. That being said, it is promising that iPhones recognise the W3C Picture-in-Picture API as of Safari 13.1.
Picture-in-Picture isn't a finished story. When we started writing this article in December 2019, there was no stable Edge release with Picture-in-Picture, and you had to use the WebKit API to do PiP on Apple browsers. The latter situation did a 180 in 2020.
Will we see more adoptions from Smart TVs, consoles and streaming sticks? We don't have a crystal ball, but we're confident that Picture-in-Picture will continue to bring stories in the months and years to come.
What do you think? Join the discussion on LinkedIn or Twitter.