Captured Surface Control
Consider a Web application |capturer| which has used {{MediaDevices/getDisplayMedia()}} to
start capturing another [=display surface=], |capturee|. This specification introduces a set
of APIs that allow |capturer| the following new capabilities:
- Read and write |capturee|'s [=zoom level=].
-
Deliver a wheel event over |capturee|'s viewport at coordinates of |capturer|'s choosing.
Background
Nearly all video-conferencing Web applications offer their users the ability to share
[=display surfaces=] - typically a browser tab ([=display surface/browser=]), a native app's
window ([=display surface/window=]), or an entire screen ([=display surface/monitor=]).
Many of these applications also show the local user a "preview tile" with a video of the
captured [=display surface=].
All these applications suffer from one key drawback - if the user wishes to interact with a
captured [=display surface=], the user must first switch to that surface, taking them away
from the video-conferencing application. This presents a few issues:
-
Users can't simultaneously interact with the captured application and see the videos of
remote users.
-
Users are burdened by the need to repeatedly switch between the video-conferencing
application and the captured surface.
-
Users are limited in their ability to see and interact with controls exposed by the
video-conferencing application while they are interacting with the captured surface. A
non-comprehensive list of examples of such controls includes embedded chat applications,
emoji reactions, "knock-ins" by users asking to join the call, and multimedia controls.
It bears mentioning that [[DOCUMENT-PICTURE-IN-PICTURE]]
goes a long way towards addressing some of these issues. However, it is not always a suitable
solution, as not all use cases are adequately addressed by a floating window which will
often be small, which obscures arbitrary other content on the screen, and whose size and
positioning must be manually controlled by the user.
Permissions Policy Integration
This specification defines a [=policy-controlled feature=] identified by the string
`"captured-surface-control"`. Its [=policy-controlled
feature/default allowlist=] is `"self"`.
The API surfaces introduced by this specification can be categorized as either read-access
or write-access. Note that only the write-access APIs ({{CaptureController/forwardWheel}},
{{CaptureController/increaseZoomLevel}}, {{CaptureController/decreaseZoomLevel}} and
{{CaptureController/resetZoomLevel}}) are gated by the "captured-surface-control"
permissions policy.
Zoom
Definition of Zoom
We define a concept of an integer "zoom level" that can be applied to [=display
surfaces=] of any type, and which is independent of the user agent and the platform. It is
expected that in the case of [=display surface/browser=] [=display surfaces=], this
concept will match the concept of zoom level that user agents typically expose to the
user.
-
The default zoom level of any [=display surface=] is defined to be 100. All
implementations must support this value for [=display surfaces=] of any type.
-
Decreasing [=zoom level=] values represent "zooming out". The minimum theoretical value
is 1; however, user agents may cap their support for "zooming out" at a larger value,
with 100 being the largest permissible minimum value, representing lack of support for
"zooming out".
-
Increasing values represent "zooming in". This specification does not mandate a
theoretical maximum. The smallest possible maximum is 100, which represents lack of
support for "zooming in".
For a given [=display surface=] of type |surfaceType|, we define the user agent's set of
supported zoom levels for |surfaceType| as a non-empty set of integers
including at least the [=default zoom level=] (100), and not including any integers less
than 1.
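The constraints on a set of supported zoom levels can be illustrated with the following non-normative sketch. The helper name `isValidSupportedZoomLevels` and the use of a plain array to model the set are assumptions made for illustration; neither is defined by this specification.

```typescript
// The default zoom level that every implementation must support.
const DEFAULT_ZOOM_LEVEL = 100;

// Hypothetical helper (not part of the API): checks a candidate set of
// supported zoom levels against the constraints stated in the text.
function isValidSupportedZoomLevels(levels: number[]): boolean {
  // The set must be non-empty.
  if (levels.length === 0) return false;
  // The set must include the default zoom level.
  if (levels.indexOf(DEFAULT_ZOOM_LEVEL) === -1) return false;
  // Every member must be an integer that is not less than 1.
  return levels.every(
    (level) => isFinite(level) && Math.floor(level) === level && level >= 1
  );
}
```

Under these assumptions, `[50, 100, 200]` and `[100]` are valid sets, whereas `[]`, `[50, 200]` and `[0, 100]` are not.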
Permitted Event Types for zoom-setting
We define the permitted event types for zoom-setting as a set composed of the
following event types:
Zoom-control APIs
partial interface CaptureController {
sequence<long> getSupportedZoomLevels();
readonly attribute long? zoomLevel;
Promise<undefined> increaseZoomLevel();
Promise<undefined> decreaseZoomLevel();
Promise<undefined> resetZoomLevel();
attribute EventHandler onzoomlevelchange;
};
getSupportedZoomLevels()
This method allows applications to discover the set of [=zoom levels=] supported by
the user agent.
When invoked, the user agent MUST run the following steps:
-
If [=this=] is not [=actively capturing=], [=exception/throw=] an
"{{InvalidStateError}}" {{DOMException}}.
- Let |surfaceType| be [=this=].{{CaptureController/[[DisplaySurfaceType]]}}.
-
If |surfaceType| is not a [=supported display surface type=], [=exception/throw=] a
"{{NotSupportedError}}" {{DOMException}}.
-
Return a monotonically increasing sequence containing all of the values in the
[=supported zoom levels=] for |surfaceType|.
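The steps above can be modeled by the following non-normative sketch. `ControllerState` and `NamedError` are hypothetical stand-ins for the controller's internal state and for a {{DOMException}} with a given name; the check against `"browser"` assumes the supported display surface types as currently defined in this document.

```typescript
// Hypothetical stand-in for the state that the steps consult.
interface ControllerState {
  activelyCapturing: boolean;
  displaySurfaceType: string; // e.g. "browser", "window" or "monitor"
  supportedZoomLevels: number[]; // unordered set for this surface type
}

// Hypothetical stand-in for a DOMException with a given name.
class NamedError extends Error {
  constructor(name: string, message: string) {
    super(message);
    this.name = name;
  }
}

function getSupportedZoomLevelsSketch(state: ControllerState): number[] {
  if (!state.activelyCapturing) {
    throw new NamedError("InvalidStateError", "not actively capturing");
  }
  // Assumption: only "browser" is a supported display surface type.
  if (state.displaySurfaceType !== "browser") {
    throw new NamedError("NotSupportedError", "unsupported surface type");
  }
  // Copy before sorting so the input is not mutated; the numeric comparator
  // yields ascending order.
  return state.supportedZoomLevels.slice().sort((a, b) => a - b);
}
```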
zoomLevel
This attribute allows applications to discover the captured [=display surface=]'s
[=zoom level=].
On getting, the user agent MUST return [=this=].{{CaptureController/[[ZoomLevel]]}}.
increaseZoomLevel()
This method allows applications to set the captured [=display surface=]'s [=zoom
level=] one step higher than its current value.
When this method is invoked, the user agent MUST run the [=set zoom level algorithm=]
with [=this=] as the |controller| and `"increase"` as the |zoomAction|.
decreaseZoomLevel()
This method allows applications to set the captured [=display surface=]'s [=zoom
level=] one step lower than its current value.
When this method is invoked, the user agent MUST run the [=set zoom level algorithm=]
with [=this=] as the |controller| and `"decrease"` as the |zoomAction|.
resetZoomLevel()
This method allows applications to set the captured [=display surface=]'s [=zoom
level=] to 100.
When this method is invoked, the user agent MUST run the [=set zoom level algorithm=]
with [=this=] as the |controller| and `"reset"` as the |zoomAction|.
onzoomlevelchange
An [=event handler IDL attribute=] whose [=event handler event type=] is
`zoomlevelchange`.
Whenever [=this=].[[\Source]]'s [=zoom
level=] changes to |newZoomLevel|, the user agent MUST [=queue a global task=] on the
[=user interaction task source=] given the current realm's global object, which will
run the following steps:
- If [=this=] is not [=actively capturing=], abort these steps.
- Set [=this=].{{CaptureController/[[ZoomLevel]]}} to |newZoomLevel|.
- [=Fire an event=] named `zoomlevelchange` at [=this=].
Examples of causes include:
-
The user interacted with the user agent to change the zoom level of a captured
tab.
- The capturing application called {{CaptureController/increaseZoomLevel()}}.
-
The user changed the shared [=display surface=], choosing one which has a
different [=zoom level=].
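As a non-normative illustration, a capturing application might respond to `zoomlevelchange` by recomputing which of its own zoom controls are usable. The helper `zoomButtonState` and the `ZoomButtonState` shape are hypothetical and exist only for this sketch; the event wiring is shown as a comment because it requires a browser environment.

```typescript
// Hypothetical UI state derived from the controller's zoom information.
interface ZoomButtonState {
  canZoomIn: boolean;
  canZoomOut: boolean;
  canReset: boolean;
}

// supportedLevels models the result of getSupportedZoomLevels();
// zoomLevel models the nullable zoomLevel attribute.
function zoomButtonState(
  supportedLevels: number[],
  zoomLevel: number | null
): ZoomButtonState {
  if (zoomLevel === null || supportedLevels.length === 0) {
    return { canZoomIn: false, canZoomOut: false, canReset: false };
  }
  const lowest = Math.min(...supportedLevels);
  const highest = Math.max(...supportedLevels);
  return {
    canZoomIn: zoomLevel < highest,
    canZoomOut: zoomLevel > lowest,
    canReset: zoomLevel !== 100, // 100 is the default zoom level
  };
}

// In a page, this might be wired up roughly as follows (assumed usage):
//   controller.onzoomlevelchange = () =>
//     render(zoomButtonState(controller.getSupportedZoomLevels(),
//                            controller.zoomLevel));
```

Deriving the state from the attribute on every event, rather than tracking it separately, keeps the application consistent with changes it did not initiate, such as the user zooming the captured tab directly.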
Scroll
Scrolling APIs
partial interface CaptureController {
constructor();
Promise<undefined> forwardWheel(HTMLElement? element);
};
constructor
{{CaptureController}}'s constructor is extended to also define and initialize the following
internal slots:
| Internal Slot | Initial value |
|---|---|
| [[\ZoomLevel]] | `null` |
| [[\ForwardWheelElement]] | `null` |
| [[\ForwardWheelEventListener]] | `null` |
forwardWheel()
This method allows applications to automatically forward
wheel events
from an {{HTMLElement}} to the viewport of a captured [=display surface=].
When invoked, the user agent MUST run the following steps:
-
If [=this=] is not [=actively capturing=], return a promise [=reject|rejected=] with
a {{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
-
If [=this=] [=is self-capturing=], return a promise [=reject|rejected=] with a
{{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
- Let |surfaceType| be [=this=].{{CaptureController/[[DisplaySurfaceType]]}}.
-
If |surfaceType| is not a [=supported display surface type=], return a promise
[=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}}
attribute has the value {{NotSupportedError}}.
- Let |element| be the method's first argument.
- Let |P| be a new {{Promise}}.
-
Run the following steps [=in parallel=]:
-
[=Get the current permission state=] of "captured-surface-control". If
the result is NOT {{PermissionState/"granted"}}, and the [=relevant global
object=] does NOT have [=transient activation=], then:
-
[=Queue a global task=] on the [=user interaction task source=] given the
current realm's [=global object=] as |global| to [=reject=] |P| with a
{{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
- Abort these steps.
This step ensures that, on the one hand, permission prompts are not shown
without [=transient activation=], while on the other hand, if the permission
is already {{PermissionState/"granted"}},
{{CaptureController/forwardWheel()}} may be called immediately after
{{MediaDevices/getDisplayMedia()}} resolves, even if the [=transient
activation=] that permitted the call to {{MediaDevices/getDisplayMedia()}}
has since expired.
-
[=Request permission to use=] a {{PermissionDescriptor}} with its
{{PermissionDescriptor/name}} member set to
"captured-surface-control". If the result of the request is
{{PermissionState/"denied"}}, then:
-
[=Queue a global task=] on the [=user interaction task source=] given the
current realm's [=global object=] as |global| to [=reject=] |P| with a new
{{DOMException}} object whose {{DOMException/name}} is {{NotAllowedError}}.
- Abort these steps.
-
If [=this=].{{CaptureController/[[ForwardWheelElement]]}} is not `null`,
[=remove an event listener=] with
[=this=].{{CaptureController/[[ForwardWheelElement]]}} as |eventTarget| and
[=this=].{{CaptureController/[[ForwardWheelEventListener]]}} as |listener|.
-
Set [=this=].{{CaptureController/[[ForwardWheelEventListener]]}} to `null`.
- Set [=this=].{{CaptureController/[[ForwardWheelElement]]}} to |element|.
-
If [=this=].{{CaptureController/[[ForwardWheelElement]]}} is not `null`:
-
Set [=this=].{{CaptureController/[[ForwardWheelEventListener]]}} to an
[=event listener=] defined as follows:
type
`wheel`
[=event listener/callback=]
The result of creating a new Web IDL {{EventListener}} instance
representing a reference to a function of one argument of type {{Event}}
|event|. This function executes the [=forward wheel event algorithm=]
given [=this=] and |event|.
-
[=Add an event listener=] with
[=this=].{{CaptureController/[[ForwardWheelElement]]}} as |eventTarget| and
[=this=].{{CaptureController/[[ForwardWheelEventListener]]}} as |listener|.
-
[=Queue a global task=] on the [=user interaction task source=] given the
current realm's [=global object=] as |global| to [=resolve=] |P|.
- Return |P|.
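The two less obvious parts of the steps above, namely the check performed before any permission prompt and the replacement of the previously registered listener, can be modeled by the following non-normative sketch. All names here (`PermissionResult`, `WheelSource`, `WheelListener`, `ForwardWheelSlots`, `mayProceedToPermissionRequest`) are hypothetical; `WheelSource` exposes only the two operations the steps rely on.

```typescript
type PermissionResult = "granted" | "denied" | "prompt";
type WheelListener = (event: unknown) => void;

// Hypothetical minimal view of an element: only what the steps need.
interface WheelSource {
  addEventListener(type: "wheel", listener: WheelListener): void;
  removeEventListener(type: "wheel", listener: WheelListener): void;
}

// Models the first in-parallel step: the promise is rejected only when the
// permission is not yet granted AND there is no transient activation.
function mayProceedToPermissionRequest(
  state: PermissionResult,
  hasTransientActivation: boolean
): boolean {
  return state === "granted" || hasTransientActivation;
}

class ForwardWheelSlots {
  element: WheelSource | null = null; // models [[ForwardWheelElement]]
  listener: WheelListener | null = null; // models [[ForwardWheelEventListener]]

  // Models the listener swap performed once permission has been obtained.
  retarget(element: WheelSource | null, forward: WheelListener): void {
    // Detach from the previously designated element, if any.
    if (this.element !== null && this.listener !== null) {
      this.element.removeEventListener("wheel", this.listener);
    }
    this.listener = null;
    this.element = element;
    // Passing null stops forwarding; otherwise attach to the new element.
    if (element !== null) {
      this.listener = forward;
      element.addEventListener("wheel", forward);
    }
  }
}
```

In this model, at most one element has the forwarding listener attached at any time, and calling `retarget(null, ...)` corresponds to an application invoking {{CaptureController/forwardWheel()}} with `null` to stop forwarding.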
Extensions to the getDisplayMedia algorithm
Extend the
getDisplayMedia algorithm
as follows:
Recall that |p| is the promise which the algorithm returns. Immediately before the step
which resolves it, add the following steps:
-
If |controller| is not `null` and |controller|.[[\DisplaySurfaceType]]
is a [=supported display surface type=], then set
|controller|.{{CaptureController/[[ZoomLevel]]}} to |controller|.[[\Source]]'s [=zoom level=].
Subroutines
Subroutine: Actively capturing
To determine if a {{CaptureController}} |controller| is
actively capturing, run the following steps:
- Let |source| be |controller|.{{CaptureController/[[Source]]}}.
- If |source| is `null`, return `false`.
-
If |source| has been stopped, return
`false`.
- Return `true`.
Subroutine: Is self-capturing
To determine if a {{CaptureController}} |controller|
is self-capturing, run the following steps:
- If |controller| is not [=actively capturing=], return `false`.
-
If |controller|.{{CaptureController/[[Source]]}} is a [=display surface=] of type
[=display surface/browser=], and represents the [=relevant global object=]'s
[=associated `Document`=], return `true`.
- Return `false`.
Subroutine: Supported display surface type
To determine if a [=display surface=] type |surfaceType| is a
supported display surface type, run the following steps:
- If |surfaceType| is [=display surface/browser=], return `true`.
- Return `false`.
Whether [=display surface/window=] should be supported is under discussion.
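The three predicates defined so far can be modeled by the following non-normative sketch. `CapturedSourceModel`, `ControllerModel` and the `documentId` field are hypothetical; in particular, comparing document identifiers is only a stand-in for "represents the relevant global object's associated `Document`".

```typescript
// Hypothetical model of the captured source.
interface CapturedSourceModel {
  stopped: boolean;
  surfaceType: "browser" | "window" | "monitor";
  // Assumed identifier of the captured document; null for non-browser surfaces.
  documentId: string | null;
}

// Hypothetical model of a CaptureController; source models [[Source]].
interface ControllerModel {
  source: CapturedSourceModel | null;
}

function isActivelyCapturing(controller: ControllerModel): boolean {
  return controller.source !== null && !controller.source.stopped;
}

function isSelfCapturing(
  controller: ControllerModel,
  ownDocumentId: string
): boolean {
  if (!isActivelyCapturing(controller) || controller.source === null) {
    return false;
  }
  return (
    controller.source.surfaceType === "browser" &&
    controller.source.documentId === ownDocumentId
  );
}

// Only "browser" is supported as this document currently stands.
function isSupportedDisplaySurfaceType(surfaceType: string): boolean {
  return surfaceType === "browser";
}
```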
Subroutine: Setting the zoom level
The set zoom level algorithm, given a |controller:CaptureController| of type
{{CaptureController}} and a |zoomAction:DOMString| of type {{DOMString}} as arguments,
consists of running the following steps:
-
If |controller| is not [=actively capturing=], return a promise [=reject|rejected=] with
a {{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
-
If |controller| [=is self-capturing=], return a promise [=reject|rejected=] with a
{{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
- Let |surfaceType| be |controller|.{{CaptureController/[[DisplaySurfaceType]]}}.
-
If |surfaceType| is not a [=supported display surface type=], return a promise
[=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute
has the value {{NotSupportedError}}.
-
Ensure that the code is running from within the context of an event handler which was
invoked because the user agent fired a trusted event in response to the user
interacting with the user agent. To do so, run the following steps:
- Let |currentEvent:Event| be {{Window}}.{{Window/event}}.
-
If |currentEvent| is {{undefined}}, return a promise [=reject|rejected=] with a
{{DOMException}} object whose {{DOMException/name}} attribute has the value
{{InvalidStateError}}.
-
If |currentEvent|.{{Event/isTrusted}} is `false`, return a promise
[=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}}
attribute has the value {{InvalidStateError}}.
-
If |currentEvent|.{{Event/type}} is not in [=permitted event types for
zoom-setting=], return a promise [=reject|rejected=] with a {{DOMException}} object
whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
It follows from these steps that {{CaptureController/increaseZoomLevel()}},
{{CaptureController/decreaseZoomLevel()}} and
{{CaptureController/resetZoomLevel()}} are only callable with [=transient
activation=], because [=permitted event types for zoom-setting=] only contains
event types that confer this activation.
In fact, our API shape implies a stronger guarantee - whereas [=transient
activation=] persists for several seconds after the user action, the API shape
here limits zoom-setting to immediately after the user's action.
-
Let |currentZoomLevel| be |controller|.{{CaptureController/[[Source]]}}'s [=zoom level=].
- Let |targetZoomLevel| be a {{long}}. Set its value as follows:
-
If |zoomAction| is `"decrease"` then:
-
If |currentZoomLevel| is the minimum value in [=supported zoom levels=], return a
promise [=reject|rejected=] with a {{DOMException}} object whose
{{DOMException/name}} attribute has the value {{InvalidStateError}}.
-
Otherwise, set |targetZoomLevel| to the value in [=supported zoom levels=] that
appears immediately before |currentZoomLevel| when the values are sorted in ascending order.
-
Else, if |zoomAction| is `"increase"` then:
-
If |currentZoomLevel| is the maximum value in [=supported zoom levels=], return a
promise [=reject|rejected=] with a {{DOMException}} object whose
{{DOMException/name}} attribute has the value {{InvalidStateError}}.
-
Otherwise, set |targetZoomLevel| to the value in [=supported zoom levels=] that
appears immediately after |currentZoomLevel| when the values are sorted in ascending order.
-
Else:
- Assert that |zoomAction| is `"reset"`.
- Set |targetZoomLevel| to `100`.
- Let |P| be a new {{Promise}}.
-
Run the following steps [=in parallel=]:
-
[=Request permission to use=] a {{PermissionDescriptor}} with its
{{PermissionDescriptor/name}} member set to
"captured-surface-control". If the result of the request is
{{PermissionState/"denied"}}, then:
-
[=Queue a global task=] on the [=user interaction task source=] given the
current realm's [=global object=] as |global| to [=reject=] |P| with a new
{{DOMException}} object whose {{DOMException/name}} is {{NotAllowedError}}.
- Abort these steps.
-
Set |controller|.{{CaptureController/[[Source]]}}'s [=zoom level=] to |targetZoomLevel|.
-
[=Queue a global task=] on the [=user interaction task source=] given the current
realm's [=global object=] as |global| to [=resolve=] |P|.
- Return |P|.
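The event check and the selection of the target zoom level can be modeled by the following non-normative sketch. It assumes that `"decrease"` selects the next lower supported value and `"increase"` the next higher one. `ZoomAction`, `EventLike`, `isPermittedZoomEvent` and `computeTargetZoomLevel` are hypothetical names, and the permitted event types are passed in as a parameter rather than enumerated here.

```typescript
type ZoomAction = "increase" | "decrease" | "reset";

// Hypothetical minimal view of the event currently being dispatched.
interface EventLike {
  isTrusted: boolean;
  type: string;
}

// Models the three checks on the current event. A result of false
// corresponds to rejecting with InvalidStateError.
function isPermittedZoomEvent(
  currentEvent: EventLike | undefined,
  permittedTypes: string[]
): boolean {
  return (
    currentEvent !== undefined &&
    currentEvent.isTrusted &&
    permittedTypes.indexOf(currentEvent.type) !== -1
  );
}

// Returns the target zoom level, or null where the steps would reject with
// InvalidStateError because no further step in that direction is supported.
function computeTargetZoomLevel(
  supportedLevels: number[],
  currentZoomLevel: number,
  zoomAction: ZoomAction
): number | null {
  if (zoomAction === "reset") return 100;
  const ascending = supportedLevels.slice().sort((a, b) => a - b);
  if (zoomAction === "increase") {
    // Smallest supported value above the current one.
    const higher = ascending.filter((level) => level > currentZoomLevel);
    const next = higher[0];
    return next === undefined ? null : next;
  }
  // "decrease": largest supported value below the current one.
  const lower = ascending.filter((level) => level < currentZoomLevel);
  const previous = lower[lower.length - 1];
  return previous === undefined ? null : previous;
}
```

With supported levels 50, 100 and 200 and a current level of 100, this sketch yields 200 for `"increase"` and 50 for `"decrease"`; at 200, `"increase"` yields `null`.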
Subroutine: Forward wheel event
The forward wheel event algorithm takes a {{CaptureController}} |controller|
and a {{WheelEvent}} |event|, and runs the following steps:
- If |controller| is not [=actively capturing=], abort these steps.
- If |controller| [=is self-capturing=], abort these steps.
- Let |surfaceType| be |controller|.{{CaptureController/[[DisplaySurfaceType]]}}.
- If |surfaceType| is not a [=supported display surface type=], abort these steps.
-
Run the following steps [=in parallel=]:
-
[=Get the current permission state=] of "captured-surface-control". If the
result is NOT {{PermissionState/"granted"}}, abort these steps.
- If |event|.{{Event/isTrusted}} is `false`, abort these steps.
-
Let [|scaledX|, |scaledY|] be the result of the [=scale element coordinates
algorithm=] on [|event|.{{MouseEvent/offsetX}}, |event|.{{MouseEvent/offsetY}}] and
|controller|.
-
[=Queue a global task=] on the [=user interaction task source=] of
|controller|.[[\Source]]'s current realm, given that
realm's global object, to [=fire an
event=] named `"wheel"` using {{WheelEvent}} with the {{MouseEvent/x}} attribute
initialized to |scaledX|, the {{MouseEvent/y}} attribute initialized to |scaledY|,
the {{WheelEvent/deltaX}} attribute initialized to |event|.{{WheelEvent/deltaX}} and the
{{WheelEvent/deltaY}} attribute initialized to |event|.{{WheelEvent/deltaY}}, at the
[=topmost event target=].
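The gating performed by this algorithm can be modeled by the following non-normative sketch, which either produces the values used to initialize the forwarded event or produces `null` where the steps abort. `ForwardContext`, `WheelLike`, `ForwardedWheelInit` and `buildForwardedWheel` are hypothetical names, and the scaled coordinates are taken as inputs.

```typescript
// Hypothetical snapshot of the conditions the steps consult.
interface ForwardContext {
  activelyCapturing: boolean;
  selfCapturing: boolean;
  surfaceType: string;
  permission: "granted" | "denied" | "prompt";
}

// Hypothetical minimal view of the wheel event received by the capturer.
interface WheelLike {
  isTrusted: boolean;
  deltaX: number;
  deltaY: number;
}

// Values used to initialize the event fired at the captured surface.
interface ForwardedWheelInit {
  x: number;
  y: number;
  deltaX: number;
  deltaY: number;
}

function buildForwardedWheel(
  context: ForwardContext,
  event: WheelLike,
  scaledX: number,
  scaledY: number
): ForwardedWheelInit | null {
  if (!context.activelyCapturing || context.selfCapturing) return null;
  // Assumption: only "browser" is a supported display surface type.
  if (context.surfaceType !== "browser") return null;
  // Forwarding never prompts; the permission must already be granted.
  if (context.permission !== "granted") return null;
  // Synthetic events created by script are never forwarded.
  if (!event.isTrusted) return null;
  return { x: scaledX, y: scaledY, deltaX: event.deltaX, deltaY: event.deltaY };
}
```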
Subroutine: Scale element coordinates
The scale element coordinates algorithm takes {{double}} coordinates [|x|, |y|]
and a {{CaptureController}} |controller|, and runs the following steps:
-
Let |scaleFactorX| be
(|x| /
|controller|.{{CaptureController/[[ForwardWheelElement]]}}.{{Element/getBoundingClientRect()}}.{{DOMRect/width}}).
-
Let |scaleFactorY| be
(|y| /
|controller|.{{CaptureController/[[ForwardWheelElement]]}}.{{Element/getBoundingClientRect()}}.{{DOMRect/height}}).
-
Let |surfaceWidth| be |controller|.{{CaptureController/[[Source]]}}'s viewport's width.
-
Let |surfaceHeight| be |controller|.{{CaptureController/[[Source]]}}'s viewport's
height.
- Let |scaledX| be `(|scaleFactorX| * |surfaceWidth|)`.
- Let |scaledY| be `(|scaleFactorY| * |surfaceHeight|)`.
- Return [|scaledX|, |scaledY|].
This subroutine assumes that |controller| is [=actively capturing=].
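The scaling can be illustrated with the following non-normative sketch, which assumes that the vertical factor is computed from |y| and the element's height, mirroring the horizontal case. `BoxSize` and `scaleElementCoordinates` are hypothetical names; the element and surface sizes are passed in directly rather than read from a controller.

```typescript
// Hypothetical width/height pair, in CSS pixels.
interface BoxSize {
  width: number;
  height: number;
}

// Maps a point given relative to the element onto the captured surface's
// viewport by preserving its relative position along each axis.
function scaleElementCoordinates(
  x: number,
  y: number,
  element: BoxSize,
  surface: BoxSize
): [number, number] {
  const scaleFactorX = x / element.width;
  const scaleFactorY = y / element.height;
  return [scaleFactorX * surface.width, scaleFactorY * surface.height];
}

// A point a quarter of the way across and down a 320x180 preview maps to a
// quarter of the way across and down a 1280x720 viewport.
const scaledPoint = scaleElementCoordinates(
  80,
  45,
  { width: 320, height: 180 },
  { width: 1280, height: 720 }
);
// scaledPoint is [320, 180]
```

This sketch does not handle an element of zero width or height, for which the division is not meaningful.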
Privacy and Security Considerations
The API surfaces introduced in this specification allow a capturing application limited
control over a captured application. These APIs allow the capturing application to gain
access to additional pixels in the captured application. This specification employs multiple
means to ensure that new capabilities are used in accordance with the user's intentions.
Among these means:
-
All new capabilities introduced here are implicitly gated by the prior mitigations which
were employed to render screen-sharing safe.
- A new {{PermissionsPolicy}} called "captured-surface-control" is used.
-
{{CaptureController/forwardWheel()}} is designed such that only the user's scrolling over
an {{Element}} can trigger scrolling in the captured application. This API shape ensures
that the capturing application can only [=forward wheel event algorithm|forward wheel
events=] to the captured application at the time when the user agent dispatches the
trusted wheel event on the capturing application itself.
-
Setting the zoom level is gated by a requirement that is even more stringent than
[=transient activation=]. Whereas [=transient activation=] could be used several seconds
after the interaction, this specification limits zoom-setting to the time when the user
agent is dispatching the event associated with that interaction.
Zoom-setting: Limitation to specific interactions
{{CaptureController/increaseZoomLevel()}}, {{CaptureController/decreaseZoomLevel()}} and
{{CaptureController/resetZoomLevel()}} are only callable from event handlers of specific
event types - the [=permitted event types for
zoom-setting=]. These are events dispatched directly by the user agent, triggered by user
interaction. This specification intentionally excludes from this set such events as
"[=mousemove=]", which users are liable to trigger
inadvertently.
Scrolling: Limitation to specific interactions
The shape of {{CaptureController/forwardWheel()}} is intentionally chosen to limit the
capturing application's control. The application designates a specific element which, when
the user scrolls over it, the corresponding wheel events are forwarded to the captured
application.
Limiting element types
This specification does not limit the type of {{Element}} for which either
{{CaptureController/increaseZoomLevel()}}, {{CaptureController/decreaseZoomLevel()}},
{{CaptureController/resetZoomLevel()}} or {{CaptureController/forwardWheel()}} work. Such
a limitation would accomplish nothing, because malicious applications could always overlay
transparent permitted {{Element}} types on top of visible non-permitted {{Element}}s,
thereby bypassing this restriction.
The limitation of interaction types is sufficient. This is accomplished by
{{CaptureController/forwardWheel()}} through its shape, and by
{{CaptureController/increaseZoomLevel()}}, {{CaptureController/decreaseZoomLevel()}} and
{{CaptureController/resetZoomLevel()}} through their gating on
event types.