MediaStream Image Capture
Abstract
This document specifies methods and camera settings to produce photographic image capture. The source of images is, or can be referenced via, a MediaStreamTrack.
Table of Contents
- 1 Introduction
- 2 Security and Privacy Considerations
- 3 Image Capture API
  - 3.1 Attributes
  - 3.2 Methods
- 4 PhotoCapabilities
  - 4.1 Members
- 5 PhotoSettings
  - 5.1 Members
- 6 MediaSettingsRange
  - 6.1 Members
- 7 RedEyeReduction
  - 7.1 Values
- 8 FillLightMode
  - 8.1 Values
- 9 Extensions
  - 9.1 MediaTrackSupportedConstraints dictionary
    - 9.1.1 Members
  - 9.2 MediaTrackCapabilities dictionary
    - 9.2.1 Members
  - 9.3 MediaTrackConstraintSet dictionary
    - 9.3.1 Members
  - 9.4 MediaTrackSettings dictionary
    - 9.4.1 Members
  - 9.5 Additional Constrainable Properties
    - 9.5.1 Members
- 10 Photo Capabilities and Constrainable Properties
- 11 MeteringMode
  - 11.1 Values
- 12 Point2D
  - 12.1 Members
- 13 Examples
  - 13.1 Update camera pan, tilt and zoom and takePhoto()
  - 13.2 Repeated grabbing of a frame with grabFrame()
  - 13.3 Grabbing a Frame and Post-Processing
  - 13.4 Update camera focus distance and takePhoto()
- Conformance
  - Document conventions
  - Conformant Algorithms
- Index
  - Terms defined by this specification
  - Terms defined by reference
- References
  - Normative References
  - Informative References
- IDL Index
1. Introduction
The API defined in this document captures images from a photographic device referenced through a valid MediaStreamTrack [GETUSERMEDIA]. The produced image can be in the form of a Blob (see the takePhoto() method) or an ImageBitmap (see grabFrame()).
Reading capabilities and settings and applying constraints is done in one of two ways depending on whether it impacts the video MediaStreamTrack or not. Photo-specific capabilities and current settings can be retrieved via getPhotoCapabilities()/getPhotoSettings() and configured via takePhoto()’s PhotoSettings argument. Manipulating video-related capabilities, current settings and constraints is done via the MediaStreamTrack extension mechanism.
2. Security and Privacy Considerations
The privacy and security considerations discussed in [GETUSERMEDIA] apply to this extension specification.
Moreover, implementors should take care to prevent additional leakage of privacy-sensitive data from captured images.
For instance, embedding the user’s location in the metadata of the digitized image (e.g. EXIF) might transmit more private data than the user is expecting.
3. Image Capture API
The User Agent must support Promises in order to implement the Image Capture API. Any Promise object is assumed to have a resolver object, with resolve() and reject() methods associated with it.
[Exposed=Window, SecureContext]
interface ImageCapture {
constructor(MediaStreamTrack videoTrack);
Promise<Blob> takePhoto(optional PhotoSettings photoSettings = {});
Promise<PhotoCapabilities> getPhotoCapabilities();
Promise<PhotoSettings> getPhotoSettings();
Promise<ImageBitmap> grabFrame();
readonly attribute MediaStreamTrack track;
};
3.1. Attributes
track, of type MediaStreamTrack, readonly
The MediaStreamTrack passed into the constructor.
3.2. Methods
ImageCapture(MediaStreamTrack videoTrack)
takePhoto(optional PhotoSettings photoSettings)
takePhoto() produces the result of a single photographic exposure using the video capture device sourcing the track and including any PhotoSettings configured, returning an encoded image in the form of a Blob if successful. When this method is invoked, the user agent MUST run the following steps:
- If the readyState of track provided in the constructor is not live, return a promise rejected with a new DOMException whose name is InvalidStateError, and abort these steps.
- Let p be a new promise.
- Run the following steps in parallel:
  - Gather data from the underlying source of track, using the defined photoSettings, into a Blob containing a single still image. The method of doing this will depend on the underlying device.
    Devices MAY temporarily stop streaming data, reconfigure themselves with the appropriate photo settings, take the photo, and then resume streaming. In this case, the stopping and restarting of streaming SHOULD cause onmute and onunmute events to fire on the track in question.
  - If the operation cannot be completed for any reason (for example, upon invocation of multiple takePhoto() method calls in rapid succession), then reject p with a new DOMException whose name is UnknownError, and abort these steps.
  - Resolve p with the Blob object.
- Return p.
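The steps above give a caller two rejection names to expect from takePhoto(). A minimal sketch of a wrapper that reports them follows; the helper name takePhotoSafely and the shape of its result object are illustrative assumptions, not part of this API.

```javascript
// Hypothetical helper: call takePhoto() and report why it failed, if it did.
// `imageCapture` is any object exposing `track` and `takePhoto()`,
// for example an ImageCapture instance.
async function takePhotoSafely(imageCapture, photoSettings = {}) {
  // Cheap pre-check mirroring the first step of the algorithm.
  if (imageCapture.track.readyState !== 'live') {
    return { ok: false, reason: 'track-not-live' };
  }
  try {
    const blob = await imageCapture.takePhoto(photoSettings);
    return { ok: true, blob };
  } catch (err) {
    // InvalidStateError: the track stopped being live before the call ran.
    // UnknownError: the device could not complete the capture.
    return { ok: false, reason: err && err.name ? err.name : 'unexpected' };
  }
}
```

A caller might retry after a short delay when the reason is UnknownError, since rapid successive calls are one cause named above.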
getPhotoCapabilities()
getPhotoCapabilities() is used to retrieve the ranges of available configuration options, if any. When this method is invoked, the user agent MUST run the following steps:
- If the readyState of track provided in the constructor is not live, return a promise rejected with a new DOMException whose name is InvalidStateError, and abort these steps.
- Let p be a new promise.
- Run the following steps in parallel:
  - Gather data from track into a PhotoCapabilities dictionary containing the available capabilities of the device, including ranges where appropriate. The method of doing this will depend on the underlying device.
  - If the data cannot be gathered for any reason (for example, the MediaStreamTrack being ended asynchronously), then reject p with a new DOMException whose name is OperationError, and abort these steps.
  - Resolve p with the PhotoCapabilities dictionary.
- Return p.
getPhotoSettings()
getPhotoSettings() is used to retrieve the current configuration settings values, if any. When this method is invoked, the user agent MUST run the following steps:
- If the readyState of track provided in the constructor is not live, return a promise rejected with a new DOMException whose name is InvalidStateError, and abort these steps.
- Let p be a new promise.
- Run the following steps in parallel:
  - Gather data from track into a PhotoSettings dictionary containing the current conditions in which the device is found. The method of doing this will depend on the underlying device.
  - If the data cannot be gathered for any reason (for example, the MediaStreamTrack being ended asynchronously), then reject p with a new DOMException whose name is OperationError, and abort these steps.
  - Resolve p with the PhotoSettings dictionary.
- Return p.
grabFrame()
grabFrame() takes a snapshot of the live video being held in track, returning an ImageBitmap if successful. grabFrame() returns data only once upon being invoked. When this method is invoked, the user agent MUST run the following steps:
- If the readyState of track provided in the constructor is not live, return a promise rejected with a new DOMException whose name is InvalidStateError, and abort these steps.
- Let p be a new promise.
- Run the following steps in parallel:
  - Gather data from track into an ImageBitmap object. The width and height of the ImageBitmap object are derived from the constraints of track.
  - If the operation cannot be completed for any reason (for example, upon invocation of multiple grabFrame()/takePhoto() method calls in rapid succession), then reject p with a new DOMException whose name is UnknownError, and abort these steps.
  - Resolve p with the ImageBitmap object.
- Return p.
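Because calls in rapid succession are named above as one cause of UnknownError, an application may prefer to queue its grabFrame() calls. The sketch below shows one way to do that with a promise chain; makeSerialGrabber is a hypothetical helper, and this specification does not require such queuing.

```javascript
// Hypothetical helper: returns a function that calls grabFrame() one at a time.
// `imageCapture` is any object exposing grabFrame(), e.g. an ImageCapture.
function makeSerialGrabber(imageCapture) {
  let tail = Promise.resolve();
  return function grab() {
    // Start this grab only after the previous one has settled.
    const result = tail.then(() => imageCapture.grabFrame());
    // Swallow the rejection on the internal chain so later grabs still run;
    // the caller still sees the rejection through `result`.
    tail = result.catch(() => undefined);
    return result;
  };
}
```

Each call returns its own promise, so a failed frame does not prevent later frames from being requested.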
4. PhotoCapabilities
dictionary PhotoCapabilities {
RedEyeReduction redEyeReduction;
MediaSettingsRange imageHeight;
MediaSettingsRange imageWidth;
sequence<FillLightMode> fillLightMode;
};
4.1. Members
redEyeReduction, of type RedEyeReduction
The red eye reduction capacity of the source.
imageHeight, of type MediaSettingsRange
This reflects the image height range supported by the UA.
imageWidth, of type MediaSettingsRange
This reflects the image width range supported by the UA.
fillLightMode, of type sequence<FillLightMode>
This reflects the supported fill light mode (flash) settings, if any.
The supported resolutions are presented as separate imageWidth and imageHeight ranges to prevent increasing the fingerprinting surface and to allow the UA to make a best-effort decision with regard to actual hardware configuration.
5. PhotoSettings
dictionary PhotoSettings {
FillLightMode fillLightMode;
double imageHeight;
double imageWidth;
boolean redEyeReduction;
};
5.1. Members
redEyeReduction, of type boolean
This reflects whether camera red eye reduction is desired.
imageHeight, of type double
This reflects the desired image height. The UA MUST select the closest height value to this setting if it supports a discrete set of height options.
imageWidth, of type double
This reflects the desired image width. The UA MUST select the closest width value to this setting if it supports a discrete set of width options.
fillLightMode, of type FillLightMode
This reflects the desired fill light mode (flash) setting.
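The "closest value" rule for imageHeight and imageWidth can be pictured with a small sketch. The function name closestOption and the tie-breaking rule (the earlier option wins) are assumptions made here for illustration; the text above does not say how a UA resolves ties.

```javascript
// Hypothetical sketch: pick the discrete option nearest to the desired value.
// Assumption: on a tie, the option that appears first in `options` is kept.
function closestOption(desired, options) {
  if (options.length === 0) {
    throw new RangeError('at least one option is required');
  }
  return options.reduce((best, candidate) =>
    Math.abs(candidate - desired) < Math.abs(best - desired) ? candidate : best);
}
```

For instance, with supported heights of 480, 720 and 1080, a desired height of 700 would select 720.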
6. MediaSettingsRange
dictionary MediaSettingsRange {
double max;
double min;
double step;
};
6.1. Members
max, of type double
The maximum value of this setting.
min, of type double
The minimum value of this setting.
step, of type double
The minimum difference between consecutive values of this setting.
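An application can use the three members of a MediaSettingsRange to bring a requested value into range before applying it. The sketch below assumes that valid values lie on the grid min + k * step; that grid is an assumption of this example, as the definition above only states that step is the minimum difference between consecutive values.

```javascript
// Hypothetical helper: clamp `value` to [min, max] and snap it to the step grid.
// Assumption: valid values are min, min + step, min + 2 * step, ...
function snapToRange(value, range) {
  const { min, max, step } = range;
  const clamped = Math.min(max, Math.max(min, value));
  // Without a usable step, clamping is all that can be done.
  if (!step || step <= 0) return clamped;
  const snapped = min + Math.round((clamped - min) / step) * step;
  // Rounding up may overshoot max when max is not on the grid.
  return Math.min(max, snapped);
}
```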
7. RedEyeReduction
enum RedEyeReduction {
"never",
"always",
"controllable"
};
7.1. Values
never
Red eye reduction is not available in the device.
always
Red eye reduction is available in the device and it is always configured to true.
controllable
Red eye reduction is available in the device and it is controllable by the user via redEyeReduction.
8. FillLightMode
enum FillLightMode {
"auto",
"off",
"flash"
};
8.1. Values
auto
The video device’s fill light will be enabled when required (typically low light conditions). Otherwise it will be off. Note that auto does not guarantee that a flash will fire when takePhoto() is called. Use flash to guarantee firing of the flash for the takePhoto() method.
off
The source’s fill light and/or flash will not be used.
flash
This value will always cause the flash to fire for the takePhoto() method.
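The PhotoCapabilities, RedEyeReduction and FillLightMode definitions above can be combined to prepare a PhotoSettings argument that asks only for what the device reports. This is a rough sketch; buildPhotoSettings is a hypothetical helper, and dropping unsupported requests silently is a design choice of the example rather than behavior defined here.

```javascript
// Hypothetical helper: keep only the desired settings that the reported
// PhotoCapabilities suggest the device can honor.
function buildPhotoSettings(capabilities, desired) {
  const settings = {};
  // redEyeReduction can only be chosen when the capability is "controllable".
  if (desired.redEyeReduction !== undefined &&
      capabilities.redEyeReduction === 'controllable') {
    settings.redEyeReduction = desired.redEyeReduction;
  }
  // fillLightMode must be one of the modes listed by the device.
  if (desired.fillLightMode !== undefined &&
      (capabilities.fillLightMode || []).includes(desired.fillLightMode)) {
    settings.fillLightMode = desired.fillLightMode;
  }
  // Clamp the requested resolution to the reported ranges, when present.
  for (const key of ['imageWidth', 'imageHeight']) {
    const range = capabilities[key];
    if (desired[key] !== undefined && range) {
      settings[key] = Math.min(range.max, Math.max(range.min, desired[key]));
    }
  }
  return settings;
}
```

The result could then be passed to takePhoto(), with the capabilities obtained from getPhotoCapabilities().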
9. Extensions
This section defines a new set of constrainable properties for MediaStreamTrack that can be applied in order to make its behavior more suitable for taking pictures. Use of these constraints via MediaStreamTrack’s methods getCapabilities(), getSettings(), getConstraints() and applyConstraints() will modify the behavior of the ImageCapture object’s track.
9.1. MediaTrackSupportedConstraints dictionary
MediaTrackSupportedConstraints is extended here with the list of constraints that a User Agent recognizes for controlling the photo capabilities. This dictionary can be retrieved using the MediaDevices getSupportedConstraints() method.
partial dictionary MediaTrackSupportedConstraints {
boolean whiteBalanceMode = true;
boolean exposureMode = true;
boolean focusMode = true;
boolean pointsOfInterest = true;
boolean exposureCompensation = true;
boolean exposureTime = true;
boolean colorTemperature = true;
boolean iso = true;
boolean brightness = true;
boolean contrast = true;
boolean pan = true;
boolean saturation = true;
boolean sharpness = true;
boolean focusDistance = true;
boolean tilt = true;
boolean zoom = true;
boolean torch = true;
};
9.1.1. Members
whiteBalanceMode, of type boolean, defaulting to true
Whether white balance mode constraining is recognized.
colorTemperature, of type boolean, defaulting to true
Whether color temperature constraining is recognized.
exposureMode, of type boolean, defaulting to true
Whether exposure constraining is recognized.
exposureCompensation, of type boolean, defaulting to true
Whether exposure compensation constraining is recognized.
exposureTime, of type boolean, defaulting to true
Whether exposure time constraining is recognized.
iso, of type boolean, defaulting to true
Whether ISO constraining is recognized.
focusMode, of type boolean, defaulting to true
Whether focus mode constraining is recognized.
pointsOfInterest, of type boolean, defaulting to true
Whether points of interest are supported.
brightness, of type boolean, defaulting to true
Whether brightness constraining is recognized.
contrast, of type boolean, defaulting to true
Whether contrast constraining is recognized.
pan, of type boolean, defaulting to true
Whether pan constraining is recognized.
saturation, of type boolean, defaulting to true
Whether saturation constraining is recognized.
sharpness, of type boolean, defaulting to true
Whether sharpness constraining is recognized.
focusDistance, of type boolean, defaulting to true
Whether focus distance constraining is recognized.
tilt, of type boolean, defaulting to true
Whether tilt constraining is recognized.
zoom, of type boolean, defaulting to true
Whether configuration of the zoom level is recognized.
torch, of type boolean, defaulting to true
Whether configuration of torch is recognized.
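Since each member above is a boolean keyed by constraint name, a page can filter the constraints it would like to use down to those the User Agent recognizes. The helper pickSupported below is a hypothetical name used for illustration only.

```javascript
// Hypothetical helper: keep only the constraints whose names the UA recognizes.
// `supported` is a MediaTrackSupportedConstraints-like object, e.g. the value
// returned by navigator.mediaDevices.getSupportedConstraints() in a browser.
function pickSupported(supported, wanted) {
  const picked = {};
  for (const [name, value] of Object.entries(wanted)) {
    if (supported[name] === true) {
      picked[name] = value;
    }
  }
  return picked;
}
```

Recognition by the UA does not imply that a particular camera supports the property; that is reported per track through getCapabilities().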
9.2. MediaTrackCapabilities dictionary
MediaTrackCapabilities is extended here with the capabilities specific to image capture. This dictionary is produced by the UA via getCapabilities() and represents the supported ranges and enumerations of the supported constraints.
partial dictionary MediaTrackCapabilities {
sequence<DOMString> whiteBalanceMode;
sequence<DOMString> exposureMode;
sequence<DOMString> focusMode;
MediaSettingsRange exposureCompensation;
MediaSettingsRange exposureTime;
MediaSettingsRange colorTemperature;
MediaSettingsRange iso;
MediaSettingsRange brightness;
MediaSettingsRange contrast;
MediaSettingsRange saturation;
MediaSettingsRange sharpness;
MediaSettingsRange focusDistance;
MediaSettingsRange pan;
MediaSettingsRange tilt;
MediaSettingsRange zoom;
sequence<boolean> torch;
};
9.2.1. Members
whiteBalanceMode, of type sequence<DOMString>
A sequence of supported white balance modes. Each string MUST be one of the members of MeteringMode.
colorTemperature, of type MediaSettingsRange
This range reflects the supported correlated color temperatures to be used for the scene white balance calculation.
exposureMode, of type sequence<DOMString>
A sequence of supported exposure modes. Each string MUST be one of the members of MeteringMode.
exposureCompensation, of type MediaSettingsRange
This reflects the supported range of exposure compensation. The supported range can be, and usually is, centered around 0 EV.
exposureTime, of type MediaSettingsRange
This reflects the supported range of exposure time. Values are numeric. Increasing values indicate increasing exposure time.
iso, of type MediaSettingsRange
This reflects the permitted range of ISO values.
focusMode, of type sequence<DOMString>
A sequence of supported focus modes. Each string MUST be one of the members of MeteringMode.
brightness, of type MediaSettingsRange
This reflects the supported range of brightness setting of the camera. Values are numeric. Increasing values indicate increasing brightness.
contrast, of type MediaSettingsRange
This reflects the supported range of contrast. Values are numeric. Increasing values indicate increasing contrast.
pan, of type MediaSettingsRange
This reflects the pan value range supported by the UA and by the track.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support pan.
In that case the UA MUST NOT expose the pan value range but MAY provide an empty MediaSettingsRange dictionary to indicate that the underlying video source supports pan.
saturation, of type MediaSettingsRange
This reflects the permitted range of saturation setting. Values are numeric. Increasing values indicate increasing saturation.
sharpness, of type MediaSettingsRange
This reflects the permitted sharpness range of the camera. Values are numeric. Increasing values indicate increasing sharpness, and the minimum value always implies no sharpness enhancement or processing.
focusDistance, of type MediaSettingsRange
This reflects the focus distance value range supported by the UA.
tilt, of type MediaSettingsRange
This reflects the tilt value range supported by the UA and by the track.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support tilt.
In that case the UA MUST NOT expose the tilt value range but MAY provide an empty MediaSettingsRange dictionary to indicate that the underlying video source supports tilt.
zoom, of type MediaSettingsRange
This reflects the zoom value range supported by the UA and by the track.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support zoom.
In that case the UA MUST NOT expose the zoom value range but MAY provide an empty MediaSettingsRange dictionary to indicate that the underlying video source supports zoom.
torch, of type sequence<boolean>
If the source cannot turn on torch, a single false is reported.
If the source cannot turn off torch, a single true is reported.
If the script can control the feature, the source reports a list with both true and false as possible values.
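The three torch cases described above can be told apart with a short check. In this sketch the function name torchControl and the returned labels are illustrative assumptions; only the meaning of the sequence comes from the text.

```javascript
// Hypothetical helper: interpret the `torch` capability sequence.
function torchControl(torchCapability) {
  // Assumption: a missing or empty sequence means nothing is known.
  if (!Array.isArray(torchCapability) || torchCapability.length === 0) {
    return 'unknown';
  }
  const canBeOn = torchCapability.includes(true);
  const canBeOff = torchCapability.includes(false);
  if (canBeOn && canBeOff) return 'controllable'; // script may toggle it
  return canBeOn ? 'always-on' : 'always-off';    // single true / single false
}
```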
9.3. MediaTrackConstraintSet dictionary
The MediaTrackConstraintSet [GETUSERMEDIA] dictionary is used both for reading the current status with getConstraints() and for applying a set of constraints with applyConstraints().
partial dictionary MediaTrackConstraintSet {
ConstrainDOMString whiteBalanceMode;
ConstrainDOMString exposureMode;
ConstrainDOMString focusMode;
ConstrainPoint2D pointsOfInterest;
ConstrainDouble exposureCompensation;
ConstrainDouble exposureTime;
ConstrainDouble colorTemperature;
ConstrainDouble iso;
ConstrainDouble brightness;
ConstrainDouble contrast;
ConstrainDouble saturation;
ConstrainDouble sharpness;
ConstrainDouble focusDistance;
(boolean or ConstrainDouble) pan;
(boolean or ConstrainDouble) tilt;
(boolean or ConstrainDouble) zoom;
ConstrainBoolean torch;
};
9.3.1. Members
whiteBalanceMode, of type ConstrainDOMString
This string MUST be one of the members of MeteringMode. See white balance mode constrainable property.
exposureMode, of type ConstrainDOMString
This string MUST be one of the members of MeteringMode. See exposure constrainable property.
focusMode, of type ConstrainDOMString
This string MUST be one of the members of MeteringMode. See focus mode constrainable property.
colorTemperature, of type ConstrainDouble
See color temperature constrainable property.
exposureCompensation, of type ConstrainDouble
See exposure compensation constrainable property.
exposureTime, of type ConstrainDouble
See exposure time constrainable property.
iso, of type ConstrainDouble
See iso constrainable property.
pointsOfInterest, of type ConstrainPoint2D
See points of interest constrainable property.
brightness, of type ConstrainDouble
See brightness constrainable property.
contrast, of type ConstrainDouble
See contrast constrainable property.
pan, of type (boolean or ConstrainDouble)
See pan constrainable property.
saturation, of type ConstrainDouble
See saturation constrainable property.
sharpness, of type ConstrainDouble
See sharpness constrainable property.
focusDistance, of type ConstrainDouble
See focus distance constrainable property.
tilt, of type (boolean or ConstrainDouble)
See tilt constrainable property.
zoom, of type (boolean or ConstrainDouble)
See zoom constrainable property.
torch, of type ConstrainBoolean
See torch constrainable property.
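One way to use these constraint members is to check the track's capabilities first and apply a constraint only when the property looks usable. The sketch below is an assumption-laden illustration: applyIfSupported and isUsableCapability are hypothetical names, and placing the constraint inside an advanced constraint set is one common pattern rather than the only option.

```javascript
// Hypothetical helper: decide whether a capability entry can be acted on.
function isUsableCapability(capability) {
  if (capability === undefined || capability === null) return false;
  // Enumerated capabilities (e.g. torch, focusMode) are sequences.
  if (Array.isArray(capability)) return capability.length > 0;
  // An empty MediaSettingsRange (no min/max) may only signal that the
  // hardware has the feature while permission has not been granted.
  return typeof capability.min === 'number' && typeof capability.max === 'number';
}

// Hypothetical helper: apply one constraint if the track reports support.
// `track` is any object with getCapabilities() and applyConstraints(),
// e.g. a video MediaStreamTrack.
async function applyIfSupported(track, name, value) {
  const capabilities = track.getCapabilities();
  if (!isUsableCapability(capabilities[name])) return false;
  await track.applyConstraints({ advanced: [{ [name]: value }] });
  return true;
}
```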
9.4. MediaTrackSettings dictionary
When the getSettings() method is invoked on a video stream track, the user agent must return the extended MediaTrackSettings dictionary, representing the current status of the underlying user agent.
partial dictionary MediaTrackSettings {
DOMString whiteBalanceMode;
DOMString exposureMode;
DOMString focusMode;
sequence<Point2D> pointsOfInterest;
double exposureCompensation;
double exposureTime;
double colorTemperature;
double iso;
double brightness;
double contrast;
double saturation;
double sharpness;
double focusDistance;
double pan;
double tilt;
double zoom;
boolean torch;
};
9.4.1. Members
whiteBalanceMode, of type DOMString
Current white balance mode setting. The string MUST be one of the members of MeteringMode.
exposureMode, of type DOMString
Current exposure mode setting. The string MUST be one of the members of MeteringMode.
colorTemperature, of type double
Color temperature in use for the white balance calculation of the scene. This field is only significant if whiteBalanceMode is manual.
exposureCompensation, of type double
Current exposure compensation setting. A value of 0 EV is interpreted as no exposure compensation. This field is only significant if exposureMode is continuous or single-shot.
exposureTime, of type double
Current exposure time setting. This field is only significant if exposureMode is manual.
iso, of type double
Current camera ISO setting.
focusMode, of type DOMString
Current focus mode setting. The string MUST be one of the members of MeteringMode.
pointsOfInterest, of type sequence<Point2D>
A sequence of Point2Ds in use as points of interest for other settings, e.g. Focus, Exposure and Auto White Balance.
brightness, of type double
This reflects the current brightness setting of the camera.
contrast, of type double
This reflects the current contrast setting of the camera.
pan, of type double
This reflects the current pan setting of the camera.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support pan.
In that case the UA MUST NOT expose the pan setting.
saturation, of type double
This reflects the current saturation setting of the camera.
sharpness, of type double
This reflects the current sharpness setting of the camera.
focusDistance, of type double
This reflects the current focus distance setting of the camera.
tilt, of type double
This reflects the current tilt setting of the camera.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support tilt.
In that case the UA MUST NOT expose the tilt setting.
zoom, of type double
This reflects the current zoom setting of the camera.
If the track has been created without requesting permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or if that permission request is denied, the track does not support zoom.
In that case the UA MUST NOT expose the zoom setting.
torch, of type boolean
Current camera torch configuration setting.
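Several of the settings above are described as significant only under a particular mode. A page that displays the current settings might hide the ones that do not apply; the following sketch encodes those three rules. The function name significantSettings is an assumption of this example.

```javascript
// Hypothetical helper: drop settings that the text marks as not significant
// for the current white balance and exposure modes.
function significantSettings(settings) {
  const out = { ...settings };
  // colorTemperature matters only in manual white balance mode.
  if (settings.whiteBalanceMode !== 'manual') delete out.colorTemperature;
  // exposureTime matters only in manual exposure mode.
  if (settings.exposureMode !== 'manual') delete out.exposureTime;
  // exposureCompensation matters only in the auto-exposure modes.
  if (!['continuous', 'single-shot'].includes(settings.exposureMode)) {
    delete out.exposureCompensation;
  }
  return out;
}
```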
9.5. Additional Constrainable Properties
dictionary ConstrainPoint2DParameters {
sequence<Point2D> exact;
sequence<Point2D> ideal;
};
typedef (sequence<Point2D> or ConstrainPoint2DParameters) ConstrainPoint2D;
9.5.1. Members
exact, of type sequence<Point2D>
The exact required value of points of interest.
ideal, of type sequence<Point2D>
The ideal (target) value of points of interest.
10. Photo Capabilities and Constrainable Properties
Many of the mentioned photo and video capabilities mirror hardware features that are hard to define since they can be implemented in a number of ways. Moreover, manufacturers tend to publish vague definitions to protect their intellectual property.
- White balance mode is a setting that cameras use to adjust for different color temperatures. Color temperature is the temperature of background light (usually measured in Kelvin). This setting can usually be automatically and continuously determined by the implementation, but it’s also common to offer a manual mode in which the estimated temperature of the scene illumination is hinted to the implementation. Typical temperature ranges for popular modes are provided below:
  | Mode | Kelvin range |
  | incandescent | 2500-3500 |
  | fluorescent | 4000-5000 |
  | warm-fluorescent | 5000-5500 |
  | daylight | 5500-6500 |
  | cloudy-daylight | 6500-8000 |
  | twilight | 8000-9000 |
  | shade | 9000-10000 |
- Exposure is the amount of light that is allowed to fall on the photosensitive device. In auto-exposure modes (single-shot or continuous exposureMode), the exposure time and/or camera aperture are automatically adjusted by the implementation based on the subject of the photo. In manual exposureMode, these parameters are set to fixed absolute values.
- Focus mode describes the focus setting of the capture device (e.g. auto or manual).
- Points of interest describe the metering area centers used in other settings, e.g. exposure, white balance mode and focus mode, each one being a Point2D. Usually these three controls are modified simultaneously by the so-called 3A algorithm (auto-focus, auto-exposure, auto-white-balance).
  A Point2D Point of Interest is interpreted to represent a pixel position in a normalized square space ({x,y} ∈ [0.0, 1.0]). The origin of coordinates {x,y} = {0.0, 0.0} represents the upper leftmost corner whereas {x,y} = {1.0, 1.0} represents the lower rightmost corner: the x coordinate (columns) increases rightwards and the y coordinate (rows) increases downwards. Values beyond the mentioned limits will be clamped to the closest allowed value.
- Exposure Compensation is a numeric camera setting that adjusts the exposure level from the current value used by the implementation. This value can be used to bias the exposure level enabled by auto-exposure, and usually is a symmetric range around 0 EV (the no-compensation value). This value is only used in single-shot and continuous exposureMode.
- Exposure Time is a numeric camera setting that controls the length of time during which light is allowed to fall on the photosensitive device. This value is used in manual exposureMode to control exposure. The value is in 100 microsecond units. That is, a value of 1.0 means an exposure time of 1/10000th of a second and a value of 10000.0 means an exposure time of 1 second.
- The ISO setting of a camera describes the sensitivity of the camera to light. It is a numeric value, where the higher the value the greater the sensitivity. This value should follow the [iso12232] standard.
- Red Eye Reduction is a feature in cameras that is designed to limit or prevent the appearance of red pupils ("Red Eye") in photography subjects due to prolonged exposure to a camera’s flash.
- [LIGHTING-VOCABULARY] defines brightness as "the attribute of a visual sensation according to which an area appears to emit more or less light" and in the context of the present API, it refers to the numeric camera setting that adjusts the perceived amount of light emitting from the photo object. A higher brightness setting increases the intensity of darker areas in a scene while compressing the intensity of brighter parts of the scene. The range and effect of this setting is implementation dependent but in general it translates into a numerical value that is added to each pixel with saturation.
- Contrast is the numeric camera setting that controls the difference in brightness between light and dark areas in a scene. A higher contrast setting reflects an expansion in the difference in brightness. The range and effect of this setting is implementation dependent but it can be understood as a transformation of the pixel values so that the luma range in the histogram becomes larger; the transformation is sometimes as simple as a gain factor.
- [LIGHTING-VOCABULARY] defines saturation as "the colourfulness of an area judged in proportion to its brightness" and in the current context it refers to a numeric camera setting that controls the intensity of color in a scene (i.e. the amount of gray in the scene). Very low saturation levels will result in photos closer to black-and-white. Saturation is similar to contrast but referring to colors, so its implementation, albeit being platform dependent, can be understood as a gain factor applied to the chroma components of a given image.
- Sharpness is a numeric camera setting that controls the intensity of edges in a scene. Higher sharpness settings result in higher contrast along the edges, while lower settings result in less contrast and blurrier edges (i.e. soft focus). The implementation is platform dependent, but it can be understood as the linear combination of an edge detection operation applied on the original image and the original image itself; the relative weights being controlled by this sharpness.
- Image width and image height represent the supported/desired resolution of the resulting photographic image after any potential sensor corrections and other algorithms are run.
  The supported resolutions are managed separately, i.e. as individual imageWidth and imageHeight values/ranges, to prevent increasing the fingerprinting surface and to allow the UA to make a best-effort decision with regard to actual hardware configuration vis-a-vis requested constraints.
- Focus distance is a numeric camera setting that controls the focus distance of the lens. The setting usually represents distance in meters to the optimal focus distance.
-
Pan is a numeric camera setting that controls the pan of the camera. The setting represents pan in arc seconds, which are 1/3600th of a degree. Values are in the range from -180*3600 arc seconds to +180*3600 arc seconds. Positive values pan the camera clockwise as viewed from above, and negative values pan the camera counter clockwise as viewed from above.
Constraints on pan influence camera selection through fitness distance toward cameras with the ability to pan. To exert this influence without overwriting the current pan setting, pan may be constrained to true. Conversely, constraining it to false disfavors cameras with the ability to pan.
Any algorithm which uses a MediaTrackConstraintSet object whose pan dictionary member exists with a value other than false MUST either request permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or decide not to expose the pan setting.
If the visibilityState of the top-level browsing context value is "hidden", the applyConstraints() algorithm MUST throw a SecurityError if pan dictionary member exists with a value other than false.
-
Tilt is a numeric camera setting that controls the tilt of the camera. The setting represents tilt in arc seconds, which are 1/3600th of a degree. Values are in the range from -180*3600 arc seconds to +180*3600 arc seconds. Positive values tilt the camera upward when viewed from the front, and negative values tilt the camera downward as viewed from the front.
Constraints on tilt influence camera selection through fitness distance toward cameras with the ability to tilt. To exert this influence without overwriting the current tilt setting, tilt may be constrained to true. Conversely, constraining it to false disfavors cameras with the ability to tilt.
Any algorithm which uses a MediaTrackConstraintSet object whose tilt dictionary member exists with a value other than false MUST either request permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or decide not to expose the tilt setting.
If the visibilityState of the top-level browsing context value is "hidden", the applyConstraints() algorithm MUST throw a SecurityError if tilt dictionary member exists with a value other than false.
There is no defined order when applying
pan and
tilt, the UA is allowed to apply them in any order. In practice this should not matter since these values are absolute, so order will not affect the final position. However, if applying pan and tilt is slow enough, the order in which they are applied may be visually noticeable.
-
Zoom is a numeric camera setting that controls the focal length of the lens. The setting usually represents a ratio, e.g. 4 is a zoom ratio of 4:1. The minimum value is usually 1, to represent a 1:1 ratio (i.e. no zoom).
Constraints on zoom influence camera selection through fitness distance toward cameras with the ability to zoom. To exert this influence without overwriting the current zoom setting, zoom may be constrained to true. Conversely, constraining it to false disfavors cameras with the ability to zoom.
Any algorithm which uses a MediaTrackConstraintSet object whose zoom dictionary member exists with a value other than false MUST either request permission to use (as defined in [permissions]) a PermissionDescriptor with its name member set to camera and its panTiltZoom member set to true, or decide not to expose the zoom setting.
If the visibilityState of the top-level browsing context is "hidden", the applyConstraints() algorithm MUST throw a SecurityError if the zoom dictionary member exists with a value other than false.
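The visibility rule for pan, tilt and zoom can be anticipated in page script. The sketch below is one possible approach, not a required pattern: the function names requestsPtz and applyPtzIfVisible are hypothetical, and the track and document are passed in as parameters on the assumption that the document belongs to the top-level browsing context.

```javascript
// Returns true when the constraint set asks for pan, tilt or zoom with a
// value other than false, i.e. when the rules above apply.
function requestsPtz(constraintSet) {
  return ['pan', 'tilt', 'zoom'].some(
      name => name in constraintSet && constraintSet[name] !== false);
}

// Hypothetical wrapper: |track| is a video MediaStreamTrack and |doc| is
// assumed to be the top-level document. Resolves to true when the
// constraints were applied and to false when they were skipped or refused.
async function applyPtzIfVisible(track, doc, constraintSet) {
  if (requestsPtz(constraintSet) && doc.visibilityState === 'hidden') {
    // applyConstraints() is expected to fail with a SecurityError here.
    return false;
  }
  try {
    await track.applyConstraints(constraintSet);
    return true;
  } catch (err) {
    if (err.name === 'SecurityError') return false;
    throw err;
  }
}
```

In a page, the call could look like applyPtzIfVisible(track, document, {zoom: 2}). Errors other than SecurityError are rethrown so that the caller can still handle them.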
- Fill light mode describes the flash setting of the capture device (e.g. auto, off, on). Torch describes the setting of the source’s fill light as continuously connected, staying on as long as the track is active.
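As a rough illustration of the torch property, the sketch below builds a torch constraint only when the source reports torch support. The helper name torchConstraint is hypothetical. The IDL in this specification describes the torch capability as a sequence of booleans; accepting a single boolean as well is an assumption made here in case an implementation exposes it that way.

```javascript
// Hypothetical helper: decide whether a torch constraint is worth applying.
// |capabilities| has the shape of the object returned by
// track.getCapabilities(). Both a sequence of booleans and a single boolean
// (an assumption of this sketch) are accepted for the torch capability.
function torchConstraint(capabilities, on) {
  const torch = capabilities.torch;
  const supported = Array.isArray(torch) ? torch.includes(true) : torch === true;
  return supported ? {torch: on} : null;
}

const constraint = torchConstraint({torch: [false, true]}, true);
// In a page: if (constraint) await track.applyConstraints(constraint);
console.log(constraint);
```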
11. MeteringMode
enum MeteringMode {
  "none",
  "manual",
  "single-shot",
  "continuous"
};
11.1. Values
none
This source does not offer focus/exposure/white balance mode. When used as a setting, this is interpreted as a command to turn off the feature.
manual
The capture device is set to manually control the lens position/exposure time/white balance, or such a mode is requested to be configured.
single-shot
The capture device is configured for single-sweep autofocus/one-shot exposure/white balance calculation, or such a mode is requested.
continuous
The capture device is configured for continuous focusing for near-zero shutter-lag/continuous auto exposure/white balance calculation, or such continuous focus hunting/exposure/white balance calculation mode is requested.
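One way to use these values is to pick the first mode from an application's preference list that the source reports as supported. The following is a minimal sketch under that assumption; the helper name pickMeteringMode and the sample capability list are hypothetical.

```javascript
const METERING_MODES = ['none', 'manual', 'single-shot', 'continuous'];

// Return the first preferred mode that is a MeteringMode value and that the
// source reports as supported, or undefined when none matches.
function pickMeteringMode(supportedModes, preferredModes) {
  return preferredModes.find(
      mode => METERING_MODES.includes(mode) && supportedModes.includes(mode));
}

// Assumed sample value for track.getCapabilities().focusMode.
const supported = ['manual', 'single-shot'];
const focusMode = pickMeteringMode(supported, ['continuous', 'single-shot', 'manual']);
// In a page: if (focusMode) await track.applyConstraints({focusMode});
console.log(focusMode);
```

The same helper could be used for exposureMode and whiteBalanceMode, since all three share the MeteringMode values.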
12. Point2D
A Point2D represents a location in a two-dimensional space. The origin of coordinates is situated in the upper leftmost corner of the space.
dictionary Point2D {
  double x = 0.0;
  double y = 0.0;
};
12.1. Members
x,
of type double, defaulting to 0.0
Value of the horizontal (abscissa) coordinate.
y,
of type double, defaulting to 0.0
Value of the vertical (ordinate) coordinate.
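To illustrate how a Point2D might be produced, the sketch below maps a pointer position inside an element to a point with the origin in the upper leftmost corner. It assumes that coordinates are normalized to the range [0.0, 1.0], as expected for pointsOfInterest; the helper name toPoint2D and the sample rectangle are hypothetical.

```javascript
// Hypothetical helper: map a pointer position inside an element to a Point2D.
// |rect| has the shape returned by element.getBoundingClientRect().
// Assumption: coordinates are normalized to [0.0, 1.0] with the origin in
// the upper leftmost corner.
function toPoint2D(clientX, clientY, rect) {
  const clamp01 = v => Math.min(1, Math.max(0, v));
  return {
    x: clamp01((clientX - rect.left) / rect.width),
    y: clamp01((clientY - rect.top) / rect.height),
  };
}

const point = toPoint2D(260, 170, {left: 100, top: 50, width: 640, height: 480});
// In a page: await track.applyConstraints({pointsOfInterest: [point]});
console.log(point);
```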
13. Examples
13.1. Update camera pan, tilt and zoom and takePhoto()
The following example is also available, with minimal modifications, as a codepen.
<html>
<body>
  <video autoplay></video>
  <img>
  <div>
    <input id="pan" title="Pan" type="range" disabled />
    <label for="pan">Pan</label>
  </div>
  <div>
    <input id="tilt" title="Tilt" type="range" disabled />
    <label for="tilt">Tilt</label>
  </div>
  <div>
    <input id="zoom" title="Zoom" type="range" disabled />
    <label for="zoom">Zoom</label>
  </div>
  <script>
    let imageCapture;
    async function getMedia() {
      try {
        const stream = await navigator.mediaDevices.getUserMedia({
          video: {pan: true, tilt: true, zoom: true},
        });
        const video = document.querySelector('video');
        video.srcObject = stream;
        const [track] = stream.getVideoTracks();
        imageCapture = new ImageCapture(track);
        const capabilities = track.getCapabilities();
        const settings = track.getSettings();
        for (const ptz of ['pan', 'tilt', 'zoom']) {
          // Check whether pan/tilt/zoom is available or not.
          if (!(ptz in settings)) continue;
          // Map it to a slider element.
          const input = document.getElementById(ptz);
          input.min = capabilities[ptz].min;
          input.max = capabilities[ptz].max;
          input.step = capabilities[ptz].step;
          input.value = settings[ptz];
          input.disabled = false;
          input.oninput = async event => {
            try {
              // Warning: Chrome requires advanced constraints.
              await track.applyConstraints({[ptz]: input.value});
            } catch (err) {
              console.error("applyConstraints() failed: ", err);
            }
          };
        }
      } catch (err) {
        console.error(err);
      }
    }
    async function takePhoto() {
      try {
        const blob = await imageCapture.takePhoto();
        console.log("Photo taken: " + blob.type + ", " + blob.size + "B");
        const image = document.querySelector('img');
        image.src = URL.createObjectURL(blob);
      } catch (err) {
        console.error("takePhoto() failed: ", err);
      }
    }
  </script>
</body>
</html>
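The comment in the example above notes that Chrome requires advanced constraints. A minimal sketch of how the argument to applyConstraints() could be built in either form follows; the helper name ptzConstraints and its useAdvanced flag are hypothetical, and whether a given UA needs the advanced form is something to verify against that UA.

```javascript
// Hypothetical helper: build the argument for track.applyConstraints().
// With useAdvanced set, the value is wrapped in an "advanced" constraint set.
// Slider values are strings, so the value is converted to a number first.
function ptzConstraints(name, value, useAdvanced = false) {
  const constraintSet = {[name]: Number(value)};
  return useAdvanced ? {advanced: [constraintSet]} : constraintSet;
}

console.log(ptzConstraints('zoom', '2.5'));
console.log(ptzConstraints('pan', '3600', true));
// In a page: await track.applyConstraints(ptzConstraints(ptz, input.value, true));
```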
13.2. Repeated grabbing of a frame with grabFrame()
The following example is also available, with minimal modifications, as a codepen.
<html>
<body>
  <video autoplay></video>
  <canvas></canvas>
  <button id="stopButton">Stop frame grab</button>
  <script>
    async function grabFrames() {
      try {
        const canvas = document.querySelector('canvas');
        const video = document.querySelector('video');
        const stopButton = document.getElementById('stopButton');
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        video.srcObject = stream;
        const [track] = stream.getVideoTracks();
        try {
          const imageCapture = new ImageCapture(track);
          stopButton.onclick = () => track.stop();
          // Grab one frame per second until the track is stopped.
          while (track.readyState === 'live') {
            const imgData = await imageCapture.grabFrame();
            canvas.width = imgData.width;
            canvas.height = imgData.height;
            canvas.getContext('2d').drawImage(imgData, 0, 0);
            await new Promise(r => setTimeout(r, 1000));
          }
        } finally {
          track.stop();
        }
      } catch (err) {
        console.error(err);
      }
    }
  </script>
</body>
</html>
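The loop in the example above can be factored into a reusable function. The following sketch is one possible shape for it, with the hypothetical name grabFramesWhileLive; it additionally releases each frame with close() when that method is available, which is a choice of this sketch rather than something the example requires.

```javascript
// Hypothetical helper: call grabFrame() repeatedly until the track ends.
// |onFrame| receives each grabbed frame; |intervalMs| is the pause between
// grabs. Resolves to the number of frames grabbed.
async function grabFramesWhileLive(imageCapture, track, onFrame, intervalMs = 1000) {
  let count = 0;
  while (track.readyState === 'live') {
    const frame = await imageCapture.grabFrame();
    onFrame(frame);
    count++;
    // Release the bitmap once the callback has used it.
    if (typeof frame.close === 'function') frame.close();
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return count;
}
```

In a page, onFrame could draw the frame onto a canvas, and a button handler calling track.stop() would end the loop after the current iteration.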
13.3. Grabbing a Frame and Post-Processing
The following example is also available, with minimal modifications, as a codepen.
<html>
<body>
  <video autoplay></video>
  <canvas></canvas>
  <script>
    async function grabFrames() {
      try {
        const canvas = document.querySelector('canvas');
        const video = document.querySelector('video');
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        video.srcObject = stream;
        const [track] = stream.getVideoTracks();
        try {
          const imageCapture = new ImageCapture(track);
          const imageBitmap = await imageCapture.grabFrame();
          // |imageBitmap| pixels are not directly accessible: we need to paint
          // the grabbed frame onto a <canvas>, then getImageData() from it.
          const ctx = canvas.getContext('2d');
          canvas.width = imageBitmap.width;
          canvas.height = imageBitmap.height;
          ctx.drawImage(imageBitmap, 0, 0);
          // Read back the pixels from the <canvas>, and invert the colors.
          const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
          const data = imageData.data;
          for (let i = 0; i < data.length; i += 4) {
            data[i] ^= 255; // red
            data[i + 1] ^= 255; // green
            data[i + 2] ^= 255; // blue
          }
          // Finally, draw the inverted image to the <canvas>
          ctx.putImageData(imageData, 0, 0);
        } finally {
          track.stop();
        }
      } catch (err) {
        console.error(err);
      }
    }
  </script>
</body>
</html>
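The pixel manipulation step of the example above can be isolated and checked without a camera. The sketch below applies the same inversion to a small RGBA buffer; the function name invertColors and the sample pixel values are hypothetical.

```javascript
// Invert the red, green and blue channels of RGBA pixel data in place,
// leaving alpha untouched. |data| has the layout of ImageData.data.
function invertColors(data) {
  for (let i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];         // red
    data[i + 1] = 255 - data[i + 1]; // green
    data[i + 2] = 255 - data[i + 2]; // blue
  }
  return data;
}

// Two sample pixels in RGBA order.
const pixels = new Uint8ClampedArray([0, 128, 255, 255, 10, 20, 30, 40]);
console.log(Array.from(invertColors(pixels)));
```

For 8-bit channel values, subtracting from 255 gives the same result as the XOR with 255 used in the example.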
13.4. Update camera focus distance and takePhoto()
<html>
<body>
  <video autoplay></video>
  <img>
  <input type="range" hidden>
  <script>
    let imageCapture;
    async function getMedia() {
      try {
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        const video = document.querySelector('video');
        video.srcObject = stream;
        const [track] = stream.getVideoTracks();
        imageCapture = new ImageCapture(track);
        const capabilities = track.getCapabilities();
        const settings = track.getSettings();
        // Check whether focus distance is available or not.
        if (!capabilities.focusDistance) {
          return;
        }
        // Map focus distance to a slider element.
        const input = document.querySelector('input[type="range"]');
        input.min = capabilities.focusDistance.min;
        input.max = capabilities.focusDistance.max;
        input.step = capabilities.focusDistance.step;
        input.value = settings.focusDistance;
        input.oninput = async event => {
          try {
            await track.applyConstraints({
              focusMode: "manual",
              focusDistance: input.value
            });
          } catch (err) {
            console.error("applyConstraints() failed: ", err);
          }
        };
        // Reveal the slider, which is marked hidden in the markup.
        input.hidden = false;
      } catch (err) {
        console.error(err);
      }
    }
    async function takePhoto() {
      try {
        const blob = await imageCapture.takePhoto();
        console.log("Photo taken: " + blob.type + ", " + blob.size + "B");
        const image = document.querySelector('img');
        image.src = URL.createObjectURL(blob);
      } catch (err) {
        console.error("takePhoto() failed: ", err);
      }
    }
  </script>
</body>
</html>
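The example above applies the slider value directly. A more defensive variant, sketched below, checks that the source reports a "manual" focus mode and keeps the requested distance inside the reported range. The helper name manualFocusConstraints and the sample capabilities are hypothetical.

```javascript
// Hypothetical helper: build constraints for a manual focus distance, or
// return null when the source reports no focusDistance range or no "manual"
// focus mode. |capabilities| has the shape of track.getCapabilities().
function manualFocusConstraints(capabilities, requestedDistance) {
  const range = capabilities.focusDistance;
  const modes = capabilities.focusMode || [];
  if (!range || !modes.includes('manual')) return null;
  const focusDistance =
      Math.min(range.max, Math.max(range.min, Number(requestedDistance)));
  return {focusMode: 'manual', focusDistance};
}

// Assumed sample capabilities.
const sampleCapabilities = {
  focusMode: ['manual', 'continuous'],
  focusDistance: {min: 0.1, max: 10, step: 0.1},
};
console.log(manualFocusConstraints(sampleCapabilities, '25'));
// In a page: const c = manualFocusConstraints(track.getCapabilities(), input.value);
//            if (c) await track.applyConstraints(c);
```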
Conformance
Document conventions
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]
Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:
This is an example of an informative example.
Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:
Note, this is an informative note.
Conformant Algorithms
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.
Index
Terms defined by this specification
- "always", in § 7.1
- always, in § 7.1
- "auto", in § 8.1
- auto, in § 8.1
- brightness
- Color temperature, in § 10
- colorTemperature
- ConstrainPoint2D, in § 9.5
- ConstrainPoint2DParameters, in § 9.5
- constructor(videoTrack), in § 3.2
- "continuous", in § 11.1
- continuous, in § 11.1
- Contrast, in § 10
- contrast
- "controllable", in § 7.1
- controllable, in § 7.1
- exact, in § 9.5.1
- Exposure, in § 10
- Exposure Compensation, in § 10
- exposureCompensation
- exposureMode
- Exposure Time, in § 10
- exposureTime
- Fill light mode, in § 10
- FillLightMode, in § 8
- fillLightMode
- "flash", in § 8.1
- flash, in § 8.1
- Focus distance, in § 10
- focusDistance
- Focus mode, in § 10
- focusMode
- getPhotoCapabilities(), in § 3.2
- getPhotoSettings(), in § 3.2
- grabFrame(), in § 3.2
- ideal, in § 9.5.1
- ImageCapture, in § 3
- ImageCapture(videoTrack), in § 3.2
- image height, in § 10
- imageHeight
- Image width, in § 10
- imageWidth
- ISO, in § 10
- iso
- "manual", in § 11.1
- manual, in § 11.1
- max, in § 6.1
- MediaSettingsRange, in § 6
- MeteringMode, in § 11
- min, in § 6.1
- "never", in § 7.1
- never, in § 7.1
- "none", in § 11.1
- none, in § 11.1
- "off", in § 8.1
- off, in § 8.1
- Pan, in § 10
- pan
- PhotoCapabilities, in § 4
- PhotoSettings, in § 5
- Point2D, in § 12
- Points of interest, in § 10
- pointsOfInterest
- Red Eye Reduction, in § 10
- RedEyeReduction, in § 7
- redEyeReduction
- saturation
- Sharpness, in § 10
- sharpness
- "single-shot", in § 11.1
- single-shot, in § 11.1
- step, in § 6.1
- takePhoto(), in § 3.2
- takePhoto(photoSettings), in § 3.2
- Tilt, in § 10
- tilt
- Torch, in § 10
- torch
- track, in § 3.1
- White balance mode, in § 10
- whiteBalanceMode
- x, in § 12.1
- y, in § 12.1
- Zoom, in § 10
- zoom
Terms defined by reference
- [] defines the following terms:
- advanced constraints
- allowed required constraints for device selection
- optional basic constraints
- required constraints
- [FileAPI] defines the following terms:
- [GETUSERMEDIA] defines the following terms:
- "camera"
- ConstrainBoolean
- ConstrainDOMString
- ConstrainDouble
- MediaDevices
- MediaStream
- MediaStreamTrack
- MediaTrackCapabilities
- MediaTrackConstraintSet
- MediaTrackConstraints
- MediaTrackSettings
- MediaTrackSupportedConstraints
- applyConstraints()
- fitness distance
- getCapabilities()
- getConstraints()
- getSettings()
- getSupportedConstraints()
- getUserMedia()
- kind
- live
- onmute
- onunmute
- panTiltZoom
- readyState
- [HTML] defines the following terms:
- ImageBitmap
- height
- top-level browsing context
- visibilityState
- width
- [PERMISSIONS] defines the following terms:
- PermissionDescriptor
- request permission to use
- requesting permission to use
- [WEBIDL] defines the following terms:
- DOMException
- DOMString
- Exposed
- InvalidStateError
- NotSupportedError
- OperationError
- Promise
- SecureContext
- SecurityError
- UnknownError
- a promise rejected with
- boolean
- double
- sequence
References
Normative References
[FileAPI]
Marijn Kruisselbrink.
File API. URL:
https://w3c.github.io/FileAPI/
[GETUSERMEDIA]
Cullen Jennings; et al.
Media Capture and Streams. URL:
https://w3c.github.io/mediacapture-main/
[HTML]
Anne van Kesteren; et al.
HTML Standard. Living Standard. URL:
https://html.spec.whatwg.org/multipage/
[PERMISSIONS]
Marcos Caceres; Mike Taylor.
Permissions. URL:
https://w3c.github.io/permissions/
[RFC2119]
S. Bradner.
Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL:
https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL]
Edgar Chen; Timothy Gu.
Web IDL Standard. Living Standard. URL:
https://webidl.spec.whatwg.org/
Informative References
[ISO12232]
Photography - Digital still cameras - Determination of exposure index, ISO speed ratings, standard output sensitivity, and recommended exposure index. 15 April 2006. URL:
http://www.iso.org/iso/catalogue_detail.htm?csnumber=37777
[LIGHTING-VOCABULARY]
CIE International Lighting Vocabulary: IEC International Electrotechnical Vocabulary.. 15 December 1987.
[UVC]
USB Device Class Definition for Video Devices. 9 August 2012. URL:
http://www.usb.org/developers/docs/devclass_docs/
IDL Index
[Exposed=Window, SecureContext]
interface ImageCapture {
  constructor(MediaStreamTrack videoTrack);
  Promise<Blob> takePhoto(optional PhotoSettings photoSettings = {});
  Promise<PhotoCapabilities> getPhotoCapabilities();
  Promise<PhotoSettings> getPhotoSettings();
  Promise<ImageBitmap> grabFrame();
  readonly attribute MediaStreamTrack track;
};
dictionary PhotoCapabilities {
  RedEyeReduction redEyeReduction;
  MediaSettingsRange imageHeight;
  MediaSettingsRange imageWidth;
  sequence<FillLightMode> fillLightMode;
};
dictionary PhotoSettings {
  FillLightMode fillLightMode;
  double imageHeight;
  double imageWidth;
  boolean redEyeReduction;
};
dictionary MediaSettingsRange {
  double max;
  double min;
  double step;
};
enum RedEyeReduction {
  "never",
  "always",
  "controllable"
};
enum FillLightMode {
  "auto",
  "off",
  "flash"
};
partial dictionary MediaTrackSupportedConstraints {
  boolean whiteBalanceMode = true;
  boolean exposureMode = true;
  boolean focusMode = true;
  boolean pointsOfInterest = true;
  boolean exposureCompensation = true;
  boolean exposureTime = true;
  boolean colorTemperature = true;
  boolean iso = true;
  boolean brightness = true;
  boolean contrast = true;
  boolean pan = true;
  boolean saturation = true;
  boolean sharpness = true;
  boolean focusDistance = true;
  boolean tilt = true;
  boolean zoom = true;
  boolean torch = true;
};
partial dictionary MediaTrackCapabilities {
  sequence<DOMString> whiteBalanceMode;
  sequence<DOMString> exposureMode;
  sequence<DOMString> focusMode;
  MediaSettingsRange exposureCompensation;
  MediaSettingsRange exposureTime;
  MediaSettingsRange colorTemperature;
  MediaSettingsRange iso;
  MediaSettingsRange brightness;
  MediaSettingsRange contrast;
  MediaSettingsRange saturation;
  MediaSettingsRange sharpness;
  MediaSettingsRange focusDistance;
  MediaSettingsRange pan;
  MediaSettingsRange tilt;
  MediaSettingsRange zoom;
  sequence<boolean> torch;
};
partial dictionary MediaTrackConstraintSet {
  ConstrainDOMString whiteBalanceMode;
  ConstrainDOMString exposureMode;
  ConstrainDOMString focusMode;
  ConstrainPoint2D pointsOfInterest;
  ConstrainDouble exposureCompensation;
  ConstrainDouble exposureTime;
  ConstrainDouble colorTemperature;
  ConstrainDouble iso;
  ConstrainDouble brightness;
  ConstrainDouble contrast;
  ConstrainDouble saturation;
  ConstrainDouble sharpness;
  ConstrainDouble focusDistance;
  (boolean or ConstrainDouble) pan;
  (boolean or ConstrainDouble) tilt;
  (boolean or ConstrainDouble) zoom;
  ConstrainBoolean torch;
};
partial dictionary MediaTrackSettings {
  DOMString whiteBalanceMode;
  DOMString exposureMode;
  DOMString focusMode;
  sequence<Point2D> pointsOfInterest;
  double exposureCompensation;
  double exposureTime;
  double colorTemperature;
  double iso;
  double brightness;
  double contrast;
  double saturation;
  double sharpness;
  double focusDistance;
  double pan;
  double tilt;
  double zoom;
  boolean torch;
};
dictionary ConstrainPoint2DParameters {
  sequence<Point2D> exact;
  sequence<Point2D> ideal;
};
typedef (sequence<Point2D> or ConstrainPoint2DParameters) ConstrainPoint2D;
enum MeteringMode {
  "none",
  "manual",
  "single-shot",
  "continuous"
};
dictionary Point2D {
  double x = 0.0;
  double y = 0.0;
};