Embodiments of the present invention provide systems, methods, and computer storage media for identifying candidate boundaries for video segments, video segment selection using those boundaries, and text-based video editing of video segments selected via transcript interactions. In an example implementation, boundaries of detected sentences and words are extracted from a transcript, the boundaries are retimed into an adjacent speech gap to a location where voice or audio activity is a minimum, and the resulting boundaries are stored as candidate boundaries for video segments. As such, a transcript interface presents the transcript, interprets input selecting transcript text as an instruction to select a video segment with corresponding boundaries selected from the candidate boundaries, and interprets commands that are traditionally thought of as text-based operations (e.g., cut, copy, paste) as an instruction to perform a corresponding video editing operation using the selected video segment.
G06F 40/166 - Text processing; Editing, e.g. insertion or deletion
G10L 15/26 - Speech-to-text systems
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of the groups, specially adapted for particular use, for comparison or discrimination, for processing of video signals
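The boundary retiming described in the abstract above can be illustrated with a minimal sketch. It assumes a per-frame audio activity score is already available; the function name `retime_boundary`, the millisecond units, and the fixed frame hop are hypothetical choices for illustration and are not taken from the filing.

```python
from typing import Sequence


def retime_boundary(boundary_ms: int, gap_start_ms: int, gap_end_ms: int,
                    activity: Sequence[float], hop_ms: int) -> int:
    """Move a word or sentence boundary to the quietest frame of the adjacent speech gap.

    activity[i] is an assumed activity score for the frame starting at i * hop_ms.
    """
    # Frames whose start time lies inside the gap (ceil for the first, floor for the last).
    first = max(0, -(-gap_start_ms // hop_ms))
    last = min(len(activity) - 1, gap_end_ms // hop_ms)
    if first > last:
        # The gap does not cover any scored frame, so keep the original boundary.
        return boundary_ms
    quietest = min(range(first, last + 1), key=lambda i: activity[i])
    return quietest * hop_ms


# One activity score per 100 ms frame (illustrative values only).
activity = [0.9, 0.8, 0.3, 0.05, 0.2, 0.7, 0.9]
candidate = retime_boundary(200, 200, 500, activity, hop_ms=100)
```

The returned value would then be stored as a candidate boundary; snapping a text selection to the nearest candidates is a separate step not shown here.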
The present disclosure relates to systems, methods, and non-transitory computer-readable media that implement depth-aware object move operations for digital image editing. For instance, in some embodiments, the disclosed systems determine a first object depth for a first object portrayed within a digital image and a second object depth for a second object portrayed within the digital image. Additionally, the disclosed systems move the first object to create an overlap area between the first object and the second object within the digital image. Based on the first object depth and the second object depth, the disclosed systems modify the digital image to occlude the first object or the second object within the overlap area.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
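The occlusion decision in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: object masks are represented as sets of pixel coordinates, each object has a single scalar depth, and a smaller depth means closer to the camera. The function name `resolve_overlap` is hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def resolve_overlap(first_mask: Set[Pixel], second_mask: Set[Pixel],
                    first_depth: float, second_depth: float) -> Tuple[str, Set[Pixel]]:
    """Return which object is hidden and the pixels of the overlap area."""
    overlap = first_mask & second_mask
    # Assumption: smaller depth = nearer, so the farther object is occluded in the overlap.
    hidden = "first" if first_depth > second_depth else "second"
    return hidden, overlap


moved_first = {(0, 0), (1, 0)}    # first object after the move
second = {(1, 0), (2, 0)}
hidden, area = resolve_overlap(moved_first, second, first_depth=5.0, second_depth=2.0)
```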
Systems and methods for dynamic user profile projection are provided. One or more aspects of the systems and methods include computing, by a prediction component, a predicted number of lookups for a future time period based on a lookup history of a user profile using a lookup prediction model; comparing, by the prediction component, the predicted number of lookups to a lookup threshold; and transmitting, by a projection component, the user profile to an edge server based on the comparison.
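The predict-then-compare flow above can be sketched as follows. The abstract does not specify the lookup prediction model, so a moving average over recent periods stands in for it here; that substitution, the function names, and the choice to project when the prediction meets or exceeds the threshold are all assumptions.

```python
from typing import Sequence


def predict_lookups(history: Sequence[int], window: int = 3) -> float:
    """Stand-in prediction model: mean lookups over the most recent periods."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def should_project(history: Sequence[int], threshold: float, window: int = 3) -> bool:
    """Decide whether to send the profile to an edge server (assumed direction: >=)."""
    return predict_lookups(history, window) >= threshold


busy_profile = [1, 2, 9, 12, 15]   # lookups per period, illustrative values
quiet_profile = [5, 1, 0, 2]
decisions = (should_project(busy_profile, 10), should_project(quiet_profile, 10))
```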
Techniques for nonlinear representations for vector objects are described that support construction of a nonlinear vector graph to represent a vector object. In an implementation, a user input is received including a plurality of points and at least one primitive. A content processing system then generates a vector object by constructing a nonlinear vector graph that specifies a nonlinear connection of the plurality of points with the at least one primitive. In some examples, the vector object is edited by applying an edit to the nonlinear vector graph. Once generated, the content processing system then outputs the vector object for display, e.g., in a user interface.
G06T 11/20 - Drawing from basic elements, e.g. lines or circles
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values, for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06T 11/60 - Editing figures and text; Combining figures or text
Embodiments are disclosed for expanding a seed scene using proposals from a generative model of scene graphs. The method may include clustering subgraphs according to respective one or more maximal connected subgraphs of a scene graph. The scene graph includes a plurality of nodes and edges. The method also includes generating a scene sequence for the scene graph based on the clustered subgraphs. A first machine learning model determines a predicted node in response to receiving the scene sequence. A second machine learning model determines a predicted edge in response to receiving the scene sequence and the predicted node. A scene graph is output according to the predicted node and the predicted edge.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
7.
UTILIZING A GENERATIVE MACHINE LEARNING MODEL TO CREATE MODIFIED DIGITAL IMAGES FROM AN INFILL SEMANTIC MAP
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06T 11/60 - Editing figures and text; Combining figures or text
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
G06V 20/70 - Image or video recognition or understanding; Scene-specific elements; Labelling scene content, e.g. deriving syntactic or semantic representations
Device cohort management techniques are described that are usable to control resource utilization by the devices. This is performable by managing usage together through grouping the devices through membership in a cohort. As a result, interaction with resources by the various devices is coordinated across the cohort, thereby improving device operation and user efficiency in resource usage by the devices.
Embodiments of the technology described herein provide a method for generating a unified contract view. The method identifies, within a contract change document, a change instruction for a main contract. The change instruction includes a change introduction and a change content. The method determines an editing intent associated with the change instruction. The method identifies, using the change instruction, a target element in the main contract to be changed. The method generates a unified contract view that depicts the target element modified according to the editing intent and the change content. The method causes the unified contract view to be output for display.
G06F 40/166 - Text processing; Editing, e.g. insertion or deletion
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
Systems and methods for document classification are described. Embodiments of the present disclosure generate classification data for a plurality of samples using a neural network trained to identify a plurality of known classes; select a set of samples for annotation from the plurality of samples using an open-set metric based on the classification data, wherein the annotation includes an unknown class; and train the neural network to identify the unknown class based on the annotation of the set of samples.
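The sample-selection step above can be sketched as follows. The abstract does not define the open-set metric, so this sketch swaps in a common proxy: the maximum class probability, where a low maximum suggests the sample may not belong to any known class. The function name and the proxy itself are assumptions.

```python
from typing import List, Sequence


def select_for_annotation(class_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples least confidently assigned to a known class."""
    # Lower maximum probability = more likely to belong to an unknown class (assumed proxy).
    ranked = sorted(range(len(class_probs)), key=lambda i: max(class_probs[i]))
    return ranked[:k]


probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]   # illustrative classifier outputs
to_annotate = select_for_annotation(probs, k=2)
```

The selected samples would then be annotated (possibly with the unknown class) and used for further training, which is outside this sketch.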
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify two-dimensional images via scene-based editing using three-dimensional representations of the two-dimensional images. For instance, in one or more embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and modify shadows in the two-dimensional images according to various shadow maps. Additionally, the disclosed systems utilize three-dimensional representations of two-dimensional images to modify humans in the two-dimensional images. The disclosed systems also utilize three-dimensional representations of two-dimensional images to provide scene scale estimation via scale fields of the two-dimensional images. In some embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and visualize 3D planar surfaces for modifying objects in two-dimensional images. The disclosed systems further use three-dimensional representations of two-dimensional images to customize focal points for the two-dimensional images.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
13.
MODIFYING DIGITAL IMAGES VIA MULTI-LAYERED SCENE COMPLETION FACILITATED BY ARTIFICIAL INTELLIGENCE
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via multi-layered scene completion techniques facilitated by artificial intelligence. For instance, in some embodiments, the disclosed systems receive a digital image portraying a first object and a second object against a background, where the first object occludes a portion of the second object. Additionally, the disclosed systems pre-process the digital image to generate a first content fill for the portion of the second object occluded by the first object and a second content fill for a portion of the background occluded by the second object. After pre-processing, the disclosed systems detect one or more user interactions to move or delete the first object from the digital image. The disclosed systems further modify the digital image by moving or deleting the first object and exposing the first content fill for the portion of the second object.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values, for image manipulation, e.g. dragging, rotation, expansion or change of colour
Systems and methods for dynamic user profile management are provided. One aspect of the systems and methods includes receiving, by a lookup component, a request for a user profile; computing, by a profile component, a time-to-live (TTL) refresh value for the user profile based on a lookup history of the user profile; updating, by the profile component, a TTL value of the user profile based on the request and the TTL refresh value; storing, by the profile component, the user profile and the updated TTL value in the edge database; and removing, by the edge database, the user profile from the edge database based on the updated TTL value.
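The lookup, TTL refresh, store, and removal steps above can be sketched with a small in-memory stand-in for the edge database. The refresh rule used here (a base TTL plus a bonus per recorded lookup, capped at a maximum) is an assumption, as are the class name `EdgeCache` and all parameter values; the abstract only states that the refresh value depends on lookup history.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class EdgeCache:
    base_ttl: float = 60.0       # seconds, hypothetical
    per_lookup: float = 30.0     # extra seconds per recorded lookup, hypothetical
    max_ttl: float = 3600.0
    profiles: Dict[str, Tuple[Any, float]] = field(default_factory=dict)  # id -> (profile, expires_at)
    history: Dict[str, int] = field(default_factory=dict)                 # id -> lookup count

    def ttl_refresh(self, profile_id: str) -> float:
        """TTL refresh value derived from the profile's lookup history."""
        count = self.history.get(profile_id, 0)
        return min(self.max_ttl, self.base_ttl + self.per_lookup * count)

    def lookup(self, profile_id: str, profile: Any, now: float) -> float:
        """Record a request, update the TTL, and store the profile."""
        self.history[profile_id] = self.history.get(profile_id, 0) + 1
        expires_at = now + self.ttl_refresh(profile_id)
        self.profiles[profile_id] = (profile, expires_at)
        return expires_at

    def evict_expired(self, now: float) -> List[str]:
        """Remove profiles whose updated TTL has elapsed."""
        expired = [pid for pid, (_, exp) in self.profiles.items() if exp <= now]
        for pid in expired:
            del self.profiles[pid]
        return expired
```

Under this rule, frequently requested profiles stay at the edge longer, while rarely requested ones expire quickly.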
In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with clients and including sensitive samples and non-sensitive samples. The machine learning model is trained by sampling a client of the clients and using a subset of the sensitive samples associated with the client and a subset of the non-sensitive samples associated with the client to update parameters of the machine learning model. The privacy system generates an indication of the next word after the last word in the sequence of words for display in a user interface.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06V 10/25 - Determination of a region of interest [ROI] or a volume of interest [VOI]
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
17.
NEURAL COMPOSITING BY EMBEDDING GENERATIVE TECHNOLOGIES INTO NON-DESTRUCTIVE DOCUMENT EDITING WORKFLOWS
One or more aspects of the method, apparatus, and non-transitory computer readable medium include obtaining an original image, a scene graph describing elements of the original image, and a description of a modification to the original image. The one or more aspects further include updating the scene graph based on the description of the modification. The one or more aspects further include generating a modified image using an image generation neural network based on the updated scene graph, wherein the modified image incorporates content based on the original image and the description of the modification.
G06T 11/60 - Editing figures and text; Combining figures or text
G06T 3/40 - Scaling of a whole image or part thereof
G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
18.
UTILIZING A GENERATIVE MACHINE LEARNING MODEL AND GRAPHICAL USER INTERFACE FOR CREATING MODIFIED DIGITAL IMAGES FROM AN INFILL SEMANTIC MAP
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06V 10/25 - Determination of a region of interest [ROI] or a volume of interest [VOI]
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
One aspect of systems and methods for data correction includes identifying a false label from among predicted labels corresponding to different parts of an input sample, wherein the predicted labels are generated by a neural network trained based on a training set comprising training samples and training labels corresponding to parts of the training samples; computing an influence of each of the training labels on the false label by approximating a change in a conditional loss for the neural network corresponding to each of the training labels; identifying a part of a training sample of the training samples and a corresponding source label from among the training labels based on the computed influence; and modifying the training set based on the identified part of the training sample and the corresponding source label to obtain a corrected training set.
In some examples, an environment evaluation system accesses interaction data recording interactions by users with an online platform hosted by a host system and computes, based on the interaction data, interface experience metrics. The interface experience metrics include an individual experience metric for each user and a transition experience metric for each transition in the interactions by the users with the online platform. The environment evaluation system identifies a user with the individual experience metric below a pre-determined threshold, identifies a transition performed by the user that has a transition experience metric below a second threshold, and analyzes the transition to determine users who have performed the transition. The environment evaluation system updates the host system with the individual experience metrics and the transition metrics, based on which the host system can perform modifications of interface elements of the online platform to improve the experience.
G06Q 10/0639 - Performance analysis of employees; Performance analysis of enterprise or organisation operations
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
Systems and methods for image generation are provided. An aspect of the systems and methods for image generation includes obtaining an original image depicting an element and a target prompt describing a modification to the element. The system may then compute a first output and a second output using a diffusion model. The first output is based on a description of the element and the second output is based on the target prompt. The system then computes a difference between the first output and the second output, and generates a modified image including the modification to the element of the original image based on the difference.
In implementations of systems for resolving conflicts in collaborative digital content editing, a computing device implements a resolution system to apply a content editing operation to a digital object. The resolution system writes an indication of the content editing operation at a first position of a local transaction stack of editing operations. The resolution system transmits editing data via a network describing the content editing operation for receipt by a server system. Relay data is received via the network from the server system describing an additional content editing operation for application to the digital object. The resolution system determines a conflict between the additional content editing operation and the content editing operation and writes an indication of the additional content editing operation at a second position of the local transaction stack of editing operations that is before the first position.
H04L 65/401 - Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time-sensitive sessions, e.g. white board sharing or spawning of a subconference
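The stack reordering described in the abstract above can be sketched as follows. The conflict test used here (same object and same property) is an assumption, as are the dictionary layout of an operation and the function names; the abstract only states that a conflicting remote operation is written at a position before the local one.

```python
from typing import Any, Dict, List

Op = Dict[str, Any]


def conflicts(a: Op, b: Op) -> bool:
    """Assumed conflict rule: both operations edit the same property of the same object."""
    return a["object_id"] == b["object_id"] and a["property"] == b["property"]


def integrate_remote(stack: List[Op], remote: Op) -> int:
    """Write a relayed operation into the local transaction stack and return its position.

    A conflicting remote operation is placed before the first local operation it
    conflicts with; otherwise it is appended at the end.
    """
    for i, local in enumerate(stack):
        if conflicts(local, remote):
            stack.insert(i, remote)
            return i
    stack.append(remote)
    return len(stack) - 1


local_op = {"object_id": "rect1", "property": "fill", "value": "red"}
remote_op = {"object_id": "rect1", "property": "fill", "value": "blue"}
stack = [local_op]                       # local operation at the first position
position = integrate_remote(stack, remote_op)
```

Replaying the stack in order after this insertion leaves the local edit applied last, which is one possible way to read the ordering in the abstract.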
23.
UTILIZING A WARPED DIGITAL IMAGE WITH A REPOSING MODEL TO SYNTHESIZE A MODIFIED DIGITAL IMAGE
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06T 7/70 - Determining position or orientation of objects or cameras
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
24.
MODIFYING DIGITAL IMAGES VIA PERSPECTIVE-AWARE OBJECT MOVE
The present disclosure relates to systems, methods, and non-transitory computer-readable media that implement perspective-aware object move operations for digital image editing. For instance, in some embodiments, the disclosed systems determine a vanishing point associated with a digital image portraying an object. Additionally, the disclosed systems detect one or more user interactions for moving the object within the digital image. Based on moving the object with respect to the vanishing point, the disclosed systems perform a perspective-based resizing of the object within the digital image.
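A perspective-based resize of this kind can be sketched under a simple one-point-perspective assumption: the apparent size of an object scales linearly with the distance between its anchor point and the vanishing point. That scaling rule and the function names are assumptions for illustration, not the method claimed in the filing.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def perspective_scale(old_pos: Point, new_pos: Point, vanishing_point: Point) -> float:
    """Scale factor for an object moved from old_pos to new_pos."""
    d_old = math.dist(old_pos, vanishing_point)
    d_new = math.dist(new_pos, vanishing_point)
    if d_old == 0:
        raise ValueError("object anchor coincides with the vanishing point")
    return d_new / d_old


def resize_after_move(size: Tuple[float, float], old_pos: Point, new_pos: Point,
                      vanishing_point: Point) -> Tuple[float, float]:
    """Return the (width, height) of the object after a perspective-aware move."""
    s = perspective_scale(old_pos, new_pos, vanishing_point)
    return (size[0] * s, size[1] * s)


# Moving an object halfway toward the vanishing point halves its size under this rule.
new_size = resize_after_move((40.0, 80.0), (0.0, 100.0), (0.0, 50.0), (0.0, 0.0))
```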
An image generation system implements a multi-branch GAN to generate images that each express visually similar content in a different modality. A generator portion of the multi-branch GAN includes multiple branches that are each tasked with generating one of the different modalities. A discriminator portion of the multi-branch GAN includes multiple fidelity discriminators, one for each of the generator branches, and a consistency discriminator, which constrains the outputs generated by the different generator branches to appear visually similar to one another. During training, outputs from each of the fidelity discriminators and the consistency discriminator are used to compute a non-saturating GAN loss. The non-saturating GAN loss is used to refine parameters of the multi-branch GAN during training until model convergence. The trained multi-branch GAN generates multiple images from a single input, where each of the multiple images depicts visually similar content expressed in a different modality.
G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
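For reference, one way to write a non-saturating GAN loss for the multi-branch setup in the abstract above is given below. The symbols are assumptions introduced here: $G_k$ and $D_k$ are the $k$-th generator branch and its fidelity discriminator, $D_c$ is the consistency discriminator, $x_k$ is a real sample of modality $k$, and $z$ is the shared input. Summing the terms without weights is also an assumption; the abstract does not state how the discriminator outputs are combined.

```latex
\begin{align}
\mathcal{L}_{G} &= -\sum_{k=1}^{K} \mathbb{E}_{z}\!\left[\log D_k\big(G_k(z)\big)\right]
                   - \mathbb{E}_{z}\!\left[\log D_c\big(G_1(z), \dots, G_K(z)\big)\right] \\
\mathcal{L}_{D_k} &= -\mathbb{E}_{x_k}\!\left[\log D_k(x_k)\right]
                     - \mathbb{E}_{z}\!\left[\log\big(1 - D_k(G_k(z))\big)\right]
\end{align}
```

The generator term maximizes $\log D(G(z))$ rather than minimizing $\log(1 - D(G(z)))$, which is what makes the loss non-saturating.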
Embodiments of the present invention provide systems, methods, and computer storage media for a question search for meaningful questions that appear in a video. In an example embodiment, an audio track from a video is transcribed, and the transcript is parsed to identify sentences that end with a question mark. Depending on the embodiment, one or more types of questions are filtered out, such as short questions less than a designated length or duration, logistical questions, and/or rhetorical questions. As such, in response to a command to perform a question search, the questions are identified, and search result tiles representing video segments of the questions are presented. Selecting (e.g., clicking or tapping on) a search result tile navigates a transcript interface to a corresponding portion of the transcript.
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G06F 16/735 - Filtering based on additional data, e.g. user or group profiles
G06F 16/738 - Presentation of query results
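The question search described in the abstract above can be sketched as a filter over transcript sentences. The sentence splitter, the minimum word count, and the list of logistical phrases are hypothetical placeholders; rhetorical-question filtering is omitted because the abstract does not say how it is detected.

```python
import re
from typing import List

# Hypothetical phrases treated as logistical questions.
LOGISTICAL_PHRASES = ("can you hear me", "can everyone see", "is this on")


def find_questions(transcript: str, min_words: int = 4) -> List[str]:
    """Return sentences ending in '?' that pass the length and logistical filters."""
    sentences = re.findall(r"[^.!?]+[.!?]", transcript)
    questions = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence.endswith("?"):
            continue                      # not a question
        if len(sentence.split()) < min_words:
            continue                      # short question
        if any(p in sentence.lower() for p in LOGISTICAL_PHRASES):
            continue                      # logistical question
        questions.append(sentence)
    return questions


text = ("Welcome back. Can you hear me okay? Why? "
        "How does retiming choose a boundary inside a speech gap? Thanks.")
results = find_questions(text)
```

Each surviving question would map to a video segment and be shown as a search result tile; that mapping depends on word timings not modeled here.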
27.
VISUAL AND TEXT SEARCH INTERFACE FOR TEXT-BASED VIDEO EDITING
Embodiments of the present invention provide systems, methods, and computer storage media for a visual and text search interface used to navigate a video transcript. In an example embodiment, a freeform text query triggers a visual search for frames of a loaded video that match the freeform text query (e.g., frame embeddings that match a corresponding embedding of the freeform query), and triggers a text search for matching words from a corresponding transcript or from tags of detected features from the loaded video. Visual search results are displayed (e.g., in a row of tiles that can be scrolled to the left and right), and textual search results are displayed (e.g., in a row of tiles that can be scrolled up and down). Selecting (e.g., clicking or tapping on) a search result tile navigates a transcript interface to a corresponding portion of the transcript.
Offset object alignment operations are described that support an ability to control alignment operations to aid positioning of an object in relation to at least one other object in a user interface based on an offset value. This is performable through identification of objects that overlap along an axis in a user interface and calculation of offset values using these object pairs. Filtering and priority-based techniques are also usable as part of calculating an offset value to be used as part of an alignment operation.
Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
In some examples, a computing system accesses a field of view (FOV) image that has a field of view less than 360 degrees and has low dynamic range (LDR) values. The computing system estimates lighting parameters from a scene depicted in the FOV image and generates a lighting image based on the lighting parameters. The computing system further generates lighting features from the lighting image and image features from the FOV image. These features are aggregated into aggregated features and a machine learning model is applied to the image features and the aggregated features to generate a panorama image having high dynamic range (HDR) values.
A method includes receiving a natural language description of an image to be generated using a machine learning model. The method further includes extracting, from the natural language description of the image to be generated, a control element and a sub-prompt. The method further includes identifying a relationship between the control element and the sub-prompt based on the natural language description of the image to be generated. The method further includes generating, by the machine learning model, an image based on the control element, the sub-prompt, and the relationship. The image includes visual elements corresponding to the control element and the sub-prompt.
Embodiments of the present invention provide systems, methods, and computer storage media for selection of the best image of a particular speaker's face in a video, and visualization in a diarized transcript. In an example embodiment, candidate images of a face of a detected speaker are extracted from frames of a video identified by a detected face track for the face, and a representative image of the detected speaker's face is selected from the candidate images based on image quality, facial emotion (e.g., using an emotion classifier that generates a happiness score), a size factor (e.g., favoring larger images), and/or penalizing images that appear towards the beginning or end of a face track. As such, each segment of the transcript is presented with the representative image of the speaker who spoke that segment and/or input is accepted changing the representative image associated with each speaker.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers or information carriers
G06V 20/40 - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING Scene-specific elements in video content
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
A method includes receiving an input including a target style and a glyph. The method further includes masking the glyph. The method further includes generating a stylized glyph by a glyph generative model using the masked glyph. The method further includes rendering the stylized glyph as a Unicode stylized glyph.
The present disclosure relates to systems, methods, and non-transitory computer readable media for panoptically guiding digital image inpainting utilizing a panoptic inpainting neural network. In some embodiments, the disclosed systems utilize a panoptic inpainting neural network to generate an inpainted digital image according to a panoptic segmentation map that defines pixel regions corresponding to different panoptic labels. In some cases, the disclosed systems train a neural network utilizing a semantic discriminator that facilitates generation of digital images that are realistic while also conforming to a semantic segmentation. The disclosed systems generate and provide a panoptic inpainting interface to facilitate user interaction for inpainting digital images. In certain embodiments, the disclosed systems iteratively update an inpainted digital image based on changes to a panoptic segmentation map.
A method, apparatus, and non-transitory computer readable medium for multimedia processing are described. Embodiments of the present disclosure obtain a project file comprising page data for one or more pages. Each of the one or more pages comprises a spatial arrangement of one or more media elements. A media editing interface presents a page of the one or more pages based on the spatial arrangement. The media editing interface presents a scene line adjacent to the page. The scene line comprises a temporal arrangement of one or more scenes within the page, and the one or more media elements are temporally arranged within the one or more scenes.
Constrained stroke editing techniques for digital content are described. In these examples, a stroke constraint system is employed as part of a digital content creation system to manage input, editing, and erasure (i.e., removal) of strokes via a user interface as part of editing digital content. To do so, locations and attributes of a displayed stroke are used to constrain location and/or attributes of an input stroke.
G06F 3/04883 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or by the nature of the input device, e.g. gestures as a function of the pressure exerted … using a touch screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
G06F 3/0354 - Pointing devices displaced or positioned by the user; Accessories therefor with detection of two-dimensional [2D] relative movements between the pointing device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
38.
TRANSCRIPT PARAGRAPH SEGMENTATION AND VISUALIZATION OF TRANSCRIPT PARAGRAPHS
Embodiments of the present invention provide systems, methods, and computer storage media for segmenting a transcript into paragraphs. In an example embodiment, a transcript is segmented to start a new paragraph whenever there is a change in speaker and/or a long pause in speech. If any remaining paragraphs are longer than a designated length or duration (e.g., 50 or 100 words), each of those paragraphs is segmented using dynamic programming to minimize a cost function that penalizes candidate paragraphs based on divergence from a target paragraph length and/or that rewards candidate paragraphs that group semantically similar sentences. As such, the transcript is visualized, segmented at the identified paragraphs.
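The dynamic-programming step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes the cost is only the squared divergence of each candidate paragraph's word count from a target length, and it omits the semantic-similarity reward entirely. The function name `segment_paragraphs` and its parameters are hypothetical.

```python
def segment_paragraphs(sentence_lengths, target_len=50):
    """Split sentences into paragraphs with word counts close to target_len.

    sentence_lengths: word count of each sentence, in transcript order.
    Returns a list of (start, end) sentence index ranges, end exclusive.
    Assumed cost: sum over paragraphs of (word_count - target_len) ** 2.
    """
    n = len(sentence_lengths)

    # prefix[i] holds the total number of words in sentences[0:i].
    prefix = [0]
    for length in sentence_lengths:
        prefix.append(prefix[-1] + length)

    best = [float("inf")] * (n + 1)  # best[i]: minimum cost for the first i sentences
    back = [0] * (n + 1)             # back[i]: start index of the last paragraph
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            words = prefix[end] - prefix[start]
            cost = best[start] + (words - target_len) ** 2
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Follow the back pointers from the end to recover paragraph boundaries.
    paragraphs = []
    end = n
    while end > 0:
        start = back[end]
        paragraphs.append((start, end))
        end = start
    paragraphs.reverse()
    return paragraphs
```

For example, four 25-word sentences with a 50-word target are grouped into two paragraphs of two sentences each. A similarity reward could be folded in by subtracting a per-paragraph term from `cost`, but how the abstract's method weighs the two terms is not stated.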
The present disclosure relates to systems, methods, and non-transitory computer readable media for panoptically guiding digital image inpainting utilizing a panoptic inpainting neural network. In some embodiments, the disclosed systems utilize a panoptic inpainting neural network to generate an inpainted digital image according to a panoptic segmentation map that defines pixel regions corresponding to different panoptic labels. In some cases, the disclosed systems train a neural network utilizing a semantic discriminator that facilitates generation of digital images that are realistic while also conforming to a semantic segmentation. The disclosed systems generate and provide a panoptic inpainting interface to facilitate user interaction for inpainting digital images. In certain embodiments, the disclosed systems iteratively update an inpainted digital image based on changes to a panoptic segmentation map.
A method includes receiving a description of content to be generated using a generative model. The received description of content is associated with a user profile. The method further includes determining a semantic term based on the description of content. The method further includes generating a user-specific template including the semantic term and a user preference associated with the user profile. The method further includes generating the content using the generative model based on the user-specific template. The method further includes outputting the content for display on a target user device.
Embodiments of the present invention provide systems, methods, and computer storage media for annotating transcript text with video metadata, and including thumbnail bars in the transcript to help users select a desired portion of a video through transcript interactions. In an example embodiment, a video editing interface includes a transcript interface that presents a transcript with transcript text that is annotated to indicate corresponding portions of the video where various features were detected (e.g., annotating via text stylization of transcript text and/or labeling the transcript text with a textual representation of a corresponding detected feature class). In some embodiments, the transcript interface displays a visual representation of detected non-speech audio or pauses (e.g., a sound bar) and/or video thumbnails corresponding to each line of transcript text (e.g., a thumbnail bar). Transcript text, soundbars, and/or thumbnail bars are selectable to identify and perform video editing operations on a corresponding video segment.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify two-dimensional images via scene-based editing using three-dimensional representations of the two-dimensional images. For instance, in one or more embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and modify shadows in the two-dimensional images according to various shadow maps. Additionally, the disclosed systems utilize three-dimensional representations of two-dimensional images to modify humans in the two-dimensional images. The disclosed systems also utilize three-dimensional representations of two-dimensional images to provide scene scale estimation via scale fields of the two-dimensional images. In some embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and visualize 3D planar surfaces for modifying objects in two-dimensional images. The disclosed systems further use three-dimensional representations of two-dimensional images to customize focal points for the two-dimensional images.
In implementations of systems for generating templates using structure-based matching, a computing device implements a template system to receive input data describing a set of digital design elements. The template system represents the input data as a sentence in a design structure language that describes structural relationships between design elements included in the set of digital design elements. An input template embedding is generated based on the sentence in the design structure language. The template system generates a digital template that includes the set of digital design elements for display in a user interface based on the input template embedding.
Embodiments described herein include aspects related to generating a layout-aware background image. Aspects of the method include receiving a training dataset comprising a document. The method further includes obtaining a mask image based on a layout of content in the document, the mask image having a content area corresponding to content of the document. The method further includes training a machine learning model using the mask image to provide a trained machine learning model that generates transparency values for pixels of a background image for the document.
G06T 7/194 - Segmentation; Edge detection involving foreground-background segmentation
G06T 7/90 - Determination of color characteristics
G06V 10/56 - Extraction of image or video features relating to color
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
46.
MUSIC-AWARE SPEAKER DIARIZATION FOR TRANSCRIPTS AND TEXT-BASED VIDEO EDITING
Embodiments of the present invention provide systems, methods, and computer storage media for music-aware speaker diarization. In an example embodiment, one or more audio classifiers detect speech and music independently of each other, which facilitates detecting regions in an audio track that contain music but do not contain speech. These music-only regions are compared to the transcript, and any transcription and speakers that overlap in time with the music-only regions are removed from the transcript. In some embodiments, rather than having the transcript display the text from this detected music, a visual representation of the audio waveform is included in the corresponding regions of the transcript.
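The filtering step described in this abstract can be sketched as interval arithmetic. The sketch below assumes the audio classifiers have already produced speech and music regions as `(start, end)` pairs in seconds, and that transcript words carry their own timestamps; the function names and the tuple layout are hypothetical, not taken from the source.

```python
def music_only_regions(music, speech):
    """Return the parts of each music interval not covered by any speech interval."""
    speech = sorted(speech)
    regions = []
    for m_start, m_end in music:
        cursor = m_start  # left edge of the part not yet accounted for
        for s_start, s_end in speech:
            if s_end <= cursor or s_start >= m_end:
                continue  # this speech interval does not touch the remaining part
            if s_start > cursor:
                regions.append((cursor, s_start))
            cursor = max(cursor, s_end)
            if cursor >= m_end:
                break
        if cursor < m_end:
            regions.append((cursor, m_end))
    return regions


def drop_words_in_music(words, music_only):
    """Keep only the (text, start, end) words that overlap no music-only region."""
    kept = []
    for text, start, end in words:
        overlaps = any(start < m_end and end > m_start
                       for m_start, m_end in music_only)
        if not overlaps:
            kept.append((text, start, end))
    return kept
```

With music detected over `(0, 10)` and speech over `(3, 5)`, the music-only regions are `(0, 3)` and `(5, 10)`, so a word transcribed at 3.5 to 4.5 seconds is kept while words at 1 to 2 and 6 to 7 seconds are dropped. Dropping on any overlap is a design choice for this sketch; a minimum-overlap threshold would be an equally plausible reading of the abstract.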
Business consultation services in the fields of marketing, advertising, online business optimization and customer experience management; providing consultation, advisory services and technical information in the fields of digital marketing and advertising, digital business strategy, and conducting business on computer e-commerce software platforms; business management and organization consultancy; business optimization services, namely, business management consulting with relation to strategy, marketing, sales and operation; business information services, namely, providing customer intelligence services in the field of advertising and marketing campaign analytics; advertising and marketing consulting services, namely, providing advertising and marketing services for managing and optimizing the performance of advertising and marketing campaigns; business data analysis; market research and business analyses; statistical analysis and reporting services for business purposes; providing business marketing and advertising services for enterprises with large volumes of data that must be leveraged for intelligent recommendations and decision advantage through data science, namely, artificial intelligence (AI), machine learning and deep learning technologies and algorithms; computerized database management; business data analysis, namely, collecting, reporting, analyzing and integrating business data related to the use of websites and applications of others, the use of other data from various sources, and the effectiveness of advertising and marketing campaigns; advertising and marketing consulting services, namely, providing advertising and marketing services for managing, distributing and serving advertising, improving ad targeting, facilitating buying or selling advertising, providing real time reporting, and forecasting, managing, monitoring, executing and optimizing the performance of advertising and marketing campaigns; providing online searchable databases in the field of advertising and marketing campaign analytics; advertising and marketing consultancy; providing business intelligence services; providing business intelligence services in the field of advertising and marketing campaign analytics; advertising, marketing and commercial information services via the Internet, computer networks, other telecommunications networks, and mobile communications devices; business management consulting services relating to task management, schedule management, business management, document management, business planning, human resource allocation, workforce collaboration, financial resource allocation, and workflow tracking in the field of business process management
Telecommunication services, namely, providing online facilities for real time conversation and interaction between and among users of computers, mobile and handheld computers, and wired and wireless communication devices, concerning topics of general interest; enabling individuals to send and receive messages in the field of general interest, namely, electronic transmission of email, message sending via a website, and instant messaging services; providing on line chat rooms and electronic bulletin boards for transmission of messages among users in the field of general interest; chat room services for social networking, namely, organizing real time chat conversation into customized strings; web based real time, multimedia communications services, namely, electronic transmission of multimedia data among users of computers and wireless communication networks
45 - Legal services; security services; personal services for individuals
Goods and services
Stock photography services, namely, leasing reproduction rights of photographs, transparencies and digital content to others; Online social networking services
A missing glyph replacement system is described. In an example, a Unicode identifier of a missing glyph is obtained and glyph metadata describing a glyph cluster that includes the Unicode identifier is obtained from a cache maintained in the storage device, e.g., as part of preprocessing. From this, the system obtains glyphs from the font using Unicode identifiers included in the glyph cluster. The system uses a representative glyph from these glyphs to verify the glyph cluster, and if verified obtains glyphs based on the cluster. For these obtained glyphs, an amount of similarity is determined for the missing glyph with respect to the plurality of obtained glyphs, e.g., to control output of representations of the obtained glyphs in the user interface. The representations are user selectable via the user interface to replace the missing glyph.
Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing machine-learning to automatically select a machine-learning model for graph learning tasks. The disclosed system extracts, utilizing a graph feature machine-learning model, meta-graph features representing structural characteristics of a graph representation comprising a plurality of nodes and a plurality of edges indicating relationships between the plurality of nodes. The disclosed system also generates, utilizing the graph feature machine-learning model, a plurality of estimated graph learning performance metrics for a plurality of machine-learning models according to the meta-graph features. The disclosed system selects a machine-learning model to process data associated with the graph representation according to the plurality of estimated graph learning performance metrics.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that provide to a user a subset of digital design templates as recommendations based on a creative segment classification and template classifications. For instance, in one or more embodiments, the disclosed systems generate the creative segment classification for the user and determine geo-seasonal intent data. Furthermore, the disclosed systems generate template classifications using a machine learning model based on geo-seasonality and creative intent. In doing so, the disclosed systems identify a subset of digital design templates based on the template classifications, geo-seasonal intent data, and the creative segment classification of the user.
Self-consumable portions generation techniques from a digital document are described. The self-consumable portions are generated based on a determination of an amount of resources available at a receiver device that is to receive the digital document. Examples of the resources include an amount of memory resources, processing resources, and/or network resources associated with the receiver device. The self-consumable portions, once generated, are separately renderable at the receiver device.
The present disclosure relates to systems, methods, and non-transitory computer readable media that determine internet traffic data loss from internet traffic data including bulk ingested data utilizing an internet traffic forecasting model. In particular, the disclosed systems detect that observed internet traffic data includes bulk ingested internet traffic data. In addition, the disclosed systems determine a predicted traffic volume for an outage period from the bulk ingested internet traffic data utilizing an internet traffic forecasting model. The disclosed systems further generate a decomposed predicted traffic volume for the outage period. The disclosed systems also determine an internet traffic data loss for the outage period from the decomposed predicted traffic volume while calibrating for pattern changes and late data from previous periods.
Digital image text editing techniques as implemented by an image processing system are described that support increased user interaction in the creation and editing of digital images through understanding a content creator's intent as expressed using text. In one example, a text user input is received by a text input module. The text user input describes a visual object and a visual attribute, in which the visual object specifies a visual context of the visual attribute. A feature representation is generated by a text-to-feature system using a machine-learning module based on the text user input. The feature representation is passed to an image editing system to edit a digital object in a digital image, e.g., by applying a texture to an outline of the digital object within the digital image.
Systems and methods for data augmentation are provided. One aspect of the systems and methods includes receiving an image that is misclassified by a classification network; computing an augmentation image based on the image using an augmentation network; and generating an augmented image by combining the image and the augmentation image, wherein the augmented image is correctly classified by the classification network.
Systems and methods for text simplification are described. Embodiments of the present disclosure identify a simplified text that includes original information from a complex text and additional information that is not in the complex text. Embodiments then compute an entailment score for each sentence of the simplified text using a neural network, wherein the entailment score indicates whether the sentence of the simplified text includes information from a sentence of the complex text corresponding to the sentence of the simplified text. Then, embodiments generate a modified text based on the entailment score, the simplified text, and the complex text, wherein the modified text includes the original information and excludes the additional information. Embodiments may then present the modified text to a user via a user interface.
A media edit point selection process can include a media editing software application programmatically converting speech to text and storing a timestamp-to-text map. The map correlates text corresponding to speech extracted from an audio track for the media clip to timestamps for the media clip. The timestamps correspond to words and some gaps in the speech from the audio track. The probability of identified gaps corresponding to a grammatical pause by the speaker is determined using the timestamp-to-text map and a semantic model. Potential edit points corresponding to grammatical pauses in the speech are stored for display or for additional use by the media editing software application. Text can optionally be displayed to a user during media editing.
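A minimal sketch of deriving potential edit points from a timestamp-to-text map follows. The map is assumed to be a list of `(text, start, end)` word entries in seconds. The abstract's semantic model is not specified, so a crude punctuation heuristic stands in for it here; that stand-in, the score values, the `min_gap` threshold, and the function name are all assumptions made for illustration.

```python
def candidate_edit_points(words, min_gap=0.3):
    """Return (time, score) pairs for speech gaps that may be grammatical pauses.

    words: list of (text, start, end) tuples sorted by start time.
    min_gap: shortest silence, in seconds, considered as a candidate gap.
    """
    points = []
    for (text, _, end), (_, next_start, _) in zip(words, words[1:]):
        gap = next_start - end
        if gap < min_gap:
            continue
        # Stand-in for a semantic model: punctuation on the preceding word
        # is treated as evidence that the gap is a grammatical pause.
        if text.endswith((".", "?", "!")):
            score = 0.9
        elif text.endswith((",", ";", ":")):
            score = 0.5
        else:
            score = 0.1
        # Place the candidate edit point in the middle of the gap.
        points.append((end + gap / 2, score))
    return points
```

In use, the returned points could be sorted by score and the highest-scoring ones stored for display alongside the transcript text.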
Techniques for trigger based digital content caching are described to automatically cache digital content on a client device based on a likelihood that the client device will access the digital content. A cache system, for instance, monitors an interaction of a first client device with digital content that is maintained as part of a digital service by a service provider system. Based on the monitored interaction, the cache system detects a trigger event that indicates a likelihood of interaction by a second client device to edit the digital content. Responsive to detection of the trigger event, the cache system is operable to initiate caching of the digital content on the second client device automatically and without user intervention.
H04N 21/231 - Content storage operation, e.g. caching movies for short-term storage, replicating data over plural servers, or prioritizing data for deletion
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content
61.
SYSTEMS AND METHODS FOR COLLABORATIVE AGREEMENT SIGNING
Systems and methods for collaborative document signing are described. According to one aspect, a method for collaborative document signing includes initiating a live communication session including a user, identifying a source document for an agreement using an agreement signing interface of the live communication session, assigning the user as a signer of the agreement using the agreement signing interface, and generating the agreement. In some cases, the agreement includes the source document. The method further includes obtaining a signature for the agreement from the user and generating a signed agreement including the signature.
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
Systems and methods for query processing are described. Embodiments of the present disclosure identify a target phrase in an original query, wherein the target phrase comprises a phrase to be replaced in the original query; replace the target phrase with a mask token to obtain a modified query; generate an alternative query based on the modified query using a masked language model (MLM), wherein the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase; and retrieve a search result based on the alternative query.
Methods and systems are provided for facilitating generation and utilization of causal-based models. In embodiments described herein, a set of events comprising touchpoints resulting in a conversion are obtained. A direct attribution indicating credit for an event contribution to the conversion is determined. An adjusted attribution for the event based on the direct attribution for the event augmented with an indirect attribution for the event is determined. The indirect attribution can be identified based on the event causing a subsequent event of the set of events to result in the conversion. Thereafter, the adjusted attribution for the event is provided to indicate an extent of credit assigned to the event for causing the corresponding conversion.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that distribute item-based digital content across digital platforms using trend setting participants of those digital platforms. For instance, in one or more embodiments, the disclosed systems generate affinity metrics for digital items from a catalog of digital items with respect to a plurality of trend setting participants of a plurality of digital platforms using attributes of digital posts by the plurality of trend setting participants on the plurality of digital platforms and corresponding attributes of the digital items. The disclosed systems further determine predicted demand metrics for the digital items on the plurality of digital platforms using the affinity metrics. Using the predicted demand metrics, the disclosed systems distribute digital content related to the digital items for display on a plurality of client devices via the plurality of digital platforms.
A modeling system displays a three-dimensional (3D) space including a 3D object including a plurality of points and a cage model of the 3D object including a first configuration of vertices and quad faces. Each of the plurality of points is located at a respective initial location. The modeling system generates cage coordinates for the cage model including a vertex coordinate for each vertex of the cage model and four quad coordinates for each quad face of the cage model corresponding to each corner vertex of the quad. The modeling system deforms, responsive to receiving a request, the cage model to change the first configuration of vertices to a second configuration. The modeling system generates, based on the cage coordinates, the first configuration of vertices, and the second configuration of vertices, an updated 3D object by determining a subsequent location for each of the plurality of points.
In various examples, a table recognition machine learning model receives an image of a table and generates, using a first encoder of the table recognition machine learning model, an image feature vector including features extracted from the image of the table; generates, using a first decoder of the table recognition machine learning model and the image feature vector, a set of coordinates within the image representing rows and columns associated with the table; generates, using a second decoder of the table recognition machine learning model and the image feature vector, a set of bounding boxes and semantic features associated with cells of the table; and then determines, using a third decoder of the table recognition machine learning model, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
G06V 30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
G06V 30/262 - Post-processing techniques, e.g. correcting the recognition results using context analysis, e.g. lexical, syntactic or semantic context
G06V 30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Systems and methods for joint document signing are described. According to one aspect, a method for joint document signing includes establishing a live communication session including a plurality of users. In some cases, the plurality of users correspond to a set of signers of a document. The method further includes initiating a signing process during the live communication session, receiving a signature for the document from each of the plurality of users during the live communication session based on the signing process, and generating a signed document including the signature from each of the plurality of users.
G06F 40/166 - Text processing; Editing, e.g. inserting or deleting
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or transforming an object, an image or a displayed text element, setting a parameter value or selecting a range of values
H04L 65/1069 - Session management; Session establishment or termination
68.
FACILITATING GENERATION AND PRESENTATION OF ADVANCED INSIGHTS
Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating generation and presentation of insights. In one implementation, a set of data is used to generate a data visualization. A candidate insight associated with the data visualization is generated, the candidate insight being generated in text form based on a text template and comprising a descriptive insight, a predictive insight, an investigative insight, or a prescriptive insight. A set of natural language insights is generated via a machine learning model. The natural language insights represent the candidate insight in a text style that is different from the text template. A natural language insight having the text style corresponding with a desired text style is selected for presenting the candidate insight and, thereafter, the selected natural language insight and data visualization are provided for display via a graphical user interface.
Systems and methods for query processing are described. Embodiments of the present disclosure identify an original query; generate a plurality of expanded queries by generating a plurality of additional phrases based on the original query using a causal language model (CLM) and augmenting the original query with each of the plurality of additional phrases, respectively; and provide a plurality of images in response to the original query, wherein the plurality of images are associated with the plurality of expanded queries, respectively.
In some embodiments, techniques for producing user-generated content are provided. For example, a process may involve sending a product identifier; receiving a first candidate image that is associated with the product identifier; determining that a similarity between a user structure and a target structure satisfies a threshold condition, wherein the user structure characterizes a figure of a user in a first input image and the target structure is based on a pose guide associated with the first candidate image; and capturing, based on the determining, the first input image.
A search system employs arrival times with associated confidence scores as search facets for identifying items. The search system identifies a plurality of items based on search input. An arrival time and associated confidence score are determined for each item from the plurality of items. Search results are provided for the plurality of items in response to the search input. The search results are provided based at least in part on the arrival times and associated confidence scores for the plurality of items.
Methods, systems, and non-transitory computer readable storage media are disclosed for automatically detecting and reconstructing patterns in digital images. The disclosed system determines structurally similar pixels of a digital image by comparing neighborhood descriptors that include the structural context for neighborhoods of the pixels. In response to identifying structurally similar pixels of a digital image, the disclosed system utilizes non-maximum suppression to reduce the set of structurally similar pixels to collinear pixels within the digital image. Additionally, the disclosed system determines whether a group of structurally similar pixels defines the boundaries of a pattern cell that forms a rectangular grid pattern within the digital image. The disclosed system also modifies a boundary of a detected pattern cell to include a human-perceived pattern object via a sliding window corresponding to the pattern cell.
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06T 9/20 - Contour coding, e.g. using detection of edges
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
Systems and methods for image exploration are provided. One aspect of the systems and methods includes identifying a set of images; reducing the set of images to obtain a representative set of images that is distributed throughout the set of images by removing a neighbor image based on a proximity of the neighbor image to an image of the representative set of images; arranging the representative set of images in a grid structure using a self-sorting map (SSM) algorithm; and displaying a portion of the representative set of images based on the grid structure.
G06F 16/54 - Browsing; Visualisation therefor
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/772 - Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
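The reduction step in the image-exploration abstract above (removing a neighbor image based on its proximity to an image already in the representative set) can be sketched roughly as follows. This is only an illustrative reading, not the disclosed method: the embedding vectors, the Euclidean distance, the `radius` threshold, and the function name `reduce_to_representatives` are all assumptions, and the self-sorting map (SSM) arrangement step is not shown.

```python
import math


def reduce_to_representatives(embeddings, radius):
    """Greedily keep images that are not too close to an already kept image.

    embeddings: list of equal-length numeric tuples (hypothetical image features).
    radius: hypothetical proximity threshold; neighbors within it are removed.
    Returns the indices of the kept (representative) images.
    """
    kept = []
    for idx, vec in enumerate(embeddings):
        # Keep the image only if every representative so far is farther than radius.
        if all(math.dist(vec, embeddings[k]) > radius for k in kept):
            kept.append(idx)
    return kept


if __name__ == "__main__":
    features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 5.0), (10.0, 0.0)]
    print(reduce_to_representatives(features, radius=1.0))
```

A greedy pass like this depends on input order; a production system would more likely pick representatives by density or clustering, which the abstract does not specify.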
74.
Automated Digital Tool Identification from a Rasterized Image
A visual lens system is described that identifies, automatically and without user intervention, digital tool parameters for achieving a visual appearance of an image region in raster image data. To do so, the visual lens system processes raster image data using a tool region detection network trained to output a mask indicating whether the digital tool is useable to achieve a visual appearance of each pixel in the raster image data. The mask is then processed by a tool parameter estimation network trained to generate a probability distribution indicating an estimation of discrete parameter configurations applicable to the digital tool to achieve the visual appearance. The visual lens system generates an image tool description for the parameter configuration and incorporates the image tool description into an interactive image for the raster image data. The image tool description enables transfer of the digital tool parameter configuration to different image data.
G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. colour or texture
G06F 3/04817 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance, using icons
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
The present disclosure relates to systems that perform text-based palette searches that convert a text query into a color distribution and utilize the color distribution to identify relevant color palettes. More specifically, the disclosed systems receive a textual color palette search query and convert, utilizing a text-to-color model, the textual color palette search query into a color distribution. The disclosed systems determine, utilizing a palette scoring model, distance metrics between the color distribution and a plurality of color palettes in a color database by: identifying swatch matches between colors of the color distribution and unmatched swatches of the plurality of color palettes and determining distances between the colors of the color distribution and matched swatches of the plurality of color palettes. The disclosed systems return one or more color palettes of the plurality of color palettes in response to the textual color palette search query based on the distance metrics.
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/532 - Query formulation, e.g. graphical querying
G06F 16/538 - Presentation of query results
G06F 40/40 - Processing or translation of natural language
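To make the palette-scoring idea in the abstract above more concrete, here is a minimal sketch of matching query colors to unmatched swatches and summing the resulting distances. Everything specific is an assumption made for illustration: RGB tuples, Euclidean color distance, greedy matching in order of weight, and the names `palette_distance` and `search_palettes`. The abstract does not state which color space, matching strategy, or distance the disclosed systems use, and the text-to-color model is not shown.

```python
import math


def palette_distance(distribution, palette):
    """Weighted distance between a color distribution and one palette.

    distribution: list of (rgb_tuple, weight) pairs (hypothetical model output).
    palette: list of rgb_tuple swatches.
    """
    unmatched = list(palette)
    total = 0.0
    # Match the heaviest query colors first so they get the best swatches.
    for color, weight in sorted(distribution, key=lambda cw: -cw[1]):
        if not unmatched:
            break
        best = min(unmatched, key=lambda swatch: math.dist(color, swatch))
        unmatched.remove(best)  # each swatch can be matched only once
        total += weight * math.dist(color, best)
    return total


def search_palettes(distribution, palettes, k=1):
    """Return the names of the k palettes closest to the distribution."""
    ranked = sorted(palettes, key=lambda name: palette_distance(distribution, palettes[name]))
    return ranked[:k]


if __name__ == "__main__":
    query = [((255, 0, 0), 0.7), ((0, 0, 255), 0.3)]
    database = {
        "fire": [(250, 10, 0), (0, 0, 250)],
        "forest": [(0, 128, 0), (34, 139, 34)],
    }
    print(search_palettes(query, database))
```

Greedy matching is the simplest choice here; an optimal assignment (for example, the Hungarian algorithm) would be a natural alternative if swatch counts are small.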
Methods, systems, and non-transitory computer readable storage media are disclosed for customizing digital content tutorials for a user within a digital editing application based on user experience with editing tools. The disclosed system determines proficiency levels for a plurality of different portions of a digital content tutorial corresponding to a digital content editing task. The disclosed system generates tool proficiency scores associated with the user in a digital editing application in connection with the portions of the digital content tutorial. Specifically, the disclosed system generates the tool proficiency scores based on usage of tools corresponding to the portions. Additionally, the disclosed system generates a mapping for the user based on the tool proficiency scores associated with the user and the proficiency levels of the portions of the digital content tutorial. The disclosed system provides a customized digital content tutorial for display at a client device according to the mapping.
Systems and methods for event processing are provided. One aspect of the systems and methods includes receiving an event corresponding to an interaction of a user with a digital content channel; identifying a rule state for a segmentation rule that assigns users to a segment; assigning the user to the segment by evaluating the segmentation rule based on the rule state and the event from the digital content channel; updating the rule state; and providing customized content to the user based on the assignment of the user to the segment.
H04L 47/762 - Admission control; Resource allocation using dynamic resource allocation, e.g. in-call renegotiation requested by the user or requested by the network in response to changing network conditions, triggered by the network
H04L 47/70 - Admission control; Resource allocation
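The event-processing abstract above describes evaluating a segmentation rule from a stored rule state plus one incoming event, then updating that state. A minimal sketch of that loop follows, under stated assumptions: the rule is a simple "at least N events of one type" rule, the rule state is a per-user counter, and the names `CountRule`, `process`, and `content_for` are hypothetical rather than taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class CountRule:
    """Hypothetical rule: a user joins the segment after `threshold` matching events."""
    event_type: str
    threshold: int
    segment: str
    state: dict = field(default_factory=dict)  # rule state: user_id -> matches so far

    def process(self, event):
        """Evaluate the rule from the stored state and one event, then update the state.

        Returns True if the user is assigned to the segment after this event.
        """
        user = event["user_id"]
        count = self.state.get(user, 0)
        if event["type"] == self.event_type:
            count += 1
        self.state[user] = count  # persist the updated rule state
        return count >= self.threshold


def content_for(in_segment):
    """Pick customized content from the segment assignment (placeholder values)."""
    return "loyalty_offer" if in_segment else "default_banner"


if __name__ == "__main__":
    rule = CountRule(event_type="purchase", threshold=2, segment="repeat_buyers")
    for evt in [
        {"user_id": "u1", "type": "purchase"},
        {"user_id": "u1", "type": "page_view"},
        {"user_id": "u1", "type": "purchase"},
    ]:
        print(evt["type"], content_for(rule.process(evt)))
```

Keeping only the running count means each event is handled without replaying the user's history, which is the apparent point of carrying a rule state.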
78.
MULTIDIMENTIONAL IMAGE EDITING FROM AN INPUT IMAGE
Various disclosed embodiments are directed to changing parameters of an input image or multidimensional representation of the input image based on a user request to change such parameters. An input image is first received. A multidimensional image that represents the input image in multiple dimensions is generated via a model. A request to change at least a first parameter to a second parameter is received via user input at a user device. Such request is a request to edit or generate the multidimensional image in some way. For instance, the request may be to change the light source position or camera position from a first set of coordinates to a second set of coordinates.
G06T 19/20 - Manipulating 3D models or images for computer graphics; Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. bagging or boosting
G06V 10/776 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Performance evaluation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
79.
ATTENTION AWARE MULTI-MODAL MODEL FOR CONTENT UNDERSTANDING
A content analysis system provides content understanding for a content item using an attention aware multi-modal model. Given a content item, feature extractors extract features from content components of the content item in which the content components comprise multiple modalities. A cross-modal attention encoder of the attention aware multi-modal model generates an embedding of the content item using features extracted from the content components. A decoder of the attention aware multi-modal model generates an action-reason statement using the embedding of the content item from the cross-modal attention encoder.
G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Embodiments are disclosed for reconstructing linear gradients from an input image that can be applied to another image. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a raster image, the raster image including a representation of a linear color gradient. The disclosed systems and methods further comprise determining a vector representing a direction of the linear color gradient. The disclosed systems and methods further comprise analyzing pixel points along the direction of the linear color gradient to compute color stops of the linear color gradient. The disclosed systems and methods further comprise generating an output color gradient vector with the computed color stops of the linear color gradient, the output color gradient vector to be applied to a vector graphic.
The present disclosure describes systems, non-transitory computer-readable media, and methods for generating object-specific-preset edits to be later applied to other digital images depicting a same object type or applying a previously generated object-specific-preset edit to an object of the same object type within a target digital image. For example, in some cases, the disclosed systems generate an object-specific-preset edit by determining a region of a particular localized edit in an edited digital image, identifying an edited object corresponding to the localized edit, and storing in a digital-image-editing document an object tag for the edited object and instructions for the localized edit. In certain implementations, the disclosed systems further apply such an object-specific-preset edit to a target object in a target digital image by determining transformed-positioning parameters for a localized edit from the object-specific-preset edit to the target object.
A search system generates custom attributes for use as search facets. User input associated with an image of a target item available on a listing platform is received. The image is analyzed to determine an attribute of the target item as a custom attribute. A value for the custom attribute is determined for each of a number of other items available on the listing platform that are of the same item type as the target item. Search results are provided based at least in part on the values of the custom attribute for the other items.
Systems and methods for image processing are described. Embodiments of the present disclosure receive a raster image depicting a radial color gradient; compute an origin point of the radial color gradient based on an orthogonality measure between a color gradient vector at a point in the raster image and a relative position vector between the point and the origin point; construct a vector graphics representation of the radial color gradient based on the origin point; and generate a vector graphics image depicting the radial color gradient based on the vector graphics representation.
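One way to read the orthogonality measure in the radial-gradient abstract above: in an ideal radial gradient, the color gradient vector at a point is parallel to the line from the origin to that point, so the vector perpendicular to the color gradient is orthogonal to the relative position vector. The sketch below fits the origin by least squares on that condition. This is an assumed formulation for illustration only; the abstract does not give the actual measure, and the function name `estimate_origin` and its inputs are hypothetical.

```python
def estimate_origin(points, gradients):
    """Least-squares origin (ox, oy) of a radial gradient.

    points: list of (x, y) sample positions in the raster image.
    gradients: list of (gx, gy) color gradient vectors at those positions.
    For each sample, n is the unit vector perpendicular to the gradient, and the
    fit minimizes the sum of (n . (p - o))^2 over all samples.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (gx, gy) in zip(points, gradients):
        norm = (gx * gx + gy * gy) ** 0.5
        if norm == 0.0:
            continue  # flat region: no direction information
        nx, ny = -gy / norm, gx / norm  # perpendicular to the color gradient
        d = nx * px + ny * py
        # Accumulate the normal equations (sum n n^T) o = sum n (n . p).
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * d
        b2 += ny * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradient directions are degenerate; origin is not determined")
    ox = (a22 * b1 - a12 * b2) / det
    oy = (a11 * b2 - a12 * b1) / det
    return ox, oy


if __name__ == "__main__":
    true_origin = (3.0, 4.0)
    pts = [(5.0, 4.0), (3.0, 7.0), (0.0, 0.0), (6.0, 8.0)]
    # Synthetic gradients pointing radially away from the true origin.
    grads = [(x - true_origin[0], y - true_origin[1]) for x, y in pts]
    print(estimate_origin(pts, grads))
```

With noisy gradients from a real raster image, samples would likely need weighting or outlier rejection, which is left out here.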
Embodiments provide systems, methods, and computer storage media for prediction and computation of electronic shopping carts. In an example embodiment, for each interaction between an e-shopper and an e-commerce application, one or more predicted electronic shopping carts that represent a combination of items the e-shopper is likely to purchase are generated based on current items in the e-shopper's electronic shopping cart and recent interactions with the e-shopper. For some or all of the predicted electronic shopping carts (e.g., those with top predicted confidence levels), corresponding shopping cart computations (e.g., identifying applicable promotions, determining a price total for the items in the predicted shopping cart) are executed and cached prior to the e-shopper adding the predicted items. As such, a page configured to visualize the predicted electronic shopping cart with a value retrieved from the cached shopping cart computations (e.g., price total for the predicted electronic shopping cart) is generated.
The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to map query videos to known videos so as to identify a provenance of the query video or identify editorial manipulations of the query video relative to a known video. For example, the video comparison system includes a deep video comparator model that generates and compares visual and audio descriptors utilizing codewords and an inverse index. The deep video comparator model is robust and ignores discrepancies due to benign transformations that commonly occur during electronic video distribution.
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
G06F 16/78 - Retrieval of data characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
H04N 21/84 - Generation or processing of descriptive data, e.g. content descriptors
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
86.
Automatic detection and removal of typographic rivers in electronic documents
Embodiments are disclosed for removing typographic rivers from electronic documents. The method may include receiving an electronic document including a plurality of words for automatic typographic correction. A typographic river is identified in the electronic document, the typographic river including a plurality of nodes, each node including an empty glyph. A candidate adjustment that removes the first node of the plurality of nodes is identified and the candidate adjustment is applied to the electronic document.
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING; Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 40/109 - Font handling; Temporal or kinetic typography
G06F 40/166 - Text processing; Editing, e.g. inserting or deleting
87.
IMAGE COMPRESSION PERFORMANCE OPTIMIZATION FOR IMAGE COMPRESSION
The context-aware optimization method includes training a context model by determining whether to split each node in the context. Determining whether to split includes identifying a second subset of virtual contexts to evaluate, obtaining an encoding cost of splitting the context model for each virtual context in the second subset, and identifying a first subset of virtual contexts to evaluate by selecting, from the second subset, a predetermined number of virtual contexts with the lowest encoding cost. The modified tree-traversal method includes encoding a mask or performing a speculative-based method. The modified entropy coding method includes representing data as an array of bits, using multiple coders to process each bit in the array, and combining the output from the multiple coders into a data range.
An image processing system uses a depth-conditioned autoencoder to generate a modified image from an input image such that the modified image maintains an overall structure from the input image while modifying textural features. An encoder of the depth-conditioned autoencoder extracts a structure latent code from an input image and depth information for the input image. A generator of the depth-conditioned autoencoder generates a modified image using the structure latent code and a texture latent code. The modified image generated by the depth-conditioned autoencoder includes the structural features from the input image while incorporating textural features of the texture latent code. In some aspects, the autoencoder is depth-conditioned during training by augmenting training images with depth information. The autoencoder is trained to preserve the depth information when generating images.
Embodiments are disclosed for identifying and generating symmetrical repeat edits to similar objects in an image. A selection of a first object and an edit to the first object in an image is received. The image is searched for a plurality of candidate objects that have a similar shape to the first object and the plurality of candidate objects are filtered to include one or more objects that are symmetrical with the first object. A symmetric object is selected from the plurality of candidate objects. An axis of symmetry is computed between the symmetric object and the first object. The edit is applied to the symmetric object and to the first object.
Systems and methods for machine learning context based confidence calibration are disclosed. In one embodiment, a processing logic may obtain an image frame; generate, with a first machine learning model, a confidence score, a bounding box, and an instance embedding corresponding to an object instance inferred from the image frame; and compute, with a second machine learning model, a calibrated confidence score for the object instance based on the instance embedding, the confidence score, and the bounding box.
The present disclosure relates to systems, methods, and non-transitory computer readable media that recommend editing presets based on editing intent. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a user query corresponding to a digital image to be edited. The disclosed systems extract, from the user query, an editing intent for editing the digital image. Further, the disclosed systems determine an editing preset that corresponds to the editing intent based on an editing state of an edited digital image associated with the editing preset. The disclosed systems generate a recommendation for the editing preset for provision to the client device.
Systems and methods for image processing are described. Embodiments of the present disclosure receive a reference image depicting a reference object with a target spatial attribute; generate object saliency noise based on the reference image by updating random noise to resemble the reference image; and generate an output image based on the object saliency noise, wherein the output image depicts an output object with the target spatial attribute.
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. bagging or boosting
G06V 20/70 - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; Scene-specific elements; Labelling scene content, e.g. deriving syntactic or semantic representations
Embodiments are disclosed for blending complex objects. The method may include identifying a first complex object and a second complex object. A first primary object associated with the first complex object and a first sequence of geometric repeat operations are determined. A second primary object associated with the second complex object and a second sequence of geometric repeat operations are also determined. A blending operation is applied to the first primary object and the second primary object to generate one or more intermediate primary objects. One or more intermediate complex objects are generated from the one or more intermediate primary objects.
In implementations of systems for visual reordering of partial vector objects, a computing device implements an order system to receive input data describing a region specified relative to a group of vector objects that includes a portion of a first vector object and a portion of a second vector object. A visual order as between the portion of the first vector object and the portion of the second vector object within the region is determined. The order system computes a modified visual order as between the portion of the first vector object and the portion of the second vector object within the region based on the visual order. The order system generates the group of vector objects for display in a user interface using a render surface and a sentinel value to render pixels within the region in the modified visual order.
The technology described herein receives a natural-language sequence of words comprising multiple entities. The technology then identifies a plurality of entities in the natural-language sequence. The technology generates a masked natural-language sequence by masking a first entity in the natural-language sequence. The technology retrieves, from a knowledge base, information related to a second entity in the plurality of entities. The technology then trains a natural-language model to respond to a query. The training uses a first representation of the masked natural-language sequence, a second representation of the information, and the first entity.
H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that regularize learning targets for a student network by leveraging past state outputs of the student network with outputs of a teacher network to determine a retrospective knowledge distillation loss. For example, the disclosed systems utilize past outputs from a past state of a student network with outputs of a teacher network to compose student-regularized teacher outputs that regularize training targets by making the training targets similar to student outputs while preserving semantics from the teacher training targets. Additionally, the disclosed systems utilize the student-regularized teacher outputs with student outputs of the present states to generate retrospective knowledge distillation losses. Then, in one or more implementations, the disclosed systems compound the retrospective knowledge distillation losses with other losses of the student network outputs determined on the main training tasks to learn parameters of the student networks.
Embodiments are disclosed for performing 3-D vectorization. The method includes obtaining a three-dimensional rendered image and a camera position. The method further includes obtaining a triangle mesh representing the three-dimensional rendered image. The method further involves creating a reduced triangle mesh by removing one or more triangles from the triangle mesh. The method further involves subdividing each triangle of the reduced triangle mesh into one or more subdivided triangles. The method further involves performing a mapping of each pixel of the three-dimensional rendered image to the reduced triangle mesh. The method further involves assigning a color value to each vertex of the reduced triangle mesh. The method further involves sorting each triangle of the reduced triangle mesh using a depth value of each triangle. The method further involves generating a two-dimensional triangle mesh using the sorted triangles of the reduced triangle mesh.
Location operation conflict resolution techniques are described. In these techniques, a user's likely intent is inferred by a digital image editing system to prioritize anchor points that are to be a subject of a location operation. In an example in which multiple anchor points qualify for location operations at the same time, these techniques are usable to resolve conflicts between the anchor points based on an assigned priority. In an implementation, the priority is based on the location of a selection input with respect to an object.
G06V 10/24 - Aligning, centring, orientation detection or correction of the image
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
99.
DETERMINING FEATURE CONTRIBUTIONS TO DATA METRICS UTILIZING A CAUSAL DEPENDENCY MODEL
The present disclosure relates to methods, systems, and non-transitory computer-readable media for determining causal contributions of dimension values to anomalous data based on causal effects of such dimension values on the occurrence of other dimension values from interventions performed in a causal graph. For example, the disclosed systems can identify an anomalous dimension value that reflects a threshold change in value between an anomalous time period and a reference time period. The disclosed systems can determine causal effects by traversing a causal network representing dependencies between different dimensions associated with the dimension values. Based on the causal effects, the disclosed systems can determine causal contributions of particular dimension values on the anomalous dimension value. Further, the disclosed systems can generate a causal-contribution ranking of the particular dimension values based on the determined causal contributions.
Embodiments are disclosed for managing text co-editing in a conflict-free replicated data type (CRDT) environment. A method of co-editing management includes detecting a burst operation to be performed on a sequential data structure being edited by one or more client devices. A segment of the sequential data structure associated with the burst operation is determined based on a logical index associated with the burst operation. A tree structure associated with the segment is generated, where a root node of the tree structure corresponds to the burst operation. A global index for the root node of the tree structure is determined and an update corresponding to the burst operation, including the root node and the global index, is sent to the one or more client devices.