Offset object alignment operations are described that support control of alignment operations to aid positioning of an object in relation to at least one other object in a user interface based on an offset value. This is performable by identifying pairs of objects that overlap along an axis in the user interface and calculating offset values from these object pairs. Filtering and priority-based techniques are also usable as part of calculating the offset value to be used in an alignment operation.
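The pairing and offset calculation described above can be sketched as follows. This is a minimal illustration, not the patented method: the `Box` type, the `max_gap` filter, and the "most frequent gap wins, ties go to the smaller gap" priority rule are all assumptions, since the abstract does not say how filtering or priority is defined.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned object bounds (hypothetical representation)."""
    x: float
    y: float
    w: float
    h: float


def overlaps_on_y(a: Box, b: Box) -> bool:
    """True when the vertical extents of a and b intersect."""
    return a.y < b.y + b.h and b.y < a.y + a.h


def horizontal_gaps(boxes, max_gap=50):
    """Collect the horizontal gap of every pair that overlaps along the y axis.

    Pairs that touch or overlap horizontally (gap <= 0) and pairs farther
    apart than max_gap are filtered out (assumed filtering rule).
    """
    ordered = sorted(boxes, key=lambda b: b.x)
    gaps = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            if overlaps_on_y(left, right):
                gap = right.x - (left.x + left.w)
                if 0 < gap <= max_gap:
                    gaps.append(gap)
    return gaps


def preferred_offset(gaps):
    """Pick the most frequent gap; ties go to the smaller gap (assumed priority)."""
    if not gaps:
        return None
    counts = Counter(gaps)
    return min(counts, key=lambda g: (-counts[g], g))


boxes = [Box(0, 0, 10, 10), Box(15, 0, 10, 10), Box(30, 5, 10, 10), Box(100, 50, 10, 10)]
offset = preferred_offset(horizontal_gaps(boxes))
```

An editor could then snap a dragged object so that its gap to a neighbor equals `offset`.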
Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
The present disclosure relates to systems, methods, and non-transitory computer readable media for panoptically guiding digital image inpainting utilizing a panoptic inpainting neural network. In some embodiments, the disclosed systems utilize a panoptic inpainting neural network to generate an inpainted digital image according to a panoptic segmentation map that defines pixel regions corresponding to different panoptic labels. In some cases, the disclosed systems train a neural network utilizing a semantic discriminator that facilitates generation of digital images that are realistic while also conforming to a semantic segmentation. The disclosed systems generate and provide a panoptic inpainting interface to facilitate user interaction for inpainting digital images. In certain embodiments, the disclosed systems iteratively update an inpainted digital image based on changes to a panoptic segmentation map.
Constrained stroke editing techniques for digital content are described. In these examples, a stroke constraint system is employed as part of a digital content creation system to manage input, editing, and erasure (i.e., removal) of strokes via a user interface as part of editing digital content. To do so, locations and attributes of a displayed stroke are used to constrain location and/or attributes of an input stroke.
G06F 3/04883 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. gestures based on the pressure exerted… using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
G06F 3/0354 - Pointing devices displaced or positioned by the user; Accessories therefor with detection of two-dimensional [2D] relative movements between the pointing device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
6.
ARTIFICIAL INTELLIGENCE TECHNIQUES FOR EXTRAPOLATING HDR PANORAMAS FROM LDR LOW FOV IMAGES
In some examples, a computing system accesses a field of view (FOV) image that has a field of view less than 360 degrees and has low dynamic range (LDR) values. The computing system estimates lighting parameters from a scene depicted in the FOV image and generates a lighting image based on the lighting parameters. The computing system further generates lighting features from the lighting image and image features from the FOV image. These features are aggregated into aggregated features, and a machine learning model is applied to the image features and the aggregated features to generate a panorama image having high dynamic range (HDR) values.
A method includes receiving a natural language description of an image to be generated using a machine learning model. The method further includes extracting, from the natural language description of the image to be generated, a control element and a sub-prompt. The method further includes identifying a relationship between the control element and the sub-prompt based on the natural language description of the image to be generated. The method further includes generating, by the machine learning model, an image based on the control element, the sub-prompt, and the relationship. The image includes visual elements corresponding to the control element and the sub-prompt.
Embodiments of the present invention provide systems, methods, and computer storage media for selection of the best image of a particular speaker's face in a video, and visualization in a diarized transcript. In an example embodiment, candidate images of a face of a detected speaker are extracted from frames of a video identified by a detected face track for the face, and a representative image of the detected speaker's face is selected from the candidate images based on image quality, facial emotion (e.g., using an emotion classifier that generates a happiness score), a size factor (e.g., favoring larger images), and/or penalizing images that appear towards the beginning or end of a face track. As such, each segment of the transcript is presented with the representative image of the speaker who spoke that segment and/or input is accepted changing the representative image associated with each speaker.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers or information carriers
G06V 20/40 - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING Scene-specific elements in video content
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
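The representative-image selection in the abstract above combines image quality, a happiness score, a size factor, and a penalty near the ends of a face track. A minimal sketch of such a scoring rule follows; the `Candidate` fields, the weights, the linear combination, and the 10% edge margin are hypothetical choices for illustration, not values taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One cropped face image from a face track (hypothetical fields)."""
    frame_index: int
    quality: float    # image quality in [0, 1]
    happiness: float  # emotion-classifier score in [0, 1]
    area: int         # crop size in pixels


def edge_penalty(position, track_len, margin=0.1):
    """Return 1.0 for candidates in the first or last `margin` of the track."""
    rel = position / max(track_len - 1, 1)
    return 1.0 if rel < margin or rel > 1.0 - margin else 0.0


def select_representative(candidates, weights=(1.0, 0.5, 0.5, 1.0)):
    """Pick the candidate with the highest weighted score (assumed formula)."""
    w_quality, w_happy, w_size, w_edge = weights
    max_area = max(c.area for c in candidates)
    n = len(candidates)

    def score(indexed):
        pos, c = indexed
        return (w_quality * c.quality
                + w_happy * c.happiness
                + w_size * c.area / max_area   # favors larger crops
                - w_edge * edge_penalty(pos, n))

    return max(enumerate(candidates), key=score)[1]


track = [
    Candidate(0, 0.9, 0.9, 100),   # sharp but at the start of the track
    Candidate(10, 0.6, 0.5, 80),
    Candidate(20, 0.8, 0.6, 100),
    Candidate(30, 0.5, 0.2, 50),
    Candidate(40, 0.9, 0.9, 100),  # sharp but at the end of the track
]
best = select_representative(track)
```

Here the first and last candidates have the best raw scores, but the edge penalty lets the mid-track candidate win.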
A method includes receiving an input including a target style and a glyph. The method further includes masking the glyph. The method further includes generating a stylized glyph by a glyph generative model using the masked glyph. The method further includes rendering the stylized glyph as a Unicode stylized glyph.
A method, apparatus, and non-transitory computer readable medium for multimedia processing are described. Embodiments of the present disclosure obtain a project file comprising page data for one or more pages. Each of the one or more pages comprises a spatial arrangement of one or more media elements. A media editing interface presents a page of the one or more pages based on the spatial arrangement. The media editing interface presents a scene line adjacent to the page. The scene line comprises a temporal arrangement of one or more scenes within the page, and the one or more media elements are temporally arranged within the one or more scenes.
Embodiments of the present invention provide systems, methods, and computer storage media for segmenting a transcript into paragraphs. In an example embodiment, a transcript is segmented to start a new paragraph whenever there is a change in speaker and/or a long pause in speech. If any remaining paragraphs are longer than a designated length or duration (e.g., 50 or 100 words), each of those paragraphs is segmented using dynamic programming to minimize a cost function that penalizes candidate paragraphs based on divergence from a target paragraph length and/or that rewards candidate paragraphs that group semantically similar sentences. As such, the transcript is visualized, segmented at the identified paragraphs.
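The dynamic-programming step described above can be sketched as follows. This is an illustrative simplification under stated assumptions: the cost is the squared divergence of each candidate paragraph's word count from a target length, and the semantic-similarity reward mentioned in the abstract is omitted. The function name and the cost form are hypothetical.

```python
import math


def segment_paragraphs(sentence_lengths, target):
    """Split sentences into contiguous paragraphs minimizing total cost.

    sentence_lengths: word count of each sentence, in order.
    Cost of one paragraph: (paragraph word count - target) ** 2 (assumed).
    Returns a list of (start, end) sentence index ranges, end exclusive.
    """
    n = len(sentence_lengths)
    prefix = [0]
    for length in sentence_lengths:
        prefix.append(prefix[-1] + length)

    best = [0.0] + [math.inf] * n  # best[i]: minimal cost of the first i sentences
    back = [0] * (n + 1)           # back[i]: start index of the last paragraph
    for end in range(1, n + 1):
        for start in range(end):
            words = prefix[end] - prefix[start]
            cost = best[start] + (words - target) ** 2
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Walk the back-pointers to recover the paragraph boundaries.
    bounds = []
    end = n
    while end > 0:
        start = back[end]
        bounds.append((start, end))
        end = start
    return bounds[::-1]


paragraphs = segment_paragraphs([10, 10, 10, 10], target=20)
```

A similarity reward could be added by subtracting a term from `cost` for paragraphs whose sentences have close embeddings.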
The present disclosure relates to systems, methods, and non-transitory computer readable media for panoptically guiding digital image inpainting utilizing a panoptic inpainting neural network. In some embodiments, the disclosed systems utilize a panoptic inpainting neural network to generate an inpainted digital image according to a panoptic segmentation map that defines pixel regions corresponding to different panoptic labels. In some cases, the disclosed systems train a neural network utilizing a semantic discriminator that facilitates generation of digital images that are realistic while also conforming to a semantic segmentation. The disclosed systems generate and provide a panoptic inpainting interface to facilitate user interaction for inpainting digital images. In certain embodiments, the disclosed systems iteratively update an inpainted digital image based on changes to a panoptic segmentation map.
A method includes receiving a description of content to be generated using a generative model. The received description of content is associated with a user profile. The method further includes determining a semantic term based on the description of content. The method further includes generating a user-specific template including the semantic term and a user preference associated with the user profile. The method further includes generating the content using the generative model based on the user-specific template. The method further includes outputting the content for display on a target user device.
Embodiments of the present invention provide systems, methods, and computer storage media for annotating transcript text with video metadata, and including thumbnail bars in the transcript to help users select a desired portion of a video through transcript interactions. In an example embodiment, a video editing interface includes a transcript interface that presents a transcript with transcript text that is annotated to indicate corresponding portions of the video where various features were detected (e.g., annotating via text stylization of transcript text and/or labeling the transcript text with a textual representation of a corresponding detected feature class). In some embodiments, the transcript interface displays a visual representation of detected non-speech audio or pauses (e.g., a sound bar) and/or video thumbnails corresponding to each line of transcript text (e.g., a thumbnail bar). Transcript text, soundbars, and/or thumbnail bars are selectable to identify and perform video editing operations on a corresponding video segment.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify two-dimensional images via scene-based editing using three-dimensional representations of the two-dimensional images. For instance, in one or more embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and modify shadows in the two-dimensional images according to various shadow maps. Additionally, the disclosed systems utilize three-dimensional representations of two-dimensional images to modify humans in the two-dimensional images. The disclosed systems also utilize three-dimensional representations of two-dimensional images to provide scene scale estimation via scale fields of the two-dimensional images. In some embodiments, the disclosed systems utilize three-dimensional representations of two-dimensional images to generate and visualize 3D planar surfaces for modifying objects in two-dimensional images. The disclosed systems further use three-dimensional representations of two-dimensional images to customize focal points for the two-dimensional images.
In implementations of systems for generating templates using structure-based matching, a computing device implements a template system to receive input data describing a set of digital design elements. The template system represents the input data as a sentence in a design structure language that describes structural relationships between design elements included in the set of digital design elements. An input template embedding is generated based on the sentence in the design structure language. The template system generates a digital template that includes the set of digital design elements for display in a user interface based on the input template embedding.
Embodiments described herein include aspects related to generating a layout-aware background image. Aspects of the method include receiving a training dataset comprising a document. The method further includes obtaining a mask image based on a layout of content in the document, the mask image having a content area corresponding to content of the document. The method further includes training a machine learning model using the mask image to provide a trained machine learning model that generates transparency values for pixels of a background image for the document.
G06T 7/194 - Segmentation; Edge detection involving foreground-background segmentation
G06T 7/90 - Determination of colour characteristics
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
19.
MUSIC-AWARE SPEAKER DIARIZATION FOR TRANSCRIPTS AND TEXT-BASED VIDEO EDITING
Embodiments of the present invention provide systems, methods, and computer storage media for music-aware speaker diarization. In an example embodiment, one or more audio classifiers detect speech and music independently of each other, which facilitates detecting regions in an audio track that contain music but do not contain speech. These music-only regions are compared to the transcript, and any transcription and speakers that overlap in time with the music-only regions are removed from the transcript. In some embodiments, rather than having the transcript display the text from this detected music, a visual representation of the audio waveform is included in the corresponding regions of the transcript.
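The comparison of music-only regions against the transcript, as described above, can be sketched with plain interval arithmetic. The sketch assumes the classifiers have already produced lists of `(start, end)` intervals in seconds and that transcript words are `(text, start, end)` tuples; these data shapes and the function names are hypothetical.

```python
def subtract_intervals(base, cut):
    """Return the parts of `base` intervals not covered by any `cut` interval."""
    cut = sorted(cut)
    result = []
    for start, end in base:
        cursor = start
        for c_start, c_end in cut:
            if c_end <= cursor or c_start >= end:
                continue  # this cut does not touch the remaining part
            if c_start > cursor:
                result.append((cursor, c_start))
            cursor = max(cursor, c_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals intersect."""
    return a_start < b_end and b_start < a_end


def drop_music_words(words, music, speech):
    """Remove transcript words that overlap regions with music but no speech."""
    music_only = subtract_intervals(music, speech)
    kept = [
        w for w in words
        if not any(overlaps(w[1], w[2], s, e) for s, e in music_only)
    ]
    return kept, music_only


words = [("la", 1.0, 2.0), ("hello", 7.0, 8.0)]
kept, music_only = drop_music_words(words, music=[(0.0, 10.0)], speech=[(6.0, 10.0)])
```

The returned `music_only` intervals are also where a waveform visualization could be placed in the transcript instead of text.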
A missing glyph replacement system is described. In an example, a Unicode identifier of a missing glyph is obtained and glyph metadata describing a glyph cluster that includes the Unicode identifier is obtained from a cache maintained in the storage device, e.g., as part of preprocessing. From this, the system obtains glyphs from the font using Unicode identifiers included in the glyph cluster. The system uses a representative glyph from these glyphs to verify the glyph cluster, and if verified obtains glyphs based on the cluster. For these obtained glyphs, an amount of similarity is determined for the missing glyph with respect to the plurality of obtained glyphs, e.g., to control output of representations of the obtained glyphs in the user interface. The representations are user selectable via the user interface to replace the missing glyph.
Self-consumable portions generation techniques from a digital document are described. The self-consumable portions are generated based on a determination of an amount of resources available at a receiver device that is to receive the digital document. Examples of the resources include an amount of memory resources, processing resources, and/or network resources associated with the receiver device. The self-consumable portions, once generated, are separately renderable at the receiver device.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that provide a user with a subset of digital design templates as recommendations based on a creative segment classification and template classifications. For instance, in one or more embodiments, the disclosed systems generate the creative segment classification for the user and determine geo-seasonal intent data. Furthermore, the disclosed systems generate template classifications using a machine learning model based on geo-seasonality and creative intent. In doing so, the disclosed systems identify a subset of digital design templates based on the template classifications, the geo-seasonal intent data, and the creative segment classification of the user.
Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing machine-learning to automatically select a machine-learning model for graph learning tasks. The disclosed system extracts, utilizing a graph feature machine-learning model, meta-graph features representing structural characteristics of a graph representation comprising a plurality of nodes and a plurality of edges indicating relationships between the plurality of nodes. The disclosed system also generates, utilizing the graph feature machine-learning model, a plurality of estimated graph learning performance metrics for a plurality of machine-learning models according to the meta-graph features. The disclosed system selects a machine-learning model to process data associated with the graph representation according to the plurality of estimated graph learning performance metrics.
Digital image text editing techniques as implemented by an image processing system are described that support increased user interaction in the creation and editing of digital images through understanding a content creator's intent as expressed using text. In one example, a text user input is received by a text input module. The text user input describes a visual object and a visual attribute, in which the visual object specifies a visual context of the visual attribute. A feature representation is generated by a text-to-feature system using a machine-learning module based on the text user input. The feature representation is passed to an image editing system to edit a digital object in a digital image, e.g., by applying a texture to an outline of the digital object within the digital image.
Systems and methods for text simplification are described. Embodiments of the present disclosure identify a simplified text that includes original information from a complex text and additional information that is not in the complex text. Embodiments then compute an entailment score for each sentence of the simplified text using a neural network, wherein the entailment score indicates whether the sentence of the simplified text includes information from a sentence of the complex text corresponding to the sentence of the simplified text. Then, embodiments generate a modified text based on the entailment score, the simplified text, and the complex text, wherein the modified text includes the original information and excludes the additional information. Embodiments may then present the modified text to a user via a user interface.
Systems and methods for data augmentation are provided. One aspect of the systems and methods includes receiving an image that is misclassified by a classification network; computing an augmentation image based on the image using an augmentation network; and generating an augmented image by combining the image and the augmentation image, wherein the augmented image is correctly classified by the classification network.
The present disclosure relates to systems, methods, and non-transitory computer readable media that determine internet traffic data loss from internet traffic data including bulk ingested data utilizing an internet traffic forecasting model. In particular, the disclosed systems detect that observed internet traffic data includes bulk ingested internet traffic data. In addition, the disclosed systems determine a predicted traffic volume for an outage period from the bulk ingested internet traffic data utilizing an internet traffic forecasting model. The disclosed systems further generate a decomposed predicted traffic volume for the outage period. The disclosed systems also determine an internet traffic data loss for the outage period from the decomposed predicted traffic volume while calibrating for pattern changes and late data from previous periods.
A media edit point selection process can include a media editing software application programmatically converting speech to text and storing a timestamp-to-text map. The map correlates text corresponding to speech extracted from an audio track for the media clip to timestamps for the media clip. The timestamps correspond to words and some gaps in the speech from the audio track. The probability of identified gaps corresponding to a grammatical pause by the speaker is determined using the timestamp-to-text map and a semantic model. Potential edit points corresponding to grammatical pauses in the speech are stored for display or for additional use by the media editing software application. Text can optionally be displayed to a user during media editing.
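The gap detection over a timestamp-to-text map described above can be sketched as follows. The map is assumed to be a time-ordered list of `(word, start, end)` tuples. The source scores gaps with a semantic model; `punctuation_score` here is only a hypothetical stand-in heuristic so the example runs, and the thresholds are illustrative.

```python
def find_gaps(word_times, min_gap=0.3):
    """Return (prev_word, next_word, gap_start, gap_end) for silences >= min_gap."""
    gaps = []
    for (w1, _, end1), (w2, start2, _) in zip(word_times, word_times[1:]):
        if start2 - end1 >= min_gap:
            gaps.append((w1, w2, end1, start2))
    return gaps


def punctuation_score(prev_word, next_word):
    """Stand-in for a semantic model: guess pause probability from punctuation."""
    if prev_word.endswith((".", "?", "!")):
        return 0.9
    if prev_word.endswith(","):
        return 0.5
    return 0.1


def edit_points(word_times, score=punctuation_score, min_gap=0.3, min_prob=0.5):
    """Return (timestamp, probability) for gaps likely to be grammatical pauses."""
    points = []
    for prev_word, next_word, start, end in find_gaps(word_times, min_gap):
        prob = score(prev_word, next_word)
        if prob >= min_prob:
            points.append(((start + end) / 2, prob))  # midpoint of the silence
    return points


timeline = [
    ("Hello", 0.0, 0.5),
    ("world.", 0.5, 1.0),
    ("Next", 2.0, 2.5),
    ("um", 3.5, 4.0),
]
points = edit_points(timeline)
```

Passing a different `score` callable is where a real semantic model would plug in.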
Techniques for trigger based digital content caching are described to automatically cache digital content on a client device based on a likelihood that the client device will access the digital content. A cache system, for instance, monitors an interaction of a first client device with digital content that is maintained as part of a digital service by a service provider system. Based on the monitored interaction, the cache system detects a trigger event that indicates a likelihood of interaction by a second client device to edit the digital content. Responsive to detection of the trigger event, the cache system is operable to initiate caching of the digital content on the second client device automatically and without user intervention.
H04N 21/231 - Content storage operation, e.g. caching movies for short-term storage, replicating data over plural servers, or prioritizing data for deletion
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, or for manipulating displayed content
Systems and methods for query processing are described. Embodiments of the present disclosure identify a target phrase in an original query, wherein the target phrase comprises a phrase to be replaced in the original query; replace the target phrase with a mask token to obtain a modified query; generate an alternative query based on the modified query using a masked language model (MLM), wherein the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase; and retrieve a search result based on the alternative query.
Methods and systems are provided for facilitating generation and utilization of causal-based models. In embodiments described herein, a set of events comprising touchpoints resulting in a conversion are obtained. A direct attribution indicating credit for an event contribution to the conversion is determined. An adjusted attribution for the event based on the direct attribution for the event augmented with an indirect attribution for the event is determined. The indirect attribution can be identified based on the event causing a subsequent event of the set of events to result in the conversion. Thereafter, the adjusted attribution for the event is provided to indicate an extent of credit assigned to the event for causing the corresponding conversion.
Systems and methods for collaborative document signing are described. According to one aspect, a method for collaborative document signing includes initiating a live communication session including a user, identifying a source document for an agreement using an agreement signing interface of the live communication session, assigning the user as a signer of the agreement using the agreement signing interface, and generating the agreement. In some cases, the agreement includes the source document. The method further includes obtaining a signature for the agreement from the user and generating a signed agreement including the signature.
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
Systems and methods for joint document signing are described. According to one aspect, a method for joint document signing includes establishing a live communication session including a plurality of users. In some cases, the plurality of users correspond to a set of signers of a document. The method further includes initiating a signing process during the live communication session, receiving a signature for the document from each of the plurality of users during the live communication session based on the signing process, and generating a signed document including the signature from each of the plurality of users.
G06F 40/166 - Text processing Editing, e.g. inserting or deleting
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
H04L 65/1069 - Session management Session establishment or de-establishment
35.
UTILIZING TREND SETTER BEHAVIOR TO PREDICT ITEM DEMAND AND DISTRIBUTE RELATED DIGITAL CONTENT ACROSS DIGITAL PLATFORMS
The present disclosure relates to systems, methods, and non-transitory computer-readable media that distribute item-based digital content across digital platforms using trend setting participants of those digital platforms. For instance, in one or more embodiments, the disclosed systems generate affinity metrics for digital items from a catalog of digital items with respect to a plurality of trend setting participants of a plurality of digital platforms using attributes of digital posts by the plurality of trend setting participants on the plurality of digital platforms and corresponding attributes of the digital items. The disclosed systems further determine predicted demand metrics for the digital items on the plurality of digital platforms using the affinity metrics. Using the predicted demand metrics, the disclosed systems distribute digital content related to the digital items for display on a plurality of client devices via the plurality of digital platforms.
A modeling system displays a three-dimensional (3D) space including a 3D object including a plurality of points and a cage model of the 3D object including a first configuration of vertices and quad faces. Each of the plurality of points is located at a respective initial location. The modeling system generates cage coordinates for the cage model including a vertex coordinate for each vertex of the cage model and four quad coordinates for each quad face of the cage model corresponding to each corner vertex of the quad. The modeling system deforms, responsive to receiving a request, the cage model to change the first configuration of vertices to a second configuration. The modeling system generates, based on the cage coordinates, the first configuration of vertices, and the second configuration of vertices, an updated 3D object by determining a subsequent location for each of the plurality of points.
In various examples, a table recognition machine learning model receives an image of a table and generates, using a first encoder of the model, an image feature vector including features extracted from the image of the table. Using a first decoder and the image feature vector, the model generates a set of coordinates within the image representing rows and columns associated with the table. Using a second decoder and the image feature vector, the model generates a set of bounding boxes and semantic features associated with cells of the table. The model then determines, using a third decoder, a table structure associated with the table using the image feature vector, the set of coordinates, the set of bounding boxes, and the semantic features.
G06V 30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
G06V 30/262 - Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
G06V 30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphic or text elements
Systems and methods for query processing are described. Embodiments of the present disclosure identify an original query; generate a plurality of expanded queries by generating a plurality of additional phrases based on the original query using a causal language model (CLM) and augmenting the original query with each of the plurality of additional phrases, respectively; and provide a plurality of images in response to the original query, wherein the plurality of images are associated with the plurality of expanded queries, respectively.
In some embodiments, techniques for producing user-generated content are provided. For example, a process may involve sending a product identifier; receiving a first candidate image that is associated with the product identifier; determining that a similarity between a user structure and a target structure satisfies a threshold condition, wherein the user structure characterizes a figure of a user in a first input image and the target structure is based on a pose guide associated with the first candidate image; and capturing, based on the determining, the first input image.
A search system employs arrival times with associated confidence scores as search facets for identifying items. The search system identifies a plurality of items based on search input. An arrival time and associated confidence score are determined for each item from the plurality of items. Search results are provided for the plurality of items in response to the search input. The search results are provided based at least in part on the arrival times and associated confidence scores for the plurality of items.
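As a minimal sketch of how arrival time and confidence could act together as a search facet, the Python below filters and orders items. The `Item` fields, the thresholds, and the tie-breaking order are illustrative assumptions and are not taken from the abstract.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    arrival_days: float  # estimated days until arrival (hypothetical field)
    confidence: float    # confidence in that estimate, from 0.0 to 1.0

def filter_by_arrival(items, max_days, min_confidence):
    """Keep items expected within max_days with at least min_confidence,
    ordered by earliest arrival, then by highest confidence."""
    kept = [
        item for item in items
        if item.arrival_days <= max_days and item.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda item: (item.arrival_days, -item.confidence))
```

A caller might, for example, request items arriving within three days with at least 80% confidence; how the arrival times and confidence scores are estimated is outside this sketch.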
Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating generation and presentation of insights. In one implementation, a set of data is used to generate a data visualization. A candidate insight associated with the data visualization is generated, the candidate insight being generated in text form based on a text template and comprising a descriptive insight, a predictive insight, an investigative insight, or a prescriptive insight. A set of natural language insights is generated via a machine learning model. The natural language insights represent the candidate insight in a text style that is different from the text template. A natural language insight having the text style corresponding with a desired text style is selected for presenting the candidate insight and, thereafter, the selected natural language insight and data visualization are provided for display via a graphical user interface.
A visual lens system is described that identifies, automatically and without user intervention, digital tool parameters for achieving a visual appearance of an image region in raster image data. To do so, the visual lens system processes raster image data using a tool region detection network trained to output a mask indicating whether the digital tool is useable to achieve a visual appearance of each pixel in the raster image data. The mask is then processed by a tool parameter estimation network trained to generate a probability distribution indicating an estimation of discrete parameter configurations applicable to the digital tool to achieve the visual appearance. The visual lens system generates an image tool description for the parameter configuration and incorporates the image tool description into an interactive image for the raster image data. The image tool description enables transfer of the digital tool parameter configuration to different image data.
G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. color or texture
G06F 3/04817 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor changing behavior, using icons
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. "bagging" or "boosting"
G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
Methods, systems, and non-transitory computer readable storage media are disclosed for automatically detecting and reconstructing patterns in digital images. The disclosed system determines structurally similar pixels of a digital image by comparing neighborhood descriptors that include the structural context for neighborhoods of the pixels. In response to identifying structurally similar pixels of a digital image, the disclosed system utilizes non-maximum suppression to reduce the set of structurally similar pixels to collinear pixels within the digital image. Additionally, the disclosed system determines whether a group of structurally similar pixels defines the boundaries of a pattern cell that forms a rectangular grid pattern within the digital image. The disclosed system also modifies a boundary of a detected pattern cell to include a human-perceived pattern object via a sliding window corresponding to the pattern cell.
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organizing maps [SOM]; Blind source separation
G06T 9/20 - Contour coding, e.g. using detection of edges
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
Systems and methods for image exploration are provided. One aspect of the systems and methods includes identifying a set of images; reducing the set of images to obtain a representative set of images that is distributed throughout the set of images by removing a neighbor image based on a proximity of the neighbor image to an image of the representative set of images; arranging the representative set of images in a grid structure using a self-sorting map (SSM) algorithm; and displaying a portion of the representative set of images based on the grid structure.
G06F 16/54 - Browsing; Visualization therefor
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/772 - Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
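The reduction step in the image-exploration abstract above, removing a neighbor image that lies too close to an already chosen representative, can be sketched as a greedy thinning pass. This is an illustration under stated assumptions: each image is assumed to be represented by a feature vector (a tuple of floats), proximity is assumed to be Euclidean distance, and `min_distance` is a hypothetical parameter. The self-sorting map arrangement is not shown.

```python
import math

def reduce_to_representatives(points, min_distance):
    """Greedy thinning: keep a point only if it lies at least
    min_distance away from every representative kept so far."""
    representatives = []
    for point in points:
        if all(math.dist(point, kept) >= min_distance for kept in representatives):
            representatives.append(point)
    return representatives
```

Because points are visited in input order, the result depends on that order; a production system would likely choose the visiting order deliberately.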
45.
Text-based color palette searches utilizing text-to-color models
The present disclosure relates to systems that perform text-based palette searches that convert a text query into a color distribution and utilize the color distribution to identify relevant color palettes. More specifically, the disclosed systems receive a textual color palette search query and convert, utilizing a text-to-color model, the textual color palette search query into a color distribution. The disclosed systems determine, utilizing a palette scoring model, distance metrics between the color distribution and a plurality of color palettes in a color database by: identifying swatch matches between colors of the color distribution and unmatched swatches of the plurality of color palettes and determining distances between the colors of the color distribution and matched swatches of the plurality of color palettes. The disclosed systems return one or more color palettes of the plurality of color palettes in response to the textual color palette search query based on the distance metrics.
G06F 16/583 - Retrieval characterized by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/532 - Query formulation, e.g. graphical querying
G06F 16/538 - Presentation of query results
G06F 40/40 - Processing or translation of natural language
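The swatch-matching step in the palette-search abstract above can be illustrated with a small sketch. It assumes the text-to-color model has already produced a list of RGB colors (`query_colors`), uses greedy nearest-swatch matching with Euclidean RGB distance, and averages the matched distances. All of these choices, and the function names, are assumptions for illustration rather than the disclosed palette scoring model.

```python
import math

def palette_distance(query_colors, palette):
    """Match each query color to its nearest still-unmatched swatch
    and return the mean distance over the matched pairs."""
    unmatched = list(palette)
    total = 0.0
    matched = 0
    for color in query_colors:
        if not unmatched:
            break  # palette has fewer swatches than the query has colors
        nearest = min(unmatched, key=lambda swatch: math.dist(color, swatch))
        total += math.dist(color, nearest)
        unmatched.remove(nearest)
        matched += 1
    return total / matched if matched else math.inf

def rank_palettes(query_colors, palettes):
    """Return palette names ordered from best to worst match."""
    return sorted(palettes, key=lambda name: palette_distance(query_colors, palettes[name]))
```

The sketch ignores any per-color weights in the color distribution; a weighted mean would be a natural extension.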
Methods, systems, and non-transitory computer readable storage media are disclosed for customizing digital content tutorials for a user within a digital editing application based on user experience with editing tools. The disclosed system determines proficiency levels for a plurality of different portions of a digital content tutorial corresponding to a digital content editing task. The disclosed system generates tool proficiency scores associated with the user in a digital editing application in connection with the portions of the digital content tutorial. Specifically, the disclosed system generates the tool proficiency scores based on usage of tools corresponding to the portions. Additionally, the disclosed system generates a mapping for the user based on the tool proficiency scores associated with the user and the proficiency levels of the portions of the digital content tutorial. The disclosed system provides a customized digital content tutorial for display at a client device according to the mapping.
Embodiments are disclosed for reconstructing linear gradients from an input image that can be applied to another image. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a raster image, the raster image including a representation of a linear color gradient. The disclosed systems and methods further comprise determining a vector representing a direction of the linear color gradient. The disclosed systems and methods further comprise analyzing pixel points along the direction of the linear color gradient to compute color stops of the linear color gradient. The disclosed systems and methods further comprise generating an output color gradient vector with the computed color stops of the linear color gradient, the output color gradient vector to be applied to a vector graphic.
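One way to picture the color-stop computation is the sketch below. It assumes the colors have already been sampled at evenly spaced points along the gradient direction, and it places a stop wherever the color stops varying linearly (a nonzero second difference). The detection rule and the `tolerance` parameter are illustrative assumptions, not the disclosed method.

```python
def color_stops(samples, tolerance=1.0):
    """samples: (r, g, b) tuples evenly spaced along the gradient direction.
    Returns (offset, color) pairs with offsets in [0, 1]."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    stops = [(0.0, samples[0])]
    for i in range(1, n - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        # The second difference is zero wherever the color varies linearly.
        bend = max(abs(p - 2 * c + x) for p, c, x in zip(prev, cur, nxt))
        if bend > tolerance:
            stops.append((i / (n - 1), cur))
    stops.append((1.0, samples[-1]))
    return stops
```

For real raster data the samples would be noisy, so some smoothing or a larger tolerance would probably be needed.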
Systems and methods for event processing are provided. One aspect of the systems and methods includes receiving an event corresponding to an interaction of a user with a digital content channel; identifying a rule state for a segmentation rule that assigns users to a segment; assigning the user to the segment by evaluating the segmentation rule based on the rule state and the event from the digital content channel; updating the rule state; and providing customized content to the user based on the assignment of the user to the segment.
H04L 47/762 - Admission control; Resource allocation using dynamic resource allocation, e.g. in-call renegotiation requested by the user or requested by the network in response to changing network conditions, triggered by the network
H04L 47/70 - Admission control; Resource allocation
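The event-processing flow above (identify the rule state, evaluate the rule on state plus event, update the state) can be sketched with a deliberately simple rule. The rule itself ("the user has viewed at least `threshold` items"), the event dictionary shape, and the use of a plain dictionary as the rule state are all hypothetical choices made for illustration.

```python
def process_event(view_counts, event, threshold=3):
    """Evaluate a hypothetical segmentation rule against the stored rule
    state plus the incoming event, then update the state.
    Returns True when the user belongs to the segment."""
    user = event["user"]
    # Rule state for this user: number of "view" events seen so far.
    count = view_counts.get(user, 0) + (1 if event["type"] == "view" else 0)
    in_segment = count >= threshold
    view_counts[user] = count  # persist the updated rule state
    return in_segment
```

Keeping only a counter per user, rather than the full event history, is what lets each new event be evaluated without replaying earlier events.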
49.
MULTIDIMENTIONAL IMAGE EDITING FROM AN INPUT IMAGE
Various disclosed embodiments are directed to changing parameters of an input image or multidimensional representation of the input image based on a user request to change such parameters. An input image is first received. A multidimensional image that represents the input image in multiple dimensions is generated via a model. A request to change at least a first parameter to a second parameter is received via user input at a user device. Such a request is a request to edit or generate the multidimensional image in some way. For instance, the request may be to change the light source position or camera position from a first set of coordinates to a second set of coordinates.
G06T 19/20 - Manipulating three-dimensional [3D] models or images for computer graphics; Editing of three-dimensional [3D] images, e.g. changing shapes or colors, aligning objects or positioning parts
G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organizing maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/776 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organizing maps [SOM]; Blind source separation; Performance evaluation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
50.
ATTENTION AWARE MULTI-MODAL MODEL FOR CONTENT UNDERSTANDING
A content analysis system provides content understanding for a content item using an attention aware multi-modal model. Given a content item, feature extractors extract features from content components of the content item in which the content components comprise multiple modalities. A cross-modal attention encoder of the attention aware multi-modal model generates an embedding of the content item using features extracted from the content components. A decoder of the attention aware multi-modal model generates an action-reason statement using the embedding of the content item from the cross-modal attention encoder.
G06F 16/58 - Retrieval characterized by using metadata, e.g. metadata not derived from the content or metadata generated manually
A search system generates custom attributes for use as search facets. User input associated with an image of a target item available on a listing platform is received. The image is analyzed to determine an attribute of the target item as a custom attribute. A value for the custom attribute is determined for each of a number of other items available on the listing platform that are of the same item type as the target item. Search results are provided based at least in part on the values of the custom attribute for the other items.
The present disclosure describes systems, non-transitory computer-readable media, and methods for generating object-specific-preset edits to be later applied to other digital images depicting a same object type or applying a previously generated object-specific-preset edit to an object of the same object type within a target digital image. For example, in some cases, the disclosed systems generate an object-specific-preset edit by determining a region of a particular localized edit in an edited digital image, identifying an edited object corresponding to the localized edit, and storing in a digital-image-editing document an object tag for the edited object and instructions for the localized edit. In certain implementations, the disclosed systems further apply such an object-specific-preset edit to a target object in a target digital image by determining transformed-positioning parameters for a localized edit from the object-specific-preset edit to the target object.
Systems and methods for image processing are described. Embodiments of the present disclosure receive a raster image depicting a radial color gradient; compute an origin point of the radial color gradient based on an orthogonality measure between a color gradient vector at a point in the raster image and a relative position vector between the point and the origin point; construct a vector graphics representation of the radial color gradient based on the origin point; and generate a vector graphics image depicting the radial color gradient based on the vector graphics representation.
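To make the orthogonality idea concrete, the sketch below estimates the origin by least squares. It relies on one interpretation that should be read as an assumption: in an ideal radial gradient the color-gradient vector at a point is parallel to the vector from the origin to that point, so the perpendicular of the color gradient is orthogonal to that relative position vector. The function name and inputs are hypothetical, and the color-gradient vectors are assumed to be precomputed.

```python
def estimate_origin(points, gradients):
    """Least-squares origin (ox, oy) such that, for every sample, the
    perpendicular n of the color gradient satisfies n . (p - o) ~= 0."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (gx, gy) in zip(points, gradients):
        nx, ny = -gy, gx           # perpendicular of the color gradient
        d = nx * px + ny * py      # n . p
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * d
        b2 += ny * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradient directions are all parallel; origin is undetermined")
    # Solve the 2x2 normal equations [[a11, a12], [a12, a22]] o = [b1, b2].
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

At least two samples with non-parallel gradient directions are needed; more samples make the estimate less sensitive to noise.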
Embodiments provide systems, methods, and computer storage media for prediction and computation of electronic shopping carts. In an example embodiment, for each interaction between an e-shopper and an e-commerce application, one or more predicted electronic shopping carts that represent a combination of items the e-shopper is likely to purchase are generated based on current items in the e-shopper's electronic shopping cart and recent interactions with the e-shopper. For some or all of the predicted electronic shopping carts (e.g., those with top predicted confidence levels), corresponding shopping cart computations (e.g., identifying applicable promotions, determining a price total for the items in the predicted shopping cart) are executed and cached prior to the e-shopper adding the predicted items. As such, a page configured to visualize the predicted electronic shopping cart with a value retrieved from the cached shopping cart computations (e.g., price total for the predicted electronic shopping cart) is generated.
The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to map query videos to known videos so as to identify a provenance of the query video or identify editorial manipulations of the query video relative to a known video. For example, the video comparison system includes a deep video comparator model that generates and compares visual and audio descriptors utilizing codewords and an inverse index. The deep video comparator model is robust and ignores discrepancies due to benign transformations that commonly occur during electronic video distribution.
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetized elementary stream
G06F 16/78 - Retrieval of data characterized by using metadata, e.g. metadata not derived from the content or metadata generated manually
H04N 21/84 - Generation or processing of descriptive data, e.g. content descriptors
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
56.
IMAGE COMPRESSION PERFORMANCE OPTIMIZATION FOR IMAGE COMPRESSION
The context-aware optimization method includes training a context model by determining whether to split each node in the context. To do so, a second subset of virtual contexts to evaluate is identified, and an encoding cost of splitting the context model is obtained for each virtual context in the second subset. A first subset of virtual contexts to evaluate is then identified by selecting, from the second subset, the predetermined number of virtual contexts with the lowest encoding cost. The modified tree-traversal method includes encoding a mask or performing a speculative-based method. The modified entropy coding method includes representing data as an array of bits, using multiple coders to process each bit in the array, and combining the output from the multiple coders into a data range.
An image processing system uses a depth-conditioned autoencoder to generate a modified image from an input image such that the modified image maintains an overall structure from the input image while modifying textural features. An encoder of the depth-conditioned autoencoder extracts a structure latent code from an input image and depth information for the input image. A generator of the depth-conditioned autoencoder generates a modified image using the structure latent code and a texture latent code. The modified image generated by the depth-conditioned autoencoder includes the structural features from the input image while incorporating textural features of the texture latent code. In some aspects, the autoencoder is depth-conditioned during training by augmenting training images with depth information. The autoencoder is trained to preserve the depth information when generating images.
Embodiments are disclosed for identifying and generating symmetrical repeat edits to similar objects in an image. A selection of a first object and an edit to the first object in an image is received. The image is searched for a plurality of candidate objects that have a similar shape to the first object and the plurality of candidate objects are filtered to include one or more objects that are symmetrical with the first object. A symmetric object is selected from the plurality of candidate objects. An axis of symmetry is computed between the symmetric object and the first object. The edit is applied to the symmetric object and to the first object.
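A small geometric sketch can illustrate how an axis of symmetry between two objects might be computed and used to mirror an edit. It assumes, purely for illustration, that each object is summarized by its centroid and that the axis is the perpendicular bisector of the segment joining the two centroids; the abstract does not state how the axis is actually derived.

```python
import math

def symmetry_axis(c1, c2):
    """Return (midpoint, unit normal) of the mirror line that is the
    perpendicular bisector of the segment from centroid c1 to centroid c2."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("centroids coincide; axis is undefined")
    midpoint = ((c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2)
    return midpoint, (dx / length, dy / length)

def reflect(point, axis):
    """Mirror a point across the axis, e.g. to transfer an edited
    control point from the first object to the symmetric object."""
    (mx, my), (nx, ny) = axis
    signed_dist = (point[0] - mx) * nx + (point[1] - my) * ny
    return (point[0] - 2 * signed_dist * nx, point[1] - 2 * signed_dist * ny)
```

With centroids at (0, 0) and (4, 0), the axis is the vertical line x = 2, and an edit made at (1, 1) on the first object maps to (3, 1) on its mirror image.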
Embodiments are disclosed for removing typographic rivers from electronic documents. The method may include receiving an electronic document including a plurality of words for automatic typographic correction. A typographic river is identified in the electronic document, the typographic river including a plurality of nodes, each node including an empty glyph. A candidate adjustment that removes a first node of the plurality of nodes is identified, and the candidate adjustment is applied to the electronic document.
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING; Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 40/109 - Font handling; Temporal or kinetic typography
G06F 40/166 - Text processing; Editing, e.g. inserting or deleting
60.
MACHINE LEARNING CONTEXT BASED CONFIDENCE CALIBRATION
Systems and methods for machine learning context based confidence calibration are disclosed. In one embodiment, a processing logic may obtain an image frame; generate, with a first machine learning model, a confidence score, a bounding box, and an instance embedding corresponding to an object instance inferred from the image frame; and compute, with a second machine learning model, a calibrated confidence score for the object instance based on the instance embedding, the confidence score, and the bounding box.
The present disclosure relates to systems, methods, and non-transitory computer readable media that recommend editing presets based on editing intent. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a user query corresponding to a digital image to be edited. The disclosed systems extract, from the user query, an editing intent for editing the digital image. Further, the disclosed systems determine an editing preset that corresponds to the editing intent based on an editing state of an edited digital image associated with the editing preset. The disclosed systems generate a recommendation for the editing preset for provision to the client device.
Systems and methods for image processing are described. Embodiments of the present disclosure receive a reference image depicting a reference object with a target spatial attribute; generate object saliency noise based on the reference image by updating random noise to resemble the reference image; and generate an output image based on the object saliency noise, wherein the output image depicts an output object with the target spatial attribute.
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organizing maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 20/70 - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; Scene-specific elements; Labelling scene content, e.g. deriving syntactic or semantic representations
Embodiments are disclosed for blending complex objects. The method may include identifying a first complex object and a second complex object. A first primary object associated with the first complex object and a first sequence of geometric repeat operations are determined. A second primary object associated with the second complex object and a second sequence of geometric repeat operations are also determined. A blending operation is applied to the first primary object and the second primary object to generate one or more intermediate primary objects. One or more intermediate complex objects are generated from the one or more intermediate primary objects.
In implementations of systems for visual reordering of partial vector objects, a computing device implements an order system to receive input data describing a region specified relative to a group of vector objects that includes a portion of a first vector object and a portion of a second vector object. A visual order as between the portion of the first vector object and the portion of the second vector object within the region is determined. The order system computes a modified visual order as between the portion of the first vector object and the portion of the second vector object within the region based on the visual order. The order system generates the group of vector objects for display in a user interface using a render surface and a sentinel value to render pixels within the region in the modified visual order.
The technology described herein receives a natural-language sequence of words comprising multiple entities. The technology then identifies a plurality of entities in the natural-language sequence. The technology generates a masked natural-language sequence by masking a first entity in the natural-language sequence. The technology retrieves, from a knowledge base, information related to a second entity in the plurality of entities. The technology then trains a natural-language model to respond to a query. The training uses a first representation of the masked natural-language sequence, a second representation of the information, and the first entity.
H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that regularize learning targets for a student network by leveraging past state outputs of the student network with outputs of a teacher network to determine a retrospective knowledge distillation loss. For example, the disclosed systems utilize past outputs from a past state of a student network with outputs of a teacher network to compose student-regularized teacher outputs that regularize training targets by making the training targets similar to student outputs while preserving semantics from the teacher training targets. Additionally, the disclosed systems utilize the student-regularized teacher outputs with student outputs of the present states to generate retrospective knowledge distillation losses. Then, in one or more implementations, the disclosed systems compound the retrospective knowledge distillation losses with other losses of the student network outputs determined on the main training tasks to learn parameters of the student networks.
Embodiments are disclosed for performing 3-D vectorization. The method includes obtaining a three-dimensional rendered image and a camera position. The method further includes obtaining a triangle mesh representing the three-dimensional rendered image. The method further involves creating a reduced triangle mesh by removing one or more triangles from the triangle mesh. The method further involves subdividing each triangle of the reduced triangle mesh into one or more subdivided triangles. The method further involves performing a mapping of each pixel of the three-dimensional rendered image to the reduced triangle mesh. The method further involves assigning a color value to each vertex of the reduced triangle mesh. The method further involves sorting each triangle of the reduced triangle mesh using a depth value of each triangle. The method further involves generating a two-dimensional triangle mesh using the sorted triangles of the reduced triangle mesh.
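The final two steps of the 3-D vectorization abstract above, sorting triangles by depth and producing a two-dimensional mesh, resemble a painter's-algorithm ordering. The sketch below assumes that a triangle's depth value is the distance from the camera position to its centroid and that flattening is an orthographic projection that drops z. Both are illustrative assumptions, as are the function names.

```python
import math

def centroid(triangle):
    """Average of the triangle's three (x, y, z) vertices."""
    return tuple(sum(vertex[i] for vertex in triangle) / 3 for i in range(3))

def sort_back_to_front(triangles, camera):
    """Order triangles from farthest to nearest so that nearer
    triangles are drawn last and end up on top."""
    return sorted(
        triangles,
        key=lambda tri: math.dist(centroid(tri), camera),
        reverse=True,
    )

def flatten(triangles):
    """Drop z from every vertex (orthographic projection, an assumption)."""
    return [tuple((x, y) for x, y, _ in tri) for tri in triangles]
```

Sorting by a single depth value per triangle can misorder intersecting or cyclically overlapping triangles, which is one reason a subdivision step before sorting is useful.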
Location operation conflict resolution techniques are described. In these techniques, a user's likely intent is inferred by a digital image editing system to prioritize anchor points that are to be a subject of a location operation. In an example in which multiple anchor points qualify for location operations at the same time, these techniques are usable to resolve conflicts between the anchor points based on an assigned priority. In an implementation, the priority is based on a location of a selection input with respect to an object.
G06V 10/24 - Aligning, centering, orientation detection or correction of the image
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
69.
DETERMINING FEATURE CONTRIBUTIONS TO DATA METRICS UTILIZING A CAUSAL DEPENDENCY MODEL
The present disclosure relates to methods, systems, and non-transitory computer-readable media for determining causal contributions of dimension values to anomalous data based on causal effects of such dimension values on the occurrence of other dimension values from interventions performed in a causal graph. For example, the disclosed systems can identify an anomalous dimension value that reflects a threshold change in value between an anomalous time period and a reference time period. The disclosed systems can determine causal effects by traversing a causal network representing dependencies between different dimensions associated with the dimension values. Based on the causal effects, the disclosed systems can determine causal contributions of particular dimension values to the anomalous dimension value. Further, the disclosed systems can generate a causal-contribution ranking of the particular dimension values based on the determined causal contributions.
Embodiments are disclosed for managing text co-editing in a conflict-free replicated data type (CRDT) environment. A method of co-editing management includes detecting a burst operation to be performed on a sequential data structure being edited by one or more client devices. A segment of the sequential data structure associated with the burst operation is determined based on a logical index associated with the burst operation. A tree structure associated with the segment is generated, where a root node of the tree structure corresponds to the burst operation. A global index for the root node of the tree structure is determined and an update corresponding to the burst operation, including the root node and the global index, is sent to the one or more client devices.
In implementations of systems for generating and propagating personal masking edits, a computing device implements a mask system to detect a face of a person depicted in a digital image displayed in a user interface of an application for editing digital content. The mask system determines an identifier for the person based on an identifier for the face. Edit data is received describing properties of an editing operation and a type of mask used to modify a particular portion of the person depicted in the digital image. The mask system edits an additional digital image identified based on the identifier of the person using the type of mask and the properties of the editing operation to modify the particular portion of the person as depicted in the additional digital image.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
72.
DEFORMABLE NEURAL RADIANCE FIELD FOR EDITING FACIAL POSE AND FACIAL EXPRESSION IN NEURAL 3D SCENES
A scene modeling system receives a video including a plurality of frames corresponding to views of an object and a request to display an editable three-dimensional (3D) scene that corresponds to a particular frame of the plurality of frames. The scene modeling system applies a scene representation model to the particular frame. The scene representation model includes a deformation model configured to generate, for each pixel of the particular frame based on a pose and an expression of the object, a deformation point using a 3D morphable model (3DMM) guided deformation field. The scene representation model also includes a color model configured to determine, for the deformation point, color and volume density values. The scene modeling system receives a modification to one or more of the pose or the expression of the object, including a modification to a location of the deformation point, and renders an updated video based on the received modification.
G06T 19/20 - Manipulating three-dimensional [3D] models or images for computer graphics; Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts
G06T 17/00 - Three-dimensional [3D] modelling for computer graphics
Content creation techniques are described that leverage content analytics to provide insight and guidance as part of content creation. To do so, content features are extracted by a content analytics system from a plurality of content and used by the content analytics system as a basis to generate a content dataset. Event data is also collected by the content analytics system from an event data source. Event data describes user interaction with respective items of content, including subsequent activities in both online and physical environments. The event data is then used to generate an event dataset. An analytics user interface is then generated by the content analytics system using the content dataset and the event dataset and is usable to guide subsequent content creation and editing.
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or transforming an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G06F 18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on a likelihood ratio or a false positive rate versus a false negative rate
G06V 10/40 - Extraction of image or video features
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06T 11/20 - Drawing from basic elements, e.g. lines or circles
G06F 40/166 - Text processing; Editing, e.g. inserting or deleting
A method includes receiving a user event associated with content of an add-on for a web application displayed on a first user interface. The add-on is a non-native application executed using a hypertext markup language (HTML) element. The method further includes passing the user event to a document object model of the web application using a blank native element. The blank native element links the add-on to the document object model. The method further includes processing the user event using an HTML element renderer. The method further includes displaying updated content associated with the add-on based on the processed user event.
In implementations of systems for generating blend objects from objects with pattern fills, a computing device implements a blend system to generate a source master texture using a first pattern fill of a source object and a destination master texture using a second pattern fill of a destination object. First colors are sampled from the source master texture and second colors are sampled from the destination master texture. The blend system determines a blended pattern fill for the first pattern fill and the second pattern fill by combining the first colors and the second colors. The blend system generates an intermediate blend object for the source object and the destination object for display in a user interface based on the blended pattern fill.
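The color-combining step can be illustrated with a minimal sketch. The abstract does not say how the first and second colors are combined, so linear interpolation is an assumption here, and the names `blend_colors`, `blend_samples`, and the parameter `t` are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]  # 8-bit RGB triple


def blend_colors(first: Color, second: Color, t: float) -> Color:
    """Linearly interpolate two colors; t=0 returns first, t=1 returns second.

    Assumption: the abstract only says the colors are "combined"; linear
    interpolation is one common choice, not necessarily the patented one.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return tuple(round((1.0 - t) * a + t * b) for a, b in zip(first, second))


def blend_samples(source: List[Color], destination: List[Color], t: float) -> List[Color]:
    """Blend colors sampled at matching positions of the two master textures."""
    if len(source) != len(destination):
        raise ValueError("both textures must be sampled at the same positions")
    return [blend_colors(a, b, t) for a, b in zip(source, destination)]
```

Under this sketch, an intermediate blend object halfway between source and destination would use `t = 0.5`, and a chain of intermediate objects would use evenly spaced values of `t`.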
Methods, systems, and non-transitory computer readable storage media are disclosed for modifying parametric continuity between portions of a digital image in piecewise parametric patch deformations. For example, the disclosed system determines parametric patches in a parametric quilt corresponding to a digital image in response to a request to deform the digital image. The disclosed system divides the digital image into a plurality of separate portions along edges of the parametric patches, each parametric patch comprising a separate set of control points. The disclosed system generates sets of interactive handles for each anchor control point in the parametric patch corresponding to metadata flags that determine parametric continuities between portions of the digital image. Additionally, in response to a user input, the disclosed system modifies the parametric continuity at a portion of the digital image corresponding to an anchor control point by modifying a metadata flag for the control point.
The present disclosure relates to systems, methods, and non-transitory computer readable media that fill in digital documents using user identity models of client devices. For instance, in one or more embodiments, the disclosed systems receive a digital document comprising a digital fillable field. The disclosed systems further retrieve, for a client device associated with the digital document, a decentralized identity credential comprising a user attribute established under a decentralized identity framework. Using the user attribute of the decentralized identity credential, the disclosed systems modify the digital document by filling in the digital fillable field.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
Embodiments are disclosed for generating snapping guide lines from objects in a selected region to an object or drawing tool by a digital design system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a first selection of an object from a plurality of objects within a drawing displayed in a graphical user interface (GUI). The disclosed systems and methods further comprise receiving a second selection of a region of interest. The disclosed systems and methods further comprise identifying one or more objects in the region of interest. The disclosed systems and methods further comprise, in response to an input indicating a moving operation of the selected object, generating guide lines from objects in the region of interest to the selected object. The disclosed systems and methods further comprise performing the moving operation of the selected object based on alignment with the generated guide lines.
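A minimal sketch of the region-limited snapping described above, restricted to vertical guide lines for brevity. The `Box` type, the choice of left/center/right anchors, and the `tolerance` parameter are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical stand-in for a drawing object)."""
    x: float
    y: float
    width: float
    height: float

    def x_anchors(self) -> List[float]:
        # Left edge, horizontal center, right edge.
        return [self.x, self.x + self.width / 2, self.x + self.width]

    def intersects(self, other: "Box") -> bool:
        return (self.x < other.x + other.width and other.x < self.x + self.width
                and self.y < other.y + other.height and other.y < self.y + self.height)


def objects_in_region(objects: Sequence[Box], region: Box) -> List[Box]:
    """Keep only the objects that overlap the user-selected region of interest."""
    return [obj for obj in objects if obj.intersects(region)]


def vertical_guides(objects: Sequence[Box]) -> List[float]:
    """Collect candidate guide-line x positions from the given objects."""
    return sorted({anchor for obj in objects for anchor in obj.x_anchors()})


def snap_x(moving: Box, proposed_x: float, guides: Sequence[float],
           tolerance: float = 5.0) -> float:
    """Return the left-edge x after snapping the moved object to the nearest guide."""
    best_shift = None
    for offset in (0.0, moving.width / 2, moving.width):
        for guide in guides:
            shift = guide - (proposed_x + offset)
            if abs(shift) <= tolerance and (best_shift is None or abs(shift) < abs(best_shift)):
                best_shift = shift
    return proposed_x if best_shift is None else proposed_x + best_shift
```

Because guides come only from `objects_in_region`, objects outside the selected region never attract the moved object, which is the behavior the abstract emphasizes.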
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or by the nature of the input device, e.g. gestures based on the pressure exerted, using a touch screen or digitiser, e.g. input of commands through traced gestures
An animation system configured for generating an animation scene that includes at least one animation stylization effect applied to one or more three-dimensional digital objects is described. The animation system includes an interface having a timeline portion and a node graph portion. The timeline portion represents various animation stylization effects as clips arranged chronologically relative to a timeline and the node graph portion includes a node cluster for each clip, where individual node clusters are made up of an animate node, an action node, and an effect node. Input at the timeline portion modifying at least one parameter of the animation scene propagates to the node graph portion, and vice versa. The animation system thus presents dual representations of an animation scene in a manner that enables complex animation customizations while organizing animation effects in a simplified, chronological manner.
An image search system uses a multi-modal model to determine relevance of images to a spoken query. The multi-modal model includes a spoken language model that extracts features from a spoken query and an image processing model that extracts features from an image. The multi-modal model determines a relevance score for the image and the spoken query based on the extracted features. The multi-modal model is trained using a curriculum approach that includes training the spoken language model using audio data. Subsequently, a training dataset comprising a plurality of spoken queries and one or more images associated with each spoken query is used to jointly train the spoken language model and the image processing model to provide a trained multi-modal model.
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G10L 15/183 - Speech classification or search using natural language modelling according to context, e.g. language models
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
Methods, systems, and non-transitory computer readable storage media are disclosed for generating neural network based perceptual artifact segmentations in synthetic digital image content. The disclosed system utilizes neural networks to detect perceptual artifacts in digital images in connection with generating or modifying digital images. The disclosed system determines a digital image including one or more synthetically modified portions. The disclosed system utilizes an artifact segmentation machine-learning model to detect perceptual artifacts in the synthetically modified portion(s). The artifact segmentation machine-learning model is trained to detect perceptual artifacts based on labeled artifact regions of synthetic training digital images. Additionally, the disclosed system utilizes the artifact segmentation machine-learning model in an iterative inpainting process. The disclosed system utilizes one or more digital image inpainting models to inpaint in a digital image. The disclosed system utilizes the artifact segmentation machine-learning model to detect perceptual artifacts in the inpainted portions for additional inpainting iterations.
One or more processing devices access a scene depicting a reference object that includes an annotation identifying a target region to be modified in one or more video frames. The one or more processing devices determine that a target pixel corresponds to a sub-region within the target region that includes hallucinated content. The one or more processing devices determine gradient constraints using gradient values of neighboring pixels in the hallucinated content, the neighboring pixels being adjacent to the target pixel and corresponding to four cardinal directions. The one or more processing devices update color data of the target pixel subject to the determined gradient constraints.
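One way to picture the gradient-constrained update is as a single Jacobi-style relaxation step of a discrete Poisson equation. That is a standard technique substituted here for illustration, not necessarily the method claimed. The function name `update_pixel` and the single-channel grids are assumptions.

```python
from typing import List

Grid = List[List[float]]  # single color channel, indexed [row][col]

# Offsets of the four cardinal neighbors: up, down, left, right.
CARDINAL = ((-1, 0), (1, 0), (0, -1), (0, 1))


def update_pixel(frame: Grid, hallucinated: Grid, row: int, col: int) -> float:
    """Return a new value for frame[row][col] under gradient constraints.

    For each in-bounds cardinal neighbor, the gradient constraint is the
    difference between the target and that neighbor in the hallucinated
    content. The new value is the average of (neighbor value + constraint),
    i.e. one Jacobi step of a discrete Poisson solve (an assumed formulation).
    """
    total = 0.0
    count = 0
    for d_row, d_col in CARDINAL:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(frame) and 0 <= c < len(frame[0]):
            gradient = hallucinated[row][col] - hallucinated[r][c]
            total += frame[r][c] + gradient
            count += 1
    if count == 0:
        raise ValueError("target pixel has no in-bounds neighbors")
    return total / count
```

Repeating this step over every pixel of the sub-region until the values stop changing would propagate the surrounding frame colors inward while preserving the texture gradients of the hallucinated content.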
Systems and methods use machine learning models with content editing tools to prevent or mitigate inadvertent disclosure and dissemination of sensitive data. Entities associated with private information are identified by applying a trained machine learning model to a set of unstructured text data received via an input field of an interface. A privacy score is computed for the text data by identifying connections between the entities, the connections between the entities contributing to the privacy score according to a cumulative privacy risk, the privacy score indicating potential exposure of the private information. The interface is updated to include an indicator distinguishing a target portion of the set of unstructured text data within the input field from other portions of the set of unstructured text data within the input field, wherein a modification to the target portion changes the potential exposure of the private information indicated by the privacy score.
G06Q 50/26 - Government or public services
G06F 16/48 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 40/40 - Processing or translation of natural language
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06Q 50/00 - Systems or methods specially adapted for a specific business sector, e.g. utilities or tourism
Systems and methods for mesh generation are described. One aspect of the systems and methods includes receiving an image depicting a visible portion of a body; generating an intermediate mesh representing the body based on the image; generating visibility features indicating whether parts of the body are visible based on the image; generating parameters for a morphable model of the body based on the intermediate mesh and the visibility features; and generating an output mesh representing the body based on the parameters for the morphable model, wherein the output mesh includes a non-visible portion of the body that is not depicted by the image.
Methods, systems, and non-transitory computer readable storage media are disclosed that utilize machine learning models for patch retrieval and deformation in completing three-dimensional digital shapes. In particular, in one or more implementations the disclosed systems utilize a machine learning model to predict a coarse completion shape from an incomplete 3D digital shape. The disclosed systems sample coarse 3D patches from the coarse 3D digital shape and learn a shape distance function to retrieve detailed 3D shape patches in the input shape. Moreover, the disclosed systems learn a deformation for each retrieved patch and blending weights to integrate the retrieved patches into a continuous surface.
G06T 17/20 - Wire-frame description, e.g. polygonalisation or tessellation
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
A system debiases image translation models to produce generated images that contain minority attributes. A balanced batch for a minority attribute is created by over-sampling images having the minority attribute from an image dataset. An image translation model is trained using images from the balanced batch by applying supervised contrastive loss to output of an encoder of the image translation model and an auxiliary classifier loss based on predicted attributes in images generated by a decoder of the image translation model. Once trained, the image translation model is used to generate images with the minority attribute when given an input image having the minority attribute.
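The balanced-batch step can be sketched as sampling with replacement from the minority group. The even 50/50 split, the `Sample` tuple layout, and the function name `balanced_batch` are assumptions; the abstract only states that minority-attribute images are over-sampled.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, bool]  # (image identifier, has the minority attribute)


def balanced_batch(dataset: Sequence[Sample], batch_size: int,
                   rng: random.Random) -> List[Sample]:
    """Draw a batch with equal minority and majority shares.

    Sampling with replacement (rng.choices) lets a small minority group
    fill its half of the batch, which is what over-sampling means here.
    """
    minority = [s for s in dataset if s[1]]
    majority = [s for s in dataset if not s[1]]
    if not minority or not majority:
        raise ValueError("dataset must contain both groups")
    half = batch_size // 2
    batch = rng.choices(minority, k=half) + rng.choices(majority, k=batch_size - half)
    rng.shuffle(batch)
    return batch
```

For example, with only 2 minority images among 100, a batch of 16 still contains 8 minority samples, each minority image simply appearing several times.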
Systems and methods for image processing are described. Embodiments of the present disclosure receive a first image depicting a scene and a second image that includes a style; segment the first image to obtain a first segment and a second segment, wherein the first segment has a shape of an object in the scene; apply a style transfer network to the first segment and the second image to obtain a first image part, wherein the first image part has the shape of the object and the style from the second image; combine the first image part with a second image part corresponding to the second segment to obtain a combined image; and apply a lenticular effect to the combined image to obtain an output image.
G06T 19/20 - Manipulating three-dimensional [3D] models or images for computer graphics; Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts
The present disclosure relates to systems, non-transitory computer-readable media, and methods for adapting generative neural networks to target domains utilizing an image translation neural network. In particular, in one or more embodiments, the disclosed systems utilize an image translation neural network to translate target results to a source domain for input in target neural network adaptation. For instance, in some embodiments, the disclosed systems compare a translated target result with a source result from a pretrained source generative neural network to adjust parameters of a target generative neural network to produce results corresponding in features to source results and corresponding in style to the target domain.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
In implementations of systems for efficiently generating blend objects, a computing device implements a blending system to assign unique shape identifiers to objects included in an input render tree. The blending system generates a shape mask based on the unique shape identifiers. A color of a pixel of a blend object is computed based on particular objects of the objects that contribute to the blend object using the shape mask. The blending system generates the blend object for display in a user interface based on the color of the pixel.
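A minimal sketch of the identifier-and-mask idea, assuming each shape identifier is a distinct bit so that one integer per pixel records every contributing shape. The rectangle geometry, the `Shape` type, and the averaging rule in `pixel_color` are hypothetical simplifications; the abstract does not specify how contributing colors are combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Color = Tuple[float, float, float]  # RGB components in [0, 1]


@dataclass(frozen=True)
class Shape:
    """Hypothetical render-tree leaf: a solid-colored rectangle."""
    color: Color
    rect: Tuple[int, int, int, int]  # x0, y0, x1, y1 (half-open)

    def covers(self, x: int, y: int) -> bool:
        x0, y0, x1, y1 = self.rect
        return x0 <= x < x1 and y0 <= y < y1


def assign_ids(shapes: Sequence[Shape]) -> Dict[int, Shape]:
    """Give each shape a unique single-bit identifier (1, 2, 4, ...)."""
    return {1 << index: shape for index, shape in enumerate(shapes)}


def shape_mask(ids: Dict[int, Shape], width: int, height: int) -> List[List[int]]:
    """Per-pixel bitmask: the OR of the identifiers of all covering shapes."""
    mask = [[0] * width for _ in range(height)]
    for bit, shape in ids.items():
        for y in range(height):
            for x in range(width):
                if shape.covers(x, y):
                    mask[y][x] |= bit
    return mask


def pixel_color(ids: Dict[int, Shape], mask_value: int) -> Color:
    """Average the colors of the shapes flagged in mask_value (assumed rule)."""
    colors = [shape.color for bit, shape in ids.items() if mask_value & bit]
    if not colors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c[i] for c in colors) / len(colors) for i in range(3))
```

The point of the mask is that the color computation for a pixel only touches the shapes whose bits are set, rather than walking the whole render tree for every pixel.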
Embodiments are disclosed for using machine learning models to perform three-dimensional garment deformation due to character body motion with collision handling. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input, the input including character body shape parameters and character body pose parameters defining a character body, and garment parameters. The disclosed systems and methods further comprise generating, by a first neural network, a first set of garment vertices defining deformations of a garment with the character body based on the input. The disclosed systems and methods further comprise determining, by a second neural network, that the first set of garment vertices includes a second set of garment vertices penetrating the character body. The disclosed systems and methods further comprise modifying, by a third neural network, each garment vertex in the second set of garment vertices to positions outside the character body.
Some embodiments involve a reinforcement learning based framework for training a natural media agent to learn a rendering policy without human supervision or labeled datasets. The reinforcement learning based framework feeds the natural media agent a training dataset to implicitly learn the rendering policy by exploring a canvas and minimizing a loss function. Once trained, the natural media agent can be applied to any reference image to generate a series (or sequence) of continuous-valued primitive graphic actions, e.g., sequence of painting strokes, that when rendered by a synthetic rendering environment on a canvas, reproduce an identical or transformed version of the reference image subject to limitations of an action space and the learned rendering policy.
G09G 5/37 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators, characterised by the display of individual graphic patterns using a bit-mapped memory; Details of the operation on graphic patterns
This disclosure describes one or more implementations of a panoptic segmentation system that generates panoptic segmented digital images that classify both known and unknown instances of digital images. For example, the panoptic segmentation system builds and utilizes a panoptic segmentation neural network to discover, cluster, and segment new unknown object subclasses for previously unknown object instances. In addition, the panoptic segmentation system can determine additional unknown object instances from additional digital images. Moreover, in some implementations, the panoptic segmentation system utilizes the newly generated unknown object subclasses to refine and tune the panoptic segmentation neural network to improve the detection of unknown object instances in input digital images.
G06V 10/40 - Extraction of image or video features
G06F 18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
A group captioning system includes computing hardware, software, and/or firmware components in support of the enhanced group captioning contemplated herein. In operation, the system generates a target embedding for a group of target images, as well as a reference embedding for a group of reference images. The system identifies information in common between the group of target images and the group of reference images and removes the joint information from the target embedding and the reference embedding. The result is a contrastive target embedding and a contrastive reference embedding, with which the system constructs a contrastive group embedding that is then input to a model to obtain a group caption for the target group of images.
G06V 20/30 - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; Scene-specific elements in albums, collections or shared content, e.g. social network photos or videos
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Systems and methods for facial image generation are described. One aspect of the systems and methods includes receiving an image depicting a face, wherein the face has an identity non-related attribute and a first identity-related attribute; encoding the image to obtain an identity non-related attribute vector in an identity non-related attribute vector space, wherein the identity non-related attribute vector represents the identity non-related attribute; selecting an identity-related vector from an identity-related vector space, wherein the identity-related vector represents a second identity-related attribute different from the first identity-related attribute; generating a modified latent vector in a latent vector space based on the identity non-related attribute vector and the identity-related vector; and generating a modified image based on the modified latent vector, wherein the modified image depicts a face that has the identity non-related attribute and the second identity-related attribute.
Systems and methods for color prediction are described. Embodiments of the present disclosure receive an image that includes an object including a color, generate a color vector based on the image using a color classification network, where the color vector includes a color value corresponding to each of a set of colors, generate a bias vector by comparing the color vector to each of a set of center vectors, where each of the set of center vectors corresponds to a color of the set of colors, and generate an unbiased color vector based on the color vector and the bias vector, where the unbiased color vector indicates the color of the object.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/774 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
96.
GENERATING NEURAL NETWORK BASED PERCEPTUAL ARTIFACT SEGMENTATIONS IN MODIFIED PORTIONS OF A DIGITAL IMAGE
Methods, systems, and non-transitory computer readable storage media are disclosed for generating neural network based perceptual artifact segmentations in synthetic digital image content. The disclosed system utilizes neural networks to detect perceptual artifacts in digital images in connection with generating or modifying digital images. The disclosed system determines a digital image including one or more synthetically modified portions. The disclosed system utilizes an artifact segmentation machine-learning model to detect perceptual artifacts in the synthetically modified portion(s). The artifact segmentation machine-learning model is trained to detect perceptual artifacts based on labeled artifact regions of synthetic training digital images. Additionally, the disclosed system utilizes the artifact segmentation machine-learning model in an iterative inpainting process. The disclosed system utilizes one or more digital image inpainting models to inpaint in a digital image. The disclosed system utilizes the artifact segmentation machine-learning model to detect perceptual artifacts in the inpainted portions for additional inpainting iterations.
Techniques for recommending hashtags, including trending hashtags, are disclosed. An example method includes accessing a graph. The graph includes video nodes representing videos, historical hashtag nodes representing historical hashtags, and edges indicating associations among the video nodes and the historical hashtag nodes. A trending hashtag is identified. An edge is added to the graph between a historical hashtag node representing a historical hashtag and a trending hashtag node representing the trending hashtag, based on a semantic similarity between the historical hashtag and the trending hashtag. A new video node representing a new video is added to the video nodes of the graph. A graph neural network (GNN) is applied to the graph, and the GNN predicts a new edge between the trending hashtag node and the new video node. The trending hashtag is recommended for the new video based on prediction of the new edge.
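The graph-construction step (linking a trending hashtag to semantically similar historical hashtags) can be sketched as follows. The GNN link prediction itself is not covered. Cosine similarity over precomputed hashtag embeddings, the `threshold` value, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def link_trending(edges: Set[Edge], historical: Dict[str, Sequence[float]],
                  trending: str, trending_vec: Sequence[float],
                  threshold: float = 0.8) -> List[str]:
    """Add an edge from each sufficiently similar historical hashtag to the
    trending hashtag; return the historical hashtags that were linked."""
    linked = [tag for tag, vec in historical.items()
              if cosine(vec, trending_vec) >= threshold]
    for tag in linked:
        edges.add((tag, trending))
    return linked
```

Once such edges exist, the trending hashtag node is reachable from videos through historical hashtags, which is what would give a link-prediction model signal about a hashtag that has no usage history of its own.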
G06Q 50/00 - Systems or methods specially adapted for a specific business sector, e.g. utilities or tourism
In implementations of systems for identifying templates based on fonts, a computing device implements an identification system to receive input data describing a selection of a font included in a collection of fonts. The identification system generates an embedding that represents the font in a latent space using a machine learning model trained on training data to generate embeddings for digital templates in the latent space based on intent phrases associated with the digital templates and embeddings for fonts in the latent space based on intent phrases associated with the fonts. A digital template included in a collection of digital templates is identified based on the embedding that represents the font and an embedding that represents the digital template in the latent space. The identification system generates an indication of the digital template for display in a user interface.
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING; Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or transforming an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G06F 40/109 - Font handling; Temporal or kinetic typography
In implementations of systems for assistive digital form authoring, a computing device implements an authoring system to receive input data describing a search input associated with a digital form. The authoring system generates an input embedding vector that represents the search input in a latent space using a machine learning model trained on training data to generate embedding vectors in the latent space. A candidate embedding vector included in a group of candidate embedding vectors is identified based on a distance between the input embedding vector and the candidate embedding vector in the latent space. The authoring system generates an indication of a search output associated with the digital form for display in a user interface based on the candidate embedding vector.
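The candidate-selection step can be sketched as a nearest-neighbor lookup in the latent space. Euclidean distance, the dictionary of labeled candidate vectors, and the function names are assumptions; the abstract only says the candidate is identified based on a distance.

```python
import math
from typing import Dict, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_candidate(query: Sequence[float],
                      candidates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the label and distance of the candidate closest to the query."""
    if not candidates:
        raise ValueError("no candidate embedding vectors")
    label = min(candidates, key=lambda name: euclidean(query, candidates[name]))
    return label, euclidean(query, candidates[label])
```

In a form-authoring setting, the labels might be form elements such as an email field or a date field, and the returned label would drive the search output shown in the user interface.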
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING; Digital computing or data processing equipment or methods, specially adapted for specific functions
Systems and methods for content customization are provided. One aspect of the systems and methods includes receiving dynamic characteristics for a plurality of users, wherein the dynamic characteristics include interactions between the plurality of users and a digital content channel; clustering the plurality of users in a plurality of segments based on the dynamic characteristics using a machine learning model; assigning a user to a segment of the plurality of segments based on static characteristics of the user; and providing customized digital content for the user based on the segment.
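The two-stage flow (cluster on dynamic characteristics, then assign by static characteristics) can be sketched with a small k-means. The abstract names only "a machine learning model", so k-means, the deterministic initialization, and the nearest-static-profile assignment rule are all assumptions, as are the function names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Vector]) -> List[float]:
    if not vectors:
        raise ValueError("cannot average an empty group")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest(point: Vector, centers: Sequence[Vector]) -> int:
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[int]:
    """Cluster points into k segments; returns one segment index per point.

    Assumption: the first k points seed the centers, for determinism.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centers[i] = mean(members)
    return labels


def static_profiles(static: Sequence[Vector], labels: Sequence[int], k: int) -> List[List[float]]:
    """Average static characteristics of the users in each segment."""
    return [mean([s for s, label in zip(static, labels) if label == i]) for i in range(k)]


def assign_segment(user_static: Vector, profiles: Sequence[Vector]) -> int:
    """Assign a user to the segment whose static profile is closest."""
    return nearest(user_static, profiles)
```

The design point this illustrates is that a new user with no interaction history can still be placed in a behavior-derived segment, because assignment needs only static characteristics.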
H04N 21/2668 - Creating a channel for a dedicated end-user group, e.g. by inserting targeted commercials into a video stream based on end-user profiles
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication or