Introduced here are approaches to determining causal relationships in mixed datasets that contain both continuous and discrete variables. To accomplish this, a marketing insight and intelligence platform may employ a multi-phase approach in which dependency is established before the continuous variables are discretized. Such an approach ensures that information regarding dependence is not lost through discretization.
The present disclosure describes systems, methods, and non-transitory computer readable media for detecting user interactions from a client device to edit a digital image and modifying the digital image for the client device using a web-based intermediary that modifies a latent vector of the digital image and an image modification neural network that generates a modified digital image from the modified latent vector. In response to user interaction to modify a digital image, for instance, the disclosed systems modify a latent vector extracted from the digital image to reflect the requested modification. The disclosed systems further use a latent vector stream renderer (as an intermediary device) to generate an image delta that indicates a difference between the digital image and the modified digital image. The disclosed systems then provide the image delta as part of a digital stream to a client device to quickly render the modified digital image.
A system and methods for providing human-invisible AR markers are described. One aspect of the system and methods includes identifying AR metadata associated with an object in an image; generating AR marker image data based on the AR metadata; generating a first variant of the image by adding the AR marker image data to the image; generating a second variant of the image by subtracting the AR marker image data from the image; and displaying the first variant and the second variant of the image alternately at a display frequency to produce a display of the image, wherein the AR marker image data is invisible to a human vision system in the display of the image.
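A minimal Python sketch of the alternating-variant idea in the entry above, assuming float images in [0, 1] and a low-amplitude additive marker; the shapes, amplitude, and function name are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def make_marker_variants(image: np.ndarray, marker: np.ndarray):
    """Build the two frame variants: image + marker and image - marker.

    Averaged over one display period, the two variants cancel back to the
    original image, which is why the marker is imperceptible when the
    variants alternate at a sufficiently high display frequency.
    """
    plus = np.clip(image + marker, 0.0, 1.0)
    minus = np.clip(image - marker, 0.0, 1.0)
    return plus, minus

# Toy example: a faint marker encoding AR metadata as a bit pattern.
rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))
bits = rng.integers(0, 2, size=(480, 640, 1))
marker = 0.02 * bits  # low amplitude keeps the two variants visually identical
plus, minus = make_marker_variants(image, marker)
assert np.allclose((plus + minus) / 2, image, atol=0.02)  # cancels on average
```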
The present disclosure relates to systems, methods, and non-transitory computer readable media that implement an inpainting framework having computer-implemented machine learning models to generate high-resolution inpainting results. For instance, in one or more embodiments, the disclosed systems generate an inpainted digital image utilizing a deep inpainting neural network from a digital image having a replacement region. The disclosed systems further generate, utilizing a visual guide algorithm, at least one deep visual guide from the inpainted digital image. Using a patch match model and the at least one deep visual guide, the disclosed systems generate a plurality of modified digital images from the digital image by replacing the pixels of the replacement region with replacement pixels. Additionally, the disclosed systems select, utilizing an inpainting curation model, a modified digital image from the plurality of modified digital images to provide to a client device.
The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to identify regions of an image that have been editorially modified. For example, the image comparison system includes a deep image comparator model that compares a pair of images and localizes regions that have been editorially manipulated relative to an original or trusted image. More specifically, the deep image comparator model generates and surfaces visual indications of the location of such editorial changes on the modified image. The deep image comparator model is robust and ignores discrepancies due to benign image transformations that commonly occur during electronic image distribution. The image comparison system optionally includes an image retrieval model that utilizes a visual search embedding robust to minor manipulations or benign modifications of images. The image retrieval model utilizes the visual search embedding for an image to robustly identify near-duplicate images.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating marked digital images with content adaptive watermarks. In particular, in one or more embodiments, the disclosed systems intelligently evaluate a plurality of watermark configurations to select one or more content adaptive watermarks for one or more target digital images and generate one or more marked digital images by adding the selected content adaptive watermarks to the one or more target digital images.
A clustering system provides bounded incremental clustering for adding input data instances to existing data clusters. Input data instances are received and processed to form input data clusters. For a given input data cluster, a subset of existing data clusters is selected, and a subset of existing data instances is selected from each of the selected existing data clusters. The selected existing data instances and the input data instances from the given input data cluster are processed to form intermediate clusters. At least one intermediate cluster is mapped to an existing data cluster.
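The entry above maps onto a small clustering loop. A sketch, assuming k-means stands in for the unspecified clustering routine and that "bounded" means only the nearest existing clusters and a fixed sample per cluster are touched:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Existing clusters: {cluster_id: array of member points}.
existing = {i: rng.normal(loc=i * 5.0, size=(200, 2)) for i in range(4)}
centroids = {i: pts.mean(axis=0) for i, pts in existing.items()}

# 1) Cluster the incoming instances into input data clusters.
new_points = rng.normal(loc=7.0, size=(60, 2))
input_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit(new_points)

for cid in range(2):
    members = new_points[input_clusters.labels_ == cid]
    # 2) Select the nearest existing clusters (this bounds the work done).
    dists = {i: np.linalg.norm(c - members.mean(axis=0)) for i, c in centroids.items()}
    nearest = sorted(dists, key=dists.get)[:2]
    # 3) Sample a bounded subset of instances from each selected cluster.
    sampled = np.vstack([existing[i][rng.choice(len(existing[i]), 20, replace=False)]
                         for i in nearest])
    # 4) Re-cluster sampled + input instances into intermediate clusters.
    inter = KMeans(n_clusters=len(nearest) + 1, n_init=10, random_state=0)
    inter.fit(np.vstack([sampled, members]))
    # 5) Map each intermediate cluster to the closest existing cluster.
    for c in inter.cluster_centers_:
        target = min(centroids, key=lambda i: np.linalg.norm(centroids[i] - c))
        print(f"intermediate cluster at {c.round(2)} -> existing cluster {target}")
```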
Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
G06V 20/40 - Scenes; Scene-specific elements in video content
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating generation and presentation of insights. In one implementation, a set of data is used to generate a data visualization. A candidate insight associated with the data visualization is generated, the candidate insight being generated in text form based on a text template and comprising a descriptive insight, a predictive insight, an investigative insight, or a prescriptive insight. A set of natural language insights is generated via a machine learning model. The natural language insights represent the candidate insight in a text style that is different from the text template. A natural language insight having the text style corresponding with a desired text style is selected for presenting the candidate insight and, thereafter, the selected natural language insight and data visualization are provided for display via a graphical user interface.
Embodiments described herein provide methods and systems for facilitating actively-learned context modeling. In one embodiment, a subset of data is selected from a training dataset corresponding with an image to be compressed, the subset of data corresponding with a subset of the pixels of the image. A context model is generated using the selected subset of data. The context model is generally in the form of a decision tree having a set of leaf nodes. Entropy values corresponding with each leaf node of the set of leaf nodes are determined. Each entropy value indicates an extent of diversity of context associated with the corresponding leaf node. Additional data from the training dataset is selected based on the entropy values corresponding with the leaf nodes. The updated subset of data is used to generate an updated context model for use in performing compression of the image.
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
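A rough sketch of the actively-learned context modeling described above, using scikit-learn's decision tree as the context model; the context features, label construction, and leaf budget are toy assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: each row is a pixel's coding context
# (e.g., left/top neighbour values), each label the pixel's symbol.
X = rng.integers(0, 8, size=(5000, 2))
y = (X.sum(axis=1) + rng.integers(0, 2, size=5000)) % 8

# Start from a small random subset and grow it actively.
idx = rng.choice(len(X), 200, replace=False)
for _ in range(3):
    tree = DecisionTreeClassifier(max_leaf_nodes=16, random_state=0)
    tree.fit(X[idx], y[idx])
    # Entropy of the label distribution at each leaf: high entropy means
    # the leaf's context is still too diverse to code efficiently.
    leaves = tree.apply(X[idx])
    leaf_entropy = {}
    for leaf in np.unique(leaves):
        _, counts = np.unique(y[idx][leaves == leaf], return_counts=True)
        p = counts / counts.sum()
        leaf_entropy[leaf] = -(p * np.log2(p)).sum()
    # Pull additional samples whose contexts land in the noisiest leaf.
    worst = max(leaf_entropy, key=leaf_entropy.get)
    pool = np.setdiff1d(np.arange(len(X)), idx)
    extra = pool[tree.apply(X[pool]) == worst][:100]
    idx = np.concatenate([idx, extra])
    print(f"worst-leaf entropy: {leaf_entropy[worst]:.2f} bits, "
          f"training size: {len(idx)}")
```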
Embodiments are provided for facilitating multimodal extraction across multiple granularities. In one implementation, a set of features of a document for a plurality of granularities of the document is obtained. Via a machine learning model, the set of features of the document is modified to generate a set of modified features using a set of self-attention values to determine relationships within a first type of feature and a set of cross-attention values to determine relationships between the first type of feature and a second type of feature. Thereafter, the set of modified features is provided to a second machine learning model to perform a classification task.
Systems and methods for machine learning based multipage scanning are provided. In one embodiment, one or more processing devices perform operations that include receiving a video stream that includes image frames that capture a plurality of pages of a document. The operations further include detecting, via a machine learning model that is trained to infer events from the video stream, a new page event. Detection of the new page event indicates that a page of the plurality of pages available for scanning has changed from a first page to a second page. Based on the detection of the new page event, the one or more processing devices capture an image frame of the page from the video stream. In some embodiments, the machine learning model detects events based on a weighted use of video data, inertial data, audio samples, image depth information, image statistics, and/or other information.
Systems and methods for natural language processing are described. Embodiments of the present disclosure receive text including an event trigger word indicating an occurrence of an event; classify the event trigger word to obtain an event type using a few-shot classification network, wherein the few-shot classification network is trained by storing first labeled samples during a first training iteration and using the first labeled samples for computing a loss function during a second training iteration that includes a support set with second labeled samples having a same ground-truth label as the first labeled samples; and transmit event detection information including the event trigger word and the event type.
Embodiments are disclosed for adding node highlighting to vector graphics. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a selection of a plurality of anchor points of a vector graphic to be highlighted, generating a graph representing one or more path objects of the vector graphic, each node of the graph corresponding to an anchor point of the one or more path objects and each connection corresponding to a path segment connecting the anchor point to another anchor point, identifying a highlight trajectory including a subset of nodes from the graph, the highlight trajectory including at least a start node and an end node, generating a highlight path including at least one or more highlight nodes corresponding to a subset of nodes from the highlight trajectory, and updating the vector graphic to include the highlight path.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
In implementations of systems for efficiently rendering vector objects, a computing device implements a rendering system to identify unique geometries from a set of geometries of vector objects included in a render tree. The rendering system tessellates the unique geometries and each of the tessellated unique geometries has a unique identifier. Mappings are generated between the vector objects included in the render tree and the tessellated unique geometries using the unique identifiers. The rendering system renders the vector objects included in the render tree for display in a user interface based on the mappings.
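A compact sketch of the deduplication idea in the entry above: tessellate each unique geometry once, key it by a unique identifier, and let the render pass reuse tessellations through the mappings. The data class, stand-in tessellator, and hashing scheme are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorObject:
    geometry: tuple   # control points defining the path
    transform: tuple  # per-object placement (ignored for deduplication)

def tessellate(geometry):
    # Stand-in for real tessellation; runs once per unique geometry.
    return f"triangles<{hash(geometry) & 0xffff:04x}>"

render_tree = [
    VectorObject(((0, 0), (1, 0), (1, 1)), transform=(0, 0)),
    VectorObject(((0, 0), (1, 0), (1, 1)), transform=(5, 5)),  # duplicate geometry
    VectorObject(((0, 0), (2, 0), (0, 2)), transform=(1, 1)),
]

# Tessellate each unique geometry exactly once, keyed by a unique identifier.
tessellated = {}
mapping = []
for obj in render_tree:
    uid = hash(obj.geometry)  # unique identifier for the geometry
    if uid not in tessellated:
        tessellated[uid] = tessellate(obj.geometry)
    mapping.append((obj, uid))

# Render using the mappings: duplicates reuse the shared tessellation.
for obj, uid in mapping:
    print(f"draw {tessellated[uid]} at {obj.transform}")
```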
Systems and methods that enable the efficient and adaptive allocation of resources dedicated to a virtualized resource-based computation (e.g., one or more information processing tasks) are provided. In one embodiment, a reward model is generated based on a set of statistical distributions, for example, in response to receiving a request to launch a set of VCRs. Thereafter, an expected reward is predicted for each configuration of a set of configurations based on the reward model and one or more parameters of the corresponding configuration. The expected reward indicates an efficiency in distribution or allocation of physical computation resources to the set of VCRs. A configuration of the set of configurations is selected based on the predicted expected reward for the configuration. The set of VCRs is then configured with the selected configuration.
Embodiments are disclosed for generating temporally consistent manipulated videos. A method of generating temporally consistent manipulated videos comprises receiving a target appearance and an input digital video including a plurality of frames, generating a plurality of target appearance frames from the plurality of frames, training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, providing the input digital video to the video prediction network, and generating, by the video prediction network, an output digital video wherein the subject of the output digital video has its appearance modified to match the target appearance.
The technology is directed towards receiving training data regarding a set of observations. Each observation includes a feature set, a treatment, and an outcome. A first generation of machine learning models is trained, via the training data, to predict an outcome for a feature set of a given observation. A new generation of models is generated by selecting a subset of models from the trained first generation of models based on a fitness criterion for each model to generate an intermediate layer for use in predicting a treatment. An algorithm is applied to the selected subset of models to generate the new generation of models. Transformed training data is generated using the training data and a model of the new generation of models. The transformed training data includes, for each observation, a transformed feature set comprising a representation of the feature set in a latent space of the model.
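A hedged PyTorch sketch of one plausible reading of this entry: train a first generation of outcome predictors, select survivors by fitness, mutate copies into a new generation, and re-express each feature set in a survivor's latent (hidden-layer) space. The model sizes, the mutation scheme, and the use of training loss as the fitness criterion are all assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Toy observations: feature sets X and outcomes y (treatments omitted).
X = torch.tensor(rng.normal(size=(512, 8)), dtype=torch.float32)
y = X[:, :4].sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)

def make_model(hidden):
    return nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 1))

def fit(model, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()  # fitness stand-in (held-out loss would be better)

# First generation: outcome predictors of varying capacity.
generation = [make_model(h) for h in (4, 8, 16, 32)]
fitness = [fit(m) for m in generation]

# Select the fittest subset, then mutate copies to form the new generation.
new_generation = []
for i in np.argsort(fitness)[:2]:
    parent = generation[i]
    child = make_model(parent[0].out_features)
    child.load_state_dict(parent.state_dict())
    with torch.no_grad():
        for p in child.parameters():
            p.add_(0.01 * torch.randn_like(p))  # mutation
    fit(child, epochs=50)
    new_generation.append(child)

# Transformed training data: each feature set re-expressed in the latent
# space of a new-generation model (its hidden-layer activations).
with torch.no_grad():
    latent = new_generation[0][:2](X)  # Linear + ReLU prefix of the model
print("transformed feature shape:", tuple(latent.shape))
```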
Systems and methods for product retrieval are described. One or more aspects of the systems and methods include receiving a query that includes a text description of a product associated with a brand; identifying the product based on the query by comparing the text description to a product embedding of the product, wherein the product embedding is based on a brand embedding of the brand; and displaying product information for the product in response to the query, wherein the product information includes the brand.
A computing system captures a first image, comprising an object in a first position, using a camera. The object has indicators indicating points of interest on the object. The computing system receives first user input linking at least a subset of the indicators and establishing relationships between the points of interest on the object and second user input comprising a graphic element and a mapping between the graphic element and the object. The computing system captures second images, comprising the object in one or more modified positions, using the camera. The computing system tracks the modified positions of the object across the second images using the indicators and the relationships between the points of interest. The computing system generates a virtual graphic based on the one or more modified positions, the graphic element, and the mapping between the graphic element and the object.
G06T 19/00 - Manipulating 3D models or images for computer graphics
G06F 3/04815 - Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
Some techniques described herein relate to utilizing a machine-learning (ML) model to select respective samples for queries of a query sequence. In one example, a method includes receiving a query in a query sequence, where the query is directed toward a dataset. Samples are available as down-sampled versions of the dataset. The method further includes applying an agent to select, for the query, a sample from among the samples of the dataset. The agent includes an ML model trained, such as via intent-based reinforcement learning, to select respective samples for queries. The query is then executed against the sample to output a response.
Graphics processing unit instancing control techniques are described that overcome conventional challenges to expand functionality made available via a graphics processing unit. In one example, these techniques support ordering of primitives within respective instances of a single draw call made to a graphics processing unit. This is performed by ordering primitives within respective instances that correspond to polygons for rendering. The ordering of the primitives overcomes limitations of conventional techniques and reduces visual artifacts through support of correct overlaps and z-ordering of instances.
Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training input including a media sequence, including a video sequence paired with an audio sequence, segmenting the media sequence into a set of video sequence segments and a set of audio sequence segments, extracting visual features for each video sequence segment and audio features for each audio sequence segment, generating, by transformer networks, contextualized visual features from the extracted visual features and contextualized audio features from the extracted audio features, the transformer networks including a visual transformer and an audio transformer, generating predicted video and audio sequence segment pairings based on the contextualized visual and audio features, and training the visual transformer and the audio transformer to generate the contextualized visual and audio features.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters
Systems and methods for translation evaluation are provided. One or more aspects of the systems and methods includes receiving a source text, a context identifier for the source text, and a translation text, wherein the source text comprises text from a software application and the context identifier specifies a context of the source text within the software application; generating a source text representation and a translation text representation based on the source text, the context identifier, and the translation text using an encoder of a machine learning model; and generating translation quality information based on the source text representation and the translation text representation using a decoder of the machine learning model.
Digital content operation testing techniques are described. An authoring environment supports edit operations to digital content. The authoring environment includes an option to initiate testing of operation of edited digital content at a publish environment of a content delivery network at which the digital content is to be deployed. Data describing results of the testing is then communicated over the network to the digital content editing system. The data is output within the user interface of the authoring environment in this example such that an effect of edits made to the digital content is viewable non-modally within the authoring environment.
Embodiments are disclosed for identifying and modifying overlapping glyphs in a text layout. A method of identifying and modifying overlapping glyphs includes detecting a plurality of overlapping glyphs in a text layout, modifying a geometry of one or more of the overlapping glyphs based on an aesthetic score, updating a rendering tree based on the modified geometry of the one or more overlapping glyphs, and rendering the text layout using the rendering tree.
The technology described herein is directed to an adaptive sparse attention pattern that is learned during fine-tuning and deployed in a machine-learning model. In aspects, a row or a column in an attention matrix with an importance score for a task that is above a threshold importance score is identified. The important row or the column is included in an adaptive attention pattern used with a machine-learning model having a self-attention operation. In response to an input, a task-specific inference is generated for the input using the machine-learning model with the adaptive attention pattern.
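A small sketch of the attention-pattern construction described above: rows and columns whose importance score clears a threshold are kept dense, on top of a local window. The importance scores here are hand-made; in the described system they would be learned during fine-tuning:

```python
import numpy as np

def adaptive_attention_mask(importance, threshold, window=2):
    """Sparse attention pattern: a local sliding window, plus every row and
    column whose task importance score exceeds the threshold."""
    n = len(importance)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):                      # local window component
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True
    important = importance > threshold      # learned "important" positions
    mask[important, :] = True               # keep their full rows ...
    mask[:, important] = True               # ... and full columns
    return mask

# Hand-made importance scores; the described system learns these per task.
scores = np.array([0.9, 0.1, 0.05, 0.8, 0.1, 0.1, 0.1, 0.1])
mask = adaptive_attention_mask(scores, threshold=0.5)
print(f"fraction of attention matrix computed: {mask.mean():.0%}")
```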
Embodiments are disclosed for co-editing management. A method of co-editing management includes detecting a modification operation to be performed on a sequential data structure being edited by one or more client devices, determining a segment of the sequential data structure associated with the modification operation based on a logical index associated with the modification operation, generating a tree structure associated with the segment, a root node of the tree structure corresponding to the modification operation, determining a global index for the root node of the tree structure, and sending an update corresponding to the modification operation, including the root node and the global index, to a co-editing server to be distributed to the one or more client devices.
Embodiments provide systems, methods, and computer storage media for a Nonsymmetric Determinantal Point Process (NDPP) for compatible set recommendations in a setting where data representing entities (e.g., items) arrives in a stream. A stream representing compatible sets of entities is received and used to update a latent representation of the entities and a compatibility distribution indicating the likelihood of compatibility of subsets of the entities. The compatibility distribution is accessed in a single sequential pass to predict a compatible complete set of entities that completes an incomplete set of entities. The predicted complete compatible set is provided as a recommendation for entities that complete the incomplete set of entities.
Embodiments described herein include aspects related to previewing and capturing stroke outlines from a real-world image or a video feed. In one embodiment, a stroke outline preview image is generated by performing an edge detection process on an input image. The stroke outline preview image provides a preview indicating an example of a stroke outline image to be provided for the input image if the input image is selected. A detailed stroke outline image for the input image is generated using a detailed stroke outline process, and an alternative stroke outline image is obtained for the input image using an alternative outline process. Thereafter, the alternative stroke outline image is modified to include a portion of stroke outlines from the detailed stroke outline image.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/36 - Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
31.
OBJECT CLASS INPAINTING IN DIGITAL IMAGES UTILIZING CLASS-SPECIFIC INPAINTING NEURAL NETWORKS
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate inpainted digital images utilizing a class-specific cascaded modulation inpainting neural network. For example, the disclosed systems utilize a class-specific cascaded modulation inpainting neural network that includes cascaded modulation decoder layers to generate replacement pixels portraying a particular target object class. To illustrate, in response to user selection of a replacement region and target object class, the disclosed systems utilize a class-specific cascaded modulation inpainting neural network corresponding to the target object class to generate an inpainted digital image that portrays an instance of the target object class within the replacement region. Moreover, in one or more embodiments the disclosed systems train class-specific cascaded modulation inpainting neural networks corresponding to a variety of target object classes, such as a sky object class, a water object class, a ground object class, or a human object class.
In implementations of three-dimensional vector-based brushes, a computing device implements a brush system to receive input data describing a first stamp and a second stamp of consecutive stamps of a vector-based digital brush. The first stamp is segmented into a first convex geometry and a second convex geometry, and the second stamp is segmented into a third convex geometry and a fourth convex geometry. The brush system computes a first convex hull of the first convex geometry and the third convex geometry and computes a second convex hull of the second convex geometry and the fourth convex geometry. An order for passing the first convex hull and the second convex hull to a planarizer is determined and the brush system generates a vector-based stroke of digital paint for display in a user interface based on the order.
Aspects of a system and method for procedural media generation include generating a sequence of operator types using a node generation network; generating a sequence of operator parameters for each operator type of the sequence of operator types using a parameter generation network; generating a sequence of directed edges based on the sequence of operator types using an edge generation network; combining the sequence of operator types, the sequence of operator parameters, and the sequence of directed edges to obtain a procedural media generator, wherein each node of the procedural media generator comprises an operator that includes an operator type from the sequence of operator types, a corresponding sequence of operator parameters, and an input connection or an output connection from the sequence of directed edges that connects the node to another node of the procedural media generator; and generating a media asset using the procedural media generator.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Systems and methods for image processing are described. Embodiments of the present disclosure identify target style attributes and target structure attributes for a composite image; generate a matrix of composite feature tokens based on the target style attributes and the target structure attributes, wherein subsequent feature tokens of the matrix of composite feature tokens are sequentially generated based on previous feature tokens of the matrix of composite feature tokens according to a linear ordering of the matrix of composite feature tokens; and generate the composite image based on the matrix of composite feature tokens, wherein the composite image includes the target style attributes and the target structure attributes.
Various disclosed embodiments are directed to classifying or determining an image style of a target image according to a consumer application based on determining a similarity score between the image style of the target image and one or more other predetermined image styles of the consumer application. Various disclosed embodiments can resolve the destructiveness of image style transfer functionality by making various layers of predetermined image styles modifiable. Further, various embodiments eliminate tedious manual user input requirements and reduce computing resource consumption, among other things.
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
36.
INSERTING THREE-DIMENSIONAL OBJECTS INTO DIGITAL IMAGES WITH CONSISTENT LIGHTING VIA GLOBAL AND LOCAL LIGHTING INFORMATION
This disclosure describes methods, non-transitory computer readable storage media, and systems that generate realistic shading for three-dimensional objects inserted into digital images. The disclosed system utilizes a light encoder neural network to generate a representation embedding of lighting in a digital image. Additionally, the disclosed system determines points of the three-dimensional object visible within a camera view. The disclosed system generates a self-occlusion map for the digital three-dimensional object by determining whether fixed sets of rays uniformly sampled from the points intersect with the digital three-dimensional object. The disclosed system utilizes a generator neural network to determine a shading map for the digital three-dimensional object based on the representation embedding of lighting in the digital image and the self-occlusion map. Additionally, the disclosed system generates a modified digital image with the three-dimensional object inserted into the digital image with consistent lighting of the three-dimensional object and the digital image.
Face anonymization techniques are described that overcome conventional challenges to generate an anonymized face. In one example, a digital object editing system is configured to generate an anonymized face based on a target face and a reference face. As part of this, the digital object editing system employs an encoder as part of machine learning to extract a target encoding of the target face image and a reference encoding of the reference face. The digital object editing system then generates a mixed encoding from the target and reference encodings. The mixed encoding is employed by a machine-learning model of the digital object editing system to generate a mixed face. An object replacement module is used by the digital object editing system to replace the target face in the target digital image with the mixed face.
The present disclosure relates to using an end-to-end differentiable pipeline for optimizing parameters of a base procedural material to generate a procedural material corresponding to a target physical material. For example, the disclosed systems can receive a digital image of a target physical material. In response, the disclosed systems can retrieve a differentiable procedural material for use as a base procedural material. The disclosed systems can compare a digital image of the base procedural material with the digital image of the target physical material using a loss function, such as a style loss function that compares visual appearance. Based on the determined loss, the disclosed systems can modify the parameters of the base procedural material to determine procedural material parameters for the target physical material. The disclosed systems can generate a procedural material corresponding to the base procedural material using the determined procedural material parameters.
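A self-contained PyTorch sketch of the optimization loop this entry describes, with a toy striped-texture generator standing in for a real procedural material and a gradient Gram matrix standing in for a VGG-based style loss; both stand-ins are assumptions:

```python
import torch

torch.manual_seed(0)

def procedural_texture(params, size=64):
    """Toy differentiable 'procedural material': stripes whose frequency,
    angle, and contrast are the optimizable parameters."""
    freq, angle, contrast = params
    xs = torch.linspace(0.0, 1.0, size)
    yy, xx = torch.meshgrid(xs, xs, indexing="ij")
    u = xx * torch.cos(angle) + yy * torch.sin(angle)
    return torch.sigmoid(contrast * torch.sin(2 * torch.pi * freq * u))

def gram(img):
    # Lightweight stand-in for a VGG-based style loss: a Gram matrix over
    # horizontal/vertical gradients captures oriented-texture statistics.
    gx = img[:, 1:] - img[:, :-1]
    gy = img[1:, :] - img[:-1, :]
    feats = torch.stack([gx[:-1, :], gy[:, :-1]]).reshape(2, -1)
    return feats @ feats.T / feats.shape[1]

# "Digital image of the target physical material": rendered here with
# hidden parameters so the example is self-contained.
with torch.no_grad():
    target = procedural_texture(torch.tensor([9.0, 0.6, 6.0]))

# Modify the base material's parameters to minimize the appearance loss.
params = torch.tensor([5.0, 0.1, 3.0], requires_grad=True)
opt = torch.optim.Adam([params], lr=0.05)
for _ in range(400):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(gram(procedural_texture(params)),
                                        gram(target))
    loss.backward()
    opt.step()
print("final loss:", float(loss), "parameters after optimization:",
      params.detach())
```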
Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is employed that models layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate a layout representation.
A computerized method includes training a first decision tree based model on a first set of data to generate a first trained decision tree based model having a first set of decision trees. The first trained decision tree based model outputs a first prediction based on receiving an input. The method includes training a second decision tree based model on a second set of data to generate a second trained decision tree based model. The second trained decision tree based model comprises the first set of decision trees and a second set of decision trees determined from training the second decision tree based model. The second trained decision tree based model outputs a second prediction based on receiving the input. The method includes training a logistic model to output a final prediction in response to receiving the first prediction and the second prediction.
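A scikit-learn sketch of the pipeline in the entry above: warm-starting a gradient-boosted model reproduces the first set of trees and then appends a second set trained on the second data set, and a logistic model fuses the two predictions into a final prediction. Treating "decision tree based model" as gradient boosting is an assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)
X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=0)

# First model: the first set of decision trees, trained on the first data.
m1 = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X1, y1)

# Second model: reproduce the first set of trees, then warm-start to grow
# a second set of trees on the second set of data.
m2 = GradientBoostingClassifier(n_estimators=50, warm_start=True,
                                random_state=0).fit(X1, y1)
m2.n_estimators = 100
m2.fit(X2, y2)

# Logistic model turns the two predictions into the final prediction
# (fit and scored in-sample here purely for brevity).
stack = np.column_stack([m1.predict_proba(X2)[:, 1],
                         m2.predict_proba(X2)[:, 1]])
meta = LogisticRegression().fit(stack, y2)
print("stacked accuracy:", round(meta.score(stack, y2), 3))
```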
In implementations of systems for joint trimap estimation and alpha matte prediction, a computing device implements a matting system to estimate a trimap for a frame of a digital video using a first stage of a machine learning model. An alpha matte is predicted for the frame based on the trimap and the frame using a second stage of the machine learning model. The matting system generates a refined trimap and a refined alpha matte for the frame based on the alpha matte, the trimap, and the frame using a third stage of the machine learning model. An additional trimap is estimated for an additional frame of the digital video based on the refined trimap and the refined alpha matte using the first stage of the machine learning model.
An illustrator system accesses a multi-element document including a plurality of elements. The illustrator system selects, from the plurality of elements, a selected element. The illustrator system generates a replacement multi-element document that includes a substitute element in place of the selected element in the multi-element document, wherein the substitute element is different from the selected element. The illustrator system displays, via a user interface with the multi-element document, a preview of the replacement multi-element document, wherein the preview is focused to depict the substitute element.
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/532 - Query formulation, e.g. graphical querying
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
Semantic fill techniques are described that support generating fills and editing images from semantic inputs. A user input, for example, is received by a semantic fill system that indicates a selection of a first region of a digital image and a corresponding semantic label. The user input is utilized by the semantic fill system to generate a guidance attention map of the digital image. The semantic fill system leverages the guidance attention map to generate a sparse attention map of a second region of the digital image. A semantic fill of pixels is generated for the first region based on the semantic label and the sparse attention map. The edited digital image is displayed in a user interface.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06T 3/40 - Scaling of a whole image or part thereof
44.
EXTRAPOLATING PANORAMAS FROM IMAGES USING A GENERATIVE MODEL
Embodiments are disclosed for generating 360-degree panoramas from input narrow field of view images. A method of generating 360-degree panoramas may include obtaining an input image and a guide, generating a panoramic projection of the input image, and generating, by a panorama generator, a 360-degree panorama based on the panoramic projection and the guide, wherein the panorama generator is a guided co-modulation generator network trained to generate a 360-degree panorama from the input image based on the guide.
An improved analytics system generates actionable KPI-based customer segments. The analytics system determines predicted outcomes for a key performance indicator (KPI) of interest and a contribution value for each variable indicating the extent to which that variable contributes to the predicted outcomes. Topics are generated by applying a topic model to the contribution values for the variables. Each topic comprises a group of variables with a contribution level indicating the importance of each variable to the topic. User segments are generated by assigning each user to a topic based on attribution levels output by the topic model.
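A short sketch of the segmentation step, assuming non-negative matrix factorization as the topic model and synthetic contribution values (in practice these might be SHAP-style attributions from the KPI model):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Hypothetical contribution values: how much each variable (column) drove
# each user's (row's) predicted KPI outcome.
contributions = np.abs(rng.normal(size=(500, 12)))
contributions[:250, :4] *= 4.0   # one behavioural pattern
contributions[250:, 8:] *= 4.0   # a different pattern

# Topic model: H holds each topic's contribution level per variable,
# W holds each user's attribution level per topic.
nmf = NMF(n_components=2, random_state=0, max_iter=500)
W = nmf.fit_transform(contributions)
H = nmf.components_

for t, row in enumerate(H):
    top = np.argsort(row)[::-1][:3]
    print(f"topic {t}: most important variables {top.tolist()}")

# Segment users by their highest-attribution topic.
segments = W.argmax(axis=1)
print("segment sizes:", np.bincount(segments).tolist())
```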
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate three-dimensional hybrid mesh-volumetric representations for digital objects. For instance, in one or more embodiments, the disclosed systems generate a mesh for a digital object from a plurality of digital images that portray the digital object using a multi-view stereo model. Additionally, the disclosed systems determine a set of sample points for a thin volume around the mesh. Using a neural network, the disclosed systems further generate a three-dimensional hybrid mesh-volumetric representation for the digital object utilizing the set of sample points for the thin volume and the mesh.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for detecting changes to a point of interest between a selected version and a previous version of a digital image and providing a summary of the changes to the point of interest. For example, the disclosed system provides for display a selected version of a digital image and detects a point of interest within the selected version of the digital image. The disclosed system determines image modifications to the point of interest (e.g., tracks changes to the point of interest) to generate a summary of the image modifications. Moreover, the summary can indicate further information concerning image modifications applied to the selected point of interest, such as timestamp, editor, or author information.
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate inpainted digital images utilizing a cascaded modulation inpainting neural network. For example, the disclosed systems utilize a cascaded modulation inpainting neural network that includes cascaded modulation decoder layers. For example, in one or more decoder layers, the disclosed systems start with global code modulation that captures the global-range image structures followed by an additional modulation that refines the global predictions. Accordingly, in one or more implementations, the image inpainting system provides a mechanism to correct distorted local details. Furthermore, in one or more implementations, the image inpainting system leverages fast Fourier convolution blocks within different resolution layers of the encoder architecture to expand the receptive field of the encoder and to allow the network encoder to better capture global structure.
42 - Scientific, technological and industrial services, research and design
Goods & Services
Marketing consulting services, namely, providing marketing services for designing, developing and increasing the effectiveness of digital marketing and improving customer engagement; business consulting services, namely, consultation regarding marketing and revenue performance Platform as a service (PaaS) featuring computer software platforms for marketing automation; software as a service (SaaS), namely, providing lead generation and lead management software, and software for account-based marketing; providing online, non-downloadable software for coordinating and automating cross-channel marketing campaigns; providing online non-downloadable software for use in delivering, targeting, analyzing and optimizing digital advertising; software as a service (SaaS) services featuring marketing analytics, marketing attribution and sales intelligence software; providing online non-downloadable software for providing sales insights, sales intelligence and sales engagement; providing online non-downloadable software using data science, namely, artificial intelligence (AI), machine learning, deep learning, statistical learning and data mining, for contextual prediction, personalization, predictive modeling, and content intelligence; software as a service (SaaS) services featuring software facilitating the use of customer data and analytics to improve customer experience and marketing activities; providing online non-downloadable software using artificial intelligence and machine learning for automating and generating digital conversations; providing online non-downloadable computer chatbot software for automating conversations
50.
ITEM DISTRIBUTION AND SEARCH RANKING ACROSS LISTING PLATFORMS
A system leverages reinforcement learning techniques to determine distribution of items to listing platforms and search ranking rules for each listing platform. Using historical listing data regarding items listed at one or more listing platforms, a machine learning model generates item interaction data, and a reinforcement learning agent is initialized using the item interaction data. The reinforcement learning agent is trained to optimize a function for selecting item distributions and search ranking rules across listing platforms. At each epoch of a series of epochs, the function is used to select an action including a new distribution of items to listing platforms and new search ranking rules to use at each listing platform. After the action from an epoch is implemented, the reinforcement learning agent updates the function, for instance, based on an impact of the action.
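A deliberately simplified sketch of the epoch loop in the entry above, with an epsilon-greedy bandit standing in for the full reinforcement-learning agent and a simulated reward standing in for observed platform impact; the action space shown is a toy assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint action space: which platform lists an item, crossed
# with which search-ranking rule that platform applies.
distributions = ["platform_A", "platform_B"]
ranking_rules = ["by_recency", "by_engagement"]
actions = [(d, r) for d in distributions for r in ranking_rules]

# Value estimates for the function the agent optimizes; epsilon-greedy
# selection stands in for the full reinforcement-learning policy.
q = np.zeros(len(actions))
counts = np.zeros(len(actions))
true_reward = {a: rng.uniform(0.1, 0.9) for a in actions}  # simulated impact

for epoch in range(500):
    if rng.random() < 0.1:            # explore a random action
        a = int(rng.integers(len(actions)))
    else:                             # exploit the best-known action
        a = int(q.argmax())
    # Implement the action for one epoch and observe its impact (here
    # simulated; in practice, item interactions on the listing platform).
    reward = true_reward[actions[a]] + rng.normal(0, 0.05)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]   # incremental value update

best = actions[int(q.argmax())]
print(f"learned best action: distribute to {best[0]}, rank {best[1]}")
```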
Embodiments of the technology described herein are directed to a persona-specific navigation interface for a document. Initially, a user may select a persona associated with a document through a document navigation interface. A machine-learning model may identify an interest within a portion of the document. The interest may be mapped to the persona. A navigation interface that includes a navigable link to the portion of the document is generated and output for display. A user interaction with the navigable link is received. In response to the interaction, the portion of the document corresponding to the navigable link is output for display.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for removing objects from an image stream at capture time of a digital image. For example, the disclosed system contemporaneously detects and segments objects from a digital image stream being previewed in a camera viewfinder graphical user interface of a client device. The disclosed system removes selected objects from the image stream and fills a hole left by the removed object with a content aware fill. Moreover, the disclosed system displays the image stream with the removed object and content fill as the image stream is previewed by a user prior to capturing a digital image from the image stream.
Systems and methods for image processing are described. Embodiments of the present disclosure encode a content image and a style image using a machine learning model to obtain content features and style features, wherein the content image includes a first object having a first appearance attribute and the style image includes a second object having a second appearance attribute; align the content features and the style features to obtain a sparse correspondence map that indicates a correspondence between a sparse set of pixels of the content image and corresponding pixels of the style image; and generate a hybrid image based on the sparse correspondence map, wherein the hybrid image depicts the first object having the second appearance attribute.
G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
G06V 10/40 - Extraction of image or video features
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
54.
MAPPING OF UNLABELED DATA ONTO A TARGET SCHEMA VIA SEMANTIC TYPE DETECTION
Automatically mapping unlabeled input data onto a target schema via semantic type detection is described. The input data includes data elements that are structured as 2D table rows and columns forming cells. Each data element is included in a cell. The target schema includes a set of fields. Schema mapping includes mapping each column to one or more fields. More particularly, the fields are clustered into field clusters, where each field cluster includes one or more of the fields. Each column is automatically mapped to one of the field clusters of the set of field clusters. The mapping between schema fields and data columns is automatically performed based on appropriate pairings of the detected semantic types, where the semantic types are encoded in vector representations of the fields, the field clusters, and the data elements.
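A minimal sketch of the final mapping step in the entry above, assuming semantic types are already encoded as vectors for columns and field clusters; the vectors below are hand-made placeholders for learned encodings:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical semantic-type vectors: in practice these would come from an
# encoder over cell contents and field names; here they are hand-made.
field_clusters = {
    "person": np.array([0.9, 0.1, 0.0]),  # clusters fields like name, owner
    "date":   np.array([0.0, 0.9, 0.1]),  # clusters fields like created, due
    "amount": np.array([0.1, 0.0, 0.9]),  # clusters fields like price, total
}

columns = {
    "customer":   np.array([0.8, 0.2, 0.1]),
    "order_date": np.array([0.1, 0.8, 0.0]),
    "subtotal":   np.array([0.0, 0.1, 0.8]),
}

# Map each input column to the field cluster with the closest semantic type.
for col, vec in columns.items():
    best = max(field_clusters, key=lambda f: cosine(vec, field_clusters[f]))
    print(f"column '{col}' -> field cluster '{best}'")
```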
Embodiments are disclosed for real-time generative audio for brush and canvas interaction in digital drawing. The method may include receiving a user input and a selection of a tool for generating audio for a digital drawing interaction. The method may further include generating intermediary audio data based on the user input and the tool selection, wherein the intermediary audio data includes a pitch and a frequency. The method may further include processing, by a trained audio transformation model and through a series of one or more layers of the trained audio transformation model, the intermediary audio data. The method may further include adjusting the series of one or more layers of the trained audio transformation model to include one or more additional layers to produce an adjusted audio transformation model. The method may further include generating, by the adjusted audio transformation model, an audio sample based on the intermediary audio data.
G06F 3/04883 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
G06F 3/04842 - Selection of displayed objects or displayed text elements
56.
AUTO-GENERATING VIDEO TO ILLUSTRATE A PROCEDURAL DOCUMENT
Systems and methods for video processing are described. Embodiments of the present disclosure receive a procedural document comprising a plurality of instructions; extract a plurality of key concepts for an instruction of the plurality of instructions; compute an information coverage distribution for each of a plurality of candidate multi-media assets, wherein the information coverage distribution indicates whether a corresponding multi-media asset relates to each of the plurality of key concepts; select a set of multi-media assets for the instruction based on the information coverage distribution; and generate a multi-media presentation describing the procedural document by combining the set of multi-media assets based on a presentation template.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
G06F 16/41 - Indexing; Data structures therefor; Storage structures
G06N 5/00 - Computing arrangements using knowledge-based models
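A small sketch of the asset-selection step from this entry, treating the information coverage distribution as a per-concept score vector and using greedy coverage maximization; the concepts, scores, and stopping rule are illustrative assumptions:

```python
# Hypothetical key concepts extracted from one instruction, plus each
# candidate asset's information coverage distribution over those concepts.
key_concepts = ["whisk eggs", "medium heat", "non-stick pan"]
coverage = {
    "clip_01.mp4":  [0.9, 0.1, 0.0],
    "clip_02.mp4":  [0.0, 0.8, 0.7],
    "image_03.png": [0.4, 0.3, 0.2],
}

def marginal_gain(asset, covered):
    # How much adding this asset raises total concept coverage.
    return sum(max(d, c) for d, c in zip(coverage[asset], covered)) - sum(covered)

# Greedily pick assets until every key concept is adequately covered.
selected, covered = [], [0.0] * len(key_concepts)
while min(covered) < 0.5 and len(selected) < len(coverage):
    best = max((a for a in coverage if a not in selected),
               key=lambda a: marginal_gain(a, covered))
    selected.append(best)
    covered = [max(d, c) for d, c in zip(coverage[best], covered)]

print("assets selected for this instruction:", selected)
# These assets would then be combined via the presentation template.
```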
57.
UTILIZING MACHINE LEARNING MODELS TO PROCESS LOW-RESULTS WEB QUERIES AND GENERATE WEB ITEM DEFICIENCY PREDICTIONS AND CORRESPONDING USER INTERFACES
Methods, systems, and non-transitory computer readable media are disclosed for utilizing machine learning models to extract digital signals from low-results web queries and generate item demand deficiency predictions for digital item lists corresponding to websites. In one or more embodiments, the disclosed systems identify a low-results query submitted by client devices navigating a website. The disclosed systems generate features for the low-results query and the digital item list to generate a deficiency prediction relative to demand indicated by the low-results query. In some embodiments, the disclosed system utilizes a deficiency prediction model to process the extracted signals and generate a deficiency confidence score corresponding to the low-results query. Based on the deficiency confidence score, the disclosed system can generate and provide demand notifications via one or more graphical user interfaces.
In implementations of augmented reality systems for comparing physical objects, a computing device implements a comparison system to detect physical objects and physical markers depicted in frames of a digital video captured using an image capture device and displayed in a user interface. The comparison system associates a physical object of the physical objects with a physical marker of the physical markers based on an association distance estimated using two-dimensional coordinates of the user interface corresponding to a center of the physical object and a distance from the image capture device to the physical marker. Characteristics of the physical object are determined that are not displayed in the user interface based on an identifier of the physical marker. The comparison system generates a virtual object for display in the user interface that includes indications of a subset of the characteristics of the physical object.
An image inpainting system is described that receives an input image that includes a masked region. From the input image, the image inpainting system generates a synthesized image that depicts an object in the masked region by selecting a first code that represents a known factor characterizing a visual appearance of the object and a second code that represents an unknown factor characterizing the visual appearance of the object apart from the known factor in latent space. The input image, the first code, and the second code are provided as input to a generative adversarial network that is trained to generate the synthesized image using contrastive losses. Different synthesized images are generated from the same input image using different combinations of first and second codes, and the synthesized images are output for display.
In implementations of music enhancement systems, a computing device implements an enhancement system to receive input data describing a recorded acoustic waveform of a musical instrument. The recorded acoustic waveform is represented as an input mel spectrogram. The enhancement system generates an enhanced mel spectrogram by processing the input mel spectrogram using a first machine learning model trained on a first type of training data to generate enhanced mel spectrograms based on input mel spectrograms. An acoustic waveform of the musical instrument is generated by processing the enhanced mel spectrogram using a second machine learning model trained on a second type of training data to generate acoustic waveforms based on mel spectrograms. The acoustic waveform of the musical instrument does not include an acoustic artifact that is included in the recorded waveform of the musical instrument.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
G10H 1/06 - Circuits for establishing the harmonic content of tones
Embodiments are disclosed for a machine learning-based chroma keying process. The method may include receiving an input including an image depicting a chroma key scene and a color value corresponding to a background color of the image. The method may further include generating a preprocessed image by concatenating the image and the color value. The method may further include providing the preprocessed image to a trained neural network. The method may further include generating, using the trained neural network, an alpha matte representation of the image based on the preprocessed image.
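The preprocessing step above is easy to make concrete. A sketch, assuming the concatenation is channel-wise and the downstream network consumes a six-channel input; names and shapes are illustrative:

```python
import numpy as np

def preprocess_for_keying(image: np.ndarray, key_color: tuple) -> np.ndarray:
    """Concatenate the background key color onto the image as extra
    channels, so the network knows which color to treat as background.

    image: H x W x 3 float array in [0, 1]; key_color: (r, g, b) in [0, 1].
    Returns an H x W x 6 array for the matting network (hypothetical).
    """
    h, w, _ = image.shape
    color_plane = np.broadcast_to(
        np.asarray(key_color, dtype=image.dtype), (h, w, 3))
    return np.concatenate([image, color_plane], axis=-1)

frame = np.random.default_rng(0).random((720, 1280, 3))
net_input = preprocess_for_keying(frame, key_color=(0.0, 0.8, 0.0))  # green
print(net_input.shape)  # (720, 1280, 6) -> fed to the trained neural network
```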
The present disclosure relates to systems, non-transitory computer-readable media, and methods for combining digital images. In particular, in one or more embodiments, the disclosed systems combine latent codes of a source digital image and a target digital image utilizing a blending network to determine a combined latent encoding and generate a combined digital image from the combined latent encoding utilizing a generative neural network. In some embodiments, the disclosed systems determine an intersection face mask between the source digital image and the combined digital image utilizing a face segmentation network and combine the source digital image and the combined digital image utilizing the intersection face mask to generate a blended digital image.
A generative neural network control system controls a generative neural network by modifying the intermediate latent space in the generative neural network. The generative neural network includes multiple layers each generating a set of activation values. An initial layer (and optionally additional layers) receives an input latent vector, and a final layer outputs an image generated based on the input latent vector. The data that is input to each layer (other than the initial layer) is referred to as data in an intermediate latent space. The data in the intermediate latent space includes activation values (e.g., generated by the previous layer or modified using various techniques) and optionally a latent vector. The generative neural network control system modifies the intermediate latent space to achieve various different effects when generating a new image.
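A PyTorch sketch of the control mechanism this entry describes, using forward hooks to edit activation values in the intermediate latent space of a stand-in generator; the tiny MLP generator and the scaling edit are placeholders for a real synthesis network and its modifications:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in generator: each layer produces activation values; the final
# layer outputs the "image". A real GAN synthesis network would be
# substituted here.
generator = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3 * 8 * 8),
)

def scale_activations(factor):
    # Hook that edits the intermediate latent space in flight; returning a
    # value from a forward hook replaces the layer's output.
    def hook(module, inputs, output):
        return output * factor
    return hook

z = torch.randn(1, 16)          # input latent vector
baseline = generator(z)

# Register the modification on a middle layer and regenerate: the same
# input latent vector now yields a different image, because the
# intermediate latent space was changed.
handle = generator[2].register_forward_hook(scale_activations(1.5))
edited = generator(z)
handle.remove()

print("output changed:", not torch.allclose(baseline, edited))
```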
Systems and methods for machine learning are described. Embodiments of the present disclosure receive state information that describes a state of a decision making agent in an environment; compute an action vector from an action embedding space based on the state information using a policy neural network of the decision making agent, wherein the policy neural network is trained using reinforcement learning based on a topology loss that constrains changes in a mapping between an action set and the action embedding space; and perform an action that modifies the state of the decision making agent in the environment based on the action vector, wherein the action is selected based on the mapping.
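One plausible reading of the topology loss, sketched here as an assumption rather than the disclosed formulation, is a penalty on drift in the pairwise-distance structure of the action embeddings between training steps, which keeps the mapping from the action set into the embedding space stable:

import torch

def topology_loss(emb_new: torch.Tensor, emb_old: torch.Tensor) -> torch.Tensor:
    """emb_*: (num_actions, dim) action embeddings after/before an update."""
    d_new = torch.cdist(emb_new, emb_new)  # pairwise distances now
    d_old = torch.cdist(emb_old, emb_old)  # pairwise distances before
    return ((d_new - d_old) ** 2).mean()   # penalize changes in structure

emb_old = torch.randn(10, 16)
emb_new = emb_old + 0.01 * torch.randn(10, 16)
print(topology_loss(emb_new, emb_old))     # small drift -> small loss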
The present disclosure relates to systems, non-transitory computer-readable media, and methods for incorporating unobserved behaviors when generating user segments or predictions of future user actions. In particular, in one or more embodiments, the disclosed systems utilize a deep learning-based clustering algorithm that segments the behavioral history of users based on a future outcome. Further, the disclosed systems recognize that users may exhibit behaviors that represent two or more segments and allow for targeted marketing to users based on the user’s inclusion in multiple segments.
Embodiments are disclosed for using a neural network to optimize the filter weights of an adaptive filter. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving, by a filter, an input audio signal, wherein the input audio signal is a far-end audio signal, the filter including a transfer function with adaptable filter weights, generating a response audio signal modeling the input audio signal passing through an acoustic environment, receiving a target response signal, including the input audio signal and near-end audio signals, calculating an adaptive filter loss, generating, by a trained recurrent neural network, a filter weight update using the calculated adaptive filter loss, updating the adaptable filter weights of the transfer function to create an updated transfer function, generating an updated response audio signal based on the updated transfer function, and providing the updated response audio signal as an output audio signal.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
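A compact sketch of the loop in the preceding abstract, with a GRU cell standing in for the trained recurrent network and the loss gradient standing in for the adaptive filter loss signal; shapes and the update rule are illustrative assumptions:

import torch
import torch.nn as nn

taps = 16
weights = torch.zeros(taps, requires_grad=True)  # adaptable filter weights
updater = nn.GRUCell(taps, taps)                 # stand-in for the trained RNN
hidden = torch.zeros(1, taps)

far_end = torch.randn(1024)                      # far-end input audio signal
target = torch.randn(1024)                       # target: near-end + echo mixture

for step in range(20):
    frame = far_end[step * taps : (step + 1) * taps]
    response = torch.dot(weights, frame)         # modeled response signal
    loss = (target[(step + 1) * taps - 1] - response) ** 2
    grad, = torch.autograd.grad(loss, weights)   # adaptive filter loss signal
    hidden = updater(grad.unsqueeze(0), hidden).detach()
    with torch.no_grad():
        weights += hidden.squeeze(0)             # RNN-predicted weight update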
Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
Methods and systems disclosed herein relate generally to systems and methods for analyzing various stroke properties determined from strokes inputted by a user to generate a new glyph set for rendering type characters. A font-generating application receives, via a stroke input on a typographic layer presented on a user interface, strokes that trace a visual appearance of a glyph set comprising one or more glyphs. The font-generating application determines stroke properties for the strokes. The font-generating application constructs a new glyph set from the stroke properties. The font-generating application applies the new glyph set to render, on a user interface, one or more type characters that match a visual appearance of the new glyph set.
An image differentiation system receives input feature vectors for multiple input images and reference feature vectors for multiple reference images. In some cases, the feature vectors are extracted by an image feature extraction module trained based on training image triplets. A differentiability scoring module determines a differentiability score for each input image based on a distance between the input feature vectors and the reference feature vectors. The distance for each reference feature vector is modified by a weighting factor based on interaction metrics associated with the corresponding reference image. In some cases, an input image is identified as a differentiated image based on the corresponding differentiability score. Additionally or alternatively, an image modification module determines an image modification that increases the differentiability score of the input image. The image modification module generates a recommended image by applying the image modification to the input image.
G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
G06V 10/40 - Extraction of image or video features
G06F 18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
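The scoring step of the preceding abstract reduces to an interaction-weighted distance. A sketch with illustrative names, where view counts stand in for the interaction metrics:

import numpy as np

def differentiability_score(input_vec, reference_vecs, interaction_weights):
    """input_vec: (d,); reference_vecs: (n, d); interaction_weights: (n,)."""
    distances = np.linalg.norm(reference_vecs - input_vec, axis=1)
    weights = interaction_weights / interaction_weights.sum()
    return float((weights * distances).sum())  # larger = more differentiated

rng = np.random.default_rng(0)
refs = rng.normal(size=(100, 512))                       # reference features
views = rng.integers(1, 1000, size=100).astype(float)    # interaction metrics
score = differentiability_score(rng.normal(size=512), refs, views)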
Embodiments are disclosed for interlacing vector objects. A method of interlacing vector objects may include receiving a selection of a first vector object of an image. The method may further include detecting a second vector object of the image, wherein the second vector object is different than the first vector object. The method may further include determining a first depth position for the first vector object and a second depth position for the second vector object. The method may further include interlacing the second vector object and the first vector object, wherein interlacing comprises drawing the first vector object based on the first depth position and the second vector object based on the second depth position.
Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
Systems and methods for customer journey optimization are described. One or more aspects of the systems and methods include displaying a workflow canvas including a representation of a customer journey corresponding to a digital content channel, wherein the customer journey comprises an ordered sequence of event definitions; displaying the digital content channel within the customer journey user interface; monitoring user interactions with the digital content channel; receiving an event payload generated in response to a user interaction with the digital content channel based on the monitoring, wherein the event payload comprises event data describing the user interaction; generating an event definition based on the event data from the event payload, wherein the event definition defines a category of user interaction events on the digital content channel; adding the event definition to the customer journey; and displaying a representation of the customer journey including a visual representation of the added event definition.
Systems and methods for customer journey orchestration are described. One or more aspects of the systems and methods include identifying, by a customer journey orchestration application, a customer journey having a previously unidentified fault; initiating, by a mode selection component, a debug mode of the customer journey orchestration application for the customer journey; receiving, by a graphical user interface of the customer journey orchestration application, a user input corresponding to an event of a plurality of events of the customer journey; simulating, by an event simulation component, the event based on the user input and the debug mode; determining, by a status component, a status of the event based on the simulation; and identifying, by a fault identification component, the previously unidentified fault based on the status of the event.
Systems and methods for image processing are described. Embodiments of the present disclosure identify a plurality of candidate concepts in a knowledge graph (KG) that correspond to an image tag of an image; generate an image embedding of the image using a multi-modal encoder; generate a concept embedding for each of the plurality of candidate concepts using the multi-modal encoder; select a matching concept from the plurality of candidate concepts based on the image embedding and the concept embedding; and generate association data between the image and the matching concept.
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
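The selection step in the preceding abstract amounts to nearest-neighbor matching in a shared embedding space. In this sketch, random vectors stand in for embeddings from the multi-modal encoder (assumed here to be CLIP-style):

import numpy as np

def select_matching_concept(image_emb, concept_embs, concepts):
    """Pick the candidate concept most similar to the image (cosine similarity)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    concept_embs = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = concept_embs @ image_emb
    return concepts[int(np.argmax(sims))]

concepts = ["jaguar (animal)", "Jaguar (car)", "jaguar (guitar)"]
image_emb = np.random.default_rng(1).normal(size=256)       # image embedding
concept_embs = np.random.default_rng(2).normal(size=(3, 256))  # concept embeddings
print(select_matching_concept(image_emb, concept_embs, concepts))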
75.
SEMANTIC STRUCTURE IDENTIFICATION FOR DOCUMENT AUTOSTYLING
Systems and methods for natural language processing are described. Embodiments of the present disclosure receive plain text comprising a sequence of text entities; generate a sequence of entity embeddings based on the plain text, wherein each entity embedding in the sequence of entity embeddings is generated based on a text entity in the sequence of text entities; generate style information for the text entity based on the sequence of entity embeddings; and generate a document based on the style information.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate refined depth maps of digital images utilizing digital segmentation masks. In particular, in one or more embodiments, the disclosed systems generate a depth map for a digital image utilizing a depth estimation machine learning model, determine a digital segmentation mask for the digital image, and generate a refined depth map from the depth map and the digital segmentation mask utilizing a depth refinement machine learning model. In some embodiments, the disclosed systems generate first and second intermediate depth maps using the digital segmentation mask and an inverse digital segmentation mask and merge the first and second intermediate depth maps to generate the refined depth map.
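A minimal sketch of the merge step, assuming the refinement model is applied once with the segmentation mask and once with its inverse, and the two intermediate maps are blended by the mask (the refine callable is an identity stub, not the disclosed model):

import numpy as np

def merge_depth(depth, mask, refine):
    """depth: (H, W); mask: (H, W) in [0, 1]; refine: refinement placeholder."""
    fg = refine(depth, mask)         # intermediate map for the masked region
    bg = refine(depth, 1.0 - mask)   # intermediate map for the inverse mask
    return mask * fg + (1.0 - mask) * bg  # refined depth map

depth = np.random.rand(4, 4)
mask = (np.random.rand(4, 4) > 0.5).astype(float)
refined = merge_depth(depth, mask, refine=lambda d, m: d)  # identity stub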
Digital image synthesis techniques are described that leverage splatting, i.e., forward warping. In one example, a first digital image and a first optical flow are received by a digital image synthesis system. A first splat metric and a first merge metric, which define a weighted map of respective pixels, are constructed by the digital image synthesis system. From this, the digital image synthesis system produces a first warped optical flow and a first warp merge metric corresponding to an interpolation instant by forward warping the first optical flow based on the first splat metric and the first merge metric. A first warped digital image corresponding to the interpolation instant is formed by the digital image synthesis system by backward warping the first digital image based on the first warped optical flow.
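A simplified, nearest-pixel sketch of forward warping with a merge weight: each source pixel is scattered to its flow target and overlapping contributions are blended by the accumulated weights. This illustrates splatting in general, not the disclosed metrics:

import numpy as np

def splat(image, flow, weight):
    """image: (H, W); flow: (H, W, 2) forward optical flow; weight: (H, W)."""
    h, w = image.shape
    out = np.zeros((h, w))
    acc = np.zeros((h, w))  # accumulated merge weights
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    ty = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    np.add.at(out, (ty, tx), weight * image)  # splat weighted pixel values
    np.add.at(acc, (ty, tx), weight)          # accumulate weights at targets
    return np.divide(out, acc, out=np.zeros_like(out), where=acc > 0)

img = np.random.rand(8, 8)
flow = np.zeros((8, 8, 2)); flow[..., 0] = 1.5   # shift right by ~1.5 px
warped = splat(img, flow, weight=np.ones((8, 8)))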
In implementations of systems for single image reflection removal, a computing device implements a removal system to receive data describing a digital image that depicts light reflected by a surface and light transmitted through the surface. The removal system predicts an edge map of a transmitted image for the light transmitted through the surface by processing the data using a first machine learning model trained on a first type of training data. A reflected component is predicted for the light reflected by the surface by processing the data using a second machine learning model trained on a second type of training data. A corrected digital image is generated that does not depict the light reflected by the surface based on the data, the edge map of the transmitted image, and the reflected component.
A model training system is described that obtains a training dataset including videos and text labels. The model training system generates a video-text classification model by causing a model having a dual image text encoder architecture to predict which of the text labels describes each video in the training dataset. Predictions output by the model are compared to the training dataset to determine distillation and contrastive losses, which are used to adjust internal weights of the model during training. The internal weights of the model are then combined with internal weights of a trained image-text classification model to generate the video-text classification model. The video-text classification model is configured to generate a video or text output that classifies a video or text input.
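The final combination of internal weights can be sketched as a parameter-wise interpolation of state dictionaries; the interpolation coefficient is an assumption, since the abstract says only that the weights are combined:

import torch

def combine_state_dicts(video_sd, image_sd, alpha=0.5):
    """Parameter-wise interpolation of two models' weights."""
    return {k: alpha * video_sd[k] + (1 - alpha) * image_sd[k]
            for k in video_sd}

video_sd = {"proj.weight": torch.randn(8, 8)}  # fine-tuned video-text weights
image_sd = {"proj.weight": torch.randn(8, 8)}  # pretrained image-text weights
merged = combine_state_dicts(video_sd, image_sd, alpha=0.7)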
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate composite images via auto-compositing features. For example, in one or more embodiments, the disclosed systems determine a background image and a foreground object image for use in generating a composite image. The disclosed systems further provide, for display within a graphical user interface of a client device, at least one selectable option for executing an auto-composite model for the composite image, the auto-composite model comprising at least one of a scale prediction model, a harmonization model, or a shadow generation model. The disclosed systems detect, via the graphical user interface, a user selection of the at least one selectable option and generate, in response to detecting the user selection, the composite image by executing the auto-composite model using the background image and the foreground object image.
Automatic font synthesis for modifying a local font to have an appearance that is visually similar to a source font is described. A font modification system receives an electronic document including the source font together with an indication of a font descriptor for the source font. The font descriptor includes information describing various font attributes for the source font, which define a visual appearance of the source font. Using the source font descriptor, the font modification system identifies a local font that is visually similar in appearance to the source font by comparing local font descriptors to the source font descriptor. A visually similar font is then synthesized by modifying glyph outlines of the local font to achieve the visual appearance defined by the source font descriptor. The synthesized font is then used to replace the source font and output in the electronic document at the computing device.
G06V 30/244 - Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
82.
RECOMMENDING OBJECTS FOR IMAGE COMPOSITION USING A GEOMETRY-AND-LIGHTING AWARE NEURAL NETWORK
The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize artificial intelligence to learn to recommend foreground object images for use in generating composite images based on geometry and/or lighting features. For instance, in one or more embodiments, the disclosed systems transform a foreground object image corresponding to a background image using at least one of a geometry transformation or a lighting transformation. The disclosed systems further generate predicted embeddings for the background image, the foreground object image, and the transformed foreground object image within a geometry-lighting-sensitive embedding space utilizing a geometry-lighting-aware neural network. Using a loss determined from the predicted embeddings, the disclosed systems update parameters of the geometry-lighting-aware neural network. The disclosed systems further provide a variety of efficient user interfaces for generating composite digital images.
Digital synthesis techniques are described to synthesize a digital image at a target time between a first digital image and a second digital image. To begin, an optical flow generation module is employed to generate optical flows. The digital images and optical flows are then received as an input by a motion refinement system. The motion refinement system is configured to generate data describing many-to-many relationships mapped for pixels in the plurality of digital images and reliability scores of the many-to-many relationships. The reliability scores are then used to resolve overlaps of pixels that are mapped to a same location by a synthesis module to generate a synthesized digital image.
Embodiments are disclosed for user-guided variable-rate compression. A method of user-guided variable-rate compression includes receiving a request to compress an image, the request including the image, corresponding importance data, and a target bitrate, providing the image, the corresponding importance data, and the target bitrate to a compression network, generating, by the compression network, a learned importance map and a representation of the image, and generating, by the compression network, a compressed representation of the image based on the learned importance map and the representation of the image.
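One common formulation of importance-guided variable-rate compression in the learned-compression literature, offered as a hedged sketch rather than the disclosed method, keeps more latent channels at spatial locations the learned importance map marks as important:

import numpy as np

def importance_mask(importance, channels):
    """importance: (H, W) in [0, 1] -> binary mask (C, H, W) keeping
    ceil(importance * C) channels at each spatial location."""
    keep = np.ceil(importance * channels).astype(int)   # channels kept per pixel
    channel_idx = np.arange(channels)[:, None, None]    # (C, 1, 1)
    return (channel_idx < keep[None]).astype(np.float32)

representation = np.random.randn(32, 16, 16)  # latent representation of the image
importance = np.random.rand(16, 16)           # learned importance map
compressed = representation * importance_mask(importance, 32)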
Systems and methods are described for rendering garments. The system includes a first machine learning model trained to generate coarse garment templates of a garment and a second machine learning model trained to render garment images. The first machine learning model generates a coarse garment template based on position data. The system produces a neural texture for the garment, the neural texture comprising a multi-dimensional feature map characterizing detail of the garment. The system provides the coarse garment template and the neural texture to the second machine learning model trained to render garment images. The second machine learning model generates a rendered garment image of the garment based on the coarse garment template of the garment and the neural texture.
09 - Scientific and electric apparatus and instruments
35 - Advertising and business services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Downloadable software for authenticating digital images; Downloadable image and multimedia files containing artwork, video, and photographs. Promoting the digital works of others by means of providing digital content authentication tools and providing information relating thereto; Promoting public interest and awareness of content authenticity and provenance. Providing online non-downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Authentication in the field of digital images, artwork, videos, and photographs; Providing a website featuring technology that enables users to demonstrate the authenticity and provenance of their works; providing temporary use of non-downloadable software for use in connection with authentication of images and multimedia files containing artwork, video, and photographs; Providing a website featuring technology that enables users to facilitate the authentication of digital artwork, images, and photographs; Development of voluntary standards for authentication of digital works featuring an online protocol for authenticating works of others for the reproduction and use of said material in digital formats and providing information relating thereto.
09 - Scientific and electric apparatus and instruments
35 - Advertising and business services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Downloadable software for authenticating digital images; Downloadable image and multimedia files containing artwork, video, and photographs. Promoting the digital works of others by means of providing digital content authentication tools and providing information relating thereto; Promoting public interest and awareness of content authenticity and provenance. Providing online non-downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Authentication in the field of digital images, artwork, videos, and photographs; Providing a website featuring technology that enables users to demonstrate the authenticity and provenance of their works; providing temporary use of non-downloadable software for use in connection with authentication of images and multimedia files containing artwork, video, and photographs; Providing a website featuring technology that enables users to facilitate the authentication of digital artwork, images, and photographs; Development of voluntary standards for authentication of digital works featuring an online protocol for authenticating works of others for the reproduction and use of said material in digital formats and providing information relating thereto.
09 - Scientific and electric apparatus and instruments
35 - Advertising and business services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Downloadable software for authenticating digital images; Downloadable image and multimedia files containing artwork, video, and photographs. Promoting the digital works of others by means of providing digital content authentication tools and providing information relating thereto; Promoting public interest and awareness of content authenticity and provenance. Providing online non-downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Authentication in the field of digital images, artwork, videos, and photographs; Providing a website featuring technology that enables users to demonstrate the authenticity and provenance of their works; Providing online non-downloadable image and multimedia files containing artwork, video, and photographs; Providing a website featuring technology that enables users to facilitate the authentication of digital artwork, images, and photographs; Development of voluntary standards for authentication of digital works featuring an online protocol for authenticating works of others for the reproduction and use of said material in digital formats and providing information relating thereto.
09 - Scientific and electric apparatus and instruments
35 - Advertising and business services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Downloadable software for authenticating digital images; Downloadable image and multimedia files containing artwork, video, and photographs. Promoting the digital works of others by means of providing digital content authentication tools and providing information relating thereto; Promoting public interest and awareness of content authenticity and provenance. Providing online non-downloadable software for use by creators, artists, authors, and owners of digital works to demonstrate the authenticity and provenance of their works; Authentication in the field of digital images, artwork, videos, and photographs; Providing a website featuring technology that enables users to demonstrate the authenticity and provenance of their works; Providing online non-downloadable image and multimedia files containing artwork, video, and photographs; Providing a website featuring technology that enables users to facilitate the authentication of digital artwork, images, and photographs; Development of voluntary standards for authentication of digital works featuring an online protocol for authenticating works of others for the reproduction and use of said material in digital formats and providing information relating thereto.
91.
GENERATING AND MODIFYING DIGITAL IMAGES USING A JOINT FEATURE STYLE LATENT SPACE OF A GENERATIVE NEURAL NETWORK
The present disclosure relates to systems, non-transitory computer-readable media, and methods for latent-based editing of digital images using a generative neural network. In particular, in one or more embodiments, the disclosed systems perform latent-based editing of a digital image by mapping a feature tensor and a set of style vectors for the digital image into a joint feature style space. In one or more implementations, the disclosed systems apply a joint feature style perturbation and/or modification vectors within the joint feature style space to determine modified style vectors and a modified feature tensor. Moreover, in one or more embodiments the disclosed systems generate a modified digital image utilizing a generative neural network from the modified style vectors and the modified feature tensor.
The present disclosure relates to systems, methods, and non-transitory computer readable media for generating painted digital images utilizing an intelligent painting process that includes progressive layering, sequential brushstroke guidance, and/or brushstroke regularization. For example, the disclosed systems utilize an image painting model to perform progressive layering to generate and apply digital brushstrokes in a progressive fashion for different layers associated with a background canvas and foreground objects. In addition, the disclosed systems utilize sequential brushstroke guidance to generate painted foreground objects by sequentially shifting through attention windows for regions of interest in a target digital image. Furthermore, the disclosed systems utilize brushstroke regularization to generate and apply an efficient brushstroke sequence to generate a painted digital image.
An item recommendation system receives a set of recommendable items and a request to select, from the set of recommendable items, a contrast group. The item recommendation system selects a contrast group from the set of recommendable items by applying an image modification model to the set of recommendable items. The image modification model includes an item selection model configured to determine an unbiased conversion rate for each item of the set of recommendable items and select a recommended item from the set of recommendable items having a greatest unbiased conversion rate. The image modification model includes a contrast group selection model configured to select, for the recommended item, a contrast group comprising the recommended item and one or more contrast items. The item recommendation system transmits the contrast group responsive to the request.
In implementations of systems for accessible digital painting, a computing device implements a landmark system to receive input data describing a coordinate of a first type of user interaction in a user interface of a digital canvas. The landmark system determines that the coordinate of the first type of user interaction is within a threshold distance of a coordinate of a digital landmark that corresponds to a visual feature of a visual layer of the digital canvas. Feedback is generated that indicates the coordinate of the first type of user interaction is within the threshold distance of the coordinate of the digital landmark. Additional input data is received describing a coordinate of a second type of user interaction in the user interface based on the feedback. The landmark system generates a stroke of digital paint for display in the user interface based on the additional input data.
G06F 3/04883 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
An effective stock keeping unit (SKU) management system encodes catalog data into an embedding per catalog item, creating an embedding space. The embedding is created by generating an index, where the number of rows represents the number of catalog items and the number of columns represents the number of fields associated with each catalog item. The index is then denormalized using customer groups and transformed by compressing the number of columns to create the embedding space. In some configurations, a machine learning model is trained using catalog data. Item similarity is encoded by clustering catalog SKUs into groups, placing similarly related items close to each other in the embedding space. Catalog items are then searched for in the embedding space, with the closest clusters searched for a particular catalog item.
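A minimal sketch of the index-to-embedding-to-cluster flow using scikit-learn, with TruncatedSVD approximating the column compression and KMeans providing the clusters searched during lookup; sizes and transforms are illustrative assumptions:

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 40))      # rows: catalog items, cols: fields

svd = TruncatedSVD(n_components=8, random_state=0)
embeddings = svd.fit_transform(index)    # compressed embedding space

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(embeddings)

query = embeddings[42]                   # look up a particular catalog item
cluster = kmeans.predict(query[None])[0]                 # closest cluster
members = np.flatnonzero(kmeans.labels_ == cluster)      # items searched first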
Systems, methods, and computer storage media are disclosed for predicting visual compatibility between a bundle of catalog items (e.g., a partial outfit) and a candidate catalog item to add to the bundle. Visual compatibility prediction may be jointly conditioned on item type, context, and style by determining a first compatibility score jointly conditioned on type (e.g., category) and context, determining a second compatibility score conditioned on outfit style, and combining the first and second compatibility scores into a unified visual compatibility score. A unified visual compatibility score may be determined for each of a plurality of candidate items, and the candidate item with the highest unified visual compatibility score may be selected to add to the bundle (e.g., fill in the blank for the partial outfit).
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
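The combination step in the preceding abstract can be sketched as a weighted merge of the two conditional scores; the weighted sum is an assumption, since the abstract says only that the scores are combined:

def unified_score(type_context_score, style_score, weight=0.5):
    """Merge the type/context score and the style score into one score."""
    return weight * type_context_score + (1 - weight) * style_score

# Candidate items mapped to (type/context score, style score).
candidates = {"loafers": (0.8, 0.6), "sandals": (0.7, 0.9)}
best = max(candidates, key=lambda c: unified_score(*candidates[c]))
print(best)  # candidate with the highest unified compatibility score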
Disclosed are computer-implemented methods and systems for generating text descriptive of digital images, comprising using a machine learning model to pre-process an image to generate initial text descriptive of the image; adjusting one or more inferences of the machine learning model, the inferences biasing the machine learning model away from associating negative words with the image; using the machine learning model comprising the adjusted inferences to post-process the image to generate updated text descriptive of the image; and processing the generated updated text descriptive of the image outputted by the machine learning model to fine-tune the updated text descriptive of the image.
Systems and methods for face annotation are described. One or more of the systems and methods include receiving a plurality of annotated images, wherein each annotated image of the annotated images comprises a caption; cropping the annotated image based on a face detection algorithm to obtain a face crop; comparing the face crop to the caption corresponding to the annotated image to obtain a caption similarity score; and filtering the plurality of annotated images based on the caption similarity score to obtain a plurality of annotated face images.
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
Systems and methods for object tracking are described. One or more aspects of the systems and methods include receiving a video depicting an object; generating object tracking information for the object using a student network, wherein the student network is trained in a second training phase based on a teacher network using an object tracking training set and a knowledge distillation loss that is based on an output of the student network and the teacher network, and wherein the teacher network is trained in a first training phase using an object detection training set that is augmented with object tracking supervision data; and transmitting the object tracking information in response to receiving the video.
G06T 3/40 - Scaling of a whole image or part thereof
G06T 3/00 - Geometric image transformation in the plane of the image
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
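A minimal sketch of the distillation term used in the second training phase of the preceding abstract: the student is trained to match the teacher's softened outputs via KL divergence. Temperature and loss form follow common convention and are not taken from the disclosure:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Knowledge distillation loss between student and teacher outputs."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(4, 10, requires_grad=True)  # student network outputs
teacher = torch.randn(4, 10)                      # teacher network outputs
loss = distillation_loss(student, teacher)
loss.backward()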
The present disclosure relates to systems, methods, and non-transitory computer-readable media that recommend application features of software applications based on in-application behavior and provide the recommendations within a dynamically updating graphical user interface. For instance, in one or more embodiments, the disclosed systems utilize behavioral signals reflecting the behavior of a user with respect to one or more software applications to recommend application features of the software application(s). For example, in some cases, the disclosed systems recommend an application feature related to the user's recent activity, an application feature from a curated recommendation list that has yet to be viewed, and/or an application feature determined via machine learning. In some embodiments, the disclosed systems dynamically update a graphical user interface of a client device in real time as the user utilizes the client device to access and navigate the software application(s) to display these recommendations.