An electronic device generates training data to train a classifier to classify a respective search query as complete or incomplete, including: obtaining a first search query input by a first user; determining a media content item selected by the first user from the first search query; comparing metadata associated with the media content item with the first search query input by the first user; and labeling the first search query as complete or incomplete based on the comparison. The electronic device trains the classifier, using the generated training data, to classify a respective search query as complete or incomplete and uses the trained classifier to determine whether a second search query is complete or incomplete. The electronic device provides, for display, for a second user, one or more complete search queries as recommendations for a received search query, including the second search query if the second search query is complete.
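A minimal sketch of the labeling step described in the abstract above, assuming a simple token-overlap comparison between the query and the selected item's metadata; the function name, metadata fields, and the 0.8 threshold are illustrative assumptions, not taken from the patent.

```python
# Label a query "complete" when it closely matches the metadata of the item the
# user ultimately selected, and "incomplete" otherwise. The token-overlap
# comparison and threshold are illustrative stand-ins for the claimed comparison.

def label_query(query: str, selected_item_metadata: dict, threshold: float = 0.8) -> str:
    query_tokens = set(query.lower().split())
    metadata_tokens = set()
    for value in selected_item_metadata.values():
        metadata_tokens.update(str(value).lower().split())
    if not query_tokens:
        return "incomplete"
    overlap = len(query_tokens & metadata_tokens) / len(query_tokens)
    return "complete" if overlap >= threshold else "incomplete"

# The labeled pairs then become training data for the completeness classifier.
print(label_query("bohemian rhapsody", {"title": "Bohemian Rhapsody", "artist": "Queen"}))
print(label_query("bohem", {"title": "Bohemian Rhapsody", "artist": "Queen"}))
```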
The various implementations described herein include methods and devices for searching of audio content. In one aspect, a method includes obtaining a query string for audio content and obtaining a plurality of audio content results corresponding to the query string. The method further includes selecting a subset of results from the plurality of audio content results, including selecting respective search results from a plurality of sub-topic clusters, and sequencing the subset of audio content results. The method also includes causing the sequenced subset of audio content results to be presented to a user.
The various implementations described herein include methods and devices for identifying a language in audio content. In one aspect, a method includes obtaining audio content and generating a speaker embedding from the audio content. The method further includes determining, via a language identification model, a language of the audio content based on the speaker embedding.
G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
G10L 21/028 - Voice signal separation using properties of the sound sources
A method includes obtaining lyrics text and audio for a media item and generating, using a first encoder, a first plurality of embeddings representing symbols that appear in the lyrics text for the media item. The method includes generating, using a second encoder, a second plurality of embeddings representing an acoustic representation of the audio for the media item. The method includes determining respective similarities between embeddings of the first plurality of embeddings and embeddings of the second plurality of embeddings and aligning the lyrics text and the audio for the media item based on the respective similarities. The method includes, while streaming the audio for the media item, providing, for display, the aligned lyrics text with the streamed audio.
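The alignment step can be illustrated with a small sketch, assuming cosine similarity between the two sets of embeddings and a greedy per-symbol assignment; a production system would likely enforce a monotonic alignment (e.g. with dynamic time warping). All shapes and names below are illustrative.

```python
import numpy as np

# Cosine similarities between text-symbol embeddings and audio-frame embeddings;
# each lyric symbol is assigned the audio frame with the highest similarity.

def align(text_embeddings: np.ndarray, audio_embeddings: np.ndarray) -> np.ndarray:
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    a = audio_embeddings / np.linalg.norm(audio_embeddings, axis=1, keepdims=True)
    similarity = t @ a.T                 # (num_symbols, num_frames)
    return similarity.argmax(axis=1)     # audio frame index per lyric symbol

rng = np.random.default_rng(0)
frame_for_symbol = align(rng.normal(size=(12, 64)), rng.normal(size=(200, 64)))
print(frame_for_symbol)
```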
An electronic device obtains a plurality of media items, including, for each media item in the plurality, a set of attributes of the media item. The device provides the set of attributes for each media item of the plurality of media items to a machine learning model that is trained to determine a pairwise similarity between respective media items in the plurality of media items and generates an acyclic graph of an output of the machine learning model that is trained to determine pairwise similarity distances between respective media items in the plurality of media items. The device clusters nodes of the acyclic graph, each node corresponding to a media item. Based on the clustering, the electronic device modifies metadata associated with a first media item in a first cluster and displays a representation of the first media item in a user interface according to the modified metadata.
G06F 16/64 - Browsing; Visualisation therefor
G06F 16/683 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
7.
SYSTEMS AND METHODS FOR GENERATING PERSONALIZED PLAYLISTS
The various implementations described herein include methods and devices for generating personalized playlists. In one aspect, a method includes obtaining information about recent media items presented to a user, the information including data about a respective time of day and day of week each media item was presented to the user. The method further includes grouping the recent media items into clusters based on time of day and day of week; and generating a recommendation vector using a weighted average of the clusters. The method also includes generating a playlist for the user by identifying a plurality of media items using the recommendation vector; and presenting the playlist to the user.
G06F 16/683 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
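The clustering and weighted-average steps of the abstract above can be sketched roughly as follows, assuming item embeddings as the recommendation representation and 6-hour time-of-day blocks as the cluster key; both choices are illustrative assumptions.

```python
import numpy as np

# Group recent listens into (day-of-week, time-of-day) buckets, then form the
# recommendation vector as a weighted average of the per-cluster mean embeddings.
# Weighting clusters by their size is an illustrative choice.

def recommendation_vector(item_embeddings, days, hours, weights=None):
    buckets = {}
    for emb, day, hour in zip(item_embeddings, days, hours):
        buckets.setdefault((day, hour // 6), []).append(emb)   # 6-hour blocks
    cluster_means = np.array([np.mean(v, axis=0) for v in buckets.values()])
    if weights is None:
        weights = np.array([len(v) for v in buckets.values()], dtype=float)
    weights = weights / weights.sum()
    return weights @ cluster_means

rng = np.random.default_rng(1)
vec = recommendation_vector(rng.normal(size=(50, 16)), rng.integers(0, 7, 50), rng.integers(0, 24, 50))
print(vec.shape)   # (16,) vector compared against candidate items to build the playlist
```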
8.
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING SIMULATOR AUGMENTED CONTENT SELECTION
Simulator augmented content selection is provided by initializing a content selection object according to session initialization parameter values associated with a simulated media content playback session. The content selection object corresponds to a candidate content selection machine learning model trained to predict selectable content media items for at least one simulated user. A simulated session including a sequence of predicted simulated user next actions and one or more predicted sets of selectable content items are generated by applying a simulated user model to content items identified by the initialized content selection object, where the simulated user model is trained to predict a next action of the simulated user in response to a simulated playback input received from the simulated user and each set of the selectable content items is correlated to a next action in the sequence of predicted simulated user next actions.
H04L 65/1069 - Session management; Establishment or termination of a session
H04L 65/613 - Streaming of media packets for supporting unidirectional streaming services, e.g. Internet radio, for the control of the source by the destination
The various implementations described herein include methods and devices for identifying and presenting content to users. In one aspect, a method includes providing a domain specific language (DSL) tool to a user of a computing device and receiving a plurality of user inputs via the DSL tool. The plurality of user inputs includes: an input identifying a DSL object corresponding to a media pool; an input identifying a DSL object corresponding to a mutator to be applied to the media pool; and inputs identifying a plurality of DSL objects corresponding to respective objectives for a media set list. The method also includes generating the media set list from the media pool based on the mutator and the objectives and presenting information about the generated media set list to the user.
A method for personalizing media content for a user is provided. The method includes, at an electronic device, streaming a first media item from a first set of media items, the first set of media items compiled using a first recommendation hypothesis. The method further includes, while streaming the first media item, in response to a first user request, selecting, without user intervention, a second set of media items, distinct from the first set of media items, including determining a presentation order of a plurality of sets of media items using a heuristic applied to the plurality of sets of media items. The second set of media items is compiled using a second recommendation hypothesis, wherein the second recommendation hypothesis is distinct from the first recommendation hypothesis. The method includes streaming a second media item from the second set of media items.
H04L 65/613 - Streaming of media packets for supporting unidirectional streaming services, e.g. Internet radio, for the control of the source by the destination
H04L 65/1089 - In-session procedures by removing media
A method for processing voice input is disclosed. The method may be performed by a device including a voice assistant manager and a plurality of voice assistants. In some embodiments, the method includes receiving an utterance from a user, detecting a category of the utterance, and communicating the utterance to a selected voice assistant of the plurality of voice assistants. The selected voice assistant may be associated with the detected category. In some embodiments, the selected voice assistant may generate a response to the utterance, and the response may be output to the user.
A training audio track feature vector is generated for training audio tracks. The training audio track feature vector includes training track vector components based on one or more feature sets. Each of the training track vector components is grouped into at least one cluster. Audio filters are mapped to one or more of the clusters, thereby building a feature-filter mapping function. Mapping functions from filters to audio output devices and/or physical space acoustic features can also be built. A media playback device receives the mapping function(s) and is enabled to apply the mapping function(s) to a query audio track feature vector to identify at least one audio filter corresponding to the query audio track. The media playback device can then apply the at least one audio filter to the query audio track.
A system for processing voice requests includes a voice assistant manager and a plurality of voice assistants. The voice assistant manager detects a wake word in an utterance and communicates the utterance to a voice assistant of the plurality of voice assistants. In some embodiments, the voice assistant may verify the detected wake word and communicate with a cloud service, which may also verify the detected wake word and generate a response to the utterance. In some embodiments, the voice assistant manager may activate or deactivate one or more of the voice assistants.
A method for managing voice assistants is disclosed. The method may be performed by a voice assistant controller communicatively coupled to a plurality of voice assistants. The voice assistant controller may determine a first order of the plurality of voice assistants. Based at least in part on the first order, the voice assistant controller may activate one or more voice assistants. Furthermore, the voice assistant controller may determine a second order of the plurality of voice assistants. Based at least in part on the second order, the voice assistant controller may suspend an active assistant, activate a suspended assistant, or perform both operations.
A server performs a method of controlling the manipulation of a playlist that includes a queue of media items to be played. The method includes authorizing a first electronic device to control the manipulation of the playlist and generating the playlist based on a set of media preferences associated with the first electronic device. The method further includes, after authorizing a second electronic device to manipulate the playlist, receiving, from the second electronic device, a request to update an order of media items in the playlist and generating an updated order of media items in the playlist in response to receiving the request from the second electronic device.
G06F 16/438 - Presentation of query results
H04L 67/52 - Network services specially adapted to the location of the user terminal
H04W 4/02 - Services making use of location information
H04W 4/80 - Services using short-range communication, e.g. near-field communication, radio frequency identification or low-energy communication
A parse arbitrator receives a rule-based parser predictive test result indicating whether a first set contains at least one predictive slot and a descriptive classifier test result indicating whether the digitized descriptive query is descriptive. The parse arbitrator instructs a fulfillment system to perform a fulfillment operation based on the first set when the rule-based parser predictive test result and the descriptive classifier test result align and instructs a fulfillment system to perform a fulfillment operation based on a second set when the rule-based parser predictive test result and the descriptive classifier test result do not align. The first set has been generated by using a first parser to parse a digitized descriptive query and the second set has been generated by using a second parser to parse the digitized descriptive query.
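A compact sketch of the arbitration rule, under the illustrative assumption that "align" means the two test results agree; the actual alignment criterion is defined by the arbitrator itself.

```python
# Choose which parse the fulfillment system should use, based on whether the
# rule-based parser's predictive test and the descriptive classifier align.

def choose_parse(predictive_test_result: bool, descriptive_test_result: bool,
                 first_set, second_set):
    # Illustrative alignment criterion: the two test results agree.
    aligned = predictive_test_result == descriptive_test_result
    return first_set if aligned else second_set

print(choose_parse(True, True, "rule-based parse", "fallback parse"))
```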
Systems, methods, and devices for training and testing utterance based frameworks are disclosed. The training and testing can be conducting using synthetic utterance samples in addition to natural utterance samples. The synthetic utterance samples can be generated based on a vector space representation of natural utterances. In one method, a synthetic weight vector associated with a vector space is generated. An average representation of the vector space is added to the synthetic weight vector to form a synthetic feature vector. The synthetic feature vector is used to generate a synthetic voice sample. The synthetic voice sample is provided to the utterance-based framework as at least one of a testing or training sample.
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G06F 7/58 - Random or pseudo-random number generators
G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
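The synthetic-sample construction described in the abstract above (average representation plus a synthetic weight vector) might look roughly like this; the Gaussian weight vector and its scale are illustrative assumptions, and a downstream synthesiser would turn the resulting feature vector into a voice sample.

```python
import numpy as np

# Add a random weight vector in the utterance vector space to the average
# representation of that space to form a synthetic feature vector.

def synthetic_feature_vector(natural_features: np.ndarray, scale: float = 0.1,
                             rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    average_representation = natural_features.mean(axis=0)
    synthetic_weights = rng.normal(scale=scale, size=average_representation.shape)
    return average_representation + synthetic_weights

natural = np.random.default_rng(2).normal(size=(100, 32))   # natural utterance vectors
print(synthetic_feature_vector(natural).shape)               # (32,)
```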
The various implementations described herein include methods and devices for facilitating semantic search. In one aspect, a method includes obtaining audio content and extracting vocabulary terms from the audio content. The method further includes generating, using a transformer model, a vocabulary embedding from the vocabulary terms, and generating one or more topic embeddings from the audio content and the vocabulary embeddings. The method also includes generating a topic embedding index for the audio content based on the one or more topic embeddings, and storing the embedding index for use with a search engine system.
An electronic system obtains a first plurality of records corresponding to a plurality of media items, wherein each record of the first plurality of records has at least one attribute of a plurality of attributes. The electronic system trains a machine-learning model by, for each record of the first plurality of records, masking a portion of an attribute of the record. An encoder of the machine-learning model produces a training embedding for the record, and a decoder of the machine-learning model predicts the masked portion of the attribute of the record, based on the training embedding. The electronic system uses the trained machine-learning model to produce an embedding for each record of a second plurality of records, and groups two or more records of the second plurality of records into a first group based on the embeddings of the two or more records.
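A rough PyTorch sketch of the masked-attribute training loop described above; the single-layer encoder and decoder, vocabulary size, and fixed mask position are illustrative simplifications, not the claimed architecture.

```python
import torch
from torch import nn

# Mask a portion of one attribute per record, encode the masked record into an
# embedding, and have the decoder predict the masked token from that embedding.

VOCAB, DIM, MASK_ID = 1000, 64, 0

encoder = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Flatten(), nn.Linear(DIM * 8, DIM))
decoder = nn.Linear(DIM, VOCAB)            # predicts the masked token id
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
loss_fn = nn.CrossEntropyLoss()

records = torch.randint(1, VOCAB, (32, 8))  # 32 records, 8 attribute tokens each
for step in range(10):
    masked = records.clone()
    target = records[:, 3]                   # the portion to be masked and predicted
    masked[:, 3] = MASK_ID
    embedding = encoder(masked)              # training embedding per record
    loss = loss_fn(decoder(embedding), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))
```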
An adaptive multi-model item selection method, comprising: receiving, from one of a plurality of client devices, a request including a client-side feature vector representing a state of the client device; determining, by an advocate model, a probability distribution of a plurality of specialist cluster models from the client-side feature vector; choosing, by a use case selector, a cluster corresponding to a use case from the probability distribution; and obtaining, by the use case selector based on the cluster (i.e., the cluster sampled by the use case selector), a specialist cluster model from the plurality of specialist cluster models.
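The advocate/specialist selection can be sketched as follows, with a softmax over a linear advocate standing in for the trained advocate model; everything below is illustrative.

```python
import numpy as np

# Turn the client-side feature vector into a probability distribution over
# specialist cluster models, sample a cluster, and return its specialist model.

def select_specialist(client_features: np.ndarray, advocate_weights: np.ndarray,
                      specialist_models: list, rng=None):
    rng = rng or np.random.default_rng()
    logits = advocate_weights @ client_features
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # distribution over clusters
    cluster = rng.choice(len(specialist_models), p=probs)
    return cluster, specialist_models[cluster]

rng = np.random.default_rng(3)
cluster, model = select_specialist(rng.normal(size=16), rng.normal(size=(4, 16)),
                                   ["morning", "commute", "workout", "focus"], rng)
print(cluster, model)
```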
A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
Methods, systems, and computer programs for generating a playlist of media content items for a group of users. Media content items listened to by selected users are compared to an average user taste profile to select media content items for playback to the group of users.
Systems, devices, apparatuses, components, methods, and techniques for generating and playing a selectable content depth media program are provided. Media content items are edited to produce selectable depth media segments which are assembled into selectable depth media programs. A media-playback device is configured to navigate and play the selectable depth media program through interaction by a listening user. The user selects the desired content depth for each media segment.
H04N 21/482 - End-user interface for program selection
H04N 21/2387 - Stream processing in response to a playback request from an end user, e.g. for variable-speed playback ("trick play")
24.
SYSTEMS AND METHODS FOR MUSICAL PERFORMANCE SCORING
An electronic device pre-processes a target audio track, including determining, for each time interval of a plurality of time intervals of the target audio track, a multi-pitch salience. The electronic device presents the target audio track at a device associated with a user. While presenting the target audio track at the device associated with the user, the electronic device receives an audio data stream representative of the user's musical performance and scores the user's musical performance with respect to the target audio track by comparing, respectively, for each time interval of the plurality of time intervals of the target audio track, (i) a pitch of the user's musical performance represented by the audio data stream to (ii) the multi-pitch salience.
An electronic system stores metadata for a plurality of media items, including, for each media item of the plurality of media items, at least one categorical identifier from a set of categorical identifiers. For a user of the media-providing service, the electronic system (i) determines a distribution of interests of the user with respect to the set of categorical identifiers; (ii) generates a network graph configured to represent a calibrated media item selection task, wherein the network graph represents respective relevance scores for each respective media item of the plurality of media items and the distribution of interests of the user with respect to the categorical identifiers; (iii) selects a set of media items from the plurality of media items to recommend to the user by solving for a maximum flow of the network graph; and (iv) provides the set of media items as recommendations to the user.
G06F 16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/435 - Filtering based on additional data, e.g. user or group profiles
G06F 16/901 - Indexing; Data structures therefor; Storage structures
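One way to picture the max-flow selection in the abstract above is the small networkx sketch below; the interest distribution, capacity scheme, and item data are illustrative assumptions.

```python
import networkx as nx

# Flow runs from a source through categorical-identifier nodes (capacities set by
# the user's interest distribution) into media-item nodes (capacity 1 each) and on
# to a sink. Items that carry flow form the calibrated recommendation set.

interest = {"jazz": 0.5, "rock": 0.25, "podcasts": 0.25}   # user's interest distribution
items = {"i1": "jazz", "i2": "jazz", "i3": "rock", "i4": "podcasts", "i5": "jazz"}
k = 4                                                       # number of items to recommend

G = nx.DiGraph()
for category, share in interest.items():
    G.add_edge("source", category, capacity=round(share * k))
for item, category in items.items():
    G.add_edge(category, item, capacity=1)
    G.add_edge(item, "sink", capacity=1)

flow_value, flow = nx.maximum_flow(G, "source", "sink")
selected = [item for item in items if flow[item]["sink"] > 0]
print(flow_value, selected)
```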
Systems and methods for providing format agnostic media playback may include a media delivery system that can determine media content responses for media playback devices to use to determine how to operate when playing media content. To initiate the format agnostic media playback, a request for media content may be received from a media playback device. In response to the request a media content format of the media content may be determined, and a media content response may be determined based on the media content format. The media content response may be sent to the media playback device for the media playback device to playback media content and/or to determine media content descriptions, player features, user interface formats, actions, and/or a timeline of media content items to play.
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
Methods, systems and computer program products are provided for performing personalization. A segments reader receives a request for episode metadata corresponding to an episode. A determination is made whether the episode metadata should be personalized to an account associated with the request. In turn, personalized-episode metadata associated with the account and the request is retrieved and the personalized-episode metadata is provided to a client device associated with the request.
The present application describes various methods and devices for providing content to users. In one aspect, a method includes, for each content item of a set of content items, obtaining a score for the content item using a recommender system, the score corresponding to a calculation of subsequent repeated engagement by a user with the content item. The method also includes ranking the set of content items based on the respective scores and providing recommendation information to the user for one or more highest ranked content items in the set of content items.
Contrastive learning is used to learn an alternative embedding. A subtree replacement strategy generates structurally similar pairs of samples from an input space for use in contrastive learning. The resulting embedding captures more of the structural proximity relationships of the input space and improves Bayesian optimization performance when applied to tasks such as fitting and optimization.
An electronic device generates a respective user queue for each user of a plurality of users participating in a shared listening session. While providing a first media content item for playback, the device receives a second request, from a first user, to add a second media content item to the shared playback queue and updates the respective user queue for the first user. After receiving the second request, the electronic device receives a third request, from a second user, to add a third media content item to the shared playback queue and updates the respective user queue for the second user. The electronic device updates the shared playback queue using the respective user queues of the first user and the second user, including positioning the third media content item in an order of the shared playback queue to be played back before the second media content item.
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end users or client devices, e.g. end-user or client-device authentication or learning user preferences for recommending movies
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, or processing of multiple end-user preferences to derive collaborative data
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, or delaying a video stream transmission
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, or the storage space available on the internal hard disk
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content, or for administrating data related to the end user or to the client device itself, e.g. learning user preferences for recommending movies
H04N 21/458 - Scheduling content for creating a personalised stream, e.g. by combining a locally stored advertisement with an incoming stream; Updating operations, e.g. for operating system modules
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
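A toy sketch of building the shared queue from per-user queues with a round-robin merge, which reproduces the ordering in the abstract above (the second user's item is queued ahead of the first user's second item); the round-robin policy is an illustrative choice.

```python
# Rotate through the participating users, taking one pending item from each in turn.

def merge_user_queues(user_queues: dict) -> list:
    shared, queues = [], {u: list(q) for u, q in user_queues.items()}
    while any(queues.values()):
        for user in list(queues):
            if queues[user]:
                shared.append(queues[user].pop(0))
    return shared

# The currently playing item already came from the first user, so the rotation
# starts with the second user and their item is queued ahead, as in the abstract.
print(merge_user_queues({"second_user": ["third item"], "first_user": ["second item"]}))
```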
This disclosure is directed to an enhanced audio file generator. One aspect is a method of enhancing input speech in an input audio file, the method comprising receiving the input audio file representing the input speech, wherein the input audio file is recorded at an audio recording device, and generating an enhanced audio file by applying an audio transformation model to the input audio file, wherein applying the audio transformation model to generate the enhanced audio file comprises extracting parameters defining audio features from the input audio file, the parameters including a noise parameter defining noise in the input audio file and one or more other preset parameters respectively defining other audio features, synthesizing clean speech based on the extracted parameters including the noise parameter, wherein synthesizing the clean speech comprises transforming the noise parameter to defined value(s); and generating the enhanced audio file with the synthesized clean speech.
G10L 21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero-crossing techniques or predictive techniques
G10L 13/047 - Architecture of speech synthesisers
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the type of extracted parameters
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of the groups, specially adapted for particular use, for comparison or discrimination
A request to play a media content item is received. It is determined whether the play request is ambiguous. Responsive to determining that the play request is ambiguous, it is determined whether to play a suspended media content item or an alternate media content item. The determination can be made based on a length of time that the suspended media content item has been suspended, a media content item type, or a state, among other factors. Responsive to the determination, playback of the suspended or alternate media content item is initiated.
H04L 65/613 - Streaming of media packets for supporting unidirectional streaming services, e.g. Internet radio, for the control of the source by the destination
In general, this disclosure is directed to generating and discovering a group media playback session that is conducted in a media playback device. One aspect is a method of controlling a media output device, the method comprising establishing a media playback session at a host device through the media output device, wirelessly broadcasting a participant ID from a participant device to the host device, associating, at the host device, the participant ID with a session ID for the media playback session and sending the association to a server, and transmitting session information from the server to the participant device to permit the participant device to join the media playback session to adjust playback at the media output device.
H04L 65/1069 - Session management; Establishment or termination of a session
H04N 21/63 - Control signalling between client, server and network components; Network processes for video distribution between server and clients, e.g. transmitting basic and enhancement layers over different transmission paths or setting up a peer-to-peer communication via the Internet; Communication protocols; Addressing
34.
Playback of audio content along with associated non-static media content
One aspect herein relates to a method performed by a server system. In response to receiving a request message from a first electronic device, an audio content item is retrieved and a non-static media content item associated with the audio content item is located. The server system transmits, to the first electronic device, the audio content item and the located non-static media content item. The server system receives a second request message from a second electronic device, the second request message including an instruction for the server system to modify the non-static media content item associated with the audio content item. In response to the second request message, the server system modifies the non-static media content item associated with the audio content item in accordance with the instruction of the second request message from the second electronic device.
G06F 16/683 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/48 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/783 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/4722 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content, for requesting additional data associated with the content
Utterance-based user interfaces can include activation trigger processing techniques for detecting activation triggers and causing execution of certain commands associated with particular command pattern activation triggers without waiting for output from a separate speech processing engine. The activation trigger processing techniques can also detect speech analysis patterns and selectively activate a speech processing engine.
An electronic device associated with a media-providing service displays a user interface that includes a representation of a first media item. While the representation of the first media item is displayed, the electronic device initiates playback of a preview of the first media item. The electronic device further detects a first input by a user to display a representation of a second media item. Then, while the electronic device is displaying the representation of the second media item, the electronic device initiates playback of a preview of the second media item. Based on a determination that the preview of the second media item has completed playback, the electronic device plays the second media item and adds the second media item to the user's playback history without further intervention by the user.
H04N 21/482 - End-user interface for program selection
G06F 3/0485 - Scrolling or panning
G06F 3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. gestures based on the pressure exerted on a digitiser, using a touch screen or digitiser, e.g. input of commands through traced gestures
37.
Systems and methods for generating personalized pools of candidate media items
An electronic device stores, for a user of a media-providing service, a playback history that includes information about media items that have previously been consumed by the user. The electronic device receives a request to search for media content including search criteria. In response to the request, and without additional user intervention, the electronic device generates a vector representation of the user using media items from the playback history of the user that are relevant to the search criteria. The electronic device identifies one or more media content items from a media content library that match the vector representation of the user and the search criteria and provides, to the user, the one or more media content items.
Methods, systems and computer program products are provided for content generation. A distribution of policies is defined based on an action space. Distribution parameters are received from a reinforcement learning (RL) algorithm. In turn, a policy is randomly sampled from the distribution of policies. A candidate content item is generated using the sampled policy. A quality of the candidate content item is measured based on predefined quality criteria and a parameter model is adjusted as specified by the reinforcement learning algorithm to obtain a plurality of updated distribution parameters. A plurality of environment settings are passed to a trained parameter model to obtain a plurality of policy distribution parameters. A predetermined number of policies from the distribution of policies are then sampled and the plurality of environment settings are passed to the predetermined number of sampled policies to obtain at least one content item.
A media content item recommendation system recommends media content items based on one or more attributes of a seed playlist. The recommended media content items can be determined from a plurality of existing playlists that have been created over a period of time. Such existing playlists can be selected based on similarity to the seed playlist.
A cuepoint determination system utilizes a convolutional neural network (CNN) to determine cuepoint placements within media content items to facilitate smooth transitions between them. For example, audio content from a media content item is normalized to a plurality of beats, the beats are partitioned into temporal sections, and acoustic feature groups are extracted from each beat in one or more of the temporal sections. The acoustic feature groups include at least downbeat confidence, position in bar, peak loudness, timbre and pitch. The extracted acoustic feature groups for each beat are provided as input to the CNN on a per temporal section basis to predict whether a beat immediately following the temporal section within the media content item is a candidate for cuepoint placement. A cuepoint placement is then determined from among the candidate cuepoint placements predicted by the CNN.
G06F 16/683 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
The various implementations described herein include methods and devices for media segmentation. In one aspect, a method includes obtaining audio content for a podcast and generating sentence embeddings for the audio content. The method also includes generating segment embeddings using the sentence embeddings and context information, and determining, for each segment embedding, whether the segment embedding includes a topic transition for the podcast. The method further includes generating one or more topic transition timestamps for the podcast in accordance with the determining.
The various implementations described herein include methods and devices for media discovery. In one aspect, a method includes obtaining a pre-trained recommender model that has been trained using contrastive learning with feature-level augmentation and instance-level augmentation. The method further includes generating, via the model, a user embedding based on features of the user and generating, via the model, a respective episode embedding for each episode of a plurality of episodes, each respective episode embedding based on features of the corresponding episode. The method also includes generating, via the model, a respective similarity score for each episode, corresponding to a latent similarity between the user embedding and the respective episode embedding, and ranking the episodes in accordance with the respective similarity scores. The method further includes recommending the highest ranked episode to the user.
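The scoring and ranking steps can be sketched with dot-product similarities, assuming the embeddings come from the contrastively trained model; the dot product and the toy data are illustrative.

```python
import numpy as np

# One similarity score per episode (dot product with the user embedding),
# then rank episodes by score, best first.

def rank_episodes(user_embedding: np.ndarray, episode_embeddings: np.ndarray) -> np.ndarray:
    scores = episode_embeddings @ user_embedding
    return np.argsort(-scores)

rng = np.random.default_rng(4)
order = rank_episodes(rng.normal(size=32), rng.normal(size=(10, 32)))
print("recommend episode index:", order[0])
```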
The various implementations described herein include methods and devices for speaker diarization. In one aspect, a method includes obtaining an audio recording and generating an embedding signal from the audio recording. The method further includes factoring the embedding signal to obtain a basis matrix and an activation matrix, including obtaining a sparse optimization of the embedding signal by minimizing a norm corresponding to the factored embedding signal. The method also includes generating a speaker log for the audio recording based on the sparse optimization of the embedding signal.
Systems, methods, and devices for human-machine interfaces for utterance-based playlist selection are disclosed. In one method, a list of playlists is traversed and a portion of each is audibly output until a playlist command is received. Based on the playlist command, the traversing is stopped and a playlist is selected for playback. In examples, the list of playlists is modified based on a modification input.
A server obtains user data corresponding to a first content domain. The server identifies, from the user data, a plurality of labels. A respective label of the plurality of labels corresponds to a distinct characteristic of content items of the first content domain. The server utilizes a neural network to generate a plurality of user embeddings. A respective user embedding of the plurality of user embeddings includes a plurality of labels that correspond to a respective user. The server determines, using the plurality of user embeddings, a first content item of a plurality of content items of a second content domain that meets matching criteria for a first user. The server further provides, to a device of the first user, information that corresponds to the first content item of the second content domain.
G06F 16/435 - Filtering based on additional data, e.g. user or group profiles
G06F 16/438 - Presentation of query results
G06F 16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
46.
Personalizing explainable recommendations with bandits
Methods, systems and computer program products are provided for personalizing recommendations of items with associated explanations. The example embodiments described herein use contextual bandits to personalize explainable recommendations ("recsplanations") as treatments ("Bart"). Bart learns and predicts satisfaction (e.g., click-through rate, consumption probability) for any combination of item, explanation, and context and, through logging and contextual bandit retraining, can learn from its mistakes in an online setting.
G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/635 - Filtering based on additional data, e.g. user or group profiles
G06F 16/638 - Presentation of query results
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
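As a rough illustration of the "recsplanations as treatments" idea, a toy epsilon-greedy bandit over (item, explanation) arms is sketched below; Bart itself is a contextual model trained on logged feedback, so everything here is a deliberate simplification.

```python
import random
from collections import defaultdict

# The policy learns which (item, explanation) combination yields satisfaction
# (e.g. clicks) in a given context and keeps updating from logged feedback.

class RecsplanationBandit:
    def __init__(self, arms, epsilon=0.1):
        self.arms = arms                      # list of (item, explanation) pairs
        self.epsilon = epsilon
        self.counts = defaultdict(int)
        self.rewards = defaultdict(float)

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.arms)   # explore
        # Exploit: the arm with the best observed satisfaction rate in this context.
        return max(self.arms, key=lambda a: self.rewards[(context, a)]
                   / max(self.counts[(context, a)], 1))

    def log(self, context, arm, clicked):
        self.counts[(context, arm)] += 1
        self.rewards[(context, arm)] += clicked

bandit = RecsplanationBandit([("album X", "because you like jazz"),
                              ("album X", "popular near you")])
arm = bandit.choose(context="evening")
bandit.log("evening", arm, clicked=1)
```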
47.
Using a hierarchical machine learning algorithm for providing personalized media content
An electronic device generates a score for each objective in a hierarchy of objectives. Generating the score comprises, using a first machine learning algorithm, generating a score for a first objective corresponding to a first level in the hierarchy of objectives and using an output of the first machine learning algorithm, distinct from the score for the first objective, as an input to a second machine learning algorithm to generate a score for a second objective corresponding to a second level in the hierarchy of objectives. The electronic device generates a combined score using the score for the first objective and the score for the second objective. The electronic device selects, automatically and without user input, media content items based on their combined scores and streams, using an application of the media-providing service, one or more of the selected media content items to a user.
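A small sketch of the two-level scoring hierarchy, where the first model's intermediate output feeds the second model and the two scores are combined; the linear models, tanh hidden layer, and 0.7/0.3 weights are illustrative assumptions.

```python
import numpy as np

# The first model produces both a first-level score and a non-score output
# (here a hidden representation) that the second model consumes alongside the
# item features; the two objective scores are then combined.

def combined_score(item_features, w1, w_hidden, w2, weights=(0.7, 0.3)):
    hidden = np.tanh(w_hidden @ item_features)               # first model's non-score output
    score_1 = w1 @ item_features                             # first-level objective score
    score_2 = w2 @ np.concatenate([item_features, hidden])   # second-level objective score
    return weights[0] * score_1 + weights[1] * score_2

rng = np.random.default_rng(5)
x = rng.normal(size=8)
print(combined_score(x, rng.normal(size=8), rng.normal(size=(4, 8)), rng.normal(size=12)))
```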
A full attention mechanism of a multilingual transformer model is converted into a Longformer attention mechanism to generate a Longformer multilingual transformer model. The Longformer multilingual transformer model is finetuned to perform a summarization task based on episode-description:episode-transcript pairs, thereby generating a finetuned Longformer multilingual transformer model. The Longformer multilingual transformer model can be further finetuned to perform a summarization task based on article-summary:full-original-article pairs. A summary of a query episode transcript can be generated using the single-finetuned Longformer multilingual transformer model and/or the double-finetuned Longformer multilingual transformer model. The multilingual transformer-based model enables systems, methods, and computer products to generate multilingual abstractive summaries.
G06F 40/58 - Use of machine translation, e.g. for multilingual retrieval, for providing client devices with a server-side translation or for real-time translation
An electronic device receives a first media content item and receives information indicating: a first insertion time within the first media content item; and a second media content item to be played at the first insertion time and/or one or more properties of the second media content item. The electronic device stores the first media content item. The electronic device provides the first media content item to a second electronic device, including queuing the second electronic device to play back, in sequence and without user intervention: the first media content item until the first insertion time; the second media content item at the first insertion time; and the first media content item, resumed after playback of the second media content item is ceased.
G06F 16/68 - Data retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
A server system receives, from a first client device, a video recording created by the first client device and an indication that the video recording is to be associated with a media content item. The server system retrieves text associated with the media content item and provides the text for display at the first client device as a text lens overlay that is mapped to a portion of an object in the video recording of the first client device and follows movement of the portion of the object in the video recording created by the first client device. The server system provides, to a second client device, the video recording in combination with the media content item, and the text associated with the media content item as the text lens overlay that is mapped to the portion of the object in the video recording created by the first client device.
G06F 16/40 - Information retrieval; Database structures therefor; File system structures therefor, of multimedia data, e.g. slideshows comprising image and additional audio data
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G06F 16/435 - Filtering based on additional data, e.g. user or group profiles
G06F 16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
G06F 16/9535 - Search customisation based on user profiles and personalisation
G06F 16/9536 - Search customisation based on social or collaborative filtering
G06Q 10/107 - Computer-aided management of electronic mail
G06Q 50/00 - Systems or methods specially adapted for a specific business sector, e.g. utilities or tourism
H04L 51/52 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, for supporting social networking services
H04L 65/403 - Arrangements for multi-party communication, e.g. for conferences
H04N 21/222 - Secondary servers, e.g. proxy server or cable television head-end
H04N 21/233 - Processing of audio elementary streams
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/435 - Processing of additional data, e.g. decrypting of additional data or reconstructing software from modules extracted from the transport stream
H04N 21/439 - Processing of audio elementary streams
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
H04N 21/462 - Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a head-end, or controlling the complexity of a video stream by scaling the resolution or bit rate
H04N 21/4788 - Supplemental services, e.g. displaying a phone caller's identification or a shopping application; communicating with other users, e.g. online chat
H04N 21/8545 - Content authoring for generating interactive applications
51.
SYSTEMS AND METHODS FOR BIDIRECTIONAL COMMUNICATION WITHIN A WEBSITE DISPLAYED WITHIN A MOBILE APPLICATION
A method is performed at an electronic device. The method includes displaying, in a mobile application provided by a media content provider, a user interface that includes one or more media content items. The method further includes displaying, within a browser displayed within the mobile application, external content that is associated with a content provider distinct from the media content provider, including displaying a first set of controls within the external content. The method includes, while displaying the external content, receiving a first user input selecting a first control of the first set of controls and, in response to the first user input selecting the first control, sending a command to the mobile application to perform an action and performing, by the mobile application, the action corresponding to the first control.
G06F 16/954 - Navigation, e.g. using categorised browsing
G06F 16/438 - Presentation of query results
G06F 16/435 - Filtering based on additional data, e.g. user or group profiles
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
52.
Methods and systems for interactive queuing for shared listening sessions based on user satisfaction
An electronic device stores a shared playback queue for a shared playback session, the shared playback queue comprising one or more media content items, including a first media content item associated with a first user and a second media content item associated with a second user of a plurality of users participating in the shared playback session. The device receives a request to adjust the shared playback queue. The device determines an order for the adjusted shared playback queue based at least in part on media preferences indicated in a profile of a third user of the plurality of users participating in the shared playback session, wherein the third user is distinct from the first user and the second user. The device provides the first media content item and the second media content item based on the order of the shared playback queue.
H04N 21/482 - End-user interface for program selection
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content, or for administrating data related to the end user or to the client device itself, e.g. learning user preferences for recommending movies
H04N 21/475 - End-user interface for inputting end-user data, e.g. a personal identification number [PIN] or preference data
53.
Systems and methods for selecting images for a media item
An electronic device obtains a collection of images and obtains a media item. The electronic device selects a subset of the collection of images, including: selecting an initial subset of the collection of images, wherein the initial subset of the collection of images is based on descriptors associated with the collection of images and/or the media item; obtaining a set of preferences for a user of the media-providing service; and selecting the subset of the collection of images from the initial subset of the collection of images based on the set of preferences for the user of the media-providing service. The electronic device concurrently presents: a respective image of the subset of the collection of images; and the media item.
G06F 16/438 - Presentation of query results
G06F 16/535 - Filtering based on additional data, e.g. user or group profiles
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
A system for device discovery for social playback is disclosed. The system operates to connect a host media playback device to a media output device and broadcast a social playback session to guest media playback devices. Upon joining a social playback session, a guest media playback device may control the media playback at the host media playback device, where the media output for the social playback session is provided by the media output device.
H04L 65/60 - Streaming of media packets
H04W 4/80 - Services using short-range communication, e.g. near-field communication, radio frequency identification or low-energy communication
An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style-transformed feature vector.
G10L 21/003 - Changing voice quality, e.g. pitch or formants
G10L 25/45 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the type of analysis window
G10L 25/75 - Speech or voice analysis techniques not restricted to a single one of the groups, for modelling vocal tract parameters
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
56.
Selection of a wireless device to be remotely controlled by a user interface device for media presentation
A method includes receiving, at a first wireless device, a Bluetooth Low Energy (BLE) advertising message from a user interface (UI) device. The method includes, responsive to a receipt of the BLE advertising message from the UI device: waking up an application module of the first wireless device and authorizing the UI device to remotely control media presentation as presented by the application module. The method includes making a first determination of whether the first wireless device is paired or is in a current cabled connection with an electronic device that is distinct from the UI device; and in accordance with the first determination being a determination that the first wireless device is not paired with the electronic device and is not in a current cabled connection with the electronic device, automatically terminating the authorization of the UI device to remotely control media presentation as presented by the application module.
H04N 21/414 - Specialised client platforms, e.g. a receiver in a car or embedded in a mobile appliance
H04L 67/125 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks, involving the control of terminal applications by a network
H04N 21/41 - Structure of client; Structure of client peripherals
H04W 4/48 - Services specially adapted for specific environments, situations or purposes, for vehicles, e.g. vehicle-to-pedestrian communication, for in-vehicle communication
H04W 4/80 - Services using short-range communication, e.g. near-field communication, radio frequency identification or low-energy communication
Media content episodes are received. Using machine learning, one or more media segments of interest are identified in each of the media content episodes based at least in part on an analysis of content included in a corresponding audio content episode. Each of the identified media segments is associated with one or more automatically determined tags. Using machine learning, a recommended media segment is selected for a specific user from the identified media segments based at least in part on attributes of the specific user and the automatically determined tags of the identified media segments. The recommended media segment is automatically provided in a media segment feed.
G10L 17/00 - Identification ou vérification du locuteur
G10L 25/51 - Techniques d'analyses de la parole ou de la voix qui ne se limitent pas à un seul des groupes spécialement adaptées pour un usage particulier pour comparaison ou différentiation
58.
System and Method for Assessing and Correcting Potential Underserved Content In Natural Language Understanding Applications
Methods, systems, and related products provide detection of media content items that are under-locatable by machine voice-driven retrieval, that is, retrieval from uttered requests for the media items. For a given media item, a resolvability value and/or an utterance resolve frequency is calculated as the ratio of the number of playbacks of the media item by a speech retrieval modality to the total number of playbacks of the media item regardless of retrieval modality. In some examples, the methods, systems, and related products also provide for improvement in the locatability of an under-locatable media item by collecting and/or generating one or more pronunciation aliases for the under-locatable item.
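A minimal sketch, in Python, of the resolvability calculation described in the preceding entry; the field names, the zero-playback fallback, and the flagging threshold are illustrative assumptions rather than details taken from the entry.

    def resolvability(speech_playbacks: int, total_playbacks: int) -> float:
        # Ratio of playbacks reached via the speech retrieval modality to
        # playbacks reached via any retrieval modality.
        if total_playbacks == 0:
            return 0.0
        return speech_playbacks / total_playbacks

    def is_under_locatable(item: dict, threshold: float = 0.05) -> bool:
        # Items whose speech-driven share of playbacks falls below the
        # threshold are candidates for pronunciation-alias generation.
        return resolvability(item["speech_playbacks"], item["total_playbacks"]) < threshold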
A text-to-speech engine creates audio output that includes synthesized speech and one or more media content item snippets. The input text is obtained and partitioned into text sets. A track having lyrics that match a part of one of the text sets is identified. The portion of the track's audio that contains the lyric is located based on forced alignment data and extracted. The extracted audio is combined with synthesized speech corresponding to the remainder of the input text to form the audio output.
G10L 13/00 - Synthèse de la parole; Systèmes de synthèse de la parole à partir de texte
G06F 16/683 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
G10L 13/04 - Procédés d'élaboration de parole synthétique; Synthétiseurs de parole - Détails des systèmes de synthèse de la parole, p.ex. structure du synthétiseur ou gestion de la mémoire
A method of determining relations between music items, wherein a music item is a submix of a musical composition comprising one or more music tracks, the method comprising: determining a first input representation for at least part of a first music item; mapping the first input representation onto one or more subspaces derived from a vector space using a first model, wherein each subspace models a characteristic of the music items; determining a second input representation for at least part of a second music item; mapping the second input representation onto the one or more subspaces using a second model; and determining a distance between the mappings of the first and second input representations in each subspace, wherein the distance represents the degree of relation between the first and second input representations with respect to the characteristic modelled by the subspace.
G10H 1/00 - INSTRUMENTS DE MUSIQUE ÉLECTROPHONIQUES; INSTRUMENTS DANS LESQUELS LES SONS SONT PRODUITS PAR DES MOYENS ÉLECTROMÉCANIQUES OU DES GÉNÉRATEURS ÉLECTRONIQUES, OU DANS LESQUELS LES SONS SONT SYNTHÉTISÉS À PARTIR D'UNE MÉMOIRE DE DONNÉES Éléments d'instruments de musique électrophoniques
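The entry above (and the closely related entry that follows) maps two music-item representations onto shared subspaces and measures a per-subspace distance between the mappings. A minimal sketch, assuming the two models reduce to fixed linear projection matrices and the distance is Euclidean; both are simplifications, not details from the entries.

    import numpy as np

    def project(representation: np.ndarray, subspace_bases: list[np.ndarray]) -> list[np.ndarray]:
        # Map one input representation onto each learned subspace.
        return [basis @ representation for basis in subspace_bases]

    def subspace_distances(rep_a, rep_b, bases_a, bases_b):
        # One distance per subspace; a smaller distance means a stronger
        # relation with respect to the characteristic that subspace models.
        maps_a = project(rep_a, bases_a)
        maps_b = project(rep_b, bases_b)
        return [float(np.linalg.norm(a - b)) for a, b in zip(maps_a, maps_b)]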
A method of determining relations between music items, the method comprising: determining a first input representation for a symbolic representation of a first music item; mapping the first input representation onto one or more subspaces derived from a vector space using a first model, wherein each subspace models a characteristic of the music items; determining a second input representation for music data representing a second music item; mapping the second input representation onto the one or more subspaces using a second model; and determining a distance between the mappings of the first and second input representations in each subspace, wherein the distance represents the degree of relation between the first and second input representations with respect to the characteristic modelled by the subspace.
A method for training a speech synthesis model adapted to output speech in response to input text is provided. The method includes receiving training data for training said speech synthesis model, the training data comprising speech that corresponds to known text. The method includes training said speech synthesis model. The method includes testing said speech synthesis model using a plurality of text sequences. The method includes calculating at least one metric indicating the performance of the model when synthesising each text sequence. The method includes determining from said metric whether the speech synthesis model requires further training. The method includes determining targeted training text from said calculated metrics, wherein said targeted training text is text related to text sequences for which the metric indicated that the model required further training. And the method includes outputting said determined targeted training text with a request for further speech corresponding to the targeted training text.
G10L 13/047 - Architecture des synthétiseurs de parole
G10L 13/08 - Analyse de texte ou génération de paramètres pour la synthèse de la parole à partir de texte, p.ex. conversion graphème-phonème, génération de prosodie ou détermination de l'intonation ou de l'accent tonique
A system and method for media content sequencing. Prior tracks for a listening session are segmented into groups based on attribute scores for an audial attribute. A preferred group is then selected, which can be based on user feedback regarding the prior tracks in the listening session. Candidate tracks, such as from a candidate track pool for future playback in the listening session, are also segmented into the groups of the prior tracks. The candidate tracks can then be ranked based on their associated group and the preferred group.
G10H 1/00 - INSTRUMENTS DE MUSIQUE ÉLECTROPHONIQUES; INSTRUMENTS DANS LESQUELS LES SONS SONT PRODUITS PAR DES MOYENS ÉLECTROMÉCANIQUES OU DES GÉNÉRATEURS ÉLECTRONIQUES, OU DANS LESQUELS LES SONS SONT SYNTHÉTISÉS À PARTIR D'UNE MÉMOIRE DE DONNÉES Éléments d'instruments de musique électrophoniques
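A minimal sketch of the group-based ranking in the sequencing entry above; the attribute name ("energy"), the bucket edges, and the feedback signal are assumptions made for illustration.

    import bisect

    BUCKET_EDGES = [0.25, 0.5, 0.75]   # splits an attribute score in [0, 1] into four groups

    def group_of(score: float) -> int:
        return bisect.bisect_right(BUCKET_EDGES, score)

    def rank_candidates(prior_tracks, candidate_tracks, liked):
        # The preferred group is the group whose prior tracks drew the most
        # positive feedback during the listening session.
        votes = {}
        for track in prior_tracks:
            if liked(track):
                group = group_of(track["energy"])
                votes[group] = votes.get(group, 0) + 1
        preferred = max(votes, key=votes.get) if votes else 0
        # Candidates falling into the preferred group are ranked first.
        return sorted(candidate_tracks, key=lambda t: group_of(t["energy"]) != preferred)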
A second wake word detector, at a media-playback device that plays audio (or other) content to a device such as a voice-enabled device, detects false wake words in the audio content. The second wake word detector analyzes the audio stream to determine if the audio stream contains any audio that sounds like the wake word. If so, the second wake word detector can generate one of a plurality of instructions that describes the time period, within the audio content, in which the false wake word was encountered. The instruction can cause a first wake word detector to assume one of a plurality of configurations. The media-playback device can then instruct or inform the voice-enabled device of the presence of the false wake word. In this way, the wake word detector at the voice-enabled device either is not activated to receive the false wake word or ignores it.
G10L 15/22 - Procédures utilisées pendant le processus de reconnaissance de la parole, p.ex. dialogue homme-machine
G10L 15/20 - Techniques de reconnaissance de la parole spécialement adaptées de par leur robustesse contre les perturbations environnantes, p.ex. en milieu bruyant ou reconnaissance de la parole émise dans une situation de stress
A wake word detector, at a server of a content delivery network (CDN) that provides audio (or other) content to a device such as a voice-enabled device, detects false wake words in the audio content. The CDN wake word detector analyzes the audio stream to determine if the audio stream contains any audio that sounds like the wake word. If so, the CDN wake word detector can generate metadata that describes the time period, within the audio content, in which the false wake word was encountered. The metadata can include time offsets, from the start of the audio content, which can instruct a voice-enabled device to deactivate during the time period. This metadata is stored and then sent to the media-playback device when the media-playback device requests the media content. The media-playback device can then instruct or inform the voice-enabled device of the presence of the false wake word. In this way, the wake word detector at the voice-enabled device is not activated to receive the false wake word.
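A minimal sketch of the kind of metadata the CDN-side detector above might attach; the field names and example offsets are hypothetical, and the upstream detector producing (start, end) offsets is assumed to exist.

    def build_false_wake_word_metadata(content_id: str, detections: list[tuple[float, float]]) -> dict:
        # Offsets are measured from the start of the audio content; the
        # media-playback device forwards them so the voice-enabled device can
        # deactivate (or ignore the wake word) during those windows.
        return {
            "content_id": content_id,
            "false_wake_words": [
                {"start_offset_s": start, "end_offset_s": end}
                for start, end in detections
            ],
        }

    # Example: two wake-word look-alikes detected at 12.4 s and 95.0 s.
    meta = build_false_wake_word_metadata("episode-123", [(12.4, 13.1), (95.0, 95.8)])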
Systems, devices, apparatuses, components, methods, and techniques for predicting user and media-playback device states are provided. Systems, devices, apparatuses, components, methods, and techniques for representing cached, user-selected, and streaming content are also provided.
G06F 15/167 - Communication entre processeurs utilisant une mémoire commune, p.ex. boîte aux lettres électronique
G06F 12/0888 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache utilisant la mémorisation cache sélective, p.ex. la purge du cache
G06F 12/14 - Protection contre l'utilisation non autorisée de mémoire
G06N 5/02 - Représentation de la connaissance; Représentation symbolique
H04L 67/5681 - Pré-extraction ou pré-livraison de données en fonction des caractéristiques du réseau
H04N 21/231 - Opération de stockage de contenu, p.ex. mise en mémoire cache de films pour stockage à court terme, réplication de données sur plusieurs serveurs, ou établissement de priorité des données pour l'effacement
67.
TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, AND A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM
A text-to-speech synthesis method includes receiving text, inputting the received text in a synthesizer that includes a prediction network configured to convert the received text into speech data having a speech attribute that includes emotion, intention, projection, pace, and/or accent, and outputting said speech data. The prediction network is obtained by obtaining a first sub-dataset and a second sub-dataset, where the first sub-dataset and the second sub-dataset each include audio samples and corresponding text, and the speech attribute of the audio samples of the second sub-dataset is more pronounced than the speech attribute of the audio samples of the first sub-dataset, training a first model using the first sub-dataset until a performance metric reaches a first predetermined value, training a second model by further training the first model using the second sub-dataset until the performance metric reaches a second predetermined value, and selecting one trained model as the prediction network.
G10L 13/033 - Procédés d'élaboration de parole synthétique; Synthétiseurs de parole Édition de voix, p.ex. transformation de la voix du synthétiseur
G10L 13/047 - Architecture des synthétiseurs de parole
G10L 13/027 - Synthétiseurs de parole à partir de concepts; Génération de phrases naturelles à partir de concepts automatisés
G10L 13/08 - Analyse de texte ou génération de paramètres pour la synthèse de la parole à partir de texte, p.ex. conversion graphème-phonème, génération de prosodie ou détermination de l'intonation ou de l'accent tonique
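A control-flow sketch of the two-stage training in the text-to-speech entry above; the training step, the performance metric, and the thresholds are placeholders, so only the curriculum structure is shown.

    def train_until(model, sub_dataset, train_step, metric_fn, target):
        # Keep training on the given sub-dataset until the performance metric
        # (e.g. how strongly the speech attribute is expressed) reaches target.
        while metric_fn(model) < target:
            for text, audio in sub_dataset:
                model = train_step(model, text, audio)
        return model

    # Stage 1: the first sub-dataset (attribute weakly expressed) up to a first value.
    # Stage 2: continue from the stage-1 model on the second sub-dataset (attribute
    # more pronounced) up to a second value, then select one of the two trained
    # models as the prediction network.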
Methods, systems and computer program products are provided for determining acoustic feature vectors of query and target items in a first vector space, and mapping the acoustic feature vectors to a second vector space having a lower dimension. The distribution of vectors in the second vector space can then be used to identify items from the same songs, and/or items that are complementary. A mapping function is trained using a machine learning algorithm, such that complementary audio items are closer in the second vector space than the first, according to a given distance metric.
G10L 25/51 - Techniques d'analyses de la parole ou de la voix qui ne se limitent pas à un seul des groupes spécialement adaptées pour un usage particulier pour comparaison ou différentiation
G10L 25/30 - Techniques d'analyses de la parole ou de la voix qui ne se limitent pas à un seul des groupes caractérisées par la technique d’analyse utilisant des réseaux neuronaux
A method, which may be performed at an electronic device, such as a media server associated with a media-providing service, causes a set of media items to be provided to a user based on identifying performance listings relevant to the user. The method includes determining a list of one or more performance listings of artists relevant to a first user based on a media consumption history of the first user, the media consumption history describing media content items previously delivered to the first user by a media content server, and a listening profile of a second user, distinct from the first user, the listening profile identifying media content and artists played by the second user via the media content server. The method includes providing one or more media items to the first user, the one or more media items selected based on the list of one or more performance listings.
A descriptive media content search solution is provided to allow a user to search for media content that better matches a user's descriptive search request. The descriptive media content search solution utilizes an extensive catalog of playlists each having a playlist description, such as a playlist title or other descriptive text, and identifies additional descriptive information for media content items to be searched. The descriptive media content search solution can set up a descriptive search database and utilize the descriptive search database to conduct a descriptive search responsive to the user's descriptive search request.
G06F 16/48 - Recherche caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement
G06F 16/41 - Indexation; Structures de données à cet effet; Structures de stockage
G06F 16/438 - Présentation des résultats des requêtes
G06F 16/2457 - Traitement des requêtes avec adaptation aux besoins de l’utilisateur
71.
Systems and methods for using hierarchical ordered weighted averaging for providing personalized media content
An electronic device, for each media content item of a plurality of media content items, receives a respective score for each of a first set of objectives and one or more other objectives and generates a respective score between a user and the media content item. The generating includes applying a first ordered weighted average to the respective scores for the first set of objectives to produce a first combined score for the first set of objectives, and applying a second ordered weighted average to the respective scores for a second set of objectives, wherein the second set of objectives includes (i) a resulting objective corresponding to the first set of objectives and having the first combined score and (ii) the one or more other objectives. The electronic device streams media content to the user, selected based on the respective scores between the user and the media content items.
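A minimal sketch of the hierarchical ordered weighted average (OWA) described above. An OWA applies its weights by rank (largest score first) rather than by objective identity; the weight vectors and example scores below are illustrative only.

    def owa(scores: list[float], weights: list[float]) -> float:
        ranked = sorted(scores, reverse=True)
        return sum(w * s for w, s in zip(weights, ranked))

    def user_item_score(first_set_scores, other_scores, first_weights, second_weights):
        combined_first = owa(first_set_scores, first_weights)        # first set of objectives
        return owa([combined_first] + other_scores, second_weights)  # resulting objective + other objectives

    # Example: three objectives collapse into one combined score, which is then
    # weighed against a single additional objective.
    score = user_item_score([0.9, 0.4, 0.6], [0.7],
                            first_weights=[0.5, 0.3, 0.2], second_weights=[0.6, 0.4])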
While a first media content item from a shared playback session is being presented on a set of presentation devices, each of the set of presentation devices corresponding to a respective user of a plurality of users in the shared playback session, a master device receives a request to modify playback of the shared playback session from an observer device of a set of observer devices that corresponds to a presentation device for a first user, wherein the set of observer devices is different from the set of presentation devices. In response to the request to modify playback of the shared playback session, the master device sends a command for an action selected based on the request to each of the set of presentation devices.
H04N 21/472 - Interface pour utilisateurs finaux pour la requête de contenu, de données additionnelles ou de services; Interface pour utilisateurs finaux pour l'interaction avec le contenu, p.ex. pour la réservation de contenu ou la mise en place de rappels, pour la requête de notification d'événement ou pour la transformation de contenus affichés
H04L 65/401 - Prise en charge des services ou des applications dans laquelle les services impliquent une session principale en temps réel et une ou plusieurs sessions parallèles additionnelles en temps réel ou sensibles au temps, p.ex. accès partagé à un tableau blanc ou mise en place d’une sous-conférence
H04N 21/2387 - Traitement de flux en réponse à une requête de reproduction par un utilisateur final, p.ex. pour la lecture à vitesse variable ("trick play")
H04N 21/4788 - Services additionnels, p.ex. affichage de l'identification d'un appelant téléphonique ou application d'achat communication avec d'autres utilisateurs, p.ex. discussion en ligne
An adaptive multi-model item selection method, comprising: receiving, from one of a plurality of client devices, a request including a client-side feature vector representing a state of the client device; determining, by an advocate model, a probability distribution over a plurality of specialist cluster models from the client-side feature vector; choosing, by a use case selector, a cluster corresponding to a use case from the probability distribution; and obtaining, by the use case selector based on the cluster (i.e., the cluster that was sampled by the use case selector), a specialist cluster model from the plurality of specialist cluster models.
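A control-flow sketch of the adaptive multi-model selection above; the advocate model is stood in for by a callable that returns a probability distribution over clusters, and sampling is assumed as the choosing step.

    import random

    def select_specialist(client_feature_vector, advocate, specialist_models):
        # Advocate model: client-side feature vector -> distribution over clusters.
        probs = advocate(client_feature_vector)               # e.g. [0.1, 0.7, 0.2]
        clusters = list(range(len(probs)))
        # The use case selector samples a cluster from that distribution ...
        cluster = random.choices(clusters, weights=probs, k=1)[0]
        # ... and returns the specialist model registered for the sampled cluster.
        return specialist_models[cluster]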
Apparatus, methods, and computer-readable media are provided for processing wind noise. Audio input is processed by receiving an audio input from a microphone array. A wind noise level representative of wind noise at the microphone array is measured using the audio input, and a determination is made, based on the wind noise level, whether to perform either (i) a wind noise suppression process on the audio input on-device, or (ii) the wind noise suppression process on the audio input on-device and an audio reconstruction process in-cloud.
G10L 21/0232 - Traitement dans le domaine fréquentiel
G10L 21/0216 - Filtration du bruit caractérisée par le procédé d’estimation du bruit
H04R 1/40 - Dispositions pour obtenir la fréquence désirée ou les caractéristiques directionnelles pour obtenir la caractéristique directionnelle désirée uniquement en combinant plusieurs transducteurs identiques
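A decision sketch for the on-device / in-cloud split in the wind-noise entry above; the level estimate (mean absolute amplitude) and the threshold value are stand-ins for whatever measurement the system actually uses.

    import numpy as np

    WIND_LEVEL_THRESHOLD = 0.3   # illustrative value

    def route_wind_processing(mic_frames: np.ndarray) -> str:
        # Crude proxy for a wind-noise level; a low-frequency energy ratio
        # across the microphone array would be more typical.
        wind_level = float(np.mean(np.abs(mic_frames)))
        if wind_level < WIND_LEVEL_THRESHOLD:
            return "suppress_on_device"
        # Heavier corruption: suppress on-device, then reconstruct the audio in-cloud.
        return "suppress_on_device_and_reconstruct_in_cloud"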
A method is provided for modifying a first media content item by superimposing a first set of data over a first audio event having an amplitude that satisfies a first threshold. The first audio event has a first audio profile, the first set of data has a second audio profile, playback of the second audio profile is configured to be masked by the first audio profile during playback of the first media content item, and the first set of data includes playlist information. The method includes transmitting, to a second electronic device, the modified first media content item.
G06F 16/683 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
76.
Methods and systems for providing personalized content based on shared listening sessions
A method is provided for initiating a first shared playback session at a first device of a host user. The method includes, in response to receiving a request from a second device to join the first shared playback session: in accordance with a determination that the second device is a first type of device, providing, to the second device, a first joining method for joining the first shared playback session; and in accordance with a determination that the second device is a second type of device, providing, to the second device, a second joining method, distinct from the first joining method, for joining the first shared playback session.
H04N 21/485 - Interface pour utilisateurs finaux pour la configuration du client
H04N 21/439 - Traitement de flux audio élémentaires
H04N 21/44 - Traitement de flux élémentaires vidéo, p.ex. raccordement d'un clip vidéo récupéré d'un stockage local avec un flux vidéo en entrée ou rendu de scènes selon des graphes de scène MPEG-4
H04N 21/442 - Surveillance de procédés ou de ressources, p.ex. détection de la défaillance d'un dispositif d'enregistrement, surveillance de la bande passante sur la voie descendante, du nombre de visualisations d'un film, de l'espace de stockage disponible dans l
H04N 21/647 - Signalisation de contrôle entre des éléments du réseau et serveur ou clients; Procédés réseau pour la distribution vidéo entre serveur et clients, p.ex. contrôle de la qualité du flux vidéo en éliminant des paquets, protection du contenu contre une modification non autorisée dans le réseau ou surveillance de la charge du résea
77.
METHODS AND SYSTEMS FOR SYNTHESISING SPEECH FROM TEXT
A method for synthesising speech from text includes receiving text and encoding, by way of an encoder module, the received text. The method further includes determining, by way of an attention module, a context vector from the encoding of the received text, wherein determining the context vector comprises at least one of: applying a threshold function to an attention vector and accumulating the thresholded attention vector, or applying an activation function to the attention vector and accumulating the activated attention vector. The method further includes determining speech data from the context vector.
G10L 13/08 - Analyse de texte ou génération de paramètres pour la synthèse de la parole à partir de texte, p.ex. conversion graphème-phonème, génération de prosodie ou détermination de l'intonation ou de l'accent tonique
G10L 13/047 - Architecture des synthétiseurs de parole
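A minimal sketch of the thresholded, accumulated attention described in the entry above (the first of the two alternatives); the encoder output shape and the threshold value are assumptions.

    import numpy as np

    def context_vector(encodings: np.ndarray, attention: np.ndarray,
                       accumulated: np.ndarray, threshold: float = 0.1):
        # encodings: (T, D) encoder outputs; attention, accumulated: (T,) vectors.
        # Alternative (a): apply a threshold function to the attention vector,
        # then accumulate the thresholded attention across decoder steps.
        thresholded = np.where(attention > threshold, attention, 0.0)
        accumulated = accumulated + thresholded
        # The context vector is the attention-weighted sum of encoder outputs.
        weights = thresholded / (thresholded.sum() + 1e-8)
        return encodings.T @ weights, accumulated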
A method for personalizing media content for a user is provided. The method includes, at an electronic device, streaming a first media item from a first set of media items, the first set of media items compiled using a first recommendation hypothesis. The method further includes, while streaming the first media item, in response to a first user request, selecting, without user intervention, a second set of media items, distinct from the first set of media items, including determining a presentation order of a plurality of sets of media items using a heuristic applied to the plurality of sets of media items. The second set of media items is compiled using a second recommendation hypothesis, wherein the second recommendation hypothesis is distinct from the first recommendation hypothesis. The method includes streaming a second media item from the second set of media items.
H04L 65/613 - Diffusion en flux de paquets multimédias pour la prise en charge des services de diffusion par flux unidirectionnel, p.ex. radio sur Internet pour la commande de la source par la destination
H04L 65/1089 - Procédures en session en supprimant des médias
This disclosure is directed to adjusting a playlist of media-content items. One aspect is a method comprising: receiving a request to adjust a playlist comprising initial media-content items; in response to receiving the request to adjust the playlist, compiling a set of features for the playlist and selecting a strong seed media-content item from the initial media-content items; predicting scores for a plurality of candidate media-content items based at least in part on the set of features for the playlist and the strong seed, the scores indicating a likelihood that a corresponding candidate media-content item will be added to the playlist; and inserting a candidate media-content item of the plurality of candidate media-content items after the strong seed media-content item based at least in part on the scores predicted for the plurality of candidate media-content items.
An audio cancellation system includes a voice enabled computing system that is connected to an audio output device using a wired or wireless communication network. The voice enabled computing system can provide media content to a user and receive a voice command from the user. The connection between the voice enabled computing system and the audio output device introduces a time delay between the media content being generated at the voice enabled computing system and the media content being reproduced at the audio output device. The system operates to determine a calibration value adapted for the voice enabled computing system and the audio output device. The system uses the calibration value to filter the user's voice command from a recording of ambient sound that includes the media content, without requiring significant use of memory and computing resources.
G10L 21/0232 - Traitement dans le domaine fréquentiel
G10L 25/51 - Techniques d'analyses de la parole ou de la voix qui ne se limitent pas à un seul des groupes spécialement adaptées pour un usage particulier pour comparaison ou différentiation
A method for communicating a playback order for a plurality of media content items to a user device operating in an online mode, the method performed at a server system and comprising receiving an indication that the user device will enter an offline mode, generating a playback order for the plurality of media content items, and transmitting the generated playback order to the user device before the user device enters the offline mode.
Methods, systems, and computer programs for generating a playlist of media content items without explicit content. A vector space is created that represents explicit and non-explicit tracks appearing in the same playlists created by other users, and tracks are then filtered based on the cosine distance between the “seed tracks” and all the tracks in the aforementioned playlists. The explicit tracks are filtered out, and the remaining tracks are sorted based on the affinity of the user to the artist.
G06F 16/635 - Filtrage basé sur des données supplémentaires, p.ex. sur des profils d'utilisateurs ou de groupes
H04N 21/4545 - Signaux d'entrée aux algorithmes de filtrage, p.ex. filtrage d'une région de l'image
H04N 21/45 - Opérations de gestion réalisées par le client pour faciliter la réception de contenu ou l'interaction avec le contenu, ou pour l'administration des données liées à l'utilisateur final ou au dispositif client lui-même, p.ex. apprentissage des préféren
H04N 21/454 - Filtrage de contenu, p.ex. blocage des publicités
G06F 16/638 - Présentation des résultats des requêtes
G06F 16/683 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
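A minimal sketch of the filtering and sorting steps in the explicit-content entry above; the track vectors are assumed to come from a playlist co-occurrence embedding, and the distance cut-off is illustrative.

    import numpy as np

    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def clean_playlist(seed_vectors, candidates, user_artist_affinity, max_dist=0.4):
        # Keep non-explicit tracks that sit close to every seed track in the
        # vector space, then sort by the user's affinity for the track's artist.
        kept = [
            track for track in candidates
            if not track["explicit"]
            and all(cosine_distance(track["vector"], seed) <= max_dist for seed in seed_vectors)
        ]
        return sorted(kept, key=lambda t: user_artist_affinity[t["artist"]], reverse=True)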
A system for supporting a user's repetitive motion activity operates to manage cadence-based playlists identifying one or more media content items having a tempo corresponding to a user's cadence. The cadence-based playlists can be categorized by different tempi or tempo ranges that cover all likely cadences during the user's activities. A media-playback device is provided to acquire a user's cadence and retrieve a cadence-based playlist associated with a tempo or a tempo range corresponding to the cadence.
G06F 17/30 - Recherche documentaire; Structures de bases de données à cet effet
G06F 16/683 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
G06F 16/638 - Présentation des résultats des requêtes
G06F 16/68 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement
G06F 16/9535 - Adaptation de la recherche basée sur les profils des utilisateurs et la personnalisation
G06F 16/9538 - Présentation des résultats des requêtes
G05B 15/02 - Systèmes commandés par un calculateur électriques
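A minimal sketch of mapping a measured cadence to a cadence-based playlist, as in the entry above; the tempo ranges and the one-step-per-beat equivalence are assumptions for illustration.

    TEMPO_RANGE_PLAYLISTS = {
        (120, 140): "playlist_120_140",
        (140, 160): "playlist_140_160",
        (160, 180): "playlist_160_180",
    }

    def playlist_for_cadence(steps_per_minute: float):
        # One step per beat is assumed, so cadence maps directly onto tempo (BPM).
        for (low, high), playlist_id in TEMPO_RANGE_PLAYLISTS.items():
            if low <= steps_per_minute < high:
                return playlist_id
        return None   # outside all covered ranges; fall back to a default playlist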
An electronic device provides, to a user, a user-curated playlist, the user-curated playlist including an ordered set of media items that were added by the user. While providing a first media item in the ordered set of media items, the electronic device receives a first user input selecting an option to include recommended media items in the user-curated playlist. In response to the first user input, the electronic device updates the user-curated playlist to include a first recommended media item, the first recommended media item selected without user intervention based at least in part on attributes of the user-curated playlist. The first recommended media item is positioned in the user-curated playlist in between media items that were added to the ordered set of media items by the user.
G06F 16/638 - Présentation des résultats des requêtes
G06F 16/635 - Filtrage basé sur des données supplémentaires, p.ex. sur des profils d'utilisateurs ou de groupes
G06F 16/735 - Filtrage basé sur des données supplémentaires, p.ex. sur des profils d'utilisateurs ou de groupes
G06F 16/783 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
86.
SYSTEMS AND METHODS FOR IMPORTING AUDIO FILES IN A DIGITAL AUDIO WORKSTATION
A method includes displaying a user interface of a digital audio workstation, which includes a composition region for generating a composition. The composition region includes a representation of a first MIDI file that has already been added to the composition by a user. The method further includes receiving a user input to import, into the composition region, an audio file. In response to the user input to import the audio file, the method includes importing the audio file, which includes, without user intervention, aligning the audio file with a rhythm of the first MIDI file, modifying a rhythm of the audio file based on the rhythm of the first MIDI file, and displaying a representation of the audio file in the composition region.
G10H 1/00 - INSTRUMENTS DE MUSIQUE ÉLECTROPHONIQUES; INSTRUMENTS DANS LESQUELS LES SONS SONT PRODUITS PAR DES MOYENS ÉLECTROMÉCANIQUES OU DES GÉNÉRATEURS ÉLECTRONIQUES, OU DANS LESQUELS LES SONS SONT SYNTHÉTISÉS À PARTIR D'UNE MÉMOIRE DE DONNÉES Éléments d'instruments de musique électrophoniques
This disclosure concerns the provision of media, and more particularly streaming of media. In particular, one aspect herein relates to a method, performed by a server system, of streaming an audio content item to an electronic device. In response to receiving a request message from the electronic device, a selected audio content item is retrieved. A second storage is browsed to locate non-static media content item(s) associated with the selected audio content item. In response to finding a non-static media content item associated with the selected audio content item, the selected audio content item is sent along with the located non-static media content item to the electronic device for simultaneous presentation of the audio content item and the located non-static media content item.
G06F 16/683 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
H04N 21/4722 - Interface pour utilisateurs finaux pour la requête de contenu, de données additionnelles ou de services; Interface pour utilisateurs finaux pour l'interaction avec le contenu, p.ex. pour la réservation de contenu ou la mise en place de rappels, pour la requête de notification d'événement ou pour la transformation de contenus affichés pour la requête de données additionnelles associées au contenu
H04N 21/431 - Génération d'interfaces visuelles; Rendu de contenu ou données additionnelles
G06F 16/48 - Recherche caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement
G06F 16/583 - Recherche caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
G06F 16/783 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, Θ). The audio signal includes one or more of vocal content and/or musical instrument content, and each output f(X, Θ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, Θ) of the neural network system to corresponding target signals. For each compared output f(X, Θ), at least one parameter of the system is adjusted to reduce a result of the comparing performed for the output f(X, Θ), to train the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various different types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.
G10H 1/00 - INSTRUMENTS DE MUSIQUE ÉLECTROPHONIQUES; INSTRUMENTS DANS LESQUELS LES SONS SONT PRODUITS PAR DES MOYENS ÉLECTROMÉCANIQUES OU DES GÉNÉRATEURS ÉLECTRONIQUES, OU DANS LESQUELS LES SONS SONT SYNTHÉTISÉS À PARTIR D'UNE MÉMOIRE DE DONNÉES Éléments d'instruments de musique électrophoniques
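A minimal sketch of the multi-output comparison in the entry above, written against PyTorch; the network itself (e.g. one U-Net output per content type) is left abstract, and L1 is an assumed choice of comparison.

    import torch
    import torch.nn.functional as F

    def multi_source_loss(outputs: list[torch.Tensor], targets: list[torch.Tensor]) -> torch.Tensor:
        # One output f(X, Θ) per content type (e.g. vocals, accompaniment), each
        # compared with its own target signal; training reduces the summed result.
        return sum(F.l1_loss(out, tgt) for out, tgt in zip(outputs, targets))

    # loss = multi_source_loss(model(mixture_spectrogram), [vocals_target, accompaniment_target])
    # loss.backward()   # adjust at least one parameter to reduce each comparison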
An electronic device generates a respective user queue for each user of a plurality of users participating in a shared listening session. While providing a first media content item for playback, the device receives a second request, from a first user, to add a second media content item to the shared playback queue and updates the respective user queue for the first user. After receiving the second request, the electronic device receives a third request, from a second user, to add a third media content item to the shared playback queue and updates the respective user queue for the second user. The electronic device updates the shared playback queue using the respective user queues of the first user and the second user, including positioning the third media content item in an order of the shared playback queue to be played back before the second media content item.
H04N 21/458 - Ordonnancement de contenu pour créer un flux personnalisé, p.ex. en combinant une publicité stockée localement avec un flux d'entrée; Opérations de mise à jour, p.ex. pour modules de système d'exploitation
H04N 21/472 - Interface pour utilisateurs finaux pour la requête de contenu, de données additionnelles ou de services; Interface pour utilisateurs finaux pour l'interaction avec le contenu, p.ex. pour la réservation de contenu ou la mise en place de rappels, pour la requête de notification d'événement ou pour la transformation de contenus affichés
H04N 21/442 - Surveillance de procédés ou de ressources, p.ex. détection de la défaillance d'un dispositif d'enregistrement, surveillance de la bande passante sur la voie descendante, du nombre de visualisations d'un film, de l'espace de stockage disponible dans l
H04N 21/25 - Opérations de gestion réalisées par le serveur pour faciliter la distribution de contenu ou administrer des données liées aux utilisateurs finaux ou aux dispositifs clients, p.ex. authentification des utilisateurs finaux ou des dispositifs clients ou
H04N 21/258 - Gestion de données liées aux clients ou aux utilisateurs finaux, p.ex. gestion des capacités des clients, préférences ou données démographiques des utilisateurs, traitement des multiples préférences des utilisateurs finaux pour générer des données co
H04N 21/45 - Opérations de gestion réalisées par le client pour faciliter la réception de contenu ou l'interaction avec le contenu, ou pour l'administration des données liées à l'utilisateur final ou au dispositif client lui-même, p.ex. apprentissage des préféren
H04N 21/466 - Procédé d'apprentissage pour la gestion intelligente, p.ex. apprentissage des préférences d'utilisateurs pour recommander des films
H04N 21/262 - Ordonnancement de la distribution de contenus ou de données additionnelles, p.ex. envoi de données additionnelles en dehors des périodes de pointe, mise à jour de modules de logiciel, calcul de la fréquence de transmission de carrousel, retardement d
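One possible way to rebuild the shared queue from the per-user queues in the entry above: serve the user with the fewest items played so far, which can place the second user's request ahead of the first user's earlier request, as in the entry. The fairness rule and the data shapes are assumptions; the entry does not fix a specific policy.

    def merge_user_queues(user_queues: dict[str, list[str]], plays_so_far: dict[str, int]) -> list[str]:
        queues = {user: list(queue) for user, queue in user_queues.items()}
        served = dict(plays_so_far)
        shared = []
        while any(queues.values()):
            # Pick the user with pending items who has been served least so far.
            user = min((u for u in queues if queues[u]), key=lambda u: served.get(u, 0))
            shared.append(queues[user].pop(0))
            served[user] = served.get(user, 0) + 1
        return shared

    # merge_user_queues({"first_user": ["second_item"], "second_user": ["third_item"]},
    #                   plays_so_far={"first_user": 1, "second_user": 0})
    # -> ["third_item", "second_item"]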
90.
SYSTEMS AND METHODS FOR SEQUENCING A PLAYLIST OF MEDIA ITEMS
A server system receives a request to generate a playlist. The playlist includes a sequence of media items. The server system receives a plurality of constraints that define disqualification criteria for excluding media items from a respective slot in the sequence of media items. The plurality of constraints for the respective slot in the sequence of media items includes at least one constraint that is based on already-populated slots in the sequence of media items. The server system generates the playlist by sequentially populating each respective slot in the sequence of media items, including selecting, for the respective slot, a respective media item that meets the plurality of constraints for the respective slot in the sequence of media items. The server system provides the playlist to a user of the media providing service.
H04N 21/262 - Ordonnancement de la distribution de contenus ou de données additionnelles, p.ex. envoi de données additionnelles en dehors des périodes de pointe, mise à jour de modules de logiciel, calcul de la fréquence de transmission de carrousel, retardement d
H04N 21/454 - Filtrage de contenu, p.ex. blocage des publicités
H04N 21/239 - Interfaçage de la voie montante du réseau de transmission, p.ex. établissement de priorité des requêtes de clients
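A minimal sketch of the sequential slot population in the playlist-sequencing entry above; constraints are modelled as callables that see the candidate, the slot index, and the already-populated slots, and return True when the candidate is disqualified. Greedy first-fit selection is an assumption.

    def generate_playlist(candidates, num_slots, constraints):
        playlist = []
        for slot in range(num_slots):
            for item in candidates:
                if item in playlist:
                    continue
                # Disqualification criteria for this slot, some of which may
                # depend on the already-populated slots.
                if any(constraint(item, slot, playlist) for constraint in constraints):
                    continue
                playlist.append(item)
                break
        return playlist

    # Example constraint based on already-populated slots: no artist repeats back to back.
    no_artist_repeat = lambda item, slot, playlist: bool(playlist) and item["artist"] == playlist[-1]["artist"]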
Systems, devices, apparatuses, components, methods, and techniques are provided for a simple user interface that can facilitate discovery of contextually relevant media content with minimal navigation. For example, the disclosed user interface may present contextually relevant categories, sub-categories, and media content items while concurrently playing a media content item predicted to likely be selected by the user.
H04N 21/442 - Surveillance de procédés ou de ressources, p.ex. détection de la défaillance d'un dispositif d'enregistrement, surveillance de la bande passante sur la voie descendante, du nombre de visualisations d'un film, de l'espace de stockage disponible dans l
H04N 21/2668 - Création d'un canal pour un groupe dédié d'utilisateurs finaux, p.ex. en insérant des publicités ciblées dans un flux vidéo en fonction des profils des utilisateurs finaux
H04N 21/45 - Opérations de gestion réalisées par le client pour faciliter la réception de contenu ou l'interaction avec le contenu, ou pour l'administration des données liées à l'utilisateur final ou au dispositif client lui-même, p.ex. apprentissage des préféren
H04N 21/472 - Interface pour utilisateurs finaux pour la requête de contenu, de données additionnelles ou de services; Interface pour utilisateurs finaux pour l'interaction avec le contenu, p.ex. pour la réservation de contenu ou la mise en place de rappels, pour la requête de notification d'événement ou pour la transformation de contenus affichés
92.
SYSTEMS AND METHODS FOR DETERMINING DESCRIPTORS FOR MEDIA CONTENT ITEMS
An electronic device obtains a plurality of collections of media content items, each collection of media content items being associated with text generated by one or more users of the media-providing service. Based on how frequently a first media content item co-occurs with a first descriptor in text for respective collections of media items that include the first media content item, the electronic device generates, without user input, a new collection of media content items for a first user. The new collection of media content items corresponds to the first descriptor and includes the first media content item. The electronic device presents the new collection of media content items to the first user as a recommendation.
G06F 16/908 - Recherche caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement utilisant des métadonnées provenant automatiquement du contenu
G06F 16/9535 - Adaptation de la recherche basée sur les profils des utilisateurs et la personnalisation
G06F 16/68 - Recherche de données caractérisée par l’utilisation de métadonnées, p.ex. de métadonnées ne provenant pas du contenu ou de métadonnées générées manuellement
Technology for generating, reading, and using machine-readable codes is disclosed. There is a method, performed by an image capture device, for reading and using the codes. The method includes obtaining an image and identifying an area in the image having a machine-readable code. The method also includes, within the image area, finding a predefined start marker defining a start point and a predefined stop marker defining a stop point, an axis being defined therebetween. A plurality of axis points can be defined along the axis. For each axis point, a first distance within the image area to a mark is determined. The distance can be measured from the axis point in a first direction which is orthogonal to the axis. The first distances can be converted to a binary code using Gray code such that each first distance encodes at least one bit of data in the code.
G06K 19/06 - Supports d'enregistrement pour utilisation avec des machines et avec au moins une partie prévue pour supporter des marques numériques caractérisés par le genre de marque numérique, p.ex. forme, nature, code
G06K 7/14 - Méthodes ou dispositions pour la lecture de supports d'enregistrement par radiation corpusculaire utilisant la lumière sans sélection des longueurs d'onde, p.ex. lecture de la lumière blanche réfléchie
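A decoding sketch for the distance-to-bits step in the machine-readable code entry above: each measured distance is quantized to a level, the level is treated as a Gray-coded value, and Gray code is converted back to ordinary binary. The quantization step size is an assumption.

    def quantize(distance_px: float, step_px: float = 4.0) -> int:
        return round(distance_px / step_px)

    def gray_to_binary(gray: int) -> int:
        binary = gray
        while gray:
            gray >>= 1
            binary ^= gray
        return binary

    def decode_axis_points(distances_px: list[float]) -> list[int]:
        # One decoded value per axis point; each measured distance carries at
        # least one bit, more if the distance is quantized into many levels.
        return [gray_to_binary(quantize(d)) for d in distances_px]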
94.
Display screen with animated graphical user interface
Audio content episodes are received. Using machine learning, one or more audio segments of interest are identified in each of the audio content episodes based at least in part on an analysis of content included in a corresponding audio content episode. Each of the identified audio segments is associated with one or more automatically determined tags. Using machine learning, a recommended audio segment is selected for a specific user from the identified audio segments based at least in part on attributes of the specific user and the automatically determined tags of the identified audio segments. The recommended audio segment is automatically provided in an audio segment feed.
G10L 25/51 - Techniques d'analyses de la parole ou de la voix qui ne se limitent pas à un seul des groupes spécialement adaptées pour un usage particulier pour comparaison ou différentiation
G06F 3/14 - Sortie numérique vers un dispositif de visualisation
G10L 17/00 - Identification ou vérification du locuteur
G06N 5/04 - Modèles d’inférence ou de raisonnement
G06F 16/638 - Présentation des résultats des requêtes
G06F 16/64 - Navigation; Visualisation à cet effet
A system, method, and computer product for combining audio tracks. In one example embodiment herein, the method comprises determining at least one music track that is musically compatible with a base music track, aligning those tracks in time, and combining the tracks. In one example embodiment herein, the tracks may be music tracks of different songs, the base music track can be an instrumental accompaniment track, and the at least one music track can be a vocal track. Also in one example embodiment herein, the determining is based on musical characteristics associated with at least one of the tracks, such as an acoustic feature vector distance between tracks, a likelihood of at least one track including a vocal component, a tempo, or a musical key. Also, the determining of musical compatibility can include determining at least one of a vertical musical compatibility or a horizontal musical compatibility among tracks.
G10H 1/00 - INSTRUMENTS DE MUSIQUE ÉLECTROPHONIQUES; INSTRUMENTS DANS LESQUELS LES SONS SONT PRODUITS PAR DES MOYENS ÉLECTROMÉCANIQUES OU DES GÉNÉRATEURS ÉLECTRONIQUES, OU DANS LESQUELS LES SONS SONT SYNTHÉTISÉS À PARTIR D'UNE MÉMOIRE DE DONNÉES Éléments d'instruments de musique électrophoniques
97.
Anomaly Detection Using Gaussian Process Variational Autoencoder (GPVAE)
A method comprises the following steps: providing a Gaussian process variational autoencoder (GP-VAE) including a Gaussian process (GP) encoder and a neural network decoder; selecting a plurality of inducing points in a data space; generating a mapping of the plurality of inducing points in a latent space; and training the GP-VAE using a training dataset.
Apparatus, methods, and computer-readable media are provided for processing wind noise. Audio input is processed by receiving an audio input from a microphone array. A wind noise level representative of wind noise at the microphone array is measured using the audio input, and a determination is made, based on the wind noise level, whether to perform either (i) a wind noise suppression process on the audio input on-device, or (ii) the wind noise suppression process on the audio input on-device and an audio reconstruction process in-cloud.
G10L 21/0232 - Traitement dans le domaine fréquentiel
H04R 1/40 - Dispositions pour obtenir la fréquence désirée ou les caractéristiques directionnelles pour obtenir la caractéristique directionnelle désirée uniquement en combinant plusieurs transducteurs identiques