Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing multi-modal inputs using language models. In particular, the inputs include an image, and the image is encoded by an image encoder neural network to generate a sequence of image embeddings representing the image. The sequence of image embeddings is provided as at least part of an input sequence that is processed by a language model neural network.
G06N 3/04 - Architecture, e.g. interconnection topology
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Systems and methods for training rate control neural networks through reinforcement learning. During training, reward values for training examples are generated from the current performance of the rate control neural network in encoding the video in the training example and the historical performance of the rate control neural network in encoding the video in the training example.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that includes one or more graph neural network layers. In one aspect, a method comprises: generating data defining a graph, comprising: generating a respective final feature representation for each node, wherein, for each of one or more of the nodes, the respective final feature representation is a modified feature representation that is generated from a respective feature representation for the node using respective noise; processing the data defining the graph using one or more of the graph neural network layers of the neural network to generate a respective updated node embedding of each node; and processing, for each of one or more of the nodes having modified feature representations, the updated node embedding of the node to generate a respective de-noising prediction for the node.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a network output using a neural network. In one aspect, a method comprises: obtaining: (i) a network input to a neural network, and (ii) a set of query embeddings; processing the network input using the neural network to generate a network output that comprises a respective dimension corresponding to each query embedding in the set of query embeddings, comprising: processing the network input using an encoder block of the neural network to generate a representation of the network input as a set of latent embeddings; and processing: (i) the set of latent embeddings, and (ii) the set of query embeddings, using a cross-attention block that generates each dimension of the network output by cross-attention of a corresponding query embedding over the set of latent embeddings.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for unmasking a masked representation of a protein using a protein reconstruction neural network. In one aspect, a method comprises: receiving the masked representation of the protein; and processing the masked representation of the protein using the protein reconstruction neural network to generate a respective predicted embedding corresponding to one or more masked embeddings that are included in the masked representation of the protein, wherein a predicted embedding corresponding to a masked embedding in a representation of the amino acid sequence of the protein defines a prediction for an identity of an amino acid at a corresponding position in the amino acid sequence, wherein a predicted embedding corresponding to a masked embedding in a representation of the structure of the protein defines a prediction for a corresponding structural feature of the protein.
There is disclosed a computer-implemented method for training a neural network. The method comprises determining a gradient associated with a parameter of the neural network. The method further comprises determining a ratio of a gradient norm to parameter norm and comparing the ratio to a threshold. In response to determining that the ratio exceeds the threshold, the value of the gradient is reduced such that the ratio is equal to or below the threshold. The value of the parameter is updated based upon the reduced gradient value.
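The clipping rule described above can be sketched in a few lines of NumPy. The function name, the epsilon guard against a zero parameter norm, and the plain SGD step at the end are illustrative choices, not taken from the disclosure:

```python
import numpy as np

def clip_gradient_by_ratio(grad, param, threshold, eps=1e-12):
    """Scale `grad` down so that ||grad|| / ||param|| <= threshold."""
    grad_norm = np.linalg.norm(grad)
    param_norm = np.linalg.norm(param)
    ratio = grad_norm / (param_norm + eps)
    if ratio > threshold:
        # Reduce the gradient so the ratio equals the threshold exactly.
        grad = grad * (threshold * param_norm / grad_norm)
    return grad

# One training step: clip the gradient, then apply a plain SGD update.
param = np.array([3.0, 4.0])    # ||param|| = 5
grad = np.array([30.0, 40.0])   # ||grad|| = 50, so the ratio is 10
clipped = clip_gradient_by_ratio(grad, param, threshold=1.0)  # scaled to norm 5
param_new = param - 0.1 * clipped
```

With threshold 1.0, the gradient `[30, 40]` is scaled to `[3, 4]` so its norm matches the parameter norm before the update is applied.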
This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
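The cross-attention / self-attention structure of the blocks above can be sketched in plain NumPy. This is a single-head sketch with no learned attention projections, and the mean-pool-and-project output block is a hypothetical stand-in for the output block of the method:

```python
import numpy as np

def attend(queries, keys, values):
    """Single-head scaled dot-product attention, no learned projections."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 16))     # data element embeddings for the entity
latents = rng.normal(size=(8, 16))    # small fixed set of latent embeddings

# Cross-attention block: each latent attends over the data element embeddings.
latents = latents + attend(latents, data, data)
# Self-attention block: each latent attends over the set of latent embeddings.
latents = latents + attend(latents, latents, latents)
# Output block (hypothetical): pool the latents and project to the network output.
w_out = rng.normal(size=(16, 1))
output = latents.mean(axis=0) @ w_out
```

Because the latents are few and fixed in number, the cross-attention cost grows linearly with the number of data element embeddings rather than quadratically.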
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing protein design. In one aspect, a method comprises: processing an input characterizing a target protein structure of a target protein using an embedding neural network having a plurality of embedding neural network parameters to generate an embedding of the target protein structure of the target protein; determining a predicted amino acid sequence of the target protein based on the embedding of the target protein structure, comprising: conditioning a generative neural network having a plurality of generative neural network parameters on the embedding of the target protein structure; and generating, by the generative neural network conditioned on the embedding of the target protein structure, a representation of the predicted amino acid sequence of the target protein.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for rendering a new image that depicts a scene from a perspective of a camera at a new camera location. In one aspect, a method comprises: receiving a plurality of observations characterizing the scene; generating a latent variable representing the scene from the plurality of observations characterizing the scene; conditioning a scene representation neural network on the latent variable representing the scene, wherein the scene representation neural network conditioned on the latent variable representing the scene defines a geometric model of the scene as a three-dimensional (3D) radiance field; and rendering the new image that depicts the scene from the perspective of the camera at the new camera location using the scene representation neural network conditioned on the latent variable representing the scene.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining principal components of a data set using multi-agent interactions. One of the methods includes obtaining initial estimates for a plurality of principal components of a data set; and generating a final estimate for each principal component by repeatedly performing operations comprising: generating a reward estimate using the current estimate of the principal component, wherein the reward estimate is larger if the current estimate of the principal component captures more variance in the data set; generating, for each parent principal component of the principal component, a punishment estimate, wherein the punishment estimate is larger if the current estimate of the principal component and the current estimate of the parent principal component are not orthogonal; and updating the current estimate of the principal component according to a difference between the reward estimate and the punishment estimates.
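A minimal NumPy sketch of the reward-minus-punishment update on a toy data set follows. The learning rate, iteration counts, and the specific form of the punishment term are illustrative assumptions; "parent" components are taken to be those with smaller index:

```python
import numpy as np

def eigengame_step(V, M, i, lr=0.1):
    """One update of the i-th principal-component estimate V[:, i]."""
    v = V[:, i]
    reward = M @ v                            # larger along high-variance directions
    penalty = np.zeros_like(v)
    for j in range(i):                        # "parent" components: smaller indices
        p = V[:, j]
        penalty += (v @ M @ p) / (p @ M @ p) * (M @ p)
    v = v + lr * (reward - penalty)           # move by reward minus punishments
    V[:, i] = v / np.linalg.norm(v)           # keep the estimate unit-length
    return V

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
M = X.T @ X / len(X)                          # (approximate) covariance of the data
V = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # random orthonormal initial estimates
for _ in range(300):
    for i in range(4):
        V = eigengame_step(V, M, i)
```

On this data the estimates converge (up to sign) to the axis-aligned principal directions, with each later component pushed to be orthogonal to its parents.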
A computer-implemented method of training a neural network. The method comprises processing a first transformed view of a training data item, e.g. an image, with a target neural network to generate a target output, processing a second transformed view of the training data item, e.g. image, with an online neural network to generate a prediction of the target output, updating parameters of the online neural network to minimize an error between the prediction of the target output and the target output, and updating parameters of the target neural network based on the parameters of the online neural network. The method can effectively train an encoder neural network without using labelled training data items, and without using a contrastive loss, i.e. without needing "negative examples" which comprise transformed views of different data items.
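The online/target interplay above can be sketched with linear maps standing in for the two networks. The additive-noise "augmentation", the folding of the predictor into the online map, and the 0.99 moving-average rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(8, 4)) * 0.5   # online network (predictor folded in)
xi = theta.copy()                        # target network starts as a copy

def augment(x, rng):
    """Stand-in for a random transformation of the data item: additive noise."""
    return x + 0.1 * rng.normal(size=x.shape)

x = rng.normal(size=8)                   # one unlabelled training data item
for _ in range(100):
    v1, v2 = augment(x, rng), augment(x, rng)   # two transformed views
    target = xi.T @ v1                   # target output (treated as a constant)
    pred = theta.T @ v2                  # online prediction of the target output
    err = pred - target
    theta -= 0.05 * np.outer(v2, err)    # gradient step on the squared error
    xi = 0.99 * xi + 0.01 * theta        # update target from the online params
```

No negative examples appear anywhere: the loss only compares the two views of the same item, and the slowly moving target prevents a trivial collapse in the full method.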
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.
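One plausible instantiation of the scheme-selection policy is a bandit algorithm. The sketch below uses a UCB1-style rule over hypothetical (discount, exploration-bonus weight) pairs, with a noisy stand-in payoff in place of real task episodes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical return computation schemes: (discount, exploration-bonus weight).
schemes = [(0.99, 0.3), (0.997, 0.1), (0.999, 0.0)]
counts = np.zeros(len(schemes))
values = np.zeros(len(schemes))   # running mean of episode reward per scheme

def run_episode(scheme, rng):
    """Stand-in for controlling the agent: a noisy, scheme-dependent payoff."""
    gamma, beta = scheme
    return gamma - beta + 0.1 * rng.normal()

for t in range(1, 501):
    # UCB1-style policy for selecting between the return computation schemes.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1e-9))
    k = int(np.argmax(values + bonus))
    reward = run_episode(schemes[k], rng)
    counts[k] += 1
    values[k] += (reward - values[k]) / counts[k]   # update the policy's estimates
```

Over the 500 episodes the policy concentrates its choices on the scheme with the highest identified reward, while the bonus term keeps occasionally re-trying the others.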
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an environment representation neural network of a reinforcement learning system that controls an agent to perform a given task. In one aspect, the method includes: receiving a current observation input and a future observation input; generating, from the future observation input, a future latent representation of the future state of the environment; processing the current observation input, using the environment representation neural network, to generate a current internal representation of the current state of the environment; generating, from the current internal representation, a predicted future latent representation; evaluating an objective function measuring a difference between the future latent representation and the predicted future latent representation; and determining, based on a determined gradient of the objective function, an update to the current values of the environment representation parameters.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions to be performed by an agent interacting with an environment to cause the agent to perform a task. One of the methods includes: receiving a current observation characterizing a current environment state of the environment; performing a plurality of planning iterations to generate plan data that indicates, for each action in a set of actions, a respective value to performing the task of the agent performing the action in the environment starting from the current environment state, wherein performing each planning iteration comprises selecting a sequence of actions to be performed by the agent starting from the current environment state based on outputs generated by a dynamics model and a prediction model; and selecting, from the set of actions, an action to be performed by the agent in response to the current observation based on the plan data.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing depth-parallel training of a neural network. One of the methods includes receiving an input sequence; and at each processing time step in a sequence of processing time steps: processing an input item using a first layer block in a stack of layer blocks to generate a first block output; for each subsequent layer block, processing a block output generated by the preceding layer block at the preceding processing time step to generate a current block output; computing (i) a current error in an output item generated by the final layer block and (ii) a current gradient of the current error; generating a parameter update for the final layer block; for each particular layer block that is not the final layer block, computing a current gradient for the particular layer block and generating a parameter update.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input that is a sequence to generate a network output. In one aspect, one of the methods includes, for each particular sequence of layer inputs: for each attention layer in the neural network: maintaining episodic memory data; maintaining compressed memory data; receiving a layer input to be processed by the attention layer; and applying an attention mechanism over (i) the compressed representation in the compressed memory data for the layer, (ii) the hidden states in the episodic memory data for the layer, and (iii) the respective hidden state at each of the plurality of input positions in the particular network input to generate a respective activation for each input position in the layer input.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output audio examples using a generative neural network. One of the methods includes obtaining a training conditioning text input; processing a training generative input comprising the training conditioning text input using a feedforward generative neural network to generate a training audio output; processing the training audio output using each of a plurality of discriminators, wherein the plurality of discriminators comprises one or more conditional discriminators and one or more unconditional discriminators; determining a first combined prediction by combining the respective predictions of the plurality of discriminators; and determining an update to current values of a plurality of generative parameters of the feedforward generative neural network to increase a first error in the first combined prediction.
A neural network system includes at least one layer which applies a 1×1 convolution to a dense activation matrix, using a kernel defined by a sparse weight matrix. The layer is implemented by a processor with access to a sparsity dataset which indicates where the null weights are located in the weight matrix. The processor selects the feature values corresponding to the other weights from a memory unit configured to store the activation matrix, and then uses these extracted feature values for calculating the convolved values.
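Since a 1×1 convolution is a per-pixel matmul over the channel dimension, the computation from nonzero weights only can be sketched in NumPy. Here the "sparsity dataset" is represented simply as the row/column indices of the non-null weights, an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c_in, c_out = 4, 4, 8, 3
acts = rng.normal(size=(h, w, c_in))             # dense activation matrix

# Sparse weight matrix; the "sparsity dataset" is the list of nonzero locations.
dense_w = rng.normal(size=(c_in, c_out))
dense_w[rng.random(dense_w.shape) < 0.7] = 0.0   # roughly 70% null weights
rows, cols = np.nonzero(dense_w)                 # where the non-null weights live
vals = dense_w[rows, cols]

# Compute the 1x1 convolution from the nonzeros only, selecting just the
# feature values (input channels) that correspond to non-null weights:
out = np.zeros((h, w, c_out))
for r, c, v in zip(rows, cols, vals):
    out[:, :, c] += acts[:, :, r] * v
```

The loop touches only the stored nonzero weights, so its cost scales with the number of non-null entries rather than with the full `c_in × c_out` kernel.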
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing protein structure prediction and protein domain segmentation. In one aspect, a method comprises generating a plurality of predicted structures of a protein, wherein generating a predicted structure of the protein comprises: updating initial values of a plurality of structure parameters of the protein, comprising, at each of a plurality of update iterations: determining a gradient of a quality score for the current values of the structure parameters with respect to the current values of the structure parameters; and updating the current values of the structure parameters using the gradient.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing protein structure prediction. In one aspect, a method comprises generating a distance map for a given protein, wherein the given protein is defined by a sequence of amino acid residues arranged in a structure, wherein the distance map characterizes estimated distances between the amino acid residues in the structure, comprising: generating a plurality of distance map crops, wherein each distance map crop characterizes estimated distances between (i) amino acid residues in each of one or more respective first positions in the sequence and (ii) amino acid residues in each of one or more respective second positions in the sequence in the structure of the protein, wherein the first positions are a proper subset of the sequence; and generating the distance map for the given protein using the plurality of distance map crops.
G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
PREDICTING PROTEIN STRUCTURES USING GEOMETRY NEURAL NETWORKS THAT ESTIMATE SIMILARITY BETWEEN PREDICTED PROTEIN STRUCTURES AND ACTUAL PROTEIN STRUCTURES
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing protein structure prediction. In one aspect, a method comprises, at each of one or more iterations: determining an alternative predicted structure of a given protein defined by alternative values of structure parameters; processing, using a geometry neural network, a network input comprising: (i) a representation of a sequence of amino acid residues in the given protein, and (ii) the alternative values of the structure parameters, to generate an output characterizing an alternative geometry score that is an estimate of a similarity measure between the alternative predicted structure and the actual structure of the given protein.
G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
H04W 4/18 - Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
G10L 13/00 - Speech synthesis; Text to speech systems
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
G10L 99/00 - Subject matter not provided for in other groups of this subclass
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10H 7/00 - Instruments in which the tones are synthesised from a data store, e.g. computer organs
G10L 13/04 - Methods for producing synthetic speech; Speech synthesisers - Details of speech synthesis systems, e.g. synthesiser structure or memory management
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes one or more computers configured to implement a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network. Aspects of the present specification have the technical effect of faster training of a neural network and/or reducing the memory requirements for the training.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
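A stripped-down sketch of the critic and actor updates over a minibatch follows, with linear function approximators standing in for the networks. For brevity it omits the separate target networks the full method maintains and bootstraps from the current critic instead; all sizes and rates are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
s_dim, a_dim = 3, 1
w = rng.normal(size=s_dim + a_dim) * 0.1       # critic: Q(s, a) = w . [s; a]
theta = rng.normal(size=(a_dim, s_dim)) * 0.1  # actor: mu(s) = theta @ s
gamma, lr = 0.9, 0.01

# A minibatch of hypothetical experience tuples (observation, action, reward, next).
batch = [(rng.normal(size=s_dim), rng.normal(size=a_dim),
          rng.normal(), rng.normal(size=s_dim)) for _ in range(32)]

for s, a, r, s2 in batch:
    a2 = theta @ s2                                      # actor's action at s'
    target = r + gamma * w @ np.concatenate([s2, a2])    # bootstrapped target output
    q = w @ np.concatenate([s, a])                       # critic output for the tuple
    w += lr * (target - q) * np.concatenate([s, a])      # critic: TD-error gradient
    # Actor: deterministic policy gradient, dQ/da * dmu/dtheta. Since this Q is
    # linear in the action, dQ/da is just the action slice of w.
    theta += lr * np.outer(w[s_dim:], s)
```

The critic is fit to the target outputs, and the actor is then nudged in the direction that increases the critic's value of its own actions.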