Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a target action selection policy to control a target agent interacting with an environment. In one aspect, a method comprises: obtaining a set of offline training data, wherein the offline training data characterizes interaction of a baseline agent with an environment as the baseline agent performs actions selected in accordance with a baseline action selection policy; generating a set of online training data that characterizes interaction of the target agent with the environment as the target agent performs actions selected in accordance with the target action selection policy; and training the target action selection policy on both: (i) the offline training data, and (ii) the online training data, wherein the training of the target action selection policy on the offline training data is conditioned on a measure of competency of the baseline agent.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction characterizing an environment. In one aspect, a method includes obtaining a respective observation characterizing a state of an environment for each time step in a sequence of multiple time steps, comprising, for each time step after a first time step in the sequence of time steps: processing a network input that comprises observations obtained for one or more preceding time steps to generate a plurality of acquisition decisions; obtaining an observation for the time step, wherein the observation includes data corresponding to modalities that are selected for acquisition at the time step and does not include data corresponding to modalities that are not selected for acquisition at the time step; and processing a model input that includes the observation for each time step in the sequence of time steps to generate the prediction.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G06N 3/04 - Architecture, e.g. interconnection topology
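The per-time-step loop in the audio-generation abstract above can be illustrated with a deliberately tiny sketch. It assumes a single hand-set causal filter (`kernel`, `dilation`) in place of the convolutional subnetwork and a hypothetical softmax output layer over a small set of quantized sample values (`levels`). None of these names come from the source. The sketch shows only the structure: the alternative representation for each step is computed solely from samples that precede it, and the output layer turns that representation into a normalized score distribution.

```python
import math


def alternative_representation(current_sequence, kernel, dilation=1):
    """Toy causal filter: combines only samples that precede the next step.

    Tap k reads the sample k * dilation positions before the most recent one;
    positions before the start of the sequence are treated as zero padding.
    """
    t = len(current_sequence)
    rep = 0.0
    for k, weight in enumerate(kernel):
        idx = t - 1 - k * dilation
        if idx >= 0:
            rep += weight * current_sequence[idx]
    return rep


def score_distribution(rep, levels):
    """Hypothetical output layer: softmax over the possible sample values."""
    logits = [-(rep - level) ** 2 for level in levels]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def generate(prefix, num_steps, kernel, levels, dilation=1):
    """Autoregressively appends one sample per time step (greedy decoding)."""
    sequence = list(prefix)
    for _ in range(num_steps):
        rep = alternative_representation(sequence, kernel, dilation)
        probs = score_distribution(rep, levels)
        best = max(range(len(levels)), key=lambda i: probs[i])
        sequence.append(levels[best])
    return sequence
```

Greedy selection keeps the example deterministic. Sampling from `probs` would be the stochastic alternative.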
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.
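The following is a minimal sketch of the action-selection step. It assumes that `quantile_fn(action, observation, tau)` stands in for the trained quantile function network, that probability values are drawn uniformly from [0, 1), and that the mean serves as the measure of central tendency. `toy_quantile_fn` is a hypothetical stand-in whose returns are uniform on a fixed range for each action.

```python
import random
import statistics


def select_action(actions, observation, quantile_fn, num_samples=8, rng=None):
    """Picks the action whose sampled return quantiles have the highest mean."""
    rng = rng or random.Random()
    best_action, best_value = None, float("-inf")
    for action in actions:
        # Randomly sample probability values tau ~ U(0, 1) for this action.
        taus = [rng.random() for _ in range(num_samples)]
        # Estimated tau-quantiles of the return distribution for this action.
        quantiles = [quantile_fn(action, observation, tau) for tau in taus]
        # Measure of central tendency: here, the mean of the estimated quantiles.
        value = statistics.fmean(quantiles)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


def toy_quantile_fn(action, observation, tau):
    """Hypothetical stand-in: returns are uniform on an action-specific range."""
    low, high = {"left": (0.0, 1.0), "right": (2.0, 3.0)}[action]
    return low + tau * (high - low)
```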
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a graph model representing an environment being interacted with by an agent. In one aspect, one of the methods includes: obtaining experience data; using the experience data to update a visitation count for each of one or more state-action pairs represented by the graph model; and at each of multiple environment exploration steps: computing a utility measure for each of the one or more state-action pairs represented by the graph model; determining, based on the utility measures, a sequence of one or more planned actions that have an information gain that satisfies a threshold; and controlling the agent to perform the sequence of one or more planned actions to cause the environment to transition from a state characterized by a last observation received after a last action in the experience data into a different state.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions for an agent in a target environment. In particular, the actions are selected using an environment model for the target environment that is parameterized using interactions of the agent with the target environment and one or more source environments.
A method performed by one or more computers for obtaining an optimized algorithm that (i) is functionally equivalent to a target algorithm and (ii) optimizes one or more target properties when executed on a target set of one or more hardware devices. The method includes: initializing a target tensor representing the target algorithm; generating, using a neural network having a plurality of network parameters, a tensor decomposition of the target tensor that parametrizes a candidate algorithm; generating target property values for each of the target properties when executing the candidate algorithm on the target set of hardware devices; determining a benchmarking score for the tensor decomposition based on the target property values of the candidate algorithm; generating a training example from the tensor decomposition and the benchmarking score; and storing, in a training data store, the training example for use in updating the network parameters of the neural network.
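To make "a tensor decomposition that parametrizes a candidate algorithm" concrete, the sketch below uses a known decomposition in place of one produced by a neural network: Strassen's rank-7 factors of the 2x2 matrix-multiplication tensor. Each rank-one factor is one scalar multiplication, so rank 7 means 7 multiplications instead of the naive 8. The flat index layout and all function names are assumptions made for illustration. Benchmarking on hardware and storing training examples are not shown.

```python
# Flattened layout (an assumption of this sketch): a = [A11, A12, A21, A22],
# b = [B11, B12, B21, B22], c = [C11, C12, C21, C22].
# Each rank-one factor (u, v, w) is one scalar multiplication
# m = (u . a) * (v . b), added into the outputs with coefficients w.
STRASSEN = [
    ([1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]),    # (A11+A22)(B11+B22)
    ([0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, -1]),   # (A21+A22)B11
    ([1, 0, 0, 0], [0, 1, 0, -1], [0, 1, 0, 1]),   # A11(B12-B22)
    ([0, 0, 0, 1], [-1, 0, 1, 0], [1, 0, 1, 0]),   # A22(B21-B11)
    ([1, 1, 0, 0], [0, 0, 0, 1], [-1, 1, 0, 0]),   # (A11+A12)B22
    ([-1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]),   # (A21-A11)(B11+B12)
    ([0, 1, 0, -1], [0, 0, 1, 1], [1, 0, 0, 0]),   # (A12-A22)(B21+B22)
]


def matmul_tensor(n=2):
    """Target tensor: T[a][b][c] = 1 iff a_entry * b_entry contributes to c_entry."""
    size = n * n
    t = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                t[i * n + k][k * n + j][i * n + j] = 1
    return t


def reconstruct(factors, size=4):
    """Sums the rank-one tensors u (x) v (x) w given by the decomposition."""
    t = [[[0] * size for _ in range(size)] for _ in range(size)]
    for u, v, w in factors:
        for p in range(size):
            for q in range(size):
                for r in range(size):
                    t[p][q][r] += u[p] * v[q] * w[r]
    return t


def run_candidate(factors, a, b):
    """Executes the multiplication algorithm parametrized by the decomposition."""
    c = [0] * len(factors[0][2])
    for u, v, w in factors:
        m = sum(x * y for x, y in zip(u, a)) * sum(x * y for x, y in zip(v, b))
        for idx, coeff in enumerate(w):
            c[idx] += coeff * m
    return c
```

If `reconstruct(STRASSEN)` equals `matmul_tensor()`, the candidate algorithm is functionally equivalent to ordinary 2x2 matrix multiplication.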
There is disclosed a computer-implemented method for training a neural network. The method comprises determining a gradient associated with a parameter of the neural network. The method further comprises determining a ratio of a gradient norm to a parameter norm and comparing the ratio to a threshold. In response to determining that the ratio exceeds the threshold, the value of the gradient is reduced such that the ratio is equal to or below the threshold. The value of the parameter is updated based upon the reduced gradient value.
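The clipping rule fits in a few lines. This sketch assumes the norms are taken over the whole parameter, giving one ratio per parameter tensor (represented here as a flat list). It also adds a hypothetical floor `eps` on the parameter norm. The scale is chosen so that the reduced gradient sits exactly at the threshold.

```python
import math


def clip_by_norm_ratio(grad, param, threshold, eps=1e-3):
    """Rescales `grad` so that ||grad|| / ||param|| <= threshold.

    `eps` is a hypothetical floor on the parameter norm so that zero-initialised
    parameters do not cause a division by zero.
    """
    grad_norm = math.sqrt(sum(g * g for g in grad))
    param_norm = max(math.sqrt(sum(p * p for p in param)), eps)
    if grad_norm / param_norm > threshold:
        # Shrink the gradient so the ratio lands exactly on the threshold.
        scale = threshold * param_norm / grad_norm
        return [g * scale for g in grad]
    return list(grad)


def sgd_step(param, grad, learning_rate, threshold):
    """Updates the parameter using the (possibly reduced) gradient."""
    clipped = clip_by_norm_ratio(grad, param, threshold)
    return [p - learning_rate * g for p, g in zip(param, clipped)]
```

For example, with `param = [3.0, 4.0]` (norm 5) and `grad = [6.0, 8.0]` (norm 10), the ratio is 2. A threshold of 0.5 scales the gradient by 0.25, to `[1.5, 2.0]`.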
There is provided a computer-implemented method for updating a search distribution of an evolutionary strategies optimizer using an optimizer neural network comprising one or more attention blocks. The method comprises receiving a plurality of candidate solutions, one or more parameters defining the search distribution that the plurality of candidate solutions are sampled from, and fitness score data indicating a fitness of each respective candidate solution of the plurality of candidate solutions. The method further comprises processing, by the one or more attention neural network blocks, the fitness score data using an attention mechanism to generate respective recombination weights corresponding to each respective candidate solution. The method further comprises updating the one or more parameters defining the search distribution based upon the recombination weights applied to the plurality of candidate solutions.
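The following is a toy sketch of the update. It rests on several assumptions not stated in the abstract:

- higher fitness is better;
- each candidate's fitness is standardized into a scalar feature;
- a single attention head with hand-set, untrained scalar projections `wq`, `wk`, `wv` stands in for the optimizer network's attention blocks;
- only the mean of the search distribution is updated, as a recombination-weighted average of the candidates.

```python
import math
import statistics


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, wq=1.0, wk=1.0, wv=1.0):
    """Single-head self-attention over scalar per-candidate features."""
    queries = [wq * f for f in features]
    keys = [wk * f for f in features]
    values = [wv * f for f in features]
    outputs = []
    for q in queries:
        weights = softmax([q * k for k in keys])
        outputs.append(sum(w * v for w, v in zip(weights, values)))
    return outputs


def recombination_weights(fitness, w_self=2.0, w_attn=1.0):
    """Maps fitness scores to recombination weights that sum to one."""
    mean = statistics.fmean(fitness)
    std = statistics.pstdev(fitness)
    # Standardise fitness so the weights do not depend on its scale.
    features = [(f - mean) / std if std > 0 else 0.0 for f in fitness]
    attended = self_attention(features)
    logits = [w_self * f + w_attn * a for f, a in zip(features, attended)]
    return softmax(logits)


def update_mean(candidates, fitness):
    """New search-distribution mean: weighted recombination of the candidates."""
    weights = recombination_weights(fitness)
    dim = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)]
```

In the method described above, the attention parameters would be learned, so nothing guarantees that the weights are monotone in fitness as they happen to be with these hand-set values.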
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a pathogenicity score characterizing a likelihood that a mutation to a protein is a pathogenic mutation, wherein the mutation modifies an amino acid sequence of the protein by replacing an original amino acid by a substitute amino acid at a mutation position in the amino acid sequence of the protein. In one aspect, a method comprises: generating a network input to a pathogenicity prediction neural network, wherein the network input comprises a multiple sequence alignment (MSA) representation that represents an MSA for the protein; processing the network input using the pathogenicity prediction neural network to generate a score distribution over a set of amino acids; and generating the pathogenicity score using the score distribution over the set of amino acids.
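The abstract does not say how the score distribution becomes a pathogenicity score. One simple choice, assumed here, is the log-likelihood ratio between the original and the substitute amino acid at the mutation position. The ordering of the 20-letter alphabet in `AMINO_ACIDS` is also an assumption.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def pathogenicity_score(score_distribution, original, substitute):
    """Log-likelihood ratio of the original versus the substitute amino acid.

    score_distribution: probabilities over AMINO_ACIDS at the mutation position.
    A larger score means the substitute is judged much less plausible than the
    original, which this sketch treats as more likely to be pathogenic.
    """
    p_original = score_distribution[AMINO_ACIDS.index(original)]
    p_substitute = score_distribution[AMINO_ACIDS.index(substitute)]
    return math.log(p_original) - math.log(p_substitute)
```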
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing protein design. In one aspect, a method comprises: processing an input characterizing a target protein structure of a target protein using an embedding neural network having a plurality of embedding neural network parameters to generate an embedding of the target protein structure of the target protein; determining a predicted amino acid sequence of the target protein based on the embedding of the target protein structure, comprising: conditioning a generative neural network having a plurality of generative neural network parameters on the embedding of the target protein structure; and generating, by the generative neural network conditioned on the embedding of the target protein structure, a representation of the predicted amino acid sequence of the target protein.
G16B 15/00 - ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of discrete tokens using a diffusion model. In one aspect, a method includes generating, by using the diffusion model, a final latent representation of the sequence of discrete tokens that includes a determined value for each of a plurality of latent variables; applying a de-embedding matrix to the final latent representation of the output sequence of discrete tokens to generate a de-embedded final latent representation that includes, for each of the plurality of latent variables, a respective numeric score for each discrete token in a vocabulary of multiple discrete tokens; selecting, for each of the plurality of latent variables, a discrete token from among the multiple discrete tokens in the vocabulary that has a highest numeric score; and generating the output sequence of discrete tokens that includes the selected discrete tokens.
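The final de-embedding and token-selection step can be sketched as follows; the diffusion sampling that produces the latents is not shown. The sketch assumes each latent variable is a D-dimensional vector and that the de-embedding matrix has shape D x V for a vocabulary of V tokens.

```python
def decode_tokens(latents, de_embedding, vocab):
    """Maps each latent vector to its highest-scoring vocabulary token.

    latents: list of latent vectors, each of dimension D.
    de_embedding: D x V matrix (list of D rows, each of length V = len(vocab)).
    """
    tokens = []
    for z in latents:
        # De-embedded scores: one numeric score per vocabulary token.
        scores = [
            sum(z[d] * de_embedding[d][v] for d in range(len(z)))
            for v in range(len(vocab))
        ]
        best = max(range(len(vocab)), key=scores.__getitem__)
        tokens.append(vocab[best])
    return tokens
```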
Methods and systems for performing a sequence of machine learning tasks. One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task, wherein the first DNN comprises a first plurality of indexed layers, and each layer in the first plurality of indexed layers is configured to receive a respective layer input and process the layer input to generate a respective layer output; and one or more subsequent DNNs corresponding to one or more respective machine learning tasks, wherein each subsequent DNN comprises a respective plurality of indexed layers, and each layer in a respective plurality of indexed layers with index greater than one receives input from a preceding layer of the respective subsequent DNN, and one or more preceding layers of respective preceding DNNs, wherein a preceding layer is a layer whose index is one less than the current index.
A method of automatically selecting a neural network from a plurality of computer-implemented candidate neural networks, each candidate neural network comprising at least an encoder neural network trained to encode an input value as a latent representation. The method comprises: obtaining a sequence of data items, each of the data items comprising an input value and a target value; and determining a respective score for each of the candidate neural networks, comprising evaluating the encoder neural network of the candidate neural network using a plurality of read-out heads. Each read-out head comprises parameters for predicting a target value from a latent representation of an input value of a data item encoded using the encoder neural network of the candidate neural network. The method further comprises selecting the neural network from the plurality of candidate neural networks using the respective scores.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using reporter neural networks.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for using simulation-based inference to infer a set of parameters, such as measurements, from observations, e.g., real-world observations. The method uses a score generation neural network to determine scores for individual observations, or for groups of observations, that are combined and used to iteratively adjust the values of the parameters.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for simulating a state of an environment over a sequence of time steps. In one aspect, a method comprises, at each of one or more time steps: obtaining an environment mesh representing the state of the environment at the time step; generating a graph representing the state of the environment at the time step, comprising: determining that a first face of a first object mesh is within a collision distance of a second face of a second object mesh; and in response, instantiating a face-face edge in the graph that connects: (i) a first set of graph nodes in the graph that represent the first face in the first object mesh, and (ii) a second set of graph nodes in the graph that represent the second face in the second object mesh.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent that is interacting with an environment. Implementations of the system use previously learned skills to explore states of the environment to collect and store training data, which is then used to train an action selection system. The system includes a set of skill action selection subsystems, each configured to select actions for the agent to perform for a respective skill. The set of skill action selection subsystems is used to explore states of the environment to collect the training data, keeping their individual action selection policies unchanged. A scheduler neural network selects the skill neural networks to use. The action selection system is trained on the stored training data.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. Implementations of the described techniques can learn to explore the environment efficiently by storing and updating state embedding cluster centers based on observations characterizing states of the environment.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using an action selection neural network that performs in-context reinforcement learning when controlling an agent on a new task.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a language model to perform a reasoning task. The system obtains a plurality of training examples. Each training example includes a respective sample query text sequence characterizing a respective sample query and a respective reference response text sequence that includes a reference final answer to the respective sample query. The system trains a reward model on the plurality of training examples. The reward model is configured to receive an input including a query text sequence characterizing a query and one or more reasoning steps that have been generated in response to the query and process the input to compute a reward score indicating how successful the one or more reasoning steps are in yielding a correct final answer to the query. The system trains the language model using the trained reward model.
A reinforcement learning system is proposed in which a policy model neural network is trained to control an agent to perform a task over successive time steps. A control system that includes the policy model neural network is trained to select, for each time step, an action that yields a high value of a reward function, where the reward function is based on the action and indicates the contribution of the action to solving the task. The reward function includes a term based on a progress value output by a progress model. The progress model generates the progress value upon receiving a first observation of the state of the environment at a time step before the action is performed and a second observation of the state of the environment at a time step after the action is performed. The progress value is an estimate of the average time that an ensemble of experts, who produced demonstrations of the task, would have taken to transform the environment from how it appears in the first observation to how it appears in the second observation.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing a network input using a neural network that includes one or more regularized attention layers. In one aspect, a method comprises: receiving a layer input to a regularized attention layer, wherein the layer input to the regularized attention layer comprises a set of input embeddings; and applying a regularized attention operation over the set of input embeddings to generate a set of output embeddings, comprising: transforming intermediate attention scores using a set of shaping constants to generate a set of transformed attention scores, wherein: values of the shaping constants are initialized prior to training of the neural network and are not adjusted during the training of the neural network; and the values of the shaping constants are selected to regularize the set of output embeddings.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enabling a user to conduct a dialogue. Implementations of the system learn when to rely on supporting evidence, obtained from an external search system via a search system interface, and are also able to generate replies for the user that align with the preferences of a previously trained response selection neural network. Implementations of the system can also use a previously trained rule violation detection neural network to generate replies that take account of previously learnt rules.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
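The following is a stripped-down sketch of the forward pass. It assumes single-head attention without learned projections, residual connections around each block, and average pooling as a hypothetical output block. The design point it illustrates is that cross-attention maps an arbitrary number of data element embeddings onto a fixed set of latents. As a result, the cost of the self-attention blocks depends on the number of latents rather than on the number of data elements.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, inputs):
    """Single-head scaled dot-product attention without learned projections."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, x)) / math.sqrt(dim) for x in inputs]
        weights = softmax(logits)
        outputs.append([sum(w * x[t] for w, x in zip(weights, inputs)) for t in range(dim)])
    return outputs


def add(xs, ys):
    """Residual connection: elementwise sum of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def perceiver_forward(data_embeddings, latents, num_self_attention_blocks=2):
    # Cross-attention block: each latent attends over the data element embeddings.
    latents = add(latents, attend(latents, data_embeddings))
    # Self-attention blocks: each latent attends over the set of latents.
    for _ in range(num_self_attention_blocks):
        latents = add(latents, attend(latents, latents))
    # Output block (hypothetical): average-pool the latents.
    n = len(latents)
    return [sum(lat[t] for lat in latents) / n for t in range(len(latents[0]))]
```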
A computer-implemented method for generating an output token sequence from an input token sequence. The method combines a look ahead tree search, such as a Monte Carlo tree search, with a sequence-to-sequence neural network system. The sequence-to-sequence neural network system has a policy output defining a next token probability distribution, and may include a value neural network providing a value output to evaluate a sequence. An initial partial output sequence is extended using the look ahead tree search guided by the policy output and, in implementations, the value output, of the sequence-to-sequence neural network system until a complete output sequence is obtained.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating compressed representations of synthetic images. One of the methods is a method of generating a synthetic image using a generative neural network, and includes: generating, using the generative neural network, a plurality of coefficients that represent the synthetic image after the synthetic image has been encoded using a lossy compression algorithm; and decoding the synthetic image by applying the lossy compression algorithm to the plurality of coefficients.
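The decoding step can be sketched with a JPEG-style transform. This is an assumption, because the abstract does not name the compression algorithm. Here the coefficients are orthonormal 2-D DCT-II coefficients of an image block, and decoding applies the inverse DCT. Quantization and entropy coding are omitted, and the generative network that would produce the coefficients is not shown.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for frequency k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """Separable 2-D DCT: transform rows, then columns."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])


def idct_2d(coeffs):
    """Decodes a block of coefficients back into pixel values."""
    cols = [idct_1d(c) for c in transpose(coeffs)]
    return [idct_1d(r) for r in transpose(cols)]
```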
A reinforcement learning neural network system configured to manage rewards on scales that can vary significantly. The system determines the value of a scale factor that is applied to a temporal difference error used for reinforcement learning. The scale factor depends at least upon a variance of the rewards received during the reinforcement learning.
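The abstract says only that the scale factor depends at least on the variance of the rewards. The sketch below assumes one simple form: the TD error is divided by the running standard deviation of the observed rewards, which is tracked with Welford's algorithm. It also uses a hypothetical `min_std` floor and applies no scaling until two rewards have been seen.

```python
import math


class RewardScaler:
    """Tracks the running variance of rewards (Welford's algorithm)."""

    def __init__(self, min_std=1e-2):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the running mean
        self.min_std = min_std   # hypothetical floor to avoid huge scale factors

    def update(self, reward):
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def variance(self):
        return self.m2 / self.count if self.count > 0 else 0.0

    def scale_factor(self):
        # Too few rewards for a variance estimate: leave the TD error unscaled.
        if self.count < 2:
            return 1.0
        return 1.0 / max(math.sqrt(self.variance()), self.min_std)


def scaled_td_error(reward, value, next_value, discount, scaler):
    """One-step TD error multiplied by the variance-dependent scale factor."""
    td_error = reward + discount * next_value - value
    return scaler.scale_factor() * td_error
```

For example, after rewards 0, 10, 0, 10 the variance is 25, so TD errors are scaled by 0.2.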
In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enabling a user to conduct a dialogue. Implementations of the system learn when to rely on supporting evidence, obtained from an external search system via a search system interface, and are also able to generate replies for the user that align with the preferences of a previously trained response selection neural network. Implementations of the system can also use a previously trained rule violation detection neural network to generate replies that take account of previously learnt rules.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in a first image of an image sequence; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
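Extracting a glimpse at a chosen location is the most mechanical step of the method and can be sketched as a fixed-size crop. This assumes a single-resolution, zero-padded square patch. The glimpse representation, the recurrent network, and the location and action selection are not shown.

```python
def extract_glimpse(image, location, size):
    """Returns a size x size patch of `image` centred on `location`.

    image: 2-D list of pixel values; location: (row, col) of the patch centre.
    Pixels that fall outside the image are zero-padded.
    """
    height, width = len(image), len(image[0])
    top = location[0] - size // 2
    left = location[1] - size // 2
    return [
        [
            image[r][c] if 0 <= r < height and 0 <= c < width else 0
            for c in range(left, left + size)
        ]
        for r in range(top, top + size)
    ]
```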
A system for controlling an agent interacting with an environment to perform a task. The system includes an action selection neural network configured to generate action selection outputs that are used to select actions to be performed by the agent. The action selection neural network includes an encoder sub network configured to generate encoded representations of the current observations; an attention sub network configured to generate attention sub network outputs using an attention mechanism; a recurrent sub network configured to generate recurrent sub network outputs; and an action selection sub network configured to generate the action selection outputs that are used to select the actions to be performed by the agent in response to the current observations.
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling a facility through hierarchical reinforcement learning. In particular, the facility is controlled using a high-level controller neural network that makes high-level decisions and a low-level controller neural network that makes low-level decisions.
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
DATA-EFFICIENT REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for data-efficient reinforcement learning with adaptive return computation schemes.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling a robot. In particular, the policy neural network can be trained in simulation using images generated by a scene synthesis machine learning model.
G06N 3/0895 - Weakly supervised learning, e.g. semi-supervised or self-supervised learning
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining principal components of a data set using multi-agent interactions. One of the methods includes obtaining initial estimates for a plurality of principal components of a data set; and generating a final estimate for each principal component by repeatedly performing operations comprising: generating a reward estimate using the current estimate of the principal component, wherein the reward estimate is larger if the current estimate of the principal component captures more variance in the data set; generating, for each parent principal component of the principal component, a punishment estimate, wherein the punishment estimate is larger if the current estimate of the principal component and the current estimate of the parent principal component are not orthogonal; and updating the current estimate of the principal component according to a difference between the reward estimate and the punishment estimates.
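The following sketch of the update loop assumes the utilities from the EigenGame formulation of PCA. Let M be the covariance of the centered data. The reward for estimate v_i is v_i^T M v_i, which grows as v_i captures more variance. The punishment from each parent v_j (j < i) is (v_i^T M v_j)^2 / (v_j^T M v_j), which is zero exactly when v_i is orthogonal to M v_j. Each estimate follows the gradient of reward minus punishments, projected onto the unit sphere. The step size and step count are hypothetical.

```python
import math


def covariance(data):
    """Covariance of data that is assumed to be centered already."""
    n, d = len(data), len(data[0])
    return [[sum(x[i] * x[j] for x in data) / n for j in range(d)] for i in range(d)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def eigengame(data, initial_estimates, lr=0.1, steps=1000):
    m = covariance(data)
    vs = [normalize(v) for v in initial_estimates]
    for _ in range(steps):
        for i in range(len(vs)):
            # Reward gradient: 2 M v_i.
            grad = [2 * x for x in matvec(m, vs[i])]
            # Punishment gradient from each parent component j < i.
            for j in range(i):
                mvj = matvec(m, vs[j])
                coeff = 2 * dot(vs[i], mvj) / dot(vs[j], mvj)
                grad = [g - coeff * y for g, y in zip(grad, mvj)]
            # Keep only the component tangent to the unit sphere, step, renormalize.
            radial = dot(grad, vs[i])
            grad = [g - radial * x for g, x in zip(grad, vs[i])]
            vs[i] = normalize([x + lr * g for x, g in zip(vs[i], grad)])
    return vs
```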
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for unmasking a masked representation of a protein using a protein reconstruction neural network. In one aspect, a method comprises: receiving the masked representation of the protein; and processing the masked representation of the protein using the protein reconstruction neural network to generate, for each of one or more masked embeddings included in the masked representation of the protein, a respective predicted embedding. A predicted embedding corresponding to a masked embedding in a representation of the amino acid sequence of the protein defines a prediction for the identity of the amino acid at the corresponding position in the amino acid sequence, and a predicted embedding corresponding to a masked embedding in a representation of the structure of the protein defines a prediction for a corresponding structural feature of the protein.
A computer-implemented reinforcement learning neural network system that learns a model of rewards in order to relate actions by an agent in an environment to their long-term consequences. The model learns to decompose the rewards into components explainable by different past states. That is, the model learns to recognize when being in a particular state of the environment is predictive of a reward in a later state, even when that later state, and its reward, are reached only after a very long time delay.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling a robot manipulator that has a plurality of joints. One of the methods includes obtaining a control input that comprises one or more velocity values that specify a target velocity of a reference point in a given coordinate frame; determining a respective joint velocity for each of the plurality of joints by generating a solution to an optimization problem formulated from the control input; and controlling the robot manipulator, including causing the plurality of joints of the robot manipulator to move in accordance with the respective joint velocities to approximate the control input.
G05B 19/427 - Teaching successive positions by tracking the position of a joystick or handle to control the positioning servo of the tool head, master-slave control
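One common way to turn a target end-effector velocity into joint velocities is damped least squares, shown below for a planar arm. This is an illustrative assumption: the abstract only says an optimization problem is formulated from the control input. The sketch solves min ||J qdot - v||^2 + damping^2 ||qdot||^2 in closed form and ignores joint limits or other constraints a real formulation may add. The planar two-link arm and the function names are hypothetical.

```python
import numpy as np

def planar_jacobian(joint_angles, link_lengths):
    """Jacobian of a planar serial arm's end-effector (x, y) w.r.t. its joint angles."""
    joint_angles = np.asarray(joint_angles, dtype=float)
    link_lengths = np.asarray(link_lengths, dtype=float)
    angles = np.cumsum(joint_angles)                  # absolute angle of each link
    jac = np.zeros((2, len(joint_angles)))
    for i in range(len(joint_angles)):
        # Rotating joint i moves every link from i onwards.
        jac[0, i] = -np.sum(link_lengths[i:] * np.sin(angles[i:]))
        jac[1, i] = np.sum(link_lengths[i:] * np.cos(angles[i:]))
    return jac

def joint_velocities(jac, target_velocity, damping=1e-3):
    """Damped least-squares solution of J @ qdot ~= target_velocity."""
    jjt = jac @ jac.T
    reg = damping ** 2 * np.eye(jjt.shape[0])
    return jac.T @ np.linalg.solve(jjt + reg, np.asarray(target_velocity, dtype=float))
```

The damping term keeps the solve well conditioned near singular configurations, at the cost of tracking the target velocity only approximately.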
46.
CONTROLLING AGENTS USING AMBIGUITY-SENSITIVE NEURAL NETWORKS AND RISK-SENSITIVE NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using an action selection system that is risk-sensitive, ambiguity-sensitive, or both.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 7/01 - Probabilistic graphical models, e.g. probabilistic networks
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a response to a query input using a selection-inference neural network.
Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting an exchange-correlation energy of an atomic system. The system obtains respective electron-orbital features of the atomic system at each of a plurality of grid points; generates, for each of the plurality of grid points, a respective input feature vector for the electron-orbital features at the grid point; and processes the respective input feature vectors for the plurality of grid points using a neural network to generate a predicted exchange-correlation energy of the atomic system.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for rendering a new image that depicts a scene from a perspective of a camera at a new camera location. In one aspect, a method comprises: receiving a plurality of observations characterizing the scene; generating a latent variable representing the scene from the plurality of observations characterizing the scene; conditioning a scene representation neural network on the latent variable representing the scene, wherein the scene representation neural network conditioned on the latent variable representing the scene defines a geometric model of the scene as a three-dimensional (3D) radiance field; and rendering the new image that depicts the scene from the perspective of the camera at the new camera location using the scene representation neural network conditioned on the latent variable representing the scene.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and computer readable storage media for performing operations comprising: obtaining a plurality of initial network inputs that have been classified as belonging to a corresponding ground truth class; processing each of the plurality of initial network inputs using a trained target neural network to generate a respective predicted network output for each initial network input, the respective predicted network output comprising a respective score for each of a plurality of classes, the plurality of classes comprising the ground truth class; identifying, based on the respective predicted network outputs and the ground truth class, a subset of the initial network inputs as having been misclassified by the trained target neural network; and determining, based on the subset of initial network inputs, one or more failure case latent representations, wherein each failure case latent representation is a latent representation that characterizes network inputs that belong to the ground truth class but that are likely to be misclassified by the trained target neural network.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for solving mixed integer programs (MIPs) using neural networks. One of the methods includes obtaining data specifying parameters of a MIP; generating, from the parameters of the MIP, an input representation; processing the input representation using an encoder neural network to generate a respective embedding for each of the integer variables; generating a plurality of partial assignments, each by selecting a respective second, proper subset of the integer variables and, for each of the variables in the respective second subset, generating, using at least the respective embedding for the variable, a respective additional constraint on the value of the variable; generating, for each of the partial assignments, a corresponding candidate final assignment that assigns a respective value to each of the plurality of variables; and selecting, as a final assignment for the MIP, one of the candidate final assignments.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions from large discrete action sets. One of the methods includes receiving a particular observation representing a particular state of an environment; and selecting an action from a discrete set of actions to be performed by an agent interacting with the environment, comprising: processing the particular observation using an actor policy network to generate an ideal point; determining, from the points that represent actions in the set, the k nearest points to the ideal point; for each nearest point of the k nearest points: processing the nearest point and the particular observation using a Q network to generate a respective Q value for the action represented by the nearest point; and selecting the action to be performed by the agent from the k actions represented by the k nearest points based on the Q values.
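The selection step described above can be sketched as follows. The actor and Q networks are replaced by plain callables, and actions are assumed to be embedded as rows of a NumPy array. Both are hypothetical stand-ins, and nearest neighbours are found by brute-force Euclidean distance rather than any particular index structure.

```python
import numpy as np

def select_action(observation, action_points, actor_fn, q_fn, k):
    """Pick an action among the k actions whose points are nearest the ideal point."""
    ideal = actor_fn(observation)                           # ideal point from the actor
    dists = np.linalg.norm(action_points - ideal, axis=1)
    nearest = np.argsort(dists)[:k]                         # indices of the k nearest actions
    # Score only the k candidates with the Q network, then take the best one.
    q_values = np.array([q_fn(observation, action_points[i]) for i in nearest])
    return int(nearest[np.argmax(q_values)])
```

With actions embedded at 0, 1, ..., 9, an ideal point of 4.2 and a Q function peaked at 5, k = 1 returns action 4 while k = 3 lets the Q network refine the choice to action 5.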
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying agents in a system. According to one aspect, a method comprises: generating data defining a causal model of the system, comprising transmitting instructions to cause a plurality of interventions to be applied to the system, wherein each intervention modifies one or more variable elements in the system; processing the model of the system to identify one or more of the variable elements in the system as being decision elements, wherein each decision element represents an action selected by a respective agent in the system; identifying one or more agents in the system based on the decision elements; and outputting data that identifies the agents in the system.
There is described a neural network system for generating a graph, the graph comprising a set of nodes and edges. The system comprises one or more neural networks configured to represent a probability distribution over sequences of node generating decisions and/or edge generating decisions, and one or more computers configured to sample the probability distribution represented by the one or more neural networks to generate a graph.
A computer-implemented method for determining, for a loss function which is a function of a parameter vector comprising a plurality of parameters, values for the parameters for which the parameter vector is a stationary point of the loss function. The method comprises determining initial values for the parameters; and repeatedly updating the parameters by: (a) determining at least one drift value indicative of discretization drift for a discrete update to the parameters based on the loss function; (b) determining at least one learning rate value by evaluating a learning rate function based on, and having an inverse relationship with, the at least one drift value; (c) determining respective updates to the parameters based upon a product of the at least one learning rate value and a gradient of the loss function with respect to the respective parameter for current values of the parameters; and (d) updating the parameters based upon the determined respective updates.
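Steps (a) to (d) can be sketched with a hypothetical drift proxy. The abstract does not define the drift value. This sketch assumes it is the curvature along the gradient, ||H g|| / ||g||, estimated with a finite-difference Hessian-vector product, and uses a learning rate of c / drift to get the required inverse relationship. The constant c, the floor on the drift, and the function name are all illustrative choices.

```python
import numpy as np

def drift_adjusted_descent(grad_fn, theta0, c=0.5, num_steps=200, eps=1e-6, tol=1e-12):
    theta = np.asarray(theta0, dtype=float)       # (initial parameter values)
    for _ in range(num_steps):
        g = grad_fn(theta)
        g_norm = np.linalg.norm(g)
        if g_norm < tol:                          # numerically at a stationary point
            break
        u = g / g_norm
        # (a) Assumed drift proxy: ||H u|| via a finite-difference Hessian-vector product.
        hu = (grad_fn(theta + eps * u) - g) / eps
        drift = np.linalg.norm(hu)
        # (b) Learning rate inversely related to the drift value.
        lr = c / max(drift, 1e-8)
        # (c) + (d) Update = learning rate times gradient.
        theta = theta - lr * g
    return theta
```

On the quadratic loss 0.5 * (2 x^2 + y^2), the step size adapts to the curvature seen along the gradient, and the iterates contract toward the stationary point at the origin.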
Method, system, and non-transitory computer storage media for selecting actions to be performed by an agent to interact with an environment to perform a main task by, for each time step in a sequence of time steps: receiving a set of features representing an observation; for each of one or more auxiliary prediction neural networks, generating a state value estimate for the current state of the environment relative to a corresponding auxiliary reward that measures values of a corresponding target feature from the set of features representing the observations for the sequence of time steps; processing an input comprising a respective intermediate output generated by each auxiliary neural network at the time step using an action selection neural network to generate an action selection output; and selecting the action to be performed by the agent at the time step using the action selection output.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating control policies for controlling agents in an environment. One of the methods includes, at each of a plurality of iterations: obtaining a current joint control policy for a plurality of agents, the current joint control policy specifying a respective current control policy for each agent; and updating the current joint control policy, comprising, for each agent: generating a respective reward estimate for each of a plurality of alternate control policies that is an estimate of a reward received by the agent if the agent is controlled using the alternate control policy while the other agents are controlled using the respective current control policies; computing a best response for the agent from the respective reward estimates; and updating the respective current control policy for the agent using the best response for the agent.
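The per-agent update can be illustrated on a two-player matrix game. This is a sketch under simplifying assumptions: the "alternate control policies" are the pure strategies of the game, reward estimates are exact expected payoffs rather than learned estimates, and each current policy moves a fixed fraction toward the best response (a fictitious-play-like rule). The function name and step size are hypothetical.

```python
import numpy as np

def best_response_iteration(payoffs, num_iterations=20, step=0.5):
    """payoffs[a] is agent a's payoff matrix indexed [own_policy, other_policy]."""
    # Each agent's current control policy is a mixture over its alternate policies.
    policies = [np.full(p.shape[0], 1.0 / p.shape[0]) for p in payoffs]
    for _ in range(num_iterations):
        new_policies = []
        for agent in range(2):
            other = policies[1 - agent]
            # Reward estimate for each alternate policy while the other agent
            # keeps its current policy.
            reward_estimates = payoffs[agent] @ other
            best = np.zeros_like(policies[agent])
            best[np.argmax(reward_estimates)] = 1.0       # best response
            # Update the current policy toward the best response.
            new_policies.append((1 - step) * policies[agent] + step * best)
        policies = new_policies
    return policies
```

In the prisoner's dilemma (actions ordered cooperate, defect), defection is always the best response, so both policies concentrate on it.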
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input that is a sequence to generate a network output. One of the methods includes, for each particular sequence of layer inputs: for each attention layer in the neural network: maintaining episodic memory data; maintaining compressed memory data; receiving a layer input to be processed by the attention layer; and applying an attention mechanism over (i) the compressed representation in the compressed memory data for the layer, (ii) the hidden states in the episodic memory data for the layer, and (iii) the respective hidden state at each of the plurality of input positions in the particular network input to generate a respective activation for each input position in the layer input.
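The two memories and the attention over (i), (ii) and (iii) can be sketched for a single layer. The compression function is an assumption: the oldest hidden states that overflow the episodic memory are mean-pooled in groups of `rate`, and any ragged remainder is dropped for simplicity. The attention is a bare softmax dot-product with no learned projections and no causal mask. All names are hypothetical.

```python
import numpy as np

def update_memories(episodic, compressed, new_hidden, episodic_size, rate):
    """Append new hidden states to episodic memory and compress what overflows."""
    episodic = np.concatenate([episodic, new_hidden], axis=0)
    overflow = len(episodic) - episodic_size
    if overflow > 0:
        old, episodic = episodic[:overflow], episodic[overflow:]
        usable = (len(old) // rate) * rate                 # drop a ragged tail (simplification)
        pooled = old[:usable].reshape(-1, rate, old.shape[1]).mean(axis=1)
        compressed = np.concatenate([compressed, pooled], axis=0)
    return episodic, compressed

def attend(queries, memory):
    """Softmax attention of each query over the memory rows (keys = values = memory)."""
    scores = queries @ memory.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ memory
```

For a layer, the memory attended over would be np.concatenate([compressed, episodic, current_hidden]), matching items (i), (ii) and (iii) in the abstract.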
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using multi-task neural networks. One of the methods includes receiving a first network input and data identifying a first machine learning task to be performed on the first network input; selecting a path through the plurality of layers in a super neural network that is specific to the first machine learning task, the path specifying, for each of the layers, a proper subset of the modular neural networks in the layer that are designated as active when performing the first machine learning task; and causing the super neural network to process the first network input using (i) for each layer, the modular neural networks in the layer that are designated as active by the selected path and (ii) the set of one or more output layers corresponding to the identified first machine learning task.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-driven robotic control. One of the methods includes maintaining robot experience data; obtaining annotation data; training, on the annotation data, a reward model; generating task-specific training data for the particular task, comprising, for each experience in a second subset of the experiences in the robot experience data: processing the observation in the experience using the trained reward model to generate a reward prediction, and associating the reward prediction with the experience; and training a policy neural network on the task-specific training data for the particular task, wherein the policy neural network is configured to receive a network input comprising an observation and to generate a policy output that defines a control policy for a robot performing the particular task.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning with scheduled auxiliary tasks. In one aspect, a method includes maintaining data specifying parameter values for a primary policy neural network and one or more auxiliary neural networks; at each of a plurality of selection time steps during a training episode comprising a plurality of time steps: receiving an observation, selecting a current task for the selection time step using a task scheduling policy, processing an input comprising the observation using the policy neural network corresponding to the selected current task to select an action to be performed by the agent in response to the observation, and causing the agent to perform the selected action.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network that is used to select actions to be performed by an agent interacting with an environment. In one aspect, the method comprises: receiving an observation characterizing a current state of the environment; processing the observation and an exploration importance factor using the action selection neural network to generate an action selection output; selecting an action to be performed by the agent using the action selection output; determining an exploration reward; determining an overall reward based on: (i) the exploration importance factor, and (ii) the exploration reward; and training the action selection neural network using a reinforcement learning technique based on the overall reward.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06F 18/22 - Matching criteria, e.g. proximity measures
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
64.
ACTION CLASSIFICATION IN VIDEO CLIPS USING ATTENTION-BASED NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing a target algorithm using a state representation neural network.
A video processing system configured to analyze a sequence of video frames to detect objects in the video frames and provide information relating to the detected objects in response to a query. The query may comprise, for example, a request for a prediction of a future event, or of the location of an object, or a request for a prediction of what would happen if an object were modified. The system uses a transformer neural network subsystem to process representations of objects in the video.
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, a low-level controller neural network, and a subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and, based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/044 - Recurrent networks, e.g. Hopfield networks
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting an input vocabulary for a machine learning model using power indices. One of the methods includes computing a respective score for each of a plurality of text tokens in an initial vocabulary and then selecting the text tokens in the input vocabulary based on the respective scores.
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
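One power index that fits this description is the Shapley value, estimated here by Monte Carlo sampling over token orderings. This is an assumption: the abstract does not say which power index is used (Banzhaf is another common choice). The value function, which in practice might measure model quality with a given vocabulary, is a hypothetical callable on sets of tokens.

```python
import random

def shapley_scores(tokens, value_fn, num_permutations=200, seed=0):
    """Monte Carlo Shapley estimate of each token's contribution to value_fn."""
    rng = random.Random(seed)
    scores = {t: 0.0 for t in tokens}
    for _ in range(num_permutations):
        order = list(tokens)
        rng.shuffle(order)
        coalition = set()
        prev_value = value_fn(frozenset(coalition))
        for token in order:
            coalition.add(token)
            value = value_fn(frozenset(coalition))
            scores[token] += value - prev_value       # marginal contribution of the token
            prev_value = value
    return {t: s / num_permutations for t, s in scores.items()}

def select_vocabulary(tokens, value_fn, size, **kwargs):
    """Keep the `size` highest-scoring tokens from the initial vocabulary."""
    scores = shapley_scores(tokens, value_fn, **kwargs)
    return sorted(tokens, key=lambda t: scores[t], reverse=True)[:size]
```

With an additive value function, every marginal contribution equals the token's own weight, so the estimate is exact; real value functions with interactions between tokens need more sampled permutations.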
69.
MODEL-FREE REINFORCEMENT LEARNING WITH REGULARIZED NASH DYNAMICS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network that is used to control an agent. In particular, the policy neural network can be trained through model-free reinforcement learning with regularized Nash dynamics.
Methods and systems, including computer programs encoded on computer storage media, for generating data items. A method includes reading a glimpse from a data item using a decoder hidden state vector of a decoder for a preceding time step; providing, as input to an encoder, the glimpse and the decoder hidden state vector for the preceding time step for processing; receiving, as output from the encoder, a generated encoder hidden state vector for the time step; generating a decoder input from the generated encoder hidden state vector; providing the decoder input to the decoder for processing; receiving, as output from the decoder, a generated decoder hidden state vector for the time step; generating a neural network output update from the decoder hidden state vector for the time step; and combining the neural network output update with a current neural network output to generate an updated neural network output.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for simulating industrial facilities for control. One of the methods includes, at each of a plurality of time steps during a task episode: receiving, from a computer simulator of an industrial facility, measurements representing a current state of the facility; generating, from the measurements, an observation; providing the observation as input to a control policy for controlling the facility; receiving, as output, an action for controlling one or more setpoints of the facility; generating, from the action, one or more control inputs for the one or more setpoints of the facility; and providing, as input to the simulator, (i) the control inputs and (ii) current values for one or more configuration parameters of the simulator to cause the simulator to generate, as output, new measurements representing a new state of the facility.
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
G05B 19/418 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control (DNC), flexible manufacturing systems (FMS), integrated manufacturing systems (IMS), computer integrated manufacturing (CIM)
72.
PREDICTING PROTEIN STRUCTURES USING PROTEIN GRAPHS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a predicted structure of a protein. According to one aspect, there is provided a method comprising maintaining graph data representing a graph of the protein; obtaining a respective pair embedding for each edge in the graph; processing the pair embeddings using a sequence of update blocks, wherein each update block performs operations comprising, for each edge in the graph: generating a respective representation of each of a plurality of cycles in the graph that include the edge by, for each cycle, processing embeddings for edges in the cycle in accordance with the values of the update block parameters of the update block to generate the representation of the cycle; and updating the pair embedding for the edge using the representations of the cycles in the graph that include the edge.
This specification describes a simulation system that performs simulations of physical environments using a graph neural network. At each of one or more time steps in a sequence of time steps in a given time interval, the system can process a representation of a current state of the physical environment at the current time step using the graph neural network to generate a prediction of a next state of the physical environment at the next time step. Generally, the environment has discontinuous dynamics at one or more time points during the time interval.
G06F 30/20 - Design optimisation, verification or simulation
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
G06F 119/12 - Timing analysis or timing optimisation
74.
DISTRIBUTIONAL REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL TASKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network that is used to select actions to be performed by a reinforcement learning agent interacting with an environment. In particular, the actions are selected from a continuous action space, and the system trains the action selection neural network jointly with a distributional Q network that is used to update the parameters of the action selection neural network.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting a structure of a protein comprising one or more chains. In one aspect, a method comprises, at each subsequent iteration after a first iteration in a sequence of iterations: obtaining a network input for the subsequent iteration that characterizes the protein; generating features for the subsequent iteration from (i) structure parameters generated at a preceding iteration that precedes the subsequent iteration in the sequence, (ii) one or more intermediate outputs generated by the protein structure prediction neural network while generating the structure parameters at the preceding iteration, or (iii) both; and processing the features and the network input for the subsequent iteration using the protein structure prediction neural network to generate structure parameters for the subsequent iteration that define another predicted structure for the protein.
The invention describes a system and a method for controlling an agent interacting with an environment to perform a task, the method comprising, at each of a plurality of first time steps from a plurality of time steps: receiving an observation characterizing a state of the environment at the first time step; determining a goal representation for the first time step that characterizes a goal state of the environment to be reached by the agent; processing the observation and the goal representation using a low-level controller neural network to generate a low-level policy output that defines an action to be performed by the agent in response to the observation, wherein the low-level controller neural network comprises: a representation neural network configured to process the observation to generate an internal state representation of the observation, and a low-level policy head configured to process the internal state representation and the goal representation to generate the low-level policy output; and controlling the agent using the low-level policy output.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
The invention describes a method, performed by one or more computers, for training a base policy neural network that is configured to receive a base policy input comprising an observation of a state of an environment and to process the base policy input to generate a base policy output that defines an action to be performed by an agent in response to the observation, the method comprising: generating training data for training the base policy neural network by controlling an agent using (i) the base policy neural network and (ii) an exploration strategy that maps, in accordance with a set of one or more parameters, base policy outputs generated by the base policy neural network to actions performed by the agent to interact with an environment, the generating comprising, at each of a plurality of time points: determining that criteria for updating the exploration strategy are satisfied at the time point; and in response to determining that the criteria are satisfied: generating a meta policy input that comprises data characterizing a performance of the base policy neural network in controlling the agent at the time point; processing the meta policy input using a meta policy to generate a meta policy output that specifies respective values for each of the set of one or more parameters that define the exploration strategy; and controlling the agent using the base policy neural network and in accordance with the exploration strategy defined by the respective values for the set of one or more parameters specified by the meta policy output.
G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a speaker neural network using one or more listener neural networks.
We describe an artificial neural network comprising: an input layer of input neurons, one or more hidden layers of neurons in successive layers of neurons above the input layer, and at least one further, concept-identifying layer of neurons above the hidden layers. The neural network includes an activation memory coupled to an intermediate, hidden layer of neurons between the input and concept-identifying layers to store a pattern of activation of the intermediate layer. The neural network further includes a system to determine an overlap between a plurality of the stored patterns of activation and to activate in the intermediate hidden layer an overlap pattern such that the concept-identifying layer of neurons is configured to identify features of the overlap patterns. We also describe related methods, processor control code, and computing systems for the neural network. Optionally, further higher-level concept-identifying layers of neurons may be included.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a structure prediction neural network that comprises an embedding neural network and a main folding neural network. According to one aspect, a method comprises: obtaining a training network input characterizing a training protein; processing the training network input using the embedding neural network and the main folding neural network to generate a main structure prediction; for each auxiliary folding neural network in a set of one or more auxiliary folding neural networks, processing at least a corresponding intermediate output of the embedding neural network to generate an auxiliary structure prediction; determining a gradient of an objective function that includes a respective auxiliary structure loss term for each of the auxiliary folding neural networks; and updating the current values of the embedding network parameters and the main folding parameters based on the gradient.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for simulating a state of a physical environment. In one aspect, a method performed by one or more computers for simulating the state of the physical environment is provided. The method includes, for each of multiple time steps: obtaining data defining a fine-resolution mesh and a coarse-resolution mesh that each characterize the state of the physical environment at the current time step, where the fine-resolution mesh has a higher resolution than the coarse-resolution mesh; processing data defining the fine-resolution mesh and the coarse-resolution mesh using a graph neural network that includes: (i) one or more fine-resolution update blocks, (ii) one or more coarse-resolution update blocks, and (iii) one or more up-sampling update blocks; and determining the state of the physical environment at a next time step using updated node embeddings for nodes in the fine-resolution mesh.
DeepMind Technologies Limited F&R Ref.: 45288-0255WO1 PCT Application
G06F 30/23 - Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
An iterative method is proposed to train an action selection system of a reinforcement learning system, based on a reward function which defines a reward value for each action. The reward value includes an intrinsic reward term generated based on the outputs of two encoder models: an online encoder model and a target encoder model. The online encoder model is iteratively trained based on a loss function, and the target encoder model is updated to bring it closer to the online encoder model.
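The abstract leaves the form of the intrinsic reward open. A minimal sketch, assuming linear encoders, an intrinsic reward equal to the squared distance between the online and target embeddings of an observation, and an exponential-moving-average update that moves the target encoder toward the online encoder. The online encoder's own training loss is not shown, and all function names, `tau`, and `beta` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(weights: Matrix, obs: Vector) -> Vector:
    # Linear encoder: one output per weight row.
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def intrinsic_reward(online: Matrix, target: Matrix, obs: Vector) -> float:
    # Assumption: the reward is the squared distance between the two embeddings,
    # so observations on which the encoders disagree earn a larger bonus.
    z_online = encode(online, obs)
    z_target = encode(target, obs)
    return sum((a - b) ** 2 for a, b in zip(z_online, z_target))


def update_target(target: Matrix, online: Matrix, tau: float) -> Matrix:
    # Move every target weight a fraction tau of the way toward the online weight.
    return [
        [t + tau * (o - t) for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target, online)
    ]


def total_reward(extrinsic: float, intrinsic: float, beta: float) -> float:
    # Hypothetical combination of the task reward and the intrinsic bonus.
    return extrinsic + beta * intrinsic
```

Each target update shrinks the gap between the encoders, so the bonus for a given observation decays as training proceeds.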
Systems, methods, and computer programs for training and using a machine learning system to control an agent to perform a task. The machine learning system is trained using counterfactual internal states so that it can provide an output that explains the behavior of the system in causal terms, e.g. in terms of aspects of its environment that cause the system to select particular actions for the agent.
G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
G06N 3/044 - Recurrent networks, e.g. Hopfield networks
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling a reinforcement learning agent in an environment to perform a task. In one aspect, a method comprises: maintaining a retrieval dataset that stores a plurality of history observations and, for each history observation, a respective associated context; receiving a current observation characterizing a current state of the environment; selecting one or more history observations from the plurality of history observations; processing, using an encoder neural network and in accordance with current values of encoder network parameters, an encoder network input comprising (i) the current observation and (ii) the one or more selected history observations and their respective associated context to generate a latent state representation for the current state of the environment; and using the latent state representation to determine an action to be performed by the agent in response to the current observation.
G06F 16/908 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
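The abstract does not say how history observations are selected. One simple possibility, sketched here under that assumption, is k-nearest-neighbour retrieval by cosine similarity, after which the current observation and the retrieved observations with their contexts are packed into the encoder input. `HistoryEntry`, `select_history`, and `build_encoder_input` are hypothetical, and the encoder network itself is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class HistoryEntry:
    observation: List[float]
    context: str  # hypothetical context, e.g. the action taken or the return obtained


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def select_history(
    dataset: Sequence[HistoryEntry], current: Sequence[float], k: int
) -> List[HistoryEntry]:
    # Rank stored observations by similarity to the current one and keep the top k.
    ranked = sorted(
        dataset, key=lambda e: cosine_similarity(e.observation, current), reverse=True
    )
    return ranked[:k]


def build_encoder_input(
    current: Sequence[float], selected: Sequence[HistoryEntry]
) -> Dict[str, object]:
    # The encoder sees (i) the current observation and
    # (ii) the retrieved observations together with their contexts.
    return {
        "current": list(current),
        "retrieved": [(e.observation, e.context) for e in selected],
    }
```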
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network to perform a machine learning task on one or more received inputs by using a hybrid training dataset with a semi-supervised learning technique. The hybrid training dataset includes multiple unlabeled training inputs and multiple labeled training inputs and, in some cases, more unlabeled training inputs than labeled training inputs.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes maintaining a replay memory storing a plurality of transitions; selecting a plurality of transitions from the replay memory; and training the neural network on the plurality of transitions, comprising, for each transition: generating an initial Q value for the transition; determining a scaled Q value for the transition; determining a scaled temporal difference learning target for the transition; determining an error between the scaled temporal difference learning target and the scaled Q value; determining an update to the current values of the Q network parameters; and determining an update to the current value of the scaling term.
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
88.
TRAINING MACHINE LEARNING MODELS BY DETERMINING UPDATE RULES USING NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters and is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
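A minimal sketch of the idea that a recurrent network emits the parameter updates. The recurrent cell below is deliberately tiny, and its own weights (`decay`, `step_size`) are fixed by hand rather than meta-learned, which makes it equivalent to SGD with momentum; in the described approach those weights would themselves be trained. `LinearRNNOptimizer` and `optimize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Params = List[float]


@dataclass
class LinearRNNOptimizer:
    """Coordinatewise recurrent update rule with hand-fixed (not meta-learned) weights."""

    decay: float = 0.9
    step_size: float = 0.1

    def init_state(self, num_params: int) -> List[float]:
        return [0.0] * num_params

    def step(self, grads: Params, state: List[float]) -> Tuple[Params, List[float]]:
        # Recurrent state update per coordinate: h_t = decay * h_{t-1} + (1 - decay) * g_t.
        new_state = [self.decay * h + (1.0 - self.decay) * g for h, g in zip(state, grads)]
        # Output layer: map the hidden state to a parameter update.
        updates = [-self.step_size * h for h in new_state]
        return updates, new_state


def optimize(
    params: Params,
    grad_fn: Callable[[Params], Params],
    rule: LinearRNNOptimizer,
    num_steps: int,
) -> Params:
    state = rule.init_state(len(params))
    for _ in range(num_steps):
        updates, state = rule.step(grad_fn(params), state)
        params = [p + u for p, u in zip(params, updates)]
    return params


# Minimize f(p) = (p0 - 3)^2 + (p1 + 1)^2, whose gradient is given in closed form.
result = optimize(
    [0.0, 0.0], lambda p: [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)], LinearRNNOptimizer(), 300
)
print(result)  # close to [3.0, -1.0]
```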
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining generalized eigenvectors that characterize a data set.
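The abstract does not describe the method used. For reference, a generalized eigenvector v of a matrix pair (A, B) satisfies A v = λ B v. The sketch below solves the 2x2 case in closed form from the characteristic equation det(A - λB) = 0, assuming A is symmetric and B is symmetric positive definite (for example, a pair of covariance matrices). `generalized_eig_2x2` is a hypothetical helper, not part of any library.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def generalized_eig_2x2(A: Matrix, B: Matrix) -> List[Tuple[float, List[float]]]:
    """Solve A v = lam * B v for symmetric 2x2 A and symmetric positive-definite 2x2 B."""
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    det_b = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if B[0][0] <= 0.0 or det_b <= 0.0:
        raise ValueError("B must be positive definite")
    # det(A - lam * B) = det_b * lam**2 - mid * lam + det_a
    mid = A[0][0] * B[1][1] + A[1][1] * B[0][0] - A[0][1] * B[1][0] - A[1][0] * B[0][1]
    # For symmetric A and positive-definite B the roots are real; clamp rounding noise.
    disc = max(mid * mid - 4.0 * det_b * det_a, 0.0)
    roots = [(mid - math.sqrt(disc)) / (2.0 * det_b), (mid + math.sqrt(disc)) / (2.0 * det_b)]
    pairs = []
    for lam in roots:
        c = [[A[i][j] - lam * B[i][j] for j in range(2)] for i in range(2)]
        # C is singular, so a vector orthogonal to its larger row spans its null space.
        row = c[0] if abs(c[0][0]) + abs(c[0][1]) >= abs(c[1][0]) + abs(c[1][1]) else c[1]
        v = [-row[1], row[0]]
        if v == [0.0, 0.0]:  # A = lam * B: every vector is an eigenvector; pick e1
            v = [1.0, 0.0]
        # Normalize so that v^T B v = 1, the usual convention for generalized eigenvectors.
        scale = math.sqrt(sum(v[i] * B[i][j] * v[j] for i in range(2) for j in range(2)))
        pairs.append((lam, [x / scale for x in v]))
    return pairs
```

For larger symmetric-definite problems, SciPy's `scipy.linalg.eigh(a, b)` solves the same equation numerically.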
Systems, methods, and computer programs for learning to control an embodied agent to perform tasks. The techniques use internal, "intra-agent" speech when learning, and are thus able to perform tasks involving new objects without any direct experience of interacting with those objects, i.e. zero-shot. Implementations of the techniques use an image captioning neural network system to generate natural language captions used when training an action selection neural network system.
G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a response to a query input using a selection-inference neural network.
B60W 50/06 - Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
Systems and methods for encoding video, and for decoding video at an arbitrary temporal and/or spatial resolution. The techniques use a scene representation neural network that, in implementations, is configured to represent frames of a 2D or 3D video as a 3D model encoded in the parameters of the neural network.
G06N 3/04 - Architecture, e.g. interconnection topology
H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
H04N 19/33 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
94.
NEGOTIATING CONTRACTS FOR AGENT COOPERATION IN MULTI-AGENT SYSTEMS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for enabling agents to cooperate with one another in a way that improves their collective efficiency. The agents can modify their behavior by taking into account the behavior of other agents, so that a better overall result can be achieved than if each agent acted independently. This is done by enabling the agents to negotiate contracts with one another that restrict their respective actions.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
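A toy illustration of why restricting actions can help, using a two-player prisoner's dilemma rather than anything taken from the source. Without a contract, mutual defection is the only pure-strategy equilibrium; a contract restricting both agents to cooperation leaves each better off. The payoff table, the acceptance rule in `accepts_contract`, and all names are assumptions made for the sketch; the abstract does not specify how contracts are proposed or evaluated.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Action = str
Profile = Tuple[Action, Action]
Payoffs = Dict[Profile, Tuple[float, float]]

# Prisoner's dilemma payoffs: (row player, column player).
PAYOFFS: Payoffs = {
    ("C", "C"): (3.0, 3.0),
    ("C", "D"): (0.0, 5.0),
    ("D", "C"): (5.0, 0.0),
    ("D", "D"): (1.0, 1.0),
}
ALL_ACTIONS: FrozenSet[Action] = frozenset({"C", "D"})


def pure_nash(
    payoffs: Payoffs, allowed0: FrozenSet[Action], allowed1: FrozenSet[Action]
) -> List[Profile]:
    """Pure-strategy Nash equilibria when each player may only use its allowed actions."""
    equilibria = []
    for a0, a1 in product(sorted(allowed0), sorted(allowed1)):
        u0, u1 = payoffs[(a0, a1)]
        no_gain0 = all(payoffs[(b0, a1)][0] <= u0 for b0 in allowed0)
        no_gain1 = all(payoffs[(a0, b1)][1] <= u1 for b1 in allowed1)
        if no_gain0 and no_gain1:
            equilibria.append((a0, a1))
    return equilibria


def accepts_contract(
    payoffs: Payoffs, contract: Tuple[FrozenSet[Action], FrozenSet[Action]], player: int
) -> bool:
    # Hypothetical rule: accept if every equilibrium under the contract pays this player
    # strictly more than every equilibrium of the unrestricted game.
    before = [payoffs[e][player] for e in pure_nash(payoffs, ALL_ACTIONS, ALL_ACTIONS)]
    after = [payoffs[e][player] for e in pure_nash(payoffs, contract[0], contract[1])]
    if not after:
        return False
    return not before or min(after) > max(before)


mutual_cooperation = (frozenset({"C"}), frozenset({"C"}))
print(pure_nash(PAYOFFS, ALL_ACTIONS, ALL_ACTIONS))  # [('D', 'D')]
print(all(accepts_contract(PAYOFFS, mutual_cooperation, i) for i in (0, 1)))  # True
```

A one-sided contract that binds only the first agent is rejected by that agent, since the other agent would then exploit it.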
A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. The system therefore optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. In this way, the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task, whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
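For a single state s, the hierarchical policy factorizes as π(ε, a | s) = π_pref(ε) · π(a | s, ε), where ε is a preference, i.e. a reward/cost trade-off. The sketch below assumes a small discrete set of preferences and actions. It forms that product, marginalizes out the preference, and checks an expected-cost constraint. The numbers, the cost table, and the function names are illustrative assumptions, and the learning of both policies is omitted.

```python
from typing import Dict, Tuple

Preference = float  # hypothetical: weight placed on cost relative to reward


def joint_policy(
    pref_policy: Dict[Preference, float],
    action_policy: Dict[Preference, Dict[str, float]],
) -> Dict[Tuple[Preference, str], float]:
    # pi(eps, a | s) = pi_pref(eps) * pi(a | s, eps) for one fixed state s.
    return {
        (eps, a): p_eps * p_a
        for eps, p_eps in pref_policy.items()
        for a, p_a in action_policy[eps].items()
    }


def marginal_action_policy(joint: Dict[Tuple[Preference, str], float]) -> Dict[str, float]:
    # pi(a | s) = sum over eps of pi(eps, a | s).
    marginal: Dict[str, float] = {}
    for (_, a), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return marginal


def satisfies_constraint(marginal: Dict[str, float], cost: Dict[str, float], limit: float) -> bool:
    expected_cost = sum(p * cost[a] for a, p in marginal.items())
    return expected_cost <= limit


pref_policy = {0.2: 0.25, 0.8: 0.75}  # mostly prefers the cost-averse trade-off
action_policy = {0.2: {"fast": 0.9, "safe": 0.1}, 0.8: {"fast": 0.2, "safe": 0.8}}
marginal = marginal_action_policy(joint_policy(pref_policy, action_policy))
print(marginal)  # roughly {'fast': 0.375, 'safe': 0.625}
print(satisfies_constraint(marginal, {"fast": 1.0, "safe": 0.0}, limit=0.5))  # True
```

Shifting probability mass in `pref_policy` toward the cost-averse preference lowers the expected cost, which is the lever the preference network would learn to pull.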
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a response to a query input using a selection-inference neural network.
This specification describes a simulation system that performs simulations of physical environments using a graph neural network. At each of one or more time steps in a sequence of time steps, the system can process a representation of a current state of the physical environment at the current time step using the graph neural network to generate a prediction of a next state of the physical environment at the next time step. Some implementations of the system are adapted for hardware acceleration. As well as performing simulations, the system can be used to predict physical quantities based on measured real-world data. Implementations of the system are differentiable and can also be used for design optimization and for optimal control tasks.
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
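A minimal sketch of one simulator step as message passing on a graph followed by an integrator. The learned edge and node networks are replaced by a hand-written spring-like message and an identity decoder so that the example runs without training. The 1-D state, the semi-implicit Euler integration, and all names are assumptions, not the system's actual components.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GraphState:
    positions: List[float]       # 1-D position of each node
    velocities: List[float]      # 1-D velocity of each node
    edges: List[Tuple[int, int]]  # directed edges (sender, receiver)


def message_fn(x_sender: float, x_receiver: float, stiffness: float = 1.0) -> float:
    # Stand-in for a learned edge network: a spring-like pull toward the sender.
    return stiffness * (x_sender - x_receiver)


def step(state: GraphState, dt: float = 0.1) -> GraphState:
    # Aggregate incoming messages at each receiver node by summation.
    aggregated = [0.0] * len(state.positions)
    for sender, receiver in state.edges:
        aggregated[receiver] += message_fn(state.positions[sender], state.positions[receiver])
    # Stand-in for a learned node decoder: aggregated messages are read as accelerations,
    # then integrated with semi-implicit Euler to obtain the next state.
    new_v = [v + dt * a for v, a in zip(state.velocities, aggregated)]
    new_x = [x + dt * v for x, v in zip(state.positions, new_v)]
    return GraphState(new_x, new_v, state.edges)


def rollout(state: GraphState, num_steps: int, dt: float = 0.1) -> GraphState:
    # Feed each predicted state back in as the next input.
    for _ in range(num_steps):
        state = step(state, dt)
    return state
```

Because the stand-in messages are antisymmetric, two connected nodes pull toward each other while their total momentum stays at zero.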
98.
DATA COMPRESSION AND RECONSTRUCTION USING SPARSE META-LEARNED NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compressing and decompressing data signals using sparse, meta-learned neural networks.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training neural networks to predict the structure of a protein. In one aspect, a method comprises: obtaining, for each of a plurality of proteins, a full multiple sequence alignment for the protein; generating, for each of the plurality of proteins, target structure parameters characterizing a structure of the protein from the full multiple sequence alignment for the protein, comprising processing a representation of the full multiple sequence alignment for the protein using the structure prediction neural network to generate output structure parameters characterizing a structure of the protein, and determining the target structure parameters for the protein based on the output structure parameters for the protein; determining, for each of the plurality of proteins, a reduced multiple sequence alignment for the protein, comprising removing or masking data from the full multiple sequence alignment for the protein.
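The abstract states only that the reduced alignment is obtained by removing or masking data from the full one. A minimal sketch of both operations on an MSA stored as equal-length strings, assuming the first row is the query sequence and is always kept intact. The sampling scheme, the mask symbol `X`, and the name `reduce_msa` are assumptions.

```python
import random
from typing import List

MASK_TOKEN = "X"  # hypothetical mask symbol


def reduce_msa(msa: List[str], num_rows: int, mask_prob: float, seed: int = 0) -> List[str]:
    """Derive a reduced MSA by removing rows and masking residues.

    Assumes row 0 is the query sequence, which is always kept unmasked.
    """
    if not msa:
        raise ValueError("the MSA must contain at least the query row")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    # Removal: keep a random subset of the non-query rows, in their original order.
    keep = sorted(rng.sample(range(len(others)), min(num_rows, len(others))))
    kept_rows = [others[i] for i in keep]
    # Masking: replace each residue of a kept row with the mask token with probability mask_prob.
    masked_rows = [
        "".join(MASK_TOKEN if rng.random() < mask_prob else c for c in row)
        for row in kept_rows
    ]
    return [query] + masked_rows
```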
A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.