Disclosed are apparatuses, systems, and techniques that enable compressed grid-based graph representations for efficient implementations of graph-mapped computing applications. The techniques include but are not limited to selecting a reference grid having a plurality of blocks, assigning nodes of the graph to blocks of the grid, and generating a graph representation that maps directions, relative to the reference grid, of nodal connections of the graph.
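A minimal sketch of the direction-mapped encoding described above (function and variable names are illustrative, not from the disclosure): once nodes are assigned to blocks of the reference grid, each edge can be stored as a block-to-block offset rather than an absolute node index.

```python
def encode_edges(node_block, edges):
    """node_block: {node: (row, col)} block assignment on the reference grid.
    edges: iterable of (u, v) pairs.
    Returns a list of (u, (d_row, d_col)) direction records, where the
    offset points from u's block toward v's block."""
    encoded = []
    for u, v in edges:
        ur, uc = node_block[u]
        vr, vc = node_block[v]
        encoded.append((u, (vr - ur, vc - uc)))
    return encoded

# Example: three nodes assigned to blocks of a 2x2 reference grid.
blocks = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
directions = encode_edges(blocks, [("a", "b"), ("b", "c")])
print(directions)  # [('a', (0, 1)), ('b', (1, 0))]
```

Because block offsets on a small grid need far fewer bits than global node indices, such records compress well, which is the motivation the abstract gives for the representation.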
Apparatuses, systems, and techniques are presented to generate images representing realistic motion or activity. In at least one embodiment, one or more neural networks are used to select a first neural network to perform a first task based, at least in part, upon a performance estimated by a second neural network.
Techniques are disclosed herein for designing a circuit. The techniques include receiving a specification for a driver and a plurality of sinks; executing, based on the driver and the plurality of sinks, a machine learning model that predicts at least one of a size, a location, or a delay target of one or more buffers; generating a tree that includes a plurality of nodes representing the driver, the plurality of sinks, and the one or more buffers between the driver and one or more of the sinks; and generating a design of a circuit based on the tree.
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, an application programming interface (API) is performed to indicate frame size information using one or more neural networks.
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, an application programming interface (API) is performed to indicate support to use one or more neural networks to perform frame interpolation.
Apparatuses, systems, and techniques are presented to remove unintended variations introduced into data. In at least one embodiment, a first image of an object can be generated based, at least in part, upon adding noise to, and removing the noise from, a second image of the object.
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, an application programming interface (API) is performed to enable frame interpolation to use one or more neural networks.
In various examples, live perception from sensors of a vehicle may be leveraged to detect and classify intersections in an environment of a vehicle in real-time or near real-time. For example, a deep neural network (DNN) may be trained to compute various outputs—such as bounding box coordinates for intersections, intersection coverage maps corresponding to the bounding boxes, intersection attributes, distances to intersections, and/or distance coverage maps associated with the intersections. The outputs may be decoded and/or post-processed to determine final locations of, distances to, and/or attributes of the detected intersections.
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
G08G 1/01 - Detecting movement of traffic to be counted or controlled
9.
MESH TOPOLOGY GENERATION USING PARALLEL PROCESSING
Various embodiments include techniques for generating topological data for a mesh included in a computer-generated environment. The mesh includes simple geometric shapes, such as triangles. The disclosed techniques identify vertices in the mesh that have the same position and have identical attributes, such as color, normal vector, and texture coordinates. The disclosed techniques further identify vertices in the mesh that have the same position but differ in one or more attributes. The techniques generate lists of the triangles that are adjacent to each vertex included in the mesh. The techniques generate a list of the unique edges included in the mesh. Further, the techniques are well suited for execution on highly parallel processors, such as graphics processing units, thereby reducing the time to generate this topological data. The topological data may then be efficiently used by other computer graphics processing operations.
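A serial sketch of the topology pass described above (names are illustrative; the disclosure targets parallel execution on GPUs, which this toy version does not show): vertices with identical position and attributes collapse to one canonical id, then per-vertex triangle lists and the set of unique edges are built over the canonical ids.

```python
def build_topology(positions, attributes, triangles):
    """positions/attributes: per-vertex tuples; triangles: index triples.
    Returns (canonical remap, per-vertex triangle lists, unique edges)."""
    # Vertices with identical position AND identical attributes collapse.
    canon = {}
    remap = []
    for pos, attr in zip(positions, attributes):
        remap.append(canon.setdefault((pos, attr), len(canon)))
    adjacency = {}   # canonical vertex -> list of incident triangle ids
    edges = set()    # unique undirected edges between canonical vertices
    for t, (i, j, k) in enumerate(triangles):
        tri = [remap[i], remap[j], remap[k]]
        for v in tri:
            adjacency.setdefault(v, []).append(t)
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edges.add((min(a, b), max(a, b)))
    return remap, adjacency, edges

# Two triangles sharing an edge, with the shared corners duplicated
# in the vertex buffer but carrying equal attributes.
positions = [(0, 0), (1, 0), (0, 1), (1, 0), (0, 1), (1, 1)]
attributes = ["white"] * 6
remap, adjacency, edges = build_topology(positions, attributes,
                                         [(0, 1, 2), (3, 5, 4)])
```

The duplicated corners fold into canonical vertices 1 and 2, so the shared edge is counted once and both triangles appear in those vertices' adjacency lists.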
Techniques are described for detecting an electromagnetic (“EM”) fault injection attack directed toward circuitry in a target digital system. In various embodiments, a first node may be coupled to first driving circuitry, and a second node may be coupled to second driving circuitry. The driving circuitry is implemented such that a logic state on the second node is more sensitive to an EM pulse than a logic state on the first node. Comparison circuitry may be coupled to the first and to the second nodes to assert an attack detection output responsive to sensing a logic state on the second node that is unexpected relative to a logic state on the first node.
G06F 21/75 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by inhibiting the analysis of circuitry or operation, e.g. to counteract reverse engineering
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
11.
EARLY RELEASE OF RESOURCES IN RAY TRACING HARDWARE
Techniques are disclosed for improving the throughput of ray intersection or visibility queries performed by a ray tracing hardware accelerator. Throughput is improved, for example, by releasing allocated resources before ray visibility query results are reported by the hardware accelerator. The allocated resources are released when the ray visibility query results can be stored in a compressed format outside of the allocated resources. When reporting the ray visibility query results, the results are reconstructed based on the results stored in the compressed format. The compressed format storage can be used for ray visibility queries that return no intersections or terminate on any hit ray visibility query. One or more individual components of allocated resources can also be independently deallocated based on the type of data to be returned and/or results of the ray visibility query.
A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering, which can be used to train a separate encoder-based model for downstream tasks like estimating a triplane representation, neural radiance field, mesh, depth map, 3D key points, or the like, given a single input image, using the pseudo ground truth 3D synthetic training data set. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
Systems and techniques to control a robot are described herein. In at least one embodiment, a machine learning model for controlling a robot is trained based at least on one or more population-based training operations or one or more reinforcement learning operations. Once trained, the machine learning model can be deployed and used to control a robot to perform a task.
A receiver device includes detection logic, error counter logic, and threshold logic. The detection logic detects frame errors in data frames received from a transmitter device. The error counter logic increments a first value of an error count responsive to each error signal, indicative of a frame error in a data frame, received from the detection logic. The error counter logic reduces the first value to a second (non-zero) value of the error count responsive to receiving a decrement signal and a period marker signal corresponding to a programmable period. The error counter logic resets the first value or the second value of the error count to zero responsive to receiving a reset signal. The threshold logic compares a current value of the error count with a threshold number of frame errors and outputs an interrupt responsive to the current value satisfying the threshold number of frame errors.
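The counter behavior described above can be sketched in software (class and method names are hypothetical; the disclosure describes hardware logic): the count rises per frame error, decays each programmable period, resets on demand, and latches an interrupt when it meets the threshold.

```python
class FrameErrorCounter:
    """Software model of the described error counter and threshold logic."""

    def __init__(self, threshold, decrement=1):
        self.count = 0
        self.threshold = threshold
        self.decrement = decrement
        self.interrupt = False

    def on_frame_error(self):
        """Error signal from the detection logic: increment and compare."""
        self.count += 1
        if self.count >= self.threshold:
            self.interrupt = True

    def on_period_marker(self):
        """Decrement signal at each programmable period; never below zero."""
        self.count = max(0, self.count - self.decrement)

    def reset(self):
        """Reset signal: clear the count (and the latched interrupt here)."""
        self.count = 0
        self.interrupt = False
```

The periodic decay means sporadic frame errors never accumulate to the threshold, while a sustained error burst does, which is the point of pairing a decrement period with a fixed threshold.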
Technologies for generating a graphical user interface (GUI) dashboard with a three-dimensional (3D) grid of unit cells are described. An anomaly statistic can be determined for a set of records. A subset of network address identifiers can be identified and sorted according to the anomaly statistic. The subset can have higher anomaly statistics than other network address identifiers. There can be a maximum number in the subset. The GUI dashboard is generated with unit cells organized by the subset of network address identifiers as rows, time intervals as columns, colors as a configurable anomaly score indicator, and a number of network access events as column heights. Each unit cell is a colored, 3D visual object representing a composite score of anomaly scores associated with zero or more network access events corresponding to the respective network address identifier at the respective time interval. The GUI dashboard is rendered on a display.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
16.
SENSOR CALIBRATION USING FIDUCIAL MARKERS FOR IN-CABIN MONITORING SYSTEMS AND APPLICATIONS
In various examples, sensor parameter calibration techniques for in-cabin monitoring systems and applications are presented. An occupant monitoring system (OMS) is an example of a system that may be used within a vehicle or machine cabin to perform real-time assessments of driver and occupant presence, gaze, alertness, and/or other conditions. In some embodiments, a calibration parameter for an interior image sensor is determined so that the coordinates of features detected in 2D captured images may be referenced to an in-cabin 3D coordinate system. In some embodiments, a processing unit may detect fiducial points using an image of an interior space captured by a sensor, determine a 2D image coordinate for a fiducial point using the image, determine a 3D coordinate for the fiducial point, determine a calibration parameter comprising a rotation-translation transform from the 2D image coordinate and the 3D coordinate, and configure an operation based on the calibration parameter.
One embodiment of a method for generating representations of scenes includes assigning each image included in a set of images of a scene to one or more clusters of images based on a camera pose associated with the image, and performing one or more operations to generate, for each cluster included in the one or more clusters, a corresponding three-dimensional (3D) representation of the scene based on one or more images assigned to the cluster.
In various examples, calibration techniques for interior depth sensors and image sensors for in-cabin monitoring systems and applications are provided. An intermediary coordinate system may be generated using calibration targets distributed within an interior space to reference 3D positions of features detected by both depth-perception and optical image sensors. Rotation-translation transforms may be determined to compute a first transform (H1) between the depth-perception sensor's 3D coordinate system and the 3D intermediary coordinate system, and a second transform (H2) between the optical image sensor's 2D coordinate system and the intermediary coordinate system. A third transform (H3) between the depth-perception sensor's 3D coordinate system and the optical image sensor's 2D coordinate system can be computed as a function of H1 and H2. The calibration targets may comprise a structural substrate that includes one or more fiducial point markers and one or more motion targets.
G06V 10/24 - Aligning, centring, orientation detection or correction of the image
B60W 40/02 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to ambient conditions
G06T 3/60 - Rotation of a whole image or part thereof
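The H3 = f(H1, H2) composition in the depth/image calibration abstract reduces to matrix composition once transform directions are fixed. In this sketch the directions are an assumption, not taken from the abstract: H1 maps depth-sensor coordinates into the intermediary frame and H2 maps the intermediary frame into image-sensor coordinates, so the direct depth-to-image calibration is H3 = H2 · H1. Toy homogeneous 4x4 matrices are used for brevity.

```python
def matmul(A, B):
    """Plain matrix product of nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(H, p):
    """Apply a 4x4 homogeneous transform to a point [x, y, z, 1]."""
    return [sum(H[i][k] * p[k] for k in range(4)) for i in range(4)]

H1 = [[1, 0, 0, 2],   # toy rigid transform: translate +2 along x
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1]]
H2 = [[0, -1, 0, 0],  # toy rigid transform: rotate 90 degrees about z
      [1,  0, 0, 0],
      [0,  0, 1, 0],
      [0,  0, 0, 1]]
H3 = matmul(H2, H1)   # direct depth -> image calibration
```

Going through the intermediary frame in two hops or applying H3 directly must land every point in the same place, which is the property a calibration pipeline would verify.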
Various embodiments include techniques for performing parallel edge decimation on a high resolution mesh by collapsing multiple edges in parallel by blocking only the neighbor edges of the edges selected as collapse candidates. Effectively, the disclosed techniques dynamically partition the mesh into small partitions around the collapse candidates. In this manner, the techniques identify all the edges that may be independently collapsed in a single, now parallel, iteration. Edge decimation may be performed so that certain computational geometry techniques can be efficiently applied to a simpler mesh. In so doing, the disclosed techniques preserve the history of how the edge decimation process displaces the vertices of the original mesh to generate the simplified mesh. As a result, the results of the computational geometry techniques as applied to the simplified mesh can be propagated back to the original mesh.
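The candidate-selection step above can be sketched as a greedy pass (serial here for clarity; the disclosure's point is that the selected edges can then collapse in parallel): each accepted candidate blocks only its neighbor edges, so everything selected in one iteration is mutually independent.

```python
def independent_collapse_set(candidates, neighbors):
    """candidates: edges in priority order (e.g. shortest first).
    neighbors: {edge: set of adjacent edges sharing a vertex}.
    Returns a set of edges that can all be collapsed in parallel."""
    blocked = set()
    selected = []
    for edge in candidates:
        if edge in blocked:
            continue
        selected.append(edge)      # accept this collapse candidate
        blocked.add(edge)
        blocked |= neighbors.get(edge, set())  # block only its neighbors
    return selected

# Example: a path of three edges e1-e2-e3; e1 and e3 do not touch,
# so both survive one iteration while e2 is blocked.
nbrs = {"e1": {"e2"}, "e2": {"e1", "e3"}, "e3": {"e2"}}
picked = independent_collapse_set(["e1", "e2", "e3"], nbrs)
```

Blocking only the immediate neighborhood of each accepted edge is what dynamically partitions the mesh into small regions around the collapse candidates, as the abstract describes.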
One embodiment of a display system includes one or more light sources, one or more spatial light modulators, and a plurality of scatterers. One embodiment of a method for displaying content includes computing at least one of a phase or an amplitude modulation associated with two-dimensional (2D) or three-dimensional (3D) content, and causing one or more spatial light modulators to modulate light based on the at least one of a phase or an amplitude modulation to generate modulated light, where the modulated light is scattered by a plurality of scatterers.
Apparatuses, systems, and techniques to cause one or more neural networks to be trained. In at least one embodiment, a processor includes one or more circuits to cause one or more neural networks to be trained based, at least in part, on one or more capabilities.
G06N 3/04 - Architecture, e.g. interconnection topology
H04L 41/16 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
22.
PARALLEL WORKLOAD SCHEDULING BASED ON WORKLOAD DATA COHERENCE
Approaches for addressing issues associated with processing workloads that exhibit high divergence in execution and data access are provided. A plurality of workload items to be processed at least partially in parallel may be identified. Coherence information associated with the plurality of workload items may be determined. The plurality of workload items may be enqueued in a segmented queue. The plurality of workload items may be sorted based at least on a similarity of the coherence information. The sorted plurality of workload items may be stored to the queue. Using a set of processing units, the workload items in the queue may be processed at least partially in parallel according to an order of the sorting.
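A toy rendering of the enqueue-sort-store flow above (the segment size and the coherence key are made-up simplifications of the segmented queue described in the abstract): items are placed into fixed-size segments, each segment is sorted by a coherence signature so similar items run adjacently, and the segments are emitted back in processing order.

```python
def schedule_coherent(workloads, coherence_key, segment_size=4):
    """workloads: indexable sequence of workload items.
    coherence_key: function mapping an item to its coherence signature.
    Returns the items in a processing order that groups coherent work."""
    order = []
    for start in range(0, len(workloads), segment_size):
        segment = list(workloads[start:start + segment_size])
        segment.sort(key=coherence_key)   # group by coherence signature
        order.extend(segment)             # store sorted segment back
    return order
```

Sorting within bounded segments rather than globally keeps the reordering cheap and incremental, which matters when items stream into the queue while earlier segments are already being processed.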
Techniques applicable to a ray tracing hardware accelerator for traversing a hierarchical acceleration structure with reduced false positive ray intersections are disclosed. The reduction of false positives may be based upon one or more of selectively performing a secondary higher precision intersection test for a bounding volume, identifying and culling bounding volumes that degenerate to a point, and parametrically clipping rays that exceed certain configured distance thresholds.
Apparatuses, systems, and techniques to perform versions of program code. In at least one embodiment, one or more versions of a plurality of versions of software code are performed. In at least one embodiment, one or more versions of a plurality of versions of software code are performed based, at least in part, on whether the versions of the program code access overlapping memory regions.
Disclosed are apparatuses, systems, and techniques that may use machine learning for determining transmitted signals in communication systems that deploy orthogonal frequency division multiplexing. A system for performing the disclosed techniques includes receiving (RX) antennas to receive RX signals, each RX signal received over a respective resource element of a resource grid. Individual resource elements of the resource grid are associated with different radio subcarriers and/or data symbols. The RX signals include a combination of a plurality of transmitted (TX) streams. The system further includes a processing device to process the RX signals using one or more neural network models to determine TX data symbols transmitted via the plurality of TX streams.
Systems and methods are disclosed that relate to freespace detection using machine learning models. First data that may include object labels may be obtained from a first sensor and freespace may be identified using the first data and the object labels. The first data may be annotated to include freespace labels that correspond to freespace within an operational environment. Freespace annotated data may be generated by combining the one or more freespace labels with second data obtained from a second sensor, with the freespace annotated data corresponding to a viewable area in the operational environment. The viewable area may be determined by tracing one or more rays from the second sensor within the field of view of the second sensor relative to the first data. The freespace annotated data may be input into a machine learning model to train the machine learning model to detect freespace using the second data.
Techniques applicable to a ray tracing hardware accelerator for traversing a hierarchical acceleration structure with reduced false positive ray intersections are disclosed. The reduction of false positives may be based upon one or more of selectively performing a secondary higher precision intersection test for a bounding volume, identifying and culling bounding volumes that degenerate to a point, and parametrically clipping rays that exceed certain configured distance thresholds.
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, one or more neural networks are used to blend two or more video frames between a first video frame and a second video frame. In at least one embodiment, a blended video frame is used to generate an intermediate video frame between the first video frame and the second video frame.
Apparatuses, systems, and techniques to select one or more beams to transmit signals. In at least one embodiment, a system includes one or more circuits to select one or more wireless signal beams based, at least in part, on measuring one or more received reference signals.
H04B 7/08 - Diversity systems; Multi-antenna systems, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
30.
IDENTIFYING OBJECTS USING NEURAL NETWORK-GENERATED DESCRIPTORS
Apparatuses, systems, and techniques are presented to identify one or more objects. In at least one embodiment, one or more neural networks can be used to identify one or more objects based, at least in part, on one or more descriptors of one or more segments of the one or more objects.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
Apparatuses, systems, and techniques to optimize processor performance. In at least one embodiment, a processor is to perform an application programming interface (API) to exclude one or more portions of program code from a program.
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, an application programming interface (API) is performed to disable frame interpolation to use one or more neural networks.
Apparatuses, systems, and techniques are presented to generate digital content. In at least one embodiment, one or more neural networks are used to generate one or more textured three-dimensional meshes corresponding to one or more objects based, at least in part, on one or more two-dimensional images of the one or more objects.
In various examples, the decoding and upscaling capabilities of a client device are analyzed to determine encoding parameters and operations used by a content streaming server to generate encoded video streams. The quality of the upscaled content of the client device may be monitored by the streaming servers such that the encoding parameters may be updated based on the monitored quality. In this way, the encoding operations of one or more streaming servers may be more effectively matched to the decoding and upscaling abilities of one or more client devices such that an increased number of client devices may be served by the streaming servers.
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/146 - Data rate or code amount at the encoder output
To improve the efficiency of bounding volumes in a hardware based ray tracer, we employ a sheared axis-aligned bounding box to approximate an oriented bounding box typically defined by rotations. To achieve this, the bounding volume hierarchy builder shears an axis-aligned box to fit tightly around its enclosed oriented geometry in top level or bottom level space, then computes the inverse shear transform. The bounds are still stored as axis-aligned boxes in memory, now defined in the new sheared coordinate system, along with the derived parameters to transform a ray into the sheared coordinate system before testing intersection with the boxes. The ray-bounding volume intersection test is performed as usual, just in the new sheared coordinate system. Additional efficiencies are gained by constraining the number of shear dimensions, constraining the shear transform coefficients to a quantized list, sharing a shear transform across a collection of bounds, performing a shear transform only for ray-bounds testing and not for ray-geometry intersection testing, and adding a specialized shear transform calculator/accelerator to the hardware.
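The ray-side half of the scheme above can be sketched in a few lines (a single shear coefficient `k` on one axis is an assumed simplification; the hardware described supports constrained, quantized shear transforms): the ray's origin and direction are sheared into the box's coordinate system, then the ordinary axis-aligned slab test runs unchanged.

```python
def slab_test(o, d, lo, hi):
    """Standard slab test: does the ray o + t*d (t >= 0) hit box [lo, hi]?"""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        if abs(d[axis]) < 1e-12:
            if not (lo[axis] <= o[axis] <= hi[axis]):
                return False
            continue
        t0 = (lo[axis] - o[axis]) / d[axis]
        t1 = (hi[axis] - o[axis]) / d[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax

def intersect_sheared_box(o, d, lo, hi, k):
    """Shear x by -k*z on origin and direction (a shear is linear, so a
    ray stays a ray), then run the unmodified slab test in sheared space."""
    o_s = (o[0] - k * o[2], o[1], o[2])
    d_s = (d[0] - k * d[2], d[1], d[2])
    return slab_test(o_s, d_s, lo, hi)
```

Because the stored bounds remain axis-aligned boxes, only this per-ray transform is new; the intersection test itself is the one the hardware already performs.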
Apparatuses, systems, and techniques to transmit configuration information. In at least one embodiment, a processor includes one or more circuits to wirelessly transmit reference signal configuration information corresponding to one or more reference signals.
Systems, methods, and devices for performing computing operations are provided. In one example, a device is described to include a first processing unit and a second processing unit in communication via a network interconnect. The first processing unit is configured to offload at least one of computation tasks and communication tasks to the second processing unit while the first processing unit performs application-level processing tasks. The second processing unit is also configured to provide a result vector to the first processing unit when the at least one of computation tasks and communication tasks is completed.
Technologies for generating a set of models, one per account, are described, where each model is a fine-grained, unsupervised behavior model trained for its user to monitor and detect anomalous patterns. An unsupervised training pipeline can generate user models, each associated with one of multiple accounts and trained to detect an anomalous pattern using feature data associated with that account. Each account is associated with at least one of a user, a machine, or a service. An inference pipeline can detect a first anomalous pattern in first data associated with a first account using a first user model. The inference pipeline can detect a second anomalous pattern in second data associated with a second account using a second user model.
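A minimal sketch of the per-account idea (a z-score over each account's own history is a stand-in model here; the disclosure does not specify the model family): each account gets its own unsupervised baseline, so a value that is ordinary for one account can still be anomalous for another.

```python
import statistics

class AccountModel:
    """One lightweight, unsupervised model per account, fit only to
    that account's own history of a feature value."""

    def __init__(self, history, z_threshold=3.0):
        self.mean = statistics.fmean(history)
        self.std = statistics.pstdev(history) or 1.0  # guard flat history
        self.z_threshold = z_threshold

    def is_anomalous(self, value):
        return abs(value - self.mean) / self.std > self.z_threshold

# Toy per-account histories (e.g. daily access counts); account names
# are made up for illustration.
per_account_history = {
    "alice": [10, 11, 9, 10],
    "batch-svc": [100, 98, 102, 100],
}
models = {account: AccountModel(history)
          for account, history in per_account_history.items()}
```

Note that 40 accesses would be flagged for "alice" but not for "batch-svc", which is exactly what a single global model would miss.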
In various examples, systems and methods that use dialogue systems associated with various machine systems and applications are described. For instance, the systems and methods may receive text data representing speech, such as a question associated with a vehicle or other machine type. The systems and methods then use a retrieval system(s) to retrieve a question/answer pair(s) associated with the text data and/or contextual information associated with the text data. In some examples, the contextual information is associated with a knowledge base associated with or corresponding to the vehicle. The systems and methods then generate a prompt using the text data, the question/answer pair(s), and/or the contextual information. Additionally, the systems and methods determine, using a language model(s) and based at least on the prompt, an output associated with the text data. For instance, the output may include information that answers the question associated with the vehicle.
Apparatuses, systems, and techniques to use one or more neural networks to generate an upsampled version of one or more images based, at least in part, on a denoised version of said one or more images. At least one embodiment pertains to generating an upsampled high-resolution image from a noisy version and denoised version of a low-resolution image. At least one embodiment pertains to separating components of a low-resolution image before denoising an image.
Techniques applicable to a ray tracing hardware accelerator for traversing a hierarchical acceleration structure with reduced false positive ray intersections are disclosed. The reduction of false positives may be based upon one or more of selectively performing a secondary higher precision intersection test for a bounding volume, identifying and culling bounding volumes that degenerate to a point, and parametrically clipping rays that exceed certain configured distance thresholds.
A circuit for improving control over asynchronous signal crossings during circuit scan tests includes multiple scan registers and a decoder configured to translate a combined output of the scan registers into multiple one-hot controls to the local clock gates of scan registers disposed in multiple different clock domains. Programmable registers are provided to selectively enable and disable the local clock gates of the different clock domains.
Apparatuses, systems, and techniques to generate animations. In at least one embodiment, one or more neural networks control motion of one or more animated objects based, at least in part, on natural language inputs.
Apparatuses, systems, and techniques to optimize processor performance. In at least one embodiment, a method increases an operation voltage of one or more processors, based at least in part, on one or more error rates of the one or more processors.
Approaches presented herein can provide for the performance of specific types of tasks using a large model, without a need to retrain the model. Custom endpoints can be trained for specific types of tasks, as may be indicated by the specification of one or more guidance mechanisms. A guidance mechanism can be added to or used along with a request to guide the model in performing a type of task with respect to a string of text. An endpoint receiving such a request can perform any marshalling needed to get the request in a format required by the model, and can add the guidance mechanisms to the request by, for example, prepending one or more text strings (or text prefixes) to a text-formatted request. A model receiving this string can process the text according to the guidance mechanisms. Such an approach can allow for a variety of tasks to be performed by a single model.
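The marshalling step above amounts to prepending guidance text to the request before it reaches the model. A minimal sketch (the prefix strings and function name are made-up placeholders, not from the disclosure):

```python
def build_guided_request(text, guidance_prefixes):
    """Prepend one or more task-specific text prefixes to a request so a
    single, unmodified model performs the requested type of task."""
    return "\n".join(list(guidance_prefixes) + [text])

# An endpoint configured for summarization might prepend its own prefixes:
prompt = build_guided_request(
    "The quarterly report shows revenue grew 12%.",
    ["[task: summarize]", "[style: one sentence]"],
)
```

Because the guidance lives entirely in the request text, new task types only require new endpoint prefixes, never retraining or redeploying the underlying model.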
Apparatuses, systems, and techniques to annotate images using neural models. In at least one embodiment, neural networks generate mask information from labels of one or more objects within one or more images identified by one or more other neural networks.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
47.
COMPONENT ANALYSIS FROM MULTIPLE MODALITIES IN AN INTERACTION ENVIRONMENT
Systems and methods integrate different portions of a design review, such as files from a variety of different sources, into an interaction environment for review and interaction by a number of reviewing parties. The reviewing parties interact through an interface that is different from a native software of the files. An automated design review may be performed to evaluate a common rendering, formed from the files, for one or more conflicts, including interferences or version errors.
In various examples, systems and methods are presented for model-based trajectory simulation of agents in a simulated environment. Traffic simulators mimic reality so that autonomous or semi-autonomous vehicle design teams can validate driving models in environments that have diversity and complexity. In some embodiments, for a model-controlled agent of a simulation environment, a plurality of navigation probability distributions are generated, each of the plurality of navigation probability distributions defining a candidate trajectory for the agent to follow. A trajectory is selected for the agent based at least on at least one of the plurality of navigation probability distributions, and the agent is moved within the simulation environment based at least on the selected trajectory. In some embodiments, a search algorithm may be applied across multiple time-steps of a simulation, for example, to identify the occurrence of collision-free sequences of navigation probability distributions.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
B60W 30/095 - Predicting travel path or likelihood of collision
B60W 50/00 - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit
B60W 50/14 - Means for informing the driver, warning the driver or prompting a driver intervention
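The trajectory-selection step described in the abstract above can be sketched as a weighted draw over candidate distributions. The candidate trajectories, weights, and agent-stepping below are illustrative assumptions, not the disclosed implementation:

```python
import random

def select_trajectory(candidates, rng=None):
    """Pick one candidate trajectory according to its probability weight.

    `candidates` is a list of (weight, trajectory) pairs -- a toy stand-in
    for the navigation probability distributions in the abstract.
    """
    rng = rng or random.Random(0)
    weights = [w for w, _ in candidates]
    _, trajectory = rng.choices(candidates, weights=weights, k=1)[0]
    return trajectory

def step_agent(position, trajectory):
    """Move the agent one step along the selected trajectory."""
    dx, dy = trajectory[0]
    return (position[0] + dx, position[1] + dy)

# Two candidate trajectories: "keep lane" heavily favored over "swerve".
candidates = [
    (0.9, [(1.0, 0.0), (1.0, 0.0)]),   # keep lane
    (0.1, [(1.0, 0.5), (1.0, 0.5)]),   # swerve
]
pos = step_agent((0.0, 0.0), select_trajectory(candidates))
```

A full simulator would sample each waypoint from its navigation probability distribution and, as the abstract notes, search across multiple time-steps for collision-free sequences.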
Apparatuses, systems, and techniques to perform matrix multiply-accumulate (MMA) operations on data of a first type using one or more MMA instructions for data of a second type. In at least one embodiment, a single tensorfloat-32 (TF32) MMA instruction computes a 32-bit floating point (FP32) output using TF32 input operands converted from FP32 data values.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
Apparatuses, systems, and techniques are presented to identify and prevent generation of restricted content. In at least one embodiment, one or more neural networks are used to identify restricted content based only on the restricted content.
Approaches presented herein provide for the maintaining of fine details that might be removed by a denoiser used to reduce an amount of noise in an image. An input image can be provided to a denoiser, and can also be simultaneously processed to extract pixel data that may correspond to fine detail or high frequency features. Individual pixels of an image can have a value determined for a material property sampled for that pixel location, and that value can be compared against an average material property value determined for neighboring pixels. The ratio of material values can be multiplied by the value of a corresponding pixel of the denoised image, for any or all pixel locations, to obtain final pixel values for an output image that include less noise than the original image but represent fine detail that may otherwise have been lost during the denoising process.
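The ratio-based detail restoration described above can be sketched with a toy local-mean computation; the choice of material property, the 3x3 window, and the input arrays are illustrative assumptions:

```python
import numpy as np

def preserve_detail(denoised, material, k=3):
    """Reweight a denoised image by the ratio of each pixel's sampled
    material value to its local neighborhood average, restoring high
    frequency detail the denoiser smoothed away.

    `material` is a per-pixel material-property map (e.g. albedo) -- an
    assumed input; the abstract does not fix the property used.
    """
    pad = k // 2
    padded = np.pad(material, pad, mode="edge")
    # Local mean via a sliding-window sum (toy implementation).
    local_mean = np.zeros_like(material, dtype=float)
    for dy in range(k):
        for dx in range(k):
            local_mean += padded[dy:dy + material.shape[0],
                                 dx:dx + material.shape[1]]
    local_mean /= k * k
    ratio = material / np.maximum(local_mean, 1e-8)
    return denoised * ratio

denoised = np.full((4, 4), 0.5)           # flat output from a denoiser
material = np.full((4, 4), 1.0)
material[2, 2] = 2.0                      # one pixel carrying fine detail
out = preserve_detail(denoised, material)
```

Flat regions have a ratio of 1 and pass through unchanged, while the detail pixel is boosted above its denoised value.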
Apparatuses, systems, and techniques to perform one or more APIs. In at least one embodiment, a processor is to perform an API to indicate a number of 5G-NR cells that are able to be performed concurrently by one or more processors; a processor is to perform an API to indicate whether one or more processors are able to perform a first number of 5G-NR cells concurrently; a processor comprising one or more circuits is to perform an API to indicate whether one or more resources of one or more processors are allocated to perform 5G-NR cells; and/or a processor comprises one or more circuits to perform an API to indicate one or more techniques to be used by one or more processors in performing one or more 5G-NR cells.
A die including a die body having a first body surface, a second body surface on an opposite side of the die body from the first body surface, an interconnect region adjacent to the first body surface including interconnect dielectric layers with metal lines and vias, a transistor region above the interconnect region, the metal lines and vias making electrical connections to one or more power rails of the transistor region and electrically connected to transistors of the transistor region, a power region above the transistor region including an electro-conductive film on the second body surface and TSVs in the power region, an outer end of each TSV contacting the film and an embedded end of each TSV contacting one of the power rails. A method of manufacturing an IC package and a computer with the IC package are also disclosed.
H01L 23/528 - Layout of the interconnection structure
H01L 21/56 - Encapsulations, e.g. encapsulating layers, coatings
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 23/29 - Encapsulation, e.g. encapsulating layers, coatings characterised by the material
H01L 23/48 - Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads or terminal arrangements
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
H01L 25/16 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different main groups of groups , or in a single subclass of , , e.g. forming hybrid circuits
54. APPLICATION PROGRAMMING INTERFACE TO CAUSE PERFORMANCE OF FRAME INTERPOLATION
Apparatuses, systems, and techniques to process image frames. In at least one embodiment, an application programming interface (API) is performed to cause frame interpolation to be performed using one or more neural networks.
Systems and methods include a first valve that controls a flow rate of a coolant. A processor is configured to set the flow rate of the coolant to a rate that maintains a vapor quality, measured at an outlet of the coolant, within a predetermined quality range.
Apparatuses, systems, and techniques to generate a prompt for one or more machine learning processes. In at least one embodiment, the machine learning process(es) generate(s) a plan to perform a task (identified in the prompt) that is to be performed by an agent (real world or virtual).
Apparatuses, systems, and techniques to perform neural networks. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected. In at least one embodiment, a most consistent output of one or more pre-trained neural networks is to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks.
Systems and techniques are described related to training one or more machine learning models for use in control of a robot. In at least one embodiment, one or more machine learning models are trained based at least on simulations of the robot and renderings of such simulations—which may be performed using one or more ray tracing algorithms, operations, or techniques.
Embodiments of the present disclosure relate to a method of automated tuning of control parameters. In some implementations, the method may include obtaining, from a search algorithm, one or more parameter sets that determine how a controller responds to an environment with at least one changing variable. In these and other implementations, at least one of the parameter sets may include a vector parameter that includes a vector of values. In these and other implementations, a value selected from the vector of values for the vector parameter during operation of the controller may be based on the at least one changing variable. In some implementations, the method may include ordering the vector of values for the vector parameter of the parameter sets and simulating at least one operation of the controller using the parameter sets with the ordered vector of values for the vector parameter.
G05B 13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
Apparatuses, systems, and techniques adjust a frequency at which a processor operates. In at least one embodiment, a frequency at which a processor operates is adjusted based, at least in part, on different cores of the processor performing one or more identical instructions.
Apparatuses, systems, and techniques to generate a video using two or more images comprising objects to be included in the video. In at least one embodiment, objects are identified in two or more images using one or more neural networks to generate a video that includes the objects.
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Systems and methods include pressure sensors that measure a pressure differential of coolant between a first coolant line and a second coolant line. Coolant flow control valves control respective valve flow rates. A processor selects a valve from the flow control valves to provide coolant to a coolant output, responsive to the measured pressure differential.
H05K 7/20 - Modifications to facilitate cooling, ventilating, or heating
G01F 1/34 - Measuring the volume flow or mass flow of fluid or fluent solid material wherein the fluid passes through a meter in a continuous flow by using mechanical effects by measuring pressure or differential pressure
63. LANDMARK DETECTION WITH AN ITERATIVE NEURAL NETWORK
Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks which match the input more finely. However, the iterations required by current methods typically increase the training memory cost linearly, and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. Furthermore, when detecting landmarks in video, the present disclosure provides for a reduction in jitter due to reuse of previous hidden states from previous frames.
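The iterative refinement loop described above can be sketched with an explicit magnitude-based stopping criterion; the correction function below is a toy stand-in for the iterative network, and the tolerance and iteration cap are assumptions:

```python
import numpy as np

def refine_landmarks(landmarks, correction_fn, max_iters=50, tol=1e-3):
    """Iteratively refine a landmark estimate until corrections shrink
    below `tol` -- an explicit stopping criterion, in contrast with the
    fixed-depth cascades the abstract criticizes.

    `correction_fn` stands in for the iterative network; for video it
    could also carry a reused hidden state across frames, as described
    above.
    """
    for _ in range(max_iters):
        delta = correction_fn(landmarks)
        landmarks = landmarks + delta
        if np.linalg.norm(delta) < tol:
            break
    return landmarks

# Toy "network": nudges landmarks 50% of the way toward a fixed target.
target = np.array([[10.0, 5.0], [3.0, 7.0]])
initial = np.zeros_like(target)
refined = refine_landmarks(initial, lambda lm: 0.5 * (target - lm))
```

Because each iteration runs the same network, training memory does not have to grow with the iteration count as it would in a cascade of distinct models.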
Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
Apparatuses, systems, and techniques to optimize performance of a processor group. In at least one embodiment, a method increases a processor's clock frequency based, at least in part, on performance of other processors in a group.
In various examples, a time conversion operation may be performed based at least on updating a first local clock of a component based at least on a reference clock of a system including the component. A difference between a current time of the first local clock and a current time of a second local clock of the component may be determined. A state of at least one of the reference clock, the first local clock, or the second local clock may be determined based at least on comparing the time difference to a previously determined difference between a time of the reference clock and a time of the second local clock.
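The clock-state determination above can be sketched as comparing the current difference between two clocks to a previously recorded difference; the tolerance and clock readings are illustrative assumptions:

```python
def check_clock_state(t_clock_a, t_clock_b, previous_difference, tolerance):
    """Flag a clock fault if the difference between two local clocks has
    drifted from a previously observed difference by more than
    `tolerance`. A minimal sketch of the comparison described above.
    """
    difference = t_clock_a - t_clock_b
    drift = abs(difference - previous_difference)
    state = "ok" if drift <= tolerance else "fault"
    return state, difference

# The two clocks were last observed 2 ticks apart; they still are -> ok.
state, diff = check_clock_state(102.0, 100.0, previous_difference=2.0,
                                tolerance=0.5)
```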
In various examples, techniques for determining perception zones for object detection are described. For instance, a system may use a dynamic model associated with an ego-machine, a dynamic model associated with an object, and one or more possible interactions between the ego-machine and the object to determine a perception zone. The system may then perform one or more processes using the perception zone. For instance, if the system is validating a perception system of the ego-machine, the system may determine whether a detection error associated with the object is a safety-critical error based on whether the object is located within the perception zone. Additionally, if the system is executing within the ego-machine, the system may determine whether the object is a safety-critical object based on whether the object is located within the perception zone.
G05D 1/02 - Control of position or course in two dimensions
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
68. USING SCENE-AWARE CONTEXT FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
In various examples, techniques for using scene-aware context for dialogue systems and applications are described herein. For instance, systems and methods are disclosed that process audio data representing speech in order to determine an intent associated with the speech. Systems and methods are also disclosed that process sensor data representing at least a user in order to determine a point of interest associated with the user. In some examples, the point of interest may include a landmark, a person, and/or any other object within an environment. The systems and methods may then generate a context associated with the point of interest. Additionally, the systems and methods may process the intent and the context using one or more language models. Based on the processing, the language model(s) may output data associated with the speech.
A game-agnostic event detector can be used to automatically identify game events. Game-specific configuration data can be used to specify types of pre-processing to be performed on media for a game session, as well as types of detectors to be used to detect events for the game. Event data for detected events can be written to an event log in a form that is both human- and process-readable. The event data can be used for various purposes, such as to generate highlight videos or provide player performance feedback.
A63F 13/30 - Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
A63F 13/426 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving on-screen location information, e.g. screen coordinates of an area at which the player is aiming with a light gun
A63F 13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
A63F 13/44 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment involving timing of operations, e.g. performing an action within a time slot
A63F 13/79 - Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
A63F 13/86 - Watching games played by other players
G07F 17/32 - Coin-freed apparatus for hiring articles; Coin-freed facilities or services for games, toys, sports, or amusements
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
70. APPLICATION PROGRAMMING INTERFACE TO ACCELERATE MATRIX OPERATIONS
Apparatuses, systems, and techniques to determine a matrix multiplication algorithm for a matrix multiplication operation. In at least one embodiment, a matrix multiplication operation is analyzed to determine an appropriate matrix multiplication algorithm to perform the matrix multiplication operation.
Systems and methods for cooling a datacenter are disclosed. In at least one embodiment, a liquid-to-liquid heat exchanger associated with a rear door of a rack exchanges heat between a primary coolant associated with a chilling facility and a secondary coolant or fluid associated with a computing device of the rack.
An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.
B60W 50/14 - Means for informing the driver, warning the driver or prompting a driver intervention
G06T 3/40 - Scaling of a whole image or part thereof
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
State information can be determined for a subject that is robust to different inputs or conditions. For drowsiness, facial landmarks can be determined from captured image data and used to determine a set of blink parameters. These parameters can be used, such as with a temporal network, to estimate a state (e.g., drowsiness) of the subject. To improve robustness, an eye state determination network can determine eye state from the image data, without reliance on intermediate landmarks, that can be used, such as with another temporal network, to estimate the state of the subject. A weighted combination of these values can be used to determine an overall state of the subject. To improve accuracy, individual behavior patterns and context information can be utilized to account for variations in the data due to subject variation or current context rather than changes in state.
G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
B60W 40/08 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to drivers or passengers
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
An alternate root tree or graph structure for ray and path tracing enables dynamic instancing build time decisions to split any number of geometry acceleration structures in a manner that is developer transparent, nearly memory storage neutral, and traversal efficient. The resulting traversals only need to partially traverse the acceleration structure, which improves efficiency. One example use reduces the number of false-positive transitions from instance acceleration structures to geometry acceleration structures for many spatially separated instances of the same geometry.
In various examples, a corrective operation may be performed based at least in part on detecting that at least one circuit is operating asynchronously with respect to a reference clock. An indication that asynchronous operation was detected may be generated. Upon detecting a circuit operating asynchronously, a corrective operation may be performed such that a component that receives data generated using the at least one circuit continues operating in view of the indication.
In various examples, techniques for detecting occluded objects within an environment are described. For instance, systems and methods may receive training data representing images and ground truth data indicating whether the images are associated with occluded objects or whether the images are not associated with occluded objects. The systems and methods may then train a neural network to detect occluded objects using the training data and the ground truth data. After training, the systems and methods may use the neural network to detect occluded objects within an environment. For instance, while a vehicle is navigating, the vehicle may process sensor data using the neural network. The neural network may then output data indicating whether an object is located within the environment and occluded from view of the vehicle. In some examples, the neural network may further output additional information associated with the occluded object.
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
The technology disclosed herein involves using a transformation curve to modify colors of images so that those images are more easily viewed by persons with a color vision deficiency (CVD). The transformation curve is applied to spectral versions of images in which each pixel has a spectral representation to modify the spectral versions of the images. A spectral version of an image is modified by, for each pixel of the spectral version of the image, modifying intensities of one or more wavelengths by applying the one or more wavelengths to the transformation curve, which transforms the intensities from source wavelengths to destination wavelengths. The modified spectral version of the image is then modified to a modified version of the image in a color space, such as the RGB color space.
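The per-wavelength remapping described above can be sketched with a piecewise-linear transformation curve; the wavelength grid, curve sample points, and nearest-bin deposit are illustrative assumptions:

```python
import numpy as np

def remap_spectrum(wavelengths, intensities, curve_src, curve_dst):
    """Shift spectral intensity from source to destination wavelengths
    using a transformation curve given as matched sample points.

    A toy sketch: each wavelength is mapped through the curve with linear
    interpolation, and its intensity is deposited at the nearest bin of
    the output spectrum.
    """
    mapped = np.interp(wavelengths, curve_src, curve_dst)
    out = np.zeros_like(intensities)
    for wl, inten in zip(mapped, intensities):
        idx = int(np.argmin(np.abs(wavelengths - wl)))
        out[idx] += inten
    return out

# Spectrum sampled every 10 nm; this curve shifts energy +20 nm, e.g.
# moving it away from a band that a given CVD type confuses.
wavelengths = np.arange(500.0, 601.0, 10.0)
intensities = np.zeros_like(wavelengths)
intensities[2] = 1.0                       # energy at 520 nm
shifted = remap_spectrum(wavelengths, intensities,
                         curve_src=[500.0, 600.0],
                         curve_dst=[520.0, 620.0])
```

After remapping, the modified spectral image would be converted back to a color space such as RGB, as the abstract describes.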
A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
Various techniques for adaptive rendering of images with noise reduction are described. More specifically, the present disclosure relates to approaches for rendering and denoising images—such as ray-traced images—in an iterative process that distributes computational efforts to pixels where denoised output is predicted with higher uncertainty. In some embodiments, an input image may be fed into a deep neural network (DNN) to jointly predict a denoised image and an uncertainty map. The uncertainty map may be used to create a distribution of additional samples (e.g., for one or more samples per pixel on average), and the additional samples may be used with the input image to adaptively render a higher quality image. This process may be repeated in a loop, until some criterion is satisfied, for example, when the denoised image converges to a designated quality, a time or sampling budget is satisfied, or otherwise.
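The sample-distribution step above can be sketched as allocating a fixed extra-sample budget in proportion to the uncertainty map; the DNN that predicts the map is assumed, and the toy map and budget are illustrative:

```python
import numpy as np

def distribute_samples(uncertainty, budget):
    """Allocate an extra-sample budget across pixels in proportion to a
    predicted per-pixel uncertainty map, as in the adaptive rendering
    loop described above.
    """
    weights = uncertainty / uncertainty.sum()
    samples = np.floor(weights * budget).astype(int)
    # Hand out any rounding remainder to the most uncertain pixels first.
    remainder = budget - samples.sum()
    order = np.argsort(-uncertainty, axis=None)
    for flat_idx in order[:remainder]:
        samples.flat[flat_idx] += 1
    return samples

# One pixel is far more uncertain than the rest and receives most samples.
uncertainty = np.array([[1.0, 1.0],
                        [1.0, 7.0]])
samples = distribute_samples(uncertainty, budget=100)
```

A renderer would take these per-pixel counts, trace the additional samples, and feed the result back through the denoiser until the stopping criterion is met.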
In various examples, metadata may be generated corresponding to compressed data streams that are compressed according to serial compression algorithms—such as arithmetic encoding, entropy encoding, etc.—in order to allow for parallel decompression of the compressed data. As a result, modification to the compressed data stream itself may not be required, and bandwidth and storage requirements of the system may be minimally impacted. In addition, by parallelizing the decompression, the system may benefit from faster decompression times while also reducing or entirely removing the adoption cycle for systems using the metadata for parallel decompression.
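The sidecar-metadata idea above can be sketched with a toy run-length coder: the compressed token stream is left untouched, and a separate metadata list records chunk boundaries so chunks can be decoded in parallel. For a genuinely serial coder (arithmetic or entropy coding), the metadata would instead checkpoint decoder state at each boundary; the chunk size and input here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def rle_encode(data):
    """Toy serial compressor: run-length encode into [char, count] pairs."""
    tokens = []
    for ch in data:
        if tokens and tokens[-1][0] == ch:
            tokens[-1][1] += 1
        else:
            tokens.append([ch, 1])
    return tokens

def make_metadata(tokens, chunk_tokens):
    """Sidecar metadata: start index of each token chunk. The compressed
    stream itself is unmodified, matching the approach described above."""
    return list(range(0, len(tokens), chunk_tokens))

def decode_chunk(tokens, start, end):
    return "".join(ch * n for ch, n in tokens[start:end])

tokens = rle_encode("aaabbbbccdddddde")
meta = make_metadata(tokens, chunk_tokens=2)
bounds = meta + [len(tokens)]
with ThreadPoolExecutor() as pool:
    parts = list(pool.map(lambda se: decode_chunk(tokens, *se),
                          zip(bounds, bounds[1:])))
decoded = "".join(parts)
```

Because only the small metadata list is added, bandwidth and storage overhead stay minimal while decompression of the chunks proceeds concurrently.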
In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique trains the deep neural network to develop loss, scales the loss, computes gradients at a reduced precision, and reduces the magnitude of the computed gradients to compensate for scaling of the loss. In one example non-limiting arrangement, the training forward pass scales a loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting scaling factor S and adjusting the weight update.
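The scale-by-S, unscale-by-1/S scheme above can be sketched on a one-parameter model; the model, data, and scale factor are illustrative, and the reduced precision is emulated with float16:

```python
import numpy as np

# Toy linear model: loss = 0.5 * (w * x - y)^2. Scaling the loss by S
# keeps small gradients representable at reduced precision; dividing the
# gradient by 1/S before the weight update compensates exactly.
S = 1024.0
w, x, y, lr = 1.0, 0.5, 2.0, 0.1

def grad_scaled(w, low_precision=np.float16):
    residual = w * x - y
    g_fp32 = residual * x                    # d(loss)/dw at full precision
    # Gradient of the *scaled* loss, stored at reduced precision.
    g_scaled = low_precision(S * g_fp32)
    return np.float32(g_scaled) / S          # unscale for the update

g = grad_scaled(w)
w_updated = w - lr * g
```

The unscaled gradient matches the full-precision value, so the rest of the training procedure is unaffected, as the abstract states; choosing S too large would overflow the reduced-precision format, which is why the disclosure also covers techniques for selecting S.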
In various examples, surface profile estimation and bump detection may be performed based on a three-dimensional (3D) point cloud. The 3D point cloud may be filtered in view of a portion of an environment including drivable free-space, and within a threshold height to factor out other objects or obstacles other than a driving surface and protuberances thereon. The 3D point cloud may be analyzed—e.g., using a sliding window of bounding shapes along a longitudinal or other heading direction—to determine one-dimensional (1D) signal profiles corresponding to heights along the driving surface. The profile itself may be used by a vehicle—e.g., an autonomous or semi-autonomous vehicle—to help in navigating the environment, and/or the profile may be used to detect bumps, humps, and/or other protuberances along the driving surface, in addition to a location, orientation, and geometry thereof.
In various examples, scenarios may be defined using a declarative description—e.g., defining a behavior of interest—that the present system may convert into a procedural description for generating one or more instances and/or variations of a scenario for testing an autonomous or semi-autonomous machine in a virtual environment. The system may execute observers or evaluators for testing the performance and accuracy of the machine and may compute coverage of various elements based on the generated virtual scenarios, and may feed the results back to the system to generate additional instances and/or variations where the coverage or accuracy is below a desired level. As a result, the system may include an end-to-end framework for generating scenarios in virtual environments, testing and validating the scenarios themselves, and/or testing and validating the underlying autonomous or semi-autonomous systems of the machine—all based on a declarative description.
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
B60W 50/00 - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit
G05D 1/00 - Control of position, course, altitude, or attitude of land, water, air, or space vehicles, e.g. automatic pilot
A method includes determining, using a processing device, a set of observations from coolant data, the coolant data being received from one or more sensors in an environment associated with a coolant. The method further includes determining, using a machine learning model and the set of observations, a contamination level of the coolant. The method also includes initiating an operation, using the processing device, responsive to determining the coolant contamination level.
G01N 11/02 - Investigating flow properties of materials, e.g. viscosity or plasticity; Analysing materials by determining flow properties by measuring flow of the material
G01N 21/90 - Investigating the presence of flaws, defects or contamination in a container or its contents
Systems and methods provide for text normalization or inverse text normalization using a hybrid language system that combines rule-based processing with neural or learned processing. For example, a hybrid rule-based and neural approach identifies semiotic tokens within a textual input and generates a set of potential plain-text conversions of the semiotic tokens. The plain-text conversions are weighted and evaluated by a trained language model that rescores the plain-text conversion based on context to identify a highest scoring plain-text conversion for further processing within a language system pipeline.
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G06F 40/40 - Processing or translation of natural language
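The hybrid pipeline in the abstract above can be sketched as rule-generated candidates rescored by a language model; the rules, scoring function, and vocabulary below are toy assumptions standing in for a trained model:

```python
def rule_candidates(token):
    """Rule-based expansions for a semiotic token (illustrative rules)."""
    table = {
        "$5": ["five dollars", "dollar five"],
        "10:30": ["ten thirty", "half past ten"],
    }
    return table.get(token, [token])

def lm_score(sentence):
    """Stand-in for a trained language model's score; a real system
    would rescore each candidate in context. Here: count preferred
    phrases (toy)."""
    preferred = ("five dollars", "ten thirty")
    return sum(sentence.count(p) for p in preferred)

def normalize(text, token):
    """Generate plain-text conversions by rule, pick the highest-scoring
    one by (mock) language-model rescoring."""
    candidates = [text.replace(token, c) for c in rule_candidates(token)]
    return max(candidates, key=lm_score)

result = normalize("it costs $5", "$5")
```

The highest-scoring conversion would then flow onward in the language system pipeline (e.g., to a text-to-speech front end).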
In various examples, systems are described for performing cloud-based updating of operating systems (e.g., root file systems) using system partitioning. For instance, a system(s) may initiate updates of the operating systems of machines, where the machines use system partitioning for the updating. More specifically, the system(s) may cause a machine to update the operating system using a standby system partition while the machine is currently running on another, active system partition. In some circumstances, the system(s) may perform these processes in order to update a cluster of machines, such as during a specific time period or at a certain frequency. By using such processes, the cluster of machines may still operate during the updating of the machines and/or even if the update fails on one or more of the machines.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
Apparatuses, systems, and techniques to schedule one or more workloads to one or more computers by comparing one or more performance metrics of the one or more workloads to be performed using one or more computers with one or more performance metrics of the one or more workloads to be performed using a simulation of the one or more computers.
Apparatuses, systems, and techniques to allocate portions of a storage to groups of processors. In at least one embodiment, an amount of storage to store data to be used by one or more computer programs is allocated based, at least in part, on a number of processors to perform one or more portions of the one or more computer programs.
Apparatuses, systems, and techniques to perform software workloads. In at least one embodiment, one or more circuits of a processor cause a programming interface to select a subset of one or more processors of a non-uniform memory access (NUMA) node to perform a software workload.
Apparatuses, systems, and techniques to obtain metric data of a computing resource service provider. In at least one embodiment, metric data of one or more graphics processing unit (GPUs) is caused to be obtained from the one or more GPUs in an order from newest to oldest.
Apparatuses, systems, and techniques to perform software workloads. In at least one embodiment, one or more circuits of a processor cause a first application programming interface to select a second application programming interface, wherein the second application programming interface performs one or more software workloads identified by the first application programming interface.
Apparatuses, systems, and techniques to perform software workloads. In at least one embodiment, one or more circuits of a processor perform a first application programming interface to select a second application programming interface, wherein the second application programming interface monitors performance of one or more software workloads identified by the first application programming interface.
Apparatuses, systems, and techniques to perform software workloads. In at least one embodiment, one or more circuits of a processor perform a first application programming interface to select a second application programming interface, wherein the second application programming interface terminates performance of one or more software workloads identified by the first application programming interface.
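The three embodiments above share one pattern: a first API identifies workloads and selects a second API that then performs, monitors, or terminates them. A minimal sketch of that dispatch pattern, with all function names and return values invented for illustration:

```python
# Illustrative first-API / second-API dispatch; all names are assumptions.
# Each "second API" acts on the workloads the first API identified.

def perform(workloads):
    return {w: "done" for w in workloads}

def monitor(workloads):
    return {w: "running" for w in workloads}

def terminate(workloads):
    return {w: "terminated" for w in workloads}

SECOND_APIS = {"perform": perform, "monitor": monitor, "terminate": terminate}

def first_api(action, workloads):
    """Identify the workloads, select the second API, and invoke it."""
    second_api = SECOND_APIS[action]
    return second_api(workloads)

print(first_api("monitor", ["job-1", "job-2"]))
```

The point of the indirection is that the caller interacts only with the first API; which second API runs (and whether it performs, monitors, or terminates) is a selection the first API makes.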
Apparatuses, systems, and techniques for scheduling instructions in a cluster to guarantee GPU-CPU alignment for these instructions. In at least one embodiment, jobs are scheduled based on constraints on job sizes and job placement. In at least one embodiment, a processor comprises circuits to schedule instructions to be performed by processors based on latency of interconnects coupled to these processors.
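One way to read the latency-based placement above: among the candidate GPU-CPU pairings, schedule onto the pair whose interconnect latency is lowest. A toy sketch with made-up latency numbers (the pairing structure and values are assumptions, not the disclosed scheduler):

```python
# Illustrative latency-aware GPU-CPU placement; latencies (microseconds)
# and device names are made up for the sketch.
latency = {("gpu0", "cpu0"): 2.0, ("gpu0", "cpu1"): 9.0,
           ("gpu1", "cpu0"): 9.0, ("gpu1", "cpu1"): 3.0}

def place_job(latency):
    """Return the aligned GPU-CPU pair minimizing interconnect latency."""
    return min(latency, key=latency.get)

print(place_job(latency))  # picks the 2.0 us pair: ('gpu0', 'cpu0')
```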
A ray (e.g., a traced path of light, etc.) is generated from an originating pixel within a scene being rendered. Additionally, one or more shadow map lookups are performed for the originating pixel to estimate an intersection of the ray with alpha-tested geometry within the scene. A shadow map stores the distance of geometry as seen from the point of view of the light, and alpha-tested geometry includes objects within the scene being rendered that have a determined texture and opacity. Further, the one or more shadow map lookups are performed to determine a visibility value for the pixel (e.g., that identifies whether the originating pixel is in a shadow) and a distance value for the pixel (e.g., that identifies how far the pixel is from the light). Further still, the visibility value and the distance value for the pixel are passed to a denoiser.
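A single shadow map lookup of the kind described can be sketched as a depth comparison: the map stores, per texel, the depth of the nearest geometry as seen from the light, and a pixel is visible to the light if its own depth from the light does not exceed that stored depth. This is a deliberately simplified sketch (a real lookup involves projecting the pixel into light space, filtering, and handling alpha-tested geometry, none of which is modeled here), and the function and parameter names are assumptions:

```python
# Simplified shadow-map lookup. The map stores depths from the light's
# point of view; a pixel is lit if it is no farther than the stored occluder.

def shadow_lookup(shadow_map, u, v, pixel_depth_from_light, bias=1e-3):
    """Return (visibility, distance) for one pixel.

    visibility: 1.0 if the pixel sees the light, 0.0 if it is in shadow.
    distance:   how far the pixel is from the light along the light ray.
    """
    stored_depth = shadow_map[v][u]  # nearest geometry the light sees here
    visible = pixel_depth_from_light <= stored_depth + bias
    return (1.0 if visible else 0.0, pixel_depth_from_light)

# 2x2 shadow map: stored depths of the closest occluders per texel.
shadow_map = [[5.0, 2.0],
              [5.0, 5.0]]

print(shadow_lookup(shadow_map, 0, 0, 4.0))  # (1.0, 4.0): nothing closer blocks the light
print(shadow_lookup(shadow_map, 1, 0, 4.0))  # (0.0, 4.0): occluder at depth 2.0 shadows it
```

The returned pair mirrors the two values the abstract describes handing to the denoiser: a visibility value and a distance-to-light value per pixel.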
High quality image rendering can be achieved in part by using inverse transform sampling to direct sampling toward regions of greater importance, such as regions with higher brightness values, to reduce noise and improve convergence. Inverse transform sampling can be achieved more efficiently by reformulating as a ray-tracing problem, using tree traversal units that can be accelerated. A geometric mesh can be generated based on a set of cumulative distribution functions (CDFs) for various rows and columns of pixels in a texture, and individual rays can be traced against this mesh, with those rays having a higher probability of intersection at a point with greater importance, such as a higher brightness value. A probability distribution function to be used for importance sampling can be derived by analyzing partial derivatives of the CDF geometry at the intersection location.
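Classical inverse transform sampling, which the technique above reformulates as ray tracing against a CDF-derived mesh, can be shown for one row of pixel brightness values: build the cumulative distribution, then map a uniform sample through its inverse so brighter pixels are chosen proportionally more often. This sketch shows only the classical CPU version, not the accelerated tree-traversal formulation; the brightness values are made up:

```python
# Classical inverse transform sampling over per-pixel importance (brightness).
import bisect
import random

def build_cdf(weights):
    """Cumulative distribution over pixel importance values."""
    total = float(sum(weights))
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    return cdf

def inverse_transform_sample(cdf, u):
    """Map a uniform sample u in [0, 1) to a pixel index via the inverse CDF."""
    return bisect.bisect_right(cdf, u)

brightness = [1.0, 8.0, 1.0]  # the middle pixel carries most of the importance
cdf = build_cdf(brightness)   # approximately [0.1, 0.9, 1.0]
counts = [0, 0, 0]
rng = random.Random(0)
for _ in range(10_000):
    counts[inverse_transform_sample(cdf, rng.random())] += 1
# Roughly 10% / 80% / 10% of the samples land on the three pixels.
print(counts)
```

The ray-traced reformulation replaces the binary search here with a traversal-unit-accelerated intersection against geometry built from the CDFs, so that a ray is more likely to intersect where the importance (e.g., brightness) is higher.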
Apparatuses, systems, and techniques to select computer systems to perform portions of one or more programs in parallel based, at least in part, on the computer systems' ability to perform the portions at substantially a same performance level. In at least one embodiment, a system includes one or more circuits to select one or more computer systems based, at least in part, on identifying one or more logical partitions of the computer systems based, at least in part, on one or more attributes of one or more programs associated with the one or more computer systems.
Transferring pose to three-dimensional characters is a common computer graphics task that typically involves transferring the pose of a reference avatar to a (stylized) three-dimensional character. Because three-dimensional characters are created by professional artists through imagination and exaggeration, and therefore, unlike human or animal avatars, have distinct shapes and features, matching the pose of a three-dimensional character to that of a reference avatar generally requires manually creating the shape information needed for pose transfer. The present disclosure provides for the automated transfer of a reference pose to a three-dimensional character, based specifically on a learned shape code for the three-dimensional character.
Estimating motion of a human or other object in video is a common computer task with applications in robotics, sports, mixed reality, etc. However, motion estimation becomes difficult when the camera capturing the video is moving, because the observed object and camera motions are entangled. The present disclosure provides for joint estimation of the motion of a camera and the motion of articulated objects captured in video by the camera.
One embodiment of a method for controlling a robot includes generating a representation of spatial occupancy within an environment based on a plurality of red, green, blue (RGB) images of the environment, determining one or more actions for the robot based on the representation of spatial occupancy and a goal, and causing the robot to perform at least a portion of a movement based on the one or more actions.
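The three steps of that method (build an occupancy representation, determine actions from it and a goal, move the robot) can be sketched with a toy 2D occupancy grid standing in for the representation the real system builds from RGB images. Everything here is illustrative: the grid, the greedy action rule, and the names are assumptions, not the disclosed method:

```python
# Toy version of the pipeline: occupancy representation -> action -> movement.
# The real system derives occupancy from RGB images; here the grid is given.

def choose_action(occupancy, position, goal):
    """Greedily pick the free neighboring cell closest to the goal."""
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    best = position
    best_dist = abs(position[0] - goal[0]) + abs(position[1] - goal[1])
    for dr, dc in moves:
        r, c = position[0] + dr, position[1] + dc
        in_bounds = 0 <= r < len(occupancy) and 0 <= c < len(occupancy[0])
        if in_bounds and not occupancy[r][c]:  # only step into free cells
            dist = abs(r - goal[0]) + abs(c - goal[1])
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

# 0 = free, 1 = occupied; robot starts at (0, 0), goal at (2, 2).
grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
pos, goal = (0, 0), (2, 2)
path = [pos]
for _ in range(10):  # step budget for the toy example
    if pos == goal:
        break
    pos = choose_action(grid, pos, goal)
    path.append(pos)
print(path)  # route around the occupied column to the goal
```

A real planner would replace the greedy step with a proper search or learned policy over the occupancy representation; the sketch only shows actions being conditioned on occupancy and a goal.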