An apparatus and method for efficiently routing power signals across a semiconductor die. In various implementations, an integrated circuit includes a micro through silicon via (TSV) that traverses a silicon substrate layer to a backside metal layer. The integrated circuit also includes power switches. The integrated circuit routes a power supply signal from the output of a power switch to a frontside power rail using the micro TSV and the backside metal layer. The integrated circuit also routes the power supply signal from the output of the power switch to the frontside power rail using a frontside metal layer. Therefore, the frontside metal layer and the backside metal layer provide power connection redundancy that increases charge sharing, improves wafer yield, reduces voltage droop, and reduces on-die area. In addition, the process routes a ground reference voltage level using both a frontside power rail and a backside power rail.
H01L 23/528 - Configuration de la structure d'interconnexion
H01L 21/768 - Fixation d'interconnexions servant à conduire le courant entre des composants distincts à l'intérieur du dispositif
H01L 23/48 - Dispositions pour conduire le courant électrique vers le ou hors du corps à l'état solide pendant son fonctionnement, p.ex. fils de connexion ou bornes
H01L 23/525 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre comprenant des interconnexions externes formées d'une structure multicouche de couches conductrices et isolantes inséparables du corps semi-conducteur sur lequel elles ont été déposées avec des interconnexions modifiables
A semiconductor package includes multiple dies that share the same package pin. An output enable register provided on each die is used to select the die that drives an output to the shared pin. A hardware arbitration circuit ensures that two or more dies do not drive an output to the shared pin at the same time.
G06F 13/20 - Gestion de demandes d'interconnexion ou de transfert pour l'accès au bus d'entrée/sortie
H01L 23/538 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre la structure d'interconnexion entre une pluralité de puces semi-conductrices se trouvant au-dessus ou à l'intérieur de substrats isolants
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
3.
PROCESSOR-GUIDED EXECUTION OF OFFLOADED INSTRUCTIONS USING FIXED FUNCTION OPERATIONS
Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
The disclosed device for packet coalescing includes detecting a trigger condition for initiating packet coalescing of packet traffic and sending, to an endpoint device, a notification to start packet coalescing. The device can observe a status in response to starting the packet coalescing and report a performance of the packet coalescing. A system can include a controller that detects a trigger condition for packet coalescing and notifies an endpoint device via a notification register. The controller can read a status register to report, based on the read status, a packet coalescing performance. Various other methods, systems, and computer-readable media are also disclosed.
A memory controller monitors memory command selected for dispatch to the memory and sends commands controlling a read clock state. A memory includes a read clock circuit and a mode register. The read clock circuit has an output for providing a hybrid read clock signal in response to a clock signal and a read clock mode signal. The read clock circuit provides the hybrid read clock signal as a free-running clock signal that toggles continuously, and as a strobe signal that is active only in response to the memory receiving a read command.
A processing system includes a primary processor and a co-processor. The primary processor is couplable to a memory subsystem having at least one memory and operating to execute system software employing memory address translations based on one or more page tables stored in the memory subsystem. The co-processor is likewise couplable to the memory subsystem and operates to perform iterations of a page table walk through one or more page tables maintained for the memory subsystem and to perform one or more page management operations on behalf of the system software based the iterations of the page table walk. The page management operations performed by the co-processor include analytic data aggregation, free list management and page allocation, page migration management, page table error detection, and the like.
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p.ex. un répertoire de pages actives [TLB]
G06F 12/123 - Commande de remplacement utilisant des algorithmes de remplacement avec listes d’âge, p.ex. file d’attente, liste du type le plus récemment utilisé [MRU] ou liste du type le moins récemment utilisé [LRU]
7.
COMMUNICATION REDUCTION TECHNIQUES FOR PARALLEL COMPUTING
A physical system is simulated using a model including a plurality of elements in a mesh or grid. The elements are divided into partitions processed by different processing units. For some time steps, state data is transmitted between partitions and used to calculate flux data for updating the state of edge elements of the partitions. Periodically, transmission of state data is suppressed, and flux data is obtained by linear interpolation based on past flux data. Alternatively, flux data is obtained by processing state variables of an edge element and past flux data using a machine learning model, such as a DNN. Whether to suppress transmission of state data may be determined based on one or both of (a) uncertainty in an output of the machine learning model (e.g., Bayesian neural network) and (b) complexity of model of the physical system (e.g., spatial or temporal gradients).
G06F 30/23 - Optimisation, vérification ou simulation de l’objet conçu utilisant les méthodes des éléments finis [MEF] ou les méthodes à différences finies [MDF]
G06F 30/27 - Optimisation, vérification ou simulation de l’objet conçu utilisant l’apprentissage automatique, p.ex. l’intelligence artificielle, les réseaux neuronaux, les machines à support de vecteur [MSV] ou l’apprentissage d’un modèle
8.
Page rinsing scheme to keep a directory page in an exclusive state in a single complex
A method includes, in a cache directory, storing an entry associating a memory region with an exclusive coherency state, and in response to a memory access directed to the memory region, transmitting a demote superprobe to convert at least one cache line of the memory region from an exclusive coherency state to a shared coherency state.
A processing system includes a processor core for processing instructions and a memory that stores a page table set including an extended page table having an extended page table entry storing extended page table attributes associated with a physical memory page. The system receives a virtual address and translates the virtual address to a physical address for the physical memory page. One or more extended page attributes associated with the physical memory page are retrieved from the extended page table entry based on the virtual address.
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 9/455 - Dispositions pour exécuter des programmes spécifiques Émulation; Interprétation; Simulation de logiciel, p.ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
G06F 12/06 - Adressage d'un bloc physique de transfert, p.ex. par adresse de base, adressage de modules, extension de l'espace d'adresse, spécialisation de mémoire
G06F 12/1009 - Traduction d'adresses avec tables de pages, p.ex. structures de table de page
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
10.
On-Demand Regulation of Memory Bandwidth Utilization to Service Requirements of Display
Systems, apparatuses, and methods for prefetching data by a display controller. From time to time, a performance-state change of a memory are performed. During such changes, a memory clock frequency is changed for a memory subsystem storing frame buffer(s) used to drive pixels to a display device. During the performance-state change, memory accesses may be temporarily blocked. To sustain a desired quality of service for the display, a display controller is configured to prefetch data in advance of the performance-state change. In order to ensure the display controller has sufficient memory bandwidth to accomplish the prefetch, bandwidth reduction circuitry in clients of the system are configured to temporarily reduce memory bandwidth of corresponding clients.
An apparatus and method for efficiently managing performance among multiple integrated circuits in separate semiconductor chips. In various implementations, a computing system includes at least a first processing node and a second processing node. While processing tasks, the first processing node uses a first memory and the second processing node uses a second memory. A first communication channel transfers data between the first processing node and the second processing node. The first processing node accesses the second memory using a second communication channel different from the first communication channel and supports point-to-point communication. The second memory services access requests from the first and second processing nodes as the access requests are received while foregoing access conflict detection. The first processing node accesses the second memory after determining a particular amount of time has elapsed after reception of an indication from the second processing node specifying that a particular task has begun.
A scheduler of an apparatus exposes an application programming interface (API) usable to specify quality-of-service (QoS) parameters, e.g., latency, throughput, and so forth. An application, for instance, specifies the QoS parameters for a workload to be processed using a hardware compute unit. The QoS parameters are employed by the scheduler as a basis to configure a partition within a hardware compute unit. The partition is configured such that processing resources that are available via the partition to process the workload comply with the specified quality-of-service.
A semiconductor module comprises multiple non-homogeneous semiconductor dies disposed on the semiconductor module, with each semiconductor die having a set of circuitry modules that are common to all of the semiconductor dies and also a set of supporting circuitry modules that are distinct between the semiconductor dies. An interconnect communicatively couples the semiconductor dies together. Commands for processing by the semiconductor module may be routed to individual semiconductor dies based on capabilities of the particular circuitry modules disposed on those individual semiconductor dies.
G06F 15/80 - Architectures de calculateurs universels à programmes enregistrés comprenant un ensemble d'unités de traitement à commande commune, p.ex. plusieurs processeurs de données à instruction unique
A method for receiving a multi-level error signal having more than two logic levels includes oversampling the multi-level error signal to provide sampled symbols, wherein a first level of the multi-level error signal indicates no error, and second and third levels of the multi-level error signal indicate first and second error conditions, respectively. The sampled signals are de-serialized to provide sets of symbols. A start of a symbol period is determined in response to detecting that a given sample is different from a prior sample, and the prior sample indicates no error. The sets of symbols are filtered to provide corresponding output symbols based on the start.
A method for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel is provided. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request, using entries of the set stored across the multiple grains of the plurality of grains.
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating component and a second heat transfer element configured to be thermally coupled to the heat-generating component. A manifold is configured to receive a single fluid flow of a heat transfer medium and split the single fluid flow into a first split fluid flow provided to the first heat transfer element and a second split fluid flow provided to the second heat transfer element.
G05B 19/416 - Commande numérique (CN), c.à d. machines fonctionnant automatiquement, en particulier machines-outils, p.ex. dans un milieu de fabrication industriel, afin d'effectuer un positionnement, un mouvement ou des actions coordonnées au moyen de données d'u caractérisée par la commande de vitesse, d'accélération ou de décélération
A method for configuring a processor includes identifying a component cooling device thermally coupled to a processor, and configuring one or more operating parameters of the processor based on the identification of the component cooling device.
An apparatus for sub-cooling components includes a component cooling device, a processor thermally coupled to the component cooling device, an electronic component, a fan configured to direct an airflow across the processor and the electronic component, and a thermoelectric cooling device thermally coupled to the component cooling device. The thermoelectric cooling device is configured to cool the airflow from a first temperature to a second temperature.
An apparatus for component cooling includes a thermoelectric cooling (TEC) device configured to be thermally coupled to a first processor, and a controller. The controller is configured to receive at least one first parameter indicative of a first activity level of the first processor; determine a TEC power level from among a plurality of TEC power levels based on the at least one first parameter; and control providing of power to the TEC device at the determined TEC power level.
A processing system allocates memory to co-locate input and output operands for operations for processing in memory (PIM) execution in the same PIM-local memory while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies as “super rows” virtual rows that span all the banks of a memory device. Each super row has a different bank-interleaving pattern, referred to as a “color”. A group of contiguous super rows that has the same PIM-interleaving pattern is referred to as a “color group”. The processing system assigns memory addresses to each operand (e.g., vector) of an operation for PIM execution to a super row having a different color within the same color group to co-locate the operands for each PIM execution unit and uses address hashing to alternate between banks assigned to elements of a first operand and elements of a second operand of the operation.
G06F 12/06 - Adressage d'un bloc physique de transfert, p.ex. par adresse de base, adressage de modules, extension de l'espace d'adresse, spécialisation de mémoire
G06F 12/02 - Adressage ou affectation; Réadressage
21.
Selecting Between Basic and Global Persistent Flush Modes
Selecting between basic and global persistent flush modes is described. In accordance with the described techniques, a system includes a data fabric in electronic communication with at least one cache and a controller configured to select between operating in a global persistent flush mode and a basic persistent flush mode based on an available flushing latency of the system, control the at least one cache to flush dirty data to the data fabric in response to a flush event trigger while operating in the global persistent flush mode, and transmit a signal to switch control to an application to flush the dirty data from the at least one cache to the data fabric while operating in the basic persistent flush mode.
Runtime flushing to persistency in heterogenous systems is described. In accordance with the described techniques, a system may include a persistent memory in electronic communication with at least one cache and a controller configured to command the at least one cache to flush dirty data to the persistent memory in response to a dirtiness of the at least one cache reaching a cache dirtiness threshold.
G06F 12/0891 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache utilisant des moyens d’effacement, d’invalidation ou de réinitialisation
23.
APPARATUS, SYSTEM, AND METHOD FOR THROTTLING PREFETCHERS TO PREVENT TRAINING ON IRREGULAR MEMORY ACCESSES
A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.
G06F 12/0862 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache avec pré-lecture
24.
CONNECTING A CHIPLET TO AN INTERPOSER DIE AND TO A PACKAGE INTERFACE USING A SPACER INTERCONNECT COUPLED TO A PORTION OF THE CHIPLET
A semiconductor package assembly includes a package interface. An interposer die has a first surface and a second surface opposite to the first surface, where the first surface of the interposer is die positioned on the package interface. The interposer die includes a plurality of conductive connections between the first surface and second surface. A chiplet includes a connectivity region having conductive pathways, with a first portion of the connectivity region coupled to a conductive connection of the interposer die and a second portion of the connectivity region cantilevered from the interposer die.
H01L 23/498 - Connexions électriques sur des substrats isolants
H01L 23/00 - DISPOSITIFS À SEMI-CONDUCTEURS NON COUVERTS PAR LA CLASSE - Détails de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide
The disclosed method includes detecting a stalled memory request, for a first level cache within a cache hierarchy of a processor, that includes an outstanding memory request associated with a second level cache within the cache hierarchy that is higher than the first level cache. The method further includes indicating, to the second level cache, that the first level cache is experiencing a starvation issue due to stalled memory requests to enable the second level cache to perform starvation-remediation actions. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed computing device includes a cache memory and at least one processor coupled to the cache memory. The at least one processor is configured to copy data written to one or more nonredundant wordlines of the cache memory to one or more redundant wordlines of the cache memory. The at least one processor is additionally configured to detect a mismatch between data read from the one or more nonredundant wordlines and data stored in the one or more redundant wordlines. The at least one processor is also configured to perform a remediation action in response to detecting the mismatch. Various other methods, systems, and computer-readable media are also disclosed.
G06F 11/10 - Détection ou correction d'erreur par introduction de redondance dans la représentation des données, p.ex. en utilisant des codes de contrôle en ajoutant des chiffres binaires ou des symboles particuliers aux données exprimées suivant un code, p.ex. contrôle de parité, exclusion des 9 ou des 11
G06F 11/07 - Réaction à l'apparition d'un défaut, p.ex. tolérance de certains défauts
27.
Selecting a Tiling Scheme for Processing Instances of Input Data Through a Neural Netwok
An electronic device uses a tiling scheme selected from among a set of tiling schemes for processing instances of input data through a neural network. Each of the tiling schemes is associated with a different arrangement of portions into which instances of input data are divided for processing in the neural network. In operation, processing circuitry in the electronic device acquires information about a neural network and properties of the processing circuitry. The processing circuitry then selects a given tiling scheme from among a set of tiling schemes based on the information. The processing circuitry next processes instances of input data in the neural network using the given tiling scheme. Processing each instance of input data in the neural network includes dividing the instance of input data into portions based on the given tiling scheme, separately processing each of the portions in the neural network, and combining the respective outputs to generate an output for the instance of input data.
In response to receiving a scene description, a processing system generates a set of planes in the scene and a bounding volume representing a partition of the scene. Using the set of planes in the scene, a compute unit of an accelerated processing unit performs a spatial test on the bounding volume to determine whether the bounding volume intersects one or more planes of the set of planes in the scene. Based on the spatial test, the compute unit generates intersection data indicating whether the bounding volume intersects one or more planes of the set of planes in the scene. The accelerated processing unit then uses the intersection data to render the scene.
A data processing system includes a vector data processing unit that includes a shared scheduler queue configured to store in a same queue, at least one entry that includes at least a mask type instruction and another entry that includes at least a vector type instruction. Shared pipeline control logic controls a vector data path or a mask data path, based a type of instruction picked from the same queue. In some examples, at least one mask type instruction and the at least one vector type instruction each include a source operand having a corresponding shared source register bit field that indexes into both a mask register file and a vector register file. The shared pipeline control logic uses a mask register file or a vector register file depending on whether bits of the shared source register bit field identify a mask source register or a vector source register.
A processing device and method for executing a color twist operation are provided. The processing device comprises memory and a processor configured to convert values of pixels of a frame from a first color domain to a hue, saturation and value (HSV) color domain, adjust hue values and saturation values of the pixels, store the adjusted hue and saturation values in a portion of the memory local to the processor and convert the frame from the HSV color domain to the first color domain using the adjusted hue and saturation values stored in local memory. The adjusted hue and saturation values are generated from pre-adjusted values, which are generated from masked vector values.
Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor determines whether the amount of local memory allocated to store the data of the input tile and padded data for the receptive field.
Connection modification based on traffic pattern is described. In accordance with the described techniques, a traffic pattern of memory operations across a set of connections between at least one device and at least one memory is monitored. The traffic pattern is then compared to a threshold traffic pattern condition, such as an amount of data traffic in different directions across the connections. A traffic direction of at least one connection of the set of connections is modified based on the traffic pattern corresponding to the threshold traffic pattern condition.
A memory system includes a PHY embodied on an integrated circuit, the PHY coupling to a memory over conductive traces on a substrate. The PHY includes a reference clock generation circuit providing a reference clock signal to the memory, a first group of driver circuits providing CA signals to the memory, and a second group of driver circuits providing DQ signals to the memory. A plurality of the conductive traces which carry the DQ signals are constructed with a length longer than that of conductive traces carrying the reference clock signal in order to reduce an effective insertion delay associated with coupling the reference clock signal to latch respective incoming DQ signals.
A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
G06F 12/0862 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache avec pré-lecture
A memory controller for generating accesses for a memory includes a row hammer logic circuit for providing a sample request. In response to the sample request, the memory controller generates a sample command for dispatch to the memory to cause the memory to capture a current row. In response to a completion of the sample command, the memory controller generates a mitigation command for dispatch to the memory.
G11C 11/4078 - Circuits de sécurité ou de protection, p.ex. afin d'empêcher la lecture ou l'écriture intempestives ou non autorisées; Cellules d'état; Cellules de test
G11C 11/406 - Organisation ou commande des cycles de rafraîchissement ou de régénération de la charge
36.
Executing Kernel Workgroups Across Multiple Compute Unit Types
Portions of programs, oftentimes referred to as kernels, are written by programmers to target a particular type of compute unit, such as a central processing unit (CPU) core or a graphics processing unit (GPU) core. When executing a kernel, the kernel is separated into multiple parts referred to as workgroups, and each workgroup is provided to a compute unit for execution. Usage of one type of compute unit is monitored and, in response to the one type of compute unit being idle, one or more workgroups targeting another type of compute unit are executed on the one type of compute unit. For example, usage of CPU cores is monitored, and in response to the CPU cores being idle, one or more workgroups targeting GPU cores are executed on the CPU cores.
A disclosed method can include (i) reporting, by a microcontroller, detection of a violation of a physical infrastructure constraint to a machine check architecture, (ii) triggering, by the machine check architecture in response to the reporting, a machine-check exception such that the violation of the physical infrastructure constraint is recorded, and (iii) performing a corrective action based on the triggering of the machine-check exception. Various other apparatuses, systems, and methods are also disclosed.
The disclosed computer-implemented method for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex failure patterns with the telemetry-based unknown outputs to ultimately generate one or more root-cause remedy recommendations that are specific to the identified failure and the client computing device that is experiencing that failure.
Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.
G06F 12/0862 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache avec pré-lecture
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
40.
MATRIX MULTIPLICATION UNIT WITH FLEXIBLE PRECISION OPERATIONS
A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
A system and method for determining power-performance state transition thresholds in a computing system. A processor comprises several functional blocks and a power manager. Each of the functional blocks produces data corresponding to an activity level associated with the respective functional block. The power manager determines activity levels of the functional blocks and compares the activity level of a given functional block to a threshold to determine if a power-performance state (P-state) transition is indicated. The threshold is determined in part on a current P-state of the given functional block. When the current P-state of the given functional block is relatively high, the threshold activity level to transition to a higher P-state is higher than it would be if the current P-state were relatively low. The power manager is further configured to determine the thresholds based in part on one or more of a type of circuit being monitored and a type of workload being executed.
Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of scheduling performed by other local schedulers.
Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.
An apparatus and method for efficiently performing power management for multiple clients of a semiconductor chip that supports remote manageability. In various implementations, a network interface receives a packet, and sends at least an indication of the packet to a manageability processing circuitry (MPC) of a processing node with multiple clients for processing tasks. The MPC determines whether a client or itself is a destination needed to process the packet. If the destination is the MPC, then packet processing is done by the MPC without involvement from the clients, which can be in an idle state. For example, the MPC can process a remote manageability packet requesting diagnostic information from one or more clients of the processing node. The network interface and the MPC use a sideband communication channel for data transmission, which foregoes lane training for further reduction in latency and power consumption.
H04L 49/109 - TRANSMISSION D'INFORMATION NUMÉRIQUE, p.ex. COMMUNICATION TÉLÉGRAPHIQUE Éléments de commutation de paquets caractérisés par la construction de la matrice de commutation intégrés sur micropuce, p.ex. interrupteurs sur puce
An apparatus and method for efficiently routing power signals across a semiconductor die. In various implementations, an integrated circuit includes, at a first node that receives a power supply reference, a first micro through silicon via (TSV) that traverses through a silicon substrate layer to a backside metal layer. The integrated circuit includes, at a second node that receives the power supply reference, a second micro TSV that physically contacts at least one source region. The integrated circuit includes a first power rail that connects the first micro TSV to the second micro TSV. This power rail replaces contacts between the micro TSVs and a second power rail such as the frontside metal zero (M0) layer. Each of the first power rail, the second power rail, and the backside metal layer provides power connection redundancy that increases charge sharing, improves wafer yield, and reduces voltage droop.
H01L 23/528 - Configuration de la structure d'interconnexion
H01L 21/768 - Fixation d'interconnexions servant à conduire le courant entre des composants distincts à l'intérieur du dispositif
H01L 23/535 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre comprenant des interconnexions internes, p.ex. structures d'interconnexions enterrées
H01L 27/118 - Circuits intégrés à tranche maîtresse
Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache avec mémoire cache dédiée, p.ex. instruction ou pile
G06F 12/0884 - Mode parallèle, p.ex. en parallèle avec la mémoire principale ou l’unité centrale [CPU]
47.
BIGNUM ADDITION AND/OR SUBTRACTION WITH CARRY PROPAGATION
A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits is coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.
G06F 7/498 - Calculs avec des nombres décimaux utilisant des accumulateurs de type compteur
G06F 7/506 - Addition; Soustraction en mode parallèle binaire, c. à d. ayant un circuit de maniement de chiffre différent pour chaque position avec génération simultanée de retenue pour plusieurs étages ou propagation simultanée de retenue sur plusieurs étages
Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
Devices and methods for node traversal for ray tracing are provided, which comprise casting a first ray in a space comprising objects represented by geometric shapes, traversing, for the first ray, at least one first node of an accelerated hierarchy structure representing an approximate volume of a group of the geometric shapes and a second node representing a volume of one of the geometric shapes, casting a second ray in the space, selecting, for the second ray, a starting node of traversal based on locations of intersection of the first ray and the second ray and an identifier which identifies one or more nodes intersected by the first ray and traversing, for the second ray, the accelerated hierarchy structure beginning at the starting node of traversal.
A method and apparatus for storing keys in a key storage block includes processing a key request. A first key is allocated based upon the key request. The first key is stored in the key storage block, wherein the first key is of a first size and includes a first rule.
Methods and systems are disclosed for reducing power consumption by a system including a digital unit and an optical unit. Techniques disclosed comprise generating a workload signature of an incoming workload to be executed by the system. Based on the generated workload signature, techniques disclosed comprise matching the incoming workload with a profile of stored workload profiles. The workload profiles are generated by a trace capture unit. Based on the associated profile, a task submission transaction is sent to the optical unit of the system, representative of a request to execute the incoming workload by the optical unit.
G06F 1/329 - Gestion de l’alimentation, c. à d. passage en mode d’économie d’énergie amorcé par événements Économie d’énergie caractérisée par l'action entreprise par planification de tâches
52.
COMPONENT COOLER WITH MULTIPLE HEAT TRANSFER PATHS
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating component, and a second heat transfer element. The apparatus further includes a plurality of thermally conductive paths between the first heat transfer element and the second heat transfer element. Each of the plurality of thermally conductive paths are configured to provide a separate heat conduction path from the first heat transfer element to the second heat transfer element.
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating electronic component, and a second heat transfer element. The apparatus further includes a plurality of heat transfer paths thermally coupled between the first heat transfer element and the second heat transfer element. Each of the plurality of heat transfer paths configured to provide a separate heat conduction path from the first heat transfer element to the second heat transfer element. The apparatus further includes a manifold including a first fluid passage providing a first portion of a heat transfer fluid in thermal contact with the first heat transfer element, and a second fluid passage providing a second portion of the heat transfer fluid in thermal contact with the second heat transfer element.
An apparatus for component cooling includes a manifold and a heat transfer element configured to be thermally coupled to a heat-generating component. The apparatus further includes a first spring mechanism between the manifold and the heat transfer element. The first spring mechanism is configured to apply a first force to the heat transfer element.
A method of forming a semiconductor assembly includes forming a set of through-silicon vias in a carrier wafer, where a layer of the carrier wafer includes integrated devices. A die is coupled to a top surface of the carrier wafer including the set of through-silicon vias using hybrid bonding. One or more connection layers of the die are coupled to one or more of the through-silicon vias and coupled to one or more of the integrated devices. A second wafer is coupled to a top surface of the die. An amount is removed from a bottom surface of the carrier wafer that is parallel to and opposite to the top surface of the carrier wafer to reveal a conductive portion of at least one of the through-silicon vias.
H01L 23/00 - DISPOSITIFS À SEMI-CONDUCTEURS NON COUVERTS PAR LA CLASSE - Détails de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide
H01L 25/00 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
56.
VECTOR PROCESSING UNIT WITH PROGRAMMABLE MULTICYCLE SHUFFLE UNIT
An integrated circuit includes a vector data processing unit that employs a cross-lane shuffle unit including multiplexing logic that programmably shuffles packed source lane values, each corresponding to one of a plurality of vector lanes, to different output vector result lane positions over multiple cycles. In certain implementations, in a first cycle, control logic in the cross-shuffle unit controls the multiplexing logic to select source lane values to be placed in a first group of output vector result lane positions for a vector result register; and in at least a second cycle, the same multiplexing logic is reused to select source lane values to be placed in a second group of output vector result lane positions for the vector result register wherein at least one of the selected source lane values is moved to a different result lane position. Associated methods are also presented.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p.ex. décodage d'instructions
G06F 9/345 - Adressage de l'opérande d'instruction ou du résultat ou accès à l'opérande d'instruction ou au résultat d'opérandes ou de résultats multiples
A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
A technique for servicing a memory request is disclosed. The technique includes obtaining permissions associated with a source and a destination specified by the memory request, obtaining a first set of address translations for the memory request, and executing operations for a first request, using the first set of address translations.
A memory controller includes a first arbiter for selecting memory commands for dispatch to a memory over a first channel, a second arbiter for selecting memory commands for dispatch to the memory over a second channel, and a test circuit. The test circuit generates a respective testing sequence of read commands and write commands for each of the first channel and second channel, and causes the testing sequences to be transmitted over the first and second channels at least partially overlapping in time without selection by the first or second arbiters.
A method includes, in a cache directory, storing a set of entries corresponding to one or more memory regions having a first region size when the cache directory is in a first configuration, and based on a workload sparsity metric, reconfiguring the cache directory to a second configuration. In the second configuration, each entry in the set of entries corresponds to a memory region having a second region size.
G06F 12/0895 - Mémoires cache caractérisées par leur organisation ou leur structure de parties de mémoires cache, p.ex. répertoire ou matrice d’étiquettes
61.
SECURITY FOR SIMULTANEOUS MULTITHREADING PROCESSORS
A processor implements a simultaneous multithreading (SMT) protection mode that, when enabled, prevents execution of particular software (e.g., a virtual machine) at a processor core when a thread associated with different software (e.g., a different virtual machine or a hypervisor) is currently executing at the processor core. By preventing execution of the software, data, software execution patterns, and other potentially sensitive information is kept protected from unauthorized access or detection. Further, in at least some embodiments the SMT protection mode is implemented on a per-software basis, so that different software can choose whether to implement the protection mode, thereby allowing the processor to be employed in a wide variety of computing environments.
G06F 9/455 - Dispositions pour exécuter des programmes spécifiques Émulation; Interprétation; Simulation de logiciel, p.ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
G06F 9/48 - Lancement de programmes; Commutation de programmes, p.ex. par interruption
62.
Leveraging an Adaptive Oscillator for Fast Frequency Changes
Systems, apparatuses, and methods for managing power and performance in a computing system. A system management unit detects a condition indicating a change in a power-performance state of a given computing unit is indicated. In response to detecting the indication, the system management unit is configured to initiate a change to a frequency of a clock signal generated by an adaptive oscillator by changing a voltage supplied to the adaptive oscillator. The adaptive oscillator is configured to rapidly change a frequency of the clock signal generated in response to detecting a change in a droopy supply voltage of the adaptive oscillator. The new frequency generated by the adaptive oscillator is based in part on a difference between the droopy supply voltage and a regulated supply voltage of the adaptive oscillator.
Systems, apparatuses, and methods for prefetching data by a display controller. From time to time, a performance-state change of a memory are performed. During such changes, a memory clock frequency is changed for a memory subsystem storing frame buffer(s) used to drive pixels to a display device. During the performance-state change, memory accesses may be temporarily blocked. In order to reduce visual artifacts that may occur while the memory accesses are blocked, a memory subsystem includes a control circuit configured to enable a caching mode which caches display data provided to the display controller. Subsequent requests for display data from the display controller are then serviced using the cached data instead of accessing memory.
Multi-level cell memory management techniques are described. In one example, the memory controller is configured to control whether a single-level cell operation or a multi-level cell operation to be used using different mapping schemes. The single-level cell operation, for instance, is usable to store a data word using two states whereas the multi-level cell operation is usable to store the data word by also using an intermediate state. In order to store the data word using two states, the memory controller is configurable to separate the data word across two word lines in the physical memory. In an implementation, use of the different operations and corresponding mapping schemes by the memory controller alternates between adjacent word lines in physical memory.
Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory.
In accordance with the described techniques for data compression and decompression for processing in memory, a page address is received by a processing in memory component that maps to a first location in memory where data of a page is maintained. The data of the page is compressed by the processing in memory component. Further, compressed data of the page is written by the processing in memory component to a compressed block device responsive to the compressed data satisfying one or more compressibility criteria. The compressed block device is a portion of the memory dedicated to storing data in a compressed form.
In accordance with the described techniques for bank-level parallelism for processing in memory, a plurality of commands are received for execution by a processing in memory component embedded in a memory. The memory includes a first bank and a second bank. The plurality of commands include a first stream of commands which cause the processing in memory component to perform operations that access the first bank and a second stream of commands which cause the processing in memory component to perform operations that access the second bank. A next row of the first bank that is to be accessed by the processing in memory component is identified. Further, a precharge command is scheduled to close a first row of the first bank and an activate command is scheduled to open the next row of the first bank in parallel with execution of the second stream of commands.
Systems and methods are disclosed for managing diversified virtual memory by an engine. Techniques disclosed include receiving one or more request messages, each request message including a job descriptor that specifies an operation to be performed on a respective virtual memory space, processing the job descriptors by generating one or more commands for transmission to one or more virtual memory managers, and transmitting the one or more commands to the one or more virtual memory managers (VMMs) for processing.
G06F 9/455 - Dispositions pour exécuter des programmes spécifiques Émulation; Interprétation; Simulation de logiciel, p.ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/02 - Adressage ou affectation; Réadressage
69.
Scheduling Processing-in-Memory Requests and Memory Requests
A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.
A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key be generated and distributed. Based upon a security mode for the communication, the key generated and distributed to the first component.
Devices and methods for multi-resolution geometric representation for ray tracing are described which include casting a ray in a space comprising objects represented by geometric shapes and approximating a volume of the geometric shapes using an accelerated hierarchy structure. The accelerated hierarchy structure comprises first nodes each representing a volume of one of the geometric shapes in the space and second nodes each representing an approximate volume of a group of the geometric shapes. When the ray is determined to intersect a bounding box of a second node representing one group of the geometric shapes, a selection is made between traversal and non-traversal of other second nodes based on a LOD for representing the volume of the one group of geometric shapes.
Power management using temperature gradient information is described. In accordance with the described techniques, temperature measurements of a component are obtained from two or more sensors of the component. A temperature of a hotspot of the component is predicted based on the temperature measurements obtained from the two or more sensors of the component. Operation of the component is adjusted based on the predicted temperature of the hotspot.
An integrated circuit includes a power supply monitor, a clock generator, and a divider. The power supply monitor is operable to provide a trigger signal in response to a power supply voltage dropping below a threshold voltage. The clock generator is operable to provide a first clock signal having a frequency dependent on a value of a frequency control word, and to change the frequency of the first clock signal over time using a native slope in response to a change in the frequency control word. The divider is responsive to an assertion of the trigger signal to divide a frequency of the first clock signal by a divide value to provide a second clock signal.
H03L 7/08 - Commande automatique de fréquence ou de phase; Synchronisation utilisant un signal de référence qui est appliqué à une boucle verrouillée en fréquence ou en phase - Détails de la boucle verrouillée en phase
G01R 19/165 - Indication de ce qu'un courant ou une tension est, soit supérieur ou inférieur à une valeur prédéterminée, soit à l'intérieur ou à l'extérieur d'une plage de valeurs prédéterminée
74.
BACKSIDE POWER DELIVERY AND POWER GRID PATTERN TO SUPPORT 3D DIE STACKING
An apparatus and method for efficiently routing power signals across semiconductor dies. A semiconductor fabrication process (or process) places a first semiconductor die in an integrated circuit and stacks a second semiconductor die vertically adjacent to the first semiconductor die. The process forms multiple backside metal layers vertically adjacent to a backside of a silicon substrate of the second semiconductor die. The process forms a first backside metal layer that includes at least a first power route that forms a rectangle within the first backside metal layer. The process forms a second backside metal layer that includes at least a second power rail that forms an L-shape within the second backside metal layer. The process connects one or more corners of the rectangle of the first power rail to a corresponding corner of a separate power rail of the second backside metal layer that forms an L-shape.
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
H01L 23/48 - Dispositions pour conduire le courant électrique vers le ou hors du corps à l'état solide pendant son fonctionnement, p.ex. fils de connexion ou bornes
H01L 23/528 - Configuration de la structure d'interconnexion
H01L 23/538 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre la structure d'interconnexion entre une pluralité de puces semi-conductrices se trouvant au-dessus ou à l'intérieur de substrats isolants
A/D bit storage, processing, and mode management techniques through use of a dense A/D bit representation are described. In one example, a memory management unit employs an A/D bit representation generation module to generate the dense A/D bit representation. In an implementation, the A/D bit representation is stored adjacent to existing page table structures of the multilevel page table hierarchy. In another example, memory management unit supports use of modes as part of A/D bit storage.
Predicates for processing in memory is described. In accordance with the described techniques, a predicate instruction to compute a conditional value based on data stored in a memory is provided to a processing-in-memory component. A response that includes the conditional value computed by the processing-in-memory component is received, and the conditional value is stored in a predicate register. One or more conditional instructions are provided to the processing-in-memory component based on the conditional value stored in the predicate register.
In accordance with described techniques for reduction of parallel memory operation messages, a computing system or computing device includes a memory system that receives memory operation messages. A shared response component in the memory system receives responses to the memory operation messages, and identifies a set of the responses that are coalesceable. The shared response component then coalesces the set of the responses into a combined message for communication completion through a communication path in the memory system.
In accordance with described techniques for filtered responses to memory operation messages, a computing system or computing device includes a memory system that receives messages. A filter component in the memory system receives the responses to the memory operation messages, and filters one or more of the responses based on a filterable condition. A tracking logic component tracks the one or more responses as filtered responses for communication completion.
Resource use orchestration for multiple application instances is described. In accordance with the described techniques, a time interval for accessing a resource is divided into multiple time slots. In one or more implementations, the resource is a graphics processing unit. Each of a plurality of containers associated with an application is assigned to one of the multiple time slots according to a disbursement algorithm. A respective signal offset is provided to each container based on an assigned time slot of the container. The provided signal offsets cause the plurality of containers to access the resource for the application in a predetermined order.
A63F 13/335 - Dispositions d’interconnexion entre des serveurs et des dispositifs de jeu; Dispositions d’interconnexion entre des dispositifs de jeu; Dispositions d’interconnexion entre des serveurs de jeu utilisant des connexions de réseau étendu [WAN] utilisant l’Internet
A63F 13/352 - Dispositions d’interconnexion entre des serveurs et des dispositifs de jeu; Dispositions d’interconnexion entre des dispositifs de jeu; Dispositions d’interconnexion entre des serveurs de jeu - Détails des serveurs de jeu comportant des dispositions particulières de serveurs de jeu, p.ex. des serveurs régionaux connectés à un serveur national ou à plusieurs serveurs gérant les partitions de jeu
A63F 13/358 - Adaptation du déroulement du jeu en fonction de la charge du réseau ou du serveur, p.ex. pour diminuer la latence due aux différents débits de connexion entre clients
80.
Memory Control for Data Processing Pipeline Optimization
Generating optimization instructions for data processing pipelines is described. A pipeline optimization system computes resource usage information that describes memory and compute usage metrics during execution of each stage of the data processing pipeline. The system additionally generates data storage information that describes how data output by each pipeline stage is utilized by other stages of the pipeline. The pipeline optimization system then generates the optimization instructions to control how memory operations are performed for a specific data processing pipeline during execution. In implementations, the optimization instructions cause a memory system to discard data (e.g., invalidate cache entries) without copying the discarded data to another storage location after the data is no longer needed by the pipeline. The optimization instructions alternatively or additionally control at least one of evicting, writing-back, or prefetching data to minimize latency during pipeline execution.
Devices and methods method of tiled rendering are provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles, interleaving execution of same subpasses of multiple tiles of the frame by executing one or more subpasses as skip operations, storing visibility data, for subsequently ordered subpasses of the tiles, at memory addresses allocated for data of corresponding adjacent tiles in a first direction of traversal and rendering the tiles for the subsequently ordered subpasses using the visibility data stored at the memory addresses allocated for corresponding adjacent tiles in a second direction of traversal, opposite the first direction of traversal.
Data integrity checks for reducing communication latency is described. A transmitting endpoint transmits data to a receiving endpoint by generating an integrity tag for a first subset of data blocks and a second integrity tag for a second subset of data blocks. In implementations, the first and second integrity tags overlap at least one data block and are offset based on computational complexities of generating the integrity tags. A receiving endpoint generates comparison tags for each of the integrity tags and uses the comparison tags to validate an authenticity of received data. In response to validating the first and second integrity tags, data blocks covered by both the first and second integrity tags are released for use. Additional integrity tags are generated and validated for subsequent subsets of data blocks during data communication, thus reducing latency by offsetting times at which comparison tags are generated and validated.
Methods and devices are provided for processing data using a neural network. Activations from a previous layer of the neural network are received by a layer of the neural network. Weighted values, to be applied to values of elements of the activations, are determined based on a spatial correlation of the elements and a task error output by the layer. The weighted values are applied to the values of the elements and a combined error is determined based on the task error and the spatial correlation.
The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.
An approach is provided for a gaming overlay application to provide automatic in-game subtitles and/or closed captions for video game applications. The overlay application accesses an audio stream and a video stream generated by an executing game application. The overlay application processes the audio stream through a text conversion engine to generate at least one subtitle. The overlay application determines a display position to associate with the at least one subtitle. The overlay application generates a subtitle overlay comprising the at least one subtitle located at the associated display position. The overlay application causes a portion of the video stream to be displayed with the subtitle overlay.
A63F 13/53 - Commande des signaux de sortie en fonction de la progression du jeu incluant des informations visuelles supplémentaires fournies à la scène de jeu, p.ex. en surimpression pour simuler un affichage tête haute [HUD] ou pour afficher une visée laser dans un jeu de tir
A63F 13/87 - Communiquer avec d’autres joueurs, p.ex. par courrier électronique ou messagerie instantanée
G10L 17/06 - Techniques de prise de décision; Stratégies d’alignement de motifs
G10L 17/26 - Reconnaissance de caractéristiques spéciales de voix, p.ex. pour utilisation dans les détecteurs de mensonge; Reconnaissance des voix d’animaux
Address translation service management techniques are described. These techniques are based on metadata that is usable to provide a hint as insight into memory access, and based on this, use of a translation lookaside buffer is optimized to control which entries are maintained in the queue and manage address translation requests.
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p.ex. un répertoire de pages actives [TLB]
An apparatus and method for efficient power management of multiple integrated circuits. In various implementations, a computing system includes first partition and a second partition. The second partition includes video pre-processing circuitry that identifies regions of a video frame to be presented on a screen or monitor that don't change or regions that can have one or more of resolution and color accuracy be below a threshold. The first partition includes a parallel data processor with one or more compute units, each with multiple lanes of execution. Based on the identified regions, the first partition generates an execution mask indicating which lanes of the compute units are inactive. The parallel data processor copies result data from the active lanes to outputs of the inactive lanes.
G06F 1/3237 - Gestion de l’alimentation, c. à d. passage en mode d’économie d’énergie amorcé par événements Économie d’énergie caractérisée par l'action entreprise par désactivation de la génération ou de la distribution du signal d’horloge
G06F 9/48 - Lancement de programmes; Commutation de programmes, p.ex. par interruption
88.
HYBRID MEMORY ARCHITECTURE FOR ADVANCED 3D SYSTEMS
Disclosed wherein stacked memory dies that utilize a mix of high and low operational temperature memory and non-volatile based memory dies, and chip packages containing the same. High temperature memory dies, such as those using non-volatile memory (NVM) technologies are in a memory stack with low temperature memory dies, such as those having volatile memory technologies. In some cases, the high temperature memory technologies could be used together, in some cases, on the same IC die as logic circuitry. In one example, a memory stack is provided that include a first memory IC die having high temperature memory circuitry, such as non-volatile memory, stacked below a second memory IC die. The second memory IC die has high temperature memory circuitry, such as volatile memory circuitry.
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
H10B 80/00 - Ensembles de plusieurs dispositifs comprenant au moins un dispositif de mémoire couvert par la présente sous-classe
89.
3D LAYOUT AND ORGANIZATION FOR ENHANCEMENT OF MODERN MEMORY SYSTEMS
Memory stacks having substantially vertical bitlines, and chip packages having the same, are disclosed herein. In one example, a memory stack is provided that includes a first memory IC die and a second memory IC die. The second memory IC die is stacked on the first memory IC die. Bitlines are routed through the first and second IC dies in a substantially vertical orientation. Wordlines within the first memory IC die are oriented orthogonal to the bitlines.
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
G11C 11/4097 - Organisation de lignes de bits, p.ex. configuration de lignes de bits, lignes de bits repliées
H01L 25/18 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant de types prévus dans plusieurs sous-groupes différents du même groupe principal des groupes , ou dans une seule sous-classe de ,
90.
REGISTER, FLOP, AND LATCH DESIGNS INLCUDING FERROELECTRIC AND LINEAR DIELECTRICS
A memory device includes a memory circuitry includes a first transmission grate, a first capacitor, a second transmission gate, and a second capacitor. The first transmission gate includes a first transistor connected between a first node and a second node. The first transistor having a gate terminal connected to a first clock node. The first clock node configured to receive a first clock signal. The first capacitor is connected between the second node and a first voltage node. The first capacitor is a ferroelectric capacitor. The second transmission gate includes a second transistor connected between the second node and a third node. The second transistor has a gate terminal connected to the first clock node. The second capacitor is connected between the third node and a second voltage node.
A memory device includes memory cells. A memory cell of the memory cells includes gate circuitry, a first capacitor, and a second capacitor. The gate circuitry is connected to a wordline and a bitline. The first capacitor is connected to the gate circuitry and a first drive line. The second capacitor is connected to the gate circuitry and a second drive line.
Dynamic memory operations are described. In accordance with the described techniques, a system includes a stacked memory and one or more memory monitors configured to monitor conditions of the stacked memory. A system manager is configured to receive the monitored conditions of the stacked memory from the one or more memory monitors, and dynamically adjust operation of the stacked memory based on the monitored conditions. In one or more implementations, a system includes a memory and at least one register configured to store a ranking for each of a plurality of portions of the memory. Each respective ranking is determined based on an associated retention time of the respective portion of the memory. A memory controller is configured to dynamically refresh the portions of the memory at different times based on the ranking for each of the plurality of portions of the memory stored in the at least one register.
Methods, devices, and systems for rendering primitives in a frame. During a visibility pass, state packets are processed to determine a register state, and the register state is stored in a memory device. During a rendering pass, the state packets are discarded and the register state is read from the memory device. In some implementations, a graphics pipeline is configured during the visibility pass based on the register state determined by processing the state packets, and the graphics pipeline is configured during the rendering pass based on the register state read from the memory device. In some implementations, replay control packets, draw packets, and the state packets, from a packet stream, are processed during the visibility pass; the draw packets are modified based on visibility information determined during the visibility pass; and the replay control packets and draw packets are processed, during the rendering pass.
Error correction for stacked memory is described. In accordance with the described techniques, a system includes a plurality of error correction code engines to detect vulnerabilities in a stacked memory and coordinate at least one vulnerability detected for a portion of the stacked memory to at least one other portion of the stacked memory.
A method and a processing device for performing rendering are disclosed. The method comprises generating a base hierarchy tree comprising data representing a first object and generating a second hierarchy tree representing a second object comprising shared data of the base hierarchy tree and the second hierarchy tree and difference data. The method further comprises storing the difference data in the memory without storing the shared data, and generating an overlay hierarchy tree comprising the shared data, the difference data, and indication information indicating nodes of the overlay hierarchy tree that comprise the difference data. The method further comprises rendering the first object using the data stored for the base hierarchy tree, and rendering the second object using any one or a combination of the shared data, the difference data, and the indication information.
A system and method for updating power supply voltages due to variations from aging are described. A functional unit includes a power supply monitor capable of measuring power supply variations in a region of the functional unit. An age counter measures an age of the functional unit. A control unit notifies the power supply monitor to measure an operating voltage reference. When the control unit receives a measured operating voltage reference, the control unit determines an updated age of the region different from the current age based on the measured operating voltage reference. The control unit updates the age counter with the corresponding age, which is younger than the previous age in some cases due to the region not experiencing predicted stress and aging. The control unit is capable of determining a voltage adjustment for the operating voltage reference based on an age indicated by the age counter.
A data processing system includes a data processor and a memory controller receiving memory access requests from the data processor and generating at least one memory access cycle to a memory system in response to the receiving. The memory controller includes a command queue and a sparse element processor. The command queue is for receiving and storing the memory access requests including a first memory access request including a small element request. The sparse element processor is for causing the memory controller to issue a second memory access request to the memory system in response to the first memory access request with a density greater than a density indicated by the first memory access request.
An electronic device includes a processor having processor circuitry and a leader memory controller, a controller coupled to the processor and having a follower memory controller, and a memory. The processor circuitry is operable to access the memory by issuing memory access requests to the leader memory controller. The leader memory controller is operable to complete the memory access requests using the follower memory controller to issue memory commands to the at least one memory die.
A data processing node includes a processor element and a data fabric circuit. The data fabric circuit is coupled to the processor element and to a local memory element and includes a crossbar switch. The data fabric circuit is operable to bypass the crossbar switch for memory access requests between the processor element and the local memory element.
A method for forming a semiconductor assembly that includes forming a first set of layers on a first wafer, where one or more layers of the first set includes one or more devices of the semiconductor assembly. The method further includes forming a second set of layers on a second wafer, where one or more layers of the second set include connections between one or more of the devices of the semiconductor assembly. The method additionally includes coupling a layer of the first set to a layer of the second set using metal to metal hybrid bonding.
H01L 23/00 - DISPOSITIFS À SEMI-CONDUCTEURS NON COUVERTS PAR LA CLASSE - Détails de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe