An apparatus and method for efficiently routing power signals across a semiconductor die. In various implementations, an integrated circuit includes a micro through silicon via (TSV) that traverses a silicon substrate layer to a backside metal layer. The integrated circuit also includes power switches. The integrated circuit routes a power supply signal from the output of a power switch to a frontside power rail using the micro TSV and the backside metal layer. The integrated circuit also routes the power supply signal from the output of the power switch to the frontside power rail using a frontside metal layer. Therefore, the frontside metal layer and the backside metal layer provide power connection redundancy that increases charge sharing, improves wafer yield, reduces voltage droop, and reduces on-die area. In addition, the process routes a ground reference voltage level using both a frontside power rail and a backside power rail.
H01L 23/528 - Layout of the interconnection structure
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
H01L 23/48 - Arrangements for conducting electric current to or from the solid-state body in operation, e.g. leads or terminal arrangements
H01L 23/525 - Arrangements for conducting electric current within the device in operation from one component to another, including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body, with adaptable interconnections
A semiconductor package includes multiple dies that share the same package pin. An output enable register provided on each die is used to select the die that drives an output to the shared pin. A hardware arbitration circuit ensures that two or more dies do not drive an output to the shared pin at the same time.
G06F 13/20 - Handling requests for interconnection or transfer for access to an input/output bus
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another, the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices, all the devices being of a type provided for in the same subgroup of groups H01L 27/00 - H01L 33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group H01L 27/00
3.
PROCESSOR-GUIDED EXECUTION OF OFFLOADED INSTRUCTIONS USING FIXED FUNCTION OPERATIONS
Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
The disclosed device for packet coalescing detects a trigger condition for initiating packet coalescing of packet traffic and sends, to an endpoint device, a notification to start packet coalescing. The device can observe a status in response to starting the packet coalescing and report a performance of the packet coalescing. A system can include a controller that detects a trigger condition for packet coalescing and notifies an endpoint device via a notification register. The controller can read a status register to report, based on the read status, a packet coalescing performance. Various other methods, systems, and computer-readable media are also disclosed.
A memory controller monitors memory commands selected for dispatch to the memory and sends commands controlling a read clock state. A memory includes a read clock circuit and a mode register. The read clock circuit has an output for providing a hybrid read clock signal in response to a clock signal and a read clock mode signal. The read clock circuit provides the hybrid read clock signal as a free-running clock signal that toggles continuously, and as a strobe signal that is active only in response to the memory receiving a read command.
A processing system includes a primary processor and a co-processor. The primary processor is couplable to a memory subsystem having at least one memory and operating to execute system software employing memory address translations based on one or more page tables stored in the memory subsystem. The co-processor is likewise couplable to the memory subsystem and operates to perform iterations of a page table walk through one or more page tables maintained for the memory subsystem and to perform one or more page management operations on behalf of the system software based on the iterations of the page table walk. The page management operations performed by the co-processor include analytic data aggregation, free list management and page allocation, page migration management, page table error detection, and the like.
G06F 12/1027 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
G06F 12/123 - Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
7.
COMMUNICATION REDUCTION TECHNIQUES FOR PARALLEL COMPUTING
A physical system is simulated using a model including a plurality of elements in a mesh or grid. The elements are divided into partitions processed by different processing units. For some time steps, state data is transmitted between partitions and used to calculate flux data for updating the state of edge elements of the partitions. Periodically, transmission of state data is suppressed, and flux data is obtained by linear interpolation based on past flux data. Alternatively, flux data is obtained by processing state variables of an edge element and past flux data using a machine learning model, such as a DNN. Whether to suppress transmission of state data may be determined based on one or both of (a) uncertainty in an output of the machine learning model (e.g., Bayesian neural network) and (b) complexity of the model of the physical system (e.g., spatial or temporal gradients).
G06F 30/23 - Design optimisation, verification or simulation of the designed object using finite element methods [FEM] or finite difference methods [FDM]
G06F 30/27 - Design optimisation, verification or simulation of the designed object using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
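A minimal, runnable Python sketch of the flux-suppression step described in the simulation abstract above: most time steps exchange edge state between two partitions and compute the boundary flux directly, while suppressed steps skip the exchange and extrapolate the flux linearly from past flux data. The 1-D diffusion toy model and all constants here are illustrative assumptions, not details from the patent.

```python
# Two partitions of a toy 1-D diffusion model; on suppressed steps the
# boundary flux is linearly extrapolated instead of exchanged.

def simulate(steps: int, suppress_every: int = 4) -> list[float]:
    left = [1.0] * 8          # partition A state
    right = [0.0] * 8         # partition B state
    k = 0.1                   # diffusivity (arbitrary)
    flux_hist = [0.0, 0.0]    # past boundary fluxes for extrapolation

    for step in range(steps):
        if step % suppress_every != 0 or step < 2:
            # Normal step: "communicate" the edge states and compute flux.
            flux = k * (left[-1] - right[0])
        else:
            # Suppressed step: no exchange; extrapolate f(t) ~ 2*f[-1] - f[-2].
            flux = 2.0 * flux_hist[-1] - flux_hist[-2]
        flux_hist.append(flux)
        left[-1] -= flux      # apply the flux to the edge elements
        right[0] += flux
    return flux_hist

print(simulate(12))
```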
The disclosed device for packet coalescing detects a trigger condition for initiating packet coalescing of packet traffic and sends, to an endpoint device, a notification to start packet coalescing. The device can observe a status in response to starting the packet coalescing and report a performance of the packet coalescing. A system can include a controller that detects a trigger condition for packet coalescing and notifies an endpoint device via a notification register. The controller can read a status register to report, based on the read status, a packet coalescing performance. Various other methods, systems, and computer-readable media are also disclosed.
H04L 47/43 - Assembling or disassembling of packets, e.g. segmentation and reassembly [SAR]
H04L 43/08 - Monitoring or testing based on specific metrics, e.g. quality of service [QoS], energy consumption or environmental parameters
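A hedged Python sketch of the controller flow in the packet-coalescing abstract above: a traffic trigger is detected, the endpoint is notified through a notification register, and a status register is read back to report performance. The register names, layout, and trigger threshold are hypothetical, not taken from the patent.

```python
# A dict stands in for the endpoint's MMIO notification/status registers.

class CoalescingController:
    TRIGGER_PPS = 10_000            # assumed packets-per-second trigger

    def __init__(self, endpoint_regs: dict):
        self.regs = endpoint_regs

    def on_traffic_sample(self, packets_per_second: int) -> None:
        if packets_per_second >= self.TRIGGER_PPS:
            self.regs["NOTIFY"] = 1          # tell endpoint to start coalescing

    def report(self) -> dict:
        status = self.regs.get("STATUS", 0)  # endpoint-updated status word
        return {"coalesced_batches": status >> 8,
                "active": bool(status & 1)}

regs = {"NOTIFY": 0, "STATUS": (37 << 8) | 1}
ctrl = CoalescingController(regs)
ctrl.on_traffic_sample(25_000)
print(regs["NOTIFY"], ctrl.report())
```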
9.
Page rinsing scheme to keep a directory page in an exclusive state in a single complex
A method includes, in a cache directory, storing an entry associating a memory region with an exclusive coherency state, and in response to a memory access directed to the memory region, transmitting a demote superprobe to convert at least one cache line of the memory region from an exclusive coherency state to a shared coherency state.
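The directory transition the method describes can be pictured with a small Python model; the entry layout, region key, and line states here are illustrative assumptions.

```python
# A region entry held exclusive is demoted line-by-line to shared when
# another access to the region arrives (the "demote superprobe").

directory = {"region_0x1000": {"state": "exclusive",
                               "lines": ["exclusive"] * 4}}

def demote_superprobe(region: str) -> None:
    entry = directory[region]
    if entry["state"] == "exclusive":
        # Convert every cached line in the region from exclusive to shared.
        entry["lines"] = ["shared" for _ in entry["lines"]]
        entry["state"] = "shared"

demote_superprobe("region_0x1000")
print(directory)
```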
A processing system includes a processor core for processing instructions and a memory that stores a page table set including an extended page table having an extended page table entry storing extended page table attributes associated with a physical memory page. The system receives a virtual address and translates the virtual address to a physical address for the physical memory page. One or more extended page attributes associated with the physical memory page are retrieved from the extended page table entry based on the virtual address.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 9/455 - Arrangements for executing specific programs; Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
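A schematic Python model of the translation flow in the extended-page-table abstract above: an ordinary page table yields the physical page, and a parallel extended table keyed by the same virtual page number yields the extra attributes. Table contents and attribute names are invented for illustration.

```python
PAGE_SHIFT = 12

page_table = {0x42: 0x9A}                       # VPN -> PFN
ext_page_table = {0x42: {"encrypted": True,     # VPN -> extended attributes
                         "migratable": False}}

def translate(vaddr: int) -> tuple[int, dict]:
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    pfn = page_table[vpn]                       # ordinary translation
    attrs = ext_page_table.get(vpn, {})         # extended attribute fetch
    return (pfn << PAGE_SHIFT) | offset, attrs

print(translate((0x42 << PAGE_SHIFT) + 0x10))   # (physical address, attrs)
```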
11.
On-Demand Regulation of Memory Bandwidth Utilization to Service Requirements of Display
Systems, apparatuses, and methods for prefetching data by a display controller. From time to time, a performance-state change of a memory is performed. During such changes, a memory clock frequency is changed for a memory subsystem storing frame buffer(s) used to drive pixels to a display device. During the performance-state change, memory accesses may be temporarily blocked. To sustain a desired quality of service for the display, a display controller is configured to prefetch data in advance of the performance-state change. In order to ensure the display controller has sufficient memory bandwidth to accomplish the prefetch, bandwidth reduction circuitry in clients of the system is configured to temporarily reduce the memory bandwidth of the corresponding clients.
An apparatus and method for efficiently managing performance among multiple integrated circuits in separate semiconductor chips. In various implementations, a computing system includes at least a first processing node and a second processing node. While processing tasks, the first processing node uses a first memory and the second processing node uses a second memory. A first communication channel transfers data between the first processing node and the second processing node. The first processing node accesses the second memory using a second communication channel, different from the first communication channel, that supports point-to-point communication. The second memory services access requests from the first and second processing nodes as the access requests are received while foregoing access conflict detection. The first processing node accesses the second memory after determining a particular amount of time has elapsed after reception of an indication from the second processing node specifying that a particular task has begun.
A scheduler of an apparatus exposes an application programming interface (API) usable to specify quality-of-service (QoS) parameters, e.g., latency, throughput, and so forth. An application, for instance, specifies the QoS parameters for a workload to be processed using a hardware compute unit. The QoS parameters are employed by the scheduler as a basis to configure a partition within a hardware compute unit. The partition is configured such that processing resources that are available via the partition to process the workload comply with the specified quality-of-service.
A semiconductor module comprises multiple non-homogeneous semiconductor dies disposed on the semiconductor module, with each semiconductor die having a set of circuitry modules that are common to all of the semiconductor dies and also a set of supporting circuitry modules that are distinct between the semiconductor dies. An interconnect communicatively couples the semiconductor dies together. Commands for processing by the semiconductor module may be routed to individual semiconductor dies based on capabilities of the particular circuitry modules disposed on those individual semiconductor dies.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
A method for receiving a multi-level error signal having more than two logic levels includes oversampling the multi-level error signal to provide sampled symbols, wherein a first level of the multi-level error signal indicates no error, and second and third levels of the multi-level error signal indicate first and second error conditions, respectively. The sampled symbols are deserialized to provide sets of symbols. A start of a symbol period is determined in response to detecting that a given sample is different from a prior sample, and the prior sample indicates no error. The sets of symbols are filtered to provide corresponding output symbols based on the start.
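A rough software model of the receive path just described: a three-level signal is oversampled, the start of a symbol period is found where a sample first differs from a prior no-error sample, and each symbol-width group of samples is filtered (here, by majority vote) into one output symbol. The 4x oversampling ratio and the majority-vote filter are assumptions.

```python
OVERSAMPLE = 4   # samples per symbol period (assumed)

def decode(samples: list[int]) -> list[int]:
    # Start of symbol period: first sample differing from a prior
    # no-error (level 0) sample.
    start = next(i for i in range(1, len(samples))
                 if samples[i] != samples[i - 1] and samples[i - 1] == 0)
    symbols = []
    for i in range(start, len(samples) - OVERSAMPLE + 1, OVERSAMPLE):
        group = samples[i:i + OVERSAMPLE]                 # deserialized set
        symbols.append(max(set(group), key=group.count))  # majority filter
    return symbols

# 0 = no error, 1 and 2 = the two error conditions.
print(decode([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0]))  # [1, 2, 0]
```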
A method for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel is provided. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request, using entries of the set stored across the multiple grains of the plurality of grains.
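One way to picture the addressing in this abstract is a function that maps a request address to the (bank, grain) pairs holding its set; the field widths and striping rule below are illustrative assumptions, not the patent's mapping.

```python
NUM_BANKS, GRAINS_PER_BANK, GRAINS_PER_SET = 8, 4, 2

def locate_set(address: int) -> list[tuple[int, int]]:
    """Return the (bank, grain) pairs that hold the set for this address."""
    bank = address % NUM_BANKS
    first_grain = (address // NUM_BANKS) % GRAINS_PER_BANK
    # Spread the set across consecutive grains so its pieces can be
    # accessed in parallel to satisfy one memory access request.
    return [(bank, (first_grain + i) % GRAINS_PER_BANK)
            for i in range(GRAINS_PER_SET)]

print(locate_set(0x1A2B))
```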
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating component and a second heat transfer element configured to be thermally coupled to the heat-generating component. A manifold is configured to receive a single fluid flow of a heat transfer medium and split the single fluid flow into a first split fluid flow provided to the first heat transfer element and a second split fluid flow provided to the second heat transfer element.
G05B 19/416 - Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of program data in numerical form, characterised by control of velocity, acceleration or deceleration
A method for configuring a processor includes identifying a component cooling device thermally coupled to a processor, and configuring one or more operating parameters of the processor based on the identification of the component cooling device.
An apparatus for sub-cooling components includes a component cooling device, a processor thermally coupled to the component cooling device, an electronic component, a fan configured to direct an airflow across the processor and the electronic component, and a thermoelectric cooling device thermally coupled to the component cooling device. The thermoelectric cooling device is configured to cool the airflow from a first temperature to a second temperature.
An apparatus for component cooling includes a thermoelectric cooling (TEC) device configured to be thermally coupled to a first processor, and a controller. The controller is configured to receive at least one first parameter indicative of a first activity level of the first processor; determine a TEC power level from among a plurality of TEC power levels based on the at least one first parameter; and control providing of power to the TEC device at the determined TEC power level.
A processing system allocates memory to co-locate input and output operands for operations for processing in memory (PIM) execution in the same PIM-local memory while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies as “super rows” virtual rows that span all the banks of a memory device. Each super row has a different bank-interleaving pattern, referred to as a “color”. A group of contiguous super rows that has the same PIM-interleaving pattern is referred to as a “color group”. The processing system assigns memory addresses to each operand (e.g., vector) of an operation for PIM execution to a super row having a different color within the same color group to co-locate the operands for each PIM execution unit and uses address hashing to alternate between banks assigned to elements of a first operand and elements of a second operand of the operation.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06F 12/02 - Addressing or allocation; Relocation
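A toy Python rendering of the "color" idea in the PIM abstract above: a super row spans all banks, and its color selects the bank-interleaving pattern. The XOR-style hash is an assumed stand-in for the address hashing the abstract mentions.

```python
NUM_BANKS = 4

def bank_of(element_index: int, color: int) -> int:
    # Hash the element index with the super row's color to pick a bank.
    return (element_index ^ color) % NUM_BANKS

vec_a_color, vec_b_color = 0, 1   # different colors, same color group
for i in range(4):
    print(i, bank_of(i, vec_a_color), bank_of(i, vec_b_color))
# Corresponding elements of the two operands land in different banks in
# complementary patterns, i.e. the alternation the abstract describes.
```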
22.
Selecting Between Basic and Global Persistent Flush Modes
Selecting between basic and global persistent flush modes is described. In accordance with the described techniques, a system includes a data fabric in electronic communication with at least one cache and a controller configured to select between operating in a global persistent flush mode and a basic persistent flush mode based on an available flushing latency of the system, control the at least one cache to flush dirty data to the data fabric in response to a flush event trigger while operating in the global persistent flush mode, and transmit a signal to switch control to an application to flush the dirty data from the at least one cache to the data fabric while operating in the basic persistent flush mode.
Runtime flushing to persistency in heterogenous systems is described. In accordance with the described techniques, a system may include a persistent memory in electronic communication with at least one cache and a controller configured to command the at least one cache to flush dirty data to the persistent memory in response to a dirtiness of the at least one cache reaching a cache dirtiness threshold.
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
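A small sketch of the runtime-flush policy in the preceding abstract: the controller commands a flush once the fraction of dirty cache lines reaches a threshold. The 25% threshold and the cache representation are assumptions.

```python
DIRTINESS_THRESHOLD = 0.25   # assumed cache dirtiness threshold

def maybe_flush(cache: list[dict]) -> bool:
    dirty = sum(1 for line in cache if line["dirty"])
    if dirty / len(cache) >= DIRTINESS_THRESHOLD:
        for line in cache:
            if line["dirty"]:
                line["dirty"] = False   # stands in for writing to persistence
        return True                     # flush was commanded
    return False

cache = [{"dirty": i % 3 == 0} for i in range(8)]
print(maybe_flush(cache), cache)
```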
24.
APPARATUS, SYSTEM, AND METHOD FOR THROTTLING PREFETCHERS TO PREVENT TRAINING ON IRREGULAR MEMORY ACCESSES
A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
25.
CONNECTING A CHIPLET TO AN INTERPOSER DIE AND TO A PACKAGE INTERFACE USING A SPACER INTERCONNECT COUPLED TO A PORTION OF THE CHIPLET
A semiconductor package assembly includes a package interface. An interposer die has a first surface and a second surface opposite to the first surface, where the first surface of the interposer die is positioned on the package interface. The interposer die includes a plurality of conductive connections between the first surface and the second surface. A chiplet includes a connectivity region having conductive pathways, with a first portion of the connectivity region coupled to a conductive connection of the interposer die and a second portion of the connectivity region cantilevered from the interposer die.
H01L 23/498 - Electrical connections on insulating substrates
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10 - Details of semiconductor or other solid-state devices
The disclosed method includes detecting a stalled memory request, for a first level cache within a cache hierarchy of a processor, that includes an outstanding memory request associated with a second level cache within the cache hierarchy that is higher than the first level cache. The method further includes indicating, to the second level cache, that the first level cache is experiencing a starvation issue due to stalled memory requests to enable the second level cache to perform starvation-remediation actions. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed computing device includes a cache memory and at least one processor coupled to the cache memory. The at least one processor is configured to copy data written to one or more nonredundant wordlines of the cache memory to one or more redundant wordlines of the cache memory. The at least one processor is additionally configured to detect a mismatch between data read from the one or more nonredundant wordlines and data stored in the one or more redundant wordlines. The at least one processor is also configured to perform a remediation action in response to detecting the mismatch. Various other methods, systems, and computer-readable media are also disclosed.
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes, by adding special bits or symbols to the coded information expressed according to a code, e.g. parity check, casting out nines or elevens
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
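A toy Python version of the redundant-wordline scheme above: writes are mirrored into redundant wordlines, reads compare the two copies, and a mismatch triggers a remediation action (here, restoring from the mirror). The arrays and the chosen remediation are illustrative stand-ins.

```python
primary = [0] * 4          # nonredundant wordlines
mirror = [0] * 4           # redundant wordlines

def write(line: int, value: int) -> None:
    primary[line] = value
    mirror[line] = value   # copy to the redundant wordline

def read(line: int) -> int:
    if primary[line] != mirror[line]:   # mismatch detected
        primary[line] = mirror[line]    # remediation: restore from mirror
    return primary[line]

write(2, 0xAB)
primary[2] = 0xA8          # simulate a bit flip in the nonredundant copy
print(read(2) == 0xAB)     # True: the mismatch is caught and repaired
```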
28.
Selecting a Tiling Scheme for Processing Instances of Input Data Through a Neural Network
An electronic device uses a tiling scheme selected from among a set of tiling schemes for processing instances of input data through a neural network. Each of the tiling schemes is associated with a different arrangement of portions into which instances of input data are divided for processing in the neural network. In operation, processing circuitry in the electronic device acquires information about a neural network and properties of the processing circuitry. The processing circuitry then selects a given tiling scheme from among a set of tiling schemes based on the information. The processing circuitry next processes instances of input data in the neural network using the given tiling scheme. Processing each instance of input data in the neural network includes dividing the instance of input data into portions based on the given tiling scheme, separately processing each of the portions in the neural network, and combining the respective outputs to generate an output for the instance of input data.
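The selection step can be sketched as picking the coarsest tiling whose largest portion fits in the processing circuitry's local memory; the candidate schemes, sizes, and fit criterion below are invented for illustration.

```python
def select_tiling(input_bytes: int, local_memory_bytes: int,
                  schemes=(1, 2, 4, 8)) -> int:
    """Return the number of portions to divide each input instance into."""
    for portions in schemes:
        per_portion = input_bytes // portions
        if per_portion <= local_memory_bytes:
            return portions               # coarsest scheme that fits
    return schemes[-1]                    # fall back to the finest scheme

# A 1 MiB instance with 300 KiB of local memory selects 4 portions.
print(select_tiling(input_bytes=1 << 20, local_memory_bytes=300 * 1024))
```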
In response to receiving a scene description, a processing system generates a set of planes in the scene and a bounding volume representing a partition of the scene. Using the set of planes in the scene, a compute unit of an accelerated processing unit performs a spatial test on the bounding volume to determine whether the bounding volume intersects one or more planes of the set of planes in the scene. Based on the spatial test, the compute unit generates intersection data indicating whether the bounding volume intersects one or more planes of the set of planes in the scene. The accelerated processing unit then uses the intersection data to render the scene.
A data processing system includes a vector data processing unit that includes a shared scheduler queue configured to store, in a same queue, at least one entry that includes at least a mask type instruction and another entry that includes at least a vector type instruction. Shared pipeline control logic controls a vector data path or a mask data path based on a type of instruction picked from the same queue. In some examples, at least one mask type instruction and the at least one vector type instruction each include a source operand having a corresponding shared source register bit field that indexes into both a mask register file and a vector register file. The shared pipeline control logic uses a mask register file or a vector register file depending on whether bits of the shared source register bit field identify a mask source register or a vector source register.
A processing device and method for executing a color twist operation are provided. The processing device comprises memory and a processor configured to convert values of pixels of a frame from a first color domain to a hue, saturation and value (HSV) color domain, adjust hue values and saturation values of the pixels, store the adjusted hue and saturation values in a portion of the memory local to the processor and convert the frame from the HSV color domain to the first color domain using the adjusted hue and saturation values stored in local memory. The adjusted hue and saturation values are generated from pre-adjusted values, which are generated from masked vector values.
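A compact sketch of the color-twist flow using Python's standard colorsys module as a stand-in for the processor's domain conversions; the adjustment amounts are arbitrary examples.

```python
import colorsys

def color_twist(rgb: tuple[float, float, float],
                hue_shift: float, sat_scale: float) -> tuple[float, ...]:
    h, s, v = colorsys.rgb_to_hsv(*rgb)    # first color domain -> HSV
    h = (h + hue_shift) % 1.0              # adjusted hue
    s = min(1.0, s * sat_scale)            # adjusted saturation
    return colorsys.hsv_to_rgb(h, s, v)    # HSV -> first color domain

print(color_twist((0.8, 0.2, 0.2), hue_shift=0.1, sat_scale=1.2))
```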
Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor determines whether the amount of local memory allocated is sufficient to store the data of the input tile and the padded data for the receptive field.
Connection modification based on traffic pattern is described. In accordance with the described techniques, a traffic pattern of memory operations across a set of connections between at least one device and at least one memory is monitored. The traffic pattern is then compared to a threshold traffic pattern condition, such as an amount of data traffic in different directions across the connections. A traffic direction of at least one connection of the set of connections is modified based on the traffic pattern corresponding to the threshold traffic pattern condition.
A memory system includes a PHY embodied on an integrated circuit, the PHY coupling to a memory over conductive traces on a substrate. The PHY includes a reference clock generation circuit providing a reference clock signal to the memory, a first group of driver circuits providing CA signals to the memory, and a second group of driver circuits providing DQ signals to the memory. A plurality of the conductive traces which carry the DQ signals are constructed with a length longer than that of conductive traces carrying the reference clock signal in order to reduce an effective insertion delay associated with coupling the reference clock signal to latch respective incoming DQ signals.
A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
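A minimal Python model of the method in the prefetching abstract above: offsets accessed within a region are recorded, and a later access to the region replays the recorded pattern as prefetches. The 4 KiB region size and set-based encoding are assumptions.

```python
REGION_SHIFT = 12                 # 4 KiB regions (assumed)

pattern_table: dict[int, set[int]] = {}

def access(addr: int) -> list[int]:
    region = addr >> REGION_SHIFT
    offset = addr & ((1 << REGION_SHIFT) - 1)
    recorded = pattern_table.setdefault(region, set())
    # Fetch the recorded pattern and prefetch its offsets.
    prefetches = [(region << REGION_SHIFT) | o for o in recorded if o != offset]
    recorded.add(offset)          # keep training the pattern
    return prefetches

access(0x1040)
access(0x1080)                    # train the region's pattern
print(sorted(hex(a) for a in access(0x1000)))   # ['0x1040', '0x1080']
```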
A memory controller for generating accesses for a memory includes a row hammer logic circuit for providing a sample request. In response to the sample request, the memory controller generates a sample command for dispatch to the memory to cause the memory to capture a current row. In response to a completion of the sample command, the memory controller generates a mitigation command for dispatch to the memory.
G11C 11/4078 - Safety or protection circuits, e.g. for preventing inadvertent or unauthorised reading or writing; Status cells; Test cells
G11C 11/406 - Management or control of the refreshing or charge-regeneration cycles
37.
Executing Kernel Workgroups Across Multiple Compute Unit Types
Portions of programs, oftentimes referred to as kernels, are written by programmers to target a particular type of compute unit, such as a central processing unit (CPU) core or a graphics processing unit (GPU) core. When executing a kernel, the kernel is separated into multiple parts referred to as workgroups, and each workgroup is provided to a compute unit for execution. Usage of one type of compute unit is monitored and, in response to the one type of compute unit being idle, one or more workgroups targeting another type of compute unit are executed on the one type of compute unit. For example, usage of CPU cores is monitored, and in response to the CPU cores being idle, one or more workgroups targeting GPU cores are executed on the CPU cores.
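A toy dispatcher illustrating the monitoring idea above: workgroups target a GPU queue, and when the CPU side is observed idle it pulls a GPU-targeted workgroup and executes it. The queue shape and idleness check are simplified assumptions.

```python
from collections import deque

gpu_queue = deque(f"wg{i}" for i in range(6))   # workgroups targeting GPU cores

def cpu_idle_tick(cpu_busy: bool):
    """Called on each usage-monitoring tick of the CPU cores."""
    if not cpu_busy and gpu_queue:
        wg = gpu_queue.popleft()     # steal a GPU-targeted workgroup
        return f"CPU executed {wg}"
    return None

print(cpu_idle_tick(cpu_busy=False))   # CPU executed wg0
print(cpu_idle_tick(cpu_busy=True))    # None: CPU stays on its own work
```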
A disclosed method can include (i) reporting, by a microcontroller, detection of a violation of a physical infrastructure constraint to a machine check architecture, (ii) triggering, by the machine check architecture in response to the reporting, a machine-check exception such that the violation of the physical infrastructure constraint is recorded, and (iii) performing a corrective action based on the triggering of the machine-check exception. Various other apparatuses, systems, and methods are also disclosed.
A computer-implemented method is disclosed for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex failure patterns within the telemetry-based unknown outputs to ultimately generate one or more root-cause remedy recommendations that are specific to the identified failure and the client computing device that is experiencing that failure.
Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
41.
MATRIX MULTIPLICATION UNIT WITH FLEXIBLE PRECISION OPERATIONS
A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
A remote display synchronization technique preserves the presence of a local display device for a remotely-rendered video stream. A server and a client device cooperate to dynamically determine a target frame rate for a stream of rendered frames suitable for the current capacities of the server and the client device and networking conditions. The server generates from this target frame rate a synchronization signal that serves as timing control for the rendering process. The client device may provide feedback to instigate a change in the target frame rate, and thus a corresponding change in the synchronization signal. In this approach, the rendering frame rate and the encoding frequency may be “synchronized” in a manner consistent with the capacities of the server, the network, and the client device, resulting in generation, encoding, transmission, decoding, and presentation of a stream of frames that mitigates missed encoding of frames while providing acceptable latency.
A63F 13/355 - Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transforming a changing game scene into an MPEG stream for transmitting to a mobile phone or a thin client
A63F 13/358 - Adapting the game course according to the network or server load, e.g. for reducing latency due to different connection speeds between clients
H04N 19/132 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding; Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
A display processing device includes a display device interface and a processing unit. The processing unit is configured to transition at least a first component of the display processing device into a low-power state in response to an active region of a first video frame of a plurality of video frames having completed. A second component of the display processing device is configured to maintain a temporal count value corresponding to a current frame line of the plurality of video frames, and further to generate a first signal in response to the temporal count value corresponding to a first trigger value. The first signal causes the at least first component to transition out of the low-power state.
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder clock; Client middleware
44.
MULTI-PASS WRITEBACK WITH SINGLE-PASS DISPLAY CONSUMPTION
Techniques described herein allow multi-pass writeback processing of graphical frames (such as those having a high or ultrahigh resolution) to reduce bandwidth for display operations by, for example, splitting an input stream for processing by separate graphical pipelines as two or more spatially segmented portions. After receiving a graphical frame for processing, the graphical frame is spatially segmented into multiple portions. Each of the multiple portions is provided to a respective graphical pipeline of a plurality of graphical pipelines for processing. Each processed portion of the graphical frame is written substantially simultaneously to a corresponding portion of a system memory.
G09G 3/20 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes, for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix
G09G 5/393 - Arrangements for updating the contents of the bit-mapped memory
A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
G06F 3/06 - Digital input from, or digital output to, record carriers
A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key to be generated and distributed. Based upon a security mode for the communication, the key is generated and distributed to the first component.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
In response to receiving a scene description, a processing system [100] generates a set of planes [242] in the scene and a bounding volume [128] representing a partition of the scene. Using the set of planes in the scene, a compute unit [230] of an accelerated processing unit [200] performs a spatial test on the bounding volume to determine whether the bounding volume intersects one or more planes of the set of planes in the scene. Based on the spatial test, the compute unit generates intersection data [435] indicating whether the bounding volume intersects one or more planes of the set of planes in the scene. The accelerated processing unit then uses the intersection data to render the scene.
A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
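The escalation rule in this abstract can be sketched as a local scheduler that keeps what its domain can absorb and forwards the overflow to the next-higher scheduler for redistribution; the capacities and work-item shapes below are illustrative assumptions.

```python
def schedule(new_items: list[str], local_queue: list[str],
             capacity: int, parent_queue: list[str]) -> None:
    """Distribute new work items produced by consuming a work item."""
    for item in new_items:
        if len(local_queue) < capacity:
            local_queue.append(item)      # execute locally in this domain
        else:
            parent_queue.append(item)     # escalate for redistribution

local, parent = [], []
schedule([f"item{i}" for i in range(5)], local, capacity=3,
         parent_queue=parent)
print(local, parent)   # 3 items stay local, 2 escalate up the hierarchy
```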
Connection modification based on traffic pattern is described. In accordance with the described techniques, a traffic pattern of memory operations across a set of connections between at least one device and at least one memory is monitored. The traffic pattern is then compared to a threshold traffic pattern condition, such as an amount of data traffic in different directions across the connections. A traffic direction of at least one connection of the set of connections is modified based on the traffic pattern corresponding to the threshold traffic pattern condition.
G06F 13/14 - Handling requests for interconnection or transfer
H04L 43/08 - Monitoring or testing based on specific metrics, e.g. quality of service [QoS], energy consumption or environmental parameters
A memory system includes a PHY embodied on an integrated circuit, the PHY coupling to a memory over conductive traces on a substrate. The PHY includes a reference clock generation circuit providing a reference clock signal to the memory, a first group of driver circuits providing CA signals to the memory, and a second group of driver circuits providing DQ signals to the memory. A plurality of the conductive traces which carry the DQ signals are constructed with a length longer than that of conductive traces carrying the reference clock signal in order to reduce an effective insertion delay associated with coupling the reference clock signal to latch respective incoming DQ signals.
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Generators or management of control signals for read-write [R-W]
G06F 3/06 - Digital input from, or digital output to, record carriers
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating electronic component, and a second heat transfer element. The apparatus further includes a plurality of heat transfer paths thermally coupled between the first heat transfer element and the second heat transfer element. Each of the plurality of heat transfer paths is configured to provide a separate heat conduction path from the first heat transfer element to the second heat transfer element. The apparatus further includes a manifold including a first fluid passage providing a first portion of a heat transfer fluid in thermal contact with the first heat transfer element, and a second fluid passage providing a second portion of the heat transfer fluid in thermal contact with the second heat transfer element.
H01L 23/467 - Arrangements for cooling, heating, ventilating or temperature compensation involving the transfer of heat by flowing fluids, by flowing gases, e.g. air
H01L 33/64 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10 - Details characterised by the semiconductor body packages; Heat extraction or cooling elements
F25B 21/02 - Machines, plants or systems using electric or magnetic effects, using the Nernst-Ettingshausen effect
H01L 23/473 - Arrangements for cooling, heating, ventilating or temperature compensation involving the transfer of heat by flowing fluids, by flowing liquids
F28B 1/02 - Condensers in which the steam or vapour is separated from the cooling medium by walls, e.g. surface condenser, using water or other liquid as the cooling medium
52.
CONNECTING A CHIPLET TO AN INTERPOSER DIE AND TO A PACKAGE INTERFACE USING A SPACER INTERCONNECT COUPLED TO A PORTION OF THE CHIPLET
A semiconductor package assembly includes a package interface. An interposer die has a first surface and a second surface opposite to the first surface, where the first surface of the interposer die is positioned on the package interface. The interposer die includes a plurality of conductive connections between the first surface and the second surface. A chiplet includes a connectivity region having conductive pathways, with a first portion of the connectivity region coupled to a conductive connection of the interposer die and a second portion of the connectivity region cantilevered from the interposer die.
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another, the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 25/07 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices, all the devices being of a type provided for in the same subgroup of groups H01L 27/00 - H01L 33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group H01L 29/00
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices, all the devices being of a type provided for in the same subgroup of groups H01L 27/00 - H01L 33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group H01L 27/00
53.
LOW POWER PROCESSING OF REMOTE MANAGEABILITY REQUESTS
An apparatus and method for efficiently performing power management for multiple clients of a semiconductor chip that supports remote manageability. In various implementations, a network interface receives a packet, and sends at least an indication of the packet to a manageability processing circuitry (MPC) of a processing node with multiple clients for processing tasks. The MPC determines whether a client or itself is a destination needed to process the packet. If the destination is the MPC, then packet processing is done by the MPC without involvement from the clients, which can be in an idle state. For example, the MPC can process a remote manageability packet requesting diagnostic information from one or more clients of the processing node. The network interface and the MPC use a sideband communication channel for data transmission, which foregoes lane training for further reduction in latency and power consumption.
G06F 1/3209 - Monitoring remote activity, e.g. over telephone lines or network connections
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken
G06F 1/3287 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken, by switching off an individual functional unit in a computer
G06F 1/3293 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken, by switching to a less power-consuming processor, e.g. a sub-processor
H04L 45/00 - Routing or path finding of packets in data switching networks
H04L 49/109 - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION - Packet switching elements characterised by the construction of the switching fabric, integrated on a microchip, e.g. switch-on-chip
A disclosed method can include (i) reporting, by a microcontroller, detection of a violation of a physical infrastructure constraint to a machine check architecture, (ii) triggering, by the machine check architecture in response to the reporting, a machine-check exception such that the violation of the physical infrastructure constraint is recorded, and (iii) performing a corrective action based on the triggering of the machine-check exception. Various other apparatuses, systems, and methods are also disclosed.
The disclosed method includes detecting a stalled memory request, for a first level cache within a cache hierarchy of a processor, that includes an outstanding memory request associated with a second level cache within the cache hierarchy that is higher than the first level cache. The method further includes indicating, to the second level cache, that the first level cache is experiencing a starvation issue due to stalled memory requests to enable the second level cache to perform starvation-remediation actions. Various other methods, systems, and computer-readable media are also disclosed.
A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
A system and method for determining power-performance state transition thresholds in a computing system. A processor comprises several functional blocks and a power manager. Each of the functional blocks produces data corresponding to an activity level associated with the respective functional block. The power manager determines activity levels of the functional blocks and compares the activity level of a given functional block to a threshold to determine if a power-performance state (P-state) transition is indicated. The threshold is determined in part on a current P-state of the given functional block. When the current P-state of the given functional block is relatively high, the threshold activity level to transition to a higher P-state is higher than it would be if the current P-state were relatively low. The power manager is further configured to determine the thresholds based in part on one or more of a type of circuit being monitored and a type of workload being executed.
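A numeric sketch of the threshold rule in the abstract above: the activity level required to move to a higher P-state grows with the current P-state. The table values are invented to show the shape of the policy, not taken from the patent.

```python
# current P-state -> activity threshold for transitioning one P-state up.
UP_THRESHOLDS = {0: 0.30, 1: 0.50, 2: 0.75}

def next_pstate(current: int, activity: float) -> int:
    threshold = UP_THRESHOLDS.get(current)
    if threshold is not None and activity >= threshold:
        return current + 1                 # transition indicated
    return current

print(next_pstate(0, 0.4))   # low P-state: 0.4 clears the 0.30 bar -> 1
print(next_pstate(2, 0.4))   # high P-state: 0.4 is under the 0.75 bar -> 2
```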
Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of scheduling performed by other local schedulers.
Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.
An apparatus and method for efficiently performing power management for multiple clients of a semiconductor chip that supports remote manageability. In various implementations, a network interface receives a packet, and sends at least an indication of the packet to a manageability processing circuitry (MPC) of a processing node with multiple clients for processing tasks. The MPC determines whether a client or itself is a destination needed to process the packet. If the destination is the MPC, then packet processing is done by the MPC without involvement from the clients, which can be in an idle state. For example, the MPC can process a remote manageability packet requesting diagnostic information from one or more clients of the processing node. The network interface and the MPC use a sideband communication channel for data transmission, which foregoes lane training for further reduction in latency and power consumption.
H04L 49/109 - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION - Packet switching elements characterised by the construction of the switching fabric, integrated on a microchip, e.g. switch-on-chip
An apparatus and method for efficiently routing power signals across a semiconductor die. In various implementations, an integrated circuit includes, at a first node that receives a power supply reference, a first micro through silicon via (TSV) that traverses through a silicon substrate layer to a backside metal layer. The integrated circuit includes, at a second node that receives the power supply reference, a second micro TSV that physically contacts at least one source region. The integrated circuit includes a first power rail that connects the first micro TSV to the second micro TSV. This power rail replaces contacts between the micro TSVs and a second power rail such as the frontside metal zero (M0) layer. Each of the first power rail, the second power rail, and the backside metal layer provides power connection redundancy that increases charge sharing, improves wafer yield, and reduces voltage droop.
H01L 23/528 - Layout of the interconnection structure
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
H01L 23/535 - Arrangements for conducting electric current within the device in operation from one component to another, including internal interconnections, e.g. buried interconnection structures
H01L 27/118 - Masterslice integrated circuits
Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
G06F 12/0884 - Parallel mode, e.g. in parallel with main memory or CPU
63.
BIGNUM ADDITION AND/OR SUBTRACTION WITH CARRY PROPAGATION
A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits is coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.
G06F 7/498 - Computations with decimal numbers using counter-type accumulators
G06F 7/506 - Addition; Subtraction in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination, with simultaneous carry generation for, or propagation over, two or more stages
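A minimal behavioral sketch, in Python, of the two-stage wide addition described in the abstract above. The 64-bit portion width, function names, and the carry-lookahead-style generate/propagate formulation of the second carry bits are illustrative assumptions rather than the patent's circuit.

# Y-bit operands are split into X-bit portions, per-portion adders produce
# "first" carry bits (generate) plus a propagate flag, a carry stage
# derives the "second" (incoming) carries, and the portions are then
# re-added with those carries.

X = 64                      # portion width in bits
MASK = (1 << X) - 1

def wide_add(a, b, num_portions):
    a_parts = [(a >> (X * i)) & MASK for i in range(num_portions)]
    b_parts = [(b >> (X * i)) & MASK for i in range(num_portions)]

    # Stage 1: add each portion independently; record carry-out
    # (generate) and whether a carry-in would ripple through (propagate).
    gen, prop = [], []
    for ai, bi in zip(a_parts, b_parts):
        s = ai + bi
        gen.append(s >> X)               # first carry bit
        prop.append((s & MASK) == MASK)  # all-ones sum propagates a carry

    # Stage 2: derive the incoming ("second") carry for each portion.
    cin = [0] * num_portions
    for i in range(1, num_portions):
        cin[i] = gen[i - 1] | (prop[i - 1] & cin[i - 1])

    # Stage 3: re-add each portion with its incoming carry.
    result = 0
    for i in range(num_portions):
        result |= ((a_parts[i] + b_parts[i] + cin[i]) & MASK) << (X * i)
    return result

a, b = (1 << 256) - 1, 1          # worst case: carry ripples everywhere
assert wide_add(a, b, 5) == a + b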
Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
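A hedged sketch of the prediction described above: a saturating counter trained on victim-cache outcomes decides when to launch the speculative memory request. The counter width, threshold, and function names are assumptions made for illustration.

class MissPredictor:
    def __init__(self, threshold=2):
        self.counter = 0           # 2-bit saturating counter
        self.threshold = threshold

    def predict_victim_miss(self):
        return self.counter >= self.threshold

    def train(self, victim_missed):
        if victim_missed:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def handle_private_miss(addr, predictor, victim_cache, memory):
    inflight = None
    if predictor.predict_victim_miss():
        inflight = memory[addr]       # speculative fetch, overlaps the lookup
    hit = addr in victim_cache
    predictor.train(victim_missed=not hit)
    if hit:
        return victim_cache[addr]     # speculative data, if any, is discarded
    return inflight if inflight is not None else memory[addr]

pred, vcache, mem = MissPredictor(), {}, {0x80: 7}
for _ in range(3):                    # repeated victim misses train the counter
    handle_private_miss(0x80, pred, vcache, mem)
print(pred.predict_victim_miss())     # True: future requests go speculative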
Devices and methods for node traversal for ray tracing are provided, which comprise casting a first ray in a space comprising objects represented by geometric shapes; traversing, for the first ray, at least one first node of an accelerated hierarchy structure representing an approximate volume of a group of the geometric shapes and a second node representing a volume of one of the geometric shapes; casting a second ray in the space; selecting, for the second ray, a starting node of traversal based on locations of intersection of the first ray and the second ray and an identifier which identifies one or more nodes intersected by the first ray; and traversing, for the second ray, the accelerated hierarchy structure beginning at the starting node of traversal.
A method and apparatus for storing keys in a key storage block includes processing a key request. A first key is allocated based upon the key request. The first key is stored in the key storage block, wherein the first key is of a first size and includes a first rule.
Methods and systems are disclosed for reducing power consumption by a system including a digital unit and an optical unit. Techniques disclosed comprise generating a workload signature of an incoming workload to be executed by the system. Based on the generated workload signature, techniques disclosed comprise matching the incoming workload with a profile of stored workload profiles. The workload profiles are generated by a trace capture unit. Based on the matched profile, a task submission transaction is sent to the optical unit of the system, representative of a request to execute the incoming workload by the optical unit.
G06F 1/329 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken by task scheduling
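A hedged sketch of the matching step described in the abstract above: an incoming workload is reduced to a signature and matched against stored profiles captured earlier by a trace capture unit; a match routes the task to the optical unit. The signature function, profile contents, and dispatch strings are assumptions for illustration.

import hashlib

def workload_signature(trace_events):
    """Collapse a workload's trace events into a stable signature."""
    digest = hashlib.sha256("|".join(trace_events).encode())
    return digest.hexdigest()[:16]

stored_profiles = {
    # signature -> profile (captured offline by the trace capture unit)
    workload_signature(["gemm", "gemm", "reduce"]): {"target": "optical"},
}

def dispatch(trace_events):
    sig = workload_signature(trace_events)
    profile = stored_profiles.get(sig)
    if profile and profile["target"] == "optical":
        return "task submission transaction -> optical unit"
    return "execute on digital unit"

print(dispatch(["gemm", "gemm", "reduce"]))   # routed to the optical unit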
68.
COMPONENT COOLER WITH MULTIPLE HEAT TRANSFER PATHS
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating component, and a second heat transfer element. The apparatus further includes a plurality of thermally conductive paths between the first heat transfer element and the second heat transfer element. Each of the plurality of thermally conductive paths is configured to provide a separate heat conduction path from the first heat transfer element to the second heat transfer element.
An apparatus for component cooling includes a first heat transfer element configured to be thermally coupled to a heat-generating electronic component, and a second heat transfer element. The apparatus further includes a plurality of heat transfer paths thermally coupled between the first heat transfer element and the second heat transfer element. Each of the plurality of heat transfer paths is configured to provide a separate heat conduction path from the first heat transfer element to the second heat transfer element. The apparatus further includes a manifold including a first fluid passage providing a first portion of a heat transfer fluid in thermal contact with the first heat transfer element, and a second fluid passage providing a second portion of the heat transfer fluid in thermal contact with the second heat transfer element.
An apparatus for component cooling includes a manifold and a heat transfer element configured to be thermally coupled to a heat-generating component. The apparatus further includes a first spring mechanism between the manifold and the heat transfer element. The first spring mechanism is configured to apply a first force to the heat transfer element.
A method of forming a semiconductor assembly includes forming a set of through-silicon vias in a carrier wafer, where a layer of the carrier wafer includes integrated devices. A die is coupled to a top surface of the carrier wafer including the set of through-silicon vias using hybrid bonding. One or more connection layers of the die are coupled to one or more of the through-silicon vias and coupled to one or more of the integrated devices. A second wafer is coupled to a top surface of the die. An amount is removed from a bottom surface of the carrier wafer that is parallel to and opposite to the top surface of the carrier wafer to reveal a conductive portion of at least one of the through-silicon vias.
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices, all the devices being of a type provided for in the same subgroup of groups, or in a single subclass of, e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group
72.
VECTOR PROCESSING UNIT WITH PROGRAMMABLE MULTICYCLE SHUFFLE UNIT
An integrated circuit includes a vector data processing unit that employs a cross-lane shuffle unit including multiplexing logic that programmably shuffles packed source lane values, each corresponding to one of a plurality of vector lanes, to different output vector result lane positions over multiple cycles. In certain implementations, in a first cycle, control logic in the cross-lane shuffle unit controls the multiplexing logic to select source lane values to be placed in a first group of output vector result lane positions for a vector result register; and in at least a second cycle, the same multiplexing logic is reused to select source lane values to be placed in a second group of output vector result lane positions for the vector result register, wherein at least one of the selected source lane values is moved to a different result lane position. Associated methods are also presented.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/345 - Addressing or accessing the instruction operand or the result, of multiple operands or results
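A toy model, in Python, of the multicycle shuffle described in the abstract above: one bank of multiplexers is reused across cycles, each cycle steering source lane values into one group of result lane positions. The lane count, group size, and function names are illustrative assumptions.

LANES = 8
GROUP = 4            # result lanes written per cycle by the shared muxes

def multicycle_shuffle(src, selects):
    """src: LANES source lane values; selects: per-result-lane source index."""
    result = [None] * LANES
    for cycle in range(LANES // GROUP):
        lo = cycle * GROUP
        for lane in range(lo, lo + GROUP):       # same muxes, new controls
            result[lane] = src[selects[lane]]
    return result

src = list(range(100, 108))
# Reverse the vector: each result lane selects the mirrored source lane.
print(multicycle_shuffle(src, [7 - i for i in range(LANES)]))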
A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
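A minimal sketch of the hierarchy described above: a local scheduler keeps the new work it can absorb and escalates overload to the next higher-level scheduler for redistribution. The capacity check is a stand-in assumption for the patent's overload criterion, and all names are illustrative.

from collections import deque

class Scheduler:
    def __init__(self, name, capacity, parent=None):
        self.name, self.capacity, self.parent = name, capacity, parent
        self.queue = deque()

    def submit(self, items):
        for item in items:
            if len(self.queue) < self.capacity:
                self.queue.append(item)          # execute locally
            elif self.parent is not None:
                self.parent.submit([item])       # escalate for redistribution
            else:
                self.queue.append(item)          # root must absorb everything

    def consume(self):
        """Consuming one work item produces a set of new work items,
        as in the abstract above."""
        item = self.queue.popleft()
        new_items = [f"{item}.child{i}" for i in range(3)]
        self.submit(new_items)

root = Scheduler("global", capacity=64)
domain0 = Scheduler("domain0", capacity=2, parent=root)
domain0.submit(["kernelA"])
domain0.consume()
print(list(domain0.queue), list(root.queue))  # overflow landed at the root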
A technique for servicing a memory request is disclosed. The technique includes obtaining permissions associated with a source and a destination specified by the memory request, obtaining a first set of address translations for the memory request, and executing operations for a first request, using the first set of address translations.
A memory controller includes a first arbiter for selecting memory commands for dispatch to a memory over a first channel, a second arbiter for selecting memory commands for dispatch to the memory over a second channel, and a test circuit. The test circuit generates a respective testing sequence of read commands and write commands for each of the first channel and second channel, and causes the testing sequences to be transmitted over the first and second channels at least partially overlapping in time without selection by the first or second arbiters.
A method includes, in a cache directory, storing a set of entries corresponding to one or more memory regions having a first region size when the cache directory is in a first configuration, and based on a workload sparsity metric, reconfiguring the cache directory to a second configuration. In the second configuration, each entry in the set of entries corresponds to a memory region having a second region size.
G06F 12/0895 - Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
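A hedged sketch of the reconfiguration described in the abstract above: when tracked regions are sparsely cached (few lines per region actually present), shrinking the region size spends directory entries only where data really lives. The sparsity metric, thresholds, and region sizes here are assumptions for illustration.

LINE = 64

def build_directory(cached_lines, region_size):
    """Map each region base address to the cached lines it covers."""
    directory = {}
    for addr in cached_lines:
        base = addr - (addr % region_size)
        directory.setdefault(base, set()).add(addr)
    return directory

def maybe_reconfigure(cached_lines, region_size, small=4096, large=65536):
    directory = build_directory(cached_lines, region_size)
    lines_per_entry = len(cached_lines) / max(1, len(directory))
    occupancy = lines_per_entry / (region_size / LINE)   # sparsity metric
    new_size = small if occupancy < 0.1 else large
    return build_directory(cached_lines, new_size), new_size

# One cached line per 64 KiB region -> sparse -> switch to 4 KiB regions.
lines = [i * 65536 for i in range(8)]
directory, size = maybe_reconfigure(lines, region_size=65536)
print(size, len(directory))   # 4096 8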
77.
SECURITY FOR SIMULTANEOUS MULTITHREADING PROCESSORS
A processor implements a simultaneous multithreading (SMT) protection mode that, when enabled, prevents execution of particular software (e.g., a virtual machine) at a processor core when a thread associated with different software (e.g., a different virtual machine or a hypervisor) is currently executing at the processor core. By preventing execution of the software, data, software execution patterns, and other potentially sensitive information are kept protected from unauthorized access or detection. Further, in at least some embodiments the SMT protection mode is implemented on a per-software basis, so that different software can choose whether to implement the protection mode, thereby allowing the processor to be employed in a wide variety of computing environments.
G06F 9/455 - Arrangements for executing specific programs; Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
A processor for implementing a last use cache policy is configured to access data in a portion of a cache, determine that the data in the portion of the cache is no longer needed, and mark the data in the portion of the cache as non-dirty responsive to determining that the data is no longer needed. Marking the data as non-dirty indicates that the data in the portion of the cache is not to be evicted from the cache to a memory.
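A minimal sketch of the last-use policy described above: once software knows a dirty line will not be read again, clearing its dirty bit lets eviction drop the line instead of writing it back. The API names here are illustrative assumptions.

class CacheLine:
    def __init__(self, addr, data):
        self.addr, self.data, self.dirty = addr, data, True

def mark_last_use(line):
    line.dirty = False        # data is dead; skip the writeback on eviction

def evict(line, memory):
    if line.dirty:
        memory[line.addr] = line.data    # normal dirty writeback
        return "written back"
    return "dropped without writeback"

memory = {}
scratch = CacheLine(0x2000, b"temporary results")
mark_last_use(scratch)                   # producer knows this is dead data
print(evict(scratch, memory), memory)    # dropped without writeback {}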
A device includes one or more processors and one or more parallel accelerated processors. Additionally, a system management unit is configured to monitor the one or more parallel accelerated processors and to obtain one or more power management metrics for a parallel accelerated processor. A host driver included in the device is configured to receive a guest request for one or more power management metrics of the parallel accelerated processor from a virtual function of a virtual machine executing on the device, transmit a host request for the one or more power management metrics to the system management unit in response to receiving the guest request, receive the one or more power management metrics from the system management unit, and transmit the one or more power management metrics to the virtual machine.
G06F 9/455 - Arrangements for executing specific programs; Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
Predicates for processing in memory are described. In accordance with the described techniques, a predicate instruction to compute a conditional value based on data stored in a memory is provided to a processing-in-memory component. A response that includes the conditional value computed by the processing-in-memory component is received, and the conditional value is stored in a predicate register. One or more conditional instructions are provided to the processing-in-memory component based on the conditional value stored in the predicate register.
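A sketch of the predicate flow described above: the host offloads a comparison to a processing-in-memory component, latches the returned conditional value in a predicate register, and uses it to gate which conditional instructions are offloaded next. The PIM model and its opcode names are stub assumptions.

class PIMComponent:
    def __init__(self, memory):
        self.memory = memory

    def execute(self, op, addr, imm=None):
        if op == "cmp_gt":                 # predicate instruction
            return self.memory[addr] > imm
        if op == "scale":                  # conditional instruction
            self.memory[addr] *= 2

pim = PIMComponent({0x40: 7})
predicate_register = pim.execute("cmp_gt", 0x40, imm=5)   # response: True
if predicate_register:                  # host-side gating on the predicate
    pim.execute("scale", 0x40)
print(pim.memory[0x40])                 # 14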
A method includes, in a cache directory, storing a set of entries corresponding to one or more memory regions having a first region size when the cache directory is in a first configuration, and based on a workload sparsity metric, reconfiguring the cache directory to a second configuration. In the second configuration, each entry in the set of entries corresponds to a memory region having a second region size.
G06F 12/0893 - Caches characterised by their organisation or structure
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
82.
SELECTING BETWEEN BASIC AND GLOBAL PERSISTENT FLUSH MODES
Selecting between basic and global persistent flush modes is described. In accordance with the described techniques, a system includes a data fabric in electronic communication with at least one cache and a controller configured to select between operating in a global persistent flush mode and a basic persistent flush mode based on an available flushing latency of the system, control the at least one cache to flush dirty data to the data fabric in response to a flush event trigger while operating in the global persistent flush mode, and transmit a signal to switch control to an application to flush the dirty data from the at least one cache to the data fabric while operating in the basic persistent flush mode.
Runtime flushing to persistency in heterogenous systems is described. In accordance with the described techniques, a system may include a persistent memory in electronic communication with at least one cache and a controller configured to command the at least one cache to flush dirty data to the persistent memory in response to a dirtiness of the at least one cache reaching a cache dirtiness threshold.
An apparatus for sub-cooling components includes a component cooling device, a processor thermally coupled to the component cooling device, an electronic component, a fan configured to direct an airflow across the processor and the electronic component, and a thermoelectric cooling device thermally coupled to the component cooling device. The thermoelectric cooling device is configured to cool the airflow from a first temperature to a second temperature.
H05K 7/20 - Modifications to facilitate cooling, ventilating, or heating
F25B 21/02 - Machines, plants or systems using electric or magnetic effects, using the Nernst-Ettingshausen effect
H01L 23/467 - Arrangements for cooling, heating, ventilating or temperature compensation involving the transfer of heat by flowing fluids by flowing gases, e.g. air
H01L 23/473 - Arrangements for cooling, heating, ventilating or temperature compensation involving the transfer of heat by flowing fluids by flowing liquids
H01L 23/38 - Cooling arrangements using the Peltier effect
Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
86.
ON-DEMAND REGULATION OF MEMORY BANDWIDTH UTILIZATION TO SERVICE REQUIREMENTS OF DISPLAY
Systems, apparatuses, and methods for prefetching data by a display controller are proposed. From time to time, a performance-state change of a memory is performed. During such changes, a memory clock frequency is changed for a memory subsystem (220) storing frame buffer(s) (230) used to drive pixels to a display device (250). During the performance-state change, memory accesses may be temporarily blocked. To sustain a desired quality of service for the display, a display controller (150) is configured to prefetch data in advance of the performance-state change. In order to ensure the display controller has sufficient memory bandwidth to accomplish the prefetch, bandwidth reduction circuitry (112A, 112N) in clients (205) of the system is configured to temporarily reduce the memory bandwidth of the corresponding clients.
G09G 5/00 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
G06F 1/324 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken by lowering the clock frequency
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode; Power saving characterised by the action undertaken
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
G09G 5/393 - Arrangements for updating the contents of the bit-mapped memory
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Generation or management of read-write [R-W] control signals
G09G 5/395 - Arrangements specially adapted for transferring the contents of the bit-mapped memory to the screen
Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler and a plurality of independent local schedulers, with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and the processors coupled to it. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers, which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the first local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of the scheduling performed by the other local schedulers.
Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler and a plurality of local schedulers, with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the indication in the mailbox, the first local scheduler identifies the location of the one or more work items in the shared cache and retrieves them for local scheduling.
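A sketch of the mailbox handshake described above: the global scheduler parks work items in the shared cache and drops a small indication into the local scheduler's mailbox; the local scheduler polls, follows the indication, and pulls the items. The data structures and names are illustrative assumptions.

from collections import deque

shared_cache = {}                       # tag -> list of work items
mailboxes = {"local0": deque()}         # one mailbox per local scheduler

def global_schedule(target, tag, work_items):
    shared_cache[tag] = work_items      # stage the work in the shared cache
    mailboxes[target].append(tag)       # then post the indication

def local_poll(name):
    if not mailboxes[name]:
        return []
    tag = mailboxes[name].popleft()     # indication found in the mailbox
    return shared_cache.pop(tag)        # locate and retrieve the work items

global_schedule("local0", tag=17, work_items=["draw0", "draw1"])
print(local_poll("local0"))             # ['draw0', 'draw1']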
The disclosed computing device includes a cache memory and at least one processor coupled to the cache memory. The at least one processor is configured to copy data written to one or more nonredundant wordlines of the cache memory to one or more redundant wordlines of the cache memory. The at least one processor is additionally configured to detect a mismatch between data read from the one or more nonredundant wordlines and data stored in the one or more redundant wordlines. The at least one processor is also configured to perform a remediation action in response to detecting the mismatch. Various other methods, systems, and computer-readable media are also disclosed.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
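A hedged sketch of the mirror-and-compare scheme in the abstract above: writes to nonredundant wordlines are copied to redundant wordlines, reads are compared against the copy, and a mismatch triggers remediation. The remediation shown (refetch from the copy) is one possibility among the actions the abstract leaves open.

class MirroredCache:
    def __init__(self):
        self.primary, self.redundant = {}, {}

    def write(self, line, data):
        self.primary[line] = data
        self.redundant[line] = data          # mirror into redundant wordline

    def read(self, line):
        data = self.primary[line]
        if data != self.redundant[line]:     # bit flip detected on readout
            data = self.redundant[line]      # remediation: trust the copy
            self.primary[line] = data
        return data

cache = MirroredCache()
cache.write(3, 0xDEADBEEF)
cache.primary[3] ^= 1 << 7                   # inject a single-bit error
print(hex(cache.read(3)))                    # 0xdeadbeef, repaired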
90.
SYSTEMS AND METHODS FOR GENERATING REMEDY RECOMMENDATIONS FOR POWER AND PERFORMANCE ISSUES WITHIN SEMICONDUCTOR SOFTWARE AND HARDWARE
A computer-implemented method is disclosed for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex failure patterns within the telemetry-based unknown outputs and ultimately generate one or more root-cause remedy recommendations that are specific to the identified failure and to the client computing device that is experiencing that failure.
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 11/22 - Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of interruptions or input/output operations
Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
A method is provided for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request using entries of the set stored across the multiple grains of the plurality of grains.
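An illustrative address decode for the bank/grain organization above: a set is striped across several grains of one bank so its ways can be probed in parallel. All field widths and counts are assumptions, not the patent's layout.

BANKS, GRAINS, SETS, WAYS_PER_GRAIN = 4, 4, 256, 2
LINE = 64

def decode(addr):
    line = addr // LINE
    return (line % BANKS,                 # bank, accessible in parallel
            (line // BANKS) % SETS)       # set index within the bank

def lookup(addr):
    bank, set_idx = decode(addr)
    # The set's ways live across all grains of the bank, so the tag
    # comparisons for one lookup proceed grain-parallel.
    return [(bank, grain, set_idx, way)
            for grain in range(GRAINS)
            for way in range(WAYS_PER_GRAIN)]

print(decode(0x12340))                    # (1, 35)
print(len(lookup(0x12340)), "ways probed in parallel")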
A processor [101] implements a simultaneous multithreading (SMT) protection mode that, when enabled, prevents execution of particular software [106] (e.g., a virtual machine) at a processor core [102] when a thread associated with different software (e.g., a different virtual machine or a hypervisor) is currently executing at the processor core. By preventing execution of the software, data, software execution patterns, and other potentially sensitive information are kept protected from unauthorized access or detection. Further, in at least some embodiments the SMT protection mode is implemented on a per-software basis, so that different software can choose whether to implement the protection mode, thereby allowing the processor to be employed in a wide variety of computing environments.
G06F 9/455 - Arrangements for executing specific programs; Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
A memory controller for generating accesses for a memory includes a row hammer logic circuit for providing a sample request. In response to the sample request, the memory controller generates a sample command for dispatch to the memory to cause the memory to capture a current row. In response to a completion of the sample command, the memory controller generates a mitigation command for dispatch to the memory.
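A sketch of the command sequence described above: row hammer logic raises a sample request, the controller dispatches a sample command so the memory captures its current row, and completion is followed by a mitigation command. The activation-interval policy is an illustrative assumption; real row hammer logic may use other triggers.

class RowHammerLogic:
    def __init__(self, interval):
        self.interval, self.activations = interval, 0

    def on_activate(self):
        """Return True (a sample request) every `interval` activations."""
        self.activations += 1
        return self.activations % self.interval == 0

def memory_controller(activates, rh_logic):
    commands = []
    for _ in range(activates):
        if rh_logic.on_activate():
            commands.append("SAMPLE")       # memory captures the current row
            commands.append("MITIGATE")     # then refresh that row's neighbors
    return commands

print(memory_controller(500, RowHammerLogic(interval=200)))
# ['SAMPLE', 'MITIGATE', 'SAMPLE', 'MITIGATE']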
A method for receiving a multi-level error signal having more than two logic levels includes oversampling the multi-level error signal to provide sampled symbols, wherein a first level of the multi-level error signal indicates no error, and second and third levels of the multi-level error signal indicate first and second error conditions, respectively. The sampled symbols are de-serialized to provide sets of symbols. A start of a symbol period is determined in response to detecting that a given sample is different from a prior sample and that the prior sample indicates no error. The sets of symbols are filtered to provide corresponding output symbols based on the start.
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
G06F 3/06 - Digital input from, or digital output to, record carriers
G11C 29/52 - Protection of memory contents; Detection of errors in memory contents
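A behavioral sketch of the receive path in the abstract above: the three-level error signal is oversampled, a symbol boundary is declared when a sample differs from a prior sample that indicated no error, and each symbol period is then filtered down to one output symbol. The 4x oversampling ratio and the majority-vote filter are illustrative assumptions.

NO_ERROR, ERR1, ERR2 = 0, 1, 2
OSR = 4                               # samples taken per symbol period

def find_symbol_start(samples):
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1] and samples[i - 1] == NO_ERROR:
            return i                  # first edge out of the idle level
    return 0

def filter_symbols(samples):
    start = find_symbol_start(samples)
    symbols = []
    for i in range(start, len(samples) - OSR + 1, OSR):
        window = samples[i:i + OSR]
        symbols.append(max(window, key=window.count))   # majority vote
    return symbols

# Idle line, then ERR1 and ERR2 symbols with one noisy sample each.
line = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0, 2, 0, 0, 0, 0]
print(filter_symbols(line))           # [1, 2, 0]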
96.
REMOTE DISPLAY SYNCHRONIZATION TO PRESERVE LOCAL DISPLAY
A remote display synchronization technique preserves the presence of a local display device for a remotely-rendered video stream. A server and a client device cooperate to dynamically determine a target frame rate for a stream of rendered frames that suits the current capacities of the server and the client device as well as prevailing network conditions. From this target frame rate, the server generates a synchronization signal that serves as timing control for the rendering process. The client device may provide feedback to instigate a change in the target frame rate, and thus a corresponding change in the synchronization signal. In this approach, the rendering frame rate and the encoding frequency may be "synchronized" in a manner consistent with the capacities of the server, the network, and the client device, resulting in generation, encoding, transmission, decoding, and presentation of a stream of frames that mitigates missed frame encodes while providing acceptable latency.
H04N 21/242 - Synchronization processes, e.g. processing of program clock references [PCR]
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
H04N 21/24 - Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, or upstream requests
A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.
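A sketch of the interleaving described above: PIM requests go to one bank group while ordinary memory requests are steered to the remaining, otherwise-idle bank groups. The queue handling is an illustrative stand-in for the controller's real arbitration.

from collections import deque

BANK_GROUPS = 4

def schedule(pim_requests, mem_requests, pim_group=0):
    pim_q, mem_q = deque(pim_requests), deque(mem_requests)
    issued = []
    other = [g for g in range(BANK_GROUPS) if g != pim_group]
    while pim_q or mem_q:
        slot = []
        if pim_q:
            slot.append((pim_group, pim_q.popleft()))   # PIM on its group
        for g in other:                                 # concurrently...
            if mem_q:
                slot.append((g, mem_q.popleft()))       # ...normal requests
        issued.append(slot)
    return issued

for cycle in schedule(["pim0", "pim1"], ["rd0", "rd1", "wr0", "rd2"]):
    print(cycle)
# Cycle 0: PIM on group 0 while groups 1-3 service rd0, rd1, wr0.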
A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X-bit binary portions of a first Y-bit binary value and a second Y-bit binary value, where Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits are coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X-bit binary portions of the first and second Y-bit binary values, respectively.
G06F 7/506 - Addition; Subtraction in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination, with simultaneous carry generation for, or propagation over, two or more stages
G06F 7/509 - Addition; Subtraction in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination, for multiple operands, e.g. digital integrators
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
99.
SIGNAL INTERFERENCE TESTING USING RELIABLE READ WRITE INTERFACE
A memory controller includes a first arbiter for selecting memory commands for dispatch to a memory over a first channel, a second arbiter for selecting memory commands for dispatch to the memory over a second channel, and a test circuit. The test circuit generates a respective testing sequence of read commands and write commands for each of the first channel and second channel, and causes the testing sequences to be transmitted over the first and second channels at least partially overlapping in time without selection by the first or second arbiters.
Systems, apparatuses, and methods for managing power and performance in a computing system. A system management unit detects a condition indicating that a change in a power-performance state of a given computing unit is warranted. In response to detecting the indication, the system management unit is configured to initiate a change to the frequency of a clock signal generated by an adaptive oscillator by changing a voltage supplied to the adaptive oscillator. The adaptive oscillator is configured to rapidly change the frequency of the generated clock signal in response to detecting a change in its droopy supply voltage. The new frequency generated by the adaptive oscillator is based in part on a difference between the droopy supply voltage and a regulated supply voltage of the adaptive oscillator.