Data integrity checks for reducing communication latency are described. A transmitting endpoint transmits data to a receiving endpoint by generating a first integrity tag for a first subset of data blocks and a second integrity tag for a second subset of data blocks. In implementations, the first and second integrity tags overlap in at least one data block and are offset based on computational complexities of generating the integrity tags. A receiving endpoint generates comparison tags for each of the integrity tags and uses the comparison tags to validate the authenticity of received data. In response to validating the first and second integrity tags, data blocks covered by both the first and second integrity tags are released for use. Additional integrity tags are generated and validated for subsequent subsets of data blocks during data communication, thus reducing latency by offsetting the times at which comparison tags are generated and validated.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
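As a rough illustration of the overlapping-tag scheme above, the following Python sketch uses HMAC-SHA256 as a stand-in for the integrity-tag function; the pre-shared key, subset size, one-block overlap, and release rule are illustrative assumptions rather than details taken from the abstract.

```python
import hashlib
import hmac

KEY = b"shared-secret"     # assumed pre-shared key
BLOCKS_PER_TAG = 4         # assumed subset size
OVERLAP = 1                # consecutive subsets share one data block

def tag(blocks):
    """Integrity tag over a subset of blocks (HMAC as a stand-in)."""
    return hmac.new(KEY, b"".join(blocks), hashlib.sha256).digest()

def sender(blocks):
    """Yield (start, subset, tag) triples with overlapping coverage."""
    step = BLOCKS_PER_TAG - OVERLAP
    for start in range(0, len(blocks) - OVERLAP, step):
        subset = blocks[start:start + BLOCKS_PER_TAG]
        yield start, subset, tag(subset)

def receiver(stream):
    """Validate each tag; release blocks corroborated by two valid tags."""
    released, prev_end = 0, None
    for start, subset, received_tag in stream:
        comparison = tag(subset)                 # the comparison tag
        if not hmac.compare_digest(comparison, received_tag):
            raise ValueError(f"integrity failure at block {start}")
        end = start + len(subset)
        if prev_end is not None:                 # overlap with previous tag
            released = max(released, min(prev_end, end))
        prev_end = end
    return released

data = [bytes([i]) * 8 for i in range(10)]
print("blocks released for use:", receiver(sender(data)))
```

Because tag generation for one subset can proceed while the previous subset is being validated, the offsets between subsets are what hide the tag-computation latency.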
2.
HYBRID MEMORY ARCHITECTURE FOR ADVANCED 3D SYSTEMS
Disclosed herein are stacked memory dies that utilize a mix of high and low operational temperature volatile and non-volatile memory dies, and chip packages containing the same. High temperature memory dies, such as those using non-volatile memory (NVM) technologies, are placed in a memory stack with low temperature memory dies, such as those using volatile memory technologies. In some cases, the high and low temperature memory technologies are used together, in some cases on the same IC die as logic circuitry. In one example, a memory stack is provided that includes a first memory IC die having high temperature memory circuitry, such as non-volatile memory, stacked below a second memory IC die. The second memory IC die has low temperature memory circuitry, such as volatile memory circuitry.
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
G11C 11/22 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using ferroelectric elements
G11C 11/401 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
3.
3D LAYOUT AND ORGANIZATION FOR ENHANCEMENT OF MODERN MEMORY SYSTEMS
Memory stacks having substantially vertical bitlines, and chip packages having the same, are disclosed herein. In one example, a memory stack is provided that includes a first memory IC die and a second memory IC die. The second memory IC die is stacked on the first memory IC die. Bitlines are routed through the first and second IC dies in a substantially vertical orientation. Wordlines within the first memory IC die are oriented orthogonal to the bitlines.
Error correction for stacked memory is described. In accordance with the described techniques, a system includes a plurality of error correction code engines to detect vulnerabilities in a stacked memory and coordinate at least one vulnerability detected for a portion of the stacked memory to at least one other portion of the stacked memory.
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes by adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
Dynamic memory operations are described. In accordance with the described techniques, a system includes a stacked memory and one or more memory monitors configured to monitor conditions of the stacked memory. A system manager is configured to receive the monitored conditions of the stacked memory from the one or more memory monitors, and dynamically adjust operation of the stacked memory based on the monitored conditions. In one or more implementations, a system includes a memory and at least one register configured to store a ranking for each of a plurality of portions of the memory. Each respective ranking is determined based on an associated retention time of the respective portion of the memory. A memory controller is configured to dynamically refresh the portions of the memory at different times based on the ranking for each of the plurality of portions of the memory stored in the at least one register.
G11C 5/04 - Supports for storage elements; Mounting or fixing of storage elements on such supports
G11C 11/4074 - Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
G06F 3/06 - Digital input from, or digital output to, record carriers
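A toy model of the retention-ranked refresh described in the second paragraph above: portions with shorter retention times rank as more urgent and are refreshed on shorter periods. The portion names, retention values, and scheduling loop are illustrative assumptions.

```python
import heapq

# Assumed per-portion retention times in milliseconds; in the described
# system these rankings would live in the at least one register.
retention_ms = {"portion0": 64, "portion1": 32, "portion2": 128}

def refresh_schedule(horizon_ms):
    """Produce (time, portion) refresh events ordered by due time, so
    short-retention portions are refreshed more often."""
    heap = [(t, portion) for portion, t in retention_ms.items()]
    heapq.heapify(heap)
    events = []
    while heap:
        due, portion = heapq.heappop(heap)
        if due > horizon_ms:
            break
        events.append((due, portion))
        heapq.heappush(heap, (due + retention_ms[portion], portion))
    return events

for t, portion in refresh_schedule(200):
    print(f"t={t:4d} ms  refresh {portion}")
```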
6.
MEMORY CONTROLLER AND NEAR-MEMORY SUPPORT FOR SPARSE ACCESSES
A data processing system includes a data processor and a memory controller that receives memory access requests from the data processor and generates at least one memory access cycle to a memory system in response to the requests. The memory controller includes a command queue and a sparse element processor. The command queue receives and stores the memory access requests, including a first memory access request that includes a small element request. The sparse element processor causes the memory controller to issue a second memory access request to the memory system in response to the first memory access request, with a density greater than a density indicated by the first memory access request.
A data processing node includes a processor element and a data fabric circuit. The data fabric circuit is coupled to the processor element and to a local memory element and includes a crossbar switch. The data fabric circuit is operable to bypass the crossbar switch for memory access requests between the processor element and the local memory element.
An electronic device includes a processor having processor circuitry and a leader memory controller, a controller coupled to the processor and having a follower memory controller, and a memory. The processor circuitry is operable to access the memory by issuing memory access requests to the leader memory controller. The leader memory controller is operable to complete the memory access requests using the follower memory controller to issue memory commands to the memory.
A method for forming a semiconductor assembly that includes forming a first set of layers on a first wafer, where one or more layers of the first set includes one or more devices of the semiconductor assembly. The method further includes forming a second set of layers on a second wafer, where one or more layers of the second set include connections between one or more of the devices of the semiconductor assembly. The method additionally includes coupling a layer of the first set to a layer of the second set using metal to metal hybrid bonding.
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 25/065 - Assemblies consisting of a plurality of semiconductor or other solid state devices the devices being all of a type provided for in the same subgroup of groups, or in a single subclass of, e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
Random access memory (RAM) is attached to an input/output (I/O) controller of a chipset (e.g., on a motherboard). This chipset attached RAM is optionally used as part of a tiered storage solution with other tiers including, for example, nonvolatile memory (e.g., a solid state drive (SSD)) or a hard disk drive. The chipset attached RAM is separate from the system memory, allowing the chipset attached RAM to be used to speed up access to frequently used data stored in the tiered storage solution without reducing the amount of system memory available to an operating system running on the one or more processing units.
A disclosed method can include (i) positioning a first surface of a component of a semiconductor device on a first plated through-hole, (ii) covering, with a layer of dielectric material, at least a second surface of the component that is opposite the first surface of the component, (iii) removing a portion of the layer of dielectric material covering the second surface of the component to form at least one cavity, and (iv) depositing conductive material in the cavity to form a second plated through-hole on the second surface of the component. Various other apparatuses, systems, and methods are also disclosed.
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 23/498 - Electrical connections on insulating substrates
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within the device
12.
SYSTEMS AND METHODS FOR DEFOGGING IMAGES AND VIDEO
Systems, apparatuses, and methods for implementing defogging techniques for images and video are disclosed. A defogging engine generates a defog filter result from a grayscale format version of an input image. An estimation engine generates an enhancement strength variable from a hue-saturation-value (HSV) format version of the input image. An enhancement engine receives both the defog filter result from the defogging engine and the enhancement strength variable from the estimation engine. The enhancement engine also receives the original red-green-blue (RGB) color space format version of the input image. The enhancement engine generates an enhanced version of the input image from the original RGB format version based on the defog filter result and the enhancement strength variable. The enhanced version of the input image mitigates fog, haze, mist or other environmental impediments that obscured the original input image.
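A minimal Python sketch of the three-branch pipeline above. The defog filter (a global contrast stretch on the grayscale version) and the strength estimator (one minus mean HSV saturation, since fog desaturates a scene) are crude stand-ins for the patented engines, chosen only to make the data flow concrete.

```python
import numpy as np

def defog_filter(gray):
    """Stand-in defog result from the grayscale version: contrast stretch."""
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / max(hi - lo, 1e-6)

def enhancement_strength(rgb):
    """Stand-in strength from the HSV version: foggy scenes have low
    saturation, so use (1 - mean saturation)."""
    v = rgb.max(axis=-1)
    s = np.where(v > 0, (v - rgb.min(axis=-1)) / np.maximum(v, 1e-6), 0.0)
    return float(np.clip(1.0 - s.mean(), 0.0, 1.0))

def enhance(rgb):
    """Combine the original RGB image, the defog filter result, and the
    enhancement strength into the enhanced output."""
    gray = rgb.mean(axis=-1)
    defogged = defog_filter(gray)
    k = enhancement_strength(rgb)
    gain = np.where(gray > 0, defogged / np.maximum(gray, 1e-6), 1.0)
    return np.clip(rgb * ((1 - k) + k * gain)[..., None], 0.0, 1.0)

img = np.random.rand(4, 4, 3) * 0.3 + 0.6   # flat, hazy-looking input
print("strength:", round(enhancement_strength(img), 3))
print("enhanced shape:", enhance(img).shape)
```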
An apparatus and method for efficiently performing power management for a multi-client computing system. In various implementations, a computing system includes multiple clients that process tasks corresponding to applications. The clients store generated requests of a particular type while processing tasks. A client receives an indication specifying that another client's requests of the particular type are being serviced. In response to receiving this indication, the client inserts a first urgency level in one or more stored requests of the particular type prior to sending the requests for servicing. When the client determines a particular time interval has elapsed, the client sends an indication to other clients specifying that requests of the particular type are being serviced. The client also inserts a second urgency level, different from the first urgency level, in one or more stored requests of the particular type prior to sending the requests for servicing.
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken
Systems, apparatuses, and methods for automatic firmware provision of high speed serializer/deserializer (SERDES) links are disclosed. A system on chip (SoC) includes one or more microcontrollers, a programmable interconnect, a plurality of physical layer engines, and a plurality of SERDES lanes. The programmable interconnect and the plurality of SERDES lanes are able to support communication protocols for interfaces such as PCIE, SATA, Ethernet, and others. On bootup, the SoC receives a custom specification of how the plurality of SERDES lanes are to be configured. The one or more microcontrollers generate a mapping for the programmable interconnect based on the specification. The mapping is used to configure the programmable interconnect to match the specification. The result is the programmable interconnect connecting the plurality of SERDES lanes to the appropriate physical layer circuits to implement the desired configuration.
An apparatus and method for efficiently supporting multiple peripheral communication protocols in a computing system. A computing system includes multiple servers with one or more of the servers using multiple connectors for connecting to multiple peripheral devices such as data storage devices. At least one of the connectors is able to support multiple communication protocols, rather than a single communication protocol. A processor of the server determines a peripheral device has been attached to a connector that supports multiple communication protocols, and the processor determines whether one of the multiple communication protocols supported by the particular connector matches the attached peripheral device's communication protocol. If so, the processor configures the connector with the matching communication protocol. Otherwise, the processor generates an indication that specifies that there is no match.
A system and method for efficiently designing through silicon via (TSV) macro blocks are described. In various implementations, the circuitry of a processor executes instructions of a place and route tool that provides automatic placement of macro blocks and standard cells on an integrated circuit die based on a copy of a netlist of the integrated circuit being designed and a copy of a standard cell library that includes a variety of standard cells and macro blocks. The processor places two functional macros in the floorplan with a channel between them. In the channel, the processor places a TSV macro that includes at least one boundary cell inside of the TSV macro. The processor prevents placement of a boundary cell adjacent to at least one side of the TSV macro even though empty space exists because no standard cells or macros abut the at least one side.
G06F 30/392 - Floorplanning or layout, e.g. partitioning or positioning
G06F 30/398 - Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematic [LVS] or finite element methods [FEM]
G06F 111/20 - Configuration CAD, e.g. designing by assembling or positioning modules selected from libraries of predesigned modules
G06F 119/22 - Yield analysis or yield optimisation
A technique for operating a cache is disclosed. The technique includes based on a workload change, identifying a first allocation permissions policy; operating the cache according to the first allocation permissions policy; based on set sampling, identifying a second allocation permissions policy; and operating the cache according to the second allocation permissions policy.
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
Systems, apparatuses, and methods for dynamically estimating power losses in a computing system. A system management circuit tracks a state of a computing system and dynamically estimates power losses in the computing system based in part on the state. Based on the estimated power losses, power consumption of the computing system is estimated. In response to detecting reduced power losses in at least a portion of the computing system, the system management circuit is configured to increase a power-performance state of one or more circuits of the computing system while remaining within a power allocation limit of the computing system.
G06F 1/30 - Means for acting in the event of power-supply failure or interruption
G06F 1/3206 - Monitoring of events, devices or parameters initiating a change in power mode
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken
G06F 1/28 - Supervision, e.g. detecting power-supply failure by crossing of thresholds
Systems, apparatuses, and methods for managing power allocation in a computing system. A system management unit detects a condition indicating a change in power. Such a condition may be an indication that a power change is required, possible, or requested. In response to detecting that a reduction in power is indicated, the system management unit identifies currently executing tasks of the computing system and accesses sensitivity data to determine which of a number of computing units (or power domains) to select for power reduction. Based at least in part on the data, a unit is identified that is determined to have a relatively low sensitivity to power state changes under the current operating conditions. A relatively low sensitivity indicates that a change in power to the corresponding unit will not have as significant an impact on overall performance of the computing system as it would if another unit were selected. Power allocated for the selected unit is then decreased.
G06F 1/3206 - Monitoring of events, devices or parameters initiating a change in power mode
G06F 1/324 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by reducing the clock frequency
G06F 1/329 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by task scheduling
G06F 1/3296 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by lowering the supply or operating voltage
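The selection step lends itself to a very small sketch. Assuming the sensitivity data is already reduced to one scalar per power domain (lower meaning a power cut hurts overall performance less), the unit to throttle is just the argmin; the names and numbers below are invented for illustration.

```python
# Assumed sensitivity under current operating conditions and current
# power allocations, per computing unit (power domain).
sensitivity = {"cpu": 0.9, "gpu": 0.4, "fabric": 0.6}
allocation_w = {"cpu": 45.0, "gpu": 120.0, "fabric": 15.0}

def reduce_power(watts_to_shed):
    """Pick the least-sensitive unit and decrease its allocation."""
    unit = min(sensitivity, key=sensitivity.get)
    allocation_w[unit] = max(allocation_w[unit] - watts_to_shed, 0.0)
    return unit, allocation_w[unit]

unit, new_alloc = reduce_power(20.0)
print(f"reduced {unit} to {new_alloc} W")
```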
20.
LTH AND SVLC HYBRID CORE ARCHITECTURE FOR LOWER COST COMPONENT EMBEDDING IN PACKAGE SUBSTRATE
An apparatus and method for efficiently transferring information as signals through a silicon package substrate. A semiconductor fabrication process (or process) begins with a relatively thin package substrate core layer and uses lasers to create openings in the package substrate at locations of the signal routes. The use of each of the relatively thin core layer and the lasers allows for reduction in the pitch of the signal routes. The process creates signal routes in the openings using stacked vias from one side of the package substrate to an opposite side of the package substrate. Additionally, the process forms the package substrate with multiple embedded passive components with different thicknesses in different layers of the package substrate. The embedded passive components are used to improve signal integrity of the signal routes.
H01L 23/498 - Electrical connections on insulating substrates
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 21/48 - Manufacture or treatment of parts, e.g. containers, prior to assembly of the devices, using processes not provided for in a single one of the groups
A data processor is adapted to couple to a memory. The data processor includes a memory operation array, a power engine, and an initialization circuit. The memory operation array includes a command portion and a data portion. The power engine has an input for receiving power state change request signals and an output for providing memory operations responsive to instructions stored in the command portion. The initialization circuit populates the data portion such that consecutive memory operations are separated by an amount corresponding to a predetermined minimum timing parameter.
A memory includes a link training circuit with a pseudo-random bit sequence (PRBS) generator and a burst error detection counter. The burst error detection counter includes a comparator, a first input coupled to a data input, a second input coupled to the PRBS generator, and a counter operable to increase an error count value by one responsive to detecting any number of errors greater than zero in a sequence including a predetermined number of symbols.
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
G06F 3/06 - Digital input from, or digital output to, record carriers
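The counting rule (any number of symbol errors inside one window bumps the count by exactly one) is easy to model in software. This Python sketch treats a random list as the PRBS reference and uses an assumed window of 8 symbols.

```python
import random

SYMBOLS_PER_WINDOW = 8   # the assumed "predetermined number of symbols"

def burst_error_count(received, expected):
    """Count windows containing at least one mismatched symbol, so a
    multi-symbol burst inside one window increases the count by one."""
    count = 0
    for i in range(0, len(received), SYMBOLS_PER_WINDOW):
        r = received[i:i + SYMBOLS_PER_WINDOW]
        e = expected[i:i + SYMBOLS_PER_WINDOW]
        if any(a != b for a, b in zip(r, e)):
            count += 1
    return count

random.seed(0)
expected = [random.randrange(256) for _ in range(64)]   # PRBS stand-in
received = list(expected)
for j in (10, 11, 12):          # a 3-symbol burst within one window
    received[j] ^= 0xFF
print("error count:", burst_error_count(received, expected))   # -> 1
```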
A processing system includes a parallel processing unit that selectively allocates pages of memory for interleaving across configurable subsets of channels based on a mode of allocation. In some embodiments, in a first mode, a page of memory is allocated to and interleaved across a plurality of channels, and in a second mode, a page of memory is allocated to and interleaved across a subset of the plurality of channels.
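A sketch of the two allocation modes as an address-to-channel mapping. The channel count and interleave granule are assumptions; the point is that the modulus switches between all channels and a configured subset.

```python
NUM_CHANNELS = 8
INTERLEAVE_GRANULE = 256   # assumed bytes per interleave unit

def channel_for(address, mode, subset=None):
    """Map an address within an allocated page to a memory channel."""
    unit = address // INTERLEAVE_GRANULE
    if mode == "all":                    # first mode: every channel
        return unit % NUM_CHANNELS
    channels = sorted(subset)            # second mode: configured subset
    return channels[unit % len(channels)]

page = range(0, 4096, INTERLEAVE_GRANULE)
print([channel_for(a, "all") for a in page])
print([channel_for(a, "subset", subset={0, 1, 2, 3}) for a in page])
```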
A processing system divides an image to be rendered into one or more tiles and performs a visibility pass on the primitives of the image. During the visibility pass, the processing system generates visibility data for each primitive of a draw call of the image based on a visible primitive count and a visible draw call count. In response to a primitive of the draw call being visible in a first tile, the processing system increments the visible primitive count and generates visibility data indicating that the primitives of the draw call are to be rendered using draw call index data stored in an on-chip memory. If the primitive is the first visible primitive of the draw call, the processing system further increments the visible draw call count. Additionally, the processing system renders the primitives of the draw call using the draw call index data stored in the on-chip memory.
The disclosed system may include a processor configured to encode, using an encoding scheme that reduces a number of bits needed to represent one or more instructions from a set of instructions in an instruction buffer represented by a dependency matrix, a dependency indicating that a child instruction represented in the dependency matrix depends on a parent instruction represented in the dependency matrix. The processor may also be configured to store the encoded dependency in the dependency matrix and dispatch instructions in the instruction buffer based at least on decoding one or more dependencies stored in the dependency matrix for the instructions. Various other methods, systems, and computer-readable media are also disclosed.
A technique for operating a cache is disclosed. The technique includes utilizing a first portion of a cache in a directly accessed manner; and utilizing a second portion of the cache as a cache.
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
A technique for operating a cache is disclosed. The technique includes recording access data for a first set of memory accesses of a first frame; identifying parameters for a second set of memory accesses of a second frame subsequent to the first frame, based on the access data; and applying the parameters to the second set of memory accesses.
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
28.
REST-OF-CHIP POWER OPTIMIZATION THROUGH DATA FABRIC PERFORMANCE STATE MANAGEMENT
Methods and systems are disclosed for managing performance states of a data fabric of a system on chip (SoC). Techniques disclosed include determining a performance state of the data fabric based on data fabric bandwidth utilizations of respective components of the SoC. A metric characteristic of a workload centered on the cores of the SoC is derived from hardware counters, and, based on the metric, it is determined whether to alter the performance state.
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken
G06F 1/324 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by reducing the clock frequency
29.
DATA ENCRYPTION SUITABLE FOR USE IN SYSTEMS WITH PROCESSING-IN-MEMORY
An encryption circuit includes an iterative block cipher circuit. The iterative block cipher circuit has a counter input for a row index, a key input for receiving a secret key, and an output for providing an encrypted counter value in response to performing a block cipher process using the row index as a counter and the secret key. The encryption circuit uses the iterative block cipher circuit during a row operation to a memory.
G06F 21/72 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in cryptographic circuits
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise coding, e.g. DES systems
G06F 21/79 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in semiconductor storage media, e.g. directly-addressable memories
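A counter-mode sketch of the circuit described above, in Python. A keyed PRF (HMAC-SHA256) stands in for the iterative block cipher, and the keystream layout is an assumption; the essential structure is that the row index drives the counter input and the result is XORed with row data.

```python
import hashlib
import hmac

def block_cipher(counter_block, key):
    """Stand-in for the iterative block cipher (a real design would use
    e.g. AES): a keyed pseudorandom function of the counter block."""
    return hmac.new(key, counter_block, hashlib.sha256).digest()

def encrypt_row(row_bytes, row_index, key):
    """Encrypt the row-index counter under the secret key, then XOR the
    resulting keystream with the row data (counter mode)."""
    keystream, block = b"", 0
    while len(keystream) < len(row_bytes):
        counter = row_index.to_bytes(8, "little") + block.to_bytes(8, "little")
        keystream += block_cipher(counter, key)
        block += 1
    return bytes(d ^ k for d, k in zip(row_bytes, keystream))

key = b"secret key material"
row = b"processing-in-memory row data .."
ct = encrypt_row(row, row_index=42, key=key)
assert encrypt_row(ct, 42, key) == row   # XOR keystream inverts itself
print(ct.hex())
```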
30.
CHANNEL AND SUB-CHANNEL THROTTLING FOR MEMORY CONTROLLERS
An arbiter is operable to pick commands from a command queue for dispatch to a memory. The arbiter includes a traffic throttle circuit for mitigating excess power usage increases in coordination with one or more additional arbiters. The traffic throttle circuit includes a monitoring circuit and a throttle circuit. The monitoring circuit is for measuring a number of read and write commands picked by the arbiter and the one or more additional arbiters over a first predetermined period of time. The throttle circuit, responsive to a low activity state, limits a number of read and write commands issued by the arbiter during a second predetermined period of time.
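The monitor/throttle split can be modeled with a sliding window of pick events. The window length, activity threshold, and budget below are invented constants; a real arbiter would also fold in counts from the cooperating arbiters.

```python
from collections import deque

WINDOW = 100        # assumed first predetermined period, in cycles
LOW_ACTIVITY = 20   # assumed threshold marking a low activity state
BUDGET = 10         # assumed command budget while throttled

class TrafficThrottle:
    """Count read/write picks over a window and cap the issue rate."""
    def __init__(self):
        self.history = deque(maxlen=WINDOW)   # one entry per cycle

    def record_pick(self, picked: bool):
        self.history.append(1 if picked else 0)

    def commands_allowed(self) -> int:
        """Limit commands in the next period after low activity, to
        mitigate a sudden power-usage increase."""
        if sum(self.history) < LOW_ACTIVITY:
            return BUDGET
        return WINDOW   # effectively unthrottled

throttle = TrafficThrottle()
for cycle in range(WINDOW):
    throttle.record_pick(cycle % 10 == 0)     # sparse traffic
print("commands allowed next period:", throttle.commands_allowed())
```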
A disclosed method for making efficient picks of micro-operations for execution includes selecting a first set of micro-operations that are ready for execution during a certain clock cycle. The method also includes selecting a second set of micro-operations that are ready for execution during the certain clock cycle. The method additionally includes replacing one or more of the complex micro-operations included in the first set of micro-operations with one or more simple micro-operations included in the second set of micro-operations due at least in part to a number of complex micro-operations included in the first set of micro-operations exceeding a number of complex resources capable of executing the complex micro-operations. Various other apparatuses, systems, and methods are also disclosed.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by the groups or for performing logical operations
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
An apparatus and method for efficiently managing balanced performance among replicated partitions of an integrated circuit despite loss of functionality due to manufacturing defects. A processing unit includes at least two replicated partitions, each assigned to operation parameters of a respective power domain. The partitions include multiple compute units. The compute units include multiple lanes of execution. Due to a variety of types of manufacturing defects, one or more of the partitions of the processing unit has less than a predetermined number of operational compute units. To balance the throughput of the multiple partitions, a power manager generates both static and dynamic scaling factors based on at least the corresponding number of operational compute units. Using these scaling factors, the power manager adjusts the operation parameters of power domains for the partitions relative to one another.
G06F 1/324 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by reducing the clock frequency
G06F 1/3296 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by lowering the supply or operating voltage
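The balancing arithmetic reduces to a simple sketch: scale each partition by the ratio of the weakest partition's operational compute units to its own. The CU counts and base clock are assumptions, and a real power manager would also apply the dynamic factors the abstract mentions.

```python
FULL_CUS = 64   # assumed CU count of a defect-free partition

def static_scaling_factors(operational_cus):
    """Slow fully-populated partitions to match the weakest one so the
    replicated partitions deliver balanced throughput."""
    weakest = min(operational_cus)
    return [weakest / cus for cus in operational_cus]

def adjusted_clocks(base_mhz, operational_cus):
    return [base_mhz * f for f in static_scaling_factors(operational_cus)]

cus = [FULL_CUS, FULL_CUS - 4]        # partition 1 lost 4 CUs to defects
print(static_scaling_factors(cus))    # [0.9375, 1.0]
print(adjusted_clocks(2000.0, cus))   # partition 0 clocked down to match
```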
33.
ADAPTIVE THREAD MANAGEMENT FOR HETEROGENEOUS COMPUTING ARCHITECTURES
An apparatus and method for efficiently scheduling tasks in a dynamic manner to multiple cores that support a heterogeneous computing architecture. A computing system includes multiple cores with at least two cores being capable of executing instructions of a same instruction set architecture (ISA), and therefore, are architecturally compatible. In an implementation, each of the at least two cores is a general-purpose central processing unit (CPU) core capable of executing instructions of a same ISA. However, the throughput and the power consumption greatly differ between the at least two cores based on their hardware designs. An operating system scheduler assigns a thread to a first core, and the first core measures thread dynamic behavior of the thread over a time interval. Based on the thread dynamic behavior, the scheduler reassigns the thread to a second core different from the first core.
A data processor is for accessing a memory having a first pseudo channel and a second pseudo channel. The data processor includes at least one memory accessing agent, a memory controller, and a data fabric. The at least one memory accessing agent generates memory access requests including first memory access requests that access the memory. The memory controller provides memory commands to the memory in response to the first memory access requests. The data fabric routes the first memory access requests to a first downstream port in response to a corresponding first memory request accessing the first pseudo channel, and to a second downstream port in response to the corresponding first memory request accessing the second pseudo channel. The memory controller has first and second upstream ports coupled to the first and second downstream ports of the data fabric, respectively, and a downstream port coupled to the memory.
A data processor accesses a memory having a first pseudo channel and a second pseudo channel. The data processor includes at least one memory accessing agent for generating a memory access request, a memory controller for providing a memory command to the memory in response to a normalized request selectively using a first pseudo channel pipeline circuit and a second pseudo channel pipeline circuit, and a data fabric for converting the memory access request into the normalized request selectively for the first pseudo channel and the second pseudo channel.
A processing system [100] includes a processing device [230] coupled to a memory [106] configured to check for and correct faults in requested data. In response to correcting the faults of the requested data, the memory sends the corrected data and unused check bits to the processing device as a plurality of fetch returns. The memory also sends a parity fetch based on the corrected data and one or more operations to the processing device. After receiving the plurality of fetch returns and the unused check bits, the processing device checks each fetch return for faults based on the unused check bits. In response to determining that a fetch return includes a fault, the processing device erases the fetch return and reconstructs the fetch return based on one or more other received fetch returns and the parity fetch.
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes by adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
37.
VOLUME INTERSECTION USING ROTATED BOUNDING VOLUMES
One or more rotated bounding volumes are generated for one or more nodes of a bounding volume hierarchy (BVH). Volume intersection ray tracing tests are then performed using the rotated bounding volumes with the aim of reducing the number of calculations required relative to an original, non-rotated bounding volume. Rotated bounding volumes are selected from a plurality of candidate rotations, and selection of one of the candidate rotations is based on surface areas, such as minimum total surface areas, of bounding volumes corresponding to each of the candidate rotations. In order to minimize data storage and increase performance, the number of candidate rotations may be limited to a predetermined set of rotations.
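The surface-area selection rule is straightforward to sketch with numpy. The candidate set (rotations about one axis in 15-degree steps) is an assumed predetermined set, not the one from the disclosure.

```python
import numpy as np

def aabb_surface_area(points):
    w, h, d = points.max(axis=0) - points.min(axis=0)
    return 2.0 * (w * h + w * d + h * d)

def rot_z(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Assumed predetermined candidate set: rotations about Z in 15-degree steps.
CANDIDATES = [rot_z(d) for d in range(0, 90, 15)]

def best_rotation(points):
    """Pick the candidate whose rotated AABB has minimum surface area."""
    areas = [aabb_surface_area(points @ r.T) for r in CANDIDATES]
    best = int(np.argmin(areas))
    return CANDIDATES[best], areas[best]

rng = np.random.default_rng(1)
# An elongated, diagonally oriented point cloud: a rotated box fits better.
pts = rng.normal(size=(100, 3)) * [5.0, 0.5, 0.5] @ rot_z(45).T
_, area = best_rotation(pts)
print("minimum AABB surface area:", round(float(area), 2))
```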
The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/345 - Addressing or accessing the instruction operand or the result of multiple operands or results
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
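A sketch of interpolating a register-encoded table. The packing (eight 8-bit entries across two 32-bit words) is an assumed encoding chosen for the example, not the encoding from the disclosure.

```python
# Two 32-bit "registers" holding eight 8-bit table entries.
REGISTERS = [0x40302010, 0x80706050]

def entry(i):
    """Decode the i-th 8-bit entry from the packed registers."""
    word = REGISTERS[i // 4]
    return (word >> (8 * (i % 4))) & 0xFF

def lookup(x):
    """Respond to a lookup at fractional index x in [0, 7] by linearly
    interpolating a representation of the requested value."""
    i = min(int(x), 6)
    frac = x - i
    return entry(i) * (1.0 - frac) + entry(i + 1) * frac

print(hex(entry(0)), hex(entry(7)))   # 0x10 0x80
print(lookup(2.5))                    # halfway between entries 2 and 3
```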
The disclosed method may include detecting, by a control circuit coupled to a first read only memory (ROM) device and a second ROM device, a failure of a first output signal from the first ROM device to a common output. The first ROM device is connected to the common output and the second ROM device is disconnected from the common output. The method also includes switching, by the control circuit in response to detecting the failure, the common output from the first ROM device to the second ROM device. Various other methods, systems, and computer-readable media are also disclosed.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/08 - Error detection or correction by redundancy in data representation, e.g. by using checking codes
40.
PIPELINE DELAY ELIMINATION WITH PARALLEL TWO LEVEL PRIMITIVE BATCH BINNING
A technique for rendering is provided. The technique includes, for a set of primitives processed in a coarse binning pass, outputting early draw data to an early draw buffer; while processing the set of primitives in the coarse binning pass, processing the early draw data in a fine binning pass; and processing remaining primitives of the set of primitives in the fine binning pass.
Real time workload-based system adjustment is described. In accordance with the described techniques, a processor and a memory are operated according to first settings associated with a first workload. A second workload configured to utilize the processor and the memory is detected. The second workload is associated with second settings. Responsive to detecting the second workload, operation of the processor and the memory are adjusted to operate according to the second settings without rebooting.
A technique for operating a cache is disclosed. The technique includes, in response to a power down trigger indicating that cache effectiveness is considered to be low, powering down the cache.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
Methods and systems are disclosed for cross-chiplet performance data streaming. Techniques disclosed include accumulating, by a subservient chiplet, event data associated with an event indicative of a performance aspect of the subservient chiplet; sending, by the subservient chiplet, the event data over a chiplet bus to a master chiplet; and adding, by the master chiplet, the received event data to an event record, the event record containing previously received, from the subservient chiplet over the chiplet bus, event data associated with the event.
Core activation and deactivation for a multi-core processor is described. In accordance with the described techniques, a processor having multiple cores operates using a first core configuration. A request to switch from the first core configuration to a second core configuration is received. Responsive to the request, a switch from the first core configuration to the second core configuration occurs by adjusting a number of active cores of the processor without rebooting.
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by disconnecting faulty elements or by switching in spare elements
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken
Array of pointers prefetching is described. In accordance with described techniques, a pointer target instruction is detected by identifying that a destination location of a load instruction is used in an address compute for a memory operation and the load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction for fetching data of a future load instruction is injected in an instruction stream of a processor. The data of the future load instruction is stored in a temporary register. An additional instruction is injected in the instruction stream for prefetching a pointer target based on an address of the memory operation and the data of the future load instruction.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
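A software model of the injected instructions, assuming a strided stream of pointer loads in which each loaded value is itself an address used by a later memory operation. The dictionary 'memory', the step size, and the prefetch distance are all illustrative.

```python
STEP = 1       # assumed step size between the strided pointer loads
DISTANCE = 4   # how many loads ahead the injected fetch runs

pointers = list(range(0x1000, 0x1000 + 10 * 8, 8))
memory = {0x100 + i: p for i, p in enumerate(pointers)}   # pointer array
memory.update({p: f"payload@{p:#x}" for p in pointers})   # targets

prefetched, hits = set(), 0

for i in range(len(pointers)):
    ptr = memory[0x100 + i]             # the detected strided load
    future_addr = 0x100 + i + DISTANCE * STEP
    if future_addr in memory:
        temp_reg = memory[future_addr]  # injected: fetch the future load
        prefetched.add(temp_reg)        # injected: prefetch its target
    if ptr in prefetched:               # dependent memory operation
        hits += 1
    _ = memory[ptr]

print(f"{hits}/{len(pointers)} pointer dereferences hit the prefetch")
```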
Package lids with carveouts configured for processor connection and alignment are described. Lid carveouts are configured to align and mechanically secure a cooling device to the package lid by receiving protrusions of the cooling device. Because the lid carveouts ensure precise alignment and orientation of a cooling device relative to a package lid, the lid design enables targeted cooling of discrete portions of the lid. Lid carveouts are further configured to expose one or more connectors disposed on a surface that supports package internal components. When contacted by corresponding connectors of a cooling device, the lid carveouts enable direct connections between the package and the attached cooling device. By creating a direct connection between package components and an attached cooling device, the lid carveouts enable a high-speed connection for proactive and on-demand cooling actuation.
A technique for performing ray tracing operations is provided. The technique includes, in response to detecting that a threshold number of traversal stage work-items of a wavefront have terminated, increasing intersection test parallelization for non-terminated work-items.
Package lids with carveouts configured to expose lights directly connected to an internal component of a processor are described. Lid carveouts are configured to precisely align and mechanically secure a cooling device to the package lid by receiving protrusions of the cooling device via a press fit connection, while maintaining visibility of lights directly connected to processor internal components when the cooling device is connected. Lid carveouts are further configured to expose one or more connectors disposed on a processor surface that supports its internal component. When contacted by corresponding connectors of an auxiliary device, such as a light not integrated into a processor package or a cooling device, the lid carveouts enable direct connections between the package's internal components and the auxiliary device.
Load dependent branch prediction is described. In accordance with described techniques, a load dependent branch instruction is detected by identifying that a destination location of a load instruction is used in an operation for determining whether a conditional branch is taken or not taken. The load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction is injected in an instruction stream of a processor for fetching data of a future load instruction using an address of the load instruction offset by a distance based on the step size. An additional instruction is injected in the instruction stream of the processor for precomputing an outcome of a load dependent branch using an address computed based on an address of the operation and the data of the future load instruction.
User configurable hardware settings for overclocking are described. In accordance with the described techniques, user input to adjust hardware settings for operating a processing unit in an overclocking mode is received. The user input, for example, adjusts at least one of a voltage droop threshold or a frequency adjustment of the clock rate. A voltage droop is detected while operating the processing unit in the overclocking mode. Responsive to detecting the voltage droop, a clock rate of the processing unit is adjusted based at least in part on the adjusted hardware settings.
G01R 19/165 - Indicating that a current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
G01R 31/28 - Testing of electronic circuits, e.g. by signal tracer
G01R 35/00 - Testing or calibrating of apparatus covered by the other groups of this subclass
A first frame of a video stream is obtained. The first frame is defined by a plurality of pixels associated with a set of color data. A determination is made that a pixel of the plurality of pixels comprises high-frequency information. Responsive to the determination that the pixel comprises high-frequency information, a pixel lock is generated for the pixel such that color data associated with the pixel is maintained during a color accumulation process for at least one of the first frame or a second frame of the video stream that is subsequent to the first frame.
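A numpy sketch of the lock-then-accumulate flow. The gradient-magnitude detector and exponential blend are stand-ins for whatever high-frequency test and accumulation the implementation uses; locked pixels here simply bypass the blend so their color data is maintained.

```python
import numpy as np

HF_THRESHOLD = 0.25   # assumed high-frequency detector threshold

def high_frequency_mask(frame):
    """Flag pixels with large local luminance gradients (stand-in)."""
    gy, gx = np.gradient(frame.mean(axis=-1))
    return np.hypot(gx, gy) > HF_THRESHOLD

def accumulate(history, frame, alpha=0.1):
    """Exponential color accumulation, except that locked pixels keep
    their color data instead of being blended."""
    locks = high_frequency_mask(frame)
    blended = (1 - alpha) * history + alpha * frame
    return np.where(locks[..., None], frame, blended)

rng = np.random.default_rng(0)
history = rng.random((8, 8, 3))
frame = rng.random((8, 8, 3))
out = accumulate(history, frame)
print("locked pixels:", int(high_frequency_mask(frame).sum()))
```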
A first frame of a video stream rendered at a first resolution is obtained. A second frame of the video stream upscaled to a second, higher resolution is also obtained. A first plurality of pixels defining the first frame is upsampled to the second resolution. The upsampling generates upsampled color data for the upsampled first plurality of pixels. The upsampled color data is accumulated with a second set of color data associated with a second plurality of pixels defining the second frame to generate final color data for the upsampled first plurality of pixels. Color data of the second set of color data associated with a pixel lock contributes more to the final color data than corresponding color data of the upsampled color data. The upsampled first plurality of pixels is stored with the final color data as an upscaled frame representing the first frame at the second resolution.
A memory package includes first, second, third, and fourth channels arranged consecutively in a clockwise direction on the memory package, each of the first, second, third, and fourth channels having access circuitry and memory arrays. In a first mode, the first channel controls access to the memory arrays in the second channel and the fourth channel controls access to the memory arrays in the third channel.
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
54.
QUANTIFYING THE HUMAN-LIKENESS OF ARTIFICIALLY INTELLIGENT AGENTS USING STATISTICAL METHODS AND TECHNIQUES
An apparatus includes a processor configured to determine a first distribution associated with an artificial agent based on behavior associated with the artificial agent and a second distribution based on behavior of a user. The processor is further configured to generate a human-likeness similarity measurement by comparing the first distribution to the second distribution and modify the behavior of the artificial agent in response to the similarity measurement failing to satisfy a similarity threshold.
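One plausible instantiation of the comparison, assuming the two behaviors are summarized as action histograms: score similarity as one minus the Jensen-Shannon divergence and trigger behavior modification below a threshold. Both the divergence choice and the threshold are assumptions.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9   # assumed acceptance threshold

def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2): 1.0 means identical
    distributions, 0.0 means disjoint support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))

agent_actions = [30, 50, 15, 5]   # action histogram of the agent
human_actions = [28, 47, 18, 7]   # action histogram of human players
score = js_similarity(agent_actions, human_actions)
print(f"human-likeness: {score:.3f}")
if score < SIMILARITY_THRESHOLD:
    print("modify the agent's behavior")
```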
Methods and systems are disclosed for traversing nodes in a BVH tree by an intersection engine. Techniques disclosed comprise receiving, by the intersection engine, a traversal instruction including a tracing-mode, ray data, and an identifier of a node to be traversed, where the tracing-mode is one of a closest hit mode and a first hit mode. If the node to be traversed is an internal node, the intersection engine determines, based on the tracing-mode, an order in which children nodes of the node are to be next traversed and outputs identifiers of the children nodes in the determined order.
A technique for performing ray tracing operations is provided. The technique includes combining one or more common exponent values of a compressed box node with one or more minimum vertex mantissas of the compressed box node and one or more maximum vertex mantissas of the compressed box node to obtain one or more minimum vertices and one or more maximum vertices; and combining a minimum vertex of the one or more minimum vertices and a maximum vertex of the one or more maximum vertices to obtain a bounding box for a compressed box data item of the compressed box node.
A method and apparatus for managing power states in a computer system includes, responsive to an event received by a processor, powering up first circuitry and, responsive to the event not being serviceable by the first circuitry, powering up at least second circuitry of the computer system to service the event.
A technique for performing ray tracing operations is provided. The technique includes traversing through a bounding volume hierarchy to an instance node; performing an instance node transform using common circuitry; traversing to a leaf node of the bounding volume hierarchy; and performing an intersection test for the leaf node using the common circuitry.
Devices, methods and systems for managing resources in a computing device. Information regarding resource usage is captured. A prediction is generated, based on the information, that resource usage by a processor will exceed a threshold during an upcoming time period. An operating parameter of the processor is adjusted, based on the prediction. In some implementations, information regarding memory bandwidth is captured. A prediction is generated, based on the information, that a memory region stored in a first memory device will be addressed by a memory intensive instruction during an upcoming time period. Data stored in the memory region is moved to a second memory device, based on the prediction.
Methods and systems are disclosed for inline suspension of an accelerated processing unit (APU). Techniques include receiving a packet, including a mode of operation and commands to be executed by the APU; suspending execution of commands received in previous packets when the mode of operation is a suspension initiation mode; and executing, by the APU, the commands in the received packet. The execution of the suspended commands is restored when the mode of operation in a subsequently received packet is a suspension conclusion mode.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
61.
EMULATING PERFORMANCE OF PRIOR GENERATION PLATFORMS
Methods and systems are disclosed for emulating, in a platform, the performance of a target platform. Techniques disclosed include receiving, by the platform, values of system features, associated with a target performance of the target platform; and setting, by the platform, one or more configuration knobs, based on the received values of system features, to match a performance of the platform to the target performance of the target platform.
G06F 9/455 - Arrangements for executing specific programs Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
Cascading execution of atomic operations, including: receiving a request for each thread of a plurality of threads to perform an atomic operation, wherein the plurality of threads comprises a plurality of thread subsets each corresponding to a local memory, wherein the local memory for a thread subset is accessible by the thread subset and inaccessible to a remainder of threads in the plurality of threads; generating a plurality of intermediate results by performing, by each thread subset, the atomic operation in the local memory corresponding to the thread subset; and generating a result for the request by aggregating the plurality of intermediate results in a shared memory accessible to all threads in the plurality of threads.
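A CPU-side model of the cascade, using Python threads where per-subset locks play the role of cheap local-memory atomics and a final aggregation plays the role of the shared-memory step. The subset size is an assumption.

```python
import threading

SUBSET_SIZE = 4   # assumed number of threads sharing one local memory

def cascaded_atomic_add(values):
    """Each thread atomically adds into its subset's local counter;
    the intermediate results are then aggregated into one result."""
    n_subsets = (len(values) + SUBSET_SIZE - 1) // SUBSET_SIZE
    locals_ = [0] * n_subsets
    local_locks = [threading.Lock() for _ in range(n_subsets)]

    def worker(subset_id, v):
        with local_locks[subset_id]:    # atomic op in "local memory"
            locals_[subset_id] += v

    threads = [threading.Thread(target=worker, args=(i // SUBSET_SIZE, v))
               for i, v in enumerate(values)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(locals_)                 # aggregate in "shared memory"

print(cascaded_atomic_add(list(range(16))))   # -> 120
```

The win on a real parallel processor is that contention is split across per-subset counters, leaving only one cross-subset combine instead of every thread hammering a single shared location.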
Parallel processors typically allocate resources to workloads based on workload priority. Priority inversion of resource allocation between workloads of different priorities reduces the operating efficiency of a parallel processor in some cases. A parallel processor [100] mitigates priority inversion by soft-locking resources to prevent their allocation for the processing of lower priority workloads. Soft-locking is enabled responsive to a soft-lock condition, such as one or more priority inversion heuristics [110] exceeding corresponding thresholds or multiple failed allocations of higher priority workloads within a time period. In some cases, priority inversion heuristics include quantities of higher priority workloads and lower priority workloads that are in-flight or incoming, ratios between such quantities, quantities of render targets, or a combination of these. The soft-lock is released responsive to expiry of a soft-lock timer or incoming or in-flight higher priority workloads falling below a threshold, for example.
Methods and systems are disclosed for clock delay compensation in a multiple chiplet system. Techniques disclosed include distributing, by a clock generator, a clock signal across distribution trees of respective chiplets; measuring phases, by phase detectors, where each phase measurement is associated with a chiplet of the chiplets and is indicative of a propagation speed of the clock signal through the distribution tree of the chiplet. Then, for each chiplet, techniques are further disclosed that determine, by a microcontroller, based on the phase measurements associated with the chiplet, a delay offset, and that delay, based on the delay offset, the propagation of the clock signal through the distribution tree of the chiplet using a delay unit associated with the chiplet.
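The per-chiplet offset computation is simple once the phase measurements are in hand: pad every faster distribution tree so all chiplets see the clock edge at the time of the slowest tree. The measurements below are invented.

```python
# Assumed phase measurements: propagation delay (ps) of the clock
# through each chiplet's distribution tree.
measured_delay_ps = {"chiplet0": 180.0, "chiplet1": 240.0, "chiplet2": 210.0}

def delay_offsets(delays):
    """Equalize arrival times by delaying faster trees to the slowest."""
    slowest = max(delays.values())
    return {c: slowest - d for c, d in delays.items()}

for chiplet, offset in delay_offsets(measured_delay_ps).items():
    print(f"{chiplet}: program delay unit to +{offset:.0f} ps")
```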
Methods and systems are disclosed for frequency transitioning in a memory interface system. Techniques disclosed include receiving a signal indicative of a change in operating frequency, to a new frequency, in a processing unit interfacing with memory via the memory interface system; switching the system from a normal mode of operation into a transition mode of operation; updating control and state register (CSR) banks of respective transceivers of the system through a mission bus used during the normal mode of operation; and operating the system at the new frequency.
G06F 1/324 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken by reducing the clock frequency
G06F 1/3234 - Power management, i.e. event-based initiation of a power-saving mode Power saving characterised by the action undertaken
66.
PERIPHERAL DEVICE PROTOCOLS IN CONFIDENTIAL COMPUTE ARCHITECTURES
Restricting peripheral device protocols in confidential compute architectures, the method including: receiving a first address translation request from a peripheral device supporting a first protocol, wherein the first protocol supports cache coherency between the peripheral device and a processor cache; determining that a confidential compute architecture is enabled; and providing, in response to the first address translation request, a response including an indication to the peripheral device to not use the first protocol.
A processing system [100] divides successive dispatches [135] of work items into portions [145]. The successive dispatches are separated from each other by barriers [202], [204], each barrier indicating that the work items of the previous dispatch must complete execution before work items of a subsequent dispatch can begin execution. In some embodiments, the processing system interleaves execution of portions of a first dispatch with portions of subsequent dispatches that consume data produced by the first dispatch. The processing system thereby reduces the amount of data written to the local cache [120] by a producer dispatch while preserving data locality for a subsequent consumer (or consumer/producer) dispatch and facilitating processing efficiency.
A technique for performing ray tracing operations is provided. The technique includes processing small bounding box nodes in a box intersection test circuit to generate intersection test results for the small bounding box nodes; and processing large bounding box nodes in the box intersection test circuit to generate intersection test results for the large bounding box nodes.
A technique for operating a memory system is disclosed. The technique includes performing a first request, by a first memory client, to access data at a first memory address, wherein the first memory address refers to data in a first memory section that is coupled to the first memory client via a direct memory connection; servicing the first request via the direct memory connection; performing a second request, by the first client, to access data at a second memory address, wherein the second memory address refers to data in a second memory section that is coupled to the first client via a cross connection; and servicing the second request via the cross connection.
Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines [125] by a command processor [160] of a graphics processing unit (GPU) [115]. Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units [230]. Each workgroup of the plurality of workgroups is dynamically assigned to a respective shader engine for execution based at least in part on indications of available resources respectively associated with each of the shader engines. In various embodiments, the indications of available resources may include physical parameters regarding each shader engine, as well as current status information regarding the processing of workgroups assigned to each shader engine.
G09G 5/36 - Dispositions ou circuits de commande de l'affichage communs à l'affichage utilisant des tubes à rayons cathodiques et à l'affichage utilisant d'autres moyens de visualisation caractérisés par l'affichage de dessins graphiques individuels en utilisant une mémoire à mappage binaire
G06F 9/50 - Allocation de ressources, p.ex. de l'unité centrale de traitement [UCT]
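A minimal greedy sketch of the resource-based assignment described in the abstract above: each workgroup goes to the shader engine currently reporting the most free compute units. The capacity numbers and the one-CU-per-workgroup cost model are hypothetical.

```python
import heapq

def assign_workgroups(engine_free_units, workgroups):
    """Greedy load balancing: pick the engine with the most free CUs.
    engine_free_units: available compute units per shader engine."""
    heap = [(-free, idx) for idx, free in enumerate(engine_free_units)]
    heapq.heapify(heap)  # max-heap via negation
    assignment = {}
    for wg in workgroups:
        neg_free, idx = heapq.heappop(heap)
        assignment[wg] = idx
        heapq.heappush(heap, (neg_free + 1, idx))  # one CU consumed
    return assignment

print(assign_workgroups([4, 2, 3], ["wg0", "wg1", "wg2", "wg3"]))
```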
71.
DRAM SPECIFIC INTERFACE CALIBRATION VIA PROGRAMMABLE TRAINING SEQUENCES
Methods and systems are disclosed for training, by a sequencer of a memory interface system, an interface with DRAM. Techniques disclosed comprise scheduling a command sequence, including DRAM commands that are interleaved with one or more CSR commands; executing the scheduled command sequence, wherein the DRAM commands are sent to the DRAM through an internal datapath of the system and the CSR commands are sent to the internal datapath; and training the interface based on an exchange of data carried out by the DRAM commands, including adjusting an operational parameter associated with the interface.
Leveraging processing-in-memory (PIM) resources to expedite non-PIM instructions executed on a host is disclosed. In an implementation, a memory controller identifies a first write instruction to write first data to a first memory location, where the first write instruction is not a processing-in-memory (PIM) instruction. The memory controller then writes the first data to a first PIM register. Opportunistically, the memory controller moves the first data from the first PIM register to the first memory location. In another implementation, a memory controller identifies a first memory location associated with a first read instruction, where the first read instruction is not a processing-in-memory (PIM) instruction. The memory controller identifies that a PIM register is associated with the first memory location. The memory controller then reads, in response to the first read instruction, first data from the PIM register.
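A toy model of the write path described above: non-PIM writes are staged in PIM registers and drained to memory opportunistically, while reads check the registers first. The dictionary-based memory model and method names are assumptions for illustration.

```python
class PimStagingController:
    """Hypothetical memory controller that stages non-PIM writes in PIM
    registers and moves them to memory when convenient."""

    def __init__(self):
        self.memory = {}    # address -> data
        self.pim_regs = {}  # address -> staged data

    def write(self, addr, data):
        self.pim_regs[addr] = data          # stage in a PIM register

    def read(self, addr):
        if addr in self.pim_regs:           # register associated with addr
            return self.pim_regs[addr]
        return self.memory.get(addr)

    def drain_one(self):
        if self.pim_regs:                   # opportunistic move to memory
            addr, data = self.pim_regs.popitem()
            self.memory[addr] = data

mc = PimStagingController()
mc.write(0x100, 42)
print(mc.read(0x100))  # 42, served from the PIM register
mc.drain_one()
print(mc.read(0x100))  # 42, now from memory
```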
Methods and systems are disclosed for calibrating, by a memory interface system, an interface with dynamic random-access memory (DRAM) using a dynamically changing training clock. Techniques disclosed comprise receiving a system clock having a clock signal at a first pulse rate. Then, during the training of the interface, techniques disclosed comprise generating a training clock from the clock signal at the first pulse rate, the training clock having a clock signal at a second pulse rate, and sending, based on the generated training clock, command signals, including address data, to the DRAM.
G11C 11/4096 - Circuits de commande ou de gestion d'entrée/sortie [E/S, I/O] de données, p.ex. circuits pour la lecture ou l'écriture, circuits d'attaque d'entrée/sortie ou commutateurs de lignes de bits
A processing unit [100] employs a hardware traversal engine [115] to traverse an acceleration structure [107] such as a ray tracing structure. The hardware traversal engine includes one or more memory modules [434, 436, 438, 440] to store state information and other data used for the structure traversal, and control logic [430] to execute a traversal process based on the stored data and based on received information indicating a source node of the acceleration structure to be used for the traversal process. By employing a hardware traversal engine, the processing unit is able to execute the traversal process more quickly and efficiently, conserving processing resources and improving overall processing efficiency.
A system and method for efficiently resetting data stored in a memory array are described. In various implementations, an integrated circuit includes a memory for storing data, and a processing unit that generates access requests for the data stored in the memory. When access circuitry of the memory array begins a reset operation, it reduces a power supply voltage level used by memory bit cells in a column of the array to a value less than a threshold voltage of transistors. Therefore, the p-type transistors of the bit cells do not contend with the write driver during a write operation. The access circuitry provides the reset data on the write bit lines, and asserts each of the write word lines of the memory array. To complete the write operation, the access circuitry returns the power supply voltage level from below the threshold voltage level to an operating voltage level.
Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
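The gathering step amounts to reference counting per attribute block, as in this hypothetical sketch (the consumer counts and the print stand-in for the discard command are illustrative):

```python
class DiscardEngine:
    """Deallocates an attribute block once every potential pixel-shader
    consumer has sent its deallocation message."""

    def __init__(self):
        self.pending = {}  # block id -> remaining consumers

    def allocate(self, block, num_consumers):
        self.pending[block] = num_consumers

    def deallocation_message(self, block):
        self.pending[block] -= 1
        if self.pending[block] == 0:
            del self.pending[block]
            print(f"discard command: invalidate attribute block {block}")

eng = DiscardEngine()
eng.allocate("attrs0", 3)
for _ in range(3):
    eng.deallocation_message("attrs0")  # third message triggers the discard
```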
A method and apparatus for performing a simulated write in a computer system include, responsive to a scheduled memory operation determined by a memory controller, sending a simulated write operation to physical layer circuitry (PHY) to increase circuit power without enabling the output of the PHY until the memory operation begins. Responsive to the memory operation being complete, a simulated write operation is sent to the PHY to decrease circuit power.
A semiconductor device includes one or more active devices disposed between a processor die and a package substrate. The semiconductor device includes a first layer with a processor die, a second layer with one or more active devices, and a third layer with a package substrate, where the second layer is disposed between the first and third layers. The one or more active devices are semiconductor-based devices, such as voltage regulators, that participate in supplying power to the processor die and are electrically connected to the processor die using various connection configurations. The implementations use short path lengths for improved performance in a compact structure that avoids the use of edge wiring or interposers and does not occupy processor die space. Implementations include the use of through-silicon vias (TSVs) to provide short path lengths while reducing the number of connection resources used by the one or more power components.
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
H01L 25/18 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant de types prévus dans plusieurs sous-groupes différents du même groupe principal des groupes , ou dans une seule sous-classe de ,
H01L 23/538 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre la structure d'interconnexion entre une pluralité de puces semi-conductrices se trouvant au-dessus ou à l'intérieur de substrats isolants
79.
MECHANISM FOR REDUCING COHERENCE DIRECTORY CONTROLLER OVERHEAD FOR NEAR-MEMORY COMPUTE ELEMENTS
A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
G06F 12/0842 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement pour multitraitement ou multitâche
G06F 12/0815 - Protocoles de cohérence de mémoire cache
G06F 12/0895 - Mémoires cache caractérisées par leur organisation ou leur structure de parties de mémoires cache, p.ex. répertoire ou matrice d’étiquettes
An approach is provided for a gaming overlay application to provide automatic in-game subtitles and/or closed captions for video game applications. The overlay application accesses an audio stream and a video stream generated by an executing game application. The overlay application processes the audio stream through a text conversion engine to generate at least one subtitle. The overlay application determines a display position to associate with the at least one subtitle. The overlay application generates a subtitle overlay comprising the at least one subtitle located at the associated display position. The overlay application causes a portion of the video stream to be displayed with the subtitle overlay.
Methods and apparatus employ an asynchronous first-in-first-out (FIFO) buffer that includes a plurality of entries. Control logic determines a timing separation between a write header valid signal and a corresponding write data valid signal for a write operation to an entry in the FIFO in a first clock domain, and performs a read of the corresponding data from the entry in the FIFO in a second clock domain, based on the determined timing separation of the write header valid signal and the corresponding write data valid signal, and based on a clock frequency ratio between the first and second clock domains.
G06F 5/06 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour modifier la vitesse de débit des données, c. à d. régularisation de la vitesse
A data processing system includes a plurality of coherent masters, a plurality of coherent slaves, and a coherent data fabric. The coherent data fabric has upstream ports coupled to the plurality of coherent masters and downstream ports coupled to the plurality of coherent slaves for selectively routing accesses therebetween. The coherent data fabric includes a probe filter and a directory cleaner. The probe filter is associated with at least one of the downstream ports and has a plurality of entries that each store coherence information. The directory cleaner periodically scans the probe filter and selectively removes a first entry from the probe filter after the first entry is scanned.
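A minimal sketch of the cleaner's scan loop; the abstract does not give the removal policy, so a one-bit aging scheme (evict entries unreferenced since the previous scan) is assumed here for illustration.

```python
def clean_probe_filter(probe_filter):
    """probe_filter: dict mapping address -> {'referenced': bool}.
    Removes entries not referenced since the last scan; ages the rest."""
    for addr in list(probe_filter):
        if not probe_filter[addr]["referenced"]:
            del probe_filter[addr]                     # selectively remove entry
        else:
            probe_filter[addr]["referenced"] = False   # age for the next scan

pf = {0x40: {"referenced": False}, 0x80: {"referenced": True}}
clean_probe_filter(pf)
print(pf)  # entry 0x40 removed, entry 0x80 aged
```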
An apparatus includes a reference signal generator, a droop detection circuit, a digital frequency-locked loop (DFLL), and a DFLL control circuit. The reference signal generator receives a digital value and produces a pulse-density modulated signal based on the digital value. The droop detection circuit converts the pulse-density modulated signal to an analog signal, compares the analog signal to a monitored supply voltage, and responsive to detecting a droop of the monitored supply voltage below a designated value relative to the analog signal, produces a droop detection signal. The DFLL provides a clock signal for synchronizing circuitry within a domain of the monitored supply voltage. The DFLL control circuit, responsive to receiving the droop detection signal, causes the DFLL to slow the clock signal.
H03L 7/081 - Commande automatique de fréquence ou de phase; Synchronisation utilisant un signal de référence qui est appliqué à une boucle verrouillée en fréquence ou en phase - Détails de la boucle verrouillée en phase avec un déphaseur commandé additionnel
H03L 7/24 - Commande automatique de fréquence ou de phase; Synchronisation utilisant un signal de référence directement appliqué au générateur
G01R 19/25 - Dispositions pour procéder aux mesures de courant ou de tension ou pour en indiquer l'existence ou le signe utilisant une méthode de mesure numérique
G01R 23/16 - Analyse de spectre; Analyse de Fourier
A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
A processing system [100] selectively allocates storage at a local cache [120] of a parallel processing unit [110] for cache lines of a repeating pattern of data [420] that exceeds the storage capacity of the cache. The processing system identifies repeating patterns of data having cache lines that have a reuse distance that exceeds the storage capacity of the cache. A cache controller [130] allocates storage for only a subset of cache lines [140] of the repeating pattern of data at the cache and excludes the remainder of cache lines [415] of the repeating pattern of data from the cache. By restricting the cache to store only a subset of cache lines of the repeating pattern of data, the cache controller increases the hit rate at the cache for the subset of cache lines.
G06F 12/0802 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache
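The benefit described in the abstract above can be reproduced with a toy LRU model: admitting a cyclic pattern whose reuse distance exceeds capacity yields zero hits, while admitting only a capacity-sized subset makes that subset hit on every repetition. The trace and capacity below are hypothetical.

```python
def simulate(trace, capacity, admitted):
    """LRU cache that only allocates lines in `admitted`; returns hits."""
    lru, hits = [], 0
    for line in trace:
        if line in lru:
            hits += 1
            lru.remove(line)
            lru.append(line)         # move to MRU position
        elif line in admitted:
            if len(lru) == capacity:
                lru.pop(0)           # evict LRU line
            lru.append(line)
    return hits

trace = list(range(8)) * 3           # reuse distance 8 > capacity 4
print(simulate(trace, 4, admitted=set(range(8))))  # 0 hits: pattern thrashes
print(simulate(trace, 4, admitted={0, 1, 2, 3}))   # 8 hits on the subset
```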
Near-memory compute elements perform memory operations and temporarily store at least a portion of address information for the memory operations in local storage. A broadcast memory command is then issued to the near-memory compute elements that causes the near-memory compute elements to perform a subsequent memory operation using their respective address information stored in the local storage. This allows a single broadcast memory command to be used to perform memory operations across multiple memory elements, such as DRAM banks, using bank-specific address information. In one implementation, the approach is used to process workloads with irregular updates to memory while consuming less command bus bandwidth than conventional approaches. Implementations include using conditional flags to selectively designate address information in local storage that is to be processed with the broadcast memory command.
An integrated circuit includes a latch array including a plurality of latches logically configured in rows and columns, a plurality of repair latches operatively coupled to the plurality of latches, and latch array built-in self-test and repair logic (LABISTRL) coupled to the plurality of latches. In some implementations, the LABISTRL configures latches in the array as one or more column serial test shift registers, detects one or more defective latches of the plurality of latches based on applied test data, and selects at least one repair latch in response to detection of at least one defective latch.
One or more components of a computing device are run by default in a boost mode state. The one or more components continue to run in the boost mode state until the boost mode state is no longer sustainable, e.g., due to power consumption of the one or more components or temperature of the one or more components. The one or more components are switched to a reduced power state (e.g., a non-boost mode state) in response to the boost mode state no longer being sustainable. When operating the one or more components in the boost mode state again becomes sustainable due to power consumption or temperature of the one or more components, the one or more components are returned to the default boost mode state.
G06F 1/3234 - Gestion de l’alimentation, c. à d. passage en mode d’économie d’énergie amorcé par événements Économie d’énergie caractérisée par l'action entreprise
G06F 9/50 - Allocation de ressources, p.ex. de l'unité centrale de traitement [UCT]
G06F 1/3206 - Surveillance d’événements, de dispositifs ou de paramètres initiant un changement de mode d’alimentation
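The behaviour described in the abstract above reduces to a two-state machine, sketched below with hypothetical power and temperature limits:

```python
POWER_LIMIT_W = 15.0   # assumed sustainability thresholds
TEMP_LIMIT_C = 95.0

def next_state(state, power_w, temp_c):
    """Boost is the default; leave it only while boost is unsustainable."""
    sustainable = power_w < POWER_LIMIT_W and temp_c < TEMP_LIMIT_C
    if state == "boost" and not sustainable:
        return "reduced"
    if state == "reduced" and sustainable:
        return "boost"
    return state

state = "boost"
for power_w, temp_c in [(10, 70), (16, 90), (12, 80)]:
    state = next_state(state, power_w, temp_c)
    print((power_w, temp_c), "->", state)
```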
89.
MULTI-CYCLE SCHEDULER WITH SPECULATIVE PICKING OF MICRO-OPERATIONS
A multi-cycle scheduler for a processor includes early wake circuitry, late wake circuitry, and picker circuitry. In a first cycle of a clock, the early wake circuitry speculatively identifies child micro-operations as ready whose dependencies are satisfied by a set of ready parent micro-operations. In a second cycle of the clock, the picker circuitry picks at least one of the child micro-operations identified as ready for issue to execution circuitry. In addition, the late wake circuitry blocks from issue at least one picked child micro-operation speculatively identified as ready upon determining that a respective parent micro-operation did not issue to execution circuitry.
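The two-cycle split can be modelled as below; the oldest-first pick and the data structures are assumptions for illustration.

```python
def schedule(ready_parents, issued_parents, deps):
    """deps: child micro-op -> parent micro-op. Returns the child that may
    issue, after the late-wake check filters mis-speculated picks."""
    # Cycle 1 (early wake): speculatively mark children of ready parents.
    woken = [c for c, p in deps.items() if p in ready_parents]
    # Cycle 2 (pick + late wake): pick a woken child, but block it if its
    # parent did not actually issue to execution circuitry.
    for child in woken:
        if deps[child] in issued_parents:
            return child
        print(f"late wake blocks {child}: parent {deps[child]} did not issue")
    return None

print(schedule({"p0", "p1"}, {"p1"}, {"c0": "p0", "c1": "p1"}))  # blocks c0, issues c1
```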
An approach provides indirect addressing support for PIM. Indirect PIM commands include address translation information that allows memory modules to perform indirect addressing. Processing logic in a memory module processes an indirect PIM command and retrieves, from a first memory location, a virtual address of a second memory location. The processing logic calculates a corresponding physical address for the virtual address using the address translation information included with the indirect PIM command and retrieves, from the second memory location, a virtual address of a third memory location. This process is repeated any number of times until one or more indirection stop criteria are satisfied. The indirection stop criteria stop the process when work has been completed normally or to prevent errors. Implementations include the processing logic in the memory module working in cooperation with a memory controller to perform indirect addressing.
G11C 11/4091 - Amplificateurs de lecture ou de lecture/rafraîchissement, ou circuits de lecture associés, p.ex. pour la précharge, la compensation ou l'isolation des lignes de bits couplées
G11C 8/12 - Circuits de sélection de groupe, p.ex. pour la sélection d'un bloc de mémoire, la sélection d'une puce, la sélection d'un réseau de cellules
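The indirection loop described in the abstract above is essentially pointer chasing with per-hop translation, as in this sketch; the flat page table, the `None` terminator, and the hop limit are assumed stand-ins for the translation information and stop criteria.

```python
def chase(memory, page_table, start_va, max_hops):
    """Follow a chain of virtual-address pointers, translating each hop.
    memory: physical addr -> next virtual addr (None ends the chain)."""
    visited, va, hops = [], start_va, 0
    while va is not None and hops < max_hops:  # indirection stop criteria
        pa = page_table[va]                    # translate with command info
        visited.append(pa)
        va = memory[pa]                        # fetch next virtual address
        hops += 1
    return visited

page_table = {0x1000: 0x9000, 0x2000: 0xA000, 0x3000: 0xB000}
memory = {0x9000: 0x2000, 0xA000: 0x3000, 0xB000: None}
print([hex(a) for a in chase(memory, page_table, 0x1000, max_hops=8)])
```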
91.
APPROACH FOR MANAGING NEAR-MEMORY PROCESSING COMMANDS AND NON-NEAR-MEMORY PROCESSING COMMANDS IN A MEMORY CONTROLLER
An approach is provided for managing PIM commands and non-PIM commands at a memory controller. A memory controller enqueues PIM commands and non-PIM commands and selects the next command to process based upon various selection criteria. The memory controller maintains and uses a page table to properly configure memory elements, such as banks in a memory module, for the next memory command, whether a PIM command or a non-PIM command. The page table tracks the status of memory elements as of the most recent memory command that was issued. The page table includes an "All Banks" entry that indicates the status of the banks after processing the most recent PIM command. For example, the All Banks entry indicates whether all of the banks have a row open and, if so, specifies the open row for all of the banks.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G11C 8/12 - Circuits de sélection de groupe, p.ex. pour la sélection d'un bloc de mémoire, la sélection d'une puce, la sélection d'un réseau de cellules
G06F 12/1009 - Traduction d'adresses avec tables de pages, p.ex. structures de table de page
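The All Banks bookkeeping described in the abstract above can be sketched as follows; the four-bank configuration and the method names are hypothetical.

```python
class RowTracker:
    """Tracks open rows per bank, plus an 'All Banks' entry that is valid
    after a broadcast PIM command opens the same row in every bank."""

    def __init__(self, num_banks):
        self.per_bank = {b: None for b in range(num_banks)}
        self.all_banks = None          # row open in every bank, else None

    def note_pim_broadcast(self, row):
        self.all_banks = row           # one entry covers all banks
        for b in self.per_bank:
            self.per_bank[b] = row

    def note_non_pim(self, bank, row):
        self.all_banks = None          # banks are no longer uniform
        self.per_bank[bank] = row

    def open_row(self, bank):
        return self.all_banks if self.all_banks is not None else self.per_bank[bank]

t = RowTracker(4)
t.note_pim_broadcast(7)
t.note_non_pim(2, 9)
print(t.open_row(2), t.open_row(0))  # 9 7
```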
92.
3D SEMICONDUCTOR PACKAGE WITH DIE-MOUNTED VOLTAGE REGULATOR
A semiconductor package includes a package substrate having a first surface and an opposing second surface, and further includes an integrated circuit (IC) die disposed at the second surface and having a third surface facing the second surface and an opposing fourth surface. The IC die has a first region comprising one or more metal layers and circuit components for one or more functions of the IC die and a second region offset from the first region in a direction parallel with the third and fourth surfaces. The semiconductor package further includes a voltage regulator disposed at the fourth surface in the second region and having an input configured to receive a supply voltage and an output configured to provide a regulated voltage, and also includes a conductive path coupling the output of the voltage regulator to a voltage input of circuitry of the IC die.
H01L 23/58 - Dispositions électriques structurelles non prévues ailleurs pour dispositifs semi-conducteurs
H01L 25/065 - Ensembles consistant en une pluralité de dispositifs à semi-conducteurs ou d'autres dispositifs à l'état solide les dispositifs étant tous d'un type prévu dans le même sous-groupe des groupes , ou dans une seule sous-classe de , , p.ex. ensembles de diodes redresseuses les dispositifs n'ayant pas de conteneurs séparés les dispositifs étant d'un type prévu dans le groupe
H01L 23/538 - Dispositions pour conduire le courant électrique à l'intérieur du dispositif pendant son fonctionnement, d'un composant à un autre la structure d'interconnexion entre une pluralité de puces semi-conductrices se trouvant au-dessus ou à l'intérieur de substrats isolants
H01L 23/48 - Dispositions pour conduire le courant électrique vers le ou hors du corps à l'état solide pendant son fonctionnement, p.ex. fils de connexion ou bornes
93.
LOW-LATENCY ARCHITECTURE FOR FULL FREQUENCY NOISE REDUCTION IN IMAGE PROCESSING
Systems and techniques provide low-latency, full-frequency noise filtering of images through an image-scaling-based filtering technique, or "multiscale filtering technique," that can filter low, medium, and/or high frequencies for one or more components of an image, such that the different resolution scales at each level of the multiscale filtering technique provide a larger receptive field for the denoising process employed at each level than a conventional denoising framework. This multiscale filtering includes receiving an input image to be filtered and then performing a multiscale filtering process in which the input image is, at different resolution scales, denoised, downscaled, upscaled, and fused with the result of a lower resolution scale to generate a filtered image. This may include temporarily buffering intermediate image data for some of the resolution scales in memory using direct memory access (DMA) operations.
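The recursive denoise/downscale/upscale/fuse structure can be shown on a 1-D signal; the 3-tap box filter and the averaging fusion below are placeholders for the per-level denoisers, not the actual filters.

```python
def denoise(x):
    """3-tap box filter as a stand-in per-level denoiser."""
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, len(x) - 1)]) / 3
            for i in range(len(x))]

def downscale(x):
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upscale(x, n):
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]

def multiscale_filter(x, levels):
    y = denoise(x)                       # denoise at this resolution scale
    if levels == 0 or len(x) < 4:
        return y
    coarse = multiscale_filter(downscale(x), levels - 1)
    up = upscale(coarse, len(x))         # result of the lower scale
    return [(a + b) / 2 for a, b in zip(y, up)]  # fuse the two scales

noisy = [0, 1, 0, 1, 8, 1, 0, 1, 0, 1]
print([round(v, 2) for v in multiscale_filter(noisy, levels=2)])
```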
A data processor includes a data fabric, a memory controller, a last level cache, and a traffic monitor. The data fabric is for routing requests between a plurality of requestors and a plurality of responders. The memory controller is for accessing a volatile memory. The last level cache is coupled between the memory controller and the data fabric. The traffic monitor is coupled to the last level cache and operable to monitor traffic between the last level cache and the memory controller, and based on detecting an idle condition in the monitored traffic, to cause the memory controller to command the volatile memory to enter self-refresh mode while the last level cache maintains an operational power state and responds to cache hits over the data fabric.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/0891 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache utilisant des moyens d’effacement, d’invalidation ou de réinitialisation
A cache includes an upstream port, a cache memory for storing cache lines each having a line width, and a cache controller. The cache controller is coupled to the upstream port and the cache memory. The upstream port transfers data words having a transfer width less than the line width. In response to a cache line fill, the cache controller selectively determines data bus inversion information for a sequence of data words having the transfer width, and stores the data bus inversion information along with selected inverted data words for the cache line fill in the cache memory.
G06F 12/0802 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p.ex. mémoires cache
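Data bus inversion itself is a standard encoding: invert a word when more than half its bits are ones and record one flag bit, as sketched below (the 8-bit width is illustrative; the abstract above stores such flags alongside the selected inverted words in the cache).

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def dbi_encode(word):
    """Return (stored_word, inverted_flag), minimizing ones on the bus."""
    if bin(word).count("1") > WIDTH // 2:
        return word ^ MASK, True
    return word, False

def dbi_decode(stored, inverted):
    return stored ^ MASK if inverted else stored

for w in (0b11110111, 0b00000101):
    stored, flag = dbi_encode(w)
    assert dbi_decode(stored, flag) == w
    print(f"{w:08b} -> stored {stored:08b}, DBI={flag}")
```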
Cache associativity allocation is described. In accordance with the described techniques, a portion of associativity of a cache is allocated to a category of cache requests. The portion of associativity corresponds to a subset of cachelines of the cache. A request is received to access the cache, and a cacheline of the subset of cachelines is allocated to the request based on a category associated with the request. Data corresponding to the request is loaded into the cacheline of the subset of cachelines.
A processing system [100] is configured to translate a first cache access pattern of a dispatch [135] of work items to a cache access pattern [145] that facilitates consumption of data stored at a cache [120] of a parallel processing unit [110] by a subsequent access before the data is evicted to a more remote level of a memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve [506]. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access [512] to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access [514]. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
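One common space-filling curve with the locality property relied on above is the Morton (Z-order) curve; the sketch below re-orders a row-major traversal into Morton order. The 4x4 grid is hypothetical.

```python
def interleave_bits(x, y, bits=4):
    """Morton (Z-order) index: interleave the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def morton_order(width, height):
    cells = [(x, y) for y in range(height) for x in range(width)]
    return sorted(cells, key=lambda c: interleave_bits(*c))

# Row-major order visits (0,0)..(3,0) before the next row; Morton order
# keeps 2x2 neighbourhoods together, improving reuse of cached tiles.
print(morton_order(4, 4)[:8])
```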
A system and method for fast save/restore are disclosed. The system and method include one or more logical units (LUs) residing in independent power domains; one or more digital frequency synthesizers (DFSs), each associated with one of the one or more LUs and configured to lock a system complex frequency and ramp the associated LU to the system complex frequency; and one or more slave fast save/restore control (FSRC) units, each associated with one of the one or more LUs and configured to save/restore the FSRC states of the one or more LUs.
G06F 1/324 - Gestion de l’alimentation, c. à d. passage en mode d’économie d’énergie amorcé par événements Économie d’énergie caractérisée par l'action entreprise par réduction de la fréquence d’horloge
G06F 1/28 - Surveillance, p.ex. détection des pannes d'alimentation par franchissement de seuils
99.
A METHOD AND APPARATUS FOR RECOVERING REGULAR ACCESS PERFORMANCE IN FINE-GRAINED DRAM
A fine-grained dynamic random-access memory (DRAM) includes a first memory bank, a second memory bank, and a dual-mode I/O circuit. The first memory bank includes a memory array divided into a plurality of grains, each grain including a row buffer and input/output (I/O) circuitry. The dual-mode I/O circuit is coupled to the I/O circuitry of each grain in the first memory bank, and operates in a first mode in which commands having a first data width are routed to and fulfilled individually at each grain, and a second mode in which commands having a second data width different from the first data width are fulfilled by at least two of the grains in parallel.
G11C 11/4096 - Circuits de commande ou de gestion d'entrée/sortie [E/S, I/O] de données, p.ex. circuits pour la lecture ou l'écriture, circuits d'attaque d'entrée/sortie ou commutateurs de lignes de bits
G11C 8/12 - Circuits de sélection de groupe, p.ex. pour la sélection d'un bloc de mémoire, la sélection d'une puce, la sélection d'un réseau de cellules
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
An approach is provided for implementing a useful proof-of-work consensus algorithm. A proposed block is received. A combined hash value is generated based on the proposed block and a nonce value. The combined hash value is divided into a plurality of hash value pieces that each correspond to a work packet of a plurality of work packets. One or more requests are transmitted for the plurality of work packets that correspond to the plurality of hash value pieces. In response to receiving the plurality of work packets, a plurality of results is generated by performing, for each work packet of the plurality of work packets, one or more operations to complete work specified by the respective work packet. In response to determining that at least one result of the plurality of results satisfies one or more criteria, the proposed block is added to a blockchain maintained by a blockchain network.
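A rough sketch of the dispatch side under stated assumptions: how the digest might be split into pieces and mapped to work-packet identifiers. The SHA-256 choice, the piece count, and the modulo mapping are all illustrative; the abstract does not specify them.

```python
import hashlib

NUM_PIECES = 4  # assumed number of hash value pieces

def work_packet_ids(block_bytes, nonce, num_packets):
    """Hash block+nonce, split the digest, map pieces to packet ids."""
    digest = hashlib.sha256(block_bytes + nonce.to_bytes(8, "big")).digest()
    piece_len = len(digest) // NUM_PIECES
    pieces = [digest[i * piece_len:(i + 1) * piece_len]
              for i in range(NUM_PIECES)]
    return [int.from_bytes(p, "big") % num_packets for p in pieces]

print(work_packet_ids(b"proposed-block", nonce=7, num_packets=1000))
```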