A processor with an expandable instruction set architecture for dynamically configuring execution resources. The processor includes a programmable execution unit (PEU) that may be programmed to perform a user-defined function in response to a user-defined instruction (UDI). The PEU includes programmable logic elements and programmable interconnectors that are collectively programmed to perform at least one processing operation. A UDI loader is responsive to a UDI load instruction that specifies a UDI and a location of programming information that is used to program the PEU. The PEU may be programmed for one or more UDIs for one or more processes. An instruction table stores each UDI and corresponding information to identify the UDI and possibly to reprogram the PEU if necessary. A UDI handler consults the instruction table to identify a received UDI and to send corresponding information to the PEU to execute the corresponding user-defined function.
A processor including a programmable prefetcher for prefetching information from an external memory. The programmable prefetcher includes a load monitor, a programmable prefetch engine, and a prefetch requester. The load monitor tracks load requests issued by the processor to retrieve information from the external memory. The programmable prefetch engine is configured to be programmed by at least one prefetch program to operate as a programmed prefetcher, such that during operation of the processor, the programmed prefetcher generates at least one prefetch address based on the load requests issued by the processor. The requester uses each generated prefetch address to prefetch information from the external memory. A prefetch memory may store one or more prefetch programs and a prefetch programmer may be included to select from among stored prefetch programs to program the prefetcher based on an executing process. Each prefetch program may be configured according to a prefetch definition.
An arbiter that performs accelerated arbitration by approximating relative ages including a memory, blur logic, and grant logic. Multiple entries arbitrate for one or more resources. The memory stores age values each providing a relative age between each pair of entries, and further stores blurred age values. The entries are divided into subsets in which each entry belongs to only one subset. The blur logic determines each blurred age value to indicate a relative age between an entry of a first subset and an entry of a different subset for each pair of subsets. The grant logic grants access by an entry to a resource based on relative age using corresponding age values when comparing relative age between entries within a common subset, and using corresponding blurred age values when comparing relative age between entries in different subsets. Each blurred age value represents multiple age values to simplify arbitration.
An apparatus includes first reservation and second reservation. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of non-core resources includes a random access memory, programmed via a Joint Test Action Group interface with the plurality of specified load instructions corresponding to an out-of-order processor which, upon initialization, accesses the random access memory to determine said plurality of specified load instructions.
An apparatus includes first and second reservation stations. The first reservation station (421.L) dispatches a load micro instruction, and indicates on a hold bus (444) if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station (421.1-421.N) is coupled to the hold bus (444), and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus (444) that the load micro instruction is the specified load micro instruction, the second reservation station (421.1-421.N) is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of non-core resources includes a control element, coupled to the out-of order processor via a control bus.
A cache stores 2ΛJ-byte cache lines has an array of 2ΛN sets each holds tags each X bits and 2∧W ways. An input receives a Q-bit address, MA[(Q-1):0], having a tag MA[(Q-1):(Q-X)] and index MA[(Q-X-1):J]. Q is at least (N+J+X-l). Set selection logic selects one set using the index and tag LSB; comparison logic compares all but the LSB of the tag with all but the LSB of each tag in the selected set and indicates a hit if a match; allocation logic, when the comparison logic indicates there is not a match: allocates into any of the 2ΛW ways of the selected set when operating in a first mode; and into a subset of the 2ΛW ways of the selected set when operating in a second mode. The subset of is limited based on bits of the tag portion.
An apparatus includes first and second reservation stations. The first reservation station (421.L) dispatches a load micro instruction, and indicates on a hold bus (444) if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction requires more than a first number of clock cycles to retrieve the operand. The second reservation station (421.1-421.N) is coupled to the hold bus (444), and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus (444) that the load micro instruction is the specified load micro instruction, the second reservation station (421.1-421.N) is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
An apparatus includes first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of non-core resources (302) includes a fuse array (308), configured to store the plurality of specified load instructions corresponding to the out-of-order processor which upon initialization, accesses the fuse array to determine the plurality of specified load micro instructions.
A fully associative cache memory, comprising: an array of storage elements; an allocation unit that allocates the storage elements in response to memory accesses that miss in the cache memory. Each memory access has an associated memory access type (MAT) of a plurality of predetermined MATs. Each valid storage element of the array has an associated MAT. For each MAT, the allocation unit maintains: a counter that counts of a number of valid storage elements associated with the MAT; and a corresponding threshold. The allocation unit allocates into any of the storage elements in response to a memory access that misses in the cache, unless the counter of the MAT of the memory access has reached the corresponding threshold, in which case the allocation unit replaces one of the valid storage elements associated with the MAT of the memory access.
An associative cache memory (102), comprising: an array (104) of storage elements (112) arranged as M sets by N ways; an allocation unit (106) allocates the storage elements (112) in response to memory accesses (122) that miss in the cache memory (102). Each memory access (122) selects a set. Each memory access (122) has an associated memory access type (MAT) (101) of a plurality of predetermined MATs (101). Each valid storage element (112) has an associated MAT (101); a mapping (108) that includes, for each MAT (101), a MAT priority (3277). In response to a memory access (122) that misses in the array (104), the allocation unit (106): determines a most eligible way and a second most eligible way of the selected set for replacement based on a replacement policy; and replaces the second most eligible way rather than the most eligible way when the MAT priority (3277) of the most eligible way is greater than the MAT priority (3277) of the second most eligible way.
A cache memory comprising: a mode input indicates in which of a plurality of allocation modes the cache memory is to operate; a set-associative array of entries having a plurality of sets by W ways; an input receives a memory address comprising: an index used to select a set from the plurality of sets; and a tag used to compare with tags stored in the entries of the W ways of the selected set to determine whether the memory address hits or misses; and allocation logic, when the memory address misses in the array: selects one or more bits of the tag based on the allocation mode; performs a function, based on the allocation mode, on the selected bits of the tag to generate a subset of the W ways of the array; and allocates into one way of the subset of the ways of the selected set.
A processor includes a first prefetcher that prefetches data in response to memory accesses and a second prefetcher that prefetches data in response to memory accesses. Each of the memory accesses has an associated memory access type (MAT) of a plurality of predetermined MATs. The processor also includes a table that holds first scores that indicate effectiveness of the first prefetcher to prefetch data with respect to the plurality of predetermined MATs and second scores that indicate effectiveness of the second prefetcher to prefetch data with respect to the plurality of predetermined MATs. The first and second prefetchers selectively defer to one another with respect to data prefetches based on their relative scores in the table and the associated MATs of the memory accesses.
A cache memory stores 2∧J-byte cache lines and includes an array of 2∧N sets each holding tags each X bits, an input receives a Q-bit memory address, MA[(Q-1):0], having: a tag MA[(Q-1):(Q-X)] and an index MA[(Q-X-1):J]. Q is an integer at least (N+J+X-l). In a first mode: set selection logic selects one set using the index and LSB of the tag; comparison logic compares all but LSB of the tag with all but LSB of each tag in the selected set and indicates a hit if a match; otherwise allocation logic allocates into the selected set. In a second mode: the set selection logic selects two sets using the index; the comparison logic compares the tag with each tag in the selected two sets and indicates a hit if a match; and otherwise allocates into one set of the two selected sets.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The resources include a fuse array that stores configuration data.
A set associative cache memory, comprising: an array of storage elements arranged as M sets by N ways; an allocation unit that allocates the storage elements in response to memory accesses that miss in the cache memory. Each memory access selects a set; for each parcel of a plurality of parcels, a parcel specifier specifies: a subset of ways of the N ways included in the parcel. The subsets of ways of parcels associated with a selected set are mutually exclusive; a replacement scheme associated with the parcel from among a plurality of predetermined replacement schemes. For each memory access, the allocation unit: selects the parcel specifier in response to the memory access; and uses the replacement scheme associated with the parcel to allocate into the subset of ways of the selected set included in the parcel.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The resources include a system memory that is accessed via a memory bus, the system memory comprising one or more page tables, configured to store one or more mappings between virtual addresses and physical addresses.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of non-core resources includes an off-core cache memory, configured to store memory operands which may have been cached from a system memory that are not present in one or more on-core cache memories.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of prescribed resources includes system memory, coupled to the out-of-order processor via a memory bus, where the specified load micro instruction is known to resolve to write combining memory space in the system memory.
A processor includes a prefetched 102) that prefetches data in response to memory accesses(134), wherein each memory access has an associated memory access type (MAT) of a plurality of predetermined MATs. The processor also includes a table(106) that holds scores that indicate effectiveness of the prefetcher to prefetch data with respect to the plurality of predetermined MATs. The prefetcher( 102) prefetches data in response to memory accesses at a level of aggressiveness based on the scores held in the table(106) and the associated MATs of the memory accesses.
An apparatus includes first and second reservation stations. The first reservation station (421.L) dispatches a load micro instruction, and indicates on a hold bus (444) if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station (421.1-421.N) is coupled to the hold bus (444), and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus (444) that the load micro instruction is the specified load micro instruction, the second reservation station (421.1-421.N) is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The resources include an input/output (I/O) unit, configured to perform I/O operations via an I/O bus coupling an out-of-order processor to I/O resources.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The resources include an advanced programmable interrupt controller (APIC), configured to perform interrupt operations.
An apparatus includes a first reservation station and a second reservation station. The first reservation station dispatches a first load micro instruction, and detects and indicates on a hold bus if the first load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the first load micro instruction for execution after a first number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the first load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the first load micro instruction has retrieved the operand.
An apparatus includes first and second reservation stations. The first reservation station (421.L) dispatches a load micro instruction, and indicates on a hold bus (444) if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station (421.1-421.N) is coupled to the hold bus (444), and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus (444) that the load micro instruction is the specified load micro instruction, the second reservation station (421.1-421.N) is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The resources include system memory, coupled an out-of-order processor (300) via a memory bus (MEM).
A set associative cache memory (102), comprising: an array (104) of storage elements (112) arranged as N ways; an allocation unit (106) that allocates the storage elements of the array in response to memory accesses that miss in the cache memory (102); wherein each of the memory accesses has an associated memory access type (MAT) (101) of a plurality of predetermined MATs, wherein the MAT is received by the cache memory; a mapping (108) that, for each MAT of the plurality of predetermined MATs, associates the MAT with a subset of one or more ways of the N ways; wherein for each memory access of the memory accesses, the allocation unit allocates into a way of the subset of one or more ways that the mapping associates with the MAT of the memory access; and wherein the mapping is dynamically updatable during operation of the cache memory.
A set associative cache memory, comprising: an array of storage elements arranged as M sets by N ways, each set belongs in one of L mutually exclusive groups; an allocation unit allocates the storage elements in response to memory accesses that miss in the cache; each memory access has an associated memory access type (MAT) of a plurality of predetermined MAT; a mapping, for each group of the L mutually exclusive groups: for each MAT, associates the MAT with a subset of the N ways; and for each memory access, the allocation unit allocates into a way of the subset of ways that the mapping associates with the MAT of the memory access and with one of the L mutually exclusive groups in which the selected set belongs.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The plurality of non-core resources includes a random access memory, configured to store microcode patches corresponding to the out-of-order processor which, upon initialization, accesses said random access memory to retrieve said microcode patches.
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction comprises a load instruction resulting from execution of an x86 special bus cycle. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
A graphics processing system and power gating method thereof, the graphics processing system (100) comprising: a graphics processing unit (GPU) (130), a bus interface (110) and a power management unit (PMU) (120), the GPU comprising a control circuit (140) and a plurality of partitions (151-154); the method comprises: when the bus interface receives an external graphics command, utilizing the PMU to turn on a power supply of the control circuit (S210); subsequently utilizing the control circuit to turn on power supplies of one or more partitions of the plurality of partitions corresponding to the external graphics command (S220); when the control circuit detects any one of the plurality of partitions is in an idle state, utilizing the control circuit to turn off the power supply of the partition in the idle state (S230); when the bus interface detects the plurality of partitions are in a full idle state, utilizing the bus interface to turn off the power supply of the control circuit via the PMU (S240); and when the PMU turns off the power supply of the control circuit, the control circuit may also turn off the power supplies of the plurality of the partitions.
Provided are a system and method for dynamically adjusting a voltage frequency, the system comprising: an operational unit; a power management unit; a hardware activity monitoring unit for monitoring an operation state and temperature information of the operational unit, and determining, according to the operation state, temperature information, and a previous adjustment result, whether the operation voltage and operation frequency of the operational unit need to be updated, and when it is determined to update the operation voltage and operation frequency, generating a first control signal to the power management unit to adjust the operation voltage and operation frequency; and a hardware voltage monitoring unit for detecting time sequence information of the operational unit, and determining whether to fine-tune the operation voltage according to the time sequence information, and when it is determined to fine-tune the operation voltage, generating a second control signal to the power management unit to fine-tune the operation voltage.
A microprocessor comprises a cache including a tag array; a tag pipeline that arbitrates access to the tag array; and a pattern detector. The pattern detector comprises a register; a decoder that decodes transaction type identifiers of tagpipe arbs advancing through the tag pipeline; and an accumulator that accumulates into the register the transaction type identifiers of a plurality of tagpipe arbs that advance through the tag pipeline.
A microprocessor comprises a cache including a tag array; a tagpipe that arbitrates access to the tag array; and a logic analyzer for investigating a starvation, livelock, or deadlock condition. The logic analyzer, which comprises read logic coupled to the tagpipe, is configured to record snapshots of transactions to access the tag array.
A microprocessor comprises a cache including a tag array; a tag pipeline that arbitrates access to the tag array; and a pattern detector. The pattern detector comprises snapshot capture logic that captures snapshots of tagpipe arbs— including information about whether the tagpipe arb is a load, snoop, store or other arb type and whether the tagpipe arb completed or replayed— and a plurality of configurable register modules operable to store user-configured snapshot patterns. Configuration logic enables a user to specify, for each configurable register module, properties of tagpipe arbs for the pattern detector to detect as well as dependencies between the configurable register modules. A register module becomes triggered if a tagpipe arb or pattern of tagpipe arbs meets the user-specified properties for the register module and if any other register module on which the register module depends is also in a triggered state.
A microprocessor comprises a plurality of queues containing transient transaction state information about cache-accessing transactions; a plurality of detectors coupled to the plurality of queues and monitoring the plurality of queues for one or more likely starvation, livelock, or deadlock conditions; and a plurality of recovery logic modules operable to implement one or more recovery routines when the detectors identify one or more likely starvation, livelock, or deadlock conditions.
A cache memory system including a primary cache and an overflow cache that are searched together using a search address. The overflow cache operates as an eviction array for the primary cache. The primary cache is addressed using bits of the search address, and the overflow cache is configured as FIFO buffer. The cache memory system may be used to implement a translation lookaside buffer for a microprocessor.
A processor includes a cache memory having a plurality of entries. Each of the entries holds data of a cache line, a state of the cache line and a tag of the cache line. The cache memory includes an engine comprising one or more finite state machines. The processor also includes an interface to a bus over which the processor writes back modified cache lines from the cache memory to the system memory in response to encountering an architectural writeback and invalidate instruction. The processor also invalidates the state of the entries of the cache memory in response to encountering the architectural writeback and invalidate instruction. In response to being instructed to perform a cache diagnostic operation, for each entry of the entries, the engine writes the state and the tag of the entry on the bus and does not invalidate the state of the entry.
A processor includes a plurality of processing cores and a cache memory shared by the plurality of processing cores. The cache memory comprises a size engine that receives a respective request from each of the plurality of processing cores to perform an operation associated with the cache memory. The size engine fuses the respective requests from two or more of the plurality of processing cores into a fused request. To perform the fused request the size engine performs a single instance of the operation and notifies each of the two or more of the plurality of processing cores that its respective request has been completed when the single instance of the operation is complete.
A translation-lookaside buffer (TLB) includes a plurality of entries, wherein each entry of the plurality of entries is configured to hold an address translation and a local valid bit vector, wherein each bit of the local valid bit vector is mapped from a different value of an x86 instruction set architecture (ISA) process context identifier (PCID). The TLB also includes an input that receives an invalidation bit vector having bits corresponding to the bits of the local valid bit vector of the plurality of entries. The TLB also includes logic that simultaneously invalidates a bit of the local valid bit vector of each entry of the plurality of entries that corresponds to a set bit of the invalidation bit vector.
A processor includes translation-lookaside buffer (TLB) (206) and a mapping module (204). The TLB (206) includes a plurality of entries (300), wherein each entry of the plurality of entries (300) is configured to hold an address translation (306, 308) and a valid bit vector (302, 304), wherein each bit of the valid bit vector (302, 304) indicates, for a respective address translation context, the address translation (306, 308) is valid if set and invalid if clear. The TLB (206) also includes an invalidation bit vector (302, 304) having bits corresponding to the bits of the valid bit vector (302, 304) of the plurality of entries (300), wherein a set bit of the invalidation bit vector (302, 304) indicates to simultaneously clear the corresponding bit of the valid bit vector (302, 304) of each entry of the plurality of entries (300). The mapping module (204) generates the invalidation bit vector (302, 304).
A processor includes a mapping module that maps architectural virtual processor identifiers to non-architectural global identifiers and maps architectural process context identifiers to non-architectural local identifiers. The processor also includes a translation-lookaside buffer (TLB) having a plurality of address translations. For each address translation of the plurality of address translations: when the address translation is a global address translation, the address translation is tagged with a representation of one of the non-architectural global identifiers to which the mapping module has mapped one of the virtual processor identifiers; and when the address translation is a local address translation, the address translation is tagged with a representation of one of the non-architectural local identifiers to which the mapping module has mapped one of the process context identifiers.
A cache memory system including a primary cache and an overflow cache that are searched together using a search address. The overflow cache operates as an eviction array for the primary cache. The primary cache is addressed using bits of the search address, and the overflow cache is addressed by a hash index generated by a hash function applied to bits of the search address. The hash function operates to distribute victims evicted from the primary cache to different sets of the overflow cache to improve overall cache utilization. A hash generator may be included to perform the hash function. A hash table may be included to store hash indexes of valid entries in the primary cache. The cache memory system may be used to implement a translation lookaside buffer for a microprocessor.
A microprocessor executes a fused multiply-accumulate operation of a form ±A * B ± C by dividing the operation in first and second suboperations. The first suboperation selectively accumulates the partial products of A and B with or without C and generates an intermediate result vector and a plurality of calculation control indicators. The calculation control indicators indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The intermediate result vector, in combination with the plurality of calculation control indicators, provides sufficient information to generate a result indistinguishable from an infinitely precise calculation of the compound arithmetic operation whose result is reduced in significance to a target data size.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p.ex. décodage d'instructions
G06F 7/57 - Unités arithmétiques et logiques [UAL], c. à d. dispositions ou dispositifs pour accomplir plusieurs des opérations couvertes par les groupes ou pour accomplir des opérations logiques
42.
PROGRAMMING APPARATUS AND METHOD FOR REPAIRING CACHE ARRAYS IN MULTI-CORE MICROPROCESSOR
An apparatus includes a programmer, a stores, and a plurality of cores. The programmer programs a fuse array with compressed configuration data. The stores provides for storage and access of decompressed configuration data sets. Each of a plurality of cores is coupled to the fuse array. One of the cores is accesses the fuse array upon power-up/reset to decompress and store decompressed configuration data sets for one or more cache memories. Each of the cores includes reset logic and sleep logic. The reset logic employs the decompressed configuration data sets to initialize the one or more cache memories upon power- up/reset. The sleep logic determines that power is restored following a power gating event, and subsequently accesses the stores to retrieve and employ the decompressed configuration data sets to initialize the one or more caches following the power gating event.
An apparatus including a device programmer (310), a stores (1130), and a plurality of cores (332; 1101). The device programmer (310) programs a semiconductor fuse array (336) with compressed configuration data for a plurality of cores (332; 1101) disposed on a die (330). The stores (1130) has a plurality of sub-stores (1131; 1132; 1133; 1134) that each correspond to each of the plurality of cores (1101), where one of the plurality of cores (1101) is configured to access the semiconductor fuse array (336) upon power- up/reset to read and decompress the configuration data, and to store a plurality of decompressed configuration data sets for one or more cache memories (1102) within the each of the plurality of cores (1101) in the plurality of sub-stores (1131; 1132; 1133; 1134). The plurality of cores each has sleep logic (1106) that is configured to subsequently access a corresponding one of the each of the plurality of sub-stores (1131; 1132; 1133; 1134) to retrieve and employ the decompressed configuration data sets to initialize the one or more caches (1102) following a power gating event.
An apparatus includes a device programmer and a plurality of cores. The programmer programs a semiconductor fuse array with compressed configuration data. Each of the plurality of cores accesses the fuse array upon power-up/reset to read and decompress the compressed data, and stores decompressed data sets for one or more cache memories within the each of the plurality of cores in a stores that is coupled to the each of the plurality of cores. Each of the plurality of cores has reset logic and sleep logic. The reset logic employs the decompressed data sets to initialize the one or more cache memories upon power-up/reset. The sleep logic determines that power is restored following a power gating event, and subsequently accesses the stores to retrieve and employ the decompressed data sets to initialize the one or more caches following the power gating event.
An apparatus includes a device programmer and a stores. The device programmer programs a semiconductor fuse array with compressed configuration data for a plurality of cores disposed on a die. The stores includes a plurality of sub-stores that each correspond to each of the plurality of cores, where one of the plurality of cores is configured to access the semiconductor fuse array upon power-up/reset to read and decompress the compressed configuration data, and to store a plurality of decompressed configuration data sets for one or more cache memories within the each of the plurality of cores in the plurality of sub-stores, and where, following a power gating event, one of the each of the plurality of cores subsequently accesses a corresponding one of the each of the plurality of sub- stores to retrieve and employ the decompressed configuration data sets to initialize the one or more caches.
G11C 15/04 - Mémoires numériques dans lesquelles l'information, comportant une ou plusieurs parties caractéristiques, est écrite dans la mémoire et dans lesquelles l'information est lue au moyen de la recherche de l'une ou plusieurs de ces parties caractéristique utilisant des éléments semi-conducteurs
46.
PROCESSOR WITH APPROXIMATE COMPUTING FUNCTIONAL UNIT
A processor includes an indicator configured to indicate a first mode or a second mode and a functional unit configured to perform computations with a full degree of accuracy when the indicator indicates the first mode and to perform computations with less than the full degree of accuracy when the indicator indicates the second mode.
A processor includes a decoder that decodes an instruction that instructs the processor to perform subsequent computations in an approximate manner and a functional unit that performs the subsequent computations in the approximate manner in response to the instruction. An instruction instructs the processor to clear an error amount associated with a value stored in a general purpose register of the processor. The error amount indicates an amount of error associated with a result of a computation performed by the processor in an approximate manner. The processor also clears the error amount in response to the instruction. Another instruction specifies a computation to be performed and includes a prefix that indicates the processor is to perform the computation in an approximate manner. The functional unit performs the computation specified by the instruction in the approximate manner specified by the prefix.
A processor includes a storage configured to receive a snapshot of a state of the processor prior to performing a set of computations in an approximating manner. The processor also includes an indicator that indicates an amount of error accumulated while the set of computations is performed in the approximating manner. When the processor detects that the amount of error accumulated has exceeded an error bound, the processor is configured to restore the state of the processor to the snapshot from the storage.
A microprocessor includes a predicting unit having storage for holding a prediction history of characteristics of instructions previously executed by the microprocessor. The predicting unit accumulates the prediction history and uses the prediction history to make predictions related to subsequent instruction executions. The storage comprises a plurality of portions separately controllable for accumulating the prediction history. The microprocessor also includes a control unit that detects the microprocessor is running an operating system routine and controls the predicting unit to use only a fraction of the plurality of portions of the storage to accumulate the prediction history while the microprocessor is running the operating system routine.