An apparatus is provided for controlling the operating mode of control circuitry, such that the control circuitry may change between two operating modes. In an allocation mode, data that is loaded in response to an instruction is allocated into storage circuitry from an intermediate buffer, and the data is read from the storage circuitry. In a non-allocation mode, the data is not allocated to the storage circuitry, and is read directly from intermediate buffer. The control of the operating mode may be performed by mode control circuitry, and the mode may be changed in dependence on the type of instruction that calls the data, and whether the data may be used again in the near future, or whether it is expected to be used only once.
A processor to execute a plurality of tasks comprising a first task and a second task. At least a part of the first task is to be executed simultaneously with at least a part of the second task. The processor comprises a handling unit to: determine an available portion of a storage available during execution of the part of the first task; determine a mapping between at least one logical address associated with data associated with the part of the second task and a corresponding at least one physical address of the storage corresponding to the available portion; and identify, based on the mapping, the at least one physical address corresponding to the at least one logical address associated with the data, for storing the data in the available portion of the storage.
Prefetch circuitry generates, based on stream prefetch state information, prefetch requests for prefetching data to at least one cache. Cache control circuitry controls, based on cache policy information associated with cache entries in a given level of cache, at least one of cache entry replacement in the given level of cache, and allocation of data evicted from the given level of cache to a further level of cache. The stream prefetch state information specifies, for at least one stream of addresses, information representing an address access pattern for generating addresses to be specified by a corresponding series of prefetch requests. Cache policy information for at least one prefetched cache entry of the given level of cache (to which data is prefetched for a given stream of addresses) is set to a value dependent on at least one stream property associated with the given stream of addresses.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0871 - Allocation or management of cache space
A data processing apparatus is provided. Decode circuitry decodes a stream of instructions including a store instruction and a load instruction. Prediction circuitry predicts that the load instruction loads data from memory that is stored to the memory by the store instruction and the prediction is based on a hash of a program counter value of the store instruction.
A data processing apparatus includes detection circuitry that detects a parent instruction and a child instruction from a stream of instructions. The parent instruction references a destination register that is referenced as a source register by the child instruction. Adjustment circuitry then adjusts the child instruction to produce an adjusted child instruction whose behaviour is logically equivalent to a behaviour of executing the parent instruction followed by the child instruction.
Various implementations described herein are directed to a device having core circuitry and hardware with functional paths and canary paths that are co-located with the functional paths. The device may have timing monitors that monitor and measure digital timing margins of the functional paths and the canary paths during droop events. Also, the device may have a control processor that sets-up parameters for hardware droop mitigation based on the digital timing margins, wherein the control processor calibrates the hardware for droop response or for adaptive clock and power control for droop mitigation based on the digital timing margins.
An apparatus and method are provided for storing a plurality of translation entries in a cache, each translation entry corresponding to one of a plurality of page table entries and defining a translation between a first address and a second address, and encoding control information indicative of an attribute of each page table entry; returning, in response to a lookup querying a first lookup address, a corresponding second address when the first lookup address corresponds to one of the plurality of translation entries stored in the cache; modifying at least some of the control information in response to notification of a modification of the attribute in a page table entry; and retaining in the cache at least one translation entry corresponding to the page table entry for use in a subsequent address lookup querying a corresponding first lookup address in response to the notification of the modification of the attribute in the page table entry.
According to one implementation of the present disclosure, a method includes: receiving, by a hardware design generation circuit, a plurality of input signals of a software workload on a processing unit; training a power prediction model based on a toggling of the input signals accumulated over a training interval range; determining, by the hardware design generation circuit, a plurality of prediction proxies and respective weightings for the plurality of prediction proxies based at least partially on the trained power prediction model, wherein the plurality of weighted prediction proxies correspond to a power output of the hardware design generation circuit; and generating an updated circuit design of the processing unit based on the power output.
G06F 30/327 - Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
G06F 30/3308 - Design verification, e.g. functional simulation or model checking using simulation
There is provided a graphics primitive assembly circuit comprising an early primitive assembly data generator operable to supply primitive input to a shader and a buffer operable to store early primitive assembly data during operation of the shader and to supply the early primitive assembly data to a late primitive assembly circuit element responsive to completion of operation of the shader. The circuit may also include a compressor that compresses the early primitive assembly data to reduce the amount of storage taken up by the buffer and the bandwidth required to transfer the early primitive assembly data.
A data processing apparatus includes control flow prediction circuitry that generates a control flow prediction in respect of a group of one or more instructions. Storage circuitry used by the control flow prediction circuitry stores data in association with groups of instructions used to generate the control flow prediction for each of the groups of instructions. Control flow prediction update circuitry inserts new data into the storage circuitry in association with a new group of one or more instructions in dependence on one or more conditions being met when a miss occurs for the group of one or more instructions in the storage circuitry.
A data processing system that comprises a processing unit and a communications bus over which bus transactions to access memory can be performed is disclosed. The system includes a codec, and the processing unit can initiate over the communications bus, bus transactions that comprise the codec accessing the memory.
An apparatus, method and computer program, the apparatus comprising processing circuitry to execute instructions, issue circuitry to issue the instructions for execution by the processing circuitry, and candidate instruction storage circuitry to store a plurality of condition-dependent instructions, each specifying at least one condition. The issue circuitry is configured to issue a given condition-dependent instruction in response to a determination or a prediction of the at least one condition specified by the given condition-dependent instruction being met, and when the given condition-dependent instruction is a sequence-start instruction, the issue circuitry is responsive to the determination or prediction to issue a sequence of instructions comprising the sequence-start instruction and at least one subsequent instruction.
An apparatus has processing circuitry with execution units to perform operations, physical registers to store data, and forwarding circuitry to forward the data from the physical registers to the execution units. The forwarding circuitry provides an incomplete set of connections between the physical registers and the execution units such that, for each of at least some of the physical registers, the physical register is connected to only a subset of the execution units. The apparatus also has register renaming circuitry to map logical registers identified by the operations to respective physical registers and register reorganisation circuitry to monitor upcoming operations and to determine, based on the upcoming operations and the connections provided by the forwarding circuitry, whether to perform a register reorganisation procedure to change a mapping between the logical registers and the physical registers. The register reorganisation circuitry is also configured to perform the register reorganisation procedure.
An apparatus comprises an instruction decoder 20 and processing circuitry 22. Monitoring circuitry 36 monitors one or more events indicative of a potential update to data associated with any of a monitored set of addresses, and makes accessible to software executing on the processing circuitry 22 a monitoring reporting indication indicative of whether any events has occurred for at least one of the monitored set of addresses. In response to decoding of an exclusive status setting instruction specifying a given address, the processing circuitry 22 sets an exclusive status associated with the given address. The exclusive status is cleared in response to detecting an event indicative of a conflicting memory access to the given address. In response to decoding of a monitor exclusive instruction, the processing circuitry 22: determines whether the exclusive status is associated with a target address, and if so allocates the target address to be one of the monitored set of addresses.
Key capability storage circuitry 90 is provided to store a key capability specifying key bounds indicating information indicative of permissible bounds for information specified by any one or more of: a non-capability operand, a capability, or the key capability itself. For a given software compartment executed by the processing circuitry, which lacks a key capability operating privilege associated with at least a portion of the key capability storage circuitry, the processing circuitry is configured to prohibit certain manipulations of the key capability, including a transfer between key capability storage and a memory location selected by the given software compartment. This can help to support temporal safety.
G06F 21/80 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in storage media based on magnetic or optical technology, e.g. disks with sectors
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
Data communication apparatus comprises a receiver comprising message receiver circuitry to receive payload messages and sender control messages from message sender circuitry, the message receiver circuitry comprising: communication circuitry to send receiver control messages to the message sender circuitry, the receiver control messages relating to actions by the message receiver circuitry in response to payload messages or sender control messages from the message sender circuitry; in which the communication circuitry is configured to selectively associate a respective indication with at least some of the receiver control messages sent to the message sender circuitry, the indication indicating whether a given receiver control message with which the indication is associated is a first receiver control message sent by the communication circuitry to the message sender circuitry after a reset of circuitry in the receiver.
There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
One or more triggered-instruction processing elements are provided, a given triggered-instruction processing element comprising execution circuitry to execute processing operations in response to instructions according to a triggered instruction architecture. Input channel processing circuitry receives a given tagged data item (comprising a data value and a tag value) for a given input channel, and in response controls enqueuing of the data value of the given tagged data item to a selected buffer structure selected from among at least two buffer structures mapped onto register storage accessible to one or more of the triggered-instruction processing elements in response to a computation instruction for controlling performance of a computation operation. The selected buffer structure is selected based at least on the tag value, so data values of tagged data items specifying different tag values for the given input channel are allocatable to different buffer structures.
A computer-implemented method of operating a device is provided. The method comprises operating a sensor to capture a data input, individuating an element of the data input, tagging an individuated element with metadata, matching the metadata with an associated permission set, and applying a restricting function defined in the associated permission set to the individuated element during a process flow to produce augmented reality output data restricted as required by the associated permission set. A device is also provided, comprising a sensor, an individuating component to individuate an element of sensor data from the sensor, a tagging component to tag the individuated element, a matching component to match a tag of the individuated element with a permission of a permission set, and a restricting function component to restrict an application's interaction with the individuated element.
Apparatus, methods, and software for protecting a plurality of memory locations are disclosed. Logical addresses are translated into physical addresses in dependence on one of a first translation function and a second translation function. A transitional logical address and an associated transitional value are locally held in circuitry which applies the translation functions. A remapping of first to second translation function usage is performed by determining a new transitional physical address by applying the second translation function to the transitional logical address; determining a new transitional logical address by applying an inverse of the first translation function to the new transitional physical address; retrieving a new transitional value using the new transitional physical address; storing the old transitional value to the memory location indicated by the new transitional physical address; and locally storing the new transitional value. This remapping can be interleaved with normal memory accesses.
An apparatus comprises counter tree circuitry configured to store, in a first node of a counter tree, a representation of a parent counter value and in a second node of the counter tree, wherein the second node is a child node of the first node, an encrypted representation of two or more counter values. The encryption operation for forming the encrypted representation of the two or more counter values takes as an input the parent counter value. The apparatus also comprises integrity checking circuitry to check the integrity of an item of data retrieved from memory based on a comparison between a stored authentication code and a generated authentication code generated based on the item of data and a decrypted counter value determined from an encrypted representation of a counter value retrieved from the second node, decrypted using a parent counter value retrieved from the first node.
G06F 21/74 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information operating in dual or compartmented mode, i.e. at least one secure mode
An apparatus comprises counter integrity tree circuitry to maintain a counter integrity tree having a plurality of nodes. The counter integrity tree circuitry is configured to store, in a first node of the counter integrity tree, an encrypted representation of two or more non-repeating counters and in a second, parent, node, an indication of a function value equal to a non-repeating function of the two or more non-repeating counters of the first node. The apparatus comprises integrity checking circuitry configured to check the integrity of the first node using the function value retrieved from the second node.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
A method of operating a cache system is disclosed. When it is desired to evict a cache entry from the cache, a cache entry to evict from the cache is selected using an age of any compression block that the cache is caching data for, and the selected cache entry is evicted from the cache.
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
Disclose herein is a method of operating a graphics processor when performing ray tracing. During a traversal of the nodes of an acceleration data structure, when a parent node that encompasses multiple child node volumes is encountered, a group of rays is tested against the child node volumes to determine which child nodes may need to be visited next. Rather than simply visiting the nodes based on the order in which they are found to be interested, the node traversal order is instead determined based on the group of rays.
An apparatus has processing circuitry with one or more execution units to perform operations in response to instructions. The apparatus also has registers to store data accessed by the processing circuitry and forwarding circuitry to forward results of the operations from the execution units to be written back to the registers and to the execution units for use as operands of further operations. The apparatus also has write-back reschedule circuitry which for each operation causes an execution unit performing the operation to stall the operation prior to a write-back stage of the execution unit and determine, based on monitoring subsequent operations whether to forward the result of the operation to be written back to a register or to forward the result to an execution unit. The write-back reschedule circuitry also controls the forwarding circuitry to forward the result according to the determination.
There is provided an apparatus, method and medium. The apparatus comprises a store buffer to store a plurality of store requests, where each of the plurality of store requests identifies a storage address and a data item to be transferred to storage beginning at the storage address, where the data item comprises a predetermined number of bytes. The apparatus is responsive to a memory access instruction indicating a store operation specifying storage of N data items, to determine an address allocation order of N consecutive store requests based on a copy direction hint indicative of whether the memory access instruction is one of a sequence of memory access instructions each identifying one of a sequence of sequentially decreasing addresses, and to allocate the N consecutive store requests to the store buffer in the address allocation order.
According to one implementation of the present disclosure, a circuit structure is configured to store charge in a charge-based storage element, where the charge-based storage element is disposed at least partially in a shallow-trench-isolation (STI) region of the circuit. According to one implementation of the present disclosure, a method includes: providing a circuit structure disposed on a substrate and a shallow-trench-isolation (STI) region of a circuit; forming an opening of the substrate and the STI region by removing a portion of the substrate and STI region; placing a first liner material in the opening and on remaining portions of the substrate and the STI region; and depositing a first metal layer in the opening on the first liner material.
An apparatus and method are described for providing a trusted execution environment. The apparatus comprises processing circuitry to execute program code, and interrupt controller circuitry, responsive to receipt of one or more interrupt requests, to select a given interrupt request from amongst the one or more interrupt requests, and to issue an interrupt signal to the processing circuitry identifying a given interrupt service routine providing program code to be executed by the processing circuitry to service the given interrupt request. The interrupt controller circuitry is responsive to the given interrupt request being a trusted execution environment (TEE) interrupt request, to issue the interrupt signal to identify as the given interrupt service routine a TEE interrupt service routine, and to inhibit issuance of any further interrupt signal until the TEE interrupt service routine has been executed by the processing circuitry. The interrupt controller circuitry comprises code protection circuitry to inhibit unauthorised modification of the TEE interrupt service routine, and data protection circuitry to inhibit unauthorised access to confidential data processed by the TEE interrupt service routine.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
A context-information-dependent instruction causes a context-information-dependent operation to be performed based on specified context information indicative of a specified execution context. A context information translation cache 10 stores context information translation entries each specifying untranslated context information and translated context information. Lookup circuitry 14 performs a lookup of the context information translation cache based on the specified context information, to identify whether the context information translation cache includes a matching context information translation entry which is valid and which specifies untranslated context information corresponding to the specified context information. When the matching context information translation entry is identified, the context-information-dependent operation is performed based on the translated context information specified by the matching context information translation entry.
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0837 - Cache consistency protocols with software control, e.g. non-cacheable data
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
30.
METHOD, APPARATUS AND PROGRAM FOR ADJUSTING AN EXPOSURE LEVEL
A method for adjusting an exposure level of an image sensor. The method comprises receiving intensity values of a current image captured by the image sensor, the image sensor having an initial exposure level when the current image is captured. The method comprises determining an intensity status based on comparing at least one characteristic of at least the current intensity values to one or more criteria. The method comprises selecting an exposure convergence mode, from a plurality of exposure convergence modes, based on the intensity status. The method comprises calculating, based on the current intensity values and the exposure convergence mode, a new exposure level for use by the image sensor in capturing a subsequent image.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may implement, in whole or in part, techniques to process portions of an image frame according to a level of diminished signal information. Portions of an image frame experiencing diminished signal information may be sampled a lower rate/more sparsely to reduce impacts to downstream image processing resources.
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
A data processing apparatus includes receive circuitry that receives an indication of a trigger block of instructions. Branch prediction circuitry provides, in response to the trigger block of instructions, branch predictions in respect of a branch within: a subsequent block of instructions subsequent to the trigger block of instructions in execution order, when in a 1-taken mode of operation and a later block of instructions subsequent to the subsequent block of instructions in execution order, when in a 2-taken mode of operation
Circuitry comprises memory address translation circuitry to access memory circuitry storing translation information defining memory address translations from input memory addresses to respective output memory addresses; in which the translation information stored by the memory circuitry comprises a hierarchy of page table levels from a highest page table level to a lowest page table level, each page table level having one or more level tables each comprising two or more entries, in which an entry of a level table at a page table level other than a last page table level of the hierarchy points to a level table at a next lower page table level in the hierarchy; the memory address translation circuitry being configured to select an entry of a level table at each page table level according to a selection value, the selection value being dependent upon a portion, applicable to that page table level, of a given input memory address; in which the memory circuitry is configured to store entries as groups of entries, a group of entries being accessible by a single memory retrieval operation; and in which, for at least a subset of the page table levels, a group of entries stored by the memory circuitry comprises a set of entries from two or more respective level tables.
A cache is provided having a plurality of entries for storing data. In response to a given access request, lookup circuitry performs a lookup operation in the cache to determine whether one of the entries in the cache is allocated to store data associated with the memory address indicated by the given access request, with a hit indication or a miss indication being generated dependent on the outcome of that lookup operation. During a single lookup period, the lookup circuitry is configured to perform lookup operations in parallel for up to N access requests. In addition, allocation circuitry is provided that is able to determine, during the single lookup period, at least N candidate entries for allocation from amongst the plurality of entries, and to cause one of the candidate entries to be allocated for each of the up to N access requests for which the lookup circuitry generates a miss indication.
Processing circuitry (16) and an instruction decoder (9) supports a load chunk instruction and a store chunk instruction which can be useful for implementing memory copy functions and other library functions for manipulating or comparing blocks of memory. Number of bytes to load or store in response to these instructions is determined based on an implementation specific condition. As well as loading or storing bytes of data, the load chunk instruction and (10) store chunk instruction also designated a load/store length value as data corresponding to an architecturally visible register, which provides an indication of a number of bytes loaded or stored.
A method for processing an image captured by an image sensor. The method comprises receiving first pre-tonemapped intensity values of a first image captured by the image sensor. The method comprises applying a first tonemapping function having a first tonemapping strength to the first pre-tonemapped intensity values to generate first tonemapped intensity values. The method comprises receiving second pre-tonemapped intensity values of a second image captured by the image sensor, the second image captured subsequent to the first image. The method comprises, based on a difference between the first tonemapped intensity values and a target for the first tonemapped intensity values, determining a second tonemapping function having a second tonemapping strength. The method comprises applying the second tonemapping function to the second pre-tonemapped intensity values to generate second tonemapped intensity values.
H04N 23/741 - Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
H04N 23/71 - Circuitry for evaluating the brightness variation
H04N 23/76 - Circuitry for compensating brightness variation in the scene by influencing the image signals
37.
METHOD, APPARATUS AND PROGRAM FOR PROCESSING AN IMAGE
A method for processing an image. The method comprises receiving intensity values for each of a plurality of images captured by an image sensor during a detection window. The method comprises identifying, using the intensity values, one or more transitions occurring during the detection window, the one or more transitions each comprising a transition between an image in which a maximum clipping criterion is not satisfied and an image in which the maximum clipping criterion is satisfied. The method comprises, based on the identified transitions, at least one of: adjusting an exposure level of the image sensor; and determining a tonemapping function having a tonemapping strength, and applying the tonemapping function to the intensity values of a current image to generate tonemapped intensity values, wherein the image sensor has a current exposure level when the current image is captured.
Data processing systems, methods, and storage medium for implementing convolutional processes are provided. The data processing system includes a convolution engine and a set of weight decoders including a first weight decoder and a second weight decoder that implement a first decompression function and a second decompression function respectively. A weight decoder selection module for selecting a weight decoder from the set of weight decoders is provided. The data processing system, receives a compressed set of weight values, selects a weight decoder using the weight decoder selection module, and processes the compressed set of weight values using the selected weight decoder to obtain an uncompressed set of weight values. The uncompressed set of weight values are provided to the convolution engine. A corresponding data processing system, method, and storage medium for generating the compressed set of weight values is also provided.
There is provided an apparatus, method, and computer-readable medium. The apparatus comprises interconnect circuitry to couple a device to one or more processing elements and to one or more storage structures. The apparatus also comprises stashing circuitry configured to receive stashing transactions from the device, each stashing transaction comprising payload data and control data. The stashing circuitry is responsive to a given stashing transaction whose control data identifies a plurality of portions of the payload data, to perform a plurality of independent stashing decision operations, each of the plurality of independent stashing decision operations corresponding to a respective portion of the plurality of portions of payload data and comprising determining, with reference to the control data, whether to direct the respective portion to one of the one or more storage structures or whether to forward the respective portion to memory.
An apparatus comprising a cache comprising a plurality of cache entries, cache access circuitry responsive to a cache access request to perform, based on a target memory address associated with the cache access request, a cache lookup operation, tracking circuitry to track pending requests to modify cache entries of the cache, and prediction circuitry responsive to the cache access request to make a prediction of whether the pending requests tracked by the tracking circuitry include a pending request to modify a cache entry associated with the target memory address, wherein the cache access circuitry is responsive to the cache access request to determine, based on the prediction, whether to perform an additional lookup of the tracking circuitry. A method and a non-transitory computer-readable medium to store computer-readable code for fabrication of the apparatus are also provided.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
41.
SYSTEM, METHOD AND/OR APPARATUS FOR CONTROLLING A POWER SIGNAL
Briefly, embodiments, such as methods, systems and/or circuits for controlling a power signal to be supplied to a processing device. In one aspect, a magnitude of a power supplied to a processing device may be changed based, at least in part on an estimated and/or predicted load.
An on-chip memory is provided. The memory includes wordline sections, input/output (I/O) circuitry, and control circuitry. Each wordline section includes a number of wordlines, and each wordline section is coupled to a different wordline control circuitry. The control circuitry is configured to, in response to receiving an access request including an address, decode the address including determine, based on the address, an associated wordline, and determine, based on the associated wordline, an associated wordline section. The control circuitry is further configured to apply power to wordline control circuitry coupled to the associated wordline section, and access the address.
G11C 8/08 - Word line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, for word lines
G11C 7/10 - Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
G11C 8/18 - Address timing or clocking circuits; Address control signal generation or management, e.g. for row address strobe [RAS] or column address strobe [CAS] signals
A behavioral sensor for creating consumable events can include: a feature extractor coupled to receive an event stream of events performed by a circuit, wherein the feature extractor identifies features of a particular event of the event stream and associates the particular event with a time; and a classifier coupled to receive the features of the particular event from the feature extractor, wherein the classifier classifies the particular event into a classified event associated with the time using predefined categories based on the received features of the particular event; whereby the classified event and subsequent classified events extracted from the event stream within a time frame are appended in a time series forming the consumable events.
A burst read with flexible burst length for on-chip memory, such as, for example, system cache memory, hierarchical cache memory, system memory, etc. is provided. Advantageously, successive burst reads are performed with less signal toggling and fewer bitline swings.
Dynamic power management for an on-chip memory, such as a system cache memory as well as other memories, is provided. The memory includes wordline sections, input/output (I/O) circuitry, and control circuitry. Each wordline section includes a number of wordlines, and each wordline section is coupled to a different wordline control circuitry. The control circuitry is configured to, in response to receiving an access request including an address, decode the address including determine, based on the address, an associated wordline, and determine, based on the associated wordline, an associated wordline section. The control circuitry is further configured to apply power to wordline control circuitry coupled to the associated wordline section.
Circuitry including cache storage and control circuitry is provided. The cache storage includes an array of random access memory storage elements, and is configured to store data in multiple cache sectors, each cache sector including a number of cache storage data units. The control circuitry is configured to control access to the cache storage including, for example, accessing the cache storage data units in the cache sectors. After accessing a cache storage data unit in a cache sector, the energy requirement and/or latency for the next access to a cache storage data unit in the same sector is lower than the energy requirement and/or latency for the next access to a cache storage data unit in a different same sector.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to determine options for decisions in connection with design features of a computing device. In a particular implementation, design options for two or more design decisions of neural network processing device may be identified based, at least in part, on combination of a definition of available computing resources and one or more predefined performance constraints.
Systems and methods for processing data for a neural network are described. The system comprises non-transitory memory configured to receive data bits defining a kernel of weights, the data bits being suitable for processing input data; and a data processing unit, configured to: receive bits defining a kernel of weights for the neural network, the kernel of weights comprising one or more non-zero value weights and one or more zero-valued weights; generate a set of mask bits, a position of each bit in the set of mask bits corresponds to a position within the kernel of weights and the value of each bit indicates whether a weight in the corresponding position is a zero-valued weight or a non-zero value weight; and transmit the non-zero value weights and the set of mask bits for storage, the non-zero value weights and the set of mask bits represent the kernel of weights.
H03M 7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
49.
TECHNIQUE FOR TRACKING MODIFICATION OF CONTENT OF REGIONS OF MEMORY
Address translation circuitry (20) converts virtual addresses into physical addresses with reference to intermediate level and final level page tables. Final level descriptors within final level page tables identify address translation data for an associated region of memory. Intermediate level descriptors within intermediate level page tables identify intermediate address translation data used to identify an associated page table at a next level of the page tables. Page table update circuitry (35) maintains state information within each final and intermediate level descriptor, and updates the state information from a clean state to a dirty state: in the final level descriptors to indicate that a modification of content of the associated memory region is permitted; in the intermediate level descriptors to indicate occurrence of an update from the clean state to the dirty state within the state information of any final level descriptors that are accessed via that intermediate level descriptor.
An apparatus and method of converting data into an Enhanced Block Floating Point (EBFP) format with a shared exponent is provided. The EBFP format enables data within a wide range of values to be stored using a reduced number of bits compared with conventional floating-point or fixed-point formats. The data to be converted may be in any other format, such as fixed-point, floating-point, block floating-point or EBFP.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
In a data processor, an input datum, having a sign, a tag and a payload, is decoded by first determining a format of the payload based on the tag. For a first format, an exponent difference and an output fraction are decoded from the payload. For a second format, an exponent difference is decoded from the payload and the output fraction may be assumed to be zero. The exponent difference is subtracted from a shared exponent to produce the output exponent. The decoded output may be stored in a standard format for floating-point numbers.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
When performing tile-based rendering in a graphics processing system, lists indicative of fragments to be processed are maintained for respective sub-regions of tiles to be rendered, with each list entry including, inter alia, at least an indication of the coverage within the tile sub-region of the group of fragments that the list entry represents, and an indication of whether the group of fragments that the list entry represents is eligible to undergo particular processing operations. The coverage information and eligibility information for the list entries is then used to control the processing of fragments for sub-regions of a tile, in such a way as to ensure that processing order dependencies are enforced and met.
When performing tile-based rendering in a graphics processing system, lists indicative of fragments to be processed are maintained for respective sub-regions of tiles to be rendered, with each list entry representing a group of one or more fragments and including an indication of the coverage within the tile sub-region of the group of fragments that the list entry represents. The coverage information for the list entries is then used to set for entries in the list indicative of fragments to be processed for a sub-region, information indicating whether one or more processing operations are eligible to be performed for fragments that entries in the list represent.
A method and processor comprising a command processing unit to receive, from a host processor, a sequence of commands to be executed; and generate based on the sequence of commands a plurality of tasks. The processor also comprises a plurality of compute units each having a first processing module for executing tasks of a first task type, a second processing module for executing tasks of a second task type, different from the first task type, and a local cache shared by at least the first processing module and the second processing module. The command processing unit issues the plurality of tasks to at least one of the plurality of compute units, and wherein at least one of the plurality of compute units is to process at least one of the plurality of tasks.
In a data processing system comprising a first cache operable to store data for use when performing a data processing operation, and a second cache operable to store data required for fetching data into the first cache from memory, when it is determined that there is no entry for data for a data processing operation in the first cache, an entry in the first cache is allocated for the required data, and information that indicates an entry in the second cache for data required for fetching the required data is stored in the tag portion of the allocated entry. Then, once a request has been sent to a memory system for the required data, the information in the tag portion for the allocated entry in the first cache that indicates an entry in the second cache is replaced with information indicative of an address for the data required for fetching the required data.
There is provided a processor configured to transfer data to a plurality of processor circuits. The apparatus includes broadcast circuitry that broadcasts first machine learning data to at least a subset of the plurality of processor circuits.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, techniques to apply an image anti-aliasing operation to an image frame.
Various implementations described herein are related to a device having bitline drivers coupled to passgates of bitcells via bitlines and buried metal lines formed within a substrate including a buried enable signal line and a buried ground line coupled to ground connections of the bitline drivers. The buried enable signal line transfers a negative bias to a selected bitline of the bitlines via the buried ground line that is coupled to the ground connections of the bitline drivers so as to increase gate-source bias of the passgates of the selected bitcell to thereby enhance write capability of the selected bitcell.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
H01L 27/11 - Static random access memory structures
An apparatus supports decoding and execution of a bulk memory instruction specifying a block size parameter. The apparatus comprises control circuitry to determine whether the block size corresponding to the block size parameter exceeds a predetermined threshold, and performs a micro-architectural control action to influence the handling of at least one bulk memory operation by memory operation processing circuitry. The micro-architectural control action varies depending on whether the block size exceeds the predetermined threshold, and further depending on the states of other components and operations within or coupled with the apparatus. The micro-architectural control action could include an alignment correction action, cache allocation control action, or processing circuitry selection action.
A data processing apparatus is configured to determine a product of two operands stored in an Extended Block Floating-Point format. The operands are decoded, based on their tags and payloads, to generate exponent differences and at least the fractional parts of significands. The significands are multiplied to generate an output significand and shared exponents and exponent differences of the operands are combined to generate an output exponent. Signs of the operands may also be combined to provide an output sign. The apparatus may be combined with an accumulator having one or more lanes to provide an apparatus for determining dot products.
In a data processor, an input value having a sign, an exponent and a significand is encoded by determining an exponent difference between a base exponent and the exponent. When the exponent difference is not less than a first threshold, only the exponent difference, or a designated value, is encoded to a payload of the output value and one or more tag bits of the output value are set to a first value. When the exponent difference is less than the first threshold, the significand and exponent difference are encoded to the payload of an output value and, optionally, the one or more tag bits of the output value. A sign bit in the output value is set corresponding to the sign of the input value, and the output value is stored.
A method to operate a head-mountable processing system, is provided. The head-mountable processing system comprising generating one or more control signals based upon a visual motion of a sequence of images for display by the head-mountable processing system, and transmitting the generated one or more control signals to a plurality of transducers to stimulate a wearer's vestibular system.
A61H 23/02 - Percussion or vibration massage, e.g. using supersonic vibration; Suction-vibration massage; Massage with moving diaphragms with electric or magnetic drive
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
H04R 3/12 - Circuits for transducers for distributing signals to two or more loudspeakers
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
63.
APPARATUS AND METHOD OF OPTIMISING DIVERGENT PROCESSING IN THREAD GROUPS
A data processor is disclosed in which groups of execution threads comprising a thread group can execute a set of instructions in lockstep, and in which a plurality of execution lanes can perform processing operations for the execution threads. In response to an execution thread issuing circuit determining whether a portion of active threads of a first thread group and a portion of active threads of a second thread group use different execution lanes of the plurality of execution lanes, the execution thread issuing circuit issuing both the portion of active threads of a first thread group and a portion of active threads of a second thread group for execution. This can have the effect of increasing data processor efficiency, thereby increasing throughput and reducing latency.
Disclosed herein is a graphics processor that comprises a programmable execution unit operable to execute programs to perform graphics processing operations. The graphics processor further comprises a dedicated machine learning processing circuit operable to perform processing operations for machine learning processing tasks. The machine learning processing circuit is in communication with the programmable execution unit internally to the graphics processor. In this way, the graphics processor can be configured such that machine learning processing tasks can be performed by the programmable execution unit, the machine learning processing circuit, or a combination of both, with the different units being able to message each other accordingly to control the processing.
There is provided an apparatus configured to operate as a shader core, the shader core configured to perform a complex rendering process comprising a rendering process and a machine learning process, the shader core comprising: one or more tile buffers configured to store data locally to the shader core, wherein during the rendering process, the one or more tile buffers are configured to store rendered fragment data relating to a tile; and during the machine learning process, the one or more tile buffers are configured to store an input feature map, kernel weights or an output feature map relating to the machine learning process.
Aspects of the present disclosure relate to an apparatus comprising a plurality of processing elements having a spatial layout, and control circuitry to assign workloads to said plurality of processing elements. The control circuitry is configured to, based on a timing parameter, determine one or more active processing elements to deactivate; determine, based on the spatial layout, one or more inactive processing elements to activate; and deactivate said one or more active processing elements and activate said one or more inactive processing elements.
A masked-vector-comparison instruction specifies a source vector operand comprising a plurality of source data elements, a mask value, and a comparison target operand. In response to the masked-vector-comparison instruction, an instruction decoder 10 controls processing circuitry 16 to: for each active source data element of the source vector operand, determine whether the active source data element satisfies a comparison condition, based on a masked comparison between one or more compared bits of the active source data element and one or more compared bits of the comparison target operand, the mask value specifying a pattern of compared bits and non-compared bits within the comparison target operand and the active source data element; and generate a result value indicative of which of the source data elements of the source vector operand, if any, is an active source data element satisfying the comparison condition. This instruction is useful for variable length decoding operations.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, techniques to process image signal values sampled from a multi color channel imaging device. In particular, methods and/or techniques disclosed herein are directed to synthesizing a temporally upsampled image frame to be in a temporal sequence of images frames.
G06T 3/40 - Scaling of a whole image or part thereof
G06T 3/00 - Geometric image transformation in the plane of the image
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
69.
SYSTEM, METHOD AND/OR APPARATUS FOR MAGNETIC MEMORY TESTING
Briefly, embodiments, such as methods and/or systems for operations and/or procedures to test magnetic memory devices. In a particular implementation, a bit error rate of a magnetic memory device may be estimated based, at least in part, on an observed bit error rate in the presence of an externally applied magnetic field.
There is provided a neural processing unit for calculating an attention matrix during machine learning inference. The neural processing unit is configured to calculate: a first score matrix based on differences between a query matrix and a key matrix; a second score matrix based on differences between the key matrix and a learned key matrix; a similarity matrix based on a combination of the first score matrix and second score matrix; and an attention matrix comprising applying a normalisation function to the similarity matrix. Also provided is an apparatus comprising at least one said neural processing unit and at least one memory, the memory configured to pass, on demand, a learned key matrix to the neural processing unit. Also provided is a computer program product having computer readable program code stored thereon which, when executed by said neural processing unit, causes the unit to perform said calculations.
Various implementations described herein are directed to a method that tests and repairs memory fabricated on a wafer or a package. The method may generate and store a reuse table based on memory repair results. The method may manufacture the memory after repairing the memory. The method may access and reuse data stored in the reuse table to repair the memory after manufacturing the memory.
A data processing apparatus is provided. Prefetch circuitry generates a prefetch request for a cache line prior to the cache line being explicitly requested. The cache line is predicted to be required for a store operation in the future. Issuing circuitry issues the prefetch request to a memory hierarchy and filter circuitry filters the prefetch request based on at least one other prefetch request made to the cache line, to control whether the prefetch request is issued by the issuing circuitry.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
74.
SYSTEM, DEVICES AND/OR PROCESSES FOR APPLICATION OF KERNEL COEFFICIENTS
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, techniques to process image signal intensity values sampled from a multi color channel imaging device. In particular, methods and/or techniques disclosed herein are directed to processing image signal intensity values by application of kernel coefficients to the image signal intensity values.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/40 - Extraction of image or video features
Briefly, embodiments, such as methods and/or systems for employing external memory devices in the execution of activation function such as activation functions implemented in a neural network. In one aspect, a first activation input tensor may be partitioned as a plurality of tensor segments stored in one or more external memory devices. Individual stored tensor segments may be sequentially loaded to memories local to processing circuitry to apply activation functions associated with the stored tensor segments.
There is provided a data processing apparatus in which decode circuitry receives a memory copy instruction containing an indication of a source area of memory, an indication of a destination area of memory, and an indication of a remaining copy length. In response to receiving the memory copy instruction, the decode circuitry generates at least one active memory copy operation or a null memory copy operation. The active memory copy operation causes one or more execution units to perform a memory copy from part of the source area of memory to part of the destination area of memory and the null memory copy operation leaves the destination area of memory unmodified.
Methods and systems for detecting errors when performing a convolutional operation is provided. Predicted checksum data, corresponding to input checksum data and kernel checksum data, is obtained. The convolutional operation is performed to obtain an output feature map. Output checksum data is generated and the predicted checksum data and the output checksum data are compared, the comparing taking account of partial predicted checksum data configured to correct for a lack of padding when performing the convolution operation, wherein the partial predicted checksum data corresponds to input checksum data for a subset of the values in the input feature map and kernel checksum data for a subset of the values in the kernel.
There is provided a data processing apparatus in which receive circuitry receives a result signal from a lower level cache and a higher level cache in respect of a first instruction block. The lower level cache and the higher level cache are arranged hierarchically and transmit circuitry transmits, to the higher level cache, a query for the result signal. In response to the result signal originating from the higher level cache containing requested data, the transmit circuitry transmits a further query to the higher level cache for a subsequent instruction block at an earlier time than the further query is transmitted to the higher level cache when the result signal containing the requested data originates from the lower level cache.
According to one implementation of the present disclosure, a cache memory includes: a plurality of cache-lines, wherein each row of cache-lines comprises: tag bits of a tag-random access memory (tag-RAM); data bits of a data-random access memory (data-RAM), and a single set of retention bits corresponding to the tag-RAM. According to one implementation of the present disclosure, a method includes: sampling a single set of retention bits of a cache-line of a cache memory, where the cache-line comprises the single set of retention bits, tag-RAM and data-RAM, and where at least the single set of retention bits comprise eDRAM bitcells; and performing a refresh cycle of at least the data-RAM corresponding to the tag-RAM based on the sampled single set of retention bits.
A data processing system (1) comprises a plurality of processing units (11) and a controller (30) operable to allocate processing units of the plurality of processing units into respective groups of the processing units, wherein each group of processing units comprises a set of one or more of the processing units of the plurality of processing units. The data processing system further comprises an arbiter (31, 32) for each group of processing units for controlling access by virtual machines (33, 34) that require processing operations to the processing units of the group of processing units that the arbiter has been allocated.
Security measures for signal paths with tree structures can be implemented at design phase using an EDA software program or tool with security feature functionality that, when executed by a computing system, directs the computing system to: display a canvas through which components of a circuit are arranged; and provide a menu of commands, including an option to add components from a library to the canvas and an option to secure a tree. In response to receiving a selection of the option to secure the tree, the system can be directed to add a hardware countermeasure coupled to at least two lines or terminal nodes of a tree structure identified from components on the canvas or in a netlist corresponding to a circuit's design.
G06F 21/75 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by inhibiting the analysis of circuitry or operation, e.g. to counteract reverse engineering
G06F 30/398 - Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
The present disclosure relates to a method of inter-layer format conversion for a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the method comprising: extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining one or more conversion parameters based on the extracted data statistics and the second data format; and generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters.
According to one implementation of the present disclosure, an integrated circuit includes comparator circuitry coupled to peripheral circuitry of a multiport memory and configured to transmit one or more data input signals or one or more write enable signals to respective memory outputs when a memory address collision is detected for one or more respective bitcells of the multi-port memory. In another implementation, a method comprises: detecting a read operation and a write operation to a same memory bitcell of a multiport memory in one clock cycle and in response to the detection, performing the read operation of a data input signal or a write enable signal of the multiport memory.
A method, system and apparatus provide bit-sparse neural network optimization. Rather than quantizing and pruning weight and activation elements at the word level, weight and activation elements are pruned at the bit level, which reduces the density of effective “set” bits in weight and activation data, which, advantageously, reduces the power consumption of the neural network inference process by reducing the degree of bit-level switching during inference.
An apparatus comprises an non-inclusive cache (14) configured to cache data and coherency control circuitry (16). The coherency control circuitry is configured to look up the non-inclusive cache in response to a coherent access request from a first requestor (4). In response to determining that the coherent access request can be serviced using data stored in a matching entry of the non-inclusive cache, the coherency control circuitry references snoop-filter information associated with the matching entry to determine whether the first requestor can use the data stored in the matching entry without waiting for a response to a snoop of a coherent cache (8).
There is provided a computer-implemented method of defining bounding boxes for a primitive in a tile-based graphics processing pipeline comprising determining a part-way point on the primitive, wherein, for each pair of vertices, a part-way point is part-way between that pair of vertices, and defining a plurality of bounding boxes, wherein each bounding box intersects a part-way point. Also provided is a bounding box generation circuit comprising a part-way point calculation circuit to determine a part-way point on the primitive, wherein, for each pair of vertices, a part-way point is part-way between that pair of vertices, wherein the bounding box generation circuit to define a plurality of bounding boxes based upon the determined part-way point, wherein each bounding box intersects a part-way point. A method of defining bounding boxes for a point primitive is also provided.
A data processing system (1) comprises a plurality of, e.g. graphics, processing units (11), and a management circuit (12) associated with the processing units and operable to configure the processing units of the plurality of processing units into respective groups of the processing units. The management circuit (12) is configured to always operate with a high level of fault protection, but the groups of the processing units can be selectively operated with either a higher level of fault protection or a lower level of fault protection, by selectively subjecting them to fault detection testing (60).
Various implementations described herein are related to a device having memory architecture having multiple bitcell arrays. The device may include column multiplexer circuitry coupled to the memory architecture via multiple bitlines for read access operations. The column multiplexer circuitry may perform read access operations in the multiple bitcell arrays via the bitlines based on a sense amplifier enable signal and a read multiplexer signal. The device may include control circuitry that provides the read multiplexer signal to the column multiplexer circuitry based on a clock signal and the sense amplifier enable signal so that the column multiplexer circuitry is able to perform the read access operations.
Processing circuitry performs a processing operation to generate a two's complement result value representing a positive or negative number in two's complement representation. Normalization-and-rounding circuitry converts the two's complement result value to a normalized-and-rounded floating-point result value represented using sign-magnitude representation. The normalization-and-rounding circuitry comprises incrementing circuitry to perform an increment addition (e.g. a rounding increment or a conversion increment) to generate a fraction of the normalized-and-rounded floating-point result value. For an operation where the increment addition is required to be performed, tininess detection circuitry detects the after-rounding tininess status based on a still-to-be-incremented version of the normalized-and-rounded floating-point result value prior to the increment addition by the increment circuitry.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
Various implementations described herein are directed to a device having memory with banks of bitcells with each bank having a bitcell array. The device may have header circuitry that powers-up a selected bank and powers-down unselected banks during a wake-up mode of operation. In some instances, only the selected bank of the memory is powered-up with the header circuitry during the wake-up mode of operation.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
91.
METHODS AND APPARATUS FOR TRANSFERRING DATA WITHIN HIERARCHICAL CACHE CIRCUITRY
Aspects of the present disclosure relate to an apparatus comprising processing circuitry, first cache circuitry and second cache circuitry, wherein the second cache circuitry has an access latency higher than an access latency of the first cache circuitry. The second cache circuitry is responsive to receiving a request for data stored within the second cache circuitry to identify said data as pseudo-invalid data and provide said data to the first cache circuitry. The second cache circuitry is responsive to receiving an eviction indication, indicating that the first cache circuitry is to evict said data, to, responsive to determining that said data has not been modified since said data was provided to the first cache circuitry, identify said pseudo-invalid data as valid data.
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
92.
METHODS AND APPARATUS FOR MANAGING TRUSTED DEVICES
Aspects of the present disclosure relate to an apparatus comprising TEE circuitry configured to maintain a list of trusted devices, and interface circuitry to provide communication between the TEE of the apparatus and TEE circuitry of a device communicatively coupled to the apparatus. The TEE circuitry of the apparatus is configured to perform, with the TEE circuitry of the device, a remote attestation in respect of the TEE circuitry of the device. Responsive to a positive outcome of the remote attestation, the device is added to the list of trusted devices. The TEE of the apparatus receives, from the TEE circuitry of the device, an indication of one or more further devices which are trusted by the device, and adds said one or more further devices to the list of trusted devices.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
93.
PREDICTION OF NUMBER OF ITERATIONS OF A FETCHING PROCESS
Prediction circuitry predicts a number of iterations of a fetching process to be performed to control fetching of data/instructions for processing operations that are predicted to be performed by processing circuitry. The processing circuitry can tolerate performing unnecessary iterations of the fetching process following an over-prediction of the number of iterations. In response to the processing circuitry resolving an actual number of iterations, the prediction circuitry adjusts the prediction state information used to predict the number of iterations, based on whether a first predicted number of iterations, predicted based on a first iteration prediction parameter, provides a good prediction (when the first predicted number of iterations is in a range i_cnt to i_cnt+N, where i_cnt is the actual number of iterations and N≥1), or a misprediction (when the first predicted number of iterations is outside the range i_cnt to i_cnt+N).
An apparatus comprises a cache comprising a plurality of cache entries, and cache replacement control circuitry to select, in response to a cache request specifying a target address missing in the cache, a victim cache entry to be replaced with a new cache entry. The cache request specifies a partition identifier indicative of an execution environment associated with the cache request. The victim cache entry is selected based on re-reference interval prediction (RRIP) values for a candidate set of cache entries. The RRIP value for a given cache entry is indicative of a relative priority with which the given cache entry is to be selected as the victim cache entry. Configurable replacement policy configuration data is selected based on the partition identifier, and the RRIP value of the new cache entry is set to an initial value selected based on the selected configurable replacement policy configuration data.
An apparatus comprises a cache comprising a plurality of cache entries, and cache replacement control circuitry to select, in response to a cache request specifying a target address missing in the cache, a victim cache entry to be replaced with a new cache entry. The cache request specifies a partition identifier indicative of an execution environment associated with the cache request. The victim cache entry is selected based on re-reference interval prediction (RRIP) values for a candidate set of cache entries. The RRIP value for a given cache entry is indicative of a relative priority with which the given cache entry is to be selected as the victim cache entry. Configurable replacement policy configuration data is selected based on the partition identifier, and the RRIP value of the new cache entry is set to an initial value selected based on the selected configurable replacement policy configuration data.
[FIG. 1]
G06F 12/121 - Replacement control using replacement algorithms
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
There is provided a data processing apparatus comprising history storage circuitry that stores sets of behaviours of helper instructions for a control flow instruction. Pointer storage circuitry stores pointers, each associated with one of the sets. The behaviours in the one of the sets are indexed according to one of the pointers associated with that one of the sets. Increment circuitry increments at least some of the pointers in response to an increment event and prediction circuitry determines a predicted behaviour of the control flow instruction using one of the sets of behaviours.
Processing circuitry performs processing operations in response to micro-operations. Front end circuitry supplies the micro-operations to be processed by the processing circuitry. Prediction circuitry generates a prediction of a number of loop iterations for which one or more micro-operations per loop iteration are to be supplied by the front end circuitry, where an actual number of loop iterations to be processed by the processing circuitry is resolvable by the processing circuitry based on at least one operand corresponding to a first loop iteration to be processed by the processing circuitry. The front end circuitry varies, based on a level of confidence in the prediction of the number of loop iterations, a supply rate with which the one or more micro-operations for at least a subset of the loop iterations are supplied to the processing circuitry.
An apparatus has processing circuitry to perform an accumulation operation in which a first addend is added to a second addend. The apparatus has storage circuitry to store the second addend in a plurality of lanes, each lane having a significance different to that of each other lane. Each lane within at least a subset of the lanes comprises at least one overlap bit having the same bit significance as a bit in an adjacent more significant lane in the plurality of lanes. The accumulation operation includes selecting an accumulating lane out of the plurality of lanes and performing an addition operation between bits of the accumulating lane and the first addend. The at least one overlap bit of the accumulating lane enables the addition operation to be performed without a possibility of overflowing the accumulating lane.
Various implementations described herein are directed to a device having memory circuitry having multi-port bitcells, wherein each bitcell of the multi-port bitcells has a read-write port and a read port. The device may have read-write circuitry coupled to the read-write port, wherein the read-write circuitry has write-drive logic and read-sense logic that provide for at least one write and at least one read in a single clock cycle.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
99.
TECHNIQUE FOR CONSTRAINING ACCESS TO MEMORY USING CAPABILITIES
An apparatus and method for constraining access to memory using capabilities. Processing circuitry performs operations during which access requests to memory are generated, with memory addresses for the access requests being generated using capabilities that identify constraining information. Capability checking circuitry performs a capability check operation to determine whether a given access request whose memory address is generated using a given capability is permitted based on the constraining information. Memory access checking circuitry then further constrains access to the memory by the given access request in dependence on a level of trust. The given capability has a capability level of trust associated therewith, and the level of trust associated with the given access request is dependent on both the current mode level of trust associated with the current mode of operation of the processing circuitry, and the capability level of trust of the given capability.
A data processing system that comprises plural processing units is disclosed. The system includes functional units, the functional units having different processing capacities. A set of one or more processing units can operate in combination with one of the functional units according to a processing capacity required for the set of one or more processing units.