An apparatus includes processing circuitry that performs data processing in response to instructions of a software execution environment. Configuration storage circuitry stores a set of memory transaction parameters in association with a partition identifier, and configuration application circuitry applies the set of memory transaction parameters with respect to memory transactions issued by the software execution environment that identifies the partition identifier. The memory transaction parameters comprise a minimum target allocation and a maximum target allocation of a storage capacity of at least part of a memory system in handling the memory transaction that identifies the partition identifier.
A behavioral system level detector and method that filters local alerts to generate system alerts with an increased confidence level is provided. The method includes receiving local alerts from a local detector that detects events from a processing unit, wherein each local alert comprises information of an event from the processing unit and a timing relationship for the event, filtering the local alerts to determine events indicating an undesirable behavior or attack, and responsive to the determination that there are events indicating the undesirable behavior or the attack, generating a system alert. The behavioral system-level detector includes a shared data structure for storing local alerts received from at least one local detector and system processing unit coupled to the shared data structure to receive the local alerts and coupled to receive state information from the processing units.
Various implementations described herein are related to a device including a bitcell having a bitcell layout with a first metal layer, a second metal layer and a via programming layer. The device may have a via marking layer provided in the bitcell layout for the bitcell, and the via marking layer controls optical proximity correction of the first metal layer and the second metal layer.
G11C 17/12 - Read-only memories programmable only once; Semi-permanent stores, e.g. manually-replaceable information cards using semiconductor devices, e.g. bipolar elements in which contents are determined during manufacturing by a predetermined arrangement of coupling elements, e.g. mask-programmable ROM using field-effect devices
G11C 7/18 - Bit line organisation; Bit line lay-out
Data storage circuitry has entries to store data according to a data storage technology supporting non-destructive reads, each entry associated with an error checking code (ECC) and age indication. Scrubbing circuitry performs a patrol scrubbing cycle to visit each entry of the data storage circuitry within a scrubbing period. On a given visit to a given entry, the scrubbing operation comprises determining, based on the age indication associated with the given entry, whether a check-not-required period has elapsed for the given entry, and if so performing an error check on the data of the given entry using the ECC for that entry. The error check is omitted if the check-not-required period has not yet elapsed. The check-not-required period is restarted for a write target entry in response to a request causing an update to the data and the error checking code of the write target entry. The check-not-required period is restarted for a read target entry in response to a request causing the data of a read target entry to be non-destructively read and subject to the error check.
There is provided an apparatus and method, the apparatus comprising storage circuitry to store event information associated with instructions occurring between instrumentation points. The event information indicates a plurality of different types of events expected to occur during execution of the instructions. The event information comprises, for each event, type information indicating a type of that event and an expected number of occurrences of that event. The apparatus is also provided with monitoring circuitry comprising a plurality of programmable counters. The monitoring circuitry is responsive to a start instrumentation point, to assign at least a subset of the plurality of programmable counters to measure, during execution of the program instructions, occurrences of the plurality of different types of events identified in the event information. The monitoring circuitry is responsive to at least one counter deviating from the expected number of occurrences indicated by that counter, to perform a predetermined action.
Addition circuitry performs a saturating addition of a first number and a second number to generate a result value indicating an addition result corresponding to addition of the first number and the second number when the addition result is within a predetermined range and indicating a saturation value when the addition result is outside the predetermined range. The addition circuitry comprises: saturation lookahead circuitry to determine, for each lane of the result value, a respective set of one or more saturation lookahead status indications indicative of whether that lane should be set to represent part of the saturation value; and addition result generating circuitry to generate result bits for each lane, with a given lane of the result value having a value determined as a function of corresponding bits of the first and second numbers and a corresponding set of one or more saturation lookahead status indications determined for that lane by the saturation lookahead circuitry.
G06F 7/508 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages using carry look-ahead circuits
An apparatus comprises a divide/square-root pipeline comprising: a plurality of divide/square-root iteration pipeline stages each to perform a respective iteration of a digit-recurrence divide or square root operation; and signal paths to supply outputs generated by one divide/square root iteration pipeline stage in one iteration as inputs to a subsequent divide/square root iteration pipeline stage of the divide/square-root pipeline for performing a subsequent iteration of the digit-recurrence divide or square root operation. The divide/square-root pipeline is capable of performing the digit-recurrence divide or square root operation on a floating-point operand to generate a floating-point result.
There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprising processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
According to the present techniques there is provided a computer implemented method of securely provisioning data to a trusted operation environment hosted on a service, the method performed at the service, comprising: receiving, from a remote resource, a request for functionality at the service; allocating a trusted operation environment in response to the request; initiating a boot sequence at the trusted operation environment; establishing a secure communication session between the trusted operation environment and the remote resource; transmitting, from the trusted operation environment to the remote resource, a request for data; receiving, at the trusted operation environment, the requested data; accessing, at the trusted operation environment, the received data; progressing, at the trusted operation environment, the boot sequence using the received data.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
Various implementations described herein are related to a device with a first clock generator that provides a first pulse signal based on a clock signal, wherein the first clock generator has a first tracking circuit that provides a first reset signal based on the first pulse signal. The device may include a second clock generator that provides a second pulse signal based on the clock signal, wherein the second clock generator has a second tracking circuit that provides a first control signal based on the second pulse signal. Also, the device may include a third clock generator that provides a third pulse signal based on the first reset signal, wherein the third clock generator has a logic circuit that provides a second control signal based on the third pulse signal.
Subject matter disclosed herein may relate to storage and/or processing of signals and/or states representative of neural network parameters in a computing device, and may relate more particularly to configuring circuitry in a computing device to process signals and/or states representative of neural network parameters.
Various implementations described herein are directed to a device having a set-reset architecture with multiple latches including a first latch, a second latch and a third latch. The first latch may receive a first data signal from first logic and provide a first latched data signal as a first output based on a clock signal. The second latch may receive a second data signal from second logic and provide a second latched data signal as a second output based on the clock signal. The third latch may receive a third data signal from third logic and provide a third latched data signal as a third output based on the clock signal.
A data processing apparatus comprises operand routing circuitry configured to prepare operands for processing, and a plurality of processing elements. Each processing element comprises receiving circuitry, processing circuitry, and transmitting circuitry. A group of coupled processing elements comprises a first processing element configured to receive operands from the operand routing circuitry and one or more further processing elements for which the receiving circuitry is coupled to the transmitting circuitry of another processing element in the group. The apparatus also comprises timing circuitry, configured to selectively delay transmission of operands within the group of coupled processing elements to cause operations performed by the group of coupled processing elements to be staggered.
There is provided a memory protection unit configured to maintain region metadata associated with storage regions of off-chip storage and protection metadata associated with each of the storage regions. The protection metadata is stored in the off-chip storage, and the region metadata encodes whether each of the storage regions belongs to a set of protected storage regions or to a set of unprotected storage regions and encodes information indicating corresponding protection metadata associated with each storage region. The memory protection unit is configured to update the region metadata in response to a region update request identifying a given storage region for which the region metadata is to be modified and to dynamically adjust an amount of memory required to store protection metadata associated with the set of protected storage regions in response to the update to the region metadata.
A super home node of a first chip of a multi-chip data processing system manages coherence for both local and remote cache lines accessed by local caching agents and local cache lines accessed by caching agents of one or more second chips. Both local and remote cache lines are stored in a shared cache, and requests are stored in shared point-of-coherency queue. An entry in a snoop filter table of the super home node includes a presence vector that indicates the presence of a remote cache line at specific caching agents of the first chip or the presence of a local cache line at specific caching agents of the first chip and any caching agent of the second chip. All caching agents of the second chip are represented as a single caching agent in the presence vector.
An apparatus, method and computer program are described. The apparatus comprises throttling control circuitry (215), associated with a given processing element (205), to perform a throttling-level selection process to select a throttling level indicative of an execution rate at which higher-power processing tasks received by the given processing element are to be issued to processing circuitry of the given processing element. The apparatus also comprises power management circuitry (220) to perform a power control process to select an operating voltage and/or clock frequency to be used by the given processing element, wherein the power management circuitry is configured to select the operating voltage and/or clock frequency in dependence on the throttling level selected for the given processing element, and at least one register (225) accessible to firmware. The apparatus is configured to control the selection of at least one energy control parameter in dependence on a value read from the at least one register.
A data processing apparatus comprises an instruction decoder and processing circuitry. The processing circuitry is configured to perform a load-with-substitution operation in response to the instruction decoder decoding a load-with-substitution instruction specifying an address and a destination register. In the load-with-substitution operation, the processing circuitry is configured to issue a request to obtain target data corresponding to the address from one or more caches. In response to the request hitting in a given cache belonging to a subset of the one or more caches, the processing circuitry is configured to provide to the destination register of the load-with-substitution instruction the target data obtained from the given cache. In response to the request missing in each cache belonging to the subset of the one or more caches, the processing circuitry is configured to provide a substitute value as the correct architectural result corresponding to the destination register of the load-with-substitution instruction.
A data processing apparatus includes one or more cache configuration data stores, a coherence manager, and a shared cache. The coherence manager is configured to track and maintain coherency of cache lines accessed by local caching agents and one or more remote caching agents. The cache lines include local cache lines accessed from a local memory region and remote cache lines accessed from a remote memory region. The shared cache is configured to store local cache lines in a first partition and to store remote cache lines in a second partition. The sizes of the first and second partitions are determined based on values in the one or more cache configuration data stores and may or not overlap. The cache configuration data stores may be programmable by a user or dynamically programmed in response to local memory and remote memory access patterns.
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
There is provided an apparatus, method and medium. The apparatus comprises processing circuitry to process instructions and a reorder buffer identifying a plurality of entries having state information associated with execution of one or more of the instructions. The apparatus comprises allocation circuitry to allocate entries in the reorder buffer, and to allocate at least one compressed entry corresponding to a plurality of the instructions. The apparatus comprises memory access circuitry responsive to an address associated with a memory access instruction corresponding to access-sensitive memory and the memory access instruction corresponding to the compressed entry, to trigger a reallocation procedure comprising flushing the memory access instruction and triggering reallocation of the memory access instruction without the compression. The allocation circuitry is responsive to a frequency of occurrence of memory access instructions addressing the access-sensitive memory meeting a predetermined condition, to suppress the compression whilst the predetermined condition is met.
A 1-hot path signature accelerator includes a register, first and second accumulator, and an outer product circuit. The register stores an input frame, where the input frame has, at most, one bit of each element set. The first accumulator calculates a present summation by adding the input frame to a previous sum of previous input frames inputted to the 1-hot path signature accelerator within a timeframe. The outer product circuit receives each element of the present summation from the first accumulator and each element of the input frame stored in the register to output a present outer product. Since the input frame has at most one bit of each element set, the outer product circuit is reduced to a logical operation. The second accumulator outputs a present second-layer summation by adding the present outer product to a previous second-layer sum of outputs from the outer product circuit within the timeframe.
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
Prediction circuitry for a data processing system comprises input circuitry to receive status inputs associated with instructions or memory access requests processed by the data processing system. Unified predictor circuitry comprises shared hardware circuitry configurable to act as a plurality of different types of predictor. The unified predictor circuitry generates, according to a unified prediction algorithm based on the status inputs and a set of predictor parameters, an array of predictions comprising different types of prediction of instruction/memory-access behaviour for the data processing system. A configuration subset of the predictor parameters is configurable to adjust a relative influence of each status input in the unified prediction algorithm used to generate the array of predictions. Output circuitry outputs, based on the plurality of types of prediction, speculative action control signals for controlling the data processing system to perform speculative actions.
A computer implemented method is provided. The computer implemented method includes receiving an intermediate representation of a source code, intentionally injecting a weak code path at a point within the intermediate representation to create a modified intermediate representation, performing a path profiling on the modified intermediate representation to generate a particular path identifier for each path within the modified intermediate representation, and identifying the particular path identifier of the weak code path for use by a monitoring system. A monitoring system is also provided. The monitoring system monitors an executable code during runtime for execution of a path having a particular path identifier corresponding to the injected intentionally weak code path.
A method to distribute verification of attestation evidence and a verifiable system are provided. Method includes receiving, at a secondary verifier operating in a verifiable system, a request from a relying party to perform a verification process with respect to attestation evidence of a device in communication with the relying party, communicating self-attestation evidence, by the secondary verifier, to a trusted verifier to generate an attestation report of the verifiable system, communicating the attestation report of the verifiable system or other indicator of trustworthiness to the relying party to indicate trustworthiness of the secondary verifier with respect to performing the verification process, and performing, by the secondary verifier, the verification process on the attestation evidence of the device in communication with the relying party. The verifiable system includes instructions of a secondary verifier stored and executed by the verifiable system to perform the method to distribute verification of attestation evidence.
In a data processing network, error detection information (EDI) is generated for first data of a first communication protocol of a plurality of communication protocols, the EDI including an error detection code and an associated validity indicator for each field group in a set of field groups. The first data and the EDI are sent through a network interconnect circuit, where the first data is translated to second data of a second communication protocol. An error is detected in the second data received from the network interconnect circuit when a validity indicator for a field group is set in EDI received with the second data and an error detection code generated for second data in the field group does not match the error detection code associated with the field group in the received EDI.
A data processing apparatus comprises: execution circuitry to execute instructions in order to perform data processing operations specified by those instructions; a plurality of registers to store data values for access by the execution circuitry when performing the data processing operations, each register having an associated physical register identifier; register rename circuitry to select physical register identifiers to associate with architectural register identifiers specified by the instructions; and rename storage having a plurality of entries, each entry being associated with one of the architectural register identifiers and used by the register rename circuitry to indicate a physical register identifier selected for association with that one of the architectural register identifiers; the register rename circuitry comprising an execute unit, and being responsive to detection of an early execute condition for a given instruction, the early execute condition requiring at least detection that each source value required to execute the given instruction is available to the register rename circuitry without accessing the plurality of registers, to cause the execute unit to perform the data processing operation specified by the given instruction in order to generate a result value, and to cause the generated result value to be stored in an entry of the rename storage associated with a destination architectural register identifier specified by the given instruction.
Various implementations described herein are related to a device having a memory architecture with a multi-stack of transistors that may be arranged in a multi-bitcell stack configuration. Also, the memory architecture may have a wordline that may be shared across the transistors of the multi-stack of transistors with each transistor coupled to a different bitline.
A method is provided for generating a mipmap. A processor comprises a neural processing engine comprising a plurality of hardware units suitable for performing integer operations on machine learning models. The method comprises receiving initial image data and one or more commands to perform an operation for generating a further layer of image data from the initial image data that has a different resolution to the initial image data. The method processes the commands to generate the further layer using the neural processing engine of the processor.
Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may facilitate and/or support scheduling tasks for one or more hardware components of a computing device.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, in techniques to process image signal intensity values sampled from a multi color channel imaging device. In particular, such techniques may comprise application of convolution operations with kernel coefficients selected from a set of coefficient values such that the same coefficient value is to be applied to image signal intensity values of multiple pixel locations in an image frame.
A system, method and computer program product configured to control a plurality of parallel programs operating in an n-dimensional hierarchical iteration space over an n-dimensional data space, comprising: a processor and a memory configured to accommodate the plurality of parallel programs and the data space; a memory access control decoder configured to decode memory location references to regions of the n-dimensional data space from indices in the plurality of parallel programs; and an execution orchestrator responsive to the memory access control decoder and configured to sequence regions of the n-dimensional hierarchical iteration space of the plurality of parallel programs to honour a data requirement of at least a first of the plurality of parallel programs having a data dependency on at least a second of the plurality of parallel programs.
The present disclosure relates to A data processing apparatus, comprising: graph partitioning circuitry configured to receive a computation graph, the computation graph comprising a plurality of nodes representing operators and a plurality of edges representing relationships amongst the plurality of operators, and to divide the computation graph into a plurality of partitions, each partition comprising one or more nodes and/or edges; graph compilation circuitry configured to compile a computation graph to generate one or more compilation outputs; and storage to store the one or more compilation outputs; wherein the graph compilation circuitry is configured to: compile a first partition of the plurality of partitions to generate a first compilation output; and output the first compilation output to a first target portion of the storage assigned to the first partition.
Data transfer between caching domains of a data processing system is achieved by a local coherency node (LCN) of a first caching domain receiving a read request for data associated with a second caching domain, from a requesting node of the first caching domain. The LCN requests the data from the second caching domain via a transfer agent. In response to receiving a cache line containing the data from the second caching domain, the transfer agent sends the cache line to the requesting node, bypassing the LCN and, optionally, sends a read-receipt indicating the state of the cache line to the LCN. The LCN updates a coherency state for the cache line in response to receiving the read-receipt from the transfer agent and a completion acknowledgement from the requesting node. Optionally, the transfer agent may send the cache line via the LCN when congestion is detected in a response channel of the data processing system.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0888 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
33.
Traffic Isolation at a Chip-To-Chip Gateway of a Data Processing System
A mechanism for error containment in a data processing system includes receiving a transaction request at a gateway between a host and a device, allocating an entry for the request in a local request tracker of the gateway and sending a link request, to a port of the gateway. In response to an isolation trigger, the port is moved into isolation by completing in-process requests with entries in the tracker and locking the entries. On receiving a response to an in-process request while the port is in isolation, the response is dropped, the associated entry is unlocked, and allocation of the entry is enabled. A completion response is sent to the requester without dispatching a new link request to the port. When requests are completed, the system is quiesced, locked entries are unlocked, and the port is moved out of isolation.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
34.
SYSTEM, DEVICES AND/OR PROCESSES FOR ASSIGNMENT, CONFIGURATION AND/OR MANAGEMENT OF ONE OR MORE HARDWARE COMPONENTS OF A COMPUTING DEVICE
Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may facilitate and/or support assignment, configuration and/or management of one or more hardware components of a computing device.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
A processor to: receive a task to be executed, the task comprising a task-based parameter associated with the task, for use in determining a position, within an array of data descriptors, of a particular data descriptor of a particular portion of data to be processed in executing the task. Each of the data descriptors in the array of data descriptors has a predetermined size and is indicative of a location in a storage system of a respective portion of data. The processor derives, based on the task, array location data indicative of a location in the storage system of a predetermined data descriptor, and obtains the particular data descriptor, based on the array location data and the task-based parameter. The processor obtains the particular portion of data based on the particular data descriptor and processes the particular portion of data in executing the task.
A processor comprising: a handling unit; a plurality of components each configured to execute a function. The handling unit can receive a task comprising operations on data in a coordinate space having N dimensions, receive a data structure describing execution of the task and comprising a partially ordered set of data items each associated with instructions usable by the plurality of components when executing the task, each data item is associated with a component among the plurality of components, each data item indicates dimensions of the coordinates space for which changes of coordinate causes the function of the associated component to execute, and dimensions of the coordinate space for which changes of coordinate causes the function of the associated component to store data ready to be used by another component. The handling unit iterates over the coordinate space and executes the task using the partially ordered set of data items.
Provided is an access controller configured to control access to a shared resource by plural accessors operable to issue requests for access to the shared resource, the access controller comprising a predictor configured to analyze an activity of an accessor to determine a type of at least one event; predict a future request state of at least one of the accessors based on the determination; and select one of the plurality of accessors to be granted access to the shared resource on a future processor cycle, the selecting being computed using at least the prediction.
A memory unit configured for handling task data, the task data describing a task to be executed as a directed acyclic graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the acyclic graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed represented by the data blocks; the memory unit configured to receive a sequence of processing requests comprising the one or more data blocks with each data block assigned a priority value and comprising a block command. The memory unit is configured to arbitrate between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests and wherein the processing requests include writing data to, or reading data from storage.
A data processing system comprising a processor (306) that is configured to perform neural network processing having one or more execution units (213, 214) configured to perform processing operations for neural network processing and a control circuit (217) configured to distribute processing tasks to the execution unit or units, and a graphics processor (304) comprising a programmable execution unit (203) operable to execute processing programs to perform processing operations. The control circuit (217) of the processor (306) that is configured to perform neural network processing is configured to, in response to an indication of particular neural network processing to be performed provided to the control circuit, cause the programmable execution unit (203) of the graphics processor to execute a program to perform the indicated neural network processing.
A method and apparatus for distributing operations for execution. Input data is received and is subdivided into portions, each comprising a first and second sub-portion. A first operation and a second operation are received. Dependencies between the first and second operations are identified. For each portion the first operation is issued for execution on the first sub-portion to produce a first output sub-portion, and completion is tracked. The first operation is issued for execution on the second sub-portion to produce a second output sub-portion. Depending upon satisfaction of the dependencies in respect of the first sub-portion, either the second operation to be executed on the first output sub-portion is issued, if the dependencies are met; or the second operation, to be executed on the first output sub-portion is stalled, if the dependencies are not met. This is repeated for each subsequent portion.
A processor to generate position data indicative of a position within a compressed data stream, wherein, previously, in executing a task, data of the compressed data stream ending at the position has been read by the processor from storage storing the compressed data stream. After reading the data, the processor reads further data of the compressed data stream from the storage, in executing the task, the further data located beyond the position within the compressed data stream. After reading the further data, the processor reads, based on the position data, a portion of the compressed data stream from the storage, in executing the task, starting from the position within the compressed data stream. The processor decompresses the portion of the compressed data stream to generate decompressed data, in executing the task.
A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data is accumulated with second further stored data within a second bit range of the second storage device.
Various implementations described herein are related to a device having a memory architecture with a register bank and multiple latches. The multiple latches may have first latches that receive multi-bit data as input and provide the multi-bit data as output, and multiple latches may have second latches coupled to the first latches so as to receive the multi-bit data from the first latches and then store the multi-bit data.
A mechanism is provided efficient packing of network flits in a data processing network. Transaction messages for transmission across a communication link of a data processing network are analyzed to determine a group of transaction messages to be passed to a packing logic block for increased packing efficiency. The transaction messages are packed into slots of one or more network flits and transmitted across a communication link. The mechanism reduces the number of unused slots in a transmitted network.
COMMAND PROCESSING CIRCUITRY MAINTAINING A LINKED LIST DEFINING ENTRIES FOR ONE OR MORE COMMAND QUEUES AND EXECUTING SYNCHRONIZATION COMMANDS AT THE QUEUE HEAD OF THE ONE OR MORE COMMAND QUEUES IN LIST ORDER BASED ON COMPLETION CRITERIA OF THE SYNCHRONIZATION COMMAND AT THE HEAD OF A GIVEN COMMAND QUEUE
Circuitry comprises a memory to store data defining a set of one or more command queues each associated with a respective memory address space, each command queue defining successive commands for execution from a queue head to a queue tail, the commands being selected from a set of commands comprising synchronization commands and one or more other commands defining memory management operations for a given memory address space, in which completion of a synchronization command is dependent upon one or more completion criteria indicating that all commands from any of the command queues which are earlier than the synchronization command in an execution order have completed; and command processing circuitry to execute the commands; in which the command processing circuitry is configured to maintain a linked list of entries having a list order, each entry defining a respective one of the command queues, in which the command processing circuitry is configured to execute commands at the head of command queues, the command queues being defined by prevailing entries of the linked list of entries in the list order; and in which, for a current occupancy of the linked list at a given stage, the given stage being a given stage of executing commands from command queues defined by entries of the linked list, the command processing circuitry is configured to execute a synchronization command first in the list order and to detect, within that current occupancy of the linked list at the given stage, any further synchronization commands at the head of command queues which are defined by entries of the linked list later in the list order, the command processing circuitry applying, for any such further synchronization commands, the completion criteria of the synchronization command at the head of a given command queue, the given command queue being defined by an entry of the linked list earliest in the list order so as to treat any such further synchronization commands as having been completed.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, techniques to select between and/or among multiple available alternative approaches to perform a temporal anti-aliasing operation in processing an image.
A data processing apparatus is provided. Instruction send circuitry sends an instruction to an external processor to be executed by the external processor. Allocation circuitry allocates a specified one of several registers for a result of the instruction having been executed on the external processor and data receive circuitry receives the result of the instruction having been executed on the external processor and stores the result in the specified one of the several registers. In response to a condition being met: the specified one of the several registers is dereserved prior to the result being received by the data receive circuitry, and the result is discarded by the data receive circuitry when the result is received by the data receive circuitry.
An apparatus and method for controlling access to a shared resource accessible to a plurality of initiator components is provided. The apparatus has shared resource management circuitry to select, from amongst the plurality of initiator components, a currently granted initiator component that is currently allowed to access the shared resource, and to identify the currently granted initiator component within a storage structure. Gating circuitry is located in a communication path between the given initiator component and the shared resource which, on receipt of a given transaction initiated by a given initiator component, determines with reference to the storage structure whether the given initiator component is the currently granted initiator component. The gating circuitry is arranged, when the currently granted initiator component is the given initiator component, to allow onward propagation of the given transaction to the shared resource, and is arranged, when the currently granted initiator component is other than the given initiator component, to block onward propagation of the given transaction to the shared resource and to trigger a recovery action to seek to facilitate future handling of the given transaction.
Various implementations described herein are related to a device having a storage node with a bitcell. The device may have a first stage that performs a first write based on an internal bitline signal, a first write wordline signal and a second write wordline signal. The first stage outputs the internal bitline signal. The device may have a second stage that receives the internal bitline signal and performs a second write of the internal bitline signal to the bitcell. The device may have a third stage with write wordline ports and write bitline ports. The third stage provides the internal bitline signal based on a selected write wordline signal from a write wordline port of the write wordline ports and based on a selected bitline signal based on a write bitline port of the write bitline ports.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
A method and apparatus to classify processor events is provided. The apparatus includes a reference generator, a warping unit, a correlation unit and a detector. The reference generator provides a self-reference for an event vector stream based on a history of the event vector stream and the warping unit dynamically aligns the event vector stream with the self-reference to generate a warped event vector stream. The correlation unit determines a window-by-window correlation of event vectors of the warped event vector stream, and the detector passes a window of event vectors of the warped event vector stream to a behavioral classifier when the window-by-window correlation achieves a threshold value. The behavioral classifier may use machine learning. A sample reservoir may be used to store dynamically selected event vectors of the event vector stream that are used, at least in part, to generate the self-reference.
Technique for processing lookup requests, in a cache storage able to store data items of multiple supported types, in the presence of a pending invalidation request
Each entry in a cache has type information associated therewith to indicate a type associated with the data item stored in that entry. Lookup circuitry responds to a given lookup request by performing a lookup procedure to determine whether a hit is detected, by default performing the lookup procedure for a given subset of multiple supported types. Invalidation circuitry processes an invalidation request specifying invalidation parameters used to determine an invalidation address range and invalidation type information, in order to invalidate any data items held in the cache storage that are both associated with the invalidation address range and have an associated type that is indicated by the invalidation type information. Whilst processing of the invalidation request is yet to be completed, filtering circuitry performs a filtering operation for a received lookup request, in order to determine, in dependence on an address indication provided by the received lookup request and one or more of the invalidation parameters of the invalidation request, intersection indication data identifying, for a given type in the given subset, whether an intersection is considered to exist between any entries that would be accessed during performance of the lookup procedure for the received lookup request for the given type and any entries that will be invalidated during processing of the invalidation request, and operation of the lookup circuitry is controlled in dependence on the intersection indication data.
A processor to obtain mapping data indicative of at least one mapping parameter for a plurality of mapping blocks of a multi-dimensional tensor to be mapped. The at least one mapping parameter is for mapping corresponding elements of each mapping block to the same co-ordinate in at least one selected dimension of the multi-dimensional tensor, such that each mapping block corresponds to the same set of co-ordinates in the at least one selected dimension. A co-ordinate of an element of a block of the multi-dimensional tensor is determined. The element is comprised by a mapping block. A physical address in a storage corresponding to the co-ordinate is determined, based on the co-ordinate. The physical address is utilized in a process comprising an interaction between the block of the multi-dimensional tensor and the storage.
An apparatus has a data storage structure to store data items tagged by respective tag values and stores, in association with each data item, a respective tag group identifier to identify other data items having a same tag value within a collection of data items. The apparatus also has tag match circuitry to identify one or more hitting data items. Prioritisation circuitry is provided to select candidate data items which, relative to any other data items in the particular collection of data items having the same tag group identifier as the selected candidate data item is favoured according to an ordering of the data items. The prioritisation circuitry selects the one or more candidate data items before the identification of the hitting data items is available from the tag match circuitry. Data item selection circuitry selects a candidate data item for which the tag match circuitry detected a match.
Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.
Various implementations described herein are related to a device having multi-port circuit architecture with multiple ports. The multi-port circuit architecture may expand a primary clock into multiple dummy clocks so as to separately track, simulate and report clock power consumption for each port of the multiple ports to a central processing unit.
Aspects of the present disclosure relate to an apparatus comprising instruction receiving circuitry to receive an instruction to be executed, the instruction being an instruction to write given data to a storage; instruction implementation circuitry to determine a sequence of operations corresponding to said instruction, and execution circuitry to perform the determined sequence of operations. The instruction implementation circuitry is configured to, responsive to the given data having a value of zero, determining the sequence of operations including a first operation for writing one or more zeroes to the storage, the first operation being a dedicated zero-writing operation, and responsive to the given data having a non-zero value, determining the sequence of operations including one or more second operations for writing the non-zero value to the storage, the one or more second operations being different from the first operation.
The present disclosure relates to a data processing apparatus comprising: instruction decode circuitry to decode instructions; processing circuitry to execute said instructions decoded by said instruction decode circuitry, said processing circuitry comprising fused-multiply-accumulate, FMA, circuitry to respond to a fused-multiply-accumulate, FMA, instruction decoded by said instruction decoder, said FMA instruction specifying a first floating-point operand (a), a second floating-point operand (b) and a third floating-point operand (c); and an operand storage module operable to store said first floating-point operand, said second floating-point operand, and said third floating-point operand, wherein, responsive to said FMA instruction, said FMA circuitry is configured to: perform denormal detection on said first floating-point operand, said second floating-point operand and said third floating-point operand to determine if one or more of said first floating-point operand, said second floating-point operand or said third floating-point operand meets a first denormal condition; upon determining that at least one of said first floating-point operand, said second floating-point operand or said third floating-point operand meets said first denormal condition, execute a denormal handling instruction to: generate a shifted first floating-point operand (ta) based on said first floating-point operand, a shifted second floating-point operand (tb) based on said second floating-point operand, and a shifted third floating-point operand (tc) based on said third floating-point operand; write said shifted first floating-point operand, said shifted second floating-point operand and said shifted third floating-point operand to auxiliary storage of said operand storage module, said auxiliary storage being temporary storage configured within said operand storage module and assigned to said denormal handling instruction; and execute said FMA instruction using said shifted first floating-point operand, said shifted second floating-point operand and said shifted third floating-point operand to generate a shifted FMA output.
A host device (10) provides a plurality of virtual machines (54) executing one or more processes (60, 62, 64, 66). A peripheral device (30) performs tasks on behalf of the host and is coupled to it via a communication network (20). The peripheral provides a plurality of virtual peripheral devices (34), each allocated to one of the virtual machines. Address translation circuitry (75) in the host performs two-stage address translation. When accessing a memory (40) via the host, the peripheral requests a transfer with a specified address and associated metadata providing a source identifier field, a first address translation control field and a second address translation control field. The first address translation control field controls any first stage address translation and depends on the process. The second address translation control field controls any second stage address translation required and depends on the virtual machine associated with the specified address.
There is provided an apparatus, method and computer program for constraining memory accesses. The apparatus comprises processing circuitry to perform operations during which access requests to memory are generated. The processing circuitry is arranged to generate memory addresses for the access requests using capabilities that identify constraining information. The apparatus further comprises capability checking circuitry to perform a capability check operation to determine whether a given access request whose memory address is generated using a given capability is permitted based on given constraining information identified by the given capability. The capability check operation includes performing a range check based on range constraining information provided by the given constraining information, and when a determined condition is met, to perform the range check in dependence on both the range constraining information and an item of state information of the apparatus which varies dynamically during performance of the operations of the processing circuitry.
A processor comprising a first storage managed as a circular buffer to store a plurality of data structures. Each data structure comprises: an identifier, a size indicator and first data associated with instructions for execution of a task. The processor is configured for searching for a data structure in the first storage. A data structure subsequent to the tail data structure can be located using a storage address in the first storage of a tail data structure and the size indicator of all data structures preceding the second data structure among the plurality of data structures. When a data structure is found, the task may be executed based at least in part on the first data of the found data structure.
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
A data processing apparatus comprises: a physical register array comprising a plurality of sectors having one or more different access properties, each sector of the plurality of sectors comprising at least one physical register; prediction circuitry to predict, for a given instruction, a sector identifier identifying one of the sectors of the physical register array to be used for a destination register of the given instruction, wherein the prediction circuitry is configured to select the sector identifier in dependence on prediction information learnt from performance monitoring information indicative of performance achieved for a sequence of instructions when using different sector identifiers for the given instruction; register rename circuitry to map a destination architectural register identifier specified by the given instruction to a destination physical register in the sector identified by the sector identifier predicted by the prediction circuitry; and execution circuitry to execute the given instruction and generate a result to be written to the destination physical register mapped to the destination architectural register identifier by the register rename circuitry.
A tiled-based graphics processor that comprises a plurality of tiling units is disclosed. The graphics processor includes an assigning circuit that assigns tiling units to sort geometry for initial regions of a render output that encompass plural primitive listing regions, and causes assigned tiling units to sort geometry for an initial region into primitive listing regions that the initial region encompasses.
A tiled-based graphics processor that comprises a plurality of tiling units is disclosed. The graphics processor includes an assigning circuit that assigns tiling units to process draw calls or draw call parts, and causes assigned tiling units to process draw calls or draw call parts.
An apparatus and method are provided, the apparatus comprising: interconnect circuitry to couple a device to one or more processing elements, each processing element operating in a trusted execution environment; and secure stashing decision circuitry to receive stashing transactions from the device and to redirect permitted stashing transactions to a given storage structure accessible to at least one of the one or more processing elements. The secure stashing decision circuitry is configured, in response to receiving a given stashing transaction, to determine whether the given stashing transaction comprises a trusted execution environment identifier associated with a given trusted execution environment, and to treat the given stashing transaction as a permitted stashing transaction when redirection requirements, dependent on the trusted execution environment identifier, are met.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
A graphics processing system in which a render output is sub-divided into a plurality of tiles for rendering. The graphics processing system includes a memory system, a tiling circuit and a primitive list preparation circuit. The tiling circuit determines which primitives are to be rendered for regions into which the render output is sub-divided. The regions form a plurality of rows and columns of regions. The primitive list preparation circuit prepares and stores primitive lists for regions of the render output identifying the primitives that are to be rendered for the regions. The primitive list preparation circuit also stores a groups of pointers, each group pointing to respective primitive lists. The regions of the render output corresponding to the primitive lists that are pointed to by the pointers of the group of pointers comprise adjacent regions spanning a plurality of rows and a plurality of columns of regions.
A technique is provided for constraining access to memory using capabilities. An apparatus is provided that has processing circuitry for performing operations during which access request to memory are generated, wherein the processing circuitry is arranged to generate memory addresses for the access requests using capabilities that provide a pointer value and associated constraining information. The apparatus also provides capability generation circuitry, that is responsive to the processing circuitry executing a capability generating instruction that identifies a location in a literal pool of the memory, to retrieve a literal value from the location in the literal pool, and to produce a generated capability in which the pointer value of the generated capability is determined from the literal value. The constraining information of the generated capability is selected from a limited set of options in dependence on information specified by the capability generating instruction. It has been found that such an approach provides a robust mechanism for generating capabilities, whilst reducing code size.
A method and apparatus to control operation of a brain-computer interface, comprising: capturing, at a sensor, a time series of brain activity in response to stimuli; passing data for the time series of brain activity to a history-based challenge generator; receiving, from the history-based challenge generator, a challenge comprising a generated stimulus with a predicted brain response derived from data for the time series of brain activity; issuing the challenge over the brain-computer interface; capturing, at the sensor, a brain response to the challenge; comparing, by an authenticator, the brain response to the challenge with the predicted brain response for the generated stimulus; and responsive to finding no match between the brain response to the challenge and the predicted brain response, preventing further activity over the brain-computer interface.
An apparatus has first-in, first-out (FIFO) buffer circuitry to transfer data from a source domain to a sink domain across a clock domain boundary. The FIFO buffer circuitry has data transfer circuitry to store the data to be transferred across the clock domain boundary; and source and sink domain data transfer control circuitry to maintain respective state vectors indicative of a state of the FIFO buffer circuitry in the respective domain. At least one of the source domain transfer control circuitry and the sink domain transfer control circuitry is operable to perform a multi-item transfer to transfer two or more data items in a single clock cycle of a respective domain by placing the data items into, or reading the data items from, respective data storage elements; and advancing a state vector of the respective domain by two or more state vector encodings in the single clock cycle.
Apparatuses, corresponding methods, and instructions for data processing for the generation of an approximate quantile are provided. A sequence of data values is received, wherein the data values span a range of values and in a plurality of counters each counter is configured to have a correspondence to a subrange within the range of values. For each data value received a corresponding counter of the plurality of counters is determined and an update is applied to the corresponding counter of the plurality of counters. Approximate quantile determination is performed to generate at least one approximate quantile value for the sequence of data values received in dependence on respective counts of the plurality of counters.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, techniques to apply an image anti-aliasing operation to an image frame. In a particular implementation, an anti-flicker process may be applied to a portion of an image frame based, at least in part, on a rate of change in an intensity in the image frame.
There is provided an apparatus, medium and method. The apparatus comprises candidate offset storage circuitry to store a list comprising a plurality of candidate offset values having a default order, and prefetch circuitry to generate prefetch addresses by modifying a base address using a current offset, and to issue prefetch requests to cause information beginning at a corresponding prefetch address to be prefetched into the storage structure in anticipation of a demand request for that information. The apparatus further comprises prefetch training circuitry to select a new offset from the list of candidate offset values through comparison of the plurality of candidate offset values against data indicative of recent requests. The prefetch training circuitry is configured to identify a subset of the candidate offset values based on the current offset and to dynamically modify the default order to increase priority of the subset.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The visibility information indicates which primitives should be rendered for which sampling positions of the render output.
When generating a graphics processing output by assembling a sequence of one or more of primitives to be processed from a set of vertex indices provided for the output based on primitive configuration information provided for the output, one or more vertex packets are generated using the vertex indices for the assembled primitives, each vertex packet comprising a plurality of vertices of the assembled primitives. After a threshold number of vertices have been allocated to a vertex packet, vertex attribute processing for the vertices of the vertex packet is triggered, to thereby generate a vertex packet comprising processed vertex attributes for the vertices of the vertex packet. The assembled primitives and the generated vertex packets are then provided to later stages of the graphics processing pipeline for processing.
When generating a graphics processing output by assembling a sequence of one or more of primitives to be processed, vertex attribute processing is performed for vertices of assembled primitives to generate one or more processed vertex attributes for vertices of the assembled primitives. Processed vertex attributes for vertices of assembled primitives are stored in a first storage, and additionally in a second storage. Then when obtaining processed vertex attributes for an assembled primitive, it is checked whether processed vertex attributes for the vertices of the primitive are present in the second storage, and when they are, the processed vertex attributes are read from the second storage, but when the processed vertex attributes are not in the second storage, they are read from the first storage instead.
When preparing and storing a primitive list in a tile-based graphics processing system, a first block of memory space is allocated for storing the primitive list. When there is insufficient space in the first block of memory space to store all of the graphics primitives for the primitive list, a next block of memory space to be used for storing the primitive list is allocated for storing the primitive list. An indication of the location in memory of the allocated next block of memory space is written at the beginning of the first block of memory space.
When processing primitives in a tile-based graphics processing system in which a render output is sub-divided into a plurality of tiles for rendering, before a primitive is written to a primitive list corresponding to a region of the render output, it is first determined whether the primitive can be grouped with one or more previous primitives based on the set of regions of the render output that primitive covers relative to the set of regions of the render output that one or more previous primitives cover. When it is determined that the primitive can be grouped with one or more previous primitives, the primitive is added to a group (i.e. grouped) with the one or more previous primitives. The grouped primitives are then later written together to one or more primitive lists, in a single primitive list write cycle.
When processing primitives in a tile-based graphics processing system in which a render output is sub-divided into a plurality of tiles for rendering, before a primitive is written to a primitive list corresponding to a region of the render output, it is first written to one or more primitive queues allocated to respective regions of the render output. To write the primitives to primitive lists, primitives are written together from a primitive queue allocated to a region of the render output to the primitive list for that region of the render output, in a single primitive list write cycle.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The visibility information comprises the depth buffer.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. Processing of one or more vertex attributes may be omitted during the first, pre-pass operation.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. Passes from different tiles can be interleaved.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence, the visibility information being usable to determine whether fragments for a primitive in the sequence should subsequently be processed further, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The visibility information indicates which primitives should be rendered for which sampling positions of the render output in a hierarchical manner.
The present disclosure relates to a processing resource for a graphics processing system for performing graphics processing for an application executing on a host processor of the graphics processing system according to a command stream, the command stream being generated by the host processor in response to an API call from the application, the processing resource comprising: a control circuit configured to execute commands from the command stream, wherein the command stream comprises one or more commands relating to a processing task and one or more commands relating to at least one state group associated with the processing task; at least one processing circuit configured to perform processing tasks; a shadow state storage module configured for use by the control circuit to store state information; and a processing state storage module configured for use by the processing circuit to store state information, wherein the control circuit is configured to determine one or more changed states within the at least one state group with respect to a preceding API call, to write state information comprising the one or more changed states to the shadow state storage module, and to assign the processing task to the at least one processing circuit and execute an instruction to transmit the state information from the shadow state storage module to the processing state storage module.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The first, pre-pass can be stopped to process primitives in a fail-safe manner if necessary.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. A data structure is also built to allow entire primitives to be culled.
When performing tile-based rendering a first, pre-pass operation in which primitives for a tile are processed to determine visibility information, the visibility information being usable to determine whether fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The visibility information indicates which primitives should be rendered for which sampling positions of the render output in a hierarchical manner.
When performing tile-based rendering a first, pre-pass operation in which primitives for a tile are processed to determine visibility information, the visibility information usable to determine whether or not fragments for a primitive should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted. The visibility information indicates which primitives should be rendered for which sampling positions of the render output in a hierarchical manner.
A sequence of primitives to be rendered is processed using a first, pre-pass operation to determine “visibility” information for the sequence of primitives, that is then used in a second, main pass operation in which fragments for primitives that were processed during the first, pre-pass operation are subjected to a visibility test that uses the visibility information determined during the first, pre-pass operation, to determine whether a fragment for a primitive should be processed further in the second, main pass operation. When a fragment is, in the second, main pass operation, subjected to and passes a particular form of visibility test so as to be determined as needing to be processed further, a processing order dependency that would be indicated for the fragment in a record indicative of processing order dependencies for a sub-region of the render output to which the fragment relates is not enforced.
When performing tile-based rendering a first, pre-pass operation in which primitives in a sequence of primitives for a tile are processed to determine visibility information for the sequence of primitives, the visibility information being usable to determine whether or not fragments for a primitive in the sequence of primitives should subsequently be processed further for the render output, is performed. Thereafter a second, main pass operation is performed in which the further processing of fragments for primitives that were processed during the first, pre-pass operation is controlled based on the determined visibility information for the sequence of primitives, such that for fragments for which the visibility information indicates that the fragments should not be processed further for the render output some or all of the processing during the second, main pass is omitted.
A data processor is disclosed that includes a processing unit operable to process neural network data, and a cache system operable to cache neural network data for the processing unit. When neural network data is required for processing, the processing unit issues a request for the neural network data to the cache system, and if the requested data is not cached in the cache system, a compression codec is caused to decode a part of a compressed neural network data stream that encodes the requested neural network data so as to provide the requested neural network data.
The present disclosure relates generally to systems, devices and/or processes for sharing machine learning models among components of a computing environment.
The present disclosure relates generally to systems, devices and/or processes for runtime linking of software component function implementations, and relates more particularly to linking particular function implementations based at least in part on a current execution environment.
The present disclosure provides a method and apparatus for communicating between dice of an inductively-coupled 3D integrated circuit (3D-IC). A transmit resonant circuit at a transmit die is inductively coupled to a first receive resonant circuit at a first receive die, and to a second receive resonant circuit at a second receive die. The resonant circuit at the targeted receive die is tuned to the frequency of resonance of the transmit resonant circuit, while the resonant circuit at the untargeted receive die is detuned, resulting in lower power consumption for a given bit error rate at the targeted die.
Various implementations described herein are related to a device having multi-page memory with a first core array and bitcells accessible via first wordlines and a second core array with bitcells accessible via second wordlines. The device may have wordline drivers coupled to the bitcells in the first core array via the first wordlines and to the bitcells in the second core array via the second wordlines. The device may have buried metal lines formed within a substrate, and the buried metal lines may be used to couple the wordline drivers to the first wordlines.
The present disclosure relates to a method of processing image data at an apparatus having an image sensor, a first statistics data module and a first processor component, the method comprising: obtaining, at the first statistics data module from the image sensor, first image sensor data; generating, at the first statistics data module, statistics data of a first type derived, at least in part, from the first image sensor data; processing, at the first processor component, the statistics data of the first type to determine whether or not an event is detected in a scene; generating, at the first processor component, an event signal when an event is detected.
There is provided an apparatus, medium and method for cache invalidation. The apparatus comprises a cache having a plurality of entries grouped into a plurality of entry sets. Each entry of the plurality of entries identifies an address range having one of a plurality of predetermined address range sizes. The apparatus further comprises cache invalidation circuitry responsive to a cache invalidation request indicating an address invalidation range to trigger invalidation of entries in the cache that overlap the address invalidation range. The cache invalidation circuitry is configured to operate in one of a plurality of invalidation modes based on the address invalidation range and cache occupancy information indicating address range sizes identified by the plurality of entries in the cache. The plurality of invalidation modes comprise: an entry-driven invalidation mode in which the cache invalidation circuitry is configured, for each entry of the plurality of entries and in response to a determination that the address invalidation range overlaps the address range identified by that entry, to invalidate that entry; and an invalidation-range-driven invalidation mode in which the cache invalidation circuitry is configured to generate a set of address range sizes based on the address range sizes indicated in the cache occupancy information and, for each given address range size, to generate one or more cache indexes from the address invalidation range in dependence on the given address range size, each of the cache indexes identifying a corresponding entry set of the plurality of entry sets, and for each corresponding entry set to invalidate entries in dependence on whether the address range identified by those entries overlaps the address invalidation range.
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
An apparatus has first-in, first-out buffer circuitry to transfer data from a source domain to a sink domain across a clock domain boundary. The FIFO buffer circuitry has data transfer circuitry; source domain and sink domain data transfer control circuitry to maintain state vectors indicative of a state of the FIFO buffer circuitry in the respective domain; and synchronisation circuitry in each of the source domain and the sink domain to stabilise a signal received from the other of the source domain and the sink domain and to store the received state vector. The synchronisation circuitry is clock-gated by an enable signal and the synchronisation circuitry is responsive to a change in the state of the FIFO buffer circuitry in the respective domain to advance the respective state vector by controlling the enable signal to enable output of elements of the received state vector.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to update parameters of an estimator to estimate and/or predict an execution latency of a neural network in a neural network architecture search (NAS) process.
Various implementations described herein refer to a device having an encoder coupled to memory. The ECC encoder receives input data from memory built-in self-test circuitry, generates encoded data by encoding the input data and by adding check bits to the input data, and writes the encoded data to memory. The device may have an ECC decoder coupled to memory. The ECC decoder reads the encoded data from memory, generates corrected data by decoding the encoded data and by extracting the check bits from the encoded data, and provides the corrected data and double-bit error flag as output. The ECC decoder has error correction logic that performs error correction on the decoded data based on the check bits, wherein if the error correction logic detects a multi-bit error in the decoded data, the error correction logic corrects the multi-bit error in the decoded data to provide the corrected data.
An apparatus is provided for controlling the operating mode of control circuitry, such that the control circuitry may change between two operating modes. In an allocation mode, data that is loaded in response to an instruction is allocated into storage circuitry from an intermediate buffer, and the data is read from the storage circuitry. In a non-allocation mode, the data is not allocated to the storage circuitry, and is read directly from intermediate buffer. The control of the operating mode may be performed by mode control circuitry, and the mode may be changed in dependence on the type of instruction that calls the data, and whether the data may be used again in the near future, or whether it is expected to be used only once.