Apparatuses, methods and programs are disclosed relating to the predication of multiple vectors in vector processing. An encoding of predicate information is disclosed which comprises an element size and an element count, wherein the predicate information comprises a multiplicity of consecutive identical predication indicators given by the element count, each predication indicator corresponding to the element size.
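As a rough sketch of this style of encoding (not the disclosed apparatus itself — the function name, the per-byte granularity, and the 16-byte vector width are illustrative assumptions), an element size and element count can be expanded into a run of consecutive identical predication indicators:

```python
def expand_predicate(element_size, element_count, total_bytes=16):
    """Expand an (element size, element count) encoding into a per-byte
    predicate mask: element_count consecutive identical indicators, each
    spanning element_size bytes; any remaining bytes are inactive."""
    mask = []
    for _ in range(element_count):
        mask.extend([1] * element_size)   # one indicator covers a whole element
    mask.extend([0] * (total_bytes - len(mask)))
    return mask

# two active 4-byte (32-bit) elements in a 16-byte vector, rest inactive
print(expand_predicate(4, 2))
```

The point of such an encoding is compactness: two small fields stand in for the full per-lane mask, which is reconstructed on demand.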
A method for filtering adversarial noise from an input signal is provided. The method comprises receiving an input signal which has an unknown level of adversarial noise. The input signal is filtered with a neural network to remove noise from the received input signal, thereby producing a filtered signal. A confidence value is calculated, the confidence value being associated with the filtered signal, and indicative of a level of trust relating to the filtered signal. The filtered signal and the confidence value may then be output.
Multiplication circuitry comprises at least two adder arrays each to add a respective set of partial products to generate a respective product value representing a result of multiplication of a respective pair of portions of bits selected from first and second operands. The adder arrays comprise separate instances of hardware circuitry having at least two separate enable control signals for independently controlling whether at least two subsets of adder arrays are enabled or disabled. Booth encoding circuitry is shared between the adder arrays, to Booth encode the first operand to generate partial product selection indicators each corresponding to a Booth encoding of a respective Booth digit of the first operand. At least two adder arrays operate on respective partial products selected by partial product selection circuitry based on a same partial product selection indicator generated by the shared Booth encoding circuitry based on a same Booth digit of the first operand.
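The shared Booth encoding step can be illustrated in miniature. The following is a sketch of textbook radix-4 (modified) Booth recoding — the digit set {−2, −1, 0, +1, +2} that partial product selection indicators would choose among — not the disclosed circuitry; the function names and the 8-bit width are assumptions:

```python
def booth_recode(x, n_bits):
    """Radix-4 Booth recoding of a two's-complement value into digits in
    {-2, -1, 0, +1, +2}; each digit selects a partial product 0, ±M or ±2M."""
    assert n_bits % 2 == 0
    ux = x & ((1 << n_bits) - 1)                  # two's-complement bit pattern
    bits = [(ux >> i) & 1 for i in range(n_bits)]
    digits = []
    for k in range(n_bits // 2):
        b_prev = bits[2 * k - 1] if k > 0 else 0  # overlapping 3-bit window
        digits.append(b_prev + bits[2 * k] - 2 * bits[2 * k + 1])
    return digits

def multiply(a, b, n_bits=8):
    """Sum the partial products selected by the Booth digits of a."""
    return sum(d * b * (4 ** k) for k, d in enumerate(booth_recode(a, n_bits)))

print(multiply(6, 7))    # 42
print(multiply(-3, 5))   # -15
```

Because every adder array consumes indicators derived from the same Booth digits, the recoding hardware need only exist once — which is the sharing the abstract describes.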
Various implementations described herein are related to a device with fabrication test circuitry having transistors arranged in a parallel branch configuration between a supply voltage and a single pad. In some applications, each transistor in an off-current branch may be separately deactivated so as to test leakage current applied to the pad by way of the off-current branch, and also, each transistor in an on-current branch may be deactivated so as to further test the leakage current applied to the pad by way of the off-current branch.
An apparatus is provided for limiting the effective utilisation of an instruction fetch queue. The instruction fetch entries are used to control the prefetching of instructions from memory, such that those instructions are stored in an instruction cache prior to being required by execution circuitry while executing a program. By limiting the effective utilisation of the instruction fetch queue, fewer instructions will be prefetched and fewer instructions will be allocated to the instruction cache, thus causing fewer evictions from the instruction cache. In the event that the instruction fetch entries are for instructions that are unnecessary to the program, the pollution of the instruction cache with these unnecessary instructions can be mitigated.
Processing circuitry is provided to perform vector operations, with instruction decoder circuitry used to decode instructions from a set of instructions to control the processing circuitry to perform the vector operations specified by the instructions. Array storage that has storage elements to store data blocks is used to store at least one two-dimensional array of data blocks accessible to the processing circuitry when performing the vector operations. The set of instructions comprises a complex valued outer product instruction specifying a first source operand, a second source operand, and a destination operand, wherein each of the first source operand and the second source operand is a vector operand comprising a plurality of source data elements, each source data element is a complex number formed of a real part and an imaginary part, and the destination operand identifies a given two-dimensional array of data blocks within the array storage. The processing circuitry is responsive to the complex valued outer product instruction to perform an outer product operation using the source data elements of the first source operand and the source data elements of the second source operand in order to generate a plurality of result data elements, where each result data element is a complex number formed of a real part and an imaginary part, and where each real part and each imaginary part of each result data element is associated with one of the data blocks in the given two-dimensional array of data blocks and is used to update a value of that associated data block.
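A behavioural model of the operation this instruction performs may help. The sketch below (illustrative only — the function name, the square array shape, and the (real, imag) pair representation of data blocks are assumptions, not the disclosed implementation) accumulates the outer product of two complex vectors into a two-dimensional array of real/imaginary accumulators:

```python
def complex_outer_product_accumulate(za, src1, src2):
    """Accumulate the outer product of two complex source vectors into a
    2-D array 'za' whose elements are (real, imag) accumulator pairs."""
    for i in range(len(src1)):
        for j in range(len(src2)):
            p = src1[i] * src2[j]            # complex multiply of two elements
            re, im = za[i][j]
            za[i][j] = (re + p.real, im + p.imag)
    return za

za = [[(0.0, 0.0)] * 2 for _ in range(2)]    # zeroed 2x2 accumulator array
complex_outer_product_accumulate(za, [1 + 2j, 3 + 0j], [2 - 1j, 0 + 1j])
print(za[0][0])  # (4.0, 3.0)
```

Each result element updates one pair of data blocks in the array storage, matching the abstract's association of real and imaginary parts with individual blocks.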
There is provided an apparatus comprising a translation lookaside buffer (TLB) comprising plural entries capable of storing translation data. The TLB is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising plural coalesced formats. The apparatus is provided with control circuitry to maintain coalesced format information identifying active coalesced formats. Each coalesced format defines an input address range size and an output address range size, and each entry formatted using a coalesced format is capable of identifying plural address translations between input address blocks, located within an input address range having the input address range size defined in that coalesced format, and output address blocks, located within an output address range having the output address range size defined in that coalesced format.
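To make the idea of a coalesced entry concrete, here is a minimal model (the class name, field names, 4 KiB page size, and span-of-contiguous-pages policy are all illustrative assumptions, not the disclosed formats): one entry covers several contiguous input-to-output address translations.

```python
PAGE = 4096  # assumed 4 KiB translation granule

class CoalescedEntry:
    """One TLB entry identifying several contiguous page translations:
    a base virtual page, a base physical page, and a span of pages."""
    def __init__(self, vpn_base, ppn_base, span):
        self.vpn_base, self.ppn_base, self.span = vpn_base, ppn_base, span

    def translate(self, vaddr):
        vpn = vaddr // PAGE
        if self.vpn_base <= vpn < self.vpn_base + self.span:
            return (self.ppn_base + (vpn - self.vpn_base)) * PAGE + vaddr % PAGE
        return None  # miss: address outside this entry's input range

entry = CoalescedEntry(vpn_base=0x100, ppn_base=0x880, span=4)
print(hex(entry.translate(0x101ABC)))  # 0x881abc
```

The benefit is capacity: one entry serves what would otherwise occupy four, and maintaining information about which coalesced formats are active lets lookup hardware probe only the range sizes actually in use.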
G06F 12/1036 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
A data processing apparatus includes input circuitry that receives a matrix having values in a first format. Output circuitry outputs the matrix having the values in a second format while adjustment circuitry performs a modification of the matrix from the first format to the second format. The second format is computationally contiguous in respect of a data processing apparatus, having first and second vector registers that are both configured to be dynamically spatially and dynamically temporally divided, performing a matrix multiplication.
There is provided a processing apparatus, method and computer program. The apparatus comprises: decode circuitry to decode instructions; and processing circuitry to apply vector processing operations specified by the instructions. The decode circuitry is configured to, in response to a vector combining instruction specifying a plurality of source vector registers each comprising source data elements in a plurality of data element positions, one or more further source vector registers, and one or more destination registers, cause the processing circuitry to, for each data element position: extract first source data elements from the data element position of each source vector register; extract second source data elements from the one or more further source vector registers; generate a result data element by combining each element of the first source data elements and the second source data elements; and store the result data element to the data element position of the one or more destination registers.
A live attack shadow replay can be performed at a shadow replay box that receives a snapshot of a computer program executed by an operating system of a device; mirrors an execution environment of the snapshot; determines a typical execution of the computer program comprising a first set of variables; performs a static analysis on the snapshot of the computer program to determine a second set of variables; determines a divergence between the first set of variables and the second set of variables; marks variables of the second set of variables that are associated with the divergence; replays a portion of the computer program corresponding to at least the snapshot; and monitors the marked variables of the second set of variables during the replaying of the portion of the computer program.
G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
14.
MATRIX MULTIPLICATION IN A DYNAMICALLY SPATIALLY AND DYNAMICALLY TEMPORALLY DIVIDABLE ARCHITECTURE
A data processing apparatus includes first vector registers and second vector registers, both dynamically spatially and dynamically temporally dividable. Decode circuitry receives one or more matrix multiplication instructions that indicate a set of first elements in the first vector registers and a set of second elements in the second vector registers and, in response to receiving the matrix multiplication instructions, generates a matrix multiplication operation. The matrix multiplication operation causes one or more execution units to perform a matrix multiplication of the set of first elements by the set of second elements, and an average bit width of the first elements is different to an average bit width of the second elements.
A data processing apparatus includes multithreaded processing circuitry to perform processing operations of a plurality of micro-threads, each micro-thread operating in a corresponding execution context defining an architectural state. Decoder circuitry responds to a first occurrence of a detach instruction to generate a first micro-thread in respect of a first block of instructions, and a second occurrence of the detach instruction to generate a second micro-thread in respect of a second block of instructions. The second block of instructions comprises a data dependency in respect of a resource accessed in the first block of instructions. Also provided is a data processing apparatus with input circuitry that receives input code with a first block of instructions and a second block of instructions. Output circuitry produces output code corresponding to the first block of instructions and the second block of instructions. Processing circuitry generates the output code based on the input code. The processing circuitry generates: a first hint instruction, within the output code corresponding to the first block of instructions, that indicates an availability of a resource, and a second hint instruction, within the output code corresponding to the second block of instructions, that indicates a requirement of the resource. The second block of instructions has a data dependency in respect of a resource accessed in the first block of instructions.
Address translation circuitry is provided to perform address translation on receipt of a first address to generate a second address. The address translation circuitry comprises a page walk controller configured to perform sequential page table lookups in a plurality of page table levels of a page table hierarchy. Portions of the first address are used to index into sequential page table levels. Cache storage is provided to cache entries comprising translation information retrieved by the sequential page table lookups. An entry in the cache storage further comprises in association with the translation information a re-use indicator indicative of a re-use expectation for subsequent information which is subordinate to the translation information of the entry in the page table hierarchy. The address translation circuitry is configured to modify cache usage for the subsequent information in dependence on the re-use indicator.
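The re-use indicator's effect on caching can be sketched as follows. This is a hypothetical policy model, not the disclosed circuitry — the dict-based table representation, the function name, and the "parent hint gates the subordinate level" rule are illustrative assumptions:

```python
def translate(root, indices, walk_cache):
    """Walk successive page-table levels; each level's lookup result is
    cached only when the parent entry's re-use hint predicted that
    subordinate information would be re-used. Each table is a dict
    mapping an index to (next_table_or_physical_address, reuse_hint)."""
    node = root
    cache_this_level = True            # the first-level lookup is always cached
    for level, idx in enumerate(indices):
        entry, reuse_hint = node[idx]
        if cache_this_level:
            walk_cache[(level, idx)] = entry   # cache this walk step
        cache_this_level = reuse_hint          # hint gates the next level down
        node = entry
    return node                        # leaf entry: the physical address

l2 = {5: (0xABC000, True)}
l1 = {3: (l2, False)}                  # hint: subordinate info not re-used
root = {1: (l1, True)}
cache = {}
print(hex(translate(root, (1, 3, 5), cache)))  # 0xabc000
```

Here the `False` hint at the middle level keeps the final-level entry out of the walk cache, saving capacity for translations expected to be re-used.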
There is provided a graphics processor (10) comprising a primitive processing circuit operable to process graphics primitives into respective fragment work items to be rendered by a rendering circuit (22). The primitive processing circuit generates one or more queues (18A, 18B) of fragment work items for rendering that contain fragment work items corresponding to multiple, different sources of fragment work items. The graphics processor (10) is configured to issue fragment work items to the rendering circuit (22) in an interleaved fashion such that rendering of fragment work items from a first source of fragment work items can thereby be interleaved with rendering of fragment work items from a second source of fragment work items.
A processor, method and non-transitory computer-readable storage medium for handling data, by obtaining task data describing a task to be executed in the form of a plurality of operations on data, the task data further defining an operation space of said data, and analyzing each of the operations to define transformation data comprising transformation instructions representing a transform into an associated operation-specific local space. In the case that the transformation instructions used to reach the operation-specific local space for an operation produce fewer dimensions than the operation space, one or more operation-specific arguments are stored in a data field corresponding to a dimension not produced by the transformation instructions in the transformation data corresponding to the operation.
A behavioral system-level detector, and a method that filters local alerts to generate system alerts with an increased confidence level, are provided. The method includes receiving local alerts from a local detector that detects events from a processing unit, wherein each local alert comprises information of an event from the processing unit and a timing relationship for the event, filtering the local alerts to determine events indicating an undesirable behavior or attack, and, responsive to the determination that there are events indicating the undesirable behavior or the attack, generating a system alert. The behavioral system-level detector includes a shared data structure for storing local alerts received from at least one local detector, and a system processing unit coupled to the shared data structure to receive the local alerts and coupled to receive state information from the processing units.
Various implementations described herein are related to a device including a bitcell having a bitcell layout with a first metal layer, a second metal layer and a via programming layer. The device may have a via marking layer provided in the bitcell layout for the bitcell, and the via marking layer controls optical proximity correction of the first metal layer and the second metal layer.
G11C 17/12 - Read-only memories programmable only once; Semi-permanent stores, e.g. manually-replaceable information cards using semiconductor devices, e.g. bipolar elements in which contents are determined during manufacturing by a predetermined arrangement of coupling elements, e.g. mask-programmable ROM using field-effect devices
G11C 7/18 - Bit line organisation; Bit line lay-out
An apparatus includes processing circuitry that performs data processing in response to instructions of a software execution environment. Configuration storage circuitry stores a set of memory transaction parameters in association with a partition identifier, and configuration application circuitry applies the set of memory transaction parameters with respect to memory transactions, issued by the software execution environment, that identify the partition identifier. The memory transaction parameters comprise a minimum target allocation and a maximum target allocation of a storage capacity of at least part of a memory system in handling the memory transactions that identify the partition identifier.
Data storage circuitry has entries to store data according to a data storage technology supporting non-destructive reads, each entry associated with an error checking code (ECC) and age indication. Scrubbing circuitry performs a patrol scrubbing cycle to visit each entry of the data storage circuitry within a scrubbing period. On a given visit to a given entry, the scrubbing operation comprises determining, based on the age indication associated with the given entry, whether a check-not-required period has elapsed for the given entry, and if so performing an error check on the data of the given entry using the ECC for that entry. The error check is omitted if the check-not-required period has not yet elapsed. The check-not-required period is restarted for a write target entry in response to a request causing an update to the data and the error checking code of the write target entry. The check-not-required period is restarted for a read target entry in response to a request causing the data of a read target entry to be non-destructively read and subject to the error check.
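A single patrol visit under this scheme can be modelled in a few lines. The sketch below is illustrative only — the one-bit-parity "ECC", the dict-based entry, and the field names are assumptions standing in for the real error checking code and age indication:

```python
def ecc(data):
    return bin(data).count('1') & 1   # toy 1-bit parity standing in for ECC

def patrol_visit(entry, now, check_period):
    """Visit one entry: skip the error check if the entry was written or
    checked recently enough that its check-not-required period has not
    yet elapsed; otherwise check it and restart the period."""
    if now - entry['age'] < check_period:
        return 'skipped'              # check-not-required period still running
    result = 'ok' if ecc(entry['data']) == entry['ecc'] else 'error'
    entry['age'] = now                # restart the period after the check
    return result

e = {'data': 0b1011, 'ecc': 1, 'age': 100}
print(patrol_visit(e, now=105, check_period=10))  # 'skipped'
print(patrol_visit(e, now=120, check_period=10))  # 'ok'
```

Skipping entries that were recently written or non-destructively read (and therefore already checked) trims redundant ECC work from the scrubbing cycle without extending the guaranteed scrubbing period.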
There is provided an apparatus and method, the apparatus comprising storage circuitry to store event information associated with instructions occurring between instrumentation points. The event information indicates a plurality of different types of events expected to occur during execution of the instructions. The event information comprises, for each event, type information indicating a type of that event and an expected number of occurrences of that event. The apparatus is also provided with monitoring circuitry comprising a plurality of programmable counters. The monitoring circuitry is responsive to a start instrumentation point, to assign at least a subset of the plurality of programmable counters to measure, during execution of the program instructions, occurrences of the plurality of different types of events identified in the event information. The monitoring circuitry is responsive to at least one counter deviating from the expected number of occurrences indicated by that counter, to perform a predetermined action.
Addition circuitry performs a saturating addition of a first number and a second number to generate a result value indicating an addition result corresponding to addition of the first number and the second number when the addition result is within a predetermined range and indicating a saturation value when the addition result is outside the predetermined range. The addition circuitry comprises: saturation lookahead circuitry to determine, for each lane of the result value, a respective set of one or more saturation lookahead status indications indicative of whether that lane should be set to represent part of the saturation value; and addition result generating circuitry to generate result bits for each lane, with a given lane of the result value having a value determined as a function of corresponding bits of the first and second numbers and a corresponding set of one or more saturation lookahead status indications determined for that lane by the saturation lookahead circuitry.
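Functionally, the circuit computes the following (a behavioural sketch, not the lookahead hardware — in the real circuitry the per-lane saturation flags are computed in parallel with the sum rather than after it, and the function name, 32-bit width, and byte-sized lanes are assumptions):

```python
def saturating_add(a, b, n_bits=32):
    """Signed saturating add: each byte lane of the result is either the
    corresponding raw-sum byte or the matching byte of the saturation
    value, selected by a per-lane saturation flag."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    s = a + b
    sat = hi if s > hi else lo if s < lo else None
    result_bytes = []
    for lane in range(n_bits // 8):
        src = s if sat is None else sat        # per-lane select (same flag here)
        result_bytes.append((src >> (8 * lane)) & 0xFF)
    value = sum(byte << (8 * lane) for lane, byte in enumerate(result_bytes))
    # reinterpret the assembled bit pattern as signed
    return value - (1 << n_bits) if value >> (n_bits - 1) else value

print(saturating_add(2_000_000_000, 2_000_000_000))  # 2147483647
print(saturating_add(100, 23))                        # 123
```

Deciding saturation per lane, ahead of the final sum, is what lets the hardware assemble the result without first completing a full-width addition and then overwriting it.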
G06F 7/508 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages using carry look-ahead circuits
42 - Scientific, technological and industrial services, research and design
Goods & Services
Software-as-a service (SaaS) services featuring software for using artificial intelligence to create unified customer databases and customer profiles that are accessible to other marketing systems; software-as-a service (SaaS) services featuring software for using artificial intelligence for collecting, analyzing, editing, modifying, organizing, synchronizing, integrating, visualizing, monitoring, transmitting, transferring, storing and sharing electronic customer data and information in networked and/or cloud-connected electronic data processing apparatus; software-as-a service (SaaS) services featuring software using artificial intelligence for collecting, analyzing, editing, modifying, organizing, synchronizing, integrating, visualizing, monitoring, transmitting, transferring, storing and sharing electronic customer data and information in the enterprise, retail, consumer packaged goods, financial services, automotive, entertainment and media, telecommunications, food and beverage, electronics, e-commerce, and healthcare sectors; providing information about results from the collection, analysis, storage and sharing of electronic customer data and information; research, development and design all relating to customer data platforms; development and design services all relating to computer software used in and for use in the design, development, creation and programming of customer data platforms; technical consultancy and technical support services in the nature of diagnosing problems in customer data platforms; maintenance and support services, namely, technical advice related to the installation, use and maintenance of customer data platforms; electronic data storage of customer data and information
An apparatus comprises a divide/square-root pipeline comprising: a plurality of divide/square-root iteration pipeline stages each to perform a respective iteration of a digit-recurrence divide or square root operation; and signal paths to supply outputs generated by one divide/square root iteration pipeline stage in one iteration as inputs to a subsequent divide/square root iteration pipeline stage of the divide/square-root pipeline for performing a subsequent iteration of the digit-recurrence divide or square root operation. The divide/square-root pipeline is capable of performing the digit-recurrence divide or square root operation on a floating-point operand to generate a floating-point result.
There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprises processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
According to the present techniques there is provided a computer implemented method of securely provisioning data to a trusted operation environment hosted on a service, the method performed at the service, comprising: receiving, from a remote resource, a request for functionality at the service; allocating a trusted operation environment in response to the request; initiating a boot sequence at the trusted operation environment; establishing a secure communication session between the trusted operation environment and the remote resource; transmitting, from the trusted operation environment to the remote resource, a request for data; receiving, at the trusted operation environment, the requested data; accessing, at the trusted operation environment, the received data; progressing, at the trusted operation environment, the boot sequence using the received data.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
Various implementations described herein are related to a device with a first clock generator that provides a first pulse signal based on a clock signal, wherein the first clock generator has a first tracking circuit that provides a first reset signal based on the first pulse signal. The device may include a second clock generator that provides a second pulse signal based on the clock signal, wherein the second clock generator has a second tracking circuit that provides a first control signal based on the second pulse signal. Also, the device may include a third clock generator that provides a third pulse signal based on the first reset signal, wherein the third clock generator has a logic circuit that provides a second control signal based on the third pulse signal.
A data processing apparatus comprises processing circuitry configured to perform data processing operations in response to instructions stored in a memory system. In response to determining that an execute-instruction-from-register condition is satisfied at a point in program flow corresponding to a program counter address, the processing circuitry determines, based on instruction-defining information associated with an instruction-storing register, an alternate operation to be performed in place of an original operation represented by an instruction stored in a location in the memory system corresponding to the program counter address. The apparatus also comprises checking circuitry configured to perform an address-dependent check based on the program counter address to determine whether the processing circuitry is permitted to perform the alternate operation in place of the original operation.
A data processing apparatus includes mode circuitry for indicating a first format of a first floating-point parameter and separately indicating a second format of a second floating-point parameter. Floating-point circuitry performs a two-parameter floating-point operation using the first floating-point parameter in the first format and the second floating-point parameter in the second format. At least one of the first format and the second format is dynamically changeable at runtime.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
32.
APPARATUS, METHOD, AND COMPUTER PROGRAM FOR COLLECTING DIAGNOSTIC INFORMATION
An apparatus has processing circuitry to perform single instruction, multiple data (SIMD) processing on an array with a plurality of data items and the processing circuitry supports the SIMD processing for a plurality of array sizes. The processing circuitry is able to select an array size with which to perform the SIMD processing based on SIMD processing configuration information. Diagnostic information collection circuitry is provided to collect diagnostic information about software executing on the processing circuitry and the diagnostic information collection circuitry filters collection of the diagnostic information based on the SIMD processing configuration information.
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 11/36 - Preventing errors by testing or debugging of software
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
33.
SYSTEMS AND DEVICES FOR CONFIGURING NEURAL NETWORK CIRCUITRY
Subject matter disclosed herein may relate to storage and/or processing of signals and/or states representative of neural network parameters in a computing device, and may relate more particularly to configuring circuitry in a computing device to process signals and/or states representative of neural network parameters.
Various implementations described herein are directed to a device having a set-reset architecture with multiple latches including a first latch, a second latch and a third latch. The first latch may receive a first data signal from first logic and provide a first latched data signal as a first output based on a clock signal. The second latch may receive a second data signal from second logic and provide a second latched data signal as a second output based on the clock signal. The third latch may receive a third data signal from third logic and provide a third latched data signal as a third output based on the clock signal.
A data processing apparatus comprises operand routing circuitry configured to prepare operands for processing, and a plurality of processing elements. Each processing element comprises receiving circuitry, processing circuitry, and transmitting circuitry. A group of coupled processing elements comprises a first processing element configured to receive operands from the operand routing circuitry and one or more further processing elements for which the receiving circuitry is coupled to the transmitting circuitry of another processing element in the group. The apparatus also comprises timing circuitry, configured to selectively delay transmission of operands within the group of coupled processing elements to cause operations performed by the group of coupled processing elements to be staggered.
A target virtual address of a memory access request is translated to a target physical address (PA). The memory access request is associated with one of a plurality of domains including at least a less secure domain associated with a less secure physical address space (PAS) and a more secure domain associated with a more secure PAS. PAS selection circuitry 16, 20 selects a selected PAS for the memory access request. The more secure PAS is prohibited from being selected for memory access requests associated with the less secure domain. Checking circuitry 20 determines whether to reject the memory access request based on protection information corresponding to the target PA. In at least one mode, when the protection information indicates a predetermined less-secure memory property, the less secure PAS (but not the more secure PAS) is allowed to provide access to the target PA, and memory access requests associated with the more secure domain are prohibited from accessing the target PA, even when the selected PAS is the less secure PAS.
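The mode described above can be illustrated with a small Python model. The domain and PAS names, and the string encoding of the protection information, are assumptions made for the sketch, not details from the disclosure.

```python
# Toy model of the checking rule for the mode described above. The domain and
# PAS names and the protection-information encoding are assumed for the sketch.

LESS, MORE = "less-secure", "more-secure"

def select_pas(domain):
    # the more secure PAS is prohibited from being selected for requests
    # associated with the less secure domain
    return LESS if domain == LESS else MORE

def access_allowed(domain, selected_pas, protection_info):
    """Check a request against the protection information for the target PA."""
    if protection_info == "less-secure-property":
        # only the less secure PAS may provide access, and requests from the
        # more secure domain are rejected even when the selected PAS is the
        # less secure PAS
        return domain == LESS and selected_pas == LESS
    return True  # other protection values are outside the scope of this sketch
```

A less-secure-domain request through the less secure PAS passes; a more-secure-domain request is rejected regardless of which PAS it selected.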
Performance monitoring circuitry (40) has event counters (42) each to maintain a respective event count value based on monitoring of events during processing of the software by the processing circuitry. Control circuitry (44) configures the event counters based on counter configuration information. For at least a subset of the event counters, a given event counter (42) in the subset supports a chained-counter operation comprising incrementing a given event count value by an increment value determined based on a logical combination of a first event status indication indicative of status of a first event type assigned by the counter configuration information to be monitored by the given event counter and a second event status indication indicative of status of a second event type assigned by the counter configuration information to be monitored by a further event counter.
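A chained-counter operation of this kind can be sketched in Python: two narrow counters combine into one wide count, with the given (high) counter incrementing only when the further (low) counter's event status indicates a wrap. The 8-bit width and the overflow-based chaining condition are illustrative assumptions.

```python
WIDTH = 8                  # assumed width of each narrow event counter
MASK = (1 << WIDTH) - 1

class ChainedCounters:
    """Two narrow event counters chained to behave as one wide counter."""
    def __init__(self):
        self.low = 0       # further counter: monitors the first event type
        self.high = 0      # given counter: increments on the chained condition

    def event(self):
        # second event status indication: the low counter is about to wrap
        overflow = (self.low == MASK)
        self.low = (self.low + 1) & MASK
        # logical combination of the two event status indications drives
        # the increment of the given (high) counter
        if overflow:
            self.high += 1

    def value(self):
        return (self.high << WIDTH) | self.low
```

After 300 events the chained pair reads 300, even though each counter is only 8 bits wide.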
Access control circuitry (15), responsive to a memory access request, compares a tag value determined based on a tag portion (40) of an address pointer (42) with an allocation tag (32) associated with the memory location identified by a memory address determined from the address pointer. In response to the comparison indicating a given result, a tag error response is performed. Processing circuitry (4), executing a tag protecting instruction, detects whether an operation involves an attempt to set a bit in an identified portion (74) of an output value to a value other than that in a corresponding bit in an input operand. The processing circuitry sets, when it detects said attempt, a given portion of the output value to an error-indicating value. The identified portion is a portion of the output value which would be used as the tag portion if the output operand was used as the address pointer for a memory access instruction.
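A rough software model of the tag protecting behaviour, assuming (as an illustration only) a 64-bit pointer whose top byte is the tag portion and an arbitrary error-indicating tag value:

```python
# Illustrative model: the tag portion is assumed to be the top byte of a
# 64-bit value, and 0xFF is an invented error-indicating tag value.
TAG_SHIFT, TAG_BITS = 56, 8
TAG_MASK = ((1 << TAG_BITS) - 1) << TAG_SHIFT

def tag_protecting_op(op, input_operand, error_tag=0xFF):
    """Apply `op` to the operand; if the result sets any bit of the tag
    portion to a value other than the corresponding input bit, force the
    tag portion to the error-indicating value."""
    result = op(input_operand)
    changed = (result ^ input_operand) & TAG_MASK  # tag bits that differ
    if changed:
        result = (result & ~TAG_MASK) | (error_tag << TAG_SHIFT)
    return result
```

An operation that leaves the tag bits alone passes through unchanged; one that disturbs them yields a pointer whose tag will mismatch the allocation tag on use.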
Apparatus comprising PAS selection circuitry (16) and access control circuitry (23). The PAS selection circuitry is responsive to a memory access request issued by a requester device, the memory access request specifying a memory address identifying a memory location, to select, based on a current domain of operation of the requester device, one of a plurality of physical address spaces, PASs, to be associated with the memory access request. The access control circuitry comprises PAS checking circuitry (20) to reject the memory access request in response to address-space permissions information, defined for the identified memory location, indicating that memory access requests associated with the selected PAS are prohibited from accessing the identified memory location, and device permissions checking circuitry (92) to reject the memory access request in response to device permissions information, defined for the requester device, indicating that memory access requests issued by the requester device are prohibited from accessing the selected PAS.
Various implementations described herein are related to a device with a first clock generator that provides a first pulse signal based on a clock signal, wherein the first clock generator has a first tracking circuit that provides a first reset signal based on the first pulse signal. The device may include a second clock generator that provides a second pulse signal based on the clock signal, wherein the second clock generator has a second tracking circuit that provides a first control signal based on the second pulse signal. Also, the device may include a third clock generator that provides a third pulse signal based on the first reset signal, wherein the third clock generator has a logic circuit that provides a second control signal based on the third pulse signal.
G11C 8/18 - Address timing or clocking circuits; Address control signal generation or management, e.g. for row address strobe [RAS] or column address strobe [CAS] signals
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
41.
Dynamic adjustment of memory for storing protection metadata
There is provided a memory protection unit configured to maintain region metadata associated with storage regions of off-chip storage and protection metadata associated with each of the storage regions. The protection metadata is stored in the off-chip storage, and the region metadata encodes whether each of the storage regions belongs to a set of protected storage regions or to a set of unprotected storage regions and encodes information indicating corresponding protection metadata associated with each storage region. The memory protection unit is configured to update the region metadata in response to a region update request identifying a given storage region for which the region metadata is to be modified and to dynamically adjust an amount of memory required to store protection metadata associated with the set of protected storage regions in response to the update to the region metadata.
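The dynamic adjustment can be pictured with a toy model in which the protection-metadata footprint simply scales with the size of the protected set; the per-region metadata size is an invented constant.

```python
META_BYTES_PER_REGION = 32  # assumed size of protection metadata per region

class MemoryProtectionUnit:
    """Toy model: region metadata is a protected/unprotected flag per region."""
    def __init__(self, num_regions):
        self.protected = [False] * num_regions  # region metadata

    def update_region(self, region, make_protected):
        # region update request identifying a given storage region
        self.protected[region] = make_protected

    def metadata_footprint(self):
        """Memory required for protection metadata scales dynamically with
        the number of regions currently in the protected set."""
        return sum(self.protected) * META_BYTES_PER_REGION
```

Moving a region into or out of the protected set immediately changes the amount of off-chip storage the metadata requires.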
A super home node of a first chip of a multi-chip data processing system manages coherence for both local and remote cache lines accessed by local caching agents and local cache lines accessed by caching agents of one or more second chips. Both local and remote cache lines are stored in a shared cache, and requests are stored in a shared point-of-coherency queue. An entry in a snoop filter table of the super home node includes a presence vector that indicates the presence of a remote cache line at specific caching agents of the first chip or the presence of a local cache line at specific caching agents of the first chip and any caching agent of the second chip. All caching agents of the second chip are represented as a single caching agent in the presence vector.
An apparatus, method and computer program are described. The apparatus comprises throttling control circuitry (215), associated with a given processing element (205), to perform a throttling-level selection process to select a throttling level indicative of an execution rate at which higher-power processing tasks received by the given processing element are to be issued to processing circuitry of the given processing element. The apparatus also comprises power management circuitry (220) to perform a power control process to select an operating voltage and/or clock frequency to be used by the given processing element, wherein the power management circuitry is configured to select the operating voltage and/or clock frequency in dependence on the throttling level selected for the given processing element, and at least one register (225) accessible to firmware. The apparatus is configured to control the selection of at least one energy control parameter in dependence on a value read from the at least one register.
A data processing apparatus comprises an instruction decoder and processing circuitry. The processing circuitry is configured to perform a load-with-substitution operation in response to the instruction decoder decoding a load-with-substitution instruction specifying an address and a destination register. In the load-with-substitution operation, the processing circuitry is configured to issue a request to obtain target data corresponding to the address from one or more caches. In response to the request hitting in a given cache belonging to a subset of the one or more caches, the processing circuitry is configured to provide to the destination register of the load-with-substitution instruction the target data obtained from the given cache. In response to the request missing in each cache belonging to the subset of the one or more caches, the processing circuitry is configured to provide a substitute value as the correct architectural result corresponding to the destination register of the load-with-substitution instruction.
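The load-with-substitution semantics can be sketched as a Python function; modelling each cache as a dict and defaulting the substitute value to zero are assumptions of the sketch.

```python
# Sketch of the load-with-substitution operation described above. Caches are
# modelled as address->data dicts; the substitute value is an assumption.

def load_with_substitution(address, cache_subset, substitute=0):
    """Return the target data if the address hits in any cache of the subset;
    otherwise return the substitute value as the architectural result,
    without fetching from memory."""
    for cache in cache_subset:
        if address in cache:       # hit: deliver data from the given cache
            return cache[address]
    return substitute              # miss in every cache of the subset
```

A hit behaves like an ordinary load; a miss completes immediately with the substitute value instead of stalling on a memory access.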
A data processing apparatus includes one or more cache configuration data stores, a coherence manager, and a shared cache. The coherence manager is configured to track and maintain coherency of cache lines accessed by local caching agents and one or more remote caching agents. The cache lines include local cache lines accessed from a local memory region and remote cache lines accessed from a remote memory region. The shared cache is configured to store local cache lines in a first partition and to store remote cache lines in a second partition. The sizes of the first and second partitions are determined based on values in the one or more cache configuration data stores and may or may not overlap. The cache configuration data stores may be programmable by a user or dynamically programmed in response to local memory and remote memory access patterns.
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
There is provided an apparatus, method and medium. The apparatus comprises processing circuitry to process instructions and a reorder buffer identifying a plurality of entries having state information associated with execution of one or more of the instructions. The apparatus comprises allocation circuitry to allocate entries in the reorder buffer, and to allocate at least one compressed entry corresponding to a plurality of the instructions. The apparatus comprises memory access circuitry responsive to an address associated with a memory access instruction corresponding to access-sensitive memory and the memory access instruction corresponding to the compressed entry, to trigger a reallocation procedure comprising flushing the memory access instruction and triggering reallocation of the memory access instruction without the compression. The allocation circuitry is responsive to a frequency of occurrence of memory access instructions addressing the access-sensitive memory meeting a predetermined condition, to suppress the compression whilst the predetermined condition is met.
A 1-hot path signature accelerator includes a register, first and second accumulator, and an outer product circuit. The register stores an input frame, where the input frame has, at most, one bit of each element set. The first accumulator calculates a present summation by adding the input frame to a previous sum of previous input frames inputted to the 1-hot path signature accelerator within a timeframe. The outer product circuit receives each element of the present summation from the first accumulator and each element of the input frame stored in the register to output a present outer product. Since the input frame has at most one bit of each element set, the outer product circuit is reduced to a logical operation. The second accumulator outputs a present second-layer summation by adding the present outer product to a previous second-layer sum of outputs from the outer product circuit within the timeframe.
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
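A software model of the 1-hot path signature accelerator above may help show why the outer product degenerates to a logical select. The frame width and list-based representation are illustrative choices.

```python
FRAME_BITS = 4  # assumed width of each one-hot input frame

def accelerate(frames):
    """Accumulate first- and second-layer path-signature sums over a timeframe."""
    s1 = [0] * FRAME_BITS                                # first accumulator
    s2 = [[0] * FRAME_BITS for _ in range(FRAME_BITS)]   # second accumulator
    for frame in frames:
        assert sum(frame) <= 1, "input frame must have at most one bit set"
        # present summation: add the input frame to the previous sum
        s1 = [a + b for a, b in zip(s1, frame)]
        # outer product of the present summation with the 1-hot frame: each
        # column is either zero or a copy of s1, so the multiplier reduces
        # to a logical select on the single set bit
        for i in range(FRAME_BITS):
            if frame[i]:
                for j in range(FRAME_BITS):
                    s2[j][i] += s1[j]   # second-layer summation
    return s1, s2
```

Because at most one column is selected per frame, no multiplications occur anywhere in the loop.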
Prediction circuitry for a data processing system comprises input circuitry to receive status inputs associated with instructions or memory access requests processed by the data processing system. Unified predictor circuitry comprises shared hardware circuitry configurable to act as a plurality of different types of predictor. The unified predictor circuitry generates, according to a unified prediction algorithm based on the status inputs and a set of predictor parameters, an array of predictions comprising different types of prediction of instruction/memory-access behaviour for the data processing system. A configuration subset of the predictor parameters is configurable to adjust a relative influence of each status input in the unified prediction algorithm used to generate the array of predictions. Output circuitry outputs, based on the plurality of types of prediction, speculative action control signals for controlling the data processing system to perform speculative actions.
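One plausible reading of the unified prediction algorithm is a perceptron-style weighted vote, with one configurable weight row per prediction type; this interpretation, and all parameter values, are assumptions of the sketch.

```python
# Hypothetical reading of the unified predictor: a shared weighted-sum
# algorithm whose configurable weights set the relative influence of each
# status input on each type of prediction.

def unified_predict(status_inputs, weight_rows, thresholds):
    """Generate an array of predictions of different types from the same
    status inputs, one per (weights, threshold) configuration row."""
    predictions = []
    for weights, threshold in zip(weight_rows, thresholds):
        score = sum(w * s for w, s in zip(weights, status_inputs))
        predictions.append(score >= threshold)   # e.g. taken / prefetch / hit
    return predictions
```

Zeroing a weight removes a status input's influence on that prediction type, which is one way the configuration subset could specialise the shared hardware.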
A computer implemented method is provided. The computer implemented method includes receiving an intermediate representation of a source code, intentionally injecting a weak code path at a point within the intermediate representation to create a modified intermediate representation, performing a path profiling on the modified intermediate representation to generate a particular path identifier for each path within the modified intermediate representation, and identifying the particular path identifier of the weak code path for use by a monitoring system. A monitoring system is also provided. The monitoring system monitors an executable code during runtime for execution of a path having a particular path identifier corresponding to the intentionally injected weak code path.
A method to distribute verification of attestation evidence and a verifiable system are provided. The method includes receiving, at a secondary verifier operating in a verifiable system, a request from a relying party to perform a verification process with respect to attestation evidence of a device in communication with the relying party, communicating self-attestation evidence, by the secondary verifier, to a trusted verifier to generate an attestation report of the verifiable system, communicating the attestation report of the verifiable system or other indicator of trustworthiness to the relying party to indicate trustworthiness of the secondary verifier with respect to performing the verification process, and performing, by the secondary verifier, the verification process on the attestation evidence of the device in communication with the relying party. The verifiable system includes instructions of a secondary verifier stored and executed by the verifiable system to perform the method to distribute verification of attestation evidence.
In one embodiment, a computer system comprises one or more central processing units (CPUs) that are communicatively coupled to a system clock, one or more network interfaces, and one or more database interfaces; digital electronic main memory that is communicatively coupled to the one or more CPUs and storing one or more sequences of stored program instructions which, when executed using the one or more CPUs, cause the one or more CPUs to execute a plurality of different consumer services of a SaaS-based data analytics platform, each of the consumer services hosting an instance of a sandboxed runtime for a hermetic and deterministic programming language; user function storage that is communicatively coupled to one of the database interfaces and storing a plurality of different user functions, each of the user functions having been programmed using the programming language, each of the user functions being stored in association with a reference to a destination table of a destination database; each of the consumer services being programmed to initiate a data ingestion process; load a copy of a user function from the user function storage to the sandboxed runtime that is associated with that consumer service; using the sandboxed runtime local to that consumer service, execute the user function over records directed to the destination table identified in the reference; and filter the records or write new records resulting from the function to the destination table.
In a data processing network, error detection information (EDI) is generated for first data of a first communication protocol of a plurality of communication protocols, the EDI including an error detection code and an associated validity indicator for each field group in a set of field groups. The first data and the EDI are sent through a network interconnect circuit, where the first data is translated to second data of a second communication protocol. An error is detected in the second data received from the network interconnect circuit when a validity indicator for a field group is set in EDI received with the second data and an error detection code generated for second data in the field group does not match the error detection code associated with the field group in the received EDI.
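A minimal sketch of per-field-group error detection, assuming byte-string field groups and CRC-32 as the error detection code (the disclosure does not fix a particular code):

```python
import zlib

def make_edi(field_groups):
    """Generate error detection info: an error detection code (here CRC-32,
    an assumption) and a validity indicator per field group."""
    return [{"crc": zlib.crc32(data), "valid": True} for data in field_groups]

def check_edi(field_groups, edi):
    """Detect an error when a field group's validity indicator is set and the
    code recomputed over the (possibly translated) data does not match the
    code received with the EDI."""
    for data, info in zip(field_groups, edi):
        if info["valid"] and zlib.crc32(data) != info["crc"]:
            return True    # error detected in this field group
    return False
```

Clearing a validity indicator lets a field group be legitimately rewritten during protocol translation without raising an error.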
A data processing apparatus comprises: execution circuitry to execute instructions in order to perform data processing operations specified by those instructions; a plurality of registers to store data values for access by the execution circuitry when performing the data processing operations, each register having an associated physical register identifier; register rename circuitry to select physical register identifiers to associate with architectural register identifiers specified by the instructions; and rename storage having a plurality of entries, each entry being associated with one of the architectural register identifiers and used by the register rename circuitry to indicate a physical register identifier selected for association with that one of the architectural register identifiers; the register rename circuitry comprising an execute unit, and being responsive to detection of an early execute condition for a given instruction, the early execute condition requiring at least detection that each source value required to execute the given instruction is available to the register rename circuitry without accessing the plurality of registers, to cause the execute unit to perform the data processing operation specified by the given instruction in order to generate a result value, and to cause the generated result value to be stored in an entry of the rename storage associated with a destination architectural register identifier specified by the given instruction.
Various implementations described herein are related to a device having a memory architecture with a multi-stack of transistors that may be arranged in a multi-bitcell stack configuration. Also, the memory architecture may have a wordline that may be shared across the transistors of the multi-stack of transistors with each transistor coupled to a different bitline.
A method is provided for generating a mipmap. A processor comprises a neural processing engine comprising a plurality of hardware units suitable for performing integer operations on machine learning models. The method comprises receiving initial image data and one or more commands to perform an operation for generating a further layer of image data from the initial image data that has a different resolution to the initial image data. The method processes the commands to generate the further layer using the neural processing engine of the processor.
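Generating a half-resolution layer with integer-only arithmetic might look like the following 2x2 box filter; the filter choice is an assumption, as the disclosure leaves the downsampling operation open.

```python
def next_mip_level(image):
    """Produce the next (half-resolution) mipmap layer by 2x2 box filtering,
    using only integer operations as a neural processing engine's hardware
    units would. Assumes even height and width."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1] +
              image[y + 1][x] + image[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Applying the function repeatedly yields the full mipmap chain down to a single pixel.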
Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may facilitate and/or support scheduling tasks for one or more hardware components of a computing device.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, in techniques to process image signal intensity values sampled from a multi color channel imaging device. In particular, such techniques may comprise application of convolution operations with kernel coefficients selected from a set of coefficient values such that the same coefficient value is to be applied to image signal intensity values of multiple pixel locations in an image frame.
A system, method and computer program product configured to control a plurality of parallel programs operating in an n-dimensional hierarchical iteration space over an n-dimensional data space, comprising: a processor and a memory configured to accommodate the plurality of parallel programs and the data space; a memory access control decoder configured to decode memory location references to regions of the n-dimensional data space from indices in the plurality of parallel programs; and an execution orchestrator responsive to the memory access control decoder and configured to sequence regions of the n-dimensional hierarchical iteration space of the plurality of parallel programs to honour a data requirement of at least a first of the plurality of parallel programs having a data dependency on at least a second of the plurality of parallel programs.
The present disclosure relates to a data processing apparatus, comprising: graph partitioning circuitry configured to receive a computation graph, the computation graph comprising a plurality of nodes representing operators and a plurality of edges representing relationships amongst the operators, and to divide the computation graph into a plurality of partitions, each partition comprising one or more nodes and/or edges; graph compilation circuitry configured to compile the computation graph to generate one or more compilation outputs; and storage to store the one or more compilation outputs; wherein the graph compilation circuitry is configured to: compile a first partition of the plurality of partitions to generate a first compilation output; and output the first compilation output to a first target portion of the storage assigned to the first partition.
Data transfer between caching domains of a data processing system is achieved by a local coherency node (LCN) of a first caching domain receiving a read request for data associated with a second caching domain, from a requesting node of the first caching domain. The LCN requests the data from the second caching domain via a transfer agent. In response to receiving a cache line containing the data from the second caching domain, the transfer agent sends the cache line to the requesting node, bypassing the LCN and, optionally, sends a read-receipt indicating the state of the cache line to the LCN. The LCN updates a coherency state for the cache line in response to receiving the read-receipt from the transfer agent and a completion acknowledgement from the requesting node. Optionally, the transfer agent may send the cache line via the LCN when congestion is detected in a response channel of the data processing system.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0888 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
62.
Traffic Isolation at a Chip-To-Chip Gateway of a Data Processing System
A mechanism for error containment in a data processing system includes receiving a transaction request at a gateway between a host and a device, allocating an entry for the request in a local request tracker of the gateway and sending a link request to a port of the gateway. In response to an isolation trigger, the port is moved into isolation by completing in-process requests with entries in the tracker and locking the entries. On receiving a response to an in-process request while the port is in isolation, the response is dropped, the associated entry is unlocked, and allocation of the entry is enabled. A completion response is sent to the requester without dispatching a new link request to the port. When requests are completed, the system is quiesced, locked entries are unlocked, and the port is moved out of isolation.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
63.
SYSTEM, DEVICES AND/OR PROCESSES FOR ASSIGNMENT, CONFIGURATION AND/OR MANAGEMENT OF ONE OR MORE HARDWARE COMPONENTS OF A COMPUTING DEVICE
Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may facilitate and/or support assignment, configuration and/or management of one or more hardware components of a computing device.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
A processor to: receive a task to be executed, the task comprising a task-based parameter associated with the task, for use in determining a position, within an array of data descriptors, of a particular data descriptor of a particular portion of data to be processed in executing the task. Each of the data descriptors in the array of data descriptors has a predetermined size and is indicative of a location in a storage system of a respective portion of data. The processor derives, based on the task, array location data indicative of a location in the storage system of a predetermined data descriptor, and obtains the particular data descriptor, based on the array location data and the task-based parameter. The processor obtains the particular portion of data based on the particular data descriptor and processes the particular portion of data in executing the task.
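Because every descriptor has a predetermined size, locating one reduces to a scaled offset. A sketch, with an invented 16-byte descriptor size and a flat byte-addressed store standing in for the storage system:

```python
DESC_SIZE = 16  # predetermined size of each data descriptor, in bytes (assumed)

def descriptor_location(array_base, task_param):
    """The task-based parameter gives the position of the particular data
    descriptor within the array; since all descriptors share a fixed size,
    the location is a simple scaled offset from the derived base."""
    return array_base + task_param * DESC_SIZE

def fetch_descriptor(storage, array_base, task_param):
    """Obtain the particular descriptor from the storage system."""
    offset = descriptor_location(array_base, task_param)
    return storage[offset:offset + DESC_SIZE]
```

The descriptor fetched this way would then indicate the location of the particular portion of data to process.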
A processor comprising: a handling unit; and a plurality of components each configured to execute a function. The handling unit can receive a task comprising operations on data in a coordinate space having N dimensions, and receive a data structure describing execution of the task and comprising a partially ordered set of data items each associated with instructions usable by the plurality of components when executing the task. Each data item is associated with a component among the plurality of components, and indicates the dimensions of the coordinate space for which changes of coordinate cause the function of the associated component to execute, and the dimensions of the coordinate space for which changes of coordinate cause the associated component to store data ready to be used by another component. The handling unit iterates over the coordinate space and executes the task using the partially ordered set of data items.
Provided is an access controller configured to control access to a shared resource by plural accessors operable to issue requests for access to the shared resource, the access controller comprising a predictor configured to analyze an activity of an accessor to determine a type of at least one event; predict a future request state of at least one of the accessors based on the determination; and select one of the plurality of accessors to be granted access to the shared resource on a future processor cycle, the selecting being computed using at least the prediction.
A memory unit configured for handling task data, the task data describing a task to be executed as a directed acyclic graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the acyclic graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed, represented by one or more data blocks. The memory unit is configured to receive a sequence of processing requests comprising the one or more data blocks, with each data block assigned a priority value and comprising a block command. The memory unit is configured to arbitrate between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests, wherein the processing requests include writing data to, or reading data from, storage.
A data processing system comprising a processor (306) that is configured to perform neural network processing having one or more execution units (213, 214) configured to perform processing operations for neural network processing and a control circuit (217) configured to distribute processing tasks to the execution unit or units, and a graphics processor (304) comprising a programmable execution unit (203) operable to execute processing programs to perform processing operations. The control circuit (217) of the processor (306) that is configured to perform neural network processing is configured to, in response to an indication of particular neural network processing to be performed provided to the control circuit, cause the programmable execution unit (203) of the graphics processor to execute a program to perform the indicated neural network processing.
A method and apparatus for distributing operations for execution. Input data is received and is subdivided into portions, each comprising a first and a second sub-portion. A first operation and a second operation are received, and dependencies between the first and second operations are identified. For each portion, the first operation is issued for execution on the first sub-portion to produce a first output sub-portion, and its completion is tracked. The first operation is also issued for execution on the second sub-portion to produce a second output sub-portion. Depending upon satisfaction of the dependencies in respect of the first sub-portion, the second operation, to be executed on the first output sub-portion, is either issued, if the dependencies are met, or stalled, if they are not. This is repeated for each subsequent portion.
A processor to generate position data indicative of a position within a compressed data stream, wherein, previously, in executing a task, data of the compressed data stream ending at the position has been read by the processor from storage storing the compressed data stream. After reading the data, the processor reads further data of the compressed data stream from the storage, in executing the task, the further data located beyond the position within the compressed data stream. After reading the further data, the processor reads, based on the position data, a portion of the compressed data stream from the storage, in executing the task, starting from the position within the compressed data stream. The processor decompresses the portion of the compressed data stream to generate decompressed data, in executing the task.
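One way to make "return to a recorded position in a compressed stream" concrete is a chunked scheme in which each chunk is independently compressed, so a position is just a chunk index. This scheme is purely illustrative; the disclosure does not specify the compression format.

```python
import zlib

class ChunkedStream:
    """Sketch of re-reading a compressed stream from recorded position data.
    As an assumption, the stream is compressed in independent chunks, so a
    position the processor records can later be returned to directly."""
    def __init__(self, plain_chunks):
        self.chunks = [zlib.compress(c) for c in plain_chunks]

    def read_from(self, position, count):
        """Decompress `count` chunks starting at the recorded position,
        without re-reading anything before it."""
        return b"".join(zlib.decompress(c)
                        for c in self.chunks[position:position + count])
```

After reading ahead, the processor can use the recorded position to decompress again from that point rather than from the start of the stream.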
A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data are accumulated with second further stored data within a second bit range of the second storage device.
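The two-cycle accumulation can be modelled numerically in Python. The nibble-wide bit ranges and the choice of n and m are assumptions made for illustration; the point is that draining the low bits of the narrow first accumulator into shifted bit ranges of the wider second accumulator reproduces the full-width sum:

```python
def accumulate_split(inputs, n, m):
    """Two-stage accumulation: a narrow first accumulator whose lowest
    bits are drained into bit ranges of a wider second accumulator."""
    acc1 = 0      # first storage device
    acc2 = 0      # second storage device
    # first cycle: operate on a first bit range of the input
    acc1 += sum(x & 0xF for x in inputs)          # low nibble of each input
    acc2 += acc1 & ((1 << n) - 1)                 # lowest n bits -> first range
    acc1 >>= n                                    # bit-shifted out of acc1
    # next cycle: operate on a second bit range of the input
    acc1 += sum((x >> 4) & 0xF for x in inputs)   # high nibble of each input
    acc2 += (acc1 & ((1 << m) - 1)) << n          # lowest m bits -> second range
    return acc2
```

With n matching the nibble width and m wide enough to hold the carries, the result equals the plain sum of the inputs.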
Various implementations described herein are related to a device having a memory architecture with a register bank and multiple latches. The multiple latches may have first latches that receive multi-bit data as input and provide the multi-bit data as output, and multiple latches may have second latches coupled to the first latches so as to receive the multi-bit data from the first latches and then store the multi-bit data.
A memory unit configured for handling task data, the task data describing a task to be executed as a graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed, represented by one or more data blocks. The memory unit is configured to receive a sequence of processing requests comprising the one or more data blocks, with each data block assigned a priority value and comprising a block command. The memory unit is configured to arbitrate between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests, wherein the processing requests include writing data to, or reading data from, storage.
A processor and method for handling data, by obtaining operations from storage, analyzing each of the operations to determine an associated operation space, and generating at least one operation set, wherein the operations of the operation set have substantially similar operation spaces. Input data is received in the form of a tensor and allocated as the input to a given operation of the operation set, the input data having the predetermined input characteristics associated with the given operation. The given operation is executed using the input to produce an output with the known output characteristics. The input data, and the output associated with the operation of the operation set, are stored in a segment associated with that operation of the operation set.
A mechanism is provided for efficient packing of network flits in a data processing network. Transaction messages for transmission across a communication link of a data processing network are analyzed to determine a group of transaction messages to be passed to a packing logic block for increased packing efficiency. The transaction messages are packed into slots of one or more network flits and transmitted across the communication link. The mechanism reduces the number of unused slots in a transmitted network flit.
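The slot-packing idea can be illustrated with a greedy first-fit policy in Python. The first-fit heuristic, the slot-count representation of messages, and the return values are all assumptions for the sake of a small example, not the disclosed packing logic:

```python
def pack_flits(message_sizes, slots_per_flit):
    """First-fit packing of messages (in slot units) into flits.
    Returns (number of flits used, total unused slots)."""
    flits = []  # each entry = remaining free slots in that flit
    for size in message_sizes:
        for i, free in enumerate(flits):
            if size <= free:
                flits[i] -= size    # message fits an existing flit
                break
        else:
            flits.append(slots_per_flit - size)  # open a new flit
    return len(flits), sum(flits)
```

Grouping messages before packing (as the mechanism describes) gives the packer more candidates per flit and so fewer unused slots than packing messages strictly in arrival order.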
Command processing circuitry maintains a linked list defining entries for one or more command queues and executes synchronization commands at the queue heads of the one or more command queues in list order, based on the completion criteria of the synchronization command at the head of a given command queue.
Circuitry comprises a memory to store data defining a set of one or more command queues each associated with a respective memory address space, each command queue defining successive commands for execution from a queue head to a queue tail. The commands are selected from a set of commands comprising synchronization commands and one or more other commands defining memory management operations for a given memory address space, in which completion of a synchronization command is dependent upon one or more completion criteria indicating that all commands from any of the command queues which are earlier than the synchronization command in an execution order have completed. The circuitry also comprises command processing circuitry to execute the commands. The command processing circuitry is configured to maintain a linked list of entries having a list order, each entry defining a respective one of the command queues, and to execute commands at the head of the command queues defined by prevailing entries of the linked list in the list order. For a current occupancy of the linked list at a given stage of executing commands from command queues defined by entries of the linked list, the command processing circuitry is configured to execute a synchronization command first in the list order and to detect, within that current occupancy of the linked list, any further synchronization commands at the head of command queues defined by entries later in the list order. For any such further synchronization commands, the command processing circuitry applies the completion criteria of the synchronization command at the head of the command queue defined by the entry of the linked list earliest in the list order, so as to treat any such further synchronization commands as having been completed.
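A heavily simplified toy model of the synchronization behaviour, in Python: an ordered list of deques stands in for the linked list of command queues, and the completion criteria are reduced to "no earlier non-sync work remains". The command encoding and the decision logic are assumptions for illustration only:

```python
from collections import deque

SYNC = "SYNC"

def step(queues):
    """Execute head commands in list order; once the earliest head SYNC's
    completion criteria are applied, later head SYNCs reuse that decision
    and are treated as completed."""
    executed = []
    sync_decided = False
    for q in queues:
        if not q:
            continue
        cmd = q[0]
        if cmd == SYNC:
            # earliest head SYNC decides; others must be drained or at SYNC
            others_ready = all(len(o) == 0 or o[0] == SYNC
                               for o in queues if o is not q)
            if others_ready or sync_decided:
                executed.append(q.popleft())
                sync_decided = True
        else:
            executed.append(q.popleft())
    return executed
```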
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, to perform techniques to select between and/or among multiple available alternative approaches to performing a temporal anti-aliasing operation in processing an image.
A data processing apparatus is provided. Instruction send circuitry sends an instruction to an external processor to be executed by the external processor. Allocation circuitry allocates a specified one of several registers for a result of the instruction having been executed on the external processor and data receive circuitry receives the result of the instruction having been executed on the external processor and stores the result in the specified one of the several registers. In response to a condition being met: the specified one of the several registers is dereserved prior to the result being received by the data receive circuitry, and the result is discarded by the data receive circuitry when the result is received by the data receive circuitry.
An apparatus and method for controlling access to a shared resource accessible to a plurality of initiator components is provided. The apparatus has shared resource management circuitry to select, from amongst the plurality of initiator components, a currently granted initiator component that is currently allowed to access the shared resource, and to identify the currently granted initiator component within a storage structure. Gating circuitry is located in a communication path between a given initiator component and the shared resource and, on receipt of a given transaction initiated by the given initiator component, determines with reference to the storage structure whether the given initiator component is the currently granted initiator component. The gating circuitry is arranged, when the currently granted initiator component is the given initiator component, to allow onward propagation of the given transaction to the shared resource, and is arranged, when the currently granted initiator component is other than the given initiator component, to block onward propagation of the given transaction to the shared resource and to trigger a recovery action to seek to facilitate future handling of the given transaction.
Various implementations described herein are related to a device having a storage node with a bitcell. The device may have a first stage that performs a first write based on an internal bitline signal, a first write wordline signal and a second write wordline signal. The first stage outputs the internal bitline signal. The device may have a second stage that receives the internal bitline signal and performs a second write of the internal bitline signal to the bitcell. The device may have a third stage with write wordline ports and write bitline ports. The third stage provides the internal bitline signal based on a selected write wordline signal from a write wordline port of the write wordline ports and based on a selected bitline signal from a write bitline port of the write bitline ports.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
A method and apparatus to classify processor events is provided. The apparatus includes a reference generator, a warping unit, a correlation unit and a detector. The reference generator provides a self-reference for an event vector stream based on a history of the event vector stream and the warping unit dynamically aligns the event vector stream with the self-reference to generate a warped event vector stream. The correlation unit determines a window-by-window correlation of event vectors of the warped event vector stream, and the detector passes a window of event vectors of the warped event vector stream to a behavioral classifier when the window-by-window correlation achieves a threshold value. The behavioral classifier may use machine learning. A sample reservoir may be used to store dynamically selected event vectors of the event vector stream that are used, at least in part, to generate the self-reference.
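A minimal software sketch of the window-by-window gating step in Python. Normalised correlation is an assumed choice of correlation measure, the dynamic warping stage is omitted, and the "self-reference from history" is simplified to the previous window:

```python
def window_correlation(window, reference):
    """Normalised dot product between a window and its self-reference."""
    dot = sum(a * b for a, b in zip(window, reference))
    na = sum(a * a for a in window) ** 0.5
    nb = sum(b * b for b in reference) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def gate_windows(stream, window_size, threshold):
    """Pass a window on to the behavioural classifier only when its
    correlation with the self-reference reaches the threshold."""
    reference = None
    passed = []
    for i in range(0, len(stream) - window_size + 1, window_size):
        window = stream[i:i + window_size]
        if reference is not None:
            if window_correlation(window, reference) >= threshold:
                passed.append(window)   # hand to behavioural classifier
        reference = window              # self-reference built from history
    return passed
```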
A technique is provided for processing lookup requests in a cache storage able to store data items of multiple supported types, in the presence of a pending invalidation request.
Each entry in a cache has type information associated therewith to indicate a type associated with the data item stored in that entry. Lookup circuitry responds to a given lookup request by performing a lookup procedure to determine whether a hit is detected, by default performing the lookup procedure for a given subset of multiple supported types. Invalidation circuitry processes an invalidation request specifying invalidation parameters used to determine an invalidation address range and invalidation type information, in order to invalidate any data items held in the cache storage that are both associated with the invalidation address range and have an associated type that is indicated by the invalidation type information. Whilst processing of the invalidation request is yet to be completed, filtering circuitry performs a filtering operation for a received lookup request in order to determine, in dependence on an address indication provided by the received lookup request and one or more of the invalidation parameters of the invalidation request, intersection indication data. The intersection indication data identifies, for a given type in the given subset, whether an intersection is considered to exist between any entries that would be accessed during performance of the lookup procedure for the received lookup request for the given type and any entries that will be invalidated during processing of the invalidation request. Operation of the lookup circuitry is controlled in dependence on the intersection indication data.
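The intersection test at the heart of the filtering operation can be sketched in a few lines of Python. The flat address/type representation is an assumption; real cache filtering would compare set indices and ways rather than raw addresses:

```python
def intersects(lookup_addr, lookup_type, inv_lo, inv_hi, inv_types):
    """True if the lookup could touch an entry the invalidation removes."""
    return inv_lo <= lookup_addr <= inv_hi and lookup_type in inv_types

def filter_lookup(lookup_addr, lookup_types, inv_lo, inv_hi, inv_types):
    """Per-type intersection indication data for the lookup's type subset."""
    return {t: intersects(lookup_addr, t, inv_lo, inv_hi, inv_types)
            for t in lookup_types}
```

A type whose indication is False can proceed immediately; a type whose indication is True must wait for (or be re-checked against) the pending invalidation.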
A processor to obtain mapping data indicative of at least one mapping parameter for a plurality of mapping blocks of a multi-dimensional tensor to be mapped. The at least one mapping parameter is for mapping corresponding elements of each mapping block to the same co-ordinate in at least one selected dimension of the multi-dimensional tensor, such that each mapping block corresponds to the same set of co-ordinates in the at least one selected dimension. A co-ordinate of an element of a block of the multi-dimensional tensor is determined. The element is comprised by a mapping block. A physical address in a storage corresponding to the co-ordinate is determined, based on the co-ordinate. The physical address is utilized in a process comprising an interaction between the block of the multi-dimensional tensor and the storage.
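The co-ordinate-to-address step can be illustrated with an assumed row-major strided layout in Python; the base address, strides, and the single shared dimension are illustrative assumptions, not the disclosed mapping scheme:

```python
def physical_address(base, strides, coords, selected_dim, mapped_coord):
    """Map an element's co-ordinates to a storage address, forcing the
    selected dimension to the shared (mapped) co-ordinate so that every
    mapping block lands on the same co-ordinate in that dimension."""
    coords = list(coords)
    coords[selected_dim] = mapped_coord  # same co-ordinate for every block
    return base + sum(c * s for c, s in zip(coords, strides))
```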
An apparatus has a data storage structure to store data items tagged by respective tag values and stores, in association with each data item, a respective tag group identifier to identify other data items having a same tag value within a collection of data items. The apparatus also has tag match circuitry to identify one or more hitting data items. Prioritisation circuitry is provided to select one or more candidate data items, each of which is favoured, according to an ordering of the data items, relative to any other data items in the particular collection of data items having the same tag group identifier as that candidate data item. The prioritisation circuitry selects the one or more candidate data items before the identification of the hitting data items is available from the tag match circuitry. Data item selection circuitry selects a candidate data item for which the tag match circuitry detected a match.
Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.
Various implementations described herein are related to a device having multi-port circuit architecture with multiple ports. The multi-port circuit architecture may expand a primary clock into multiple dummy clocks so as to separately track, simulate and report clock power consumption for each port of the multiple ports to a central processing unit.
Aspects of the present disclosure relate to an apparatus comprising instruction receiving circuitry to receive an instruction to be executed, the instruction being an instruction to write given data to a storage; instruction implementation circuitry to determine a sequence of operations corresponding to said instruction; and execution circuitry to perform the determined sequence of operations. The instruction implementation circuitry is configured to, responsive to the given data having a value of zero, determine the sequence of operations to include a first operation for writing one or more zeroes to the storage, the first operation being a dedicated zero-writing operation, and, responsive to the given data having a non-zero value, determine the sequence of operations to include one or more second operations for writing the non-zero value to the storage, the one or more second operations being different from the first operation.
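The zero-value special-casing can be modelled as a small decode decision in Python. The operation names, byte-granular store encoding, and region width are hypothetical stand-ins for whatever micro-operations the implementation actually uses:

```python
def select_operations(value, address, width):
    """Choose the operation sequence for writing `value` to storage."""
    if value == 0:
        # first operation: a dedicated zero-writing op for the whole region
        return [("ZERO_RANGE", address, width)]
    # second operations: conventional byte stores of the non-zero value
    return [("STORE", address + i, (value >> (8 * i)) & 0xFF)
            for i in range(width)]
```

The win is that the zero case becomes a single wide operation regardless of width, instead of a store per byte.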
The present disclosure relates to a data processing apparatus comprising: instruction decode circuitry to decode instructions; processing circuitry to execute said instructions decoded by said instruction decode circuitry, said processing circuitry comprising fused-multiply-accumulate (FMA) circuitry to respond to an FMA instruction decoded by said instruction decode circuitry, said FMA instruction specifying a first floating-point operand (a), a second floating-point operand (b) and a third floating-point operand (c); and an operand storage module operable to store said first, second and third floating-point operands. Responsive to said FMA instruction, said FMA circuitry is configured to perform denormal detection on said first, second and third floating-point operands to determine if one or more of them meets a first denormal condition. Upon determining that at least one of said operands meets said first denormal condition, the FMA circuitry executes a denormal handling instruction to: generate a shifted first floating-point operand (ta) based on said first floating-point operand, a shifted second floating-point operand (tb) based on said second floating-point operand, and a shifted third floating-point operand (tc) based on said third floating-point operand; write said shifted operands to auxiliary storage of said operand storage module, said auxiliary storage being temporary storage configured within said operand storage module and assigned to said denormal handling instruction; and execute said FMA instruction using said shifted first, second and third floating-point operands to generate a shifted FMA output.
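The arithmetic behind the shifted operands can be sketched numerically in Python. The detection threshold and scale factor below are arbitrary assumptions chosen so the mathematics is easy to check; they are not the hardware's denormal condition, and the model uses plain multiply-add rather than a fused datapath:

```python
DENORMAL_LIMIT = 2.0 ** -1000   # assumed "first denormal condition"
SCALE = 2.0 ** 100              # assumed shift applied to ta, tb

def fma_with_denormal_handling(a, b, c):
    """Compute a*b + c, pre-scaling when any operand is tiny.

    ta = a*SCALE, tb = b*SCALE, tc = c*SCALE^2, so ta*tb + tc equals
    (a*b + c)*SCALE^2 and dividing by SCALE^2 recovers the result.
    """
    if any(x != 0 and abs(x) < DENORMAL_LIMIT for x in (a, b, c)):
        ta, tb, tc = a * SCALE, b * SCALE, c * SCALE * SCALE
        return (ta * tb + tc) / (SCALE * SCALE)   # undo the shift
    return a * b + c
```

Scaling lets the multiply-accumulate operate on normalised values, avoiding the slow paths that subnormal intermediates would otherwise trigger.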
A translation table entry load/store operation is performed for at least one target translation table entry address selected depending on software-defined address information identifying a selected address in an input address space. Each target translation table entry address comprises an address of a leaf translation table entry providing address mapping information for translating the selected address from the input address space to an output address space or an address of a branch translation table entry traversed in a translation table walk operation for obtaining that leaf translation table entry. At least one variant of the translation table entry load/store operation supports, for a given target translation table entry of the at least one target translation table entry, clearing access tracking metadata of the given target translation table entry from a first state (indicating that at least one load/store access has occurred to a corresponding region of input address space) to a second state (indicating that no load/store accesses have occurred to the corresponding region).
A host device (10) provides a plurality of virtual machines (54) executing one or more processes (60, 62, 64, 66). A peripheral device (30) performs tasks on behalf of the host and is coupled to it via a communication network (20). The peripheral provides a plurality of virtual peripheral devices (34), each allocated to one of the virtual machines. Address translation circuitry (75) in the host performs two-stage address translation. When accessing a memory (40) via the host, the peripheral requests a transfer with a specified address and associated metadata providing a source identifier field, a first address translation control field and a second address translation control field. The first address translation control field controls any first stage address translation and depends on the process. The second address translation control field controls any second stage address translation required and depends on the virtual machine associated with the specified address.
There is provided an apparatus, method and computer program for constraining memory accesses. The apparatus comprises processing circuitry to perform operations during which access requests to memory are generated. The processing circuitry is arranged to generate memory addresses for the access requests using capabilities that identify constraining information. The apparatus further comprises capability checking circuitry to perform a capability check operation to determine whether a given access request whose memory address is generated using a given capability is permitted based on given constraining information identified by the given capability. The capability check operation includes performing a range check based on range constraining information provided by the given constraining information and, when a determined condition is met, performing the range check in dependence on both the range constraining information and an item of state information of the apparatus which varies dynamically during performance of the operations of the processing circuitry.
Execution circuitry (14) executes processing operations in response to triggered instructions. Candidate instruction storage circuitry (11) stores triggered instructions, each specifying condition information (40) indicating at least one condition. Issue circuitry (12) issues, in response to determining/predicting that the condition indicated by a given triggered instruction is met, the given triggered instruction for execution. The execution circuitry is responsive to state update information (48) specified by the given triggered instruction to cause machine state (82) to be updated. When the given triggered instruction comprises a triggered-producer instruction, the execution circuitry is responsive to completion of execution of the triggered-producer instruction to cause dependency state (84) to be updated, indicating that a corresponding triggered-consumer instruction can be issued. The issue circuitry evaluates, when the given instruction comprises a triggered-consumer instruction, whether the condition is determined/predicted to be met in dependence on both the machine state and the dependency state.
A processor comprising a first storage managed as a circular buffer to store a plurality of data structures. Each data structure comprises: an identifier, a size indicator and first data associated with instructions for execution of a task. The processor is configured for searching for a data structure in the first storage. A data structure subsequent to a tail data structure can be located using the storage address in the first storage of the tail data structure and the size indicators of all data structures preceding the subsequent data structure among the plurality of data structures. When a data structure is found, the task may be executed based at least in part on the first data of the found data structure.
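The size-indicator walk can be sketched in Python over a flat list that stands in for the first storage. The record layout (identifier, size, then payload words) and the non-wrapping walk are simplifying assumptions; a real circular buffer would also wrap the address modulo the buffer length:

```python
def find_structure(storage, tail_addr, wanted_id):
    """Walk size-prefixed records from the tail, returning the payload of
    the record whose identifier matches, or None if absent."""
    addr = tail_addr
    while addr < len(storage):
        ident, size = storage[addr], storage[addr + 1]
        if ident == wanted_id:
            return storage[addr + 2:addr + 2 + size]
        addr += 2 + size   # size indicator locates the next structure
    return None
```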
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
A data processing apparatus comprises: a physical register array comprising a plurality of sectors having one or more different access properties, each sector of the plurality of sectors comprising at least one physical register; prediction circuitry to predict, for a given instruction, a sector identifier identifying one of the sectors of the physical register array to be used for a destination register of the given instruction, wherein the prediction circuitry is configured to select the sector identifier in dependence on prediction information learnt from performance monitoring information indicative of performance achieved for a sequence of instructions when using different sector identifiers for the given instruction; register rename circuitry to map a destination architectural register identifier specified by the given instruction to a destination physical register in the sector identified by the sector identifier predicted by the prediction circuitry; and execution circuitry to execute the given instruction and generate a result to be written to the destination physical register mapped to the destination architectural register identifier by the register rename circuitry.
A tile-based graphics processor that comprises a plurality of tiling units is disclosed. The graphics processor includes an assigning circuit that assigns tiling units to sort geometry for initial regions of a render output that encompass plural primitive listing regions, and causes assigned tiling units to sort geometry for an initial region into the primitive listing regions that the initial region encompasses.
A tile-based graphics processor that comprises a plurality of tiling units is disclosed. The graphics processor includes an assigning circuit that assigns tiling units to process draw calls or draw call parts, and causes assigned tiling units to process the draw calls or draw call parts.
An apparatus and method are provided, the apparatus comprising: interconnect circuitry to couple a device to one or more processing elements, each processing element operating in a trusted execution environment; and secure stashing decision circuitry to receive stashing transactions from the device and to redirect permitted stashing transactions to a given storage structure accessible to at least one of the one or more processing elements. The secure stashing decision circuitry is configured, in response to receiving a given stashing transaction, to determine whether the given stashing transaction comprises a trusted execution environment identifier associated with a given trusted execution environment, and to treat the given stashing transaction as a permitted stashing transaction when redirection requirements, dependent on the trusted execution environment identifier, are met.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
A graphics processing system in which a render output is sub-divided into a plurality of tiles for rendering. The graphics processing system includes a memory system, a tiling circuit and a primitive list preparation circuit. The tiling circuit determines which primitives are to be rendered for regions into which the render output is sub-divided. The regions form a plurality of rows and columns of regions. The primitive list preparation circuit prepares and stores primitive lists for regions of the render output identifying the primitives that are to be rendered for the regions. The primitive list preparation circuit also stores groups of pointers, each group pointing to respective primitive lists. The regions of the render output corresponding to the primitive lists that are pointed to by the pointers of a group of pointers comprise adjacent regions spanning a plurality of rows and a plurality of columns of regions.
A technique is provided for constraining access to memory using capabilities. An apparatus is provided that has processing circuitry for performing operations during which access requests to memory are generated, wherein the processing circuitry is arranged to generate memory addresses for the access requests using capabilities that provide a pointer value and associated constraining information. The apparatus also provides capability generation circuitry that is responsive to the processing circuitry executing a capability generating instruction that identifies a location in a literal pool of the memory, to retrieve a literal value from the location in the literal pool, and to produce a generated capability in which the pointer value of the generated capability is determined from the literal value. The constraining information of the generated capability is selected from a limited set of options in dependence on information specified by the capability generating instruction. It has been found that such an approach provides a robust mechanism for generating capabilities, whilst reducing code size.