Exception control circuitry (40) controls taking of exceptions by processing circuitry (4), depending on control information stored in at least one register (14), the control information including masking control information settable to a masked state or unmasked state; and trap- masked-exception control information settable to an untrapped state or trapped state. In response to a given exception of a maskable class of exceptions, in at least one scenario when the masking control information is in the masked state and a current exception level is less privileged than a predetermined trap target exception level, the exception control circuitry controls whether to trap the given exception to the predetermined trap target exception level depending on whether the trap-masked-exception control information is in the trapped state. When the masking control information is in the unmasked state, a target exception level for handling the given exception is selected independent of the trap-masked-exception control information.
A multiple-outer-product instruction specifies multiple first source vector operands, at least one second source vector operand and correlation information associated with the second source vector operand(s), each vector operand comprising multiple data elements and the correlation information indicating, for each data element of a given second source vector operand, a corresponding first source vector operand. In response to the multiple-outer-product instruction, instruction decoder circuitry (50) controls processing circuitry (60) to perform computations to implement outer product operations, the outer product operations comprising, for a given first source vector operand, performing an associated outer product of that first source vector operand with a subset of data elements of the second source vector operand(s). The processing circuitry selects, for each data element of the second source vector operand(s), a corresponding first source vector operand to be used when performing the associated outer product operation, in dependence on the correlation information.
A method of operating a network-reachable initiator computing entity, comprising: determining a requirement, at the initiator computing entity, for offloading data processing; initiating a message, to be sent on a network to a destination, requesting a response from network-reachable recipient computing entities along the network, from the network-reachable initiator computing entity to the destination, indicating a capacity to perform offloaded data processing; and receiving a response from a first network-reachable recipient computing entity indicating a capacity to perform offloaded data processing.
H04L 67/289 - Intermediate processing functionally located close to the data consumer application, e.g. in same machine, in same home or in same sub-network
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
H04L 67/51 - Discovery or management thereof, e.g. service location protocol [SLP] or web services
H04L 67/59 - Providing operational support to end devices by off-loading in the network or by emulation, e.g. when they are unavailable
H04L 43/10 - Active monitoring, e.g. heartbeat, ping or trace-route
4.
AN APPARATUS, A METHOD OF OPERATING AN APPARATUS, AND A NON-TRANSITORY COMPUTER READABLE MEDIUM TO STORE COMPUTER-READABLE CODE FOR FABRICATION OF AN APPARATUS
There is provided an apparatus provided with counter control circuitry to maintain counters associated with data items including: minor counters, middle counters, and a major counter. The apparatus is also provided with a memory protection unit configured, in response to a transfer of a data item from secure storage to off-chip storage, to modify a minor counter associated with the data item, and to encrypt the data item based on counters associated with the data item. The memory protection unit is also responsive to an overflowing minor counter, to perform a middle re-encryption process comprising modifying a middle counter associated with the data item and re-encrypting data items associated with the middle counter. The memory protection unit is also responsive to an overflowing middle counter, to perform a major re- encryption process comprising modifying the major counter, and re-encrypting each of the data items.
G06F 12/14 - Protection against unauthorised use of memory
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/79 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in semiconductor storage media, e.g. directly-addressable memories
Aspects of the present disclosure relate to apparatus comprising prediction circuitry comprising a plurality of prediction units, said plurality comprising a plurality of types of prediction unit. Each prediction unit is configured to perform a corresponding type of prediction in respect of operations that are to be executed by the apparatus. Shared prediction resource circuitry comprises shared prediction resources configurable to perform said types of prediction. Resource allocation circuitry is configured to determine an allocation of said shared prediction resources to one or more of said plurality of prediction units, and allocate the shared prediction resources according to the determination.
For a predetermined class of load/store operations, load/store processing circuitry buffers store data of predetermined-class store operations in a predetermined-class store buffer, and controls store-to-load forwarding of store data from that buffer to predetermined- class load operations. A predetermined-class-load/store synchronization instruction controls the load/store processing circuitry to enforce that, for a hazarding younger non -predetermined-class load/store operation occurring after the predetermined-class-load/store synchronization instruction in program order and a hazarding older predetermined-class store operation occurring before the predetermined-class-load/store synchronization instruction in program order, for which address ranges overlap, the hazarding younger non-predetermined-class load/store operation observes a result of the hazarding older predetermined-class store operation. In absence of any intervening predetermined-class-load/store synchronization instruction between a given older predetermined-class store operation and a given younger non-predetermined-class load/store operation with overlapping address range, the given younger non-predetermined-class load/store operation is permitted to fail to observe a result of the given older predetermined-class store operation.
A behavioral sensor for creating consumable events can include: a feature extractor coupled to receive an event stream of events performed by a circuit, wherein the feature extractor identifies features of a particular event of the event stream and associates the particular event with a time; and a classifier coupled to receive the features of the particular event from the feature extractor, wherein the classifier classifies the particular event into a classified event associated with the time using predefined categories based on the received features of the particular event; whereby the classified event and subsequent classified events extracted from the event stream within a time frame are appended in a time series forming the consumable events.
A system and computer-implemented method to train and use a neural network is disclosed. For each group of elements of a feature map in a layer in the neural network, a record is accessed to determine if at least one element of the group is active. When at least one element of the group is active, a gradient is determined for each active element of the group, copied to a 5 group element position indicated by the entry for the group in record, and the group is sent to a dot product unit to update weights in the layer based on the group. When no element of the group is active, the dot product unit is signaled to prevent update of weights based on the group. The record is set during the forward path of the feature map through the network.
There is provided an apparatus, a method of operating the apparatus and a computer program for controlling a host data processing apparatus to provide an instruction execution environment equivalent to the apparatus. The apparatus comprises processing circuitry configured to execute a sequence of program instructions to process data items. The processing circuitry is configured to generate a signature indicative of executed instructions in the sequence of program instructions and indicative of the data items. The apparatus is also provided with validation circuitry configured to implement a validation procedure. The validation procedure comprises the steps of evaluating the signature against a predefined policy to verify that the processing circuitry has processed the data items using the sequence of program instructions and, in response to a match between the signature and the predefined policy, generating confirmation information to indicate the match.
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
G06F 11/28 - Error detection; Error correction; Monitoring by checking the correct order of processing
An apparatus comprises an instruction decoder to decode instructions; processing circuitry to perform data processing in response to decoding of the instructions by the instruction decoder; and at least one control register to specify instruction-function-selecting information. In response to a no-operation-compatible instruction, the instruction decoder is configured to control the processing circuitry to: treat the no-operation-compatible instruction as a no-operation instruction, when the instruction-function-selecting information specified by the at least one control register is in a first state; perform both a first operation and a second operation, when the instruction-function-selecting information specified by the at least one control register is in a second state; and perform the first operation but not the second operation, when the instruction-function-selecting information specified by the at least one control register is in a third state.
A spiking neural network is described that comprises a plurality of neurons in a first layer connected to at least one neuron in a second layer, each neuron in the first layer being connected to the at least one neuron in the second layer via a respective variable delay path. The at least one neuron in the second layer comprises one or more logic components configured to generate an output signal in dependence upon signals received along the variable delay paths from the plurality of neurons in the first layer. A timing component is configured to determine a timing value in response to receiving the output signal from the one or more logic components, and an accumulate component is configured to accumulate a value based timing values from the timing component. A neuron fires in a case that a value accumulated at the accumulate component reaches a threshold value.
Methods and systems for detecting errors when performing a convolutional operation is provided. Predicted checksum data, corresponding to input checksum data and kernel checksum data, is obtained. The convolutional operation is performed to obtain an output feature map. Output checksum data is generated and the predicted checksum data and the output checksum data are compared, the comparing taking account of partial predicted checksum data configured to correct for a lack of padding when performing the convolution operation, wherein the partial predicted checksum data corresponds to input checksum data for a subset of the values in the input feature map and kernel checksum data for a subset of the values in the kernel.
Apparatuses and methods for monitoring sensor data are provided. Sensor data is captured from an environment and a locality-sensitive hash is generated from the sensor data. Hashes which have been generated are stored in association with a significance value, which periodically decrease. A similarity metric is generated for the locality-sensitive hash with respect to the hashes stored in the hash storage and when it exceeds a similarity threshold the significance value stored in association with the similar hash is increased. When the significance value stored in association with the similar hash exceeds an alert threshold an alert signal is generated.
G06F 21/74 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information operating in dual or compartmented mode, i.e. at least one secure mode
G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
Aspects of the present disclosure relate to interface circuitry to receive a pointer comprising a plurality of address bits, and pointer processing circuitry. The pointer processing circuitry is configured to extract and encrypt plurality of address bits from the pointer, to produce a plurality of encrypted address bits. The pointer processing circuitry determines, based at least in part on the plurality of address bits, a pointer authentication value. It then combines the pointer authentication value with the plurality of encrypted address bits, to produce a signed encrypted pointer.
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
15.
METHODS AND APPARATUS FOR BRANCH INSTRUCTION SECURITY
Aspects of the present disclosure relate to an apparatus. Instruction receiving circuitry receives, as part of a program flow, a branch instruction, said branch instruction identifying a function. Instruction authentication circuitry determines, based at least in part on the function, an instruction authentication value. The instruction authentication circuitry then combines the instruction authentication value with the branch instruction to produce an authenticatable branch instruction. Branch circuitry authenticates the authenticatable branch instruction based on a function authentication value. Responsive to a successful authentication of the authenticatable branch instruction, the branch circuitry executes a jump in the program flow to said function.
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/34 - Addressing or accessing the instruction operand or the result
G06F 12/14 - Protection against unauthorised use of memory
An apparatus has processing circuitry to execute instructions. The processing circuitry has calculation circuitry which is responsive to one or more instructions requiring a calculation to be performed to compute the result of the calculation and approximation circuitry which is responsive to said one or more instructions to calculate an approximate result of the calculation independently of the calculation circuitry. The processing circuitry also has integrity checking circuitry to perform an integrity check by comparing the result of the calculation performed by the calculation circuitry and the approximate result of the calculation performed by the approximation circuity. The integrity checking circuitry detects an error in the processing circuitry if it is determined that a difference between the result of the calculation and the approximate result of the calculation is greater than a deviation threshold.
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 21/55 - Detecting local intrusion or implementing counter-measures
G06F 21/74 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information operating in dual or compartmented mode, i.e. at least one secure mode
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
A method of managing network-attachable computing entities comprising: training a machine-learning model to detect a bottleneck process segment in a process flow performed by a network-attachable computing entity; deploying a trained model to monitor a network-attachable computing entity in operation; responsive to detecting an instance of the bottleneck process segment, analyzing to determine a cause of the bottleneck; responsive to determining the cause of the bottleneck, generating an augmented functional unit to address the cause of the bottleneck; and deploying the augmented functional unit to at least one of the network-attached computing entities that has an instance of a process comprising the bottleneck process segment.
There is provide an apparatus, method and medium. The apparatus comprises decoder circuitry to generate control signals in response to a vector extract and merge instruction specifying a control parameter, a first vector register, a second vector register, and a destination vector register. The apparatus comprises processing circuitry responsive to the control signals, to perform plural beats of processing, each beat comprising processing corresponding to a portion of at least the first vector register and the destination vector register. The processing, for a Kthbeat comprises: extracting bits, specified by the control parameter, from a Kthportion of the first vector register, concatenating the bits with further bits, and storing the result in the Kthportion of the destination register. The further bits are, for a first portion, extracted from a first portion of the second vector register and, otherwise, from a (K-1)th portion of the first vector register.
Aspects of the present disclosure relate to an apparatus comprising TEE circuitry configured to maintain a list of trusted devices, and interface circuitry to provide communication between the TEE of the apparatus and TEE circuitry of a device communicatively coupled to the apparatus. The TEE circuitry of the apparatus is configured to perform, with the TEE circuitry of the device, a remote attestation in respect of the TEE circuitry of the device. Responsive to a positive outcome of the remote attestation, the device is added to the list of trusted devices. The TEE of the apparatus receives, from the TEE circuitry of the device, an indication of one or more further devices which are trusted by the device, and adds said one or more further devices to the list of trusted devices.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
H04W 12/55 - Secure pairing of devices involving three or more devices, e.g. group pairing
A device and method, comprising: determining a quantum of data storage and processing capability/capacity to be offered for distributed processing; retrieving a current trustworthiness rating; constructing a features vector comprising quantum and rating; and entering a pool of candidates for selection by initiator by externalising the features vector. An initiator and method, comprising: determining a minimum quantum of storage and processing capability/capacity and minimum trustworthiness rating required for the processing task; constructing a requirement vector comprising quantum and rating; querying a network for a pool of candidates to perform the task; retrieving a features vector from a candidate; comparing features vector and requirement vector to determine which candidates meet the minimum quantum and rating required for the task; responsive to finding that a candidate meets the minima, selecting the candidate and dispatching task for processing at the candidate and rating the candidate on completion/non-completion/non-completion to standard required.
An apparatus has processing circuitry to perform vector operations, an instruction decoder to decode instructions to control the processing circuitry to perform associated vector operations, and array storage comprising storage elements to store data elements, the array storage storing at least one two dimensional array of data elements. The set of instructions includes a multiple outer product instruction identifying a first source vector operand, a second source vector operand, and a given two dimensional array of data elements within the array storage forming a destination operand. At least the first source vector operand identifies at least one vector of data elements to be treated as comprising a plurality of sub-vectors and at least the second source vector operand identifies a plurality of vectors of data elements. In response to the multiple outer product instruction, the instruction decoder controls the processing circuitry to perform an outer product operation for each sub-vector identified by the first source vector operand. Each outer product operation comprises multiplying each data element of an associated sub-vector identified by the first source vector operand by each data element of a group of data elements selected from the second source vector operand in order to generate a plurality of outer product results, and using each outer product result to update a value held in an associated storage element within the given two dimensional array of storage elements. Selection circuitry controls selection of the data elements processed by each outer product operation so as to switch between vectors of the second source vector operand when switching between different sub-vectors within a given vector of the first source vector operand.
Partial-address-translation-invalidation request to cause cache control circuitry to: identify whether a given cache entry of the address translation cache is a target cache entry to be invalidated, wherein the target cache entry comprises a cache entry for which the address translation data comprises partial address translation data indicative of an address of the next level page table specified by a table address of a target page table entry when used as the branch page table entry; and trigger an invalidation of the given cache entry when the given cache entry is identified to be the target cache entry. The given cache entry is permitted to be retained when the given cache entry provides full address translation data indicative of an address of a corresponding region of address space corresponding to an output address specified by the target page table entry when used as the leaf page table entry.
Circuitry comprises instruction decoder circuitry to decode instructions for execution; processing circuitry to execute instructions decoded by the instruction decoder circuitry; interface circuitry defining an interface for data communication with data compression circuitry; in which the processing circuitry is responsive to one or more instructions of an instruction set defined for the processing circuitry to provide to the interface: input data to be processed by the data compression circuitry; and identification data identifying a compression system for use by the data compression circuitry to process the input data; and in which the processing circuitry is configured to receive from the interface: status data indicating whether data compression circuitry connected to the interface can process data using the compression system identified by the identification data; and, when the status data indicates that the data compression circuitry can process data using the compression system identified by the identification data, output data which has been processed from the input data by the data compression circuitry using the compression system identified by the identification data.
Doorbell physical interrupt control circuitry (20) comprises interrupt detection circuitry (22) to detect an incoming interrupt to be raised as a given virtual interrupt (having a given priority) for a given virtual interrupt handling context, and doorbell physical interrupt generation circuitry (24) responsive to detection of the incoming interrupt by the interrupt detection circuitry, to determine whether the given priority of the given virtual interrupt is indicated, by doorbell- enabled-priority configuration data (28), as enabled for doorbell physical interrupt generation, and if so, to generate a doorbell physical interrupt to be processed in a given physical interrupt handling context. The doorbell physical interrupt indicates to a physical processor handling interrupts for the given physical interrupt handling context that the given virtual interrupt is pending for the given virtual interrupt handling context.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
25.
RUN-TIME MODIFICATION OF A FIELD PROGRAMMABLE GATE ARRAY OR A COARSE GRAINED RECONFIGURABLE ARRAY TO DUPLICATE THE MOST VULNERABLE FUNCTIONAL CIRCUITS BEHAVIOUR
A data processing apparatus is provided. Determination circuitry performs a determination of a vulnerability of each of a plurality of functional circuits in a processing circuit and modification circuitry modifies a behaviour of a reprogrammable circuit to match an architectural behaviour of a vulnerable functional circuit in the functional circuits in response to the determination.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 21/76 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in application-specific integrated circuits [ASIC] or field-programmable devices, e.g. field-programmable gate arrays [FPGA] or programmable logic devices [PLD]
An apparatus comprises at least one processor to execute software processes, a memory system to store data for access by the at least one processor, and checkpointing circuitry to trigger saving, to the memory system, of checkpoints of context state associated with at least one software process executed by the at least one processor. The saving of checkpoints is a background process performed by the checkpointing circuitry in the background of execution of the software processes by the at least one processor.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
A data processing method and processor instructions are provided that leverage scatter operations to efficiently merge vector and matrix indices, as compared to standard matrix and vector operations, as well as merge other arithmetic results, lists of numbers, etc.
An interrupt controller controls signalling of a given interrupt having a given interrupt identifier to a target interrupt handling context, by controlling one or more memory write requests to be issued in accordance with a coherency protocol supported by a cache coherent interconnect, to maintain a set of memory-based interrupt tracking structures corresponding to the target interrupt handling context, including: a selected interrupt queue structure selected from among a plurality of interrupt queue structures based on the given interrupt identifier, to queue the given interrupt for processing by the target interrupt handling context; and a queue status summary structure to indicate which of the plurality of interrupt queue structures hold pending interrupts awaiting processing by the target interrupt handling context.
An apparatus is provided in which processing circuitry performs processing in one of a fixed number of at least two domains. One of the domains is subdivided into a variable number of execution environments one of which is a management execution environment configured to manage the execution environments. Memory protection circuitry defines a point of encryption after at least one unencrypted storage circuit of a memory hierarchy and before at least one encrypted storage circuit of the memory hierarchy. The at least one encrypted storage circuitry uses a key input to perform encryption or decryption on the data of a memory access request issued from within a current one of the domains. The key input is different for each of the domains and for each of the execution environments and the management execution environment is configured to inhibit issuing a maintenance operation to the at least one encrypted storage circuit of the memory hierarchy.
There is provided an apparatus that includes processing circuitry for performing processing in one of a fixed number of at least two domains. One of those domains is subdivided into a variable number of execution environments and memory protection circuitry uses a key input to perform encryption or decryption on the data of a memory access request issued to a memory address from within a current one of the domains. The key input is different for each of the domains and for each of the execution environments, the key input for each of the domains is fixed at boot time of the apparatus, and the key input for each of the execution environments is dynamic.
There is provided an apparatus in which processing circuitry performs processing in one of a fixed number of at least two domains, one of the domains being subdivided into a variable number of execution environments. Memory translation circuitry, in response to a memory access request to a given memory address, determines a given encryption environment identifier associated with the one of the execution environments and forwards the memory access request together with the given encryption environment identifier. Storage circuitry stores a plurality of entries, each associated with an associated encryption environment identifier and an associated memory address. The storage circuitry includes determination circuitry that determines, in at least one enabled mode of operation, whether the given encryption environment identifier differs from the associated encryption environment identifier associated with one of the entries associated with the given memory address.
An apparatus comprises exception return state register storage, and processing circuitry. In response to a guarded control stack (GCS) exception return state push instruction, the processing circuitry obtains exception return state information from the exception return state register storage and push the state information to a GCS data structure. In response to a GCS exception return state pop instruction, the processing circuitry obtains GCS-protected exception return state information from the GCS data structure. In at least one operating state, the processing circuitry detects, in response to an attempt to modify the exception return state information stored in the exception return state register storage, whether an exception return state lock parameter is in a locked state or an unlocked state, and signals a fault when it is in the locked state.
A target virtual address is translated to a target physical address for a memory access request. At least for write requests, the memory access request is rejected when a target stage-1 translation table entry specifies that a target memory region corresponding to the target virtual address is a guarded control stack (GCS) region for storing a GCS data structure for protecting return state information, and the memory access request is not a GCS memory access request triggered by one of a restricted subset of GCS-accessing instruction types. When an anti-aliasing property is specified for the target memory region and the target stage-1 translation table entry or another stage-1 translation table entry used to locate the target stage-1 translation table entry is an unhardened entry unprotected by a translation hardening mechanism, the memory access request is rejected. In at least one operating state, a GCS memory access request is rejected when the anti-aliasing property is not specified for the target memory region.
An apparatus is described having processing circuitry for performing operations during which access requests to memory are generated. The processing circuitry generates memory addresses for the access requests using capabilities, where each capability indicates a pointer value and constraining information used to constrain access to memory using memory addresses derived from the pointer value. A marker indication field is stored in association with each capability to provide a marker value used to distinguish between static capabilities used to access statically allocated memory and dynamic capabilities used to access dynamically allocated memory. Capability tracking circuitry maintains a tracking structure providing a tracking field for each of a plurality of memory regions, and the capability tracking circuitry sets the tracking field for a given memory region amongst the plurality of memory regions when at least one capability whose associated marker indication field has a specified marker value is written to the given memory region. The specified marker value indicates that writing of the associated capability to memory is to be tracked by the capability tracking circuitry to facilitate subsequent revocation of that associated capability.
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to a move and zero instruction that identifies one or more vectors of data elements of a given two dimensional array of data elements within the array storage, to control the processing circuitry to move the data elements of the one or more identified vectors from the array storage to a destination storage and to set to a logic zero value the storage elements of the array storage that were used to store the data elements of the one or more identified vectors.
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to decoding a zero vectors instruction that identifies multiple vectors of data elements of a given two dimensional array of data elements within the array storage, to also decode a subsequent accumulate instruction arranged to operate on the identified multiple vectors of data elements, and to control the processing circuitry to perform a non-accumulating variant of an accumulate operation specified by the accumulate instruction to produce result data elements for storing in the identified multiple vectors within the array storage.
Processing circuitry is provided to perform operations, along with instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions. A set of registers is used to hold data values for access by the processing circuitry. The instruction decoder circuitry is responsive to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing circuitry to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information. Further, an ordering indication is derived from the ordering constrained access instruction and used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations, to thereby ensure that observability conditions required when implementing the ordering constrained access instruction are met.
A data processing apparatus is provided that includes storage circuitry to store a plurality of interconnected instructions. Analysis circuitry analyses the instructions to determine a degree of uniqueness of profile measurements of a control flow path fragments within the instructions.
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 11/36 - Preventing errors by testing or debugging of software
G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
39.
DEVICE PERMISSIONS TABLE DEFINING PERMISSIONS INFORMATION FOR A TRANSLATED ACCESS REQUEST
Apparatus, method and code for fabrication of an apparatus. The apparatus comprises address translation circuitry (116) to translate virtual addresses to physical addresses in response to advance address translation requests issued by devices (105) on behalf of software contexts (125). The apparatus also comprises translated access control circuitry (117) to control access to memory (110) in response to translated access requests issued by the devices (105) on behalf of the software contexts (125), based on permissions information defined in a device permission table (220), wherein the corresponding access permissions provide information for checking whether translated access requests from a plurality of software contexts are prohibited.
For a set of data points which are desired to be processed according to neural network processing, each data point corresponding to a position in space, data point information indicative of one or more properties of the data points is received (500), and connectivity information indicative of connections between the data points is determined (503). An order for the data points is then determined (504) based on the positions in space of the data points, and updated connectivity information (505) is generated based on the initial connectivity information and the determined order for the set of data points. The updated connectivity information and data point information are provided for further processing (507) to be performed by a processor operable to execute neural network processing.
An apparatus is provided having a memory device and associated access control circuitry, and an additional memory device and associated additional access control circuitry. Redundant data generation circuitry generates, for a given block of data having an associated given memory address, an associated block of redundant data for use in an error detection process. The access control circuitry is arranged to store, at a location in the memory device determined from the given memory address, at least a portion of the given block of data and a first copy of the associated block of redundant data, and the additional access control circuitry is arranged to store, at a location in the additional memory device determined from the given memory address, any remaining portion of the given block of data not stored in the memory device and a second copy of the associated block of redundant data. Error detection circuitry performs the error detection process on the stored given block of data using one copy of the associated block of redundant data, and generates an output signal indicating a result of the error detection process. Comparison circuitry compares the first and second copies of the associated block of redundant data, and generates a comparison result signal to supplement the output signal from the error detection circuitry.
Processing circuitry (4) performs data processing in response to instructions. Memory management circuitry (28) controls access to memory based on page table information capable of associating a given page of memory address space with a read-as-X property indicative that reads to an address in the given page of memory address space should be treated as returning a specified value X. In response to determining, for a read request issued to read a read target value for a read target block of memory address space, that at least part of the read target block corresponds to a page associated with the read-as-X property, the memory management circuitry (28) controls the specified value X to be returned to the processing circuitry (4) as at least part of the read target value. This enables large regions of memory address space to be treated as storing a specified value without needing to commit physical memory for those regions.
An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths. The instruction decoding circuitry and the processing circuitry support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements. In response to the sub-vector-supporting instruction, the instruction decoding circuitry controls the processing circuitry to perform an operation for the given vector at sub-vector granularity. Each sub-vector has an equal sub-vector length.
Peripheral components, data processing systems and methods of operating such peripheral components and data processing systems are disclosed. The systems comprise an interconnect comprising a system cache, a peripheral component coupled to the interconnect, and a memory coupled to the interconnect. The peripheral component has a memory access request queue for queuing memory access requests in a receipt order. Memory access requests are issued to the interconnect in the receipt order. A memory read request is not issued to the interconnect until a completion response for all older memory write requests has been received from the interconnect. The peripheral component is responsive to receipt of a memory read request to issue a memory read prefetch request comprising a physical address to the interconnect and the interconnect is responsive to the memory read prefetch request to cause data associated with the physical address in the memory to be cached in the system cache.
G06F 12/0817 - Cache consistency protocols using directory methods
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/1072 - Decentralised address translation, e.g. in distributed shared memory systems
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
An apparatus is described having processing circuitry to perform vector processing operations, a set of vector registers, and an instruction decoder to decode vector instructions to control the processing circuitry to perform the required operations. The instruction decoder is responsive to a given vector memory access instruction specifying a plurality of memory access operations, where each memory access operation is to be performed to access an associated data element, to determine, from a data vector indication field of the given vector memory access instruction, at least one vector register in the set of vector registers associated with a plurality of data elements, and to determine, from at least one capability vector indication field of the given vector memory access instruction, a plurality of vector registers in the set of vector registers containing a plurality of capabilities. Each capability is associated with one of the data elements in the plurality of data elements and provides an address indication and constraining information constraining use of that address indication when accessing memory. The number of vector registers determined from the at least one capability vector indication field is greater than the number of vector registers determined from the data vector indication field. The instruction decoder controls the processing circuitry: to determine, for each given data element in the plurality of data elements, a memory address based on the address indication provided by the associated capability, and to determine whether the memory access operation to be used to access the given data element is allowed in respect of that determined memory address having regard to the constraining information of the associated capability; and to enable performance of the memory access operation for each data element for which the memory access operation is allowed.
Data processing apparatus comprises vector processing circuitry to access an array register having at least n x n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry. The instruction decoder circuitry is responsive to an array access instruction, to control the instruction processing circuitry to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register. The array location accessed for a given vector element of the vector is defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
In response to determining circuitry determining that a portion of data to be sent to a recipient over an interconnect has a predetermined value, data sending circuitry performs data elision to: omit sending at least one data FLIT corresponding to the portion of data having the predetermined value; and send a data-elision-specifying FLIT specifying data-elision information indicating to the recipient that sending of the at least one data FLIT has been omitted and that the recipient can proceed assuming the portion of data has the predetermined value. The data- elision-specifying FLIT is a FLIT other than a write request FLIT for initiating a memory write transaction sequence. This helps to conserve data FLIT bandwidth for other data not having the predetermined value.
Memory management circuitry (28) supports two-stage address translation based on a stage-1 and stage-2 translation table structures. Stage-2 access permission information specified by a stage-2 translation table entry has an encoding specifying whether a corresponding memory region has a partially-read-only permission indicating that write requests to the memory region corresponding to the target intermediate address, issued when processing circuitry (4) is in a predetermined execution state, are permitted for a restricted subset of write request types (including metadata-updating write requests for updating access tracking metadata in translation table entries) but prohibited for other write request types. The memory management circuitry (28) rejects a memory access request when the stage-2 access permission information of a corresponding stage-2 translation table entry specifies the partially-read-only permission and the memory access request is a write request, other than the restricted subset of write request types, issued in the predetermined execution state.
There is provided a processing apparatus comprising decoder circuitry. The decoder circuitry is configured to generate control signals in response to an instruction. The processing apparatus further comprises processing circuitry which comprising a plurality of processing lanes. The processing circuitry is configured, in response to the control signals, to perform a vector processing operation in each processing lane of the plurality of processing lanes for which a per-lane mask indicates that processing for that processing lane is enabled. The processing apparatus further comprises control circuitry to monitor each processing lane of the plurality of processing lanes for each instruction of a plurality of instructions performed in the plurality of processing lanes and to modify the per-lane mask for a processing lane of the plurality of processing lanes in response to a processing state of the processing lane meeting one or more predetermined conditions.
Disclosed is a data processing system comprising a data processor and a cache that is operable to transfer data from memory to the data processor. The data processor is operable to use data of a type that when transferred to the cache can comprise multiple component values. The data processor is however operable to store within a cache line of the cache a subset of less than all of the component values for a multicomponent data element. The cache is configured to further store in association with each cache line an indication of which data element component values are stored in the cache line so that cache lookups can be performed using the indications of which data element component values are stored in which cache lines.
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
51.
APPARATUS AND METHOD FOR MANAGING PREFETCH TRANSACTIONS
An apparatus and method are provided for managing prefetch transactions. The apparatus has an interconnect for providing communication paths between elements coupled to the interconnect. The elements coupled to the interconnect comprise at least a requester element to initiate transactions, and a plurality of completer elements each of which is arranged to respond to a transaction received by that completer element. Congestion tracking circuitry maintains, in association with the requester element, a congestion indication for each of a plurality of routes through the interconnect used to propagate transactions initiated by that requester element. Each route comprises one or more communication paths, and the route employed to propagate a given transaction is dependent on a target completer element for that transaction. Prefetch throttling circuitry then identifies, in response to an indication of a given prefetch transaction that the requester element wishes to initiate, the target completer element amongst the plurality of completer elements to which that given prefetch transaction would be issued. It then determines whether to issue the given prefetch transaction in dependence on the congestion indication for the route that has been determined.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
52.
METHODS AND APPARATUS FOR TRAINING A CLASSIFICATION DEVICE
A method for training a classification device, the method comprising: receiving classification device constraints at an intermediary device; receiving training data at the intermediary device; matching the training data to the classification device 5 constraints to provide constrained training data; mapping the constrained training data to classification device functionality to provide a command model; and transmitting the command model to the classification device.
A system on chip (102) comprising a plurality of logically homogeneous processor cores (104), each processor core comprising processing circuitry (210) to execute tasks allocated to that processor core, and task scheduling circuitry (202) configured to allocate tasks to the plurality of processor cores. The task scheduling circuitry is configured, for a given task to be allocated, to determine, based on at least one physical circuit implementation property associated with a given processor core, whether the given task is allocated to the given processor core.
Capability storage circuitry 30, 32, 60, 34 stores at least one capability specifying a capability value and capability metadata indicative of constraints on valid use of the capability value. Capability checking circuitry 44 determines whether a capability-controlled operation to be performed by the processing circuitry with reference to a target capability is allowed, based on whether the capability-controlled operation satisfies the constraints indicated by the capability metadata of the target capability, and triggers an error handling response when the constraints are not satisfied. Micro-architectural control circuitry 40, 42, 23 controls a micro- architectural control function, other than determining whether the capability -controlled operation is allowed, depending on the capability metadata specified by a hint capability used to provide a hint to the micro-architectural control function.
An apparatus and method are described for handling sealed capabilities. The apparatus has processing circuitry to perform processing operations during which access requests to memory are generated, wherein the processing circuitry is arranged to generate memory addresses for the access requests using capabilities that identify constraining information. Checking circuitry then determines whether a given access request whose memory address is generated using a given capability is permitted based on the constraining information identified by that given capability, and based on a level of trust associated with the given access request. Each capability has a capability level of trust associated therewith, and the level of trust associated with the given access request is dependent on both a current mode level of trust associated with a current mode of operation of the processing circuitry, and the capability level of trust of the given capability. At least one of the capabilities is settable as a sealed capability, and the apparatus further comprises sealed capability handling circuitry to prevent the processing circuitry performing at least one processing operation using a given sealed capability when the current mode level of trust is a lower level of trust than the capability level of trust of the given sealed capability.
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
56.
ADDRESS TRANSLATION CIRCUITRY AND METHODS FOR PERFORMING ADDRESS TRANSLATION
There is provided address translation circuitry and a method for performing address translation. The address translation circuitry is responsive to receipt of a first address and an identifier to perform an address translation from the first address to a second address by performing a translation table walk comprising one or more translation lookups in a plurality of translation tables that are indexed based on a corresponding portion of the first address. The address translation circuitry is further configured to perform a metadata table walk to determine metadata specific to the identifier and associated with the address translation. The metadata table walk comprises one or more metadata lookups in a plurality of metadata lookup tables, each of the one or more metadata lookups corresponds to one of the one or more translation lookups and is indexed based on a same portion of the first address as that translation.
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 12/14 - Protection against unauthorised use of memory
G06F 12/1036 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
57.
ENABLING BRANCH RECORDING WHEN BRANCH RECORDING CONFIGURATION VALUES SATISFY A PREDETERMINED CONDITION
An apparatus comprises reset circuitry to perform a cold reset and to perform a warm reset by resetting a subset of state that is reset the cold reset, and branch recording circuitry to perform branch recording to store, in branch record storage circuitry, information about processed branch instructions. The branch recording circuitry determines whether warm and cold branch recording configuration values held in at least one register satisfy a predetermined condition; and when the warm and cold branch recording configuration values fail to satisfy the predetermined condition, branch recording is disabled. The branch record storage circuitry is configured to make the information about the processed branch instruction available for diagnostic analysis. The cold reset comprises resetting both of the warm and cold branch recording configuration values, and the warm reset comprises resetting the warm branch recording configuration value and leaving the cold branch recording configuration value unchanged.
A hinter data processing apparatus is provided with processing circuitry that determines that an execution context to be executed on a hintee data processing apparatus will require a virtual-to-physical address translation. Hint circuitry transmits a hint to a hintee data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the further data processing apparatus. A hintee data processing apparatus is also provided with receiving circuitry that receives a hint from a hinter data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the further data processing apparatus. Processing circuitry determines whether to follow the hint and, in response to determining that the hint is to be followed, causes the virtual-to-physical address translation to be prefetched for the execution context of the data processing apparatus. In both cases, the hint comprises an identifier of the execution context.
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 12/1027 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
G06F 12/1036 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
G06F 12/1081 - Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
There is provided a data processing apparatus comprising: memory access circuitry configured to issue access requests to a memory system; estimation circuitry configured to estimate a statistical cardinality count on memory row addresses accessed by the access requests; and decay circuitry configured to apply an exponential time-based decay during estimation of the statistical cardinality count.
G06F 21/79 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in semiconductor storage media, e.g. directly-addressable memories
A matrix multiplication system and method are provided. The system includes a memory that stores one or more weight tensors, a processor and a matrix multiply accelerator (MMA). The processor converts each weight tensor into an encoded block set that is stored in the memory. Each encoded block set includes a number of encoded blocks, and each encoded block includes a data field and an index field. The MMA converts each encoded block set into a reconstructed weight tensor, and convolves each reconstructed weight tensor and an input data tensor to generate an output data matrix.
There is provided address translation circuitry and a method for performing address translation. The address translation circuitry is responsive to receipt of a first address to perform an address translation between the first address and a second address by performing a predetermined maximum number of sequential lookups. The address translation circuitry is configured to support regular page tables comprising 2N entries and large page tables comprising 2N*M entries. The address translation circuitry is configured to: perform an intermediate lookup to retrieve information indicative of a sequentially next lookup address and page table size information and, when the page table size information indicates that the sequentially next lookup corresponds to one of the large page table and performing the sequentially next lookup would exceed the predetermined maximum number of sequential lookups, suppress subsequent lookups and generate the second address based on the information indicative of the sequentially next lookup address.
A method for filtering adversarial noise from an input signal is provided. The method comprises receiving an input signal which has an unknown level of adversarial noise. The input signal is filtered with a neural network to remove noise from the received input signal, thereby producing a filtered signal. A confidence value is calculated, the confidence value being associated with the filtered signal, and indicative of a level of trust relating to the filtered signal. The filtered signal and the confidence value may then be output.
Various implementations described herein refer to a device having a multi- layered logic structure with multiple layers including a first layer and a second layer arranged vertically in a stacked configuration. The device may have a first network that links nodes together in the first layer. The device may have a second network that links the nodes in the first layer together by way of the second layer so as to reduce latency related to data transfer between the nodes.
There is provided a data processing apparatus and a method of operating a data processing apparatus. The data processing apparatus comprises a plurality of processing elements connected via a network on a single chip arranged to form a triggered spatial architecture. Each processing element comprises front end circuitry configured to generate triggered instructions which are passed to decode circuitry to cause the processing element to perform processing operations. Some processing elements are configured to operate in a producing mode in which the processing element transmits the triggered instructions as consumer instructions to be executed by each of a set of processing elements when operating in a consuming mode. Some processing elements are configured to operate in the consuming mode in which the processing elements retrieve consumer instructions transmitted from a processing element operating in a producing mode, and pass the consumer instructions to the decode circuitry.
There is provided a processing apparatus, method and computer program. The apparatus comprising: decode circuitry to decode instructions; and processing circuitry to apply vector processing operations specified by the instructions. The decode circuitry is configured to, in response to a vector combining instruction specifying a plurality of source vector registers each comprising source data elements in a plurality of data element positions, one or more further source vector registers, and one or more destination registers, cause the processing circuitry to, for each data element position: extract first source data elements from the data element position of each source vector register; extract second source data elements from the one or more further source vector registers; generate a result data element by combining each element of the first source data elements and the second source data elements; and store the result data element to the data element position of the one or more destination registers.
Apparatuses, methods and programs are disclosed relating to the predication of multiple vectors in vector processing. An encoding of predicate information is disclosed which comprises an element size and an element count, wherein the predicate information comprises a multiplicity of consecutive identical predication indicators given by the element count, each predication indicator corresponding to the element size.
An apparatus comprises: memory access circuitry (11) to process memory access requests requesting access to a memory system (10, 32); and access frequency tracking circuitry (40). In response to a given memory access request requesting access to a given page of a memory address space, the access frequency tracking circuitry (40) determines an outcome of a chance-dependent test, where the outcome of the chance-dependent test is dependent on chance. When the outcome of the chance-dependent test is a first outcome, an access frequency tracking indicator corresponding to the given page is updated within an access frequency tracking structure. When the chance-dependent test has an outcome other than the first outcome, the access frequency tracking circuitry 40 omits updating of the access frequency tracking indicator corresponding to the given page.
An apparatus has processing circuitry to execute instructions and address prediction storage circuitry to store address prediction information for use in predicting upcoming instructions to be executed by the processing circuitry. The processing circuitry is responsive to an instruction to generate a pointer signature for a pointer to generate the pointer signature for the pointer based on an address of the pointer and a cryptographic key. The address prediction storage circuitry is also configured to store address prediction information for the pointer, the address prediction information including the pointer. The processing circuitry is responsive to an instruction to authenticate a given pointer to obtain, based on the address prediction information for the given pointer, a predicted pointer signature; compare the predicted pointer signature with a pointer signature identified by the instruction to authenticate; and responsive to the comparing detecting a match, determine that the given pointer is valid.
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 12/14 - Protection against unauthorised use of memory
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
An apparatus comprises a divide/square-root pipeline comprising: a plurality of divide/square-root iteration pipeline stages each to perform a respective iteration of a digit- recurrence divide or square root operation; and signal paths to supply outputs generated by one divide/square root iteration pipeline stage in one iteration as inputs to a subsequent divide/square root iteration pipeline stage of the divide/square-root pipeline for performing a subsequent iteration of the digit-recurrence divide or square root operation. The divide/square- root pipeline is capable of performing the digit-recurrence divide or square root operation on a floating-point operand to generate a floating-point result.
There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprising processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on the first and second address-range-specifying parameters and the stride parameter, prefetching of data from a plurality of specified ranges of addresses into the at least one cache. A start address and size of each specified range is dependent on the first and second address-range-specifying parameters. The stride parameter specifies an offset between start addresses of successive specified ranges. Use of the range prefetch instruction helps to improve programmability and improve the balance between prefetch coverage and circuit area of the prefetch circuitry.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/345 - Addressing or accessing the instruction operand or the result of multiple operands or results
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
72.
SYSTEM, DEVICES AND/OR PROCESSES FOR AUGMENTING ARTIFICIAL INTELLIGENCE AGENT AND COMPUTING DEVICES
Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to enhance capabilities of peer devices. In an implementation, at least one agent to: identify one or more learnable capabilities enabled by one or more parameters that are accessible via receipt of one or more message at the one or more communication devices from one or more other computing devices; and determine a utility of augmenting at least one of the one or more learning engines with at least one of the one or more learnable capabilities.
A host device (10) provides a plurality of virtual machines (54) executing one or more processes (60, 62, 64, 66). A peripheral device (30) performs tasks on behalf of the host and is coupled to it via a communication network (20). The peripheral provides a plurality of virtual peripheral devices (34), each allocated to one of the virtual machines. Address translation circuitry (75) in the host performs two- stage address translation. When accessing a memory (40) via the host, the peripheral requests a transfer with a specified address and associated metadata providing a source identifier field, a first address translation control field and a second address translation control field. The first address translation control field controls any first stage address translation and depends on the process. The second address translation control field controls any second stage address translation required and depends on the virtual machine associated with the specified address.
There is provided an apparatus, method and computer program for constraining memory accesses. The apparatus comprises processing circuitry to perform operations during which access requests to memory are generated. The processing circuitry is arranged to generate memory addresses for the access requests using capabilities that identify constraining information. The apparatus further comprises capability checking circuitry to perform a capability check operation to determine whether a given access request whose memory address is generated using a given capability is permitted based on given constraining information identified by the given capability. The capability check operation includes performing a range check based on range constraining information provided by the given constraining information, and when a determined condition is met, to perform the range check in dependence on both the range constraining information and an item of state information of the apparatus which varies dynamically during performance of the operations of the processing circuitry.
G06F 12/14 - Protection against unauthorised use of memory
G06F 12/1036 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
A technique is provided for constraining access to memory using capabilities. An apparatus is provided that has processing circuitry for performing operations during which access request to memory are generated, wherein the processing circuitry is arranged to generate memory addresses for the access requests using capabilities that provide a pointer value and associated constraining information. The apparatus also provides capability generation circuitry, that is responsive to the processing circuitry executing a capability generating instruction that identifies a location in a literal pool of the memory, to retrieve a literal value from the location in the literal pool, and to produce a generated capability in which the pointer value of the generated capability is determined from the literal value. The constraining information of the generated capability is selected from a limited set of options in dependence on information specified by the capability generating instruction. It has been found that such an approach provides a robust mechanism for generating capabilities, whilst reducing code size.
A data processing apparatus comprises storage circuitry to store a hierarchy of page tables comprising an intermediate level page table (22). Each entry of the intermediate level page table comprises base address information of a next level page table (24) and control information indicating whether an addressing function (26) has been applied to reorder physical storage locations of entries of the next level page table. Address translation circuitry is provided to perform address translations in response to receipt of a virtual address by performing a lookup in a next level page table dependent on the base address information and a page table index from the virtual address. When the control information indicates that the addressing function has been applied, the lookup is performed at a modified storage location generated by applying the addressing function to the page table index.
A method of operating a system having a plurality of neural networks includes receiving sequential input data events and processing each sequential input data event using a corresponding subset of the plurality of neural networks to obtain a plurality of sequential outputs. Each sequential output is indicative of a predictive determination of an aspect of the corresponding input data event. The method includes processing the plurality of sequential outputs to determine an uncertainty value associated with the plurality of sequential outputs, and operating the system based on the determined uncertainty value.
An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.
An apparatus and method are provided, the apparatus comprising: interconnect circuitry to couple a device to one or more processing elements, each processing element operating in a trusted execution environment; and secure stashing decision circuitry to receive stashing transactions from the device and to redirect permitted stashing transactions to a given storage structure accessible to at least one of the one or more processing elements. The secure stashing decision circuitry is configured, in response to receiving a given stashing transaction, to determine whether the given stashing transaction comprises a trusted execution environment identifier associated with a given trusted execution environment, and to treat the given stashing transaction as a permitted stashing transaction when redirection requirements, dependent on the trusted execution environment identifier, are met.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
Message passing circuitry comprises lookup circuitry responsive to a producer request indicating message data provided on a target message channel by a producer node of a system-on-chip, to obtain, from a channel consumer information structure, selected channel consumer information associated with a given consumer node subscribing to the target message channel. Control circuitry writes the message data to a location associated with an address in a consumer- defined region of address space determined based on the selected channel consumer information. When an event notification condition is satisfied for the target message channel and the given consumer node, and an event notification channel is to be used, event notification data is written to a location associated with an address in a consumer-defined region of address space determined based on event notification channel consumer information associated with the event notification channel.
An apparatus and method for handling stash requests are described. The apparatus has a processing element with an associated storage structure that is used to store data for access by the 5 processing element, and an interface for coupling the processing element to interconnect circuitry. Stash request handling circuitry is also provided that, in response to a stash request targeting the storage structure being received at the interface from the interconnect circuitry, causes a block of data associated with the stash request to be stored within the storage structure. The stash request identifies a given address that needs translating into a corresponding physical 10 address in memory, and also identifies an address space key. Address translation circuitry is used to convert the given address identified by the stash request into the corresponding physical address by performing an address translation that is dependent on the address space key identified by the stash request. The stash request handling circuitry is then responsive to the corresponding physical address determined by the address translation circuitry to cause the block 15 of data to be stored at a location within the storage structure associated with the physical address.
Processing circuitry (310) processes instructions in one of at least three domains each associated with a corresponding physical address space, and issues a memory access request to a memory system, the memory access request comprising a partition identifier (selected based on programmable partition identifier selection information associated with a current software execution environment which caused the memory access request to be issued) and a multi-bit partition identifier space indicator indicating a selected partition identifier space (selected from among at least three partition identifier spaces based on a current domain of the processing circuitry). The selected partition identifier space and partition identifier (332, 334) together represent information for selecting, at a memory system component, parameters for controlling allocation of resources for handling the memory access request or managing contention for said resources, or for selecting whether performance monitoring data is updated in response to the memory access request.
Subject matter disclosed herein relates to systems, devices, and/or processes for processing signals relating to surfaces that may be viewable by subjects though one or more devices. In an embodiment, a surface may include one or more devices embedded therein to provide one or more signals to define a portion of the surface.
Circuitry comprises a set of physical registers; instruction decoder circuitry to decode processing instructions each generating an output multi-bit data item in a destination architectural register by applying a processing operation to one or more source data items in one or more respective source architectural registers, the decoder circuitry being configured to detect whether a processing instruction defines a predicated merge operation, being a processing operation which propagates a set of zero or more portions of the prevailing contents of the destination architectural register as respective portions of the output multi-bit data item, the set of portions being defined by predicate data; register allocation circuitry to associate physical registers of the set of physical registers with the destination architectural register and the one or more source architectural registers and, when the detector circuitry detects that a processing instruction defines a predicated merge operation, the register allocation circuitry is configured to associate a further physical register with that processing instruction to store a copy of the prevailing contents of the destination architectural register; predicate generation circuitry to generate the predicate data for use in the execution of a given processing instruction defining a predicated merge operation; and predicate detector circuitry to control association of the further physical register with the given processing instruction in response to a state of the predicate data generated by the predicate generation circuitry.
A method of and apparatus for processing graphics in a tile-based graphics processing system, wherein when preparing primitive lists it is determined, based on a measure of the size of a primitive, whether or not to perform processing of one or more attributes of one or more vertices of the primitive.
The present disclosure provides a method and apparatus for communicating between dice of an inductively-coupled 3D integrated circuit (3D-IC). A transmit resonant circuit at a transmit die is inductively coupled to a first receive resonant circuit at a first receive die, and to a second receive resonant circuit at a second receive die. The resonant circuit at the targeted receive die is tuned to the frequency of resonance of the transmit resonant circuit, while the resonant circuit at the untargeted receive die is detuned, resulting in lower power consumption for a given bit error rate at the targeted die.
A method and apparatus is provided for processing accelerator instructions in a data processing apparatus, where a block of one or more accelerator instructions is executable on a host processor or on an accelerator device. For an instruction executed on the host processor and referencing a first virtual address, the instruction is issued to an instruction queue of the host processor and executed the instruction by the host processor, the executing including translating, by translation hardware of the host processor, the first virtual address to a first physical address. For an instruction executed on the accelerator device and referencing the first virtual address, the first virtual address is translated, by the translation hardware, to a second physical address and the instruction is sent to the accelerator device referencing the second physical address. An accelerator task may be initiated by writing configuration data to an accelerator job queue.
An apparatus and method are provided for storing a plurality of translation entries in a cache, each translation entry corresponding to one of a plurality of page table entries and defining a translation between a first address and a second address, and encoding control information indicative of an attribute of each page table entry; returning, in response to a lookup querying a first lookup address, a corresponding second address when the first lookup address corresponds to one of the plurality of translation entries stored in the cache; modifying at least some of the control information in response to notification of a modification of the attribute in a page table entry; and retaining in the cache at least one translation entry corresponding to the page table entry for use in a subsequent address lookup querying a corresponding first lookup address in response to the notification of the modification of the attribute in the page table entry.
An apparatus comprises an instruction decoder 20 and processing circuitry 22. Monitoring circuitry 36 monitors one or more events indicative of a potential update to data associated with any of a monitored set of addresses, and makes accessible to software executing on the processing circuitry 22 a monitoring reporting indication indicative of whether any events has occurred for at least one of the monitored set of addresses. In response to decoding of an exclusive status setting instruction specifying a given address, the processing circuitry 22 sets an exclusive status associated with the given address. The exclusive status is cleared in response to detecting an event indicative of a conflicting memory access to the given address. In response to decoding of a monitor exclusive instruction, the processing circuitry 22: determines whether the exclusive status is associated with a target address, and if so allocates the target address to be one of the monitored set of addresses.
Key capability storage circuitry 90 is provided to store a key capability specifying key bounds indicating information indicative of permissible bounds for information specified by any one or more of: a non-capability operand, a capability, or the key capability itself. For a given software compartment executed by the processing circuitry, which lacks a key capability operating privilege associated with at least a portion of the key capability storage circuitry, the processing circuitry is configured to prohibit certain manipulations of the key capability, including a transfer between key capability storage and a memory location selected by the given software compartment. This can help to support temporal safety.
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
G06F 21/80 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in storage media based on magnetic or optical technology, e.g. disks with sectors
There is provided a graphics processor (10) comprising a primitive processing circuit operable to process graphics primitives into respective fragment work items to be rendered by a rendering circuit (22). The primitive processing circuit generates one or more queues (18A, 18B) of fragment work items for rendering that contain fragment work items corresponding to multiple, different sources of fragment work items. The graphics processor (10) is configured to issue fragment work items to the rendering circuit (22) in an interleaved fashion such that rendering of fragment work items from a first source of fragment work items can thereby be interleaved with rendering of fragment work items from a second source of fragment work items.
According to one implementation of the present disclosure, an integrated circuit comprises a memory macro unit that includes an input/output (I/O) circuit block, where read/write circuitry of the I/O circuit block is apportioned on at least first and second tiers of the memory macro unit. In a particular implementation, read circuitry of the read/write circuitry is arranged on the first tier and write circuitry of the read/write circuitry is arranged on the second tier.
Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
G06F 15/82 - Architectures of general purpose stored program computers data or demand driven
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
A data processing system that comprises a processing unit (1), (2), (3), (10) and a communications bus (5) over which bus transactions to access memory (6) can be performed is disclosed. The system includes a codec (20), and the processing unit (1), (2), (3), (10) can initiate over the communications bus (5), bus transactions that comprise the codec (20) accessing the memory (6).
An apparatus, method and computer program are described, the apparatus comprising decode circuitry configured to decode instructions, and processing circuitry responsive to the instructions decoded by the decode circuitry to perform data processing. In response to the decode circuitry decoding a memory copy size determining instruction specifying as operands a source memory address, a destination memory address and a total number of bytes to be copied from a source block of memory locations indicated by the source memory address to a destination block of memory locations indicated by the destination memory address, the processing circuitry is configured to determine, based on at least one of the source memory address and the destination memory address, a memory copy size indicating value indicative of a subset of the total number of bytes to be copied. A data transfer instruction is also described.
Processing circuitry (16) and an instruction decoder (9) supports a load chunk instruction and a store chunk instruction which can be useful for implementing memory copy functions and other library functions for manipulating or comparing blocks of memory. Number of bytes to load or store in response to these instructions is determined based on an implementation specific condition. As well as loading or storing bytes of data, the load chunk instruction and (10) store chunk instruction also designated a load/store length value as data corresponding to an architecturally visible register, which provides an indication of a number of bytes loaded or stored.
Address translation circuitry (20) converts virtual addresses into physical addresses with reference to intermediate level and final level page tables. Final level descriptors within final level page tables identify address translation data for an associated region of memory. Intermediate level descriptors within intermediate level page tables identify intermediate address translation data used to identify an associated page table at a next level of the page tables. Page table update circuitry (35) maintains state information within each final and intermediate level descriptor, and updates the state information from a clean state to a dirty state: in the final level descriptors to indicate that a modification of content of the associated memory region is permitted; in the intermediate level descriptors to indicate occurrence of an update from the clean state to the dirty state within the state information of any final level descriptors that are accessed via that intermediate level descriptor.
A context-information-dependent instruction causes a context-information-dependent operation to be performed based on specified context information indicative of a specified execution context. A context information translation cache 10 stores context information translation entries each specifying untranslated context information and translated context information. Lookup circuitry 14 performs a lookup of the context information translation cache based on the specified context information, to identify whether the context information translation cache includes a matching context information translation entry which is valid and which specifies untranslated context information corresponding to the specified context information. When the matching context information translation entry is identified, the context- information-dependent operation is performed based on the translated context information specified by the matching context information translation entry.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/79 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in semiconductor storage media, e.g. directly-addressable memories
G06F 12/14 - Protection against unauthorised use of memory
A hardware accelerator and method for a mixed-precision deep neural network (DNN) ensemble are provided. The hardware accelerator includes a DNN primary module, a number of DNN auxiliary modules and a fusion module. The DNN primary module processes a DNN primary model having a primary precision level, and each DNN auxiliary module processes a DNN auxiliary model having an auxiliary precision level less than the primary precision level. The DNN primary model and each DNN auxiliary model are configured to determine a mean predicted category and a variance based on input data. The fusion module is configured to receive the mean predicted categories and variances from the DNN primary model and each DNN auxiliary model, determine an average mean predicted category and an average variance based on the mean predicted categories and variances, and output the average mean predicted category and the average variance.
In a cache stash relay, first data, from a producer device, is stashed in a shared cache of a data processing system. The first data is associated with first data addresses in a shared memory of the data processing system. An address pattern of the first data addresses is identified. When a request for second data, associated with a second data address, is received from a processing unit of the data processing system, any data associated with data addresses in the identified address pattern are relayed from the shared cache to a local cache of the processing unit if the second data address is in the identified address pattern. The relaying may include pushing the data from the shared cache to the local cache or a pre-fetcher of the processing unit pulling the data from the shared cache to the local cache in response to a message.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
G06F 12/0813 - Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration