A computer-based technique for processing an application includes determining that a loop of the application includes a reference to a data item of a vector data type. A trip count of the loop is determined to have an unknown trip count. The loop is split into a first loop and a second loop based on a splitting factor. The second loop is unrolled.
Apparatus and associated methods relate to packet header field extraction as defined by a high level language and implemented in a minimum number of hardware streaming parsing stages to speculatively extract header fields from among multiple possible header sequences. In an illustrative example, the number of stages may be determined from the longest possible header sequence in any received packet. For each possible header sequence, one or more headers may be assigned to each stage, for example, based on a parse graph. Each pipelined stage may resolve a correct header sequence, for example, by sequentially extracting length and transition information from an adjacent prior stage to determine offset of the next header. By speculatively extracting selected fields from every possible position in each pipeline stage, a correct value may be selected using sequential hardware streaming pipelines to substantially reduce parsing latency.
H04L 69/324 - Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the data link layer [OSI layer 2], e.g. HDLC
3.
CIRCUIT SIMULATION BASED ON AN RTL COMPONENT IN COMBINATION WITH BEHAVIORAL COMPONENTS
Methods and systems for simulating RTL models in combination with behavioral models involve generating an overall simulation model from a circuit design by a simulation tool of an EDA system. The overall simulation model includes respective behavioral simulation models of components of the circuit design. A register transfer level (RTL) simulation model of a particular component of the components of the circuit design is generated by an extractor tool of the EDA system. The respective behavioral simulation model of the particular component in the overall simulation model is replaced with the RTL simulation model, and a simulation that executes the overall simulation model and the RTL simulation model in place of the behavioral simulation model of the particular component is performed.
Methods and systems to detect a metastable condition and suppress/mask a signal during the metastable condition. The metastable condition may arise from asynchronous sampling. Techniques disclosed herein may be configured to enable asynchronous lock-stepping, where outputs of redundant circuit blocks of a first clock domain are received at input nodes of a second clock domain. In the second clock domain, logic states at the input nodes are compared to detect errors, and results of the comparison are masked during transitions at the input nodes. Masking may be constrained to situations where logic states at the input nodes differ.
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
H03L 7/081 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop provided with an additional controlled phase shifter
5.
DATA PROCESSING ARRAY INTERFACE HAVING INTERFACE TILES WITH MULTIPLE DIRECT MEMORY ACCESS CIRCUITS
An integrated circuit (IC) can include a data processing array including a plurality of compute tiles arranged in a grid. The IC can include an array interface coupled to the data processing array. The array interface includes a plurality of interface tiles. Each interface tile includes a plurality of direct memory access circuits. The IC can include a network-on-chip (NoC) coupled to the array interface. Each direct memory access circuit is communicatively linked to the NoC via an independent communication channel.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
6.
SYSTEM-ON-CHIP HAVING MULTIPLE CIRCUITS AND MEMORY CONTROLLER IN SEPARATE AND INDEPENDENT POWER DOMAINS
Examples of the present disclosure generally relate to integrated circuits, such as a system-on-chip (SoC), that include a memory subsystem. In some examples, an integrated circuit includes a first master circuit in a first power domain on a chip; a second master circuit in a second power domain on the chip; and a first memory controller in a third power domain on the chip. The first master circuit and the second master circuit each are configured to access memory via the first memory controller. The first power domain and the second power domain each are separate and independent from the third power domain.
An integrated circuit (IC) can include a data processing array including a plurality of compute tiles arranged in a grid. The IC can include an array interface coupled to the data processing array. The array interface includes a plurality of interface tiles. Each interface tile includes a plurality of direct memory access circuits. The IC can include a network-on-chip (NoC) coupled to the array interface. Each direct memory access circuit is communicatively linked to the NoC via an independent communication channel.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
8.
Lossless compression using subnormal floating point values
A disclosed compression method includes inputting a data set of floating point values from an input circuit to a compression circuit and detecting non-zero values and sequences of zero values in the data set. The compression circuit outputs, in response to detection of a non-zero value in the data set, the non-zero value to an output circuit. The compression circuit generates, in response to detection of a sequence of zero values in the data set, a subnormal floating point value having significand bits that indicate counted zero values in the sequence, and outputs the subnormal floating point value to the output circuit.
H03M 7/24 - Conversion to or from floating-point codes
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
9.
Identifying alignment markers using partial correlators
Methods and apparatus for detecting alignment markers in received data streams received via a plurality of data lanes are disclosed. Corresponding data streams may be received via respective data lanes in the plurality of data lanes, where each data stream includes alignment markers delineating data frames, and each alignment marker has a predefined bit pattern. For each respective data lane, a determination is made whether a specified portion of the received data stream has at least a threshold degree of similarity with a portion of the predefined bit pattern. In response to determining, for one of the plurality of data lanes, that the specified portion has at least the threshold degree of similarity, a frame boundary may be determined based on the specified portion, and a verification may be performed, that the specified portion of the received data stream corresponds to an alignment marker.
A simulation framework is capable modeling a hardware implementation of a reference software system using models specified in different computer-readable languages. The models correspond to different ones of a plurality of subsystems of the hardware implementation. Input data is provided to a first simulator configured to simulate a first model of a first subsystem of the modeled hardware implementation. The input data is captured from execution of the reference software system. The first simulator executing the first model generates a first data file specifying output of the first subsystem. The first data file specifies intermediate data of the modeled hardware implementation. The first data file is provided to a second simulator configured to simulate a second model of a second subsystem of the modeled hardware implementation. The second simulator executing the second model generates a second data file specifying output of the second subsystem.
An integrated circuit (IC) chip device includes testing interface circuitry and testing circuitry to test the operation of the IC chips of the IC chip device. The IC chip device includes a first IC chip that comprises first testing circuitry. The first testing circuitry receives a mode select signal, a clock signal, and encoded signals, and comprises finite state machine (FSM) circuitry, decoder circuitry, and control circuitry. The FSM circuitry determines an instruction based on the mode select signal and the clock signal. The decoder circuitry decodes the encoded signals to generate a decoded signal. The control circuitry generates a control signal from the instruction and the decoded signal. The control signal indicates a test to be performed by the first testing circuitry.
The embodiments herein describe a communication protocol (which can be implemented in hardware or software) that provides efficient recover packet loss and can transit large messages in a complex network environment. In one embodiment, each data packet contains an encoded universal sequence which is unique across the sends, which enables cross-sender loss recovery. A receiver can include a transmission control module that controls the receiving buffer and maintains the buffer status and the sender's status. The transmission control module stores incoming packets to the correct position in the receiving buffer and generates acknowledgement notifications. The transmission control module also handles packet loss and out-of-order receiving of the packets containing the transactions.
An integrated circuit (IC) includes a Network-on-Chip (NoC). The NoC includes a plurality of NoC master circuits, a plurality of NoC slave circuits, and a plurality of switches. The plurality of switches are interconnected and communicatively link the plurality of NoC master circuits with the plurality of NoC slave circuits. The plurality of switches are configured to receive data of different widths during operation and implement different operating modes for forwarding the data based on the different widths.
An integrated circuit (IC) includes a Network-on-Chip (NoC). The NoC includes a plurality of NoC master circuits, a plurality of NoC slave circuits, and a plurality of switches. The plurality of switches are interconnected and communicatively link the plurality of NoC master circuits with the plurality of NoC slave circuits. The plurality of switches are configured to receive data of different widths during operation and implement different operating modes for forwarding the data based on the different widths.
Routing a circuit design includes generating a graph of the circuit design where each connected component is represented as a vertex, generating a routing solution for the circuit design by routing packet-switched nets so that the packet-switched nets of a same connected component do not overlap, and, for each routing resource that is shared by packet-switched nets of different connected components, indicating the shared routing resource on the graph by adding an edge. Cycle detection may be performed on the graph. For each cycle detected on the graph, the cycle may be broken by deleting the edge from the graph and ripping-up a portion of the routing solution corresponding to the deleted edge. The circuit design, or portion thereof, for which the routing solution was ripped up may be re-routed using an increased cost for a shared routing resource freed from the ripping-up.
The embodiments herein describe a 3D SmartNIC that spatially distributes compute, storage, or network functions in three dimensions using a plurality of layers. That is, unlike current SmartNIC that can perform acceleration functions in a 2D, a 3D Smart can distribute these functions across multiple stacked layers, where each layer can communicate directly or indirectly with the other layers.
The embodiments herein rely on cross reticle wires (also referred to as cross die wires) to provide communication channels between programmable dies already formed on a wafer. Using cross reticle wires to facilitate x-die communication can be three to four orders of magnitude faster than using general purpose I/O. With a wafer containing cross reticle wires, various device geometries can be generated at dicing time by cutting across different reticle boundaries. This allows up to full wafer-size devices, or several smaller sub-wafer devices to be derived from one wafer. Although the programmable dies can contain defects, these defects can be identified and avoided when generating a bitstream for configuring programmable features in the programmable dies.
G06F 30/323 - Translation or migration, e.g. logic to logic, hardware description language [HDL] translation or netlist translation
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
A system includes a multi-port random-access memory (RAM) configured to store an instruction table. The instruction table specifies a regular expression for application to a data stream. The system includes a regular expression engine configured to process the data stream based on the instruction table. The regular expression engine includes a decoder circuit configured to determine validity of active states output from the RAM, a plurality of active states memories operating concurrently, wherein each active states memory is configured to initiate a read from a different port of the RAM using an address formed of an active state output from the active states memory and a portion of the data stream, and switching circuitry configured to route the active states to the plurality of active states memories according, at least in part, to a load balancing technique and validity of the active states.
A system includes a first multi-port RAM storing an instruction table. The instruction table specifies a regular expression for application to a data stream and a second multi-port RAM configured to store a capture table having capture entries decodable for tracking position information for a sequence of characters matching a capture sub-expression of the regular expression. The system includes a regular expression engine processing the data stream to determine match states by tracking active states for the regular expression and priorities for the active states by storing the active states of the regular expression in a plurality of priority FIFO memories in decreasing priority order. The system includes a capture engine operating in coordination with the regular expression engine to determine character(s) of the data stream that match the capture sub-expression based on the active state being tracked and decoding the capture entries of the capture table.
Embodiments herein describe a hardware accelerator for a blockchain node. The hardware accelerator is used to perform a validation operation to validate one or more transactions before those transactions are committed to a ledger of a blockchain. The embodiments herein describe an out-of-order validation scheme where a collector is used to collect validated transactions out of order. Thus, if a validation pipeline has finished validating a later transaction before another validation pipeline has finished validating an earlier transaction, the pipeline can nonetheless send its results to the collector and retrieve another transaction from a scheduler. In this manner, the downtime for the validation pipelines is reduced or eliminated.
A system includes a multi-port RAM configured to store an instruction table. The instruction table specifies a regular expression for application to a data stream. The system includes a regular expression engine (engine) that processes the data stream using the instruction table. The engine includes a decoder circuit that determines validity of active states output from the multi-port RAM and a plurality of priority FIFO memories (PFIFOs) operating concurrently. Each PFIFO can initiate a read from a different port of the multi-port RAM. Each PFIFO can track a plurality of active paths for the regular expression and a priority of each active path by, at least in part, storing entries corresponding to active states in each respective PFIFO in decreasing priority order. The engine includes switching circuitry that selectively routes the active states from the decoder circuit to the plurality of PFIFOs according to the priority order.
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
Embodiments herein describe a hardware accelerator for a blockchain node. The hardware accelerator is used to perform a validation operation to validate one or more transactions before those transactions are committed to a ledger of a blockchain. The embodiments herein describe a scheduler for assigning validation engines to the transactions in response to the number of endorsements in the transactions.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
An apparatus includes a data processing array having a plurality of array tiles. Each array tile can include a random-access memory (RAM) having a local memory interface accessible by circuitry within the array tile and an adjacent memory interface accessible by circuitry disposed within an adjacent array tile. Each adjacent memory interface of each array tile can include isolation logic that is programmable to allow the circuitry disposed within the adjacent array tile to access the RAM or prevent the circuitry disposed within the adjacent array tile from accessing the RAM. The data processing array can be subdivided into a plurality of partitions wherein the isolation logic of the adjacent memory interfaces is programmed to prevent array tiles from accessing RAMs across a boundary between the plurality of partitions.
Disclosed herein is a chip package and method for fabricating the same are provided that includes a redistribution layer (RDL) with a plurality of loop and void structures. The chip package includes an integrated circuit (IC) die, and a package substrate. The RDL is disposed between the IC die and the package substrate. The RDL has RDL circuitry that connects the IC die to the package substrate. The RDL circuitry includes a first coil formed in a first metal layer and a second coil formed in a second metal layer. A first end of the second coil is coupled to a second end of the first coil by a first via. A second end of the second coil is the IC die.
A design tool determines features of a circuit design and applies a first model to the features. The first model indicates a predicted value of a metric based on the plurality of features. The design tool applies an explanation model to the features, and the explanation model indicates levels of contributions by the features to the predicted value of the metric, respectively. The design tool selects a feature of the plurality of features based on the respective levels of contributions and looks up a recipe associated with the feature in a database having possible features associated with recipes. The design tool processes the circuit design according to the recipe into implementation data that is suitable for making an integrated circuit (IC).
Embodiments herein describe a SoC with one or more untrusted islands that can host one or more roles or tenants in a data center environment (e.g., a cloud computing environment). In one embodiment, a secure shell encapsulates the untrusted islands with a secure application programming interface (API) to access other hardware resources in the SoC. Hardware resources in the SoC (e.g., HardIP, SoftIP, or both), can either be secure/trusted, or rely on the secure shell to ensure confidentiality.
An integrated circuit (IC) for adaptive memory expansion scheme is proposed, which comprises: a home agent comprising a first memory expansion pool and a second memory expansion pool; a first port connecting the home agent to a first memory expansion device, where the first memory expansion device comprises a first memory pool; a second port connecting the home agent to a second memory expansion device, where the second memory expansion device comprises a second memory pool; a first address table mapping the first memory expansion pool to the first memory pool based on a size of the first memory expansion pool or a size of the first memory pool; and a second address table mapping the second memory expansion pool to the second memory pool based on a size of the second memory expansion pool or a size of the second memory pool.
Embodiments herein describe a SoC with one or more untrusted islands that can host one or more roles or tenants in a data center environment (e.g., a cloud computing environment). In one embodiment, a secure shell encapsulates the untrusted islands with a secure application programming interface (API) to access other hardware resources in the SoC. Hardware resources in the SoC (e.g., HardIP, SoftIP, or both), can either be secure/trusted, or rely on the secure shell to ensure confidentiality.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/76 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in application-specific integrated circuits [ASIC] or field-programmable devices, e.g. field-programmable gate arrays [FPGA] or programmable logic devices [PLD]
A chip package and method for fabricating the same are provided that includes a power delivery network (PDN) with non-uniform electrical conductance. The electrical conductance through each current path of the PDN may be selected to balance the distribution of current flow across the current paths through the chip package, thus compensating for areas of high and low current draw found in conventional designs.
Embodiments herein describe a multiple die system that includes an interposer that connects a first die to a second die. Each die has a bump interface structure that is connected to the other structure using traces in the interposer. However, the bump interface structures may have different orientations relative to each other, or one of the interface structures defines fewer signals than the other. Directly connecting the corresponding signals defined by the structures to each other may be impossible to do in the interposer, or make the interposer too costly. Instead, the embodiments here simplify routing in the interposer by connecting the signals in the bump interface structures in a way that simplifies the routing but jumbles the signals. The jumbled signals can then be corrected using reordering circuitry in the dies (e.g., in the link layer and physical layer).
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
31.
TIME-DIVISION MULTIPLEXING (TDM) IN INTEGRATED CIRCUITS FOR ROUTABILITY AND RUNTIME ENHANCEMENT
Implementing a circuit design using time-division multiplexing (TDM) can include determining a net signature for each of a plurality of nets of a circuit design. For each net, the net signature specifies location information for a driver and one or more loads of the net. The plurality of nets having a same net signature can be grouped according to distance between drivers of the respective nets. One or more subgroups can be generated based on a TDM ratio for each group. For one or more of the subgroups, a TDM transmitter circuit is connected to a TDM receiver circuit through a selected interconnect, the drivers of the nets of the subgroup are connected to the TDM transmitter circuit, and loads of the nets of the subgroup are connected to the TDM receiver circuit.
Methods and apparatus for time-to-digital conversion. An example apparatus includes a first input; a second input; a delay line coupled to the first input and comprising a plurality of first delay elements coupled in series, each of the plurality of first delay elements having a first delay time; a second delay element having an input coupled to the second input and having the first delay time; a third delay element having an input coupled to the second input and having a second delay time, the second delay time being smaller than the first delay time; a first set of arbiters having first inputs coupled to the delay line and having second inputs coupled to an output of the second delay element; and a second set of arbiters having first inputs coupled to the delay line and having second inputs coupled to an output of the third delay element.
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
H03K 5/15 - Arrangements in which pulses are delivered at different times at several outputs, i.e. pulse distributors
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
H03K 19/0948 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices using field-effect transistors using MOSFET using CMOS
H03M 1/82 - Digital/analogue converters with intermediate conversion to time interval
Disclosed herein are integrated circuit (IC) structures and methods for fabricating and testing such IC structures prior to dicing from a semiconductor wafer on which the IC structures are formed. In one example, a method for fabricating an IC structure includes contacting a first plurality of test pads of the IC structure with one or more test probes. The first plurality of test pads are disposed within or on a first dielectric layer within a scribe lane, i.e., a test region. A first metal layer is formed over the first plurality of test pads if a predefined test criteria is met as determined using information obtained through first plurality of test pads using the one or more test probes. The first metal layer is a layer formed in a die region of an IC die that is being fabricated in the wafer.
H01L 21/66 - Testing or measuring during manufacture or treatment
H01L 21/78 - Manufacture or treatment of devices consisting of a plurality of solid state components or integrated circuits formed in, or on, a common substrate with subsequent division of the substrate into plural individual devices
In one example, a command pattern sequencer includes a set of registers to store values used to configure a command sequence for configuring a memory. The command pattern sequencer further includes state machine circuitry coupled to the set of registers, the state machine circuitry configured to generate and execute the command sequence. The command pattern sequencer still further includes timing circuitry configured to manage timing between commands of the command sequence.
A disclosed circuit arrangement detects the supply voltage level to the “device” (SoC, chip, SiP, etc.) and adjusts bias voltages to receiver and transmitter circuits of the device to levels suitable for the device in response to the supply voltage ramping-up during a power-on reset (“POR”) sequence. The circuitry holds the receiver output at a constant logic value while the supply voltage is ramping up and the POR signal is asserted. The disclosed circuitry also protects the transceiver as the voltage domain of the input signal is unknown and the voltage between any two terminals of a transistor of the transceiver cannot exceed a certain level.
An integrated circuit includes a front-end interface, a back-end interface, a controller, and arbiter circuitry. The front-end interface communicates with a remote host over a front-end fabric. The back-end interface communicates with nonvolatile memory (NVM) subsystems over a back-end fabric. The controller is coupled between the front-end interface and the back-end interface. The controller receives commands from the remote host for the NVM subsystems, and stores the commands in queue pairs associated with the NVM subsystems. The arbiter circuitry receives data for the queue pairs, and selects a command from a first queue pair of the queue pairs based on a comparison of the data to one or more thresholds. The selected command is outputted to one or more of the NVM subsystems.
Methods and apparatus relating to transmission on physical channels, such as in networks on chips (NoCs) or between chiplets, are provided. One example apparatus generally includes a higher bandwidth client; a lower bandwidth client; a first destination; a second destination; and multiple physical channels coupled between the higher bandwidth client, the lower bandwidth client, the first destination, and the second destination, wherein the higher bandwidth client is configured to send first traffic, aggregated across the multiple physical channels, to the first destination and wherein the lower bandwidth client is configured to send second traffic, concurrently with sending the first traffic, from the lower bandwidth client, dispersed over two or more of the multiple physical channels, to the second destination.
Methods and apparatus relating to transmission on physical channels, such as in networks on chips (NoCs) or between chiplets, are provided. One example apparatus generally includes a higher bandwidth client; a lower bandwidth client; a first destination; a second destination; and multiple physical channels coupled between the higher bandwidth client, the lower bandwidth client, the first destination, and the second destination, wherein the higher bandwidth client is configured to send first traffic, aggregated across the multiple physical channels, to the first destination and wherein the lower bandwidth client is configured to send second traffic, concurrently with sending the first traffic, from the lower bandwidth client, dispersed over two or more of the multiple physical channels, to the second destination.
Approaches for logic compaction include inputting an optimization directive that specifies one of area optimization or speed optimization to a synthesis tool executing on a computer processor. The synthesis tool identifies a multiplier and/or an adder specified in a circuit design and synthesizing the multiplier into logic having LUT-to-LUT connections between LUTs on separate slices of a programmable integrated circuit (IC) in response to the optimization directive specifying speed optimization. The synthesis tool synthesizes the multiplier and/or adder into logic having LUT-carry connections between LUTs and carry logic within a single slice of the programmable IC in response to the optimization directive specifying area optimization. The method includes implementing a circuit on the programmable IC from the logic having LUT-carry connections in response to the optimization directive specifying area optimization.
G06F 7/506 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages
G06F 30/327 - Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
G06F 30/34 - Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
G06F 7/533 - Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
40.
PROGRAMMABLE NON-LINEAR ACTIVATION ENGINE FOR NEURAL NETWORK ACCELERATION
A programmable, non-linear (PNL) activation engine for a neural network is capable of receiving input data within a circuit. In response to receiving an instruction corresponding to the input data, the PNL activation engine is capable of selecting a first non-linear activation function from a plurality of non-linear activation functions by decoding the instruction. The PNL activation engine is capable of fetching a first set of coefficients corresponding to the first non-linear activation function from a memory. The PNL activation engine is capable of performing a polynomial approximation of the first non-linear activation function on the input data using the first set of coefficients. The PNL activation engine is capable of outputting a result from the polynomial approximation of the first non-linear activation function.
Receiver circuitry for an input/output device includes first stage circuitry and second stage. The first stage circuitry has a first input to receive an input signal, voltage adjustment circuitry, and differential amplifier circuitry. The first stage circuitry is coupled to the first input and has a transistor pair to receive the input signal, and adjust a voltage value of the input signal to generate an adjusted signal. The differential amplifier circuitry receives the adjusted signal and a reference signal, and generates a first differential signal and a second differential signal. The second stage circuitry receives the first differential signal and the second differential signal, and generates an output signal based on the first differential signal and the second differential signal.
Static and automatic realization of inter-basic block burst transfers for high-level synthesis can include generating an intermediate representation of a design specified in a high-level programming language, wherein the intermediate representation is specified as a control flow graph, and detecting a plurality of basic blocks in the control flow graph. A determination can be made that plurality of basic blocks represent a plurality of consecutive memory accesses. A sequential access object specifying the plurality of consecutive memory accesses of the plurality of basic blocks is generated. A hardware description language (HDL) version of the design is generated, wherein the plurality of consecutive memory accesses are designated in the HDL version for implementation in hardware using a burst mode.
Examples described herein provide for determining a recipe for identifying from which buckets integrated circuit chips are taken to form units of a multi-chip apparatus. In an example, a method uses a processor-based system and uses a Markov Decision Process. Buckets are defined based on respective characteristics of manufactured chips. Each of the manufactured chips is binned into a respective one of the buckets based on the characteristic of the respective manufactured chip. A recipe for identifying from which of the buckets to take one or more of the manufactured chips to incorporate into respective ones of the units of the multi-chip apparatus is generated.
A machine learning-based process includes identifying a first set of features that includes features of a reference implementation of a circuit design and features of a synthesized version of a modified version of the circuit design. A first classification model is applied to the first set of features, and the first classification model indicates a full implementation flow or an incremental implementation flow. The full implementation flow is performed on the synthesized version of the modified version in response to the first classification model indicating the full implementation flow, and the incremental implementation flow is performed on the synthesized version of the modified version in response to the first classification model indicating the incremental implementation flow. The full and incremental implementation flows generate implementation data that is suitable for making an integrated circuit (IC).
Embodiments herein describe wrapping non-safety compliant hardware resources with error detection checking to satisfy a safety standard. Doing so permits non-safety compliant hardware to be used to perform one or more tasks in a system that, as a whole, satisfies a particular safety standard (e.g., one of the ASIL QM, A, B, C, and D grades).
G07C 5/00 - Registering or indicating the working of vehicles
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
H04W 4/48 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for in-vehicle communication
46.
METHOD AND SYSTEM FOR BUILDING HARDWARE IMAGES FROM HETEROGENEOUS DESIGNS FOR ELETRONIC SYSTEMS
Automatically generating a hardware image based on programming model types includes determining by a design tool, types of programming models used in specifications of blocks of a circuit design, in response to a user control input to generate a hardware image to configure a programmable integrated circuit (IC). The design tool can generate a model-type compiler script for each of the types of programming models. Each compiler script initiates compilation of blocks having specifications based on one of the types of programming model into an accelerator representation. The design tool can generate a build script configured to execute the compiler scripts and link the accelerator representations into linked accelerator representations. Execution of the build script builds a hardware image from the linked accelerator representations for configuring the programmable IC to implement a circuit according to the circuit design.
An integrated circuit includes an interposer and a die coupled to the interposer. The die includes a first data processing engine (DPE) array and a second DPE array. The first DPE array includes a first plurality of DPEs and a first DPE interface coupled to the first plurality of DPEs. The second DPE array includes a second plurality of DPEs and a second DPE interface coupled to the second plurality of DPEs. The integrated circuit includes one or more other dies having a first die interface coupled to, and configured to communicate with, the first DPE interface via the interposer and a second die interface coupled to, and configured to communicate with, the second DPE interface via the interposer.
A System-on-Chip includes a data processing engine array. The data processing engine array includes a plurality of data processing engines organized in a grid. The plurality of data processing engines are partitioned into at least a first partition and a second partition. The first partition includes one or more first data processing engines of the plurality of data processing engines. The second partition includes one or more second data processing engines of the plurality of data processing engines. Each partition is configured to implement an application that executes independently of the other partition.
Techniques and apparatus for dynamically modifying a kernel (and associated user-specified circuitry) for a dynamic region of a programmable integrated circuit (IC) without affecting (e.g., while allowing) operation of other kernels ((and other associated user-specified circuitry) in the programmable IC. Dynamically modifying a kernel may include, for example, unloading an existing kernel, loading a new kernel, or replacing a first kernel with a second kernel). In the case of networking (e.g., in a data center application) where the programmable IC may be part of a hardware acceleration card (e.g., a network interface card (NIC)), the kernel may be user code referred to as a “plugin.”
Embodiments herein describe wrapping non-safety compliant hardware resources with error detection checking to satisfy a safety standard. Doing so permits non-safety compliant hardware to be used to perform one or more tasks in a system that, as a whole, satisfies a particular safety standard (e.g., one of the ASIL QM, A, B, C, and D grades).
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G05B 19/042 - Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/72 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in cryptographic circuits
G06F 21/85 - Protecting input, output or interconnection devices interconnection devices, e.g. bus-connected or in-line devices
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
H04W 4/48 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for in-vehicle communication
Implementing a circuit design within an integrated circuit can include converting the circuit design, specified in a hardware description language, into a data flow graph and creating range set data structures in a memory. The range set data structures correspond to nodes of the data flow graph. Each range set data structure can be initialized with a range of values the corresponding node can take as specified by the circuit design. The method can include determining actual values the nodes are capable of taking by propagating the values through the data flow graph. The range set data structures are updated to store the actual values for the corresponding nodes. The method also can include modifying a selected node of the data flow graph based on the actual values stored in the range set data structure of the selected node and semantics of the selected node.
An expansion card having a mezzanine level communication port is disclosed herein. The mezzanine level communication port frees space on the primary substrate (e.g., printed circuit board) for any one or more of a variety of expansion card components. The expansion card includes a bracket, a first communication port, a primary substrate, and a secondary substrate. The first communication port is coupled to the bracket. The primary and secondary substrates are disposed on one side of the bracket. The secondary substrate has a termination of the first communication port.
Chip packages and methods for fabricating the same are provided which utilize a first heat spreader interfaced with a first integrated circuit (IC) die and a second heat spreader separately interfaced with a second IC die. The separate heat spreaders allow the force applied to the first IC die to be controlled independent of the force applied to the second IC die.
H01L 23/427 - Cooling by change of state, e.g. use of heat pipes
H01L 25/16 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different main groups of groups , or in a single subclass of , , e.g. forming hybrid circuits
H01L 23/16 - Fillings or auxiliary members in containers, e.g. centering rings
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 21/48 - Manufacture or treatment of parts, e.g. containers, prior to assembly of the devices, using processes not provided for in a single one of the groups
A chip package and method for fabricating the same are provided that includes an off-die inductor. The off-die inductor is disposed in a redistribution layer formed on a bottom surface of an integrated circuit (IC) die. The redistribution layer is connected to a package substrate to form the chip package.
Embodiments herein describe partitioning an acceleration device based on the needs of each user application executing in a host. In one embodiment, a flexible queue provisioning method allows the acceleration device to be dynamically partitioned by pushing the configuration through a control command queue to the device by management software running in a trusted zone. The new configuration is parsed and verified by trusted firmware, which, then, creates isolated IO command queues on the acceleration device. These IO command queues can be directly mapped to a user application, VM, or other PCIe devices. In one embodiment, each IO command queue exposes only the compute resource assigned by the trusted firmware in the acceleration device.
A universal interposer for an integrated circuit (IC) device has a body having a first surface and a second surface opposite the first surface. A first region is formed on a first side of the body along a first edge. The first region has first slots, each having an identical first bond pad layout. A second region is formed on the first side along a second edge, opposite the first edge. The second region has second slots having an identical second bond pad layout. A third region having third slots is formed on the first side between the first and second regions, each slot having an identical third bond pad layout. A pad density of the third bond pad layout is greater than the first bond pad layout. One of the third slots is coupled to contact pads disposed in a region not directly below any of the second slots.
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different subgroups of the same main group of groups , or in a single subclass of ,
Circuitry for multiplying a sparse matrix by a dense vector includes a first switching circuit (302) for routing input triplets from N input ports to N output ports based on column indices of the triplets. Each triplet includes a non-zero value, a row index, and a column index. N first memory banks (303) store subsets of vector elements and are addressed by the column indices of the triplets. N multipliers (305) multiply the non-zero values of the triplets by the vector element read from the respective memory bank. A second switching circuit (304) routes tuples based on row indices of the tuples. Each tuple includes a product output by the one of the N multipliers and a row index output by an output port of the first switching circuit. N accumulator circuits (307) sum products of tuples having equal row indices.
An integrated circuit can include a communication endpoint configured to maintain a communication link with a host computer, a queue configured to receive a plurality of host commands from the host computer via the communication link, and a processor configured to execute a device runtime. The processor, responsive to executing the device runtime, is configured to perform validation of the host commands read from the queue and selectively execute the host commands based on a result of the validation on a per host command basis. The host commands are executable by the processor to manage functions of the integrated circuit. The queue is implemented in a region of memory that is shared by the integrated circuit and the host computer.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/71 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
Testbench creation for sub-design verification can include receiving, using computer hardware, a selection of a sub-design of a circuit design. The sub-design is one of a plurality of sub-designs of the circuit design. The circuit design includes a plurality of parameter values. A list of port-level signal information is generated for the selected sub-design. The one or more parameter values of the circuit design are extracted. Switching activity of each port-level signal from the list is logged in a switching activity file while running a circuit design testbench for the circuit design with the selected sub-design in scope. From the list, the switching activity, and the one or more parameter values, a sub-design testbench for the selected sub-design is generated.
A chip package and method for fabricating the same are provided that includes a near-die integrated passive device. The near-die integrated passive device is disposed between a package substrate and an integrated circuit die of a chip package. Some non-exhaustive examples of an integrated passive device that may be disposed between the package substrate and the integrated circuit die include a resistor, a capacitor, an inductor, a coil, a balum, or an impedance matching element, among others.
H01L 25/16 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different main groups of groups , or in a single subclass of , , e.g. forming hybrid circuits
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
61.
DIGITAL-TO-ANALOG CONVERTER (DAC)-BASED VOLTAGE-MODE TRANSMIT DRIVER ARCHITECTURE WITH TUNABLE IMPEDANCE CONTROL AND TRANSITION GLITCH REDUCTION TECHNIQUES
A digital-to-analog converter (DAC)-based voltage-mode transmit driver architecture. One example transmit driver circuit generally includes an impedance control circuit coupled to a plurality of DAC driver slices. The impedance control circuit generally includes a tunable impedance configured to be adjusted to match a load impedance for the transmit driver circuit. Another example transmit driver circuit generally has an output impedance that is smaller than the load impedance for the transmit driver circuit, such that an output voltage swing at differential output nodes of the transmit driver circuit is greater than a voltage of a power supply rail. Another example transmit driver circuit generally includes a predriver circuit with a first inverter coupled to a first output of the predriver circuit and a second inverter coupled to a second output of the predriver circuit, the transistors in at least one of the first inverter or the second inverter having different strengths.
A unified container file can be selected using computer hardware. The unified container file can include a plurality of files embedded therein used to configure a programmable integrated circuit (IC). The plurality of files can include a first partial configuration bitstream and a second partial configuration bitstream. The unified container file also includes metadata specifying a defined relationship between the first partial configuration bitstream and the second partial configuration bitstream for programming the programmable IC. The defined relationship can be determined using computer hardware by reading the metadata from the unified container file. The programmable IC can be configured, using the computer hardware, based on the defined relationship specified by the metadata using the first partial configuration bitstream and the second partial configuration bitstream.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
A method includes receiving a value and an identifier from a first memory and hashing the identifier to produce a memory block identifier. The method also includes routing, based on the memory block identifier, a read request to a memory block of a plurality of memory blocks and updating the value received from the first memory based on a property received from the memory block in response to the read request. The memory further includes storing the updated value in the first memory.
Embodiments herein describe using an adaptive chip-to-chip (C2C) interface to interconnect two chips, wherein the adaptive C2C interface indudes circuitry for performing multiple different C2C protocois to communicate with the other chip. One or both of the chips in the C2C connection can include the adaptive C2C interface. During boot time, the adaptive C2C interface is configured to perform one of the different C2C protocois. During runtime, the chip then uses the selected C2C protocol to communicate with the other chip in the C2C connection.
Embodiments herein describe using an adaptive chip-to-chip (C2C) interface to interconnect two chips, wherein the adaptive C2C interface includes circuitry for performing multiple different C2C protocols to communicate with the other chip. One or both of the chips in the C2C connection can include the adaptive C2C interface. During boot time, the adaptive C2C interface is configured to perform one of the different C2C protocols. During runtime, the chip then uses the selected C2C protocol to communicate with the other chip in the C2C connection.
A rearranger circuit rearranges data elements of each raw image of a plurality of raw images according to a plurality of raw color channel arrays. The data elements of each raw image are input to the rearranger circuit according to instances of a pattern of color channels of a color filter array (CFA). The data elements specify values of the color channels in the instances of the pattern, and each raw color channel array has the data elements of one color channel of the color channels in the instances of the pattern. The rearranger circuit can be used in neural network training or in generating raw color channel arrays for performing neural network inference.
An inference server is capable of receiving a plurality of inference requests from one or more client systems. Each inference request specifies one of a plurality of different endpoints. The inference server can generate a plurality of batches each including one or more of the plurality of inference requests directed to a same endpoint. The inference server also can process the plurality of batches using a plurality of workers executing in an execution layer therein. Each batch is processed by a worker of the plurality of workers indicated by the endpoint of the batch.
A biasing scheme for a voltage-to-time converter (VTC). An example biasing circuit generally includes a reference current source; a feedback loop current source; an amplifier having a first input coupled to a target voltage node, having a second input, and having an output coupled to a control input of the reference current source and to a control input of the feedback loop current source; a first capacitive element; a first switch coupled in parallel with the first capacitive element; a second switch coupled between the feedback loop current source and the first capacitive element; and a third switch coupled between the first capacitive element and the second input of the amplifier.
An electronic device and methods for fabricating the same are disclosed herein that utilize a dam formed on a printed circuit board (PCB) that is positioned to substantially prevent edge bond material, utilized to secure a chip package to the PCB, from interfacing with the solder balls transmitting signals between the PCB and chip package.
H05K 3/34 - Assembling printed circuits with electric components, e.g. with resistor electrically connecting electric components or wires to printed circuits by soldering
H05K 1/11 - Printed elements for providing electric connections to or between printed circuits
70.
HIERARCHICAL HARDWARE-SOFTWARE PARTITIONING AND CONFIGURATION
Embodiments herein describe partitioning hardware and software in a system on a chip (SoC) into a hierarchy. In one embodiment, the hierarchy includes three levels of hardware-software configurations, enabling security and/or safety isolation across those three levels. The levels can cover the processor subsystem with compute, memory, acceleration, and peripheral resources shared or divided across those three levels.
Synthetizing a hardware description language code into a netlist comprising loads and a multi-clock buffer (MBUF). The MBUF receives a global clocking signal and generates a first and a second related clocking signals. The loads are grouped into a first and a second groups receiving the first and the second clocking signals respectively. A first/second clock modifying leaf are placed between a common node and the first/group groups respectively, wherein the common node is positioned closer in proximity to the first/second groups in comparison to a clock source generating the global clocking signal. The first/second clock modifying leaves receive a least divided clocking signal from the MBUF and generate the first/second clocking signals respectively. The least divided clocking signal is routed from the MBUF to the first/second clock modifying leaves. The first/second clocking signals are routed from the first/second clock modifying leaves to the first/second group respectively.
Embodiments herein describe techniques for managing power consumption and temperature in an electronic circuits or integrated chips driven by clock signals (collectively referred to as “cards”) by throttling the clock signals on those cards. The cards often allow users to implement customized hardware acceleration functions via Field Programmable Gate Arrays or the like, which can lead to variable workloads on different cards (or regions of individual cards) based on the customized functionality. By throttling the clock signal based on continuously monitored power consumption or temperature, the user is enabled to use the card more aggressively (e.g., based on average rather than worst-case power consumption), and the card automatically throttles operations when power consumption or temperature exceeds operational thresholds.
Embodiments herein describe partitioning hardware and software in a system on a chip (SoC) into a hierarchy. In one embodiment, the hierarchy includes three levels of hardware-software configurations, enabling security and/or safety isolation across those three levels. The levels can cover the processor subsystem with compute, memory, acceleration, and peripheral resources shared or divided across those three levels.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
74.
TRANSPARENT AND REMOTE KERNEL EXECUTION IN A HETEROGENEOUS COMPUTING SYSTEM
Remote kernel execution in a heterogeneous computing system can include executing, using a device processor of a device communicatively linked to a host processor, a device runtime and receiving from the host processor within a hardware submission queue of the device, a command. The command requests execution of a software kernel and specifies a descriptor stored in a region of a memory of the device shared with the host processor. In response to receiving the command, the device runtime, as executed by the device processor, invokes a runner program associated with the software kernel. The runner program can map a physical address of the descriptor to a virtual memory address corresponding to the descriptor that is usable by the software kernel. The runner program can execute the software kernel. The software kernel can access data specified by the descriptor using the virtual memory address as provided by the runner program.
An integrated circuit (IC) device for detecting errors within a register, the IC device includes registers and parity checking circuitry. The parity checking circuitry is coupled to the registers and comprises a first parity circuity, a second parity circuit, and error detection circuitry. The first parity circuit receives first register values from the registers and determine a first value from the first register values. The second parity circuit is receives second register values from the registers and determines a second value from the second register values. The error detection circuitry compares the first value and the second value to detect a first error within the registers, and output an error signal indicating the first error.
Embodiments herein describe end-to-end bindings to create zones that extend between different components in a SoC, such as an I/O gateway, a processor subsystem, a NoC, storage and data accelerators, programmable logic, etc. Each zone can be assigned to a different domain that is controlled by a tenant such as an external host, or software executing on that host. Embodiments herein create end-to-end bindings between acceleration engines, I/O gateways, and embedded cores in SoCs. Instead of these components being treated as disparate monolithic components, the bindings divide up the hardware and memory resources across components that make up the SoC, into different zones. Those zones in turn can have unique bindings to multiple tenants. The bindings can be configured in bridges between components to divide resources into the zones to enable tenants of those zones to have dedicated available resources that are secure from the other tenants.
Examples herein describe hardware architecture for processing and accelerating data passing through layers of a neural network. In one embodiment, a reconfigurable integrated circuit (IC) for use with a neural network includes a digital processing engine (DPE) array, each DPE having a plurality of neural network units (NNUs). Each DPE generates different output data based on the currently processing layer of the neural network, with the NNUs parallel processing different input data sets. The reconfigurable IC also includes a plurality of ping-pong buffers designed to alternate storing and processing data for the layers of the neural network.
G06N 3/04 - Architecture, e.g. interconnection topology
G06F 3/06 - Digital input from, or digital output to, record carriers
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/16 - Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers
Disclosed approaches for convolving input feature maps in a neural network include a circuit arrangement circuit that includes memory circuitry and convolution circuitry. The memory circuitry is configured to store K NxN first filters, and C 1x1 second filters, wherein N ≥ 1, and 1 < K < C. The convolution circuitry is coupled to the memory circuitry and configured to convolve a three-dimensional input feature map with the K NxN first filters into an intermediate volume having a depth of K, and convolve the intermediate volume with the C 1x1 second filters into an output feature map having a depth of C.
An integrated circuit (IC) device for detecting errors within a register, the IC device includes registers and parity checking circuitry. The parity checking circuitry is coupled to the registers and comprises a first parity circuitry, a second parity circuit, and error detection circuitry. The first parity circuit receives first register values from the registers and determine a first value from the first register values. The second parity circuit is receives second register values from the registers and determines a second value from the second register values. The error detection circuitry compares the first value and the second value to detect a first error within the registers, and output an error signal indicating the first error.
Methods and apparatus for adaptive integrity levels in electronic and programmable logic systems. In one example, an interface for communication between a first component and a second component is provided. The interface includes logic configured to change an integrity level for a communication from the first component to the second component during operation of the first component and the second component.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
A network interface device has data path circuitry configured to cause data to be moved into and/or out of the network interface device. The data path circuitry comprises: first circuitry for providing one or more data processing operations; and interface circuitry supporting channels. The channels comprises command channels receiving command information from a plurality of data path circuitry user instances, event channels providing respective command completion information to the plurality of data path user instances; and data channels providing the associated data.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
Embodiments herein describe end-to-end bindings to create zones that extend between different components in a SoC, such as an I/O gateway, a processor subsystem, a NoC, storage and data accelerators, programmable logic, etc. Each zone can be assigned to a different domain that is controlled by a tenant such as an external host, or software executing on that host. Embodiments herein create end-to-end bindings between acceleration engines, I/O gateways, and embedded cores in SoCs. Instead of these components being treated as disparate monolithic components, the bindings divide up the hardware and memory resources across components that make up the SoC, into different zones. Those zones in turn can have unique bindings to multiple tenants. The bindings can be configured in bridges between components to divide resources into the zones to enable tenants of those zones to have dedicated available resources that are secure from the other tenants.
A network interface device (109) has data path circuitry (102,) configured to cause data to be moved into and/or out of the network interface device (109). The data path circuitry comprises: first circuitry (128) for providing one or more data processing operations; and interface circuitry (126) supporting channels. The channels comprises command channels receiving command information from a plurality of data path circuitry user instances (101), event channels providing respective command completion information to the plurality of data path user instances (101); and data channels providing the associated data.
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
Methods and apparatus for adaptive integrity levels in electronic and programmable logic systems. In one example, an interface for communication between a first component and a second component is provided. The interface includes logic configured to change an integrity level for a communication from the first component to the second component during operation of the first component and the second component.
Methods and apparatus for quickly changing line rates in PCIe analyzers without resetting the receivers. One example circuit for multi-rate reception generally includes: a receiver having a data input, a data output, and a clock input configured to receive a clock signal from a clock generator, the receiver being configured to switch between receiving data at a first data rate and at least one second data rate and to sample data according to the first data rate, wherein the first data rate is higher than the at least one second data rate; a phase detector having an input coupled to the data output of the receiver; and a filter having an input coupled to an output of the phase detector and having an output configured to effectively control a phase of the sampling by the receiver when the data is at the at least one second data rate.
H03L 7/081 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop provided with an additional controlled phase shifter
G06F 13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
H03K 19/21 - EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical
H03L 7/08 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop
Auxiliary power connector PCBs are described. In one example, an auxiliary power connector is described. The auxiliary power connector includes a printed circuit board (PCB) and a PCI express graphics (PEG) connector mounted to the PCB, the PEG connector configured to connect to an auxiliary power source. The auxiliary power connector further includes a set of connectors provided on the PCB, the set of connectors configured to connect the PCB to a main PCB of a device.
An adaptive memory expansion scheme is proposed, where one or more memory expansion capable Hosts or Accelerators can have their memory mapped to one or more memory expansion devices. The embodiments below describe discovery, configuration, and mapping schemes that allow independent SCM implementations and CPU-Host implementations to match their memory expansion capabilities. As a result, a memory expansion host (e.g., a memory controller in a CPU or an Accelerator) can declare multiple logical memory expansion pools, each with a unique capacity. These logical memory pools can be matched to physical memory in the SCM cards using windows in a global address map. These windows represent shared memory for the Home Agents (HAs) (e.g., the Host) and the Slave Agent (SAs) (e.g., the memory expansion device).
Disclosed herein is a heat spreader for use with an IC package, the heat spreader having features for enhanced temperature control of the IC package. A heat spreader for use with an IC package is disclosed. In one example, the heat spreader includes a metal body that has a sealed internal cavity. A thermally conductive material fills the sealed internal cavity. The thermally conductive material has an interstitial space sufficient to allow fluid to pass therethrough. A first phase change material fills at least a portion of the interstitial space of the thermally conductive material.
A circular buffer architecture includes a memory coupled to a producer circuit and a consumer circuit. The memory is configured to store objects. The memory can include memory banks. The number of the memory banks is less than a number of the objects. The circular buffer can include hardware locks configured to reserve selected ones of the memory banks for use by the producer circuit or the consumer circuit. The circular buffer can include a buffer controller coupled to the memory and configured to track a plurality of positions. The positions can include a consumer bank position, a consumer object position, a producer bank position, and a producer object position. The buffer controller is configured to allocate selected ones of the objects from the memory banks to the producer circuit and to the consumer circuit according to the tracked positions and using the hardware locks.
An integrated circuit includes an intellectual property core, scan data pipeline circuitry configured to convey scan data to the intellectual property core, and scan control pipeline circuitry configured to convey one or more scan control signals to the intellectual property core. The integrated circuit also includes a wave shaping circuit configured to detect a trigger event on the one or more scan control signals and, in response to detecting the trigger event, suppress a scan clock to the intellectual property core for a selected number of clock cycles.
G06F 30/333 - Design for testability [DFT], e.g. scan chain or built-in self-test [BIST]
G06F 30/367 - Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods
A device may include a processor system and an array of data processing engines (DPEs) communicatively coupled to the processor system. Each of the DPEs includes a core and a DPE interconnect. The processor system is configured to transmit configuration data to the array of DPEs, and each of the DPEs is independently configurable based on the configuration data received at the respective DPE via the DPE interconnect of the respective DPE. The array of DPEs enable, without modifying operation of a first kernel of a first subset of the DPEs of the array of DPEs, reconfiguration of a second subset of the DPEs of the array of DPEs.
G06F 15/177 - Initialisation or configuration control
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
92.
Core cavity noise isolation structure for use in chip packages
Various noise isolation structures and methods for fabricating the same are presented. In one example, a substrate for chip package is provided. The substrate includes a core region, top build-up layers and bottom build-up layers. The top build-up layers are formed on a first side of the core region and the bottom build-up layers are formed on a second side of the core region that is opposite the first side. Routing circuitry formed in the bottom build-up layers is coupled to routing circuitry formed in the top build-up layers by vias formed through the core region. A void is formed in the bottom build-up layers. The void is configured as a noise isolation structure. The void has a sectional area that is different in at least two different distances from the core region.
H01L 21/48 - Manufacture or treatment of parts, e.g. containers, prior to assembly of the devices, using processes not provided for in a single one of the groups
H01L 23/552 - Protection against radiation, e.g. light
93.
Wide frequency range voltage controlled oscillators
Phase-locked loop circuitry generates an output signal based on transformer based voltage controlled oscillator (VCO) circuitry. The VCO circuitry includes upper band circuitry including first oscillation circuitry, a first harmonic filter circuitry coupled to the first oscillation circuitry, and a first selection transistor coupled to the first harmonic filter circuitry and a current source. The first harmonic filter circuitry filters the output signal. The lower band circuitry includes second oscillation circuitry, a second harmonic filter circuitry coupled to the second oscillation circuitry, and a second selection transistor coupled to the second harmonic filter circuitry and the current source. The second harmonic filter circuitry filters the output signal.
H03L 7/099 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop concerning mainly the controlled oscillator of the loop
H03B 5/12 - Generation of oscillations using amplifier with regenerative feedback from output to input with frequency-determining element comprising lumped inductance and capacitance active element in amplifier being semiconductor device
H03L 7/093 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop concerning mainly the frequency- or phase-detection arrangement including the filtering or amplification of its output signal using special filtering or amplification characteristics in the loop
N key generation circuits are arranged in a pipeline having N stages. Each key generation circuit is configured to generate a round key as a function of a respective input key and a respective round constant. Output signal lines that carry the round key from a key generation circuit in a stage of the pipeline, except the key generation circuit in a last stage of the pipeline, are coupled to the key generation circuit in a successive stage of the pipeline to provide the respective input key.
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
H03K 19/21 - EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical
H04L 9/34 - Bits, or blocks of bits, of the telegraphic message being interchanged in time
95.
Breakout structure for an integrated circuit device
Apparatus having at least one breakout structure are provided. In one example, an apparatus includes a dielectric layer, first and second contact pads, and first and second vias. The first and second contact pads are disposed on the dielectric layer. The first via is disposed through the dielectric layer and coupled to the first contact pad. The first via is offset from the first contact pad in a first direction. The second contact pad is immediately adjacent the first via. The second via is disposed through the dielectric layer immediately adjacent the first contact pad and coupled to the second contact pad. The second via is offset from the second contact pad in a second direction that is opposite of the first direction. The first and the second contact pads define a first differential pair of contact pads that is configured to transmit a first differential pair of signals.
Examples herein describe a peripheral I/O device with a hybrid gateway that permits the device to have both I/O and coherent domains. As a result, the compute resources in the coherent domain of the peripheral I/O device can communicate with the host in a similar manner as CPU-to-CPU communication in the host. The dual domains in the peripheral I/O device can be leveraged for machine learning (ML) applications. While an I/O device can be used as an ML accelerator, these accelerators previously only used an I/O domain. In the embodiments herein, compute resources can be split between the I/O domain and the coherent domain where a ML engine is in the I/O domain and a ML model is in the coherent domain. An advantage of doing so is that the ML model can be coherently updated using a reference ML model stored in the host.
A multiplication injection locked oscillator (MIILO) circuitry includes a ring injection locked oscillator (ILO) circuitry that outputs clock signals, a first switching circuitry and a second switching circuitry. The ring ILO circuitry includes a first path having first delay stages, and a second path having a second delay stages. The first switching circuitry is connected to the first path and a voltage supply node. The first switching circuitry receives a first control signal and a second control signal and selectively connects the voltage supply node to the first path. The second switching circuitry is connected to the second path and a reference voltage node. The second switching circuitry receives the first control signal and the second control signal and selectively connects the reference voltage node to the second path.
H03L 7/099 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop concerning mainly the controlled oscillator of the loop
G06F 1/06 - Clock generators producing several clock signals
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
98.
DAC-BASED TRANSMIT DRIVER ARCHITECTURE WITH IMPROVED BANDWIDTH
A DAC-based transmit driver architecture with improved bandwidth and techniques for driving data using such an architecture. One example transmit driver circuit generally includes an output node and a plurality of digital-to-analog converter (DAC) slices. Each DAC slice has an output coupled to the output node of the transmit driver circuit and includes a bias transistor having a drain coupled to the output of the DAC slice and a multiplexer having a plurality of inputs and an output coupled to a source of the bias transistor.
Clock generation circuitry includes quadrature locked loop circuitry having first injection locked oscillator circuitry, second injection locked oscillator circuitry, and XOR circuitry. The first injection locked oscillator circuitry receives a first input signal and a second input signal and outputs first clock signals. The first input signal and the second input signal correspond to a reference clock signal. The second injection locked oscillator circuitry is coupled to outputs of the first injection locked oscillator circuitry, and receives the first clock signals and generates second clock signals. The XOR circuitry receives the second clock signals and generates a first clock signal, a second clock signal, a third clock signal, and a fourth clock signal. The frequencies of the first clock signal, the second clock signal, the third clock signal, and the fourth clock signal are greater than the frequency of the reference clock signal.
H03L 7/099 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop concerning mainly the controlled oscillator of the loop
H03L 7/24 - Automatic control of frequency or phase; Synchronisation using a reference signal directly applied to the generator
A semiconductor device comprises a design under test (DUT), a testing interface, pattern generation circuitry, and pattern checker circuitry. The pattern generation circuitry is connected to the DUT and the testing interface. The pattern generation circuitry is configured to generate a test data sequence and control data based on configuration data received from the testing interface, and communicate the test data sequence and the control data to the DUT. The pattern checker circuitry is connected to the DUT and the testing interface. The pattern checker circuitry is configured to generate a comparison test sequence based on the configuration data received from the testing interface, receive resultant test data sequence and output control data from the DUT, and generate a first error signal based on a comparison of the resultant test data sequence and the comparison test sequence and a comparison of the output control data and the configuration data.