Embodiments herein describe techniques for expressing the layers of a neural network in a software model. In one embodiment, the software model includes a class that describes the various functional blocks (e.g., convolution units, max-pooling units, rectified linear units (ReLU), and scaling functions) used to execute the neural network layers. In turn, other classes in the software model can describe the operation of each of the functional blocks. In addition, the software model can include conditional logic for expressing how the data flows between the functional blocks since different layers in the neural network can process the data differently. A compiler can convert the high-level code in the software model (e.g., C++) into a hardware description language (e.g., register transfer level (RTL)) which is used to configure a hardware system to implement a neural network accelerator.
Examples herein describe techniques for method of accessing encrypted data. The techniques include receiving, via a memory controller, a first memory request to a first memory region, where the first memory region is encrypted based on a first key, and incrementing, based on the first memory request, a first counter associated with the first key. The techniques further include, in response to determining that the first counter exceeds a first threshold, initiating a key rolling operation to cause the first memory region to be encrypted based on a second key. The techniques further include tracking an address range of the first memory region that has been encrypted based on the second key, and, in response to determining that an address of a second memory request is outside of the address range, causing the second memory request to be completed based on the first key.
A transmission system is disclosed including a driver circuit. The driver circuit includes multiplexer circuits that receive parallel data and operate as a differential pair. At least one of the multiplexer circuits is coupled to a first circuit node and a second circuit node of the driver circuit. The at least one the multiplexer circuits outputs serial data from the multiplexer circuits at the first and second circuit nodes. The first and second nodes are coupled to a differential output network. The first and second nodes are coupled to an inductor circuit. The first and second nodes are coupled to a cross-coupled circuit. The inductor circuit drains driver circuit current at the first circuit node. The second circuit node and the cross-coupled circuit steer driver circuit current at the first circuit node and the second circuit node.
Embodiments herein describe a configurable engine that is embedded into the cache hierarchy of a processor. The configurable engine can enable efficient data sharing between the main memory, cache memories, and the core. The configurable engine can perform operations that are more efficient to be done in the cache hierarchy. In one embodiment the configurable engine is controlled (or configured) by software (e.g., the operating system (OS)), adapting to each application domain. That is, the OS can configure the engine according to a data flow profile of a particular application being executed by the processor.
An integrated circuit (IC) device includes a controller circuitry having an input coupled to a photodiode of an optoelectronic circuitry and an output coupled to a heater of the optoelectronic circuitry, the controller circuitry configured to determine a center frequency of the optoelectronic circuitry based on a shape of an input signal received from the photodiode, and provide a heater signal to the heater based on the shape of the input signal and the center frequency of the optoelectronic circuitry.
A transformer includes a first inductor, facing in a first direction and a second inductor, facing in a second direction, the second direction opposite to the first. In one example the first and the second inductors are arranged such that the first inductor's legs extend to an area of the second inductor's head, and the second inductor's legs extend to an area of the first inductor's head.
H01F 27/00 - MAGNETS; INDUCTANCES; TRANSFORMERS; SELECTION OF MATERIALS FOR THEIR MAGNETIC PROPERTIES - Details of transformers or inductances, in general
Embodiments herein describe techniques for converting multiple symbols into respective wire states in parallel. In one embodiment, the techniques can be used to convert symbols into wire states in parallel even when those wire states are dependent on previously determined wire states. That is, the dependency on previous wire states can be removed so that wire states can be determined in parallel.
The Hot Carrier Injection effect is a phenomenon present in semiconductor devices, where charges are trapped in the gate oxide region and degrade the device. Hot carrier Injection (HCI) is one of the major problems in lower voltage technologies due to lower voltage tolerance limits of MOS devices. Due to this HCI effect, designing high voltage, wide range (i.e., supply voltage ranges: 3.3 v, 2.5 v, and 1.8 v) I/O buffers has become challenging. The HCI effect is common in input/output (I/O) buffers that use bias generation circuits for wide voltage ranges. Disclosed here are methods and systems employed to provide reliable bias generation in an I/O buffer or other semiconductor circuit. This limits the device drain to source voltage (Vds) in the bias circuits and I/O buffer so as to mitigate the hot carrier Injection (HCI) effect.
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
An integrated circuit (IC) device includes processor circuitry configured to output a first memory command having a first memory address, and in-line error correction control (ILECC) circuitry configured to receive the first memory command and output the first memory command to a memory device. The ILECC circuitry includes an error correction code (ECC) cache configured to store a first local ECC associated with the first memory command in a first cache line.
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
G06F 12/0871 - Allocation or management of cache space
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
10.
SYSTEM AND METHOD FOR SECURE DECONSTRUCTION SENSOR IN A HETEROGENEOUS INTEGRATION CIRCUITRY
Some examples described herein provide for securely booting a heterogeneous integration circuitry apparatus. In an example, an apparatus (e.g., heterogeneous integration circuitry) includes a first portion and a second portion of one or more entropy sources on a first component and a second component, respectively. The apparatus also includes a key generation circuit communicatively coupled with the first portion and the second portion to generate a key encrypted key based on a first set of bits output by the first portion and a second set of bits output by the second portion. The apparatus also includes a key security circuit to generate, based on the key encrypted key and an encrypted public key stored at the apparatus, a plaintext public key to be used by a boot loader during a secure booting operation for the apparatus.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
Embodiments herein describe a configurable engine that is embedded into the cache hierarchy of a processor. The configurable engine can enable efficient data sharing between the main memory, cache memories, and the core. The configurable engine can perform operations that are more efficient to be done in the cache hierarchy. In one embodiment, the configurable engine is controlled (or configured) by software (e.g., the operating system (OS)), adapting to each application domain. That is, the OS can configure the engine according to a data flow profile of a particular application being executed by the processor.
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
Some examples described herein relate to protecting an integrated circuit (IC) structure from imaging or access. In an example, an IC structure includes a semiconductor substrate, an electromagnetic radiation blocking layer, and a support substrate. The semiconductor substrate has a circuit disposed on a front side of the semiconductor substrate. The electromagnetic radiation blocking layer is disposed on a backside of the semiconductor substrate opposite from the front side of the semiconductor substrate. The support substrate is bonded to the semiconductor substrate. The electromagnetic radiation blocking layer is disposed between the semiconductor substrate and the support substrate.
An integrated circuit (IC) device includes functional circuits and multiple communication paths, which may include a first communication path through the functional circuits and a second communication path to permit the functional circuits to share information through a buffer and/or to bypass a subset of the functional circuits and a corresponding portion of the first communication path. The IC device may include a variety of protocol-specific interface circuits (ASIC and/or configurable circuitry) for respective IP blocks, and a controller that selectively directs traffic through the various communication paths. The controller may include a set of domain-specific OpCodes that link various subsets/combinations of the protocol-specific interface circuits as respective communication paths. The IC device may include multiple blocks of circuitry, each including a respective set of domain-specific circuitry (e.g., host-domain, network domain, RF domain, and/or data processing domain), and respective sets of OpCodes.
Embodiments herein describe a memory system with a data width (W) that is split into N separate memories each of narrower width W/N. To protect a write enable (WE) signal, the WE signal is toggled and then stored in each of the N memories. For example, toggle circuits can have states that toggle each time the WE signal goes high, indicated that a received data word should be stored in the N memories. A fault on the WE input to any of the N memories results in its stored toggle bit being different from the toggle bits stored in the other N memories. This condition can then be detected upon any subsequent read by checking whether the toggled bits are equal. The memory system can also protect the address and control signals by generating parity bits that are stored in the N memories.
An interconnection circuitry of an accelerator device includes a multiplexer, a first plurality of buffers, a second plurality of buffers, and a demultiplexer. The multiplexer is coupled to first offload circuitry and received data therefrom. The first plurality of buffers has inputs coupled to outputs of the multiplexer. A second plurality of buffers has inputs coupled to outputs of the first plurality of buffers. The demultiplexer includes inputs coupled to outputs of the second plurality of buffers and outputs coupled to inputs of programmable logic.
Globally placing a circuit design includes adjusting indicated capacity levels for placement bins associated with a target integrated circuit, based on first levels of demand for resources by instances in the circuit design in regions of the target IC. Region constraints restrict placement of the instances in the regions, and the regions include two or more two or more overlapping regions. Tracked levels of demand for resources in the placement bins are adjusted, after adjusting the indicated capacity levels, based on the indicated capacity levels, a target utilization level of the resources in the placement bins, and a current placement. The current placement of the instances is updated based on a density gradient of an electrostatics-based model of the tracked levels of demand, and repeating adjusting the tracked levels of demand and updating the current placement are repeated in response to the density gradient failing to satisfy a threshold.
A device includes a data processing engine (DPE) array having a plurality of data processing engines (DPEs) and a subsystem coupled to the DPE array. Each DPE of the plurality of DPEs is configurable to share data with one or more other DPEs of the plurality of DPEs using one or more of a plurality of data sharing techniques. The data sharing techniques include a core of a selected DPE accessing a memory module of an adjacent DPE via a memory interface of the selected DPE connected to a memory module of the adjacent DPE and the selected DPE accessing the memory module of a non-adjacent DPE using a DMA circuit and a stream switch of the selected DPE. The subsystem may be in a different die than the DPE array.
Retiming a circuit design can include determining whether or not an initial value specified for a candidate register can be removed based on an input logic cone to the candidate register and an output logic cone from the candidate register. The candidate register is a register in a critical path in the circuit design. The candidate register can be retimed into a retimed register in response to determining that the initial value specified for the candidate register can be removed. A new initial value for the retimed register can be derived based on initial values of registers in a logic cone of the retimed register, and the new initial value can be assigned to the retimed register.
An authentication device for a communication device includes key stream generator circuitry and hash function circuitry. The key stream generator circuitry receives a first input data stream and generates a first data stream output signal based on the first input data stream and an encryption key. The first input data stream is associated with a first data rate. The hash function circuitry receives the first data stream output signal from the key stream generator circuitry. The hash function circuitry includes first decimation circuitry and recursive circuitry. The first decimation circuitry receives the first data stream output signal, and combines adjacent data words of the first data stream output signal to generate a first decimated output signal having a second data rate. The second data rate is less than the first data rate. The recursive circuitry generates an authentication tag based on the first decimated output signal.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
A device may include a processor system and an array of data processing engines (DPEs) communicatively coupled to the processor system. Each of the DPEs includes a core and a DPE interconnect. The processor system is configured to transmit configuration data to the array of DPEs, and each of the DPEs is independently configurable based on the configuration data received at the respective DPE via the DPE interconnect of the respective DPE. The array of DPEs enable, without modifying operation of a first kernel of a first subset of the DPEs of the array of DPEs, reconfiguration of a second subset of the DPEs of the array of DPEs.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
21.
Memory controller with reduced latency transaction scheduling
A memory controller includes transaction queue circuitry, a first skip event, a second skip event, a third skip event, and scheduler circuitry. The transaction queue circuitry is configured to store a first transaction, a second transaction, and a third transaction. The first transaction received is by the transaction queue circuitry before the second transaction and the third transaction. The second transaction is received by the transaction queue circuitry before the third transaction. The first skip event counter is associated with the first transaction. The second skip event counter is associated with the second transaction. The third skip event counter is associated with the third transaction. The scheduler circuitry is configured to select the third transaction before selecting the first transaction, increase a value of the first skip event counter based on selecting the third transaction before the first transaction, and communicate the third transaction to a memory device.
An adaptable framework for circuit design simulation verification generates a simulation database for a circuit design and processed design data for the circuit design. The processed design data includes source files for the circuit design referenced by the simulation database. The simulation database and the processed design data are exported from a host integrated development environment (IDE). A template writer configured to generate a simulation script for the circuit design using the simulation database is provided. The simulation script is generated by executing the template writer. The simulation script is generated according to one or more user-specified parameters of the template writer using the simulation database and the processed design data as exported.
Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
G06F 15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
24.
PROGRAMMABLE HYBRID MEMORY AND CAPACITIVE DEVICE IN A DRAM PROCESS
A DRAM fabrication process for producing a semiconductor die adapted for having the ability to be both a hybrid memory and power supply capacitance. DRAM arrays on a semiconductor die may be individually selected to function as either a memory or as supplemental capacitance on a power distribution network serving circuits on one or more semiconductor dice in a three-dimensional active-on-active (AoA) stacked semiconductor die package configuration. Defective DRAM array trench capacitors can be repurposed to serve as supplemental capacitance on a power distribution network. DRAM array trench capacitors can be dynamically reassigned as supplemental capacitance when power supply monitors sense that additional power supply capacitance is needed.
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
H01L 23/528 - Layout of the interconnection structure
A network interface device comprises at least one processor configured to validate at least a part of a context associated with a queue pair, the context being fetched from a memory on a host device.
Modeling and compiling tensor processing applications using multi-layer adaptive data flow (ML-ADF) graphs, including folding the ML-ADF graph for temporal sharing of platform resources, computing schedules for runtime orchestration of kernel execution, memory reuse, tensor and sub-volume movement, and dataflow synchronization, and generating binary code for processors of the target computing platform and re-targetable controller code. The ML-ADF graph may represent: tensor processing of a layer of a neural network as data flow through the data nodes and distribution to compute tiles across memory hierarchy; data flow amongst layers of the neural network using connections amongst data nodes of the respective layers; and multi-dimension data partitioning and distribution using tiling parameters associated with ports of the data nodes.
A DRAM fabrication process for producing a semiconductor die adapted for having the ability to be both a hybrid memory and power supply capacitance. DRAM arrays on a semiconductor die may be individually selected to function as either a memory or as supplemental capacitance on a power distribution network serving circuits on one or more semiconductor dice in a three-dimensional active-on-active (AoA) stacked semiconductor die package configuration. Defective DRAM array trench capacitors can be repurposed to serve as supplemental capacitance on a power distribution network. DRAM array trench capacitors can be dynamically reassigned as supplemental capacitance when power supply monitors sense that additional power supply capacitance is needed.
G11C 11/4074 - Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
Embodiments herein describe an integrated circuit (IC) which includes a global ring that interconnects multiple local rings distributed throughout the IC. In one embodiment, the global ring is connected to the local rings using respective switches. The global ring (and the switches) interconnect the local rings so that a node coupled to one of the local rings can communicate with a node connected to another local ring.
Embodiments herein describe an integrated circuit (IC) which includes a global ring that interconnects multiple local rings distributed throughout the IC. In one embodiment, the global ring is connected to the local rings using respective switches. The global ring (and the switches) interconnect the local rings so that a node coupled to one of the local rings can communicate with a node connected to another local ring.
Error and debug information is saved during a boot process. A read only memory (ROM) debug circuitry (RDC) obtains detected errors within ROM code during a boot process. Error information is generated and stored within a first memory element. The error information includes entries. Each of the entries is associated with a respective one of the errors. Debug information is generated and stored by the RDC within a second memory element. The debug information is associated with the boot process. Further, the method includes outputting, via test circuitry of the processing system, the error information and debug information based on a testing instruction.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
An integrated circuitry (IC) device for a memory device includes driver circuitry, selection circuitry, clock generation circuitry, and self-time path circuitry. The driver circuitry generates a plurality of driver circuitry outputs. The selection circuitry selects one of the plurality of driver circuitry outputs based on a plurality of enable signals. The clock generation circuitry receives the selected one of the plurality of driver circuitry outputs from the selection circuitry, and generates a clock signal based on at least the selected one of the plurality of driver circuitry outputs from the selection circuitry. The self-time path circuitry of a memory receives the clock signal and generates a reset signal based on the clock signal. The plurality of driver circuitry outputs and the clock signal are based on the reset signal, and the self-time path circuitry corresponds to one or more columns of a memory bank.
Techniques to utilize thin-oxide devices, such as gate-all-around metal-oxide-semiconductor field-effect transistors (MOSFETs), in high voltage environments, such as to provide a high-voltage based low-power, temperature dependent, thin-oxide-only on-chip high current low drop out (LDO) regulator in a system-on-chip (SoC), such as provide power to configuration random-access memory (CRAM) cells distributed throughout configurable/programmable circuitry. Thin-oxide only circuitry may include thin-oxide-only amplifier circuitry, thin-oxide-only power gate circuitry, thin-oxide-only level shifters that shift voltage swings of control signals to voltage domains of the power gate circuitry, and thin-oxide-only clamp circuitry.
H01L 27/01 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate comprising only passive thin-film or thick-film elements formed on a common insulating substrate
G05F 1/567 - Regulating voltage or current wherein the variable actually regulated by the final control device is dc using semiconductor devices in series with the load as final control devices sensing a condition of the system or its load in addition to means responsive to deviations in the output of the system, e.g. current, voltage, power factor for temperature compensation
G05F 1/575 - Regulating voltage or current wherein the variable actually regulated by the final control device is dc using semiconductor devices in series with the load as final control devices characterised by the feedback circuit
H01L 21/82 - Manufacture or treatment of devices consisting of a plurality of solid state components or integrated circuits formed in, or on, a common substrate with subsequent division of the substrate into plural individual devices to produce devices, e.g. integrated circuits, each consisting of a plurality of components
H01L 27/088 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including only semiconductor components of a single kind including field-effect components only the components being field-effect transistors with insulated gate
H01L 29/788 - Field-effect transistors with field effect produced by an insulated gate with floating gate
33.
SUPPORTING MULTIPLE CONTROLLER CIRCUITS ON A MULTIPLEXED COMMUNICATION BUS
A system includes a plurality of controller circuits. The system includes a plurality of target circuits. The system includes a communication bus communicatively linking the plurality of controller circuits with the plurality of target circuits. The communication bus includes a plurality of switches. Each switch of the plurality of switches is connected to a different one of the plurality of controller circuits.
A network interface device comprises circuitry to add a frame check sequence value a data packet to be transmitted onto a network. The data packet with the frame check sequence value is stored in memory. Media access control layer circuitry reads the data packet from the memory and determines if the frame check sequence value is correct. When it is note correct, it is determined that the data in the data packet is corrupted.
Methods and apparatus for calibrating a gain for a circuit block are disclosed. An example method includes receiving a plurality of quantizer offsets, where the plurality of quantizer offsets represent calibration data for a quantizer configured to quantize an output of the circuit block, determining one or more differences based on one or more first quantizer offsets of the plurality of quantizer offsets and on one or more second quantizer offsets of the plurality of quantizer offsets, and determining an incremental change in a gain associated with the circuit block based on the one or more differences.
Dynamic provisioning of portions of a data processing array includes receiving, from an executing application, a context request. The context request specifies a requested task to be performed by a data processing array. A configuration for the data processing array is selected from a plurality of configurations for the data processing array. The selected configuration conforms with the context request and is capable of performing the requested task. A determination is made whether the selected configuration is implementable in the data processing array based, at least in part, on a space requirement of the selected configuration and a current status of the data processing array. The selected configuration is selectively implemented in the data processing array based on the determination.
A design for a programmable integrated circuit (IC) is synthesized and includes an inference engine and a data transformer. A portion of the design including the data transformer is designated as a dynamic function exchange (DFX) module. The inference engine is excluded from the DFX module. The design is implemented, by placing and routing, such that the DFX module is confined to a defined physical area of the programmable integrated circuit. An abstract shell for the design specifying boundary connections of the DFX module as placed and routed is generated. A locked version of the design as placed and routed with the DFX module removed is generated. The method includes implementing a different data transformer as a further DFX module for the design using the abstract shell.
Embodiments herein describe a multi-chip device that includes multiple ICs with interconnected NoCs. Embodiments herein provided address translation circuitry in the ICs. The address translation circuitry establish a hierarchy where traffic originating for a first IC that is intended for a destination on a second IC is first routed to the address translation circuitry on the second IC which then performs an address translation and inserts the traffic back on the NoC in the second IC but with a destination ID corresponding to the destination. In this manner, the IC can have additional address apertures only to route traffic to the address translation circuitry of the other ICs rather than having address apertures for every destination in the other ICs.
An electronic system includes a source follower circuitry that functions as an input driver. The source follower circuitry includes a first input transistor, first current source circuitry, and first phase shift circuitry. The first input transistor includes a first node coupled to a first voltage node, a second node coupled to a first output node, and a gate node coupled to a first input node. The gate node receives a first input signal via the first input node. The first current source circuitry coupled to the first output node and configured to generate a first bias current. The first phase shift circuitry is coupled to the first current source circuitry. The first phase shift circuitry generates a first phase shift signal to modulate the first current source circuitry to reduce signal drop across the first input transistor.
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
A system includes a plurality of processing elements and a plurality of memory controllers. The system includes a network on chip (NoC) providing connectivity between the plurality of processing elements and the plurality of memory controllers. The NoC includes a sparse network coupled to the plurality of processing elements and a non-blocking network coupled to the sparse network and the plurality of memory controllers. The plurality of processing elements execute a plurality of applications. Each application has a same deterministic memory access performance in accessing associated ones of the plurality of memory controllers via the sparse network and the non-blocking network of the NoC.
Embodiments herein describe a multi-chip device that includes multiple ICs with interconnected NoCs. Embodiments herein provided address translation circuitry in the ICs. The address translation circuitry establish a hierarchy where traffic originating for a first IC that is intended for a destination on a second IC is first routed to the address translation circuitry on the second IC which then performs an address translation and inserts the traffic back on the NoC in the second IC but with a destination ID corresponding to the destination. In this manner, the IC can have additional address apertures only to route traffic to the address translation circuitry of the other ICs rather than having address apertures for every destination in the other ICs.
Methods and systems for selecting between single-process and multi-process implementation flows involve identifying features of a circuit design by a design tool. A classification model is applied to the features. The classification model indicates whether an implementation flow on the circuit design is likely to have a runtime within a first range of runtimes or a runtime within a second range of runtimes. The implementation flow is executed by the design tool in a single process in response to the classification model indicating the implementation flow on the circuit design is likely to have a runtime within the first range of runtimes. The implementation flow is executed by the design tool in a plurality of processes in response to the classification model indicating the implementation flow on the circuit design is likely to have a runtime within the second range of runtimes.
Multi-stage routing for a circuit design includes performing, using computer hardware, a global routing of the circuit design using a hybrid routing graph for a target integrated circuit. The hybrid routing graph includes routing nodes and a plurality of coarsened routing nodes. Each coarsened routing node includes a plurality of constituent routing nodes that are treated as a single node during the global routing. A detailed routing of the circuit design is performed using the computer hardware to generate a legal routing solution for the circuit design. The detailed routing is performed by routing, in parallel, the nets of the circuit design that were globally routed using the plurality of coarsened routing nodes.
A memory device is disclosed herein that leverages high ratio column MUXES to improve SEU resistance. The memory device may be utilized in an integrated circuit die and chip packages having the same. In one example, as semiconductor memory device is provided that includes first and second arrays of semiconductor memory cells, a plurality of sense amplifiers configured to read data states from the first and second arrays of memory cells, and a first plurality of high ratio MUXES connecting the first and second arrays of memory cells to the plurality of sense amplifiers.
An input/output (I/O) buffer is implemented without an auxiliary power supply (VCCAUX). The input/output (I/O) buffer includes a connection to a VCCO power supply, a connection to a VCCINT power supply, a connection to a reference voltage, and a VCCO detection circuit coupled to a bias generation circuit. Further, the I/O buffer includes a transmitter circuit coupled to the bias generation circuit, and a receiver circuit coupled to an I/O pad.
G05F 1/56 - Regulating voltage or current wherein the variable actually regulated by the final control device is dc using semiconductor devices in series with the load as final control devices
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
Embodiments herein describe a configurable packet processing architecture for a SmartNIC or other network device. The configurable architecture includes a plurality of PPEs which are communicatively coupled using a packet bus. A packet can be processed in each of the PPEs. For example, each packet may be first processed by PPE 1, then PPE 2, then PPE 3, and so forth. Moreover, the results of processing the packet at a PPE 1 may affect the operation performed on the packet when it reaches PPE 2 or PPE 3. Thus, the PPEs form a chain where the results determined by a first PPE when processing the packet can affect or change the operation a second PPE performs when processing the same packet.
[A non-blocking crossbar switch architecture is disclosed that circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e. any one output port not currently receiving data may receive data from any one input port.
Bi-directional dynamic function exchange (DFX) can include receiving a circuit design for a programmable integrated circuit (IC). The circuit design includes a plurality of DFX partitions coupled by a signal path. The circuit design can be placed using a first plurality of DFX modules for the plurality of DFX partitions, in part, by selecting a flip-flop of a connection block as a boundary flip-flop of the signal path for each DFX module of the plurality of DFX modules. The circuit design including the signal path can be routed through the selected flip-flops of the connection blocks using a bi-directional routing resource coupling the plurality of connection blocks. The bi-directional routing resource is used as a partition pin placement constraint (PPLOC) node for DFX.
A chip package and method for fabricating the same are provided that include hybrid bonded bridge dies connecting IC dies on adjacent die stacks. In one example, a chip package includes an interconnect routing structure, a first die stack and a second die stack. The first die stack includes a top die disposed over a bottom die, the bottom die stacked on the interconnect routing structure. The second die stack also includes a top die disposed over a bottom die, the bottom die stacked on the interconnect routing structure. The first bridge die is electrically and mechanically coupled to the top dies of the first and second die stacks. The first bridge die having solid state circuitry that connects circuitries of the top dies of the first and second die stacks.
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different subgroups of the same main group of groups , or in a single subclass of ,
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
A non-blocking crossbar switch architecture is disclosed that circumvents the problem present in prior art crossbar switches where input signals may oversubscribe the available inter-die bandwidth. The new non-blocking crossbar switch architecture is split across a plurality of semiconductor dice, including a plurality of interleaved crossbar switch segments. Only one crossbar switch segment is implemented on each semiconductor die. A plurality of input ports and output ports are coupled to the crossbar switch. The crossbar switch is non-blocking, i.e. any one output port not currently receiving data may receive data from any one input port.
H04Q 3/52 - Circuit arrangements for indirect selecting controlled by common circuits, e.g. register controller, marker using static devices in switching stages, e.g. electronic switching arrangements
H04Q 3/60 - Arrangements providing connection between main exchange and sub-exchange or satellite for connecting to satellites or concentrators which connect one or more exchange lines with a group of local lines
51.
Registration of a PUF signature and regeneration using a trellis decoder
A physically unclonable function includes a circuit that translates a normally distributed sequence of raw sample into a sequence of uniformly distributed binned values across sub-bins of bins. Helper circuitry generates centering values and parity bits based on binned values generated during registration. Each centering value is associated with a raw sample value corresponding to a binned value and indicates an offset of a sub-bin in one of the bins. A distance calculator generates a set of distances from each raw sample value based on the centering value associated with the raw sample value. Each distance indicates a difference between the respective raw sample value and a raw sample value equivalent to a midpoint of a sub-bin offset by the associated centering value in a bin. A trellis decoder generates a PUF signature based on the candidate symbols, sets of distances, and parity bits.
H03M 13/25 - Error detection or forward error correction by signal space coding, i.e. adding redundancy in the signal constellation, e.g. Trellis Coded Modulation [TCM]
H03M 13/11 - Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
52.
PROGRAMMABLE STREAM SWITCHES AND FUNCTIONAL SAFETY CIRCUITS IN INTEGRATED CIRCUITS
An integrated circuit (IC) may include a plurality of compute tiles in a data processing array. Each compute tile is configured to perform a data processing function. The IC may include a plurality of interface tiles in the data processing array. The plurality of interface tiles are communicatively linked to the plurality of compute tiles. The IC may include a plurality of programmable stream switches disposed in the plurality of compute tiles and the plurality of interface tiles. The IC may include a functional safety circuit. The functional safety circuit is connected to a selected programmable stream switch of the plurality of programmable stream switches. The functional safety circuit is configured to perform a functional safety function on a plurality of data streams routed to the functional safety circuit from the selected programmable stream switch.
Parameters defining a matrix multiply operation to be implemented in a data processing array can be received. A formulation of the matrix multiply operation is generated based on the parameters. A matrix multiply solution is determined for performing the matrix multiply operation in the data processing array. The matrix multiply solution specifies a spatial and temporal partitioning of the matrix multiply operation for implementation in the data processing array. Synthesizable program code is generated that defines an interface for the data processing array based on the matrix multiply solution. The interface is configured to partition and transfer input data to the data processing array from an external memory and convey output data from the data processing array to the external memory.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
Placement of macros of a circuit design includes mapping the macros to types of sub-circuits of an integrated circuit (IC). The IC includes anchors and instances of each type of the types of sub-circuits. The macros are grouped based on couplings of the macros to the anchors specified in the circuit design. Each group includes one or more macros, and the one or more macros in each group are all coupled to the same set of one or more anchors. A location is selected from alternative locations for each group of macros based on a distance of the location from the same set of anchors. Each location includes one or more instances of one or more types of the types of sub-circuits. The circuit design is placed and routed after selecting the location for each group, and implementation data is generated for making an IC that implements the circuit design.
Embodiments herein describe a configurable packet processing architecture for a SmartNIC or other network device. The configurable architecture includes a plurality of PPEs which are communicatively coupled using a packet bus. A packet can be processed in each of the PPEs. For example, each packet may be first processed by PPE 1, then PPE 2, then PPE 3, and so forth. Moreover, the results of processing the packet at a PPE 1 may affect the operation performed on the packet when it reaches PPE 2 or PPE 3. Thus, the PPEs form a chain where the results determined by a first PPE when processing the packet can affect or change the operation a second PPE performs when processing the same packet.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
56.
COMPRESSION OF SPARSE MATRICES FOR VECTOR PROCESSING
Partition-level compression of an m×n sparse matrix includes determining in each partition, row and column indices of elements having non-zero values. Each partition has s rows and t columns and s
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
57.
U-TURN CIRCUITRY TO CONVERT INTER-LAYER CONNECTIONS OF AN INTEGRATED CIRCUIT DEVICE TO INTRA-LAYER CONNECTIONS
An integrated circuit (IC) device includes a block of integrated circuitry that includes functional circuitry and configurable interface circuitry. The configurable interface circuitry includes output circuitry that routes a node of the functional circuitry to an output node of the block, and input circuitry that selectively routes the output node of the block or an input node of the block to the functional circuitry. The output circuitry may route a selectable subset of multiple nodes of the functional circuitry to respective output nodes of the block, and the input circuitry may be configured to route the output nodes of the block back to the functional circuitry in the absence of an adjacent block (e.g., to repurpose the output circuitry), or in addition to interfacing with the adjacent block.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
G11C 5/06 - Arrangements for interconnecting storage elements electrically, e.g. by wiring
H01L 23/48 - Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads or terminal arrangements
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
H01L 27/06 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including a plurality of individual components in a non-repetitive configuration
58.
DYNAMIC DATA CONVERSION FOR NETWORK COMPUTER SYSTEMS
A computing node for a computing system includes a processor, conversion circuitry, and routing circuitry. The processor generates a data signal based on a function of an application executed by the computing system. The data signal has a first precision format and a first sparse representation. The conversion circuitry receives the data signal from the processor and generate a converted data signal by at least one of converting the first precision format to a second precision format and converting the first sparse representation to a second sparse representation. The routing circuitry transmits the converted data signal to switch circuitry of the computing system.
A method comprises compiling, by a compiler, a received program to provide a compiler output for configuring hardware to implement the received program. The received program relate to packets of data in a memory. The compiling comprising defining by the compiler output a plurality of computational units in the hardware, each of the computational units being configured to receive a packet of data as a stream of words and between a first and a second of the computational units, a first buffer for storing words of a packet and a second buffer for storing data output by the first computational unit.
An integrated circuit (IC) device includes a first IC chip, a second IC chip, and a chip-to-chip interface connected between the first IC chip and the second IC chip. The chip-to-chip interface communicates an interface clock signal and a logic clock signal between the first IC chip and the second IC chip. The interface clock signal is synchronous with a data signal received by one of the first IC chip and the second IC chip. The logic clock signal is asynchronous with the data signal.
An integrated circuit (IC) device includes a first IC chip, a second IC chip, and a chip-to-chip interface connected between the first IC chip and the second IC chip. The chip-to-chip interface communicates an interface clock signal and a logic clock signal between the first IC chip and the second IC chip. A frequency of the interface clock signal is a multiple of a frequency of the logic clock signal.
A thread manager creates multiple threads by to execute a simulation of subsystems of a system-on-chip on multiple processor cores in response to execution of a simulation program. The threads execute multiple cycle-accurate simulation models of the subsystems in parallel in an execution phase of each simulation cycle of a plurality of simulation cycles of the simulation. The threads update interfaces of the simulation models in an update phase of each simulation cycle of the plurality of simulation cycles.
An integrated circuit, IC, device includes a first IC chip (110), a second IC chip (120), and a chip-to-chip interface (112) connected between the first IC chip (110) and the second IC chip (120). The chip-to-chip interface (120) communicates an interface clock signal and a logic clock signal between the first IC chip (110) and the second IC chip (120). The interface clock signal is synchronous with a data signal received by one of the first IC chip (110) and the second IC chip (120). The logic clock signal is asynchronous with the data signal.
Systems, methods, and apparatuses are described that enable IC architectures to enable a single anchor to connect to and accept a variety of chiplets at any port by way of a programming model that enables the anchor or chiplet to dynamically adapt to configurations, requirements, or aspects of any coupled component and provide an interface for the coupled components.
Providing dataflow based guidance for buffer allocation in a multicore circuit architecture includes converting, using computer hardware, an application specified in a high-level programming language into an intermediate representation. Buffers of dataflows of the intermediate representation are detected. Determining whether the buffers are independent or dependent based on an analysis of the dataflows of the intermediate representation. Buffer constraints are generated. The buffer constraints specify whether the buffers are independent and dictate a mapping of the buffers in the multicore circuit architecture.
Implementing burst transfers for predicated accesses in high-level synthesis includes generating, using computer hardware, an intermediate representation of a design specified in a high-level programming language. The design is for an integrated circuit. Using the computer hardware, loop predicate information for one or more conditional statements within a loop body of the intermediate representation is determined. A plurality of memory accesses of the loop body guarded by the one or more conditional statements are determined to be sequential memory accesses based on the predicate information. The intermediate representation is modified by inserting one or more intrinsics therein indicating that the sequential memory accesses are to be implemented using a burst transfer mode of the integrated circuit.
A system includes a network-on-chip (NoC). The system includes a protocol offload engine coupled to the NoC. The protocol offload engine is configured to generate packets of data for a selected protocol. The system includes a data movement processor coupled to the network-on-chip. The data movement processor is configured to receive a microcode instruction and, in response to the microcode instruction, establish data paths in the NoC that communicatively link a plurality of circuits involved in data transfers of a collective communication operation specified by the microcode instruction. The plurality of circuits include the protocol offload engine. The system includes a network transceiver coupled to the protocol offload engine. The network transceiver is configured to send the packets of data formatted by the protocol offload engine.
Chip packages are described herein that includes integrated passive devices embedded in a core of a substrate of the chip package, such as a package substrate or an interposer, that shield routings coupled to inductors from adjacent through-substrate conductive paths (e.g., vias). In one example, a chip package includes an integrated circuit (IC) die mounted to a substrate. A core of the substrate has a plurality of inductor routing vias, a plurality of signal transmission vias, and a plurality of ground and power routing vias. A first integrated passive device (IPD) is disposed in the core and separates at least one of the plurality of inductor routing vias from an adjacent via, the adjacent via being one of the plurality of signal transmission vias or one of the plurality of ground and power routing vias.
H01L 23/31 - Encapsulation, e.g. encapsulating layers, coatings characterised by the arrangement
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
H01L 25/16 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different main groups of groups , or in a single subclass of , , e.g. forming hybrid circuits
In pruning weights from a neural network (NN), a design tool selects a dt-ds pair from a plurality of dt-ds pairs supported by a target device. Each dt-ds pair specifies a data type, dt, and an associated circuit structure, ds, that is configurable to compute d×s operations in parallel on a set of input activations and a matrix of weights of the data type, d is a number of rows in a sub-matrix of the matrix of weights, s is a number of columns in the sub-matrix, and d×s≥1. The design tool selects as pruned weights, one or more subsets of the weights, based at least on each subset of the one or more subsets including d×s weights in the matrix of weights of the layer. If performance of the pruned NN model is satisfactory, the NN is compiled into an execution graph and configuration data.
A communication system includes link circuits that receive serial data over one or more input serial links. The link circuits include a primary link circuit and a secondary link circuit. The secondary link circuit includes a de-serializer circuit configured to receive the serial data from the one or more input serial links and convert the serial data into parallel data, and an aligner circuit comprising a memory. The aligner circuit stops at least one of storing the parallel data in the memory and reading the memory based on a channel bonding signal generated based on a channel bonding symbol within the serial data. The aligner circuit outputs the channel bonding signal to a finite state machine (FSM) circuit of the primary link circuit. The aligner circuit outputs the parallel data based on receiving a read signal from the FSM circuit of the primary link circuit.
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
Safety mechanisms are embedded into a System on a Chip (SoC) and are operable to detect faults present in the logic circuitry in the SoC. Various types of faults in logic circuitry can occur, for example, a bit stuck at 0 or 1, or a transient or temporary fault due to radiation impacting the SoC. SoC devices are required to meet certain automotive safety integrity standards. The most stringent automotive safety integrity level requires that 90% of random latent faults are detected in all relevant logic, including all safety mechanism. Examples disclosed include hardware based checkers and hardware or software based pattern generation methods that achieve high online fault coverage in safety mechanism circuitry used for functional safety. A hardware based safety mechanism monitors the logic circuitry during operation. Any time the safety mechanism detects any faults in the logic circuitry, a fault notification is propagated to upstream logic.
Stacked integrated circuit devices, chip packages and methods for operating a chip package are described herein that provide an increased level of backside protection from physical attacks that could compromise confidentiality or authentication of the integrated circuit device. In one example, a chip stack includes a sacrificial integrated circuit (IC) die stacked with a primary IC die. The sacrificial IC die includes a first split key information source. The primary IC die has security circuitry configured to generate an encryption key based at least in part on first split key information transmitted from the sacrificial IC die across a die-to-die interface to the primary IC die. Separation of the dies to probe or modify of the primary IC die would cause the destruction of split key information required to operate the functional circuitry of the primary IC die.
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices the devices being of types provided for in two or more different subgroups of the same main group of groups , or in a single subclass of ,
73.
FAST CLOCK DOMAIN CROSSING ARCHITECTURE FOR HIGH FREQUENCY TRADING (HFT)
A fast clock domain crossing architecture for high frequency trading includes a receiver that recovers data and a clock of a first clock domain from a communication from an exchange, functional circuitry that generates and a buy/sell command based on the recovered data and the recovered clock, format circuitry that formats the command in a second clock domain, and a transmitter that transmits the formatted command to the exchange. The architecture further includes error detection circuitry that detects bit errors that arise from an asynchronous boundary of the clock domains without increasing a round-trip latency, and/or synchronization circuitry that synchronizes the clock domains, where the synchronization circuitry includes a cleanup PLL that filters input jitter and a phase detector and variable delay line that compensate for latency within the architecture.
G06Q 40/04 - Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
H04L 1/00 - Arrangements for detecting or preventing errors in the information received
H04L 7/00 - Arrangements for synchronising receiver with transmitter
H04L 7/033 - Speed or phase control by the received code signals, the signals containing no special synchronisation information using the transitions of the received signal to control the phase of the synchronising-signal- generating means, e.g. using a phase-locked loop
Embodiments herein describe an integrated circuit that includes a network on chip (NoC) where an egress logic block or switch performs a route lookup for a subsequent (e.g., downstream) switch in the NoC (referred to herein as look-ahead routing). After receiving the packet and a port ID from the egress logic block or the switch, the downstream switch knows, without performing route lookup of its own, on which port it should forward the packet. Thus, if the downstream switch performs other functions that are dependent on knowing the destination port (e.g., arbitration or QoS updating), the downstream switch can perform those functions immediately since the port ID was already determined by, and received from, the previous network element.
Embodiments herein describe a network on chip (NoC) that implements multi-path routing (MPR) between an ingress logic block and an egress logic block. The multiple paths between the ingress and egress logic blocks can be assigned different alias destination IDs corresponding to the same destination ID. The NoC can use the alias destination IDs to route the packets along the different paths through interconnected switches in the NoC.
Embodiments herein describe a memory controller that performs in-line data processing (e.g., cryptography and error correction) and efficiently organizes data and associated metadata in memory. The memory controller generates a data block that includes a processed (e.g., encrypted) dataset and associated metadata (e.g., cryptographic metadata and ECC), and stores the data block in a block of memory (i.e., rather than storing the metadata separately), to minimize a number of access operations. The memory controller may access memory in segments (e.g., BL16 operations) that are smaller than the data blocks. For linear accesses, the memory controller may cache a portion of a data block until a subsequent access operation.
A circuit arrangement includes a reduction operator circuits arranged in a first level of a reduction tree. Each reduction operator circuit accumulates respective products into a respective sum. Quantizer circuits are configured to quantize the sums from the reduction operator circuits into quantized sums, respectively, based on values of the sums relative to respective first thresholds. Another reduction operator circuit is arranged in a second level of the reduction tree and is configured to accumulate the quantized sums and provide a first sum. A second-level quantizer circuit is configured to quantize the first sum into a quantized first sum based on a value of the first sum relative to a second threshold.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
Integrated circuit (IC) protections that prevent exposure or examination of integrated circuitry through backside analysis include a layer or mesh of an electrically conductive, electromagnetic radiation blocking material disposed over a backside of an IC device to prevent backside analysis. An electrically conductive conduit couples the material to a node of the integrated circuitry to provide a signal and/or voltage reference to the node through the layer/mesh. If the layer/mesh is tampered with, the integrated circuitry loses the voltage reference or signal thereby disabling the integrated circuitry. The IC device may include detection circuitry to monitor the node and to generate an alert and/or disable the circuitry upon tampering. The IC device may further include a support substrate, where a substrate between the material/mesh and the integrated circuitry is sufficiently thin that the IC device would be mechanically weak if the support substrate were removed.
A clock buffer has a clock-in port that inputs a reference clock and an enable port that inputs a video-clock-enable signal from a video receiver. The clock buffer generates a video pixel clock signal that has pulses of the reference signal as enabled by the video-clock-enable signal. The video receiver includes a link symbol extractor, a link-to-pixel mapper, and a timing generator that work to mirror the actual pixel data rate from the active period in a blanking period and thereby recover the actual video pixel clock.
G09G 5/399 - Control of the bit-mapped memory using two or more bit-mapped memories, the operations of which are switched in time, e.g. ping-pong buffers
G09G 5/18 - Timing circuits for raster scan displays
H04N 5/445 - Receiver circuitry for displaying additional information
Embodiments herein describe a network on chip (NoC) that implements multi-path routing (MPR) between an ingress logic block and an egress logic block. The multiple paths between the ingress and egress logic blocks can be assigned different alias destination IDs corresponding to the same destination ID. The NoC can use the alias destination IDs to route the packets along the different paths through interconnected switches in the NoC.
Signal routing and EMIR requirements are causing increased demand for metal resources. The cost of metal resources is also an issue. The design and sign-off of on-chip drivers for driving signals from one chip location to another is complicated by requirements for power integrity and signal routing. This disclosure addresses routing resource bottlenecks and power requirements by introducing a low power driver useable in a high speed SERDES scheme. A voltage clipping high speed and low swing driver is disclosed. Threshold switching voltage of the transmitted signal is controlled by a process and temperature compensated biasing scheme. A reference voltage generation circuitry along with a simple receiver demonstrates the capability of this receiver. This transceiver scheme can be used on an on-chip or off-chip SERDES application to send/receive low speed signals serially. Use of this novel technique addresses the metal resource issue along with EMIR and SIPI requirements.
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
Examples described herein generally relate to forming and/or configuring a die stack in a multi-chip device. An example is a method of forming a multi-chip device. Dies are formed. At least two or more of the dies are interchangeable. Characteristics of the at least two or more of the dies that are interchangeable are determined. A die stack comprising the at least two or more of the dies that are interchangeable is formed. Respective placements within the die stack of the at least two or more of the dies that are interchangeable are based on the characteristics.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 23/00 - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS - Details of semiconductor or other solid state devices
83.
DECOUPLING CAPACITOR PARAMETER DETERMINATION FOR A POWER DISTRIBUTION NETWORK
A circuit analysis system performs a method for analyzing a power distribution network by determining a first S-parameter model for a first circuit element of the power distribution network. The first circuit element includes first ports that are coupled to first decoupling capacitors. Each of the first decoupling capacitors is associated with a respective first decoupling capacitor S-parameter model. The first S-parameter model is combined with one or more of the first decoupling capacitor S-parameter models to generate a combined S-parameter model for the power distribution network. Further, an impedance profile for the power distribution network is determined based on the combined S-parameter model.
G06F 30/367 - Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods
84.
MULTIPLIER BLOCK FOR BLOCK FLOATING POINT AND FLOATING POINT VALUES
A mode control circuit operates a circuit arrangement in either a first mode to multiply floating point operands or a second mode to compute a dot product of two vectors of block floating point values. A block of multiplier circuits generates products from first pairs of p-terms. Each p-term is a portion of a significand of one of the floating point operands when operating in the first mode, or a significand of one of the block floating point values when operating in the second mode. An adder tree that is coupled to the block of multiplier circuits sums the products into a final sum. A floating point conversion circuit is configured to generate a floating point value from the final sum and the floating point operands in response to operating in the first mode, and generate a block floating point value from the final sum in response to operating in the second mode.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
85.
SWITCHING BETWEEN REDUNDANT AND NON-REDUNDANT MODES OF SOFTWARE EXECUTION
Executing critical and non-critical sections of program code include executing a non-critical section of a first program by a first processor and executing a non-critical section of a second program by a second processor. The first processor signals the second processor with context to commence redundant execution of the critical section. The second processor switches from executing the second program to executing the critical section of the first program. The first processor executes the critical section of the first program concurrent with the second processor.
Techniques to provide transaction redundancy in an IC include receiving an original memory access request directed to a first memory aperture, constructing a redundant memory access directed to a second memory aperture, and selectively returning a response of the first or second memory aperture to an originator based on contents of the responses. For a write operation, if acknowledgement indicators of the responses indicate success, a response is returned to the originator. For a read operation, if acknowledgement indicators of the responses indicate success and data returned in the response match one another, a response is returned to the originator. If the acknowledgement indicators indicate success, but the data does not match, a retry of the original and redundant read requests is initiated. If any of the acknowledgement indicators do not indicate success, an error is declared. In a mixed-criticality embodiment, the redundant memory access request may be constructed selectively.
Embodiments herein describe a PIM correction circuit. In a base station, TX and RX RF changes, band pass filters, duplexers, and diplexers can have severe memory effects due to their sharp transition bandwidth from pass band to stop band. PIM interference, generated by the TX signals and reflected onto the RX RF chain will include these memory effects. These memory effects make PIM cancellation complex, requiring complicated computations and circuits. However, the embodiments herein use a PIM correction circuit that separates the memory effects of the TX and RX paths from the memory effects of PIM, thereby reducing PIM cancellation complexity and hardware implementation cost.
Multiple classifier models are applied to features of a circuit design after processing the design through a first phase of an implementation flow. Each classifier model is associated with one of multiple directives, the directives are associated with a second phase of the implementation flow, and each classifier model returns a value indicative of likelihood of improving a quality metric. Regressor models of each set of a plurality of sets of regressor models are applied to the features. Each directive is associated with one of the sets of regressor models, and a combined score from each set of regressor models indicates a likelihood of satisfying a constraint. The directives are ranked based on the values indicated by the classifier models and scores from the sets of regressor models, and the circuit design is processed n the second phase of the implementation flow by the design tool using the directive having the highest rank.
Embodiments herein describe a NoC where its internal switches have buffers with pods that can be assigned to different virtual channels. A subset of the pods in a buffer can be grouped together to form a VC. In this manner, different pod groups in a buffer can be assigned to different VCs (or to different types of NoC data units), where VCs that transmit wider data units can be assigned more pods than VCs that transmit narrower data units.
A semiconductor device system comprises an integrated circuit (IC) die. The IC die is configured to operate in a first operating mode during a first period, and a second operating mode during a second period. The first period is associated with enabling an element of the IC die and a first amount of voltage droop. The second period occurs after the first period and is associated with a second amount of voltage droop. The second amount of voltage droop is less than the first amount of voltage droop.
Embodiments herein describe a hardened fractional resampler that includes a fixed filter that supports simultaneous processing of N input samples with minimal additional combinational logic and no additional multipliers. In one embodiment, the fractional resampler is implemented in an integrated circuit using hardened circuit. The embodiments below exploit a pattern in the order filter phases in fractional resampling systems (such as a SSR resampling system) to use filter phases in a single fixed filter to process multiple input samples in parallel, where these filter phases would have been unused in previous resampling systems.
H03M 7/00 - Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information is represented by a different sequence or number of digits
Examples described herein generally relate to clock tree routing in a chip stack. In an example, a multi-chip device includes a chip stack. The chip stack includes chips. The chip stack includes a clock tree. In-chip routing of the clock tree is contained within one logical chip of the chip stack. The chip stack includes leaf nodes disposed in respective chips. Each leaf node of the leaf nodes is electrically connected to the clock tree through a respective leaf-level connection bridge. The respective leaf-level connection bridge extends in an out-of-chip direction through a plurality of the chips.
A yield recovery scheme for configuration memory of an IC device includes asserting an override configuration value on a bitline of memory cells of the configuration memory, where a data node of a faulty one of the memory cells is coupled to a node of configurable circuitry of the IC device, and asserting a wordline of the faulty memory cell while the override configuration value is asserted on the bitline to couple the bitline to the node of the configurable circuitry through the faulty memory cell (i.e., to force a state of the data node to the override configuration value). An identifier of the faulty memory cell may be stored on the IC device (e.g., E-fuses), and control circuitry of the IC device may retrieve the identifier to configure override circuitry of the IC device.
Implementing data flows of an application across a memory hierarchy of a data processing array includes receiving a data flow graph specifying an application for execution on the data processing array. A plurality of buffer objects corresponding to a plurality of different levels of the memory hierarchy of the data processing array and an external memory are identified. The plurality of buffer objects specify data flows. Buffer object parameters are determined. The buffer object parameters define properties of the data flows. Data that configures the data processing array to implement the data flows among the plurality of different levels of the memory hierarchy and the external memory is generated based on the plurality of buffer objects and the buffer object parameters.
Examples herein propose operating redundant ML models which have been trained using a boosting technique that considers hardware faults. The embodiments herein describe performing an evaluation process where the performance of a first ML model is measured in the presence of a hardware fault. The errors introduced by the hardware fault can then be used to train a second ML model. In one embodiment, a second evaluation process is performed where the combined performance of both the first and second trained ML models is measured in the presence of a hardware fault. The resulting errors can then be used when training a third ML model. In this manner, the three trained ML models are trained to be error aware. As a result, during operation, if a hardware fault occurs, the three ML models have better performance relative to three ML models that where not trained to be error aware.
An apparatus includes a data processing array having a plurality of array tiles. The plurality of array tiles include a plurality of compute tiles. The compute tiles include a core coupled to a random-access memory (RAM) in a same compute tile and to a RAM of at least one other compute tile. The data processing array is subdivided into a plurality of partitions. Each partition includes a plurality of array tiles including at least one of the plurality of compute tiles. The apparatus includes a plurality of clock gate circuits being programmable to selectively gate a clock signal provided to a respective one of the plurality of partitions.
Control of a reconfigurable platform can include determining, by a host computer, an interface universally unique identifier (UUID) of an interface of platform circuitry implemented on an accelerator, wherein the accelerator is communicatively linked to the host computer. An electronic request to run a partition design on the accelerator is received by the host computer. In response to the electronic request, the host computer determines an interface UUID for an interface of the partition design and determines compatibility of the partition design with the platform circuitry based on a comparison of the interface UUID of the partition design with the interface UUID of the platform circuitry. The partition design is implemented on the accelerator in response to determining that the partition design is compatible with the platform circuitry.
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
Embodiments herein describe correcting nonlinearity in a Digital-to-Time Converter (DTC) by relaxing a DTC linearity requirement, which results in the correction being co-adapted with a DTC gain calibration loop which can operate in parallel with a DTC integral nonlinearity (INL) correction loop. In one embodiment, the DTC gain calibration loop and the DTC INL correction loop are constrained when determining a nonlinearity correction code to improve the likelihood they converge. Once determined, the nonlinearity correction code can be combined with an digital code output by a time-to-digital converter (TDC) to generate a phase difference between a reference clock and a feedback clock.
H03L 7/08 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop
H03L 7/099 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop concerning mainly the controlled oscillator of the loop
H03M 1/82 - Digital/analogue converters with intermediate conversion to time interval
An adder for fractional logarithmic number system (FLNS) format operands includes a compare-and-swap circuit that inputs first and second FLNS operands represented by fixed point values and provides a greater one as operand x and a lesser or equal one as operand y. Sign bits are sx and sy of x and y, respectively, qx and qy, are integer portions of x and y, respectively, fraction portions of x and y have integer values rx and ry, respectively. The compare-and-swap circuit is configured to provide sx as a sign bit, sz of a sum z=x(1+y/x) for x≠0. A subtraction circuit subtracts (qy+ry/n)−(qx+rx/n) and outputs qα and rα, such that α=y/x, where n=2wr and wr is a bit-width of rx and ry. An approximation circuit provides an approximation of (1+α) to a nearest FLNS value, β, as fixed point value having an integer portion qβ and a fraction portion that has an integer value rβ. A summing circuit adds qx+rx/n+qβ+rβ/n in response to sx=sy, and subtracts qx+rx/n−qβ−rβ/n in response to sx≠sy, to provide the sum as a fixed point value having an integer portion qz and a fraction portion that as an integer has a value rz.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
Embodiments herein describe a processor system that inciudes an integrated, adaptive accelerator. In one embodiment, the processor system includes multiple core complex chiplets that each contain one or processing cores for a host CPU. In addition the processor system inciudes an accelerator chiplet. The processor system can assign one or more of the core complex chiplets to the accelerator chiplet to form an IO device while the remaining core complex chiplets form the CPU for the host. In this manner, rather than the accelerator and the CPU having independent computer resources, the accelerator can be integrated into the processor system of the host so that hardware resources can be divided between the CPU and the accelerator depending on the needs of the particular application(s) executed by the host.