A system and/or an integrated circuit including a multiplier-accumulator execution pipeline which includes a plurality of MACs to implement a plurality of multiply and accumulate operations. A first memory stores filter weights having a Gaussian floating point (“GFP”) data format and a first bit length. Data format conversion circuitry includes circuitry to convert the filter weights from the GFP data format and the first bit length to filter weights having a data format and bit length that are different from the GFP data format and the first bit length. The converted filter weights are output to the MACs, wherein in operation, the MACs are configured to perform the plurality of multiply operations using (a) the input data and (b) the filter weights having the data format and bit length that are different from the GFP data format and the first bit length, respectively.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
H03M 7/24 - Conversion to or from floating-point codes
2.
Single-Weight-Multiple-Data Multiply-Accumulate with Winograd Layers
An integrated circuit device includes a broadcast data path, a weighting-value memory, Winograd conversion circuitry and multiply-accumulate units. The Winograd conversion circuitry executes a first Winograd conversion function with respect to an input data set to render a converted input data set onto the broadcast data path and executes a second Winograd conversion function with respect to a filter-weight data set to store a converted weighting data set within the weighting-value memory. The multiply-accumulate units, coupled in common to the broadcast data path to receive the converted input data set and coupled to receive respective converted weighting data values from the weighting-value memory, execute a parallel sequence of multiply-accumulate operations to generate an interim output data set that is, in turn, converted to a final output data set through execution of a third Winograd conversion function within the Winograd conversion circuitry.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
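The three conversion functions named in the abstract above can be illustrated with a minimal sketch. It assumes the well-known one-dimensional Winograd F(2,3) transform; the abstract does not say which Winograd variant the device uses, and the function names `winograd_f23` and `direct` are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-sample tile, Winograd F(2,3).

    Assumption: the 1-D F(2,3) variant; the abstract does not name one.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # First conversion: transform the input data tile.
    u = [d0 - d2, d1 + d2, d2 - d1, d1 - d3]
    # Second conversion: transform the filter weights (done once per filter).
    v = [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]
    # Multiply step: 4 element-wise products instead of the 6 a direct
    # computation of two outputs would need.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Third conversion: transform the interim results into final outputs.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct(d, g):
    """Reference: the same two outputs computed without any transform."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Both functions return the same two values for the same tile and filter, so `direct` serves as a check on the transformed path.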
An integrated circuit device includes operand storage circuitry to output first and second operands each having a first standard floating point format, multiplier circuitry to multiply the first and second operands to generate a multiplication product having a second standard floating point format, and product accumulation circuitry. The product accumulation circuitry reformats the multiplication product to a coarse floating point format having a reduced numeric range relative to the originally generated multiplication product and then adds the reformatted multiplication product to a previously generated accumulation value, also having the coarse floating point format, to generate an updated accumulation value having the coarse floating point format, storing the updated accumulation value in place of the previously generated accumulation value.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
An integrated circuit device includes one or more broadcast data paths, a weighting-value memory and multiply-accumulate (MAC) units. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes MAC circuits that each receive an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths; generate a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values; and accumulate a sum of the multiplication products. A configuration value stored within a programmable register controls the number of timing cycles over which the sum of the multiplication products is accumulated.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
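A behavioral sketch of the arrangement in the abstract above: every MAC circuit in a unit multiplies its own broadcast input by the unit's shared weight, and a configuration value sets how many cycles are accumulated. The function name, argument layout, and the `accumulate_cycles` parameter standing in for the programmable register are all illustrative assumptions, not details from the source.

```python
def run_mac_units(data_per_cycle, weights_per_unit, accumulate_cycles):
    """Simulate broadcast-data MAC units (behavioral model only).

    data_per_cycle[t][p]   -- value on broadcast data path p in cycle t
    weights_per_unit[u][t] -- weight shared by all MAC circuits of unit u in cycle t
    accumulate_cycles      -- hypothetical stand-in for the programmable register
    Returns sums[u][p], the accumulated sum in circuit p of unit u.
    """
    num_paths = len(data_per_cycle[0])
    sums = [[0] * num_paths for _ in weights_per_unit]
    for t in range(accumulate_cycles):
        for u, unit_weights in enumerate(weights_per_unit):
            shared_weight = unit_weights[t]
            for p in range(num_paths):
                # One multiply-accumulate per MAC circuit per cycle.
                sums[u][p] += data_per_cycle[t][p] * shared_weight
    return sums
```

Changing only `accumulate_cycles` changes the depth of each sum without altering the data or weight wiring, which is the role the abstract gives the configuration value.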
5.
Multiply-Accumulate Engine with Non-Normalized Floating-Point Accumulator
A floating-point summation circuit implemented within an integrated circuit device and having inputs to receive a first normalized floating-point operand having an exponent field and a fraction field, and a non-normalized floating-point operand having an exponent field and a fraction field, the fraction field of the non-normalized floating-point operand having a first significant bit in any of at least two different bit positions. Normalizing circuitry within the floating-point summation circuit generates, at least by normalizing the fraction field of the non-normalized floating-point operand, a second normalized floating-point operand having a value corresponding to that of the non-normalized floating point operand, and adder circuitry within the floating-point summation-circuit generates a floating-point sum by adding the first and second normalized floating-point operands.
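The normalize-then-add sequence can be sketched as follows, assuming unsigned values, an 8-bit fraction whose normalized form has its top bit set, and truncation when bits are shifted out. None of these widths or rounding choices come from the abstract; `normalize`, `add_normalized`, and `value` are hypothetical helpers.

```python
FRACTION_BITS = 8                      # assumed width, not from the source
TOP_BIT = 1 << (FRACTION_BITS - 1)     # normalized fractions have this bit set


def normalize(exponent, fraction):
    """Shift the fraction until its leading 1 sits at TOP_BIT, adjusting the exponent."""
    if fraction == 0:
        return 0, 0
    while fraction >= (TOP_BIT << 1):
        fraction >>= 1                 # truncates the bit shifted out
        exponent += 1
    while fraction < TOP_BIT:
        fraction <<= 1
        exponent -= 1
    return exponent, fraction


def add_normalized(a, b):
    """Add two normalized (exponent, fraction) operands and renormalize the sum."""
    (ea, fa), (eb, fb) = a, b
    e = max(ea, eb)
    # Align the operand with the smaller exponent by shifting it right.
    total = (fa >> (e - ea)) + (fb >> (e - eb))
    return normalize(e, total)


def value(exponent, fraction):
    """Numeric value represented by the pair: fraction * 2**exponent."""
    return fraction * 2.0 ** exponent
```

For example, the non-normalized pair `(2, 0b00100000)` normalizes to `(0, 0b10000000)`; both represent 128.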
An integrated circuit device includes broadcast data paths, a weighting-value memory, multiply-accumulate (MAC) units, and shared shift-out circuitry. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes MAC circuits that each receive an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths; generate a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values; accumulate a sum of the multiplication products; and output the sum of the multiplication products to a respective one of a plurality of serially coupled storage elements within the shared shift-out path.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
7.
BROADCAST DATA MULTIPLY-ACCUMULATE WITH SHARED UNLOAD
An integrated circuit device includes broadcast data paths, a weighting-value memory, multiply-accumulate (MAC) units, and shared shift-out circuitry. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes MAC circuits that each receive an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths; generate a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values; accumulate a sum of the multiplication products; and output the sum of the multiplication products to a respective one of a plurality of serially coupled storage elements within the shared shift-out path.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
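The shared unload step can be modeled as a parallel load into a chain of storage elements followed by serial shifting. This is a sketch under assumptions: the shift direction and the name `unload` are illustrative, and the abstract does not describe timing.

```python
def unload(sums):
    """Parallel-load MAC sums into a chain of storage elements, then shift them out.

    Assumption: values leave from the last element of the chain.
    """
    chain = list(sums)                  # each MAC writes its own storage element
    shifted_out = []
    for _ in range(len(chain)):
        shifted_out.append(chain[-1])   # element at the end of the chain exits
        chain = [None] + chain[:-1]     # every remaining value moves one position
    return shifted_out
```

With this direction the sums emerge in reverse order of their position in the chain, so a consumer would need to know the ordering convention.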
An integrated circuit device includes broadcast data paths, a weighting-value memory, and multiply-accumulate (MAC) units. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes a plurality of MAC circuits coupled respectively to the broadcast data paths, with each of the MAC circuits within a given one of the MAC units (i) receiving an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths, (ii) generating a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values, and (iii) accumulating a sum of the multiplication products.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
An integrated circuit device includes broadcast data paths, a weighting-value memory, and multiply-accumulate (MAC) units. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes a plurality of MAC circuits coupled respectively to the broadcast data paths, with each of the MAC circuits within a given one of the MAC units (i) receiving an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths, (ii) generating a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values, and (iii) accumulating a sum of the multiplication products.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
Multiply-accumulate processors within a tensor processing unit simultaneously execute, in each of a sequence of multiply-accumulate cycles, respective multiply operations using a shared input data operand and respective weighting operands, each of the multiply-accumulate processors applying a new shared input data operand and respective weighting operand in each successive multiply-accumulate cycle to accumulate, as a component of an output tensor, a respective sum-of-multiplication-products.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
Multiply-accumulate processors within a tensor processing unit simultaneously execute, in each of a sequence of multiply-accumulate cycles, respective multiply operations using a shared input data operand and respective weighting operands, each of the multiply-accumulate processors applying a new shared input data operand and respective weighting operand in each successive multiply-accumulate cycle to accumulate, as a component of an output tensor, a respective sum-of-multiplication-products.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
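Read behaviorally, the abstract above describes a matrix-vector product in which one input operand per cycle is shared by all processors. A minimal sketch follows; the name `shared_operand_mac` and the array layout are assumptions made for illustration.

```python
def shared_operand_mac(weights, inputs):
    """Accumulate one output component per multiply-accumulate processor.

    weights[p][t] -- weighting operand used by processor p in cycle t
    inputs[t]     -- input data operand shared by every processor in cycle t
    """
    outputs = [0] * len(weights)
    for t, shared_input in enumerate(inputs):
        # All processors multiply in the same cycle; only the weight differs.
        for p in range(len(weights)):
            outputs[p] += shared_input * weights[p][t]
    return outputs
```

Each entry of the result is one component of the output tensor, namely the dot product of that processor's weight row with the input sequence.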
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations, wherein each multiplier-accumulator circuit includes: (i) a multiplier to multiply first input data, having a first floating point data format, by a filter weight data, having the first floating point data format, and generate and output a product data having a second floating point data format, and (ii) an accumulator, coupled to the multiplier of the associated MAC circuit, to add second input data and the product data output by the associated multiplier to generate sum data. The plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline may be connected in series and, in operation, perform a plurality of concatenated multiply and accumulate operations.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
13.
MAC processing pipelines, circuitry to control and configure same, and methods of operating same
An integrated circuit including control/configure circuitry which interfaces with a plurality of interconnected MACs and/or one or more rows of interconnected MACs. The control/configure circuitry may include a plurality of control/configure circuits, each of which interfaces with at least one MAC pipeline, wherein each pipeline includes a plurality of linearly connected multiplier-accumulator circuits. Each control/configure circuit may include one or more of (i) a configurable input data signal path to provide data to the MACs of the pipeline during the execution sequence(s) and (ii) a configurable output data path for the output data generated by the execution sequence (i.e., input data that was processed via the multiplier-accumulator circuits of the pipeline). In one embodiment, the sum data generated by the accumulator during an execution cycle is stored in the associated MAC for use in the subsequent execution cycle as the second data by the same accumulator of the associated MAC.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
14.
Multiplier-Accumulator Circuitry, and Processing Pipeline including Same
An integrated circuit comprising a plurality of MACs, connected to form a pipeline, to perform a plurality of multiply and accumulate operations, wherein each MAC includes: (A) a multiplier, coupled to memory to (i) receive the multiplier weight data, (ii) multiply first data and the multiplier weight data and (iii) output product data, (B) an accumulator, coupled to the multiplier of the MAC, to add second data and the first product data and output sum data, and (C) a load-store register, coupled to: (i) an output of the accumulator of the associated MAC and (ii) an input of the load-store register of an immediately successive MAC. Each load-store register may include two interconnected registers, and is configurable to, on the same clock cycle, (a) load the initialization data into the accumulator of the immediately successive MAC and (b) store the sum data from the associated MAC into the load-store register.
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to process the data, using filter weights, via a plurality of multiply and accumulate operations. The integrated circuit includes first conversion circuitry, coupled to the pipeline, having inputs to receive a plurality of sets of data, wherein each set of data includes a plurality of data, Winograd conversion circuitry to convert each set of data to a corresponding Winograd set of data, and floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the data of each Winograd set of data to a floating point data format. In operation, the multiplier-accumulator circuits are configured to perform the plurality of multiply and accumulate operations using the data of the plurality of Winograd sets of data from the first conversion circuitry and the filter weights, and generate output data based on the multiply and accumulate operations.
An integrated circuit comprising a plurality of MAC processors, interconnected into a linear pipeline, configurable to process input data, wherein each MAC processor includes (A) a multiplier, (B) an accumulator circuit, and (C) a plurality of rotate input data paths, wherein each rotate input data path couples two sequential MAC processors of the linear pipeline including an input of the multiplier circuit of a first MAC processor of sequential MAC processors to an input of the multiplier circuit of the immediately following MAC processor of the associated sequential MAC processors of the pipeline, wherein each rotate input data path is configurable to provide rotate input data from a first MAC processor of sequential MAC processors of the linear pipeline to the immediately following MAC processor of the associated sequential MAC processors thereby forming a serial circular path via the plurality of rotate input data paths.
An integrated circuit comprising a plurality of MAC processors, interconnected into a linear pipeline, configurable to process input data, wherein each MAC processor includes (A) a multiplier, (B) an accumulator circuit, and (C) a plurality of rotate input data paths, wherein each rotate input data path couples two sequential MAC processors of the linear pipeline including an input of the multiplier circuit of a first MAC processor of sequential MAC processors to an input of the multiplier circuit of the immediately following MAC processor of the associated sequential MAC processors of the pipeline, wherein each rotate input data path is configurable to provide rotate input data from a first MAC processor of sequential MAC processors of the linear pipeline to the immediately following MAC processor of the associated sequential MAC processors thereby forming a serial circular path via the plurality of rotate input data paths.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
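One way to picture the serial circular path described above is a ring in which each MAC processor multiplies the input value it currently holds, accumulates, and then hands that value to the next processor. The sketch below is a behavioral assumption, not the claimed circuit; `rotate_matvec` and the weight indexing are hypothetical.

```python
def rotate_matvec(weight_rows, data):
    """Dot each weight row with `data` by rotating inputs around a ring of MACs.

    weight_rows[i] -- weights local to MAC processor i
    data[i]        -- input value initially held by MAC processor i
    """
    n = len(data)
    held = list(data)
    acc = [0] * n
    for step in range(n):
        for i in range(n):
            # After `step` rotations, processor i holds data[(i - step) % n].
            acc[i] += weight_rows[i][(i - step) % n] * held[i]
        # Rotate: each processor passes its input to the next; the last wraps to the first.
        held = [held[-1]] + held[:-1]
    return acc
```

After as many steps as there are processors, every input has visited every processor once, so each accumulator holds a full dot product without any input being re-read from memory.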
An integrated circuit including a plurality of processing components to process image data of a plurality of image frames, wherein each image frame includes a plurality of stages. Each processing component includes a plurality of execution pipelines, wherein each pipeline includes a plurality of multiplier-accumulator circuits configurable to perform multiply and accumulate operations using image data and filter weights, wherein: (i) a first processing component is configurable to process all of the data associated with a first plurality of stages of each image frame, and (ii) a second processing component of the plurality of processing components is configurable to process all of the data associated with a second plurality of stages of each image frame. The first and second processing components process data associated with the first and second plurality of stages, respectively, of a first image frame concurrently.
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series to form a linear pipeline to process first data, via performing a plurality of concatenated multiply and accumulate operations, and generate MAC output data, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes (i) a multiplier to multiply first data by a multiplier weight data and generate a product data, and (ii) an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit further includes an activation circuit, connected to the output of the linear pipeline of the plurality of multiplier-accumulator circuits, to receive the MAC output data and process the MAC output data, via a non-linear activation function, to generate MAC pipeline output data.
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series to form a linear pipeline to process first data, via performing a plurality of concatenated multiply and accumulate operations, and generate MAC output data, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes (i) a multiplier to multiply first data by a multiplier weight data and generate a product data, and (ii) an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit further includes an activation circuit, connected to the output of the linear pipeline of the plurality of multiplier-accumulator circuits, to receive the MAC output data and process the MAC output data, via a non-linear activation function, to generate MAC pipeline output data.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
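The concatenated multiply and accumulate operations followed by an activation circuit can be sketched as below. ReLU is used purely as an example of a non-linear activation function; the abstract does not name one, and all function names here are hypothetical.

```python
def mac_pipeline(first_data, weights, initial=0.0):
    """Pass a running sum through MACs connected in series."""
    running = initial
    for x, w in zip(first_data, weights):
        product = x * w              # multiplier of this MAC
        running = running + product  # accumulator: second data + product data
    return running


def relu(x):
    """Example non-linear activation (an assumption; the source names none)."""
    return max(0.0, x)


def pipeline_with_activation(first_data, weights):
    """Apply the activation to the MAC output data of the linear pipeline."""
    return relu(mac_pipeline(first_data, weights))
```

Placing the activation after the last MAC, rather than inside each one, matches the abstract's description of an activation circuit connected to the output of the pipeline.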
21.
Multiplier-accumulator processing pipelines and processing component, and methods of operating same
An integrated circuit including a plurality of processing components to process image data of a plurality of image frames, wherein each image frame includes a plurality of stages. Each processing component includes a plurality of execution pipelines, wherein each pipeline includes a plurality of multiplier-accumulator circuits configurable to perform multiply and accumulate operations using image data and filter weights, wherein: (i) a first processing component is configured to process all of the data associated with a first plurality of stages of each image frame, and (ii) a second processing component of the plurality of processing components is configured to process all of the data associated with a second plurality of stages of each image frame. The first and second processing components process data associated with the first and second plurality of stages, respectively, of a first image frame concurrently.
An integrated circuit including configurable multiplier-accumulator circuitry, wherein, during processing operations, a plurality of the multiplier-accumulator circuits are serially connected into pipelines to perform concatenated multiply and accumulate operations. The integrated circuit includes a first memory and a second memory, and a switch interconnect network, including configurable multiplexers arranged in a plurality of switch matrices. The first and second memories are configurable as either a dedicated read memory or a dedicated write memory and connected to a given pipeline, via the switch interconnect network, during a processing operation performed thereby; wherein, during a first processing operation, the first memory is dedicated to write data to a first pipeline and the second memory is dedicated to read data therefrom and, during a second processing operation, the first memory is dedicated to read data from a second pipeline and the second memory is dedicated to write data thereto.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
H03K 19/17736 - Structural details of routing resources
G06F 5/16 - Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers
23.
MULTIPLIER CIRCUIT ARRAY, MAC AND MAC PIPELINE INCLUDING SAME, AND METHODS OF CONFIGURING SAME
An integrated circuit comprising a MAC pipeline including a plurality of MACs connected in series to perform concatenated multiply and accumulate operations, wherein each MAC includes a multiplier circuit array, including a plurality of multiplier circuits, to multiply first data and weight data and generate product data. The plurality of multiplier circuits, in one embodiment, includes a first multiplier circuit to multiply first portions of the first data and the weight data to generate a first field, and a second multiplier circuit to multiply second portions of the first data and the weight data to generate a second field, wherein the product data includes data which is representative of the first field and the second field. An accumulator circuit adds the product data, output from the associated multiplier circuit array, and second data. The multiply cores of the first and second multiplier circuits are separate and different.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
24.
Multiplier Circuit Array, MAC and MAC Pipeline including Same, and Methods of Configuring Same
An integrated circuit comprising a MAC pipeline including a plurality of MACs connected in series to perform concatenated multiply and accumulate operations, wherein each MAC includes a multiplier circuit array, including a plurality of multiplier circuits, to multiply first data and weight data and generate product data. The plurality of multiplier circuits, in one embodiment, includes a first multiplier circuit to multiply first portions of the first data and the weight data to generate a first field, and a second multiplier circuit to multiply second portions of the first data and the weight data to generate a second field, wherein the product data includes data which is representative of the first field and the second field. An accumulator circuit adds the product data, output from the associated multiplier circuit array, and second data. The multiply cores of the first and second multiplier circuits are separate and different.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
A method of routing interconnects of a field programmable gate array including: a plurality of logic tiles, and a tile-to-tile interconnect network, having a plurality of tile-to-tile interconnects to interconnect logic tile networks of the logic tiles, the method comprises: routing a first plurality of tile-to-tile interconnects in a first plurality of logic tiles. After routing the first plurality of tile-to-tile interconnects, routing a second plurality of tile-to-tile interconnects in a second plurality of logic tiles. The start/end point of each tile-to-tile interconnect in the first plurality and the second plurality of tiles is independent of the start/end point of the other tile-to-tile interconnects in the first and second plurality, respectively. Routing the second plurality of tile-to-tile interconnects includes connecting at least one start/end point of each tile-to-tile interconnect in the second plurality of tiles to at least one start/end point of each interconnect in the first plurality of tiles.
An integrated circuit including memory to store image data and filter weights, and a plurality of multiply-accumulator execution pipelines, each multiply-accumulator execution pipeline coupled to the memory to receive (i) image data and (ii) filter weights, wherein each multiply-accumulator execution pipeline processes the image data, using associated filter weights, via a plurality of multiply and accumulate operations. In one embodiment, the multiply-accumulator circuitry of each multiply-accumulator execution pipeline, in operation, receives a different set of image data, each set including a plurality of image data, and, using filter weights associated with the received set of image data, processes the set of image data associated therewith, via performing a plurality of multiply and accumulate operations concurrently with the multiply-accumulator circuitry of the other multiply-accumulator execution pipelines, to generate output data. Each set of image data includes all of the image data that correlates to the output data generated therefrom.
An integrated circuit comprising a plurality of MAC pipelines wherein each MAC pipeline includes: (i) a plurality of MACs connected in series and (ii) a plurality of data paths including an accumulation data path, wherein each MAC includes a multiplier to generate product data and an accumulator to generate sum data. The integrated circuit further comprises a plurality of control/configure circuits, wherein each control/configure circuit connects directly to and is associated with a MAC pipeline, wherein each control/configure circuit includes an accumulation data path which is configurable to directly connect to the accumulation data path of the MAC pipeline to form an accumulation ring when the control/configure circuit is configured in an accumulation mode, and an output data path configurable to directly connect to the output of the accumulation data path of the MAC pipeline when the control/configure circuit is configured in an output data mode.
An integrated circuit comprising a plurality of MAC pipelines wherein each MAC pipeline includes: (i) a plurality of MACs connected in series and (ii) a plurality of data paths including an accumulation data path, wherein each MAC includes a multiplier to generate product data and an accumulator to generate sum data. The integrated circuit further comprises a plurality of control/configure circuits, wherein each control/configure circuit connects directly to and is associated with a MAC pipeline, wherein each control/configure circuit includes an accumulation data path which is configurable to directly connect to the accumulation data path of the MAC pipeline to form an accumulation ring when the control/configure circuit is configured in an accumulation mode, and an output data path configurable to directly connect to the output of the accumulation data path of the MAC pipeline when the control/configure circuit is configured in an output data mode.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 5/16 - Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers
29.
MAC processing pipelines having programmable granularity, and methods of operating same
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series in a linear pipeline to perform a plurality of concatenated multiply and accumulate operations, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes: a multiplier to multiply first data by a multiplier weight data and generate a product data, and an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit also includes a plurality of granularity configuration circuits, wherein each granularity configuration circuit is associated with a different multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits to operationally (i) disconnect the multiplier and accumulator of the associated multiplier-accumulator circuit from the linear pipeline during operation or (ii) connect the multiplier and accumulator of the associated multiplier-accumulator circuit to the linear pipeline during operation.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
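The programmable-granularity idea in the abstract above can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the names MacStage, connected and run_pipeline are hypothetical, and the assumption that a disconnected stage passes the running sum through unchanged is ours, since the abstract only says the multiplier and accumulator are disconnected from the linear pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MacStage:
    """One multiplier-accumulator circuit in the linear pipeline (hypothetical model)."""
    weight: float
    connected: bool = True  # hypothetical granularity-configuration bit


def run_pipeline(stages: List[MacStage], inputs: List[float],
                 initial_sum: float = 0.0) -> float:
    """Pass a running sum down the pipeline, one stage per input value."""
    if len(stages) != len(inputs):
        raise ValueError("one input value is expected per stage")
    running = initial_sum
    for stage, x in zip(stages, inputs):
        if stage.connected:
            # Multiplier: first data times weight; accumulator: add the incoming sum.
            running = running + x * stage.weight
        # Assumption: a disconnected stage forwards the incoming sum unchanged.
    return running
```

With weights 2, 3 and 4, inputs of 1.0 everywhere, and the middle stage disconnected, run_pipeline returns 6.0 instead of 9.0, which is the effect of shortening the pipeline by one MAC.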
30.
MAC PROCESSING PIPELINES HAVING PROGRAMMABLE GRANULARITY, AND METHODS OF OPERATING SAME
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series in a linear pipeline to perform a plurality of concatenated multiply and accumulate operations, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes: a multiplier to multiply first data by a multiplier weight data and generate a product data, and an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit also includes a plurality of granularity configuration circuits, wherein each granularity configuration circuit is associated with a different multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits to operationally (i) disconnect the multiplier and accumulator of the associated multiplier-accumulator circuit from the linear pipeline during operation or (ii) connect the multiplier and accumulator of the associated multiplier-accumulator circuit to the linear pipeline during operation.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
31.
MAC processing pipelines, circuitry to control and configure same, and methods of operating same
An integrated circuit including control/configure circuitry which interfaces with a plurality of interconnected (e.g., serially) multiplier-accumulator circuits and/or one or more rows of interconnected (e.g., serially) multiplier-accumulator circuits. The control/configure circuitry may include a plurality of control/configure circuits, each control/configure circuit interfaces with at least one multi-bit MAC execution pipeline, wherein each pipeline includes a plurality of interconnected (e.g., serially) multiplier-accumulator circuits. Each control/configure circuit may include one or more (or all) of (i) a configurable input data signal path to provide data to the MACs of the pipeline during the execution sequence(s), (ii) a configurable accumulation data path for the ongoing/accumulating MAC accumulation totals generated by the MACs during an execution sequence, and (iii) a configurable output data path for the output data generated by execution sequence (i.e., input data that was processed via the multiplier-accumulator circuits or MAC processors of the execution pipeline).
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
32.
MAC PROCESSING PIPELINES, CIRCUITRY TO CONTROL AND CONFIGURE SAME, AND METHODS OF OPERATING SAME
An integrated circuit including control/configure circuitry which interfaces with a plurality of interconnected (e.g., serially) multiplier-accumulator circuits and/or one or more rows of interconnected (e.g., serially) multiplier-accumulator circuits. The control/configure circuitry may include a plurality of control/configure circuits, each control/configure circuit interfaces with at least one multi-bit MAC execution pipeline, wherein each pipeline includes a plurality of interconnected (e.g., serially) multiplier-accumulator circuits. Each control/configure circuit may include one or more (or all) of (i) a configurable input data signal path to provide data to the MACs of the pipeline during the execution sequence(s), (ii) a configurable accumulation data path for the ongoing/accumulating MAC accumulation totals generated by the MACs during an execution sequence, and (iii) a configurable output data path for the output data generated by execution sequence (i.e., input data that was processed via the multiplier-accumulator circuits or MAC processors of the execution pipeline).
An integrated circuit comprising a plurality of multiplier-accumulator circuits, connected in series, wherein the plurality of multiplier-accumulator circuits includes a first MAC circuit, including a multiplier to multiply first data and first multiplier weight data and output first product data, and an accumulator, coupled to the multiplier of the first MAC circuit, to add second data and the first product data and output first sum data. The plurality of multiplier-accumulator circuits further includes a second MAC circuit including a multiplier to multiply third data and second multiplier weight data and output second product data, and an accumulator, coupled to the multiplier of the second MAC circuit and the accumulator of the first MAC circuit, to generate and output second sum data. A first load-store register is coupled to an output of the accumulator of the first MAC circuit and an input of the accumulator of the second MAC circuit.
An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having the logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format wherein the accumulator may be a floating point type.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
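The logarithmic addition-accumulator described above relies on the identity that adding two base-2 logarithms corresponds to multiplying the underlying values. A minimal Python sketch follows, assuming strictly positive operands and a base-2 logarithmic format; sign handling, the actual bit-level format, and the names to_log2 and log_add_accumulate are not taken from the abstract and are hypothetical.

```python
import math
from typing import Iterable


def to_log2(value: float) -> float:
    """Encode a positive value in a base-2 logarithmic format (assumed format)."""
    if value <= 0.0:
        raise ValueError("this sketch only handles strictly positive values")
    return math.log2(value)


def log_add_accumulate(inputs: Iterable[float], weights: Iterable[float],
                       initial: float = 0.0) -> float:
    """Add input and weight in the log domain, convert to float, then accumulate."""
    acc = initial
    for x, w in zip(inputs, weights):
        log_sum = to_log2(x) + to_log2(w)  # logarithmic addition stands in for x * w
        product = 2.0 ** log_sum           # data format conversion to floating point
        acc += product                     # floating point accumulator
    return acc
```

For inputs [2.0, 4.0] and weights [8.0, 0.5], the log-domain sums are 4 and 1, so the accumulated result is 16 + 2 = 18.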
An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having the logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format wherein the accumulator may be a floating point type.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
A system and/or an integrated circuit including: (a) a multiplier-accumulator execution pipeline including multiplier-accumulator circuits to process image data, using associated filter weights, via a plurality of multiply and accumulate operations and (b) first data format conversion circuitry including (i) inputs to receive filter weights of a plurality of sets of filter weights, wherein each set includes a plurality of filter weights each having a block-scaled fraction data format, (ii) conversion circuitry, coupled to the inputs, to convert the filter weights of each set from the block-scaled fraction data format to a floating point data format, and (iii) outputs to output the filter weights having the floating point data format. In operation, the multiplier-accumulator circuits of the pipeline are configured to perform the plurality of multiply and accumulate operations using (a) the image data and (b) the filter weights having the floating point data format.
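The conversion from a block-scaled fraction format to floating point can be illustrated as follows. The abstract does not define the format, so this sketch assumes that every weight in a set is a signed integer fraction with frac_bits fractional bits and that the set shares one power-of-two exponent; the function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def bsf_to_float(fractions: Sequence[int], shared_exponent: int,
                 frac_bits: int = 7) -> List[float]:
    """Convert one set of block-scaled fractions to floating point values.

    Assumed encoding: value = fraction * 2**(shared_exponent - frac_bits).
    """
    scale = math.ldexp(1.0, shared_exponent - frac_bits)
    return [f * scale for f in fractions]
```

For example, bsf_to_float([64, -32, 127], shared_exponent=1) scales each fraction by 2**-6 and returns [1.0, -0.5, 1.984375]. The multiplier-accumulator circuits would then consume these floating point weights together with the image data.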
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles, wherein, during operation, each logic tile is configurable to connect with at least one other logic tile, and wherein each logic tile includes: (1) a normal operating mode and test mode, (2) an interconnect network including a plurality of multiplexers, wherein during operation, the interconnect network of each logic tile is configurable to connect with the interconnect network of at least one other logic tile in the normal operating mode and (3) bitcells to store data. The FPGA also includes control circuitry, electrically connected to each logic tile, to configure each logic tile in a test mode and enable concurrently writing configuration test data into each logic tile of the plurality of logic tiles when the FPGA is in the test mode.
H03K 19/17728 - Reconfigurable logic blocks, e.g. lookup tables
G01R 31/3177 - Testing of logic operation, e.g. by logic analysers
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
38.
MAC processing pipeline having conversion circuitry, and methods of operating same
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to process the data, using filter weights, via a plurality of multiply and accumulate operations. The integrated circuit includes first conversion circuitry, coupled to the pipeline, having inputs to receive a plurality of sets of data, wherein each set of data includes a plurality of data, Winograd conversion circuitry to convert each set of data to a corresponding Winograd set of data, and floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the data of each Winograd set of data to a floating point data format. In operation, the multiplier-accumulator circuits are configured to: perform the plurality of multiply and accumulate operations using the data of the plurality of Winograd sets of data from the first conversion circuitry and the filter weights, and generate output data based on the multiply and accumulate operations.
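The abstract does not say which Winograd variant is used. As an illustration only, the sketch below uses the well-known one-dimensional F(2,3) form, which produces two outputs of a 3-tap filter from four input values with four multiplications. The float() calls stand in for the floating point format conversion step, and all function names are hypothetical.

```python
from typing import List, Sequence


def winograd_input_transform(d: Sequence[float]) -> List[float]:
    """Convert a set of four data values to its Winograd form (F(2,3))."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def winograd_filter_transform(g: Sequence[float]) -> List[float]:
    """Convert three filter weights to the matching Winograd form."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_output_transform(m: Sequence[float]) -> List[float]:
    """Convert the four element-wise products back to two output values."""
    m0, m1, m2, m3 = m
    return [m0 + m1 + m2, m1 - m2 - m3]


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Winograd F(2,3): four multiplies instead of six."""
    u = [float(v) for v in winograd_input_transform(d)]   # format conversion stand-in
    v = [float(w) for w in winograd_filter_transform(g)]
    products = [a * b for a, b in zip(u, v)]               # the multiply step
    return winograd_output_transform(products)


def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference result: plain sliding dot product."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For d = [1, 2, 3, 4] and g = [1, 2, 3], both winograd_f23 and direct_f23 return [14, 20] (as floats in the Winograd case).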
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to process the data, using filter weights, via a plurality of multiply and accumulate operations. The integrated circuit includes first conversion circuitry, coupled to the pipeline, having inputs to receive a plurality of sets of data, wherein each set of data includes a plurality of data, Winograd conversion circuitry to convert each set of data to a corresponding Winograd set of data, and floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the data of each Winograd set of data to a floating point data format. In operation, the multiplier-accumulator circuits are configured to: perform the plurality of multiply and accumulate operations using the data of the plurality of Winograd sets of data from the first conversion circuitry and the filter weights, and generate output data based on the multiply and accumulate operations.
An integrated circuit including configurable multiplier-accumulator circuitry, wherein, during processing operations, a plurality of the multiplier-accumulator circuits are serially connected into pipelines to perform concatenated multiply and accumulate operations. The integrated circuit includes a first memory and a second memory, and a switch interconnect network, including configurable multiplexers arranged in a plurality of switch matrices. The first and second memories are configurable as either a dedicated read memory or a dedicated write memory and connected to a given pipeline, via the switch interconnect network, during a processing operation performed thereby; wherein, during a first processing operation, the first memory is dedicated to write data to a first pipeline and the second memory is dedicated to read data therefrom and, during a second processing operation, the first memory is dedicated to read data from a second pipeline and the second memory is dedicated to write data thereto.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 5/16 - Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
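The alternating roles of the two memories described above resemble a ping-pong buffer: in each processing operation one memory supplies data to a pipeline and the other receives the results, and the roles swap for the next operation. A minimal Python sketch under that reading follows; the pipelines are modelled as plain per-element functions and the name run_operations is hypothetical.

```python
from typing import Callable, List, Sequence


def run_operations(initial: Sequence[float],
                   pipelines: Sequence[Callable[[float], float]]) -> List[float]:
    """Run successive processing operations, swapping the memory roles each time."""
    memories: List[List[float]] = [list(initial), []]
    source = 0  # index of the memory that currently feeds the pipeline
    for pipeline in pipelines:
        destination = 1 - source
        # One memory only supplies data, the other only receives results.
        memories[destination] = [pipeline(x) for x in memories[source]]
        source = destination  # roles swap for the next processing operation
    return memories[source]
```

Running run_operations([1, 2, 3], [lambda x: x * 2, lambda x: x + 1]) returns [3, 5, 7]: the first operation fills the second memory with [2, 4, 6], and the second operation writes [3, 5, 7] back into the first memory.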
A method of routing interconnects of a field programmable gate array including: a plurality of logic tiles, and a tile-to-tile interconnect network, having a plurality of tile-to-tile interconnects to interconnect logic tile networks of the logic tiles, the method comprises: routing a first plurality of tile-to-tile interconnects in a first plurality of logic tiles. After routing the first plurality of tile-to-tile interconnects, routing a second plurality of tile-to-tile interconnects in a second plurality of logic tiles. The start/end point of each tile-to-tile interconnect in the first plurality and the second plurality of tiles is independent of the start/end point of the other tile-to-tile interconnects in the first and second plurality, respectively. Routing the second plurality of tile-to-tile interconnects includes connecting at least one start/end point of each tile-to-tile interconnect in the second plurality of tiles to at least one start/end point of each interconnect in the first plurality of tiles.
An integrated circuit including an FPGA having an input to receive an input data stream which includes a first portion and a second portion, processing circuitry to generate processed data by processing only the first portion of the input data stream via a data processing operation, and an output to output the processed data. The integrated circuit further includes logic circuitry, separate from the FPGA, including an input to receive the input data stream, data alignment circuitry to temporally synchronize the second portion of the input data stream with the processing of the first portion of the input data stream via the processing circuitry, and data combining circuitry to generate an output data stream using the processed data from the FPGA and the second portion of the input data stream received from the data alignment circuitry.
A method of routing interconnects of a field programmable gate array including: a plurality of logic tiles, and a tile-to-tile interconnect network, having a plurality of tile-to-tile interconnects to interconnect logic tile networks of the logic tiles, the method comprises: routing a first plurality of tile-to-tile interconnects in a first plurality of logic tiles. After routing the first plurality of tile-to-tile interconnects, routing a second plurality of tile-to-tile interconnects in a second plurality of logic tiles. The start/end point of each tile-to-tile interconnect in the first plurality and the second plurality of tiles is independent of the start/end point of the other tile-to-tile interconnects in the first and second plurality, respectively. Routing the second plurality of tile-to-tile interconnects includes connecting at least one start/end point of each tile-to-tile interconnect in the second plurality of tiles to at least one start/end point of each interconnect in the first plurality of tiles.
An integrated circuit including a plurality of processing components, including first and second processing components, wherein each processing component includes first memory to store image data and a plurality of multiplier-accumulator execution pipelines, wherein each multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations using data from the first memory and filter weights. The first processing component is configured to process all of the data associated with all of the stages of a first image frame via the plurality of multiplier-accumulator execution pipelines of the first processing component. The second processing component is configured to process all of the data associated with all of the stages of a second image frame via the plurality of multiplier-accumulator execution pipelines of the second processing component, wherein the first image frame and the second image frame are successive image frames.
An integrated circuit including a plurality of processing components, including first and second processing components, wherein each processing component includes first memory to store image data and a plurality of multiplier-accumulator execution pipelines, wherein each multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations using data from the first memory and filter weights. The first processing component is configured to process all of the data associated with all of the stages of a first image frame via the plurality of multiplier-accumulator execution pipelines of the first processing component. The second processing component is configured to process all of the data associated with all of the stages of a second image frame via the plurality of multiplier-accumulator execution pipelines of the second processing component, wherein the first image frame and the second image frame are successive image frames.
An integrated circuit including memory to store image data and filter weights, and a plurality of multiplier-accumulator execution pipelines, each multiplier-accumulator execution pipeline coupled to the memory to receive (i) image data and (ii) filter weights, wherein each multiplier-accumulator execution pipeline processes the image data, using associated filter weights, via a plurality of multiply and accumulate operations. In one embodiment, the multiplier-accumulator circuitry of each multiplier-accumulator execution pipeline, in operation, receives a different set of image data, each set including a plurality of image data, and, using filter weights associated with the received set of image data, processes the set of image data associated therewith, via performing a plurality of multiply and accumulate operations concurrently with the multiplier-accumulator circuitry of the other multiplier-accumulator execution pipelines, to generate output data. Each set of image data includes all of the image that correlates to the output data generated therefrom.
An integrated circuit including memory to store image data and filter weights, and a plurality of multiply-accumulator execution pipelines, each multiply-accumulator execution pipeline coupled to the memory to receive (i) image data and (ii) filter weights, wherein each multiply-accumulator execution pipeline processes the image data, using associated filter weights, via a plurality of multiply and accumulate operations. In one embodiment, the multiply-accumulator circuitry of each multiply-accumulator execution pipeline, in operation, receives a different set of image data, each set including a plurality of image data, and, using filter weights associated with the received set of image data, processes the set of image data associated therewith, via performing a plurality of multiply and accumulate operations concurrently with the multiply-accumulator circuitry of the other multiply-accumulator execution pipelines, to generate output data. Each set of image data includes all of the image that correlates to the output data generated therefrom.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
G06F 7/48 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
48.
Multiplier-accumulator circuitry, and processing pipeline including same
An integrated circuit comprising a plurality of multiply-accumulator circuitry, configurable in a concatenation architecture, to perform a plurality of multiply and accumulate operations, wherein the plurality of multiply-accumulator circuitry is organized into a plurality of groups, including a first group of multiply-accumulator circuitry and a second group of multiply-accumulator circuitry, wherein each group includes: a plurality of MAC circuits, each including a multiplier to multiply data by a multiplier weight data and generate a product data, and an accumulator to add input data and the product data to generate sum data, and wherein the plurality of MAC circuits of each group is organized in at least one row and connected in series to perform a plurality of concatenated multiply and accumulate operations. The integrated circuit also includes configurable interface circuitry to connect and/or disconnect the plurality of MAC circuits of the first and second groups of multiply-accumulator circuitry.
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/1776 - Structural details of configuration resources for memories
An integrated circuit comprising a field programmable gate array (FPGA) including a plurality of logic tiles wherein each logic tile includes circuitry including (i) logic circuitry and (ii) an interconnect network including a plurality of multiplexers. The FPGA further includes a robust memory cell including: three or more storage elements that are more than one time programmable to store a data state, majority detection circuitry to detect a majority data state stored in the three or more storage elements; and an output, coupled to the majority detection circuitry, to output mode select data which is representative of the majority data state stored in the storage elements. The FPGA also includes mode/function select circuitry to configure a mode of operation of at least a portion of the circuitry in one or more of the plurality of logic tiles based on the mode select data.
G01R 31/3177 - Testing of logic operation, e.g. by logic analysers
G06F 30/331 - Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation
H03K 19/17728 - Reconfigurable logic blocks, e.g. lookup tables
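The majority detection performed by the robust memory cell can be illustrated with a short Python sketch. The abstract states three or more storage elements; the restriction to an odd count (to avoid ties) is our assumption, and the function name is hypothetical.

```python
from typing import Sequence


def majority_state(storage_elements: Sequence[int]) -> int:
    """Return the data state held by most of the storage elements (0 or 1)."""
    count = len(storage_elements)
    if count < 3 or count % 2 == 0:
        # Assumption: an odd number of elements, so a majority always exists.
        raise ValueError("expected an odd number (3 or more) of storage elements")
    ones = sum(1 for bit in storage_elements if bit)
    return 1 if ones * 2 > count else 0
```

With three elements storing [1, 0, 1], one corrupted element does not change the mode select data: majority_state returns 1.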
50.
Reconfigurable data processing pipeline, and method of operating same
An integrated circuit including an FPGA, configurable to process data via a plurality of data processing operations, and an ASIC, electrically coupled to logic circuitry of the FPGA via switch interconnect network thereof. In one embodiment, the ASIC includes a plurality of circuit blocks, each circuit block configurable to process data via a data processing operation, and selection circuitry, coupled to the logic circuitry of the FPGA and the plurality of circuit blocks of the ASIC, configurable to connect one or more of the plurality of circuit blocks of the ASIC and the logic circuitry of the FPGA into a first circuit configuration to perform a first data processing operation, and in situ, connect one or more of the plurality of circuit blocks of the ASIC and the logic circuitry of the FPGA into a second circuit configuration to perform a second data processing operation.
An integrated circuit comprising a plurality of one-hot-bit multiplexers interconnected to form a switch interconnect network (e.g., hierarchical and/or mesh type networks), wherein each of the plurality of one-hot-bit multiplexers includes an output, inputs, and input selects, wherein each one-hot-bit multiplexer of the plurality of one-hot-bit multiplexers is capable of receiving: (i) an input select signal to select one of the plurality of inputs, (ii) an operational input signal at a selected input during a normal operation of the switch interconnect network, and (iii) an initialization input signal at the selected input during an initialization operation. The integrated circuit further includes initialization circuitry to generate a plurality of initialization input signals in response to an initialization signal, wherein the initialization input signals (i) have an identical data state and (ii) are applied to the selected input of each of the plurality of one-hot-bit multiplexers during the initialization operation.
An integrated circuit comprising an FPGA including programmable/configurable logic circuitry having a periphery, wherein resources (e.g., memory (e.g., high-speed local RAM), one or more busses, and/or circuitry external to the FPGA (e.g., a processor, a controller and/or system/external memory)) is/are disposed outside the periphery of the programmable/configurable logic circuitry which includes a plurality of logic tiles, wherein at least one logic tile is located completely within the interior of the periphery and wherein each logic tile of the array of logic tiles includes a plurality of I/Os located on the perimeter of the logic tile wherein a first portion of the I/Os are located on a perimeter of the logic tile that is interior to the periphery, and the first portion of I/Os of each logic tile of the plurality of the logic tiles are directly connected to the bus to provide communication between the resources and the logic tiles.
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles, wherein, during operation, each logic tile is configurable to connect with at least one other logic tile, and wherein each logic tile includes: (1) a normal operating mode and test mode, (2) an interconnect network including a plurality of multiplexers, wherein during operation, the interconnect network of each logic tile is configurable to connect with the interconnect network of at least one other logic tile in the normal operating mode and (3) bitcells to store data. The FPGA also includes control circuitry, electrically connected to each logic tile, to configure each logic tile in a test mode and enable concurrently writing configuration test data into each logic tile of the plurality of logic tiles when the FPGA is in the test mode.
H03K 19/17728 - Reconfigurable logic blocks, e.g. lookup tables
G01R 31/3177 - Testing of logic operation, e.g. by logic analysers
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
54.
Multiplier-accumulator circuit, logic tile architecture for multiply-accumulate, and IC including logic tile array
An integrated circuit comprising a plurality of multiply-accumulator circuitry interconnected in a concatenation architecture. Each multiply-accumulator circuitry includes first and second MAC circuits and a load-store register. The first MAC circuit includes a multiplier to multiply first data by a first multiplier weight data and generate a first product data, and an accumulator to add first input data and the first product data to generate first sum data. The second MAC circuit includes a multiplier to multiply second data by a second multiplier weight data and generate a second product data, and an accumulator, coupled to the multiplier of the second MAC circuit and the accumulator of the first MAC circuit, to add the first sum data and the second product data to generate second sum data. The load-store register is coupled to the accumulator of the second MAC circuit to temporarily store the second sum data.
H03K 19/23 - Majority or minority circuits, i.e. giving output having the state of the majority or the minority of the inputs
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
H03K 19/1776 - Structural details of configuration resources for memories
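The data flow through one multiply-accumulator circuitry block (two MAC circuits followed by a load-store register) can be modelled as below. This is a behavioural sketch only; the class and function names are hypothetical, and feeding one block's register value into the next block as its input sum is our assumption about how the concatenation architecture chains blocks.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class DualMacUnit:
    """Two chained MAC circuits plus a load-store register (hypothetical model)."""
    w1: float
    w2: float
    register: float = 0.0

    def step(self, d1: float, d2: float, sum_in: float) -> float:
        sum1 = sum_in + d1 * self.w1   # first MAC: input data plus first product
        sum2 = sum1 + d2 * self.w2     # second MAC: first sum plus second product
        self.register = sum2           # load-store register holds the second sum
        return self.register


def run_chain(units: Sequence[DualMacUnit],
              data_pairs: Sequence[Tuple[float, float]],
              sum_in: float = 0.0) -> float:
    """Assumption: each block's stored sum is the next block's input sum."""
    for unit, (d1, d2) in zip(units, data_pairs):
        sum_in = unit.step(d1, d2, sum_in)
    return sum_in
```

Two blocks with weights (1, 2) and (3, 4), each fed the data pair (1, 1), produce 3 in the first register and 10 in the second.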
An integrated circuit comprising a plurality of multiply-accumulator circuitry interconnected in a concatenation architecture. Each multiply-accumulator circuitry includes first and second MAC circuits and a load-store register. The first MAC circuit includes a multiplier to multiply first data by a first multiplier weight data and generate a first product data, and an accumulator to add first input data and the first product data to generate first sum data. The second MAC circuit includes a multiplier to multiply second data by a second multiplier weight data and generate a second product data, and an accumulator, coupled to the multiplier of the second MAC circuit and the accumulator of the first MAC circuit, to add the first sum data and the second product data to generate second sum data. The load-store register is coupled to the accumulator of the second MAC circuit to temporarily store the second sum data.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
56.
Test circuitry and techniques for logic tiles of FPGA
An integrated circuit comprising a plurality of logic tiles, wherein, during operation, each logic tile is configurable to connect with at least one other logic tile, and wherein each logic tile includes: (1) a normal operating mode, (2) a test mode, (3) an interconnect network including a plurality of multiplexers, wherein during operation, the interconnect network of each logic tile is configurable to electrically connect with the interconnect network of at least one adjacent logic tile of the plurality of logic tiles via one or more tile-to-tile interconnects in the normal operating mode and (4) isolation circuitry, connected between the associated interconnect network and the interconnect network of each adjacent logic tile, configurable to responsively disconnect tile-to-tile interconnects disposed between the interconnect network of each adjacent logic tile in the test mode to thereby electrically disconnect interconnect networks of adjacent logic tiles in the test mode.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
G01R 31/3177 - Testing of logic operation, e.g. by logic analysers
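As a rough illustration of the isolation behaviour described in the abstract above, the sketch below lists which tile-to-tile links stay connected in each mode. It assumes a rectangular grid and a single mode shared by all tiles; the names `Mode` and `active_links` are hypothetical and do not come from the patent.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    TEST = "test"


def active_links(rows, cols, mode):
    """Return the set of tile-to-tile links that remain connected."""
    if mode is Mode.TEST:
        # Isolation circuitry disconnects every tile-to-tile interconnect.
        return set()
    links = set()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.add(((r, c), (r, c + 1)))  # link to the tile on the right
            if r + 1 < rows:
                links.add(((r, c), (r + 1, c)))  # link to the tile below
    return links
```

In a 2 x 2 array this gives four links in the normal operating mode and none in the test mode, so each tile can be exercised on its own.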
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuit (IP) cores implementing embedded field programmable gate arrays (FPGAs) with clusters of multiplier-accumulators (MACs) optimized for neural network inference; software for programming the embedded FPGA IP core using neural network description models; documentation and user guides in electronic format, sold as a unit, for use with the foregoing.
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuits which operate as a neural network inference coprocessor, connecting to a small number of DRAMs and to a host processor, with an optional connection for connecting multiple chips in a chain; software for programming the coprocessor using neural network model description languages; documentation and user guides in electronic format, sold as a unit, for use with the foregoing.
59.
Clock distribution and generation architecture for logic tiles of an integrated circuit and method of operating same
An integrated circuit comprising an array of logic tiles, arranged in an array of rows and columns. The array of logic tiles includes a first logic tile to receive a first external clock signal wherein each logic tile of a first plurality of logic tiles generates the tile clock using (i) the first external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the first plurality, and a second logic tile to receive a second external clock signal wherein each logic tile of a second plurality of logic tiles generates the tile clock using (i) the second external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the second plurality, wherein the first and second external clock signals are the same clock signals.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/17736 - Structural details of routing resources
H03K 19/17796 - Structural details for adapting physical parameters for physical disposition of blocks
H03K 19/17728 - Reconfigurable logic blocks, e.g. lookup tables
60.
Clock architecture, including clock mesh fabric for FPGA, and method of operating same
An integrated circuit comprising (1) an array of logic tiles including a first and a second plurality of logic tiles, wherein each logic tile of the array is configurable to electrically connect with at least one other logic tile, and (2) a clock mesh fabric to provide a mesh clock signal to the first plurality of the logic tiles. Each logic tile of the first plurality includes clock distribution and transmission circuitry including: (1) tile clock generation circuitry configurable to generate a tile clock signal having a skew which is balanced with respect to the tile clock signals of each logic tile of the first plurality of logic tiles, and (2) clock selection circuitry configurable to receive the mesh clock signal and the tile clock signal and responsively output the tile clock to the circuitry which performs operations using or based on the associated tile clock.
H03K 19/17736 - Structural details of routing resources
H03K 19/17704 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form the logic functions being realised by the interconnection of rows and columns
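One way to picture the balanced skew and the clock selection described in the abstract above is the following sketch. It assumes a single row of tiles in which the clock is forwarded tile to tile with a fixed per-hop delay, and it balances skew by padding early tiles; the function names and the delay model are hypothetical, and the patent may balance skew differently.

```python
def balance_tile_clocks(hop_delay_ps, num_tiles):
    """Compute the padding each tile adds so all tile clocks align."""
    # Arrival time of the forwarded clock at each tile along the row.
    arrival = [i * hop_delay_ps for i in range(num_tiles)]
    latest = max(arrival)
    # Earlier tiles add delay so every tile clock matches the latest one.
    padding = [latest - a for a in arrival]
    return arrival, padding


def select_clock(use_mesh, mesh_clock, tile_clock):
    """Clock selection: output either the mesh clock or the tile clock."""
    return mesh_clock if use_mesh else tile_clock
```

For four tiles and a 50 ps hop delay, arrival plus padding is 150 ps at every tile, which is the sense in which the skew is balanced in this model.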
An integrated circuit comprising a plurality of logic tiles, wherein each logic tile (i) is physically adjacent to at least one other logic tile of the plurality and (ii) includes a configurable switch interconnect network including a plurality of switches electrically interconnected and arranged into a plurality of switch matrices, wherein the plurality of switch matrices are arranged into a plurality of stages including: (a) at least two stages that are configured in a hierarchical network, and (b) a mesh stage, wherein each switch matrix of the mesh stage includes an output that is directly connected to an input of a plurality of different switch matrices of the mesh stage and wherein the mesh stage of switch matrices of each logic tile is directly connected to the mesh stage of switch matrices of at least one other logic tile of the plurality of the logic tiles.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/17748 - Structural details of configuration resources
H03K 19/17736 - Structural details of routing resources
H03K 19/1776 - Structural details of configuration resources for memories
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuits which operate as a neural network inference coprocessor, connecting to a small number of dynamic random access memory chips (DRAMs) and to a host processor, with an optional connection for connecting multiple chips in a chain; downloadable software for programming the coprocessor using neural network model description languages; instructional documentation and user guide, sold as a unit, for use with the foregoing.
63.
FPGA HAVING PROGRAMMABLE POWERED-UP/POWERED-DOWN LOGIC TILES, AND METHOD OF CONFIGURING AND OPERATING SAME
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles, wherein, during operation of the field programmable gate array, each logic tile is configurable to connect with at least one logic tile of the plurality of logic tiles, and wherein each logic tile of the plurality of logic tiles includes an interconnect network, including a plurality of multiplexers, and logic circuitry. The field programmable gate array, in a first operational mode, includes a first group of logic tiles that are programmed in a powered-up state wherein each logic tile of the first group of logic tiles consumes electrical power during operation, and a second group of logic tiles of the plurality of logic tiles are programmed in a powered-down state wherein each logic tile of the second group of logic tiles does not consume electrical power during operation.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
64.
FPGA having programmable powered-up/powered-down logic tiles, and method of configuring and operating same
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles, wherein, during operation of the field programmable gate array, each logic tile is configurable to connect with at least one logic tile of the plurality of logic tiles, and wherein each logic tile of the plurality of logic tiles includes an interconnect network, including a plurality of multiplexers, and logic circuitry. The field programmable gate array, in a first operational mode, includes a first group of logic tiles that are programmed in a powered-up state wherein each logic tile of the first group of logic tiles consumes electrical power during operation, and a second group of logic tiles of the plurality of logic tiles are programmed in a powered-down state wherein each logic tile of the second group of logic tiles does not consume electrical power during operation.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
An integrated circuit comprising (i) an array of logic tiles wherein each logic tile is configurable to connect with at least one adjacent logic tile and (ii) a clock mesh fabric including a clock mesh to provide a mesh clock signal to each of the logic tiles of the array of logic tiles. In one embodiment, each logic tile of the array of logic tiles includes (1) distribution and transmission circuitry configurable to provide an associated tile clock to circuitry which performs operations using or based on the associated tile clock, wherein the distribution and transmission circuitry includes circuitry to generate a tile clock signal having a skew which is balanced with respect to the tile clock signals generated by the generation circuitry of each tile, and (2) selection circuitry to responsively output the associated tile clock which corresponds to the mesh clock signal or the tile clock signal.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
66.
CLOCK DISTRIBUTION AND GENERATION ARCHITECTURE FOR LOGIC TILES OF AN INTEGRATED CIRCUIT AND METHOD OF OPERATING SAME
An integrated circuit comprising an array of logic tiles, arranged in an array of rows and columns. The array of logic tiles includes a first logic tile to receive a first external clock signal wherein each logic tile of a first plurality of logic tiles generates the tile clock using (i) the first external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the first plurality, and a second logic tile to receive a second external clock signal wherein each logic tile of a second plurality of logic tiles generates the tile clock using (i) the second external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the second plurality, wherein the first and second external clock signals are the same clock signals.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G11C 11/417 - Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
67.
Clock distribution and generation architecture for logic tiles of an integrated circuit and method of operating same
An integrated circuit comprising an array of logic tiles, arranged in an array of rows and columns. The array of logic tiles includes a first logic tile to receive a first external clock signal wherein each logic tile of a first plurality of logic tiles generates the tile clock using (i) the first external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the first plurality, and a second logic tile to receive a second external clock signal wherein each logic tile of a second plurality of logic tiles generates the tile clock using (i) the second external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the second plurality, wherein the first and second external clock signals are the same clock signals.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
68.
FPGA having a virtual array of logic tiles, and method of configuring and operating same
An integrated circuit comprising a physical array of logic tiles, wherein each logic tile includes a perimeter and a plurality of external I/O disposed in a layout on the perimeter of the logic tile wherein the layout of the external I/O of each logic tile is identical. The physical array includes a first virtual array of logic tiles, programmed to perform data processing operations, including a first plurality of logic tiles of the physical array. The physical array also includes a second virtual array of logic tiles, programmed to perform second operations, including a second plurality of logic tiles of the physical array. The logic tiles of the second plurality are different from the logic tiles of the first plurality. In one embodiment, performance of the data processing operations of the first virtual array is independent from performance of the second operations of the second virtual array.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
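The partitioning described in the abstract above can be sketched as a set operation over tile coordinates. This is a minimal model, assuming tiles are addressed by (row, column) pairs; the function `make_virtual_arrays` and its return format are hypothetical.

```python
def make_virtual_arrays(rows, cols, first, second):
    """Split a physical array of logic tiles into two disjoint virtual arrays."""
    physical = {(r, c) for r in range(rows) for c in range(cols)}
    first, second = set(first), set(second)
    if not first <= physical or not second <= physical:
        raise ValueError("virtual arrays must use tiles of the physical array")
    if first & second:
        # The abstract requires the two pluralities of tiles to be different.
        raise ValueError("virtual arrays must not share tiles")
    return {
        "first": first,
        "second": second,
        "unused": physical - first - second,
    }
```

Because the two sets share no tiles, operations mapped onto one virtual array can run independently of the other in this model.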
An integrated circuit comprising a physical array of logic tiles, wherein each logic tile includes a perimeter and a plurality of external I/O disposed in a layout on the perimeter of the logic tile wherein the layout of the external I/O of each logic tile is identical. The physical array includes a first virtual array of logic tiles, programmed to perform data processing operations, including a first plurality of logic tiles of the physical array. The physical array also includes a second virtual array of logic tiles, programmed to perform second operations, including a second plurality of logic tiles of the physical array. The logic tiles of the second plurality are different from the logic tiles of the first plurality. In one embodiment, performance of the data processing operations of the first virtual array is independent from performance of the second operations of the second virtual array.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G11C 11/417 - Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
70.
Block memory layout and architecture for programmable logic IC, and method of operating same
An integrated circuit comprising a first memory array and programmable/configurable logic circuitry including a plurality of logic tiles wherein each logic tile includes a perimeter, a plurality of external I/O disposed in an I/O layout on the perimeter, wherein the I/O layout of each tile is identical. Each external I/O is configurable as an external I/O to connect to and communicate with external circuitry, or a memory I/O to point-to-point connect to memory located adjacent thereto, or an unused I/O. The first memory array is physically adjacent to a first logic tile on a first portion of the perimeter of the first logic tile which is interior to the periphery of the programmable/configurable logic circuitry, and point-to-point connected to the memory I/O. In operation, circuitry of the first logic tile is configured to read data from and write data to the first memory array via the memory I/O.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/1776 - Structural details of configuration resources for memories
H03K 19/17736 - Structural details of routing resources
H03K 19/17728 - Reconfigurable logic blocks, e.g. lookup tables
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
71.
Integrated circuit including an array of logic tiles, each logic tile including a configurable switch interconnect network
An integrated circuit comprising a plurality of logic tiles, wherein each logic tile (i) is physically adjacent to at least one other logic tile of the plurality and (ii) includes a configurable switch interconnect network including a plurality of switches electrically interconnected and arranged into a plurality of switch matrices, wherein the plurality of switch matrices are arranged into a plurality of stages including: (a) at least two stages that are configured in a hierarchical network, and (b) a mesh stage, wherein each switch matrix of the mesh stage includes an output that is directly connected to an input of a plurality of different switch matrices of the mesh stage and wherein the mesh stage of switch matrices of each logic tile is directly connected to the mesh stage of switch matrices of at least one other logic tile of the plurality of the logic tiles.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
72.
Programmable decoupling capacitance of configurable logic circuitry and method of operating same
An integrated circuit comprising at least one logic tile, wherein at least one logic tile includes a mesh interconnect network. The mesh network includes (i) a plurality of interconnected multiplexers, wherein each multiplexer includes inputs, an output, and a plurality of selection inputs to receive signals that control whether an input is connected to the output and (ii) an inactive/static multiplexer which includes inputs, an output that is inactive/static, and a plurality of selection inputs to receive signals that control whether an input of the inactive/static multiplexer is connected to the output wherein such output is connected to an input of at least one of the interconnected multiplexers of the mesh network. In operation, the selection signals applied to the selection inputs of the inactive/static multiplexer are programmed to concurrently connect two or more inputs to the output of the inactive/static multiplexer.
An integrated circuit includes a field programmable gate array including: (i) a plurality of memory cells (e.g., static memory cells) to store data, wherein each memory cell includes a first output, (ii) a multiplexer including inputs, an output and input selects, (iii) a plurality of poly-silicon conductors, each poly-silicon conductor is disposed in the substrate and connected to the first output of an associated memory cell, (iv) poly-silicon extensions, each poly-silicon extension is (a) connected to an associated poly-silicon conductor and (b) coupled to an associated input select of the multiplexer, wherein the poly-silicon extensions are disposed in the substrate and at least partially under a metal conductor, disposed above the substrate, in the field programmable gate array.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
74.
Programmable decoupling capacitance of configurable logic circuitry and method of operating same
An integrated circuit comprising at least one logic tile including a plurality of multiplexers interconnected into a network configuration, wherein each multiplexer includes a plurality of inputs, an output and a plurality of selection inputs to receive selection signals to determine whether an input of the plurality of inputs is connected to the output. The logic tile further includes (i) at least one inactive multiplexer having an output that is inactive in the network configuration and/or (ii) at least one static multiplexer receiving static selection signals, wherein during operation of the integrated circuit, the selection inputs of the inactive and/or the static multiplexer receive selection signals that responsively connect (whether directly or indirectly) two or more inputs of the inactive and/or the static multiplexer to the output of the inactive multiplexer.
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles physically organized in at least one row and at least one column and wherein each logic tile (i) is electrically coupled and physically adjacent to at least one other logic tile of the plurality of logic tiles and (ii) includes (a) logic circuitry, (b) memory, and (c) a configurable switch interconnect network which is electrically coupled to the memory, wherein the configurable switch interconnect network includes a plurality of switches electrically interconnected and organized into a plurality of switch matrices and wherein the plurality of switch matrices are arranged in a plurality of stages. In one embodiment, each logic tile of the plurality of logic tiles is capable of communicating, during operation, with at least one other logic tile of the plurality of logic tiles.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
76.
Multiplexer-memory cell circuit, layout thereof and method of manufacturing same
An integrated circuit includes a field programmable gate array including: (i) a plurality of memory cells (e.g., static memory cells) to store data, wherein each memory cell includes a first output, (ii) a multiplexer including inputs, an output and input selects, (iii) a plurality of poly-silicon conductors, each poly-silicon conductor is disposed in the substrate and connected to the first output of an associated memory cell, (iv) poly-silicon extensions, each poly-silicon extension is (a) connected to an associated poly-silicon conductor and (b) coupled to an associated input select of the multiplexer, wherein the poly-silicon extensions are disposed in the substrate and at least partially under a metal conductor in the field programmable gate array.
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
77.
BLOCK MEMORY LAYOUT AND ARCHITECTURE FOR PROGRAMMABLE LOGIC IC, AND METHOD OF OPERATING SAME
An integrated circuit comprising programmable/configurable logic circuitry including a plurality of logic tiles, arranged in an array, wherein each logic tile includes logic circuitry and I/O connected in an interconnect network via multiplexers. A first logic tile includes (i) a first portion of a perimeter which forms at least a portion of the periphery of the programmable/configurable logic circuitry and (ii) a second portion of a perimeter which is interior to such circuitry's periphery, wherein memory I/O is disposed on the second portion of the perimeter of the first logic tile. A second logic tile includes a second portion of a perimeter which is interior to the programmable/configurable logic circuitry's periphery and opposes the first logic tile's perimeter. Memory array(s), located between the second portions of the perimeters of the first and second logic tiles, is/are coupled to memory I/O of at least the first logic tile.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
78.
Block memory layout and architecture for programmable logic IC, and method of operating same
An integrated circuit comprising programmable/configurable logic circuitry including a plurality of logic tiles, arranged in an array, wherein each logic tile includes logic circuitry and I/O connected in an interconnect network via multiplexers. A first logic tile includes (i) a first portion of a perimeter which forms at least a portion of the periphery of the programmable/configurable logic circuitry and (ii) a second portion of a perimeter which is interior to such circuitry's periphery, wherein memory I/O is disposed on the second portion of the perimeter of the first logic tile. A second logic tile includes a second portion of a perimeter which is interior to the programmable/configurable logic circuitry's periphery and opposes the first logic tile's perimeter. Memory array(s), located between the second portions of the perimeters of the first and second logic tiles, is/are coupled to memory I/O of at least the first logic tile.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuit cores; integrated circuit (IP) cores implementing embedded field programmable gate arrays (FPGAs) with a wide range of options for use in any type of digital integrated circuit; integrated circuits and semiconductor devices; components and parts for integrated circuits and semiconductor devices; computer components; software for programming integrated circuit cores; software for programming embedded FPGA IP cores; software for programming integrated circuits and semiconductor devices; documentation and user guides, sold as a unit; electronic documentation and user guides; downloadable documentation and user guides, for use with the foregoing, in International Class 9.
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuit cores; integrated circuit (IP) cores implementing embedded field programmable gate arrays (FPGAs) with a wide range of options for use in any type of digital integrated circuit; integrated circuits and semiconductor devices; components and parts for integrated circuits and semiconductor devices; computer components; software for programming integrated circuit cores; software for programming embedded FPGA IP cores; software for programming integrated circuits and semiconductor devices; documentation and user guides, sold as a unit; electronic documentation and user guides; downloadable documentation and user guides, for use with the foregoing, in International Class 9.
81.
Clock distribution architecture for logic tiles of an integrated circuit and method of operation thereof
An integrated circuit includes a plurality of logic tiles, arranged in an array, wherein, during operation, each logic tile is configurable to connect with at least one logic tile that is adjacent thereto, and wherein each logic tile includes: clock distribution and transmission circuitry, configurable to (i) receive at least one tile input clock signal from one or more logic tiles which is/are adjacent thereto and (ii) to transmit at least one tile output clock signal to one or more logic tiles which is/are adjacent thereto; tile clock generation circuitry which is configurable to generate at least one tile clock using or based on the at least one input clock signal; circuitry, coupled to clock distribution and transmission circuitry, to disable circuitry of the clock distribution and transmission circuitry; and logic circuitry to perform operations using or based on at least one tile clock.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G06F 1/08 - Clock generators with changeable or programmable clock frequency
82.
Mixed-radix and/or mixed-mode switch matrix architecture and integrated circuit, and method of operating same
An integrated circuit comprising a plurality of switch matrices wherein the plurality of switch matrices are arranged in stages including (i) a first stage, configured in a hierarchical network (for example, a radix-4 network), (ii) a second stage configured in a hierarchical network (for example, a radix-2 or radix-3 network) and coupled to switches of the first stage, and (iii) a third stage configured in a mesh network and coupled to switches of the first and/or second stages. In one embodiment, the third stage of switch matrices is located between the first stage and second stage of switch matrices; in another embodiment, the third stage is the highest stage.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
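The staged arrangement in the abstract above can be illustrated with a small indexing sketch. It assumes that each hierarchical stage groups a fixed number of lower-level units (4 in the first stage, 2 in the second) and that the mesh stage links neighbouring top-level switch matrices in a line; the functions `hierarchy_path` and `mesh_neighbors` and this topology are hypothetical simplifications.

```python
def hierarchy_path(element, radices):
    """Return the switch-matrix index reached at each hierarchical stage."""
    path = []
    index = element
    for radix in radices:
        # Each stage groups `radix` lower-level units under one switch matrix.
        index //= radix
        path.append(index)
    return path


def mesh_neighbors(index, count):
    """Adjacent switch matrices in an assumed one-dimensional mesh stage."""
    return [i for i in (index - 1, index + 1) if 0 <= i < count]
```

For example, with radices (4, 2), element 13 reaches first-stage switch matrix 3 and second-stage switch matrix 1, which in a four-matrix mesh stage is directly linked to matrices 0 and 2.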
83.
Multiplexer-memory cell circuit, layout thereof and method of manufacturing same
An integrated circuit includes a plurality of logic tiles, wherein each logic tile includes a plurality of edges and is configurable to communicate, during operation, with at least one adjacent logic tile, and wherein a first logic tile includes: (i) a plurality of static memory cells to store data, wherein each memory cell includes a first output, (ii) a multiplexer including inputs, an output and input selects, (iii) a plurality of poly-silicon conductors, each poly-silicon conductor is disposed in the substrate and connected to the first output of an associated memory cell, (iv) poly-silicon extensions, each poly-silicon extension is (a) connected to an associated poly-silicon conductor and (b) coupled to an associated input select of the multiplexer, wherein the poly-silicon extensions are disposed in the substrate and at least partially under a metal conductor in the first logic tile.
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
84.
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuit intellectual property (IP) cores implementing embedded field programmable gate arrays (FPGAs) with a wide range of options for use in any type of digital integrated circuit; software for programming the embedded FPGA IP core; documentation and user guides, sold as a unit, for use with the foregoing
85.
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated circuit intellectual property (IP) cores implementing embedded field programmable gate arrays (FPGAs) with a wide range of options for use in any type of digital integrated circuit; software for programming the embedded FPGA IP core; documentation and user guides, sold as a unit, for use with the foregoing
86.
Mixed-radix and/or mixed-mode switch matrix architecture and integrated circuit, and method of operating same
An integrated circuit comprising a plurality of logic tiles, wherein each logic tile includes a plurality of (i) computing elements and (ii) switch matrices. The plurality of switch matrices are arranged in stages including (i) a first stage, configured in a hierarchical network (for example, a radix-4 network), wherein each switch matrix of the first stage is connected to at least one associated computing element, (ii) a second stage configured in a hierarchical network (for example, a radix-2 or radix-3 network) and coupled to switches of the first stage, and (iii) a third stage configured in a mesh network and coupled to switches of the first and/or second stages. In one embodiment, the third stage of switch matrices is located between the first stage and second stage of switch matrices; in another embodiment, the third stage is the highest stage.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
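The staged arrangement described in the abstract above can be made concrete with a small counting sketch. It assumes that "radix-r" means each switch matrix in a hierarchical stage aggregates r elements of the stage below, and that the mesh stage is a rectangular grid with four-neighbor links; both readings, and the names `hierarchical_stage_sizes` and `mesh_neighbors`, are illustrative assumptions rather than details from the abstract.

```python
from math import ceil


def hierarchical_stage_sizes(num_elements, radices):
    """Count switch matrices per hierarchical stage.

    Assumption (hypothetical): a radix-r stage groups r units of the
    stage below under one switch matrix.
    """
    sizes = []
    below = num_elements
    for radix in radices:
        below = ceil(below / radix)
        sizes.append(below)
    return sizes


def mesh_neighbors(index, rows, cols):
    """Return indices of the grid neighbors of one mesh-stage switch."""
    r, c = divmod(index, cols)
    neighbors = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            neighbors.append(nr * cols + nc)
    return neighbors


# 64 computing elements, radix-4 first stage, radix-2 second stage.
sizes = hierarchical_stage_sizes(64, [4, 2])
# Corner switch of a 2 x 4 mesh stage.
corner = sorted(mesh_neighbors(0, 2, 4))
```

Under these assumptions, 64 computing elements need 16 first-stage and 8 second-stage switch matrices, and a corner switch in a 2 x 4 mesh has two neighbors. The mixed radices let the lower stage aggregate aggressively while the mesh stage provides lateral connectivity.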
87.
Mixed-radix and/or mixed-mode switch matrix architecture and integrated circuit, and method of operating same
An integrated circuit comprising a plurality of logic tiles, wherein each logic tile includes a plurality of (i) computing elements and (ii) switch matrices. The plurality of switch matrices are arranged in stages including (i) a first stage, configured in a hierarchical network (for example, a radix-4 network), wherein each switch matrix of the first stage is connected to at least one associated computing element, (ii) a second stage configured in a hierarchical network (for example, a radix-2 or radix-3 network) and coupled to switches of the first stage, and (iii) a third stage configured in a mesh network and coupled to switches of the first and/or second stages. In one embodiment, the third stage of switch matrices is located between the first stage and second stage of switch matrices; in another embodiment, the third stage is the highest stage.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
88.
Clock distribution architecture for logic tiles of an integrated circuit and method of operation thereof
An integrated circuit includes a plurality of logic tiles, wherein each logic tile is configurable to connect with at least one adjacent logic tile. A first logic tile includes: (i) an input clock path which is associated with an edge of the tile and is configured to receive a tile input clock signal, (ii) a plurality of output clock paths, wherein each output clock path is associated with an edge of the tile and includes at least one u-turn circuit to: (a) receive a tile clock signal having a predetermined skew relative to the tile input clock signal and (b) output a tile clock signal having a predetermined skew relative to a tile output clock signal, (iii) a tile clock generation path which includes a plurality of the u-turn circuits to generate a tile clock based on the tile clock signals, and (iv) programmable logic circuitry to perform operations using the tile clock.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G06F 1/08 - Clock generators with changeable or programmable clock frequency
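The notion of a predetermined skew between tile clocks can be illustrated with simple delay arithmetic. The sketch below is a generic delay-balancing model and not the u-turn circuit of the abstract: it assumes a row of tiles in which the clock is forwarded tile to tile with a fixed per-hop delay, and it computes the padding each tile would need for all tile clocks to align. The function names and the picosecond figures are hypothetical.

```python
def arrival_skews(num_tiles, hop_delay_ps):
    """Skew of the forwarded clock at each tile, relative to tile 0.

    Assumption (hypothetical): every tile-to-tile hop adds the same delay.
    """
    return [i * hop_delay_ps for i in range(num_tiles)]


def balancing_delays(skews):
    """Extra delay per tile so every tile clock matches the latest arrival."""
    latest = max(skews)
    return [latest - s for s in skews]


skews = arrival_skews(4, 50)      # clock forwarded across four tiles
pads = balancing_delays(skews)    # padding that equalizes the tile clocks
aligned = [s + p for s, p in zip(skews, pads)]
```

With four tiles and a 50 ps hop, the arrival skews are 0, 50, 100 and 150 ps, and padding of 150, 100, 50 and 0 ps brings every tile clock to the same 150 ps offset. Because each hop's contribution is known in advance, the skew at every tile is predetermined and can be compensated locally.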
89.
Clock distribution architecture for logic tiles of an integrated circuit and method of operation thereof
An integrated circuit includes a plurality of logic tiles, wherein each logic tile includes a plurality of edges and is configurable to connect with an adjacent logic tile. Each logic tile includes a plurality of input/output clock paths, wherein each input/output clock path is associated with a different edge of the logic tile. The plurality of input/output clock paths include a plurality of input clock paths, each input clock path configurable to receive a tile input clock signal from an adjacent first logic tile, and a plurality of output clock paths, each output clock path configurable to output a tile output clock signal to an adjacent second logic tile. An output clock path includes a u-turn circuit to receive a tile clock signal having a first predetermined skew and provide a tile clock signal having a second predetermined skew.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
90.
Clock distribution architecture for logic tiles of an integrated circuit and method of operation thereof
An integrated circuit includes a plurality of logic tiles, wherein each logic tile includes a plurality of edges and is configurable to connect with an adjacent logic tile. Each logic tile includes a plurality of input/output clock paths, wherein each input/output clock path is associated with a different edge of the logic tile. The plurality of input/output clock paths include a plurality of input clock paths, each input clock path configurable to receive a tile input clock signal from an adjacent first logic tile, and a plurality of output clock paths, each output clock path configurable to output a tile output clock signal to an adjacent second logic tile. An output clock path includes a u-turn circuit to receive a tile clock signal having a first predetermined skew and provide a tile clock signal having a second predetermined skew.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form