An integrated circuit device includes broadcast data paths, a weighting-value memory, and multiply-accumulate (MAC) units. The MAC units are coupled in common to each of the broadcast data paths and coupled to receive respective weighting values from the weighting-value memory via respective weighting-value paths. Each of the MAC units includes a plurality of MAC circuits coupled respectively to the broadcast data paths, with each of the MAC circuits within a given one of the MAC units (i) receiving an input data value via a respective one of the broadcast data paths and a shared one of the weighting values via a shared one of the respective weighting-value paths, (ii) generating a sequence of multiplication products by multiplying the input data value with the shared one of the weighting values, and (iii) accumulating a sum of the multiplication products.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
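The broadcast arrangement in the abstract above can be sketched in software as follows. This is only an illustrative model, not the claimed circuit: the function name `broadcast_mac` and the list-based layout of `weights` and `inputs` are assumptions introduced here.

```python
def broadcast_mac(weights, inputs):
    """Illustrative model of MAC units sharing broadcast data paths.

    weights[u][t]: weighting value delivered to MAC unit u in cycle t (assumed layout).
    inputs[t][p]:  input data value on broadcast data path p in cycle t (assumed layout).
    Returns acc[u][p], the sum accumulated by MAC circuit p of MAC unit u.
    """
    num_units = len(weights)
    num_paths = len(inputs[0])
    acc = [[0] * num_paths for _ in range(num_units)]
    for t, row in enumerate(inputs):       # one multiply-accumulate cycle per row
        for u in range(num_units):
            w = weights[u][t]              # one weight shared by every circuit in unit u
            for p in range(num_paths):
                acc[u][p] += w * row[p]    # path p feeds circuit p of every unit
    return acc


result = broadcast_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Under these assumptions the accumulated sums are the entries of a matrix product of the weight array and the input array.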
Multiply-accumulate processors within a tensor processing unit simultaneously execute, in each of a sequence of multiply-accumulate cycles, respective multiply operations using a shared input data operand and respective weighting operands, each of the multiply-accumulate processors applying a new shared input data operand and respective weighting operand in each successive multiply-accumulate cycle to accumulate, as a component of an output tensor, a respective sum-of-multiplication-products.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
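A minimal sketch of the shared-operand scheme in the abstract above, assuming a simple list representation. The name `shared_input_mac` and the argument layout are hypothetical and do not come from the patent.

```python
def shared_input_mac(weights, inputs):
    """Illustrative model: one shared input operand per cycle, one weight per processor.

    weights[i][t]: weighting operand of processor i in cycle t (assumed layout).
    inputs[t]:     the input operand shared by all processors in cycle t.
    Returns one sum-of-multiplication-products per processor.
    """
    sums = [0] * len(weights)
    for t, x in enumerate(inputs):
        # Every processor multiplies the same x by its own weight in this cycle.
        sums = [s + w_row[t] * x for s, w_row in zip(sums, weights)]
    return sums


output_components = shared_input_mac([[1, 2, 3], [4, 5, 6]], [1, 0, -1])
```

Each returned element corresponds to one component of the output tensor described in the abstract.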
3.
MAC PROCESSING PIPELINES, CIRCUITRY TO CONFIGURE SAME, AND METHODS OF OPERATING SAME
An integrated circuit comprising a plurality of MAC processors, interconnected into a linear pipeline, configurable to process input data, wherein each MAC processor includes (A) a multiplier and (B) an accumulator circuit, and (C) a plurality of rotate input data paths, wherein each rotate input data path couples two sequential MAC processors of the linear pipeline including an input of the multiplier circuit of a first MAC processor of sequential MAC processors to an input of the multiplier circuit of the immediately following MAC processor of the associated sequential MAC processors of the pipeline, wherein each rotate input data path is configurable to provide rotate input data from a first MAC processor of sequential MAC processors of the linear pipeline to the immediately following MAC processor of the associated sequential MAC processors thereby forming a serial circular path via the plurality of rotate input data paths.
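The circular rotation of input data described above can be modelled roughly as follows. The abstract does not say how weights are scheduled against the rotating data, so the indexing below (`weights[i][src]`) and the function name `rotate_pipeline` are assumptions made for illustration.

```python
def rotate_pipeline(weights, data):
    """Illustrative model of input data rotating around a ring of MAC processors.

    weights[i][k]: weight that processor i applies to input element k (assumed).
    data[k]:       input element initially held at processor k.
    """
    n = len(data)
    held = list(data)          # held[i]: input value currently at processor i
    acc = [0] * n
    for cycle in range(n):
        for i in range(n):
            src = (i - cycle) % n            # original index of the value now at processor i
            acc[i] += weights[i][src] * held[i]
        # Rotate: each processor passes its value to the following one;
        # the last processor wraps around to the first, closing the ring.
        held = [held[-1]] + held[:-1]
    return acc


sums = rotate_pipeline([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, 2, 3])
```

After as many cycles as there are processors, every processor in this model has seen every input value once.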
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series to form a linear pipeline to process first data, via performing a plurality of concatenated multiply and accumulate operations, and generate MAC output data, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes (i) a multiplier to multiply first data by a multiplier weight data and generate a product data, and (ii) an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit further includes an activation circuit, connected to the output of the linear pipeline of the plurality of multiplier-accumulator circuits, to receive the MAC output data and process the MAC output data, via a non-linear activation function, to generate MAC pipeline output data.
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
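A rough software analogue of the pipeline-plus-activation arrangement in the abstract above. The abstract does not name the non-linear function, so the ReLU default here is an assumption, as are the names `mac_stage` and `pipeline_with_activation`.

```python
def mac_stage(first_data, weight, second_data):
    """One multiplier-accumulator stage: multiply, then add the incoming sum."""
    product = first_data * weight
    return second_data + product


def pipeline_with_activation(inputs, weights, activation=lambda v: max(0, v)):
    """Chain the stages in series, then apply an activation (ReLU assumed)."""
    running = 0
    for x, w in zip(inputs, weights):
        # The sum data of one stage is the second data of the next stage.
        running = mac_stage(x, w, running)
    return activation(running)


positive = pipeline_with_activation([1, 2, 3], [4, -5, 6])
clipped = pipeline_with_activation([1, 2], [1, -3])
```

Any other non-linear function can be passed through the `activation` parameter without changing the pipeline model.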
An integrated circuit comprising a MAC pipeline including a plurality of MACs connected in series to perform concatenated multiply and accumulate operations, wherein each MAC includes a multiplier circuit array, including a plurality of multiplier circuits, to multiply first data and weight data and generate product data. The plurality of multiplier circuits, in one embodiment, includes a first multiplier circuit to multiply first portions of the first data and the weight data to generate a first field, and a second multiplier circuit to multiply second portions of the first data and weight data to generate a second field, wherein the product data includes data which is representative of the first field and the second field. An accumulator circuit adds the product data, output from the associated multiplier circuit array, and second data. The multiply cores of the first and second multiplier circuits are separate and different.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
6.
CONFIGURABLE MAC PIPELINES FOR FINITE-IMPULSE-RESPONSE FILTERING, AND METHODS OF OPERATING SAME
An integrated circuit comprising a plurality of MAC pipelines wherein each MAC pipeline includes: (i) a plurality of MACs connected in series and (ii) a plurality of data paths including an accumulation data path, wherein each MAC includes a multiplier to multiply to generate product data and an accumulator to generate sum data. The integrated circuit further comprises a plurality of control/configure circuits, wherein each control/configure circuit connects directly to and is associated with a MAC pipeline, wherein each control/configure circuit includes an accumulation data path which is configurable to directly connect to the accumulation data path of the MAC pipeline to form an accumulation ring when the control/configure circuit is configured in an accumulation mode, and an output data path configurable to directly connect to the output of the accumulation data path of the MAC pipeline when the control/configure circuit is configured in an output data mode.
An integrated circuit comprising a plurality of multiplier-accumulator circuits connected in series in a linear pipeline to perform a plurality of concatenated multiply and accumulate operations, wherein each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits includes: a multiplier to multiply first data by a multiplier weight data and generate a product data, and an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second data and the product data of the associated multiplier to generate sum data. The integrated circuit also includes a plurality of granularity configuration circuits, wherein each granularity configuration circuit is associated with a different multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits to operationally (i) disconnect the multiplier and accumulator of the associated multiplier-accumulator circuit from the linear pipeline during operation or (ii) connect the multiplier and accumulator of the associated multiplier-accumulator circuit to the linear pipeline during operation.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
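The granularity configuration in the abstract above can be pictured with a per-stage enable flag. The abstract does not describe what a disconnected stage does to the data passing through it, so the bypass behaviour below is an assumption, as is the name `configurable_pipeline`.

```python
def configurable_pipeline(inputs, weights, connected):
    """Illustrative model of a MAC pipeline with per-stage connect/disconnect.

    connected[i] is True when stage i is connected to the linear pipeline.
    A disconnected stage is assumed to pass the running sum through unchanged.
    """
    running = 0
    for x, w, is_connected in zip(inputs, weights, connected):
        if is_connected:
            running += x * w
    return running


full = configurable_pipeline([1, 2, 3, 4], [1, 1, 1, 1], [True, True, True, True])
half = configurable_pipeline([1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False])
```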
8.
MAC PROCESSING PIPELINES, CIRCUITRY TO CONTROL AND CONFIGURE SAME, AND METHODS OF OPERATING SAME
An integrated circuit including control/configure circuitry which interfaces with a plurality of interconnected (e.g., serially) multiplier-accumulator circuits and/or one or more rows of interconnected (e.g., serially) multiplier-accumulator circuits. The control/configure circuitry may include a plurality of control/configure circuits, each control/configure circuit interfaces with at least one multi-bit MAC execution pipeline, wherein each pipeline includes a plurality of interconnected (e.g., serially) multiplier-accumulator circuits. Each control/configure circuit may include one or more (or all) of (i) a configurable input data signal path to provide data to the MACs of the pipeline during the execution sequence(s), (ii) a configurable accumulation data path for the ongoing/accumulating MAC accumulation totals generated by the MACs during an execution sequence, and (iii) a configurable output data path for the output data generated by the execution sequence (i.e., input data that was processed via the multiplier-accumulator circuits or MAC processors of the execution pipeline).
An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having the logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format wherein the accumulator may be a floating point type.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
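The idea behind the logarithmic addition-accumulator above is that adding two logarithms is equivalent to multiplying the underlying values. A minimal sketch, assuming base-2 logarithms and strictly positive values (the abstract specifies neither the base nor the sign handling); `to_log` and `log_add_accumulate` are hypothetical names.

```python
import math


def to_log(value):
    """Convert a positive value to the assumed base-2 logarithmic format."""
    return math.log2(value)


def log_add_accumulate(log_inputs, log_weights):
    """Add log-format operands, convert each sum to floating point, accumulate."""
    total = 0.0
    for log_x, log_w in zip(log_inputs, log_weights):
        log_sum = log_x + log_w      # logarithmic addition stands in for x * w
        total += 2.0 ** log_sum      # format conversion, then floating point accumulation
    return total


# Inputs 2, 4, 8 and weights 4, 2, 1, all expressed as base-2 logarithms.
accumulated = log_add_accumulate([1.0, 2.0, 3.0], [2.0, 1.0, 0.0])
```

The trade is an adder in place of a multiplier, at the cost of converting formats before accumulation.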
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to process the data, using filter weights, via a plurality of multiply and accumulate operations. The integrated circuit includes first conversion circuitry, coupled to the pipeline, having inputs to receive a plurality of sets of data, wherein each set of data includes a plurality of data, Winograd conversion circuitry to convert each set of data to a corresponding Winograd set of data, and floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the data of each Winograd set of data to a floating point data format. In operation, the multiplier-accumulator circuits are configured to: perform the plurality of multiply and accumulate operations using the data of the plurality of Winograd sets of data from the first conversion circuitry and the filter weights, and generate output data based on the multiply and accumulate operations.
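For readers unfamiliar with Winograd conversion, the sketch below uses the standard one-dimensional F(2,3) transform, which produces two outputs of a 3-tap filter with four multiplications. The abstract does not state the tile size or how the filter weights are prepared, so F(2,3), the transformed weights, and all function names are assumptions chosen for illustration.

```python
def winograd_input(d):
    """F(2,3) input transform: four data values to a four-element Winograd set."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def winograd_filter(g):
    """F(2,3) filter transform: three weights to four Winograd-domain weights."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter computed with four multiplications."""
    u = [float(v) for v in winograd_input(d)]   # floating point conversion follows the Winograd conversion
    v = winograd_filter(g)
    m = [a * b for a, b in zip(u, v)]           # element-wise multiplies
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d, g):
    """Reference: the same two outputs computed with six multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


fast = winograd_f23([1, 2, 3, 4], [1, 2, 3])
reference = direct_f23([1, 2, 3, 4], [1, 2, 3])
```

Comparing `fast` with `reference` shows that the transformed computation reproduces the direct result for this input.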
An integrated circuit including a plurality of processing components, including first and second processing components, wherein each processing component includes first memory to store image data and a plurality of multiplier-accumulator execution pipelines, wherein each multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations using data from the first memory and filter weights. The first processing component is configured to process all of the data associated with all of the stages of a first image frame via the plurality of multiplier-accumulator execution pipelines of the first processing component. The second processing component is configured to process all of the data associated with all of the stages of a second image frame via the plurality of multiplier-accumulator execution pipelines of the second processing component, wherein the first image frame and the second image frame are successive image frames.
An integrated circuit including memory to store image data and filter weights, and a plurality of multiply-accumulator execution pipelines, each multiply-accumulator execution pipeline coupled to the memory to receive (i) image data and (ii) filter weights, wherein each multiply-accumulator execution pipeline processes the image data, using associated filter weights, via a plurality of multiply and accumulate operations. In one embodiment, the multiply-accumulator circuitry of each multiply-accumulator execution pipeline, in operation, receives a different set of image data, each set including a plurality of image data, and, using filter weights associated with the received set of image data, processes the set of image data associated therewith, via performing a plurality of multiply and accumulate operations concurrently with the multiply-accumulator circuitry of the other multiply-accumulator execution pipelines, to generate output data. Each set of image data includes all of the image data that correlates to the output data generated therefrom.
G06F 7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
G06F 7/48 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
13.
MULTIPLIER-ACCUMULATOR CIRCUIT, LOGIC TILE ARCHITECTURE FOR MULTIPLY-ACCUMULATE AND IC INCLUDING LOGIC TILE ARRAY
An integrated circuit comprising a plurality of multiply-accumulator circuitry interconnected in a concatenation architecture. Each multiply-accumulator circuitry includes first and second MAC circuits and a load-store register. The first MAC circuit includes a multiplier to multiply first data by a first multiplier weight data and generate a first product data, and an accumulator to add first input data and the first product data to generate first sum data. The second MAC circuit includes a multiplier to multiply second data by a second multiplier weight data and generate a second product data, and an accumulator, coupled to the multiplier of the second MAC circuit and the accumulator of the first MAC circuit, to add the first sum data and the second product data to generate second sum data. The load-store register is coupled to the accumulator of the second MAC circuit to temporarily store the second sum data.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
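The two-MAC circuitry with its load-store register, as described in the abstract above, can be modelled as a small class. The class name `DualMac`, the `step` method, and the way units are chained are assumptions for illustration only.

```python
class DualMac:
    """Illustrative model of multiply-accumulator circuitry with two MAC circuits."""

    def __init__(self):
        self.load_store = 0   # register that temporarily stores the second sum data

    def step(self, first_input, data1, weight1, data2, weight2):
        first_sum = first_input + data1 * weight1    # first MAC circuit
        second_sum = first_sum + data2 * weight2     # second MAC circuit adds to the first sum
        self.load_store = second_sum
        return second_sum


# Concatenation: the output of one unit is the first input data of the next.
unit_a, unit_b = DualMac(), DualMac()
partial = unit_a.step(0, 1, 2, 3, 4)
final = unit_b.step(partial, 1, 1, 1, 1)
```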
14.
FPGA HAVING PROGRAMMABLE POWERED-UP/POWERED-DOWN LOGIC TILES, AND METHOD OF CONFIGURING AND OPERATING SAME
An integrated circuit comprising a field programmable gate array including a plurality of logic tiles, wherein, during operation of the field programmable gate array, each logic tile is configurable to connect with at least one logic tile of the plurality of logic tiles, and wherein each logic tile of the plurality of logic tiles includes an interconnect network, including a plurality of multiplexers, and logic circuitry. The field programmable gate array, in a first operational mode, includes a first group of logic tiles that are programmed in a powered-up state wherein each logic tile of the first group of logic tiles consumes electrical power during operation, and a second group of logic tiles of the plurality of logic tiles are programmed in a powered-down state wherein each logic tile of the second group of logic tiles does not consume electrical power during operation.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
15.
CLOCK DISTRIBUTION AND GENERATION ARCHITECTURE FOR LOGIC TILES OF AN INTEGRATED CIRCUIT AND METHOD OF OPERATING SAME
An integrated circuit comprising an array of logic tiles, arranged in an array of rows and columns. The array of logic tiles includes a first logic tile to receive a first external clock signal wherein each logic tile of a first plurality of logic tiles generates the tile clock using (i) the first external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the first plurality, and a second logic tile to receive a second external clock signal wherein each logic tile of a second plurality of logic tiles generates the tile clock using (i) the second external clock signal or (ii) a delayed version thereof from one of the plurality of output clock paths of a logic tile in the second plurality, wherein the first and second external clock signals are the same clock signals.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G11C 11/417 - Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
16.
FPGA HAVING A VIRTUAL ARRAY OF LOGIC TILES, AND METHOD OF CONFIGURING AND OPERATING SAME
An integrated circuit comprising a physical array of logic tiles, wherein each logic tile includes a perimeter and a plurality of external I/O disposed in a layout on the perimeter of the logic tile wherein the layout of the external I/O of each logic tile is identical. The physical array includes a first virtual array of logic tiles, programmed to perform data processing operations, including a first plurality of logic tiles of the physical array. The physical array also includes a second virtual array of logic tiles, programmed to perform second operations, including a second plurality of logic tiles of the physical array. The logic tiles of the second plurality are different from the logic tiles of the first plurality. In one embodiment, performance of the data processing operations of the first virtual array is independent from performance of the second operations of the second virtual array.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
G11C 11/417 - Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction for memory cells of the field-effect type
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
17.
BLOCK MEMORY LAYOUT AND ARCHITECTURE FOR PROGRAMMABLE LOGIC IC, AND METHOD OF OPERATING SAME
An integrated circuit comprising programmable/configurable logic circuitry including a plurality of logic tiles, arranged in an array, wherein each logic tile includes logic circuitry and I/O connected in an interconnect network via multiplexers. A first logic tile includes (i) a first portion of a perimeter which forms at least a portion of the periphery of the programmable/configurable logic circuitry and (ii) a second portion of a perimeter which is interior to such circuitry's periphery, wherein memory I/O is disposed on the second portion of the perimeter of the first logic tile. A second logic tile includes a second portion of a perimeter which is interior to the programmable/configurable logic circuitry's periphery and opposes the first logic tile's perimeter. Memory array(s), located between the second portions of the perimeters of the first and second logic tiles, is/are coupled to memory I/O of at least the first logic tile.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
18.
MIXED-RADIX AND/OR MIXED-MODE SWITCH MATRIX ARCHITECTURE AND INTEGRATED CIRCUIT, AND METHOD OF OPERATING SAME
An integrated circuit comprising a plurality of logic tiles, wherein each logic tile includes a plurality of (i) computing elements and (ii) switch matrices. The plurality of switch matrices are arranged in stages including (i) a first stage, configured in a hierarchical network (for example, a radix-4 network), wherein, each switch matrix of the first stage is connected to at least one associated computing element, (ii) a second stage configured in a hierarchical network (for example, a radix-2 or radix-3 network) and coupled to switches of the first stage, and (iii) a third stage configured in a mesh network and coupled to switches of the first and/or second stages. In one embodiment, the third stage of switch matrices is located between the first stage and second stage of switch matrices; in another embodiment, the third stage is the highest stage.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form
19.
CLOCK DISTRIBUTION ARCHITECTURE FOR LOGIC TILES OF AN INTEGRATED CIRCUIT AND METHOD OF OPERATION THEREOF
An integrated circuit includes a plurality of logic tiles, wherein each logic tile includes a plurality of edges and is configurable to connect with an adjacent logic tile. Each logic tile includes a plurality of input/output clock paths, wherein each input/output clock path is associated with a different edge of the logic tile. The plurality of input/output clock paths include a plurality of input clock paths, each input clock path configurable to receive a tile input clock signal from an adjacent first logic tile, and a plurality of output clock paths, each output clock path configurable to output a tile output clock signal to an adjacent second logic tile. An output clock path includes a u-turn circuit to receive a tile clock signal having a first predetermined skew and provide a tile clock signal having a second predetermined skew.
H03K 19/177 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components arranged in matrix form