A method, computer system, and a non-transitory computer-readable storage medium for performing primitive batch binning are disclosed. The method, computer system, and non-transitory computer-readable storage medium include techniques for generating a primitive batch from a plurality of primitives, computing respective bin intercepts for each of the plurality of primitives in the primitive batch, and shading the primitive batch by iteratively processing each of the respective bin intercepts computed until all of the respective bin intercepts are processed.
A semiconductor package includes multiple dies that share the same package pin. An output enable register provided on each die is used to select the die that drives an output to the shared pin. A hardware arbitration circuit ensures that two or more dies do not drive an output to the shared pin at the same time.
G06F 13/20 - Handling requests for interconnection or transfer for access to input/output bus
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
The disclosed device for packet coalescing includes detecting a trigger condition for initiating packet coalescing of packet traffic and sending, to an endpoint device, a notification to start packet coalescing. The device can observe a status in response to starting the packet coalescing and report a performance of the packet coalescing. A system can include a controller that detects a trigger condition for packet coalescing and notifies an endpoint device via a notification register. The controller can read a status register to report, based on the read status, a packet coalescing performance. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed device for packet coalescing includes detecting a trigger condition for initiating packet coalescing of packet traffic and sending, to an endpoint device, a notification to start packet coalescing. The device can observe a status in response to starting the packet coalescing and report a performance of the packet coalescing. A system can include a controller that detects a trigger condition for packet coalescing and notifies an endpoint device via a notification register. The controller can read a status register to report, based on the read status, a packet coalescing performance. Various other methods, systems, and computer-readable media are also disclosed.
Systems, apparatuses, and methods for prefetching data by a display controller. From time to time, a performance-state change of a memory are performed. During such changes, a memory clock frequency is changed for a memory subsystem storing frame buffer(s) used to drive pixels to a display device. During the performance-state change, memory accesses may be temporarily blocked. To sustain a desired quality of service for the display, a display controller is configured to prefetch data in advance of the performance-state change. In order to ensure the display controller has sufficient memory bandwidth to accomplish the prefetch, bandwidth reduction circuitry in clients of the system are configured to temporarily reduce memory bandwidth of corresponding clients.
An apparatus and method for efficiently managing performance among multiple integrated circuits in separate semiconductor chips. In various implementations, a computing system includes at least a first processing node and a second processing node. While processing tasks, the first processing node uses a first memory and the second processing node uses a second memory. A first communication channel transfers data between the first processing node and the second processing node. The first processing node accesses the second memory using a second communication channel different from the first communication channel and supports point-to-point communication. The second memory services access requests from the first and second processing nodes as the access requests are received while foregoing access conflict detection. The first processing node accesses the second memory after determining a particular amount of time has elapsed after reception of an indication from the second processing node specifying that a particular task has begun.
A scheduler of an apparatus exposes an application programming interface (API) usable to specify quality-of-service (QoS) parameters, e.g., latency, throughput, and so forth. An application, for instance, specifies the QoS parameters for a workload to be processed using a hardware compute unit. The QoS parameters are employed by the scheduler as a basis to configure a partition within a hardware compute unit. The partition is configured such that processing resources that are available via the partition to process the workload comply with the specified quality-of-service.
An electronic device uses a tiling scheme selected from among a set of tiling schemes for processing instances of input data through a neural network. Each of the tiling schemes is associated with a different arrangement of portions into which instances of input data are divided for processing in the neural network. In operation, processing circuitry in the electronic device acquires information about a neural network and properties of the processing circuitry. The processing circuitry then selects a given tiling scheme from among a set of tiling schemes based on the information. The processing circuitry next processes instances of input data in the neural network using the given tiling scheme. Processing each instance of input data in the neural network includes dividing the instance of input data into portions based on the given tiling scheme, separately processing each of the portions in the neural network, and combining the respective outputs to generate an output for the instance of input data.
Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor determines whether the amount of local memory allocated to store the data of the input tile and padded data for the receptive field.
The disclosed computer-implemented method for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex failure patterns with the telemetry-based unknown outputs to ultimately generate one or more root-cause remedy recommendations that are specific to the identified failure and the client computing device that is experiencing that failure.
A remote display synchronization technique preserves the presence of a local display device for a remotely-rendered video stream. A server and a client device cooperate to dynamically determine a target frame rate for a stream of rendered frames suitable for the current capacities of the server and the client device and networking conditions. The server generates from this target frame rate a synchronization signal that serves as timing control for the rendering process. The client device may provide feedback to instigate a change in the target frame rate, and thus a corresponding change in the synchronization signal. In this approach, the rendering frame rate and the encoding frequency may be “synchronized” in a manner consistent with the capacities of the server, the network, and the client device, resulting in generation, encoding, transmission, decoding, and presentation of a stream of frames that mitigates missed encoding of frames while providing acceptable latency.
A63F 13/355 - Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transform changing game scene into an MPEG-stream for transmitting to a mobile phone or a thin client
A63F 13/358 - Adapting the game course according to the network or server load, e.g. for reducing latency due to different connection speeds between clients
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
A display processing device includes a display device interface and a processing unit. The processing is configured to transition at least a first component of the display processing system into a low-power state in response to an active region of a first video frame of a plurality of video frames having completed. A second component of the display processing device is configured to maintain a temporal count value corresponding to a current frame line of the plurality of video frames, and further to generate a first signal in response to the temporal count value corresponding to a first trigger value. The first signal causes the at least first component to transition out of the low-power state.
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
13.
MULTI-PASS WRITEBACK WITH SINGLE-PASS DISPLAY CONSUMPTION
Techniques described herein allow multi-pass writeback processing of graphical frames (such as those having a high or ultrahigh resolution) to reduce bandwidth for display operations by, for example, splitting an input stream for processing by separate graphical pipelines as two or more spatially segmented portions. After receiving a graphical frame for processing, the graphical frame is spatially segmented into multiple portions. Each of the multiple portions is provided to a respective graphical pipeline of a plurality of graphical pipelines for processing. Each processed portion of the graphical frame is written substantially simultaneously to a corresponding portion of a system memory.
G09G 3/20 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix
G09G 5/393 - Arrangements for updating the contents of the bit-mapped memory
A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key be generated and distributed. Based upon a security mode for the communication, the key generated and distributed to the first component.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
Devices and methods for node traversal for ray tracing are provided, which comprise casting a first ray in a space comprising objects represented by geometric shapes, traversing, for the first ray, at least one first node of an accelerated hierarchy structure representing an approximate volume of a group of the geometric shapes and a second node representing a volume of one of the geometric shapes, casting a second ray in the space, selecting, for the second ray, a starting node of traversal based on locations of intersection of the first ray and the second ray and an identifier which identifies one or more nodes intersected by the first ray and traversing, for the second ray, the accelerated hierarchy structure beginning at the starting node of traversal.
A method and apparatus for storing keys in a key storage block includes processing a key request. A first key is allocated based upon the key request. The first key is stored in the key storage block, wherein the first key is of a first size and includes a first rule.
A technique for servicing a memory request is disclosed. The technique includes obtaining permissions associated with a source and a destination specified by the memory request, obtaining a first set of address translations for the memory request, and executing operations for a first request, using the first set of address translations.
A system and method for determining power-performance state transition thresholds in a computing system. A processor comprises several functional blocks and a power manager. Each of the functional blocks produces data corresponding to an activity level associated with the respective functional block. The power manager determines activity levels of the functional blocks and compares the activity level of a given functional block to a threshold to determine if a power-performance state (P-state) transition is indicated. The threshold is determined in part on a current P-state of the given functional block. When the current P-state of the given functional block is relatively high, the threshold activity level to transition to a higher P-state is higher than it would be if the current P-state were relatively low. The power manager is further configured to determine the thresholds based in part on one or more of a type of circuit being monitored and a type of workload being executed.
An apparatus and method for efficiently routing power signals across a semiconductor die. In various implementations, an integrated circuit includes, at a first node that receives a power supply reference, a first micro through silicon via (TSV) that traverses through a silicon substrate layer to a backside metal layer. The integrated circuit includes, at a second node that receives the power supply reference, a second micro TSV that physically contacts at least one source region. The integrated circuit includes a first power rail that connects the first micro TSV to the second micro TSV. This power rail replaces contacts between the micro TSVs and a second power rail such as the frontside metal zero (M0) layer. Each of the first power rail, the second power rail, and the backside metal layer provides power connection redundancy that increases charge sharing, improves wafer yield, and reduces voltage droop.
H01L 23/528 - Layout of the interconnection structure
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
H01L 23/535 - Arrangements for conducting electric current within the device in operation from one component to another including internal interconnections, e.g. cross-under constructions
A processor for implementing a last use cache policy is configured to access data in a portion of a cache, determine that the data in the portion of the cache is no longer needed, and mark the data in the portion of the cache as non-dirty responsive to the determining that the data in the portion of the cache is no longer needed. The marking of the data as non-dirty is indicative that the data in the portion of the cache is not to be evicted from the cache to a memory.
A device includes one or more processors and one or more parallel accelerated processors. Additionally, a system management unit is configured to monitor the one or more parallel accelerated processors and to obtain one or more power management metrics for a parallel accelerated processor. A host driver included in the device is configured to receive a guest request for one or more power management metrics of the parallel accelerated processor from a virtual function of a virtual machine executing on the device, to transmit a host request for the one or more power management metrics from the host driver to the system management unit in response to receiving the guest request, to receive the one or more power management metrics from the system management unit at the host driver, and to transmit the one or more power management metrics from the host driver to the virtual machine.
Systems, apparatuses, and methods for prefetching data by a display controller are proposed. From time to time, a performance-state change of a memory is performed. During such changes, a memory clock frequency is changed for a memory subsystem (220) storing frame buffer(s) (230) used to drive pixels to a display device (250). During the performance-state change, memory accesses may be temporarily blocked. To sustain a desired quality of service for the display, a display controller (150) is configured to prefetch data in advance of the performance-state change. In order to ensure the display controller has sufficient memory bandwidth to accomplish the prefetch, bandwidth reduction circuitry (112A, 112N) in clients (205) of the system are configured to temporarily reduce memory bandwidth of corresponding clients.
G09G 5/00 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
G06F 1/324 - Power saving characterised by the action undertaken by lowering clock frequency
G06F 1/3234 - Power saving characterised by the action undertaken
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G09G 5/393 - Arrangements for updating the contents of the bit-mapped memory
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
G09G 5/395 - Arrangements specially adapted for transferring the contents of the bit-mapped memory to the screen
23.
SYSTEMS AND METHODS FOR GENERATING REMEDY RECOMMENDATIONS FOR POWER AND PERFORMANCE ISSUES WITHIN SEMICONDUCTOR SOFTWARE AND HARDWARE
The disclosed computer-implemented method for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex failure patterns with the telemetry-based unknown outputs to ultimately generate one or more root-cause remedy recommendations that are specific to the identified failure and the client computing device that is experiencing that failure.
A remote display synchronization technique preserves the presence of a local display device for a remotely-rendered video stream. A server and a client device cooperate to dynamically determine a target frame rate for a stream of rendered frames suitable for the current capacities of the server and the client device and networking conditions. The server generates from this target frame rate a synchronization signal that serves as timing control for the rendering process. The client device may provide feedback to instigate a change in the target frame rate, and thus a corresponding change in the synchronization signal. In this approach, the rendering frame rate and the encoding frequency may be "synchronized" in a manner consistent with the capacities of the server, the network, and the client device, resulting in generation, encoding, transmission, decoding, and presentation of a stream of frames that mitigates missed encoding of frames while providing acceptable latency.
H04N 21/242 - Synchronization processes, e.g. processing of PCR [Program Clock References]
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
H04N 21/24 - Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth or upstream requests
Power management using temperature gradient information is described. In accordance with the described techniques, temperature measurements of a component are obtained from two or more sensors of the component. A temperature of a hotspot of the component is predicted based on the temperature measurements obtained from the two or more sensors of the component. Operation of the component is adjusted based on the predicted temperature of the hotspot.
An integrated circuit includes a power supply monitor, a clock generator, and a divider. The power supply monitor is operable to provide a trigger signal in response to a power supply voltage dropping below a threshold voltage. The clock generator is operable to provide a first clock signal having a frequency dependent on a value of a frequency control word, and to change the frequency of the first clock signal over time using a native slope in response to a change in the frequency control word. The divider is responsive to an assertion of the trigger signal to divide a frequency of the first clock signal by a divide value to provide a second clock signal.
H03L 7/08 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
27.
Leveraging an Adaptive Oscillator for Fast Frequency Changes
Systems, apparatuses, and methods for managing power and performance in a computing system. A system management unit detects a condition indicating a change in a power-performance state of a given computing unit is indicated. In response to detecting the indication, the system management unit is configured to initiate a change to a frequency of a clock signal generated by an adaptive oscillator by changing a voltage supplied to the adaptive oscillator. The adaptive oscillator is configured to rapidly change a frequency of the clock signal generated in response to detecting a change in a droopy supply voltage of the adaptive oscillator. The new frequency generated by the adaptive oscillator is based in part on a difference between the droopy supply voltage and a regulated supply voltage of the adaptive oscillator.
Systems, apparatuses, and methods for prefetching data by a display controller. From time to time, a performance-state change of a memory are performed. During such changes, a memory clock frequency is changed for a memory subsystem storing frame buffer(s) used to drive pixels to a display device. During the performance-state change, memory accesses may be temporarily blocked. In order to reduce visual artifacts that may occur while the memory accesses are blocked, a memory subsystem includes a control circuit configured to enable a caching mode which caches display data provided to the display controller. Subsequent requests for display data from the display controller are then serviced using the cached data instead of accessing memory.
Systems and methods are disclosed for managing diversified virtual memory by an engine. Techniques disclosed include receiving one or more request messages, each request message including a job descriptor that specifies an operation to be performed on a respective virtual memory space, processing the job descriptors by generating one or more commands for transmission to one or more virtual memory managers, and transmitting the one or more commands to the one or more virtual memory managers (VMMs) for processing.
A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key be generated and distributed. Based upon a security mode for the communication, the key generated and distributed to the first component.
Devices and methods for multi-resolution geometric representation for ray tracing are described which include casting a ray in a space comprising objects represented by geometric shapes and approximating a volume of the geometric shapes using an accelerated hierarchy structure. The accelerated hierarchy structure comprises first nodes each representing a volume of one of the geometric shapes in the space and second nodes each representing an approximate volume of a group of the geometric shapes. When the ray is determined to intersect a bounding box of a second node representing one group of the geometric shapes, a selection is made between traversal and non-traversal of other second nodes based on a LOD for representing the volume of the one group of geometric shapes.
Resource use orchestration for multiple application instances is described. In accordance with the described techniques, a time interval for accessing a resource is divided into multiple time slots. In one or more implementations, the resource is a graphics processing unit. Each of a plurality of containers associated with an application is assigned to one of the multiple time slots according to a disbursement algorithm. A respective signal offset is provided to each container based on an assigned time slot of the container. The provided signal offsets cause the plurality of containers to access the resource for the application in a predetermined order.
A63F 13/335 - Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers using wide area network [WAN] connections using Internet
A63F 13/352 - Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers - Details of game servers involving special game server arrangements, e.g. regional servers connected to a national server or a plurality of servers managing partitions of the game world
A63F 13/358 - Adapting the game course according to the network or server load, e.g. for reducing latency due to different connection speeds between clients
A method for clock distribution network control includes determining, at a first clock node of a plurality of clock nodes within a clock distribution network, a downstream clock request status. A clock request signal is transmitted by the first clock node to an upstream parent node based on the downstream clock request status. A clock buffer of the first clock node is toggled based at least in part on the clock request signal to the parent node. If the first clock node receives an asserted clock request signal from one or more downstream child nodes and clock acknowledgment signal from the parent node, a clock enable signal is asserted to the clock buffer to output a clock signal to the one or more downstream child nodes.
Data integrity checks for reducing communication latency is described. A transmitting endpoint transmits data to a receiving endpoint by generating an integrity tag for a first subset of data blocks and a second integrity tag for a second subset of data blocks. In implementations, the first and second integrity tags overlap at least one data block and are offset based on computational complexities of generating the integrity tags. A receiving endpoint generates comparison tags for each of the integrity tags and uses the comparison tags to validate an authenticity of received data. In response to validating the first and second integrity tags, data blocks covered by both the first and second integrity tags are released for use. Additional integrity tags are generated and validated for subsequent subsets of data blocks during data communication, thus reducing latency by offsetting times at which comparison tags are generated and validated.
Methods and devices are provided for processing data using a neural network. Activations from a previous layer of the neural network are received by a layer of the neural network. Weighted values, to be applied to values of elements of the activations, are determined based on a spatial correlation of the elements and a task error output by the layer. The weighted values are applied to the values of the elements and a combined error is determined based on the task error and the spatial correlation.
An apparatus and method for performing efficient video transmission. In various implementations, a computing system includes a transmitter sending a video stream to a receiver over a network. Before encoding a video frame, the transmitter identifies a first set of one or more macroblocks of the video frame that includes text. The transmitter replaces pixel color information with pixel distance information for the first set of one or more macroblocks. The transmitter inserts, in metadata information, indications that identify the first set of one or more macroblocks and specify the color values of pixels in the first set of one or more macroblocks. The transmitter encodes the video frame and sends it along with the metadata information to the receiver. The receiver uses the metadata information to reproduce the original pixel colors and maintain text clarity of an image to be depicted on a display device.
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
37.
OFFSET DATA INTEGRITY CHECKS FOR LATENCY REDUCTION
Data integrity checks for reducing communication latency is described. A transmitting endpoint transmits data to a receiving endpoint by generating an integrity tag for a first subset of data blocks and a second integrity tag for a second subset of data blocks. In implementations, the first and second integrity tags overlap at least one data block and are offset based on computational complexities of generating the integrity tags. A receiving endpoint generates comparison tags for each of the integrity tags and uses the comparison tags to validate an authenticity of received data. In response to validating the first and second integrity tags, data blocks covered by both the first and second integrity tags are released for use. Additional integrity tags are generated and validated for subsequent subsets of data blocks during data communication, thus reducing latency by offsetting times at which comparison tags are generated and validated.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
An apparatus and method for efficient power management of multiple integrated circuits. In various implementations, a computing system includes first partition and a second partition. The second partition includes video pre-processing circuitry that identifies regions of a video frame to be presented on a screen or monitor that don't change or regions that can have one or more of resolution and color accuracy be below a threshold. The first partition includes a parallel data processor with one or more compute units, each with multiple lanes of execution. Based on the identified regions, the first partition generates an execution mask indicating which lanes of the compute units are inactive. The parallel data processor copies result data from the active lanes to outputs of the inactive lanes.
An apparatus and method for performing efficient video transmission. In various implementations, a computing system includes a transmitter sending a video stream to a receiver over a network. Before encoding a video frame, the transmitter identifies a first set of one or more macroblocks of the video frame that includes text. The transmitter replaces pixel color information with pixel distance information for the first set of one or more macroblocks. The transmitter inserts, in metadata information, indications that identify the first set of one or more macroblocks and specify the color values of pixels in the first set of one or more macroblocks. The transmitter encodes the video frame and sends it along with the metadata information to the receiver. The receiver uses the metadata information to reproduce the original pixel colors and maintain text clarity of an image to be depicted on a display device.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
40.
METHOD AND APPARATUS FOR POWER MANAGEMENT OF A GRAPHICS PROCESSING CORE IN A VIRTUAL ENVIRONMENT
A method and apparatus controls power management of a graphics processing core when multiple virtual machines are allocated to the graphics processing core on a much finer-grain level than conventional systems. In one example, the method and apparatus processes a plurality of virtual machine power control setting requests to determine a power control request for a power management unit of a graphics processing core. The method and apparatus then controls power levels of the graphics processing core with the power management unit based on the determined power control request.
A semiconductor package includes a first die, a second die, and an interconnect die coupled to a first plurality of through-die vias in the first die and a second plurality of through-die vias in the second die. The interconnect die provides communications pathways the first die and the second die.
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 21/50 - Assembly of semiconductor devices using processes or apparatus not provided for in a single one of the groups
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid state devices all the devices being of a type provided for in the same subgroup of groups , or in a single subclass of , , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 27/06 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including a plurality of individual components in a non-repetitive configuration
42.
VIDEO ENCODING/DECODING USING DETECTED PATTERN OF PIXEL INTENSITY DIFFERENCES
Methods and apparatus encode image frames using intra-frame prediction by predicting pixels for a block of current pixels, based on a detected spatial pattern of pixel intensity differences among a plurality of neighboring reconstructed pixels to the block of current pixels, and encode a block of pixels of the image frame using the predicted block of reconstructed pixels. Inter-frame prediction is provided by determining whether blocks of pixels in temporally neighboring reconstructed frames corresponding to a candidate motion vector have a pattern of pixel intensity differences among the blocks from temporally neighboring frames. Predicted blocks are produced for a reconstructed frame based on the determined pattern of pixel intensity difference among temporally neighboring frames.
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
43.
APPARATUS AND METHOD FOR PROVIDING SUBSYSTEM PROCESSOR BASED POWER SHIFTING FOR PERIPHERAL DEVICES
A computing device and method controls power consumption of a graphics processing unit in the computing device by the GPU determining an allocated power for the USB device connected through a USB port, such as a USB-C port. The GPU issues allocated power information for the external USB device to cause the allocated power to be provided to the USB device and includes issuing allocated power information to a power delivery (PD) controller that is connected to a USB port. In some implementations, the GPU shifts at least a portion of the allocated power from the USB device back to the GPU in response to a usage change event associated with the USB device for improving GPU performance. The usage change event can be a disconnect event of the USB device, a power renegotiation event between the USB device and the GPU, or any other suitable usage change event.
A semiconductor device includes a substrate and a conductive pad coupled to the substrate. A first solder mask is coupled to the substrate and to a portion of the conductive pad so the first solder mask covers the portion of the conductive pad and extends above the conductive pad. A second solder mask is coupled to a portion of the first solder mask and extends above the first solder mask.
Systems, apparatuses, and methods for implementing efficient power optimization in a computing system are disclosed. A system management unit configured to track computing activity of a computing device while processing each frame of a plurality of frames. The computing activity is tracked at least for a given period of time comprising a plurality of time slices. The system management unit further correlates a time slice associated with a given frame with a time slice associated with at least one previously processed frame from the plurality of frames, based at least in part on the tracked computing activity. The system management unit predicts a clock frequency to render the given frame, based at least in part on the correlation and renders the given frame using the predicted clock frequency.
B60R 25/20 - Means to switch the anti-theft system on or off
G01S 13/84 - Systems using reradiation of radio waves, e.g. secondary radar systems; Analogous systems wherein continuous-type signals are transmitted for distance determination by phase measurement
H01Q 1/32 - Adaptation for use in or on road or rail vehicles
H04B 7/06 - Diversity systems; Multi-antenna systems, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
H04W 4/40 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
G07C 9/28 - Individual registration on entry or exit involving the use of a pass the pass enabling tracking or indicating presence
H01Q 25/00 - Antennas or antenna systems providing at least two radiating patterns
Techniques for implementing an overstress design for verification that reduce production and verification time by enabling a verification system to perform verification of components of a circuit design selectively, accurately, and exhaustively under extreme stress scenarios are disclosed. Circuit nodes in an emulation model are selected and overstress is provided to the nodes such that behavior of the circuit under such extreme stress scenarios is readily observable, enabling designers to produce circuits that are more secure, reliable, and resilient in case of failures. Overstress is provided to the node to enable verification of the emulation model without having to design complex test signal representations to produce extreme stress conditions. A request for manufacture is generated including aspects of the emulation model to enable verification of a fabricated circuit in a similar or identical manner to those used to verify the emulation model.
Methods and systems are disclosed for managing the power consumed by cores of a system on chip (SoC). Techniques disclosed include obtaining application information that is indicative of an application being executed on the cores, detecting a workload associated with the application, and limiting one or more operating frequencies of the cores responsive to the detection of the workload. Techniques disclosed also include profiling the detected workload and limiting the one or more operating frequencies of the cores based on the profiling.
A random-access memory (RAM) includes a plurality of memory banks, a memory channel interface circuit, and a metadata processing circuit. The memory channel interface circuit couples to a memory channel adapted for coupling to a memory controller. The metadata processing circuit is connected to the memory channel interface circuit and receiving a poison bit sent over the memory channel associated with a write command and write data for the write command. The RAM, responsive to the poison bit indicating that the write data is poisoned, stores at least one of: the poison bit and a code indicating a value of the poison bit in a selected memory bank.
An apparatus and method for efficiently generating clock signals. An integrated circuit includes multiple clock dividers both at its I/O boundaries and across its semiconductor die. A clock divider receives an input clock signal, and an indication of a reduction factor that is a positive, non-zero and a non-integer value less than one. The clock divider generates an output clock signal based on the input clock signal and the reduction factor. The reduction factor can be an M-bit pattern where M is a positive, non-zero integer greater than one. Therefore, the clock divider generates the output clock signal with a reduced clock rate that has a smallest configurable granularity that is 1/M of the input clock frequency. An asserted bit in the M-bit pattern indicates that the output clock signal should have an asserted value during a corresponding clock cycle of the input clock signal.
H03L 7/197 - Indirect frequency synthesis, i.e. generating a desired one of a number of predetermined frequencies using a frequency- or phase-locked loop using a frequency divider or counter in the loop a time difference being used for locking the loop, the counter counting between numbers which are variable in time or the frequency divider dividing by a factor variable in time, e.g. for obtaining fractional frequency division
H03L 7/08 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop
H03L 7/081 - Automatic control of frequency or phase; Synchronisation using a reference signal applied to a frequency- or phase-locked loop - Details of the phase-locked loop provided with an additional controlled phase shifter
50.
REGION-OF-INTEREST (ROI)-BASED IMAGE ENHANCEMENT USING A RESIDUAL NETWORK
Region-of-interest (ROI)-based image enhancement using a residual network, including: generating, based on an input image and a residual path of a residual network, a first output corresponding to a region-of-interest of the input image; generating, based on the input image and a skip path of the residual network, a second output; and generating an output image based on the first output and the second output.
Systems, apparatuses, and methods for dynamically estimating power losses in a computing system. A system management circuit tracks a state of a computing system and dynamically estimates power losses in the computing system based in part on the state. Based on the estimated power losses, power consumption of the computing system is estimated. In response to detecting reduced power losses in at least a portion of the computing system, the system management circuit is configured to increase a power-performance state of one or more circuits of the computing system while remaining within a power allocation limit of the computing system.
Systems, apparatuses, and methods for managing power allocation in a computing system. A system management unit detects a condition indicating a change in power is indicated. Such a change may be detecting an indication that a power change is either required, possible, or requested. In response to detecting a reduction in power is indicated, the system management unit identifies currently executing tasks of the computing system and accesses sensitivity data to determine which of a number of computing units (or power domains) to select for power reduction. Based at least in part on the data, a unit is identified that is determined to have a relatively low sensitivity to power state changes under the current operating conditions. A relatively low sensitivity indicates that a change in power to the corresponding unit will not have as significant an impact on overall performance of the computing system than if another unit was selected. Power allocated for the selected unit is then decreased.
A processing system including a parallel processing unit selectively allocating pages of memory for interleaving across configurable subsets of channels based on a mode of allocation. In some embodiments, in a first mode, a page of memory is allocated to and interleaved across a plurality of channels, and in a second mode, a page of memory is allocated to and interleaved across a subset of the plurality of channels.
Systems, apparatuses, and methods for dynamically estimating power losses in a computing system. A system management circuit tracks a state of a computing system and dynamically estimates power losses in the computing system based in part on the state. Based on the estimated power losses, power consumption of the computing system is estimated. In response to detecting reduced power losses in at least a portion of the computing system, the system management circuit is configured to increase a power-performance state of one or more circuits of the computing system while remaining within a power allocation limit of the computing system.
Systems, apparatuses, and methods for managing power allocation in a computing system. A system management unit detects a condition indicating a change in power is indicated. Such a change may be detecting an indication that a power change is either required, possible, or requested. In response to detecting a reduction in power is indicated, the system management unit identifies currently executing tasks of the computing system and accesses sensitivity data to determine which of a number of computing units (or power domains) to select for power reduction. Based at least in part on the data, a unit is identified that is determined to have a relatively low sensitivity to power state changes under the current operating conditions. A relatively low sensitivity indicates that a change in power to the corresponding unit will not have as significant an impact on overall performance of the computing system than if another unit was selected. Power allocated for the selected unit is then decreased.
Methods and systems are disclosed for managing performance states of a data fabric of a system on chip (SoC). Techniques disclosed include determining a performance state of the data fabric based on data fabric bandwidth utilizations of respective components of the SoC. A metric, characteristic of a workload centric to cores of the SoC, is derived from hardware counters, and, based on the metric, it is determined whether to alter the performance state.
A system and method for automatically adjusting light conditions in a video conference environment are disclosed. A video conferencing system includes a camera to capture an image of a video conference participant and a computing device to evaluate and compensate for lighting conditions. A display device enables the participant to view other video conference participants. Video data captured by the camera is conveyed to the computing device. The computing device is configured to evaluate lighting conditions of a captured image and evaluate the lighting conditions for possible adjustment. Responsive to an evaluation of the lighting conditions, the computing device is configured to automatically generate a light border for display on the display device. The light border is composited with window display data received from a video conference application. The light border generated by the computing device is generated to create an amount of light that compensates for low light, or uneven light, conditions.
An apparatus and method for efficiently updating power supply voltages due to degradation from aging. A computing system includes one or more functional units and a runtime voltage calibrator (or calibrator). The calibrator is capable of performing power supply calibration for the one or more supply voltage power rail used by the one or more functional units. The calibrator identifies a particular ground reference power rail that is received by the one or more functional units. The calibrator also identifies a first supply voltage power rail that is received by at least a first functional unit of the one or more functional units. If the runtime voltage calibrator determines that all circuitry that uses the particular ground reference power rail is idle, the calibrator performs power supply calibration for the first supply voltage power rail. The calibrator does not wait for a bootup operation and avoids interference from ground bounce.
Methods and systems are disclosed for managing performance states of a data fabric of a system on chip (SoC). Techniques disclosed include determining a performance state of the data fabric based on data fabric bandwidth utilizations of respective components of the SoC. A metric, characteristic of a workload centric to cores of the SoC, is derived from hardware counters, and, based on the metric, it is determined whether to alter the performance state.
A data processor is for accessing a memory having a first pseudo channel and a second pseudo channel. The data processor includes at least one memory accessing agent, a memory controller, and a data fabric. The at least one memory accessing agent generates generating memory access requests including first memory access requests that access the memory. The memory controller provides memory commands to the memory in response to the first memory access requests. The data fabric routes the first memory access requests to a first downstream port in response to a corresponding first memory request accessing the first pseudo channel, and to a second downstream port in response to the corresponding first memory request accessing the second pseudo channel. The memory controller has first and second upstream ports coupled to the first and second downstream ports of the data fabric, respectively, and a downstream port coupled to the memory.
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
G11C 7/10 - Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
G11C 8/18 - Address timing or clocking circuits; Address control signal generation or management, e.g. for row address strobe [RAS] or column address strobe [CAS] signals
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
Techniques for implementing a smart feedback design for verification that reduce production and verification time by enabling a verification system to perform piecemeal verification of components of a circuit design selectively, accurately, and exhaustively before a final, overall circuit design is completed are disclosed. Circuit nodes in an emulation model are selected and smart feedback is provided to the nodes in response to signals detected at the nodes such that behavior of unavailable or unverified components to be located at the nodes can be simulated. Smart feedback can be provided to the node to enable verification of the emulation model without having to wait for the unverified or unavailable components to be provided or verified. A request for manufacture may be generated including aspects of the emulation model to enable verification of a fabricated circuit in a similar or identical manner to those used to verify the emulation model.
G06F 30/3308 - Design verification, e.g. functional simulation or model checking using simulation
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
62.
ASSIGNING BIT BUDGETS TO PARALLEL ENCODED VIDEO DATA
A technique for encoding video is provided. The technique includes for a first portion of a first frame that is encoded by a first encoder in parallel with a second portion of the first frame that is encoded by a second encoder, determining a historical complexity distribution; determining a first bit budget for the first portion of the first frame based on the historical complexity distribution; and encoding the first portion of the first frame by the first encoder, based on the first bit budget.
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/146 - Data rate or code amount at the encoder output
63.
ADAPTIVE THREAD MANAGEMENT FOR HETEROGENOUS COMPUTING ARCHITECTURES
An apparatus and method for efficiently scheduling tasks in a dynamic manner to multiple cores that support a heterogeneous computing architecture. A computing system includes multiple cores with at least two cores being capable of executing instructions of a same instruction set architecture (ISA), and therefore, are architecturally compatible. In an implementation, each of the at least two cores is a general-purpose central processing unit (CPU) core capable of executing instructions of a same ISA. However, the throughput and the power consumption greatly differ between the at least two cores based on their hardware designs. An operating system scheduler assigns a thread to a first core, and the first core measures thread dynamic behavior of the thread over a time interval. Based on the thread dynamic behavior, the scheduler reassigns the thread to a second core different from the first core.
An apparatus and method for efficiently scheduling tasks in a dynamic manner to multiple cores that support a heterogeneous computing architecture. A computing system includes multiple cores with at least two cores being capable of executing instructions of a same instruction set architecture (ISA), and therefore, are architecturally compatible. In an implementation, each of the at least two cores is a general-purpose central processing unit (CPU) core capable of executing instructions of a same ISA. However, the throughput and the power consumption greatly differ between the at least two cores based on their hardware designs. An operating system scheduler assigns a thread to a first core, and the first core measures thread dynamic behavior of the thread over a time interval. Based on the thread dynamic behavior, the scheduler reassigns the thread to a second core different from the first core.
Huffman packing for delta compression is described. In accordance with the described techniques, delta values between neighboring elements of a data block are generated using delta compression. The delta values are transformed according to a transformation algorithm. The transformed delta values are packed using Huffman encoding to generate compressed data that corresponds to the data block.
A data processor is for accessing a memory having a first pseudo channel and a second pseudo channel. The data processor includes at least one memory accessing agent, a memory controller, and a data fabric. The at least one memory accessing agent generates generating memory access requests including first memory access requests that access the memory. The memory controller provides memory commands to the memory in response to the first memory access requests. The data fabric routes the first memory access requests to a first downstream port in response to a corresponding first memory request accessing the first pseudo channel, and to a second downstream port in response to the corresponding first memory request accessing the second pseudo channel. The memory controller has first and second upstream ports coupled to the first and second downstream ports of the data fabric, respectively, and a downstream port coupled to the memory.
Adaptive digital content preprocessing techniques based on a bitrate are described. In an implementation, a parameter of a preprocessing module is set based on a target bitrate. The parameter specifies an amount of preprocessing to be performed in preprocessing digital content. Preprocessed digital content is generated by preprocessing the digital content by the specified amount using the preprocessing module. Encoded digital content is generated by compressing the preprocessed digital content using a compression technique by an encoder. The encoded digital content is then transmitted for communication at the target bitrate.
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/136 - Incoming video signal characteristics or properties
H04N 21/24 - Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth or upstream requests
68.
MULTI-GPU DEVICE PCIE TOPOLOGY RETRIEVAL IN GUEST VM
A system and method for efficiently scheduling tasks to multiple endpoint devices are described. In various implementations, a computing system has a physical hardware topology that includes multiple endpoint devices and one or more general-purpose central processing units (CPUs). A virtualization layer is added between the hardware of the computing system and an operating system that creates a guest virtual machine (VM) with multiple endpoint devices. The guest VM utilizes a guest VM topology that is different from the physical hardware topology. The processor of an endpoint device that runs the guest VM accesses a table of latency information for one or more pairs of endpoints of the guest VM based on physical hardware topology, rather than based on the guest VM topology. The processor schedules tasks on paths between endpoint devices based on the table.
A method and system for providing memory in a computer system. The method includes receiving a memory access request for a shared memory address from a processor, mapping the received memory access request to at least one virtual memory pool to produce a mapping result, and providing the mapping result to the processor.
G06F 12/1036 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
Adaptive decoder-drive encoder reconfiguration techniques are described. In one example, techniques include detecting an operational condition at a consumer using a sensor, the consumer receiving a communication of digital content from an encoder; generating an adaptation instruction by the decoder based on the detecting; transmitting the adaptation instruction by the decoder for receipt by the encoder; and receiving an adapted communication of the digital content generated by the encoder, the adapted communication caused by reconfiguration of the encoder based on the adaptation instruction received from the decoder.
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
71.
DYNAMIC REPARTITION OF MEMORY PHYSICAL ADDRESS MAPPING
Systems and methods for dynamic repartitioning of physical memory address mapping involve relocating data stored at one or more physical memory locations of one or more memory devices to another memory device or mass storage device, repartitioning one or more corresponding physical memory maps to include new mappings between physical memory addresses and physical memory locations of the one or more memory devices, then loading the relocated data back onto the one or more memory devices at physical memory locations determined by the new physical address mapping. Such dynamic repartitioning of the physical memory address mapping does not require a processing system to be rebooted and has various applications in connection with interleaving reconfiguration and error correcting code (ECC) reconfiguration of the processing system.
A system and method for texture decompression is described. The method comprises receiving a compressed texture block including two or more disjoint subsets of data and decompressing the compressed texture block. The decompressing includes decompressing each of the two or more disjoint subsets in the compressed texture block to form texels. The two or more disjoint subsets include a first disjoint subset having a first set of color endpoints and a first index value for a first texel, and a second disjoint subset having a second set of color endpoints.
H04N 19/426 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
An address translation buffer or ATB is provided for emulating or implementing the PCIe (Peripheral Component Interface Express) ATS (Address Translation Services) protocol within a PCIe-compliant device. The ATB operates in place of (or in addition to) an address translation cache (ATC), but is implemented in firmware or hardware without requiring the robust set of resources associated with a permanent hardware cache (e.g., circuitry for cache control and lookup). A component of the device (e.g., a DMA engine) requests translation of an untranslated address, via a host input/output memory management unit for example, and the response (including a translated address) is stored in the ATB for use for a single DMA operation (which may involve multiple transactions across the PCIe bus).
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
A technique for operating a cache is disclosed. The technique includes in response to a power down trigger that indicates that the cache effectiveness is considered to be low, powering down the cache.
Platform power management includes boosting performance in a platform power boost mode or restricting performance to keep a power or temperature under a desired threshold in a platform power cap mode. Platform power management exploits the mutually exclusive nature of activities and the associated headroom created in a temperature and/or power budget of a server platform to boost performance of a particular component while also keeping temperature and/or power below a threshold or budget.
A link controller includes a Peripheral Component Interconnect Express (PCIe) physical layer circuit for coupling to a communication link and providing a data path over the communication link, a first data link layer controller which operates according to a PCIe protocol, and a second data link layer controller which operates according to a non-PCIe protocol. A multiplexer-demultiplexer selectively connects both data link layer controllers to the PCIe physical layer circuit. A protocol translation circuit is coupled between the multiplexer-demultiplexer and the second data link layer controller, the protocol translation circuit receiving traffic data from the second data link layer controller in a non-PCIe format, encapsulating the non-PCIe format in a PCIe format, and passing traffic data to the multiplexer-demultiplexer circuit.
A technique for operating a cache is disclosed. The technique includes in response to a power down trigger that indicates that the cache effectiveness is considered to be low, powering down the cache.
G06F 1/3287 - Power saving characterised by the action undertaken by switching off individual functional units in the computer system
G06F 1/3234 - Power saving characterised by the action undertaken
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
Apparatuses, systems and methods for routing requests and responses targeting a shared resource. A queue in a communication fabric is located in a path between the requesters and a shared resource. In some embodiments, the shared resource is a shared address translation cache stored in an endpoint. The physical channel between the queue and the shared resource supports multiple virtual channels. The queue assigns at least one entry to each virtual channel of a group of virtual channels where the group includes a virtual channel for each address translation request type from a single requester of the multiple requesters. When the at least one entry for a given requester is de-allocated, the queue allocates this entry only with requests from the assigned virtual channel even if the empty entry is the only available entry of the queue.
A system and method for creating layout for semiconductor chips are described. In various implementations, an integrated circuit includes at least a first functional block and a second functional block. The first functional block includes circuitry that has a first set of parameters of a first process corner. The second functional block includes circuitry that has a second set of parameters of a second process corner different from the first set of parameters of the first process corner. For a same set of operating conditions, the second functional block has device characteristics different from device characteristics of the first functional block based on the first process corner and the second process corner being different from one another. The integrated circuit is fabricated with a process corner mask that indicates which areas of the die use the first process corner and which areas use the second process corner.
A processing apparatus is provided which includes memory configured to store hardware parameter settings for each of a plurality of applications. The processing apparatus also includes a processor in communication with the memory configured to store, in the memory, the hardware parameter settings, identify one of the plurality of applications as a currently executing application and control an operation of hardware by tuning a plurality of hardware parameters according to the stored hardware parameter settings for the identified application.
Methods, systems, and apparatuses provide support for multiple address spaces in order to facilitate data movement. One apparatus includes an input/output memory management unit (IOMMU) comprising: a plurality of memory-mapped input/output (MMIO) registers that map memory address spaces belonging to the IOMMU and at least a second IOMMU; and hardware control logic operative to: synchronize the plurality of MMIO registers of the at least the second IOMMU; receive, from a peripheral component endpoint coupled to the IOMMU, a direct memory access (DMA) request, the DMA request to a memory address space belonging to the at least the second IOMMU; access the plurality of MMIO registers of the IOMMU based on context data of the DMA request; and access, from the IOMMU, a function assigned to the memory address space belonging to the at least the second IOMMU based on the accessed plurality of MMIO registers.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
G06F 13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
82.
DETECTING PERSONAL-SPACE VIOLATIONS IN ARTIFICIAL INTELLIGENCE BASED NON-PLAYER CHARACTERS
Systems, apparatuses, and methods for detecting personal-space violations in artificial intelligence (AI) based non-player characters (NPCs) are disclosed. An AI engine creates a NPC that accompanies and/or interacts with a player controlled by a user playing a video game. During gameplay, measures of context-dependent personal space around the player and/or one or more NPCs are generated. A control circuit monitors the movements of the NPC during gameplay and determines whether the NPC is adhering to or violating the measures of context-dependent personal space. The control circuit can monitor the movements of multiple NPCs simultaneously during gameplay, keeping a separate score for each NPC. After some amount of time has elapsed, the scores of the NPCs are recorded, and then the scores are provided to a machine learning engine to retrain the AI engines controlling the NPCs.
A63F 13/56 - Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
83.
STACK-BASED RAY TRAVERSAL WITH DYNAMIC MULTIPLE-NODE ITERATIONS
A technique for performing ray tracing operations is provided, The technique includes, in response to detecting that a threshold number of traversal stage work-items of a wavefront have terminated, increasing intersection test parallelization for non-terminated work-items..
A technique for performing ray tracing operations is provided. The technique includes, in response to detecting that a threshold number of traversal stage work-items of a wavefront have terminated, increasing intersection test parallelization for non-terminated work-items.
A first frame of a video stream is obtained. The first frame is defined by a plurality of pixels associated with a set of color data. A determination is made that a pixel of the plurality of pixels comprises high-frequency information. Responsive to the determination that the pixel comprises high-frequency information, a pixel lock is generated for the pixel such that color data associated with the pixel is maintained during a color accumulation process for at least one of the first frame or a second frame of the video stream that is subsequent to the first frame.
A first frame of a video stream rendered at a first resolution is obtained. A second frame of the video stream upscaled to a second higher resolution is also obtained. The first plurality of pixels is upscaled to the second resolution. The upsampling generates upsampled color data for the upsampled first plurality of pixels. The upsampled color data is accumulated with a second set of color data associated with a second plurality of pixels defining the second frame to generate final color data for the upsampled first plurality of pixels. Color data of the second set of color data associated with a pixel lock contributes more to the final color data than corresponding color data of the upsampled color data. The upsampled first plurality of pixels is stored with the final color data as an upscaled frame representing the first frame at the second resolution.
A graphics processing system comprises at least one memory device storing a plurality of pixel command threads and a plurality of vertex command threads. An arbiter coupled to the at least one memory device is provided that selects a pixel command thread from the plurality of pixel command threads and a vertex command thread from the plurality of vertex command threads. The arbiter further selects a command thread from the previously selected pixel command thread and the vertex command thread, which command thread is provided to a command processing engine capable of processing pixel command threads and vertex command threads.
G09G 5/00 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
88.
VIDEO TIMING FOR DISPLAY SYSTEMS WITH VARIABLE REFRESH RATES
A display system supports variable refresh rates that include a plurality of refresh rates. A source such as a graphics processing unit (GPU) provides frames to the display system at a selected one of the refresh rates. The refresh rates are factored into a corresponding plurality of prime factors. A plurality of numbers of lines per frame in frames provided at the plurality of refresh rates is determined based on one or more ratios of the plurality of refresh rates, the plurality of prime factors, and a line rate for providing frames to the display system at the plurality of refresh rates. The source then selectively provides frames to the display system at one refresh rate of the plurality of refresh rates using the same line rate regardless of which refresh rate is chosen. Furthermore, the number of lines per frame is an integer for frames provided at the refresh rates.
A method and system for operating in a single display mode operation and a dual pipe mode of operation is disclosed. The method and system includes operating in a dual pipe mode of operation in which each display pipe transmits data from a respective buffer to an associated display. The method and system further includes operating in a single display mode of operation in which one display pipe transmits data from a plurality of buffers to an associated display.
A memory package includes first, second, third, and fourth channels arranged consecutively in a clockwise direction on the memory package, each of the first, second, third, and fourth channels having access circuitry and memory arrays. In a first mode, the first channel controls access to the memory arrays in the second channel and the fourth channel controls access to the memory arrays in the third channel.
A memory package includes first, second, third, and fourth channels arranged consecutively in a clockwise direction on the memory package, each of the first, second, third, and fourth channels having access circuitry and memory arrays. In a first mode, the first channel controls access to the memory arrays in the second channel and the fourth channel controls access to the memory arrays in the third channel.
An apparatus includes a processor configured to determine a first distribution associated with an artificial agent based on behavior associated with the artificial agent and a second distribution based on behavior of a user. The processor is further configured to generate a human-likeness similarity measurement by comparing the first distribution to the second distribution and modify the behavior of the artificial agent in response to the similarity measurement failing to satisfy a similarity threshold.
An apparatus includes a processor configured to determine a first distribution associated with an artificial agent based on behavior associated with the artificial agent and a second distribution based on behavior of a user. The processor is further configured to generate a human-likeness similarity measurement by comparing the first distribution to the second distribution and modify the behavior of the artificial agent in response to the similarity measurement failing to satisfy a similarity threshold.
There are many instances where a standard dynamic range (“SDR”) overlay is displayed over high dynamic range (“HDR”) content on HDR displays. Because the overlay is SDR, the maximum brightness of the overlay is much lower than the maximum brightness of the HDR content, which can lead to the SDR elements being obscured if those elements have at least some transparency. The present disclosure provides techniques including modifying the luminance of either or both of the HDR and SDR content when an SDR layer with some transparency is displayed over HDR content. A variety of techniques are provided. In one example, a fixed adjustment is applied to pixels of one or both of the SDR layer and the HDR layer. The fixed adjustment comprises decreasing the luminance of the HDR layer and/or increasing the luminance of the SDR layer. In another example, a variable adjustment is applied.
H04N 23/741 - Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
95.
Arbitration allocating requests during backpressure
An arbitration system receives requests to access a destination during an arbitration window that spans multiple processor clock cycles. During each clock cycle, the destination is monitored to determine whether the destination is suffering from backpressure by receiving more requests than the destination is able to accommodate during the clock cycle. In response to detecting backpressure, a masking index value assigned to a requesting source is incremented, which limits an amount of requests from the source that will be granted destination access during a subsequent arbitration window. Alternatively, in response to detecting an absence of backpressure during an arbitration window, the masking index value is decremented, which increases the amount of requests from the source that will be granted destination access during a subsequent arbitration window. This arbitration process continues for successive arbitration windows, oscillating between incrementing and decrementing the masking index value during the successive arbitration windows.
G06F 13/372 - Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control using a time-dependent priority, e.g. individually loaded time counters or time slot
G06F 13/364 - Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using independent requests or grants, e.g. using separated request and grant lines
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 13/366 - Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using a centralised polling arbiter
96.
Peripheral device protocols in confidential compute architectures
Restricting peripheral device protocols in confidential compute architectures, the method including: receiving a first address translation request from a peripheral device supporting a first protocol, wherein the first protocol supports cache coherency between the peripheral device and a processor cache; determining that a confidential compute architecture is enabled; and providing, in response to the first address translation request, a response including an indication to the peripheral device to not use the first protocol.
Systems, apparatuses, and methods for implementing a safety monitor framework for a safety-critical inference application are disclosed. A system includes a safety-critical inference application, a safety monitor, and an inference accelerator engine. The safety monitor receives an input image, test data, and a neural network specification from the safety-critical inference application. The safety monitor generates a modified image by adding additional objects outside of the input image. The safety monitor provides the modified image and neural network specification to the inference accelerator engine which processes the modified image and provides outputs to the safety monitor. The safety monitor determines the likelihood of erroneous processing of the original input image by comparing the outputs for the additional objects with a known good result. The safety monitor complements the overall fault coverage of the inference accelerator engine and covers faults only observable at the network level.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/98 - Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
Cascading execution of atomic operations, including: receiving a request for each thread of a plurality of threads to perform an atomic operation, wherein the plurality of threads comprises a plurality of thread subsets each corresponding to a local memory, wherein the local memory for a thread subset is accessible by the thread subset and inaccessible to a remainder of threads in the plurality of threads; generating a plurality of intermediate results by performing, by each thread subset, the atomic operation in the local memory corresponding to the thread subset; and generating a result for the request by aggregating the plurality of intermediate results in a shared memory accessible to all threads in the plurality of threads.
Restricting peripheral device protocols in confidential compute architectures, the method including: receiving a first address translation request from a peripheral device supporting a first protocol, wherein the first protocol supports cache coherency between the peripheral device and a processor cache; determining that a confidential compute architecture is enabled; and providing, in response to the first address translation request, a response including an indication to the peripheral device to not use the first protocol.
A processing system [100] divides successive dispatches [135] of work items into portions [145]. The successive dispatches are separated from each other by barriers [202], [204], each barrier indicating that the work items of the previous dispatch must complete execution before work items of a subsequent dispatch can begin execution. In some embodiments, the processing system interleaves execution of portions of a first dispatch with portions of subsequent dispatches that consume data produced by the first dispatch. The processing system thereby reduces the amount of data written to the local cache [120] by a producer dispatch while preserving data locality for a subsequent consumer (or consumer/producer) dispatch and facilitating processing efficiency.