Intel FPGA IP Core Cache Interface (CCI-P)


Intel FPGA IP Core Cache Interface (CCI-P)
Interface Specification
September 2017
Revision 0.5
Document Number: External

Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Copies of documents which have an order number and are referenced in this document may be obtained from Intel.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright 2017, Intel Corporation. All Rights Reserved.

Contents

1 About this Document
    Intended Audience
    Conventions
    Related Documentation
    Glossary
2 Introduction
    Multi-chip and Discrete Package with Intel FPGA Block Diagram
    Development Model
    Memory Hierarchy
3 CCI-P Interface
    Features
    Signaling Information
    Read from/write to Main Memory
    UMsg
    MMIO Cycles to I/O Memory
    CCI-P Tx Signals
    Tx Header Format
    CCI-P Rx Signals
    Rx Header and Rx Data Format
    Multi-Cache Line Memory Requests
    Additional Control Signals
    Protocol Flow
        Upstream Requests
        Downstream Requests
    Ordering Rules
        Memory Requests
        MMIO Requests
    Timing Diagrams
    Clock Frequency
    CCI-P Guidance
4 AFU Requirements
    Mandatory AFU CSR Definitions
    AFU Discovery Flow
    AFU_ID
        How to Create an AFU_ID / GUID
        How to Use an AFU_ID
5 Basic Building Blocks
6 Device Feature List

Figures

Figure 2-1. High Level Block Diagram of MCP/DCP with Intel FPGA IP Logic
Figure 2-2. MCP/DCP with Intel FPGA IP System Memory Hierarchy, 1 Processor Topology
Figure 3-1. CCI-P Signals
Figure 3-2. UMsg Initialization and Usage Flow
Figure 3-3. Multi-CL Memory Request
Figure 3-4. Multi-CL Memory Write Responses
Figure 3-5. Multi-CL Memory Read Responses
Figure 3-6. Two Writes on Same VC, Only One Outstanding
Figure 3-7. Write Out of Order Commit
Figure 3-8. Use WrFence to Enforce Write Ordering
Figure 3-9. Read Re-Ordering to Same Address, Different VCs
Figure 3-10. Read Re-Ordering to Same Address, Same VC
Figure 3-11. Tx Channel 0 and 1 Almost Full Threshold
Figure 3-12. Write Fence Behavior
Figure 3-13. C0 Rx Channel Interleaved between MMIO Requests and Memory Responses
Figure 3-14. Rd Response Timeout
Figure 4-1. AFU Discovery Flow
Figure 6-1. Example Feature Hierarchy
Figure 6-2. Device Feature Conceptual View

Tables

Table 1-1. Related Documentation
Table 1-2. Acronyms and Definition Table
Table 2-1. CCI-P Features
Table 2-2. Comparison of Platform Capabilities
Table 2-3. AFU Memory Read Paths
Table 3-1. CCI-P Features Summary
Table 3-2. Tx Channel Description
Table 3-3. Tx Header Field Definitions
Table 3-4. Tx Header Field Definitions
Table 3-5. C0 Read Memory Request Header Format Structure: t_ccip_c0_reqmemhdr
Table 3-6. C1 Write Memory Request Header Format Structure: t_ccip_c1_reqmemhdr
Table 3-7. C1 Fence Header Format Structure: t_ccip_c1_reqfencehdr
Table 3-8. C2 MMIO Response Header Format
Table 3-9. Rx Channel Signal Description
Table 3-10. Rx Header Field Definitions
Table 3-11. AFU Rx Response Encodings and Channels Mapping
Table 3-12. C0 Memory Read Response Header Format Structure: t_ccip_c0_rspmemhdr
Table 3-13. MMIO Request Header Format

Table 3-14. C1 Memory Write Response Header Format Structure: t_ccip_c1_rspmemhdr
Table 3-15. UMsg Header Format
Table 3-16. WrFence Header Format Structure: t_ccip_c1_rspfencehdr
Table 3-17. Clock and Reset
Table 3-18. Protocol Flow for Upstream Request from AFU to FIU
Table 3-19. CCI-P VL0 Protocol Flows
Table 3-20. Protocol Flow for Downstream Requests from CPU to AFU
Table 3-21. Ordering Rules for Upstream Requests from AFU
Table 3-22. MMIO Ordering Rules
Table 3-23. Clock Frequency
Table 3-24. Recommended Choices for Memory Requests
Table 4-1. Register Attribute Definition
Table 4-2. Mandatory AFU CSRs
Table 4-3. Feature Header CSR Definition
Table 4-4. AFU_ID_L CSR Definition
Table 4-5. AFU_ID_H CSR Definition
Table 4-6. DFH_RSVD0 CSR Definition
Table 4-7. DFH_RSVD1 CSR Definition
Table 6-1. Device Feature Header CSR
Table 6-2. Next DFH Byte Offset Example
Table 6-3. Mandatory BBB DFH Register Map
Table 6-4. BBB_ID_L CSR Definition
Table 6-5. BBB_ID_H CSR Definition

Code

Code 3-1. ccip_std_afu Port Map
Code 3-2. Tx Interface Structure Inside ccip_if_pkg.sv
Code 3-3. Tx Channel Structure Inside ccip_if_pkg.sv
Code 3-4. Rx Interface Structure Inside ccip_if_pkg.sv
Code 3-5. Rx Channel Structure Inside ccip_if_pkg.sv
Code 4-1. Set the Mandatory AFU Registers in the AFU
Code 4-2. Software Reads the AFU ID

Document Revision History

Document Number: External
Revision 0.5 — Initial External Revision — September 2017

1 About this Document

This document describes the Core Cache Interface (CCI-P) specification, which is the interface between the Accelerated Function Unit (AFU) and a multi-chip package (MCP) or Discrete Chip Package (DCP) with Intel FPGA IP.

1.1 Intended Audience

The intended audience is system engineers, platform architects, and software developers. Users must design the hardware AFU to be compliant with the CCI-P specification.

1.2 Conventions

Conventions used in this document include the following:

#  Precedes a command that is to be entered as root.
$  Precedes a command that is to be entered as a user.
This font  Filenames, commands, and keywords are printed in this font. Long command lines are also printed in this font. Although some very long command lines may wrap to the next line, the return is not considered part of the command; do not enter it.
<variable_name>  Indicates that the placeholder text between the angle brackets is to be replaced with an appropriate value. Do not enter the angle brackets.

1.3 Related Documentation

Table 1-1. Related Documentation

Intel Arria 10 Avalon-ST Interface with SR-IOV PCIe Solutions User Guide — This document is the Intel Arria 10 PCIe* SR-IOV datasheet.

Intel Software Developer's Manual — This document contains all three volumes of the Intel 64 and IA-32 Architectures Software Developer's Manual: Basic Architecture; Instruction Set Reference A-Z; and System Programming Guide. Refer to all three volumes when evaluating your design needs. (manuals/64-ia-32-architectures-software-developer-manual pdf)

Intel Virtualization Technology for Directed I/O Architecture Specification — This document describes the Intel Virtualization Technology for Directed I/O (Intel VT for Directed I/O); specifically, it describes the components supporting I/O virtualization as it applies to platforms that use Intel processors and core logic chipsets complying with Intel platform specifications. (roduct-specifications/vt-directed-io-spec.pdf)

1.4 Glossary

Table 1-2. Acronyms and Definition Table

AFU — Accelerated Function Unit. Hardware accelerator implemented in FPGA logic that accelerates, or intends to accelerate, an application.
ALI — AFU Link Interface. The interface between software and CCI-P.
ASE — AFU Simulation Environment. A co-development and simulation tool suite available in the software SDK.
CA — Caching Agent. A Caching Agent (CA) makes read and write requests to the coherent memory in the system. It is also responsible for servicing snoops generated by other Intel QuickPath Interconnect (Intel QPI) agents in the system.
CCI-P — Core Cache Interface. The interface between the AFU and the FPGA Interface Unit (FIU).
CL — Cache Line. A 64-byte cache line.
DPI — Direct Programming Interface. A set of features in SystemVerilog that allows export/import of parameters to/from a C function.
FIU — FPGA Interface Unit. The Intel UPI and PCIe* IP on the FPGA together form the FIU sub-block.
FPGA — Field Programmable Gate Array.
PA — Physical Address. A physical address of the host machine.
IP — Intellectual Property. A reusable block of logic or design.
IPC — Inter-Process Communication. Refers to constructs in Linux* like shared memory (/dev/shm) and message queues (/dev/mqueue); these are leveraged for ASE core functionality.
KiB — 1024 bytes. The term KiB is for 1024 bytes and KB for 1000 bytes. When referring to memory, KB is often used and KiB is implied. When referring to clock frequency, kHz is used, and here K is 1000.
Mdata — Message Tag Data. A user-defined field, which is relayed from the Tx header to the Rx header. It may be used to tag requests with a transaction ID or channel ID.
Msg — Message. A control notification.
NLB — Native Loopback Adapter. A sample AFU that implements a loopback test.
PAR — Place and Route. The place-and-route stage of the FPGA compilation tool chain.
RdLine_I (1) — Read Line Invalid. Memory read request, with FPGA cache hint set to Invalid, i.e., do not cache it. The line will not be cached in the FPGA, but the request may cause FPGA cache pollution.
RdLine_S — Read Line Shared. Memory read request, with FPGA cache hint set to Shared. An attempt will be made to keep it in the FPGA cache in a shared state.
Rx — Receive. Receive or input, from the AFU's perspective.
Tx — Transmit. Transmit or output, from the AFU's perspective.
Upstream — Logical direction towards the CPU. For example, an upstream port is a port going to the CPU.
UMsg — Unordered Message from CPU to AFU. An unordered notification with a 64-byte payload.
UMsgH — Unordered Message Hint from CPU to AFU. A hint for a subsequent UMsg; it has no data payload.
Intel UPI — Intel Ultra Path Interconnect. Intel's proprietary coherent interconnect protocol between Intel cores or other IP.
WrLine_I — Write Line Invalid. Memory write request, with FPGA cache hint set to Invalid. FIU writes the data with no intention of keeping the data in the FPGA cache.
WrLine_M — Write Line Modified. Memory write request, with FPGA cache hint set to Modified. FIU writes the data and leaves it in the FPGA cache in the Modified state.
WrPush_I — Write Push Invalid. Memory write request, with FPGA cache hint set to Invalid. FIU writes the data into the processor's Last Level Cache (LLC) with no intention of keeping the data in the FPGA cache. The LLC it writes to is always the LLC associated with the processor where the DRAM address is homed.

(1) The cache tag is used to track the request status for all outstanding requests on UPI. Therefore, even though RdLine_I is marked Invalid upon completion, it temporarily consumes a cache tag to track the request status over UPI. This may result in the eviction of a cache line, resulting in cache pollution. The advantage of using RdLine_I is that it is not tracked by the CPU directory; thus it prevents snooping from the CPU.

2 Introduction

CCI-P is the hardware-side signaling interface between the Accelerated Function Unit (AFU) and the FPGA Interface Unit (FIU). This document defines the signaling interface; it specifies the access types, the request format, the memory model, and the mandatory AFU CSRs, and it provides timing diagrams and AFU design guidelines.

CCI-P provides an abstraction of the physical links between the FPGA and the CPU. An AFU sees a unified interface with four virtual channels and a unified address space. CCI-P uses data payloads of up to four cache lines (4 CL). Table 2-1 lists some key CCI-P features.

Table 2-1. CCI-P Features

Feature                                    CCI-P
Data Transfer Size                         64, 128, 256B
Addressing Mode                            Physical Addressing Mode
Addressing Width (CL aligned addresses)    42 bits
Caching Hints                              Yes
Virtual Channels                           VA, VL0, VH0, VH1
Response Ordering                          Out of order responses
MMIO Read and Write                        Supported
FPGA to CPU Interrupt                      Not Supported
Interface Clk frequency                    400 MHz

CCI-P introduces two architectural concepts: Device Feature Lists (DFLs) and Basic Building Blocks (BBBs). A DFL defines a structure for grouping like functionalities and enumerating them. A BBB defines an architecture for wrapping features into building blocks, which you can incorporate into your AFU. BBBs are source-visible reference designs; other than a few mandatory registers, there are no other requirements imposed on a BBB. For example, the Memory Properties Factory (MPF) is a BBB that translates virtual memory addresses to physical memory addresses for memory shared between the Intel Xeon processor and the FPGA. MPF also performs read response ordering and provides data hazard resolution. Section 5 provides more information on BBBs.

2.1 Multi-chip and Discrete Package with Intel FPGA Block Diagram

FPGA logic (as shown in Figure 2-1) is divided into two parts: the Intel-provided FPGA Interface Unit (FIU), represented by the blue box (called the blue bitstream), and the user-developed AFU, represented by the green box (called the green bitstream). The blue bitstream is the system/platform code; it is configured at boot time and remains resident to manage the system buses. The green bitstream is in a partial reconfiguration region and may be updated on a live system.

The FIU implements all the key features required for deployment and manageability of the FPGA with an Intel Xeon processor within the datacenter. The FIU implements the interface protocols for the links between the CPU and FPGA. The FIU also provides platform capabilities such as Intel Virtualization Technology (Intel VT) for Directed I/O (Intel VT-d), security, error monitoring, performance monitoring, power and thermal management, and partial reconfiguration of the AFU.

Note: There are three physical links: PCIe0, PCIe1, and UPI. These physical links are multiplexed as virtual channels on the CCI-P interface. Refer to Section 2.3 for more information about physical and virtual channels.

The System Management Bus (SMBus) interface running between the Intel Xeon processor and the MCP or DCP with Intel FPGA IP is SMBus-like; it does not follow published SMBus specifications. It is used for out-of-band temperature monitoring, configuration during the bootstrap process, and platform debug purposes.

Figure 2-1. High Level Block Diagram of MCP/DCP with Intel FPGA IP Logic
(Block diagram: the Intel Xeon processor connects to the FPGA Interface Unit (FIU) through an SMBus slave, two PCIe Gen3 x8 endpoints (EP0, EP1), and a coherent UPI 9.2G interface with a cache controller; BDX-only and SKX-only blocks are marked, and some blocks are optional/parameterized. The FIU contains the FPGA Management Engine (FME: thermal monitor, power monitor, performance monitor, partial reconfiguration, global errors), the fabric, the IOMMU and device TLB, and CCI-P Port0 (SignalTap, UMsg, port reset, port errors), which connects through the PR unit to AFU 0.)

Refer to Table 2-2 for a list of platform capabilities.

Unified address space — Even though the FIU has three physical links going to the CPU, the AFU maintains a single view of the system address space. A write to address X directed over the coherent interface or PCIe goes to the same cache line in system memory.

Intel VT-d support — MCP/DCP with Intel FPGA IP has hardware support for memory isolation.

Partial Reconfiguration (PR) of AFU

PR uses Altera FPGA technology to allow a user to reconfigure parts of the FPGA device dynamically, while the remainder of the FPGA continues to operate. MCP/DCP with Intel FPGA IP supports one AFU.

Remote Debug — MCP/DCP with Intel FPGA IP enables remote access to the SignalTap II Logic Analyzer for in-system debug. Remote access makes it possible to use the SignalTap II Logic Analyzer over the network when physical access is not available, the expected debug usage in a data center environment.

Table 2-2. Comparison of Platform Capabilities

Capability                    Intel Xeon Processor E5 v4 Family with FPGA IP    Current MCP/DCP with Intel FPGA IP
Unified Address space         Yes                                               Yes
Intel VT-d support for AFU    No                                                Yes
Partial Reconfiguration       Yes                                               Yes
Remote Debug                  Yes                                               Yes
FPGA Cache size               64 KiB direct mapped                              128 KiB direct mapped

2.2 Development Model

The two supported AFU development models are Hardware Description Language (HDL) design and OpenCL design.

1. HDL design — This is the traditional FPGA development flow: users design an AFU in an HDL such as Verilog, SystemVerilog, or VHDL, adhering to the CCI-P interface specification, then compile the RTL through the Intel Quartus tool chain to generate an AFU bitstream.
2. OpenCL design — The OpenCL SDK is a framework for writing programs at a higher, C-like level of abstraction. Users develop an AFU in OpenCL C and compile it along with the MCP/DCP with Intel FPGA IP Board Support Package (BSP) to generate an FPGA bitstream and a software executable. For best performance, the OpenCL code must be optimized for the MCP/DCP with Intel FPGA IP platform.

Applications can even simultaneously utilize multiple distinct implementations of the same service API.

2.3 Memory Hierarchy

This section explains the memory hierarchy in the MCP/DCP with Intel FPGA IP system; refer to Figure 2-2. The green dotted box shows the multi-processor coherence domain. The FIU on the FPGA extends the coherence domain from the processor to the FPGA, encompassing a cache implemented on the FPGA (called the FPGA cache). The FIU implements a cache controller and a UPI Caching Agent (CA). The CA makes read and write requests to coherent system memory and services snoop requests to the FIU cache.

Figure 2-2. MCP/DCP with Intel FPGA IP System Memory Hierarchy, 1 Processor Topology

The CCI-P interface abstracts the physical links to the processor and provides simple load/store semantics to the AFU for accessing system memory. The physical links are presented as virtual channels on the CCI-P interface, and each request can select its virtual channel. The virtual channels are called VL0, VH0, and VH1. There is a fourth, VA (for Virtual Auto), where the FIU maps requests onto the three physical buses, optimizing for bandwidth. Refer to Table 2-3. The response header identifies which VC was selected by the FIU.

For a single-processor system, the AFU sees a three-level memory hierarchy: (1) FIU Cache, (2) Processor Last Level Cache (LLC), (3) DRAM. The memory access latency increases from (1) to (3). Note that the AFU accesses the 2nd- and 3rd-level memory along three independent paths, each with a different latency. Table 2-3 lists the different possible AFU memory read operations in increasing order of latency. Each row shows the request path; the node that services the request is highlighted in GREEN.

Table 2-3. AFU Memory Read Paths

Request                FPGA Cache                   Processor LLC    DRAM
FPGA Cache Hit         Hit (only applies to VL0)    -                -
Processor Cache Hit    Miss                         Hit              -
All Cache Miss         Miss                         Miss             Read

If you are still developing experience with the CCI-P interface, choose the VA channel. This channel is optimized for maximum bandwidth and producer-consumer type data flows. Refer to Section 3.12 for ordering rules. When you choose VA, the FIU decides how to steer each request to a physical link based on the caching hint, the data payload size, and link utilization:

- Cacheable requests will be biased towards the UPI link.
- 64B requests will be biased towards the UPI link. (A cache line is 64 bytes.)
- A multi-cache-line read/write will NOT be split; it is guaranteed to be processed by a single physical link.
- VA will attempt to balance the load across the virtual channels.

The cache is along the VL0 data path, and the VC steering decision is made before the cache lookup. You could incur a high memory latency if the requested cache line is cached in the FPGA but the request is steered to VH*; in that case, the processor must snoop the FPGA cache in order to complete the VH request.
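To make the request formatting concrete, the following is a minimal sketch of populating a C0 read request header using the CCI-P structures. The enum literals (eVC_VA, eREQ_RDLINE_I, eCL_LEN_4) and field names follow ccip_if_pkg.sv conventions but should be verified against your release of the package; rd_addr and rd_tag are hypothetical AFU-side signals.

```systemverilog
// Sketch: filling a C0 read request header per the VA guidance above.
// Assumed names: eVC_VA, eREQ_RDLINE_I, eCL_LEN_4, and the field names of
// t_ccip_c0_reqmemhdr -- check ccip_if_pkg.sv in your release.
t_ccip_c0_reqmemhdr rd_hdr;

always_comb begin
  rd_hdr          = '0;              // drive RSVD bits to 0, as required
  rd_hdr.vc_sel   = eVC_VA;          // let the FIU steer: best steady-state BW
  rd_hdr.req_type = eREQ_RDLINE_I;   // no intention to cache in the FPGA
  rd_hdr.cl_len   = eCL_LEN_4;       // 4-CL (256B) payload biases VA to PCIe
  rd_hdr.address  = rd_addr;         // CL-aligned physical address (42 bits)
  rd_hdr.mdata    = rd_tag;          // user tag, returned with the response
end
```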

3 CCI-P Interface

CCI-P provides access to two types of memory: main memory and input/output (I/O) memory.

Main Memory — The memory attached to the processor and exposed to the operating system. Requests from the AFU to main memory are called upstream requests. Subsequent to this section, main memory is referred to simply as memory.

I/O Memory — I/O memory is implemented within the I/O device, which in our case is the AFU. How this memory is implemented and organized is up to the AFU; it may choose flip-flops, M20Ks, or MLABs. The CCI-P interface defines a request format for accessing I/O memory using Memory Mapped I/O (MMIO) requests. Requests from the processor to I/O memory are called downstream requests. The AFU's MMIO address space is 256 KB.

Figure 3-1 shows all CCI-P signals, grouped into three Tx channels, two Rx channels, and some additional control signals.

Tx/Rx Channels — A grouping of signals that together completely defines a request or response. The flow direction is from the AFU's point of view: Tx flows from AFU to FIU; Rx flows from FIU to AFU.

Figure 3-1 reflects the organization shown in the files ccip_std_afu.sv and ccip_if_pkg.sv.

Figure 3-1. CCI-P Signals

3.1 Features

Table 3-1 summarizes the features unique to the CCI-P interface for the AFU.

Table 3-1. CCI-P Features Summary

Virtual Channels — Physical links are presented to the AFU as virtual channels. The AFU can select the virtual channel for each memory request.
    VL0 — Low latency virtual channel (mapped to UPI).
    VH0 — High latency virtual channel (mapped to PCIe0). Protocol efficiency is better for larger data payloads.
    VH1 — High latency virtual channel (mapped to PCIe1). Protocol efficiency is better for larger data payloads.
    VA — Virtual Auto: the FIU auto-selects the link based on link utilization, request caching hint, and payload size. Latency: expect to see high variance. BW: expect to see high steady-state BW.

Memory Request — AFU read/write to memory.
    Addressing Mode — Physical address.
    Address Width — 42 bits (CL address).
    Data Lengths — 64B, 128B, 256B.
    Byte Addressing — Not supported.
    FPGA Caching Hint — The AFU can ask the FIU to cache the CL in a specific state. For requests directed to VL0, the FIU attempts to cache the data in the requested state, given as a hint. Except for WrPush_I, cache hints on requests to VH0/1 are ignored. Note: The caching hint is only a hint and provides no guarantee of final cache state. Ignoring a cache hint impacts performance but does not impact functionality.
        <request>_i — No intention to cache.
        <request>_s — Desire to cache in shared (S) state.
        <request>_m — Desire to cache in modified (M) state.

MMIO Request — CPU read/write to AFU I/O memory.
    MMIO Read payload — 4B, 8B.
    MMIO Write payload — 4B, 8B, 64B. MMIO writes could be combined by the x86 write-combining buffer.

UMsg — Unordered Message from CPU to AFU.
    UMsg data payload — 64B.
    # UMsgs supported — 8 per AFU.

3.2 Signaling Information

All CCI-P signals must be synchronous to pclk. All signals are active high, unless explicitly mentioned; active low signals use the suffix _n. Intel recommends using the CCI-P structures defined inside the ccip_if_pkg.sv file, which is included in the RTL package.

All AFU output signals must be registered. AFU output bits marked RSVD are reserved and must be driven to 0. AFU output bits marked RSVD-DNC are don't-care bits; the AFU can drive either 0 or 1. All AFU input signals must also be registered. AFU input bits marked RSVD must be treated as don't care (X) by the AFU.

Code 3-1 shows the port map for the ccip_std_afu module. The AFU must be instantiated under it. The subsequent sections explain the interface signals.

Code 3-1. ccip_std_afu Port Map

module ccip_std_afu(
  // CCI-P Clocks and Resets
  input logic pclk,                      // 400MHz - CCI-P clock domain. Primary interface clock
  input logic pclkdiv2,                  // 200MHz - CCI-P clock domain
  input logic pclkdiv4,                  // 100MHz - CCI-P clock domain
  input logic uclk_usr,                  // User clock domain
  input logic uclk_usrdiv2,              // User clock domain. Half the programmed frequency
  input logic pck_cp2af_softreset,       // CCI-P ACTIVE HIGH Soft Reset
  input logic [1:0] pck_cp2af_pwrstate,  // CCI-P AFU Power State
  input logic pck_cp2af_error,           // CCI-P Protocol Error Detected

  // Interface structures
  input  t_if_ccip_rx pck_cp2af_srx,     // CCI-P Rx Port
  output t_if_ccip_tx pck_af2cp_stx      // CCI-P Tx Port
);

3.3 Read from/write to Main Memory

The AFU makes a memory read request to the FIU over Channel 0 (C0), using Tx signals, and receives the response over C0, using Rx signals. The AFU drives the C0 valid signal to indicate that the C0 Hdr contains a request. The c0_reqmemhdr structure provides a convenient mapping from flat bit-vector to read request fields. The req_type signal provides a cache hint (RDLINE_I, Invalid, or RDLINE_S, Shared). The mdata field is a user-defined request ID.
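The read flow above can be sketched in SystemVerilog as follows. This is a minimal sketch, not a complete design: rd_pending, rd_hdr, rd_tag, and rd_buffer are hypothetical AFU-side signals, and the response encoding name eRSP_RDLINE and the rspvalid field name are assumptions modeled on ccip_if_pkg.sv conventions.

```systemverilog
// Sketch: issuing a read on the C0 Tx channel and matching its response on
// the C0 Rx channel. The almost-full check gates new requests (see Table 3-2).
always_ff @(posedge pclk) begin
  if (pck_cp2af_softreset) begin
    pck_af2cp_stx.c0.valid <= 1'b0;
  end else begin
    // Issue one read when a request is pending and channel 0 is not almost full
    pck_af2cp_stx.c0.valid <= rd_pending && !pck_cp2af_srx.c0txalmfull;
    pck_af2cp_stx.c0.hdr   <= rd_hdr;   // request header populated elsewhere

    // Consume the response: mdata identifies which request completed
    if (pck_cp2af_srx.c0.rspvalid &&
        pck_cp2af_srx.c0.hdr.resp_type == eRSP_RDLINE) begin
      rd_buffer[pck_cp2af_srx.c0.hdr.mdata] <= pck_cp2af_srx.c0.data;
    end
  end
end
```

Because responses may return out of order, indexing the buffer by mdata (rather than by issue order) is the simplest way to reassociate data with requests.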

Then, the FIU responds over C0. The resp_type signal in the c0_rspmemhdr structure indicates the response type (Memory Read or UMsg Received). The data field in C0 contains the data that were read. The mdata field in the c0_rspmemhdr structure contains the same value that went out with the request.

The AFU makes a memory write request to the FIU over Channel 1 (C1), using Tx signals, and receives the response over C1, using Rx signals. The AFU drives the C1 valid signal to indicate that the C1 Hdr contains a request. The c1_reqmemhdr structure provides a convenient mapping from flat bit-vector to write request fields. The req_type signal provides the request type and cache hint.

Then, the FIU responds over C1 using Rx signals. The resp_type field in the c1_rspmemhdr structure indicates whether the response is for a memory write. The mdata field in the c1_rspmemhdr structure contains the same value that went out with the write request. Write memory requests need explicit synchronization using WrFence.

3.4 UMsg

UMsg provides the same functionality as a spin loop from the AFU, without burning CCI-P read bandwidth. Think of it as a spin loop optimization: a monitoring agent inside the FPGA cache controller watches for snoops to cache lines allocated by the driver. When it sees a snoop to such a cache line, it reads the data back and sends a UMsg to the AFU. The UMsg flow makes use of the cache coherency protocol to implement a high speed unordered messaging path from CPU to AFU.

This process consists of two stages, as shown in Figure 3-2. The first stage is initialization, where software pins the UMsg Address Space (UMAS) and shares the UMAS start address with the FPGA cache controller. Once this is done, the FPGA cache controller reads each cache line in the UMAS and puts it in the shared state in the FPGA cache. The second stage is actual usage, where the CPU writes to the UMAS. A CPU write to the UMAS generates a snoop to the FPGA cache.
The FPGA responds to the snoop and marks the line invalid. The CPU write request completes, and the data become globally visible. A snoop in the UMAS address range triggers the Monitoring Agent (MA), which in turn sends a read request to the CPU for the cache line and optionally sends a UMsg Hint (UMsgH) to the AFU. When the read request completes, a UMsg with 64B data is sent to the AFU.

Figure 3-2. UMsg Initialization and Usage Flow
(Flow: Initialization — software sets up the UMAS in pinned memory and informs the FPGA of the UMAS location. Usage — the CPU writes to the UMAS, causing a snoop to the FPGA; for ultra-low latency the snoop itself is used as a UMsgH; the FPGA then gets the read data, and the snoop plus read data is sent to the AFU as a UMsg with 64B data.)

Functionally, UMsg is equivalent to a spin loop or a monitor/mwait instruction on an Intel Xeon processor. Some key characteristics of UMsgs:

1. Just as spin loops to different addresses in a multi-threaded application have no relative ordering guarantee, UMsgs to different addresses have no ordering guarantee between them.
2. Not every CPU write to a UMAS CL results in a corresponding UMsg. The AFU may miss an intermediate change in the value of a CL, but it is guaranteed to see the newest data in the CL. Again, it helps to think of this like a spin loop: if the producer thread updates the flag CL multiple times, the polling thread may miss an intermediate value, but it is guaranteed to see the newest value. Here is an example usage: software updates to a descriptor queue pointer may be mapped to a UMsg. The pointer is always expected to increment. The UMsg guarantees that the AFU sees the final value of the pointer; it may miss intermediate updates to the pointer, which is acceptable.
3. The UMsg uses the FPGA cache; as a result it can cause cache pollution, a situation in which a program unnecessarily loads data into the cache and causes other needed data to be evicted, thus degrading performance.
4. Because the CPU may exhibit false snooping, UMsgH should be treated as a hint. That is, you can start speculative execution or a pre-fetch based on UMsgH, but you should wait for the UMsg before committing the results.
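In RTL, the AFU distinguishes UMsg traffic from memory read responses by the response type on the C0 Rx channel. The following is a hedged sketch: umsg_valid, umsgh_valid, and umsg_data are hypothetical AFU signals, eRSP_UMSG and the rspvalid field name are assumptions modeled on ccip_if_pkg.sv, and the hint decode is a placeholder to fill in from your release of the package.

```systemverilog
// Hypothetical decode: the UMsg header carries an indicator distinguishing a
// hint (UMsgH) from a full UMsg; its exact name/position is release-specific.
function automatic logic umsg_is_hint(input t_ccip_c0_rspmemhdr h);
  return 1'b0;  // placeholder -- decode per your ccip_if_pkg.sv
endfunction

always_ff @(posedge pclk) begin
  umsg_valid  <= 1'b0;
  umsgh_valid <= 1'b0;
  if (pck_cp2af_srx.c0.rspvalid) begin
    case (pck_cp2af_srx.c0.hdr.resp_type)
      eRSP_UMSG: begin
        if (umsg_is_hint(pck_cp2af_srx.c0.hdr))
          umsgh_valid <= 1'b1;                    // hint only: may speculate
        else begin
          umsg_valid <= 1'b1;                     // full UMsg: 64B payload valid
          umsg_data  <= pck_cp2af_srx.c0.data;
        end
      end
      default: ;  // memory read responses are handled elsewhere
    endcase
  end
end
```

Per characteristic 4 above, umsgh_valid should only trigger speculative work; commit waits for umsg_valid.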

5. The UMsg provides the same latency as AFU read polling using RdLine_S, but it saves CCI-P channel bandwidth, which can instead be used for read traffic.

3.5 MMIO Cycles to I/O Memory

MMIO Write requests are posted — the AFU must not return a response. MMIO Read requests are non-posted — the AFU must return a response.

Key points:
- Read data lengths supported: 4B, 8B.
- Write data lengths supported: 4B, 8B.
- The AFU must support 8B MMIO accesses to I/O memory and the register file. 4B accesses are optional and can be avoided by coordinating with the software application developer.
- Maximum outstanding MMIO read requests: 64.
- MMIO read request timeout value: 512 pclk cycles.
- Maximum MMIO request rate: 1 request per 2 pclk cycles.
- MMIO reads to undefined AFU registers should still return a response.

The FIU makes an MMIO read request to the AFU over C0, using Rx signals. mmiordvalid indicates that the C0 Hdr contains an MMIO read request. The c0_reqmmiohdr structure provides a convenient mapping from flat bit-vector to MMIO read request fields {address, length, tid}. Then, the AFU drives a response over C2 using Tx signals. The C2 signal mmiordvalid indicates that the C2 Hdr and data fields contain the MMIO read response. The c2_rspmmiohdr.tid field must match the tid provided in c0_reqmmiohdr.tid; this is used to match the response against the request. It is illegal to split an 8B MMIO read request into two 4B MMIO read responses.

The FIU makes an MMIO write request to the AFU over C0, using Rx signals. mmiowrvalid indicates that the c0_reqmmiohdr structure is an MMIO write request and contains the I/O address to be written. The C0 data field contains the data to be written. To generate 64B MMIO writes to the AFU, use AVX-512 writes on MCP/DCP and later processors. It is not feasible to guarantee 64B MMIO writes from earlier processors.
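The MMIO read handshake might look as follows in an AFU. This is a sketch under assumptions: the header cast and lowercase type names mirror ccip_if_pkg.sv conventions but should be verified, and scratch_reg at offset 0x20 is a purely illustrative CSR. Note that undefined registers still return a response, as required above.

```systemverilog
// Sketch: servicing MMIO reads over C2. The tid from the incoming request
// header must be echoed in the response so the FIU can match them.
always_ff @(posedge pclk) begin
  // Local view of the C0 Rx header as an MMIO request header
  t_ccip_c0_reqmmiohdr mmiohdr;
  mmiohdr = t_ccip_c0_reqmmiohdr'(pck_cp2af_srx.c0.hdr);

  pck_af2cp_stx.c2.mmiordvalid <= 1'b0;
  if (pck_cp2af_srx.c0.mmiordvalid) begin
    pck_af2cp_stx.c2.hdr.tid     <= mmiohdr.tid;  // echo tid: matches rsp to req
    pck_af2cp_stx.c2.mmiordvalid <= 1'b1;
    case (mmiohdr.address)
      16'h0020: pck_af2cp_stx.c2.data <= scratch_reg;  // hypothetical example CSR
      default:  pck_af2cp_stx.c2.data <= 64'h0;        // undefined regs still respond
    endcase
  end
end
```

Responding in a single cycle like this easily meets the 512-pclk timeout; designs with deeper register pipelines must still bound their response latency below it.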

3.6 CCI-P Tx Signals

Code 3-2. Tx Interface Structure Inside ccip_if_pkg.sv

    typedef struct packed {
        t_if_ccip_c0_tx c0;
        t_if_ccip_c1_tx c1;
        t_if_ccip_c2_tx c2;
    } t_if_ccip_tx;

There are three Tx channels. The C0 and C1 Tx channels are used for memory requests and provide independent flow control: C0 carries memory read requests and C1 carries memory write requests. The C2 Tx channel returns MMIO read responses to the FIU. The CCI-P port guarantees to accept responses on C2; therefore, it has no flow control.

Code 3-3. Tx Channel Structure Inside ccip_if_pkg.sv

    // Channel 0 : Memory Reads
    // Corresponding AlmostFull inside t_if_ccip_rx.c0txalmfull
    typedef struct packed {
        t_ccip_c0_reqmemhdr hdr;   // Request Header
        logic               valid; // Request Valid
    } t_if_ccip_c0_tx;

    // Channel 1 : Memory Writes
    // Corresponding AlmostFull inside t_if_ccip_rx.c1txalmfull
    typedef struct packed {
        t_ccip_c1_reqmemhdr hdr;   // Request Header
        t_ccip_cldata       data;  // Request Data
        logic               valid; // Request Wr Valid
    } t_if_ccip_c1_tx;

    // Channel 2 : MMIO Read response
    typedef struct packed {
        t_ccip_c2_rspmmiohdr hdr;         // Response Header
        logic                mmiordvalid; // Response Read Valid
        t_ccip_mmiodata      data;        // Response Data
    } t_if_ccip_c2_tx;

Each Tx channel has a valid signal that qualifies the corresponding header and data signals within the structure. Table 3-2 describes the signals that make up the CCI-P Tx interface.

Table 3-2. Tx Channel Description

pck_af2cp_stx.c0.hdr (74b, Output): Channel 0 request header. Refer to Table 3-3, Tx Header Field Definitions.
pck_af2cp_stx.c0.valid (1b, Output): When set to 1, indicates the channel 0 request header is valid.
pck_cp2af_srx.c0txalmfull (1b, Input): When set to 1, Tx Channel 0 is almost full. After this signal is set, the AFU is allowed to send a maximum of 8 requests. When set to 0, the AFU can start sending requests immediately.
pck_af2cp_stx.c1.hdr (80b, Output): Channel 1 request header. Refer to Table 3-3, Tx Header Field Definitions.
pck_af2cp_stx.c1.data (512b, Output): Channel 1 data.
pck_af2cp_stx.c1.valid (1b, Output): When set to 1, indicates the channel 1 request header and data are valid.
pck_cp2af_srx.c1txalmfull (1b, Input): When set to 1, Tx Channel 1 is almost full. After this signal is set, the AFU is allowed to send a maximum of 8 requests or data. When set to 0, the AFU can start sending requests immediately.
pck_af2cp_stx.c2.hdr (9b, Output): Channel 2 response header. Refer to Table 3-3, Tx Header Field Definitions.
pck_af2cp_stx.c2.mmiordvalid (1b, Output): When set to 1, indicates the Channel 2 response header and data are valid.
pck_af2cp_stx.c2.data (64b, Output): Channel 2 data; MMIO read data that the AFU returns to the FIU. For 4B reads, data must be driven on bits [31:0]. For 8B reads, the AFU must drive one 8B data response; the response cannot be split into two 4B responses.
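The almost-full protocol above (up to 8 more requests after almfull asserts) can be modeled as a small credit tracker. This is a hypothetical C sketch of that rule; the names are illustrative and the real behavior is cycle-accurate RTL:

```c
#include <stdbool.h>

/* Hypothetical model of CCI-P Tx flow control: once c0txalmfull (or
 * c1txalmfull) asserts, the AFU may issue at most 8 more requests on
 * that channel; once it deasserts, the AFU may send freely again. */
struct tx_credit { bool almfull; int sent_since_almfull; };

static void credit_update(struct tx_credit *c, bool almfull_now) {
    if (almfull_now && !c->almfull)
        c->sent_since_almfull = 0;      /* almfull just asserted */
    c->almfull = almfull_now;
}

static bool credit_can_send(const struct tx_credit *c) {
    return !c->almfull || c->sent_since_almfull < 8;
}

static void credit_on_send(struct tx_credit *c) {
    if (c->almfull)
        c->sent_since_almfull++;        /* consume one of the 8 slots */
}
```

A typical AFU implements this as a counter gating its request-issue state machine.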

3.7 Tx Header Format

Table 3-3. Tx Header Field Definitions

mdata: Metadata: user-defined request ID that is returned unmodified from the request header to the response header. For multi-CL writes on C1 Tx, mdata is only valid for the header with sop=1.
tid: Transaction ID: the AFU must return the tid from the MMIO read request header in the response header. It is used to match the response against the request.
vc_sel: Virtual channel selected.
  2'h0 VA
  2'h1 VL0
  2'h2 VH0
  2'h3 VH1
  All CLs that form a multi-CL write request are routed over the same virtual channel (VC).
req_type: Request types listed in Table 3-4.
sop: Start of Packet, for multi-CL memory writes.
  1'b1 marks the first header. Must write in increasing address order.
  1'b0 marks subsequent headers.
cl_len: Length for memory requests.
  2'b00 64B
  2'b01 128B
  2'b11 256B
address: 64B-aligned physical address, that is, byte_address>>6. The address must be naturally aligned with regard to the cl_len field. For example, for cl_len=2'b01 the address must be divisible by 128B; similarly, for cl_len=2'b11 the address must be divisible by 256B.
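The cl_len encoding and natural-alignment rule above can be checked with a couple of helper functions. A minimal C sketch, assuming the CL address is the 64B-aligned address (byte_address >> 6); the helper names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* cl_len encoding per Table 3-3: 2'b00 -> 1 CL, 2'b01 -> 2 CLs,
 * 2'b11 -> 4 CLs (2'b10 is not a legal encoding). */
static unsigned cl_len_to_cls(unsigned cl_len) {
    return cl_len + 1;   /* 0 -> 1, 1 -> 2, 3 -> 4 */
}

/* Natural alignment: a 2-CL request must start on a 2-CL boundary and
 * a 4-CL request on a 4-CL boundary, i.e. the CL address must be
 * divisible by the number of CLs in the payload. */
static bool addr_naturally_aligned(uint64_t cl_address, unsigned cl_len) {
    return (cl_address % cl_len_to_cls(cl_len)) == 0;
}
```

For instance, CL address 0x1040 is legal for any cl_len, while 0x1042 is legal for a 2-CL request but not a 4-CL request.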

Table 3-4. Tx Request Encodings and Header Formats

t_if_ccip_c0_tx: enum t_ccip_c0_req
  ereq_rdline_i (4'h0, no data): Memory read request with no intention to cache. C0 Memory Request Header; refer to Table 3-5.
  ereq_rdline_s (4'h1, no data): Memory read request with caching hint set to Shared. C0 Memory Request Header; refer to Table 3-5.

t_if_ccip_c1_tx: enum t_ccip_c1_req
  ereq_wrline_i (4'h0, data): Memory write request with no intention of keeping the data in the FPGA cache. C1 Memory Request Header; refer to Table 3-6.
  ereq_wrline_m (4'h1, data): Memory write request with caching hint set to Modified. C1 Memory Request Header; refer to Table 3-6.
  ereq_wrpush_i (4'h2, data): Memory write request with caching hint set to Invalid. The FIU writes the data into the processor's last level cache (LLC) with no intention of keeping the data in the FPGA cache. The LLC it writes to is always the LLC associated with the processor where the DRAM address is homed. C1 Memory Request Header; refer to Table 3-6.
  ereq_wrfence (4'h4, no data): Memory write fence. Fence Header; refer to Table 3-7.

t_if_ccip_c2_tx does not have a request type field.
  MMIO Rd (N.A., data): MMIO read response. MMIO Read Response Header; refer to Table 3-8.

All unused encodings are considered reserved.

Table 3-5. C0 Read Memory Request Header Format
Structure: t_ccip_c0_reqmemhdr

  Bit #    Bits  Field
  [73:72]  2     vc_sel
  [71:70]  2     RSVD
  [69:68]  2     cl_len
  [67:64]  4     req_type
  [63:58]  6     RSVD
  [57:16]  42    address
  [15:0]   16    mdata
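The bit layout of Table 3-5 can be exercised with a software packer. A hedged C sketch, assuming the GCC/Clang __int128 extension so the 74-bit header fits in one value (the function names are hypothetical; a real AFU builds this header as a SystemVerilog packed struct):

```c
#include <stdint.h>

/* Hypothetical packer for the 74-bit C0 read request header of
 * Table 3-5: vc_sel[73:72], cl_len[69:68], req_type[67:64],
 * address[57:16], mdata[15:0]. */
typedef unsigned __int128 u128;

static u128 pack_c0_reqmemhdr(unsigned vc_sel, unsigned cl_len,
                              unsigned req_type, uint64_t address,
                              unsigned mdata) {
    u128 h = 0;
    h |= (u128)(vc_sel   & 0x3)                << 72; /* [73:72] vc_sel   */
    h |= (u128)(cl_len   & 0x3)                << 68; /* [69:68] cl_len   */
    h |= (u128)(req_type & 0xF)                << 64; /* [67:64] req_type */
    h |= (u128)(address & ((1ULL << 42) - 1))  << 16; /* [57:16] address  */
    h |= (u128)(mdata    & 0xFFFF);                   /* [15:0]  mdata    */
    return h;
}

static unsigned unpack_mdata(u128 h)   { return (unsigned)(h & 0xFFFF); }
static uint64_t unpack_address(u128 h) { return (uint64_t)((h >> 16) & ((1ULL << 42) - 1)); }
```

A round trip through the packer is a quick check that a field map matches the table.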

Table 3-6. C1 Write Memory Request Header Format
Structure: t_ccip_c1_reqmemhdr

  Bit #    Bits  Field (SOP=1)  Field (SOP=0)
  [79:74]  6     RSVD           RSVD
  [73:72]  2     vc_sel         RSVD-DNC
  [71]     1     sop=1          sop=0
  [70]     1     RSVD           RSVD
  [69:68]  2     cl_len         RSVD-DNC
  [67:64]  4     req_type       req_type
  [63:58]  6     RSVD           RSVD
  [57:18]  40    address        RSVD-DNC
  [17:16]  2     address        address
  [15:0]   16    mdata          RSVD-DNC

Table 3-7. C1 Fence Header Format
Structure: t_ccip_c1_reqfencehdr

  Bit #    Bits  Field
  [79:74]  6     RSVD
  [73:72]  2     vc_sel
  [71:68]  4     RSVD
  [67:64]  4     req_type
  [63:16]  48    RSVD
  [15:0]   16    mdata

Table 3-8. C2 MMIO Response Header Format

  Bit #  Bits  Field
  [8:0]  9     tid

3.8 CCI-P Rx Signals

Code 3-4. Rx Interface Structure Inside ccip_if_pkg.sv

    typedef struct packed {
        logic c0txalmfull; // C0 Request Channel Almost Full
        logic c1txalmfull; // C1 Request Channel Almost Full
        t_if_ccip_c0_rx c0;
        t_if_ccip_c1_rx c1;
    } t_if_ccip_rx;

There are two Rx channels. Channel 0 interleaves memory responses, MMIO requests, and UMsgs. Channel 1 returns responses for AFU requests initiated on Tx Channel 1. The c0txalmfull and c1txalmfull signals are inputs to the AFU; although they are declared within the Rx signal structure, they logically belong to the Tx interface and were therefore described in the previous section.

Rx channels have no flow control. The AFU must accept responses for memory requests it generated, and must pre-allocate buffers before generating a memory request. The AFU must also accept MMIO requests.

Code 3-5. Rx Channel Structure Inside ccip_if_pkg.sv

    // Channel 0 : Memory Read responses, MMIO requests, UMsgs
    typedef struct packed {
        t_ccip_c0_rspmemhdr hdr;         // Response/Request Header
        t_ccip_cldata       data;        // Response Data
        logic               resp_valid;  // Memory Response Valid
        logic               mmiordvalid; // MMIO Read Request Valid
        logic               mmiowrvalid; // MMIO Write Request Valid
    } t_if_ccip_c0_rx;

    // Channel 1 : Memory Write responses
    typedef struct packed {
        t_ccip_c1_rspmemhdr hdr;       // Response Header
        logic               respvalid; // Response Valid
    } t_if_ccip_c1_rx;

Rx Channel 0 has separate valid signals for memory responses and MMIO requests; only one of these valid signals can be set in a cycle. MMIO requests have separate valid signals for MMIO read and MMIO write. When either mmiordvalid or mmiowrvalid is set, the message is an MMIO request and should be processed by casting t_if_ccip_c0_rx.hdr to t_ccip_c0_reqmmiohdr.

Table 3-9. Rx Channel Signal Description

pck_cp2af_srx.c0.hdr (28b, Input): Channel 0 response header or MMIO request header. Refer to Table 3-10, Rx Header Field Definitions.
pck_cp2af_srx.c0.data (512b, Input): Channel 0 data bus.
  Memory read response and UMsg: returns 64B data.
  MMIO write request: for a 4B write, data is driven on bits [31:0]; for an 8B write, data is driven on bits [63:0].
pck_cp2af_srx.c0.resp_valid (1b, Input): When set to 1, indicates the header and data on Channel 0 are valid. The header must be interpreted as a memory response; decode the resp_type field.
pck_cp2af_srx.c0.mmiordvalid (1b, Input): When set to 1, indicates an MMIO read request on Channel 0.
pck_cp2af_srx.c0.mmiowrvalid (1b, Input): When set to 1, indicates an MMIO write request on Channel 0.
pck_cp2af_srx.c1.hdr (28b, Input): Channel 1 response header. Refer to Table 3-10, Rx Header Field Definitions.
pck_cp2af_srx.c1.respvalid (1b, Input): When set to 1, indicates the header on Channel 1 is a valid response.
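The Rx Channel 0 decode rule (at most one valid per cycle, with the MMIO valids selecting the MMIO interpretation of the header) can be sketched as a small dispatcher. A hypothetical C model; the enum and function names are illustrative:

```c
#include <stdbool.h>

/* Hypothetical decoder for Rx Channel 0: the protocol guarantees at
 * most one of the three valid signals is set in a cycle. */
enum c0_rx_kind { C0_IDLE, C0_MEM_RSP, C0_MMIO_RD, C0_MMIO_WR };

static enum c0_rx_kind decode_c0_rx(bool resp_valid,
                                    bool mmiordvalid,
                                    bool mmiowrvalid) {
    if (resp_valid)  return C0_MEM_RSP;  /* decode the resp_type field    */
    if (mmiordvalid) return C0_MMIO_RD;  /* cast hdr to t_ccip_c0_reqmmiohdr */
    if (mmiowrvalid) return C0_MMIO_WR;  /* cast hdr to t_ccip_c0_reqmmiohdr */
    return C0_IDLE;
}
```

In RTL this is typically a priority-free mux, since the valids are mutually exclusive by contract.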

3.8.1 Rx Header and Rx Data Format

Table 3-10. Rx Header Field Definitions

mdata: Metadata: user-defined request ID, returned unmodified from the memory request header to the response header. For a multi-CL memory response, the same mdata is returned for each CL.
vc_used: Virtual channel used. When using VA, this field identifies the virtual channel selected for the request by the FIU. For other VCs it returns the request VC.
format: When using multi-CL memory write requests, the FIU may return a single response for the entire payload or a response per CL in the payload.
  1'b0 Unpacked write response: returns a response per CL. Look up the cl_num field to identify the cache line. NOTE: Unpacked write responses do not occur with MPF, as responses to the AFU are always packed.
  1'b1 Packed write response: returns a single response for the entire payload. The cl_num field gives the payload size, that is, 1 CL, 2 CLs, or 4 CLs.
cl_num:
  format=0: For a response with a >1 CL data payload, this field identifies the CL number.
    2'h0 1st CL (lowest address)
    2'h1 2nd CL
    2'h3 4th CL (highest address)
    Responses may be returned out of order.
  format=1: This field identifies the data payload size.
    2'h0 1 CL or 64B
    2'h1 2 CLs or 128B
    2'h3 4 CLs or 256B
hit_miss: Cache hit/miss status. The AFU can use this to generate fine-grained hit/miss statistics for various modules.
  1'h0 Cache Miss
  1'h1 Cache Hit
MMIO Length: Length for MMIO requests.
  2'h0 4B
  2'h1 8B
MMIO Address: Double word (DWORD) aligned MMIO address offset, that is, byte_address>>2.
UMsg ID: Identifies the CL corresponding to the UMsg.
UMsg Type: Two types of UMsg are supported.
  1'b1 UMsgH (hint) without data
  1'b0 UMsg with data
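The format/cl_num pairing above determines how many cache lines a single write response acknowledges. A hedged C sketch of that interpretation (helper names are hypothetical):

```c
#include <stdbool.h>

/* cl_num payload-size encoding when format=1, per Table 3-10:
 * 2'h0 -> 1 CL, 2'h1 -> 2 CLs, 2'h3 -> 4 CLs. */
static unsigned cl_num_to_cls(unsigned cl_num) {
    return cl_num + 1;
}

/* Number of cache lines acknowledged by one write response:
 * a packed response (format=1) covers the whole payload, while an
 * unpacked response (format=0) covers exactly one CL. */
static unsigned cls_acked(bool format, unsigned cl_num) {
    return format ? cl_num_to_cls(cl_num) : 1;
}
```

An AFU tracking outstanding writes would decrement its counter by cls_acked per response rather than by one.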

Table 3-11. AFU Rx Response Encodings and Channel Mappings

t_if_ccip_c0_rx: enum t_ccip_c0_rsp
  ersp_rdline (4'h0, data): Memory read response. Memory Response Header; refer to Table 3-12. Qualified with c0.resp_valid.
  MMIO Read (N.A., no data): MMIO Request Header; refer to Table 3-13. Qualified with c0.mmiordvalid.
  MMIO Write (N.A., data): MMIO Request Header; refer to Table 3-13. Qualified with c0.mmiowrvalid.
  ersp_umsg (4'h4, with or without data): UMsg Response Header; refer to Table 3-15. Qualified with c0.resp_valid.

t_if_ccip_c1_rx: enum t_ccip_c1_rsp
  ersp_wrline (4'h0, no data): Memory write response. Memory Response Header; refer to Table 3-14. Qualified with c1.respvalid.
  ersp_wrfence (4'h4, no data): Write fence response. WrFence Response Header; refer to Table 3-16. Qualified with c1.respvalid.

Table 3-12. C0 Memory Read Response Header Format
Structure: t_ccip_c0_rspmemhdr

  Bit #    Bits  Field
  [27:26]  2     vc_used
  [25]     1     RSVD
  [24]     1     hit_miss
  [23:22]  2     RSVD
  [21:20]  2     cl_num
  [19:16]  4     resp_type
  [15:0]   16    mdata

Table 3-13. MMIO Request Header Format

  Bit #    Bits  Field
  [27:12]  16    address
  [11:10]  2     length
  [9]      1     RSVD
  [8:0]    9     tid

Table 3-14. C1 Memory Write Response Header Format
Structure: t_ccip_c1_rspmemhdr

  Bit #    Bits  Field
  [27:26]  2     vc_used
  [25]     1     RSVD
  [24]     1     hit_miss
  [23]     1     format
  [22]     1     RSVD
  [21:20]  2     cl_num
  [19:16]  4     resp_type
  [15:0]   16    mdata

Table 3-15. UMsg Header Format

  Bit #    Bits  Field
  [27:20]  8     RSVD
  [19:16]  4     resp_type
  [15]     1     UMsg Type
  [14:3]   12    RSVD
  [2:0]    3     UMsg ID

Table 3-16. WrFence Header Format
Structure: t_ccip_c1_rspfencehdr

  Bit #    Bits  Field
  [27:20]  8     RSVD
  [19:16]  4     resp_type
  [15:0]   16    mdata

3.9 Multi-Cache Line Memory Requests

To achieve the highest link efficiency, pack memory requests into large transfer sizes using multi-CL requests. Multi-CL memory requests have the following characteristics:
- The highest memory bandwidth is achieved with a data payload of 4 CLs.
- A multi-CL memory write request must always begin with the lowest address. SOP=1 in c1_reqmemhdr marks the first CL, and all subsequent headers in the multi-CL request must drive the corresponding CL address.
- An N-CL memory write request takes N cycles on Channel 1. It is legal to have bubbles between the cycles that form a multi-CL request, but one request cannot be interleaved with another request. It is illegal to start a new request without completing the entire data payload of a multi-CL write request.
- The FIU guarantees to complete multi-CL VA requests on a single VC.

The memory request address must be naturally aligned. A 2-CL request must start on a 2-CL boundary; its CL address must be divisible by 2. A 4-CL request must be aligned on a 4-CL boundary; its CL address must be divisible by 4.

Figure 3-3 is an example of a multi-CL memory write request.

Figure 3-3. Multi-CL Memory Request
[Waveform: pck_af2cp_stx.c1.valid qualifies data beats D0 through D8 over successive pclk cycles, carrying WrLine_I and WrLine_M requests across the VA, VL0, VH0, and VH1 virtual channels. Within each request, sop is 1 on the first beat only, cl_len and vc_sel are held, addr[41:2] identifies the request (h1040, h1041, h1043, h1044), addr[1:0] increments per beat, and mdata (h10 through h13) is driven with the sop=1 header.]

Figure 3-4 is an example of memory write response cycles. For unpacked responses, the individual CLs may return out of order.

Figure 3-4. Multi-CL Memory Write Responses
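The beat-by-beat rules above (sop on the first header only, incrementing CL addresses, mdata meaningful only with sop=1) can be sketched as a header generator. A hypothetical C model; the struct and function names are illustrative, not from the spec:

```c
#include <stdint.h>

/* Hypothetical model of the per-beat headers of a multi-CL write.
 * cl_len uses the CCI-P encoding (0 -> 1 CL, 1 -> 2 CLs, 3 -> 4 CLs). */
struct c1_beat {
    int      sop;        /* 1 on the first beat only          */
    uint64_t cl_address; /* increments by one CL per beat     */
    unsigned cl_len;     /* driven on every beat              */
    unsigned mdata;      /* meaningful only when sop=1        */
};

static unsigned make_multi_cl_write(struct c1_beat *beats,
                                    uint64_t base_cl_address,
                                    unsigned cl_len, unsigned mdata) {
    unsigned n = cl_len + 1;   /* number of beats in the request */
    for (unsigned i = 0; i < n; i++) {
        beats[i].sop        = (i == 0);
        beats[i].cl_address = base_cl_address + i;
        beats[i].cl_len     = cl_len;
        beats[i].mdata      = mdata;
    }
    return n;
}
```

The base address passed in must already satisfy the natural-alignment rule for the chosen cl_len.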

Figure 3-5 is an example of a memory read response cycle. A read response can be reordered within itself; that is, there is no guaranteed ordering between the individual CLs of a multi-CL read. All CLs within a multi-CL response have the same mdata and the same vc_used; the individual CLs of a multi-CL read are identified using the cl_num field.

Figure 3-5. Multi-CL Memory Read Responses

3.10 Additional Control Signals

Unless otherwise mentioned, all signals are active high.

Table. Clock and Reset

pck_cp2af_softreset (1b, Input): Synchronous, ACTIVE HIGH soft reset. When set to 1, the AFU must reset all logic. The minimum reset pulse width is 256 pclk cycles. All outstanding CCI-P requests are flushed before soft reset is de-asserted. A soft reset does not reset the FIU.
pclk (1b, Input): Primary interface clock. All CCI-P interface signals are synchronous to this clock. The clock frequency is listed in Section
pclkdiv2 (1b, Input): Synchronous and in phase with pclk; 0.5x the clock frequency.
pclkdiv4 (1b, Input): Synchronous and in phase with pclk; 0.25x the clock frequency.

uclk_usr (1b, Input): The user-defined clock, not synchronous with pclk. The AFU must synchronize signals to the pclk domain before driving the CCI-P interface. Default frequency is MHz. The Intel Quartus partial reconfiguration flow does not allow PLLs to be instantiated in the reconfigurable region (that is, the AFU). The AFU load utility programs the user-defined clock frequency before de-asserting pck_cp2af_softreset.
uclk_usrdiv2 (1b, Input): Synchronous with uclk_usr and 0.5x its frequency.
pck_cp2af_pwrstate (2b, Input): Indicates the current AFU power state request. In response, the AFU must attempt to reduce its power consumption. If sufficient power reduction is not achieved, the AFU may be reset.
  2'h0 AP0: Normal operation mode
  2'h1 AP1: Request for 50% power reduction
  2'h2 Reserved, illegal
  2'h3 AP2: Request for 90% power reduction
  When pck_cp2af_pwrstate is set to AP1, the FIU starts throttling the memory request path to achieve a 50% throughput reduction. The AFU is likewise expected to reduce its power utilization to 50% by throttling back accesses to FPGA internal memory resources and its compute engines. Similarly, upon transition to AP2, the FIU throttles the memory request paths to achieve a 90% throughput reduction relative to the normal state, and the AFU in turn is expected to reduce its power utilization to 90%.
pck_cp2af_error (1b, Input): A CCI-P protocol error has been detected and logged in the PORT Error register. This register is visible to the AFU and can be used as a trigger for signal taps. When such an error is detected, the CCI-P interface stops accepting new requests and sets AlmFull to 1. There is no expectation that outstanding requests complete. The AFU is not reset.


AN 690: PCI Express DMA Reference Design for Stratix V Devices

AN 690: PCI Express DMA Reference Design for Stratix V Devices AN 690: PCI Express DMA Reference Design for Stratix V Devices an690-1.0 Subscribe The PCI Express Avalon Memory-Mapped (Avalon-MM) DMA Reference Design highlights the performance of the Avalon-MM 256-Bit

More information

Hardware-Assisted Mediated Pass-Through with VFIO. Kevin Tian Principal Engineer, Intel

Hardware-Assisted Mediated Pass-Through with VFIO. Kevin Tian Principal Engineer, Intel Hardware-Assisted Mediated Pass-Through with VFIO Kevin Tian Principal Engineer, Intel 1 Legal Disclaimer No license (express or implied, by estoppel or otherwise) to any intellectual property rights is

More information

PCI Express Multi-Channel DMA Interface

PCI Express Multi-Channel DMA Interface 2014.12.15 UG-01160 Subscribe The PCI Express DMA Multi-Channel Controller Example Design provides multi-channel support for the Stratix V Avalon Memory-Mapped (Avalon-MM) DMA for PCI Express IP Core.

More information

Intel Compute Card Slot Design Overview

Intel Compute Card Slot Design Overview + Intel Compute Card Slot Design Overview Revision Number 1.1 May 14, 2018 Disclaimer You may not use or facilitate the use of this document in connection with any infringement or other legal analysis

More information

PCI Express*: Migrating to Intel Stratix 10 Devices for the Avalon Streaming Interface

PCI Express*: Migrating to Intel Stratix 10 Devices for the Avalon Streaming Interface PCI Express*: Migrating to Intel Stratix 10 Devices for the Avalon Streaming Interface AN791 2017.05.08 Last updated for Intel Quartus Prime Design Suite: Quartus Prime Pro v17.1 Stratix 10 Editions Subscribe

More information

6th Generation Intel Core Processor Series

6th Generation Intel Core Processor Series 6th Generation Intel Core Processor Series Application Power Guidelines Addendum Supporting the 6th Generation Intel Core Processor Series Based on the S-Processor Lines August 2015 Document Number: 332854-001US

More information

Intel Speed Select Technology Base Frequency - Enhancing Performance

Intel Speed Select Technology Base Frequency - Enhancing Performance Intel Speed Select Technology Base Frequency - Enhancing Performance Application Note April 2019 Document Number: 338928-001 You may not use or facilitate the use of this document in connection with any

More information

SerialLite III Streaming IP Core Design Example User Guide for Intel Arria 10 Devices

SerialLite III Streaming IP Core Design Example User Guide for Intel Arria 10 Devices IP Core Design Example User Guide for Intel Arria 10 Devices Updated for Intel Quartus Prime Design Suite: 17.1 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Quick Start

More information

Enhanced Serial Peripheral Interface (espi)

Enhanced Serial Peripheral Interface (espi) Enhanced Serial Peripheral Interface (espi) Addendum for Server Platforms December 2013 Revision 0.7 329957 0BIntroduction Intel hereby grants you a fully-paid, non-exclusive, non-transferable, worldwide,

More information

NVDIMM DSM Interface Example

NVDIMM DSM Interface Example Revision 1.3 December 2016 See the change bars associated with the following changes to this document: 1) Common _DSMs supported by all NVDIMMs have been removed from this document. 2) Changes to SMART

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

Creating PCI Express Links in Intel FPGAs

Creating PCI Express Links in Intel FPGAs Creating PCI Express Links in Intel FPGAs Course Description This course provides all necessary theoretical and practical know how to create PCI Express links in Intel FPGAs. The course goes into great

More information

Intel Visual Compute Accelerator Product Family

Intel Visual Compute Accelerator Product Family Intel Visual Compute Accelerator Product Family Release Notes for 2.2 release Rev 1.0 July 2018 Intel Server Products and Solutions Intel Visual Compute Accelerator Release Notes Document

More information

Intel Omni-Path Fabric Manager GUI Software

Intel Omni-Path Fabric Manager GUI Software Intel Omni-Path Fabric Manager GUI Software Release Notes for V10.7 Rev. 1.0 April 2018 Order No.: J95968-1.0 You may not use or facilitate the use of this document in connection with any infringement

More information

SerialLite III Streaming IP Core Design Example User Guide for Intel Stratix 10 Devices

SerialLite III Streaming IP Core Design Example User Guide for Intel Stratix 10 Devices SerialLite III Streaming IP Core Design Example User Guide for Intel Stratix 10 Devices Updated for Intel Quartus Prime Design Suite: 17.1 Stratix 10 ES Editions Subscribe Send Feedback Latest document

More information

Intel Stratix 10 Low Latency 40G Ethernet Design Example User Guide

Intel Stratix 10 Low Latency 40G Ethernet Design Example User Guide Intel Stratix 10 Low Latency 40G Ethernet Design Example User Guide Updated for Intel Quartus Prime Design Suite: 18.1 Subscribe Latest document on the web: PDF HTML Contents Contents 1. Quick Start Guide...

More information

Intel True Scale Fabric Switches Series

Intel True Scale Fabric Switches Series Intel True Scale Fabric Switches 12000 Series Doc. Number: H70235 Revision: 001US No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Intel Storage System JBOD 2000S3 Product Family

Intel Storage System JBOD 2000S3 Product Family Intel Storage System JBOD 2000S3 Product Family SCSI Enclosure Services Programming Guide SES Version 3.0, Revision 1.8 Apr 2017 Intel Server Boards and Systems Headline

More information

Low Latency 100G Ethernet Intel Stratix 10 FPGA IP Design Example User Guide

Low Latency 100G Ethernet Intel Stratix 10 FPGA IP Design Example User Guide Low Latency 100G Ethernet Intel Stratix 10 FPGA IP Design Example User Guide Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents

More information

Intel Unite Solution Intel Unite Plugin for WebEx*

Intel Unite Solution Intel Unite Plugin for WebEx* Intel Unite Solution Intel Unite Plugin for WebEx* Version 1.0 Legal Notices and Disclaimers All information provided here is subject to change without notice. Contact your Intel representative to obtain

More information

RapidIO TM Interconnect Specification Part 7: System and Device Inter-operability Specification

RapidIO TM Interconnect Specification Part 7: System and Device Inter-operability Specification RapidIO TM Interconnect Specification Part 7: System and Device Inter-operability Specification Rev. 1.3, 06/2005 Copyright RapidIO Trade Association RapidIO Trade Association Revision History Revision

More information

I/O virtualization. Jiang, Yunhong Yang, Xiaowei Software and Service Group 2009 虚拟化技术全国高校师资研讨班

I/O virtualization. Jiang, Yunhong Yang, Xiaowei Software and Service Group 2009 虚拟化技术全国高校师资研讨班 I/O virtualization Jiang, Yunhong Yang, Xiaowei 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Crusoe Processor Model TM5800

Crusoe Processor Model TM5800 Model TM5800 Crusoe TM Processor Model TM5800 Features VLIW processor and x86 Code Morphing TM software provide x86-compatible mobile platform solution Processors fabricated in latest 0.13µ process technology

More information

PCI-X Protocol Addendum to the PCI Local Bus Specification Revision 2.0a

PCI-X Protocol Addendum to the PCI Local Bus Specification Revision 2.0a PCI-X Protocol Addendum to the PCI Local Bus Specification Revision 2.0a July 29, 2002July 22, 2003 REVISION REVISION HISTORY DATE 1.0 Initial release. 9/22/99 1.0a Clarifications and typographical corrections.

More information

Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015

Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015 Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015 What is Intel Processor Trace? Intel Processor Trace (Intel PT) provides hardware a means to trace branching, transaction, and timing information

More information

Intel Unite Solution. Plugin Guide for Protected Guest Access

Intel Unite Solution. Plugin Guide for Protected Guest Access Intel Unite Solution Plugin Guide for Protected Guest Access June 2016 Legal Disclaimers & Copyrights All information provided here is subject to change without notice. Contact your Intel representative

More information

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327281-001US

More information

ENVISION TECHNOLOGY CONFERENCE. Functional intel (ia) BLA PARTHAS, INTEL PLATFORM ARCHITECT

ENVISION TECHNOLOGY CONFERENCE. Functional intel (ia) BLA PARTHAS, INTEL PLATFORM ARCHITECT ENVISION TECHNOLOGY CONFERENCE Functional Safety @ intel (ia) BLA PARTHAS, INTEL PLATFORM ARCHITECT Legal Notices & Disclaimers This document contains information on products, services and/or processes

More information

OpenCL* Device Fission for CPU Performance

OpenCL* Device Fission for CPU Performance OpenCL* Device Fission for CPU Performance Summary Device fission is an addition to the OpenCL* specification that gives more power and control to OpenCL programmers over managing which computational units

More information

PCI-SIG ENGINEERING CHANGE NOTICE

PCI-SIG ENGINEERING CHANGE NOTICE PCI-SIG ENGINEERING CHANGE NOTICE TITLE: Lightweight Notification (LN) Protocol DATE: Introduced: Jan 27, 2009; Last Updated Oct 2, 2011 Protocol Workgroup Final Approval: October 6, 2011 AFFECTED DOCUMENT:

More information

Intel MAX 10 User Flash Memory User Guide

Intel MAX 10 User Flash Memory User Guide Intel MAX 10 User Flash Memory User Guide Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1. Intel MAX 10 User Flash Memory

More information

Intel X48 Express Chipset Memory Controller Hub (MCH)

Intel X48 Express Chipset Memory Controller Hub (MCH) Intel X48 Express Chipset Memory Controller Hub (MCH) Specification Update March 2008 Document Number: 319123-001 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH

More information

Intel Visual Compute Accelerator Product Family

Intel Visual Compute Accelerator Product Family Intel Visual Compute Accelerator Product Family Release Notes for 2.1 release Rev 1.0 May 2018 Intel Server Products and Solutions Document Revision History Date Revision Changes May 2018

More information

LogiCORE IP AXI DMA v6.01.a

LogiCORE IP AXI DMA v6.01.a LogiCORE IP AXI DMA v6.01.a Product Guide Table of Contents SECTION I: SUMMARY IP Facts Chapter 1: Overview Typical System Interconnect......................................................... 8 Operating

More information

Intel s Architecture for NFV

Intel s Architecture for NFV Intel s Architecture for NFV Evolution from specialized technology to mainstream programming Net Futures 2015 Network applications Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Jomar Silva Technical Evangelist

Jomar Silva Technical Evangelist Jomar Silva Technical Evangelist Agenda Introduction Intel Graphics Performance Analyzers: what is it, where do I get it, and how do I use it? Intel GPA with VR What devices can I use Intel GPA with and

More information

Intel Firmware Support Package (Intel FSP) for Intel Xeon Processor D Product Family (formerly Broadwell-DE), Gold 001

Intel Firmware Support Package (Intel FSP) for Intel Xeon Processor D Product Family (formerly Broadwell-DE), Gold 001 Intel Firmware Support Package (Intel FSP) for Intel Xeon Processor D Product Family (formerly Broadwell-DE), Gold 001 Release Notes February 2016 You may not use or facilitate the use of this document

More information

Interlaken Look-Aside Protocol Definition

Interlaken Look-Aside Protocol Definition Interlaken Look-Aside Protocol Definition Contents Terms and Conditions This document has been developed with input from a variety of companies, including members of the Interlaken Alliance, all of which

More information

i960 VH Embedded-PCI Processor

i960 VH Embedded-PCI Processor i960 VH Embedded-PCI Processor Specification Update November 1998 Notice: The 80960VH may contain design defects or errors known as errata. Characterized errata that may cause 80960VH s behavior to deviate

More information

High Bandwidth Memory (HBM2) Interface Intel FPGA IP Design Example User Guide

High Bandwidth Memory (HBM2) Interface Intel FPGA IP Design Example User Guide High Bandwidth Memory (HBM2) Interface Intel FPGA IP Design Example Updated for Intel Quartus Prime Design Suite: 18.1.1 Subscribe Latest document on the web: PDF HTML Contents Contents 1. High Bandwidth

More information

Intel Unite Plugin for Logitech GROUP* and Logitech CONNECT* Devices INSTALLATION AND USER GUIDE

Intel Unite Plugin for Logitech GROUP* and Logitech CONNECT* Devices INSTALLATION AND USER GUIDE Intel Unite Plugin for Logitech GROUP* and Logitech CONNECT* Devices INSTALLATION AND USER GUIDE November 2017 You may not use or facilitate the use of this document in connection with any infringement

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working

More information

Intel Omni-Path Fabric Manager GUI Software

Intel Omni-Path Fabric Manager GUI Software Intel Omni-Path Fabric Manager GUI Software Release Notes for V10.9.0 Rev. 1.0 December 2018 Doc. No.: K38339, Rev.: 1.0 You may not use or facilitate the use of this document in connection with any infringement

More information

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation Intel Atom Processor Based Platform Technologies Intelligent Systems Group Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Movidius Neural Compute Stick

Movidius Neural Compute Stick Movidius Neural Compute Stick You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to

More information

LOW PIN COUNT (LPC) INTERFACE SPECIFICATION

LOW PIN COUNT (LPC) INTERFACE SPECIFICATION LOW PIN COUNT (LPC) INTERFACE SPECIFICATION Revision 1.0 September 29, 1997 Intel may have patents and/or patent applications related to the various Low Pin Count interfaces described in the Low Pin Count

More information

Michael Kinsner, Dirk Seynhaeve IWOCL 2018

Michael Kinsner, Dirk Seynhaeve IWOCL 2018 Michael Kinsner, Dirk Seynhaeve IWOCL 2018 Topics 1. FPGA overview 2. Motivating application classes 3. Host pipes 4. Some data 2 FPGA: Fine-grained Massive Parallelism Intel Stratix 10 FPGA: Over 5 Million

More information

DRAM and Storage-Class Memory (SCM) Overview

DRAM and Storage-Class Memory (SCM) Overview Page 1 of 7 DRAM and Storage-Class Memory (SCM) Overview Introduction/Motivation Looking forward, volatile and non-volatile memory will play a much greater role in future infrastructure solutions. Figure

More information