Conversation
This driver is for Intel X540/X550 NICs and works with IO-APIC and MSI/MSI-X interrupts. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
| static inline void set_flags(uintptr_t reg, uint32_t flags) { | ||
| set_reg(reg, get_reg(reg) | flags); | ||
| } | ||
|
|
||
| static inline void clear_flags(uintptr_t reg, uint32_t flags) { | ||
| set_reg(reg, get_reg(reg) & ~flags); | ||
| } |
There was a problem hiding this comment.
do we really need these helpers?
| void clear_interrupts(void) | ||
| { | ||
| set_reg(EIMC, IXGBE_IRQ_CLEAR_MASK); | ||
| get_reg(EICR); // Write Flush? |
There was a problem hiding this comment.
what does this comment mean?
There was a problem hiding this comment.
The comment that I left is to figure out why the read is there. I suspect it's for making sure the previous write request has been completed, but not quite sure.
From "PCI Express® Base Specification Revision 5.0":
A Memory Read Request of 1 DW with no bytes enabled, or “zero-length Read,” may be used by devices as a type of flush Request. For a Requester, the flush semantic allows a device to ensure that previously issued Posted Writes have been completed at their PCI Express destination. To be effective in all cases, the address for the zero-length Read must target the same device as the Posted Writes that are being flushed. One recommended approach is using the same address as one of the Posted Writes being flushed.
| | (uint32_t)buffer.len; | ||
| desc->read.olinfo_status = ((uint32_t)buffer.len << IXGBE_ADVTXD_PAYLEN_SHIFT); | ||
|
|
||
| /* THREAD_MEMORY_RELEASE(); */ |
There was a problem hiding this comment.
either remove these comments or add the memory release
There was a problem hiding this comment.
To me the write barrier here can be removed. The NIC uses the the tail register to know which descriptor to fetch.
| config.virt_tx.active_queue.vaddr, | ||
| config.virt_tx.num_buffers); | ||
|
|
||
| // Disable Interrupts, see Section 4.6.3.1 |
There was a problem hiding this comment.
can we link the manual we are referncing somewhere for these sections to make sense?
| // Enable MSI. see PCI Express Technology 3.0 Chapter 17 for more details. | ||
| /* set_flags16(PCI_COMMAND_16, BIT(10)); */ | ||
| /* set_flags16(PCI_MSI_MESSAGE_CONTROL_16, BIT(0)); */ | ||
| /* clear_flags16(PCI_MSI_MESSAGE_CONTROL_16, BIT(4) | BIT(5) | BIT(6)); */ | ||
| /* set_reg(PCI_MSI_MESSAGE_ADDRESS_LOW, 0xFEEu << 20); */ | ||
| /* set_reg(PCI_MSI_MESSAGE_ADDRESS_HIGH, 0); */ | ||
| /* set_reg16(PCI_MSI_MESSAGE_DATA_16, 0x31); */ | ||
| /* clear_flags16(PCI_MSI_MASK, BIT(0)); */ | ||
|
|
||
| // Enable MSI-X, see PCI Express Technology 3.0 Chapter 17 for more details. | ||
| /* // Disable legacy interrupts. TODO: this should be done by PCI driver. */ | ||
| /* set_flags16(PCI_COMMAND_16, BIT(10)); */ | ||
| /* // Set vector message address to Local APIC of CPU0 */ | ||
| /* set_reg(DEVICE_MSIX_TABLE + 0x0, 0xFEEu << 20); */ | ||
| /* set_reg(DEVICE_MSIX_TABLE + 0x4, 0); */ | ||
| /* // Set vector data to Interrupt Vector */ | ||
| /* set_reg(DEVICE_MSIX_TABLE + 0x8, 0x32); */ | ||
| /* // Unmask vector 0 to enable interrupts through it */ | ||
| /* set_reg(DEVICE_MSIX_TABLE + 0xC, 0xFFFFFFFE); */ | ||
| /* // Enable MSI-X. TODO: this should be set by PCI driver. */ | ||
| /* set_flags(PCI_MSIX_CTRL, BIT(31)); */ |
There was a problem hiding this comment.
Either these are needed or not; there's no need for commented out code.
There was a problem hiding this comment.
As described in this PR, these are for manually switch which type of IRQs to use before a PCI driver is ready.
| // @jade: why do we get into this loop all the time even when there is no packets in there? | ||
|
|
||
| /* If buffer slot is still empty, we have processed all packets the device has filled */ | ||
| ixgbe_adv_rx_desc_wb_t desc = device.rx_ring[device.rx_head].wb; |
There was a problem hiding this comment.
This access to the write-back format of the advanced descriptor should be done via pointers, as in other ethernet drivers.
In particular, the access to the DD bit of the status_error field should be ordered before the access to the length field. According to the X550 manual (rev. 2.7) section 7.1.5.4,
Software can determine if a packet has been received only by checking the receive descriptor
DDbit in memory. Checking throughDDbits (and not by checking the receive head pointer in
RDH/RDLregisters) eliminates a potential race condition
In section 7.1.5.5 of the manual, the second method to determine the availability of a receive packet is mentioned by 'performing a Programmed I/O read of the Receive Descriptor Head register', but there is a race condition if 'systems performing I/O write buffering'.
| break; | ||
| if ((desc.upper.status_error & IXGBE_RXDADV_STAT_EOP) == 0) | ||
| break; | ||
|
|
There was a problem hiding this comment.
There should be a read barrier THREAD_MEMORY_ACQUIRE between the comparison for the DD bit and the access to the length field of the descriptor. The EOP bit is in the same byte of the DD bit and thus the ordering of it w.r.t. that of DD does not matter (because of single-copy atomicity).
| desc->read.pkt_addr = buffer.io_or_offset; | ||
| desc->read.hdr_addr = 0; | ||
|
|
||
| THREAD_MEMORY_RELEASE(); |
There was a problem hiding this comment.
It doesn't seem to me that this write barrier is necessary. According to section 7.1.5.5 of the X550 manual (rev. 2.7),
Software inserts receive descriptors by advancing the tail pointer(s) to refer to the address of the entry just beyond the last valid descriptor.
which corresponds to set_reg(RDT(0), device.rx_tail) below.
| if (provided) { | ||
| /* THREAD_MEMORY_RELEASE(); */ | ||
| set_reg(TDT(0), device.tx_tail); | ||
| get_reg(TDT(0)); |
There was a problem hiding this comment.
TODO: to me the purpose of get_reg here is to wait until the previous set_reg completes.
| } | ||
| // @jade: should I move this to the outer block? | ||
| if (provided) { | ||
| /* THREAD_MEMORY_RELEASE(); */ |
There was a problem hiding this comment.
I believe the write barrier here is required. According to section 7.2.3.4 of the X550 datasheet (rev. 2.7),
Software must update the Tail register on packet boundaries.
| | (uint32_t)buffer.len; | ||
| desc->read.olinfo_status = ((uint32_t)buffer.len << IXGBE_ADVTXD_PAYLEN_SHIFT); | ||
|
|
||
| /* THREAD_MEMORY_RELEASE(); */ |
There was a problem hiding this comment.
To me the write barrier here can be removed. The NIC uses the the tail register to know which descriptor to fetch.
| if ((hw_desc.status & IXGBE_ADVTXD_STAT_DD) == 0) | ||
| break; | ||
|
|
||
| THREAD_MEMORY_RELEASE(); |
There was a problem hiding this comment.
This should be a read barrier This barrier should be removed.THREAD_MEMORY_ACQUIRE(). The purpose is to order the access of DD bit and the enqueue of descr_mdata.
| while (!hw_tx_ring_empty()) { | ||
| /* Ensure that this buffer has been sent by the device */ | ||
| ixgbe_adv_tx_desc_wb_t hw_desc = device.tx_ring[device.tx_head].wb; | ||
| if ((hw_desc.status & IXGBE_ADVTXD_STAT_DD) == 0) |
There was a problem hiding this comment.
TODO: it is unclear to me whether we should use the head pointer to detect completion of Tx descriptor write back. According to section 7.2.3.5.2 of the X550 datasheet (rev. 2.7),
In legacy hardware, transmit requests are completed by writing the
DDbit to the transmit descriptor ring. This causes cache trash since both the driver and hardware are writing to the descriptor ring in host memory. Instead of writing theDDbits to signal that a transmit request is complete, hardware can write the contents of the descriptor queue head to host memory. The driver reads that memory location to determine which transmit requests are complete.
This driver is for Intel X540/X550 NICs and works with IO-APIC and MSI/MSI-X interrupts.
Before a proper PCI driver (#622) gets implemented, this experimental driver needs the access to PCI configuration space and set things up by itself. Interrupt types need to be manually switched by (un)commenting code snippets in
meta.pyandethernet.c.