Skip to content

drivers/network: add ixgbe driver#682

Open
terryzbai wants to merge 2 commits intomainfrom
ixgbe_driver_cleanup
Open

drivers/network: add ixgbe driver#682
terryzbai wants to merge 2 commits intomainfrom
ixgbe_driver_cleanup

Conversation

@terryzbai
Copy link
Copy Markdown
Contributor

This driver is for Intel X540/X550 NICs and works with IO-APIC and MSI/MSI-X interrupts.

Before a proper PCI driver (#622) gets implemented, this experimental driver needs the access to PCI configuration space and set things up by itself. Interrupt types need to be manually switched by (un)commenting code snippets in meta.py and ethernet.c.

This driver is for Intel X540/X550 NICs and works
with IO-APIC and MSI/MSI-X interrupts.

Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Copy link
Copy Markdown
Contributor

@midnightveil midnightveil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brief skim.

Comment thread drivers/network/ixgbe/ethernet.c Outdated
Comment thread drivers/network/ixgbe/ethernet.c
Comment thread drivers/network/ixgbe/ethernet.c
Comment thread drivers/network/ixgbe/ethernet.c Outdated
Comment on lines +84 to +90
static inline void set_flags(uintptr_t reg, uint32_t flags) {
set_reg(reg, get_reg(reg) | flags);
}

static inline void clear_flags(uintptr_t reg, uint32_t flags) {
set_reg(reg, get_reg(reg) & ~flags);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need these helpers?

void clear_interrupts(void)
{
set_reg(EIMC, IXGBE_IRQ_CLEAR_MASK);
get_reg(EICR); // Write Flush?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this comment mean?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment that I left is to figure out why the read is there. I suspect it's for making sure the previous write request has been completed, but not quite sure.

From "PCI Express® Base Specification Revision 5.0":

A Memory Read Request of 1 DW with no bytes enabled, or “zero-length Read,” may be used by devices as a type of flush Request. For a Requester, the flush semantic allows a device to ensure that previously issued Posted Writes have been completed at their PCI Express destination. To be effective in all cases, the address for the zero-length Read must target the same device as the Posted Writes that are being flushed. One recommended approach is using the same address as one of the Posted Writes being flushed.

| (uint32_t)buffer.len;
desc->read.olinfo_status = ((uint32_t)buffer.len << IXGBE_ADVTXD_PAYLEN_SHIFT);

/* THREAD_MEMORY_RELEASE(); */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either remove these comments or add the memory release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the write barrier here can be removed. The NIC uses the the tail register to know which descriptor to fetch.

Comment thread drivers/network/ixgbe/ethernet.c
config.virt_tx.active_queue.vaddr,
config.virt_tx.num_buffers);

// Disable Interrupts, see Section 4.6.3.1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we link the manual we are referncing somewhere for these sections to make sense?

Comment on lines +336 to +356
// Enable MSI. see PCI Express Technology 3.0 Chapter 17 for more details.
/* set_flags16(PCI_COMMAND_16, BIT(10)); */
/* set_flags16(PCI_MSI_MESSAGE_CONTROL_16, BIT(0)); */
/* clear_flags16(PCI_MSI_MESSAGE_CONTROL_16, BIT(4) | BIT(5) | BIT(6)); */
/* set_reg(PCI_MSI_MESSAGE_ADDRESS_LOW, 0xFEEu << 20); */
/* set_reg(PCI_MSI_MESSAGE_ADDRESS_HIGH, 0); */
/* set_reg16(PCI_MSI_MESSAGE_DATA_16, 0x31); */
/* clear_flags16(PCI_MSI_MASK, BIT(0)); */

// Enable MSI-X, see PCI Express Technology 3.0 Chapter 17 for more details.
/* // Disable legacy interrupts. TODO: this should be done by PCI driver. */
/* set_flags16(PCI_COMMAND_16, BIT(10)); */
/* // Set vector message address to Local APIC of CPU0 */
/* set_reg(DEVICE_MSIX_TABLE + 0x0, 0xFEEu << 20); */
/* set_reg(DEVICE_MSIX_TABLE + 0x4, 0); */
/* // Set vector data to Interrupt Vector */
/* set_reg(DEVICE_MSIX_TABLE + 0x8, 0x32); */
/* // Unmask vector 0 to enable interrupts through it */
/* set_reg(DEVICE_MSIX_TABLE + 0xC, 0xFFFFFFFE); */
/* // Enable MSI-X. TODO: this should be set by PCI driver. */
/* set_flags(PCI_MSIX_CTRL, BIT(31)); */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either these are needed or not; there's no need for commented out code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in this PR, these are for manually switch which type of IRQs to use before a PCI driver is ready.

Comment thread drivers/network/ixgbe/ethernet.h
Copy link
Copy Markdown
Contributor

@KurtWu10 KurtWu10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only checked the provide/return functions of Rx/Tx. I need to read the intel manual further to understand its concurrency model. I don't see the need of using inline assembly for get_reg*/set_reg* either.

// @jade: why do we get into this loop all the time even when there is no packets in there?

/* If buffer slot is still empty, we have processed all packets the device has filled */
ixgbe_adv_rx_desc_wb_t desc = device.rx_ring[device.rx_head].wb;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This access to the write-back format of the advanced descriptor should be done via pointers, as in other ethernet drivers.

In particular, the access to the DD bit of the status_error field should be ordered before the access to the length field. According to the X550 manual (rev. 2.7) section 7.1.5.4,

Software can determine if a packet has been received only by checking the receive descriptor
DD bit in memory. Checking through DD bits (and not by checking the receive head pointer in
RDH/RDL registers) eliminates a potential race condition

In section 7.1.5.5 of the manual, the second method to determine the availability of a receive packet is mentioned by 'performing a Programmed I/O read of the Receive Descriptor Head register', but there is a race condition if 'systems performing I/O write buffering'.

break;
if ((desc.upper.status_error & IXGBE_RXDADV_STAT_EOP) == 0)
break;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a read barrier THREAD_MEMORY_ACQUIRE between the comparison for the DD bit and the access to the length field of the descriptor. The EOP bit is in the same byte of the DD bit and thus the ordering of it w.r.t. that of DD does not matter (because of single-copy atomicity).

desc->read.pkt_addr = buffer.io_or_offset;
desc->read.hdr_addr = 0;

THREAD_MEMORY_RELEASE();
Copy link
Copy Markdown
Contributor

@KurtWu10 KurtWu10 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to me that this write barrier is necessary. According to section 7.1.5.5 of the X550 manual (rev. 2.7),

Software inserts receive descriptors by advancing the tail pointer(s) to refer to the address of the entry just beyond the last valid descriptor.

which corresponds to set_reg(RDT(0), device.rx_tail) below.

if (provided) {
/* THREAD_MEMORY_RELEASE(); */
set_reg(TDT(0), device.tx_tail);
get_reg(TDT(0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: to me the purpose of get_reg here is to wait until the previous set_reg completes.

}
// @jade: should I move this to the outer block?
if (provided) {
/* THREAD_MEMORY_RELEASE(); */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the write barrier here is required. According to section 7.2.3.4 of the X550 datasheet (rev. 2.7),

Software must update the Tail register on packet boundaries.

| (uint32_t)buffer.len;
desc->read.olinfo_status = ((uint32_t)buffer.len << IXGBE_ADVTXD_PAYLEN_SHIFT);

/* THREAD_MEMORY_RELEASE(); */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the write barrier here can be removed. The NIC uses the the tail register to know which descriptor to fetch.

if ((hw_desc.status & IXGBE_ADVTXD_STAT_DD) == 0)
break;

THREAD_MEMORY_RELEASE();
Copy link
Copy Markdown
Contributor

@KurtWu10 KurtWu10 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a read barrier THREAD_MEMORY_ACQUIRE(). The purpose is to order the access of DD bit and the enqueue of descr_mdata. This barrier should be removed.

while (!hw_tx_ring_empty()) {
/* Ensure that this buffer has been sent by the device */
ixgbe_adv_tx_desc_wb_t hw_desc = device.tx_ring[device.tx_head].wb;
if ((hw_desc.status & IXGBE_ADVTXD_STAT_DD) == 0)
Copy link
Copy Markdown
Contributor

@KurtWu10 KurtWu10 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: it is unclear to me whether we should use the head pointer to detect completion of Tx descriptor write back. According to section 7.2.3.5.2 of the X550 datasheet (rev. 2.7),

In legacy hardware, transmit requests are completed by writing the DD bit to the transmit descriptor ring. This causes cache trash since both the driver and hardware are writing to the descriptor ring in host memory. Instead of writing the DD bits to signal that a transmit request is complete, hardware can write the contents of the descriptor queue head to host memory. The driver reads that memory location to determine which transmit requests are complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants