Skip to content

Conversation

@Johan511
Copy link
Contributor

@Johan511 Johan511 commented Jul 8, 2025

CxPlatDataPathResolveAddress in src/platform/datapath_unix.c forwards the return value of getaddrinfo as QUIC_STATUS to signal the success/failure of the function

This is incorrect, the return value of getaddrinfo indicates success if 0 else it is to be considered as failure
Reference: getaddrinfo man page

In case of QUIC_STATUS, a positive value indicates failure and a negative value indicates success.

This patch fixes this.

PS: All the error return values are negative for getaddrinfo, hence the existing implementation would never indicate an error.

@Johan511 Johan511 requested a review from a team as a code owner July 8, 2025 09:00

return Status;
if(Result != 0)
return QUIC_STATUS_INVALID_PARAMETER;
Copy link
Contributor

@ProjectsByJackHe ProjectsByJackHe Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, according to the manpage, there are multiple error types (https://man7.org/linux/man-pages/man3/getaddrinfo.3.html#RETURN_VALUE), why is it ok to encapsulate all of them with QUIC_STATUS_INVALID_PARAMETER? Should we have some logic here to better map unix errors with QUIC errors?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, the windows specific version of this maps uses HRESULT_FROM_WIN32(WSAHOST_NOT_FOUND), which doesn't have a QUIC_STATUS explicit equivalent.

Maybe we could define QUIC_STATUS_HOST_NOT_FOUND and use it in both cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a platform specific host not found error would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could define QUIC_STATUS_HOST_NOT_FOUND and use it in both cases.

All MsQuic errors we use right now are a mapping from certain posix errors (or windows errors)
Can you suggest which error should it be mapped to (for POSIX)?

Also to answer @ProjectsByJackHe
I do not see much value is translating each of the return values of getaddrinfo into the respective QUIC_STATUS_*. Please let me know if you think otherwise.

I can get behind a QUIC_STATUS_ADDR_RESOLVE_ERROR but not an equivalent for each error possible by getaddrinfo

Also there exists a QUIC_STATUS_ADDRESS_NOT_AVAILABLE which we never seem to use and is defined only in msquic_posix.h and not in it's windows counterpart (probably the reason we do not use it). Any opinions on it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QUIC_STATUS_ADDRESS_NOT_AVAILABLE was added in #3038. I am not sure of the context, but I would be cautious before modifying it.

Maybe it could be re-used if it makes sense (I am not familiar enough with EADDRNOTAVAIL to know if it would make sense to use it here, and mapped to HRESULT_FROM_WIN32(WSAHOST_NOT_FOUND) for Windows.
Or we can go with a new name if another Unix error code makes more sense here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would EHOSTUNREACH make sense?

Copy link
Contributor Author

@Johan511 Johan511 Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EHOSTUNREACH
No valid routing table entry matches the destination
address. This error can be caused by an ICMP message from
a remote router or for the local routing table.

I'm not well versed with windows so will discuss from a POSIX perspective

This will lead to all the error codes being represented by HOST UNREACHABLE

What if getaddrfamily fails because the application mentioned it is ipv4 but the string is provides is unmistakeably ipv6?

Either we go overkill and add all these error codes in QUIC

Or return a simple EINVAL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My angle is more from the point of view of the caller of ConnectionStart. It would be nice if we could fail with an error code that indicate "something went wrong with the address resolution" so the app developer has an idea on what to start with when debugging.

But that is honestly a minor concern, and EINVAL already be better than the current state (with the equivalent for the windows side)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right.

I have added a macro NETDB_ERR_CODE_TO_QUIC_STATUS
Can you please review it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can have

#define QUIC_STATUS_NETDB_GENERIC_ERROR     NETDB_ERR_CODE_TO_QUIC_STATUS(0)

#ifdef EAI_*
#define QUIC_STATUS_NETDB_ERR_*      NETDBNETDB_ERR_CODE_TO_QUIC_STATUS(EAI_*)
#else
#define QUIC_STATUS_NETDB_ERR_* QUIC_STATUS_GENERIC_NETDB_ERR
#endif

This would make translating each of these codes to their windows counter parts much easier and vice versa
Also removes the issue of getting the platform / libc version support for an error code wrong

@codecov
Copy link

codecov bot commented Jul 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.42%. Comparing base (5356efa) to head (6afe6b6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5228      +/-   ##
==========================================
- Coverage   86.73%   85.42%   -1.31%     
==========================================
  Files          59       59              
  Lines       18330    18330              
==========================================
- Hits        15899    15659     -240     
- Misses       2431     2671     +240     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@csujedihy
Copy link
Collaborator

The description is slightly off- "a negative value indicates success." => "a non-positive value indicates success."

Exit:

return Status;
if(Result != 0)
Copy link
Collaborator

@csujedihy csujedihy Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure the coding style is consistent throughout this function.

if (Result != 0) {
    return QUIC_STATUS_INVALID_PARAMETER;
else {
    return QUIC_STATUS_SUCCESS
}

…ns of error value for QUIC_STATUS and return value of getaddrinfo

Signed-off-by: HHN <[email protected]>
@Johan511
Copy link
Contributor Author

Johan511 commented Jul 9, 2025

If this issue is being discussed in depth
Just want to tag another issue I have with which led me to finding this bug

Over here we define QUIC_LOCALHOST_FOR_AF(Af) as localhost or ip6-localhost on posix machines

I use a posix machine and my /etc/hosts does not have an entry for ip6-localhost which led to a failure in getaddrinfo (which was not found in getaddrinfo which led to this PR)

In case of windows we always use localhost, any opinions on changing the posix variant to always be localhost too?

@Johan511 Johan511 force-pushed the CxPlatDataPathResolveAddress-bug branch from 6afe6b6 to 20d2ba2 Compare July 11, 2025 19:32

#define NETDB_ERR_CODE_TO_QUIC_STATUS(Alert) (Alert + QUIC_NET_DB_ERROR_BASE)

# define QUIC_STATUS_BADFLAGS NETDB_ERR_CODE_TO_QUIC_STATUS(EAI_BADFLAGS) // 200001023
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I like this solution.

Largely (ignoring QUIC_STATUS_ADDRESS_NOT_AVAILABLE which seems a strange case), QUIC_STATUS are defined for both Windows and Unix with similar names (and different values based on what makes sense for the OS). This allows the application code to be platform agnostic.

If we add QUIC_STATUS in the Unix header only, the application code stops being compatible cross-plateform if it uses one of them.

Unless we can find an ERRNO that would make sense as a generic "address resolution failed" (in which case, we can define a single new QUIC_STATUS_ADDRESS_RESOLUTION_FAILED mapped to HRESULT_FROM_WIN32(WSAHOST_NOT_FOUND) on Windows and that ERRNO on Unix), I would rather simply use one of the existing QUIC_STATUS, like QUIC_STATUS_UNREACHABLE or QUIC_STATUS_INVALID_PARAMETER.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants