From: Baptiste Jonglez Date: Mon, 20 Feb 2017 15:59:28 +0000 (+0100) Subject: dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers X-Git-Url: http://git.cdn.openwrt.org/?a=commitdiff_plain;h=59508e309e91ba152ae43ef1d6983f2f6f873632;p=openwrt%2Fstaging%2Fjogo.git dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers This fixes FS#391 for lede-17.01 Signed-off-by: Baptiste Jonglez --- diff --git a/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch new file mode 100644 index 0000000000..e311c34729 --- /dev/null +++ b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch @@ -0,0 +1,130 @@ +From 68f6312d4bae30b78daafcd6f51dc441b8685b1e Mon Sep 17 00:00:00 2001 +From: Baptiste Jonglez +Date: Mon, 6 Feb 2017 21:09:11 +0000 +Subject: [PATCH] Stop treating SERVFAIL as a successful response from upstream + servers. + +This effectively reverts most of 51967f9807 ("SERVFAIL is an expected +error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not +SERVFAIL) as an unsuccessful upstream response"). + +With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an +upstream server, it stops trying to resolve the query and simply returns +SERVFAIL to the client. With this commit, dnsmasq will instead try to +query other upstream servers upon receiving a SERVFAIL response. + +According to RFC 1034 and 1035, the semantic of SERVFAIL is that of a +temporary error condition. Recursive resolvers are expected to encounter +network or resources issues from time to time, and will respond with +SERVFAIL in this case. Similarly, if a validating DNSSEC resolver [RFC +4033] encounters issues when checking signatures (unknown signing +algorithm, missing signatures, expired signatures because of a wrong +system clock, etc), it will respond with SERVFAIL. + +Note that all those behaviours are entirely different from a negative +response, which would provide a definite indication that the requested +name does not exist. In our case, if an upstream server responds with +SERVFAIL, another upstream server may well provide a positive answer for +the same query. + +Thus, this commit will increase robustness whenever some upstream servers +encounter temporary issues or are misconfigured. + +Quoting RFC 1034, Section 4.3.1. "Queries and responses": + + If recursive service is requested and available, the recursive response + to a query will be one of the following: + + - The answer to the query, possibly preface by one or more CNAME + RRs that specify aliases encountered on the way to an answer. + + - A name error indicating that the name does not exist. This + may include CNAME RRs that indicate that the original query + name was an alias for a name which does not exist. + + - A temporary error indication. + +Here is Section 5.2.3. of RFC 1034, "Temporary failures": + + In a less than perfect world, all resolvers will occasionally be unable + to resolve a particular request. This condition can be caused by a + resolver which becomes separated from the rest of the network due to a + link failure or gateway problem, or less often by coincident failure or + unavailability of all servers for a particular domain. + +And finally, RFC 1035 specifies RRCODE 2 for this usage, which is now more +widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"): + + RCODE Response code - this 4 bit field is set as part of + responses. The values have the following + interpretation: + (...) + + 2 Server failure - The name server was + unable to process this query due to a + problem with the name server. + +For the DNSSEC-related usage of SERVFAIL, here is RFC 4033 +Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues": + + A validating resolver can determine the following 4 states: + (...) + + Insecure: The validating resolver has a trust anchor, a chain of + trust, and, at some delegation point, signed proof of the + non-existence of a DS record. This indicates that subsequent + branches in the tree are provably insecure. A validating resolver + may have a local policy to mark parts of the domain space as + insecure. + + Bogus: The validating resolver has a trust anchor and a secure + delegation indicating that subsidiary data is signed, but the + response fails to validate for some reason: missing signatures, + expired signatures, signatures with unsupported algorithms, data + missing that the relevant NSEC RR says should be present, and so + forth. + (...) + + This specification only defines how security-aware name servers can + signal non-validating stub resolvers that data was found to be bogus + (using RCODE=2, "Server Failure"; see [RFC4035]). + +Notice the difference between a definite negative answer ("Insecure" +state), and an indefinite error condition ("Bogus" state). The second +type of error may be specific to a recursive resolver, for instance +because its system clock has been incorrectly set, or because it does not +implement newer cryptographic primitives. Another recursive resolver may +succeed for the same query. + +There are other similar situations in which the specified behaviour is +similar to the one implemented by this commit. + +For instance, RFC 2136 specifies the behaviour of a "requestor" that wants +to update a zone using the DNS UPDATE mechanism. The requestor tries to +contact all authoritative name servers for the zone, with the following +behaviour specified in RFC 2136, Section 4: + + 4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or + if no response is received within an implementation dependent timeout + period, or if an ICMP error is received indicating that the server's + port is unreachable, then the requestor will delete the unusable + server from its internal name server list and try the next one, + repeating until the name server list is empty. If the requestor runs + out of servers to try, an appropriate error will be returned to the + requestor's caller. +--- + src/forward.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/src/forward.c ++++ b/src/forward.c +@@ -853,7 +853,8 @@ void reply_query(int fd, int family, tim + we get a good reply from another server. Kill it when we've + had replies from all to avoid filling the forwarding table when + everything is broken */ +- if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED) ++ if (forward->forwardall == 0 || --forward->forwardall == 1 || ++ (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL)) + { + int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0; +