DNS Propagation Guide: Caching & Update Logic

1. The Myth of Propagation

In physics, propagation describes waves moving through a medium. In DNS, nothing is "propelling" your change. Instead, DNS Propagation is the process of the world's millions of caches expiring.

When you change a record on your DNS server (the Authoritative Server), nothing tells the rest of the world that you've done it. Recursive resolvers (run by ISPs such as Comcast or by public providers such as Google) only check your server again when their own internal timer for the record (the TTL) runs out.

2. The TTL Timer (Simon Says)

TTL (Time to Live) is the most important field in propagation. If your TTL is set to 3600 seconds (1 hour), a resolver that caches your record will keep answering from its cache and will not ask your server again for up to 60 minutes.

If a user visits your site at 10:00 AM, the resolver saves the record until 11:00 AM. If you update your IP at 10:30 AM, that specific user (and everyone using that ISP) is stuck on the old IP until 11:00 AM.

T_{\text{expire}} = T_{\text{cached}} + \text{TTL}

The deterministic expiration time of a cached DNS record.
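
The 10:00 AM example can be checked with a few lines of Python. This is a minimal sketch; the function names and the calendar date are illustrative assumptions, not part of any DNS library.

```python
from datetime import datetime, timedelta

def cache_expiry(cached_at: datetime, ttl_seconds: int) -> datetime:
    """T_expire = T_cached + TTL."""
    return cached_at + timedelta(seconds=ttl_seconds)

def serves_stale(cached_at, ttl_seconds, record_changed_at, query_at) -> bool:
    """True if a query is answered from a cache entry that predates the change."""
    expires = cache_expiry(cached_at, ttl_seconds)
    return cached_at < record_changed_at <= query_at < expires

cached = datetime(2024, 1, 1, 10, 0)    # resolver cached the record at 10:00 (hypothetical date)
changed = datetime(2024, 1, 1, 10, 30)  # you updated the IP at 10:30
print(cache_expiry(cached, 3600))       # 2024-01-01 11:00:00
```

A query at 10:45 is answered from the stale entry, while a query at 11:00 or later triggers a fresh lookup.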

However, there is a hidden variable: TTL Jitter. Some high-scale recursive resolvers (like those used by major cloud providers) intentionally subtract a random number of seconds from your TTL (e.g., 5-10%) to prevent "cache stampedes." If millions of records all expired at exactly the same second, the authoritative servers would be crushed by a synchronized wave of queries. Jitter smears this load over a larger temporal window, which technically slows down "perfect" propagation but preserves internet stability.
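
As a rough illustration of jitter, the sketch below shortens a TTL by a random fraction. The 10% ceiling and the function name are assumptions chosen to match the example range above; real resolvers pick their own policies.

```python
import random

def jittered_ttl(ttl_seconds: int, max_fraction: float = 0.10, rng=None) -> int:
    """Shorten a TTL by a random amount between 0 and max_fraction of its value."""
    if rng is None:
        rng = random.Random()
    reduction = rng.uniform(0.0, max_fraction) * ttl_seconds
    # Never return less than one second, so the entry is still cacheable.
    return max(1, int(ttl_seconds - reduction))

# Each cache entry gets a slightly different lifetime, so expirations spread out.
lifetimes = [jittered_ttl(3600, rng=random.Random(seed)) for seed in range(5)]
```

With a 3600-second TTL, every value in `lifetimes` falls between 3240 and 3600 seconds.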

[Interactive figure: DNS Propagation Simulator, visualizing TTL expiration vs. instant updates. An Authoritative server (the "Source of Truth" where you change your IP) feeds an ISP resolver cache that serves three users. A user whose cache entry has a low TTL or has already expired picks up a changed record quickly (fresh data, direct from source), while a user with an unexpired entry stays on the old IP (stale data).]

3. Iterative vs. Recursive Resolution Mechanics

To understand propagation, one must understand how a query actually traverses the global hierarchy. There are two primary types of resolution, and their interaction determines how fast your change is "seen."

Recursive Resolution

The client asks the resolver (e.g., your ISP) to do all the work. The resolver handles the complexity and returns the final IP. This is where most caching occurs.

Iterative Resolution

The resolver asks the Root, then the TLD, then the Authoritative server. It follows the pointers (referrals) itself. Propagation is the process of these referrals pointing to new data.

The Iterative Loop is where "Propagation" begins. When you change your Name Servers (NS records), you are updating the referral pointer at the TLD level (e.g., the .com registry). This change has its own TTL, often 48 hours. If a recursive resolver has cached the old NS records, it will continue to query your old DNS provider even if your new provider has the correct records.
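
The referral chain can be sketched with a toy in-memory hierarchy. Everything here (server labels, zone contents, addresses) is made up for illustration; a real resolver speaks the DNS protocol and caches each referral with its TTL.

```python
# Toy hierarchy: each "server" either answers a name or refers to another server.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "ns-old"}},
    "ns-old": {"answers": {"www.example.com.": "10.0.0.1"}},
    "ns-new": {"answers": {"www.example.com.": "10.0.0.2"}},
}

def resolve_iteratively(name, servers, start="root"):
    """Follow referrals from the root until some server returns an answer."""
    server, path = start, [start]
    while True:
        zone = servers[server]
        if name in zone.get("answers", {}):
            return zone["answers"][name], path
        # A referral applies if its zone is the name itself or a parent of it.
        matches = [z for z in zone.get("referrals", {})
                   if name == z or name.endswith("." + z)]
        if not matches:
            raise LookupError(f"no referral for {name} at {server}")
        server = zone["referrals"][max(matches, key=len)]  # most specific zone wins
        path.append(server)

ip, path = resolve_iteratively("www.example.com.", SERVERS)
# ip == "10.0.0.1", path == ["root", "tld-com", "ns-old"]
```

Changing the `example.com.` referral at `tld-com` to `ns-new` is the toy equivalent of a name server change: lookups only reach the new provider once they follow the updated referral.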

4. The Role of Glue Records

Glue records are address records (A or AAAA) provided by the parent zone for the name servers of the child zone. If your name servers are ns1.example.com and ns2.example.com, a resolver would need to resolve example.com to find them, which is exactly what it is trying to do. The registry therefore publishes their IP addresses alongside the delegation to break this circular dependency.

Updating glue records is the slowest form of propagation. It requires a registry-level update through your registrar, which must then be published on all of the TLD's name servers and outlive the long TTLs on the delegation. This is why "moving your domain to a new host" takes longer than "pointing your domain to a new IP."

5. EDNS Client Subnet (ECS) and Geo-Propagation

EDNS Client Subnet (RFC 7871) allows recursive resolvers to pass a portion of the client's IP address to the authoritative server. This allows the server to give a geographically optimized response (e.g., a user in London gets a UK-based IP).

This complicates propagation because resolvers now cache records per subnet. If a user on one subnet triggers a lookup, that response is cached for everyone on that subnet. If you update your geo-steering rules, propagation might happen instantly for users in New York but remain stale for users in Los Angeles, depending on when each regional resolver node last refreshed its specific subnet cache.
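
Per-subnet caching can be pictured as a cache whose key includes the client's network. The class below is a minimal sketch under stated assumptions: the /24 prefix length, the class name, and the integer clock are illustrative choices, not a description of any particular resolver.

```python
import ipaddress

class SubnetCache:
    """Toy resolver cache keyed by (name, client subnet)."""

    def __init__(self, prefix_len=24):
        self.prefix_len = prefix_len
        self.entries = {}  # (name, subnet) -> (ip, expires_at)

    def _subnet(self, client_ip):
        # strict=False masks the host bits, e.g. 198.51.100.7 -> 198.51.100.0/24
        return ipaddress.ip_network(f"{client_ip}/{self.prefix_len}", strict=False)

    def put(self, name, client_ip, ip, ttl, now):
        self.entries[(name, self._subnet(client_ip))] = (ip, now + ttl)

    def get(self, name, client_ip, now):
        entry = self.entries.get((name, self._subnet(client_ip)))
        if entry and now < entry[1]:
            return entry[0]
        return None  # miss: this subnet must ask the authoritative server

cache = SubnetCache()
cache.put("www.example.com", "198.51.100.7", "10.0.0.1", ttl=300, now=0)
```

A client at 198.51.100.200 shares the cached answer, while a client at 203.0.113.5 misses and triggers its own lookup, which is why two regions can see different data at the same moment.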

7. Manual Cache Purging: Forced Propagation

While you cannot force an ISP to refresh its cache, the world's most popular public resolvers provide web-based tools to manually purge specific records. This is a critical step in "accelerating" propagation for users who rely on Google or Cloudflare.

Google Public DNS: Offers a Flush Cache tool to clear records from their global anycast network.
Cloudflare (1.1.1.1): Provides a similar web-based Purge Cache tool to invalidate stale DNS entries across their edge nodes.

By clearing these public caches, you can deliver your update to the share of users who rely on those resolvers in seconds rather than hours. Users behind other resolvers still have to wait for their cached TTL to expire.

8. Negative Caching (The 'Not Found' error)

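If someone asks for newsite.example.com before you've added it, the authoritative server answers "NXDOMAIN" (the name doesn't exist). The resolver caches THAT failure too, so the name can keep failing to resolve for a while after you create the record.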

9. The Strategy for Zero-Downtime Migration

Professional network architects don't "hope" for propagation; they control the temporal window of the transition. This is achieved through a process called TTL Stepping.

Phase	Action	Timing	Goal
1. Preparation	Lower TTL to 300s (5m)	T-minus 48 Hours	Ensure caches expire quickly on switch-day
2. Dual-Run	Keep old server live	Switch + 24 Hours	Catch "stray" lookups from aggressive caches
3. Stabilization	Raise TTL to 86400 (24h)	T-plus 48 Hours	Reduce server load and DNS query costs
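
The table can be turned into concrete timestamps. The sketch below is one way to do it, assuming the TTL must be lowered at least one full old TTL before the switch so that every cache has already picked up the short TTL; the function name, dictionary keys, and defaults are hypothetical.

```python
from datetime import datetime, timedelta

def ttl_stepping_plan(switch_at, old_ttl, low_ttl=300,
                      lead=timedelta(hours=48),
                      dual_run=timedelta(hours=24),
                      stabilize_after=timedelta(hours=48)):
    """Return the key timestamps of a TTL-stepping migration."""
    # The lead time must cover the old TTL, or some caches still hold the long TTL.
    lower_at = switch_at - max(lead, timedelta(seconds=old_ttl))
    return {
        "lower_ttl_at": lower_at,
        "switch_at": switch_at,
        "old_server_off_no_earlier_than": switch_at + max(dual_run, timedelta(seconds=low_ttl)),
        "raise_ttl_at": switch_at + stabilize_after,
    }

plan = ttl_stepping_plan(datetime(2024, 3, 10, 12, 0), old_ttl=86400)
```

With a 24-hour old TTL the default 48-hour lead is sufficient; with a 72-hour old TTL the plan moves the preparation step a day earlier.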

10. The Anycast & BGP Convergence Intersection

Modern DNS performance relies heavily on IP Anycast. In an anycast configuration, multiple servers across the globe announce the same IP address via BGP (Border Gateway Protocol). When a recursive resolver (like Google 8.8.8.8) tries to reach your authoritative server, BGP routes the request to the "closest" node, where closeness is determined by routing policy and AS-path length rather than by measured latency or physical distance.

This creates a dual-layer propagation delay:

BGP Convergence: When an anycast DNS provider adds, withdraws, or relocates a node, the changed BGP routes must propagate through the global routing table. This usually takes minutes but can be delayed by Route Flap Damping.
Geographic Divergence: A resolver in Tokyo might hit an anycast node that has received the update, while a resolver in London hits a node that is still serving the old zone file due to internal replication lag within the DNS provider's network.

P(t) = 1 - e^{-\lambda t}

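As a rough model across many resolvers with different TTLs and refresh times, the fraction of caches that have picked up the new record after time $t$ follows an exponential curve, where $\lambda$ is the inverse of the mean TTL. For a single record with one fixed TTL, the update is complete once one full TTL has elapsed.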
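
To get a feel for the exponential curve, the sketch below evaluates $P(t)$ and inverts it to find the time needed to reach a target fraction. It takes the formula at face value; the function names are illustrative.

```python
import math

def fraction_updated(t_seconds: float, mean_ttl: float) -> float:
    """P(t) = 1 - exp(-lambda * t), with lambda = 1 / mean TTL."""
    lam = 1.0 / mean_ttl
    return 1.0 - math.exp(-lam * t_seconds)

def time_to_fraction(target: float, mean_ttl: float) -> float:
    """Invert P(t): t = -ln(1 - target) / lambda."""
    return -math.log(1.0 - target) * mean_ttl

t95 = time_to_fraction(0.95, mean_ttl=3600)  # roughly three mean TTLs
```

Under this model, reaching 95% takes about three times the mean TTL, which matches the common experience of a long "tail" after most resolvers have already updated.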

11. DNSSEC: The "Chain of Trust" Propagation

When DNSSEC (Domain Name System Security Extensions) is enabled, propagation becomes significantly more complex. You aren't just updating an $\text{A}$ record; you are updating the cryptographic signatures that validate it.

DNSSEC propagation requires synchronization with the Parent Zone (e.g., the $\text{.com}$ or $\text{.net}$ registry). The DS record in the parent zone has its own TTL, often much higher than your child zone records (sometimes 24-48 hours). This means that even if your IP is updated, the security validation chain might still point to the old keys, causing a global outage during the "dead zone" of overlap.

12. GSLB Hydraulics and Zero-TTL Myths

Global Server Load Balancing (GSLB) uses DNS to steer traffic to the nearest healthy data center. To achieve fast failover, many administrators set their TTL to 0 or 30 seconds. This is a double-edged sword.

While a low TTL speeds up "propagation" (failover), it increases the Recursive Latency for users. With a TTL of 0, instead of hitting a local ISP cache (0.5ms), every single request must travel to your authoritative server (potentially 100ms+); with a TTL of 30 seconds, the same penalty applies to the first lookup after each expiry.
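
The trade-off can be estimated with a simple cache model. The sketch below assumes lookups for one name arrive at a single resolver as a Poisson process and that the TTL is counted from the moment of the fetch; under those assumptions the hit ratio is rT / (1 + rT). The latency figures are the illustrative ones from the text, and the function names are hypothetical.

```python
def hit_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Cache hit ratio for Poisson arrivals at rate r and a fixed TTL T."""
    x = requests_per_second * ttl_seconds
    return x / (1.0 + x)

def mean_lookup_latency_ms(requests_per_second, ttl_seconds,
                           cache_ms=0.5, authoritative_ms=100.0):
    """Average lookup latency, mixing cache hits with authoritative round trips."""
    h = hit_ratio(requests_per_second, ttl_seconds)
    return h * cache_ms + (1.0 - h) * authoritative_ms

zero_ttl = mean_lookup_latency_ms(1.0, 0)    # 100.0: every lookup goes upstream
short_ttl = mean_lookup_latency_ms(1.0, 30)  # most lookups are served from cache
```

At one lookup per second, a 30-second TTL already brings the average down to a few milliseconds, which is why very short (but non-zero) TTLs are often an acceptable compromise for failover.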

13. Forensic TTL Analysis: Calculating Cache Age

You can verify exactly where a resolver is in its caching lifecycle using forensic `dig` analysis. By querying a recursive resolver repeatedly, you can observe the TTL decrementing in real-time.
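
A dig answer line has the form name, TTL, class, type, data. The sketch below parses such a line and estimates how long ago the resolver cached the record by comparing the remaining TTL with the TTL published by the authoritative server. The sample line and address are made up, and the estimate is only approximate when a resolver caps or jitters TTLs.

```python
def parse_answer_line(line: str) -> dict:
    """Split one line from dig's ANSWER SECTION into its five fields."""
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return {"name": name, "ttl": int(ttl), "class": rr_class,
            "type": rr_type, "rdata": rdata.strip()}

def cache_age(authoritative_ttl: int, remaining_ttl: int) -> int:
    """Seconds since the resolver fetched the record (never negative)."""
    return max(0, authoritative_ttl - remaining_ttl)

sample = "example.com.\t\t2874\tIN\tA\t192.0.2.10"  # hypothetical resolver answer
record = parse_answer_line(sample)
age = cache_age(3600, record["ttl"])  # 726 seconds since the resolver cached it
```

Here the record will expire from that resolver's cache in 2874 seconds, so a change made now becomes visible through it in about 48 minutes.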

14. RFC 8767: Serving Stale Data for Resiliency

A modern complication in DNS propagation is RFC 8767, which allows resolvers to serve "stale" (expired) data if the authoritative server is unreachable. If your DNS provider goes offline during your migration, resolvers won't simply stop resolving your site; they may continue to serve the old IP address past its TTL, either until your servers come back online or until the resolver's maximum stale timer runs out (the RFC suggests a limit of one to three days).

This effectively pauses propagation. The "tail" of the propagation curve becomes much longer in unstable network conditions, as resolvers prioritize availability over freshness.
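
The decision a serve-stale resolver makes can be sketched as a small function. This is a simplified illustration, not the algorithm from the RFC: the entry layout, the status labels, and the one-day `max_stale_seconds` default are assumptions.

```python
def answer_from_cache(entry, now, upstream_reachable, max_stale_seconds=86400):
    """Decide how to answer from a cache entry with 'ip' and 'expires_at' (epoch seconds)."""
    if now < entry["expires_at"]:
        return entry["ip"], "fresh"        # normal cache hit
    if upstream_reachable:
        return None, "refetch"             # expired: ask the authoritative server
    if now - entry["expires_at"] <= max_stale_seconds:
        return entry["ip"], "stale"        # availability over freshness
    return None, "servfail"                # too old to serve, nothing to fall back on

entry = {"ip": "10.0.0.1", "expires_at": 1000}
```

While the authoritative server is unreachable, the old IP keeps being returned after `expires_at`, which is exactly the pause in propagation described above.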

15. The Calculus of Convergence

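To estimate the time required for 95% of global resolvers to see your update, we use a convergence model. Let $T$ be the TTL. Since users visit the site at random intervals, the time $t$ since their last cache refresh is uniformly distributed, $U(0, T)$. The remaining cache lifetime $T - t$ is therefore also uniform on $(0, T)$, so a resolver waits $T/2$ on average and 95% of resolvers have expired their entry after $0.95\,T$.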

The Propagation Delay Formula

\text{Avg Delay} = \frac{T}{2} + \text{Convergence}_{BGP} + \text{Lat}_{Sync}

Where:

$T$ : TTL in seconds.
$\text{Convergence}_{BGP}$ : Time for anycast routing to stabilize (typically 60-300s).
$\text{Lat}_{Sync}$ : Internal database replication lag of the DNS provider.
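
A worked example makes the formula concrete. The sketch below evaluates the average delay and, using the same uniform assumption, the delay by which a given fraction of caches has expired. The input values are illustrative, and the function names are not from any library.

```python
def average_delay(ttl, bgp_convergence=0.0, sync_lag=0.0):
    """Avg Delay = T/2 + Convergence_BGP + Lat_Sync (all in seconds)."""
    return ttl / 2.0 + bgp_convergence + sync_lag

def percentile_delay(ttl, fraction, bgp_convergence=0.0, sync_lag=0.0):
    """Delay by which `fraction` of caches have expired, for uniform cache ages."""
    return fraction * ttl + bgp_convergence + sync_lag

avg = average_delay(3600, bgp_convergence=120, sync_lag=5)          # 1925.0 seconds
p95 = percentile_delay(3600, 0.95, bgp_convergence=120, sync_lag=5)  # about 3545 seconds
```

For a one-hour TTL, the average resolver updates after roughly 32 minutes, while covering 95% of resolvers takes close to the full hour.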

16. Case Study: The 2021 Global BGP/DNS Collapse

In October 2021, a major social media platform effectively "vanished" from the internet. The root cause was a BGP withdrawal of the routes to their authoritative DNS servers. Because the resolvers could no longer reach the authoritative servers, propagation (or in this case, de-propagation) happened at the speed of the TTL.

Recovery was not instantaneous either: resolvers that cache failed lookups (RFC 2308 permits caching a server failure for up to five minutes) can keep returning errors for a short time after the BGP routes are restored. This illustrates the "Negative Propagation" effect, where cached errors delay recovery just as cached records delay updates.

17. Negative Caching and the SOA Record

When a DNS lookup fails (NXDOMAIN), the resolver caches that failure. This is called Negative Caching. The duration of this cache is taken from the zone's SOA (Start of Authority) record: it is the smaller of the SOA record's own TTL and its "Minimum TTL" field (RFC 2308).

\text{Negative Cache Time} = \min(\text{SOA TTL}, \text{SOA Minimum})

If you create a new subdomain (e.g., beta.pingdo.net), and someone tries to visit it before you hit save, the resolver will remember that it doesn't exist. Even after you add the record, those users will still get an NXDOMAIN error (the browser reports that the site cannot be found) until the negative cache entry expires. This is why "instant" propagation of new records often feels broken.
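
The negative cache formula is easy to apply by hand or in code. The sketch below uses made-up SOA values and illustrative function names to show when a freshly created name becomes visible to a resolver that cached the NXDOMAIN.

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """Negative Cache Time = min(SOA TTL, SOA Minimum)."""
    return min(soa_ttl, soa_minimum)

def visible_after(nxdomain_cached_at: int, soa_ttl: int, soa_minimum: int) -> int:
    """Earliest time (epoch seconds) the resolver will ask again for the name."""
    return nxdomain_cached_at + negative_cache_ttl(soa_ttl, soa_minimum)

# Hypothetical zone: SOA record TTL of 3600s, SOA minimum field of 900s.
wait = negative_cache_ttl(3600, 900)  # 900 seconds
```

In this example a resolver that saw the NXDOMAIN keeps returning it for up to 15 minutes after the record is created, regardless of the new record's own TTL.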

18. Lame Delegation: The Propagation Dead-End

A Lame Delegation occurs when a parent zone points to a name server that is not authoritative for the child zone. This often happens during migrations where the old host is deleted before the new host is fully propagated.

During this phase, resolvers will receive an error or a non-authoritative answer from the name server and return SERVFAIL to clients. Some resolvers also cache this failure for a short time, so the outage can outlast the fix to the delegation.

19. Advanced Monitoring: The Global Dig

Professional DevOps teams don't rely on "What my computer sees." They use Global Propagation Checkers. These tools query hundreds of recursive resolvers in different countries and providers simultaneously.
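
The core of such a checker is a comparison of the answers returned by each resolver against the value you expect. The sketch below works on answers that have already been collected; the resolver labels and sample answers are made up, and the function name is hypothetical.

```python
def propagation_report(answers_by_resolver: dict, expected) -> dict:
    """Classify resolvers by whether they return exactly the expected record set."""
    expected = frozenset(expected)
    updated = sorted(r for r, a in answers_by_resolver.items()
                     if frozenset(a) == expected)
    stale = sorted(r for r in answers_by_resolver if r not in updated)
    total = len(answers_by_resolver)
    percent = 100.0 * len(updated) / total if total else 0.0
    return {"updated": updated, "stale": stale, "percent_updated": percent}

sample_answers = {  # hypothetical results gathered from four resolvers
    "8.8.8.8": ["10.0.0.2"],
    "1.1.1.1": ["10.0.0.2"],
    "9.9.9.9": ["10.0.0.1"],
    "isp-resolver": ["10.0.0.1"],
}
report = propagation_report(sample_answers, expected=["10.0.0.2"])
```

Here `report["percent_updated"]` is 50.0, and the `stale` list tells you which resolvers to re-check after their TTL has elapsed.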

20. DNS Prefetching and Cache Warming

To mitigate the effects of propagation lag, modern browsers and applications use DNS Prefetching: the browser resolves the domains of links on a page (or of hosts named in dns-prefetch hints) before the user clicks them.

From a server-side perspective, "Cache Warming" is the practice of proactively querying major public resolvers (8.8.8.8, 1.1.1.1, 9.9.9.9) immediately after an IP update. A query cannot evict an entry that is still within its TTL, so warming works once the old entry has expired or been purged: the resolver node that receives your query fetches the new record while traffic is low, so that when the main wave of users arrives, it already has the new IP in its local memory. Because these services are anycast, each query warms only the node it reaches.

21. The Privacy Paradox of ECS

While EDNS Client Subnet (ECS) improves propagation speed for geo-targeted records, it introduces a significant privacy trade-off. By passing client subnets to authoritative servers, resolvers are leaking geographic metadata about users to third-party DNS providers.

Some privacy-focused resolvers (like 1.1.1.1 or specific configurations of Unbound) disable ECS. For a network engineer, this means that "Geo-Propagation" will fail for these users. A user in Germany using a privacy-hardened resolver might be routed to a US server because the authoritative server only sees the resolver's IP (likely in a different country) rather than the user's subnet.

22. Case Study: The 2016 Dyn Cyberattack

In 2016, a massive Mirai botnet DDoS attack against Dyn (a major DNS provider) illustrated the limits of propagation-based resilience. Because Dyn's authoritative servers were unreachable, resolvers worldwide could not refresh their caches.

For domains with long TTLs (e.g., 24 hours), the internet continued to work normally for most users. However, for domains with short TTLs (e.g., 60 seconds), the sites vanished almost instantly as their caches expired and could not be replenished. This event sparked a major industry shift toward "Secondary DNS" configurations, where two independent providers serve the same zone with identical TTLs to ensure propagation never stops, even if one provider is under attack.

23. Packet-Level Debugging with TCPDUMP

When all else fails and you need to know why a specific server isn't seeing your update, you must move to the packet level. Using tcpdump, you can intercept the DNS traffic and inspect the AA (Authoritative Answer) bit and the TTL field in the response.
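
Once a DNS payload has been captured, the AA bit lives in the 16-bit flags field of the 12-byte header. The sketch below decodes that header with the standard library; the sample packets are hand-built headers rather than real captures, and the function name is illustrative.

```python
import struct

def parse_dns_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header from a raw DNS message."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", packet[:12])
    return {
        "id": txid,
        "is_response": bool(flags & 0x8000),    # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit
        "rcode": flags & 0x000F,                # 0 = NOERROR, 3 = NXDOMAIN
        "answer_count": ancount,
    }

# Hand-built headers: an authoritative answer and a cached (non-authoritative) one.
from_authoritative = struct.pack("!6H", 0x1234, 0x8400, 1, 1, 0, 0)
from_cache = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)
```

An answer with the AA bit set came straight from the zone's own server; an answer without it came from a cache, so its TTL tells you how long the stale value can persist.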

24. DNS-over-HTTPS (DoH) and Application-Level Propagation

The rise of DNS-over-HTTPS (DoH) adds yet another layer of complexity to the propagation landscape. Because DoH allows applications (like Firefox or Chrome) to bypass the OS resolver and query a public resolver directly via an encrypted HTTPS tunnel, propagation becomes "application-specific."

An engineer might flush the OS cache and see the correct result in their terminal, but the browser—using a different DoH provider with its own caching logic—continues to show the old IP. This fragmentation of the resolution path means that troubleshooting propagation now requires verifying not just the network settings, but the specific application-level DNS configurations of the client.

Figure 4: Forensic visualization of global Anycast convergence and TTL state divergence across major Tier-1 network nodes.

25. CNAME Flattening and the "Apex Alias" Propagation Loop

Modern DNS providers (Cloudflare, Route53, Azure DNS) solve the RFC-compliance issue of having a CNAME at the zone apex (the root domain) using a technique called CNAME Flattening or Alias Records. While this allows example.com to point to a load balancer, it introduces a unique propagation behavior.

In standard CNAME resolution, the resolver gets a pointer and then performs a second lookup. In flattened resolution, the authoritative server performs the lookup on your behalf and returns the final $\text{A}/\text{AAAA}$ record directly.

Forensic Propagation Anomaly:

If your Load Balancer's IP changes, your DNS provider must update its flattened response. This creates a "double-caching" layer. The DNS provider caches the Load Balancer's IP internally, and the global recursive resolvers cache the DNS provider's response. If the internal cache of the DNS provider is stale, propagation will fail even if you have a 1-second TTL on your side. This is often seen in high-availability environments using AWS ALBs or CloudFront.
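
The two cache layers add up in the worst case. As a back-of-the-envelope sketch, assume the provider refreshes its flattened answer on a fixed interval (a hypothetical parameter; providers do not necessarily expose it) and resolvers cache that answer for the published TTL.

```python
def worst_case_staleness(provider_refresh_seconds: int, published_ttl_seconds: int) -> int:
    """Upper bound on how long a resolver can serve the old load balancer IP."""
    # Worst case: the IP changes right after a provider refresh, and a resolver
    # fetches the stale answer just before the next refresh, then caches it.
    return provider_refresh_seconds + published_ttl_seconds

bound = worst_case_staleness(provider_refresh_seconds=60, published_ttl_seconds=1)  # 61
```

Even with a 1-second TTL on your record, the old address can linger for about a minute in this example, because the delay is dominated by the provider's internal refresh.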

26. Resolver Benchmarking: BIND vs. Unbound vs. PowerDNS

Not all resolvers obey the laws of physics (RFCs) equally. When analyzing propagation, we must account for the software-specific behaviors of the world's most common recursive engines.

BIND (Berkeley Internet Name Domain): The elder statesman of DNS. BIND strictly adheres to TTLs but can be prone to "Cache Pollution" if not configured with the minimal-responses flag. Its propagation speed is the global baseline.
Unbound: Built with a focus on security and performance. Unbound uses "Aggressive NSEC" (RFC 8198) which can actually slow down propagation of new records by proactively synthesizing NXDOMAIN responses based on existing NSEC records in its cache.
PowerDNS: Known for its flexibility. PowerDNS Recursor often employs "Cache Pre-fetching," where it refreshes popular records before they expire. This can make propagation feel "instant" for high-traffic sites but can lead to "Zombie Records" if the authoritative server is intermittently unreachable during the pre-fetch window.

27. The "Ghost Nameserver" Residual and TLD Glue Forensics

The most difficult propagation issue to diagnose is the Ghost Nameserver. This occurs when you update your Name Servers at your registrar, but the old NS and "Glue" records published by the TLD (Top-Level Domain) registry (e.g., Verisign for .com) are still held in resolver caches.

Because the delegation records served by TLD servers (the ones that tell resolvers where example.com lives) carry long TTLs (often 48 hours or more), a subset of global resolvers may continue to use the old nameservers long after the "official" update. This creates an intermittent failure where some users reach the new site and others get stale answers or errors, depending on whether their resolver still has the old delegation cached.

28. Multi-Cloud Sync and Propagation Race Conditions

In modern enterprise architectures, it is common to use Multi-Cloud DNS (e.g., Route53 + Azure DNS) for redundancy. Propagation in these environments is governed by the slowest common denominator.

When you update an IP, you must update it in both providers simultaneously. If there is a 5-minute lag in your automation script, you create a "Split-Brain" state where resolvers that query Provider A see the update, while those that query Provider B see the old IP. Because a resolver may choose a different nameserver each time its cache expires, a user can alternate between the two IPs, breaking session persistence and causing application errors.

29. DNS Record Type Divergence: A vs. TXT vs. MX

It is a common observation that $\text{A}$ records seem to propagate "faster" than $\text{TXT}$ or $\text{MX}$ records. This is not due to the DNS protocol itself, but rather the Refresh Frequency of the applications using them.

Web Browsers: Constantly query $\text{A}$ records and are highly optimized for cache invalidation.
Mail Servers (MTAs): Often have very aggressive internal caching for $\text{MX}$ records (sometimes ignoring the TTL entirely) to prevent mail loops.
CDN/WAF Verification: Services that use $\text{TXT}$ records for ownership verification often only check once every 24 hours, making "TXT propagation" feel agonizingly slow.

30. The "Zone Cut" Anomaly and Propagation Boundaries

A "Zone Cut" is the point in the DNS tree where responsibility for a subdomain is delegated to a different set of nameservers. For example, api.example.com might be a zone cut if it has its own NS records.

Propagation at the zone cut is subject to the Parent's TTL. If you decide to remove the delegation and turn api back into a simple $\text{A}$ record within the example.com zone, resolvers that have cached the NS records for api.example.com will continue to try and find a separate nameserver that no longer exists, resulting in a SERVFAIL.

Conclusion: Mastering the Distributed State

DNS propagation is not a mystery; it is a deterministic result of distributed state management. By lowering TTLs in advance, understanding the impact of BGP Anycast, managing the DNSSEC chain of trust, accounting for negative caching in the SOA record, and monitoring for TTL jitter, you can transform a "waiting game" into a precise engineering operation. The ultimate goal is Global Determinism—ensuring that the view of your infrastructure is consistent across every continent, provider, and device.

References & Technical Sources

[1] P. Mockapetris (1987). Domain Names - Concepts and Facilities. IETF RFC 1034.
"Defines the fundamental concepts of DNS, including caching and TTL mechanisms."
[2] P. Mockapetris (1987). Domain Names - Implementation and Specification. IETF RFC 1035.
"Specifies the details of DNS resource records, including the TTL field."
[3] D. Lawrence et al. (2020). Serving Stale Data to Improve DNS Resiliency. IETF RFC 8767.
"Describes the practice of serving stale DNS data when authoritative servers are unreachable, impacting perceived propagation."
[4] C. Contavalli et al. (2016). Client Subnet in DNS Queries. IETF RFC 7871.
"Defines the EDNS Client Subnet option, which allows recursive resolvers to pass client network information to authoritative servers for geo-routing."
[5] S. Bortzmeyer et al. (2021). DNS Query Name Minimisation to Improve Privacy. IETF RFC 9156.
"Explains how QNAME minimization affects the iterative lookup process and potential propagation edge cases."

Compiled by the Pingdo Engineering Team for educational purposes.

Frequently Asked Questions

Does 1.1.1.1 update faster than my ISP?

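Often, yes. Large public resolvers such as Cloudflare (1.1.1.1) and Google (8.8.8.8) generally honor the TTL you publish and let you purge a record manually, while some ISP resolvers may hold records longer than the TTL allows. Any resolver can still serve a cached answer until its TTL expires.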

Can I fix propagation by restarting my router?

It might help your LOCAL devices if your router had its own cache, but it won't fix the thousands of visitors or customers worldwide who use different providers.

Why do TTLs have weird numbers like 14400?

These are just seconds. 14400 is exactly 4 hours. 86400 is exactly 24 hours. Engineers use these round numbers to keep track of their network update-cycles.
