The state of the Internet as of January 2022

In a previous post, I wrote about working with The Internet Society to rewrite an IPv6 crawler. In this post, I want to share some of the more interesting results from the most recent crawl of the top million websites (27th January 2022).

Website Records

This crawl found 1,734,877 published records for website hostnames. Note that it doesn't take the stem, so www.silvermou.se would be recognised but not silvermou.se; this was for backwards compatibility.

IPv4

Of the 1,734,877 records, 1,734,758 had A records, of which only 505,879 (29.16%) were unique.

102,889 (5.93%) of the A records were the same 7 IPs:

   8573 208.91.197.25,AS40034 CONFLUENCE-NETWORK-INC
  10098 23.227.38.74,AS13335 CLOUDFLARENET
  10611 199.59.243.200,AS16509 AMAZON-02
  14905 208.91.197.26,AS40034 CONFLUENCE-NETWORK-INC
  15671 188.114.97.0,AS13335 CLOUDFLARENET
  15776 188.114.96.0,AS13335 CLOUDFLARENET
  27255 209.99.64.71,AS40034 CONFLUENCE-NETWORK-INC
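The unique share and the list of most common addresses above boil down to a frequency count over the resolved addresses. A minimal sketch in Python, assuming the crawl results are available as a flat list of address strings; the function name and sample data are hypothetical and not taken from the crawler itself:

```python
from collections import Counter

def summarise_addresses(addresses, top_n=7):
    """Return (share of distinct addresses in %, top_n most common addresses)."""
    if not addresses:
        return 0.0, []
    counts = Counter(addresses)
    # Distinct addresses as a percentage of all records.
    unique_share = round(100 * len(counts) / len(addresses), 2)
    return unique_share, counts.most_common(top_n)

# Hypothetical sample: five records resolving to three distinct addresses.
sample = ["192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.3"]
share, top = summarise_addresses(sample, top_n=2)
print(share)  # distinct addresses as a percentage of all records
print(top)    # most common addresses first
```

Note that `most_common` sorts in descending order, whereas the listings in this post are sorted ascending.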

550,123 (31.71%) were registered as either Cloudflare, Amazon, Confluence, or Google:

  10032 AS22612 NAMECHEAP-NET
  10443 AS396982 GOOGLE-CLOUD-PLATFORM
  10674 n/a
  11354 AS54600 PEGTECHINC
  12490 AS26496 AS-26496-GO-DADDY-COM-LLC
  13046 AS8560 IONOS SE
  13415 AS2635 AUTOMATTIC
  15298 AS37963 Hangzhou Alibaba Advertising Co.
  18499 AS63949 Linode
  20667 AS395954 LEASEWEB-USA-LAX-11
  21504 AS54113 FASTLY
  22066 AS32244 LIQUIDWEB
  22208 AS8075 MICROSOFT-CORP-MSN-AS-BLOCK
  23476 AS25504 Vautron Rechenzentrum AG
  25872 AS46606 UNIFIEDLAYER-AS-1
  36156 AS14061 DIGITALOCEAN-ASN
  38157 AS24940 Hetzner Online GmbH
  38566 AS16276 OVH SAS
  46082 AS14618 AMAZON-AES
  61781 AS15169 GOOGLE
 107292 AS40034 CONFLUENCE-NETWORK-INC
 117007 AS16509 AMAZON-02
 200605 AS13335 CLOUDFLARENET

Statistics per A record

Of the 1,734,758 records returned:

  • 1,441,639 (83.10%) were reachable on both port 80 and port 443;
  • 231,094 (13.32%) were HTTP only;
  • 4,606 (0.27%) were HTTPS only;
  • 62,025 (3.58%) were unreachable on port 80;
  • 288,513 (16.63%) were unreachable on port 443;
  • 57,419 (3.31%) were not reachable on either port 80 or port 443.
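The six figures above are not independent: "both", "HTTP only", "HTTPS only" and "neither" are mutually exclusive buckets, and the two per-port "unreachable" totals follow from them. A small Python sketch of that relationship, using the figures from the list above (the helper names are made up for illustration):

```python
def derive_totals(both, http_only, https_only, neither):
    """Derive the overall and per-port totals from the four exclusive buckets."""
    return {
        "total": both + http_only + https_only + neither,
        # Unreachable on port 80 = reachable only on 443, or on neither port.
        "unreachable_80": https_only + neither,
        # Unreachable on port 443 = reachable only on 80, or on neither port.
        "unreachable_443": http_only + neither,
    }

def pct(part, whole):
    """Percentage of whole, rounded to two decimal places."""
    return round(100 * part / whole, 2)

# Figures from the per-A-record list above.
totals = derive_totals(both=1_441_639, http_only=231_094,
                       https_only=4_606, neither=57_419)
print(totals)
print(pct(1_441_639, totals["total"]))
```

The same check can be applied to the per-unique-address and IPv6 lists by substituting their figures.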

Statistics per unique IPv4 address

To recap the above figures per unique IP address, of which there were 505,879:

  • 443,384 (87.65%) were reachable on both port 80 and port 443;
  • 39,386 (7.79%) were HTTP only;
  • 1,880 (0.37%) were HTTPS only;
  • 23,109 (4.57%) were unreachable on port 80;
  • 60,615 (11.98%) were unreachable on port 443;
  • 21,229 (4.20%) were not reachable on either port 80 or port 443.

IPv6

Of the 1,734,877 records, 292,671 (16.87%) had AAAA records, of which 102,742 (35.11%) were unique.

Statistics per AAAA record

Of the 292,671 records returned:

  • 280,551 (95.86%) were reachable on both port 80 and port 443;
  • 4,471 (1.53%) were HTTP only;
  • 783 (0.27%) were HTTPS only;
  • 7,649 (2.61%) were unreachable on port 80;
  • 11,337 (3.87%) were unreachable on port 443;
  • 6,866 (2.35%) were not reachable on either port 80 or port 443.

Statistics per unique IPv6 address

To recap the above figures per unique IPv6 address, of which there were 102,742:

  • 96,524 (93.95%) were reachable on both port 80 and port 443;
  • 1,782 (1.73%) were HTTP only;
  • 406 (0.40%) were HTTPS only;
  • 4,436 (4.32%) were unreachable on port 80;
  • 5,812 (5.66%) were unreachable on port 443;
  • 4,030 (3.92%) were not reachable on either port 80 or port 443.

Dual connectivity

  • 1,447,206 (83.17%) only had A records published;
  • 5,119 (0.29%) only had AAAA records published;
  • 280,757 of the 1,734,877 (16.18%) were reachable on both IPv4 and IPv6;
  • 777 records (0.045%) were not reachable on either protocol.
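This post doesn't describe how the crawler decides that a host is "reachable". One plausible reading is a plain TCP connect per address family, which can be sketched in Python as follows. The function names, the timeout and the classification labels are assumptions for illustration only:

```python
import socket

def is_reachable(address, port, family=socket.AF_INET, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
        return True
    except OSError:
        # Covers refused connections, timeouts and unsupported address families.
        return False

def connectivity(v4_ok, v6_ok):
    """Classify a host by which protocols it was reachable over."""
    if v4_ok and v6_ok:
        return "both"
    if v4_ok:
        return "IPv4 only"
    if v6_ok:
        return "IPv6 only"
    return "neither"

# Example call (not executed here, since it needs network access):
#   connectivity(is_reachable("192.0.2.1", 443),
#                is_reachable("2001:db8::1", 443, family=socket.AF_INET6))
```

A TCP connect only shows that the port accepts connections; it says nothing about whether a valid HTTP or TLS response comes back.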

Mailserver (MX) Records

The total number of domains with MX records published was 744,226, with 1,679,283 total MX records published.

1,202 MX records were set to localhost., and a further 20,964 did not resolve to a valid A or AAAA record. Discarding these left 735,875 domains and 1,658,319 MX records.

For the MX records:

  • 1,658,091 (99.99%) had corresponding A records;
  • 1,487,116 (89.68%) were reachable over IPv4;
  • 827,490 (49.90%) had corresponding AAAA records;
  • 817,936 (49.32%) were reachable over IPv6;
  • 228 (0.014%) had only AAAA records published;
  • 5,347 (0.32%) were not reachable over either IPv4 or IPv6;
  • 30,068 (1.8%) did not support STARTTLS.

For the domains:

  • 735,848 of the 735,875 (99.9963%) had at least one MX record with a corresponding A record;
  • 621,352 (84.44%) had at least one MX record which was reachable over IPv4 at the time of the crawl;
  • 199,282 of the 735,875 (27.08%) had at least one MX record with a corresponding AAAA record;
  • 192,766 (26.20%) had at least one MX record which was reachable over IPv6 at the time of the crawl.

752,436 of the 827,490 AAAA records (90.93%) were pointed to Google's mailservers.

Without Google, the percentage of domains with IPv6-accessible mailservers would drop from 27.08% to 6.88%, and the share of MX records with AAAA records would drop from 49.90% to 8.17%.

Uniqueness of mailservers

  • Of the 1,658,319 MX records, only 434,810 (26.22%) were unique;
  • 751,618 (45.32%) of all published MX records were pointing to one of Google's MX servers;
  • An additional 24,750 (1.49%) were pointing to an IP address in AS15169 GOOGLE;
  • 148,845 domains (representing 20.23% of those with MX records) had at least one MX record pointing to Google;
  • 160,936 domains (21.87%) had at least one MX record pointing to AS15169 GOOGLE;
  • 104,357 domains (14.18%) had at least one MX record pointing to Outlook, and an additional 1,845 (0.25%) were pointing to another MX record in AS8075 MICROSOFT-CORP-MSN-AS-BLOCK;
  • Taken together, this represents 36.3% of domains having at least one MX record pointing to either Google (Gmail) or Microsoft (Outlook).
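The 36.3% figure appears to be the sum of the AS15169 share and the two Microsoft shares listed above. A short Python check of that arithmetic follows. It assumes no domain is counted in both the Google and the Microsoft group, which the post does not state, and the variable names are illustrative:

```python
# Figures from the bullets above; 735,875 is the number of domains left
# after discarding unusable MX records.
DOMAINS_WITH_MX = 735_875
google_as = 160_936          # at least one MX record in AS15169 GOOGLE
outlook = 104_357            # at least one MX record pointing to Outlook
other_microsoft_as = 1_845   # another MX record in AS8075

# Assumption: the three groups do not overlap.
combined = google_as + outlook + other_microsoft_as
share = round(100 * combined / DOMAINS_WITH_MX, 2)
print(combined, share)
```

If some domains publish MX records for both providers, the true combined share would be slightly lower than this sum.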

I grouped the data into unique (domain, AS) pairs to catch domains which have varied MX records, and was left with 800,527 entries, sorted as follows:

   2785 AS52129 Proofpoint
   2893 AS198610 Beget LLC
   3163 AS31034 Aruba S.p.A.
   3376 AS13916 PROOFPOINT-UT7
   3378 AS45102 Alibaba US Technology Co.
   3466 AS63949 Linode
   3574 AS13335 CLOUDFLARENET
   3626 AS32244 LIQUIDWEB
   4171 AS42427 Mimecast Services Limited
   4758 AS25504 Vautron Rechenzentrum AG
   4794 AS14061 DIGITALOCEAN-ASN
   6016 AS2639 ZOHO-AS
   6637 AS27357 RACKSPACE
   7026 AS19994 RACKSPACE
   7062 AS26211 PROOFPOINT-ASN-US-WEST
   7549 AS14618 AMAZON-AES
   7586 AS136958 China Unicom Guangdong IP network
   8268 AS30031 MIMECAST
   8447 n/a
  10102 AS22843 PROOFPOINT-ASN-US-EAST
  10366 AS8560 IONOS SE
  11518 AS24940 Hetzner Online GmbH
  11765 AS13238 YANDEX LLC
  14874 AS22612 NAMECHEAP-NET
  15775 AS26496 AS-26496-GO-DADDY-COM-LLC
  16455 AS16276 OVH SAS
  17111 AS46606 UNIFIEDLAYER-AS-1
  28245 AS16509 AMAZON-02
 105971 AS8075 MICROSOFT-CORP-MSN-AS-BLOCK
 160936 AS15169 GOOGLE

The above shows that Google and Microsoft dominate the mailserver market for the top million sites in this dataset.
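The grouping by domain and AS described above can be sketched in Python as follows. It assumes the input is one (domain, AS label) row per MX record; the function name and sample rows are hypothetical:

```python
from collections import Counter

def count_domains_per_as(mx_rows):
    """mx_rows: iterable of (domain, as_label) tuples, one per MX record.

    Returns the number of unique (domain, AS) pairs and the per-AS domain
    counts sorted in ascending order, like the listing above.
    """
    unique_pairs = set(mx_rows)  # collapse repeated (domain, AS) rows
    per_as = Counter(as_label for _, as_label in unique_pairs)
    return len(unique_pairs), sorted(per_as.items(), key=lambda kv: kv[1])

# Hypothetical sample: example.org splits its MX records across two ASes.
rows = [
    ("example.com", "AS15169 GOOGLE"),
    ("example.com", "AS15169 GOOGLE"),
    ("example.org", "AS15169 GOOGLE"),
    ("example.org", "AS8075 MICROSOFT-CORP-MSN-AS-BLOCK"),
]
pairs, ranking = count_domains_per_as(rows)
print(pairs)
print(ranking)
```

Deduplicating on the pair rather than on the domain is what lets a domain with mixed providers appear once under each AS.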

Nameservers

  • There were 2,502,001 nameservers (DNS servers) returned;
  • 1,605,634 (64.17%) of these were accessible over IPv6;
  • Only 198,457 (7.93%) were unique;
  • There were 15,706 different providers according to the AS information;
  • 554,168 (22.15%) were pointing to Cloudflare;
  • 1,006,524 (40.23%) were pointing to one of three providers - Cloudflare, Amazon, or Host Europe.

The most common nameserver ASes were:

  39929 AS397213 ULTRADNS
  46505 AS16276 OVH SAS
  56039 AS15169 GOOGLE
  58225 AS16552 TIGGEE
  59154 AS8560 IONOS SE
 154298 AS44273 Host Europe GmbH
 298058 AS16509 AMAZON-02
 554168 AS13335 CLOUDFLARENET

Summary

The Internet is increasingly centralised, and the majority of WWW and MX records point to the same few places. In the case of WWW, a large part of this is due to Cloudflare, which of course masks the true hosting location, but these hosts still rely on Cloudflare:

  • 5.93% of hosts use the same 7 IPv4 addresses, though this has actually decreased from 14.10% in 2010, when 9.74% were using Blogspot domains;
  • 31.71% were registered as either Cloudflare, Amazon, Confluence, or Google. In 2010, only 13.97% were resolving to the same 4 providers, and a 30% share was split across 19 providers;
  • Of all of the 3,126,709 IP addresses collected (including for MX, NTP and NS), only 874,980 (27.98%) were unique;
  • IPv6 availability is still quite low, with only 16.87% of hosts having AAAA records published (though up from 1.53% in 2010);
  • 45.32% of all MX records pointed to Google, and 36.3% of all domains had at least one MX record pointing to either Google or Outlook - in 2010 this was 9.50% of domains;
  • Over 90% of the MX records with AAAA records pointed to Google;
  • Only 7.93% of the 2,502,001 nameservers were unique (down from 15.34% in 2010), and over 40% of them pointed to one of three providers (up from 23% in 2010);
  • 17.04% of the domains were using one of three providers for at least one nameserver, up from 6.45% in 2010 - when 17% was split across 24 providers.