Finally, after 4+ weeks of dire internet connectivity problems at the Family TreeHouse, the problem has been found, and solved. Hallelujah!!
The problem began one morning over 4 weeks ago. We woke up to our alarm clocks blinking at us – an overnight power failure. Could have been a thunderstorm, or something else, but whatever it was, it didn’t wake either of us up. By checking mechanical clocks and the blinking time setting on some of our digital clocks, one can get an idea of when the outage happened, and for how long. This one looked to have been a very short one (seconds or less) about 3 am or so.
After getting ready for the day, I continued my usual morning routine and went down to my basement office to check email and look at on-line newspapers. Not an option this morning. My computer had ridden through the outage with flying colors (I have a small uninterruptible power supply – UPS – on the rig downstairs), but the Comcast cable modem wasn’t letting anyone out onto the internet. I didn’t have time that morning to troubleshoot it, so off to work I went.
That night I tried to figure out what was wrong, but it didn’t make any sense. I have a more-elaborate-than-usual home network setup (click here for a diagram), but it’s all hooked up in a standard manner, so I started removing parts from the chain, looking for the problem part.
The first thing I did was take the wireless router out of the chain. No luck. Then I took the switches out of the chain by wiring my computer directly to the router. Still nothing. I spent quite a long time checking and re-checking the router configuration and the cables, but nothing seemed amiss. I tried swapping the wireless router for the wired router (either one can do the job by itself), but that didn’t work either. Finally I reset both routers to their factory defaults. Still nothing. At this point I decided that the routers must’ve taken a hit, so I went out and bought another one ($35 or so, not that expensive these days), but that didn’t solve the problem either!
I found that if I hot-wired my computer (or any other single computer) directly to the cable modem, that worked. This just re-confirmed my conclusion that something was amiss with the routers, but a brand new router didn’t solve the problem, so I was getting VERY puzzled! I was also running out of time, because I was heading out Saturday for a business trip to Germany, and would be gone a week. I left my computer hot-wired to the cable modem, and set up accounts for Lynn and Audrey on my computer, so they could get internet access while I was gone.
I arrived home Friday night, and had Saturday and part of Sunday to work on the network problem again, before flying off on another business trip. I also had chores to do, like mowing the lawn, so I didn’t have tons of time. I tried all combinations of routers and switches, and I found that as long as there wasn’t a router in the chain (computer to cable modem worked, computer to switch to cable modem worked), I could get ONE and only one computer connected to the internet.
About this point in time, I started calling Comcast tech support to see if there was something else going on with the bigger network. The fact that I could get one computer connected to the cable modem working consistently was all they cared about, so they were washing their hands of it all. I even pulled my cable modem (a Motorola Surfboard 5100) out of the chain and brought it to the Comcast drop-off center, to be replaced with a new one (I pay $3 a month to rent their cable modem for just such a situation). The new modem is a 5120, newer model, but it still wouldn’t work with any of the routers.
My travel schedule was such that I was flying out on Sunday each week, attending meetings somewhere or other Monday through Friday, and then flying back Saturday. Flying in on Saturday and then back out again on Sunday doesn’t leave a lot of time for much of anything, never mind network troubleshooting, so I was getting nowhere with the problem, and Lynn and Audrey were getting frustrated too.
Finally, with a two-week break from traveling, I called Comcast again and went through the situation one more time. I did the same trivial power-off, power-on things that I had done a hundred times before (they didn’t work then, and they didn’t work now), and finally told them to schedule a technician visit. I explained that if the technician couldn’t figure out what was wrong, I was going to call Verizon FIOS to see what they could do for me. That got their attention. I was also aware that the brand of router I was using was not the brand that Comcast provided when you bought the extra multi-computer service, so I was starting to get a little paranoid that Comcast had figured out how to tell what router you were using, and was refusing to connect to user-owned devices!
The tech arrived this afternoon and I took 5 minutes to outline what was happening and what I had done so far. He had no additional ideas, but he checked the signal on the broadband cable (it was fine) and then called his boss back at the Comcast tech center in Boston. He tried to explain the situation, but had trouble remembering all the details (can’t blame him!) so he handed his cell phone to me and I did the explaining. The super-tech had me do some more power-off, power-on things (which didn’t work again) and then had me talk him through some of the settings for the router. Nothing seemed amiss. Finally he said that he was heading out on vacation in an hour or so, but he would swing by the house on his way home and see if he could figure it out. The first tech then excused himself and headed off to his next service call.
The super-tech showed up about 45 minutes later, carrying his tool-bag of tricks and a new Netgear router (the brand that Comcast installs when you pay for multi-computer support). I went through the gory details of what had happened and what the symptoms were and what I had tried one more time, and he said that this was sounding vaguely familiar but he couldn’t put his finger on it.
He connected his laptop to the router, and went through the configuration screens himself. Flipping quickly through the various screens, he finally saw something amiss… and it was bizarre.
The following is a bit geeky, but there’s no other way to explain it, so you’ll have to bear with me…
In the router listing for the primary Domain Name Service (DNS) server, there was the numerical IP address (192.168.xx.yy) for my internal network printer! He took that out, and everything worked! He then remembered where and why he had seen something similar, and it is strange.
Network printers are dumb devices, and one thing they do is shout out their IP address on the local network at frequent and regular intervals, so that all the computers on the local network know the printer is on and ready to print. The printer is on the UPS, so it didn’t shut down during the power failure.
The router is not on the UPS, so it quickly shut itself off, and then back on again due to the power outage. As a router “boots” it calls out on the internet on one side of the router and on the local network on the other side of the router for network information. One of the questions it shoots out on the internet side is “Are there any DNS servers out there?” and if there are, they respond with their IP address and the router records that for later use. Evidently, just as the router shouted out its DNS question, my network printer shouted out its address, and the router mistakenly put that address in the configuration for the DNS server.
Now DNS servers are really important to the operation of the internet. DNS servers convert the human-readable network addresses (like http://www.cnn.com) to computer-readable IP addresses (like 22.214.171.124), but network printers don’t know how to do this. So my router was asking my network printer to translate network addresses to IP addresses, and the printer was ignoring those requests. Without those translations, you can’t surf the internet!
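For the curious, here is a minimal Python sketch of what that translation step looks like from a computer's point of view. It is only an illustration: the helper name `lookup` is my own, and it simply asks whatever resolver the operating system is configured to use. On a home network that usually means the request ends up at the DNS server listed in the router.

```python
import socket


def lookup(hostname):
    """Ask the system's configured resolver for the IPv4 addresses of hostname.

    Returns a sorted list of address strings, or None if the name
    cannot be resolved (for example, because no DNS server answers).
    """
    try:
        results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # Resolution failed: no answer, or the name does not exist.
        return None
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in results})
```

Calling `lookup("www.cnn.com")` on a healthy connection should return one or more addresses. With a DNS entry that never answers, as in my case, I would expect the call to stall for a while and then return None.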
Once the printer IP address was removed, everything worked perfectly! I SHOULD have noticed that mismatch, but I didn’t. What’s worse, as I brought in different routers, I hand-copied the configuration over to each new router, perpetuating the error! SHEEESH!
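In hindsight, the mismatch is the kind of thing a few lines of code could flag. Below is a rough Python sketch of such a sanity check, under some assumptions: the function name `suspicious_dns_entries` is made up, the address `192.168.1.50` is a stand-in for the printer (not my real address), and `8.8.8.8` is Google's public DNS server, used here only as an example of an outside address. The idea is simply that a DNS server entry on a home router's internet side should normally not be a private, local-network address.

```python
import ipaddress


def suspicious_dns_entries(dns_servers):
    """Return the DNS server addresses that fall in a private range.

    dns_servers is a list of IP address strings copied from the
    router's configuration screen (entered by hand in this sketch).
    """
    flagged = []
    for text in dns_servers:
        addr = ipaddress.ip_address(text)
        # is_private is True for ranges such as 192.168.0.0/16 and 10.0.0.0/8
        if addr.is_private:
            flagged.append(text)
    return flagged


# Hypothetical example: a printer-like local address next to a public one.
print(suspicious_dns_entries(["192.168.1.50", "8.8.8.8"]))
```

There are setups where a private DNS address is legitimate (a router sitting behind another router, for instance), so a flagged entry is a prompt to look closer, not proof of a problem.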